plasTeX 3.0 — A Python Framework for Processing LaTeX Documents

3.3.1 Lists

Lists are normalized slightly more than the rest of the document. Like sections, they may contain only a minimal set of child node types. In fact, a list can contain only one type of child node: the list item. As a consequence, any content that appears before the first item in a list is thrown out. List items, in turn, contain only paragraph nodes. Every list therefore has the structure shown in Figure 3.4.

[Figure 3.4: Normalized structure of all lists]
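
The normalized shape can also be pictured as an indented outline. The following is a minimal sketch, not plasTeX code: `Node` is a hypothetical stand-in class, and the names `itemize`, `item`, and `par` are assumed here purely for illustration. It relies only on the idea that each node can be iterated over to reach its children and carries a name.

```python
class Node(list):
    """Hypothetical stand-in for a document node: a list of children with a name."""

    def __init__(self, nodeName, children=()):
        super().__init__(children)
        self.nodeName = nodeName


def outline(listnode):
    """Return the list's shape as indented lines: list, then items, then paragraphs."""
    lines = [listnode.nodeName]
    for item in listnode:
        lines.append("  " + item.nodeName)
        for par in item:
            lines.append("    " + par.nodeName)
    return lines


# A list with two items; the first item holds two paragraphs.
sample = Node("itemize", [
    Node("item", [Node("par"), Node("par")]),
    Node("item", [Node("par")]),
])

print("\n".join(outline(sample)))
```

Because a list holds only items and each item holds only paragraphs, two nested loops are enough to reach every paragraph in the list.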

This structure allows you to easily traverse a list with code like the following.

# Iterate through the items in the list node
for item in listnode:

    # Iterate through the paragraphs in each item
    for par in item:

        # Print the text content of each paragraph
        print(par.textContent)

    # Print a blank line to separate each item
    print()
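
If the text is needed for further processing rather than for printing, the same traversal can collect it into nested Python lists. This is a sketch under stated assumptions: `items_as_text` is a hypothetical helper, not part of plasTeX, and `Stub` is a stand-in class that only mimics the two behaviors used above (iteration over children and a `textContent` attribute).

```python
class Stub(list):
    """Hypothetical stand-in for a node: iterable children plus textContent."""

    def __init__(self, children=(), text=""):
        super().__init__(children)
        self.textContent = text


def items_as_text(listnode):
    """Return one list of paragraph strings per list item."""
    return [[par.textContent for par in item] for item in listnode]


# Stand-in list node: two items, the first with two paragraphs.
fake_list = Stub([
    Stub([Stub(text="First item."), Stub(text="More about the first item.")]),
    Stub([Stub(text="Second item.")]),
])

for paragraphs in items_as_text(fake_list):
    # Print each item's paragraphs, then a blank line between items
    print("\n".join(paragraphs))
    print()
```

Returning plain strings keeps the result independent of the document tree, so it can be passed to code that knows nothing about nodes.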