plasTeX 3.0 — A Python Framework for Processing LaTeX Documents

2.1.5 Input and Output Files

If your renderer generates only one file, specifying the output filename is simple: use the --filename option. If the renderer generates multiple files, things get more complicated. The --filename option also accepts multiple names and provides a template syntax for building filenames.

Below is a list of all of the options that affect filename generation.

Characters that shouldn’t be used in a filename


Command-Line Options: --bad-filename-chars=string
Config File: [ files ] bad-chars
Default: : #$%^&*!~‘“=?/[]()|<>;\,.
specifies all characters that should not be allowed in a filename. These characters will be replaced by the value in --bad-filename-chars-sub.

String to use in place of invalid characters


Command-Line Options: --bad-filename-chars-sub=string
Config File: [ files ] bad-chars-sub
Default: -
specifies a string to use in place of invalid filename characters (specified by the --bad-filename-chars option).
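The interaction between the two options above can be sketched as follows. This is a minimal illustration and not plasTeX's own code: the helper name sanitize_filename is hypothetical, the bad_chars value is an abbreviated subset of the default, and replacing each invalid character individually (rather than collapsing runs) is an assumption.

```python
# Hypothetical helper, not part of plasTeX: replaces every disallowed
# character with the substitute string, one character at a time.
def sanitize_filename(name, bad_chars=": #$%^&*!~=?/[]()|<>;\\,.", sub="-"):
    # bad_chars is an abbreviated subset of the documented default (assumption)
    return "".join(sub if ch in bad_chars else ch for ch in name)

print(sanitize_filename("Results: A/B test"))  # Results--A-B-test
```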

Output Directory


Command-Line Options: --dir=directory or -d directory
Config File: [ files ] directory
Default: $jobname
specifies a directory name to use as the output directory.

Escaping characters higher than 7-bit


Command-Line Options: --escape-high-chars
Config File: [ files ] escape-high-chars
Default: False
some output types allow you to represent characters that do not fit in 7 bits (that is, non-ASCII characters) with an alternate representation to alleviate the issue of file encoding. This option indicates that these alternate representations should be used.

Note: The renderer is responsible for doing the translation into the alternate format. This might not be supported by all output types.
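As an illustration of what such an alternate representation can look like, the sketch below uses Python's standard xmlcharrefreplace error handler to turn non-ASCII characters into numeric character references, a form that HTML and XML accept. Whether a given plasTeX renderer uses this exact mechanism is an assumption.

```python
# Illustration only: numeric character references as one possible
# 7-bit-safe representation for HTML/XML output.
text = "café"
escaped = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(escaped)  # caf&#233;
```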

Template to use for output filenames


Command-Line Options: --filename=string
Config File: [ files ] filename
specifies the templates to use for generating filenames. The filename template is a list of space-separated names. Each name in the list is used once. An example is shown below.

index.html toc.html file1.html file2.html

If you don’t know how many files you are going to be producing, using static filenames like in the example above is not practical. For this reason, these filenames can also contain variables as described in Python’s string Templates (e.g. $title, $id). Note that if this option is given on the command line rather than in a configuration file, the dollar characters probably need to be protected from the shell. For instance, bash requires single quotes, as in plastex --filename='$id'. These variables come from the namespace created in the renderer and include:

  • $name, the name of the item (e.g. part, chapter or section),

  • $id, the ID (i.e. label) of the item,

  • $ref, the counter associated with the item (if it exists),

  • $title, the title of the item,

  • $jobname, the basename of the LaTeX file being processed.
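The variable syntax is that of Python's standard string.Template class, so its behaviour can be tried in isolation. In the sketch below the node dictionary and its values are hypothetical stand-ins for the namespace the renderer creates.

```python
from string import Template

# Hypothetical values standing in for the renderer's namespace.
node = {"name": "section", "id": "intro", "title": "Getting Started"}

filename = Template("$name-$id").substitute(node)
print(filename)  # section-intro

# substitute() raises KeyError when a variable is missing, which is a
# convenient way to detect a template that cannot be resolved.
try:
    Template("$ref").substitute(node)
    resolved = True
except KeyError:
    resolved = False
print(resolved)  # False
```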

One special variable is $num. This value is generated dynamically whenever a filename with $num is requested. Each time a filename with $num is successfully generated, the value of $num is incremented.

The values of variables can also be modified by a format specified in parentheses after the variable. The format is simply an integer that specifies how wide a field to create for integers (zero-padded) or, for strings, how many space-separated words to limit the name to. The example below shows $num being padded to four places and $title being limited to five words.

sect$num(4) $title(5)
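The two formatting rules can be sketched in a few lines of Python. The function apply_formats is a hypothetical helper written for this illustration and does not reflect how plasTeX implements the feature internally.

```python
import re

def apply_formats(template, values):
    """Hypothetical sketch: expand $var(N) using the rules described above."""
    def repl(match):
        value, width = values[match.group(1)], int(match.group(2))
        if isinstance(value, int):
            return str(value).zfill(width)            # zero-pad integers
        return " ".join(str(value).split()[:width])   # keep the first N words
    return re.sub(r"\$(\w+)\((\d+)\)", repl, template)

print(apply_formats("sect$num(4)", {"num": 7}))
# sect0007
print(apply_formats("$title(5)", {"title": "An Introduction to the plasTeX Rendering Framework"}))
# An Introduction to the plasTeX
```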

The list can also contain a wildcard filename (which should be specified last). Once a wildcard name is reached, it is used from that point on to generate the remaining filenames. The wildcard filename contains a comma-separated list of alternatives surrounded by square brackets ([ ]), each of which can be used as part of the filename. The alternatives are tried in order until a filename is successfully created (i.e. all variables resolve). For example, the specification below creates three alternatives.

$jobname_[$id, $title, sect$num(4)]

The code above is expanded to the following possibilities.

$jobname_$id
$jobname_$title
$jobname_sect$num(4)
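The expansion step is plain string manipulation, as the sketch below shows. The function expand_alternatives is hypothetical and handles a single bracketed group only, which is an assumption made for brevity.

```python
import re

def expand_alternatives(spec):
    """Hypothetical sketch: expand prefix[a, b, c]suffix into candidates."""
    match = re.fullmatch(r"(.*?)\[(.*?)\](.*)", spec)
    if match is None:
        return [spec]  # no wildcard group: the name is used as-is
    prefix, body, suffix = match.groups()
    return [prefix + alt.strip() + suffix for alt in body.split(",")]

candidates = expand_alternatives("$jobname_[$id, $title, sect$num(4)]")
for candidate in candidates:
    print(candidate)
```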

Each of the alternatives is attempted until one of them succeeds. In order for an alternative to succeed, all of the variables referenced in the template must be populated. For example, the $id variable will not be populated unless the node has a \label macro pointing to it. The $title variable will not be populated unless the node has a title associated with it (e.g. section, subsection, etc.). Generally, the last alternative should contain no variables except for $num, as a fail-safe.
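The try-until-success behaviour can be sketched with string.Template, treating a missing variable as an unresolved alternative. The function first_resolved is hypothetical, the field-width suffix is omitted for simplicity, and equating "not populated" with "absent from the mapping" is an assumption.

```python
from string import Template

def first_resolved(alternatives, values):
    """Hypothetical sketch: return the first alternative whose variables all resolve."""
    for alt in alternatives:
        try:
            return Template(alt).substitute(values)
        except KeyError:
            continue  # a variable was missing; try the next alternative
    raise ValueError("no alternative resolved")

alternatives = ["$id", "$title", "sect$num"]
print(first_resolved(alternatives, {"title": "Overview", "num": 3}))  # Overview
print(first_resolved(alternatives, {"num": 3}))                       # sect3
```

Because the last alternative depends only on $num, the loop always produces a name, which is the fail-safe role described above.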

The default value for this option is index [$id, sect$num(4)] which, assuming HTML output, will first generate a file index.html. Then, for each node triggering a file creation, it will try to use the node label. If no label exists, it will use sectN.html where N is the next available number (starting from one), padded to four digits. Of course the prefix sect is chosen because the default value for split-level is 2, which means generating a new file for each section.

As a last example, one could use index $name-[$ref, sect$num(4)]. Assuming our document contains two chapters which each contain two sections (and using the default LaTeX numbering scheme and the default plasTeX split level), we would get the filenames index.html, chapter-1.html, section-1-1.html, section-1-2.html, chapter-2.html, section-2-1.html, section-2-2.html.
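The walk-through below reproduces that list of filenames. It is a hypothetical simulation, not plasTeX code: the node list is hand-written, and it assumes that the dot in a reference such as 1.1 becomes a hyphen because "." is among the characters replaced by the bad-filename-chars options.

```python
def generate_filenames(nodes, bad_chars=".", sub="-", ext=".html"):
    """Hypothetical walk-through of 'index $name-[$ref, sect$num(4)]'."""
    names = ["index" + ext]   # the static name is used once, for the first file
    num = 1                   # $num advances only when its alternative is used
    for name, ref in nodes:
        if ref is not None:
            stem = f"{name}-{ref}"
        else:
            stem = f"{name}-sect{num:04d}"
            num += 1
        # Assumption: invalid characters are replaced before the extension is added.
        stem = "".join(sub if ch in bad_chars else ch for ch in stem)
        names.append(stem + ext)
    return names

nodes = [("chapter", "1"), ("section", "1.1"), ("section", "1.2"),
         ("chapter", "2"), ("section", "2.1"), ("section", "2.2")]
print(generate_filenames(nodes))
```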

Input Encoding


Command-Line Options: --input-encoding=string
Config File: [ files ] input-encoding
Default: utf-8
specifies which encoding the LaTeX source file is in.

Output Encoding


Command-Line Options: --output-encoding=string
Config File: [ files ] output-encoding
Default: utf-8
specifies which encoding the output files should use. Note: This depends on the output format as well. While HTML and XML use encodings, a binary format like MS Word would not.

Splitting document into multiple files


Command-Line Options: --split-level=integer
Config File: [ files ] split-level
Default: 2
specifies the highest section level that generates a new file. Each section in a LaTeX document has a number associated with its hierarchical level. These levels are -2 for the document, -1 for parts, 0 for chapters, 1 for sections, 2 for subsections, 3 for subsubsections, 4 for paragraphs, and 5 for subparagraphs. A new file will be generated for every section in the hierarchy with a value less than or equal to the value of this option. This means that for the value of 2, files will be generated for the document, parts, chapters, sections, and subsections.
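The rule reduces to a single comparison against the level table given above. In the sketch below the level numbers come from this section, while the names LEVELS and creates_file are hypothetical.

```python
# Section levels as listed above; the dictionary itself is illustrative.
LEVELS = {"document": -2, "part": -1, "chapter": 0, "section": 1,
          "subsection": 2, "subsubsection": 3, "paragraph": 4, "subparagraph": 5}

def creates_file(name, split_level=2):
    """Return True if a node of this kind starts a new output file."""
    return LEVELS[name] <= split_level

print([n for n in LEVELS if creates_file(n)])
# ['document', 'part', 'chapter', 'section', 'subsection']
print([n for n in LEVELS if creates_file(n, split_level=0)])
# ['document', 'part', 'chapter']
```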

Log messages to file


Command-Line Options: --log
Config File: [ files ] log
Default: False
specifies whether log messages should be put into a file instead of printed.