Liblouis User's and Programmer's Manual
2.1 Overview
Many translation (contraction) tables have already been made. They are included in the distribution in the tables directory and can be studied as part of the documentation. Some of the more helpful (and normative) ones are listed below:
- chardefs.cti
Character definitions for U.S. tables
- compress.ctb
Remove excessive whitespace
- en-us-g1.ctb
Uncontracted American English
- en-us-g2.ctb
Contracted or Grade 2 American English
- en-us-brf.dis
Make liblouis output conform to BRF standard
- en-us-comp8.ctb
8-dot computer braille for use in coding examples
- en-us-comp6.ctb
6-dot computer braille
The names used for files containing translation tables are completely arbitrary; the translator does not interpret them in any way. Contraction tables may be 8-bit ASCII, UTF-8, 16-bit big-endian Unicode or 16-bit little-endian Unicode files. Blank lines are ignored. Any leading and trailing whitespace (any number of blanks and/or tabs) is ignored. Lines whose first non-blank character is a number sign or hatch mark (‘#’) are ignored, that is, they are comments. If the number sign is not the first non-blank character in the line, it is treated as an ordinary character. If the first non-blank character is less-than (‘<’), the line is also treated as a comment; this makes it possible to mark up tables as XHTML documents. Lines which are not blank or comments define table entries. The general format of a table entry is:
opcode operands comments
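The line-level rules above can be sketched in Python. This is a minimal illustration, not liblouis's own parser: the function name classify_line is hypothetical, and stripping carriage returns and newlines in addition to blanks and tabs is an assumption made so that raw file lines can be passed in directly.

```python
def classify_line(line: str) -> str:
    """Return 'blank', 'comment' or 'entry' for one line of a table file."""
    # Leading and trailing blanks/tabs are ignored; '\r' and '\n' are also
    # stripped here (an assumption) so lines read from a file can be used as-is.
    stripped = line.strip(" \t\r\n")
    if not stripped:
        return "blank"
    # Only the first non-blank character decides whether the whole line is
    # a comment; a '#' later in the line does not make it one.
    if stripped[0] in "#<":
        return "comment"
    return "entry"


for sample in ["", "  \t", "# a comment", "  <p>xhtml markup</p>",
               "always world 456-2456"]:
    print(repr(sample), classify_line(sample))
```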
Table entries may not be split between lines. The opcode is a mnemonic that specifies what the entry does. The operands may be character sequences, braille dot patterns or occasionally something else; they are described under each opcode (see the Opcode Index). With some exceptions, opcodes expect a certain number of operands. Any text on the line after the last operand is ignored and may be used as a comment. A few opcodes accept a variable number of operands. For these, a number sign (‘#’) begins a comment unless it is preceded by a backslash (‘\’).
Here are some examples of table entries.
# This is a comment.
always world 456-2456 A word and the dot pattern of its contraction
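A minimal Python sketch of how an entry splits into opcode, operands and trailing comment. It assumes the caller already knows how many operands the opcode takes (liblouis's real opcode table is not reproduced here), and the opcode name myop used for the variable-operand case is hypothetical.

```python
from typing import List, Optional, Tuple


def split_entry(line: str, arity: Optional[int]) -> Tuple[str, List[str], str]:
    """Split an entry into (opcode, operands, trailing comment).

    arity is the number of operands the opcode expects, or None for an
    opcode that takes a variable number of operands.
    """
    if arity is None:
        # Variable operand count: the first '#' not preceded by '\'
        # starts the comment.
        cut = len(line)
        for i, ch in enumerate(line):
            if ch == "#" and (i == 0 or line[i - 1] != "\\"):
                cut = i
                break
        fields = line[:cut].split()
        return fields[0], fields[1:], line[cut + 1:].strip()
    # Fixed operand count: everything after the last operand is the comment.
    fields = line.split(maxsplit=arity + 1)
    comment = fields[arity + 1].strip() if len(fields) > arity + 1 else ""
    return fields[0], fields[1:arity + 1], comment


print(split_entry("always world 456-2456 A word and the dot pattern of its contraction", 2))
print(split_entry("myop a b\\#c # trailing note", None))  # 'myop' is hypothetical
```

In the second call the escaped number sign stays inside the operand, and only the unescaped one starts the comment.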
Most opcodes have both a "characters" operand and a "dots" operand, though some have only one and a few have other types.
The characters operand consists of any combination of characters and escape sequences, preceded and followed by whitespace. Escape sequences are used to represent difficult characters. They begin with a backslash (‘\’). They are:
- \\
backslash
- \f
form feed
- \n
new line
- \r
carriage return
- \s
blank (space)
- \t
horizontal tab
- \v
vertical tab
- \e
"escape" character (hex 1b, dec 27)
- \xhhhh
4-digit hexadecimal value of a character
If liblouis has been compiled for 32-bit Unicode, the following are also recognized:
- \yhhhhh
5-digit (20-bit) hexadecimal value of a character
- \zhhhhhhhh
8-digit (full 32-bit) hexadecimal value of a character
To find the code point of a given character, consult the public directory of the Unicode Character Database or the Unicode names list, which gives the code point for each named character.
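The escape sequences can be expanded with a short Python sketch. The unicode32 parameter is a hypothetical stand-in for the compile-time 32-bit Unicode option, and rejecting an unknown escape or a trailing lone backslash is a choice of this sketch rather than documented liblouis behavior.

```python
import string

# Single-letter escapes from the list above (key: the character after '\').
SIMPLE_ESCAPES = {
    "\\": "\\", "f": "\f", "n": "\n", "r": "\r",
    "s": " ", "t": "\t", "v": "\v", "e": "\x1b",
}
# Hexadecimal escapes: letter -> number of hex digits that must follow.
HEX_ESCAPES = {"x": 4, "y": 5, "z": 8}


def decode_characters(operand: str, unicode32: bool = True) -> str:
    """Expand the escape sequences in a characters operand."""
    out = []
    i = 0
    while i < len(operand):
        ch = operand[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        kind = operand[i + 1:i + 2]  # '' if the backslash ends the operand
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind in HEX_ESCAPES and (kind == "x" or unicode32):
            width = HEX_ESCAPES[kind]
            digits = operand[i + 2:i + 2 + width]
            if len(digits) != width or not all(d in string.hexdigits for d in digits):
                raise ValueError(f"bad \\{kind} escape in {operand!r}")
            # chr() raises ValueError above U+10FFFF, which a \z escape
            # can exceed; Python cannot represent such values as str.
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            raise ValueError(f"unsupported escape {operand[i:i + 2]!r} in {operand!r}")
    return "".join(out)


print(repr(decode_characters("a\\sb\\x0041")))
```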
The dots operand is a braille dot pattern. The real braille dots, 1 through 8, must be specified with their standard numbers.
liblouis recognizes virtual dots, which are used for special purposes, such as distinguishing accent marks. There are seven virtual dots. They are specified by the number 9 and the letters ‘a’ through ‘f’.
For a multi-cell dot pattern, the cell specifications must be separated from one another by a dash (‘-’). For example, the contraction for the English word ‘lord’ (the letter ‘l’ preceded by dot 5) would be specified as ‘5-123’. A space may be specified with the special dot number 0.
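A minimal Python sketch of reading a dots operand. Representing each cell as a set of dot names, and the space cell (0) as an empty set, are assumptions of the sketch; liblouis's internal representation is not described here. Cells that use only real dots are also rendered as Unicode braille patterns, where each raised dot n sets bit n-1 above U+2800.

```python
REAL_DOTS = "12345678"
VIRTUAL_DOTS = "9abcdef"  # the seven virtual dots


def parse_dots(pattern: str) -> list:
    """Parse a dots operand such as '5-123' into a list of cells (sets of dots)."""
    cells = []
    for spec in pattern.split("-"):
        if spec == "0":
            cells.append(frozenset())  # dot number 0 specifies a space
            continue
        if not spec or any(d not in REAL_DOTS + VIRTUAL_DOTS for d in spec):
            raise ValueError(f"bad cell {spec!r} in {pattern!r}")
        cells.append(frozenset(spec))
    return cells


def to_unicode_braille(cells: list) -> str:
    """Render cells that use only real dots as Unicode braille patterns."""
    out = []
    for cell in cells:
        if cell & frozenset(VIRTUAL_DOTS):
            raise ValueError("virtual dots have no Unicode braille equivalent")
        # Unicode braille patterns are U+2800 plus bit (n - 1) for each dot n.
        out.append(chr(0x2800 + sum(1 << (int(d) - 1) for d in cell)))
    return "".join(out)


print(to_unicode_braille(parse_dots("5-123")))  # prints ⠐⠇
```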
An opcode which is helpful in writing translation tables is include. Its format is:
include filename
It reads the file indicated by filename and incorporates, or includes, its entries into the table. Included files can include other files, which can include other files, and so on. For an example, see which files are included by the entry include en-us-g1.ctb in the table en-us-g2.ctb. If the included file is not in the same directory as the main table, use a full path name for filename. Tables can also be specified in a table list, in which the table names are separated by commas and given as a single table name in calls to the translation functions.
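How include entries and table lists combine entries can be sketched in Python. The file names and the in-memory FILES dictionary are hypothetical, directory lookup and full path names are not modelled, and the cycle check is a safeguard added for the sketch, not a statement about what liblouis does.

```python
def expand_table(name, files, _stack=()):
    """Return the entries of table `name` with every include replaced in place."""
    if name in _stack:
        raise ValueError("include cycle: " + " -> ".join(_stack + (name,)))
    entries = []
    for line in files[name].splitlines():
        stripped = line.strip(" \t")
        if not stripped or stripped[0] in "#<":
            continue  # blank line or comment
        fields = stripped.split()
        if fields[0] == "include":
            # The included file's entries go where the include entry was.
            entries.extend(expand_table(fields[1], files, _stack + (name,)))
        else:
            entries.append(stripped)
    return entries


def expand_table_list(table_list, files):
    """Expand a comma-separated table list such as 'a.ctb,b.ctb' in order."""
    entries = []
    for name in table_list.split(","):
        entries.extend(expand_table(name, files))
    return entries


# Hypothetical in-memory "files"; real tables are read from disk.
FILES = {
    "main.ctb": "include chars.cti\nalways world 456-2456\n",
    "chars.cti": "# character definitions\nletter a 1\n",
    "extra.ctb": "always the 2346\n",
}
print(expand_table_list("main.ctb,extra.ctb", FILES))
```

Because included entries are spliced in at the position of the include entry, the placement of include lines determines where those entries fall in the combined table.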
The order of the various types of opcodes or table entries is important. Character-definition opcodes should come first. However, if the optional display opcode (see display) is used, it should precede the character-definition opcodes. Braille-indicator opcodes should come next, followed by translation opcodes. The context opcode (see context) is a translation opcode, even though it is considered along with the multipass opcodes. The multipass opcodes should follow the translation opcodes.
The correct opcode (see correct) can be used anywhere after the character-definition opcodes, but it is probably a good idea to group all correct opcodes together. The include opcode (see include) can be used anywhere, but the order of entries in the combined table must conform to the order given above. Within each type of opcode, the order of entries is generally unimportant. Thus the translation entries can be grouped alphabetically or in any other order that is convenient. Hyphenation tables may be specified either with an include opcode or as part of a table list. They should come after everything else.
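The ordering rules can be checked mechanically. The Python sketch below works on a flat list of entries (for example, a table whose includes have been expanded). The RANK mapping covers only an illustrative subset of opcodes and is an assumption; context is left out because the text treats it as a translation opcode yet groups it with the multipass opcodes, and hyphenation tables are not checked.

```python
# Illustrative subset of opcodes ranked by the order described above
# (an assumption of this sketch, not liblouis's own classification):
# 0 display, 1 character definition, 2 braille indicator,
# 3 translation, 4 multipass.
RANK = {
    "display": 0,
    "space": 1, "punctuation": 1, "digit": 1, "letter": 1,
    "lowercase": 1, "uppercase": 1, "sign": 1, "math": 1,
    "numsign": 2, "capsletter": 2,
    "always": 3, "word": 3, "begword": 3, "endword": 3,
    "pass2": 4, "pass3": 4, "pass4": 4,
}


def check_order(entries):
    """Return messages for entries that break the ordering rules."""
    opcodes = [entry.split()[0] for entry in entries]
    problems = []
    highest = -1  # highest rank seen so far
    for i, op in enumerate(opcodes):
        rank = RANK.get(op)
        if rank is None:
            continue  # 'correct' and opcodes outside the subset are skipped
        if rank < highest:
            problems.append(f"entry {i} ({op}) comes after a later category")
        highest = max(highest, rank)
    # 'correct' may appear anywhere after the character definitions
    # (display entries are counted with them here).
    chardefs = [i for i, op in enumerate(opcodes) if RANK.get(op) in (0, 1)]
    if chardefs:
        for i, op in enumerate(opcodes):
            if op == "correct" and i < chardefs[-1]:
                problems.append(f"entry {i} (correct) precedes a character definition")
    return problems


print(check_order(["letter a 1", "always the 2346"]))  # prints []
print(check_order(["always the 2346", "letter a 1"]))
```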