
2.1 Overview

Many translation (contraction) tables already exist. They are included in the distribution in the tables directory and can be studied as part of the documentation. Some of the more helpful (and normative) ones are listed in the following table:

chardefs.cti

Character definitions for U.S. tables

compress.ctb

Remove excessive whitespace

en-us-g1.ctb

Uncontracted American English

en-us-g2.ctb

Contracted or Grade 2 American English

en-us-brf.dis

Make liblouis output conform to BRF standard

en-us-comp8.ctb

8-dot computer braille for use in coding examples

en-us-comp6.ctb

6-dot computer braille

The names used for files containing translation tables are completely arbitrary. They are not interpreted in any way by the translator. Contraction tables may be 8-bit ASCII files, UTF-8, 16-bit big-endian Unicode files or 16-bit little-endian Unicode files.

Blank lines are ignored. Any leading and trailing whitespace (any number of blanks and/or tabs) is ignored. Lines that begin with a number sign or hatch mark (‘#’) are ignored, i.e. they are comments. If the number sign is not the first non-blank character in the line, it is treated as an ordinary character. If the first non-blank character is a less-than sign (‘<’), the line is also treated as a comment. This makes it possible to mark up tables as XHTML documents.

Lines that are neither blank nor comments define table entries. The general format of a table entry is:

opcode operands comments

Table entries may not be split between lines. The opcode is a mnemonic that specifies what the entry does. The operands may be character sequences, braille dot patterns or occasionally something else. They are described under each opcode; see the Opcode Index. Most opcodes expect a fixed number of operands, and any text on the line after the last operand is ignored and may serve as a comment. A few opcodes accept a variable number of operands. For these, a number sign (‘#’) begins a comment unless it is preceded by a backslash (‘\’).

Here are some examples of table entries.

# This is a comment.
always world 456-2456 A word and the dot pattern of its contraction
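The line rules described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the parser liblouis uses; the function names `classify_line` and `split_entry` are made up for this example. Note that the trailing comment cannot be separated from the operands without knowing how many operands the opcode takes, so the sketch leaves them together.

```python
def classify_line(line):
    """Return 'blank', 'comment' or 'entry' for one line of a table file."""
    # Leading and trailing blanks and tabs are ignored.
    stripped = line.strip(" \t\r\n")
    if not stripped:
        return "blank"
    # A '#' or '<' as the first non-blank character marks a comment line.
    if stripped[0] in "#<":
        return "comment"
    return "entry"


def split_entry(line):
    """Split an entry line into (opcode, rest).

    'rest' still holds the operands and any trailing comment, because
    the number of operands depends on the opcode.
    """
    parts = line.strip(" \t\r\n").split(None, 1)
    opcode = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    return opcode, rest


for sample in ["# This is a comment.", "", "always world 456-2456 A word"]:
    print(repr(sample), "->", classify_line(sample))
```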

Most opcodes have both a "characters" operand and a "dots" operand, though some have only one and a few have other types.

The characters operand consists of any combination of characters and escape sequences preceded and followed by whitespace. Escape sequences are used to represent difficult characters. They begin with a backslash (‘\’). They are:

\\

backslash

\f

form feed

\n

new line

\r

carriage return

\s

blank (space)

\t

horizontal tab

\v

vertical tab

\e

"escape" character (hex 1b, dec 27)

\xhhhh

4-digit hexadecimal value of a character

If liblouis has been compiled for 32-bit Unicode, the following are also recognized:

\yhhhhh

5-digit (20-bit) hexadecimal value of a character

\zhhhhhhhh

8-digit (full 32-bit) hexadecimal value of a character

To find the code point of a given character, consult the public directory of the Unicode Character Database or the Unicode names list, which gives each character name together with its code point.
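As an illustration of the escape sequences listed above, the following Python sketch expands them in a characters operand. It is not the liblouis implementation. The function name `decode_characters_operand` is hypothetical, and rejecting unknown or malformed escapes with an error is a choice made for this example; the text does not say how liblouis treats them. The `\y` and `\z` forms are accepted unconditionally here, whereas liblouis recognizes them only when compiled for 32-bit Unicode.

```python
# Single-character escapes mapped to the characters they stand for.
SIMPLE_ESCAPES = {
    "\\": "\\",   # backslash
    "f": "\f",    # form feed
    "n": "\n",    # new line
    "r": "\r",    # carriage return
    "s": " ",     # blank (space)
    "t": "\t",    # horizontal tab
    "v": "\v",    # vertical tab
    "e": "\x1b",  # "escape" character
}

# Hexadecimal escapes: introducer letter -> number of hex digits that follow.
HEX_ESCAPES = {"x": 4, "y": 5, "z": 8}
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_characters_operand(operand):
    """Expand backslash escapes in a characters operand (illustrative only)."""
    out = []
    i = 0
    while i < len(operand):
        ch = operand[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(operand):
            raise ValueError("dangling backslash at end of operand")
        code = operand[i + 1]
        if code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        elif code in HEX_ESCAPES:
            width = HEX_ESCAPES[code]
            digits = operand[i + 2 : i + 2 + width]
            if len(digits) != width or any(d not in HEX_DIGITS for d in digits):
                raise ValueError(f"\\{code} needs {width} hex digits")
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            raise ValueError(f"unknown escape \\{code}")
    return "".join(out)
```

For example, `decode_characters_operand(r"a\sb")` yields the three characters `a`, space, `b`, and `decode_characters_operand(r"\x00e9")` yields the single character U+00E9.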

The dots operand is a braille dot pattern. The real braille dots, 1 through 8, must be specified with their standard numbers.

liblouis recognizes virtual dots, which are used for special purposes, such as distinguishing accent marks. There are seven virtual dots. They are specified by the number 9 and the letters ‘a’ through ‘f’.

For a multi-cell dot pattern, the cell specifications must be separated from one another by a dash (‘-’). For example, the contraction for the English word ‘lord’ (the letter ‘l’ preceded by dot 5) would be specified as ‘5-123’. A space may be specified with the special dot number 0.
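The dot-pattern notation can be made concrete with a short Python sketch that turns a dots operand into one bitmask per cell. It is an illustration under stated assumptions, not liblouis code: the function names are hypothetical, the bit layout chosen for the virtual dots is arbitrary, and only lowercase ‘a’ through ‘f’ are accepted. The one external fact relied on is that the Unicode braille block starts at U+2800 and encodes dots 1 through 8 in the low eight bits of the offset.

```python
# Bit for each dot symbol: real dots 1-8 use bits 0-7; virtual dots 9 and
# a-f use bits 8-14 (the layout for virtual dots is an assumption of this sketch).
DOT_BITS = {symbol: 1 << index for index, symbol in enumerate("123456789abcdef")}


def parse_dots_operand(operand):
    """Turn a dots operand such as '5-123' into a list of per-cell bitmasks."""
    cells = []
    for cell in operand.split("-"):
        if not cell:
            raise ValueError("empty cell in dot pattern")
        if cell == "0":
            cells.append(0)  # the special dot number 0 stands for a space
            continue
        mask = 0
        for symbol in cell:
            if symbol not in DOT_BITS:
                raise ValueError(f"invalid dot {symbol!r} in cell {cell!r}")
            mask |= DOT_BITS[symbol]
        cells.append(mask)
    return cells


def to_unicode_braille(cells):
    """Render cells as Unicode braille characters, dropping any virtual dots."""
    return "".join(chr(0x2800 + (mask & 0xFF)) for mask in cells)


# The pattern given above for 'lord': dot 5, then the letter l (dots 1-2-3).
print(to_unicode_braille(parse_dots_operand("5-123")))
```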

An opcode which is helpful in writing translation tables is include. Its format is:

include filename

It reads the file indicated by filename and incorporates or includes its entries into the table. Included files can include other files, which can include other files, etc. For an example, see what files are included by the entry include en-us-g1.ctb in the table en-us-g2.ctb. If the included file is not in the same directory as the main table, use a full path name for filename. Tables can also be specified in a table list, in which the table names are separated by commas and given as a single table name in calls to the translation functions.
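The effect of nested includes and of a table list can be pictured with the following Python sketch. It is a model of the behaviour described above, not the way liblouis resolves tables. The names `expand_table`, `expand_table_list` and `read_lines`, and the demo file names, are all hypothetical. The circular-include check is an addition made for this example, and file names containing spaces are not handled.

```python
def expand_table(name, read_lines, _active=()):
    """Return the entries of table `name`, with each include replaced
    by the entries of the included table (recursively)."""
    if name in _active:
        # Guard added for this sketch: refuse tables that include themselves.
        raise ValueError("circular include: " + " -> ".join(_active + (name,)))
    entries = []
    for line in read_lines(name):
        stripped = line.strip()
        if not stripped or stripped[0] in "#<":
            continue  # blank lines and comments contribute no entries
        fields = stripped.split()
        if fields[0] == "include" and len(fields) > 1:
            entries.extend(expand_table(fields[1], read_lines, _active + (name,)))
        else:
            entries.append(stripped)
    return entries


def expand_table_list(table_list, read_lines):
    """Expand a comma-separated table list in the order the tables are named."""
    entries = []
    for name in table_list.split(","):
        entries.extend(expand_table(name.strip(), read_lines))
    return entries


# In-memory stand-ins for table files, so the sketch runs without liblouis.
DEMO_TABLES = {
    "demo-main.ctb": ["# main table", "include demo-base.cti", "always world 456-2456"],
    "demo-base.cti": ["always lord 5-123"],
}

print(expand_table("demo-main.ctb", DEMO_TABLES.__getitem__))
```

Passing `read_lines` as a parameter keeps the sketch independent of where tables are stored; a version reading real files would supply a function that opens `name` and returns its lines.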

The order of the various types of opcodes or table entries is important. Character-definition opcodes should come first, except that the optional display opcode (see display), if used, should precede them. Braille-indicator opcodes should come next, followed by translation opcodes. The context opcode (see context) is a translation opcode, even though it is considered along with the multipass opcodes. The multipass opcodes themselves should follow the translation opcodes. The correct opcode (see correct) can be used anywhere after the character-definition opcodes, but it is probably a good idea to group all correct opcodes together. The include opcode (see include) can be used anywhere, but the order of entries in the combined table must conform to the order given above.

Within each type of opcode, the order of entries is generally unimportant. Thus the translation entries can be grouped alphabetically or in any other order that is convenient.

Hyphenation tables may be specified either with an include opcode or as part of a table list. They should come after everything else.