Liblouis - An open-source braille translator and back-translator.


2.9 Translation Opcodes

These opcodes define the braille representations for character sequences. Each of them defines an entry within the contraction table. These entries may be defined in any order except, as noted below, when they define alternate representations for the same character sequence.

Each of these opcodes specifies a condition under which the translation is legal, and each also has a characters operand and a dots operand. The text being translated is processed strictly from left to right, character by character, with the most eligible entry for each position being used. If there is more than one eligible entry for a given position in the text, the one with the longest character string is used. If there is more than one eligible entry for the same character string, the one defined first is tested for legality first. (This is the only case in which the order of the entries makes a difference.)
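The selection rule can be modelled in a few lines of Python. This is a toy sketch, not liblouis's implementation: the `Rule` type, the `legal` callback standing in for each opcode's condition, and the `?` placeholder for a position with no legal entry are all hypothetical.

```python
from typing import Callable, List, NamedTuple


class Rule(NamedTuple):
    chars: str                          # the characters operand
    dots: str                           # the dots operand
    legal: Callable[[str, int], bool]   # hypothetical stand-in for the opcode's condition


def always(text: str, pos: int) -> bool:
    """Condition for an unconditional rule."""
    return True


def translate(text: str, rules: List[Rule]) -> List[str]:
    """Toy left-to-right translator returning one dot string per applied entry."""
    out: List[str] = []
    pos = 0
    while pos < len(text):
        matching = [r for r in rules if text.startswith(r.chars, pos)]
        # Longest character string first; sorted() is stable, so entries with
        # the same length keep their definition order and are tested in it.
        matching = sorted(matching, key=lambda r: len(r.chars), reverse=True)
        for rule in matching:
            if rule.legal(text, pos):
                out.append(rule.dots)
                pos += len(rule.chars)
                break
        else:
            out.append("?")  # hypothetical marker: no legal entry at this position
            pos += 1
    return out
```

With entries for ‘a’ (1), ‘n’ (1345), ‘d’ (145) and ‘and’ (12346), translate("and", rules) yields the single cell 12346 rather than three cells, because the longer character string wins.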

The characters operand is a sequence or string of characters preceded and followed by whitespace. Each character can be entered in the normal way, or it can be defined as a four-digit hexadecimal number preceded by ‘\x’.
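A minimal Python sketch of how a table reader might decode the ‘\x’ form. The function name is hypothetical, and any other backslash sequence (such as the ‘\s’ in the noback example) is deliberately left untouched.

```python
import re

# A literal backslash, an 'x', then exactly four hexadecimal digits.
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{4})")


def decode_characters(operand: str) -> str:
    """Replace every \\xHHHH escape in a characters operand with its character."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), operand)
```

For example, decode_characters(r"\x20AC") returns the euro sign.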

The dots operand defines the braille representation for the characters operand. It may also be specified as an equals sign (‘=’), which means that the default representation for each character (see Character-Definition Opcodes) within the sequence is to be used. It is an error if any character in the rule has not been previously defined in a character-definition rule. Note that the ‘=’ shortcut for dot patterns has a known bug¹ that might cause problems when back-translating.
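The ‘=’ shortcut amounts to joining the default cells of the individual characters. A hypothetical Python sketch, where `char_defs` is an assumed in-memory form of the character-definition rules mapping each character to its single cell:

```python
from typing import Dict


def expand_dots(chars: str, dots: str, char_defs: Dict[str, str]) -> str:
    """Return the dots operand, expanding '=' from the per-character defaults."""
    if dots != "=":
        return dots  # an explicit dot pattern is used as given
    missing = [c for c in chars if c not in char_defs]
    if missing:
        # Mirrors the rule that every character must already be defined.
        raise ValueError(f"characters not defined: {''.join(missing)}")
    return "-".join(char_defs[c] for c in chars)
```

With the usual letter cells (h 125, o 135, r 1235, s 234, e 15), `horse =` expands to 125-135-1235-234-15.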

In what follows the word ‘characters’ means a sequence of one or more consecutive letters between spaces and/or punctuation marks.

noback opcode ...

This is an opcode prefix, that is to say, it modifies the operation of the opcode that follows it on the same line. noback specifies that back-translation is not to use the information on this line. For example:

noback always ;\s; 0

nofor opcode ...

This is an opcode prefix which modifies the operation of the opcode following it on the same line. nofor specifies that forward translation is not to use the information on this line.

nocross opcode characters ...

nocross is an opcode prefix which modifies the operation of the opcode following it. The following opcode can be the always opcode (see always) or any other translation opcode listed below that takes characters as its first operand. nocross specifies that the operation should not be done when the characters are not all in one syllable (that is, when they cross a syllable boundary). For this opcode to work, a hyphenation table must be included; if it is not, nocross is ignored. For example, if the English Grade 2 table is being used and the appropriate hyphenation table has been included, nocross always sh 146 will cause the ‘sh’ in ‘monkshood’ not to be contracted.
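Once the hyphenation table has supplied the syllable breaks, the boundary test itself is simple. A hypothetical Python sketch, where `breaks` is an assumed list of the indices at which a new syllable begins:

```python
from typing import Iterable


def crosses_syllable_boundary(start: int, end: int, breaks: Iterable[int]) -> bool:
    """True if the match text[start:end] spans a syllable break.

    A break exactly at start or end lies on the edge of the match,
    so it does not count as crossing.
    """
    return any(start < b < end for b in breaks)
```

For ‘monkshood’ split as monks-hood, the break is at index 5 and ‘sh’ occupies indices 4 to 6, so a nocross rule for ‘sh’ is skipped. For ‘bishop’ split as bish-op, ‘sh’ ends exactly at the break and may be contracted.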

compbrl characters

If the characters are found within a block of text surrounded by whitespace, the entire block is translated in computer braille: according to the default braille representations defined by the Character-Definition Opcodes if 8-dot computer braille is enabled, or according to the dot patterns given in the comp6 opcode (see comp6) if 6-dot computer braille is enabled. For example:

compbrl www translate URLs in computer braille

comp6 character dots

This opcode specifies the translation of characters in 6-dot computer braille. The first operand must be a single character. The second operand may specify as many cells as necessary. The opcode name is something of a misnomer, since any dots, not just dots 1 through 6, can be specified. This even includes virtual dots (see virtual dots).

nocont characters

Like compbrl, except that the string is left uncontracted. The prepunc (see prepunc) and postpunc (see postpunc) rules are still applied, however. This is useful for specifying that foreign words should not be contracted in an entire document.

replace characters {characters}

Replace the first set of characters, no matter where they appear, with the second. Note that the second operand is NOT a dot pattern. It is also optional: if it is omitted, the character(s) in the first operand will be discarded. This is useful for ignoring characters. However, the "ignored" characters may still affect the translation indirectly, so it is preferable to use the correct opcode (see correct).

always characters dots

Replace the characters with the dot pattern no matter where they appear. Do NOT use an entry such as always a 1. Use the character definition opcodes instead. For example:

always world 456-2456 unconditional translation

repeated characters dots

Replace the characters with the dot pattern no matter where they appear. Ignore any consecutive repetitions of the same character sequence. This is useful for shortening long strings of spaces or hyphens or periods. For example:

repeated --- 36-36-36 shorten separator lines made with hyphens
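A simplified Python model of the repeated behaviour, with a hypothetical function name. Characters the rule does not cover are passed through unchanged as placeholders for whatever other rules would produce.

```python
from typing import List


def apply_repeated(text: str, chars: str, dots: str) -> List[str]:
    """Emit `dots` once for a run of back-to-back copies of `chars`."""
    if not chars:
        raise ValueError("chars must not be empty")
    out: List[str] = []
    pos = 0
    while pos < len(text):
        if text.startswith(chars, pos):
            out.append(dots)
            pos += len(chars)
            # Skip every further consecutive repetition of the same sequence.
            while text.startswith(chars, pos):
                pos += len(chars)
        else:
            out.append(text[pos])  # placeholder for other rules
            pos += 1
    return out
```

With the rule from the example, a line of nine hyphens collapses to a single 36-36-36. In this model a tenth hyphen is not a whole repetition, so it is left for other rules.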

repword characters dots

Some braille systems require the removal of repeated words that are connected by some character. In Malaysian braille, for example, the contraction is as follows:

Text: ‘tasik-tasik’
Grade 1: ‘2345-1-234-24-13-36-2345-1-234-24-13’
Grade 2: ‘2345-1-234-24-13-123456’

The dash and the second occurrence of ‘tasik’ are replaced with dots ‘123456’. To achieve this, you can use the repword opcode as follows:

repword - 123456
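The effect on ‘tasik-tasik’ can be reproduced with a simplified Python model. It only handles a word made of exactly two identical halves joined by the connecting character, the function name is hypothetical, and the letter table in the test is an excerpt covering just this example; the real opcode's matching may differ in detail.

```python
from typing import Dict


def apply_repword(word: str, sep: str, rep_dots: str, cells: Dict[str, str]) -> str:
    """Translate `word` letter by letter, abbreviating an exact repetition.

    If `word` is X + sep + X, the separator and the second X are replaced
    by `rep_dots`; otherwise every character is translated individually.
    """
    first, found, rest = word.partition(sep)
    if found and first and first == rest:
        out = [cells[c] for c in first] + [rep_dots]
    else:
        out = [cells[c] for c in word]
    return "-".join(out)
```

With t 2345, a 1, s 234, i 24, k 13 and the hyphen 36, this gives the Grade 2 form 2345-1-234-24-13-123456 for ‘tasik-tasik’.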

rependword characters dots, dots

Like the repword opcode (see repword), but for indicating a repetition of a string at the end of a word. When the characters are encountered, check whether a part of the word before them (but not the whole word) matches the string after them. If so, insert the first dot pattern at the position where that part of the word started, replace the characters with the second dot pattern, and eliminate the repeated string and any string following another occurrence of the characters that is the same. This opcode is used in Malaysian braille, where the rule is:

rependword - 25,123456

largesign characters dots

Replace the characters with the dot pattern no matter where they appear. In addition, if two words defined as large signs follow each other, remove the space between them. For example, in en-us-g2.ctb the words ‘and’ and ‘the’ are both defined as large signs. Thus, in the phrase ‘the cat and the dog’ the space would be deleted between ‘and’ and ‘the’, with the result ‘the cat andthe dog’. Of course, ‘and’ and ‘the’ would be properly contracted. The term largesign is a bit of braille jargon that pleases braille experts.
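The space rule can be sketched on print words, as in the example above. This hypothetical Python helper ignores the contraction of each word and only decides where spaces go:

```python
from typing import List, Set


def join_large_signs(words: List[str], large_signs: Set[str]) -> str:
    """Re-join words with spaces, omitting the space between two large signs."""
    if not words:
        return ""
    out = words[0]
    for prev, word in zip(words, words[1:]):
        # No space when both neighbours are defined as large signs.
        out += ("" if prev in large_signs and word in large_signs else " ") + word
    return out
```

Applied to ‘the cat and the dog’ with ‘and’ and ‘the’ as large signs, it returns ‘the cat andthe dog’.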

word characters dots

Replace the characters with the dot pattern if they are a word, that is, are surrounded by whitespace and/or punctuation.

syllable characters dots

As its name indicates, this opcode defines a "syllable" which must be represented by exactly the dot patterns given. Contractions may not cross the boundaries of this "syllable" either from left or right. The character string defined by this opcode need not be a lexical syllable, though it usually will be. The equal sign in the following example means that the default representation for each character within the sequence is to be used (see Translation Opcodes):

syllable horse = sawhorse, horseradish

joinword characters dots

Replace the characters with the dot pattern if they are a word which is followed by whitespace and a letter. In addition, remove the whitespace. For example, en-us-g2.ctb has joinword to 235. This means that if the word ‘to’ is followed by another word, the contraction is to be used and the space is to be omitted. If these conditions are not met, the word is translated according to any other opcodes that may apply to it.
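The joinword condition can be expressed as a regular expression. In this hypothetical Python sketch the dot pattern is written inline as ‘[235]’ purely for illustration, and only a preceding letter is treated as disqualifying (an assumption that simplifies the "word" test).

```python
import re

LETTER = r"[^\W\d_]"  # a Unicode letter: a word character that is not a digit or '_'


def apply_joinword(text: str, word: str, dots: str) -> str:
    """Write `word` as [dots] and drop the whitespace after it, but only where
    `word` is not part of a longer word and the next non-space character is a
    letter. Everything else is left for other rules (unchanged here)."""
    pattern = re.compile(rf"(?<!{LETTER}){re.escape(word)}\s+(?={LETTER})")
    return pattern.sub(lambda m: f"[{dots}]", text)
```

So ‘go to bed’ becomes ‘go [235]bed’, while ‘into bed’ and ‘to 5 pm’ are left alone.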

lowword characters dots

Replace the characters with the dot pattern if they are a word preceded and followed by whitespace. No punctuation either before or after the word is allowed. The term lowword derives from the fact that in English these contractions are written in the lower part of the cell. For example:

lowword were 2356

contraction characters

If you look at en-us-g2.ctb, you will see that some words are actually contracted into some of their own letters. A famous example among braille transcribers is ‘also’, which is contracted as ‘al’. But this is also the name of a person. To take another example, ‘altogether’ is contracted as ‘alt’, but this is the abbreviation for the alternate key on a computer keyboard. Similarly, ‘could’ is contracted into ‘cd’, but this is the abbreviation for compact disc. To prevent confusion in such cases, the letter sign (see letsign) is placed before such letter combinations when they actually are abbreviations, not contractions. The contraction opcode tells the translator to do this.

sufword characters dots

Replace the characters with the dot pattern if they are either a word or at the beginning of a word.

prfword characters dots

Replace the characters with the dot pattern if they are either a word or at the end of a word.

begword characters dots

Replace the characters with the dot pattern if they are at the beginning of a word.

begmidword characters dots

Replace the characters with the dot pattern if they are either at the beginning or in the middle of a word.

midword characters dots

Replace the characters with the dot pattern if they are in the middle of a word.

midendword characters dots

Replace the characters with the dot pattern if they are either in the middle or at the end of a word.

endword characters dots

Replace the characters with the dot pattern if they are at the end of a word.

partword characters dots

Replace the characters with the dot pattern if the characters are anywhere in a word, that is, if they are preceded or followed by a letter.
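The word-position opcodes above differ only in which neighbours the matched characters may have. A hypothetical Python sketch that classifies a match and checks it against each opcode; it assumes that only letters belong to a word, so whitespace, punctuation and anything else count as a boundary.

```python
def word_position(text: str, start: int, end: int) -> str:
    """Classify the match text[start:end] by the letters on either side."""
    letter_before = start > 0 and text[start - 1].isalpha()
    letter_after = end < len(text) and text[end].isalpha()
    if letter_before and letter_after:
        return "middle"
    if letter_before:
        return "end"
    if letter_after:
        return "beginning"
    return "word"


# Positions in which each opcode's entry is legal, per the descriptions above.
ALLOWED = {
    "word": {"word"},
    "sufword": {"word", "beginning"},
    "prfword": {"word", "end"},
    "begword": {"beginning"},
    "begmidword": {"beginning", "middle"},
    "midword": {"middle"},
    "midendword": {"middle", "end"},
    "endword": {"end"},
    "partword": {"beginning", "middle", "end"},
}


def is_legal(opcode: str, text: str, start: int, end: int) -> bool:
    return word_position(text, start, end) in ALLOWED[opcode]
```

For instance, ‘the’ is a whole word in ‘the cat’ and in ‘(the)’, a beginning in ‘theory’, a middle in ‘other’ and an end in ‘bathe’.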

exactdots @dots

Note that the operand must begin with an at sign (‘@’). The dot pattern following it is evaluated for validity. If it is valid, whenever an at sign followed by this dot pattern appears in the source document it is replaced by the characters corresponding to the dot pattern in the output. This opcode is intended for use in liblouisutdml semantic-action files to specify exact dot patterns, as in mathematical codes. For example:

exactdots @4-46-12356

will produce the characters with these dot patterns in the output.

prepunc characters dots

Replace the characters with the dot pattern if they are part of punctuation at the beginning of a word.

postpunc characters dots

Replace the characters with the dot pattern if they are part of punctuation at the end of a word.

begnum characters dots

Replace the characters with the dot pattern if they are at the beginning of a number, that is, before all its digits. For example, in en-us-g1.ctb we have begnum # 4.

midnum characters dots

Replace the characters with the dot pattern if they are in the middle of a number. For example, en-us-g1.ctb has midnum . 46. This is because the decimal point has a different dot pattern than the period.

endnum characters dots

Replace the characters with the dot pattern if they are at the end of a number. For example, en-us-g1.ctb has endnum th 1456. This handles things like ‘4th’. A letter sign is NOT inserted.
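The three number opcodes differ only in where the digits are relative to the matched characters. A minimal, hypothetical Python sketch that classifies a match by its immediate neighbours; it assumes any character for which `str.isdigit()` is true counts as a digit, and real tables may apply further conditions.

```python
from typing import Optional


def number_position(text: str, start: int, end: int) -> Optional[str]:
    """Return which number opcode could apply to text[start:end], if any."""
    digit_before = start > 0 and text[start - 1].isdigit()
    digit_after = end < len(text) and text[end].isdigit()
    if digit_before and digit_after:
        return "midnum"   # e.g. the decimal point in 3.14
    if digit_after:
        return "begnum"   # before all the digits, e.g. a number sign
    if digit_before:
        return "endnum"   # after the digits, e.g. 'th' in 4th
    return None
```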

joinnum characters dots

Replace the characters with the dot pattern. In addition, if whitespace and a number follow, omit the whitespace. This opcode can be used, for example, to join currency symbols to numbers:

joinnum \x20AC 15 (EURO SIGN)
joinnum \x0024 145 (DOLLAR SIGN)
joinnum \x00A3 1234 (POUND SIGN)
joinnum \x00A5 13456 (YEN SIGN)
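A hypothetical Python sketch of the joining behaviour; the dot pattern is written inline as ‘[…]’ only for illustration. Note that the table operand \x20AC corresponds to the Python string "\u20ac".

```python
import re


def apply_joinnum(text: str, symbol: str, dots: str) -> str:
    """Write `symbol` as [dots]; if whitespace and then a digit follow it,
    drop the whitespace so the symbol is joined to the number."""
    pattern = re.compile(re.escape(symbol) + r"(?:\s+(?=\d))?")
    return pattern.sub(lambda m: f"[{dots}]", text)
```

With the dollar rule, ‘$ 10’ becomes ‘[145]10’, while in ‘$ and’ the space is kept because no number follows.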

Footnotes

(1)

See https://github.com/liblouis/liblouis/issues/500#issuecomment-365753137.


Liblouis User's and Programmer's Manual
