
2.12 The Context and Multipass Opcodes

The context and multipass opcodes (pass2, pass3 and pass4) provide translation capabilities beyond those of the basic translation opcodes (see Translation Opcodes) discussed previously. The multipass opcodes cause additional passes to be made over the string to be translated. The number after the word pass indicates in which pass the entry is to be applied. If no multipass opcodes are given, only the first translation pass is made. The context opcode is basically a multipass opcode for the first pass. It differs slightly from the multipass opcodes per se. When back-translating, the passes are performed in the reverse order, i.e. pass4, pass3, pass2, context. Each of these opcodes must be prefixed by either the noback opcode (see noback) or the nofor opcode (see nofor). The format of all these opcodes is opcode test action. The specific opcodes are invoked as follows:

context test action
pass2 test action
pass3 test action
pass4 test action

The test and action operands have suboperands. Each suboperand begins with a non-alphanumeric character and ends when another non-alphanumeric character is encountered. The suboperands and their initial characters are as follows.

" (double quote)

a string of characters. This string must be terminated by another double quote. It may contain any characters. If a double quote is needed within the string, it must be preceded by a backslash (‘\’). If a space is needed, it must be represented by the escape sequence \s. This suboperand is valid in the test and action parts of the correct opcode, in the test part of the context opcode when forward translating, and in the action part of the context opcode when back translating.

@ (at sign)

a sequence of dot patterns. Cells are separated by hyphens as usual. This suboperand is valid in the test and action parts of the pass2, pass3, and pass4 opcodes, in the action part of the context opcode when forward translating, and in the test part of the context opcode when back translating.

` (accent mark)

If this is the beginning of the string being translated this suboperand is true. It is valid only in the test part and must be the first thing in this operand.

~ (tilde)

If this is the end of the string being translated this suboperand is true. It is valid only in the test part and must be the last thing in this operand.

$ (dollar sign)

a string of attributes, such as ‘d’ for digit, ‘l’ for letter, etc. For a list of all valid attributes see valid attribute characters. More than one attribute can be given. If you wish to check characters with any attribute, use the letter ‘a’. Input characters are checked to see if they have at least one of the attributes. The attribute string can be followed by numbers specifying how many characters are to be checked. If no numbers are given, 1 is assumed. If two numbers separated by a hyphen are given, the input is checked to make sure that at least the first number of characters with the attributes are present, but no more than the second number. If only one number is present, then exactly that many characters must have the attributes. A period instead of the numbers indicates an indefinite number of characters (for technical reasons the number of characters that are actually matched is limited to 65535).

This suboperand is valid in all test parts but not in action parts. The characters which can be used in attribute strings are listed in the table of valid attribute characters that follows the descriptions of the suboperands.
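The counting rules for the $ suboperand can be modelled in a few lines of Python. This is only an illustrative sketch, not Liblouis code: the names `ATTRIBUTES`, `parse_count` and `match_attributes` are hypothetical, the predicates stand in for the attributes a real table would assign, and the sketch assumes that a period requires at least one matching character and that matching is greedy up to the maximum.

```python
import re

MAX_INDEFINITE = 65535  # limit stated in the text for the "." form

# Hypothetical attribute predicates; a real table assigns these per character.
ATTRIBUTES = {
    "d": str.isdigit,
    "l": str.isalpha,
    "s": str.isspace,
    "a": lambda ch: True,  # 'a' stands for any attribute
}


def parse_count(spec: str) -> tuple[int, int]:
    """Turn the count part of a $ suboperand into (minimum, maximum)."""
    if spec == "":
        return 1, 1                   # no numbers given: 1 is assumed
    if spec == ".":
        return 1, MAX_INDEFINITE      # assumption: at least one character
    m = re.fullmatch(r"(\d+)(?:-(\d+))?", spec)
    if m is None:
        raise ValueError(f"bad count: {spec!r}")
    low = int(m.group(1))
    high = int(m.group(2)) if m.group(2) else low  # one number: exact count
    return low, high


def match_attributes(text: str, pos: int, attrs: str, spec: str = ""):
    """Return how many characters were matched, or None if the test fails."""
    low, high = parse_count(spec)
    count = 0
    # A character passes if it has at least one of the listed attributes.
    while (pos + count < len(text) and count < high
           and any(ATTRIBUTES[a](text[pos + count]) for a in attrs)):
        count += 1
    return count if count >= low else None
```

Under these assumptions, `match_attributes("123abc", 0, "d", "1-2")` stops after two digits, while the same call with the count `"4"` fails because only three digits are available.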

! (exclamation point)

reverses the logical meaning of the suboperand which follows. For example, !$d is true only if the character is NOT a digit. This suboperand is valid in test parts only.

% (percent sign)

the name of a character class, predefined or defined using the attribute opcode (see attribute), or the name of a swap set defined by the swap opcodes (see Swap Opcodes). Names must contain only letters (a-z and A-Z). Both upper-case and lower-case letters may be used, and names are case-sensitive. Character class names may be used in test parts only. Swap names are valid everywhere.

{ (left brace)

Name: the name of a grouping pair. The left brace indicates that the first (or left) member of the pair is to be used in matching. If this is between replacement brackets it must be the only item. This is also valid in the action part.

The brace actions, {name and }name, refer to named groupings. A grouping is created with the grouping opcode (see grouping) and contains exactly two characters which represent the opening character and the matching closing character for a character grouping. The first operand is the grouping name, the second is the two (opening and closing) characters, and the third is the two dot patterns separated by a comma.

Let’s say that you’d like to define the opening and closing parentheses via multipass rules, and that you’d like to use dots ‘123478’ for the opening parenthesis and dots ‘145678’ for the closing parenthesis. One way to do so is like this:

grouping parentheses () 123478,145678
noback correct {parentheses {parentheses
noback correct }parentheses }parentheses

The references within the test part of the multipass rule match against the characters (the second operand) of the grouping rule, and the references within the action part replace with the dot patterns (the third operand) of the grouping.
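The relationship between the three operands of a grouping rule and the two halves of a multipass rule can be pictured with a small Python model. It is a sketch for illustration only: `parse_grouping` and `apply_grouping` are hypothetical helpers, not part of Liblouis, and the output is simply a list in which matched characters have been replaced by their dot-pattern strings.

```python
def parse_grouping(line: str) -> tuple[str, dict[str, str]]:
    """Split a 'grouping name chars dots' rule into its name and a mapping."""
    opcode, name, chars, dots = line.split()
    if opcode != "grouping" or len(chars) != 2:
        raise ValueError(f"not a grouping rule: {line!r}")
    left_dots, right_dots = dots.split(",")
    # The test part matches the characters; the action part emits the dots.
    return name, {chars[0]: left_dots, chars[1]: right_dots}


def apply_grouping(text: str, mapping: dict[str, str]) -> list[str]:
    """Replace each grouping character with its dot pattern, keep the rest."""
    return [mapping.get(ch, ch) for ch in text]


name, mapping = parse_grouping("grouping parentheses () 123478,145678")
result = apply_grouping("(a)", mapping)
```

Here `result` holds the dot pattern of the opening member, the untouched letter, and the dot pattern of the closing member, in that order.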

} (right brace)

Name: the name of a grouping pair. The right brace indicates that the second (or right) member is to be used in matching. See the remarks on the left brace immediately above.

/ (slash)

Search the input for the expression following the slash and return true if found. This can be used to set a variable.

_ (underscore)

Move backward. If a number follows, move backward that number of characters. The default is to move backward one character. This suboperand is valid only in test parts. The test fails if moving backward beyond the beginning of the input string.

[ (left bracket)

Start replacement here. This suboperand must always be paired with a right bracket and is valid only in test parts. Multiple pairs of square brackets in a single expression are not allowed.

] (right bracket)

End replacement here. This suboperand must always be paired with a left bracket and is valid only in test parts.

# (number sign or crosshatch)

Test or set a variable. Variables are referred to by numbers (0 through 49), e.g. #1, #2, #25. Variables may be set by one context or multipass opcode and tested by another. Thus, an operation that occurs at one place in a translation can tell an operation that occurs later within the same pass about itself. This feature is used in math translation, and may also help to alleviate the need for new opcodes. This suboperand is valid everywhere.

Variables are set in the action part. To set a variable, use an expression like #1=1. All of the variables are initialized to 0 at the start of each pass.

Variables can also be incremented and decremented by one in the action part with expressions like #1+ and #3- respectively. An attempt to decrement a variable below 0 is silently ignored.

Variables are tested in the test part with conditional expressions like: #1=2, #3<4, #5>6, #7<=8, #9>=10.
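The behaviour of variables described above (50 numbered slots, reset to 0 for each pass, decrement never going below 0) can be summarised in a short Python model. This is a hedged sketch rather than the Liblouis implementation; the class `PassVariables` and its method names are invented for illustration.

```python
import operator


class PassVariables:
    """Toy model of the numbered variables #0 through #49."""

    COUNT = 50
    _TESTS = {
        "=": operator.eq,
        "<": operator.lt,
        ">": operator.gt,
        "<=": operator.le,
        ">=": operator.ge,
    }

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # All variables are 0 at the start of each pass.
        self._values = [0] * self.COUNT

    def _check(self, number: int) -> None:
        if not 0 <= number < self.COUNT:
            raise IndexError(f"no such variable: #{number}")

    def set(self, number: int, value: int) -> None:      # like #1=1
        self._check(number)
        self._values[number] = value

    def increment(self, number: int) -> None:            # like #1+
        self._check(number)
        self._values[number] += 1

    def decrement(self, number: int) -> None:            # like #3-
        self._check(number)
        if self._values[number] > 0:  # going below 0 is silently ignored
            self._values[number] -= 1

    def test(self, number: int, op: str, value: int) -> bool:  # like #3<4
        self._check(number)
        return self._TESTS[op](self._values[number], value)
```

A rule that sets `#1=2` in its action part corresponds to `variables.set(1, 2)`, and a later test `#1=2` in the same pass corresponds to `variables.test(1, "=", 2)`.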

* (asterisk)

Copy the input characters or dot patterns within the replacement brackets into the output, and discard anything else that was matched. If there are no replacement brackets then copy all of the matched input. This suboperand is only valid within the action part. It may be specified any number of times. This feature is used, for example, for handling numeric subscripts in Nemeth.

? (question mark)

Valid only in the action part. The characters to be replaced are simply ignored. That is, they are replaced with nothing. If either member of a grouping pair is in the replace brackets the other member at the same level is also removed.

The valid characters which can be used in attribute strings are as follows:

a

any attribute

d

digit

D

literary digit

l

letter

m

math

p

punctuation

S

sign

s

space

U

uppercase

u

lowercase

w

first user-defined character class

x

second user-defined character class

y

third user-defined character class

z

fourth user-defined character class

The following describes the algorithm by which text is evaluated with multipass expressions:

Loop over context, pass2, pass3 and pass4 and do the following for each pass:

  1. Match the text following the cursor against all expressions in the current pass. If an expression has square brackets to indicate the part to be replaced, and the opening bracket would correspond with a position before the cursor, it is not a match.
  2. If there is no match, shift the cursor one position to the right and continue the loop.
  3. If there are matches, choose the longest match.
  4. Do the replacement. If the expression has square brackets, the part of the input that matches the part in between the brackets is replaced with the right-hand side of the rule. If the expression has no square brackets, the whole match is replaced.
  5. Place the cursor after the replaced text.
  6. Continue the loop.
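The steps above can be sketched in Python for a single pass. The model is deliberately simplified and is not how Liblouis is implemented: a rule is reduced to literal text before, inside and after the square brackets plus a literal replacement, the `Rule` type and `apply_pass` function are hypothetical, the backward-moving _ suboperand is not modelled, and rules with an empty bracketed part are rejected so that the loop always makes progress.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    before: str       # literal text required before the bracketed part
    target: str       # literal text inside [ ], the part to be replaced
    after: str        # literal text required after the bracketed part
    replacement: str  # right-hand side of the rule


def apply_pass(text: str, rules: list[Rule]) -> str:
    """Run one simplified pass over text and return the result."""
    if any(not rule.target for rule in rules):
        raise ValueError("this sketch needs a non-empty bracketed part")
    cursor = 0
    while cursor < len(text):
        # Step 1: find every rule whose whole test matches at the cursor.
        matches = [
            rule for rule in rules
            if text.startswith(rule.before + rule.target + rule.after, cursor)
        ]
        if not matches:
            cursor += 1  # Step 2: no match, move one position to the right
            continue
        # Step 3: choose the longest match.
        best = max(matches,
                   key=lambda r: len(r.before) + len(r.target) + len(r.after))
        # Step 4: replace only the bracketed part.
        start = cursor + len(best.before)
        end = start + len(best.target)
        text = text[:start] + best.replacement + text[end:]
        # Step 5: place the cursor after the replaced text.
        cursor = start + len(best.replacement)
    return text


rules = [Rule("a", "b", "c", "X"), Rule("", "ab", "", "Y")]
```

With these two rules, the input `"abc"` matches both at the first position, and the longer test wins, so only the bracketed `b` is replaced. A rule without square brackets is represented here by putting the whole test into `target`.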

Normally, when a rule is applied, the characters in the input that the rule applies to are "consumed", i.e. the position in the input string is stepped forward, and the characters are no longer available to subsequent rules. However, with the multipass opcodes, the context opcode (see context) and the correct opcode (see correct), it is possible to write rules which don’t consume any characters from the input. This can happen, for example, if you use the context opcode to insert a dot pattern before a special group of characters. In these cases, Liblouis always advances the position by one character, so that the same rule is not applied to the same characters over and over.
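The reason for the forced advance is easiest to see in a toy model of an insertion rule. The Python below is an illustration under simplifying assumptions, not Liblouis code: `insert_before` is a hypothetical function, the "special group of characters" is a literal string, and the inserted dot pattern is represented by an arbitrary marker string.

```python
def insert_before(text: str, group: str, marker: str) -> str:
    """Insert marker before every occurrence of group, consuming nothing."""
    if not group:
        raise ValueError("group must not be empty")
    out = []
    pos = 0
    while pos < len(text):
        if text.startswith(group, pos):
            out.append(marker)  # the rule fires but consumes no input
        # Advance by exactly one character whether or not the rule fired.
        # Without this step the rule would match at the same position forever.
        out.append(text[pos])
        pos += 1
    return "".join(out)
```

For example, `insert_before("abab", "ab", "#")` places one marker before each occurrence of `ab` and then moves on, instead of inserting markers endlessly at the first position.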