2.10 Character-Class Opcodes

These opcodes define and use character classes, also known as character attributes. A character class associates a set of characters with a name. The name then refers to any character within the class. A character may belong to more than one class.

The basic character classes correspond to the character definition opcodes. These classes are:

space

Whitespace characters such as blank and tab

digit

Numeric characters

letter

Both uppercase and lowercase alphabetic characters

lowercase

Lowercase alphabetic characters

uppercase

Uppercase alphabetic characters

punctuation

Punctuation marks

sign

Signs such as percent (‘%’)

math

Mathematical symbols

litdigit

Literary digit

The opcodes that define and use character classes are described below. For examples, see el.ctb.

attribute name characters

Add characters to a character class. The class may be one of the predefined classes listed above, a user-defined class previously created with this opcode, or a new one. The name operand must contain only letters (a-z and A-Z) and is case-sensitive. For historical reasons and to support the match opcode (see match), it can also be a number between 0 and 7. The characters operand must be specified as a string. Each character in the string is added to the character class, as is its dot counterpart if it occupies a single cell.

A user-defined character class may not be used in other rules until it has been created with an attribute rule. Numbered classes can be used in match rules (see match). Named classes can be used with the before and after opcodes and in context and multipass rules (see The Context and Multipass Opcodes).
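
As a rough illustration of the attribute opcode, the fragment below sketches how a user-defined class might be created and then extended. The class name vowel is hypothetical, and the fragment assumes it sits in a table that already defines the characters it mentions.

```
# Hypothetical fragment, not taken from any shipped table.
# Assumes a, e, i, o, u and y are defined elsewhere in the table.

# Create a new class named "vowel" and add five characters to it.
attribute vowel aeiou

# A later attribute rule with the same name adds to the existing class.
attribute vowel y
```

Because the name is case-sensitive, a rule naming Vowel would refer to a different class from vowel.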

after class opcode ...

The specified opcode is further constrained: the matched character sequence must be immediately preceded by a character belonging to the specified class. If this opcode is used more than once on the same line, the union of the characters in all the classes is used.
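
A minimal sketch of the after opcode follows. The rules and dot patterns are hypothetical and chosen only for illustration, and the fragment assumes the characters involved are defined elsewhere in the table.

```
# Hypothetical rule: use dots 1456 for "th" only when the character
# immediately before it belongs to the predefined class digit,
# as in "4th".
after digit always th 1456

# The after prefix used twice on one line: the preceding character
# may belong to either class (the union of digit and punctuation).
after digit after punctuation always th 1456
```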

before class opcode ...

The specified opcode is further constrained: the matched character sequence must be immediately followed by a character belonging to the specified class. If this opcode is used more than once on the same line, the union of the characters in all the classes is used.
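
The before opcode can be sketched the same way. The rule below is hypothetical, and it assumes the hyphen and the digits are defined elsewhere in the table.

```
# Hypothetical rule: translate the hyphen as dots 36 only when the
# character immediately after it belongs to the predefined class digit,
# as in "-5".
before digit always - 36
```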