srctools.tokenizer
Parses text into groups of tokens.
This is used internally for parsing KV1, text DMX, FGDs, VMTs, etc. If available this will be replaced with a faster Cython-optimised version.
The BaseTokenizer class implements various helper functions for navigating through the
token stream. The Tokenizer class then takes text file objects, a full string or an
iterable of strings and actually parses it into tokens, while IterTokenizer allows
transforming the stream before the destination receives it.
Once the tokenizer is created, either iterate over it or call the tokenizer to fetch the next
token/value pair. One token of lookahead is supported, accessed by the
BaseTokenizer.peek() and BaseTokenizer.push_back() methods. Tokenizers also track
the current line number as data is read, letting you raise BaseTokenizer.error(...) to easily
produce an exception listing the relevant line number and filename.
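To make the one-token lookahead contract concrete, here is a minimal standalone model of it. It does not import srctools: `LookaheadTokens` and the string token kinds are hypothetical stand-ins for a tokenizer and the `Token` enum, and the real implementation may differ in detail.

```python
from typing import Iterable, Iterator, Optional, Tuple

# (token kind, value): a stand-in for the (Token, str) pairs described above.
TokenPair = Tuple[str, str]


class LookaheadTokens:
    """Hypothetical model of a token stream with one token of lookahead."""

    def __init__(self, source: Iterable[TokenPair]) -> None:
        self._source: Iterator[TokenPair] = iter(source)
        self._pushed: Optional[TokenPair] = None

    def __call__(self) -> TokenPair:
        """Fetch the next pair, preferring a pushed-back one."""
        if self._pushed is not None:
            pair, self._pushed = self._pushed, None
            return pair
        # Once the source is exhausted, keep reporting EOF.
        return next(self._source, ("EOF", ""))

    def push_back(self, kind: str, value: str = "") -> None:
        """Return a pair to the stream so the next call reproduces it."""
        if self._pushed is not None:
            raise ValueError("Only one token can be pushed back at once")
        self._pushed = (kind, value)

    def peek(self) -> TokenPair:
        """Look at the next pair without consuming it."""
        pair = self()
        self.push_back(*pair)
        return pair


stream = LookaheadTokens([("STRING", "key"), ("STRING", "value"), ("NEWLINE", "\n")])
first = stream.peek()  # Look without consuming.
same = stream()        # The peeked pair is returned again.
```

In this model `peek()` is just a fetch followed by a push back, which is why only a single token of lookahead is available.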
- exception srctools.tokenizer.TokenSyntaxError( )
An error that occurred when parsing a file.
Normally this is created via BaseTokenizer.error(), which formats text into the error and includes the filename/line number from the tokenizer. The string representation will include the provided file and line number if present.
- file: str | _os.PathLike[str] | None
The filename of the file being parsed, or None if not known.
- srctools.tokenizer.BARE_DISALLOWED: Final = frozenset({'\t', '\n', '\r', ' ', '"', "'", '(', ')', ',', ';', '=', '[', ']', '{', '}'})
Characters not allowed for bare strings. These must be quoted.
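As an illustration of how this set might be used when writing files back out, the sketch below decides whether a value has to be quoted. The set is copied from the listing above so that the snippet runs without srctools installed. `needs_quotes` is a hypothetical helper, not part of the library, and it is an assumption here that an empty string must also be quoted.

```python
# Copied from the BARE_DISALLOWED listing above.
BARE_DISALLOWED = frozenset({
    '\t', '\n', '\r', ' ', '"', "'", '(', ')', ',', ';', '=', '[', ']', '{', '}',
})


def needs_quotes(text: str) -> bool:
    """Return True if the text cannot be written as a bare string.

    Hypothetical helper: it only checks the characters listed in
    BARE_DISALLOWED, and assumes an empty string always needs quotes.
    """
    return not text or not BARE_DISALLOWED.isdisjoint(text)


print(needs_quotes("models/props.mdl"))  # False: no disallowed characters.
print(needs_quotes("hello world"))       # True: contains a space.
```

This check ignores tokenizer options such as colon_operator and plus_operator, as well as comment markers, so a real writer may need to quote more values than this.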
- class srctools.tokenizer.Token(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
A token type produced by the tokenizer.
- EOF = 0
Produced indefinitely after the end of the file is reached.
- STRING = 1
Quoted or unquoted text.
- NEWLINE = 2
Produced at the end of every line.
- PAREN_ARGS = 3
Parenthesised (data).
- DIRECTIVE = 4
#name (automatically casefolded).
- COMMENT = 5
A // or /* */ comment.
- BRACE_OPEN = 6
A { character.
- BRACE_CLOSE = 7
A } character.
- PROP_FLAG = 11
A [!flag].
- BRACK_OPEN = 12
A [ character. Only used if PROP_FLAG is not.
- BRACK_CLOSE = 13
A ] character.
- COLON = 14
A : character, if colon_operator is enabled.
- EQUALS = 15
A = character.
- PLUS = 16
A + character, if Tokenizer.plus_operator is enabled.
- COMMA = 17
A , character.
- has_value
If true, this type has an associated value.
- class srctools.tokenizer.BaseTokenizer(filename: str | _os.PathLike[str] | None, error: Type[TokenSyntaxError])
Provides an interface for processing text into tokens.
It then provides tools for using those to parse data. This is an abc.ABC; a subclass must be used to provide a source for the tokens.
- filename: str | None
The filename that is being parsed. This is passed along to the error class, to produce relevant errors.
- error_type: Type[TokenSyntaxError]
The exception class to produce if an error occurs. This must be a subtype of TokenSyntaxError, since it is passed the line number and filename in addition to the error message. The error() method can be used to intelligently construct an instance to raise.
- line_num: int
The line number of the last token. Can be changed, but is automatically updated whenever Token.NEWLINE tokens are seen.
- error( ) TokenSyntaxError
Raise a syntax error exception.
This returns the TokenSyntaxError instance, with line number and filename attributes filled in. The message can be a Token with the associated string value to produce a wrong token error, or a string which will be {}-formatted with the positional args if they are present.
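The formatting rule for string messages can be sketched as follows. This is a standalone model and not the library's code: `SketchSyntaxError` and `make_error` are hypothetical names, the `message` attribute is an assumption, and the exact layout of the string representation is invented for illustration.

```python
from typing import Optional


class SketchSyntaxError(Exception):
    """Hypothetical stand-in for TokenSyntaxError."""

    def __init__(self, message: str, file: Optional[str], line_num: Optional[int]) -> None:
        super().__init__(message)
        self.message = message
        self.file = file
        self.line_num = line_num

    def __str__(self) -> str:
        # Assumed layout: include the file and line number only if present.
        where = []
        if self.file is not None:
            where.append(f"file {self.file}")
        if self.line_num is not None:
            where.append(f"line {self.line_num}")
        return f"{self.message} ({', '.join(where)})" if where else self.message


def make_error(
    file: Optional[str], line_num: Optional[int], message: str, *args: object
) -> SketchSyntaxError:
    """Build (but do not raise) an error, mirroring the description above."""
    if args:
        # Only {}-format the message when positional args were given.
        message = message.format(*args)
    return SketchSyntaxError(message, file, line_num)


err = make_error("example.vmt", 3, "Unexpected {}!", "]")
print(err)
```

Because the error is returned rather than raised, the caller writes `raise make_error(...)` (or `raise tokenizer.error(...)` with the real class), which keeps the raise visible to type checkers and readers at the call site.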
- push_back( ) None
Return a token, so it will be reproduced when called again.
Only one token can be pushed back at once. The value is required for Token.STRING, PAREN_ARGS and PROP_FLAG, but ignored for other token types.
- block( ) Iterator[str]
Helper iterator for parsing keyvalue style blocks.
This will first consume a {. Then it will skip newlines, and output each string section found. When } is found it terminates; anything else produces an appropriate error. This is safely re-entrant, and tokens can be taken or put back as required.
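The behaviour described above can be modelled with a small generator. This is a standalone sketch, not the library's implementation: the string token kinds stand in for the `Token` enum, `ValueError` stands in for the tokenizer's error type, and the `name` parameter is an assumption used only for error messages.

```python
from typing import Dict, Iterator, Tuple

# (token kind, value): a stand-in for the (Token, str) pairs.
TokenPair = Tuple[str, str]


def block(tokens: Iterator[TokenPair], name: str) -> Iterator[str]:
    """Yield each string inside a {...} block, skipping newlines."""
    kind, _ = next(tokens, ("EOF", ""))
    if kind != "BRACE_OPEN":
        raise ValueError(f"Expected '{{' to open {name}, got {kind}")
    for kind, value in tokens:
        if kind == "NEWLINE":
            continue
        if kind == "STRING":
            yield value
        elif kind == "BRACE_CLOSE":
            return
        else:
            raise ValueError(f"Unexpected {kind} in {name}")
    raise ValueError(f"Unclosed {name} block")


tokens = iter([
    ("BRACE_OPEN", "{"), ("NEWLINE", "\n"),
    ("STRING", "model"), ("STRING", "props/crate.mdl"), ("NEWLINE", "\n"),
    ("STRING", "skin"), ("STRING", "2"), ("NEWLINE", "\n"),
    ("BRACE_CLOSE", "}"),
])

pairs: Dict[str, str] = {}
for key in block(tokens, "entity"):
    # Re-entrancy: take the value token from the same stream mid-iteration.
    kind, value = next(tokens)
    if kind != "STRING":
        raise ValueError(f"Expected a value for {key!r}, got {kind}")
    pairs[key] = value
```

The loop body pulls the value token from the shared stream while the generator is suspended, which is the kind of interleaving that "safely re-entrant" refers to.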
- class srctools.tokenizer.Tokenizer(data: str | Iterable[str], filename: str | _os.PathLike[str] | None = None, error: Type[TokenSyntaxError] = TokenSyntaxError, *, string_bracket: bool = False, allow_escapes: bool = True, allow_star_comments: bool = False, preserve_comments: bool = False, colon_operator: bool = False, plus_operator: bool = False)
Processes text data into groups of tokens.
This mainly groups strings and removes comments.
Due to many inconsistencies in Valve’s parsing of files, several options are available to control whether different syntaxes are accepted.
- string_bracket: bool
If set, [bracket] blocks are parsed as a single string-like block. If disabled these are parsed as BRACK_OPEN, STRING then BRACK_CLOSE.
- allow_star_comments: bool
If enabled, this allows /* */ comments. Otherwise, an immediate error is produced.
- colon_operator: bool
This controls whether : produces COLON tokens, or is treated as part of a bare string.
- plus_operator: bool
This controls whether + produces PLUS tokens, or is treated as part of a bare string.
- preserve_comments: bool
Token.COMMENT tokens are produced if this is set.
- class srctools.tokenizer.IterTokenizer(source: Iterable[Tuple[Token, str]], filename: str | _os.PathLike[str] = '', error: Type[TokenSyntaxError] = TokenSyntaxError)
Wraps a token iterator to provide the tokenizer interface.
This is useful to pre-process a token stream before parsing it with other code.
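As a sketch of the kind of pre-processing step this enables, the generator below drops comment tokens and collapses runs of newlines. It is written against a local `Token` enum that mirrors a few of the values listed above, so it runs without srctools installed. `tidy_tokens` is a hypothetical helper, and the sample token values are made up for illustration.

```python
from enum import Enum
from typing import Iterable, Iterator, Tuple


class Token(Enum):
    """Local mirror of a few documented token types (stand-in only)."""
    STRING = 1
    NEWLINE = 2
    COMMENT = 5


def tidy_tokens(source: Iterable[Tuple[Token, str]]) -> Iterator[Tuple[Token, str]]:
    """Drop comments and collapse consecutive newlines into one."""
    prev_newline = False
    for tok, value in source:
        if tok is Token.COMMENT:
            continue  # Comments do not affect the newline state.
        if tok is Token.NEWLINE:
            if prev_newline:
                continue
            prev_newline = True
        else:
            prev_newline = False
        yield tok, value


raw = [
    (Token.STRING, "a"),
    (Token.COMMENT, " note"),
    (Token.NEWLINE, "\n"),
    (Token.COMMENT, " another"),
    (Token.NEWLINE, "\n"),
    (Token.STRING, "b"),
]
cleaned = list(tidy_tokens(raw))
```

With the real library, a generator of this shape would presumably be passed as the `source` argument, wrapping a `Tokenizer` created with `preserve_comments` set, so that downstream parsing code still sees the usual tokenizer interface.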