srctools.tokenizer
Parses text into groups of tokens.
This is used internally for parsing KV1, text DMX, FGDs, VMTs, etc. If available, this pure-Python implementation is replaced with a faster Cython-optimised version.
The BaseTokenizer class implements various helper functions for navigating through the token stream. The Tokenizer class then takes text file objects, a full string or an iterable of strings and actually parses it into tokens, while IterTokenizer allows transforming the stream before the destination receives it.
Once the tokenizer is created, either iterate over it or call the tokenizer to fetch the next token/value pair. One token of lookahead is supported, accessed by the BaseTokenizer.peek() and BaseTokenizer.push_back() methods. The tokenizer also tracks the current line number as data is read, letting you raise BaseTokenizer.error(...) to easily produce an exception listing the relevant line number and filename.
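As a sketch of that lookahead contract (an illustrative model only — MiniTokenizer and its cut-down Token enum are hypothetical stand-ins, not part of srctools), one token of pushback can be implemented like this:

```python
from enum import Enum


class Token(Enum):
    """Cut-down stand-in for srctools.tokenizer.Token."""
    EOF = 0
    STRING = 1


class MiniTokenizer:
    """Minimal model of the lookahead mechanics: a single pushed-back
    token is replayed before new tokens are pulled from the source."""

    def __init__(self, tokens):
        self._source = iter(tokens)
        self._pushback = None

    def __call__(self):
        """Fetch the next (token, value) pair."""
        if self._pushback is not None:
            tok = self._pushback
            self._pushback = None
            return tok
        # EOF is produced indefinitely once the source is exhausted.
        return next(self._source, (Token.EOF, ''))

    def push_back(self, tok, value=''):
        """Return a token so it is reproduced on the next call."""
        if self._pushback is not None:
            raise ValueError('Only one token of lookahead is supported!')
        self._pushback = (tok, value)

    def peek(self):
        """Look at the next token without consuming it."""
        tok = self()
        self.push_back(*tok)
        return tok


tok = MiniTokenizer([(Token.STRING, 'key'), (Token.STRING, 'value')])
assert tok.peek() == (Token.STRING, 'key')   # lookahead does not consume
assert tok() == (Token.STRING, 'key')
assert tok() == (Token.STRING, 'value')
assert tok() == (Token.EOF, '')              # EOF repeats forever
```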
- exception srctools.tokenizer.TokenSyntaxError( )
An error that occurred when parsing a file.
Normally this is created via BaseTokenizer.error(), which formats text into the error and includes the filename/line number from the tokenizer. The string representation will include the provided file and line number if present.
- file: str | _os.PathLike[str] | None
The filename of the file being parsed, or None if not known.
- srctools.tokenizer.BARE_DISALLOWED: Final = frozenset({'\t', '\n', '\r', ' ', '"', "'", '(', ')', '+', ',', ';', '=', '[', ']', '{', '}'})
Characters not allowed for bare strings. These must be quoted.
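As an illustration, a writer can consult this set to decide whether a value must be quoted (needs_quotes is a hypothetical helper, not part of the module):

```python
# The same frozenset as srctools.tokenizer.BARE_DISALLOWED.
BARE_DISALLOWED = frozenset({
    '\t', '\n', '\r', ' ', '"', "'", '(', ')',
    '+', ',', ';', '=', '[', ']', '{', '}',
})


def needs_quotes(text: str) -> bool:
    """Hypothetical helper: True if text cannot be written bare."""
    return not text or any(ch in BARE_DISALLOWED for ch in text)


assert not needs_quotes('some_value')
assert needs_quotes('two words')   # space must be quoted
assert needs_quotes('a+b')         # '+' must be quoted
```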
- class srctools.tokenizer.Token
Bases:
Enum
A token type produced by the tokenizer.
- EOF = 0
Produced indefinitely after the end of the file is reached.
- STRING = 1
Quoted or unquoted text.
- NEWLINE = 2
Produced at the end of every line.
- PAREN_ARGS = 3
Parenthesised (data).
- DIRECTIVE = 4
#name (automatically casefolded).
- COMMENT = 5
A // or /* */ comment.
- BRACE_OPEN = 6
A { character.
- BRACE_CLOSE = 7
A } character.
- PROP_FLAG = 11
A [!flag].
- BRACK_OPEN = 12
A [ character. Only used if PROP_FLAG is not.
- BRACK_CLOSE = 13
A ] character.
- COLON = 14
A : character.
- EQUALS = 15
A = character.
- PLUS = 16
A + character.
- COMMA = 17
A , character.
- has_value
If true, this type has an associated value.
- class srctools.tokenizer.BaseTokenizer(
- filename: str | _os.PathLike[str] | None,
- error: Type[TokenSyntaxError],
Provides an interface for processing text into tokens.
It then provides tools for using those to parse data. This is an abc.ABC; a subclass must be used to provide a source for the tokens.
- error( ) → TokenSyntaxError
Raise a syntax error exception.
This returns the TokenSyntaxError instance, with line number and filename attributes filled in. The message can be a Token with the associated string value to produce a wrong-token error, or a string which will be {}-formatted with the positional args if they are present.
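The formatting behaviour can be sketched roughly as follows (a simplified model: the exact message layout of the real TokenSyntaxError is an assumption, and make_error is a hypothetical stand-in for a bound BaseTokenizer.error()):

```python
class SketchSyntaxError(Exception):
    """Simplified model of a parse error carrying file/line context."""

    def __init__(self, mess, file, line):
        super().__init__(mess)
        self.mess = mess
        self.file = file
        self.line = line

    def __str__(self):
        # Include the file and line number only if present.
        parts = ['Error parsing file']
        if self.file:
            parts.append(f' "{self.file}"')
        if self.line:
            parts.append(f' on line {self.line}')
        parts.append(f':\n{self.mess}')
        return ''.join(parts)


def make_error(message, *args):
    """{}-format the message if positional args are given, then
    attach the current position (hard-coded here for illustration)."""
    if args:
        message = message.format(*args)
    # A real tokenizer would supply its own filename/line here.
    return SketchSyntaxError(message, 'example.vmt', 3)


err = make_error('Unexpected token "{}"!', '}')
assert 'Unexpected token "}"!' in str(err)
assert 'line 3' in str(err)
```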
- push_back( ) → None
Return a token, so it will be reproduced when called again.
Only one token can be pushed back at once. The value is required for Token.STRING, PAREN_ARGS and PROP_FLAG, but ignored for other token types.
- block( ) → Iterator[str]
Helper iterator for parsing keyvalue-style blocks.
This will first consume a {. Then it will skip newlines and output each string section found. When } is found it terminates; anything else produces an appropriate error. This is safely re-entrant, and tokens can be taken or put back as required.
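The algorithm block() follows can be sketched over a plain list of (token, value) pairs (an illustrative model, not the re-entrant implementation; parse_block and the cut-down Token enum are hypothetical stand-ins):

```python
from enum import Enum
from typing import Iterator, List, Tuple


class Token(Enum):
    """Cut-down stand-in for srctools.tokenizer.Token."""
    STRING = 1
    NEWLINE = 2
    BRACE_OPEN = 6
    BRACE_CLOSE = 7


def parse_block(tokens: List[Tuple[Token, str]]) -> Iterator[str]:
    """Sketch of block(): consume {, yield strings, stop at }."""
    stream = iter(tokens)
    tok, _ = next(stream)
    if tok is not Token.BRACE_OPEN:
        raise SyntaxError('Expected a { to open the block!')
    for tok, value in stream:
        if tok is Token.NEWLINE:
            continue            # Newlines inside the block are skipped.
        if tok is Token.BRACE_CLOSE:
            return              # End of block.
        if tok is Token.STRING:
            yield value
        else:
            raise SyntaxError(f'Unexpected {tok} inside block!')
    raise SyntaxError('Block was never closed!')


toks = [
    (Token.BRACE_OPEN, '{'), (Token.NEWLINE, '\n'),
    (Token.STRING, 'key'), (Token.STRING, 'value'),
    (Token.NEWLINE, '\n'), (Token.BRACE_CLOSE, '}'),
]
assert list(parse_block(toks)) == ['key', 'value']
```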
- class srctools.tokenizer.Tokenizer(data: str | ~typing.Iterable[str], filename: str | _os.PathLike[str] | None = None, error: ~typing.Type[~srctools.tokenizer.TokenSyntaxError] = <class 'srctools.tokenizer.TokenSyntaxError'>, *, string_bracket: bool = False, allow_escapes: bool = True, allow_star_comments: bool = False, preserve_comments: bool = False, colon_operator: bool = False)
Processes text data into groups of tokens.
This mainly groups strings and removes comments.
Due to many inconsistencies in Valve’s parsing of files, several options are available to control whether different syntaxes are accepted:
- string_bracket parses [bracket] blocks as a single string-like block. If disabled, these are parsed as BRACK_OPEN, STRING, then BRACK_CLOSE.
- allow_escapes controls whether \n-style escapes are expanded.
- allow_star_comments if enabled allows /* */ comments.
- preserve_comments causes Token.COMMENT tokens to be produced.
- colon_operator controls if : produces COLON tokens, or is treated as a bare string.
- class srctools.tokenizer.IterTokenizer(source: ~typing.Iterable[~typing.Tuple[~srctools.tokenizer.Token, str]], filename: str | _os.PathLike[str] = '', error: ~typing.Type[~srctools.tokenizer.TokenSyntaxError] = <class 'srctools.tokenizer.TokenSyntaxError'>)
Wraps a token iterator to provide the tokenizer interface.
This is useful to pre-process a token stream before parsing it with other code.
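For example, comments could be filtered out of a stream before it reaches the parser (a minimal sketch using plain generators; in real use the filtered iterable would then be wrapped in IterTokenizer so the parser still sees the tokenizer interface — strip_comments and the cut-down Token enum here are hypothetical stand-ins):

```python
from enum import Enum
from typing import Iterable, Iterator, Tuple


class Token(Enum):
    """Cut-down stand-in for srctools.tokenizer.Token."""
    STRING = 1
    NEWLINE = 2
    COMMENT = 5


def strip_comments(
    source: Iterable[Tuple[Token, str]],
) -> Iterator[Tuple[Token, str]]:
    """Drop COMMENT tokens before the destination parser sees them."""
    for tok, value in source:
        if tok is not Token.COMMENT:
            yield tok, value


raw = [
    (Token.STRING, 'key'),
    (Token.COMMENT, ' explanatory note'),
    (Token.STRING, 'value'),
    (Token.NEWLINE, '\n'),
]
assert list(strip_comments(raw)) == [
    (Token.STRING, 'key'),
    (Token.STRING, 'value'),
    (Token.NEWLINE, '\n'),
]
```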