bpo-25324: Move the description of tokenize tokens to token.rst. (#1911) · python/cpython@5cefb6c
@@ -17,7 +17,7 @@ as well, making it useful for implementing "pretty-printers," including
 colorizers for on-screen displays.

 To simplify token stream handling, all :ref:`operators` and :ref:`delimiters`
-tokens are returned using the generic :data:`token.OP` token type. The exact
+tokens are returned using the generic :data:`~token.OP` token type. The exact
 type can be determined by checking the ``exact_type`` property on the
 :term:`named tuple` returned from :func:`tokenize.tokenize`.

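A small sketch of the behaviour described in this hunk, assuming Python 3.8 or later: every operator and delimiter comes back with the generic ``OP`` type, and ``exact_type`` distinguishes them. The sample source string is illustrative only.

```python
import io
import token
import tokenize

source = b"total = price * (1 + rate)\n"

# tokenize.tokenize() expects a readline callable that returns bytes.
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

# Operators and delimiters all share the generic OP type ...
ops = [tok for tok in tokens if tok.type == token.OP]

# ... and exact_type tells them apart.
op_names = [token.tok_name[tok.exact_type] for tok in ops]
print(op_names)  # ['EQUAL', 'STAR', 'LPAR', 'PLUS', 'RPAR']
```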
@@ -44,7 +44,7 @@ The primary entry point is a :term:`generator`:

    The returned :term:`named tuple` has an additional property named
    ``exact_type`` that contains the exact operator type for
-   :data:`token.OP` tokens. For all other token types ``exact_type``
+   :data:`~token.OP` tokens. For all other token types ``exact_type``
    equals the named tuple ``type`` field.

    .. versionchanged:: 3.1
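The complementary case, sketched under the same Python 3.8+ assumption: for tokens that are not ``OP``, ``exact_type`` is just the ``type`` field. This uses ``tokenize.generate_tokens()``, the str-based counterpart of ``tokenize.tokenize()``; the sample source is made up for illustration.

```python
import io
import token
import tokenize

source = "name = 42  # answer\n"

# generate_tokens() reads str lines instead of bytes.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# For every token that is not OP, exact_type equals the type field.
non_op = [tok for tok in tokens if tok.type != token.OP]
all_equal = all(tok.exact_type == tok.type for tok in non_op)
names = [token.tok_name[tok.type] for tok in non_op]

print(all_equal)  # True
print(names)
```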
@@ -58,26 +58,7 @@ The primary entry point is a :term:`generator`:


 All constants from the :mod:`token` module are also exported from
-:mod:`tokenize`, as are three additional token type values:
-
-.. data:: COMMENT
-
-   Token value used to indicate a comment.
-
-
-.. data:: NL
-
-   Token value used to indicate a non-terminating newline. The NEWLINE token
-   indicates the end of a logical line of Python code; NL tokens are generated
-   when a logical line of code is continued over multiple physical lines.
-
-
-.. data:: ENCODING
-
-   Token value that indicates the encoding used to decode the source bytes
-   into text. The first token returned by :func:`.tokenize` will always be an
-   ENCODING token.
-
+:mod:`tokenize`.

 Another function is provided to reverse the tokenization process. This is
 useful for creating tools that tokenize a script, modify the token stream, and
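A sketch of the three token types whose descriptions this hunk removes from tokenize.rst, assuming Python 3.7 or later, where ``COMMENT``, ``NL`` and ``ENCODING`` are defined in :mod:`token` and re-exported by :mod:`tokenize`. The bracketed list spans several physical lines, so its line breaks produce ``NL`` rather than ``NEWLINE``.

```python
import io
import token
import tokenize

# Assumes Python 3.7+: the constants live in token and tokenize re-exports them.
reexported = (tokenize.COMMENT == token.COMMENT
              and tokenize.NL == token.NL
              and tokenize.ENCODING == token.ENCODING)
print(reexported)  # True

# One logical line spread over four physical lines, with a comment inside.
source = b"values = [\n    1,  # first\n    2,\n]\n"
kinds = [token.tok_name[tok.type]
         for tok in tokenize.tokenize(io.BytesIO(source).readline)]
print(kinds)
```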
@@ -96,8 +77,8 @@ write back the modified script.
    token type and token string as the spacing between tokens (column
    positions) may change.

-   It returns bytes, encoded using the ENCODING token, which is the first
-   token sequence output by :func:`.tokenize`.
+   It returns bytes, encoded using the :data:`~token.ENCODING` token, which
+   is the first token sequence output by :func:`.tokenize`.


 :func:`.tokenize` needs to detect the encoding of source files it tokenizes. The
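To illustrate the ``untokenize()`` behaviour in this hunk, here is a minimal round-trip sketch assuming Python 3.8+. It passes only ``(type, string)`` pairs, so spacing may differ from the input, but the result is bytes encoded with the value of the leading ``ENCODING`` token. The Latin-1 sample source is an assumption chosen so the encoding is visibly not UTF-8.

```python
import io
import tokenize

# Latin-1 source declared through a PEP 263 coding cookie.
source = b"# -*- coding: latin-1 -*-\nx = 'caf\xe9'\n"

tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
encoding = tokens[0].string  # the ENCODING token always comes first

# Keep only (type, string) pairs; column positions are discarded.
rebuilt = tokenize.untokenize((tok.type, tok.string) for tok in tokens)

print(encoding)                # iso-8859-1
print(type(rebuilt).__name__)  # bytes
```

Re-tokenizing ``rebuilt`` yields the same sequence of types and strings, which is the guarantee the documentation makes for this limited input.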
@@ -115,7 +96,7 @@ function it uses to do this is available:

    It detects the encoding from the presence of a UTF-8 BOM or an encoding
    cookie as specified in :pep:`263`. If both a BOM and a cookie are present,
-   but disagree, a SyntaxError will be raised. Note that if the BOM is found,
+   but disagree, a :exc:`SyntaxError` will be raised. Note that if the BOM is found,
    ``'utf-8-sig'`` will be returned as an encoding.

    If no encoding is specified, then the default of ``'utf-8'`` will be
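The encoding-detection rules in this hunk can be exercised directly with ``tokenize.detect_encoding()``. This is a sketch assuming Python 3.8+; the helper name ``encoding_of`` is hypothetical and exists only for this example.

```python
import io
import tokenize

def encoding_of(data):
    # Hypothetical helper. detect_encoding() calls readline at most twice and
    # returns (encoding, list_of_lines_read); only the encoding is kept.
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding

print(encoding_of(b"x = 1\n"))                     # utf-8
print(encoding_of(b"\xef\xbb\xbfx = 1\n"))         # utf-8-sig
print(encoding_of(b"# coding: latin-1\nx = 1\n"))  # iso-8859-1

# A UTF-8 BOM combined with a non-UTF-8 cookie is a conflict.
conflict = None
try:
    encoding_of(b"\xef\xbb\xbf# coding: latin-1\n")
except SyntaxError as exc:
    conflict = exc
print(type(conflict).__name__)  # SyntaxError
```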
@@ -147,8 +128,8 @@ function it uses to do this is available:
       3

 Note that unclosed single-quoted strings do not cause an error to be
-raised. They are tokenized as ``ERRORTOKEN``, followed by the tokenization of
-their contents.
+raised. They are tokenized as :data:`~token.ERRORTOKEN`, followed by the
+tokenization of their contents.


 .. _tokenize-cli:
@@ -260,7 +241,7 @@ the name of the token, and the final column is the value of the token (if any)
     4,11-4,12: NEWLINE '\n'
     5,0-5,0: ENDMARKER ''

-The exact token type names can be displayed using the ``-e`` option:
+The exact token type names can be displayed using the :option:`-e` option:

 .. code-block:: sh

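A sketch comparing the command-line output with and without ``-e``, driven from Python so it is self-contained. It assumes Python 3.8+ and relies on the CLI reading from standard input when no filename is given. The helper name ``run_cli`` is hypothetical.

```python
import subprocess
import sys

source = "x = 1 + 2\n"

def run_cli(*flags):
    # Hypothetical helper: run "python -m tokenize" on source fed via stdin.
    result = subprocess.run(
        [sys.executable, "-m", "tokenize", *flags],
        input=source, capture_output=True, text=True, check=True,
    )
    return result.stdout

generic = run_cli()    # operators are reported with the generic OP name
exact = run_cli("-e")  # operators are reported as EQUAL, PLUS, ...
print(exact)
```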