bpo-25324: Move the description of tokenize tokens to token.rst. (#1911) (python/cpython@5cefb6c)

@@ -17,7 +17,7 @@ as well, making it useful for implementing "pretty-printers," including
 colorizers for on-screen displays.
 
 To simplify token stream handling, all :ref:`operators` and :ref:`delimiters`
-tokens are returned using the generic :data:`token.OP` token type.  The exact
+tokens are returned using the generic :data:`~token.OP` token type.  The exact
 type can be determined by checking the ``exact_type`` property on the
 :term:`named tuple` returned from :func:`tokenize.tokenize`.
 

@@ -44,7 +44,7 @@ The primary entry point is a :term:`generator`:
 
    The returned :term:`named tuple` has an additional property named
    ``exact_type`` that contains the exact operator type for
-   :data:`token.OP` tokens.  For all other token types ``exact_type``
+   :data:`~token.OP` tokens.  For all other token types ``exact_type``
    equals the named tuple ``type`` field.
 
    .. versionchanged:: 3.1
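The ``exact_type`` behaviour these two hunks document can be checked outside the patch with a short, illustrative session (the source string here is arbitrary):

```python
import io
import token
import tokenize

# Operators come back with the generic OP type; exact_type narrows it down.
source = b"x = 1 + 2\n"
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

# The '+' token is reported as OP, while exact_type reveals PLUS.
plus = next(t for t in tokens if t.string == "+")
print(token.tok_name[plus.type])        # OP
print(token.tok_name[plus.exact_type])  # PLUS
```

For non-operator tokens (names, numbers, newlines), ``exact_type`` simply equals ``type``.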

@@ -58,26 +58,7 @@ The primary entry point is a :term:`generator`:
 
 
 All constants from the :mod:`token` module are also exported from
-:mod:`tokenize`, as are three additional token type values:
-
-.. data:: COMMENT
-
-   Token value used to indicate a comment.
-
-
-.. data:: NL
-
-   Token value used to indicate a non-terminating newline.  The NEWLINE token
-   indicates the end of a logical line of Python code; NL tokens are generated
-   when a logical line of code is continued over multiple physical lines.
-
-
-.. data:: ENCODING
-
-   Token value that indicates the encoding used to decode the source bytes
-   into text. The first token returned by :func:`.tokenize` will always be an
-   ENCODING token.
-
+:mod:`tokenize`.
 
 Another function is provided to reverse the tokenization process. This is
 useful for creating tools that tokenize a script, modify the token stream, and
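This hunk drops the local descriptions of ``COMMENT``, ``NL`` and ``ENCODING`` because the same change defines them in the :mod:`token` module itself; on interpreters that include it (Python 3.7 and later), the constants are identical in both namespaces, as this sketch shows:

```python
import token
import tokenize

# All token module constants are re-exported by tokenize, so either
# namespace works when comparing token types.
print(tokenize.NAME == token.NAME)        # True

# COMMENT, NL and ENCODING now live in the token module too and are
# simply re-exported here.
print(tokenize.COMMENT == token.COMMENT)  # True
print(tokenize.NL == token.NL)            # True
print(tokenize.ENCODING == token.ENCODING)  # True
```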

@@ -96,8 +77,8 @@ write back the modified script.
    token type and token string as the spacing between tokens (column
    positions) may change.
 
-   It returns bytes, encoded using the ENCODING token, which is the first
-   token sequence output by :func:`.tokenize`.
+   It returns bytes, encoded using the :data:`~token.ENCODING` token, which
+   is the first token sequence output by :func:`.tokenize`.
 
 
 :func:`.tokenize` needs to detect the encoding of source files it tokenizes. The
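The bytes-returning behaviour of :func:`tokenize.untokenize` referenced here can be sketched with a trivial round trip (full 5-tuples are passed, so spacing is preserved exactly):

```python
import io
import tokenize

source = b"spam = (1, 2)\n"
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

# The first token is ENCODING, and untokenize() uses it to encode its
# result, so the round trip yields bytes rather than str.
result = tokenize.untokenize(tokens)
print(type(result))      # <class 'bytes'>
print(result == source)  # True
```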

@@ -115,7 +96,7 @@ function it uses to do this is available:
 
    It detects the encoding from the presence of a UTF-8 BOM or an encoding
    cookie as specified in :pep:`263`.  If both a BOM and a cookie are present,
-   but disagree, a SyntaxError will be raised.  Note that if the BOM is found,
+   but disagree, a :exc:`SyntaxError` will be raised.  Note that if the BOM is found,
    ``'utf-8-sig'`` will be returned as an encoding.
 
    If no encoding is specified, then the default of ``'utf-8'`` will be

@@ -147,8 +128,8 @@ function it uses to do this is available:
    3
 
 Note that unclosed single-quoted strings do not cause an error to be
-raised. They are tokenized as ``ERRORTOKEN``, followed by the tokenization of
-their contents.
+raised. They are tokenized as :data:`~token.ERRORTOKEN`, followed by the
+tokenization of their contents.
 
 
 .. _tokenize-cli:

@@ -260,7 +241,7 @@ the name of the token, and the final column is the value of the token (if any)
     4,11-4,12:          NEWLINE        '\n'
     5,0-5,0:            ENDMARKER      ''
 
-The exact token type names can be displayed using the ``-e`` option:
+The exact token type names can be displayed using the :option:`-e` option:
 
 .. code-block:: sh
 
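The ``-e`` behaviour touched by the last hunk can be tried with a throwaway file (the filename here is arbitrary):

```shell
# Write a tiny script and tokenize it with exact token names.
echo "x = 1 + 2" > /tmp/tokenize_demo.py
python3 -m tokenize -e /tmp/tokenize_demo.py
# With -e, the '+' is reported as PLUS rather than the generic OP.
```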