[Python-3000] Lines breaking

Stephen J. Turnbull stephen at xemacs.org
Tue May 29 10:17:20 CEST 2007


"Martin v. Löwis" writes:

Alexandre Vassalotti writes:

The change would extend the line breaking behavior to three other ASCII characters:

    NEL "Next Line"     85
    VT  "Vertical Tab"  0B
    FF  "Form Feed"     0C

Of course, it is not really necessary to change, but I think full conformance to the standard [1] could give Python better support of multilingual texts. However, full conformance would require a good amount of work.

I don't understand why full conformance would require much work, at least not for the language itself. Unicode does not propose to place requirements on the syntax of Python, including the repertoire of characters allowed, only that where a character does occur, it must have the semantics defined in UAX#14. (Of course, text processing modules in the stdlib will have some work to do!)

I see no reason in UAX#14 that the Python grammar cannot ignore or prohibit VT and NEL (see below), prohibit use of LINE SEPARATOR and PARAGRAPH SEPARATOR, and restrict FORM FEED to occur immediately after a line break. (All outside of strings, of course, where there would be no restriction. Restrictions must apply to comment content, however.) Note that given Python's semantics for lines, the algorithm in Unicode (v4.1, Section 5.8, R1) for remapping to unambiguous use of LS and PS is well-defined and will leave zero residual ambiguity in a legal Python program (and no instances of PS).
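To make the restrictions above concrete, here is a rough sketch of a checker. It is only an illustration under loud assumptions: the name `check_source` and the messages are made up, VT and NEL are treated as prohibited (rather than ignored), an FF at the very start of the file is accepted, and the exemption for string literals is NOT implemented, so it looks at the raw text only.

```python
# Characters the proposed grammar would reject outside of strings.
# Assumption: VT and NEL are prohibited rather than silently ignored.
PROHIBITED = {"\x0b": "VT", "\x85": "NEL", "\u2028": "LS", "\u2029": "PS"}


def check_source(source):
    """Return (offset, message) pairs for characters that break the rules.

    Simplification: string literals are not exempted here, although the
    proposal would place no restriction on their contents.
    """
    problems = []
    for i, ch in enumerate(source):
        if ch in PROHIBITED:
            problems.append((i, PROHIBITED[ch] + " not allowed"))
        elif ch == "\x0c" and i > 0 and source[i - 1] not in "\r\n":
            # FF is acceptable only immediately after a line break
            # (or, by assumption, at the very start of the file).
            problems.append((i, "FF must follow a line break"))
    return problems
```

A real implementation would live in the tokenizer, where string contents are already known, rather than scanning the text separately.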

With the provisions above, you'll get the same display of a legal Python program as ever when you switch to a UAX#14-conforming text editor, except that it may provide a more friendly display for strings containing very long lines. People who wish to edit Python programs in Microsoft Word should preprocess with the R1 algorithm.
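A rough sketch of that preprocessing, assuming every newline function in a Python source file (CR LF, CR, LF, or NEL) is meant as a line break and never as a paragraph break. The name `nlf_to_ls` and the regular expression are mine, not taken from the Unicode text:

```python
import re

LS = "\u2028"  # LINE SEPARATOR

# CR LF is listed first so the pair is treated as one newline function.
_NLF = re.compile("\r\n|\r|\n|\x85")


def nlf_to_ls(source):
    """Replace each newline function with a single LINE SEPARATOR."""
    return _NLF.sub(LS, source)
```

Since Python lines carry no paragraph semantics, nothing is ever mapped to PS in this sketch.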

Can you please point to the chapter and verse where it says that VT must be considered? I only found mention of FF, in R4.

In UAX#14, revision 19, in the descriptions of classes it says:

BK: Mandatory Break (A) (Non-tailorable)

Explicit breaks act independently of the surrounding characters. No characters can be added to the BK class as part of tailoring, but implementations are not required to support the VT character.

000C FORM FEED (FF)
000B LINE TABULATION (VT)

FORM FEED separates pages. The text on the new page starts at the beginning of the line. No paragraph formatting is applied.

2028 LINE SEPARATOR (LS)

The text after the Line Separator starts at the beginning of the line. No paragraph formatting is applied. This is similar to HTML <BR>.

2029 PARAGRAPH SEPARATOR (PS)

The text of the new paragraph starts at the beginning of the line. Paragraph formatting is applied.

Newline Function (NLF)

Newline Functions are defined in the Unicode Standard as providing additional explicit breaks. They are not individual characters, but are encoded as sequences of the control characters NEL, LF, and CR.

In the descriptions of the singleton classes LF, CR, and NL (containing NEL), it is indicated that supporting LF and CR is mandatory; AFAICT the rules are the ones used by Python's universal newline feature. NL need not be supported:

NL: Next Line (A) (Non-tailorable)

0085 NEXT LINE (NEL)

The NL class acts like BK in all respects (there is a mandatory break after any NEL character). It cannot be tailored, but implementations are not required to support the NEL character; see the discussion under BK.
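As an illustration of how the classes quoted above fit together, here is a minimal sketch of a splitter that breaks after every mandatory break and makes VT and NEL support optional, as the descriptions of BK and NL permit. The function name and its two flags are hypothetical, and the sketch covers only the mandatory-break classes, not the rest of the UAX#14 algorithm.

```python
import re


def split_mandatory(text, support_vt=False, support_nel=False):
    """Split text after each mandatory break, keeping the break characters."""
    singles = "\n\r\x0c\u2028\u2029"  # LF, CR, FF, LS, PS
    if support_vt:
        singles += "\x0b"  # LINE TABULATION (VT), optional per BK
    if support_nel:
        singles += "\x85"  # NEXT LINE (NEL), optional per NL
    # CR LF is matched first so the pair counts as a single break.
    pattern = re.compile("\r\n|[" + singles + "]")
    lines, start = [], 0
    for match in pattern.finditer(text):
        lines.append(text[start:match.end()])
        start = match.end()
    if start < len(text):
        lines.append(text[start:])  # trailing text without a break
    return lines
```

With both flags off, `split_mandatory("a\r\nb\x0bc\x85d")` yields two pieces, breaking only after the CR LF; with both flags on it yields four.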
