Issue 30608: argparse calculates string widths incorrectly

I don't really understand your example code. What result did you expect? The output shown on GitHub seems correct to me:

optional arguments:
  -h, --help            show this help message and exit
  --language1 XXXXXXXXXX
                        Lanugage for output
  --language2 LANGUAGE  Lanugage for output

I've substituted "X" for the "missing characters" that show up, as I don't have a Tibetan font installed.

This is a more complicated "bug" (feature?) than it might seem, and I don't think it is really an argparse issue so much as a string issue. The length of a string is the number of code points in it, without trying to distinguish zero-width code points and combining characters from the rest.
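To make that concrete, here is a minimal sketch using only the standard `unicodedata` module. The sample string is my own choice (a Latin letter plus a combining accent), not the Tibetan text from the report:

```python
import unicodedata

text = "e\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# len() counts code points, so the combining accent counts as one.
code_point_count = len(text)

# General category "Mn" (nonspacing mark) identifies the combining accent.
categories = [unicodedata.category(ch) for ch in text]

print(code_point_count)  # 2
print(categories)        # ['Ll', 'Mn']
```

A renderer that handles combining marks draws this as a single accented letter, yet Python reports a length of 2.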

I don't believe that argparse has any way of knowing how the string will be displayed. It could be displayed as:

a series of 10 "missing character" square glyphs;
the correct glyphs, but still 10 columns wide (if the font has glyphs for Tibetan, but does not render the vowel markers correctly);
or it might render the text properly, according to the rules for Tibetan, requiring fewer than 10 (I guess) columns.

I believe that, unfortunately, the only way that those three scenarios can be distinguished would be to print the text to a GUI framework with a rich text widget capable of measuring the width of text in pixels.

Because argparse works in a console, it is limited to the typefaces the console supports and cannot get the pixel width. I think the only safe way to proceed is to count code points (i.e. the length as reported by Python strings) and assume each code point requires one column. That way you can be reasonably confident that the string won't be any more than that number of columns wide.

(Even that might be wrong if the string includes fullwidth East Asian characters, which may take two columns each.)
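As a rough illustration of that caveat, `unicodedata.east_asian_width()` reports the East Asian Width property of a character. The sample characters below are my own picks, and how many columns a given terminal actually uses for them is still up to the terminal:

```python
import unicodedata

# Hypothetical sample characters, written as escapes to avoid font issues.
samples = {
    "A": "A",                    # ordinary Latin letter
    "fullwidth A": "\uff21",     # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A
    "CJK ideograph": "\u4e2d",   # U+4E2D
}

# 'Na' = narrow, 'F' = fullwidth, 'W' = wide.
widths = {name: unicodedata.east_asian_width(ch) for name, ch in samples.items()}

print(widths)  # {'A': 'Na', 'fullwidth A': 'F', 'CJK ideograph': 'W'}
```

Characters classed as 'F' or 'W' are the ones that typically occupy two columns, so one column per code point can undercount as well as overcount.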

I don't think there is any good solution here, but I think the status quo might be the least worst. If argparse assumes that the vowel markers are zero-width, it will format the output correctly

optional arguments:
  -h, --help            show this help message and exit
  --language1 XXXXXXX   Lanugage for output
  --language2 LANGUAGE  Lanugage for output

but only for those who have the correct Tibetan typeface installed. Everyone else will see:

optional arguments:
  -h, --help            show this help message and exit
  --language1 XXXXXXXXXX   Lanugage for output
  --language2 LANGUAGE  Lanugage for output

(By the way, I'm guessing what the output might be -- I don't know Tibetan and don't know how many columns the correctly displayed string will take.)
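For what it's worth, here is a sketch of what "assume the marks are zero-width" could look like. This is not what argparse does. `estimated_columns` is a hypothetical helper of my own, and its rules (zero columns for nonspacing and enclosing marks, two for wide and fullwidth characters, one otherwise) are assumptions about the terminal, which is exactly the problem:

```python
import unicodedata


def estimated_columns(text: str) -> int:
    """Guess the terminal column count of text (hypothetical helper)."""
    columns = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            # Nonspacing and enclosing marks: assume zero width.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide and fullwidth characters: assume two columns.
            columns += 2
        else:
            columns += 1
    return columns


decomposed = "a\u0300e\u0301"  # two letters, each followed by a combining accent
wide = "\u4e2d\u6587"          # two CJK ideographs

print(len(decomposed), estimated_columns(decomposed))  # 4 2
print(len(wide), estimated_columns(wide))              # 2 4
```

On a terminal that composes the marks, the estimate of 2 is right for the first string. On one that shows each code point as its own glyph, `len()` is the better answer, and the program has no way of knowing which case it is in.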

Vanessa, if my analysis is wrong in any way, or if you can think of a patch to argparse that will solve this issue, please tell us.

Otherwise, I think this has to be treated as "won't fix".

By the way, perhaps a simpler demonstration, which is more likely to render correctly on most people's systems, would be to use accented Latin-1 characters decomposed into combining characters:

py> import unicodedata
py> s1 = 'àéîõü'
py> s2 = unicodedata.normalize('NFD', s1)  # decompose into combining chars
py> s1, s2
('àéîõü', 'àéîõü')
py> assert len(s1) == 5 and len(s2) == 10
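As a follow-up to that session, composing the string back with NFC restores the shorter length. This is only a sketch of a partial workaround, written with escapes so it does not depend on how the characters are stored in the file. It helps only where precomposed code points exist, so I would not expect it to fix the general case:

```python
import unicodedata

s1 = "\u00e0\u00e9\u00ee\u00f5\u00fc"   # the five precomposed letters
s2 = unicodedata.normalize("NFD", s1)   # base letters plus combining marks
s3 = unicodedata.normalize("NFC", s2)   # composed back again

print(len(s1), len(s2), len(s3))  # 5 10 5
print(s3 == s1)                   # True
```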