bpo-33731: Implement support for locale specific format (WIP) by james-emerton · Pull Request #8612 · python/cpython (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation15 Commits4 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

Adds support for 'l' and 'L' which will format a string as per 'f'
except that they will use locale specific grouping, separators, and
decimal point. In the case of 'L' the LC_MONETARY values will be used.

https://bugs.python.org/issue33731

Adds support for 'l' and 'L' which will format a string as per 'f' except that they will use locale specific grouping, separators, and decimal point. In the case of 'L' the LC_MONETARY values will be used.

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

When your account is ready, please add a comment in this pull request
and a Python core developer will remove the CLA not signed label
to make the bot check again.

You can check yourself
to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

Compare the output of the 'l' and 'L' formats against the output of locale.format_string(). The locale data seems to differ between platforms, so testing against literals is challenging. (Other format tests are explicitly providing locale data, but that approach would fail to properly test my modifications.)

Also added a test against literals for the en_US locale in the hopes that it's consistent across platforms.

I think I need some guidance on testing this correctly. I see that the existing format tests for the Decimal type are passing locale data explicitly via an undocumented parameter. Unfortunately, using this approach doesn't really test my changes, which were actually to libmpdec.

I've run tests locally on both MacOS and Windows, and they pass on my local machine, but the Windows CI build is failing. I haven't yet tested a Linux build locally, for which I'll need to spin up another VM.

Finally, since this requires changes to libmpdec, is it acceptable to commit those changes here or do they first need to be incorporated into the upstream repository?

_decimal has the undocumented parameter, see test_n_format.

If I approve the changes, the patch can include libmpdec. It seems that there's still some discussion in the issue though. An immediate observation is that I'd prefer 'm' for monetary -- There is at least one open issue for the 'm' parameter, too.

I can certainly use the aforementioned parameter, but I feel that doing so doesn't really exercise the changes. This changes mpd_parse_format_str to treat the format as 'f' with the addition of locale data. Using the parameter to provide the locale data would test that we switch the format to 'f' but not that the correct locale data is being loaded. Maybe that's okay/the best we can do in this case?

The use of 'l' and 'L' was the suggestion of Eric Smith in his comment on bpo-34311. The intention being 'l' for locale and 'L' for monetary locale, as this format would not be exclusively for the monetary context. At any rate, I'm certainly open to changing the letters being used.

I don't feel strongly about l and L. m certainly works for me, too. Has any other language broken ground on this? Can we follow their example?

I did a bit more looking around, and the Single UNIX Specification for printf provides a modifier that performs grouping. From http://man7.org/linux/man-pages/man3/printf.3.html

 '      For decimal conversion (i, d, u, f, F, g, G) the output is to
        be grouped with thousands' grouping characters if the locale
        information indicates any.  (See setlocale(3).)  Note that
        many versions of gcc(1) cannot parse this option and will
        issue a warning.  (SUSv2 did not include %'F, but SUSv3 added
        it.)

Thus, n should be equivalent to 'g and we should also support the modifier for f, o, x, d, and their uppercase equivalents. I think this is a better approach than introducing another type.

It appears that the C99 implementation provides no mechanism to use the values from LC_MONETARY in place of LC_NUMERIC. This distinction seems exceedingly rare in practice, and I'd personally be okay with leaving it out.

Currently we're using uppercase to mean "print an uppercase exponent". So:

'n' => regular_locale + 'g'
'N' => regular_locale + 'G'

'l' => regular_locale + 'f'
'L' => monetary_locale + 'f'?

Here I'd expect 'F', even though 'F' doesn't really do anything:

'L' => regular_locale + 'F'

libmpdec actually makes use of the regular convention:

if (isupper((uchar)type)) {
    type = tolower((uchar)type);
    flags |= MPD_FMT_UPPER;
}

Perhaps we can use a modifier like $n, $l for use monetary?

Upon further discussion and research it appears that we should be supporting grouping via the ' modifier as per C99

This implements the ' modifier in place of the thousands separator to enable locale specific grouping and decimal point.

I've now implemented this (for just Decimal so far) by accepting ' in place of the , or _ characters. (I noticed while was in here that Decimal doesn't currently support _)

I also see that those other two modifiers are documented in PEP 378 and PEP 515. Should I be writing this up as a PEP as well?

Interesting. Without knowing about this PR, I whipped up something similar (#11405). Clearly there is a demand for this feature.

@james-emerton Perhaps a mini-PEP that briefly lists the motivation and syntax alternatives (I still like $n) is a good way forward. Discussion is currently scattered among two GitHub issues and two bugs.python.org issues.

This is just my opinion, @ericvsmith is the format-language expert.

I've sent an e-mail to python-ideas, however, I can't see it in the archive yet.

Last I was working on this (months ago!) I started but never finished drafting a PEP. I've finished that off and added a bit about the alternative suggestions I've seen.

python/peps#886