BUG: Fixed Unicode decoding error in Period.strftime when a locale-specific directive is used by smarie · Pull Request #46405 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation104 Commits60 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

smarie

@smarie smarie changed the titleFixed Unicode decoding error in Period.strftime and PeriodIndex.strftime when a locale-specific directive is used Fixed Unicode decoding error in Period.strftime when a locale-specific directive is used

Mar 17, 2022

@smarie

I have an issue: I can reproduce the bug locally but not on CI.
It seems that there is no CI runner as of today that uses non-utf8 encoded locale.

The issue can only be reproduced when the current locale encoding is not utf8:

image

(zh_CN defaults encoding is gb2312)

What is the recommended way to proceed ?

I'll try the second option for now (when github issues are fixed, it seems down for now)

@smarie

gha runner was not triggered apparently :( re-trying

@smarie

Unnfortunately it does not work :(

/usr/bin/bash: warning: setlocale: LC_ALL: cannot change locale (zh_CN.GB2312)

EDIT: retrying with zh_CN.gb2312

@smarie

ok with lower case zh_CN.gb2312 it is worse: the runner does not even start :(
Can anyone help me to define a runner configured with this locale+encoding ? (@jreback maybe ?) Otherwise I am not able to reproduce the issue on CI, which is a bit problematic to prevent regression once I push the fix.

jreback

Sylvain MARIE added 5 commits

March 18, 2022 15:45

…ure/46319_locale_strftime_period

� Conflicts: � pandas/tests/io/formats/test_format.py

@smarie

@smarie

Sylvain MARIE added 7 commits

March 19, 2022 11:04

@smarie

smarie

smarie

params=[
pytest.param(None, id=str(locale.getlocale())),
"it_IT.utf8",
"zh_CN",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

"zh_CN",
"zh_CN", # Note: encoding is automatically 'gb2312'

What do you think ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to add zh_CH.utf8 and zh_CH.gb2312 explicitly

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll add zh_CH.utf8.
Strangely enough, zh_CN.gb2312 did not seem to be working, but zh_CN does involve gb2312 .

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will also add it_IT by the way, so that both locales have their utf8 AND non-utf8 encoding activated in the test

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

mroeschke

mroeschke

- name: Generate extra locales
# These extra locales will be available for locale.setlocale() calls in tests
run: |
sudo locale-gen ${{ env.EXTRA_LOC }}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean the Locale: it_IT.utf8 isnt working either?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends what you mean by "working". Do you refer to the bug #46319 or to something more general ?

Does that answer your question ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was assumed setting the environment variables

      LANG: ${{ matrix.lang || '' }}
      LC_ALL: ${{ matrix.lc_all || '' }}

would ensure the locale for the test run would not be the default (en_US.UTF-8) for the other locale builds.

So IIUC, this is necessary if setting a non-utf8 locale?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this is what I thought too.
Unfortunately when I tried setting LANG and LC_ALL to zh_CN (non-utf8), then there was a strange segmentation fault appearing (see #46405 (comment))

So this is why I now do not set it through these env variables but I only install it.

Maybe the problem is that when the env variables are set, many side effects happen in the build chain, that I do not quite understand.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're setting the locale this way, can we remove setting the LANG and LC_ALL environment variables here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I have no idea :) At least for the tests in this PR, yes these env variables are not used.

However these env vars are probably related to other locale-related tests, including being able to compile pandas on alternate locale, etc. It seems that in the CI those env variables are used to even add a specific line at the beginning of the init file of pandas ! (see ci/setup_env.sh)

This seems a bit beyond the scope of this PR. However it is maybe worth keeping track of this, in a dedicated ticket.

Sylvain MARIE added 6 commits

September 3, 2022 14:39

…ure/46319_locale_strftime_period

� Conflicts: � .github/workflows/ubuntu.yml

@smarie

@mroeschke @jreback @jbrockmendel all set now. Sorry for the delay.
Note that the only failing CI build seems completely unrelated (the issue is about a TLS certificate bundle)

Maybe still on time for 1.5 milestone ?

mroeschke

@mroeschke

Maybe still on time for 1.5 milestone ?

Sorry, since we released the rc already, we are only backporting fixes that are uncovered in the 1.5.0rc. Could you move the whatsnew note to 1.6.0.rst?

Sylvain MARIE added 2 commits

September 6, 2022 23:03

@pep8speaks

smarie

@smarie

@smarie

@smarie

mroeschke

@mroeschke

Thanks @smarie for your patience and persistence! Very nice fix

@smarie

@smarie smarie deleted the feature/46319_locale_strftime_period branch

September 8, 2022 20:23

noatamir pushed a commit to noatamir/pandas that referenced this pull request

Nov 9, 2022

@smarie @noatamir

…specific directive is used (pandas-dev#46405)

Co-authored-by: Sylvain MARIE sylvain.marie@se.com