msg337476
Author: STINNER Victor (vstinner)
Date: 2019-03-08 13:58
When a translation .po file contains a comment in its headers, msgfmt keeps it when compiling to .mo. Example with test.po:
---
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
"#-#-#-#-# plo.po (PACKAGE VERSION) #-#-#-#-#\n"
---

Compile it with "msgfmt", then parse the output file messages.mo using the test.py script:
---
import gettext
import pprint

with open("messages.mo", "rb") as fp:
    t = gettext.GNUTranslations()
    t._parse(fp)
pprint.pprint(t._info)
---

Output on Python 3.7.2:
---
{'content-type': 'text/plain; charset=UTF-8',
 'plural-forms': 'nplurals=2; plural=(n != 1);\n'
                 '#-#-#-#-# plo.po (PACKAGE VERSION) #-#-#-#-#'}
---

Output on Fedora's Python 2.7.15, which carries a downstream fix:
---
{'content-type': 'text/plain; charset=UTF-8',
 'plural-forms': 'nplurals=2; plural=(n != 1);'}
---

I'm not sure that keeping the comment as part of the plural forms is correct. Shouldn't comments be ignored?

I ran my test on Fedora 29: msgfmt 0.19.8.1, Python 3.7.2.

Links:
* https://bugs.python.org/issue1448060#msg27754
* https://bugs.python.org/issue1475523
* https://bugzilla.redhat.com/show_bug.cgi?id=252136

Fedora has carried a patch to ignore comments since 2007:
https://src.fedoraproject.org/rpms/python2/blob/master/f/python-2.5.1-plural-fix.patch

I can easily convert the patch into a PR, maybe with a test. The real question is whether the fix is correct.
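The 3.7.2 output is consistent with a simple rule: a header line containing ':' starts a new key, and any other non-empty line is appended to the previous key. The sketch below models that rule; `parse_header` and its `is_comment` hook are hypothetical names, not gettext's actual code. Passing a "starts with '#'" predicate approximates the idea of ignoring comments, without claiming to reproduce the Fedora patch line for line.

```python
def parse_header(text, is_comment=None):
    """Simplified model of the header parsing that produced the 3.7.2 output.

    A line containing ':' starts a new key; any other non-empty line is
    appended to the previous key. is_comment is a hypothetical hook: lines
    for which it returns True are dropped before that rule is applied.
    """
    info = {}
    lastk = None
    for line in text.split("\n"):
        item = line.strip()
        if not item:
            continue
        if is_comment is not None and is_comment(item):
            continue
        if ":" in item:
            key, value = item.split(":", 1)
            lastk = key.strip().lower()
            info[lastk] = value.strip()
        elif lastk:
            # Continuation line: glued onto the previous key's value.
            info[lastk] += "\n" + item
    return info


HEADER = (
    "Content-Type: text/plain; charset=UTF-8\n"
    "Plural-Forms: nplurals=2; plural=(n != 1);\n"
    "#-#-#-#-# plo.po (PACKAGE VERSION) #-#-#-#-#\n"
)

# Without a comment filter, the marker line ends up inside plural-forms.
print(repr(parse_header(HEADER)["plural-forms"]))
# 'nplurals=2; plural=(n != 1);\n#-#-#-#-# plo.po (PACKAGE VERSION) #-#-#-#-#'

# Dropping every line that starts with '#' gives the clean value.
print(repr(parse_header(HEADER, lambda item: item.startswith("#"))["plural-forms"]))
# 'nplurals=2; plural=(n != 1);'
```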
|
|
msg337477
Author: STINNER Victor (vstinner)
Date: 2019-03-08 13:59
Attached files:
* comments.po: PO file with a comment in headers
* messages.mo: comments.po compiled with msgfmt
* parse.py: Python script to parse messages.mo
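For readers without msgfmt at hand, an equivalent messages.mo can be built in memory and parsed with the standard gettext module. `build_mo` is a hypothetical helper written for this illustration (little-endian, no hash table); it is not part of gettext. The header text is the one from the first message. On Python 3.7 the marker line should still show up inside 'plural-forms'; on versions that include the fix from this issue it should not.

```python
import gettext
import io
import struct
import sys


def build_mo(catalog):
    """Hypothetical helper: serialize {msgid: msgstr} into little-endian .mo bytes.

    Layout: a 7-word header, the table of original strings, the table of
    translations, then the NUL-terminated string data. No hash table is written.
    """
    keys = sorted(catalog)  # sorted msgids, so the "" header entry comes first
    ids = [key.encode("utf-8") for key in keys]
    strs = [catalog[key].encode("utf-8") for key in keys]
    count = len(keys)
    orig_off = 7 * 4                  # original-strings table follows the header
    trans_off = orig_off + 8 * count  # each table entry is (length, offset)
    data_off = trans_off + 8 * count  # string data follows both tables

    data = b""
    orig_table = b""
    trans_table = b""
    for s in ids:
        orig_table += struct.pack("<2I", len(s), data_off + len(data))
        data += s + b"\0"
    for s in strs:
        trans_table += struct.pack("<2I", len(s), data_off + len(data))
        data += s + b"\0"

    # magic, revision, count, table offsets, hash table size (0) and offset
    header = struct.pack("<7I", 0x950412DE, 0, count, orig_off, trans_off, 0, data_off)
    return header + orig_table + trans_table + data


catalog = {
    "": "Content-Type: text/plain; charset=UTF-8\n"
        "Plural-Forms: nplurals=2; plural=(n != 1);\n"
        "#-#-#-#-# plo.po (PACKAGE VERSION) #-#-#-#-#\n",
}
t = gettext.GNUTranslations(io.BytesIO(build_mo(catalog)))
print(sys.version_info[:3], repr(t.info()["plural-forms"]))
```

`info()` is the public accessor for the same dictionary that the script above reads through `_info`.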
|
|
msg337486
Author: Julien Palard (mdk)
Date: 2019-03-08 14:43
After some research, I found a few references to comments being marked as starting with #-#-#-#-# and ending with #-#-#-#-#, not just starting with #. For example, in the gettext-0.19.8.1 sources:

$ grep -r '#-#-#-#-' | head
gettext-tools/misc/po-mode.el:#-#-#-#-# file name reference #-#-#-#-#
gettext-tools/misc/po-mode.el: (let* ((marker-regex "^#-#-#-#-# \\(.*\\) #-#-#-#-#\n")
gettext-tools/src/msgl-cat.c: char *id = xasprintf ("#-#-#-#-# %s #-#-#-#-#",

More precisely, in `gettext-tools/tests/msgcat-10`:

# Verify msgcat of two files, when the header entries have different comments
# but the same contents. The resulting header entry is not marked fuzzy,
# because the #-#-#-#-# are only in comments and do not necessarily require
# translator attention; in other words, an msgstr which is valid in both input
# files is also valid in the result.

However, I'm surprised not to find more occurrences of "#-#-#-#-#" in the source code, as if they were just looking for a single #, like you do here.

I'm not sure which is better, eliminating lines enclosed in a pair of #-#-#-#-# or lines starting with a #; both look OK to me (we're only talking about the header here, not the msgstr, so it won't have much impact). Personally, I'd go for eliminating #-#-#-#-# lines, as this is the only case we've seen, and it is the "documented" one in the GNU gettext test cases.
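The two candidate rules can be compared on a few sample header lines (the Last-Translator value is shortened). Both function names are hypothetical. The rules agree on msgcat markers and on ordinary "Key: value" lines, including one that merely contains a '#' after the key; they only differ on a line such as "# Pouette", which the first rule drops and the second keeps.

```python
MARKER = "#-#-#-#-#"


def starts_with_hash(item):
    """Rule 1: any (stripped) header line that begins with '#' is a comment."""
    return item.startswith("#")


def is_msgcat_marker(item):
    """Rule 2: only lines that both start and end with #-#-#-#-# are comments."""
    return item.startswith(MARKER) and item.endswith(MARKER)


samples = [
    "#-#-#-#-# plo.po (PACKAGE VERSION) #-#-#-#-#",
    "# Pouette",
    "Last-Translator: # ANI PETER <peter.ani@gmail.com>",
    "Plural-Forms: nplurals=2; plural=(n != 1);",
]
for item in samples:
    print(f"rule1={starts_with_hash(item)!s:5} rule2={is_msgcat_marker(item)!s:5} {item}")
```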
|
|
msg337490
Author: STINNER Victor (vstinner)
Date: 2019-03-08 15:20
I found a .po file with "#" in its headers on the Internet, from the Sympa mailing list project:
https://www.sympa.org/distribution/sympa-6.0.10/po-wwsympa/et.po

# #-#-#-#-# blank_web_help_et.po (sympa) #-#-#-#-#
# Sympa online help internationalisation.
# Copyright (C) 2007
# This file is distributed under the same license as Sympa.
# FIRST AUTHOR <david.verdin@cru.fr>, 2007.
#
# #-#-#-#-# tmp_web_help_et.po (et) #-#-#-#-#
# translation of et.po to
# translation of et.po to
# #-#-#-#-# et.po (PACKAGE VERSION) #-#-#-#-#
# Copyright (C) 2005 Free Software Foundation, Inc.
# #-#-#-#-# et.po (PACKAGE VERSION) #-#-#-#-#
# #-#-#-#-# et.po (PACKAGE VERSION) #-#-#-#-#
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR , YEAR.
# Copyright (C) YEAR Free Software Foundation, Inc.
# FIRST AUTHOR , YEAR.#.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER.
# root <root@vykk.vil.ee>, 2005.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: et\n"
"POT-Creation-Date: 2007-11-13 14:50+0200\n"
"PO-Revision-Date: 2007-10-22 00:03+0200\n"
"Last-Translator: Alar Sing <alar.sing@etv.ee>\n"
"Language-Team: Estonian\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"#-#-#-#-# blank_web_help_et.po (sympa) #-#-#-#-#\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
"#-#-#-#-# tmp_web_help_et.po (et) #-#-#-#-#\n"
"X-Generator: Pootle 1.0.2\n"

There are 2 header lines starting with >"#-#-#-#-# < and ending with > #-#-#-#-#\n"<.
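Each marker carries the name of a file that msgcat merged, as the po-mode.el regex quoted earlier suggests. Below, a Python transcription of that regex (an assumption: it is not taken from gettext.py or from gettext-tools) lists the markers in the decoded Sympa header, abridged here, and separates them from ordinary header lines.

```python
import re

# Python transcription (an assumption) of po-mode.el's marker regex
# "^#-#-#-#-# \\(.*\\) #-#-#-#-#\n"; the group captures the merged file name.
MARKER_RE = re.compile(r"^#-#-#-#-# (.*) #-#-#-#-#$", re.MULTILINE)

# Decoded header of the Sympa et.po file quoted above (abridged).
header = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "#-#-#-#-# blank_web_help_et.po (sympa) #-#-#-#-#\n"
    "Plural-Forms: nplurals=2; plural=(n != 1);\n"
    "#-#-#-#-# tmp_web_help_et.po (et) #-#-#-#-#\n"
    "X-Generator: Pootle 1.0.2\n"
)

print(MARKER_RE.findall(header))
# ['blank_web_help_et.po (sympa)', 'tmp_web_help_et.po (et)']

# Keep every header line that is not a marker.
kept = [line for line in header.splitlines() if not MARKER_RE.fullmatch(line)]
print(len(kept))
# 5
```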
|
|
msg337491
Author: STINNER Victor (vstinner)
Date: 2019-03-08 15:20
I hacked gettext.py to parse all files on my system. I found 3 .mo files which contain "#" in their headers:

/usr/share/locale/fa/LC_MESSAGES/digikam.mo:
{'content-transfer-encoding': '8bit\n'
                              '#-#-#-#-# digikamimageplugin_channelmixer.po '
                              '(digikamimageplugin_channelmixer) #-#-#-#-#',
 'content-type': 'text/plain; charset=UTF-8',
 'language': 'fa',
 'language-team': 'Farsi (Persian) <>',
 'last-translator': 'Mohammad Reza Mirdamadi <mohi@ubuntu.ir>',
 'mime-version': '1.0',
 'plural-forms': 'nplurals=1; plural=0;',
 'po-revision-date': '2012-01-13 15:00+0330',
 'pot-creation-date': '2018-03-18 03:11+0100',
 'project-id-version': 'digikam',
 'report-msgid-bugs-to': 'http://bugs.kde.org',
 'x-generator': 'KBabel 1.11.4'}

/usr/share/locale/ia/LC_MESSAGES/akonadicontact5-serializer.mo:
{'content-transfer-encoding': '8bit\n'
                              '#-#-#-#-# akonadi_kalarm_resource.po '
                              '#-#-#-#-#',
 'content-type': 'text/plain; charset=UTF-8',
 'language': 'ia',
 'language-team': 'Interlingua <kde-i18n-it@kde.org>',
 'last-translator': 'g.sora <g.sora@tiscali.it>',
 'mime-version': '1.0',
 'plural-forms': 'nplurals=2; plural=n != 1;',
 'po-revision-date': '2011-11-29 19:38+0100',
 'pot-creation-date': '2018-11-12 06:56+0100',
 'project-id-version': '',
 'report-msgid-bugs-to': 'http://bugs.kde.org',
 'x-generator': 'Lokalize 1.2'}

/usr/share/locale/ml/LC_MESSAGES/ktraderclient5.mo:
{'content-transfer-encoding': '8bit',
 'content-type': 'text/plain; charset=UTF-8',
 'language': 'ml',
 'language-team': 'Swathanthra|സ്വതന്ത്ര Malayalam|മലയാളം '
                  'Computing|കമ്പ്യൂട്ടിങ്ങ് <smc-discuss@googlegroups.com>',
 'last-translator': '# ANI PETER


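
A scan like the one described above can be approximated without patching gettext.py. This is a hypothetical reconstruction: `find_headers_with_hash` is an illustrative name, and it relies on the public `NullTranslations.info()` method. It flags any header value containing '#', so it also reports cases like the Last-Translator line. On Python versions that already skip msgcat markers, the marker-only cases should no longer be reported.

```python
import gettext
import os
import pprint


def find_headers_with_hash(root="/usr/share/locale"):
    """Yield (path, info) for each .mo file under root whose header has a '#'.

    Hypothetical helper: uses the public NullTranslations.info() method
    rather than a patched gettext.py.
    """
    for dirpath, _dirnames, filenames in os.walk(root):  # missing/unreadable dirs yield nothing
        for name in sorted(filenames):
            if not name.endswith(".mo"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fp:
                    info = gettext.GNUTranslations(fp).info()
            except Exception:
                continue  # skip any catalog that gettext cannot parse
            if any("#" in value for value in info.values()):
                yield path, info


if __name__ == "__main__":
    for path, info in find_headers_with_hash():
        print(f"{path}:")
        pprint.pprint(info)
```
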
msg337492
Author: Julien Palard (mdk)
Date: 2019-03-08 15:27
The 'last-translator': '# ANI PETER|അനി പീറ്റര്\u200d <peter.ani@gmail.com>' case does not look like an issue: the line does *not* start with #. The # is in the middle of the line, which starts with "Last-Translator".
|
|
msg337493
Author: STINNER Victor (vstinner)
Date: 2019-03-08 15:30
/usr/share/locale/fa/LC_MESSAGES/digikam.mo: I downloaded the .po file using:

svn cat svn://anonsvn.kde.org/home/kde/trunk/l10n-kf5/fa/messages/extragear-graphics/digikam.po > fa_digikam.po

It contains many comments in headers. Extract:

(...)
# MaryamSadat Razavi <razavi@itland.ir>, 2007.
# Nasim Daniarzadeh <daniarzadeh@itland.ir>, 2007.
# Nazanin Kazemi <kazemi@itland.ir>, 2007.
# Mohammad Reza Mirdamadi <mohi@ubuntu.ir>, 2011, 2012.
msgid ""
msgstr ""
"Project-Id-Version: digikam\n"
"Report-Msgid-Bugs-To: http://bugs.kde.org\n"
"POT-Creation-Date: 2019-03-08 03:08+0100\n"
"PO-Revision-Date: 2012-01-13 15:00+0330\n"
"Last-Translator: Mohammad Reza Mirdamadi <mohi@ubuntu.ir>\n"
"Language-Team: Farsi (Persian) <>\n"
"Language: fa\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"#-#-#-#-# digikamimageplugin_channelmixer.po "
"(digikamimageplugin_channelmixer) #-#-#-#-#\n"
"X-Generator: Lokalize 1.2\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_refocus.po (digikamimageplugin_refocus) #-#-#-"
"#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_oilpaint.po (digikamimageplugin_oilpaint) #-#-"
"#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_perspective.po "
"(digikamimageplugin_perspective) #-#-#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_freerotation.po "
"(digikamimageplugin_freerotation) #-#-#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugins.po (digikamimageplugins) #-#-#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_raindrop.po (digikamimageplugin_raindrop) #-#-"
"#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_blowup.po (digikamimageplugin_blowup) #-#-#-#-"
"#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_charcoal.po (digikamimageplugin_charcoal) #-#-"
"#-#-#\n"
(...)
|
|
msg337494
Author: STINNER Victor (vstinner)
Date: 2019-03-08 15:38
/usr/share/locale/ml/LC_MESSAGES/ktraderclient5.mo:

svn cat svn://anonsvn.kde.org/home/kde/trunk/l10n-kf5/ml/messages/kde-workspace/ktraderclient5.po > ml_ktraderclient5.po

Extract:

msgid ""
msgstr ""
"Project-Id-Version: ktraderclient\n"
"Report-Msgid-Bugs-To: http://bugs.kde.org\n"
"POT-Creation-Date: 2018-08-16 09:14+0200\n"
"PO-Revision-Date: 2008-07-10 22:04+0530\n"
"Last-Translator: # ANI PETER|അനി പീറ്റര്<200d> <peter.ani@gmail.com>\n"
"Language-Team: Swathanthra|സ്വതന്ത്ര Malayalam|മലയാളം Computing


msg337495
Author: Julien Palard (mdk)
Date: 2019-03-08 15:38
That's literally sick. Looks like we have to trust the "\n", not the line wrapping in the file, but does this mean that:

msgstr ""
"Pro"
"jec"
"t-I"
"d-V"
"ers"
"ion"
": "
"dig"
"ika"
"m\n"
"Report-Msgid-Bugs-To: http://bugs.kde.org\n"

is valid, too? I have to try it! HAHA, it is:

$ cat ~/clones/python-docs-fr/glossary.po | head -n 20
# Copyright (C) 2001-2018, Python Software Foundation
# For licence information, see README file.
#
msgid ""
msgstr ""
"Pr"
"oj"
"ec"
"t-"
"Id"
"-V"
"er"
"si"
"on"
":"
" P"
"ython 3.6\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-12-21 09:48+0100\n"
"PO-Revision-Date: 2019-03-08 14:48+0100\n"

$ msgcat ~/clones/python-docs-fr/glossary.po | head -n 20
# Copyright (C) 2001-2018, Python Software Foundation
# For licence information, see README file.
#
msgid ""
msgstr ""
"Project-Id-Version: Python 3.6\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-12-21 09:48+0100\n"
"PO-Revision-Date: 2019-03-08 14:48+0100\n"
"Last-Translator: Jules Lasne <jules.lasne@gmail.com>\n"
"Language-Team: FRENCH <traductions@lists.afpy.org>\n"
"Language: fr\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Poedit 2.0.2\n"
"# Pouette\n"

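The experiment shows that a msgstr is just the concatenation of its quoted fragments, so only the "\n" escapes delimit header lines. A minimal sketch of that concatenation follows, assuming each fragment uses only escapes that PO files share with Python string literals (\n, \t, \", \\) so that `ast.literal_eval` can decode it. `join_po_strings` is a hypothetical name and not a full PO parser.

```python
import ast


def join_po_strings(lines):
    """Concatenate adjacent quoted PO string fragments into one Python string.

    Simplifying assumption: fragments only use escapes that Python string
    literals understand the same way (\\n, \\t, \\", \\\\).
    """
    return "".join(ast.literal_eval(line.strip()) for line in lines)


fragments = [
    '"Pro"', '"jec"', '"t-I"', '"d-V"', '"ers"', '"ion"', '": "',
    '"dig"', '"ika"', '"m\\n"',
    '"Report-Msgid-Bugs-To: http://bugs.kde.org\\n"',
]
print(repr(join_po_strings(fragments)))
# 'Project-Id-Version: digikam\nReport-Msgid-Bugs-To: http://bugs.kde.org\n'
```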
|
msg337497
Author: Julien Palard (mdk)
Date: 2019-03-08 15:56
I tested further, and when we have this horrible mess in the .po files:

msgstr ""
"Pro"
"jec"
"t-I"
"d-V"
"ers"
"ion"
": "
"dig"
"ika"
"m\n"

we get a clean string in the .mo file. So there is nothing to fear from:

"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_raindrop.po (digikamimageplugin_raindrop) #-#-"
"#-#-#\n"
"X-Generator: KBabel 1.11.4\n"

It will be stored cleanly in the .mo file as:

Plural-Forms: nplurals=1; plural=0;
#-#-#-#-# digikamimageplugin_raindrop.po (digikamimageplugin_raindrop) #-#-#-#-#
X-Generator: KBabel 1.11.4

So you can safely remove lines starting and ending with #-#-#-#-#.
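The point about the split marker can be checked directly. Below, a hypothetical `is_msgcat_marker` test applied to the raw PO fragments of the raindrop excerpt misses the marker, while the same test applied to the joined text, which is what msgfmt stores in the .mo file, finds it. The fragment contents are those quoted above, with escapes already decoded.

```python
MARKER = "#-#-#-#-#"


def is_msgcat_marker(item):
    """The proposed rule: a line that both starts and ends with #-#-#-#-#."""
    return item.startswith(MARKER) and item.endswith(MARKER)


# Contents of the quoted PO fragments of the raindrop excerpt (escapes decoded).
po_fragments = [
    "Plural-Forms: nplurals=1; plural=0;\n",
    "#-#-#-#-# digikamimageplugin_raindrop.po (digikamimageplugin_raindrop) #-#-",
    "#-#-#\n",
    "X-Generator: KBabel 1.11.4\n",
]

# Checked fragment by fragment, the split marker is not recognized...
print([is_msgcat_marker(f.strip()) for f in po_fragments])
# [False, False, False, False]

# ...but the .mo header is the concatenation, split on "\n" instead.
mo_header = "".join(po_fragments)
kept = [line for line in mo_header.split("\n") if line and not is_msgcat_marker(line.strip())]
print(kept)
# ['Plural-Forms: nplurals=1; plural=0;', 'X-Generator: KBabel 1.11.4']
```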
|
|
msg341981
Author: Julien Palard (mdk)
Date: 2019-05-09 14:22
New changeset afd1e6d2f0f5aaf4030d13342809ec0915dedf81 by Julien Palard in branch 'master':
bpo-36239: Skip comments in gettext infos (GH-12255)
https://github.com/python/cpython/commit/afd1e6d2f0f5aaf4030d13342809ec0915dedf81
|
|
msg342002
Author: STINNER Victor (vstinner)
Date: 2019-05-09 22:24
Julien: why not fix Python 3.7 as well? You approved https://github.com/python/cpython/pull/13218 (the Python 3.7 backport) but then closed it. Only the "Azure Pipelines PR" check failed, on "ERROR: test_drain_raises (test.test_asyncio.test_streams.StreamTests)", which is unrelated.
|
|