[Python-3000] Support for PEP 3131

Ka-Ping Yee python at zesty.ca
Fri May 25 21:29:50 CEST 2007


On Fri, 25 May 2007, Josiah Carlson wrote:

> Apples and oranges to be sure, but there are no other statistics that anyone else is able to offer about use of non-ascii identifiers in Java, Javascript, C#, etc.

Let's see what we can find. I made several attempts to search for non-ASCII identifiers using google.com/codesearch and here's what I got.

Java or JavaScript (total: about 1480000 files found with "lang:java .")

  1. lang:java ^[^"][^\s!-~].= (assignment to non-ASCII name)

    2 files with a UTF-8 BOM at the beginning; 1 file with non-ASCII in comments; 5 files with non-ASCII in strings; 2 files with non-ASCII elsewhere in source code:

    1. moin-1.5.8/wiki/htdocs/applets/moinFCKplugins/.../lang/en.js UTF-8 BOM in middle of file.

    2. SMSkyline.wdgt/fr.lproj/localizedStrings.js UTF-16 BOM at the beginning of a UTF-8 file. (!)

  2. lang:java ^[^"][^\s!-~]\w. (method call on non-ASCII name)

    2 files with a UTF-8 BOM at the beginning; 13 files with non-ASCII in comments; 5 files with non-ASCII in strings; 5 files with non-ASCII elsewhere in source code:

    1. struts-2.0.6/src/core/src/.../Editor2Plugin/FindReplaceDialog.js UTF-8 BOM in middle of file.

    2. moin-1.5.8/wiki/htdocs/applets/moinFCKplugins/.../lang/en.js UTF-8 BOM in middle of file.

    3. chickenfoot/chickenscratch/tests/findTest.js Non-breaking spaces embedded in indentation.

  3. lang:java ^\sclass.[^\s!-~] (class declaration)

    2 files with non-ASCII in strings; no other hits.

  4. lang:javascript ^\sfunction.[^\s!-~] (function declaration)

    1 non-JavaScript file; 9 files with non-ASCII in comments; 1 file with non-ASCII in strings; 1 file with non-ASCII elsewhere in source code:

    1. google_hacks_3E_code/hack_61/zoom-google.user.js Thin spaces (U+2009) embedded in code.

C# (total: about 266000 files found with "lang:c# .")

  1. lang:c# ^[^"][^\s!-~].= (assignment to non-ASCII name)

    5 non-C# files; 6 files with a UTF-8 BOM at the beginning; 9 files with non-ASCII in comments; 7 files with non-ASCII elsewhere in source code:

    1. blam-1.8.4pre2/src/PreferencesDialog.cs Non-breaking spaces in the middle of the line.

    2. BildschirmTennis2/BildschirmTennis2/Program1.cs Identifier containing non-ASCII.

    3. Ukazkova reseni CS - Prakticke priklady/.../Exp_2_03/Class2.cs Identifier containing non-ASCII.

    4. Rule.cs Identifier containing non-ASCII.

    5. SharpIntroduction/ComplexExample/Zv?????tko.cs Identifier containing non-ASCII.

    6. WitherwynWebDist/Witherwyn/Map.cs "Times" character in expression, probably a typo.

    7. PDFsharp/XGraphicsLab/MainForm.cs Identifier containing non-ASCII.

  2. lang:c# ^[^"][^\s!-~]\w( (function call on non-ASCII name)

    4 files with non-ASCII in comments; 6 files with non-ASCII elsewhere in source code:

    1. BildschirmTennis2/BildschirmTennis2/Program1.cs Identifier containing non-ASCII.

    2. SharpIntroduction/ComplexExample/Program.cs Identifier containing non-ASCII.

    3. Ukazkova reseni CS - Prakticke priklady/.../Exp_2_03/Class1.cs Identifier containing non-ASCII.

    4. ActiveRecord/Generator/.../RelationshipBuilderTestCase.cs Identifier containing non-ASCII, almost certainly a typo.

    5. Sample1/Sample1/Program.cs Identifier containing non-ASCII.

    6. Kap11/03/TEXT.CS Identifier containing non-ASCII.

  3. lang:c# ^\sclass.[^\s!-~] (class declaration)

    1 hit:

    1. Kap06/03/Kalen.cs Identifier containing non-ASCII.

In summary, out of around 5.7 million Java, JavaScript, and C# files indexed by Google Code Search, the only uses of non-ASCII identifiers I could find were in 12 C# files, and one of those 12 occurrences is almost certainly a mistake.
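To put that in proportion, here is a back-of-the-envelope calculation that takes the two summary figures (about 5.7 million files, 12 hits) at face value. The variable names are mine, and the result is only as good as those rounded inputs.

```python
total_files = 5_700_000   # "around 5.7 million" from the summary
files_with_hits = 12      # C# files with non-ASCII identifiers

share = files_with_hits / total_files          # fraction of indexed files
one_in = total_files // files_with_hits        # i.e. one file in this many

print(f"share of files: {share:.7%}")
print(f"about one file in {one_in:,}")
```

That works out to about one file in 475,000.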

-- ?!ng


