[Python-Dev] Split unicodeobject.c into subfiles

Stephen J. Turnbull stephen at xemacs.org
Thu Oct 25 06:22:03 CEST 2012

Nick Coghlan writes:

> OK, I need to weigh in after seeing this kind of reply. Large source files are discouraged in general because they're a code smell that points strongly towards a lack of modularity within a complex piece of functionality.

Sure, but large numbers of tiny source files are also a code smell, the smell of purist adherence to the literal principle of modularity without application of judgment.

If you want to argue that the pragmatic point of view nevertheless is to break up the file, I can see that, but I think Victor is going too far. (Full disclosure dept.: the call graph of the Emacs equivalents is isomorphic to the Dungeon of Zork, so I may be a bit biased.) You really should speak to the question of "how many" and "what partition".

> the real gain is in modularity, making it clear to readers which parts can be understood and worked on separately from each other.

Yeah, so which do you think they are? It seems to me that there are three modules to be carved out of unicodeobject.c:

1. The internal object management that is not exposed to Python: allocation, deallocation, and PEP 393 transformations.

2. The public interface to Python implementation: methods and properties, including operators.

3. Interaction with the outside world: codec implementations. But conceptually, these really don't have anything to do with internal implementation of Unicode objects. They're just functions that convert bytes to Unicode and vice versa. In principle they can be written in terms of ord(), chr(), and bytes(). On the other hand, they're rather repetitive: "When you've seen one codec implementation, you've seen them all." I see no harm in grouping them in one file, and possibly a gain from proximity: casual passers-by might see refactorings that reduce redundancy.
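
To make the "in terms of ord(), chr(), and bytes()" point concrete, here is a minimal sketch of a Latin-1 codec written that way. The function names are made up for illustration and are not CPython APIs; the real codecs are written in C for speed, and this only shows that nothing about the Unicode object's internals is needed.

```python
# Illustrative sketch only: a Latin-1 codec built from ord(), chr(), and bytes().
# latin1_encode and latin1_decode are hypothetical names, not part of CPython.

def latin1_encode(text):
    """Convert str to bytes; every code point must fit in a single byte."""
    code_points = [ord(ch) for ch in text]
    for cp in code_points:
        if cp > 0xFF:
            raise ValueError("code point U+%04X is not encodable in Latin-1" % cp)
    return bytes(code_points)


def latin1_decode(data):
    """Convert bytes to str; each byte maps to the code point of the same value."""
    return "".join(chr(b) for b in data)


sample = "caf\u00e9"
encoded = latin1_encode(sample)
# Cross-check against the built-in codec.
assert encoded == sample.encode("latin-1")
assert latin1_decode(encoded) == sample
```

Other codecs differ mainly in the mapping between byte sequences and code points, which is the repetitiveness mentioned above.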

I'm not sure what to do with the charmap stuff. In current CPython head it seems incoherent to me: there's an IO codec, but there's also unicode-to-unicode stuff (PyUnicode_Translate). I haven't had time to look at Victor's reorganization to see what he actually did with it, but in terms of modularity, it seems to me that refactoring this stuff would be a real win, as opposed to splitting the files, which is only a presentational improvement for the rest of the code, which is already pretty modular.
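
The two faces of charmap are visible from the Python level too. The sketch below uses a made-up decoding table (Latin-1 with byte 0x80 remapped to the euro sign) purely for illustration; codecs.charmap_build, codecs.charmap_decode, and codecs.charmap_encode are the same helpers the stdlib charmap-based encodings call, while str.translate is the unicode-to-unicode side.

```python
import codecs

# Hypothetical 256-entry decoding table: byte value i maps to code point i
# (that is, Latin-1), except that byte 0x80 is remapped to the euro sign.
decoding_table = "".join(
    "\u20ac" if i == 0x80 else chr(i) for i in range(256)
)
encoding_table = codecs.charmap_build(decoding_table)

# I/O codec direction: bytes <-> str. Both calls return (result, length consumed).
decoded, consumed = codecs.charmap_decode(b"\x80 5", "strict", decoding_table)
encoded, consumed_chars = codecs.charmap_encode(decoded, "strict", encoding_table)

# Unicode-to-unicode direction: str -> str, no bytes involved.
translated = decoded.translate({ord("\u20ac"): "EUR"})

assert decoded == "\u20ac 5"
assert encoded == b"\x80 5"
assert translated == "EUR 5"
```

Both operations are table lookups keyed on a code point or a byte, but only the first one is a codec in the bytes-to-Unicode sense.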

As for Victor's proposal itself:

  1176 Objects/unicodecharmap.c
  1678 Objects/unicodecodecs.c
  1362 Objects/unicodeformat.c
   253 Objects/unicodeimpl.h
   733 Objects/unicodelegacy.c
  1836 Objects/unicodenew.c
  2777 Objects/unicodeobject.c
  2421 Objects/unicodeoperators.c
  1235 Objects/unicodeoscodecs.c
  1288 Objects/unicodeutfcodecs.c

As Victor himself admits, "unicodelegacy" and "unicodenew" are not descriptive of what they contain. In I18N discussions, "legacy" is usually a deprecatory reference to non-Unicode encodings, and I would tend to guess from the name that this file contains codecs. A better name might be "unicodedeprecated" (if what he really means is deprecated APIs).

I don't understand why splitting out "unicodeoperators" is a great idea; it's done nowhere else in CPython. If that makes sense, why not split out "unicodemethods" (for methods normally invoked explicitly rather than by syntax) too? N.B. For bytes, the corresponding file is spelled "bytes_methods".

"unicodecodecs" vs "unicodeutfcodecs": Say what? I would forever be looking in the wrong one.

"unicodeoscodecs" suggests to me that these codecs are only usable on some OSes. If so, shouldn't the relevant OS be in the name? If not, the name is basically misleading IMO.

Why are any of these codecs here in unicodeobjectland in the first place? Sure, they're needed so that Python can find its own stuff, but in principle any codec could be needed. Is it just an heuristic that the codecs needed for 99% of the world are here, and other codecs live in separate modules?

Steve
