[Python-Dev] Split unicodeobject.c into subfiles

Nick Coghlan ncoghlan at gmail.com
Thu Oct 25 08:42:55 CEST 2012


On Thu, Oct 25, 2012 at 2:22 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:

Nick Coghlan writes:

> OK, I need to weigh in after seeing this kind of reply. Large source files
> are discouraged in general because they're a code smell that points
> strongly towards a lack of modularity within a *complex piece of
> functionality*.

Sure, but large numbers of tiny source files are also a code smell, the smell of purist adherence to the literal principle of modularity without application of judgment.

Absolutely. The classic example of this is Java's unfortunate insistence on only-one-public-top-level-class-per-file. Bleh.

If you want to argue that the pragmatic point of view nevertheless is to break up the file, I can see that, but I think Victor is going too far. (Full disclosure dept.: the call graph of the Emacs equivalents is isomorphic to the Dungeon of Zork, so I may be a bit biased.) You really should speak to the question of "how many" and "what partition".

Yes, I agree I was too hasty in calling the specifics of Victor's current proposal a good idea. What raised my ire was the raft of replies objecting to the refactoring in principle for completely specious reasons like being able to search within a single file instead of having to use tools that can search across multiple files.

unicodeobject.c is too big, and should be restructured to make any natural modularity explicit, and provide an easier path for users that want to understand how the unicode implementation works.

> the real gain is in modularity, making it clear to readers which
> parts can be understood and worked on separately from each other.

Yeah, so which do you think they are? It seems to me that there are three modules to be carved out of unicodeobject.c:

1. The internal object management that is not exposed to Python: allocation, deallocation, and PEP 393 transformations.

2. The public interface to Python implementation: methods and properties, including operators.

3. Interaction with the outside world: codec implementations.

But conceptually, these really don't have anything to do with internal implementation of Unicode objects. They're just functions that convert bytes to Unicode and vice versa. In principle they can be written in terms of ord(), chr(), and bytes(). On the other hand, they're rather repetitive: "When you've seen one codec implementation, you've seen them all." I see no harm in grouping them in one file, and possibly a gain from proximity: casual passers-by might see refactorings that reduce redundancy.
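To illustrate the remark that a codec can in principle be written in terms of ord(), chr(), and bytes(), here is a minimal sketch of a Latin-1 (ISO 8859-1) codec in pure Python. The names latin1_encode and latin1_decode are hypothetical and chosen only for this example; they are not CPython functions, and the real codecs in unicodeobject.c are written in C against the internal string representation.

```python
def latin1_encode(text):
    """Encode a str to bytes, mapping each code point to one byte."""
    out = []
    for ch in text:
        code = ord(ch)  # code point of the character
        if code > 0xFF:
            # Latin-1 only covers U+0000..U+00FF
            raise ValueError(f"{ch!r} is not representable in Latin-1")
        out.append(code)
    return bytes(out)


def latin1_decode(data):
    """Decode bytes to str, mapping each byte to the same code point."""
    # Iterating over a bytes object yields integers in Python 3.
    return "".join(chr(b) for b in data)


if __name__ == "__main__":
    encoded = latin1_encode("café")
    print(encoded)
    print(latin1_decode(encoded))
```

Nothing in this sketch touches the internals of the string object, which is the sense in which the codecs are conceptually separate from the rest of the file.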

I suspect you and Victor are in a much better position to thrash out the details than I am. What bothered me was the trend in the discussion to treat the question as "split or don't split?" rather than "how should we split it?", when a file that large should already contain some natural splitting points if the implementation isn't a tangled monolithic mess.

Why are any of these codecs here in unicodeobjectland in the first place? Sure, they're needed so that Python can find its own stuff, but in principle any codec could be needed. Is it just an heuristic that the codecs needed for 99% of the world are here, and other codecs live in separate modules?

I believe it's a combination of history and whether or not they're needed by the interpreter during the bootstrapping process before the encodings namespace is importable.

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
