[Python-Dev] Re: a feature i'd like to see in python #2: indexing of match objects

Josiah Carlson jcarlson at uci.edu
Thu Dec 7 22:57:49 CET 2006


"Michael Urman" <murman at gmail.com> wrote:

> On 12/6/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > Special cases aren't special enough to break the rules.

> Sure, but where is this rule that would be broken? I've seen it invoked, but I've never felt it myself. I seriously thought of slicing as returning a list of elements per range(start, stop, skip), with the special case being str's type preservation (and then unicode's, which followed).

Tuple slicing doesn't return lists. Array slicing doesn't return lists. None of Numarray, Numeric, or Numpy array slicing returns lists. Only list slicing returns lists in the current stdlib and in the major array packages for Python. Someone please correct me if I am wrong.
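The rule in question can be checked directly at the interpreter; a quick illustration with the built-in sequence types:

```python
# Slicing preserves the container type for each built-in sequence.
lst = [1, 2, 3, 4]
tup = (1, 2, 3, 4)
s = "abcd"

assert type(lst[1:3]) is list    # list slice -> list
assert type(tup[1:3]) is tuple   # tuple slice -> tuple
assert type(s[1:3]) is str       # str slice -> str, not a list of characters
```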

> This is done because a list of characters is a pain to work with in most contexts, so there's an implicit ''.join on the list. And because assuming that the joined string is the desired result, it's much faster to have just built it in the first place. A pure practicality beats purity argument.

Python returns strings from string slices not because of a "practicality beats purity" argument, but because not returning a string from string slicing would be insane. The operations on strings tend not to be of the kind done with lists (insertion, sorting, etc.); they are typically linguistic, parsing, or data related (chop, append, prepend, scan for X, etc.). Also, each item in a list is typically a singular item, whereas in a string, blocks of characters typically represent a single item: spaces, words, shorts, longs, floats, doubles, etc. (the latter being related to packed representations of data structures, as is the case in some socket protocols).
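A small sketch of that "blocks of characters represent a single item" point, using a fixed-width record whose field layout is entirely hypothetical, chosen only for illustration:

```python
# Hypothetical 16-character fixed-width record:
# columns 0-5 first name, 6-11 last name, 12-15 a zero-padded count.
record = "JOHN  DOE   0042"

first = record[0:6].strip()    # slice of chars -> one field, "JOHN"
last = record[6:12].strip()    # "DOE"
count = int(record[12:16])     # "0042" -> 42
```

Here each slice is meaningful only as a whole field; a list of sixteen individual characters would be useless for this kind of parsing.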

Semantically, lists differ from strings in various substantial ways, which is why you will never see a typical user of Python asking for list.partition(lst). Strings have the sequence interface out of convenience, not because strings are like a list. Don't try to combine the two ideas. Also, when I said that strings/unicode were special and "has already been discussed ad-nauseum", I wasn't kidding. Take string and unicode out of the discussion and search google for the thousands of other threads that talk about why string and unicode are the way they are. They don't belong in this conversation.
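The str.partition point above can be seen concretely: it is a string-specific method with no list counterpart, which is one way strings only borrow the sequence interface rather than being lists:

```python
# str.partition splits on the first occurrence of a separator,
# returning (head, separator, tail); lists have no such method.
head, sep, tail = "key=value".partition("=")
assert (head, sep, tail) == ("key", "=", "value")
assert not hasattr([], "partition")
```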

> We both arrive at the same place in that we have a model describing the behavior for list/str/unicode, but they're different models when extended outside.

But this isn't about str/unicode (or buffer). For all other types available in Python with a slice operation, slicing them returns the same type as the original sequence. Expand that to 3rd party modules, and the case still holds for every package (at least those that I have seen and used). If you can point out an example (not str/unicode/buffer) for which this rule is broken in the stdlib or even any major 3rd party library, I'll buy you a cookie if we ever meet.
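The other stdlib sequence types bear this out; a quick check on a modern Python (the array module predates this email, bytearray came later):

```python
from array import array

# array slicing returns an array of the same typecode, not a list.
a = array("i", [1, 2, 3, 4])
assert type(a[1:3]) is array

# bytearray slicing likewise returns a bytearray.
ba = bytearray(b"abcd")
assert type(ba[1:3]) is bytearray
```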

> Now that I see the connection you're drawing between your argument and the paper, I don't believe it's directly inspired by the paper. I read the paper to say those who could create and work with a set of rules, could learn to work with the correct rules. Consistency in Python makes things easier on everyone because there's less to remember, not because it makes us better learners of the skills necessary for programming well. The arguments I saw in the paper only addressed the second point.

Right, but if you have a set of rules:

  1. RULE 1
  2. RULE 2
  3. RULE 3

The above will be easier to understand than:

  1. RULE 1
  2. a. RULE 2 if X
     b. RULE 2a otherwise
  3. RULE 3

This is the case even ignoring the paper. Special cases are more difficult to learn than no special cases. Want a real-world example? English. The English language is so fraught with special cases that it is the only language for which dyslexia is known to exist (or was known to exist for many years; I haven't kept up on it).

In the context of the paper, their findings suggested that those who could work with a consistent set of rules could be taught the right consistent rules. Toss in an inconsistency? Who knows if in this case it will make any difference in Python; regular expressions are already confusing for many people.

This is a special case, and there's a zen for that: does "practicality beats purity" apply here? I don't know.

At this point I've just about stopped caring. Make the slice return a list, don't allow slicing, or make it a full-on group variant; I don't really care at this point. Someone write a patch and let's go with it. Adding slicing that produces a list should be easy after the main patch is done, and we can emulate it in Python if necessary.
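For reference, the indexing half of this proposal is what eventually landed: in Python 3.6, re.Match gained __getitem__ as a shorthand for group(), while slicing was never added. A minimal sketch on a modern Python:

```python
import re

# Match objects support integer (and group-name) indexing since 3.6.
m = re.match(r"(\w+) (\w+)", "hello world")
assert m[0] == "hello world"   # same as m.group(0), the whole match
assert m[1] == "hello"         # same as m.group(1)
assert m[2] == "world"         # same as m.group(2)
```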


