[Python-Dev] RFD: how to build strings from lots of slices?
Fredrik Lundh <effbot@telia.com>
Sun, 27 Feb 2000 13:01:38 +0100
when hacking on SRE's substitution code, I stumbled upon a problem. to do a substitution, SRE needs to merge slices from the target string and from the substitution pattern.
here's a simple example:
re.sub(
"(perl|tcl|java)",
"python (not \\1)",
"perl rules"
)
contains a "substitution pattern" consisting of three parts:
"python (not " (a slice from the substitution string)
group 1 (a slice from the target string)
")" (a slice from the substitution string)
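to make the three parts concrete, here's a rough python sketch that splits the example above into its slices. note that split_substitution is a hypothetical helper written for this illustration only; it is not how SRE or PCRE do it, and it only understands single-digit \N group references.

```python
import re

def split_substitution(pattern, template, target):
    """hypothetical helper: list the parts a substitution would merge."""
    m = re.search(pattern, target)
    if m is None:
        raise ValueError("pattern does not match target")
    parts = []
    pos = 0
    # assumption: the template only uses single-digit \N group references
    for ref in re.finditer(r"\\(\d)", template):
        if ref.start() > pos:
            # literal text: a slice from the substitution string
            parts.append(("template", template[pos:ref.start()]))
        # group reference: a slice from the target string
        parts.append(("target", m.group(int(ref.group(1)))))
        pos = ref.end()
    if pos < len(template):
        parts.append(("template", template[pos:]))
    return parts

parts = split_substitution("(perl|tcl|java)", "python (not \\1)", "perl rules")
for origin, text in parts:
    print(origin, repr(text))
```

joining the three parts gives the replacement text for the match, which re.sub then splices into the rest of the target string.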
PCRE implements this by doing the slicing (thus creating three new strings), and then doing a "join" by hand into a PyString buffer.
this isn't very efficient, and it also doesn't work for unicode strings.
in other words, this needs to be fixed. but how?
...
here's one proposal, off the top of my head:
- introduce a PySliceListObject, which behaves like a simple sequence of strings, but stores them as slices. the type structure looks something like this:
typedef struct {
    PyObject* string;
    int start;
    int end;
} PySliceListItem;

typedef struct {
    PyObject_VAR_HEAD
    PySliceListItem item[1];
} PySliceListObject;
where start and end are normalized (0..len(string))
__len__ returns self->ob_size
__getitem__ calls PySequence_GetSlice()
PySliceListObjects are only used internally; they have no Python-level interface.
- tweak string.join and unicode.join to look for PySliceListObjects, and have special code that copies slices directly from the source strings.
(note that a slice list can still be used with any method that expects a sequence of strings, but at a cost)
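the idea can be modelled in pure python, as a sketch only. SliceList and join_slices below are hypothetical stand-ins for the proposed PySliceListObject and the special-cased join; the names are made up for this illustration, and the sketch uses bytes objects to play the role of 8-bit strings.

```python
class SliceList:
    """pure-python stand-in for the proposed PySliceListObject (hypothetical)."""

    def __init__(self):
        self._items = []  # (string, start, end) triples

    def append(self, string, start=0, end=None):
        # normalize start and end to 0..len(string)
        start, end, _ = slice(start, end).indices(len(string))
        self._items.append((string, start, max(start, end)))

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # generic path: materializes a new string object for the slice
        string, start, end = self._items[index]
        return string[start:end]


def join_slices(slice_list):
    """special-cased join: copy each slice straight into one buffer."""
    total = sum(end - start for _, start, end in slice_list._items)
    buf = bytearray(total)
    pos = 0
    for string, start, end in slice_list._items:
        n = end - start
        # memoryview slicing avoids creating an intermediate bytes object
        buf[pos:pos + n] = memoryview(string)[start:end]
        pos += n
    return bytes(buf)


template = b"python (not \\1)"
target = b"perl rules"

slices = SliceList()
slices.append(template, 0, 12)   # b"python (not "
slices.append(target, 0, 4)      # group 1: b"perl"
slices.append(template, 14, 15)  # b")"

fast = join_slices(slices)  # direct copy from the source strings
slow = b"".join(slices)     # generic sequence path, one new object per slice
```

both paths produce the same result; the difference is that the generic path goes through __getitem__ and creates a temporary string for every slice, which is the cost mentioned above.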
...
given the above, the substitution engine can now create a slice list by combining slices from the match object and the substitution object, and hand the result off to the string implementation; e.g.:
sep = PySequence_GetSlice(subst_string, 0, 0);
result = PyObject_CallMethod(sep, "join", "O", slice_list);
Py_DECREF(sep);
(can anyone come up with something more elegant than the [0:0] slice?)
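for reference, here's what the [0:0] trick does at the python level, as a small sketch: slicing gives an empty string of the same type as the substitution string, so the join that follows is dispatched to the right string implementation. join_like is a hypothetical name used only here, and str and bytes stand in for the two string types.

```python
def join_like(subst_string, parts):
    """join parts using an empty separator of subst_string's own type."""
    sep = subst_string[0:0]  # empty string, same type as subst_string
    return sep.join(parts)

text_result = join_like("python (not \\1)", ["python (not ", "perl", ")"])
bytes_result = join_like(b"python (not \\1)", [b"python (not ", b"perl", b")"])
```

for the built-in types, calling the type with no arguments (type(subst_string)()) also yields an empty string, which is one possible alternative to the slice; whether that counts as more elegant at the C level is an open question.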
comments? better ideas?