Python Things
Python is a clean and powerful semicompiled object-oriented programming language. If you haven't heard of Python, go find out about it now!
Stop press. :) Strange "Python" twins sighted at Python 10.
Recent:
- Proposals to speed up global variable access.
- delegate: This module provides generic mechanisms for delegating work items. At the moment there are just two: "parallelize", which forks and manages children to do work in parallel, and "timeout", which forks a single child and limits its running time. The arguments and return values are pickled and sent through pipes. Get it here: <delegate.py> and see the pydoc page of HTML documentation on delegate.
- cgitb will be going into Python 2.2a2. It will not be enabled by default. To enable it, do:
import cgitb; cgitb.enable()
It now also supports saving tracebacks to files.
- cgitb for Python 2.1: If you have Python 2.1, all you need to get is <cgitb.py>. Then put these lines in your sitecustomize.py (if the module doesn't already exist, create it in site-packages):
import os
if os.environ.has_key("GATEWAY_INTERFACE"):
    import sys, cgitb
    sys.excepthook = cgitb.excepthook
Now all Python CGI scripts on your server will magically produce pretty tracebacks in the Web browser whenever errors occur. You don't have to change any of your CGI scripts! This even works when there's a SyntaxError. (Of course, don't do this if you absolutely have to keep your scripts secret.)
- cgitb: a traceback printer for CGI scripts that really shows off the power of inspect! Imagine this CGI script. Now, instead of seeing this or this or this, you can get this! Just get <inspect.py>, <pydoc.py>, and <cgitb.py>. Wrap your main function in a "try: ... except: cgitb.handler()" block.
- pydoc: a text and HTML documentation generator for Python that works with Python 1.5.2 and up. To use it, get <pydoc.py> and <inspect.py>. These two modules are now part of the Python 2.1b1 distribution. They are described here and here.
- inspect.py: a module for inspecting live objects, submitted for consideration for inclusion in the standard library. The module, tests, test output, and documentation are all available here.
- PEPs: iterators, string interpolation.
- rxb for re and Python 1.5 (preliminary version is now here).
- Roundup roundup.tar.gz (32 kb), a simple and effective bug-tracking system released as open source (see the short paper). You will _really_ have to excuse the mess, and don't say i didn't warn you! It has been hurriedly extracted from its running implementation into a hacked-up copy, but at least you should be able to run it yourself if you have a web server. (There are more instructions in the README.) Good luck!
Here is a collection of some possibly useful Python things. They're in chronological order, so scroll down to the end for the most recent stuff.
Perlish features for Python regexes
The internal `regex` module has a syntax mode labelled `RE_SYNTAX_AWK` that helps to make regexes more familiar for Perl hackers by allowing unbackslashed parens and pipes. But it doesn't have `\d`, `\D`, `\s`, or `\S`; and `\w` and `\W` don't work quite the same as Perl's. So this patch adds these conveniences under a new regex syntax flag named `RE_EXTRA_CLASSES`. For kicks, i also added `\h` for hexadecimal digits and `\l` for letters of the alphabet.
Also, this patch adds a syntax flag named `RE_MINIMAL_OPS` which enables the new operators `??`, `*?`, and `+?` (from Perl 5). These have similar meanings to their counterparts `?`, `*`, and `+`, but the minimal versions prefer to match the _shortest_ string possible, which can be really useful in some situations.
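To illustrate the difference between the ordinary and minimal operators, here is a small sketch. It uses the `re` module from current Python rather than the patched `regex` module described here, and the sample string is made up for the example.

```python
import re

# Hypothetical sample text, chosen only to show the two behaviours.
text = "<b>bold</b> and <i>italic</i>"

# The ordinary '*' is greedy: it grabs as much as it can,
# so the match runs from the first '<' to the last '>'.
greedy = re.search(r"<.*>", text).group(0)

# The minimal '*?' prefers the shortest match,
# so it stops at the first '>' it can.
minimal = re.search(r"<.*?>", text).group(0)

print(greedy)
print(minimal)
```

The minimal form is what you usually want when picking out one delimited item from a line that contains several.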
These two features are both enabled when you select the regex syntax dubiously labelled `RE_SYNTAX_PERLISH`. This patchfile should be applied to the Python 1.4 beta 3 distribution, and it alters three files: `Modules/regexpr.c`, `Modules/regexpr.h`, and `Lib/regex_syntax.py`.
To apply the patch, simply go to the directory into which you extracted the Python `tar` file (probably named `Python1.4beta3`), and -- if you saved the patch file as, say, `/home/bob/regex-perlish.patch` -- go
    patch < /home/bob/regex-perlish.patch
Then just run `make` to build the new Python interpreter.
extra registers for regex matches
The compiled regular expression objects produced by the internal `regex` module don't have attributes equivalent to Perl's `$&`, `` $` ``, or `$'` (the part of the string that matched, everything before what matched, and everything after what matched, respectively). It's possible to construct these strings using the indices from the `regs` attribute, but it's somewhat inconvenient, harder to read, and slower.
So i suggested the addition of three attributes, `before`, `found`, and `after`, to the compiled regex objects. Andrew Kuchling almost immediately posted a patch to do just that.
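As a rough illustration of what these three strings are, the sketch below builds them from match indices. It assumes the `re` module from current Python, whose match objects expose `span()`, instead of the old `regex` module and its `regs` attribute; the helper name `split_around_match` is made up for this example.

```python
import re

def split_around_match(pattern, text):
    """Return (before, found, after) for the first match, or None.

    Hypothetical helper: it mirrors Perl's $`, $& and $' using the
    start and end indices of the match.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    start, end = match.span()
    return text[:start], text[start:end], text[end:]

print(split_around_match(r"[0-9]+", "abc123def"))
```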
rxb, a regex builder
Here's an idea suggested by Greg Ewing that i coded up one day. This Python module lets you verbosely build regular expressions with phrases such as
digit + some(whitespace) + exactly(':)')
without worrying about the exact syntax or bothering to backslash dangerous characters. You might like it if you find yourself wasting a lot of time looking up regular-expression syntax.
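The idea can be sketched in a few lines. This is not the rxb source, just a minimal illustration assuming the `re` module from current Python; the `Pattern` class and the helpers below only borrow rxb's names and cover a tiny part of what it does.

```python
import re

class Pattern:
    """A regex fragment that can be combined with '+'."""

    def __init__(self, source):
        self.source = source

    def __add__(self, other):
        return Pattern(self.source + as_pattern(other).source)

    def __radd__(self, other):
        return Pattern(as_pattern(other).source + self.source)

    def search(self, text):
        return re.search(self.source, text)

    def __repr__(self):
        return "<Pattern %s>" % self.source

def as_pattern(value):
    # Plain strings are treated as literal text, never as regex syntax.
    return value if isinstance(value, Pattern) else exactly(value)

def exactly(text):
    # re.escape backslashes the dangerous characters for us.
    return Pattern(re.escape(text))

def some(pattern):
    # One or more repetitions, grouped so '+' applies to the whole fragment.
    return Pattern("(?:%s)+" % as_pattern(pattern).source)

digit = Pattern("[0-9]")
whitespace = Pattern(r"\s")

smiley = digit + some(whitespace) + exactly(":)")
print(smiley.search("at 7   :) ok").group(0))
```

Overloading `+` is what lets the phrases read left to right like the pattern they describe.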
This new version allows a more concise syntax and generates instances of a `Pattern` class, instead of strings; this way you can directly use methods like `search` on the result, and you don't need to worry about compiling and caching. It makes regexes more convenient to use. Here's an example:
    >>> import rxb
    >>> rxb.welcome()
    >>> pat = label.spam(some(letters)) + digit
    >>> pat.search('foo bar python8')
    8
    >>> pat.spam
    'python'
    >>> pat
    <Pattern \([A-Za-z]+\)[0-9]>
    >>> rxb.banish()
or, a slightly more complex example from Grail:
    import rxb
    rxb.welcome()

    flag = member(letters, '-')

    LISTING_PATTERN = (begline +
        label.mode(
            flag +                          # file type
            flag*3 + flag*3 + flag*3) +     # owner, group, world perms
        label.data(
            somespace + anything +          # links, owner, grp, sz, date
            somespace + digit*2 + maybe(':') + digit*2 +  # year or hh:mm
            somespace) +
        label.file(
            anybut('->')) +                 # anything before symlink
        maybe(label.link(
            somespace + '->' + anything)) + # possible symlink
        endline)

    rxb.banish()
Thanks to William S. Lear (rael@dejanews.com), who pointed out a problem with this example which was due to a bug in the `regex.symcomp()` routine. The `rxb` module has been recently modified to produce regular expressions using backslashed (instead of bare) parentheses for grouping, as a workaround for the `symcomp()` bug.
The bug is this: if you open a new subgroup with a left-parenthesis immediately following the greater-than sign which ends a group label, `symcomp()` will miss the parenthesis and thus miscount the rest of the subgroups. The bug has been fixed in Python 1.4 final.
concatenation in place for lists
Since lists are mutable, you can modify them in place using list methods. Using `list.append(element)` is _much_ faster than doing `list = list + [element]`, since the latter has to make a new object with a copy of the whole list. Unfortunately, `list.append` can only append one element at a time. I wrote the following patch to add the `concat` method to lists, which will concatenate a list argument onto another list in place.
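The difference between concatenating in place and building a new list can be seen in a short sketch. It uses `list.extend`, the in-place concatenation method available in current Python, as a stand-in for the `concat` method proposed here; the variable names are made up for the example.

```python
items = [1, 2, 3]
alias = items              # a second name for the same list object

# In-place concatenation: the existing object grows, so 'alias' sees it too.
items.extend([4, 5])

# The '+' operator builds a brand-new list and copies every element into it.
copied = items + [6]

print(alias)
print(copied is items)
```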
new version of faq2html.py
I decided to rewrite `faq2html.py` as an exercise in text-processing with Python (in part, this is what prompted me to think of the regex modifications above, but this script does not require them). The new version is quite a bit more general than the old, and should be able to convert most reasonably-formatted FAQs into HTML, provided that questions are preceded with `Q.` and answers preceded with `A.`. Check the top of the script for details.
small improvement to dis.py
This tiny patch makes `dis.disco()` display the names of local variables along with the disassembly.
string interpolation for Python
This module lets you quickly and conveniently interpolate values into strings (in the flavour of Perl or Tcl, but with less extraneous punctuation). You get a bit more power than in other languages, because this module allows subscripting, slicing, function calls, attribute lookup, or arbitrary expressions. Here are the simple interpolation rules:
- A dollar sign and a name, possibly followed by any of:
- an open paren, and anything up to the matching paren
- an open bracket, and anything up to the matching bracket
- a period and a name
any number of times, is evaluated as a Python expression.
- A dollar sign immediately followed by an open curly-brace, and anything up to the matching curly-brace, is evaluated as a Python expression.
- Two dollar signs in a row give you one literal dollar sign.
- Anything else is left alone.
Expressions are evaluated in the namespace of the caller. This lets you painlessly do:
"Here is a $string."
"Here is a $module.member."
"Here is an $object.member."
"Here is a $functioncall(with, arguments)."
"Here is an ${arbitrary + expression}."
"Here is an $array[3] member."
"Here is a $dictionary['member']."
You can download the module from this site. It contains a class named 'Itpl' for representing interpolated-string objects, and a function named 'printpl' which will interpolate a given string and print the results. Here is the documentation page generated from the module.
- Itpl.py for Python 1.4
- Itpl.py for Python 1.5 (thanks to Par Kurlberg for calling to my attention the need for a Python 1.5 version, and thanks to Niki Spahiev for pointing out a bug in the attribute lookup)
- Itpl.py for Python 2.x (thanks to Dave Benjamin for submitting the fix)
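To make the interpolation rules above concrete, here is a deliberately simplified sketch. It is not Itpl.py: it assumes current Python, handles only `$$`, `${expression}`, and `$name` with optional `.attribute` lookups, and skips the paren and bracket rules entirely. The function name `interpolate` is made up for the example.

```python
import re
import sys

# $$  |  ${ expression }  |  $name or $name.attr.attr
_TOKEN = re.compile(
    r"\$\$|\$\{([^{}]*)\}|\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)")

def interpolate(template, namespace=None):
    """Substitute $-expressions in template (simplified sketch)."""
    if namespace is None:
        # Default to the caller's namespace, as the module description says.
        frame = sys._getframe(1)
        namespace = dict(frame.f_globals)
        namespace.update(frame.f_locals)
    else:
        namespace = dict(namespace)  # eval() must not modify the caller's dict

    def substitute(match):
        if match.group(0) == "$$":
            return "$"               # two dollar signs give one literal dollar
        expression = match.group(1)
        if expression is None:
            expression = match.group(2)
        return str(eval(expression, namespace))

    return _TOKEN.sub(substitute, template)

language = "Python"
print(interpolate("Here is ${language.upper()}, yours for $$0."))
```

Because the expressions go through `eval`, a sketch like this should only ever be given templates you wrote yourself.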
generalized string.join
This patch generalizes the `string.join()` routine to accept any instance of a class that implements the `__len__` and `__getitem__` disciplines, rather than accepting only the built-in sequence types (list and tuple). Your `__getitem__` method will be called twice for each element (once to add up the total length of the result, and once during construction), so it had better return consistent results for this to work...
This lack of safety is bad. If the returned string lengths are inconsistent, you can cause a segmentation fault. Watch here for a more robust update.
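For a picture of the kind of class this is about, here is a small sketch of a user-defined sequence being joined. It assumes current Python, where `str.join` accepts any iterable, rather than the patched `string.join()`; the `Countdown` class is made up for the example.

```python
class Countdown:
    """A tiny sequence implementing only __len__ and __getitem__."""

    def __init__(self, start):
        self.start = start

    def __len__(self):
        return self.start

    def __getitem__(self, index):
        if not 0 <= index < self.start:
            raise IndexError(index)   # tells the caller where the sequence ends
        return str(self.start - index)

print(", ".join(Countdown(3)))
```

Returning the same string every time a given index is requested is exactly the consistency the paragraph above asks for.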
- download <stropjoin.patch>.
assignment of `while` and `if` conditions (warning: controversial!)
This small patch, recently discussed on the Python newsgroup to some degree, changes the syntax of `while` and `if` statements to allow an optional `from` keyword to save the result of the conditional in a variable. This lets you write, for example:
    while line from sys.stdin.readline():
        do_something_with(line)

    if status from pipe.close():
        handle_the_error(status)
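For comparison, the `from` syntax above exists only in this patch. The sketch below shows the same loop written two ways in current Python: the "while 1 ... break" idiom, and the assignment expression (`:=`) available since Python 3.8. The function names are made up for the example.

```python
import io

def read_with_break(stream):
    """The 'while 1 ... break' idiom: the condition sits inside the loop."""
    lines = []
    while 1:
        line = stream.readline()
        if not line:
            break
        lines.append(line)
    return lines

def read_with_walrus(stream):
    """Assignment expression: the condition is back in the loop header."""
    lines = []
    while (line := stream.readline()):
        lines.append(line)
    return lines

print(read_with_walrus(io.StringIO("spam\neggs\n")))
```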
My goal was to put the condition where it belongs instead of having to put extra "if ... break" statements inside the loop or duplicate the condition at the end where it is less apparent. There have been a fair number of comments about this. Just for fun, i'll quote some here (with apologies to the speakers)...
"Reads like Python." (David Ascher)
"... a very elegant solution, IMHO." (Andrew Kuchling)
"... seems like a C idiom trying to work its way into Python." (Johann Hibschman)
"... can easily be emulated using a file iterator." (Fredrik Lundh)
"I like this proposal." (Anthony Baxter)
"... looks just right to me." (Konrad Hinsen)
"I don't see why grammar changes are needed for what is essentially just an addition to a class's methods..." (Tony J Ibbs)
"Is a syntax change really worth it when all you save is one (1) line of source code?" (Fredrik Lundh)
"I'd really like to see it in the official release." (Marnix Klooster)
"I don't see what all the fuss is about. I commonly use a while 1 loop with one or more if:break clauses..." (Donn Cave)
"I have to concur with Donn on this one. I'm never really been inconvenienced by using the while 1:...break idiom." (Barry Warsaw)
"... agree with Donn that this is all unnecessary and we're better off with the 'while 1' idiom." (Guido van Rossum)
"I really like the 'from' proposal." (Richard Jones)
Oh, well. Anyway, here it is. After applying the patch, you need to go into the `Grammar/` subdirectory and do a `make` to rebuild the parser (isn't that cool?) before going back up and doing `make` to build the interpreter.
- download <ifwhilefrom.patch>
improved tokenizer module
You don't need to get tokenize.py if you have Python 1.5. A patched copy of this module made it into the standard distribution.
The `tokenize` module included with Python 1.3 and Python 1.4 does not quite "match the working of the Python tokenizer exactly", as it claims. Specifically, the new double-star operator is not recognized, CR/LF is not accepted at the end of a line, FF is not accepted, and there is no support for triple-quoted strings or backslash-continuations of lines. The new module fixes the regex in `tokenize.tokenprog` to accept the double-star, but such a regex is only good for scanning individual lines of text.
So the new module (posted 1 April) includes a new function `tokenize.tokenize()` which will scan streams of text. The function accepts a readline-like method which is called to come up with the next input line (or "" for EOF) and a "token-eater" function. The "token-eater" function should accept five arguments: the type of the token, a string containing the token, the starting and ending (row, column) coordinates of the token, and the line itself. This function should match the working of the Python tokenizer, and will return INDENT and DEDENT tokens as the line indentation changes.
The information your "token-eater" function gets from `tokenize.tokenize()` should be enough to exactly reconstruct the original source script, if you need it. The `regurgitate` script below is an example of how to do this. The `cedit` script below is an example of using the tokenizer to colourize Python code in a simple Tk text-editing window.
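Here is a sketch of the token-eater arrangement. It assumes the `tokenize` module from current Python, where `generate_tokens()` yields the same five pieces of information as tuples instead of calling a callback, so the loop below does the calling itself. The names `collect_tokens` and `token_eater` are made up for the example.

```python
import io
import token
import tokenize

def collect_tokens(source):
    """Feed every token of source to a five-argument token-eater."""
    seen = []

    def token_eater(kind, text, start, end, line):
        # kind is a numeric token type; start and end are (row, column) pairs.
        seen.append((token.tok_name[kind], text, start, end))

    readline = io.StringIO(source).readline   # returns "" at EOF
    for tok in tokenize.generate_tokens(readline):
        token_eater(*tok)
    return seen

for entry in collect_tokens("if x:\n    y = x ** 2\n"):
    print(entry)
```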
- download <tokenize.py>
- download
- download
simple text-based Python "lint" script
This entry in the growing list of Python "lint" scripts has two (so far) unique features in particular:
- It can be told to import modules that you import in your code, so it will not complain about known symbols within those modules.
- It uses the aforementioned new `tokenize` module, which helps to make it concise.
The principle is simple -- any identifier which is seen only once in your script is considered suspect. Warnings are not generated for keywords or for built-in object methods (when used as methods); extra warnings are generated for identifiers that look like `__reserved__` words but aren't known.
With the `-i` option, this script will also import modules whenever it sees `import` statements in your script, so that if you use `string.split` only once, there won't be a complaint about `split` if you have imported `string` in your script.
To use this script, you need to also have the "tokenize.py" module mentioned above.
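The "seen only once" principle fits in a few lines. The sketch below is not the lint script itself: it assumes the `tokenize` module from current Python, skips keywords and built-in names only, and leaves out the method, `__reserved__`, and `-i` handling described above. The function name `suspicious_names` is made up for the example.

```python
import builtins
import io
import keyword
import tokenize
from collections import Counter

def suspicious_names(source):
    """Return identifiers that appear exactly once in source."""
    counts = Counter(
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME
        and not keyword.iskeyword(tok.string)
        and not hasattr(builtins, tok.string)
    )
    return sorted(name for name, count in counts.items() if count == 1)

# 'totl' is a typo for 'total'; 'items' is never defined.
sample = "total = 0\nfor item in items:\n    totl = total + item\nprint(total)\n"
print(suspicious_names(sample))
```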
- download
an interactive Tk-enabled shell (like `wish`)
This script uses the `_tkinter.createfilehandler` call and a simple Python interpreter written in Python to make it look like you're running Python the normal way in a terminal window, but still have live widgets in Tk windows, like `wish`. Funnily enough, this one's called `pywish`. With it, you can play with user interfaces in a quick and natural way:
    wheat[251]% pywish
    Python 1.4 (Mar 17 1997) [C] (pywish)
    Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
    >>> import sys
    >>> def hello():
    ...     print 'hi!'
    ...
    >>> b = Tkinter.Button(root, text='hi', command=hello)
    >>> b.pack()
    hi!
    hi!
    hi!
    >>> q = Tkinter.Button(root, text='quit', command=sys.exit)
    >>> q.pack()
    hi!
    hi!
    wheat[252]%
The "hi!" lines are not ones that i entered; they were printed on the screen when i pressed the "hi" button in the Tk window (first three times, then twice, and then i pressed the "quit" button, which exited to the shell prompt).
My thanks are due to Guido van Rossum for pointing out `createfilehandler` so i could produce this effect, and also for a tip on successful use of `compile` and `exec` with multi-line strings: tack on a few newlines.
- download
a Tkinter-based console component
This console features more tolerant pasting of code from other interactive sessions, better handling of continuations than the standard Python interpreter, highlighting of the most recently-executed code block, the ability to edit and reexecute previously entered code, a history of recently-entered lines, and automatic multi-level completion with pop-up menus. It will also present pop-up help based on documentation strings.
I plan to clean it up a little to make it more usable as a general component in other applications, but for now i'll just post it in its current state and hope you find it useful. You can just run this script directly to pop up a Console window.
- download <Console.py>
Roundup, a simple and effective bug-tracking system (see the short paper).
Roundup is out! (sort of) You will _really_ have to excuse the mess, and don't say i didn't warn you! It has been hurriedly extracted from its running implementation into a hacked-up copy -- but at least you should be able to run it yourself if you have a webserver setup.
- download roundup.tar.gz (32 kb)
htmldoc, a documentation generator that produces HTML documents from live Python objects.
You can see samples of the output from htmldoc (i ran it over the Python 1.5.2 standard library) at http://www.lfw.org/python/htmldoc/.
Done:
- documentation of user modules, classes, methods, and functions
- documentation of built-in functions
- automatic hyperlinking when classes and methods are mentioned
- detection of leading comments when no docstring is present
- overview of class inheritance tree within a module
- generation of HTML files from the command line
- all generated files are SGML compliant! :) :) (HTML 4.0 Transitional)
To do:
- no directory/package indexing yet
- simple HTTP server (so you can get fresh docs on your own modules)
The "htmldoc" module is actually quite small (only about 300 lines) as most of the hard work has been factored out into the "inspect" module -- a non-HTML-specific collection of routines for getting all kinds of information out of your Python objects. My favourite routine in "inspect" is inspect.getsource(object), which can get you the source code for a function, method, or class.
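As a quick taste of `inspect.getsource`, the sketch below pulls the source of a standard-library function. It assumes the `inspect` module as shipped with current Python and a Python installation that includes the library's `.py` files; the helper name `first_source_line` is made up for the example.

```python
import inspect
import textwrap

def first_source_line(obj):
    """Return the first line of obj's source code, as found by inspect."""
    return inspect.getsource(obj).splitlines()[0]

# textwrap.dedent is written in Python, so its source can be located.
print(first_source_line(textwrap.dedent))
print(inspect.getdoc(textwrap.dedent).splitlines()[0])
```

`inspect.getdoc` is shown alongside because a documentation generator needs both the docstring and, when it wants to show code, the source.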
- download htmldoc.py (12 kb)
- download inspect.py (18 kb)
pydoc, a documentation generator that works from the command line, in the Python interpreter, and as a web server in the background.
Examples:
    pydoc sys         # document a built-in module
    pydoc copy        # document a module written in Python
    pydoc types       # document a module written in Python
    pydoc abs         # document a built-in function
    pydoc repr.Repr   # document a single class
    pydoc -k mail     # keyword search like man -k
    pydoc -p 6789     # start a web server at http://localhost:6789/

    from pydoc import help
    help("getopt.getopt")   # document something you haven't imported
    import calendar
    help(calendar)          # document a live object
To get it, download these two files:
- download pydoc.py (54 kb)
- download inspect.py (26 kb)