Definitive XML Application Development (ISBN 9780130889027)
Preface
This book was written to help you develop applications that use XML. It focuses on general principles and techniques, aiming to give you knowledge that will remain valuable even after the standards and tools described have evolved a few iterations further than they are today.
The text approaches XML by asking questions like: What problems is XML used to solve? What general approaches to these problems exist, and what tools support them? What other technologies is XML related to? How can XML be used with these other technologies? After reading the book you should, whenever you need to do something with XML, be able to think of several different ways of solving your problem and to choose the best of these.
Who is this book for?
This book is written for developers, and much of it requires knowledge of programming. In general, the reader is expected to have done enough object-oriented programming to know what a class or a method is. The chapters in the first part of the book, as well as the first two chapters on XSLT, do not require programming knowledge and could probably be useful to anyone who is familiar with XML.
This book is not an introduction to XML, as it assumes that you know what XML is and have some familiarity with its main features. It does not assume that you are an expert, however, and will explain many of the subtler aspects of XML that have consequences for software development.
Although the book uses Python in the source code examples, knowing Python is not a prerequisite, since the book includes Appendix A, "A lightning introduction to Python," on page 1054. Readers who are not familiar with Python are strongly encouraged to read this appendix before going on to the rest of the book.
What the book covers
The book begins with a look at XML from the point of view of software development, comparing it to other related technologies. Many of the subtler aspects of XML, the XML family of standards, as well as their relationship to software development are also examined. Much space is devoted to the principles of XML software development, using parsers, and the existing techniques for development.
Three chapters are dedicated to each of the two most important XML programming APIs: SAX and the DOM. Two chapters are dedicated to XSLT. In addition to these standards, several lesser-known APIs, tools, and technologies are described. Some are included because of their utility, others are meant to put the main technologies in perspective.
The last part of the book describes XML application design issues in more detail and provides some larger examples that present complete XML applications or toolkits.
In Appendix C, "Python XML packages," there is a description of various distributions of Python XML software and how to install each of these distributions. If you are new to XML processing with Python, it is probably a good idea to look over this appendix before starting to read the tool-related parts of the book. Installing the tools so that you have them available and can play around with them as you read may also be a good idea.
The programming language
Python is a very high-level programming language that is unusually well suited for information-centric program development, since it has excellent support for creation and manipulation of data structures. It is a simple language, in many ways similar to the more widespread languages, such as Java, C++, and Visual Basic, but easier to understand and use.
This means that even though you may not understand Python now, you will be able to learn it quickly. In general, I have found that developers need to study Python for two days in order to be able to contribute usefully to projects. And since Python has so much in common with other languages, you should be able to make use of what you learn even if you usually develop in other languages.
This book mainly uses Python in examples and does in fact have a general bent towards Python. Why this is so, and what is so interesting about Python, may not be immediately obvious to you, so this section explains what Python is and why it is so interesting. However, even though the book uses Python, it is intended to be useful to all XML programmers, regardless of what programming languages they know or want to do XML programming in.
What is it?
Python is a programming language. It has often been called a scripting language, but I think this is a little misleading. The image that the term "scripting language" evokes in me is of a simple little language, dynamically typed and easy to use for amateurs, unsuitable for large applications, not as powerful as a "real" programming language, and definitely slower.
Python, however, is very much a "real" programming language, but at the same time it has some of the characteristics of a scripting language. It is simple, it is very dynamic, it is easy to use for amateurs, and it is slower than compiled languages such as C++, Eiffel, and Common Lisp. At the same time, however, it is very powerful, certainly every bit as powerful as Java, if not more, and eminently suitable for large applications. Among the things that have been written in Python are CORBA ORBs, Web browsers, relational database engines, validating XML parsers, and a full XSLT engine.
I often describe it as "Perl done right," and Python does have a lot in common with Perl. It is a scripting-like language, very suitable for text processing and systems programming, with excellent operating system integration and with many of the same features. (In fact, Perl's object-oriented features are modeled on Python's object model.) Python is also like Perl in that it was created by a single person (Guido van Rossum) for his own needs, it used to be distributed as a single widely-ported open source interpreter implementation (there is now more than one), it is closely connected to the Internet and Unix, etc., etc.
At the same time, Python has much in common with Java, in that it is dynamic (much more so than Java) and object-oriented, has exceptions, has a very similar package model, and supports in-program documentation. Like Java byte-code, Python byte-code can also be transferred across a network and executed in a restricted environment.
I am something of a programming language freak and have done development in at least a dozen different programming languages, and studied many more. In my experience, Python stands out because it is so easy and natural to develop in, something that makes Python development just plain nice and fun. Returning to Java or C++ after doing Python development simply feels painful and awkward. I think this is because Python is so clean, simple, and predictable, with few surprises or restrictions and with a large set of ready-made and easy-to-use libraries. Paul Prescod (affectionately known as the "St. Paul" of Python evangelism) has said that "Python is a language that gets its tradeoffs exactly right," which sums it up pretty well.
A common denominator
Another reason for choosing Python is that no matter which programming language the reader is already used to, Python should be easy to pick up, at least well enough to read. The syntax is clear and simple, and the concepts in the language are very similar to those of mainstream languages such as Java, C, C++, Visual Basic, and Perl. So Python should not be an obstacle for any reader. In fact, it has often been described as "executable pseudo-code," and you will see it used as pseudo-code in some parts of the book.
Furthermore, using Python does not limit us to a single platform. Python runs just as well on Unix as it does on Mac or Windows, or even on a Psion palmtop or a VMS machine.
Python can talk to anything
One of the most appealing aspects of Python is that it is very well integrated with the rest of the world. This means that choosing Python hardly ever shuts you off from some technology or system that you would like your programs to interact with. For example, Microsoft fans will quickly discover that the Windows version of Python can talk to COM objects, create COM servers, connect to ActiveX, DDE, the Win32 API, the Windows registry, MFC, Windows Scripting Host, ADO, ODBC, and so on and so forth. In other words, even though Python is highly portable, you don't have to give up anything under Windows just because you use Python.
Many people, however, prefer to use something other than Windows, such as the Mac. Python is technologically agnostic, so it allows these people to have their way as well. Python runs on Mac, and the Mac version can access the communications toolbox, the font manager, the speech manager, the sound manager, the QuickTime services, and so on.
Other people believe in Unix and would rather use Python there. Again, this is no problem: the Unix versions of Python fit very well into Unix, and there are bindings for things such as Qt, KDE, Gtk, GNOME, Irix and Solaris sound modules, special Linux APIs, etc.
Yet others would like to remain pure, platform-wise, and prefer a strictly operating system-independent platform such as Java. Python can accommodate these people too! Jython (the interpreter formerly known as JPython) is an implementation of Python written in 100% Java which lets you run Python programs inside the Java virtual machine. You can use this as an embedded scripting language for an application, or simply write Python programs with full access to the nice Java stuff such as Swing, JDBC, Jini, RMI, etc.
And of course, apart from the platform issues, most of us would like to be able to speak Internet protocols and connect to other independent technologies. Again, Python can help. There are several ways to connect Python with CORBA, a standardized relational database API (a JDBC for Python), lots of XML tools (of course), LDAP modules, and so on. And the interpreter comes with libraries supporting FTP, HTTP, gopher, NNTP, SMTP, IMAP, POP, HTML, URL parsing, and much more out of the box.
To put it another way, Python is buzzword-friendly and TLA-compatible. (TLA stands for Three-Letter Acronym, a term often used as a synonym for technologies, since many of them have three-letter acronyms.)
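As a small illustration of the "out of the box" claim for URL parsing, here is a minimal sketch. It assumes a Python 3 interpreter, where the URL parsing support lives in `urllib.parse`. The function name `describe` and the example.com address are hypothetical, chosen only for this illustration.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into the pieces a client would need to fetch it.

    `describe` is a hypothetical helper, not part of any library.
    """
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # protocol name, e.g. "http"
        "host": parts.hostname,   # server name, lower-cased by urlsplit
        "path": parts.path,       # resource path on that server
    }


# Hypothetical address used purely as sample input.
info = describe("http://www.example.com/docs/index.html")
```

Nothing beyond the standard library is imported, which is the point the paragraph above makes about the interpreter's bundled modules.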
Python is a natural fit for XML programming
Whenever I have to write an XML program of some sort, I usually think of Python first as the programming language to write it in. There are several reasons for this, the most important being that Python is so easy and natural to program in and that it is very well suited for text processing. It is also very easy to build data structures in Python, something that is very important for XML processing.
Another thing is that for anything that involves moving information between different systems, Python is a natural choice, given that whatever these systems may be, Python can very likely talk to them.
Also, Python is highly suitable for the many little programs and scripts that you write to do the small but necessary tasks that usually appear during a project. Doing everything in Python makes it easy to turn prototypes into full programs, and it also means that whenever a little script has to be developed further, the full toolbox implemented for that application is already available to you.
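The point about building data structures during XML processing can be made concrete with a short sketch. It assumes a Python 3 interpreter and uses the `xml.sax` module from the standard library. The sample document, the class name `TitleCollector`, and the function `collect` are all made up for this illustration and are not taken from the book's own examples.

```python
import xml.sax
from collections import Counter

# Hypothetical sample document, invented for this illustration.
SAMPLE = b"""<library>
  <book><title>First</title></book>
  <book><title>Second</title></book>
</library>"""


class TitleCollector(xml.sax.ContentHandler):
    """Gathers <title> text into a list and counts element names."""

    def __init__(self):
        super().__init__()
        self.titles = []          # finished titles, in document order
        self.counts = Counter()   # how often each element name occurs
        self._buffer = None       # text pieces of the <title> being read

    def startElement(self, name, attrs):
        self.counts[name] += 1
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so collect them.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def collect(document):
    """Parse `document` (bytes) and return (titles, element counts)."""
    handler = TitleCollector()
    xml.sax.parseString(document, handler)
    return handler.titles, dict(handler.counts)


titles, counts = collect(SAMPLE)
```

The handler ends up with an ordinary list and an ordinary dictionary, so the rest of a program can work with the extracted information without touching the parser again.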
Errors
Of course, I made changes after all the reviewers read the text, and in so doing no doubt introduced new errors. The honour for these errors, as well as for those so subtly hidden that they escaped the eyes of all my reviewers, and even the Editor himself, I should like to reserve for myself. You will find the best of these listed on http://www.garshol.priv.no/download/text/ph1/errata.html. If you spot one that is not on the list already, I would like to hear about it.
Uptodateness
The main problem with books about Internet technology is that the technology changes so quickly that books rapidly become dated. To help you decide whether the book is up to date or not, here is a list of the various standards and tools covered in this book, and the version of each that the book is based on.
In the table above, WD is used as an abbreviation for "Working Draft," and CR for "Candidate Recommendation," both meaning W3C specifications that are still work in progress.