Re: [XML-SIG] pulldom CHARACTERS problem

I solved the problem and am replying to myself for the benefit of future googlers. SAX parsers may split one piece of text into several consecutive CHARACTERS events, so the pieces have to be joined back together. Since pulldom is built on a SAX parser, it may do this too. My method for finding the next CHARACTERS event and joining its text is below. It assumes that self.event, self.node = next(events) was executed previously.

from xml.dom import pulldom

def getCharacterNode(self, events):
    # Skip ahead to the next CHARACTERS event.
    while self.event != pulldom.CHARACTERS:
        self.event, self.node = next(events)
    # Collect this event's text plus the text of any CHARACTERS
    # events that immediately follow it.
    chars = [self.node.nodeValue]
    self.event, self.node = next(events)
    while self.event == pulldom.CHARACTERS:
        chars.append(self.node.nodeValue)
        self.event, self.node = next(events)
    # self.event/self.node now hold the first event after the text run.
    return ''.join(chars)
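The same joining idea can also be written as a generator that wraps the whole event stream, so callers never see split text. This is a sketch, not Grant's code: `merged_events` and `_join_run` are hypothetical names, it assumes Python 3's `xml.dom.pulldom`, and `bufsize=1` is used only to make split CHARACTERS events likely in a tiny document.

```python
import io
from xml.dom import pulldom


def merged_events(events):
    """Yield pulldom (event, node) pairs with CHARACTERS runs merged.

    Each run of consecutive CHARACTERS events becomes one event whose
    node is the run's first Text node, with its data set to the joined
    text (the first node is modified in place).
    """
    run = []  # Text nodes belonging to the current CHARACTERS run
    for event, node in events:
        if event == pulldom.CHARACTERS:
            run.append(node)
            continue
        if run:
            yield pulldom.CHARACTERS, _join_run(run)
            run = []
        yield event, node
    if run:  # the stream ended in the middle of a text run
        yield pulldom.CHARACTERS, _join_run(run)


def _join_run(run):
    """Store the joined text of all nodes in run on the first node."""
    first = run[0]
    first.data = ''.join(n.data for n in run)
    return first


if __name__ == '__main__':
    xml_bytes = b'<person><name>Name</name></person>'
    # bufsize=1 feeds the parser one byte at a time.
    doc = pulldom.parse(io.BytesIO(xml_bytes), bufsize=1)
    for event, node in merged_events(doc):
        if event == pulldom.CHARACTERS:
            print(repr(node.data))
```

Compared with the method above, the generator keeps no state on `self` and also handles a text run at the very end of the stream instead of letting `next()` raise `StopIteration`.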

Cheers, Grant

I am having a problem where I only get part of the text in a CHARACTERS node. I am using code like this:

import xml.dom.pulldom

doc = xml.dom.pulldom.parse(inFile)
events = iter(doc)
event, node = next(events)
if event == 'CHARACTERS':
    char = node.nodeValue

In my small tests it works fine, but with a large file (2 MB) errors start occurring. XML like

Name

sometimes produces char == 'N' or 'Na'. Where this happens and what it produces vary if I remove some nodes at the beginning of the file: the nodes I remove seem to parse fine, but a different later node then parses wrongly. I thought it might be a buffering problem, but this is only a 4-character string. I tried changing the buffer size with parse(inFile, None, 1), hoping for line buffering since the phrase Name always occurs on one line; this had no effect. I tried this with both Python 2.3.5 and 2.4. I have not installed PyXML.
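In current Python the signature is `xml.dom.pulldom.parse(stream_or_string, parser=None, bufsize=None)`, so the third positional argument is `bufsize`: the number of bytes read from the stream for each feed to the SAX parser, not a line-buffering switch. The sketch below (hypothetical helper `raw_text_pieces`, Python 3 assumed) collects the unmerged CHARACTERS values for one element at several buffer sizes. How the text is split depends on the parser and on where chunk boundaries fall, which fits the symptom of errors appearing only in a large file and moving when earlier nodes are removed. Only the joined pieces are reliable; reading a single event, as the snippet above does, can yield a fragment such as 'N' or 'Na'.

```python
import io
from xml.dom import pulldom


def raw_text_pieces(xml_bytes, bufsize):
    """Return the unmerged CHARACTERS values pulldom reports for xml_bytes."""
    doc = pulldom.parse(io.BytesIO(xml_bytes), bufsize=bufsize)
    return [node.data for event, node in doc if event == pulldom.CHARACTERS]


if __name__ == '__main__':
    xml_bytes = b'<person><name>Name</name></person>'
    for size in (1, 2, 3, 16384):
        pieces = raw_text_pieces(xml_bytes, size)
        # The pieces always join to the full text; code that keeps
        # only pieces[0] sees whatever the first chunk happened to hold.
        print(size, pieces, repr(''.join(pieces)))
```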

Any suggestions would be appreciated.

Cheers, Grant


XML-SIG maillist - XML-SIG@python.org http://mail.python.org/mailman/listinfo/xml-sig