Decoding bencoded data with Python
I wrote this because bencode.py in the BitTorrent source doesn't really handle the nested bencoded data found in scrapes, and because this is on average 5-6 times faster than the Perl implementation. -- Hackeron
import re

try:
    import psyco  # Optional, 2.5x improvement in speed
    psyco.full()
except ImportError:
    pass

decimal_match = re.compile(r'\d')


def bdecode(data):
    '''Main function to decode bencoded data'''
    chunks = list(data)
    chunks.reverse()  # reversed so that pop() yields characters in order
    root = _dechunk(chunks)
    return root


def _dechunk(chunks):
    item = chunks.pop()

    if item == 'd':
        # Dictionary: alternating keys and values until the closing 'e'
        item = chunks.pop()
        result = {}
        while item != 'e':
            chunks.append(item)
            key = _dechunk(chunks)
            result[key] = _dechunk(chunks)
            item = chunks.pop()
        return result
    elif item == 'l':
        # List: values until the closing 'e'
        item = chunks.pop()
        items = []
        while item != 'e':
            chunks.append(item)
            items.append(_dechunk(chunks))
            item = chunks.pop()
        return items
    elif item == 'i':
        # Integer: digits (with optional sign) until the closing 'e'
        item = chunks.pop()
        num = ''
        while item != 'e':
            num += item
            item = chunks.pop()
        return int(num)
    elif decimal_match.search(item):
        # String: decimal length, a ':' separator, then that many characters
        num = ''
        while decimal_match.search(item):
            num += item
            item = chunks.pop()
        line = ''
        for i in range(int(num)):
            line += chunks.pop()
        return line
    # String exceptions are not valid in modern Python; raise a real one
    raise ValueError("Invalid input!")
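To try the decoder on nested data it helps to have some sample input. Below is a minimal sketch of the inverse operation, assuming plain str and int values only. The name bencode_sketch and the sample dictionary are made up for illustration (a real scrape uses 20-byte info hashes as keys, not 'abc'), and this is not the encoder from the BitTorrent source.

```python
def bencode_sketch(value):
    """Encode ints, strs, lists and dicts into a bencoded str (illustrative only)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but is not a bencode type
        raise TypeError('bool is not a bencode type')
    if isinstance(value, int):
        return 'i%de' % value
    if isinstance(value, str):
        return '%d:%s' % (len(value), value)
    if isinstance(value, list):
        return 'l' + ''.join(bencode_sketch(v) for v in value) + 'e'
    if isinstance(value, dict):
        # Dictionary keys must be strings and are written in sorted order
        parts = []
        for key in sorted(value):
            if not isinstance(key, str):
                raise TypeError('dictionary keys must be strings')
            parts.append(bencode_sketch(key) + bencode_sketch(value[key]))
        return 'd' + ''.join(parts) + 'e'
    raise TypeError('cannot bencode %r' % type(value))


# Hypothetical scrape-like structure: three levels of nested dictionaries
scrape = {'files': {'abc': {'complete': 5, 'downloaded': 50, 'incomplete': 10}}}
encoded = bencode_sketch(scrape)
print(encoded)
# prints d5:filesd3:abcd8:completei5e10:downloadedi50e10:incompletei10eeee
```

Passing encoded to bdecode above should give back a dictionary equal to scrape.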
Note: without psyco, this implementation turns out to be roughly twice as slow as the original Python implementation from Bram Cohen. Anybody interested in using those encoding and decoding methods can do so with a standalone package from the Cheese Shop.
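Much of the per-character work in the code above comes from popping one character at a time and building strings by concatenation. As a sketch of an alternative (not Bram Cohen's code, and not benchmarked here), a decoder can track an offset into the original string and slice out integers and strings in one step. The name bdecode_at is hypothetical, and the input is assumed to be a str.

```python
def bdecode_at(data, pos=0):
    """Decode one value starting at data[pos]; return (value, next_pos).

    Truncated input surfaces as IndexError or ValueError from the
    indexing and index() calls below.
    """
    marker = data[pos]
    if marker == 'i':
        # Integer: everything between 'i' and the next 'e'
        end = data.index('e', pos)
        return int(data[pos + 1:end]), end + 1
    if marker == 'l':
        pos += 1
        items = []
        while data[pos] != 'e':
            value, pos = bdecode_at(data, pos)
            items.append(value)
        return items, pos + 1  # skip the closing 'e'
    if marker == 'd':
        pos += 1
        result = {}
        while data[pos] != 'e':
            key, pos = bdecode_at(data, pos)
            value, pos = bdecode_at(data, pos)
            result[key] = value
        return result, pos + 1  # skip the closing 'e'
    if marker.isdigit():
        # String: "<length>:<characters>", taken with a single slice
        colon = data.index(':', pos)
        start = colon + 1
        end = start + int(data[pos:colon])
        return data[start:end], end
    raise ValueError('invalid bencoded data at offset %d' % pos)


sample = 'd5:filesd3:abcd8:completei5eeee'
value, end = bdecode_at(sample)
print(value)
```

Returning the next offset alongside the value lets nested lists and dictionaries resume exactly where the inner call stopped, which is the job the shared chunks list does in the version above.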