Nmap Development: Interesting Zenmap encoding bug



From: David Fifield <david () bamsoftware com>
Date: Sun, 11 Oct 2009 21:46:03 -0600


Hi,

I had gotten some Zenmap crash reports that were variations on this theme:

  File "zenmapGUI\ScanNotebook.pyo", line 184, in _target_entry_changed
  File "zenmapCore\NmapOptions.pyo", line 719, in render_string
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 1: unexpected end of data

It looks like a UTF-8 string got truncated, because bytes 0xC2 through 0xF4 are lead bytes of multibyte UTF-8 sequences, and a lead byte with nothing after it is an incomplete sequence. The crash happened when something was entered in the target box, after the target string was split on whitespace. PyGTK returns the text content of its widgets in UTF-8, so that part wasn't surprising. I tried entering all kinds of characters that have multibyte UTF-8 representations, but I couldn't reproduce the crash. Then I got one report saying that the crash happened with the character à (which is shift-0 on a French keyboard).
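The failure mode in the traceback can be reproduced in isolation. This is a minimal sketch in Python 3 (the crash reports came from Python 2); the sample byte string is made up for illustration, chosen so that the stray lead byte sits at position 1 as in the report.

```python
# Hypothetical input: one ASCII byte followed by 0xC3, the lead byte of a
# two-byte UTF-8 sequence whose continuation byte has been cut off.
truncated = b"a\xc3"

error = None
try:
    truncated.decode("utf-8")
except UnicodeDecodeError as exc:
    # The decoder reports where the incomplete sequence starts and why it failed.
    error = exc
    print(exc)
```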

I could reproduce the crash with à (U+00E0), but what's interesting is that it wouldn't happen with á (U+00E1). The key is in their UTF-8 representations: à is C3 A0 while á is C3 A1. What matters is the second byte. Read on its own as a single-byte character in a code page such as Windows-1252, 0xA0 is U+00A0 NO-BREAK SPACE, while 0xA1 is U+00A1 INVERTED EXCLAMATION MARK. The NO-BREAK SPACE was being treated as whitespace. à in the target box was becoming the UTF-8 encoded byte string "\xc3\xa0", which the split function was turning into ["\xc3"], which when decoded led to an error because of the truncated sequence.
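The difference between the two characters can be checked directly. The sketch below is Python 3 and the helper name `describe` is made up for illustration; it encodes each character as UTF-8 and then looks at what the last byte means when read by itself as Windows-1252.

```python
import unicodedata

def describe(char):
    """Return (UTF-8 bytes, name of the last byte read as Windows-1252, whether it is whitespace)."""
    encoded = char.encode("utf-8")
    # Reinterpret only the trailing byte as a single-byte cp1252 character.
    last_as_cp1252 = encoded[-1:].decode("cp1252")
    return encoded, unicodedata.name(last_as_cp1252), last_as_cp1252.isspace()

for char in ("\u00e0", "\u00e1"):
    encoded, name, is_space = describe(char)
    print(char, encoded, name, is_space)
```

Only the trailing byte of à lands on a whitespace character, which is why á never triggered the crash.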

I was surprised that the split function would split on a non-ASCII character. In fact it doesn't by default, but apparently it does once the locale has been set on Windows. In other words,

>>> u'\u00e0'.encode('UTF-8')
'\xc3\xa0'
>>> '\xc3\xa0'.split()
['\xc3\xa0']
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'English_United States.1252'
>>> '\xc3\xa0'.split()
['\xc3']
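The session above is Python 2 on Windows. In Python 3, `bytes.split()` with no argument splits only on ASCII whitespace, so the locale-dependent behavior cannot be reproduced directly. The sketch below simulates it instead: `split_like_locale` is a hypothetical helper, not part of Python or Zenmap, and it approximates the C library's locale-aware whitespace test by decoding each byte in a single-byte code page and asking `str.isspace()`.

```python
def split_like_locale(data, codepage="cp1252"):
    """Split bytes on every byte that is whitespace when read in the given single-byte code page.

    This is an approximation for illustration; str.isspace() and the C
    library's isspace() do not agree on every byte.
    """
    pieces = []
    current = bytearray()
    for value in data:
        # Bytes undefined in the code page decode to U+FFFD, which is not whitespace.
        char = bytes([value]).decode(codepage, errors="replace")
        if char.isspace():
            if current:
                pieces.append(bytes(current))
                current = bytearray()
        else:
            current.append(value)
    if current:
        pieces.append(bytes(current))
    return pieces

print(split_like_locale("\u00e0".encode("utf-8")))  # the 0xA0 byte is eaten as a separator
print(split_like_locale("\u00e1".encode("utf-8")))  # both bytes survive
```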

The problem is fixed by decoding the byte string returned by PyGTK before processing it. (However, Nmap will likely choke once you try to run the scan, because it will see the UTF-8 bytes in the host specification once it is serialized for the command line.)
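The decode-first approach might look roughly like this in Python 3. The function name `split_targets` and the sample input are assumptions made for illustration; this is not the actual Zenmap change.

```python
def split_targets(raw):
    """Decode the widget's UTF-8 bytes first, then split the resulting text."""
    text = raw.decode("utf-8")
    # Splitting now works on whole characters, so a multibyte sequence
    # can no longer be cut in half.
    return text.split()

raw_entry = "\u00e0 example.com".encode("utf-8")  # made-up target box content
print(split_targets(raw_entry))
```

Because the split operates on characters rather than bytes, the 0xA0 byte inside à is never examined on its own.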

David Fifield


Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org

