Issue 2694: msilib file names check too strict ?

Created on 2008-04-26 04:18 by cdavid, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name, Uploaded, Description
make_id_fix_and_test.patch, markm, 2011-03-26 12:24, Patch to fix msilib.make_id() and test it
Messages (8)
msg65834 - (view) Author: Cournapeau David (cdavid) Date: 2008-04-26 04:18
Hi, I wanted to build an msi using the bdist_msi distutils command for one of my packages, but at some point it fails in the function make_id, at line 177 in msilib/__init__.py, for a file named aixc++.py. The regex indeed refuses any character which is not alphanumeric: is msi itself really that strict, or could this check be relaxed ?
msg65842 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-04-26 12:19
Indeed, the primary keys in many tables must be Identifiers, see http://msdn2.microsoft.com/en-us/library/aa369212(VS.85).aspx

make_id tries to synthesize an identifier from a file name, and fails for your file names.
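To illustrate the constraint, here is a minimal sketch of a validity check, assuming the Identifier rule as quoted later in this thread (ASCII letters, digits, underscores and periods, starting with a letter or underscore). The helper name `is_valid_identifier` is hypothetical and is not part of msilib.

```python
import re

# Assumed reading of the MSI Identifier rule: ASCII letters, digits,
# underscores and periods; the first character is a letter or underscore.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def is_valid_identifier(name):
    """Return True if name matches the assumed Identifier rule (hypothetical helper)."""
    return _IDENTIFIER_RE.fullmatch(name) is not None


print(is_valid_identifier("aixc++.py"))  # '+' is outside the allowed set
print(is_valid_identifier("aixc__.py"))  # only allowed characters
```

Under this reading, a file name such as aixc++.py cannot be used as an identifier unchanged, which is why it has to be mapped to something else.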
msg65845 - (view) Author: Cournapeau David (cdavid) Date: 2008-04-26 15:56
Ok, thanks for the information. It might be good to have a more informative error, though, such as one saying which characters are allowed when checking against a regex ?
msg65846 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-04-26 16:02
Actually, the algorithm should be fixed to generate a valid identifier for any input. Would you like to work on a fix?
msg65847 - (view) Author: Cournapeau David (cdavid) Date: 2008-04-26 16:13
It's not that I don't want to work on it, but I don't know anything about msi, except that some windows users of my packages request it :) So I would need some indication on what to fix exactly. Do I understand right that bdist_msi builds a database of the files, and that the identifiers could be named differently than the filenames themselves, as long as they are unique ?
msg65848 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-04-26 16:23
> Do I understand right that bdist_msi builds a database of the files, and
> that the identifiers could be named differently than the filenames
> themselves, as long as they are unique ?

Correct. As a design objective, I try to use identifiers close to the file names, to simplify debugging of the MSI file (Microsoft itself typically uses UUIDs instead).

In short, just make make_id generate valid identifiers. An algorithm on top of that will make them unique in case of conflicts.

Regards,
Martin
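The two-layer idea described here (first produce a valid identifier, then make it unique) could be sketched roughly as follows. This is not the msilib implementation: the function names `make_valid_id` and `make_unique_id`, the underscore replacement, and the numeric suffix scheme are all assumptions made for illustration.

```python
import string

# Characters assumed legal in an MSI Identifier (per the rule quoted in this thread).
IDENTIFIER_CHARS = string.ascii_letters + string.digits + "._"


def make_valid_id(name):
    """Map an arbitrary file name to a valid identifier (hypothetical sketch)."""
    # Replace every character outside the allowed set with an underscore.
    ident = "".join(c if c in IDENTIFIER_CHARS else "_" for c in name)
    # An identifier must start with a letter or an underscore.
    if not ident or ident[0] in string.digits + ".":
        ident = "_" + ident
    return ident


def make_unique_id(name, used):
    """Append a numeric suffix until the identifier is not already in `used`."""
    base = make_valid_id(name)
    candidate = base
    counter = 1
    while candidate in used:
        candidate = "%s.%d" % (base, counter)
        counter += 1
    used.add(candidate)
    return candidate


used_ids = set()
print(make_unique_id("aixc++.py", used_ids))
print(make_unique_id("aixc--.py", used_ids))  # collides after replacement
```

Keeping the result close to the file name, rather than generating an opaque key, follows the stated design objective of making the MSI file easier to debug.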
msg132232 - (view) Author: Mark Mc Mahon (markm) * Date: 2011-03-26 12:24
How about the following patch and tests...

Per: http://msdn.microsoft.com/en-us/library/aa369212(v=vs.85).aspx

"""The Identifier data type is a text string. Identifiers may contain the ASCII characters A-Z (a-z), digits, underscores (_), or periods (.). However, every identifier must begin with either a letter or an underscore."""

So the spec would say that colons are NOT allowed. Editing some entries in the File table of an MSI (using Orca from the MSI SDK) and running the validation confirms that. All the following were flagged as errors:

'KDiff3EXE;"ASDF@#$', 'chmFile-', 'pdfFile(', 'hgbook]', 'TortoisePlinkEXE]', 'Hg.Cämd'

I also did some speed testing (just in case non/regex might be slow)

Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> setup = 'import string\nidentifier_chars = string.ascii_letters + string.digits + "._"\ntmp_str = []'
>>> timeit("re.sub(r'[^a-zA-Z_\.]', '_', 'somefilename.txt')", setup = "import re")
4.434621757767205
>>> setup = 'import string\nidentifier_chars = string.ascii_letters + string.digits + "._"\ntmp_str = []'
>>> timeit('"".join([c if c in identifier_chars else "_" for c in "somefilename.txt"])', setup)
3.3757537425069906
>>>
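For anyone wanting to repeat the comparison, the two approaches from the session above can be wrapped in functions as in the sketch below. Note one assumption: the character class typed in the session, `[^a-zA-Z_\.]`, does not list digits, so it would replace digits as well; the sketch adds `0-9` so that both variants implement the same rule. The function names and the sample file name are illustrative only, and timings will differ by machine and Python version.

```python
import re
import string
from timeit import timeit

identifier_chars = string.ascii_letters + string.digits + "._"


def with_regex(name):
    # Digits are included here (assumption: the pattern in the session
    # above omits them, so it would turn digits into underscores too).
    return re.sub(r"[^a-zA-Z0-9_.]", "_", name)


def with_join(name):
    # Same rule expressed as a membership test per character.
    return "".join([c if c in identifier_chars else "_" for c in name])


name = "some file-name(2).txt"
# Both variants should agree before their timings are compared.
assert with_regex(name) == with_join(name)
print(with_regex(name))

for func in (with_regex, with_join):
    seconds = timeit(lambda: func(name), number=100000)
    print(func.__name__, seconds)
```

Neither variant handles the leading-character rule; both only replace disallowed characters.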
msg132543 - (view) Author: Mark Mc Mahon (markm) * Date: 2011-03-29 22:14
This issue has been fixed by changes made in and
History
Date User Action Args
2022-04-11 14:56:33 admin set github: 46946
2011-03-30 05:33:38 loewis set status: open -> closed; resolution: fixed
2011-03-29 22:14:07 markm set messages: +
2011-03-26 12:24:18 markm set files: + make_id_fix_and_test.patch; nosy: + markm; messages: +; keywords: + patch
2010-01-13 01:58:56 brian.curtin set priority: normal; stage: needs patch; versions: + Python 2.7, - Python 2.5
2008-04-26 16:23:53 loewis set messages: +
2008-04-26 16:13:55 cdavid set messages: +
2008-04-26 16:02:31 loewis set messages: +
2008-04-26 15:56:06 cdavid set messages: +
2008-04-26 12:19:34 loewis set nosy: + loewis; messages: +
2008-04-26 04:18:42 cdavid create