Issue 12015: possible characters in temporary file name is too few (original) (raw)

process

Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, eric.araujo, jwilk, methane, miss-islington, ncoghlan, neologix, pitrou, planet36, python-dev, rhettinger, vstinner
Priority: normal Keywords: easy, needs review, patch

Created on 2011-05-06 05:51 by planet36, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
tempfile.py.diff planet36,2011-05-06 05:51 Patch to revert the character set of temporary file names to [A-Za-z0-9_] review
tempfile.py.diff planet36,2011-05-10 02:54 Choose 8 random characters rather than 6. review
Pull Requests
URL Status Linked Edit
PR 6425 merged miss-islington,2018-04-09 00:43
PR 6426 merged miss-islington,2018-04-09 00:44
Messages (20)
msg135261 - (view) Author: Steve Ward (planet36) Date: 2011-05-06 05:51
In this revision <http://hg.python.org/cpython/rev/b6da97930f63>, the possible characters that comprise a temporary file decreased. It was noted in this thread <http://mail.python.org/pipermail/python-dev/2010-November/105452.html>. The old character set had more than 24 times as many possibilities! [1] According to POSIX, the possible characters from which temporary file names should be constructed is [A-Za-z0-9._-]. [2] The character set should be restored to [A-Za-z0-9_] at the least. A patch is attached. The practices of other platforms, libraries, and languages are listed below. [3-12] [1] Old character set: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_ (26+26+10+1) ^ 6 62523502209 Current character set: abcdefghijklmnopqrstuvwxyz0123456789_ (26+10+1) ^ 6 2565726409 62523502209 / 2565726409 24.36873315474378000994 [2] POSIX <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_276> 3.276 Portable Filename Character Set The set of characters from which portable filenames are constructed. A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 . _ - [3] int mkstemp(char *template); <http://pubs.opengroup.org/onlinepubs/9699919799/functions/mkstemp.html> The string in template should look like a filename with six trailing 'X' s; mkstemp() replaces each 'X' with a character from the portable filename character set. [4] glibc <http://sourceware.org/git/?p=glibc.git;a=blob_plain;f=sysdeps/posix/tempname.c;hb=HEAD> /* These are the characters used in temporary filenames. */ static const char letters[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; [5] gnulib <http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/tempname.c;hb=HEAD> /* These are the characters used in temporary file names. */ static const char letters[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; [6] freebsd libc <http://svnweb.freebsd.org/base/head/lib/libc/stdio/mktemp.c?view=markup> static const unsigned char padchar[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"; [7] openbsd libc <http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/stdio/mktemp.c?rev=HEAD;content-type=text%2Fplain> #define TEMPCHARS "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789" [8] netbsd libc <http://cvsweb.netbsd.org/bsdweb.cgi/~checkout~/src/lib/libc/stdio/gettemp.c?rev=HEAD&content-type=text/plain> (uses pid) [9] perl <http://perl5.git.perl.org/perl.git/blob/HEAD:/cpan/File-Temp/Temp.pm> my @CHARS = (qw/ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 _ /); [10] php <http://svn.php.net/viewvc/php/php-src/trunk/main/php_open_temporary_file.c?view=markup> (if Windows, call GetTempFileName; else if havel mkstemp, call it; else call mktemp) [11] ruby <http://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/lib/tmpdir.rb?view=markup> (uses current date, pid, and a random number between 0 and 4294967296 represented in base 36 (lowercase)) [12] Windows <http://msdn.microsoft.com/en-us/library/aa364991.aspx> \
.TMP Component	Meaning 	Path specified by the lpPathName parameter 
	First three letters of the lpPrefixString string 	Hexadecimal value of uUnique Only the lower 16 bits of the uUnique parameter are used. This limits GetTempFileName to a maximum of 65,535 unique file names if the lpPathName and lpPrefixString parameters remain the same.
msg135264 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-05-06 06:31
- How is it an issue? is the number of combinations really important to you? - Isn't there a greater risk of collisions on a case-insensitive filesystem?
msg135265 - (view) Author: Steve Ward (planet36) Date: 2011-05-06 08:00
It is an issue because... 1) Python falls out of line with most other popular systems and languages. 2) There was never a good reason to decrease the entropy in the first place. If there was, the reason of case-sensitive files systems was not given. 3) If case-sensitivity was an issue, why wasn't it raised before? For more than 8 years (2002-08-09 till 2010-11-09) the issue never warranted a change. The initial commit is here. http://svn.python.org/view?view=revision&revision=28113
msg135336 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-05-06 17:15
FWIW: Conformity with popular systems is not a goal.
msg135602 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2011-05-09 16:05
Python should still work reliably even if the temp directory is on a case-insensitive filesystem. Since that isn't an easy thing to determine, -1 on restoring the uppercase characters to the filenames. As for it not being fixed previously, it's a rare bug that only affects case-insensitive filesystems. Just because other groups have decided not to care about it (typically due to a heavy bias towards case-sensitive POSIX filesystems), that is no reason for us to ignore it once we're aware of it. If we really wanted to increase the entropy, better to increase the number of random characters included in the name to 7 or 8 (with each additional character increasing the pool size by a factor of 37). After all, why perpetuate an arbitrary restriction that is due primarily to the difficulty of constructing nice string manipulation interfaces in C?
msg135603 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-05-09 16:09
If you would like more entry, use a longer filename: >>> (26+10+1) ** 6 > (26+26+10+1) ** 6 False >>> (26+10+1) ** 7 > (26+26+10+1) ** 6 True
msg135604 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2011-05-09 16:15
The number of characters in the filename is hardcoded in the guts of tempfile (in _RandomNameSequence.__next__ to be exact). Increasing that to 7 (or even 8) would be a straightforward change at the library level, but it can't be done from user code.
msg135605 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-09 16:20
Could someone explain me what's the risk on a case-insensitive filesystem? Since files are created with O_CREAT|O_EXCL, I don't see where the problem is.
msg135607 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2011-05-09 16:34
Basically, adding the uppercase letters back in would provide no gain in entropy on a case-insensitive filesystem (as the upper and lower case letters will collide). Increasing the length of the random sequence would provide a consistent gain regardless of filesystem properties.
msg135609 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-05-09 16:37
+1 for increasing the number of random characters to 8 -1 for restoring uppercase letters
msg135611 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2011-05-09 16:48
Minor clarification: after my first comment, I realised that the original change in the character set didn't actually fix anything, it just propagated the effectively smaller sample space on case-insensitive filesystems to all filesystems. Increasing the number of random characters to 8 will more than compensate for that reduction while retaining the cross-platform consistency in behaviour.
msg135612 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-09 16:50
@Nick I fully agree with you, but my point was that it doesn't make it less safe on case-insensitive filesystems. Apart from that, it's of course better to increase the length of the random sequence.
msg135615 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2011-05-09 17:06
Oh, you mean the security risk if the temporary names can be guessed? My recollection is that it is more of a problem for temporary directories than it is for temporary files, since the module can't control how files inside a temp directory get created. It's been a long time since I read anything detailed on the threat models and attack scenarios mkstemp() and friends were designed to handle though, so Google may be a better source of answers on that front.
msg135644 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-05-09 20:49
If you care about security, you should not use the Python random module because it is not cryptographic. Oh oh, ssl doesn't expose RAND_bytes().
msg135646 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-09 20:54
> If you care about security, you should not use the Python random > module because it is not cryptographic. Oh oh, ssl doesn't expose > RAND_bytes(). Feel free to open a separate issue and even provide a patch!
msg195103 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-08-13 23:28
New changeset de5077aca668 by Victor Stinner in branch 'default': Close #12015: The tempfile module now uses a suffix of 8 random characters http://hg.python.org/cpython/rev/de5077aca668
msg195104 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-08-13 23:35
With 8 lowercase characters, the entropy is 41.7 bits, whereas it is only 35.9 bits for 6 characters with uppercase and lowercase letters. >>> math.log(26+26+10+1, 2) * 6 # (a-zA-Z0-9_) x 6 35.8636795409995 >>> math.log(26+10+1, 2) * 6 # (a-z0-9_) x 6 31.256720193773702 >>> math.log(26+10+1, 2) * 8 # (a-z0-9_) x 8 41.6756269250316 My changeset improves the entropy, it is now higher than with the old charset. I don't know if it is enough or not to be safe. systemd creates a temporary directory per service. Linux 3.11 will add a new O_TMPFILE to open() which allow to create a file with no name. Using the flag should help to workaround the race condition attack. See #18673 for O_TMPFILE.
msg315101 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-04-09 00:42
New changeset 9c463ec88ba21764f6fff8e01d6045a932a89438 by INADA Naoki (Wolfgang Maier) in branch 'master': Update docstring of tempfile._RandomNameSequence (GH-6414) https://github.com/python/cpython/commit/9c463ec88ba21764f6fff8e01d6045a932a89438
msg315102 - (view) Author: miss-islington (miss-islington) Date: 2018-04-09 01:01
New changeset d964f51f813282171d4da831e8de0fe003253e9e by Miss Islington (bot) in branch '3.7': Update docstring of tempfile._RandomNameSequence (GH-6414) https://github.com/python/cpython/commit/d964f51f813282171d4da831e8de0fe003253e9e
msg315104 - (view) Author: miss-islington (miss-islington) Date: 2018-04-09 01:26
New changeset 335efd7c252799eeeb8cbf51d178b1b897a91ae2 by Miss Islington (bot) in branch '3.6': Update docstring of tempfile._RandomNameSequence (GH-6414) https://github.com/python/cpython/commit/335efd7c252799eeeb8cbf51d178b1b897a91ae2
History
Date User Action Args
2022-04-11 14:57:17 admin set github: 56224
2018-04-09 01:26:59 miss-islington set messages: +
2018-04-09 01:01:56 miss-islington set nosy: + miss-islingtonmessages: +
2018-04-09 00:44:49 miss-islington set pull_requests: +
2018-04-09 00:43:50 miss-islington set pull_requests: +
2018-04-09 00:42:49 methane set nosy: + methanemessages: +
2014-01-30 14:49:11 jwilk set nosy: + jwilk
2013-08-13 23:35:48 vstinner set messages: +
2013-08-13 23:28:40 python-dev set status: open -> closednosy: + python-devmessages: + resolution: fixedstage: patch review -> resolved
2013-08-03 13:38:08 neologix set keywords: + easy, needs reviewstage: patch reviewtype: enhancementversions: + Python 3.4, - Python 3.3
2011-05-10 02:54:07 planet36 set files: + tempfile.py.diff
2011-05-09 20:54:15 pitrou set messages: +
2011-05-09 20:49:22 vstinner set messages: +
2011-05-09 17:06:36 ncoghlan set messages: +
2011-05-09 16:50:08 neologix set messages: +
2011-05-09 16:48:15 ncoghlan set messages: +
2011-05-09 16:37:34 rhettinger set messages: +
2011-05-09 16:34:58 ncoghlan set messages: +
2011-05-09 16:20:52 neologix set nosy: + neologixmessages: +
2011-05-09 16:15:24 ncoghlan set messages: +
2011-05-09 16:09:03 vstinner set nosy: + vstinnermessages: +
2011-05-09 16:05:26 ncoghlan set messages: +
2011-05-06 17:15:59 eric.araujo set nosy: + ncoghlan, pitrou
2011-05-06 17:15:33 eric.araujo set nosy: + eric.araujomessages: + versions: - Python 3.1, Python 3.2, Python 3.4
2011-05-06 08:13:03 amaury.forgeotdarc set nosy: + rhettinger
2011-05-06 08:00:27 planet36 set messages: +
2011-05-06 06:31:22 amaury.forgeotdarc set nosy: + amaury.forgeotdarcmessages: +
2011-05-06 05:51:46 planet36 create