Request for pathlib.Path.as_unc()
Some Windows apps might not support paths longer than MAX_PATH (~260 characters) without the UNC prefix. There are two workarounds for bypassing this limit[1]:
- Ask the end user to globally opt in to remove the MAX_PATH limit.
- Pass a UNC path. A well-behaved Windows app will work with long UNC paths without requiring the user to opt in globally.
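For the first workaround, a script can at least check whether the machine-wide opt-in is in effect. This is a minimal sketch that reads the `LongPathsEnabled` value under `HKLM\SYSTEM\CurrentControlSet\Control\FileSystem`. The helper name `long_paths_enabled` is hypothetical. Even when the value is set, each application must also declare itself long-path aware in its manifest (python.exe does), so a set registry value does not guarantee that some other program will cope.

```python
import sys
from typing import Optional


def long_paths_enabled() -> Optional[bool]:
    """Hypothetical helper: report the global LongPathsEnabled opt-in."""
    if sys.platform != "win32":
        return None  # MAX_PATH is a Windows-only limit; nothing to report
    import winreg  # only available on Windows

    try:
        with winreg.OpenKey(
            winreg.HKEY_LOCAL_MACHINE,
            r"SYSTEM\CurrentControlSet\Control\FileSystem",
        ) as key:
            value, _value_type = winreg.QueryValueEx(key, "LongPathsEnabled")
    except OSError:
        return False  # key or value missing: treat as "not opted in"
    return value == 1
```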
However, pathlib.Path seems to have no way to get its UNC representation. So my proposal is adding a pathlib.Path.as_unc()[2] method. There are some questions, though, primarily: what will this do on non-Windows platforms? Just not exist there? Return the path unchanged? Raise an error?
[1] Assuming the app itself supports longer paths, which most modern Windows apps should.
[2] Feel free to bikeshed the name.
Monarch (Monarch) May 8, 2025, 8:18pm
I think I’ve got my terminology confused a bit. What I’m looking for here is the DOS device path syntax where:
```
$ foobar.exe C:\Test\Foo\oo\oo\oo\oo\oo\oo\oo.txt # error because it exceeds MAX_PATH
$ foobar.exe \\?\C:\Test\Foo\oo\oo\oo\oo\oo\oo\oo.txt # success
```
File path formats on Windows systems - .NET | Microsoft Learn specifically refers to these as “device paths” rather than UNC paths.
My use case is something along the lines of:
```python
import subprocess
from pathlib import Path

file = Path(...)  # Outside of my control
subprocess.run(("foobar.exe", file))
```
The above fails because file exceeds MAX_PATH, but foobar.exe does support longer paths via the \\?\ syntax. This is where I would like pathlib to save the day:
```python
file = Path(...)  # Outside of my control
subprocess.run(("foobar.exe", file.as_unc()))  # or .as_dos_device_path() but that's a mouthful
```
pf_moore (Paul Moore) May 8, 2025, 10:11pm
Isn’t the transformation simply to add \\?\ to the start of the path? That doesn’t seem like something that needs a dedicated method.
Monarch (Monarch) May 8, 2025, 10:53pm
Assuming I’m reading the Windows docs right:
DOS device paths are fully qualified by definition and cannot begin with a relative directory segment (. or ..). Current directories never enter into their usage. - source
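The transformation should be:
```python
from pathlib import Path

file = Path(...)
# A raw string literal cannot end in a single backslash, so spell the prefix with escapes.
unc = Path("\\\\?\\" + str(file.resolve(strict=True)))
```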
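The same transformation can be sketched on plain strings with ntpath, which makes it testable on any OS. This minimal sketch assumes the input is already an absolute drive-letter path (turning a relative path into one, as resolve() does, is left to the caller) and normalizes it first, because the \\?\ prefix switches off Windows' own handling of `/`, `.` and `..`. The name `to_dos_device_path` is hypothetical, and ntpath.normpath does not reproduce every Win32 normalization rule (for example, trimming trailing dots and spaces).

```python
import ntpath
import re

# A fully qualified DOS path: drive letter, colon, separator (C:\ or C:/).
_DRIVE_ABS = re.compile(r"^[A-Za-z]:[\\/]")


def to_dos_device_path(path: str) -> str:
    """Hypothetical helper: prefix an absolute drive-letter path for long-path use."""
    if path.startswith(("\\\\?\\", "\\\\.\\")):
        return path  # already a device path; leave it alone
    if not _DRIVE_ABS.match(path):
        raise ValueError(f"expected an absolute drive-letter path, got {path!r}")
    # The prefix disables Win32 path parsing, so '/', '.' and '..'
    # must be resolved here rather than by Windows.
    return "\\\\?\\" + ntpath.normpath(path)
```

For example, `to_dos_device_path("C:/Users/foo/../bar/img.jpeg")` gives `\\?\C:\Users\bar\img.jpeg`.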
I think the knowledge is obscure enough that a method to do the right thing is justified. We already have methods like as_posix(). There might also be edge cases I’m not aware of; I’m not a Windows expert by any means. This is what I’ve found after a bit of research, prompted by some failures caused by my paths being too long.
csm10495 (Charles Machalow) May 9, 2025, 1:16am
This gets a bit more confusing. In Windows you could have a path that leads to a network share. Something like:
\\csm10495-server\Z
is valid if csm10495-server is an accessible network location with a “Z” share. I don’t think just adding \\?\ to the front works in that case.
```
In [1]: from pathlib import *
In [2]: Path(r"\\csm10495-server\Z").resolve(strict=True)
Out[2]: WindowsPath('//csm10495-server/z/')
In [3]: Path('//csm10495-server/z/').exists()
Out[3]: True
In [4]: Path('\\\\?\\' + '//csm10495-server/z/').exists()
Out[4]: False
```
I think in that case it needs to be in the UNC namespace (if that’s the right term?):
```
In [5]: Path("\\\\?\\UNC\\" + '//csm10495-server/z/').is_dir()
Out[5]: True
```
So I guess there would have to be some sort of logic to know whether it needs UNC\ or not. I’m also not 100% sure whether the prefix should be \\.\ or \\?\. Even more interesting is that the UNC version that works in Python doesn’t work in the Windows Command Prompt.
```
C:\Windows\System32>dir "\\?\csm10495-server\Z"
The system cannot find the path specified.
C:\Windows\System32>dir "\\csm10495-server\Z"
Volume in drive \\csm10495-server\Z is Backup
Volume Serial Number is 3E4D-C8A9
Directory of \\csm10495-server\Z
<omitted>
C:\Windows\System32>dir "\\?\UNC\csm10495-server\Z"
The filename, directory name, or volume label syntax is incorrect.
```
If it were a simple prefix that was always right, I might be more on board with it, but for now I think it has some footguns, and until we fully understand them and are sure of the behavior, we shouldn’t add it to the stdlib.
Monarch (Monarch) May 9, 2025, 7:11am
That to me sounds like all the more reason why it should be in the stdlib. It’s not trivial to implement it yourself.
Monarch (Monarch) May 9, 2025, 1:05pm
Here’s an example implementation I wrote based on File path formats on Windows systems - .NET | Microsoft Learn.
```python
import ntpath
import re
from pathlib import WindowsPath as BasePath
from typing import Self


class WindowsPath(BasePath):
    def as_unc(self) -> Self:
        # They designate a legacy device (CON, LPT1).
        # Note: ntpath._reserved_names is a private CPython detail, not public API.
        if any(name == str(self) for name in ntpath._reserved_names):
            return type(self)(r"\\?" + rf"\{self}")
        # They are device paths; that is, they begin with two separators and a question mark or period (\\? or \\.).
        # They are UNC paths; that is, they begin with two separators without a question mark or period.
        if str(self).startswith((r"\\", "//")):
            return self
        # They are fully qualified DOS paths; that is, they begin with a drive letter, a volume separator, and a component separator (C:\).
        if re.match(r"^[A-Z]:[\\\/]", str(self.resolve()), re.IGNORECASE):
            return type(self)(r"\\?" + rf"\{self.resolve()}")
        return self


reserved = WindowsPath("CON")
print(reserved.as_unc())  #> \\?\CON
abs_path = WindowsPath(r"C:\Users\foobar\Pictures\img.jpeg").as_unc()
print(abs_path, abs_path.is_file())  #> \\?\C:\Users\foobar\Pictures\img.jpeg True
unc = WindowsPath(r"\\.\C:\Users\foobar\Pictures\img.jpeg").as_unc()
print(unc, unc.is_file())  #> \\.\C:\Users\foobar\Pictures\img.jpeg True
unc_posix_slash = WindowsPath(r"//?/C:/Users/foobar/Pictures/img.jpeg").as_unc()
print(unc_posix_slash, unc_posix_slash.is_file())  #> \\?\C:\Users\foobar\Pictures\img.jpeg True
rel = WindowsPath("img.jpeg").as_unc()
print(rel, rel.is_file())  #> \\?\C:\Users\foobar\dev\img.jpeg True
csm = WindowsPath("//csm10495-server/z/").as_unc()
print(csm)  #> \\csm10495-server\z\
```
csm10495 (Charles Machalow) May 9, 2025, 3:20pm
They are UNC paths; that is, they begin with two separators without a question mark or period.
I thought that means that paths that start with \\. or \\? aren’t UNC.
For file I/O, the "\\?\" prefix to a path string tells the Windows APIs to disable all string parsing and to send the string that follows it straight to the file system. For example, if the file system supports large paths and file names, you can exceed the MAX_PATH limits that are otherwise enforced by the Windows APIs.
From Naming Files, Paths, and Namespaces - Win32 apps | Microsoft Learn
Even then, that prefix only makes sense for local files. Network paths need UNC\ in there, as in my original example; otherwise, since parsing is off, the Windows API won’t resolve the network path.
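A minimal sketch of the network case, assuming the \\?\UNC\server\share form described in Microsoft's path documentation. The helper name `unc_to_device_path` is hypothetical; it only normalizes slashes and `.`/`..` segments before swapping the two leading separators for the `\\?\UNC\` prefix.

```python
import ntpath


def unc_to_device_path(path: str) -> str:
    # Hypothetical helper: \\server\share\rest  ->  \\?\UNC\server\share\rest
    p = path.replace("/", "\\")
    if p.startswith(("\\\\?\\", "\\\\.\\")):
        return path  # already a device path; leave it alone
    if not p.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path!r}")
    # Resolve '.' and '..' now, since parsing is off once the prefix is added.
    normalized = ntpath.normpath(p)
    # Drop the two leading separators; the UNC\ component replaces them.
    return "\\\\?\\UNC\\" + normalized[2:]
```

With the share from the earlier session, `unc_to_device_path("//csm10495-server/z/")` gives `\\?\UNC\csm10495-server\z\`.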
So it’s not to_unc() you’re after but something more like to_disable_winapi_string_parsing(). (Bikeshedding, I know, but getting a real UNC path is harder than just disabling WinAPI string parsing.)
I wonder if, for the original issue, a function calling GetShortPathNameW might be equally helpful (see “GetShortPathNameW function (fileapi.h) - Win32 apps | Microsoft Learn”).
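A minimal ctypes sketch of the GetShortPathNameW idea, assuming Windows and the documented kernel32 signature. The wrapper name `get_short_path_name` is hypothetical. Caveats: 8.3 short names only exist for paths that already exist on volumes with short-name generation enabled, the shortened path can still exceed MAX_PATH in deep trees, and Microsoft's documentation says an input longer than MAX_PATH must itself carry the \\?\ prefix.

```python
import ctypes
import sys


def get_short_path_name(path: str) -> str:
    """Hypothetical wrapper: 8.3 short form on Windows, unchanged path elsewhere."""
    if sys.platform != "win32":
        return path  # short names are a Windows-only concept
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    func = kernel32.GetShortPathNameW
    func.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD]
    func.restype = wintypes.DWORD

    # With a zero-sized buffer the call returns the required size
    # (including the terminating NUL), or 0 on failure.
    size = func(path, None, 0)
    if size == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    buf = ctypes.create_unicode_buffer(size)
    if func(path, buf, size) == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buf.value
```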
Monarch (Monarch) May 9, 2025, 3:39pm
Yep, a better approach might be splitting the functionality into two:
- as_dos_device_path or as_dos_device (akin to as_posix) that only deals with local paths
- as_unc for the rest
As always, both of the above are up for bikeshedding and possibly a better API.
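With the functionality split in two, a caller first has to decide which function applies. Here is a rough sketch of that dispatch step, following the path categories quoted from the .NET article earlier in the thread. The function and category names are hypothetical, and treating forward slashes like backslashes is a simplification.

```python
import re


def classify_windows_path(path: str) -> str:
    # Hypothetical classifier; '/' is treated like '\' (a simplification).
    p = path.replace("/", "\\")
    if p.startswith(("\\\\?\\", "\\\\.\\")):
        return "device"  # already \\?\ or \\.\ ; nothing to do
    if p.startswith("\\\\"):
        return "unc"  # \\server\share\... : would need \\?\UNC\
    if re.match(r"^[A-Za-z]:\\", p):
        return "dos_absolute"  # C:\... : can take a plain \\?\ prefix
    return "relative_or_other"  # img.jpeg, C:img.jpeg, \img.jpeg : resolve first
```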
barneygale (Barney Gale) May 9, 2025, 4:27pm
Interesting idea!
In order for this to land in pathlib, I think it would need to be added to ntpath first. IMO folks shouldn’t need to create Path objects if they’d rather use strings.
(edit: to be clear, this is not an endorsement, I’m just clarifying that we try to avoid pathlib-exclusive features nowadays)
Monarch (Monarch) May 9, 2025, 4:58pm
That’s fine by me. I would just like a way to get the DOS device path representation of paths to work around a very real limitation on Windows. While pathlib today does preserve the \\?\ prefix, there’s no way to go from a regular path to a DOS device path, and as you can see from the above discussion, it’s probably not simple enough that we should just ask the end user to implement a utility method or function.
For some prior art, Rust’s fs::canonicalize always returns a DOS device path (although this has led to other issues, such as these paths no longer working with software that is not long-path aware). The popular choice there seems to be dunce::canonicalize, a third-party replacement for fs::canonicalize whose documentation states:
This crate converts paths to legacy format whenever possible, but leaves UNC paths as-is when they can’t be unambiguously expressed in a simpler way. This allows legacy programs to access all paths they can possibly access, and UNC-aware programs to access all paths.
On non-Windows platforms these functions leave paths unmodified, so it’s safe to use them unconditionally for all platforms.
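For comparison, here is a much-simplified sketch of the reverse direction that the quote describes: strip the \\?\ prefix only when the result can safely be expressed as an ordinary path. The specific checks (length under MAX_PATH, no `.`/`..` segments, no component ending in a dot or space, no classic DOS device name) are assumptions chosen for illustration and are not dunce's actual rules. The function names are hypothetical.

```python
import ntpath

MAX_PATH = 260  # includes the terminating NUL, so usable length is 259
# Classic DOS device names (illustrative; Windows reserves a few more).
_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def _legacy_safe(path: str) -> bool:
    if len(path) >= MAX_PATH:
        return False
    _drive, rest = ntpath.splitdrive(path)
    for part in rest.split("\\"):
        if not part:
            continue
        if part in (".", "..") or part.endswith((".", " ")):
            return False  # Win32 parsing would rewrite these
        if part.split(".")[0].upper() in _RESERVED:
            return False  # would be interpreted as a device
    return True


def simplify_device_path(path: str) -> str:
    # Hypothetical dunce-style simplification (approximation only).
    if path.startswith("\\\\?\\UNC\\"):
        candidate = "\\\\" + path[len("\\\\?\\UNC\\"):]
    elif path.startswith("\\\\?\\") and path[4:5].isalpha() and path[5:7] == ":\\":
        candidate = path[4:]
    else:
        return path  # not a \\?\ path we know how to simplify
    return candidate if _legacy_safe(candidate) else path
```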
However, I’m not proposing to adopt the exact same behavior as Rust here. I’m not married to a specific API, so I’m open to any way this issue can be solved.