Representing Unix filenames in Unicode


On 28 Nov 2005, at 03:39, Christopher JS Vance wrote:

> UTF-8, created as FSS-UTF, was invented specifically to enable its use
> for Unix/POSIX and similar filenames.

Right.

> The problem is people trying to create filenames which aren't UTF-8.
> Provided you use the same character set for all filenames, the problem
> was solved before the Unicode/10646 merger (see Plan 9 from Bell
> Labs).

Right, on a high level, the human interface level. But the problem
is that the same character set is not going to be used for all
filenames, especially when you mix filesystems. One cannot even be
sure that the Unicode/10646 set will be a final character set. On a
low level, the computer-to-computer interface level, almost all
filesystems do not interpret the byte strings used as filenames (only
one exception was quoted on the UNIX/POSIX list), and there is no
obvious benefit in doing so. For example, the case-insensitive Mac OS
HFS filesystem stores the filenames as is, and emulates the case
insensitivity through the interface functions that address it. By
overriding those functions, case sensitivity can be implemented, and
in addition, Apple now has a case-sensitive version. Most facts point
to Unicode/10646 being a human interface, not a computer-to-computer
interface. But it is not impossible to do it otherwise; it is just a
question of what might be most efficient.
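
To make the layering concrete, here is a minimal sketch in Python. It
is a toy model, not how HFS or any real filesystem is implemented:
the class ByteNameStore and the functions lookup_caseless and
display_name are hypothetical names invented for illustration, and
UTF-8 is only an assumed default encoding. The store keeps names as
uninterpreted bytes; case insensitivity and human-readable display
are added by functions on top of it.

```python
class ByteNameStore:
    """Toy directory: names are stored and compared as raw bytes."""

    def __init__(self):
        self._entries = {}

    def create(self, name: bytes, data: str) -> None:
        # The name is kept exactly as given; no encoding is assumed.
        self._entries[name] = data

    def lookup_exact(self, name: bytes):
        # Byte-for-byte comparison, the "computer to computer" level.
        return self._entries.get(name)

    def names(self):
        return list(self._entries)


def lookup_caseless(store, name: str, encoding: str = "utf-8"):
    """Case-insensitive lookup emulated above the byte-exact store."""
    wanted = name.casefold()
    for raw in store.names():
        try:
            candidate = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not text in the assumed encoding; cannot match
        if candidate.casefold() == wanted:
            return store.lookup_exact(raw)
    return None


def display_name(raw: bytes, encoding: str = "utf-8") -> str:
    """Human-interface view: undecodable bytes become U+FFFD."""
    return raw.decode(encoding, errors="replace")


store = ByteNameStore()
store.create("ReadMe.txt".encode("utf-8"), "hello")
store.create(b"caf\xe9", "a Latin-1 name, not valid UTF-8")
```

With this store, lookup_exact(b"readme.txt") finds nothing, while
lookup_caseless(store, "README.TXT") finds the entry. Replacing
lookup_caseless with lookup_exact at the interface is all it takes to
make the same stored names case sensitive. The Latin-1 name stays
reachable by its exact bytes even though it cannot be shown as UTF-8
text.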

Hans Aberg



This archive was generated by hypermail 2.1.5: Sun Nov 27 2005 - 23:37:30 CST