mod_fileiri development notes
mod_fileiri: new Apache module under development
For contributions (comments, testing, ports, patches, ...), please contact Martin Dürst.
Source
Documentation
See XML file in CVS (still a lot to be done)
Talks
Internationalized Resource Identifiers (IRIs) - Server-side Implementation, 24th Internationalization & Unicode Conference in Atlanta, GA, USA, 4 September 2004.
Things to work on next:
- Think through the end conditions in the code and double-check them (we may be able to stop sooner in some cases)
- Check conflicts (e.g. both UTF-8 and legacy files exist)
- Test with Off directories and nonconvertible names
- Work through logging details
- Check what AllowOverride category(ies) the directives should go into (currently Options, maybe also FileInfo)
- Look at port to Apache 1.3
Things that work:
- Interaction with mod_speling
- Nested directories
- Multiple languages/encodings
- External redirects to UTF-8 (with Location: header; permanent)
- Double redirects (first external to UTF-8, then internally back to legacy)
- Only one external redirect per overall request
- Backwards mode with files in UTF-8 but requests in legacy
Directives (this may be outdated, not totally sure):
- FileIRI: main option, settings:
  - On: Pathnames and filenames in legacy encoding, indicated by FilenameCharset. External permanent redirect to UTF-8 if necessary, serving content from UTF-8 URI.
  - Backwards: Pathnames and filenames in UTF-8. External permanent redirect to UTF-8 if request in OldFilenameCharset.
  - Off: No conversion is done. Any other/undefined value is equivalent to Off.
- FilenameCharset: character encoding of pathnames and filenames in the directory (or below)
- OldFilenameCharset: former character encoding of pathnames and filenames in the directory (or below), before switching to UTF-8.
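To illustrate the On setting, here is a minimal Python sketch that computes where an external redirect to UTF-8 would point. It is not the module's code. The function name `utf8_redirect_target` is hypothetical, and the rule "anything that is not valid UTF-8 is in the legacy encoding" is an assumption made for the example. Some legacy byte sequences are also valid UTF-8, so a real implementation needs more care.

```python
from urllib.parse import quote, unquote_to_bytes


def utf8_redirect_target(request_path, filename_charset):
    """Return the UTF-8 escaped path to redirect to, or None if none is needed.

    Hypothetical helper: assumes that a path which does not decode as
    UTF-8 is in the legacy encoding named by filename_charset.
    """
    raw = unquote_to_bytes(request_path)
    try:
        raw.decode("utf-8")
        return None  # already UTF-8, so no external redirect
    except UnicodeDecodeError:
        pass
    text = raw.decode(filename_charset)  # may raise for nonconvertible names
    return quote(text, safe="/")  # quote() escapes as UTF-8 by default


print(utf8_redirect_target("/r%E9sum%E9", "iso-8859-1"))        # /r%C3%A9sum%C3%A9
print(utf8_redirect_target("/r%C3%A9sum%C3%A9", "iso-8859-1"))  # None
```

The returned value would go into the Location: header of the permanent redirect; nonconvertible names surface as a `UnicodeDecodeError` in this sketch.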
FileIRI | Request encoding | Server file encoding | Action |
---|---|---|---|
Off | UTF-8 | UTF-8 | 200 OK |
Off | legacy | UTF-8 | 404 not found |
Off | UTF-8 | legacy | 404 not found |
Off | legacy | legacy | 200 OK |
On | UTF-8 | UTF-8 | ?? |
On | legacy | UTF-8 | ?? |
On | UTF-8 | legacy | internal redirect |
On | legacy | legacy | External redirect->internal redirect |
Backwards | UTF-8 | UTF-8 | 200 OK |
Backwards | legacy | UTF-8 | internal redirect |
Backwards | UTF-8 | legacy | 404 not found |
Backwards | legacy | legacy | 200 (accidental) |
Only | UTF-8 | UTF-8 | 404 not found |
Only | legacy | UTF-8 | 404 not found |
Only | UTF-8 | legacy | 200 OK |
Only | legacy | legacy | 404 not found |
2-d table:
Request encoding | UTF-8 | legacy | UTF-8 | legacy |
---|---|---|---|---|
Server file encoding | UTF-8 | UTF-8 | legacy | legacy |
FileIRI Off | 200 OK | 404 not found | 404 not found | 200 OK |
FileIRI On | ?? | ?? | internal redirect | External redirect->internal redirect |
FileIRI Backwards | 200 OK | internal redirect | 404 not found | 200 (accidental) |
FileIRI Only | 404 not found | 404 not found | 200 OK | 404 not found |
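For automated testing, it may help to keep the table above as data. The Python sketch below copies the cells verbatim, including the undecided "??" entries; the names `ACTIONS` and `expected_action` are hypothetical and not part of the module.

```python
# Keys: (FileIRI setting, request encoding, server file encoding).
# "??" marks cells that the notes leave undecided.
ACTIONS = {
    ("Off", "UTF-8", "UTF-8"): "200 OK",
    ("Off", "legacy", "UTF-8"): "404 not found",
    ("Off", "UTF-8", "legacy"): "404 not found",
    ("Off", "legacy", "legacy"): "200 OK",
    ("On", "UTF-8", "UTF-8"): "??",
    ("On", "legacy", "UTF-8"): "??",
    ("On", "UTF-8", "legacy"): "internal redirect",
    ("On", "legacy", "legacy"): "External redirect->internal redirect",
    ("Backwards", "UTF-8", "UTF-8"): "200 OK",
    ("Backwards", "legacy", "UTF-8"): "internal redirect",
    ("Backwards", "UTF-8", "legacy"): "404 not found",
    ("Backwards", "legacy", "legacy"): "200 (accidental)",
    ("Only", "UTF-8", "UTF-8"): "404 not found",
    ("Only", "legacy", "UTF-8"): "404 not found",
    ("Only", "UTF-8", "legacy"): "200 OK",
    ("Only", "legacy", "legacy"): "404 not found",
}


def expected_action(setting, request_enc, file_enc):
    """Look up the action the table lists for one combination."""
    return ACTIONS[(setting, request_enc, file_enc)]


print(expected_action("Backwards", "legacy", "UTF-8"))  # internal redirect
```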
Testing
List of file names to test with (U: UTF-8, L: legacy):
Language | Text | Translation | legacy encoding | U -> L | L -> L | L -> U | U -> U | escaped UTF-8 | escaped legacy |
---|---|---|---|---|---|---|---|---|---|
French | résumé | summary | iso-8859-1 | résumé | résumé | résumé | résumé | r%C3%A9sum%C3%A9 | r%E9sum%E9 |
German | Übersetzung | translation | iso-8859-1 | Übersetzung | Übersetzung | Übersetzung | Übersetzung | %C3%9Cbersetzung | %DCbersetzung |
German | Bücher | books | iso-8859-1 | Bücher | Bücher | Bücher | Bücher | B%C3%BCcher | B%FCcher |
German | Übersetzung | translation | iso-8859-2 | Übersetzung | Übersetzung | Übersetzung | Übersetzung | %C3%9Cbersetzung | %DCbersetzung |
Hungarian | előírás | regulation | iso-8859-2 | előírás | előírás | előírás | előírás | el%C5%91%C3%ADr%C3%A1s | el%F5%EDr%E1s |
Chinese (simp.) | 词典 | dictionary | gb2312 | 词典 | 词典 | 词典 | 词典 | %E8%AF%8D%E5%85%B8 | %B4%CA%B5%E4 |
Japanese | 日記 | diary | shift_jis | 日記 | 日記 | 日記 | 日記 | %E6%97%A5%E8%A8%98 | %93%FA%8BL |
Japanese | 日記 | diary | euc-jp | 日記 | 日記 | 日記 | 日記 | %E6%97%A5%E8%A8%98 | %C6%FC%B5%AD |
Korean | 소설 | novel | euc-kr | 소설 | 소설 | 소설 | 소설 | %EC%86%8C%EC%84%A4 | %BC%D2%BC%B3 |
Arabic | كتب | books | iso-8859-6 | كتب | كتب | كتب | كتب | %D9%83%D8%AA%D8%A8 | %E3%CA%C8 |
Arabic | كتب | books | windows-1256 | كتب | كتب | كتب | كتب | %D9%83%D8%AA%D8%A8 | %DF%CA%C8 |
Russian | перевод | translation | iso-8859-5 | перевод | перевод | перевод | перевод | %D0%BF%D0%B5%D1%80%D0%B5%D0%B2%D0%BE%D0%B4 | %DF%D5%E0%D5%D2%DE%D4 |
Russian | перевод | translation | koi8-r | перевод | перевод | перевод | перевод | %D0%BF%D0%B5%D1%80%D0%B5%D0%B2%D0%BE%D0%B4 | %D0%C5%D2%C5%D7%CF%C4 |
Russian | перевод | translation | windows-1251 | перевод | перевод | перевод | перевод | %D0%BF%D0%B5%D1%80%D0%B5%D0%B2%D0%BE%D0%B4 | %EF%E5%F0%E5%E2%EE%E4 |
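The two escaped columns can be regenerated or checked mechanically. The following Python sketch uses the standard `urllib.parse.quote`; the helper name `escaped_forms` is hypothetical, and the sketch assumes Python's codec for each charset label matches the encoding used on the server.

```python
from urllib.parse import quote


def escaped_forms(name, legacy_encoding):
    """Return (escaped UTF-8, escaped legacy) for one file name."""
    return quote(name, encoding="utf-8"), quote(name, encoding=legacy_encoding)


for name, enc in [("résumé", "iso-8859-1"),
                  ("日記", "shift_jis"),
                  ("перевод", "koi8-r")]:
    print(name, enc, *escaped_forms(name, enc))
```

Bytes that correspond to unreserved ASCII characters are left unescaped, which is why the Shift_JIS form of 日記 ends in a literal `L`.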
Hierarchical tests
UTF-8 -> legacy: Übersetzung/перевод/Übersetzung.html
mixed -> legacy: UÜbersetzung/Uперевод/LÜbersetzung.html, UÜbersetzung/Lперевод/UÜbersetzung.html, UÜbersetzung/Lперевод/LÜbersetzung.html, LÜbersetzung/Uперевод/UÜbersetzung.html, LÜbersetzung/Uперевод/LÜbersetzung.html, LÜbersetzung/Lперевод/UÜbersetzung.html
legacy -> legacy: Übersetzung/перевод/Übersetzung.html
UTF-8 -> UTF-8: Übersetzung/перевод/Übersetzung.html
mixed -> UTF-8: UÜbersetzung/Uперевод/LÜbersetzung.html, UÜbersetzung/Lперевод/UÜbersetzung.html, UÜbersetzung/Lперевод/LÜbersetzung.html, LÜbersetzung/Uперевод/UÜbersetzung.html, LÜbersetzung/Uперевод/LÜbersetzung.html, LÜbersetzung/Lперевод/UÜbersetzung.html
legacy -> UTF-8: Übersetzung/перевод/Übersetzung.html
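The six mixed paths are every U/L combination over the three segments except the two pure ones. A Python sketch for generating the escaped request paths follows. The legacy encodings chosen per segment (iso-8859-1 for the German names, koi8-r for the Russian one) are assumptions, since the notes do not say which legacy encodings the hierarchical tests use; `mixed_paths` is a hypothetical name.

```python
from itertools import product
from urllib.parse import quote

SEGMENTS = ["Übersetzung", "перевод", "Übersetzung.html"]
# Assumed legacy encoding per segment (not stated in the notes).
LEGACY = ["iso-8859-1", "koi8-r", "iso-8859-1"]


def mixed_paths():
    """Map labels such as 'ULU' to the escaped mixed-encoding path."""
    paths = {}
    for combo in product("UL", repeat=len(SEGMENTS)):
        if len(set(combo)) == 1:
            continue  # skip the pure UTF-8 and pure legacy paths
        parts = [
            quote(seg, encoding="utf-8" if flag == "U" else legacy)
            for flag, seg, legacy in zip(combo, SEGMENTS, LEGACY)
        ]
        paths["".join(combo)] = "/".join(parts)
    return paths


for label, path in mixed_paths().items():
    print(label, path)
```

The labels come out in the same order as the mixed lists above: UUL, ULU, ULL, LUU, LUL, LLU.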
Special cases
Nonexisting documents: réservation (UTF-8 -> legacy), réservation (legacy -> legacy), réservation (legacy -> UTF-8), réservation (UTF-8 -> UTF-8)
Misspellings: résumè (UTF-8 -> legacy), résumè (legacy -> legacy), résumè (legacy -> UTF-8), résumè (UTF-8 -> UTF-8)