Minutes for XML Core WG telcon of 2007 October 24 from Konrad Lanz on 2007-10-25 (public-xml-core-wg@w3.org from October 2007)

Well, as you said on our last call, RFC 3986 is hard and so is this ...

... believe me it was a pain to stick as close as possible to RFC 3986 ...

I had to dig out the actual code I wrote for this to reconstruct some pseudo code showing what it really does, so I think the approach to stick as close as possible to RFC 3986 has failed.

I think by changing the whole algorithm we may have more success. From my point of view, the main problem lies in how step E iterates over the input buffer.

I think the easiest would be to take a completely different approach! Maybe something like:

  1. Set a "path-absolute" flag if the input starts with a slash.

  2. Take an empty stack and an empty buffer.

  3. while (not at the end of the input) {

       clear the buffer;

       continue to scan the input from the left and append all non-slash
       characters to a temporary buffer (buf) until a slash is reached
       that was preceded by a non-slash character.

       if (buf is '.') {
         // ignore
       } else if (buf is '..') {
         if (stack is empty and not path-absolute) {
           // start to accumulate complete '..' path segments to the left
           push '..' onto the stack;
         } else if (stack is empty and path-absolute) {
           // ignore '..' path segments hitting the root;
         } else if (stack-peek is '..') {
           // stack is not empty is implied; continue to accumulate
           // complete '..' path segments to the left
           push '..' onto the stack;
         } else {
           // stack is not empty is implied, so let's pop the path
           // segment to the left
           pop the stack;
         }
       } else {
         // implied by the pops above: any other segment is kept
         push buf onto the stack;
       }
     }

  4. Take the stack now as a slash-separated list with the peek as the last element and prepend a slash if "path-absolute".

  5. If the last character of the input was a slash as well, append a slash.

  6. If the last path segment is '..' and not already terminated by a slash, append a slash as well.
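The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the tested implementation: the function name `normalize_path` is made up here, ordinary segments are assumed to be pushed onto the stack, and a trailing slash is assumed not to be appended when the result already ends in one (so that "/" does not become "//").

```python
def normalize_path(path):
    """Sketch of the stack-based dot-segment removal described above."""
    # Step 1: remember whether the input is path-absolute.
    path_absolute = path.startswith("/")
    # Step 2: an empty stack; str.split plays the role of the buffer.
    stack = []
    # Step 3: splitting on "/" and skipping empty pieces collapses runs
    # of slashes, like scanning up to a slash preceded by a non-slash.
    for buf in path.split("/"):
        if buf == "" or buf == ".":
            continue  # ignore
        if buf == "..":
            if not stack and not path_absolute:
                stack.append("..")  # accumulate leading '..' segments
            elif not stack and path_absolute:
                pass  # ignore '..' segments hitting the root
            elif stack[-1] == "..":
                stack.append("..")  # keep accumulating '..' segments
            else:
                stack.pop()  # drop the segment to the left
        else:
            stack.append(buf)  # assumption: ordinary segments are kept
    # Step 4: join the stack, prepend a slash if path-absolute.
    result = "/".join(stack)
    if path_absolute:
        result = "/" + result
    # Step 5: keep a trailing slash (assumption: never double it).
    if path.endswith("/") and not result.endswith("/"):
        result += "/"
    # Step 6: terminate a trailing '..' segment with a slash.
    if stack and stack[-1] == ".." and not result.endswith("/"):
        result += "/"
    return result
```

For example, `normalize_path("a/./b/../../../c")` keeps the one '..' that cannot be resolved and returns "../c", while `normalize_path("/../a")` drops the '..' at the root and returns "/a".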

Enjoy. Can someone else also try to get their head around this? I'll test it in my implementation as soon as I have time.


Anyhow please see below what we can achieve with RFC 3986 ...

Note: the // comments indicate the text from http://lists.w3.org/Archives/Public/www-xml-canonicalization-comments/2007Jun/att-0000/Apendix.html

and they are interleaved with what the code actually does. I further used the asterisk to emphasize some things.
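For orientation, the unmodified remove_dot_segments routine of RFC 3986, section 5.2.4 (steps A to E) can be sketched in Python as below. The function and variable names are chosen here for illustration only. Note that this baseline silently drops leading '..' segments of a relative path, whereas the alternative sketched earlier in this message accumulates them.

```python
def remove_dot_segments(path):
    """Sketch of RFC 3986, section 5.2.4 (steps A to E)."""
    inp = path  # input buffer
    out = []    # output buffer, kept as a list of moved segments
    while inp:
        # A. remove a leading "../" or "./"
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        # B. replace a leading "/./" or a lone "/." with "/"
        elif inp.startswith("/./"):
            inp = inp[2:]
        elif inp == "/.":
            inp = "/"
        # C. replace a leading "/../" or a lone "/.." with "/" and
        #    remove the last output segment with its preceding "/"
        elif inp.startswith("/../"):
            inp = inp[3:]
            if out:
                out.pop()
        elif inp == "/..":
            inp = "/"
            if out:
                out.pop()
        # D. the input is only "." or ".."
        elif inp in (".", ".."):
            inp = ""
        # E. move the first segment (with its initial "/", if any)
        #    to the end of the output buffer
        else:
            next_slash = inp.find("/", 1)
            if next_slash == -1:
                out.append(inp)
                inp = ""
            else:
                out.append(inp[:next_slash])
                inp = inp[next_slash:]
    return "".join(out)
```

With the two examples given in the RFC, "/a/b/c/./../../g" becomes "/a/g" and "mid/content=5/../6" becomes "mid/6"; a relative input such as "../../g" is reduced to "g".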

Grosso, Paul wrote:

When you say:

If the input buffer starts with a root slash "/" ...

what is a "root slash"? Do you just mean a slash?

Well, this should emphasize the point that it is a

  path-absolute = "/" [ segment-nz *( "/" segment ) ]

rather than a

  path-rootless = segment-nz *( "/" segment )

I hope this makes sense to you.
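To make the distinction concrete, here is a small Python sketch that tells the two productions apart by their first characters only. It is a simplification: the helper name `classify_path` is hypothetical, and the check does not validate the pchar characters inside the segments.

```python
def classify_path(path):
    """Rough classification by the leading characters of a path."""
    # path-absolute: a "/" followed by a non-empty first segment (or
    # nothing), so it must not start with "//".
    if path.startswith("/") and not path.startswith("//"):
        return "path-absolute"
    # path-rootless: starts with a non-empty segment, hence no slash.
    if path and not path.startswith("/"):
        return "path-rootless"
    # Empty paths and paths starting with "//" match neither production.
    return "other"
```

So `classify_path("/a/b")` gives "path-absolute" (the leading slash is the root slash), while `classify_path("a/b")` gives "path-rootless".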

And continuing:

...the output buffer is initialized with this root slash "/"

Does this mean the slash is removed from the input buffer now or not?

Yes, this shall emphasize that it is moved to the output buffer.

When you say:

if also the output does not contain the root slash "/" only

does this just mean "if it is not the case that the output buffer consists of just a single '/' character"?

  if (input starts with '../') {
    // then
    // if also the output does not contain the root slash "/" only
    input delete prefix '../';
    if (output == '/') {
      // move this prefix to the end of the output buffer
      output append '../';
    }
  }

I don't know how to parse (in 2A):

if...then...else if...then if...then...else...;otherwise.

  if (input starts with './') {
    // then remove that prefix from the input buffer
    input delete prefix './';
  } // , +else
  else
  // if the input buffer begins with a prefix of "../",
  if (input starts with '../') {
    // then
    // if also the output does not contain the root slash "/" only
    input delete prefix '../';
    if (output equals '/') {
      // move this prefix to the end of the output buffer
      output append '../';
    } // else remove
      // that prefix
  } // otherwise,
  else

B.

In 2C, I don't know how to interpret:

if also the output buffer is empty, last segment in the output buffer

equals "../" or "..", where ".." is a complete path segment

I don't know what's being or-ed, and I'm not sure what the if test really is.

  // C. if the input buffer begins with a prefix of "/../" or "/..",
  // where ".." is a complete path segment,
  if (input starts with '/../' or '/..') {

    // then replace that
    // prefix with "/" in the input buffer
    input replace prefix "/../" or "/.." with "/";

    // and if also the output
    // buffer is empty, last segment in the output buffer equals
    // "../" or "..", where ".." is a complete path segment, then
    // append ".." or "/.." for the latter case respectively to the
    // output buffer
    if ( output is empty
         || last segment in the output buffer equals "../"
         || last segment in the output buffer equals "..") {
      // then append that prefix to the output buffer,
      if (last segment in the output buffer equals "../") {
        output append '..';
      } else if (last segment in the output buffer equals "..") {
        output append '/..';
      }
    } else {
      // else remove the last segment including its
      // preceding "/" (if any) from the output buffer
      output delete last segment;

      // and if hereby
      // the first character in the output buffer was removed and it
      // was not the root slash then delete a leading slash from the
      // input buffer.
      if (output had root slash && input starts with '/') {
        input delete first character;
      }
    }
  } // otherwise,
  else

In:

append ".." or "/.." for the latter case respectively

I don't know what that means. What is the latter case, what respectively to what, and just what am I appending when?

See above ....

In 3, where you say:

if the only or last segment of the output buffer is "..", where ".."

is a complete path segment

I know 3986 uses the term "complete path segment" (in fact, in 5.2.4, it refers to 'the special "." and ".." complete path segments'), but I'm still finding this wording complicated.
Do you just mean:

if the last (or only) segment of the output buffer is the ".."

complete path segment

I think the term "complete path segment" is intended to exclude segments like "..yes", "...", "a..b..c.." and so on.
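A tiny Python sketch of that reading, assuming a segment only counts as a dot segment when the whole segment equals "." or ".." (the helper name `is_dot_segment` is made up for this example):

```python
def is_dot_segment(segment):
    """True only if the complete segment is '.' or '..'."""
    return segment in (".", "..")


# Only the last segment of this path is a complete '..' path segment.
segments = "..yes/.../a..b..c../..".split("/")
dot_segments = [s for s in segments if is_dot_segment(s)]
```

Here `dot_segments` ends up containing just the final "..", since "..yes", "..." and "a..b..c.." merely contain dots.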

Konrad

--
Konrad Lanz, IAIK/SIC - Graz University of Technology
Inffeldgasse 16a, 8010 Graz, Austria
Tel: +43 316 873 5547
Fax: +43 316 873 5520
https://www.iaik.tugraz.at/aboutus/people/lanz
http://jce.iaik.tugraz.at

Certificate chain (including the EuroPKI root certificate): https://europki.iaik.at/ca/europki-at/cert_download.htm