Subtle combination causing printk to go astray (original) (raw)

A rather interesting issue related to the discussion in #71, the implementation of a precision timer and a fix for divide-by-zero.

The problem occurs only in QEMU, which means that QEMU itself may be the culprit. That said, it takes either a very fast or very recent version (maybe both) of QEMU for the problem to occur. Which makes the analysis even harder. The symptoms simply don't make sense. A classic start of a real rabbit chase.

The problem: A special sequence of actions cause printk to either cause the current function to exit or the system to hang, a classic stack pollution case.

The situation leading to the problem:

@ghaerr, it would be interesting if you run this code snippet on ELKS on your QEMU which is likely quite a bit faster than my Mac Mini, which does not show the problem.

Anyway, this code snippet (not doing anything other than demonstrating the problem) demonstrates the problem: Either the statements from the second printk to the end of the function are skipped (like an immediate return) - or we have a hang (the experimental code resides in init/main.c):

{               unsigned long temp;
                int dd;
                unsigned d5, d6;

                d5 = 50;
                get_ptime();
                temp = get_ptime();
                printk("Direct %lu <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow></mrow><annotation encoding="application/x-tex"></annotation></semantics></math></span><span class="katex-html" aria-hidden="true"></span></span>", temp);       <--- this is printed, value between 3 and 6 (11 on the MacMini)
                //temp = 3UL;
                dd = temp*838UL/1000UL;
                printk("XXXX %lu", temp);                 <---- this prints too, showing that the div is OK and temp us unaffected
                printk("UL dd=%d <span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi mathvariant="normal">&quot;</mi><mo separator="true">,</mo><mi>d</mi><mi>d</mi><mo stretchy="false">)</mo><mo separator="true">;</mo><mo>&lt;</mo><mo>−</mo><mo>−</mo><mo>−</mo><mi>t</mi><mi>h</mi><mi>i</mi><mi>s</mi><mi>a</mi><mi>n</mi><mi>d</mi><mi>a</mi><mi>n</mi><mi>y</mi><mi>p</mi><mi>r</mi><mi>i</mi><mi>n</mi><mi>t</mi><mi>k</mi><mi>b</mi><mi>e</mi><mi>l</mi><mi>o</mi><mi>w</mi><mi>i</mi><mi>n</mi><mi>t</mi><mi>h</mi><mi>e</mi><mi>s</mi><mi>a</mi><mi>m</mi><mi>e</mi><mi>f</mi><mi>u</mi><mi>n</mi><mi>c</mi><mi>t</mi><mi>i</mi><mi>o</mi><mi>n</mi><mi>n</mi><mi>e</mi><mi>v</mi><mi>e</mi><mi>r</mi><mi>s</mi><mi>h</mi><mi>o</mi><mi>w</mi><mo separator="true">,</mo><mi>t</mi><mi>h</mi><mi>e</mi><mi>f</mi><mi>u</mi><mi>n</mi><mi>c</mi><mi>t</mi><mi>i</mi><mi>o</mi><mi>n</mi><mi>j</mi><mi>u</mi><mi>s</mi><mi>t</mi><mi>e</mi><mi>x</mi><mi>i</mi><mi>t</mi><mi>s</mi><mo>−</mo><mi>o</mi><mi>r</mi><mi>t</mi><mi>h</mi><mi>e</mi><mi>s</mi><mi>y</mi><mi>s</mi><mi>t</mi><mi>e</mi><mi>m</mi><mi>h</mi><mi>a</mi><mi>n</mi><mi>g</mi><mi>s</mi><mi>r</mi><mi>i</mi><mi>g</mi><mi>h</mi><mi>t</mi><mi>h</mi><mi>e</mi><mi>r</mi><mi>e</mi><mi>d</mi><mi>d</mi><mo>=</mo><mo stretchy="false">(</mo><mo stretchy="false">(</mo><mi>l</mi><mi>o</mi><mi>n</mi><mi>g</mi><mo stretchy="false">)</mo><mi>t</mi><mi>e</mi><mi>m</mi><mi>p</mi><mo>∗</mo><mn>100</mn><mi>L</mi><mo stretchy="false">)</mo><mi mathvariant="normal">/</mi><mo stretchy="false">(</mo><mi>l</mi><mi>o</mi><mi>n</mi><mi>g</mi><mo stretchy="false">)</mo><mi>d</mi><mn>5</mn><mo separator="true">;</mo><mi>p</mi><mi>r</mi><mi>i</mi><mi>n</mi><mi>t</mi><mi>k</mi><mo stretchy="false">(</mo><mi mathvariant="normal">&quot;</mi><mi>S</mi><mi>i</mi><mi>g</mi><mi>n</mi><mi>e</mi><mi>d</mi><mi>d</mi><mi>d</mi><mo>=</mo></mrow><annotation encoding="application/x-tex">&quot;, dd);    &lt;--- this and any printk below in the same function never show, the function just exits - or the system hangs right here
                dd = ((long)temp * 100L)/(long)d5;
                printk(&quot;Signed dd=%d</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">&quot;</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal">dd</span><span class="mclose">)</span><span class="mpunct">;</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">&lt;</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6667em;vertical-align:-0.0833em;"></span><span class="mord">−</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em;"></span><span class="mord">−</span><span class="mord mathnormal">t</span><span class="mord mathnormal">hi</span><span class="mord mathnormal">s</span><span class="mord mathnormal">an</span><span class="mord mathnormal">d</span><span class="mord mathnormal">an</span><span class="mord mathnormal" style="margin-right:0.03588em;">y</span><span class="mord mathnormal">p</span><span class="mord mathnormal" style="margin-right:0.02778em;">r</span><span class="mord mathnormal">in</span><span class="mord mathnormal">t</span><span class="mord mathnormal">kb</span><span class="mord mathnormal">e</span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord mathnormal">o</span><span class="mord mathnormal" style="margin-right:0.02691em;">w</span><span class="mord mathnormal">in</span><span class="mord mathnormal">t</span><span class="mord mathnormal">h</span><span class="mord mathnormal">es</span><span class="mord mathnormal">am</span><span class="mord mathnormal">e</span><span class="mord mathnormal" style="margin-right:0.10764em;">f</span><span class="mord mathnormal">u</span><span class="mord mathnormal">n</span><span class="mord mathnormal">c</span><span class="mord mathnormal">t</span><span class="mord mathnormal">i</span><span class="mord mathnormal">o</span><span class="mord mathnormal">nn</span><span class="mord mathnormal">e</span><span class="mord mathnormal" style="margin-right:0.03588em;">v</span><span class="mord mathnormal">ers</span><span class="mord mathnormal">h</span><span class="mord mathnormal">o</span><span class="mord mathnormal" style="margin-right:0.02691em;">w</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal">t</span><span class="mord mathnormal">h</span><span class="mord mathnormal">e</span><span class="mord mathnormal" style="margin-right:0.10764em;">f</span><span class="mord mathnormal">u</span><span class="mord mathnormal">n</span><span class="mord mathnormal">c</span><span class="mord mathnormal">t</span><span class="mord mathnormal">i</span><span class="mord mathnormal">o</span><span class="mord mathnormal" style="margin-right:0.05724em;">nj</span><span class="mord mathnormal">u</span><span class="mord mathnormal">s</span><span class="mord mathnormal">t</span><span class="mord mathnormal">e</span><span class="mord mathnormal">x</span><span class="mord mathnormal">i</span><span class="mord mathnormal">t</span><span class="mord mathnormal">s</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em;"></span><span class="mord mathnormal" style="margin-right:0.02778em;">or</span><span class="mord mathnormal">t</span><span class="mord mathnormal">h</span><span class="mord mathnormal">esys</span><span class="mord mathnormal">t</span><span class="mord mathnormal">e</span><span class="mord mathnormal">mhan</span><span class="mord mathnormal" style="margin-right:0.03588em;">g</span><span class="mord mathnormal" style="margin-right:0.02778em;">sr</span><span class="mord mathnormal">i</span><span class="mord mathnormal" style="margin-right:0.03588em;">g</span><span class="mord mathnormal">h</span><span class="mord mathnormal">t</span><span class="mord mathnormal">h</span><span class="mord mathnormal">ere</span><span class="mord mathnormal">dd</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">((</span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord mathnormal">o</span><span class="mord mathnormal">n</span><span class="mord mathnormal" style="margin-right:0.03588em;">g</span><span class="mclose">)</span><span class="mord mathnormal">t</span><span class="mord mathnormal">e</span><span class="mord mathnormal">m</span><span class="mord mathnormal">p</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">100</span><span class="mord mathnormal">L</span><span class="mclose">)</span><span class="mord">/</span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord mathnormal">o</span><span class="mord mathnormal">n</span><span class="mord mathnormal" style="margin-right:0.03588em;">g</span><span class="mclose">)</span><span class="mord mathnormal">d</span><span class="mord">5</span><span class="mpunct">;</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal">p</span><span class="mord mathnormal" style="margin-right:0.02778em;">r</span><span class="mord mathnormal">in</span><span class="mord mathnormal">t</span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span><span class="mopen">(</span><span class="mord">&quot;</span><span class="mord mathnormal" style="margin-right:0.05764em;">S</span><span class="mord mathnormal">i</span><span class="mord mathnormal" style="margin-right:0.03588em;">g</span><span class="mord mathnormal">n</span><span class="mord mathnormal">e</span><span class="mord mathnormal">ddd</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span></span></span></span></span> ", dd);
                ...
}

Some observations from a seemingly infinite number of testvariants:

  1. Activating the //temp = 3UL; line eliminates the problem - which is very interesting because it means that...
   int d5;
   get_ptime();
   d5 = get_ptime();

... is not the same as

... even if that number is exactly what was returned from get_ptime();. So it's not the number returned but the call to get_ptime() that seems to be important. That said,

   get_ptime();
   get_ptime();
   d5 = 4351;

... doesn't cut it either. The error cannot be provoked unless the actual return value from get_ptime() is used. Very strange indeed.

  1. Still, the variable itself is unimportant - as is its size. Setting d6 = temp; where d6 is an unsigned int and subsequently use d6 instead of temp in the division still triggers the problem.
  2. Adding delays between the two get_ptime() calls to increase the # of pticks to > 64k makes no difference, IOW whether get_ptime returns a long or a short doesn't make any difference.
  3. This observation is supported by changing temp to unsigned int and casting the 2nd call to get_ptime(), still no difference.
  4. Changing the dd (int) in the first divide statement to d6 (unsigned) changes behaviour slightly, from 'exit function and continue' to 'hang right there' - which means nothing other than that a 'stray' return happens to 'land' on a slightly different address.
  5. The only way to make the routine 'work' with the return value from get_ptime() is to AVOID long divide.
                d5 = d6*838/1000;       /* THIS ONE WORKS */
                printk("2: Direct %x <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow></mrow><annotation encoding="application/x-tex"></annotation></semantics></math></span><span class="katex-html" aria-hidden="true"></span></span>", d5);

... which obviously is no good if d6 > 78 - but we're closer to a conclusion: Somehow the interaction between get_ptime() and long div (signed and unsigned) cause a stack pollution which in turn cause printk to malfunction on very fast machines - or there is a bug in QEMU 7.2.0.

And the hunt continues.