Subtle combination causing printk to go astray (original) (raw)
A rather interesting issue related to the discussion in #71, the implementation of a precision timer and a fix for divide-by-zero.
The problem occurs only in QEMU, which means that QEMU itself may be the culprit. That said, it takes either a very fast or very recent version (maybe both) of QEMU for the problem to occur. Which makes the analysis even harder. The symptoms simply don't make sense. A classic start of a real rabbit chase.
The problem: A special sequence of actions cause printk to either cause the current function to exit or the system to hang, a classic stack pollution case.
The situation leading to the problem:
- using the value returned from
get_ptime - a long divide involving that returned value
- a call to
printkusing the result of that division - and encapsulating it all, a very fast QEMU - or possibly just a very fast processor, which may be a clue to where the rabbit hides, a race condition??
@ghaerr, it would be interesting if you run this code snippet on ELKS on your QEMU which is likely quite a bit faster than my Mac Mini, which does not show the problem.
Anyway, this code snippet (not doing anything other than demonstrating the problem) demonstrates the problem: Either the statements from the second printk to the end of the function are skipped (like an immediate return) - or we have a hang (the experimental code resides in init/main.c):
{ unsigned long temp;
int dd;
unsigned d5, d6;
d5 = 50;
get_ptime();
temp = get_ptime();
printk("Direct %lu <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow></mrow><annotation encoding="application/x-tex"></annotation></semantics></math></span><span class="katex-html" aria-hidden="true"></span></span>", temp); <--- this is printed, value between 3 and 6 (11 on the MacMini)
//temp = 3UL;
dd = temp*838UL/1000UL;
printk("XXXX %lu", temp); <---- this prints too, showing that the div is OK and temp us unaffected
printk("UL dd=%d <span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi mathvariant="normal">"</mi><mo separator="true">,</mo><mi>d</mi><mi>d</mi><mo stretchy="false">)</mo><mo separator="true">;</mo><mo><</mo><mo>−</mo><mo>−</mo><mo>−</mo><mi>t</mi><mi>h</mi><mi>i</mi><mi>s</mi><mi>a</mi><mi>n</mi><mi>d</mi><mi>a</mi><mi>n</mi><mi>y</mi><mi>p</mi><mi>r</mi><mi>i</mi><mi>n</mi><mi>t</mi><mi>k</mi><mi>b</mi><mi>e</mi><mi>l</mi><mi>o</mi><mi>w</mi><mi>i</mi><mi>n</mi><mi>t</mi><mi>h</mi><mi>e</mi><mi>s</mi><mi>a</mi><mi>m</mi><mi>e</mi><mi>f</mi><mi>u</mi><mi>n</mi><mi>c</mi><mi>t</mi><mi>i</mi><mi>o</mi><mi>n</mi><mi>n</mi><mi>e</mi><mi>v</mi><mi>e</mi><mi>r</mi><mi>s</mi><mi>h</mi><mi>o</mi><mi>w</mi><mo separator="true">,</mo><mi>t</mi><mi>h</mi><mi>e</mi><mi>f</mi><mi>u</mi><mi>n</mi><mi>c</mi><mi>t</mi><mi>i</mi><mi>o</mi><mi>n</mi><mi>j</mi><mi>u</mi><mi>s</mi><mi>t</mi><mi>e</mi><mi>x</mi><mi>i</mi><mi>t</mi><mi>s</mi><mo>−</mo><mi>o</mi><mi>r</mi><mi>t</mi><mi>h</mi><mi>e</mi><mi>s</mi><mi>y</mi><mi>s</mi><mi>t</mi><mi>e</mi><mi>m</mi><mi>h</mi><mi>a</mi><mi>n</mi><mi>g</mi><mi>s</mi><mi>r</mi><mi>i</mi><mi>g</mi><mi>h</mi><mi>t</mi><mi>h</mi><mi>e</mi><mi>r</mi><mi>e</mi><mi>d</mi><mi>d</mi><mo>=</mo><mo stretchy="false">(</mo><mo stretchy="false">(</mo><mi>l</mi><mi>o</mi><mi>n</mi><mi>g</mi><mo stretchy="false">)</mo><mi>t</mi><mi>e</mi><mi>m</mi><mi>p</mi><mo>∗</mo><mn>100</mn><mi>L</mi><mo stretchy="false">)</mo><mi mathvariant="normal">/</mi><mo stretchy="false">(</mo><mi>l</mi><mi>o</mi><mi>n</mi><mi>g</mi><mo stretchy="false">)</mo><mi>d</mi><mn>5</mn><mo separator="true">;</mo><mi>p</mi><mi>r</mi><mi>i</mi><mi>n</mi><mi>t</mi><mi>k</mi><mo stretchy="false">(</mo><mi mathvariant="normal">"</mi><mi>S</mi><mi>i</mi><mi>g</mi><mi>n</mi><mi>e</mi><mi>d</mi><mi>d</mi><mi>d</mi><mo>=</mo></mrow><annotation encoding="application/x-tex">", dd); <--- this and any printk below in the same function never show, the function just exits - or the system hangs right here
dd = ((long)temp * 100L)/(long)d5;
printk("Signed dd=%d</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">"</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal">dd</span><span class="mclose">)</span><span class="mpunct">;</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel"><</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6667em;vertical-align:-0.0833em;"></span><span class="mord">−</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em;"></span><span class="mord">−</span><span class="mord mathnormal">t</span><span class="mord mathnormal">hi</span><span class="mord mathnormal">s</span><span class="mord mathnormal">an</span><span class="mord mathnormal">d</span><span class="mord mathnormal">an</span><span class="mord mathnormal" style="margin-right:0.03588em;">y</span><span class="mord mathnormal">p</span><span class="mord mathnormal" style="margin-right:0.02778em;">r</span><span class="mord mathnormal">in</span><span class="mord mathnormal">t</span><span class="mord mathnormal">kb</span><span class="mord mathnormal">e</span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord mathnormal">o</span><span class="mord mathnormal" style="margin-right:0.02691em;">w</span><span class="mord mathnormal">in</span><span class="mord mathnormal">t</span><span class="mord mathnormal">h</span><span class="mord mathnormal">es</span><span class="mord mathnormal">am</span><span class="mord mathnormal">e</span><span class="mord mathnormal" style="margin-right:0.10764em;">f</span><span class="mord mathnormal">u</span><span class="mord mathnormal">n</span><span class="mord mathnormal">c</span><span class="mord mathnormal">t</span><span class="mord mathnormal">i</span><span class="mord mathnormal">o</span><span class="mord mathnormal">nn</span><span class="mord mathnormal">e</span><span class="mord mathnormal" style="margin-right:0.03588em;">v</span><span class="mord mathnormal">ers</span><span class="mord mathnormal">h</span><span class="mord mathnormal">o</span><span class="mord mathnormal" style="margin-right:0.02691em;">w</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal">t</span><span class="mord mathnormal">h</span><span class="mord mathnormal">e</span><span class="mord mathnormal" style="margin-right:0.10764em;">f</span><span class="mord mathnormal">u</span><span class="mord mathnormal">n</span><span class="mord mathnormal">c</span><span class="mord mathnormal">t</span><span class="mord mathnormal">i</span><span class="mord mathnormal">o</span><span class="mord mathnormal" style="margin-right:0.05724em;">nj</span><span class="mord mathnormal">u</span><span class="mord mathnormal">s</span><span class="mord mathnormal">t</span><span class="mord mathnormal">e</span><span class="mord mathnormal">x</span><span class="mord mathnormal">i</span><span class="mord mathnormal">t</span><span class="mord mathnormal">s</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em;"></span><span class="mord mathnormal" style="margin-right:0.02778em;">or</span><span class="mord mathnormal">t</span><span class="mord mathnormal">h</span><span class="mord mathnormal">esys</span><span class="mord mathnormal">t</span><span class="mord mathnormal">e</span><span class="mord mathnormal">mhan</span><span class="mord mathnormal" style="margin-right:0.03588em;">g</span><span class="mord mathnormal" style="margin-right:0.02778em;">sr</span><span class="mord mathnormal">i</span><span class="mord mathnormal" style="margin-right:0.03588em;">g</span><span class="mord mathnormal">h</span><span class="mord mathnormal">t</span><span class="mord mathnormal">h</span><span class="mord mathnormal">ere</span><span class="mord mathnormal">dd</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">((</span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord mathnormal">o</span><span class="mord mathnormal">n</span><span class="mord mathnormal" style="margin-right:0.03588em;">g</span><span class="mclose">)</span><span class="mord mathnormal">t</span><span class="mord mathnormal">e</span><span class="mord mathnormal">m</span><span class="mord mathnormal">p</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">100</span><span class="mord mathnormal">L</span><span class="mclose">)</span><span class="mord">/</span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord mathnormal">o</span><span class="mord mathnormal">n</span><span class="mord mathnormal" style="margin-right:0.03588em;">g</span><span class="mclose">)</span><span class="mord mathnormal">d</span><span class="mord">5</span><span class="mpunct">;</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal">p</span><span class="mord mathnormal" style="margin-right:0.02778em;">r</span><span class="mord mathnormal">in</span><span class="mord mathnormal">t</span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span><span class="mopen">(</span><span class="mord">"</span><span class="mord mathnormal" style="margin-right:0.05764em;">S</span><span class="mord mathnormal">i</span><span class="mord mathnormal" style="margin-right:0.03588em;">g</span><span class="mord mathnormal">n</span><span class="mord mathnormal">e</span><span class="mord mathnormal">ddd</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span></span></span></span></span> ", dd);
...
}
Some observations from a seemingly infinite number of testvariants:
- Activating the
//temp = 3UL;line eliminates the problem - which is very interesting because it means that...
int d5;
get_ptime();
d5 = get_ptime();
... is not the same as
... even if that number is exactly what was returned from get_ptime();. So it's not the number returned but the call to get_ptime() that seems to be important. That said,
get_ptime();
get_ptime();
d5 = 4351;
... doesn't cut it either. The error cannot be provoked unless the actual return value from get_ptime() is used. Very strange indeed.
- Still, the variable itself is unimportant - as is its size. Setting
d6 = temp;whered6is anunsigned intand subsequently used6instead oftempin the division still triggers the problem. - Adding delays between the two
get_ptime()calls to increase the # of pticks to > 64k makes no difference, IOW whether get_ptime returns a long or a short doesn't make any difference. - This observation is supported by changing
temptounsigned intand casting the 2nd call toget_ptime(), still no difference. - Changing the
dd(int) in the first divide statement tod6(unsigned) changes behaviour slightly, from 'exit function and continue' to 'hang right there' - which means nothing other than that a 'stray' return happens to 'land' on a slightly different address. - The only way to make the routine 'work' with the return value from
get_ptime()is to AVOID long divide.
d5 = d6*838/1000; /* THIS ONE WORKS */
printk("2: Direct %x <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow></mrow><annotation encoding="application/x-tex"></annotation></semantics></math></span><span class="katex-html" aria-hidden="true"></span></span>", d5);
... which obviously is no good if d6 > 78 - but we're closer to a conclusion: Somehow the interaction between get_ptime() and long div (signed and unsigned) cause a stack pollution which in turn cause printk to malfunction on very fast machines - or there is a bug in QEMU 7.2.0.
And the hunt continues.