FM Radio Noise

Terry Ritter

A Ciphers By Ritter Page

At various times an FM radio has been suggested as a good source of random noise, and, thus, cryptographic random bits. Consequently, I included an FM radio noise source in my noise generator experiments.

In these experiments, the noise from various different generators was digitized by an ordinary PC sound card and stored in files, which were then analyzed. A massive array of statistical tests and graphs were computed for each noise source. One of these graphs was autocorrelation.
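As a minimal sketch of the kind of autocorrelation computation mentioned above, assuming the digitized samples are available as a NumPy array: the function name `autocorrelation` and the synthetic test signals are illustrative assumptions, not the analysis code actually used in these experiments.

```python
import numpy as np

def autocorrelation(samples, max_lag):
    """Normalized sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(samples, dtype=np.float64)
    x = x - x.mean()          # remove the mean (DC component)
    denom = np.dot(x, x)      # lag-0 sum of squares, used to normalize
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    white = rng.normal(size=50_000)   # stand-in for a good noise source
    # Stand-in for a correlated source: 3-sample moving average of white noise
    smooth = np.convolve(white, np.ones(3) / 3, mode="valid")
    print(autocorrelation(white, 3))
    print(autocorrelation(smooth, 3))
```

For uncorrelated noise every lag past zero should sit near zero; values well away from zero at nonzero lags are the sort of structure the autocorrelation graph exposes.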

Strangely, the FM generator showed surprising long-term _correlations_ in the noise. I suspect these results are normal in FM receivers. While these correlations probably would not allow one to "predict the next bit" with absolute confidence, they almost certainly _do_ allow one to predict bits with better than 0.5 probability, the cryptographic standard.

Because these correlations are so surprising, it is reasonable to wonder if the test setup is to blame. Fortunately, the various other measured generators testify otherwise; for example, the LFSR-based pseudo-noise generator shows just how good the test setup really is. We are thus forced to assume the effect is real, and that FM noise is a questionable cryptographic source.

In addition, it is common to suggest decimating the noise data, by using, for example, just a single bit of each sampled value. Using the lsb (least-significant bit) is probably not a good idea, due to nonlinearities in the sound card ADC (Analog-to-Digital Converter). But decimation of any sort does not fix the data; it just hides problems from ordinary statistical tests, so they can no longer testify about the quality of the generator.

I claim we want a noise generator to closely represent the signal we expect from a truly random quantum process. Only then can we be sure the data have a primarily quantum origin. After that, we can work to flatten the distribution and improve the entropy concentration by subsequent processing. But first, we need a good generator.




Subject: Re: Hardware RNGs Date: 2 Oct 2000 16:40:10 -0500 From: Zonn Message-ID: ihuhtsgts6t9duiqe92db8mm5ngnmi3i7c@4ax.com References: 8r51rd$6i5$1@news.cis.ohio-state.edu 8r50a6$v1u$1@nnrp1.deja.com Newsgroups: sci.crypt.random-numbers Lines: 35

On 30 Sep 2000 15:45:49 GMT, in sci.crypt.random-numbers, carroll@cis.ohio-state.edu (Mark Carroll) wrote:

In article 8r50a6$v1u$1@nnrp1.deja.com, tr0pical@my-deja.com wrote:

In article 8qaofc$6f0$1@news.cis.ohio-state.edu, carroll@cis.ohio-state.edu (Mark Carroll) wrote:

Can anyone recommend any relatively cheap RNG cards for Intel boxes?

Why not use the Intel Pentium time stamp instruction to time interrupt 13 instead? Do you need more than 2 million bits a minute?

I want something that gives me fairly raw data from some physical stochastic phenomenon. I'm not convinced about the entropy from IRQ13 timing.

A simple way to generate truly random bits is to use an FM tuner tuned to an off station and sample its "hiss" with a sound card. Set to 16-bit samples and only use bit-0 (the least significant bit) and sample at a rate lower than the upper frequency response of both the tuner and sound card. 1 kHz and below are fine, since even the cheapest of radios and sound cards will pass this frequency. The volume control of the sound card should be set so that the hiss sampled is about 3/4 of maximum range. You never want the hiss to be "clipped" by the sound card since this would give you runs of zeros and ones.
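A minimal sketch of the bit-0 extraction described above, assuming the 16-bit samples have already been read into an array; the function name `pack_bit0` is hypothetical, and nothing here addresses whether the resulting bits are actually unpredictable.

```python
import numpy as np

def pack_bit0(samples):
    """Keep only bit 0 of each 16-bit sample and pack 8 bits per byte."""
    s = np.asarray(samples, dtype=np.int16)
    bits = (s & 1).astype(np.uint8)      # least-significant bit of each sample
    usable = len(bits) - len(bits) % 8   # drop a trailing partial byte
    # np.packbits puts the first bit in the most-significant position
    return np.packbits(bits[:usable]).tobytes()
```

Each output byte therefore consumes eight input samples, so a 1 kHz sample rate would yield about 125 bytes per second.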

This is a simple "quick and dirty" method of generating random bits. In production it would be more realistic to replace the FM tuner with the noise generated by a leaky transistor or the noise generated across a zener diode, or similar circuitry.

Of course in reality there are no "noise free" sound cards, so you could just use the background thermal noise, on the sound card alone, to generate random bits if you stick with using only bit-0. Set the volume to max, set the input to "mic" and sample the "noise" of the input preamps.

-Zonn


Subject: Re: Hardware RNGs Date: 2 Oct 2000 21:48:54 GMT From: carroll@cis.ohio-state.edu (Mark Carroll) Message-ID: 8ravs6$1ug$1@news.cis.ohio-state.edu References: ihuhtsgts6t9duiqe92db8mm5ngnmi3i7c@4ax.com Newsgroups: sci.crypt.random-numbers Lines: 3

Thanks very much - that's food for thought. (-:

-- Mark


Subject: Re: Hardware RNGs Date: Tue, 03 Oct 2000 02:58:06 GMT From: ritter@io.com (Terry Ritter) Message-ID: 39d94a8e.3018968@news.io.com References: ihuhtsgts6t9duiqe92db8mm5ngnmi3i7c@4ax.com Newsgroups: sci.crypt.random-numbers Lines: 44

On 2 Oct 2000 16:40:10 -0500, in ihuhtsgts6t9duiqe92db8mm5ngnmi3i7c@4ax.com, in sci.crypt.random-numbers Zonn wrote:

[...] A simple way to generate truly random bits is to use an FM tuner tuned to an off station and sample its "hiss" with a sound card. Set to 16-bit samples and only use bit-0 (the least significant bit) and sample at a rate lower than the upper frequency response of both the tuner and sound card. 1 kHz and below are fine, since even the cheapest of radios and sound cards will pass this frequency. The volume control of the sound card should be set so that the hiss sampled is about 3/4 of maximum range. You never want the hiss to be "clipped" by the sound card since this would give you runs of zeros and ones.

Some time ago I recorded noise from various sources and analyzed the results in detail with extensive statistics and various graphs. See, for example:

http://www.io.com/~ritter/NOISE/NOISCHAR.HTM

which is the characterization of the noise recordings from the descriptions and schematics in:

http://www.io.com/~ritter/NOISE/NOISRC.HTM

As one source, I recorded the noise from a pair of FM headphones, in mono mode, inside a ferrous and conductive tea can with only an RCA jack conducting audio to the outside. Interestingly, analysis showed unexpected and pronounced autocorrelations in the noise data. See, in particular:

http://www.io.com/~ritter/NOISE/FM1ME904.HTM

Note well the autocorrelation graph.

It is easy to think that noise per se must be unpredictable. But after substantial detailed analysis, my experience is that good noise is harder to find than one might think.


Terry Ritter ritter@io.com http://www.io.com/~ritter/ Crypto Glossary http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: Hardware RNGs Date: 3 Oct 2000 09:50:19 -0500 From: hrubin@odds.stat.purdue.edu (Herman Rubin) Message-ID: 8rcrnb$1p2g@odds.stat.purdue.edu References: 39d94a8e.3018968@news.io.com Newsgroups: sci.crypt.random-numbers Lines: 71

In article 39d94a8e.3018968@news.io.com, Terry Ritter ritter@io.com wrote:

On 2 Oct 2000 16:40:10 -0500, in ihuhtsgts6t9duiqe92db8mm5ngnmi3i7c@4ax.com, in sci.crypt.random-numbers Zonn wrote:

[...] A simple way to generate truly random bits is to use an FM tuner tuned to an off station and sample its "hiss" with a sound card. Set to 16-bit samples and only use bit-0 (the least significant bit) and sample at a rate lower than the upper frequency response of both the tuner and sound card. 1 kHz and below are fine, since even the cheapest of radios and sound cards will pass this frequency. The volume control of the sound card should be set so that the hiss sampled is about 3/4 of maximum range. You never want the hiss to be "clipped" by the sound card since this would give you runs of zeros and ones.

Some time ago I recorded noise from various sources and analyzed the results in detail with extensive statistics and various graphs. See, for example:

http://www.io.com/~ritter/NOISE/NOISCHAR.HTM

which is the characterization of the noise recordings from the descriptions and schematics in:

http://www.io.com/~ritter/NOISE/NOISRC.HTM

As one source, I recorded the noise from a pair of FM headphones, in mono mode, inside a ferrous and conductive tea can with only an RCA jack conducting audio to the outside. Interestingly, analysis showed unexpected and pronounced autocorrelations in the noise data. See, in particular:

http://www.io.com/~ritter/NOISE/FM1ME904.HTM

Note well the autocorrelation graph.

It is easy to think that noise per se must be unpredictable. But after substantial detailed analysis, my experience is that good noise is harder to find than one might think.


Terry Ritter ritter@io.com http://www.io.com/~ritter/ Crypto Glossary http://www.io.com/~ritter/GLOSSARY.HTM

I am not at all surprised at this. While the oscillator is changing fairly rapidly, the other parameters of the circuit are changing much more slowly. This can introduce transient biases, which will show up as autocorrelations.

Even the much more robust counting of the parity of radioactive decays is not as free of error as one might want, although it is probably fine for cryptographic purposes; this has been mentioned in the simulation literature a long time ago. The problem is not caused by dead time, although this does provide deviations from what is wanted, but these can be made quite small, and in fact, improve performance if not too much. It is rather that the counter is not quite symmetrical, and has a slightly different chance of having flipped upon interrogation rather than having flopped. Getting the conditional probability that a bit will be 0 given all preceding bits to differ from .5 by about 1/10^4 should not be difficult, and the channel capacity for intercept information is a small multiple of 1/10^8.
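As a rough check on the figures above, under one reading of "channel capacity" as the entropy shortfall per bit when the probability of a 0 is 0.5 + eps: the shortfall is approximately (2 / ln 2) * eps^2 bits. The function names below are illustrative assumptions.

```python
import math

def binary_entropy(p):
    """Shannon entropy, in bits, of a bit that is 0 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def leak_per_bit(eps):
    """Entropy shortfall (bits) when P(bit = 0) = 0.5 + eps."""
    return 1.0 - binary_entropy(0.5 + eps)

if __name__ == "__main__":
    eps = 1e-4
    print(leak_per_bit(eps))               # exact shortfall
    print(2 * eps ** 2 / math.log(2))      # small-eps approximation
```

With eps = 1/10^4 both expressions come out a little under 3/10^8, which is consistent with "a small multiple of 1/10^8".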

This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399 hrubin@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558


Subject: Re: Hardware RNGs Date: 4 Oct 2000 16:31:04 -0500 From: Zonn zonn@zonn.com Message-ID: 9j3nts04uh79n5pm9tusume8cormqskqt8@4ax.com References: 39d94a8e.3018968@news.io.com Newsgroups: sci.crypt.random-numbers Lines: 83

On Tue, 03 Oct 2000 02:58:06 GMT, in msg 39d94a8e.3018968@news.io.com, ritter@io.com (Terry Ritter) wrote:

On 2 Oct 2000 16:40:10 -0500, in ihuhtsgts6t9duiqe92db8mm5ngnmi3i7c@4ax.com, in sci.crypt.random-numbers Zonn wrote:

[...] A simple way to generate truly random bits is to use an FM tuner tuned to an off station and sample its "hiss" with a sound card. Set to 16-bit samples and only use bit-0 (the least significant bit) and sample at a rate lower than the upper frequency response of both the tuner and sound card. 1 kHz and below are fine, since even the cheapest of radios and sound cards will pass this frequency. The volume control of the sound card should be set so that the hiss sampled is about 3/4 of maximum range. You never want the hiss to be "clipped" by the sound card since this would give you runs of zeros and ones.

Some time ago I recorded noise from various sources and analyzed the results in detail with extensive statistics and various graphs. See, for example:

http://www.io.com/~ritter/NOISE/NOISCHAR.HTM

which is the characterization of the noise recordings from the descriptions and schematics in:

http://www.io.com/~ritter/NOISE/NOISRC.HTM

As one source, I recorded the noise from a pair of FM headphones, in mono mode, inside a ferrous and conductive tea can with only an RCA jack conducting audio to the outside. Interestingly, analysis showed unexpected and pronounced autocorrelations in the noise data. See, in particular:

http://www.io.com/~ritter/NOISE/FM1ME904.HTM

There are many reasons not to use the direct samples from an FM radio, and certainly not by tapping off a headphone!

If the sample is taken directly across the headphones, you will have problems with the inductance of the headphones, which will skew your frequency response tests. (along with frequency response of the audio amplifiers, etc.)

Note well the autocorrelation graph.

It is easy to think that noise per se must be unpredictable. But after substantial detailed analysis, my experience is that good noise is harder to find than one might think.

There are many predictable things about FM noise, the most obvious being that it will only exist in the audio range so will roll off at around 15 kHz and 30 Hz (not counting de-emphasis built into the FM audio preamps). So using FM samples directly would be a poor choice of HRNG.

In all of your cited cases I think you will find a very random source of "bits" (though not full samples), if you were to take only bit-0 of your samples and concatenate them to any word size you desire (with the possible exception of the pseudo random source). Keep your sample rate within the high end frequency response of your source. You could then save these results to a .WAV file (using any fake sample rate you want) and run them through your same tests. You should see very different results.

This is the reason I used only the lower bit (bit-0) of 16-bit samples, and concatenated them to form the full RN. Even if the "noise" consisted of a pure 60 Hz sine wave (hardly noise), and you attempted to sample at 60 Hz, in the real world slight drifts (the crystal used to control the sample rate will drift with temperature, for example) will still cause bit-0 to twitter unpredictably.

To protect against even the above very unlikely scenario leading to long runs of 0's and 1's (for one, I would highly suggest NOT sampling at 60 Hz), you could use the values obtained from previously generated random numbers to add/sub from the sample rates so that sampling is done with a random bias.

Using only bit-0 also protects against DC offset errors in the sample circuitry, which always exist to some degree and cause full sample reads to be skewed positive or negative.

It would be interesting to see if you find any correlations in random numbers generated using only bit-0 of FM noise samples. I was unable to find any, though I didn't try particularly hard.

-Zonn


Subject: Re: Hardware RNGs Date: Wed, 04 Oct 2000 23:04:26 GMT From: ritter@io.com (Terry Ritter) Message-ID: 39dbb76e.7181177@news.io.com References: 9j3nts04uh79n5pm9tusume8cormqskqt8@4ax.com Newsgroups: sci.crypt.random-numbers Lines: 154

On 4 Oct 2000 16:31:04 -0500, in 9j3nts04uh79n5pm9tusume8cormqskqt8@4ax.com, in sci.crypt.random-numbers Zonn zonn@zonn.com wrote:

On Tue, 03 Oct 2000 02:58:06 GMT, in msg 39d94a8e.3018968@news.io.com, ritter@io.com (Terry Ritter) wrote:

On 2 Oct 2000 16:40:10 -0500, in ihuhtsgts6t9duiqe92db8mm5ngnmi3i7c@4ax.com, in sci.crypt.random-numbers Zonn wrote:

[...] A simple way to generate truly random bits is to use an FM tuner tuned to an off station and sample its "hiss" with a sound card. Set to 16-bit samples and only use bit-0 (the least significant bit) and sample at a rate lower than the upper frequency response of both the tuner and sound card. 1 kHz and below are fine, since even the cheapest of radios and sound cards will pass this frequency. The volume control of the sound card should be set so that the hiss sampled is about 3/4 of maximum range. You never want the hiss to be "clipped" by the sound card since this would give you runs of zeros and ones.

Some time ago I recorded noise from various sources and analyzed the results in detail with extensive statistics and various graphs. See, for example:

http://www.io.com/~ritter/NOISE/NOISCHAR.HTM

which is the characterization of the noise recordings from the descriptions and schematics in:

http://www.io.com/~ritter/NOISE/NOISRC.HTM

As one source, I recorded the noise from a pair of FM headphones, in mono mode, inside a ferrous and conductive tea can with only an RCA jack conducting audio to the outside. Interestingly, analysis showed unexpected and pronounced autocorrelations in the noise data. See, in particular:

http://www.io.com/~ritter/NOISE/FM1ME904.HTM

There are many reasons not to use the direct samples from an FM radio, and certainly not by tapping off a headphone!

If the sample is taken directly across the headphones, you will have problems with the inductance of the headphones, which will skew your frequency response tests. (along with frequency response of the audio amplifiers, etc.)

Note well the autocorrelation graph.

It is easy to think that noise per se must be unpredictable. But after substantial detailed analysis, my experience is that good noise is harder to find than one might think.

There are many predictable things about FM noise, the most obvious being that it will only exist in the audio range so will roll off at around 15 kHz and 30 Hz (not counting de-emphasis built into the FM audio preamps). So using FM samples directly would be a poor choice of HRNG.

In all of your cited cases I think you will find a very random source of "bits" (though not full samples), if you were to take only bit-0 of your samples and concatenate them to any word size you desire (with the possible exception of the pseudo random source).

That sort of approach is exactly what I have avoided, and it would prevent the sort of analysis we see on my pages. My analysis has revealed correlations -- faults -- which have not to my knowledge previously been documented. The analysis gives us a concrete way to distinguish between the quality of various noise sources.

Statistical tests are extremely useful in understanding flaws in noise generation, but statistical tests cannot certify data as being unpredictable. We also cannot verify or prove unpredictability. So, if we want unpredictable, in my view, the best approach is to start with the best noise source we can build, and go from there. Then we can hash those data appropriately and develop the necessary flat distribution.

It is easy enough to decimate any measured noise data and so confuse statistical analysis, but doing that does not make those sources any better. Most statistical tests are specifically designed to expose a particular kind of structure in integer or real data. Taking just a single bit converts the data into Bernoulli trials which require different tests and vastly more data to detect the original problem. It is not appropriate to decimate the data any way we want and then say: "Oh, look, the statistical tests can no longer find the problem, so we have made the data more random."
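To give a feel for "vastly more data", here is a back-of-envelope sketch under the assumption of independent bits and a simple frequency test. The function name `bits_needed` and the choice of three standard errors are illustrative assumptions, not a statement about any particular test suite.

```python
import math

def bits_needed(eps, z=3.0):
    """Approximate bit count before a bias of eps stands z standard errors from 0.5."""
    # The fraction of ones in n fair bits has standard error 0.5 / sqrt(n);
    # requiring eps >= z * 0.5 / sqrt(n) gives n >= (z / (2 * eps)) ** 2.
    return math.ceil((z / (2.0 * eps)) ** 2)

if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        print(eps, bits_needed(eps))
```

The required count grows as 1/eps^2, so a fault that is obvious in a graph of full-width samples can need hundreds of millions of single bits before a bit-level test notices it.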

Every time we convert an integer value to a Boolean, we discard massive amounts of data which could be used to understand noise structure, and by doing that we act to deceive ourselves.

Keep your sample rate within the high end frequency response of your source. You could then save these results to a .WAV file (using any fake sample rate you want) and run them through your same tests. You should see very different results.

It is easy to hide faults from statistical tests. We don't want to do that. We want to expose faults and fix them.

This is the reason I used only the lower bit (bit-0) of 16-bit samples, and concatenated them to form the full RN. Even if the "noise" consisted of a pure 60 Hz sine wave (hardly noise), and you attempted to sample at 60 Hz, in the real world slight drifts (the crystal used to control the sample rate will drift with temperature, for example) will still cause bit-0 to twitter unpredictably.

Not true. Sampling a sine wave is not unpredictable at all -- in fact it is completely predictable, except for noise.

Synchronization "jitter" is well-understood and non-random. Temperature variations are correctable and eventually predictable.

To protect against even the above very unlikely scenario leading to long runs of 0's and 1's (for one, I would highly suggest NOT sampling at 60 Hz), you could use the values obtained from previously generated random numbers to add/sub from the sample rates so that sampling is done with a random bias.

And then we would have yet another sort of pseudorandom number generator, the hardware analogy to a software RNG. As such it is completely predictable, except for noise.

Using only bit-0 also protects against DC offset errors in the sample circuitry, which always exist to some degree and cause full sample reads to be skewed positive or negative.

Sound cards do not pass DC signals. The intervening capacitors do protect against DC offset errors.

The more interesting effect is asymmetry in the waveform, which we can detect by computing the mean and noting that it is not zero. That effect, however, is real. We don't want to compensate for it, we want to fix it.

It would be interesting to see if you find any correlations in random numbers generated using only bit-0 of FM noise samples. I was unable to find any, though I didn't try particularly hard.

No sort of statistical test can certify randomness or unpredictability. Consequently, the best we can do is build the best possible noise source, and then -- after we test it to be the best noise source we can get -- process the noise into random values. When we test the resulting random values, what we are testing is our processing, and our ability to fool statistical tests.


Terry Ritter ritter@io.com http://www.io.com/~ritter/ Crypto Glossary http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: Hardware RNGs Date: 5 Oct 2000 17:39:08 -0500 From: Zonn zonn@zonn.com Message-ID: 3svptskdq8t5k85miedh64jb681e33h65d@4ax.com References: 39dbb76e.7181177@news.io.com Newsgroups: sci.crypt.random-numbers Lines: 112

On Wed, 04 Oct 2000 23:04:26 GMT, in sci.crypt.random-numbers, ritter@io.com (Terry Ritter) wrote:

In all of your cited cases I think you will find a very random source of "bits" (though not full samples), if you were to take only bit-0 of your samples and concatenate them to any word size you desire (with the possible exception of the pseudo random source).

That sort of approach is exactly what I have avoided, and it would prevent the sort of analysis we see on my pages. My analysis has revealed correlations -- faults -- which have not to my knowledge previously been documented. The analysis gives us a concrete way to distinguish between the quality of various noise sources.

Statistical tests are extremely useful in understanding flaws in noise generation, but statistical tests cannot certify data as being unpredictable. We also cannot verify or prove unpredictability. So, if we want unpredictable, in my view, the best approach is to start with the best noise source we can build, and go from there. Then we can hash those data appropriately and develop the necessary flat distribution.

It is easy enough to decimate any measured noise data and so confuse statistical analysis, but doing that does not make those sources any better. Most statistical tests are specifically designed to expose a particular kind of structure in integer or real data. Taking just a single bit converts the data into Bernoulli trials which require different tests and vastly more data to detect the original problem. It is not appropriate to decimate the data any way we want and then say: "Oh, look, the statistical tests can no longer find the problem, so we have made the data more random."

Every time we convert an integer value to a Boolean, we discard massive amounts of data which could be used to understand noise structure, and by doing that we act to deceive ourselves.

The data being discarded was data that was being skewed by the filters, amps, etc, that stood between me and the original noise source which does not have structure.

Mark's original request was for a source of random bits with the request:

"I'm particularly interested in ones that generate numbers based on some physical stochastic phenomenon..."

My point with the FM, and using the lowest bit (which definitely has its own disadvantages -- see some other follow-ups to this thread), is that, assuming the sampling hardware is working properly, you will never come up with a formula that will predict what the next bit value will be.

The unpredictability of thermal noise and background radiation is very real, and absolutely unpredictable (and hopefully not in question here.)

What your measurements are measuring is not the "predictability" of the next sample, but the flaws of the hardware you're using to run the tests.

Thermodynamic noise is not audio, and it's not a 16-bit sample, and doesn't change at a constant 44.1 kHz rate, and can't directly be measured as such.

Take the classic example of radioactive decay. This has always been assumed to be a truly unpredictable event. Yet in order to use this phenomenon you have to build an apparatus to sample the event. Let's assume we'll measure the time between detected photons. In reality this is unpredictable, but in the real world, to measure it I must use a clock, and any clock I use will have a "tick". So now if I take the output of this device and look for correlations, the first thing I find is that detections only take place on "tick" boundaries. So if I look for photons once every microsecond, then my sample is going to have an artificial 1 MHz signal running through it. It's an unavoidable sampling error.

In all your measurements, with the exception of the LFSR, you are sampling truly random thermodynamic events. And in each case what you have measured is flaws in the "random event to digital" interface. You have artificially converted the instantaneous unpredictable nature of the instantaneous voltage across a transistor, zener diode, FM hiss, etc. to a 44.1 kHz 16-bit sample. And then have run correlations on this data. What you haven't done is try to predict the direction of a voltage change depending upon the previous runs of voltage changes.

A properly working 5 volt zener diode is going to have an average value of 5 volts, which is certainly NOT random! Yet the instantaneous, absolute voltage (to an infinite number of decimal places), is based on the tunneling effects of protons/electrons in the zener diode. This is as unpredictable as radioactive decay -- regardless of how you measure it.

The studies you've done are very useful in searching for an audio source of noise, for testing speakers, mics, room acoustics, etc. But what they are measuring are the "random voltage to sound source" interface. Not the predictability of the noise source -- which would be a matter best taken to rec.quantum.mechanics (or something like that ;^)

What I believe Mark was looking for was a cheap and dirty random bit generator, and not a noise source.

By using, what I might have naively considered, the bit in the ADC that most closely follows the current "hiss" voltage, I purposely tried to remove flaws in the hardware sampling of the random voltage. By as you say "decimating the noise" I was not trying to make the data "more random". That's impossible, the background radiation and thermodynamic unpredictability of FM hiss, is not in question here. This is a truly random event, and if it were to be measured otherwise, then the measurement apparatus must be brought into question.

A better way to capture the unpredictability of the FM hiss, or the voltage across a zener diode, might be a fast comparator that compares the current, absolute voltage, with a running average of the voltage, but regardless, when done right, these are truly unpredictable voltage sources, and can be used as a source of truly random bits. Though designing the perfect sampling hardware may be near impossible!
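A software stand-in for the comparator idea above might look like the following sketch. It assumes already-digitized samples rather than a real analog comparator, and the exponential running average, the `alpha` parameter, and the name `comparator_bits` are all illustrative choices not specified in the post.

```python
def comparator_bits(samples, alpha=0.01):
    """Emit 1 when a sample exceeds an exponential running average, else 0."""
    it = iter(samples)
    try:
        avg = float(next(it))     # seed the average with the first sample
    except StopIteration:
        return []
    bits = []
    for v in it:
        bits.append(1 if v > avg else 0)
        avg += alpha * (v - avg)  # update the running average after comparing
    return bits
```

Comparing against a running average, rather than a fixed threshold, lets the decision level follow slow drift in the source.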

As far as good audio noise sources go, long run LFSRs work rather well, as shown to be the case with your tests!

-Zonn


Subject: Re: Hardware RNGs Date: Fri, 06 Oct 2000 05:04:16 GMT From: ritter@io.com (Terry Ritter) Message-ID: 39dd5c3a.5244885@news.io.com References: 3svptskdq8t5k85miedh64jb681e33h65d@4ax.com Newsgroups: sci.crypt.random-numbers Lines: 248

On 5 Oct 2000 17:39:08 -0500, in 3svptskdq8t5k85miedh64jb681e33h65d@4ax.com, in sci.crypt.random-numbers Zonn zonn@zonn.com wrote:

On Wed, 04 Oct 2000 23:04:26 GMT, in sci.crypt.random-numbers, ritter@io.com (Terry Ritter) wrote:

In all of your cited cases I think you will find a very random source of "bits" (though not full samples), if you were to take only bit-0 of your samples and concatenate them to any word size you desire (with the possible exception of the pseudo random source).

That sort of approach is exactly what I have avoided, and it would prevent the sort of analysis we see on my pages. My analysis has revealed correlations -- faults -- which have not to my knowledge previously been documented. The analysis gives us a concrete way to distinguish between the quality of various noise sources.

Statistical tests are extremely useful in understanding flaws in noise generation, but statistical tests cannot certify data as being unpredictable. We also cannot verify or prove unpredictability. So, if we want unpredictable, in my view, the best approach is to start with the best noise source we can build, and go from there. Then we can hash those data appropriately and develop the necessary flat distribution.

It is easy enough to decimate any measured noise data and so confuse statistical analysis, but doing that does not make those sources any better. Most statistical tests are specifically designed to expose a particular kind of structure in integer or real data. Taking just a single bit converts the data into Bernoulli trials which require different tests and vastly more data to detect the original problem. It is not appropriate to decimate the data any way we want and then say: "Oh, look, the statistical tests can no longer find the problem, so we have made the data more random."

Every time we convert an integer value to a Boolean, we discard massive amounts of data which could be used to understand noise structure, and by doing that we act to deceive ourselves.

The data being discarded was data that was being skewed by the filters, amps, etc, that stood between me and the original noise source which does not have structure.

Mark's original request was for a source of random bits with the request:

"I'm particularly interested in ones that generate numbers based on some physical stochastic phenomenon..."

My point with the FM, and using the lowest bit (which definitely has its own disadvantages -- see some other follow-ups to this thread), is that, assuming the sampling hardware is working properly, you will never come up with a formula that will predict what the next bit value will be.

First of all, my response was that we now have solid experimental evidence that FM noise can have significant long-term correlation structure which makes it non-random.

Next, your claim is a red herring: The real test is whether one can predict the next bit with better than .5 probability. And that will happen whenever any sort of repeatable statistical fault occurs.
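A toy illustration of the "better than .5" criterion: the trivial predictor below guesses that each bit repeats the previous one and reports how often it is right. The synthetic correlated source (sign bits of a 3-sample moving average of Gaussian noise) and the name `repeat_last_hit_rate` are assumptions made for the sketch, not measurements from the FM recordings.

```python
import numpy as np

def repeat_last_hit_rate(bits):
    """Fraction of bits guessed correctly by predicting 'same as previous bit'."""
    b = np.asarray(bits, dtype=np.uint8)
    return float(np.mean(b[1:] == b[:-1]))

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    white = rng.normal(size=20_000)
    # Correlated stand-in: each value shares two terms with its neighbour
    smooth = np.convolve(white, np.ones(3) / 3, mode="valid")
    print(repeat_last_hit_rate(white > 0))    # uncorrelated sign bits
    print(repeat_last_hit_rate(smooth > 0))   # correlated sign bits
```

No formula for the next bit is needed here; a repeatable correlation alone is enough to push the hit rate away from one half.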

The unpredictability of thermal noise and background radiation is very real, and absolutely unpredictable (and hopefully not in question here.)

The statement is false. Certainly the thermal noise is unknowable, but it is likely to have a theoretical distribution which is not flat. And anything other than a flat distribution is, to some extent, predictable.

But the usual problem is that the machine which measures this supposedly random signal does so in an imperfect manner. Simply invoking a theoretically random source has little or nothing to do with the randomness of the measured result.

What your measurements are measuring is not the "predictability" of the next sample, but the flaws of the hardware you're using to run the tests.

As I concluded on that page, that FM radio, measured at the earphone, is not a good noise source.

But one might well have thought that radio would be a good cheap noise source, until I actually conducted the experiment and showed that it was not.

And if one uses any other FM source, one should worry about possible correlations there as well. Using FM noise was your suggestion.

Thermodynamic noise is not audio, and it's not a 16-bit sample, and doesn't change at a constant 44.1 kHz rate, and can't directly be measured as such.

Thermal noise exists only in theory until a machine detects it, but when that happens noise certainly can be audio, which can be approximated to CD fidelity by 16-bit samples at CD rate.

Recorded digital noise can be very good noise indeed, and we only need look at the graphic difference between measured good noise and measured bad noise to see how good recorded noise can be.

I note that a Fast Fourier Transform (FFT) used to show the frequency response of recorded data inherently requires sampled data.

Take the classic example of radioactive decay. This has always been assumed to be a truly unpredictable event. Yet in order to use this phenomenon you have to build an apparatus to sample the event. Let's assume we'll measure the time between detected protons. In reality this is unpredictable, but in the real world, to measure this I must use a clock, and any clock I use will have a "tick". So now if I take the output of this device and look for correlations, the first thing I find is that detections only take place on "tick" boundaries. So if I look for photons once every microsecond, then my sample is going to have an artificial 1 MHz signal running through it. It's an unavoidable sampling error.

First of all, radioactive decay is only random and unpredictable in a casual sense: In fact, we expect the delays between detected events to have a Poisson distribution. Since that is not a flat distribution, some delay values are more likely than either shorter or longer values. It is only after we process and flatten the distribution that we can begin to think about real unpredictability.

The issues involved in the digital sampling of analog signals are well known. Many problems can be discussed. Simply interpreting FFT results involves similar problems. But the measured data clearly do reflect reality, despite such sampling.

In all your measurements, with the exception of the LFSR, you are sampling truly random thermodynamic events. And in each case what you have measured is flaws in the "random event to digital" interface.

Sure. I measure the extent to which particular machine designs expose thermal noise. That is different than, say, simply claiming such sources must be random and therefore unpredictable.

You have artificially converted the instantaneous unpredictable nature of the instantaneous voltage across a transistor, zener diode, FM hiss, etc., to a 44.1 kHz 16-bit sample.

And that is exactly what all digital recording does.

And then you have run correlations on this data. What you haven't done is try to predict the direction of a voltage change depending upon the previous runs of voltage changes.

What you haven't done is to understand that correlations are those predictions. They represent the extent to which a later sample will reflect the value of an earlier sample. They demonstrate that noise samples are not necessarily independent.
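The link between correlation and prediction can be sketched in a few lines of Python. This is only an illustration on synthetic data, not the measured FM recordings: it assumes a first-order autoregressive Gaussian sequence with a hypothetical lag-1 correlation `RHO`, and all function names are invented for the example.

```python
import math
import random

RHO = 0.3      # hypothetical lag-1 correlation of the synthetic noise
N = 50_000     # number of synthetic samples

def ar1_samples(n, rho, seed=1):
    """Synthetic correlated noise: x[t] = rho * x[t-1] + innovation."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)   # keeps the variance near 1
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + scale * rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def autocorrelation(xs, lag):
    """Sample autocorrelation coefficient at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((v - mean) ** 2 for v in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var

def sign_hit_rate(xs):
    """Guess that the next sample has the same sign as the current one."""
    hits = sum(1 for a, b in zip(xs, xs[1:]) if (a >= 0) == (b >= 0))
    return hits / (len(xs) - 1)

samples = ar1_samples(N, RHO)
r1 = autocorrelation(samples, 1)
hit_rate = sign_hit_rate(samples)
print(f"lag-1 autocorrelation:    {r1:.3f}")
print(f"sign prediction hit rate: {hit_rate:.3f}")
```

With a modest correlation of 0.3, the trivial "same sign as last time" guess should succeed noticeably more often than half the time (for a Gaussian sequence, theory suggests roughly 0.6). No formula for the next value is needed; the correlation itself is the predictor.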

A properly working 5 volt zener diode is going to have an average value of 5 volts, which is certainly NOT random! Yet the instantaneous, absolute voltage (to an infinite number of decimal places), is based on the tunneling effects of protons/electrons in the zener diode. This is as unpredictable as radioactive decay -- regardless of how you measure it.

But unless one can expose randomness, it cannot be used. And if the ways which are commonly used to produce random values have repeatable correlations, it is false to think that the samples are independent. Samples which are not independent are not random.

The studies you've done are very useful in searching for an audio source of noise, for testing speakers, mics, room acoustics, etc. But what they are measuring is the "random voltage to sound source" interface. Not the predictability of the noise source -- which would be a matter best taken to rec.quantum.mechanics (or something like that ;^)

I would say that anything which cannot be measured is not a real issue for those who want to use the noise.

What I believe Mark was looking for was a cheap and dirty random bit generator, and not a noise source.

A good inexpensive noise source can be built cheaply. Sampling that noise can produce random-like noise data. When properly processed, that data can be a fairly cheap and good random generator -- far better than cheap and dirty.

By using, what I might have naively considered, the bit in the ADC that most closely follows the current "hiss" voltage, I purposely tried to remove flaws in the hardware sampling of the random voltage. By as you say "decimating the noise" I was not trying to make the data "more random". That's impossible, the background radiation and thermodynamic unpredictability of FM hiss, is not in question here. This is a truly random event, and if it were to be measured otherwise, then the measurement apparatus must be brought into question.

Indeed -- that is my entire point: Some machines intended to detect noise do not do it well at all. One can easily claim that they must be "unpredictable" because they supposedly expose unpredictable quantum events. Alas, such claims are often false. In particular, the claim that one can just use FM noise is seriously questionable.

Decimated or Boolean data are not appropriate to be accumulated and used as "values" in common statistical tests. Most such tests want to see entire integer or real values representing sampled values. Problems which are easily detected by normal statistical tests are hidden when we sample only a bit. But that does not make the generator better, or the bit output better, it just fools the usual tests and so leads to misguided confidence in the result. Special tests and huge numbers of samples probably will expose the original error.
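How a one-bit view can lose diagnostic information can be sketched as follows. The data are synthetic and the parameters (`rho`, `sigma`) are hypothetical; the sketch runs the same lag-1 correlation statistic on the full integer samples and on their least-significant bits only.

```python
import math
import random

def correlated_samples(n, rho=0.9, sigma=2000.0, seed=7):
    """Strongly correlated Gaussian noise, rounded to integers as an ADC would."""
    rng = random.Random(seed)
    scale = sigma * math.sqrt(1.0 - rho * rho)
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + scale * rng.gauss(0.0, 1.0)
        out.append(int(round(x)))
    return out

def lag1_correlation(xs):
    """Sample correlation between each value and the one that follows it."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((v - mean) ** 2 for v in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

values = correlated_samples(20_000)
lsbs = [v & 1 for v in values]       # keep only the low-order bit

full_corr = lag1_correlation(values)
lsb_corr = lag1_correlation(lsbs)
print(f"full samples: {full_corr:.3f}")
print(f"low bit only: {lsb_corr:.3f}")
```

The full-value statistic reports the strong structure in the source, while the one-bit statistic reports essentially nothing. The sketch shows only that the test has lost its ability to see the source; it makes no claim about whether the retained bit is usable.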

A better way to capture the unpredictability of the FM hiss, or the voltage across a zener diode, might be a fast comparator that compares the current, absolute voltage, with a running average of the voltage, but regardless, when done right, these are truly unpredictable voltage sources, and can be used as a source of truly random bits. Though designing the perfect sampling hardware may be near impossible!
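A software stand-in for the comparator idea might look like the sketch below. It is not a circuit design: the exponential running average, the smoothing factor `alpha`, and the simulated 5.0 V level with Gaussian noise are all assumptions made for illustration.

```python
import random

def comparator_bits(samples, alpha=0.01):
    """Emit 1 when a sample exceeds an exponential running average, else 0."""
    average = samples[0]
    bits = []
    for v in samples[1:]:
        bits.append(1 if v > average else 0)
        average += alpha * (v - average)   # update the average after comparing
    return bits

rng = random.Random(3)
# Stand-in for a digitized noise voltage: Gaussian noise on a 5.0 V DC level.
voltages = [5.0 + rng.gauss(0.0, 0.01) for _ in range(20_000)]
bits = comparator_bits(voltages)
ones_fraction = sum(bits) / len(bits)
print(f"fraction of ones: {ones_fraction:.3f}")
```

Tracking the average removes the DC level, so ones and zeros come out roughly balanced. Balance alone says nothing about correlation between successive bits, which would have to be tested separately.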

Perhaps the major point of my work -- actual devices actually built and actually measured and actually analyzed instead of being handwaved and claimed -- is to confront the widespread belief that good random noise is easy to get. The variety of sources and circuits with measurable problems shows that belief is false.

More than that, reporting noise as a Boolean signal complicates our ability to detect and understand faults in that generator. If one desires unpredictability, that is a fundamentally dangerous approach.

As far as good audio noise sources go, long run LFSRs work rather well, as shown to be the case with your tests!

But the LFSR measurement calibrated the tests and computations. Since we use the exact same tests in each case, when we find unexpected results, we can have good confidence that those results are from the data, and not peculiarities in the tests.

For example, since the soundcard sampling rate was in no way related to the internal LFSR oscillator and bit output, we know from the good LFSR results that sampling per se is not a significant issue in these noise tests.


Terry Ritter ritter@io.com http://www.io.com/~ritter/ Crypto Glossary http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: Hardware RNGs Date: Fri, 06 Oct 2000 06:49:43 -0700 From: Mark Johnson mark@matrixsemi.com Message-ID: 39DDD877.9C3731AE@matrixsemi.com References: 39dd5c3a.5244885@news.io.com Newsgroups: sci.crypt.random-numbers Lines: 23

There's an interesting paper by Vazirani that shows how to construct arbitrarily good random bitsequences starting from only-slightly-random bitsources.

In essence, you take a large number N of independent, slightly-random bitsources (Vazirani suggests using Zener diodes and 1-bit A-to-D's for these) and add their sequences (modulo 2). He goes on to give some proofs -- as they always do in the IEEE Foundations of Computer Science proceedings -- of the unpredictability of the next bit, as a function of the entropy of the original sources.

Part of the appeal is the amazingly low cost of the hardware: each bitgenerator is 4 discrete parts and one chip (Zener diode, 2 resistors, 1 capacitor, 1 IC) and the modulo-2 adder is a dirt cheap single chip that can accept 9 bitgenerator inputs (74LS280). The parts cost of a Vazirani generator is about ($3.00 * N) where N is the number of independent bitgenerators built and summed modulo-2.
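The effect of the modulo-2 sum can be illustrated with a much simpler model than the one in the paper: independent bits that each have a fixed bias. Under that assumption (and only under it), the bias of the XOR shrinks geometrically with the number of sources. The value `P_SINGLE` and the function names below are hypothetical.

```python
import random
from functools import reduce
from operator import xor

def xor_one_probability(ps):
    """P(XOR of independent bits = 1), given each bit's probability of being 1."""
    product = 1.0
    for p in ps:
        product *= 1.0 - 2.0 * p
    return (1.0 - product) / 2.0

def simulate_xor(ps, trials, seed=11):
    """Estimate the same probability by drawing independent biased bits."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(trials):
        bits = [1 if rng.random() < p else 0 for p in ps]
        ones += reduce(xor, bits)
    return ones / trials

P_SINGLE = 0.6    # hypothetical: each source emits 1 with probability 0.6
for n in (1, 3, 9):
    predicted = xor_one_probability([P_SINGLE] * n)
    measured = simulate_xor([P_SINGLE] * n, 100_000)
    print(n, f"{predicted:.7f}", f"{measured:.4f}")
```

Nine sources matches the input count of the parity chip mentioned above. The result depends entirely on the independence assumption; sources that are correlated with each other are outside what this sketch models.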


Subject: Re: Hardware RNGs Date: Fri, 06 Oct 2000 18:59:25 GMT From: ritter@io.com (Terry Ritter) Message-ID: 39de207b.10154345@news.io.com References: 39DDD877.9C3731AE@matrixsemi.com Newsgroups: sci.crypt.random-numbers Lines: 88

On Fri, 06 Oct 2000 06:49:43 -0700, in 39DDD877.9C3731AE@matrixsemi.com, in sci.crypt.random-numbers Mark Johnson mark@matrixsemi.com wrote:

There's an interesting paper by Vazirani that shows how to construct arbitrarily good random bitsequences starting from only-slightly-random bitsources.

That is a well-known paper. I have included it in my literature survey on random number machines:

http://www.io.com/~ritter/RES/RNGMACH.HTM#Santha86

and in my 1991 Cryptologia crypto RNG survey:

http://www.io.com/~ritter/ARTS/CRNG2ART.HTM#Sect5.4.1

And on my pages I have archived some 1990 sci.crypt discussions about it:

http://www.io.com/~ritter/REALRAND/REALRAND.HTM#RandGen

In essence, you take a large number N of independent, slightly-random bitsources (Vazirani suggests using Zener diodes and 1-bit A-to-D's for these) and add their sequences (modulo 2). He goes on to give some proofs -- as they always do in the IEEE Foundations of Computer Science proceedings -- of the unpredictability of the next bit, as a function of the entropy of the original sources.

I would call the exclusive-OR stuff "processing," and it is one of many approaches which could be used, but which I would apply only after obtaining the best approach to theoretical randomness I could get.

I'm not the guy to address the structure of that math. Nevertheless, I remain disturbed by the possibility of correlations existing in each noise stream, thus adding in the same way no matter how many exclusive-OR's there are. And in practice, it has been suggested (and I think it reasonable) that the zener devices will tend to not be independent unless very strongly isolated.

But the real problem, for me, is that the approach hides defects in the noise by producing only a single bit output. We thus rely on a mathematical model to assure us that our basis for randomness is sufficient, yet that model in no way reflects the many complexities of real diodes and real electronic machines in practice.

Part of the appeal is the amazingly low cost of the hardware: each bitgenerator is 4 discrete parts and one chip (Zener diode, 2 resistors, 1 capacitor, 1 IC) and the modulo-2 adder is a dirt cheap single chip that can accept 9 bitgenerator inputs (74LS280). The parts cost of a Vazirani generator is about ($3.00 * N) where N is the number of independent bitgenerators built and summed modulo-2.

Actually, there is a real problem proving that the simple approach "works." Oh, we get "random-like" data all right, but do we have correlations? Do we set up appropriate tests which could detect them? Generally not. But we can use normal statistical tests very effectively on sampled integer values.

If we work directly on the low-level unamplified noise, the normal noise of the chip might be drowning the supposed zener noise source. That might not be too bad, except that we then depend upon a value which manufacturers try to reduce.

Then we have the Boolean output itself. That is not really a "bit," because there is no information about when that level will be sampled. Sampling is itself a problem, because ordinary digital circuits have both "setup" and "hold" time requirements which could and probably would be violated in this use. Moreover, the XOR process increases the number of edges, which increases that problem.

There is more complexity here than one might at first think.

As much as anything, my work with noise sources is an attempt to escape these problems.


Terry Ritter ritter@io.com http://www.io.com/~ritter/ Crypto Glossary http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: Hardware RNGs Date: 5 Oct 2000 01:14:58 GMT From: korpela@ellie.ssl.berkeley.edu (Eric J. Korpela) Message-ID: 8rgkmi$app$1@agate.berkeley.edu References: 9j3nts04uh79n5pm9tusume8cormqskqt8@4ax.com Newsgroups: sci.crypt.random-numbers Lines: 24

In article 9j3nts04uh79n5pm9tusume8cormqskqt8@4ax.com, Zonn zonn@zonn.com wrote:

Using only bit-0 also protects against DC offset errors in the sample circuitry which always exists to some degree, and causes full sample reads to be skewed positive or negative.

It would be interesting to see if you find any correlations in random numbers generated using only bit-0 of FM noise samples. I was unable to find any, though I didn't try particularly hard.

There will always be autocorrelations in digitized samples due to non-linearities in the ADC. DC offsets are the least of your worries. Hook a ramp generator into your sound card and sample at constant intervals that are very small compared to the ramp time. Then add up how many times each output value occurs. I can guarantee that the bin to bin variations will be much larger than you expect. With ADCs there are always preferred output values.
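The ramp test described above can be simulated. The sketch assumes a hypothetical 6-bit converter in which even codes are 20% narrow and odd codes are 20% wide; real converters have their own, less regular, error patterns. It feeds a slow linear ramp through the converter, counts how often each output code occurs, and reports the deviation of each bin from the ideal count.

```python
import bisect
from collections import Counter

CODES = 64                      # hypothetical 6-bit converter, small for clarity

def build_thresholds(widths):
    """Upper edge of each code bin, from the per-code widths."""
    edges, total = [], 0.0
    for w in widths:
        total += w
        edges.append(total)
    return edges

def convert(voltage, edges):
    """Return the output code whose bin contains the input voltage."""
    return min(bisect.bisect_right(edges, voltage), len(edges) - 1)

# Assumed flaw: even codes are 20% narrow, odd codes are 20% wide.
widths = [0.8 if code % 2 == 0 else 1.2 for code in range(CODES)]
edges = build_thresholds(widths)
full_scale = edges[-1]

# Slow linear ramp: many evenly spaced samples across the whole input range.
steps = 64_000
counts = Counter(convert(full_scale * (i + 0.5) / steps, edges) for i in range(steps))

ideal = steps / CODES
dnl = [counts[code] / ideal - 1.0 for code in range(CODES)]
odd_fraction = sum(counts[c] for c in range(1, CODES, 2)) / steps
print("bin deviation, codes 0..3:", [round(d, 3) for d in dnl[:4]])
print("fraction of odd codes:", round(odd_fraction, 3))
```

In this model a perfectly uniform input still yields an odd output code 60% of the time, so the low-order bit is biased by the converter alone, before any property of the noise source enters.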

Eric

Eric Korpela | An object at rest can never be stopped.
korpela@ssl.berkeley.edu


Subject: Re: Hardware RNGs Date: Thu, 05 Oct 2000 04:03:01 GMT From: ritter@io.com (Terry Ritter) Message-ID: 39dbfd25.1820931@news.io.com References: 8rgkmi$app$1@agate.berkeley.edu Newsgroups: sci.crypt.random-numbers Lines: 54

On 5 Oct 2000 01:14:58 GMT, in 8rgkmi$app$1@agate.berkeley.edu, in sci.crypt.random-numbers korpela@ellie.ssl.berkeley.edu (Eric J. Korpela) wrote:

In article 9j3nts04uh79n5pm9tusume8cormqskqt8@4ax.com, Zonn zonn@zonn.com wrote:

Using only bit-0 also protects against DC offset errors in the sample circuitry which always exists to some degree, and causes full sample reads to be skewed positive or negative.

It would be interesting to see if you find any correlations in random numbers generated using only bit-0 of FM noise samples. I was unable to find any, though I didn't try particularly hard.

There will always be autocorrelations in digitized samples due to non-linearities in the ADC.

We can (to some extent) expose systematic testing problems. What we do is to look at the results and see how some sources do much better than others on the exact same tests. In particular,

http://www.io.com/~ritter/NOISE/DIG1PIO8.HTM

describes tests on noise data from a MM5837 Digital Noise Generator. While this is an LFSR, and not a really-random noise generator, the results do demonstrate that the test system is not the source of the autocorrelation structure seen in the FM data. We can also see what especially low autocorrelation looks like.
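For readers unfamiliar with the term, an LFSR can be sketched in a few lines. The register length and taps of the MM5837 are not given here, so the 16-bit register below, with the commonly published maximal-length taps 16, 14, 13, 11, is a stand-in chosen for illustration and not a model of that chip.

```python
from itertools import islice

def step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one shift."""
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (feedback << 15)

def lfsr16_bits(seed=0xACE1):
    """Yield the output bit stream of the register, starting from seed."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("an all-zero state never changes")
    while True:
        yield state & 1
        state = step(state)

def lfsr16_period(seed=0xACE1):
    """Count shifts until the register returns to its starting state."""
    start = seed & 0xFFFF
    state, steps = step(start), 1
    while state != start:
        state = step(state)
        steps += 1
    return steps

period = lfsr16_period()
ones_in_period = sum(islice(lfsr16_bits(), period))
print("period:", period)
print("ones per period:", ones_in_period)
```

The sequence is completely deterministic, yet over a full period its ones and zeros are almost exactly balanced, which is why such a generator makes a convenient reference signal for checking a test setup.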

DC offsets are the least of your worries. Hook a ramp generator into your sound card and sample at constant intervals that are very small compared to the ramp time. Then add up how many times each output value occurs. I can guarantee that the bin to bin variations will be much larger than you expect. With ADCs there are always preferred output values.

It is true that none of my experiments included a linear ramp.

Nevertheless, we do have numerous examples of counting sample values, since this is what is done to create the Normal Graph on each analysis page. Some of these graphs are remarkably good; see, for example:

http://www.io.com/~ritter/NOISE/ZCN161T8.HTM

So unless we believe this happens by accident, we are pretty well forced to believe that the measurement system is working sufficiently well to properly analyze the data.


Terry Ritter ritter@io.com http://www.io.com/~ritter/ Crypto Glossary http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: Hardware RNGs Date: 5 Oct 2000 23:09:57 GMT From: korpela@ellie.ssl.berkeley.edu (Eric J. Korpela) Message-ID: 8rj1o5$m68$1@agate.berkeley.edu References: 39dbfd25.1820931@news.io.com Newsgroups: sci.crypt.random-numbers Lines: 35

In article 39dbfd25.1820931@news.io.com, Terry Ritter ritter@io.com wrote:

We can (to some extent) expose systematic testing problems. What we do is to look at the results and see how some sources do much better than others on the exact same tests. In particular,

http://www.io.com/~ritter/NOISE/DIG1PIO8.HTM

Well, the rebinning in your normal graph will certainly help hide non-random low order bits. (Which is where I entered the conversation, cautioning against the use of the low order bit of a sound card as a source of randomness.) At http://sag-www.ssl.berkeley.edu/~korpela/dnl.eps is a DNL measurement for a 16 bit ADC. For a perfect ADC, this would be a flat line with no deviations. (The statistical errors in this plot are less than 0.6%) http://sag-www.ssl.berkeley.edu/~korpela/dnl_spec.eps is a power spectrum of the DNL. The high frequency stuff is probably intrinsic to the ADC itself. The low frequency stuff is probably due to other parts of the setup.
Ignore the dotted line. Unfortunately this plot has the top clipped so you can't see how big the peaks at 32768 and 16384 (period of 2 and 4 samples respectively) are. In other words, I wouldn't count on randomness from the low order bits.

So unless we believe this happens by accident, we are pretty well forced to believe that the measurement system is working sufficiently well to properly analyze the data.

Take more samples and don't rebin your normal distribution graph to hide the missing values. When you're only analyzing 102400 samples, the statistics will hide a lot.

Eric

Eric Korpela | An object at rest can never be stopped.
korpela@ssl.berkeley.edu


Subject: Re: Hardware RNGs Date: 5 Oct 2000 01:04:57 GMT From: korpela@ellie.ssl.berkeley.edu (Eric J. Korpela) Message-ID: 8rgk3p$ang$1@agate.berkeley.edu References: ihuhtsgts6t9duiqe92db8mm5ngnmi3i7c@4ax.com Newsgroups: sci.crypt.random-numbers Lines: 25

In article ihuhtsgts6t9duiqe92db8mm5ngnmi3i7c@4ax.com, Zonn wrote:

A simple way to generate truly random bits is to use an FM tuner tuned to an off station and sample its "hiss" with a sound card. Set to 16 bit samples and only use bit-0 (the least significant bit) and sample at a rate lower than the upper frequency response of both the tuner and sound card.

I would suggest against using only the least significant bit. Typical 16 bit ADCs have very significant differential non-linearity, approaching 100% bin to bin in some models. In some ADCs, odd output is favored, in some even is. In others which is favored depends upon the input value in a non-obvious way. Often there is a highly favored bin in the middle of the range (which would be pretty close to the zero input level). Even going to the 2nd or 3rd bit you're likely to see significant non-linearity.

In other words, you won't quite get one bit of entropy out of the low order bit and you won't quite get three bits of entropy out of the 3 low order bits.
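The size of the shortfall is easy to compute from the classic Shannon formula. The 60/40 split below is a hypothetical figure chosen for illustration, not a measurement of any particular converter.

```python
import math

def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)

# Hypothetical converter whose low bit reads 1 in 60% of samples.
p_one = 0.6
lsb_entropy = shannon_entropy([p_one, 1.0 - p_one])
print(f"entropy of the low bit: {lsb_entropy:.4f} bits")
```

A 60/40 bias leaves about 0.97 bit per sample, so the loss looks small, but an attacker who always guesses the favored value is right 60% of the time rather than 50%.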

Eric

-- Eric Korpela | An object at rest can never be stopped.
korpela@ssl.berkeley.edu


Subject: Re: Hardware RNGs Date: Thu, 05 Oct 2000 04:15:31 GMT From: ritter@io.com (Terry Ritter) Message-ID: 39dc002e.2598079@news.io.com References: 8rgk3p$ang$1@agate.berkeley.edu Newsgroups: sci.crypt.random-numbers Lines: 23

On 5 Oct 2000 01:04:57 GMT, in 8rgk3p$ang$1@agate.berkeley.edu, in sci.crypt.random-numbers korpela@ellie.ssl.berkeley.edu (Eric J. Korpela) wrote:

[...] In other words, you won't quite get one bit of entropy out of the low order bit and you won't quite get three bits of entropy out of the 3 low order bits.

I note that the statistical results on each of my analysis pages do in fact compute "entropy," and do so in two different ways: First from the classic equation, and then from my own work on population estimation using "augmented repetitions."

From the analysis, we can first see that the two computations generally produce very similar values. Next, many of these sources seem to have about 14 bits of "entropy" (out of 16). And that is another good reason why we don't want to throw most of it away.


Terry Ritter ritter@io.com http://www.io.com/~ritter/ Crypto Glossary http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: Hardware RNGs Date: 5 Oct 2000 18:16:09 -0500 From: Zonn zonn@zonn.com Message-ID: 202qts0gv8rdts13vgmhenomhiaomev1cj@4ax.com References: 8rgk3p$ang$1@agate.berkeley.edu Newsgroups: sci.crypt.random-numbers Lines: 39

On 5 Oct 2000 01:04:57 GMT, in msg 8rgk3p$ang$1@agate.berkeley.edu, korpela@ellie.ssl.berkeley.edu (Eric J. Korpela) wrote:

In article ihuhtsgts6t9duiqe92db8mm5ngnmi3i7c@4ax.com, Zonn wrote:

A simple way to generate truly random bits is to use an FM tuner tuned to an off station and sample its "hiss" with a sound card. Set to 16 bit samples and only use bit-0 (the least significant bit) and sample at a rate lower than the upper frequency response of both the tuner and sound card.

I would suggest against using only the least significant bit. Typical 16 bit ADCs have very significant differential non-linearity, approaching 100% bin to bin in some models. In some ADCs, odd output is favored, in some even is. In others which is favored depends upon the input value in a non-obvious way. Often there is a highly favored bin in the middle of the range (which would be pretty close to the zero input level). Even going to the 2nd or 3rd bit you're likely to see significant non-linearity.

In other words, you won't quite get one bit of entropy out of the low order bit and you won't quite get three bits of entropy out of the 3 low order bits.

I agree, especially on sound cards! The ADCs used here are not of the highest quality! I've had all the problems you've described when using ADCs in the past, including the ramp problem. Most of the problems are documented in the ADC's spec sheet (something you seldom get with a soundcard!)

I was just looking for quick and dirty random bits (and thought I'd share that with Mark). And for that purpose I was able to create a bit stream with no predictability in my very limited test.

When counting all the bytes of a 1 meg file created by concatenating bits, all values were equally represented (within the expected few counts). All words were also equally distributed. The number of one bits was within 200 counts of the number of 0 bits. Not bad for 8 million bits! I was able to find a few small runs of all 256 byte values.
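Counts of this kind can be put on a statistical scale with a short script. The sketch below uses Python's own pseudo-random generator as stand-in data (a real check would read the captured file instead), and the function names are invented for the example.

```python
import math
import random
from collections import Counter

def bit_balance(data):
    """Return (ones, zeros, z): z is the ones-minus-zeros gap in standard deviations."""
    ones = sum(bin(byte).count("1") for byte in data)
    total = 8 * len(data)
    zeros = total - ones
    z = (ones - zeros) / math.sqrt(total)   # the gap has standard deviation sqrt(total)
    return ones, zeros, z

def byte_chi_square(data):
    """Chi-square of the byte histogram against a flat distribution (255 d.o.f.)."""
    counts = Counter(data)
    expected = len(data) / 256
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))

# Stand-in data from Python's PRNG, smaller than the 1 meg file to keep the run short.
rng = random.Random(5)
data = bytes(rng.getrandbits(8) for _ in range(1 << 18))

ones, zeros, z = bit_balance(data)
chi2 = byte_chi_square(data)
print(f"ones={ones} zeros={zeros} z={z:+.2f}")
print(f"byte chi-square: {chi2:.1f} (mean 255 expected for flat data)")
```

For about 8 million fair bits the gap between ones and zeros has a standard deviation a little under 3,000, so a gap of 200 is well within chance. Both checks look only at counts; neither can detect correlation between successive samples.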

It would make a nice file for a one time pad encryption.

-Zonn


Terry Ritter, his current address, and his top page.

Last updated: 2001-06-24