Hidden Markov Models

Terry Ritter

A Ciphers By Ritter Page

It appears that we do not know very much about Hidden Markov Models (HMM's).


Contents

2000-08-20  John Savard: Hidden Markov Models on web site!
2000-08-21  Douglas A. Gwyn: Re: Hidden Markov Models on web site!
2000-08-22  Gunnar Evermann: Re: Hidden Markov Models on web site!
2000-08-22  Douglas A. Gwyn: Re: Hidden Markov Models on web site!
2000-08-22  Gunnar Evermann: Re: Hidden Markov Models on web site!
2000-08-22  Douglas A. Gwyn: Re: Hidden Markov Models on web site!
2000-08-23  Douglas A. Gwyn: Re: Hidden Markov Models on web site!
2000-08-21  John Savard: Re: Hidden Markov Models on web site!
2000-08-21  Tim Tyler: Re: Hidden Markov Models on web site!
2001-02-18  John A. Malley: Re: "RSA vs. One-time-pad" or "the perfect enryption"
2001-02-20  Douglas A. Gwyn: Re: "RSA vs. One-time-pad" or "the perfect enryption"

Subject: Hidden Markov Models on web site!
Date: Sun, 20 Aug 2000 21:47:56 GMT
From: jsavard@fNrOeSePnAeMt.edmonton.ab.ca (John Savard)
Message-ID: 39a050fc.14626826@news.ecn.ab.ca
Newsgroups: sci.crypt
Lines: 17

Having heard from some posts that the concept of "Hidden Markov Models" is relevant to cryptanalysis, I did some web searching, and gained enough understanding to put a partial introduction on my web page at

http://home.ecn.ab.ca/~jsavard/co041003.htm

This introduction only goes up to the Viterbi algorithm; I did not understand such matters as Baum-Welch estimation well enough to attempt to describe them, but I did note the name to assist those seeking to do further reading. Also, since Kullback-Leibler distance (between probability distributions - related to Friedman's chi test) was mentioned, I checked, and yes, that was Solomon Kullback, among those who broke PURPLE.
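
As a rough illustration of the Viterbi step that the page stops at, here is a small sketch in Python; the two-state model, its probabilities, and the variable names are invented for the example, not taken from the page above:

# Minimal Viterbi sketch: most likely hidden-state path through an HMM,
# given its parameters and an observed output string.
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = probability of the best state path that ends in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][r] * trans_p[r][s] * emit_p[s][obs[t]], r) for r in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Trace back from the most probable final state.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

states = ("A", "B")
start = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"0": 0.9, "1": 0.1}, "B": {"0": 0.2, "1": 0.8}}
print(viterbi("00110", states, start, trans, emit))   # -> ['A', 'A', 'B', 'B', 'A']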

John Savard http://home.ecn.ab.ca/~jsavard/crypto.htm


Subject: Re: Hidden Markov Models on web site!
Date: Mon, 21 Aug 2000 14:22:20 GMT
From: "Douglas A. Gwyn" gwyn@arl.army.mil
Message-ID: 39A13B1C.3A367077@arl.army.mil
References: 39a050fc.14626826@news.ecn.ab.ca
Newsgroups: sci.crypt

John Savard wrote:

... I did not understand such matters as Baum-Welch estimation well enough to attempt to describe them, ...

Basically, all that matters (unless you have to implement it) is that it runs forward and backward through the observed output using the actual transitions to successively refine an initial estimate of the HMM parameters, and the algorithm has been proved to be stable and convergent. The result is a set of parameters for the HMM for which the observed output is at least as likely as for any other set of parameters. That's an example of Maximum Likelihood Estimation, which is widely used in data analysis. Maximum-Likelihood methods generally fit the observed data too tightly, but it's a reasonable criterion that is mathematically tractable, and in practice it often produces good results.
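
For anyone who does want to implement it, one version of that refinement loop for a discrete-output HMM might be sketched as follows; the toy 0/1 training string, the near-uniform starting guesses, and all of the names here are illustrative assumptions, not taken from any of these posts:

# Sketch of Baum-Welch (forward-backward) re-estimation for a discrete HMM.
import numpy as np

def baum_welch(obs, n_states, n_symbols, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # Initial estimates: near-uniform, with a little noise so refinement can break ties.
    A = np.full((n_states, n_states), 1.0 / n_states) + rng.uniform(0, 0.01, (n_states, n_states))
    B = np.full((n_states, n_symbols), 1.0 / n_symbols) + rng.uniform(0, 0.01, (n_states, n_symbols))
    pi = np.full(n_states, 1.0 / n_states)
    A /= A.sum(axis=1, keepdims=True)
    B /= B.sum(axis=1, keepdims=True)
    T = len(obs)
    for _ in range(iters):
        # Forward pass (scaled to avoid underflow).
        alpha = np.zeros((T, n_states))
        scale = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]
        scale[0] = alpha[0].sum()
        alpha[0] /= scale[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            scale[t] = alpha[t].sum()
            alpha[t] /= scale[t]
        # Backward pass, using the same scaling factors.
        beta = np.zeros((T, n_states))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
        # Expected state occupancies and expected transitions.
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.zeros((n_states, n_states))
        for t in range(T - 1):
            x = np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A
            xi += x / x.sum()
        # Re-estimate the parameters from those expectations.
        pi = gamma[0]
        A = xi / gamma[:-1].sum(axis=0)[:, None]
        for k in range(n_symbols):
            B[:, k] = gamma[obs == k].sum(axis=0)
        B /= gamma.sum(axis=0)[:, None]
    return pi, A, B

obs = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0] * 4)
pi, A, B = baum_welch(obs, n_states=2, n_symbols=2)
print(np.round(A, 2), np.round(B, 2), sep="\n")

Each pass of the loop is one forward-backward re-estimation; the likelihood of the training string never decreases from one pass to the next, which is the sense in which the procedure is stable and convergent.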


Subject: Re: Hidden Markov Models on web site!
Date: 22 Aug 2000 16:57:34 +0100
From: Gunnar Evermann ge204@eng.cam.ac.uk
Message-ID: mqq8ztp2sr5.fsf@eng.cam.ac.uk
References: 39A13B1C.3A367077@arl.army.mil
Newsgroups: sci.crypt
Lines: 18

"Douglas A. Gwyn" gwyn@arl.army.mil writes:

John Savard wrote:

... I did not understand such matters as Baum-Welch estimation well enough to attempt to describe them, ...

Basically, all that matters (unless you have to implement it) is that it runs forward and backward through the observed output using the actual transitions to successively refine an initial estimate of the HMM parameters, and the algorithm has been proved to be stable and convergent. The result is a set of parameters for the HMM for which the observed output is at least as likely as for any other set of parameters.

Baum-Welch only finds a local maximum; it is not guaranteed to find the global maximum.

Gunnar


Subject: Re: Hidden Markov Models on web site!
Date: Tue, 22 Aug 2000 17:47:39 GMT
From: "Douglas A. Gwyn" gwyn@arl.army.mil
Message-ID: 39A2BCBB.3519D9EF@arl.army.mil
References: mqq8ztp2sr5.fsf@eng.cam.ac.uk
Newsgroups: sci.crypt
Lines: 24

Gunnar Evermann wrote:

Baum-Welch only finds a local maximum; it is not guaranteed to find the global maximum.

Yes; as I said, it refines an initial estimate. If the initial estimate is too far off, for certain kinds of system the convergence will indeed be to a local maximum-likelihood peak (over the parameter space) that is lower than the global maximum.

In practice in cryptologic applications, a good initial estimate is often easy to devise, for example equiprobability. The end result produces categories in some order that might be different from what would have been obtained from a different initial estimate, but (usually) they are the same categories. E.g. (nouns, adjectives, verbs) as opposed to (verbs, nouns, adjectives), if the units are (thinly disguised) PT words, or (vowels, consonants) if the units are (thinly disguised) PT letters. (Actually, real linguistic categories are not quite that simple, but the general idea is right. There are also uses in unraveling the internal structure of algorithmic systems, but I don't know of any good examples I can present.)
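
For the discrete case, that "equiprobability" starting point might be set up roughly as below (a sketch, not from the post); the small random jitter is my own addition, since an exactly uniform start is symmetric across the states and the re-estimation then has no way to break the tie:

import numpy as np

def uniform_start(n_states, n_symbols, jitter=1e-3, seed=0):
    # Near-equiprobable transition and output matrices, rows normalized to sum to 1.
    rng = np.random.default_rng(seed)
    A = np.full((n_states, n_states), 1.0 / n_states) + rng.uniform(0, jitter, (n_states, n_states))
    B = np.full((n_states, n_symbols), 1.0 / n_symbols) + rng.uniform(0, jitter, (n_states, n_symbols))
    pi = np.full(n_states, 1.0 / n_states)
    return A / A.sum(axis=1, keepdims=True), B / B.sum(axis=1, keepdims=True), pi

# e.g. two hidden categories (vowel-like vs. consonant-like) over a 26-letter alphabet
A0, B0, pi0 = uniform_start(n_states=2, n_symbols=26)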


Subject: Re: Hidden Markov Models on web site!
Date: 22 Aug 2000 21:01:58 +0100
From: Gunnar Evermann ge204@eng.cam.ac.uk
Message-ID: mqqu2cd12vd.fsf@eng.cam.ac.uk
References: 39A2BCBB.3519D9EF@arl.army.mil
Newsgroups: sci.crypt
Lines: 25

"Douglas A. Gwyn" gwyn@arl.army.mil writes:

Gunnar Evermann wrote:

Baum-Welch only finds a local maximum; it is not guaranteed to find the global maximum.

Yes; as I said, it refines an initial estimate. If the initial estimate is too far off, for certain kinds of system the convergence will indeed be to a local maximum-likelihood peak (over the parameter space) that is lower than the global maximum.

In practice in cryptologic applications, a good initial estimate is often easy to devise, for example equiprobability. The end result produces categories in some order that might be different from what would have been obtained from a different initial estimate, but (usually) they are the same categories.

Ok, fair enough. I was thinking of continuous HMMs as used in e.g. speech recognition. There the initialisation is more difficult and potentially more important than for the discrete case.

-- Gunnar Evermann Speech, Vision & Robotics Group Cambridge University Engineering Department


Subject: Re: Hidden Markov Models on web site!
Date: Tue, 22 Aug 2000 17:58:10 GMT
From: "Douglas A. Gwyn" gwyn@arl.army.mil
Message-ID: 39A2BF32.8FD94FD7@arl.army.mil
References: 39A2AAE0.EDE1B100@t-online.de 39A13B1C.3A367077@arl.army.mil
Newsgroups: sci.crypt
Lines: 24

Mok-Kong Shen wrote:

Can the HMM be used for predicting sequences?

Sure. Baum-Welch-Eagon MLE can be thought of as "training" the model using that particular output string; then the model can be "run" to generate a synthetic output sequence with the same parameters that were fitted to the original actual data. If you preload the model state to match a particular (generally non-training) observed string, then when you run it, it produces an MLE prediction for subsequent observations.
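
As a sketch of both uses (the two-state parameters below are made up, and this uses the common state-emission form of the model rather than the transition-emission form discussed elsewhere in this thread):

import numpy as np

def sample(pi, A, B, length, seed=0):
    # "Run" the fitted model: emit a symbol from the current state, then transition.
    rng = np.random.default_rng(seed)
    s = rng.choice(len(pi), p=pi)
    out = []
    for _ in range(length):
        out.append(rng.choice(B.shape[1], p=B[s]))
        s = rng.choice(len(pi), p=A[s])
    return out

def predict_next(pi, A, B, prefix):
    # Forward pass over the observed prefix gives the current state distribution ...
    state = pi * B[:, prefix[0]]
    state /= state.sum()
    for o in prefix[1:]:
        state = (state @ A) * B[:, o]
        state /= state.sum()
    # ... and one more transition gives the distribution of the next output symbol.
    return (state @ A) @ B

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.1, 0.9]])
print(sample(pi, A, B, 10))
print(predict_next(pi, A, B, [0, 0, 1]))

Here predict_next is just the forward pass over the observed prefix followed by one more transition, i.e. the fitted model's probability distribution for the next output.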

Is there any literature on its applications in crypto?

Well, there is considerable literature, but not much of it is available to the general public. There was one Master's thesis or Doctoral dissertation (I forget which) and, as I recall, a couple of articles in Cryptologia. (I'd have to rummage through my files to find specifics.) Also, I managed to get one of D'Imperio's NSATJ articles on HMM applied to analysis of the "hands" in the Voynich manuscript declassified two years ago; it is summarized on Jim Reeds's Voynich page but, so far as I know, has not been made available on-line in its entirety.


Subject: Re: Hidden Markov Models on web site!
Date: Wed, 23 Aug 2000 13:23:06 -0400
From: "Douglas A. Gwyn" DAGwyn@null.net
Message-ID: 39A4087A.DFF32218@null.net
References: 39A2D7B8.58451824@t-online.de 39A2BF32.8FD94FD7@arl.army.mil
Newsgroups: sci.crypt
Lines: 68

Mok-Kong Shen wrote:

From what I read in this thread I got the (maybe wrong) impression that the sequences dealt with are character strings. Would HMM be powerful enough to deal with bit strings (which are at a finer granularity) from some not too bad PRNGs (to predict future output)?

HMM is a general technique, not related to text characters except when you choose to apply it to them. The model is simply a finite number of (internal) states that have fixed state-transition probabilities (ordinary Markov model) and the additional feature that the only thing observable is output that depends probabilistically on the transition. (Each piece of output for a single transition can be called a "character" for convenience in discussion, but the only assumed property is that it is uniquely identifiable. The output "character set" might be the set {0,1}, for example.)

So the output contains clues about what is going on "inside" the model, but the clues are indirect since the same output can result from many possible transitions. There are further clues about the state sequence contained in the sequence of outputs. The process of fitting the model to an observed output sequence amounts to estimating what sequence of states (thus transitions) would be the most likely way of producing that output. Of course, that guess could be wrong, but given enough output the trained model (specified by the parameters for the best fit) usually tends to reflect what is actually going on inside the (hidden) system. It's sort of like a fuzzy ...

How successful one is at fitting an HMM is largely a function of how well the assumed number of states "resonates with" the internal system structure. You could try various assumptions and see if any produce good fits. Presumably there is an objective measure of goodness of fit of an HMM to the observations, although it isn't used directly in the usual fitting algorithm. Informally, if the transition and output probability matrices are quite far from random, there is a good fit. For example, a transition matrix

          A    B    C
     A   0.7  0.1  0.2
     B   0.2  0.5  0.3
     C   0.1  0.4  0.5

shows a fairly good categorization, in that the system tends to stay in the same state, and there are decided biases in transitions out of the state (e.g. C->A is 4 times less likely than C->B). Suppose the character set is {0,1} and the output probabilities are:

   out 0:    A    B    C
        A   0.1  0.7  0.2
        B   0.3  0.1  0.6
        C   0.6  0.2  0.2

   P(out 1) = 1 - P(out 0)

That shows that most "1"s occur when the system remains in the same state (at this level of modeling), which is a potentially valuable clue about how the system works. Of course, the actual system structure would involve a lot more than 3 states; if we had picked the exact number and had enough data, all the matrix entries would be very close to 0 or 1, since the system is deterministic at that level. By using fewer states, we lose the details and end up with a probabilistic model, but some trends can still be seen, and these often indicate some large-scale aspects of how the system actually functions.
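
Written out in code, the example above looks like this; the numpy representation, the choice of starting the chain from its stationary distribution, and the short test bit string are my own assumptions, since the post does not specify them:

import numpy as np

A = np.array([[0.7, 0.1, 0.2],     # transition probabilities, rows = "from" A, B, C
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])
OUT0 = np.array([[0.1, 0.7, 0.2],  # P(output 0 | transition row -> column)
                 [0.3, 0.1, 0.6],
                 [0.6, 0.2, 0.2]])
OUT1 = 1.0 - OUT0

# Stationary distribution of the transition matrix (left eigenvector for eigenvalue 1).
w, v = np.linalg.eig(A.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

def likelihood(bits, pi, A, OUT0, OUT1):
    # P(observed bit string) under the model, summing over all hidden state paths.
    state = pi
    total = 1.0
    for b in bits:
        emit = OUT1 if b else OUT0
        step = state[:, None] * A * emit       # joint of (current state, next state, bit)
        total *= step.sum()
        state = step.sum(axis=0) / step.sum()  # condition on the bit just seen
    return total

print(np.round(pi, 3))
print(likelihood([1, 1, 0, 1], pi, A, OUT0, OUT1))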


Subject: Re: Hidden Markov Models on web site!
Date: 21 Aug 2000 15:45:01 -0700
From: jsavard@ecn.ab.ca ()
Message-ID: 39a1a2dd@ecn.ab.ca
References: 39A15873.6D0A79DE@t-online.de 39a050fc.14626826@news.ecn.ab.ca
Newsgroups: sci.crypt
Lines: 11

Mok-Kong Shen (mok-kong.shen@t-online.de) wrote:
: John Savard wrote:

: > http://home.ecn.ab.ca/~jsavard/co041003.htm

: The page was not accessible. Could you please look
: after the matter?

Apologies: the URL should be

http://home.ecn.ab.ca/~jsavard/crypto/co041003.htm


Subject: Re: Hidden Markov Models on web site!
Date: Mon, 21 Aug 2000 23:44:45 GMT
From: Tim Tyler tt@cryogen.com
Message-ID: Fzo1yL.DEE@bath.ac.uk
References: 39A15873.6D0A79DE@t-online.de 39a050fc.14626826@news.ecn.ab.ca
Newsgroups: sci.crypt
Lines: 11

Mok-Kong Shen mok-kong.shen@t-online.de wrote:
: John Savard wrote:

:> http://home.ecn.ab.ca/~jsavard/co041003.htm

: The page was not accessible. [...]

Try instead: http://home.ecn.ab.ca/~jsavard/crypto/co041003.htm

__________  Lotus Artificial Life  http://alife.co.uk/   tt@cryogen.com
 |im |yler  The Mandala Centre     http://mandala.co.uk/ Namaste.


Subject: Re: "RSA vs. One-time-pad" or "the perfect enryption" Date: Sun, 18 Feb 2001 22:23:07 -0800 From: "John A. Malley" 102667.2235@compuserve.com Message-ID: 3A90BBCB.E8977ACF@compuserve.com References: 3A90A968.4195911B@null.net slrn99116d.5df.sjmeyer@localhost.localdomain Newsgroups: sci.crypt Lines: 26

"Douglas A. Gwyn" wrote:

[snip]

What you mean is a PT source with very low entropy. Such a source makes the cryptanalysis difficult for nearly any system, not just the kind you describe.

Note that genetic protein sequences have nonnegligible entropy. Indeed, HMMs have been used to characterize (model) them.

There have been a number of references to Hidden Markov Models and cryptanalysis in various threads over the past few months. I just found (yesterday) a good (IMO) introduction to statistical language learning, hidden Markov models, and probabilistic grammars - "Statistical Language Learning" by Eugene Charniak, MIT Press, 1993, ISBN 0-262-53141-0 - at a Borders bookstore.

Are there other books on the subject you'd recommend - especially any covering cryptanalysis with hidden Markov models?

Thanks for any pointers,

John A. Malley 102667.2235@compuserve.com


Subject: Re: "RSA vs. One-time-pad" or "the perfect enryption" Date: Tue, 20 Feb 2001 04:27:34 GMT From: "Douglas A. Gwyn" DAGwyn@null.net Message-ID: 3A91F272.4F598F3C@null.net References: 3A90BBCB.E8977ACF@compuserve.com Newsgroups: sci.crypt Lines: 18

"John A. Malley" wrote:

Are there other books on the subject you'd recommend - especially any covering cryptanalysis with hidden Markov models?

There is now a vast literature on technology and applications of HMMs, but there are only a couple of articles I know of in the open literature specifically addressing application to cryptanalysis, and unfortunately I can't recommend them, because they don't show any real advantage over other well-known approaches. That's not to say that there aren't good applications for the technology, just that it is not evident from what has been published.

A couple of years ago I ran across an interesting application of HMMs to the problem of "hands" in the Voynich manuscript, and managed to get it declassified (except for one example and a few names); there is a brief summary of its results in Jim Reeds's Voynich bibliography at URL http://www.research.att.com/~reeds/voynich/bib.html Maybe some day I'll make a copy available on a Web site.


Terry Ritter, his current address, and his top page.

Last updated: 2001-06-27