[3.0] The Rise Of Field Ciphers (original) (raw)
v6.0.1 / chapter 3 of 9 / 01 jul 23 / greg goebel
* By the 19th century, improvements in codebreaking had made life more difficult for codemakers, and the challenge was compounded by the invention of mass communications in the form of the telegraph, which changed the landscape for cryptology. Codemakers developed many new and clever cryptographic schemes, though most were soon cracked. This chapter outlines cryptographic schemes devised from the middle of the 19th century and into World War I.
[3.1] TELEGRAPHY & CRYPTOLOGY
[3.2] CRACKING THE VIGENERE CIPHER
[3.3] THE UNION ROUTE CIPHER / MULTIPLE ANAGRAMMING
[3.4] POLYBIUS, BIFID, PLAYFAIR, & STRADDLING CHECKERBOARD CIPHERS
[3.5] GRILLES & TURNING GRILLES
[3.1] TELEGRAPHY & CRYPTOLOGY
* In 1844, the American inventor Samuel F.B. Morse performed the first demonstration of a practical electric telegraph, which would quickly introduce the world to nearly instantaneous long-distance communications. The telegraph would put cryptology on a new playing board.
At first, commercial users of the telegraph were concerned about security. It meant that the strangers handling the telegraph system would be able to inspect private messages about personal concerns or business transactions; in fact, it was almost unavoidable. This concern turned out to be overblown, since telegraph companies generally had good reason to be discreet about the messages passing over their wires. However, since the cost of telegrams increased with their length, commercial users had good reason to want to keep them as short as possible, all the more so because the telegraph, in making long-distance communications much cheaper and easier, greatly increased the total volume of messages passing from one place to another.
The need for compactness quickly led to the development of "commercial codes", with one of the first published in 1845 by Morse's lawyer and promotional agent, Francis O.J. Smith. Such commercial codes replaced words or entire phrases with codenumbers or codewords to reduce the length of messages. As commercial codes evolved, codewords became favored, since transmitting codenumbers was error-prone. The commercial codebooks continually increased in sophistication and length, with more and longer phrases mapped to short codewords. Eventually, some commercial codes had as many as twenty times as many codewords for phrases as for words.
Commercial codes, being essentially an economy measure, were relatively insecure one-part codes and, much more to the point, the codebooks were available to anyone who wanted to buy them. They were secure in the sense that they blocked casual reading; if more security was required, various superencipherment schemes were sometimes provided along with the codes, with a user selecting a specific superencipherment scheme and defining keys to ensure confidentiality.
Commercial telegraph codes remained in popular use until well after World War I. For example, the "Acme" code was a one-part code developed by William J. Mitchel in the 1920s and featured 100,000 entries. Some of the entries seemed almost tailored for encrypting contemporary pulp fiction, for example:
: BUKSI avoid arrest if possible : OBNYX escape at once : PYTUO collided with an iceberg : URPXO for what use was the mixing machine intended? : YBDIG plundered by natives : GNUEK rubber, slightly moldy : NARVO do not part with the documents : CULKE bad as possibly can be
As governments used the telegraph more and their volume of communications traffic increased, they too began to appreciate the economy of codes, and adapted commercial codes to their own purposes. Of course, diplomatic codes were not available to anyone who wanted to buy them, at least not legally, but since they generally adopted the one-part code format from the existing commercial codes, they were still not very secure. Interestingly, the rise of commercial and diplomatic codes sent the nomenclator into oblivion, since it did not provide the same economy of transmission as a true code.
* The impact of the telegraph on military cryptology was even greater. The telegraph provided a battle commander with fast communications over a battle theater and back down the logistics trail. Coupled with the rapid transportation provided by the locomotive, and the increasing lethality of firearms and artillery as they evolved at a rapid rate through the 19th century, the result was a revolution in warfare. However, the telegraph also made it easier for an enemy to intercept military communications. Scouting parties could penetrate behind enemy lines and discreetly tap into telegraph wires, giving immediate and invisible access to battle plans being sent from one place to another. Communications security became very important.
Codes wouldn't do. They had worked well enough for military communications in earlier times, since the volume of messages was much lower, but with armies sprawled all over battle theaters and linked by telegraph lines, distributing codebooks was a logistical nightmare, particularly for an army on the move. There was no realistic way to prevent codebooks from eventually falling into enemy hands.
That led to the development of "field ciphers" for military use. Such ciphers had to be easy to support and use, but still highly secure. The use of keys helped ensure that even if an enemy understood the cipher, if they didn't have the keys, they still could not understand intercepted transmissions without a code-breaking effort that would take so long the messages would be generally useless by the time they were read.
[3.2] CRACKING THE VIGENERE CIPHER
* The Vigenere cipher was one of the first field ciphers to be widely adopted, with overhead generally reduced through use of a "cipher disk", a device described in a later chapter. The Vigenere cipher was regarded as unbreakable and perfectly secure for such communications, at least if certain precautions were taken. It was known as the "le chiffre indechiffrable (the indecipherable cipher)". In reality, it was not indecipherable, since a solution to the Vigenere cipher was first published by a retired Prussian Army officer named Friedrich Wilhelm Kasiski (1805:1881) in 1863, and his solution is appropriately known as the "Kasiski test".
A Vigenere cipher cannot be cracked by simple frequency analysis, but Kasiski found an indirect method. The Vigenere cipher, as discussed previously, allows the same letter to be enciphered in a number of different ways. Given the cipher keyword "WARTHOG", for example, the letter "e" would be enciphered seven different ways:
: W: e -> A : A: e -> E : R: e -> V : T: e -> X : H: e -> L : O: e -> S : G: e -> K
Similarly, the same string of plaintext could be enciphered in seven different ways, depending on which letter of the cipher key matches the beginning of the string, or in other terms what step of the "cycle" of different substitution alphabets was in effect. This means that common strings, such as "the" or "-ing", might be enciphered in exactly the same way in any fairly large message or set of messages.
Given the WARTHOG key, for example, if the plaintext word "the" is enciphered as "AVK", then if it occurs exactly 7, 14, 21, ... letters later in the plaintext message, it will also be enciphered as "AVK". Such repetitive patterns can be used to get a fingerhold into a cipher.
Kasiski's attack on the Vigenere cipher involved two insights. The first was that, as just shown, repetitive patterns in messages encrypted by a Vigenere cipher gave a hint as to the length of the key. It wasn't of immediate importance what the repetitive pattern actually was; only that it was repeated on a certain interval.
Suppose Alice encrypts a message using the cipher keyword "WARTHOG". WARTHOG has seven letters. For a plaintext string occurring several times to be encrypted into the same ciphertext patterns several times, the plaintext string has to begin at exactly the same step in the cycle of substitution alphabets. Given the WARTHOG key, this means that these patterns will repeat in the ciphertext with a spacing of an exact multiple of seven letters apart. Of course, this pattern can be confounded by the generation of the same ciphertext pattern from an entirely different plaintext string at a different step in the cycle, but this becomes less likely with longer strings, and such coincidences are more in the nature of noise than insurmountable obstacles.
Suppose Holmes, our codebreaker, finds a number of repetitive ciphertext patterns in a set of messages encrypted with a Vigenere cipher using a common cipher key, and the patterns repeat at the given intervals:
: pattern interval : ------- ---------------- : GIKK 133 : YHDX 140 : VXMA 1,190 : POLQK 3,341 : MOGTZL 4,550 : ------- ----------------
We assume here for simplicity that Holmes finds only one repetition of each pattern, though of course in reality there may be several repetitions of the same pattern, and each is likely to be at a different interval. Holmes then factors the intervals into primes:
! pattern interval factoring ! ------- ---------------- ------------------------ ! GIKK 133 7 * 19 ! YHDX 140 2 * 2 * 5 * 7 ! VXMA 1,190 2 * 5 * 7 * 17 ! POLQK 3,341 3 * 7 * 23 ! MOGTZL 4,550 2 * 5 * 5 * 7 * 13 ! ------- ---------------- ------------------------
He searches for primes because they represent the shortest possible length that the key could be. For example, since the ciphertext pattern GIKK repeats on an interval of 133, that means the key could be 133, or 7, or 19 letters long -- it's not going to be a fractional number since fractional letters don't make sense in this context. Since the actual (if still unknown) key length must be common to all five of the repetitive patterns Holmes finds in this message, he can narrow down the possible lengths to those factors that are common to all the patterns. The lists of factors at right show that all include the value 7, which is a clue, a really big one, that the length of the cipher key is seven.
There is, of course, no reason to assume that the key length will actually be a prime number. It could be a sum or multiple of primes, but that doesn't matter. Whatever the actual length is, all of its factors will appear in all the factorings of the interval lengths. For example, if all of the pattern intervals had the common prime factors of 3 and 7, that could mean that the key is 3, 7, or (most likely) 3 * 7 = 21 letters long.
There is, again, also no reason to assume that one text pattern, say VXMA, might not be produced by coincidence from two entirely unrelated four-letter plaintext strings, but this becomes less likely as the pattern grows longer. If Holmes finds a set of repeating patterns and most point to a particular key length, he would sensibly judge that the few patterns that point to completely different key lengths are just "noise" caused by such coincidences and should be ignored.
* Kasiski's second insight was that the length of the cipher key provided a direct lever for cracking a Vigenere cipher. Suppose, as just explained, Holmes determines that the cipher key is seven letters long. That means that every letter separated by seven letters is enciphered with the same cipher alphabet, and that the letters can be indexed into seven distinct sets:
: SET 1: letters with index 1, 8, 15, 22, 29, ... : SET 2: letters with index 2, 9, 16, 23, 30, ... : SET 3: letters with index 3, 10, 17, 24, 31, ... : SET 4: letters with index 4, 11, 18, 25, 32, ... : SET 5 letters with index 5, 12, 19, 26, 33, ... : SET 6: letters with index 6, 13, 20, 27, 34, ... : SET 7: letters with index 7, 14, 21, 28, 35, ...
Holmes could slice the ciphertext into seven such sets of letters, and then crack each set using frequency analysis. Given that the standard Vigenere cipher was based on a Caesar shift, Holmes could easily figure out the shift by observing a Pareto chart of the frequency distribution of the set and comparing it to a Pareto chart of the frequency distribution of average English text.
Another way of looking at this is to imagine that Holmes writes the ciphertext down in rows, with each row seven letters long. Once he does this, then each column of the message is enciphered using the same cipher alphabet, and can be cracked on a column-by-column basis with frequency analysis. Of course, he only gets a seventh of the plaintext message from each column, and so he won't be able to understand what the message actually says until he cracks all seven columns.
For example, suppose Alice decides to use a Vigenere cipher to encrypt, say, Lincoln's Gettysburg address, a relatively long document which begins as:
: Fourscore and seven years ago our fathers brought forth on this : continent ...
She uses the keyword WARTHOG to perform the encryption:
: WARTHOGWARTHOGWARTHOGWARTHOGWARTHOGWARTHOGWARTHOGWARTHOGWARTHO ... : fourscoreandsevenyearsagoourfathersbroughtforthonthiscontinent ... : BOLKZQUNERGKGKREEQLOXOAXHVIXBAKALFYXRFNNVZBOIMOCTPHZLJCTPIEXUH ...
Alice encrypts the rest of the document in the same way and sends it to Bob. Holmes intercepts the message. At first the ciphertext:
: BOLKZQUNERGKGKREEQLOXOAXHVIXBAKALFYXRFNNVZBOIMOCTPHZLJCTPIEXUH ...
-- looks like nothing but gibberish, but he pokes through the full ciphertext, finds a few repeating patterns, and factors out their intervals to determine that the cipher key used by Alice was seven letters long. Holmes then writes the ciphertext in rows, with seven letters per row, as follows:
: BOLKZQU : NERGKGK : REEQLOX : OAXHVIX : BAKALFY : XRFNNVZ : BOIMOCT : PHZLJCT : PIEXUH ...
He now breaks the message into seven separate columns:
: B O L K Z Q U : N E R G K G K : R E E Q L O X : O A X H V I X : B A K A L F Y : X R F N N V Z : B O I M O C T : P H Z L J C T : P I E X U H ...
He reads down the first column to obtain:
: COLUMN 1: BNROBXBPP ...
Holmes performs a frequency analysis on this string of text to get the matching plaintext letters:
: COLUMN 1: BNROBXBPP ... -> frvsfbftt ...
This seems clearly unhelpful in itself, but Holmes then repeats the exercise for the other six columns:
: COLUMN 2: OEEAAROHI ... -> oeeaarohi ... : COLUMN 3: LREXKFIZE ... -> uangtorin ... : COLUMN 4: KGQHANMLX ... -> rnyohutse ... : COLUMN 5: ZKLVLNOJU ... -> sdeoeghcn ... : COLUMN 6: QGOIFVCCH ... -> csaurhoot ... : COLUMN 7: UKXXYZTT ... -> oerrstnn ...
Holmes takes these seven sets of plaintext letter matches and arranges them as columns, matching the ciphertext from which they were derived:
: 1 2 3 4 5 6 7 : ------------- : f o u r s c o : r e a n d s e : v e n y e a r : s a g o o u r : f a t h e r s : b r o u g h t : f o r t h o n : t h i s c o n : t i n e n t ...
Now the solution just jumps out:
: foursco : reandse : venyear : sagoour : fathers : brought : forthon : thiscon : tinent ... -> : : fourscoreandsevenyearsagoourfathersbroughtforthonthiscontinent ... -> : : Fourscore and seven years ago our fathers brought forth on this : continent ...
Of course, it is unrealistic to think that Holmes could perform all the frequency analyses perfectly and go directly to the solution, but even if he doesn't figure out exactly what the proper plaintext letter matches are on his first attempt, he can substitute his initial sets of matches back into the columns, perform some anagramming by inspecting the result to see where the text makes some sense and where it doesn't, and then start adjusting his matches until he converges on a solution.
In addition, Alice did some things that made life easier for Holmes. Her Vigenere cipher square used shift ciphers, which meant that if Holmes could just figure out a few letters in each set, he had the entire set. Furthermore, she used the keyword WARTHOG, and once Holmes had figured out about four of the cipher sets and plugged in their associated key letters, for example:
: W???HOG
-- he might be able to guess that the keyword was WARTHOG and save himself some effort. Alice's message would have been more secure if her Vigenere square used mixed cipher alphabets, and if she had used a less obvious keyword. However, this approach would have still cracked her message; it just would have forced Holmes to do more work.
The Kasiski test has obvious caveats. It requires ciphertexts of reasonable length, and as the key grows longer, trying to find valid repetitive patterns in the text grows more difficult, and the size of the letter sets that can be sorted out of the ciphertext grows smaller and less penetrable by frequency analysis. If the key is the same length as the text, the Kasiski test is completely defeated.
* The Kasiski test had been independently discovered almost a decade earlier, in 1854, by the English inventor Charles Babbage (1792:1871), famed among the modern computer community as a foresighted genius who developed mechanical computers and invented some of the basic principles of computing machines. Babbage had begun cracking ciphers when he was a boy and eventually acquired a reputation for expertise in the field. In 1854, a Bristol dentist named John Hall Brock Thwaites proudly announced in a scholarly journal that he had developed a new cipher that he intended to patent. It turned out to be just a restating of the Vigenere cipher, and Babbage replied to the journal that Thwaites was walking on well-traveled ground.
Thwaites, with the same bogus logic found in quarrels on modern internet forums, then challenged Babbage to crack his cipher. That had no bearing on whether the cipher was original or not, but Babbage was intrigued anyway, and came up with the solution. However, he never published it, and so cannot be properly credited as its inventor. Babbage was notorious for not completing or publishing his work, though it is possible that he kept the solution secret at the request of British government cryptanalysts, who wanted to make use of it while other governments still thought the Vigenere cipher was secure.
* There is another approach to cracking a Vigenere cipher known as "superimposition" that was devised by Auguste Kerckhoffs in his great text on cryptology, LA CRYPTOGRAPHIE MILITAIRE, published in 1883. Superimposition works very well no matter how long the key is, as long as the codebreaker has a substantial number of messages enciphered with the same key.
Suppose Holmes takes the first letter of all these messages. All of them have been enciphered with the same cipher alphabet, and so this collection of letters effectively defines a ciphertext encrypted by a monoalphabetic substitution cipher. Given a large enough set of messages, the cipher alphabet for the first letter could be determined by simple frequency analysis.
The same procedure could be used on the second letter of all the messages, the third letter, and so on, working through the set of messages in "columns". If Holmes can determine that two different columns have been enciphered with the same letter, possibly by identifying a few high-frequency letters, he can then combine the two columns to help pick out lower-frequency letters.
For example, suppose Holmes gets a pile of twenty messages, all enciphered with the same key. The messages start out as follows:
: MESSAGE 1: DHKSJPO ... : MESSAGE 2: RGMNKLJ ... : MESSAGE 3: LKLNMDG ... : MESSAGE 4: DHKBNED ... : ... : MESSAGE 20: JVDGYSV ...
He arranges the separate messages as if they were rows of a single message, and then splits the result into columns:
: D H K S J P O ... : R G M N K L J ... : L K L N M D G ... : D H K B N E D ... : ... : J V D G Y S V ...
Each of the columns amounts to a simple monoalphabetic substitution cipher:
: COLUMN 1: DRLD ... J : COLUMN 2: HGKH ... V : COLUMN 3: KMLK ... D : COLUMN 4: SNNB ... G : COLUMN 5: JKMN ... Y : COLUMN 6: PLDE ... S : COLUMN 7: OJGD ... V : ...
-- which can be solved by frequency analysis, just as with the Kasiski test. Of course, also just as with the Kasiski test, each resulting set of plaintext letter matches is gibberish by itself, and only makes sense once they are all substituted back in by columns.
Now in this example Holmes has only 20 messages and so only 20 letters in each column, which isn't really enough to do a detailed frequency analysis, but if Holmes is lucky, superimposition can also give hints on the length of the keyword. Suppose Holmes sorts out a set of columns from a set of messages as above, and performs frequency analysis on the columns. If he notices that letter distribution from column 1 is similar to that from column 8, say having the same top four letters; the results from column 2 are similar to those from column 9; the results from column 3 are similar to those from column 10; and so on, then he has a very strong clue that the keyword is seven letters long.
Knowing this, he can merge the sets by columns -- merging column 1 with column 8 and 15 and 22 and so on; column 2 with column 9 and 16 and 23 and so on; continuing in this way until he merges column 7 with column 14 and 21 and 28 and so on -- in effect combining the Kasiski test and superimposition to multiply the number of letters for his frequency analyses. Of course, if the keyword is short then Holmes would have likely used the Kasiski test first and not bothered with superimposition, but this approach would be able to penetrate a long keyword whose length might not be evident using the Kasiski test.
[3.3] THE UNION ROUTE CIPHER / MULTIPLE ANAGRAMMING
* During the American Civil War, the Union Army adopted a simple but surprisingly effective encryption scheme, a form of "route" transposition based on a rectangular matrix. The basic design of the "Union route cipher", as it is now known, was established shortly after the beginning of the war by Anson Stager, a superintendent of the Western Union Telegraph Company. Stager had been asked by the governor of Ohio to provide an encryption scheme to protect the communications of the state government; although Stager had never been very interested in codes and ciphers, he did some investigation and came up with a solution.
Word of this exercise got back to General George Brinton McClellan, then in charge of Ohio volunteers fighting in the hill country of Western Virginia. He requested that Stager devise a cipher for military use, and Stager did so. When McClellan took command of the Army of the Potomac in Virginia, he took the cipher along with him, and its influence spread until it became an effective standard of the Union Army. Stager joined the Union Army himself, and would eventually be assigned overall command of the Army's Signal Corps, rising to the rank of brigadier general -- though he would return to civilian life at the end of the conflict.
Although all examples of transpositions mentioned in this document so far have transposed letters, there's also no reason why complete words can't be shuffled around as well, and the Union route cipher adopted this approach. The advantage of shuffling words around was that it reduced transmission errors, since telegraphers could send full words instead of strings of gibberish. The disadvantage of such an approach was that, without additional encryption, the names of generals, officials, cities, and whatever would be transmitted "in the clear" for the Confederates to read. Such names would be vital clues for unscrambling the messages, and even if the message weren't unscrambled the names might still provide significant information.
Stager's original cipher didn't address this flaw, but Samuel L. Beckwith, the cipher operator for Union General Ulysses S. Grant, created a small and convenient set of codewords for such names to ensure that the transmissions remained secure. Beckwith made sure the codewords were easy to spell and that they did not have any apparent logical connection to the plaintext words they represented.
The Union route cipher was improved steadily throughout the war, featuring more and more complicated routes, nulls, and ever-increasing sets of codewords. By the end of the war, the cipherbook for the final version of the Union route cipher consisted of 12 pages of possible routes and 36 pages with 1,608 codewords. For example, on 1 June 1863, President Lincoln sent the following telegram:
: FOR COLONEL LUDLOW: : RICHARDSON AND BROWN, CORRESPONDENTS OF : THE TRIBUNE, CAPTURED AT VICKSBURG, ARE : DETAINED AT RICHMOND. PLEASE ASCERTAIN : WHY THEY ARE DETAINED AND GET THEM OFF : IF YOU CAN. : THE PRESIDENT.
The codebook for the route cipher version current at the time provided the following codewords for this message:
: VENUS: COLONEL : WAYLAND: CAPTURED : ODOR: VICKSBURG : NEPTUNE: RICHMOND : ADAM: PRESIDENT LINCOLN
The cipher clerk also appended the codeword "NELLY", giving the time of the dispatch as 4:30 PM. With the codeword changes, the message looked like this, with the codewords highlighted with asterisks:
: FOR VENUS LUDLOW: : RICHARDSON AND BROWN, CORRESPONDENTS OF : THE TRIBUNE, WAYLAND AT ODOR, ARE : DETAINED AT NEPTUNE. PLEASE ASCERTAIN : WHY THEY ARE DETAINED AND GET THEM OFF : IF YOU CAN. : ADAM NELLY
The cipher clerk then selected the transposition scheme he wanted to use from the codebook. In this case, he selected the scheme designated "GUARD". Following GUARD, he broke the message into consecutive lines of five words as follows, using nulls to pad out the message:
: FOR VENUS LUDLOW RICHARDSON AND : BROWN CORRESPONDENTS OF THE TRIBUNE : WAYLAND AT ODOR ARE DETAINED : AT NEPTUNE PLEASE ASCERTAIN WHY : THEY ARE DETAINED AND GET : THEM OFF IF YOU CAN : ADAM NELLY THIS FILLS UP
GUARD then indicated the route through the message as follows:
- Up column 1.
- Down column 2.
- Up column 5.
- Down column 4.
- Up column 3.
The cipher clerk also appended arbitrary null words to each column to make the message more confusing. The transposed words were preceded by the scheme name, GUARD, resulting in the following ciphertext:
: GUARD ADAM THEM THEY AT WAYLAND BROWN FOR (KISSING) : VENUS CORRESPONDENTS AT NEPTUNE ARE OFF NELLY (TURNING) : UP CAN GET WHY DETAINED TRIBUNE AND (TIMES) : RICHARDSON THE ARE ASCERTAIN AND YOU FILLS (BELLY) : THIS IF DETAINED PLEASE ODOR OF LUDLOW (COMMISSIONER)
This gave the final message:
: GUARD ADAM THEM THEY AT WAYLAND BROWN FOR KISSING : VENUS CORRESPONDENTS AT NEPTUNE ARE OFF NELLY : TURNING UP CAN GET WHY DETAINED TRIBUNE AND TIMES : RICHARDSON THE ARE ASCERTAIN AND YOU FILLS BELLY : THIS IF DETAINED PLEASE ODOR OF LUDLOW COMMISSIONER
The message is easy for a recipient to decipher, but it proved very secure. Partly this was because the Confederacy simply didn't have the resources to seriously attack Union ciphers. The South was an agrarian society whose leaders were not particularly tuned to technology and engineering, and as a result there were relatively few Southerners who could even contemplate the art of codebreaking. They were so desperate, it is said, that in some case Union ciphers were published in Southern newspapers along with a plea for someone to try cracking them. Nobody ever succeeded in doing so.
The Confederates were also backwards in their use of ciphers, establishing no standard methods for encrypting messages, in some cases even using simple Caesar shift ciphers for military communications. However, the Confederacy most often used a Vigenere cipher, which didn't prove vastly more secure because they used short keywords and didn't change the keywords very often. Union cryptanalysts often cracked intercepted Confederate messages.
* The general technique for cracking transpositions, multiple anagramming, wasn't discovered until 1878. The US presidential election of 1876 was one of the closest in American history, resulting after much controversy in sending Republican Rutherford B. Hayes to the White House.
There were unsurprisingly rumors that shady deals had been struck by both Democrats and Republicans to get their respective candidates the presidency. The election had been supported by a flood of telegrams, many of them encrypted, and though Western Union burned most of the company's copies of the correspondence with public fanfare to reassure customers that their messages were secure, some of the messages were legally seized by a Congressional committee investigating the charges of irregularities.
In the summer of 1878, Republicans on the committee leaked 27 encrypted telegrams, sent by Democrats, to the Republican-leaning NEW YORK TRIBUNE newspaper in hopes of embarrassing the Democrats. Some of the telegrams were encrypted with a book cipher and, following a lead from the DETROIT TRIBUNE that the book was a popular dictionary, were quickly deciphered by TRIBUNE staffers. They revealed attempts to buy electoral votes and were published on the TRIBUNE's front page.
Some of the other telegrams were harder to crack. They appeared to be in a simplified version of the Union Army word-based route cipher. One of the TRIBUNE's editors, John R.G. Hassard, and the economics editor, Colonel William M. Grosvenor, struggled with the messages, as did a mathematician at the US Naval Observatory, Edward S. Holden, working from ciphertext samples published in the TRIBUNE. The three men discovered multiple anagramming.
Multiple anagramming requires two transpositions with the same number of words or letters, and shuffled in exactly the same way. Suppose, for a simple example, Holmes has two transpositions consisting of five letters each, as follows:
: TASPR : PRSCO
A little examination shows that both of these two transpositions could be unscrambled into two different words:
: TASPR -> PARTS, TRAPS : PRSCO -> CROPS, CORPS
If Holmes didn't have the two messages, he would have no idea which of the two unscramblings would be correct for each message. However, performing the_same_ unscrambling on both messages in parallel shows that there's only one unscrambling that yields an intelligent answer for both messages:
: TASPR -> TRAPS PRATS PARTS : PRSCO -> PORCS CORPS CROPS : 12345 15243 45213 42513
Of course, multiple anagramming will also work on transpositions that use the same block size, though determining the block size complicates matters a bit.
The TRIBUNE published the solutions to the ciphers, which also revealed vote-buying exercises by the Democrats. The cleverness of the solution helped raise the visibility of the scandal to the public, and the Democrats suffered in the Congressional elections of that year.
[3.4] POLYBIUS, BIFID, PLAYFAIR, & STRADDLING CHECKERBOARD CIPHERS
* A number of new cipher techniques were developed that were based on the "Polybius square" or "checkerboard". This was a way of converting letters into pairs of numbers, devised in classical Greek times by a thinker named Polybius.
Updated into English, a Polybius square consists of the letters of the alphabet arranged as a 5-by-5 checkerboard grid with rows and columns numbered, and "I" and "J" assigned to the same grid location:
: 1 2 3 4 5 : ----------------- : 1 A B C D E : 2 F G H I/J K : 3 L M N O P : 4 Q R S T U : 5 V W X Y Z
The row-and-column index numbers are then substituted for letters. For example, "help" becomes "23 15 31 35". For added security, the arrangement of the letters in the grid can be scrambled, preferably by using a cipher keyword to determine the scrambling. For example, if we use the key "RICHARD MILHOUS NIXON", we get a checkerboard that looks like this:
: 1 2 3 4 5 : ----------------- : 1 R I/J C H A : 2 D M L O U : 3 S N X B E : 4 F G K P Q : 5 T V W Y Z
No matter how the grid is arranged, however, as stated it's just a monoalphabetic substitution cipher, with pairs of ciphertext digits in place of plaintext letters. It offers little security, though it can be used as a basis for "semaphore" codes, in which a signaler can hold two flags in five different positions each to send the full alphabet.
* Such a checkerboard has much more potential. Each plaintext letter is represented by two ciphertext digits, and that gives a new option for encrypting the message. Suppose Alice has a checkerboard based on the key "RICHARD MILHOUS NIXON" as above, and the plaintext:
: down with big brother
She can then obtain row-and-column indexes for the letters in this plaintext from the grid as follows:
: d: row = 2 / column = 1 : o: row = 2 / column = 4 : w: row = 5 / column = 3 : n: row = 3 / column = 2 : w: row = 5 / column = 3 : i: row = 1 / column = 2 : t: row = 5 / column = 1 : h: row = 1 / column = 4 : b: row = 3 / column = 4 : i: row = 1 / column = 2 : g: row = 4 / column = 2 : b: row = 3 / column = 4 : r: row = 1 / column = 1 : o: row = 2 / column = 4 : t: row = 5 / column = 1 : h: row = 1 / column = 4 : e: row = 3 / column = 5 : r: row = 1 / column = 1
Now she divides the plaintext into blocks of, say, five letters, and writes the indexes vertically underneath the letters:
: downw ithbi gbrot her : 22535 15131 43125 131 : 14323 21442 24141 451
To produce the final encryption, she concatenates the two rows on a block-by-block basis:
: 2124533253 1251143412 4234112451 143511
The message is divided into blocks to make encryption and decryption more manageable; there's no particular reason to use a block size of five, any other reasonable value will work as well. This particular scheme breaks down each letter into two cipher letters, a concept known as "fractionation"; and then it breaks the message down into blocks and concatenates the two halves, a concept known as "seriation". This scheme is known as a "bifid" cipher, and was introduced by a Frenchman named Felix Marie Delastelle (1840:1902) in a book published in 1901, although some of Delastelle's ideas were anticipated by an American mathematician and astronomer named Pliny Chase in 1859. It is also possible to "map" plaintext letters to three or more digits that can then be transposed, and such a three-digit mapping is called a "trifid" cipher.
* The checkerboard was also indirectly the basis for another tricky cipher, known as the "Playfair cipher", which was used by the British around the turn of the century. The Playfair cipher was actually invented by Sir Charles Wheatstone, a British pioneer in telegraphy, but it was promoted in 1854 by his friend Baron Playfair, who encouraged the British government to adopt it. Although Playfair consistently credited Wheatstone with the invention, it is still known as the Playfair cipher.
The scheme starts out with a checkerboard, but the index numbers are not necessary. Once again, using the key "RICHARD MILHOUS NIXON" gives the Alice the checkerboard:
: R I/J C H A : D M L O U : S N X B E : F G K P Q : T V W Y Z
Suppose Alice uses this checkerboard to encipher a message such as:
: we all attack at sunrise
She first divides the message into digraphs, or pairs of letters. To prevent Holmes from spotting letter pairs, the same two letters cannot be used in the same digraph, and so she uses an "x" to pad the text where necessary to break up the redundancy. She uses a "z" to fill the text out so the last letter is part of a digraph:
: we al la tx ta ck at su nr is ez
Next, Alice enciphers her message using the following three rules:
RULE ONE: If two letters in a digraph are in the same row of the grid, they are replaced by the letters to their right in the grid. If one of the letters to be enciphered is in the rightmost column of the grid, the replacement letter is selected by "wrapping around" to the leftmost column. For example, using the checkerboard defined above, a plaintext digraph like "uo" would become the ciphertext digraph "DU".
: R I/J C H A R I/J C H A : >D< M L O D M L >U< : S N X B E S N X B E : F G K P Q F G K P Q : T V W Y Z T V W Y Z : ----------------- ----------------- : "u" maps to "D" "o" maps to "U"
RULE TWO: Similarly, if the two letters in the digraph are in the same column of the grid, they are replaced by the letters below them in the grid. If one of the letters to be enciphered is in the bottom row of the grid, the replacement letter is selected by "wrapping around" to the top row. For example, the plaintext digraph "xw" becomes the ciphertext digraph "KC".
: R I/J C H A R I/J >C< H A** **: D M L O U D M L O U** **: S N B E S N X B E : F G >K< P Q F G K P Q** **: T V W Y Z T V Y Z
RULE THREE: If the two letters in the digraph are not in the same column or row, then the replacement letter is obtained by scanning along the row where one letter occurs until the column where the other letter occurs is found. The replacement letter is found at this intersection.
In the case of rule three, the two digraphs form a "rectangle" in the grid, with the plaintext pair of letters forming two corners at one angle through the rectangle, and the ciphertext pair of letters forming the other two corners, making conversion between the two straightforward. In this example, the plaintext digraph "we" becomes the ciphertext digraph "ZX".
: R I/J C H A : D M L O U : S N >X< B : F G K P Q : T V Y >Z< : ----------------- : "w" maps to "Z" : "e" maps to "X"
Using these rules, Alice takes the plaintext:
: we al la tx ta ck at su nr is ez
-- and generates the following ciphertext:
: ZX CU UC WS ZR LW RZ ED SI RN QA
The Playfair cipher sounds a bit complicated, but with a little practice ciphering and deciphering is simple. It is also not extremely secure, since it is just a digraph cipher, which had been both known and cracked long before. The only improvement the Playfair alphabet offered was a means to easily generate a substitution set from a keyword or keyphrase. This did improve security, since Alice could send every message to Bob with a unique key.
* There is yet another variation on the checkerboard scheme, called a "straddling checkerboard", that is something of a "part bifid" scheme. In a straddling checkerboard, eight letters are assigned one-digit values and the others are assigned two-digit values. A simple approach would be to start with a keyword, we'll say RICHARD MILHOUS NIXON again. Alice would arrange it as follows:
: 0 1 2 3 4 5 6 7 8 9 : ------------------------------------- : R I C H A D M L : 1 O U S N X B E F G J : 2 K P Q T V W Y Z . /
This table appears a bit puzzling, but it does follow orderly rules;
- The top row, of course, gives the one-digit letters ("0" for "R", "3" for "I", up to "9" for "L"); while the bottom two rows gives the two-digit letters ("10" for "O", "11" for "U", "12" for "S", and so on).
- Since this scheme implies a 28-letter cipher alphabet, it allows a separate table entry for "J". It also adds the "." and a "numeric escape" character, defined as "/" here. The numeric escape character is used to specify that numeric digits, not letters, are being enciphered.
- "1" and "2" can't be used to define single-digit letters, because they're used as row indexes. That would make it impossible for Bob to figure out if "12", for example, meant the two one-digit cipher letters "1" and "2", or if it meant the two-digit cipher letter "12".
- There's no particular reason that "1" and "2" have to be reserved for the two-digit cipher letters -- in fact, it is probably wise to vary the two numbers used, using "4" and "7", for example, instead of "1" and "2". Of course, whatever such arrangements are used, Alice and Bob must have agreed on them beforehand.
- Languages that have larger letter sets, such as the Russian Cyrillic alphabet, require the use of three digits. This gives a table with seven one-digit cipher letters and 30 two-digit characters.
Anyway, Alice could then use this mapping to encipher a secret message such as:
: there will be a delivery of 223 guns
-- resulting in:
: t he re w illb e ade liv e ry o f / 2 2 3 / g u n s : 2351601625399151667169324160261017292222332918111312
Notice how the number "223" is enciphered unchanged, enclosed by the "/" numeric escape character, with the digits repeated twice to compensate for errors in transmission. Spelling errors are often obvious, but wrong numbers can be hard to spot.
This isn't a very secure cipher in itself, since Holmes might guess from the common use of "1" and "2" that they are the first digits of the two-digit codes. Knowing that, Holmes would crack it just as he would crack any other empty-headed monoalphabetic substitution cipher. However, once Alice gets this cipher string she then divides it into groups and transposes it according to some prearranged pattern, as in a bifid cipher. The irregular enciphering pattern complicates Holmes' attempts to figure out the transposition pattern.
* During World War II, Soviet spies actually used a much more sophisticated version of a straddling checkerboard cipher as their standard method of encryption. One significant change was that the eight single-digit cipher letters were assigned to the eight most common plaintext letters, improving the efficiency of the cipher, as well as helping to conceal the first digits of the two-digit cipher letters by reducing their frequency.
These eight letters could be memorized as a mnemonic. For example, assuming that Alice and Bob had been recruited as Red spies -- though of course rather than think of our heroine and hero as dirty rotten traitors we can further assume they're acting as double agents -- they could memorize the eight most common English letters -- in alphabetical order, A E I N O R S T -- with the mnemonic "a sin to err", or more appropriately cryptically ASINTOER.
This of course complicates arranging the straddling checkerboard using a keyword, say our dependable standard WARTHOG. Alice could then use WARTHOG to generate a grid as follows:
: W A R T H O G : B C D E F I J : K L M N P Q S : U V X Y Z . /
Alice now reserves the digits "3" and "7" as the first digits of her two-digit characters. Of course, the selection of these two particular digits has to be arranged in advance with Bob.
Alice then goes through the grid by columns, top to bottom, left to right, and assigns the values of 0, 1, 2, 4, 5, 6, 8, and 9 to the ASINTOER characters as she finds them:
: W A R T H O G : 0 1 2 6 : B C D E F I J : 4 8 : K L M N P Q S : 5 9 : U V X Y Z . /
She then fills out the "30" through "39" characters using the same traversal:
: W A R T H O G : 30 0 1 2 6 : B C D E F I J : 31 34 37 4 8 : K L M N P Q S : 32 35 38 5 9 : U V X Y Z . / : 33 36 39
-- and fills out the rest of the grid with the "70" through "79" characters:
: W A R T H O G : 30 0 1 2 71 6 77 : B C D E F I J : 31 34 37 4 72 8 78 : K L M N P Q S : 32 35 38 5 73 75 9 : U V X Y Z . / : 33 36 39 70 74 76 79
She sorts this out into the more conventional (and readable) straddling checkerboard pattern:
: 0 1 2 3 4 5 6 7 8 9 : ------------------------------------- : A R T E N O I S : 3 W B K U C L V D M X : 7 Y H F P Z Q . G J /
Alice can use this straddling checkerboard to encipher a secret message, say:
: the war will start in 13 days
-- resulting in:
: th ew arw il l startin/ 1 3 / d ay s : 271430013083535920128579113379370709
-- or, spaced into five-digit groups for clarity and padded with zeroes as nulls:
: 27143 00130 83535 92012 85791 13379 37070 90000
Now Alice adds a second level of enciphering to this string of 40 digits. Instead of a transposition, however, she uses a numerical "masking" pattern, derived from some preselected commercially-published reference book of industrial statistics, adding it to the cipher digits on a digit-by-digit basis, with the "carry out" from the addition discarded.
For example, suppose on page 23 of our imaginary reference book there is a table of values as follows, listed from the top down:
: 304 706 1247 6738 204 563 967 324 : 4201 544 3047 1212 8794 2133 523 897 : 351 432 437 442 321 231 1086 123 : 555 552 1260 442 4067 5613 5068 667 : ...
Incidentally, this particular imaginary reference book has eight columns and never more than 80 rows per page, meaning the column number will always be 1 digit and the row number will never be more than two digits. Alice picks a particular row and column, say row 3, column 4. She does this arbitrarily; Bob has no way of knowing what she picked without being told. By convention, she starts with the third digit of the number in that location, and then takes the digits from the following fields until she gets 40 digits:
: 2 321 231 1086 123 : 555 552 1260 442 4067 5613 5068 6
-- or:
: 23212 31108 61235 55552 12604 42406 75613 50686
This is then added to the first-pass ciphertext on a digit-by-digit basis as follows:
: 27143 00130 83535 92012 85791 13379 37070 90000 : 23212 31108 61235 55552 12604 42406 75613 50686 : ----------------------------------------------- : 40355 31238 44760 47564 97395 55775 02683 40686
Essentially this is a Vigenere enciphering, with an unpredictable key as long as the text to frustrate the Kasiski test. The last touch is for Alice to add an "indicator" prefix which tells Bob where to look in his own reference book to find the mask. It is in row 3, column 4, page 23, so the indicator is:
: 03423
The row number is expressed as "03" and not "3" to eliminate any ambiguity. Now the indicator by itself would be a clue to Holmes, so Alice performs another digit-by-digit addition to the indicator, twice, using the fourth field of the ciphertext from the start and the fourth field of the ciphertext from the end:
: 40355 31238 44760 <47564> <97395> 55775 02683 40686 : ----- ----- : 03423 : 47564 : 97395 : ----- : 37272
Alice then prefixes this to the ciphertext to give the result:
: 37272 40355 31238 44760 47564 97395 55775 02683 40686
For a pencil and paper cipher, it was actually pretty damned hard to crack.
[3.5] GRILLES & TURNING GRILLES
* Yet another attempt to improve encryption involved "grilles" and "turning grilles", which were mechanical aids for generating a complicated transposition. A grille was a piece of cardboard with holes cut into it in a special pattern to define a transposition. For example, a 6-by-6 grille might look like this, where "O" indicates a hole:
: - O - - O O : O - - O O - : - - O O - - : O - - - - O : O O - - O O : O - O O - -
The message is written in across the grille, and then read off by columns. If Alice wants to send the message:
: Hang on a little longer, help is on the way!
-- she fills up the grille with plaintext as follows:
: hang on a little long: - h - - a n : g - - o n - : - - a l - - : i - - - - t : t l - - e l : o - n g - -
-- and then reads the result down by columns:
: gito hl an olg ane ntl
She then repeats the process, filling up the grille with remaining plaintext from the message. However, there's no reason that she has to use the grille in the same orientation. She can just as easily flip it over:
: O O - - O - : - O O - - O : - - O O - - : O - - - - O : O O - - O O : - - O O - O
-- and enter the second block as:
: er help is on the way: e r - - h - : - e l - - p : - - i s - - : o - - - - n : t h - - e w : - - a y - x
The final "x" is added as a null. This is read off by columns as:
: eot reh lia sy he pnwx
The final ciphertext is then:
: gito hl an olg ane ntl eot reh lia sy he pnwx : GITOHLANOLGANENTLEOTREHLIASYHEPNWX
If there had been more text in this message, Alice could have also turned it sideways or upside down. There is a total of eight possible orientations for such a grille, and to decipher the message Bob has to have the appropriate grille and know its sequence of orientations.
* The turning grille is a similar idea, literally with a twist. A 6-by-6 turning grille might look like this:
: - O - O - O : - - - - O - : - - O - - - : - O - - O - : - - - - - O : - - - O - -
To use this turning grille, Alice lays it over a sheet of paper, and writes the first nine letters of her message through the holes, writing across by rows. She then rotates the grille 90 degrees, writes the next nine letters; rotates it again, writes nine more letters; and rotates it once more to write another nine letters. The holes are arranged so that letters will not be written on top of each other when the grille is rotated.
Alice writes the first nine letters as follows:
: hang on a li: - H - A - N : - - - - G - : - - O - - - : - N - - A - : - - - - - L : - - - I - -
She then rotates the grille 90 degrees and writes nine more letters:
: ttle longe: T - - - T - : - L - E - - : L - - - - O : - - N - - - : G - - E - - : - - - - - -
This done, she rotates the grille again and writes the next nine letters:
: r help is on: - - R - - - : H - - - - - : - E - - L - : - - - P - - : - I - - - - : S - O - N -
Finally, she rotates the grille one last time and writes the last six letters, adding "now" to the message to make it come out even:
: the way now: - - - - - - : - - T - - H : - - - E - - : W - - - - A : - - Y - N - : - O - - - W
This gives the result:
: T H R A T N : H L T E G H : L E O E L O : W N N P A A : G I Y E N L : S O O I N W
It is surprisingly easy to devise a turning grille. For example, suppose Alice wants to build a turning grille with a 6 by 6 grid like that above. She starts by marking the numbers 1 through 9 in each "quadrant" of the grille, with each block of numbers shifted 90 degrees while traversing around this pattern in a clockwise fashion:
: 1 2 3 7 4 1 : 4 5 6 8 5 2 : 7 8 9 9 6 3 : : 3 6 9 9 8 7 : 2 5 8 6 5 4 : 1 4 7 3 2 1
In this example, the four quadrants are spaced apart for clarity, but they would not be in practice. Alice then selects the values 1 through 9 from the quadrants of the grid at random, selecting each value only once:
: - - - - 4 1 : - 5 - - - - : 7 - 9 - - - : : - 6 - - - - : - - 8 - - - : - - - 3 2 -
This directly gives the turning grille:
: - - - - 4 1 : - 5 - - - - : 7 - 9 - - - : - 6 - - - - : - - 8 - - - : - - - 3 2 -
-- or:
: - - - - O O : - O - - - - : O - O - - - : - O - - - - : - - O - - - : - - - O O -
A larger grille would be used in practice. Grille transpositions can be tough to crack, but they are vulnerable to multiple anagramming, aided by the Holmes' knowledge of how the grilles are constructed. If grilles are captured, that makes his task just that much easier.