It is a curious fact that
popular interest in this country in the subject
Lt. Col. William F. Friedman, 1936
The main character is William Legrand. To escape some misfortunes, he lives at Sullivan's Island, near Charleston, South Carolina. Legrand discovers a brilliant gold-colored bug but lends it to someone else. When his friend, the narrator of the story, visits him, Legrand describes the rare bug with a death's-head on its back, and draws a picture of the bug on a piece of paper. A short while later, Legrand urges his friend to come back as soon as possible.
When his friend arrives, Legrand explains that he discovered a secret message in invisible ink after accidentally heating the paper with the drawing of the gold-bug. Having managed to decrypt the message, he invites his friend to join him on an expedition into the forest to search for a treasure near some rocks. Fearing that Legrand has lost his sanity, the worried friend agrees to accompany him.
The cryptogram, as discovered by Legrand:
The Gold-Bug is not only an exciting story about the discovery of an old treasure, but also a great introduction to cryptography and codebreaking. It tickles the reader's curiosity, and Poe gives a detailed description of how to decipher the cryptogram and also provides the solution. However, deciphering the message yourself is even more exciting than reading how Legrand did it in the story. Can I challenge you to decrypt the message, composed more than 160 years ago?
Rather than just reading Poe's story, we will show the technique and give you the chance to do it all yourself. It might be useful to read The Gold-Bug first, as the story might provide information that will help to solve the cryptogram, but only read to where the cryptogram appears! Don't cheat by reading any further for the solution. Searching the Internet for Poe or The Gold-Bug will also spoil the fun.
The message is encrypted by mono-alphabetic substitution, a cipher where each letter of the alphabet is replaced by another letter or symbol. We can calculate all possible combinations for the 26 letters of the alphabet. The first letter is substituted by one of 26 symbols or letters, including itself, the second by one of the 25 remaining symbols or letters, and so on. The calculation 26 x 25 x 24 x ... x 3 x 2 x 1, or 26!, gives us a total of 403,291,461,126,605,635,584,000,000 different ways to replace 26 letters by another symbol or letter. How could we possibly decipher such a cryptogram? For centuries, substitution ciphers were regarded as unbreakable, but breaking one is easier than it looks.
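If you want to check that number yourself, a few lines of Python will do. This is only a small sketch. The guessing speed of one billion keys per second is an assumption made up for illustration, not a figure from the story or from any real machine.

```python
import math

# Number of ways to assign 26 distinct symbols to the 26 letters: 26!
key_count = math.factorial(26)
print(f"{key_count:,}")  # 403,291,461,126,605,635,584,000,000

# Hypothetical brute-force speed, assumed only for illustration.
keys_per_second = 10**9
seconds_per_year = 60 * 60 * 24 * 365

# Whole years needed to try every key at that assumed speed.
years = key_count // (keys_per_second * seconds_per_year)
print(f"{years:,} years to try every key")
```

Even at that generous speed, trying every key would take billions of years, which is why we need a smarter approach than brute force.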
Although there are septillions of ways to allocate a set of symbols to letters, there are only a few ways to combine vowels and consonants in a natural language. Strict linguistic rules determine which letter combinations are possible and which are not. The syntax prescribes how words, and their order, are combined into a sentence, and conjugation rules determine how verbs are written. When we substitute letters with symbols, those symbols still follow all these rules and therefore create patterns that we can detect. Just as certain letter combinations are impossible (ZLG, XOJ, KFN,...), so will certain symbols avoid one another. Just as one vowel can only fit in a given set of consonants (THR?ST, L?GHT, ?NSW?R,...) so will certain symbols attract each other. But where to start?
The most important weapon to solve our message is letter frequency analysis, the basis of all codebreaking. Each language has its own typical distribution of letters in a text. In English, the letter E is by far the dominant letter, with an average of 12.7 percent. If we locate some of the most frequent vowels or consonants in the ciphertext or find the most frequently recurring symbols or combinations of symbols, then the rules of the language will give us strong leads to the words they are used in, or the letter they represent.
In the English frequency table below, the letters are sorted from most frequent (left) to least frequent (right). Poe used an older and slightly different frequency table. However, frequency tables in any given language can differ, depending on subject, style and size of the text. Also, Poe did not use spaces, or a symbol for space, in his text. In plain text, the space occurs much more often than the letter E and would stand out clearly. To illustrate how distinctive letter frequency is, let's count how often certain letters occur on this webpage: E=1163 T=887 A=630 D=320 W=158 V=70 Z=10.
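Counting letters by hand is tedious, so here is a minimal sketch of how such a count could be made with Python. The function name `letter_frequencies` and the sample sentence are our own choices for illustration. Run it on any English text you like and compare the result with a published table.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, count, percent) tuples, most frequent first."""
    # Keep only the letters A-Z; spaces, digits and punctuation are ignored.
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    return [(ch, n, 100.0 * n / total) for ch, n in counts.most_common()]

# Any English text will do; this sentence is just a sample.
sample = ("The message is encrypted by mono-alphabetic substitution, "
          "a cipher where each letter of the alphabet is replaced "
          "by another letter or symbol.")

for ch, n, pct in letter_frequencies(sample)[:5]:
    print(f"{ch}: {n} ({pct:.1f}%)")
```

A short sample like this one can deviate quite a bit from the average distribution. The longer the text, the closer it usually gets to the table.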
To solve Legrand's message, use a sheet of graph paper to write down the secret text with its symbols. Leave a blank row below each row of symbols, so you can write your solutions underneath the symbols in pencil (easily corrected with an eraser). Next, count how many times each of the symbols appears in the cryptogram, and write down the symbols, ordered from most frequent to least frequent, underneath the letters of the English letter frequency distribution, shown in the table on the right. This gives a first rough indication of the frequency distribution of the symbols and the letters they might represent.
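The same pencil-and-paper step can be sketched in Python. The letter order in `ENGLISH_ORDER` is one commonly quoted ordering and is an assumption here, since published tables differ. The function name `first_guess` and the toy ciphertext are made up for illustration. The toy text is deliberately not Poe's cryptogram, so nothing is spoiled.

```python
from collections import Counter

# One commonly quoted frequency order for English (an assumption;
# other tables order the less frequent letters differently).
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def first_guess(ciphertext):
    """Pair each symbol, most frequent first, with a letter of ENGLISH_ORDER."""
    counts = Counter(ch for ch in ciphertext if not ch.isspace())
    ranked = [symbol for symbol, _ in counts.most_common()]
    # zip stops at the shorter of the two sequences.
    return list(zip(ranked, ENGLISH_ORDER))

# Made-up ciphertext, for illustration only.
toy = "#%@@ #*@ #%@@ +*"
for symbol, letter in first_guess(toy):
    print(symbol, "might be", letter)
```

Treat this pairing as a rough first indication only. Usually just the top one or two guesses hold up, and the rest must be corrected with the help of the language rules.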
You will notice that one symbol clearly stands out. This is the first major clue, because the most frequent symbol without doubt represents the letter E, which is the most frequent letter in English. Write the letter E in pencil everywhere in the encrypted text, under the corresponding symbol, and also write that symbol underneath the letter E in the English letter frequency list. Also make yourself a second table with all the symbols and corresponding letters you have already found.
Next, try to spot recurring combinations of symbols. Some of the most used words in English, in order of frequency, are THE, OF, TO, AND, IN. Search for recurring combinations of three or two symbols that might represent those words. You should spot each THE quite easily. If so, you have discovered the solution for two more letters that are used frequently. However, not every combination of three symbols will represent THE or AND.
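Spotting recurring combinations by eye works, but a short script can list them for you. The sketch below counts every run of `n` consecutive symbols and keeps those that occur more than once. The function name `repeated_ngrams` and the toy string are made up for illustration.

```python
from collections import Counter

def repeated_ngrams(ciphertext, n, min_count=2):
    """Return (combination, count) pairs for runs of n symbols seen
    at least min_count times, most frequent first."""
    symbols = "".join(ciphertext.split())  # drop any whitespace
    grams = Counter(symbols[i:i + n] for i in range(len(symbols) - n + 1))
    return [(g, c) for g, c in grams.most_common() if c >= min_count]

# Made-up ciphertext, for illustration only.
toy = "xyzabxyzcdxyz"
print(repeated_ngrams(toy, 3))  # [('xyz', 3)]
```

A combination of three symbols that recurs often and ends with the symbol you identified as E is a good candidate for THE.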
When completing words with an obvious letter, you will also be able to find or complete more words. If, for instance, you have already identified the letters T and E, and you encounter the fragment T?EE, it is not hard to imagine the missing letter R, especially when you already know the context. Vowel twins (AA, EE, OO) are common, but not that many different words contain such pairs. Try to find those words. Initially, it will be difficult to divide the text into words, but the more letters you solve, the easier it gets. If you can't see it immediately, try various unidentified letters of the alphabet until you get something readable. When you assign a newly discovered letter to all its corresponding symbols in the encrypted text, this also helps you to reconstruct other fragments or locate words.
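Filling in known letters and trying candidates for the gaps can be sketched as follows. The function names, the symbols `#`, `%` and `@`, and the tiny word list are all hypothetical and chosen for illustration. The sketch assumes a one-to-one substitution, so an unsolved symbol cannot stand for a letter that is already assigned.

```python
import re
from string import ascii_uppercase

def apply_partial_key(ciphertext, key):
    """Replace solved symbols by their letters and unsolved ones by '?'."""
    return "".join(
        ch if ch.isspace() else key.get(ch, "?")
        for ch in ciphertext
    )

def candidates(fragment, words, known=""):
    """Return the words that fit a fragment such as 'T?EE'.

    Letters listed in `known` are already assigned to other symbols,
    so they are not tried for the '?' positions."""
    free = "".join(c for c in ascii_uppercase if c not in known.upper())
    if "?" in fragment and not free:
        return []  # every letter is taken, nothing can fill the gap
    pattern = "".join(
        "[" + free + "]" if c == "?" else re.escape(c)
        for c in fragment.upper()
    )
    return [w for w in words if re.fullmatch(pattern, w.upper())]

# Made-up symbols and a tiny hypothetical word list.
key = {"#": "T", "@": "E"}
fragment = apply_partial_key("#%@@", key)
words = ["TREE", "THEE", "FREE", "THREE"]
print(fragment, candidates(fragment, words, known="THE"))  # T?EE ['TREE']
```

Here H is listed as known because it would already have been found through THE, which rules out THEE and leaves TREE. With a larger word list you will get more candidates, and the context decides between them.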
Be patient. It could take some time before the words appear in front of you, and some letter combinations might turn out to be incorrect. However, the more letters you find, the faster you identify new letters or words. Before you know it, you're on a straight line to the finish. Letter frequency analysis is a powerful tool in cryptology, and letter frequency tables for many other languages are available on the Internet.
Good luck...and make sure you don't get bitten by the bug!
Note: in the original edition, one symbol "(" was not printed near the end of the message, right after 9;48; although Legrand describes just that missing symbol to assist in finding a word. Since the story refers to that symbol, it is unlikely that it was omitted on purpose, and it probably got lost during publishing.
© Cipher Machines and Cryptology 2004.
Last changes: 26 February 2026