Letter Frequency Analysis Tool
Frequency analysis examines how often each letter appears in a text and compares the distribution against known language patterns. It is one of the oldest and most powerful techniques in cryptanalysis — first described by Al-Kindi in the 9th century — and remains the primary method for breaking classical substitution ciphers like Caesar, Atbash, and keyword ciphers.
Frequently Asked Questions About Frequency Analysis
What is frequency analysis in cryptography?
Frequency analysis is a cryptanalysis technique that studies how often each letter appears in a piece of text. Since every language has a characteristic letter frequency distribution (for example, E is the most common letter in English at about 12.7%), analyzing the frequencies of letters in ciphertext can reveal the substitution pattern used to encrypt it. This method was first described by the Arab polymath Al-Kindi in the 9th century and remains one of the most fundamental tools in classical cryptography.
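The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not this tool's actual implementation; the function name is ours:

```python
from collections import Counter

def letter_frequencies(text):
    """Return each letter's share of all letters in the text, as a percentage."""
    letters = [c for c in text.upper() if c.isascii() and c.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {letter: 100 * count / total for letter, count in counts.items()}

freqs = letter_frequencies("The quick brown fox jumps over the lazy dog")
```

On a large body of ordinary English text, the resulting percentages would cluster around the standard distribution, with E near the top.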
How does frequency analysis break substitution ciphers?
In a simple substitution cipher, each plaintext letter is consistently replaced by a single ciphertext letter. This means the frequency pattern of the original language is preserved in the ciphertext — just mapped to different letters. By comparing the ciphertext letter frequencies to known language frequencies, a cryptanalyst can match the most common ciphertext letter to E, the second most common to T, and so on. Combined with analysis of common digrams (TH, HE, IN) and trigrams (THE, AND, ING), most substitution ciphers can be broken with moderate amounts of ciphertext.
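A rough first pass at this matching step can be automated: rank the ciphertext letters by frequency and pair them with English letters ranked the same way. The sketch below is illustrative (the constant and function name are ours); in practice the initial guess is then refined by hand using digram and trigram evidence:

```python
from collections import Counter

# English letters in descending order of typical frequency
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def first_guess_mapping(ciphertext):
    """Pair ciphertext letters, ranked by frequency, with English letters
    ranked the same way. A starting point, not a finished decryption."""
    ranked = [c for c, _ in Counter(
        ch for ch in ciphertext.upper() if ch.isalpha()).most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))
```

The most frequent ciphertext letter is guessed to stand for E, the next for T, and so on; letters that never appear in the ciphertext simply get no guess.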
What are the most common English letter frequencies?
The most common letters in English, in order, are: E (12.7%), T (9.1%), A (8.2%), O (7.5%), I (7.0%), N (6.7%), S (6.3%), H (6.1%), R (6.0%), and D (4.3%). The mnemonic ETAOIN SHRDLU captures the top 12 letters by frequency. The least common letters are Z (0.07%), Q (0.10%), X (0.15%), and J (0.15%). These frequencies are averages across large bodies of English text and may vary with specific texts, genres, and writing styles.
What is the chi-squared statistic in frequency analysis?
The chi-squared statistic measures how much an observed frequency distribution differs from an expected one. In frequency analysis, it compares the actual letter counts in your text against the counts you would expect if the text followed standard language frequencies. A low chi-squared value (below about 30 for 25 degrees of freedom) suggests the text matches normal language patterns, while a high value suggests the text is encrypted, written in a different language, or has an unusual letter distribution.
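A chi-squared score along these lines can be computed as below. The frequency table holds approximate published English percentages for illustration; exact reference values vary by corpus:

```python
from collections import Counter
import string

# Approximate English letter frequencies, in percent (illustrative values)
ENGLISH_FREQ = {
    'E': 12.7, 'T': 9.1, 'A': 8.2, 'O': 7.5, 'I': 7.0, 'N': 6.7,
    'S': 6.3, 'H': 6.1, 'R': 6.0, 'D': 4.3, 'L': 4.0, 'C': 2.8,
    'U': 2.8, 'M': 2.4, 'W': 2.4, 'F': 2.2, 'G': 2.0, 'Y': 2.0,
    'P': 1.9, 'B': 1.5, 'V': 1.0, 'K': 0.8, 'J': 0.15, 'X': 0.15,
    'Q': 0.10, 'Z': 0.07,
}

def chi_squared(text):
    """Chi-squared distance between the text's observed letter counts
    and the counts expected from standard English frequencies."""
    letters = [c for c in text.upper() if c in ENGLISH_FREQ]
    n = len(letters)
    counts = Counter(letters)
    total = 0.0
    for letter in string.ascii_uppercase:
        expected = n * ENGLISH_FREQ[letter] / 100
        observed = counts.get(letter, 0)
        if expected > 0:
            total += (observed - expected) ** 2 / expected
    return total
```

Ordinary English prose produces a small score; text dominated by rare letters like Q, Z, and J produces a very large one, which is what makes the statistic useful for scoring candidate decryptions.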
Which ciphers are vulnerable to frequency analysis?
Simple (monoalphabetic) substitution ciphers are the most vulnerable, including Caesar cipher, Atbash cipher, keyword cipher, and affine cipher. These all map each plaintext letter to exactly one ciphertext letter, preserving frequency patterns. Polyalphabetic ciphers like Vigenère make frequency analysis harder because each plaintext letter can encrypt to multiple ciphertext letters, but they can still be broken using the Kasiski examination or index of coincidence to determine the key length, after which each sub-cipher can be attacked individually.
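The index of coincidence mentioned here is simple to compute: it is the probability that two letters drawn at random from the text are the same. English plaintext scores around 0.067, while uniformly random letters score about 0.038 (1/26); to estimate a Vigenère key length, one computes it on every k-th letter for candidate values of k. A minimal sketch:

```python
from collections import Counter

def index_of_coincidence(text):
    """Probability that two randomly chosen letters of the text are equal.
    Near 0.067 for English plaintext, near 0.038 for random letters."""
    letters = [c for c in text.upper() if c.isalpha()]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))
```

When the candidate key length is correct, each every-k-th-letter subset is effectively a monoalphabetic cipher, so its index of coincidence rises toward the plaintext value.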
How much ciphertext do you need for frequency analysis to work?
Frequency analysis generally becomes reliable with roughly 100 to 200 characters of ciphertext for simple substitution ciphers. With shorter texts, natural variation in letter frequencies makes it harder to draw reliable conclusions, and very short messages (under about 50 characters) may not contain enough data for the letter frequencies to match the expected language pattern. For polyalphabetic ciphers, even more ciphertext is needed because the analysis must be performed separately on the subset of the text corresponding to each key position.