Question 1

What is frequency analysis in cryptography?

Accepted Answer

Frequency analysis is a cryptanalysis technique that studies how often each letter appears in a piece of text. Since every language has a characteristic letter frequency distribution (for example, E is the most common letter in English at about 12.7%), analyzing the frequencies of letters in ciphertext can reveal the substitution pattern used to encrypt it. This method was first described by the Arab polymath Al-Kindi in the 9th century and remains one of the most fundamental tools in classical cryptography.
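
The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name letter_frequencies and the sample sentence are made up for this example, and it assumes that only the ASCII letters A-Z matter and that case is ignored.

```python
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Return the relative frequency of each letter A-Z that occurs in text."""
    # Keep ASCII letters only, folded to upper case; ignore everything else.
    letters = [ch.upper() for ch in text if ch.isascii() and ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # most_common() orders letters from most to least frequent.
    return {letter: count / total for letter, count in counts.most_common()}


# Hypothetical sample text, chosen only to show the call.
sample = "Meet me at the green tree near the street"
for letter, share in list(letter_frequencies(sample).items())[:3]:
    print(f"{letter}: {share:.1%}")
```

On ciphertext, the same function gives the observed distribution that is then compared against the expected distribution of the language.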

Question 2

How does frequency analysis break substitution ciphers?

Accepted Answer

In a simple substitution cipher, each plaintext letter is consistently replaced by a single ciphertext letter. This means the frequency pattern of the original language is preserved in the ciphertext, just mapped to different letters. By comparing the ciphertext letter frequencies to known language frequencies, a cryptanalyst can tentatively match the most common ciphertext letter to E, the second most common to T, and so on. These rank-based guesses are rarely all correct, so they are refined using common digrams (TH, HE, IN) and trigrams (THE, AND, ING). With this combination, most substitution ciphers can be broken with moderate amounts of ciphertext.
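
The rank-matching step can be sketched as follows. This is only a first-guess generator under stated assumptions: the names guess_substitution_key and apply_key are hypothetical, and ENGLISH_ORDER is one commonly cited ranking whose order among the less frequent letters differs between corpora. On real ciphertext the result usually needs manual correction using digrams and trigrams.

```python
from collections import Counter

# Assumed ranking of English letters from most to least frequent; the exact
# order past the first ten letters varies between reference corpora.
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"


def guess_substitution_key(ciphertext: str) -> dict[str, str]:
    """Map ciphertext letters to plaintext guesses by frequency rank alone."""
    counts = Counter(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return dict(zip(ranked, ENGLISH_ORDER))


def apply_key(ciphertext: str, key: dict[str, str]) -> str:
    """Substitute guessed letters; leave unmapped characters unchanged."""
    return "".join(key.get(ch, ch) for ch in ciphertext.upper())
```

A typical use is key = guess_substitution_key(ciphertext) followed by apply_key(ciphertext, key), then swapping individual entries of key by hand wherever the partial plaintext shows implausible letter pairs.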

Question 3

What are the most common English letter frequencies?

Accepted Answer

The most common letters in English, in order, are: E (12.7%), T (9.1%), A (8.2%), O (7.5%), I (7.0%), N (6.7%), S (6.3%), H (6.1%), R (6.0%), and D (4.3%). The mnemonic ETAOIN SHRDLU lists approximately the 12 most common letters in order of frequency. The least common letters are Z (0.07%), Q (0.10%), X (0.15%), and J (0.15%). These frequencies are averages across large bodies of English text and vary with specific texts, genres, and writing styles.

Question 4

What is the chi-squared statistic in frequency analysis?

Accepted Answer

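The chi-squared statistic measures how much an observed frequency distribution differs from an expected one. In frequency analysis, it compares the actual letter counts in your text against the counts you would expect if the text followed standard language frequencies. With 26 letters there are 25 degrees of freedom, for which the 5% critical value is about 37.7. A low chi-squared value (below roughly 30 to 40) suggests the text matches normal language patterns, while a much higher value suggests the text is encrypted, written in a different language, or has an unusual letter distribution. Because the statistic grows with text length whenever the distributions differ, any fixed threshold is a rule of thumb. The statistic is most useful for comparing candidate decryptions of the same text.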
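
As a sketch of the calculation, the statistic is the sum over all 26 letters of (observed - expected)^2 / expected, computed on counts rather than percentages. The function name chi_squared is hypothetical, and the ENGLISH_PERCENT table holds commonly cited approximate values that are an assumption of this example. Other reference tables differ slightly.

```python
from collections import Counter

# Approximate English letter frequencies in percent (assumed reference table).
ENGLISH_PERCENT = {
    "A": 8.167, "B": 1.492, "C": 2.782, "D": 4.253, "E": 12.702,
    "F": 2.228, "G": 2.015, "H": 6.094, "I": 6.966, "J": 0.153,
    "K": 0.772, "L": 4.025, "M": 2.406, "N": 6.749, "O": 7.507,
    "P": 1.929, "Q": 0.095, "R": 5.987, "S": 6.327, "T": 9.056,
    "U": 2.758, "V": 0.978, "W": 2.360, "X": 0.150, "Y": 1.974,
    "Z": 0.074,
}


def chi_squared(text: str) -> float:
    """Chi-squared distance between the letter counts of text and English."""
    letters = [ch for ch in text.upper() if ch in ENGLISH_PERCENT]
    total = len(letters)
    if total == 0:
        raise ValueError("text contains no letters A-Z")
    counts = Counter(letters)
    statistic = 0.0
    for letter, percent in ENGLISH_PERCENT.items():
        expected = total * percent / 100  # expected count, not a percentage
        observed = counts[letter]         # Counter returns 0 for absent letters
        statistic += (observed - expected) ** 2 / expected
    return statistic
```

One common use is to score every candidate decryption of a message, for example all 26 Caesar shifts, and keep the candidate with the lowest value.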

Question 5

Which ciphers are vulnerable to frequency analysis?

Accepted Answer

Simple (monoalphabetic) substitution ciphers are the most vulnerable, including the Caesar, Atbash, keyword, and affine ciphers. These all map each plaintext letter to exactly one ciphertext letter, preserving frequency patterns. Polyalphabetic ciphers like Vigenère make frequency analysis harder because each plaintext letter can encrypt to multiple ciphertext letters, but they can still be broken by using the Kasiski examination or the index of coincidence to determine the key length, after which each sub-cipher can be attacked individually.
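
The index of coincidence approach to finding a key length can be sketched as below. The index is the probability that two letters drawn at random from the text are equal: the sum of n(n - 1) over the letter counts n, divided by N(N - 1) for N letters in total. English text typically scores around 0.067 and uniformly random letters around 0.038. Both function names are hypothetical, and the sketch assumes a cipher over the letters A-Z with a repeating key.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two randomly chosen letters of text are identical."""
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two letters")
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def average_ic_for_key_length(ciphertext: str, key_length: int) -> float:
    """Average the index of coincidence over the columns of a trial key length."""
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    # Column i holds every letter encrypted with key position i.
    columns = ["".join(letters[i::key_length]) for i in range(key_length)]
    return sum(index_of_coincidence(col) for col in columns) / key_length
```

Trying key lengths 1, 2, 3, and so on and looking for the first length whose average rises toward the English value gives a candidate key length. Each column can then be treated as its own simple substitution (for Vigenère, a Caesar shift) and attacked with ordinary frequency analysis. Multiples of the true key length also score high, so the smallest such length is usually the one to try first.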

Question 6

How much ciphertext do you need for frequency analysis to work?

Accepted Answer

Generally, frequency analysis becomes reliable with at least 100-200 characters of ciphertext for simple substitution ciphers. With shorter texts, the natural variation in letter frequencies makes it harder to draw reliable conclusions. Very short messages (under 50 characters) may not contain enough data for letter frequencies to match the expected language pattern. For polyalphabetic ciphers, even more ciphertext is needed because the analysis must be performed on subsets of the text corresponding to each key position.
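
A rough calculation shows why short texts are unreliable. The sketch below assumes, as a simplification, that letters are independent draws, so the count of a letter with probability p in n letters is binomial with mean n*p and standard deviation sqrt(n*p*(1-p)). Real text is not independent, and the function name expected_count_and_spread is hypothetical.

```python
import math


def expected_count_and_spread(n_letters: int, probability: float) -> tuple[float, float]:
    """Mean and standard deviation of a letter count under a binomial model."""
    mean = n_letters * probability
    std_dev = math.sqrt(n_letters * probability * (1 - probability))
    return mean, std_dev


for n in (50, 200, 1000):
    mean, std = expected_count_and_spread(n, 0.127)  # E at about 12.7%
    print(f"{n:>5} letters: expect {mean:.1f} E's, +/- {std:.1f} "
          f"({std / mean:.0%} relative spread)")
```

The relative spread shrinks in proportion to one over the square root of the text length. At 50 letters the count of E alone can easily be off by a third or more, and rarer letters fare worse, which is why rank-based matching stabilizes only with longer ciphertext.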
