Letter Frequency Analysis Tool

Frequency analysis examines how often each letter appears in a text and compares the distribution against known language patterns. It is one of the oldest and most powerful techniques in cryptanalysis — first described by Al-Kindi in the 9th century — and remains the primary method for breaking classical substitution ciphers like Caesar, Atbash, and keyword ciphers.

ETAOIN SHRDLU: most common English letters

Frequently Asked Questions About Frequency Analysis

What is frequency analysis in cryptography?

Frequency analysis is a cryptanalysis technique that studies how often each letter appears in a piece of text. Since every language has a characteristic letter frequency distribution (for example, E is the most common letter in English at about 12.7%), analyzing the frequencies of letters in ciphertext can reveal the substitution pattern used to encrypt it. This method was first described by the Arab polymath Al-Kindi in the 9th century and remains one of the most fundamental tools in classical cryptography.
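As a minimal sketch of the counting step, the snippet below tallies the letters in a text and reports each one as a percentage of all alphabetic characters. The function name `letter_frequencies` is an illustrative choice, and the sketch assumes only the 26 unaccented Latin letters matter.

```python
from collections import Counter
import string


def letter_frequencies(text: str) -> dict[str, float]:
    """Return each letter's share of the alphabetic characters, in percent."""
    # Keep only A-Z; digits, spaces and punctuation are ignored.
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    if not letters:
        return {}
    counts = Counter(letters)
    total = len(letters)
    # most_common() orders the result from most to least frequent.
    return {letter: 100 * count / total for letter, count in counts.most_common()}


for letter, pct in letter_frequencies("GSV XRKSVI").items():
    print(f"{letter}: {pct:.1f}%")
```

Dividing by the number of letters rather than the number of characters keeps spaces and punctuation from diluting the percentages.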

How does frequency analysis break substitution ciphers?

In a simple substitution cipher, each plaintext letter is consistently replaced by a single ciphertext letter. This means the frequency pattern of the original language is preserved in the ciphertext, just mapped to different letters. By comparing the ciphertext letter frequencies to known language frequencies, a cryptanalyst can match the most common ciphertext letter to E, the second most common to T, and so on. Combined with analysis of common bigrams (TH, HE, IN) and trigrams (THE, AND, ING), most substitution ciphers can be broken with moderate amounts of ciphertext.

What are the most common English letter frequencies?

The most common letters in English, in order, are: E (12.7%), T (9.1%), A (8.2%), O (7.5%), I (7.0%), N (6.7%), S (6.3%), H (6.1%), R (6.0%), and D (4.3%). The mnemonic ETAOIN SHRDLU approximates the top 12 letters by frequency. The least common letters are Z (0.07%), Q (0.10%), X (0.15%), and J (0.15%). These frequencies are averages across large bodies of English text and may vary with specific texts, genres, and writing styles.

What is the chi-squared statistic in frequency analysis?

The chi-squared statistic measures how much an observed frequency distribution differs from an expected one. In frequency analysis, it compares the actual letter counts in your text against the counts you would expect if the text followed standard language frequencies. A low chi-squared value (below about 30 for 25 degrees of freedom) suggests the text matches normal language patterns, while a high value suggests the text is encrypted, written in a different language, or has an unusual letter distribution.
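Written out, the statistic takes the following form. The symbols are notation introduced here for illustration: O_i is the observed count of letter i, p_i is its expected English proportion, and N is the number of letters in the text.

```latex
\chi^2 = \sum_{i=\mathrm{A}}^{\mathrm{Z}} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = N \, p_i
```

For example, in a 100-letter text containing 15 E's, the expected count is 12.7, so E contributes (15 - 12.7)^2 / 12.7 ≈ 0.42 to the total. The 25 degrees of freedom come from the 26 letter categories minus one, because the counts must sum to N.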

Which ciphers are vulnerable to frequency analysis?

Simple (monoalphabetic) substitution ciphers are the most vulnerable, including Caesar cipher, Atbash cipher, keyword cipher, and affine cipher. These all map each plaintext letter to exactly one ciphertext letter, preserving frequency patterns. Polyalphabetic ciphers like Vigenère make frequency analysis harder because each plaintext letter can encrypt to multiple ciphertext letters, but they can still be broken using the Kasiski examination or index of coincidence to determine the key length, after which each sub-cipher can be attacked individually.
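The Kasiski examination mentioned above can be sketched roughly as follows: find sequences that repeat in the ciphertext, measure the distances between repeats, and see which candidate key lengths divide those distances. The function names and the default limits (three-letter sequences, key lengths up to 20) are assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def kasiski_distances(ciphertext: str, seq_len: int = 3) -> list[int]:
    """Distances between consecutive occurrences of each repeated sequence."""
    letters = "".join(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    positions = defaultdict(list)
    for i in range(len(letters) - seq_len + 1):
        positions[letters[i:i + seq_len]].append(i)
    distances = []
    for starts in positions.values():
        # A sequence seen only once has a single start and adds nothing.
        distances.extend(later - earlier for earlier, later in zip(starts, starts[1:]))
    return distances


def key_length_votes(distances: list[int], max_len: int = 20) -> Counter:
    """Count how many distances each candidate key length divides evenly."""
    votes = Counter()
    for distance in distances:
        for candidate in range(2, max_len + 1):
            if distance % candidate == 0:
                votes[candidate] += 1
    return votes


distances = kasiski_distances("ABCXYZABC")
print(distances, key_length_votes(distances).most_common())
```

Candidates with many votes are worth testing first. Small factors such as 2 and 3 tend to collect votes by chance, so the result is a ranking of hypotheses rather than a definitive answer.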

How much ciphertext do you need for frequency analysis to work?

Generally, frequency analysis becomes reliable with at least 100-200 characters of ciphertext for simple substitution ciphers. With shorter texts, the natural variation in letter frequencies makes it harder to draw reliable conclusions. Very short messages (under 50 characters) may not contain enough data for letter frequencies to match the expected language pattern. For polyalphabetic ciphers, even more ciphertext is needed because the analysis must be performed on subsets of the text corresponding to each key position.

What are the most common English letter bigrams?

The most common English bigrams are TH (3.56%), HE (3.07%), IN (2.43%), ER (2.05%), AN (1.99%), RE (1.85%), ON (1.76%), AT (1.49%), EN (1.45%), and ND (1.35%). Analyzing bigram frequency can reveal patterns that single-letter frequency analysis misses.

How do you use frequency analysis to crack a cipher?

Count how often each letter appears in the ciphertext. Compare these frequencies with standard English letter frequencies (E=12.7%, T=9.1%, A=8.2%, O=7.5%, I=7.0%). The most frequent ciphertext letter likely represents E. Use common bigrams (TH, HE, IN) and short words (THE, AND, FOR) to confirm substitutions and gradually decode the message.
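A first guess at the key can be produced mechanically by pairing ciphertext letters, ranked by count, with English letters ranked by frequency. The sketch below does only that; `initial_guess` and `apply_mapping` are hypothetical helper names, and the English ordering is derived from the reference table on this page. On real ciphertexts the guess is usually wrong for many letters and serves as a starting point to refine with bigrams and short words.

```python
from collections import Counter

# English letters from most to least frequent, per the reference table on this page.
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"


def initial_guess(ciphertext: str) -> dict[str, str]:
    """Pair the n-th most common ciphertext letter with the n-th most common English letter."""
    counts = Counter(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    ranked = [letter for letter, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))


def apply_mapping(ciphertext: str, mapping: dict[str, str]) -> str:
    """Substitute known letters; show letters without a mapping as underscores."""
    return "".join(
        mapping.get(ch, "_") if "A" <= ch <= "Z" else ch
        for ch in ciphertext.upper()
    )


guess = initial_guess("AAAB BC")
print(guess, apply_mapping("AAAB BC", guess))
```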

What is the Index of Coincidence?

The Index of Coincidence (IC) measures how likely two randomly chosen letters from a text are to be identical. English text has an IC of approximately 0.0667, while random text is about 0.0385. IC helps determine whether a cipher is monoalphabetic (IC near English) or polyalphabetic (IC closer to random), guiding which cryptanalysis approach to use.
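One common way to compute the IC is to sum n(n - 1) over the count n of each letter and divide by N(N - 1), where N is the total number of letters. The sketch below follows that formula; the function name is an illustrative choice.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement are identical."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


print(index_of_coincidence("AABB"))  # 4 matching ordered pairs out of 12
```

Because the IC depends only on how the counts are distributed, not on which letters carry them, it is unchanged by any monoalphabetic substitution.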

When does frequency analysis fail?

Frequency analysis is unreliable on very short texts (under 100 characters), texts in specialized vocabularies, polyalphabetic ciphers like Vigenère (which flatten frequency distributions), and homophonic substitution ciphers that map frequent letters to multiple symbols. For polyalphabetic ciphers, you must first determine the key length using Kasiski examination or IC analysis.

How to Use Frequency Analysis

Frequency analysis is the process of counting how often each letter appears in a piece of text and using those counts to draw conclusions about the text's origin or encryption. Here is a step-by-step guide for using the tool on this page:

  1. Paste or type your text into the input field. The tool accepts any text — plaintext, ciphertext, or mixed content. For best results, use at least 100 characters of text so that frequency patterns are statistically meaningful.

  2. Review the frequency chart. The interactive bar chart displays each letter's frequency as a percentage of total alphabetic characters. Letters are arranged alphabetically by default, but you can sort by frequency to quickly identify the most and least common letters.

  3. Compare with English frequencies. The chart overlays standard English letter frequencies alongside your text's distribution. Look for the characteristic peaks at E, T, A, O, and I. If the peaks are shifted uniformly, you may be looking at a Caesar cipher. If the distribution appears flattened, a polyalphabetic cipher like Vigenère is likely.

  4. Check the chi-squared statistic. This single number summarizes how closely your text matches expected English frequencies. A chi-squared value below 30 suggests normal English; values above 50 strongly suggest encryption or a non-English language.

  5. Examine per-letter deviations. The detailed statistics table shows each letter's observed frequency, expected frequency, and the deviation between them. Large positive deviations indicate letters that appear more often than expected in English; large negative deviations indicate letters that appear less often.

  6. Form hypotheses and test them. If you suspect a monoalphabetic substitution cipher, map the most frequent ciphertext letter to E, the second to T, and so on. Check these guesses against common bigrams and trigrams. Adjust your substitutions until coherent plaintext emerges.
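For the Caesar case described in step 3, the chi-squared check from step 4 can be automated: try all 26 shifts and keep the one whose output looks most like English. This is a sketch under stated assumptions, not a description of how the tool on this page works internally. The helper names are hypothetical, and the frequency values are copied from the reference table on this page.

```python
import string

ALPHABET = string.ascii_uppercase

# Percent frequencies for A..Z, copied from the reference table on this page.
ENGLISH_PCT = [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
               0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
               6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074]


def shift_back(text: str, shift: int) -> str:
    """Undo a Caesar shift by moving every letter `shift` places toward A."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) - shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


def chi_squared(text: str) -> float:
    """Chi-squared distance between the text's letter counts and English."""
    upper = text.upper()
    counts = [upper.count(letter) for letter in ALPHABET]
    total = sum(counts)
    if total == 0:
        raise ValueError("text contains no letters")
    return sum((obs - total * pct / 100) ** 2 / (total * pct / 100)
               for obs, pct in zip(counts, ENGLISH_PCT))


def best_caesar_shift(ciphertext: str) -> int:
    """Return the shift whose decryption has the lowest chi-squared score."""
    return min(range(26), key=lambda s: chi_squared(shift_back(ciphertext, s)))
```

Calling `shift_back(ciphertext, best_caesar_shift(ciphertext))` then gives the most English-looking candidate. On very short inputs the lowest score may belong to a wrong shift, so it is worth reading the top few candidates.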

English Letter Frequency Reference Table

The following table shows the standard frequency distribution for each letter in English text, based on analysis of large text corpora. These values represent averages and will vary across different genres, authors, and text lengths.

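| Letter | Frequency (%) | Example Words | Letter | Frequency (%) | Example Words |
|---|---|---|---|---|---|
| A | 8.167 | and, are, at | N | 6.749 | not, new, no |
| B | 1.492 | but, be, by | O | 7.507 | of, or, on |
| C | 2.782 | can, come | P | 1.929 | put, part |
| D | 4.253 | do, did, day | Q | 0.095 | queen, quite |
| E | 12.702 | the, he, be | R | 5.987 | are, her, or |
| F | 2.228 | for, from | S | 6.327 | so, she, is |
| G | 2.015 | get, go, got | T | 9.056 | the, to, it |
| H | 6.094 | he, has, had | U | 2.758 | up, us, use |
| I | 6.966 | in, is, it | V | 0.978 | very, have |
| J | 0.153 | just, job | W | 2.360 | was, we, with |
| K | 0.772 | know, keep | X | 0.150 | next, six |
| L | 4.025 | like, last | Y | 1.974 | you, year |
| M | 2.406 | my, me, may | Z | 0.074 | zero, zone |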

The mnemonic ETAOIN SHRDLU lists roughly the twelve most frequent letters in descending order: E, T, A, O, I, N, S, H, R, D, L, U. By the table above, C (2.782%) narrowly outranks U (2.758%), so the last position depends on the corpus. This sequence was so well known among typographers that it became a cultural reference in its own right.

Breaking a Cipher: Worked Example

Consider the following ciphertext, which was encrypted using a simple substitution cipher:

HSZIV GSV OVGGVI UIVJFVMXB WRHGIRYFGRLM LU GSRH GVCG DRGS HGZMWZIW VMTORHS UIVJFVMXRVH GL XIZXP GSV XRKSVI

Step 1: Count letter frequencies.

The ciphertext contains 91 letters. The most frequent are:

| Rank | Letter | Count | Frequency |
|---|---|---|---|
| 1 | V | 13 | 14.3% |
| 2 | G | 12 | 13.2% |
| 3 | I | 8 | 8.8% |
| 3 | R | 8 | 8.8% |
| 5 | S | 7 | 7.7% |

Step 2: Compare with English frequencies.

In standard English, the top five letters are E (12.7%), T (9.1%), A (8.2%), O (7.5%), I (7.0%). Comparing:

  • V (14.3%) likely maps to E (12.7%) or T (9.1%)
  • G (13.2%) likely maps to T (9.1%) or E (12.7%)

Step 3: Look for common patterns.

The three-letter word "GSV" appears twice. The most common three-letter word in English is "THE." If GSV = THE, then G=T, S=H, V=E, which agrees with the frequency ranking from Step 2.

Step 4: Apply the hypothesis and extend.

With G=T, S=H, V=E, the word "GSRH" becomes "TH__" and "OVGGVI" becomes "_ETTE_", which suggest "THIS" (R=I, H=S) and "LETTER" (O=L, I=R). These pairs reveal a pattern: G (position 7) maps to T (position 20), S (19) maps to H (8), and V (22) maps to E (5). In every pair the two positions sum to 27, the signature of an Atbash cipher, where each letter is mapped to its reverse (A<->Z, B<->Y, etc.). The remaining guesses fit as well: R (18) and I (9), H (8) and S (19), O (15) and L (12).

Step 5: Decode the full message.

Applying the Atbash substitution decodes the entire message to: "SHARE THE LETTER FREQUENCY DISTRIBUTION OF THIS TEXT WITH STANDARD ENGLISH FREQUENCIES TO CRACK THE CIPHER"

This example demonstrates how frequency analysis, combined with pattern recognition and knowledge of common words, can systematically break a substitution cipher.
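The Atbash mapping identified in the example can be checked in a few lines. This sketch uses Python's built-in `str.maketrans` and `str.translate`; the function name `atbash` is an illustrative choice.

```python
import string

# Map A..Z onto Z..A. Atbash is its own inverse, so one table encrypts and decrypts.
ATBASH_TABLE = str.maketrans(string.ascii_uppercase, string.ascii_uppercase[::-1])


def atbash(text: str) -> str:
    """Apply the Atbash substitution, leaving non-letters unchanged."""
    return text.upper().translate(ATBASH_TABLE)


print(atbash("GSV XRKSVI"))  # THE CIPHER
```

Applying `atbash` twice returns the original text, which is a quick sanity check for any hand-built substitution table of this kind.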

N-gram Analysis: Bigrams and Trigrams

Single-letter frequency analysis is powerful, but analyzing pairs (bigrams) and triples (trigrams) of consecutive letters reveals even more about the structure of a text. N-gram analysis exploits the fact that English — and every natural language — has strong statistical preferences for certain letter combinations.

Top 10 English Bigrams

| Rank | Bigram | Frequency (%) | Notes |
|---|---|---|---|
| 1 | TH | 3.56 | The most common bigram; starts "the," "that," "this," "them" |
| 2 | HE | 3.07 | Found in "the," "he," "her," "here," "them" |
| 3 | IN | 2.43 | Common preposition and part of the ending "-ing" |
| 4 | ER | 2.05 | Common word ending ("-er," "-ler," "-ber") and in "her," "every" |
| 5 | AN | 1.99 | Article "an" and in "and," "any," "can," "man" |
| 6 | RE | 1.85 | Prefix "re-" and in "are," "were," "here" |
| 7 | ON | 1.76 | Preposition and in "one," "only," "upon" |
| 8 | AT | 1.49 | Preposition and in "that," "what," "cat" |
| 9 | EN | 1.45 | Common ending ("-en," "-ment") and in "then," "when" |
| 10 | ND | 1.35 | Ending of "and," "end," "find," "kind" |

Top 10 English Trigrams

| Rank | Trigram | Frequency (%) | Notes |
|---|---|---|---|
| 1 | THE | 3.51 | The most common English word |
| 2 | AND | 1.59 | The most common conjunction |
| 3 | ING | 1.47 | Present participle suffix |
| 4 | HER | 0.90 | Pronoun and in "there," "where," "other" |
| 5 | THA | 0.83 | Start of "that," "than" |
| 6 | ERE | 0.78 | In "there," "where," "here" |
| 7 | FOR | 0.76 | Common preposition |
| 8 | ENT | 0.73 | Suffix in "went," "sent," "ment" |
| 9 | ION | 0.70 | Common suffix "-tion," "-sion" |
| 10 | TER | 0.68 | In "after," "water," "letter" |

Using N-grams in Cryptanalysis

When single-letter frequency analysis produces multiple plausible mappings, bigram and trigram analysis helps narrow down the correct substitution:

  1. Identify repeated bigrams in the ciphertext. The most common ciphertext bigram likely corresponds to TH.
  2. Look for trigram patterns. If a particular three-letter sequence dominates, it probably represents THE.
  3. Check word boundaries. Two-letter words are extremely constrained in English (common ones: OF, TO, IN, IS, IT, AS, AT, WE, HE, BY, OR, ON, DO, IF, ME, MY, UP, AN, GO, NO, US, AM, SO). If you can identify word boundaries in the ciphertext, testing these against known two-letter words rapidly constrains the solution space.
  4. Combine with letter frequency data. Once you have high-confidence mappings from n-gram analysis, use them to anchor your single-letter frequency assignments.
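The counting behind steps 1 and 2 can be sketched with a small helper. This version assumes word boundaries are preserved in the ciphertext and counts n-grams only within words, so sequences never span a space. The name `ngram_counts` is an illustrative choice, and a ciphertext with spaces stripped would need counting across the whole letter stream instead.

```python
from collections import Counter
import re


def ngram_counts(text: str, n: int) -> Counter:
    """Count n-letter sequences inside each word of the text."""
    counts = Counter()
    # Each run of letters is treated as one word; everything else is a separator.
    for word in re.findall(r"[A-Z]+", text.upper()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


sample = "GSV XRKSVI"
print(ngram_counts(sample, 2).most_common(3))
print(ngram_counts(sample, 3).most_common(3))
```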

When Frequency Analysis Fails

Frequency analysis is not a universal cipher-breaking tool. Several types of encryption resist or defeat it entirely:

Polyalphabetic Ciphers

Ciphers like the Vigenère cipher use multiple substitution alphabets, cycling through them with each letter. This distributes each plaintext letter across several different ciphertext letters, flattening the frequency distribution and making it resemble random text. Breaking polyalphabetic ciphers requires first determining the key length (using the Kasiski examination or the Index of Coincidence), then applying frequency analysis separately to each sub-cipher.
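One way to apply the Index of Coincidence to the key-length problem is to split the ciphertext into columns, one per assumed key position, and average the IC of the columns. At the correct key length each column is a monoalphabetic sub-cipher, so the average should rise toward the English value. The sketch below illustrates the idea; the function names and the `max_len` default are assumptions made for this example.

```python
from collections import Counter


def index_of_coincidence(letters: str) -> float:
    """IC of a string of letters; 0.0 if there are fewer than two."""
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def average_column_ic(ciphertext: str, key_len: int) -> float:
    """Average IC over the key_len interleaved columns of the ciphertext."""
    letters = "".join(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    # Column i holds every key_len-th letter, starting at offset i.
    columns = [letters[i::key_len] for i in range(key_len)]
    return sum(index_of_coincidence(col) for col in columns) / key_len


def rank_key_lengths(ciphertext: str, max_len: int = 12) -> list[tuple[int, float]]:
    """Candidate key lengths sorted from highest to lowest average IC."""
    scores = {k: average_column_ic(ciphertext, k) for k in range(1, max_len + 1)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# A toy text with period 2: splitting into two columns makes each one uniform.
print(average_column_ic("ABABABAB", 1), average_column_ic("ABABABAB", 2))
```

Multiples of the true key length also score well, and short columns give noisy values, so the smallest length with a clearly raised average is usually the one to test first.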

Homophonic Substitution

A homophonic substitution cipher maps each plaintext letter to multiple possible ciphertext symbols, with more frequent letters having more alternatives. For instance, E might map to any of five different symbols, while Z maps to only one. This equalizes the ciphertext frequency distribution, defeating simple counting-based attacks. Breaking homophonic ciphers requires more sophisticated techniques including bigram frequency analysis and hill-climbing algorithms.

Short Texts

With fewer than 100 characters, the natural statistical variation in letter usage can be larger than the signal you are trying to detect. A short text might happen to contain no letter E at all, even though E is the most common letter in English. In these cases, frequency analysis provides only weak evidence and must be supplemented by other techniques such as known-plaintext attacks or context-based guessing.

Null Ciphers and Steganography

Some encryption methods hide the message within apparently innocent text, making frequency analysis of the carrier text useless because the carrier text has normal frequency distributions. Detecting these requires different analytical approaches entirely.

Modern Encryption

Modern cryptographic algorithms (AES, RSA, ChaCha20) produce ciphertext that is computationally indistinguishable from random data. Every byte value appears with equal probability, and no amount of frequency analysis can reveal any information about the plaintext. Frequency analysis is strictly a tool for classical ciphers.

Related Tools

  • Caesar Cipher Decoder: The Caesar cipher is one of the simplest substitution ciphers, vulnerable to frequency analysis because it merely shifts the entire frequency distribution by a fixed amount.
  • Keyword Cipher: A monoalphabetic substitution cipher that rearranges the alphabet using a keyword. Frequency analysis is the primary method for breaking keyword ciphers.
  • Homophonic Cipher: Designed specifically to defeat frequency analysis by mapping common letters to multiple ciphertext symbols, equalizing the output distribution.
  • Cipher Identifier: Use the cipher identifier to determine what type of cipher was used to encrypt a message before choosing an analysis approach.
  • Vigenère Cipher: A polyalphabetic cipher that resists simple frequency analysis. Breaking it requires first determining the key length using Kasiski examination or the Index of Coincidence, then applying frequency analysis to each sub-cipher individually.