How to Use Frequency Analysis
Frequency analysis is the process of counting how often each letter appears in a piece of text and using those counts to draw conclusions about the text's origin or encryption. Here is a step-by-step guide for using the tool on this page:
- Paste or type your text into the input field. The tool accepts any text: plaintext, ciphertext, or mixed content. For best results, use at least 100 characters of text so that frequency patterns are statistically meaningful.
- Review the frequency chart. The interactive bar chart displays each letter's frequency as a percentage of total alphabetic characters. Letters are arranged alphabetically by default, but you can sort by frequency to quickly identify the most and least common letters.
- Compare with English frequencies. The chart overlays standard English letter frequencies on your text's distribution. Look for the characteristic peaks at E, T, A, O, and I. If the same pattern of peaks appears shifted along the alphabet by a fixed number of positions, you may be looking at a Caesar cipher. If the distribution appears flattened, a polyalphabetic cipher such as Vigenere is likely.
- Check the chi-squared statistic. This single number summarizes how closely your text matches expected English frequencies: the lower the value, the closer the match. As a rough guide, a value below 30 suggests normal English, values above 50 strongly suggest encryption or a non-English language, and values in between are inconclusive. Treat these cutoffs as rules of thumb: the standard statistic is computed from letter counts, so for text that departs from English it grows with text length.
- Examine per-letter deviations. The detailed statistics table shows each letter's observed frequency, expected frequency, and the deviation between them. Large positive deviations indicate letters that appear more often than expected in English; large negative deviations indicate letters that appear less often.
- Form hypotheses and test them. If you suspect a monoalphabetic substitution cipher, map the most frequent ciphertext letter to E, the second to T, and so on. Check these guesses against common bigrams and trigrams. Adjust your substitutions until coherent plaintext emerges.
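The counting and scoring behind these steps can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function names are hypothetical, and we assume the chi-squared statistic is computed from raw letter counts against the reference percentages in the table below.

```python
import string
from collections import Counter

# Reference English letter frequencies (%), A to Z, as listed in this page's table.
ENGLISH_FREQ = dict(zip(string.ascii_uppercase, [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]))


def letter_counts(text: str) -> Counter:
    """Count A-Z letters, ignoring case and all non-alphabetic characters."""
    return Counter(ch for ch in text.upper() if ch in ENGLISH_FREQ)


def chi_squared(text: str) -> float:
    """Sum over letters of (observed - expected)^2 / expected, using counts."""
    counts = letter_counts(text)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no letters")
    score = 0.0
    for letter, pct in ENGLISH_FREQ.items():
        expected = total * pct / 100
        score += (counts[letter] - expected) ** 2 / expected
    return score


def deviations(text: str) -> dict:
    """Observed minus expected frequency per letter, in percentage points."""
    counts = letter_counts(text)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no letters")
    return {letter: 100 * counts[letter] / total - pct
            for letter, pct in ENGLISH_FREQ.items()}


def rank_guess(ciphertext: str) -> dict:
    """First-pass key: i-th most frequent cipher letter -> i-th most frequent English letter."""
    english_order = sorted(ENGLISH_FREQ, key=ENGLISH_FREQ.get, reverse=True)
    cipher_order = [letter for letter, _ in letter_counts(ciphertext).most_common()]
    return dict(zip(cipher_order, english_order))


if __name__ == "__main__":
    sample = "Frequency analysis counts how often each letter appears in a text."
    print(f"chi-squared: {chi_squared(sample):.1f}")
    biggest = sorted(deviations(sample).items(), key=lambda kv: abs(kv[1]), reverse=True)[:3]
    print("largest deviations:", biggest)
```

The rank-based guess is only a starting point; as the last step says, it should be refined with bigrams, trigrams, and word patterns before trusting it.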
English Letter Frequency Reference Table
The following table shows the standard frequency distribution for each letter in English text, based on analysis of large text corpora. These values represent averages and will vary across different genres, authors, and text lengths.
| Letter | Frequency (%) | Example Words | Letter | Frequency (%) | Example Words |
|---|---|---|---|---|---|
| A | 8.167 | and, are, at | N | 6.749 | not, new, no |
| B | 1.492 | but, be, by | O | 7.507 | of, or, on |
| C | 2.782 | can, come | P | 1.929 | put, part |
| D | 4.253 | do, did, day | Q | 0.095 | queen, quite |
| E | 12.702 | the, he, be | R | 5.987 | are, her, or |
| F | 2.228 | for, from | S | 6.327 | so, she, is |
| G | 2.015 | get, go, got | T | 9.056 | the, to, it |
| H | 6.094 | he, has, had | U | 2.758 | up, us, use |
| I | 6.966 | in, is, it | V | 0.978 | very, have |
| J | 0.153 | just, job | W | 2.360 | was, we, with |
| K | 0.772 | know, keep | X | 0.150 | next, six |
| L | 4.025 | like, last | Y | 1.974 | you, year |
| M | 2.406 | my, me, may | Z | 0.074 | zero, zone |
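The mnemonic ETAOIN SHRDLU lists, in approximately descending order, the twelve most frequent letters: E, T, A, O, I, N, S, H, R, D, L, U. The match with the table above is close but not exact: C (2.782%) narrowly outranks U (2.758%) for twelfth place. The sequence was familiar to typographers because the first two columns of keys on Linotype typesetting machines were arranged in this order, and it became a cultural reference in its own right.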
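Sorting the table by frequency is a quick way to check the mnemonic against the data. A short Python sketch; the FREQ dict simply restates the table, and top_letters is a hypothetical helper:

```python
# Letter frequencies (%) restated from the reference table above.
FREQ = {
    "A": 8.167, "B": 1.492, "C": 2.782, "D": 4.253, "E": 12.702, "F": 2.228,
    "G": 2.015, "H": 6.094, "I": 6.966, "J": 0.153, "K": 0.772, "L": 4.025,
    "M": 2.406, "N": 6.749, "O": 7.507, "P": 1.929, "Q": 0.095, "R": 5.987,
    "S": 6.327, "T": 9.056, "U": 2.758, "V": 0.978, "W": 2.360, "X": 0.150,
    "Y": 1.974, "Z": 0.074,
}


def top_letters(n: int) -> str:
    """The n most frequent letters, highest first, according to FREQ."""
    return "".join(sorted(FREQ, key=FREQ.get, reverse=True)[:n])


print(top_letters(12))  # ETAOINSHRDLC
```

With a different corpus the tail of the ranking can shuffle, since letters such as C, U, M, and W have very close frequencies.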
Breaking a Cipher: Worked Example
Consider the following ciphertext, which was encrypted using a simple substitution cipher:
HSZIV GSV OVGGVI UIVJFVMXB WRHGIRYFGRLM LU GSRH GVCG DRGS HGZMWZIW VMTORHS UIVJFVMXRVH GL XIZXP GSV XRKSVI
Step 1: Count letter frequencies.
The ciphertext contains 91 letters. The most frequent are:
| Rank | Letter | Count | Frequency |
|---|---|---|---|
| 1 | V | 13 | 14.3% |
| 2 | G | 12 | 13.2% |
| 3 | I | 8 | 8.8% |
| 3 | R | 8 | 8.8% |
| 5 | S | 7 | 7.7% |
Step 2: Compare with English frequencies.
In standard English, the top five letters are E (12.7%), T (9.1%), A (8.2%), O (7.5%), I (7.0%). Comparing:
- V (14.3%) likely maps to E (12.7%) or T (9.1%)
- G (13.2%) likely maps to T (9.1%) or E (12.7%)
Step 3: Look for common patterns.
The three-letter word "GSV" appears twice. The most common three-letter word in English is "THE." If GSV = THE, then G=T, S=H, V=E, which agrees with Step 2: V, the most frequent ciphertext letter, becomes E.
Step 4: Apply the hypothesis and extend.
With G=T, S=H, V=E, the word "OVGGVI" becomes "_ETTE_", which fits "LETTER" (O=L, I=R). The mappings found so far share a pattern: each ciphertext letter sits at the mirror-image position of its plaintext letter in the alphabet. G (position 7) maps to T (position 20), S (19) to H (8), V (22) to E (5), O (15) to L (12), and I (9) to R (18); every pair of positions sums to 27. This is the signature of the Atbash cipher, which maps each letter to its reverse (A<->Z, B<->Y, and so on).
Step 5: Decode the full message.
Applying the Atbash substitution decodes the entire message to: "SHARE THE LETTER FREQUENCY DISTRIBUTION OF THIS TEXT WITH STANDARD ENGLISH FREQUENCIES TO CRACK THE CIPHER"
This example demonstrates how frequency analysis, combined with pattern recognition and knowledge of common words, can systematically break a substitution cipher.
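The worked example can be reproduced with a short Python sketch. The helper name atbash is hypothetical; it uses only str.maketrans and str.translate from the standard library. Because Atbash is its own inverse, the same function both encrypts and decrypts.

```python
import string
from collections import Counter

# Atbash pairs each letter with its mirror image: A<->Z, B<->Y, ..., M<->N.
# In 1-based positions, every pair sums to 27 (e.g. G=7 and T=20).
ATBASH_TABLE = str.maketrans(string.ascii_uppercase, string.ascii_uppercase[::-1])


def atbash(text: str) -> str:
    """Uppercase the text and mirror every letter; other characters are unchanged."""
    return text.upper().translate(ATBASH_TABLE)


plaintext = ("SHARE THE LETTER FREQUENCY DISTRIBUTION OF THIS TEXT WITH "
             "STANDARD ENGLISH FREQUENCIES TO CRACK THE CIPHER")
ciphertext = atbash(plaintext)
print(ciphertext)

# Step 1 of the example: the most frequent ciphertext letters.
counts = Counter(ch for ch in ciphertext if ch.isalpha())
print(counts.most_common(5))  # [('V', 13), ('G', 12), ('I', 8), ('R', 8), ('S', 7)]

# Step 3: the hypothesis GSV = THE is exactly what Atbash predicts.
print(atbash("GSV"))  # THE
```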
N-gram Analysis: Bigrams and Trigrams
Single-letter frequency analysis is powerful, but analyzing pairs (bigrams) and triples (trigrams) of consecutive letters reveals even more about the structure of a text. N-gram analysis exploits the fact that English — and every natural language — has strong statistical preferences for certain letter combinations.
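A minimal Python sketch of n-gram counting follows. The function name is hypothetical, and counting only within words is our assumption; published n-gram tables are sometimes compiled differently, so percentages computed this way may differ from the tables below.

```python
import re
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count overlapping letter n-grams, one word at a time.

    N-grams never span a space or punctuation, so "OF THE" yields the
    bigrams OF, TH, HE but not FT.
    """
    counts = Counter()
    for word in re.findall(r"[A-Z]+", text.upper()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


sample = "the cat and the hat"
print(ngram_counts(sample, 2).most_common(3))  # [('TH', 2), ('HE', 2), ('AT', 2)]
print(ngram_counts(sample, 3)["THE"])          # 2
```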
Top 10 English Bigrams
| Rank | Bigram | Frequency (%) | Notes |
|---|---|---|---|
| 1 | TH | 3.56 | The most common bigram; starts "the," "that," "this," "them" |
| 2 | HE | 3.07 | Found in "the," "he," "her," "here," "them" |
| 3 | IN | 2.43 | Common preposition; also in "-ing," "into," "find" |
| 4 | ER | 2.05 | Common word ending ("-er," "-ler," "-ber") and in "her," "every" |
| 5 | AN | 1.99 | Article "an" and in "and," "any," "can," "man" |
| 6 | RE | 1.85 | Prefix "re-" and in "are," "were," "here" |
| 7 | ON | 1.76 | Preposition and in "one," "only," "upon" |
| 8 | AT | 1.49 | Preposition and in "that," "what," "cat" |
| 9 | EN | 1.45 | Common ending ("-en," "-ment") and in "then," "when" |
| 10 | ND | 1.35 | Ending of "and," "end," "find," "kind" |
Top 10 English Trigrams
| Rank | Trigram | Frequency (%) | Notes |
|---|---|---|---|
| 1 | THE | 3.51 | The most common English word |
| 2 | AND | 1.59 | The most common conjunction |
| 3 | ING | 1.47 | Present participle suffix |
| 4 | HER | 0.90 | Pronoun and in "there," "where," "other" |
| 5 | THA | 0.83 | Start of "that," "than" |
| 6 | ERE | 0.78 | In "there," "where," "here" |
| 7 | FOR | 0.76 | Common preposition |
| 8 | ENT | 0.73 | Suffix "-ent" and "-ment"; also in "went," "sent" |
| 9 | ION | 0.70 | Common suffix "-tion," "-sion" |
| 10 | TER | 0.68 | In "after," "water," "letter" |
Using N-grams in Cryptanalysis
When single-letter frequency analysis produces multiple plausible mappings, bigram and trigram analysis helps narrow down the correct substitution:
- Identify repeated bigrams in the ciphertext. The most common ciphertext bigram likely corresponds to TH.
- Look for trigram patterns. If a particular three-letter sequence dominates, it probably represents THE.
- Check word boundaries. Two-letter words are extremely constrained in English (common ones: OF, TO, IN, IS, IT, AS, AT, WE, HE, BY, OR, ON, DO, IF, ME, MY, UP, AN, GO, NO, US, AM, SO). If you can identify word boundaries in the ciphertext, testing these against known two-letter words rapidly constrains the solution space.
- Combine with letter frequency data. Once you have high-confidence mappings from n-gram analysis, use them to anchor your single-letter frequency assignments.
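The trigram and two-letter-word checks can be combined into a small helper that proposes a starting key and then filters candidate words. This is a hedged Python sketch: the helper names are hypothetical, and treating the top within-word trigram as THE is exactly the guess described above, not a guarantee.

```python
import re
from collections import Counter

# Common two-letter English words, as listed in the checklist above.
TWO_LETTER_WORDS = ["OF", "TO", "IN", "IS", "IT", "AS", "AT", "WE", "HE", "BY", "OR",
                    "ON", "DO", "IF", "ME", "MY", "UP", "AN", "GO", "NO", "US", "AM", "SO"]


def seed_key(ciphertext: str) -> dict:
    """Guess that the most frequent within-word cipher trigram stands for THE."""
    trigrams = Counter()
    for word in re.findall(r"[A-Z]+", ciphertext.upper()):
        for i in range(len(word) - 2):
            trigrams[word[i:i + 3]] += 1
    if not trigrams:
        return {}
    top, _ = trigrams.most_common(1)[0]
    if len(set(top)) < 3:
        return {}  # a pattern with a repeated letter cannot stand for THE
    return dict(zip(top, "THE"))


def two_letter_candidates(cipher_word: str, key: dict) -> list:
    """Two-letter words consistent with a partial key (cipher letter -> plain letter)."""
    used = set(key.values())
    matches = []
    for word in TWO_LETTER_WORDS:
        # Repeated cipher letters must match repeated plain letters, and vice versa.
        if (cipher_word[0] == cipher_word[1]) != (word[0] == word[1]):
            continue
        ok = True
        for c, p in zip(cipher_word, word):
            if c in key:
                ok = key[c] == p
            else:
                ok = p not in used  # each plain letter has only one cipher letter
            if not ok:
                break
        if ok:
            matches.append(word)
    return matches


key = seed_key("GSV XZG HZG LM GSV NZG")
print(key)                              # {'G': 'T', 'S': 'H', 'V': 'E'}
print(two_letter_candidates("LM", key))
```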
When Frequency Analysis Fails
Frequency analysis is not a universal cipher-breaking tool. Several types of encryption resist or defeat it entirely:
Polyalphabetic Ciphers
Ciphers like the Vigenere cipher use multiple substitution alphabets, cycling through them with each letter. This distributes each plaintext letter across several different ciphertext letters, flattening the frequency distribution and making it resemble random text. Breaking polyalphabetic ciphers requires first determining the key length (using the Kasiski examination or the Index of Coincidence), then applying frequency analysis separately to each sub-cipher.
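The two-stage attack can be sketched in Python. The helper names are hypothetical; we assume a standard Vigenere cipher over A to Z that skips non-letters, and we use the chi-squared score (a common choice, not the only one) to solve each column as a Caesar cipher.

```python
import string
from collections import Counter

ALPHABET = string.ascii_uppercase
# Reference English frequencies (%), A to Z, from the table on this page.
ENGLISH = [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
           0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
           6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074]


def letters_only(text: str) -> str:
    return "".join(c for c in text.upper() if c in ALPHABET)


def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Shift the i-th letter forward by the (i mod len(key))-th key letter."""
    letters = letters_only(plaintext)
    return "".join(
        ALPHABET[(ALPHABET.index(p) + ALPHABET.index(key[i % len(key)])) % 26]
        for i, p in enumerate(letters))


def index_of_coincidence(letters: str) -> float:
    """Chance that two letters picked without replacement are the same."""
    n = len(letters)
    if n < 2:
        return 0.0
    return sum(k * (k - 1) for k in Counter(letters).values()) / (n * (n - 1))


def average_ioc(ciphertext: str, period: int) -> float:
    """Split into `period` columns (every period-th letter) and average their IoC."""
    letters = letters_only(ciphertext)
    return sum(index_of_coincidence(letters[i::period]) for i in range(period)) / period


def best_shift(column: str) -> int:
    """Caesar shift whose removal makes the column look most like English."""
    counts, n = Counter(column), len(column)
    if n == 0:
        return 0

    def score(shift: int) -> float:
        # Under shift s, plaintext letter i shows up as ciphertext letter i + s.
        total = 0.0
        for i, pct in enumerate(ENGLISH):
            expected = n * pct / 100
            observed = counts[ALPHABET[(i + shift) % 26]]
            total += (observed - expected) ** 2 / expected
        return total

    return min(range(26), key=score)


def recover_key(ciphertext: str, period: int) -> str:
    """Treat each column as a Caesar cipher and solve it by frequency analysis."""
    letters = letters_only(ciphertext)
    return "".join(ALPHABET[best_shift(letters[i::period])] for i in range(period))
```

In practice, compute average_ioc for periods 1, 2, 3, and so on, and take the smallest period whose value jumps from near 0.038 (uniformly random letters, 1/26) toward about 0.066 (English); then call recover_key with that period. Short columns make both steps less reliable.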
Homophonic Substitution
A homophonic substitution cipher maps each plaintext letter to multiple possible ciphertext symbols, with more frequent letters having more alternatives. For instance, E might map to any of five different symbols, while Z maps to only one. This equalizes the ciphertext frequency distribution, defeating simple counting-based attacks. Breaking homophonic ciphers requires more sophisticated techniques including bigram frequency analysis and hill-climbing algorithms.
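To make the idea concrete, here is a hedged Python sketch of homophonic encryption. The symbol-allotment rule (one numeric symbol per rounded percentage point of English frequency, minimum one) and the helper names are our own assumptions, not a standard scheme.

```python
import random
import string

# Reference English frequencies (%), A to Z, from the table on this page.
ENGLISH = dict(zip(string.ascii_uppercase, [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]))


def make_key(rng: random.Random) -> dict:
    """Give each letter max(1, round(frequency %)) distinct numeric symbols.

    E gets 13 symbols, so its 12.7% is spread over symbols of about 1% each;
    rare letters such as Z still get one symbol and stay rare.
    """
    sizes = {letter: max(1, round(pct)) for letter, pct in ENGLISH.items()}
    symbols = list(range(sum(sizes.values())))
    rng.shuffle(symbols)
    key, start = {}, 0
    for letter, size in sizes.items():
        key[letter] = symbols[start:start + size]
        start += size
    return key


def encrypt(plaintext: str, key: dict, rng: random.Random) -> list:
    """Replace each letter with a randomly chosen homophone; drop non-letters."""
    return [rng.choice(key[c]) for c in plaintext.upper() if c in key]


def decrypt(symbols: list, key: dict) -> str:
    reverse = {s: letter for letter, group in key.items() for s in group}
    return "".join(reverse[s] for s in symbols)


rng = random.Random(7)  # fixed seed so the example is repeatable
key = make_key(rng)
ciphertext = encrypt("Meet me near the tree", key, rng)
print(ciphertext)
print(decrypt(ciphertext, key))  # MEETMENEARTHETREE
```

Counting symbols in such a ciphertext no longer singles out E, which is why attacks shift to bigram statistics and search methods such as hill climbing.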
Short Texts
With fewer than 100 characters, the natural statistical variation in letter usage can be larger than the signal you are trying to detect. A short text might happen to contain no letter E at all, even though E is the most common letter in English. In these cases, frequency analysis provides only weak evidence and must be supplemented by other techniques such as known-plaintext attacks or context-based guessing.
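A rough calculation shows how large this variation is. The sketch below models each letter of a text as an independent draw with English frequencies; that is a simplifying assumption (real text is not independent), and the helper names are hypothetical.

```python
import math

P_E = 0.12702  # frequency of E in English, from the reference table


def prob_letter_absent(p: float, n: int) -> float:
    """P(no occurrence) in n letters, modelling each letter as an independent draw."""
    return (1 - p) ** n


def count_mean_sd(p: float, n: int) -> tuple:
    """Mean and standard deviation of the letter's count under the same binomial model."""
    return n * p, math.sqrt(n * p * (1 - p))


for n in (10, 20, 100):
    mean, sd = count_mean_sd(P_E, n)
    print(f"n={n:3d}  P(no E)={prob_letter_absent(P_E, n):.2e}  E count={mean:.1f} +/- {sd:.1f}")
```

Under this model, roughly one 10-letter sample in four contains no E, and about 7% of 20-letter samples do. Even at 100 letters, one standard deviation is about 3.3 occurrences around an expected 12.7, roughly a quarter of the expected value.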
Null Ciphers and Steganography
Some concealment methods hide the message within apparently innocent text. Frequency analysis of that carrier text is useless, because the carrier is ordinary language with normal letter frequencies. Detecting these methods requires different analytical approaches entirely.
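One simple form of null cipher, assumed here for illustration, hides the message in the first letter of each word. A minimal Python reader for that variant (first_letters is a hypothetical name):

```python
def first_letters(carrier: str) -> str:
    """Read a first-letter null cipher: take the first alphabetic character of each word."""
    initials = []
    for word in carrier.split():
        letters = [c for c in word if c.isalpha()]
        if letters:
            initials.append(letters[0].upper())
    return "".join(initials)


print(first_letters("Mother enjoys evening tea, naturally, on Wednesdays."))  # MEETNOW
```

The carrier sentence would pass any letter-frequency test as ordinary English; only knowing the extraction rule reveals the message.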
Modern Encryption
Modern ciphers such as AES (used in a secure mode like GCM or CTR) and ChaCha20 produce ciphertext that is computationally indistinguishable from random data: byte values are effectively uniform, so frequency analysis reveals nothing about the plaintext. Public-key schemes such as RSA, used with randomized padding like OAEP, likewise leak nothing that letter or byte counting can exploit. Frequency analysis is strictly a tool for classical ciphers.
Related Tools
- Caesar Cipher Decoder — The Caesar cipher is one of the simplest substitution ciphers, vulnerable to frequency analysis because it merely shifts the entire frequency distribution by a fixed amount.
- Keyword Cipher — A monoalphabetic substitution cipher that rearranges the alphabet using a keyword. Frequency analysis is the primary method for breaking keyword ciphers.
- Homophonic Cipher — Designed specifically to defeat frequency analysis by mapping common letters to multiple ciphertext symbols, equalizing the output distribution.
- Cipher Identifier — Use the cipher identifier to determine what type of cipher was used to encrypt a message before choosing an analysis approach.
- Vigenere Cipher — A polyalphabetic cipher that resists simple frequency analysis. Breaking it requires first determining the key length using Kasiski examination or the Index of Coincidence, then applying frequency analysis to each sub-cipher individually.