How to Break the Vigenere Cipher: Kasiski Examination and Key Recovery

For over three centuries, the Vigenere cipher was known as "le chiffre indéchiffrable" — the indecipherable cipher. Its polyalphabetic substitution defeated every known attack of its era by concealing the letter frequency patterns that make monoalphabetic ciphers like the Caesar cipher trivially breakable. The same plaintext letter could encrypt to entirely different ciphertext letters depending on its position, and cryptanalysts had no way to exploit that variation.

Then, in the 1850s and 1860s, two breakthroughs shattered the myth of invincibility. Charles Babbage secretly broke the cipher around 1854, and Friedrich Kasiski independently published a general attack in 1863. Their insight was elegant: the repeating keyword creates periodic patterns in the ciphertext, and those patterns can be measured, analyzed, and exploited to recover the key without ever seeing the original plaintext.

This article walks through the complete cryptanalysis of the Vigenere cipher — from recognizing that you are dealing with polyalphabetic encryption, through determining the key length, to recovering each individual key letter. Whether you are solving a CTF challenge, studying for a cryptography course, or simply curious about how codebreaking works, these techniques will give you a systematic method for breaking any Vigenere-encrypted message.

Why the Vigenere Cipher Was Considered Unbreakable

To understand why breaking the Vigenere cipher required entirely new methods, it helps to see what makes it different from simpler substitution ciphers.

In a Caesar cipher, every letter shifts by the same amount. The letter E, which is the most common letter in English (appearing roughly 12.7% of the time), simply moves to a new position. A cryptanalyst can look at the frequency distribution of the ciphertext, find the peak, and immediately determine the shift. The entire key space is just 25 possible values.
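To make the 25-key search concrete, here is a minimal Python sketch that tries every Caesar shift. The helper names `caesar_shift` and `brute_force_caesar` are illustrative. The sketch assumes ASCII uppercase letters and passes everything else through unchanged.

```python
def caesar_shift(text, shift):
    """Shift each A-Z letter forward by `shift` (A=0 ... Z=25); keep other characters."""
    out = []
    for ch in text.upper():
        if 'A' <= ch <= 'Z':
            out.append(chr((ord(ch) - ord('A') + shift) % 26 + ord('A')))
        else:
            out.append(ch)  # spaces and punctuation are left as-is
    return ''.join(out)


def brute_force_caesar(ciphertext):
    """Map each nonzero shift to the text obtained by undoing that shift."""
    return {s: caesar_shift(ciphertext, -s) for s in range(1, 26)}


ciphertext = caesar_shift("ATTACK AT DAWN", 3)  # "DWWDFN DW GDZQ"
for s, candidate in brute_force_caesar(ciphertext).items():
    print(s, candidate)  # only the s = 3 line reads as English
```

In practice you would score each candidate with letter frequencies instead of reading all 25 by eye. Stage 2 of the Vigenere attack applies the same idea to each group.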

The Vigenere cipher eliminates this vulnerability by using a repeating keyword to vary the shift at each position. If the keyword is "LEMON" (length 5), then position 1 shifts by 11 (L), position 2 shifts by 4 (E), position 3 shifts by 12 (M), position 4 shifts by 14 (O), and position 5 shifts by 13 (N). Position 6 cycles back to a shift of 11, and the pattern repeats.

The result is that the letter E no longer encrypts to a single ciphertext letter. Under the key LEMON, E becomes P, I, Q, S, or R, depending on which keyword letter it aligns with. A different keyword could send it to any letter of the alphabet. The frequency distribution of the ciphertext flattens dramatically (no single letter dominates), and the straightforward frequency analysis that cracks Caesar in seconds becomes useless.
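A few lines of Python reproduce the LEMON example. Each key letter becomes a shift, and the same plaintext letter E lands on five different ciphertext letters. This is an illustrative sketch using A = 0 numbering, not a full cipher implementation.

```python
KEY = "LEMON"

# Each key letter sets a Caesar shift, numbering A=0, B=1, ..., Z=25.
shifts = [ord(k) - ord('A') for k in KEY]
print(shifts)  # [11, 4, 12, 14, 13]

# Encrypt the plaintext letter E once under each key position.
e_index = ord('E') - ord('A')
encrypted_e = [chr((e_index + s) % 26 + ord('A')) for s in shifts]
print(encrypted_e)  # ['P', 'I', 'Q', 'S', 'R']
```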

But "useless" is not the same as "impossible." The repeating keyword introduces a subtle structural weakness that Kasiski and Babbage independently discovered how to exploit.

Recognizing Vigenere Ciphertext

Before attempting to break a cipher, you need to identify whether you are dealing with Vigenere encryption. Several characteristics distinguish it from other cipher types:

Alphabetic characters with preserved word boundaries. Like Caesar, the Vigenere cipher traditionally encrypts only letters, leaving spaces and punctuation intact. Word lengths match the original plaintext.
A flattened but not uniform letter distribution. In Caesar ciphertext, the frequency distribution looks like English but shifted. In Vigenere ciphertext, the distribution is significantly flatter — no single letter dominates — but it is not completely uniform either. Completely uniform frequencies would suggest a one-time pad or random text.
An Index of Coincidence between 0.04 and 0.05. The IC of English text is approximately 0.0667, while truly random text has an IC of approximately 0.0385. Vigenere ciphertext typically falls between these values, and that intermediate range is a strong indicator of polyalphabetic substitution.
Repeated sequences at regular intervals. If you spot identical three-letter or longer sequences appearing at distances that share a common factor, the text is very likely Vigenere-encrypted. This observation is the foundation of Kasiski examination.
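The Index of Coincidence check above is easy to script. The following sketch computes the IC and attaches a rough label. The cutoffs 0.06 and 0.04 are assumptions chosen to sit between the values quoted here, not standard thresholds, and short texts can land in the wrong band.

```python
from collections import Counter


def index_of_coincidence(text):
    """Probability that two letters drawn without replacement match (A-Z only)."""
    letters = [c for c in text.upper() if 'A' <= c <= 'Z']
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def classify_by_ic(text, mono_cutoff=0.06, poly_cutoff=0.04):
    """Rough cipher-family guess; the cutoffs are illustrative assumptions."""
    ic = index_of_coincidence(text)
    if ic >= mono_cutoff:
        return "monoalphabetic (plaintext, Caesar, or simple substitution)"
    if ic >= poly_cutoff:
        return "likely polyalphabetic (e.g. Vigenere)"
    return "near uniform (very long key, one-time pad, random, or too short)"
```

A real identifier would combine this score with the other signs listed above, such as repeated sequences at regular spacings.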

If you want to automate this identification step, the Cipher Identifier tool uses these characteristics to detect Vigenere encryption and estimate the key length automatically.

The Two-Stage Attack Strategy

Breaking the Vigenere cipher requires a two-stage approach. This division is fundamental: the cipher's security depends on two separate secrets (the key length and the key letters themselves), and each must be attacked with different techniques.

Stage 1: Determine the key length using Kasiski examination and/or Index of Coincidence analysis.

Stage 2: Once the key length is known, split the ciphertext into groups and recover each key letter using frequency analysis.

The beauty of this approach is that it reduces the problem of breaking a polyalphabetic cipher into multiple instances of breaking simple Caesar ciphers — a problem that was already solved centuries earlier.

Stage 1: Finding the Key Length

Method A: Kasiski Examination

Published in 1863 by Friedrich Kasiski, a retired Prussian military officer, in his book Die Geheimschriften und die Dechiffrir-Kunst ("Secret Writing and the Art of Deciphering"), this method exploits a fundamental consequence of the repeating keyword.

The core insight is this: common plaintext sequences — words like "THE", "AND", "ING", "TION" — will occasionally align with the same position in the repeating keyword. When that happens, the same plaintext sequence encrypted with the same key letters produces identical ciphertext sequences. By finding these repeated ciphertext sequences and measuring the distances between them, you can determine the key length.
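The alignment effect can be checked directly. This sketch uses a hypothetical `vigenere_encrypt` helper that encrypts A-Z letters only. It places THE twice in a padded message, once at a distance that is a multiple of the key length 5 and once at a distance that is not.

```python
def vigenere_encrypt(plaintext, key):
    """Encrypt A-Z letters only; the key advances one position per letter."""
    letters = [c for c in plaintext.upper() if 'A' <= c <= 'Z']
    return ''.join(
        chr((ord(p) - ord('A') + ord(key[i % len(key)]) - ord('A')) % 26 + ord('A'))
        for i, p in enumerate(letters)
    )


key = "LEMON"  # key length 5

# THE at positions 0 and 10: distance 10 is a multiple of 5.
aligned = vigenere_encrypt("THE" + "X" * 7 + "THE", key)
print(aligned[0:3], aligned[10:13])  # ELQ ELQ

# THE at positions 0 and 7: distance 7 is not a multiple of 5.
misaligned = vigenere_encrypt("THE" + "X" * 4 + "THE", key)
print(misaligned[0:3], misaligned[7:10])  # ELQ FVR
```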

Worked Example of Kasiski Examination

Suppose you are analyzing a ciphertext and discover the following repeated trigram:

Occurrence	Sequence	Position	Distance to Next
1st	VGH	8	24
2nd	VGH	32	24
3rd	VGH	56	—

The distances between repetitions are 24 and 24. If these repeats are genuine rather than coincidental, the key length must be a factor of these distances. The keyword repeats every k letters, so identical plaintext sequences produce identical ciphertext sequences only when their positional difference is a multiple of k.

The factors of 24 are: 1, 2, 3, 4, 6, 8, 12, 24.

Now suppose you find additional repeated sequences:

Sequence	Positions	Distance	Factors
VGH	8, 32, 56	24, 24	1, 2, 3, 4, 6, 8, 12, 24
QMR	15, 33	18	1, 2, 3, 6, 9, 18
BWLX	42, 78	36	1, 2, 3, 4, 6, 9, 12, 18, 36

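The factors common to all three distances are 1, 2, 3, and 6, and the greatest common factor is 6. This strongly suggests the keyword is 6 characters long, although 2 and 3 remain possible and can be checked with the Index of Coincidence.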
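The factor bookkeeping in this example can be automated. Here is a minimal sketch using Python's standard `math.gcd`, with the distance list taken from the table above.

```python
from functools import reduce
from math import gcd


def factors(n):
    """All positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


distances = [24, 24, 18, 36]  # VGH (two gaps), QMR, BWLX

# Factors shared by every distance.
common = set(factors(distances[0]))
for d in distances[1:]:
    common &= set(factors(d))

print(sorted(common))          # [1, 2, 3, 6]
print(reduce(gcd, distances))  # 6
```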

When Kasiski Examination Works Best

The effectiveness of Kasiski analysis depends heavily on the amount of ciphertext available:

Fewer than 100 characters: Repeated sequences are rare and may not appear at all. Kasiski analysis is unreliable at this length.
100 to 200 characters: You may find a few repeated trigrams, enough for a tentative key length estimate but possibly not conclusive.
200+ characters: The method becomes highly reliable. Multiple repeated sequences will appear, and their distances will converge on the correct key length.
1,000+ characters: The analysis becomes nearly certain, with numerous repeated trigrams and even quadgrams providing overwhelming statistical evidence.

It is also important to note that not every repeated sequence in the ciphertext is meaningful. Some repetitions occur by coincidence — a plaintext sequence aligning with different key positions just happens to produce the same ciphertext. These false positives introduce noise into the analysis, which is why examining multiple repeated sequences and looking for the most common shared factor is essential.
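A single coincidental repeat can drag the greatest common divisor down to 1. A common refinement is therefore to count how many distances each candidate length divides, as the sketch below does. The distance list is hypothetical (four genuine repeats plus one coincidence at 25). Preferring the largest top-scoring candidate is a heuristic.

```python
from collections import Counter
from functools import reduce
from math import gcd


def factor_tally(distances, max_key_length=20):
    """Count how many distances each candidate key length 2..max divides."""
    tally = Counter()
    for d in distances:
        for k in range(2, max_key_length + 1):
            if d % k == 0:
                tally[k] += 1
    return tally


distances = [24, 18, 36, 30, 25]  # hypothetical; 25 is the coincidental repeat
tally = factor_tally(distances)

print(reduce(gcd, distances))  # 1: the strict common-factor rule breaks down

# 2 and 3 tie with 6 because they divide everything 6 divides, so take the
# largest candidate among the top scorers and confirm it with the IC test.
top = max(tally.values())
best = max(k for k, hits in tally.items() if hits == top)
print(best, top)  # 6 4
```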

Method B: Index of Coincidence (IC)

Developed by William Friedman in 1920, while he was working at Riverbank Laboratories and a decade before he became chief of the US Army's Signal Intelligence Service, the Index of Coincidence offers a purely statistical approach to determining key length. While Kasiski examination looks for visible patterns, the IC method measures the underlying statistical structure of the text.

The Index of Coincidence is the probability that two letters drawn at random (without replacement) from a text are the same. The formula is:

IC = Σ nᵢ(nᵢ - 1) / N(N - 1)

Where nᵢ is the count of the i-th letter (A through Z) and N is the total number of letters in the text.
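As a worked example of the formula, take the hypothetical five-letter text AABBC, where n_A = 2, n_B = 2, n_C = 1, and N = 5:

```latex
\mathrm{IC} = \frac{n_A(n_A - 1) + n_B(n_B - 1) + n_C(n_C - 1)}{N(N - 1)}
            = \frac{2 \cdot 1 + 2 \cdot 1 + 1 \cdot 0}{5 \cdot 4}
            = \frac{4}{20} = 0.2
```

Letters that appear once or not at all contribute zero to the sum. Tiny samples like this one give extreme values, which is one reason IC estimates need a reasonable amount of text.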

Expected IC Values

Different types of text produce characteristic IC values:

Text Type	Expected IC
English plaintext	~0.0667
German plaintext	~0.0762
French plaintext	~0.0778
Random (uniform distribution)	~0.0385
Vigenere ciphertext (short key)	0.04 – 0.05
Vigenere ciphertext (long key)	~0.038

The key insight is that a Caesar cipher preserves the IC of the original language (because it merely relabels letters without changing their frequencies), while Vigenere encryption reduces the IC toward the random baseline.
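The contrast is easy to observe. The sketch below uses hypothetical helpers `ic` and `encrypt` and an arbitrary sample sentence. It encrypts the same text with a one-letter key, which is a Caesar shift, and with LEMON. The Caesar IC matches the plaintext IC exactly because the letter counts are merely relabeled.

```python
from collections import Counter


def ic(text):
    """Index of Coincidence over A-Z letters."""
    letters = [c for c in text if 'A' <= c <= 'Z']
    n = len(letters)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in Counter(letters).values()) / (n * (n - 1))


def encrypt(text, key):
    """Vigenere over A-Z letters; a one-letter key is simply a Caesar shift."""
    letters = [c for c in text.upper() if 'A' <= c <= 'Z']
    return ''.join(chr((ord(p) + ord(key[i % len(key)]) - 2 * ord('A')) % 26 + ord('A'))
                   for i, p in enumerate(letters))


sample = ("FOUR SCORE AND SEVEN YEARS AGO OUR FATHERS BROUGHT FORTH ON THIS "
          "CONTINENT A NEW NATION CONCEIVED IN LIBERTY")
plain = encrypt(sample, "A")     # key A = shift 0, so this only strips spaces
caesar = encrypt(sample, "L")    # every letter shifted by 11
vigenere = encrypt(sample, "LEMON")

print(f"plaintext: {ic(plain):.4f}")
print(f"caesar:    {ic(caesar):.4f}")    # same value as the plaintext
print(f"vigenere:  {ic(vigenere):.4f}")  # usually lower
```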

Using IC to Find the Key Length

The procedure is systematic:

For each candidate key length k (starting from 1), split the ciphertext into k subsequences, where subsequence j contains letters j, j+k, j+2k, and so on.
Compute the IC of each subsequence.
Average the IC values across all k subsequences.
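The smallest candidate whose average IC approaches 0.0667 (the English value) is the likely key length. Multiples of the true length also score high, because their subsequences are single-shift groups too.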

The reasoning is straightforward: when you guess the correct key length, each subsequence consists of letters that were all encrypted with the same Caesar shift. Those letters retain the frequency distribution of English (just shifted), so their IC will match English. When you guess the wrong key length, the subsequences mix letters encrypted with different shifts, producing a flatter distribution and a lower IC.
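The procedure above can be sketched in Python as follows. The function names are illustrative. Instead of a fixed 0.0667 target, the sketch swaps in a relative rule: take the smallest k scoring within 10% of the best average IC. That rule is an assumption, chosen to sidestep the high scores of multiples of the true length.

```python
from collections import Counter


def ic(seq):
    """Index of Coincidence of a string of uppercase letters."""
    n = len(seq)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in Counter(seq).values()) / (n * (n - 1))


def average_ic(text, k):
    """Mean IC of the k subsequences text[0::k], text[1::k], ..., text[k-1::k]."""
    return sum(ic(text[j::k]) for j in range(k)) / k


def pick_key_length(ciphertext, max_k=20, ratio=0.9):
    """Smallest k whose average IC is within `ratio` of the best (heuristic)."""
    text = ''.join(c for c in ciphertext.upper() if 'A' <= c <= 'Z')
    limit = min(max_k, len(text) // 2)
    if limit < 1:
        return None
    scores = {k: average_ic(text, k) for k in range(1, limit + 1)}
    best = max(scores.values())
    return min(k for k, s in scores.items() if s >= ratio * best)


print(pick_key_length("LXFOPVEFRNHR"))  # 5 (toy input; real use needs far more text)
```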

For example, testing key lengths on a ciphertext might produce:

Candidate Key Length	Average IC
1	0.0421
2	0.0398
3	0.0412
4	0.0405
5	0.0638
6	0.0423
7	0.0401

The sharp spike at key length 5 (IC = 0.0638, close to the English value of 0.0667) reveals the correct key length. All other candidates produce IC values near the random baseline of 0.0385.

Combining Kasiski and IC for Confidence

In practice, experienced cryptanalysts use both methods together. Kasiski examination provides candidate key lengths based on repeated sequences, and the IC test confirms which candidate is correct. If Kasiski suggests key lengths of 4, 6, or 12 (as common factors of observed distances), computing the IC for each candidate will identify the right one. This combined approach is what our Vigenere cipher decoder implements automatically.
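One way to wire the two methods together is to let Kasiski propose candidates and rank them by average IC. In this sketch the candidate list is hypothetical, and the toy ciphertext is far too short for reliable statistics. It only shows the mechanics.

```python
from collections import Counter


def average_ic(text, k):
    """Mean IC of the k interleaved subsequences text[j::k]."""
    def ic(seq):
        n = len(seq)
        if n < 2:
            return 0.0
        return sum(c * (c - 1) for c in Counter(seq).values()) / (n * (n - 1))
    return sum(ic(text[j::k]) for j in range(k)) / k


def rank_candidates(ciphertext, candidates):
    """Order Kasiski-derived key lengths by average IC, highest first."""
    text = ''.join(c for c in ciphertext.upper() if 'A' <= c <= 'Z')
    return sorted(candidates, key=lambda k: average_ic(text, k), reverse=True)


print(rank_candidates("LXFOPVEFRNHR", [2, 3, 5]))  # [5, 3, 2]
```

If a candidate and one of its multiples (say 6 and 12) both rank near the top, the smaller one is usually the true key length.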

Stage 2: Recovering Each Key Letter

Once the key length k is determined, the hardest part is over. The ciphertext is divided into k groups, each encrypted with a single Caesar shift:

Group 1: Letters at positions 1, k+1, 2k+1, 3k+1, ...
Group 2: Letters at positions 2, k+2, 2k+2, 3k+2, ...
Group k: Letters at positions k, 2k, 3k, 4k, ...
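In Python, slicing with a step does the grouping directly. Python counts from 0, so the slice starting at index j is Group j + 1 above. The example uses the 12-letter ciphertext of ATTACKATDAWN under LEMON.

```python
ciphertext = "LXFOPVEFRNHR"  # ATTACKATDAWN encrypted with LEMON
k = 5

# Group j+1 holds every k-th letter starting at index j.
groups = [ciphertext[j::k] for j in range(k)]
print(groups)  # ['LVH', 'XER', 'FF', 'OR', 'PN']

# Every letter in group 1 was shifted by L (11), so undoing 11 gives plaintext.
group1_plain = ''.join(chr((ord(c) - ord('A') - 11) % 26 + ord('A')) for c in groups[0])
print(group1_plain)  # AKW: plaintext letters 1, 6, and 11
```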

Each group is a simple Caesar cipher, and breaking Caesar ciphers is elementary. Two approaches work well:

Frequency Analysis

For each group, count the frequency of every letter and compare it against the expected English letter frequency distribution. The most common letter in the group is likely to be the encryption of E (the most common English letter). If the most frequent letter in group 1 is P, then the shift for that group is P - E = 11, meaning the first key letter is L (index 11, counting A as 0).
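The most-frequent-letter rule amounts to one subtraction modulo 26. Here is a minimal sketch with a made-up group in which P dominates:

```python
from collections import Counter


def shift_from_peak(group, assumed_plain='E'):
    """Guess a group's shift, assuming its most common letter encrypts `assumed_plain`."""
    top_letter, _ = Counter(group).most_common(1)[0]
    return (ord(top_letter) - ord(assumed_plain)) % 26


shift = shift_from_peak("PPXPAPQP")  # hypothetical group; P appears most often
print(shift, chr(shift + ord('A')))  # 11 L
```

The modulo keeps the result in range when the peak letter comes before E in the alphabet: a peak at A gives a shift of 22, not -4.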

This approach works well with large groups (100+ letters per group), but can be unreliable with smaller samples where statistical fluctuations distort the distribution.

Chi-Squared Testing

A more robust approach is to compute the chi-squared statistic for each possible shift value (0 through 25) and select the shift that produces the lowest chi-squared value — indicating the closest match to the expected English distribution.

For a given shift s applied to a group, the chi-squared statistic is:

χ² = Σ (Oᵢ - Eᵢ)² / Eᵢ

Where Oᵢ is the observed count of the i-th letter after decrypting with shift s, and Eᵢ is the expected count based on English letter frequencies.

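The shift that minimizes χ² is usually the correct one. Because it compares the whole distribution rather than a single peak, chi-squared stays reliable on much smaller groups than the most-frequent-letter method, although very short groups can still mislead it. Testing all 26 shifts for each group and selecting the minimum is both fast and accurate.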
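A compact version of the chi-squared search for one group might look like the following. The helper names are illustrative, and the frequency table holds approximate English letter frequencies for A through Z.

```python
from collections import Counter

# Approximate English letter frequencies for A..Z.
ENGLISH = [0.0817, 0.0150, 0.0278, 0.0425, 0.1270, 0.0223, 0.0202, 0.0609,
           0.0697, 0.0015, 0.0077, 0.0403, 0.0241, 0.0675, 0.0751, 0.0193,
           0.0010, 0.0599, 0.0633, 0.0906, 0.0276, 0.0098, 0.0236, 0.0015,
           0.0197, 0.0007]


def chi_squared(group, shift):
    """Chi-squared of the group after undoing `shift`, versus English counts."""
    n = len(group)
    observed = Counter((ord(c) - ord('A') - shift) % 26 for c in group)
    return sum((observed[i] - ENGLISH[i] * n) ** 2 / (ENGLISH[i] * n)
               for i in range(26))


def best_shift(group):
    """The shift in 0..25 with the smallest chi-squared value."""
    return min(range(26), key=lambda s: chi_squared(group, s))


print(best_shift("P" * 20))  # 11: decrypting to all E's fits English best
```

The all-P group is a degenerate case chosen so the answer is easy to verify by hand. On real groups of a few dozen letters, the minimum is usually, though not always, the correct shift.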

Putting It All Together

Suppose Kasiski and IC analysis determined a key length of 5. You split the ciphertext into 5 groups and run chi-squared analysis on each:

Group	Best Shift	Key Letter
1	11	L
2	4	E
3	12	M
4	14	O
5	13	N

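The recovered keyword is LEMON. You can now decrypt the entire message using the standard Vigenere decryption formula. In this formula, letters are numbered A = 0 through Z = 25, and Kᵢ is the key letter aligned with ciphertext letter Cᵢ: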

Pᵢ = (Cᵢ - Kᵢ + 26) mod 26
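The decryption formula translates directly to code. This sketch assumes the key advances only on letters and passes spaces and punctuation through unchanged. The function name `vigenere_decrypt` is illustrative.

```python
def vigenere_decrypt(ciphertext, key):
    """Decrypt A-Z letters with P = (C - K + 26) mod 26; other characters pass through."""
    out, j = [], 0
    for c in ciphertext.upper():
        if 'A' <= c <= 'Z':
            k = ord(key[j % len(key)].upper()) - ord('A')
            out.append(chr((ord(c) - ord('A') - k + 26) % 26 + ord('A')))
            j += 1  # the key advances only on letters (an assumption)
        else:
            out.append(c)
    return ''.join(out)


print(vigenere_decrypt("LXFOPVEFRNHR", "LEMON"))  # ATTACKATDAWN
```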

To verify the result, decrypt the first few words and check whether they form coherent English. If they do, the key is confirmed.

Key Length vs. Security

The security of the Vigenere cipher depends almost entirely on the length of the keyword relative to the plaintext. Understanding this relationship explains both why the cipher was effective for centuries and why it ultimately failed:

Key Length	Effective Security	Vulnerability
1 letter	Equivalent to Caesar cipher	Trivially broken by brute force (25 keys)
3-5 letters	Low	Easily broken with 100+ characters of ciphertext
6-12 letters	Moderate (for the era)	Breakable with Kasiski + IC analysis
20+ letters	High (for the era)	Requires very long ciphertext to attack
= plaintext length (random)	Theoretically unbreakable	This is the one-time pad (Vernam cipher)

The one-time pad — where the key is truly random, at least as long as the message, and never reused — is the only provably unbreakable encryption scheme in existence. It is the logical endpoint of the Vigenere principle: use a key so long that it never repeats, eliminating the periodic patterns that Kasiski and IC exploit.
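The sketch below shows the one-time pad as a Vigenere cipher whose key never repeats. It draws one random key letter per message letter with Python's `secrets` module. It is a toy for illustration only: the pad must be truly random, kept secret, and never reused.

```python
import secrets
import string


def combine(text, key, sign):
    """Add (sign=+1) or subtract (sign=-1) key letters from A-Z text, letter by letter."""
    if len(key) < len(text):
        raise ValueError("a one-time pad must be at least as long as the message")
    return ''.join(chr((ord(t) - ord('A') + sign * (ord(k) - ord('A'))) % 26 + ord('A'))
                   for t, k in zip(text, key))


message = "ATTACKATDAWN"
# One fresh random key letter per message letter, so there is no period to find.
pad = ''.join(secrets.choice(string.ascii_uppercase) for _ in message)

ciphertext = combine(message, pad, +1)
recovered = combine(ciphertext, pad, -1)
print(ciphertext)            # different on every run
print(recovered == message)  # True
```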

Security Assessment: Why Modern Standards Require More

The Vigenere cipher is not secure by modern standards, and its weaknesses go beyond simply having a short key:

Weakness	Detail
Repeating key	The keyword repeats, creating periodic patterns exploitable by Kasiski analysis
Key length is the bottleneck	Security grows with key length; short keys are easily recovered
Vulnerable to IC analysis	Statistical methods can determine key length from ciphertext alone
No diffusion	Each ciphertext letter depends only on one plaintext letter and one key letter
Deterministic per position	The same plaintext letter at positions with the same key letter always produces the same ciphertext
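The lack of diffusion is easy to demonstrate: change one plaintext letter and exactly one ciphertext letter changes. Here is a minimal sketch with a hypothetical `encrypt` helper over uppercase letters:

```python
def encrypt(plaintext, key):
    """Vigenere encryption of an uppercase A-Z string."""
    return ''.join(chr((ord(p) + ord(key[i % len(key)]) - 2 * ord('A')) % 26 + ord('A'))
                   for i, p in enumerate(plaintext))


a = encrypt("ATTACKATDAWN", "LEMON")
b = encrypt("ATTACHATDAWN", "LEMON")  # only the sixth letter differs (K -> H)

changed = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
print(a, b)     # LXFOPVEFRNHR LXFOPSEFRNHR
print(changed)  # [5]: a single ciphertext position changes
```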

Modern encryption algorithms like AES use 128-, 192-, or 256-bit keys, multiple rounds of transformation, and both confusion and diffusion. As a result, changing a single input bit changes roughly half of the output bits in an unpredictable pattern. The Vigenere cipher achieves none of these properties.

Historical Codebreaking: Babbage and the Crimean War

Charles Babbage (1791-1871), the English mathematician famous for designing the first mechanical computer, secretly broke the Vigenere cipher around 1854 — nearly a decade before Kasiski's published attack. Working in his private study, Babbage discovered that repeated sequences in the ciphertext reveal the key length, essentially the same insight Kasiski later published independently.

But Babbage never published his results. Some historians suggest this was deliberate. His break coincided with the Crimean War (1853-1856), and British intelligence may have wanted to keep the technique secret so it could read encrypted Russian communications. Babbage's breakthrough came to light only in the 20th century, through study of his unpublished notes.

This episode illustrates a pattern that recurs throughout cryptographic history: the gap between secret government cryptanalysis and public academic knowledge can span decades. The same dynamic played out with public-key cryptography. Researchers at the British GCHQ devised it in the early 1970s, years before it was independently invented and published in the open literature.

Real-World Failure: The Vigenere Cipher in the American Civil War

The American Civil War (1861-1865) provides a vivid case study of how the Vigenere cipher fails in practice, even when the underlying mathematics is sound.

The Confederate States relied on a variant of the Vigenere cipher for field communications. Confederate officers used a brass cipher disk — two concentric alphabet wheels that could be rotated to set different shift values — along with a pre-agreed keyword that changed periodically. The most commonly used Confederate keyword was reportedly "MANCHESTER BLUFF", though keys like "COMPLETE VICTORY" were also employed.

However, the Confederate implementation suffered from critical weaknesses:

Short, predictable keywords. Often patriotic phrases that Union cryptanalysts could guess or partially reconstruct.
Repeated keys. The same keyword was reused across many messages, providing ample ciphertext for Kasiski-style analysis.
Poor operational security. Keywords were sometimes transmitted in the clear or recorded in captured documents.

The Union's cipher bureau, led by cryptanalysts including Albert J. Myer and the "Sacred Three" telegraph operators (David Homer Bates, Albert Chandler, and Charles Tinker), broke Confederate Vigenere messages with regularity. Captain Campbell Brown of the Confederate Army later wrote that the cipher system "was so poorly managed that Federal authorities probably deciphered our dispatches with regularity throughout the war."

Their success demonstrated a principle that remains fundamental in modern cryptography: no encryption algorithm, no matter how mathematically sound, can protect messages if the keys are predictable, reused, or carelessly handled. Key management is at least as important as the cipher itself.

The True History: Bellaso vs. Vigenere

The cipher universally known as "Vigenere" was not invented by Blaise de Vigenere. The true inventor was Giovan Battista Bellaso, an Italian cryptographer who published the polyalphabetic substitution method using a repeating keyword in his 1553 work La Cifra del Sig. Giovan Battista Bellaso.

Bellaso's innovation was elegant: rather than using a fixed shift (as Caesar did) or a progressively changing shift (as Trithemius proposed in 1508), he introduced a secret keyword that both sender and receiver agreed upon in advance, with the keyword repeating across the message.

Blaise de Vigenere (1523-1596), a French diplomat and cryptographer, published a different and arguably more sophisticated cipher in 1586 in his Traicté des Chiffres. Vigenere's actual cipher was an autokey system — the key starts with a priming letter and then uses the plaintext itself to generate subsequent key letters, making the effective key non-repeating. This is a fundamentally stronger system than the repeating-keyword cipher that bears his name. You can explore Vigenere's actual invention on our Autokey Cipher page.

The misattribution occurred in the 19th century when historians conflated the two systems. By the time the error was recognized, "Vigenere cipher" had become the standard name for Bellaso's repeating-keyword method, and the usage persists to this day.

Implementing Kasiski Analysis in Python

A practical Python implementation of the cryptanalysis techniques discussed above helps solidify the concepts. The following code demonstrates both Kasiski examination and IC-based key length detection:


from collections import Counter

def find_repeated_sequences(ciphertext, min_length=3):
    """Find repeated sequences and their distances (Kasiski examination)."""
    sequences = {}
    text = ''.join(c for c in ciphertext.upper() if c.isalpha())

    for length in range(min_length, min(6, len(text) // 2)):
        for i in range(len(text) - length + 1):
            seq = text[i:i + length]
            if seq not in sequences:
                sequences[seq] = []
            sequences[seq].append(i)

    # Keep only sequences that appear more than once
    repeated = {seq: positions for seq, positions in sequences.items()
                if len(positions) > 1}

    # Calculate distances
    distances = []
    for seq, positions in repeated.items():
        for i in range(len(positions) - 1):
            distances.append(positions[i + 1] - positions[i])

    return repeated, distances

def compute_ic(text):
    """Compute the Index of Coincidence for a text."""
    counts = Counter(c for c in text.upper() if c.isalpha())
    n = sum(counts.values())
    if n <= 1:
        return 0
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def find_key_length_ic(ciphertext, max_key_length=20):
    """Use IC analysis to find the most likely key length."""
    text = ''.join(c for c in ciphertext.upper() if c.isalpha())
    results = {}

    for key_len in range(1, min(max_key_length + 1, len(text) // 2)):
        groups = ['' for _ in range(key_len)]
        for i, char in enumerate(text):
            groups[i % key_len] += char

        avg_ic = sum(compute_ic(g) for g in groups) / key_len
        results[key_len] = avg_ic

    return results

def recover_key(ciphertext, key_length):
    """Recover key letters using chi-squared analysis."""
    text = ''.join(c for c in ciphertext.upper() if c.isalpha())
    english_freq = [0.0817, 0.0150, 0.0278, 0.0425, 0.1270,
                    0.0223, 0.0202, 0.0609, 0.0697, 0.0015,
                    0.0077, 0.0403, 0.0241, 0.0675, 0.0751,
                    0.0193, 0.0010, 0.0599, 0.0633, 0.0906,
                    0.0276, 0.0098, 0.0236, 0.0015, 0.0197, 0.0007]

    key = []
    for group_idx in range(key_length):
        group = text[group_idx::key_length]
        n = len(group)
        best_shift = 0
        best_chi2 = float('inf')

        for shift in range(26):
            chi2 = 0
            for i in range(26):
                observed = sum(1 for c in group
                              if (ord(c) - ord('A') - shift) % 26 == i)
                expected = english_freq[i] * n
                if expected > 0:
                    chi2 += (observed - expected) ** 2 / expected

            if chi2 < best_chi2:
                best_chi2 = chi2
                best_shift = shift

        key.append(chr(best_shift + ord('A')))

    return ''.join(key)

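# Full cryptanalysis pipeline
# Caution: this 12-letter example (ATTACKATDAWN under the key LEMON) is far
# shorter than the 100+ characters the statistics need, so it only shows the
# mechanics; expect the recovered key to differ from LEMON.
ciphertext = "LXFOPVEFRNHR"

# Step 1a: Kasiski examination (this short sample has no repeated sequences)
repeated, distances = find_repeated_sequences(ciphertext)
print(f"Kasiski distances: {distances}")

# Step 1b: Find key length using IC
ic_results = find_key_length_ic(ciphertext)

# Multiples of the true key length also score high, so take the smallest
# length whose IC is within 10% of the best score (a heuristic cutoff).
best_ic = max(ic_results.values())
best_length = min(k for k, ic in ic_results.items() if ic >= 0.9 * best_ic)

print("IC Analysis Results:")
for length, ic in sorted(ic_results.items()):
    marker = " <-- likely" if length == best_length else ""
    print(f"  Key length {length}: IC = {ic:.4f}{marker}")

# Step 2: Recover the key
key = recover_key(ciphertext, best_length)
print(f"\nRecovered key: {key}")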

This implementation follows the exact two-stage approach described above: first determine the key length through IC analysis, then recover each key letter through chi-squared testing against expected English letter frequencies. For a complete encryption/decryption implementation, visit the Vigenere Cipher Examples & Code page.

The Vigenere cipher belongs to a family of polyalphabetic systems, and understanding its relatives helps deepen your knowledge of cryptanalysis:

Beaufort Cipher — A reciprocal variant that subtracts the plaintext letter from the key letter (C = K - P mod 26) instead of adding them. The same operation performs both encryption and decryption, but it is vulnerable to the same Kasiski and IC attacks.
Autokey Cipher — The cipher Vigenere actually invented. By using the plaintext itself as part of the key, it eliminates the repeating-key weakness, making Kasiski examination ineffective. However, it has its own vulnerabilities based on known-plaintext attacks.
Caesar Cipher — The monoalphabetic predecessor. Understanding Caesar cryptanalysis is essential because Stage 2 of the Vigenere attack reduces to solving multiple Caesar ciphers independently.
Vigenere Table — The 26x26 tabula recta used for manual Vigenere encryption. Understanding this table helps visualize why different key letters produce different substitution alphabets.
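The Beaufort reciprocity mentioned above can be checked in a few lines. Beaufort encryption computes C = (K - P) mod 26, so applying the same operation to the ciphertext gives K - (K - P) = P. Here is a sketch over uppercase letters, with an illustrative function name:

```python
def beaufort(text, key):
    """Beaufort: each output letter is (key letter - input letter) mod 26."""
    return ''.join(chr((ord(key[i % len(key)]) - ord(c)) % 26 + ord('A'))
                   for i, c in enumerate(text))


ciphertext = beaufort("ATTACKATDAWN", "LEMON")
print(ciphertext)
print(beaufort(ciphertext, "LEMON"))  # ATTACKATDAWN: the same function decrypts
```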

The evolution from Caesar (single shift) to Vigenere (repeating keyword) to autokey (non-repeating key) to the one-time pad (truly random key) represents one of the most important conceptual arcs in cryptographic history. Each step addresses a specific weakness of the previous design, and the attacks developed against each cipher drove the invention of its successor.

Try It Yourself

The best way to understand Vigenere cryptanalysis is to practice it. Try our free Vigenere Cipher decoder with automatic Kasiski examination and key recovery. Paste any Vigenere-encrypted ciphertext, and the tool will estimate the key length, recover the keyword, and decrypt the message — all using the techniques described in this article.