
Breaking the Vigenère Cipher: Kasiski Examination & Index of Coincidence

Learn how to break the Vigenère cipher without the key using Kasiski examination and Index of Coincidence. Includes history, Civil War examples, and Python code.

Published March 18, 2026
15 minute read
Cryptography Guide

For over three centuries, the Vigenère cipher was known as "le chiffre indéchiffrable" — the indecipherable cipher. Its polyalphabetic substitution defeated every known attack of its era by concealing the letter frequency patterns that make monoalphabetic ciphers like the Caesar cipher trivially breakable. The same plaintext letter could encrypt to entirely different ciphertext letters depending on its position, and cryptanalysts had no way to exploit that variation.

Then, in the 1850s and 1860s, two breakthroughs shattered the myth of invincibility. Charles Babbage secretly broke the cipher around 1854, and Friedrich Kasiski independently published a general attack in 1863. Their insight was elegant: the repeating keyword creates periodic patterns in the ciphertext, and those patterns can be measured, analyzed, and exploited to recover the key without ever seeing the original plaintext.

This article walks through the complete cryptanalysis of the Vigenère cipher — from recognizing that you are dealing with polyalphabetic encryption, through determining the key length, to recovering each individual key letter. Whether you are solving a CTF challenge, studying for a cryptography course, or simply curious about how codebreaking works, these techniques will give you a systematic method for breaking any Vigenère-encrypted message.

Why the Vigenère Cipher Was Considered Unbreakable

To understand why breaking the Vigenère cipher required entirely new methods, it helps to see what makes it different from simpler substitution ciphers.

In a Caesar cipher, every letter shifts by the same amount. The letter E, which is the most common letter in English (appearing roughly 12.7% of the time), simply moves to a new position. A cryptanalyst can look at the frequency distribution of the ciphertext, find the peak, and immediately determine the shift. The entire key space is just 25 possible values.

The Vigenère cipher eliminates this vulnerability by using a repeating keyword to vary the shift at each position. If the keyword is "LEMON" (length 5), then position 1 shifts by 11 (L), position 2 shifts by 4 (E), position 3 shifts by 12 (M), position 4 shifts by 14 (O), and position 5 shifts by 13 (N). Position 6 cycles back to a shift of 11, and the pattern repeats.

The result is that the letter E no longer encrypts to a single ciphertext letter. Depending on which keyword letter it aligns with, E becomes P, I, Q, S, or R under the keyword LEMON, and a different keyword would produce a different set. The frequency distribution of the ciphertext flattens dramatically, with no single letter dominating, and the straightforward frequency analysis that cracks Caesar in seconds becomes useless.
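The position-dependent shifting can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `vigenere_encrypt` is an assumption, A is treated as a shift of 0, and non-letters pass through without consuming a key letter.

```python
def vigenere_encrypt(plaintext: str, keyword: str) -> str:
    """Encrypt ASCII letters with a repeating keyword; leave other characters as-is."""
    shifts = [ord(ch) - ord("A") for ch in keyword.upper()]
    out = []
    i = 0  # counts letters only, so spaces and punctuation do not use up key letters
    for ch in plaintext:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            shift = shifts[i % len(shifts)]
            out.append(chr((ord(ch) - base + shift) % 26 + base))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


# Five consecutive E's under LEMON land on five different ciphertext letters.
print(vigenere_encrypt("EEEEE", "LEMON"))         # PIQSR
print(vigenere_encrypt("ATTACKATDAWN", "LEMON"))  # LXFOPVEFRNHR
```

Counting only letters when advancing the key is one common convention; some implementations advance the key on every character instead, which changes the ciphertext.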

But "useless" is not the same as "impossible." The repeating keyword introduces a subtle structural weakness that Kasiski and Babbage independently discovered how to exploit.

Recognizing Vigenère Ciphertext

Before attempting to break a cipher, you need to identify whether you are dealing with Vigenère encryption. Several characteristics distinguish it from other cipher types:

  • Alphabetic characters with preserved word boundaries. Like Caesar, the Vigenère cipher traditionally encrypts only letters, leaving spaces and punctuation intact. Word lengths match the original plaintext.
  • A flattened but not uniform letter distribution. In Caesar ciphertext, the frequency distribution looks like English but shifted. In Vigenère ciphertext, the distribution is significantly flatter — no single letter dominates — but it is not completely uniform either. Completely uniform frequencies would suggest a one-time pad or random text.
  • An Index of Coincidence between 0.04 and 0.05. The IC of English text is approximately 0.0667, while truly random text has an IC of approximately 0.0385. Vigenère ciphertext typically falls between these values, and that intermediate range is a strong indicator of polyalphabetic substitution.
  • Repeated sequences at regular intervals. If you spot identical three-letter or longer sequences appearing at distances that share a common factor, the text is very likely Vigenère-encrypted. This observation is the foundation of Kasiski examination.

If you want to automate this identification step, the Cipher Identifier tool uses these characteristics to detect Vigenère encryption and estimate the key length automatically.

The Two-Stage Attack Strategy

Breaking the Vigenère cipher requires a two-stage approach. This division is fundamental: the cipher's security depends on two separate secrets (the key length and the key letters themselves), and each must be attacked with different techniques.

Stage 1: Determine the key length using Kasiski examination and/or Index of Coincidence analysis.

Stage 2: Once the key length is known, split the ciphertext into groups and recover each key letter using frequency analysis.

The beauty of this approach is that it reduces the problem of breaking a polyalphabetic cipher into multiple instances of breaking simple Caesar ciphers — a problem that was already solved centuries earlier.

Stage 1: Finding the Key Length

Method A: Kasiski Examination

Published in 1863 by Friedrich Kasiski, a retired Prussian military officer, in his book Die Geheimschriften und die Dechiffrir-Kunst ("Secret Writing and the Art of Deciphering"), this method exploits a fundamental consequence of the repeating keyword.

The core insight is this: common plaintext sequences — words like "THE", "AND", "ING", "TION" — will occasionally align with the same position in the repeating keyword. When that happens, the same plaintext sequence encrypted with the same key letters produces identical ciphertext sequences. By finding these repeated ciphertext sequences and measuring the distances between them, you can determine the key length.
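A search for repeated sequences might look like the sketch below. The helper names `repeated_sequences` and `distances` are illustrative assumptions; positions are 0-based here, which does not affect the distances between occurrences.

```python
from collections import defaultdict


def repeated_sequences(ciphertext: str, length: int = 3) -> dict[str, list[int]]:
    """Map every n-gram that occurs more than once to its 0-based positions."""
    letters = "".join(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    positions = defaultdict(list)
    for i in range(len(letters) - length + 1):
        positions[letters[i:i + length]].append(i)
    return {seq: pos for seq, pos in positions.items() if len(pos) > 1}


def distances(positions: list[int]) -> list[int]:
    """Distances between consecutive occurrences of one repeated sequence."""
    return [b - a for a, b in zip(positions, positions[1:])]


repeats = repeated_sequences("ABCXXXXABCYYABC")
print(repeats["ABC"])             # [0, 7, 12]
print(distances(repeats["ABC"]))  # [7, 5]
```

Running the search for lengths 3, 4, and 5 and pooling the distances gives more evidence than trigrams alone, since longer repeats are less likely to be coincidental.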

Worked Example of Kasiski Examination

Suppose you are analyzing a ciphertext and discover the following repeated trigram:

| Occurrence | Sequence | Position | Distance to Next |
|---|---|---|---|
| 1st | VGH | 8 | 24 |
| 2nd | VGH | 32 | 24 |
| 3rd | VGH | 56 | |

The distances between repetitions are 24 and 24. The key length must be a factor of these distances, because the keyword repeats every k letters, so two identical plaintext sequences will only produce identical ciphertext sequences when their positional difference is a multiple of k.

The factors of 24 are: 1, 2, 3, 4, 6, 8, 12, 24.

Now suppose you find additional repeated sequences:

| Sequence | Positions | Distance | Factors |
|---|---|---|---|
| VGH | 8, 32, 56 | 24, 24 | 1, 2, 3, 4, 6, 8, 12, 24 |
| QMR | 15, 33 | 18 | 1, 2, 3, 6, 9, 18 |
| BWLX | 42, 78 | 36 | 1, 2, 3, 4, 6, 9, 12, 18, 36 |

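The factors shared by all three sets of distances are 1, 2, 3, and 6, and the greatest of them is 6. This suggests the keyword is most likely 6 characters long, although lengths 2 and 3 are also consistent with these distances and should be ruled out before moving on.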
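The arithmetic in this worked example can be checked with the standard library. The sketch below uses the distances from the table above; the helper name `factors` is an assumption.

```python
from functools import reduce
from math import gcd


def factors(n: int) -> list[int]:
    """All positive divisors of n, in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


observed = [24, 24, 18, 36]      # distances from the worked example
greatest = reduce(gcd, observed)  # greatest common divisor of all distances
shared = factors(greatest)        # every factor common to all distances

print(greatest)  # 6
print(shared)    # [1, 2, 3, 6]
```

Every common factor of the distances is a divisor of their greatest common divisor, which is why listing the factors of that single number is enough.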

When Kasiski Examination Works Best

The effectiveness of Kasiski analysis depends heavily on the amount of ciphertext available:

  • Fewer than 100 characters: Repeated sequences are rare and may not appear at all. Kasiski analysis is unreliable at this length.
  • 100 to 200 characters: You may find a few repeated trigrams, enough for a tentative key length estimate but possibly not conclusive.
  • 200+ characters: The method becomes highly reliable. Multiple repeated sequences will appear, and their distances will converge on the correct key length.
  • 1,000+ characters: The analysis becomes nearly certain, with numerous repeated trigrams and even quadgrams providing overwhelming statistical evidence.

It is also important to note that not every repeated sequence in the ciphertext is meaningful. Some repetitions occur by coincidence — a plaintext sequence aligning with different key positions just happens to produce the same ciphertext. These false positives introduce noise into the analysis, which is why examining multiple repeated sequences and looking for the most common shared factor is essential.
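Because a single coincidental repeat can destroy the greatest common divisor, one tolerant alternative is to let each distance vote for every candidate key length that divides it. This is a sketch under assumptions: the name `factor_votes` and the cap of 20 on candidate lengths are illustrative choices.

```python
from collections import Counter


def factor_votes(distances: list[int], max_key_length: int = 20) -> Counter:
    """Count, for each candidate key length, how many distances it divides."""
    votes = Counter()
    for d in distances:
        for k in range(2, max_key_length + 1):
            if d % k == 0:
                votes[k] += 1
    return votes


# 35 is a coincidental repeat; it adds one stray vote each to 5 and 7
# but cannot outvote the factors shared by the genuine distances.
votes = factor_votes([24, 24, 18, 36, 35])
print(votes[6], votes[5], votes[7])  # 4 1 1
```

Small factors such as 2 and 3 collect as many votes as 6 in this example, because they divide every multiple of 6. Vote counts therefore narrow the candidates but do not pick between a length and its divisors on their own.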

Method B: Index of Coincidence (IC)

Developed by William F. Friedman around 1920 at Riverbank Laboratories, before his later work for the US Army's Signal Intelligence Service, the Index of Coincidence offers a purely statistical approach to determining key length. While Kasiski examination looks for visible patterns, the IC method measures the underlying statistical structure of the text.

The Index of Coincidence is the probability that two letters chosen at random from a text are the same. The formula is:

IC = [ Σ nᵢ(nᵢ - 1) ] / [ N(N - 1) ]

Where nᵢ is the count of the i-th letter (A through Z) and N is the total number of letters in the text.
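The formula translates almost directly into Python. In this minimal sketch the function name `index_of_coincidence` is an assumption, and only the letters A through Z are counted.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement from text are equal."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0  # the formula is undefined for fewer than two letters
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


print(index_of_coincidence("AAAA"))  # 1.0, every pair matches
print(index_of_coincidence("ABCD"))  # 0.0, no pair matches
```

Real text falls between these extremes: the more uneven the letter counts, the higher the value.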

Expected IC Values

Different types of text produce characteristic IC values:

| Text Type | Expected IC |
|---|---|
| English plaintext | ~0.0667 |
| German plaintext | ~0.0762 |
| French plaintext | ~0.0778 |
| Random (uniform distribution) | ~0.0385 |
| Vigenère ciphertext (short key) | 0.04 to 0.05 |
| Vigenère ciphertext (long key) | ~0.038 |

The key insight is that a Caesar cipher preserves the IC of the original language (because it merely relabels letters without changing their frequencies), while Vigenère encryption reduces the IC toward the random baseline.
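A small experiment illustrates the difference. The sample text and the helper `shift_text` are assumptions chosen for illustration; the sample is far too short and repetitive to reproduce the table values, but the relative behaviour is visible.

```python
from collections import Counter


def index_of_coincidence(letters: str) -> float:
    """IC of a string of uppercase letters (at least two letters assumed)."""
    n = len(letters)
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def shift_text(letters: str, shifts: list[int]) -> str:
    """Add a repeating list of shifts to uppercase A-Z text."""
    return "".join(
        chr((ord(ch) - 65 + shifts[i % len(shifts)]) % 26 + 65)
        for i, ch in enumerate(letters)
    )


sample = "ATTACKATDAWNATTACKATDUSK"
caesar = shift_text(sample, [3])                    # one shift for every letter
vigenere = shift_text(sample, [11, 4, 12, 14, 13])  # the shifts of LEMON

# A Caesar shift only relabels letters, so the counts and the IC are unchanged.
print(index_of_coincidence(caesar) == index_of_coincidence(sample))   # True
# Mixing five shifts spreads the counts out and lowers the IC.
print(index_of_coincidence(vigenere) < index_of_coincidence(sample))  # True
```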

Using IC to Find the Key Length

The procedure is systematic:

  1. For each candidate key length k (starting from 1), extract every k-th letter from the ciphertext to create k separate subsequences.
  2. Compute the IC of each subsequence.
  3. Average the IC values across all k subsequences.
  4. When the average IC approaches 0.0667 (the English value), you have found the correct key length.

The reasoning is straightforward: when you guess the correct key length, each subsequence consists of letters that were all encrypted with the same Caesar shift. Those letters retain the frequency distribution of English (just shifted), so their IC will match English. When you guess the wrong key length, the subsequences mix letters encrypted with different shifts, producing a flatter distribution and a lower IC.
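The procedure can be sketched as follows. The names `average_ic` and `ic_by_key_length` and the default limit of 12 candidate lengths are assumptions made for this example.

```python
from collections import Counter


def index_of_coincidence(letters: str) -> float:
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def average_ic(ciphertext: str, k: int) -> float:
    """Split the letters into k interleaved subsequences and average their ICs."""
    letters = "".join(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    columns = [letters[i::k] for i in range(k)]  # column i holds letters i, i+k, i+2k, ...
    return sum(index_of_coincidence(col) for col in columns) / k


def ic_by_key_length(ciphertext: str, max_k: int = 12) -> dict[int, float]:
    """Average IC for every candidate key length from 1 to max_k."""
    return {k: average_ic(ciphertext, k) for k in range(1, max_k + 1)}


# Toy input with an obvious period of 2: each column is a single repeated letter.
scores = ic_by_key_length("ABABABABAB", max_k=4)
print(scores[2])  # 1.0
```

On real ciphertext, multiples of the true key length also score high, since their columns are subsets of correctly aligned columns. Preferring the smallest length with a high average is a reasonable tie-breaker.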

For example, testing key lengths on a ciphertext might produce:

| Candidate Key Length | Average IC |
|---|---|
| 1 | 0.0421 |
| 2 | 0.0398 |
| 3 | 0.0412 |
| 4 | 0.0405 |
| 5 | 0.0638 |
| 6 | 0.0423 |
| 7 | 0.0401 |

The sharp spike at key length 5 (IC = 0.0638, close to the English value of 0.0667) reveals the correct key length. All other candidates produce IC values near the random baseline of 0.0385.

Combining Kasiski and IC for Confidence

In practice, experienced cryptanalysts use both methods together. Kasiski examination provides candidate key lengths based on repeated sequences, and the IC test confirms which candidate is correct. If Kasiski suggests key lengths of 4, 6, or 12 (as common factors of observed distances), computing the IC for each candidate will identify the right one. This combined approach is what our Vigenère cipher decoder implements automatically.

Stage 2: Recovering Each Key Letter

Once the key length k is determined, the hardest part is over. The ciphertext is divided into k groups, each encrypted with a single Caesar shift:

  • Group 1: Letters at positions 1, k+1, 2k+1, 3k+1, ...
  • Group 2: Letters at positions 2, k+2, 2k+2, 3k+2, ...
  • Group k: Letters at positions k, 2k, 3k, 4k, ...
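In Python the grouping is a one-line slice per group. This is a minimal sketch; `split_groups` is an illustrative name, and indices are 0-based, so group 1 in the list above corresponds to index 0 here.

```python
def split_groups(letters: str, k: int) -> list[str]:
    """Return k groups; group i holds the letters at 0-based indices i, i+k, i+2k, ..."""
    return [letters[i::k] for i in range(k)]


print(split_groups("ABCDEFGHIJ", 3))  # ['ADGJ', 'BEH', 'CFI']
```

The input should already be stripped of spaces and punctuation, otherwise the groups drift out of alignment with the key.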

Each group is a simple Caesar cipher, and breaking Caesar ciphers is elementary. Two approaches work well:

Frequency Analysis

For each group, count the frequency of every letter and compare it against the expected English letter frequency distribution. The most common letter in the group is likely to be the encryption of E (the most common English letter). If the most frequent letter in group 1 is P, then the shift for that group is P - E = 15 - 4 = 11 (counting A as 0), meaning the first key letter is L, the letter at index 11.

This approach works well with large groups (100+ letters per group), but can be unreliable with smaller samples where statistical fluctuations distort the distribution.
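The peak-matching guess can be written in a few lines. This sketch assumes the most frequent ciphertext letter stands for E, which is exactly the assumption that breaks down on small groups; the function name `guess_shift_by_peak` is illustrative.

```python
from collections import Counter


def guess_shift_by_peak(group: str) -> int:
    """Guess the Caesar shift by assuming the most frequent letter encrypts E."""
    most_common_letter, _ = Counter(group).most_common(1)[0]
    return (ord(most_common_letter) - ord("E")) % 26


shift = guess_shift_by_peak("PPPPXQ")  # P is the peak, so the shift is P - E
print(shift, chr(shift + ord("A")))    # 11 L
```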

Chi-Squared Testing

A more robust approach is to compute the chi-squared statistic for each possible shift value (0 through 25) and select the shift that produces the lowest chi-squared value — indicating the closest match to the expected English distribution.

For a given shift s applied to a group, the chi-squared statistic is:

χ² = Σ (Oᵢ - Eᵢ)² / Eᵢ

Where Oᵢ is the observed count of the i-th letter after decrypting with shift s, and Eᵢ is the expected count based on English letter frequencies.

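The shift that minimizes χ² is usually the correct one, and because the test uses all 26 letter counts rather than a single peak, it stays reliable at sample sizes where matching the most frequent letter fails. Testing all 26 shifts for each group and selecting the minimum is both fast and accurate.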
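A sketch of the chi-squared search follows. The frequency table is a commonly quoted set of English letter frequencies and should be treated as an assumption, as should the function names `chi_squared` and `best_shift`.

```python
# Relative frequencies of A..Z in English (commonly quoted values; an assumption here).
ENGLISH_FREQ = [
    0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,  # A-G
    0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,  # H-N
    0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,  # O-U
    0.00978, 0.02360, 0.00150, 0.01974, 0.00074,                    # V-Z
]


def chi_squared(group: str, shift: int) -> float:
    """Chi-squared distance from English after undoing `shift` on an uppercase group."""
    n = len(group)
    observed = [0] * 26
    for ch in group:
        observed[(ord(ch) - ord("A") - shift) % 26] += 1
    return sum(
        (observed[i] - n * ENGLISH_FREQ[i]) ** 2 / (n * ENGLISH_FREQ[i])
        for i in range(26)
    )


def best_shift(group: str) -> int:
    """Try all 26 shifts and return the one with the lowest chi-squared value."""
    return min(range(26), key=lambda s: chi_squared(group, s))
```

Applied to one group of the split ciphertext, `best_shift(group)` returns a number from 0 to 25, and `chr(best_shift(group) + ord("A"))` converts it to the corresponding key letter.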

Putting It All Together

Suppose Kasiski and IC analysis determined a key length of 5. You split the ciphertext into 5 groups and run chi-squared analysis on each:

| Group | Best Shift | Key Letter |
|---|---|---|
| 1 | 11 | L |
| 2 | 4 | E |
| 3 | 12 | M |
| 4 | 14 | O |
| 5 | 13 | N |

The recovered keyword is LEMON. You can now decrypt the entire message using the standard Vigenère decryption formula:

Pᵢ = (Cᵢ - Kᵢ + 26) mod 26

To verify the result, decrypt the first few words and check whether they form coherent English. If they do, the key is confirmed.
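The final step, turning the recovered shifts into a keyword and applying the decryption formula, can be sketched like this. The function name `vigenere_decrypt` is an assumption, and the input is expected to contain uppercase letters only.

```python
def vigenere_decrypt(ciphertext: str, keyword: str) -> str:
    """Apply P = (C - K) mod 26 with a repeating keyword to uppercase A-Z text."""
    out = []
    for i, ch in enumerate(ciphertext):
        k = ord(keyword[i % len(keyword)]) - ord("A")
        out.append(chr((ord(ch) - ord("A") - k) % 26 + ord("A")))
    return "".join(out)


shifts = [11, 4, 12, 14, 13]  # best shift per group, as in the table above
keyword = "".join(chr(s + ord("A")) for s in shifts)

print(keyword)                                    # LEMON
print(vigenere_decrypt("LXFOPVEFRNHR", keyword))  # ATTACKATDAWN
```

Python's `%` operator already returns a non-negative result for a negative left operand, so the `+ 26` in the formula is not needed here; in languages such as C or Java it is.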

Key Length vs. Security

The security of the Vigenère cipher depends almost entirely on the length of the keyword relative to the plaintext. Understanding this relationship explains both why the cipher was effective for centuries and why it ultimately failed:

| Key Length | Effective Security | Vulnerability |
|---|---|---|
| 1 letter | Equivalent to Caesar cipher | Trivially broken by brute force (25 keys) |
| 3-5 letters | Low | Easily broken with 100+ characters of ciphertext |
| 6-12 letters | Moderate (for the era) | Breakable with Kasiski + IC analysis |
| 20+ letters | High (for the era) | Requires very long ciphertext to attack |
| = plaintext length (random) | Theoretically unbreakable | This is the one-time pad (Vernam cipher) |

The one-time pad — where the key is truly random, at least as long as the message, and never reused — is the only provably unbreakable encryption scheme in existence. It is the logical endpoint of the Vigenère principle: use a key so long that it never repeats, eliminating the periodic patterns that Kasiski and IC exploit.

Security Assessment: Why Modern Standards Require More

The Vigenère cipher is not secure by modern standards, and its weaknesses go beyond simply having a short key:

| Weakness | Detail |
|---|---|
| Repeating key | The keyword repeats, creating periodic patterns exploitable by Kasiski analysis |
| Key length is the bottleneck | Security is proportional to key length; short keys are easily recovered |
| Vulnerable to IC analysis | Statistical methods can determine key length from ciphertext alone |
| No diffusion | Each ciphertext letter depends only on one plaintext letter and one key letter |
| Deterministic per position | The same plaintext letter at positions with the same key letter always produces the same ciphertext |

Modern encryption algorithms like AES use 128-, 192-, or 256-bit keys, multiple rounds of transformation, and both confusion and diffusion, so that changing any single input bit flips each output bit with a probability of about one half. The Vigenère cipher achieves none of these properties.

Historical Codebreaking: Babbage and the Crimean War

Charles Babbage (1791-1871), the English mathematician famous for designing the first mechanical computer, secretly broke the Vigenère cipher around 1854 — nearly a decade before Kasiski's published attack. Working in his private study, Babbage discovered that repeated sequences in the ciphertext reveal the key length, essentially the same insight Kasiski later published independently.

But Babbage never published his results. Historians believe this was deliberate: British intelligence likely used the technique during the Crimean War (1853-1856) to read encrypted Russian and allied communications. Publishing the method would have alerted adversaries to the vulnerability. Babbage's breakthrough was only rediscovered in the 20th century through his unpublished notes.

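This episode illustrates a pattern that recurs throughout cryptographic history: the gap between secret government cryptanalysis and public academic knowledge can span decades. A similar dynamic played out with public-key cryptography, which researchers at Britain's GCHQ developed in secret in the early 1970s, several years before its public discovery; their work was not declassified until 1997.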

Real-World Failure: The Vigenère Cipher in the American Civil War

The American Civil War (1861-1865) provides a vivid case study of how the Vigenère cipher fails in practice, even when the underlying mathematics is sound.

The Confederate States relied on a variant of the Vigenère cipher for field communications. Confederate officers used a brass cipher disk — two concentric alphabet wheels that could be rotated to set different shift values — along with a pre-agreed keyword that changed periodically. The most commonly used Confederate keyword was reportedly "MANCHESTER BLUFF", though keys like "COMPLETE VICTORY" were also employed.

However, the Confederate implementation suffered from critical weaknesses:

  • Short, predictable keywords. Often patriotic phrases that Union cryptanalysts could guess or partially reconstruct.
  • Repeated keys. The same keyword was reused across many messages, providing ample ciphertext for Kasiski-style analysis.
  • Poor operational security. Keywords were sometimes transmitted in the clear or recorded in captured documents.

The Union's cipher bureau, led by cryptanalysts including Albert J. Myer and the "Sacred Three" telegraph operators (David Homer Bates, Albert Chandler, and Charles Tinker), broke Confederate Vigenère messages with regularity. Captain Campbell Brown of the Confederate Army later wrote that the cipher system "was so poorly managed that Federal authorities probably deciphered our dispatches with regularity throughout the war."

Their success demonstrated a principle that remains fundamental in modern cryptography: no encryption algorithm, no matter how mathematically sound, can protect messages if the keys are predictable, reused, or carelessly handled. Key management is at least as important as the cipher itself.

The True History: Bellaso vs. Vigenère

The cipher universally known as "Vigenère" was not invented by Blaise de Vigenère. The true inventor was Giovan Battista Bellaso, an Italian cryptographer who published the polyalphabetic substitution method using a repeating keyword in his 1553 work La Cifra del Sig. Giovan Battista Bellaso.

Bellaso's innovation was elegant: rather than using a fixed shift (as Caesar did) or a progressively changing shift (as Trithemius proposed in 1508), he introduced a secret keyword that both sender and receiver agreed upon in advance, with the keyword repeating across the message.

Blaise de Vigenère (1523-1596), a French diplomat and cryptographer, published a different and arguably more sophisticated cipher in 1586 in his Traicté des Chiffres. Vigenère's actual cipher was an autokey system — the key starts with a priming letter and then uses the plaintext itself to generate subsequent key letters, making the effective key non-repeating. This is a fundamentally stronger system than the repeating-keyword cipher that bears his name. You can explore Vigenère's actual invention on our Autokey Cipher page.

The misattribution occurred in the 19th century when historians conflated the two systems. By the time the error was recognized, "Vigenère cipher" had become the standard name for Bellaso's repeating-keyword method, and the usage persists to this day.

Implementing Kasiski Analysis in Python

A practical Python implementation of the cryptanalysis techniques discussed above helps solidify the concepts. The following code demonstrates both Kasiski examination and IC-based key length detection:
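One way such a pipeline might look is sketched below. It is a minimal illustration under stated assumptions, not a definitive implementation: key length is estimated from the average IC alone, the rule of taking the smallest length within 90% of the best score is a heuristic chosen for this example, the frequency table is a commonly quoted one, and all function names are illustrative.

```python
from collections import Counter

A = ord("A")
# Relative frequencies of A..Z in English (commonly quoted values; an assumption here).
ENGLISH_FREQ = [
    0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,  # A-G
    0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,  # H-N
    0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,  # O-U
    0.00978, 0.02360, 0.00150, 0.01974, 0.00074,                    # V-Z
]


def clean(text: str) -> str:
    """Keep only the letters A-Z, uppercased."""
    return "".join(ch for ch in text.upper() if "A" <= ch <= "Z")


def index_of_coincidence(letters: str) -> float:
    n = len(letters)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in Counter(letters).values()) / (n * (n - 1))


def guess_key_length(letters: str, max_k: int = 12, tolerance: float = 0.9) -> int:
    """Stage 1: smallest k whose average column IC is within `tolerance` of the best."""
    scores = {}
    for k in range(1, max_k + 1):
        columns = [letters[i::k] for i in range(k)]
        scores[k] = sum(index_of_coincidence(col) for col in columns) / k
    best = max(scores.values())
    return min(k for k, score in scores.items() if score >= tolerance * best)


def chi_squared(group: str, shift: int) -> float:
    """Chi-squared distance from English after undoing `shift` on a group."""
    n = len(group)
    if n == 0:
        return 0.0
    observed = [0] * 26
    for ch in group:
        observed[(ord(ch) - A - shift) % 26] += 1
    return sum(
        (observed[i] - n * ENGLISH_FREQ[i]) ** 2 / (n * ENGLISH_FREQ[i])
        for i in range(26)
    )


def best_shift(group: str) -> int:
    return min(range(26), key=lambda s: chi_squared(group, s))


def recover_key(letters: str, k: int) -> str:
    """Stage 2: solve each of the k columns as an independent Caesar cipher."""
    return "".join(chr(A + best_shift(letters[i::k])) for i in range(k))


def decrypt(ciphertext: str, key: str) -> str:
    """Decrypt the letters of ciphertext; spaces and punctuation are dropped."""
    letters = clean(ciphertext)
    return "".join(
        chr((ord(ch) - ord(key[i % len(key)])) % 26 + A)
        for i, ch in enumerate(letters)
    )


def break_vigenere(ciphertext: str, max_k: int = 12) -> tuple[str, str]:
    """Return (recovered key, decrypted letters) for a Vigenère ciphertext."""
    letters = clean(ciphertext)
    key = recover_key(letters, guess_key_length(letters, max_k))
    return key, decrypt(letters, key)
```

Calling `break_vigenere(ciphertext)` returns a candidate key and plaintext. With short ciphertexts either stage can fail, so the output should be read and checked rather than trusted. Feeding Kasiski-derived candidate lengths into `recover_key` directly is a natural extension.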


This implementation follows the exact two-stage approach described above: first determine the key length through IC analysis, then recover each key letter through chi-squared testing against expected English letter frequencies. For a complete encryption/decryption implementation, visit the Vigenère Cipher Examples & Code page.

The Vigenère cipher belongs to a family of polyalphabetic systems, and understanding its relatives helps deepen your knowledge of cryptanalysis:

  • Beaufort Cipher — A reciprocal variant where the key letter is subtracted from a fixed position rather than added. The same operation performs both encryption and decryption, but it is vulnerable to the same Kasiski and IC attacks.
  • Autokey Cipher — The cipher Vigenère actually invented. By using the plaintext itself as part of the key, it eliminates the repeating-key weakness, making Kasiski examination ineffective. However, it has its own vulnerabilities based on known-plaintext attacks.
  • Caesar Cipher — The monoalphabetic predecessor. Understanding Caesar cryptanalysis is essential because Stage 2 of the Vigenère attack reduces to solving multiple Caesar ciphers independently.
  • Vigenère Table — The 26x26 tabula recta used for manual Vigenère encryption. Understanding this table helps visualize why different key letters produce different substitution alphabets.

The evolution from Caesar (single shift) to Vigenère (repeating keyword) to autokey (non-repeating key) to the one-time pad (truly random key) represents one of the most important conceptual arcs in cryptographic history. Each step addresses a specific weakness of the previous design, and the attacks developed against each cipher drove the invention of its successor.

Try It Yourself

The best way to understand Vigenère cryptanalysis is to practice it. Try our free Vigenère Cipher decoder with automatic Kasiski examination and key recovery. Paste any Vigenère-encrypted ciphertext, and the tool will estimate the key length, recover the keyword, and decrypt the message — all using the techniques described in this article.

About This Article

This article is part of our comprehensive Vigenère cipher tutorial series. Learn more about classical cryptography and explore our interactive cipher tools.
