Vigenere Cipher

Introduction

The Vigenere cipher is an old cryptographic system, which offers very little security, but is interesting to analyse. This program allows you to encrypt and decrypt messages and files using the original form of the cipher, which only handles capital letters. It also has a facility to try to decode a ciphertext with unknown key, which is usually successful, within some limits. The "save document with password" function in some desktop applications use the Vigenere cipher, slightly modified to handle all bytes rather than just capital letters (by using XOR), but having the same vulnerabilities.

Using the Program

The zip file contains a Windows 32-bit executable, HTML documention and C source code. So far, I've tested the code using DevStudio under Windows NT, but it should work on most platforms apart from MS-DOS. If you are familiar with computers and basic cryptography then the program is largely self explanatory, although these points are worth highlighting:

How the Cipher Works

The cipher needs a plaintext/ciphertext and a key to start. These are tidied by removing all non-alphabetic characters, and capitalising what remains. The tidied key must not be zero length. Firstly, each character in the text is assigned to a character in the key, like this:

PLAINTEXTMESSAGE
KEYKEYKEYKEYKEYK

Then, each character has a Caesar shift applied, with the amount of shift determined by the key character. The table below shows exactly what happens. To encrypt a character, find the column with the plaintext at the top, and the row with the key on the left side. The ciphertext is the character that this row and column intersect on. To decrypt, locate the row with the key on the left side, and look along this until you find the ciphertext. The plaintext is the character at the top of this column.

  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
A A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
B B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
C C D E F G H I J K L M N O P Q R S T U V W X Y Z A B
D D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
E E F G H I J K L M N O P Q R S T U V W X Y Z A B C D
F F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
G G H I J K L M N O P Q R S T U V W X Y Z A B C D E F
H H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
I I J K L M N O P Q R S T U V W X Y Z A B C D E F G H
J J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
K K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
L L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
M M N O P Q R S T U V W X Y Z A B C D E F G H I J K L
N N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
O O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
P P Q R S T U V W X Y Z A B C D E F G H I J K L M N O
Q Q R S T U V W X Y Z A B C D E F G H I J K L M N O P
R R S T U V W X Y Z A B C D E F G H I J K L M N O P S
S S T U V W X Y Z A B C D E F G H I J K L M N O P Q R
T T U V W X Y Z A B C D E F G H I J K L M N O P Q R S
U U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
V V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
W W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
X X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
Y Y Z A B C D E F G H I J K L M N O P Q R S T U V W X
Z Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
The ciphertext for the above example works out to be:

ZPYSRROBRWIQCEEO

Cracking the Cipher

Decrypting a ciphertext without a key is usually possible, provided that the message is substantially longer than the key (the program only works with keys of less than 255 characters), and the plaintext's language is known (the program only works with English). Of course, the program can't be sure of what the key is - it can only find the most likely possibility. Because of this, information about each decision is presented to the user, who then has a chance to override the automatic suggestion.

The first step is to determine the length of the key, using the method of coincidence indices. To calculate a coincidence index, shift the ciphertext and compare against itself, counting character matches:

CIPHERTEXT
   CIPHERTEXT
   ----*-*

In this example, for a shift of three, there are two matches out of a possible seven, so the coincidence index is 28.57%. For random text the expected value is 3.85%, but this number tends to be higher for non-random text in any language, because some letters are more frequent than others. For Vigenere ciphertexts, the coincidence index is higher for shifts that are a multiple of the key length than otherwise (can you see why?). The program calculates the first 256 shifts, and finds the shortest length for which all shifts that are a multiple of this length have a coincidence index greater than 6%, which it suggests as the key length.

Knowing the key length, you can break the ciphertext up into chunks, each of which have been encrypted by the same character in the key. For each chunk, try decrypting it with each possible key letter (A...Z). Compare the letter frequencies in each possible plaintext with a table of English letter frequencies. The key letter which results in the least squared deviation is the most likely candidate, which the program suggests.

To test the cracking code, I used words.txt, a list of many English words, found in the Word Patterns download. With this much ciphertext, keys as long as 200 characters can be uncovered without the user overriding any automatic suggestions.

Program Design