Sounds Same Warmup
Posted: Tue Feb 17, 2009 2:01 pm
This is how I solved it:
I made the assumption that for each plaintext letter each homophone is used with the same probability.
From this follows that if the ciphertext letters X and Y stand for the same plaintext letter, the frequency of the bigrams (groups of 2 letters) AX and AY will be similar for all other ciphertext letters A.
For example if we assume that '6' and 'A' are homophones for the same letter, then '60' and 'A0' would occur with the same (or almost the same) frequency. So I calculate the difference between these two frequencies and do this for all analogous bigrams. Then I sum up these differences. If the result is very low compared to other bigrams then '6' and 'A' are really translations for the same letter.
The following python code implements this idea:
(The original code was a lot bigger - It's so cool how you can shrink down python programs
)
The output is:
0 [0, 186, 126, 48, 118, 128, 180, 125, 117, 124, 194, 182, 184, 194, 180, 132]
1 [186, 0, 214, 178, 212, 206, 68, 195, 211, 102, 46, 54, 54, 50, 52, 104]
2 [126, 214, 0, 114, 62, 44, 192, 47, 49, 184, 212, 210, 208, 208, 194, 186]
3 [48, 178, 114, 0, 100, 108, 164, 113, 95, 116, 180, 178, 178, 184, 166, 120]
4 [118, 212, 62, 100, 0, 50, 186, 59, 51, 184, 214, 210, 204, 204, 188, 188]
5 [128, 206, 44, 108, 50, 0, 184, 45, 51, 180, 208, 206, 200, 204, 182, 184]
6 [180, 68, 192, 164, 186, 184, 0, 169, 183, 102, 56, 48, 68, 62, 62, 104]
7 [125, 195, 47, 113, 59, 45, 169, 0, 58, 171, 193, 191, 185, 189, 175, 173]
8 [117, 211, 49, 95, 51, 51, 183, 58, 0, 179, 213, 209, 205, 209, 193, 183]
9 [124, 102, 184, 116, 184, 180, 102, 171, 179, 0, 104, 106, 114, 114, 98, 26]
A [194, 46, 212, 180, 214, 208, 56, 193, 213, 104, 0, 52, 46, 52, 44, 100]
B [182, 54, 210, 178, 210, 206, 48, 191, 209, 106, 52, 0, 60, 60, 68, 108]
C [184, 54, 208, 178, 204, 200, 68, 185, 205, 114, 46, 60, 0, 54, 52, 118]
D [194, 50, 208, 184, 204, 204, 62, 189, 209, 114, 52, 60, 54, 0, 64, 118]
E [180, 52, 194, 166, 188, 182, 62, 175, 193, 98, 44, 68, 52, 64, 0, 98]
F [132, 104, 186, 120, 188, 184, 104, 173, 183, 26, 100, 108, 118, 118, 98, 0]
As you can see, the lists are very similar for the homophonic groups.
Knowing the order of frequencies of the letters in the text I could easily determine wich group belongs to which letter.
I made the assumption that for each plaintext letter each homophone is used with the same probability.
From this follows that if the ciphertext letters X and Y stand for the same plaintext letter, the frequency of the bigrams (groups of 2 letters) AX and AY will be similar for all other ciphertext letters A.
For example if we assume that '6' and 'A' are homophones for the same letter, then '60' and 'A0' would occur with the same (or almost the same) frequency. So I calculate the difference between these two frequencies and do this for all analogous bigrams. Then I sum up these differences. If the result is very low compared to other bigrams then '6' and 'A' are really translations for the same letter.
The following python code implements this idea:
Code: Select all
c = "7A62E92C...F215F8" # the ciphertext
# build dictionary of bigram-frequency in ciphertext
d = dict([(bigram,c.count(bigram)) for bigram in ["%02X" % i for i in range(0x100)]])
ct = "0123456789ABCDEF"
for x in ct:
print x, [sum(map(lambda a: abs(d[x+a]-d[y+a]) + abs(d[a+x]-d[a+y]), ct)) for y in ct]

The output is:
0 [0, 186, 126, 48, 118, 128, 180, 125, 117, 124, 194, 182, 184, 194, 180, 132]
1 [186, 0, 214, 178, 212, 206, 68, 195, 211, 102, 46, 54, 54, 50, 52, 104]
2 [126, 214, 0, 114, 62, 44, 192, 47, 49, 184, 212, 210, 208, 208, 194, 186]
3 [48, 178, 114, 0, 100, 108, 164, 113, 95, 116, 180, 178, 178, 184, 166, 120]
4 [118, 212, 62, 100, 0, 50, 186, 59, 51, 184, 214, 210, 204, 204, 188, 188]
5 [128, 206, 44, 108, 50, 0, 184, 45, 51, 180, 208, 206, 200, 204, 182, 184]
6 [180, 68, 192, 164, 186, 184, 0, 169, 183, 102, 56, 48, 68, 62, 62, 104]
7 [125, 195, 47, 113, 59, 45, 169, 0, 58, 171, 193, 191, 185, 189, 175, 173]
8 [117, 211, 49, 95, 51, 51, 183, 58, 0, 179, 213, 209, 205, 209, 193, 183]
9 [124, 102, 184, 116, 184, 180, 102, 171, 179, 0, 104, 106, 114, 114, 98, 26]
A [194, 46, 212, 180, 214, 208, 56, 193, 213, 104, 0, 52, 46, 52, 44, 100]
B [182, 54, 210, 178, 210, 206, 48, 191, 209, 106, 52, 0, 60, 60, 68, 108]
C [184, 54, 208, 178, 204, 200, 68, 185, 205, 114, 46, 60, 0, 54, 52, 118]
D [194, 50, 208, 184, 204, 204, 62, 189, 209, 114, 52, 60, 54, 0, 64, 118]
E [180, 52, 194, 166, 188, 182, 62, 175, 193, 98, 44, 68, 52, 64, 0, 98]
F [132, 104, 186, 120, 188, 184, 104, 173, 183, 26, 100, 108, 118, 118, 98, 0]
As you can see, the lists are very similar for the homophonic groups.
Knowing the order of frequencies of the letters in the text I could easily determine wich group belongs to which letter.