Cryptanalysis of Hill Cipher

If you need a reminder on how the Hill Cipher works click here.

The first thing to note is that when encoding in Hill Cipher each row of the key matrix encodes to 1 letter independently of the rest of the key matrix.

 \begin{bmatrix}21 & 18 & 12 \\9 & 0 & 23 \\8 & 3 & 2 \end{bmatrix}\begin{bmatrix}a \\b \\c \end{bmatrix}=\begin{bmatrix}21a+18b+12 c \\9 a+0b+23c \\8a+3b+2c \end{bmatrix}\bmod 26

Notice how the top row of the far left matrix is only involved in the top cell of the ciphertext matrix, the middle row is only involved in the middle cell etc.

We can use this fact to dramatically decrease the number of keys we have to test to break the Hill Cipher.

For square matrix of size N, there are 26N×N unique keys (there will be less as not all matrices have an inverse). For N=3, there is 269 ≈ 5.43×1012 keys, to test all of these is not feasible (I calculated on my pc it would take ≈ 8 years to test them all).

However, if we test each row individually then there is only 26N keys we need to test, For N=3 there is 263 = 17,576 which is a very small number in comparison (Takes 0.5 seconds on my pc!)

With this property of Hill Cipher we can go about cracking it.

First you will need to identify N (the size of the matrix) the size will be a multiple of the text length – this narrows it down a lot

Now you will be to iterate over all the row vectors with a size of N and possible values of 0 (inclusive) to 26 (exclusive).

For a 3 by 3 there are 17,576 combinations. They look will look something like this. On the left is the iteration number…

1/17576         [ 0,  0,  0]
2/17576         [ 0,  0,  1]
3/17576         [ 0,  0,  2] ……
16249/17576     [24,  0, 24]
16250/17576     [24,  0, 25]
16251/17576     [24,  1,  0] ……
17576/17576     [25, 25, 25]

For each one of these possibilities assume it is part of the key and multiply your ciphertext by it, you will multiply in blocks of N and get a single letter out for each block.

\begin{bmatrix}a & b & c \end{bmatrix} \begin{bmatrix}L_{1} \\L_{2} \\L_{3} \end{bmatrix}=\begin{bmatrix}a\times L_{1} + b\times L_{2} + c\times L_{3} \end{bmatrix}\bmod26

Once you have all the output letters for a particular possibility, score the letters using the Chi-Squared Statistic. Store the row vectors from smallest to largest Chi-Squared value.

Once you have checked all the possibilities. Take the best results from the list you have compiled and then go through all the permutations of creating an N by N matrix and checking it has an inverse in modular 26.

Example:

Let’s say you know N=3 and the best row vectors found using this method were with a Chi-Squared value of… (note is some cases the best N vectors may not be the correct ones so you may need to try a combination of a few different ones)

[22,  6,  7]    X2 = 71.721647
[23, 17, 18]    X2 = 50.562860
[25,  0,  6]    X2 = 81.987751

Rearranging each row to every possible position (For R number of rows there is R!, R×(R-1)×(R-2)…×1, permutations)

The next (3! = 6) matrices are all the permutations of each row vector.

\begin{bmatrix}22 & 6 & 7 \\23 & 17 & 18\\25 & 0 & 6  \end{bmatrix} \begin{bmatrix}22 & 6 & 7 \\25 & 0 & 6\\23 & 17 & 18  \end{bmatrix} \begin{bmatrix}23 & 17 & 18 \\22 & 6 & 7\\25 & 0 & 6\end{bmatrix}
\begin{bmatrix}25 & 0 & 6 \\22 & 6 & 7\\23 & 17 & 18 \end{bmatrix}\begin{bmatrix}25 & 0 & 6 \\23 & 17 & 18\\22 & 6 & 7 \end{bmatrix}\begin{bmatrix}{23} & 17 & 18 \\25 & 0 & 6\\22 & 6 & 7\end{bmatrix}

Then encrypt your ciphertext using these matrices (encrypting using the inverse key matrix is the same as decrypting using the key matrix). One of these results should be English – being your solution. If you wish to find the key matrix, you will need to inverse the inverse key matrix in mod 26.

To Conclude

For larger matrices like 4 by 4 and up the sheer number of keys make a brute force attack impossible, I don’t believe anyone has the patience or life expectancy to wait around 64 trillion years to solve one cipher. Other methods like crib dragging require you to guess or make assumptions for large chunks of the plaintext, a crib of 19+ characters very hard to come by. The method described above can solve a 4 by 4 Hill cipher in about 10 seconds, with no known cribs. The only thing it requires is that the text is of a certain length, about 100×(N-1) or greater when N is the size of the matrix being tested, so that statistical properties are not affected by a lack of data.

This same method can be adapted to decrypted ciphertext in other languages you just need to change the frequencies of letters that the Chi-Squared Statistic uses.

[powr-hit-counter id=4db2581c_1482002480525]

7 thoughts on “Cryptanalysis of Hill Cipher”

    1. A brute force attack would not be viable on a matrix of this size. For a 6 x 6 matrix in modulo 26 there are 2.9^40 matrices and it is simply not possible to try every possible matrix even if you eliminate those without an inverse in a time insignificant time. The best method of attack is use the method described in this post – for a 6×6 matrix there are 26^6=308,915,776 6×1 arrays that need to be tested. In comparison this is tiny, however this does require there to be a decent amount of text (prob 400 characters or more). For shorter texts i.e 100 or less it would almost be impossible to decrypt it. As well as there not being enough text for simple statistics to make these tests accurate, there is also the possibility that many keys can decrypt to text that makes sense. For example an extreme case would be a text encrypted in a simple substitution “JKW”, this could decrypt to “AND”, “THE”, “KEY”, “MAN”, “PEN” etc. For each cipher there is a theoretical number of characters that is required before the chance of getting multiple keys that decrypt to english, this can be calculated based on the key space and some other facts. However in the real world you often need 2-3x more characters.

  1. Can you help me decrypt this: I have tried All 25 shifts of ceasar cipher

    HMDICOMIRFZGCRQRZJRFICFLMMDIHFWSRICKIMDLSWHMIWSMZ WSPCOMZWPZRZWFZRBZFWPMZ,OMZORPQRGPZAPKRIWJ,FZGG RGPOFWRGWMWSRUIMUMCPWPMZWSFWFAAYRZFIROIRFWRGR VDFA. ZMBBRFIRRZLFLRGPZFLIRFWOPQPABFI,WRCWPZLBSRWSRIWSF WZFWPMZMIFZJZFWPMZCMOMZORPQRGFZGCMGRGPOFWRG,OF ZAMZLRZGDIR.BRFIRYRWMZFLIRFWKFWWARHPRAGMHWSFWBFI.BRSFQROMYRWMGRGPOFWRFUMIWPMZMH WSFWHPRAG,FCFHPZFAIRCWPZLUAFORHMIWSMCRBSMSRIRLFQ RWSRPIAPQRCWSFWWSFWZFWPMZYPLSWAPQR.PWPCFAWMLR WSRIHPWWPZLFZGUIMURIWSFWBRCSMDAGGMWSPC. KDW,PZFAFILRICRZCR,BROFZZMWGRGPOFWRBROFZZMWOMZCR OIFWRBROFZZMWSFAAMBWSPCLIMDZG.WSRKIFQRYRZ,APQPZLF ZGGRFG,BSMCWIDLLARGSRIR,SFQROMZCROIFWRGPW,HFIFKMQ RMDIUMMIUMBRIWMFGGMIGRWIFOW.WSRBMIAGBPAAAPWWARZ MWR,ZMIAMZLIRYRYKRIBSFWBRCFJSRIR,KDWPWOFZZRQRIHMILR WBSFWWSRJGPGSRIR.PWPCHMIDCWSRAPQPZL,IFWSRI,WMKRGR GPOFWRGSRIRWMWSRDZHPZPCSRGBMIXBSPOSWSRJBSMHMDL SWSRIRSFQRWSDCHFICMZMKAJFGQFZORG.PWPCIFWSRIHMIDC WMKRSRIRGRGPOFWRGWMWSRLIRFWWFCXIRYFPZPZLKRHMIRD CWSFWHIMYWSRCRSMZMIRGGRFGBRWFXRPZOIRFCRGGRQMWP MZWMWSFWOFDCRHMIBSPOSWSRJLFQRWSRAFCWHDAAYRFCDI RMHGRQMWPMZWSFWBRSRIRSPLSAJIRCMAQRWSFWWSRCRGR FGCSFAAZMWSFQRGPRGPZQFPZWSFWWSPCZFWPMZ,DZGRILMG ,CSFAASFQRFZRBKPIWSMHHIRRGMYFZGWSFWLMQRIZYRZWMH WSRURMUAR,KJWSRURMUAR,HMIWSRURMUAR,CSFAAZMWURIPC SHIMYWSRRFIWS.HFICMZMKAJFGQFZORG.PWPCIFWSRIHMIDCW MKRSRIRGRGPOFWRGWMWSRLIRFWWFCXIRYFPZPZLKRHMIRDC WSFWHIMYWSRCRSMZMIRGGRFGBRWFXRPZOIRFCRGGRQMWPM ZWMWSFWOFDCRHMIBSPOSWSRJLFQRWSRAFCWHDAAYRFCDIR MHGRQMWPMZWSFWBRSRIRSPLSAJIRCMAQRWSFWWSRCRGRF GCSFAAZMWSFQRGPRGPZQFPZWSFWWSPCZFWPMZ,DZGRILMG, CSFAASFQRFZRBKPIWSMHHIRRGMYFZGWSFWLMQRIZYRZWMHW
    SRURMUAR,KJWSRURMUAR,HMIWSRURMUAR,CSFAAZMWURIPCS HIMYWSRRFIWS

    And this:

    A’CHGSHXGAFCII,OGNLIPHBX’CIHDQGQVTLN, BLHNCAZDAFPQGBILHVQIPHNNLINBVQVTLN’CQHCNTQIHEVXT, BLACIODAHPCNDVZICHXPODVTLNCNHDCNLDAFTLNLIZIDVQAFCW VTLN, A’IDNLIDHEZHDNCBIBHNSLIP,BIDICATHQQHXNQGCNDIHEVXT? HXPNLIDASYINC’DIPTQHDI,NLIOAEOCOFDCNVXTVXHVD, THRIZDAAWNLDAFTLNLIXVTLNNLHNAFDWQHTBHCCNVQQNLIDI; ACHGPAICNLHNCNHD-CZHXTQIPOHXXIDGINBHRI A’IDNLIQHXPAWNLIWDIIHXPNLILAEIAWNLIODHRI?

    AXNLICLADIPVEQGCIIXNLDAFTLNLIEVCNCAWNLIPIIZ, BLIDINLIWAI’CLHFTLNGLACNVXPDIHPCVQIXSIDIZACIC, BLHNVCNLHNBLVSLNLIODIIJI,A’IDNLINABIDVXTCNIIZ, HCVNWVNWFQQGOQABC,LHQWSAXSIHQC,LHQWPVCSQACIC? XABVNSHNSLICNLITQIHEAWNLIEADXVXT’CWVDCNOIHE, VXWFQQTQADGDIWQISNIPXABCLVXICVXNLICNDIHE: ‘NVCNLICNHD-CZHXTQIPOHXXID,AQAXTEHGVNBHRI A’IDNLIQHXPAWNLIWDIIHXPNLILAEIAWNLIODHRI.

    HXPBLIDIVCNLHNOHXPBLACARHFXNVXTQGCBADI NLHNNLILHRASAWBHDHXPNLIOHNNQI’CSAXWFCVAX, HLAEIHXPHSAFXNDG,CLAFQPQIHRIFCXAEADI? NLIVDOQAAPLHCBHCLIPAFNNLIVDWAFQWAANCNIZC’ZAQQFNVAX. XADIWFTISAFQPCHRINLILVDIQVXTHXPCQHRI WDAENLINIDDADAWWQVTLN,ADNLITQAAEAWNLITDHRI: HXPNLICNHD-CZHXTQIPOHXXIDVXNDVFEZLPANLBHRI, A’IDNLIQHXPAWNLIWDIIHXPNLILAEIAWNLIODHRI

    And This:

    “ONZKNDDBEAXAOYQOEKBAPEAHNBOYQKNEQOSSWAHAQYOYK OVEAGBNVEKUWGUNEUVEHUZAYVEGNEOAYIAGIGUUHAZOYEKU KOVEAGBAIAPGYNEOAY.” “IOJUVRAGUBUNGVNWA,NWGUNENZUGORNY,OYQKAVUVBZFASO RVKNHAQQUVENYHVOWYUHEKUUZNYRODNEOAYDGARSNZNEOA Y.EKOVZAZUYEAPVHURGUURNZUNVNWGUNEFUNRAYSOWKEAIK ADUEAZOSSOAYVAIYUWGAVSNJUVQKAKNHFUUYVUNGUHOYEKUI SNZUVAIQOEKUGOYWOYXPVEORU.OERNZUNVNXABAPVHNBFGU NCEAUYHEKUSAYWYOWKEAIRNDEOJOEB. “FPEAYUKPYHGUHBUNGVSNEUG,QUZPVEINRUEKUEGNWORINRE EKNEEKUYUWGAOVVEOSSYAEIGUU.AYUKPYHGUHBUNGVSNEUG, EKUSOIUAIEKUYUWGAOVVEOSSVNHSBRGODDSUHFBEKUZNYNR SUVAIVUWGUWNEOAYNYHEKURKNOYVAIHOVRGOZOYNEOAY.AY UKPYHGUHBUNGVSNEUG,EKUYUWGASOJUVAYNSAYUSBOVSNYH AIDAJUGEBOYEKUZOHVEAINJNVEARUNYAIZNEUGONSDGAVDUGO EB.AYUKPYHGUHBUNGVSNEUG,EKUYUWGAOVVEOSSSNYWPOVK OYWOYEKURAGYUGVAINZUGORNYVAROUEBNYHIOYHVKOZVUSIN YULOSUOYKOVAQYSNYH.VAQUKNJURAZUKUGUEAHNBEAHGNZN EOMUNYNDDNSSOYWRAYHOEOAY.”

  2. the 3 row with most score could be rows of inverse matrix but its not necessary those 3 be the answer .

    1. True, sometimes there may be some more vectors that score highly that are not the correct ones, have added a note to the page.

  3. Oh please, you sound like you could do a really great simplified way to find the Key for a 3×3 Hill Cipher given the Plain-text and the Cipher….. oh I think you would be the first on the entire internet that could plainly explain it 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.