# Chi-Squared Statistic

The Chi-Squared Statistic measures how much two categorical distributions differ from one another. Two identical distributions score 0, and the score increases as the distributions diverge. The formula is:

$X^{2}=\sum_{i=A}^{Z}\frac{(O_{i}-E_{i})^{2}}{E_{i}}$

where $O_{i}$ is the observed count of letter $i$ in your text, and $E_{i}$ is the expected count of letter $i$ for a text of that length.

In words: for each letter, take the squared difference between the observed count and the expected count, divide it by the expected count, and then sum the results over all letters.
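A minimal Python sketch of the formula, assuming the observed and expected counts are supplied as equal-length sequences in the same letter order, and that every expected count is positive (the division is undefined otherwise). The function name `chi_squared` is illustrative, not from the original.

```python
def chi_squared(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over paired observed/expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Each term assumes e > 0; a zero expected count would divide by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Identical distributions score 0; the score grows as they diverge.
print(chi_squared([2, 3, 5], [2, 3, 5]))  # 0.0
print(chi_squared([3, 1], [2, 2]))        # 1.0
```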

Example: ‘WHENTHECLOCKSTRIKESTWELVEATTACK’, text length = 31 letters.

| Letter | Observed Count ($O_i$) | Frequency in English | Expected Count ($E_i$)\* | $(O_i - E_i)^2/E_i$ |
|---|---|---|---|---|
| A | 2 | 8.17% | 2.53177 | 0.11169 |
| B | 0 | 1.49% | 0.46252 | 0.46252 |
| C | 3 | 2.78% | 0.86242 | 5.29817 |
| D | 0 | 4.25% | 1.31843 | 1.31843 |
| E | 5 | 12.70% | 3.93762 | 0.28663 |
| F | 0 | 2.23% | 0.69068 | 0.69068 |
| G | 0 | 2.02% | 0.62465 | 0.62465 |
| H | 2 | 6.09% | 1.88914 | 0.00651 |
| I | 1 | 7.00% | 2.16876 | 0.62985 |
| J | 0 | 0.15% | 0.04743 | 0.04743 |
| K | 3 | 0.77% | 0.23932 | 31.84587 |
| L | 2 | 4.03% | 1.24775 | 0.45352 |
| M | 0 | 2.41% | 0.74586 | 0.74586 |
| N | 1 | 6.75% | 2.09219 | 0.57016 |
| O | 1 | 7.51% | 2.32717 | 0.75688 |
| P | 0 | 1.93% | 0.59799 | 0.59799 |
| Q | 0 | 0.10% | 0.02945 | 0.02945 |
| R | 1 | 5.99% | 1.85597 | 0.39477 |
| S | 2 | 6.33% | 1.96137 | 0.00076 |
| T | 5 | 9.06% | 2.80736 | 1.71252 |
| U | 0 | 2.76% | 0.85498 | 0.85498 |
| V | 1 | 0.98% | 0.30318 | 1.60155 |
| W | 2 | 2.36% | 0.73160 | 2.19907 |
| X | 0 | 0.15% | 0.04650 | 0.04650 |
| Y | 0 | 1.97% | 0.61194 | 0.61194 |
| Z | 0 | 0.07% | 0.02294 | 0.02294 |
| **Total** | 31 | 100.029% | 31.00899 | 51.92133 |

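\* Expected Count = FREQ / 100 × LEN, where FREQ is the letter's frequency in English (in percent) and LEN is the text length. The frequencies are shown rounded to two decimal places; the expected counts were computed from more precise values (for example, 8.167% for A).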
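The worked example can be reproduced with a short Python sketch. The frequency list is an assumption: standard English letter frequencies chosen because they reproduce the Expected Count column above (for example, 6.996% for I gives 2.16876). Characters other than A to Z are ignored, and `chi_squared_rows` is a hypothetical helper name.

```python
from collections import Counter
from string import ascii_uppercase

# Assumed English letter frequencies in percent, A..Z. These values reproduce
# the Expected Count column of the table (e.g. I = 6.996% gives 2.16876).
ENGLISH_FREQ = dict(zip(ascii_uppercase, [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.996, 0.153,
    0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987, 6.327, 9.056,
    2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]))

def chi_squared_rows(text, freq=ENGLISH_FREQ):
    """Return one (letter, O_i, E_i, contribution) row per letter, plus the total."""
    letters = [c for c in text.upper() if c in freq]  # ignore spaces, punctuation
    if not letters:
        raise ValueError("text contains no letters A-Z")
    length = len(letters)
    counts = Counter(letters)
    rows = []
    for letter in ascii_uppercase:
        observed = counts[letter]
        expected = freq[letter] / 100 * length  # Expected Count = FREQ / 100 x LEN
        rows.append((letter, observed, expected, (observed - expected) ** 2 / expected))
    return rows, sum(row[3] for row in rows)

rows, total = chi_squared_rows("WHENTHECLOCKSTRIKESTWELVEATTACK")
for letter, observed, expected, contribution in rows:
    print(f"{letter}  {observed}  {expected:.5f}  {contribution:.5f}")
print(f"Total  {total:.5f}")
```

Printed to five decimal places, the rows should match the table above up to rounding. The single letter K contributes more than half of the total, because three K's were observed where fewer than one was expected.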

For English, a Chi-Squared value of about 150 or less is expected; anything above that most likely does not resemble English.

- WHENTHECLOCKSTRIKESTWELVEATTACK: $X^{2}$ = 51.92133
- THWKEEVIWTETSCANHKERCTTAKSCLLOE: $X^{2}$ = 51.92133
- ZDXPLXTDOWXSWCRSGPWVVOCWEOTTXOK: $X^{2}$ = 425.59631

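As you can see, English text scores low. However, the score is independent of letter order, so rearranging the letters of an English text (as in the second example) gives exactly the same score, while random text scores much higher.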
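A sketch of the comparison above in Python, assuming the same frequency values as the table and treating the rough cut-off of 150 as a fixed threshold. The helper name `chi_squared_english` and the `THRESHOLD` constant are illustrative. Because the score is built only from letter counts, the first two texts, which contain exactly the same letters, produce identical scores.

```python
from collections import Counter
from string import ascii_uppercase

# Assumed English letter frequencies in percent, A..Z (same values as the table).
FREQ_PERCENT = [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.996, 0.153,
    0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987, 6.327, 9.056,
    2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]
THRESHOLD = 150  # rough cut-off for English quoted in the text

def chi_squared_english(text):
    """Chi-squared score of the letters in `text` against FREQ_PERCENT."""
    letters = [c for c in text.upper() if c in ascii_uppercase]
    if not letters:
        raise ValueError("text contains no letters A-Z")
    counts = Counter(letters)  # only the counts matter, not the letter order
    n = len(letters)
    score = 0.0
    for letter, pct in zip(ascii_uppercase, FREQ_PERCENT):
        expected = pct / 100 * n
        score += (counts[letter] - expected) ** 2 / expected
    return score

for text in ("WHENTHECLOCKSTRIKESTWELVEATTACK",
             "THWKEEVIWTETSCANHKERCTTAKSCLLOE",
             "ZDXPLXTDOWXSWCRSGPWVVOCWEOTTXOK"):
    score = chi_squared_english(text)
    verdict = "plausibly English" if score <= THRESHOLD else "unlikely to be English"
    print(f"{text}  X^2 = {score:.5f}  ({verdict})")
```

The first two scores should match the list above up to rounding, and the third should land very close to the listed value, well above the threshold.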

I have created an Excel spreadsheet that calculates the Chi-Squared statistic when given the letter frequencies. It does not use macros: Chi-Squared Calculator
