Chi-Squared Statistic

The Chi-Squared Statistic is a measure of how much two categorical distributions differ from one another. For two identical distributions the score is 0, and the score increases as the distributions diverge. The formula is:

X^{2}=\sum_{i=A}^{Z}\frac{(O_{i}-E_{i})^{2}}{E_{i}}
O_i is the observed count of letter i in your text.
E_i is the expected count of letter i in an English text of the same length as yours.

In words, the Chi-Squared Statistic is “the sum, over every letter, of the squared difference between the observed count and the expected count, divided by the expected count.”

Example: ‘WHENTHECLOCKSTRIKESTWELVEATTACK’ Text length = 31

Letter, Observed Count (Oi), Frequency in English, Expected Count (Ei)*, (Oi - Ei)^2/Ei
A 2 8.17% 2.53177 0.11169
B 0 1.49% 0.46252 0.46252
C 3 2.78% 0.86242 5.29817
D 0 4.25% 1.31843 1.31843
E 5 12.70% 3.93762 0.28663
F 0 2.23% 0.69068 0.69068
G 0 2.02% 0.62465 0.62465
H 2 6.09% 1.88914 0.00651
I 1 7.00% 2.16876 0.62985
J 0 0.15% 0.04743 0.04743
K 3 0.77% 0.23932 31.84587
L 2 4.03% 1.24775 0.45352
M 0 2.41% 0.74586 0.74586
N 1 6.75% 2.09219 0.57016
O 1 7.51% 2.32717 0.75688
P 0 1.93% 0.59799 0.59799
Q 0 0.10% 0.02945 0.02945
R 1 5.99% 1.85597 0.39477
S 2 6.33% 1.96137 0.00076
T 5 9.06% 2.80736 1.71252
U 0 2.76% 0.85498 0.85498
V 1 0.98% 0.30318 1.60155
W 2 2.36% 0.73160 2.19907
X 0 0.15% 0.04650 0.04650
Y 0 1.97% 0.61194 0.61194
Z 0 0.07% 0.02294 0.02294
Total 31 100.029% 31.00899 51.92133

* Expected Count = FREQ / 100 × LEN
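
The calculation in the table can be sketched in a few lines of Python. This is only an illustration: the names `chi_squared` and `ENGLISH_FREQ_PERCENT` are made up for this example, and the percentages are the rounded values shown in the table. The table's expected counts were evidently computed from more precise frequencies, so the result should land close to 51.92133 without matching it exactly.

```python
from collections import Counter

# Letter frequencies in percent, copied from the table above.
# These are rounded to two decimals, so results differ slightly from the table.
ENGLISH_FREQ_PERCENT = {
    "A": 8.17, "B": 1.49, "C": 2.78, "D": 4.25, "E": 12.70, "F": 2.23,
    "G": 2.02, "H": 6.09, "I": 7.00, "J": 0.15, "K": 0.77, "L": 4.03,
    "M": 2.41, "N": 6.75, "O": 7.51, "P": 1.93, "Q": 0.10, "R": 5.99,
    "S": 6.33, "T": 9.06, "U": 2.76, "V": 0.98, "W": 2.36, "X": 0.15,
    "Y": 1.97, "Z": 0.07,
}


def chi_squared(text, freq_percent=ENGLISH_FREQ_PERCENT):
    """Return the Chi-Squared score of `text` against `freq_percent`."""
    # Keep only the letters we have frequencies for (ignores spaces, digits, etc.).
    letters = [c for c in text.upper() if c in freq_percent]
    length = len(letters)
    if length == 0:
        raise ValueError("text contains no letters to score")

    observed = Counter(letters)
    total = 0.0
    for letter, pct in freq_percent.items():
        expected = pct / 100 * length  # Expected Count = FREQ / 100 x LEN
        total += (observed[letter] - expected) ** 2 / expected
    return total


score = chi_squared("WHENTHECLOCKSTRIKESTWELVEATTACK")
print(round(score, 2))  # roughly 52 with these rounded frequencies
```

Letters that never appear in the text still contribute to the sum: their observed count is 0, so each adds exactly its expected count, as in the B, D and F rows of the table.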

For English text, a Chi-Squared value of about 150 or less is expected; anything above that likely does not resemble English.

WHENTHECLOCKSTRIKESTWELVEATTACK, X^2 = 51.92133
THWKEEVIWTETSCANHKERCTTAKSCLLOE, X^2 = 51.92133
ZDXPLXTDOWXSWCRSGPWVVOCWEOTTXOK, X^2 = 425.59631

As you can see, English text scores low and random text scores much higher. However, the score is independent of letter order, so an anagram of English text scores exactly the same as the original.
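
The first two example texts score identically because the statistic only ever looks at letter counts, never at positions. A quick way to check this, sketched here with Python's standard `collections.Counter` (the variable names are just for illustration), is to compare the counts directly:

```python
from collections import Counter

plain = "WHENTHECLOCKSTRIKESTWELVEATTACK"
shuffled = "THWKEEVIWTETSCANHKERCTTAKSCLLOE"

# Two texts with identical letter counts have identical observed counts (Oi)
# and the same length, hence the same expected counts (Ei) and the same score.
same_counts = Counter(plain) == Counter(shuffled)
print(same_counts)  # True
```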

I have created an Excel spreadsheet that can calculate Chi-Squared when given the frequencies of letters. It does not use macros. Chi-Squared Calculator
