The Chi-Squared Statistic is a measure of how much two categorical distributions differ from one another. For two identical distributions the score is 0, and as the distributions begin to differ the score increases. The formula is:

\chi^{2} = \sum_{i=A}^{Z} \frac{(O_{i} - E_{i})^{2}}{E_{i}}

O_{i} is the observed count of letter i in your text.

E_{i} is the expected count of letter i in English text of the same length as your text.

In words, the Chi-Squared Statistic is “the sum, over each letter, of the squared difference between the observed count and the expected count, divided by the expected count.”
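The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `chi_squared` and the toy counts are assumptions made for the example.

```python
def chi_squared(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories.

    Hypothetical helper: both arguments are equal-length sequences of
    counts, and every expected count must be greater than zero.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Identical distributions score 0; the score grows as they diverge.
same = chi_squared([10, 20, 30], [10, 20, 30])
different = chi_squared([12, 18, 30], [10, 20, 30])  # 4/10 + 4/20, about 0.6
print(same, different)
```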

Example: ‘WHENTHECLOCKSTRIKESTWELVEATTACK’ Text length = 31

Letter | Observed Count (O_{i}) | Frequency in English | Expected Count (E_{i})* | (O_{i} - E_{i})^{2}/E_{i} |

A | 2 | 8.17% | 2.53177 | 0.11169 |

B | 0 | 1.49% | 0.46252 | 0.46252 |

C | 3 | 2.78% | 0.86242 | 5.29817 |

D | 0 | 4.25% | 1.31843 | 1.31843 |

E | 5 | 12.70% | 3.93762 | 0.28663 |

F | 0 | 2.23% | 0.69068 | 0.69068 |

G | 0 | 2.02% | 0.62465 | 0.62465 |

H | 2 | 6.09% | 1.88914 | 0.00651 |

I | 1 | 7.00% | 2.16876 | 0.62985 |

J | 0 | 0.15% | 0.04743 | 0.04743 |

K | 3 | 0.77% | 0.23932 | 31.84587 |

L | 2 | 4.03% | 1.24775 | 0.45352 |

M | 0 | 2.41% | 0.74586 | 0.74586 |

N | 1 | 6.75% | 2.09219 | 0.57016 |

O | 1 | 7.51% | 2.32717 | 0.75688 |

P | 0 | 1.93% | 0.59799 | 0.59799 |

Q | 0 | 0.10% | 0.02945 | 0.02945 |

R | 1 | 5.99% | 1.85597 | 0.39477 |

S | 2 | 6.33% | 1.96137 | 0.00076 |

T | 5 | 9.06% | 2.80736 | 1.71252 |

U | 0 | 2.76% | 0.85498 | 0.85498 |

V | 1 | 0.98% | 0.30318 | 1.60155 |

W | 2 | 2.36% | 0.73160 | 2.19907 |

X | 0 | 0.15% | 0.04650 | 0.04650 |

Y | 0 | 1.97% | 0.61194 | 0.61194 |

Z | 0 | 0.07% | 0.02294 | 0.02294 |

Total | 31 | 100.03% | 31.00899 | 51.92133 |

* Expected Count = FREQ / 100 × LEN
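As a rough sketch, the Observed Count and Expected Count columns can be reproduced as follows. The helper name `expected_count` is an assumption, and the percentages are the rounded values shown in the table, so the results agree with the Expected Count column only to about two or three decimal places.

```python
from collections import Counter

text = "WHENTHECLOCKSTRIKESTWELVEATTACK"

# Observed counts O_i: how often each letter actually occurs in the text.
# A Counter returns 0 for letters that never appear.
observed = Counter(text)


def expected_count(freq_percent, length):
    """Expected Count = FREQ / 100 * LEN (hypothetical helper)."""
    return freq_percent / 100 * length


print(len(text))                         # text length
print(observed["E"], observed["K"])      # observed counts for E and K
print(expected_count(12.70, len(text)))  # roughly 3.94 for E
print(expected_count(0.77, len(text)))   # roughly 0.24 for K
```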

For English text, a Chi-Squared value of about 150 or less is expected; anything above that likely does not resemble English.

WHENTHECLOCKSTRIKESTWELVEATTACK, χ^{2} = 51.92133

THWKEEVIWTETSCANHKERCTTAKSCLLOE, χ^{2} = 51.92133

ZDXPLXTDOWXSWCRSGPWVVOCWEOTTXOK, χ^{2} = 425.59631

As you can see, English text scores low, while random text scores much higher. Note that the score is independent of letter order, so a scrambled version of the English text scores exactly the same as the original.
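The comparison of the three texts can be checked with a short Python sketch. The function name `chi_squared_english` is an assumption, and the frequency table uses the rounded percentages from the table above, so the scores come out close to, but not exactly equal to, the values quoted.

```python
from collections import Counter

# English letter frequencies in percent, as rounded in the table above.
ENGLISH_FREQ = {
    "A": 8.17, "B": 1.49, "C": 2.78, "D": 4.25, "E": 12.70, "F": 2.23,
    "G": 2.02, "H": 6.09, "I": 7.00, "J": 0.15, "K": 0.77, "L": 4.03,
    "M": 2.41, "N": 6.75, "O": 7.51, "P": 1.93, "Q": 0.10, "R": 5.99,
    "S": 6.33, "T": 9.06, "U": 2.76, "V": 0.98, "W": 2.36, "X": 0.15,
    "Y": 1.97, "Z": 0.07,
}


def chi_squared_english(text):
    """Chi-squared score of text against English letter frequencies."""
    # Only the letters A-Z are scored; everything else is ignored.
    letters = [c for c in text.upper() if c in ENGLISH_FREQ]
    if not letters:
        raise ValueError("text contains no letters")
    counts = Counter(letters)
    score = 0.0
    for letter, freq in ENGLISH_FREQ.items():
        expected = freq / 100 * len(letters)
        score += (counts[letter] - expected) ** 2 / expected
    return score


plain = "WHENTHECLOCKSTRIKESTWELVEATTACK"
scrambled = "THWKEEVIWTETSCANHKERCTTAKSCLLOE"    # same letters, new order
random_text = "ZDXPLXTDOWXSWCRSGPWVVOCWEOTTXOK"

for t in (plain, scrambled, random_text):
    print(t, round(chi_squared_english(t), 2))
```

The first two texts receive the same score because only the letter counts enter the sum, not the positions of the letters.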

I have created an Excel spreadsheet that can calculate Chi-Squared when given the frequencies of letters. It does not use macros. Chi-Squared Calculator

Justin Time: Best explanation I’ve found. Thanks.