A little Word-le research

Judging by my little corner of the internet, Wordle – certainly since its takeover by the New York Times – is not being played as much as it was in the early part of 2022, when it began to attract the attention of media columnists (e.g. here and also here, in the same paper on the same day). Some players are certainly carrying on (and with a variety of philosophical and dialectical approaches) but I tend to see fewer posts based on coloured squares, and Wordle ‘trends’ less often. Perhaps that’s Twitter’s algorithm speaking, or perhaps people simply aren’t communicating about it as much as before (communication being one of the aims of its original inventor) – but, at this point, getting a seven-figure sum for the rights to the game looks a pretty good bit of business for Josh Wardle. A lesson in the importance of selling at a peak.

After a couple of trial goes, my interest as a researcher was piqued and, armed with my spreadsheets, I decided to use the daily game to test a few things, not least around the usage of letters in the English alphabet. Chiefly: what are the usage differences between five-letter words and all other words; and which opening word would maximise the chances of solving each game in as few attempts as possible? I used to have a boss who insisted that you knew the way any trade union ballot was going to go after the first 100 returns so, after 100 (actually 101) goes, I reckon I’ve got enough to tell me some answers.

Comparing my list of Wordle letter usages (built since early January) with a standard list available off the net (I used this one) tells me that letter usage in five-letter words varies little from that in other words. The five most common letters in Wordle – E, A, O, R, L/T – compare pretty well with the standard list (E, A, R, I, O) and, indeed, across the whole alphabet, Spearman’s Rank Correlation Coefficient stands at 0.93 – so the two lists correlate very highly. From the Wordle list, H and K are used a lot more than ‘normal’ but they’re anyway not particularly common letters. Within the top half of used letters in the alphabet, D is used a lot less in five-letter words, as also are N and I, while L appears a lot more frequently (often as a result of double usages – knoll, skill, swill, spill, allow and shall have all appeared in this first 100, as has lowly). E tops out at a fraction over 50 appearances – i.e. it appears in roughly every other Wordle – and frequently at the end of one (23 times – nearly one in four); and then there’s quite a gap to A (44) which, in contrast to E, appears at the end of a word only three times. The rest until L/T (35 each) follow with small gaps between each one, with S (33) quite close behind before a sizable fall to I (25), with H, C and N the only other letters scoring more than 20. P and U complete the top half of Wordle’s most used letters in the alphabet, which then descends to X, Q, Z and J, with the last of these being the final letter to break its duck.
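For anyone wanting to repeat the comparison outside a spreadsheet, here is a minimal sketch of Spearman’s rank correlation in Python. It is not the spreadsheet calculation used above: the function names are illustrative, and the small example covers only the five letters of the standard list’s top five rather than the whole alphabet. The Wordle tallies for A and I are quoted above; those for E, O and R are inferred from the word scores quoted later in the post, so treat them as assumptions. The standard list is represented only by its ordering (5 for the most common letter down to 1), because its actual frequencies are not reproduced here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values with the largest as rank 1; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (spread_x * spread_y)


letters = ["E", "A", "R", "I", "O"]          # the standard list's top five, in order
wordle_counts = [51, 44, 38, 25, 41]         # E, O and R inferred (assumption)
standard_order = [5, 4, 3, 2, 1]             # position only, not real frequencies

rho = spearman(wordle_counts, standard_order)
print(round(rho, 2))
```

On these five letters alone the coefficient comes out at about 0.7, lower than the 0.93 reported for the full alphabet, which is what you would expect from so short a list: one letter (I) moving from fourth place to fifth has a large effect. Tied tallies such as L and T (35 each) are handled by giving both the average of the ranks they span.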

With three vowels in the top five letters, it’s pretty hard to make a single five-letter word (that Wordle would accept as legitimate) from the most used letters – though maybe a cosmetics company might at one point usefully have decided to branch out into word games. Trialling a few different words and summing their letter counts across these 100 Wordles produces ‘rater’ and ‘treat’ as the top scorers (206 and 203, respectively), although doubling letters is not the best choice in a word game of this type (actually, 23 words of these 100 have double letters but the difficulty is picking which one to double – L, E, A, B, C, T, O, and even V, have all featured twice in one word). Using five different letters comes up with ‘store’, ‘steal’ and ‘roles’, each of which scores 198, but ‘store’ looks the most likely choice on the grounds that it also has an E at the end. So, I think on the basis of continued research, I’m going to switch from ‘raise’ (which scores 191 so far) to ‘store’ for future attempts. The advantage is small, but present. There may be another report.
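The scoring used here is simply the sum of each letter’s tally, with a doubled letter counted twice. A minimal sketch in Python follows; the function names are illustrative, not taken from any spreadsheet. The tallies for A, L, T, S and I are the ones quoted in the post; those for E, O and R are not stated outright and are inferred here from the quoted word scores (they are the only values that reproduce 206, 203, 198 and 191), so treat them as assumptions.

```python
# Letter tallies across the first 100-odd Wordle answers.
# A, L, T, S and I are quoted in the post; E, O and R are inferred from
# the quoted word scores and are therefore assumptions.
LETTER_COUNTS = {
    "E": 51, "A": 44, "O": 41, "R": 38,
    "L": 35, "T": 35, "S": 33, "I": 25,
}


def word_score(word, counts=LETTER_COUNTS):
    """Sum the tally of every letter in the word; a doubled letter counts twice."""
    # Letters missing from the table score 0, so a full comparison would
    # need tallies for all 26 letters.
    return sum(counts.get(letter, 0) for letter in word.upper())


def rank_openers(words, counts=LETTER_COUNTS, distinct_only=False):
    """Return (word, score) pairs, best first; optionally drop repeated-letter words."""
    if distinct_only:
        words = [w for w in words if len(set(w)) == len(w)]
    scored = [(w, word_score(w, counts)) for w in words]
    return sorted(scored, key=lambda pair: -pair[1])


candidates = ["rater", "treat", "store", "steal", "roles", "raise"]
for word, score in rank_openers(candidates):
    print(word, score)
```

Passing `distinct_only=True` drops ‘rater’ and ‘treat’ and leaves the three words tied on 198 at the top. The score says nothing about letter position, so the preference for a word ending in E remains a separate judgment made on top of the totals.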

[Edit 30/4/22: ‘stare’ is marginally better than ‘store’, scoring 201 on yesterday’s counts, so ‘stare’ it is from now on.]

The system of picking common letters – depending on the outcome of ‘raise’, I try to use the second attempt to squeeze in T and N and, as required, either O or U – hasn’t yet let me down. I’ve not had an ‘X’ since ‘proxy’ on 18 January, a ‘run’ of 92 games, although I had a ten-day break in the middle and my current streak thus stands at 47. My average score – omitting two Xs – is 3.77 (and it has been as low as 3.69), i.e. somewhat more towards ‘splendid’ although with a dose of ‘impressive’, too. Where the system does cause problems is when the answer features uncommon letters – ‘proxy’ being a good example: R and O were in place after my second go, but my remaining guesses all featured much more common letter combinations than P, X and Y (‘broom’ being my sixth and final go). Once the core of the word is in place, all the less common letters are as possible as each other, even if they look less likely.

And that’s not a bad thing, either in word games or indeed in life in general.
