If you’ve been on social media this month, you may have seen a curious sight cross your feed. It’s a grid of yellow, green, and black or white squares with the term “Wordle” across the top. By this point, those in the know can nearly parse those Wordle grids as a type of language.
But how can you use math to come up with the best strategy for this viral word game? And how can you use that knowledge to come up with the best-possible starting words? Put on your thinking cap and prepare for a quick-and-dirty Wordle lesson.
🔡 What Is Wordle?
Whether or not you guess the right word within six guesses, the game generates a grid of emoji squares that you can copy and paste into any social media site in a snap (check out the image at the top of this section). All of these factors make Wordle super shareable, which is part of why you’ve probably started to see it everywhere. It’s a great design, all made by Brooklyn, New York-based software engineer Josh Wardle.
🔡 What Are the Best Starting Words To Use in Wordle?
Okay, so the game is good and its creator made cool choices in the design, but what we’re really here to do is get down to brass tacks (both valid Wordles) and explore the strategy of designing good Wordle guesses.
I’ve been studying Wordle for about a week now, from the specific perspective of a person who loves to play and design word puzzles of all kinds; I’m sort of a low-level aficionado. If all you know about word games is the rules for the occasional living room Scrabble tournament (where people often introduce arbitrary rules like “You have to be able to define the word”), you may not realize that Scrabble and similar word-based games have rich and complex strategies that are a lot more about math than the size of your vocabulary.
So for me, Wordle immediately seemed like a strategy puzzle, and thankfully for us, those can be mathematized. First, I had to choose a dataset to study. Wordle itself has two libraries: one of about 2,500 words that are possible Wordle solutions, and one of about 10,000 words that can be used as guesses. This means Wordle both draws its answers from a relatively small pool and is intentionally designed to use more common words as the solutions; more obscure words are pretty much reserved as guesses on the path toward the solution.
Here are my starter Wordle guesses: ALTER, BISON, and DUCHY.
🔡 Which Letters Appear Most Frequently in Wordle?
Now, let’s talk about how I arrived at those three starting words.
To get a more-or-less straightforward sampling of natural English language, I turned to a novel that has just entered the U.S. public domain (meaning that copyright no longer applies to the work): Ernest Hemingway’s first novel, The Sun Also Rises, first published in 1926. This Modernist masterpiece is all-killer no-filler, with little slang and a relatively timeless take on American English. I dropped the whole text into a spreadsheet for calculation purposes.
Our first takeaway is about letter frequency. Here’s what the text bears out:
There are four vowels (everything but “U”) in the top ten most-common letters represented in the novel. The other six most-common letters are consonants: “T,” “N,” “H,” “S,” “R,” and “D.” Keen-eyed observers will note that this letter distribution is different from the “gimme” letters in the final challenge on the word-guessing TV game show Wheel of Fortune. Those letters are “R,” “S,” “T,” “L,” “N,” and “E.” I started by using guesses based on the Wheel letters, but they didn’t reflect how often consonants, in particular, really turn up in Wordle.
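If you'd rather not use a spreadsheet, the same kind of letter tally can be sketched in a few lines of Python. This is only a rough illustration, not the exact calculation behind the numbers above; the function name `letter_frequencies` and the placeholder sample string are made up for the example, and you would paste in the full text of the novel yourself.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, share) pairs, sorted from most to least common.

    Only the 26 unaccented letters A-Z are counted; case is ignored.
    """
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, count / total) for letter, count in counts.most_common()]

# Placeholder input (an assumption for this sketch): swap in the whole novel.
sample = "the sun also rises"
print(letter_frequencies(sample)[:10])
```

Reading off the first ten entries of the result gives you your own "top ten" list to compare against the Wheel of Fortune letters.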
From my data, I developed a slightly different set of guesses to use: ADULT, GRIME, and SHOWN. The goal was to cover as many of the most-common letters as possible. It turns out that across three guesses (and 15 different letters), you can cover the letters that make up nearly 90 percent of all the letters appearing in the text. At that point, I shared my theory on Medium, an online self-publishing platform, and let people take a look. A few let me know that my advice helped them improve at Wordle, which was a delight to hear.
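To check a coverage figure like that for any set of guesses, you can add up the frequency shares of the distinct letters the guesses contain. Here's a minimal sketch, assuming you have the source text as a plain string; `letter_shares` and `coverage` are hypothetical helper names, and the short sample text is a stand-in, so the number it prints will not match the one from the novel.

```python
from collections import Counter

def letter_shares(text):
    """Map each letter A-Z to its share of all letters in the text."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    total = len(letters)
    return {letter: count / total for letter, count in Counter(letters).items()}

def coverage(guesses, shares):
    """Sum the shares of every distinct letter used across the guesses."""
    covered = set("".join(guesses).upper())
    return sum(share for letter, share in shares.items() if letter in covered)

# Placeholder text (an assumption): replace it with the full novel.
shares = letter_shares("the sun also rises")
print(round(coverage(["ADULT", "GRIME", "SHOWN"], shares), 3))
```

Because repeated letters only count once, three guesses with 15 different letters between them will always score at least as well as guesses that reuse a letter.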
🔡 Account for Letter Pairings and Positions
Still, some commenters let me know that there were aspects of the game that my advice wasn’t accounting for. One of these, a linguist friend told me, is the idea of digraphs and trigraphs: pairs and trios, respectively, of letters that may go together more commonly than their individual stats suggest. This, he said, could explain the frequency of “H,” because it so often appears as “TH,” “CH,” or “SH.” The Guardian has a very cool linguistics-powered take on the game, if you’d like to read more about that.
I couldn’t feasibly study the combinations of letters this way (if you take a whack at it, let me know), but I could study another factor people mentioned to me: letter order. This makes intuitive sense, like the fact that “Y” usually shows up at the end of a word, or that “Q” usually appears at the beginning. To examine this further, I went back to my dataset of five-letter words from The Sun Also Rises.
🎭 The Great American Vs. Briton Debate
Britons had a meltdown when one of this week’s answers, FAVOR, was spelled the American way. But when you make a guess, the game does accept British spellings like ODOUR.
For each word’s letter breakdown, I added a note of its position in the word: first, second, third, fourth, or fifth. Then, I ranked them as percentages of the total appearances of each letter. Some, like Y, had runaway winners. Others had less-clear outcomes. In all cases, I highlighted the top one (or two, when there was no clear winner) and then recalculated my word list using these results.
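The position breakdown described above can be roughed out in code, too. This sketch assumes you already have a list of five-letter words pulled from the text; `position_shares` is a made-up name, and the three sample words are placeholders rather than the real dataset.

```python
from collections import Counter, defaultdict

def position_shares(words):
    """Map each letter to five shares, one per position (first to fifth).

    Each share is that position's count divided by the letter's total
    appearances across all five-letter words in the list.
    """
    counts = defaultdict(Counter)
    for word in words:
        word = word.upper()
        if len(word) != 5 or not word.isalpha():
            continue  # skip anything that is not a five-letter word
        for position, letter in enumerate(word):
            counts[letter][position] += 1

    shares = {}
    for letter, by_position in counts.items():
        total = sum(by_position.values())
        shares[letter] = [by_position[p] / total for p in range(5)]
    return shares

# Placeholder word list (an assumption for this sketch).
sample_words = ["HAPPY", "FUNNY", "YACHT"]
print(position_shares(sample_words)["Y"])
```

For each letter, the largest value in its list marks the position you would highlight as the winner; two values that sit close together are the "no clear winner" cases.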
Right away, the top words according to letter frequency are no longer the top words when we lump in letter order. Some words, like OCEAN, EXTRA, and OPERA, have no letters in their most frequently occurring positions. The most popular words with all five letters in the best places are WHITE, PONTE, WRITE, SAINT, and PLATE. Further down, there are ones like MONTH, GRANT, and BRUTE.
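One simple way to rank words like this is to count how many of a word's letters sit in that letter's most common position. The sketch below is a simplification of the approach described above: it keeps only a single top position per letter (no "top two" ties), and the names `best_positions` and `positional_score`, along with the tiny word list, are assumptions for illustration.

```python
from collections import Counter, defaultdict

def best_positions(words):
    """Map each letter to the position (0-4) where it appears most often."""
    counts = defaultdict(Counter)
    for word in words:
        for position, letter in enumerate(word.upper()):
            counts[letter][position] += 1
    # Simplification: ties go to whichever position was seen first.
    return {letter: by_pos.most_common(1)[0][0] for letter, by_pos in counts.items()}

def positional_score(word, best):
    """Count the letters of `word` that sit in their most common position."""
    return sum(
        1
        for position, letter in enumerate(word.upper())
        if best.get(letter) == position
    )

# Placeholder word list (an assumption): use your own five-letter words.
words = ["WHITE", "WRITE", "PLATE", "OCEAN"]
best = best_positions(words)
ranked = sorted(words, key=lambda w: positional_score(w, best), reverse=True)
print(ranked)
```

A score of five means every letter is in its best spot, and a score of zero means none are. With a word list this small the scores mostly reflect the list itself, so the ranking only becomes meaningful once you feed in a large set of words.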
So you have two philosophical options when you open Wordle each day. Well, I guess there are three options, really. First, you can be a complete improviser, choosing words that feel right each day. In that case, the advice here will help you shape those thoughts as you decide what to guess. Second, you can choose words based on letter commonality alone, which raises your odds of turning up yellow squares, though not necessarily green ones. And third, you can consider letter order in your guesses.