NYT Crossword Data

Python / Selenium / Airtable

GitHub repo →

I’ve always liked doing crossword puzzles, but I only began doing them consistently in 2022. After doing the NYT crossword (nearly) every day for about a year and a half, I was curious about how my solve times were improving. I was setting new records for each weekday somewhat regularly (puzzles get harder as the week progresses, so Monday solve times will be shorter than Tuesday solve times on average, etc.), and it seemed like my average solve times were going down, but I wanted to look at the actual data to see the trend.

I can hear the jingle just looking at this screenshot

I emailed NYT Games to see if they could share my data with me, but unfortunately the answer was no. So, I wrote a Python script using Selenium to scrape and export my solve time for every crossword since I began doing them daily in April of 2022.

Then, I wrote another set of scripts to find, as of every day in that period, my running average for that weekday and whether a new record was set. One script requires an Airtable account; the other just exports the data as a CSV.

Code

For more details / instructions or to view the code itself, see GitHub.

This set of scripts is meant to be run on your local machine. Used together, they let you:

  1. Scrape your crossword solve times from New York Times Games and export them as a CSV. I needed to use Selenium because each crossword’s page has a pop-up window with a “Play” button you need to click before the solve time element is loaded.
  2. Calculate running averages and solve time records for each day within a given time frame. Running averages and solve time records are calculated and stored for each day of the week separately (e.g. Monday will have its own set of running averages and solve time records). You can choose to calculate and store this data locally (exported as a CSV), or calculate and store it in Airtable so that you can easily visualize how your running solve time average has changed over time (hopefully trending downwards!) and when new records were set.

get_crossword_stats.py

  • Prompts for a date range for retrieving stats
  • Opens up a Chrome window controlled by Selenium
  • Waits for you to log in manually using your Apple or Facebook account (if you try to log in with your email/password, the site will detect that a “robot” is trying to log in)
  • Redirects to the page for the first puzzle in the date range you entered
  • For each date in your set date range, scrapes your solve time and exports to a CSV
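
Here’s a minimal sketch of how those steps could fit together. It is not the code from the repo: the puzzle URL pattern and both CSS selectors are assumptions (the selectors are placeholders you would need to replace after inspecting the page), and the helper names are made up for illustration. Selenium is imported inside the functions that need it, so the date and time helpers can be used on their own.

```python
import csv
from datetime import date, timedelta

# Assumed URL pattern for a daily puzzle page; check it against the live site.
PUZZLE_URL = "https://www.nytimes.com/crosswords/game/daily/{:%Y/%m/%d}"

# Hypothetical selectors: placeholders only, inspect the page to find real ones.
PLAY_BUTTON_SELECTOR = "button.play-button-placeholder"
TIMER_SELECTOR = ".timer-placeholder"


def puzzle_urls(start, end):
    """Yield (date, url) for every day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day, PUZZLE_URL.format(day)
        day += timedelta(days=1)


def parse_solve_time(text):
    """Convert 'M:SS' or 'H:MM:SS' into total seconds."""
    seconds = 0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def scrape_solve_time(driver, url, timeout=10):
    """Open a puzzle page, dismiss the Play pop-up, and read the timer."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    wait = WebDriverWait(driver, timeout)
    # The solve time only loads after the Play button is clicked.
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, PLAY_BUTTON_SELECTOR))).click()
    timer = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, TIMER_SELECTOR)))
    return parse_solve_time(timer.text)


def run(start, end, out_path):
    """Scrape every date in the range and write date,seconds rows to a CSV."""
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("https://www.nytimes.com/crosswords")
    input("Log in manually in the browser window, then press Enter here...")
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "solve_time_seconds"])
        for day, url in puzzle_urls(start, end):
            writer.writerow([day.isoformat(), scrape_solve_time(driver, url)])
    driver.quit()
```

Calling `run(date(2022, 4, 4), date(2023, 10, 11), "solve_times.csv")` would cover the period discussed in this post, assuming the selectors have been filled in.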

calculate_running_averages_and_records_to_csv.py

  • Opens your solve times CSV and saves it as a new file, then for each day in the given range:
    • Calculates/stores your running average for that day of week
    • Determines whether you set a new record
    • Stores your current record for that day of the week
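
The per-weekday bookkeeping described above can be sketched in a few lines of plain Python. This is an illustration rather than the repo’s implementation: the function and column names are hypothetical, and it assumes that the first solve for a weekday counts as a record and that a tie does not.

```python
import csv
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def add_running_stats(solves):
    """Take (date, seconds) pairs and return one stats dict per solve, in date order."""
    totals = defaultdict(int)  # sum of solve times seen so far, per weekday
    counts = defaultdict(int)  # number of solves seen so far, per weekday
    records = {}               # fastest solve seen so far, per weekday
    rows = []
    for day, seconds in sorted(solves):
        weekday = WEEKDAYS[day.weekday()]
        totals[weekday] += seconds
        counts[weekday] += 1
        previous = records.get(weekday)
        # Assumption: the first solve is a record, and ties are not.
        is_record = previous is None or seconds < previous
        if is_record:
            records[weekday] = seconds
        rows.append({
            "date": day.isoformat(),
            "weekday": weekday,
            "solve_time_seconds": seconds,
            "running_average_seconds": totals[weekday] / counts[weekday],
            "new_record": is_record,
            "current_record_seconds": records[weekday],
        })
    return rows


def write_stats_csv(rows, path):
    """Write the stats rows to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Keeping a running total and count per weekday means each new solve updates its average in constant time, without rereading earlier rows.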

calculate_running_averages_and_records_airtable.py

I knew I wanted my data in Airtable for analysis and long-term storage, so this script just does the same thing as the last one, but instead of exporting a CSV it updates the data in Airtable directly.
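
As a rough idea of what sending rows to Airtable involves, here is a sketch that uses only the standard library and Airtable’s REST API, which takes a body shaped like `{"records": [{"fields": {...}}]}` with at most 10 records per request. The column names, function names, and the choice to create new records (rather than update existing ones, which the actual script may do) are all assumptions.

```python
import json
import urllib.parse
import urllib.request

AIRTABLE_BATCH_LIMIT = 10  # Airtable accepts at most 10 records per create request


def to_airtable_payloads(rows, batch_size=AIRTABLE_BATCH_LIMIT):
    """Yield request bodies, each holding up to batch_size records."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        yield {
            "records": [
                {
                    "fields": {
                        # Hypothetical column names; match them to your own base.
                        "Date": row["date"],
                        "Weekday": row["weekday"],
                        "Running Average (s)": row["running_average_seconds"],
                        "New Record": row["new_record"],
                    }
                }
                for row in batch
            ]
        }


def post_payload(payload, base_id, table_name, token):
    """Send one batch to Airtable and return the parsed JSON response."""
    url = f"https://api.airtable.com/v0/{base_id}/{urllib.parse.quote(table_name)}"
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the payloads separately from sending them makes it easy to inspect a batch before anything is written to your base.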

Data

With all my data in Airtable, I was able to visualize the trends I had been interested in. Here’s the big picture — my running average solve time for every day of the week from April 4, 2022 through October 11, 2023:

https://images.spr.so/cdn-cgi/imagedelivery/j42No7y-dcokJuNgXeA0ig/5f5a8260-df26-4324-9221-18475d0a5335/all_days_running_average/w=1920,quality=80

Clearly, the NYT does a good job increasing the difficulty by day of week — after just a month of crosswords (~4 of each day of the week), my average solve times settled into a neat order from fastest (Mondays) to slowest (Sundays).

In the chart above, the easier days look pretty flat. But when they’re isolated, there’s a clear downward trend. Here’s my running average for Mondays:

https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F78cf2b16-8db5-48b7-a774-101f060e9da2%2Fac5a1441-d833-47b0-b596-672c5cc843bd%2Fmonday_running_average.png?id=b03ebcd1-413c-4491-a365-692665cd29ee&table=block

With easy days like Monday and Tuesday, even when I’m consistently below my average, it takes a while to come down. Here’s my running average for Mondays with individual solve times in gray:

[Image: Monday running average with individual solve times in gray]

Though I’m consistently solving below my average time, it’s tough to bring it down much more: even at my best on an easier-than-average Monday puzzle, I can only enter the answers so fast. Since July, my average Monday solve time has been around 5 minutes, and only two solves have been slow enough to raise my running average. Yet my all-time average only dropped from 6:32 to 6:14, because it weighs every Monday since April 2022 equally and each new solve is only a small fraction of the total.
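
The slow drift follows from how a running average is computed: each new solve is weighed against every earlier one. The snippet below works through the arithmetic. Only the 6:32 starting average comes from my data; the counts of 65 earlier Mondays and 15 recent Mondays at exactly 5:00 are round numbers assumed for illustration.

```python
def format_time(seconds):
    """Format a number of seconds as M:SS, rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


def updated_average(old_average, old_count, new_times):
    """Fold new solve times into an existing running average."""
    total = old_average * old_count + sum(new_times)
    return total / (old_count + len(new_times))


old_average = 6 * 60 + 32     # 6:32, the average before the recent stretch
old_count = 65                # assumed number of earlier Monday solves
recent_solves = [5 * 60] * 15 # assumed: 15 recent Mondays at 5:00 each

new_average = updated_average(old_average, old_count, recent_solves)
print(format_time(new_average))  # 6:15
```

Even with every recent solve a minute and a half under the old average, the all-time figure moves by less than 20 seconds, because 15 solves are weighed against 65.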

For harder days like Friday and Saturday (leaving out Sunday since those are larger grids, not necessarily higher difficulty), there’s more variability overall, and it’s more common to have high outliers (I sometimes get stuck for 10+ minutes with 3-4 squares left). Here’s my Saturday running average:

https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F78cf2b16-8db5-48b7-a774-101f060e9da2%2Fe5864e86-d7b5-466b-9cef-778a4c5b8459%2Fsaturday_running_average.png?id=c56b7777-b2ec-4019-9edf-e6e32b22c60c&table=block

It’s not quite as smooth a downward trend, and that’s even clearer when looking at the running average with individual solve times in gray:

[Image: Saturday running average with individual solve times in gray]

For Saturdays it’s not too uncommon, even recently, for me to swing from almost 60 minutes back down to around 10-15 minutes within the span of a few weeks.

The last thing I was interested in was how often I was setting new records. When it happens for any particular puzzle, it’s usually due to some combination of the following factors:

  • I’m getting faster on average
  • The puzzle is easier than usual for the day of the week, for anyone solving it
  • A few long or otherwise important clues came to me quickly for some reason

Since those last two reasons are pretty much luck, it can take a while to break records even when I’m gradually getting faster on average.

Here’s a look at my records for every day of week since April 2022:

https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F78cf2b16-8db5-48b7-a774-101f060e9da2%2F3d7f0fca-b373-4b57-a715-f97009f3a85b%2Fall_days_records.png?id=2078566a-0daa-4dfd-981c-96e2f8ad3ce1&table=block

It’s a little neater if we just look at the past year, from October 12, 2022 through October 11, 2023:

https://images.spr.so/cdn-cgi/imagedelivery/j42No7y-dcokJuNgXeA0ig/f8a18cc9-9b22-42ea-b78f-a35dcd7839de/all_days_records_past_year/w=1920,quality=80

I was surprised to see that despite my running solve time average consistently improving, for some days, my record has gone unbroken for a long time. It’s been about 11 months since I set a new Thursday record and 7 months since I last set a new Sunday record. Here are all my records for those days:

https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F78cf2b16-8db5-48b7-a774-101f060e9da2%2F24bf8faf-9c03-4ada-854e-5567e0d47ca1%2Fsunday_records_data.png?id=04e756ca-4663-408d-9439-82a1ad6e92c3&table=block
https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F78cf2b16-8db5-48b7-a774-101f060e9da2%2Fa061171a-1ebb-4dbe-a2af-4a2fd83328ab%2Fthursday_records_data.png?id=38577efc-5512-49df-84d7-5f5b328950dd&table=block

And my most recent new record (Tuesday, October 10) was my first new record in over 2 months, for any day of the week:

https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F78cf2b16-8db5-48b7-a774-101f060e9da2%2F5cfe8448-a0ca-4f31-918d-0bf230348d21%2Fall_days_recent_records_data.png?id=aea0ae11-aa7d-4a73-beb9-aa74e8145618&table=block

Why do this?

Just like the crosswords themselves, gathering and analyzing this data was mostly just for fun, to see if anything interesting popped out. But now that I’ve seen the data, it feels good to know that I’ve been making progress over the past year and a half of puzzles, and it motivates me to keep up my streak.