Sudoku Difficulty

Why can I solve Diabolical Sudoku puzzles on one website but only Easy ones on another?

Research Paper Preprint on arXiv

Sudoku benchmark datasets from 3 popular websites available for evaluation

The Story

Two years ago, my Patti (grandmother in Tamil), Shyamala Vaidyanathan, visited from India. During her stay, I discovered how much she loved solving Sudoku puzzles. It had become part of her daily routine. Her neurologist had even encouraged it, since Sudoku is known to help with cognition and memory, which was especially important for her as she began experiencing the early stages of dementia.

Every day she would sit down to solve her puzzle. Sometimes she would make quick progress; other times she would pause for long stretches. But there was always joy in the process: it was an activity that was hers, that kept her engaged, and that gave her the satisfaction of finishing something challenging.

One afternoon, though, something unusual happened. I noticed she looked frustrated, almost upset, while working through her puzzle. When I asked why, she turned to me and asked a question:

“Why can I solve Diabolical level Sudoku puzzles on one website, but not Easy puzzles on another?”

That was the beginning of Project Patti.

Characterizing Sudoku Difficulty

How I Approached the Problem

To answer this question, I needed a way to define the difficulty of a single Sudoku puzzle. I started with two distinct methods, each of which can solve every Sudoku puzzle. From each method I derived a metric that summarizes how difficult a puzzle is to solve with that method. The two new metrics I propose for characterizing Sudoku difficulty are Clause Length Distribution and Nishio Human Cycles. The two solving methods are outlined below.

Clause Length Distribution

The first method is oriented toward computer solving: it converts a Sudoku puzzle into its corresponding Satisfiability (SAT) problem. The metric derived from it, SAT Clause Length Distribution, captures the structural complexity of a Sudoku puzzle, including the number of given digits and the positions of their cells.
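As a rough illustration of the idea, the sketch below encodes a 9x9 Sudoku as CNF clauses, simplifies them with the given digits, and tallies the lengths of the clauses that remain. It assumes a common textbook encoding (one Boolean variable per cell/digit pair, with "at least one" and "at most one" constraints for cells, rows, columns, and boxes) and a single simplification pass over the givens; the encoding and simplification used in the paper may differ. The names var, base_clauses, and clause_length_distribution are hypothetical.

```python
from collections import Counter
from itertools import combinations


def var(r, c, d):
    """Map (row, col, digit index), each 0-8, to a SAT variable id in 1..729."""
    return 81 * r + 9 * c + d + 1


def units():
    """Yield the 27 units (rows, columns, boxes) as lists of (row, col)."""
    for i in range(9):
        yield [(i, j) for j in range(9)]
        yield [(j, i) for j in range(9)]
        br, bc = 3 * (i // 3), 3 * (i % 3)
        yield [(br + dr, bc + dc) for dr in range(3) for dc in range(3)]


def base_clauses():
    """Clauses for an empty grid (assumed encoding), with duplicates removed."""
    groups = [[var(r, c, d) for d in range(9)]          # digits of one cell
              for r in range(9) for c in range(9)]
    groups += [[var(r, c, d) for r, c in unit]           # one digit in one unit
               for unit in units() for d in range(9)]
    clauses = set()
    for lits in groups:
        clauses.add(tuple(sorted(lits)))                 # at least one is true
        for a, b in combinations(lits, 2):
            clauses.add(tuple(sorted((-a, -b))))         # at most one is true
    return clauses


def clause_length_distribution(grid):
    """grid: 81 characters, '1'-'9' for givens, any other character for blanks."""
    if len(grid) != 81:
        raise ValueError("expected 81 characters")
    true_lits = set()
    for i, ch in enumerate(grid):
        if ch in "123456789":
            r, c = divmod(i, 9)
            for d in range(9):
                v = var(r, c, d)
                true_lits.add(v if d == int(ch) - 1 else -v)
    dist = Counter()
    for clause in base_clauses():
        if any(lit in true_lits for lit in clause):
            continue                                     # clause already satisfied
        # Drop literals made false by the givens; count what is left.
        dist[len([lit for lit in clause if -lit not in true_lits])] += 1
    return dist


# Usage: clause lengths for a grid with a single given (5 in the top-left cell).
print(sorted(clause_length_distribution("5" + "0" * 80).items()))
```

In this sketch, each given shortens or removes the clauses it touches, so both the number of givens and where they sit change the resulting histogram.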

Nishio Human Cycles

The second method is human-oriented: it simulates human Sudoku solvers by intertwining four popular Sudoku strategies within a backtracking algorithm called Nishio. The metric, Nishio Human Cycles, is computed by counting the number of times Sudoku strategies are applied within the backtracking iterations of a randomized Nishio, providing a measure of procedural difficulty.
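The general shape of such a solver can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses only two stand-in strategies (naked singles and hidden singles) in place of the four used in the study, it counts one "cycle" per digit placed by a strategy, and it guesses in a randomly chosen cell with the fewest candidates. The names propagate and nishio_cycles, and the counting rule itself, are hypothetical.

```python
import random

DIGITS = "123456789"
UNITS = ([[r * 9 + c for c in range(9)] for r in range(9)]
         + [[r * 9 + c for r in range(9)] for c in range(9)]
         + [[(br + dr) * 9 + bc + dc for dr in range(3) for dc in range(3)]
            for br in (0, 3, 6) for bc in (0, 3, 6)])
# Peers of a cell: every other cell sharing a row, column, or box with it.
PEERS = [set().union(*(u for u in UNITS if i in u)) - {i} for i in range(81)]


def candidates(cells, i):
    """Digits not yet used by any peer of cell i."""
    return set(DIGITS) - {cells[j] for j in PEERS[i]}


def propagate(cells, counter):
    """Apply the two strategies until stuck. Returns False on a contradiction."""
    progress = True
    while progress:
        progress = False
        for i in range(81):                      # naked singles
            if cells[i] != "0":
                continue
            cand = candidates(cells, i)
            if not cand:
                return False
            if len(cand) == 1:
                cells[i] = cand.pop()
                counter[0] += 1
                progress = True
        for unit in UNITS:                       # hidden singles
            present = {cells[i] for i in unit}
            for d in DIGITS:
                if d in present:
                    continue
                spots = [i for i in unit
                         if cells[i] == "0" and d in candidates(cells, i)]
                if not spots:
                    return False
                if len(spots) == 1:
                    cells[spots[0]] = d
                    counter[0] += 1
                    present.add(d)
                    progress = True
    return True


def nishio_cycles(grid, seed=0):
    """Return (solution or None, number of strategy applications)."""
    rng = random.Random(seed)
    counter = [0]

    def search(cells):
        if not propagate(cells, counter):
            return None
        blanks = [i for i in range(81) if cells[i] == "0"]
        if not blanks:
            return cells
        fewest = min(len(candidates(cells, i)) for i in blanks)
        cell = rng.choice([i for i in blanks
                           if len(candidates(cells, i)) == fewest])
        options = sorted(candidates(cells, cell))
        rng.shuffle(options)
        for d in options:                        # Nishio: guess, then test
            trial = cells[:]
            trial[cell] = d
            result = search(trial)
            if result is not None:
                return result
        return None

    solution = search(list(grid.replace(".", "0")))
    return ("".join(solution) if solution else None), counter[0]


# Usage with a widely circulated example puzzle ('.' marks a blank cell).
PUZZLE = ("53..7...." "6..195..." ".98....6." "8...6...3" "4..8.3..1"
          "7...2...6" ".6....28." "...419..5" "....8..79")
solution, cycles = nishio_cycles(PUZZLE, seed=1)
print(solution, cycles)
```

Because the guesses are randomized, the count can vary with the seed on puzzles that need guessing; averaging over several seeds is one way to get a more stable value.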

Using these two metrics, I analyzed more than a thousand Sudoku puzzles across five popular websites to characterize every difficulty level on each website. I evaluated the relationship between the proposed metrics and website-labeled difficulty levels using Spearman's rank correlation coefficient, finding strong correlations for 4 out of 5 websites. I also constructed a universal rating system using a simple, unsupervised classifier based on the two proposed metrics. This rating system is capable of classifying both individual puzzles and entire difficulty levels from the different Sudoku websites into three categories: Universal Easy, Universal Medium, and Universal Hard.
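For readers unfamiliar with Spearman's rank correlation, the sketch below shows how it can be computed between a website's difficulty labels (encoded as ordinal numbers) and a per-puzzle metric. It ranks both lists, giving tied values their average rank, and then takes the Pearson correlation of the ranks. The function names are hypothetical, and the numbers in the usage example are made-up placeholders, not values from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (sx * sy)


# Usage with placeholder data (NOT results from the study):
# ordinal difficulty labels (1 = easiest) and an invented metric value per puzzle.
levels = [1, 1, 1, 2, 2, 2, 3, 3, 3]
metric = [12, 15, 11, 20, 18, 25, 40, 33, 38]
print(round(spearman(levels, metric), 3))
```

Handling ties matters here because every puzzle in a difficulty level shares the same label, so the label list consists almost entirely of tied values.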

See the full preprint of the research paper on arXiv here: Project Patti: Why can You Solve Diabolical Puzzles on one Sudoku Website but not Easy Puzzles on another Sudoku Website?

Sudoku Benchmark Datasets from 3 popular websites

Dataset Overview

For this study, I collected puzzles from five popular Sudoku websites, along with the difficulty level that each website assigned to each puzzle. The websites are:

  • Sudoku.org.uk
  • Extreme Sudoku
  • Sudoku of the Day UK
  • Sudoku of the Day
  • New York Times

Most of the websites have an archive, including Sudoku.org.uk, Extreme Sudoku, and Sudoku of the Day UK. Some have a limited archive, such as Sudoku of the Day, and others, such as New York Times, have no archive. For websites with a limited archive or none, puzzles for each difficulty level were collected over an extended period to ensure an equal number of puzzles per difficulty level. The dataset consists of a total of 1320 puzzles, 60 from each difficulty level on each website.

I currently have permission from puzzlemakers for 3 of the 5 websites, shown below, to release the datasets publicly for academic use only. These datasets, a total of 900 puzzles, are now available for evaluation on GitHub:

Sudoku.org.uk
  • Difficulties: Gentle, Moderate, Tough, Diabolical
  • Number of Puzzles: 240
  • Collection Dates: Apr 15, 2024 – Jun 5, 2025
  • Archive Availability: Publicly available
  • Link: https://sudoku.org.uk/
Extreme Sudoku
  • Difficulties: Evil, Excessive, Egregious, Excruciating, Extreme
  • Number of Puzzles: 300
  • Collection Dates: Dec 5, 2024 – Feb 2, 2025
  • Archive Availability: Publicly available
  • Link: https://www.extremesudoku.info/
Sudoku of the Day
  • Difficulties: Beginner, Easy, Medium, Tricky, Fiendish, Diabolical
  • Number of Puzzles: 360
  • Collection Dates: Dec 17, 2024 – Feb 20, 2025
  • Archive Availability: Archive published for previous week
  • Link: https://www.sudokuoftheday.com/dailypuzzles

Important Note: This archive is intended for academic and research purposes only.

Sudoku Difficulty Research Talk at Adobe Applied Science Research Group

This talk shares how I approached my grandmother's question, the methods I developed to characterize Sudoku difficulty, and insights from my experiments and analysis. I also discuss the motivation behind the project and how it all started with my grandmother.

Acknowledgments

These are exchanges (email interactions) with puzzlemakers for some of the Sudoku websites and researchers who have done work in similar areas.

Extreme Sudoku Puzzlemaker

I corresponded with the Extreme Sudoku puzzlemaker, who validated the conclusions that I made about Extreme Sudoku, specifically that all the difficulty levels are relatively hard. He confirmed that all difficulty levels on the site are intentionally designed to require more advanced human strategies than those used in my study, explaining why the site does not follow the normal trend of increasing difficulty. He also noted that Nishio guessing is not considered an appropriate way to assess difficulty within their philosophy of puzzle design, supporting why my Nishio metric showed weak correlation for Extreme Sudoku (see page 11 of my paper).

Sudoku of the Day Puzzlemaker

The creator of Sudoku of the Day shared the detailed heuristic used to rate puzzle difficulty on the website, which evaluates puzzles based on both the complexity of human strategies required and the number of times each technique is applied. This directly validates one of my main conclusions: that Sudoku difficulty depends not only on the sophistication of the strategies involved, but also on how often they are used in solving the puzzle. His heuristic can be found here.

Professor Maria Ercsey-Ravasz & Professor Zoltan Toroczkai

Authors of "The Chaos Within Sudoku"

Professors Ercsey-Ravasz and Toroczkai wrote the paper “The Chaos Within Sudoku”, which inspired me to do this study. Their work introduces a scalar difficulty metric using a deterministic continuous-time dynamical system solver applied to a Sudoku puzzle’s SAT formulation. Professor Toroczkai validated that my metrics are a step towards characterizing Sudoku difficulty and that my study on over 1300 puzzles supports their conclusions and results from the work in “The Chaos Within Sudoku”.

Llion Jones & Team

Sakana AI

In my interaction with Llion Jones, Co-Founder of Sakana AI, he validated my experiments on their nikoli_100 dataset, a Sudoku benchmark dataset which they wrote about in detail here, noting that my difficulty classifications aligned with the actual difficulty of the Sudoku puzzles. He confirmed that bridging the gap between computer and human solving methods remains a crucial challenge.

Professor David Eppstein

Professor of Computer Science, UC Irvine

Professor David Eppstein is a distinguished professor of Computer Science at the University of California, Irvine, and has conducted extensive research on Sudoku, including formal analysis of Nishio and of Sudoku as an NP-hard problem. Indeed, I first learnt about Nishio from his paper, found here. After sharing my research with him, I learnt that Professor Eppstein has an unpublished paper, which can be found here, in which he presents an intuition similar to the methodology in section 5.2 of my paper, Simulating Human Solver via Randomized Nishio with Human Strategies. In his email, Professor Eppstein expressed that human-oriented metrics and methods align better with true puzzle difficulty than purely computational approaches.

Coming Soon: Try It Yourself

This website is the next step in my project. Soon, you will be able to input any Sudoku puzzle and receive two pieces of information about that puzzle:

  • The puzzle’s Nishio Human Cycles value, providing a universal difficulty rating, allowing comparison of the puzzle’s difficulty to any other Sudoku puzzle, no matter what website it comes from.
  • Its Universal Difficulty Category, so you know whether it is truly easy, medium, or hard, regardless of what the website claims.

The goal is to help anyone, from casual to experienced solvers, find puzzles that are appropriately challenging and enjoyable, without unnecessary frustration.

About Me

My name is Arman Eisenkolb-Vaithyanathan and I am a junior at Lynbrook High School.