
Discover how to recreate a Sudoku app using browser APIs and ecosystem advancements, and explore the image processing pipeline used to extract and solve Sudoku puzzles.

Transcript

[0:00] Hey Everyone!
[0:01] A long time ago I wrote an app for the iPhone that let you grab a Sudoku puzzle using
[0:08] your iPhone’s camera.
[0:10] Recently when I was investigating my self organising Christmas lights project I realised
[0:16] that the browser APIs and ecosystem had advanced to the point where it was possible
[0:22] to recreate my Sudoku application running purely in the browser.
[0:27] As you can see it works pretty well - you can try it out for yourself using the links
[0:32] in the description and, as always, all the code is on GitHub.
[0:38] Hopefully, this video will give you a good idea of how the system works and the thinking behind what I’ve done.
[0:46] Let’s take a look under the hood…
[0:49] Let’s define our problem:
[0:52] Given an image, identify and extract the Sudoku puzzle, recognise the numbers in each square
[0:59] of the puzzle, solve the puzzle, and render the results on top of the original image.
[1:06] As a classically trained image processing guy, my first port of call is to remove redundant information.
[1:14] We don’t care about colours, so we’ll convert the image to greyscale.
[1:18] As you can see we’ve not really lost any information.
[1:22] Looking at the image, we only really care about paper and ink,
[1:26] so it makes sense to binarise the image, making each pixel black or white.
[1:31] Next we need to identify the blob that is the puzzle.
[1:35] Once we’ve got that we can extract the image of the puzzle and extract the contents of
[1:40] each box of the puzzle.
[1:41] Applying some OCR to the contents of each box, we can then solve the puzzle and our job is done.
[1:50] It’s a pretty straightforward image processing pipeline!
[1:54] Let’s dive into each of these steps in turn and see how they work.
[1:58] We’re taking a feed from the camera on the device.
[2:02] This comes into us as an RGB image.
[2:05] We’re not really interested in colour as we’re working with printed puzzles,
[2:09] which will typically be printed in black and white.
[2:13] So our first step is to convert from RGB to greyscale.
[2:18] If we look at the standard formula for this (https://en.wikipedia.org/wiki/Grayscale) we get a weighted sum of the red, green and blue channels, something like Y ≈ 0.299R + 0.587G + 0.114B.
[2:21] We can see that a large portion of the value is coming from the green channel.
[2:26] So we can apply a little shortcut to our RGB to greyscale conversion and just take the
[2:31] green channel of the image.
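To make this concrete, here is a minimal sketch of the green-channel shortcut (not the project's actual code), assuming the camera frame arrives as RGBA ImageData grabbed from a canvas:

```typescript
// Greyscale via the green channel, assuming an RGBA ImageData frame
// (4 bytes per pixel) grabbed from a <canvas>.
function greyscaleFromGreen(frame: ImageData): Uint8ClampedArray {
  const { width, height, data } = frame;
  const grey = new Uint8ClampedArray(width * height);
  for (let i = 0; i < width * height; i++) {
    // RGBA layout: data[i * 4] = R, +1 = G, +2 = B, +3 = A
    grey[i] = data[i * 4 + 1]; // just take the green channel
  }
  return grey;
}
```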
[2:33] We’re going to be using morphological operations for locating the puzzle.
[2:39] These typically work on black and white binary images, so our next step is to binarise our image.
[2:46] This involves applying a threshold to separate out foreground from background pixels.
[2:52] I’ve simulated an image here that has been taken in poor lighting conditions.
[2:58] As you can see from the histogram there’s not a clear separation of ink and paper.
[3:03] This makes it very hard to apply a global threshold to the whole image.
[3:09] However, if we look at a small section of the image we can see from the histogram that
[3:14] we do have an obvious collection of ink pixels and paper pixels.
[3:19] What we need to do is examine each pixel in the context of its surrounding area.
[3:24] A simple way to do this is to apply a blur to the image and then compare the original
[3:31] image with this blurred image.
[3:35] Doing this gives us a very clean segmented image even when lighting conditions are not ideal.
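As a sketch of that idea, the code below marks a pixel as ink when it is noticeably darker than the mean of its neighbourhood; the radius and offset values are illustrative guesses rather than the project's tuned parameters:

```typescript
// Adaptive thresholding sketch: compare each pixel against a naive box blur
// of its neighbourhood. A real implementation would use an integral image or
// a separable blur rather than this slow O(n * r^2) loop.
function adaptiveThreshold(
  grey: Uint8ClampedArray, width: number, height: number,
  radius = 8, offset = 10
): Uint8Array {
  const out = new Uint8Array(width * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let sum = 0, count = 0;
      for (let dy = -radius; dy <= radius; dy++) {
        for (let dx = -radius; dx <= radius; dx++) {
          const yy = y + dy, xx = x + dx;
          if (yy >= 0 && yy < height && xx >= 0 && xx < width) {
            sum += grey[yy * width + xx];
            count++;
          }
        }
      }
      // ink is darker than the local paper background
      out[y * width + x] = grey[y * width + x] < sum / count - offset ? 1 : 0;
    }
  }
  return out;
}
```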
[3:42] We’ve now got a fairly clean image that contains our puzzle along with a bunch of random other
[3:48] printed elements.
[3:50] A classic approach would be to jump straight in here with something like a Hough transform to detect lines.
[3:56] However, we can be a bit smarter about what we are doing and take a bit of a shortcut.
[4:03] We know that whoever is taking the picture should be pointing the camera at a puzzle.
[4:07] So we can assume that the puzzle would be the largest object in the image.
[4:13] We can also apply simple heuristics to filter out elements that probably aren’t a puzzle.
[4:21] If we scan the image, pulling out all connected components,
[4:26] the component that contains the puzzle should be the one with the most points that also matches our heuristics.
[4:32] This lets us isolate the puzzle grid.
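A sketch of that search, using a simple stack-based flood fill with 4-connectivity and keeping only the "most points" heuristic (the other filters are omitted):

```typescript
type Point = { x: number; y: number };

// Find the largest connected component of "ink" pixels in the binary image.
function largestConnectedComponent(
  binary: Uint8Array, width: number, height: number
): Point[] {
  const visited = new Uint8Array(width * height);
  let best: Point[] = [];
  for (let start = 0; start < binary.length; start++) {
    if (!binary[start] || visited[start]) continue;
    const component: Point[] = [];
    const stack = [start];
    visited[start] = 1;
    while (stack.length) {
      const idx = stack.pop()!;
      const x = idx % width, y = (idx / width) | 0;
      component.push({ x, y });
      // visit the four neighbours
      for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
        const nx = x + dx, ny = y + dy;
        if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
        const nIdx = ny * width + nx;
        if (binary[nIdx] && !visited[nIdx]) {
          visited[nIdx] = 1;
          stack.push(nIdx);
        }
      }
    }
    // keep the component with the most points
    if (component.length > best.length) best = component;
  }
  return best;
}
```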
[4:36] With the puzzle grid identified we now need to identify the four corners of the puzzle.
[4:43] Once again we can apply a fairly simple heuristic to this problem.
[4:47] The pixels that are in the four corners should have the smallest Manhattan distance from
[4:53] the corners of the max extents of the extracted component.
[4:58] You can see the calculation of manhattan distance in the animations.
[5:04] Running this algorithm for all four extents of our located puzzle gives us the four corners.
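A sketch of the corner search for a single corner of the bounding box (repeated for all four corners):

```typescript
type Pt = { x: number; y: number };

// Return the component pixel with the smallest Manhattan distance to the
// given corner of the component's bounding box.
function nearestToCorner(points: Pt[], corner: Pt): Pt {
  let best = points[0];
  let bestDist = Infinity;
  for (const p of points) {
    const dist = Math.abs(p.x - corner.x) + Math.abs(p.y - corner.y);
    if (dist < bestDist) {
      bestDist = dist;
      best = p;
    }
  }
  return best;
}
```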
[5:13] We know where the puzzle is in our image.
[5:17] We have the location of the four corner points.
[5:21] To extract the puzzle image we can reformulate our problem into one of a homographic transform.
[5:29] We want to map from the camera image of the puzzle to an ideal image of the puzzle that
[5:35] is not distorted by perspective or rotations.
[5:40] We do this by computing a homography between the two planes.
[5:46] The formula shown can be transformed into this new formula - as you can see we need four points
[5:54] - this maps nicely onto the corner points that we’ve already located for the puzzle.
[6:02] If we rewrite the matrices algebraically we can find h using this algorithm.
[6:07] Please see the linked paper for details on how this is derived,
[6:14] and for alternative, more stable methods for calculating the homography.
[6:19] Now that we have the homography between our ideal image and the camera image we can map pixels between them.
[6:28] This lets us extract a nice square puzzle image from the camera image.
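Here is a sketch of that mapping step, assuming we already have the 3×3 homography H (row-major) from the ideal puzzle image to the camera image; computing H itself from the four corner correspondences is left to the linked paper:

```typescript
// Map a point in the ideal, square puzzle image into the camera image
// using a 3x3 homography stored row-major as 9 numbers.
function applyHomography(h: number[], x: number, y: number): { x: number; y: number } {
  const w = h[6] * x + h[7] * y + h[8];
  return {
    x: (h[0] * x + h[1] * y + h[2]) / w,
    y: (h[3] * x + h[4] * y + h[5]) / w,
  };
}

// For every pixel of the ideal image, map back into the camera image and
// take the nearest greyscale pixel (nearest-neighbour sampling).
function extractPuzzle(
  grey: Uint8ClampedArray, srcWidth: number, srcHeight: number,
  h: number[], size: number
): Uint8ClampedArray {
  const out = new Uint8ClampedArray(size * size);
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      const src = applyHomography(h, x, y);
      const sx = Math.round(src.x), sy = Math.round(src.y);
      if (sx >= 0 && sx < srcWidth && sy >= 0 && sy < srcHeight) {
        out[y * size + x] = grey[sy * srcWidth + sx];
      }
    }
  }
  return out;
}
```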
[6:33] Now that we have the square puzzle image we need to extract the contents of each individual cell.
[6:40] We can use the thresholded version of the image to help us with this.
[6:45] Looking at each box in turn we extract the bounds of any connected region starting from
[6:51] the center of each cell.
[6:54] We can then use these bounds to extract an image of the digit from the square greyscale image.
[7:01] If there is no connected region in the center of the cell then we know that it is empty.
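A sketch of the per-cell check, assuming a square thresholded puzzle image and a flood-fill helper like the one in the connected-components sketch above; the real code presumably searches a small area around the centre rather than a single pixel:

```typescript
type Bounds = { x: number; y: number; width: number; height: number };

// Return the bounding box of the digit in cell (row, col), or null if the
// cell is empty. `size` is the side length of the square puzzle image and
// `floodFill` returns the connected ink pixels reachable from a start point.
function extractCellBounds(
  binary: Uint8Array, size: number, row: number, col: number,
  floodFill: (img: Uint8Array, size: number, x: number, y: number) => { x: number; y: number }[]
): Bounds | null {
  const cellSize = size / 9;
  const cx = Math.round((col + 0.5) * cellSize);
  const cy = Math.round((row + 0.5) * cellSize);
  if (!binary[cy * size + cx]) return null; // no ink at the centre: empty cell
  let minX = size, minY = size, maxX = 0, maxY = 0;
  for (const p of floodFill(binary, size, cx, cy)) {
    minX = Math.min(minX, p.x); maxX = Math.max(maxX, p.x);
    minY = Math.min(minY, p.y); maxY = Math.max(maxY, p.y);
  }
  return { x: minX, y: minY, width: maxX - minX + 1, height: maxY - minY + 1 };
}
```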
[7:07] We now have an image for each populated cell of the puzzle.
[7:13] We’re going to use a neural network to perform OCR on each image.
[7:18] I’m going to use TensorFlow to train the network and then TensorFlow.js to run the network
[7:24] in the browser.
[7:25] Our neural network is going to be trained to recognise the digits 1 through 9.
[7:32] I’ve synthesized a large number of training examples from a wide selection of fonts rendering
[7:38] a greyscale image of each digit.
[7:42] I’ve also added a small amount of blur and some Gaussian noise to each image.
[7:47] Hopefully this will provide a reasonable representation of what we will be getting in a live environment.
[7:54] I’ve also separated out 10% of the images into a testing data set that we can use to evaluate
[8:01] how well our network performs.
[8:06] I’m using an interactive jupyter notebook to train the network.
[8:11] We’ll import TensorFlow and then set up the batch size
[8:15] and the number of epochs for our training session.
[8:19] I’m going to use a batch size of 32 and run the training for 100 epochs.
[8:26] We’re also going to resize our images to 20 by 20 pixels.
[8:32] We need to remember to also do this in the JavaScript code so that it matches.
[8:39] I’m augmenting our training data to help the network generalise.
[8:44] This will add some random mutations to our input data set.
[8:49] We’ll split out 20% of our images into a validation set for use during training and also normalise the pixel values.
[9:00] We’ll also have a data generator that does no augmentation and only normalises the pixel values.
[9:08] We can now load the data, splitting our training images into training and validation subsets.
[9:14] Here’s a small sample of our augmented training data.
[9:25] For our problem we don’t need a very complicated model: I have built a single convolution layer
[9:31] followed by a dense hidden layer and then finally an output layer with nine outputs,
[9:38] one for each digit.
[9:40] We’ll log the training output to TensorBoard and also create a confusion matrix to show
[9:46] us which digits are causing problems.
[9:49] For training, we need to use the CategoricalCrossentropy loss function.
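The actual training happens in the Python notebook, but as a sketch, an equivalent model looks something like this using the tfjs-layers API; the filter and unit counts here are illustrative assumptions, while the 20×20 input and the nine softmax outputs match the description above:

```typescript
import * as tf from '@tensorflow/tfjs';

// Single convolution layer, a dense hidden layer, and nine softmax outputs.
const model = tf.sequential({
  layers: [
    tf.layers.conv2d({
      inputShape: [20, 20, 1],          // 20x20 greyscale cell images
      filters: 32,                      // filter count is an assumption
      kernelSize: 3,
      activation: 'relu',
    }),
    tf.layers.flatten(),
    tf.layers.dense({ units: 128, activation: 'relu' }),   // hidden layer
    tf.layers.dense({ units: 9, activation: 'softmax' }),  // one output per digit 1-9
  ],
});

model.compile({
  optimizer: 'adam',
  loss: 'categoricalCrossentropy',      // categorical cross-entropy, as above
  metrics: ['accuracy'],
});
```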
[9:57] Running the training we can see that we have pretty good accuracy
[10:01] on both training and validation.
[10:05] Looking at the confusion matrix we can see that “1”s and “7”s are currently getting confused.
[10:15] Once the training is complete we can see that we have good performance on both the training data
[10:21] and the validation data.
[10:24] This indicates that our network is performing well on data it has seen as well as data it
[10:31] has never seen before.
[10:33] Looking at the confusion matrix there are still some issues between 1s and 7s.
[10:40] Let’s save the model and see which images it’s actually failing on.
[10:45] Looking at the failed images I’m pretty happy with the performance.
[10:49] I think the fonts that it is failing on would not be used in the real world so I’m happy
[10:54] to ignore these failures and move on with this network.
[10:58] As I’m happy with the neural network structure I’m going to train it on all the data and
[11:06] use the resulting network in my production code.
[11:12] The final training gives us really good accuracy.
[11:16] Checking to see if any images fail,
[11:19] we now see that none are classified incorrectly.
[11:23] We’ll convert this model for use in TensorFlow.js.
[11:28] To run our model we need to perform the same image resizing and normalisations as we did
[11:34] during training and then we can simply feed our images into the model and ask it to predict
[11:41] the digits.
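A sketch of that inference step with TensorFlow.js; the model path and the mapping from class index to digit are assumptions about how the converted model is laid out:

```typescript
import * as tf from '@tensorflow/tfjs';

// const model = await tf.loadLayersModel('model.json'); // hypothetical path

async function recogniseDigit(model: tf.LayersModel, cell: ImageData): Promise<number> {
  const prediction = tf.tidy(() => {
    const input = tf.browser
      .fromPixels(cell, 1)        // single-channel (greyscale) tensor
      .resizeBilinear([20, 20])   // same 20x20 size used during training
      .div(255)                   // same normalisation used during training
      .expandDims(0);             // add the batch dimension
    return (model.predict(input) as tf.Tensor).argMax(-1);
  });
  const classIndex = (await prediction.data())[0];
  prediction.dispose();
  return classIndex + 1;          // assumes classes 0-8 correspond to digits 1-9
}
```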
[11:44] We can solve a Sudoku puzzle using Donald Knuth’s application of Dancing Links to his
[11:50] Algorithm X for solving the exact cover problem.
[11:55] Dancing links uses the seemingly very simple realisation that we can remove and put back
[12:01] nodes from a doubly-linked list very efficiently.
[12:06] We can see that when we remove node B from the list it still has pointers back to the
[12:12] nodes that used to link to it.
[12:15] This means that node B contains all the information required to add it back into the list.
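In code, the trick looks something like this (a sketch on a left/right doubly-linked node; Knuth's full structure also links nodes up and down within columns):

```typescript
// Each node keeps its own left/right pointers even after it is unlinked,
// so it can be put back in O(1).
interface DLNode { left: DLNode; right: DLNode; value?: string }

function unlink(node: DLNode): void {
  node.left.right = node.right; // neighbours now skip over the node...
  node.right.left = node.left;  // ...but the node still remembers them
}

function relink(node: DLNode): void {
  node.left.right = node;       // the node's own pointers restore it
  node.right.left = node;
}
```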
[12:23] So what is Algorithm X?
[12:27] Algorithm X is an algorithm that solves the exact cover problem.
[12:33] The exact cover problem is solved by finding a selection of rows in the grid that,
[12:37] when combined, have exactly one 1 in each column.
[12:41] This example is taken from Wikipedia - https://en.wikipedia.org/wiki/Knuth%27s_Algorithm_X
[12:48] Looking at our grid, column 1 has the fewest entries so we select it.
[12:56] This gives us a choice of adding rows A or B to our solution.
[13:03] We try adding row A to our potential solution set.
[13:06] We can see it has an entry in columns 1, 4 and 7.
[13:10] And these columns have entries in rows A, B, C, E and F.
[13:17] Removing these columns leaves us with row D.
[13:21] Column 2 now has no entries in the remaining rows, so we have failed to find a solution.
[13:27] We need to backtrack and try again.
[13:32] This time we try adding Row B to our solution.
[13:36] This flags columns 1 and 4 for removal, which means that rows A, B and C should also be removed.
[13:44] Doing this leaves us with rows D, E and F and columns 2, 3, 5, 6 and 7.
[13:53] Now column 5 has the lowest number of entries so we select it.
[13:59] This gives us a choice of Row D. Selecting row D flags columns 3, 5 and 6 for removal
[14:07] which means that rows D and E should also be removed.
[14:12] We now have Row F remaining.
[14:14] We include Row F in our solution which means that columns 2 and 7 are flagged for removal
[14:21] which removes row F.
[14:23] We now have an empty matrix which means we have found a solution.
[14:29] The final solution is rows B, D and F.
[14:33] And if you look in the grid you can see that combining those rows together gives you exactly one 1 in each column.
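As a compact sketch (without dancing links yet), Algorithm X can be written recursively over a set representation of the matrix; encoding the example grid above and running this returns rows B, D and F, matching the walkthrough:

```typescript
// Algorithm X sketch: rows maps a row name to the set of columns it covers.
// Returns the names of a set of rows covering every column exactly once,
// or null if no exact cover exists.
function algorithmX(
  rows: Map<string, Set<number>>,
  columns: Set<number>,
  solution: string[] = []
): string[] | null {
  if (columns.size === 0) return solution;            // everything covered
  // pick the column with the fewest candidate rows
  let bestCol = -1, bestCount = Infinity;
  for (const col of columns) {
    let count = 0;
    for (const cols of rows.values()) if (cols.has(col)) count++;
    if (count < bestCount) { bestCount = count; bestCol = col; }
  }
  if (bestCount === 0) return null;                    // dead end, backtrack
  for (const [name, cols] of rows) {
    if (!cols.has(bestCol)) continue;
    // choose this row: drop every column it covers and every row that
    // clashes with it, then recurse
    const remainingRows = new Map(
      [...rows].filter(([, c]) => [...c].every(col => !cols.has(col)))
    );
    const remainingCols = new Set([...columns].filter(col => !cols.has(col)));
    const result = algorithmX(remainingRows, remainingCols, [...solution, name]);
    if (result) return result;
  }
  return null;
}
```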
[14:41] So, how does dancing links fit into this?
[14:45] Dancing links solves the problem of backtracking in an efficient way.
[14:50] We can encode the sparse matrix as a doubly linked circular list going left and right
[14:55] for the rows and up and down for the columns.
[14:59] Since these are now linked lists we can use the fact that we can efficiently remove and re-add
[15:04] elements to perform the column and row removals, and when we backtrack the restoration can be done very efficiently.
[15:10] This is what a Sudoku puzzle looks like when it’s been turned into a set of constraints.
[15:17] You can see that as we add the known numbers,
[15:20] the number of rows and columns decreases.
[15:23] As we run the search algorithm you can see how quickly the search space is reduced.
[15:29] You can also see that it occasionally needs to backtrack before we find the complete solution.
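For reference, the usual encoding gives each candidate "digit d in cell (r, c)" an exact-cover row that covers exactly four constraint columns; the column numbering below is an illustrative assumption, not necessarily the project's layout:

```typescript
// Columns covered by placing digit d (1-9) at row r, column c (both 0-8).
function candidateColumns(r: number, c: number, d: number): number[] {
  const box = Math.floor(r / 3) * 3 + Math.floor(c / 3);
  return [
    r * 9 + c,                // cell (r, c) is filled       (columns 0-80)
    81 + r * 9 + (d - 1),     // row r contains digit d      (columns 81-161)
    162 + c * 9 + (d - 1),    // column c contains digit d   (columns 162-242)
    243 + box * 9 + (d - 1),  // 3x3 box contains digit d    (columns 243-323)
  ];
}
```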
[15:36] To render our results we could draw our text into a square image and then project each
[15:42] pixel onto the image coming from the camera.
[15:45] This would be quite computationally intensive.
[15:49] Alternatively, for each puzzle cell, we could take an imaginary line through the centre of the cell.
[15:55] We can then project that using our homographic transform and use this projected line to calculate
[16:01] an approximate height and angle for the cell.
[16:04] We then simply draw the digit at the projected location of the cell with the calculated height and angle.
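A sketch of that rendering step on a canvas, assuming a `project` function like the homography sketch earlier (mapping ideal puzzle coordinates into camera-image coordinates); the font sizing factor is a guess:

```typescript
// Draw a solved digit into the camera image, sized and tilted to match the
// projected cell.
function drawDigit(
  ctx: CanvasRenderingContext2D,
  project: (x: number, y: number) => { x: number; y: number },
  cellSize: number, row: number, col: number, digit: number
): void {
  const cx = (col + 0.5) * cellSize;
  // an imaginary vertical line through the centre of the cell...
  const top = project(cx, row * cellSize);
  const bottom = project(cx, (row + 1) * cellSize);
  // ...gives an approximate height and lean angle for the projected cell
  const height = Math.hypot(bottom.x - top.x, bottom.y - top.y);
  const angle = -Math.atan2(bottom.x - top.x, bottom.y - top.y);
  ctx.save();
  ctx.translate((top.x + bottom.x) / 2, (top.y + bottom.y) / 2);
  ctx.rotate(angle);
  ctx.font = `${Math.round(height * 0.8)}px sans-serif`;
  ctx.textAlign = 'center';
  ctx.textBaseline = 'middle';
  ctx.fillText(String(digit), 0, 0);
  ctx.restore();
}
```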
[16:14] So that’s it for this video.
[16:16] All the code for this project is on GitHub - check the video description for the link.
[16:24] I hope you’ve enjoyed this video - please hit the subscribe button and leave any thoughts
[16:29] you might have in the comments.
[16:31] Thanks for watching and I’ll see you in the next video!

