Blog


OCR and Neural Nets in JavaScript

A pretty amazing piece of JavaScript dropped yesterday and it's going to take a little bit to digest it all. It's a GreaseMonkey script, written by 'Shaun Friedle', that automatically solves captchas provided by the site Megaupload. There's a demo online if you wish to give it a spin.

Now, the captchas provided by the site aren't very "hard" to solve (in fact, they're downright bad).

But there are many interesting parts here:

  1. The HTML 5 Canvas getImageData API is used to get at the pixel data from the Captcha image. Canvas lets you draw an image into a canvas element and later read its pixel data back out.
  2. The script includes an implementation of a neural network, written in pure JavaScript.
  3. The pixel data, extracted from the image using Canvas, is fed into the neural network in an attempt to divine the exact characters being used - in a sort of crude form of Optical Character Recognition (OCR).
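The first point above can be sketched roughly as follows. This is a minimal illustration and not the script's actual code: `image_to_pixels` and `pixel_at` are hypothetical names, and the sketch assumes the image has already loaded and comes from an origin the page is allowed to read (otherwise `getImageData` throws a security error).

```javascript
// Hypothetical helper: copy a loaded <img> into an off-screen canvas
// and read its pixels back. Browser-only (it needs a DOM).
function image_to_pixels(img) {
  var canvas = document.createElement("canvas");
  canvas.width = img.width;
  canvas.height = img.height;

  var ctx = canvas.getContext("2d");
  ctx.drawImage(img, 0, 0);

  // The result has width, height and data, where data is a flat
  // array of RGBA bytes: four per pixel, row by row.
  return ctx.getImageData(0, 0, canvas.width, canvas.height);
}

// Hypothetical helper: read one pixel out of an ImageData-like object.
function pixel_at(image_data, x, y) {
  var i = x * 4 + y * 4 * image_data.width;
  return {
    r: image_data.data[i],
    g: image_data.data[i + 1],
    b: image_data.data[i + 2],
    a: image_data.data[i + 3]
  };
}
```

The `x*4 + y*4*width` index is the same arithmetic that shows up in every snippet from the script below.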

If we crack open the source code we can see how it works. A lot of it comes down to how the captcha is implemented. As I mentioned before it's not a very good captcha. It has 3 letters, each in a separate color, using a possible 26 letters, and they're all in the same font.

The first step is pretty clear: The captcha is copied into the canvas and then converted to grayscale.

function convert_grey(image_data){
  for (var x = 0; x < image_data.width; x++){
    for (var y = 0; y < image_data.height; y++){
      var i = x*4+y*4*image_data.width;
      var luma = Math.floor(image_data.data[i] * 299/1000 +
        image_data.data[i+1] * 587/1000 +
        image_data.data[i+2] * 114/1000);

      image_data.data[i] = luma;
      image_data.data[i+1] = luma;
      image_data.data[i+2] = luma;
      image_data.data[i+3] = 255;
    }
  }
}
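To get a feel for those weights, here is the same formula applied to a single pixel. The `luma` helper is a name made up for this illustration; it isn't part of the script.

```javascript
// Same integer-weighted formula as convert_grey, for one pixel.
// (Hypothetical helper, for illustration only.)
function luma(r, g, b) {
  return Math.floor(r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000);
}

luma(255, 0, 0); // 76  - pure red
luma(0, 255, 0); // 149 - pure green
luma(0, 0, 255); // 29  - pure blue
```

Green counts for the most and blue for the least, so differently coloured letters generally land on different grey levels. That's what the next step relies on.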

The canvas is then broken apart into three separate pixel matrices - each containing an individual character (this is quite easy to do - since each character is a separate color, they're broken apart just based upon the different colors used).

// After convert_grey each letter sits at its own grey level
filter(image_data[0], 105);
filter(image_data[1], 120);
filter(image_data[2], 135);

function filter(image_data, colour){
  for (var x = 0; x < image_data.width; x++){
    for (var y = 0; y < image_data.height; y++){
      var i = x*4+y*4*image_data.width;

      // Turn all the pixels of the certain colour to white
      if (image_data.data[i] == colour) {
        image_data.data[i] = 255;
        image_data.data[i+1] = 255;
        image_data.data[i+2] = 255;
     
      // Everything else to black
      } else {
        image_data.data[i] = 0;
        image_data.data[i+1] = 0;
        image_data.data[i+2] = 0;
      }
    }
  }
}

Finally, any extraneous noisy pixels are removed from the image (leaving a clean character). This is done by looking for white pixels (ones that have been matched) that have black, unmatched pixels directly above and below them. Any such pixel is simply removed.

// x and y come from the surrounding per-pixel loops (not shown)
var i = x*4+y*4*image_data.width;
var above = x*4+(y-1)*4*image_data.width;
var below = x*4+(y+1)*4*image_data.width;

if (image_data.data[i] == 255 &&
    image_data.data[above] == 0 &&
    image_data.data[below] == 0)  {
  image_data.data[i] = 0;
  image_data.data[i+1] = 0;
  image_data.data[i+2] = 0;
}
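The fragment above relies on `x` and `y` from an enclosing loop. A self-contained version might look something like this. The function name `remove_noise` and the decision to skip the first and last rows are my assumptions, not something taken from the script.

```javascript
// Hypothetical wrapper around the above/below test.
// Assumes the image has already been through filter(), so every
// colour channel is either 0 (black) or 255 (white).
function remove_noise(image_data) {
  var w = image_data.width, h = image_data.height;

  for (var x = 0; x < w; x++) {
    // Skip the first and last rows: they have no pixel above/below.
    for (var y = 1; y < h - 1; y++) {
      var i = x * 4 + y * 4 * w;
      var above = x * 4 + (y - 1) * 4 * w;
      var below = x * 4 + (y + 1) * 4 * w;

      // A white pixel sandwiched between two black ones is noise.
      if (image_data.data[i] == 255 &&
          image_data.data[above] == 0 &&
          image_data.data[below] == 0) {
        image_data.data[i] = 0;
        image_data.data[i + 1] = 0;
        image_data.data[i + 2] = 0;
      }
    }
  }
}
```

Note that this works in place and only looks vertically: a one-pixel-tall speck disappears, while any stroke that is at least two pixels tall survives.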

We're getting really close to having a shape that we can feed into the neural network, but it's not completely there yet. The script then goes on to do some very crude edge detection on the shape. The script looks for the top, left, right, and bottom-most pixels in the shape and turns it into a rectangle - and converts that shape back into a 20 by 25 pixel matrix.

cropped_canvas.getContext("2d").fillRect(0, 0, 20, 25);
var edges = find_edges(image_data[i]);
cropped_canvas.getContext("2d").drawImage(canvas, edges[0], edges[1],
  edges[2]-edges[0], edges[3]-edges[1], 0, 0,
  edges[2]-edges[0], edges[3]-edges[1]);

image_data[i] = cropped_canvas.getContext("2d").getImageData(0, 0,
  cropped_canvas.width, cropped_canvas.height);
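The body of `find_edges` isn't shown here, but from the way its result is used (`edges[0]` and `edges[1]` as the top-left corner, `edges[2]-edges[0]` as the width) it presumably returns something like `[left, top, right, bottom]`. Here's a guess at how such a bounding-box scan could work. The name `find_edges_sketch`, the exclusive right/bottom values, and the all-zero result for an empty image are all my assumptions.

```javascript
// Hypothetical bounding-box scan over a filtered (black/white) image.
// Returns [left, top, right, bottom], with right and bottom exclusive
// so that right - left is the width of the shape.
function find_edges_sketch(image_data) {
  var min_x = image_data.width, min_y = image_data.height;
  var max_x = -1, max_y = -1;

  for (var x = 0; x < image_data.width; x++) {
    for (var y = 0; y < image_data.height; y++) {
      var i = x * 4 + y * 4 * image_data.width;

      // Only white (matched) pixels belong to the letter.
      if (image_data.data[i] == 255) {
        if (x < min_x) min_x = x;
        if (x > max_x) max_x = x;
        if (y < min_y) min_y = y;
        if (y > max_y) max_y = y;
      }
    }
  }

  // No white pixels at all: report an empty box.
  if (max_x < 0) return [0, 0, 0, 0];

  return [min_x, min_y, max_x + 1, max_y + 1];
}
```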

So - after all this work, what do we have? A 20 by 25 matrix containing a single rectangle, drawn in black and white. Terribly exciting.

That rectangle is then reduced even further. A number of strategically-chosen points are extracted from the matrix in the form of "receptors" (these will feed the neural network). For example, a receptor might look at the pixel at position (9, 6) and see whether it's "on" or not. A whole series of these states is computed (far fewer than the 500 pixels of the full 20x25 grid - a mere 64 states) and fed into the neural network.
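In code, the receptor idea boils down to sampling a fixed list of coordinates. The sketch below is illustrative only: `check_receptors` is a made-up name, the three coordinates are invented (the real script uses 64), and I'm assuming that "on" means a white pixel, following the convention set up by `filter`.

```javascript
// Hypothetical receptor positions inside the 20x25 matrix.
// These three are invented; the script has 64 of its own.
var example_receptors = [[9, 6], [3, 12], [15, 20]];

// Sample each receptor and report whether its pixel is "on" (white).
function check_receptors(image_data, receptors) {
  var states = [];

  for (var n = 0; n < receptors.length; n++) {
    var x = receptors[n][0];
    var y = receptors[n][1];
    var i = x * 4 + y * 4 * image_data.width;

    states.push(image_data.data[i] == 255);
  }

  return states;
}
```

Each call produces one boolean per receptor, which is the list of inputs handed to the network.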

The question that you should be asking yourself now is: Why not just do a straight pixel comparison? Why all this mess with the neural network? Well, the problem is that, with all of this reduction of information, a lot of ambiguity exists. If you run the online demo of this script you're more likely to find the occasional failure from the straight pixel comparison than from running it through the network. That being said, for most users, a straight pixel comparison would probably be sufficient.
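For comparison, a "straight" comparison could be as simple as counting how many sampled states agree with a stored template for each letter and picking the best. This is my own sketch of that idea, not code from the script. The names `match_score` and `closest_letter` and the shape of the `templates` object are assumptions.

```javascript
// Fraction of positions where the two state lists agree.
function match_score(a, b) {
  var same = 0;
  for (var n = 0; n < a.length; n++) {
    if (a[n] === b[n]) same++;
  }
  return same / a.length;
}

// templates is a hypothetical map of letter -> stored state list,
// e.g. { A: [true, false, ...], B: [...] }.
function closest_letter(input, templates) {
  var best = null, best_score = -1;

  for (var letter in templates) {
    var score = match_score(input, templates[letter]);
    if (score > best_score) {
      best = letter;
      best_score = score;
    }
  }

  return best;
}
```

Every position counts equally here, so a couple of flipped pixels in the wrong spots can tip the result toward a similar-looking letter.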

The next step is attempting to guess the letter. The network is being fed with 64 boolean inputs (collected from one of the extracted letters) along with another series of pre-computed values. One of the concepts behind how a neural network works is that you pre-seed it with some of the results from a previous run. It's likely that the author of this script simply ran it again and again and collected a whole series of values to get an optimal score. The score itself may not have any particular meaning (other than to the neural network itself) but it helps to derive the value.

When the neural net is run it takes the 64 values that have been computed from one of the characters in the captcha and compares them against a single pre-computed letter of the alphabet. It continues in this manner, assigning a score for each letter of the alphabet (a final result might be 'A 98% likely', 'B 36% likely', etc.).
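To make the scoring loop concrete, here's a deliberately tiny stand-in: one weighted sum per letter, squashed into the 0 to 1 range. I haven't reproduced the script's actual network here. The single-layer shape, the `weights` format, and the names `score_letters` and `best_letter` are all assumptions made for the sake of illustration.

```javascript
// Squash any number into the range (0, 1).
function sigmoid(t) {
  return 1 / (1 + Math.exp(-t));
}

// weights is a hypothetical map of letter -> { bias, w }, where w
// holds one pre-computed weight per input.
function score_letters(inputs, weights) {
  var scores = {};

  for (var letter in weights) {
    var sum = weights[letter].bias;
    for (var n = 0; n < inputs.length; n++) {
      // Only inputs that are "on" contribute their weight.
      if (inputs[n]) sum += weights[letter].w[n];
    }
    scores[letter] = sigmoid(sum);
  }

  return scores;
}

// Pick the letter with the highest score.
function best_letter(scores) {
  var best = null;
  for (var letter in scores) {
    if (best === null || scores[letter] > scores[best]) best = letter;
  }
  return best;
}
```

Unlike a plain pixel count, the weights let some receptors matter more than others, which is where the pre-computed values earn their keep.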

Going through the three letters in the captcha, the final result is devised. It's not 100% perfect (I wonder if better scores would be achieved if the letter weren't turned into a featureless rectangle before all these computations) but it's pretty good for what it is - and pretty amazing considering that it's all happening 100% in the browser using standards-based technology.

As a note - what's happening here is rather instance-specific. This technique *might* be able to work on a few more poorly-constructed captchas, but beyond that the complexity of most captchas just becomes too great (especially so for any client-side analysis).

I'm absolutely expecting some interesting work to be derived from this project - it holds a lot of potential.

Tags: javascript, canvas, greasemonkey

Adv. JavaScript and Processing.js

Recently I gave two talks at the Web 2.0 Expo in New York City and one for the Boston IxDA.

Learning Advanced JavaScript

An advanced talk on the JavaScript language. Explored functions, closures, function prototypes, and inheritance. The entire presentation was given using an interactive site/presentation (tested in Firefox and Safari).

Feel free to browse through the presentation (I'm not sure how useful it will be without me talking about the particulars - but it may be nice).

There are a number of neat things that I like about the implementation of this talk:

  • It's interactive. Each code slide is executable (the user can see the output right away). Additionally each slide is editable - just double-click the code to go into an edit mode.
  • Code editing is simple. Basic IDE functions (auto-indentation, proper tabbing, and backspace-to-delete-tab) are included. It's not a ton but it's enough to get started.
  • All code slides include syntax highlighting.
  • All slides are bookmarkable.

The presentation includes a number of fill-in-the-blank quizzes to help test your knowledge of what you just learned. In practice I may save this for situations in which more people have laptops/computers at the talk.

You can download the full presentation as a zip file.

Building a Visualization Language

I gave a talk on my work with Processing.js, which also covered how the Canvas element works and the Processing language itself.

During the talk I stepped through the construction of a visualization using Canvas.

jQuery for the Boston IxDA

An introductory presentation explaining how jQuery works.

The meat of the presentation was a series of interactive slides which could be run and played with in order to better understand how jQuery works.

You can download the runnable code as a zip file.

Upcoming

I'm going to be giving a number of talks this weekend at the jQuery Conference, followed by The Ajax Experience. I'll be sure to post the slides and code from them as well.

Bonus

Last week was the MIT Career Fair - I stopped by and worked the Mozilla booth with Boris, Brad, and Julie - a good time as usual:

Boris, Brad, and John at the MIT Career Fair

Tags: jquery, javascript, processing, canvas

Sparklines with Javascript and Canvas

About a month ago, when the new Firefox beta was released, I decided to play around with the brand-spanking new Canvas element, which is going into HTML 5. Currently Mozilla, Safari, and Opera 9 all support it - which is a good sign. Essentially, this element allows you to do 2D graphics (drawing lines, rotating images, etc.) - which is great for doing some more 'intense' web applications.

The first thing I decided to implement was a simple JavaScript Sparklines library. All it does is run through your HTML, look for your embedded Sparklines, and replace them with pretty little charts - it even scales them appropriately. It's completely unobtrusive so a simple call in the header of your HTML file should be sufficient to run it. For more information, and a snazzy demo, visit the project page.
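The scaling-and-drawing part of the idea can be sketched in a few lines. To be clear, this is not the library's API: `scale_points` and `draw_sparkline` are hypothetical names, and the sketch leaves out the part that finds the embedded data in the page.

```javascript
// Map a list of numbers onto a width x height box.
// Canvas y grows downward, so the largest value gets y = 0.
function scale_points(values, width, height) {
  var min = Math.min.apply(null, values);
  var max = Math.max.apply(null, values);
  var range = (max - min) || 1; // avoid dividing by zero on flat data
  var step = values.length > 1 ? width / (values.length - 1) : 0;

  var points = [];
  for (var n = 0; n < values.length; n++) {
    points.push([n * step, height - (values[n] - min) / range * height]);
  }
  return points;
}

// Draw the scaled points as one line on a 2D canvas context.
function draw_sparkline(ctx, values, width, height) {
  var points = scale_points(values, width, height);
  if (!points.length) return;

  ctx.beginPath();
  ctx.moveTo(points[0][0], points[0][1]);
  for (var n = 1; n < points.length; n++) {
    ctx.lineTo(points[n][0], points[n][1]);
  }
  ctx.stroke();
}
```

In a page you would pass in `canvas.getContext("2d")` along with the canvas's own width and height.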

Tags: canvas, sparklines, programming, javascript, firefox
