John Resig - OCR and Neural Nets in JavaScript

OCR and Neural Nets in JavaScript

A pretty amazing piece of JavaScript dropped yesterday and it’s going to take a little bit to digest it all. It’s a GreaseMonkey script, written by Shaun Friedle, that automatically solves captchas provided by the site Megaupload. There’s a demo online if you wish to give it a spin.

Now, the captchas provided by the site aren’t very “hard” to solve (in fact, they’re downright bad – some examples are below):

But there are many interesting parts here:

The HTML 5 Canvas getImageData API is used to get at the pixel data from the Captcha image. Canvas gives you the ability to embed an image into a canvas (from which you can later extract the pixel data back out again).
The script includes an implementation of a neural network, written in pure JavaScript.
The pixel data, extracted from the image using Canvas, is fed into the neural network in an attempt to divine the exact characters being used – in a sort of crude form of Optical Character Recognition (OCR).
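
As a rough illustration of the canvas step in the list above, here is a minimal sketch (not the script’s own code). `imageToImageData` and `pixelAt` are hypothetical helper names introduced for this example; `drawImage` and `getImageData` are the standard 2D canvas context methods. The first function only runs in a browser, and it assumes the image is already loaded and readable by the page.

```javascript
// Browser-only (hypothetical helper): draw a loaded <img> into an off-screen
// canvas and read its pixels back out as an ImageData object.
function imageToImageData(img) {
  var canvas = document.createElement("canvas");
  canvas.width = img.width;
  canvas.height = img.height;
  var ctx = canvas.getContext("2d");
  ctx.drawImage(img, 0, 0);
  return ctx.getImageData(0, 0, canvas.width, canvas.height);
}

// Hypothetical helper: read one pixel out of the flat RGBA array.
// Each pixel occupies four consecutive entries (red, green, blue, alpha),
// stored row by row, so pixel (x, y) starts at index (y * width + x) * 4.
function pixelAt(imageData, x, y) {
  var i = (y * imageData.width + x) * 4;
  var d = imageData.data;
  return { r: d[i], g: d[i + 1], b: d[i + 2], a: d[i + 3] };
}
```

The index arithmetic in `pixelAt` is the same `x*4+y*4*image_data.width` expression that appears throughout the script’s own functions, just factored differently.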

If we crack open the source code we can see how it works. A lot of it comes down to how the captcha is implemented. As I mentioned before it’s not a very good captcha. It has 3 letters, each in a separate color, using a possible 26 letters, and they’re all in the same font.

The first step is pretty clear: The captcha is copied into the canvas and then converted to grayscale.

[js]
function convert_grey(image_data){
  for (var x = 0; x < image_data.width; x++){
    for (var y = 0; y < image_data.height; y++){
      var i = x*4+y*4*image_data.width;
      var luma = Math.floor(image_data.data[i] * 299/1000 +
        image_data.data[i+1] * 587/1000 +
        image_data.data[i+2] * 114/1000);

      image_data.data[i] = luma;
      image_data.data[i+1] = luma;
      image_data.data[i+2] = luma;
      image_data.data[i+3] = 255;
    }
  }
}
[/js]

The canvas is then broken apart into three separate pixel matrices – each containing an individual character (this is quite easy to do – since each character is a separate color, they’re broken apart just based upon the different colors used).

[js]
filter(image_data[0], 105);
filter(image_data[1], 120);
filter(image_data[2], 135);
[/js]

[js]
function filter(image_data, colour){
  for (var x = 0; x < image_data.width; x++){
    for (var y = 0; y < image_data.height; y++){
      var i = x*4+y*4*image_data.width;

      // Turn all the pixels of the certain colour to white
      if (image_data.data[i] == colour) {
        image_data.data[i] = 255;
        image_data.data[i+1] = 255;
        image_data.data[i+2] = 255;

      // Everything else to black
      } else {
        image_data.data[i] = 0;
        image_data.data[i+1] = 0;
        image_data.data[i+2] = 0;
      }
    }
  }
}
[/js]

Finally any extraneous noisy pixels are removed from the image (providing a clear character). This is done by looking for white pixels (ones that’ve been matched) that are surrounded (above and below) by black, un-matched, pixels. If that’s the case then the matching pixel is simply removed.

[js]
var i = x*4+y*4*image_data.width;
var above = x*4+(y-1)*4*image_data.width;
var below = x*4+(y+1)*4*image_data.width;

if (image_data.data[i] == 255 &&
    image_data.data[above] == 0 &&
    image_data.data[below] == 0) {
  image_data.data[i] = 0;
  image_data.data[i+1] = 0;
  image_data.data[i+2] = 0;
}
[/js]

We’re getting really close to having a shape that we can feed into the neural network, but it’s not completely there yet. The script then goes on to do some very crude edge detection on the shape.
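
The noise-removal fragment above is shown without its surrounding loop. A minimal sketch of what a complete pass might look like follows; `removeNoise` is a hypothetical name, and skipping the first and last rows (which have no neighbour above or below) is an assumption of this sketch, since the original loop bounds are not shown here.

```javascript
// Hypothetical wrapper around the noise-removal check: clear any white pixel
// whose neighbours directly above and below are both black.
function removeNoise(imageData) {
  var w = imageData.width;
  var h = imageData.height;
  var d = imageData.data;

  for (var x = 0; x < w; x++) {
    // Assumption: leave the top and bottom rows alone, because they are
    // missing one of the two neighbours the check needs.
    for (var y = 1; y < h - 1; y++) {
      var i = (y * w + x) * 4;
      var above = ((y - 1) * w + x) * 4;
      var below = ((y + 1) * w + x) * 4;

      if (d[i] === 255 && d[above] === 0 && d[below] === 0) {
        d[i] = 0;
        d[i + 1] = 0;
        d[i + 2] = 0;
      }
    }
  }
  return imageData;
}
```

Editing the data in place is safe for this particular check: a pixel is only cleared when both vertical neighbours are already black, so clearing it cannot change the outcome for those neighbours.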
The script looks for the top, left, right, and bottom-most pixels in the shape and turns it into a rectangle – and converts that shape back into a 20 by 25 pixel matrix.

[js]
cropped_canvas.getContext("2d").fillRect(0, 0, 20, 25);
var edges = find_edges(image_data[i]);
cropped_canvas.getContext("2d").drawImage(canvas, edges[0], edges[1],
  edges[2]-edges[0], edges[3]-edges[1], 0, 0,
  edges[2]-edges[0], edges[3]-edges[1]);

image_data[i] = cropped_canvas.getContext("2d").getImageData(0, 0,
  cropped_canvas.width, cropped_canvas.height);
[/js]

So – after all this work, what do we have? A 20 by 25 matrix containing a single rectangle, drawn in black and white. Terribly exciting.

That rectangle is then reduced even further. A number of strategically-chosen points are then extracted from the matrix in the form of “receptors” (these will feed the neural network). For example a receptor might be to look at the pixel at position 9x6 and see if it’s “on” or not. A whole series of these states are computed (far fewer than the full 20x25 grid – a mere 64 states) and fed into the neural network.

The question that you should be asking yourself now is: Why not just do a straight pixel comparison? Why all this mess with the neural network? Well, the problem is that, with all of this reduction of information, a lot of ambiguity exists. If you run the online demo of this script you’re more likely to find the occasional failure from the straight pixel comparison than from running it through the network. That being said, for most users, a straight pixel comparison would probably be sufficient.
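
The cropping and receptor steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script’s own code: `findEdges` and `sampleReceptors` are hypothetical names, the body of the real `find_edges` is not shown in this post, and the receptor positions passed in are made up for the example.

```javascript
// Hypothetical bounding-box search: returns [left, top, right, bottom]
// (inclusive coordinates) of all white pixels, or null if there are none.
function findEdges(imageData) {
  var w = imageData.width;
  var h = imageData.height;
  var d = imageData.data;
  var left = w, top = h, right = -1, bottom = -1;

  for (var y = 0; y < h; y++) {
    for (var x = 0; x < w; x++) {
      if (d[(y * w + x) * 4] === 255) {
        if (x < left) left = x;
        if (x > right) right = x;
        if (y < top) top = y;
        if (y > bottom) bottom = y;
      }
    }
  }
  return right < 0 ? null : [left, top, right, bottom];
}

// Hypothetical receptor sampling: each receptor is an [x, y] position and
// contributes 1 if that pixel is "on" (white), otherwise 0.
function sampleReceptors(imageData, receptors) {
  var w = imageData.width;
  var d = imageData.data;
  return receptors.map(function (p) {
    return d[(p[1] * w + p[0]) * 4] === 255 ? 1 : 0;
  });
}
```

With 64 receptor positions, `sampleReceptors` would produce the 64 on/off states that the post describes as the network’s input.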

The next step is attempting to guess the letter. The network is being fed with 64 boolean inputs (collected from one of the extracted letters) along with another series of pre-computed values. One of the concepts behind how a neural network works is that you pre-seed it with some of the results from a previous run. It’s likely that the author of this script simply ran it again and again and collected a whole series of values to get an optimal score. The score itself may not have any particular meaning (other than to the neural network itself) but it helps to derive the value.

When the neural net is run it takes the 64 values that’ve been computed from one of the characters in the captcha and compares them against a single pre-computed letter of the alphabet. It continues in this manner, assigning a score for each letter of the alphabet (a final result might be ‘A 98% likely’, ‘B 36% likely’, etc.).
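
To make the per-letter scoring concrete, here is a deliberately simplified sketch. It assumes one sigmoid unit per letter with pre-computed weights and a bias; the real script’s network layout and values are not described in this post, so `scoreLetters`, `bestLetter`, and the shape of `model` are all hypothetical.

```javascript
// Squash any real number into the range (0, 1) so it reads like a likelihood.
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Hypothetical scoring step. `inputs` is the array of 0/1 receptor states;
// `model` maps each letter to { weights: [...], bias: number }, standing in
// for the pre-computed values mentioned above.
function scoreLetters(inputs, model) {
  var scores = {};
  Object.keys(model).forEach(function (letter) {
    var unit = model[letter];
    var z = unit.bias;
    for (var i = 0; i < inputs.length; i++) {
      z += unit.weights[i] * inputs[i];
    }
    scores[letter] = sigmoid(z);
  });
  return scores;
}

// Pick the letter with the highest score.
function bestLetter(scores) {
  var best = null;
  Object.keys(scores).forEach(function (letter) {
    if (best === null || scores[letter] > scores[best]) best = letter;
  });
  return best;
}
```

Running `bestLetter(scoreLetters(inputs, model))` once per extracted character and joining the three results would give the final guess for the captcha.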

Going through the three letters in the captcha the final result is devised. It’s not 100% perfect (I wonder if better scores would be achieved if the letter wasn’t turned into a featureless rectangle before all these computations) but it’s pretty good for what it is – and pretty amazing considering that it’s all happening 100% in the browser using standards-based technology.

As a note – what’s happening here is rather instance-specific. This technique *might* be able to work on a few more poorly-constructed captchas, but beyond that the complexity of most captchas just becomes too great (especially so for any client-side analysis).

I’m absolutely expecting some interesting work to be derived from this project – it holds a lot of potential.

Posted: January 23rd, 2009

47 Comments

Alan Hogan (January 23, 2009 at 8:17 pm)

Wow. This is a great find, John – pushing the limits of what I assumed was possible with JavaScript in at least two different ways.
Andrew Dupont (January 23, 2009 at 8:23 pm)

I would hope developments like these would convince those who use CAPTCHAs not to use them. But more likely it’ll result in more and more inscrutable CAPTCHAs.
Mike Taylor (January 23, 2009 at 8:24 pm)

That’s insane.

It would be awesome if this script was sending all the actual captchas images and their human-read equivalents back to some database that would serve as further training data. I’m sure spammers would pay big money for that. ;)
David Bolter (January 23, 2009 at 8:41 pm)

Warning: thinking aloud after a long day…

It would be nice if we could harvest our (human) successful CAPTCHA interactions to train a neural net. An existing neural net could use back prop. or something based on the correctness of its guesses when presented with a CAPTCHA. Perhaps it could sit passively as a FF extension, churning into silent action when a CAPTCHA is presented.

Maybe this could be part of the WebVisum FF extension.
David Bolter (January 23, 2009 at 8:43 pm)

Uhm. And what @Mike Taylor said while I was reading and replying :)
Jon Baer (January 23, 2009 at 8:49 pm)

I’m curious: w/ getImageData, is it possible to use other types of algorithms for, say, form factor detection (i.e. faces, objects, etc.)? I have to imagine this type of work is not extremely fast, but with the nn aspect it seems like there are a bunch of things you can do beyond simple CAPTCHA … any thoughts? I have not played around w/ canvas much yet but I think this post piqued my interest big time.
John Resig (January 23, 2009 at 8:55 pm)

@Jon Baer: Check out the image processing demos from back when I ported Processing to JavaScript. There’s all sorts of stuff in there (granted, it’s rather slow and limited) – but it shows potential!
Adam Schwartz (January 24, 2009 at 3:01 am)

@Jon Baer: Speed is likely going to be slower when compared to say, php image processing. But there’s still a lot you can do. One site, canvaspaint.org, has attempted to recreate MS Paint using the canvas element.

Also, I’m currently working on a site which generates collages from user-uploaded images. Using getImageData on each pixel of the original image, I use flickr’s API to find photos which match the hex color. It’s still mostly hacked together, but it’s certainly another good example of what can be done. Check it out here. (Also, here’s an example collage using the google logo.)

@John Resig: Very, very cool find. The neural network aspect of this really interests me. Just a note: I think you meant to point your link to “the source code” (in the sentence “If we crack open the source code we can see how it…”) to userscript 38736, (not 3873).
Shaun Friedle (January 24, 2009 at 4:59 am)

“A pretty amazing piece of JavaScript dropped yesterday and it’s going to take a little bit to digest it all.”

Oh, thank you. I actually wrote most of the code in December, I just updated it a few times this month. I’m somewhat amazed at the sudden publicity and praise since I consider myself only mediocre in my ability to use javascript and to use neural networks. I’m sure it could be improved a lot by someone who really knows what they’re doing. There really isn’t even a good reason to convert it to greyscale in the javascript implementation either, I just held it over when porting it from my python version.
Timothy (January 24, 2009 at 9:21 am)

It’s pretty impressive, and complex. But the captchas on megaupload are simplistic. This wouldn’t work well on most sites.
George Glass (January 24, 2009 at 11:28 am)

For some reason I’m not too impressed, cause this kinda sounds like a hack to me. Meaning this may work on a particular instance of captcha, but could easily be overridden. If there were some principles there that made it more generalizable then I’d really be impressed.

I think the kind of image and letter recognition software needed to crack most captchas is beyond the scope of javascript. There have been some impressive gains lately in the rapid feed forward modeling of human vision. It is more likely that we will see these complex algorithms in external software breaking turing tests.
podunk (January 24, 2009 at 11:48 am)

@Andrew Dupont

Do you have an alternative to captchas that should be used instead? Or are you one of the nefarious few who profit from automated spamming of forums across the internet? Or perhaps you simply enjoy spam posts more than legitimate ones? I surely hope it’s not one of the latter two options.
Msr (January 24, 2009 at 12:13 pm)

Hey, thanks to whomever created this. I guess we can expect them to start using the impossible-to-read garbage that Google, Yahoo and everyone else is using. It’s nice to be able to do it in one guess instead of three or four, but I guess it’s too important to hack everything.
DoubtingThomas (January 24, 2009 at 12:24 pm)

@podunk

Oh, come on. There are more than enough empirical studies that show that getting in the way of user access *degrades* community discussion. Captchas have been broken as a concept for a while now – just because we don’t have a good answer for how to fix the problem doesn’t mean we have to put blinders on and claim that they’re a good answer.

Anyway, as tech demo this is great.
IceBrain (January 24, 2009 at 2:12 pm)

@podunk: It’s much better if a guy cracks a captcha and publishes the code, possibly helping the site change to something more secure, than letting some spammer develop something like this secretly.
Read http://slashdot.org/features/980720/0819202.shtml

On topic, great job! That will make me learn about neural networks :P
Michal Migurski (January 24, 2009 at 3:18 pm)

Verrrry interesting.

This work fits a fairly established pattern: old ideas implemented in new, seemingly inappropriate language, to great amazement. Actionscript went through a similar phase about 8 years ago with various drag/drop interactive widgetry. This one’s an application of well understood visual search techniques that have got to be at least 20 years old.

The interesting bit is always the new context granted by the language. What does javascript or greasemonkey get you, here? It runs in a browser, on the web, and can be made to send results back to some central location, so there’s a new kind of results sharing David Bolter mentions above. It also stretches the boundaries of javascript somewhat, which I expect will result in more attention being paid to image/pixel manipulation by engine writers, just like all that Praystation business in 2000/01 led Macromedia to more seriously consider the use of Actionscript for desktop-like interactions.
Richard Lopes (January 24, 2009 at 3:25 pm)

This is a very clever use of Javascript.
At least it is good to see advanced programming in Javascript that is not DOM, UI or performance related.

It reminds us that:
– we now have the means to do advanced stuff in the browser (thanks canvas in that case)
– Javascript is a powerful language and you can implement advanced algorithms like in any other language
– Javascript performance in the browser got better and allows expensive computation
– people’s minds are amazing

Regards,

Richard Lopes
James (January 24, 2009 at 3:58 pm)

Captchas are one of the worst things on the net; if you’re having to resort to that you’re failing as a developer.

Really hope this will get rid of them
Waldo (January 24, 2009 at 5:49 pm)

To me it’s not the Javascript code that’s terribly interesting; it’s really the canvas element that makes this whole thing even remotely possible.

I didn’t realize until now that the canvas element had APIs that allowed you to stuff any image in and extract pixel data.
jfing googlit (January 24, 2009 at 7:25 pm)

Just google for “CAPTCHA Breaker” or pwntcha and you’d have much more advanced captcha breakers capable of much more. But javascript won’t be able to handle other captchas very well because you need good linear algebra libraries to break harder captchas. Even a normalization step such as PCA requires eigenvectors. This means that JS will be wildly inappropriate.
Nox (January 24, 2009 at 10:07 pm)

This is brilliant. And if it helps people see how utterly useless CAPTCHAs are, so much the better.
Steve (January 24, 2009 at 11:48 pm)

I remember a while back some friends of mine working on some code to try to figure out how to record license plates as a project as we drove down the road. So you’d be on your way to work, and you could have your car computer tell you how many times it has recorded the cars around you on previous occasions.

The OCR was one trick to it, the other was that different states have different color combination.

I never would have imagined that somebody could write this code in javascript, that’s impressive.
Lenin (January 25, 2009 at 1:42 am)

Very interesting hack.
Marc (January 25, 2009 at 7:12 am)

@ James, DoubtingThomas, Andrew Dupont:

Like podunk I’m genuinely interested if you *do* have an alternative to captchas, or what you think *should* be used in their place until a better solution comes along.

Unprotected comment posting or account creation does not seem to be an option to me, so what would you do about it?

I would love to get rid of captchas and use something better, but until something better is available, I see no alternative. Do you?
podunk (January 25, 2009 at 12:10 pm)

@IceBrain:

I certainly agree that this example code is excellent and a worthy contribution to the arms race between spammers and those who want to protect forums from them. I simply would like to know if anyone denigrating captchas in general has an idea that is better than getting rid of captchas altogether. I’ve seen too many unprotected forums and services decline at the hands of spammers to accept that the solution is to have no mechanism whatsoever.
Glenn (January 25, 2009 at 12:53 pm)

Amy Hoy had an interesting column on CAPTCHAs a couple of weeks ago:

http://www.slash7.com/articles/2009/1/6/are-you-human-how-captcha-asks-the-wrong-question-solves-nothing
coldclimate (January 25, 2009 at 3:22 pm)

Good lord, I think that’s impressive enough to make me re-install greasemonkey. I have no need of it, but it’s so damned impressive.
Iraê (January 25, 2009 at 5:53 pm)

Awesome!

I think this kind of algorithm could be used in browser speed tests. I wonder which browser is faster at running neural networks and getting canvas pixel data.
DK (January 26, 2009 at 8:06 am)

Iraê: I’ve found Chrome to be the fastest at executing heavy JS!
Captcha (January 26, 2009 at 1:25 pm)

Why do y’all hate Captchas so much? What’s the alternative? Do you want your blogs overrun with SPAM??
D-eye (January 26, 2009 at 3:56 pm)

WOW, I am amazed by this, OCR and Neural Nets in JS???

Great Work, really great!
Lindo (January 26, 2009 at 7:02 pm)

This neural network could easily be setup to learn of its own accord; via detection of the ‘success’ and ‘failure’ states on the response after submitting the CAPTCHA-“protected” form. When an input is confirmed as either successful or failed, only then does it become of any value for use with future inputs…so to “teach” a neural network you either need to tell it what’s right or wrong (exhaustively) or give it the ability to make that decision itself (which can be done quite easily in this case!).
In terms of handling CAPTCHAs of greater complexity…just up the number of inputs (and of course the number of NN layers), will take longer but _can be done_!
PS: I
Adam Kahtava (January 27, 2009 at 10:56 am)

This is really cool!
joe (January 27, 2009 at 6:03 pm)

cool!!
Alex (January 27, 2009 at 9:01 pm)

How could you transform this to make it a web-based application?
I have a php method because php can read grayscale images too, but I was wondering how you could put javascript on a site and make it work like the online example and then auto-get the download (I just want to know how I can put this to work in a web application by itself without needing to use grease monkey)
-greetz-
josh (January 28, 2009 at 4:06 am)

When you first posted you had a nice list of tools you used to prettify the source code. It appears as if you have elided this bit since the author has opened up the source. Could you please list those tools again? They sounded handy.
Evil Spammer (January 28, 2009 at 10:20 pm)

> Do you have an alternative to captchas that should be used instead? Or are you one of the nefarious few who profit from automated spamming of forums across the internet?

You’re pro-choice? So you like to KILL BABIES?

Seriously, get over yourself.
Davide Setti (February 4, 2009 at 4:25 pm)

It would be really interesting to see WorkerPool in action with a better implementation of this “OCR”
Giorgio (February 18, 2009 at 8:49 am)

How can I get this to work with jdownloader?
matt (February 20, 2009 at 9:16 am)

MegaUpload have redesigned their site (and captcha) so this no longer works. Ah well, great whilst it lasted.
Human_Bagel (February 23, 2009 at 3:07 pm)

Wow…
I am SO nerdgasming on this…just…WOW!

I had no idea this could even be done in JavaScript, I have to play with this concept on my own.

Again, this is absolutely incredible, the only CAPTCHA breaking scripts I have ever seen were based in Perl.

*hats off*
Congratulations!
Yansky (February 25, 2009 at 2:08 pm)

“The first step is pretty clear: The captcha is copied into the canvas and then converted to grayscale.”

I don’t quite understand this bit. I’ve looked through the source code, but I can’t figure out how it accesses the captcha image to be copied pixel-by-pixel.
Think about it (February 25, 2009 at 5:32 pm)

Think about it. If it weren’t for captchas, these comments would be nothing but spam.
David Harris (February 26, 2009 at 10:55 am)

how about the 25Billion simple things that humans could answer in an instant, assuming you’re capable of actually using the internet you should as well be able to. like for instance: won plus won iz how many?

now take that simple statement, apply whatever idiotic imaging you want to, put the statement “interperet the following” “decipher what follows” “figure this out” “the answer is” make all of them completely independant, and have lets just say for reference sake google (i’d assume they’d like to assist in spam prevention especially if they could make some profit from the process, as well as the capacity to do so) centralize distribution with simple logarithms routed through other partnership sites using a dedicated portion of resources, and thus potential for profiting from use of resources as well as no added cost to end user…………..
Dean (February 28, 2009 at 11:16 pm)

It is irrelevant how hard you make captchas because of the supply and demand effect. Spammers have a great need to do their spamming and an entire industry is being born from captchas. Captcha images are being routed to India to be broken by humans who are paid 80 cents per 1000 images broken. So you can come up with a million ideas to stop it but in the end it doesn’t matter because you will have humans cracking captchas by the thousands.
IceBrain (March 1, 2009 at 12:20 pm)

@Dean: If the spammer uses real people no kind of turing test can protect the system, but by forcing the spammer to hire people you’re raising the costs and lowering his profit. And at least we give more people a job :P
Will Dwinnell (March 19, 2009 at 12:51 pm)

This is a cool project. It would be interesting if you could provide a set of pre-processed data, so that we could try different character recognition solutions on it.
