January 23rd, 2009
A pretty amazing piece of JavaScript dropped yesterday and it's going to take a little bit to digest it all. It's a GreaseMonkey script, written by 'Shaun Friedle', that automatically solves captchas provided by the site Megaupload. There's a demo online if you wish to give it a spin.
Now, the captchas provided by the site aren't very "hard" to solve (in fact, they're downright bad - some examples are below):


But there are many interesting parts here:
- The HTML 5 Canvas getImageData API is used to get at the pixel data from the Captcha image. Canvas gives you the ability to embed an image into a canvas (from which you can later extract the pixel data back out again).
- The script includes an implementation of a neural network, written in pure JavaScript.
- The pixel data, extracted from the image using Canvas, is fed into the neural network in an attempt to divine the exact characters being used - in a sort of crude form of Optical Character Recognition (OCR).
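To make the pixel-data point concrete, here is a minimal sketch of how the array returned by getImageData is laid out: a flat run of bytes, four per pixel (red, green, blue, alpha), row by row. The helper name `getPixel` and the hand-built `sample` object are assumptions for illustration and are not part of the script.

```javascript
// Hypothetical helper (not from the script): read one pixel from an
// ImageData-like object, whose `data` is a flat RGBA byte array.
function getPixel(imageData, x, y) {
  var i = (y * imageData.width + x) * 4; // 4 bytes per pixel: R, G, B, A
  return {
    r: imageData.data[i],
    g: imageData.data[i + 1],
    b: imageData.data[i + 2],
    a: imageData.data[i + 3]
  };
}

// In a browser the object would come from something like:
//   var ctx = canvas.getContext("2d");
//   ctx.drawImage(img, 0, 0);
//   var imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
// Here a 2x1 stand-in (one red pixel, one blue pixel) is built by hand
// so the sketch runs outside a browser too.
var sample = {
  width: 2,
  height: 1,
  data: new Uint8ClampedArray([255, 0, 0, 255, 0, 0, 255, 255])
};
var second = getPixel(sample, 1, 0); // the blue pixel
```

The same `x*4 + y*4*width` index arithmetic shows up in every function of the script quoted in this post.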
If we crack open the source code we can see how it works. A lot of it comes down to how the captcha is implemented. As I mentioned before it's not a very good captcha. It has 3 letters, each in a separate color, using a possible 26 letters, and they're all in the same font.
The first step is pretty clear: The captcha is copied into the canvas and then converted to grayscale.
function convert_grey(image_data) {
  for (var x = 0; x < image_data.width; x++) {
    for (var y = 0; y < image_data.height; y++) {
      var i = x*4 + y*4*image_data.width;
      var luma = Math.floor(image_data.data[i] * 299/1000 +
        image_data.data[i+1] * 587/1000 +
        image_data.data[i+2] * 114/1000);
      image_data.data[i] = luma;
      image_data.data[i+1] = luma;
      image_data.data[i+2] = luma;
      image_data.data[i+3] = 255;
    }
  }
}
The canvas is then broken apart into three separate pixel matrices - each containing an individual character (this is quite easy to do - since each character is a separate color, they're broken apart just based upon the different colors used).
filter(image_data[0], 105);
filter(image_data[1], 120);
filter(image_data[2], 135);

function filter(image_data, colour) {
  for (var x = 0; x < image_data.width; x++) {
    for (var y = 0; y < image_data.height; y++) {
      var i = x*4 + y*4*image_data.width;
      // Turn all the pixels of the certain colour to white
      if (image_data.data[i] == colour) {
        image_data.data[i] = 255;
        image_data.data[i+1] = 255;
        image_data.data[i+2] = 255;
      // Everything else to black
      } else {
        image_data.data[i] = 0;
        image_data.data[i+1] = 0;
        image_data.data[i+2] = 0;
      }
    }
  }
}
Finally any extraneous noisy pixels are removed from the image (providing a clear character). This is done by looking for white pixels (ones that've been matched) that are surrounded (above and below) by black, un-matched, pixels. If that's the case then the matching pixel is simply removed.
// x and y come from the enclosing pixel loops
var i = x*4 + y*4*image_data.width;
var above = x*4 + (y-1)*4*image_data.width;
var below = x*4 + (y+1)*4*image_data.width;
// A white pixel with black directly above and below is treated as noise
if (image_data.data[i] == 255 &&
    image_data.data[above] == 0 &&
    image_data.data[below] == 0) {
  image_data.data[i] = 0;
  image_data.data[i+1] = 0;
  image_data.data[i+2] = 0;
}
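The fragment above only shows the body of the check. A self-contained sketch of the whole pass might look like the following. The function name `removeNoise` is hypothetical, and skipping the first and last rows (so that "above" and "below" always exist) is an assumption about how the edges could be handled, not something the quoted code confirms.

```javascript
// Sketch only: `removeNoise` is a hypothetical name, and the edge-row
// handling is an assumption rather than the script's actual behaviour.
function removeNoise(imageData) {
  var w = imageData.width;
  var h = imageData.height;
  var d = imageData.data;
  for (var x = 0; x < w; x++) {
    // Start at row 1 and stop before the last row so that the pixels
    // above and below are always inside the image.
    for (var y = 1; y < h - 1; y++) {
      var i = (y * w + x) * 4;
      var above = ((y - 1) * w + x) * 4;
      var below = ((y + 1) * w + x) * 4;
      // White pixel sandwiched between two black pixels: clear it.
      if (d[i] === 255 && d[above] === 0 && d[below] === 0) {
        d[i] = 0;
        d[i + 1] = 0;
        d[i + 2] = 0;
      }
    }
  }
  return imageData;
}
```

Note that this only removes specks that are a single pixel tall. A vertical run of two or more white pixels survives, which is presumably the point, since real letter strokes are taller than one pixel.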
We're getting really close to having a shape that we can feed into the neural network, but it's not completely there yet. The script then goes on to do some very crude edge detection on the shape. The script looks for the top, left, right, and bottom-most pixels in the shape and turns it into a rectangle - and converts that shape back into a 20 by 25 pixel matrix.
cropped_canvas.getContext("2d").fillRect(0, 0, 20, 25);
var edges = find_edges(image_data[i]);
cropped_canvas.getContext("2d").drawImage(canvas, edges[0], edges[1],
  edges[2]-edges[0], edges[3]-edges[1], 0, 0,
  edges[2]-edges[0], edges[3]-edges[1]);
image_data[i] = cropped_canvas.getContext("2d").getImageData(0, 0,
  cropped_canvas.width, cropped_canvas.height);
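The body of find_edges isn't quoted here, so the following is only a guess at its shape: scan for white pixels and keep track of the extremes. The name `findEdgesSketch` is hypothetical. Returning `[left, top, right, bottom]` matches how `edges` is indexed above, but treating `right` and `bottom` as exclusive (one past the last white pixel, so that `edges[2]-edges[0]` is the full width) is an assumption.

```javascript
// Hypothetical stand-in for find_edges: bounding box of all white pixels.
// Returns [left, top, right, bottom], with right/bottom exclusive
// (an assumption), or null when the image has no white pixels.
function findEdgesSketch(imageData) {
  var w = imageData.width;
  var h = imageData.height;
  var d = imageData.data;
  var left = w, top = h, right = -1, bottom = -1;
  for (var y = 0; y < h; y++) {
    for (var x = 0; x < w; x++) {
      // After filtering, a matched pixel has 255 in its red channel.
      if (d[(y * w + x) * 4] === 255) {
        if (x < left) left = x;
        if (x > right) right = x;
        if (y < top) top = y;
        if (y > bottom) bottom = y;
      }
    }
  }
  if (right < 0) return null;
  return [left, top, right + 1, bottom + 1];
}
```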
So - after all this work, what do we have? A 20 by 25 matrix containing a single rectangle, drawn in black and white. Terribly exciting.
That rectangle is then reduced even further. A number of strategically-chosen points are extracted from the matrix in the form of "receptors" (these will feed the neural network). For example, a receptor might look at the pixel at position 9x6 and see if it's "on" or not. A whole series of these states is computed (far fewer than the full 20x25 grid - a mere 64 states) and fed into the neural network.
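A rough sketch of the receptor idea, under stated assumptions: the coordinates in `RECEPTORS` are made up (the script has its own list of 64), `readReceptors` is a hypothetical name, and "9x6" is read here as x = 9, y = 6.

```javascript
// Hypothetical receptor coordinates as [x, y] pairs. The real script
// uses 64 of its own; three are enough to show the idea.
var RECEPTORS = [[9, 6], [3, 12], [15, 20]];

// Sample each receptor position and report whether that pixel is "on"
// (white after filtering). Returns an array of booleans.
function readReceptors(imageData, receptors) {
  return receptors.map(function (point) {
    var x = point[0];
    var y = point[1];
    var i = (y * imageData.width + x) * 4;
    return imageData.data[i] === 255;
  });
}
```

With 64 receptors, each 20x25 character (500 pixels) collapses to 64 booleans, which is the input the network sees.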
The question that you should be asking yourself now is: Why not just do a straight pixel comparison? Why all this mess with the neural network? Well, the problem is that with all of this reduction of information, a lot of ambiguity exists. If you run the online demo of this script you're more likely to find the occasional failure from the straight pixel comparison than from running it through the network. That being said, for most users, a straight pixel comparison would probably be sufficient.
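For comparison, a straight comparison could be as simple as counting mismatches against a stored template per letter and picking the closest one. This is a sketch of that general idea, not the demo's actual code: `countMismatches`, `closestTemplate`, and the shape of the `templates` object are all assumptions.

```javascript
// Count the positions where two equal-length boolean arrays differ.
function countMismatches(a, b) {
  var count = 0;
  for (var i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) count++;
  }
  return count;
}

// Hypothetical matcher: `templates` maps a letter to its stored states.
// Returns the letter with the fewest mismatches (first one wins on ties).
function closestTemplate(input, templates) {
  var best = null;
  var bestDist = Infinity;
  Object.keys(templates).forEach(function (letter) {
    var dist = countMismatches(input, templates[letter]);
    if (dist < bestDist) {
      bestDist = dist;
      best = letter;
    }
  });
  return { letter: best, mismatches: bestDist };
}
```

The weakness is visible in the tie-breaking: when two letters end up equally far from the input, this approach has nothing else to go on.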
The next step is attempting to guess the letter. The network is being fed with 64 boolean inputs (collected from one of the extracted letters) along with another series of pre-computed values. One of the concepts behind how a neural network works is that you pre-seed it with some of the results from a previous run. It's likely that the author of this script simply ran it again and again and collected a whole series of values to get an optimal score. The score itself may not have any particular meaning (other than to the neural network itself) but it helps to derive the value.
When the neural net is run it takes the 64 values that've been computed from one of the characters in the captcha and compares them against a single pre-computed letter of the alphabet. It continues in this manner, assigning a score for each letter of the alphabet (a final result might be 'A 98% likely', 'B 36% likely', etc.).
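The scoring step can be sketched with a deliberately simplified stand-in: one weighted sum per letter, squashed into the 0 to 1 range with a sigmoid. The script's actual network layout and its pre-computed values aren't quoted in this post, so the `model` shape, the function names, and the single-layer design here are all assumptions.

```javascript
// Squash any real number into the range (0, 1).
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Hypothetical single-layer scorer. `model` maps each letter to
// { weights: [...], bias: number }, standing in for the pre-computed
// values. `inputs` is the array of boolean receptor states.
function scoreLetters(inputs, model) {
  var scores = {};
  Object.keys(model).forEach(function (letter) {
    var unit = model[letter];
    var sum = unit.bias;
    for (var i = 0; i < inputs.length; i++) {
      sum += unit.weights[i] * (inputs[i] ? 1 : 0);
    }
    scores[letter] = sigmoid(sum);
  });
  return scores;
}

// Pick the letter with the highest score.
function bestLetter(scores) {
  return Object.keys(scores).reduce(function (a, b) {
    return scores[b] > scores[a] ? b : a;
  });
}
```

Running `scoreLetters` once per extracted character and taking `bestLetter` each time would give the three-letter guess.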
Going through the three letters in the captcha the final result is devised. It's not 100% perfect (I wonder if better scores would be achieved if the letter wasn't turned into a featureless rectangle before all these computations) but it's pretty good for what it is - and pretty amazing considering that it's all happening 100% in the browser using standards-based technology.
As a note - what's happening here is rather instance-specific. This technique *might* be able to work on a few more poorly-constructed captchas, but beyond that the complexity of most captchas just becomes too great (especially so for any client-side analysis).
I'm absolutely expecting some interesting work to be derived from this project - it holds a lot of potential.
Tags: javascript, canvas, greasemonkey
47 Comments on 'OCR and Neural Nets in JavaScript'
July 18th, 2005
If you haven't been keeping up on the recent security concerns with Greasemonkey - now's a good time to jump in. I had no idea that the problems were 'that bad' until today. I assumed that it was only possible to do something malicious within a user script, not outside of it (due to bad scoping issues). At least, until this post caught my eye.
Uninstall Greasemonkey altogether. At this point, I don't trust having it on my computer at all. I would think that whoever is in charge of addons.mozilla.org should immediately remove the Greasemonkey XPI and post a large warning in its place advising people to uninstall it. --Mark
Backtracking through the entire security thread brings up quite a few serious problems. Currently, it's possible to do the following things:
Do not fear! - Headway is already being made. The main concern is that it's possible to access all of the above data outside of a user script's scope. Once this is resolved (and the aforementioned hack may just do that) then Greasemonkey will be back on the fast-track.
Tags: bugs, greasemonkey, firefox, extensions, security
3 Comments on 'Serious Greasemonkey Security Problems'
June 27th, 2005
The results are in! My AniWiki project placed second in the Waxy Automated Wikipedia Contest. The first place entry was really smooth and nicely put together, but I was able to get some scraps based purely on technical development, which is cool.
Although John Resig's AniWiki entry had several innovations, Dan wins because of the elegant Wikipedia integration and the ease of use. Dan's entry was the first to use a slider for navigation, allowing you to scrub across revisions with changes reflected in real-time, and I like the ability to switch between selected arbitrary ranges using the existing Wikipedia buttons or the entire revision history. It looks like a seamless part of Wikipedia. He'll receive $200, one Flickr Pro account, a $20 Threadless gift certificate, and the Socialtext Starter package.
Second place goes to John Resig's innovative AniWiki. Although I didn't like the slideshow navigation as much, I was blown away by his graphical chart of activity over time and the visual diffs written entirely in Javascript. (Dan Phiffer was inspired to add that same feature to his script after seeing John's implementation.) For his excellent work, John will receive $50 and a Flickr Pro account.
Probably the best feature to come out of the contest is the highly-usable Javascript Diff Algorithm that I made - and I'm sure will get some use all around the 'net. On a side note, I really hope these sort of 'lazyweb-free-for-all' contests happen more often, I really enjoyed myself and got some cash for my hard work. Maybe there should be a web site dedicated to managing these mini-contests.... anyone?
Tags: wikipedia, waxy, animate, contests, greasemonkey, javascript, wiki
2 Comments on 'Wikipedia Animate Results'
March 28th, 2005
The big news of the weekend was the release of the Delicious Tag Auto-Complete extension using Greasemonkey.
Friday afternoon, Julia mentioned the fact that an auto-complete utility for delicious would be very handy. So, a couple hours later, I had hacked one together, using Greasemonkey as the delivery device. I publicized it through the delicious mailing list and posted a link to my account, and within 24 hours I had the top spot on delicious popular - which is rather exciting. (I have a screenshot at home, which I will upload later.) I'm really intrigued by how quickly the whole thing propagated. Apparently, it doesn't take much to spread the word around the community of delicious users. I have a couple more projects up my sleeve that I'll probably release here within the week, and we'll have to see how well they fare in comparison.
Tags: auto, del.icio.us, greasemonkey, javascript, tags
Comment on 'Delicious Tag Auto-Complete'
March 28th, 2005
At the last Social Computing Club meeting an interesting idea came up for discussion. We were trying to figure out what the easiest possible way to schedule an event could be. But in order to do so, we needed to figure out where people got their event notifications from, so I've compiled a mini-list.
- Email - A lot of people plan new events by email. Some of them even do it by attaching a new ical event to the email for the recipients to add to their calendar. Attaching an event is the most efficient way for the recipients to manage the event, though not necessarily for the sender. The proposed solution, by Jon Schull, was to simply forward the email that you received with a subject line of "Tomorrow at 8, Meeting with Fred" (for example) to a specified email box. This will automatically update your calendar with this event and attach the email as data. This would be very easy.
- Instant Messenger - I, personally, plan a lot of events through AIM. Similar to the email solution, one could simply forward a new event to an AIM bot. An issue with this, however, lies in the fact that you don't have the prior conversation automatically attached to the event (for context).
- Web Sites - Browsing around web sites and spotting a new event (such as 'FooBar Concert, 8pm, July 1, 2005') is the final location, that I can think of, where an event would exist. To test this theory, I wrote a quick GreaseMonkey hack which parses through some selected text, looks for something representing a date, and returns the date in a properly-formatted time (you can check it out here). Note: It doesn't actually do anything yet, but hopefully will soon. It currently only supports phrases like 'tomorrow', 'yesterday', 'evening', and 'morning' - which are much, much easier to find than all the possible date formats.
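As an illustration of the keyword approach described for the web site case, here is a minimal sketch. It is not the hack itself: `extractDate` is a hypothetical name, and the clock times chosen for 'morning' (9:00) and 'evening' (19:00) are arbitrary assumptions.

```javascript
// Hypothetical keyword-based date extraction. `now` is the reference
// Date; returns a new Date, or null when no keyword is found.
function extractDate(text, now) {
  var lower = text.toLowerCase();
  var result = new Date(now.getTime());
  var matched = false;

  // Relative day keywords shift the calendar date.
  if (/\btomorrow\b/.test(lower)) {
    result.setDate(result.getDate() + 1);
    matched = true;
  } else if (/\byesterday\b/.test(lower)) {
    result.setDate(result.getDate() - 1);
    matched = true;
  }

  // Time-of-day keywords set an assumed clock time.
  if (/\bmorning\b/.test(lower)) {
    result.setHours(9, 0, 0, 0);
    matched = true;
  } else if (/\bevening\b/.test(lower)) {
    result.setHours(19, 0, 0, 0);
    matched = true;
  }

  return matched ? result : null;
}
```

In a user script, the `text` argument could come from the current selection (for example via `window.getSelection().toString()`).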
In all, it's an intriguing problem: Constructing some form of an interface through which users can most easily maintain their calendar. One feature that I would find intriguing: if someone says to you "Are you available tomorrow evening?", your calendar application would be able to tell you what time to meet would be best, and maybe even what location. Anyway, it's all just a bunch of speculation right now, but the Lab for Social Computing is going to try hacking on it and see if they can take it somewhere. I'll be interested to see what the results look like.
Tags: date, event, greasemonkey, planning, schedule, time
Comment on 'Date Extraction'
March 14th, 2005
Something that I've been tinkering around with the past couple of days is the concept of providing visual cues to associate a name with a face, so to speak. For example, I find it to be much easier, mentally, to make the connection between someone's face and who they are than between someone's cryptic username and who they are (the username, in turn, is associated with someone's name, then associated to a face - a much, much slower process, for me, that results in a lot of dead ends). To combat this, I've been making a lot of changes to my personal data. The most notable of which is: Locating a headshot picture of all of your friends. In theory, I want to quickly and easily associate someone's online persona with their real life person. It's a challenge and I'm not yet sure how well it will go. However, in order to test it, the first step is to find as many friend headshots as possible. Here are a couple resources that I've used, thus far:
Now that you have a nice list of pictures for all of your friends, here is what you can do: Associate that picture with that person everywhere possible. The first thing that I did was to update the buddy icons for all of my AIM buddies. This gave me a highly usable visual buddy list to browse (also pictured above). The second step was to associate the images with all of my frequent email contacts. Thankfully, OSX makes this process terribly easy. I can take an email address/name from Mail.app, right-click, and add it to my address book. I can then edit the address book entry for that user and add their AIM buddy name. Now I've tackled two of my most frequently used forms of communication: Instant Messenger and Email, but that still leaves a large ocean uncharted: The web.
At this point in the game, I decided to go back to my old friend GreaseMonkey. Essentially, I wanted to write a script that would search through a page looking for certain names, nicknames, and usernames and insert an image to be associated with it. And so, that's what I did. Right now it's very rough around the edges and requires a lot of user customization.
- name2face - This script requires a lot of configuration. Please modify the data structure within the program to change which users you would like to match and display for, otherwise you'll just see a few of my friends, currently.
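To give a feel for the kind of data structure and matching involved, here is a simplified sketch that works on a plain string rather than on a live page. The `FACES` map, the example.com URLs, and the function names are all hypothetical. The real script's structure isn't shown here.

```javascript
// Hypothetical data: a name, nickname, or username -> headshot URL.
var FACES = {
  "Jane Doe": "http://example.com/faces/jane.jpg",
  "jdoe42": "http://example.com/faces/jane.jpg"
};

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Put a small image in front of every known name found in `text`.
// `text` is assumed to be plain text, not markup.
function addFaces(text, faces) {
  var names = Object.keys(faces);
  if (names.length === 0) return text;
  var pattern = new RegExp(
    "\\b(" + names.map(escapeRegExp).join("|") + ")\\b", "g");
  return text.replace(pattern, function (match) {
    return '<img src="' + faces[match] +
      '" alt="" width="16" height="16"> ' + match;
  });
}
```

A user script would apply something like this to the page's text nodes one at a time, so that names inside attributes or scripts are left alone.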
Ideally I'd like this plug-in to pull from some sort of a dynamic XML repository (possibly in FOAF format?) that could be updated easily. The results are very interesting. Browsing social networking sites, Gmail, and other forms of communication has taken on a whole new feel. I really feel that a service like this has a lot of potential and should be explored more fully, which I hope to do soon.
Tags: friends, greasemonkey, network, social, visual
27 Comments on 'Visual Friend Identification'