Blog


Overview of Processing

This weekend I gave two talks at BarCamp Rochester (which was very well put together and quite enjoyable) - one on jQuery and a very quick one on the Processing language. I've deconstructed my slides into some bullet points here. If you're not familiar with the language, or what it's capable of, this should give you a good overview.

Processing is a data visualization programming language.

It has three components:

  • The Processing language
  • The Processing drawing API
  • The implementation (in Java - can optionally pass the drawing API through to OpenGL).

The Processing Language and API

  • Strictly typed
  • Has classes, inheritance
  • Includes a bunch of globally-accessible functions (the drawing API - very flat and PHP-like).

Basic Program Structure

Two core methods: setup() and draw()

  • Very OpenGL-like
  • draw() is called continually at a specific framerate (if none is specified then it goes as fast as possible)

Simple example: Drawing a continuous line with the mouse.

void setup() {
  size(200, 200);
  background(102);
}

void draw() {
  stroke(255);
  if(mousePressed) {
    line(mouseX, mouseY, pmouseX, pmouseY);
  }
}

Initialization

  • setup() is called initially
  • size(...) - set the width/height of the drawing area
  • Can include calls to any other number of methods, such as: background(...) - (which draws and fills a background with a specified color).
  • Note: All colors are done in RGBA. background(102) is equivalent to background(102,102,102,255) (opaque gray color)
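
The grayscale shorthand is easy to picture outside of Processing. Here's a minimal Python sketch of the idea, not Processing's actual implementation: expand_color is a hypothetical helper name, and it assumes the default 0-255 RGB color mode.

```python
def expand_color(*channels):
    """Expand Processing-style color arguments into an (r, g, b, a) tuple.

    Hypothetical helper; assumes the default 0-255 RGB color mode.
    """
    if len(channels) == 1:      # gray
        (gray,) = channels
        return (gray, gray, gray, 255)
    if len(channels) == 2:      # gray, alpha
        gray, alpha = channels
        return (gray, gray, gray, alpha)
    if len(channels) == 3:      # red, green, blue
        return (*channels, 255)
    if len(channels) == 4:      # red, green, blue, alpha
        return tuple(channels)
    raise ValueError("expected 1 to 4 color channels")


print(expand_color(102))  # (102, 102, 102, 255)
```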

The draw() loop

  • draw() gets called as fast as possible, unless a frame rate is specified (with frameRate(20), for example). You can disable any looping by calling noLoop().
  • stroke() sets color of drawing outline (the color of lines, points, and the outsides of polygons)
  • fill() sets inside color of drawing (inside of polygons)
  • mousePressed is true if mouse is down
  • Very different from typical asynchronous events - since the program keeps looping we get state updates automatically. (Unless you define mousePressed() as a global function - then it'll be called as a callback.)
  • mouseX, mouseY - mouse position, pmouseX, pmouseY - previous mouse position in last draw() call
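
The polling model described above can be mimicked in a few lines. This is a rough Python sketch of the idea, not how Processing is implemented: Sketch, run, and the list of mouse samples are all hypothetical names, and mouse input is faked with a prerecorded list of positions.

```python
class Sketch:
    """Toy model of a setup()/draw() loop (all names are hypothetical)."""

    def __init__(self):
        self.mouse_x = self.mouse_y = 0
        self.pmouse_x = self.pmouse_y = 0
        self.mouse_pressed = False
        self.lines = []

    def setup(self):
        # Runs once, before the first frame.
        self.lines.clear()

    def draw(self):
        # Poll the current state instead of waiting for an event callback.
        if self.mouse_pressed:
            self.lines.append(
                (self.mouse_x, self.mouse_y, self.pmouse_x, self.pmouse_y)
            )

    def run(self, mouse_samples):
        """Call setup() once, then draw() once per (x, y, pressed) sample."""
        self.setup()
        for x, y, pressed in mouse_samples:
            # Remember where the mouse was during the previous frame.
            self.pmouse_x, self.pmouse_y = self.mouse_x, self.mouse_y
            self.mouse_x, self.mouse_y, self.mouse_pressed = x, y, pressed
            self.draw()
        return self.lines


sketch = Sketch()
segments = sketch.run([(10, 10, False), (20, 15, True), (30, 25, True)])
print(segments)  # [(20, 15, 10, 10), (30, 25, 20, 15)]
```

Each recorded segment joins the current mouse position to the previous one, which is why dragging produces a continuous line.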

Drawing

Different drawing methods: line(), rect(), arc(), ellipse(), point(), quad(), triangle(), bezier(), etc.

All use stroke(), strokeWeight(), and fill().

Can also draw complex polygons using beginShape(), endShape(), and vertex() - like in this example.

  fill(127);
  beginShape();
  for (int i=0; i<segments; i++){
    vertex(ground[i].x1, ground[i].y1);
    vertex(ground[i].x2, ground[i].y2);
  }
  vertex(ground[segments-1].x2, height);
  vertex(ground[0].x1, height);
  endShape(CLOSE);

The Canvas

Very OpenGL-like. You can mutate the canvas rendering using: translate(), scale(), and rotate().

You can also save and restore the state of the canvas using: pushMatrix() and popMatrix().

A basic example using pushMatrix/popMatrix: A movable arm.
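
To show what saving and restoring the canvas state means, here's a small Python sketch of a 2D transform stack. It's only an illustration under my own assumptions: the Canvas class and its method names are hypothetical stand-ins that mirror the Processing calls, and new transforms are post-multiplied onto the current matrix, the way OpenGL-style APIs do it.

```python
import math


class Canvas:
    """Toy 2D transform stack (hypothetical; mirrors the Processing names)."""

    def __init__(self):
        # Affine matrix stored as rows [a, b, tx] and [c, d, ty]; identity.
        self.matrix = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
        self.stack = []

    def _apply(self, m):
        # Post-multiply the current matrix by m.
        a, b, tx, c, d, ty = self.matrix
        ma, mb, mtx, mc, md, mty = m
        self.matrix = [
            a * ma + b * mc, a * mb + b * md, a * mtx + b * mty + tx,
            c * ma + d * mc, c * mb + d * md, c * mtx + d * mty + ty,
        ]

    def translate(self, x, y):
        self._apply([1.0, 0.0, x, 0.0, 1.0, y])

    def rotate(self, angle):
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        self._apply([cos_a, -sin_a, 0.0, sin_a, cos_a, 0.0])

    def push_matrix(self):
        # Save a copy of the current transform.
        self.stack.append(list(self.matrix))

    def pop_matrix(self):
        # Restore the most recently saved transform.
        self.matrix = self.stack.pop()

    def transform(self, x, y):
        a, b, tx, c, d, ty = self.matrix
        return (a * x + b * y + tx, c * x + d * y + ty)


canvas = Canvas()
canvas.translate(50, 50)          # move the origin to the shoulder
canvas.push_matrix()
canvas.rotate(math.pi / 2)        # swing the arm a quarter turn
elbow = canvas.transform(40, 0)   # end of a 40-unit arm segment
canvas.pop_matrix()               # back to the unrotated shoulder frame
shoulder = canvas.transform(0, 0)
```

After the pop, the rotation is gone but the translation made before the push is still in effect, so anything drawn next starts from the shoulder again.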

Classes

Can hold data, do inheritance.

Example: Bouncing an object off of rocky terrain

class Ground {
  float x1, y1, x2, y2, x, y, len, rot;
  Ground(){  }
  Ground(float x1, float y1, float x2, float y2) {
    this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
    x = (x1+x2)/2;
    y = (y1+y2)/2;
    len = dist(x1, y1, x2, y2);
    rot = atan2((y2-y1), (x2-x1));
  }
}

Math

A whole mess of math functions are provided, as well: dist(), map(), constrain(), abs(), floor(), ceil(), random(), noise(), atan2(), cos(), sin(), pow(), sqrt(), radians().
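
Two of these, map() and constrain(), come up constantly, so here's a rough Python equivalent of each. This is a sketch rather than Processing's source: remap is my own name for map() (to avoid clashing with Python's built-in), and like the original it does not clamp its result.

```python
def constrain(value, low, high):
    """Clamp value into the range [low, high]."""
    return max(low, min(value, high))


def remap(value, start1, stop1, start2, stop2):
    """Linearly re-map value from one range onto another (no clamping)."""
    return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1))


# A mouse position on a 200-pixel-wide canvas, mapped to a gray level.
print(remap(50, 0, 200, 0, 255))  # 63.75
print(constrain(300, 0, 255))     # 255
```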

Images

Can be used to load in external images. Example: Animation of guy dancing.

int numFrames = 12; // The number of frames in the animation
int frame = 0;
PImage[] images = new PImage[numFrames];
void setup(){
  size(200, 200);
  frameRate(30);
  for(int i=0; i<numFrames; i++) {
    String imageName = "PT_anim" + nf(i, 4) + ".gif";
    images[i] = loadImage(imageName);
  }
} 
void draw() {
  frame = (frame+1)%numFrames;
  image(images[frame], 0, 0);
}
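
The two bits of bookkeeping in that example, the zero-padded file names produced by nf(i, 4) and the wrap-around frame counter, look like this in a quick Python sketch. The frame_names helper is a hypothetical name of my own; only the "PT_anim" prefix and the ".gif" extension come from the example above.

```python
def frame_names(prefix, count, digits=4, extension=".gif"):
    """Build zero-padded file names, e.g. PT_anim0000.gif."""
    return [f"{prefix}{i:0{digits}d}{extension}" for i in range(count)]


names = frame_names("PT_anim", 12)
print(names[0], names[-1])  # PT_anim0000.gif PT_anim0011.gif

# Advance the frame counter the same way draw() does, wrapping at the end.
frame = 0
shown = []
for _ in range(14):
    frame = (frame + 1) % len(names)
    shown.append(frame)
```

Because the counter is incremented before the image is drawn, the first frame shown is index 1, and index 0 only appears after index 11.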

Demos

Some fun demos that I really like:

  • Zipdecode - rendering all zipcodes in the country and searching them in real time.
  • Substrate - rendering a piece of art.
  • Genetic Trees - selectively breed and mutate trees.
  • World is not round - live VJing a song using Processing and a set of physical input controls.

Books

Tags: graphics, processing, data, visualization, barcamp

JavaScript Engine Speeds

Recently, I've been spending a lot of time analyzing the speed of pure JavaScript engines, looking at how well they perform and what their particular strengths and weaknesses are. To start with, I analyzed the bleeding-edge code from:

Right now I'm only looking at pure, JavaScript-only tests (no tests of DOM or other APIs) and am NOT looking at the speed of the browsers' native JavaScript engine implementations. (So, even though you may see a speed for a particular engine, that does not directly correlate to the speed of the JavaScript running within the browser itself. There's always a significant amount of overhead required to run JavaScript code securely within a browser, thus the efficiency of that security layer will frequently become a deciding factor in the results.)

The four engines that I picked all had complete JavaScript implementations and usable JavaScript shells (that way I could feed my tests in and have them cleanly run).

To browse the results I've pulled together a simple application that can be used to view a representation of the data from all the major JavaScript engines paired with the code from the tests which run them.

Right now the browser works fine in Firefox, is quirky in Opera and Safari, and explodes in IE (it requires canvas support). I'll finesse it into shape when I have a little more time this week.

Note: This demo uses a bunch of functionality from the new jQuery UI library, including themes, tabs, accordion, and resizables.

Tags: analysis, speed, data, javascript, ecmascript

The Netflix Prize

I don't think I could possibly be any more giddy about anything than I am about The Netflix Prize.

In short: Netflix's vote prediction algorithm has a deviation of 0.95 stars when predicting your vote for a movie. If you can do 10% better, they'll give you $1 million.

That's awesome and all, but what's really awesome is their amazing training dataset. This is every data miner's wet dream: 100,000,000 votes, 17,000 movies, 500,000 users.

They have two tests that you can run: One against your known data, and one that you'll submit to Netflix. As far as I can tell, your standing (aka, your current deviation) is made public. The lower your number, the higher your rank. Every year that the algo isn't improved by 10%, $50,000 is paid out to the current leader.
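
To make the scoring concrete, here's a minimal Python sketch. It assumes the "deviation" is a root mean squared error (RMSE), which is the measure the contest rules use; the sample predictions and votes are made up purely for illustration, and the 0.95 baseline is the rounded figure quoted above.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual star votes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up predictions versus made-up real votes.
score = rmse([3.5, 4.0, 2.0], [4, 4, 1])

baseline = 0.95            # rounded figure quoted in this post
target = baseline * 0.90   # a 10% improvement on the baseline
print(round(score, 4), round(target, 3))  # 0.6455 0.855
```

Squaring the errors means one badly missed vote hurts far more than several near misses, which is worth keeping in mind when tuning against this number.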

Another thing that I find to be interesting: Netflix gets the score that they do without assuming anything about the movie titles, genre, actors, etc. They just do straight number crunching. I'm impressed.

I've already got some techniques that I wanna try. I've got a feeling that I'm overly optimistic at this point, and that I'm going to be highly disappointed when I see my first score. But first, I have to generate my test bed and get to work. This is so cool.

I don't know what it is with me and large, nicely formatted, datasets, but I don't think there's anything that can get me more excited.

Tags: netflix, data_mining, data

Helping People With Data

A worthy cause came to my attention from a BoingBoing post made earlier today. There are huge numbers of people in and around New Orleans who need to find out if their relatives are OK - or even tell their relatives that they are OK. This data exists in poor databases that need to be improved, with the data itself better structured. If there are any data fans in the audience, then this is for you:

Social Source Foundation, CivicSpace Labs and Salesforce.com Foundation are working with a wide community to solve a simple problem.

Refugee records are in databases spread across the web. What if everyone published their data in a standard format into a central database? A refugee could look in one place for records from across the web.

We have released a data standard and specs for populating a central database. We need community organizers to lead the effort of populating the central database.

Katrina PeopleFinder Project: Implementing data exchange from existing sites to central database

I highly recommend that you give this site a visit and see what you can do.

Tags: scrape, database, hurricane, support, data

Dictionaries and Word Lists

The other day I was working on a new application which needed to process large batches of words - as comprehensively as possible. After some quick searches I found that there are (unsurprisingly) a number of freely available dictionary/wordlist files available on the Internet.

The first repository that I tried was one hosted on Sourceforge, simply called 'Wordlist'. Many of the lists hosted on that page are spell-checker centric, but the 12 Dicts package, in particular, was rather comprehensive. It originally contained 12 dictionaries, a number which has since been pruned down. Within the package there are a number of different dictionaries: some contain old English words, some have hyphenated words, some have acronyms, etc. You need to use the grid that they provide to determine which one is best suited for you. After doing some work with this package, however, I determined that it simply wasn't comprehensive enough for me (at 74,000 words).

After some more digging I came across the public domain list called ENABLE, which is overwhelmingly comprehensive. It's used in just about every word game on the planet - containing approximately 173,000 words! The list is very clear-cut and has no limitations imposed as to the words contained within it. If you need a word list for any of your upcoming projects, I highly recommend it!
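
Once you have a list like this, checking words against it takes very little code. Here's a minimal Python sketch, assuming the file holds one word per line; load_words is a hypothetical helper name, and the demo writes a tiny stand-in file rather than reading a real word list.

```python
import os
import tempfile


def load_words(path):
    """Read a one-word-per-line file into a set of lowercase words."""
    words = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            word = line.strip().lower()
            if word:  # skip blank lines
                words.add(word)
    return words


# Demo with a tiny stand-in file instead of a real word list.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as handle:
    handle.write("Apple\nbanana\n\ncherry\n")
    demo_path = handle.name

words = load_words(demo_path)
os.remove(demo_path)
print("banana" in words, "durian" in words)  # True False
```

A set keeps each lookup fast even with a couple hundred thousand entries, which matters when you're pushing large batches of words through it.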

Tags: data, words, dictionary

Universal Transverse Mercator (UTM)

A side project that I'm currently working on, involving maps, had a bit of data that I was unfamiliar with. I'll start by saying that I'm very familiar with Latitude/Longitude - it's something that everyone learns in school. However, the data that I was provided with was of an entirely different sort and labeled as 'Easting' and 'Northing'. A couple of Google searches later brought me to an article explaining the concept of something called Universal Transverse Mercator (UTM). The article is very math heavy, but the premise is: The earth is broken down into narrow 'zones' (60 of them, each 6 degrees of longitude wide) that extend north and south of the equator. Within each zone two measurements are taken, in metres: a northing, measured north from the equator, and an easting, measured east relative to the zone's central meridian. So your final figure is something like Zone: 17, Easting: 300,000, Northing: 4,000,000 (I just made those numbers up). The reasoning for this is that if your measurements are within a narrower 'slice' of the world, your results immediately become more accurate (especially considering the curvature of the earth). Here's a figure showing all the different zones, taken from this page on different coordinate systems:

Now, using the chart above I determined the zone that all of my coordinates fell in - but the problem is: How to convert them into Lat/Long, which every piece of mapping software understands? There are a number of Java Applets which will do this task, but I wanted something that I could automate. This is where Perl, as always, comes to the rescue - there is a module for it! The module is very lightweight; it simply implements the algorithms specified in the article mentioned before. Using that module I was able to quickly run through my data and get perfect numbers out - concluding my UTM adventure. As always, more information can be found on Wikipedia.
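
The module handles the real conversion math, so I won't repeat it here, but the zone lookup itself is simple enough to sketch. This is a small Python illustration (not the Perl module's code): utm_zone and central_meridian are hypothetical helper names, and the sketch uses the standard 6-degree grid while ignoring the special-case zones around Norway and Svalbard.

```python
import math


def utm_zone(longitude):
    """Return the standard UTM zone number (1-60) for a longitude in degrees.

    Ignores the special-case zones around Norway and Svalbard.
    """
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    # Zones are 6 degrees wide, numbered eastward from 180 degrees west.
    zone = int(math.floor((longitude + 180.0) / 6.0)) + 1
    return min(zone, 60)  # a longitude of exactly 180 belongs to zone 60


def central_meridian(zone):
    """Longitude, in degrees, of the line down the middle of a zone."""
    return zone * 6 - 183


print(utm_zone(-81.0), central_meridian(17))  # 17 -81
```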

Tags: maps, geography, data, conversion, utm, geo

Data Grab Bag

  • In the new release of Google Earth, there's an exciting feature that lets you dynamically load geographical data in from other sources. They even have a markup language for it called KML. People have already started putting Flickr photos on top of the maps.
  • On a similar note, if you have a Geotagged RSS feed that you want to put onto a Google Map, you should consider giving this utility a try.
  • The newer versions of Microsoft Word save their documents in an XML format. So it was only a matter of time before someone wrote an XSL template to generate these documents.
  • Do you have a lot of text that you want converted into speech? You should check out the say command on OS X.
  • Google now has built-in currency conversion. I've been waiting for this for a long time. Considering that you've been able to convert units of measurement and weight for the longest time, this step only seemed logical.
  • Interested to see how the moods of large-scale communities fluctuate over time? The Livejournal Mood Browser does just that, with informative graphs too!
  • What's better than a free textbook on Graph Theory? Not much!

Tags: graph, rss, osx, data, geo, theory, free, livejournal, xml, xslt, geotag, google, earth, kml

Number of RSS Readers

A piece of information that I've been analyzing, in my spare time, is the number of readers on this web log. How this is done can be very tricky, as there are a number of factors (people can click your RSS feed and 'view' it in their browser, but it doesn't mean that they're reading it on a regular basis). Regardless, the easiest way to figure out, approximately, how many readers you have is to count the numbers provided by news aggregators in their user agent string. Some information on common user agent formats can be found in an excellent write up on InsideGoogle.

I've also pulled together some code, from a Perl application that I'm writing in my spare time, if you're interested in tracking something like this yourself.

my %rss = (
  "Blog" => ["/index.rdf","/?p=rss","/blog/index.rdf"],
  "Links" => ["/links/index.rdf"],
  "Projects" => ["/projects/index.rdf"]
);

my @rss_names = qw( users subscribers readers );
my %rss_count = ();
my %rss_ip = ();

# Tally one request. Returns 0 if $page is a known feed URL, 1 otherwise.
sub rss {
  my ( $page, $user, $ip ) = @_;

  foreach my $i ( keys %rss ) {
    foreach ( @{ $rss{ $i } } ) {
      if ( $page eq $_ ) {
        unless ( exists $rss_ip{ $i }{ $ip } ) {
          my $count = 1;
          foreach ( @rss_names ) {
            if ( $user =~ /(\d+) $_/i ) {
              $count = $1;
            } elsif ( $user =~ /$_ (\d+)/i ) {
              $count = $1;
            }
          }
          $rss_count{$i} += $count;
          $rss_ip{ $i }{ $ip } = 1;
        }
        return 0;
      }
    }
  }

  return 1;
}

In a nutshell, this is what the code is doing: Each RSS feed is analyzed, and each feed can have multiple URLs. The URLs for the RSS feeds are specified in the first declaration:

my %rss = (
  "Blog" => ["/index.rdf","/?p=rss","/blog/index.rdf"],
  "Links" => ["/links/index.rdf"],
  "Projects" => ["/projects/index.rdf"]
);

(This piece of code is what I use on my weblog.) I especially like being able to map multiple URLs to one RSS feed, since misbehaving news aggregators don't always follow updated permanent redirects. This way I can make sure that everyone reading the same content is counted together.

The next aspect of RSS tracking lies in figuring out if the IP of the RSS user is unique, or not. Currently, this is the only way to track users who don't use some form of a public aggregator and only pull information using some form of a desktop news application.

The main subroutine, itself, accepts three arguments. $page takes the URI of the requested page (e.g. /index.html). $user takes the user's user agent string. $ip takes the user's IP. The best way to use this subroutine is by iterating over your web server access logs (whatever form they may be in), parsing out the three pieces of information described above, and feeding it into this method.
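
How you pull those three values out depends entirely on your log format. As one possible approach, here's a Python sketch (not part of my Perl application) that assumes the common "combined" access log layout; the pattern, the parse_log_line name, and the sample line are all illustrative assumptions.

```python
import re

# Assumes the "combined" layout:
# ip ident user [time] "METHOD page PROTOCOL" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?:\S+) (?P<page>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return (page, user_agent, ip) for one log line, or None if no match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    return match.group("page"), match.group("user_agent"), match.group("ip")


# Made-up sample line, using a documentation-only IP address.
sample = (
    '203.0.113.7 - - [10/Oct/2005:13:55:36 -0700] '
    '"GET /index.rdf HTTP/1.0" 200 2326 "-" '
    '"Bloglines/3.1 (http://www.bloglines.com; 12 subscribers)"'
)
print(parse_log_line(sample))
```

The tuple comes back in the same page, user agent, IP order that the subroutine expects, so the same pattern should port to a Perl regex without much trouble.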

After you're done parsing all the requested information from your logs, you now have a nice little hash of information, that will look something like this:

%rss_count = (
  "Blog" => 155,
  "Links" => 31,
  "Projects" => 45
);

Unfortunately, you end up having to take these figures with a grain of salt, considering that users sometimes request a feed but don't end up becoming regular subscribers. You'll probably notice that your subscription numbers fluctuate on a day-by-day basis. This is mostly due to the fact that different numbers of people read on different days of the week (weekends are very slow reader days).

So, play around with this code, have some fun - I'm hoping to release a full stats app that I've developed (using the above code), here soon.

Tags: rss, blogs, news, aggregator, data, analysis, stats
