Blog


Helping People With Data

From a BoingBoing post made earlier today a worthy cause came to my attention. There have been mountains of people in, and around, New Orleans who need to find out if they're relatives are OK - or even tell their relatives that they are OK. This data exists in poor databases that need to be improved - and the quality of the data better structured. If there are any data fans in the audience, then this is for you:

Social Source Foundation, CivicSpace Labs and Salesforce.com Foundation are working with a wide community to solve a simple problem.

Refugee records are in databases spread across the web. What if everyone published their data in a standard format into a central database? A refugee could look in one place for records from across the web.

We have released a data standard and specs for populating a central database. We need community organizers to lead the effort of populating the central database.

Katrina PeopleFinder Project: Implementing data exchange from existing sites to central database

I highly recommend that you give this site a visit and see what you can do.

Tags: scrape, database, hurricane, support, data

SQL Issues

I've been doing a lot of SQL work on the Schedule Maker these past couple of days, when I realized something. The SQL query for a relatively common command is going to be absolutely huge! Observe:

-- Get all my friend's current schedules:
SELECT * FROM   friends, semesters, schedules, schedule_sections,   
sections, section_times WHERE   friends.user_id='jeresig' AND   
semesters.semester_current = '1' AND   
schedules.semester_id = semesters.semester_id AND   
schedules.schedule_default = '1' AND   
schedules.user_id=friends.friends.friend_id AND
chedule_sections.schedule_id=schedules.schedule_id AND
sections.section_id=schedule_sections.section_id AND 
section_times.section_id=sections.section_id;

That's six joins. Insane. Especially since I was planning on having the results of this query be the main page whenever you log in to your account. So, this leads me to the next issue: How can this be resolved? It's not completely apparent, as of yet, but these are a few of the modifications that I've made so far:

  • I've switched from PostgreSQL back to MySQL. MySQL is faster and more supported. I was using PostgreSQL originally for its nice advanced features (stored procedures, schemas, etc.) when I found that I was only using schemas in the end. That's hardly a cause to stay with the database, especially considering that I can get better performance elsewhere. So, I'm still digging around as to how to work around this, I'll have to see.
  • Searches were being performmed in memory, this has been moved to MySQL Full Text searches. In memory searches are fast (obviously) however, costly. I was storing about 40mb of data in memory per school. These adds up after a while and doesn't work to well when I want to add more data or restart the server. First off, I had no idea that full text searches existed and man are they nice. Boolean queries, query expansion - the works! It's search engine for dummies time! I'm definitely going to have to play around with this some more.
  • Finally, I began to realize that so many of my data was going to be virtually static (schedules, course data, etc. all change very infrequently). This is starting to push me in the direction of having a static XML copy of all data in a nicely organized directory structure. Observe:
    /users/ /users/jeresig/
    /users/jeresig/friends/
    /users/jeresig/friends/fred -> /users/fred
    /users/jeresig/friends/ted -> /users/ted
    /users/jeresig/schedules/
    /users/jeresig/schedules/20043/mysched.xml
    /users/jeresig/schedules/20043/default.xml ->
      /users/jeresig/schedules/20043/mysched.xml
    /users/jeresig/schedules/current.xml ->
      /users/jeresig/schedules/20043/default.xml
    Now observe this simple shell query:
    cat /users/jeresig/friends/*/schedules/current.xml
    Using only that command I can effectively acheive the same result of the massive SQL query above. I'm definitely going to explore this more as I'm still not sure if this is a viable way of going about this, but I am intrigued by the simplicity in it. (Also, since all the files are static, the caching implications are going to be quite interesting.)

I hope to report back soon with some of my final findings. I hope to talk with some of my database guru friends and see what we can cook up as being an ideal solution.

Tags: programming, database, sql

Current Projects

jQuery JavaScript Library

jQuery

Comprehensive DOM, Event, Animation, and Ajax JavaScript Library.

Recent Projects

Pro JavaScript Techniques

JavaScript Book

The best techniques for professional JavaScript. Published by Apress.


Hosting provided by the cool dudes at Engine Yard.