October 18th, 2005
I could give a thousand excuses, but I'm not sure they'd mean much. September and October have been incredibly busy for me: school, moving, sickness, and work have all piled on top of each other. That doesn't mean I've been slacking; I just haven't had time to write blog posts. Amusingly, according to my stats, over 900 of you now read this weblog via RSS, which is pretty slick.
Here's a quick recap of some of the things that happened these past two months:
Today I won the Quirksmode addEvent re-coding contest. The goal was to write implementations of addEvent and removeEvent that were completely cross-browser and usable. I like my submission simply for its brevity, and so did the judges.
Some of my past research into instant messaging was mentioned in the RIT Reporter (my school newspaper). It's a bit light and 'fluffy', but it's a good starter piece. If you're interested in this sort of thing, contact me or check out the project page.
I was mentioned, in passing, in The Economist, in an article on the recent trend of mash-up applications (specifically concerning the Yahoo Traffic RSS feed). I think the article is no longer freely accessible, but you may be able to read it with a login.
Finally, one of my Google Maps projects went live. It was for a Florida newspaper, the Herald Tribune, and covered the Save Our Homes initiative. My application allowed users to browse through their homes and see how their tax rates compared to their neighbors'. From everything I've heard, it's been quite successful. It makes me happy to bring cool technology (Google Maps) to people who wouldn't have used or seen it otherwise.
I'm going to be releasing a full-blown product within the next week or so. It's very simple, but exciting nonetheless, and I hope people will get a kick out of it.
Tags: maps, google, news, im, javascript, rss, magazine
Comment on 'Fall Recap'
July 10th, 2005
One thing I've been analyzing in my spare time is the number of readers of this weblog. Measuring this is tricky, because a number of factors get in the way (people can click your RSS feed and 'view' it in their browser, but that doesn't mean they read it on a regular basis). Regardless, the easiest way to get an approximate reader count is to add up the subscriber numbers that news aggregators report in their user agent strings. There's an excellent write-up on common user agent formats over at InsideGoogle.
I've also pulled together some code, from a Perl application that I'm writing in my spare time, if you're interested in tracking something like this yourself.
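# Each feed name maps to every URL that serves that feed.
my %rss = (
    "Blog" => [
        "/index.rdf",
        "/?p=rss",
        "/blog/index.rdf",
    ],
    "Links"    => [ "/links/index.rdf" ],
    "Projects" => [ "/projects/index.rdf" ],
);

# Words that aggregators put next to a subscriber count in their user agent,
# e.g. "155 subscribers" or "subscribers 155".
my @rss_names = qw( users subscribers readers );
my %rss_count = ();   # feed name => estimated number of readers
my %rss_ip    = ();   # feed name => { IP => 1 }, so each IP is counted once per feed

# Returns 0 if the request was for one of the feeds (and records it), 1 otherwise.
sub rss {
    my ( $page, $user, $ip ) = @_;
    foreach my $i ( keys %rss ) {
        foreach ( @{ $rss{$i} } ) {
            if ( $page eq $_ ) {
                unless ( exists $rss_ip{$i}{$ip} ) {
                    # Count one reader unless the user agent reports a number.
                    my $count = 1;
                    foreach ( @rss_names ) {
                        if ( $user =~ /(\d+) $_/i ) {
                            $count = $1;
                        } elsif ( $user =~ /$_ (\d+)/i ) {
                            $count = $1;
                        }
                    }
                    $rss_count{$i} += $count;
                    $rss_ip{$i}{$ip} = 1;
                }
                return 0;
            }
        }
    }
    return 1;
}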
In a nutshell, here's what the code does: it analyzes each RSS feed, and each feed can have multiple URLs. The URLs for the RSS feeds are specified in the first declaration:
my %rss = (
"Blog" => ["/index.rdf","/?p=rss","/blog/index.rdf"],
"Links" => ["/links/index.rdf"],
"Projects" => ["/projects/index.rdf"]
);
(This is the configuration I use on my weblog.) I especially like being able to map multiple URLs to a single feed, because misbehaving news aggregators don't always follow permanent redirects after a feed has moved. This way everyone reading the same content gets counted together.
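The nested loops in rss() compare the requested page against every URL of every feed, for every line of the log. A minimal sketch of an alternative, assuming the same %rss layout: invert the hash once into URL => feed name, so each request needs only a single lookup. The %feed_for_url name is hypothetical and not part of the code above.

```perl
use strict;
use warnings;

# Same feed configuration as above: feed name => list of URLs.
my %rss = (
    "Blog"     => [ "/index.rdf", "/?p=rss", "/blog/index.rdf" ],
    "Links"    => [ "/links/index.rdf" ],
    "Projects" => [ "/projects/index.rdf" ],
);

# Hypothetical reverse index: URL => feed name, built once up front.
my %feed_for_url;
foreach my $feed ( keys %rss ) {
    foreach my $url ( @{ $rss{$feed} } ) {
        $feed_for_url{$url} = $feed;
    }
}

# "/index.rdf", "/?p=rss" and "/blog/index.rdf" all resolve to "Blog".
print "$_ => $feed_for_url{$_}\n" for sort keys %feed_for_url;
```

With this in place, the outer two loops in rss() collapse to `my $i = $feed_for_url{$page}; return 1 unless defined $i;`, which matters once the logs get large.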
The next part of RSS tracking is figuring out whether the IP of the RSS user is unique. Currently, this is the only way to count users who don't go through a public aggregator and instead pull the feed with a desktop news application, since those applications don't report a subscriber count.
The main subroutine accepts three arguments: $page takes the URI of the requested page (e.g. /index.html), $user takes the user agent string, and $ip takes the user's IP address. The best way to use the subroutine is to iterate over your web server access logs (whatever form they're in), parse out those three pieces of information, and feed them into it.
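As an example of the parsing step, here is a rough sketch that assumes your server writes Apache's "combined" log format; other formats need a different pattern. The parse_combined helper and the sample log lines (including their IPs and user agents) are hypothetical, made up for illustration. The helper returns the three fields in the order rss() expects.

```perl
use strict;
use warnings;

# Assumed format: Apache "combined" log lines.
my $combined = qr{
    ^(\S+)                  # client IP
    \s+\S+\s+\S+\s+         # identd and auth user (ignored)
    \[[^\]]*\]\s+           # timestamp (ignored)
    "\S+\s+(\S+)[^"]*"      # request line: method, URI, protocol
    \s+\d{3}\s+\S+\s+       # status code and byte count
    "[^"]*"\s+              # referer (ignored)
    "([^"]*)"               # user agent
}x;

# Hypothetical helper: returns ( $page, $user, $ip ), or an empty list
# if the line doesn't look like a combined-format entry.
sub parse_combined {
    my ($line) = @_;
    return unless $line =~ $combined;
    my ( $ip, $page, $user ) = ( $1, $2, $3 );
    return ( $page, $user, $ip );
}

# Made-up sample lines standing in for a real access log.
my @sample = (
    '192.0.2.10 - - [10/Jul/2005:13:55:36 -0400] "GET /index.rdf HTTP/1.1" 200 2326 "-" "Bloglines/3.0 (http://www.bloglines.com; 155 subscribers)"',
    '198.51.100.7 - - [10/Jul/2005:13:56:02 -0400] "GET /?p=rss HTTP/1.0" 200 2326 "-" "NetNewsWire/2.0 (Mac OS X)"',
);

foreach my $line (@sample) {
    my @fields = parse_combined($line);
    next unless @fields;    # skip lines that don't match the format
    my ( $page, $user, $ip ) = @fields;
    # With the rss() subroutine from above loaded, you would call:
    #   rss( $page, $user, $ip );
    print "$ip requested $page as '$user'\n";
}
```

For a real log, replace the @sample loop with something like `while ( my $line = <STDIN> )` and pipe the log file in.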
Once you've parsed all of the requests in your logs, you'll have a nice little hash that looks something like this:
%rss_count = (
"Blog" => 155,
"Links" => 31,
"Projects" => 45
);
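From there, a small report is easy to produce. A sketch, plugging in the example numbers above: it sorts the feeds from most to fewest readers and adds a grand total. The column layout is just one choice.

```perl
use strict;
use warnings;

# Example totals, shaped like the %rss_count hash above.
my %rss_count = ( "Blog" => 155, "Links" => 31, "Projects" => 45 );

# Print feeds from most to fewest readers, followed by a grand total.
my $total = 0;
foreach my $feed ( sort { $rss_count{$b} <=> $rss_count{$a} } keys %rss_count ) {
    printf "%-10s %5d\n", $feed, $rss_count{$feed};
    $total += $rss_count{$feed};
}
printf "%-10s %5d\n", "Total", $total;
```

Keep in mind that the total adds up readers across feeds, so someone subscribed to both the Blog and Links feeds is counted twice.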
Unfortunately, you have to take these figures with a grain of salt, since some users request a feed once but never become regular subscribers. You'll probably also notice that your subscription numbers fluctuate from day to day; this is mostly because different numbers of people read on different days of the week (weekends are very slow reading days).
So, play around with this code and have some fun. I'm hoping to release a full stats app that I've developed (using the above code) soon.
Tags: rss, blogs, news, aggregator, data, analysis, stats
Comment on 'Number of RSS Readers'
June 3rd, 2005
Google just released an interesting feature that I'm sure is going to get a lot of attention in the coming days. The premise is that you generate a site map of your web site, submit it to Google every time your content changes, and your site gets crawled sooner. I'm going to investigate this more, but on its own it sounds promising.
The important part is what is tucked away in the FAQ:
We also accept RSS 2.0 and Atom 0.3 syndication feeds, using the link/lastMod fields.
This is huge! It means that any web site with an RSS feed can simply submit its feed URL and get crawled sooner than before. Maintain a blog? Just submit your RSS feed on the Add a Sitemap page and you're off to the races.
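For a site without a feed, you still need to produce the site map itself. Here is a minimal sketch of generating one in Perl. It assumes the sitemaps.org 0.9 schema that the protocol later standardized on; the format Google accepted at launch may have used a different namespace, so check the current documentation before submitting. The page URLs and dates are hypothetical placeholders.

```perl
use strict;
use warnings;

# Hypothetical pages and last-modified dates (YYYY-MM-DD).
my @pages = (
    [ "http://example.com/",              "2005-06-03" ],
    [ "http://example.com/blog/sitemaps", "2005-06-03" ],
);

# Escape the characters that aren't allowed raw inside XML text.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&apos;/g;
    return $s;
}

# Build the sitemap document from [ URL, lastmod ] pairs.
sub build_sitemap {
    my @entries = @_;
    my $xml = qq{<?xml version="1.0" encoding="UTF-8"?>\n}
            . qq{<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n};
    foreach my $entry (@entries) {
        my ( $loc, $lastmod ) = @$entry;
        $xml .= "  <url>\n"
              . "    <loc>" . xml_escape($loc) . "</loc>\n"
              . "    <lastmod>$lastmod</lastmod>\n"
              . "  </url>\n";
    }
    $xml .= "</urlset>\n";
    return $xml;
}

print build_sitemap(@pages);
```

Escaping matters mostly for URLs with query strings: an unescaped `&` in a `<loc>` element makes the whole file invalid XML.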
Tags: google, sitemaps, news, webmasters, crawl, robots
3 Comments on 'Google Sitemaps'