Tag Archives: web

Shaving yaks and finding feeds

So I had some interesting ideas I wanted to play with to do with keeping on top of streams of information.

Of course, I needed some streams of information to keep on top of in order to do this. I decided to go with my RSS feeds (the other obvious source being Twitter).

To do that I needed a database of feed entries. So I created a small program to do that (I really should just have used feed-bag, but there were some things I wanted to tweak and integrate so I didn’t).

Unfortunately, for whatever reason, I ended up with a lot of URLs in my OPML that pointed to sites rather than feeds, or to something invalid altogether. I’m not sure offhand whether this was an import problem or a problem with the Google Reader export.

So, I thought, let’s do our damnedest to correct URLs: if it points to a site, do feed discovery; if it redirects, follow the redirects; and so on. It can’t be that hard.
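For the well-behaved case, feed discovery means reading the `<link rel="alternate">` tags in a page's head. Here is a minimal sketch of that in Python, using only the standard library. It is not how feedify does it; the names `FeedLinkParser` and `discover_feeds` are made up for illustration, and it assumes you have already fetched the HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds in <link rel="alternate"> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. <link rel>), so normalise them.
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").strip().lower()
        href = attr.get("href", "").strip()
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the URL of the page itself.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return the feed URLs found in an HTML document, in page order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

Fetching the page is left out on purpose. `urllib.request.urlopen` follows HTTP redirects by default, which covers the easy part of "follow redirects"; pages that advertise no feed at all, or advertise one incorrectly, are exactly the cases this sketch does not handle.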

Cue me getting very angry. Suffice it to say, if you do what I did and foolishly expect people on the web to follow standards, you are very mistaken.

Anyway, after much hacking around trying to get this to work, I decided to codify the various tricks into a library so you don’t have to share my anger. I’ve called this library feedify. This is very rude of me, as there’s another Ruby library called feedify, but given that it hit 0.0.1 in January 2008 and hasn’t been updated since, I don’t feel too bad about stomping on its namespace.

Additionally, I’ve put up an HTTP interface to it. If you go to http://feedify.merobe.com/feed/(some url) then it will try to find a feed associated with that URL and redirect you to it. You can also run this service yourself; it’s included in the GitHub project.
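As a rough sketch, building a lookup URL for that service from Python might look like the following. The function name `feedify_lookup_url` is made up, and percent-encoding the target URL is an assumption on my part: the pattern above only says the URL goes after `/feed/`, not how it should be escaped.

```python
from urllib.parse import quote

# Base path taken from the URL pattern given in the post.
FEEDIFY_BASE = "http://feedify.merobe.com/feed/"


def feedify_lookup_url(site_url, base=FEEDIFY_BASE):
    """Build the lookup URL for a site.

    Assumption: the target URL is percent-encoded in full before being
    appended. The service may equally accept it unescaped.
    """
    return base + quote(site_url, safe="")
```

Requesting the resulting URL with any HTTP client that follows redirects should then land you on the feed, if one is found.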

This is all very rough and liable to change at the moment. If you have any bug reports of URLs it misses or gets wrong I’d be very interested to receive them.

This entry was posted in programming.

Rube Goldberg 2.0

I just voted up a link on reddit. The link doesn’t matter, but it started me thinking about the chain of events that followed it.

First, a bit of JavaScript submits some information back to reddit’s server. The Python code on the receiving end stores this.

At some point later, some Ruby code running on my slicehost requests a JSON file from reddit. The Python serves this up, the Ruby fetches it, parses it and hands it off to delicious.com, where a mix of PHP and C++ (I think) stores it in their backend.
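The parsing step in the middle of that chain is small. Here is a sketch of it, in Python rather than the Ruby I actually use, and assuming the JSON has the usual reddit listing shape (`data` → `children` → `data` with `title` and `url` fields). The function name `liked_links` is made up, and the fetch and the hand-off to delicious are left out.

```python
import json


def liked_links(listing_json):
    """Extract (title, url) pairs from a reddit-style JSON listing.

    Assumption: the document looks like
    {"data": {"children": [{"data": {"title": ..., "url": ...}}, ...]}}.
    Entries without a "url" field are skipped.
    """
    listing = json.loads(listing_json)
    links = []
    for child in listing.get("data", {}).get("children", []):
        item = child.get("data", {})
        if "url" in item:
            links.append((item.get("title", ""), item["url"]))
    return links
```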

At some point later, a PHP script running on my blog fetches that data from delicious. It dumps it into a MySQL database.

At some point later still, someone comes along to my website. They see a link to the thing I voted up on reddit, served up by PHP.

Alternative titles: “Polyglotism now!”, “…except for too much indirection”, “He Knows The Unix Way”.

This entry was posted in programming.