I decided on about a dozen categories to use with my DIY blog aggregator (QuakerQuaker).
I only want to pull in posts that community members have tagged
specifically for my site, so we use a community identifier: a unique prefix
that isn’t likely to be used by others.
This post will show you how to pull in tagged feeds from three sources: the Del.icio.us social bookmarking system, the Flickr photo sharing site and Google Blog Search.
Step 1: Pick a community designator
I’ve been using the community name followed by a dot. The prefix
goes in front of the category description to make a set of unique tags for
the aggregator. When someone wants to add something for the site they
tag it with this “community.category” tag. In my example, when someone
wants to list a new Quaker blog they use “quaker.blog”, “quaker” being
the community name, “blog” being the category name for the “New Blogs”
page.
Step 2: Collect the community prefix and category name in Pipes
You begin by going into Pipes and pulling over two text inputs: one for
the community prefix, the other for the specific category.
Step 3: Construct these into tags
Now use the “String Concatenation” module to turn these inputs into the
“community.category” form. The community input goes into the top slot,
a dot goes into the second slot and the category input goes into the last slot.
There’s one wrinkle: when a Flickr tag has a dot in it, Flickr automatically strips the dot in the resulting RSS feed.
So with Flickr you want your tag to be “communitycategory” without a
dot. Simple enough: just pull another “String Concatenation” module
onto your Pipes work space. It should look the same except that it
won’t have the middle slot with the dot.
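Outside Pipes, the two concatenations amount to the following. This is a minimal Python sketch; the function name `build_tags` is made up for illustration and is not part of Pipes.

```python
from typing import Tuple


def build_tags(community: str, category: str) -> Tuple[str, str]:
    """Return the dotted tag and the dot-less Flickr variant."""
    dotted = community + "." + category   # three slots: community, dot, category
    dotless = community + category        # two slots: no dot, for Flickr
    return dotted, dotless


dotted_tag, flickr_tag = build_tags("quaker", "blog")
print(dotted_tag)   # quaker.blog
print(flickr_tag)   # quakerblog
```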
Step 4: Turn these tags into RSS URLs
Pull three “URLBuilder” modules into Pipes, one for each of the
services we’re going to query. For the Base, use the non-tag specific
part of the URL that each service uses for its RSS feeds. Here they are:
Service | Base URL
Del.icio.us | http://del.icio.us/rss/tag
Flickr | http://api.flickr.com/services/feeds
Google Blog Search | http://blogsearch.google.com
Under path elements, put the correct tag: for Del.icio.us and Google it should be the community.category tag, for Flickr the dot-less communitycategory tag.
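To make the Base-plus-path idea concrete, here is a rough Python equivalent of what a URLBuilder module does with those two fields. The helper `build_url` is hypothetical, and the sketch only joins the pieces as described above. It doesn’t check that the resulting addresses are live feeds, and a service may expect extra path elements or query parameters that aren’t covered here.

```python
from urllib.parse import quote

# Base URLs as listed above.
BASES = {
    "delicious": "http://del.icio.us/rss/tag",
    "flickr": "http://api.flickr.com/services/feeds",
    "google": "http://blogsearch.google.com",
}


def build_url(base: str, *path_elements: str) -> str:
    """Join a base URL and its path elements with single slashes."""
    parts = [base.rstrip("/")]
    parts += [quote(element, safe="") for element in path_elements]
    return "/".join(parts)


urls = [
    build_url(BASES["delicious"], "quaker.blog"),
    build_url(BASES["flickr"], "quakerblog"),   # dot-less tag for Flickr
    build_url(BASES["google"], "quaker.blog"),
]
for url in urls:
    print(url)
```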
Step 5: Fetch and Dedupe
Fetch is the Pipes module that pulls in URLs and outputs RSS feeds. It can also combine them. Send each URLBuilder output into the same Fetch routine.
Since it’s possible that you’ll have duplicate posts, use the “Unique” module to deduplicate entries by URL.
Through a little trial and error I’ve determined that in cases of
duplicates, feeds lower in the Fetch list trump those higher. In the
actual Pipe powering my aggregator I pull a second Del.icio.us feed: my
own. I have that as the last entry in the Fetch list so that I can
personally override every other input.
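The override behavior can be imitated in a few lines of Python. This is only a sketch of the behavior described above, not of how Pipes is implemented; the function name `merge_unique`, the `link` key and the sample entries are all invented for the example.

```python
def merge_unique(feeds):
    """Merge feeds, keeping one entry per link; later feeds win."""
    by_link = {}
    for feed in feeds:                        # same order as the Fetch list
        for entry in feed:
            by_link[entry["link"]] = entry    # a later duplicate replaces the earlier one
    return list(by_link.values())


# Made-up sample data: a community feed and a personal override feed.
community_feed = [
    {"link": "http://example.com/a", "title": "Community title"},
    {"link": "http://example.com/b", "title": "Another post"},
]
my_feed = [
    {"link": "http://example.com/a", "title": "My corrected title"},
]

merged = merge_unique([community_feed, my_feed])
for entry in merged:
    print(entry["link"], "-", entry["title"])
```

Putting the personal feed last in the list is what lets its version of a duplicate entry survive.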
Step 6: Sort by Date
From my experiments, Pipes seems to order the output entries by
descending date, which is probably what you want. But I want to show
how Pipes can work with “dc” data, the “Dublin Core” model that allows
you to extend standard RSS feeds (see yesterday’s post for more on this).
Google Blog Search and Del.icio.us feeds use the “dc:date” field to
record the time when the post was made. Flickr uses “dc:date.Taken” to
pass on the photograph’s metadata about when it was taken. Pipes’
“Rename” module lets you copy both fields into one you create (I’ve
simply used “date”), which you can then run through its “Sort” module.
Again, it’s a moot point since Pipes seems to do this automatically.
But it’s good to know how to manipulate and rename “dc” data if only
because many PHP parsers have trouble laying it out on a webpage.
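For anyone doing the same thing outside Pipes, the Rename-then-Sort step looks roughly like this in Python. It’s a sketch under assumptions: entries are plain dictionaries keyed by the field names above, the timestamps are UTC strings like 2007-02-10T12:00:00Z (real feeds may use other formats), and `normalize_dates` is a made-up helper name.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"            # assumed timestamp format
DATE_FIELDS = ("dc:date", "dc:date.Taken")    # field names from the feeds


def normalize_dates(entries):
    """Copy whichever dc date field is present into a common "date" field."""
    result = []
    for entry in entries:
        copy = dict(entry)                    # leave the original entry untouched
        for field in DATE_FIELDS:
            if field in copy:
                copy["date"] = datetime.strptime(copy[field], DATE_FORMAT)
                break
        result.append(copy)
    return result


# Made-up sample entries.
entries = [
    {"title": "Blog post", "dc:date": "2007-02-10T12:00:00Z"},
    {"title": "Photo", "dc:date.Taken": "2007-02-12T08:30:00Z"},
    {"title": "Bookmark", "dc:date": "2007-02-11T17:45:00Z"},
]

# Keep only entries that ended up with a date, then sort newest first.
dated = [e for e in normalize_dates(entries) if "date" in e]
newest_first = sorted(dated, key=lambda e: e["date"], reverse=True)
for entry in newest_first:
    print(entry["date"].isoformat(), entry["title"])
```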
Update: it’s all moot. According to a ZDNet blog, “Pipes now automatically appends a pubDate tag to any RSS feed that has any of the other allowable date tags.” This is nice: no need to hack the date every time you want to make a Pipe!
Step 7: Output
The final step for any Pipe is the “Pipe Output” module.
In action
You can see this published Pipe here, and copy and play with it yourself. The result lets you build an RSS feed based on the two inputs.