Thursday, August 13, 2009

Pimpin' the blog part #2

In this episode of Pimpin' the blog we are going to have a look at how I will sync my data between Blogger.com and my new hosting platform. This involves a not well known feature from .NET in relation to RSS and using Linq2SQL. Eventually we'll end up with a tool that reads by Blogger RSS Feed and stores it in a SQL Server 2005 Database (or any other compatible database for that matter).

Thoughts on synchronization
As I laid out in part #1 I'm far from ready to give up my Blogger account as I still have many things to replace before I can do so. However I don't feel much for keeping two stores for the same information synchronized by hand. I have better things to do with my time than that (altough not that much better :-) ).
The first thing that came to mind was actually RSS as it keeps all the blog aggregate sites up to date as well, so why not use that? Besides, the final trigger to do all this was to customize my RSS feed in the first place, so why not use it as a source?
As a good developer I also went to see if I had any alternatives. I could opt for a HTTP/HTML spider, but it would be awkward, messy and complex. I could try and automate the export process for Blogger blogs, but again, awkward, messy and complex.

Loading a feed
So RSS it is then. The entire process is relatively simple:
  1. Get the feeds xml content
  2. Parse the feed into articles, etc.
  3. Store the articles and related data in the database
The first step is easy. Just take an HttpWebRequest and point it at the feed. Here is the code:

HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(FeedUrl);
if (UseProxy)
{
request.Proxy = newWebProxy(ProxyUrl);
}

HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream responseStream = response.GetResponseStream();

So as you can see. it first sets up the HttpWebRequest, so it can get to the RSS Feed (using a proxy if needed). Then it just gets the response stream which then contains the XML for the RSS Feed.

The next step got me thinking. The first solution that popped into my mind was to use Linq2Xml. However that would involve a lot of code to get to all the different parts of information I needed. I googled around, read some blogs, until I ran into someone mentioning the SyndicationFeed object that's new in .NET Framework 3.5. I figured I would give that a try to see how it works and I could always go back to parsing the feed myself.

Here is the code to actually load the response stream into a SyndicationFeed instance:

XmlReader reader = XmlReader.Create(responseStream);

Feed = SyndicationFeed.Load(reader);

Wow, that was easy, now wasn't it? Keep in mind however that you do need to add a reference to both System.ServiceModel and System.ServiceModel.Web to make this work.
What I ended up with is a class that would handle loading the feed into a SyndicationFeed object that handled everything I needed in under fifty lines of code!

So that tackled step two of the process.
All that's left is to store it into the database. As I mentioned earlier, I chose to use Linq2Sql to handle this for me. Why? I have had extensive experience with Entity Framework and I do think that for large solutions it can be a good choice, however it does take a lot of effort to make it do what you want, which is not what I needed here.
I read up on Microsofts strategy on data access and why both Entity Framework and Linq2Sql are pushed and found out that Linq2Sql is actually meant to support RAD on smaller projects, or at least for smaller data access layers. As my data model only consists of three tables, I guess my project would qualify as small.



I'm not going to bother you with the details on how I stored my articles trough Linq2Sql and just go ahead and post a link to the code below.
The main program to control this is more interesting:

string feedUrl = ConfigurationManager.AppSettings[FeedUrlConfigKey];
RssFeedReader reader = newRssFeedReader(feedUrl);
reader.UseProxy = UseProxyTrueValue.Equals(ConfigurationManager.AppSettings[UseProxyConfigKey],
StringComparison.OrdinalIgnoreCase);
if (reader.UseProxy)
{
reader.ProxyUrl = ConfigurationManager.AppSettings[ProxyUrlConfigKey];
}
reader.ReadFeed();

foreach (SyndicationItem feedItem in reader.Feed.Items)
{
List<string> categories = newList<string>();
foreach (SyndicationCategory category in feedItem.Categories)
{
categories.Add(category.Name);
}
StorageConnection.AddArticle(feedItem.Title.Text, feedItem.Summary.Text, feedItem.PublishDate.Date,
feedItem.Id, categories.ToArray());
}


First I set up my RssFeedReader instance and call ReadFead on it. This results in a SyndicationFeed on which I iterate trough the Items collection. Then I get the categories and feed them into my StorageConnection class which makes sure everything is properly stored in the database. The StorageConnection class makes sure nothing is duplicated even if the same article is added more then once.

Here is the source:


In part #3 of this series, we'll look into building a new RSS feed with some customizations, based on the data we've retrieved today.

No comments:

Post a Comment