RssLibraries

This wiki is in the process of being archived due to lack of usage and the resources necessary to serve it — predominantly to bots, crawlers, and LLM companies. Edits are discouraged.
Pages are preserved as they were at the time of archival. For current information, please visit python.org.
If a change to this archive is absolutely needed, requests can be made via the infrastructure@python.org mailing list.

Articles:

Libraries:

Feed Parser

Feed Parser is an awesome RSS and Atom parsing library. It is now hosted on Google Code and SourceForge - Universal Feed Parser on Google Code (Project Page on SourceForge).

Universal Feed Parser documentation.

Download it, and then start a Python prompt in the same directory.

import feedparser

python_wiki_rss_url = ("http://www.python.org/cgi-bin/moinmoin/"
                       "RecentChanges?action=rss_rc")

feed = feedparser.parse(python_wiki_rss_url)

You now have the RSS feed data for the PythonInfo wiki!

Take a look at it; there's a lot of data there.

Of particular interest:

feed["bozo"] - 1 if the feed data isn't well-formed XML
feed["url"] - URL of the RSS feed
feed["version"] - version of the RSS feed
feed["channel"]["title"] - "PythonInfo Wiki" - title of the feed
feed["channel"]["description"] - "RecentChanges at PythonInfo Wiki." - description of the feed
feed["channel"]["link"] - link to RecentChanges, the web page associated with the feed
feed["channel"]["wiki_interwiki"] - "PythonInfo" - for a wiki, the wiki's preferred InterWiki moniker
feed["items"] - a gigantic list of all of the RecentChanges items

For each item in feed["items"], we have:

item["date"] - "2004-02-13T22:28:23+08:00" - ISO 8601 date
item["date_parsed"] - (2004, 2, 13, 14, 28, 23, 4, 44, 0) - the same moment as a time tuple
item["title"] - title of the item
item["summary"] - change summary
item["link"] - URL of the page
item["wiki_diff"] - for a wiki, a link to the diff for the page
item["wiki_history"] - for a wiki, a link to the page history
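For instance, a minimal change report can be built from those keys. The feed dict below is a hand-written stub with the same shape feedparser returns; a real run would get feed from feedparser.parse:

```python
def format_changes(feed):
    """Render one line per RecentChanges item: date, title, link."""
    lines = []
    for item in feed["items"]:
        lines.append("%s  %s  <%s>" % (item["date"], item["title"], item["link"]))
    return "\n".join(lines)

# Stub data with the shape described above (a real feed comes
# from feedparser.parse):
feed = {"items": [{"date": "2004-02-13T22:28:23+08:00",
                   "title": "FrontPage",
                   "link": "http://www.python.org/cgi-bin/moinmoin/FrontPage"}]}
print(format_changes(feed))
```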

Aggregating Feeds with Feed Parser

If you're pulling down a lot of feeds, and aggregating them:

First, you may want to use Future threads to pull down your feeds. That way, you can send out five requests immediately and wait for them all to come back at once, rather than sending out one request, waiting for it to come in, sending out another, waiting for it, and so on.

import feedparser
from future import Future  # the third-party Futures module

hit_list = ["http://...", "...", "..."]

# Fire off all of the requests at once...
future_calls = [Future(feedparser.parse, rss_url) for rss_url in hit_list]
# ...then block until every one of them has come back.
feeds = [future_obj() for future_obj in future_calls]
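The same fan-out pattern can also be written with the standard library's concurrent.futures (available since Python 3.2), with no third-party Future class. This is a sketch, assuming feedparser is installed; fetch_all is a helper name chosen here, not part of any library:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(parse, urls, max_workers=5):
    """Call parse(url) for every URL in worker threads, and return
    the parsed feeds in the same order as the input list."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(parse, urls))

# Typical use (requires network access):
#   import feedparser
#   feeds = fetch_all(feedparser.parse, hit_list)
```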

Now that you have your feeds, extract all the entries.

entries = []
for feed in feeds:
    entries.extend(feed["items"])

...and sort them, by SortingListsOfDictionaries:

sorted_entries = sorted(entries, key=lambda entry: entry["date_parsed"])
sorted_entries.reverse()  # newest first
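This works because date_parsed is a time tuple, and Python compares tuples element by element, so sorting on it orders entries chronologically. A self-contained sketch with made-up entries (real ones would come from the aggregation step above):

```python
entries = [
    {"title": "Older change", "date_parsed": (2004, 2, 12, 10, 0, 0, 3, 43, 0)},
    {"title": "Newer change", "date_parsed": (2004, 2, 13, 14, 28, 23, 4, 44, 0)},
]

# Tuples compare element by element, so sorting on the time tuple
# orders entries chronologically; reverse=True puts the newest first.
sorted_entries = sorted(entries, key=lambda e: e["date_parsed"], reverse=True)
print([e["title"] for e in sorted_entries])  # ['Newer change', 'Older change']
```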

Congratulations! You've aggregated a bunch of changes!

Contributors

LionKimbro

Discussion

Getting the "author"/"contributor" out of most ModWiki RSS feeds with the feedparser module is a bit confusing at the moment. As of feedparser 3.3, it ends up in the "rdf_value" attribute of the entry.


I'm moving the following out of the main text:

Are you concerned that I'm encouraging people to duplicate effort, making aggregator after aggregator after aggregator?

That's not the case; there are still good reasons to write aggregators.

In particular: I wrote the code because I needed a MoinMoin macro that aggregated RSS feeds.

I imagine that there are other good reasons to write aggregating code.

That said, RawDog is Python, and it is using Feed Parser, so I've linked it at the bottom of the page.

-- LionKimbro 2004-12-27 08:44:40


Next, I moved this out of the main text:

This makes sense if you're just writing a client aggregator for reading blogs. But if you're compiling parts of a web page, then you want to generate a response within 20 seconds, not 3 minutes.

Similarly, I'm removing:

Maybe there's some other page on some other wiki where this belongs. I don't think that space is here.

I'm mainly concerned here with giving an example of how the RSS library works, the kinds of things you can do with it, and how to combine its use with the Futures module.

This isn't really about writing aggregators.

-- LionKimbro 2004-12-27 08:44:40


Can someone please give a sample of how Feed Parser works along with Future threads to make an RSS aggregator?

Manasa