Ed The Archivist

Digital Preservation and Archives

BlogForever: Thoughts about blog data and metadata

From the BlogForever blog.

During the ArchivePress project at ULCC, we briefly considered the data and metadata generally made available with blogs and blog posts. As ArchivePress focused on the representations of blogs in newsfeeds, we examined the metadata that is generated and exposed in common in the newsfeeds of three of the most common blog platforms: WordPress, Blogger and Typepad. Blogger and Typepad prefer the Atom newsfeed format; WordPress (particularly WordPress.com) prefers RSS, though it can be made to publish Atom feeds too. This analysis was done about a year ago and things may have changed, but here is a summary of what we found.

For each Blog, the following core information is available in the feeds:

| Information | WordPress (RSS) | Blogger (Atom) | Typepad (Atom) |
|---|---|---|---|
| Feed Unique ID | NA | feed/id | feed/id |
| Blog URL | rss/channel/link | feed/link@rel="alternate" | feed/link@rel="alternate" |
| Blog Title | rss/channel/title | feed/title | feed/title |
| Blog Description | rss/channel/description | feed/subtitle | feed/subtitle |
| Date of last update | rss/channel/lastBuildDate | feed/updated | feed/updated |
| Generating software | rss/channel/generator | feed/generator | feed/generator |
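As an illustration of the mapping above, here is a minimal Python sketch that reads the blog-level fields from either feed type using only the standard library. The sample feeds, the `blog_metadata` function and its dictionary keys are hypothetical names made up for this example; they are not ArchivePress code.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feeds, trimmed to the blog-level fields in the table.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>An example RSS blog</description>
    <lastBuildDate>Mon, 25 Apr 2011 10:00:00 +0000</lastBuildDate>
    <generator>http://wordpress.org/?v=3.1</generator>
  </channel>
</rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.com,2011:blog-1</id>
  <title>Example Blog</title>
  <subtitle>An example Atom blog</subtitle>
  <link rel="self" href="http://example.com/feed"/>
  <link rel="alternate" href="http://example.com/"/>
  <updated>2011-04-25T10:00:00Z</updated>
  <generator>Blogger</generator>
</feed>"""


def blog_metadata(xml_text):
    """Return the blog-level fields from an RSS 2.0 or Atom feed."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        channel = root.find("channel")
        return {
            "id": None,  # RSS has no feed-level unique ID (NA in the table)
            "url": channel.findtext("link"),
            "title": channel.findtext("title"),
            "description": channel.findtext("description"),
            "updated": channel.findtext("lastBuildDate"),
            "generator": channel.findtext("generator"),
        }
    if root.tag == ATOM + "feed":
        # The blog URL is the href of the link whose rel is "alternate".
        url = None
        for link in root.findall(ATOM + "link"):
            if link.get("rel") == "alternate":
                url = link.get("href")
                break
        return {
            "id": root.findtext(ATOM + "id"),
            "url": url,
            "title": root.findtext(ATOM + "title"),
            "description": root.findtext(ATOM + "subtitle"),
            "updated": root.findtext(ATOM + "updated"),
            "generator": root.findtext(ATOM + "generator"),
        }
    raise ValueError("not an RSS 2.0 or Atom feed")


print(blog_metadata(RSS_SAMPLE))
print(blog_metadata(ATOM_SAMPLE))
```

Returning the same keys for both formats is only one possible design; it makes the gap in RSS (no feed-level ID) visible as an explicit `None`.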

For each Post, we established that the following core information is available in the newsfeeds:

| Information | WordPress (RSS) | Blogger (Atom) | Typepad (Atom) |
|---|---|---|---|
| Post Unique ID | rss/channel/item/guid@isPermaLink | feed/entry/id | feed/entry/id |
| Post Title | rss/channel/item/title | feed/entry/title | feed/entry/title |
| Post Summary | rss/channel/item/description | NA | feed/entry/summary |
| Post URL | rss/channel/item/link | feed/entry/link@rel="alternate" | feed/entry/link@rel="alternate" |
| Date of publication | rss/channel/item/pubDate | feed/entry/published | feed/entry/published |
| Date of last update | NA | feed/entry/updated | feed/entry/updated |
| Post Author | rss/channel/item/dc:creator (namespace: rss/xmlns:dc) | feed/entry/author/name | feed/entry/author/name |
| Post Category | rss/channel/item/category | feed/entry/category@term | feed/entry/category@term |
| Post Content | rss/channel/item/content:encoded (namespace: rss/xmlns:content) | feed/entry/content | feed/entry/content |
| Post Comments | rss/channel/item/comments | feed/entry/link@rel="replies" | feed/entry/link@rel="replies" |
| Post Comments Feed | rss/channel/item/wfw:commentRss | NA | NA |
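The post-level mapping can be sketched in the same way. The Python below is a rough illustration under stated assumptions: the sample feeds, the function names `rss_posts` and `atom_posts`, and the dictionary keys are all invented for this example, while the namespace URIs are the standard ones for Atom, Dublin Core, the RSS content module and the Well-Formed Web comment API.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/elements/1.1/",
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wfw": "http://wellformedweb.org/CommentAPI/",
}

# Hypothetical sample feeds containing one post each.
RSS_SAMPLE = """<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/">
  <channel>
    <item>
      <guid isPermaLink="false">http://example.com/?p=1</guid>
      <title>First post</title>
      <description>Short summary</description>
      <link>http://example.com/first-post/</link>
      <pubDate>Mon, 25 Apr 2011 10:00:00 +0000</pubDate>
      <dc:creator>Ed</dc:creator>
      <category>metadata</category>
      <category>RSS</category>
      <content:encoded><![CDATA[<p>Full text</p>]]></content:encoded>
      <comments>http://example.com/first-post/#comments</comments>
      <wfw:commentRss>http://example.com/first-post/feed/</wfw:commentRss>
    </item>
  </channel>
</rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:example.com,2011:post-1</id>
    <title>First post</title>
    <link rel="alternate" href="http://example.com/first-post.html"/>
    <link rel="replies" href="http://example.com/first-post.html#comments"/>
    <published>2011-04-25T10:00:00Z</published>
    <updated>2011-04-26T09:00:00Z</updated>
    <author><name>Ed</name></author>
    <category term="metadata"/>
    <content type="html">&lt;p&gt;Full text&lt;/p&gt;</content>
  </entry>
</feed>"""


def _atom_link(entry, rel):
    """Return the href of the first atom:link with the given rel, if any."""
    for link in entry.findall("atom:link", NS):
        if link.get("rel") == rel:
            return link.get("href")
    return None


def rss_posts(xml_text):
    """Map each rss/channel/item to the post fields in the table."""
    root = ET.fromstring(xml_text)
    return [{
        "id": item.findtext("guid"),
        "title": item.findtext("title"),
        "summary": item.findtext("description"),
        "url": item.findtext("link"),
        "published": item.findtext("pubDate"),
        "updated": None,  # NA for RSS in the table
        "author": item.findtext("dc:creator", namespaces=NS),
        "categories": [c.text for c in item.findall("category")],
        "content": item.findtext("content:encoded", namespaces=NS),
        "comments": item.findtext("comments"),
        "comments_feed": item.findtext("wfw:commentRss", namespaces=NS),
    } for item in root.findall("channel/item")]


def atom_posts(xml_text):
    """Map each feed/entry to the same post fields."""
    root = ET.fromstring(xml_text)
    return [{
        "id": e.findtext("atom:id", namespaces=NS),
        "title": e.findtext("atom:title", namespaces=NS),
        "summary": e.findtext("atom:summary", namespaces=NS),
        "url": _atom_link(e, "alternate"),
        "published": e.findtext("atom:published", namespaces=NS),
        "updated": e.findtext("atom:updated", namespaces=NS),
        "author": e.findtext("atom:author/atom:name", namespaces=NS),
        "categories": [c.get("term") for c in e.findall("atom:category", NS)],
        "content": e.findtext("atom:content", namespaces=NS),
        "comments": _atom_link(e, "replies"),
        "comments_feed": None,  # NA for Atom in the table
    } for e in root.findall("atom:entry", NS)]


print(rss_posts(RSS_SAMPLE)[0]["comments_feed"])
print(atom_posts(ATOM_SAMPLE)[0]["comments_feed"])
```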

One interesting point we noted was that neither Blogger nor Typepad published a link to a Comments Feed for each post. This made our work on ArchivePress more difficult, since it was predicated on being able to easily identify the Comments Feed for each post and harvest new comments as they were published. Obviously, for blogs generated by platforms other than WordPress, this was not going to be so easy. (Our ace developer Emanuele found some workarounds, but that's another story.)
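To make the harvesting idea concrete, here is a small Python sketch of picking out only the comments not yet seen in a per-post comments feed. It is an assumption-laden illustration, not the ArchivePress implementation: the sample feed, the `new_comments` function and the use of an in-memory set of guids are all hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical per-post comments feed, of the kind linked from wfw:commentRss.
COMMENTS_SAMPLE = """<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Comments on: First post</title>
    <item>
      <guid isPermaLink="false">http://example.com/first-post/#comment-1</guid>
      <dc:creator>Alice</dc:creator>
      <pubDate>Mon, 25 Apr 2011 11:00:00 +0000</pubDate>
      <description>Nice post</description>
    </item>
    <item>
      <guid isPermaLink="false">http://example.com/first-post/#comment-2</guid>
      <dc:creator>Bob</dc:creator>
      <pubDate>Mon, 25 Apr 2011 12:00:00 +0000</pubDate>
      <description>Thanks for this</description>
    </item>
  </channel>
</rss>"""


def new_comments(xml_text, seen_ids):
    """Return comments whose guid is not in seen_ids, recording each new guid."""
    root = ET.fromstring(xml_text)
    fresh = []
    for item in root.findall("channel/item"):
        guid = item.findtext("guid")
        if guid in seen_ids:
            continue  # already harvested on an earlier pass
        seen_ids.add(guid)
        fresh.append({
            "id": guid,
            "author": item.findtext(DC + "creator"),
            "published": item.findtext("pubDate"),
            "text": item.findtext("description"),
        })
    return fresh


seen = set()
print(len(new_comments(COMMENTS_SAMPLE, seen)))  # first pass: both comments
print(len(new_comments(COMMENTS_SAMPLE, seen)))  # second pass: nothing new
```

In a real harvester the set of seen guids would need to persist between runs, and the feed text would be fetched from the URL found in each post's wfw:commentRss element.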

I think this offers us an interesting overview of the core of standard, structured blog data and metadata, in three of the leading blog platforms. This is the data structure and metadata profile that is maintained in blog databases, in one of its native forms, and I’d expect it to be present in all blog platforms, since it arguably represents the essence of blogs. I hope this will be useful background when considering the core models for data and metadata handling that will be developed for BlogForever.

Author: Ed Pinsent
Posted on: 25th April 2011
Categories: Web Archiving
Tags: blog, BlogForever, blogs, data, data model, European Commission, metadata, newsfeeds, RSS, web archiving
