Marco Bastos and I have written a wrapper around the Guardian Content API for use with R. If you are unfamiliar with R, you should check it out: it is an extremely valuable tool for data analysis.

The GuardianR package is available here, or you can simply install the binary with install.packages("GuardianR").

The core function of the package is get_guardian(), which returns a variety of data fields (title, author, teaser text, full text) for news articles matching a particular keyword (“euro” in the example below).
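A wrapper like this ultimately has to turn a keyword and a date range into a request against the Content API's search endpoint. The sketch below shows what such a request URL can look like; it is not taken from the GuardianR source. The helper name guardian_search_url() is hypothetical, the endpoint https://content.guardianapis.com/search and the parameter names q, from-date, to-date, page and api-key are assumed to follow the public Content API documentation, and "test" is only a placeholder key.

```r
# Hypothetical helper (not part of GuardianR): build a Content API search URL
# for a keyword and a date range. Endpoint and parameter names are assumed to
# match the public Guardian Content API; "test" is a placeholder key.
guardian_search_url <- function(keyword, from_date, to_date, page = 1,
                                api_key = "test") {
  params <- c(
    "q"         = keyword,
    "from-date" = from_date,
    "to-date"   = to_date,
    "page"      = as.character(page),
    "api-key"   = api_key
  )
  # Percent-encode each value so keywords with spaces or symbols stay valid
  encoded <- vapply(params, URLencode, character(1), reserved = TRUE)
  paste0("https://content.guardianapis.com/search?",
         paste(names(params), encoded, sep = "=", collapse = "&"))
}

guardian_search_url("euro", "2013-05-07", "2013-05-17")
```

For a keyword containing a space, such as "euro crisis", the space is encoded as %20.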

```r
> install.packages("GuardianR")
trying URL ''
Content type 'application/x-gzip' length 24240 bytes (23 Kb)
opened URL
downloaded 23 Kb

The downloaded binary packages are in
> library("GuardianR")
Loading required package: RCurl
Loading required package: bitops
Loading required package: RJSONIO
> x <- get_guardian(keywords="euro","2013-05-07","2013-05-17")
[1] "Fetched page #1 of 2"
> head(x)
  id                                                                                    sectionId        sectionName
1 business/2013/may/17/eurozone-crisis-car-sales-markets                                business         Business
2 global/filmblog/2013/may/17/cannes-2013-live-blog-day-2-le-passe                      film             Film
3 teacher-network/teacher-blog/2013/may/17/languages-schools-students-gcse-alevels-mfl  teacher-network  Teacher Network
4 commentisfree/2013/may/17/amid-tory-disarray-labour-critical-moment                   commentisfree    Comment is free
5 politics/2013/may/16/labour-local-councils-welfare-funding                            politics         Politics
6 football/blog/2013/may/16/real-madrid-atletico-copa-del-rey                           football         Football
```
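The columns printed above do not include a date, but each id embeds a year/month/day path segment (for example 2013/may/17). The following sketch recovers a Date from that segment and counts articles per day. It rebuilds only the id and sectionName values printed by head(x), as a stand-in for the real result, and it assumes every id contains such a segment. Months are matched against R's built-in English month.abb constant rather than parsed with as.Date(format = "%b"), because %b follows the session's time locale, which the German messages in the original session suggest may not be English.

```r
# Stand-in for the first rows of x: only the id and sectionName values printed
# by head(x) are reproduced (the real get_guardian() result has more).
x <- data.frame(
  id = c("business/2013/may/17/eurozone-crisis-car-sales-markets",
         "global/filmblog/2013/may/17/cannes-2013-live-blog-day-2-le-passe",
         "teacher-network/teacher-blog/2013/may/17/languages-schools-students-gcse-alevels-mfl",
         "commentisfree/2013/may/17/amid-tory-disarray-labour-critical-moment",
         "politics/2013/may/16/labour-local-councils-welfare-funding",
         "football/blog/2013/may/16/real-madrid-atletico-copa-del-rey"),
  sectionName = c("Business", "Film", "Teacher Network",
                  "Comment is free", "Politics", "Football"),
  stringsAsFactors = FALSE
)

# Pull the first "yyyy/mon/dd" segment out of each id (assumes each id has one)
parts <- regmatches(x$id, regexpr("[0-9]{4}/[a-z]{3}/[0-9]{2}", x$id))
parts <- do.call(rbind, strsplit(parts, "/"))

# Map month names via the locale-independent month.abb constant
mon_num <- match(parts[, 2], tolower(month.abb))
x$date <- as.Date(sprintf("%s-%02d-%s", parts[, 1], mon_num, parts[, 3]))

table(x$date)
```

For the six rows shown, this reports two articles on 2013-05-16 and four on 2013-05-17.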

Currently the Mac version has a bug (at least on my machine) that prevents it from displaying more than 100 results, but we should be able to fix that soon.
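The "Fetched page #1 of 2" message in the session above shows that results arrive one page at a time. As a generic illustration, not the GuardianR code and not a diagnosis of the Mac bug, here is a paging loop that keeps requesting pages until the response reports the last one, instead of stopping after a fixed number of results. fake_fetch_page() is a hypothetical offline stub, and the field names currentPage, pages and results are assumed to mirror the Content API's JSON response.

```r
# Hypothetical offline stub standing in for one HTTP request. It returns a list
# shaped like the Content API's "response" object (currentPage, pages, results)
# for a query with n_total hits served page_size at a time.
fake_fetch_page <- function(page, n_total = 120, page_size = 50) {
  pages <- ceiling(n_total / page_size)
  first <- (page - 1) * page_size + 1
  last <- min(page * page_size, n_total)
  ids <- first - 1 + seq_len(max(0, last - first + 1))
  list(currentPage = page, pages = pages,
       results = data.frame(id = paste0("article-", ids),
                            stringsAsFactors = FALSE))
}

# Request pages until the response says the last one has been reached,
# then stack the per-page results into one data frame.
fetch_all <- function(fetch_page) {
  page <- 1
  chunks <- list()
  repeat {
    resp <- fetch_page(page)
    chunks[[page]] <- resp$results
    message(sprintf("Fetched page #%d of %d", resp$currentPage, resp$pages))
    if (resp$currentPage >= resp$pages) break
    page <- page + 1
  }
  do.call(rbind, chunks)
}

all_results <- fetch_all(fake_fetch_page)
nrow(all_results)  # 120
```

Because the loop reads the page count from each response, the number of rows it collects depends only on what the API reports, not on a hard-coded limit.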
