Marco Bastos and I have written a wrapper around the Guardian Content API for use with R. If you are unfamiliar with R, you should check it out; it is an extremely valuable resource for data analysis.

The GuardianR package is available on CRAN, so you can simply install the binary via install.packages().

The core function of the package is get_guardian() which returns a variety of data fields (title, author, teaser text, full text) for news articles relating to a particular keyword (“euro”, in the example given below).


> install.packages("GuardianR")
trying URL 'http://ftp5.gwdg.de/pub/misc/cran/bin/macosx/contrib/3.0/GuardianR_0.1.tgz'
Content type 'application/x-gzip' length 24240 bytes (23 Kb)
opened URL
==================================================
downloaded 23 Kb

The downloaded binary packages are in
/var/folders/5q/stxpb5813zbdkjs_l7xyfdhm0000gn/T//RtmpNXoZrn/downloaded_packages
> library("GuardianR")
Loading required package: RCurl
Loading required package: bitops
Loading required package: RJSONIO
> x <- get_guardian(keywords="euro", from.date="2013-05-07", to.date="2013-05-17")
[1] "Fetched page #1 of 2"
> head(x)
id sectionId sectionName
1 business/2013/may/17/eurozone-crisis-car-sales-markets business Business
2 global/filmblog/2013/may/17/cannes-2013-live-blog-day-2-le-passe film Film
3 teacher-network/teacher-blog/2013/may/17/languages-schools-students-gcse-alevels-mfl teacher-network Teacher Network
4 commentisfree/2013/may/17/amid-tory-disarray-labour-critical-moment commentisfree Comment is free
5 politics/2013/may/16/labour-local-councils-welfare-funding politics Politics
6 football/blog/2013/may/16/real-madrid-atletico-copa-del-rey football Football

Currently the Mac version has a bug (at least on my machine) that prevents it from displaying more than 100 results, but we should be able to fix that soon.
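Until the bug is fixed, one possible workaround is to split a query into shorter date windows, on the assumption that the 100-result limit applies per call. The following is a minimal, hypothetical sketch: fetch_in_windows() is not part of GuardianR, and it only passes the keywords, from.date and to.date arguments used in the example above. The fetch argument defaults to get_guardian() but can be swapped for a stub, e.g. to test the date logic offline.

```r
# Hypothetical workaround sketch, not part of GuardianR: split a date range
# into consecutive windows of `days` days, query each window separately and
# stack the results. Assumes the 100-result limit applies per call and that
# each call returns a data frame with an `id` column, as in head(x) above.
fetch_in_windows <- function(keywords, from.date, to.date, days = 2,
                             fetch = function(...) GuardianR::get_guardian(...)) {
  first <- as.Date(from.date)
  last <- as.Date(to.date)
  starts <- seq(first, last, by = days)
  pieces <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    # Each window ends after `days` days or at to.date, whichever comes first
    window_end <- min(starts[i] + days - 1, last)
    pieces[[i]] <- fetch(keywords = keywords,
                         from.date = format(starts[i], "%Y-%m-%d"),
                         to.date = format(window_end, "%Y-%m-%d"))
  }
  combined <- do.call(rbind, pieces)
  # Windows do not overlap, but drop any article returned twice just in case
  combined[!duplicated(combined$id), , drop = FALSE]
}
```

For the example above, fetch_in_windows("euro", "2013-05-07", "2013-05-17", days = 3) would make four calls to get_guardian() instead of one. Smaller windows mean more requests, so the window size is a trade-off against API rate limits.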

 

Small notes on Big Data from Oxford

On March 25, 2013, in Events, by cornelius

As it was snowing cats and dogs in beautiful Oxford this weekend, I figured I might as well get to a much neglected task: blogging. Following the excellent Workshop Big Data: Rewards and Risks for the Social Sciences here at the Oxford Internet Institute last week (to be followed by another event this week) it feels especially timely to write up a short summary of what I’ve been doing during my stay here, and in the last couple of months in general.

Taken in January, but two months later the weather in this part of England hasn’t changed much.

It’s been a fantastic time at the OII so far, and I have to admit that if I could, I would love to stay longer. I arrived in January to snow, and after figuring out the ins and outs of Oxford life (how to handle various keycards, where to shop and get coffee, which pubs are best), I quickly settled into my new office at 66 Banbury Road, the institute’s northern outpost. Office space is in short supply in Oxford for obvious reasons (one can’t exactly tear down a medieval college…), but I liked 66 right away, simply because everyone I’ve encountered here has been incredibly welcoming and friendly, making it easy to settle in and get a lot of work done. I’ve had the chance to chat with a fantastic variety of people in the office, during brownbags, workshops, and over lunch, and it’s an absolutely unique environment.

So what’s been happening on my end? A few days before my arrival in Oxford, my awesome colleague Jean Burgess and I published our working paper The Politics of Twitter Data in the Humboldt Institute for Internet and Society’s SSRN Discussion Paper Series. The piece has been quite well received, with coverage from Patrick Maier in the NatGeo Explorer blog, as well as from Netzpolitik. Also around the time of my arrival in the UK, De Gruyter published the handbook Pragmatics of CMC, a volume that has been in the making for quite a few years. I contributed a chapter on blogging, which distills a lot of my previous research on the topic; the handbook was expertly edited by Susan C. Herring, Dieter Stein, and Tuija Virtanen.

A few weeks into my stay, I was invited to give a talk in the Nuffield Network Seminar Series (see slides below). My presentation focused on the scientific blog network Hypotheses.org, providing an analysis of the dynamics of knowledge exchange between different scholarly communities on the platform. Of particular interest to me are disciplinary and linguistic communities, a theme that Marco Toledo Bastos, Rodrigo Travitzki and I have also explored in a recent paper on Twitter activism, about which I’ll post more soon (Marco will present this research in Paris at HyperText 2013 next month). I’m very keen on doing more (and especially more sophisticated) network analysis, and feedback from Bernie Hogan, Sandra Gonzalez-Bailon, and Taha Yasseri has already been invaluable in this regard. I’ll also be delivering an invited talk at the annual meeting of the Berliner Arbeitskreis Information next month on the role of Big Data in knowledge production, in which I will combine insights from research with a dose of criticism and reflection.

Later this spring, I am especially looking forward to ICA 2013 in London, where I will be presenting two papers, one as part of the panel Big Data and Communication Research: Prospects, Perils, Alliances, and Impacts, chaired by Eric T. Meyer, and another in an (oddly enough) session on Copyright and Digital Piracy with my colleague Merja Mahrt (who, by the way, has also written this important piece on Big Data for communications research). The first panel will see contributions from Eric, Ralph Schroeder, Bernie Hogan and Mark Graham, Matthew Weber, and from danah boyd and Kate Crawford, all of whom are exceptional researchers. My talk will focus on the politics (and economics) of social media platforms as characterized in the relationship between platform providers, data resellers, large media companies and consumers.

As you can probably guess, all of this ties in beautifully with the ongoing activities relating to Big Data here at the OII and future research at the Humboldt Institute for Internet and Society, where Big Data is also a major topic. The workshop last week was part of an initiative funded by the Sloan Foundation to promote discussion about Big Data in the Social Sciences. More discussion is needed about the impact of Big Data on scholarly research, but also on politics, business and culture more broadly.

It seems that 2013 is shaping up to be the year in which academia catches up with Big Data — or at least with some of the hype surrounding it.

Edit: A summary of the recent Open Science session at the Berlin Colloquium on Internet and Society, with talks from Constanze Engelbrecht, Pasco Bilic and Christoph Lutz, has been posted by the Humboldt Institute’s Benedikt Fecher (German version, English version). The text below is a more general discussion of how Open Science can be defined.

One area of research at the Alexander von Humboldt Institute is Open Science, an emerging term used to describe new ways of conducting research and communicating its results through the Internet. There is no single definition of what constitutes Open Science (and one could argue there doesn’t really need to be), but in this blog entry I want to point to attempts to define the term by prominent scientists and activists, and discuss some of the limitations of these definitions. I’ll summarize my observations in the form of five questions that suggest a direction that future research into Open Science could take.

Open Science: a few working definitions

Michael Nielsen, a prominent scientist and author on Open Science whose name invariably pops up when discussing the topic, provides this very comprehensive definition in a post to the Open Science mailing list:

“Open science is the idea that scientific knowledge of all kinds should be openly shared as early as is practical in the discovery process.”

In the same vein, Peter Murray-Rust, a professor in molecular chemistry and Open Access advocate, provides another definition (also through the OKFN’s open science mailing list):

“In a full open science process the major part of the research would have been posted openly and would potentially have been available to people outside the research group both for reading and comment.”

(Also see this interview if you want a more detailed exposition).

Finally, Jean-Claude Bradley, also a professor of chemistry, provides a definition of what he calls Open Notebook Science, a very similar approach:

“[In Open Notebook Science] there is a URL to a laboratory notebook that is freely available and indexed on common search engines. It does not necessarily have to look like a paper notebook but it is essential that all of the information available to the researchers to make their conclusions is equally available to the rest of the world.”

(Here’s a presentation summarizing his approach, Open Notebook Science. A similar view is articulated by M. Fabiana Kubke and Daniel Mietchen in this video, though they prefer the term Open Research.)

From natural philosophy to science

One thing that these different definitions have in common is the way in which they frame science. In English, the word science has come to denote primarily the natural sciences (traditionally physics and chemistry, more recently also biology and life sciences). The history of the term is long and complex (check out the Wikipedia entry), but as a result of language change, a wide range of disciplines are considered not to be part of the sciences, but instead belong to the social sciences and Humanities.

Why does this matter? The above definitions are very closely tailored to the methods and organizational structures of the natural sciences. They assume that research is conducted in a research group (Murray-Rust) that works primarily in a laboratory and whose members record the steps of an experimental process in a lab notebook (Bradley), following a sequence of more or less clearly-structured steps that can be summarized as “the discovery process” (Nielsen).

Research processes in other fields strongly differ from this approach, not just in the Humanities (where there is frequently no research group, and data is of varying relevance), but also in the social sciences (where there is generally no laboratory, and data frequently comes from human subjects, rather than technical instruments such as radio telescopes or DNA sequencers). Beyond just using different tools, the instruments also shape the assumptions of their users about the world, and about what they do.  Sociologist Karin Knorr-Cetina points to this difference in the title of her book Epistemic Cultures, and similar observations have been made in Bruno Latour & Steve Woolgar’s Laboratory Life: The Construction of Scientific Facts. One crucial aspect of this is how data is conceptualized in the different disciplinary perspectives, and, related to this, how notions differ regarding what openness means.

Openness beyond open access to publications

Openness can be defined in a variety of ways. Not all information that is available online is open in a technical sense – just think about proprietary file formats that make it difficult to share and re-use data. Technical openness does not equal legal openness, a problem that is also on the institute’s agenda.

Open Access (the technical and legal accessibility of scholarly publications via the Internet) is widely regarded to benefit both science and the public at large. In the traditional publishing model, access to research results in scholarly monographs and journals is available to subscribers only (usually institutional subscribers, in other words, libraries). The Open Access model shifts the costs, either to authors (who pay a fee to publish) or to publishing funds and other institutional actors. The Budapest and Berlin Declarations on Open Access specify under which provisions publications are truly Open Access, rather than just somehow accessible. Open Access has a range of benefits, from reducing costs and providing access to scientists at small universities and in developing countries, to increasing transparency and raising scholarly impact. Models based on author fees, such as the one used by PLoS, are increasingly common and make Open Access economically feasible.

There is broad consensus that Open Access is a first step, but that it’s not enough. Many scientists, such as the ones cited above, call for research data also to be made available more broadly. Sharing research data, instead of packaging data and analysis together in scholarly articles, could enable new forms of research that are much more complementary than current practices, which tend to emphasize positive outcomes (experiments that worked) over negative ones (those that didn’t), despite the fact that negative outcomes can greatly contribute to better understanding a problem.

Making openness count

The barriers to achieving a more open environment with regard to research data aren’t primarily technical or legal, but cultural. Research has always been based on the open dissemination of knowledge (just take the history of the Philosophical Transactions, considered by most to be the oldest scientific journal), but it is also very closely tied to the formats in which knowledge is stored and disseminated, such as books, journal articles, and conference papers, which tend to take on a valorizing role, rather than being just arbitrary containers of scholarly information. Many scholars, regardless of their field, see themselves in the business of publishing books, articles, and papers just as much as they consider themselves to be in the business of doing research. While the technology behind scholarly publishing has changed dramatically, the underlying concepts have not. Because institutionalized academia is incentive-driven and highly competitive, collective goals (a more efficient approach to knowledge production) are trumped by individual ones (more highly-ranked publications = more funding and promotions for the individual researcher).

Institutional academia is no longer the only place where research happens. Increasingly, there is (if latently) competition from crowdsourcing platforms that facilitate collaborative knowledge creation (and, more open, problem solving) outside of institutional contexts. Depending on how you define the process of knowledge production, examples include both Wikipedia and projects such as the #SciFund Challenge. The approach to knowledge production in these environments seems to focus on knowledge recombination and remixing at the moment, but it appears plausible that more sophisticated models could arise in the future. Whether the hybrid communities of knowledge production have a potential to displace established institutional academia remains to be seen. Rather, such communities could blossom in those areas where traditional academia fails to deliver.

But even inside institutional academia, the time seems ripe for more openness beyond making publications and data available to other academics. Social media makes it possible for scholars to both communicate with their peers and engage with the public more directly — though they are still hesitant to do either at the moment. Public visibility is not as high on the agenda of most researchers as one might expect, because academic success is largely the result of peer, not popular evaluation.

Redefining scholarly impact

This may change as new, more open measurements of scholarly impact enter the mainstream. Measuring and evaluating the impact and quality of publicly-funded research has been a key political interest for decades. While frameworks exist for conducting large and complex evaluations (Research Assessment Exercises in the UK, Exzellenzinitiative in Germany), the metrics used to evaluate the performance of researchers are generally criticized as too one-dimensional. This criticism applies in particular to measurements that indicate the quality of publications, such as Thomson Reuters’ Impact Factor (IF). A confluence of measures (downloads, views, incoming links) could change the current, extremely one-sided approach to evaluation and make it more holistic, generating a more nuanced picture of scholarly performance.

Questions for research into Open Science

The following questions reflect some of the issues raised by “open” approaches to science and scholarship. They are by no means the only ones, as the Open Science project description on the HIIG’s pages highlights, but they reflect my personal take.

  1. How can Open Science be conceptualized in ways that reach beyond the paradigm of the natural sciences? In other words, what should Open Humanities and Open Social Sciences look like?
  2. How do different types of data (recorded by machines, created by human subjects, classified and categorized by experts) and diverse methods used for interacting with it (close reading, qualitative analysis, hermeneutics, statistical approaches, data mining, machine learning) impact knowledge creation and what are their respective potentials for openness in the sense described by Nielsen, Murray-Rust and Bradley? What are limits to openness, e.g. for ethical, economic and political reasons?
  3. What are features of academic openness beyond open access (e.g. availability of data, talks, teaching materials, social media presence, public outreach activities) and how do they apply differently to different disciplines?
  4. How can the above-mentioned features be used for a faceted, holistic evaluation of scholarly impact that goes beyond a single metric (in other words, that measures visibility, transparency and participation in both scientific and public contexts)?
  5. What is the relationship between institutionalized academia and hybrid virtual communities and platforms? Are they competitive or complementary? How do their approaches to knowledge production and the incentives they offer to the individual differ?

 

After recently teaching an introductory class to R aimed at linguists at the University of Bayreuth, I’ve decided to put my extended notes on the website in the form of a very basic tutorial. Check out Corpus and Text Linguistics with R (CTL-R) if you want to learn R fundamentals and have no prior programming experience. It’s still incomplete at present, but I hope to have more chapters ready soon. Happy R-ing! :-)


I just came back from Deidesheim, a small town (yet with an oddly epic entry in English Wikipedia — what’s up with that?) located in an area of Germany best known for its excellent Riesling, where I participated in the annual meeting of the SciLogs blogging community. My role in Deidesheim, together with my colleague Merja Mahrt, was to nominate a blogger for the SciLogs ’12 best blog award (here’s the winner!) and to give a talk on research on scholarly blogging (slides below).

The past ten days have been a whirlwind tour of sorts, with no less than three (!) different events related to scholarly/science/research blogging that I attended, and I want to take a moment and reflect on some of the things that were discussed and record a few thoughts they provoked.

So let’s start with a list of the events.

Weblogs in the Humanities, Munich

Picture of me during my talk at 'Weblogs in the Humanities'. Photo by Wenke Bönisch.

Last week, I presented at the conference Weblogs in den Geisteswissenschaften (Weblogs in the Humanities), organized by the Deutsches Historisches Institut Paris and supported by hypotheses.org, a platform operated by Cléo, a section of the CNRS. The newly launched portal de.hypotheses.org is aimed at the German-speaking scholarly community and follows the model of its French parent. Weblogs (or carnets de recherche, as they are branded under the hypotheses label) are more widely read in France than they are in Germany, a factor which I think partly explains their uptake. Another key to their success seems to be the way they are supported; for example, each blog is provided with an ISSN, making it easier to cite. As part of the editorial team behind de.hypotheses.org, I’m excited to see whether the platform will succeed and follow in the footsteps of its French counterpart, which hosts an impressive 300 scholarly blogs. The conference was certainly an indicator that the topic is on a lot of people’s radar. More detailed reports from the event can be found here (keynote speaker Melissa Terras, in English), here (Wenke Bönisch, in German) and here (Anton Tantner, also in German). During a break, I had the chance to interview Melissa for my postdoc project and was myself interviewed for the German Humanities portal LISA. Thank you to Melissa for taking the time to chat with me and to Georgios Chatzoudis for asking some very thought-provoking questions!


Symposia on e-Social Science, Oxford

Next I flew to England for the first time in several years, to visit the Oxford Internet Institute and attend two events, Social Science and Digital Research: Interdisciplinary Insights and Digital Social Research: A Forum for Policy and Practice. There was also a dinner on Monday to mark the formal end of the Oxford e-Social Science Project and a breakfast on Tuesday morning, where the European Commission’s SESERV project was discussed and recommendations were formulated on how to integrate e-social science methods more closely into teaching and research. All of these events were related to the Oxford e-Social Science project in one way or another, so digital scholarly communication was just one facet of that larger theme. Participants had a broad discussion of research and teaching practices in the social sciences and how e-science fits into the mix. I found Christine Borgman’s keynote on reproducibility very thought-provoking. We take it for granted that open data will make the research process more transparent, and hopefully this is true, but what reproducibility actually amounts to is widely contested, and especially tricky in the context of the human and social sciences.

SciLogs Meeting 2012, Deidesheim

Bloggers chatting at scilogs12 in Deidesheim.

After my visit to Oxford, it was on to Deidesheim via Düsseldorf. The SciLogs meetup was yet another kind of event, different from both the Munich conference and the symposia in England. SciLogs is comparable to scienceblogs.com in that it’s run by a publisher (Spektrum der Wissenschaft), which launched it largely as a source of popular science content, and in that it has an orientation towards the natural sciences (though there are also blogs on history, linguistics and a variety of other fields). It was exciting to chat with people who have been involved in science blogging for years and to learn more about what drives them. I was particularly impressed by the enthusiasm that the sciloggers have for their blogs and their readers. Blogging is hard work (as the glacial pace of my postings here illustrates…), and Spektrum Verlag can be quite proud of the community it has built around the idea of better informing people about scientific research.

Below are some somewhat random points that I found noteworthy.

Scholarly/academic/science/research blogs are written by a wide variety of people (e.g. scholars, journalists, librarians, science enthusiasts), for a wide range of audiences (e.g. self, peers, people in the same field, practitioners, politicians, general public) with a variety of purposes in mind (self-fulfillment, knowledge management). It’s important not to regard them exclusively as a form of science communication, but to see the many roles they take on for a range of users.

Just as scholarly bloggers and their topics are a diverse bunch, readers and commentators of S/a/s/r blogs have different reasons for visiting and participating. A key motivation among commentators could be that they can add their view to a post. This may seem obvious, but it’s interesting for several reasons. For example, there is fairly little dialogue going on in posts that have a lot of comments. The commentators simply add their take and then leave, without engaging with the blogger or with each other. Debates that do have a lot of actual discussion sometimes devolve into arguments between individual users that have little to do with the original post. This isn’t a bad thing, but it illustrates that it’s a bad idea to give in to the temptation of equating a large number of comments with success. Or perhaps speaking of personal success is alright, as long as one doesn’t mistake it for societal impact. Another issue is the relation between commentators and readers: it’s not trivial to figure out whether fewer comments mean less attention on the part of readers.

To play a role in mainstream scholarly communication, which is still conducted primarily via monographs and journals, and to succeed as a formal genre, scholarly blogging must integrate some of the conventions of these forms (quality control, long-term availability of content, citability), while preserving its intrinsic strengths (speed and simplicity of publication, the potential for interaction via comments, the ability to embed images, video, audio, data and code, the ability to link and quote, the ability to track one’s impact via metrics). The adaptation can happen in multiple ways, and it only applies to formal scholarly communication; what happens informally or for other purposes remains unaffected, as does blogging about science by journalists or hobbyists. Blogging about science and scholarship is obviously in the public interest. The question is, should this be left to the researcher, or should it be incentivized by institutions? As Klaus Graf put it somewhat radically at the conference in Munich, is a researcher who doesn’t blog a bad researcher?

I’ve already shared this bit of personal news with a few friends and colleagues, but I thought I’d blog about it as well — especially since I’m woefully behind on my Iron Blogger schedule. ;-)

After a fairly long time in the making, I have been awarded a three-year research grant from the Deutsche Forschungsgemeinschaft (DFG) for the project Networking, visibility, information: a study of digital genres of scholarly communication and the motives of their users (summary in German on the DFG’s site). The project investigates new forms of scholarly communication (especially blogging and Twitter) and their role for academia. My key concerns are usage motives, i.e. why scholars use blogs and Twitter, and how these motives correspond with usage practices (how they blog and tweet), rather than how many researchers use these channels of communication or what makes them refrain from using them (see this blog post and the study mentioned in it for that kind of work). My main methods will be qualitative interviews with a sample of 20-25 blogging and/or tweeting academics, along with in-depth content analysis of the material they post in these channels over a prolonged period (>1 year). Identifying usage patterns and relating them to the participants’ narrative about their use will be another key objective. Ultimately, I hope to find a (tentative) answer to the question of what role blogs and Twitter may play in the future of digital scholarship, and whether they will remain a niche phenomenon or become mainstream over time.

The project follows up on my work on corporate blogging and connects strongly to what we have been doing at the Junior Researchers Group “Science and the Internet” over the past year, but the focus on interviews should result in a more user-centric analysis. As someone who has been doing (applied) linguistic analysis to make inferences about social processes, I feel much more comfortable actually talking to the people I want to study, rather than just crunching numbers on how they tweet. Big data social science research is obviously and understandably en vogue these days, but I hope to find a good synergy between qualitative and quantitative approaches in my project.

My new institutional home for the next three years will be the Berlin School of Library and Information Science at Humboldt University. I’m grateful to Michael Seadle for supporting my project and really look forward to working with my new colleagues at IBI (that’s the German acronym, which, as far as I can tell, is preferred to its more entertaining English equivalent). I also look forward to working with colleagues from the Alexander von Humboldt Institute for Internet and Society (HIIG) where I’m currently supporting the project Regulation Watch. Finally, I plan to keep in close contact with the colleagues in Düsseldorf, both at the Junior Researchers Group and the Department of English Language and Linguistics, where I have learned virtually everything I know about being a researcher. I am especially indebted to Dieter Stein for his enduring support and for his contagious enthusiasm for all aspects of scholarship.

Sic itur ad astra! :-)

For an overview of previous work I’ve done in this direction, have a look at my publications.


Those of you following my occasional updates here know that I have previously posted code for graphing Twitter friend/follower networks using R (post #1, post #2). Kai Heinrich was kind enough to send me some updated code for doing so using a newer version of the extremely useful twitteR package. His very crisp, yet thoroughly documented script is pasted below.

# Script for graphing Twitter friends/followers
# by Kai Heinrich (kai.heinrich@mailbox.tu-dresden.de)

# load the required packages

library("twitteR")
library("igraph")

# HINT: In order for the tkplot() function to work on Mac you need to install
#       the Tcl/Tk build for X11
#       (get it here: http://cran.us.r-project.org/bin/macosx/tools/)
#
# Get user information with the twitteR function getUser().
# Instead of your own name, you can use any other username as well.

start <- getUser("YOUR_USERNAME")

# Get friend and follower names by first fetching their IDs (getFollowerIDs(), getFriendIDs())
# and then looking up the names (lookupUsers())

friends.object <- lookupUsers(start$getFriendIDs())
follower.object <- lookupUsers(start$getFollowerIDs())

# Retrieve the names of your friends and followers from the friend
# and follower objects. You can limit the number of friends and followers
# with n, the number of followers/friends that you want to visualize.
# head() returns all of them if there are fewer than n; drop the head()
# calls to visualize all friends and followers.

n <- 20
# twitteR user objects are reference classes; the name field holds the display name
friends <- sapply(head(friends.object, n), function(u) u$name)
followers <- sapply(head(follower.object, n), function(u) u$name)

# Create a data frame that relates friends and followers to you for expression in the graph
relations <- merge(data.frame(User = "YOUR_NAME", Follower = friends),
                   data.frame(User = followers, Follower = "YOUR_NAME"), all = TRUE)

# Create graph from relations.
g <- graph.data.frame(relations, directed = TRUE)

# Assign labels to the graph (= people's names)
V(g)$label <- V(g)$name

# Plot the graph using plot() or tkplot(). Remember the HINT at the
# beginning if you are using Mac OS X
tkplot(g)
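The merge() step is the least obvious part of the script. The sketch below reproduces it in base R with made-up names (alice, bob and carol are hypothetical stand-ins for the lookupUsers() results), so the shape of relations can be inspected without Twitter access. Because both data frames share the columns User and Follower and all = TRUE is set, merge() behaves like a union of the two edge lists here; a plain rbind() is shown for comparison.

```r
# Hypothetical names standing in for the lookupUsers() results
me <- "YOUR_NAME"
friends <- c("alice", "bob")     # accounts you follow
followers <- c("bob", "carol")   # accounts following you

out_edges <- data.frame(User = me, Follower = friends, stringsAsFactors = FALSE)
in_edges  <- data.frame(User = followers, Follower = me, stringsAsFactors = FALSE)

# Same call as in the script: full outer join on both shared columns
relations <- merge(out_edges, in_edges, all = TRUE)

# A plain rbind() yields the same set of edges without a join
edges <- rbind(out_edges, in_edges)

print(relations)
```

Note that bob, who both follows and is followed, appears in two rows, which igraph draws as two opposite directed edges. For edge lists like these, rbind() is arguably the more direct choice; merge() additionally collapses rows that appear in both inputs.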

Berlin 9: The Worldwide Policy Environment (Wednesday)

On November 10, 2011, in Events, by cornelius

Avice Meehan moderated the first session of the Berlin 9 Open Access conference, on The Worldwide Policy Environment. She introduced the three presenters:

  • Jean-François Dechamp, Policy Officer, European Commission, Directorate-General for Research and Innovation
  • Harold Varmus, Director, U.S. National Cancer Institute
  • Cyril Muller, Vice President, External Affairs Department, The World Bank

After a brief introduction by Avice, Jean-François Dechamp took to the podium to talk about the European policy context of Open Access. Jean-François described how the European Commission acts as a policy maker, a funding agency, and an infrastructure funder and capacity builder. He cited Commission documents stating that “publicly funded research should be open access” and noted that the Commission aims to make Open Access to publications “the general principle for projects funded by the EU research Framework Programmes”. Key reasons for the European Commission to support Open Access include serving science and research, benefiting innovation, and improving the return on investment in R&D. OA publishing costs (article charges) are covered by FP7, although relatively few researchers realize this. Dechamp cited a study conducted by the EUC in which the majority of participating researchers indicated that they were ready to self-archive, but found the legal challenges daunting. He also cited a soon-to-be-released study (ERAC, 2010-2011) which found that the overall significance of OA in the member states has increased markedly over the past few years.

Harold Varmus of the U.S. National Cancer Institute and NIH came next. Harold stressed that he was speaking not as the representative of a policy-making institution, but as a scientist. He lamented that the shift towards OA is not happening fast enough and called for a broader idea of Open Access that goes beyond access to publications, to access to data and (ultimately) knowledge. True Open Access, according to Harold, means gold road OA, in accord with the Berlin Declaration; embargoes aren’t good enough. Harold traced his contact with OA back to 1998, when he heard about arXiv (built by Paul Ginsparg) and thought that such a resource should also exist for biomedicine. He went on to emphasize that different fields have different needs, and that publishing must be sensitive to these needs. Harold also stressed the success of PubMed Central, which now holds 2 million articles. In 2006 publishers were encouraged to donate articles (with limited success); in 2008 a mandate was introduced to make NIH-supported research available in PubMed Central after an embargo period. Harold noted that economics are essential and that there’s always a business plan attached to journals. He observed that while researchers love their publishers, they love the people who give them money even more, pointing to the central influence of funders in relation to OA. He also noted the success of PLoS, specifically of PLoS ONE, and echoed Cathy Norton’s observation that the public at large wants access: not just abstracts and titles, but the actual data. While articles are the best product of academic research, they are also emotionally laden: funders see articles as mere vehicles of knowledge, but authors also write for fame and prestige, not just to contribute to knowledge. He closed by arguing strongly for a new regime of review (post-publication rather than pre-publication). Authors should be forced to list their most important contributions, rather than relying on bean counting via long publication lists and the impact factor.

Cyril Muller approached the topic differently in his talk, focusing on the Open Data approach of his institution, the World Bank, and on the positive effects it had observed from making the data it collects digitally available. He described the three pillars of their approach (Open Data, Open Knowledge, Open Solutions) and presented statistics on how much information is now made available online, rather than in print, via their Open Knowledge Repository. He provided interesting examples of information-enabled innovation in Africa and elsewhere. My notes on Cyril’s talk are unfortunately somewhat incomplete, but it focused more on Open (Government) Data than on Open Access (to scholarly publications), placing it thematically closer to the variety of initiatives coming from that direction.

Berlin 9: Opening session (Wednesday)

On November 10, 2011, in Events, by cornelius

These are my notes from the opening session of the Berlin 9 Open Access Conference. I’ve already blogged the pre-conference workshops on Open Access Publishing and Open Access Policy.

The conference opened with welcoming remarks, first from HHMI’s VP and chief scientific officer Jack E. Dixon, then from HHMI’s head Robert Tjian, followed by the Max Planck Society’s Bernard Schutz, and finally from the Marine Biology Lab’s Cathy Norton. Jack Dixon struck an optimistic note, observing that “the tide is turning, in a very positive way.” Robert Tjian observed that those who fund research should be more active in publishing, a reference to eLife, a new Open Access journal in the life sciences jointly launched by HHMI and the Max Planck Society. He went on to note that “scientific work is not complete before the results become accessible… what we do doesn’t have any impact otherwise.” Bernard Schutz focused on the development of the Berlin Declaration in his talk. Thirty institutions had been original signatories in 2003, when the Declaration was first drafted; 338 institutions are now among the signatories. The global expansion of the Berlin meetings from Europe (Berlin 1 to Berlin 7) to the world (Berlin 8 in China, Berlin 9 in the U.S.) had been vital, because “research and publishing are global issues”. Bernard noted that much had been achieved in relation to green road OA and repositories, but that the Max Planck Society regards the popularization of gold road open access as an important goal for the future. He went on to note that interdisciplinarity and innovation (e.g. in business) are enabled by OA. Free information is a common good, and OA enables the spread of knowledge to stakeholders outside academia (e.g. teachers and students). Bernard observed that to many publishers “the business model is less important than the business itself” and that many publishers would transition to OA if viable business models could be established. He described disagreements between publishers, institutions, and researchers in some areas and stressed that the Max Planck Society is ready to work with all stakeholders on the issues at hand. Finally, he stated “we want to become more inclusive” and characterized Open Access as part of a larger movement towards (more) Free Information.

Cathy Norton from the Marine Biology Lab focused on issues close to her field in her talk. She discussed the success of MedLine and pointed out how interested the public is in certain areas of scientific information. The future of medicine, according to Cathy, lies in personalization of drugs and treatments, something that can only be achieved by having large volumes of data freely available. Techniques such as text mining and visual search are key to utilizing such new approaches, as are efforts such as semantic MedLine that map ontological relationships in large volumes of text. Cathy closed by noting the importance of citizen engagement, e.g. in relation to biodiversity data (95% of the publications on biodiversity are from North America and Europe, while the species described are virtually all found in Africa and South America).

The session closed with a question from Stuart Shieber, who wondered how the Max Planck Society intends to support an environment that allows publishers to transition to Open Access, something Bernard Schutz had hinted at. Bernard replied that there were ongoing conversations between publishers and the MPS on these issues.