Small notes on Big Data from Oxford

On March 25, 2013, in Events, by cornelius

As it was snowing cats and dogs in beautiful Oxford this weekend, I figured I might as well get to a much neglected task: blogging. Following the excellent workshop Big Data: Rewards and Risks for the Social Sciences here at the Oxford Internet Institute last week (to be followed by another event this week), it feels especially timely to write up a short summary of what I’ve been doing during my stay here, and in the last couple of months in general.

Taken in January, but two months later the weather in this part of England hasn’t changed much.

It’s been a fantastic time at the OII so far, and I have to admit that if I could, I would love to stay for a longer period. I arrived in January to snow, and after figuring out the ins and outs of Oxford life (how to handle various keycards, where to shop and get coffee, which pubs are best), I quickly settled into my new office at 66 Banbury Road, the institute’s northern outpost. Office space is in short supply for obvious reasons in Oxford (as one can’t exactly tear down a medieval college…), but I liked 66 right away, simply because everyone I’ve encountered here has been incredibly welcoming and friendly, making it easy to settle in and get a lot of work done. I’ve had the chance to chat with a fantastic variety of people in the office, during brownbags, workshops, and over lunch, and it’s an absolutely unique environment.

So what’s been happening on my end? A few days before my arrival in Oxford, my awesome colleague Jean Burgess and I published our working paper The Politics of Twitter Data in the Humboldt Institute for Internet and Society’s SSRN Discussion Paper Series. The piece has been quite well received, with coverage from Patrick Meier in the NatGeo Explorer blog, as well as from Netzpolitik. Also around the time of my arrival in the UK, the volume Pragmatics of CMC, which had been in the making for quite a few years, was published by De Gruyter. My chapter in the handbook, which was expertly edited by Susan C. Herring, Dieter Stein, and Tuija Virtanen, distills a lot of my previous research on blogging.

A few weeks into my stay, I was invited to give a talk in the Nuffield Network Seminar Series (see slides below). My presentation focused on the scientific blog network Hypotheses.org, providing an analysis of the dynamics of knowledge exchange between different scholarly communities inside the platform. Of particular interest to me are disciplinary and linguistic communities, a theme that Marco Toledo Bastos, Rodrigo Travitzki and I have also explored in a recent paper on Twitter activism, about which I’ll post more soon (Marco will present this research in Paris at HyperText 2013 next month). I’m very keen on doing more (and especially more sophisticated) network analysis, and feedback from Bernie Hogan, Sandra Gonzalez-Bailon, and Taha Yasseri has already been invaluable in this regard. I’ll also be delivering an invited talk at the annual meeting of the Berliner Arbeitskreis Information next month on the role of Big Data for knowledge production, in which I will combine insights from research with a dose of criticism and reflection.

Later this spring, I am especially looking forward to ICA 2013 in London, where I will be presenting two papers: one as part of the panel Big Data and Communication Research: Prospects, Perils, Alliances, and Impacts, chaired by Eric T. Meyer, and another (oddly enough) in a session on Copyright and Digital Piracy with my colleague Merja Mahrt (who, by the way, has also written this important piece on Big Data for communications research). The first panel will see contributions from Eric, Ralph Schroeder, Bernie Hogan and Mark Graham, Matthew Weber, and from danah boyd and Kate Crawford, all of whom are exceptional researchers. My talk will focus on the politics (and economics) of social media platforms as characterized in the relationship between platform providers, data resellers, large media companies and consumers.

As you can probably guess, all of this ties in beautifully with the ongoing activities relating to Big Data here at the OII and future research at the Humboldt Institute for Internet and Society, where Big Data is also a major topic. The workshop last week was part of an initiative funded by the Sloan Foundation to promote discussion about Big Data in the Social Sciences. More discussion is needed about the impact of Big Data on scholarly research, but also on politics, business and culture more broadly.

It seems that 2013 is shaping up to be the year in which academia catches up with Big Data — or at least with some of the hype surrounding it.

I just came back from Deidesheim, a small town (yet with an oddly epic entry in English Wikipedia — what’s up with that?) located in an area of Germany best known for its excellent Riesling, where I participated in the annual meeting of the SciLogs blogging community. My role in Deidesheim, together with my colleague Merja Mahrt, was to nominate a blogger for the SciLogs ’12 best blog award (here’s the winner!) and to give a talk on research on scholarly blogging (slides below).

The past ten days have been a whirlwind tour of sorts, with no fewer than three (!) different events related to scholarly/science/research blogging that I attended, and I want to take a moment to reflect on some of the things that were discussed and record a few thoughts they provoked.

So let’s start with a list of the events.

Weblogs in the Humanities, Munich

Picture of me during my talk at 'Weblogs in the Humanities'. Photo by Wenke Bönisch.

Last week, I presented at the conference Weblogs in den Geisteswissenschaften (Weblogs in the Humanities), organized by the Deutsches Historisches Institut Paris and supported by hypotheses.org, a platform operated by Cléo, a section of the CNRS. The newly launched portal de.hypotheses.org is aimed at the German-speaking scholarly community and follows the model of its French parent. Weblogs (or carnets de recherche, as they are branded under the hypotheses label) are more widely read in France than they are in Germany, a factor which I think partly explains their uptake. Another key to their success seems to be the way they are supported: for example, each blog is provided with an ISSN, making it easier to cite. As part of the editorial team behind de.hypotheses.org, I’m excited to see whether the platform will succeed and follow in the footsteps of its French counterpart, which hosts an impressive 300 scholarly blogs. The conference was certainly an indicator that the topic is on a lot of people’s radar. More detailed reports from the event can be found here (keynote speaker Melissa Terras, in English), here (Wenke Bönisch, in German) and here (Anton Tantner, also in German). During a break, I had the chance to interview Melissa for my postdoc project and was myself interviewed for the German Humanities portal LISA. Thank you to Melissa for taking the time to chat with me and to Georgios Chatzoudis for asking some very thought-provoking questions!

VIDEO OF TALK / AUDIO INTERVIEW

Symposia on e-Social Science, Oxford

Next I flew to England for the first time in several years, to visit the Oxford Internet Institute and attend two events, Social Science and Digital Research: Interdisciplinary Insights and Digital Social Research: A Forum for Policy and Practice. There was also a dinner on Monday to mark the formal ending of the Oxford e-Social Science Project and a breakfast on Tuesday morning, where the European Commission’s SESERV project was discussed and recommendations on how to integrate e-social science methods into teaching and research more closely were formulated. All of these events were related to the Oxford e-Social Science project in one way or another, so digital scholarly communication was just one facet of that larger theme. There was a broad discussion of research and teaching practices in the social sciences and how e-science fits into the mix. I found Christine Borgman’s keynote on reproducibility very thought-provoking. We take it for granted that open data will make the research process more transparent, and hopefully this is true, but what reproducibility actually amounts to is widely contested and especially tricky in the context of the human and social sciences.

SciLogs Meeting 2012, Deidesheim

Bloggers chatting at scilogs12 in Deidesheim.

After my visit to Oxford, it was on to Deidesheim via Düsseldorf. The SciLogs meetup was a different kind of event again from both the Munich conference and the symposia in England. SciLogs is comparable to scienceblogs.com in that it’s run by a publisher (Spektrum der Wissenschaft), which launched it largely as a source of popular science content, and in that it is oriented towards the natural sciences (though there are also blogs on history, linguistics and a variety of other fields). It was exciting to chat with people who have been involved in science blogging for years and to learn more about what drives them. I was particularly impressed by the enthusiasm that the sciloggers have for their blogs and their readers. Blogging is hard work (as the glacial pace of my postings here illustrates…) and Spektrum Verlag can be quite proud of the community it has built around the idea of better informing people about scientific research.

Below are a few somewhat random points that I found noteworthy.

Scholarly/academic/science/research blogs are written by a wide variety of people (e.g. scholars, journalists, librarians, science enthusiasts), for a wide range of audiences (e.g. self, peers, people in the same field, practitioners, politicians, general public) with a variety of purposes in mind (self-fulfillment, knowledge management). It’s important not just to regard them exclusively as a form of science communication, but to see the many roles they take on for a range of users.

Just as scholarly bloggers and their topics are a diverse bunch, readers and commentators of S/a/s/r blogs have different reasons for visiting and participating. A key motivation among commentators could be that they can add their view to a post. This may seem obvious, but it’s interesting for several reasons. For example, there is fairly little dialogue going on in posts that have a lot of comments. The commentators simply add their take and then leave, without engaging with the blogger or with each other. Debates that do have a lot of actual discussion sometimes devolve into arguments between individual users that have little to do with the original post. This isn’t a bad thing, but it illustrates that it’s a bad idea to give in to the temptation to equate a large number of comments with success. Or, perhaps speaking of personal success is alright as long as one doesn’t mistake it for societal impact. Another thing is the relation of commentators and readers. It’s not trivial to figure out whether fewer comments means less attention on the part of readers.

To succeed as a formal genre of mainstream scholarly communication, which is still conducted primarily via monographs and journals, scholarly blogging must adopt some of the conventions of these forms (quality control, long-term availability of content, citability) while preserving its intrinsic strengths (speed and simplicity of publication, the potential of interaction via comments, the ability to embed images, video, audio, data and code, the ability to link and quote, the ability to track one’s impact via metrics). The adaptation can happen in multiple ways, and it only applies to formal scholarly communication; what happens informally or for other purposes remains unaffected, as does blogging about science by journalists or hobbyists. Blogging about science and scholarship is obviously in the public interest. The question is, should this be left to the researcher, or should it be incentivized by institutions? As Klaus Graf put it somewhat radically at the conference in Munich, is a researcher who doesn’t blog a bad researcher?