As it was snowing cats and dogs in beautiful Oxford this weekend, I figured I might as well get to a much neglected task: blogging. Following the excellent Workshop Big Data: Rewards and Risks for the Social Sciences here at the Oxford Internet Institute last week (to be followed by another event this week) it feels especially timely to write up a short summary of what I’ve been doing during my stay here, and in the last couple of months in general.
It’s been a fantastic time at the OII so far, and I have to admit that if I could, I would love to stay for a longer period. I arrived in January to snow and after figuring out the ins and outs of Oxford life (how to handle various keycards, where to shop and get coffee, which pubs are best), I quickly settled into my new office at 66 Banbury Road, the institute’s northern outpost. Office space is in short supply for obvious reasons in Oxford (as one can’t exactly tear down a medieval college…), but I liked 66 right away, simply because everyone I’ve encountered here has been incredibly welcoming and friendly, making it easy to settle in and get a lot of work done. I’ve had the chance to chat with a fantastic variety of people in the office, during brownbags, workshops, and over lunch, and it’s an absolutely unique environment.
So what’s been happening on my end? A few days before my arrival in Oxford, my awesome colleague Jean Burgess and I published our working paper The Politics of Twitter Data in the Humboldt Institute for Internet and Society’s SSRN Discussion Paper Series. The piece has been quite well received, with coverage from Patrick Meier in the NatGeo Explorer blog, as well as from Netzpolitik. Also around the time of my arrival in the UK, the volume Pragmatics of CMC was published by De Gruyter, a volume that has been in the making for quite a few years. I contributed a chapter on blogging to the handbook, expertly edited by Susan C. Herring, Dieter Stein, and Tuija Virtanen, which distills a lot of my previous research on blogging.
A few weeks into my stay, I was invited to give a talk in the Nuffield Network Seminar Series (see slides below). My presentation focused on the scientific blog network Hypotheses.org, providing an analysis of the dynamics of knowledge exchange between different scholarly communities inside the platform. Disciplinary and linguistic communities are of particular interest to me, a theme that Marco Toledo Bastos, Rodrigo Travitzki and I have also explored in a recent paper on Twitter activism about which I’ll post more soon (Marco will present this research in Paris at HyperText 2013 next month). I’m very keen on doing more (and especially more sophisticated) network analysis, and feedback from Bernie Hogan, Sandra Gonzalez-Bailon, and Taha Yasseri has already been invaluable in this regard. I’ll also be delivering an invited talk at the annual meeting of the Berliner Arbeitskreis Information next month on the role of Big Data for knowledge production, in which I will combine insights from research with a dose of criticism and reflection.
Later this spring, I am particularly looking forward to ICA 2013 in London, where I will be presenting two papers, one as part of the panel Big Data and Communication Research: Prospects, Perils, Alliances, and Impacts, chaired by Eric T. Meyer, and another in (oddly enough) a session on Copyright and Digital Piracy with my colleague Merja Mahrt (who, by the way, has also written this important piece on Big Data for communications research). The first panel will see contributions from Eric, Ralph Schroeder, Bernie Hogan and Mark Graham, Matthew Weber, and from danah boyd and Kate Crawford, all of whom are exceptional researchers. My talk will focus on the politics (and economics) of social media platforms as characterized in the relationship between platform providers, data resellers, large media companies and consumers.
As you can probably guess, all of this ties in beautifully with the ongoing activities relating to Big Data here at the OII and future research at the Humboldt Institute for Internet and Society, where Big Data is also a major topic. The workshop last week was part of an initiative funded by the Sloan Foundation to promote discussion about Big Data in the Social Sciences. More discussion is needed about the impact of Big Data on scholarly research, but also on politics, business and culture more broadly.
It seems that 2013 is shaping up to be the year in which academia catches up with Big Data — or at least with some of the hype surrounding it.
I’ve already shared this bit of personal news with a few friends and colleagues, but I thought I’d blog about it as well — especially since I’m woefully behind on my Iron Blogger schedule.
After a fairly long time in the making, I have been awarded a three-year research grant from the Deutsche Forschungsgemeinschaft (DFG) for the project Networking, visibility, information: a study of digital genres of scholarly communication and the motives of their users (summary in German on the DFG’s site). The project investigates new forms of scholarly communication (especially blogging and Twitter) and their role for academia. My key concerns are usage motives, i.e. why scholars use blogs and Twitter, and how these motives correspond with usage practices (how they blog and tweet), rather than how many researchers use these channels of communication or what makes them refrain from using them (see this blog post and the study mentioned in it for that kind of work). My main methods will be qualitative interviews with a sample of 20-25 blogging and/or tweeting academics, along with in-depth content analysis of the material they post in these channels over a prolonged period (>1 year). Identifying usage patterns and relating them to the participants’ narrative about their use will be another key objective. Ultimately, I hope to find a (tentative) answer to the question what role blogs and Twitter may play for the future of digital scholarship, and whether they will remain a niche phenomenon or become mainstream over time.
The project follows up on my work on corporate blogging and connects strongly to what we have been doing at the Junior Researchers Group “Science and the Internet” over the past year, but the focus on interviews should result in a more user-centric analysis. As someone who has been doing (applied) linguistic analysis to make inferences about social processes, I feel much more comfortable actually talking to the people I want to study, rather than just crunching numbers on how they tweet. Big data social science research is obviously and understandably en vogue these days, but I hope to find a good synergy between qualitative and quantitative approaches in my project.
My new institutional home for the next three years will be the Berlin School of Library and Information Science at Humboldt University. I’m grateful to Michael Seadle for supporting my project and really look forward to working with my new colleagues at IBI (that’s the German acronym, which, as far as I can tell, is preferred to its more entertaining English equivalent). I also look forward to working with colleagues from the Alexander von Humboldt Institute for Internet and Society (HIIG) where I’m currently supporting the project Regulation Watch. Finally, I plan to keep in close contact with the colleagues in Düsseldorf, both at the Junior Researchers Group and the Department of English Language and Linguistics, where I have learned virtually everything I know about being a researcher. I am especially indebted to Dieter Stein for his enduring support and for his contagious enthusiasm for all aspects of scholarship.
Sic itur ad astra!
For an overview of previous work I’ve done in this direction, have a look at my publications.
Those of you following my occasional updates here know that I have previously posted code for graphing Twitter friend/follower networks using R (post #1, post #2). Kai Heinrich was kind enough to send me some updated code for doing so using a newer version of the extremely useful twitteR package. His very crisp, yet thoroughly documented script is pasted below.
# Script for graphing Twitter friends/followers
# by Kai Heinrich (firstname.lastname@example.org)

# Load the required packages
library("twitteR")
library("igraph")

# HINT: In order for the tkplot() function to work on Mac you need to install
# the TCL/TK build for X11
# (get it here: http://cran.us.r-project.org/bin/macosx/tools/)

# Get user information with the twitteR function getUser().
# Instead of using your own name you can do this with any other username as well.
start <- getUser("YOUR_USERNAME")

# Get friend and follower names by first fetching the IDs
# (getFriendIDs(), getFollowerIDs()) and then looking up the names (lookupUsers())
friends.object <- lookupUsers(start$getFriendIDs())
followers.object <- lookupUsers(start$getFollowerIDs())

# Retrieve the names of your friends and followers from the friend
# and follower objects. You can limit the number of friends and followers by
# adjusting the size of the selected data with [1:n], where n is the number of
# followers/friends that you want to visualize. If you leave out the [1:n]
# expression, the maximum number of friends and/or followers will be visualized.
n <- 20
friends <- sapply(friends.object[1:n], name)
followers <- sapply(followers.object[1:n], name)

# Create a data frame that relates friends and followers to you,
# for expression in the graph
relations <- merge(data.frame(User = 'YOUR_NAME', Follower = friends),
                   data.frame(User = followers, Follower = 'YOUR_NAME'),
                   all = TRUE)

# Create graph from relations
g <- graph.data.frame(relations, directed = TRUE)

# Assign labels to the graph (= people's names)
V(g)$label <- V(g)$name

# Plot the graph using plot() or tkplot(). Remember the HINT at the
# beginning if you are using Mac OS X.
tkplot(g)
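To see what the data frame step in the script above produces without calling the Twitter API, here is a minimal sketch in base R. The account names are made up, and I use rbind() with columns called from and to instead of the merge() call in Kai's script, simply because the direction of each tie is easier to read that way.

```r
# Made-up account names (assumptions for illustration, not real data).
me        <- "me"
friends   <- c("alice", "bob")    # accounts that "me" follows
followers <- c("bob", "carol")    # accounts that follow "me"

# One row per directed "follows" tie: from = follower, to = followed account.
edges <- rbind(
  data.frame(from = me, to = friends, stringsAsFactors = FALSE),
  data.frame(from = followers, to = me, stringsAsFactors = FALSE)
)

# Accounts appearing in both lists have a reciprocal tie with "me".
mutual <- intersect(friends, followers)

print(edges)
print(mutual)
```

A data frame shaped like edges can be handed to graph.data.frame(edges, directed = TRUE) in the same way as relations in the script.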
Unfortunately I’m not able to attend the annual IPrA conference next week in Manchester and had to cancel the trip on short notice. I was scheduled to give a talk as part of the session Quoting in Computer-mediated Communication on my work with Katrin Weller on retweeting among scientists.
Luckily for me, there will be a follow-up event of sorts (see below). I’ve posted the call here since it doesn’t seem to be available on the Web other than as a PDF. Submit something if you’re doing research on quoting! I’m fairly sure that the deadline will be extended by a week or two.
CfP: Quoting Now and Then – 3rd International Conference on Quotation and Meaning (ICQM)
University of Augsburg, Germany
19 April – 21 April 2012
Contact: Monika Kirner
Call for Papers
This conference addresses the pragmatics of quoting as a metacommunicative act both in old (printed) and new (electronically mediated) communication. With the rapid evolution of new media in the last two decades, approaches to the study of (forms, functions and impact of) quoting have been gaining momentum in linguistics. Although quotations in print media have already been investigated to some extent, quoting in computer-mediated communication is still uncharted territory. This conference shall focus on the formal and functional evolution of quoting from old (analog) to new (digital) media. While the conference builds on the panel “Quoting in Computer-mediated Communication” to be presented in July 2011 at the International Conference of Pragmatics (IPrA), it assumes a much broader perspective, paying special tribute to the inherent confluence and complementarity of synchronic and diachronic approaches. Consequently, we invite papers from both (synchronic and diachronic) perspectives to report on the formal, functional as well as the pragmatic-discursive and multimodal nature of quoting in different genres or media.
Plenary talk: Jörg Meibauer
Please submit an abstract of not more than 500 words (for a 30 min talk plus 10 min discussion) via e-mail to email@example.com
Deadline for abstracts: 1 July 2011 (extended to 15 August 2011)
I am a linguist at the University of Düsseldorf and my work focuses on internet communication. As part of the study “Aspekte privater Twitter-Kommunikation” (aspects of private Twitter communication), I would like to investigate the usage habits of German-speaking Twitter users who do not use Twitter exclusively for professional purposes (as opposed to, for example, journalists, scholars, politicians, and other people in communication professions). To this end, I would like to record and analyze your public tweets for one month. Afterwards, I would like to ask you a few questions (no more than 10) about your Twitter use by email.
Only public tweets (i.e. no DMs) will be recorded. All data will be anonymized (i.e. names, including Twitter nicknames, will be removed) and will not be passed on to third parties. Individual tweets can be excluded from the recording at any time with the hashtag #exclude. At the end of the study period, I will gladly send you an archive of your recorded tweets if you are interested.
Besides your contribution to scholarly research, there is also a (small) reward: at the end of the study period, I will raffle off an Amazon voucher worth 50 euros among the participants.
If you are willing to take part, please send a short email to Cornelius.Puschmann@uni-duesseldorf.de (Edit: of course you can also get in touch via Twitter). If you do not want to take part, you do not need to do anything. I am happy to answer questions about the study by email.
Many thanks in advance for your interest and your support!
Dr. Cornelius Puschmann
Junior Researchers Group “Science and the Internet”
Here’s the citation for the poster:
Puschmann, C., Weller, K., & Dröge, E. (2011). Studying Twitter conversations as (dynamic) graphs: visualization and structural comparison. Presented at General Online Research, 14-16 March 2011, Düsseldorf, Germany. Retrieved from http://ynada.com/posters/gor11.pdf.
I thought I’d write a brief update to this earlier post discussing the consequences of what has recently happened with Twitter’s TOS update/enforcement of the redistribution clause. Here is a concise summary from ReadWriteWeb:
[..] Twitter’s recent announcement that it was no longer granting whitelisting requests and that it would no longer allow redistribution of content will have huge consequences on scholars’ ability to conduct their research, as they will no longer have the ability to collect or export datasets for analysis.
Read this earlier RWW post for more background. Twitter has cracked down on services like TwapperKeeper and 140kit.com that allow users not only to track Twitter keywords and hashtags, but also to export and download archives of tweets in XML or CSV. Apparently Twitter wants to stop redistribution of “its” content to the extent possible, including redistribution for research purposes. From the RWW post:
140kit offered its Twitter datasets to other scholars for their own research. By no means a full or complete scraping of Twitter data, this information that the project had collected was still made available for download (for free) to researchers. But no longer.
The people at 140kit, to their credit, are working on an approach which would allow researchers to work with Twitter data without exporting data, but rather by using their interface. From 140kit’s website:
We have a solution, which will involve using a plugin based analytical approach, which will not allow you to export data, but will, with Twitter’s blessings, allow you to ask any questions to your dataset with ease.
Hmm, sorry, but I’m underwhelmed. There are already countless services out there that allow Twitter analysis in some form, often with nebulous results, because data collection and methods are not transparent. With any list of frequent terms on Twitter, the questions need to be: What stop words did you exclude? How clean is your data? I can’t know whether these things are done appropriately for my analysis unless I do them myself. You might object that not everyone is keen on sifting through CSV files with their own scripts. That’s true outside of academic research, where a GUI tool might be okay for a casual analysis, but for serious analysis direct access to the raw data is a must. And beyond just having access yourself, in the spirit of reproducible research it’s important to distribute the dataset along with your paper. That’s where we should be heading, rather than basing our analyses on pre-produced tools and mechanisms which handle the data in ways that are opaque and beyond our control.
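To make the stop word point concrete, here is a minimal sketch of a term count in base R where every cleaning decision is visible in the script itself. The three tweets and the stop word list are made up for illustration; with real data you would read the texts from your own CSV export.

```r
# Made-up tweet texts (assumption; in practice, read these from your own export).
tweets <- c("The data is the data",
            "Open data for open research",
            "RT the research is open")

# The stop word list is stated explicitly, so readers can see what was dropped.
stop_words <- c("the", "is", "for", "rt")

# Lowercase, then split on anything that is not a letter, digit, # or @.
tokens <- unlist(strsplit(tolower(tweets), "[^a-z0-9#@]+"))
tokens <- tokens[nchar(tokens) > 0 & !(tokens %in% stop_words)]

# Term frequencies, most frequent first.
term_freq <- sort(table(tokens), decreasing = TRUE)
print(term_freq)
```

Changing the stop word list or the tokenizing pattern changes the ranking, which is exactly why both need to be documented alongside the results.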
Will this shut off researchers’ access to Twitter data, as the RWW article claims? Not really, at least not everyone’s access. Those researchers who build their own tools (or deploy existing ones, such as yourTwapperKeeper, on their own servers) will have no trouble at all getting all the data they want. It’s just the rest (those who can’t code, or who lack tech support, i.e. funding) who will be restricted to simple GUI tools. If you’re a PhD student at a small university, in a department with no technical expertise or support, you have a competitive disadvantage. More power to computer scientists, and to centers like Berkman and the OII, this decision seems to say.
How to solve this problem? Luckily services like Amazon AWS level the playing field somewhat. Setting up an account there to scrape Twitter on a regular basis (for example with yourTwapperKeeper, or with your own set of scripts) is probably the best alternative to using a service like 140kit.
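If you go the route of your own scripts, most of the work besides fetching is bookkeeping. The sketch below shows one way a scheduled job could append each new batch of tweets to a CSV archive without storing duplicates. The function archive_tweets() and the columns id and text are my own hypothetical names, and the fetching step (whatever API client you use) is deliberately left out.

```r
# Append a batch of tweets to a CSV archive, skipping ids that are already stored.
# 'batch' is assumed to be a data frame with character columns 'id' and 'text'
# (hypothetical layout). Ids are kept as character to avoid losing digits.
archive_tweets <- function(batch, path) {
  if (file.exists(path)) {
    old <- read.csv(path, stringsAsFactors = FALSE, colClasses = "character")
    batch <- batch[!(batch$id %in% old$id), , drop = FALSE]
    combined <- rbind(old, batch)
  } else {
    combined <- batch
  }
  write.csv(combined, path, row.names = FALSE)
  invisible(nrow(combined))
}

# Usage with two overlapping made-up batches and a temporary file.
path   <- tempfile(fileext = ".csv")
batch1 <- data.frame(id = c("1", "2"), text = c("first", "second"),
                     stringsAsFactors = FALSE)
batch2 <- data.frame(id = c("2", "3"), text = c("second", "third"),
                     stringsAsFactors = FALSE)
n_after_first  <- archive_tweets(batch1, path)
n_after_second <- archive_tweets(batch2, path)
archive <- read.csv(path, stringsAsFactors = FALSE, colClasses = "character")
```

Run from a scheduler on a small server, something along these lines leaves you with a plain CSV file that you fully control and can clean and analyze yourself.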
Note: Check out this video interview with John O’Brien of TwapperKeeper, who basically gives the same advice.