I’ve already shared this bit of personal news with a few friends and colleagues, but I thought I’d blog about it as well — especially since I’m woefully behind on my Iron Blogger schedule. ;-)

After a fairly long time in the making, I have been awarded a three-year research grant from the Deutsche Forschungsgemeinschaft (DFG) for the project Networking, visibility, information: a study of digital genres of scholarly communication and the motives of their users (summary in German on the DFG’s site). The project investigates new forms of scholarly communication (especially blogging and Twitter) and their role in academia. My key concerns are usage motives, i.e. why scholars use blogs and Twitter, and how these motives correspond with usage practices (how they blog and tweet), rather than how many researchers use these channels of communication or what makes them refrain from using them (see this blog post and the study mentioned in it for that kind of work). My main methods will be qualitative interviews with a sample of 20-25 blogging and/or tweeting academics, along with in-depth content analysis of the material they post in these channels over a prolonged period (more than one year). Identifying usage patterns and relating them to the participants’ narratives about their use will be another key objective. Ultimately, I hope to find a (tentative) answer to the question of what role blogs and Twitter may play in the future of digital scholarship, and whether they will remain a niche phenomenon or become mainstream over time.

The project follows up on my work on corporate blogging and connects strongly to what we have been doing at the Junior Researchers Group “Science and the Internet” over the past year, but the focus on interviews should result in a more user-centric analysis. As someone who has been doing (applied) linguistic analysis to make inferences about social processes, I feel much more comfortable actually talking to the people I want to study, rather than just crunching numbers on how they tweet. Big data social science research is obviously and understandably en vogue these days, but I hope to find a good synergy between qualitative and quantitative approaches in my project.

My new institutional home for the next three years will be the Berlin School of Library and Information Science at Humboldt University. I’m grateful to Michael Seadle for supporting my project and really look forward to working with my new colleagues at IBI (that’s the German acronym, which, as far as I can tell, is preferred to its more entertaining English equivalent). I also look forward to working with colleagues from the Alexander von Humboldt Institute for Internet and Society (HIIG), where I’m currently supporting the project Regulation Watch. Finally, I plan to stay in close contact with my colleagues in Düsseldorf, both at the Junior Researchers Group and at the Department of English Language and Linguistics, where I have learned virtually everything I know about being a researcher. I am especially indebted to Dieter Stein for his enduring support and for his contagious enthusiasm for all aspects of scholarship.

Sic itur ad astra! (Thus one goes to the stars.) :-)

For an overview of previous work I’ve done in this direction, have a look at my publications.

Tagged with:  

Those of you following my occasional updates here know that I have previously posted code for graphing Twitter friend/follower networks using R (post #1, post #2). Kai Heinrich was kind enough to send me some updated code for doing so using a newer version of the extremely useful twitteR package. His very crisp yet thoroughly documented script is pasted below.

# Script for graphing Twitter friends/followers
# by Kai Heinrich (kai.heinrich@mailbox.tu-dresden.de)

# load the required packages

library("twitteR")
library("igraph")

# HINT: In order for the tkplot() function to work on Mac you need to install
#       the Tcl/Tk build for X11
#       (get it here: http://cran.us.r-project.org/bin/macosx/tools/)
#
# Get user information with the twitteR function getUser().
# Instead of your own username you can use any other username as well.
# (Newer versions of twitteR require you to authenticate first,
#  e.g. with setup_twitter_oauth().)

start <- getUser("YOUR_USERNAME")

# Get friend and follower names by first fetching their IDs (getFriendIDs(), getFollowerIDs())
# and then looking up the names (lookupUsers())

friends.object <- lookupUsers(start$getFriendIDs())
follower.object <- lookupUsers(start$getFollowerIDs())

# Retrieve the names of your friends and followers from the friend
# and follower objects. You can limit the number of friends and followers
# with head(..., n), where n is the number of friends/followers
# that you want to visualize. If you leave out head(), all of your
# friends and/or followers will be visualized.

n <- 20
friends <- sapply(head(friends.object, n), function(u) u$name)
followers <- sapply(head(follower.object, n), function(u) u$name)

# Create a data frame that relates friends and followers to you for expression in the graph.
# Each row becomes a directed edge from the first to the second column, i.e. from the
# account that follows to the account being followed. Replace 'YOUR_NAME' with your own
# display name in both places, so that your node matches the names retrieved above.
relations <- merge(data.frame(User = 'YOUR_NAME', Follower = friends),
                   data.frame(User = followers, Follower = 'YOUR_NAME'), all = TRUE)

# Create graph from relations.
g <- graph.data.frame(relations, directed = TRUE)

# Assign labels to the graph (= people's names)
V(g)$label <- V(g)$name

# Plot the graph using plot() or tkplot(). Remember the HINT at the
# beginning if you are using Mac OS X
tkplot(g)
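
For anyone who wants to check the edge logic outside R, here is a minimal Python sketch of what the merge() step produces: an edge from you to every friend, an edge from every follower to you, and no duplicate rows, so mutual ties show up as two edges pointing in opposite directions. The function names and the DOT output (Graphviz's plain-text graph format) are my own illustration, not part of Kai's script, and the names are assumed to be plain strings already fetched from Twitter.

```python
def build_relations(me, friends, followers):
    """Return sorted (source, target) edges meaning "source follows target".

    Like merge(..., all = TRUE) in the R script: the union of
    me -> each friend and each follower -> me, without duplicate rows.
    """
    edges = {(me, friend) for friend in friends}
    edges |= {(follower, me) for follower in followers}
    return sorted(edges)


def quote(name):
    """Quote a node name for DOT, escaping backslashes and double quotes."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_dot(edges):
    """Serialize edges as a Graphviz DOT digraph, one edge per line."""
    lines = ["digraph follows {"]
    for source, target in edges:
        lines.append("  %s -> %s;" % (quote(source), quote(target)))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical names, not real accounts.
    print(to_dot(build_relations("me", ["alice", "bob"], ["bob", "carol"])))
```

Save the printed output to a file such as follows.dot and render it with Graphviz, e.g. dot -Tpng follows.dot -o follows.png; unlike tkplot(), this needs no Tcl/Tk.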

Ahead of publishing my TwitterFunctions library of R code (which is a constant work in progress), I thought I’d put up some really short Python code for getting a person’s friends and followers. Both scripts rely on Tweepy, my favorite Python implementation of the Twitter API. Install Python (it works on Windows as well, not just on Mac/Linux) and then Tweepy on top of that, and you are good to go with these two scripts, which can be executed from the command line like this:
python get_friends.py username

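import sys
import tweepy

# Note: tweepy.api is the global, unauthenticated API instance of older tweepy
# releases; current versions of tweepy and the Twitter API require OAuth.
user = sys.argv[1]
for friend in tweepy.api.friends(user):
	print(friend.screen_name)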

And the same for followers:

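import sys
import tweepy

# Note: tweepy.api is the global, unauthenticated API instance of older tweepy
# releases; current versions of tweepy and the Twitter API require OAuth.
user = sys.argv[1]
for follower in tweepy.api.followers(user):
	print(follower.screen_name)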
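
If you redirect the output of each script to a text file, comparing the two lists is a matter of set operations. The sketch below is a hypothetical helper, not part of the scripts above: it assumes one screen name per line and splits the accounts into mutual ties and one-way ties.

```python
import sys


def compare_connections(friends, followers):
    """Split accounts into mutual and one-way ties.

    friends:   screen names you follow
    followers: screen names that follow you
    """
    friends, followers = set(friends), set(followers)
    return {
        "mutual": sorted(friends & followers),          # you follow each other
        "only_friends": sorted(friends - followers),    # you follow them, they don't follow you
        "only_followers": sorted(followers - friends),  # they follow you, you don't follow them
    }


def read_names(path):
    """Read one screen name per line (e.g. saved script output), skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    # Usage (hypothetical file names): python compare.py friends.txt followers.txt
    result = compare_connections(read_names(sys.argv[1]), read_names(sys.argv[2]))
    for group, names in result.items():
        print("%s (%d): %s" % (group, len(names), ", ".join(names)))
```

Sets make the comparison independent of the order in which the API returns the accounts; the results are sorted only to keep the output readable.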

Unfortunately, I’m not able to attend the annual IPrA conference next week in Manchester and had to cancel the trip at short notice. I was scheduled to give a talk on my work with Katrin Weller on retweeting among scientists as part of the session Quoting in Computer-mediated Communication.

Luckily for me, there will be a follow-up event of sorts (see below). I’ve posted the call here since it doesn’t seem to be available on the Web other than as a PDF. Submit something if you’re doing research on quoting! I’m fairly sure that the deadline will be extended by a week or two.

CfP: Quoting Now and Then – 3rd International Conference on Quotation and Meaning (ICQM)

University of Augsburg, Germany

19 April – 21 April 2012

Conference Convenors:
Wolfram Bublitz
Jenny Arendholz
Christian Hoffmann
Monika Kirner

Contact: Monika Kirner
E-mail: monika.kirner@phil.uni-augsburg.de

Call for Papers
This conference addresses the pragmatics of quoting as a metacommunicative act both in old (printed) and new (electronically mediated) communication. With the rapid evolution of new media in the last two decades, approaches to the study of (forms, functions and impact of) quoting have been gaining momentum in linguistics. Although quotations in print media have already been investigated to some extent, quoting in computer-mediated communication is still uncharted territory. This conference shall focus on the formal and functional evolution of quoting from old (analog) to new (digital) media. While the conference builds on the panel “Quoting in Computer-mediated Communication” to be presented in July 2011 at the International Conference of Pragmatics (IPrA), it assumes a much broader perspective, paying special tribute to the inherent confluence and complementarity of synchronic and diachronic approaches. Consequently, we invite papers from both (synchronic and diachronic) perspectives to report on the formal, functional as well as the pragmatic-discursive and multimodal nature of quoting in different genres or media.

Plenary talk: Jörg Meibauer

Abstracts:
Please submit an abstract of not more than 500 words (for a 30 min talk plus 10 min discussion) via e-mail to monika.kirner@phil.uni-augsburg.de

Deadline for abstracts:
1 July 2011 (extended to 15 August 2011)


I meant to post this a month or so ago, when I was conducting my study of casual tweeting, but didn’t get to it. No harm in posting it now, I guess — code doesn’t go bad, fortunately.

Note: this requires Linux/Unix/OSX, Python 2.6 and the tweepy library. It might also work on Windows, but I haven’t checked.

1. Fetching a single user’s tweets with twitter_fetch.py

The purpose of the script below is to automatically retrieve all new tweets by one or more users, where “new” means all tweets that have been added since the last round of archiving. If the script is called for the first time for a given user, it retrieves that person’s most recent tweets (via a single API call, so not necessarily their complete history). It relies on the tweepy package for Python, which is one of a number of libraries providing access to the Twitter API. In case you’re looking for a library for R, check out twitteR.

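import sys
import os
import tweepy

# Tweets are archived in the directory 'Tweets', which is created if it doesn't exist.
# Note: tweepy.api is the global, unauthenticated API instance of the older
# (Python 2.6-era) tweepy releases this script was written for.
wdir = 'Tweets'
user = sys.argv[1]
id_path = os.path.join(wdir, user + '.last_id')
timeline_path = os.path.join(wdir, user + '.timeline')

if not os.path.isdir(wdir):
	os.makedirs(wdir)

# If we archived this user before, only ask for tweets newer than the last one.
# count=200 is the maximum number of tweets user_timeline returns per request.
if os.path.exists(id_path):
	with open(id_path, 'r') as f:
		since = int(f.read())
	tweets = tweepy.api.user_timeline(user, since_id=since, count=200)
else:
	tweets = tweepy.api.user_timeline(user, count=200)

if len(tweets) > 0:
	# The API returns the newest tweet first: remember its id, then write oldest first.
	last_id = str(tweets[0].id)
	tweets.reverse()

	# Append one tab-separated line per tweet: timestamp, text, source.
	with open(timeline_path, 'a') as f:
		for tweet in tweets:
			# Replace line breaks and tabs so each tweet stays on a single line.
			text = tweet.text.replace('\r', ' ').replace('\n', ' ').replace('\t', ' ')
			output = str(tweet.created_at) + '\t' + text.encode('utf-8') + '\t' + tweet.source.encode('utf-8') + '\n'
			f.write(output)
			print(output)

	# Save the newest id for the next round of archiving.
	with open(id_path, 'w') as f:
		f.write(last_id)
else:
	print('No new tweets for ' + user)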

The code is pretty straightforward. I wrote it without really knowing Python beyond the bare essentials, relying heavily on IPython’s code completion. The actual retrieval of tweets happens in a single line:

tweets = tweepy.api.user_timeline(user)
 

The rest of the script is devoted to managing the data and making sure only new tweets are retrieved. This is done via the since_id parameter, which is fed the last id recorded in the user’s id file in the previous round of archiving. Since each run makes only one API call, tweets from a very active account can be skipped if more are posted between two runs than a single call returns. There are more elegant ways of doing this, but any improvements are up to you. ;-)
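
The since_id bookkeeping can be separated from the API call and tested on its own. The following is a minimal sketch under stated assumptions: tweets are plain dictionaries with the hypothetical keys id, created_at, text and source (tweepy returns objects instead), listed newest first as the timeline API returns them. select_new() mimics on the client side what since_id does on the server, relying on Twitter ids growing over time, and format_line() reproduces the tab-separated archive format of twitter_fetch.py.

```python
def select_new(tweets, since_id=None):
    """Return (new tweets oldest first, id to store for the next round).

    tweets:   newest-first list of dicts with at least an "id" key
    since_id: the id saved after the previous round, or None on the first run
    """
    new = [t for t in tweets if since_id is None or t["id"] > since_id]
    if not new:
        # Nothing new: keep the previously stored id.
        return [], since_id
    # The newest id becomes the since_id of the next round.
    last_id = max(t["id"] for t in new)
    return sorted(new, key=lambda t: t["id"]), last_id


def format_line(tweet):
    """Build one archive line: created_at, text and source separated by tabs."""
    text = tweet["text"]
    for ch in ("\r", "\n", "\t"):
        # Keep each tweet on a single line and its fields unambiguous.
        text = text.replace(ch, " ")
    return "%s\t%s\t%s\n" % (tweet["created_at"], text, tweet["source"])
```

Keeping these two steps as pure functions means the archiving logic can be checked with a handful of fake tweets, without touching the network or the Tweets directory.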

2. Fetching a bunch of different users’ tweets with twitter_fetch_all.sh

Second comes a very simple bash script. The only thing it does is call twitter_fetch.py once for each user in a list of people you want to track. Again, there are probably other ways of doing this, but I wanted to keep the functions of the two different scripts separate.

#!/bin/bash
# This will run twitter_fetch.py for each user in the twitter_users[] array. Add any number of
# twitter_users[NUMBER]="USER" lines below to archive additional accounts.

# Run from the directory containing this script, so that twitter_fetch.py and the
# Tweets directory are found even when the script is started by cron.
cd "$(dirname "$0")" || exit 1

# --- twitter user list ---
twitter_users[0]="SomeUser"
twitter_users[1]="SomeOtherUser"
twitter_users[2]="YetAnotherUser"
twitter_users[3]="YouGetTheIdea"

# --- execute twitter_fetch.py ---
for twitter_user in "${twitter_users[@]}"
do
	echo "Getting tweets for user $twitter_user"
	python twitter_fetch.py "$twitter_user"
	echo "Done."
	echo ""
done

You should place this in the same directory as twitter_fetch.py and modify it to suit your needs.

3. Automating the whole thing with a cronjob

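Finally, here’s a cron directive I used to run the script at the top of every hour and log the result in case any errors occur. Read the linked Wikipedia article if you’re unfamiliar with cron; it’s a very convenient way of automating tasks on Linux/Unix. Note that the script is started with bash rather than sh, because it uses bash arrays (on some systems sh is a more minimal shell), and that 2>&1 sends error messages to the log as well.

0 * * * * bash /root/twitter_fetch_all.sh > /root/twitter_fetch.log 2>&1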

(Yes, I’m running this as root. Because I can. And because it’s an EC2 instance with nothing else on it anyway.)

Hope it’s useful to someone; let me know if you have any questions. :-)


A crisis of brains indeed

On May 30, 2011, in Thoughts, by cornelius

I want to take a moment to comment on NYT editor Bill Keller’s op-ed “The Twitter Trap”, which I read this morning over coffee.

Keller’s piece is one of those self-proclaimed “thoughtful critiques” of digital media that journalists write to prove how relevant they still are. I don’t know about you, but I’m starting to feel like I’ve read hundreds of these “I’m not a Luddite, but...” articles about the pros and cons of the Net and its “impact on our culture” by now. Have a look and see if you recognize the script.

The line of argumentation is sadly predictable: first we are assured of how technologically progressive and up to date the author is, and then Twitter, Facebook et al. are associated with information overload, dropping attention spans, and changes to our brains, souls, culture, and overall well-being. Sprinkled in are references and analogies that make the changes afforded by digital technology seem awe-inspiring and unique, oddly supporting the author’s argument about their world-changing potential. Comparing Mark Zuckerberg to Johannes Gutenberg (as Keller does) may look catchy at first glance, but it feels wrong for so many reasons I don’t know where to begin. For one thing, Gutenberg’s invention didn’t make him quite as rich as Goldman Sachs made Zuckerberg, and it was conceived, you know, to spread the Word of God, rather than to monopolize social networking. People like Vint Cerf and Tim Berners-Lee (to name just two examples) were a lot more like Gutenberg in the sense that they enabled a media shift they didn’t anticipate. Had Gutenberg been like Zuckerberg, we’d all be using the exact same printing press, and Johannes would be able to make sure we’re not using it to print anything nasty.

Forget Twitter and Facebook, this is the real threat to our brains.

Keller uses a fairly canonical set of arguments for his critique. He starts by claiming that people were more adept at memorizing large amounts of information in the pre-Gutenberg era. All right, agreed, but wouldn’t this have to be followed up with a long list of other things that were also different back then? Mass-media fantasies aside, I think we have one hell of a hard time relating to the medieval mindset, and oral culture (which is hardly unique to the Middle Ages) is the least of the reasons why.

Have our culture and brains really been under siege since the invention of cuneiform or, alternatively, movable type? That’s one hell of a downward spiral, Bill. One oddly decontextualized neuroscience soundbite and several personal anecdotes later, Keller closes with nothing less than a plea for young, confused souls:

My own anxiety is less about the cerebrum than about the soul, and is best summed up not by a neuroscientist but by a novelist. In Meg Wolitzer’s charming new tale, “The Uncoupling,” there is a wistful passage about the high-school cohort my daughter is about to join. Wolitzer describes them this way: “The generation that had information, but no context. Butter, but no bread. Craving, but no longing.”

No longing, Bill? Like, seriously?

(I’ll skip the part where nobody, you know, knowledgeable is consulted on the topic, and an argument is instead stitched together from a (seemingly unrelated) neuroscience experiment and a novel. Just know that I’ll be watching for a similar choice of sources in a NYT op-ed when the next energy crisis or financial crisis looms.)

If you ask me, there’s plenty of longing all right, but it’s not the longing, craving or whatever of these poor young people for “context”, but rather the longing of a newspaper editor for coherence, authority and control. It’s a crisis of brain and soul for journalists and other power elites (scholars, teachers, politicians, parents) who find themselves challenged by what the kids are doing. The example Keller gives about asking a complex question on Twitter and getting a short, reductive answer is telling in several ways. It’s not just that the expectations are wrong, it’s that his way of using Twitter is presented as the way of using Twitter. Except that a prominent journalist’s use of microblogging bears little resemblance to what the kids Keller fears for are doing. Claiming that they live in a world with “no context” (ah, the idiocy of that expression!) is equally demeaning and implausible. It’s just that it’s a context that both journalists and parents have difficulty understanding.

What cheeses me off about all of this is that we could be having a real debate about the implications of digital technology, instead of playing out this tired are-you-for-or-against-the-Internet trope, which by now feels extremely dated. Keller’s criticism looks oddly similar to that of Frank Schirrmacher, another journalist and editor of the German newspaper Frankfurter Allgemeine Zeitung. Schirrmacher described the Internet as “a threat to our brains” in a book he published in 2009. Funny just how many “credible digital Cassandras” (Keller) decide to spread their warnings by means of well-publicized books. They speak at conferences, peddle their “criticism” on TV shows and write thoughtful newspaper op-eds reminding us of a simpler time when information was scarce and (comparably) easy to control and monetize. They remind us, the silly, starry-eyed public, that not everything related to the Internet is teh awesome, that some things there are smutty, bad and dangerous, and that we must watch out before our children succumb to technology’s evil influences, which is best achieved by reading their thoughtful, balanced-yet-critical books. Except that these are mostly tired enumerations of speculations, soundbites from neurologists that have as much to do with Facebook’s effect on society as with the effect of brain-eating zombies on our mental health, and uninformed, extremely self-referential deliberations on how the Internet is scary to elites, based on how they are using it. They exaggerate the impact of digital technology and its uniqueness, because if you’re in the horse and buggy industry, nothing is scarier than the automobile. And finally, they assume that everyone uses Twitter, Facebook and other services in the same way, which, as it turns out, is not true.

There is, of course, a lot that can go wrong in the future. Data is increasingly treated as capital, and those who produce it aren’t the ones who own the inferences mined from it about attitudes, behaviors and consumer choices. Despite the clamor that through the Internet information is available to EVERYONE, it’s neither true that everyone has access nor that we’re even all using the same Internet. Censorship, privacy, I could go on. But why bother with these complicated issues if you, as the executive editor of a leading newspaper, can instead lament how terribly conflicted you are about all this change that’s going on? Why acknowledge that the situation is complex and has many facets when you can instead troll a bit and get a lot of “how dare yous” and “finally someone says its” in response? Funny how even a piece about the dangers of Twitter has to be, you know, debatable on Twitter.

Come to think of it, this would make a stellar title for a book:

OMG! ZOMBIES! How the hysteria of our social elites about the Internet is keeping us from engaging in a serious discussion.


Dear Twitter user,

I am a linguist at the University of Düsseldorf, and my work focuses on Internet communication. As part of the study “Aspekte privater Twitter-Kommunikation” (Aspects of private Twitter communication), I would like to investigate the usage habits of German-speaking Twitter users who do not use Twitter exclusively for work (as opposed to, e.g., journalists, scholars, politicians and other people in communication professions). To this end, I would like to record and analyze your public tweets for one month. Afterwards, I would like to ask you a few questions (no more than 10) about your Twitter use by e-mail.

Only public tweets (i.e. no DMs) will be recorded. All data will be anonymized (i.e. names, including Twitter nicknames, will be removed) and will not be passed on to third parties. You can exclude individual tweets from the recording at any time by adding the hashtag #exclude. At the end of the study period, I will gladly send you an archive of your recorded tweets if you are interested.

Besides contributing to scientific research, you can also look forward to a (small) reward for your effort: at the end of the study period, I will raffle off an Amazon voucher worth 50 euros among the participants. :-)

If you are willing to participate, please send a short e-mail to Cornelius.Puschmann@uni-duesseldorf.de (Edit: of course, you can also get in touch via Twitter). If you do not want to participate, you don’t need to do anything. I am happy to answer questions about the study by e-mail.

Thank you in advance for your interest and your support!

Dr. Cornelius Puschmann
Junior Researchers Group “Science and the Internet”
Heinrich-Heine-Universität Düsseldorf
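
The two data-handling promises in the call above, dropping tweets marked #exclude and removing names including Twitter nicknames, can be sketched in a few lines of Python. This is a hypothetical illustration, not the procedure actually used in the study: it assumes tweets are plain strings, maps each @mention to a stable placeholder so that conversations remain recognizable, and does not touch real names written out in the text, which would still need a manual pass.

```python
import re

# An @mention not preceded by a word character (so e-mail addresses are skipped).
MENTION = re.compile(r"(?<!\w)@(\w+)")
# The opt-out hashtag from the call above, matched case-insensitively.
EXCLUDE = re.compile(r"#exclude\b", re.IGNORECASE)


def anonymize(tweets):
    """Drop tweets tagged #exclude and replace @mentions with placeholders.

    The same nickname always maps to the same placeholder (@user1, @user2, ...),
    so who talks to whom stays visible without revealing the accounts.
    """
    aliases = {}

    def alias(match):
        name = match.group(1).lower()
        if name not in aliases:
            aliases[name] = "@user%d" % (len(aliases) + 1)
        return aliases[name]

    return [MENTION.sub(alias, text) for text in tweets if not EXCLUDE.search(text)]
```

Filtering happens before any placeholder is assigned, so an excluded tweet leaves no trace in the numbering either.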


As part of the research we’re doing in Düsseldorf on the use of Twitter at academic conferences, here’s a poster we’re presenting in a few days at GOR ’11:

Here’s the citation for the poster:

Puschmann, C., Weller, K., & Dröge, E. (2011). Studying Twitter conversations as (dynamic) graphs: visualization and structural comparison. Presented at General Online Research, 14-16 March 2011, Düsseldorf, Germany. Retrieved from http://ynada.com/posters/gor11.pdf.

See this older post for more information on how to visualize dynamic graphs of retweets with Gephi.
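
The graphs on the poster are built from retweets. As a rough illustration of the first step, the function below extracts (retweeter, original author, time) edges from classic manual retweets of the form “RT @user”. The input format, tuples of author, timestamp and text, is an assumption, and native retweets, which carry their own metadata, are not handled; the result is the kind of timestamped edge list from which a dynamic graph can be built.

```python
import re

# A classic manual retweet marker such as "RT @someone".
RT_PATTERN = re.compile(r"\bRT @(\w+)", re.IGNORECASE)


def retweet_edges(tweets):
    """Return (retweeter, original_author, timestamp) tuples.

    tweets: iterable of (author, timestamp, text) tuples.
    Only the first "RT @user" in a tweet is used, i.e. the account the
    retweeter got the message from; names are lowercased for matching.
    """
    edges = []
    for author, timestamp, text in tweets:
        match = RT_PATTERN.search(text)
        if match:
            edges.append((author.lower(), match.group(1).lower(), timestamp))
    return edges
```

Using only the first marker captures the most recent hop in a retweet chain; following every “RT @” in a tweet would instead connect the retweeter to everyone further up the chain.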


I thought I’d write a brief update to this earlier post discussing the consequences of what has recently happened with Twitter’s TOS update/enforcement of the redistribution clause. Here is a concise summary from ReadWriteWeb:

[..] Twitter’s recent announcement that it was no longer granting whitelisting requests and that it would no longer allow redistribution of content will have huge consequences on scholars’ ability to conduct their research, as they will no longer have the ability to collect or export datasets for analysis.

Read this earlier RWW post for more background. Twitter has cracked down on services like TwapperKeeper and 140kit.com that allow users not only to track Twitter keywords and hashtags, but also to export and download archives of tweets in XML or CSV. Apparently Twitter wants to stop redistribution of “its” content to the extent possible, including redistribution for research purposes. From the RWW post:

140kit offered its Twitter datasets to other scholars for their own research. By no means a full or complete scraping of Twitter data, this information that the project had collected was still made available for download (for free) to researchers. But no longer.

The people at 140kit, to their credit, are working on an approach which would allow researchers to work with Twitter data without exporting data, but rather by using their interface. From 140kit’s website:

We have a solution, which will involve using a plugin based analytical approach, which will not allow you to export data, but will, with Twitter’s blessings, allow you to ask any questions to your dataset with ease.

Hmm, sorry, but I’m underwhelmed. There are already countless services out there that allow Twitter analysis in some form, often with nebulous results, because data collection and methods are not transparent. With any list of frequent terms on Twitter, the questions need to be: What stop words did you exclude? How clean is your data? I can’t know whether these things are done appropriately for my analysis unless I do them myself. You might object that not everyone is keen on sifting through CSV files with their own scripts. That’s true, and outside of academic research, using a GUI tool for a casual Twitter analysis might be okay, but for serious analysis direct access to the raw data itself is a must. And beyond just having access yourself, in the spirit of reproducible research it’s important to distribute the dataset along with your paper. That’s where we should be heading, rather than basing our analyses on ready-made tools and mechanisms that handle the data in ways that are opaque and beyond our control.
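
To illustrate why the stop word question matters, here is a small term-frequency sketch. The tokenizer (a simple regular expression) and the tiny stop word list are assumptions made up for the example; the point is that the same tweets yield a different top term depending on such choices, which a closed analysis interface would hide from you.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = frozenset({"the", "a", "and", "of", "to", "rt"})


def top_terms(tweets, n=3, stop_words=frozenset()):
    """Return the n most common lowercase tokens across all tweets.

    Tokens are runs of word characters; a leading # or @ is kept, so
    hashtags and mentions count as terms of their own.
    """
    counts = Counter()
    for text in tweets:
        for token in re.findall(r"[#@]?\w+", text.lower()):
            if token not in stop_words:
                counts[token] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        "RT @anna: the talk on the data",
        "the data and the #gor11 poster",
        "more data",
    ]
    print(top_terms(sample, n=1))                         # without stop words
    print(top_terms(sample, n=1, stop_words=STOP_WORDS))  # with stop words
```

Run as a script, the first line reports “the” (4 occurrences) as the most frequent term and the second reports “data” (3 occurrences): same data, different answer.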

Will this shut off researchers’ access to Twitter data, as the RWW article claims? Not really, at least not everyone’s access. Those researchers who build their own tools (or deploy existing ones, such as yourTwapperKeeper, on their own servers) will have no trouble at all getting all the data they want. It’s just the rest, those who can’t code or lack tech support (=funding), who will be restricted to simple GUI tools. If you’re a PhD student at a small university, in a department with no technical expertise or support, you are at a competitive disadvantage. More power to computer scientists, and to centers like Berkman and the OII, this decision seems to say.

How to solve this problem? Luckily, services like Amazon AWS level the playing field somewhat. Setting up an account there to scrape Twitter on a regular basis (for example with yourTwapperKeeper, or with your own set of scripts) is probably the best alternative to using a service like 140kit.

Note: Check out this video interview with John O’Brien of TwapperKeeper, who basically gives the same advice.


Academic replacements for TwapperKeeper.com

On February 23, 2011, in Thoughts, by cornelius

Update: I’ve written a follow-up to this post.

A few days ago, the people behind Twitter archival site TwapperKeeper.com announced that they will be discontinuing the export feature of the service on March 20, 2011. Apparently the feature is in violation of Twitter’s terms of service, at least in the form it’s currently implemented in TwapperKeeper.

Unfortunately, this cuts off a number of academics who are investigating communication on Twitter for scientific purposes from a convenient data source. While it’s fairly easy to get data directly via the Twitter API (which is what TwapperKeeper was doing), I know many people who want to concentrate on the data itself, rather than running their own servers to scrape Twitter on a regular basis. What’s more, Twitter’s attitude is worrisome: many of us have tried to get an exemption from API rate limits in the past, to no avail. Twitter doesn’t give researchers privileged access to its data, and now it’s crippling TwapperKeeper on top of that.

Bottom line: what will we use after March 20? Ideally, a replacement would provide the following:

  • the hashtag/search query functionality of TwapperKeeper,
  • the export functionality of TwapperKeeper,
  • exclusive use for academic purposes (on the grounds that this might keep Twitter from shutting it down),
  • stability and reliability,
  • long-term viability.

The last point is important, because I don’t think it will be difficult to set up a server somewhere to suit the needs of a few people, but a larger-scale solution seems more sensible in the long run. Maybe JISC can do something like that, based on yourTwapperKeeper (which they supported)? Or one of the big institutes (OII, Berkman)? Either way, it would be nice to find an alternative that doesn’t give those of us with developers and major IT support behind us a huge edge over the rest…
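
The first two features on the list, querying by hashtag and exporting the results, are easy to provide once the raw tweets sit on your own server. The sketch below is a hypothetical illustration, not TwapperKeeper’s or yourTwapperKeeper’s actual code: it assumes tweets stored as dictionaries with the made-up keys id, from_user, created_at and text, keeps those containing a given hashtag (ignoring case) and writes them out with Python’s standard csv module.

```python
import csv
import io
import re

# Hypothetical column layout of the stored tweets.
FIELDS = ["id", "from_user", "created_at", "text"]


def has_hashtag(text, tag):
    """True if text contains #tag as a complete hashtag, ignoring case."""
    pattern = "#" + re.escape(tag.lstrip("#")) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def export_csv(tweets, tag, out):
    """Write all tweets containing #tag as CSV to the file-like object out.

    Returns the number of exported tweets.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for tweet in tweets:
        if has_hashtag(tweet["text"], tag):
            writer.writerow(tweet)
            count += 1
    return count


if __name__ == "__main__":
    archive = [
        {"id": 1, "from_user": "someuser", "created_at": "2011-02-23", "text": "Slides are up #GOR11"},
        {"id": 2, "from_user": "otheruser", "created_at": "2011-02-23", "text": "unrelated"},
    ]
    buffer = io.StringIO()
    print(export_csv(archive, "gor11", buffer))
    print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open(path, "w", newline=""), as the csv module documentation recommends. The word boundary in has_hashtag() keeps a query for #gor11 from also matching #gor111.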
