THATCampCologne: lessons learned

On September 24, 2010, in Events, by cornelius

THATCampCologne 2010: ...and done!

Update: here are the (few) photos I took. Let me know if you come across more. For notes on the five sessions we did (in German) see here: 1, 2, 3, 4, 5. Thanks, Anja!

With THATCampCologne now a week behind us, I thought I’d provide a brief report of what happened, what we learned, and what we’ll do next. I’ll do this in English to give campers in the U.S. and elsewhere a very general idea of what the first camp in Germany was like. We’ll probably also write a report in German with a focus on a more local audience. As we learned, language is a key issue when planning a camp, a point that I’ll return to.

First a little history. The original thought of organizing a barcamp on technology and humanities issues in Germany came from Robert Forkel from the Max Planck Digital Library in Munich during a conversation on the rooftop of the MPDL’s office in Amalienstrasse. I met Robert in 2008 at the first Public Knowledge Conference in Vancouver and we’ve been in touch since. He’s a mathematician by training, but he’s been involved in projects related to language and e-infrastructures in one way or another for several years now (see here for more on that). He developed the Web version of WALS, which I consider a great example of a virtual research environment for language typologists. Robert couldn’t be in Cologne in the end because of a sports injury, but I’m confident that we’ll get him to do a bootcamp session next year (Python, anyone?).

Robert and I liked the idea of doing a camp, but didn’t really know where to do it. The MPDL and Düsseldorf University didn’t really have an official stake in the Digital Humanities and we were both unsure whether they would be enthusiastic about this kind of event.

At some point this Spring I was in Cologne at the Hochschulbibliothekszentrum NRW (hbz) where I was working on the Open Access platform. I had lunch with a bunch of guys and luckily for me, one of them was Patrick Sahle. Patrick is a lecturer at the Cologne Center for eHumanities and also a member of the (to me) somewhat mystical Institut für Dokumentologie und Editorik (Institute for Documentology and Scholarly Editing, IDE). A bunch of pretty cool people are members of the IDE, which is a virtual organization and somewhat of a DH think tank from what I gather. They also do some consulting, so you know who to turn to with all your geeky, TEI-related, edition-2.0-needs.

I pitched my idea of a technology and humanities camp to Patrick over a pizza. He loved it and, despite having a ton of other projects on his plate, he promoted the idea in Cologne and convinced the CCeH to sponsor us. In the end, Patrick did most of the heavy organizational lifting in terms of booking rooms, printing posters, etc. We were both very busy with other stuff in the run-up to the camp and I think in the week before it happened we were both starting to panic a little bit. Fewer than 30 people had registered through the website and we were unsure how to handle an event with a bunch of strangers and no concise program.

So, how did it turn out? In one word, it was awesome.

Awesome as in if you get the right mix of people together you’ll end up with so many ideas that two days are actually not enough to talk about all of them. And there’s really not much you need to do in your role as organizer other than step aside and let people do their thing. The less you try to control it, the better things work. We didn’t produce a manifesto in the end like they did in Paris, and compared to THATCamp Prime or THATCamp London it was probably slower and more relaxed. About 30 people from a range of disciplines (musicology, ethnography, linguistics, history, classics, oriental studies, library & info sciences, computer science) and academic levels (students, PhDs, postdocs, one full professor) attended. Topics for sessions included XML/TEI, semantic web, visualization, corpus linguistics, language documentation, blogs in teaching, pulling bits of information from Wikipedia and a short presentation on a multi-touchscreen table for scholarly editing that is being developed at the RWTH Aachen. Here are the notes I jotted down during the final brainstorming session:

Bad things
1. Registration unclear/confusing. We never actually told people that they had been accepted after they registered. Duh.
2. Very important: We should have done the whole thing in German from the start. The local DH scene communicates primarily in German; having to write an abstract in English was considered work, not fun.
3. Too little activity on the blog/Twitter in the weeks before the camp
4. Posters should have been mailed out earlier.
5. New, (partly) unfamiliar concept of a bar camp (but people liked that, those who attended were sold on the concept).
6. We didn’t talk to people on a local level (at universities in the area) enough.
7. Too conference-like, ban Powerpoint next time (it was allowed).
8. We didn’t have any coffee and the first day took place in a room with no windows. I’m serious. Yeah, **that’s** how much people liked it.

Good things
1. Openness, interdisciplinarity.
2. An opportunity for people with different levels of experience to mingle.
3. Meet people and learn about their work/projects.
4. Pick up little, useful things (R snippet, Wikipedia code).
5. Debate issues.
6. Yak.
7. Hack.
8. Great pretzels.

Next time
- Boot Camp! (or as a separate event?)
- break up into small groups
- do challenges, problem solving/hacking

I know the event was somewhat of a black box in terms of Twitter coverage, but I think that was mostly because we were so busy chatting in person that we didn’t really find the time to tweet. Patrick should have a list of the participants and we’ll get in touch with everyone once we know what’ll happen next. I’m hoping for a kind of Spring/Fall cycle of camps in Berlin and Cologne, as two campers from Berlin have strong DH affiliations. We’ll see. At any rate, it turned out to be a tremendous amount of fun and really motivating.

If anyone out there is considering organizing a THATCamp, my only advice to them is to relax and not sweat it. The stuff we did wrong we’ll do right next time, but what the camp is really about is entirely up to the people who show up and make it (literally) happen. All you can do as organizer is to buy snacks (our campers loved fruit, so consider hitting a farmer’s market), get rooms (ideally *with* windows), wireless Internet and a projector. Oh yeah, and coffee.

The rest, incredibly, will just fall into place by itself.


Post-event Twitter stats for #THATcamp

On May 26, 2010, in data, by cornelius

I thought I’d post an updated version of the simple stats on Twitter activity presented here. The data in the older post was collected before THATcamp took place; the graphs below show the activity during and after the camp.

The tweets I’ve collected are also available here (my own file) and on TwapperKeeper.

Tweets over time (roughly 14th of May to 24th)

Most active users

Most @-messaged users

Most retweeted users
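For anyone who wants to reproduce this kind of count on their own archive, here is a minimal sketch in base R. It is not the script behind the graphs above. The column names (from_user, text) and the sample tweets are assumptions made up for illustration, and retweets are detected only by the old-style "RT @user" prefix.

```r
# Hypothetical sample data; a real archive would be read from a file instead.
tweets <- data.frame(
  from_user = c("anna", "ben", "anna", "cem"),
  text = c(
    "Great session on TEI #thatcamp",
    "RT @anna: Great session on TEI #thatcamp",
    "@ben thanks for the retweet!",
    "RT @anna: Great session on TEI #thatcamp"
  ),
  stringsAsFactors = FALSE
)

# Most active users: number of tweets per sender.
active <- sort(table(tweets$from_user), decreasing = TRUE)

# Most @-messaged users: tweets that start with "@user".
is.reply <- grepl("^@[A-Za-z0-9_]+", tweets$text)
replied.to <- sub("^@([A-Za-z0-9_]+).*$", "\\1", tweets$text[is.reply])
messaged <- sort(table(replied.to), decreasing = TRUE)

# Most retweeted users: tweets that start with "RT @user".
is.rt <- grepl("^RT @[A-Za-z0-9_]+", tweets$text)
retweeted.user <- sub("^RT @([A-Za-z0-9_]+).*$", "\\1", tweets$text[is.rt])
retweeted <- sort(table(retweeted.user), decreasing = TRUE)

print(active)
print(messaged)
print(retweeted)
```

Each of the three tables can be passed to barplot() to get a chart along the lines of the ones listed above.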


URLs tweeted at #THATCamp (all 230 of them)

On May 24, 2010, in data, by cornelius

I’ve data-mined the #thatcamp hashtag a bit more and extracted all 230 links that were tweeted recently (also includes some of THATCamp Paris). Enjoy :-)

(or go here to view the table inside Google Docs)
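Pulling links out of tweet text can be sketched in a few lines of base R. This is an illustration under assumptions, not the code used for the table: the sample tweets and example.org URLs are invented, and the pattern simply treats anything from "http://" or "https://" up to the next whitespace as a URL.

```r
# Hypothetical tweets standing in for the collected #thatcamp archive.
tweets <- c(
  "Notes from the visualization session http://example.org/notes #thatcamp",
  "RT @someone: slides at http://example.org/slides and http://example.org/notes",
  "No link in this one #thatcamp"
)

# A deliberately simple URL pattern: scheme plus everything up to whitespace.
url.pattern <- "https?://[^[:space:]]+"

# gregexpr() finds every match per tweet; regmatches() extracts the text.
urls <- unlist(regmatches(tweets, gregexpr(url.pattern, tweets)))

# Count how often each link was tweeted, most frequent first.
url.counts <- sort(table(urls), decreasing = TRUE)
print(url.counts)
```

A pattern this loose will also pick up trailing punctuation attached to a link, so real data would likely need some cleanup afterwards.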


Edit: I’ve posted an updated version of the script here. It is not quite as compressed as Anatol’s version, but I think it’s a decent compromise between readability and efficiency. :-)

Edit #2: And yet another update, this one contributed by Kai Heinrich.

I hacked together some code for R last night to visualize a Twitter graph (=who you are following and who is following you) that I briefly showed at the session on visualizing text today at THATCamp and that I wanted to share. My comments in the code are very basic and there is much to improve, but in the spirit of “release early, release often”, I think it’s better to get it out there right away.


Note that packages are most easily installed with the install.packages() function inside of R, so R is really the only thing you need to download initially.


# Load twitteR package

library(twitteR)

# Load igraph package

library(igraph)

# Set up friends and followers as vectors. This, along with some stuff below, is not really necessary, but the result of my relative inability to deal with the twitter user object in an elegant way. I'm hopeful that I will figure out a way of shortening this in the future

friends <- as.character()
followers <- as.character()

# Start a Twitter session. Note that the user through whom the session is started doesn't have to be the one that you search for in the next step. I'm using myself (coffee001) in the code below, but you could authenticate with your username and then search for somebody else.

sess <- initSession('coffee001', 'mypassword')

# Retrieve a maximum of 500 friends for user 'coffee001'.

friends.object <- userFriends('coffee001', n=500, sess)

# Retrieve a maximum of 500 followers for 'coffee001'. Note that retrieving many/all of your followers will create a very busy graph, so if you are experimenting it's better to start with a small number of people (I used 25 for the graph below).

followers.object <- userFollowers('coffee001', n=500, sess)

# This code is necessary at the moment, but only because I don't know how to slice just the "name" field for friends and followers from the list of user objects that twitteR retrieves. I am 100% sure there is an alternative to looping over the objects, I just haven't found it yet. Let me know if you do...

for (i in seq_along(friends.object))
  friends <- c(friends, friends.object[[i]]@name)

for (i in seq_along(followers.object))
  followers <- c(followers, followers.object[[i]]@name)

# Create data frames that relate friends and followers to the user you search for and merge them.

relations.1 <- data.frame(User='Cornelius', Follower=friends)
relations.2 <- data.frame(User=followers, Follower='Cornelius')
relations <- merge(relations.1, relations.2, all=T)

# Create graph from relations.

g <- graph.data.frame(relations, directed = T)

# Assign labels to the graph (=people's names)

V(g)$label <- V(g)$name

# Plot the graph.

plot(g)

For the screenshot below I've used the tkplot() method instead of plot(), which allows you to move around and highlight elements interactively with the mouse after plotting them. The graph only shows 20 people in order to keep the complexity manageable.
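On the question raised in the code comments about avoiding the loops: one possible alternative is vapply(), which applies a function to every element of a list and returns a plain vector. The sketch below is an assumption-laden illustration rather than tested twitteR code. MockUser is a made-up S4 class standing in for the user objects (it is not part of twitteR), and the names are invented so that the example runs without a Twitter session.

```r
library(methods)  # provides setClass() and new() when run via Rscript

# Hypothetical stand-in for a twitteR user object; only the "name" slot matters.
setClass("MockUser", representation(name = "character"))

friends.object <- list(
  new("MockUser", name = "alice"),
  new("MockUser", name = "bob"),
  new("MockUser", name = "carol")
)
followers.object <- list(
  new("MockUser", name = "bob"),
  new("MockUser", name = "dave")
)

# vapply() pulls the slot from each object and guarantees a character vector,
# so the empty starting vectors and the for loops are not needed.
friends <- vapply(friends.object, function(u) u@name, character(1))
followers <- vapply(followers.object, function(u) u@name, character(1))

# Stack the two edge lists: first column is the source, second the target.
relations <- rbind(
  data.frame(User = "Cornelius", Follower = friends, stringsAsFactors = FALSE),
  data.frame(User = followers, Follower = "Cornelius", stringsAsFactors = FALSE)
)
print(relations)
```

The resulting data frame has the same two-column shape as the merged relations above, so it should be usable as input for the graph step in the same way.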


Visualizing text: theory and practice

On May 19, 2010, in Thoughts, by cornelius

Note: I’ve also posted this on

Bad, bad me: of course I’ve been putting off writing up my ideas and thoughts for THATcamp until almost the last possible moment. Waiting so long has one definite advantage though: I get to point to some of the interesting suggestions that have already been posted here and (hopefully) add to them.

I’d like to both discuss and do text visualization. Charts, maps, infographics and other forms of visualization are becoming increasingly popular as we are faced with large quantities of textual data from a variety of sources. To linguists and literary scholars, visualizing texts can (among other things) be interesting to uncover things about language as such (corpus linguistics) and about individual texts and their authors (narratology, stylometrics, authorship attribution), while to a wide range of other disciplines the things that can be inferred from visualization (social change, spreading of cultural memes) beyond the text itself can be interesting.

What can we potentially visualize? This may seem to be a naive question, but I believe that only by trying out virtually everything we can think of (distribution of letters, words, word classes, n-grams, paragraphs, …; patterning of narrative strands, structure of dialog, occurrence of specific rhetorical devices; references to places, people, points in time…; emotive expressions, abstract verbs, dream sequences… you name it) can we reach conclusions about what (if anything!) these things might mean.
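To make a few of these counts concrete, here is a minimal base R sketch that tabulates letters, words and bigrams (2-grams). The sample sentence and the crude tokenization (lowercase, split on anything that is not a-z) are assumptions for illustration only; real texts would need more careful handling.

```r
# Hypothetical sample text.
text <- "the cat sat on the mat and the dog sat on the log"

# Crude tokenization: lowercase, split on non-letters, drop empty strings.
words <- strsplit(tolower(text), "[^a-z]+")[[1]]
words <- words[words != ""]

# Distribution of words.
word.freq <- sort(table(words), decreasing = TRUE)

# Distribution of bigrams: pair each word with its successor.
bigrams <- paste(head(words, -1), tail(words, -1))
bigram.freq <- sort(table(bigrams), decreasing = TRUE)

# Distribution of letters.
letters.only <- strsplit(gsub("[^a-z]", "", tolower(text)), "")[[1]]
letter.freq <- sort(table(letters.only), decreasing = TRUE)

print(word.freq)
print(bigram.freq)

# Only draw when running interactively.
if (interactive()) barplot(word.freq, las = 2, main = "Word frequencies")
```

The same pattern extends to longer n-grams by pasting together more shifted copies of the word vector.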

How can we visualize text? If we consider for a moment how we mostly visualize text today it quickly becomes apparent that there is much more we could be doing. Bar plots, line graphs and pie charts are largely instruments for quantification, yet very often quantitative relations between elements aren’t our only concern when studying text. Word clouds add plasticity, yet they eliminate the sequential patterning of a text and thus do not represent its rhetorical development from beginning to end. Trees and maps are interesting in this regard, but by and large we hardly utilize the full potential of visualization as a form of analysis, for example by using lines, shapes, color (!) and beyond that, movement (video) in a way that suits the kind of data we are dealing with.
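One simple way to keep the sequential patterning that word clouds throw away is a dispersion-style plot, which marks where in the text a word occurs. The sketch below is a minimal illustration in base R; the sample sentence, the target word and the tokenization are all assumptions.

```r
# Hypothetical sample text and target word.
text <- "the cat sat on the mat and the dog sat on the log"
target <- "the"

# Crude tokenization: lowercase, split on non-letters, drop empty strings.
words <- strsplit(tolower(text), "[^a-z]+")[[1]]
words <- words[words != ""]

# Token positions at which the target word occurs, in reading order.
positions <- which(words == target)

# The same positions as a share of the text length (0 = start, 1 = end).
relative <- positions / length(words)
print(positions)

# Only draw when running interactively: one vertical tick per occurrence.
if (interactive()) {
  plot(positions, rep(1, length(positions)), type = "h",
       xlim = c(1, length(words)), ylim = c(0, 1), yaxt = "n",
       xlab = "Position in text (tokens)", ylab = "",
       main = paste("Occurrences of", target))
}
```

Stacking several such strips, one per word or character name, gives a rough picture of how a text develops from beginning to end.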

What tools can we use to do visualization? I’m very interested in Processing and have played with it, and I have worked more extensively with R and NLTK/Python. Tools for rendering data, such as Google Chart Tools, igraph and RGraph, are also interesting. Other, non-statistical tools are also an option: freehand drawing tools and web-based services like Many Eyes. Visualization doesn’t need to be restricted to computation/statistics. Stephanie Posavec’s trees are a dynamic mix of automation and manual annotation and demonstrate that visualizations are rhetorically powerful interpretations themselves.

I hope that some of the abovementioned things connect to other THATcampers’ ideas, e.g. Lincoln Mullen’s post on mining scarce sources and Bill Ferster’s post on teaching using visualization.

Don’t get me started on the potential for teaching. Ultimately translating a text into another form is a unique kind of critical engagement: you’re uncovering, interpreting and making an argument all at once, both to the text in question and to yourself.

Anyway — anything from discussing theoretical issues of visualization to sharing code snippets would fit into this session and I’m looking forward to hearing other campers’ thoughts and experiences on the subject.
