A new, simpler approach to Twitter visualization

If you’ve been following my work recently, you might have noticed my interest in (or rather, slight obsession with) visualization, especially in relation to communication on Twitter. I’ve been experimenting both with graphs and with traditional bar and pie charts to show what happens when people use Twitter.

Now I’ve tried something new, somewhat inspired by an essay on info visualization recently published by Lev Manovich. In it, Manovich describes an approach that he calls direct visualization:

In direct visualization, the data is reorganized into a new visual representation that preserves its original form. Usually, this does involve some data transformation such as changing data size. For instance, text cloud reduces the size of text to a small number of most frequently used words. However, this is a reduction that is quantitative rather than qualitative. We don’t substitute media objects by new objects (i.e. graphical primitives typically used in infovis), which only communicate selected properties of these objects (for instance, bars of different lengths representing word frequencies). My phrase “visualization without reduction” refers to this preservation of a much richer set of properties of data objects when we create visualizations directly from them.

Applying this idea to the Twitter data I work with, I decided to try something new. Instead of reducing the richness of the data, why not rearrange it to make it more readable? And here’s the result of my attempt to do that:

All tweets using the #MLA09 hashtag in one large PDF

(Note: download the PDF and look at it in your favorite PDF viewer if the zooming in Scribd is sluggish)
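
To make the idea of rearranging rather than reducing a little more concrete, here is a minimal Python sketch. It is not the process used to produce the PDF above. The sample tweets, the field names (`time`, `user`, `text`) and the two helper functions are all hypothetical and only illustrate the principle: tweets are grouped by the hour in which they were posted, but every tweet keeps its full text instead of being collapsed into a count or a bar.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample records; a real run would load the full hashtag archive.
tweets = [
    {"time": "2009-12-28 09:05", "user": "alice", "text": "Heading to the first panel #MLA09"},
    {"time": "2009-12-28 10:15", "user": "bob", "text": "Great discussion on digital humanities #MLA09"},
    {"time": "2009-12-28 09:40", "user": "carol", "text": "Is there wifi in the main hall? #MLA09"},
]


def arrange_by_hour(records):
    """Group tweets by posting hour, keeping every tweet intact."""
    buckets = defaultdict(list)
    # The timestamp format sorts chronologically as a plain string.
    for tweet in sorted(records, key=lambda t: t["time"]):
        posted = datetime.strptime(tweet["time"], "%Y-%m-%d %H:%M")
        buckets[posted.strftime("%Y-%m-%d %H:00")].append(tweet)
    return dict(buckets)


def render(buckets):
    """Lay the groups out as indented plain text, one tweet per line."""
    lines = []
    for hour, items in buckets.items():
        lines.append(hour)
        for tweet in items:
            lines.append(f"  @{tweet['user']}: {tweet['text']}")
    return "\n".join(lines)


layout = arrange_by_hour(tweets)
print(render(layout))
```

The only transformation here is the ordering and grouping. Nothing is replaced by a graphical primitive, which is the point of the direct visualization approach described in the quote.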

Note: The title of this post is somewhat misleading — the paper in question appears to be the most widely cited paper in Language, not necessarily in linguistics.

Mark Dingemanse has posted an interesting analysis of the LSA’s recent survey for their anthology of Language. From the survey:

For each volume of the Anthology, we are seeking input on those articles which represent the best scholarship published during that particular period. By “best,” we mean the most influential, the most cited, the most visited in JSTOR, and those considered a must-read for students and scholars of the discipline.

Mark has put together a spreadsheet showing the ranking of six popular Language articles in terms of how often they are viewed on JSTOR and added citation information from Google Scholar to the ranking. The results are interesting for several reasons and I wanted to briefly remark on them.

Note: have a look at Mark’s spreadsheet for more detail than is visible in the chart above.

Mark points out that the 1974 paper A Simplest Systematics for the Organization of Turn-Taking for Conversation by Sacks et al. has a remarkable lead when it comes to the number of citations. The first thing to consider, in my view, is that Google Scholar is not entirely accurate (see this; there are other, more recent studies showing that attribution problems persist). Google Scholar’s greatest sin in the eyes of most librarians is that it largely ignores metadata and instead resorts to text mining approaches to determine things such as author and publication name. I use GS regularly and it’s a fantastic resource, but its citation counts should be taken with a grain of salt, to put it mildly.

My other argument is something I cannot fully back up, but it seems very plausible to me: Sacks et al. is much more accessible than other highly ranked papers.

Accessible in what sense?

  1. The topic of the paper makes it relevant to scholars in other disciplines. The clear, non-technical and theory-agnostic title adds to this. People find papers via search and you can’t search for terms you aren’t familiar with.
  2. There are multiple open access PDF copies of the paper available that one can download without access to JSTOR (here, here and here — note that two copies are stored on the websites of a language and social interaction program and a computer science department).
  3. The paper is cited in the Wikipedia article on conversation analysis (in fact, this is the highest ranked Google hit when searching for the exact name of the article, even before the JSTOR page).

If you compare this with the other top-ranked papers you’ll come to the conclusion that

  1. their subject and scope and how it is reflected in the exact wording of the title makes them less relevant to other disciplines,
  2. they aren’t accessible except through JSTOR,
  3. they aren’t referenced in Wikipedia (because they aren’t accessible).

Of course my argument is somewhat skewed if we assume that both the citation figures and the numbers from JSTOR might not be entirely accurate. The #2 paper in JSTOR (Curtiss et al.) is likely to have a large number of views because it pops up as the #1 search result when searching for Genie on JSTOR, because it is fairly ambiguous, and because it is related to a spectacular and tragic incident.

Do linguists (and scholars in general) take the second and third arguments into account? My impression is that they don’t, at least not enough. Even the Language Anthology will not be openly accessible, although many popular texts are de facto available, whether legally or not (e.g. Chomsky’s review of Skinner). We should aim to make more of our research, past and present, both technically and legally available on the Internet. This will benefit colleagues from other fields, the general public, and ultimately linguistics as a discipline.

Blog recommendation: Digiom (Jana Herwig)

At the top of my list of things I have long meant to blog about is a pointer to the very readable blog of Jana Herwig, alias Digiom. Anyone interested in media studies topics with a side of (Austrian) politics and (international) pop culture should at least take a look.

At some point over the last two years I stopped both actively (regularly) reading blogs and blogging “properly” myself, in the sense of discursively, with planning, as part of a conversation, with plenty of attention and leisure (see CorpBlawg for an impression of my more active phase). I gave up because there is simply too much information, too many interesting topics and conversations, and I keep getting lost in this Internet with its oversupply, which of course is not really an oversupply “in toto” but only overwhelms us individually (or possibly only me, Frank Schirrmacher aside). Other pretextual reasons are lack of time (do I really have less time than I used to?) and laziness; the most important one, however, is the realization that in academic vocabulary blogging is still not synonymous with publishing. Over the last few years my view has inexorably narrowed, to particular topics on the one hand and to publications that “pay off” (that is, extend the publication list) on the other. Why that is so, and whether colleagues experience the same thing (it is striking how many people stop blogging at some point after finishing their doctorate), is a topic for another post. Perhaps it all comes down to the fact that scholarship is not only a vocation but also a profession, and that one increasingly runs the risk of thinking primarily in strategic terms.

Fortunately, not everyone is so easily overwhelmed, as Jana shows, which presumably also has to do with her strategy of reading, organizing and commenting. This approach, very much native to blogging, is still a gain for the reader even (and perhaps especially) in times of Twitter and other bite-sized formats, above all when one does not confine oneself to merely relaying news items (see e.g. this post).

We’ll see, maybe I’ll start doing that again myself sometime soon.

