Two independent trajectories have prompted me to think about data literacy and its relevance lately. I’ll focus specifically on social data in the rest of this post, i.e. the information we generate on Facebook and similar services, though I think there are cases where these ideas may apply to other kinds of data as well.
In late February I attended the Cognitive Cities Conference, an event about the digital future of urbanity. Many presentations at CoCities incorporated statistics and flashy visualizations (traffic patterns, the journey of household trash to a landfill), and the importance of data was a recurring theme. It seemed to me that the speakers were slightly uneasy in the face of the huge projection (which showed a colorful rendition of the presenter’s face at the beginning of each talk) and the ultramodern, Arduino-lit installation on the podium, activated by the speaker’s voice. Awe of such digital embellishments was mixed with embarrassment: Please, I’m not nearly as cool as that thing makes me look, many speakers seemed to say. Their reaction reflected a lingering awareness of the risks of uncritical techno-fetishism, and for me that awareness characterized the event. The digital future of cities, it became clear over the course of the two-day conference, will be intricately linked to our own future. Will we be smart mobs (or, even better, smart individuals), or dumb blobs of data, waiting to be mined by companies and government bureaucracies? Will we program or be programmed?
One commentator aptly pointed out that a visualization of bike travel patterns in New York City didn’t really reveal anything a local wouldn’t know without rendering a graph, but the futurists were undeterred, and believe me when I say that I totally get why. All the data we generate, whether it means something or not, can be analyzed, mined, visualized and repackaged in sophisticated rhetorical pastiches that blur the boundary between information and art. Data is being used to sell products, frame political statements and make scientific arguments. It is used to secure insane valuations from investors, valuations ultimately based on the assumption that in the digital future, human behavior will be predictable in ways previously unimaginable. If code is law, data is capital.
The persuasiveness of digital data stems from its degree of abstraction. A visualization of a dataset is a Russian doll of abstraction: it is an interpretation based on implicit assumptions (What is highlighted? What is left out?), built on something (data) that itself has a fluid and subjective relation to the world (What are friends on Facebook? Is there any relation between real friends and Facebook friends?). The raison d’être of social data is that something or someone external to us has generated it, which makes it seem like superior evidence to our personal intuitions. But the frame in which the behavior the data purports to describe takes place also conditions the options available. The existence of a relationship status field makes the question of whether 500 million people are single or in a relationship (and whether their relationship is complicated) a public issue. By asking the question, you’re conditioning the answer.
A second trajectory is the work on Twitter hashtag datasets that we do in Düsseldorf as part of the Junior Researchers Group “Science and the Internet”. We’ve been using graph analysis and other procedures to figure out who is talking to whom and what’s being retweeted. The recent shutdown of TwapperKeeper has forced us to find our own custom solution for archiving tweets. While looking for a fix, I discovered Amazon AWS and experimented with cloud-based data collection. I was up until 4am last night because I was so fascinated by the ability to launch a highly customized virtual server at the click of a button. Geeky as that may be, virtualization really empowers developers. It used to be that you needed access to a physical server for this kind of data collection: perhaps an old machine sitting in your office running 24/7 or, if you were a bit more professional, a machine provided by your university’s computing services. Or you could rent a commercial server, assuming you could afford it. But you couldn’t just click “launch instance”. You had to handle your resources carefully.
Not anymore. Not only is “web space” cheap or free (that happened a few years ago), but virtual computing power has become a commodity that you can use flexibly to do whatever you need to get done: collect data, run complex computations, anything. The one barrier that remains between the individual and this kind of digital self-empowerment is data literacy (at least in the connected world, which is by no means everywhere). It is hard to imagine a future in which the data-literate will not have a significant advantage over those who aren’t, because this barrier is unlikely to disappear as rapidly as the economic hurdles have.
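To give a rough idea of what the “who is talking to whom and what’s being retweeted” kind of graph analysis involves at its most basic, here is a minimal, illustrative sketch (not the group’s actual pipeline). It assumes each archived tweet is a simple `(author, text)` pair, identifies retweets by the old “RT @user” text convention, and treats every other `@handle` as a mention; the function names `build_graphs` and `in_degree` are hypothetical.

```python
import re
from collections import Counter

# Assumption: tweets are (author, text) pairs; retweets follow the "RT @user" convention.
RETWEET = re.compile(r"^RT @(\w+)")
MENTION = re.compile(r"@(\w+)")


def build_graphs(tweets):
    """Return (mentions, retweets): weighted directed edges keyed by (source, target)."""
    mentions = Counter()
    retweets = Counter()
    for author, text in tweets:
        author = author.lower()  # handles are case-insensitive
        rt = RETWEET.match(text)
        if rt:
            # author retweeted rt.group(1); scan only the rest of the text for mentions
            retweets[(author, rt.group(1).lower())] += 1
            body = text[rt.end():]
        else:
            body = text
        for handle in MENTION.findall(body):
            handle = handle.lower()
            if handle != author:  # ignore self-mentions
                mentions[(author, handle)] += 1
    return mentions, retweets


def in_degree(edges):
    """Sum edge weights per target, i.e. how often each user is addressed or retweeted."""
    totals = Counter()
    for (_, target), weight in edges.items():
        totals[target] += weight
    return totals


sample = [
    ("alice", "RT @bob: new dataset on #cocities"),
    ("carol", "@alice @bob see you in Düsseldorf"),
    ("bob", "thanks @carol!"),
    ("dave", "RT @bob: new dataset on #cocities"),
]
mentions, retweets = build_graphs(sample)
print(in_degree(retweets).most_common(1))  # [('bob', 2)]
```

Edge counts like these are only a starting point; they could be loaded into a graph library such as NetworkX for centrality or community measures. Note how much interpretation is already baked in: the choice to count a retweet as a different kind of relation than a mention is exactly the sort of implicit assumption discussed above.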
My take on this is not entirely positive. The increasing semantification of digital information and the ubiquity of data make arguments based on data and communicated via visualizations increasingly popular. Data-based argumentation can be deceitful or built on false premises, just like any other form of rhetoric. Data literacy must therefore be concerned not only with the technical dimension of data usage, but also with critical reflection on the data’s relationship to the world. Add to this questions of ownership (Whose data is it?), control (Is the data being used to make inferences about people without their knowledge?) and trust (Are you dealing with a reliable data source?), and you have a rough sketch of what data literacy might look like.
Should we start teaching this stuff in school, as Adam Greenfield, for example, suggests? Or is data literacy a technocrat’s pipe dream, touted to make something that really concerns only a small group of nerds appear universally relevant?
Are our visualizations the ghosts from outer space that author Warren Ellis conjured in his closing speech at CoCities, phantasms that pretend to signify something, but ultimately mean nothing? Let me know what you think.