How relevant is data literacy?

On March 10, 2011, in Thoughts, by cornelius

Two independent trajectories have prompted me to think about data literacy and its relevance lately. I’ll focus specifically on social data in the rest of this post, i.e. the information we generate on Facebook and similar services, though I think there are cases where these ideas may apply to other kinds of data as well.

In late February I attended the Cognitive Cities Conference, an event about the digital future of urbanity. Many presentations at CoCities incorporated statistics and flashy visualizations (traffic patterns, the journey of household trash to a landfill), and the importance of data was a recurring theme. It seemed to me that the speakers felt slightly uneasy in the face of the huge projection (which showed a colorful rendition of the presenter’s face at the beginning of each talk) and the ultramodern, Arduino-lit installation on the podium, activated by the speaker’s voice. Awe of such digital embellishments was mixed with embarrassment: Please, I’m not nearly as cool as that thing makes me look, many speakers seemed to say. Their reaction reflected a lingering awareness of the risks posed by uncritical techno-fetishism, and for me that awareness characterized the event. The digital future of cities, it became clear in the course of the two-day conference, will be intricately linked to our own future. Will we be smart mobs (or, even better, smart individuals), or dumb blobs of data, waiting to be mined by companies and government bureaucracies? Will we program or be programmed?

Ton Zijlstra speaking at the Cognitive Cities Conference

One commentator aptly pointed out that a visualization of bike travel patterns in New York City didn’t really reveal anything a local wouldn’t know without rendering a graph, but the futurists were undeterred — and believe me when I say that I totally get why. All this data we all generate — whether it means something or not — can be analyzed, mined, visualized and repackaged in sophisticated rhetorical pastiches that blur the boundary between information and art. Data is being used to sell products, frame political statements and make scientific arguments. It is used to get insane valuations from investors, valuations ultimately based on the assumption that in the digital future, human behavior will be predictable in ways previously unimaginable. If code is law, data is capital.

The persuasiveness of digital data owes much to its degree of abstraction. A visualization of a set of data is a Russian doll of abstraction. It’s an interpretation based on implicit assumptions (What is highlighted? What is left out?), and on something (data) that itself has a fluid and subjective relation to the world (What are friends on Facebook? Is there any relation between real friends and Facebook friends?). The raison d’être of social data is that something or someone external to us has generated it, which makes it seem like superior evidence compared to our personal intuitions. But the data can only describe behavior that takes place within a given frame, and that frame conditions the options available. The existence of a relationship status field makes the question of whether 500 million people are single or in a relationship (and whether their relationship is complicated) a public issue. By asking the question you’re conditioning the answer.

Dietmar Offenhuber (MIT) maps immigrant phone call patterns in NYC

A second trajectory is the work on Twitter hashtag datasets we do in Düsseldorf as part of the Junior Researchers Group “Science and the Internet”. We’ve been using graph analysis and other procedures to figure out who is talking to whom and what’s being retweeted. The recent shutdown of TwapperKeeper has forced us to find our own custom solution for archiving tweets. In the process of looking for a fix, I discovered Amazon AWS and experimented with cloud-based data collection. I was up until 4am last night because I was so fascinated by the ability to launch a highly customized virtual server at the click of a button. Geeky as that may be, virtualization really empowers developers. It used to be that you needed access to a physical server for this kind of data collection: perhaps an old machine sitting in your office running 24/7, or, if you were a bit more professional, a machine provided by your university’s computing services. Or you could rent a commercial server, assuming you could afford it. But you couldn’t just click “launch instance”. You had to handle your resources carefully.

Not anymore. Not only is “web space” cheap or free (that happened a few years ago), but virtual computing power has become a commodity that you can use flexibly to do whatever you want to get done: collect data, run complex computations, anything. The one barrier that remains between the individual and this kind of digital self-empowerment is data literacy (at least in the connected world, which is by no means everywhere). It is hard to imagine a future in which the data-literate will not have a significant advantage over those who aren’t, because this barrier is unlikely to disappear as quickly as the economic hurdles have.
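To make “who is talking to whom and what’s being retweeted” a little more concrete, here is a minimal Python sketch of the first step of that kind of graph analysis: turning archived tweets into a list of directed edges. It is not our group’s actual pipeline. It assumes tweets are stored as simple (author, text) pairs and that retweets follow the old-style “RT @user:” convention; the sample tweets and the function names `build_edges` and `most_retweeted` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample data: (author, text) pairs, as a simple archiving script might store them.
tweets = [
    ("alice", "RT @bob: New paper on #scicomm is out"),
    ("carol", "@alice thanks for sharing! #scicomm"),
    ("dave", "RT @bob: New paper on #scicomm is out"),
    ("bob", "Slides from today's talk #scicomm"),
]

MENTION = re.compile(r"@(\w+)")       # any @username in the text
RETWEET = re.compile(r"^RT @(\w+):")  # assumed old-style manual retweet prefix


def build_edges(tweets):
    """Count directed (source, target, kind) edges, where kind is 'retweet' or 'mention'."""
    edges = Counter()
    for author, text in tweets:
        rt = RETWEET.match(text)
        if rt:
            # Treat a retweet as a single edge to the original author; other @-names are ignored.
            edges[(author, rt.group(1), "retweet")] += 1
            continue
        for target in MENTION.findall(text):
            edges[(author, target, "mention")] += 1
    return edges


def most_retweeted(edges):
    """Rank users by how often they were retweeted."""
    counts = Counter()
    for (source, target, kind), n in edges.items():
        if kind == "retweet":
            counts[target] += n
    return counts.most_common()


if __name__ == "__main__":
    edges = build_edges(tweets)
    for (src, dst, kind), n in sorted(edges.items()):
        print(f"{src} -> {dst} ({kind}) x{n}")
    print(most_retweeted(edges))
```

An edge list like this is deliberately simple; it could be loaded into any graph library for centrality or community measures. It also illustrates the point about frames: what counts as a “retweet” or a “conversation” here is decided by two regular expressions, not by the people tweeting.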

My take on this is not entirely positive. The increasing semantification of digital information and the ubiquity of data make arguments based on data and communicated via visualizations increasingly popular. Data-based argumentation can be deceitful or built on false premises, just like any other form of rhetoric. Data literacy must therefore be concerned not only with the technical dimension of data usage, but also with critical reflection on the data’s relationship to the world. Add to this questions of ownership (Whose data is it?), control (Is the data being used to make inferences about people without their knowledge?) and trust (Are you dealing with a reliable data source?) and you have a rough sketch of what data literacy might look like.

Data literacy mind map. What's missing?

Should we start teaching this stuff in school, as for example Adam Greenfield suggests? Or is data literacy a technocrat’s pipe dream, touted in order to make something appear universally relevant that really concerns only a small group of nerds?

Are our visualizations the ghosts from outer space that author Warren Ellis conjured in his closing speech at CoCities, phantasms that pretend to signify something, but ultimately mean nothing? Let me know what you think.

4 Responses to How relevant is data literacy?

  1. Ton Zijlstra says:

    “One commentator aptly pointed out that a visualization of bike travel patterns in New York City didn’t really reveal anything a local wouldn’t know without rendering a graph”

    It was meant as a sort of putdown for data viz, but it’s actually a great thing that the data can indeed show what a local already (intuitively) knows from having lived there for years, which means we can use it to build services / devices that recognize and act upon those patterns (in which case the dataviz giving you information is not the point; what matters is that you can do stuff with it to activate / derive decisions, adding a layer). Another important point for this type of dataviz is that it always contains points that fall outside what a local already knows. These are the weak signals you need to be alert to, as they signal potential changes going on, as well as provide ‘hooks’ for changing the behavior of those same locals.

    All in all the remark that a certain dataviz is instantly recognizable to locals, to me simply means that you’re on the right track.

  2. Ton Zijlstra says:

    Forgot to add: also let’s not forget that if you would indeed ask the locals (e.g. in a survey) they would never be able to give you the same information. The fact that they know things on an intuitive level, and recognize your viz instantly, doesn’t mean they (individually or collectively) would be able to express that knowledge in a useful way.

    Now, combining the two things would be a different matter: the dataviz as substrate, plus the actual stories and details added by locals.

  3. Cornelius says:

    Hey Ton, thanks for stopping by.

    “All in all the remark that a certain dataviz is instantly recognizable to locals, to me simply means that you’re on the right track.”

    I agree that it’s not an indicator that visualization isn’t useful. My impression is that we have a lot of expectations of visualization, some of which may not be fulfilled. In my experience, the question of what the creator is trying to communicate with the visualization plays an important role. Asking a concrete question usually makes for a more interesting vis than just creating a representation of some chunk of data. Of course, what’s interesting is also in the eye of the beholder.

  4. [...] Data literacy mind map. What's missing? How relevant is data literacy? [...]
