Data explosion (part 2): How we digitize the world and its implications

29.09.2010 | Christian Kreutz

In my last post I listed the many ways we digitize our environment. Text, voice and image recognition, and mobile data collection are only a few possible methods to bridge the on- and offline world, as the Swiftly blog rightly pointed out. So here are some more methods and some reflections on their implications.

Internet of Things

A rather old concept is becoming increasingly real thanks to RFID technology. I have previously blogged about its potential and consequences for development. RFID chips can be attached as stickers to objects, which can then disclose or collect information. Over at the ReadWriteWeb blog there is a rather funny example of a social tennis racket, which could tell its story – where, with whom and how it was played – with the help of such technology. The logistics sector is using these technologies on a wide scale to track packages, and soon millions more objects will be connected to the Internet in the same way.

Location based information

Location based services have been around for a while, but only lately have they actually been taken up. With services such as Foursquare or Gowalla we send not only our location, but also lots of additional information about what is around that location. This is an amazing business concept, in which users collect, "for free", huge piles of all sorts of information:

Location of bars, restaurants, shops, clubs
Evaluations of those places, whenever a user sends statuses and comments
Massive social profiles of movements and behaviors.

No surprise that Twitter and Facebook have started to send their users updates from specific locations. I wrote a while ago about how you can see, using the example of Google Buzz, that such services are used around the world, even in countries you would not imagine.

GPS is included in more and more devices, such as cars, mobile phones, bicycles and even washing powder (!).

World of sensors

Another way to get huge amounts of data is through sensors, which measure all sorts of factors in our environment. The idea is that low-cost sensors will soon be available to measure, for example, noise, air quality or one’s physical condition. Such sensors can be RFID chips, but can also go even further. They could be included in a watch or mobile phone; this way millions of people can deliver real time information. It sounds like science fiction, but there are already some crowdsourcing projects "using humans as sensors". Citypulse wants to measure air quality through contributions from pedestrians. And with a smart phone it is easy to join a project to measure the noise level worldwide.

The SENSEable City Lab at MIT is using such methods. One example is the "Trash Track" project, in which 3000 sensors were added to trash bags to analyze the different ways pieces of trash are taken through the disposal chain.

Implications

The list can easily be extended; please add further methods in the comment section. But why have I written this list? Because I want to describe how pervasive the process has become, and that it has had far reaching consequences which not everybody is aware of.

First, on the positive side, these means extend the richness of the data available on the Internet. Second, they can offer more and better information faster. Third, thanks to open source and fairly cheap web services, these tools become available to many more people. And fourth, if the data is offered as open data to everyone, it can help create useful web services. I will elaborate further on user scenarios in upcoming posts.

However, I also have a lot of concerns and questions. What about ownership? Often the lines blur. Who owns the data, and to what extent do I as an individual still control the information about me? Where does this data collection lead in a few years' time, when companies like Foursquare, with millions of social profiles, are in completely different hands? Do we really need to digitize everything that is possibly modifiable?

Michael Gurstein has a great post in which he expresses concerns that open data might only help the people who already have an information advantage (e.g. access, research skills).

What is the sense of collecting all these huge amounts of data? Or is a lot of that data collection nonsense, because the data is of limited use or has little or no meaning? I will elaborate further on these questions in my next posts.

First part of the post.