Data explosion: The many ways to get content online (or how we digitize the world)

9/13/2010 | Christian Kreutz

Just as the fishing industry has found ever more efficient methods to pull fish out of the oceans, we keep finding new ways to digitize information that was previously available only offline. Imagine a massive fishing net that hauls in the biggest fish but empties the oceans. What, then, is the fishing net in this case, and what are the opportunities and consequences of digitizing all this information? Nobody really knows.

Thanks to the Internet, according to Eric Schmidt (Google), we now create about 5 exabytes of information every two days, roughly as much as humankind recorded from the dawn of civilization up to 2003. Governments and companies have traditionally collected information and stored it as digital data. The non-profit sector is increasingly engaging in such efforts, because the technologies have become more widely available, in many cases as open source software.

Text recognition (OCR)

Text recognition software has become very sophisticated and can even read handwritten text. Cloud services such as Evernote scan every uploaded note for text, whether it is a business card, a wine label or any other document. Thanks to such software, one project scanned thousands of United Nations documents and made them much easier to search. Another project, by HP Labs in Bangalore, aims to offer an email service in which messages are written with pen on paper: a photo taken with a mobile phone, combined with text recognition, is meant to make this possible. And where Optical Character Recognition (OCR) fails, we help out ourselves through the well-known reCAPTCHA: each time we solve one, we help decipher words from old scanned books.

Voice recognition

Voice recognition is far from new, but it has improved considerably over the years and has become much easier to access. Google's recent Voice Actions bring it to Android phones, where you can, for example, dictate the text of an email aloud. Ushahidi's text-based incident reporting service can now also be used by voice; they work with Cloudvox, a voice application service. Cloudvox and services such as Twilio make voice applications, once reserved for companies and governments, affordable for organizations with small budgets.

The open source solution Freedom Fone even offers an interactive voice response system, so that, for example, illiterate people can provide information.

Mobile data collection

In recent years, many mobile applications for data collection have been developed, a number of them open source. Half of the world's population now has a mobile phone, and many of these phones collect data, passively or actively. Google, for instance, already collects GPS data for its traffic information from users who have opted in. In remote areas, innovative solutions try to work around the 160-character limit of SMS: in rural Cambodia, a simple paper wheel is used to report critical health information.


More and more cameras, and mobile phones in particular, have GPS built in, and more and more photos are uploaded to the Internet. A world map of Flickr photos gives a first impression: Flickr already has 117,025,830 geotagged items. In a few years, a series of blog posts will be available for most locations around the world, and Google Street View will then "only" cover the streets. Speaking of Google, its Goggles service tries to deliver additional information about physical objects. Take a photo of an object, such as a restaurant menu or a sightseeing spot, and it will return information about it and store the image in a database.

In the coming days, another post will follow with more examples and thoughts on the consequences.