Data that exceeds the processing capacity of conventional database systems.
Too much data
It moves too fast
It's too diverse
Storage, processing speed, and bandwidth are becoming exponentially faster
Networking is expanding exponentially
And you can buy all the pieces - data, infrastructure, processing
source: http://radar.oreilly.com/2011/08/building-data-startups.html
Turn 12 terabytes of tweets/day into improved product sentiment analysis
Convert 350 billion annual meter readings to better predict power consumption
Crunch Facebook recommendations based on your friends' interests
Time-sensitive analysis and decision-making - to catch important events as they happen
When there's too much input data (so toss some) or immediate decisions must be made
Examples:
Scrutinize 5 million trade events/day to identify potential fraud
Analyze 500 million daily call detail records in real time to predict customer churn faster
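One common way to "toss some" input without biasing what is kept is reservoir sampling, which holds a fixed-size uniform random sample of a stream whose length is not known in advance. The sketch below is only an illustration of that idea under stated assumptions; the function name `reservoir_sample` and its parameters are hypothetical and do not come from the slides.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Keep a uniform random sample of up to k items from a stream.

    Hypothetical helper for illustration: uses the classic single-pass
    reservoir technique (often called Algorithm R) and O(k) memory.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: keep 1,000 events out of a simulated stream of 1,000,000.
sample = reservoir_sample(range(1_000_000), 1_000, random.Random(0))
```

The stream is read once and never stored in full, which is the point when input arrives faster than it can be kept. If the events to catch are rare (fraud, for example), uniform sampling may discard most of them, so this suits aggregate statistics better than event detection.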
Not just names/addresses in a customer database
Want to analyze text, sensor data, audio, video, location data, click streams, log files, and anything else that's available
Principle: when you can, keep everything - there might be something useful in what you throw away
Unexpected Consequences
Anonymous AOL searcher isn't (NYT, 8/9/2006)
Anonymous Netflix users aren't, when compared with IMDb database (Wired, 12/13/2007)
For many, browsing history is unique and repeatable (8/1/2012)
Target knows when you're pregnant (NYT, 2/19/2012)
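Cases like these generally involve linking an "anonymized" dataset to a public one through details the two share. The sketch below shows the general idea on toy data; every record, name, and the `link` function are made up for illustration, and this is not the method used in any of the cited cases.

```python
from typing import Dict, Set, Tuple

Event = Tuple[str, str]  # (title, date) pair; a made-up quasi-identifier

# Toy "anonymized" release: pseudonyms, but real activity details.
anonymized: Dict[str, Set[Event]] = {
    "user_17": {("Movie A", "2005-03-01"), ("Movie B", "2005-03-04"), ("Movie C", "2005-04-11")},
    "user_42": {("Movie A", "2005-03-02"), ("Movie D", "2005-05-20")},
}

# Toy public dataset: real names attached to some of the same activity.
public: Dict[str, Set[Event]] = {
    "alice": {("Movie A", "2005-03-01"), ("Movie B", "2005-03-04")},
    "bob": {("Movie D", "2005-05-20"), ("Movie E", "2005-06-01")},
}


def link(
    anonymized: Dict[str, Set[Event]],
    public: Dict[str, Set[Event]],
    min_overlap: int = 2,
) -> Dict[str, str]:
    """Map each pseudonym to the public name sharing the most events.

    A pseudonym is linked only if at least min_overlap events match exactly.
    """
    matches: Dict[str, str] = {}
    for pseudonym, events in anonymized.items():
        best_name, best_overlap = None, 0
        for name, public_events in public.items():
            overlap = len(events & public_events)
            if overlap > best_overlap:
                best_name, best_overlap = name, overlap
        if best_name is not None and best_overlap >= min_overlap:
            matches[pseudonym] = best_name
    return matches


print(link(anonymized, public))  # prints {'user_17': 'alice'}
```

Two matching events are enough to attach a name to `user_17` here, even though the release contained no names. The more varied the data kept per person, the fewer matches are typically needed.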
Lessons to (Re)learn
Copyright Issues
Who owns the data?
Who owns the derivative works?
Combined data?