
Big Data

Lois Mermelstein The Law Office of Lois D. Mermelstein lois@loismermelstein.com 512-222-8589


Cyberspace Law Committee Meeting, August 3, 2012

Ted Claypoole Womble Carlyle tclaypoole@wcsr.com 704-331-4910

What Is Big Data?

Data that exceeds the processing capacity of conventional database systems:
Too much data
It moves too fast
It's too diverse

How'd we get here?

Storage capacity, processing speed, and bandwidth are growing exponentially
Networking is expanding exponentially
And you can buy all the pieces - data, infrastructure, processing

source: http://radar.oreilly.com/2011/08/building-data-startups.html

Crunching Big Data - Volume

Turn 12 terabytes of tweets/day into improved product sentiment analysis
Convert 350 billion annual meter readings to better predict power consumption
Generate Facebook recommendations based on your friends' interests

Crunching Big Data - Velocity

Time-sensitive analysis and decision-making - to catch important events as they happen
When there's too much input data (so toss some) or immediate decisions must be made
Examples:

Scrutinize 5 million trade events/day to identify potential fraud
Analyze 500 million daily call detail records in real time to predict customer churn faster
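
The slides do not say how to "toss some" of the input. One common approach, offered here only as an illustrative assumption, is reservoir sampling: it keeps a fixed-size uniform random sample from a stream whose length is not known in advance, so memory stays bounded however fast events arrive. The function name, the event fields, and the sample size below are hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of at most k items from an iterable.

    Sketch of the classic reservoir sampling algorithm ("Algorithm R").
    """
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i replaces a kept item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Hypothetical stream of trade events; only 1,000 are retained.
events = ({"trade_id": n} for n in range(100_000))
kept = reservoir_sample(events, k=1000, seed=42)
print(len(kept))
```

The trade-off is the one the Variety slide warns about: whatever is tossed cannot be analyzed later, so sampling suits cases where an immediate, approximate answer matters more than completeness.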

Crunching Big Data - Variety

Not just names/addresses in a customer database
Want to analyze text, sensor data, audio, video, location data, click streams, log files, and anything else that's available

Principle: when you can, keep everything - there might be something useful in what you throw away

Unexpected Consequences

Anonymous AOL searcher isn't (NYT, 8/9/2006)
Anonymous Netflix users aren't, when compared with IMDb database (Wired, 12/13/2007)
For many, browsing history is unique and repeatable (8/1/2012)
Target knows when you're pregnant (NYT, 2/19/2012)
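
The Netflix/IMDb item is an example of a linkage attack: an "anonymized" data set is matched against a public one using the attributes they share. The sketch below shows the general idea on invented data. All tokens, names, titles, and the `min_overlap` threshold are hypothetical, and the matching rule is deliberately simpler than what the researchers actually used.

```python
from collections import defaultdict

# Hypothetical "anonymized" release: real user IDs replaced by random tokens.
anonymized = [
    ("u_7f3a", "Movie A", 5), ("u_7f3a", "Movie B", 2),
    ("u_7f3a", "Movie C", 4), ("u_7f3a", "Movie D", 1),
    ("u_91bc", "Movie A", 4), ("u_91bc", "Movie E", 3),
    ("u_91bc", "Movie F", 5),
]

# Hypothetical public reviews posted under recognizable profile names.
public = [
    ("alice_p", "Movie A", 5), ("alice_p", "Movie B", 2),
    ("alice_p", "Movie C", 4),
    ("bob_q", "Movie E", 3), ("bob_q", "Movie G", 2),
]


def build_profiles(rows):
    """Group (who, title, rating) rows into a set of (title, rating) per person."""
    profiles = defaultdict(set)
    for who, title, rating in rows:
        profiles[who].add((title, rating))
    return profiles


def link(anonymized_rows, public_rows, min_overlap=3):
    """Map each anonymous token to the public profile sharing the most ratings."""
    anon_profiles = build_profiles(anonymized_rows)
    public_profiles = build_profiles(public_rows)
    matches = {}
    for token, anon_set in anon_profiles.items():
        best_name, best_overlap = None, 0
        for name, public_set in public_profiles.items():
            overlap = len(anon_set & public_set)
            if overlap > best_overlap:
                best_name, best_overlap = name, overlap
        # Only report a match when enough ratings coincide.
        if best_overlap >= min_overlap:
            matches[token] = best_name
    return matches


print(link(anonymized, public))  # {'u_7f3a': 'alice_p'}
```

Once a token is linked to a name, every other record in the "anonymized" release for that token, including ratings the person never posted publicly, is exposed as well.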

Lessons to (Re)learn

Correlation isn't causation But correlation may be all you need

You can't hide in the crowd

Personally Identifiable Information


PII as a mathematical function
How many points of data do you need?
Pineda v. Williams-Sonoma Stores, Inc. (Cal., Feb. 10, 2011)
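
One way to make "how many points of data do you need?" concrete is to count how many records become unique as fields are combined. The sketch below is only an illustration on made-up records. The field names and the helpers `unique_fraction` and `bits_needed` are assumptions for this example, not a legal or regulatory test.

```python
import math
from collections import Counter

# Hypothetical records; no field identifies a person on its own.
records = [
    {"zip": "78701", "birth_year": 1975, "gender": "F"},
    {"zip": "78701", "birth_year": 1975, "gender": "M"},
    {"zip": "78701", "birth_year": 1980, "gender": "F"},
    {"zip": "28202", "birth_year": 1975, "gender": "F"},
    {"zip": "28202", "birth_year": 1975, "gender": "F"},
    {"zip": "28202", "birth_year": 1962, "gender": "M"},
]


def unique_fraction(rows, fields):
    """Fraction of rows whose combination of `fields` appears exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


def bits_needed(population):
    """Bits of information required to single out one member of a population."""
    return math.log2(population)


for fields in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "gender"]):
    print(fields, round(unique_fraction(records, fields), 2))

# Roughly 33 bits are enough to single out one person among 7 billion.
print(round(bits_needed(7_000_000_000), 1))
```

In this toy set no one is unique by ZIP code alone, a third are unique by ZIP code plus birth year, and two thirds are unique once gender is added. Each extra field narrows the crowd, which is why data that looks harmless field by field can still identify a person in combination.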

HIPAA De-Identified Data


Re-Identifying De-Identified Data

Escaping Regulatory Requirements


Privacy
Fair Credit Reporting
Redlining
Employment Discrimination

Single Transaction Owned By:


Retailer
Wholesale vendor
Manufacturer
Shipping Company
Customer's Bank
Customer's ISP
Retailer's Bank
Merchant Card Processor
Phone company/Hardware/Software

Government Using Big Data


Law Enforcement

Copyright Issues
Who owns the data?
Who owns the derivative works?
Combined data?
