Analytics in Retail
Learn how Intel and Living Naturally* used big data to help a
health store increase sales and reduce inventory carrying costs.
SOLUTION BLUEPRINT
Big Data Analytics in Retail Data
Introduction
The volume, variety, and velocity of data [1] being produced in all
areas of the retail industry are growing exponentially, creating both
challenges and opportunities for those diligently analyzing this data
to gain a competitive advantage. Although retailers have been using
data analytics to generate business intelligence for years, the extreme
composition of today's data necessitates new approaches and tools.
The retail industry has entered the big data era, with
access to more information that can be used to create amazing shopping
experiences and forge tighter connections between customers, brands,
and retailers.
A trail of data follows products as they are manufactured, shipped,
stocked, advertised, purchased, consumed, and talked about by
consumers, all of which can help forward-thinking retailers increase
sales and operations performance. This requires an end-to-end retail
analytics solution capable of analyzing large datasets populated
by retail systems and sensors, enterprise resource planning (ERP),
inventory control, social media, and other sources.
How does one start a big data project? In an attempt to demystify retail
data analytics, this paper chronicles a real-world implementation that is
producing tangible benefits, such as allowing retailers to:
• Increase sales per visit with a deeper understanding of customers'
  purchase patterns.
• Learn about new sales opportunities by identifying unexpected
  trends from social media.
• Improve inventory management with greater visibility into the
  product pipeline.
A set of simple analytics experiments was performed to create
capabilities and a framework for conducting large-scale, distributed
data analytics. These experiments facilitated an understanding of the
edge-to-cloud business analytics value proposition and, at the same
time, provided insight into the technical architecture and integration
needed for implementation.
Table of Contents
Introduction
Unstructured Data
Competitive Advantage
Big Data Technologies
Problem Definition
Data Preparation
Algorithm Definition Process
Social Media Analysis
Physical Infrastructure
Accelerate Big Data Implementations
Figure 1. User Interface for Retail Buyers
(Screen callouts: product, e.g., CASHEWS XLG WHL, UPC 0000000003827, NUTRIENT DEPOT; ordering track record; previous orders; current inventory; next shipment; order amount in shopping cart; recommended reorder quantity; order quantity selector; inventory status; sales, inventory, shrink, and change indicators; significant Twitter* chatter, with a link to tweet details; link to associated products.)
Problem Definition
Before starting a big data project, it is important to have
clarity of purpose, which can be accomplished with the
following steps:
The tool also lets the retail buyer know about associated
products for cashews, recommending the buyer increase
orders of papayas, which customers are more likely to
purchase along with cashews due to another social media-driven
microtrend (Figure 4). In other words, if cashew sales
are about to spike due to social media postings, papaya sales
are likely to increase as well.
Data Type       Data Sources
Social Media    Twitter*, Facebook*
Supply Chain    Orders, Invoices, Shipments, Receipts
Sales           Transactions
Data Preparation
Retail data used for big data analysis typically comes from
multiple sources and different operational systems, and
may contain inconsistencies. In the Intel team's case,
transaction data was supplied by several systems, which
truncated UPC codes or stored them differently due to
special characters in the data field. As a result,
one of the first steps is usually to cleanse, enhance, and normalize
the data.
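As an illustration of the cleansing step, the following minimal Python sketch normalizes UPC codes that arrive with special characters or missing leading zeros. It is not the Intel team's actual procedure: the function name `normalize_upc`, the 13-digit target width (chosen to match the code 0000000003827 shown in Figure 1), and the assumption that truncation dropped only leading zeros are all hypothetical.

```python
import re
from typing import Optional

# Assumed target width; matches the 13-digit code 0000000003827 in Figure 1.
UPC_WIDTH = 13


def normalize_upc(raw: str, width: int = UPC_WIDTH) -> Optional[str]:
    """Return a digits-only, zero-padded UPC, or None if it cannot be repaired."""
    # Drop spaces, dashes, and any other non-digit characters.
    digits = re.sub(r"\D", "", raw or "")
    if not digits or len(digits) > width:
        # Empty or too long: flag for manual review instead of guessing.
        return None
    # Assumption: truncation removed leading zeros only, so pad on the left.
    return digits.zfill(width)


# Hypothetical raw values as they might arrive from different source systems.
records = [" 3827", "0000-0000-03827", "0000000003827", "N/A"]
normalized = [normalize_upc(r) for r in records]
```

Records that come back as `None` would be routed to manual review rather than silently dropped, so that inconsistencies between source systems stay visible.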
Table columns: Analysis Technique, Algorithms, Business Problems, Business Solutions
Analysis techniques listed: Classification, Clustering/Segmentation, Forecasting/Prediction, Sequence Analysis
Directed Advertising: Personalize advertisements (in-store, online) to better attract customer attention.
This type of algorithm translates images/high-dimensional data from the real world to provide
numerical or symbolic information for decision-making.
Social media monitoring (diagram labels):
• Monitor social influencers for health food related endorsements & recommendations
• Select natural foods influencer
• Monitor endorsements & recommendations for viral conditions
• Monitor Living Naturally* products to message
Trend Visualization (chart): Trend, Sales, and Trend Rate plotted by Month (months 1 to 9). Trend Rate is computed as d(Trend)/d(Months).
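The trend rate labeled d(Trend)/d(Months) can be approximated on monthly data as the difference between consecutive months. The sketch below is only one plausible reading of that label. The function name `trend_rate` and the sample values are hypothetical and are not taken from the chart.

```python
def trend_rate(trend):
    """Approximate d(Trend)/d(Months) as the change between consecutive months."""
    return [later - earlier for earlier, later in zip(trend, trend[1:])]


# Hypothetical monthly trend values (for example, counts of relevant tweets).
monthly_trend = [20, 25, 40, 80, 70]

# One rate per month-to-month step, so the result has one fewer entry.
rates = trend_rate(monthly_trend)
```

A sharply positive rate would indicate a trend that is accelerating, which is the kind of signal a buyer might act on before sales figures catch up.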
Analytics framework (diagram labels):
Visualization: Store Manager User Interface; Visualization in D3.js
Analytics: Hive, Pig, Mahout, R-Hadoop
Compute: Map Reduce Programs; NLP Libraries (Tagging & Tokenization, Sentiment Scoring)
Storage: HDFS (Transaction & Raw Twitter Data); HBase (Processed Data); Sqoop (Data Integration)
Data: Twitter* Data (Mongo Export); Cleaned Up and Aggregated Data; Campaigns & Promotions Data; Product Catalog; Store & Retailer Details; Loyalty Management System
Hardware
Social media feature extraction (diagram labels):
Sources: Facebook*, Twitter*, Other Social Media (public chat)
Steps: Tokenize & Tag the Text; Extract Features
Features: Filter by User, Number of Followers, Influencer, Keywords, Sentiment Score, Topics
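The feature-extraction labels in this diagram (tokenize the text, filter by user and number of followers, flag influencers, match keywords, score sentiment) can be sketched in a few lines of Python. This is a toy illustration, not the NLP libraries used in the solution: the follower threshold, the keyword list, the tiny sentiment word lists, and all names are hypothetical, and part-of-speech tagging and topic extraction are left out.

```python
import re
from dataclasses import dataclass

# All thresholds and word lists below are hypothetical placeholders.
MIN_FOLLOWERS = 1000
KEYWORDS = {"cashews", "papaya"}
POSITIVE = {"love", "great", "healthy"}
NEGATIVE = {"bad", "avoid", "recall"}


@dataclass
class Post:
    user: str
    followers: int
    text: str


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(post):
    """Return a feature dictionary for a single post."""
    tokens = tokenize(post.text)
    # Naive lexicon score: positive word count minus negative word count.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return {
        "user": post.user,
        "followers": post.followers,
        "influencer": post.followers >= MIN_FOLLOWERS,
        "keywords": sorted(KEYWORDS.intersection(tokens)),
        "sentiment_score": score,
    }


# Hypothetical sample posts.
posts = [
    Post("@nutrition_fan", 25000, "Love cashews, a great healthy snack!"),
    Post("@casual_user", 40, "Avoid the papaya, it was bad."),
]
features = [extract_features(p) for p in posts]
```

In a real pipeline the word lists would be replaced by a proper sentiment model, and the per-post features would be aggregated per product before being shown to a buyer.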
Supply chain and demand signals (diagram labels): Raw Material; Manufacturer; Disty; Retailer; Customer; Customers; Inventory; Shelf; Quantity; Price; PO & Forecast; PO/Gross; Ship and Inv.; POS & Web; Non-loyalty Transactions; Demand Generation; LG Promo; MFG Coupon; Price Drop; Demand Signals; Twitter*; Web Search; Environment
Chart: Monthly Dollars Spent/Sold (axis $0 to $120,000) and Monthly Inventory Turns % (axis 0% to 60%), by month from 1/30/11 through 8/31/12.
Transaction ID | Milk | Bread | Butter | Beer
Source: http://en.wikipedia.org/wiki/Association_rules
Figure 12. Association Rules Learning Example
Association Rules Mining (diagram): a Transaction Dataset with columns Transaction ID and Product Category (sample IDs 100012345 through 100012350; sample categories Vegetables, Canned Goods, Bread) is mined for rules over product categories such as VEGETABLES, FISH, CHEESE, MEAT, BREAD, CRACKERS, HERBS, MILK, SEAFOOD, and POULTRY.
Example rule: 98% of people who purchased items A and B also purchased item C.
For a Rule X (X implies Y) over n transactions:
\[ \mathrm{Support} = \frac{(X \cup Y).\mathrm{count}}{n} \]
\[ \mathrm{Confidence} = \frac{(X \cup Y).\mathrm{count}}{X.\mathrm{count}} \]
\[ \mathrm{Lift} = \frac{\mathrm{Support}}{\mathrm{Supp}(X) \cdot \mathrm{Supp}(Y)} \]
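The support, confidence, and lift measures can be computed directly from a list of baskets. The following minimal Python sketch reads (X U Y).count as the number of transactions that contain every item in both X and Y. The function name `rule_metrics` and the five sample baskets are hypothetical and are not the project's data.

```python
def rule_metrics(transactions, x, y):
    """Compute support, confidence, and lift for the rule X -> Y."""
    x, y = set(x), set(y)
    n = len(transactions)
    count_x = sum(x <= t for t in transactions)          # baskets containing all of X
    count_y = sum(y <= t for t in transactions)          # baskets containing all of Y
    count_xy = sum((x | y) <= t for t in transactions)   # baskets containing X and Y
    if n == 0 or count_x == 0 or count_y == 0:
        raise ValueError("X and Y must each occur in at least one transaction")
    support = count_xy / n
    confidence = count_xy / count_x
    lift = support / ((count_x / n) * (count_y / n))
    return support, confidence, lift


# Hypothetical baskets, each represented as a set of items.
transactions = [
    {"milk", "bread"},
    {"butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]
support, confidence, lift = rule_metrics(transactions, {"milk", "bread"}, {"butter"})
```

For these sample baskets the rule {milk, bread} -> {butter} has a lift of about 1.25. A lift above 1 suggests the items are bought together more often than independent purchases would predict.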
Association Rules Library
Basket Rules
    items                                                                    support
1   {DRY GROCERY\SHELF STABLE FOODS\CANNED GOODS,
     FRESH\PRODUCE\VEGETABLES}                                               0.005387144
2   {DESERTS,
     DRY GROCERY\SHELF STABLE FOODS\COOKIES CRACKERS AND CRISPBREADS}        0.003611720
3   {DRY GROCERY\SHELF STABLE FOODS\COOKIES CRACKERS AND CRISPBREADS,
     HERBS BULK}                                                             0.007000243
4   {DRY GROCERY\SHELF STABLE FOODS\CANNED GOODS,
     DRY GROCERY\SNACKS\CHIPS AND PRETZELS}                                  0.003363160
5   {DRY GROCERY\SHELF STABLE FOODS\SOUP AND BROTH,
     DRY GROCERY\SNACKS\CHIPS AND PRETZELS}                                  0.0024992898
6   {DRY GROCERY\SNACKS\CHIPS AND PRETZELS,
     FRESH BAKERY\BREAD AND BAKED GOODS\BREAD}                               0.003525485
7   {DRY GROCERY\SHELF STABLE MEAL SOLUTIONS\MEAT POULTRY SEAFOOD,
     FRESH BAKERY\BREAD AND BAKED GOODS\BREAD}                               0.004159565
8   {DRY GROCERY\SHELF STABLE MEAL SOLUTIONS\MEAT POULTRY SEAFOOD,
     DRY GROCERY\SNACKS\CHIPS AND PRETZELS}                                  0.003276926
Physical Infrastructure
The physical implementation architecture of the analytics
framework is shown in Figure 14, including the hardware
configuration, operating systems, and important software
components used in the solution.
Hadoop* Cluster: four Hadoop nodes, each with 2 x Intel Xeon processor @ 2.7 GHz, 24 TB disk space, and 126 GB memory, running Cloudera* Distribution of Hadoop (CDH5)
Network: 10 Gigabit Ethernet
Web Server: 2 x Intel Xeon processor @ 2.4 GHz, 2 TB disk space, and 24 GB memory, hosting the UI and analytics visualization
[1] Gartner analyst Doug Laney introduced the 3Vs concept in a 2001 META Group research publication, 3D Data Management: Controlling Data Volume, Velocity, and Variety. http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
[2] McKinsey & Company, Consumer Marketing Analytics Center, Creating Competitive Advantage from Big Data in Retail, June 2012. http://www.mckinsey.com/client_service/retail/expertise/~/media/mckinsey/dotcom/client_service/retail/articles/cmac_creating_competitive_advantage_from_big_data.
[3] Andrew McAfee and Erik Brynjolfsson, Big Data: The Management Revolution, Harvard Business Review, October 2012.
Copyright 2014 Intel Corporation. All rights reserved. Intel, the Intel logo and Xeon are trademarks of Intel Corporation in the United States and/or other countries.
*Other names and brands may be claimed as the property of others.
0614/MB/TM/PDF
330716-001US