
The Marketer's Dilemma:

Focusing on a Target or a Demographic?


The Utility of Data-integration Techniques
MIKE HESS
Nielsen
Michael.Hess@nielsen.com
PETE DOE
Nielsen
Pete.Doe@nielsen.com

DOI: 10.2501/JAR-53-2-231-236
Data-integration techniques can be useful tools as marketers continue to improve
overall efficiency and return on investment. This is true because of the value of the
techniques themselves and also because the current advertising market, based on
demographic buying, has major opportunities for arbitrage in the range of 10 percent
to 25 percent (where a brand falls in that range depends on the nature of the vertical). The current
study reviews different methods of data integration in pursuing such negotiations.
INTRODUCTION
Advertisers, agencies, and content providers all
are looking for improvement in the placement of
advertisements in content. If an advertiser can
reach more of its customers and potential custom-
ers by spending less money, or an agency can help
an advertiser to do the same, this yields a positive
effect on the advertiser's bottom line. Conversely,
a content supplier can enhance its value if it can
demonstrate that its content is attractive to par-
ticular types of people (e.g., those disposed to a
particular brand or category, or even a particular
psychographic target).
In this quest for improved advertising effi-
ciency and return on investment (ROI), a number
of different methods have evolved. Most market-
ers and their agencies use targeting rather than
mass-marketing strategies (Sharp, 2010). Beyond
this, many agencies have their own "secret-sauce"
formulas whereby they adjust the value of an
advertising buy as a function of how much
"engagement" can be attributed to that vehicle,
whether it be a specific television program
or a magazine title. A more recent in-market
approach, exemplified by TRA (Harvey, 2012) and Nielsen Catalina Services, has also shown
buying can be improved through the identification
of programs that have more brand and category
heavy users.
The authors' own work since 2007 with data-
integration techniques has shown that fused data
sets also can improve targeting efficiency by a
range from about 10 percent to 25 percent depend-
ing on the category vertical. A number of firms
employ data fusion and integration techniques
on the provider side (e.g., Nielsen, Telmar, Kantar, and Simmons) and in the agency business (Hess and
Fadeyeva, 2008).
In this study, the authors share some of the defi-
nitions and empirical generalizations that have
accumulated in the past five years of working with
these techniques.
The practical application of data integration
already has begun to appear in the marketplace.
A large snack-manufacturing company presented
some of its findings at a recent Advertising Research
Foundation (ARF) conference (Lion, 2009); a global
software supplier took the stage at a Consumer-360
event (Nielsen C-360, 2011); and a media-planning
and buying agency has indicated that it is using its
custom fusion data set to verify and fine-tune com-
mitments made in the 2012 Upfront and in all of
its competitive pitches for new business (personal
communication to M. Hess, 2012).
In the next section, the various data-integration
techniques are defined, and some of the advan-
tages and disadvantages of each are discussed.
TYPES OF DATA INTEGRATION
There are three broad types of data integration
used in media and consumer research for advertis-
ing planning.
EMPIRICAL GENERALIZATION
Analysis with integrated data sets and the national people meter panel has shown us
that if an advertising buy is made based on a marketing target and the programs that
its members view, rather than against a demographic target, there is empirically a
range of between 10 percent and 25 percent improvement in the efficiency of that buy.
This marketing target can be based either on consumption pattern segmentation (e.g.,
heavy/light category users) or on psychographic/lifestyle segmentation (e.g., prudent
savers versus financial risk takers).
Directly Matched Data
Data sets are matched using a common key
(e.g., name and address, or cookies). Very
often, this requires the use of personally
identifiable information, and appropriate
privacy measures must be in place. Some
of the key technical aspects that must be
evaluated are completeness and accuracy
of matching.
For marketing purposes, databases
that are integrated via direct-matching
of address are often referred to as single-
source data, but there is a distinction
between true single-source and this form
of integrated data as the completeness and
accuracy of the match are usually not per-
fect. However, it can be considered to be
the next best thing to single source, assuming the data sets being integrated are of
good quality and relevance.
An example of this sort of database is
the Nielsen Catalina Services integration
of Catalina frequent shopper data with
television data obtained from Nielsen
National People Meter data and Return
Path Set Top Box data.
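As a minimal sketch of the mechanics only (identifiers and field names here are hypothetical, not any vendor's production pipeline), direct matching amounts to joining two data sets on a common key and then reporting how complete the match is:

```python
import pandas as pd

# Hypothetical inputs: a TV panel and a shopper database, both keyed on a
# privacy-protected household identifier derived from name and address.
tv_panel = pd.DataFrame({
    "household_id": ["h1", "h2", "h3", "h4"],
    "weekly_tv_minutes": [820, 1310, 240, 990],
})
shopper_db = pd.DataFrame({
    "household_id": ["h2", "h3", "h5"],
    "category_spend": [41.50, 12.00, 88.25],
})

# Direct match on the common key; the inner join keeps matched households only.
matched = tv_panel.merge(shopper_db, on="household_id", how="inner")

# Completeness: the share of each source that found a counterpart. A real
# evaluation would also audit the accuracy of the key itself.
print(f"TV panel match rate: {len(matched) / len(tv_panel):.0%}")
print(f"Shopper DB match rate: {len(matched) / len(shopper_db):.0%}")
```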
Unit-Level (e.g., respondent-level)
Ascription
In many cases, direct matching of data
is unfeasible, perhaps because of pri-
vacy concerns or because the intersection
between the data sets is minimal (this is
usually the case with samples, where pop-
ulation sampling fractions are very small);
assuming no exclusion criteria for research
eligibility, the chance of a respondent
being in two samples with sampling frac-
tions of 1/10,000 is 1 in 100 million.
In these cases, statistical ascription tech-
niques can be used to impute data. For
example, product-purchase data can be
ascribed onto the members of a research
panel that measures television audiences,
using common variables on the television
panel and a product-purchase database to
guide the ascription. This enables viewing
habits of product users to be estimated.
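A minimal sketch of such an ascription, assuming a simple logistic model on two hypothetical common variables (production services use far richer models and formal validation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical common variables (e.g., age, income band) observed in both a
# purchase database and a TV audience panel.
X_purchase_db = rng.normal(size=(500, 2))
y_purchase = (X_purchase_db[:, 0] + rng.normal(size=500) > 0).astype(int)
X_tv_panel = rng.normal(size=(200, 2))

# Fit the ascription model on the purchase database, then impute a purchase
# probability for every TV-panel respondent from the shared variables.
model = LogisticRegression().fit(X_purchase_db, y_purchase)
tv_purchase_prob = model.predict_proba(X_tv_panel)[:, 1]

# Aggregate-level validity: the mean imputed incidence should track the
# population incidence even where individual-level predictions are uncertain.
print(f"Imputed brand incidence on TV panel: {tv_purchase_prob.mean():.1%}")
```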
Data fusion is one example of a unit-
level ascription technique that is increas-
ingly being used to create integrated
databases. (The topic is discussed in more
detail later in this article.)
Some of the advantages of this approach:

• There is no additional burden on the respondent. Because the ascription is statistical, it can be applied to anonymized data. Additional data are obtained without affecting existing response rates or worsening respondent fatigue.

• There are no privacy concerns. Along with the previous point, this makes it a particularly valuable approach to adding additional data fields to media currency measurements, which typically have tight constraints on respondent access and measurement specifications.

• Because the ascription is applied at the unit/respondent level, the database created delivers complete analytic flexibility. A particularly relevant and valuable consequence of this for media databases is that advertising reach and frequency analyses can be created.

• The cost of ascription is low in comparison to the cost of additional primary research.
Caveats associated with this approach:

• Ascription techniques contain the possibility of model bias. This needs to be carefully assessed; model validation is essential.

• In the majority of cases, ascription models have aggregate- rather than respondent-level validity. For example, a model that overlays brand purchasing onto a television measurement panel may not be able to predict the actual brand purchases of an individual household on the panel, but it will be able to reliably predict the viewing of brand purchasers as a group. This means that the approach is relevant to advertising planning but less applicable to test-control ROI analyses where direct assessment of purchase versus exposure is required.
Aggregate-Level Integration
Aggregate-level integration uses segmen-
tation to group and then link types of
respondent on data sets. The segmentation
typically uses combinations of demograph-
ics and geography, though any information
common to the data sets can be employed.
An example of a commonly used seg-
mentation is Prizm, which segments the
population into 60 geo-demographic
groups. An assessment of viewing habits
of brand users can be obtained by iden-
tifying Prizm codes strongly associated
with particular brands (using a consumer
panel) and looking at viewing traits associ-
ated with these groups (using a television
panel with Prizm classification). Alterna-
tively, purchase-propensity scores across
all segments can be calculated on the con-
sumer panels and used as media weights
on television audiences.
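As an illustrative sketch (segment labels and figures are hypothetical, and commercial segmentations are far more granular), the weighting step multiplies each segment's audience by its purchase propensity:

```python
# Hypothetical inputs: brand purchase propensity per geo-demographic segment
# (from a consumer panel) and one program's audience by segment (from a TV
# panel carrying the same segmentation).
purchase_propensity = {"SegA": 0.12, "SegB": 0.04, "SegC": 0.08}      # buyers per viewer
program_audience = {"SegA": 40_000, "SegB": 150_000, "SegC": 60_000}  # viewers

# Weight each segment's audience by its propensity to estimate the program's
# brand-buyer audience; comparing this across programs guides planning.
weighted = {seg: program_audience[seg] * purchase_propensity[seg]
            for seg in program_audience}
print(f"Estimated brand buyers in audience: {sum(weighted.values()):,.0f}")
```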
Advantages of this approach:

• Segmentations can cover a wide scope: linking data sets through geo-demographic segmentation, for example, allows consumer and media research databases to be connected and subsequently linked with geographical data such as retail areas.

• Understanding a brand through the lens of a suitably constructed segmentation delivers insights beyond basic purchase facts, perhaps guiding advertising creativity as well as media touch-points.
Limitations of this approach:

• Segmentations, by nature, assume homogeneity within segments, and this delivers less precision and less sensitivity than other approaches.

• Because the integration of data sources is not at the unit/respondent level, there are restrictions on analysis: in particular, campaign reach and frequency.
The Pros and Cons of Each Approach
Direct match, unit-level ascription, and aggregate-level ascription can be considered as tools for users of research, to be used in the appropriate way (see Table 1).
For example, respondent-level ascription
of brand user attributes on a television
panel may be used to plan advertising
for a specific brand target; a direct-match
database may then be used to estimate
advertising effectiveness of the cam-
paign; product distribution tactics may be
informed by the use of geo-demographic
segmentation.
TABLE 1
Overview of Integration Approaches

Direct Match (e.g., Address Matching)
  Applications: Advertising ROI; media reach and frequency; media planning; ad sales
  Accuracy/precision: High; near single source
  Caveats: Privacy; completeness and accuracy of matching

Unit-Level Ascription (e.g., Data Fusion)
  Applications: Media reach and frequency; media planning; ad sales
  Accuracy/precision: Dependent on model; can be near single source
  Caveats: Model bias; aggregate-level validity (not suited to direct ROI estimation)

Aggregate Level (e.g., Segment Matching)
  Applications: Media planning; ad sales; relating media and sales activity to geographical locations (e.g., stores, catchment areas)
  Accuracy/precision: Dependent on segmentation but typically lower than unit-level ascription
  Caveats: Aggregate-level validity (not suited to direct ROI estimation); reach and frequency not available; assumption of homogeneity within segments reduces sensitivity
DATA FUSION
The term data fusion is used to describe
many different data-integration methods.
The most common definition, and the one
we shall use in this study, is as follows:
"Data fusion is a respondent-level integra-
tion of two or more survey databases to
create a simulated single source data set."
Essentially two surveys (or panels) are
merged at the respondent level to create a
single database (e.g., the U.S. Nielsen tele-
vision/Internet Data Fusion overlays data
from the Nielsen Online Audience Meas-
urement Panel onto the National People
Meter television Audience Measurement
Panel, creating a database of respondents
with television viewing measures and
online usage measures).
Figure: The Data Fusion Process (TV/Internet Fusion). The TV panel (common characteristics plus TV viewing) and the Internet panel (common characteristics plus online use) are matched via the common characteristics, yielding an integrated data set that carries the common characteristics, TV viewing, and online use.
Linking Variables
The creation of this single database
matches respondents on common vari-
ables to link the data sets. Common vari-
ables (also known as "linking variables"
or "fusion hooks") typically are demo-
graphic, geographic, and media-related.
For example, men aged 18 to 24 years, in full-time employment, within a certain geographical region, who have a particular defined set of media habits (defined across the two panels), may be matched across the two databases.
The importance of linking variables in
the data fusion cannot be overstressed.
In the case of media-based data fusion,
Nielsen data fusions adhere to the gener-
ally accepted idea that linking variables
must encompass more than standard
demographic measures to ensure reliabil-
ity of results.
The importance of employing measures directly related to the phenomena being fused (in this case, television viewing) was emphasized by Susanne Rässler (2002) in
Statistical Matching:
Within media and consuming data the
typical demographic and socioeconomic
variables will surely not completely explain
media exposure and consuming behavior.
Variables already concerning media expo-
sure and consuming behavior have to be
asked as well. Thus, the common variables
also have to contain variables concerning
television and consuming behaviors....
Linking variables are the key to the sta-
tistical validity of the fusion, which oper-
ates on the assumption of conditional
independence; in the case of the televi-
sion/Internet fusion, this would mean that
variations in the way that television view-
ing and online use interact are random
within each group of respondents defined
by the interlaced common variables.
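Stated formally, with television viewing T, online use O, and the vector of linking variables X (standard notation, not the article's own), the assumption is

```latex
% Conditional independence of the fused behaviors given the linking variables:
\[
P(T, O \mid X) = P(T \mid X)\, P(O \mid X)
\]
```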
Where this condition does not hold,
regression to the mean occurs in the model, and there will be some bias in the fused results.
This bias can be estimated using fold-over
tests or comparison to single-source data
(if available) and is an important part of
assessing a data fusion's validity and
utility.
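A fold-over test can be sketched as follows: split one panel in half, fuse one half's behavior onto the other half using only the linking variables, and compare the fused results with the actual, known results. The code below is a minimal illustration with hypothetical data and a deliberately crude matching rule:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical panel: one linking variable (age group) and one behavior
# (heavy TV viewing) correlated with it.
n = 2000
age_group = rng.integers(0, 5, size=n)
heavy_tv = rng.random(n) < (0.2 + 0.1 * age_group)

# Fold the panel in half: donors supply the behavior; recipients keep only
# the linking variable, as if their viewing were unknown.
donors, recipients = np.arange(0, n, 2), np.arange(1, n, 2)

# Crude fusion rule for illustration: each recipient inherits the behavior of
# a random donor in the same age group (real fusions use distance matching).
fused = np.array([
    heavy_tv[rng.choice(donors[age_group[donors] == age_group[r]])]
    for r in recipients
])

# Close aggregate agreement between fused and actual incidence indicates low
# model bias for this variable; a large gap would flag a failed fusion.
print(f"Actual incidence: {heavy_tv[recipients].mean():.1%}")
print(f"Fused incidence:  {fused.mean():.1%}")
```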
In addition, a smart fusion practitioner
also will test the congruence of the link-
ing variables across the two databases,
checking that the two sample structures
are matched well enough to enable the
fusion to work well and assessing the
closeness of matching of the two samples
post fusion.
Matching the Samples
In practice, it is rarely possible to find a
match for every respondent across every
characteristic in the linking variable set.
In the absence of a perfect match, the
objective, therefore, becomes finding the
best match. And although fusion algo-
rithms vary, this requirement typically is
achieved using statistical distance meas-
urements (including assessment of the rel-
ative importance of the linking variables
in predicting behavior) and identifying the
respondents with the smallest distance.
At the same time, checks should occur
in the fusion algorithm to ensure that the
fusion uses all the respondents in both
samples as equitably as possible. In some
cases, the two samples to be fused may
have very different sample sizes, and con-
sideration needs to be given to how to best
use the samples: whether all respondents
will contribute to the fused database or
just the closest matches to create a data-
base with a respondent base equal in size
to the smaller of the two samples. This
decision often is driven by logistical fac-
tors such as the analysis system capabili-
ties rather than being a purely statistical
consideration.
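The distance step can be illustrated with a weighted nearest-neighbor match on standardized linking variables; the weights below are hypothetical stand-ins for the variables' assessed importance, and this is a sketch rather than any production fusion algorithm:

```python
import numpy as np

# Hypothetical standardized linking variables for recipients (TV panel) and
# donors (Internet panel), plus assumed variable-importance weights.
tv_panel = np.array([[0.2, 1.1, -0.5],
                     [1.4, -0.3, 0.8]])
internet_panel = np.array([[0.1, 1.0, -0.4],
                           [2.0, 0.0, 1.0],
                           [-1.2, 0.5, 0.3]])
weights = np.array([2.0, 1.0, 1.5])

# Weighted Euclidean distance from every recipient to every donor.
diff = tv_panel[:, None, :] - internet_panel[None, :, :]
dist = np.sqrt((weights * diff ** 2).sum(axis=2))

# Each recipient takes its closest donor's online-usage record. Production
# fusions add controls so donors are reused as equitably as possible.
best_donor = dist.argmin(axis=1)
print(best_donor)
```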
Validation
Data fusion has been used in media
research for planning purposes for more
than 20 years, and a body of knowledge
has been built up over that time. Valu-
able guidance as to the validity levels that
may hold given various data-integration
approaches also can be found in industry
guidelines developed by the Advertising
Research Foundation (2003).
Validation studies have demonstrated that data fusion provides valid results with acceptably low levels of model bias, assuming the following hold:

• the samples are well defined and structurally similar;
• there is a sufficient set of relevant linking variables; and
• the fusion matches the samples closely across the linking variables.
The authors of the current article believe
that it is important to validate every data
fusion across these three criteria and to
create formal fold-over validation tests
and/or single-source comparisons where
possible. In addition, offering methodo-
logical transparency and welcoming exter-
nal validation of data fusion processes
have contributed to greater acceptance of
data fusion by the industry. As such, the
method is viewed by many as a useful tool
in the researchers' tool box.
ANALYSIS OF LEARNINGS AND
EMPIRICAL GENERALIZATIONS
Although the authors have been work-
ing in this space since 2007, it is not easy
to obtain specific learning from every data
integration due to the proprietary nature of
the service. The generalizations below are offered in the spirit of industry advancement while, at the same time, protective of
the proprietary aspects of the outcomes.
Analysis with integrated data sets and
the national people meter panel has shown
us that if an advertising buy is made based
on a marketing target and the programs
that its members view, rather than on a
demographic target, there is empirically a
range of 10 percent to 25 percent improve-
ment in the efficiency of that buy.
This marketing target can be based
either on consumption pattern segmen-
tation (e.g., heavy/light category users)
or on psychographic/lifestyle segmenta-
tion (e.g., prudent savers versus financial
risk takers). An increase in efficiency is
explained as follows:
A campaign planned to deliver X demographic GRPs will deliver Y brand target GRPs. An alternate plan can be developed that delivers X demographic GRPs and Z brand target GRPs where Z > Y. Equivalently, an alternate plan can be developed to deliver X2 demographic GRPs and Y brand target GRPs where X2 < X (Collins and Doe, 2011).
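To make the arithmetic concrete with illustrative numbers (not figures from the study):

```latex
% Baseline plan: X = 100 demographic GRPs delivering Y = 80 brand-target GRPs.
% An alternate plan at the same demographic weight delivers Z = 96:
\[
\text{efficiency gain} = \frac{Z - Y}{Y} = \frac{96 - 80}{80} = 20\%
\]
% Equivalently, holding Y fixed, the alternate plan needs only
% X_2 = 100 / 1.2, roughly 83 demographic GRPs, with X_2 < X.
```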
The general patterns observed are as follows:

• technology companies are closer to the high end of the 10-percent to 25-percent range of improvement;

• services, such as financial, are in the middle; and

• Consumer Packaged Goods (CPG) are at the lower end.
The authors attribute this outcome to the
fact that demographic buying is itself more
aligned with CPG items that have broader
penetration, whereas the technology side
is less aligned. Larger improvements can,
therefore, come from this area.
Expectations
The only empirical exceptions occur when
the demographics and marketing target
indexes for two programs happen to over-
lap, or at least not differ significantly.
These occasional exceptions, however, are offset by the findings that come from a list of demographically similar programs. In fact, one almost always can find a subset that will have higher category consumption or penetration of a key psychographic target segment. This 10-percent to 25-percent range, in turn, translates into a form of media arbitrage because sellers do not take into account the amount of category consumption/segment penetration when they price their program cost per thousand (CPM) based on demographics. As
noted earlier, established CPG categories
tend to fall in the lower part of this range
whereas newer spaces such as software and
technology lie in the higher end.
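The size of the arbitrage can be expressed as an effective target CPM, again with illustrative numbers only:

```latex
% Two programs priced at the same demographic CPM of $10 but differing in the
% share of category heavy users in their audiences:
\[
\text{effective target CPM} = \frac{\text{demographic CPM}}{\text{target share of audience}}
\]
% Program A (25% heavy users): 10 / 0.25 = $40 per thousand target viewers.
% Program B (20% heavy users): 10 / 0.20 = $50 per thousand target viewers.
% Shifting weight from B to A buys the target 20 percent more cheaply, a gain
% that demographic pricing does not reflect.
```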
Brands in all the categories we have
examined to date have fallen into that
range, signaling that there is virtually
always an efficiency to be gained by being
able to direct media toward the marketing
target from an initial condition of having
begun as a demographic target. Import-
antly, that marketing target can be based
either on psychographic/lifestyle attrib-
utes or on brand/category consumption.
These targets are sourced directly from
the fused databases. Although it is true that if the target is very large (such as all American television viewers), no efficiencies will be gained, the majority of the targets worked with represent less than 20 percent of the viewing population. At that level of targeting, the 10-percent to 25-percent range of improvement holds.
As noted previously, the brand target need
not be either demographic or purchase
based: it could be based on a psycho-
graphic segmentation or a set of attitudes.
The implication is that planning on a
standard demographic target (e.g., women
ages 25 to 54 years) is less efficient than
planning on a more precisely defined
target.
STRATEGIC IMPLICATIONS
Using more precise brand targets than
tradifional demographics creates oppor-
tunities for both buyers and sellers and
improves overall media efficiency by
delivering less waste: better advertising placement leads to more advertisements being seen by the right people at the right time and fewer irrelevant advertisements being served up to bemused consumers.
Improving the media environment in this way is clearly good for everyone. Whether the use of brand targets will become an explicit component of an advertising buy or will remain hidden in the planning and negotiation process is unclear. At present, the latter is the case in television, in part because the executional tools for buying are constrained to
demographics. Online advertising-serving
models, however, are capable of defining
more precise targets through cookie-based
ascription models.
This empirical generalization also sug-
gests a strategy: to take advantage of the
available demographic-versus-marketing
target arbitrage, it is important to have the
right data that link the consumption seg-
ment, or psychographic segment, to pro-
gram viewing.
These data sets can be based on single-
source, direct-matched, or fused data. In
each case, the television currency meas-
urement (e.g., the National People Meter
service for national television advertising
in the United States) is used as the basis
for the program-viewing behavior. Get-
ting these efficiencies in the television buy
also is important for cross-platform cam-
paigns. If the reach, for example, against
the marketing target is already enhanced
via this approach as part of the television
buy, the Key Performance Indicator (KPI)
of the cross-platform might be based more
on frequency and recency than on an effort
to attain additional unduplicated reach.
CONCLUSION
In sum, the authors believe that data-
integration techniques are acting as the
latest wave of services that are bringing
greater overall efficiency and, in turn, ROI
to the industry. They follow in the foot-
steps of predictive new product models in
the 1970s and 1980s, and marketing-mix
modeling in the 1990s and 2000s.
MIKE HESS is EVP in Nielsen's Media Analytics group. He also serves as the Nielsen spokesperson for Social Television and is currently directing a comprehensive analysis of the relationship between social buzz and television ratings. Before joining Nielsen in 2011, Hess was research director for the media agencies of Carat and OMD. Hess's publications include an American Association of Advertising Agencies-sponsored monograph on "Short and Long Term Effects of Advertising and Promotion" (2002) and a review of quantitative methods in advertising research for the Fiftieth Anniversary issue of the Journal of Advertising Research (2011). He currently acts as project co-lead for the quantification of brand equity for the MASB and this year became a trustee of the Marketing Science Institute.
PETE DOE is SVP/data integration at Nielsen. In that role, he has global responsibility for Nielsen's data-fusion methodologies and is involved with such data-integration methods as STB modeled ratings and online hybrid audiences. Prior to moving to the United States in 2003, Doe was a board director at RSMB television research in the United Kingdom, where he worked on the BARB television audience measurement currency and numerous data-fusion projects.
REFERENCES

ADVERTISING RESEARCH FOUNDATION. ARF Guidelines for Data Integration. New York: Advertising Research Foundation, 2003.

COLLINS, J., and P. DOE. "Making Best Use of Brand Target Audiences." Print and Digital Research Forum, San Francisco, CA, 2011.

HARVEY, B. Panelist at the Wharton Empirical Generalizations Conference-II, Philadelphia, PA, May 31, 2012.

HESS, M., and I. FADEYEVA. ARF Forum on Data Fusion and Integration. New York: Advertising Research Foundation, 2008.

LION, S. "Marketing Laws in Action." AM 4.0. New York, NY: Advertising Research Foundation, 2009.

NIELSEN ANNUAL CUSTOMER C-360 CONFERENCE. Orlando, FL, June 2011.

RÄSSLER, S. Statistical Matching: A Frequentist Theory, Practical Applications, and Alternative Bayesian Approaches. New York: Springer-Verlag, 2002.

SHARP, B. How Brands Grow. Australia and New Zealand: Oxford University Press, 2010.