
White Paper

Success with Big Data Analytics


Competencies and Capabilities for the Journey
Jason Danielson, Solutions Marketing Manager, NetApp
June 2016 | WP-7233

Abstract
Big data analytics creates sizable value for companies in all industries. Value is created through better customer focus, increased operational excellence, or entirely new businesses. However, many companies underestimate the cost, complexity, and competencies required to get there, and many fail along the journey. Smart companies reduce
the cost, complexity, and competency gap by relying on NetApp and NetApp partners for their
big data infrastructure. Because enterprise storage building blocks and associated service
capabilities provide maturity and cost-effectiveness at the data management level, across all
big data uses and technologies, companies are freed to focus on developing business-facing
capabilities, which requires mastering competencies at the application and data science
levels. Big data analytics technologies discussed in this white paper include Splunk, NoSQL
databases, Hadoop, Solr, and Spark.
TABLE OF CONTENTS

1 Situation ...............................................................................................................................................3

2 Challenges ...........................................................................................................................................3

3 Playbook for Success with Big Data Analytics .................................................................................4


3.1 Strategy ..................................................................................................................................................... 5

3.2 Design and Deployment ............................................................................................................................. 5

3.3 Data Management and Governance ........................................................................................................... 7

3.4 Operations ................................................................................................................................................. 8

3.5 Security and Compliance ............................................................................................................................ 9

3.6 Program Management .............................................................................................................................. 10

3.7 Partners ................................................................................................................................................... 10

4 Conclusion .........................................................................................................................................10

Glossary ...................................................................................................................................................12

LIST OF FIGURES
Figure 1) Competencies required for success with big data analytics. ........................................................................ 4

1 Situation
1.1 The Promise of Big Data Analytics
Big data analytics is the process of examining datasets that are characterized by a greater volume,
velocity, or variety of data types than those found in traditional business intelligence and data warehouse
environments, with the purpose of uncovering hidden patterns, unknown correlations, market trends,
customer preferences, and other useful business information. These analytical findings can lead to more
effective marketing, new revenue opportunities, better customer service, improved operational efficiency,
competitive advantages over rival organizations, and other business benefits.
Companies across all industries increasingly view data as a critical production factor similar to talent and
capital. They realize that capturing and blending more data sources than ever before across many
different domains create economic value. For instance, financial services companies collect, price, and
disburse capital across their various lines of business, from granting credit to providing insurance to
making capital markets work. The volatility and disruption the industry has experienced over the last few
years have spurred banks and insurers to unlock the value inherent in the data their businesses generate.
They are looking for real-time, actionable insights that help them better understand customers, price risks,
and spot fraud. This means gathering, analyzing, and storing millions of transactions, interactions,
observations, and third-party data points per minute. Existing systems such as relational databases or
enterprise data warehouses are high performing but often not suited for the volume, velocity, and variety
of data. It is no wonder that financial services companies have been among the early adopters of big
data, embracing technologies and solutions such as NoSQL databases, Hadoop, Spark, and Splunk. (For
a definition of these technologies, see the glossary at the end of this white paper.) They are leveraging
the power of various big data technologies to transform the customer experience and improve profitability.

2 Challenges
Anecdotal evidence suggests that 50% of enterprises that embrace big data struggle to create business
value, as evidenced by many abandoned, stranded Hadoop efforts. Only 10% become truly successful,
having developed and mastered the many competencies required after a long, arduous journey.

2.1 Why Is Success with Big Data So Hard to Achieve?


In our own experience, enterprises struggle with several aspects of scaling their use of big data:
Big data analytics is treated as a technology project, not as a cross-functional transformation
of the business. Enterprises typically start with a small implementation to solve a specific immediate
problem in a single department or line of business. Small environments tend to grow organically as
more people become aware of the availability and value of the solution. In their quest for agility, IT
staff often fail to consider what it would take to make such a solution available for broader use across
the enterprise. For instance, Hadoop's ability to interpret data on read is initially seen as liberating and encourages dumping any and all types of data into this Hadoop data lake, akin to a long-term parking lot. However, if metadata (that is, data describing the nature and provenance of the raw data) is not defined from the beginning, the data lake quickly becomes a data swamp that is difficult to make sense of. Similarly, many companies strive to assemble a customer golden record to provide a
360-degree view of the customer relationship, which becomes the foundation for highly targeted
segmentation and personalized calls to action. However, that works only if the different data-
producing departments (for example, sales, production, service, and marketing) collaborate, and if the
handoffs to the data-consuming processes are defined (for example, segmentation for the campaign
management system or next-best activity that a call center agent can suggest). As siloed line-of-business big data deployments start to grow, central IT organizations are often asked to assume ownership of the big data infrastructure to address the scaling challenges.
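To make the schema-on-read pitfall concrete, the following minimal PySpark sketch (file paths, column names, and metadata fields are hypothetical illustrations, not NetApp or Hadoop requirements) shows how easily raw data lands in a data lake with an inferred schema, and how little provenance survives unless metadata is captured deliberately at ingest.

```python
# Minimal PySpark sketch: schema-on-read is convenient, but provenance must be added deliberately.
# Paths, column names, and metadata fields are illustrative assumptions only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit, current_timestamp, input_file_name

spark = SparkSession.builder.appName("schema-on-read-demo").getOrCreate()

# Schema is inferred only when the data is read; nothing stops malformed or undocumented files from landing.
raw = spark.read.json("hdfs:///datalake/raw/clickstream/2016/06/")

# Capturing basic metadata (source system, ingest time, file lineage) at write time keeps the lake navigable.
curated = (raw
           .withColumn("source_system", lit("web_clickstream"))
           .withColumn("ingested_at", current_timestamp())
           .withColumn("source_file", input_file_name()))

curated.write.mode("append").parquet("hdfs:///datalake/curated/clickstream/")
```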
The technology vendor ecosystem is fragmented and rapidly evolving. There is a perception
that the big data market is characterized by a high pace of innovation delivered by many best-in-class software companies. Although NoSQL and Hadoop deliver compelling capabilities, many enterprises
underestimate the complexity of getting these technologies to work smoothly in business-critical
environments. The consolidation that has happened in more mature areas of technology (for
example, IBM, Oracle, Microsoft, and SAP) has yet to happen in the big data space, which is
characterized by more than a dozen NoSQL databases, several Hadoop distributions, and rapid
advancement in newer Apache projects such as Spark and Storm.
New technologies and architectures also call for new skills. There is a lot to learn across the
entire lifecycle of big data initiatives and technologies. Many enterprises lack a strong digital leader
who is able to align business needs with technology capabilities. The architectures, technologies, and
vendors selected need to align with those evolving business needs. New skills need to be developed,
often through external hires, across roles ranging from architecture to infrastructure
engineering and operations to data science to application development. Moreover, when analyzing
data with high volume, velocity, and variety, great skill is required to assess the veracity (that is,
quality and trustworthiness) of this data.

3 Playbook for Success with Big Data Analytics


Enterprises across the world are increasingly using the agile approach to software development that was
perfected in Silicon Valley to design innovative big data systems and build data-driven solutions. In our
view, enterprises that truly succeed are agile and deliver business results quickly without compromising
on the future. The apparent conflict between agility and scalability to meet enterprise-wide needs is
resolved by those enterprises that take a realistic look at the competencies required for success with big
data. Those enterprises realize that the journey toward business value can be accelerated and derisked
by relying on partners that can handle the inherent complexity. Those partners bring critical competencies
in areas such as big data strategy, design and deployment, data management and governance,
operations, security and compliance, and program management.
Figure 1 illustrates the seven areas of competency that are essential for success with big data.

Figure 1) Competencies required for success with big data analytics.

[Figure: "Competencies Required for Success with Big Data Analytics," showing seven competencies: Strategy, Design & Deployment, Data Management & Governance, Operations, Security, Program Management, and Partners.]

The following sections cover the best practices that NetApp has developed and that ensure successful
business outcomes for our customers.

3.1 Strategy
To succeed at this stage, NetApp customers address the following areas:
Business strategy and alignment
Use case definition and prioritization

3.1.1 Business Strategy and Alignment


The companies that succeed with big data in a big way are those that have built a strong case for change. They realize that data is becoming a critical production factor similar to talent and
capital. They also realize that integrating big data analytics capabilities into the many aspects of the
business is a multiyear journey that cannot start soon enough. Often, it is about a completely new
architecture with new applications, platforms, tools, and capabilities, augmenting many organizations' current best-in-class setup. This results in better insights into potentially all aspects of a company's
business, generated across a range of valuable use cases targeting revenue growth, cost savings, or
better customer outcomes. However, some use cases require changes to processes, systems, and
metrics in order for insights to become fully actionable.
Big idea: NetApp helps your organization create business alignment, an important prerequisite to
success. Organizations worldwide rely on NetApp to manage several thousand petabytes of data. We are experts at proactively
advising large organizations on how to manage their data better. Superior availability, performance, and
ease of deployment, for instance, directly affect business value. In addition, NetApp and our partners
bring a proven approach on how to extract business value from that data across the enterprise.

3.1.2 Use Case Definition and Prioritization


Deciding which use cases to pursue in what order is not just a matter of business impact. Organizations
need to make sure the use case presents a manageable execution challenge. In other words, the ease or
difficulty of implementation needs to be aligned with the skills present at a point in time. That ease of
implementation is defined by factors such as business ownership, availability of data sources, technology
complexity, compliance complexity, skills, and organizational complexity.
Big idea: NetApp accelerates time to value and reduces execution risk. Our partners1 bring a proven
approach to onboard successive waves of users, use cases, and analytic applications, which allows you
to scale more quickly. Jointly, we provide project leadership, best practices, and robust plans to minimize
surprises. NetApp partners can fill roles as needed, accelerate your learning curve, and transfer knowledge on the job until internal resources can perform their roles.

3.2 Design and Deployment


To succeed at this stage, NetApp customers address the following areas:
Target architecture and solution design
Infrastructure agility and resilience

3.2.1 Target Architecture and Solution Design


Traditionally, technology adoption has happened in silos. That silo approach to implementation was characterized by domain-specific solutions and proprietary islands with vendor lock-in. Siloed point solutions
meant that the data a company has is not usable outside the silo. Raw data is collected for a single purpose and discarded too soon. Therefore, management has an incomplete picture of what is going on. There is no single point of truth.

1 http://solutionconnection.netapp.com/solution-listing.aspx.

Smart enterprises adopt a layered approach to technology, driven by an enterprise-wide unifying
target architecture wherever possible. The main architectural principle of moving data once but sharing
and processing it many times mandates shared storage building blocks. The physical instantiation of
these building blocks can vary, serving data that is hot, warm, cold, or frozen, on the premises or in the
cloud. What matters is an integrated approach to managing, operating, and securing these building
blocks.
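As a simple illustration of the "move data once, process many times" principle, the sketch below shows two different engines consuming the same shared dataset without creating copies. The mount point, file name, and column names are hypothetical examples, not a NetApp-specific layout.

```python
# Illustrative sketch: two processing engines read the same shared dataset without copies.
# Paths and column names are hypothetical assumptions.
import pandas as pd
from pyspark.sql import SparkSession

SHARED_PATH = "/mnt/shared-storage/transactions/2016-06.parquet"

# Engine 1: Spark SQL for large-scale aggregation.
spark = SparkSession.builder.appName("shared-data-spark").getOrCreate()
spark.read.parquet(SHARED_PATH).createOrReplaceTempView("transactions")
daily_totals = spark.sql("SELECT txn_date, SUM(amount) AS total FROM transactions GROUP BY txn_date")
daily_totals.show()

# Engine 2: pandas for an analyst's ad hoc exploration of the very same files.
df = pd.read_parquet(SHARED_PATH)
print(df.describe())
```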
Such an approach brings many benefits:
Modular, modern architecture that supports the broadest range of applications and analyses
Freedom to choose the best tool or processing engine for the job
All data, across all time periods, joined and correlated across domains
Shared data that is consumable for different use cases, often building on each other
Multiple lenses on the same data, with team-specific views
Ability to serve new use cases quickly and affordably
Fast learning curve, making it easier to train, retain, and develop staff
Big idea: NetApp offers an enterprise architecture with validated storage building blocks stretching
across new deployments as well as in-place analytics on existing data, which guarantee lower total cost
of ownership (TCO) and risk than commodity servers with internal drives. The NetApp approach brings:
A mature solution architecture that includes validated designs, technical reports, and complete
runbooks, which shortens time to value, increases deployment stability, and reduces consulting
expenses
Ability to handle both unstructured and structured data with the portfolio of products
Reduced operational complexity, including speed of provisioning capacity and users
Consistent enforcement of data security, privacy, governance, and compliance
A dramatic reduction in the power, space, and skills required
Accelerated testing and development of big data solutions through seamless data movement between on-premises and public cloud environments

3.2.2 Infrastructure Resilience and Agility


Unfortunately, standard server-based deployments that utilize internal disks provide less resilience and performance than might appear on day one of a deployment. Performance under failure conditions can
cripple day-to-day operations. Network costs and complexity increase due to replication and failure
redistribution models. Managing disk failures and replacements is decentralized and prone to errors. As
these big data deployments grow, storage management becomes complex.
Moreover, commodity internal drives are less agile than required. Agility is about accommodating future
growth and making the big data infrastructure consumable for different use cases. It requires processing power and storage capacity to scale independently of each other to address evolving business needs, which is not possible with commodity internal drives. Capital efficiency is low
because the load balancing and scale that a shared service provides are more difficult to achieve.
Big idea: NetApp provides enterprise readiness across all aspects of resilience and agility, spanning
operations, governance, integration, and security. NetApp accelerates time to productivity of your big data
infrastructure while allowing you to meet ever-changing business demands. Specifically:
No longer does a single disk failure cripple a node and immediately affect overall infrastructure
performance. Recovery from disk failures is dramatically improved due to the implementation of
dynamic disk pools (DDP), which harness the performance of a large pool of drives to intelligently

rebuild the lost data. Performance is only negligibly affected, and for an order of magnitude less time
than with internal storage. For instance, in a recent test, the failure and rebuild of a single internal drive in a Couchbase cluster server had a significant impact on the cluster's capability to process requests from clients. The operations-per-second rate dropped by over 90%.
However, with the NetApp EF560 and DDP, the impact was limited, and approximately 15 minutes
after the initial disk failure, normal service was restored.2
File system storage and compute are decoupled and can scale independently subject to workload
requirements. This also eliminates the need for rebalancing or migration when new data nodes are
added, thereby making the data lifecycle nondisruptive.
NetApp storage also increases performance for many big data workloads. For instance, in a recent
benchmark, Splunk on NetApp achieved search performance that was 69% better than Splunk on
equivalent commodity servers with internal disks.3 NetApp provided optimized performance and
capacity buckets for Splunk's hot, warm, cold, and frozen data tiers. Moreover, because data is
externally protected, additional performance and efficiency gains can be realized by reducing the
amount of data replication, lightening the load on compute and network resources and reducing the
amount of storage required just for data protection.
The ability to do in-place analytics on existing NAS data using NetApp technologies saves the infrastructure cost and time of setting up a duplicate storage silo for analytics, provides faster time to insights, and eliminates unnecessary data movement.

3.3 Data Management and Governance


Smart enterprises deploy out-of-the-box processes that instantiate best practices and therefore close the
skills gap and reduce the potential for human error in the following areas:
Data ingestion
Metadata management
Collaboration
Data lifecycle management

3.3.1 Data Ingestion


Smart enterprises have a well-honed process, called data ingestion, for bringing new data sources into the data lake at both the batch and real-time layers. They have also established processes and tools to
assess and improve data quality, sometimes referred to as veracity.
Big idea: NetApp partner solutions, such as Zaloni's Bedrock Data Management Platform, provide managed data ingestion: simplified onboarding of new datasets, managed so that IT knows where data comes from and where it lands. Bedrock also allows automated orchestration of workflows applied to new data as it flows into the lake, which is key for reusability, sourcing, and curation of different data types.
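As a generic illustration only (this is not Bedrock's API), the following Python sketch shows the bookkeeping a managed ingestion step typically performs: recording where a dataset came from, where it landed, and a checksum for later verification. The paths are hypothetical.

```python
# Generic illustration of managed ingestion bookkeeping; NOT the Zaloni Bedrock API.
# Source and landing paths are hypothetical.
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

def ingest(source: Path, landing_dir: Path, registry: Path) -> dict:
    """Copy a file into the landing zone and record where it came from and where it landed."""
    landing_dir.mkdir(parents=True, exist_ok=True)
    target = landing_dir / source.name
    shutil.copy2(source, target)

    record = {
        "source": str(source),
        "landed_at": str(target),
        "ingested_utc": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
    }
    # Append the record to a simple JSON-lines registry so IT can always answer
    # "where did this data come from and where did it land?"
    with registry.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Example (hypothetical paths):
# ingest(Path("/exports/crm/customers.csv"), Path("/datalake/landing/crm"), Path("/datalake/registry.jsonl"))
```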

3.3.2 Metadata Management


Early big data use cases were often around higher volumes of structured data. As companies move into unstructured data use cases, ensuring data quality becomes very difficult without good metadata management. Metadata management cannot be an afterthought as more and more data sources feed
into the shared storage environment. Smart enterprises keep track of what data is in the big data platform: its source, its format, its lineage, and its quality. Data visibility and understanding are unified across traditional business intelligence and true big data environments such as Hadoop. Defining and capturing metadata allows ease of searching and browsing. Proper metadata management is the foundation of data quality. As data lakes grow in depth and importance to the business, the quality of the metadata becomes essential to ensuring that the data poured into these lakes can be found, used, and exploited for years to come.

2 Detailed report available on www.netapp.com; search for report number TR-4462.
3 Performance data taken from "NetApp E-Series for Splunk Enterprise," Function1, 2015.

Big idea: NetApp partner solutions can assist. Zaloni's Bedrock provides unified data management across the entire data pipeline, from ingestion through to self-service data preparation. It provides file- and record-level watermarking so you can see data lineage, movement, and usage. This ensures that consumers can search, browse, and find the data they need, reducing the time to insight for new analytics projects.
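As a simple illustration of the kind of record a metadata catalog keeps for each dataset (the field names below are representative assumptions, not a specific product's schema), consider:

```python
# Representative metadata record for a dataset in a data lake catalog.
# Field names are illustrative assumptions, not a specific product's schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DatasetMetadata:
    name: str                 # logical dataset name
    source: str               # originating system
    fmt: str                  # physical format (for example, Parquet, JSON, CSV)
    lineage: List[str] = field(default_factory=list)  # upstream datasets or jobs that produced it
    quality_score: float = 0.0                        # result of data quality checks (0.0 to 1.0)

catalog = {}

def register(meta: DatasetMetadata) -> None:
    """Add or update a dataset entry so consumers can search and browse the lake."""
    catalog[meta.name] = meta

register(DatasetMetadata(
    name="curated.clickstream",
    source="web_clickstream",
    fmt="parquet",
    lineage=["raw.clickstream"],
    quality_score=0.97,
))
```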

3.3.3 Collaboration
Collaboration at its core is about coordination between data owners, data professionals (for example,
administrators, developers, and data scientists), and data consumers. The more business critical the use
cases, the more important that collaboration becomes. Smart organizations have found ways to address
misaligned funding and incentives. For instance, department A is only able to tap into the wealth of data
in the shared data lake if department A also makes its own data available to other departments, avoiding
the free-rider problem. Moreover, a code of conduct might state that department A needs to provide
advance notice regarding changes in the availability, quality, or format of its data, because department B may use A's data for powering real-time recommendations at the point of sale. Some large
companies have created homegrown internal social networks that break down these communication and
incentive barriers.
Big idea: NetApp partner solutions such as Zaloni's Bedrock embody best practices to foster
collaboration and coordination. Specifically, Bedrock provides workflow and enrichment. Workflow covers
tasks such as masking, lineage, data format conversion, change data capture, and notifications.
Enrichment allows data professionals to orchestrate and manage the data preparation process.

3.3.4 Data Lifecycle Management


As enterprises adopt cloud-based platforms for some big data workloads, the gap between on-premises
and cloud-based data management and governance has widened. Enterprises struggle in their desire to
have a Hadoop data lake platform that integrates enterprise security and access policies with the
performance and reliability offered by familiar enterprise storage platforms.
Big idea: NetApp gives you the confidence that your analytics are always running on the right data, with
the right quality, with mature and proven data governance and associated workflows. Specifically, NetApp
provides advanced data lifecycle management, allowing for automated tiering of storage and migration
from hot to warm to cold storage. In addition, data can be tiered and replicated to NetApp private storage (NPS), which enables analytics in the cloud or, for the most critical use cases, backup and disaster recovery. Moreover, the architectural principle of moving data once but sharing and processing it many times holds true even for traditional enterprise data, because the NetApp NFS Connector for Hadoop allows in-place analytics on data sitting in NFS-addressable storage arrays.
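As a simplified sketch of an age-based tiering policy (the tier names and thresholds below are illustrative assumptions, not NetApp policy defaults; actual tiering is policy-driven within the storage software), the decision logic looks something like this:

```python
# Simplified, illustrative age-based tiering decision; thresholds and tier names are assumptions.
from datetime import datetime, timedelta, timezone
from typing import Optional

TIER_THRESHOLDS = [
    ("hot", timedelta(days=7)),     # actively queried data stays on the fastest tier
    ("warm", timedelta(days=90)),   # recent history on capacity-optimized storage
    ("cold", timedelta(days=365)),  # rarely accessed data
]

def choose_tier(last_accessed: datetime, now: Optional[datetime] = None) -> str:
    """Return the target tier for a dataset based on how long ago it was last accessed."""
    now = now or datetime.now(timezone.utc)
    age = now - last_accessed
    for tier, limit in TIER_THRESHOLDS:
        if age <= limit:
            return tier
    return "frozen"  # anything older is archived (for example, to the cloud or NPS)

# Example: data last touched 200 days ago lands on the cold tier.
print(choose_tier(datetime.now(timezone.utc) - timedelta(days=200)))
```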

3.4 Operations
To succeed at this stage, NetApp customers address the following areas:
Manageability
Efficiency and performance

3.4.1 Manageability
Administering a large Hadoop cluster can be more complicated than many realize. There is complexity
associated with manually recovering from drive failures in a Hadoop cluster with internal drives.
Big idea: NetApp provides the SANtricity Storage Manager, which is often cited as one of the easiest to use and most intuitive interfaces in the industry. It features a combination of wizards and
automation for common processes along with detailed dials for storage experts. It provides a centralized
management GUI for monitoring and managing drive failures. The SANtricity operating system is also
performance optimized, yet still offers a complete set of data management features such as Snapshot
copies and mirroring. This makes it easy to meet service-level agreements with predictable performance.
These are complemented by OnCommand Insight for health checks and the widely acclaimed NetApp
AutoSupport.

3.4.2 Efficiency and Performance


Manageability came to the Hadoop framework relatively late (for example, with Ambari).
That is a particular concern in commodity hardware clusters because drives fail much more frequently
than in enterprise-grade systems. Although this might have been a lesser concern in small clusters for
batch-oriented analytical workloads, it becomes a much larger concern as Hadoop powers insights that
drive quasi-real-time operational decisions and as clusters come to incorporate disparate hardware pools
(for example, servers of different generations).
Big idea: NetApp provides better efficiency, requiring less hardware, which also translates into savings in
power and software licenses. NetApp provides better performance per dollar and per rack compared to
internal disks. Rather than creating three copies for sustained data availability, NetApp is able to maintain
data availability with just two copies because of DDP. Hardware-based RAID and a lower Hadoop
replication count reduce network overhead, thus increasing the aggregate performance of the cluster. In
spite of the lower replication count, NetApp achieves 99.999% reliability, versus 99.9% for commodity
internal drives. As a result, throughput typically increases by 33% (that is, 33% less network traffic), and
storage consumption decreases by 33%. In addition, licensing costs for Hadoop and associated
components are reduced because of the lower node count. Additional efficiency is provided by the
storage tiering that NetApp provides.
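The storage and network savings follow directly from the replication count. The back-of-the-envelope sketch below (a deliberate simplification that treats replication network traffic as proportional to the number of copies written) reproduces the roughly 33% reductions cited above.

```python
# Back-of-the-envelope model of the savings from lowering the Hadoop replication factor
# from 3 (the default) to 2 (possible when the storage layer already protects data with RAID/DDP).
# This simplification treats replication network traffic as proportional to copies written.

def footprint(raw_tb: float, replication: int) -> float:
    """Storage consumed (TB) for a given amount of raw data and replication factor."""
    return raw_tb * replication

raw_data_tb = 100.0
with_internal_disks = footprint(raw_data_tb, replication=3)   # 300 TB
with_external_raid = footprint(raw_data_tb, replication=2)    # 200 TB

savings = 1 - with_external_raid / with_internal_disks
print(f"Storage and replication traffic reduced by about {savings:.0%}")  # ~33%
```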
Moreover, NetApp provides better resilience and reliability than internal drives during healthy and
unhealthy modes of operation. Data nodes can be swapped out without downtime. Crucially, the cluster can recover from a single node failure without disruption to running Hadoop jobs. NetApp E-Series and EF-
Series with hardware RAID provide transparent recovery from hard drive failures. The data node is not
blacklisted, and any job tasks that were running continue uninterrupted.

3.5 Security and Compliance


To succeed at this stage, NetApp customers address the following areas:
Data security
Data privacy and compliance

3.5.1 Data Security


For early, isolated big data pilots and proofs of concept, an illusion of security exists because physical
access to the data is largely restricted to a few friendly, trusted users. Data is largely owned by each department, and service levels are best effort, typically for batch use cases.
The new possibilities opened up by the data lake imply a steep learning curve for everyone involved in
security. Internal security signoffs are often complicated.
Best practice security needs to consider administration, authorization, audit, access, authentication, and
encryption.

Big idea: NetApp contributes to security by providing hardware-accelerated encryption. The benefit is a
performance impact of less than 1 percent, compared to several percentage points with competing
solutions.

3.5.2 Data Privacy and Compliance


Common techniques for addressing data privacy and compliance issues include role-based access
control, data masking, and tokenization. Any policies need to strike the appropriate balance between data
privacy and business value, for instance, governing fine-grained, role-based access control on shared
data in a data lake accessed potentially by hundreds of internal or thousands of external users.
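To illustrate the difference between masking and tokenization, here is a toy Python sketch (not a production-grade privacy scheme; real deployments rely on vaulted or format-preserving tokenization from partner tools):

```python
# Toy illustration of masking versus tokenization; not a production-grade privacy scheme.
import secrets

def mask_card_number(card_number: str) -> str:
    """Masking: hide all but the last four digits; the original value cannot be recovered from the output."""
    return "*" * (len(card_number) - 4) + card_number[-4:]

_token_vault = {}

def tokenize(value: str) -> str:
    """Tokenization: replace the value with a random token, keeping a vault so that
    authorized processes can map the token back to the original value."""
    token = secrets.token_hex(8)
    _token_vault[token] = value
    return token

print(mask_card_number("4111111111111111"))  # ************1111
print(tokenize("4111111111111111"))          # for example, 'a3f9c2...'
```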
Big idea: NetApp partners help you manage the privacy and compliance of your big datasets. Although a
full discussion of the data privacy and compliance regimes in different industries and jurisdictions is
beyond the scope of this paper, a common requirement that is supported by the NetApp partner
ecosystem is the ability to delineate all information associated with specific entities during specific time
frames, for instance, in the context of freedom of information requests or litigation/legal discovery
requests.

3.6 Program Management


Those companies that truly succeed run big data initiatives as end-to-end cross-functional
transformations, not as technology projects. Program management plays a pivotal role in that
transformation.
Big idea: NetApp Global Services provides seasoned program management capability to ensure
successful outcomes.

3.7 Partners
In the rapidly evolving big data space, no single company can provide everything, and NetApp is no
exception.
Big idea: What NetApp does provide is a comprehensive partner ecosystem across Hadoop, NoSQL,
and analytic applications such as Splunk that collectively solves the big data analytics needs of the most
demanding enterprises. The NetApp partner ecosystem is available on Solution Connection:
http://solutionconnection.netapp.com/solution-listing.aspx.

4 Conclusion
In summary, NetApp helps you achieve success with big data analytics initiatives, no matter whether your
role is on the business side as a business owner and consumer of big data insights or on the technical
side as a developer or data professional.
NetApp and partners help you create maximum business value with short time to market because NetApp's portfolio of solutions provides better and more consistent performance and is tested with Hadoop distributions, NoSQL databases, and applications such as Splunk and Spark. Additionally, there
is a TCO advantage stemming from better performance and scalability, efficiency (storage, power, and
licenses), and improved recoverability. Overall, NetApp provides a better balance of performance (with
less hardware than competing solutions), capacity, and cost. Customers particularly value the
independent scaling of storage and compute, performance tiering, and space efficiency (single source of
data, no resync, and no copy).
Specifically, our E-Series provides the following benefits:
Realize better performance than internal drives during data rebuilds.
Increase search performance by 69% versus commodity servers with internal disks.4
Save on storage capacity by reducing the replication factor, maintaining availability with fewer copies.
Scale compute and storage independently to better match application workloads.
Enjoy single-interface management across the storage environment.
Maximize cluster uptime through superior availability.
Improve reliability with enterprise storage building blocks.
Encrypt your data with no performance impact.
Optimize performance and capacity for hot, warm, cold, and frozen data.
Rest assured with world-class NetApp AutoSupport.

4 Source: Function1 report.

Glossary
Hadoop: Open-source software that provides the enterprise-wide data lake:
Allows acquiring all data in its original format and storing it in one place, cost-effectively and for long time periods.
Allows different processing engines and schema on read.
Provides mature multitenancy, operations, security, and integration.

NoSQL: Nonrelational databases popular for big data and real-time web applications:
Data models (for example, key-value, graph, or document) seen as more flexible than relational
database tables
Popular for high-availability, low-latency use cases
Simplicity of scaling out horizontally using clusters of machines versus scaling up for relational
databases
Popular open-source NoSQL databases include MongoDB, Apache Cassandra, Solr, and HBase

Spark: Open-source software that provides a modern development environment and power user
analytical environment for big data:
In-memory high-speed analytics engine
Advanced machine learning libraries
Unified programming model across all processing engines
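A minimal PySpark illustration of these points (the dataset path and column names are hypothetical):

```python
# Minimal PySpark illustration of Spark as an in-memory analytics engine with built-in machine learning.
# The dataset path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("spark-glossary-demo").getOrCreate()

df = spark.read.parquet("hdfs:///datalake/curated/transactions/").cache()  # keep the working set in memory

features = VectorAssembler(inputCols=["amount", "frequency"], outputCol="features").transform(df)
model = KMeans(k=5, featuresCol="features").fit(features)  # MLlib clustering on the same data
model.transform(features).groupBy("prediction").count().show()
```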

Splunk: Software solution for searching, monitoring, and analyzing machine-generated data using a web
interface:
Captures, indexes, and correlates real-time data in a searchable repository from which it can
generate graphs, reports, alerts, dashboards, and visualizations
Horizontal technology, based on a proprietary NoSQL database, traditionally used for IT service
management, security, compliance, and web analytics

Refer to the Interoperability Matrix Tool (IMT) on the NetApp Support site to validate that the exact
product and feature versions described in this document are supported for your specific environment.
The NetApp IMT defines the product components and versions that can be used to construct
configurations that are supported by NetApp. Specific results depend on each customer's installation in accordance with published specifications.

Copyright Information
Copyright © 1994–2016 NetApp, Inc. All rights reserved. Printed in the U.S. No part of this document covered by copyright may be reproduced in any form or by any means (graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system) without prior written permission of the copyright owner.
Software derived from copyrighted NetApp material is subject to the following license and disclaimer:
THIS SOFTWARE IS PROVIDED BY NETAPP "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, WHICH ARE HEREBY
DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
NetApp reserves the right to change any products described herein at any time, and without notice.
NetApp assumes no responsibility or liability arising from the use of products described herein, except as
expressly agreed to in writing by NetApp. The use or purchase of this product does not convey a license
under any patent rights, trademark rights, or any other intellectual property rights of NetApp.
The product described in this manual may be protected by one or more U.S. patents, foreign patents, or
pending applications.
RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to
restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software
clause at DFARS 252.277-7103 (October 1988) and FAR 52-227-19 (June 1987).

Trademark Information
NetApp, the NetApp logo, Go Further, Faster, AltaVault, ASUP, AutoSupport, Campaign Express, Cloud
ONTAP, Clustered Data ONTAP, Customer Fitness, Data ONTAP, DataMotion, Flash Accel, Flash
Cache, Flash Pool, FlashRay, FlexArray, FlexCache, FlexClone, FlexPod, FlexScale, FlexShare,
FlexVol, FPolicy, GetSuccessful, LockVault, Manage ONTAP, Mars, MetroCluster, MultiStore, NetApp
Fitness, NetApp Insight, OnCommand, ONTAP, ONTAPI, RAID DP, RAID-TEC, SANshare, SANtricity,
SecureShare, Simplicity, Simulate ONTAP, SnapCenter, SnapCopy, Snap Creator, SnapDrive,
SnapIntegrator, SnapLock, SnapManager, SnapMirror, SnapMover, SnapProtect, SnapRestore,
Snapshot, SnapValidator, SnapVault, SolidFire, StorageGRID, Tech OnTap, Unbound Cloud, WAFL,
and other names are trademarks or registered trademarks of NetApp Inc., in the United States and/or
other countries. All other brands or products are trademarks or registered trademarks of their respective
holders and should be treated as such. A current list of NetApp trademarks is available on the web at
http://www.netapp.com/us/legal/netapptmlist.aspx. WP-7233-0616

