Big Data Architecture Lab (Copyright Arcitura Education Inc. www.arcitura.com) v2.1

Table of Contents

About Using Answers and Hints

Reading Exercise 12.1 (15 minutes) In-Class Reading and Discussion: SFI Case Study Background
  Technical Infrastructure and Automation Environment
  Business Goals and Obstacles
Lab Exercise 12.2 (90 minutes) Design Big Data Pipeline for SLA Compliance
  Plan Data Acquisition and Storage
  Plan Data Processing
  Plan Data Export
Lab Exercise 12.3 (60 minutes) Alleviate Customer Dissatisfaction
  Plan Data Acquisition and Storage
  Plan Data Processing and Export
Lab Exercise 12.4 (30 minutes) Reduce Data Storage Cost
  Identify Alternative Data Storage Solution
Reading Exercise 12.5 (15 minutes) In-Class Reading and Discussion: LOC Case Study Background
  Technical Infrastructure and Automation Environment
  Business Goals and Obstacles
Lab Exercise 12.6 (60 minutes) Solution for Intelligent Oil Exploration
  Plan Data Acquisition and Storage
  Plan Data Analysis
Lab Exercise 12.7 (60 minutes) Enhance Oil Well Production
  Plan Well Logs Acquisition, Storage and Processing
  Democratize Well Logs
Lab Exercise 12.8 (60 minutes) Reduce Maintenance Costs and Achieve Regulatory Compliance
  Develop Predictive Maintenance Solution
  Develop Continuous Asset Monitoring Solution
Reading Exercise 12.9 (15 minutes) In-Class Reading and Discussion: TXC Case Study Background
  Technical Infrastructure and Automation Environment
  Business Goals and Obstacles
Lab Exercise 12.10 (45 minutes) Identify Fraud and Eliminate Waste
  Collate and Correlate Datasets
Lab Exercise 12.11 (45 minutes) Prioritize Resource Allocation and Enable Open Data Access
  Enable Social Media Data Analysis and Public Data Access
Answers/Hints for Exercise 12.2
  Plan Data Acquisition and Storage
  Plan Data Processing
  Plan Data Export
Answers/Hints for Exercise 12.3
  Plan Data Acquisition and Storage
  Plan Data Processing and Export
Answers/Hints for Exercise 12.4
  Identify Alternative Data Storage Solution
Answers/Hints for Exercise 12.6
  Plan Data Acquisition and Storage
  Plan Data Analysis
Answers/Hints for Exercise 12.7
  Plan Well Logs Acquisition, Storage and Processing
  Democratize Well Logs
Answers/Hints for Exercise 12.8
  Develop Predictive Maintenance Solution
  Develop Continuous Asset Monitoring Solution
Answers/Hints for Exercise 12.10
  Collate and Correlate Datasets
Answers/Hints for Exercise 12.11
  Enable Social Media Data Analysis and Public Data Access



About Using Answers and Hints
Answers and hints are located in the back of this booklet. To get the most out of these
course materials, be sure to complete the lab exercises on your own to whatever extent
possible before reading these sections.



Reading Exercise 12.1 (15 minutes)
In-Class Reading and Discussion: SFI Case Study Background

SFI is a large internet service provider (ISP) and a website hosting company. It provides
internet services, including broadband and TV, to around 7.5 million customers, 5 million
of which are residential customers and 2.5 million of which are business customers. SFI
hosts a large number of websites and provides 24/7 support to its customers via
telephone, email and online chat.

Technical Infrastructure and Automation Environment


SFI's IT landscape can be divided mainly according to its business functions:
broadband/TV and website hosting. The broadband/TV services are provided through a
fiber optic/cable network stretched over hundreds of miles. Fiber optic lines carry data
between exchanges and the cabinets located at the street level. From the cabinets to the
customer's premises, a cable carries both the broadband and TV data. A
wireless router/modem is installed at the customer's premises for providing the
broadband service, while a set-top box is used for the TV service. A number of
multiplexers, routers and gateways enable communication of data between the client
location and the Internet. Video content is stored on multiple CDN servers.
Website hosting infrastructure includes load balancers, DNS servers, numerous web
servers, email servers, FTP servers, relational databases, routers and switches.
An incident management system is used for recording and resolving service-related
issues. This system is linked with the CRM system, which is further used by the
customer care agents for registering and answering customer queries. An ERP system
is used for the automation of various business processes and activities, such as payroll,
accounts and purchases of equipment. A range of operational dashboards are used to
monitor the state of services and to ensure that the service delivery is within the
published SLA. An account management application provides the customer with the
ability to create an account, manage subscriptions and view service usage. A billing
application keeps track of customers' subscriptions/contracts and service usage and
generates end-of-month bills.

Business Goals and Obstacles


SFI guarantees an uptime of 98.99%. However, for the past 6 months, it has not been
able to meet its published SLA. Figures show that monthly downtime has been more
than 10 hours, whereas the published SLA states that downtime cannot exceed 7.5 hours
each month. The inability to meet the published SLA has resulted in customer
defection to competitors, which is reflected in the recent quarter's financial reports.
SFI's management is concerned that if the service provision issues are not resolved in
time, SFI's profit levels may decline sharply.
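
The relationship between the guaranteed uptime and the permitted monthly downtime
can be checked with a short calculation. The sketch below assumes a 31-day month
(744 hours), which the case study does not state. Under that assumption, an uptime of
98.99% corresponds to roughly 7.5 hours of permitted downtime, and the observed 10
hours of downtime corresponds to an uptime of roughly 98.66%.

```python
# Assumption (not stated in the case study): the SLA is measured over a 31-day month.
HOURS_PER_MONTH = 31 * 24  # 744 hours


def allowed_downtime_hours(uptime_pct, hours=HOURS_PER_MONTH):
    """Hours of downtime permitted by a guaranteed uptime percentage."""
    return hours * (1 - uptime_pct / 100)


def achieved_uptime_pct(downtime_hours, hours=HOURS_PER_MONTH):
    """Uptime percentage actually achieved for a given number of downtime hours."""
    return 100 * (1 - downtime_hours / hours)


sla_limit = allowed_downtime_hours(98.99)  # about 7.51 hours
observed = achieved_uptime_pct(10)         # about 98.66 percent
print(f"permitted downtime: {sla_limit:.2f} h, achieved uptime: {observed:.2f}%")
```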
A customer satisfaction survey conducted by an independent ISP comparison
organization has caught the eye of the CEO. According to the survey, SFI's customer
satisfaction level is in continuous decline. Comments left by customers reveal that one
of the top reasons cited for decreasing satisfaction levels is the time it takes to resolve
customers' issues. This is causing frustration among the customers and resulting in
customers cancelling their contracts/subscriptions.
While going through the financial reports related to the IT spending, the CFO noticed an
upward trend in the amount spent on data storage software. Upon further querying, the
IT managers reveal that the cost corresponds to the acquisition of new licenses for
relational databases in order to keep up with the customers' increased demand for data
storage. A breakdown of websites shows that not all hosted websites impose strict
relational data storage requirements. Even websites that do, such as ecommerce
websites, do not require ACID support for all of their data storage operations. The
CFO directs the IT managers to look into the issue of the spiraling cost of data storage
and devise a solution to keep costs to a minimum.
SFI recently launched a pay-per-view service as part of its TV service. In order to entice
viewers, advertisements about different TV shows are displayed via their set-top boxes.
However, the viewers' response has not been as projected, and the revenue target is not
being met.
Based on an assessment of the current challenges and the benefits promised by Big
Data, SFI's IT team decides to adopt Big Data technology and techniques. However,
none of the IT team members is conversant with the use of Big Data technologies and
techniques. Consequently, a consulting company is engaged that specializes in the field
of Big Data.



Lab Exercise 12.2 (90 minutes)
Design Big Data Pipeline for SLA Compliance

A team of consultants from the Big Data consultation company holds a meeting with
SFI's management and IT staff in order to prioritize the goals that need to be addressed.
After much deliberation, SLA fulfillment is given the top priority, for the management
believes that achieving SLA compliance will serve as a means of regaining customer
confidence and will ultimately help towards customer retention.
The consultants start looking into the reasons for non-conformance with the published
SLA. Compliance with the SLA starts to slip when any of the offered services
(broadband/TV/website/email) becomes unavailable for more than the agreed downtime
or when, although the service is available, the data transfer speed becomes too slow,
resulting in severely degraded service. Normally, the main reason for total or partial
unavailability is a hardware failure, such as a failed router or a web server. The current
procedure for rectifying service-related issues follows a reactive approach, where an
issue is only fixed once it becomes known either when a customer reports it as an
incident or through the operational dashboards. Once it is known that there is a service
disruption, the next step is to identify the culprit hardware through manual inspection of
various log files. At times, the identification of the related log file itself takes a long time.
All this time taken to find the actual cause of the issue makes SLA compliance harder to
achieve.
The consultants propose a proactive strategy for rectifying total or partial service
unavailability issues by developing a Big Data analytics solution that can continuously
analyze log files to find error conditions. They are planning to develop a Big Data
pipeline that enables SFI to automatically collect log files from a variety of data sources,
process these log files within a short time period and generate insights. The pipeline
would achieve this via a simple computation of statistics or through the application of
machine learning algorithms, and it would help the IT team quickly find the cause of an
issue.
Each of the following three exercises requires you to identify one or more design
patterns that help the development of a Big Data pipeline.



The Big Data Pipeline compound pattern, provided for reference purposes.

Plan Data Acquisition and Storage


A list of hardware devices that take part in transferring or delivering data in any
shape or form is compiled. The list includes load balancers, DNS servers, web servers,
email servers, FTP servers, relational databases, gateways, routers, switches and CDN
servers. Each of these data sources is allocated a device id, and each data source
creates a delimited log file in textual format that registers the functional aspects of each
device. Each line represents a separate record entry. For example, the web server log
could contain information on the time a client requested a particular resource, such as a
webpage or an image, the client's IP address, the requested resource, the size of the data
returned to the client and the HTTP status code. Different data sources produce log files
at different intervals. Successful analysis of log data requires that log files from all
identified data sources are acquired in their raw forms and on a periodic basis.
Furthermore, it is decided that to ease the burden of adding and removing data sources,
the management of data sources should be possible via point-and-click operations.
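
To make the shape of such a log record concrete, the sketch below parses one line of a
web server log into the fields listed above. The pipe delimiter, the field order, the ISO
timestamp format and the names WebLogRecord and parse_line are all assumptions made
for illustration; the case study only states that the files are delimited text with one
record per line.

```python
from dataclasses import dataclass
from datetime import datetime

DELIMITER = "|"  # assumption: the case study does not name the delimiter


@dataclass
class WebLogRecord:
    device_id: str          # id allocated to the data source
    requested_at: datetime  # time the client requested the resource
    client_ip: str
    resource: str
    bytes_sent: int         # size of the data returned to the client
    status: int             # HTTP status code


def parse_line(device_id, line):
    """Parse one raw log line (hypothetical field order) into a record."""
    fields = line.rstrip("\n").split(DELIMITER)
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    timestamp, client_ip, resource, size, status = fields
    return WebLogRecord(
        device_id=device_id,
        requested_at=datetime.fromisoformat(timestamp),
        client_ip=client_ip,
        resource=resource,
        bytes_sent=int(size),
        status=int(status),
    )


record = parse_line("web-01", "2024-01-15T10:30:00|203.0.113.7|/index.html|5120|200")
print(record)
```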
Once ingested, the log files will need to be saved in a redundant manner in order to cope
with data loss due to hardware failure. Log files will first be cleansed, and then the
cleansed files corresponding to the same type of data source will be processed together
as a group. The log files will be processed using a distributed processing engine, such
that records in each file are processed sequentially. The processed data will consist of
different types of statistics for each type of device. The computed statistics for each
individual device will need to be stored in such a way that a timeline for all values of
each statistic can be established and queried by using the device id as the key.
Anticipating that the computed statistics will be heavily queried, the statistics need to be
saved in such a way that they lend themselves to achieving maximum read
performance. Queries will be restricted to a specific type of hardware device.
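
The query requirement for the computed statistics (a timeline of values per statistic,
looked up by device id) can be pictured with a small in-memory sketch. This is only an
illustration of the access pattern, not of any particular storage device or pattern; the
class name StatTimeline, the statistic name and the use of ISO timestamp strings are
assumptions.

```python
from bisect import insort
from collections import defaultdict


class StatTimeline:
    """In-memory stand-in: device id -> statistic name -> time-ordered values."""

    def __init__(self):
        self._data = defaultdict(lambda: defaultdict(list))

    def record(self, device_id, stat, timestamp, value):
        # insort keeps each timeline ordered by timestamp as values arrive.
        insort(self._data[device_id][stat], (timestamp, value))

    def timeline(self, device_id, stat):
        # Lookup uses the device id as the key; unknown keys yield an empty list.
        return list(self._data.get(device_id, {}).get(stat, []))


store = StatTimeline()
store.record("router-17", "error_rate", "2024-01-15T11:00", 0.04)
store.record("router-17", "error_rate", "2024-01-15T10:00", 0.01)
print(store.timeline("router-17", "error_rate"))
```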



The Poly Source compound pattern, provided for reference purposes.

The Poly Storage compound pattern, provided for reference purposes.



The Random Access Storage compound pattern, provided for reference purposes.

The Streaming Access Storage compound pattern, provided for reference purposes.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements
and describe the application. (Note that any pattern referenced must be a core member
pattern of the Big Data Pipeline compound pattern.)
______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________



B. Illustrate the Big Data analytics logical architecture resulting from the application of
the previously identified pattern(s) by identifying the required mechanisms, and explain
how the mechanisms enable the application of each of the identified pattern(s).

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________



Plan Data Processing
A number of data wrangling operations, including data cleansing, removal of unwanted
data and validation and extraction of data from certain fields, will be performed on the
ingested log files. Each line in the textual file will be processed separately, while log files
originating from the same type of device, such as the router, will be processed together
as a single lot. Due to the number and complexity of the data wrangling operations, SFI's
IT team requires a solution that makes the data wrangling logic easy to manage, such
that the function of each piece of logic can be easily understood and the required piece
of logic can easily be identified and changed in consideration of future requirements.
Once the data is cleansed, certain statistics need to be computed that will eventually be
used for SLA compliance analysis. Due to the importance of these statistics, SFI requires
a means of verifying their authenticity.
Apart from the computation of statistics, the cleansed data will further be used to apply
correlation, regression and clustering techniques in order to help SFI quickly find the
cause of an issue or to predict if an issue is about to occur.
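
One way to picture wrangling logic in which each piece has a single, easily identified
function is a list of small steps applied to every line in turn. The sketch below is an
illustration under assumptions, not a prescribed design: the step names and the
treatment of lines starting with "#" as unwanted data are hypothetical.

```python
def strip_whitespace(line):
    """Cleansing: remove leading and trailing whitespace."""
    return line.strip()


def drop_blank(line):
    """Removal of unwanted data: discard empty lines."""
    return line if line else None


def drop_comment(line):
    """Removal of unwanted data: discard lines starting with '#' (assumption)."""
    return None if line.startswith("#") else line


# Each step does one job, so a step can be found, changed or replaced on its own.
STEPS = [strip_whitespace, drop_blank, drop_comment]


def wrangle(lines, steps=STEPS):
    """Apply every step to each line; a step returning None drops the line."""
    for line in lines:
        for step in steps:
            line = step(line)
            if line is None:
                break
        else:
            yield line


cleansed = list(wrangle(["  a|b  ", "", "# header", "c|d"]))
print(cleansed)
```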

The Big Data Processing Environment compound pattern, provided for reference purposes.



A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements
and describe the application. (Note that any pattern referenced can be a core or an
optional member pattern of the Big Data Pipeline compound pattern.)
______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________



B. Illustrate the Big Data analytics logical architecture resulting from the application of
the previously identified pattern(s) by identifying the mechanisms required by the
pattern(s), as well as any other mechanism(s) not directly covered by the pattern(s), and
explain their relevance.

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________



Plan Data Export
The computed statistics and the results obtained from the application of statistical and
machine learning techniques need to be passed to an operational dashboard that is
observed by the IT support team 24/7. The dashboards are browser-based and are
rendered by a reporting application that uses a relational database for populating various
charts and graphs. Currently, a script is executed in an ad hoc manner to insert data into
the relational database whenever data pertinent to service monitoring becomes
available. However, SFI requires that up-to-date analysis results are available via the
dashboards through periodic log file import and the processing and export of computed
results without requiring any human intervention.
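
The export step itself amounts to writing computed results into the relational database
that the reporting application reads. The sketch below uses an in-memory SQLite
database purely as a stand-in for that database; the table name device_stats, its
columns and the sample values are hypothetical, and INSERT OR REPLACE is SQLite-specific
syntax. Keying rows on device, statistic and time makes a repeated export of the same
results harmless, which matters once the export runs unattended on a schedule.

```python
import sqlite3


def export_statistics(conn, rows):
    """Write (device_id, stat, computed_at, value) rows to the reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS device_stats ("
        " device_id TEXT, stat TEXT, computed_at TEXT, value REAL,"
        " PRIMARY KEY (device_id, stat, computed_at))"
    )
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO device_stats VALUES (?, ?, ?, ?)", rows
        )


conn = sqlite3.connect(":memory:")  # stand-in for the dashboard's database
rows = [
    ("router-17", "error_rate", "2024-01-15T10:00", 0.01),
    ("router-17", "error_rate", "2024-01-15T11:00", 0.04),
]
export_statistics(conn, rows)
export_statistics(conn, rows)  # a repeated run does not duplicate rows
count = conn.execute("SELECT COUNT(*) FROM device_stats").fetchone()[0]
print(count)
```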

The Poly Sink compound pattern, provided for reference purposes.


A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements
and describe the application. (Note that any pattern referenced can be a core or an
optional member pattern of the Big Data Pipeline compound pattern.)
______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of
the previously identified pattern(s) by identifying the mechanisms required by the
pattern(s), as well as any other mechanism(s) not directly covered by the pattern(s), and
explain their relevance.



______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________



Lab Exercise 12.3 (60 minutes)
Alleviate Customer Dissatisfaction

The next problem that the management wants the consultants to tackle is the increasing
customer dissatisfaction due to longer issue-resolution time. The objective is to decrease
the time it takes to resolve customer-reported service issues, which will alleviate
customer dissatisfaction and increase SFI's rating when compared with other ISPs.
A customer can report an incident by calling the customer care team, sending an email,
filling in an online form on SFI's website or chatting online with a customer care
agent. Once an incident is registered, first-line technical support presents the
customer with a set of standard troubleshooting solutions that may or may not be
relevant to the specific nature of the issue that the customer is currently facing. If
unresolved, the incident is forwarded to second-line technical support, where the
team combines previous experience with a search through old support incidents
to find a similar past incident. If the incident still remains unresolved and concerns
a broadband/TV service, an engineer is sent to the customer's location, which adds
to SFI's operational costs. In the case of a website/email issue, the incident is instead
forwarded to third-line support.
The consulting team proposes an analytics-driven solution to reduce the time it takes to
successfully resolve customer service issues. The team plans to empower first-line
support by providing first-line support team members with incident-specific
troubleshooting information. The idea is that by providing case-specific troubleshooting
information, the time it takes to find the right solution can be greatly reduced. This will
further reduce support-related costs by saving money on unnecessary callouts to
customers' premises.
The incident management system keeps a record of all issues raised by customers. This
system uses a relational database for storing incident-related data. Although the current
system has been in use for the past 5 years, due to the large number of incidents that
get generated and the limited storage space of the relational database, only incidents
going back as far as 2 years are available. Older incidents are periodically archived by
exporting the data as XML files that currently amount to around 1.5 petabytes in size.
The proposed solution will employ text analytics and semantic search techniques to find
similar incidents reported within the last 5 years. The matched incidents' resolutions can
then be recommended to the first-line support team members in order to achieve a
targeted and timely resolution of the current incident.
Furthermore, it is also planned to find the total number of similar incidents reported by
customers in the past 24 hours within the same area. This will help support team
members determine if it is an issue that is local to a particular customer or a more
general issue. The solution will be based on the frequent querying of current incidents to
find out the total number of incidents that share the same incident type.



Plan Data Acquisition and Storage
The implementation of the solution requires current incident data from the incident
management system's relational database as well as the archived data in order to build
a large enough repository of different types of incident resolutions. Once acquired, each
incident will be processed individually in order to be converted into a structured form that
consists of an extremely wide row of data. This structured form will then be used as an
input for clustering and distance-based search techniques.
In order to find the number of similar reported incidents from the past 24 hours, one of
the less experienced IT team members proposes running a simple query every 30
minutes that groups newly reported incidents by type and area. A quick test reveals that
such a query can take 10 to 15 minutes to complete and that while it is executing,
the entire incident management system grinds to a halt. Consequently, this option is
discarded. The consultants come up with a viable solution that will import newly
reported incidents every 15 minutes from the incident management system's relational
database as a dataset. CRM data will further be required to get the area information for
each reported incident. The CRM data is also stored in a relational database. The two
datasets will be joined together to create a single dataset, which will be batch processed
to generate per-area statistics. All imported data will be processed using a distributed
processing framework. Looking at the amount of data to be imported, the IT team has
stipulated that the storage footprint should remain as small as possible.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements
and describe the application.
______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of
the previously identified pattern(s) by identifying the required mechanisms, and explain
how the mechanisms enable the application of the identified pattern(s).



______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________



Plan Data Processing and Export
Once the 5 years of incident data have been imported, the incident id, incident details and
resolution details will be extracted from each incident record. An algorithm will then be
applied to each record's incident details to convert them into a structured form that
represents a large matrix. The structured form of the incident records will be subjected to
a clustering algorithm to find groups with similar incident details. The clustering algorithm
is a highly iterative algorithm that requires the data to be processed repeatedly.
When a new incident gets reported, its incident details will be converted to the
aforementioned structured form, and then it will be compared against each of the already
processed historic incidents using a distance-based comparison method. The historic
incident records that are in close proximity to the new incident will then be exported to
the relational database of the incident management system. The entire data processing
will take place on a cluster of machines.
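
The comparison of a new incident against historic incidents can be sketched on a single
machine as follows. The case study names neither the conversion algorithm nor the
distance measure, so the word-count vectors and cosine distance used here are
assumptions chosen for illustration, as are the incident ids and texts.

```python
import math
from collections import Counter


def vectorize(text):
    """Assumed structured form: a sparse vector of word counts."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """0.0 for identical direction, 1.0 for no shared terms."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def closest(new_text, historic, k=1):
    """Return the ids of the k historic incidents nearest to the new incident."""
    new_vec = vectorize(new_text)
    ranked = sorted(
        historic.items(),
        key=lambda item: cosine_distance(new_vec, vectorize(item[1])),
    )
    return [incident_id for incident_id, _ in ranked[:k]]


historic = {
    "INC-1": "router keeps rebooting after firmware update",
    "INC-2": "email server rejects outgoing messages",
    "INC-3": "set-top box shows no signal",
}
matches = closest("router rebooting every hour", historic)
print(matches)
```

In the scenario described above, the historic vectors would be computed once and
reused, rather than recomputed for every new incident as this short sketch does.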
With regard to finding the total number of similar incidents reported by customers in the
past 24 hours within the same area, the imported incidents dataset will first be joined
with the CRM customer dataset using customer id as the joining criterion. The
joined data will then be processed in a sequential manner, such that the incidents with
the same incident type and customer location fields will be grouped together. Then the
total will be counted for each group. The generated totals for each area will then be
forwarded to the incident management system. The total statistic will be automatically
recomputed every 15 minutes once newly imported incident data becomes available
and sent to the incident management system.
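
The join-then-group computation described above can be sketched in plain Python. The
field names (customer_id, incident_type, area) and the sample records are hypothetical,
and the sketch runs on one machine rather than on a distributed processing framework.

```python
from collections import Counter


def count_by_type_and_area(incidents, customers):
    """Join incidents to CRM customers on customer id, then count per group."""
    area_by_customer = {c["customer_id"]: c["area"] for c in customers}
    counts = Counter()
    for incident in incidents:  # records are processed sequentially
        area = area_by_customer.get(incident["customer_id"])
        if area is None:
            continue  # no matching CRM record: dropped, as in an inner join
        counts[(incident["incident_type"], area)] += 1
    return counts


customers = [
    {"customer_id": 1, "area": "North"},
    {"customer_id": 2, "area": "North"},
    {"customer_id": 3, "area": "South"},
]
incidents = [
    {"customer_id": 1, "incident_type": "no_broadband"},
    {"customer_id": 2, "incident_type": "no_broadband"},
    {"customer_id": 3, "incident_type": "no_broadband"},
    {"customer_id": 9, "incident_type": "no_tv"},
]
totals = count_by_type_and_area(incidents, customers)
print(totals)
```

A count of 2 for the same incident type in one area, as in this sample, is the kind of
signal that suggests an issue is not local to a single customer.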

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements
and describe the application.
______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of
the previously identified pattern(s) by identifying the required mechanisms and explain
how the mechanisms enable the application of each of the identified patterns.



______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________



Lab Exercise 12.4 (30 minutes)
Reduce Data Storage Cost

SFI currently hosts a large number and variety of websites. Some of these websites are
ecommerce sites, some act as a frontend for a variety of browser-based applications,
some host blogs and only a handful display static informational content that gets
updated infrequently. The websites that only display static content use a file system as
backend storage for the website content. However, all the other hosted websites use
relational databases for storing a variety of data. Some of these websites require
relational storage with ACID support for enabling transactional operations, such as order
processing and payment processing operations. However, not all operations require
relational storage; examples include the storage of immutable data and the update of data without
strict consistency requirements (data can remain stale for some period of time). Also,
most of the websites store structured data and unstructured data, such as images and
videos. Semi-structured data, such as blog entries and XML data, is also stored within
the relational databases.
In the recent past, ecommerce and social media-driven websites have been generating
very large amounts of data. To manage the increase in demand for data storage, SFI
has had to add additional database servers and buy licenses, resulting in a steep
increase in its IT spending. While SFI charges its customers for the amount of data
stored, the charge is heavily subsidized by SFI in order to remain competitive. Although
SFI can cope with the current data storage demand, the IT team envisages that the
added capacity will soon hit its limit, requiring a further increase in capacity. On the other
hand, some customers with technical understanding have also started demanding
alternative data storage solutions that are more scalable and provide better
performance.



Identify Alternative Data Storage Solution
SFI's IT team needs to implement a data storage solution that will help SFI cut down its
spending on the provisioning of the data storage service to its customers. The IT team
has conducted a survey of SFI's entire customer base and compiled a list of the different
types of data that currently reside in its relational databases. The compiled list includes
images; videos; nested data, such as invoices and emails in JSON and XML format;
blog entries; product/service-related comments; social media messages consisting of
timestamp, user id, location and message text; and customer profiles that consist of a
large number of fields, some of which are grouped, such as address. All data
manipulation operations performed on semi-structured and unstructured data require
accessing each record based on a unique key, whereas some data manipulation
operations performed on semi-structured data require accessing records based on the
value of non-key fields.
The IT team further requires that the new storage architecture ensure data availability in
the face of hardware failure and provide high-performance read/write operations.
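
The two access requirements stated above (lookup by unique key, and lookup by the
value of a non-key field) can be contrasted in a few lines. The sketch below is an
in-memory illustration of the access patterns only and does not stand for any specific
storage device; the class name RecordStore, the keys and the author field are hypothetical.

```python
from collections import defaultdict


class RecordStore:
    """Records addressed by unique key, plus an index on one non-key field."""

    def __init__(self, indexed_field):
        self._by_key = {}
        self._index = defaultdict(set)
        self._field = indexed_field

    def put(self, key, record):
        old = self._by_key.get(key)
        if old is not None and self._field in old:
            # Remove the stale index entry before overwriting the record.
            self._index[old[self._field]].discard(key)
        self._by_key[key] = record
        if self._field in record:
            self._index[record[self._field]].add(key)

    def get(self, key):
        """Access by unique key."""
        return self._by_key.get(key)

    def find(self, value):
        """Access by the value of the indexed non-key field."""
        return [self._by_key[k] for k in sorted(self._index.get(value, ()))]


blog = RecordStore(indexed_field="author")
blog.put("post-1", {"author": "ana", "title": "Launch"})
blog.put("post-2", {"author": "ben", "title": "Roadmap"})
blog.put("post-3", {"author": "ana", "title": "Update"})
print(blog.get("post-2"))
print(blog.find("ana"))
```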

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements
and describe the application.
______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of
the previously identified pattern(s) by identifying the required mechanisms, and explain
how the mechanisms enable the application of the identified pattern(s).



______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________



Reading Exercise 12.5 (15 minutes)
In-Class Reading and Discussion: LOC Case Study Background

LOC is a large oil company that deals with the exploration, extraction, storage and
refining of oil. LOC has been in operation for nearly 4 decades and operates over
5,000 wells, both onshore and offshore, that jointly account for one-fourth of the
country's daily oil production.
Oil is extracted from reservoirs by drilling wells. There can be multiple oil wells in a
single oil field. The extracted oil is then transferred to different refineries using a network
of pipelines, trucks and trains. The refined petroleum (gasoline and diesel) is then
delivered to various gas stations across the country.

Technical Infrastructure and Automation Environment


A number of applications and information management systems make up LOC's IT
environment. Some of these systems and applications are legacy in nature and are
specific to the oil industry. A range of dashboard-based applications are used for
monitoring reservoirs, wells, pipelines and refinery operations. Specialized geographical
information systems (GIS) are used for analyzing existing wells and exploring
prospective sites for drilling new wells. High performance computing (HPC) systems are
used for creating various types of models and running simulations.
Details about the amount of crude oil extracted from each well, the amount of crude oil
entering and leaving the refinery and various other production-related statistics are
recorded in various spreadsheets. These spreadsheets are then imported into an ERP
system. Financial data, such as data on various costs, volume of oil sold, profit and loss
statements and balance sheets, is also stored in the ERP system. Data regarding
various types of equipment and their maintenance is stored in an asset management
system. An enterprise data warehouse, which is periodically updated with data from a
range of applications and systems, is used to generate different reports at different
intervals, such as end-of-day and end-of-month, for analyzing production and refinery
operations and the profitability of LOC, as well as for ensuring regulatory compliance.

Business Goals and Obstacles


A number of LOC's oil wells are nearing the ends of their lives, due to which LOC is
constantly exploring new oil reservoirs. However, LOC's latest financial reports show an
upward trend in the amount spent on oil exploration, with suboptimal
returns. The main contributing reason is the selection of sites that either contain
substandard oil or where the oil reserve contains less oil than originally predicted. Any
well drilled on such substandard sites results in a loss of millions of dollars. LOC needs
to find a way to select only sites that contain quality and plentiful oil reserves and
accelerate its oil exploration operations in order to keep a healthy inventory of oil wells
and to gain a competitive edge over other oil companies.

Oil exploration reports further show that new oil reserves are getting harder to find. This,
coupled with the fact that LOC operates in an industry where the resource (oil) depletes
over time, requires LOC to focus on making existing oil wells more profitable. To ensure
profitability, LOC's board of directors emphasizes obtaining the maximum return from LOC's
existing oil wells by making sure that each well is delivering maximum output. While
focusing on the optimization of well operations, the board observes that maintenance
costs are devouring a large portion of the profit. Unplanned repairs lead to downtime,
thereby affecting the yield. The board advises the operations managers to investigate
the issue in order to reduce maintenance costs.
Another issue that is affecting LOC's profitability is its inability to fully comply with the
newly introduced industry regulations, due to which LOC has had to pay heavy fines on
different occasions. Some of the main areas of regulatory compliance include
operational safety, environmental considerations and detailed oil production and
financial reporting.
In order to address its business goals and objectives, LOC needs to adopt a data-driven
approach such that all of its operations and decision-making take into account all
available data. To implement this approach, LOC decides to incorporate Big Data
technologies and techniques. However, in the absence of any in-house Big Data skills,
LOC turns to you to guide them towards achieving their goals via Big Data.

Lab Exercise 12.6 (60 minutes)
Solution for Intelligent Oil Exploration

The first issue that you have been asked to look into is how to enhance oil exploration so
that only sites that can provide the best ROI are chosen. In order to design the required
Big Data solution environment, you perform some preliminary analysis in terms of how
the process works and the type of data involved.
Oil exploration involves analyzing large amounts of rock formation data, seismic data
and geospatial data. Historical reservoir data and well production data within the same
area or between similar areas is further analyzed to determine the quality and quantity of
the potential oil reserves. Once an oil reservoir is found, the required land is leased via
bidding. The amount of the bid and the duration of the lease depend upon the predicted
amount and the grade of the oil reserves and how much oil can be extracted each day.
Determination of these factors takes a considerable amount of time because the
engineers have to analyze, correlate and develop models from terabytes of data from
different information systems, for each system specializes in handling only a specific
type of dataset.
The engineers believe that the process of finding the right oil reserve can be greatly
expedited if all required data that needs to be analyzed is available in one place. They
further believe that access to an increased amount of data will help them improve the
accuracy of their predictions. However, the current IT infrastructure does not provide a
means for storing and analyzing large volumes of non-relational data.
Based on your findings, you plan to design a repository of semi-structured and
unstructured data through the implementation of an unstructured data store.
Each of the next two exercises requires you to identify one or more design patterns that
help towards the development of a Big Data unstructured data store.

The Unstructured Data Store compound pattern, provided for reference purposes.

Plan Data Acquisition and Storage


You start with finding the required datasets that are analyzed by the engineers. This
includes country-wide borehole data, geospatial data, seismic data, historical reservoir
data and well production data.
The borehole dataset contains data on around 500,000 boreholes, along with their high
resolution images. The borehole dataset is petabytes in volume and consists of XML and
image data. The geospatial dataset contains vector and raster data. The vector data
consists of the location and some general attributes of the boreholes, while the raster
data comprises aerial photographs, both of which are in binary format. A seismic dataset
that contains data from a large number of surveys also consists of binary data and
amounts to petabytes in volume. The reservoir and well production datasets consist of
data on the entire set of historical oil reservoirs and oil wells that have been drilled in the
past, respectively. These two datasets are terabytes in size and consist of thousands of
spreadsheets.
Apart from the images, the binary data will be processed record-by-record for extracting
attributes required for numerical data analyses. The borehole XML conforms to the
standard format of storing borehole data, containing multiple levels of nested data for
each borehole that will be parsed and stored in the original standard format. This is
required because the engineers understand the borehole data only when it conforms to
the standard format. After preprocessing the spreadsheets, which involves certain data
validation checks, records will be stored such that their structure resembles the original
spreadsheet so that queries against such data can be made based on the structure that
the engineers are familiar with.
All processed data and images should be stored in a way such that specific records that
fulfill given criteria can be individually retrieved. Nearly all of these datasets have been
obtained from commercial exploration companies. Hence, it is imperative that they are
stored in a redundant manner. It is believed that multiple teams of engineers would be
evaluating different areas for potential oil reserves at the same time to increase the
success rate. You keep this requirement in mind and plan to save datasets in a way
such that none of the teams experience degraded data access performance. Bearing in
mind LOC's profitability, you plan to design a storage architecture that offers storage for
very large amounts of data without too much investment.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements
and describe the application. (Note that any pattern referenced can be a core or an
optional member pattern of the Big Data Unstructured Data Store compound pattern.)
______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of
the previously identified pattern(s) by identifying the required mechanisms, and explain
how the mechanisms enable the application of the identified pattern(s).

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

Plan Data Analysis
The raw datasets, apart from the image data, are processed such that each record is
cleansed and validated independently. The processed borehole dataset and the seismic
dataset are used to determine the location of oil reservoirs. The geospatial dataset is
used to analyze the location where the well needs to be drilled in order to get an idea
about costs involved, such as the type of the land, how easy it is to transport equipment
to the location and any structures that might need to be removed for enabling access.
Once a potential oil reservoir is found, the reservoir and well production datasets are
then subjected to different regression-based predictive models for predicting the quantity
and quality of oil in the reservoir and finding the time period for which the reservoir will
remain productive. A classification algorithm is further used for finding the type of the oil
reservoir. However, before any of these advanced analysis techniques can be applied,
the datasets need to be preprocessed by applying a range of data reduction techniques.
The entire preprocessing involves multiple steps that need to be executed one after the
other, where each step can potentially take a long time to execute. In the case of an
error, the entire set of preprocessing steps needs to be executed from scratch. Sensing
that this could impact the entire oil exploration data processing operation, you plan to
implement a data processing strategy that does not require the re-execution of all data
preprocessing steps.
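
The restart problem described above can be illustrated with a minimal sketch. It assumes that each preprocessing step can save its output to disk, so that a rerun skips the steps that already completed. The function run_pipeline, the step functions and the JSON checkpoint files are hypothetical and serve only as an illustration; they do not represent a specific pattern or product.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, data, checkpoint_dir):
    """Run named steps in order, saving each step's output to disk.

    On a rerun, a step whose output file already exists is skipped and its
    saved output is loaded instead, so work finished before a failure is kept.
    """
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    executed = []
    for index, (name, func) in enumerate(steps):
        checkpoint = checkpoint_dir / f"{index:02d}_{name}.json"
        if checkpoint.exists():
            data = json.loads(checkpoint.read_text())
        else:
            data = func(data)
            checkpoint.write_text(json.dumps(data))
            executed.append(name)
    return data, executed


# Hypothetical data reduction steps, for illustration only.
def drop_missing(rows):
    return [r for r in rows if r is not None]


def scale(rows):
    return [r / 10 for r in rows]


def demo():
    """Simulate a failure in the second step, then rerun the pipeline."""
    calls = {"scale": 0}

    def flaky_scale(rows):
        calls["scale"] += 1
        if calls["scale"] == 1:
            raise RuntimeError("simulated failure")
        return scale(rows)

    steps = [("drop_missing", drop_missing), ("scale", flaky_scale)]
    with tempfile.TemporaryDirectory() as tmp:
        try:
            run_pipeline(steps, [10, None, 30], tmp)
        except RuntimeError:
            pass  # the first run fails in the second step
        # The rerun loads the saved output of the first step and only
        # executes the step that failed.
        return run_pipeline(steps, [10, None, 30], tmp)
```

Calling demo() reruns only the failed step. A production version would also need to write checkpoints atomically, which this sketch does not do.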

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements
and describe the application. (Note that additional pattern(s) not part of the Big Data
Unstructured Data Store compound pattern may also be required.)
______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of
the previously identified pattern(s) by identifying the required mechanisms, and explain
how the mechanisms enable the application of the identified pattern(s).

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

Lab Exercise 12.7 (60 minutes)
Enhance Oil Well Production

Next, you are asked to help LOC in optimizing its well operations in order to obtain the
maximum possible yield from each well. To design a solution, you investigate how oil
wells are currently monitored.
Subsurface sensors and sensors installed on the well-head take continuous
measurements in the form of well logs. Gigabytes of data are continuously generated by
these sensors each day. However, in the absence of a storage infrastructure that can
store gigabytes of data generated by each sensor each day, readings are currently taken
manually by the engineers once a day. These readings are entered into a spreadsheet.
The spreadsheets are sent to the head office via FTP on a weekly basis. One of the IT
team members then imports all 2500 spreadsheets received for each oil well into the
ERP system via a script. Following this, queries are run against the imported data to
generate various statistics, which are then made available to the engineers and business
managers via different dashboards. The aforementioned process, from receiving the
spreadsheets to generating the statistics, takes around 4 to 5 days.
At present, this weekly import of well data coupled with the time it takes to import it
into the ERP means that the engineers and the managers do not have access to the
latest well production data. Any decisions taken to adjust production parameters, related
to well operations, are based on stale data. Furthermore, due to storage space
limitations, the ERP dashboard can show production statistics going back only 6 months.
The types of statistics displayed in the dashboards are predetermined. If the
engineers and the managers need a new set of statistics, although they understand
SQL, they need to ask the IT team, for they do not have direct access to the well log
data. The IT team can take up to 15 days to implement the requested changes.
Having completed your investigation, you believe that the lack of up-to-date information
about the operation of wells is inhibiting LOC from making the right decisions at the right
time for optimizing well production. To resolve this issue, you plan to develop a Big Data
solution that is capable of ingesting well logs on a daily basis from across all wells and
that can process them to generate the required statistics overnight so that the latest
statistics are available to the engineers and managers for analysis the very next day.
Furthermore, you intend to make the raw well log data available to the engineers and the
managers so that they can query the data and generate new statistics as needed. Apart
from this, to enhance tactical decision-making, you aim to provide access to the previous
5 years of well logs. By looking at long-term data, more confidence can be instilled in the
decision-making process.

Plan Well Logs Acquisition, Storage and Processing
A single oil well produces around 5 gigabytes of log files each day. This amounts to
around 12.5 terabytes of data from across all oil wells. For each well, different sensors
record different types of readings in different logs. Although containing different types of
readings, all logs are textual in nature, and each log entry consists of comma-separated
values on a single line.
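
The volume figures above can be checked with a short calculation. The well count of 2,500 is an assumption inferred from the 2500 weekly spreadsheets mentioned earlier, and decimal units (1 TB = 1,000 GB) are assumed. The 5-year figure is a rough estimate based on the stated aim of retaining 5 years of well logs, and it ignores leap days and any compression.

```python
GB_PER_WELL_PER_DAY = 5   # stated in the text
WELL_COUNT = 2500         # assumption: inferred from the 2500 spreadsheets
GB_PER_TB = 1000          # assumption: decimal units
TB_PER_PB = 1000

# Daily volume across all wells, in terabytes.
daily_tb = GB_PER_WELL_PER_DAY * WELL_COUNT / GB_PER_TB

# Rough raw volume for 5 years of retained logs, in petabytes.
five_year_pb = daily_tb * 365 * 5 / TB_PER_PB

print(f"Daily volume: {daily_tb} TB")
print(f"Five-year volume: about {five_year_pb:.1f} PB")
```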
Log files from all oil wells will be imported daily and then subjected to a series of
validation checks, which involve line-by-line parsing of data and then verifying if each
value falls within the expected range of values. Next, the data needs to be normalized
because some sensors that provide the same type of reading use different measurement
units. Normalization involves converting data to the same measurement units so that
data from different oil wells can be aggregated later.
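
The line-by-line validation and normalization described above can be sketched as follows. The field layout (well ID, timestamp, reading type, unit, value), the expected ranges and the chosen target units are all assumptions made for illustration, as the text does not specify the log format. As a simplification, this sketch converts the value first and then checks the range in the normalized unit, so that only one range per reading type is needed.

```python
# Hypothetical expected ranges, expressed in the normalized unit of each reading.
EXPECTED_RANGES = {
    "pressure_kpa": (0.0, 70000.0),
    "temperature_c": (-40.0, 200.0),
}

# Hypothetical conversions from (reading type, unit) to the normalized unit.
TO_NORMALIZED = {
    ("pressure", "kpa"): ("pressure_kpa", lambda v: v),
    ("pressure", "psi"): ("pressure_kpa", lambda v: v * 6.894757),
    ("temperature", "c"): ("temperature_c", lambda v: v),
    ("temperature", "f"): ("temperature_c", lambda v: (v - 32.0) * 5.0 / 9.0),
}


def parse_line(line):
    """Parse 'well_id,timestamp,reading_type,unit,value' into a normalized record.

    Returns None when the line is malformed, uses an unknown unit or holds
    a value outside the expected range.
    """
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 5:
        return None
    well_id, timestamp, reading_type, unit, raw_value = parts
    try:
        value = float(raw_value)
    except ValueError:
        return None
    conversion = TO_NORMALIZED.get((reading_type.lower(), unit.lower()))
    if conversion is None:
        return None
    name, convert = conversion
    normalized = convert(value)
    low, high = EXPECTED_RANGES[name]
    if not low <= normalized <= high:
        return None
    return {
        "well_id": well_id,
        "timestamp": timestamp,
        "reading": name,
        "value": normalized,
    }
```

For example, parse_line("W-001,2020-01-01T00:00:00,temperature,F,212") yields a record with a value of 100.0 degrees Celsius, whereas a line with a non-numeric or out-of-range value yields None.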
Various statistics will be generated for each well, oil field and all oil fields. These
statistics need to be saved in a way such that data for each well is uniquely identifiable.
However, you discover that the calculation of some of the statistics is not possible, for
some required values currently appear on two separate lines in the log file. You plan to
implement a storage strategy that solves this issue. The generated statistics need to be
presented to the engineers and the managers based on a graphical interface with point-
and-click functionality.
The well logs contain very detailed data about various operational aspects of an oil well,
and it is anticipated that it may become the basis of different data-driven applications.
However, it is not known in advance which technologies will be used to develop those
applications. Although the managers are excited to be able to view statistics on the latest
productions, they need to be sure that log data remains secure even if accidentally
accessed, for the well logs contain sensitive production data. Furthermore, the IT team
has requested that the entire data processing only require minimum human intervention.
After a brief chat with the engineers, you further decide to include a dataset from the
ERP system, which uses a relational database as its storage backend, containing
reference data to be used for setting threshold values for certain statistics.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements
and describe the application.
______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of
the previously identified pattern(s) by identifying the required mechanisms, along with
any additional mechanisms that are not directly required by the pattern(s) but are
mandatory for the solution, and explain how they enable the application of the identified
patterns.

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

Democratize Well Logs
Having set up a Big Data solution for processing well logs, you further plan to
democratize well logs so that the engineers and managers can access raw log data
without involving the IT team.
Although the engineers and managers have SQL knowledge, they do not have any
programming skills. Apart from addressing the log data querying requirement, you also
find out that the raw log files may also be analyzed via oil industry-specific analysis tools.
These tools use a relational database as their storage backend. You plan to enable this
requirement without duplicating log files. You further notice that all of the information
systems within LOC use a federated security model and that LOC's IT team would like to
extend this security model to the Big Data solution environment.

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements
and describe the application.
______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of
the previously identified pattern(s) by identifying the required mechanisms, and explain
how the mechanisms enable the application of the identified pattern(s).

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

Lab Exercise 12.8 (60 minutes)
Reduce Maintenance Costs and Achieve Regulatory Compliance

LOC's management is very satisfied with the progress you have made so far.
They are already reaping the benefits of Big Data adoption in the form of increased
profits via timely analysis of a variety of voluminous data, which LOC was unable to
perform in the past. Building on your success, you start looking into LOC's final set of
business objectives: reducing the cost of maintaining equipment and ensuring full
compliance with the newly introduced industry regulations.
Equipment is currently serviced/replaced based on predetermined intervals or when the
engineers perform a visual inspection, the timing of which can vary between engineers
and is normally dependent upon the experience of the engineer. The service, repair and
inspection records are stored in the asset management system. An inventory of parts is
kept in multiple warehouses across the country. Parts are ordered from different
suppliers spread across the globe and can take up to 7 days to arrive. However, parts
often fail unexpectedly, and when that happens, drilling, oil production from wells or
refinery operations grind to a halt, requiring emergency part replacement. This can
further create logistical problems, especially if the breakdown occurs at a remote site.
With regard to the activities undertaken for assuring regulatory compliance, all types of
operations, especially well drilling, need to demonstrate adherence to strict safety
guidelines at all times. This is a real concern for LOC because operational safety is only
maintained via infrequent physical inspections. One of the reasons behind the infrequent
inspections is the remote nature of the sites. A simple incident left unchecked can result
in a catastrophic accident, such as a blowout, posing grave danger to human lives as
well as the surrounding environment.
After a detailed consultation with the engineers and managers, you come to the
conclusion that the best way to reduce LOC's maintenance and repair costs is to
develop an intelligent asset management solution based on predictive analytics that can
forecast if a part is about to fail. Advance knowledge of service requirements will help
the engineers schedule a service in good time before the part fails, thus reducing well or
refinery downtime. A proactively planned service will further help ensure that a healthy
inventory level of the required parts exists.
On the other hand, for achieving full regulatory compliance, you propose the continuous
monitoring of all oil wells and pipelines. Such a monitoring system will provide advance
warning of any imminent issues. Furthermore, detailed data regarding all areas of
operations will be kept that will become the basis for fulfilling the newly imposed
regulatory requirement of detailed operational reporting.

Develop Predictive Maintenance Solution
You plan to develop a predictive asset maintenance solution based on an unsupervised
machine learning technique.
The unsupervised machine learning technique will work in an offline manner and will
provide information on the common reasons for part failure. The input data required by
the underlying algorithm consists of millions of service, repair and inspection records
stored in the asset management system, based on a relational database, that amount to
approximately 250 terabytes of data. Apart from other information, each record also
contains the engineer's notes in the form of free-form text. These notes along with other
information will be mined to find failure patterns for different categories of equipment,
such as drill heads, pumps, valves and pipes. Once the commonly occurring reasons
have been identified, a day's worth of sensor data from all oil wells and the entire
network of pipes will be gathered once a day as log files and then automatically
analyzed to see if the data manifests any of the previously identified failure patterns. The
entire log processing will take place without any human intervention.
Due to the extremely large number of service, repair and inspection records, you are
considering using a machine learning algorithm that can process data in a distributed
fashion. Also, the selected algorithm performs multiple passes over the data. The
extracted patterns will be saved in a delimited file format. Daily sensor data log files will
then be processed, where each record is compared against the already identified failure
reasons. If a match is found, details about the corresponding equipment and part are
forwarded to the asset management system for the engineers' attention.
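
The comparison of daily sensor records against previously saved failure patterns can be sketched as follows. The pipe-delimited layout of the pattern file, the column names and the sample patterns are hypothetical, as the text only states that the patterns are saved in a delimited file format. A pattern is modeled here as a value band for one metric of one equipment type, which is a simplification of what a machine learning algorithm would actually extract.

```python
import csv
import io

# Hypothetical pattern file content: one failure pattern per line, pipe-delimited.
PATTERNS_TEXT = """equipment_type|metric|min_value|max_value|reason
pump|vibration_mm_s|7.1|1000|bearing wear
valve|pressure_kpa|0|50|seal leak
"""


def load_patterns(text):
    """Read delimited failure patterns and convert the band limits to floats."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    return [
        {
            **row,
            "min_value": float(row["min_value"]),
            "max_value": float(row["max_value"]),
        }
        for row in reader
    ]


def match_record(record, patterns):
    """Return the failure reasons of all patterns that the sensor record matches."""
    return [
        p["reason"]
        for p in patterns
        if p["equipment_type"] == record["equipment_type"]
        and p["metric"] == record["metric"]
        and p["min_value"] <= record["value"] <= p["max_value"]
    ]
```

A record such as {"equipment_id": "P-17", "equipment_type": "pump", "metric": "vibration_mm_s", "value": 9.3} matches the first sample pattern. A non-empty result is the point at which details would be forwarded to the asset management system.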

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements
and describe the application.
______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of
the previously identified pattern(s) by identifying the required mechanisms, and explain
how the mechanisms enable the application of the identified pattern(s).

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

Develop Continuous Asset Monitoring Solution
To enable the continuous monitoring of oil wells and pipelines, you plan to design a
solution that is capable of analyzing the sensor data in real time.
A number of sensors are installed on the drilling equipment, inside the wells, on the
wellhead and across the entire network of pipes to provide a range of measurements,
including temperature, pressure, flow rate and revolutions per minute (RPM), every 10
seconds. This data will be ingested as it gets generated by the sensors and will be
analyzed instantaneously. If any of the values does not fall within the normal range of
values, that value will be flagged, and the engineers will be notified instantly via alerts.
After its initial analysis, the sensor data will be saved in its raw form, such that the
sensor data can be queried by the managers based on different selection criteria for
each type of equipment. You are further thinking of enhancing the predictive
maintenance solution (designed previously) by enabling the instant analysis of incoming
streams of sensor data rather than analyzing log files, which means that each incoming
stream of data also needs to be forwarded to the solution that currently analyzes sensor
log files for finding a match against identified failure patterns.
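
The instant range check on incoming sensor readings can be sketched as a function that consumes readings one at a time. The measurement names and normal ranges below are hypothetical placeholders; the actual limits would come from the engineers. The sketch covers only the flagging logic, not the ingestion, alert delivery or storage of raw data.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical normal ranges per measurement type.
NORMAL_RANGES: Dict[str, Tuple[float, float]] = {
    "temperature_c": (0.0, 150.0),
    "pressure_kpa": (100.0, 35000.0),
    "flow_rate_m3_h": (0.0, 500.0),
    "rpm": (0.0, 300.0),
}


def detect_alerts(readings: Iterable[dict]) -> Iterator[dict]:
    """Yield an alert for every reading whose value is outside its normal range.

    Readings are consumed one at a time, so the function also works on an
    unbounded stream rather than only on a finished log file.
    """
    for reading in readings:
        limits = NORMAL_RANGES.get(reading["measurement"])
        if limits is None:
            continue  # unknown measurement type: ignored in this sketch
        low, high = limits
        if not low <= reading["value"] <= high:
            yield {**reading, "expected_range": limits}
```

Because detect_alerts is a generator, the same stream of readings could in principle be passed on to a second consumer, such as the failure pattern matching step, without first being written to log files.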

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements
and describe the application.
______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of
the previously identified pattern(s) by identifying the required mechanisms, and explain
how the mechanisms enable the application of the identified pattern(s).

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

Reading Exercise 12.9 (15 minutes)
In-Class Reading and Discussion: TXC Case Study Background

TXC is the local government for a large metropolis collecting taxes and providing a
range of services to a population of over 15 million. Services include fire, ambulance,
police, libraries, waste collection, recycling, social care, streets and parks maintenance
and schools. Apart from receiving a subsidy from the federal government, TXC finances its
services through the collection of taxes, rates and fines. It is further responsible for
enforcing building regulations, urban development and maintaining the electoral register.

Technical Infrastructure and Automation Environment


TXC's IT environment is littered with a number of stand-alone systems. A number of
legacy systems are in use, and most of the systems are based on 10 to 15-year-old
technology. Two main reasons behind TXC's slow adoption of contemporary IT solutions
are excessive bureaucracy and a very long consultation period.
A separate system is used for managing each service. For example, two different client-
server applications exist for managing taxes and rates. A GIS system is used for land
management and to perform property-related searches. Benefits-related systems
manage childcare and adult care disbursements. A range of services such as rubbish
collection and recycling are provided via partner organizations. These partner
organizations send invoices at the end of each month that are manually entered into
legacy accounting software that uses a proprietary database.
A complaints and incident reporting application, which uses a relational database as its
backend, is used to record complaints and incidents by the citizens. Different document
management systems are used for keeping citizens' personal data, property information
and other documents. TXC's website enables the online payment of taxes, rates and
fines, reporting of complaints and incidents, such as potholes, and further provides
information about important developments and events in the metropolis. An HR system is
used for maintaining employee records and payroll. Spreadsheets are used as the
common means of data analysis and reporting.

Business Goals and Obstacles


In the wake of the recent economic downturn, the federal government has introduced
austerity measures. A large chunk of the subsidy offered by the federal government is
not available anymore, due to which TXC needs to make massive cuts in spending and
make savings. To enable this, it needs to streamline its revenue acquisition by making
sure that the projected amount of revenue actually gets collected. On the other hand,
TXC needs to identify processes where waste is occurring and eliminate waste.
The spending cuts have forced TXC to work with a lean workforce. Although the budget
has been reduced, citizens still expect the provisioning of quality services in a timely
manner. TXC needs to strategically allocate its thinly spread resources to make sure that
the citizens are satisfied with the level of services provided.
Apart from budgetary issues, TXC also needs to implement the federal government's
vision of open data access. This requirement obliges TXC not only to provide public
access to a variety of datasets that it holds but also to fulfill custom data requests from
citizens within a constrained timeframe.
TXC understands that the solution to the issues it currently faces lies in full visibility
and understanding across its entire set of operations. For this, TXC looks towards Big
Data as a means of fulfilling its business goals. You have been brought in as the lead
Big Data architect to design a solution environment based on Big Data technologies.

Lab Exercise 12.10 (45 minutes)
Identify Fraud and Eliminate Waste

TXC's priority is to maximize its revenue collection, as statistics from the past 5 years
reveal that, on average, it has only been able to collect 83% of the targeted tax and
rates. Similarly, the recovery of fines, such as parking fines, has fallen short of 100%.
These shortfalls in revenue collection mean a smaller budget for providing services.
Statistics further reveal that fraud within childcare and adult social care is responsible for
million-dollar losses. Another major area in which TXC envisages cost savings is the
mitigation of waste, which occurs not only within service delivery but also within TXC's
current business practices. For example, different departments procure the same
supplies from different suppliers; consolidating these purchases could result in massive
savings. Last but not least, a study conducted by the auditors has revealed that in some
cases suppliers were paid more than once, further eroding TXC's already shrinking
budget.
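
Once payment records from the accounting system are available in one place, the duplicate supplier payments found by the auditors could be surfaced with a simple grouping check. The following is a minimal sketch only: the record fields (`supplier`, `invoice_no`, `amount`, `dept`) and the sample values are hypothetical and are not taken from TXC's systems.

```python
from collections import defaultdict


def find_duplicate_payments(payments):
    """Group payments by (supplier, invoice number, amount) and return
    only the groups that contain more than one payment."""
    groups = defaultdict(list)
    for payment in payments:
        # Normalize supplier names so trivial spelling differences
        # (case, stray spaces) do not hide a duplicate.
        key = (
            payment["supplier"].strip().lower(),
            payment["invoice_no"],
            payment["amount"],
        )
        groups[key].append(payment)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Hypothetical sample records, for illustration only.
sample_payments = [
    {"supplier": "Acme Ltd", "invoice_no": "INV-1", "amount": 1200.0, "dept": "Parks"},
    {"supplier": "acme ltd ", "invoice_no": "INV-1", "amount": 1200.0, "dept": "Roads"},
    {"supplier": "Bolt Co", "invoice_no": "INV-7", "amount": 300.0, "dept": "Parks"},
]

duplicates = find_duplicate_payments(sample_payments)
for key, records in duplicates.items():
    print(key, "paid", len(records), "times")
```

Exact-match grouping is the simplest possible rule; a real check would probably also need fuzzy matching on supplier names and invoice references.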
Preliminary analysis shows that the main reason behind the aforementioned issues is
the lack of cross-functional understanding of TXC's operations and of timely reporting.
You believe that fraud identification and waste elimination can be achieved through a
data-driven strategy that collates data from siloed applications in order to provide full
and timely visibility across multiple business functions.
The amount of tax charged on a building, whether domestic or non-domestic, depends on
the information provided by the occupant. To fall into a lower tax band, a payee may
provide false information, such as inaccurate property details or an understated annual
business revenue, so that the tax charged is lower than what the payee should actually
pay. However, TXC can only perform a limited number of physical inspections to verify
the facts. The same principle applies to the payment of benefits for social care.
You propose a Big Data solution that will correlate TXC's tax and benefit records against
both internal and external datasets in order to detect fraud. Furthermore, the solution
will assemble data from different departments to get a unified view of TXC's operations
in order to identify opportunities for reducing waste.

Collate and Correlate Datasets
The data held on domestic and non-domestic properties for tax/rate calculation and the
data on individuals used for dispensing monthly social care payouts are stored on
proprietary systems that can only export data in a delimited file format. The combined
size of the exported files is around 5 terabytes.
You are planning to correlate this data with census, revenue and building data. The
census and revenue datasets are external datasets consisting of XML data, while the
building data resides in a relational database that serves as a backend for an application
that keeps a record of planning applications submitted by the residents. All datasets will
be individually processed in a distributed fashion to extract the required fields from each
record. Records corresponding to each dataset will then be stored in a single database
so that queries can be executed against disparate datasets based on common fields,
such as property address and business name.
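
The extract-then-correlate flow described above can be sketched at toy scale as follows. This is an illustration under stated assumptions, not the expected lab answer: the field names, the pipe delimiter and the sample records are hypothetical, and an in-memory SQLite database merely stands in for the single database that would hold the extracted records.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET


def normalize_address(address):
    """Lower-case and collapse whitespace so that addresses from
    different sources compare equal."""
    return " ".join(address.lower().split())


# Hypothetical export from the proprietary tax system (pipe-delimited).
TAX_EXPORT = "address|declared_use|tax_band\n12 High St|domestic|A\n4  Mill Lane|domestic|B\n"

# Hypothetical external census dataset (XML).
CENSUS_XML = """<census>
  <household><address>12 HIGH ST</address><occupants>3</occupants></household>
  <household><address>4 Mill Lane</address><occupants>0</occupants></household>
</census>"""


def extract_tax_records(text):
    """Extract only the required fields from the delimited export."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    return [
        (normalize_address(row["address"]), row["declared_use"], row["tax_band"])
        for row in reader
    ]


def extract_census_records(xml_text):
    """Extract only the required fields from the XML dataset."""
    root = ET.fromstring(xml_text)
    return [
        (normalize_address(h.findtext("address")), int(h.findtext("occupants")))
        for h in root.iter("household")
    ]


# SQLite stands in for the single database holding all extracted records.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tax (address TEXT, declared_use TEXT, tax_band TEXT)")
db.execute("CREATE TABLE census (address TEXT, occupants INTEGER)")
db.executemany("INSERT INTO tax VALUES (?, ?, ?)", extract_tax_records(TAX_EXPORT))
db.executemany("INSERT INTO census VALUES (?, ?)", extract_census_records(CENSUS_XML))

# Correlate the two datasets on the common field (property address).
joined = db.execute(
    "SELECT t.address, t.tax_band, c.occupants "
    "FROM tax t JOIN census c ON t.address = c.address "
    "ORDER BY t.address"
).fetchall()
print(joined)
```

Normalizing the join key during extraction is the important design choice here, because the same property is rarely written identically in independently maintained datasets.
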
To identify waste, you plan to start by importing procurement data from all departments,
which is available in the form of spreadsheets. By collecting all procurement data in one
place, inter-department queries can be executed to identify items that are purchased by
more than one department. You anticipate that queries will be dynamic in nature and will
be executed by data analysts who can only manipulate data using SQL.
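
As an illustration of the kind of ad hoc SQL an analyst might run once the procurement data sits in one place, the sketch below finds items bought by more than one department. The table layout, column names and sample rows are hypothetical, and SQLite is used only so that the query can be run as written.

```python
import sqlite3

# Hypothetical procurement rows, as they might be loaded from
# departmental spreadsheets.
PROCUREMENT_ROWS = [
    ("Parks", "A4 paper", "PaperCo", 4.10),
    ("Roads", "A4 paper", "Stationery Direct", 5.25),
    ("Housing", "A4 paper", "PaperCo", 4.30),
    ("Roads", "Road salt", "SaltWorks", 80.00),
]

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE procurement "
    "(department TEXT, item TEXT, supplier TEXT, unit_price REAL)"
)
db.executemany("INSERT INTO procurement VALUES (?, ?, ?, ?)", PROCUREMENT_ROWS)

# Items purchased by more than one department, with the supplier count
# and price spread that hint at consolidation savings.
COMMON_ITEMS_SQL = """
SELECT item,
       COUNT(DISTINCT department) AS departments,
       COUNT(DISTINCT supplier)   AS suppliers,
       MIN(unit_price)            AS lowest_price,
       MAX(unit_price)            AS highest_price
FROM procurement
GROUP BY item
HAVING COUNT(DISTINCT department) > 1
ORDER BY item
"""

common_items = db.execute(COMMON_ITEMS_SQL).fetchall()
print(common_items)
```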

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements
and describe the application.
______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of
the previously identified pattern(s) by identifying the required mechanisms, and explain
how the mechanisms enable the application of the identified pattern(s).

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

Lab Exercise 12.11 (45 minutes)
Prioritize Resource Allocation and Enable Open Data Access

Your next task is to help TXC in the strategic deployment of its limited resources and to
enable public access to a variety of datasets.
A meeting is held to decide on the best course of action for deploying resources. It is
suggested that, as the services are provided to the general public, it would be ideal to
incorporate the public's opinion on which services should be given priority. Some
managers suggest conducting a survey based on a sample of individuals. However,
others are of the opinion that doing so would not only take a long time but may also be
biased, as it would be based on the opinion of a handful of people. You step in and
propose that social media data can be analyzed to find out what the public actually
values most. Budget and other resources can then be allocated based on public opinion.
The implementation of the open data access policy requires TXC to collate data from
across different departments and make it available to the general public. However,
before making the data public, certain information, such as personally identifiable data,
will either need to be anonymized or completely removed. Additionally, members of the
public may also request data that requires gathering specific data elements from multiple
datasets based on individual request criteria.

Enable Social Media Data Analysis and Public Data Access
You are planning to incorporate two different sources of social media data: one source
provides data the moment a user sends a message to TXC (estimates show that, on
average, 15,000 messages may be sent), while the other source provides user
comments in the form of a delimited textual file at the end of each day with an average
size of 2 gigabytes. At this stage, you are only planning to analyze social media data
once a day. The social media data will be processed using a distributed text analytics
algorithm that extracts relevant text from each message or comment and then applies
specialized text processing techniques. The results will be displayed in a purpose-built
dashboard that requires data in XML format. The dashboard will be automatically
refreshed as new data gets incorporated each day.
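
The daily batch step and its XML output can be sketched as follows. This is a deliberately simplified stand-in: plain keyword matching replaces the distributed text analytics algorithm, and the keyword list, the tab delimiter, the XML element names and the sample comments are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical keyword-to-service mapping; a real text analytics
# algorithm would be far richer than keyword matching.
SERVICE_KEYWORDS = {"pothole": "roads", "bin": "waste", "rubbish": "waste", "park": "parks"}


def count_service_mentions(delimited_text, delimiter="\t"):
    """Count how many comments mention each service at least once."""
    counts = Counter()
    for row in csv.reader(io.StringIO(delimited_text), delimiter=delimiter):
        if len(row) < 2:
            continue  # skip malformed lines
        words = set(row[1].lower().split())
        services = {svc for kw, svc in SERVICE_KEYWORDS.items() if kw in words}
        counts.update(services)
    return counts


def to_dashboard_xml(counts, day):
    """Serialize the counts as XML for the dashboard (hypothetical schema)."""
    root = ET.Element("mentions", attrib={"date": day})
    for service, n in sorted(counts.items()):
        ET.SubElement(root, "service", attrib={"name": service, "count": str(n)})
    return ET.tostring(root, encoding="unicode")


# Hypothetical end-of-day file: user id, then comment text.
DAILY_FILE = (
    "u1\tHuge pothole on Mill Lane\n"
    "u2\tThe bin was not collected and rubbish piles up\n"
    "u3\tLovely park\n"
)

counts = count_service_mentions(DAILY_FILE)
xml_out = to_dashboard_xml(counts, "2024-01-01")
print(xml_out)
```

Because each comment is processed independently, the counting step can be split across many workers and the partial counts summed afterwards, which is what makes this kind of job suitable for batch processing.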
For enabling open data access, spreadsheets will be imported on a monthly basis to
create 25 different datasets. However, these datasets will first be processed to remove
certain fields and to apply anonymization logic to personally identifiable data. The
processed datasets will then be exported to a webserver for FTP access. To fulfill
custom data requests, the data analysts need to be able to execute different queries
against these processed datasets so that they can extract the required data. You are
told that the number of datasets will increase in the future, each requiring a different set
of fields to be anonymized or removed. Anticipating that this can create dataset
management issues, such as personally identifiable data being left as is while a field
that does not require anonymization is anonymized instead, you plan to implement
policy-based management of datasets that would enable TXC's IT team to manage
datasets effectively and easily.
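
The idea of policy-based dataset management can be sketched as a central table of per-dataset rules that is applied mechanically to every record. The dataset names, field names and salt below are hypothetical. Note also that a salted hash is strictly pseudonymization rather than full anonymization; it is used here only to keep the sketch short.

```python
import hashlib

# Hypothetical per-dataset policies: which fields to drop and which to
# anonymize. New datasets are onboarded by adding an entry here.
DATASET_POLICIES = {
    "complaints": {"remove": ["phone"], "anonymize": ["citizen_name"]},
    "parking_fines": {"remove": [], "anonymize": ["vehicle_reg"]},
}


def pseudonymize(value, salt):
    """Replace a value with a truncated, salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def apply_policy(dataset_name, records, salt="example-salt"):
    """Apply the policy registered for dataset_name to every record."""
    if dataset_name not in DATASET_POLICIES:
        # Refuse to publish a dataset that has no policy defined.
        raise KeyError(f"no policy defined for dataset {dataset_name!r}")
    policy = DATASET_POLICIES[dataset_name]
    cleaned = []
    for record in records:
        out = {}
        for field, value in record.items():
            if field in policy["remove"]:
                continue  # drop the field entirely
            if field in policy["anonymize"]:
                value = pseudonymize(str(value), salt)
            out[field] = value
        cleaned.append(out)
    return cleaned


# Hypothetical record, for illustration only.
records = [{"citizen_name": "Jane Doe", "phone": "555-0100", "category": "pothole"}]
published = apply_policy("complaints", records)
print(published)
```

Keeping the rules in one place, and failing when a dataset has no policy, addresses the risk described above of the wrong field being anonymized or a sensitive field being published unchanged.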

A. Identify the design pattern(s) that need(s) to be applied to fulfill these requirements
and describe the application.
______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

B. Illustrate the Big Data analytics logical architecture resulting from the application of
the previously identified pattern(s) by identifying the required mechanisms, and explain
how the mechanisms enable the application of the identified pattern(s).

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

Answers/Hints for Exercise 12.2

Plan Data Acquisition and Storage


Patterns:
• Poly Source
  o File-based Source
• Poly Storage
  o Streaming Access Storage
    ▪ Streaming Storage
    ▪ Dataset Decomposition
  o Random Access Storage
    ▪ High Volume Tabular Storage
    ▪ Automatic Data Sharding
  o Automatic Data Replication and Reconstruction
• Automated Dataset Execution
(See the Module 10 and 11 Big Data Design Patterns supplements for pattern
descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how
these mechanisms enable the application of the previously identified patterns.)

Plan Data Processing
Patterns:
• Big Data Processing Environment
  o Large-Scale Batch Processing
  o Complex Logic Decomposition
  o Automated Processing Metadata Insertion
(See the Module 10 and 11 Big Data Design Patterns supplements for pattern
descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these
mechanisms enable the application of the previously identified patterns and the
fulfillment of any requirement not directly covered by these patterns.)

Plan Data Export


Patterns:
• Poly Sink
  o Relational Sink
• Automated Dataset Execution
(See the Module 10 and 11 Big Data Design Patterns supplements for pattern
descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these
mechanisms enable the application of the previously identified patterns and the
fulfillment of any requirement not directly covered by these patterns.)

Answers/Hints for Exercise 12.3

Plan Data Acquisition and Storage


Patterns:
• Poly Source
  o Relational Source
  o File-based Source
• Poly Storage
  o Data Size Reduction
  o Streaming Access Storage
    ▪ Streaming Storage
    ▪ Dataset Decomposition
  o Random Access Storage
    ▪ High Volume Tabular Storage
• Automated Dataset Execution
(See the Module 10 and 11 Big Data Design Patterns supplements for pattern
descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these
mechanisms enable the application of the previously identified patterns.)

Plan Data Processing and Export
Patterns:
• Big Data Processing Environment
  o Large-Scale Batch Processing
  o Large-Scale Graph Processing
• Poly Sink
  o Relational Sink
• Automated Dataset Execution
(See the Module 10 and 11 Big Data Design Patterns supplements for pattern
descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these
mechanisms enable the application of the previously identified patterns and the
fulfillment of any requirement not directly covered by these patterns.)

Answers/Hints for Exercise 12.4

Identify Alternative Data Storage Solution


Patterns:
• Poly Storage
  o Random Access Storage
    ▪ High Volume Binary Storage
    ▪ High Volume Hierarchical Storage
    ▪ High Volume Tabular Storage
    ▪ Automatic Data Sharding
  o Automatic Data Replication and Reconstruction
(See the Module 10 and 11 Big Data Design Patterns supplements for pattern
descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these
mechanisms enable the application of the previously identified patterns.)

Answers/Hints for Exercise 12.6

Plan Data Acquisition and Storage


Patterns:
• Poly Source
  o File-based Source
• Poly Storage
  o Streaming Access Storage
    ▪ Streaming Storage
    ▪ Dataset Decomposition
  o Random Access Storage
    ▪ High Volume Binary Storage
    ▪ High Volume Tabular Storage
    ▪ High Volume Hierarchical Storage
    ▪ Automatic Data Sharding
  o Automatic Data Replication and Reconstruction
  o Data Size Reduction
(See the Module 10 and 11 Big Data Design Patterns supplements for pattern
descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these
mechanisms enable the application of the previously identified patterns.)

Plan Data Analysis
Patterns:
• Big Data Processing Environment
  o Large-Scale Batch Processing
  o Intermediate Results Storage
• Automated Dataset Execution
(See the Module 10 and 11 Big Data Design Patterns supplements for pattern
descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these
mechanisms enable the application of the previously identified patterns and the
fulfillment of any requirement not directly covered by these patterns.)

Answers/Hints for Exercise 12.7

Plan Well Logs Acquisition, Storage and Processing


Patterns:
• Poly Source
  o Relational Source
  o File-based Source
• Poly Storage
  o Streaming Access Storage
  o Random Access Storage
  o Confidential Data Storage
• Big Data Processing Environment
  o Large-Scale Batch Processing
• Automated Dataset Execution
• Canonical Data Format
• Dataset Denormalization
(See the Module 10 and 11 Big Data Design Patterns supplements for pattern
descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these
mechanisms enable the application of the previously identified patterns.)

Democratize Well Logs
Patterns:
• Processing Abstraction
• Direct Data Access
• Integrated Access

(See the Module 10 and 11 Big Data Design Patterns supplements for pattern
descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these
mechanisms enable the application of the previously identified patterns.)

Answers/Hints for Exercise 12.8

Develop Predictive Maintenance Solution


Patterns:
• Poly Source
  o Relational Source
  o File-based Source
• Poly Storage
  o Streaming Access Storage
    ▪ Streaming Storage
    ▪ Dataset Decomposition
• Big Data Processing Environment
  o Large-Scale Graph Processing
• Poly Sink
  o Relational Sink
• Automated Dataset Execution
(See the Module 10 and 11 Big Data Design Patterns supplements for pattern
descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these
mechanisms enable the application of the previously identified patterns.)

Develop Continuous Asset Monitoring Solution
Patterns:
• Poly Source
  o Streaming Source
  o Fan-out Ingress
• Poly Storage
  o Random Access Storage
  o Realtime Access Storage
• Big Data Processing Environment
  o Large-Scale Batch Processing
  o High Velocity Realtime Processing
  o Processing Abstraction
• Poly Sink
  o Streaming Egress
(See the Module 10 and 11 Big Data Design Patterns supplements for pattern
descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these
mechanisms enable the application of the previously identified patterns.)

Answers/Hints for Exercise 12.10

Collate and Correlate Datasets


Patterns:
• Poly Source
  o Relational Source
  o File-based Source
• Poly Storage
  o Streaming Access Storage
  o Random Access Storage
• Big Data Processing Environment
  o Large-Scale Batch Processing
  o Processing Abstraction
(See the Module 10 and 11 Big Data Design Patterns supplements for pattern
descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these
mechanisms enable the application of the previously identified patterns.)

Answers/Hints for Exercise 12.11

Enable Social Media Data Analysis and Public Data Access


Patterns:
• Poly Source
  o File-based Source
  o Streaming Source
• Poly Storage
  o Streaming Access Storage
  o Random Access Storage
• Big Data Processing Environment
  o Large-Scale Batch Processing
  o Processing Abstraction
• Poly Sink
  o File-based Sink
• Automated Dataset Execution
• Centralized Dataset Governance
(See the Module 10 and 11 Big Data Design Patterns supplements for pattern
descriptions.)

Big Data Analytics Logical Architecture:

(See the Module 10 and 11 Big Data Design Patterns supplements to find out how these
mechanisms enable the application of the previously identified patterns.)
