OLAP

OLAP (online analytical processing) is computer processing that enables a user to easily and selectively
extract and view data from different points of view. For example, a user can request that data be analyzed to
display a spreadsheet showing all of a company's beach ball products sold in Florida in the month of July,
compare revenue figures with those for the same products in September, and then see a comparison of other
product sales in Florida in the same time period. To facilitate this kind of analysis, OLAP data is stored in
a multidimensional database. Whereas a relational database can be thought of as two-dimensional, a
multidimensional database considers each data attribute (such as product, geographic sales region, and time
period) as a separate "dimension." OLAP software can locate the intersection of dimensions (for example, all
products sold in the Eastern region above a certain price during a certain time period) and display
it. Attributes such as time periods can be broken down into subattributes.
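
The intersection lookup described above can be sketched in a few lines of Python. This is only an illustration, assuming a tiny made-up data set; the `sales` dictionary, the `select` helper, and all figures are hypothetical and do not come from any OLAP product.

```python
# Hypothetical sales facts keyed by (product, region, month) -> units sold.
sales = {
    ("beach ball", "Florida", "July"): 120,
    ("beach ball", "Florida", "September"): 45,
    ("sunscreen", "Florida", "July"): 300,
    ("beach ball", "Georgia", "July"): 60,
}


def select(cells, product=None, region=None, month=None):
    """Return the cells at the intersection of the given dimension values.

    A dimension left as None is unconstrained.
    """
    wanted = (product, region, month)
    return {
        key: value
        for key, value in cells.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


# Beach balls sold in Florida in July (all three dimensions fixed).
july_florida_balls = select(sales, product="beach ball", region="Florida", month="July")

# All products sold in Florida in July (product left unconstrained).
florida_july_all = select(sales, region="Florida", month="July")
```

Fixing every dimension returns a single cell, while leaving a dimension open returns a cross-section of the data.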
The chief component of OLAP is the OLAP server, which sits between a client and a database management
system (DBMS). The OLAP server understands how data is organized in the database and has special
functions for analyzing the data. There are OLAP servers available for nearly all the major database systems.
OLAP databases contain two basic types of data: measures, which are numeric data, the quantities and
averages that you use to make informed business decisions, and dimensions, which are the categories that
you use to organize these measures. OLAP databases help organize data by many levels of detail, using the
same categories that you are familiar with to analyze the data.
The following sections describe each of these components in more detail:
Cube: A data structure that aggregates the measures by the levels and hierarchies of each of the dimensions
that you want to analyze. Cubes combine several dimensions, such as time, geography, and product lines,
with summarized data, such as sales or inventory figures. Cubes are not "cubes" in the strictly mathematical
sense because they do not necessarily have equal sides. However, they are an apt metaphor for a complex
concept.
Measure: A set of values in a cube that are based on a column in the cube's fact table and that are usually
numeric values. Measures are the central values in the cube that are preprocessed, aggregated, and analyzed.
Common examples include sales, profits, revenues, and costs.
Member: An item in a hierarchy representing one or more occurrences of data. A member can be either
unique or nonunique. For example, 2007 and 2008 represent unique members in the year level of a time
dimension, whereas January represents nonunique members in the month level because there can be more
than one January in the time dimension if it contains data for more than one year.
Calculated member: A member of a dimension whose value is calculated at run time by using an
expression. Calculated member values may be derived from other members' values. For example, a
calculated member, Profit, can be determined by subtracting the value of the member, Costs, from the value
of the member, Sales.
Dimension: A set of one or more organized hierarchies of levels in a cube that a user understands and uses
as the base for data analysis. For example, a geography dimension might include levels for Country/Region,
State/Province, and City. Or, a time dimension might include a hierarchy with levels for year, quarter,
month, and day. In a PivotTable report or PivotChart report, each hierarchy becomes a set of fields that you
can expand and collapse to reveal lower or higher levels.
Hierarchy: A logical tree structure that organizes the members of a dimension such that each member has
one parent member and zero or more child members. A child is a member in the next lower level in a
hierarchy that is directly related to the current member. For example, in a Time hierarchy containing the
levels Quarter, Month, and Day, January is a child of Qtr1. A parent is a member in the next higher level in a
hierarchy that is directly related to the current member. The parent value is usually a consolidation of the
values of all of its children. For example, in a Time hierarchy that contains the levels Quarter, Month, and
Day, Qtr1 is the parent of January.
Level: Within a hierarchy, data can be organized into lower and higher levels of detail, such as Year,
Quarter, Month, and Day levels in a Time hierarchy.
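
As a rough illustration of how members, hierarchies, and calculated members relate, the sketch below rolls month-level values up to their parent quarter and then derives Profit from Sales and Costs at query time. The `facts` data, the `roll_up` and `profit` helpers, and all figures are hypothetical; a real OLAP server would define these in its own cube model.

```python
from collections import defaultdict

# Hypothetical leaf-level facts: (quarter, month) -> stored measures.
facts = {
    ("Qtr1", "January"): {"Sales": 100.0, "Costs": 60.0},
    ("Qtr1", "February"): {"Sales": 80.0, "Costs": 50.0},
    ("Qtr1", "March"): {"Sales": 120.0, "Costs": 70.0},
    ("Qtr2", "April"): {"Sales": 90.0, "Costs": 65.0},
}

# Hierarchy: each Month member has exactly one Quarter parent.
parent_of = {month: quarter for (quarter, month) in facts}


def roll_up(leaf_facts):
    """Consolidate child (month) values into their parent (quarter)."""
    totals = defaultdict(lambda: defaultdict(float))
    for (quarter, _month), measures in leaf_facts.items():
        for name, value in measures.items():
            totals[quarter][name] += value
    return {quarter: dict(measures) for quarter, measures in totals.items()}


def profit(measures):
    """Calculated member: derived from other members when queried, not stored."""
    return measures["Sales"] - measures["Costs"]


by_quarter = roll_up(facts)
qtr1_profit = profit(by_quarter["Qtr1"])
```

Here Qtr1 is the parent of January, February, and March, and its Sales value is the consolidation of its children's Sales values.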
OLAP Server Architectures
OLAP servers are classified by their underlying storage layout:
ROLAP (Relational OLAP): uses a relational DBMS to store
and manage warehouse data (i.e., table-oriented
organization), and specific middleware to support OLAP
queries.
MOLAP (Multidimensional OLAP): uses array-based data
structures and pre-computed aggregated data. It shows higher
performance than ROLAP but may not scale well if not properly
implemented.
HOLAP (Hybrid OLAP): ROLAP approach for low-level raw
data, MOLAP approach for higher-level data (aggregations).

Types :
1. Multidimensional :
MOLAP stands for "multidimensional online analytical processing". MOLAP is the "classic" form of
OLAP and is sometimes referred to as just OLAP. MOLAP stores data in optimized multidimensional
array storage rather than in a relational database. It therefore requires the pre-computation
and storage of information in the cube, an operation known as processing. MOLAP
tools generally utilize a pre-calculated data set referred to as a data cube. The data cube
contains all the possible answers to a given range of questions. MOLAP tools have a very fast
response time and the ability to quickly write back data into the data set.
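
The processing step can be pictured as pre-computing an aggregate for every combination of dimensions, so that later queries become simple lookups. The sketch below is a minimal illustration under that assumption; the `rows` data, the `ALL` placeholder, and the `process` function are hypothetical, and real MOLAP engines use compressed array storage rather than a Python dictionary.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "region", "month")
ALL = "*"  # placeholder meaning "aggregated over this dimension"

# Hypothetical fact rows: one value per dimension, then the measure (units).
rows = [
    ("beach ball", "Florida", "July", 120),
    ("beach ball", "Florida", "September", 45),
    ("sunscreen", "Florida", "July", 300),
    ("beach ball", "Georgia", "July", 60),
]


def process(fact_rows):
    """Pre-compute the sum of units for every subset of kept dimensions."""
    cube = defaultdict(int)
    n = len(DIMENSIONS)
    for *coords, value in fact_rows:
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                key = tuple(coords[i] if i in kept else ALL for i in range(n))
                cube[key] += value
    return dict(cube)


cube = process(rows)

# After processing, each query is a dictionary lookup, not a scan of the rows.
florida_total = cube[(ALL, "Florida", ALL)]
ball_july = cube[("beach ball", ALL, "July")]
grand_total = cube[(ALL, ALL, ALL)]
```

Every fact row contributes to 2^3 = 8 cells here, which hints at why pre-computation trades storage and load time for query speed.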
Advantages of MOLAP
Fast query performance due to optimized storage, multidimensional indexing and caching.
Smaller on-disk size of data compared to data stored in relational database due to
compression techniques.
Automated computation of higher level aggregates of the data.
It is very compact for low dimension data sets.
Array models provide natural indexing.
Effective data extraction achieved through the pre-structuring of aggregated data.
Disadvantages of MOLAP
Within some MOLAP solutions the processing step (data load) can be quite lengthy,
especially on large data volumes. This is usually remedied by doing only incremental
processing, i.e., processing only the data which have changed (usually new data) instead of
reprocessing the entire data set.
MOLAP tools traditionally have difficulty querying models with dimensions with very
high cardinality (i.e., millions of members).
Some MOLAP products have difficulty updating and querying models with more than
ten dimensions. This limit differs depending on the complexity and cardinality of the
dimensions in question. It also depends on the number of facts or measures stored. Other
MOLAP products can handle hundreds of dimensions.
Some MOLAP methodologies introduce data redundancy.
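
The incremental processing mentioned in the first disadvantage can be sketched as follows. This assumes an additive measure (a sum), for which new rows can be folded into existing aggregates without revisiting old data; the `process_incremental` function and the sample rows are hypothetical.

```python
from collections import defaultdict


def process_incremental(aggregates, new_rows):
    """Fold only the newly arrived rows into the existing aggregates."""
    for region, amount in new_rows:
        aggregates[region] += amount
    return aggregates


# Initial full load.
totals = defaultdict(int)
process_incremental(totals, [("East", 100), ("West", 50)])

# Later load: only the new row is processed, not the entire data set.
process_incremental(totals, [("East", 25)])
```

Non-additive measures such as distinct counts or medians generally cannot be updated this simply.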

2. Relational :
ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational
tables and new tables are created to hold the aggregated information. ROLAP depends on a specialized schema design.
This methodology relies on manipulating the data stored in the relational database to give the appearance of
traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to
adding a "WHERE" clause in the SQL statement. ROLAP tools do not use pre-calculated data cubes but instead
pose the query to the standard relational database and its tables in order to bring back the data required to
answer the question. ROLAP tools feature the ability to ask any question because the methodology is not limited
to the contents of a cube. ROLAP also has the ability to drill down to the lowest level of detail in the database.
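
The idea that each slice or dice adds a "WHERE" condition can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch, not how any particular ROLAP tool generates SQL; the `sales` table, its columns, and the `total_units` helper are hypothetical.

```python
import sqlite3

ALLOWED = ("product", "region", "month")  # dimension columns of the hypothetical table

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, month TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("beach ball", "Florida", "July", 120),
        ("beach ball", "Florida", "September", 45),
        ("sunscreen", "Florida", "July", 300),
        ("beach ball", "Georgia", "July", 60),
    ],
)


def total_units(connection, **filters):
    """Sum units, adding one WHERE condition per fixed dimension."""
    unknown = set(filters) - set(ALLOWED)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    sql = "SELECT COALESCE(SUM(units), 0) FROM sales"
    if filters:
        # Column names are validated above; values are bound as parameters.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    return connection.execute(sql, tuple(filters.values())).fetchone()[0]


slice_florida = total_units(conn, region="Florida")               # slice: one dimension fixed
dice_florida_july = total_units(conn, region="Florida", month="July")  # dice: two fixed
everything = total_units(conn)                                    # no WHERE clause at all
```

Because the query runs against the base table, any combination of filters can be asked, at the cost of scanning or indexing the relational data at query time.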
3. Hybrid :
There is no clear agreement across the industry as to what constitutes "Hybrid OLAP", except that a database
will divide data between relational and specialized storage. For example, for some vendors, a HOLAP database
will use relational tables to hold the larger quantities of detailed data, and use specialized storage for at least
some aspects of the smaller quantities of more-aggregate or less-detailed data. HOLAP addresses the
shortcomings of MOLAP and ROLAP by combining the capabilities of both approaches. HOLAP tools can utilize
both pre-calculated cubes and relational data sources.
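
One way to picture a HOLAP tool using both sources is a query function that answers from pre-calculated aggregates when it can and falls back to the detailed relational rows otherwise. Since vendors differ on what HOLAP means, this is only one possible reading; the `cube`, `detail_rows`, and `query` names and the routing rule are assumptions made for the example.

```python
# Hypothetical detailed rows, standing in for relational storage.
detail_rows = [
    {"region": "East", "month": "July", "units": 10},
    {"region": "East", "month": "July", "units": 5},
    {"region": "West", "month": "July", "units": 7},
]

# Hypothetical pre-calculated aggregates, standing in for specialized storage.
cube = {("East", "July"): 15, ("West", "July"): 7}


def query(region, month, detail=False):
    """Return (source, result), choosing storage by the requested granularity."""
    if not detail and (region, month) in cube:
        return "cube", cube[(region, month)]
    rows = [r for r in detail_rows if r["region"] == region and r["month"] == month]
    if detail:
        return "relational", rows
    return "relational", sum(r["units"] for r in rows)


summary = query("East", "July")                  # answered from the cube
drill_down = query("East", "July", detail=True)  # answered from the detail rows
```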
Comparison :
Each type has certain benefits, although providers disagree about the specifics of those
benefits.
Some MOLAP implementations are prone to database explosion, a phenomenon causing
vast amounts of storage space to be used by MOLAP databases when certain common
conditions are met: high number of dimensions, pre-calculated results and sparse
multidimensional data.
MOLAP generally delivers better performance due to specialized indexing and storage
optimizations. MOLAP also needs less storage space compared to ROLAP because the
specialized storage typically includes compression techniques.[16]
ROLAP is generally more scalable.[16] However, large volume pre-processing is difficult to
implement efficiently so it is frequently skipped. ROLAP query performance can therefore
suffer tremendously.
Since ROLAP relies more on the database to perform calculations, it has more limitations in
the specialized functions it can use.
HOLAP encompasses a range of solutions that attempt to mix the best of ROLAP and
MOLAP. It can generally pre-process swiftly, scale well, and offer good function support.
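
The database explosion mentioned above can be made concrete with some back-of-the-envelope arithmetic. The sketch assumes flat dimensions (one level each) and a dense layout that reserves a cell for every combination, which is a worst case; the cardinalities and fact count are made-up numbers.

```python
from math import prod


def cube_cells(cardinalities):
    """Cells in a fully pre-aggregated cube.

    Each dimension gains one extra "all" member, so the count is the
    product of (cardinality + 1) over all dimensions.
    """
    return prod(c + 1 for c in cardinalities)


def density(fact_count, cardinalities):
    """Fraction of base-level cells that actually hold a fact."""
    return fact_count / prod(cardinalities)


# Hypothetical cardinalities: products, regions, days, sales channels.
cards = [1000, 50, 365, 20]

base_cells = prod(cards)                      # base-level combinations
full_cells = cube_cells(cards)                # including all aggregate cells
fill_ratio = density(1_000_000, cards)        # with one million actual facts
```

With these numbers fewer than 1% of the base cells hold data, yet a dense pre-calculated cube would reserve hundreds of millions of cells, and adding a dimension multiplies that count again.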
