
ROW STORE and COLUMN STORE -- MEMORY USAGE

http://debajitb.wix.com/debajitbanerjee/apps/blog/comparison-between-row-and-column-store

With the new Business Suite, load times have been cut by a factor of 20 and batch programming has been
eliminated completely, while mathematical functions have been slimmed down in the quest for faster updates,
with the goal of making three-second response times the norm.
But if we look inside the database, why and how is it so fast?
- Exploitation of current hardware developments
  - Main Memory is the New Disk
  - Non-Uniform Memory Access (NUMA)
  - Multi-core processor parallelism
- Efficient communication between the database layer and the application layer
  - Pushing more application semantics into the data management layer
- Data compression achieves a reduction in disk space
  - Different compression techniques, light-weight and heavy-weight (a tiny run-length-encoding sketch follows this list)
  - Compression-aware query execution
  - Data-dependent optimization
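
To make the light-weight compression idea concrete, here is a minimal Python sketch of run-length encoding, one of the techniques mentioned later in this post. It is purely illustrative and not SAP's actual implementation.

# Minimal run-length encoding (RLE) sketch; purely illustrative, not HANA's implementation.
def rle_encode(values):
    """Compress a list of values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1        # extend the current run
        else:
            runs.append([v, 1])     # start a new run
    return runs

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original list."""
    return [v for v, n in runs for _ in range(n)]

country = ["DE", "DE", "DE", "US", "US", "FR"]   # a sorted, low-cardinality column
encoded = rle_encode(country)
print(encoded)                                   # [['DE', 3], ['US', 2], ['FR', 1]]
assert rle_decode(encoded) == country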
The SAP HANA Database supports both a row store and a column store. Some of the HANA administrative tables are
held in the row store (e.g. the SYS schema and the Statistics Server tables), whereas other administrative tables are in
the column store (e.g. the _SYS_BI, _SYS_BIC and _SYS_REPO schemas). Transactional data stored in the physical tables of
the SAP HANA database is used for analytical purposes. HANA analytical data modelling is only possible for
columnar tables; i.e. the Information Modeler only works with column-store tables. SLT Replication Server and
Data Services create tables in the column store by default.
If transactional data is stored in a column-based table, this enables:
- fast on-the-fly aggregations,
- ad-hoc reporting.
How is data stored in column format, and how does it differ from row format?
The diagram below will help you understand this easily.
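
Since the original diagram is an image, here is a small Python sketch with made-up records (the field names CHANGENR, TCODE and USERNAME are just illustrative) showing the essential difference: a row store keeps whole records together, while a column store keeps all values of one field together.

# Illustrative only: the same three made-up records laid out row-wise and column-wise.
records = [
    ("0000000001", "MM01", "USER_A"),   # (change number, transaction code, user)
    ("0000000002", "VA01", "USER_B"),
    ("0000000003", "MM01", "USER_A"),
]

# Row store: one contiguous entry per record.
row_store = list(records)

# Column store: one contiguous list per field.
column_store = {
    "CHANGENR": [r[0] for r in records],
    "TCODE":    [r[1] for r in records],
    "USERNAME": [r[2] for r in records],
}

# An aggregation over one field only has to touch that one column.
print(len(set(column_store["USERNAME"])))   # distinct users -> 2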

In the lab, I have tried out the following:


- Comparison between a ROW store and a COLUMN store table [not for ECC on HANA; this is purely the
HANA database in an SAP HANA appliance]
I created TWO tables in the HANA database.
First table, CDHDR_ROW: table type in row format
Second table, CDHDR_COLUMN: table type in column format
Both tables were filled with the same data/records.
Number of records: 8,413,932

After loading the data into CDHDR_ROW and CDHDR_COLUMN from a flat file, HANA shows the following results (I did
not do anything extra):
Notice how much memory and disk space the row-store table CDHDR_ROW occupies.
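
For reference, sizes like these can also be read programmatically from HANA's monitoring views. The sketch below uses SAP's hdbcli Python driver and the M_RS_TABLES / M_CS_TABLES views; the connection details are placeholders and the exact view columns may differ between HANA revisions, so treat this as an assumption-laden sketch rather than part of the original test.

# Sketch: read the memory footprint of the two test tables from HANA monitoring views.
# hdbcli is SAP's Python driver; host, port and credentials below are placeholders.
from hdbcli import dbapi

conn = dbapi.connect(address="hanahost", port=30015, user="SYSTEM", password="********")
cur = conn.cursor()

# Row-store table: used fixed + variable part sizes (column names assumed from M_RS_TABLES).
cur.execute(
    "SELECT TABLE_NAME, USED_FIXED_PART_SIZE + USED_VARIABLE_PART_SIZE AS USED_BYTES "
    "FROM M_RS_TABLES WHERE TABLE_NAME = 'CDHDR_ROW'"
)
print(cur.fetchall())

# Column-store table: total in-memory size (column name assumed from M_CS_TABLES).
cur.execute(
    "SELECT TABLE_NAME, MEMORY_SIZE_IN_TOTAL "
    "FROM M_CS_TABLES WHERE TABLE_NAME = 'CDHDR_COLUMN'"
)
print(cur.fetchall())

cur.close()
conn.close()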

One thing to note: a row-store table is automatically loaded into memory when HANA starts up, and a row-store table
cannot be unloaded from memory.

In contrast, the columnar table CDHDR_COLUMN takes up far less memory and disk space.
Later, I converted the row-store table CDHDR_ROW into a columnar table and recorded the output.
Now both tables are in the column store.
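
The conversion itself is a single DDL statement. A hedged sketch of how this could be scripted (again with placeholder connection details; verify the ALTER TABLE ... COLUMN syntax against your HANA revision) is shown below.

# Sketch: switch the row-store table to column store, then re-check its size.
# Connection details are placeholders; verify the DDL for your HANA revision.
from hdbcli import dbapi

conn = dbapi.connect(address="hanahost", port=30015, user="SYSTEM", password="********")
cur = conn.cursor()

cur.execute('ALTER TABLE "CDHDR_ROW" COLUMN')   # convert row store to column store

cur.execute(
    "SELECT TABLE_NAME, MEMORY_SIZE_IN_TOTAL "
    "FROM M_CS_TABLES WHERE TABLE_NAME = 'CDHDR_ROW'"
)
print(cur.fetchall())

cur.close()
conn.close()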

You can now see that it occupies less memory and disk space than it did in its row-based state.

Demystifying the Column Store: Store Statistics in SAP HANA and the Benefits for
Business Insights
http://www.agilityworks.co.uk/our-blog/demystifying-the-column-store-%E2%80%93-viewing-column-store-statistics-in-sap-hana-and-understanding-the-benefits-for-business-insights/

Although there have been a number of articles already written about the column store in SAP HANA, it's still a
question that gets asked regularly, particularly around data reconstruction. The aim of this blog is to explain:
1. How we can access the statistics behind a table stored in column store format within SAP HANA, such as the
main memory size and compression rate.
2. The workings of a column store, including data reconstruction.
3. Some of the advantages of the column store for business insights and analytics.
In order to discuss these aims, let's work with an example of a row-based table from a typical relational database. We
will load the data into SAP HANA in order to store it in a column-store table within main memory.
A sample of the table we will work with is in Figure 1. In total we have 650,216 records. It shows New York Stock
Exchange data: for each company that trades (stock symbol), it shows the prices for each day.

Figure 1 Source table with 9 sample records


In SAP HANA, we can choose to store the table in a row store or column store, both of which are stored in memory. For
this blog we have loaded the same data into two separate tables, one row store, one column store, in order to analyse
them both.
The size of a table in main memory, as well as the type of store it uses, can be seen in its definition view (see Figure 2
for how to access this in the HANA Studio).

Figure 2 Accessing the table definition


If we load the table into the row store, the table is essentially stored as-is from the source. The screenshot in Figure 3
shows the size of the table in memory (45,948 KB = 44.9 MB) for our 650,216 records. No fields in any column have been
merged or compressed.

Figure 3 Seeing the table definition of the row store table


If we load the data into the column store, the size of the table in memory is 15,754 KB (15.4 MB), which can be seen in
Figure 4. This is significantly lower than the memory size of the same table loaded into the row store, roughly a 2.9x
reduction (45,948 KB / 15,754 KB). The way the data is stored in the column store to enable this reduced size is
discussed shortly.

Figure 4 Seeing the table definition of the column store table


Compression has occurred against the columns. We can see the compression rate for each column of the table in
Figure 5. This information is also visible in the table definition view within the HANA Studio.
Figure 5 Seeing the compression rates of the columns in the column store table

But how does the compression in column store actually work?


SAP HANA has a number of different compression techniques (run-length encoding, cluster encoding and dictionary
encoding) deep down in its inner workings. We will be looking theoretically at how dictionary encoding works. For each
column in the table, the database stores a Dictionary and an Attribute Vector. We'll take a look at both of these
objects in Figure 6 for our Stock Symbol field, based on our table sample of 9 records.

Figure 6 The original column for Stock Symbol and how it is now represented in column store format
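
Because Figure 6 is an image, the Python sketch below shows the same idea conceptually. The nine symbol values (other than ABB) are made up, arranged so that ABB receives Value ID 2 and sits at record IDs 1, 4 and 7, matching the walkthrough that follows; HANA's real structures are of course far more sophisticated.

# Conceptual sketch of dictionary encoding for one column, not HANA's actual code.
# The sample symbols are made up; 'ABB' appears at record IDs 1, 4 and 7.
stock_symbol = ["AA", "ABB", "AA", "ACE", "ABB", "AAP", "ABC", "ABB", "ACE"]  # record IDs 0..8

# Dictionary: sorted distinct values; the position of a value is its Value ID.
dictionary = sorted(set(stock_symbol))        # ['AA', 'AAP', 'ABB', 'ABC', 'ACE']

# Attribute vector: one small integer (Value ID) per record instead of the full string.
attribute_vector = [dictionary.index(v) for v in stock_symbol]

# Inverted index: Value ID -> list of record IDs that carry that value.
inverted_index = {}
for record_id, value_id in enumerate(attribute_vector):
    inverted_index.setdefault(value_id, []).append(record_id)

value_id_abb = dictionary.index("ABB")
print(value_id_abb, inverted_index[value_id_abb])   # 2 [1, 4, 7]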
The next question to answer is how we reconstruct the row when querying the data in the column store. Let's suppose we
want to run a query showing us the sum of the Stock Volume for the Stock Symbol ABB. First we would read the
dictionary of the Stock Symbol to get the Value ID for ABB (Value ID 2) and use the inverted index to access the record
IDs that use Value ID 2 (records 1, 4 & 7). See Figure 7 for a visual reference of this.
Figure 7 Reading of the Stock Symbol Dictionary and Index for ABB records
Then, with the index of the Stock Volume, we can map the records (1, 4 & 7) to the Stock Volume Value IDs (4, 5 & 6)
and then map these to the actual Stock Volume values in the dictionary (see Figure 8). Finally we apply the sum
function against the values to give us 8,804,800.

Figure 8 Reading of the Stock Volume Dictionary and Index
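
Continuing the sketch from above, the whole query only ever touches these two columns. The Stock Volume dictionary values below are hypothetical placeholders (the real figures are only visible in the original screenshots), chosen so that the three volumes behind Value IDs 4, 5 and 6 add up to the 8,804,800 quoted in the text.

# Sketch of answering: sum of Stock Volume where Stock Symbol = 'ABB'.
# Volume figures are hypothetical placeholders that sum to 8,804,800.

abb_records = [1, 4, 7]   # record IDs for 'ABB' from the previous sketch

# The Stock Volume column has its own (sorted) dictionary and attribute vector.
volume_dictionary = [1_000_000, 1_500_000, 2_000_000, 2_500_000,
                     2_704_800, 3_000_000, 3_100_000]
volume_attribute_vector = [0, 4, 1, 2, 5, 3, 0, 6, 2]   # records 1, 4, 7 hold Value IDs 4, 5, 6

# Map record IDs -> Value IDs -> actual volumes, then aggregate.
value_ids = [volume_attribute_vector[r] for r in abb_records]   # [4, 5, 6]
total = sum(volume_dictionary[v] for v in value_ids)
print(total)                                                    # 8804800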


The key thing about our query is that we only read the columns we needed, i.e. Stock Symbol and Stock Volume.
From a business insights and analytics perspective, here are some of the main benefits of being able to store our
data in column-store format.
Store more data, in more detail than before
We can achieve very efficient compression because a large proportion of fields have a low number of distinct values
that are used many times, which we can merge together. The compression reduces the storage space required, meaning
more main memory is available, while also helping to eliminate data redundancy.
We can store large, detailed and historic data sets that were too large for previous data marts and warehouses, from
both a storage and a read/write performance perspective.
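
A rough back-of-the-envelope calculation (the distinct-value count and string size are assumed; only the row count comes from the example above) illustrates why low-cardinality columns compress so well under dictionary encoding.

# Back-of-the-envelope sizing for dictionary encoding; illustrative numbers only.
import math

rows = 650_216            # record count from the example above
distinct_values = 3_000   # assumed number of distinct stock symbols
value_bytes = 10          # assumed average size of one symbol stored as a string

uncompressed = rows * value_bytes                                      # store every string in full
bits_per_row = math.ceil(math.log2(distinct_values))                   # bits needed for one Value ID
compressed = distinct_values * value_bytes + rows * bits_per_row / 8   # dictionary + attribute vector

print(f"uncompressed ~ {uncompressed / 1024:.0f} KB")   # ~6350 KB
print(f"compressed   ~ {compressed / 1024:.0f} KB")     # ~982 KB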
Quicker access to this larger and more detailed dataset
The compressed data can be moved more quickly from main memory into the CPUs in order to be processed. SAP HANA
already allows data to be moved to the CPU quickly because everything is served from the main memory layer and not
from disk. With compressed data, the transfer from main memory to the CPU can happen even more quickly, meaning
rapid response times for users.
Only read what we need
When querying the data we only read the columns we need. In practice, queries typically use only a small number of the
columns/fields within a table. Also, we would typically want to scan the whole column for each field (e.g. sum the Sales
Revenue for all Countries). In a row store we read all of the columns regardless of whether we actually need them.
The column store eliminates this unnecessary scanning.
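
As a tiny illustration (with made-up sample data), summing one column in a columnar layout never touches the other columns.

# Sketch: "sum Sales Revenue over all rows" touches a single list in a column layout.
column_store = {
    "COUNTRY":       ["DE", "US", "FR", "US"],      # made-up sample data
    "SALES_REVENUE": [100.0, 250.0, 80.0, 120.0],
    "MATERIAL":      ["M1", "M2", "M1", "M3"],
}

total_revenue = sum(column_store["SALES_REVENUE"])   # COUNTRY and MATERIAL are never read
print(total_revenue)                                 # 550.0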
Conclusion
Of course, we need to be aware that the write performance of a column store compared to a row store isn't as good, but
we can counter this with the current differentiation of read and write areas within the column store (and the subsequent
delta merge process). SAP is developing the capabilities of SAP HANA in order to improve write performance into the
column store.
Ultimately, from a business insights perspective we no longer need to compromise or make a decision on what type of
data (detailed or aggregated, or at which level of aggregation) we need to store in our analytical database. Neither do
we need to think about the most common granularity of data access requests in order to build our indexes optimally
(e.g. is most retail data accessed by region and material category, or by store and material?). With SAP HANA we can
store the complete dataset as it is, and the column store technology will ensure everything is optimised for storage and
retrieval (we do still need to think about the lifecycle of the data, however; we shouldn't load data that isn't going to be
needed). The column store is removing the boundaries of what's possible, ultimately meaning better insights leading to
better decisions.
