
Improving Database Query Performance

Product(s): All
Version(s): All
Last Modified Date: 19 Mar 2015
Article Note: This article is no longer actively maintained by Tableau. We continue to make it available because the information is still
valuable, but some steps may vary due to product changes.

At the heart of creating well-performing workbooks and visualizations is the basic principle that
the visualization will never run faster than the underlying query. Therefore, to make a workbook
run as quickly as possible, you need to make sure the query runs optimally.

For the End User


Below are some tips and pointers to help workbook authors understand if data access is a
problem and some suggestions on what they can do to address it.
Know what you are asking
Often a problem with slow-running visualizations is that you have inadvertently created a query
that returns a large number of records from the underlying table(s), when a smaller number of
aggregated records would suffice. The time it takes the database management system (DBMS) to
calculate the results, then stream the records back to Tableau can be significant. You can check
this by looking in the lower-left corner of the Tableau Desktop workspace and looking at the
number of marks. If this number is very large, you are potentially pulling a large amount of data
from the database.

Ensure you are not including any unnecessary dimensions in your visualization - this will affect
the aggregations in the database and increase the size of the result set.
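As a rough illustration of why extra dimensions matter, the sketch below uses an in-memory SQLite table with made-up names (`sales`, `region`, `order_id`) rather than a real workbook data source. Adding a high-cardinality dimension to the GROUP BY multiplies the number of rows the database must compute and stream back:

```python
import sqlite3

# Hypothetical miniature of a sales table; all names are illustrative only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, product TEXT, order_id INTEGER, amount REAL)")
rows = [("East", "Chairs", i, 10.0) for i in range(500)] + \
       [("West", "Tables", i, 20.0) for i in range(500)]
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Aggregating by region alone returns just one row per region...
coarse = con.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()

# ...but adding a high-cardinality dimension (order_id) explodes the result
# set to one row per (region, order) pair -- the larger result set is what
# the visualization has to wait for.
fine = con.execute(
    "SELECT region, order_id, SUM(amount) FROM sales GROUP BY region, order_id"
).fetchall()

print(len(coarse), len(fine))  # 2 1000
```

The mark count in the lower-left corner of the workspace is, in effect, showing you which of these two situations you are in.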
Use native drivers
Tableau products can connect to a wide variety of data sources. Many of these
data sources are implemented as native connections, which means Tableau has implemented
techniques, capabilities, and optimizations specific to those data sources. Tableau's engineering and
testing activities for these connections ensure they are the most robust connections Tableau has to offer.
Tableau has additionally implemented the option to use the general-purpose ODBC standard for
accessing data sources beyond the list of named options available when creating a new
connection. As a publicly defined standard, many database vendors make ODBC drivers
available for connecting to their databases. Tableau provides the option to use these ODBC
drivers to connect to data.
There can be differences in how each database vendor interprets or implements capabilities of
the ODBC standard. In some cases Tableau will recommend or require you to create a data
extract to continue working with a particular driver. There will also be some ODBC drivers and
databases that Tableau is unable to connect to.
If there is a native driver for the data source you are querying, use it instead of an
ODBC connection; it will generally provide better performance.
Test with another tool
A good way to determine if a slow workbook is being caused by a slow query is to test the same
query in another tool, such as Microsoft Access or Microsoft Excel. To find the query being run,
look in My Documents\My Tableau Repository\Logs and find a file titled log.txt. Open this file
and scroll up from the bottom until you find a section like the following:

2011-08-04 13:46:16.161 (2198): DATA INTERPRETER: Executing primary query.
2011-08-04 13:46:16.171 (2204): <QUERY protocol='05d09100'>
2011-08-04 13:46:16.171 (2204): SELECT [Superstore APAC].[Customer Segment] AS [none:Customer Segment:nk],
2011-08-04 13:46:16.171 (2204):   [Superstore APAC].[Product Category] AS [none:Product Category:nk],
2011-08-04 13:46:16.171 (2204):   [Superstore APAC].[Product Sub-Category] AS [none:Product Sub-Category:nk],
2011-08-04 13:46:16.171 (2204):   SUM([Superstore APAC].[Sales]) AS [sum:Sales:qk]
2011-08-04 13:46:16.171 (2204): FROM [dbo].[Superstore APAC] [Superstore APAC]
2011-08-04 13:46:16.171 (2204): GROUP BY [Superstore APAC].[Customer Segment],
2011-08-04 13:46:16.171 (2204):   [Superstore APAC].[Product Category],
2011-08-04 13:46:16.171 (2204):   [Superstore APAC].[Product Sub-Category]
2011-08-04 13:46:16.238 (2204): </QUERY>
2011-08-04 13:46:16.238 (2204): [Time] Running the command took 0.0659 sec.
2011-08-04 13:46:16.240 (2204): [Time] Running the query took 0.0662 sec.
2011-08-04 13:46:16.240 (2204): [Time] Getting the records took 0.0007 sec.
2011-08-04 13:46:16.240 (2204): Building the tuples took 0.0001 sec.
2011-08-04 13:46:16.240 (2198): [Count] Query returned 68 records (Q10).

The section between the <QUERY> and </QUERY> tags is the query that was passed to the database.
You can copy this text and run it from a tool such as Access or Excel. If it takes a similar time
to return as it does in Tableau, then the problem is likely with the query, not with the tools.
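If you prefer to script this check, you can also time the query over a plain DB-API connection from Python. The sketch below substitutes an in-memory SQLite table with made-up names for the real database; against an actual server you would open the connection with the appropriate driver (for example, pyodbc for SQL Server) and paste in the query copied from log.txt:

```python
import sqlite3
import time

# Stand-in for the database behind the workbook; table and column names
# here are invented for the example.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE superstore (segment TEXT, category TEXT, sales REAL)")
con.executemany("INSERT INTO superstore VALUES (?, ?, ?)",
                [("Consumer", "Furniture", 100.0)] * 10000)

# In practice, paste the query copied from log.txt here.
query = """
    SELECT segment, category, SUM(sales)
    FROM superstore
    GROUP BY segment, category
"""

start = time.perf_counter()
rows = con.execute(query).fetchall()   # fetchall() forces the full result set
elapsed = time.perf_counter() - start

print(f"{len(rows)} rows in {elapsed:.4f} sec")
```

Timing the same query outside Tableau isolates the database's share of the wait from any rendering or network cost added on top.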

Use extracts
If you are seeing poor query performance when using a live connection to data (i.e., against an
Excel spreadsheet or a database server) one easy way to improve performance is to create a
Tableau data extract (.tde).

Extracts allow you to read the full set of data pointed to by your data connection and store it into
an optimized file structure specifically designed for the type of analytic queries that Tableau
creates. These extract files can include performance-oriented features such as pre-aggregated
data for hierarchies and pre-calculated calculated fields (reducing the amount of work required to
render and display the visualization).

Optimize query performance by assuming referential integrity


When you are working with multiple tables in a data source, and you have joined multiple tables,
you may be able to improve query performance by selecting the option to Assume Referential
Integrity from the Data menu. When this option is selected, Tableau will include a joined table
in the query only if it is specifically referenced by fields in the view. Referential integrity exists
when any value you specify from a column in one table is assured to exist as a value for a
column in any joined table. For details, see Assuming Referential Integrity in the Tableau
Desktop online help.

Using this setting is appropriate when you know that your data has referential integrity but your
database is not enforcing or cannot enforce referential integrity. If you are able to configure
referential integrity in your database that is a better option than using this setting because it can
improve performance both in the database and in Tableau. The Assume Referential
Integrity option in Tableau can only affect performance on Tableau's end.
If your data does not have referential integrity and you turn this setting on, query results may not
be reliable.
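The trade-off can be seen in a small sketch. Using an in-memory SQLite database with invented table names, the query below can safely drop (cull) the join to `products` only when every `product_id` in `orders` actually exists in `products`; an orphan row makes the two forms disagree:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (product_id INTEGER, amount REAL);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO products VALUES (1, 'Chair'), (2, 'Table');
    INSERT INTO orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);  -- 3 is an orphan
""")

# The view uses fields from orders only, so assuming referential integrity
# would let the join be culled entirely:
with_join = con.execute("""
    SELECT SUM(o.amount) FROM orders o
    JOIN products p ON o.product_id = p.product_id
""").fetchone()[0]

without_join = con.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# The orphan row (product_id 3) is filtered out by the inner join, so the
# answers differ -- exactly why the option is unsafe when referential
# integrity does not actually hold in the data.
print(with_join, without_join)  # 30.0 60.0
```

When referential integrity genuinely holds, both queries return the same total, and the join-free form gives the database strictly less work to do.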

For the DBA


If the above points are not sufficient to address your performance problems, it could be that the
problems are at a deeper level than can be addressed by an end user. Tableau suggests that you
engage your database administrator (DBA) and have them look at the following section for
suggestions.
Seriously, know what you are asking
As pointed out earlier, knowing what you are asking the database to do is an important part of
performance tuning. By running an audit or trace on the database, you can isolate the query that
Tableau has passed to the query engine, and can check to see if it is what you expect. For
example, does it have the expected GROUP BY and filter clauses; is it doing aggregations in the
query as opposed to returning raw field values; etc.
As an example, in SQL Server start the Profiler tool to trace the queries running (filter by
application name ="Tableau 6.1" or by your user name if the server is busy). This will allow you
to see what the query was and how long it took to return.

Tune your indexes


Once you know the query being run, you can dump it into the execution plan estimator to see in
more detail how the DBMS will process the query - this does not execute the query, but returns
the estimated execution plan based on the statistics the server has collected.

Based on the information returned, you can determine whether additional indexes need to be
created (i.e., the kind of query being asked by end users has changed and the current indexing
model no longer reflects this accurately).
This is a deep topic but some basic principles are:

Make certain you have indexes on all columns that are part of table joins

Make certain you have indexes on any column used in a filter

Explicitly define primary keys

Explicitly define foreign key relationships

For large data sets, use table partitioning

Define columns as NOT NULL where possible
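The effect of the second principle can be observed directly with an execution plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN (the exact wording of plan output varies between database products and versions) and made-up table names; the same before-and-after comparison applies to any DBMS's plan estimator:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(i % 100, 1.0) for i in range(10000)])

q = "SELECT SUM(amount) FROM orders WHERE customer_id = 42"

# Without an index on the filtered column, the planner can only scan
# the whole table.
before = con.execute("EXPLAIN QUERY PLAN " + q).fetchall()

con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index in place, the plan switches to an index search.
after = con.execute("EXPLAIN QUERY PLAN " + q).fetchall()

print(before[0][-1])  # e.g. "SCAN orders"
print(after[0][-1])   # e.g. "SEARCH orders USING INDEX idx_orders_customer (customer_id=?)"
```

In SQL Server, the equivalent check is the estimated execution plan mentioned above; the goal in either case is to confirm that filters and joins resolve to index seeks rather than full scans.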

Use statistics
Database engines collect statistical information about indexes and column data stored in the
database. These statistics are used by the query optimizer to choose the most efficient plan for
retrieving or updating data. Good statistics allow the optimizer to accurately assess the cost of
different query plans and choose a high-quality plan. For example, a common misunderstanding
is that if you have indexes, the database will use those indexes to retrieve the records in your
query. Not necessarily. If, say, you create an index on a City column where the vast majority of
the values (more than 90%) are 'Vancouver', a DBMS that knows these statistics will most likely
opt for a table scan instead of using the index.
Ensuring that database statistics are being collected and used will help the database engine
generate better queries, resulting in faster query performance.
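Each DBMS has its own command for this (UPDATE STATISTICS in SQL Server, ANALYZE in several others). As a small sketch, SQLite's ANALYZE gathers per-index statistics into a system table the optimizer consults when costing plans; the table name and value format below are SQLite-specific:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE city_data (city TEXT, value REAL)")
# A highly skewed column: an index on city is barely selective.
rows = [("Vancouver", 1.0)] * 9500 + [("Seattle", 1.0)] * 500
con.executemany("INSERT INTO city_data VALUES (?, ?)", rows)
con.execute("CREATE INDEX idx_city ON city_data (city)")

# ANALYZE writes per-index statistics into sqlite_stat1; the optimizer
# reads them when choosing between an index lookup and a table scan.
con.execute("ANALYZE")
stats = con.execute("SELECT idx, stat FROM sqlite_stat1").fetchall()
print(stats)  # e.g. [('idx_city', '10000 5000')]
```

Here the recorded statistics say the table has 10,000 rows averaging 5,000 rows per distinct city value, which tells the optimizer the index narrows very little, so a scan may win.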
Optimize the data model
Finally, the data model being queried can have a significant impact on the performance of
queries. Ensuring that the structure of the data is aligned to the kinds of analysis the end users
will do is critical for good query performance. If you find yourself designing excessive joins, it
could be an indication that the data model is not suited to the task at hand.
For example, if most of your queries need only aggregated data rather than base-level detail
records, it can be beneficial to create summary tables.
Again, this is a big topic beyond the scope of this article, but DBMS vendors have many
whitepapers that describe their recommended best practices for data warehouse and data mart
design.
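The summary-table idea mentioned above can be sketched in a few lines. Using an in-memory SQLite database with invented table names, a one-time pre-aggregation step leaves dashboards querying a handful of summary rows instead of the full detail table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE order_lines (region TEXT, order_date TEXT, amount REAL)")
con.executemany("INSERT INTO order_lines VALUES (?, ?, ?)",
                [("East", "2015-01", 5.0)] * 50000 +
                [("West", "2015-02", 7.0)] * 50000)

# Pre-aggregate once into a summary table shaped for the dashboards...
con.execute("""
    CREATE TABLE sales_summary AS
    SELECT region, order_date, SUM(amount) AS total, COUNT(*) AS n
    FROM order_lines
    GROUP BY region, order_date
""")

# ...so routine queries touch 2 summary rows instead of 100,000 detail rows.
summary = con.execute(
    "SELECT region, total FROM sales_summary ORDER BY region"
).fetchall()
print(summary)  # [('East', 250000.0), ('West', 350000.0)]
```

In a production warehouse the summary table would be refreshed on a schedule (or maintained as a materialized view where the DBMS supports one) rather than rebuilt ad hoc.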
