Product(s): All
Version(s): All
Last Modified Date: 19 Mar 2015
Article Note: This article is no longer actively maintained by Tableau. We continue to make it available because the information is still
valuable, but some steps may vary due to product changes.
At the heart of creating well-performing workbooks and visualizations is a basic principle:
the visualization will never run faster than its underlying query. Therefore, to make a workbook
run as quickly as possible, make sure the query runs optimally.
Do not include unnecessary dimensions in your visualization; each extra dimension changes the
aggregations the database must compute and increases the size of the result set.
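The effect is easy to see with a small, hypothetical dataset. This sketch uses Python's built-in sqlite3 module as a stand-in for whatever database backs the workbook; the table and column names are invented for illustration:

```python
import sqlite3

# In-memory database with a hypothetical sales table (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(r, p, 10.0) for r in ("East", "West") for p in ("A", "B", "C")],
)

# Aggregating by one dimension returns one row per region.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()

# Adding a second dimension multiplies the size of the result set.
by_region_product = conn.execute(
    "SELECT region, product, SUM(amount) FROM sales GROUP BY region, product"
).fetchall()

print(len(by_region))          # 2 rows
print(len(by_region_product))  # 6 rows
```

Every dimension added to the view becomes another GROUP BY column in the generated query, so the result set grows multiplicatively with the cardinality of each dimension.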
Use native drivers
Tableau products can connect to a wide variety of data sources. Many of these data sources are
implemented as native connections, which means Tableau has built techniques, capabilities, and
optimizations specific to those sources. Tableau's engineering and testing activities for these
connections ensure they are the most robust connections Tableau has to offer.
Tableau has additionally implemented the option to use the general-purpose ODBC standard for
accessing data sources beyond the list of named options available when creating a new
connection. As a publicly defined standard, many database vendors make ODBC drivers
available for connecting to their databases. Tableau provides the option to use these ODBC
drivers to connect to data.
There can be differences in how each database vendor interprets or implements capabilities of
the ODBC standard. In some cases Tableau will recommend or require you to create a data
extract to continue working with a particular driver. There will also be some ODBC drivers and
databases that Tableau is unable to connect to.
If a native driver exists for the data source you are querying, use it instead of the ODBC
connection; it will generally provide better performance.
Test with another tool
A good way to determine whether a slow workbook is caused by a slow query is to test the same
query in another tool, such as Microsoft Access or Microsoft Excel. To find the query being run,
look in My Documents\My Tableau Repository\Logs for a file named log.txt. Open this file and
scroll up from the bottom until you find a section enclosed in begin-query and end-query tags;
the text between those tags is the query that was passed to the database.
Copy this text and run it from a tool like Access or Excel. If it takes a similar time to return
as it does in Tableau, then the problem is likely the query, not the tools.
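Once you have copied the query out of log.txt, any scripting environment can time it. This sketch uses Python's sqlite3 module with an invented query and schema as stand-ins; in practice you would run the same query against your real database through its own driver:

```python
import sqlite3
import time

# Hypothetical stand-in for a query copied from Tableau's log.txt;
# your actual query and connection details will differ.
query = "SELECT region, SUM(amount) FROM sales GROUP BY region"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 1.0), ("West", 2.0)] * 1000)

# Time the raw query outside Tableau to isolate database cost
# from rendering cost.
start = time.perf_counter()
rows = conn.execute(query).fetchall()
elapsed = time.perf_counter() - start

print(f"{len(rows)} rows in {elapsed:.4f}s")
```

If the standalone timing is close to what you see in Tableau, the bottleneck is the query or the database, not the visualization layer.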
Use extracts
If you are seeing poor query performance over a live connection to data (e.g., an Excel
spreadsheet or a database server), one easy way to improve performance is to create a
Tableau data extract (.tde).
Extracts read the full set of data pointed to by your data connection and store it in an
optimized file structure specifically designed for the type of analytic queries that Tableau
creates. These extract files can include performance-oriented features such as pre-aggregated
data for hierarchies and pre-calculated values for calculated fields, reducing the work required
to render and display the visualization.
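The pre-calculation idea can be emulated outside Tableau. This hedged sketch uses Python's sqlite3 module to materialize a derived value once, the way an extract can store a pre-calculated field; the schema and names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (price REAL, qty INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(2.0, 3), (5.0, 4)])

# Emulate an extract's pre-calculated field: compute price * qty once
# at "extract time" instead of recomputing it in every query.
conn.execute("""CREATE TABLE sales_extract AS
                SELECT price, qty, price * qty AS revenue FROM sales""")

total = conn.execute("SELECT SUM(revenue) FROM sales_extract").fetchone()
print(total[0])  # 26.0
```

Queries against the materialized column only sum stored values; the per-row multiplication was paid once up front.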
Use Assume Referential Integrity
Using the Assume Referential Integrity setting is appropriate when you know your data has
referential integrity but your database is not enforcing, or cannot enforce, it. If you are able
to configure referential integrity in your database, that is a better option than using this
setting, because it can improve performance both in the database and in Tableau; the Assume
Referential Integrity option in Tableau can only affect performance on Tableau's end.
If your data does not have referential integrity and you turn this setting on, query results may
not be reliable.
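The risk is easy to demonstrate. In this sketch (Python's sqlite3 module, invented tables), one order references a customer that does not exist; an inner join, the kind of shortcut an optimizer may take when told to assume referential integrity, silently drops that row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 100.0), (2, 50.0), (3, 25.0);  -- id 3 is an orphan
""")

# True total over all orders.
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# An inner join -- safe only if every order really has a matching
# customer -- silently drops the orphaned order.
joined = conn.execute("""
    SELECT SUM(o.amount) FROM orders o
    JOIN customers c ON o.customer_id = c.id
""").fetchone()[0]

print(total)   # 175.0
print(joined)  # 150.0
```

When referential integrity actually holds, the two totals are identical and the simpler join is pure performance gain; when it does not, aggregates quietly lose rows.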
Optimize indexes
Based on the information your database's query analysis tools return, you can determine whether
additional indexes need to be created (e.g., the kind of query being asked by end users has
changed and the current indexing model no longer reflects it accurately).
This is a deep topic, but one basic principle is:
Make certain you have indexes on all columns that are part of table joins
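This sketch shows the principle with Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN output; the table names are hypothetical, and most DBMSs expose a similar plan-inspection command:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    CREATE TABLE customers (id INTEGER, name TEXT);
""")

def plan(sql):
    # Concatenate the 'detail' column of SQLite's plan output.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

join_sql = """
    SELECT c.name, SUM(o.amount)
    FROM orders o JOIN customers c ON o.customer_id = c.id
    GROUP BY c.name
"""

before = plan(join_sql)  # no index: the join column forces a scan

# Index the join column, then re-check the plan.
conn.execute("CREATE INDEX idx_orders_cust ON orders(customer_id)")
after = plan(join_sql)   # the plan now searches via idx_orders_cust

print(before)
print(after)
```

The second plan resolves the join through the index instead of scanning the whole table for every outer row, which is the difference between linear and quadratic work on large tables.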
Use statistics
Database engines collect statistical information about indexes and column data stored in the
database. The query optimizer uses these statistics to choose the most efficient plan for
retrieving or updating data. Good statistics allow the optimizer to accurately assess the cost of
different query plans and choose a high-quality plan. For example, a common misunderstanding
is that if you have indexes, the database will use them to retrieve the records in your query. Not
necessarily: if you create an index on a City column where more than 90% of the values are
'Vancouver', a DBMS that knows these statistics will most likely opt for a table scan instead of
using the index.
Ensuring that database statistics are being collected and used will help the database engine
generate better queries, resulting in faster query performance.
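As an illustration, SQLite collects these statistics with the ANALYZE command and stores them in the sqlite_stat1 table. This sketch (Python's sqlite3 module, invented data skewed toward one city value) shows what the optimizer gets to work with:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_index_demo (city TEXT)")
# Skewed data: most rows share one value, so an index on this
# column is rarely selective.
conn.executemany("INSERT INTO city_index_demo VALUES (?)",
                 [("Vancouver",)] * 95 + [("Seattle",)] * 5)
conn.execute("CREATE INDEX idx_city ON city_index_demo(city)")

# ANALYZE gathers the statistics the query planner uses to cost plans.
conn.execute("ANALYZE")
stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 WHERE idx IS NOT NULL"
).fetchall()
print(stats)  # e.g. [('city_index_demo', 'idx_city', '100 50')]
```

The stat string records the row count and the average number of rows per distinct value; with only two distinct cities in 100 rows, the planner can see that the index is poorly selective and cost a table scan accordingly.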
Optimize the data model
Finally, the data model being queried can have a significant impact on the performance of
queries. Ensuring that the structure of the data is aligned to the kinds of analysis the end users
will do is critical for good query performance. If you find you need to create excessive joins,
it could be an indication that the data model is not suited to the task at hand.
For example, if most of your queries only need aggregated data rather than base-level detail
records, it can be beneficial to create summary tables.
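A minimal sketch of the summary-table idea, using Python's sqlite3 module and an invented sales table: detail rows are aggregated once into a summary table that aggregate-only dashboards can query instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("East", d, 1.0) for d in range(1000)] +
                 [("West", d, 2.0) for d in range(1000)])

# Materialize a summary table so aggregate-only queries never have
# to touch the 2,000 detail rows.
conn.execute("""CREATE TABLE sales_summary AS
                SELECT region, SUM(amount) AS total
                FROM sales GROUP BY region""")

rows = conn.execute(
    "SELECT region, total FROM sales_summary ORDER BY region"
).fetchall()
print(rows)  # [('East', 1000.0), ('West', 2000.0)]
```

The trade-off is freshness: summary tables must be refreshed when the detail data changes, which is why they suit reporting workloads better than transactional ones.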
Again, this is a big topic beyond the scope of this article, but DBMS vendors have many
whitepapers that describe their recommended best practices for data warehouse and data mart
design.