Knowledge Base

Preparing Excel Files for Analysis

Product(s): Tableau Desktop, Tableau Public, Tableau Public Premium
Version(s): All
Last Modified Date: 22 Nov 2013

The first step in exploring your data with Tableau is examining how the data is presented. When an Excel data source (other than a cube) is already formatted as a cross-tabulation or is otherwise aggregated, options for viewing, aggregating, and grouping in Tableau are limited. Tableau cannot see underlying data points that have already been summarized into a higher level group or order. To take advantage of Tableau's full functionality, you need to normalize the data - that is, format it as raw data - before connecting to it from Tableau.

For example, consider the two workbooks shown below. The first one is a formatted report with repeated headers, empty rows, grand totals, and so on. When you open the workbook in Tableau, your data should instead be in a raw data table like the second one.

Incorrect - Formatted Report

Correct - Raw Data Table

Below are some tips for turning your formatted reports into a raw data table that is ready for analysis in Tableau.

Remove or exclude introductory and other unnecessary text

The first row in the entire file must contain your field headers (or column names). Many reports delivered as Excel workbooks have a block of introductory text at the top. This text may be titles, color legends, descriptions, and so on. Remove all this information before opening the data with Tableau.

Remove unnecessary information at the top of the file.

If you don't want to remove introductory text, you can alternatively create a Named Range that contains just the data. When opening Excel workbooks in Tableau, you can connect to an entire sheet or a named range within a sheet.

In Excel:

Step 1

Select the data.

Step 2

On the Formulas tab, in the Defined Names section, select Define Name.

Step 3

In the New Name dialog box, in the Name text box, Excel offers a name based on the content of the top left cell of the selected data range. Keep this name.

Step 4

In the Scope list, select Sheet1.

Step 5

When finished, click OK.

In Tableau:

Your named range is offered as a table when you connect to the Excel workbook.

Make sure each row contains only one piece of data

This example shows an Excel table that lists students and their grades in three subjects. In a crosstab layout, you have a column for each subject. In this table, each row contains three pieces of data: the student's grade in Math, grade in English, and grade in Science.

ID Gender School Math English Science
1 M West 90 80 70
2 F South 50 50 50
3 M Central 90 80 90
4 M Central 50 80 80
5 M West 100 90 100
6 F West 80 80 60
7 F South 50 80 100
8 F Central 80 50 100
9 M South 70 80 80

Replace the columns Math, English, and Science with a single column: Subject. Now the table contains three rows for each student, but each row contains only one grade.

ID Gender School Subject Score
1 M

West

Math 90
1 M West English 80
1 M West Science 70
2 F South Math 50
2 F South English 50
2 F South Science 50
3 M Central Math 90
3 M Central English 80
3 M Central Science 90
4 M Central Math 50
4 M Central English 80
4 M Central Science 80
5 M West Math 100
5 M West English 90
5 M West Science 100
6 F West Math 80
6 F West English 80
6 F West Science 60
7 F South Math 50
7 F South English 80
7 F South Science 100
8 F Central Math 80
8 F Central English 50
8 F Central Science 100
9 M South Math 70
9 M South English 80
9 M South Science 80

Limit headers to a single row

Not only should the first row contain your field headers, but also this should be the only row of headers. If you have headers that include some type of "categorical" breakdown above them, create a new column that contains the category.

In this example, East is removed as a categorical header, and a new column, Region, is added to the table.

Fill blank cells

If you have created a new column for categories, make sure to fill the blank cells so that the information is repeated for each row of data, not just the first occurrence. While this seems redundant, it is important that each record (or row) has data across all the columns.

Clean up aggregated and descriptive data

Make sure to remove the rows that do not contain raw data records. For example, an Excel report may have rows that contain descriptive information and Grand Totals rows. You can easily add totals in Tableau and do not need to calculate them in your data source.

Delete blank rows and duplicate headers

Remove any blank rows and rows that contain duplicate headers.

Add missing headers

If any column does not have a title, make sure to add one. Be descriptive when writing your column headers.

Reshape the entire structure of your Excel data using Tableau's Excel Reshaper Plug-in

Even if you have followed all the suggestions shown above, you may still have data in a format that is not ideal for Tableau, purely from an analytic perspective. For example, you may still have a column for each month of business data, which Tableau interprets as separate columns, making month-to-month comparisons difficult.

You can use the Tableau plug-in for Excel to reshape your data. You still need to follow any/all of the steps in this article. Download the plug-in from the Installing the Tableau Add-in for Reshaping Data in Excel knowledge base article.

Alternate Search Terms: Information Data Sources Excel open data reshaping
Did this article resolve the issue?

Attachments

Search Knowledge Base