Knowledge Base

Preparing Excel Files for Analysis

Product(s): Tableau Desktop, Tableau Public, Tableau Public Premium
Version(s): All
Last Modified Date: 05 Dec 2014

Article Note: This article is no longer actively maintained by Tableau. We continue to make it available because the information is still valuable, but some steps may vary due to product changes.


The first step in exploring your data with Tableau is examining how the data is presented. When an Excel data source (other than a cube) is already formatted as a cross-tabulation or is otherwise aggregated, options for viewing, aggregating, and grouping in Tableau are limited. Tableau cannot see underlying data points that have already been summarized into a higher level group or order. To take advantage of Tableau's full functionality, you need to normalize the data - that is, format it as raw data - before connecting to it from Tableau.

For example, consider the two workbooks shown below. The first one is a formatted report with repeated headers, empty rows, grand totals, and so on. When you open the workbook in Tableau, your data should instead be in a raw data table like the second one.

Incorrect - Formatted Report

Correct - Raw Data Table

Below are some tips for turning your formatted reports into a raw data table that is ready for analysis in Tableau.

Remove or exclude introductory and other unnecessary text

The first row in the entire file must contain your field headers (or column names). Many reports delivered as Excel workbooks have a block of introductory text at the top. This text may be titles, color legends, descriptions, and so on. Remove all this information before opening the data with Tableau.

Remove unnecessary information at the top of the file.

If you don't want to remove introductory text, you can alternatively create a Named Range that contains just the data. When opening Excel workbooks in Tableau, you can connect to an entire sheet or a named range within a sheet.

In Excel:

Step 1

Select the data.

Step 2

On the Formulas tab, in the Defined Names section, select Define Name.

Step 3

In the New Name dialog box, in the Name text box, Excel offers a name based on the content of the top left cell of the selected data range. Keep this name.

Step 4

In the Scope list, select Sheet1.

Step 5

When finished, click OK.

In Tableau:

Your named range is offered as a table when you connect to the Excel workbook.

Make sure each row contains only one piece of data

This example shows an Excel table that lists students and their grades in three subjects. In a crosstab layout, you have a column for each subject. In this table, each row contains three pieces of data: the student's grade in Math, grade in English, and grade in Science.

IDGenderSchoolMathEnglishScience
1MWest908070
2FSouth505050
3MCentral908090
4MCentral508080
5MWest10090100
6FWest808060
7FSouth5080100
8FCentral8050100
9MSouth708080

Replace the columns Math, English, and Science with a single column: Subject. Now the table contains three rows for each student, but each row contains only one grade.

IDGenderSchoolSubjectScore
1M

West

Math90
1MWestEnglish80
1MWestScience70
2FSouthMath50
2FSouthEnglish50
2FSouthScience50
3MCentralMath90
3MCentralEnglish80
3MCentralScience90
4MCentralMath50
4MCentralEnglish80
4MCentralScience80
5MWestMath100
5MWestEnglish90
5MWestScience100
6FWestMath80
6FWestEnglish80
6FWestScience60
7FSouthMath50
7FSouthEnglish80
7FSouthScience100
8FCentralMath80
8FCentralEnglish50
8FCentralScience100
9MSouthMath70
9MSouthEnglish80
9MSouthScience80

Limit headers to a single row

Not only should the first row contain your field headers, but also this should be the only row of headers. If you have headers that include some type of "categorical" breakdown above them, create a new column that contains the category.

In this example, East is removed as a categorical header, and a new column, Region, is added to the table.

Fill blank cells

If you have created a new column for categories, make sure to fill the blank cells so that the information is repeated for each row of data, not just the first occurrence. While this seems redundant, it is important that each record (or row) has data across all the columns.

Clean up aggregated and descriptive data

Make sure to remove the rows that do not contain raw data records. For example, an Excel report may have rows that contain descriptive information and Grand Totals rows. You can easily add totals in Tableau and do not need to calculate them in your data source.

Delete blank rows and duplicate headers

Remove any blank rows and rows that contain duplicate headers.

Add missing headers

If any column does not have a title, make sure to add one. Be descriptive when writing your column headers.

Reshape the entire structure of your Excel data using Tableau's Excel Reshaper Plug-in

Even if you have followed all the suggestions shown above, you may still have data in a format that is not ideal for Tableau, purely from an analytic perspective. For example, you may still have a column for each month of business data, which Tableau interprets as separate columns, making month-to-month comparisons difficult.

You can use the Tableau plug-in for Excel to reshape your data. You still need to follow any/all of the steps in this article. Download the plug-in from the Installing the Tableau Add-in for Reshaping Data in Excel knowledge base article.

Alternate Search Terms:Excel, data reshaping
Did this article resolve the issue?

Attachments

Search Knowledge Base