Microsoft Excel Is a Gateway Drug to Data Analysis

Please find the accompanying source data here: GitHub.

This article assumes a modern version of Microsoft Excel running on Microsoft Windows.

Part I: Creating a New Pivot Table

Microsoft Excel is a technical gateway drug. Once you understand the basics—rows, columns, cells, ranges, data input, data formatting, basic formulas—Excel over time reveals its true power. That revelation begins with plain old Excel Tables and Pivot Tables. A Pivot Table is essentially a microscope with a built-in heads-up display, designed specifically for data analysis. Once you have witnessed these in action, it’s incredibly difficult not to get hooked on data analysis for software QA.

Source Data

Before you can create and explore a plain old Excel Table and Pivot Table, you need source data to analyze. A plain old Excel Table, while technically not a prerequisite for creating a Pivot Table, is strongly recommended over a simple range of data. Why this matters will be demonstrated shortly.

To get started, either clone the GitHub repository indicated previously or create a new workbook. If you create a new workbook, you’ll need to create a range of data consisting of several columns and several rows. Make sure each column has a header. Keep it simple.

Excel Tables

Now we’re ready to unleash Excel’s power.

We first need to transform a range of data into an Excel Table. If you cloned the repository, please use the PlainOldExcelTable.xlsx workbook for this exploration. Select any cell within the range of data, then press

CTRL+T

to open the Create Table dialog.

Notice that because you selected a cell within the data range before opening the dialog, the correct range is already prepopulated. This is a small but powerful shortcut—and one I only learned recently myself. Make sure the My table has headers checkbox is checked, then select OK.

You have now successfully transformed a simple range of data into a powerful Excel Table. You should see that your data now has a color band applied, and that each column header includes a drop-down that enables filtering and sorting.

There are other ways to achieve this, but this is by far the easiest method. More importantly, your data range has just leveled up in power, as we’ll see next.

Coloring, filtering, and sorting data is baseline functionality—nothing to get excited about. However, what if we want to add a column—or add a row—to our Excel Table? This is where Excel’s power begins to reveal itself.

Scroll to the right so the last column in the Excel Table is visible, then add a new column to the right of the last existing column. Similarly, add a new row after the last existing row.

Excel immediately accepts the new column and row and automatically includes them in the Excel Table’s underlying data range. This behavior is important, because we no longer need to update the data range definition every time we add a row or a column. This is one of the key benefits of using Excel Tables rather than a simple data range, and it directly supports easier and more reliable Pivot Table use.

There is much more to explore with Excel Tables, and we’ll return to them in more depth in a future article. For now, however, Pivot Tables are screaming for attention.

Terminology

Before we get hands-on with Pivot Tables, we first need to understand what they really are and why they’re so useful. Most importantly, Pivot Tables enable us to easily ask—and answer—questions about our data. This question-asking is made possible through a simple yet powerful interface that sits on top of the underlying data. To better understand how Pivot Tables are used in data analysis, it’s useful to review a few terms and their definitions.
First, an observation is a data analysis term for a single row of data in an Excel Table or range. Each observation represents one recorded instance or event.

Second, a dimension is a column of data that describes or categorizes an observation. One or more dimensions provide context for each observation. Dimensions are most commonly text values or dates. Dimensions can also include numeric values if they are categorical—such as a Priority dimension where a numeric value of 1, 2, or 3 each represents a distinct priority.

A measure, on the other hand, is a calculated numeric value that summarizes data, such as a count, sum, or average. Measures are not typically stored directly in the source data; instead, Pivot Tables calculate measures on-the-fly by aggregating numeric values found in the source data.

A measurement source is therefore a numeric column in the source data that supplies the raw values from which measures are calculated.

Now that we understand enough terminology to appreciate Pivot Tables from a data analyst perspective, it’s time to get hands-on with their power.

Pivot Tables

With a cell inside the Excel Table selected, press

ALT+N+V+T

to open the PivotTable from table or range dialog.

When you created the Excel Table, it was named Table1 by default. Because a cell within the Excel Table is selected, Excel automatically prepopulates the Table/Range field with the name of the Table. You can now choose whether to place the Pivot Table in a new worksheet or an existing one. Select New Worksheet.

One important note: while you may notice the Add this data to the Data Model checkbox, leave this unchecked for now. We’ll return to the Data Model in a future article. Select OK to create the Pivot Table.

At this point, you should see a new worksheet containing an empty Pivot Table. Congratulations—you’ve successfully created your first Pivot Table in Microsoft Excel. You’ve now embarked on a journey toward becoming a more skillful analyst of QA data. In Part II of this article, we’ll configure Pivot Tables and begin to realize their full power.

Microsoft Excel Is a Gateway Drug to Data Analysis

Comments

Leave a Reply Cancel reply