Photo by Markus Spiske on Unsplash

What is a Spreadsheet (and When Should You Use One)?

Spreadsheets organize data in a two-dimensional, tabular format of columns and rows. They are best suited for small datasets where you are working with numerical data and need to perform calculations. They also provide an intuitive “first-step” sandbox environment for exploring small data and establishing a plan for more rigorous analysis and larger projects. Unfortunately, they are frequently misused in data-driven workplaces and become bloated, unwieldy bundles of data duct-tape.

Before moving forward, you should review this post by Computer Hope if you are brand new to spreadsheets. It provides an excellent introduction to the architecture of spreadsheets and the terms you’ll encounter.

Advertisements

The humble spreadsheet…

Any competent data professionals should have a core mastery of “the basics,” and there are few things more central to basic data work than the humble spreadsheet. For those of you already scoffing at this statement, sitting behind your bespoke applications getting drunk on BigQuery while your multi-thread-optimized data streams flow straight into your brain, I have some insight for you: you’re in a bubble.

The people deciding how your work matters in the “big picture” generally don’t have a firm grasp of data science. These people are your bosses, your clients, or anyone doing the work your Board prioritizes most. Whoever they are, you need to be able to have meaningful conversations with them; conversations that are probably going to start with them coming to you and saying, “So, I have a spreadsheet…”

Those two-dimensional grids of rectangular cells conveniently organized are used all over the workplace. You’ll find them functioning as mailing lists, calendars, inventories, Kanban boards, customer feedback response repositories, grade books, sales ledgers, a means of opening CSV files…the list is extensive.

Despite that ubiquity, many users struggle to articulate what exactly spreadsheets are for and how to best use them. This is especially the case for people who don’t do data work full-time. It also doesn’t help that there is also so much variability in the path-to-entry for spreadsheets. They are often first encountered in informal settings with minimal guidance, and few people use them to their full capacity.

So, what is a spreadsheet anyway…

Spreadsheets are simple, tabular data constructs primarily designed to handle numbers and calculations. This was the original use case for spreadsheet apps, and the fact they have packaged that kind of computing into an accessible WYSIWYG interface is what has made them indispensable statistical workhorses since the days of VisiCalc.

VisiCalc (released in 1979) was the first widely-used spreadsheet app for personal computers and the progenitor of contemporary apps like Excel and Google Sheets.
spreadsheet example
Spreadsheets organize data into a tabular structure and are well-suited for analyzing small, numerical datasets (and performing dark magic through scripting and extensions).

The scope of calculations you can perform within spreadsheet apps is expansive. You can do everything simple arithmetic to concatenating strings to running ANOVA’s and performing Monte Carlo simulations.

Most modern spreadsheet apps also have a scripting component baked into them and allow you to automate routine processes. For Excel, that layer is found within the Macros capability (with scripts written in VBA). Google Sheets uses its Scripts module (with scripts written in Google Apps Script). Future posts on this site will go into more detail, but for now, know that these powerful capabilities are there. I use those capabilities all the time when my work involves spreadsheets. If you haven’t yet taken the plunge into the world of spreadsheet scripting, then you should! It will dramatically improve your spreadsheet-related productivity. It’s also a great first step in your path to becoming a capable data programmer!

Here’s an example of a Google Sheets script that creates a chart using data from a spreadsheet range.

function newChart() {
  // Generate a chart representing the data in the range of A1:B15.
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheets()[0];

  var chart = sheet.newChart()
     .setChartType(Charts.ChartType.BAR)
     .addRange(sheet.getRange('A1:B15'))
     .setPosition(5, 5, 0, 0)
     .build();

  sheet.insertChart(chart);
}

Some Bad Use-Cases for Spreadsheets

Modern apps have made spreadsheets extremely versatile. You can bend them to your will in many ways, but it’s easy to misuse them. Regardless of how shiny you make them, a spreadsheet is not the best solution for:

A spreadsheet is not a database

Spreadsheets and databases are apples and oranges. Databases are storage architectures intended to connect and process large amounts of data efficiently for use by software. They can quickly manipulate staggeringly large datasets and intrinsically “connect the dots” between data points.

Let’s say you want to keep some records of your customers. You’re going to record their contact details, the items they purchase, the times they make their purchases, the locations they travel to, etc. You also want to spit that all out in a useable format for analysis later. To do this, you’ll need to record, process, connect, and analyze a lot of disjoint information. That is why to data gods gave humanity databases. You could Frankenstein that kind of system together in a spreadsheet app, but that’s not a great solution.

Yes, Google Sheets has indeed opened up some exciting possibilities for treating spreadsheets like database tables. Its query() function and javascript-like scripting syntax allow it to dance a lot like a database, but just because something that looks like a duck and quacks like a duck doesn’t mean it’s a duck.

A spreadsheet is not an enterprise reporting solution. 

Sound and stable enterprise reporting should be portable, scalable, reliable, and consistent. They should provide high-level overviews of information in a common language from a single source of truth. Spreadsheets are technically capable of providing that kind of experience for end-users. They are also capable of retrieving data from external sources dynamically (e.g. enterprise data stores, external APIs, etc.). It’s therefore tempting to leverage their capabilities to create real-time dashboards with that data for organization-wide reporting. You should avoid that temptation.

For one, spreadsheet apps suffer from some very real performance limitations when dynamically processing data. They can quickly bloat and consume tremendous amounts of memory. If you’re using the 32-bit version of Excel, for example, that bloat can easily exceed Excel’s 2GB RAM limit. What started as a crisp and clean report is now a stuttering shadow of your analysis.

They also suffer from issues with portability, scalability, and long-term stability. It’s common for a spreadsheet dashboard to run just fine on one computer but not run on another due to problems with script permissions, extensions, and object libraries. They also inevitably start to grow over time. The great thing about liberating data within organizations using spreadsheets is that it immediately catalyzes discussion. Those discussions lead to new questions around the data within those spreadsheets which, in turn, leads to spreadsheet adjustments. If you’ve been through this experience, you know where this is going. The spreadsheets get tweaked, expanded, and twisted until they deteriorate.

A spreadsheet is not an accounting and finance platform

Yes, you can absolutely use spreadsheets to augment your accounting and finance work. Do not, however, use spreadsheet apps as your sole accounting and finance platform. Their issues with scalability and portability aside, they are also notoriously error-prone.

Some very large and established companies have learned that lesson the hard way. A faulty formula underestimated $9 million in losses at Eastman Kodak and Fannie Mae‘s spreadsheeting created more than $1 billion in discrepancies.

Why did such errors occur, you ask? Spreadsheets have too many moving parts and it’s hard to track every error propagation point over time reliably. All it takes is a single stray character or erroneous formula for their accuracy to implode. Their openness and “recode on the fly” nature is what makes them difficult to properly maintain for accuracy. Do yourself a favor and use some stable, formal accounting and finance software instead.

The solution doesn’t even have to be expensive and proprietary. There are several peer-reviewed open source solutions out there with exponentially less error potential than Excel or Google Sheets.

Advertisements

Some Good Use-Cases for Spreadsheets

Given those rules, there are several use cases where spreadsheets really excel (sorry, I couldn’t resist):

Straightforward statistical analysis of small datasets

Say you wanted to find the median gift amount in a group of 1,000 donations. A spreadsheet app can easily accomplish that. You could extract a CSV file of the contributions from your donation management software and upload it into a spreadsheet. From there, you could use a formula like median() to calculate the median amount on that range of gift amounts and display the result in a specific cell.

You could also envision a scenario where you want to visualize all those donations as a histogram. Again, a spreadsheet app makes this a cakewalk. You can highlight columns of data and visualize them as charts in just a few clicks.

Apps like Google Sheets make it easy to perform simple statistical analysis on small datasets.

Spreadsheets are well-suited for these kinds of tasks. You should, however, be sure to pay attention to your memory resources and how many formulas you’re using at once. Formulas can quickly stack-up and negatively impact performance.

Exploring small files stored in tabular formats like CSV

It can be challenging to visualize CSV files in their native form. They’re primarily a data exchange medium and not designed for human consumption. Yes, you can open them in a text editor like Notepad, but that method won’t provide a very clear or actionable perspective.

DonorID,DonationAmount,DonorYearsAsDonor,DonorType,RecurringDonation
Donor 00001,$25.24,9,Public,FALSE
Donor 00002,$28.24,9,Member,FALSE
Donor 00003,$25.73,9,Public,FALSE
Donor 00004,$32.25,2,Member,FALSE
Donor 00005,$32.72,1,Member,FALSE
Donor 00006,$31.66,6,Member,FALSE
Donor 00007,$28.55,2,Public,FALSE
Donor 00008,$30.79,9,Public,FALSE
Donor 00009,$16.87,9,Member,FALSE
Donor 00010,$29.83,1,Member,FALSE
Donor 00011,$31.66,10,Member,FALSE
Donor 00012,$28.42,10,Public,FALSE
Donor 00013,$27.56,6,Public,FALSE
Donor 00014,$20.93,5,Member,FALSE
Donor 00015,$21.96,8,Member,FALSE
Donor 00016,$30.72,4,Member,FALSE
Donor 00017,$27.18,1,Public,FALSE
Donor 00018,$22.36,8,Public,FALSE
...

Fortunately, CSV files (and other tabular data flat-file formats) happily convert into spreadsheets. Once imported into a spreadsheet app, you get a readable, accessible worksheet ready for exploratory analysis and other operations.

Caveat: If you’re using Excel, you should limit this technique to CSV files that have less than 1 million rows. For Google Sheets, you want to cut that down to less than 40,000 rows if you’re importing all at once.

Honestly, a few hundred thousand data records is a good limit for exploring CSV data in spreadsheets. The higher the row count, the greater the memory strain. This is especially true if you need to perform calculations on that data.

As a repository for small amounts of raw or summarized data

Spreadsheets work well with simple datasets that don’t warrant a full database solution. These are usually situations where any analysis is ad-hoc and the data doesn’t need to be stored long-term. For example, let’s say a classroom teacher wants to track assignment grades. A spreadsheet could work well here if the following conditions are met:

  • The data doesn’t need to follow the student to their next class (i.e. a cumulative grade record) or be stored long-term.
  • The data doesn’t need to be linked to several disjoint data points or referenced by other applications.
  • The spreadsheet is organized in a way that facilitates any intended analysis.

The key here is to keep things simple. Limit your formulas, make sure your data is clean and consistent, and don’t try to do too much all at once. Working within a spreadsheet environment means that you have both data recording tools (tabular worksheets) and data summary tools (charts, pivot tables) all in one place, but with great power comes great responsibility. It’s easy to overload a workbook and transform a clean, performant resource into a sluggish, bloated mess.

A spreadsheet can easily become bloated and unwieldy.
Don’t let your spreadsheets become a bloated mess.

In conclusion…

Spreadsheets are convenient and powerful instruments, but you must keep a few things in mind:

Spreadsheets work well for statistical analysis of small datasets. We’re talking a couple hundred thousand rows or less. In the right hands, apps like Google Sheets and Excel can work all manner of statistical dark magic. Know their limitations and always check (and double-check) your work.

Spreadsheets have a lot of moving parts which means a lot of places for things to go wrong. Keep your spreadsheets clean and simple so that you can spot problems early before they cause problems.

Spreadsheets can work quite well as rudimentary reporting platforms and data repositories for your organization. Their stability tends to degrade over time, however, so reserve them for short-term use. Also, document everything and make frequent backups because things will go wrong eventually. You’ll thank yourself later for leaving good breadcrumbs.

Advertisements Disclosure

I will always make it clear if I am writing to endorse or recommend a specific product(s) or service(s). I hate it when I visit a site only to find out that the article is just one big ad.

Various ads may be displayed on this post to help defray the operating cost of this blog. I may make a small commission on any purchases you make by clicking on those advertisements. Thank you for supporting my work bringing you accurate and actionable information on data literacy, analytics, and engineering.

Advertisements

One Response

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.