This post is not a rant against spreadsheets. It’s also not a defense of spreadsheets and duct-taping them together. Spreadsheets certainly have useful roles to play, but they can quickly run wild if not attended to properly. You need to fix your “too many spreadsheets” problem before they ultimately smother your sanity.

Why Too Many Spreadsheets is a Serious Problem

They may be less prevalent in large companies with robust enterprise solutions, but unwieldy Spreadsheet Forests are pervasive among small and medium organizations. 

They usually start as a singular, benign tool. Someone needs to track a given set of data points, so they grab a trusty spreadsheet. Their peers catch wind of the spreadsheet, and they all want to see it.

Soon progressively more functions and data get piled onto the spreadsheet in the unbridled data excitement. Spin-off sheets come out of the woodwork. Before you know it, you have a small network of disjoint workbooks all trying to paint a big, messy picture. It quickly becomes clear that someone needs to get control of these spreadsheets.

At this point, some “Excel Macro Ninja” typically rolls-up and figures out a way to integrate pieces of the mess. Unfortunately, these integrations are usually spotty at best. Worse, they often rely on having both the original developer and the original machinery around to make it work.

That is a Spreadsheet Forest and I wish I could say that wasn’t a typical case. Unfortunately, that’s a reasonably accurate representation of the majority of spreadsheet forests I encounter. Your forests might not be that bad. Regardless, they are always harmful.

Someone has to dedicate themselves to getting control of those spreadsheets and that’s generally not an effective use of their time. It’s much more cost-effective in terms of money and sanity to develop a more appropriate solution.

How to Get Control When You Have Too Many Spreadsheets

I don’t care how great you are at VBA or Apps Script; a network of spreadsheets is never an adequate substitute for actual databases and reporting solutions. However, these forests are sometimes unavoidable if your organization doesn’t yet have the budget or strategy in-place to set up proper solutions. That’s OK! There are plenty of things you can do to tame these out of control spreadsheets.

Get out of Excel and into Google Sheets

When I first started working with data computing, Microsoft Excel was synonymous with the word “spreadsheet.” It was the undisputed king of the market. Excel’s ability to facilitate automation and interface with databases made it an attractive “everyday” solution for a lot of companies.

Furthermore, it was already included in enterprise Microsoft Office subscriptions. This was at a time when the entire business world ran on Windows. Before long, legions of savvy workplace Excel wizards started deploying everything from rudimentary tables to OLAP cubes in Excel. 

Regrettably, conditions are still present in many workplaces. That needs to stop. Excel is neither a reporting solution, a database, nor bookkeeping software. In my opinion, it’s inappropriate to use it in the modern web-based enterprise landscape. Instead, I recommend making the move to Google Sheets.

Is Google Sheets Really a Suitable Replacement for Excel?

There’s this myth that Google Sheets is an ineffective and “junior-grade” tool. In the early days of Google Sheets, I’d agree with that assessment. Today, however, Google Sheets is not only a rival to Excel; it is the superior product. As a web-based product built on Google platforms, Google Sheets is collaborative (in real-time) and plays extremely well with web-based data sources. It’s Google scripting language, however, that really helps Google Sheets shine.

The Power of Apps Script

Scripting is done with Google Apps Script (the Google flavor Javascript). That means that you’re able to use the full power of Javascript-based computing for your calculations, models, and processes! Even if Google Sheets doesn’t have the same formulas baked-into it that Excel does out of the box, you can quickly write that functionality in Apps Script. You could then deploy it as a custom formula and use it throughout your spreadsheet ecosystem. 

Apps Script also allows you to interface seamlessly with the rest of G-Suite. Using Apps Script APIs, you can quickly connect data points between Sheets, Forms, Docs, and Mail (just to name a few). You can even use Sheets data to help manage your Google Admin components in the Google Admin Console.

Outside of internal APIs, Apps Script facilitates pulling data from external APIs and databases. Just like a web-app, you can write routines like fetch() scripts to grab data straight from the source, and these scripts will always work on any machine. The browser frees you from all those dependency and architecture pitfalls that come with similar locally-installed Excel solutions. Once you have the data, you can even run SQL queries against it through query() functions!

=QUERY('Example Data'!$A$2:$H$7, "select A where (B<>'Eng' and G=true) or (D > "&A2&")")

The point is that reducing your reliance on Excel will make for a more collaborative, cost-effective, and connected experience. This should be your first step as you try to get control of the spreadsheets in your spreadsheet forest.

Trim the Non-Essentials and Create Singular Sources of Truth

Spreadsheet forests are best kept small. They really should be spreadsheet gardens, if anything. You shouldn’t just limit the number of spreadsheets in the forest, you should also keep the spreadsheets themselves small.

Spreadsheets are not designed to hold hundreds of thousands of rows of information. If you want that capability, then you need a real database solution like PostgreSQL. Spreadsheets are best used for statistical analysis of small amounts of information, and you should be considerate of that fact.

When looking at areas to cut back in a spreadsheet forest, start by addressing the following:

Create Single Sources of Truth

What are the core sources of information? You want to have as few sources of truth as possible.

Let’s say that you have three spreadsheets that all have summaries about the same sales information. All three of those sheets should point to a single sales database/databank. That Single Source of Truth is now the only point you have to manage to ensure core data accuracy across its connections.

Clean up the pipelines between sheets and the data packet size

Once you’ve identified and cleaned your Single Sources of Truth, it’s time to focus on the data pipelines between your sheets. You should avoid duplicate pipelines where possible and only retrieve the information you need. The goal to only move small amounts of data between sheets. Don’t pull massive amounts of data from a source and then perform calculations in the Sheets. Instead, do the calculations in-transit or “server-side” and pass an aggregated packet of data.

For instance, if your data source is an API (e.g., REST, GraphQL), there are usually endpoints or mutations that allow you to grab pre-summarized data. In cases where those convenient structures aren’t in place (or you’re referencing another sheet), that’s not a problem! Google Sheets, for instance, makes it easy to write a script to fetch() the data and then perform summary calculations on what it returns in memory before passing it along to the spreadsheet endpoint.

Use Databases to Store your Data Sources Whenever Possible

People who aren’t data professionals often forget (or don’t know) that spreadsheets are not databases. Spreadsheets are simple tabular data constructs primarily designed to handle small numbers and calculations. Databases, on the other hand, are genuine storage architectures intended to connect and process large amounts of data efficiently for use by software. 

If you’re using a spreadsheet to store rows and rows of data, you should move that information to a database. Yes, that might seem intimidating, but it doesn’t need to be. You don’t have to develop a large, relational data warehouse full of tables upon tables upon junction tables upon tables. You might only need a small data lake or data pond. (In a future post, I’ll show how easy it is to do that using PostgreSQL.)

The point is that you want to move your raw data out of spreadsheets and into the right solution for storing data. That will decompress the strain on your spreadsheets and substantially improve performance. It will also consolidate your raw data into a central location that you can better manage, curate, and control. Furthermore, it allows you to leverage the full power of SQL to create connections and views!

A quick sidebar: I recommend sticking to “relational” architectures like PostgreSQL instead of “non-relational” architectures like MongoDB. If you’re living in a spreadsheet forest, relational structures will be more similar to the tabular environments you’re familiar with. Non-relational architectures meet the needs of specific cases that you’re unlikely to encounter at this stage.

Automate Data-Handling Wherever Possible

A well-designed and thoroughly-vetted data handling script will make far fewer errors than a human doing the same work. Compared to computers, humans are easily distracted and clumsy. This is particularly true for people like me with bear paws for hands. We easily make errors in transcription and transposition which creates problems down the line.

Humans also have different opinions when it comes to formatting. Whereas scripts and pipelines adhere to the same formatting instructions loop after loop, humans are all over the place. Data validation rules can help mitigate that, but there is still ample room for erroneous variability.

Remember, automation isn’t about replacing all your workers. It’s about letting machines do what they do best so that your human workers spend less time doing monotonous, repetitive tasks and more time focusing on pushing your organization forward. 

Wherever possible, you task a machine with shuttling your raw and aggregated data around. You’ll dramatically reduce your errors and capitalize on all that data cleanup work you did when you established your Single Sources of Truth. You’ll also have more capacity to do things that humans do best, like imagine and create.

Remember, automation isn’t about replacing all your workers. It’s about letting machines do what they do best so that your human workers spend less time doing monotonous, repetitive tasks and more time focusing on pushing your organization forward. 

Appoint Spreadsheet Curators (or Rangers)

The more people with control over your spreadsheets, the more fires you’ll have to put out. Users often erroneously modify critical formulas without anyone noticing. Data gets reformatted in ways that break integrations and filters get turned on and off with no reset. It’s data mayhem!

You should appoint a small group of people to serve as curators of the spreadsheet forest. Perhaps “curators” are the wrong analogy. Let’s stay on-brand and call them rangers or something. I like “rangers.”

These spreadsheet forest rangers are the ones responsible for monitoring the data, ensuring all the pipelines are functioning correctly, and are the only ones who should be able to edit/manipulate the spreadsheets themselves.

They aren’t responsible for the accuracy of the source data. That is the responsibility of whatever department generates that information. They are responsible for ensuring that the sheets themselves are stable and working in the ways they should be to working. 

Document Everything

The stability of spreadsheet forests is often dependent on the person who built them sticking around to manage them. Once that person leaves the organization or transitions roles, the forest rapidly loses its integrity.

If you’re tied to a spreadsheet forest, you must document all your processes. The documentation should be good enough that a new developer could come into the project and be able to keep things running without interruption. As with any data system, build a system that doesn’t need you around to keep it working.

Develop (and Commit to) a Plan to Retire the Spreadsheet Forest ASAP

Spreadsheet forests should never be allowed to persist permanently. They should only be temporary constructs on the road to more robust, secure, and productive solutions. Work with management and team leads to get control of the spreadsheets and develop a comprehensive plan to retire them as soon as possible.

Let’s Recap How to Get Control of the Spreadsheets in your Spreadsheet Forest

So, we’ve established that spreadsheet forests are as damaging as they are ubiquitous. However, we’ve also discussed how they can be effectively managed if you keep a few things in mind:

Get out of Excel and into Google Sheets. Excel encourages data siloing and creates inefficient, unwieldy processes. Leave your bad habits on the C:\ Drive and move your spreadsheets to the browser with everything else.

Trim your spreadsheet forests (and the spreadsheets within them) down. Only keep the necessities and make sure you’re considerate of resources and data processing best practices.

Create your Single Sources of Truth and remove duplicate datastores. If two spreadsheets use the same information, they should both reference a common source.

Use databases instead of spreadsheets to store your raw information. If you don’t want to always grab your data straight its some source outside your organization (i.e., your enterprise CRM platform’s API), look into creating a small data warehouse or lake and point your sheets at that.

Automate your data handling wherever you can. Let machines do the work they do best and stop doing everything manually.

Establish your group of spreadsheet forest rangers. This will be the group that manages the forest and takes responsibility for its security and stability.

Create comprehensive documentation. Make sure the documentation is good enough to allow your teammates to pick up the reins when you’re on vacation or transition out of your role.

Create and implement an exit plan to get you and your organization out of the spreadsheet forest as soon as possible. These structures are not sustainable long-term solutions. Don’t get trapped!

Now, stop reading this, grab that data ax, and get trimming!

Advertisements Disclosure

I will always make it clear if I am writing to endorse or recommend a specific product(s) or service(s). I hate it when I visit a site only to find out that the article is just one big ad.

Various ads may be displayed on this post to help defray the operating cost of this blog. I may make a small commission on any purchases you make by clicking on those advertisements. Thank you for supporting my work bringing you accurate and actionable information on data literacy, analytics, and engineering.