Top N: We Can Do Better

The case for showing slices of data more intelligently.

I am not, by nature, a distrustful person. But I'm deeply suspicious of pretty numbers when dealing with slices of data built on ranks. I'm referring, of course, to views such as these:

The sentiment behind a "Top N" table or chart is great; it's almost Pareto-esque. It is as if you are saying to your users: We get it, there are only so many hours in a day. Focus on these customers/patients/accounts/SKUs/[whatever] that are having the biggest impact on business. But that's when things seem to go horribly wrong, because this "why" behind a Top analysis is forgotten, leaving only the "what." It's as if the designer forgot that the purpose of a Top N view is to eliminate noise in a meaningful way, to shepherd users to important data points that they potentially need to act on.
Instead, if you're like me, then 90%+ of Top N analyses that you have seen are either Top 5, Top 10, or Top 3. This drives me up the wall. What's so special about these numbers? The answer is: absolutely nothing, but they're very pretty. Why 5? Because we have 5 fingers on each hand. Why 10? Because we have 10 fingers total. Why 3? Because humans have a millennia-old relationship with the number 3 (documented ad nauseam by everyone from psychologists to motivational speakers) that permeates everything in our world. Occasionally a designer will even do something silly like a "Top 7" view because there is room for exactly 7 bars on users' screens without scrolling. Absolute nonsense; if the design of your UI does not support the data you need to display to make your application effective from a business point of view, then it's time to change your design, not bend your data to fit!
I'd like to make an impassioned plea to business intelligence designers everywhere: please stop creating applications like these—we can do better. Instead, let's be true partners to our users, to help them see what they need to see, while eliminating data points of lesser value. There isn't going to be a magic number like 3, 5, or 10 that is going to work in all cases like a silver bullet. If your top 1 customer comprises 90% of your sales and the next largest is only 0.5%, then showing even a Top 3 chart makes no sense. It creates a false equivalency that, at best, will distract your users from what is important to their business. Conversely, if your top 12 customers are roughly equal in sales, why would you want to cut off the Top N view at 5 or 10?
Or, if you're already on target for where you need to be with your largest customer, then is there really value in highlighting that customer in Top N views? I would argue that there isn't—that Top N views, to an even greater extent than the rest of your application, need to be actionable, because their entire purpose is to focus users' attention on what you are telling them is important. If no action is required, if a KPI cannot realistically be improved for a particular data point, what is the value of showing that data point? The last thing you want is for users to begin subconsciously ignoring your views because they are irrelevant in driving their day-to-day success.
If we are going to be Pareto-esque, then let's really be Pareto-esque. For those unfamiliar with it, the Pareto principle states that roughly 80% of effects come from 20% of causes. For instance, that 20% of your customers will be responsible for 80% of your total sales. This principle is remarkable in that it seems to work in an unexpectedly wide range of real-life applications. Whether the number for your particular business is exactly 80/20 is not important, and neither is whether you even want to focus on the full 80% of effects or to cut a smaller slice for your "Top" charts. The key takeaway from the principle is to look at Top N% instead of simply "Top N." And, so there's no confusion, I don't mean Top N% in terms of rank (because that again can create a false equivalency between your #1 and #2 customers, for example). I mean the data points responsible for the cumulative Top N% of a KPI, sorted in descending order of that KPI. Something like this:

What I am proposing is a relatively simple 2-step formula:
  1. Start with the customers/patients/accounts/SKUs/[whatever] responsible for the Top N% of a measured KPI (e.g. sales or competitor sales)
  2. Remove from this list those data points that:
    • Meet a predefined business threshold for a KPI, if relevant (e.g. sales growth already at a certain target level);
    • We cannot improve by some sort of intervention (e.g. customers where we already have 100% market share); or
    • We should not improve by some sort of intervention (e.g. customers where we know we cannot realistically increase market share without a tremendous amount of effort, or high value accounts that we forecast losing value in the future)
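For readers who think better in code, the two steps above can be sketched roughly as follows. This is only an illustration, not a reference implementation: the `Account` class, its flag names (`on_target`, `can_improve`, `worth_improving`), and the sample figures are all made up for the example, and in a real application each flag would come from your own business rules. The sketch calculates Top N% against the total business first and applies the exclusions afterwards.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # All field names here are hypothetical; map them to your own data model.
    name: str
    sales: float                   # the KPI being measured
    on_target: bool = False        # already meets the business threshold
    can_improve: bool = True       # an intervention could move the KPI
    worth_improving: bool = True   # the intervention is worth the effort


def top_percent(accounts, percent):
    """Step 1: accounts making up the cumulative top `percent` of total sales."""
    total = sum(a.sales for a in accounts)
    if total <= 0:
        return []
    cutoff = total * percent / 100
    selected, running = [], 0.0
    for a in sorted(accounts, key=lambda a: a.sales, reverse=True):
        if running >= cutoff:
            break  # the accounts already selected cover the cutoff
        selected.append(a)
        running += a.sales
    return selected


def actionable(accounts):
    """Step 2: drop accounts that need no action or cannot sensibly be acted on."""
    return [a for a in accounts
            if not a.on_target and a.can_improve and a.worth_improving]


def focus_list(accounts, percent=80):
    """Top N% of the total business, then the exclusions."""
    return actionable(top_percent(accounts, percent))


# Invented sample data: total sales of 1000.
accounts = [
    Account("A", 400, on_target=True),
    Account("B", 250),
    Account("C", 150),
    Account("D", 100, worth_improving=False),
    Account("E", 60),
    Account("F", 40),
]

print([a.name for a in top_percent(accounts, 80)])  # ['A', 'B', 'C']
print([a.name for a in focus_list(accounts, 80)])   # ['B', 'C']
```

Note that the length of the result is whatever the data says it is: two accounts here, possibly none or a hundred after the next refresh. Swapping the calls inside `focus_list` to `top_percent(actionable(accounts), percent)` would instead calculate Top N% on what remains after the exclusions.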
The order of these two main steps will vary business by business—whether Top N% should be calculated based on your total business (as I prefer) or on whatever remains after the three exclusions are applied. In the end, this could result in a list of 3 data points this week, and 4 data points next week when the application is refreshed. Or it could be a list of 100 data points (sorted by relative value, of course). Or it could result in 0 data points. And all of the above are OK, because they represent a vastly truer picture of what areas of the business need attention than a simplistic Top N list. Please, let's think outside the box. Let's put the "intelligence" back in BI. Let's be better.
This concludes my rant. I look forward to reading in the comments below why I'm wrong and don't know what I'm talking about 🙂

2 Responses to Top N: We Can Do Better

  1. Christophe Brault says:

    Hi Vlad,

    I totally agree with you, and I’m wasting lots of time with every new client to demonstrate that TOP N is something bad nine times out of ten.

    My first argument is always that TOP N is something inherited from static reporting tools. Since you can scroll or zoom in with Qlik, it’s a mistake to hide data that are not excluded by selections.

    To go further on visualizing Pareto, here is a good recipe by Henric Cronström on how to achieve Pareto in a table or graph:

    • Yep. I’d also add the second “exclusion” step I mentioned on top of the traditional Pareto analysis. What I’m proposing is certainly not difficult to achieve from a technical point of view; the main challenge is change management, because Top N is just how things have always been done.
