Neatly Visualize a Year of Data

How to solve a common visualization challenge with the help of an unlikely ally: the humble radar chart.

 
Let's say you have approximately a year's worth of data. You are not interested in displaying trends within this data, but are instead interested in viewing each day as an discrete data point. This scenario happens all the time but the challenge is fairly obvious: what graphical representation can easily support such a large number of data points? The traditional solution for graphing discrete (i.e. non-trend) data points is a bar chart. But 365 bars? Unless you own an IMAX movie theater, that is an impossible number of bars to view on a single screen without scrolling. And, of course, the problem with scrolling is that you cannot view all of your data at the same time, which causes you to lose context. A symbol chart (a type of line chart) is going similarly restricted, even if you make it "continuous" (more on that below).
 
A Cool Example
One of my hobbies is "travel hacking." Despite the ominous word "hacking," this basically just means flying for free using miles and points. Today I have decided to combine business and pleasure by using an example from the world of travel hacking to illustrate the challenge above, as well as the solution. Singapore Airlines is one of the world's premier airlines. They are known for their outstanding service across the board, but what really makes them stand out is their "Suites" first class product, which brings new meaning to the word "luxury":

 
It goes without saying that this product is far out of the price range of what most people can afford. However, Singapore often makes these first class seats available for Singapore's "KrisFlyer" miles (which you can transfer to Singapore from various sources), instead of cash. On a particular flight, a seat may be available at the "Standard" award level or at the "Saver" award level. In the example that we will use, which is the New York to Frankfurt route, a Standard award costs 130,000 miles, and a Saver award costs 76,000 miles. In addition, some award seats are "Waitlist" availability, which means that you might get the seat if the airline decides to open the waitlist up for that particular flight, or you might not. Award availability is released 355 days in advance. If I am a fairly flexible traveler, I might be open to traveling whenever I can find good availability. But what is the best way to visualize 355 days of availability to help me easily find a travel date? Well, here is what a traditional bar chart solution would look like using Standard award availability:

 
In this graph, green represents open availability, yellow represents Waitlist availability, and red represents no award availability. The problem with this bar is obvious though: screen real estate. You can only see a small percentage of the total 355 days, and lose the valuable context that seeing a full year of data gives you. A symbol chart would have the same problem:

 
QlikView has the ability to make a numeric axis (such as date values) "continuous," but doing so causes the discrete data points to blend into a single line and become hard to distinguish and pinpoint:

 
Radar Charts to the Rescue

 
Radar charts are, as these things go, a fairly obscure form of visualization. They are certainly known to business intelligence professionals, but odds are that most other folks have never seen one. Even BI pros often overlook this type of visualization because, frankly, it is not that useful most of the time. Where the humble radar shines though, is in visualizing multiples of approximately a year. In business, this can be appropriate to show seasonality of demand, for instance. In the current example, it is a fantastically elegant way of showing 355 data points cleanly and without creating a sense of chaos and clutter. Here is how the same Standard award availability looks plotted in a radar chart:

 
Notice that the overall effect does not overwhelm the eyes but neatly presents all 355 days in such a way that any particular day can be easily found at a glance, while the broader context is available without the need for any scrolling whatsoever. To paint a full picture, here is the same graph for Saver level award availability:

 
Kind of a bleak picture, but there you have it: the only dates in the next year that have non-Waitlist Saver award availability (as of the time this article was written) are New Years Eve 2018 and January 10th.
 
I've personally been guilty of overlooking radar charts in the past, but they might be my new favorite visualization for this type of data. At the very least, I am going to include them in the rotation. I would urge you to remember that this is a tool that you have in your toolboxes, as well! Of course, another visualization we could have considered would have been a calendar itself, with the squares colored green, yellow, or red. However, we would have had to use a bit more screen real estate, and would not have been able to see patterns in availability across multiple months as easily. With a radar chart, the same day of each month is always on the same concentric circle; this would help us see, for instance, if there's usually more availability during a particular part of each month. Plus a calendar like that is not an object that exists in QlikView without the use of extensions (which I view as cheating).
 
A Note on Measure Values
In this specific example, we did not care about quantifying the value at each data point. Because we knew that all saver awards cost 76,000 miles and all standard awards cost 130,000 miles for this particular route, the specific mileage required on a particular day was of little interest to us. However, the technique we just discussed can easily be adapted, in one of two ways, to situations where we do can about the measure value. In the first approach, the data does not need to be trinary (only three possible values) but should, ideally, be limited to a reasonable number of values. If you are dealing with a measure that has a wide range of possible values (e.g. sales per day), you would bucket the values into a more manageable number. For example, round your sales into buckets of $5,000 (or whatever number makes sense). You would then assign a unique color to each bucket. To those that balk at the idea of losing precision, I would counter that data visualization is about seeing patterns effectively; if you truly need precision, a graphical representation may not be appropriate for you at all. You also have the option to bring precision back via mouse-over popup text, as is done in the application that you can download below. The second approach, which is also perfectly valid, is to create a heat map of colors and assign each day's measure value to a color in the heat map range.
 
A Note on Color Usage: Know Your Audience
You may have noticed that, in the example above, I committed a cardinal data visualization sin: I used red and green of approximately equal saturations. For the uninitiated, the reason this is a problem is because approximately 10% of the world is red-green colorblind. Obviously, if any significant portion of your intended audience shares this condition, you should pick colors that are more easily distinguishable by these folks. You could also make your symbols different for green and red points, such as by using a square for green and a circle for red. This is not easily achieved in QlikView though, so I cut some corners in this example.
 
A Note on the Data Used
OK, I fully expect that at this point a portion of my readers are doing back flips because the data used in this example is supposed to be impossible to get. I probably owe you guys an explanation of how I got it. Singapore Airlines is famous for only posting its Suites availability on its own website, and you can only get to the results by performing a search for a specific route and day combination. To get the results for another day, you have to rinse and repeat. Availability is released 355 days in advance, but to actually query 355 dates by hand is arduously time consuming. To get around this, I used OLE automation via a nifty tool called iMacros. I created a loop to search for Suites availability (a single seat) from New York to Frankfurt for each of the 355 days, and then save the resulting page as an offline HTML file. The script took about 2.5 unattended hours to run. Then I simply parsed the resulting 355 HTML files using QlikView's native ETL capabilities and Bob's your uncle. There are currently 13 routes on which Singapore offers Suites service so, in theory, I could have used this same technique to generate a database of all Suites availability in about 32.5 hours. I don't really have any plans to refresh the application used in this example on an ongoing basis, at least until I decide to try Singapore's remarkable product out for myself!
 
Download
As always, I'm more than happy to make the QlikView app used to create the above charts available for download below. I would love to hear from you in the comments section—in this particular post, I'm also quite curious if the majority of my readers are BI people or travel enthusiasts!
 
 
Update: by popular demand, I've decided to make the entire end-to-end process (including the OLE automation piece) available for public download. Happy Qliking!
 
 
Updated again on December 16, 2017 to fix a small bug related to CSV generation. Check back once in a while, as additional updates/improvements may be posted.
 
Updated on January 23, 2018 for new "Advantage" award verbiage.
This entry was posted in Visualization and tagged , , , , , . Bookmark the permalink.

12 Responses to Neatly Visualize a Year of Data

  1. Mike S says:

    What do you view as the advantages of a radar chart in this case over a pivot table with Month and Day of Month to create a heatmap?

    • Hi, Mike. To put it simply, don’t underestimate the power of concentric circles. To keep things clean, there are only 6 circles explicitly drawn in the graph. However, the mind intuitively understands that there are, in fact, 31 such circles. This helps you see patterns very quickly (if they exist, of course), such as if availability tends to occur every X days, or around the 10 of each month, etc.

      As a side note, this is also a more effective use of space than a pivot table. If I wanted to, I could double the number of months to 24 and still take up the same amount of screen real estate.

      • Mike S says:

        It feels to me like we are really just taking that pivot table heatmap (or symbol chart or grid chart) and wrapping it around a pivot point. At the center, increasing the number of months would cause the symbols to overlap. At the edge, there is an increase in space between points where there may be patterns to detect. Even with the concentric circles, it’s hard to visually compare the ends of March and September because of the distance introduced between the data, while a density of data would help. In the corners, the circular orientation adds unused space, similar to a circular gauge vs. a linear gauge.

        • A pivot table certainly has its uses. One such use would be that it would make it easier to compare weekdays to each other, assuming you lay it out with 7 columns. If you lay it out with 31 columns and 12 rows, you’re right, you can perform a similar type of analysis as this radar chart allows. I’m not discounting the value of a pivot table. In this case though, I personally feel that the radar chart is easier to read than a pivot table. In a pivot table, the cells would flow together and be somewhat reminiscent of the picture of a continuous line chart that I posted above. Maybe it’s just me, but I don’t find that visualization to effectively represent discrete data points, especially when there are two or more cells of the same color next to each other.

          Another great use case for radar charts, as I briefly mentioned in the article, is to show seasonality. This is very powerful when visualizing sales data. A circle in that case helps you understand that December is not just next to November, it’s also next to January.

          Your point about the chart getting crowded at the origin when you add more data values is well taken. I withdraw my previous statement that you can easily double the number of months. Although if you download the QVW, you can see how playing with the axis offset can help with that particular problem.

          • Mike S says:

            Good feedback.

            Have you ever used ASCII or Unicode characters in tables rather than simply coloring cell backgrounds? It can help emphasize the discreteness of the “points” in the table and not just look like a sea of color. (Ex. ● − ✕ ✔.) Bonus points for improving colorblind-friendliness. It’s something I find myself doing more, over time. Thanks.

          • I like that idea! I think I would prefer circles and dots, but good one, Mike, thanks.

  2. daft009 says:

    hey mate,

    got your code working/modified for J/business (as per thread on F.T) but I have a few n00b questions (havent done any coding since uni ~20yrs ago).

    1. how many loops do you suggest we set in the imacros settings for it to run?
    2. what if I don’t want a year full of results? 1-2 months of data is plenty to plan my work trips. what/how should I cut down the input CSV file?
    3. in light of Q2, how many loops would you suggest now?

    thanks again for this awesome tool!

    • No problem, happy to help. Fortunately, the answer to all of your questions is pretty simple. If you want only 2 months of data, just manually edit the CSV in Excel after it’s automatically generated for you and delete the rows you don’t care about. Set the number of loops (the “Max” setting) in iMacros to the final number of rows in the CSV (including the header). No need to modify the “Current” setting. Feel free to post back here if you have any problems.

      • daft009 says:

        hey mate,

        yes, figured it out right after I posted the above comment 🙂

        running well so far, will hopefully book a flight this week!

      • daft009 says:

        got it working great! 😀
        Only ran it for June 2018 and booked in a trip!
        Which I wouldn’t have found if I was looking for a return ticket, think it is/was a bug in the SQ site.

        The same trip & dates as 2x 1way tickets was easily found but not as a return journey!

        • That’s so cool! As far as I know, you’re the first person to put this article to practical use. Congratulations and happy travels!

          • daft009 says:

            Thanks again Vlad,

            Just used it again to book in another redemption in business class.

            even doing only 30days at a time for going and another 30 days for the return at a time is a non-issue. Saved me countless hours yet again!

Leave a Reply

Your email address will not be published. Required fields are marked *

Notify via email when new comments are added

Blog Home
Archives