Continuing our series on little-known charts and how to implement them in QlikView!

__Confidence Interval__

*percentiles*, so often there will not be the same number of outliers on each side, and sometimes there may be no outliers at all. Practically speaking, if you are dealing with summer temperatures, for instance, and most of your temperatures fall in the 60 - 74 degree Fahrenheit range, this could lead to removal of a 52 degree outlier, a 76 degree outlier, etc., leaving the rest of your data points cleaner so that you can see real patterns more easily.
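To make the percentile idea concrete, here is a minimal Python sketch of trimming outliers outside QlikView. The 5th and 95th percentile cutoffs, the `trim_by_percentiles` name, and the sample temperatures are all assumptions made up for illustration; they are not the settings used in the charts in this post.

```python
from statistics import quantiles

# Hypothetical daily summer temperatures (degrees Fahrenheit), made up for illustration.
temps = [52, 60, 61, 62, 63, 64, 65, 66, 66, 67, 67,
         68, 68, 69, 70, 71, 72, 73, 74, 74, 76]


def trim_by_percentiles(values, lower_pct=5, upper_pct=95):
    """Split values into (kept, outliers) using percentile cutoffs.

    The 5/95 defaults are an assumption, not the post's actual setting.
    """
    # quantiles(..., n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


kept, outliers = trim_by_percentiles(temps)
print(outliers)              # [52, 76]
print(min(kept), max(kept))  # 60 74
```

Because the cutoffs are percentiles rather than a fixed count, a value that ties with a cutoff is kept, which is one way you can end up with a different number of outliers on each side, or none at all.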

__Box Plot__

*that*. This becomes your lower box limit, also known as Q1. Then you do the same thing with the numbers to the right of the true median, and that becomes your upper box limit, also known as Q3. The whiskers simply represent the highest and lowest numbers that our confidence interval has left us. That's it. We end up with a picture like this:
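If it helps to see the arithmetic, the box plot values can be sketched in a few lines of Python. This assumes that Q1 and Q3 are the medians of the numbers to the left and right of the true median, and that the middle value is left out of both halves when the count is odd. The `box_plot_stats` name and the sample data are hypothetical, and QlikView's own box plot wizard may calculate its quartiles differently.

```python
from statistics import median


def box_plot_stats(values):
    """Return the five numbers a box plot draws, using the median-of-halves approach."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data points")
    half = len(data) // 2
    lower_half = data[:half]    # numbers to the left of the true median
    upper_half = data[-half:]   # numbers to the right of the true median
    return {
        "whisker_low": data[0],      # lowest value left after trimming outliers
        "q1": median(lower_half),    # lower box limit
        "median": median(data),
        "q3": median(upper_half),    # upper box limit
        "whisker_high": data[-1],    # highest value left after trimming outliers
    }


# Hypothetical temperatures that have already had their outliers removed.
stats = box_plot_stats([60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70])
print(stats)
```

For this sample the median is 65, the box runs from 62 to 68, and the whiskers reach 60 and 70.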

__Density Plot__

__QlikView-Specific Caveats__

- You cannot control the width of box plots, so you cannot fit the box plots __within__ the density plots to make it truly look violin-like
- You cannot trellis charts that have box plot expressions
- You cannot have a line graph be both smooth and have an area that is filled in with a solid color
- Line graphs with filled-in areas do not work very well, in general, in combination with trellis charts
- Trellis charts have their own limitations, such as the inability to hide column/row separators and trellised dimension labels

__Change Over Time__

__Putting It All Together__

*concentrations* of data. Box plots are powerful visualizations in their own right, but simply knowing the median and Q1/Q3 values leaves a lot unsaid. There are many ways to arrive at the same median. For instance, if you have 7 data points {67,68,69,70,71,72,73} then the median is 70. But if your data points are {60,60,60,70,80,80,80} the median is also 70, yet the picture is very different. When the violin density plot tapers, it means that the results are less dense: in plain English, that there are fewer of them. When it gets wide, the density is higher. As a rule of thumb, the more curvaceous the density plot appears, especially in and around the interquartile range (i.e. the "box" from our box plot, AKA the IQR), the more variance there is in the data. A stocky density plot, by contrast, indicates that the results are more evenly distributed.
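The two sample sets above are easy to check for yourself. The Python sketch below uses a plain count per value as a stand-in for density; that is a simplification I am assuming for illustration (a real violin plot usually smooths these counts), and the variable and function names are made up.

```python
from collections import Counter
from statistics import median

even_spread = [67, 68, 69, 70, 71, 72, 73]
clustered = [60, 60, 60, 70, 80, 80, 80]


def value_counts(values):
    """Count how often each value occurs, ordered from lowest to highest value."""
    counts = Counter(values)
    return {v: counts[v] for v in sorted(counts)}


for name, data in (("even_spread", even_spread), ("clustered", clustered)):
    print(name, "median:", median(data))
    for value, count in value_counts(data).items():
        # One '#' per data point: a wider bar means a denser part of the plot.
        print(f"  {value}: {'#' * count}")
```

Both sets report a median of 70, but the first prints seven equally short bars while the second prints wide bars at 60 and 80 with a thin one at 70, which is exactly the difference a box plot alone would hide.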

In July, for instance, you had about an equal chance of getting 64, 65, or 66 degrees, with a slightly higher chance of 66. The total likelihood of getting one of these three average temperatures on a July day was 50% (one shot in two). However, notice that the density plot doesn't really taper above the IQR; the upper whisker tells us that the top 25% of days had temperatures of 67 or 68 degrees, and the density plot tells us that either one of those was about as likely as the other. The lower whisker tells us that the bottom 25% of days had temperatures of 60 - 63 degrees; however, the tapered density plot tells us that 62 and 63 occurred much more often than 60 and 61. Anything higher than 68 degrees or lower than 60 degrees was an outlier excluded by our confidence interval; getting weather like that was a fluke.

Now let's look at November and we'll see a very different story. Obviously it was colder, because it was winter: the median temperature was 60 degrees. Notice also that the IQR is taller now, which tells us that we experienced a greater *range* of temperatures (between 58 and 62). However, not all of these temperatures were equally likely. We had 58, 60, and 62 degree weather __much__ more often than 59 or 61 degrees. Without a density plot, we would never have known that strange factoid. So while we still had a 50% shot of landing in the IQR, it was not evenly distributed. We also see that 25% of the days were colder than 58 degrees, most often 56 degrees, to be specific. Again, without a density plot all we would have known is that this bottom 25% of days was somewhere between 50 and 58 degrees, a pretty big difference compared to the picture that we see above! The 25% of days that were warmer than 62 degrees were likely going to be 64 degrees, but could also very well have been 63 or 65. It was very unusual to experience weather colder than 55 degrees or warmer than 65 degrees.

*past* weather, actually indicate that any particular weather is more likely in the *future*? The answer is: possibly. But to get a better idea of whether we're seeing a repeating pattern or simply an interesting historic distribution, we need to be mindful of the types of trends that Karl has created. It's always important to keep one eye on a trend chart when looking at distributions; if you don't, you could draw wrong conclusions from a historic metric averaging out a certain way. If we take a look at Karl's chart that we discussed above, we see, for example, that while November has remained about steady over time, July seems to be getting hotter each year. This might alert us that the past might not be a good indication of the future during Mexico's summers. You *might*, however, be justified in assuming that the November results will continue to be more or less the same in future years and, if you're planning a visit (and feeling lucky), to not pack for weather colder than 55 degrees.

__A Complete Tangent__
Thanks, Vlad, for introducing me to violin plots. I especially like the detail that is revealed with the density plot that I’ve never noticed before. For example, I can see that the plots from June to September are fatter. This time period corresponds with the rainy season, when it rains almost every day in the afternoon. Temperatures are more stable and vary only slightly from the median. All the other months are marked by weather fronts that pass over and more dramatically change the temperature for several days before another front comes in. Very cool.

Hi, Karl. Thanks for linking this post from your blog! Very cool to kind of “collaborate” with you on a project. Let’s do this again sometime!