Evaluating a Data Story

I’m midway through Alberto Cairo’s new book “The Truthful Art” and finding it very stimulating. It’s an interesting time to be a data scientist, journalist or consumer of data.

“The Truthful Art” encourages us to use data truthfully and fearlessly, and provides processes and principles to do so.

This week I noted a new study published by the Center for Immigration Studies (CIS). A recent Presidential Executive Order asserts that the US is in special danger from travelers from seven particular countries. The order is controversial and is currently being challenged in the courts.

The CIS study found that 72 individuals from the “seven terror-associated countries” have been convicted in terror cases since 9/11/2001. The study offers this number as evidence of the exceptional danger posed by immigration from the seven countries.

It seems like there may be more of a story here than “72 terrorists from seven countries”. The study provided a link to the raw data used. I undertook an evaluation of the data and conclusions using some of the techniques I had just been reading about.

The date used to select cases in the study was “Conviction Date”. A more meaningful date would be “Offense Date”. Offense Date was not given, but a “Charge Date” was available. I saw this as a better proxy for when the offense occurred. As shown in the table below, the number of days between Charge and Conviction can be quite substantial. Using Conviction Date shifts many offenses into a later time period than the one in which they actually occurred.

Days from Charge to Conviction

Median    75th percentile    Maximum
381       840                2407
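For readers who want to reproduce this kind of summary, here is a minimal Python sketch. It assumes each case has already been reduced to a charge date and a conviction date. The sample dates and the helper names days_to_conviction and summarize are hypothetical, and this is not necessarily how the table above was produced.

```python
from datetime import date
from statistics import median, quantiles

# Hypothetical (charge_date, conviction_date) pairs for illustration only.
# These are NOT rows from the CIS dataset.
sample_cases = [
    (date(2002, 1, 10), date(2002, 9, 3)),
    (date(2003, 5, 2), date(2004, 6, 15)),
    (date(2005, 3, 20), date(2007, 8, 1)),
    (date(2008, 11, 5), date(2009, 2, 27)),
    (date(2010, 7, 14), date(2013, 4, 30)),
]


def days_to_conviction(cases):
    """Return the number of days between charge and conviction for each case."""
    return [(convicted - charged).days for charged, convicted in cases]


def summarize(days):
    """Return the median, 75th percentile and maximum of a list of day counts."""
    # quantiles(..., n=4) returns the three quartile cut points; the last one
    # is the 75th percentile. "inclusive" treats the data as the whole
    # population rather than a sample.
    _, _, p75 = quantiles(days, n=4, method="inclusive")
    return {"median": median(days), "p75": p75, "max": max(days)}


if __name__ == "__main__":
    print(summarize(days_to_conviction(sample_cases)))
```

Tools define percentiles in slightly different ways, so the 75% figure can differ a little from one tool to another on a small dataset.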

 

Now instead of looking at “72”, I “broadened” my view of the data as Cairo would suggest.  What about the other countries?  Are there slices of the data that provide insight?

When I plotted two country groups — banned and others — over time, an interesting story emerged.  There are no defendants from the banned countries in the last three years of the study. This suggests that travelers from those countries may actually pose less risk than travelers from other countries.
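The grouping behind a view like this can be sketched in a few lines of Python. This is only an illustration: it assumes each record has been reduced to a charge year and a country of origin, the sample records are made up, and the function name counts_by_group_and_year is hypothetical. The set of banned countries is the seven named in the executive order.

```python
from collections import Counter

# The seven countries named in the executive order.
BANNED = {"Iran", "Iraq", "Libya", "Somalia", "Sudan", "Syria", "Yemen"}

# Hypothetical (charge_year, country) records for illustration only.
# These are NOT rows from the CIS dataset.
sample_records = [
    (2002, "Iraq"),
    (2002, "Pakistan"),
    (2005, "Somalia"),
    (2009, "Pakistan"),
    (2014, "Uzbekistan"),
]


def counts_by_group_and_year(records):
    """Count defendants per (charge_year, group), where group is 'banned' or 'others'."""
    counts = Counter()
    for charge_year, country in records:
        group = "banned" if country in BANNED else "others"
        counts[(charge_year, group)] += 1
    return counts


if __name__ == "__main__":
    counts = counts_by_group_and_year(sample_records)
    # Print one row per year so gaps in either group are easy to spot.
    for year in sorted({y for y, _ in counts}):
        print(year, counts[(year, "banned")], counts[(year, "others")])
```

Because a Counter returns zero for missing keys, years with no defendants in a group show up as 0 rather than raising an error, which is exactly the kind of gap the plot makes visible.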

After 9/11, US domestic counter-terrorism efforts were greatly expanded and overhauled.   The decline shown in the chart suggests to me that the current screening procedures are effective and continually improving.

I’m going to continue my journey through “The Truthful Art”.

-Rob


7 thoughts on “Evaluating a Data Story”

  1. Rob
    This is fascinating work you did and I would agree that using conviction date in the original study was just plain wrong. These are interesting times for the US and I appreciate what you have done and plan to get a copy of Cairo’s book.
    Thanks
    Rick

  2. I’m against the ban, but in the pursuit of truthfulness – are we in a cult? 😉 – I’m wondering if the data you’ve visualized here best answers the question.

    If the goal is to assess whether the immigrants from the 7 countries pose any more danger than other immigrants, shouldn’t we look at the ratios of # of incidents to # of immigrants from each country? That would normalize the data and allow us to compare countries. In the current visual, you are comparing totals for 7 countries to totals for a much larger set of countries…

    I’m also curious whether # of incidents is the right metric or if it would be better to break that number out further into # of lives impacted (whether via injury or death). Otherwise right now the events are all weighted equally. Does a massive attack like 9/11 deserve the same weight in this analysis as much smaller incidents?

    1. Speros,
      I think you’ve raised some interesting questions regarding what metrics to use in developing public policy.

      As I responded to Oleg, my purpose here was to evaluate the study offered by CIS as evidence of the need for a ban.

      I was indeed motivated by my growing frustration at the lack of evidence and the disagreement on fundamental facts in public debate today. The travel ban policy seemed to be driven by statements made by Candidate Trump like “we don’t know who these people are”, which is demonstrably false. There should be data made available to support major policy decisions.

      1. Fair, thanks for clarifying the intent. It’s a good sign when an analysis spawns follow up questions – you’ve led us down an interesting path.

  3. Rob,

    I like your approach of applying visual analysis to current political issues. I tend to agree with Speros though, that this particular view might be a little bit deceiving.

    Remember QlikView’s “Dimension Limits” functionality? Every time you select Top N values and also show All Others in a bar chart, the “Others” usually dwarf the Top N values – just because there are so many “Others” compared to a few selected “Top” ones. Applying the same logic to the issue of Muslim terrorism – there are over 50 Muslim countries in the world, while the travel ban only applies to 7 out of 50+. So, it’s quite possible that the other 43+ countries collectively produced more terrorists. Plus, the list of terrorists might also include some non-Muslim ones, which makes my point even stronger.

    What could be interesting is to calculate the “Top 7” numbers of terrorists per country, and determine what these top 7 countries are – in total and per Year. In QlikView, it can be done with a creative use of AGGR, and I can tell you that this is just one of a few new techniques that I added to my “Set Analysis and AGGR” lecture for our next Summit in Munich. See you there!

    1. Oleg, the focus of my exercise was to evaluate the single claim made by the CIS study — that the dataset they presented supported a claim that these particular seven countries presented an exceptional danger justifying exceptional measures.

      I’m not saying that there isn’t data to support the ban. I’m arguing that this data does not.
