
 

AvalonAnalytics.com (Avalon Business Systems, Inc.)

 

      Is there something fishy with Johns Hopkins University’s Pima Indians Diabetes Data Set?

      Eugen G Tarnow  April 7 2017 11:43:47 AM
      By Eugen Tarnow, Ph.D.
      Avalon Business Systems, Inc.
      http://AvalonAnalytics.com

      This is a famous data set describing the incidence of diabetes in a population prone to diabetes.  It can be downloaded from here: https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes .  

      Some thirty data science publications have resulted from it.  But there are some strange things going on in this dataset.  

      First, the age distribution of the participants is exponential:

      [Image: age distribution of the participants]
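For readers who want to check the age distribution themselves, here is a minimal sketch. It assumes the data has been saved locally as a headerless CSV file with age in the eighth column (index 7). The function names and the bin settings are my own illustrative choices, not part of the dataset or of any library.

```python
import csv
import math
from collections import Counter


def load_column(path, index):
    """Read one numeric column from a headerless CSV file.

    The column layout is an assumption; verify it against your copy of the file.
    """
    with open(path, newline="") as handle:
        return [float(row[index]) for row in csv.reader(handle) if row]


def age_histogram(ages, start=20, width=5):
    """Count participants per age bin; each key is the lower edge of a bin."""
    counts = Counter(start + width * int((age - start) // width) for age in ages)
    return dict(sorted(counts.items()))


def log_count_slope(histogram):
    """Least-squares slope of log(count) against the bin edge.

    Needs at least two non-empty bins. An exponentially decaying
    distribution gives a roughly straight line with a negative slope.
    """
    points = [(edge, math.log(n)) for edge, n in histogram.items() if n > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

With a local copy (the file name here is hypothetical), `age_histogram(load_column("pima-indians-diabetes.csv", 7))` gives the counts per bin, and `log_count_slope` applied to that result shows how quickly the counts fall off with age.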

      Second, the body mass index does not increase with age:

      [Image: body mass index plotted against age]
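The relationship between body mass index and age can be checked in a similar way. This is only a sketch: it takes two plain lists, `ages` and `bmis`, which I assume come from the eighth and sixth columns of the file. It skips BMI values of zero on the assumption that they mark missing measurements. The function names and the ten-year bin width are illustrative.

```python
import math
from collections import defaultdict


def mean_bmi_by_age(ages, bmis, start=20, width=10):
    """Average BMI per age bin, keyed by the lower edge of each bin.

    Non-positive BMI values are skipped (assumed to be missing data).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for age, bmi in zip(ages, bmis):
        if bmi <= 0:
            continue
        edge = start + width * int((age - start) // width)
        sums[edge][0] += bmi
        sums[edge][1] += 1
    return {edge: total / n for edge, (total, n) in sorted(sums.items())}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

If the bin averages stay flat and `pearson_r(ages, bmis)` comes out near zero (after dropping the zero-BMI rows from both lists), that would match what the plot suggests.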

      I wrote to the dataset depositor but did not receive an answer.  I also wrote to the archivists at the University of California, Irvine, and they decided to just leave the dataset up.

      But it seems there is something very wrong with it.  Are the publications that resulted therefore wrong too?

      As always, I reserve the right to be wrong.
