Is there something fishy with Johns Hopkins University’s Pima Indians Diabetes Data Set?
Eugen G Tarnow April 7 2017 11:43:47 AM
By Eugen Tarnow, Ph.D., Avalon Business Systems, Inc.
http://AvalonAnalytics.com
This is a famous data set describing the incidence of diabetes in a population prone to diabetes. It can be downloaded from here: https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes .
Some thirty data science publications have resulted. But there are some strange things going on in this dataset.
First, the age distribution of the participants is exponential:
Second, the body mass index does not increase with age:
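Readers who want to check these two observations themselves could start from something like the following. It is only a minimal sketch, not the analysis behind the figures: it assumes the downloaded file is a headerless CSV with body mass index in the sixth column and age in the eighth, that a BMI of zero marks a missing value, and that ten-year age bins are a reasonable choice. The function names and the file name in the usage comment are made up for illustration.

```python
import csv
from collections import Counter
from statistics import mean


def load_age_bmi(path):
    """Read (age, bmi) pairs from a headerless CSV.

    Assumption: BMI is column index 5 and age is column index 7 (0-based).
    Assumption: a BMI of 0 is a placeholder for a missing value and is skipped.
    """
    pairs = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 8:
                continue  # skip blank or short lines
            bmi, age = float(row[5]), int(row[7])
            if bmi > 0:
                pairs.append((age, bmi))
    return pairs


def age_histogram(ages, bin_width=10):
    """Count participants per age bin, keyed by the lower edge of the bin."""
    counts = Counter((a // bin_width) * bin_width for a in ages)
    return dict(sorted(counts.items()))


def mean_bmi_by_age_bin(pairs, bin_width=10):
    """Average BMI per age bin, keyed by the lower edge of the bin."""
    groups = {}
    for age, bmi in pairs:
        groups.setdefault((age // bin_width) * bin_width, []).append(bmi)
    return {b: mean(v) for b, v in sorted(groups.items())}


# Hypothetical usage (the file name is an assumption):
# pairs = load_age_bmi("pima-indians-diabetes.data")
# print(age_histogram([age for age, _ in pairs]))
# print(mean_bmi_by_age_bin(pairs))
```

If the counts per age bin fall off by a roughly constant factor from one bin to the next, that would be consistent with the exponential shape described above; a flat or falling average BMI across bins would correspond to the second observation.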
I wrote to the dataset depositor but did not receive an answer. I wrote to the archivists at the University of California, Irvine, and they decided to just leave the dataset up.
But it seems there is something very wrong with it. And the publications that resulted - are they therefore wrong too?
As always, I reserve the right to be wrong.