|
.. _diabetes_dataset: |
|
|
|
Diabetes dataset |
|
---------------- |
|
|
|
Ten baseline variables, age, sex, body mass index, average blood |
|
pressure, and six blood serum measurements were obtained for each of n = |
|
442 diabetes patients, as well as the response of interest, a |
|
quantitative measure of disease progression one year after baseline. |
|
|
|
**Data Set Characteristics:** |
|
|
|
:Number of Instances: 442 |
|
|
|
:Number of Attributes: First 10 columns are numeric predictive values |
|
|
|
:Target: Column 11 is a quantitative measure of disease progression one year after baseline |
|
|
|
:Attribute Information: |
|
- age age in years |
|
- sex |
|
- bmi body mass index |
|
- bp average blood pressure |
|
- s1 tc, total serum cholesterol |
|
- s2 ldl, low-density lipoproteins |
|
- s3 hdl, high-density lipoproteins |
|
- s4 tch, total cholesterol / HDL |
|
- s5 ltg, possibly log of serum triglycerides level |
|
- s6 glu, blood sugar level |
|
|
|
Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times the square root of `n_samples` (i.e. the sum of squares of each column totals 1). |
|
|
|
Source URL: |
|
https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html |
|
|
|
For more information see: |
|
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499. |
|
(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf) |
|
|