Researchers in India have carried out a data mining exercise
to determine which risk factors most increase an individual's
chances of suffering a heart attack.
Writing in the International Journal of Biomedical
Engineering and Technology, they confirm that the usual suspects (high blood
cholesterol, alcohol intake and passive smoking) play the most crucial role
in "severe", "moderate" and "mild" cardiac risks.
Subhagata Chattopadhyay of the Camellia Institute of
Engineering in Kolkata adds that men aged between 48 and 60 years are
exposed to severe and moderate risk by virtue of their age and gender,
respectively, whereas women over 50 years old are affected by mild risk in the
absence of the other factors.
Medical prognosis is a highly subjective art, as is
determining the risk of particular health events such as heart attack. After all,
clinical history, symptoms and signs rarely follow a linear path, and their
interpretation at the individual level by a doctor does not usually conform to
the rules of epidemiology: personal intuition, emotions, logic and experience
all conspire to confound the conclusion drawn for each patient at a given time
under a particular set of circumstances.
Computational data mining
The use of computational data mining techniques that allow
researchers to extract interesting and meaningful information from real-life
clinical data could remove at least some of the subjectivity of clinical
prognosis and allow epidemiological findings to be applied at the level of the
individual patient.
Data mining approaches have been tried before.
However, they often have an inherent problem: the classification of the
data for information retrieval is based on decision making learnt from examples
set by doctors, so they incorporate the very subjectivity that Chattopadhyay
hopes to avoid.
He used 300 real-world patient cases with various
levels of cardiac risk (mild, moderate and severe) and mined the data based on
twelve known predisposing factors: age, gender, alcohol abuse, cholesterol
level, smoking (active and passive), physical inactivity, obesity, diabetes,
family history, and prior cardiac event. He then built a risk model that
revealed the specific factors associated with heart attack risk.
"The essence of this work essentially lies in the
introduction of clustering techniques instead of purely statistical modelling,
where the latter has its own limitations in 'data-model fitting' compared to
the former, which is more flexible," Chattopadhyay explains. "The
reliability of the data used should be checked, and this has been done in this
work to increase its authenticity. I reviewed several papers on epidemiological
research, where I'm yet to see these methodologies used."
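To give a feel for what "clustering instead of statistical modelling" means, here is a minimal sketch of one generic clustering technique, plain k-means, grouping patients by which risk factors they have. The article does not say which clustering algorithm the study used, so k-means is a swapped-in illustration, not Chattopadhyay's method. The four binary factors, the six patients and the rule that names clusters "mild", "moderate" and "severe" by their average factor count are all hypothetical and made up for this example. No model is fitted to the data in advance; patients with similar factor profiles simply end up grouped together.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two patient feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Plain k-means: returns a cluster label per point and the final centroids.

    Centroids are seeded with the first k distinct points so the result is
    deterministic (real tools usually use random or k-means++ seeding).
    """
    seeds: List[Vector] = []
    for p in points:
        if p not in seeds:
            seeds.append(p)
        if len(seeds) == k:
            break
    if len(seeds) < k:
        raise ValueError("need at least k distinct points")
    centroids = seeds
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each patient joins the nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: distance(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return labels, centroids


def name_clusters(centroids: List[Vector]) -> Dict[int, str]:
    """Hypothetical labelling rule: rank 3 clusters by average factor count."""
    order = sorted(range(len(centroids)), key=lambda c: sum(centroids[c]))
    names = ["mild", "moderate", "severe"]
    return {c: names[i] for i, c in enumerate(order)}


# Hypothetical patients; flags are (high_cholesterol, alcohol, passive_smoking, diabetes).
patients: List[Vector] = [
    (1, 1, 1, 1),
    (1, 1, 0, 0),
    (0, 0, 0, 0),
    (1, 1, 1, 0),
    (0, 0, 0, 1),
    (1, 1, 0, 1),
]

labels, centroids = kmeans(patients, k=3)
names = name_clusters(centroids)
for patient, label in zip(patients, labels):
    print(patient, names[label])
```

On this toy data the loop converges after two passes: the three patients with most factors present form the "severe" group, (1, 1, 0, 0) sits alone as "moderate", and the two patients with at most one factor form the "mild" group. A real study would use all the recorded factors, many more patients, and would need to check that the clusters are stable and clinically meaningful.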