§ Research / Earlier work

Cardiovascular disease prediction

Scikit-learn · PyTorch · Cleveland Clinic dataset

First real ML research project. A binary classifier over the Cleveland Clinic dataset, predicting cardiovascular disease from a small set of patient features. The interesting work was less the model and more the surrounding craft: train/test splitting, feature engineering, calibration, and learning to write evaluation that doesn't lie to you. Final model reached 96% test accuracy.

The project

A binary classifier trained on the Cleveland Clinic dataset that predicts the presence of cardiovascular disease from a small set of patient features, including age, sex, chest pain type, resting blood pressure, cholesterol, fasting blood sugar, and ST depression. It was my first real ML research project.

What it taught me

The lessons were less about the model and more about the surrounding craft: train/test splitting, feature engineering, calibration, ablations, and learning to write evaluation that doesn't lie to you. The final 96% test accuracy was nice, but what I actually walked away with was a healthier suspicion of any single number.
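
One way to make that suspicion concrete is to put an interval around an accuracy figure. The sketch below is illustrative and not part of the original project: it computes a Wilson score interval for a 96% accuracy at two hypothetical test-set sizes. The sizes (50 and 500) and the helper name `wilson_interval` are assumptions, since the actual split size is not recorded here.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Hypothetical test-set sizes, both giving 96% accuracy (assumed, not from the project).
for correct, total in [(48, 50), (480, 500)]:
    low, high = wilson_interval(correct, total)
    print(f"{correct}/{total} correct: accuracy {correct / total:.2f}, "
          f"95% interval [{low:.3f}, {high:.3f}]")
```

With only 50 test cases the interval runs from roughly 0.87 to 0.99, so the same headline number is compatible with quite different models. The interval narrows as the test set grows.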

Stack

  • Scikit-learn for the classical baselines (logistic regression, SVM, random forest).
  • PyTorch for a small MLP comparison.
  • Pandas / NumPy / Matplotlib for the rest.
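
A minimal sketch of how the classical baselines listed above might be compared. It is not the project's actual code: the data is a synthetic stand-in generated with `make_classification` (not the Cleveland Clinic data), and the hyperparameters, fold count, and choice of metrics are all assumptions. It uses stratified cross-validation and reports more than one metric with its spread across folds, rather than a single accuracy from a single split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a small tabular dataset (assumed shape, not the real data).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)

# Scaling lives inside the pipeline so it is fit on training folds only,
# which avoids leaking test-fold statistics into preprocessing.
models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC()),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "roc_auc"]

results = {
    name: cross_validate(model, X, y, cv=cv, scoring=scoring)
    for name, model in models.items()
}

for name, scores in results.items():
    acc = scores["test_accuracy"]
    auc = scores["test_roc_auc"]
    print(f"{name}: accuracy {acc.mean():.3f} +/- {acc.std():.3f}, "
          f"ROC AUC {auc.mean():.3f} +/- {auc.std():.3f}")
```

Reporting the per-fold spread alongside the mean makes it easier to see whether a difference between two models is larger than the noise between splits.
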

§ At a glance

  • Category: Earlier work
  • Period: 2023
  • Supervisor: Kyungdong University
  • Dataset: Cleveland Clinic Cardiovascular Disease