DS 6040 - Project Proposal Update

Michael Davies (mld9s)
Akeem Wells (ajw3rg)

1. Couple of sentences reorienting me to your project

Predicting Heart Disease

Overview: “Heart disease has become a major health problem in both developed and developing countries, and it is cited as the number one cause of death throughout the world each year.” Given the prevalence of heart disease in modern society, detecting cardiovascular disease and identifying its risk level in adults are critical tasks.

Objectives: We will implement a model to classify whether a patient is normal or has heart disease. More specifically, we will develop a binary classification model that predicts the posterior probability that an individual has heart disease (given our data and model).

2. Have you obtained the data you need, and if so, what does it look like?

In short, we obtained the data and have completed preliminary cleaning, which can be seen below.

Data

The data we selected comes from:

Variables:

3. Broadly, what Bayesian model/approach you are planning on using, and if you have already begun analyzing the data.

We plan to implement a hierarchical Bayesian approach. This is appropriate because our data contain the same features but are drawn from distinct locations: Budapest-Hungary, Zurich-Switzerland, Basel-Switzerland and the VA Medical Center (Long Beach and Cleveland). A hierarchical model lets each location have its own parameters while sharing information across locations through common hyperpriors.
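
As a rough illustration of the structure described above, one possible formulation is a varying-intercept logistic regression. The symbols here are assumptions made for illustration and are not taken from the notebook: $y_i$ is the binary heart disease label for patient $i$, $\mathbf{x}_i$ the feature vector, $j[i]$ the location of patient $i$, and $J$ the number of locations. Priors on $\boldsymbol{\beta}$, $\mu_\alpha$ and $\sigma_\alpha$ still need to be chosen.

```latex
\begin{aligned}
y_i \mid p_i &\sim \mathrm{Bernoulli}(p_i), \\
\operatorname{logit}(p_i) &= \alpha_{j[i]} + \mathbf{x}_i^{\top}\boldsymbol{\beta}, \\
\alpha_j \mid \mu_\alpha, \sigma_\alpha &\sim \mathcal{N}(\mu_\alpha, \sigma_\alpha^2), \qquad j = 1, \dots, J.
\end{aligned}
```

Letting the slopes $\boldsymbol{\beta}$ vary by location as well would be a natural extension of the same idea.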

Questions we have at this point are:

Imports

Import Data

Data merging and cleaning

Merge all countries into one dataset
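
A minimal sketch of the merge step, assuming each location has already been loaded into its own DataFrame with identical columns. The frame names, column names (`age`, `chol`, `num`) and values are hypothetical stand-ins, not the actual files.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the per-location files; the real
# frames would come from the import step and share the full column set.
cleveland = pd.DataFrame({"age": [63, 67], "chol": [233, 286], "num": [0, 2]})
hungary = pd.DataFrame({"age": [28, 29], "chol": [132, 243], "num": [0, 0]})

sites = {"cleveland": cleveland, "hungary": hungary}

# Tag every row with its origin, then stack the frames row-wise.
merged = pd.concat(
    [df.assign(site=name) for name, df in sites.items()],
    ignore_index=True,
)
```

Keeping a `site` column is what later allows a hierarchical model to index parameters by location.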

Clean dtypes
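
A small sketch of dtype cleaning, assuming the raw columns were read as text and that missing entries are marked with `"?"`. Both the marker and the column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical raw frame: every column read as text, "?" marks a missing value.
raw = pd.DataFrame({"age": ["63", "54"], "chol": ["233", "?"], "num": ["0", "1"]})

# Convert each column to a numeric dtype; entries that cannot be parsed
# (such as "?") become NaN instead of raising an error.
clean = raw.apply(pd.to_numeric, errors="coerce")
```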

Check class balances on response var
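
A sketch of the class-balance check, assuming the response is stored in a hypothetical column `num` where 0 means no disease and any positive value means disease. The values are made up.

```python
import pandas as pd

# Hypothetical response column: 0 = no disease, positive = disease present.
df = pd.DataFrame({"num": [0, 0, 1, 3, 0, 2]})

# Collapse to a binary target, then compute the share of each class.
df["target"] = (df["num"] > 0).astype(int)
balance = df["target"].value_counts(normalize=True).sort_index()
```

A strongly skewed `balance` would suggest reporting metrics other than plain accuracy.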

Initial EDA

Imputer
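
One simple imputation strategy is to fill each numeric column with its median. The sketch below uses plain pandas as a stand-in; it is not necessarily the imputer used in the notebook, and the column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric features with gaps.
df = pd.DataFrame({
    "chol": [200.0, np.nan, 240.0],
    "thalach": [150.0, 160.0, np.nan],
})

# Column medians ignore NaN by default; fillna aligns them by column name.
medians = df.median()
imputed = df.fillna(medians)
```

When a held-out set is used, the medians would normally be computed on the training rows only and then applied to the held-out rows.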

Scaling

https://towardsdatascience.com/data-normalization-with-pandas-and-scikit-learn-7c1cc6ed6475
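
Two common scaling choices, min-max and z-score, can be sketched directly in pandas. The column names and values are hypothetical, and which scaling the final model uses is left open here.

```python
import pandas as pd

# Hypothetical numeric features on different scales.
df = pd.DataFrame({"age": [40.0, 50.0, 60.0], "chol": [180.0, 240.0, 300.0]})

# Min-max scaling maps each column onto [0, 1].
min_max = (df - df.min()) / (df.max() - df.min())

# Z-score scaling centers each column at 0 with unit (population) std.
z_score = (df - df.mean()) / df.std(ddof=0)
```

Standardized inputs also make it easier to place the same weakly informative prior on every regression coefficient.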

Algorithms Initial Concept Testing

https://towardsdatascience.com/building-a-bayesian-logistic-regression-with-python-and-pymc3-4dd463bbb16

Logistic Model with PyMC3
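
To make explicit what a sampler explores for this kind of model, here is a NumPy sketch of the unnormalized log posterior of a logistic regression with independent zero-mean Normal priors on the coefficients. This is not PyMC3 code; the function name, the prior scale `prior_sd` and the toy data are assumptions for illustration.

```python
import numpy as np

def log_posterior(beta, X, y, prior_sd=10.0):
    """Unnormalized log posterior: Bernoulli likelihood + Normal(0, prior_sd) priors."""
    logits = X @ beta
    # Bernoulli log-likelihood in a numerically stable form:
    # y * logit - log(1 + exp(logit))
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    # Normal log-prior, dropping constants that do not depend on beta.
    log_prior = -0.5 * np.sum((beta / prior_sd) ** 2)
    return log_lik + log_prior

# Toy data: an intercept column plus one standardized feature.
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 0.5], [1.0, 1.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])

value_at_zero = log_posterior(np.zeros(2), X, y)
```

At `beta = 0` every patient has predicted probability 0.5, so the value reduces to `4 * log(0.5)`.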

Bayesian Model Averaging Logistic Regression

https://www.kaggle.com/billbasener/bayesian-model-averaging-logistic-regression

Bayesian Model Averaging
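
One common way to approximate posterior model probabilities is through BIC, with weights proportional to `exp(-BIC / 2)` under equal prior model probabilities. The sketch below assumes that approximation; it may differ from the approach in the linked notebook, and the BIC values and predictions are made up.

```python
import numpy as np

def bma_weights(bic):
    """Approximate posterior model weights from BIC, assuming equal model priors."""
    bic = np.asarray(bic, dtype=float)
    # Subtract the minimum before exponentiating for numerical stability.
    rel = np.exp(-0.5 * (bic - bic.min()))
    return rel / rel.sum()

def bma_predict(probs, weights):
    """Average per-model predicted probabilities; probs has shape (n_models, n_patients)."""
    return np.asarray(weights) @ np.asarray(probs)

# Hypothetical BIC values for three candidate logistic models.
weights = bma_weights([100.0, 100.0, 102.0])

# Hypothetical predicted disease probabilities for two patients.
probs = np.array([[0.2, 0.9], [0.4, 0.7], [0.3, 0.8]])
averaged = bma_predict(probs, weights)
```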

Hierarchical Model - Non-Informative

https://docs.pymc.io/notebooks/GLM-hierarchical.html

https://docs.pymc.io/notebooks/posterior_predictive.html
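
Once posterior draws of the coefficients are available, the posterior predictive probability of heart disease for a patient is the average of the predicted probability over those draws. The sketch below uses plain NumPy with made-up draws; the function name and array shapes are assumptions, and in practice the draws would come from the fitted model's trace.

```python
import numpy as np

def posterior_predictive_prob(beta_draws, X):
    """Average P(y = 1) over posterior draws.

    beta_draws: (n_draws, n_features), X: (n_patients, n_features).
    """
    logits = beta_draws @ X.T                # shape (n_draws, n_patients)
    probs = 1.0 / (1.0 + np.exp(-logits))    # sigmoid per draw and patient
    return probs.mean(axis=0)                # one probability per patient

# Made-up posterior draws (intercept, slope) and two hypothetical patients.
beta_draws = np.array([[0.0, 1.0], [0.0, -1.0]])
X = np.array([[1.0, 0.0], [1.0, 2.0]])

pred = posterior_predictive_prob(beta_draws, X)
```

Averaging probabilities over draws, rather than plugging in a single point estimate, carries parameter uncertainty through to the prediction.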

Hierarchical Model - Deterministic
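
Assuming the "Deterministic" variant refers to a non-centered parameterization (an assumption based on the offset and sigma plots listed in this section), each location's intercept is built deterministically from a shared mean, a shared scale and a standard-normal offset. The NumPy sketch below shows only that transformation and the resulting prediction; all names and values are hypothetical.

```python
import numpy as np

def site_intercepts(mu_a, sigma_a, a_offset):
    """Non-centered form: a_site = mu_a + a_offset * sigma_a, a_offset ~ Normal(0, 1)."""
    return mu_a + np.asarray(a_offset) * sigma_a

def predict_prob(mu_a, sigma_a, a_offset, beta, X, site_idx):
    """P(y = 1) per patient, using the intercept of that patient's location."""
    a = site_intercepts(mu_a, sigma_a, a_offset)
    logits = a[site_idx] + X @ beta
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical values for two locations and two patients (one per location).
a = site_intercepts(mu_a=0.0, sigma_a=2.0, a_offset=[0.5, -0.5])
X = np.zeros((2, 1))
p = predict_prob(0.0, 2.0, [0.5, -0.5], np.zeros(1), X, np.array([0, 1]))
```

Sampling the offsets instead of the intercepts themselves often helps when the group-level scale is small, because it removes the strong dependence between the intercepts and `sigma_a`.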

Offset Plots

Sigma Plots

Marginal Plots

Summary Figures and Tables

HW 4 Model Building - Testing

Additional Resources

https://twiecki.io/blog/2018/08/13/hierarchical_bayesian_neural_network/

https://discourse.pymc.io/t/do-we-need-a-testing-set/759/5