This machine learning project set out to explore and predict the factors that significantly influence students' academic performance. I trained several ML algorithms, compared them to find the best-performing one, and then used the models to identify the most important features.
The first challenge, as with most machine learning projects, was the data, which wasn't clean. After preprocessing it, I moved on to clustering, which was also tough to interpret, because the clusters revealed that the grades were formatted in an unusual way.
Choosing which models to compare, and how to compare them, was another difficult challenge. I settled on comparing the R² score, MSE, and MAE. The hardest part of the modelling was that many columns were correlated with one another, which introduced multicollinearity. This was tough to weed out: I had to compute the variance inflation factor (VIF) to find the highly correlated features and then remove them. That is why I removed so many columns before training the models.
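As an illustration of the VIF step, here is a minimal sketch of iterative VIF-based feature removal. It is not the exact code used in this project: the column names (`study_hours`, `absences`, `G1`, `G2`), the synthetic data, and the threshold of 5 are all assumptions made for the example. It relies on the identity that the VIF of feature j equals the j-th diagonal entry of the inverse correlation matrix of the features.

```python
import numpy as np


def vif_scores(X):
    """Return the VIF of every column of X (rows are samples, columns are features)."""
    corr = np.corrcoef(X, rowvar=False)
    # The j-th diagonal entry of the inverse correlation matrix is 1 / (1 - R_j^2),
    # where R_j^2 comes from regressing feature j on all the other features.
    return np.diag(np.linalg.inv(corr))


def drop_high_vif(X, names, threshold=5.0):
    """Repeatedly drop the feature with the largest VIF until all are <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = vif_scores(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


# Synthetic stand-in data (hypothetical columns, not the project's dataset).
rng = np.random.default_rng(0)
n = 200
study_hours = rng.normal(size=n)
absences = rng.normal(size=n)
g1 = rng.normal(size=n)
g2 = g1 + 0.05 * rng.normal(size=n)  # nearly a copy of g1, so strongly collinear

X = np.column_stack([study_hours, absences, g1, g2])
names = ["study_hours", "absences", "G1", "G2"]

X_reduced, kept = drop_high_vif(X, names, threshold=5.0)
print("kept features:", kept)
print("remaining VIFs:", np.round(vif_scores(X_reduced), 2))
```

Dropping one feature at a time and recomputing matters here, because removing a single column from a correlated group usually lowers the VIF of the others. A threshold of 5 or 10 is a common rule of thumb rather than a fixed rule, so it is worth checking how sensitive the final feature set is to that choice.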