Student Dropout Risk Predictor

Option C: Machine Learning Final Project

Benjamin Hislop: 900896194

Project Overview

The goal of this project is to predict student dropout risk based on various demographic, socioeconomic, and academic factors. By identifying students at high risk early, institutions can intervene and provide necessary support.

This application uses a Random Forest Classifier to assign each student to one of three classes: Dropout, Enrolled, or Graduate.

Data Source

Dataset: Predict Students' Dropout and Academic Success (UCI Machine Learning Repository)

  • Total Features: 36 (Demographic, Socioeconomic, Macroeconomic, Academic)
  • Records: 4424 students

Key features include Marital Status, Course, Qualifications (Mother/Father), GDP, Unemployment Rate, Tuition Fees status, and Age at enrollment.

Model Architecture

Random Forest Classifier

We chose Random Forest because:

  • It effectively captures complex non-linear interactions between features.
  • It is more robust to overfitting than a single decision tree.
  • It works well on tabular data that mixes numeric and categorical features.

Performance: The model achieves about 81% accuracy on the test set and distinguishes Dropouts from Graduates well.
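The snippet below is a minimal sketch of how a three-class Random Forest like this one could be trained and evaluated with scikit-learn. It is not the project's actual training script. The data is synthetic (36 random numeric features with a made-up labelling rule), and the hyperparameters, split ratio, and variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset: 36 numeric features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 36))

# Hypothetical labelling rule so that the classes are learnable.
score = X[:, 0] + 0.5 * X[:, 1]
y = np.where(score < -0.5, "Dropout",
             np.where(score > 0.5, "Graduate", "Enrolled"))

# Hold out 20% of the rows, keeping class proportions similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Assumed hyperparameters; the project's real settings are not documented here.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Overall accuracy on the held-out rows.
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

# predict_proba columns follow model.classes_ (sorted label order),
# so look up the "Dropout" column rather than hard-coding its index.
dropout_col = list(model.classes_).index("Dropout")
dropout_risk = model.predict_proba(X_test)[:, dropout_col]
```

Using the per-class probability for Dropout, rather than only the predicted label, gives a continuous risk score that can be used to rank students.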

App Functionality

The web application provides two main modes of interaction:

Single Student Prediction

Fill out a form with a student's details to get an immediate risk assessment. The form validates each input before the record is passed to the model.
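As a rough illustration of the validation step, the sketch below checks a submitted form for missing values, wrong types, and out-of-range numbers. The field names, allowed ranges, and the `validate_student` function are hypothetical and are not taken from the app's source.

```python
# Hypothetical rules: field name -> (type, minimum, maximum).
RULES = {
    "age_at_enrollment": (int, 15, 100),
    "unemployment_rate": (float, 0.0, 100.0),
    "tuition_fees_up_to_date": (int, 0, 1),
}


def validate_student(form):
    """Return (cleaned_values, errors) for one submitted form.

    `form` maps field names to the raw strings typed by the user.
    """
    cleaned, errors = {}, []
    for field, (kind, low, high) in RULES.items():
        raw = str(form.get(field, "")).strip()
        if raw == "":
            errors.append(f"{field}: value is required")
            continue
        try:
            value = kind(raw)  # int("abc") or float("abc") raises ValueError
        except ValueError:
            errors.append(f"{field}: expected {kind.__name__}, got {raw!r}")
            continue
        if not low <= value <= high:
            errors.append(f"{field}: must be between {low} and {high}")
            continue
        cleaned[field] = value
    return cleaned, errors


# Example: a valid submission produces typed values and no errors.
ok_values, ok_errors = validate_student(
    {"age_at_enrollment": "19", "unemployment_rate": "10.8",
     "tuition_fees_up_to_date": "1"}
)
```

Collecting all errors instead of stopping at the first one lets the form show every problem to the user in a single pass.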

Batch CSV Processing

Upload a CSV file containing data for multiple students. The app will:

  • Validate the file format and columns.
  • Process all records using the trained model.
  • Return a ranked list of students, sorted by dropout risk.
  • Allow exporting the results for further analysis.
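The batch steps above can be sketched with the standard library alone. This is an illustrative outline, not the app's implementation: the column names, `rank_students`, `export_csv`, and the `toy_risk` scorer are all hypothetical. In the real app, the scorer would be the trained model's predicted Dropout probability.

```python
import csv
import io

# Hypothetical required columns; the real app expects the dataset's features.
REQUIRED_COLUMNS = ["student_id", "age_at_enrollment", "tuition_fees_up_to_date"]


def rank_students(csv_text, predict_risk):
    """Validate the CSV header, score every row, and sort by risk (highest first)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")

    ranked = [
        {"student_id": row["student_id"],
         "dropout_risk": round(predict_risk(row), 4)}
        for row in reader
    ]
    ranked.sort(key=lambda r: r["dropout_risk"], reverse=True)
    return ranked


def export_csv(ranked):
    """Serialize the ranked list back to CSV text for download."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["student_id", "dropout_risk"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(ranked)
    return buffer.getvalue()


def toy_risk(row):
    """Placeholder scorer standing in for the trained model."""
    return 0.9 if row["tuition_fees_up_to_date"] == "0" else 0.2


sample = (
    "student_id,age_at_enrollment,tuition_fees_up_to_date\n"
    "A1,19,1\n"
    "B2,24,0\n"
)
ranked = rank_students(sample, toy_risk)
exported = export_csv(ranked)
```

Passing the scorer in as a function keeps the file handling independent of the model, so the same code can be exercised with a placeholder like `toy_risk`.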