UK2007 Spam Detection Analysis

A machine learning school project focused on detecting web spam.

Screenshot

UK2007 Spam Detection Analysis screenshot

Tech Stack

R
RStudio
Markdown

Project Details

Analyzed three types of features (Direct, transformed Link-based, and Content-based) to determine the best predictors of web spam

Applied machine learning models, including Logistic Regression, Random Forest, and SVM, in R and evaluated them with cross-validation

Combined feature sets to test whether their performance gains are additive, and ranked models using AUC as the primary metric

Automated full report generation with RMarkdown, including plots, tables, and ROC curves for each classifier-feature combination

Provided domain-specific discussion on spam detection strategies and summarized insights into feature importance and classifier behavior
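
Example sketch

The evaluation loop described above (cross-validation, AUC ranking, feature set combinations) could look roughly like the base R sketch below. It is not the project's actual code: the data is synthetic, the column names (`link_a`, `content_a`, and so on) and the helpers `auc` and `cv_auc` are hypothetical, and only logistic regression is shown. Random Forest or SVM would slot into the same loop by swapping the model-fitting call.

```r
# Synthetic stand-in data: the real UK2007 features are NOT reproduced here.
set.seed(42)
n <- 400
spam <- rbinom(n, 1, 0.3)
toy <- data.frame(
  spam      = spam,
  link_a    = rnorm(n, mean = 1.0 * spam),  # hypothetical link-based feature
  link_b    = rnorm(n, mean = 0.5 * spam),  # hypothetical link-based feature
  content_a = rnorm(n, mean = 0.8 * spam),  # hypothetical content-based feature
  content_b = rnorm(n)                      # pure noise, carries no signal
)

# Rank-based AUC (Mann-Whitney formulation); tied scores get average ranks.
auc <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Mean k-fold cross-validated AUC of a logistic regression on the given columns.
cv_auc <- function(data, features, k = 5) {
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  form <- reformulate(features, response = "spam")
  fold_auc <- sapply(seq_len(k), function(i) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    fit <- glm(form, data = train, family = binomial())
    auc(predict(fit, newdata = test, type = "response"), test$spam)
  })
  mean(fold_auc)
}

# Individual feature sets plus their combination.
feature_sets <- list(
  link    = c("link_a", "link_b"),
  content = c("content_a", "content_b")
)
feature_sets$link_content <- c(feature_sets$link, feature_sets$content)

# Score every feature set and rank by cross-validated AUC.
results <- sapply(feature_sets, function(f) cv_auc(toy, f))
ranking <- sort(results, decreasing = TRUE)
print(round(ranking, 3))
```

Comparing the AUC of `link_content` against `link` and `content` alone is one simple way to check whether combining feature sets adds predictive value beyond the best single set.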