News Text Classification

This project builds a multi-class text classification pipeline that sorts AG News articles into four categories: World, Sports, Business, and Science/Technology. The system takes the AG News dataset from Kaggle, preprocesses the raw text, extracts TF-IDF features, and trains a Logistic Regression classifier. Evaluation covers accuracy, a classification report, a confusion matrix, and visualizations of the most frequent words per category.

Year: 2025

Service: ML Model

Category: NLP

Tool: scikit-learn


Description:

This project classifies news articles into one of four categories (World, Sports, Business, and Science/Technology) using Natural Language Processing (NLP) and machine learning. The workflow begins by preprocessing the article text: lowercasing, removing special characters, removing stopwords, and lemmatizing with NLTK.
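
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `clean_text` and `build_nltk_cleaner` are hypothetical, the regular expression is one possible definition of "special characters", and the stopword set and lemmatizer are passed in as arguments so the core function can run without downloading NLTK data.

```python
import re


def clean_text(text, stop_words, lemmatize):
    """Lowercase, strip non-letters, drop stopwords, and lemmatize each token.

    `stop_words` is any set of words to remove; `lemmatize` is any callable
    mapping a token to its base form. Both are assumptions of this sketch.
    """
    text = text.lower()
    # Replace anything that is not a lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    return " ".join(lemmatize(tok) for tok in tokens if tok not in stop_words)


def build_nltk_cleaner():
    """Wire clean_text to NLTK's English stopwords and WordNet lemmatizer.

    Needs the `nltk` package and a one-time download of its corpora,
    so the imports are kept inside this function.
    """
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    nltk.download("stopwords", quiet=True)
    nltk.download("wordnet", quiet=True)

    stop_words = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()
    return lambda text: clean_text(text, stop_words, lemmatizer.lemmatize)


# Toy demonstration with a tiny stopword set and no lemmatization.
demo = clean_text("Stocks RALLY, as the Fed holds rates!", {"as", "the"}, lambda t: t)
print(demo)
```

With NLTK installed, `cleaner = build_nltk_cleaner()` returns a function that could be applied to each article, for example through a Pandas `Series.apply` call.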

The cleaned text is transformed into numerical features using TF-IDF vectorization, which weights each word by how often it appears in an article and discounts words that are common across the whole dataset. A Logistic Regression model is then trained on these features to perform multi-class classification. Model performance is evaluated using accuracy, a classification report (per-class precision, recall, and F1), and a confusion matrix. Additionally, the most frequent words per category are visualized as Seaborn bar plots, showing which terms dominate each category.
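
As a rough sketch of the modeling and evaluation steps described above, the snippet below chains scikit-learn's `TfidfVectorizer` and `LogisticRegression` in a pipeline and computes the three metrics. The toy sentences, the label strings, and the `max_iter` setting are assumptions made for illustration; the real project trains on the AG News dataset with whatever settings its author chose.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.pipeline import make_pipeline

# Label names are illustrative; the dataset's own encoding may differ.
LABELS = ["World", "Sports", "Business", "Science/Technology"]

# Tiny made-up corpus standing in for the cleaned AG News text.
train_texts = [
    "government election minister treaty",
    "president summit border talks",
    "team win match goal season",
    "player coach league championship",
    "market shares profit investor",
    "company bank earnings stock",
    "software chip research satellite",
    "scientist internet computer space",
]
train_labels = [
    "World", "World", "Sports", "Sports",
    "Business", "Business", "Science/Technology", "Science/Technology",
]
test_texts = [
    "minister election talks",
    "coach team goal",
    "bank profit stock",
    "computer software research",
]
test_labels = LABELS

# TF-IDF features feed directly into a multi-class logistic regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)
pred = model.predict(test_texts)

acc = accuracy_score(test_labels, pred)
report = classification_report(test_labels, pred, labels=LABELS, zero_division=0)
cm = confusion_matrix(test_labels, pred, labels=LABELS)

print(f"accuracy: {acc:.2f}")
print(report)
print(cm)  # rows are true labels, columns are predicted labels
```

Wrapping both steps in one pipeline means the vectorizer's vocabulary is learned only from the training text, which avoids leaking test data into the features.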

The project demonstrates the complete lifecycle of an NLP application: from dataset preparation and text preprocessing to feature engineering, supervised model training, evaluation, and result visualization.
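
For the visualization step, the counts behind the per-category bar plots could be computed along these lines. The helper name `top_words_per_category` and the toy inputs are hypothetical, and the sketch assumes the text has already been cleaned and is whitespace-separated.

```python
from collections import Counter, defaultdict


def top_words_per_category(texts, labels, n=10):
    """Return {category: [(word, count), ...]} with the n most frequent words."""
    counters = defaultdict(Counter)
    for text, label in zip(texts, labels):
        counters[label].update(text.split())
    return {label: counter.most_common(n) for label, counter in counters.items()}


# Toy input standing in for cleaned articles and their categories.
texts = ["goal goal match", "goal match", "market market shares"]
labels = ["Sports", "Sports", "Business"]

top = top_words_per_category(texts, labels, n=2)
for category, pairs in top.items():
    print(category, pairs)
```

Each category's word and count lists can then be passed to Seaborn's `barplot` (for example, counts on the x axis and words on the y axis) to draw one chart per category.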

Key Highlights:

  • Problem: Automatically categorize news articles into predefined topics.

  • Approach: TF-IDF + Logistic Regression.

  • Dataset: AG News Dataset from Kaggle.

  • Deployment: Runs locally in Python; outputs include evaluation metrics and word-frequency plots.

Tools:

  • Python

  • Pandas

  • scikit-learn

  • NLTK

  • TfidfVectorizer (part of scikit-learn)

  • Matplotlib

  • Seaborn

  • tqdm

