JOB_POSTING_ANALYZER

Analyzed 2000+ job postings using Bag-of-Words NLP and K-Means clustering (no sklearn) | Python, Pandas, Matplotlib | Based on Data Science from Scratch (book by Joel Grus)

📊 Job Posting Analyzer

End-to-end data science project that analyzes 2000+ LinkedIn job postings to extract in-demand skills and cluster jobs by skill profile. The core algorithms are built from scratch using concepts from Data Science from Scratch by Joel Grus (O’Reilly).

🌐 Live Website · 📓 Open in Colab · 📦 Dataset


🚀 What This Project Does


📸 Output Visualizations

Charts (saved in OUTPUT/): Top 20 Skills · Word Cloud · Cluster Distribution · Skill Heatmap · Elbow Method · Category Breakdown
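The Cluster Distribution and Skill Heatmap charts summarize the clustering output. A minimal sketch of the numbers behind them, assuming each posting has already been turned into a 0/1 skill vector and given a cluster id (the function names and input shapes are hypothetical, not taken from the notebook):

```python
from collections import Counter
from typing import Dict, List


def cluster_sizes(assignments: List[int]) -> Dict[int, int]:
    """Number of postings per cluster id (the data behind a cluster pie chart)."""
    return dict(Counter(assignments))


def skill_share_by_cluster(vectors: List[List[float]],
                           assignments: List[int],
                           k: int) -> List[List[float]]:
    """Return a k x n_skills matrix: row i holds, for each skill, the fraction
    of cluster-i postings whose 0/1 vector mentions that skill."""
    n_skills = len(vectors[0])
    totals = [[0.0] * n_skills for _ in range(k)]
    sizes = [0] * k
    for vector, cluster in zip(vectors, assignments):
        sizes[cluster] += 1
        for j, value in enumerate(vector):
            totals[cluster][j] += value
    # Divide by cluster size; an empty cluster gets a row of zeros.
    return [[t / sizes[i] if sizes[i] else 0.0 for t in row]
            for i, row in enumerate(totals)]
```

The matrix can be passed to Seaborn's `heatmap` or Matplotlib's `imshow`. Normalizing by cluster size keeps large and small clusters comparable on the same color scale.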

🧠 Concepts from Data Science from Scratch

| Feature | Chapter | Concept |
|---|---|---|
| Counter, defaultdict | Ch. 1–2 | Python data structures |
| Vector math & distance | Ch. 4 | Linear algebra from scratch |
| Data loading & cleaning | Ch. 10 | Working with data |
| Bag of Words / NLP | Ch. 13, 21 | Text tokenization & frequency |
| K-Means clustering | Ch. 20 | Unsupervised learning from scratch |
| Matplotlib charts | Ch. 3 | Data visualization |
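The Counter and Bag of Words rows can be illustrated with a short sketch of skill extraction. It assumes a fixed skill vocabulary (`SKILLS` below is a hypothetical placeholder, not the project's actual list) and counts each skill at most once per posting:

```python
import re
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical skill vocabulary; the project's real list is not shown in this README.
SKILLS = {"python", "sql", "excel", "tableau", "aws", "spark", "pandas", "tensorflow"}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its distinct word tokens ('+' and '#' kept for c++/c#)."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def extract_skills(description: str, vocabulary: Set[str] = SKILLS) -> Set[str]:
    """Keep only the tokens that appear in the skill vocabulary."""
    return tokenize(description) & vocabulary


def top_skills(descriptions: List[str], n: int = 20) -> List[Tuple[str, int]]:
    """Count each skill once per posting and return the n most common."""
    counts = Counter(skill
                     for description in descriptions
                     for skill in extract_skills(description))
    return counts.most_common(n)


def to_vector(skills: Set[str], vocabulary: List[str]) -> List[float]:
    """Binary bag-of-words vector: 1.0 if the posting mentions the skill, else 0.0."""
    return [1.0 if skill in skills else 0.0 for skill in vocabulary]
```

Single-token matching misses multi-word skills such as "machine learning"; those would need phrase or n-gram matching on top of this. Passing a sorted vocabulary list to `to_vector` keeps vector positions stable across postings.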

✅ No scikit-learn. No PyTorch. No shortcuts. Tokenization, word counting, vector math, and K-Means are hand-coded; libraries are used only for data loading and plotting.
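To make "hand-coded" concrete, here is a minimal K-Means sketch that follows the shape of the `KMeans` class in the book's clustering chapter: pick random initial means, assign each point to its nearest mean, re-average, and stop when no assignment changes. The seed handling, `max_iter`, and the keep-the-old-mean rule for empty clusters are assumptions of this sketch; `squared_clustering_error` is the quantity an elbow plot graphs against k.

```python
import random
from typing import List

Vector = List[float]


def squared_distance(v: Vector, w: Vector) -> float:
    """Sum of squared component differences (Ch. 4 style)."""
    return sum((v_i - w_i) ** 2 for v_i, w_i in zip(v, w))


def vector_mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


class KMeans:
    def __init__(self, k: int, seed: int = 0) -> None:
        self.k = k
        self.seed = seed
        self.means: List[Vector] = []

    def classify(self, point: Vector) -> int:
        """Index of the nearest cluster mean."""
        return min(range(self.k),
                   key=lambda i: squared_distance(point, self.means[i]))

    def train(self, points: List[Vector], max_iter: int = 100) -> None:
        rng = random.Random(self.seed)
        # Start from k input points sampled without replacement.
        self.means = [list(p) for p in rng.sample(points, self.k)]
        assignments: List[int] = []
        for _ in range(max_iter):
            new_assignments = [self.classify(p) for p in points]
            if new_assignments == assignments:  # converged: no point moved
                return
            assignments = new_assignments
            for i in range(self.k):
                members = [p for p, a in zip(points, assignments) if a == i]
                if members:  # keep the previous mean if a cluster is empty
                    self.means[i] = vector_mean(members)


def squared_clustering_error(points: List[Vector], k: int, seed: int = 0) -> float:
    """Total squared distance from each point to its cluster mean."""
    model = KMeans(k, seed)
    model.train(points)
    return sum(squared_distance(p, model.means[model.classify(p)]) for p in points)
```

Computing `squared_clustering_error` for k = 1, 2, 3, ... and plotting the values gives an elbow curve; the k where the curve stops dropping sharply is a reasonable choice.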


🗺️ Project Pipeline

Raw Kaggle Data
      ↓
  Data Cleaning           ← Ch. 10
      ↓
 Skill Extraction         ← Ch. 13, 21  (Bag of Words NLP)
      ↓
Word Frequency Analysis   ← Ch. 1–2     (Counter, defaultdict)
      ↓
K-Means Clustering        ← Ch. 20      (from scratch)
      ↓
Visualizations + Job Recommender
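The README does not say how the Job Recommender scores matches. One plausible approach, sketched here purely as an assumption, ranks postings by Jaccard similarity between a user's skills and each posting's extracted skills (`recommend` and its `(title, skill set)` input format are hypothetical):

```python
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user_skills: Set[str],
              postings: List[Tuple[str, Set[str]]],
              top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank (title, skill set) pairs by skill overlap with the user, best first."""
    scored = [(title, jaccard(user_skills, skills)) for title, skills in postings]
    # sort is stable, so postings with equal scores keep their input order
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Jaccard similarity is a swapped-in choice for illustration; cosine similarity on the bag-of-words vectors, or recommending postings from the user's nearest K-Means cluster, would fit the same pipeline.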

🛠️ How to Run

Option 1: Run in Google Colab

Click the badge to open the notebook directly in Colab:

Open In Colab

Option 2: Run Locally

# 1. Clone the repo
git clone https://github.com/shivangisinha828-beep/JOB_POSTING_ANALYZER.git
cd JOB_POSTING_ANALYZER

# 2. Install dependencies
pip install -r requirements.txt

# 3. Add your Kaggle credentials (kaggle.json from your Kaggle account settings)
mkdir -p ~/.kaggle
cp kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json

# 4. Open the notebook
jupyter notebook NOTEBOOK/JOBPOSTINGANALYZER.ipynb

📦 Dataset


🔍 Key Results


📁 Repo Structure

JOB_POSTING_ANALYZER/
│
├── NOTEBOOK/
│   └── JOBPOSTINGANALYZER.ipynb   ← full analysis notebook
│
├── OUTPUT/
│   ├── Top Skills from Job Posting Analyzer.png
│   ├── Job Posting Analyzer Wordcloud.png
│   ├── Clusters Pie from Job Posting Analyzer.png
│   ├── Skill Heatmap from Job Posting Analyzer.png
│   ├── Job Posting Analyzer Elbow.png
│   └── Job Posting Analyzer Stacked.png
│
├── index.html                     ← project website
├── requirements.txt
├── .gitignore
└── README.md

🧰 Tech Stack

| Tool | Purpose |
|---|---|
| Python 3 | Core language |
| Pandas | Data loading only |
| Matplotlib + Seaborn | Visualizations |
| WordCloud | Text visualization |
| Google Colab | Notebook environment |
| Kaggle API | Dataset source |
| ❌ No scikit-learn | Everything from scratch |

📚 Reference

Book: Data Science from Scratch by Joel Grus (O’Reilly, 2nd Edition)


👤 Author

Shivangi Sinha
📚 Currently reading: Data Science from Scratch by Joel Grus
🔗 GitHub


📄 License

This project is licensed under the MIT License.