Analyzed 2000+ job postings using Bag-of-Words NLP and K-Means clustering (no sklearn) | Python, Pandas, Matplotlib | Based on Data Science from Scratch (book by Joel Grus)
End-to-end data science project analyzing 2000+ LinkedIn job postings to extract in-demand skills and cluster jobs by skill profiles, built entirely from scratch using concepts from Data Science from Scratch by Joel Grus (O'Reilly).
Live Website · Open in Colab · Dataset
| Top 20 Skills | Word Cloud |
|---|---|
| ![]() | ![]() |

| Cluster Distribution | Skill Heatmap |
|---|---|
| ![]() | ![]() |

| Elbow Method | Category Breakdown |
|---|---|
| ![]() | ![]() |
| Feature | Chapter | Concept |
|---|---|---|
| Counter, defaultdict | Ch. 1–2 | Python data structures |
| Vector math & distance | Ch. 4 | Linear algebra from scratch |
| Data loading & cleaning | Ch. 10 | Working with data |
| Bag of Words / NLP | Ch. 13, 21 | Text tokenization & frequency |
| K-Means clustering | Ch. 19 | Unsupervised learning from scratch |
| Matplotlib charts | Ch. 3 | Data visualization |
No scikit-learn. No PyTorch. No shortcuts. Every algorithm is hand-coded.
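As an illustration of the Bag of Words and `Counter` rows in the table above, here is a minimal sketch of skill extraction and frequency counting. The `SKILLS` vocabulary, the function names, and the sample postings are all hypothetical; the notebook's actual skill list and tokenizer are not shown in this README and may differ.

```python
import re
from collections import Counter

# Hypothetical skill vocabulary; the project's real list is not shown here.
SKILLS = {"python", "sql", "excel", "machine learning", "tableau"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (keeps + and # for c++, c#)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())

def extract_skills(description, skills=SKILLS):
    """Return the set of known skills mentioned in one job description."""
    tokens = tokenize(description)
    # Check unigrams and bigrams so two-word skills like "machine learning" match.
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return {term for term in tokens + bigrams if term in skills}

def skill_frequencies(descriptions):
    """Count how many postings mention each skill (one count per posting)."""
    counts = Counter()
    for description in descriptions:
        counts.update(extract_skills(description))
    return counts

# Made-up postings, for illustration only.
postings = [
    "Data Analyst: SQL, Excel and Tableau required.",
    "ML Engineer with Python and machine learning experience.",
    "Analyst role. Python, SQL.",
]
counts = skill_frequencies(postings)
top_skills = counts.most_common(3)
```

Because each posting contributes a set of skills, a skill repeated several times in one description is still counted once, which keeps the totals comparable to "number of postings mentioning this skill".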
Raw Kaggle Data
↓
Data Cleaning → Ch. 10
↓
Skill Extraction → Ch. 13, 21 (Bag of Words NLP)
↓
Word Frequency Analysis → Ch. 1 (Counter, defaultdict)
↓
K-Means Clustering → Ch. 19 (From scratch!)
↓
Visualizations + Job Recommender
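The clustering step in the pipeline above can be sketched in plain Python, in the spirit of the book's from-scratch approach. This is a rough illustration, not the notebook's code: the function names, the fixed seed, and the toy 2-D points are assumptions. In the project each point would presumably be a posting's skill vector. The `inertia` helper shows the quantity typically plotted for an elbow chart.

```python
import random

def squared_distance(v, w):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((vi - wi) ** 2 for vi, wi in zip(v, w))

def vector_mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    assignments = None
    for _ in range(iterations):
        # Assign every point to the index of its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing changed, so the clustering has converged
        assignments = new_assignments
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = vector_mean(members)
    return centroids, assignments

def inertia(points, centroids, assignments):
    """Total squared distance from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))

# Toy data: two well-separated groups (illustrative only).
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, assignments = k_means(points, k=2)

# Elbow method: compute inertia for several k and look for the bend.
elbow = {}
for k in range(1, 5):
    cs, assigned = k_means(points, k)
    elbow[k] = inertia(points, cs, assigned)
```

Plotting `elbow` values against `k` gives the kind of curve shown in the Elbow Method chart. The `k` after which the inertia stops dropping sharply is a common heuristic choice.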
Click the badge to open directly in Colab:
# 1. Clone the repo
git clone https://github.com/shivangisinha828-beep/JOB_POSTING_ANALYZER.git
cd JOB_POSTING_ANALYZER
# 2. Install dependencies
pip install -r requirements.txt
# 3. Add your Kaggle credentials
# Place kaggle.json in ~/.kaggle/
# 4. Open the notebook
jupyter notebook NOTEBOOK/JOBPOSTINGANALYZER.ipynb
JOB_POSTING_ANALYZER/
│
├── NOTEBOOK/
│   └── JOBPOSTINGANALYZER.ipynb ← full analysis notebook
│
├── OUTPUT/
│   ├── Top Skills from Job Posting Analyzer.png
│   ├── Job Posting Analyzer Wordcloud.png
│   ├── Clusters Pie from Job Posting Analyzer.png
│   ├── Skill Heatmap from Job Posting Analyzer.png
│   ├── Job Posting Analyzer Elbow.png
│   └── Job Posting Analyzer Stacked.png
│
├── index.html ← project website
├── requirements.txt
├── .gitignore
└── README.md
| Tool | Purpose |
|---|---|
| Python 3 | Core language |
| Pandas | Data loading only |
| Matplotlib + Seaborn | Visualizations |
| WordCloud | Text visualization |
| Google Colab | Notebook environment |
| Kaggle API | Dataset source |
| No scikit-learn | Everything from scratch |
Book: Data Science from Scratch by Joel Grus (O'Reilly, 2nd Edition)
Shivangi Sinha · Currently reading: Data Science from Scratch by Joel Grus · GitHub
This project is licensed under the MIT License.