I break things, read source code,
and ship fixes upstream.

Data engineer in food manufacturing. I build pipelines, compliance dashboards, and local AI tools for factories where data can't leave the building.

$ cat ./what-i-build

OpsMind: on-prem AI for factories. NL-to-SQL, RAG search, 36 tests, runs on Ollama.
Compliance Dashboard: BRC/HACCP. 674 batches, 8,640 temp readings, z-score anomaly detection.
UK Crime Pipeline: API → PostgreSQL → dbt → Streamlit. 99,675 records, 53 tests, 3 CI/CD workflows.
SQL Ops Reviewer: GitHub Action. Reviews .sql in PRs with local AI. One YAML file.

$ git log --oneline --author=pawan | wc -l
11 projects · 200K+ combined stars · 3 merged PRs · 10 open PRs

Projects

Things I built that are running in production or deployed publicly.

OpsMind

On-premises AI assistant for food manufacturing. Ask your database questions in English — the LLM generates SQL, executes it, and explains the result. Upload SOPs and search them with RAG. Runs entirely on Ollama, no data leaves the network.

36 tests · 7 modules · 147-table schema registry · SQLite + SQL Server
Python · Ollama · ChromaDB · LangChain · Streamlit · SQLAlchemy · Pytest
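
A minimal sketch of the question → SQL → result loop described for OpsMind, not the project's actual code. It assumes SQLite, a hypothetical `generate_sql` callable standing in for the Ollama-backed model, and a deliberately strict guard that only lets a single SELECT through. The `batches` table and its rows are made up for illustration, and the final "explain the result" step (a second model call) is left out.

```python
import sqlite3
from typing import Callable, List, Tuple


def is_read_only(sql: str) -> bool:
    """Accept only a single SELECT statement (assumed safety rule)."""
    stripped = sql.strip().rstrip(";").strip()
    return stripped.lower().startswith("select") and ";" not in stripped


def ask(
    question: str,
    conn: sqlite3.Connection,
    generate_sql: Callable[[str, str], str],
) -> Tuple[str, List[tuple]]:
    # Hand the model the table definitions so it can ground column names.
    schema = "\n".join(
        row[0]
        for row in conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
        )
    )
    sql = generate_sql(question, schema)
    if not is_read_only(sql):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    return sql, conn.execute(sql).fetchall()


def fake_llm(question: str, schema: str) -> str:
    # Hypothetical stand-in for the local model; returns a canned query.
    return "SELECT COUNT(*) FROM batches WHERE status = 'on_hold'"


# Illustrative in-memory data, not taken from any real factory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batches (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO batches (status) VALUES (?)",
    [("released",), ("on_hold",), ("on_hold",)],
)

sql, rows = ask("How many batches are on hold?", conn, fake_llm)
print(sql, rows)
```

Validating the generated SQL before execution keeps a bad or adversarial generation from modifying data; a read-only database user would be a stronger version of the same idea.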

Manufacturing Compliance Dashboard

BRC/HACCP food safety compliance. Trace any batch from raw material to customer despatch in seconds. Temperature anomaly detection with rolling z-scores. Allergen matrix. One-click PDF audit reports. Built by someone on the factory floor.

674 batches · 8,640 temp readings · z-score anomaly detection · PySpark batch analytics
Live · Streamlit · PySpark · Databricks · Plotly · Docker
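
The rolling z-score check mentioned above can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the dashboard's implementation: the window of 12 readings, the threshold of 3.0, and the sample temperatures are all invented for the example.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_anomalies(readings, window=12, threshold=3.0):
    """Return (index, value, z) for readings far from the trailing-window mean."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(readings):
        # Only score once a full window of earlier readings is available.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0:
                z = (value - mu) / sigma
                if abs(z) > threshold:
                    anomalies.append((i, value, round(z, 2)))
        # The reading joins the window after scoring, flagged or not.
        history.append(value)
    return anomalies


# Made-up chiller temperatures in °C with one obvious spike.
temps = [3.0, 3.1, 2.9, 3.0, 3.2, 3.1, 2.9, 3.0, 3.1, 3.0, 2.9, 3.1, 7.5, 3.0]
anomalies = rolling_zscore_anomalies(temps)
print(anomalies)
```

Scoring each reading against the window that precedes it keeps a spike from inflating its own baseline. Whether flagged readings should then be excluded from later windows is a design choice this sketch does not make.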

UK Crime Pipeline

End-to-end data pipeline. Ingests live crime data from the Police UK API, transforms with dbt (staging/marts, 53 tests), orchestrates with Prefect, serves a public Streamlit dashboard. 3 GitHub Actions workflows: CI, weekly auto-ingest, daily health checks.

99,675 records · 10 cities · 53 dbt tests · 3 CI/CD workflows · weekly auto-ingest
Live · Python · PostgreSQL · dbt · Prefect · Streamlit · GitHub Actions
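
A rough sketch of what the ingest step of such a pipeline might look like: build a street-level crime request for the Police UK API, then flatten each nested record into a row ready for a staging table. The column names, the helper functions, and the sample record are assumptions for illustration; the project's actual schema and Prefect flow are not shown here.

```python
from urllib.parse import urlencode

# Street-level crime endpoint of the public Police UK API.
BASE_URL = "https://data.police.uk/api/crimes-street/all-crime"


def build_url(lat, lng, month):
    """Build the request URL for one location and one YYYY-MM month."""
    return f"{BASE_URL}?{urlencode({'lat': lat, 'lng': lng, 'date': month})}"


def flatten_crime(record):
    """Turn one nested API record into a flat row (column names are assumed)."""
    location = record.get("location") or {}
    street = location.get("street") or {}
    outcome = record.get("outcome_status") or {}  # can be null in responses
    return {
        "crime_id": record.get("id"),
        "category": record.get("category"),
        "month": record.get("month"),
        "latitude": float(location["latitude"]) if location.get("latitude") else None,
        "longitude": float(location["longitude"]) if location.get("longitude") else None,
        "street_name": street.get("name"),
        "outcome": outcome.get("category"),
    }


# Illustrative record shaped like an API response; the values are invented.
sample = {
    "id": 1,
    "category": "burglary",
    "month": "2024-01",
    "location": {
        "latitude": "52.4862",
        "longitude": "-1.8904",
        "street": {"id": 10, "name": "On or near High Street"},
    },
    "outcome_status": None,
}

url = build_url(52.4862, -1.8904, "2024-01")
row = flatten_crime(sample)
print(url)
print(row)
```

Flattening at ingest keeps the raw table simple, so typing, deduplication, and tests can live in the dbt staging models.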

SQL Ops Reviewer

GitHub Action that auto-reviews .sql files in pull requests using local AI. Catches injection risks, performance anti-patterns, style violations. Posts structured review comments with fix suggestions. One YAML file to set up.

10 analysis categories · phi3:mini default · zero API keys · runs on CI runner
Python · Ollama · GitHub Actions
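
To make the "local AI on the CI runner" idea concrete, here is a hedged sketch of sending one .sql file to a locally running Ollama server through its `/api/generate` endpoint. The prompt wording, the function names, and the example file are assumptions, and only the three review categories named above are included; this is not the Action's actual prompt or code.

```python
import json
from urllib import request

# Default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_review_payload(path, sql, model="phi3:mini"):
    """Build the JSON body for a non-streaming generate request."""
    prompt = (
        "You are reviewing a SQL file from a pull request.\n"
        "Report injection risks, performance anti-patterns, and style violations.\n"
        "For each finding give the line, the problem, and a suggested fix.\n\n"
        f"File: {path}\n--- SQL START ---\n{sql}\n--- SQL END ---"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def review_sql(path, sql, url=OLLAMA_URL):
    """POST the payload to Ollama and return the generated review text."""
    body = json.dumps(build_review_payload(path, sql)).encode("utf-8")
    req = request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]


# Build the payload only; calling review_sql() needs a running Ollama server.
payload = build_review_payload("queries/orders.sql", "SELECT * FROM orders")
print(payload["model"], len(payload["prompt"]))
```

Because the model runs on the runner itself, the only network call is to localhost, which is what makes the "zero API keys" setup possible.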

MediAsk

Health Q&A platform for factory workers. NHS-verified guidance, Gemini AI responses, voice input, 18 languages. Flask + PostgreSQL, Dockerised.

Live · Flask · PostgreSQL · Gemini AI · Docker

UK Education Attainment

ML analysis of UK A-Level attainment gaps across ethnicity, gender, and deprivation. Feature importance with XGBoost.

Python · XGBoost · Scikit-Learn

Open Source Contributions

I fix bugs and ship features in the tools I depend on. 11 projects, 200K+ combined stars.

Project (stars): Contribution
vllm (75K): Improved DCP/PCP error messages with actionable backend guidance
Apache Superset (65K): Renamed supersetCanCSV → supersetCanDownload across frontend
pandas (45K): Clarified str.cat() return type docs for Index
ChromaDB (18K): Version compat check + 220-line HNSW tuning guide
Plotly (17K): Dependabot config for uv.lock
dbt-core (10K): Removed unnecessary profiler context manager arg
dlt (7K): Migrated flake8 config to ruff
ollama-python (5K): Added exists() method + fixed ShowResponse ValidationError
drt: Snowflake connector (290 lines, tests) + Dockerfile + pre-commit hooks (merged)
fpdf2 (1K): Fixed TextRegion.ln() double line break

Stack

Tools I use daily.

Languages

Python · SQL · PySpark · JavaScript

Data

dbt · Airflow · Prefect · PostgreSQL · Databricks · Delta Lake

AI / ML

Ollama · ChromaDB · LangChain · Scikit-Learn · XGBoost

Infra

Docker · Terraform · GitHub Actions · AWS · Azure

Education

MSc Data Analytics — Aston University, Birmingham
BCA (Computer Applications) — Amity University, Noida

Certifications

PL-300 — Microsoft Power BI Data Analyst
Google — Data Analytics Professional Certificate
Azure — Data Engineering (DP-203 path)
AWS — Cloud Practitioner