I break things, read source code,
and ship fixes upstream.

Data engineer in food manufacturing. I build pipelines, compliance dashboards, and local AI tools for factories where data can't leave the building.

pawankapkoti3889@gmail.com · LinkedIn · GitHub · Yorkshire, UK

$ cat ./what-i-build

OpsMind — on-prem AI for factories. NL-to-SQL, RAG search, 36 tests, runs on Ollama
Compliance Dashboard — BRC/HACCP. 674 batches, 8640 temp readings, z-score anomaly detection
UK Crime Pipeline — API → PostgreSQL → dbt → Streamlit. 99,675 records, 53 tests, 3 CI/CD
SQL Ops Reviewer — GitHub Action. Reviews .sql in PRs with local AI. One YAML file.

$ git log --oneline --author=pawan | wc -l
11 projects · 200K+ combined stars · 3 merged PRs · 10 open PRs

Projects

Things I built that are running in production or deployed publicly.

OpsMind

Code Docs

On-premises AI assistant for food manufacturing. Ask your database questions in English — the LLM generates SQL, executes it, and explains the result. Upload SOPs and search them with RAG. Runs entirely on Ollama, no data leaves the network.

36 tests · 7 modules · 147-table schema registry · SQLite + SQL Server

Python Ollama ChromaDB LangChain Streamlit SQLAlchemy Pytest
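
For illustration, the ask-in-English flow described above could be sketched roughly like this. This is not the OpsMind code: the `batches` table, `is_read_only`, `answer_question`, and `fake_llm` are all hypothetical names, and the model call is replaced by a stub so the example runs without Ollama. It only shows the shape of the loop, which is to generate SQL, refuse anything that is not a single SELECT, and then execute.

```python
import sqlite3


def is_read_only(sql: str) -> bool:
    """Accept a single SELECT statement; reject everything else (conservative)."""
    stripped = sql.strip().rstrip(";").strip()
    return stripped.lower().startswith("select") and ";" not in stripped


def answer_question(conn, question, generate_sql):
    """Turn a question into SQL via generate_sql, run it, return (sql, rows)."""
    sql = generate_sql(question)
    if not is_read_only(sql):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    rows = conn.execute(sql).fetchall()
    return sql, rows


def fake_llm(question):
    """Hypothetical stand-in for the local model call."""
    return "SELECT COUNT(*) FROM batches WHERE status = 'held';"


# Hypothetical demo schema, not the real schema registry.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batches (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO batches (status) VALUES (?)",
    [("released",), ("held",), ("held",)],
)

sql, rows = answer_question(conn, "How many batches are on hold?", fake_llm)
print(sql, rows)
```

Passing the generator in as a plain callable keeps the guard and the execution step testable without a model running.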

Manufacturing Compliance Dashboard

Code Live

BRC/HACCP food safety compliance. Trace any batch from raw material to customer despatch in seconds. Temperature anomaly detection with rolling z-scores. Allergen matrix. One-click PDF audit reports. Built by someone on the factory floor.

674 batches · 8,640 temp readings · z-score anomaly detection · PySpark batch analytics

Live Streamlit PySpark Databricks Plotly Docker
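
As a rough sketch of what rolling z-score anomaly detection on temperature readings can look like (standard library only; the window size, threshold, and sample readings are assumptions for illustration, not values from the dashboard): each reading is compared against the mean and standard deviation of the readings before it, and flagged when it sits too many standard deviations away.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_anomalies(readings, window=12, threshold=3.0):
    """Return (index, value, z) for readings far from the trailing-window mean."""
    history = deque(maxlen=window)  # holds only the previous `window` readings
    anomalies = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0:  # skip flat windows to avoid dividing by zero
                z = (value - mu) / sigma
                if abs(z) > threshold:
                    anomalies.append((i, value, round(z, 2)))
        history.append(value)
    return anomalies


# Made-up chiller readings in degrees C with one spike at index 12.
temps = [4.0, 4.1, 3.9, 4.0, 4.2, 3.8, 4.1, 4.0, 3.9, 4.1, 4.0, 4.2, 9.5, 4.0]
anomalies = rolling_zscore_anomalies(temps)
print(anomalies)
```

The current reading is excluded from its own window, so a spike cannot inflate the baseline it is judged against. It does enter the window for the readings that follow, which is a known trade-off of this simple form.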

UK Crime Pipeline

Code Live

End-to-end data pipeline. Ingests live crime data from the Police UK API, transforms with dbt (staging/marts, 53 tests), orchestrates with Prefect, serves a public Streamlit dashboard. 3 GitHub Actions workflows: CI, weekly auto-ingest, daily health checks.

99,675 records · 10 cities · 53 dbt tests · 3 CI/CD workflows · weekly auto-ingest

Live Python PostgreSQL dbt Prefect Streamlit GitHub Actions

SQL Ops Reviewer

Code

GitHub Action that auto-reviews .sql files in pull requests using local AI. Catches injection risks, performance anti-patterns, style violations. Posts structured review comments with fix suggestions. One YAML file to set up.

10 analysis categories · phi3:mini default · zero API keys · runs on CI runner

Python Ollama GitHub Actions

MediAsk

Code Live

Health Q&A platform for factory workers. NHS-verified guidance, Gemini AI responses, voice input, 18 languages. Flask + PostgreSQL, Dockerised.

Live Flask PostgreSQL Gemini AI Docker

UK Education Attainment

Code

ML analysis of UK A-Level attainment gaps across ethnicity, gender, and deprivation. Feature importance with XGBoost.

Python XGBoost Scikit-Learn

Open Source Contributions

I fix bugs and ship features in the tools I depend on. 11 projects, 200K+ combined stars.

Project (stars)	Contribution
vllm 75K	Improved DCP/PCP error messages with actionable backend guidance
Apache Superset 65K	Renamed supersetCanCSV → supersetCanDownload across frontend
pandas 45K	Clarified str.cat() return type docs for Index
ChromaDB 18K	Version compat check + 220-line HNSW tuning guide
Plotly 17K	Dependabot config for uv.lock
dbt-core 10K	Removed unnecessary profiler context manager arg
dlt 7K	Migrated flake8 config to ruff
ollama-python 5K	Added exists() method + fixed ShowResponse ValidationError
drt	Snowflake connector (290 lines, tests) + Dockerfile + pre-commit hooks — merged
fpdf2 1K	Fixed TextRegion.ln() double line break

Stack

Tools I use daily.

Languages

Python SQL PySpark JavaScript

Data

dbt Airflow Prefect PostgreSQL Databricks Delta Lake

AI / ML

Ollama ChromaDB LangChain Scikit-Learn XGBoost

Infra

Docker Terraform GitHub Actions AWS Azure

Education

MSc Data Analytics — Aston University, Birmingham

BCA (Computer Applications) — Amity University, Noida

Certifications

PL-300 — Microsoft Power BI Data Analyst

Google — Data Analytics Professional Certificate

Azure — Data Engineering (DP-203 path)

AWS — Cloud Practitioner
