Neel Shah
About

Building at the frontier of AI, data & health

Computer Engineer specialising in LLM deployment and data systems. Based in Ottawa, Canada. 10+ years turning complex data into real-world impact.

I'm Neel Shah — a Computer Engineer with deep roots in data engineering, NLP, and healthcare informatics, now focused on the rapidly evolving world of Large Language Models. My work spans building scalable data infrastructure, applying ML to health and social data, and integrating LLMs into production systems.

As Tech Lead at CIHI (Canadian Institute for Health Information), I lead large-scale PySpark pipelines processing over 1 billion Canadian health data points (covering all national registry, diagnosis, and pharma data) for government and NPO clients. I manage client relationships, lead the engineering team end-to-end through the SDLC, and operate across Azure, AWS, and Databricks. Working with PII at national population scale in a PIPEDA-regulated environment is a daily reality, so data governance and privacy compliance are non-negotiable. At EXL Service, I built PySpark credit risk platforms for Goldman Sachs, powering risk decisioning for Apple Card, Walmart Card, and GM Card. Previously, as a researcher at Lakehead University, I published peer-reviewed work in NLP and distributed data systems that has accumulated 89+ citations.

Today, my focus is on the AI layer: integrating Claude, GPT, and open-weight models into data workflows, deploying local LLMs for privacy-sensitive environments, and building the AI-ready datasets that make these systems actually work. I believe the next decade of impact will be won by engineers who can bridge raw data and LLM capabilities.

I also created emot — an early open source contribution that grew to 1M+ downloads. It's a reminder that the best tools solve one thing really well.

Originally from Vadodara, India, where I graduated 1st in my engineering class, I moved to Canada for graduate studies and have since contributed to both the tech community and volunteer AI initiatives.

Quick facts
  • 📍 Ottawa, Ontario, Canada
  • 🏢 CIHI (current)
  • 🎓 Lakehead University
  • 💻 10+ years of experience
  • 📄 3 research papers · 89+ citations
  • 📦 1M+ open source downloads
  • 🌍 5 languages
Languages
  • English · Native
  • Hindi · Native
  • Gujarati · Native
  • French · Elementary
  • Sanskrit · Limited
Primary Expertise

AI & LLM Skills

🤖

LLM Integration

Claude API (Anthropic) · OpenAI / GPT-4 · Codex · Prompt Engineering · RAG Pipelines · LangChain · Vector Databases · Function Calling · Embeddings
🏠

Local LLM Deployment

Ollama · LM Studio · llama.cpp · Mistral · Llama 3 · Phi-3 · Gemma · On-prem deployment · Edge AI · Privacy-first AI
📦

AI-Ready Data Generation

Dataset curation · Data labelling · Synthetic data · Fine-tuning prep · RLHF data · Annotation pipelines · Quality filtering · Deduplication
🧠

ML & NLP

scikit-learn · Random Forest · Text Classification · Sentiment Analysis · Social Media Mining · Elasticsearch · Kibana · Pandas · NLTK

Other Technical Skills

Big Data
PySpark · Apache Spark · Databricks · Large-scale ETL · Data lakes
Cloud
Azure · AWS · Databricks · Managed Spark · Cloud pipelines
Programming Languages
Python · SQL · Kotlin · JavaScript · Bash
Domains
Healthcare · Financial Services · Credit Risk · PII / PIPEDA · Government & NPO

Experience

Canadian Institute for Health Information (CIHI) · Current
Tech Lead
2019 – Present · Ottawa, Ontario, Canada

Leads large-scale PySpark pipelines processing 1B+ Canadian health data points (registry, diagnosis, pharma) for government and NPO clients. Manages client relationships, leads the engineering team end-to-end through the full SDLC, and handles PII at national scale under PIPEDA and provincial privacy legislation.

EXL Service
Data Engineer — Financial Risk Systems
May 2022 – Jul 2023 · Ottawa, ON

Built PySpark-based credit risk management platforms for Goldman Sachs — covering Apple Card, Walmart Card, and GM Card portfolios. Processed high-volume financial transaction data with strict PII, audit, and regulatory compliance requirements.

Lakehead University
Graduate Researcher
2017 – 2019 · Thunder Bay, Ontario, Canada

Published 3 peer-reviewed papers on NLP and distributed social media analytics. Developed ML models and Elasticsearch-based pipelines for large-scale health data analysis.

IDLI — Indian Deep Learning Initiative
Data Research Analyst & Web Developer (Volunteer)
Feb 2017 – Jan 2020 · Remote

Contributed to deep learning research, data analysis, and web infrastructure for an Indian AI research initiative.

Education

🎓
Lakehead University
Graduate Studies · Computer Science
2017–2019 · Thunder Bay, ON, Canada
🎓
Parul Institute of Engineering
Bachelor of Engineering
2010–2014 · Vadodara, India
🏆 1st Rank — Excellence Certificate