Neel Shah

Tech Lead · Senior Data Engineer · AI/LLM Specialist · Researcher

📍 Ottawa, ON, Canada 📞 +1-343-574-1888 ✉️ neel@neelshah18.com 🌐 neelshah18.com 🔗 LinkedIn 🐙 GitHub 📚 Scholar

Profile

Tech Lead and Senior Data Engineer with 10+ years of engineering experience, including large-scale PySpark systems in two regulated, high-stakes industries: national health infrastructure and financial services credit risk. At CIHI, leads the engineering of pipelines processing 1B+ Canadian health data points (registry, diagnosis, pharma) for government and NPO clients; previously at EXL / Goldman Sachs, built PySpark credit risk platforms handling Apple Card, Walmart Card, and GM Card portfolios at 1M transactions/hour, saving clients $10M+. Cloud-native across Azure, AWS, and Databricks. Now applying this foundation to LLM deployment and AI integration: RAG pipelines, local model deployment for privacy-first workloads, and AI-ready data curation. Dual Master's degrees (M.Sc. Computer Science, GPA 3.9/4.0), published researcher with 89+ citations, and open source creator with 1M+ downloads.

10+ years of experience · 1B+ health data points · 1M transactions/hour · $10M+ client savings · 89+ citations · 1M+ OSS downloads

AI & LLM Expertise

LLM Integration
  • Claude API (Anthropic)
  • OpenAI / GPT-4
  • Prompt Engineering
  • RAG pipelines
  • Function Calling
  • Embeddings
  • LangChain
Local LLM Deployment
  • Ollama
  • LM Studio
  • llama.cpp
  • Mistral
  • Llama 3
  • Phi-3
  • On-prem / PII-safe AI
AI-Ready Data
  • Dataset curation
  • Synthetic data
  • Fine-tuning prep
  • RLHF data
  • Annotation pipelines
  • Health data for AI
Big Data & Cloud
  • PySpark
  • Apache Spark
  • Databricks
  • Azure
  • AWS
  • Large-scale ETL
  • Data governance

Technical Skills

Languages
Python · SQL · R · Rust · Go · Kotlin · JavaScript · Bash
Big Data
PySpark · Apache Spark · Databricks · Elasticsearch · ETL · Data lakes
Cloud
Azure · AWS · Databricks · Docker · CI/CD · GitHub Actions
APIs & Web
FastAPI · Flask · REST · MongoDB · PostgreSQL · Microservices
Data & ML
Pandas · NumPy · SciPy · scikit-learn · Random Forest · NLP · NLTK
Viz & Reporting
Power BI · Kibana · Real-time dashboards
Practices
Agile · Scrum · TDD · Code review · Technical leadership · SDLC
Domains
Healthcare · Financial Services · Credit Risk · PII / PIPEDA · Government & NPO

Experience

Canadian Institute for Health Information (CIHI)
Tech Lead
July 2023 – Present
Ottawa, ON
PySpark · Python · ETL · Azure · Health Data · Agile
  • Led the transformation of the R&D flagship product "CMG Grouper" from SAS to Python and PySpark — now processing up to 24 million records with 200+ parameters in under 60 minutes
  • Engineered and deployed an end-to-end ETL pipeline for CMG Grouper, ingesting diverse data types from all Canadian hospitals at population scale (1B+ data points across registry, diagnosis, and pharma datasets)
  • Serve government (federal & provincial) and NPO health clients — leading requirements gathering, client relationship management, and delivery across multiple concurrent engagements
  • Lead and mentor a cross-functional engineering team through the full SDLC: architecture, development, code review, testing, deployment, and ongoing maintenance in an agile environment
  • Define and enforce health information privacy and security standards for highly sensitive PII health data, ensuring rigorous compliance with PIPEDA and provincial health privacy legislation
  • Drive project roadmaps and stakeholder communication — accountable for milestones, quality, and outcomes across the full programme
EXL Service (embedded at Goldman Sachs)
Senior Consultant
May 2022 – July 2023
Ottawa, ON
PySpark · Python · FastAPI · Credit Risk · Financial Services
  • Built and owned the core credit card policy engine using Python, PySpark, and FastAPI — handling Apple Card, Walmart Card, and GM Card fraud and credit risk portfolios
  • Designed fraud and credit policy systems capable of processing 1 million financial transactions per hour with full PII compliance and regulatory audit trails
  • Architected large-scale credit risk management REST API handling 1,000+ application requests per minute with sub-second latency
  • Resolved multiple P0 production incidents, saving client companies at least $10 million in combined risk exposure
  • Built a test automation framework in Python using PyUnit (unittest) that reduced end-to-end testing time by 60%
  • Led technical requirements gathering, tech stack evaluation, and key engineering hiring decisions for the Goldman Sachs engagement
Canopy Growth Corporation
Web System Architect
July 2021 – April 2022
Ottawa, ON
Python · FastAPI · Docker · AWS · MongoDB · Microservices
  • Led the web development team through a full waterfall-to-agile transformation, improving delivery efficiency and product quality across 42 company websites
  • Designed and deployed a high-throughput microservice using Python, FastAPI, Docker, and AWS capable of handling 100,000 requests per hour
  • Developed AEM virtualization of Dev and Production environments for all 42 websites using Docker, Python, and AWS, delivering US$5 million in annual cost savings
  • Built a WCAG accessibility analysis tool covering all 42 company websites using Python, REST API, and SQL, ensuring compliance and improving user experience
  • Designed information architecture for KPI tracking and APIs using Python 3, REST API, and MongoDB across the full Canopy Growth digital estate
  • Performed root cause analysis on multiple infrastructure incidents, improving system stability and availability to 99%+
Manulife
Senior Python Developer
August 2020 – July 2021
Waterloo, ON
Azure · Python · Power BI · CI/CD · Cloud Infrastructure
  • Built, maintained, and scaled Azure cloud infrastructure of 1,800+ servers (Windows and Linux) with 99.99% uptime SLA
  • Developed automation scripts for server monitoring, patching, and maintenance using Python, REST API, and Azure
  • Built real-time Power BI dashboards for Azure infrastructure monitoring, enabling data-driven operations for the platform team
  • Developed and maintained Python scripts for AXIS and Moody's distributed computational environment — reducing operational cost by 5%
  • Automated the CI/CD pipeline using Python, Docker, and Git, reducing debugging time by 45 minutes through automated Azure environment testing
SITA
Python Developer
September 2019 – June 2020
Montreal, QC
Python · Azure · Microservices · Power BI · Real-time Systems
  • Designed and developed a real-time airport analytics system integrating multiple hardware sensors (LiDAR, cameras) using Python and reactive programming for operational intelligence
  • Led migration of a large-scale Python 2 codebase to Python 3, modernising core airport systems without service disruption
  • Transformed a legacy monolithic system into a cloud-based microservice architecture on Azure, significantly improving scalability and reliability
  • Reduced backend testing time by 30 minutes by integrating automated testing into the CI pipeline
  • Built a global Power BI data visualisation platform for Azure product line analytics, enabling organisation-wide data-driven decisions
Lakehead University
Research Assistant & Python Developer
November 2017 – May 2019
Thunder Bay, ON
Python · Elasticsearch · AWS · ETL · NLP · Research
  • Published 3 peer-reviewed research papers on NLP, public health analytics, and distributed data systems — accumulating 89+ citations (NSERC-funded, $7,000/year Discovery Grant)
  • Designed and developed a 20-node Elasticsearch cluster capable of searching 330 million tweets per second for real-time social media health analytics
  • Built an end-to-end ETL pipeline and analytical platform on AWS, serving as a primary data source for both the Canada Health Department and internal research teams
  • Developed a Random Forest NLP model achieving 93.4% accuracy for population-level public health classification
Datalog.ai
Python Developer
January 2017 – August 2017
Remote
Python · AWS · APIs · NLP · Chatbot
  • Built an asynchronous chatbot analytics API capable of handling thousands of requests per second
  • Developed 5+ real-time visualisation dashboards for chatbot semantic analysis and topic extraction on AWS
  • Designed a clustering algorithm for chat-based decision support systems
  • Created Python and Bash tooling for call centre automation, data conversion, and API integration (REST, JSON, CRUD)
Panchamrut Dairy
Python Developer & Data Analyst
July 2014 – December 2016
Godhra, Gujarat, India
Python · SAP · Time-series · ETL · Power BI
  • Built a real-time data analysis system for raw product cost and transportation logistics using SAP and Python
  • Developed a time-series sales forecasting model for ice-cream product lines achieving 71% prediction efficiency
  • Designed ETL logic and report generation pipeline for sales, cost, and inventory data in multiple formats (Excel, CSV, PDF)
  • Collaborated with data warehouse leads to evaluate and redesign ETL architecture for improved performance

Education

M.Sc. Computer Science
Lakehead University
GPA 3.9 / 4.0 · 2017–2019
Thunder Bay, ON, Canada
🏆 NSERC-funded
M.Sc. Information Technology
Gujarat Technological University
CGPA 8.73 / 10 · 2014–2016
Vadodara, India
B.Eng. Computer Science
Gujarat Technological University
CGPA 7.23 / 10 · 2010–2014
Vadodara, India
🏆 1st Rank — Excellence Certificate

Research & Publications

📄 3 peer-reviewed papers 📊 89+ citations 💰 NSERC Discovery Grant — $7,000/year
A Framework for Social Media Data Analytics using Elasticsearch and Kibana
64 citations
Shah N., Willick D.L., Mago V.K. · Wireless Networks, Springer (2018)
DOI: 10.1007/s11276-018-01896-2
Assessing Canadians Health Activity and Nutritional Habits Through Social Media
25 citations
Shah N., Srivastava G., Savage D.W., Mago V. · Frontiers in Public Health (2020)
DOI: 10.3389/fpubh.2019.00400
The Analysis of Canada's Health Through Social Media Using Machine Learning
Shah N. · Lakehead University Knowledge Commons (2019)

Notable Open Source

Languages

English: Native/Bilingual
Hindi: Native/Bilingual
Gujarati: Native/Bilingual
French: Elementary
Sanskrit: Limited