Kuber Shahi

ML Engineer and Researcher

Email: kshahi[at]ucsd[dot]edu

Hey, thanks for stopping by!

I’m Kuber. I build things with data. From building distributed pipelines that ingest massive, diverse datasets to training language and vision models for practical machine learning systems, I work across the full ML stack, turning messy, real-world problems into production-ready solutions.

Currently, I’m a Graduate Researcher at UCSD’s Biomedical Image Analysis Group, where I investigate uncertainty quantification for medical image registration. Alongside my research, I’m broadening my ML expertise through coursework spanning Statistical NLP, Computer Vision, AI Agents, and ML Systems. My work is driven by my research interests in LLM-based reasoning, agentic AI, and deep learning for medical imaging.

Last summer, I was a Machine Learning Intern at Melio, a blood diagnostics biotech startup, where I worked at the intersection of MLOps and healthcare, rebuilding fragmented research workflows into robust, production-grade ML training infrastructure for blood diagnostic time series classification. Before that, I spent two years as a Data Scientist at Vayana Network, India’s largest trade credit and supply chain financing fintech. There, I built large-scale data infrastructure to streamline invoice processing, developed graph-based tools to uncover customer networks driving business growth, and led work on an NLP-based entity resolution system that deduplicates and enriches company records, improving data quality at scale.

My foundations in ML go back to my undergraduate years at Ashoka University, where I graduated with honors in Computer Science with a minor in Physics. There, I got my first taste of research, working with Professors Mahavir Jhawar and Debayan Gupta on privacy-preserving machine learning, adversarial attacks on ML models, and cryptographic vulnerabilities in encrypted systems, which culminated in a capstone project implementing secure neural network training protocols in C++.

Beyond my core work, I’ve explored projects spanning LLM agent evaluation, low-resource NLP, abstractive text summarization, blockchain-based applications, and full-stack web development. I enjoy working on problems that require both creativity and technical depth, especially where real-world impact is involved.

Outside of work and research, I’m a big Chelsea FC fan and follow football closely, and I keep up with cricket and Formula 1 from time to time. I also enjoy science fiction, mystery, and thriller films, and like to swim and hike in my free time.

recent updates

Mar 16, 2026 Released findings from our study on LLM agent planning under deception 🤖, evaluating four agent architectures across deceptive text environments.
Jan 05, 2026 Started investigating uncertainty quantification for medical image registration 🧬 at UCSD’s Biomedical Image Analysis Group.
Dec 20, 2025 Wrapped up my Machine Learning internship at Melio 🎉, where I rebuilt ML training infrastructure for blood diagnostic time series classification.