Research

Toward AI for science

I'm a Staff AI Engineer becoming a researcher. Day to day I build retrieval and entity-resolution systems over messy real-world data; on my own time I'm working through the mathematics, interpretability, and systems I think AI for science will demand. This page is a work in progress, not a finished résumé.

What I work on

Three threads run through my current work. The first grounds the rest; the other two are where I'm deliberately pushing into research.

Thread 01

Representation learning & retrieval

At Propelus I apply ML to medical licensing, verification, and fraud detection: retrieval, entity resolution, and data matching over structured, sparse, and noisy data. I built a domain-ontology induction pipeline (TNT-LLM-style, orchestrated with LangGraph), fine-tune embedding models, and run two-stage retrieval - a bi-encoder for recall, a cross-encoder reranker for precision. The open question I keep returning to is how embedding architecture and retrieval design trade off when the data is genuinely sparse and dirty. Two ideas anchor the work: retrieval beats classification when the target space keeps shifting, since new or merged categories don't break a retriever the way they break a classifier; and fine-tuned embeddings break the chicken-and-egg between a clean ontology and good retrieval, letting each bootstrap the other.
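A minimal sketch of the two-stage pattern follows. It assumes a bi-encoder has already produced the document embeddings. `two_stage_search` and `cross_score` are hypothetical names. The toy reranker is plain word overlap standing in for a real cross-encoder, so this shows only the shape of recall-then-precision, not the production pipeline.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_search(
    query: str,
    query_vec: Vector,
    corpus: Dict[str, Vector],                 # doc_id -> precomputed bi-encoder embedding
    texts: Dict[str, str],                     # doc_id -> raw text for the reranker
    cross_score: Callable[[str, str], float],  # hypothetical stand-in for a cross-encoder
    recall_k: int = 50,
    final_k: int = 5,
) -> List[Tuple[str, float]]:
    # Stage 1 (recall): cheap vector similarity against independently encoded docs.
    candidates = sorted(corpus, key=lambda d: cosine(query_vec, corpus[d]), reverse=True)[:recall_k]
    # Stage 2 (precision): the expensive scorer reads query and document together,
    # but only for the short candidate list.
    reranked = [(d, cross_score(query, texts[d])) for d in candidates]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked[:final_k]


def overlap(q: str, t: str) -> int:
    """Toy reranker: number of shared words (NOT a real cross-encoder)."""
    return len(set(q.split()) & set(t.split()))


corpus = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
texts = {"a": "registered nurse license", "b": "nurse license ohio", "c": "pharmacy permit"}
result = two_stage_search("nurse license ohio", [1.0, 0.0], corpus, texts, overlap, recall_k=2, final_k=1)
print(result)
```

The split is driven by cost. The bi-encoder compares vectors that can be precomputed and indexed, so it scales to the whole corpus. The cross-encoder must read each query-document pair jointly, so it is affordable only on the short list. In the toy run, stage 1 ranks "a" first and the reranker promotes "b".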

Thread 02

Mechanistic interpretability

This is the direction I most want to push into next. The plan: use sparse autoencoders (SAEs) to decompose a fine-tuned encoder's embedding space into interpretable features, turning ontology induction into something reproducible and auditable rather than opaque, with runtime feature steering to handle categories that split, merge, or emerge without retraining. I haven't built it yet; right now I'm grounding myself in the foundational interpretability literature (Elhage, Bricken, Park, Turner). It's where my retrieval work and my interest in interpretability converge.
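For reference, here is one common SAE formulation from the interpretability literature, similar to the setup in Bricken et al. It has a ReLU encoder, a linear decoder whose columns act as feature directions, and an L1 sparsity penalty. This is not the system described above, which isn't built yet. The toy weights, `sae_forward`, and `steer` (adding a scaled decoder direction) are hypothetical illustrations, and training is omitted.

```python
from typing import List, Tuple

Vec = List[float]
Matrix = List[List[float]]  # row-major


def matvec(m: Matrix, v: Vec) -> Vec:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sae_forward(x: Vec, w_enc: Matrix, b_enc: Vec, w_dec: Matrix, b_dec: Vec) -> Tuple[Vec, Vec]:
    """f = ReLU(W_enc (x - b_dec) + b_enc); x_hat = W_dec f + b_dec.

    w_enc is n_features x d_model; w_dec is d_model x n_features, so each
    column of w_dec is one feature's direction in embedding space.
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    f = [max(0.0, p + b) for p, b in zip(matvec(w_enc, centered), b_enc)]
    x_hat = [r + b for r, b in zip(matvec(w_dec, f), b_dec)]
    return f, x_hat


def sae_loss(x: Vec, f: Vec, x_hat: Vec, l1_coeff: float) -> float:
    """Squared reconstruction error plus an L1 penalty that pushes most features to zero."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + l1_coeff * sum(abs(v) for v in f)


def steer(x: Vec, w_dec: Matrix, feature: int, alpha: float) -> Vec:
    """Hypothetical steering: shift x by alpha along one feature's decoder column."""
    return [xi + alpha * row[feature] for xi, row in zip(x, w_dec)]


# Toy example: 2-d embeddings, 3 features (an overcomplete dictionary).
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
b_enc = [0.0, 0.0, 0.0]
w_dec = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
b_dec = [0.0, 0.0]
x = [2.0, 0.5]
f, x_hat = sae_forward(x, w_enc, b_enc, w_dec, b_dec)
print(f, x_hat, sae_loss(x, f, x_hat, 0.1))
```

In the toy run the third feature stays at zero, and the input is reconstructed from the two active ones. That sparse set of active features is what makes each embedding inspectable.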

Thread 03

GPU / CUDA as tooling

To understand how models actually compute, I'm writing a transformer inference engine in CUDA from scratch. It's a learning instrument as much as an artifact: working at the kernel level forces an honest account of memory, parallelism, and where the arithmetic really happens - the kind of understanding that's hard to fake from the framework layer down.
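As a concrete example of where the arithmetic happens, here is a back-of-the-envelope sketch of arithmetic intensity (FLOPs per byte moved from DRAM) for a matrix multiply. It assumes fp16 storage and that each operand is read exactly once. The hidden size of 4096 and the prompt length of 512 are hypothetical.

```python
def gemm_arithmetic_intensity(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte of DRAM traffic for C[m, n] = A[m, k] @ B[k, n].

    Idealized: A and B are each read once and C is written once (perfect
    on-chip reuse). bytes_per_elem=2 corresponds to fp16/bf16.
    """
    flops = 2 * m * k * n  # one multiply and one add per (row, col, inner) triple
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


hidden = 4096  # hypothetical model width
decode = gemm_arithmetic_intensity(1, hidden, hidden)     # one new token: a matrix-vector product
prefill = gemm_arithmetic_intensity(512, hidden, hidden)  # 512 prompt tokens processed together
print(f"decode:  {decode:.2f} FLOP/byte")
print(f"prefill: {prefill:.1f} FLOP/byte")
```

Decode comes out at about 1 FLOP/byte and prefill at about 410. Generating one token at a time is therefore bound by memory bandwidth rather than compute, since modern GPUs can sustain on the order of a hundred or more FLOPs per byte of bandwidth. The model is deliberately idealized: it ignores caches, KV-cache reads, and kernel overheads.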

What I'm working toward

North star: AI for science, with sequence and protein modeling as the entry point. I don't claim expertise here yet - I'm building the foundations deliberately and in the open.

Mathematical foundations

Proof-based math to make the rest rigorous rather than hand-wavy: linear algebra via Axler, real analysis via Abbott, and geometric deep learning to connect structure and symmetry to learning.

Scientific grounding

A self-directed physics, then chemistry, then biology curriculum - the path toward modeling the systems science actually cares about, rather than treating them as black-box benchmarks.

The entry point

Sequence and protein modeling is where representation learning, interpretability, and scientific domain knowledge all meet. It's the concrete problem I'm orienting the study around.

Background

13+ years in systems and software, including 3+ at Meta. I bring an engineer's bias toward working artifacts and measurable results into research questions that usually get answered by intuition.

Writing

Notes and longer pieces as the work develops. More on the blog.

Papers grounding my work

The literature I keep returning to - the lineage from the foundations of computation and neural networks through the modern architectures and interpretability work my current threads build on. A reading map, not a list I've conquered.

On Computable Numbers, with an Application to the Entscheidungsproblem

Alan Turing (1936)

Defined the Turing machine and established the theoretical foundations of computation and what can be algorithmically solved.

A Logical Calculus of Ideas Immanent in Nervous Activity

McCulloch & Pitts (1943)

Introduced the first mathematical model of artificial neurons, laying the groundwork for neural network research.

Computing Machinery and Intelligence

Alan Turing (1950)

Proposed the imitation game, now known as the Turing Test, and raised fundamental questions about machine intelligence and consciousness.

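The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain

Frank Rosenblatt (1958)

Introduced the perceptron, a threshold unit whose weights are learned from labeled examples. It was one of the first trainable neural networks and a precursor to modern deep learning.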
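Here is a minimal sketch of the perceptron error-correction rule, learning logical AND. It uses the common textbook form; Rosenblatt's original formulation differs in details. `train_perceptron` and `predict` are illustrative names.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (inputs, label in {0, 1})


def predict(w: List[float], b: float, x: Sequence[float]) -> int:
    """Threshold unit: fire (1) if the weighted sum plus bias exceeds zero."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(data: List[Example], epochs: int = 20, lr: float = 1.0) -> Tuple[List[float], float]:
    """Error-correction rule: change the weights only when a prediction is wrong."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            delta = lr * (y - predict(w, b, x))  # 0 when correct, +/-lr when wrong
            if delta:
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                b += delta
                mistakes += 1
        if mistakes == 0:  # a full clean pass: the data is separated
            break
    return w, b


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1]
```

By the perceptron convergence theorem, the loop stops after finitely many mistakes on linearly separable data like AND. On XOR, which no single line separates, it never reaches a clean pass.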

Learning Representations by Back-Propagating Errors

Rumelhart, Hinton & Williams (1986)

Popularized backpropagation, the algorithm that makes training deep neural networks practical and efficient.
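Backpropagation is the chain rule applied from the loss backward. The sketch below uses a hypothetical one-hidden-unit network, `y_hat = w2 * sigmoid(w1 * x)`, with squared-error loss. It checks the hand-derived gradients against central finite differences.

```python
import math
from typing import Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1: float, w2: float, x: float, y: float) -> Tuple[float, float, float]:
    """Hypothetical tiny network: h = sigmoid(w1 * x), y_hat = w2 * h, L = 0.5 * (y_hat - y)^2."""
    h = sigmoid(w1 * x)
    y_hat = w2 * h
    return h, y_hat, 0.5 * (y_hat - y) ** 2


def backward(w1: float, w2: float, x: float, y: float) -> Tuple[float, float]:
    """Reverse-mode chain rule: each step reuses the upstream gradient."""
    h, y_hat, _ = forward(w1, w2, x, y)
    d_yhat = y_hat - y         # dL/dy_hat
    d_w2 = d_yhat * h          # dL/dw2 = dL/dy_hat * dy_hat/dw2
    d_h = d_yhat * w2          # dL/dh
    d_z = d_h * h * (1.0 - h)  # sigmoid'(z) = h * (1 - h)
    d_w1 = d_z * x             # dz/dw1 = x
    return d_w1, d_w2


def numeric_grad(w1: float, w2: float, x: float, y: float, eps: float = 1e-6) -> Tuple[float, float]:
    """Central finite differences, used only to check backward()."""
    def loss(a: float, b: float) -> float:
        return forward(a, b, x, y)[2]
    g1 = (loss(w1 + eps, w2) - loss(w1 - eps, w2)) / (2 * eps)
    g2 = (loss(w1, w2 + eps) - loss(w1, w2 - eps)) / (2 * eps)
    return g1, g2


analytic = backward(0.5, -1.2, 2.0, 1.0)
numeric = numeric_grad(0.5, -1.2, 2.0, 1.0)
print(analytic, numeric)
```

The point of the backward pass is reuse. `d_yhat` is computed once and shared by both weight gradients, which is what keeps the cost of the full gradient proportional to one forward pass.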

Handwritten Digit Recognition with a Back-Propagation Network

LeCun et al. (1989)

Trained a convolutional network end to end with backpropagation on real handwritten digits. It was an early practical demonstration of deep learning and a forerunner of modern computer vision.

A Neural Probabilistic Language Model

Bengio et al. (2003)

Learned distributed word representations jointly with a neural language model, paving the way for modern NLP and transformer architectures.

ImageNet Classification with Deep Convolutional Neural Networks

Krizhevsky, Sutskever & Hinton (2012)

AlexNet won the 2012 ImageNet challenge (ILSVRC) by a wide margin, sparking the deep learning revolution in computer vision.

Neural Machine Translation by Jointly Learning to Align and Translate

Bahdanau, Cho & Bengio (2014)

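Introduced an attention mechanism that lets the decoder learn a soft alignment over source positions, instead of compressing the whole sentence into one fixed-length vector. Attention later became the core of transformers and modern language models.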

Generative Adversarial Networks

Ian Goodfellow et al. (2014)

Introduced GANs, which train a generator against a discriminator in a two-player minimax game. They opened a major line of generative modeling and synthetic data generation.

Deep Residual Learning for Image Recognition

He et al. (2015)

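Reformulated layers to learn residual functions added back through identity skip connections. This overcame the degradation problem, in which deeper plain networks reach higher training error, and made networks with more than 100 layers trainable.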
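The residual idea fits in one line. In this minimal sketch, the branch functions are hypothetical stand-ins for a stack of layers.

```python
from typing import Callable, List

Vec = List[float]


def residual_block(x: Vec, f: Callable[[Vec], Vec]) -> Vec:
    """y = F(x) + x: the layers in F only need to model the residual H(x) - x."""
    return [fx + xi for fx, xi in zip(f(x), x)]


def zero_branch(x: Vec) -> Vec:
    """Hypothetical branch whose layers output zero."""
    return [0.0] * len(x)


def scale_branch(x: Vec) -> Vec:
    """Hypothetical branch that has learned a small correction."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
print(residual_block(x, zero_branch))   # the identity: [1.0, -2.0, 3.0]
print(residual_block(x, scale_branch))  # x plus a small learned correction
```

If the branch outputs zero, the block is exactly the identity. In principle, then, a deeper residual network can represent any shallower one, which is the intuition the paper uses to motivate residual learning.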

Attention Is All You Need

Vaswani et al. (2017)

Dispensed with recurrence and convolution in favor of attention. The resulting Transformer architecture became the foundation for GPT, BERT, and ChatGPT.
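The paper's core operation is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. Here is a minimal single-head sketch in plain Python. It omits masking, batching, and learned projections; a real implementation adds all three, plus multiple heads.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are token positions


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for a single head, with no mask."""
    d_k = len(k[0])
    out = []
    for qi in q:
        # Similarity of this query to every key; dividing by sqrt(d_k) keeps large
        # dot products from pushing the softmax into regions with tiny gradients.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        # The output row is the attention-weighted average of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


# Identical keys give uniform weights, so the output is the mean of the values.
print(attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [4.0, 6.0]]))  # [[2.0, 4.0]]
```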

BERT: Pre-training of Deep Bidirectional Transformers

Devlin et al. (2018)

Demonstrated the power of pre-training and fine-tuning, establishing a new paradigm in NLP.

Scaling Laws for Neural Language Models

Kaplan et al. (2020)

Showed that language-model loss falls as a smooth power law in model size, dataset size, and training compute, guiding the development of increasingly large language models.
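The paper's fitted laws take a simple power-law form. N is the number of non-embedding parameters, D is dataset size in tokens, and C_min is training compute; each law applies when the other two factors are not the bottleneck. The exponent α_N ≈ 0.076 is the paper's reported fit for its setup, not a universal constant. The second line shows what that exponent implies for doubling model size.

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C_{\min}) = \left(\frac{C_c^{\min}}{C_{\min}}\right)^{\alpha_C^{\min}}

\frac{L(2N)}{L(N)} = 2^{-\alpha_N} \approx 2^{-0.076} \approx 0.949
```

Each doubling of parameters cuts loss by only about 5%, but it does so predictably. That predictability is what made planning ever-larger training runs possible.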

Language Models are Few-Shot Learners

Brown et al. (2020)

GPT-3 (175B parameters) showed that a large enough language model can perform new tasks from a few examples given in the prompt, without gradient updates.

An Image is Worth 16x16 Words

Dosovitskiy et al. (2020)

Vision Transformers showed that a standard transformer applied to sequences of image patches can match or beat CNNs on image classification when pre-trained on large datasets, unifying architectures across modalities.

Learning Transferable Visual Models From Natural Language Supervision

Radford et al. (2021)

CLIP bridged vision and language, enabling zero-shot image classification and multimodal AI systems.

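Training Language Models to Follow Instructions with Human Feedback

Ouyang et al. (2022)

InstructGPT applied RLHF (Reinforcement Learning from Human Feedback) to fine-tune GPT-3 to follow instructions, making its outputs more helpful and aligned. Labelers preferred outputs of the 1.3B-parameter InstructGPT model over those of the 175B GPT-3.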

PaLM: Scaling Language Modeling with Pathways

Chowdhery et al. (2022)

Demonstrated continued scaling benefits and emergent reasoning abilities in very large language models.

GPT-4 Technical Report

OpenAI (2023)

Showcased multimodal capabilities and advanced reasoning, representing the frontier of large language models at its release.

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek AI (2025)

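Showed that reasoning behavior can emerge from large-scale reinforcement learning without supervised fine-tuning (DeepSeek-R1-Zero). With a small cold-start dataset and multi-stage training, DeepSeek-R1 reaches performance comparable to OpenAI's o1 on reasoning tasks, and it was released with open weights.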

Introducing GPT-5

OpenAI (2025)

OpenAI's flagship model at release, reported as state of the art across coding, math, writing, and multimodal understanding.

gpt-oss-120b & gpt-oss-20b Model Card

OpenAI (2025)

OpenAI's first open-weight language models since GPT-2, offering advanced reasoning capabilities under the Apache 2.0 license.

SAM 2: Segment Anything in Images and Videos

Ravi et al. (2025)

Extended promptable segmentation from images to video, using a streaming memory to segment and track objects across frames in real time.

Data Shapley in One Training Run

Wang et al. (2025)

Proposed In-Run Data Shapley, which measures each training example's contribution to a specific model within a single training run. It adds little computational overhead, instead of requiring retraining on many data subsets.
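For context, this sketch computes the classical Data Shapley value by brute force: a point's marginal contribution to a utility U, averaged over all subsets of the other points. This is only the definition, not the paper's algorithm. The paper defines utility per training iteration and avoids exactly this enumeration, which is exponential in the number of points. `exact_data_shapley` and the saturating utility are hypothetical illustrations.

```python
from itertools import combinations
from math import comb
from typing import Callable, Dict, FrozenSet, Hashable, Sequence

Utility = Callable[[FrozenSet[Hashable]], float]


def exact_data_shapley(points: Sequence[Hashable], utility: Utility) -> Dict[Hashable, float]:
    """phi_i = (1/n) * sum over subsets S of the other points of
    [U(S | {i}) - U(S)] / C(n - 1, |S|). Exponential in n: toy sizes only."""
    n = len(points)
    values: Dict[Hashable, float] = {}
    for i in points:
        others = [p for p in points if p != i]
        total = 0.0
        for size in range(n):  # |S| ranges over 0 .. n-1
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += (utility(s | {i}) - utility(s)) / comb(n - 1, size)
        values[i] = total / n
    return values


def saturating_utility(s: FrozenSet[Hashable]) -> float:
    """Hypothetical utility: model quality stops improving after two examples."""
    return float(min(len(s), 2))


phi = exact_data_shapley(["a", "b", "c"], saturating_utility)
print(phi)
```

The values satisfy Shapley's efficiency property: they sum to U(all points) minus U(empty set), which is 2 here. Each of the three interchangeable points gets an equal share.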