The Fundamental Questions of the AI Revolution
How we might avoid unreasoned hype and hate equally, by understanding reality
I'm a Staff AI Engineer becoming a researcher. Day to day I build retrieval and entity-resolution systems over messy real-world data; on my own time I'm working through the mathematics, interpretability, and systems I think AI for science will demand. This page is the work in progress, not a finished résumé.
Three threads run through my current work. The first grounds the rest; the other two are where I'm deliberately pushing into research.
At Propelus I apply ML to medical licensing, verification, and fraud detection: retrieval, entity resolution, and data matching over structured, sparse, and noisy data. I built a domain-ontology induction pipeline (TnT-LLM-style, orchestrated with LangGraph); I also fine-tune embedding models and run two-stage retrieval: a bi-encoder for recall, then a cross-encoder reranker for precision. The open question I keep returning to is how embedding architecture and retrieval design trade off when the data is genuinely sparse and dirty. Two ideas anchor the work. First, retrieval beats classification when the target space keeps shifting, since new or merged categories don't break a retriever the way they break a classifier. Second, fine-tuned embeddings break the chicken-and-egg problem between a clean ontology and good retrieval, letting each bootstrap the other.
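The two-stage pattern described above can be sketched in a few lines. This is a minimal illustration, not the production system: `embed` (character-trigram counts) is a toy stand-in for a fine-tuned bi-encoder, `pair_score` (token Jaccard overlap) is a toy stand-in for a cross-encoder, and the corpus strings, function names, and `recall_k`/`final_k` parameters are all hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a bi-encoder: character-trigram counts.

    Each text is embedded on its own, so document vectors could be
    precomputed and indexed ahead of query time.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: scores the (query, doc) pair jointly."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if (q | d) else 0.0


def two_stage_search(query, corpus, recall_k=3, final_k=1):
    # Stage 1 (recall): cheap vector similarity over the whole corpus.
    doc_vecs = [embed(doc) for doc in corpus]
    query_vec = embed(query)
    shortlist = sorted(
        range(len(corpus)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )[:recall_k]
    # Stage 2 (precision): the costlier pairwise scorer sees only the shortlist.
    reranked = sorted(
        shortlist, key=lambda i: pair_score(query, corpus[i]), reverse=True
    )
    return [corpus[i] for i in reranked[:final_k]]


corpus = [
    "registered nurse license california",
    "registered nurse license texas",
    "physician license california",
    "pharmacy technician certificate",
]
top = two_stage_search("registered nurse california", corpus)
print(top)
```

The design point is the cost split: the first stage touches every document but only through precomputable vectors, while the second stage runs the expensive joint scorer on `recall_k` candidates. Adding a new category means adding documents to the corpus, with no output layer to retrain.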
This is the direction I most want to push into next. The plan: use sparse autoencoders (SAEs) to decompose a fine-tuned encoder's embedding space into interpretable features, turning ontology induction into something reproducible and auditable rather than opaque, with runtime feature steering to handle categories that split, merge, or emerge without retraining. I haven't built it yet; right now I'm grounding myself in the foundational interpretability literature (Elhage, Bricken, Park, Turner). It's where my retrieval work and my interest in interpretability converge.
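For readers unfamiliar with SAEs, the core computation is small. The sketch below follows a common formulation from the dictionary-learning interpretability literature (encode with a ReLU, decode linearly, penalize the L1 norm of the features); it is not the planned system, which the text above says has not been built. The weights are hand-set toy values rather than trained ones, and every name and the `l1_coeff` value are assumptions for illustration.

```python
def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def relu(v):
    return [max(0.0, x) for x in v]


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """features = ReLU(W_enc (x - b_dec) + b_enc); recon = W_dec features + b_dec."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre_act = [h + b for h, b in zip(matvec(W_enc, centered), b_enc)]
    features = relu(pre_act)
    recon = [r + b for r, b in zip(matvec(W_dec, features), b_dec)]
    return features, recon


def sae_loss(x, features, recon, l1_coeff=1e-3):
    """Squared reconstruction error plus an L1 penalty that encourages sparsity."""
    mse = sum((a - b) ** 2 for a, b in zip(x, recon))
    l1 = sum(abs(f) for f in features)
    return mse + l1_coeff * l1


# Toy setup: a 2-d "embedding" and an overcomplete dictionary of 4 features,
# one per signed axis direction (+e1, -e1, +e2, -e2). Hand-set, not trained.
W_enc = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b_enc = [0.0, 0.0, 0.0, 0.0]
W_dec = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]
b_dec = [0.0, 0.0]

x = [0.5, -2.0]
features, recon = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, features, recon)
print(features, recon, loss)
```

In this toy case only two of the four features fire, and the input is reconstructed exactly, so the loss is just the sparsity penalty. In a trained SAE the feature directions are learned, and the hope is that each active feature corresponds to something a person can name and audit.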
To understand how models actually compute, I'm writing a transformer inference engine in CUDA from scratch. It's a learning instrument as much as an artifact: working at the kernel level forces an honest account of memory, parallelism, and where the arithmetic really happens - the kind of understanding that's hard to fake from the framework layer down.
North star: AI for science, with sequence and protein modeling as the entry point. I don't claim expertise here yet - I'm building the foundations deliberately and in the open.
Proof-based math to make the rest rigorous rather than hand-wavy: linear algebra via Axler, real analysis via Abbott, and geometric deep learning to connect structure and symmetry to learning.
A self-directed physics, then chemistry, then biology curriculum - the path toward modeling the systems science actually cares about, rather than treating them as black-box benchmarks.
Sequence and protein modeling is where representation learning, interpretability, and scientific domain knowledge all meet. It's the concrete problem I'm orienting the study around.
13+ years in systems and software, including 3+ at Meta. I bring an engineer's bias toward working artifacts and measurable results into research questions that usually get answered by intuition.
Notes and longer pieces as the work develops. More on the blog.
The literature I keep returning to - the lineage from the foundations of computation and neural networks through the modern architectures and interpretability work my current threads build on. A reading map, not a list I've conquered.
Introduced the first mathematical model of artificial neurons, laying the foundation for later neural network research.
Defined the Turing machine and established the theoretical foundations of computation and what can be algorithmically solved.
Proposed the famous Turing Test and raised fundamental questions about machine intelligence and consciousness.
Introduced the perceptron algorithm, the first trainable neural network and precursor to modern deep learning.
Popularized backpropagation, the algorithm that makes training deep neural networks practical and efficient.
Demonstrated practical deep learning on real-world data, leading to modern computer vision applications.
Introduced neural language modeling, paving the way for modern NLP and transformer architectures.
AlexNet sparked the deep learning revolution by dramatically improving computer vision performance.
Introduced attention mechanisms, a crucial component of modern transformers and language models.
Introduced GANs, revolutionizing generative modeling and creating new possibilities for synthetic data generation.
ResNets addressed the degradation problem in very deep networks: residual (skip) connections made networks with over a hundred layers trainable.
The Transformer architecture replaced recurrence with self-attention and became the foundation for BERT, the GPT series, and ChatGPT.
Demonstrated the power of pre-training and fine-tuning, establishing a new paradigm in NLP.
Revealed predictable scaling relationships, guiding the development of increasingly large language models.
GPT-3 showed that a large enough language model can learn tasks in context from a handful of examples, with no gradient updates.
CLIP bridged vision and language, enabling zero-shot image classification and multimodal AI systems.
Vision Transformers showed that transformers can match or outperform CNNs on image recognition when pre-trained on enough data, unifying architectures across modalities.
Established RLHF (reinforcement learning from human feedback) as a practical way to make language models more helpful and better aligned with user intent.
Demonstrated continued scaling benefits and emergent reasoning abilities in very large language models.
Showcased multimodal capabilities and advanced reasoning, representing the frontier of large language models at its release.
Reasoning model reporting performance comparable to OpenAI's o1, trained primarily with large-scale reinforcement learning (the R1-Zero variant uses RL with no supervised fine-tuning), with comparatively low reported training costs.
OpenAI's most advanced model at its release, reporting state-of-the-art performance across coding, math, writing, and multimodal understanding.
OpenAI's first open-weight language models since GPT-2, offering advanced reasoning capabilities under the Apache 2.0 license.
Extended promptable image segmentation to video, tracking segmented objects across frames in real time.
A method for measuring training data contributions with minimal computational overhead, aimed at making data valuation practical.