Landing a Job at a Frontier Lab

A while back I wrote about trying to point a software career at real science, and about the slow realization that I didn’t want to spend my life polishing the craft of engineering for its own sake. The question I kept avoiding, because it sounded ridiculous coming from a mid-career generalist with no PhD, was the practical one. How does someone like me actually get in? How do you go from writing CI/CD glue to working somewhere the frontier is being moved?

Recently Vlad Feinberg, a pre-training area lead on Gemini at Google DeepMind, published How to Land a Job at a Frontier Lab. I had actually already been drafting a post covering this with some thoughts, and this week he sat down with Ryan Peterman for an interview on The Developing Dev that fills in a lot of the gaps. There’s also a talk he gave at Princeton on Gemini pre-training, a year old now but still the clearest short technical grounding I’ve come across. Together they’re the most concrete answer I’ve seen anyone with real standing write down.

This is some commentary on the ideas Vlad is sharing. I’ve also gone and assembled the parts he points at but doesn’t spell out, because I wanted them for myself: what counts as a frontier lab, which ones exist, where you can work if you can’t or won’t move to the Bay Area, and the specific papers and open-source projects you’d study to build the skills he names. The back half of this post is that reference material. It’s long. I haven’t read most of the materials myself (although I’m surprised at how much I’m already familiar with from my own work on representation learning). Skim to the section you need.

What actually is a frontier lab

The phrase gets thrown around loosely, especially given that everyone with a .ai domain wants to check that box right now. I’d say a frontier lab is one of the small number of organizations that train general-purpose foundation models from scratch at the largest available compute scales, and whose models sit at or near the capability frontier. The defining act is building the frontier rather than consuming it. I think the field or application matters a bit too, because an org could build a SotA model in an underappreciated domain but not have the funding to keep pushing once someone starts taking it to $10M training-run scales. In practice, building the frontier means training runs in roughly the 10^25 to 10^26 FLOP range (the thresholds regulators have started using for “frontier” or “systemic-risk” models: 10^25 FLOP under the EU AI Act, 10^26 in several US frameworks), standing up the enormous GPU or TPU clusters those runs need, and shipping models that compete at the top.
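To get a feel for where those thresholds sit, a common rule of thumb from the scaling-law literature (an assumption brought in here, not something from Vlad’s post) puts dense-transformer training compute at roughly 6 FLOPs per parameter per token. The sketch below applies it to a few made-up model sizes. The run sizes are hypothetical, and real accounting for MoE models or long-context attention would differ.

```python
EU_THRESHOLD = 1e25  # EU AI Act systemic-risk figure quoted above
US_THRESHOLD = 1e26  # figure quoted above for several US frameworks


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute with the C ~= 6 * N * D rule of thumb."""
    return 6.0 * params * tokens


def classify(params: float, tokens: float) -> str:
    """Place a (hypothetical) run relative to the two regulatory thresholds."""
    c = training_flops(params, tokens)
    if c >= US_THRESHOLD:
        return "above 1e26"
    if c >= EU_THRESHOLD:
        return "between 1e25 and 1e26"
    return "below 1e25"


# Hypothetical runs: (parameter count, training tokens). Not real models.
hypothetical_runs = {
    "7B params, 2T tokens": (7e9, 2e12),
    "200B params, 15T tokens": (2e11, 1.5e13),
    "400B params, 50T tokens": (4e11, 5e13),
}

for name, (n, d) in hypothetical_runs.items():
    print(f"{name}: {training_flops(n, d):.2e} FLOPs -> {classify(n, d)}")
```

By this estimate a 7B model on 2T tokens lands around 8.4e22 FLOPs, roughly two orders of magnitude under the EU line, which is a decent gut check for how far "frontier" is from what most teams train.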

This excludes the much larger world of companies that fine-tune, wrap, or route to someone else’s frontier model. Plenty of well-known AI products do exactly that, and there’s nothing wrong with it, but it’s a different kind of work and a different kind of job. I’ve been doing quite a lot of this in my own work. It’s a domain that combines a few different areas: ML, Data Science, and applied engineering work. It’s fun (especially at first when it’s novel), but after a while it takes on the same flavor as typical SWE work. The grass being greener is largely an illusion. I do find applied AI engineering to be more fun than typical SWE work (my title is currently “Staff AI Engineer”, for what it’s worth), but I can feel that over time it will begin to feel mundane.

The label also moves. “Frontier” tracks a target that keeps shifting, so any list is a snapshot. Industry self-identification roughly follows membership in the Frontier Model Forum (Amazon, Anthropic, Google, Meta, Microsoft, OpenAI), and a capability-based quick dive might add xAI, Mistral, DeepSeek, and Alibaba’s Qwen. A couple of well-funded research shops (Safe Superintelligence, Thinking Machines Lab) are best described as aspiring to the frontier, since they haven’t shipped a frontier-scale flagship yet.

The frontier labs

Everything below was verified as of June 2026. I did this as a case study of a moment in time, and I don’t plan to update this constantly. That doesn’t mean I won’t, as I do find myself interested in this more and more each day. Personally I am focusing on actionable projects and open source contributions over fantasizing about a specific job/company/role. However, to me the below is probably the most important aspect of the discussion on “working at a frontier lab”. For myself, relocating to CA is almost certainly off the table. Relocating and being in office is, in general, deeply unappealing to me. I imagine the same is true of many people across the world. Maybe this means we aren’t meant for the field. I don’t like that framing, and I want to believe that someone who is a capable independent researcher with valuable skills will find a spot for themselves. I sort of need to believe this right now, or much of my free time is being absolutely wasted, and I should be playing videogames right now instead of researching the state of frontier labs.

Careers URLs, office policies, and even company status change fast, so treat it as a starting point and confirm against the live posting for the specific role before you rely on any of it. Remote policies especially have tightened across the big labs over the last couple of years.

United States

The US labs: OpenAI, Anthropic, Google DeepMind, Meta, xAI, Microsoft AI, Amazon, SSI, Thinking Machines

OpenAI

OpenAI makes ChatGPT and the GPT models. HQ in San Francisco, with hubs in New York, London, Dublin, Tokyo, and Singapore. Office-first and not meaningfully remote. Sam Altman has said publicly that the fully-remote experiment is over, and the few remote roles cluster in security, infrastructure, and deployment. Careers: openai.com/careers (the live board is on Ashby).

Anthropic

Anthropic is the safety-focused lab behind Claude. HQ in San Francisco, with offices in New York, Seattle, London, and Zürich. Hybrid and office-forward. Their careers page is refreshingly blunt about it: “Most staff are in the Bay Area and come to the office regularly. Some live further away and come in for one week a month.” Many roles are tagged “Remote-Friendly (Travel-Required).” Careers: anthropic.com/careers. If “Travel-Required” means going to the Bay Area for a week every month…yuck.

Google DeepMind

Google DeepMind is Alphabet’s unified AI lab, building Gemini. HQ in London (King’s Cross), with around ten offices including Mountain View, New York, Paris, Zurich, Toronto, Montreal, Bangalore, and Tokyo. Roles are anchored to a specific office city and sit under Google’s company-wide roughly three-day hybrid expectation, which has been tightening. Fully-remote roles are rare. Careers: deepmind.google/about/careers. If I’m being honest, DeepMind is the most attractive place to me. I’d love to live in London, but I don’t think it fits my lifestyle very well. I’d have to move a family and (too many) pets halfway across the world. To have as much space as I do now I’d need to live at least an hour out by train (their office is directly at King’s Cross, which I like to think would help me finally fulfill my fantasy of being a Hogwarts student).

xAI

xAI builds Grok. HQ in Palo Alto, with major compute in Memphis. Mostly hybrid, with Palo Alto roles trending toward full in-office and a small number of remote US roles. Their careers page bot-blocks automated checks, so open it in a browser to confirm. Careers: x.ai/careers.

Microsoft AI

Microsoft AI (MAI), under Mustafa Suleyman, has moved from wrapping others’ models to training its own foundation models from scratch, and is a Frontier Model Forum founding member. HQ in Redmond, with MAI presence in Mountain View, New York, and London. MAI specifically has run a stricter in-office expectation than the company at large (around four days a week as of early 2026). Careers: microsoft.ai/careers. Microsoft Research is a separate, more open-publication org: microsoft.com/research/careers.

Amazon, Nova, etc

Amazon (AGI Lab and Nova) trains the Nova foundation models from scratch on its own Trainium and Inferentia silicon, and is on the Frontier Model Forum roster, though it has historically leaned toward price and performance rather than topping leaderboards. AGI labs in San Francisco and Boston; corporate HQ in Seattle. Amazon mandates five days a week in office company-wide. Careers: amazon.jobs (AGI team).

SSI

Safe Superintelligence Inc. (SSI) is Ilya Sutskever’s research-only company, with no public product by design. Offices in Palo Alto and Tel Aviv. Co-located in those two cities, with no remote language on the site. Careers: ssi.inc (board).

Thinking Machines Lab

Thinking Machines Lab is Mira Murati’s startup. It has shipped a fine-tuning API (Tinker) and research previews rather than a frontier flagship so far. Based in San Francisco, and every listed role is in San Francisco. Careers: thinkingmachines.ai (board). I keep seeing this listed as a Frontier Lab on other lists/sources, but I think, going by my original definition above, it doesn’t quite feel like the right classification.

Europe

Europe: Mistral AI

Mistral AI is the Paris lab behind a mix of open-weight and commercial models plus Le Chat. HQ in Paris, with offices in London, Palo Alto, and New York. Hybrid-leaning and role and geography dependent, with relocation and visa support, and a handful of postings that allow remote. Careers: mistral.ai/careers (Lever board).

This is quite honestly all I could find atm, but that doesn’t mean there’s not more.

China

A pattern holds across the Chinese labs: in-office, China-based (mostly Beijing, Shanghai, or Hangzhou), with careers portals that are Chinese-language JavaScript apps on platforms like MokaHR, Feishu, or Lark. None show evidence of hiring core R&D outside China, so treat the in-office, relocation-expected read as a well-supported inference rather than a stated policy. This probably also means you need to speak and understand Mandarin fluently. I myself have been learning Mandarin (for very different reasons), and the speaking, honestly, doesn’t seem so bad (tones are hard but not impossible). As with any language, vocabulary and speed are likely the most difficult things.

China: DeepSeek, Qwen, Moonshot, Zhipu, MiniMax

DeepSeek, the open-weight lab spun out of the quant fund High-Flyer, with offices in Hangzhou and Beijing. Careers run through MokaHR and recruiting platforms. Qwen / Alibaba’s Tongyi Lab in Hangzhou, hiring via talent.alibaba.com. Moonshot AI (Kimi) in Beijing, which unusually runs an English-capable Ashby board alongside its Chinese portal. Zhipu AI / Z.ai (GLM), a Tsinghua spinout in Beijing, now publicly traded, hiring through a Feishu portal. MiniMax in Shanghai, also now public, with an English careers page that routes applications through Feishu.

When the frontier isn’t an option

The honest reality for most of us is that a frontier lab is a long shot, and the locations are non-negotiable for a lot of people with families, mortgages, or visa constraints. The good news Vlad keeps returning to is that the frontier is a long perimeter with a huge amount of real work along it, and much of that work happens at companies one or two steps removed. Those places are more accessible, often more remote-friendly, and they let you build exactly the public track record that a frontier lab hires on.

I’m splitting these into two rough tiers. Tier 2 is serious, well-funded companies that build or heavily adapt their own models, or that provide critical AI infrastructure and inference at scale. Tier 3 is research institutes, open-source collectives, smaller startups, and the leading academic and national labs. The tiering is a judgment call and is largely based on publicly available compensation numbers (that I have no means of fact checking) and relative scale of impact from projects.

Tier 2: model and multimodal labs

Model and multimodal labs: Cohere, Reka, Liquid AI, Sakana, Contextual, Imbue, World Labs, AI21

Cohere builds enterprise and sovereign foundation models (the Command family) out of Toronto, with offices in San Francisco, New York, Montreal, London, Paris, and Seoul. Remote-first culture. Careers: cohere.com/careers, ashby board. Some roles listed directly as Remote, others as Hybrid, though many list location directly.

Reka AI builds multimodal models (text, image, audio, video) across the SF Bay Area and Singapore, globally distributed and remote-friendly. Careers: jobs.ashbyhq.com/reka. Didn’t spend much time fact checking this one.

Liquid AI, an MIT CSAIL spin-off, builds efficient edge and on-device foundation models on non-transformer architectures. Based in the Boston area, with roles in SF, New York, and Tokyo. Primarily hybrid. Careers: jobs.ashbyhq.com/liquid-ai. From what I can tell despite listing some remote, you are largely expected to live either in SF or Boston.

Sakana AI in Tokyo uses nature-inspired and evolutionary methods, and is Japan’s most prominent AI lab. In-office in Tokyo for every listed role. Careers: sakana.ai/careers. I bet living in Tokyo would be pretty neat though!

Contextual AI builds enterprise RAG for regulated industries; its CEO co-authored the original RAG paper. Mountain View, with New York and London. Hybrid, in-office leaning. Careers: contextual.ai/careers. No open roles at time of writing.

Imbue is a San Francisco research lab building reasoning models and coding agents. It offers both in-person and fully-remote arrangements, which is unusual at this level. Careers: imbue.com/careers. All open roles currently marked specifically as being in SF.

World Labs, Fei-Fei Li’s spatial-intelligence startup, builds large world models that generate persistent 3D scenes. San Francisco, onsite. Careers: job-boards.greenhouse.io/worldlabs. Of the existing companies I’m familiar with, World Labs ranks right at the top of my interests, along with the bio-adjacent orgs. Sadly it’s all SF, although that’s not surprising.

A note for the curious: AI21 Labs (Tel Aviv, Jamba models) reportedly scaled back sharply in 2026 and its careers page showed no open roles when I checked, so confirm its status before investing time. Verify before applying: ai21.com/careers. I wasn’t overly thorough, but a few archive checks make me think it’s been empty like this for some time (maybe all of 2026, at least).

Tier 2: infrastructure, inference, and serving

This is the category I’d point most working engineers toward, because the skills transfer cleanly from ordinary backend work and the public-contribution path is wide open. Also I’d expect more of them to be open to actually hiring remotely. Let’s find out together.

Infrastructure and inference: Together, Fireworks, Baseten, Modal, Groq, Cerebras, SambaNova, Databricks, Snowflake

Together AI runs an “acceleration cloud” of GPU infrastructure plus inference and fine-tuning APIs. San Francisco, with roles in Amsterdam, London, Singapore, and India, mostly Bay Area-centric. Careers: job-boards.greenhouse.io/togetherai. No remote. Off to a shaky start lol.

Fireworks AI is a production inference cloud founded by Meta PyTorch and Google Vertex veterans. Redwood City, hybrid with some fully-remote US roles tagged on the board. Careers: job-boards.greenhouse.io/fireworksai. However, engineering seems expected to be onsite.

Baseten does mission-critical AI inference infrastructure. San Francisco, and the most openly remote-friendly of this group (work-from-anywhere, SF, or hybrid). Careers: jobs.ashbyhq.com/baseten. Brief browsing of the job descriptions makes it seem like they still largely expect engineers to be onsite for a large chunk of their time.

Modal is an AI-native serverless compute platform driven from Python. New York and San Francisco, hybrid. Careers: jobs.ashbyhq.com/modal.

Groq (custom LPU inference hardware, Mountain View, jobs.gem.com/groq), Cerebras (wafer-scale systems, Sunnyvale, job-boards.greenhouse.io/cerebrassystems), and SambaNova (custom accelerators, Palo Alto, sambanova.ai/company/careers) are the leading non-NVIDIA hardware plays, all hybrid and in-office leaning given the silicon work.

Databricks (Mosaic AI) trains, fine-tunes, and serves models on top of its data platform. San Francisco with offices worldwide, hybrid-first. Careers: databricks.com/company/careers. Snowflake (Cortex AI) serves models where customer data lives, is distributed, and runs a lighter in-office expectation. Careers: careers.snowflake.com.

Tier 2: applications and big-tech divisions

Applications and big-tech AI: Perplexity, Cursor, Inflection, Character.AI, Apple, Scale, Hugging Face

Perplexity AI (answer engine, San Francisco, hybrid, jobs.ashbyhq.com/perplexity) and Cursor / Anysphere (AI code editor, San Francisco, mostly in-person, cursor.com/careers) are two of the highest-growth application companies, though both build on others’ models more than their own.

A couple of status notes worth knowing. Inflection AI pivoted to enterprise after Microsoft hired away its founders, and now operates as a B2B model company out of Palo Alto (inflection.ai/careers). Character.AI stopped building its own foundation models after Google’s licensing deal and now focuses on the product layer, with heavy Trust and Safety hiring (Redwood City, character.ai/careers). Both are useful examples of how fast these companies change shape.

The big-tech AI divisions are real options too, and tend to be more remote-tolerant than the pure frontier labs. Apple’s ML and AI group (Cupertino, hybrid, builds the Apple Foundation Models, jobs.apple.com) and Amazon AGI (covered above under the frontier labs) both train or heavily use frontier-scale models. NVIDIA gets its own section below.

Scale AI (training data and evals, San Francisco, scale.com/careers) and Hugging Face (the open-source model hub, fully remote and globally distributed, huggingface.co/careers) round out the infrastructure side. Hugging Face in particular is remote-first and central to the open ecosystem, which makes it a strong landing spot if relocation is off the table.

Tier 3: research institutes and open-source collectives

This tier is often the most accessible and the most remote-friendly, and for several of these the real entry point is a Discord or a GitHub repo, not a jobs board.

Research institutes and open-source collectives: Ai2, EleutherAI, Nous, LAION, Prime Intellect, Stability

Allen Institute for AI (Ai2) is the standout champion of fully open AI: open weights and open data, code, and recipes (OLMo, Molmo, Tülu). Seattle, mostly in-office. Careers: allenai.org/careers.

EleutherAI is the nonprofit open-source lab behind The Pile and the Pythia suite, fully distributed, with a public Discord and a mentorship program, and an explicit “no PhD, no problem” stance. Get involved: eleuther.ai/community.

Nous Research is a leader in American open-source models (the Hermes series) and distributed training (Psyche), fully remote. It even runs a real jobs page, which is rare for a collective: nousresearch.com/careers.

LAION is the German nonprofit behind the LAION-5B dataset that trained Stable Diffusion, a distributed volunteer network rather than an employer. Get involved: laion.ai.

Prime Intellect builds decentralized training infrastructure that lets anyone contribute compute to open model runs. San Francisco, hybrid with some remote-eligible roles. Careers: primeintellect.ai/careers. Stability AI survived its 2024 turmoil and continues on generative creative tools out of London, hiring selectively (stability.ai/careers).

Tier 3: academic and national labs

For these, “careers” usually means PhD admissions, postdoc fellowships, or research software engineer roles, and the work is in-residence unless noted. The most relevant for foundation-model work:

Academic and national labs: Stanford, Berkeley, MIT, CMU, Princeton, ETH/EPFL, Mila, plus the DOE labs

Stanford: HAI, SAIL, and CRFM, the group that coined “foundation models.”
UC Berkeley BAIR, whose Sky Computing Lab is where vLLM originated.
MIT CSAIL, CMU MLD and LTI, and Princeton PLI, which runs a sizable academic GPU cluster and is the most remote-tolerant of the set (its research-software-engineer posting notes flexible arrangements).
ETH Zurich and EPFL’s Swiss AI Initiative, which shipped the fully open multilingual Apertus model: ai.ethz.ch, ai.epfl.ch.
Mila in Montreal, founded by Yoshua Bengio, with the clearest formal jobs page of the academic set: mila.quebec/en/recruitment.

On the US national labs, Argonne (the Aurora supercomputer and AuroraGPT), Oak Ridge (Frontier), and Lawrence Berkeley (NERSC) all do frontier-scale AI for science, mostly on-site, with a shared intern pipeline through DOE SULI.

AI-adjacent companies

Not every good seat is at a lab. The frontier runs on hardware and rented capacity that a handful of companies build, and these are some of the most remote-tolerant employers in the whole space. NVIDIA in particular touches nearly every model trained anywhere, which is exactly why I was surprised it didn’t make my first pass. If your skills are on the systems and kernel side, this is a real path.

Chips and clouds: NVIDIA, AMD, CoreWeave, Lambda, Crusoe

NVIDIA is the company the rest of this post quietly depends on. Its GPUs and the CUDA stack run almost every frontier training run, and beyond the silicon it ships the dominant inference software (TensorRT, the Dynamo serving stack), maintains a large research org in NVIDIA Research, and trains its own open models (the Nemotron family). Santa Clara, with research spread across many sites. It is a notable return-to-office holdout, with location driven by the team rather than a company-wide mandate, which makes it more remote-tolerant than most of the labs. Careers: nvidia.com careers, research roles at research scientists.

AMD is the main GPU challenger (the Instinct MI accelerators and the ROCm software stack). The interesting angle for an outsider is that ROCm is the open competitor to CUDA and is still catching up, which means a large, public surface area you can contribute to, in exactly the low-level systems work frontier labs hire for. Santa Clara. Careers: careers.amd.com.

The GPU clouds (neoclouds) rent frontier-scale capacity and hire heavily for infrastructure, networking, and serving engineers, often with more remote flexibility than the labs. The notable ones: CoreWeave (coreweave.com/careers), Lambda (lambda.ai/careers), and Crusoe (crusoe.ai/about/careers). The hyperscaler clouds (AWS, Google Cloud, Azure) run their own large AI-infrastructure orgs too, though those blur into the big-tech divisions above.

Work at the edges, not the center

With the map laid out, here’s Vlad’s actual strategic advice, the part that most changed how I think about this. The obvious plan is the wrong plan. You don’t get a frontier job by trying to out-pretrain a frontier lab from your bedroom. The compute alone makes it a non-starter, and the people already inside are the cohort that a decade ago went to Jane Street and Citadel and now go to OpenAI, Anthropic, and Google DeepMind. You will not win that fight on those terms.

His reframe is that the interesting, enterable work sits at the edges of the model rather than the center. There’s a whole stack below the LLM and an emerging space above it, and both are starved for people.

You don’t get a frontier job by trying to out-pretrain a frontier lab from your bedroom. The enterable work is at the edges of the model.

Below the stack: kernels and inference

Vlad’s word for the work below is kernels, “carefully designed pieces of code which compute part of a neural network program.” Every lab needs this and the demand is, in his words, basically bottomless. It’s GPU-level work, but it doesn’t require access to a pretraining run to get good at. You can do it on a free Colab TPU.

What makes it interesting rather than rote is that it rewards the kind of lateral, systems-level thinking that LLMs are still bad at. His favorite example is the Flash Attention insight. The trick wasn’t a cleverer matmul. It was noticing that the binding constraint was memory bandwidth, not raw FLOPs, a variable that’s obvious in hindsight but had to be seen. Once you model that, you restructure the operation to avoid writing intermediate values to slow memory. He describes the general skill as zooming out until you find the constraint nobody else has modeled yet.
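The memory-versus-FLOPs point can be made concrete with a back-of-envelope roofline calculation. This is a rough sketch, not the FlashAttention paper’s own analysis: it counts only the two matmuls of single-head attention, assumes fp16 storage, treats the fused version as reading Q, K, V and writing the output exactly once (an idealized lower bound), and uses made-up accelerator numbers rather than any real chip’s spec.

```python
def attention_flops(n: int, d: int) -> int:
    """FLOPs for Q @ K^T and P @ V on one head (2*n*n*d each); softmax ignored."""
    return 4 * n * n * d


def naive_bytes(n: int, d: int, bytes_per_el: int = 2) -> int:
    """Slow-memory traffic when the n x n score and probability matrices are
    written out and read back: Q, K, V, O are n*d each, S and P are n*n each
    and each is both written and read."""
    return bytes_per_el * (4 * n * d + 4 * n * n)


def fused_bytes(n: int, d: int, bytes_per_el: int = 2) -> int:
    """Idealized fused kernel: read Q, K, V once and write O once."""
    return bytes_per_el * 4 * n * d


def regime(intensity: float, peak_flops: float, bandwidth: float) -> str:
    """Compare arithmetic intensity (FLOP/byte) with the hardware ridge point."""
    ridge = peak_flops / bandwidth
    return "memory-bound" if intensity < ridge else "compute-bound"


# Illustrative numbers only: sequence length, head dim, and a hypothetical chip.
n, d = 4096, 128
peak_flops, bandwidth = 300e12, 2e12  # FLOP/s and bytes/s, both assumed

naive_intensity = attention_flops(n, d) / naive_bytes(n, d)
fused_intensity = attention_flops(n, d) / fused_bytes(n, d)

print(f"naive: {naive_intensity:.1f} FLOP/byte, {regime(naive_intensity, peak_flops, bandwidth)}")
print(f"fused: {fused_intensity:.1f} FLOP/byte, {regime(fused_intensity, peak_flops, bandwidth)}")
```

Same math, same FLOPs. The only thing that changed is how many bytes cross the slow-memory boundary, and under these assumptions that alone flips the kernel from memory-bound to compute-bound.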

Above the stack: agents

The other direction is the work above the model, which Vlad describes as setting up rigorous, controlled experiments that measure how single or multiple LLM agents behave. He’s candid that this area is too new to have a clean path, which I found reassuring, since it means the map isn’t drawn yet. He points at Karpathy’s autoresearch experiments and the AlphaEvolve and FunSearch line of work as the shape of it.

Research is not engineering

The part of the interview that stuck with me was Vlad’s explanation of why a strong backend engineer can get dropped onto a research team and stall out. He frames it in terms of Markov decision processes. Engineering is a mostly deterministic graph. You build the service, then the next piece, then the next, and effort converts to progress. Research is a stochastic graph, where the outcome of any given path is uncertain, and the actual skill is estimating, before you walk it, which paths are even worth attempting.

That a-priori estimation is what he calls research taste, and it isn’t vibes. It comes from absorbing the literature so thoroughly that you’ve internalized the full state of the field’s bleeding edge in your corner of it. The engineer stalls not because they can’t code but because they haven’t read the citation tree, can’t fluently read a Kaplan or Chinchilla or PaLM paper and reimplement the idea, and don’t carry the context that LLM training is a one-shot extrapolation problem. You’re predicting test loss for a run at a scale nobody has ever attempted, not iterating on ImageNet where you can just try again.
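One way to see what “one-shot extrapolation” means in practice: fit a trend on small runs, then read off a prediction at a scale you have never trained at. The sketch below fits a pure power law in log-log space. Everything in it is an assumption for illustration: the data points are synthetic, generated from a made-up law, and real scaling-law fits (Kaplan, Chinchilla) use richer functional forms with an irreducible-loss term.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**(-b) via a line in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def predict_loss(a: float, b: float, compute: float) -> float:
    """Evaluate the fitted power law at a new compute budget."""
    return a * compute ** (-b)


# Synthetic "small runs": compute budgets in FLOPs and losses from a made-up law.
small_run_compute = [1e18, 1e19, 1e20, 1e21]
small_run_loss = [10.0 * c ** -0.05 for c in small_run_compute]

a, b = fit_power_law(small_run_compute, small_run_loss)

# Extrapolate four orders of magnitude past the largest run we "observed".
target_compute = 1e25
print(f"fitted a={a:.3f}, b={b:.4f}")
print(f"predicted loss at {target_compute:.0e} FLOPs: {predict_loss(a, b, target_compute):.4f}")
```

With clean synthetic data the fit is trivially perfect. The hard part Vlad is pointing at is everything this toy leaves out: noisy runs, the wrong functional form, and only getting one shot at the big run to find out whether you were right.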

Engineering is a deterministic graph, where effort converts to progress. Research is a stochastic graph, and the skill is estimating which paths are worth attempting at all.

I’ll say plainly why this landed. My whole instinct as an engineer is to value shipping, to value the climb where work reliably turns into progress. Hearing that the frontier runs on a different currency, taste accumulated from reading you do before anyone pays you for it, reframes the homework. I won’t close it by grinding harder at the thing I’m already good at. It’s a different muscle, and I have to train it from cold.

The skills, and what to actually read

So what do you read to build that taste and those skills? Below is a study list organized by the skill areas Vlad and the interview name, with real papers and the canonical sources. I’ve ordered each list roughly foundational to advanced. If you only pick one area, the systems-and-inference side has the cleanest path from ordinary engineering.

Kernels and low-level performance

The mental model first, then the hardware, then the kernels.

Roofline, the CUDA guide, online softmax, Triton, FlashAttention 1-3, ThunderKittens

Roofline: An Insightful Visual Performance Model (Williams, Waterman, Patterson, 2009). Arithmetic intensity, and whether you’re compute-bound or memory-bound.
The CUDA C++ Programming Guide (NVIDIA). The execution and memory model every later paper assumes.
Online normalizer calculation for softmax (Milakov, Gimelshein, 2018). The streaming-softmax trick that makes Flash Attention possible. Read it first.
Triton (Tillet, Kung, Cox, 2019). Tile-based GPU kernels in Python, the practical entry point.
FlashAttention, FlashAttention-2, and FlashAttention-3 (Dao et al.). The landmark IO-aware kernel and its evolution toward async pipelining and low precision.
ThunderKittens (Spector et al., 2024). A tile-based embedded DSL, the modern way to write competitive kernels.
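The streaming-softmax idea from the list above is small enough to sketch in plain Python. This is a minimal illustration of the running max and rescaled running sum, assuming finite inputs. It is not code from the paper or from any FlashAttention implementation, which apply the same recurrence per tile on the GPU.

```python
import math


def softmax_two_pass(xs):
    """Reference softmax: one pass for the max, another for the sum."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def online_max_and_sum(xs):
    """Single pass over the data, keeping a running max m and a running
    normalizer d that is rescaled whenever the max changes."""
    m = float("-inf")
    d = 0.0
    for x in xs:
        m_new = max(m, x)
        # Rescale the old sum to the new max, then add the new term.
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return m, d


def softmax_online(xs):
    """Softmax using the single-pass statistics."""
    m, d = online_max_and_sum(xs)
    return [math.exp(x - m) / d for x in xs]


scores = [1.0, 3.0, -2.0, 1000.0]  # the large value would overflow a naive exp()
print(softmax_online(scores))
```

Because the statistics can be updated one chunk at a time, attention never needs the full row of scores in memory at once, which is what lets the kernel keep its working set in fast on-chip memory.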

Quantization

LLM.int8(), SmoothQuant, GPTQ/AWQ, the QuIP line, AQLM, SnapKV, FP8/FP4 formats

LLM.int8() (Dettmers et al., 2022). The outlier story: why naive INT8 breaks and how mixed-precision fixes it.
SmoothQuant (Xiao et al., 2023). Migrate quantization difficulty from activations into weights.
GPTQ (Frantar et al., 2022) and AWQ (Lin et al., 2024). The two workhorse post-training methods, worth contrasting.
QuIP, QuIP#, and QTIP (Cornell). The incoherence-processing line that pushed weight-only quantization toward two bits.
AQLM (Egiazarian et al., 2024). The additive-codebook branch to contrast with QuIP#.
SnapKV (2024). KV-cache compression, essential for long-context serving.
For low-precision number formats: FP8 Formats for Deep Learning, FP8-LM, and the recent push into FP4 training and NVFP4 pretraining.
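The outlier story from LLM.int8() is easy to reproduce with a toy. The sketch below is a 1-D illustration under loud assumptions: symmetric absmax INT8 on a plain list of made-up numbers, with a magnitude threshold of 6.0 (the value the paper reports using) to decide what stays in high precision. The real method does this per feature dimension inside a matrix multiply. This is only the intuition.

```python
def quantize_absmax(xs):
    """Symmetric INT8 quantization: scale so the largest magnitude maps to 127."""
    scale = max(abs(x) for x in xs) / 127.0
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale


def dequantize(q, scale):
    return [v * scale for v in q]


def roundtrip_error(xs):
    """Worst-case absolute error after quantize -> dequantize."""
    q, scale = quantize_absmax(xs)
    return max(abs(a - b) for a, b in zip(xs, dequantize(q, scale)))


def mixed_precision_roundtrip(xs, threshold=6.0):
    """Toy decomposition: keep large-magnitude values as-is (standing in for
    fp16) and quantize only the remaining inliers. Assumes some inliers exist."""
    inliers = [x for x in xs if abs(x) < threshold]
    q, scale = quantize_absmax(inliers)
    restored = iter(dequantize(q, scale))
    return [x if abs(x) >= threshold else next(restored) for x in xs]


# Made-up activations: small values, then the same values plus one outlier.
normal = [0.1, -0.2, 0.05, 0.3, -0.15]
with_outlier = normal + [60.0]

print("no outlier, max error:  ", roundtrip_error(normal))
print("with outlier, max error:", roundtrip_error(with_outlier))
mixed = mixed_precision_roundtrip(with_outlier)
print("mixed precision, max error:", max(abs(a - b) for a, b in zip(with_outlier, mixed)))
```

One large value stretches the scale so far that most of the small values round to zero. Pulling the outlier out restores the resolution for everything else.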

Scaling laws and pre-training

Kaplan, Chinchilla, muP, GSPMD/PaLM, the JAX Scaling Book, DeepSeek-V3

Scaling Laws for Neural Language Models (Kaplan et al., 2020). The bedrock.
Training Compute-Optimal LLMs (Chinchilla) (Hoffmann et al., 2022). The correction: scale model and data together.
muP / Tensor Programs V (Yang et al., 2022). Tune hyperparameters on a tiny proxy and transfer them zero-shot.
GSPMD (Xu et al., 2021) and PaLM (Chowdhery et al., 2022). Compiler-based sharding, and the canonical large-scale case study (and the standard MFU definition).
The JAX Scaling Book (Austin et al., 2025). The best practical primer on rooflines and parallelism. This is the one Vlad tells you to work through exercise by exercise.
The DeepSeek-V3 Technical Report (2024). The modern recipe end to end.
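The Chinchilla correction is often compressed into a rule of thumb of about 20 training tokens per parameter. Combined with the C ≈ 6·N·D compute approximation, that gives a quick way to split a compute budget. The sketch below is that back-of-envelope version only. The ratio of 20 is the commonly quoted approximation, not the paper’s fitted coefficients, and labs tune it for their own data and inference costs.

```python
import math


def compute_optimal_split(compute_flops: float, tokens_per_param: float = 20.0):
    """Solve C = 6 * N * D with D = tokens_per_param * N for N and D."""
    params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


# Sanity check against Chinchilla's own configuration: 70B params, 1.4T tokens.
chinchilla_budget = 6.0 * 70e9 * 1.4e12
n, d = compute_optimal_split(chinchilla_budget)
print(f"budget {chinchilla_budget:.2e} FLOPs -> {n:.2e} params, {d:.2e} tokens")

# The same rule applied to a 1e25 FLOP budget.
n_big, d_big = compute_optimal_split(1e25)
print(f"budget 1.00e+25 FLOPs -> {n_big:.2e} params, {d_big:.2e} tokens")
```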

Distributed training and systems

DistBelief, GPipe/PipeDream, Megatron-LM, ZeRO/FSDP, activation recomputation, Ring Attention

Large Scale Distributed Deep Networks (Dean et al., 2012). The parameter server and async SGD, the origin of the gradient-staleness story.
GPipe and PipeDream. Synchronous and asynchronous pipeline parallelism, worth reading as a pair.
Megatron-LM (Shoeybi et al., 2019). The standard for tensor parallelism.
ZeRO (Rajbhandari et al., 2019) and PyTorch FSDP (Zhao et al., 2023). Sharding optimizer states and parameters, in theory and in production.
Reducing Activation Recomputation (Korthikanti et al., 2022) for sequence parallelism, and Ring Attention (Liu et al., 2023) for context parallelism.
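The ZeRO paper’s memory accounting is worth working through once by hand. The sketch below follows its model-state formula for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and K = 12 for the fp32 optimizer states. Activations, buffers, and fragmentation are ignored, and the function name and stage numbering are just conventions chosen for this example.

```python
def zero_model_state_bytes(params: float, n_gpus: int, stage: int, k: float = 12.0) -> float:
    """Per-GPU model-state memory under data parallelism with ZeRO.

    stage 0: plain data parallelism, everything replicated
    stage 1: optimizer states sharded
    stage 2: optimizer states and gradients sharded
    stage 3: optimizer states, gradients, and parameters sharded
    """
    if stage == 0:
        return (2 + 2 + k) * params
    if stage == 1:
        return 4 * params + k * params / n_gpus
    if stage == 2:
        return 2 * params + (2 + k) * params / n_gpus
    if stage == 3:
        return (2 + 2 + k) * params / n_gpus
    raise ValueError("stage must be 0, 1, 2, or 3")


# The paper's running example: a 7.5B-parameter model on 64 GPUs.
params, n_gpus = 7.5e9, 64
for stage in range(4):
    gb = zero_model_state_bytes(params, n_gpus, stage) / 1e9
    print(f"stage {stage}: {gb:.1f} GB per GPU")
```

The jump from 120 GB to under 2 GB per device is the whole argument for sharding: the replicated optimizer state, not the weights, is what blows the memory budget.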

Inference and serving

Orca, PagedAttention/vLLM, speculative decoding, Splitwise/DistServe, MFU and MBU

Orca (Yu et al., 2022). Continuous batching and iteration-level scheduling.
PagedAttention / vLLM (Kwon et al., 2023). OS-style paging for the KV cache, the foundation of vLLM.
Speculative decoding (Leviathan et al., 2022) and the DeepMind derivation (Chen et al., 2023), plus Medusa (2024) for self-speculation.
Splitwise (2023) and DistServe (2024). Disaggregating prefill and decode, the reference for modern serving.
For the efficiency metrics, MFU is defined in PaLM and MBU in this Databricks engineering post.
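Of the ideas in this list, the speculative-decoding acceptance rule is the one that most rewards writing it out. The sketch below covers a single token position with toy probability vectors. The distributions are made up, and real systems verify several draft tokens per target-model forward pass. The second function checks the key property analytically: the accept-or-resample procedure reproduces the target distribution exactly.

```python
import random


def speculative_step(p, q, rng):
    """Verify one draft token. p: target-model probs, q: draft-model probs.
    Returns (token, accepted)."""
    x = rng.choices(range(len(q)), weights=q)[0]  # draft model proposes x ~ q
    if rng.random() < min(1.0, p[x] / q[x]):      # accept with prob min(1, p/q)
        return x, True
    # On rejection, resample from the normalized positive part of (p - q).
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0], False


def output_distribution(p, q):
    """Exact distribution of the token returned by speculative_step."""
    accepted = [min(pi, qi) for pi, qi in zip(p, q)]  # q(x) * min(1, p(x)/q(x))
    reject_mass = 1.0 - sum(accepted)
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)
    if total == 0.0:  # p == q, so every draft token is accepted
        return accepted
    return [a + reject_mass * r / total for a, r in zip(accepted, residual)]


# Toy distributions over a 3-token vocabulary (illustrative values only).
target = [0.5, 0.3, 0.2]
draft = [0.2, 0.5, 0.3]

print(output_distribution(target, draft))
print(speculative_step(target, draft, random.Random(0)))
```

The draft model only changes how often you get to skip work, never what distribution you sample from, which is why the technique is lossless.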

Reinforcement learning and post-training

Sutton & Barto, RLHF roots, InstructGPT, DPO, GRPO and DeepSeek-R1

Reinforcement Learning: An Introduction (Sutton and Barto), and OpenAI’s Spinning Up as the code-first companion.
Deep RL from Human Preferences (Christiano et al., 2017) and PPO (Schulman et al., 2017). The roots of RLHF.
InstructGPT (Ouyang et al., 2022). The full SFT to reward-model to PPO pipeline.
DPO (Rafailov et al., 2023). Preference optimization without the RL loop.
DeepSeekMath (introduces GRPO) and DeepSeek-R1. The RL-for-reasoning work everyone is building on now.
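The DPO objective is compact enough to write down for a single preference pair. This sketch takes sequence log-probabilities as plain floats, which in practice would come from summing token log-probs under the policy and the frozen reference model. The example numbers and the beta value are arbitrary, and batching and gradients are left to whatever framework you use.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log(sigmoid(beta * margin)) for one (chosen, rejected) pair, where the
    margin is how much more the policy favors the chosen response than the
    reference model does. All inputs are sequence log-probabilities."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Numerically stable form of log(1 + exp(-margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss sits at log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))

# Policy has shifted probability toward the chosen response: the loss drops.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))

# Policy has shifted toward the rejected response: the loss rises.
print(dpo_loss(-12.0, -10.0, -10.0, -12.0))
```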

Mixture-of-Experts

Shazeer's sparse MoE, GShard/Switch, Tutel, Mixtral, DeepSeekMoE

Outrageously Large Neural Networks (Shazeer et al., 2017). The foundational sparse MoE layer.
GShard (2020) and Switch Transformers (2021). Expert parallelism and simplified routing at scale.
Tutel (2022) for the systems side, and Mixtral (2024) and DeepSeekMoE (2024) for concrete modern designs.
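The core routing step shared by these papers fits in a few lines. The sketch below is a simplified top-k gate for a single token: pick the k largest router logits and softmax over just those. It deliberately leaves out the noise term and load-balancing losses from Shazeer et al., as well as capacity limits and expert parallelism. The experts here are toy functions rather than MLPs.

```python
import math


def top_k_gate(router_logits, k=2):
    """Return {expert_index: weight} for the k highest-scoring experts,
    with weights from a softmax restricted to those k logits."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs. Unselected experts never run."""
    gates = top_k_gate(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


# Toy experts standing in for feed-forward blocks (hypothetical).
experts = [lambda x: x, lambda x: 2 * x, lambda x: 0 * x, lambda x: 4 * x]
logits = [1.0, 3.0, 0.5, 3.0]  # made-up router scores for one token

print(top_k_gate(logits))
print(moe_forward(2.0, experts, logits))
```

Only two of the four experts execute for this token, which is the whole trick: parameter count grows with the number of experts while per-token compute stays roughly fixed.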

Compilers and DSLs for ML

Halide, TVM, XLA, Triton, ThunderKittens, Mojo, Parrot

Halide (Ragan-Kelley et al., 2013). Decoupling algorithm from schedule, the idea under most ML compilers.
TVM (Chen et al., 2018) and XLA / OpenXLA. Full end-to-end compiler stacks. XLA is the one JAX compiles through, and PyTorch can target it via PyTorch/XLA.
Triton and ThunderKittens again, as the DSLs people actually write kernels in.
Mojo (Modular, from MLIR and LLVM creator Chris Lattner). A Python-superset systems language that compiles through MLIR toward hand-tuned-kernel performance, betting that one language can span Python ergonomics and GPU-kernel speed.
Parrot (NVlabs). A header-only C++20/CUDA library that uses expression templates for implicit fusion and lazy evaluation, composing element-wise math, reductions, and reshapes without materializing intermediates. A library-level take on the same avoid-the-slow-memory idea behind Flash Attention. I’ve actually landed a few PRs in the Parrot library recently.
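
The fusion idea is easier to see in a toy. The sketch below is a loose Python analogy, not Parrot's API: expressions are built lazily as closures and only evaluated when a reduction consumes them, so no intermediate array is ever allocated. Real expression templates do this composition at compile time in C++; the class and method names here are hypothetical.

```python
from typing import Callable, Sequence


class Lazy:
    """An element-wise expression that is evaluated only when consumed."""

    def __init__(self, length: int, at: Callable[[int], float]):
        self.length = length
        self.at = at  # maps an index to the value of the expression there

    @staticmethod
    def of(values: Sequence[float]) -> "Lazy":
        return Lazy(len(values), lambda i: values[i])

    def __add__(self, other: "Lazy") -> "Lazy":
        return Lazy(self.length, lambda i: self.at(i) + other.at(i))

    def __mul__(self, other: "Lazy") -> "Lazy":
        return Lazy(self.length, lambda i: self.at(i) * other.at(i))

    def map(self, fn: Callable[[float], float]) -> "Lazy":
        return Lazy(self.length, lambda i: fn(self.at(i)))

    def sum(self) -> float:
        # One pass over the indices; the whole expression is computed per element.
        return sum(self.at(i) for i in range(self.length))


if __name__ == "__main__":
    a = Lazy.of([1.0, 2.0, 3.0])
    b = Lazy.of([4.0, 5.0, 6.0])
    # Builds the expression a * b + a without computing anything yet.
    expr = a * b + a
    print(expr.sum())  # 38.0
```

An eager version would allocate one list for a * b and another for the sum with a before reducing. The lazy version touches each input element once, which is the same trade Flash Attention makes at a much larger scale.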

This is the corner of all of this that pulls at me most, and it’s why I’ve been building cljrs, a from-scratch Clojure dialect in Rust with MLIR/LLVM native codegen and a GPU path in WGSL, no JVM. It’s experimental and mostly a learning project, but it sits on the same programming-language-meets-systems seam as the tools above. In that project I’ve included a small DSL that compiles to Mojo (called Clojo, for fun), and I’m working on the early stages of a cljrs -> Parrot pipeline, as well as direct kernel fusion via macros. Check it out via the docs.

Distillation

Hinton, sequence-level KD, DistilBERT, MiniLLM, on-policy distillation

Distilling the Knowledge in a Neural Network (Hinton et al., 2015). Soft targets and temperature, the origin.
Sequence-Level Knowledge Distillation (Kim and Rush, 2016) and DistilBERT (Sanh et al., 2019).
MiniLLM (Gu et al., 2023) on reverse-KL distillation, and Thinking Machines’ more recent On-Policy Distillation for efficient reasoning transfer.
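
The soft-target idea from the Hinton paper is small enough to write out. This is a minimal sketch, assuming plain lists of logits and the forward KL from teacher to student; the T squared factor follows the paper's note that soft-target gradients shrink as 1/T squared. The function names and the default temperature are my own, and a real training loss would usually mix this with the ordinary hard-label cross-entropy.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Sequence[float],
                      student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    print(softmax(teacher, temperature=1.0))  # sharp
    print(softmax(teacher, temperature=4.0))  # softened, exposes the runner-up classes
    print(distillation_loss(teacher, [2.0, 2.0, 0.0]))
```

Swapping the two arguments of the KL gives the reverse-KL objective that MiniLLM argues for, which penalizes the student for putting mass where the teacher has little.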

Agents

ReAct, Toolformer, Reflexion, Voyager, SWE-bench, FunSearch, AlphaEvolve

ReAct (2022), Toolformer (2023), and Reflexion (2023). The reasoning-and-acting loop, learned tool use, and self-correction.
Voyager (2023) for open-ended skill acquisition, and SWE-bench (2023) for how coding agents are actually measured.
FunSearch (2023) and AlphaEvolve (2025). Search over programs, the line Vlad points at for the work above the model.

Finding more papers

This list is only a starting point. When a paper grabs you, the fastest way to find its neighborhood is Connected Papers, which builds a visual graph of related work from a single seed paper. It uses co-citation and bibliographic coupling rather than just direct citations, so it surfaces the siblings of a paper and not only its ancestors. It’s the one tool I’d shout out loudest here. Drop in any paper above and you’ll have a map of the subfield in a few seconds, which is exactly how you turn that “citation tree” from a metaphor into something you can walk.
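
The two similarity signals are simple to compute on a toy graph. The sketch below is only an illustration of the definitions, not Connected Papers' actual algorithm: bibliographic coupling counts the references two papers share, and co-citation counts how many papers cite both. The paper IDs and function names are made up.

```python
from collections import Counter
from itertools import combinations


def bibliographic_coupling(references: dict[str, set[str]]) -> Counter:
    """For each pair of citing papers, count the references they have in common."""
    scores: Counter = Counter()
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared:
            scores[(a, b)] = shared
    return scores


def co_citation(references: dict[str, set[str]]) -> Counter:
    """For each pair of cited papers, count how many papers cite both."""
    scores: Counter = Counter()
    for cited in references.values():
        for a, b in combinations(sorted(cited), 2):
            scores[(a, b)] += 1
    return scores


if __name__ == "__main__":
    # Hypothetical graph: each key cites the papers in its set.
    refs = {"A": {"X", "Y"}, "B": {"X", "Y", "Z"}, "C": {"Z"}}
    print(bibliographic_coupling(refs))
    print(co_citation(refs))
```

In this graph A and B never cite each other, yet coupling links them through two shared references, and X and Y are linked by co-citation because both A and B cite the pair. That is why these measures surface siblings that a plain citation walk would miss.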

A few others that do similar jobs from different angles: Semantic Scholar for citation context and recommendations, Papers with Code for papers tied to implementations and benchmark leaderboards, Research Rabbit and Litmaps for building a living citation map you keep adding to, and plain Google Scholar’s “cited by” and “related articles” links when you just need the nearest neighbors fast.

Signaling it in public

Reading builds the taste. The thing that actually lands you the interview is a public artifact, and the highest-signal artifact is a real contribution to a respected open-source project. Vlad is explicit that contributions to vLLM, SGLang, or TensorRT are an extremely positive signal, because a merged performance or correctness PR there demonstrates exactly the systems skills frontier inference and training teams hire for.

You don’t need a novel algorithm to start. In rough order of accessibility: pick a good first issue and scope it to one file; fix a broken example or a confusing doc, which forces you to actually run the code; reproduce a benchmark or file a clean regression report; take an “add support for model X” or “add backend Y” issue, which teaches you the architecture fast; or contribute a new eval task, which most eval frameworks explicitly solicit. Read the contributing guide, build from source, run the tests, and engage in the issue before you write code. A clean PR that follows the project’s conventions signals far more than volume.

Here’s the index, by skill area. All paths were verified active as of June 2026, and I’ve flagged the ones that have moved or gone stale so you don’t waste effort.

Inference and serving engines

vLLM, SGLang, TensorRT-LLM, llama.cpp, LMDeploy, MLC-LLM, ExLlamaV3

vLLM (Python, CUDA, C++). The reference high-throughput serving engine, and the single highest-signal repo on this list. Enormous surface area for model support, kernels, and scheduling.
SGLang (Python, CUDA, Rust). LMSYS’s production serving engine, co-equal with vLLM in hiring signal and very active.
TensorRT-LLM (Python, C++/CUDA). NVIDIA’s optimized stack; contributing signals you can work at the kernel-runtime boundary.
llama.cpp (C/C++). The de-facto local inference engine. Note the org moved from ggerganov to ggml-org.
LMDeploy, MLC-LLM (compiler-based deployment), and ExLlamaV3 (use V3, not the now low-maintenance V2).
Skip TGI as a contribution target; it was archived in 2026 and the ecosystem moved to vLLM and SGLang.

GPU kernels and DSLs

Triton, CUTLASS, FlashAttention, ThunderKittens, Liger-Kernel

Triton (Python DSL). The dominant kernel DSL. Note it moved from openai/triton to triton-lang/triton.
CUTLASS (CUDA C++). Deep, hard, and respected; real tensor-core expertise.
FlashAttention (CUDA / CuTeDSL). About as load-bearing as a kernel repo gets.
ThunderKittens (CUDA tiles) and Liger-Kernel (Triton). Lower barrier to writing fast kernels, and Liger is an especially friendly first stop for Triton training kernels.

Quantization

llm-compressor, GPTQModel, torchao, bitsandbytes, HQQ

llm-compressor (Python). The vLLM-ecosystem PTQ library, the most strategically current quant repo.
GPTQModel, the active successor to the now-deprecated AutoGPTQ.
torchao (PyTorch-native quant and sparsity) and bitsandbytes (the ubiquitous k-bit library).
HQQ for calibration-free quantization. Skip AutoAWQ (archived) and AutoGPTQ (deprecated).

Training frameworks and distributed

Megatron-LM, DeepSpeed, torchtitan, Axolotl/Unsloth, LitGPT, GPT-NeoX, nanochat

Megatron-LM (Python, CUDA). The reference large-scale training framework.
DeepSpeed (ZeRO and offload; org moved to deepspeedai) and torchtitan, the best entry point for modern FSDP and parallelism.
Axolotl and Unsloth (the latter bridges training and Triton kernels). LitGPT and EleutherAI’s GPT-NeoX for clean from-scratch implementations.
nanochat, Karpathy’s full ChatGPT-pipeline-from-scratch. Use this rather than the now-deprecated nanoGPT, as a learning and portfolio project.

JAX ecosystem

JAX, Flax, Optax, Equinox, MaxText, Levanter

JAX (note the org is jax-ml), Flax, Optax, and Equinox.
MaxText, Google’s scalable pure-JAX LLM reference, and Levanter for bitwise-reproducible training. Both signal TPU-scale training skill.

RL and post-training

TRL, verl, OpenRLHF, alignment-handbook

TRL (the default open post-training library, GRPO/DPO/PPO), verl (one of the most-used scalable RL stacks), and OpenRLHF (Ray and vLLM based).
alignment-handbook for recipe and reproduction contributions. Skip trlx, which is effectively stale.

Agents, orchestration, and evals

OpenHands, DSPy, smolagents, LangChain, LlamaIndex, eval harnesses

OpenHands (leading open SWE-agent; org moved from All-Hands-AI), DSPy (programming, not prompting), and smolagents (a tiny, readable codebase ideal for a first contribution).
LangChain and LlamaIndex for applied agent work.
For evals: lm-evaluation-harness (the de-facto backend), lighteval, and UK AISI’s Inspect. Inspect’s evals repo explicitly takes community-contributed evals, a clean way to land a frontier-adjacent merged PR.

Data and tokenization

tokenizers, datatrove, nanotron, dolma

tokenizers (Rust plus Python bindings), a genuinely good place to show Rust and performance skill.
datatrove (the engine behind FineWeb), nanotron (minimal 3D-parallelism pretraining, with a real contributor guide), and Ai2’s dolma.

The part that isn’t about skills

Two things from the interview I want to hold onto, because they’re what I’m most likely to forget when I get anxious about credentials.

The first is about the AI-doom spiral that I, and probably you, fall into. Vlad’s counter is that you can’t hand accountability to a model. His example is law. An LLM understands the law better than most people, but it can’t represent you in court, because it can’t be disbarred. Accountability needs a human who can be held responsible and who allocates the organization’s resources. He says the whole post was a reaction to fear-mongering, and the message is that we have agency over our future and can invest today in the skills that will matter tomorrow. After a year of low-grade dread about all of this, I needed to hear it framed as agency rather than fate.

The second is his advice to his younger self, which could have come straight out of the thing I wrote about meaning. Don’t be afraid of the menial-sounding part of an important problem; by his own account he never turned up his nose at menial work, and he got where he is through consistent local optimization, reassessing every six months, rather than one heroic leap. And be someone people want to see succeed. Not the workplace-politics version, the opposite of it: help other people shine, lend your complementary skills, and people will want to back your projects in turn. He credits a mentor, Todd Lipkin, as the person who embodied it.

That last one quietly undercuts the grim premise I’d been carrying, that getting somewhere serious means out-competing a field of people sharper than me. The advice from someone on the inside is to make other people better and accept the humble slice of a problem that matters. I find that both more humane and more achievable than the alternative.

What I’m taking from it

I’m not going to pretend that reading two posts and an interview closes a decade-wide gap, because it doesn’t. But it turned a vague longing into something with edges. The path isn’t to become a once-in-a-generation researcher. It’s to pick the part of the stack that fits how my brain works, which for me is probably the systems and inference side, since lateral systems thinking is the one place my engineering instincts transfer cleanly. Then it’s to do the unglamorous, verifiable work of getting good at it in public: reading the literature like it’s the job and building something real enough that other people use it.

And if a frontier lab never works out, the rest of this post is the consolation that it doesn’t have to. The same skills and the same public track record open doors all along the perimeter, at the infrastructure companies, the open labs, and the research institutes that are doing real work and will actually hire you remotely. That’s enough to start.

The framing here is Vlad Feinberg’s. Read the original post and watch the interview with Ryan Peterman; both are richer than my summary. The Princeton talk slides are the technical companion. The lab index, remote notes, paper links, and repo paths were verified in June 2026 and will drift, so confirm anything you’re about to act on.
