Contributing to OpenFold 3: A Primer

What this is

If you’re new to OpenFold 3, as I am, the obvious question is where you can realistically be useful. This post is my attempt at an answer, written for two readers at once: someone deciding whether and where to jump in, and me, trying to pick a single lane I can go deep on instead of scattering effort across the whole thing.

A caveat up front. This is not a replacement for the repo’s own contributor guide, which lives at docs/source/contribution.md and is published on Read the Docs. That document covers the mechanics. This post is the layer on top of it: where, given finite time and a fresh pair of eyes, the work actually is. In the last post I said it’s necessary to understand the codebase before you contribute. Consider this the payoff of that promise.

The mechanics

The loop is short, which is the first encouraging thing. Fork the repo and clone it, then stand up the environment with pixi the same way the setup post walks through. The openfold3-cuda12 environment installs the package in editable mode with the dev extras already baked in, so there’s no separate pip install step:

pixi install -e openfold3-cuda12

Run setup_openfold once (it downloads the model parameters and builds the local Chemical Component Dictionary), and confirm the suite passes before you touch anything:

pixi run -e openfold3-cuda12 setup_openfold
pixi run -e openfold3-cuda12 pytest openfold3/tests

That pixi run -e openfold3-cuda12 prefix gets old fast. As I mention in the setup post, I aliased it to ofrun (alias --save ofrun "pixi run -e openfold3-cuda12" in fish, or the plain alias ofrun="..." form in bash/zsh), and I’ll use that from here on.

Before you open a PR, format and lint:

ofrun ruff format
ofrun ruff check --fix

Two ruff choices are worth remembering, both of which I covered in the deep dive: an 88-character line length, and relative imports banned outright, so everything is imported by its full path. The PR template asks for five things: Summary, Changes, Related Issues, Testing, and Other Notes. The Testing field is not decorative. The guide explicitly suggests turning whatever examples you used to convince yourself the change works into actual test cases, which is good advice in general and close to mandatory here.
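To make that advice concrete, here is a minimal sketch of the REPL-to-test move. The helper pad_to_multiple is hypothetical (it is not part of OpenFold 3) and stands in for whatever you actually changed; the point is that the checks you typed by hand become plain test_* functions, which pytest collects on its own.

```python
def pad_to_multiple(n: int, multiple: int) -> int:
    """Round n up to the nearest multiple of `multiple` (hypothetical helper)."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return -(-n // multiple) * multiple  # ceiling division, then scale back up


# What you checked by hand in a REPL while convincing yourself:
#   >>> pad_to_multiple(300, 8)
#   304
# ...becomes test functions that pytest collects automatically.
def test_rounds_up_to_next_multiple():
    assert pad_to_multiple(300, 8) == 304


def test_leaves_exact_multiples_alone():
    assert pad_to_multiple(256, 8) == 256


def test_rejects_nonpositive_multiple():
    try:
        pad_to_multiple(10, 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for multiple=0")
```

In a real PR you would likely write the last test with pytest.raises; the plain try/except just keeps the sketch dependency-free.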

If that still sounds intimidating, it shouldn’t. The two PRs I’ve landed so far were both single-file changes: a broken Slack invite link and a flaky test fix. Starting small is not a consolation prize; it’s the recommended path.

The rules of the house

OpenFold has an explicit, recently written policy on AI-assisted contributions, and given how I’m writing this series, I’m exactly the person it’s talking to.

The short version is three rules. First, every contribution, AI-assisted or not, is the human contributor’s responsibility; you have to fully understand and stand behind anything you submit. Second, issues and pull requests should be written by humans, because they’re the first line of communication with the core team, and that dialogue is the whole point. (The only exceptions are translation and generating a failing test case, and you’re asked to disclose even those.) Third, “good first issues” are reserved for new human developers, and agentic contributions on them will be closed, because those issues exist as a learning gateway and not as tickets to be cleared.

The reasoning behind all of it is that this is a small core team with finite review time, and they’d rather have a few high-signal contributions than a flood of generated ones. That seems reasonable to me, so here’s where I land. These tools are great for understanding the code and for drafting my own thinking, but the issue, the PR description, and the judgment behind them are mine. I opened both of my PRs myself, and I’m not pointing anything automated at the newcomer issues. If you lean on these tools too, that feels like the line to hold.

The lanes

The nice thing about a project this organized is that I don’t have to invent the map. The repo’s own issue labels basically are the map, so here are the lanes I see, each tied to a real label and a live issue so you can confirm it’s not hypothetical.

Performance and systems (my lean). Labels: inference, enhancement. This is the memory wall from the setup post: the pair tensors that scale with the square of the sequence length (and the triangle-attention logits, which grow with its cube if left unchunked), the chunk_size and offload levers that fight back, and the OOM ceiling on consumer cards. A live example is issue #225, an inference OOM in get_token_frame_atoms. Suits people with a systems or CUDA background.
Kernels and hardware portability. Labels: inference, bug. There’s a validate_rocm.py in entry_points, and an open AMD bug (#177) where prep_cutlass breaks DeepSpeed’s Evoformer attention on ROCm. This is for people who like being close to the metal.
Data pipeline. Label: data preprocessing. MSA generation, templates, the CCD, input formats. Issue #172 (MSA features not allowed at the chain level) is a live one. High leverage and less glamorous, which is often where the real work hides.
Model and science. Labels: model, science. Confidence heads, the diffusion module, modality parity across RNA, DNA, and ligands. Issue #247 (a possible mismatch in nucleotide PAE frame atom order) is the flavor here. Suits people with an ML or structural-biology background who want to touch the model itself.
Testing, CI, and reproducibility. This is the one that bit me: snapshot tests that disagree across GPUs, conda and pixi parity, determinism. My flaky-test PR lived here. Underrated impact, and a good fit for infra-minded people.
Docs and developer experience. Labels: documentation, Installation. Setup friction, confusing error messages, missing examples. Issue #149 (a DataLoader worker dying on a fresh install) is the kind of thing that’s both a real fix and the gentlest possible on-ramp. My first PR lived here too.
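For the testing lane, snapshot tests that disagree across GPUs often come down to floating-point results that differ in the last few bits depending on hardware and kernel choice. One common remedy is to compare against the snapshot with an explicit tolerance instead of exact equality. Here is a minimal NumPy sketch; assert_matches_snapshot and its rtol/atol defaults are hypothetical, not OpenFold 3’s test utilities, and the right tolerances depend on the quantity being compared.

```python
import numpy as np


def assert_matches_snapshot(actual, snapshot, rtol=1e-4, atol=1e-5):
    """Fail only if `actual` differs from `snapshot` beyond the tolerance.

    Hypothetical helper. np.allclose checks, elementwise,
    |actual - snapshot| <= atol + rtol * |snapshot|, which absorbs
    last-bit differences between devices.
    """
    actual = np.asarray(actual, dtype=np.float64)
    snapshot = np.asarray(snapshot, dtype=np.float64)
    if actual.shape != snapshot.shape:
        raise AssertionError(f"shape mismatch: {actual.shape} vs {snapshot.shape}")
    if not np.allclose(actual, snapshot, rtol=rtol, atol=atol):
        worst = float(np.max(np.abs(actual - snapshot)))
        raise AssertionError(f"max abs difference {worst:.3e} exceeds tolerance")


# Float addition is not associative, so summing in a different order (as a
# different GPU or kernel might) changes the last bits of the result.
forward = (0.1 + 0.2) + 0.3
backward = (0.3 + 0.2) + 0.1
assert forward != backward  # an exact snapshot comparison would fail
assert_matches_snapshot([forward], [backward])  # the tolerant one passes
```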

One observation from skimming the open issues: they skew heavily toward bug and enhancement, then inference, with model, training, and config nearly empty. The honest read is that the demand right now is for fixing and hardening, not green-field model work. Worth knowing before you go hunting for a glamorous problem.

What the history actually shows

I wanted to know where newcomers actually succeed, so I read through the recently merged PRs, and the history tells the story better than I could.

The clearest example is a contributor going by GMNGeoffrey, who has quietly become the owner of the chunk-size and tuning lane. Across a string of PRs (making chunking for the Triton kernels match cuEquivariance, promoting tune_chunk_size to a top-level config field, avoiding retesting non-viable chunk sizes, forcing chunk sizes to powers of two, and more) it’s one person, one lane, building real trust by going deep rather than wide. It also happens to be the lane I’m eyeing, which is either encouraging or a warning depending on how you look at it.
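Two of those PR titles describe a neat little search problem, so here is a minimal sketch of the idea. It assumes, as the titles suggest, that candidate chunk sizes are powers of two and that memory use grows with chunk size, so a size that runs out of memory rules out every larger one for the same input. ChunkSizeTuner, power_of_two_candidates, and the fits callback are hypothetical names; this is not OpenFold 3’s tune_chunk_size implementation, and a real fits would run a trial forward pass and catch the out-of-memory error.

```python
from typing import Callable, Dict, List, Optional


def power_of_two_candidates(max_chunk: int) -> List[int]:
    """Powers of two up to max_chunk, largest first (hypothetical helper)."""
    sizes = []
    size = 1
    while size <= max_chunk:
        sizes.append(size)
        size *= 2
    return sizes[::-1]


class ChunkSizeTuner:
    """Hypothetical tuner that remembers which chunk sizes already failed.

    Assumes monotonicity: if chunk size c does not fit for a given sequence
    length, no larger chunk size fits for that length either.
    """

    def __init__(self) -> None:
        # sequence length -> smallest chunk size known not to fit
        self._smallest_failure: Dict[int, int] = {}

    def tune(
        self, seq_len: int, max_chunk: int, fits: Callable[[int], bool]
    ) -> Optional[int]:
        limit = self._smallest_failure.get(seq_len)
        for size in power_of_two_candidates(max_chunk):
            if limit is not None and size >= limit:
                continue  # already known to be non-viable: skip the retest
            if fits(size):
                return size  # largest viable power of two
            # Candidates are descending, so this is the new smallest failure.
            limit = size
            self._smallest_failure[seq_len] = limit
        return None


probes: List[int] = []


def fake_fits(size: int) -> bool:
    probes.append(size)
    return size <= 16  # pretend anything above 16 runs out of memory


tuner = ChunkSizeTuner()
first = tuner.tune(seq_len=1024, max_chunk=256, fits=fake_fits)
second = tuner.tune(seq_len=1024, max_chunk=256, fits=fake_fits)
# first == 16 after probing 256, 128, 64, 32, 16; second == 16 after one probe
```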

And it’s not just one person. Community contributors have landed conda packaging fixes (sdvillal), documentation and a segfault fix (etowahadams), corrected CIF ligand output (ryanhulke), and small performance cleanups like removing redundant contiguous calls (borisfom). The contributor list is genuinely not just the consortium.

There’s a catch, though: as I write this there is essentially one open “good first issue.” One. So the realistic path for a newcomer is not to wait for a labeled gateway issue to appear. It’s to find a small, real bug or a performance nit and fix it cleanly. Which, I’ll admit, is exactly what my two PRs were, except I backed into that strategy by accident rather than planning it.

How to choose a lane

So how do you actually pick? The four things I’m weighing:

Leverage. Does fixing this help a lot of people, or just me?
Fit. Does it match what I’m already good at?
Maintainer pull. Is anyone actually asking for it? The “contributions welcome” label is a useful signal, and at the moment only two issues carry it (#58 and #40), which tells you where the door is most clearly open.
Scope. Can a first PR here be small and complete, or does being useful require a month of context first?

And one piece of advice I’m mostly giving to myself: talk to the maintainers before sinking weeks into something. The Slack and the issue tracker exist for exactly this, and it’s a lot cheaper to ask “would you take this?” than to find out at review time that the answer was no.

Where I am leaning, and why

The technical case

I’m planting the flag on performance and systems, specifically memory on consumer GPUs. The reasoning is concrete. The memory wall is the thing standing between a 24 GB card and real-world-size inputs, and it isn’t abstract: issue #225 is a live inference OOM, and that’s the kind of specific, bounded first target I want rather than a vague intention to “make it faster.”
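The memory wall is easy to see with back-of-the-envelope arithmetic. The sketch below assumes a pair representation of N × N × c_z values with c_z = 128 (AlphaFold 3’s pair width; OpenFold 3’s config is the authority on the real value), four triangle-attention heads, and fp32 storage. Under those assumptions, at N = 1,000 tokens the pair tensor alone is about 0.5 GB, unchunked triangle-attention logits would be about 16 GB, and processing rows 64 at a time brings the logits down to about 1 GB. It ignores everything else resident on the card (weights, MSA features, other activations), so treat it as an illustration of the scaling rather than a prediction.

```python
from typing import Optional

# Back-of-the-envelope memory estimates. Every size here is an assumption
# chosen for illustration, not a value read from OpenFold 3's config.
C_Z = 128  # assumed pair-representation channel width (as in AlphaFold 3)
TRI_HEADS = 4  # assumed number of triangle-attention heads
FP32_BYTES = 4


def pair_bytes(
    n_tokens: int, c_z: int = C_Z, bytes_per_elem: int = FP32_BYTES
) -> int:
    """Bytes for one N x N x c_z pair tensor, which grows with N squared."""
    return n_tokens * n_tokens * c_z * bytes_per_elem


def triangle_logits_bytes(
    n_tokens: int,
    heads: int = TRI_HEADS,
    bytes_per_elem: int = FP32_BYTES,
    chunk_size: Optional[int] = None,
) -> int:
    """Bytes for triangle-attention logits.

    Each of the N rows attends over an N x N grid per head, so the full
    tensor grows with N cubed; processing rows chunk_size at a time caps it
    at chunk_size x heads x N x N.
    """
    rows = n_tokens if chunk_size is None else min(chunk_size, n_tokens)
    return rows * heads * n_tokens * n_tokens * bytes_per_elem


if __name__ == "__main__":
    for n in (500, 1000, 2000):
        print(
            f"N={n:5d}  pair={pair_bytes(n) / 1e9:6.2f} GB  "
            f"tri(full)={triangle_logits_bytes(n) / 1e9:7.2f} GB  "
            f"tri(chunk=64)={triangle_logits_bytes(n, chunk_size=64) / 1e9:6.2f} GB"
        )
```

Doubling N quadruples the pair tensor but multiplies the unchunked logits by eight, which is why the chunk_size lever matters so much more than it first appears.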

From there the threads are visible. Inference currently defaults to full fp32 (precision: "32-true" in the trainer args), which on a memory-bound card is a real lever sitting untouched. The chunking and offload machinery is already there to build on. And none of this requires me to understand the diffusion math to be useful, which matters a lot when you’re starting.
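The fp32 lever can be sized with some back-of-the-envelope arithmetic. As a rough sketch, invert the pair-tensor formula (N × N × c_z values, with c_z = 128 assumed) to ask how many tokens fit if a single pair tensor were allowed to fill a 24 GB card, at 4 bytes per value (fp32) versus 2 (a 16-bit format such as bf16). It ignores everything else in memory, so the absolute numbers are an upper bound, not a prediction, and whether OpenFold 3 behaves well at reduced precision is a separate question. The useful part is the ratio: because memory grows with N squared, halving the bytes per value raises the ceiling on N by only √2, about 1.41×. The helper max_tokens_for_budget is hypothetical.

```python
import math


def max_tokens_for_budget(
    budget_bytes: int, c_z: int = 128, bytes_per_elem: int = 4
) -> int:
    """Largest N such that one N x N x c_z pair tensor fits in budget_bytes.

    Hypothetical helper; c_z = 128 is an assumption. For integers,
    N*N*k <= B is equivalent to N*N <= B // k, so math.isqrt is exact.
    """
    return math.isqrt(budget_bytes // (c_z * bytes_per_elem))


budget = 24 * 1024**3  # a "24 GB" consumer card, treated as 24 GiB
n_fp32 = max_tokens_for_budget(budget, bytes_per_elem=4)
n_bf16 = max_tokens_for_budget(budget, bytes_per_elem=2)
print(f"fp32: N <= {n_fp32}, bf16: N <= {n_bf16}, ratio {n_bf16 / n_fp32:.3f}")
```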

Why this lane, even though it isn’t the point

I want to be honest about the motivation here, because it would be easy to misread the systems focus as the thing I’m passionate about. It isn’t. I have a systems and CUDA background and I genuinely don’t mind the work (chasing down where the bytes went and fighting a memory budget is satisfying enough to do well), but it is not what actually pulls me to this project.

What pulls me is the biology itself, the ML fundamentals underneath these models, and the chance to have some real impact on humanity by accelerating the work of actual bench biologists and everyone downstream who builds on this software. Protein structure prediction is one of the few corners of computing where the output runs more or less straight into disease research and drug discovery, and that is the part I care about. The CUDA work is a means to that end, not the end itself.

So I’m picking the systems lane because it is the highest-leverage place I can be useful right now given what I already know, not because it is the dream. Being genuinely useful on a real project beats waiting until I feel “qualified” to touch the model or the science directly. In an earlier post on motivation and meaning I landed on the idea that the work worth doing is both high-leverage and pointed at something you actually care about. The memory wall is the leverage. The science it unblocks is the part that matters, and the part I’m steering toward.

Plug in

If any of this resonates, the repo is on GitHub, the issues page is the best place to see what’s live, and there’s an OpenFold Slack (the invite link works now, since fixing it was my first PR). If you’re eyeing the same memory-and-performance lane I am, say hi. I’d rather figure this out alongside someone than alone.

A structure I actually ran

The closing structure for the series is electron transfer flavoprotein subunit α (ETFA). It is a fitting one to end on, because it is a collector: it sits in the mitochondrial matrix and gathers electrons from a dozen different flavoprotein dehydrogenases (the enzymes of fatty acid and amino acid breakdown) and funnels them into the respiratory chain. A lot of separate metabolic streams converge on it before their energy ever reaches the proton gradient. That felt like a reasonable metaphor for a post about finding a lane: many possible entry points, one place to focus them.


Human electron transfer flavoprotein subunit α (ETFA), predicted with OpenFold 3 · avg pLDDT ≈ 89 · colored by OpenFold 3 pLDDT on the AlphaFold confidence scale (blue high, orange low)
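For readers who haven’t met it, the AlphaFold confidence scale is just a set of pLDDT bands. Here is a minimal sketch, assuming the four bands and hex colors used by the AlphaFold Protein Structure Database (very high above 90, confident 70 to 90, low 50 to 70, very low below 50) and assuming an exact boundary value falls into the lower band. plddt_band and mean_plddt are hypothetical names, and the “avg pLDDT” in the caption is taken here to be a plain mean over per-residue scores, which is also an assumption.

```python
from typing import Sequence, Tuple

# Assumed AlphaFold DB confidence bands: (exclusive lower bound, label, color).
PLDDT_BANDS = (
    (90.0, "very high", "#0053D6"),  # dark blue
    (70.0, "confident", "#65CBF3"),  # light blue
    (50.0, "low", "#FFDB13"),  # yellow
)
VERY_LOW = ("very low", "#FF7D45")  # orange


def plddt_band(score: float) -> Tuple[str, str]:
    """Map a pLDDT score (0-100) to (label, color); boundaries go to the lower band."""
    for lower_bound, label, color in PLDDT_BANDS:
        if score > lower_bound:
            return label, color
    return VERY_LOW


def mean_plddt(scores: Sequence[float]) -> float:
    """Plain mean over per-residue scores (assumed definition of 'avg pLDDT')."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)
```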

This is the cleanest prediction of the four, almost entirely high-confidence blue across both of its domains, with only a short low-confidence tail. Superposed against AlphaFold’s model of the same sequence (UniProt P13804), it matches to 0.54 Å backbone RMSD over 311 of 312 residues. Across the whole series the same result keeps showing up: where OpenFold 3 is confident, it lands within about half an ångström of AlphaFold, which is the kind of reproducibility that makes the open, commercially usable version genuinely useful rather than merely interesting. (As noted throughout, the public AlphaFold database holds AlphaFold 2 predictions; there is no public bulk AlphaFold 3 download, but for monomers like these it is a fair reference.)
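For anyone who wants to reproduce a number like that 0.54 Å, the standard way to compute RMSD after optimal superposition is the Kabsch algorithm. Here is a minimal NumPy sketch, assuming you have already extracted matched backbone coordinates from both models as two (n, 3) arrays in the same residue order; that matching step, which is where a figure like “311 of 312 residues” comes from, is not shown. It illustrates the technique only and is not presented as the pipeline behind the figures above; kabsch_rmsd is a hypothetical name.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched (n, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=np.float64)
    Q = np.asarray(Q, dtype=np.float64)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two matching (n, 3) arrays")

    # Remove translation by centering both sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)

    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q (row vectors, hence R.T) and average squared deviations.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff**2).sum(axis=1).mean()))
```

The reflection check matters: without it, a structure and its mirror image could superpose perfectly, which would hide a real chirality error.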

🐍 Snake