Nucleotide dependency maps

A categorical Jacobian, or nucleotide dependency map, measures how substituting the base at one position shifts the model's predicted nucleotide distribution at every other position. The shifts are collapsed to one score per position pair, giving an L×L heatmap over a locus of length L. The method was developed independently for protein language models (the categorical Jacobian; Zhang et al., PNAS 2024) and for genomic language models (nucleotide dependency analysis; Tomaz da Silva et al., Nat. Genet. 2025). Strong off-diagonal blocks flag positions the model treats as coupled (splice sites, structured elements, …).
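As a rough illustration of the idea, the sketch below builds an L×L map by substituting each base in turn and recording the largest absolute change in log2 odds at every other position. It is a minimal sketch under stated assumptions, not the pipeline used for these maps: `toy_model` is a hypothetical stand-in for a gLM (it simply favours the complement of the mirror-image base), `dependency_map` and its `predict_probs` argument are illustrative names, and the max-absolute-log-odds reduction is one plausible way to collapse the per-nucleotide shifts.

```python
import numpy as np

def toy_model(seq):
    """Hypothetical stand-in for a gLM (assumption, not a real model).

    Takes integer-encoded bases (A, C, G, T = 0, 1, 2, 3) and returns an
    (L, 4) array of probabilities. Position j favours the complement of
    the base at position L-1-j, mimicking a hairpin-like coupling.
    """
    seq = np.asarray(seq)
    L = len(seq)
    logits = np.zeros((L, 4))
    partner = 3 - seq[::-1]              # complement of the mirrored base
    logits[np.arange(L), partner] = 2.0
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def log2_odds(p, eps=1e-9):
    """Per-nucleotide log2 odds, with eps guarding against log(0)."""
    return np.log2(p + eps) - np.log2(1.0 - p + eps)

def dependency_map(seq, predict_probs):
    """Return dep[i, j]: the largest absolute log2-odds change at position j
    over all substitutions at position i and all target nucleotides.

    The reduction (max of absolute values) is an assumption for this sketch.
    """
    seq = np.asarray(seq)
    L = len(seq)
    ref = log2_odds(predict_probs(seq))
    dep = np.zeros((L, L))
    for i in range(L):
        for b in range(4):
            if b == seq[i]:
                continue                  # skip the reference base
            mut = seq.copy()
            mut[i] = b
            shift = np.abs(log2_odds(predict_probs(mut)) - ref)  # (L, 4)
            dep[i] = np.maximum(dep[i], shift.max(axis=1))
    np.fill_diagonal(dep, 0.0)            # self-dependency is not informative
    return dep

seq = np.array([0, 1, 2, 3, 0, 0, 1, 2])
dep = dependency_map(seq, toy_model)
```

With this toy model the only nonzero entries sit on the anti-diagonal, because each position is coupled to exactly one partner. A real model would need one forward pass per substitution, so 3L passes per locus.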

Pick a locus to see every model's map for it, stacked. The maps share one genomic coordinate axis, so you can read a position straight down across models and compare it against the annotated reference panel (cited beneath it).

Our gLMs are causal, so a pass over one strand populates only one triangle of the map. Each map therefore stitches a forward pass and a reverse-complement pass, then symmetrizes by taking the mean. See #237 for the method and the autoregressive correctness argument. The visible dependency range is bounded by the model's context window (255 bp), so kilobase-scale structure is not shown here. Color encodes dependency strength (coolwarm colormap, robust scaling applied per map; the diagonal is masked).
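The stitching step can be sketched as follows. This is an illustration under assumptions, not the method from #237: it assumes rows index the substituted position and columns the affected position, that each per-strand map is in its own strand's coordinates (so a causal model fills only entries with column > row), and that the diagonal is masked with NaN. The function name `stitch_strands` is hypothetical.

```python
import numpy as np

def stitch_strands(dep_fwd, dep_rc):
    """Combine forward and reverse-complement maps into one symmetric map.

    dep_fwd: (L, L) map from the forward strand, forward coordinates.
    dep_rc:  (L, L) map from the reverse-complement strand, in that
             strand's own coordinates (position k there is L-1-k forward).
    """
    dep_fwd = np.asarray(dep_fwd, dtype=float)
    dep_rc = np.asarray(dep_rc, dtype=float)

    # Flipping both axes converts reverse-complement indices to forward
    # indices, which moves that pass's upper triangle to the lower triangle.
    rc_in_fwd = dep_rc[::-1, ::-1]

    # Upper triangle from the forward pass, lower from the reverse pass.
    stitched = np.triu(dep_fwd, k=1) + np.tril(rc_in_fwd, k=-1)

    # Symmetrize by averaging each entry with its mirror image.
    sym = 0.5 * (stitched + stitched.T)

    # Mask the diagonal so it does not dominate the color scale.
    np.fill_diagonal(sym, np.nan)
    return sym

# Tiny example: each strand's pass fills only its own upper triangle.
fwd = np.triu(np.full((4, 4), 2.0), k=1)
rc = np.triu(np.full((4, 4), 4.0), k=1)
sym = stitch_strands(fwd, rc)
```

In the example, every off-diagonal entry is the mean of a forward-pass value and a reverse-complement value for the same pair of positions, so the result is symmetric by construction.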