NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals

Example of an NNsight intervention graph: (a) a user writes research code, from which (b) an intervention graph is constructed; (c) the intervention operations are interleaved with the original model's computation and executed. Values marked with .save() are returned to the user upon completion.

Why is research on foundation model internals important?

As AI models grow in scale and capability, understanding their inner workings becomes increasingly important for safety, reliability, and scientific progress. However, our analysis reveals a growing gap between the most capable models and the models that are studied in detail. Although open-weight models such as Llama 3.1 405B are available to interpretability researchers, the technical complexity and computational cost of working with models of this size create substantial barriers.

Most interpretability research is done on models that lag far behind the capabilities of leading closed- and open-weight models. There is a significant gap between the performance of the models studied (blue line) and the capabilities of leading open-weight models (orange). The gap widens further when compared with leading closed-weight models (black line).

How does the intervention graph architecture work?

The intervention graph represents experiments as a portable, serializable computational graph that can be transmitted, optimized, and executed independently from the underlying model deployment. This separation of experimental design from engineering implementation allows researchers to focus on scientific questions rather than operational challenges.
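To make the idea of a portable, serializable graph concrete, here is a minimal sketch in Python. The `Node` and `InterventionGraph` classes, the operation names, and the `{"ref": ...}` convention for linking nodes are all hypothetical and are not NNsight's actual internal format; the point is only that an experiment reduced to plain data can be written to a string, sent elsewhere, and rebuilt without the model being present.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, List


@dataclass
class Node:
    """One operation in a (hypothetical) intervention graph."""
    name: str                                      # unique node id
    op: str                                        # operation name, e.g. "module_output", "mul", "save"
    args: List[Any] = field(default_factory=list)  # literals, or {"ref": name} links to earlier nodes


@dataclass
class InterventionGraph:
    nodes: List[Node] = field(default_factory=list)

    def add(self, name: str, op: str, *args: Any) -> str:
        self.nodes.append(Node(name, op, list(args)))
        return name

    def to_json(self) -> str:
        # Only plain data is serialized: no model weights and no Python closures,
        # so the payload can be shipped to whatever process hosts the model.
        return json.dumps([asdict(n) for n in self.nodes])

    @classmethod
    def from_json(cls, payload: str) -> "InterventionGraph":
        return cls([Node(**d) for d in json.loads(payload)])


# Build a three-node experiment: read a module output, double it, save the result.
g = InterventionGraph()
g.add("h", "module_output", "layers.1")
g.add("h_scaled", "mul", {"ref": "h"}, 2.0)
g.add("out", "save", {"ref": "h_scaled"})

payload = g.to_json()                        # a plain string, safe to send over a network
restored = InterventionGraph.from_json(payload)
print(restored == g)                         # True
```

Because the payload carries no weights, the same string could be executed by a local process or by a remote server that already has the model loaded.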

Interventions modify the model's computation graph by introducing additional nodes and edges. These additions specify where each modification reads from or writes to the original model's forward and backward passes, which enables a wide range of experiments, from simple activation probing to complex causal interventions.
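The interleaving of intervention nodes with the model's own computation can be sketched with a toy forward pass. Here an intervention is a hypothetical hook placed on the edge between two layers: it receives that layer's output and returns the value the next layer will see. Returning the input unchanged gives a probe; returning a different value gives a causal intervention. The model, the `run_with_interventions` helper, and the hooks are illustrative assumptions, and only the forward pass is covered, not gradients.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Hook = Callable[[Vector, Dict[str, Vector]], Vector]

# A toy three-layer "model" operating on plain Python lists.
MODEL: List[Callable[[Vector], Vector]] = [
    lambda x: [2.0 * v for v in x],  # layer 0: scale by 2
    lambda x: [v + 1.0 for v in x],  # layer 1: shift by 1
    lambda x: [sum(x)],              # layer 2: sum to a single value
]


def run_with_interventions(x: Vector, hooks: Dict[int, Hook]) -> Tuple[Vector, Dict[str, Vector]]:
    """Forward pass in which hooks[i] sits on the edge between layer i and layer i + 1."""
    saved: Dict[str, Vector] = {}
    for i, layer in enumerate(MODEL):
        x = layer(x)
        if i in hooks:
            # The intervention node reads layer i's output; whatever it returns
            # is what layer i + 1 receives.
            x = hooks[i](x, saved)
    return x, saved


def probe(i: int) -> Hook:
    """Read-only intervention: record a copy of the activation, pass it on unchanged."""
    def hook(h: Vector, saved: Dict[str, Vector]) -> Vector:
        saved[f"layer{i}"] = list(h)
        return h
    return hook


def zero_ablate(h: Vector, saved: Dict[str, Vector]) -> Vector:
    """Causal intervention: overwrite the activation with zeros."""
    return [0.0 for _ in h]


clean, _ = run_with_interventions([1.0, 2.0], {})
probed, saved = run_with_interventions([1.0, 2.0], {0: probe(0)})
ablated, _ = run_with_interventions([1.0, 2.0], {0: zero_ablate})
print(clean, saved["layer0"], ablated)  # [8.0] [2.0, 4.0] [2.0]
```

The probe leaves the final output identical to the clean run, while the ablation changes it; comparing the two is the basic pattern behind causal experiments.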

What is NNsight?

NNsight is an open-source library that extends PyTorch to create a familiar yet powerful API for neural network interventions. By using deferred execution within a tracing context, NNsight allows researchers to define complex experiments using standard PyTorch syntax. The experiment is represented as an intervention graph that can be executed locally or sent to a remote service.
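Deferred execution can be illustrated with a small tracing context. Inside the `with` block, reading a layer's output returns a placeholder (a proxy) instead of a value, and arithmetic on proxies only records new graph nodes; the recorded graph is executed, interleaved with the layers, when the block exits, and proxies marked with `.save()` receive their values. The `Trace` and `Proxy` classes and the `output(i)` / `set_output(i, ...)` helpers below are hypothetical stand-ins, not NNsight's API, which instead addresses modules directly (for example through a module's `.output` attribute).

```python
import operator
from typing import Any, Callable, Dict, List


class Proxy:
    """Placeholder for a value that will only exist once the trace runs."""

    def __init__(self, tracer: "Trace", idx: int):
        self.tracer, self.idx = tracer, idx
        self.value: Any = None              # filled in on exit, only if .save() was called

    def __add__(self, other: Any) -> "Proxy":
        return self.tracer.record(operator.add, self, other)

    def __mul__(self, other: Any) -> "Proxy":
        return self.tracer.record(operator.mul, self, other)

    def save(self) -> "Proxy":
        self.tracer.saved.append(self)      # request this value after execution
        return self


class Trace:
    """Hypothetical tracing context: records operations, executes them on exit."""

    def __init__(self, layers: List[Callable[[Any], Any]], x: Any):
        self.layers, self.x = layers, x
        self.nodes: List[tuple] = []        # ("output", i) or ("op", (fn, args)), in program order
        self.saved: List[Proxy] = []
        self.setters: Dict[int, Proxy] = {} # layer index -> proxy replacing that layer's output
        self.result: Any = None

    def record(self, fn: Callable, *args: Any) -> Proxy:
        self.nodes.append(("op", (fn, args)))
        return Proxy(self, len(self.nodes) - 1)

    def output(self, i: int) -> Proxy:
        self.nodes.append(("output", i))
        return Proxy(self, len(self.nodes) - 1)

    def set_output(self, i: int, proxy: Proxy) -> None:
        self.setters[i] = proxy

    def __enter__(self) -> "Trace":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self._run()                     # nothing executes until the block ends
        return False

    def _run(self) -> None:
        values: Dict[int, Any] = {}         # node index -> concrete value
        outputs: Dict[int, Any] = {}        # layer index -> concrete output

        def ready(a: Any) -> bool:
            return not isinstance(a, Proxy) or a.idx in values

        def resolve(a: Any) -> Any:
            return values[a.idx] if isinstance(a, Proxy) else a

        h = self.x
        for i, layer in enumerate(self.layers):
            h = outputs[i] = layer(h)
            # Interleave: evaluate every recorded node whose inputs now exist.
            for idx, (kind, payload) in enumerate(self.nodes):
                if idx in values:
                    continue
                if kind == "output" and payload in outputs:
                    values[idx] = outputs[payload]
                elif kind == "op" and all(ready(a) for a in payload[1]):
                    fn, args = payload
                    values[idx] = fn(*(resolve(a) for a in args))
            if i in self.setters:
                # Downstream layers see the patched value (assumes it only
                # depends on layers 0..i, so it has already been computed).
                h = values[self.setters[i].idx]
        self.result = h
        for p in self.saved:
            p.value = values[p.idx]


layers = [lambda v: v + 1, lambda v: v * 3]
with Trace(layers, 2) as tr:
    h0 = tr.output(0).save()               # h0.value is still None inside the block
    tr.set_output(0, h0 * 10)              # deferred edit: replace layer 0's output
print(h0.value, tr.result)                 # 3 90
```

In this sketch `h0.value` holds layer 0's original output (3), while the final result (90) reflects the patched value 30 flowing into layer 1.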

Experiment code expressed using (a) standard PyTorch hooks and (b) the NNsight API. Both code snippets define the same intervention, but the NNsight version is more concise and expressive, allowing all module inputs and outputs to be accessed within a single trace context.

The key features of NNsight include:

Familiar PyTorch syntax for defining model interventions
Support for all fundamental PyTorch operations
Access to intermediate activations, gradients, and model parameters
Compatibility with custom neural network architectures
Seamless transition between local and remote execution
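Because an experiment is just a graph of data, switching between local and remote execution can reduce to choosing which backend receives it. The sketch below assumes a hypothetical JSON experiment format and simulates the remote side with a function that only ever sees JSON text; `LocalBackend`, `SimulatedRemoteBackend`, and `run` are illustrative names. In NNsight itself, the documented switch is a `remote=True` argument to the tracing context.

```python
import json
from typing import Dict, List

# Hypothetical wire format: a list of {"layer": i, "scale": s} steps.
Experiment = List[Dict[str, float]]

LAYERS = [lambda v: v + 1.0, lambda v: v * 3.0]   # toy two-layer model


def execute(experiment: Experiment, x: float) -> Dict[str, float]:
    """Interpret an experiment against the toy model and return the saved values."""
    scale_at = {int(step["layer"]): step.get("scale", 1.0) for step in experiment}
    saved: Dict[str, float] = {}
    h = x
    for i, layer in enumerate(LAYERS):
        h = layer(h)
        if i in scale_at:
            h = h * scale_at[i]                    # intervene on layer i's output
            saved[f"layer{i}"] = h
    saved["final"] = h
    return saved


class LocalBackend:
    def run(self, experiment: Experiment, x: float) -> Dict[str, float]:
        return execute(experiment, x)


class SimulatedRemoteBackend:
    """Stands in for a remote service: only JSON text crosses the boundary."""

    def run(self, experiment: Experiment, x: float) -> Dict[str, float]:
        request = json.dumps({"experiment": experiment, "input": x})
        return json.loads(self._serve(request))

    @staticmethod
    def _serve(request: str) -> str:
        # In a real deployment this would run on the machine that holds the weights.
        payload = json.loads(request)
        return json.dumps(execute(payload["experiment"], payload["input"]))


def run(experiment: Experiment, x: float, remote: bool = False) -> Dict[str, float]:
    # The experiment itself is identical; only the backend that receives it changes.
    backend = SimulatedRemoteBackend() if remote else LocalBackend()
    return backend.run(experiment, x)


exp = [{"layer": 0, "scale": 2.0}]
print(run(exp, 1.0))                 # {'layer0': 4.0, 'final': 12.0}
print(run(exp, 1.0, remote=True))    # {'layer0': 4.0, 'final': 12.0}
```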

Documentation and more information can be found on the NNsight website.

What is NDIF?

The National Deep Inference Fabric (NDIF) is a scalable inference service designed to execute intervention graphs on large, preloaded models. NDIF allows multiple researchers to share computational resources, dramatically reducing the cost and complexity of working with large-scale AI models.

System architecture overview: Researchers write experiments using NNsight, which creates an intervention graph that's sent to the NDIF service. NDIF hosts models across multiple GPUs, with weights distributed using tensor parallelism for very large models like Llama 3.1 405B. The intervention is executed across all model shards, results are gathered, and returned to the researcher's local environment. This architecture enables efficient sharing of computational resources while maintaining a simple interface for researchers.
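The tensor-parallel layout mentioned in the caption can be illustrated on a single linear layer. In the minimal sketch below, the weight matrix's output dimension (its rows, for y = Wx) is split across shards, each shard computes its slice of the output from the full input, and a gather step concatenates the slices; this is the split Megatron-LM calls a column-parallel linear layer. Plain Python lists stand in for GPUs, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense y = W x on plain lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def shard_rows(w: Matrix, n_shards: int) -> List[Matrix]:
    """Split W's output dimension (its rows) into contiguous, near-equal shards."""
    size, rem = divmod(len(w), n_shards)
    shards, start = [], 0
    for s in range(n_shards):
        end = start + size + (1 if s < rem else 0)
        shards.append(w[start:end])
        start = end
    return shards


def tensor_parallel_matvec(shards: List[Matrix], x: Vector) -> Vector:
    # Every shard (one "GPU") sees the full input and computes only its slice
    # of the output; the gather step concatenates the slices in shard order.
    partials = [matvec(shard, x) for shard in shards]
    return [v for part in partials for v in part]


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
x = [3.0, 4.0]
shards = shard_rows(W, 3)                      # row counts 2, 1, 1
y = tensor_parallel_matvec(shards, x)
print(y)                                       # [3.0, 4.0, 7.0, 2.0]
print(y == matvec(W, x))                       # True
```

One consequence for interventions: in this layout the full activation only exists after the gather, so an edit applied on the shards themselves has to know which output indices each shard holds.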

NDIF's architecture provides several advantages over traditional approaches:

Minimized communication overhead compared to peer-to-peer systems like Petals
Efficient resource sharing across multiple users
Support for safe co-tenancy of user experiments
Horizontal scaling and dynamic resource allocation
Support for distributed model execution across multiple GPUs
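One way to picture efficient resource sharing with safe co-tenancy is to batch several users' requests into a single forward pass while confining each user's intervention and results to that user's own rows. The scheduler below is a hypothetical sketch under that assumption, not a description of NDIF's implementation, and it isolates users only at the level of data; a production service would also need to isolate the submitted code itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


def shared_layer(batch: List[Vector]) -> List[Vector]:
    """One toy layer applied to the whole batch at once (stands in for a GPU forward pass)."""
    return [[2.0 * v for v in row] for row in batch]


@dataclass
class Request:
    user: str
    inputs: List[Vector]
    intervention: Optional[Callable[[Vector], Vector]] = None  # applied to each of this user's rows


def run_cotenant_batch(requests: List[Request]) -> List[List[Vector]]:
    # 1. Concatenate all users' inputs into one batch, remembering each row range.
    batch: List[Vector] = []
    spans = []
    for req in requests:
        spans.append((len(batch), len(batch) + len(req.inputs)))
        batch.extend(req.inputs)
    # 2. One shared pass through the first layer serves every user.
    hidden = shared_layer(batch)
    # 3. Each user's intervention is confined to that user's own rows.
    for req, (lo, hi) in zip(requests, spans):
        if req.intervention is not None:
            hidden[lo:hi] = [req.intervention(row) for row in hidden[lo:hi]]
    # 4. Finish the forward pass, then return to each request only its own slice.
    out = shared_layer(hidden)
    return [out[lo:hi] for lo, hi in spans]


results = run_cotenant_batch([
    Request("alice", [[1.0]], intervention=lambda row: [0.0 for _ in row]),
    Request("bob", [[1.0], [3.0]]),
])
print(results)  # [[[0.0]], [[4.0], [12.0]]]
```

In this sketch, batching lets one copy of the weights serve several experiments in a single pass, and Alice's ablation leaves Bob's outputs untouched.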

More information and access details can be found on the NDIF website.

Performance and Evaluation

We evaluated the performance of NNsight and NDIF against existing tools for model interpretation and remote execution. Our main findings are as follows.

For large models, NDIF provides substantial performance improvements over traditional HPC approaches, with minimal overhead compared to local execution. When compared to Petals, another open-source remote inference framework, NDIF shows comparable performance for standard inference but significantly outperforms it for intervention tasks due to reduced communication overhead.

How to cite

The paper can be cited as follows:

bibliography

Jaden Fiotto-Kaufman*, Alexander R. Loftus*, Eric Todd, Jannik Brinkmann, Koyena Pal, Dmitrii Troitskii, Michael Ripa, Adam Belfki, Can Rager, Caden Juang, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Nikhil Prakash, Carla Brodley, Arjun Guha, Jonathan Bell, Byron C. Wallace, David Bau. "NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals." International Conference on Learning Representations (ICLR) 2025.

bibtex

@inproceedings{fiotto-kaufman2025nnsight,
  title={NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals},
  author={Fiotto-Kaufman, Jaden and Loftus, Alexander R. and Todd, Eric and Brinkmann, Jannik and Pal, Koyena and Troitskii, Dmitrii and Ripa, Michael and Belfki, Adam and Rager, Can and Juang, Caden and Mueller, Aaron and Marks, Samuel and Sharma, Arnab Sen and Lucchetti, Francesca and Prakash, Nikhil and Brodley, Carla and Guha, Arjun and Bell, Jonathan and Wallace, Byron C. and Bau, David},
  booktitle={International Conference on Learning Representations},
  year={2025}
}