Metadata-Version: 2.4
Name: nki
Version: 0.4.0+25940409122.gd30719f9
Summary: NKI (Neuron Kernel Interface) - MLIR-based compiler for Neuron devices
Keywords: compiler,mlir,neuron,machine-learning
Author-Email: AWS Neuron Team <neuron-support@amazon.com>
License-Expression: Apache-2.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Compilers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Project-URL: Homepage, https://github.com/aws-neuron/private-nki-staging
Project-URL: Repository, https://github.com/aws-neuron/private-nki-staging
Project-URL: Issues, https://github.com/aws-neuron/private-nki-staging/issues
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24.3
Requires-Dist: ml-dtypes>=0.2
Provides-Extra: kernel-builder
Requires-Dist: einops==0.8.1; extra == "kernel-builder"
Provides-Extra: dev
Requires-Dist: ruff; extra == "dev"
Provides-Extra: test
Requires-Dist: hypothesis>=6.0; extra == "test"
Requires-Dist: coverage[toml]>=7.0; extra == "test"
Requires-Dist: pytest-cov>=4.0; extra == "test"
Requires-Dist: pandas>=2.0.0; extra == "test"
Requires-Dist: lets-plot>=4.0.0; extra == "test"
Requires-Dist: matplotlib>=3.7.0; extra == "test"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.6; extra == "docs"
Requires-Dist: mkdocs-material>=9.5; extra == "docs"
Requires-Dist: mkdocs-minify-plugin>=0.8; extra == "docs"
Requires-Dist: pymdown-extensions>=10.0; extra == "docs"
Provides-Extra: sphinx-docs
Requires-Dist: sphinx>=7.0; extra == "sphinx-docs"
Requires-Dist: sphinx-design; extra == "sphinx-docs"
Requires-Dist: sphinx-copybutton; extra == "sphinx-docs"
Description-Content-Type: text/markdown

# Neuron Kernel Interface (NKI)

[![CI (Build → Push → Test)](https://github.com/aws-neuron/private-nki-staging/actions/workflows/ci.yaml/badge.svg)](https://github.com/aws-neuron/private-nki-staging/actions/workflows/ci.yaml)

**Neuron Kernel Interface (NKI)** is a bare-metal programming language and compiler for directly programming NeuronDevices available on AWS Trainium/Inferentia instances. NKI lets ML developers develop, optimize, and run new operators directly on NeuronCores while making full use of available compute and memory resources. It supports NeuronCore v2 (Trainium1) and later generations.

NKI provides developers with direct access to the NeuronCore instruction set architecture through a Python-based programming environment with tile-level semantics similar to Triton and NumPy. Developers can get started quickly and optimize performance in a familiar environment while retaining full control of the underlying hardware.

## Project Overview

This repository contains components of the NKI compiler infrastructure. NKI kernels bypass the traditional high-level ML framework compilation stages and are compiled into intermediate representations that the compiler back-end can consume directly, giving developers fine-grained control over NeuronDevices down to the instruction level.

## Project Commands

### Run format/lint checks

Project CI enforces clang-format and clang-tidy checks.

```bash
# Run clang-format and clang-tidy on all C++ code.
./.github/scripts/check.sh all
# Run clang-format and clang-tidy only on files changed since a given revision (HEAD by default).
./scripts/check_changed.sh
```

You can also run these steps as a pre-commit hook:

```bash
# Quote the substitution so the link still works if the checkout path contains spaces.
ln -s "$(realpath ./scripts/pre-commit)" .git/hooks/pre-commit
```


## Current Implementation: NISA Dialect

The **NISA (NKI ISA) dialect** is an MLIR dialect that represents the instruction set architecture of AWS Trainium/Inferentia NeuronCores. It serves as a core compiler intermediate representation in the NKI compilation pipeline, bridging NKI kernel code and the NeuronCore hardware through hardware-specific optimizations like memory allocation and instruction scheduling.

### Key Features

- **Custom MLIR Dialect**: Defines operations specific to the Trainium architecture
- **J2Gen Dialect Scaffolding**: Uses Jinja2-based templating to generate TableGen files and custom ASM parsers/printers. Roughly 80 complex instructions with bundled operands (memloc, indices, affinemap, etc.) are defined concisely in Python instead of in verbose hand-written TableGen, and the generated printers produce readable textual representations for testing
- **Memory Abstractions**: Represents memory buffers with specific shapes, element types, and memory spaces
- **Multiple Execution Engines**: Supports different execution units (DMA, Tensor, Vector, Scalar, GpSimd, Sync)
- **Rich Operation Set**: Includes operations for data movement, activation functions, tensor operations, and more

## Technical Architecture

### Core Components

1. **MLIR Dialect**: The project defines a custom MLIR dialect called "nisa" that encapsulates Trainium's instruction set.

2. **J2Gen Dialect Scaffolding**: Uses a custom Jinja2-based templating system (j2gen) that transforms concise Python-based operation definitions into verbose TableGen files and custom assembly parsers/printers for MLIR.

3. **nisa-opt Tool**: A custom MLIR optimization tool that registers the NISA dialect and processes MLIR files containing NISA operations.
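
The template-driven approach in item 2 can be illustrated with a minimal sketch. It is not the project's generator: it uses the standard library's `string.Template` as a stand-in for Jinja2 so it runs without extra dependencies, and the description schema plus the `NISA_Op` and `NISA_MemLoc` names are hypothetical placeholders rather than the contents of `build-tools/j2gen/nisa.py`.

```python
from string import Template

# Hypothetical, simplified op description. The real schema used by j2gen
# is not shown in this README and may look quite different.
RECIPROCAL = {
    "class_name": "Reciprocal",
    "mnemonic": "reciprocal",
    "summary": "Compute reciprocal of elements",
    "operands": ["dst", "src"],
}

# Stand-in for a Jinja2 template: emits a TableGen-style op definition.
# NISA_Op and NISA_MemLoc are assumed names used only for illustration.
OP_TEMPLATE = Template(
    'def NISA_${class_name}Op : NISA_Op<"${mnemonic}"> {\n'
    '  let summary = "${summary}";\n'
    "  let arguments = (ins ${arguments});\n"
    "}\n"
)


def render_op(op: dict) -> str:
    """Expand one concise op description into verbose TableGen-style text."""
    arguments = ", ".join(f"NISA_MemLoc:${name}" for name in op["operands"])
    return OP_TEMPLATE.substitute(
        class_name=op["class_name"],
        mnemonic=op["mnemonic"],
        summary=op["summary"],
        arguments=arguments,
    )


if __name__ == "__main__":
    print(render_op(RECIPROCAL))
```

The point of the design is that the short dictionary is the only part a developer edits by hand; the repetitive definition text is produced mechanically from it.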

### Key Abstractions

1. **Memory Locations (MemLoc)**: Represents memory buffers with specific shapes, element types, and memory spaces:
   - HBM: High-Bandwidth Memory
   - SBUF: State Buffer (on-chip scratchpad memory)
   - PSUM: Partial Sum Buffer

2. **Engines**: Different execution units within the Trainium architecture:
   - DMA: Data movement engine
   - Tensor: Matrix multiplication engine
   - Vector: Vector processing engine
   - Scalar: Scalar processing engine
   - GpSimd: General-purpose SIMD engine
   - Sync: Synchronization engine

3. **Operations**: Various operations defined for the different engines, including:
   - `DmaCopy`: Copy data between memory locations
   - `Activation`: Apply activation functions (relu, sigmoid, tanh, etc.)
   - `Memset`: Initialize memory with constant values
   - `TensorTensorPower`: Element-wise power operation
   - `TensorTensorBitvec`: Element-wise bitwise operations
   - `TensorReduceBitvec`: Reduction operations with bitwise operations
   - `Reciprocal`: Compute reciprocal of elements
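
As a rough mental model, the abstractions above can be sketched as plain Python data types. This is illustrative only and is not the dialect's C++ or TableGen definition. The lowercase spellings `hbm` and `dma` match the usage example in this README; the other lowercase spellings and all class and method names are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class MemorySpace(Enum):
    HBM = "hbm"    # spelling taken from the usage example
    SBUF = "sbuf"  # assumed spelling
    PSUM = "psum"  # assumed spelling


class Engine(Enum):
    DMA = "dma"        # spelling taken from the usage example
    TENSOR = "tensor"  # the remaining spellings are assumed
    VECTOR = "vector"
    SCALAR = "scalar"
    GPSIMD = "gpsimd"
    SYNC = "sync"


@dataclass(frozen=True)
class MemLoc:
    """A memory buffer described by shape, element type, and memory space."""

    shape: tuple[int, ...]
    dtype: str
    space: MemorySpace

    def mlir_type(self) -> str:
        """Render the type in the textual form used by the usage example."""
        dims = ", ".join(str(d) for d in self.shape)
        return f"!nisa.memloc<[{dims}], {self.dtype}, {self.space.value}>"


if __name__ == "__main__":
    print(MemLoc((2, 2), "f32", MemorySpace.HBM).mlir_type())
```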

## Usage Example

```mlir
nisa.dma_copy[2, 2](
  dst=!nisa.memloc<[2, 2], f32, hbm> %arg0[%idx0 + d0, %idx0 + d1],
  src=!nisa.memloc<[2, 2], f32, hbm> %arg1[%idx0 + d0, %idx0 + d1]
) engine=dma
```

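This represents a DMA copy between two 2×2 `f32` memory locations in high-bandwidth memory (HBM), executed on the DMA engine. The bracketed index expressions (`%idx0 + d0`, `%idx0 + d1`) give the access pattern for each operand.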
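
One way to read the access pattern is as a loop nest over the tile. The sketch below is a plain-Python interpretation, not the dialect's formal semantics: it assumes that `[2, 2]` gives the iteration extents, that `d0` and `d1` are the iteration indices, and that `%idx0` is a runtime base offset. The function name `dma_copy_2d` is hypothetical.

```python
def dma_copy_2d(dst, src, idx0, tile=(2, 2)):
    """Copy a tile element by element.

    For every (d0, d1) in the tile, the element at [idx0 + d0][idx0 + d1]
    of src is written to the same position in dst, mirroring the index
    expressions `%idx0 + d0` and `%idx0 + d1` in the MLIR example.
    """
    for d0 in range(tile[0]):
        for d1 in range(tile[1]):
            dst[idx0 + d0][idx0 + d1] = src[idx0 + d0][idx0 + d1]
    return dst


if __name__ == "__main__":
    src = [[1.0, 2.0], [3.0, 4.0]]
    dst = [[0.0, 0.0], [0.0, 0.0]]
    dma_copy_2d(dst, src, idx0=0)
    print(dst)
```

With `idx0=0` and 2×2 buffers this copies the whole buffer; a nonzero `idx0` would shift the copied window, which only stays in bounds on larger buffers.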

## Building from Source

For developers who want to build NKI from source, see **[docs/BUILDING.md](docs/BUILDING.md)** for complete build instructions.

Quick links:
- **[docs/BUILDING.md](docs/BUILDING.md)** - Complete build guide (Docker and local builds)
- **[docs/ci/ToolchainDockeCiDeveloperGuide.md](docs/ci/ToolchainDockeCiDeveloperGuide.md)** - CI/Docker details

## Development

### Adding New Instructions

When extending the NISA dialect:

1. Define instructions in `build-tools/j2gen/nisa.py` with comprehensive documentation
2. Add test cases in `test/nisa/nisa-op-roundtrip.mlir`
3. Update `docs/nisa/NKINISAMappingTable.md`
4. Verify with `ninja -C build check-nki`

### Documentation

- **[Missing Instructions](docs/nisa/NKINISAMappingTable.md)**: Implementation status tracking
- **[Memory Location Access](docs/MemoryLocationAccess.md)**: Multi-dimensional memory access
- **[Buffer Allocation](docs/nisa/BufferAllocation.md)**: Buffer allocation algorithm
- **[Scheduling](docs/Scheduling.md)**: Instruction scheduling optimization

### Future NKI Components

This repository will expand to include additional NKI compiler components beyond the current NISA dialect implementation, building towards a complete NKI programming environment that provides developers with direct access to NeuronCore capabilities.

## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This project is licensed under the Apache-2.0 License.
