Kernels documentation

Building and testing kernels with GitHub Actions

You are viewing main version, which requires installation from source. If you'd like regular pip install, checkout the latest stable version (v0.15.2).
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Building and testing kernels with GitHub Actions

Compiling a kernel is CPU-intensive and testing it requires an accelerator (such as a GPU), two things that GitHub’s standard runners do not provide cheaply. Instead of maintaining self-hosted runners, you can offload both steps to Hugging Face Jobs directly from a GitHub Actions workflow.

Two prebuilt actions make this possible:

  • huggingface/kernel-builder-job runs the Nix kernel builder on a CPU flavor and publishes the built kernel to the Hub.
  • huggingface/hf-jobs-action runs an arbitrary script on any flavor (including GPUs), which is convenient for testing the kernel you just built. This also helps to test the kernel across different hardware.

A typical setup has two workflows: one that builds the kernel on push, and one that tests it on a GPU. They communicate through the Hub, the build uploads artifacts, the test pulls them back down.

Prerequisites

  • An HF access token with the job.write permission. If the test job loads gated models, the token also needs read access to them.
  • The token stored as a repository secret named HF_TOKEN (Settings → Secrets and variables → Actions).
  • A kernel repository on the Hub to upload to, with kernel-creation access for the owning user or org (see Building kernels).

Jobs run under the namespace you specify (your username or an org) and count against that namespace’s compute quota.

Building on push

The build action checks out nothing by itself, your script clones the exact commit and invokes the Nix builder. Compilation happens on the HF Jobs CPU flavor, and build-and-upload pushes the finished variants to the Hub.

# .github/workflows/build-kernel.yml
name: Build Kernel

on:
  push:
    branches: [main]
    paths:
      - "csrc/**"
      - "torch-ext/**"
      - build.toml
      - flake.nix
      - flake.lock
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build kernel via HF Jobs
        uses: huggingface/kernel-builder-job@main
        with:
          token: ${{ secrets.HF_TOKEN }}
          namespace: your-username
          flavor: cpu-xl
          timeout: "21600"
          script: |
            # The container starts with `set -x`; disable tracing so the
            # token below is not echoed into the streamed logs.
            set +x
            export HF_TOKEN="${{ secrets.HF_TOKEN }}"
            # Rebuild artifacts, so skip pulling existing LFS blobs.
            export GIT_LFS_SKIP_SMUDGE=1

            git clone "${{ github.server_url }}/${{ github.repository }}" kernel
            cd kernel
            git checkout "${{ github.sha }}"

            nix run github:huggingface/kernels#kernel-builder -- build-and-upload \
              --max-jobs 4 \
              --cores 8 \
              --repo-id your-username/your-kernel

The path filter keeps the build from running on unrelated commits, and workflow_dispatch lets you trigger it by hand from the Actions tab. The upload destination is taken from --repo-id (or, if omitted, from the repo-id/version fields in build.toml).

Builds can take a long time on the first run because every PyTorch and CUDA variant is compiled. Set a generous timeout (the example allows six hours) and rely on the Hugging Face binary cache to keep subsequent builds fast.

You can speed up builds by tuning how much work runs in parallel. --max-jobs sets how many kernel variants are built concurrently, while --cores sets how many CPU cores each of those jobs may use. Pick values that fit the chosen CPU flavor: a larger flavor (such as cpu-xl) has more cores to spread across jobs, so raising --max-jobs and --cores together shortens the total build time. Setting them too high for the flavor only adds scheduling overhead.

kernel-builder-job inputs

Input Required Default Description
token yes HF token with job.write permission.
namespace yes HF namespace (username or org) that owns the job.
script yes Shell script to run in the container.
flavor no cpu-upgrade Hardware flavor (e.g. cpu-xl).
image no Nix + cachix image Container image to run the build in.
timeout no 1200 Maximum seconds to wait for the job.

The action exposes job_id and job_url outputs that link to the run on huggingface.co.

Testing on a GPU

Once the kernel is on the Hub, the generic jobs action runs a test script on a GPU flavor. The files input copies repository files into the container (under /tmp/files by default), and a uv script with inline dependencies keeps the environment self-contained.

# .github/workflows/run-tests.yml
name: Run tests

on:
  push:
    branches: [main]
    paths:
      - scripts/test.py
  workflow_dispatch:

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run test.py on an HF Jobs GPU
        uses: huggingface/hf-jobs-action@main
        with:
          token: ${{ secrets.HF_TOKEN }}
          namespace: your-username
          flavor: rtx-pro-6000
          image: ghcr.io/astral-sh/uv:python3.10-bookworm
          timeout: "3600"
          files: scripts/test.py
          script: |
            set +x
            export HF_TOKEN="${{ secrets.HF_TOKEN }}"
            uv run /tmp/files/test.py

The test script pulls the kernel straight from the Hub with the kernels library, so it always runs against the artifacts the build workflow just published:

# scripts/test.py
# /// script
# dependencies = ["kernels", "torch"]
# ///
from kernels import get_kernel

kernel = get_kernel("your-username/your-kernel")
# ... exercise the kernel and assert on the results ...

Run the build workflow before the test workflow so the Hub has a fresh kernel to pull. For tightly coupled steps, you can also have one workflow trigger the other, or combine both jobs in a single workflow with a needs: dependency.

hf-jobs-action inputs

Input Required Default Description
token yes HF token with job.write permission.
namespace yes HF namespace (username or org) that owns the job.
image yes Container image to run.
script yes Shell script to execute in the container.
flavor no cpu-basic Hardware flavor (e.g. rtx-pro-6000).
files no Newline-separated repo files to copy into the job.
files_dest no /tmp/files Directory the files are copied to inside the container.
env no {} Environment variables as a JSON object.
timeout no 1200 Maximum seconds to wait for the job.

Choosing a flavor

Flavors map to the machine types available on Hugging Face Jobs, CPU flavors such as cpu-upgrade and cpu-xl for builds, and GPU flavors such as l4x1, a100-large, h200, or rtx-pro-6000 for tests. Pick the most reasonable GPU that fits your model to keep jobs low cost. The current list and pricing are in the Hugging Face Jobs documentation.

HF Jobs currently only offers a few CPU architectures, so the kernel is built for whatever architecture the available CPU flavors provide. This is a current limitation to keep in mind if you need to target a specific architecture.

HF Jobs containers start with shell tracing enabled (set -x). Always run set +x before exporting HF_TOKEN so the token does not leak into the streamed build logs.

Update on GitHub