# MorphStream Models

Models and TensorRT engine cache for real-time face processing, used by the MorphStream GPU Worker.

Private repository; an access token is required for downloads.
## Structure

```
.
├── models/                       # ONNX models (active)
│   ├── buffalo_l/
│   │   ├── det_10g.onnx          # SCRFD face detection (16 MB)
│   │   └── w600k_r50.onnx        # ArcFace recognition (166 MB)
│   ├── fan_68_5.onnx             # 5→68 landmark refinement (1 MB)
│   ├── 2dfan4.onnx               # 2DFAN4 68-point landmarks (93 MB)
│   ├── inswapper_128.onnx        # InSwapper FP32 (529 MB)
│   ├── inswapper_128_fp16.onnx   # InSwapper FP16, default (265 MB)
│   ├── hyperswap_1a_256.onnx     # HyperSwap variant A (384 MB)
│   ├── hyperswap_1b_256.onnx     # HyperSwap variant B (384 MB)
│   ├── hyperswap_1c_256.onnx     # HyperSwap variant C (384 MB)
│   ├── xseg_1.onnx               # XSeg occlusion mask 1 (67 MB)
│   ├── xseg_2.onnx               # XSeg occlusion mask 2 (67 MB)
│   ├── xseg_3.onnx               # XSeg occlusion mask 3 (67 MB)
│   ├── bisenet_resnet_34.onnx    # BiSeNet face parsing (89 MB)
│   ├── bisenet_resnet_18.onnx    # BiSeNet face parsing (51 MB)
│   └── yolov8n.onnx              # Person detection (12 MB)
├── deploy/                       # Hot-deploy code archives
│   ├── develop/app_code.tar.zst  # develop branch
│   └── latest/app_code.tar.zst   # production (main)
├── archives/                     # Baked archives for Docker image
│   ├── models-core-masks.tar.zst # Core + mask + yolov8n models (~584 MB)
│   └── trt-cache-sm89.tar.zst    # TRT engines for sm89 (~2.7 GB)
├── trt_cache/sm89/               # TRT engine cache (per GPU arch)
│   └── trt10.14_ort1.24/         # ORT 1.24 + TRT 10.14
│       ├── manifest.json
│       ├── *.engine              # Compiled TRT engines
│       ├── *.profile             # TRT optimization profiles
│       └── *.timing              # Kernel autotuning cache
└── gfpgan/                       # Face restoration (not used in real-time)
```
## Models

### Face Swap

| Model | Size | Input | TRT FP16 | Notes |
|---|---|---|---|---|
| `inswapper_128_fp16.onnx` | 265 MB | 128px | No (FP32 TRT) | Default preset |
| `inswapper_128.onnx` | 529 MB | 128px | No (FP32 TRT) | Standard quality |
| `hyperswap_1a_256.onnx` | 384 MB | 256px | No (FP32 TRT) | High quality A |
| `hyperswap_1b_256.onnx` | 384 MB | 256px | No (FP32 TRT) | High quality B |
| `hyperswap_1c_256.onnx` | 384 MB | 256px | No (FP32 TRT) | High quality C |
Swap models are compiled with `trt_fp16_enable=False`; FP16 causes pixel artifacts.
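As a minimal sketch, this is how a worker might build the ONNX Runtime providers list so that swap models get FP32 TensorRT engines with engine and timing caching enabled. The helper name is illustrative (not part of this repository); the provider option keys are standard ONNX Runtime TensorRT execution-provider options.

```python
# Sketch (assumption): ORT TensorRT provider configuration for swap models.
# trt_fp16_enable=False matches the note above: FP16 swap engines produce
# pixel artifacts, so swap models are kept at FP32.
def trt_provider_options(cache_path: str, fp16: bool) -> list:
    """Build an ONNX Runtime providers list with a TRT engine cache."""
    trt_options = {
        "trt_engine_cache_enable": True,      # reuse compiled .engine files
        "trt_engine_cache_path": cache_path,  # e.g. trt_cache/sm89/trt10.14_ort1.24
        "trt_timing_cache_enable": True,      # share kernel autotuning (.timing)
        "trt_fp16_enable": fp16,              # False for all swap models
    }
    return [
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",   # fallback if TRT cannot build an engine
        "CPUExecutionProvider",
    ]

providers = trt_provider_options("trt_cache/sm89/trt10.14_ort1.24", fp16=False)
```

A session created with `onnxruntime.InferenceSession(model_path, providers=providers)` would then prefer TensorRT and fall back to CUDA or CPU per node.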
### Face Detection & Recognition (core)

| Model | GPU Worker Class | Size | Input | TRT FP16 |
|---|---|---|---|---|
| `buffalo_l/det_10g.onnx` | `DirectSCRFD` | 16 MB | 320px | Yes |
| `buffalo_l/w600k_r50.onnx` | `DirectArcFace` | 166 MB | 112px | Yes |
| `fan_68_5.onnx` | `DirectFan685` | 1 MB | (1,5,2) coords | Yes |
| `2dfan4.onnx` | `Landmark68Detector` | 93 MB | 256px | Yes |
### Face Masks

| Model | Type | Size | Input | TRT FP16 |
|---|---|---|---|---|
| `xseg_1/2/3.onnx` | Occlusion | 67 MB each | 256px NHWC | No (FP32) |
| `bisenet_resnet_34.onnx` | Region parsing | 89 MB | 512px NCHW | No (FP32) |
| `bisenet_resnet_18.onnx` | Region parsing | 51 MB | 512px NCHW | No (FP32) |
### Person Detection

| Model | Size | Input | TRT FP16 |
|---|---|---|---|
| `yolov8n.onnx` | 12 MB | 640px | Yes |
## Docker Baking

Models are split into two groups:

- **Baked into the Docker image**: core + masks + yolov8n (10 models, ~630 MB) via `archives/models-core-masks.tar.zst`
- **Per-stream download**: swap models (5 models), downloaded on demand by `ModelDownloadService`

```bash
# Rebuild the models archive
bash scripts/pack_models.sh --upload
```
## TensorRT Engine Cache

Pre-compiled TRT engines eliminate cold-start compilation: ~180-300 s of on-device compilation becomes a ~10-30 s download.

### Cache Key

Format: `{gpu_arch}/trt{trt_version}_ort{ort_version}`

Example: `sm89/trt10.14_ort1.24` (RTX 4090, ORT 1.24, TRT 10.14)
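The cache key above can be derived mechanically from the GPU's compute capability and the runtime versions. A minimal sketch (the function name is illustrative; the worker's actual implementation is not part of this repository):

```python
# Sketch (assumption): build the TRT cache key from the environment.
def trt_cache_key(cc_major: int, cc_minor: int,
                  trt_version: str, ort_version: str) -> str:
    """Return '{gpu_arch}/trt{trt_version}_ort{ort_version}'."""
    gpu_arch = f"sm{cc_major}{cc_minor}"  # e.g. compute capability 8.9 -> sm89
    return f"{gpu_arch}/trt{trt_version}_ort{ort_version}"

# RTX 4090 (compute capability 8.9), TRT 10.14, ORT 1.24:
key = trt_cache_key(8, 9, "10.14", "1.24")  # -> "sm89/trt10.14_ort1.24"
```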
### manifest.json (format v2)

```json
{
  "cache_key": "sm89/trt10.14_ort1.24",
  "format_version": 2,
  "gpu_arch": "sm89",
  "trt_version": "10.14",
  "ort_version": "1.24",
  "engine_files": {
    "TensorrtExecutionProvider_TRTKernel_*.engine": {
      "group": "core",
      "onnx_model": "det_10g"
    }
  }
}
```
Engine groups: `core`, `masks`, `inswapper_128`, `inswapper_128_fp16`, `hyperswap_1a/1b/1c_256`, `yolov8n`, and `shared` (`.timing` files).
### Lifecycle

- **Download**: at boot, the GPU Worker downloads engines matching the cache key from HF
- **Compile**: if no cache exists, ORT compiles TRT engines from ONNX on first load
- **Upload**: after compilation, engines are uploaded to HF with a manifest merge (preserves other groups)
- **Selective recompile**: the admin UI selects model groups to recompile; the manifest merges new engines with the existing HF entries
- **Cleanup**: manifest-driven; stale engines (not listed in the manifest) are auto-deleted from HF during upload
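The merge-and-cleanup steps above can be sketched as follows. These function names and the dict shapes are illustrative, modeled on the manifest format shown earlier; the worker's real upload code is not in this repository.

```python
# Sketch (assumption): merge a recompiled group into an existing manifest,
# preserving all other groups, then compute which remote engines are stale.
def merge_manifest(existing: dict, new_engines: dict, group: str) -> dict:
    """Replace one group's engine entries; keep every other group intact."""
    files = {
        name: meta
        for name, meta in existing.get("engine_files", {}).items()
        if meta.get("group") != group  # drop only the group being replaced
    }
    files.update(new_engines)          # add the freshly compiled engines
    merged = dict(existing)
    merged["engine_files"] = files
    return merged

def stale_files(remote_files: set, manifest: dict) -> set:
    """Engines present on HF but absent from the manifest get deleted."""
    return remote_files - set(manifest["engine_files"])
```

Because the merge only touches entries of the recompiled group, a selective recompile of, say, `core` never deletes `masks` or swap-model engines from HF.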
### Rebuild TRT Archive

```bash
# From a local HF repo clone
bash scripts/pack_trt_cache.sh                        # auto-detect latest version
bash scripts/pack_trt_cache.sh sm89 trt10.14_ort1.24  # explicit version
bash scripts/pack_trt_cache.sh --upload               # pack + upload to HF
```
## Hot Deploy

Code updates without a Docker rebuild:

```bash
bash scripts/deploy_code.sh                       # deploy to develop
DEPLOY_TAGS="latest" bash scripts/deploy_code.sh  # deploy to production
```

Archives are uploaded to `deploy/{tag}/app_code.tar.zst`. The GPU Worker downloads them at boot via `entrypoint.sh`.
## License

MIT License