# MorphStream Models

Models and TensorRT engine cache for real-time face processing, used by the MorphStream GPU Worker.

Private repository; an access token is required for downloads.
## Structure

```
.
├── models/                       # ONNX models (active)
│   ├── buffalo_l/
│   │   ├── det_10g.onnx          # SCRFD face detection (16 MB)
│   │   └── w600k_r50.onnx        # ArcFace recognition (166 MB)
│   ├── fan_68_5.onnx             # 5→68 landmark refinement (1 MB)
│   ├── 2dfan4.onnx               # 2DFAN4 68-point landmarks (93 MB)
│   ├── inswapper_128.onnx        # InSwapper FP32 (529 MB)
│   ├── inswapper_128_fp16.onnx   # InSwapper FP16, default (265 MB)
│   ├── hyperswap_1a_256.onnx     # HyperSwap variant A (384 MB)
│   ├── hyperswap_1b_256.onnx     # HyperSwap variant B (384 MB)
│   ├── hyperswap_1c_256.onnx     # HyperSwap variant C (384 MB)
│   ├── xseg_1.onnx               # XSeg occlusion mask 1 (67 MB)
│   ├── xseg_2.onnx               # XSeg occlusion mask 2 (67 MB)
│   ├── xseg_3.onnx               # XSeg occlusion mask 3 (67 MB)
│   ├── bisenet_resnet_34.onnx    # BiSeNet face parsing (89 MB)
│   ├── bisenet_resnet_18.onnx    # BiSeNet face parsing (51 MB)
│   └── yolov8n.onnx              # Person detection (12 MB)
├── deploy/                       # Hot-deploy code archives
│   ├── develop/app_code.tar.zst  # develop branch
│   └── latest/app_code.tar.zst   # production (main)
├── archives/                     # Baked archives for Docker image
│   ├── models-core-masks.tar.zst # Core + mask + yolov8n models (~584 MB)
│   └── trt-cache-sm89.tar.zst    # TRT engines for sm89 (~2.7 GB)
├── trt_cache/sm89/               # TRT engine cache (per GPU arch)
│   └── trt10.14_ort1.24/         # ORT 1.24 + TRT 10.14
│       ├── manifest.json
│       ├── *.engine              # Compiled TRT engines
│       ├── *.profile             # TRT optimization profiles
│       └── *.timing              # Kernel autotuning cache
└── gfpgan/                       # Face restoration (not used in real-time)
```
## Models

### Face Swap

| Model | Size | Input | TRT FP16 | Notes |
|---|---|---|---|---|
| `inswapper_128_fp16.onnx` | 265 MB | 128px | No (FP32 TRT) | Default preset |
| `inswapper_128.onnx` | 529 MB | 128px | No (FP32 TRT) | Standard quality |
| `hyperswap_1a_256.onnx` | 384 MB | 256px | No (FP32 TRT) | High quality A |
| `hyperswap_1b_256.onnx` | 384 MB | 256px | No (FP32 TRT) | High quality B |
| `hyperswap_1c_256.onnx` | 384 MB | 256px | No (FP32 TRT) | High quality C |
Swap models are compiled with `trt_fp16_enable=False`; FP16 causes pixel artifacts.
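As a minimal sketch, this is how a worker might build the ONNX Runtime providers list so that swap models get FP32 TensorRT engines with engine and timing caching enabled. The helper name is illustrative (not part of this repository); the provider option keys are standard ONNX Runtime TensorRT execution-provider options.

```python
# Sketch (assumption): ORT TensorRT provider configuration for swap models.
# trt_fp16_enable=False matches the note above: FP16 swap engines produce
# pixel artifacts, so swap models are kept at FP32.
def trt_provider_options(cache_path: str, fp16: bool) -> list:
    """Build an ONNX Runtime providers list with a TRT engine cache."""
    trt_options = {
        "trt_engine_cache_enable": True,      # reuse compiled .engine files
        "trt_engine_cache_path": cache_path,  # e.g. trt_cache/sm89/trt10.14_ort1.24
        "trt_timing_cache_enable": True,      # share kernel autotuning (.timing)
        "trt_fp16_enable": fp16,              # False for all swap models
    }
    return [
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",   # fallback if TRT cannot build an engine
        "CPUExecutionProvider",
    ]

providers = trt_provider_options("trt_cache/sm89/trt10.14_ort1.24", fp16=False)
```

A session created with `onnxruntime.InferenceSession(model_path, providers=providers)` would then prefer TensorRT and fall back to CUDA or CPU per node.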
### Face Detection & Recognition (core)

| Model | GPU Worker Class | Size | Input | TRT FP16 |
|---|---|---|---|---|
| `buffalo_l/det_10g.onnx` | `DirectSCRFD` | 16 MB | 320px | Yes |
| `buffalo_l/w600k_r50.onnx` | `DirectArcFace` | 166 MB | 112px | Yes |
| `fan_68_5.onnx` | `DirectFan685` | 1 MB | (1,5,2) coords | Yes |
| `2dfan4.onnx` | `Landmark68Detector` | 93 MB | 256px | Yes |
### Face Masks

| Model | Type | Size | Input | TRT FP16 |
|---|---|---|---|---|
| `xseg_1/2/3.onnx` | Occlusion | 67 MB each | 256px NHWC | No (FP32) |
| `bisenet_resnet_34.onnx` | Region parsing | 89 MB | 512px NCHW | No (FP32) |
| `bisenet_resnet_18.onnx` | Region parsing | 51 MB | 512px NCHW | No (FP32) |
### Person Detection

| Model | Size | Input | TRT FP16 |
|---|---|---|---|
| `yolov8n.onnx` | 12 MB | 640px | Yes |
## Docker Baking

Models are split into two groups:

- **Baked into the Docker image**: core + masks + yolov8n (10 models, ~630 MB) via `archives/models-core-masks.tar.zst`
- **Per-stream download**: swap models (5 models), downloaded on demand by `ModelDownloadService`

```bash
# Rebuild the models archive
bash scripts/pack_models.sh --upload
```
## TensorRT Engine Cache

Pre-compiled TRT engines eliminate cold-start compilation: ~180-300 s of on-device compilation becomes a ~10-30 s download.

### Cache Key

Format: `{gpu_arch}/trt{trt_version}_ort{ort_version}`

Example: `sm89/trt10.14_ort1.24` (RTX 4090, ORT 1.24, TRT 10.14)
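The cache key above can be derived mechanically from the GPU's compute capability and the runtime versions. A minimal sketch (the function name is illustrative; the worker's actual implementation is not part of this repository):

```python
# Sketch (assumption): build the TRT cache key from the environment.
def trt_cache_key(cc_major: int, cc_minor: int,
                  trt_version: str, ort_version: str) -> str:
    """Return '{gpu_arch}/trt{trt_version}_ort{ort_version}'."""
    gpu_arch = f"sm{cc_major}{cc_minor}"  # e.g. compute capability 8.9 -> sm89
    return f"{gpu_arch}/trt{trt_version}_ort{ort_version}"

# RTX 4090 (compute capability 8.9), TRT 10.14, ORT 1.24:
key = trt_cache_key(8, 9, "10.14", "1.24")  # -> "sm89/trt10.14_ort1.24"
```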
### manifest.json (format v2)

```json
{
  "cache_key": "sm89/trt10.14_ort1.24",
  "format_version": 2,
  "gpu_arch": "sm89",
  "trt_version": "10.14",
  "ort_version": "1.24",
  "engine_files": {
    "TensorrtExecutionProvider_TRTKernel_*.engine": {
      "group": "core",
      "onnx_model": "det_10g"
    }
  }
}
```
Engine groups: `core`, `masks`, `inswapper_128`, `inswapper_128_fp16`, `hyperswap_1a/1b/1c_256`, `yolov8n`, and `shared` (`.timing` files).
### Lifecycle

- **Download**: at boot, the GPU Worker downloads engines matching the cache key from HF
- **Compile**: if no cache exists, ORT compiles TRT engines from ONNX on first load
- **Upload**: after compilation, engines are uploaded to HF with a manifest merge (preserves other groups)
- **Selective recompile**: the admin UI selects model groups to recompile; the manifest merges new engines with the existing HF entries
- **Cleanup**: manifest-driven; stale engines (not listed in the manifest) are auto-deleted from HF during upload
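The merge-and-cleanup steps above can be sketched as follows. These function names and the dict shapes are illustrative, modeled on the manifest format shown earlier; the worker's real upload code is not in this repository.

```python
# Sketch (assumption): merge a recompiled group into an existing manifest,
# preserving all other groups, then compute which remote engines are stale.
def merge_manifest(existing: dict, new_engines: dict, group: str) -> dict:
    """Replace one group's engine entries; keep every other group intact."""
    files = {
        name: meta
        for name, meta in existing.get("engine_files", {}).items()
        if meta.get("group") != group  # drop only the group being replaced
    }
    files.update(new_engines)          # add the freshly compiled engines
    merged = dict(existing)
    merged["engine_files"] = files
    return merged

def stale_files(remote_files: set, manifest: dict) -> set:
    """Engines present on HF but absent from the manifest get deleted."""
    return remote_files - set(manifest["engine_files"])
```

Because the merge only touches entries of the recompiled group, a selective recompile of, say, `core` never deletes `masks` or swap-model engines from HF.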
### Rebuild TRT Archive

```bash
# From a local HF repo clone
bash scripts/pack_trt_cache.sh                        # auto-detect latest version
bash scripts/pack_trt_cache.sh sm89 trt10.14_ort1.24  # explicit version
bash scripts/pack_trt_cache.sh --upload               # pack + upload to HF
```
## Hot Deploy

Code updates without a Docker rebuild:

```bash
bash scripts/deploy_code.sh                       # deploy to develop
DEPLOY_TAGS="latest" bash scripts/deploy_code.sh  # deploy to production
```

Archives are uploaded to `deploy/{tag}/app_code.tar.zst`. The GPU Worker downloads them at boot via `entrypoint.sh`.
## License

MIT License