alexwengg committed on
Commit
ef0a912
·
verified ·
1 Parent(s): f30fb77

Upload 25 files

Files changed (25)
  1. iteration_3/README.md +115 -0
  2. iteration_3/packages/bert_fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel +3 -0
  3. iteration_3/packages/bert_fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin +3 -0
  4. iteration_3/packages/bert_fp16.mlpackage/Manifest.json +18 -0
  5. iteration_3/packages/decoder_pre_fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel +3 -0
  6. iteration_3/packages/decoder_pre_fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin +3 -0
  7. iteration_3/packages/decoder_pre_fp16.mlpackage/Manifest.json +18 -0
  8. iteration_3/packages/decoder_upsample_fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel +3 -0
  9. iteration_3/packages/decoder_upsample_fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin +3 -0
  10. iteration_3/packages/decoder_upsample_fp16.mlpackage/Manifest.json +18 -0
  11. iteration_3/packages/duration_predictor_fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel +3 -0
  12. iteration_3/packages/duration_predictor_fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin +3 -0
  13. iteration_3/packages/duration_predictor_fp16.mlpackage/Manifest.json +18 -0
  14. iteration_3/packages/fused_diffusion_sampler_fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel +3 -0
  15. iteration_3/packages/fused_diffusion_sampler_fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin +3 -0
  16. iteration_3/packages/fused_diffusion_sampler_fp16.mlpackage/Manifest.json +18 -0
  17. iteration_3/packages/fused_f0n_har_source.mlpackage/Data/com.apple.CoreML/model.mlmodel +3 -0
  18. iteration_3/packages/fused_f0n_har_source.mlpackage/Data/com.apple.CoreML/weights/weight.bin +3 -0
  19. iteration_3/packages/fused_f0n_har_source.mlpackage/Manifest.json +18 -0
  20. iteration_3/packages/ref_encoder_fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel +3 -0
  21. iteration_3/packages/ref_encoder_fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin +3 -0
  22. iteration_3/packages/ref_encoder_fp16.mlpackage/Manifest.json +18 -0
  23. iteration_3/packages/text_encoder_fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel +3 -0
  24. iteration_3/packages/text_encoder_fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin +3 -0
  25. iteration_3/packages/text_encoder_fp16.mlpackage/Manifest.json +18 -0
iteration_3/README.md ADDED
@@ -0,0 +1,115 @@
1
+ # StyleTTS2 → CoreML iteration_3
2
+
3
+ Mixed-precision build on top of iteration_2: 7 stages flipped to fp16
4
+ weight precision, 1 stage kept at fp32 to avoid an audible-quality
5
+ regression. Disk footprint is nearly halved (514 MB to 274 MB) and the warm pipeline-stage sum drops 24–41 % on cool runs.
6
+
7
+ ## Pipeline (8 stages, 8 dispatches)
8
+
9
+ ```
10
+ text_encoder            → CPU_ONLY    fp16  11 MB
11
+ bert                    → ALL         fp16  12 MB
12
+ ref_encoder             → CPU_AND_GPU fp16  53 MB
13
+ fused_diffusion_sampler → ALL         fp16  47 MB  ← Trial 4
14
+ duration_predictor      → CPU_ONLY    fp16  15 MB
15
+ fused_f0n_har_source    → CPU_ONLY    fp32  32 MB  ← Trial 6 (kept fp32: cumsum drift)
16
+ decoder_pre             → CPU_AND_NE  fp16  64 MB
17
+ decoder_upsample        → CPU_ONLY    fp16  40 MB
18
+ ```
19
+
20
+ Total: **274 MB**, 8 mlpackages, 8 dispatches per utterance.
21
+
22
+ ## Performance
23
+
24
+ Warm pipeline-stage sum (sum of per-stage timings reported by
25
+ `coreml.inference`), 3-iter sweep with 8 s cooldown, M-series Mac:
26
+
27
+ | Build | min (ms) | avg (ms) | max (ms) |
28
+ |-----------------|------|------|-------|
29
+ | iteration_2 fp32| 782 | 898 | 1075 |
30
+ | iteration_3 | **460** | **683** | 1110 (thermal) |
31
+
32
+ Cool-run delta: **−322 ms (−41 %)** at min, **−215 ms (−24 %)** at avg.
33
+ The max values land close together because pipeline-wide (thermal) variance
34
+ dominates any config; the same pattern was observed in the Trial 8b benches.
35
+
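The cool-run deltas quoted above can be re-derived from the table. A minimal sketch, using only the min and avg figures reported there; the helper name `delta` is illustrative and not part of the repo:

```python
# Warm pipeline-stage sums in ms, copied from the Performance table.
ITERATION_2 = {"min": 782, "avg": 898}
ITERATION_3 = {"min": 460, "avg": 683}


def delta(before: float, after: float) -> tuple[float, float]:
    """Return (absolute change in ms, relative change in percent)."""
    diff = after - before
    return diff, 100.0 * diff / before


for key in ("min", "avg"):
    ms, pct = delta(ITERATION_2[key], ITERATION_3[key])
    print(f"{key}: {ms:+.0f} ms ({pct:+.0f} %)")
```

The max column is left out on purpose, since the README attributes it to thermal variance rather than to the precision change.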
36
+ Per-stage savings observed end-to-end:
37
+
38
+ | stage | fp32 ms | fp16 ms | Δ |
39
+ |-------------------------|---------|---------|----------|
40
+ | fused_diffusion_sampler | 18.3 | 14.7 | −3.6 ms |
41
+ | decoder_pre | 35 | 7 | −28 ms |
42
+ | decoder_upsample | 593–638 | 284–325 | **−309 ms** |
43
+
44
+ ## Mixed precision rationale
45
+
46
+ | Stage | fp16 verdict | Why |
47
+ |-------------------------|---------------------|-----------------------------------------|
48
+ | text_encoder | adopt | clean A/B |
49
+ | bert | adopt | clean A/B |
50
+ | ref_encoder | adopt | clean A/B |
51
+ | fused_diffusion_sampler | adopt | parity 4.66e-3, A/B clean |
52
+ | duration_predictor | adopt | clean A/B |
53
+ | fused_f0n_har_source | **drop** | har computes sin(2π·cumsum(f0)) over 88 200 samples; fp16 cumsum drifts ~10 bits, audible phase distortion in second half |
54
+ | decoder_pre | adopt | parity tight, A/B clean |
55
+ | decoder_upsample | adopt | A/B clean; previously feared "+240 ms" regression on `ALL` did not reproduce on `CPU_ONLY` placement (this is the 8b-winning placement) |
56
+
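To see why a long running sum is fragile in fp16, the toy sketch below accumulates a normalized phase step sample by sample, once in Python floats (fp64) and once rounding every partial sum to half precision via the stdlib `struct` format `"e"`. The 200 Hz pitch and 24 kHz sample rate are assumed values chosen for illustration, and the strictly sequential accumulation is an assumption too: the actual CoreML cumsum kernel may accumulate differently, so this shows the failure mode, not a measurement of this build.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (stdlib only)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def phase_cycles(f0_hz: float, sample_rate: int, n_samples: int,
                 half: bool = False) -> list[float]:
    """Running sum of f0 / sample_rate (phase in cycles), optionally kept in fp16."""
    step = f0_hz / sample_rate
    if half:
        step = to_fp16(step)
    acc, out = 0.0, []
    for _ in range(n_samples):
        acc += step
        if half:
            acc = to_fp16(acc)  # every partial sum is rounded to fp16
        out.append(acc)
    return out


# Assumed illustration values; only the 88 200-sample window comes from the README.
F0_HZ, SAMPLE_RATE, N = 200.0, 24_000, 88_200

full = phase_cycles(F0_HZ, SAMPLE_RATE, N)
half = phase_cycles(F0_HZ, SAMPLE_RATE, N, half=True)

for label, i in (("midpoint", N // 2 - 1), ("end", N - 1)):
    print(f"{label}: fp64={full[i]:.3f} cycles, fp16={half[i]:.3f} cycles, "
          f"error={abs(half[i] - full[i]):.3f}")
```

Once the accumulator is large enough that the per-sample step falls below half of the fp16 spacing at that magnitude, additions round back to the previous value and the phase error keeps growing with time, which is consistent with distortion that is worse later in the utterance.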
57
+ Drift evidence comes from per-stage CoreML parity vs eager fp32 plus
58
+ direct A/B listening of four configurations:
59
+
60
+ ```
61
+ sanity_fp16_mixed.wav (5 fp16 / 3 fp32) — clean
62
+ sanity_fp16_plus_decpre.wav (6 fp16 / 2 fp32) — clean
63
+ sanity_fp16_plus_decup.wav (7 fp16 / 1 fp32) — clean ← this build
64
+ sanity_fp16_plus_f0n.wav (8 fp16) — degraded second half
65
+ ```
66
+
67
+ ## Storage
68
+
69
+ | Artifact | iteration_2 | iteration_3 |
70
+ |--------------------------------------|-------------|-------------|
71
+ | Total | 514 MB | **274 MB** (−47 %) |
72
+ | largest stage | decoder_pre 128 MB | decoder_pre 64 MB |
73
+ | smallest stage | text_encoder 21 MB | text_encoder 11 MB |
74
+
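The storage totals can be cross-checked against the per-stage sizes in the Pipeline section. A small sketch, assuming the rounded MB figures listed there:

```python
# Per-stage package sizes in MB, copied from the Pipeline listing.
STAGE_SIZES_MB = {
    "text_encoder": 11,
    "bert": 12,
    "ref_encoder": 53,
    "fused_diffusion_sampler": 47,
    "duration_predictor": 15,
    "fused_f0n_har_source": 32,  # the one stage kept at fp32
    "decoder_pre": 64,
    "decoder_upsample": 40,
}

ITERATION_2_TOTAL_MB = 514  # from the Storage table

total_mb = sum(STAGE_SIZES_MB.values())
reduction_pct = 100.0 * (ITERATION_2_TOTAL_MB - total_mb) / ITERATION_2_TOTAL_MB
print(f"iteration_3 total: {total_mb} MB ({reduction_pct:.0f} % smaller)")
```

The reduction stays just under half because `fused_f0n_har_source` keeps its fp32 weights.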
75
+ ## Usage
76
+
77
+ Same wiring as iteration_2 — `_STAGE_PRECISION` in `coreml/inference.py`
78
+ selects fp16 / fp32 per stage. No code changes, only the manifest values
79
+ flip:
80
+
81
+ ```python
82
+ _STAGE_PRECISION: dict[str, str] = {
83
+     "text_encoder": "fp16",
84
+     "bert": "fp16",
85
+     "ref_encoder": "fp16",
86
+     "fused_diffusion_sampler": "fp16",
87
+     "diffusion_unet": "fp32",  # legacy fallback
88
+     "duration_predictor": "fp16",
89
+     "fused_f0n_har_source": "fp32",  # cumsum drift
90
+     "f0n_predictor": "fp32",  # legacy fallback
91
+     "har_source": "fp32",  # legacy fallback
92
+     "decoder_pre": "fp16",
93
+     "decoder_upsample": "fp16",
94
+ }
95
+ ```
96
+
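One plausible way such a table maps onto the files in this commit is sketched below. The helper `package_path` and the default root are hypothetical; the only grounded part is the naming visible in the file list, where fp16 packages carry an `_fp16` suffix and the fp32 package does not. How `coreml/inference.py` actually resolves paths is not shown in this commit.

```python
from pathlib import Path

# Subset of the table above: the eight stages shipped in iteration_3.
_STAGE_PRECISION: dict[str, str] = {
    "text_encoder": "fp16",
    "bert": "fp16",
    "ref_encoder": "fp16",
    "fused_diffusion_sampler": "fp16",
    "duration_predictor": "fp16",
    "fused_f0n_har_source": "fp32",
    "decoder_pre": "fp16",
    "decoder_upsample": "fp16",
}


def package_path(stage: str, root: Path = Path("iteration_3/packages")) -> Path:
    """Hypothetical helper: map a stage name to its .mlpackage directory."""
    suffix = "_fp16" if _STAGE_PRECISION[stage] == "fp16" else ""
    return root / f"{stage}{suffix}.mlpackage"


for stage in _STAGE_PRECISION:
    print(package_path(stage).as_posix())
```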
97
+ CLI overrides still work:
98
+
99
+ ```bash
100
+ # Re-run any stage at fp32 to A/B
101
+ python -m coreml.inference --fp32 decoder_upsample
102
+
103
+ # Drop back to iteration_2 wholesale
104
+ python -m coreml.inference --fp32
105
+ ```
106
+
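The two invocations above suggest a flag that takes zero or more stage names. The sketch below shows one way to get that behaviour with `argparse`; it is an assumption about the semantics, not the actual parser in `coreml/inference.py`, and `resolve_precision` is a hypothetical name.

```python
import argparse

DEFAULT_PRECISION: dict[str, str] = {
    "text_encoder": "fp16",
    "bert": "fp16",
    "ref_encoder": "fp16",
    "fused_diffusion_sampler": "fp16",
    "duration_predictor": "fp16",
    "fused_f0n_har_source": "fp32",
    "decoder_pre": "fp16",
    "decoder_upsample": "fp16",
}


def resolve_precision(argv: list[str]) -> dict[str, str]:
    """Apply --fp32 overrides on top of the default per-stage precision."""
    parser = argparse.ArgumentParser()
    # nargs="*": a bare --fp32 yields [], --fp32 a b yields ["a", "b"].
    parser.add_argument("--fp32", nargs="*", default=None, metavar="STAGE")
    args = parser.parse_args(argv)

    precision = dict(DEFAULT_PRECISION)
    if args.fp32 is None:  # flag not given: keep the defaults
        return precision
    stages = args.fp32 or list(precision)  # bare flag: every stage
    for stage in stages:
        if stage not in precision:
            parser.error(f"unknown stage: {stage}")
        precision[stage] = "fp32"
    return precision


print(resolve_precision(["--fp32", "decoder_upsample"]))
```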
107
+ ## Skipped trials this iteration
108
+
109
+ | Stage | Reason for staying fp32 |
110
+ |--------------------------|------------------------------------------------------|
111
+ | fused_f0n_har_source | har_source cumsum drift over 88 200-sample window |
112
+
113
+ Other quantization tiers (int8 weight-only, int4 palettization) deferred
114
+ to a future iteration — fp16 already pays for itself on disk and warm
115
+ latency.
iteration_3/packages/bert_fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:92c06d63856f46e8788c54fb2f2e7228d7da9798e2192c3078fb96a5f1de4074
3
+ size 85458
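The three added lines are a Git LFS pointer, not the model itself: the real blob is fetched by its SHA-256 object ID. A minimal sketch of reading such a pointer, with `parse_lfs_pointer` as an illustrative helper name:

```python
def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer content copied from the diff above (gutter numbers and '+' removed).
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:92c06d63856f46e8788c54fb2f2e7228d7da9798e2192c3078fb96a5f1de4074
size 85458
"""

info = parse_lfs_pointer(pointer)
algo, _, digest = info["oid"].partition(":")
size_bytes = int(info["size"])
print(algo, digest[:12], size_bytes)
```

The `size` field is the byte count of the real file, so summing it across a package gives the on-disk footprint without downloading anything.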
iteration_3/packages/bert_fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fc4a9fb3870729f9572b0830993351524b04b99eba6cab982cef2a17507d9ba0
3
+ size 12090496
iteration_3/packages/bert_fp16.mlpackage/Manifest.json ADDED
@@ -0,0 +1,18 @@
1
+ {
2
+ "fileFormatVersion": "1.0.0",
3
+ "itemInfoEntries": {
4
+ "514C9E67-3E15-43D6-AE2B-6179B9113D2E": {
5
+ "author": "com.apple.CoreML",
6
+ "description": "CoreML Model Weights",
7
+ "name": "weights",
8
+ "path": "com.apple.CoreML/weights"
9
+ },
10
+ "BED7A6A1-56C6-4FB3-AB4B-06ADAD7C844E": {
11
+ "author": "com.apple.CoreML",
12
+ "description": "CoreML Model Specification",
13
+ "name": "model.mlmodel",
14
+ "path": "com.apple.CoreML/model.mlmodel"
15
+ }
16
+ },
17
+ "rootModelIdentifier": "BED7A6A1-56C6-4FB3-AB4B-06ADAD7C844E"
18
+ }
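Each `Manifest.json` in this commit has the same shape: `rootModelIdentifier` names the entry in `itemInfoEntries` that holds the model specification. A short sketch of following that link, using the manifest above as input; `root_model_path` is an illustrative helper, not part of any Apple tooling:

```python
import json

# Manifest content copied from the diff above (gutter numbers and '+' removed).
MANIFEST = """
{
  "fileFormatVersion": "1.0.0",
  "itemInfoEntries": {
    "514C9E67-3E15-43D6-AE2B-6179B9113D2E": {
      "author": "com.apple.CoreML",
      "description": "CoreML Model Weights",
      "name": "weights",
      "path": "com.apple.CoreML/weights"
    },
    "BED7A6A1-56C6-4FB3-AB4B-06ADAD7C844E": {
      "author": "com.apple.CoreML",
      "description": "CoreML Model Specification",
      "name": "model.mlmodel",
      "path": "com.apple.CoreML/model.mlmodel"
    }
  },
  "rootModelIdentifier": "BED7A6A1-56C6-4FB3-AB4B-06ADAD7C844E"
}
"""


def root_model_path(manifest_text: str) -> str:
    """Return the path of the item referenced by rootModelIdentifier."""
    manifest = json.loads(manifest_text)
    root_id = manifest["rootModelIdentifier"]
    return manifest["itemInfoEntries"][root_id]["path"]


print(root_model_path(MANIFEST))
```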
iteration_3/packages/decoder_pre_fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:753dbab37d3232a69b52d48f5d0732632e9307d388ed5224736e9c585db6029c
3
+ size 55933
iteration_3/packages/decoder_pre_fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:db81849a38ce1959ea345219332051947f22f00dc2445cb9b7a119673ca4bf93
3
+ size 67190976
iteration_3/packages/decoder_pre_fp16.mlpackage/Manifest.json ADDED
@@ -0,0 +1,18 @@
1
+ {
2
+ "fileFormatVersion": "1.0.0",
3
+ "itemInfoEntries": {
4
+ "211DC47B-E839-4B47-B64D-EE04F9C081B9": {
5
+ "author": "com.apple.CoreML",
6
+ "description": "CoreML Model Specification",
7
+ "name": "model.mlmodel",
8
+ "path": "com.apple.CoreML/model.mlmodel"
9
+ },
10
+ "BE7D7840-FCB4-4491-B2ED-0D81B5FD33AA": {
11
+ "author": "com.apple.CoreML",
12
+ "description": "CoreML Model Weights",
13
+ "name": "weights",
14
+ "path": "com.apple.CoreML/weights"
15
+ }
16
+ },
17
+ "rootModelIdentifier": "211DC47B-E839-4B47-B64D-EE04F9C081B9"
18
+ }
iteration_3/packages/decoder_upsample_fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ff29829e3c92a4208ef07d307293fd576c4484c6048e519b90cd32ee80180038
3
+ size 491796
iteration_3/packages/decoder_upsample_fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:43161151f001bb951c34952465adfc3c4f5fb8ab2845f31903be09ea9f1a6bc5
3
+ size 41400320
iteration_3/packages/decoder_upsample_fp16.mlpackage/Manifest.json ADDED
@@ -0,0 +1,18 @@
1
+ {
2
+ "fileFormatVersion": "1.0.0",
3
+ "itemInfoEntries": {
4
+ "26E8FCA8-BD9B-4185-B59E-00453487B2B3": {
5
+ "author": "com.apple.CoreML",
6
+ "description": "CoreML Model Specification",
7
+ "name": "model.mlmodel",
8
+ "path": "com.apple.CoreML/model.mlmodel"
9
+ },
10
+ "BFFB197D-D576-4F27-85E5-48F5438F08C2": {
11
+ "author": "com.apple.CoreML",
12
+ "description": "CoreML Model Weights",
13
+ "name": "weights",
14
+ "path": "com.apple.CoreML/weights"
15
+ }
16
+ },
17
+ "rootModelIdentifier": "26E8FCA8-BD9B-4185-B59E-00453487B2B3"
18
+ }
iteration_3/packages/duration_predictor_fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9b60eccf1aff0c09069d4eeebb5611c11caee89788229d0780ef606ac8fa1384
3
+ size 29886
iteration_3/packages/duration_predictor_fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:75ba0b7b2f7dc6a687e9ec01d226c300b09f07832d8e4aac2705a16b5079910c
3
+ size 15543524
iteration_3/packages/duration_predictor_fp16.mlpackage/Manifest.json ADDED
@@ -0,0 +1,18 @@
1
+ {
2
+ "fileFormatVersion": "1.0.0",
3
+ "itemInfoEntries": {
4
+ "EA4FC14C-8DE2-414B-A6C4-B93190F89ED0": {
5
+ "author": "com.apple.CoreML",
6
+ "description": "CoreML Model Weights",
7
+ "name": "weights",
8
+ "path": "com.apple.CoreML/weights"
9
+ },
10
+ "EAAA83DE-C745-4884-AE8D-1ED5C06BC490": {
11
+ "author": "com.apple.CoreML",
12
+ "description": "CoreML Model Specification",
13
+ "name": "model.mlmodel",
14
+ "path": "com.apple.CoreML/model.mlmodel"
15
+ }
16
+ },
17
+ "rootModelIdentifier": "EAAA83DE-C745-4884-AE8D-1ED5C06BC490"
18
+ }
iteration_3/packages/fused_diffusion_sampler_fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f7b29b69d4fcddc08e802c02e60db2a8d515669b6b8cec7ad18f5a5e3a5054df
3
+ size 311675
iteration_3/packages/fused_diffusion_sampler_fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f823b5c638d2eb2fd91bf8e4efe4a90b2e1d3d9e2f5ab40e7e93cb03cd212aca
3
+ size 49361856
iteration_3/packages/fused_diffusion_sampler_fp16.mlpackage/Manifest.json ADDED
@@ -0,0 +1,18 @@
1
+ {
2
+ "fileFormatVersion": "1.0.0",
3
+ "itemInfoEntries": {
4
+ "2513097D-94F1-4E53-A221-C56C469859E3": {
5
+ "author": "com.apple.CoreML",
6
+ "description": "CoreML Model Weights",
7
+ "name": "weights",
8
+ "path": "com.apple.CoreML/weights"
9
+ },
10
+ "A740A1E5-A518-4F32-B0B3-878148DC3920": {
11
+ "author": "com.apple.CoreML",
12
+ "description": "CoreML Model Specification",
13
+ "name": "model.mlmodel",
14
+ "path": "com.apple.CoreML/model.mlmodel"
15
+ }
16
+ },
17
+ "rootModelIdentifier": "A740A1E5-A518-4F32-B0B3-878148DC3920"
18
+ }
iteration_3/packages/fused_f0n_har_source.mlpackage/Data/com.apple.CoreML/model.mlmodel ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c98d56382c336dbea5a864e594022c11ced07372687dc66e56c7c91795e45c63
3
+ size 61645
iteration_3/packages/fused_f0n_har_source.mlpackage/Data/com.apple.CoreML/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ce01b0523d8e925108b5f9ba113d5ff1721bb3e5e25f3eb6d8a2dfaa56876c59
3
+ size 33640448
iteration_3/packages/fused_f0n_har_source.mlpackage/Manifest.json ADDED
@@ -0,0 +1,18 @@
1
+ {
2
+ "fileFormatVersion": "1.0.0",
3
+ "itemInfoEntries": {
4
+ "696D6238-15F9-4A4A-AB5B-3957A847EB37": {
5
+ "author": "com.apple.CoreML",
6
+ "description": "CoreML Model Weights",
7
+ "name": "weights",
8
+ "path": "com.apple.CoreML/weights"
9
+ },
10
+ "8111F480-F3EF-48DE-91B8-5309A11BF779": {
11
+ "author": "com.apple.CoreML",
12
+ "description": "CoreML Model Specification",
13
+ "name": "model.mlmodel",
14
+ "path": "com.apple.CoreML/model.mlmodel"
15
+ }
16
+ },
17
+ "rootModelIdentifier": "8111F480-F3EF-48DE-91B8-5309A11BF779"
18
+ }
iteration_3/packages/ref_encoder_fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5cbd0cf223b874ed6b2de35606a5690bc6355b4890ea32ec30119db5dc00497e
3
+ size 68843
iteration_3/packages/ref_encoder_fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:011d14fdb46589dfb79efb619d63846430be4e4ac86372f8819f35f5e0157391
3
+ size 55386048
iteration_3/packages/ref_encoder_fp16.mlpackage/Manifest.json ADDED
@@ -0,0 +1,18 @@
1
+ {
2
+ "fileFormatVersion": "1.0.0",
3
+ "itemInfoEntries": {
4
+ "32FE6195-7355-4635-AECB-58D9F49F1E17": {
5
+ "author": "com.apple.CoreML",
6
+ "description": "CoreML Model Weights",
7
+ "name": "weights",
8
+ "path": "com.apple.CoreML/weights"
9
+ },
10
+ "343F4722-A338-4705-8547-09E9A93DE8EC": {
11
+ "author": "com.apple.CoreML",
12
+ "description": "CoreML Model Specification",
13
+ "name": "model.mlmodel",
14
+ "path": "com.apple.CoreML/model.mlmodel"
15
+ }
16
+ },
17
+ "rootModelIdentifier": "343F4722-A338-4705-8547-09E9A93DE8EC"
18
+ }
iteration_3/packages/text_encoder_fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d88b74cb84892f7ff1e4d013517dd3d4dab56688b0a0fb4d920f72d0caf9e961
3
+ size 16587
iteration_3/packages/text_encoder_fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0d7f6e5869bb9d523956183e0facdff160c301d28113290efa329ae7bf72d3ce
3
+ size 11208000
iteration_3/packages/text_encoder_fp16.mlpackage/Manifest.json ADDED
@@ -0,0 +1,18 @@
1
+ {
2
+ "fileFormatVersion": "1.0.0",
3
+ "itemInfoEntries": {
4
+ "7F3243AB-2AFC-40E5-A6DE-069619301D63": {
5
+ "author": "com.apple.CoreML",
6
+ "description": "CoreML Model Specification",
7
+ "name": "model.mlmodel",
8
+ "path": "com.apple.CoreML/model.mlmodel"
9
+ },
10
+ "F67A2205-52AD-4B8E-A19F-A7FB9AEB48F9": {
11
+ "author": "com.apple.CoreML",
12
+ "description": "CoreML Model Weights",
13
+ "name": "weights",
14
+ "path": "com.apple.CoreML/weights"
15
+ }
16
+ },
17
+ "rootModelIdentifier": "7F3243AB-2AFC-40E5-A6DE-069619301D63"
18
+ }