## Features

- **Music Generation** -- text/lyrics to stereo 48kHz MP3 via GGUF-quantized models
- **LoRA Training** -- fine-tune on your own audio (~11s/epoch CPU, ~1.4s/epoch GPU)
- **Auto-Captioning** -- librosa BPM/key/signature + LM understand mode (caption + lyrics extraction)
- **Multiple LM Sizes** -- 0.6B / 1.7B / 4B language models (on-demand download)
- **Cancel + Download** -- cancel training mid-epoch, download the trained LoRA adapter

## Music Generation

1. Enter a music description
2. Enter lyrics or check **Instrumental**
3. Adjust BPM, duration, steps, seed
4. Select a LoRA adapter if you have trained one
5. Click **Generate Music**

**Timing:** ~270s for 10s of audio with the 1.7B LM, 8 steps, on CPU.
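The seed follows the usual convention: `-1` requests a random seed, while any non-negative value makes a run reproducible. A minimal sketch of that convention (the helper name is ours, not part of the app):

```python
import random

def resolve_seed(seed: int) -> int:
    """Map the UI convention to a concrete seed: -1 means 'pick one at random'."""
    if seed == -1:
        return random.randint(0, 2**31 - 1)
    return seed

fixed = resolve_seed(42)    # returned unchanged for reproducible runs
randomized = resolve_seed(-1)  # fresh random seed each call
```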

## LoRA Training

1. Upload audio files (any length; audio is auto-tiled into 30s chunks by the VAE)
2. Set LoRA name, epochs, learning rate, rank
3. Click **Train** -- ace-server stops during training and restarts afterwards
4. Use **Cancel** to stop early (a checkpoint is saved)
5. **Download** the trained adapter file
6. The trained adapter appears in the LoRA dropdown

**Timing:** ~170s preprocessing + ~11s/epoch on CPU; ~1.4s/epoch on GPU.

**Limits:** 30 min of audio total across all files (files exceeding the cap are truncated with a warning), 50 files max, 8h training timeout.

**Settings (per the Side-Step author's recommendations):**

- LR: 3e-4
- Rank: 32, Alpha: 64
- Epochs: 200-500 for 3-10 files
- Optimizer: Adafactor (minimal memory)
- Variant: standard turbo (not XL -- XL swaps on 18GB)
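The timing numbers above compose into a simple wall-clock estimate. A back-of-the-envelope helper (not part of the app; it assumes the ~170s preprocessing cost is a flat one-off):

```python
def estimate_training_seconds(epochs: int, on_gpu: bool = False) -> float:
    """Rough estimate from the README figures:
    ~170s preprocessing, then ~11s/epoch on CPU or ~1.4s/epoch on GPU."""
    preprocess = 170.0
    per_epoch = 1.4 if on_gpu else 11.0
    return preprocess + epochs * per_epoch

# 200 epochs on CPU: 170 + 200 * 11 = 2370 s, i.e. roughly 40 minutes.
cpu_estimate = estimate_training_seconds(200)
gpu_estimate = estimate_training_seconds(200, on_gpu=True)
```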

## Captioning Pipeline

Training audio is auto-captioned before preprocessing:

| Method | What it extracts | Speed |
|--------|------------------|-------|
| **librosa** | BPM, key, time signature | ~3s/file |
| **LM understand** (GPU) | Rich caption + lyrics + metadata | ~52s/file |
| **ace-server /understand** (Space) | Same as LM, via GGUF | ~30s/file |
| **.txt/.json sidecar** | User-provided caption (if present) | instant |

On the Space, ace-server /understand runs before training; locally, the PyTorch LM understand mode is used.
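One plausible reading of the table is a priority chain for picking the primary captioner (a sketch with a hypothetical function name; in practice the librosa BPM/key/signature metadata may be combined with the LM caption rather than replaced by it):

```python
def pick_caption_method(has_sidecar: bool, on_space: bool, has_gpu: bool) -> str:
    """Priority chain implied by the README: a user-provided .txt/.json sidecar
    wins; on the Space the GGUF /understand endpoint is used; locally the
    PyTorch LM understand mode runs if a GPU is available; librosa analysis
    is the fallback."""
    if has_sidecar:
        return "sidecar"
    if on_space:
        return "ace-server /understand"
    if has_gpu:
        return "LM understand"
    return "librosa"
```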

## Models

| Component | GGUF | Size | Purpose |
|-----------|------|------|---------|
| DiT XL turbo | acestep-v15-xl-turbo-Q4_K_M | 2.8 GB | Music generation (no LoRA) |
| DiT standard turbo | acestep-v15-turbo-Q4_K_M | 1.1 GB | Music generation (with LoRA) |
| LM 1.7B | acestep-5Hz-lm-1.7B-Q8_0 | 1.7 GB | Caption understanding |
| Text Encoder | Qwen3-Embedding-0.6B-Q8_0 | 0.75 GB | Text encoding |
| VAE | vae-BF16 | 0.32 GB | Audio encode/decode |
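The sizes in the table can be tallied to estimate total disk usage for the full GGUF set:

```python
# Sizes (GB) as listed in the Models table above.
sizes_gb = {
    "acestep-v15-xl-turbo-Q4_K_M": 2.8,   # DiT XL turbo
    "acestep-v15-turbo-Q4_K_M": 1.1,      # DiT standard turbo
    "acestep-5Hz-lm-1.7B-Q8_0": 1.7,      # LM 1.7B
    "Qwen3-Embedding-0.6B-Q8_0": 0.75,    # text encoder
    "vae-BF16": 0.32,                     # VAE
}
total = sum(sizes_gb.values())
print(f"{total:.2f} GB")  # 6.67 GB for all components together
```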

## API

### Generate Music

```python
from gradio_client import Client

client = Client("WeReCooking/ACE-Step-CPU")
result = client.predict(
    caption="upbeat electronic dance music",
    lyrics="[Instrumental]",
    instrumental=True, bpm=120, duration=10, seed=-1, steps=8,
    lora_select="None (no LoRA)",
    lm_model_select="acestep-5Hz-lm-1.7B-Q8_0.gguf",
    api_name="/generate"
)
```

### Train LoRA

```python
from gradio_client import Client, handle_file

client = Client("WeReCooking/ACE-Step-CPU")
result = client.predict(
    audio_files=[handle_file("song.mp3")],
    lora_name="my-style", epochs=200, lr=0.0003, rank=32,
    api_name="/train_lora"
)
```

### MCP (Model Context Protocol)

This Space supports MCP for AI assistants (Claude Desktop, Cursor, VS Code).

```json
{
  "mcpServers": {
  }
}
```

## CLI

```bash
python app.py "upbeat electronic dance music" --duration 10 --steps 8

# With a trained LoRA adapter
python app.py "jazz piano" --adapter my-style --seed 42
```
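A plausible shape for the flag parsing behind these commands, sketched with argparse (the real app.py may differ; defaults here are guesses based on the examples above):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flags mirror the CLI examples in this README.
    p = argparse.ArgumentParser(prog="app.py")
    p.add_argument("caption", help="music description")
    p.add_argument("--duration", type=int, default=10, help="audio length in seconds")
    p.add_argument("--steps", type=int, default=8, help="diffusion steps")
    p.add_argument("--adapter", default=None, help="trained LoRA adapter name")
    p.add_argument("--seed", type=int, default=-1, help="-1 = random")
    return p

args = build_parser().parse_args(
    ["jazz piano", "--adapter", "my-style", "--seed", "42"]
)
print(args.caption, args.adapter, args.seed)  # jazz piano my-style 42
```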

## Architecture

- **Inference:** GGUF via [acestep.cpp](https://github.com/ServeurpersoCom/acestep.cpp)
- **Training:** PyTorch, ported from [Side-Step](https://github.com/koda-dernet/Side-Step) (commit ecd13bd)
- **Captioning:** librosa + LM understand (PyTorch or ace-server /understand)
- Training stops ace-server to free RAM, then restarts it with the new adapters
- Inference is blocked during training with a clear message
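The last two points amount to a mutual-exclusion guard between training and inference. A minimal sketch of the idea (class and method names are hypothetical, not the app's):

```python
import threading

class ServerState:
    """Toy model of the training/inference hand-off: while a training job
    holds the flag, generation requests are refused with a clear message."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.training = False

    def start_training(self) -> None:
        with self._lock:
            self.training = True   # the real app also stops ace-server here

    def finish_training(self) -> None:
        with self._lock:
            self.training = False  # the real app restarts ace-server with new adapters

    def generate(self, caption: str) -> str:
        with self._lock:
            if self.training:
                return "Training in progress -- generation is temporarily unavailable."
        return f"generated audio for: {caption}"

state = ServerState()
state.start_training()
refused = state.generate("ambient")   # blocked while training
state.finish_training()
served = state.generate("ambient")    # served again afterwards
```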

## Credits

- [ACE-Step 1.5](https://github.com/ace-step/ACE-Step-1.5)
- [acestep.cpp](https://github.com/ServeurpersoCom/acestep.cpp)
- [Side-Step](https://github.com/koda-dernet/Side-Step)
- [Serveurperso/ACE-Step-1.5-GGUF](https://huggingface.co/Serveurperso/ACE-Step-1.5-GGUF)