Instructions to use stabilityai/stablecode-instruct-alpha-3b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use stabilityai/stablecode-instruct-alpha-3b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="stabilityai/stablecode-instruct-alpha-3b")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablecode-instruct-alpha-3b")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablecode-instruct-alpha-3b")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use stabilityai/stablecode-instruct-alpha-3b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "stabilityai/stablecode-instruct-alpha-3b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stabilityai/stablecode-instruct-alpha-3b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker
```shell
docker model run hf.co/stabilityai/stablecode-instruct-alpha-3b
```
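The curl call above sends a JSON body to the OpenAI-compatible `/v1/completions` endpoint. As a sketch (the helper name here is mine, not part of vLLM), the same payload can be built in Python and posted with any HTTP client:

```python
import json

# Hypothetical helper mirroring the JSON body in the curl example above.
def completion_payload(model, prompt, max_tokens=512, temperature=0.5):
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = completion_payload(
    "stabilityai/stablecode-instruct-alpha-3b", "Once upon a time,"
)
body = json.dumps(payload)
# POST `body` to http://localhost:8000/v1/completions with
# Content-Type: application/json (requires a running vLLM server).
```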
- SGLang
How to use stabilityai/stablecode-instruct-alpha-3b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "stabilityai/stablecode-instruct-alpha-3b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stabilityai/stablecode-instruct-alpha-3b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
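Both vLLM and SGLang return the OpenAI completion response shape from these endpoints. A sketch of pulling the generated text out of it (the response values below are made up for illustration):

```python
import json

# Made-up example of the JSON an OpenAI-compatible /v1/completions
# endpoint returns; only the fields used below are shown.
raw = json.dumps({
    "model": "stabilityai/stablecode-instruct-alpha-3b",
    "choices": [
        {"index": 0, "text": " there was a code model.", "finish_reason": "length"}
    ],
})

resp = json.loads(raw)
text = resp["choices"][0]["text"]  # the generated continuation
```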
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "stabilityai/stablecode-instruct-alpha-3b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stabilityai/stablecode-instruct-alpha-3b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use stabilityai/stablecode-instruct-alpha-3b with Docker Model Runner:
```shell
docker model run hf.co/stabilityai/stablecode-instruct-alpha-3b
```
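The discussion below drives the model with StableCode's `###Instruction` / `###Response` prompt format. A small helper (the function name is hypothetical) that builds that string:

```python
# Hypothetical helper for the instruct prompt format used in the
# snippets below: "###Instruction\n<text>###Response\n".
def build_prompt(instruction: str) -> str:
    return f"###Instruction\n{instruction}###Response\n"

prompt = build_prompt("Generate a python function to find number of CPU cores")
```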
ValueError: `model_kwargs` are not used by the model
When running the sample snippet provided on the model page, it throws this error (after downloading the tokenizer, config, safetensors, etc.):

ValueError: The following `model_kwargs` are not used by the model: ['token_type_ids'] (note: typos in the generate arguments will also show up in this list)
This is from running the snippet copied directly from the documentation with no alterations. Python version 3.10.12, PyTorch version 2.1.0.dev20230705+cu121, running with CUDA on a 10 GB RTX 3080.
Full traceback:
```
ValueError Traceback (most recent call last)
Cell In[1], line 10
8 model.cuda()
9 inputs = tokenizer("###Instruction\nGenerate a python function to find number of CPU cores###Response\n", return_tensors="pt").to("cuda")
---> 10 tokens = model.generate(
11 **inputs,
12 max_new_tokens=48,
13 temperature=0.2,
14 do_sample=True,
15 )
16 print(tokenizer.decode(tokens[0], skip_special_tokens=True))
File ~/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File ~/.local/lib/python3.10/site-packages/transformers/generation/utils.py:1282, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, **kwargs)
1280 model_kwargs = generation_config.update(**kwargs) # All unused kwargs must be model kwargs
1281 generation_config.validate()
-> 1282 self._validate_model_kwargs(model_kwargs.copy())
1284 # 2. Set generation parameters if not already defined
1285 logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()
File ~/.local/lib/python3.10/site-packages/transformers/generation/utils.py:1155, in GenerationMixin._validate_model_kwargs(self, model_kwargs)
1152 unused_model_args.append(key)
1154 if unused_model_args:
-> 1155 raise ValueError(
1156 f"The following model_kwargs are not used by the model: {unused_model_args} (note: typos in the"
1157 " generate arguments will also show up in this list)"
1158 )
ValueError: The following model_kwargs are not used by the model: ['token_type_ids'] (note: typos in the generate arguments will also show up in this list)
```
I have faced the same issue, using Colab.
Tell me, can you share existing code with it and ask it to debug? How? In the Colab there is weird behavior when changing the prompt and adding my own code. In oobabooga, the same thing. Can it only write code?
```python
inputs = tokenizer("###Instruction\nGenerate a python function to find number of CPU cores###Response\n", return_tensors="pt").to("cuda")
# Removing 'token_type_ids' from the inputs dictionary resolved the error
if "token_type_ids" in inputs:
    del inputs["token_type_ids"]
tokens = model.generate(**inputs, max_new_tokens=48, temperature=0.2, do_sample=True)
```
You can also use the fix provided here: https://huggingface.co/stabilityai/stablecode-instruct-alpha-3b/discussions/2#64d30f314eb2ea6d5d8e118a
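The same idea, sketched on a plain dict so it runs without the model (a real tokenizer returns a `BatchEncoding`, which also behaves like a dict): keep only the keys that `generate()` can forward to the model.

```python
# Stand-in for tokenizer(...) output; the real output holds tensors.
inputs = {
    "input_ids": [[7, 8, 9]],
    "attention_mask": [[1, 1, 1]],
    "token_type_ids": [[0, 0, 0]],  # the key this model rejects
}

# Keep only the keys the model's forward() accepts.
allowed = {"input_ids", "attention_mask"}
filtered = {k: v for k, v in inputs.items() if k in allowed}
```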
Same error on Colab. It speaks volumes about ease of use and user-friendliness when their proverbial "Hello world" gives errors as output. Such a difficult model or program causes user frustration and is bound to fail. I guess they are headed the Android Studio way!

