Instructions for using stabilityai/stablecode-instruct-alpha-3b with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use stabilityai/stablecode-instruct-alpha-3b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="stabilityai/stablecode-instruct-alpha-3b")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablecode-instruct-alpha-3b")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablecode-instruct-alpha-3b")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use stabilityai/stablecode-instruct-alpha-3b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "stabilityai/stablecode-instruct-alpha-3b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stabilityai/stablecode-instruct-alpha-3b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker
```shell
docker model run hf.co/stabilityai/stablecode-instruct-alpha-3b
```
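The curl call above can also be made from Python with only the standard library. This is a sketch, assuming the vLLM server from the previous step is running on localhost:8000; the helper names (`build_completion_request`, `complete`) are illustrative, not part of vLLM:

```python
import json
import urllib.request

def build_completion_request(model, prompt, max_tokens=512, temperature=0.5):
    """Build the JSON body for the OpenAI-compatible /v1/completions endpoint."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode("utf-8")

def complete(base_url, model, prompt):
    """POST the request and return the first generated completion."""
    req = urllib.request.Request(
        base_url + "/v1/completions",
        data=build_completion_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]

# Requires a running server:
# complete("http://localhost:8000",
#          "stabilityai/stablecode-instruct-alpha-3b",
#          "Once upon a time,")
```

The same snippet works against the SGLang server below by changing the port to 30000.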
- SGLang
How to use stabilityai/stablecode-instruct-alpha-3b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "stabilityai/stablecode-instruct-alpha-3b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stabilityai/stablecode-instruct-alpha-3b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "stabilityai/stablecode-instruct-alpha-3b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stabilityai/stablecode-instruct-alpha-3b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use stabilityai/stablecode-instruct-alpha-3b with Docker Model Runner:
```shell
docker model run hf.co/stabilityai/stablecode-instruct-alpha-3b
```
UserWarning: You have modified the pretrained model configuration to control generation
```text
lib\site-packages\transformers\generation\utils.py:1270: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
  warnings.warn(
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
```
How to resolve the above warnings?
From GPT-4:
The warning you're seeing relates to a deprecated way of controlling generation in the Hugging Face Transformers library.
Let's break down the warning and address the issues:
Modification of the Pretrained Model Configuration: The warning message indicates that directly modifying the model's configuration for controlling generation is a deprecated strategy. The solution Hugging Face recommends is to use a generation configuration file.
Setting `pad_token_id` to `eos_token_id`: This is an informational message rather than a warning. In some generation tasks, when `pad_token_id` is not specified, the `eos_token_id` (end-of-sequence token ID) is used as the `pad_token_id` by default. If you don't want this behavior, you should explicitly set the `pad_token_id`.
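For the first warning, the recommended fix is to keep generation settings in a generation configuration file rather than in the model config. A minimal sketch using the Transformers `GenerationConfig` API; the settings shown are placeholders, not recommendations for this model:

```python
from transformers import GenerationConfig

# Placeholder generation settings -- adjust for your use case.
gen_config = GenerationConfig(
    max_new_tokens=128,
    temperature=0.5,
    do_sample=True,
    pad_token_id=0,  # set explicitly to silence the open-end generation notice
)

# Writes generation_config.json; saving it next to the model weights lets
# from_pretrained pick it up automatically.
gen_config.save_pretrained(".")
```

At generation time the config can also be passed directly, e.g. `model.generate(input_ids, generation_config=gen_config)`, instead of mutating `model.config`.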
Solutions:
Using Generation Configuration:
Instead of modifying the model's configuration, use the generation methods provided by the model, i.e. the `generate` function, and pass the necessary arguments there. For example, to generate text with a model:
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")

input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Generate output (sampling is required when requesting multiple
# return sequences without beam search)
output = model.generate(input_ids, max_length=100, num_return_sequences=5, do_sample=True)

# Decode the output
for sequence in output:
    print(tokenizer.decode(sequence, skip_special_tokens=True))
```

To control generation, pass arguments such as `max_length`, `temperature`, and `top_k` to the `generate` function; you shouldn't directly modify the model's configuration for these parameters.

Explicitly setting `pad_token_id`:

If you don't want the default behavior of setting `pad_token_id` to `eos_token_id`, explicitly set `pad_token_id` during generation:

```python
output = model.generate(input_ids, max_length=100, pad_token_id=tokenizer.eos_token_id)
```
In summary, always refer to the official Hugging Face documentation or the URL linked in the warning for up-to-date best practices and guidelines.