Instructions for using stabilityai/stablecode-instruct-alpha-3b with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use stabilityai/stablecode-instruct-alpha-3b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="stabilityai/stablecode-instruct-alpha-3b")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablecode-instruct-alpha-3b")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablecode-instruct-alpha-3b")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use stabilityai/stablecode-instruct-alpha-3b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "stabilityai/stablecode-instruct-alpha-3b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stabilityai/stablecode-instruct-alpha-3b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker
```shell
docker model run hf.co/stabilityai/stablecode-instruct-alpha-3b
```
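The curl call above can also be made from Python with only the standard library. This is a sketch, assuming the vLLM server from the previous step is running on localhost:8000; the helper names (`build_completion_request`, `complete`) are illustrative, not part of vLLM:

```python
import json
import urllib.request

def build_completion_request(model, prompt, max_tokens=512, temperature=0.5):
    """Build the JSON body for the OpenAI-compatible /v1/completions endpoint."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode("utf-8")

def complete(base_url, model, prompt):
    """POST the request and return the first generated completion."""
    req = urllib.request.Request(
        base_url + "/v1/completions",
        data=build_completion_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]

# Requires a running server:
# complete("http://localhost:8000",
#          "stabilityai/stablecode-instruct-alpha-3b",
#          "Once upon a time,")
```

The same snippet works against the SGLang server below by changing the port to 30000.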
- SGLang
How to use stabilityai/stablecode-instruct-alpha-3b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "stabilityai/stablecode-instruct-alpha-3b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stabilityai/stablecode-instruct-alpha-3b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "stabilityai/stablecode-instruct-alpha-3b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stabilityai/stablecode-instruct-alpha-3b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use stabilityai/stablecode-instruct-alpha-3b with Docker Model Runner:
```shell
docker model run hf.co/stabilityai/stablecode-instruct-alpha-3b
```
UserWarning: You have modified the pretrained model configuration to control generation
```text
lib\site-packages\transformers\generation\utils.py:1270: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
  warnings.warn(
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
```
How to resolve the above warnings?
From GPT-4:
The warning you're seeing relates to a deprecated way of controlling generation in the Hugging Face Transformers library.
Let's break down the warning and address the issues:
Modification of the Pretrained Model Configuration: The warning message indicates that directly modifying the model's configuration for controlling generation is a deprecated strategy. The solution Hugging Face recommends is to use a generation configuration file.
Setting `pad_token_id` to `eos_token_id`: This is an informational message rather than a warning. In some generation tasks, when `pad_token_id` is not specified, the `eos_token_id` (end-of-sequence token ID) is used as the `pad_token_id` by default. If you don't want this behavior, you should explicitly set the `pad_token_id`.
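For the first warning, the recommended fix is to keep generation settings in a generation configuration file rather than in the model config. A minimal sketch using the Transformers `GenerationConfig` API; the settings shown are placeholders, not recommendations for this model:

```python
from transformers import GenerationConfig

# Placeholder generation settings -- adjust for your use case.
gen_config = GenerationConfig(
    max_new_tokens=128,
    temperature=0.5,
    do_sample=True,
    pad_token_id=0,  # set explicitly to silence the open-end generation notice
)

# Writes generation_config.json; saving it next to the model weights lets
# from_pretrained pick it up automatically.
gen_config.save_pretrained(".")
```

At generation time the config can also be passed directly, e.g. `model.generate(input_ids, generation_config=gen_config)`, instead of mutating `model.config`.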
Solutions:
Using Generation Configuration:
Instead of modifying the model's configuration, use the generation methods provided by the model, i.e. the `generate` function, and pass the necessary arguments there. For example, to generate text with a model:
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")

input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Generate output (sampling is required when requesting multiple
# return sequences without beam search)
output = model.generate(input_ids, max_length=100, num_return_sequences=5, do_sample=True)

# Decode the output
for sequence in output:
    print(tokenizer.decode(sequence, skip_special_tokens=True))
```

To control generation, pass arguments such as `max_length`, `temperature`, and `top_k` to the `generate` function; you shouldn't directly modify the model's configuration for these parameters.

Explicitly setting `pad_token_id`:

If you don't want the default behavior of setting `pad_token_id` to `eos_token_id`, explicitly set `pad_token_id` during generation:

```python
output = model.generate(input_ids, max_length=100, pad_token_id=tokenizer.eos_token_id)
```
In summary, always refer to the official Hugging Face documentation or the URL linked in the warning for up-to-date best practices and guidelines.