Microcoder 1.5B

Microcoder 1.5B is a code-focused language model fine-tuned from Qwen 2.5 Coder 1.5B Instruct using LoRA (Low-Rank Adaptation) on curated code datasets. It is designed for code generation, completion, and instruction-following tasks in a lightweight, efficient package.


Model Details

| Property | Value |
| --- | --- |
| Base Model | Qwen 2.5 Coder 1.5B Instruct |
| Fine-tuning | LoRA |
| Parameters | ~1.5B |
| License | BSD 3-Clause |
| Language | English (primary), multilingual code |
| Task | Code generation, completion, instruction following |

Benchmarks

| Benchmark | Metric | Score |
| --- | --- | --- |
| HumanEval | pass@1 | 59.15% |
| MBPP+ | pass@1 | 52.91% |

HumanEval and MBPP+ results were obtained using the model in GGUF format with Q5_K_M quantization. Results may vary slightly with other formats or quantization levels.
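For context, pass@1 is the fraction of problems solved by a single generated sample. The standard unbiased pass@k estimator (the formula commonly used by code-generation evaluation harnesses; this card does not specify which harness was used) can be sketched as:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate given n samples per problem, c of which pass."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(1, 1, 1))   # 1.0: the single sample passed
print(pass_at_k(10, 3, 1))  # ~0.3: 3 of 10 samples correct
```

With one sample per problem (n = k = 1), pass@1 reduces to the plain fraction of problems whose sample passes all unit tests, which is then averaged over the benchmark.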


Usage

**Important:** You must use `apply_chat_template` when formatting inputs. Passing raw text directly to the tokenizer will produce incorrect results.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "pedrodev2026/microcoder-1.5b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": "Write a Python function that returns the nth Fibonacci number."
    }
]

# Wrap the conversation in the model's chat template before tokenizing;
# add_generation_prompt appends the assistant header so the model replies.
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
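To see why raw text fails: instruct-tuned Qwen models are trained on a ChatML-style turn format, and `apply_chat_template` wraps each message in special tokens. The sketch below is illustrative only (the authoritative template ships with the tokenizer, and the real one may also inject a default system message):

```python
def chatml_format(messages, add_generation_prompt=True):
    # Each turn is delimited by <|im_start|>/<|im_end|> markers; the trailing
    # assistant header cues the model to begin generating its reply.
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        prompt += "<|im_start|>assistant\n"
    return prompt

print(chatml_format([{"role": "user", "content": "Hi"}]))
```

A prompt without these markers falls outside the distribution the model was fine-tuned on, which is why unformatted input degrades output quality.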

Training Details

Microcoder 1.5B was fine-tuned using LoRA on top of Qwen 2.5 Coder 1.5B Instruct. The training focused on code-heavy datasets covering multiple programming languages and problem-solving scenarios, aiming to improve instruction-following and code correctness at a small model scale.
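The general shape of a LoRA update can be sketched as follows. The dimensions, rank, and scaling factor here are arbitrary illustrations; the card does not state the actual LoRA configuration (rank, alpha, or target modules) used for Microcoder:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2   # hidden size and LoRA rank (r << d)
alpha = 4     # LoRA scaling hyperparameter

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

# Only A and B are trained: 2*d*r parameters instead of d*d.
W_eff = W + (alpha / r) * B @ A

# Because B starts at zero, the adapted model is initially
# identical to the base model.
assert np.allclose(W_eff, W)
```

After training, the low-rank update can be merged back into `W`, so inference costs the same as the base model.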


Credits

See MODEL_CREDITS.md and DATASET_CREDITS.md for attribution of the base model and the fine-tuning datasets.


License

The Microcoder 1.5B model weights and associated code in this repository are released under the BSD 3-Clause License. See LICENSE for details.

Note that the base model (Qwen 2.5 Coder 1.5B Instruct) and the datasets used for fine-tuning are subject to their own respective licenses, as detailed in the credit files above.


Notice

The documentation files in this repository (including README.md, MODEL_CREDITS.md, DATASET_CREDITS.md, and other .md files) were generated with the assistance of an AI language model.
