
Stack X Ultimate

The ultimate 3B parameter model for sovereign AI deployment

Stack X Ultimate is a high-performance 3B-parameter language model designed for sovereign AI deployment. It is optimized for edge computing, on-premise infrastructure, and air-gapped environments, and it delivers strong performance in a compact footprint suitable for consumer hardware as well as enterprise infrastructure.


Hardware Requirements

| Quantization | GPU Required | VRAM | Total Model Size |
|---|---|---|---|
| FP16 (full precision) | RTX 3060+ | ~6 GB | ~6 GB |
| Q8_0 | RTX 3060 | ~3 GB | ~3 GB |
| Q4_K_M | Any modern GPU | ~1.8 GB | ~1.8 GB |
| Q3_K_M | Integrated GPU | ~1.2 GB | ~1.2 GB |
| Q2_K | CPU + 8 GB RAM | ~900 MB | ~900 MB |

Minimum Requirements (Q3_K and below)

  • GPU: None required (CPU inference supported)
  • RAM: 8GB system RAM
  • Storage: 2GB+ free space

Recommended Requirements

  • GPU: NVIDIA RTX 3060 (12GB) or better
  • RAM: 16GB system RAM
  • Storage: 4GB+ free space for multiple quantizations
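
As a rough rule of thumb, the table above maps directly to a selection policy: pick the highest-precision quantization that fits in whatever memory is free, leaving headroom for the KV cache and activations. A minimal sketch (the helper name and the 1.2× headroom factor are illustrative assumptions, not part of any official tooling):

```python
# Hypothetical helper mapping free memory (GB) to a quantization choice,
# using the VRAM figures from the table above. The 1.2x headroom factor
# (for KV cache and activations) is an illustrative assumption.
QUANT_VRAM_GB = [
    ("FP16", 6.0),
    ("Q8_0", 3.0),
    ("Q4_K_M", 1.8),
    ("Q3_K_M", 1.2),
    ("Q2_K", 0.9),
]

def pick_quantization(free_gb: float, headroom: float = 1.2) -> str:
    """Return the highest-precision quantization that fits in free_gb."""
    for name, size_gb in QUANT_VRAM_GB:
        if size_gb * headroom <= free_gb:
            return name
    raise MemoryError("less than ~1 GB free; even Q2_K may not fit")
```

For example, a 12 GB RTX 3060 comfortably takes FP16, while a machine with only ~2 GB free lands on Q3_K_M.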

Edge Deployment

| Platform | Quantization | Requirements |
|---|---|---|
| NVIDIA Jetson Orin | Q4_K_M | 8 GB RAM, 15 W TDP |
| Raspberry Pi 5 + GPU | Q2_K | 8 GB RAM, external GPU |
| Apple Silicon (M1/M2/M3) | Q4_K_M | 16 GB unified memory |
| Intel Arc GPU | Q4_K_M | Intel Arc A770 |

File Sizes

| Quantization | File Size |
|---|---|
| FP16 | ~6.0 GB |
| Q8_0 | ~3.0 GB |
| Q4_K_M | ~1.8 GB |
| Q3_K_M | ~1.2 GB |
| Q2_K | ~900 MB |

Use Cases

Best Suited Tasks

  • Code Generation: Multi-language code writing, refactoring, and debugging
  • Text Generation: Creative writing, documentation, content creation
  • Question Answering: Information retrieval, knowledge base queries
  • Summarization: Document summarization, abstract generation
  • Classification: Text classification, sentiment analysis
  • Translation: Cross-language text translation
  • Embedded Systems: On-device AI, IoT applications

Industries & Domains

| Industry | Use Case |
|---|---|
| Healthcare | HIPAA-compliant AI assistants, clinical documentation |
| Finance | SOC2-compliant automation, risk assessment |
| Legal | Contract analysis, case law research |
| Government | Classified environment AI, secure documentation |
| Manufacturing | Edge AI for quality control, predictive maintenance |
| Retail | On-premise customer service, inventory optimization |
| Education | Offline learning assistants, classroom AI |

Quick Start

Python (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "my-ai-stack/Stack-X-Ultimate"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Generate response
prompt = "Explain the concept of sovereignty in AI systems and why it matters for enterprise deployment."

messages = [
    {"role": "system", "content": "You are Stack X Ultimate, a helpful and knowledgeable AI assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.95,
        do_sample=True,
    )

response = tokenizer.decode(
    outputs[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True
)

print(response)
```

llama.cpp

```bash
# Download the GGUF model file
# Visit: https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main

# Run with llama.cpp on GPU (-ngl 99 offloads all layers; the model
# supports contexts up to 131072 tokens, but a smaller -c keeps the
# KV cache allocation modest)
./main -m stack-x-ultimate-q4_k_m.gguf \
  -n 512 \
  -c 8192 \
  -ngl 99 \
  --temp 0.7 \
  --top-p 0.95 \
  -p "Write a Python function to implement the quicksort algorithm."

# Run on CPU only (-ngl 0 keeps all layers on the CPU; -t sets threads)
./main -m stack-x-ultimate-q4_k_m.gguf \
  -n 512 \
  -t 8 \
  -c 8192 \
  -ngl 0 \
  -p "Explain the differences between sovereign AI and cloud-based AI solutions."

# Compare quantization levels on the same prompt
./main -m stack-x-ultimate-q2_k.gguf -n 256 --temp 0.5
./main -m stack-x-ultimate-q4_k_m.gguf -n 256 --temp 0.5
./main -m stack-x-ultimate-q8_0.gguf -n 256 --temp 0.5
```

Ollama

```bash
# Pull the model
ollama pull stack-x-ultimate

# Run a one-off prompt
ollama run stack-x-ultimate "Write a Python function to implement binary search."

# `ollama run` does not accept sampling flags; set parameters inside an
# interactive session instead:
#   ollama run stack-x-ultimate
#   >>> /set parameter temperature 0.9
#   >>> /set parameter top_p 0.95

# Or bake parameters into a named variant via a Modelfile, e.g. a
# low-temperature, long-context variant for document processing:
cat > Modelfile <<'EOF'
FROM stack-x-ultimate
PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER num_ctx 65536
EOF
ollama create stack-x-factual -f Modelfile
ollama run stack-x-factual "Summarize the following research paper: [PASTE TEXT]"
```

Model Architecture

| Attribute | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-3B |
| Parameters | 3B |
| Fine-tuning | Full fine-tuning + LoRA |
| Context Length | 131,072 tokens (128K) |
| Vocabulary Size | 151,936 tokens |
| Hidden Size | 2,048 |
| Attention Heads | 16 |
| Key-Value Heads | 2 |
| Transformer Layers | 36 |
| Activation Function | SiLU |
| RoPE Scaling | NTK (factor: 4.0) |
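
With only 2 key-value heads (grouped-query attention), the KV cache stays small relative to the context length. A back-of-the-envelope estimate, assuming the standard Qwen2.5-3B configuration (36 layers, 2 KV heads, head dimension 128) and FP16 cache entries:

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV-cache size: one K and one V tensor per layer,
    each of shape (kv_heads, seq_len, head_dim), FP16 by default."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Per token: 2 * 36 * 2 * 128 * 2 bytes = 36,864 bytes (~36 KB)
per_token = kv_cache_bytes(layers=36, kv_heads=2, head_dim=128, seq_len=1)

# At the full 128K context the cache grows to several GB
full_context = kv_cache_bytes(layers=36, kv_heads=2, head_dim=128,
                              seq_len=131_072)
```

At the full 131,072-token context this works out to roughly 4.8 GB of cache on top of the model weights, so long-context runs need substantially more memory than the model-size figures alone suggest.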

Training Details

  • Base Model: Qwen2.5-3B
  • Training Approach: Combined full fine-tuning + LoRA
  • Fine-tuning Data: Diverse high-quality corpus
  • Focus Areas: General understanding, code generation, instruction following
  • Special Training: Sovereign deployment optimization, edge computing efficiency
  • Context Length: 128K tokens
  • License: Apache 2.0
  • Release Date: April 2026

Performance Notes

Inference Speed (Q4_K_M)

| Device | Tokens/sec | Latency (512 tokens) |
|---|---|---|
| RTX 4090 | ~55 | ~9.3 s |
| RTX 3090 | ~42 | ~12.2 s |
| RTX 3060 | ~25 | ~20.5 s |
| Apple M2 Pro | ~35 | ~14.6 s |
| CPU (i9-13900K) | ~10 | ~51.2 s |
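
The latency column follows directly from the throughput column: time is tokens generated divided by decode rate, ignoring prompt-processing (prefill) time. A one-line sketch for estimating other generation lengths:

```python
def generation_latency_s(n_tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock seconds to generate n_tokens at a steady decode rate,
    ignoring prompt-processing (prefill) time."""
    return n_tokens / tokens_per_sec

# RTX 4090 row: 512 tokens at ~55 tok/s -> ~9.3 s
# CPU row:      512 tokens at ~10 tok/s -> ~51.2 s
```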

Deployment Scenarios

Single User (Interactive)

```python
config = {
    "max_new_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.95,
    "batch_size": 1,
}
```

Multi-User (Server)

```python
config = {
    "max_new_tokens": 256,
    "temperature": 0.5,
    "top_p": 0.9,
    "batch_size": 4,
    "use_kv_cache": True,
}
```

Offline/Edge

```python
config = {
    "max_new_tokens": 128,
    "temperature": 0.3,
    "top_p": 0.85,
    "quantization": "q4_k_m",
}
```
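
Note that only some of these keys are arguments to `generate()`; `batch_size`, `use_kv_cache`, and `quantization` belong to the loading or serving layer. A hypothetical helper that splits a scenario config accordingly (the key set below covers just the parameters used in these examples):

```python
# Keys that transformers' generate() accepts directly; the rest configure
# the loading/serving stack (names mirror the example configs above).
GENERATE_KEYS = {"max_new_tokens", "temperature", "top_p"}

def split_config(config: dict) -> tuple[dict, dict]:
    """Split a scenario config into generate() kwargs and deployment settings."""
    gen = {k: v for k, v in config.items() if k in GENERATE_KEYS}
    deploy = {k: v for k, v in config.items() if k not in GENERATE_KEYS}
    return gen, deploy

# Example with the multi-user server config:
gen_kwargs, deploy_opts = split_config({
    "max_new_tokens": 256,
    "temperature": 0.5,
    "top_p": 0.9,
    "batch_size": 4,
    "use_kv_cache": True,
})
# gen_kwargs can then be passed as model.generate(**inputs, **gen_kwargs)
```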

Security & Sovereignty

Stack X Ultimate is designed for secure, sovereign deployment:

  • Air-Gapped Operation: No internet connection required
  • Data Privacy: All data stays within your infrastructure
  • Compliance Ready: SOC2, HIPAA, GDPR compatible
  • Audit Trail: Full inference logging capabilities
  • On-Premise Only: No cloud dependencies

Enterprise Security Features

| Feature | Description |
|---|---|
| VPC Deployment | Deploy within your private network |
| TLS/SSL | Encrypted communication |
| Authentication | OAuth2, LDAP, SSO support |
| Rate Limiting | Prevent abuse and overuse |
| Audit Logging | Complete inference history |
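
Audit logging is implemented by the serving layer rather than the model itself. A minimal sketch of what an audit record might look like (all names and the JSON Lines format here are illustrative; prompts and responses are hashed rather than stored, a common pattern for privacy-preserving logs):

```python
import hashlib
import json
import time

def audit_record(user: str, prompt: str, response: str) -> dict:
    """Build one audit entry. Prompts and responses are stored as SHA-256
    digests rather than plaintext, so logs can be retained and audited
    without re-exposing the underlying data."""
    return {
        "timestamp": time.time(),
        "user": user,
        "model": "stack-x-ultimate",
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode()).hexdigest(),
    }

def append_jsonl(path: str, record: dict) -> None:
    """Append one record per line (JSON Lines) for easy log ingestion."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```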

Limitations

  • Model Size: At 3B parameters, less capable than larger models for complex reasoning
  • Specialized Tasks: May require fine-tuning for domain-specific tasks
  • Multi-modal: Text-only; does not support images or audio
  • Hallucinations: May occasionally generate incorrect information; verification recommended


Citation

```bibtex
@misc{stack-x-ultimate-2026,
  author = {Walid Sobhi},
  title = {Stack X Ultimate: 3B Parameter Model for Sovereign AI Deployment},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/my-ai-stack/Stack-X-Ultimate}
}
```

Built with love for developers
Discord · GitHub · HuggingFace
