AXL
Train a 318M-parameter code model in 20 minutes on your laptop. No GPU. $0.004 electricity.
Why AXL Exists
The Problem
Training a 1B-parameter code model on GPU costs $10,000+ in cloud compute. Byte-level tokenization makes sequences 3-4x longer than BPE, multiplying the cost further. The majority of developers — the people who would benefit most from local code generation — simply cannot participate.
What AXL Changes
Three parallel encoder stacks at 1x, 2x, 4x resolution. The coarse scale processes 1/4 of tokens — exactly offsetting the byte tokenization length penalty. Cross-scale attention lets fine details inform coarse context and vice versa.
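The three-stack idea can be sketched in a few lines. This is an illustrative NumPy toy (mean-pool downsampling, single-head attention with no learned projections), not AXL's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def downsample(x, factor):
    # Mean-pool a (seq_len, dim) sequence along the time axis by `factor`.
    n, d = x.shape
    pad = (-n) % factor
    if pad:
        x = np.concatenate([x, np.zeros((pad, d))], axis=0)
    return x.reshape(-1, factor, d).mean(axis=1)

def cross_scale_attention(queries, context):
    # Single-head attention: one scale reads from another scale's summary.
    scores = queries @ context.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ context

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))      # 16 byte positions, 8-dim embeddings

fine   = x                        # 1x resolution: every byte
mid    = downsample(x, 2)         # 2x resolution: 8 positions
coarse = downsample(x, 4)         # 4x resolution: 4 positions

# Cross-scale attention: coarse context flows back into the fine stream.
fine = fine + cross_scale_attention(fine, coarse)
```

The coarse stream holds 1/4 as many positions, which is where the attention savings come from; the residual update keeps the fine stream's per-byte detail intact.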
Results
With matched settings (12.8M parameters each, the same data, the same Lion optimizer, and the same 3-minute wall-clock budget on a Ryzen 5 5600G), AXL achieves 16x better perplexity on Code-1B and completes 52% more training steps than a standard transformer. AXL wins on both of two seeds, with a run-to-run standard deviation of +/- 0.00 versus +/- 2.72 for the baseline.
Conclusion
AXL proves transformer models can be trained on consumer CPUs. It is a starting point, not a destination — 318M params is tiny by 2026 standards. But the architecture works, the optimizer works, and the $0.004 cost makes iteration feasible for everyone.
How It Works
Three resolution scales process the same sequence in parallel. Because self-attention cost grows with the square of sequence length, the 4x-downsampled coarse scale's attention is 16x cheaper than the fine scale's.
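The arithmetic behind that claim, since attention's score matrix grows quadratically with sequence length:

```python
# Attention builds a seq_len x seq_len score matrix, so a 4x-shorter
# coarse sequence pays 1/16 of the fine scale's attention cost.
n_fine = 2048             # fine-scale length in bytes (illustrative)
n_coarse = n_fine // 4    # coarse scale after 4x downsampling
ratio = n_fine ** 2 // n_coarse ** 2
print(ratio)  # 16
```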
Byte Tokenization vs BPE
Byte-level tokenization makes sequences 3-4x longer. The coarse scale exactly offsets this.
BPE (Standard)
`def fibonacci(n):` → a handful of subword tokens
Byte (AXL)
`def fibonacci(n):` → 17 byte tokens, one per character
AXL Coarse Scale
`def fibonacci(n):` → ~4-5 coarse positions after 4x downsampling
The 4x byte penalty is exactly offset by the 4x downsampling at the coarse scale. No information is lost: the fine scale still sees every byte.
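A quick sanity check of the byte counts (illustrative Python; AXL's actual tokenizer interface may differ):

```python
src = "def fibonacci(n):"
byte_tokens = list(src.encode("utf-8"))        # one token per byte
coarse_positions = -(-len(byte_tokens) // 4)   # ceil(17 / 4)
print(len(byte_tokens), coarse_positions)      # 17 5
```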
Training Cost
Train the entire AXL family for less than a cup of coffee.
Based on AMD Ryzen 5 5600G, 100W system power, US average $0.12/kWh.
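Plugging those numbers in reproduces the headline figure for the 20-minute flagship run:

```python
power_w = 100         # Ryzen 5 5600G system power draw
minutes = 20          # AXL-Code-1B-Lion training time
rate_per_kwh = 0.12   # US average electricity price

kwh = power_w / 1000 * (minutes / 60)
cost = kwh * rate_per_kwh
print(f"${cost:.4f}")  # $0.0040
```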
Model Family
| Model | Params | PPL | tok/s | Q4_K_M | Time |
|---|---|---|---|---|---|
| AXL-Code-1B-Lion | 318M | 1.90 | 6.1 | 188 MB | 20 min |
| AXL-Reasoning-Lion | 70M | 1.79 | 22.4 | 44 MB | 10 min |
| AXL-Refactor-Lion | 19.1M | 1.11 | 52.2 | 12 MB | 3 min |
| AXL-TestGen-Lion | 15.2M | 1.15 | 57.3 | 18 MB | 3 min |
| AXL-Chat-Lion | 9.9M | 1.52 | 73.4 | 7 MB | 3 min |
| AXL-Micro-Lion | 12.8M | 1.04 | 66.2 | 15 MB | 3 min |
| AXL-Secure-Lion | 11.7M | 1.20 | 63.5 | 8 MB | 3 min |
| AXL-Docs-Lion | 9.9M | 1.12 | 72.8 | 7 MB | 2 min |
| AXL-Comment-Lion | 7.2M | 1.20 | 75.8 | 5 MB | 2 min |
| Model | Params | PPL | Focus | GGUF |
|---|---|---|---|---|
| AXL-Micro-600K | 600K | 1.04 | Demo | 1 MB |
| AXL-Micro-8M | 12.8M | 3.13 | Code gen | 25 MB |
| AXL-Coder-15M | 26.0M | 1.54 | Agentic | 50 MB |
| AXL-Debugger-8M | 14.1M | 1.49 | Bug fixing | 27 MB |
| AXL-Fixer-12M | 20.9M | 1.52 | Debug | 40 MB |
| AXL-Reasoning-70M | 70M | 1.93 | CoT | 134 MB |
| AXL-300M | 322M | 1.11 | Flagship | 616 MB |
| AXL-Chat-10M | 9.9M | 1.48 | Dialogue | 19 MB |
| AXL-TestGen-15M | 15.2M | 1.15 | Test gen | 30 MB |
| AXL-Refactor-20M | 19.1M | 1.15 | Refactoring | 37 MB |
| AXL-Docs-8M | 9.9M | 1.12 | Docstrings | 19 MB |
| AXL-Comment-5M | 7.2M | 1.16 | Comments | 14 MB |
| AXL-Secure-10M | 11.7M | 1.20 | Security | 23 MB |
| Model | Params | PPL | Focus | GGUF |
|---|---|---|---|---|
| AXL-Code-1B | 318M | 31.22 | Code gen (SGD) | 606 MB |
| AXL-Chat-Pro | 12.8M | 1.34 | Advanced chat | 25 MB |
| AXL-Translate | 15.2M | 1.86 | Code translation | 29 MB |
Get Started
Full quality via Python API. Degraded quality via Ollama.
Python API (Full Quality)
```shell
pip install -e .
python AXL/API/serve_model.py \
  --model checkpoints/axl_micro_lion \
  --port 8880
# OpenAI-compatible endpoint:
# POST http://localhost:8880/v1/completions
# Works with Continue.dev, LlamaIndex, LangChain
```

Train Your Own

```shell
pip install -e .
python scripts/retrain_all_lion.py \
  --models micro
# Done in 3 minutes. Model in checkpoints/
```

Ollama (Degraded)

```shell
# Warning: uses only 1/3 of AXL architecture
cd AXL/HuggingFace/AXL-Micro-Lion
ollama create axl-micro-lion -f Modelfile
ollama run axl-micro-lion \
  "def fibonacci(n):"
```

Honest Trade-offs
AXL is not a silver bullet. Here is where it works and where it does not.
When AXL works better
- Edge deployment (5-40 MB models)
- CPU-only environments (no GPU available)
- Rapid prototyping (2-3 min training)
- Multilingual code (byte tokenizer handles any language)
- Resource-constrained research (students, hobbyists)
- Privacy-sensitive (all data stays local)
When AXL works worse
- Complex multi-step code reasoning
- Long context (max 256-2048 bytes)
- Production-grade code generation
- Benchmark SOTA competition
- Non-code NLP tasks
- Models above 318M parameters
KoinicLabs
Building accessible AI systems across AGI research, applied AI, and cybersecurity. CPU-first research that runs on consumer hardware.
Research Focus
Our core research areas driving the future of accessible AI
AGI Research
Advancing towards artificial general intelligence through scalable architectures, efficient training methods, and novel reasoning approaches.
AI Development
Building practical AI systems that run efficiently on consumer hardware. Focus on CPU-first architectures and open-source models.
Cybersecurity
Developing AI-powered security tools, threat detection systems, and privacy-preserving machine learning techniques.
Projects
Our open-source projects
Open Research Problems
Challenges we are working on
Efficient Attention Mechanisms (Hard)
Developing attention mechanisms that scale sub-quadratically with sequence length while maintaining quality.
CPU-Optimal Model Architectures (Medium)
Finding architectural choices that maximize throughput on consumer CPUs without GPU acceleration.
Multi-Scale Tokenization (Medium)
Novel tokenization approaches that adaptively represent information at multiple granularities.
Adversarial Robustness in LLMs (Hard)
Making large language models resistant to adversarial prompts and distribution shifts.
Team
The people behind KoinicLabs
Kennedy
Founder leading AGI research and AI development. Focused on accessible, open-source AI systems.
Jasser
Leading cybersecurity research and technical architecture. Expert in secure AI systems.
Taem
Leading marketing, sales, and technical assistance for KoinicLabs.
Milestones
Our journey and achievements
Project Inception
KoinicLabs founded with mission to make AI accessible on consumer hardware.
AXL Alpha Release
First AXL models released — 566K parameter code generation model.
AXL Model Family
Expanded to 27 models ranging from 566K to 318M parameters.
GGUF Export Support
Added native GGUF export for all models — deployment on llama.cpp and Ollama.
Research Expansion
Expanding into AGI research, cybersecurity, and new projects under KoinicLabs.
FAQ
Frequently asked questions
What makes KoinicLabs different from other AI labs?
We are the only research lab focused on CPU-first AI. While others optimize for GPU clusters costing millions, we optimize for accessibility. Our models can be trained on a consumer laptop for less than a penny.
Are the models open source?
Yes! All AXL models are released under the Apache 2.0 license. Training code, weights, and documentation are all publicly available on our GitHub.
How can I contribute?
We welcome contributions! Check our GitHub for open issues, join discussions, and submit pull requests. We also welcome research collaboration.
What hardware do I need?
The smallest models (566K-2M parameters) can run on any modern CPU. Larger models (up to 318M) work well on consumer laptops with 8GB+ RAM. No GPU required.
How do you keep costs so low?
Our CPU-first approach eliminates GPU costs entirely. We use efficient architectures, byte-level tokenization (reducing vocabulary overhead), and multi-scale design to minimize compute requirements.