AXL
Train a 318M-parameter code model in 20 minutes on your laptop. No GPU. $0.004 electricity.
Why AXL Exists
The Problem
Training a 1B-parameter code model on GPU costs $10,000+ in cloud compute. Byte-level tokenization makes sequences 3-4x longer than BPE, multiplying the cost further. The majority of developers — the people who would benefit most from local code generation — simply cannot participate.
What AXL Changes
Three parallel encoder stacks at 1x, 2x, 4x resolution. The coarse scale processes 1/4 of tokens — exactly offsetting the byte tokenization length penalty. Cross-scale attention lets fine details inform coarse context and vice versa.
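The three-stack idea can be sketched in a few lines. This is an illustrative NumPy toy (mean-pool downsampling, single-head attention with no learned projections), not AXL's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def downsample(x, factor):
    # Mean-pool a (seq_len, dim) sequence along the time axis by `factor`.
    n, d = x.shape
    pad = (-n) % factor
    if pad:
        x = np.concatenate([x, np.zeros((pad, d))], axis=0)
    return x.reshape(-1, factor, d).mean(axis=1)

def cross_scale_attention(queries, context):
    # Single-head attention: one scale reads from another scale's summary.
    scores = queries @ context.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ context

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))      # 16 byte positions, 8-dim embeddings

fine   = x                        # 1x resolution: every byte
mid    = downsample(x, 2)         # 2x resolution: 8 positions
coarse = downsample(x, 4)         # 4x resolution: 4 positions

# Cross-scale attention: coarse context flows back into the fine stream.
fine = fine + cross_scale_attention(fine, coarse)
```

The coarse stream holds 1/4 as many positions, which is where the attention savings come from; the residual update keeps the fine stream's per-byte detail intact.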
Results
With matched settings (12.8M parameters each, the same data, the same Lion optimizer, and the same 3-minute wall-clock budget on a Ryzen 5 5600G), AXL achieves 16x better perplexity on Code-1B and completes 52% more training steps than a standard transformer. AXL wins on both of two seeds, with a run-to-run standard deviation of +/- 0.00 versus +/- 2.72 for the baseline.
Conclusion
AXL proves transformer models can be trained on consumer CPUs. It is a starting point, not a destination — 318M params is tiny by 2026 standards. But the architecture works, the optimizer works, and the $0.004 cost makes iteration feasible for everyone.
How It Works
Three resolution scales process the same sequence in parallel. Because self-attention cost grows with the square of sequence length, the 4x-downsampled coarse scale's attention is 16x cheaper than the fine scale's.
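The arithmetic behind that claim, since attention's score matrix grows quadratically with sequence length:

```python
# Attention builds a seq_len x seq_len score matrix, so a 4x-shorter
# coarse sequence pays 1/16 of the fine scale's attention cost.
n_fine = 2048             # fine-scale length in bytes (illustrative)
n_coarse = n_fine // 4    # coarse scale after 4x downsampling
ratio = n_fine ** 2 // n_coarse ** 2
print(ratio)  # 16
```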
Byte Tokenization vs BPE
Byte-level tokenization makes sequences 3-4x longer. The coarse scale exactly offsets this.
BPE (Standard)
`def fibonacci(n):` → a handful of subword tokens
Byte (AXL)
`def fibonacci(n):` → 17 byte tokens, one per character
AXL Coarse Scale
`def fibonacci(n):` → ~4-5 coarse positions after 4x downsampling
The 4x byte penalty is exactly offset by the 4x downsampling at the coarse scale. No information is lost: the fine scale still sees every byte.
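A quick sanity check of the byte counts (illustrative Python; AXL's actual tokenizer interface may differ):

```python
src = "def fibonacci(n):"
byte_tokens = list(src.encode("utf-8"))        # one token per byte
coarse_positions = -(-len(byte_tokens) // 4)   # ceil(17 / 4)
print(len(byte_tokens), coarse_positions)      # 17 5
```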
Training Cost
Train the entire AXL family for less than a cup of coffee.
Based on AMD Ryzen 5 5600G, 100W system power, US average $0.12/kWh.
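Plugging those numbers in reproduces the headline figure for the 20-minute flagship run:

```python
power_w = 100         # Ryzen 5 5600G system power draw
minutes = 20          # AXL-Code-1B-Lion training time
rate_per_kwh = 0.12   # US average electricity price

kwh = power_w / 1000 * (minutes / 60)
cost = kwh * rate_per_kwh
print(f"${cost:.4f}")  # $0.0040
```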
Model Family
| Model | Params | PPL | tok/s | Q4_K_M | Time |
|---|---|---|---|---|---|
| AXL-Code-1B-Lion | 318M | 1.90 | 6.1 | 188 MB | 20 min |
| AXL-Reasoning-Lion | 70M | 1.79 | 22.4 | 44 MB | 10 min |
| AXL-Refactor-Lion | 19.1M | 1.11 | 52.2 | 12 MB | 3 min |
| AXL-TestGen-Lion | 15.2M | 1.15 | 57.3 | 18 MB | 3 min |
| AXL-Chat-Lion | 9.9M | 1.52 | 73.4 | 7 MB | 3 min |
| AXL-Micro-Lion | 12.8M | 1.04 | 66.2 | 15 MB | 3 min |
| AXL-Secure-Lion | 11.7M | 1.20 | 63.5 | 8 MB | 3 min |
| AXL-Docs-Lion | 9.9M | 1.12 | 72.8 | 7 MB | 2 min |
| AXL-Comment-Lion | 7.2M | 1.20 | 75.8 | 5 MB | 2 min |
| Model | Params | PPL | Focus | GGUF |
|---|---|---|---|---|
| AXL-Micro-600K | 600K | 1.04 | Demo | 1 MB |
| AXL-Micro-8M | 12.8M | 3.13 | Code gen | 25 MB |
| AXL-Coder-15M | 26.0M | 1.54 | Agentic | 50 MB |
| AXL-Debugger-8M | 14.1M | 1.49 | Bug fixing | 27 MB |
| AXL-Fixer-12M | 20.9M | 1.52 | Debug | 40 MB |
| AXL-Reasoning-70M | 70M | 1.93 | CoT | 134 MB |
| AXL-300M | 322M | 1.11 | Flagship | 616 MB |
| AXL-Chat-10M | 9.9M | 1.48 | Dialogue | 19 MB |
| AXL-TestGen-15M | 15.2M | 1.15 | Test gen | 30 MB |
| AXL-Refactor-20M | 19.1M | 1.15 | Refactoring | 37 MB |
| AXL-Docs-8M | 9.9M | 1.12 | Docstrings | 19 MB |
| AXL-Comment-5M | 7.2M | 1.16 | Comments | 14 MB |
| AXL-Secure-10M | 11.7M | 1.20 | Security | 23 MB |
| Model | Params | PPL | Focus | GGUF |
|---|---|---|---|---|
| AXL-Code-1B | 318M | 31.22 | Code gen (SGD) | 606 MB |
| AXL-Chat-Pro | 12.8M | 1.34 | Advanced chat | 25 MB |
| AXL-Translate | 15.2M | 1.86 | Code translation | 29 MB |
Get Started
Full quality via Python API. Degraded quality via Ollama.
Python API (Full Quality)
```shell
pip install -e .
python AXL/API/serve_model.py \
  --model checkpoints/axl_micro_lion \
  --port 8880
# OpenAI-compatible endpoint:
# POST http://localhost:8880/v1/completions
# Works with Continue.dev, LlamaIndex, LangChain
```

Train Your Own

```shell
pip install -e .
python scripts/retrain_all_lion.py \
  --models micro
# Done in 3 minutes. Model in checkpoints/
```

Ollama (Degraded)

```shell
# Warning: uses only 1/3 of AXL architecture
cd AXL/HuggingFace/AXL-Micro-Lion
ollama create axl-micro-lion -f Modelfile
ollama run axl-micro-lion \
  "def fibonacci(n):"
```

Honest Trade-offs
AXL is not a silver bullet. Here is where it works and where it does not.
When AXL works better
- Edge deployment (5-40 MB models)
- CPU-only environments (no GPU available)
- Rapid prototyping (2-3 min training)
- Multilingual code (byte tokenizer handles any language)
- Resource-constrained research (students, hobbyists)
- Privacy-sensitive (all data stays local)
When AXL works worse
- Complex multi-step code reasoning
- Long context (max 256-2048 bytes)
- Production-grade code generation
- Benchmark SOTA competition
- Non-code NLP tasks
- Models above 318M parameters
KoinicLabs
Building accessible AI systems across AGI research, applied AI, and cybersecurity. CPU-first research that runs on consumer hardware.
Research Focus
Our core research areas driving the future of accessible AI
AGI Research
Advancing towards artificial general intelligence through scalable architectures, efficient training methods, and novel reasoning approaches.
AI Development
Building practical AI systems that run efficiently on consumer hardware. Focus on CPU-first architectures and open-source models.
Cybersecurity
Developing AI-powered security tools, threat detection systems, and privacy-preserving machine learning techniques.
Projects
Our open-source projects
Open Research Problems
Challenges we are working on
Efficient Attention Mechanisms (Hard)
Developing attention mechanisms that scale sub-quadratically with sequence length while maintaining quality.
CPU-Optimal Model Architectures (Medium)
Finding architectural choices that maximize throughput on consumer CPUs without GPU acceleration.
Multi-Scale Tokenization (Medium)
Novel tokenization approaches that adaptively represent information at multiple granularities.
Adversarial Robustness in LLMs (Hard)
Making large language models resistant to adversarial prompts and distribution shifts.
Team
The people behind KoinicLabs
Kennedy
Founder leading AGI research and AI development. Focused on accessible, open-source AI systems.
Jasser
Leading cybersecurity research and technical architecture. Expert in secure AI systems.
Taem
Leading marketing, sales, and technical assistance for KoinicLabs.
Milestones
Our journey and achievements
Project Inception
KoinicLabs founded with mission to make AI accessible on consumer hardware.
AXL Alpha Release
First AXL models released — 566K parameter code generation model.
AXL Model Family
Expanded to 27 models ranging from 566K to 318M parameters.
GGUF Export Support
Added native GGUF export for all models — deployment on llama.cpp and Ollama.
Research Expansion
Expanding into AGI research, cybersecurity, and new projects under KoinicLabs.
FAQ
Frequently asked questions
What makes KoinicLabs different from other AI labs?
We are the only research lab focused on CPU-first AI. While others optimize for GPU clusters costing millions, we optimize for accessibility. Our models can be trained on a consumer laptop for less than a penny.
Are the models open source?
Yes! All AXL models are released under the Apache 2.0 license. Training code, weights, and documentation are all publicly available on our GitHub.
How can I contribute?
We welcome contributions! Check our GitHub for open issues, join discussions, and submit pull requests. We also welcome research collaboration.
What hardware do I need?
The smallest models (566K-2M parameters) can run on any modern CPU. Larger models (up to 318M) work well on consumer laptops with 8GB+ RAM. No GPU required.
How do you keep costs so low?
Our CPU-first approach eliminates GPU costs entirely. We use efficient architectures, byte-level tokenization (reducing vocabulary overhead), and multi-scale design to minimize compute requirements.