KoHRM-Text-1.4B
Language / 언어: English | 한국어
English
KoHRM-Text-1.4B is a scratch-pretrained Korean/English/code/terminal/tool-use model built from the sapientinc/HRM-Text PrefixLM training stack.
This is not a continued finetune of sapientinc/HRM-Text-1B. It uses a new Korean/terminal-oriented 131K byte-level BPE tokenizer and a new scratch training run.
Current Status
This repository is a rolling latest public model export. Training is still in progress.
- Main repo: `LLM-OS-Models/KoHRM-Text-1.4B`
- Current public files: `model.safetensors`, `config.json`, tokenizer files, and this `README.md`
- Raw FSDP2 resume checkpoints: `LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints`
- Prepared data: `LLM-OS-Models/KoHRM-Text-1.4B-prepared-data`
- Project code: https://github.com/LLM-OS-Models/KoHRM-text
- Upstream HRM-Text code: https://github.com/sapientinc/HRM-Text
- HRM-Text paper: https://arxiv.org/html/2605.20613
- Tokenizer repo: `LLM-OS-Models/HRM-Text-Ko-Terminal-Tokenizer-131K`
The main branch is overwritten with the newest converted EMA safetensors export as training checkpoints are uploaded. To test the latest public weight, download revision="main".
Important Compatibility Note
The public repo currently contains the converted model weights and tokenizer, but it does not yet include a Hugging Face trust_remote_code modeling implementation for HrmTextForCausalLM.
What works today:
- Download the latest public weights.
- Load the tokenizer with `AutoTokenizer`.
- Inspect `config.json`.
- Verify `model.safetensors` on CPU or Colab T4.

What is not supported yet in plain Transformers:
- `AutoModelForCausalLM.from_pretrained("LLM-OS-Models/KoHRM-Text-1.4B")`
- One-line hosted text generation from this repo
Expected reason: model_type: "hrm_text" is a custom HRM-Text architecture. Public generation will require adding the compatible HrmTextForCausalLM remote-code files to this model repo or releasing a standard wrapper.
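The reasoning above can be sketched as a small check on `config.json`. This is only an illustration: `KNOWN_MODEL_TYPES` is a hypothetical stand-in for the architectures bundled with Transformers, and treating an `auto_map` entry as the sign of remote-code support is an assumption about how a future wrapper would be published.

```python
import json

# Hypothetical stand-in for the architectures that ship inside Transformers.
KNOWN_MODEL_TYPES = {"llama", "gpt2", "mistral", "qwen2"}


def loadable_with_auto_classes(config: dict) -> bool:
    """Return True if the config looks loadable through the Auto* classes."""
    if config.get("model_type") in KNOWN_MODEL_TYPES:
        return True
    # Assumption: a repo that ships remote code advertises it via "auto_map".
    return "auto_map" in config


config = json.loads(
    '{"model_type": "hrm_text", "architectures": ["HrmTextForCausalLM"]}'
)
print("loadable today:", loadable_with_auto_classes(config))
```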
Model Details
| Field | Value |
|---|---|
| Model id | LLM-OS-Models/KoHRM-Text-1.4B |
| Standard name | KoHRM-Text-1.4B |
| Training origin | scratch |
| Architecture family | HRM-Text PrefixLM |
| Architecture size | XL |
| Parameters | 1,384,120,320 |
| Context length | 4,096 tokens |
| Training dtype | bfloat16 |
| Public export dtype | bfloat16 EMA safetensors |
| Tokenizer | byte-level BPE, NFC normalization |
| Vocabulary size | 131,072 |
| Objective | PrefixLM response-only loss |
| Optimizer | Adam-atan2 from upstream HRM-Text |
| EMA | 0.9999 |
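
As a rough guide to what an EMA value of 0.9999 means, the sketch below assumes the usual exponential-moving-average update with 0.9999 as the decay. The function names are illustrative and are not taken from the project code.

```python
def ema_update(ema: float, value: float, decay: float = 0.9999) -> float:
    """One EMA step: keep `decay` of the old average, blend in the new value."""
    return decay * ema + (1.0 - decay) * value


def recent_weight(n_steps: int, decay: float = 0.9999) -> float:
    """Fraction of the EMA contributed by the most recent n_steps updates."""
    return 1.0 - decay ** n_steps


# With decay 0.9999 the averaging horizon is on the order of 1 / (1 - decay) steps.
print(f"share from last 10,000 steps: {recent_weight(10_000):.3f}")
```

Under this assumption, the exported EMA weights mostly reflect the last few tens of thousands of optimizer steps rather than a single step.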
Converted config highlights:
```json
{
  "model_type": "hrm_text",
  "architectures": ["HrmTextForCausalLM"],
  "vocab_size": 131072,
  "hidden_size": 1536,
  "num_hidden_layers": 32,
  "num_attention_heads": 12,
  "max_position_embeddings": 4096,
  "prefix_lm": true
}
```
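
A few quantities follow directly from these numbers. The sketch below is plain arithmetic on the values in the table and config. The head dimension assumes the standard even split of `hidden_size` across attention heads, which the config suggests but this card does not state.

```python
NUM_PARAMS = 1_384_120_320
VOCAB_SIZE = 131_072
HIDDEN_SIZE = 1_536
NUM_HEADS = 12
BYTES_PER_BF16 = 2

weight_bytes = NUM_PARAMS * BYTES_PER_BF16
embedding_params = VOCAB_SIZE * HIDDEN_SIZE  # one vocab x hidden matrix
head_dim = HIDDEN_SIZE // NUM_HEADS          # assumes an even per-head split

print(f"bf16 weights: {weight_bytes / 1e9:.2f} GB ({weight_bytes / 2**30:.2f} GiB)")
print(f"one embedding matrix: {embedding_params:,} params "
      f"({embedding_params / NUM_PARAMS:.1%} of total)")
print(f"head dim: {head_dim}")
```

The weight size is consistent with the roughly 2.8GB of CPU RAM mentioned in the quick test below.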
Compared With The HRM-Text Paper
This run can take longer than the paper recipe even on 8 x H200 because the setup is not identical:
- The paper reference used 16 x H100; this run uses 8 x H200.
- KoHRM uses a larger 131K tokenizer vocabulary, compared with the upstream 65K tokenizer.
- The public KoHRM size is about 1.38B parameters.
- The stable long-run batch is `180,224` tokens/step after OOM probing; larger batches were possible briefly but not chosen for reliability.
- The continuation includes extra Korean, terminal, tool-call, legal, finance, wiki, and repeated HRM-cleaned stages.
This does not automatically guarantee better benchmark scores. The expected upside is domain-specific: Korean tokenization efficiency, Korean legal/finance/wiki coverage, terminal trajectories, tool-call formatting, and code-oriented behavior should have a better chance than the upstream English/general checkpoint. Final claims require evaluation after the planned continuation and SFT finish.
Tokenizer
The tokenizer was trained for Korean, English, code, shell/terminal text, and JSON/tool-call formats. It keeps common chat/tool special tokens as stable single tokens where possible.
| Sample bucket | chars/token |
|---|---|
| Korean general text | 2.60 |
| Korean legal text | 2.36 |
| Korean terminal instruction | 2.18 |
| shell command | 2.68 |
| tool-call JSON | 3.32 |
| Python code | 3.37 |
| English | 4.40 |
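
To make the table concrete, the sketch below turns chars/token into two rough estimates: tokens needed for a text of a given length, and characters that fit in the 4,096-token context. These are averages over sample buckets, so real texts will vary. The helper names are illustrative.

```python
import math

CONTEXT_TOKENS = 4096
CHARS_PER_TOKEN = {
    "Korean general text": 2.60,
    "Korean legal text": 2.36,
    "Korean terminal instruction": 2.18,
    "shell command": 2.68,
    "tool-call JSON": 3.32,
    "Python code": 3.37,
    "English": 4.40,
}


def approx_tokens(num_chars: int, bucket: str) -> int:
    """Estimate the token count for num_chars characters of a bucket."""
    return math.ceil(num_chars / CHARS_PER_TOKEN[bucket])


def approx_context_chars(bucket: str) -> int:
    """Estimate how many characters of a bucket fill the full context."""
    return int(CONTEXT_TOKENS * CHARS_PER_TOKEN[bucket])


for bucket in CHARS_PER_TOKEN:
    print(f"{bucket}: ~{approx_context_chars(bucket):,} chars per context")
```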
Formatting tokens:
```
<|im_start|>          instruction start
<|im_end|>            instruction end
<|box_end|>           response/end marker
<|object_ref_start|>  direct condition
<|object_ref_end|>    chain-of-thought style condition
<|quad_start|>        noisy condition
<|quad_end|>          synthetic condition
```
Prompt format used by the project-side inference code:
```
<|im_start|><|object_ref_start|>YOUR_PROMPT_HERE<|im_end|>
```
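
A minimal helper that assembles this string might look like the following. Only the `direct` layout is confirmed by this card. The other condition keys (`cot`, `noisy`, `synthetic`) are hypothetical names, and the sketch assumes they use the same layout with a different condition token.

```python
# Condition tokens from the list above. Keys other than "direct" are
# hypothetical labels chosen for this sketch.
CONDITION_TOKENS = {
    "direct": "<|object_ref_start|>",
    "cot": "<|object_ref_end|>",
    "noisy": "<|quad_start|>",
    "synthetic": "<|quad_end|>",
}


def build_prompt(user_prompt: str, condition: str = "direct") -> str:
    """Wrap a user prompt in the instruction and condition tokens."""
    if condition not in CONDITION_TOKENS:
        raise ValueError(f"unknown condition: {condition!r}")
    return f"<|im_start|>{CONDITION_TOKENS[condition]}{user_prompt}<|im_end|>"


print(build_prompt("List the ten largest files in the current directory."))
```

When tokenizing a string built this way, pass `add_special_tokens=False` as in the quick test below, so the tokenizer does not add its own special tokens.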
CPU / Colab T4 Quick Test
Use this to test the latest public weight files on CPU or a Colab T4 runtime. This verifies that the tokenizer, config, and model.safetensors are downloadable and readable.
It does not run text generation yet, because the public repo does not yet ship the custom HRM-Text modeling wrapper.
```python
# Colab/Jupyter cell: the leading "!" runs pip in the notebook's shell.
!pip -q install -U huggingface_hub transformers safetensors accelerate

from pathlib import Path
import json

import torch
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer
from safetensors.torch import load_file

repo_id = "LLM-OS-Models/KoHRM-Text-1.4B"

# Download only the files needed for this check from the rolling "main" branch.
repo_dir = Path(snapshot_download(
    repo_id,
    revision="main",
    allow_patterns=[
        "README.md",
        "config.json",
        "tokenizer.json",
        "tokenizer_config.json",
        "special_tokens_map.json",
        "model.safetensors",
    ],
))
print("Downloaded to:", repo_dir)
print("Runtime:", "cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

# Inspect the converted config.
config = json.loads((repo_dir / "config.json").read_text())
print("model_type:", config["model_type"])
print("hidden_size:", config["hidden_size"])
print("vocab_size:", config["vocab_size"])
print("context:", config["max_position_embeddings"])

# Tokenize a Korean prompt in the project-side prompt format.
tokenizer = AutoTokenizer.from_pretrained(repo_dir, use_fast=True)
prompt = "<|im_start|><|object_ref_start|>한국어로 현재 디렉터리에서 가장 큰 파일 10개를 찾는 명령을 알려주세요.<|im_end|>"
ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
print("prompt tokens:", len(ids))
print("first token ids:", ids[:20])

# CPU weight integrity check. This loads about 2.8GB of bf16 weights into CPU RAM.
state = load_file(str(repo_dir / "model.safetensors"), device="cpu")
num_tensors = len(state)
num_params = sum(t.numel() for t in state.values())
first_key = next(iter(state))
print("num_tensors:", num_tensors)
print("num_params:", f"{num_params:,}")
print("first tensor:", first_key, tuple(state[first_key].shape), state[first_key].dtype)
```
Expected result:
- `model_type` should be `hrm_text`.
- `vocab_size` should be `131072`.
- `num_params` should be around `1.38B`.
- Tokenizer loading should work on CPU and Colab T4.
- `AutoModelForCausalLM` generation is expected to be unavailable until remote-code support is added.
If you try this:
```python
from transformers import AutoModelForCausalLM

AutoModelForCausalLM.from_pretrained("LLM-OS-Models/KoHRM-Text-1.4B")
```
and it fails with an unknown hrm_text architecture, that is expected for the current public export.
Internal / Project-Side Generation
For actual generation today, use the project code and raw FSDP2 checkpoints. This is the currently supported copy-paste path for CUDA machines. A BF16-capable GPU with enough VRAM is recommended; Colab T4 is useful for the smoke test above, not for this raw-checkpoint generation path.
```bash
git clone https://github.com/LLM-OS-Models/KoHRM-text
cd KoHRM-text
python -m venv .venv
source .venv/bin/activate
pip install -U pip wheel
pip install -r requirements.txt
pip install -U "huggingface_hub[cli]"
export TOKENIZERS_PARALLELISM=false
export NUMEXPR_MAX_THREADS=128
```
Download the latest uploaded raw checkpoint example. This example uses stage1b-hrm-fastcap-repeat-step310000, which is available in the raw checkpoint repo. When a newer raw checkpoint is uploaded, change both the include path and ckpt_step.
```bash
mkdir -p checkpoints/kohm-raw
huggingface-cli download LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints \
  --include "stage1b-hrm-fastcap-repeat-step310000/**" \
  --local-dir checkpoints/kohm-raw
```
Create and run a minimal generation script:
```bash
cat > run_kohrm_raw_generate.py <<'PY'
import os

os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
os.environ.setdefault("NUMEXPR_MAX_THREADS", "128")

from simple_inference_engine import inference_load_checkpoint, inference_generate

ckpt_dir = "checkpoints/kohm-raw/stage1b-hrm-fastcap-repeat-step310000"

# Each entry is (prompt id, (condition string, user prompt)).
prompts = [
    (
        0,
        (
            "direct",
            "한국어 존댓말로 현재 디렉터리에서 용량이 가장 큰 파일 10개를 찾는 bash 명령을 제안해 주세요.",
        ),
    ),
    (
        1,
        (
            "direct",
            "Write a Python function that validates a JSON tool-call object with name and arguments.",
        ),
    ),
]

# Load the EMA weights from the raw FSDP2 checkpoint at the given step.
ckpt = inference_load_checkpoint(
    ckpt_path=ckpt_dir,
    ckpt_epoch=None,
    ckpt_step=310000,
    ckpt_use_ema=True,
    device="cuda",
)

for pid, text in inference_generate(
    ckpt,
    iter(prompts),
    max_tokens=1024,
    max_generation=256,
    batch_size=1,
    temp=0.0,
):
    print(f"\n### sample {pid}\n{text}")
PY

python run_kohrm_raw_generate.py
```
Prompt format is handled by InferenceCheckpoint.tokenize_prompt. The first tuple item is the condition string, usually "direct", and the second item is the user prompt. Internally this becomes:
```
<|im_start|><|object_ref_start|>PROMPT<|im_end|>
```
If you want to test a newer raw checkpoint:
- Check the raw checkpoint repo for the newest uploaded stage/step.
- Change the `huggingface-cli download --include` pattern.
- Change `ckpt_dir`.
- Change `ckpt_step`.
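
Because the three values must stay in sync, it can help to derive them from the checkpoint folder name. The sketch below assumes newer checkpoints keep the `<stage-name>-step<N>` naming seen in the example above. `checkpoint_settings` is a hypothetical helper, not part of the project code.

```python
import re


def checkpoint_settings(name: str, local_root: str = "checkpoints/kohm-raw") -> dict:
    """Derive the include pattern, ckpt_dir, and ckpt_step from a folder name."""
    match = re.fullmatch(r".+-step(\d+)", name)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {name!r}")
    return {
        "include": f"{name}/**",
        "ckpt_dir": f"{local_root}/{name}",
        "ckpt_step": int(match.group(1)),
    }


print(checkpoint_settings("stage1b-hrm-fastcap-repeat-step310000"))
```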
Plain AutoModelForCausalLM generation from model.safetensors will be added later when the public trust_remote_code wrapper is available.
Training Data
Prepared data artifacts are uploaded to:
https://huggingface.co/datasets/LLM-OS-Models/KoHRM-Text-1.4B-prepared-data
The training objective is PrefixLM response-only loss. Instruction/prompt tokens are visible as context, while loss is applied to the response span.
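
As a generic illustration of this objective (not the project's actual implementation), the sketch below builds a response-only label list and a PrefixLM attention mask in plain Python. It assumes the usual PrefixLM formulation, in which prefix tokens attend to each other bidirectionally and response tokens attend causally. The ignore index of -100 follows the common PyTorch cross-entropy convention, and label shifting is omitted for brevity.

```python
IGNORE_INDEX = -100  # assumed convention: positions with this label add no loss


def response_only_labels(token_ids: list[int], prefix_len: int) -> list[int]:
    """Mask the prompt span so loss is computed only on response tokens."""
    return [IGNORE_INDEX] * prefix_len + list(token_ids[prefix_len:])


def prefix_lm_attention_mask(seq_len: int, prefix_len: int) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j."""
    return [
        [j < prefix_len or j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


tokens = [11, 12, 13, 21, 22]  # three prompt tokens, then two response tokens
print(response_only_labels(tokens, prefix_len=3))
for row in prefix_lm_attention_mask(len(tokens), prefix_len=3):
    print("".join("x" if allowed else "." for allowed in row))
```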
Major prepared data groups:
| Dataset group | Tokens | Use |
|---|---|---|
| `koterm_pretrain_mix_v1` | 711.3M | stage-0/stage0b |
| HRM cleaned fast-cap stage1/stage1b | 14.55B | HRM-style instruction pretraining |
| HRM cleaned full/no-cap stage2 | 14.55B | completed continuation |
| HRM cleaned full/no-cap extra stage2b | 14.55B | active continuation |
| Local terminal conversations | 9.39B | terminal/code/tool-heavy continuation |
| Korean tool/legal/wiki/finance mix | 3.02B | Korean domain and tool continuation |
| BCAI Finance Korean | 857.7M | Korean finance/domain data |
| Korean legal/admin task data | 629.0M | Korean legal/admin data |
| Korean Wikipedia | 462.5M | Korean general text |
| ToolBench train tool-call data | 127.0M | tool-call pretraining |
| SWE-ZERO + GLM reasoning subsets | 251.2M | code/reasoning data |
Evaluation-like datasets are excluded where identified, including ToolBench eval, Terminal Bench style evaluation data, and benchmark-oriented chi-bench data.
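
For a sense of scale, the token counts above can be converted to optimizer steps at the stable batch of 180,224 tokens/step. The sketch assumes a single pass over a group with every step full. The table values are rounded, so the results are approximate.

```python
import math

TOKENS_PER_STEP = 180_224
CONTEXT_TOKENS = 4_096


def steps_for_one_pass(num_tokens: float) -> int:
    """Steps needed to see num_tokens once at the stable batch size."""
    return math.ceil(num_tokens / TOKENS_PER_STEP)


# The batch equals a whole number of full-length sequences.
sequences_per_step = TOKENS_PER_STEP // CONTEXT_TOKENS

print("full-length sequences per step:", sequences_per_step)
print("HRM cleaned group (14.55B):", f"{steps_for_one_pass(14.55e9):,} steps")
print("terminal conversations (9.39B):", f"{steps_for_one_pass(9.39e9):,} steps")
```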
Training Run
The current run uses staged continuation:
```
stage0
-> stage0b
-> stage1
-> stage2
-> stage3
-> stage4
-> stage1b
-> stage2b
-> stage3b
-> stage4b
-> stage1c
-> stage2c
-> stage3c
-> stage4c
```
The checkpoint carries model weights, optimizer state, EMA weights, and recurrent carry state. resume_step_offset and total_steps_override are used so the learning-rate schedule follows the intended longer run instead of resetting at each stage.
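
The effect of the step offset can be sketched with a toy schedule. Everything about the schedule shape here is an assumption made for illustration: the cosine decay, `peak_lr`, and `warmup_steps` are hypothetical and not taken from the project. The only point is that the learning rate is computed from the global step (offset plus local step) against the overridden total.

```python
import math


def lr_at(local_step: int, resume_step_offset: int, total_steps: int,
          peak_lr: float = 1e-4, warmup_steps: int = 1000) -> float:
    """Learning rate at a stage-local step, using the global step position."""
    step = resume_step_offset + local_step
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


TOTAL = 1_000_000  # hypothetical stand-in for total_steps_override

# A new stage starting at offset 310,000 continues the curve ...
print(lr_at(0, resume_step_offset=310_000, total_steps=TOTAL))
# ... whereas offset 0 would restart from warmup.
print(lr_at(0, resume_step_offset=0, total_steps=TOTAL))
```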
As of 2026-05-27, stage2b is active. The continuation watcher is scheduled to launch stage3b -> stage4b -> stage1c -> stage2c -> stage3c -> stage4c after each completed checkpoint. The handoff reads the actual epoch_1_info.json global_step from each completed checkpoint before starting the next stage.
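
Reading the step for a handoff could look like the sketch below. The file location (top level of the checkpoint directory) and a top-level `global_step` key are assumptions about the layout. `read_global_step` is a hypothetical helper, not the watcher's actual code.

```python
import json
from pathlib import Path


def read_global_step(checkpoint_dir: str) -> int:
    """Return global_step from a completed checkpoint's epoch_1_info.json."""
    info_path = Path(checkpoint_dir) / "epoch_1_info.json"
    info = json.loads(info_path.read_text())
    return int(info["global_step"])
```

The returned value would then serve as the step offset for the next stage, rather than a step count assumed in advance.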
Intended Use
This checkpoint is intended for:
- continued pretraining experiments
- Korean tokenizer and HRM-Text architecture experiments
- terminal/tool-call/code pretraining research
- checkpoint conversion and evaluation work
It is not yet intended as a finished assistant model.
Limitations
- This is an intermediate checkpoint, not a final aligned instruct model.
- The full planned continuation has not finished.
- Final SFT and safety tuning have not been completed.
- Public benchmark scores for this new checkpoint are not final.
- Plain Transformers generation requires adding the custom `hrm_text` modeling wrapper or remote-code files.
- Tool-call JSON validity and terminal action safety must be evaluated before production use.
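
As a starting point for the tool-call validity check, a minimal structural validator is sketched below. It assumes a tool call is a JSON object with a string `name` and an object `arguments`, matching the shape used in the example prompt earlier in this card. Real evaluation would also need per-tool argument schemas.

```python
import json


def validate_tool_call(raw: str) -> tuple[bool, str]:
    """Check that raw is a JSON object with a name and an arguments object."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(obj, dict):
        return False, "top-level value must be an object"
    name = obj.get("name")
    if not isinstance(name, str) or not name:
        return False, "missing or empty 'name'"
    if not isinstance(obj.get("arguments"), dict):
        return False, "'arguments' must be an object"
    return True, "ok"


print(validate_tool_call('{"name": "ls", "arguments": {"path": "."}}'))
print(validate_tool_call('{"name": "ls", "arguments": "."}'))
```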
Citation
This work builds on HRM-Text:
- Paper: https://arxiv.org/html/2605.20613
- Upstream code: https://github.com/sapientinc/HRM-Text
한국어
KoHRM-Text-1.4B๋ sapientinc/HRM-Text์ PrefixLM ํ์ต ์คํ์ ๊ธฐ๋ฐ์ผ๋ก ์ฒ์๋ถํฐ ํ์ต ์ค์ธ ํ๊ตญ์ด/์์ด/์ฝ๋/ํฐ๋ฏธ๋/ํด์ฝ ๋ชจ๋ธ์
๋๋ค.
์ด ๋ชจ๋ธ์ sapientinc/HRM-Text-1B๋ฅผ ์ด์ด์ ํ์ธํ๋ํ ๋ชจ๋ธ์ด ์๋๋๋ค. ํ๊ตญ์ด์ ํฐ๋ฏธ๋/ํด์ฝ ํ์์ ๋ง์ถฐ ์๋ก ๋ง๋ 131K byte-level BPE tokenizer๋ฅผ ์ฌ์ฉํ๋ฉฐ, ๊ฐ์ค์น๋ scratch pretraining์ผ๋ก ํ์ตํฉ๋๋ค.
ํ์ฌ ์ํ
์ด ์ ์ฅ์๋ ์ต์ ๊ณต๊ฐ ๋ณํ๋ณธ์ ๊ณ์ ๋ฎ์ด์ฐ๋ rolling latest model repo์ ๋๋ค. ํ์ต์ ์์ง ์งํ ์ค์ ๋๋ค.
- ๋ฉ์ธ ๋ชจ๋ธ repo:
LLM-OS-Models/KoHRM-Text-1.4B - ํ์ฌ ๊ณต๊ฐ ํ์ผ:
model.safetensors,config.json, tokenizer ํ์ผ,README.md - raw FSDP2 resume checkpoint:
LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints - prepared data:
LLM-OS-Models/KoHRM-Text-1.4B-prepared-data - ํ๋ก์ ํธ ์ฝ๋: https://github.com/LLM-OS-Models/KoHRM-text
- ์๋ณธ HRM-Text ์ฝ๋: https://github.com/sapientinc/HRM-Text
- HRM-Text ๋ ผ๋ฌธ: https://arxiv.org/html/2605.20613
- tokenizer repo:
LLM-OS-Models/HRM-Text-Ko-Terminal-Tokenizer-131K
์ต์ ๊ณต๊ฐ weight๋ฅผ ํ
์คํธํ๋ ค๋ฉด revision="main"์ผ๋ก ๋ค์ด๋ก๋ํ๋ฉด ๋ฉ๋๋ค. ํ์ต ์ค 10,000 step ๋จ์๋ก ์ checkpoint๊ฐ ๋ณํ๋์ด ์ฌ๋ผ์ค๋ฉด ๊ฐ์ ํ์ผ๋ช
์ด ์ต์ EMA safetensors๋ก ๊ฐฑ์ ๋ฉ๋๋ค.
์ค์ํ ํธํ์ฑ ์๋ด
ํ์ฌ ๊ณต๊ฐ repo์๋ ๋ณํ๋ model weight์ tokenizer๊ฐ ์์ง๋ง, ์์ง Hugging Face trust_remote_code์ฉ HrmTextForCausalLM ๊ตฌํ ํ์ผ์ ํฌํจ๋์ด ์์ง ์์ต๋๋ค.
ํ์ฌ ๋ฐ๋ก ๊ฐ๋ฅํ ๊ฒ:
- ์ต์ ๊ณต๊ฐ weight ๋ค์ด๋ก๋
AutoTokenizer๋ก tokenizer ๋ก๋config.jsonํ์ธ- CPU ๋๋ Colab T4์์
model.safetensors๋ฌด๊ฒฐ์ฑ ํ์ธ
์์ง ์ผ๋ฐ Transformers์์ ๋ฐ๋ก ์ ๋๋ ๊ฒ:
AutoModelForCausalLM.from_pretrained("LLM-OS-Models/KoHRM-Text-1.4B")- ์ด repo๋ง์ผ๋ก one-line text generation ์คํ
์ด์ ๋ model_type: "hrm_text"๊ฐ custom HRM-Text architecture์ด๊ธฐ ๋๋ฌธ์
๋๋ค. ๊ณต๊ฐ generation์ ํ๋ ค๋ฉด ์ด model repo์ HrmTextForCausalLM remote-code wrapper๊ฐ ์ถ๊ฐ๋์ด์ผ ํฉ๋๋ค.
๋ชจ๋ธ ์์ธ
| ํญ๋ชฉ | ๊ฐ |
|---|---|
| ๋ชจ๋ธ ID | LLM-OS-Models/KoHRM-Text-1.4B |
| ํ์ค ์ด๋ฆ | KoHRM-Text-1.4B |
| ํ์ต ์ถ๋ฐ์ | scratch |
| ์ํคํ ์ฒ ๊ณ์ด | HRM-Text PrefixLM |
| ์ํคํ ์ฒ ํฌ๊ธฐ | XL |
| ํ๋ผ๋ฏธํฐ | 1,384,120,320 |
| ์ปจํ ์คํธ ๊ธธ์ด | 4,096 tokens |
| ํ์ต dtype | bfloat16 |
| ๊ณต๊ฐ ๋ณํ๋ณธ dtype | bfloat16 EMA safetensors |
| tokenizer | byte-level BPE, NFC normalization |
| vocabulary size | 131,072 |
| objective | PrefixLM response-only loss |
| optimizer | HRM-Text์ Adam-atan2 |
| EMA | 0.9999 |
๋ณํ๋ config ์ฃผ์ ๊ฐ:
{
"model_type": "hrm_text",
"architectures": ["HrmTextForCausalLM"],
"vocab_size": 131072,
"hidden_size": 1536,
"num_hidden_layers": 32,
"num_attention_heads": 12,
"max_position_embeddings": 4096,
"prefix_lm": true
}
HRM-Text ๋ ผ๋ฌธ ๋๋น
ํ์ฌ run์ ๋ ผ๋ฌธ recipe๋ณด๋ค ๋ ์ค๋ ๊ฑธ๋ฆด ์ ์์ต๋๋ค. ์ค์ ์ด ์์ ํ ๊ฐ์ง ์๊ธฐ ๋๋ฌธ์ ๋๋ค.
- ๋ ผ๋ฌธ ๊ธฐ์ค์ 16 x H100์ด๊ณ , ํ์ฌ run์ 8 x H200์ ๋๋ค.
- KoHRM์ ์๋ณธ 65K tokenizer๋ณด๋ค ํฐ 131K tokenizer vocab์ ์๋๋ค.
- ๊ณต๊ฐ KoHRM ํฌ๊ธฐ๋ ์ฝ 1.38B parameters์ ๋๋ค.
- ์์ ์ฅ๊ธฐ run batch๋ OOM probe ์ดํ
180,224tokens/step์ผ๋ก ์ก์์ต๋๋ค. ๋ ํฐ batch๋ ์ด๋ฐ์ ๊ฐ๋ฅํด ๋ณด์ฌ๋ ์ฅ๊ธฐ ์์ ์ฑ์ด ๋จ์ด์ก์ต๋๋ค. - ํ๊ตญ์ด, ํฐ๋ฏธ๋, ํด์ฝ, ๋ฒ๋ฅ , ๊ธ์ต, ์ํค, HRM-cleaned ๋ฐ๋ณต stage๊ฐ ์ถ๊ฐ๋์ต๋๋ค.
์ด๊ฒ์ด ์๋์ผ๋ก ๋ชจ๋ benchmark ์ ์ ์์น์ ๋ณด์ฅํ์ง๋ ์์ต๋๋ค. ๋ค๋ง ํ๊ตญ์ด ํ ํฌ๋์ด์ ํจ์จ, ํ๊ตญ์ด ๋ฒ๋ฅ /๊ธ์ต/์ํค coverage, ํฐ๋ฏธ๋ trajectory, tool-call formatting, code-oriented behavior ์ชฝ์ ์๋ณธ ์์ด/general checkpoint๋ณด๋ค ์ข์์ง ๊ฐ๋ฅ์ฑ์ด ์์ต๋๋ค. ์ต์ข ์ฃผ์ฅ์ continuation๊ณผ SFT๊ฐ ๋๋ ๋ค ํ๊ฐ๋ก ํ์ธํด์ผ ํฉ๋๋ค.
ํ ํฌ๋์ด์
ํ ํฌ๋์ด์ ๋ ํ๊ตญ์ด, ์์ด, ์ฝ๋, shell/terminal ํ ์คํธ, JSON/tool-call ํ์์ ๊ณ ๋ คํด์ ๋ง๋ค์์ต๋๋ค. ์์ฃผ ์ฐ๋ chat/tool special token์ ๊ฐ๋ฅํ ํ ์์ ์ ์ธ ๋จ์ผ token์ผ๋ก ์ ์งํฉ๋๋ค.
| ์ํ ์ข ๋ฅ | chars/token |
|---|---|
| ํ๊ตญ์ด ์ผ๋ฐ | 2.60 |
| ํ๊ตญ์ด ๋ฒ๋ฅ | 2.36 |
| ํ๊ตญ์ด ํฐ๋ฏธ๋ ์ง์ | 2.18 |
| shell command | 2.68 |
| tool-call JSON | 3.32 |
| Python code | 3.37 |
| ์์ด | 4.40 |
ํฌ๋งท token:
<|im_start|> instruction ์์
<|im_end|> instruction ์ข
๋ฃ
<|box_end|> response/end marker
<|object_ref_start|> direct condition
<|object_ref_end|> chain-of-thought style condition
<|quad_start|> noisy condition
<|quad_end|> synthetic condition
ํ๋ก์ ํธ ๋ด๋ถ inference code๊ฐ ์ฐ๋ prompt ํ์:
<|im_start|><|object_ref_start|>์ฌ๊ธฐ์_ํ๋กฌํํธ๋ฅผ_๋ฃ์ต๋๋ค<|im_end|>
CPU / Colab T4 ๋น ๋ฅธ ํ ์คํธ
์๋ ์ฝ๋๋ CPU ํ๊ฒฝ์ด๋ Colab T4 ๋ฐํ์์์ ์ต์ ๊ณต๊ฐ weight ํ์ผ์ ํ์ธํ๋ ์ฉ๋์
๋๋ค. tokenizer, config, model.safetensors๊ฐ ์ ์์ ์ผ๋ก ๋ฐ์์ง๊ณ ์ฝํ๋์ง ๊ฒ์ฆํฉ๋๋ค.
์์ง public repo์ custom HRM-Text modeling wrapper๊ฐ ์๊ธฐ ๋๋ฌธ์ ์ด ์ฝ๋๋ text generation์ ์คํํ์ง ์์ต๋๋ค.
!pip -q install -U huggingface_hub transformers safetensors accelerate
from pathlib import Path
import json
import torch
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer
from safetensors.torch import load_file
repo_id = "LLM-OS-Models/KoHRM-Text-1.4B"
repo_dir = Path(snapshot_download(
repo_id,
revision="main",
allow_patterns=[
"README.md",
"config.json",
"tokenizer.json",
"tokenizer_config.json",
"special_tokens_map.json",
"model.safetensors",
],
))
print("Downloaded to:", repo_dir)
print("Runtime:", "cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.is_available():
print("GPU:", torch.cuda.get_device_name(0))
config = json.loads((repo_dir / "config.json").read_text())
print("model_type:", config["model_type"])
print("hidden_size:", config["hidden_size"])
print("vocab_size:", config["vocab_size"])
print("context:", config["max_position_embeddings"])
tokenizer = AutoTokenizer.from_pretrained(repo_dir, use_fast=True)
prompt = "<|im_start|><|object_ref_start|>ํ๊ตญ์ด๋ก ํ์ฌ ๋๋ ํฐ๋ฆฌ์์ ๊ฐ์ฅ ํฐ ํ์ผ 10๊ฐ๋ฅผ ์ฐพ๋ ๋ช
๋ น์ ์๋ ค์ฃผ์ธ์.<|im_end|>"
ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
print("prompt tokens:", len(ids))
print("first token ids:", ids[:20])
# CPU weight integrity check. ์ฝ 2.8GB bf16 weight๋ฅผ CPU RAM์ ๋ก๋ํฉ๋๋ค.
state = load_file(str(repo_dir / "model.safetensors"), device="cpu")
num_tensors = len(state)
num_params = sum(t.numel() for t in state.values())
first_key = next(iter(state))
print("num_tensors:", num_tensors)
print("num_params:", f"{num_params:,}")
print("first tensor:", first_key, tuple(state[first_key].shape), state[first_key].dtype)
์ ์ ๊ฒฐ๊ณผ:
model_type์hrm_text์ ๋๋ค.vocab_size๋131072์ ๋๋ค.num_params๋ ์ฝ1.38B์ ๋๋ค.- tokenizer๋ CPU์ Colab T4์์ ์ ์ ๋ก๋๋ฉ๋๋ค.
AutoModelForCausalLMgeneration์ remote-code wrapper๊ฐ ์ถ๊ฐ๋๊ธฐ ์ ๊น์ง๋ ์ ๋๋ ๊ฒ์ด ์ ์์ ๋๋ค.
๋ค์ ์ฝ๋๋ ํ์ฌ public repo ๊ธฐ์ค์ผ๋ก ์คํจํ ์ ์์ต๋๋ค.
from transformers import AutoModelForCausalLM
AutoModelForCausalLM.from_pretrained("LLM-OS-Models/KoHRM-Text-1.4B")
hrm_text architecture๋ฅผ ๋ชจ๋ฅธ๋ค๋ ์ค๋ฅ๊ฐ ๋์ค๋ฉด ํ์ฌ ์ํ์์๋ ์ ์์
๋๋ค.
๋ด๋ถ / ํ๋ก์ ํธ ์ฝ๋ ๊ธฐ๋ฐ ์์ฑ
ํ์ฌ ์ค์ generation์ ํ๋ ค๋ฉด ํ๋ก์ ํธ ์ฝ๋์ raw FSDP2 checkpoint๋ฅผ ์ฌ์ฉํฉ๋๋ค. ์ด๊ฒ์ด ์ง๊ธ ๋ฐ๋ก ์ธ ์ ์๋ CUDA ํ๊ฒฝ์ฉ ๊ฒฝ๋ก์ ๋๋ค. BF16์ด ๋๋ ์ถฉ๋ถํ VRAM์ GPU๋ฅผ ๊ถ์ฅํฉ๋๋ค. Colab T4๋ ์ smoke test์๋ ์ธ ์ ์์ง๋ง, raw checkpoint generation ๊ถ์ฅ ๊ฒฝ๋ก๋ ์๋๋๋ค.
git clone https://github.com/LLM-OS-Models/KoHRM-text
cd KoHRM-text
python -m venv .venv
source .venv/bin/activate
pip install -U pip wheel
pip install -r requirements.txt
pip install -U "huggingface_hub[cli]"
export TOKENIZERS_PARALLELISM=false
export NUMEXPR_MAX_THREADS=128
ํ์ฌ ๋ฐ๋ก ๋ฐ์ ์ ์๋ raw checkpoint ์์์
๋๋ค. ์๋ ์์๋ raw checkpoint repo์ ์ฌ๋ผ์จ stage1b-hrm-fastcap-repeat-step310000์ ์ฌ์ฉํฉ๋๋ค. ๋ ์ต์ raw checkpoint๊ฐ ์ฌ๋ผ์ค๋ฉด include path์ ckpt_step์ ๊ฐ์ด ๋ฐ๊พธ๋ฉด ๋ฉ๋๋ค.
mkdir -p checkpoints/kohm-raw
huggingface-cli download LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints \
--include "stage1b-hrm-fastcap-repeat-step310000/**" \
--local-dir checkpoints/kohm-raw
์ต์ generation script:
cat > run_kohrm_raw_generate.py <<'PY'
import os
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
os.environ.setdefault("NUMEXPR_MAX_THREADS", "128")
from simple_inference_engine import inference_load_checkpoint, inference_generate
ckpt_dir = "checkpoints/kohm-raw/stage1b-hrm-fastcap-repeat-step310000"
prompts = [
(
0,
(
"direct",
"ํ๊ตญ์ด ์กด๋๋ง๋ก ํ์ฌ ๋๋ ํฐ๋ฆฌ์์ ์ฉ๋์ด ๊ฐ์ฅ ํฐ ํ์ผ 10๊ฐ๋ฅผ ์ฐพ๋ bash ๋ช
๋ น์ ์ ์ํด ์ฃผ์ธ์.",
),
),
(
1,
(
"direct",
"Write a Python function that validates a JSON tool-call object with name and arguments.",
),
),
]
ckpt = inference_load_checkpoint(
ckpt_path=ckpt_dir,
ckpt_epoch=None,
ckpt_step=310000,
ckpt_use_ema=True,
device="cuda",
)
for pid, text in inference_generate(
ckpt,
iter(prompts),
max_tokens=1024,
max_generation=256,
batch_size=1,
temp=0.0,
):
print(f"\n### sample {pid}\n{text}")
PY
python run_kohrm_raw_generate.py
Prompt formatting์ InferenceCheckpoint.tokenize_prompt๊ฐ ์ฒ๋ฆฌํฉ๋๋ค. tuple์ ์ฒซ ๋ฒ์งธ ๊ฐ์ condition string์ด๊ณ ๋ณดํต "direct"๋ฅผ ์๋๋ค. ๋ ๋ฒ์งธ ๊ฐ์ ์ฌ์ฉ์ prompt์
๋๋ค. ๋ด๋ถ์ ์ผ๋ก๋ ๋ค์ ํ์์ด ๋ฉ๋๋ค.
<|im_start|><|object_ref_start|>PROMPT<|im_end|>
๋ ์ต์ raw checkpoint๋ฅผ ํ ์คํธํ๋ ค๋ฉด:
- raw checkpoint repo์์ ๊ฐ์ฅ ์ต์ stage/step์ ํ์ธํฉ๋๋ค.
huggingface-cli download --includepattern์ ๋ฐ๊ฟ๋๋ค.ckpt_dir๋ฅผ ๋ฐ๊ฟ๋๋ค.ckpt_step์ ๋ฐ๊ฟ๋๋ค.
๊ณต๊ฐ model.safetensors์์ ๋ฐ๋ก AutoModelForCausalLM generation์ ํ๋ ๊ฒฝ๋ก๋ public trust_remote_code wrapper๋ฅผ ์ถ๊ฐํ ๋ค ์ง์ํ ์์ ์
๋๋ค.
ํ์ต ๋ฐ์ดํฐ
prepared data๋ ์๋ dataset repo์ ์ ๋ก๋ํฉ๋๋ค.
https://huggingface.co/datasets/LLM-OS-Models/KoHRM-Text-1.4B-prepared-data
ํ์ต objective๋ PrefixLM response-only loss์ ๋๋ค. instruction/prompt token์ context๋ก ๋ณด๊ณ , loss๋ response span์๋ง ์ ์ฉํฉ๋๋ค.
์ฃผ์ prepared data group:
| ๋ฐ์ดํฐ ๊ทธ๋ฃน | Tokens | ์ฉ๋ |
|---|---|---|
koterm_pretrain_mix_v1 |
711.3M | stage-0/stage0b |
| HRM cleaned fast-cap stage1/stage1b | 14.55B | HRM-style instruction pretraining |
| HRM cleaned full/no-cap stage2 | 14.55B | ์๋ฃ๋ continuation |
| HRM cleaned full/no-cap extra stage2b | 14.55B | ์งํ ์ค์ธ continuation |
| local terminal conversations | 9.39B | terminal/code/tool-heavy continuation |
| Korean tool/legal/wiki/finance mix | 3.02B | ํ๊ตญ์ด domain/tool continuation |
| BCAI Finance Korean | 857.7M | ํ๊ตญ์ด ๊ธ์ต/domain data |
| Korean legal/admin task data | 629.0M | ํ๊ตญ์ด ๋ฒ๋ฅ /ํ์ data |
| Korean Wikipedia | 462.5M | ํ๊ตญ์ด ์ผ๋ฐ ํ ์คํธ |
| ToolBench train tool-call data | 127.0M | tool-call pretraining |
| SWE-ZERO + GLM reasoning subsets | 251.2M | code/reasoning data |
ํ๊ฐ ์ฑ๊ฒฉ ๋ฐ์ดํฐ๋ ํ์ธ๋๋ ๋ฒ์์์ train์์ ์ ์ธํฉ๋๋ค. ์์๋ ToolBench eval, Terminal Bench ๊ณ์ด ํ๊ฐ ๋ฐ์ดํฐ, benchmark ์ฑ๊ฒฉ์ chi-bench์
๋๋ค.
ํ์ต ์งํ
ํ์ฌ run์ staged continuation ๋ฐฉ์์ ๋๋ค.
stage0
-> stage0b
-> stage1
-> stage2
-> stage3
-> stage4
-> stage1b
-> stage2b
-> stage3b
-> stage4b
-> stage1c
-> stage2c
-> stage3c
-> stage4c
checkpoint๋ model weights, optimizer state, EMA weights, recurrent carry state๋ฅผ ์ด์ด๊ฐ๋๋ค. resume_step_offset๊ณผ total_steps_override๋ฅผ ์จ์ stage๋ง๋ค learning-rate schedule์ด ๋ฆฌ์
๋์ง ์๊ณ ๊ธด pretraining run์ฒ๋ผ ์ด์ด์ง๊ฒ ํฉ๋๋ค.
2026-05-27 ๊ธฐ์ค stage2b๊ฐ ์งํ ์ค์
๋๋ค. continuation watcher๊ฐ ์ดํ stage3b -> stage4b -> stage1c -> stage2c -> stage3c -> stage4c๋ฅผ ์ด์ด์ ์คํํ๋๋ก ์์ฝ๋์ด ์์ต๋๋ค. handoff๋ ๊ฐ stage์ ์ค์ epoch_1_info.json global_step์ ์ฝ๊ณ ๋ค์ stage๋ฅผ ์์ํฉ๋๋ค.
์ฌ์ฉ ๋ชฉ์
์ด checkpoint๋ ๋ค์ ๋ชฉ์ ์ ์ ํฉํฉ๋๋ค.
- continued pretraining ์คํ
- ํ๊ตญ์ด tokenizer ๋ฐ HRM-Text architecture ์คํ
- terminal/tool-call/code pretraining ์ฐ๊ตฌ
- checkpoint conversion ๋ฐ evaluation ์์
์์ง ์์ฑ๋ assistant model์ ์๋๋๋ค.
์ ํ ์ฌํญ
- ์ค๊ฐ checkpoint์ด๋ฉฐ ์ต์ข aligned instruct model์ด ์๋๋๋ค.
- ์ ์ฒด planned continuation์ด ์์ง ๋๋์ง ์์์ต๋๋ค.
- ์ต์ข SFT์ safety tuning์ด ์์ง ๋๋์ง ์์์ต๋๋ค.
- ์ checkpoint์ public benchmark score๋ ์์ง final์ด ์๋๋๋ค.
- ์ผ๋ฐ Transformers generation์ custom
hrm_textmodeling wrapper ๋๋ remote-code file์ด ์ถ๊ฐ๋์ด์ผ ๊ฐ๋ฅํฉ๋๋ค. - tool-call JSON ์ ํจ์ฑ๊ณผ terminal action safety๋ ์ค์ ์ฌ์ฉ ์ ์ ๋ณ๋ ํ๊ฐ๊ฐ ํ์ํฉ๋๋ค.
์ธ์ฉ
์ด ์์ ์ HRM-Text architecture์ training stack์ ๊ธฐ๋ฐ์ผ๋ก ํฉ๋๋ค.
- ๋ ผ๋ฌธ: https://arxiv.org/html/2605.20613
- ์๋ณธ ์ฝ๋: https://github.com/sapientinc/HRM-Text