Libraries

How to use 11-47/Sentience.Cascade.II with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="11-47/Sentience.Cascade.II")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("11-47/Sentience.Cascade.II", dtype="auto")

Local Apps

vLLM

How to use 11-47/Sentience.Cascade.II with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "11-47/Sentience.Cascade.II"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "11-47/Sentience.Cascade.II",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
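
The same request can be issued from Python with only the standard library. This is a minimal sketch, not part of the generated instructions: `build_chat_request` and `send_chat` are hypothetical helper names, and the response parsing assumes the server returns the usual OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="11-47/Sentience.Cascade.II"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt, **kwargs):
    """Send the request and return the reply text. Needs a running server."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumption: standard OpenAI-compatible response layout.
    return body["choices"][0]["message"]["content"]
```

For the SGLang server shown further down, pass `base_url="http://localhost:30000"`.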

Use Docker

docker model run hf.co/11-47/Sentience.Cascade.II

SGLang

How to use 11-47/Sentience.Cascade.II with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "11-47/Sentience.Cascade.II" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "11-47/Sentience.Cascade.II",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "11-47/Sentience.Cascade.II" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "11-47/Sentience.Cascade.II",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use 11-47/Sentience.Cascade.II with Docker Model Runner:
```
docker model run hf.co/11-47/Sentience.Cascade.II
```

Sentience.Cascade.II / README.md

GODsStrongestSoldier

Initial upload: Sentience.Cascade.II RLM 1.147B base weights

87221e0 verified 7 days ago

	---
	license: apache-2.0
	language:
	- en
	tags:
	- recursive-language-model
	- hybrid-mind
	- causal-lm
	- multimodal
	- self-automated
	- reinforcement-learning
	- continual-learning
	- memory-augmented
	pipeline_tag: text-generation
	library_name: transformers
	model_type: sentience_cascade
	---

	# Sentience.Cascade.II

	Recursive Language Model (RLM) · Hybrid Mind Frame
	1.147B Parameters · 64K Context Window · Dual T4 Trained

	---

	## Overview

	Sentience.Cascade.II is not a Large Language Model (LLM).
	It is a Recursive Language Model (RLM): a novel architecture in which every
	forward pass includes multiple self-recursive refinement steps, episodic
	short-term and long-term memory, and a fully wired Hybrid Mind module that runs
	*as one integrated frame* rather than as sequential pipeline stages.

	All cognitive subsystems operate inside a single unified forward pass.

	---

	## Architecture

	| Component | Detail |
	|---|---|
	| Architecture type | Recursive Language Model (RLM) |
	| Parameters | ~1.147B |
	| Context window | 64,000 tokens |
	| Attention | Grouped Query Attention (16 heads / 4 KV heads) |
	| Positional encoding | RoPE (θ=500,000) |
	| FFN | SwiGLU |
	| Normalisation | RMSNorm |
	| Weight format | safetensors (float32 on disk, bfloat16 for training) |
	| Vocabulary | 65,536 (BPE ByteLevel) |
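
The attention and positional-encoding rows imply a few derived quantities. The sketch below works them out under two stated assumptions: the hidden size is 2048 (inferred from the multimodal projection width later in this card, not stated in the table), and query heads are grouped onto KV heads in contiguous blocks, which is a common convention rather than something this card confirms.

```python
HIDDEN_SIZE = 2048       # assumption: inferred from the "Linear → 2048" projections
NUM_HEADS = 16           # query heads (from the table)
NUM_KV_HEADS = 4         # key/value heads (from the table)
ROPE_THETA = 500_000.0   # RoPE base (from the table)

head_dim = HIDDEN_SIZE // NUM_HEADS        # width of one attention head
group_size = NUM_HEADS // NUM_KV_HEADS     # query heads sharing one KV head


def kv_head_for(query_head):
    """Map a query head to its shared KV head (contiguous grouping assumed)."""
    return query_head // group_size


# Cached key+value numbers per token per layer, GQA vs. full multi-head attention.
kv_values_gqa = 2 * NUM_KV_HEADS * head_dim
kv_values_mha = 2 * NUM_HEADS * head_dim


def rope_inv_freq(dim, theta):
    """Standard RoPE inverse frequencies: theta ** (-2i / dim) for each pair i."""
    return [theta ** (-(2 * i) / dim) for i in range(dim // 2)]


inv_freq = rope_inv_freq(head_dim, ROPE_THETA)
```

Under these assumptions each KV head serves four query heads, so the KV cache is a quarter of what full multi-head attention would need.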

	---

	## Hybrid Mind Frame — Self-Automated (S.A.) Modules

	All modules are active simultaneously inside each transformer layer.
	None are optional pipeline steps — they are weights baked into the model.

	| Module | Role |
	|---|---|
	| S.A. Meta Learning Gate | Scales activation magnitude as a proxy learning signal |
	| S.A. Reinforcement Learning Head | Scalar reward prediction per forward pass |
	| S.A. Continual Learning Gate | Soft forgetting-protection via decay gates |
	| S.A. Adaptive Learning Scale | Per-token hidden-state scaling |
	| S.A. Rewrite Gate | Token-level hidden-state rewriting delta |
	| S.A. NLP Head | Span boundary logits for structured extraction |
	| S.A. Problem Solving Head | 8-class step-type classification |
	| S.A. Innovation Noise | Trainable exploration noise (active during training only) |
	| S.A. Debug Probe | 4-class anomalous activation detector |
	| S.A. Advanced Short-Term Memory | 512-slot episodic rolling buffer |
	| S.A. Advanced Long-Term Memory | 1024-slot consolidated episodic store |
	| S.A. Recursive Seed Learning | Multi-step (×4) recursive refinement loop |
	| S.A. Self-Evaluation & Reward | Scalar self-score head |
	| S.A. Goal & Constraint Engine | Residual goal-projection delta |
	| S.A. Memory Consolidation | Automatic STM→LTM every 8 layers |
	| S.A. Introspection Interface | 64-dim interpretable summary of hidden state |
	| S.A. Recursive Outer Loop Gate | Final gate before residual output |
	| Conversational Intelligence | 32-class dialog-act classification head |
	| MultiModal (Text/Image/Audio/Video) | Linear projection from ViT-L / mel-spec / video dims |

	---

	## Recursive Language Model Core

	Unlike a standard transformer, which processes each token once per layer,
	Sentience.Cascade.II applies a `RecursiveSeedLayer` after all transformer blocks.
	This layer runs `num_recursive_steps=4` passes of attention + FFN in an inner loop
	that reuses the same weights on every pass, allowing the model to internally
	"think again" before producing logits.

	This is the defining feature of the RLM architecture:
	> Output is not produced after one pass — it is refined recursively.
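
The control flow of such a shared-weight loop can be illustrated in a few lines. This is a toy sketch, not the model's code: `recursive_refine` and `toy_block` are hypothetical names, the residual update is an assumption, and `toy_block` merely stands in for the real attention + FFN block.

```python
def recursive_refine(hidden, block, num_recursive_steps=4):
    """Apply the same block repeatedly, adding its output back as a residual."""
    for _ in range(num_recursive_steps):
        delta = block(hidden)  # same function (same "weights") on every pass
        hidden = [h + d for h, d in zip(hidden, delta)]
    return hidden


def toy_block(hidden):
    """Stand-in for attention + FFN: nudges each value halfway toward the mean."""
    mean = sum(hidden) / len(hidden)
    return [0.5 * (mean - h) for h in hidden]


refined = recursive_refine([0.0, 4.0], toy_block)
```

Each pass sees the output of the previous one, so four passes through one block give four refinement steps without adding parameters.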

	---

	## Memory System

	- Short-Term Memory (512 slots): updated on every forward pass via a write gate
	and cross-attended by every layer, giving the model persistent state within a context.
	- Long-Term Memory (1024 slots): consolidated from short-term memory every 8 layers
	via a separate consolidation gate with a 0.99/0.01 EMA blend.
	It persists across training steps during fine-tuning.
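
The two update rules can be sketched as follows. This is a simplified illustration under stated assumptions: the short-term store is modelled as a plain rolling buffer (the learned write gate is omitted), and the 0.99/0.01 blend is read as 0.99 on the existing long-term value and 0.01 on the incoming short-term summary. `ema_consolidate` is a hypothetical helper name.

```python
from collections import deque

STM_SLOTS = 512  # short-term capacity from the description above


def ema_consolidate(ltm_slot, stm_summary, keep=0.99, write=0.01):
    """Blend a short-term summary into one long-term slot (EMA update)."""
    return [keep * old + write * new for old, new in zip(ltm_slot, stm_summary)]


# Rolling buffer: once 512 entries are held, the oldest is dropped on each write.
stm = deque(maxlen=STM_SLOTS)
for step in range(600):
    stm.append([float(step)])

# Summarise short-term memory (here: a simple mean) and blend it into long-term.
summary = [sum(entry[0] for entry in stm) / len(stm)]
ltm_slot = ema_consolidate([0.0], summary)
```

With a 0.01 write weight, a single consolidation moves a long-term slot only 1% of the way toward the new summary, which is what makes the store slow-changing.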

	---

	## Multimodal Support

	Three input projection heads accept external embeddings:

	| Modality | Input dim | Projection |
	|---|---|---|
	| Image | 1024 (ViT-L patch) | Linear → 2048 |
	| Audio | 128 (mel-spectrogram) | Linear → 2048 |
	| Video | 1024 (frame embedding) | Linear → 2048 |

	These act as prefix embeddings: project the modality tokens, then concatenate them ahead of the text token embeddings derived from `input_ids`.
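
One way to read the prefix step is sketched below with toy dimensions (2 → 3 instead of, for example, 1024 → 2048). It assumes the projected modality tokens are concatenated in front of the text token embeddings; `linear` and `build_inputs` are hypothetical helpers, not functions from the model code.

```python
def linear(x, weight):
    """Project one embedding; weight is a list of rows with shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def build_inputs(modality_embeds, weight, text_embeds):
    """Project modality embeddings and place them ahead of the text embeddings."""
    prefix = [linear(e, weight) for e in modality_embeds]
    return prefix + text_embeds


# Toy example: one "image patch" of width 2 projected to model width 3.
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_patches = [[1.0, 2.0]]
text_embeds = [[0.0, 0.0, 0.0], [9.0, 9.0, 9.0]]

sequence = build_inputs(image_patches, weight, text_embeds)
```

The resulting sequence is one position longer than the text alone, so attention masks and position indices would need to cover the prefix as well.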

	---

	## Chat Template

	```
	<|system|>You are Sentience.Cascade.II, a recursive reasoning model.
	<|user|>What is consciousness?
	<|assistant|>
	```
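
A prompt in this layout can be assembled with a small helper. This is a sketch only: `format_chat` is a hypothetical function, and it assumes turns are separated by a single newline with no end-of-turn token, which is how the template above appears.

```python
def format_chat(messages, add_generation_prompt=True):
    """Render role/content messages using <|role|> tags, one turn per line."""
    lines = [f"<|{m['role']}|>{m['content']}" for m in messages]
    if add_generation_prompt:
        # Leave an open assistant tag for the model to continue from.
        lines.append("<|assistant|>")
    return "\n".join(lines)


prompt = format_chat([
    {"role": "system",
     "content": "You are Sentience.Cascade.II, a recursive reasoning model."},
    {"role": "user", "content": "What is consciousness?"},
])
```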

	---

	## Fine-Tuning

	This release is a base initialisation rather than a trained checkpoint: the
	weights are randomly initialised and the tokenizer is bootstrapped. Train or
	fine-tune it on your domain corpus using standard causal-LM training.

	Recommended fine-tune config:

	```python
	from transformers import TrainingArguments

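	# Starting point only; adjust to your hardware and dataset.
	args = TrainingArguments(
	    output_dir="./sc2-finetuned",
	    per_device_train_batch_size=1,
	    gradient_accumulation_steps=16,   # 16 sequences per optimizer step per device
	    num_train_epochs=3,
	    learning_rate=2e-4,
	    lr_scheduler_type="cosine",
	    warmup_ratio=0.03,
	    bf16=True,                        # needs bf16-capable hardware; otherwise try fp16=True
	    gradient_checkpointing=True,      # trades extra compute for lower activation memory
	    save_strategy="steps",
	    save_steps=500,
	    logging_steps=10,
	    report_to="none",                 # disable external experiment trackers
	)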
	```
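
To see what these settings mean in practice, the arithmetic below derives the effective batch size and warmup length. The device count of 2 and the total of 939 optimizer steps are illustrative assumptions, not values from this card, and both helper names are hypothetical.

```python
import math


def effective_batch_size(per_device, accumulation_steps, num_devices):
    """Sequences contributing to each optimizer step."""
    return per_device * accumulation_steps * num_devices


def warmup_steps(total_steps, warmup_ratio):
    """Number of warmup steps implied by a ratio, rounded up."""
    return math.ceil(total_steps * warmup_ratio)


# Assumption: two GPUs. per_device_train_batch_size=1, gradient_accumulation_steps=16.
batch = effective_batch_size(per_device=1, accumulation_steps=16, num_devices=2)

# Assumption: a run of 939 optimizer steps with warmup_ratio=0.03.
warmup = warmup_steps(total_steps=939, warmup_ratio=0.03)
```

On a single device the effective batch halves, so the learning rate may need retuning if the device count changes.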

	> Note: Because `SentienceCascadeModel` is a custom architecture, you will
	> need to register it with the HuggingFace `AutoModel` registry or load it
	> with `trust_remote_code=True` after placing the model code in the repo.

	---

	## Citation

	```bibtex
	@misc{sentiencecascade2,
	author = {GODsStrongestSoldier},
	title = {Sentience.Cascade.II: A Recursive Language Model with Hybrid Mind Frame},
	year = {2025},
	publisher = {HuggingFace},
	howpublished = {\url{https://huggingface.co/GODsStrongestSoldier/Sentience.Cascade.II}},
	}
	```

	---

	## License

	Apache 2.0