---
license: apache-2.0
language:
- en
tags:
- recursive-language-model
- hybrid-mind
- causal-lm
- multimodal
- self-automated
- reinforcement-learning
- continual-learning
- memory-augmented
pipeline_tag: text-generation
library_name: transformers
model_type: sentience_cascade
---

# Sentience.Cascade.II

**Recursive Language Model (RLM) · Hybrid Mind Frame**

**1.147B Parameters · 64K Context Window · Dual T4 Trained**

---

## Overview

**Sentience.Cascade.II** is not a Large Language Model (LLM).

It is a **Recursive Language Model (RLM)**: a novel architecture in which every forward pass includes multiple self-recursive refinement steps, episodic short- and long-term memory, and a fully wired Hybrid Mind module that runs *as one integrated frame* rather than as sequential pipeline stages.

All cognitive subsystems operate inside a single unified forward pass.

---

## Architecture

| Component | Detail |
|---|---|
| Architecture type | Recursive Language Model (RLM) |
| Parameters | ~1.147B |
| Context window | 64,000 tokens |
| Attention | Grouped Query Attention (16 query heads / 4 KV heads) |
| Positional encoding | RoPE (θ = 500,000) |
| FFN | SwiGLU |
| Normalisation | RMSNorm |
| Weight format | safetensors (float32 on disk, bfloat16 for training) |
| Vocabulary | 65,536 tokens (byte-level BPE) |
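
To make the attention and positional-encoding rows concrete, the sketch below shows how 16 query heads share 4 KV heads under grouped-query attention, and computes the standard RoPE inverse frequencies for θ = 500,000. The head dimension of 128 is an assumption (a 2048-dim hidden state, inferred from the multimodal projection table, split over 16 heads), and the helper names are hypothetical rather than taken from the model code.

```python
HIDDEN_SIZE = 2048      # assumption: inferred from the Linear -> 2048 projections
NUM_HEADS = 16          # query heads (from the table)
NUM_KV_HEADS = 4        # key/value heads (from the table)
ROPE_THETA = 500_000.0  # RoPE base (from the table)
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # 128 under the assumption above


def kv_head_for(query_head: int) -> int:
    """Return the KV head index that a given query head attends with under GQA."""
    group_size = NUM_HEADS // NUM_KV_HEADS  # 4 query heads share each KV head
    return query_head // group_size


def rope_inv_freq(head_dim: int = HEAD_DIM, theta: float = ROPE_THETA) -> list[float]:
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim) for each dimension pair."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


if __name__ == "__main__":
    print([kv_head_for(h) for h in range(NUM_HEADS)])
    inv = rope_inv_freq()
    print(len(inv), inv[0], inv[-1])
```

With 4 KV heads instead of 16, the key/value cache is a quarter of the size a full multi-head layout would need, which matters at a 64,000-token context. A large θ such as 500,000 makes the lowest frequencies rotate very slowly, a common choice for long-context models.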

---

## Hybrid Mind Frame: Self-Automated (S.A.) Modules

All modules are weights built into the model and run within its forward pass (the Innovation Noise module only during training); none of them is an optional pipeline step.

| Module | Role |
|---|---|
| S.A. Meta Learning Gate | Scales activation magnitude as a proxy learning signal |
| S.A. Reinforcement Learning Head | Scalar reward prediction per forward pass |
| S.A. Continual Learning Gate | Soft forgetting protection via decay gates |
| S.A. Adaptive Learning Scale | Per-token hidden-state scaling |
| S.A. Rewrite Gate | Token-level hidden-state rewriting delta |
| S.A. NLP Head | Span-boundary logits for structured extraction |
| S.A. Problem Solving Head | 8-class step-type classification |
| S.A. Innovation Noise | Trainable exploration noise (active during training only) |
| S.A. Debug Probe | 4-class anomalous-activation detector |
| S.A. Advanced Short-Term Memory | 512-slot episodic rolling buffer |
| S.A. Advanced Long-Term Memory | 1024-slot consolidated episodic store |
| S.A. Recursive Seed Learning | Multi-step (×4) recursive refinement loop |
| S.A. Self-Evaluation & Reward | Scalar self-score head |
| S.A. Goal & Constraint Engine | Residual goal-projection delta |
| S.A. Memory Consolidation | Automatic STM→LTM consolidation every 8 layers |
| S.A. Introspection Interface | 64-dim interpretable summary of the hidden state |
| S.A. Recursive Outer Loop Gate | Final gate before the residual output |
| Conversational Intelligence | 32-class dialog-act classification head |
| MultiModal (Text/Image/Audio/Video) | Linear projections from ViT-L, mel-spectrogram, and video-frame embedding dims |
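
Several rows describe gated residual updates (the Rewrite Gate's hidden-state delta, the Goal & Constraint Engine's residual delta, the Recursive Outer Loop Gate) and training-only exploration noise. The card does not give their equations, so the NumPy sketch below shows only one common pattern such modules could follow: a sigmoid gate scaling a learned delta that is added to the residual stream, with Gaussian noise injected only when `training=True`. The class name, weight shapes, and noise scale are all hypothetical.

```python
import numpy as np


class GatedResidualDelta:
    """Hypothetical gated residual update: h + sigmoid(h @ w_gate) * (h @ w_delta)."""

    def __init__(self, dim: int, noise_std: float = 0.01, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.w_gate = self.rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.w_delta = self.rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.noise_std = noise_std  # stand-in for a trainable noise scale

    def __call__(self, h: np.ndarray, training: bool = False) -> np.ndarray:
        gate = 1.0 / (1.0 + np.exp(-(h @ self.w_gate)))  # per-feature gate in (0, 1)
        delta = h @ self.w_delta                          # proposed rewrite of the hidden state
        out = h + gate * delta                            # residual connection keeps h intact
        if training:
            # Exploration noise is only injected during training.
            out = out + self.rng.normal(0.0, self.noise_std, size=out.shape)
        return out


if __name__ == "__main__":
    layer = GatedResidualDelta(dim=8)
    h = np.ones((2, 8))
    print(np.array_equal(layer(h), layer(h)))  # inference: deterministic
    print(np.array_equal(layer(h, training=True), layer(h, training=True)))  # training: noisy
```

Because the gate lies in (0, 1), the module can learn to pass the hidden state through almost unchanged, which is one reason gated residual deltas are a popular way to bolt extra computation onto a transformer layer.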

---

## Recursive Language Model Core

Unlike a standard transformer, which passes each token through each layer exactly once, **Sentience.Cascade.II** applies a **RecursiveSeedLayer** after all transformer blocks. This layer runs `num_recursive_steps=4` passes of attention + FFN through a shared-weight inner loop, allowing the model to internally "think again" before producing logits.

This is the defining feature of the RLM architecture:

> *Output is not produced after one pass; it is refined recursively.*
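
The shared-weight loop can be illustrated with a toy NumPy sketch. Here a single residual update, standing in for the attention + FFN pair, is applied `num_recursive_steps = 4` times with the same parameters. The function names, the `tanh` update rule, and the toy dimensions are illustrative assumptions, not the model's actual `RecursiveSeedLayer` code.

```python
import numpy as np


def make_shared_block(dim: int, seed: int = 0):
    """Create one weight matrix reused by every recursive step (toy stand-in for attention + FFN)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def block(h: np.ndarray) -> np.ndarray:
        # Residual update h + f(h), using the same `w` on every call.
        return h + np.tanh(h @ w)

    return block, w


def recursive_refine(h: np.ndarray, block, num_recursive_steps: int = 4) -> list:
    """Apply the same block repeatedly and return the hidden state after each step."""
    states = []
    for _ in range(num_recursive_steps):
        h = block(h)
        states.append(h)
    return states


if __name__ == "__main__":
    block, w = make_shared_block(dim=8)
    h0 = np.ones((3, 8))  # 3 tokens with an 8-dim hidden state
    states = recursive_refine(h0, block)
    print(len(states), states[-1].shape, w.size)
```

Because every step reuses one set of weights, adding recursive steps increases compute per forward pass but not the parameter count.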

---

## Memory System

- **Short-Term Memory (512 slots):** Updated every forward pass via a write gate and cross-attended by every layer, giving the model persistent intra-context state.
- **Long-Term Memory (1024 slots):** Consolidated from short-term memory every 8 layers via a separate consolidation gate with a 0.99/0.01 EMA blend. Persists across training steps when fine-tuning.
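
The memory schedule can be sketched as follows. This is a hedged illustration: it assumes the 0.99/0.01 blend means 0.99 weight on the existing long-term value and 0.01 on the incoming short-term summary, counts layers from 1, models the rolling buffer with a plain `deque` (no write gate), and uses scalars in place of slot vectors. The helper names are hypothetical.

```python
from collections import deque

STM_SLOTS = 512                 # short-term memory capacity (from the card)
CONSOLIDATE_EVERY = 8           # layers between STM -> LTM consolidations (from the card)
EMA_KEEP, EMA_NEW = 0.99, 0.01  # assumed split of the 0.99/0.01 EMA blend


def should_consolidate(layer_idx: int) -> bool:
    """True after every 8th layer, counting layers from 1 (assumed convention)."""
    return layer_idx % CONSOLIDATE_EVERY == 0


def consolidate(ltm_value: float, stm_summary: float) -> float:
    """EMA blend: keep most of the long-term value, nudge it toward the short-term summary."""
    return EMA_KEEP * ltm_value + EMA_NEW * stm_summary


if __name__ == "__main__":
    stm = deque(maxlen=STM_SLOTS)  # rolling buffer: the oldest entries are evicted first
    stm.extend(float(step) for step in range(600))
    print(len(stm), stm[0])
    print([layer for layer in range(1, 25) if should_consolidate(layer)])
    print(consolidate(1.0, 0.0))
```

Under this reading, each consolidation moves a long-term slot only 1% of the way toward new content, which is what keeps long-term memory slow-changing relative to the short-term buffer.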

---

## Multimodal Support

Three input projection heads accept external embeddings:

| Modality | Input dim | Projection |
|---|---|---|
| Image | 1024 (ViT-L patch) | Linear → 2048 |
| Audio | 128 (mel-spectrogram) | Linear → 2048 |
| Video | 1024 (frame embedding) | Linear → 2048 |

These are prefix embeddings: project the modality features, then concatenate the resulting tokens in front of the embedded `input_ids`.
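
A minimal NumPy sketch of the prefix construction, assuming plain `x @ W + b` projections to the 2048-dim hidden size in the table. The projection weights are random placeholders, the text embeddings are random stand-ins for embedded `input_ids`, and none of the names correspond to the model's real attributes.

```python
import numpy as np

HIDDEN = 2048  # projection output size (from the table)
MODALITY_DIMS = {"image": 1024, "audio": 128, "video": 1024}  # input dims (from the table)

rng = np.random.default_rng(0)
# Hypothetical stand-in projection weights; the real model's parameters will differ.
PROJ = {
    name: (rng.standard_normal((dim, HIDDEN)) * 0.02, np.zeros(HIDDEN))
    for name, dim in MODALITY_DIMS.items()
}


def project(name: str, feats: np.ndarray) -> np.ndarray:
    """Linearly project (n, input_dim) modality features to (n, HIDDEN)."""
    w, b = PROJ[name]
    return feats @ w + b


def build_inputs_embeds(prefixes: dict, text_embeds: np.ndarray) -> np.ndarray:
    """Concatenate projected modality tokens in front of the text token embeddings."""
    parts = [project(name, feats) for name, feats in prefixes.items()]
    return np.concatenate(parts + [text_embeds], axis=0)


if __name__ == "__main__":
    image_patches = rng.standard_normal((256, 1024))  # e.g. 256 ViT-L patch embeddings
    audio_frames = rng.standard_normal((100, 128))    # e.g. 100 mel-spectrogram frames
    text_embeds = rng.standard_normal((12, HIDDEN))   # embeddings of 12 input_ids
    embeds = build_inputs_embeds({"image": image_patches, "audio": audio_frames}, text_embeds)
    print(embeds.shape)
```

With batched inputs of shape `(batch, seq, hidden)` the concatenation would run along `axis=1`, and the attention mask has to be extended to cover the prefix positions.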

---

## Chat Template

```
<|system|>You are Sentience.Cascade.II, a recursive reasoning model.
<|user|>What is consciousness?
<|assistant|>
```
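
The template can also be assembled programmatically. The sketch below assumes each turn is a role tag immediately followed by its text and terminated by a newline, with a trailing `<|assistant|>` tag as the generation prompt, exactly as in the example above. The card shows no end-of-turn token, so if the tokenizer ships its own chat template, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is the safer route; `format_chat` is a hypothetical helper.

```python
def format_chat(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render {"role", "content"} messages in the <|role|>content layout shown above."""
    allowed = {"system", "user", "assistant"}
    lines = []
    for msg in messages:
        if msg["role"] not in allowed:
            raise ValueError(f"unsupported role: {msg['role']!r}")
        lines.append(f"<|{msg['role']}|>{msg['content']}")
    if add_generation_prompt:
        lines.append("<|assistant|>")  # the model continues generating from here
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = format_chat([
        {"role": "system", "content": "You are Sentience.Cascade.II, a recursive reasoning model."},
        {"role": "user", "content": "What is consciousness?"},
    ])
    print(prompt)
```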

---

## Fine-Tuning

This release is the **base initialisation** checkpoint: the weights are randomly initialised and the tokenizer is bootstrapped. Fine-tune it on your domain corpus with standard causal-LM training before use.

Recommended fine-tuning configuration:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./sc2-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # effective batch = 1 x 16 x number of GPUs
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,               # linear warmup over the first 3% of optimizer steps
    bf16=True,                       # T4 (Turing) GPUs lack native bf16; use fp16=True there instead
    gradient_checkpointing=True,     # recompute activations to save memory
    save_strategy="steps",
    save_steps=500,
    logging_steps=10,
    report_to="none",                # disable logging integrations (W&B, TensorBoard, ...)
)
```
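
As a quick sanity check on this configuration, the effective batch size is `per_device_train_batch_size × gradient_accumulation_steps × number of GPUs`, and `warmup_ratio` turns into a step count derived from the total number of optimizer steps. The sketch below assumes two GPUs (matching the dual-T4 setup in the header) and a hypothetical corpus of 10,000 training sequences, and rounds up with a ceiling. The Trainer's exact step count also depends on dataloader length and distributed sharding, so treat these numbers as estimates.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Number of sequences contributing to each optimizer step."""
    return per_device * grad_accum * num_gpus


def schedule(num_samples: int, epochs: int, eff_batch: int, warmup_ratio: float) -> tuple[int, int]:
    """Approximate total optimizer steps and warmup steps (ceil rounding assumed)."""
    steps_per_epoch = math.ceil(num_samples / eff_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_steps, warmup_steps


if __name__ == "__main__":
    eff = effective_batch_size(per_device=1, grad_accum=16, num_gpus=2)  # 2 GPUs is an assumption
    total, warmup = schedule(num_samples=10_000, epochs=3, eff_batch=eff, warmup_ratio=0.03)
    print(eff, total, warmup)
```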

> **Note:** Because `SentienceCascadeModel` is a custom architecture, you will need to either register it with the Transformers `AutoConfig` / `AutoModelForCausalLM` registries, or place the model code in the repo and load it with `trust_remote_code=True`.

---

## Citation

```bibtex
@misc{sentiencecascade2,
  author       = {GODsStrongestSoldier},
  title        = {Sentience.Cascade.II: A Recursive Language Model with Hybrid Mind Frame},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/GODsStrongestSoldier/Sentience.Cascade.II}},
}
```

---

## License

Apache 2.0