The Intelligence Layer:
A Mind for Your Machine.
A source-available agentic system that turns your machine from an inanimate silo into something conversational, proactive, and autonomous.
Model-independent by design — it dynamically loads every model tied to your API keys, runs entirely offline via your preferred model library, or takes a hybrid approach with both.
It learns your patterns and remembers your context, adapting into an agent uniquely tuned to each user.
One foundational substrate, infinite potential outcomes.
You are what makes your agent truly unique.
$ substrate
Substrate v1.2.0 — Agent ready.
you: Find all PDFs on my desktop, summarize each, and save a report to Obsidian.
agent: On it. Found 4 PDFs. Reading and summarizing...
▸ exec Get-ChildItem ~\Desktop -Filter *.pdf
▸ read_file quarterly_report.pdf
▸ obsidian create_note "PDF Summaries"
agent: Done. Summaries saved to your Obsidian vault.
$ |
Capabilities
Persistent memory, autonomous scheduling, and full OS control — model-independent architecture that adapts to each user.
Shell commands, file operations, process management, mouse/keyboard control, and native Windows UI automation.
Full Chrome DevTools Protocol control. Navigate, click, type, submit forms, execute JavaScript, and capture screenshots.
Local TTS via Kokoro-82M or cloud via ElevenLabs. Speech recognition input. The agent speaks every response aloud.
Dynamically loads all available models from your API keys, runs entirely offline via your preferred model library, or takes a hybrid approach with both. Any OpenAI-compatible endpoint works. Hot-swap mid-conversation.
Anthropic, OpenAI, Google, DeepSeek, xAI
Your preferred local models via any OpenAI-compatible server
Change models mid-conversation without restarting
Detects all available models from your API keys
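Model independence rests on the OpenAI-compatible chat completions convention: the same request shape works against a cloud provider or a local server, so swapping models is just swapping a base URL and a model name. A minimal sketch of the idea (the endpoint URLs and model names below are illustrative, not Substrate's actual configuration):

```python
import json

def build_chat_request(base_url: str, model: str, messages: list) -> dict:
    """Build a request for any OpenAI-compatible /chat/completions endpoint."""
    return {
        "url": f"{base_url.rstrip('/')}/chat/completions",
        "body": json.dumps({"model": model, "messages": messages}),
    }

# The same helper targets a cloud provider or a local server unchanged:
cloud = build_chat_request("https://api.openai.com/v1", "gpt-4o-mini",
                           [{"role": "user", "content": "hi"}])
local = build_chat_request("http://localhost:11434/v1", "llama3",
                           [{"role": "user", "content": "hi"}])
```

Because only the base URL and model name differ, hot-swapping mid-conversation reduces to re-pointing subsequent requests.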
Generate images via Midjourney, NanoBanana, Google Imagen, and more. Results render inline in the chat with click-to-zoom and download.
Hook-based plugin architecture. Connect external MCP tool servers — the agent discovers and calls their tools automatically.
File-driven task scheduling via CIRCUITS.md. Recurring tasks, startup routines, and a system tray daemon.
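File-driven scheduling means a task is just text: a cadence paired with a prompt. The actual CIRCUITS.md format isn't documented here, but a purely hypothetical entry could look something like:

```markdown
## Morning briefing
- schedule: daily at 08:00
- prompt: Summarize unread email and today's calendar, then read it aloud.

## Disk hygiene
- schedule: every 6 hours
- prompt: Check free disk space and flag anything running low.
```

Because tasks live in a plain file, adding, editing, or versioning a routine requires no UI at all.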
Every channel independently configurable — toggle, set intervals, customize prompts
The agent watches your screen, sees through your camera, monitors scheduled tasks, and builds context about your workflow over time. Each awareness channel runs independently with its own cadence — you decide what it sees and how often.
Periodic screenshots to build workflow context
2–10 min · Default off
See through your phone's camera via mobile UI
30s–2 min · 50% silent
Proactive check-ins, suggestions, observations
1–5 min · Custom prompt
Polls for texts, replies conversationally
5s poll · Gmail OAuth2
Creates Obsidian notes from key conversation points
10–30 min · Default off
Generates images inspired by conversation context
5–15 min · Custom prompt
Unified SQLite with FTS5 full-text search and vector embeddings. Hybrid keyword + semantic retrieval across sessions.
FTS5 · Full-text search
Vector · Semantic embeddings
Hybrid · BM25 + cosine
A living, animated avatar with breathing, talking, bounce, wiggle, and squish reactions. Fully customizable personality.
SUBSTRATE.md · Core identity
PRIME.md · Startup behavior
CIRCUITS.md · Recurring tasks
Electron desktop app with animated avatar, plus a PWA-capable WebUI for any phone, tablet, or browser on your network.
Access your agent from any device via ZeroTier. Secure private overlay network — no public internet exposure.
Gmail API with OAuth2 for email. Google Voice for SMS — reads, replies conversationally, shows both sides in chat.
Control MIDI instruments, bridge to Raspberry Pi devices, and drive robotic embodiments. The agent writes and runs scripts autonomously to interact with physical hardware — an emergent property of shell access, networking, and the skill system.
Synthesizers, drum machines, DAW control
SSH bridge to any networked device
Servos, sensors, companion robots
Agent teaches itself new hardware skills
The Interface
Just your avatar, a text field, and a transparent canvas. Designed to fit into whatever workflow you have without being intrusive or distracting.
Right-click the avatar to open the radial menu — settings, prompts, profiles, models, and autonomy controls all live here.
Upload any image as the agent face. It animates with idle breathing, talking lips, and reactive expressions like happy, angry, or searching.
Type or speak. The agent responds in text and can read every reply aloud with local or cloud TTS voices.
Architecture
A hybrid Electron + Python architecture with bidirectional IPC, a Flask API layer, and pluggable LLM backends.
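A common way to wire bidirectional IPC between an Electron shell and a Python backend is newline-delimited JSON messages, each dispatched to a handler by type. The message names and fields below are illustrative, not Substrate's real protocol:

```python
import json

def handle_message(raw: str) -> str:
    """Dispatch one JSON-line IPC message from the Electron shell.
    Message types and fields here are hypothetical examples."""
    msg = json.loads(raw)
    handlers = {
        "ping": lambda m: {"ok": True},
        "chat": lambda m: {"ok": True, "reply": f"echo: {m.get('text', '')}"},
    }
    handler = handlers.get(msg.get("type"))
    result = handler(msg) if handler else {"ok": False, "error": "unknown type"}
    # Echo the request id so the shell can correlate responses.
    return json.dumps({"id": msg.get("id"), **result})

handle_message('{"id": 1, "type": "chat", "text": "hello"}')
```

The same dispatch table pattern extends naturally to the Flask API layer, where each route body can reuse the handlers.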
Tool Ecosystem
Every tool the agent needs to control your desktop, automate workflows, and interact with the world — plus MCP support for adding your own.
Core tools (highlighted) are always loaded. On-demand tools load automatically when relevant keywords are detected.
Tools load on-demand based on conversation context — no wasted tokens.
The agent doesn't just use tools — it creates new ones.
When the agent encounters a complex multi-step workflow, it can autonomously write scripts, save them as reusable skills, and invoke them in future tasks. Your toolset grows organically from real usage — no manual configuration needed. A real-world example: the agent taught itself to perform generative music on a connected MIDI synthesizer, composing and playing jazz, ambient, and chill progressions in real time — a capability that was never explicitly programmed.
Agent encounters a complex task and writes a multi-step script or automation to solve it.
Saves the solution as an emergent skill in workspace/emergent/ with trigger words and documentation.
After user confirmation, the skill is promoted to the permanent skills/ directory — available forever.
Press F9 to record your UI actions (clicks, keystrokes, navigation). The recording is saved and can be turned into a reusable skill the agent can replay.
Each skill is a Markdown file with name, description, triggers, and step-by-step instructions. Easy to read, edit, and share.
Skills are scanned at prompt build time and matched to user requests via trigger keywords. The agent checks skills before improvising.
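Trigger matching of this kind is simple: extract the skill's trigger keywords from its Markdown file and check them against the incoming request. The exact field layout of Substrate's skill files isn't shown here, so both the file and the parser below are illustrative:

```python
import re

# A hypothetical skill file; the documented fields are name, description,
# triggers, and steps, but this exact layout is an assumption.
SKILL_MD = """\
# Name: MIDI Jazz Session
# Description: Improvise a jazz progression on the connected synth.
# Triggers: jazz, synth, play music
1. Open the MIDI output port.
2. Send a ii-V-I progression at 90 BPM.
"""

def parse_triggers(markdown: str) -> list:
    m = re.search(r"^# Triggers:\s*(.+)$", markdown, re.MULTILINE)
    return [t.strip().lower() for t in m.group(1).split(",")] if m else []

def skill_matches(markdown: str, request: str) -> bool:
    """Check a user request against a skill's trigger keywords."""
    request = request.lower()
    return any(trigger in request for trigger in parse_triggers(markdown))

skill_matches(SKILL_MD, "Can you play some jazz for me?")  # matches on "jazz"
```

Running a check like this over every skill file at prompt build time is cheap, which is why the agent can consult its skill library before improvising a solution from scratch.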
Download
One-click installer for Windows. Python dependencies are installed automatically on first launch.
macOS & Linux builds coming soon. In the meantime, use the developer setup.
Developer Setup
For contributors and developers who want to modify, extend, or build Substrate from the repository.
Substrate is free for personal use, source-available, and runs entirely on your machine. Your data stays local. Your agent stays yours.