Architecture
How the packages fit together, the security model, and key design decisions.
Package Dependency Graph
@walkie-talkie/shared ← Protocol types & constants
│
├──▶ @walkie-talkie/server ← Express + WS + node-pty
│ │
│ └──▶ @walkie-talkie/cli ← CLI launcher (bundles server)
│ └──▶ @walkie-talkie/service ← Electron tray app
│
├──▶ @walkie-talkie/client ← Framework-agnostic WS client
│ │
│ └──▶ @walkie-talkie/react ← React hooks + TerminalView
│ │
│ └──▶ @walkie-talkie/web ← Next.js web client
│
└──▶ @walkie-talkie/www ← Landing page (independent)
The shared package is the root: it defines the contract between server and client. Everything else builds on top.
Layered Architecture
Protocol Layer (shared)
TypeScript interfaces define every message type. Both server and client import these types, so the protocol is enforced at compile time. No runtime validation is needed between trusted packages.
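To illustrate how compile-time enforcement can work, here is a minimal sketch of a discriminated-union protocol. The message names (auth, auth:ok, auth:fail, auth:resume, terminal:input, terminal:output) appear on this page; the fields terminalId, data, and reason, and the helper names encode and describeMessage, are assumptions for illustration and may not match the real types.

```typescript
// Hypothetical subset of the protocol; only the message names come from the docs.
type ClientMessage =
  | { type: "auth"; token: string }
  | { type: "auth:resume"; sessionId: string }
  | { type: "terminal:input"; terminalId: string; data: string }; // fields assumed

type ServerMessage =
  | { type: "auth:ok"; sessionId: string }
  | { type: "auth:fail"; reason: string } // e.g. "invalid_token"
  | { type: "terminal:output"; terminalId: string; data: string }; // fields assumed

// Only well-typed client messages can be serialized.
function encode(msg: ClientMessage): string {
  return JSON.stringify(msg);
}

// Exhaustive switch: adding a ServerMessage variant without handling it here
// becomes a compile error because of the `never` assignment.
function describeMessage(msg: ServerMessage): string {
  switch (msg.type) {
    case "auth:ok":
      return `authenticated as ${msg.sessionId}`;
    case "auth:fail":
      return `rejected: ${msg.reason}`;
    case "terminal:output":
      return `${msg.data.length} chars for ${msg.terminalId}`;
    default: {
      const unreachable: never = msg;
      return unreachable;
    }
  }
}
```

Because both sides would import the same union, a misspelled type string or a missing field fails the build instead of surfacing at runtime.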
The type definitions live in shared/src/protocol.ts.
Server Layer
The server has three concerns:
- Terminal management: TerminalSession wraps node-pty, maintains a scrollback buffer (100KB max), and emits data/exit events.
- Authentication: TokenManager handles token generation, consumption, and session persistence.
- WebSocket bridge: routes messages between authenticated clients and their terminals. Each session is isolated.
Client Layer
The client is deliberately framework-agnostic. WalkieTalkieClient is a plain TypeScript class with no React, Vue, or Angular dependencies. It handles:
- WebSocket lifecycle (connect, auth, reconnect)
- State machine (disconnected → connecting → authenticating → connected)
- Auto-reconnect with exponential backoff
- Session resume
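The state machine and backoff schedule listed above can be sketched as follows. The four states and the 1s, 2s, 4s, 8s, 30s-max delays come from this page; the transitions back to disconnected and the names canTransition and backoffDelayMs are assumptions, not the client's actual API.

```typescript
type ConnectionState = "disconnected" | "connecting" | "authenticating" | "connected";

// Allowed transitions. The forward path comes from the docs; the drops back to
// "disconnected" are an assumption about how failures are modelled.
const transitions: Record<ConnectionState, ConnectionState[]> = {
  disconnected: ["connecting"],
  connecting: ["authenticating", "disconnected"],
  authenticating: ["connected", "disconnected"],
  connected: ["disconnected"],
};

function canTransition(from: ConnectionState, to: ConnectionState): boolean {
  return transitions[from].includes(to);
}

// Exponential backoff: the delay doubles per attempt and is capped at maxMs.
// `attempt` is 1-based, matching "reconnect attempt 1" in the docs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}
```

With the defaults, attempts 1 through 4 wait 1000, 2000, 4000, and 8000 ms, and any attempt from the sixth onward waits the 30000 ms cap.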
React Layer
A thin wrapper over the client. useWalkieTalkie manages reactive state (terminal list, connection state) and output buffering. TerminalView is an xterm.js component with auto-fit and resize detection.
Security Model
Token Lifecycle
Server generates token
│
│ Token: a1b2-c3d4-e5f6-7890
│ TTL: 5 minutes
│ Usage: single-use
│ Storage: in-memory only
│
▼
Client receives token (QR code, URL, or CLI output)
│
▼
Client sends: { type: "auth", token: "a1b2-..." }
│
▼
Server validates:
├── Token exists? ─── No → auth:fail (invalid_token)
├── Already used? ── Yes → auth:fail (invalid_token)
├── Expired? ── Yes → auth:fail (invalid_token)
└── Valid ─────────────── → Consume token
│
▼
Generate sessionId (UUID)
Persist to ~/.walkie-talkie/sessions.json
Send: { type: "auth:ok", sessionId: "..." }
Session Persistence
Sessions survive server restarts with a 24-hour expiry:
- Stored at ~/.walkie-talkie/sessions.json
- Sessions older than 24 hours are pruned on server start
- Tokens are never persisted — only session IDs
- A session can have multiple terminals
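The 24-hour pruning rule could look roughly like this. The record shape (sessionId, createdAt) and the function names are assumptions, since the format of sessions.json is not described here; file reading and writing are left out so the logic stays easy to test.

```typescript
// Assumed record shape; the real sessions.json format is not documented here.
interface StoredSession {
  sessionId: string;
  createdAt: number; // epoch milliseconds
}

const SESSION_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

// Keep only sessions that are at most 24 hours old at time `now`.
function pruneExpired(sessions: StoredSession[], now: number = Date.now()): StoredSession[] {
  return sessions.filter((s) => now - s.createdAt <= SESSION_TTL_MS);
}

// Serialize for writing to disk. Only session records are included;
// tokens never appear in this structure.
function serializeSessions(sessions: StoredSession[]): string {
  return JSON.stringify(sessions, null, 2);
}
```

On startup, a server following this sketch would parse the file, pass the result through pruneExpired, and write the survivors back.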
Session Isolation
Each session has its own set of terminals. Session A cannot see or interact with Session B's terminals, even on the same server.
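One way to get this isolation is to key every terminal lookup by session ID first. The sketch below is a hypothetical registry (TerminalRegistry and its methods are illustrative names, not the server's real classes).

```typescript
// Hypothetical registry: sessionId -> (terminalId -> terminal handle).
class TerminalRegistry<T> {
  private bySession = new Map<string, Map<string, T>>();

  add(sessionId: string, terminalId: string, terminal: T): void {
    let terminals = this.bySession.get(sessionId);
    if (!terminals) {
      terminals = new Map<string, T>();
      this.bySession.set(sessionId, terminals);
    }
    terminals.set(terminalId, terminal);
  }

  // Lookups are always scoped by the caller's session, so a terminal ID that
  // belongs to another session resolves to undefined.
  get(sessionId: string, terminalId: string): T | undefined {
    return this.bySession.get(sessionId)?.get(terminalId);
  }

  // Terminal IDs visible to one session.
  list(sessionId: string): string[] {
    return Array.from(this.bySession.get(sessionId)?.keys() ?? []);
  }
}
```

Since the session ID comes from the authenticated connection rather than from the message body, a client has no way to name another session's terminals.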
Data Flow
┌──────────┐ WebSocket ┌──────────┐ node-pty ┌──────────┐
│ │ ──────────────▶ │ │ ──────────────▶ │ │
│ Browser │ JSON messages │ Server │ raw bytes │ PTY │
│ Client │ ◀────────────── │ │ ◀────────────── │ (shell) │
│ │ │ │ │ │
└──────────┘ └──────────┘ └──────────┘
xterm.js Express + /bin/bash
renders ws library or similar
terminal routes msgs
Terminal I/O Path
- User types in xterm.js → onData callback fires
- Client sends terminal:input over WebSocket
- Server writes to PTY via pty.write(data)
- PTY processes input, produces output
- Server receives PTY output via pty.onData
- Server sends terminal:output over WebSocket
- Client writes to xterm.js via terminal.write(data)
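The server-side half of this path can be sketched with small stand-in interfaces so it runs without node-pty or a real socket. PtyLike, SocketLike, bridge, and createEchoPty are hypothetical names, and the terminalId and data fields are assumed; only write, onData, and the two message types come from the steps above.

```typescript
// Minimal stand-ins so the sketch runs without node-pty or a real socket.
interface PtyLike {
  write(data: string): void;
  onData(listener: (data: string) => void): void;
}
interface SocketLike {
  send(json: string): void;
}

// Bridge one terminal to one socket. Returns the handler for incoming frames.
function bridge(pty: PtyLike, socket: SocketLike, terminalId: string): (raw: string) => void {
  // PTY output becomes a terminal:output message.
  pty.onData((data) => {
    socket.send(JSON.stringify({ type: "terminal:output", terminalId, data }));
  });
  // A terminal:input message for this terminal is written to the PTY.
  // Assumes well-formed JSON; a real server would guard JSON.parse.
  return (raw) => {
    const msg = JSON.parse(raw) as { type?: string; terminalId?: string; data?: string };
    if (msg.type === "terminal:input" && msg.terminalId === terminalId && typeof msg.data === "string") {
      pty.write(msg.data);
    }
  };
}

// Fake PTY that echoes input back in upper case, standing in for a shell.
function createEchoPty(): PtyLike {
  const listeners: Array<(data: string) => void> = [];
  return {
    write: (data) => listeners.forEach((listener) => listener(data.toUpperCase())),
    onData: (listener) => {
      listeners.push(listener);
    },
  };
}
```

Feeding the returned handler a terminal:input frame for the bridged terminal makes the fake PTY emit output, which arrives at the socket as a terminal:output frame; frames for other terminal IDs are ignored.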
Scrollback
The server maintains a 100KB scrollback buffer per terminal. On session resume, this buffer is replayed so the client sees the terminal exactly as it was. The client-side React hook also buffers 100KB per terminal for component re-mounts.
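A capped buffer of this kind might be implemented as below. This is a sketch, not the actual TerminalSession code: the class name is made up, and for simplicity it counts UTF-16 code units rather than bytes, which is an assumption about how the 100KB limit is measured.

```typescript
// Keeps only the most recent output, up to a fixed size.
class ScrollbackBuffer {
  private buffer = "";
  private maxChars: number;

  // Default of 100 * 1024 approximates the documented 100KB limit.
  constructor(maxChars: number = 100 * 1024) {
    this.maxChars = maxChars;
  }

  append(data: string): void {
    this.buffer += data;
    if (this.buffer.length > this.maxChars) {
      // Drop the oldest output, keeping the last maxChars code units.
      this.buffer = this.buffer.slice(this.buffer.length - this.maxChars);
    }
  }

  // Contents to replay to a client on session resume.
  snapshot(): string {
    return this.buffer;
  }
}
```

Trimming at an arbitrary offset can cut through an escape sequence, so the first replayed line may render imperfectly; trimming at a newline boundary would be one refinement.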
Heartbeat / Keepalive
The server sends WebSocket pings every 30 seconds. If a client doesn't respond with a pong before the next ping, the connection is terminated. This detects dead connections from network drops or closed tabs.
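The ping/pong rule can be expressed independently of any WebSocket library. In this sketch, Heartbeat and HeartbeatTarget are hypothetical names; a real server would call tick from a 30-second timer and onPong from the socket's pong event.

```typescript
// The two operations the heartbeat needs from a connection.
interface HeartbeatTarget {
  ping(): void;
  terminate(): void;
}

class Heartbeat {
  // true when a pong has arrived since the last ping was sent
  private alive = new Map<HeartbeatTarget, boolean>();

  track(conn: HeartbeatTarget): void {
    this.alive.set(conn, true);
  }

  onPong(conn: HeartbeatTarget): void {
    if (this.alive.has(conn)) this.alive.set(conn, true);
  }

  // Call once per interval (30 seconds in the docs).
  tick(): void {
    this.alive.forEach((isAlive, conn) => {
      if (!isAlive) {
        // No pong since the previous ping: treat the connection as dead.
        this.alive.delete(conn);
        conn.terminate();
        return;
      }
      this.alive.set(conn, false);
      conn.ping();
    });
  }
}
```

A connection that answers every ping is never terminated, while one that goes silent is dropped on the tick after its unanswered ping, which is at most two intervals after it stopped responding.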
Reconnection Strategy
Disconnect detected
│
▼
Wait 1s → reconnect attempt 1
│ fail
▼
Wait 2s → reconnect attempt 2
│ fail
▼
Wait 4s → reconnect attempt 3
│ fail
▼
Wait 8s → reconnect attempt 4
...
▼
Wait 30s (max) → reconnect attempt N
│ success
▼
Send auth:resume { sessionId }
│
▼
Server replays terminal list + scrollback
Monorepo Structure
walkie-talkie/
├── shared/ @walkie-talkie/shared — Protocol types
├── server/ @walkie-talkie/server — Core server
├── client/ @walkie-talkie/client — WS client
├── react/ @walkie-talkie/react — React hooks + components
├── cli/ @walkie-talkie/cli — CLI launcher
├── service/ @walkie-talkie/service — Electron tray app
├── web/ @walkie-talkie/web — Next.js web client (5 views)
├── www/ @walkie-talkie/www — Landing page + docs
├── pnpm-workspace.yaml
└── package.json
All packages use workspace:* for internal dependencies, are written in TypeScript, and compile to dist/ with type definitions.
Build Order
# Build order (dependencies first)
pnpm --filter @walkie-talkie/shared build # Types first
pnpm --filter @walkie-talkie/server build # Server (uses shared)
pnpm --filter @walkie-talkie/client build # Client (uses shared)
pnpm --filter @walkie-talkie/react build # React (uses client + shared)
# These can run in parallel after the above:
pnpm --filter @walkie-talkie/cli build # Bundles server with esbuild
pnpm --filter @walkie-talkie/web build # Next.js build
pnpm --filter @walkie-talkie/www build # Landing page build
Key Design Decisions
Why node-pty?
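Real pseudo-terminals support the features users expect from a shell: prompts, tab completion, colors, and curses apps (vim, htop, etc.). Alternatives like child_process.exec run commands over plain pipes without a TTY, so interactive programs do not behave correctly through them.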
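The difference is easy to observe from Node itself. This small demonstration, which is not part of the project, starts a child process the way child_process does by default (over pipes) and asks it whether its stdout is a terminal.

```typescript
import { execFileSync } from "node:child_process";

// Run a child Node process with piped stdio, the child_process default,
// and have it report whether its stdout is attached to a terminal.
const output = execFileSync(
  process.execPath,
  ["-e", "console.log(Boolean(process.stdout.isTTY))"],
  { encoding: "utf8" },
);

const childSawTty = output.trim() === "true";
console.log(`child stdout is a TTY: ${childSawTty}`); // prints "child stdout is a TTY: false"
```

Many programs check for a TTY before enabling colors, prompts, or full-screen modes, which is why a pseudo-terminal is needed to run them faithfully.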
Why framework-agnostic client?
By keeping the core client as a plain TypeScript class, it works in React, Vue, Svelte, Node.js scripts, Electron, or even a raw browser console. The React package is just a thin wrapper — other framework wrappers could be built trivially.
Why single-use tokens?
If a token is leaked (someone screenshots the QR code), it can only be used once. After that, the attacker needs a new token. Sessions are identified by UUID, which is never displayed in a QR code.
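A single-use, time-limited token store can be sketched in a few lines. OneTimeTokens and its methods are hypothetical and are not the project's TokenManager; the token format simply imitates the a1b2-c3d4-e5f6-7890 example, and the 5-minute TTL and invalid_token reason come from the Token Lifecycle section.

```typescript
import { randomBytes, randomUUID } from "node:crypto";

type ConsumeResult =
  | { ok: true; sessionId: string }
  | { ok: false; reason: "invalid_token" };

class OneTimeTokens {
  // token -> expiry time (epoch ms). Held in memory only.
  private tokens = new Map<string, number>();
  private ttlMs: number;

  constructor(ttlMs: number = 5 * 60 * 1000) {
    this.ttlMs = ttlMs;
  }

  generate(now: number = Date.now()): string {
    // 16 hex characters grouped as xxxx-xxxx-xxxx-xxxx (format assumed).
    const hex = randomBytes(8).toString("hex");
    const token = (hex.match(/.{4}/g) ?? []).join("-");
    this.tokens.set(token, now + this.ttlMs);
    return token;
  }

  consume(token: string, now: number = Date.now()): ConsumeResult {
    const expiresAt = this.tokens.get(token);
    // Deleting on every lookup makes the token single-use. Unknown, used, and
    // expired tokens all produce the same failure reason.
    this.tokens.delete(token);
    if (expiresAt === undefined || now > expiresAt) {
      return { ok: false, reason: "invalid_token" };
    }
    return { ok: true, sessionId: randomUUID() };
  }
}
```

Returning the same reason for every failure mode avoids telling an attacker whether a guessed token ever existed.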
Why JSON over WebSocket?
Simplicity. The protocol has ~13 message types. JSON is human-readable, easy to debug, and any language can parse it. The overhead is negligible compared to terminal output bandwidth.
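As an illustration of how little code JSON framing needs, the sketch below decodes an incoming frame and checks only that it carries a string type field. parseMessage is a hypothetical helper, not part of the published packages, and such a check would apply at the network boundary rather than between the trusted packages.

```typescript
type LooseMessage = { type: string; [key: string]: unknown };

// Parse a frame and return it only if it looks like a protocol message.
function parseMessage(raw: string): LooseMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof value !== "object" || value === null) return null;
  const candidate = value as { type?: unknown };
  if (typeof candidate.type !== "string") return null;
  return value as LooseMessage;
}
```

Because each frame is plain text, the same messages can be read directly in browser devtools or logged without a decoder.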