Architecture

Insight is built as a Tauri 2.0 desktop application with a Rust backend and Svelte frontend.

Core Stack

| Component | Library | Purpose |
|---|---|---|
| App framework | Tauri 2.0 | Desktop app (Rust backend, web frontend) |
| UI | Svelte 5 | Frontend |
| Styling | Tailwind 4 | Utility-first CSS |
| LLM inference | mistralrs | Local model loading and inference (GGUF format) |
| Model download | hf-hub | Fetch models from HuggingFace |
| P2P / Sync | iroh | Connections, NAT traversal, sync |
| Content storage | iroh-blobs | Content-addressed file storage |
| Metadata sync | iroh-docs | CRDT key-value store for metadata |
| Real-time | iroh-gossip | Pub/sub for live updates |
| Search | milli | Full-text + vector search (used by agent) |
| PDF processing | lopdf | Text extraction |

Agent Architecture

User Query
    ↓
Local LLM (via mistralrs)
    ↓
Tool Calling Loop
    ↓
Synthesized Answer (with citations)

The agent has tools for searching and reading documents. It iteratively gathers evidence to answer questions, citing sources along the way. There is no direct user-facing search—all document retrieval happens through the agent.
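
The loop described above can be sketched roughly as follows. This is a minimal illustration, not Insight's implementation: `ModelStep`, `Evidence`, `Tool`, and `run_agent` are hypothetical names, the model is passed in as a plain closure rather than a mistralrs call, and the step budget is an assumed safeguard against endless tool calls.

```rust
use std::collections::HashMap;

/// What the model asks for on each turn (hypothetical shape).
enum ModelStep {
    /// Call a tool by name with a single string argument.
    CallTool { name: String, arg: String },
    /// Stop and return the final answer.
    Answer(String),
}

/// A piece of evidence returned by a tool, kept so it can be cited.
#[derive(Debug, Clone, PartialEq)]
struct Evidence {
    source: String,
    text: String,
}

/// A tool takes one string argument and returns evidence (assumed signature).
type Tool = fn(&str) -> Vec<Evidence>;

/// Unique, sorted list of sources seen so far.
fn cited_sources(evidence: &[Evidence]) -> Vec<String> {
    let mut sources: Vec<String> = evidence.iter().map(|e| e.source.clone()).collect();
    sources.sort();
    sources.dedup();
    sources
}

/// Runs the loop until the model answers or `max_steps` is exhausted.
/// Returns the answer text and the sources it can cite.
fn run_agent(
    query: &str,
    model: &mut dyn FnMut(&str, &[Evidence]) -> ModelStep,
    tools: &HashMap<String, Tool>,
    max_steps: usize,
) -> (String, Vec<String>) {
    let mut evidence: Vec<Evidence> = Vec::new();
    for _ in 0..max_steps {
        match model(query, evidence.as_slice()) {
            ModelStep::CallTool { name, arg } => {
                // Unknown tool names are ignored; the next turn sees the same evidence.
                if let Some(tool) = tools.get(name.as_str()) {
                    evidence.extend(tool(arg.as_str()));
                }
            }
            ModelStep::Answer(text) => return (text, cited_sources(&evidence)),
        }
    }
    (
        "No answer within the step budget.".to_string(),
        cited_sources(&evidence),
    )
}
```

Passing the model as a closure keeps the loop testable without loading a model; a real integration would replace it with a call into the inference backend.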

Data Model

Collections as Namespaces

Each collection is an iroh-docs namespace. Sharing a collection means sharing namespace access.

Namespace: 7f3a8b2c... ("Climate Research")
│
├── files/abc123     → blob with metadata JSON
├── files/def456     → blob with metadata JSON
└── _collection      → blob with collection settings
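
As a rough illustration of the key layout in the diagram, the helpers below build and classify namespace keys. The `files/` prefix and `_collection` key come from the diagram; `EntryKey`, `file_key`, and `parse_key` are hypothetical names, and real iroh-docs keys are byte strings, which this sketch simplifies to `&str`.

```rust
/// Kinds of entries in a collection namespace, following the diagram above.
#[derive(Debug, PartialEq)]
enum EntryKey {
    /// `files/<id>`: points at a metadata blob for one document.
    File(String),
    /// `_collection`: points at the collection settings blob.
    CollectionSettings,
    /// Anything this sketch does not recognize.
    Other,
}

const FILES_PREFIX: &str = "files/";
const COLLECTION_KEY: &str = "_collection";

/// Builds the key for a document's metadata entry.
fn file_key(file_id: &str) -> String {
    format!("{}{}", FILES_PREFIX, file_id)
}

/// Classifies a key read from the namespace.
fn parse_key(key: &str) -> EntryKey {
    if key == COLLECTION_KEY {
        EntryKey::CollectionSettings
    } else if let Some(id) = key.strip_prefix(FILES_PREFIX) {
        if id.is_empty() {
            EntryKey::Other
        } else {
            EntryKey::File(id.to_string())
        }
    } else {
        EntryKey::Other
    }
}
```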

Document Metadata

Document metadata is stored as a blob and referenced by an entry in the namespace:

{
	"name": "paper.pdf",
	"pdf_hash": "blake3-hash-of-pdf",
	"text_hash": "blake3-hash-of-extracted-text",
	"tags": ["research", "climate"],
	"created_at": "2024-01-15T10:30:00Z"
}

Content-Addressed Storage

All file content (PDFs, extracted text) is stored in iroh-blobs using content-addressing:

  • Files are identified by their BLAKE3 hash
  • Duplicate files are automatically deduplicated
  • Content can be verified for integrity
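
The three properties above follow from keying storage by a digest of the content. The sketch below shows the idea with an in-memory map. It is an assumption-laden toy: `BlobStore` is a hypothetical type, not the iroh-blobs API, and the standard library's `DefaultHasher` stands in for BLAKE3 only so the example runs without external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in digest. ASSUMPTION: the real store uses BLAKE3; `DefaultHasher`
/// is not cryptographic and must not be used for real integrity checks.
fn digest(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Toy content-addressed store: the key is derived from the bytes.
#[derive(Default)]
struct BlobStore {
    blobs: HashMap<u64, Vec<u8>>,
}

impl BlobStore {
    /// Stores `bytes` and returns its content address.
    /// Storing identical bytes twice keeps a single copy.
    fn put(&mut self, bytes: &[u8]) -> u64 {
        let key = digest(bytes);
        self.blobs.entry(key).or_insert_with(|| bytes.to_vec());
        key
    }

    /// Returns the blob only if its content still hashes to its address.
    fn get_verified(&self, key: u64) -> Option<&[u8]> {
        let bytes = self.blobs.get(&key)?;
        (digest(bytes) == key).then(|| bytes.as_slice())
    }

    /// Number of distinct blobs held.
    fn len(&self) -> usize {
        self.blobs.len()
    }
}
```

Because the address is a function of the bytes, a second `put` of the same PDF returns the same address and stores nothing new, and any corruption shows up as a digest mismatch on read.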

Embedding Sync

Embeddings are stored in iroh-docs under embeddings/{doc_id}/{model_id}. This design:

  • Avoids redundant computation — generating embeddings is expensive, so peers share them
  • Preserves model flexibility — different peers can use different embedding models
  • Enables offline use — embeddings sync with documents, ready for immediate use

When a peer receives a document, it checks for existing embeddings matching its configured model. If found, they’re used directly. If not (different model or new document), embeddings are generated locally and stored for other peers to use.
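
The check-then-generate behavior can be sketched as below. The key format `embeddings/{doc_id}/{model_id}` is taken from the text; everything else is assumed for illustration: a `HashMap` stands in for the synced iroh-docs store, and `get_or_compute` and `embedding_key` are hypothetical names.

```rust
use std::collections::HashMap;

type Embedding = Vec<f32>;

/// Builds the key under which an embedding is shared.
fn embedding_key(doc_id: &str, model_id: &str) -> String {
    format!("embeddings/{}/{}", doc_id, model_id)
}

/// Looks up a shared embedding for this document and model.
/// If none exists, computes one locally and stores it for other peers.
/// The returned flag is `true` when the embedding was computed locally.
fn get_or_compute(
    shared: &mut HashMap<String, Embedding>,
    doc_id: &str,
    model_id: &str,
    compute: impl FnOnce() -> Embedding,
) -> (Embedding, bool) {
    let key = embedding_key(doc_id, model_id);
    if let Some(existing) = shared.get(&key) {
        return (existing.clone(), false);
    }
    let fresh = compute();
    shared.insert(key, fresh.clone());
    (fresh, true)
}
```

Including the model ID in the key is what lets peers with different embedding models coexist: a lookup for one model never returns vectors produced by another.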

Data Flow

Local Import

  1. User adds PDF to collection
  2. Extract text via lopdf
  3. Store PDF blob → get pdf_hash
  4. Store text blob → get text_hash
  5. Create metadata entry in iroh-docs
  6. Index text + generate embeddings in milli
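
The ordering of steps 2 through 6 can be sketched as a single function over a small trait. This is a hypothetical seam for illustration only: `Backend`, `import_pdf`, and `RecordingBackend` are not Insight APIs, the trait methods merely stand in for lopdf, iroh-blobs, iroh-docs, and milli, and the hashes are placeholders rather than real BLAKE3 values. It also assumes the metadata entry is keyed as `files/<doc_id>`.

```rust
/// Subset of the metadata fields shown under "Document Metadata".
#[derive(Debug, Clone, PartialEq)]
struct DocMeta {
    name: String,
    pdf_hash: String,
    text_hash: String,
}

/// Hypothetical seam over the external systems.
trait Backend {
    fn extract_text(&mut self, pdf: &[u8]) -> String; // stands in for lopdf
    fn store_blob(&mut self, bytes: &[u8]) -> String; // stands in for iroh-blobs
    fn put_metadata(&mut self, doc_id: &str, meta: &DocMeta); // stands in for iroh-docs
    fn index(&mut self, doc_id: &str, text: &str); // stands in for milli
}

/// Runs import steps 2 to 6 in order and returns the metadata written.
fn import_pdf(backend: &mut impl Backend, doc_id: &str, name: &str, pdf: &[u8]) -> DocMeta {
    let text = backend.extract_text(pdf); // step 2
    let pdf_hash = backend.store_blob(pdf); // step 3
    let text_hash = backend.store_blob(text.as_bytes()); // step 4
    let meta = DocMeta { name: name.to_string(), pdf_hash, text_hash };
    backend.put_metadata(doc_id, &meta); // step 5
    backend.index(doc_id, &text); // step 6
    meta
}

/// In-memory backend that records the order of calls.
#[derive(Default)]
struct RecordingBackend {
    calls: Vec<String>,
    blobs: usize,
}

impl Backend for RecordingBackend {
    fn extract_text(&mut self, pdf: &[u8]) -> String {
        self.calls.push("extract_text".to_string());
        String::from_utf8_lossy(pdf).into_owned() // pretend the bytes are text
    }
    fn store_blob(&mut self, _bytes: &[u8]) -> String {
        self.calls.push("store_blob".to_string());
        self.blobs += 1;
        format!("fake-hash-{}", self.blobs) // placeholder, not a real hash
    }
    fn put_metadata(&mut self, doc_id: &str, _meta: &DocMeta) {
        self.calls.push(format!("put_metadata files/{}", doc_id));
    }
    fn index(&mut self, doc_id: &str, _text: &str) {
        self.calls.push(format!("index {}", doc_id));
    }
}
```

Writing the metadata entry only after both blobs are stored means a peer that sees the entry can always resolve the hashes it references.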

On Sync

When a new metadata entry arrives from a peer:

  1. Fetch text blob using text_hash
  2. Index text + generate embeddings in milli
  3. Fetch the PDF blob on demand (when the user opens the document)
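
The asymmetry between text (fetched right away) and PDFs (fetched on first open) might look like the sketch below. `Library`, `SyncedDoc`, and the `fetch_blob` function pointer are hypothetical; a real implementation would fetch from peers through iroh-blobs and index through milli, which this toy replaces with in-memory maps.

```rust
use std::collections::HashMap;

/// Per-document state kept after a metadata entry arrives.
struct SyncedDoc {
    pdf_hash: String,
    pdf: Option<Vec<u8>>, // stays None until the document is opened
}

struct Library {
    fetch_blob: fn(&str) -> Vec<u8>,       // hypothetical: fetch a blob by hash
    docs: HashMap<String, SyncedDoc>,
    indexed_text: HashMap<String, String>, // stands in for the milli index
    fetched: Vec<String>,                  // hashes fetched so far, in order
}

impl Library {
    fn new(fetch_blob: fn(&str) -> Vec<u8>) -> Self {
        Library {
            fetch_blob,
            docs: HashMap::new(),
            indexed_text: HashMap::new(),
            fetched: Vec::new(),
        }
    }

    fn fetch(&mut self, hash: &str) -> Vec<u8> {
        self.fetched.push(hash.to_string());
        (self.fetch_blob)(hash)
    }

    /// Steps 1 and 2: fetch the text eagerly and index it; defer the PDF.
    fn on_metadata_entry(&mut self, doc_id: &str, text_hash: &str, pdf_hash: &str) {
        let text = String::from_utf8_lossy(&self.fetch(text_hash)).into_owned();
        self.indexed_text.insert(doc_id.to_string(), text);
        self.docs.insert(
            doc_id.to_string(),
            SyncedDoc { pdf_hash: pdf_hash.to_string(), pdf: None },
        );
    }

    /// Step 3: fetch the PDF on first open, then reuse the local copy.
    fn open_pdf(&mut self, doc_id: &str) -> Option<&[u8]> {
        if self.docs.get(doc_id)?.pdf.is_none() {
            let hash = self.docs.get(doc_id)?.pdf_hash.clone();
            let bytes = self.fetch(&hash);
            self.docs.get_mut(doc_id)?.pdf = Some(bytes);
        }
        self.docs.get(doc_id)?.pdf.as_deref()
    }
}
```

Deferring the PDF keeps sync traffic proportional to the extracted text, which is all the agent needs for search, while the larger original file is only transferred for documents a user actually opens.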

What Syncs vs What’s Local

| Data | Syncs | Stored in |
|---|---|---|
| PDF files | Yes | iroh-blobs |
| Extracted text | Yes | iroh-blobs |
| File metadata | Yes | iroh-docs |
| Collection info | Yes | iroh-docs |
| Embeddings | Yes (keyed by model) | iroh-docs |
| Search index | No (derived) | milli (for agent) |
| LLM models | No | ~/.cache/huggingface/hub |

Local Storage

~/.local/share/insight/
├── iroh/               # iroh data (blobs, docs)
└── search/             # milli index

~/.cache/huggingface/hub/
└── models--*/          # Downloaded models (LLM + embedding)

On Windows, app data is under %LOCALAPPDATA%\insight\ and models under %USERPROFILE%\.cache\huggingface\hub\.
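
For illustration, the layout above can be expressed as a small path-resolution helper. This is a sketch under assumptions: `Platform`, `StoragePaths`, and `storage_paths` are hypothetical names, the base directories are passed in rather than read from the environment, and the `iroh` and `search` subdirectories are assumed to exist on Windows as they do in the tree shown for other platforms.

```rust
use std::path::PathBuf;

/// Resolved storage locations.
#[derive(Debug, PartialEq)]
struct StoragePaths {
    iroh: PathBuf,   // iroh data (blobs, docs)
    search: PathBuf, // milli index
    models: PathBuf, // downloaded models
}

/// Base directories, supplied by the caller (for example from `HOME`,
/// or from `LOCALAPPDATA` and `USERPROFILE` on Windows).
enum Platform {
    Unix { home: PathBuf },
    Windows { local_app_data: PathBuf, user_profile: PathBuf },
}

fn storage_paths(platform: &Platform) -> StoragePaths {
    let (app_data, profile) = match platform {
        Platform::Unix { home } => (
            home.join(".local").join("share").join("insight"),
            home.clone(),
        ),
        Platform::Windows { local_app_data, user_profile } => {
            (local_app_data.join("insight"), user_profile.clone())
        }
    };
    StoragePaths {
        iroh: app_data.join("iroh"),
        search: app_data.join("search"),
        models: profile.join(".cache").join("huggingface").join("hub"),
    }
}
```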