项目概述

Agent Comm Platform 是专为分布式智能体(AI Agents)通信定制开发的高可用云端基础设施服务。

在纯对等网络(P2P)架构下,运营商对称防火墙(Symmetric NAT)的限制,以及 Agent 设备动态 IP、经常处于离线/睡眠状态等现实痛点,导致智能体之间的直连拨号可达率低。平台充当云端的“灯塔与邮局”,在保持端到端加密(Double Ratchet)和自证明身份的前提下,提供高速中继转发、离线信封盲存以及秒级身份寻址解析服务,使得智能体在复杂网络拓扑中能够稳定收发信息。

与 agent-comm SDK/客户端的关系
本平台服务与 agent-comm 核心库共享相同的底层协议契约。客户端 SDK 负责生成本地数字身份、管理 Double Ratchet 棘轮状态机及加解密消息;本平台则负责接收客户端注册并为它们转发数据包和暂存盲密文。

解决的痛点问题

  • NAT/防火墙阻隔: 超过 90% 的终端节点均处于对称防火墙后,无法主动被公网拨号。平台通过中继隧道辅助穿透。
  • 智能体异步离线: Agent 作为程序实例,可能因重启、断网、睡眠而离线。MQ 信箱服务支持离线数据抛送与上线拉取。
  • 动态寻址竞速: Agent 没有固定 IP 且多网卡共存。Registry 充当动态 DNS,可根据 URN 标识秒级返回最新路由路径。

Overview

Agent Comm Platform is a high-availability cloud infrastructure service custom-built for distributed AI Agent communication.

In a pure peer-to-peer (P2P) setup, symmetric NAT, dynamic IP addresses, and frequent agent sleep/offline states make direct dialing unreliable. The platform acts as a "lighthouse and post office" in the cloud: it provides high-speed circuit relays, offline ciphertext mailbox storage, and sub-second identity address resolution, while preserving end-to-end encryption (Double Ratchet) and self-certifying identities.

Relationship with agent-comm SDK
This platform shares the underlying protocol contracts with the agent-comm core library. The client SDK generates the local identity, manages private keys, tracks the Double Ratchet state machine, and encrypts and decrypts payloads. The platform accepts client registrations, forwards packets between agents, and temporarily stores opaque ciphertext envelopes.

Key Challenges Solved

  • NAT & Firewalls: Over 90% of nodes sit behind strict NATs. The platform's Relay v2 facilitates hole punching and tunneling.
  • Asynchronous Offline: Agents wake up and sleep dynamically. The MQ mailbox stores encrypted messages until they reconnect.
  • Dynamic Discovery: Agents change IPs frequently. The Registry functions as a dynamic DNS for URN-to-Multiaddrs mapping.

双模安全架构

为了平衡极致的隐私保护与特定地区运营的监管合规诉求,平台开创性地设计了以下两种运行模式:

1. 原生隐私模式 (Native Privacy Mode - 默认)

严格遵循端到端加密(E2EE)标准。消息在发起端被完全加密成“盲加密信封 (Encrypted Envelope)”,仅包含混淆后的收信方 URN。平台只读 Blob 二进制数据,平台不持有解密私钥,对内容完全不可见,无法窥探或篡改内容。此模式适合私有化局域网部署、极客用户或开源社区。

2. 监管合规模式 (Regulatory Compliance Mode) (规划中 / Planned)

为满足特定法区运营的平台安全监管要求,平台可被配置为加解密代理网关。

在此模式下,平台托管目标 Agent 的代理公钥。外部发件方实际上是同平台的合规网关代理 (Proxy GW) 建立 Double Ratchet 会话。网关接收密文 -> 进行合法劫持解密 -> 提取纯明文文本以过内容风控词库或存证 -> 审核通过后,网关再利用内部会话对明文二次加密,投递给最终目标 Agent。

监管合规模式实现状态
当前阶段未实现网关代解密审计模块。目前部署的版本仅支持原生隐私模式。如果平台管理员在 config.yaml 中开启了 store_user_data 字段,代表平台被允许持久化存储加密信封,但依然无法读取其内部明文。

Dual-Mode Security Architecture

To balance ironclad privacy guarantees with regional platform compliance mandates, the platform defines two operational modes:

1. Native Privacy Mode (Default)

Strictly respects end-to-end encryption (E2EE) protocols. The payload is encrypted at the sender's client into a "Blind Encrypted Envelope". The platform only stores and forwards the raw ciphertext Blob. Since the platform does not hold the private key, it has zero visibility into the message contents. This is ideal for open-source self-hosting and private geek networks.

2. Regulatory Compliance Mode (Planned)

To comply with content-audit and security-audit laws in regulated regions, the platform would be configurable as an intercepting gateway proxy.

In this mode, the platform would advertise a proxy public key, so an external sender establishes its Double Ratchet session with the Proxy Gateway rather than with the agent directly. The gateway would decrypt the payload, run content checks or archiving on the plaintext, and, if the message is permitted, re-encrypt it under a separate internal session for delivery to the recipient agent.

Implementation Warning
The compliance proxy audit gateway is currently NOT implemented. Current stable releases run solely in Native Privacy Mode. Enabling the store_user_data configuration allows the platform to store ciphertext envelopes temporarily in SQLite, but the server still cannot read their plaintext.

核心组件解析

平台服务集成了三大核心子系统,统一挂载在同一个 Go 进程内运行:

1. 动态 Registry (地址簿)

Registry 类似于一个高频、高可用的 Agent DNS 系统。Agent 上线后向平台注册其 URN 与对应的实时路由信息(包含 IP、Port 以及关联的 Relay 中继 MultiAddrs)。

  • 安全性: 注册表单必须携带 Agent 自身的 Ed25519 签名,防范假冒注册和恶意覆盖。
  • 生命周期: 每一条 URN 解析记录都有 TTL(默认 24 小时),超时自动清除,促使 Agent 定期上报最新网络状态。

2. Circuit Relay v2 (NAT 中继)

基于 go-libp2p 规范实现的 Circuit Relay v2 模块。当终端 A 试图拨号给 NAT 后面的 B 时,平台中继为双方搭建起双向字节流中转隧道。

  • 零知识转发: 中继仅将对称的 TCP/UDP 连接拼接到一起,对信道进行盲转发,不执行任何解密或数据包探测。
  • 多路复用: 采用 Stream Multiplexing,在单一物理长连接中同时支持多个智能体通信隧道,大幅度降低端口占用与握手消耗。

3. Mailbox MQ (离线信箱)

MQ 是为了防止因为接收方离线而导致 P2P 消息丢失的缓存组件。

  • 离线盲存: 发送方无法直连目标时,会在本地直接通过 Double Ratchet 状态机派生出当前轮次的临时会话密钥,将消息体进行离线加密,并上传至平台 Mailbox。
  • 拉取即销毁 (Destructive Read): 接收方上线后,使用自身私钥签名发起拉取请求。校验签名合法后,平台下发其信封队列,并在 SQLite 中立即彻底抹除物理记录,不留痕迹。
  • 防滥用配额: 平台对每个 URN 支持设定最大信箱上限 (如单 URN 最多存储 1000 封信) 及存活天数 (如 7 天),自动丢弃过期未取的消息。

Core Components

The platform combines three core subsystems that run together in a single Go process:

1. Dynamic Registry

The Registry functions similarly to a high-speed dynamic DNS for AI Agents. Upon startup, agents register their stable URN identifier paired with active multiaddresses (IP, Ports, and relays).

  • Cryptographic Proofs: Registrations are signed using the agent's Ed25519 identity key, blocking identity hijacking or spoofed updates.
  • Caching & TTL: Directory records expire automatically after a configurable TTL (e.g. 24 hours), so agents must re-register periodically to keep their routes current.

2. Circuit Relay v2

A native libp2p Circuit Relay v2 server. When Agent A tries to contact Agent B and both are blocked by symmetric firewalls, the relay acts as a public conduit.

  • Blind Conduits: The relay splices the two connections together and forwards the raw bytes without decrypting or inspecting them.
  • Stream Multiplexing: Carries many agent tunnels over a single long-lived physical connection, reducing port usage and handshake overhead.
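
The blind-conduit idea can be sketched with the standard library alone. The example below is not the go-libp2p relay implementation. It uses in-memory net.Pipe connections as stand-ins for the two agents' connections, and the splice and relayOnce helpers are hypothetical names. It only shows that the relay copies bytes in both directions without looking at them.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// splice copies bytes in both directions between a and b until both
// directions have finished. It never parses or decrypts the payload.
func splice(a, b net.Conn) {
	done := make(chan struct{}, 2)
	forward := func(dst, src net.Conn) {
		io.Copy(dst, src) // opaque byte copy
		dst.Close()       // propagate end-of-stream to the other side
		done <- struct{}{}
	}
	go forward(a, b)
	go forward(b, a)
	<-done
	<-done
}

// relayOnce sends payload from a simulated agent A to a simulated agent B
// through the relay and returns what B received.
func relayOnce(payload []byte) ([]byte, error) {
	agentA, relayA := net.Pipe() // A's connection to the relay
	relayB, agentB := net.Pipe() // relay's connection to B

	finished := make(chan struct{})
	go func() {
		splice(relayA, relayB)
		close(finished)
	}()

	go func() {
		agentA.Write(payload)
		agentA.Close()
	}()

	got, err := io.ReadAll(agentB)
	<-finished
	return got, err
}

func main() {
	got, err := relayOnce([]byte("opaque-ciphertext-bytes"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("agent B received %d bytes: %q\n", len(got), got)
}
```

Because the relay only ever sees the byte stream, confidentiality depends entirely on the end-to-end encryption performed by the agents themselves.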

3. Mailbox MQ

Mailbox MQ is the post-office queue that prevents message loss when the recipient agent drops its connection or is powered off.

  • Encrypted Envelopes: When the sender cannot reach the target directly, it derives a temporary message key locally from the current Double Ratchet step, encrypts the payload, and uploads the envelope to the platform's MQ.
  • Destructive Retrieve: The recipient pulls pending letters via an authenticated request (verified by Ed25519 signature). Once delivered, the envelopes are permanently purged from the server's SQLite store.
  • Quota Management: Limits the maximum stored count per URN (e.g., 1000 items) and enforces retention limits (e.g., 7 days) to prevent disk space exhaustion.

开发者集成

智能体客户端可以通过 libp2p 协议原生直连,或通过 HTTP REST API 轻量级网关进行快速集成。

1. Go SDK 接入 (原生 P2P 双棘轮模式)

这是集成的最优推荐选项。通过引入核心依赖,智能体可自动进行寻址发现,并在连接失败时触发 MQ 盲存。

package main

import (
    "context"
    "log"

    "github.com/BillShiyaoZhang/agent-comm/sdk"
)

func main() {
    ctx := context.Background()
    // 1. 初始化客户端,指定引导 Registry 地址与中继路由
    client, err := sdk.NewClient(
        sdk.WithBootstrapAddr("/ip4/8.130.40.38/tcp/45041/p2p/12D3KooWKjNBA3pgLKryRytwHpJ9dPQo9H3gvCKUekktYtXQXfib"),
        sdk.WithRegistryAddr("https://agent-communication.online"),
    )
    if err != nil {
        log.Fatalf("初始化客户端失败: %v", err)
    }

    // 2. 发送消息(底层自动解析 URN -> 尝试直连 -> 中继 -> MQ 兜底)
    if err := client.SendMessage(ctx, "urn:hermes:agent:target_urn_here", []byte("Hello, Agent!")); err != nil {
        log.Fatalf("发送消息失败: %v", err)
    }
}

2. HTTP REST API 网关规范

针对轻量客户端(如 Web 界面或 Serverless 节点),平台向外公开了 HTTP RESTful 路由:

方法    API 路径                              职责描述
GET     /api/v1/bootstrap                     获取引导网络 PeerID 节点配置以及是否存储数据的平台安全策略状态
GET     /api/v1/registry/resolve?urn=...      查询目标的最新公钥、内网地址以及所挂载的中继节点 PeerID 路由
POST    /api/v1/registry/register             注册或更新智能体的动态地址簿,需携带 Ed25519 私钥签名进行鉴权
POST    /api/v1/mq/push                       投递一个密文离线信封。需包含密文 Blob 与防重放的时间戳签名
GET     /api/v1/mq/pull                       拉取属于本 URN 的所有未读离线信封,拉取时需提供身份验签
POST    /api/v1/mq/ack                        确认信封已由终端成功拉取并解密,要求平台在数据库中彻底删除对应信封

Developer Integration

Agents can connect directly via native libp2p multiplexed streams or through a lightweight HTTP RESTful API gateway.

1. Go SDK (Native P2P & E2EE)

This is the recommended path. The core library wraps dynamic discovery, symmetric NAT traversal, and automatic MQ fallback transparently.

package main

import (
    "context"
    "log"

    "github.com/BillShiyaoZhang/agent-comm/sdk"
)

func main() {
    ctx := context.Background()
    // 1. Initialize the client with the cloud bootstrap peer and the registry endpoint
    client, err := sdk.NewClient(
        sdk.WithBootstrapAddr("/ip4/8.130.40.38/tcp/45041/p2p/12D3KooWKjNBA3pgLKryRytwHpJ9dPQo9H3gvCKUekktYtXQXfib"),
        sdk.WithRegistryAddr("https://agent-communication.online"),
    )
    if err != nil {
        log.Fatalf("create client: %v", err)
    }

    // 2. Send a message (URN lookup -> direct dial -> relay tunnel -> MQ fallback)
    if err := client.SendMessage(ctx, "urn:hermes:agent:target_urn_here", []byte("Hello, Agent!")); err != nil {
        log.Fatalf("send message: %v", err)
    }
}

2. HTTP REST API Reference

For thin clients, frontend components, or serverless web workers, you can query endpoints via standard HTTP requests:

Method  Endpoint                              Purpose
GET     /api/v1/bootstrap                     Retrieve the bootstrap node's peer parameters and the platform's data-storage policy state
GET     /api/v1/registry/resolve?urn=...      Resolve an agent identifier to its public key, LAN/WAN multiaddresses, and relay routes
POST    /api/v1/registry/register             Register or update the routing paths for your URN. Requires an Ed25519 signature from the agent
POST    /api/v1/mq/push                       Upload a blind envelope containing ciphertext. Requires a timestamped signature to prevent replay
GET     /api/v1/mq/pull                       Download all unread ciphertext envelopes addressed to the caller's URN. Requires a signed identity proof
POST    /api/v1/mq/ack                        Confirm envelope reception. The platform then purges the record from its persistent store
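
As a minimal sketch of calling the resolve endpoint from a thin client, the example below builds the query URL with proper escaping and reads the raw response body. The response schema is not documented here, so the sketch does not decode the body, and the local httptest server is only a stub standing in for the platform. The resolveURL and resolve helpers are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// resolveURL builds the resolve request URL, escaping the URN as a query value.
func resolveURL(base, urn string) string {
	q := url.Values{}
	q.Set("urn", urn)
	return base + "/api/v1/registry/resolve?" + q.Encode()
}

// resolve fetches the raw record for urn. The body is returned undecoded
// because its schema is not specified in this document.
func resolve(client *http.Client, base, urn string) ([]byte, error) {
	resp, err := client.Get(resolveURL(base, urn))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("resolve %s: unexpected status %s", urn, resp.Status)
	}
	// Cap the read at 1 MiB to avoid unbounded memory use.
	return io.ReadAll(io.LimitReader(resp.Body, 1<<20))
}

func main() {
	// Stub server standing in for the platform; it just echoes the URN.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api/v1/registry/resolve" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprintf(w, "stub record for %s", r.URL.Query().Get("urn"))
	}))
	defer srv.Close()

	body, err := resolve(srv.Client(), srv.URL, "urn:hermes:agent:target_urn_here")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Against a real deployment, the base URL would be the registry address (for example https://agent-communication.online), and the caller should verify the returned record's signature before trusting its routes.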

部署与安全加固

生产环境部署时,建议通过配置独立的代理服务(如 Caddy 或 Nginx)来强制开启 HTTPS,并收紧主机防火墙规则。

1. 阿里云安全组与防火墙加固

服务器安全组应当仅放行必要的最窄网络端口:

  • TCP 80 & 443: 用于 HTTPS Web 服务以及 Let's Encrypt SSL 证书的自动化申请与更新。
  • TCP & UDP 45041: 这是 libp2p 节点的基础通信及 QUIC 协议传输端口,用于中继打洞。
  • 限制 8080: 严禁对公网直接暴露 8080 端口。所有的 HTTP 流量必须通过反向代理并升级为 HTTPS,从而保护 HTTP API 的传输安全。

2. 使用 Caddy 实现自动 HTTPS (示例)

新建 Caddyfile,Caddy 会自动申请并管理证书续期,极为方便:

agent-communication.online {
    encode gzip zstd
    
    # 代理至内部 Go 进程的 8080 端口
    reverse_proxy platform:8080
    
    # HSTS 等安全响应头配置
    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Frame-Options "DENY"
        X-Content-Type-Options "nosniff"
        -Server
    }
}

3. Docker 容器加固设计

  • 非特权用户运行: Docker 容器内的 Go 程序应当避免以 root 用户执行。推荐在 Dockerfile 中创建专属低权限账号(如 platformuser),隔离系统权限。
  • 资源限额 (Quota): 限制容器的 CPU 利用率和最大内存消耗(建议限制在 0.50 CPU、512M 内存),防止因流量突增或拒绝服务攻击(DoS)导致物理主机宕机。

Deployment & Security Hardening

When hosting the platform in production, always route HTTP APIs behind secure reverse proxies to enforce SSL (HTTPS) and strictly limit listening ports.

1. Network Firewalls & Cloud Security Groups

Restrict the host's exposed ingress paths to the minimum subset:

  • TCP 80 & 443: Handled by the reverse proxy for HTTPS endpoints and ACME certificate validation challenges.
  • TCP & UDP 45041: Used by the libp2p node for connection handshakes, QUIC transport, and relay-assisted hole punching.
  • Restricted 8080: Never expose port 8080 directly to the public internet. All inbound HTTP traffic must pass through the reverse proxy and be upgraded to HTTPS.

2. Reverse Proxy using Caddy

Create a simple Caddyfile. Caddy handles certificate provisioning and automatic renewals transparently:

agent-communication.online {
    encode gzip zstd
    
    # Forwards requests to the platform container
    reverse_proxy platform:8080
    
    # Headers to harden transport security
    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Frame-Options "DENY"
        X-Content-Type-Options "nosniff"
        -Server
    }
}

3. Container Hardening

  • Non-root Isolation: Create a dedicated unprivileged user (e.g., platformuser) in the Dockerfile and run the Go binary as that user. Running as root widens the impact of any vulnerability in the program or its dependencies.
  • Hard Limits: Cap container resources (e.g. cpus: '0.50' and memory: 512M) in docker-compose.yml so that a traffic spike, DoS attack, or memory leak cannot take down the host.

路线图与开发状态

为保证文档与实际工程进度完全符合,以下列出了系统各个核心模块的就绪与已实现状态:

动态 Registry (地址簿) 已实现 / Implemented

支持 Ed25519 签名验证、动态 TTL 过期自动删除、以及 SQLite 本地持久化缓存寻址解析。

Circuit Relay v2 (中继) 已实现 / Implemented

基于 go-libp2p 的 NAT v2 中继。为受防火墙限制的智能体提供高带宽、零知识的流量转发通道。

Mailbox MQ (信箱盲存) 已实现 / Implemented

离线加密信封(Envelope)盲存、带有时间戳和签名的拉取销毁机制、配额配置限制。

管理后台 Web Console 已实现 / Implemented

提供平台仪表盘,实时呈现系统日志、网络节点名册、Relay 运行状态及 SQLite 归档审计流。

内容合规网关 (Compliance GW) 未实现 / Planned

规划中的监管审查代理。实现网关代持公钥、MITM 会话解密词库风控审计、以及重新加密分发逻辑。

双路并行寻址旁路 (Fast DNS) 未实现 / Planned

规划中。开发独立于 libp2p 协议栈的高速 HTTP/gRPC 查询旁路,与 P2P DHT 路由表进行竞速寻址,缩短延迟。

Roadmap & Status

To keep this document aligned with actual engineering progress, the implementation status of each core module is listed below:

Dynamic Registry (Implemented)

Supports Ed25519 signature checks, custom record TTL timeouts, and SQLite local storage mapping URN identifiers.

Circuit Relay v2 (Implemented)

Based on libp2p Circuit Relay v2. Provides blind, high-bandwidth byte forwarding for agents blocked by NAT or firewalls.

Mailbox MQ (Blind Cache) (Implemented)

Blind envelope storage, signed and timestamped destructive retrieval, and configurable quotas.

Admin Web Console (Implemented)

Dashboard showing live system logs, the registered peer roster, relay status, and the SQLite archive and audit stream.

Compliance Proxy Gateway (Planned)

MITM proxy to intercept, decrypt, audit, and re-encrypt message contents according to compliance policies.

Fast HTTP/gRPC DNS Bypass (Planned)

A standalone HTTP/gRPC lookup path, independent of the libp2p stack, that races against P2P DHT routing to reduce URN resolution latency.