Talk to Data deploys autonomous AI agents across your entire data stack — querying, interpreting, and surfacing insights that would take your team weeks to find.
Trusted by 1,00+ data teams
What was last week's sales ranking by store? Why are there differences?
Semantic Governance
A structured semantic layer that transforms raw metadata into governed, auditable knowledge — from schema definitions to compliance packs and industry archetypes.
Table schemas, column definitions, metric contracts, join rules, and few-shot NL2SQL examples — version-controlled as YAML, ingested into PostgreSQL.
Business term lifecycle management, bilingual disambiguation, synonym networks, and ambiguity matrices — powered by pgvector embeddings.
Compliance rules, business context assumptions, industry archetypes, and regulatory constraints — enforced via Apache AGE graph traversal.
PG Supernode
One PostgreSQL instance, three retrieval paradigms. Relational precision, vector semantics, and graph governance — unified in a single transaction.
Engine
Structured DDL, metric definitions, join rules, lifecycle tracking, and ML tool registry in PostgreSQL
Engine
7 embedding types with pgvector — cosine similarity search for fuzzy term matching and SQL cache
Engine
Apache AGE property graph enforces governance boundaries, relationship traversal, and compliance packs
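To make the fuzzy term matching idea concrete, here is a minimal in-memory sketch of cosine-similarity ranking. It is an illustration only, not the product's implementation: the term names, the three-dimensional vectors, and the helper names `cosine_similarity` and `top_k_terms` are all assumptions. Inside PostgreSQL the same ordering would normally come from pgvector's cosine distance operator (`<=>`) rather than from Python.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_terms(query_vec, term_vectors, k=3):
    """Rank candidate business terms by similarity to the query vector."""
    scored = [
        (term, cosine_similarity(query_vec, vec))
        for term, vec in term_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
TERMS = {
    "revenue": [0.9, 0.1, 0.0],
    "sales": [0.8, 0.2, 0.1],
    "headcount": [0.0, 0.1, 0.9],
}

matches = top_k_terms([1.0, 0.0, 0.0], TERMS, k=2)
```

With these toy vectors, `matches` ranks "revenue" ahead of "sales" and drops "headcount", which is the behavior a synonym lookup would want.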
Multi-Agent System
LangGraph four-layer orchestration: Supervisor → Query Understanding → Router → Sub-agents. The LLM reasons and plans; ML tools compute and analyze.
Four-layer orchestration entry
Intent + analytical depth detection
7-route intelligent dispatch
ML DAG step decomposition
R/V/G hybrid search
ML tool selection & execution
Code-specialized LLM
Sandboxed Python execution
7-layer security validation
Auto chart + ML charts
ML-grounded data insights
Next-best-action guidance
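The four-layer flow described above (Supervisor → Query Understanding → Router → Sub-agents) can be sketched in plain Python without LangGraph. Everything here is a simplified assumption for illustration: the keyword cues, the route names, and the `RequestState` fields are hypothetical, and only four routes are shown rather than the seven the system dispatches to.

```python
from dataclasses import dataclass, field


@dataclass
class RequestState:
    question: str
    depth: str = "descriptive"  # default analytical depth ("What")
    route: str = ""
    steps: list = field(default_factory=list)


# Hypothetical keyword cues, checked from deepest to shallowest depth.
DEPTH_CUES = [
    ("prescriptive", ("how should", "how can", "recommend")),
    ("predictive", ("forecast", "predict", "next")),
    ("diagnostic", ("why", "cause", "differ")),
]

# Hypothetical route names; the real router has seven routes.
ROUTES = {
    "descriptive": "sql_agent",
    "diagnostic": "ml_attribution_agent",
    "predictive": "ml_forecast_agent",
    "prescriptive": "recommendation_agent",
}


def understand(state):
    """Layer 2: detect analytical depth from simple keyword cues."""
    text = state.question.lower()
    for depth, cues in DEPTH_CUES:
        if any(cue in text for cue in cues):
            state.depth = depth
            break
    state.steps.append("query_understanding")
    return state


def route(state):
    """Layer 3: map the detected depth to a sub-agent route."""
    state.route = ROUTES[state.depth]
    state.steps.append("router")
    return state


def run_sub_agent(state):
    """Layer 4: placeholder for the selected sub-agent's work."""
    state.steps.append(f"sub_agent:{state.route}")
    return state


def supervisor(question):
    """Layer 1: entry point that drives the remaining layers in order."""
    state = RequestState(question=question)
    state.steps.append("supervisor")
    for layer in (understand, route, run_sub_agent):
        state = layer(state)
    return state


result = supervisor(
    "What was last week's sales ranking by store? Why are there differences?"
)
```

In a real deployment the keyword cues would be replaced by an LLM classification step, and each layer would be a node in the LangGraph graph rather than a plain function call.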
Personalized Intelligence
Four-layer personalized memory that learns from every interaction. Every retrieve, write, merge, and suppress operation is linked to the request trace for full observability and replay.
In-flight conversational state for the current request — powered by LangGraph checkpointer.
Durable user preferences: chart styles, favored grains, default comparisons, repeated metric choices.
Decaying summaries of successful past interactions — preserves continuity without replaying raw history.
User-approved corrections and disambiguation outcomes — high-value for accuracy, governance-gated.
Zero cross-user sharing. Every memory record is scoped to user_id with adversarial isolation guarantees.
Memory-on vs memory-off replay experiments validate every retrieval policy change before production rollout.
Explicit retention, redaction, suppression, and expiry semantics. User deletion is a first-class operation.
Memory writes only from durable signals — explicit preferences, accepted clarifications, recurring patterns, and user-approved corrections. Never from transient failures or low-confidence single turns.
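The write policy and per-user scoping above could look roughly like the following sketch. The signal names mirror the four durable signals listed in the text; the `MemoryStore` class, the `confidence` field, and the 0.8 threshold are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass

# The four durable signals named in the write policy.
DURABLE_SIGNALS = {
    "explicit_preference",
    "accepted_clarification",
    "recurring_pattern",
    "user_approved_correction",
}


@dataclass(frozen=True)
class MemoryCandidate:
    user_id: str
    signal: str
    content: str
    confidence: float


class MemoryStore:
    """In-memory stand-in for a store that scopes every record to one user."""

    def __init__(self, min_confidence=0.8):  # threshold is an assumption
        self._records = {}  # user_id -> list of stored contents
        self.min_confidence = min_confidence

    def write(self, candidate):
        """Persist only durable, high-confidence signals; report success."""
        if candidate.signal not in DURABLE_SIGNALS:
            return False  # e.g. transient failures are never stored
        if candidate.confidence < self.min_confidence:
            return False  # low-confidence single turns are never stored
        self._records.setdefault(candidate.user_id, []).append(candidate.content)
        return True

    def retrieve(self, user_id):
        """Return a copy of one user's records; other users are invisible."""
        return list(self._records.get(user_id, []))

    def delete_user(self, user_id):
        """First-class deletion: drop all records and return how many."""
        return len(self._records.pop(user_id, []))


store = MemoryStore()
accepted = store.write(
    MemoryCandidate("u1", "explicit_preference", "prefers bar charts", 0.95)
)
rejected = store.write(
    MemoryCandidate("u1", "transient_failure", "query timed out", 0.99)
)
```

Here `accepted` is `True` and `rejected` is `False`, and a lookup for any other `user_id` returns an empty list.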
Unified Observability
One ObservabilityFacade unifies Langfuse tracing, AutoMQ event streaming, and structured logging. Every agent step, memory operation, and evaluator result shares the same trace context.
Every request gets a single trace_id. All spans — LLM calls, retrieval, SQL generation, memory operations — nest under one trace with full input/output capture.
Lightweight, reference-based events streamed to Kafka-compatible topics. Build real-time dashboards, alerts, and audit trails from canonical lifecycle events.
Compare policy variants with memory-on/off replay experiments. Correlate memory effectiveness with SQL quality, user feedback, and evaluator pass rates.
Seven dedicated event types — retrieve, rank, write, upsert, suppress, expire, delete — each linked to trace context for full auditability.
API Request
start_trace()
Agent Spans
start_span() → end_span()
Memory Events
emit_event(memory.*)
Score Recording
record_score()
Trace Finalize
finalize_trace()
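The lifecycle above (start_trace, start_span and end_span, emit_event, record_score, finalize_trace) can be illustrated with an in-memory stand-in. Only the method names come from the text; the signatures, the stored fields, and the absence of any Langfuse or AutoMQ client are assumptions made to keep the sketch self-contained.

```python
import time
import uuid


class ObservabilityFacade:
    """Hypothetical in-memory facade; a real one would forward to backends."""

    def __init__(self):
        self.traces = {}

    def start_trace(self, name):
        """Open a trace and return its single trace_id."""
        trace_id = uuid.uuid4().hex
        self.traces[trace_id] = {
            "name": name,
            "spans": [],
            "events": [],
            "scores": {},
            "finalized": False,
        }
        return trace_id

    def start_span(self, trace_id, name):
        """Open a span nested under the given trace."""
        span = {"name": name, "start": time.monotonic(), "end": None}
        self.traces[trace_id]["spans"].append(span)
        return span

    def end_span(self, span):
        span["end"] = time.monotonic()

    def emit_event(self, trace_id, event_type, ref):
        """Record a lightweight event that carries a reference, not a payload."""
        self.traces[trace_id]["events"].append({"type": event_type, "ref": ref})

    def record_score(self, trace_id, name, value):
        self.traces[trace_id]["scores"][name] = value

    def finalize_trace(self, trace_id):
        trace = self.traces[trace_id]
        trace["finalized"] = True
        return trace


obs = ObservabilityFacade()
trace_id = obs.start_trace("api_request")
span = obs.start_span(trace_id, "sql_generation")
obs.end_span(span)
obs.emit_event(trace_id, "memory.retrieve", ref="record-123")
obs.record_score(trace_id, "evaluator_pass", 1.0)
trace = obs.finalize_trace(trace_id)
```

Because every call takes the same `trace_id`, spans, memory events, and scores end up in one record that can be replayed or audited together.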
Defense in Depth
Every query passes through seven independent security checks — from JWT authentication to result-level PII masking. Fail-safe by default: when in doubt, reject.
Role-based access control + rate limiting
Domain-scoped data access boundaries
PII exposure policy (hidden / masked / visible)
Apache AGE enforces governance joins
SELECT-only whitelist + column validation
Dedicated workgroup + statement timeout
Code interpreter whitelist + column-level PII masking
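Two of the checks above, the SELECT-only whitelist and the hidden / masked / visible PII policy, are simple enough to sketch. This is a deliberately conservative illustration, not the product's validator: the function names and the keyword list are assumptions, and a production check would more likely rely on a real SQL parser than on regular expressions.

```python
import re

# Keywords that must never appear in a read-only query (assumed list).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|merge)\b",
    re.IGNORECASE,
)


def is_select_only(sql):
    """Accept a single SELECT (or WITH ... SELECT) statement, reject the rest."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        return False  # more than one statement: reject
    if not re.match(r"(?is)^(select|with)\b", statement):
        return False
    # Fail-safe: a forbidden keyword anywhere, even inside a literal, rejects.
    return FORBIDDEN.search(statement) is None


def apply_pii_policy(rows, policy):
    """Apply a per-column policy; columns with no policy are hidden."""
    cleaned = []
    for row in rows:
        out = {}
        for column, value in row.items():
            mode = policy.get(column, "hidden")
            if mode == "hidden":
                continue
            out[column] = "***" if mode == "masked" else value
        cleaned.append(out)
    return cleaned


rows = [{"store": "A", "phone": "555-0100", "ssn": "000-00-0000", "note": "x"}]
policy = {"store": "visible", "phone": "masked", "ssn": "hidden"}
safe_rows = apply_pii_policy(rows, policy)
```

Both helpers follow the "when in doubt, reject" rule: ambiguous SQL is refused, and a column without an explicit policy (such as `note` here) is dropped from the result.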
Capabilities
Four-layer personalized memory (profile, episodic, correction, working) learns your preferences and patterns — every interaction makes the agent smarter.
Every query, span, and decision traced end-to-end via ObservabilityFacade. Langfuse for deep tracing, AutoMQ for real-time event streaming.
LLM handles reasoning, planning, and explanation. Traditional ML handles attribution, forecasting, clustering, anomaly detection, and regression.
What (descriptive) → Why (diagnostic) → Next (predictive) → How (prescriptive). System detects analytical depth and routes to the right execution path.
Sandboxed Python execution with whitelisted ML libraries. The LLM generates analysis code; the sandbox executes it safely under memory and time limits.
Results as decision-ready card blocks — narrative, metrics, charts, ML results, insights, and next-best-action suggestions. Pin any block to Dashboard.
Smart model routing — DeepSeek-v3.2 for understanding, qwen3-coder-plus for SQL, Qwen3-max for insights. Automatic fallback chains.
Convert conversational analysis blocks into reusable BI dashboard widgets. Persist chart configs, SQL, filters, and ML context as durable assets.
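The smart model routing with automatic fallback chains mentioned above might be sketched as follows. The primary model for each task comes from the text; the fallback order, the `ModelUnavailable` exception, and the `call_model` callback are assumptions, since the actual chains are not documented here.

```python
# Primary models are taken from the text; fallback entries are assumed.
MODEL_CHAINS = {
    "understanding": ["DeepSeek-v3.2", "Qwen3-max"],
    "sql": ["qwen3-coder-plus", "DeepSeek-v3.2"],
    "insights": ["Qwen3-max", "DeepSeek-v3.2"],
}


class ModelUnavailable(Exception):
    """Raised by the (hypothetical) client when a model cannot serve a call."""


def call_with_fallback(task, prompt, call_model):
    """Try each model in the task's chain until one succeeds."""
    errors = []
    for model in MODEL_CHAINS[task]:
        try:
            return model, call_model(model, prompt)
        except ModelUnavailable as exc:
            errors.append(f"{model}: {exc}")
    raise RuntimeError(
        f"all models failed for task {task!r}: " + "; ".join(errors)
    )


def fake_client(model, prompt):
    """Stand-in client that simulates an outage of the SQL model."""
    if model == "qwen3-coder-plus":
        raise ModelUnavailable("timeout")
    return f"{model} answered"


used_model, answer = call_with_fallback("sql", "rank stores by sales", fake_client)
```

In this simulated outage the SQL task falls through to the next model in its chain, while tasks whose primary model is healthy never touch their fallbacks.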
From natural language to governed SQL, ML-powered analysis, and actionable dashboard cards. Deploy on your own infrastructure with full data sovereignty.