Now generally available — v2.0

Your data,
finally thinking
for itself.

Data Agent deploys autonomous AI agents across your entire data stack — querying, interpreting, and surfacing insights that would take your team weeks to find.

Get started free

Watch the Presentation

Trusted by 8,000+ data teams worldwide

Talk to Data — Query Workspace

You

What were last week's sales rankings by store, and why do they differ?

Active agents

84,200+

+12% this month

Queries / day

2.4M

+28% vs last week

Avg. latency

1.8 s

64% faster

Data connectors

120+

plug & play

30+ Agent Nodes

4 Memory Layers

7 Route Decisions

7 Security Layers

KPI Lookup · NL2SQL Query · Deep Analysis · Graph Reasoning · Knowledge QA · Clarification · Analytical Workflow

Agent Architecture

12 specialized agents,
one unified pipeline

Each query flows through a deterministic orchestration graph — understanding, planning, executing, validating, and presenting results in a single coherent pass.

Supervisor

4-layer orchestration entry

Query Understanding

Intent + depth detection

Router

7-route intelligent dispatch

Planner

ML DAG decomposition

Data Retrieval

Relational + vector + graph (R/V/G) hybrid search

Analytical Agent

ML tool execution

SQL Generation

Code-specialized LLM

Code Interpreter

Sandboxed Python

Guardrails

7-layer security validation

Visualization

Auto chart + ML charts

Insights

ML-grounded insights

Suggested Follow-up

Next-best-action

Storage Architecture

Triple-engine knowledge store

One query. Three synchronized stores. Relational precision, vector recall, and graph governance — working in concert.

R · Relational

Relational Store

Structured DDL, metric definitions, join rules, lifecycle tracking, and ML tool registry.

PostgreSQL · DDL schemas · Metric contracts · Version control

V · Vector

Vector Store

Seven embedding types stored with pgvector, searched by cosine similarity for fuzzy term matching and SQL cache lookups.

pgvector · 7 embeddings · Fuzzy matching · SQL cache
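As a rough sketch of fuzzy term matching, the code below ranks glossary terms by cosine similarity to a query embedding. It is a pure-Python stand-in for what pgvector does in SQL (pgvector's `<=>` operator returns cosine distance, so ordering by it ascending puts the closest vectors first). The three-dimensional vectors, the 0.8 threshold, and the `match_terms` helper are hypothetical; real embeddings come from a model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_terms(query_vec, term_vectors, k=2, min_score=0.8):
    """Hypothetical helper: up to k (term, score) pairs scoring >= min_score, best first."""
    scored = [(term, cosine_similarity(query_vec, vec)) for term, vec in term_vectors.items()]
    hits = [pair for pair in scored if pair[1] >= min_score]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:k]


# Toy 3-dimensional "embeddings" for glossary terms (illustrative values only).
TERMS = {
    "revenue": [0.9, 0.1, 0.0],
    "gross_sales": [0.85, 0.2, 0.05],
    "churn_rate": [0.0, 0.2, 0.95],
}

query = [0.88, 0.15, 0.02]  # pretend embedding of a user phrase such as "sales income"
for term, score in match_terms(query, TERMS):
    print(f"{term}: {score:.3f}")
```

The threshold keeps unrelated terms such as `churn_rate` out of the candidate list entirely, which is what lets a near-synonym resolve to a governed term without an exact string match.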

G · Graph

Graph Store

Apache AGE property graph enforces governance boundaries, relationship traversal, and compliance packs.

Apache AGE · Graph traversal · Compliance · Governance

Data Governance

Three-layer knowledge
governance model

Every data asset is governed from definition to query — schema, terminology, and compliance fused into one coherent graph.

Layer 1

Canonical Metadata

Table schemas, column definitions, metric contracts, join rules, and few-shot NL2SQL examples — version-controlled as YAML, ingested into PostgreSQL.

table_asset · metric_asset · join_rule · few_shot_example

Layer 2

Term Governance

Business term lifecycle management, bilingual disambiguation, synonym networks, and ambiguity matrices — powered by pgvector embeddings.

business_term · term_relationship · ambiguity_matrix

Layer 3

Business Knowledge

Compliance rules, business context assumptions, industry archetypes, and regulatory constraints — enforced via Apache AGE graph traversal.

business_rule · business_context · compliance_pack · industry_archetype

Ingestion Pipeline

YAML Metadata → Schema Validation → Cross-Layer Refs → Version Check → CI Pipeline → PG Ingest (R+V+G) → ✓ live
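The ingestion steps above can be pictured as a chain of checks that must all pass before anything reaches PostgreSQL. This is a minimal sketch under assumptions: the asset fields (`name`, `kind`, `version`, `references`) and the rule that a re-ingested asset must carry a newer version are hypothetical, and the YAML is shown already parsed into dicts (for example with PyYAML's `yaml.safe_load`) so the example needs only the standard library.

```python
from dataclasses import dataclass, field

REQUIRED_FIELDS = {"name", "kind", "version"}  # hypothetical minimal asset schema


@dataclass
class ValidationReport:
    errors: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_assets(assets: list[dict], deployed_versions: dict[str, int]) -> ValidationReport:
    """Run schema, cross-reference, and version checks before ingestion."""
    report = ValidationReport()
    known = {a.get("name") for a in assets} | set(deployed_versions)
    for asset in assets:
        name = asset.get("name", "<unnamed>")
        # 1. Schema validation: required fields must be present.
        missing = REQUIRED_FIELDS - set(asset)
        if missing:
            report.errors.append(f"{name}: missing fields {sorted(missing)}")
            continue
        # 2. Cross-layer refs: referenced assets must be in this batch or already deployed.
        for ref in asset.get("references", []):
            if ref not in known:
                report.errors.append(f"{name}: unknown reference {ref!r}")
        # 3. Version check: a re-ingested asset must carry a newer version.
        current = deployed_versions.get(name)
        if current is not None and asset["version"] <= current:
            report.errors.append(f"{name}: version {asset['version']} is not newer than {current}")
    return report


# YAML documents shown already parsed into dicts.
assets = [
    {"name": "fact_orders", "kind": "table_asset", "version": 3},
    {"name": "revenue", "kind": "metric_asset", "version": 2, "references": ["fact_orders"]},
    {"name": "gmv", "kind": "business_term", "version": 1, "references": ["fact_gmv"]},
]
report = validate_assets(assets, deployed_versions={"fact_orders": 3})
print(report.ok)
for error in report.errors:
    print(error)
```

Collecting every error instead of stopping at the first one suits a CI gate: a single run reports all broken assets in the batch.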

Compliance Coverage

GDPR · Enforced
HIPAA · Enforced
PCI-DSS · Enforced
SOX · Enforced

Query Planning Layer

Semantic Query Planner

Deterministic graph routing and algebraic reasoning sit between RAG retrieval and SQL generation, giving the LLM structured guidance so it generates safer, more accurate SQL in complex scenarios.

Pipeline Position

RAG Retrieval (R+V+G) → Reranker (Priority Sort) → Semantic Planner (Advisory) → SQL Generator (LLM) → Guardrails (Validate) → SQL Executor (Run)

KMB Graph Routing

Dijkstra + Kruskal MST finds minimal-cost table join paths with fanout risk weighting.

Weighted graph topology

Steiner node insertion

Fanout risk scoring

Cross-domain penalties
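KMB most likely refers to the Kou–Markowsky–Berman Steiner-tree heuristic, which matches the Dijkstra + Kruskal description: find the cheapest set of join edges connecting the tables a query needs, inserting intermediate (Steiner) tables only where required. The sketch below implements the textbook heuristic on a hypothetical join graph; the table names and edge costs (standing in for fanout risk and cross-domain penalties) are invented, and the product's actual weighting is not described here.

```python
import heapq
from itertools import combinations

Graph = dict[str, dict[str, float]]  # undirected: table -> {joinable table: edge cost}


def dijkstra(graph: Graph, source: str):
    """Shortest-path distances and predecessors from source."""
    dist = {source: 0.0}
    prev: dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    return dist, prev


def kruskal(nodes, edges):
    """Minimum spanning tree of (weight, u, v) edges using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def kmb_steiner_tree(graph: Graph, terminals: list[str]) -> list[tuple[str, str]]:
    """Kou-Markowsky-Berman heuristic; assumes all terminals are mutually reachable."""
    # 1. Dijkstra from every terminal gives the metric closure over the terminals.
    sp = {t: dijkstra(graph, t) for t in terminals}
    closure = [(sp[a][0][b], a, b) for a, b in combinations(terminals, 2)]
    # 2-3. MST of the closure, then expand each closure edge into its real join path.
    sub_edges: dict[frozenset, float] = {}
    for _, a, b in kruskal(terminals, closure):
        hops = [b]
        while hops[-1] != a:
            hops.append(sp[a][1][hops[-1]])
        for child, parent in zip(hops, hops[1:]):
            sub_edges[frozenset((child, parent))] = graph[parent][child]
    # 4. MST of the expanded subgraph removes any cycles the expansion introduced.
    nodes = {n for e in sub_edges for n in e}
    edges = [(u, v) for _, u, v in kruskal(nodes, [(w, *sorted(e)) for e, w in sub_edges.items()])]
    # 5. Repeatedly prune leaves that are not required tables.
    required = set(terminals)
    while True:
        degree: dict[str, int] = {}
        for u, v in edges:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        leaves = {n for n, d in degree.items() if d == 1 and n not in required}
        if not leaves:
            return edges
        edges = [(u, v) for u, v in edges if u not in leaves and v not in leaves]


# Hypothetical join graph; costs stand in for fanout risk and cross-domain penalties.
JOIN_GRAPH: Graph = {
    "fact_orders": {"dim_customer": 1.0, "dim_product": 1.0, "fact_sessions": 5.0},
    "dim_customer": {"fact_orders": 1.0, "dim_region": 1.0, "fact_sessions": 1.5},
    "dim_product": {"fact_orders": 1.0},
    "dim_region": {"dim_customer": 1.0, "fact_sessions": 4.0},
    "fact_sessions": {"fact_orders": 5.0, "dim_customer": 1.5, "dim_region": 4.0},
}

tree = kmb_steiner_tree(JOIN_GRAPH, ["fact_orders", "dim_product", "dim_region"])
print(tree)
```

Here `dim_customer` is not requested by the query, but it is inserted as a Steiner node because it is the cheapest bridge from `fact_orders` to `dim_region`.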

Semantic Algebraic Reasoning

Three rewrite rules detect and suggest fixes for chasm traps, redundant joins, and semi-join patterns.

Lossless join elimination

Aggregate-before-join

Semi/anti-join reasoning

Structured diagnostics
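The chasm trap behind AGGREGATE_BEFORE_JOIN is easy to reproduce: joining two fact tables through a shared dimension multiplies their rows, so naive aggregates are inflated. The sketch below uses Python's built-in `sqlite3` with hypothetical `dim_customer`, `fact_orders`, and `fact_sessions` tables to show the inflated result and the aggregate-first rewrite; it illustrates the rule, not the planner's own rewrite code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
CREATE TABLE fact_sessions (session_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO dim_customer VALUES (1, 'APAC');
INSERT INTO fact_orders VALUES (10, 1, 100.0), (11, 1, 50.0);
INSERT INTO fact_sessions VALUES (20, 1), (21, 1), (22, 1);
""")

# Naive: both fact tables joined row by row, so 2 orders x 3 sessions = 6 rows.
naive = conn.execute("""
SELECT c.region, SUM(o.amount) AS revenue, COUNT(s.session_id) AS sessions
FROM dim_customer c
JOIN fact_orders o ON o.customer_id = c.customer_id
JOIN fact_sessions s ON s.customer_id = c.customer_id
GROUP BY c.region
""").fetchall()

# Aggregate-before-join: collapse each fact table to one row per customer first.
fixed = conn.execute("""
SELECT c.region, SUM(o.revenue) AS revenue, SUM(s.sessions) AS sessions
FROM dim_customer c
JOIN (SELECT customer_id, SUM(amount) AS revenue
      FROM fact_orders GROUP BY customer_id) o ON o.customer_id = c.customer_id
JOIN (SELECT customer_id, COUNT(*) AS sessions
      FROM fact_sessions GROUP BY customer_id) s ON s.customer_id = c.customer_id
GROUP BY c.region
""").fetchall()

print(naive)  # revenue counted once per session, sessions once per order
print(fixed)
```

The naive query reports 450.0 in revenue and 6 sessions; pre-aggregating restores 150.0 and 3, because each fact table now contributes exactly one row per customer to the join.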

Safe Advisory Mode

Never blocks the pipeline: if planning fails, it returns empty guidance and the LLM proceeds independently.

Try/except wrapping

Empty fallback

No external calls

Zero blocking risk
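A minimal sketch of the advisory contract, assuming the planner is a function that may raise: a wrapper catches any exception and returns empty guidance so SQL generation continues unaided. `SemanticGuidance`, `safe_plan`, and the two demo planners are hypothetical names, not the product's API.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable

logger = logging.getLogger("semantic_planner")


@dataclass
class SemanticGuidance:
    diagnostics: list[str] = field(default_factory=list)
    fallback: bool = False  # True when guidance is empty because planning failed

    @property
    def empty(self) -> bool:
        return not self.diagnostics


def safe_plan(planner: Callable[[str], list[str]], question: str) -> SemanticGuidance:
    """Run the planner in advisory mode: never raise, never block the pipeline."""
    try:
        return SemanticGuidance(diagnostics=planner(question))
    except Exception as exc:  # deliberately broad: any planner failure degrades to no guidance
        logger.warning("semantic planning failed (%s); continuing without guidance", exc)
        return SemanticGuidance(fallback=True)


def good_planner(question: str) -> list[str]:
    return ["AGGREGATE_BEFORE_JOIN: fact_orders, fact_sessions"]


def flaky_planner(question: str) -> list[str]:
    raise RuntimeError("join graph unavailable")


print(safe_plan(good_planner, "Q2 revenue by region").diagnostics)
print(safe_plan(flaky_planner, "Q2 revenue by region").fallback)
```

Catching `Exception` rather than `BaseException` keeps the wrapper advisory without swallowing interpreter-level signals such as `KeyboardInterrupt`.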

semantic_plan.diagnostics

# Query: "Compare Q2 revenue by product across regions with active customer counts"

JOIN_ELIMINATED · table: dim_product · No columns referenced, N:1 leaf node
AGGREGATE_BEFORE_JOIN · table: fact_orders, fact_sessions · Chasm trap detected: 2 fact tables
SEMI_JOIN · table: dim_customer · EXISTS filter detected

✓ Guidance generated · 0 errors · fallback: safe

Capabilities

Everything your data team
actually needs

Not a wrapper around an LLM. A complete analytical intelligence platform built from first principles.

Trace-Native Memory

Four-layer personalized memory (profile, episodic, correction, working) learns your preferences and patterns — every interaction makes the agent smarter.


Full-Pipeline Observability

Every query, span, and decision traced end-to-end. Langfuse for deep tracing, AutoMQ for real-time event streaming.

LLM + ML Hybrid

LLM handles reasoning and planning. Traditional ML handles attribution, forecasting, clustering, anomaly detection, and regression.

Four-Level Analytical Depth

What → Why → Next → How. The system detects analytical depth and routes to the right execution path automatically.

Code Interpreter

Sandboxed Python execution with whitelisted ML libraries. The LLM generates analysis code, and the sandbox runs it safely under memory and time limits.

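On a POSIX system, the sandbox contract (whitelisted imports, memory and time limits) can be approximated with the standard library alone. The allow-list, limit values, and the `find_violations` / `run_sandboxed` names below are assumptions for illustration. An AST check plus `setrlimit` is not a security boundary on its own, since Python code can reach restricted functionality in other ways, so a production sandbox would add OS-level isolation such as containers or seccomp.

```python
import ast
import resource
import subprocess
import sys

# Assumed policy values; a real whitelist would name numpy, pandas, scikit-learn, etc.
ALLOWED_MODULES = {"math", "statistics", "json"}
BLOCKED_NAMES = {"__import__", "eval", "exec", "compile", "open"}
MEMORY_LIMIT = 1024 ** 3  # 1 GiB of address space
CPU_SECONDS = 5
WALL_TIMEOUT = 10.0


def find_violations(source: str) -> list[str]:
    """Static pre-check: imports outside the allow-list and blocked builtins."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or "."]  # relative imports are rejected too
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] not in ALLOWED_MODULES:
                violations.append(f"import of {module!r} is not allowed")
        if isinstance(node, ast.Name) and node.id in BLOCKED_NAMES:
            violations.append(f"use of {node.id!r} is not allowed")
    return violations


def _cap(kind: int, value: int) -> None:
    # Never try to raise a limit above the existing hard limit.
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def _apply_limits() -> None:
    # Runs in the child process between fork and exec (POSIX only).
    _cap(resource.RLIMIT_AS, MEMORY_LIMIT)
    _cap(resource.RLIMIT_CPU, CPU_SECONDS)


def run_sandboxed(source: str) -> str:
    violations = find_violations(source)
    if violations:
        raise PermissionError("; ".join(violations))
    result = subprocess.run(
        [sys.executable, "-I", "-c", source],  # -I: isolated mode (no env vars, no user site)
        capture_output=True,
        text=True,
        timeout=WALL_TIMEOUT,  # raises subprocess.TimeoutExpired if exceeded
        preexec_fn=_apply_limits,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout


print(run_sandboxed("import math\nprint(math.sqrt(16))"))
```

Running the code in a separate interpreter means a crash or limit violation kills only the child process; the caller sees a `RuntimeError` or `TimeoutExpired` instead of going down with it.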

Block-Based Output

Results as decision-ready blocks — narrative, metrics, charts, ML results, insights, and next-best-action suggestions.

Multi-Model Strategy

Smart model routing: DeepSeek-v3.2 for understanding, qwen3-coder for SQL, and Qwen3-max for insights, with automatic fallback chains.

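Automatic fallback chains can be sketched as an ordered list of models per task, tried in turn until one call succeeds. The task names, the fallback order, and the injected `call_model` function are hypothetical; a real deployment would call its provider SDK here and would likely retry only on transient errors.

```python
from typing import Callable

# Hypothetical routing table: the first model is preferred, the rest are fallbacks.
MODEL_CHAINS: dict[str, list[str]] = {
    "understanding": ["DeepSeek-v3.2", "Qwen3-max"],
    "sql": ["qwen3-coder", "DeepSeek-v3.2"],
    "insights": ["Qwen3-max", "DeepSeek-v3.2"],
}


class AllModelsFailed(RuntimeError):
    """Raised when every model in a chain fails."""


def route(task: str, prompt: str, call_model: Callable[[str, str], str]) -> tuple[str, str]:
    """Try each model for the task in order; return (model_used, response)."""
    errors = []
    for model in MODEL_CHAINS[task]:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # timeouts, rate limits, provider errors, ...
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    """Stand-in client in which the SQL model is unavailable."""
    if model == "qwen3-coder":
        raise TimeoutError("upstream timeout")
    return f"[{model}] answer to: {prompt}"


print(route("sql", "revenue by region", fake_call))
```

Returning the model that actually answered lets the caller record it, which is the kind of per-query routing decision that cost attribution needs.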

Pin to Dashboard

Convert conversational analysis into reusable BI dashboard widgets. Persist chart configs, SQL, filters, and ML context as durable assets.

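A pinned widget is essentially a serializable record of everything that produced the chart. The dataclass below is one hypothetical shape for such an asset (the field names are assumptions, not the product's schema), round-tripped through JSON to show that it can be persisted and reloaded intact.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PinnedWidget:
    title: str
    sql: str
    chart: dict                                     # chart config, e.g. type and axes
    filters: dict = field(default_factory=dict)     # filters active when the widget was pinned
    ml_context: dict = field(default_factory=dict)  # e.g. model name and parameters behind a forecast
    version: int = 1

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "PinnedWidget":
        return cls(**json.loads(payload))


widget = PinnedWidget(
    title="Q2 revenue by region",
    sql="SELECT region, SUM(amount) AS revenue FROM fact_orders GROUP BY region",
    chart={"type": "bar", "x": "region", "y": "revenue"},
    filters={"quarter": "Q2"},
)
restored = PinnedWidget.from_json(widget.to_json())
print(restored == widget)
```

Storing the SQL and filters alongside the chart config is what lets a dashboard re-run the widget later rather than showing a frozen snapshot.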

Memory Architecture

Four-layer memory that
compounds over time

Unlike stateless LLMs, Data Agent builds a persistent understanding of your team, data, and preferences — becoming more accurate with every query.

Layer 1

Profile Memory

Persistent user preferences, domain knowledge, analytical style, and role context. Loaded at session start.

user_role: data_analyst · preferred_chart: bar · domain: fintech

Layer 2

Episodic Memory

Successful query–result pairs and effective analysis patterns from past sessions. Retrieved via vector similarity.

episode_id: q-3847 · pattern: attribution_model · reuse_score: 0.94

Layer 3

Correction Memory

Captures caught errors, applied user corrections, and negative feedback so the agent does not repeat the same mistakes.

correction: avoid_yoy_comparison · trigger: fiscal_year_mismatch

Layer 4

Working Memory

Active session context, in-flight query state, partial results, and real-time clarification thread.

session: s-92f · active_query: revenue_attribution · turns: 6
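How the four layers might combine at query time can be sketched as a context-assembly step: profile memory is loaded as-is, episodic memory is filtered by similarity, correction memory is matched by trigger, and working memory is trimmed to recent turns. Everything here is hypothetical, including the `MemoryStore` fields, the word-overlap score standing in for vector similarity, and the threshold; it illustrates the layering, not the product's retrieval logic.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    profile: dict[str, str]                           # Layer 1: persistent preferences
    episodes: list[dict[str, str]]                    # Layer 2: {"question": ..., "pattern": ...}
    corrections: dict[str, str]                       # Layer 3: trigger word -> corrective note
    working: list[str] = field(default_factory=list)  # Layer 4: current session turns


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap: a crude stand-in for embedding cosine similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def build_context(mem: MemoryStore, question: str, threshold: float = 0.4, max_turns: int = 6) -> dict:
    words = set(question.lower().split())
    return {
        "profile": mem.profile,
        "episodes": [e["pattern"] for e in mem.episodes if similarity(e["question"], question) >= threshold],
        "corrections": [note for trigger, note in mem.corrections.items() if trigger in words],
        "working": mem.working[-max_turns:],
    }


mem = MemoryStore(
    profile={"user_role": "data_analyst", "preferred_chart": "bar", "domain": "fintech"},
    episodes=[
        {"question": "revenue attribution by channel", "pattern": "attribution_model"},
        {"question": "weekly churn forecast", "pattern": "time_series_forecast"},
    ],
    corrections={"yoy": "avoid_yoy_comparison (fiscal_year_mismatch)"},
    working=["show Q2 revenue"],
)
print(build_context(mem, "revenue attribution by region yoy"))
```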

Observability

Every decision,
traced end-to-end

No black boxes. Data Agent's ObservabilityFacade instruments every span — from intent detection to chart rendering — with full latency, token, and cost attribution.

Langfuse deep tracing

Full LLM call trees with prompt/response capture and latency breakdown

AutoMQ event streaming

Real-time pipeline events streamed for monitoring and replay

Cost attribution

Token spend, model routing decisions, and cache hit rates per query

langfuse trace · query-92f48a · live

» Compare Q2 APAC revenue vs forecast by segment

SUPERVISOR · 148 ms
  QUERY_UNDERSTANDING · 8 ms
  ROUTER → deep_analysis · 2 ms
  PLANNER · 15 ms
  DATA_RETRIEVAL (R+V+G) · 45 ms
  ANALYTICAL_AGENT · 38 ms
  SQL_GENERATION · 12 ms
  GUARDRAILS ✓ pass · 3 ms
  VISUALIZATION · 8 ms
  INSIGHTS + FOLLOWUP · 17 ms

Total: 148 ms · 3 models · 2 ML tools · $0.0042
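The trace above is a tree of timed spans in which the supervisor span encloses the others (its 148 ms equals the sum of the child spans). A minimal tracer with that shape can be built from a context manager that timestamps each span and nests it under whichever span is open. `Tracer`, `Span`, and `span` are hypothetical names; this is neither the Langfuse SDK nor the product's `ObservabilityFacade`, only an illustration of the span model.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    start: float = 0.0
    end: float = 0.0
    children: list["Span"] = field(default_factory=list)

    @property
    def duration_ms(self) -> float:
        return (self.end - self.start) * 1000


class Tracer:
    def __init__(self) -> None:
        self.roots: list[Span] = []
        self._stack: list[Span] = []

    @contextmanager
    def span(self, name: str):
        node = Span(name, start=time.perf_counter())
        # Attach to the currently open span, or start a new trace root.
        (self._stack[-1].children if self._stack else self.roots).append(node)
        self._stack.append(node)
        try:
            yield node
        finally:
            node.end = time.perf_counter()
            self._stack.pop()

    def render(self) -> str:
        lines: list[str] = []

        def walk(node: Span, depth: int) -> None:
            lines.append(f"{'  ' * depth}{node.name} {node.duration_ms:.0f}ms")
            for child in node.children:
                walk(child, depth + 1)

        for root in self.roots:
            walk(root, 0)
        return "\n".join(lines)


tracer = Tracer()
with tracer.span("SUPERVISOR"):
    with tracer.span("QUERY_UNDERSTANDING"):
        time.sleep(0.008)
    with tracer.span("ROUTER"):
        time.sleep(0.002)
print(tracer.render())
```

Closing spans in a `finally` block means a failing stage still records its latency, so errors show up in the trace instead of leaving a gap.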

Security

7-layer security,
enforced at every hop

Defense in depth from authentication to sandbox execution. Every query traverses every layer — no shortcuts, no bypasses.

Layer 1

JWT Authentication

Role-based access control + rate limiting

Layer 2

Permission Filter

Domain-scoped data access boundaries

Layer 3

Metadata Safety

PII exposure policy: hidden / masked / visible

Layer 4

Graph Constraints

Apache AGE enforces governance joins

Layer 5

SQL Policy Engine

SELECT-only whitelist + column validation

Layer 6

Execution Isolation

Dedicated workgroup + statement timeout

Layer 7

Sandbox & Result Masking

Code interpreter whitelist + column-level PII masking
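The SELECT-only whitelist and column validation of Layer 5 can be enforced by the database engine rather than by string matching. The sketch below uses SQLite's authorizer callback (`sqlite3.Connection.set_authorizer` in Python) as a stand-in for the product's PostgreSQL policy engine: each statement is checked while it is prepared, only reads are allowed, and reads are limited to an allow-list of columns. The `orders` table and the `ALLOWED_COLUMNS` policy are hypothetical.

```python
import sqlite3

# Hypothetical policy: tables and columns that generated SQL may read.
ALLOWED_COLUMNS = {"orders": {"order_id", "region", "amount"}}
SQLITE_FUNCTION = getattr(sqlite3, "SQLITE_FUNCTION", 31)  # action code 31 in sqlite3.h


def policy_authorizer(action, arg1, arg2, db_name, trigger_or_view):
    """Called by SQLite for each operation while a statement is prepared."""
    if action in (sqlite3.SQLITE_SELECT, SQLITE_FUNCTION):
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ:
        table, column = arg1, arg2
        allowed = ALLOWED_COLUMNS.get(table)
        # An empty column name means the table is read without columns, e.g. COUNT(*).
        if allowed is not None and (not column or column in allowed):
            return sqlite3.SQLITE_OK
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_DENY  # writes, DDL, PRAGMA, ATTACH, transactions, ...


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL, email TEXT);
        INSERT INTO orders VALUES (1, 'APAC', 100.0, 'a@example.com'),
                                  (2, 'APAC', 50.0, 'b@example.com'),
                                  (3, 'EMEA', 75.0, 'c@example.com');
    """)
    conn.commit()
    conn.set_authorizer(policy_authorizer)  # applies to every statement prepared from now on
    return conn


def run_guarded(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.DatabaseError as exc:  # policy denials surface as database errors
        raise PermissionError(f"rejected by SQL policy: {exc}") from exc


def is_permitted(conn: sqlite3.Connection, sql: str) -> bool:
    try:
        run_guarded(conn, sql)
        return True
    except PermissionError:
        return False


conn = build_demo()
print(run_guarded(conn, "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"))
print(is_permitted(conn, "SELECT email FROM orders"))  # PII column is not in the allow-list
print(is_permitted(conn, "DELETE FROM orders"))        # not a read
```

Because the check runs at prepare time, a rejected statement never executes, and the same rule covers column references hidden in `WHERE`, `ORDER BY`, or `SELECT *`.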

Get Started

Ready to let your
data think for itself?

Join 100+ data teams who ship insights in seconds, not weeks.

Get started free

Read the docs
