HydraDB
The RAG engine powering this app
HydraDB is a multi-tenant context and memory engine with hybrid recall, structured memory, and an edge-native API. This page shows exactly how H1BAgent uses it to turn 30+ USCIS documents into instant, cited answers.
| Metric | Value | Notes |
|---|---|---|
| Recall Latency (p50) | <80ms | Full recall endpoint |
| Ingest Throughput | ~50 docs/min | PDF extraction + embedding |
| Chunk Precision | High | Tunable via alpha parameter |
| Concurrency | Parallel | Both recall endpoints simultaneously |
| Auth | Bearer Token | Single API key per tenant |
| Runtime | Edge-native | No Node.js dependency for recall |
Why can’t I just self-host Pinecone + Neo4j?
Someone can. But can they hit the latency requirements, ingest at 1M+ tokens per minute, and beat the benchmarks, continuously? HydraDB’s core offering comes in two pieces: infra that scales seamlessly + accuracy on par with SOTA benchmarks.
Pinecone gives you vector search. Neo4j gives you a graph. Stitching them into a tuned system — ingestion at scale, low-latency recall, benchmark-grade accuracy, entity extraction, graph construction, identity inference — is a different problem. Think Vercel for retrieval: you could rebuild parts yourself, but you’re paying for the integrated, managed, continuously-tuned system.
HydraDB tracks SOTA on BEAM / LongMemEval, with finance-specific and private-dataset evaluations on the way. The claim isn’t “we’re easier.” It’s “we’re better, and the gap compounds as benchmarks move.” A DIY Pinecone + Neo4j stack is a static snapshot; HydraDB is a trajectory.
Real numbers from the engineering work already done: 1-2 million tokens per minute ingestion, multi-tenant by design, sub-second recall at scale. The honest question: can your team independently build and maintain ingestion + retrieval at this scale — not once, but continuously?
Entity extraction, graph construction, continuous graph updates, identity and persona inference — all part of the system, not a bolt-on. Already a real differentiator vs pure vector stores, with an active research sprint improving how graph structure feeds retrieval.
Kubernetes is hard. Ingestion pipelines are harder. Tuning retrieval at scale is harder than both. HydraDB’s integration surface is intentionally tiny — five lines of code to get production-grade ingestion and retrieval running. Self-host or managed cloud, same one-hour setup either way. The complexity is hidden, not removed.
Want infrastructure ownership? The answer isn’t open source — it’s BYOC (Bring Your Own Cloud). The entire platform deploys inside your own AWS account, one click. Isolated endpoint, stronger security posture, independence from the public HydraDB cloud. If hydradb.com goes down, BYOC customers are unaffected.
Anyone can assemble Pinecone + Neo4j + an embedding model. The moat is the team continuously operating that stack better than anyone else — benchmark tracking, latency work, ingestion engineering, graph research. Hard to replicate without a dedicated team whose only job is making retrieval better.
Especially relevant for companies whose core product is not search. Building a budgeting app, a CRM, a vertical AI agent? Your engineering time should go to your product, not to becoming a retrieval infrastructure company. H1BAgent is exactly that case — the product is H-1B data clarity, not search infra.
The Two Pieces
HydraDB's core offering — neither piece is extractable from Pinecone + Neo4j alone
What HydraDB actually is
Mental model: “Vercel for retrieval.”
Not a vector DB. Not a graph DB. A retrieval ecosystem: the product is the system tuned around the components.
“Humans don’t remember everything, but they don’t forget everything either. They recall the right thing when it’s needed.” HydraDB’s memory layer is that recall — across 5-year-old context if needed — for agents.
Components vs. System
Anyone can assemble the parts. The moat is operating them as one tuned system — continuously.
Pinecone + Neo4j (DIY) vs. HydraDB
The comparison that matters most
| Dimension | Pinecone + Neo4j (DIY) | HydraDB |
|---|---|---|
| What you get | Components | A tuned system |
| Setup | Architect, integrate, test | 5 lines of code |
| Ingest at 1-2M tokens/min | You build it | Already engineered |
| Retrieval accuracy | Whatever you tune to | SOTA, benchmark-driven |
| Graph + vector together | You stitch it | Native |
| Continuous improvement | Your team's side project | Dedicated team's full-time job |
| Ownership of infra | You host everything | BYOC, in your AWS account |
| When it breaks at 10x | You find out the hard way | Already engineered for it |
Snapshot vs. Trajectory
“Good enough today” is a snapshot. Benchmarks move, competitors advance, your data grows.
BYOC: Bring Your Own Cloud
The real answer to “I want to own the infra.” Better than open source for both sides.
Why not open source?
Because the value isn’t the code — it’s the infrastructure, tuning, and operational know-how built around it. If you handed a customer the source tomorrow, they’d still need to host it, tune it, maintain it, upgrade it, optimize latency and throughput.
Open-sourcing adds a maintenance burden for the team while giving customers something they couldn’t extract real value from anyway. Open source = headache for both sides.
The ownership ask is real — and the answer is BYOC, not open source.
Engineering velocity, not the stack
Anyone can assemble Pinecone + Neo4j + an embedding model. The moat is the team continuously operating that stack better than anyone else — benchmark tracking, latency work, ingestion engineering, ongoing graph research.
In one sentence: the compounding gap between a tuned, researched, benchmarked system and a stitched-together one.
“You should know when your system breaks at 10x or 100x. Most teams don’t — until it does.”
Better than most, not yet the best version
- Already a real differentiator vs pure vector stores
- Inference pipeline (how data is structured and inserted) is part of the moat
- Active research sprint planned to improve how graph structure feeds retrieval
- Knowing when context expires, not just storing everything forever
The honesty is a feature: roadmap, not frozen product.
Companies whose core product is not search
Budgeting apps. CRMs. Vertical AI agents. Your engineering time should go to your product, not to becoming a retrieval infrastructure company.
H1BAgent is exactly that case: the product is H-1B data clarity for applicants, not search infra. One engineer, one product focus, and HydraDB does the retrieval work that would otherwise swallow months.
System Architecture
How H1BAgent uses HydraDB for RAG
RAG Pipeline
End-to-end flow from data to answer
How H1BAgent Uses HydraDB
Real production architecture — not a demo
What's Inside the Knowledge Base
30+ documents ingested via upload.knowledge() and upload.addMemory(). Here's what the workbench queries against.
is_markdown: true and upsert: true for idempotent updates. At query time, both sources are searched in parallel via full_recall + recall_preferences, with the alpha parameter controlling the semantic vs keyword balance. The top chunks are injected into the LLM system prompt.
Live Workbench
Query HydraDB in real time — tune parameters, compare endpoints
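To make the alpha parameter concrete, here is a minimal sketch of how a semantic-vs-keyword balance is commonly computed: a weighted blend of two scores. This is an illustration under stated assumptions, not HydraDB's actual scoring. The `ScoredChunk` shape, the function names, the example values, and the convention that alpha = 1 means purely semantic are all hypothetical.

```typescript
// Hypothetical shape: each candidate chunk carries a semantic (vector) score
// and a keyword score, both assumed to be normalized to [0, 1].
interface ScoredChunk {
  text: string;
  semantic: number;
  keyword: number;
}

// Assumed convention: alpha = 1 is purely semantic, alpha = 0 is purely keyword.
function blendScore(chunk: ScoredChunk, alpha: number): number {
  if (alpha < 0 || alpha > 1) {
    throw new RangeError("alpha must be between 0 and 1");
  }
  return alpha * chunk.semantic + (1 - alpha) * chunk.keyword;
}

// Rank chunks by blended score, highest first, and keep the top k.
function rankChunks(chunks: ScoredChunk[], alpha: number, k: number): ScoredChunk[] {
  return [...chunks]
    .sort((a, b) => blendScore(b, alpha) - blendScore(a, alpha))
    .slice(0, k);
}

// Illustrative values only; these are not real recall results.
const candidates: ScoredChunk[] = [
  { text: "chunk matched mostly by meaning", semantic: 0.9, keyword: 0.2 },
  { text: "chunk matched mostly by exact terms", semantic: 0.4, keyword: 0.95 },
];

const semanticFirst = rankChunks(candidates, 1, 1);
const keywordFirst = rankChunks(candidates, 0, 1);
```

With these values, a high alpha surfaces the chunk that matches by meaning, while a low alpha surfaces the chunk that matches exact terms. That is the trade-off the workbench lets you tune.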
Core Capabilities
Every tenant gets its own isolated namespace. Sub-tenants enable hierarchical data organization — perfect for per-user, per-org, or per-project knowledge separation.
// Create isolated tenant
await hydra.tenant.create({
  tenant_id: "H1B"
});

// Data always scoped to tenant
await hydra.upload.knowledge({
  files: [pdfBuffer],
  tenant_id: "H1B",
  // Optional sub-tenant for finer isolation
  sub_tenant_id: "user_123"
});

- Complete data isolation between tenants
- Hierarchical sub-tenant support for nested workspaces
- 409 conflict handling for idempotent tenant creation
- Tenant-scoped queries, with no data leakage across boundaries
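The 409 handling mentioned in the list above can be sketched as a small wrapper that treats "already exists" as success, so tenant setup is safe to run on every deploy. This is a sketch under assumptions: the error shape (an object with a numeric `status`) and the `ensureTenant` helper are hypothetical, and the create call is injected so the example does not depend on the real SDK's signatures.

```typescript
// Hypothetical error shape: we assume the client rejects with an object
// exposing the HTTP status code. The real SDK's error type may differ.
interface HttpError {
  status: number;
}

function isHttpError(err: unknown): err is HttpError {
  return (
    typeof err === "object" &&
    err !== null &&
    typeof (err as { status?: unknown }).status === "number"
  );
}

// createTenant is injected; in the app it would wrap a call such as
// hydra.tenant.create from the snippet above.
async function ensureTenant(
  tenantId: string,
  createTenant: (args: { tenant_id: string }) => Promise<unknown>,
): Promise<"created" | "exists"> {
  try {
    await createTenant({ tenant_id: tenantId });
    return "created";
  } catch (err) {
    // 409 Conflict: the tenant already exists, so treat it as success.
    if (isHttpError(err) && err.status === 409) {
      return "exists";
    }
    throw err; // Any other failure is a real error and should surface.
  }
}
```

Calling `ensureTenant("H1B", ...)` twice would yield "created" and then "exists" against a backend that answers 409 for duplicates, while any other failure still propagates to the caller.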
With HydraDB vs. Without
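Without HydraDB:
import { streamText } from "ai";
import { google } from "@ai-sdk/google";

// messages: the chat history from the incoming request
const result = streamText({
  model: google("gemini-2.0-flash"),
  system: "You are an H-1B expert.",
  // No context: the LLM hallucinates
  // outdated data and wrong numbers
  messages,
});
LLM invents numbers, cites non-existent reports, mixes up fiscal years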
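With HydraDB:
import { streamText } from "ai";
import { google } from "@ai-sdk/google";

// Retrieve grounding context (hydraRecall is the app's own helper
// and returns an array of chunk strings)
const chunks = await hydraRecall(query);
const result = streamText({
  model: google("gemini-2.0-flash"),
  system: `You are an H-1B expert.
## Retrieved USCIS Data:
${chunks.join("\n---\n")}`,
  messages,
});
Every answer grounded in official USCIS data with correct fiscal year citations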
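The `hydraRecall` helper is not shown on this page. One plausible shape, given that both recall endpoints are queried in parallel, is sketched below. Everything here is an assumption for illustration: the `RecallChunk` type, the `recallBoth` name, the injected `fullRecall` and `recallPreferences` functions, and the idea that scores from the two endpoints are directly comparable.

```typescript
// Hypothetical types: the real response shape of the recall endpoints is not
// shown in the source, so a chunk is modeled as text plus a relevance score.
interface RecallChunk {
  text: string;
  score: number;
}

type RecallFn = (query: string) => Promise<RecallChunk[]>;

// Runs both recall calls concurrently, merges the results, de-duplicates by
// text (keeping the higher score), and returns the top-k chunk texts.
async function recallBoth(
  query: string,
  fullRecall: RecallFn,
  recallPreferences: RecallFn,
  topK = 5,
): Promise<string[]> {
  // A failing endpoint degrades to an empty list instead of failing the request.
  const safe = (fn: RecallFn) => fn(query).catch((): RecallChunk[] => []);
  const [fromFull, fromPrefs] = await Promise.all([
    safe(fullRecall),
    safe(recallPreferences),
  ]);

  const best = new Map<string, number>();
  for (const chunk of fromFull.concat(fromPrefs)) {
    const prev = best.get(chunk.text);
    if (prev === undefined || chunk.score > prev) {
      best.set(chunk.text, chunk.score);
    }
  }

  return Array.from(best.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, topK)
    .map((entry) => entry[0]);
}
```

The returned string array matches what the snippet above joins into the system prompt. Swallowing per-endpoint errors is a design choice for availability; an app that prefers to fail loudly could drop the `catch` and let `Promise.all` reject.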