HydraDB Technical Deep Dive

HydraDB

The RAG engine powering this app

HydraDB is a multi-tenant context and memory engine with hybrid recall, structured memory, and an edge-native API. This page shows exactly how H1BAgent uses it to turn 30+ USCIS documents into instant, cited answers.

Multi-Tenant · Hybrid Search · Memory System · Edge-Native API · PDF Ingestion

Recall Latency (p50): <80ms (full recall endpoint)
Ingest Throughput: ~50 docs/min (PDF extraction + embedding)
Chunk Precision: High (tunable via alpha parameter)
Concurrency: Parallel (both recall endpoints simultaneously)
Auth: Bearer Token (single API key per tenant)
Runtime: Edge-native (no Node.js dependency for recall)

Why can’t I just self-host Pinecone + Neo4j?

Someone can. But can they hit the latency requirements, ingest at 1M+ tokens per minute, and beat the benchmarks — continuously? HydraDB’s core offering comes in two pieces: infra that scales seamlessly + accuracy at par with SOTA benchmarks.

Not a vector DB. Not a graph DB. A retrieval ecosystem.

Pinecone gives you vector search. Neo4j gives you a graph. Stitching them into a tuned system — ingestion at scale, low-latency recall, benchmark-grade accuracy, entity extraction, graph construction, identity inference — is a different problem. Think Vercel for retrieval: you could rebuild parts yourself, but you’re paying for the integrated, managed, continuously-tuned system.

Accuracy, benchmark-driven

HydraDB tracks SOTA on BEAM / Long MemEval, with finance-specific and private-dataset evaluations on the way. The claim isn’t “we’re easier” — it’s “we’re better, and the gap compounds as benchmarks move.” A DIY Pinecone + Neo4j stack is a static snapshot; HydraDB is a trajectory.

Infra: 1-2M tokens/min ingestion, multi-tenant, low-latency

Real numbers from the engineering work already done: 1-2 million tokens per minute ingestion, multi-tenant by design, sub-second recall at scale. The honest question: can your team independently build and maintain ingestion + retrieval at this scale — not once, but continuously?

Vector + graph, inference included

Entity extraction, graph construction, continuous graph updates, identity and persona inference — all part of the system, not a bolt-on. Already a real differentiator vs pure vector stores, with an active research sprint improving how graph structure feeds retrieval.

Five lines of code, one hour to deploy

Kubernetes is hard. Ingestion pipelines are harder. Tuning retrieval at scale is harder than both. HydraDB’s integration surface is intentionally tiny — five lines of code to get production-grade ingestion and retrieval running. Self-host or managed cloud, same one-hour setup either way. The complexity is hidden, not removed.

BYOC: the real answer to “I want to own the infra”

Want infrastructure ownership? The answer isn’t open source — it’s BYOC (Bring Your Own Cloud). The entire platform deploys inside your own AWS account, one click. Isolated endpoint, stronger security posture, independence from the public HydraDB cloud. If hydradb.com goes down, BYOC customers are unaffected.

The moat: engineering velocity, not the stack

Anyone can assemble Pinecone + Neo4j + an embedding model. The moat is the team continuously operating that stack better than anyone else — benchmark tracking, latency work, ingestion engineering, graph research. Hard to replicate without a dedicated team whose only job is making retrieval better.

Who this is for

Especially relevant for companies whose core product is not search. Building a budgeting app, a CRM, a vertical AI agent? Your engineering time should go to your product, not to becoming a retrieval infrastructure company. H1BAgent is exactly that case — the product is H-1B data clarity, not search infra.

TL;DR: Two pieces you can’t stitch together from parts: infra that scales (1-2M tokens/min ingest, multi-tenant, low-latency recall) and accuracy that stays at SOTA (benchmark-driven, continuously tuned). Self-hosted Pinecone + Neo4j gives you components; HydraDB gives you the system.

Learn more at hydradb.com

The Two Pieces

HydraDB's core offering — neither piece is extractable from Pinecone + Neo4j alone

INFRA that scales seamlessly
  • 1-2M tokens/min ingestion
  • Multi-tenant by design
  • Sub-second recall latency
  • BYOC: deploy in your AWS
  • 5 lines of code, 1hr setup

+

ACCURACY at par with SOTA benchmarks
  • BEAM benchmark tracking
  • Long MemEval evaluations
  • Finance-specific benchmarks
  • Private dataset evals
  • Continuously tuned, not static

HydraDB: retrieval ecosystem, not a DB

What HydraDB actually is

Mental model: “Vercel for retrieval.”

Not a vector DB. Not a graph DB. A retrieval ecosystem: the product is the system tuned around the components.

Vector search
Graph search
Entity extraction
Graph construction
Identity inference
Continuous tuning

“Humans don’t remember everything, but they don’t forget everything either. They recall the right thing when it’s needed.” HydraDB’s memory layer is that recall — across 5-year-old context if needed — for agents.

Components vs. System

Anyone can assemble the parts. The moat is operating them as one tuned system — continuously.

DIY stack: you own all the glue
  • Pinecone
  • Neo4j
  • Embedder
  • Graph infer
  • Tuning loop
  • Ingest queue
Open questions: latency? scaling? benchmarks? ingest rate?

vs

HydraDB: one system, tuned
  • Vector search
  • Graph layer
  • Embedder
  • Graph infer
  • Ingest queue
  • Tuning loop
Continuous benchmark + research loop: BEAM / Long MemEval / finance / private evals
5 lines of code to integrate

Pinecone + Neo4j (DIY) vs. HydraDB

The comparison that matters most

Dimension | Pinecone + Neo4j (DIY) | HydraDB
What you get | Components | A tuned system
Setup | Architect, integrate, test | 5 lines of code
Ingest at 1-2M tokens/min | You build it | Already engineered
Retrieval accuracy | Whatever you tune to | SOTA, benchmark-driven
Graph + vector together | You stitch it | Native
Continuous improvement | Your team's side project | Dedicated team's full-time job
Ownership of infra | You host everything | BYOC, in your AWS account
When it breaks at 10x | You find out the hard way | Already engineered for it

Snapshot vs. Trajectory

“Good enough today” is a snapshot. Benchmarks move, competitors advance, your data grows.

[Chart: retrieval quality over time, DIY stack vs. HydraDB, with milestones BEAM v1, Long MemEval, Finance eval, Private evals]

A stitched-together stack is static. A tuned system compounds.

BYOC: Bring Your Own Cloud

The real answer to “I want to own the infra.” Better than open source for both sides.

Your AWS Account: isolated deployment, customer-owned.

HydraDB (full platform)
  • Ingest API
  • Vector index
  • Graph store
  • Recall API
  • Memory layer
  • Inference pipeline
  • Benchmark engine

Your endpoint. Your data. Your perimeter.

Public HydraDB Cloud (hydradb.com): isolated from your deployment. If this goes down, BYOC customers are unaffected; there is no runtime dependency.

Closed source, by design

Why not open source?

Because the value isn’t the code — it’s the infrastructure, tuning, and operational know-how built around it. If you handed a customer the source tomorrow, they’d still need to host it, tune it, maintain it, upgrade it, optimize latency and throughput.

Open-sourcing adds a maintenance burden for the team while giving customers something they couldn’t extract real value from anyway. Open source = headache for both sides.

The ownership ask is real — and the answer is BYOC, not open source.

The moat

Engineering velocity, not the stack

Anyone can assemble Pinecone + Neo4j + an embedding model. The moat is the team continuously operating that stack better than anyone else — benchmark tracking, latency work, ingestion engineering, ongoing graph research.

In one sentence: the compounding gap between a tuned, researched, benchmarked system and a stitched-together one.

“You should know when your system breaks at 10x or 100x. Most teams don’t — until it does.”

Graph layer — honest status

Better than most, not yet the best version

  • Already a real differentiator vs pure vector stores
  • Inference pipeline (how data is structured and inserted) is part of the moat
  • Active research sprint planned to improve how graph structure feeds retrieval
  • Knowing when context expires, not just storing everything forever

The honesty is a feature: roadmap, not frozen product.

Who this is for

Companies whose core product is not search

Budgeting apps. CRMs. Vertical AI agents. Your engineering time should go to your product, not to becoming a retrieval infrastructure company.

H1BAgent is exactly that case: the product is H-1B data clarity for applicants, not search infra. One engineer, one product focus, and HydraDB does the retrieval work that would otherwise swallow months.

System Architecture

How H1BAgent uses HydraDB for RAG

[Diagram: HydraDB at the core, with five groups: Ingestion, Core, Retrieval, Isolation, Output]
API calls: upload.knowledge(), upload.addMemory(), recall/full_recall, recall/recall_preferences
Components: PDF Upload, Memory Store, Markdown Ingest, Metadata Tags, Full Recall, Preference Recall, Hybrid Search, Multi-Tenant, Sub-Tenants, Ranked Chunks

RAG Pipeline

End-to-end flow from data to answer

1. Ingest: Upload PDFs + structured markdown with metadata
2. Process: Auto-chunk, embed, and index into tenant namespace
3. Recall: Hybrid semantic + keyword search with alpha tuning
4. Augment: Inject ranked chunks into LLM system prompt
5. Respond: Stream grounded, cited answers to the user
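Step 3 depends on how alpha balances semantic and keyword relevance. The page does not give HydraDB's actual scoring formula, so the following is only a minimal sketch of one common approach, a linear blend of two normalized scores. The `ScoredChunk` shape, the function names, and the direction of alpha (1 meaning fully semantic) are all assumptions made for illustration.

```typescript
// Hypothetical shape: one candidate chunk with two scores normalized to [0, 1].
interface ScoredChunk {
  id: string;
  semantic: number; // e.g. embedding similarity
  keyword: number; // e.g. a normalized lexical match score
}

// Linear blend. Assumption: alpha = 1 is purely semantic, alpha = 0 purely keyword.
function hybridScore(chunk: ScoredChunk, alpha: number): number {
  const a = Math.min(1, Math.max(0, alpha)); // clamp alpha into [0, 1]
  return a * chunk.semantic + (1 - a) * chunk.keyword;
}

// Rank candidates by blended score, best first, and keep the top k.
function rankChunks(chunks: ScoredChunk[], alpha: number, k: number): ScoredChunk[] {
  return chunks
    .slice() // copy so the caller's array is not reordered
    .sort((x, y) => hybridScore(y, alpha) - hybridScore(x, alpha))
    .slice(0, k);
}
```

With a blend like this, lowering alpha favors chunks that match exact terms such as a fiscal year or form number, while raising it favors chunks that are close in meaning to the question.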

How H1BAgent Uses HydraDB

Real production architecture — not a demo

Ingested: 30+ USCIS documents (15 PDFs + 15 structured markdown covering FY2001-FY2027)
Recall Strategy: 2x parallel recall paths (full_recall + recall_preferences) merged into a single context window
Runtime: Edge. Pure fetch() API on Cloudflare Workers edge runtime; no Node SDK needed at query time
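Merging two recall paths into one context window raises the question of duplicates and ordering. The page does not describe how H1BAgent does this, so the sketch below is one plausible approach under stated assumptions: each chunk carries a stable `id` and a `score`, and scores from both endpoints are comparable. `RecalledChunk` and `mergeRecall` are hypothetical names.

```typescript
// Hypothetical chunk shape; the real recall response may differ.
interface RecalledChunk {
  id: string; // assumed stable identifier for a chunk
  text: string;
  score: number; // higher means more relevant
}

// Merge both result lists, keep the best-scoring copy of each chunk,
// then return the top `maxChunks` by score.
function mergeRecall(
  full: RecalledChunk[],
  prefs: RecalledChunk[],
  maxChunks: number,
): RecalledChunk[] {
  const best = new Map<string, RecalledChunk>();
  for (const chunk of full.concat(prefs)) {
    const seen = best.get(chunk.id);
    if (!seen || chunk.score > seen.score) best.set(chunk.id, chunk);
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, maxChunks);
}
```

Capping the merged list keeps the system prompt within a predictable size regardless of how many chunks each path returns.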

What's Inside the Knowledge Base

30+ documents ingested via upload.knowledge() and upload.addMemory(). Here's what the workbench queries against:

PDF Documents (15 files via upload.knowledge)
  • Characteristics of H-1B Workers FY2024
  • Characteristics of H-1B Workers FY2023
  • Characteristics of H-1B Workers FY2020
  • Characteristics of H-1B Workers FY2019
  • Characteristics of H-1B Workers FY2018
  • Characteristics of H-1B Workers FY2005
  • Characteristics of H-1B Workers FY2004
  • Report on H-1B Petitions FY2024
  • Report on H-1B Petitions FY2023
  • Report on H-1B Petitions FY2019
  • H-1B Trend Tables FY2007-FY2017
  • Top 30 H-1B Employers FY2018
  • FY2025 E-Registration Process
  • H-1B Weighted Selection Compliance Guide
  • H-1B Characteristics FY2023 (supplemental)

Structured Memories (15 docs via upload.addMemory)
  • H-1B Program Overview: Visa category basics, dual intent, specialty occupation definition
  • Cap History & Legislative Changes: 65k/85k cap evolution, ACWIA, AC21, H-1B Reform Act
  • Petitions Filed/Approved FY2001-FY2024: Year-by-year petition volumes with approval counts
  • Petition Types FY2024: Initial vs continuing employment breakdown
  • Country of Birth Data: India, China, Canada share trends across 20 years
  • Age & Sex Demographics: Median age 34, gender split 70/30
  • Education Levels: Bachelor's vs master's shift from 57/31 to 33/46
  • Top Occupations: Computer occupations rising from 44% to 64%
  • Compensation Data: Median salary $52k (FY03) to $120k (FY24) by occupation
  • Industry Sectors: Professional services, tech, finance distribution
  • Registration & Lottery System: FY21-FY27 registration counts, selection rates, methods
  • Fees Structure: Filing fees, ACWIA, fraud prevention, asylum program fees
  • Premium Processing & RFE: I-907 stats, RFE rates, denial trends
  • Top Employers FY2007-2017: Infosys, Tata, Cognizant, Wipro petition volumes
  • FY2027 Weighted Lottery: Wage-level based selection probabilities

How it works: PDFs are auto-chunked and embedded by HydraDB. Structured memories are stored as markdown with is_markdown: true and upsert: true for idempotent updates. At query time, both sources are searched in parallel via full_recall + recall_preferences, with the alpha parameter controlling the semantic vs keyword balance. The top chunks are injected into the LLM system prompt.
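The parallel query step can be sketched with plain fetch-style calls and Promise.all, which matches the edge-runtime constraint of not needing a Node SDK. Only the two endpoint paths, the bearer-token auth, and the alpha parameter come from this page. The base URL, the request body fields, the `chunks` response field, and every function and type name below are assumptions for illustration, not the documented HydraDB API.

```typescript
// Minimal fetch-compatible signature, so the sketch has no runtime dependency.
// On an edge runtime the global fetch can be passed in as `fetchFn`.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface RecallOptions {
  baseUrl: string; // hypothetical: wherever the tenant's HydraDB API is served
  apiKey: string; // sent as a bearer token
  tenantId: string;
  alpha: number; // semantic vs keyword balance
}

// Call one recall endpoint. The body and response shapes are assumed.
async function callRecall(
  fetchFn: FetchLike,
  path: string,
  query: string,
  opts: RecallOptions,
): Promise<string[]> {
  const res = await fetchFn(`${opts.baseUrl}/${path}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${opts.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query, tenant_id: opts.tenantId, alpha: opts.alpha }),
  });
  if (!res.ok) throw new Error(`${path} failed with status ${res.status}`);
  const data = (await res.json()) as { chunks?: string[] };
  return data.chunks ?? [];
}

// Fire both recall paths at once and concatenate their chunks.
async function recallInParallel(
  fetchFn: FetchLike,
  query: string,
  opts: RecallOptions,
): Promise<string[]> {
  const [full, prefs] = await Promise.all([
    callRecall(fetchFn, "recall/full_recall", query, opts),
    callRecall(fetchFn, "recall/recall_preferences", query, opts),
  ]);
  return full.concat(prefs);
}
```

Because both requests start before either is awaited, total latency is roughly that of the slower endpoint rather than the sum of the two. Note that Promise.all rejects if either call fails; a caller that prefers partial results could use Promise.allSettled instead.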

Live Workbench

Query HydraDB in real time — tune parameters, compare endpoints


Core Capabilities


Every tenant gets its own isolated namespace. Sub-tenants enable hierarchical data organization — perfect for per-user, per-org, or per-project knowledge separation.

TypeScript
// Assumes `hydra` is an already-initialized HydraDB SDK client
// and `pdfBuffer` holds the bytes of one PDF.

// Create isolated tenant
await hydra.tenant.create({
  tenant_id: "H1B"
});

// Data is always scoped to a tenant
await hydra.upload.knowledge({
  files: [pdfBuffer],
  tenant_id: "H1B",
  // Optional sub-tenant for finer isolation
  sub_tenant_id: "user_123"
});
  • Complete data isolation between tenants
  • Hierarchical sub-tenant support for nested workspaces
  • 409 conflict handling for idempotent tenant creation
  • Tenant-scoped queries: no data leakage across boundaries
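The 409 handling mentioned above can be sketched as a small wrapper that treats "tenant already exists" as success, so setup scripts can run repeatedly. This is a sketch under assumptions: `TenantClient` is a hypothetical minimal stand-in for the SDK surface used in the sample above, and the idea that a conflict surfaces as a thrown error with a numeric `status` field is assumed, not taken from HydraDB documentation.

```typescript
// Hypothetical minimal client surface; the real SDK's types may differ.
interface TenantClient {
  tenant: { create(args: { tenant_id: string }): Promise<unknown> };
}

// Assumption: a conflict is reported as an error object with `status === 409`.
function isConflict(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 409
  );
}

// Create the tenant if needed; report whether it was created or already there.
async function ensureTenant(
  client: TenantClient,
  tenantId: string,
): Promise<"created" | "exists"> {
  try {
    await client.tenant.create({ tenant_id: tenantId });
    return "created";
  } catch (err) {
    if (isConflict(err)) return "exists"; // already present: treat as success
    throw err; // anything else is a real failure
  }
}
```

Calling `ensureTenant(hydra, "H1B")` before every ingest run would then be safe whether or not the tenant has been created before.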

With HydraDB vs. Without

Without RAG
import { streamText } from "ai";
import { google } from "@ai-sdk/google";

const result = streamText({
  model: google("gemini-2.0-flash"),
  system: "You are an H-1B expert.",
  // No retrieved context: the LLM may hallucinate
  // outdated data and wrong numbers
  messages,
});

LLM invents numbers, cites non-existent reports, mixes up fiscal years

With HydraDB RAG
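import { streamText } from "ai";
import { google } from "@ai-sdk/google";

// Retrieve grounding context (hydraRecall is the app's own recall helper)
const chunks = await hydraRecall(query);

const result = streamText({
  model: google("gemini-2.0-flash"),
  system: `You are an H-1B expert.
    ## Retrieved USCIS Data:
    ${chunks.join("\n---\n")}`,
  messages,
});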

Every answer grounded in official USCIS data with correct fiscal year citations
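For the model to cite the right fiscal year, each chunk in the prompt needs to be traceable to its source document. The sample above joins plain strings, so the following is only a sketch of one way to label chunks, assuming the recall step can return a source title alongside the text. `SourcedChunk` and `buildSystemPrompt` are hypothetical names, and the instruction wording is illustrative.

```typescript
// Hypothetical shape: a chunk paired with the title of the document it came from.
interface SourcedChunk {
  source: string; // e.g. "Report on H-1B Petitions FY2024"
  text: string;
}

// Number each chunk and prefix it with its source so answers can cite "[n]".
function buildSystemPrompt(chunks: SourcedChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] ${c.source}\n${c.text}`)
    .join("\n---\n");
  return [
    "You are an H-1B expert.",
    "Answer only from the retrieved USCIS data below and cite sources by their [n] label.",
    "## Retrieved USCIS Data:",
    context,
  ].join("\n");
}
```

The returned string can be passed as the `system` value in the `streamText` call shown above.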