HydraDB

The RAG engine powering this app

HydraDB is a multi-tenant context and memory engine with hybrid recall, structured memory, and an edge-native API. This page shows exactly how H1BAgent uses it to turn 30+ USCIS documents into instant, cited answers.


Multi-Tenant · Hybrid Search · Memory System · Edge-Native API · PDF Ingestion

Recall Latency (p50): <80ms (full recall endpoint)
Ingest Throughput: ~50 docs/min (PDF extraction + embedding)
Chunk Precision: High (tunable via alpha parameter)
Concurrency: Parallel (both recall endpoints simultaneously)
Auth: Bearer Token (single API key per tenant)
Runtime: Edge-native (no Node.js dependency for recall)

Why can’t I just self-host Pinecone + Neo4j?

You can. But can you hit the latency requirements, ingest at 1M+ tokens per minute, and beat the benchmarks, continuously? HydraDB’s core offering comes in two pieces: infrastructure that scales seamlessly, and accuracy on par with SOTA benchmarks.

Not a vector DB. Not a graph DB. A retrieval ecosystem.

Pinecone gives you vector search. Neo4j gives you a graph. Stitching them into a tuned system — ingestion at scale, low-latency recall, benchmark-grade accuracy, entity extraction, graph construction, identity inference — is a different problem. Think Vercel for retrieval: you could rebuild parts yourself, but you’re paying for the integrated, managed, continuously-tuned system.

Accuracy, benchmark-driven

HydraDB tracks SOTA on BEAM / LongMemEval, with finance-specific and private-dataset evaluations on the way. The claim isn’t “we’re easier”; it’s “we’re better, and the gap compounds as benchmarks move.” A DIY Pinecone + Neo4j stack is a static snapshot; HydraDB is a trajectory.

Infra: 1-2M tokens/min ingestion, multi-tenant, low-latency

Real numbers from the engineering work already done: 1-2 million tokens per minute ingestion, multi-tenant by design, sub-second recall at scale. The honest question: can your team independently build and maintain ingestion + retrieval at this scale — not once, but continuously?

Vector + graph, inference included

Entity extraction, graph construction, continuous graph updates, identity and persona inference — all part of the system, not a bolt-on. Already a real differentiator vs pure vector stores, with an active research sprint improving how graph structure feeds retrieval.

Five lines of code, one hour to deploy

Kubernetes is hard. Ingestion pipelines are harder. Tuning retrieval at scale is harder than both. HydraDB’s integration surface is intentionally tiny — five lines of code to get production-grade ingestion and retrieval running. Self-host or managed cloud, same one-hour setup either way. The complexity is hidden, not removed.

BYOC — the real answer to “I want to own the infra”

Want infrastructure ownership? The answer isn’t open source — it’s BYOC (Bring Your Own Cloud). The entire platform deploys inside your own AWS account, one click. Isolated endpoint, stronger security posture, independence from the public HydraDB cloud. If hydradb.com goes down, BYOC customers are unaffected.

The moat: engineering velocity, not the stack

Anyone can assemble Pinecone + Neo4j + an embedding model. The moat is the team continuously operating that stack better than anyone else — benchmark tracking, latency work, ingestion engineering, graph research. Hard to replicate without a dedicated team whose only job is making retrieval better.

Who this is for

Especially relevant for companies whose core product is not search. Building a budgeting app, a CRM, a vertical AI agent? Your engineering time should go to your product, not to becoming a retrieval infrastructure company. H1BAgent is exactly that case — the product is H-1B data clarity, not search infra.

TL;DR: Two pieces you can’t stitch together from parts: infra that scales (1-2M tokens/min ingest, multi-tenant, low-latency recall) and accuracy that stays at SOTA (benchmark-driven, continuously tuned). Self-hosted Pinecone + Neo4j gives you components; HydraDB gives you the system.

Learn more at hydradb.com.

The Two Pieces

HydraDB's core offering — neither piece is extractable from Pinecone + Neo4j alone


What HydraDB actually is

Mental model: “Vercel for retrieval.”

Not a vector DB. Not a graph DB. A retrieval ecosystem: the product is the system tuned around the components.

Vector search

Graph search

Entity extraction

Graph construction

Identity inference

Continuous tuning

“Humans don’t remember everything, but they don’t forget everything either. They recall the right thing when it’s needed.” HydraDB’s memory layer is that recall — across 5-year-old context if needed — for agents.

Components vs. System

Anyone can assemble the parts. The moat is operating them as one tuned system — continuously.

Pinecone + Neo4j (DIY) vs. HydraDB

The comparison that matters most

Dimension	Pinecone + Neo4j (DIY)	HydraDB
What you get	Components	A tuned system
Setup	Architect, integrate, test	5 lines of code
Ingest at 1-2M tokens/min	You build it	Already engineered
Retrieval accuracy	Whatever you tune to	SOTA, benchmark-driven
Graph + vector together	You stitch it	Native
Continuous improvement	Your team's side project	Dedicated team's full-time job
Ownership of infra	You host everything	BYOC, in your AWS account
When it breaks at 10x	You find out the hard way	Already engineered for it

Snapshot vs. Trajectory

“Good enough today” is a snapshot. Benchmarks move, competitors advance, your data grows.

BYOC: Bring Your Own Cloud

The real answer to “I want to own the infra.” Better than open source for both sides.

Closed source, by design

Why not open source?

Because the value isn’t the code — it’s the infrastructure, tuning, and operational know-how built around it. If you handed a customer the source tomorrow, they’d still need to host it, tune it, maintain it, upgrade it, optimize latency and throughput.

Open-sourcing adds a maintenance burden for the team while giving customers something they couldn’t extract real value from anyway. Open source = headache for both sides.

The ownership ask is real — and the answer is BYOC, not open source.

The moat

Engineering velocity, not the stack

In one sentence: the compounding gap between a tuned, researched, benchmarked system and a stitched-together one.

“You should know when your system breaks at 10x or 100x. Most teams don’t — until it does.”

Graph layer — honest status

Better than most, not yet the best version

+ Already a real differentiator vs pure vector stores
+ Inference pipeline (how data is structured and inserted) is part of the moat
+ Active research sprint planned to improve how graph structure feeds retrieval
+ Knowing when context expires, not just storing everything forever

The honesty is a feature: roadmap, not frozen product.

Who this is for

Companies whose core product is not search

Budgeting apps. CRMs. Vertical AI agents. Your engineering time should go to your product, not to becoming a retrieval infrastructure company.

H1BAgent is exactly that case: the product is H-1B data clarity for applicants, not search infra. One engineer, one product focus, and HydraDB does the retrieval work that would otherwise swallow months.

System Architecture

How H1BAgent uses HydraDB for RAG

RAG Pipeline

End-to-end flow from data to answer

Ingest: Upload PDFs + structured markdown with metadata
Process: Auto-chunk, embed, and index into tenant namespace
Recall: Hybrid semantic + keyword search with alpha tuning
Augment: Inject ranked chunks into LLM system prompt
Respond: Stream grounded, cited answers to the user
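The Process step above auto-chunks documents before embedding. HydraDB's actual chunking strategy is not described on this page, so the sketch below assumes a naive fixed-size window with overlap as a stand-in. The names `chunkText`, `size`, and `overlap` are hypothetical and used only for illustration.

```typescript
// Hypothetical illustration: fixed-size chunking with overlap.
// This is NOT HydraDB's documented algorithm, only a common baseline.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0) throw new Error("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new Error("overlap must be in [0, size)");
  }
  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once the window has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of some duplicated text in the index.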

How H1BAgent Uses HydraDB

Real production architecture — not a demo

Ingested: 30+ USCIS documents (15 PDFs + 15 structured markdown covering FY2001-FY2027)
Recall Strategy: Parallel recall paths (full_recall + recall_preferences) merged into a single context window
Runtime: Edge. Pure fetch() API on the Cloudflare Workers edge runtime; no Node SDK needed at query time
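The parallel recall strategy can be sketched with `Promise.all`. Because this page does not show HydraDB's endpoint URLs or response format, the two recall paths are passed in as plain async functions. The `Chunk` shape (`id`, `text`, `score`) and the names `mergeChunks` and `recallParallel` are assumptions made for this example.

```typescript
// Hypothetical shape: the real HydraDB response format is not shown on this page.
interface Chunk {
  id: string;
  text: string;
  score: number;
}

// A recall path is modeled as any async function from query to chunks.
type RecallFn = (query: string) => Promise<Chunk[]>;

// Merge two result lists, keep the best-scoring copy of each chunk id,
// and return the top `limit` chunks by descending score.
function mergeChunks(a: Chunk[], b: Chunk[], limit: number): Chunk[] {
  const best = new Map<string, Chunk>();
  for (const chunk of a.concat(b)) {
    const seen = best.get(chunk.id);
    if (!seen || chunk.score > seen.score) best.set(chunk.id, chunk);
  }
  return Array.from(best.values())
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Run both recall paths concurrently and merge them into one context list.
// The default of 5 mirrors the workbench's default chunk count.
async function recallParallel(
  query: string,
  fullRecall: RecallFn,
  recallPreferences: RecallFn,
  limit = 5,
): Promise<Chunk[]> {
  const [full, prefs] = await Promise.all([
    fullRecall(query),
    recallPreferences(query),
  ]);
  return mergeChunks(full, prefs, limit);
}
```

Running the two calls concurrently means total latency is roughly that of the slower path rather than the sum of both. Deduplicating by id keeps a chunk returned by both paths from taking two slots in the context window.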

What's Inside the Knowledge Base

30+ documents ingested via upload.knowledge() and upload.addMemory(). Here's what the workbench queries against.

PDF Documents (15 files via upload.knowledge)

Characteristics of H-1B Workers FY2024
Characteristics of H-1B Workers FY2023
Characteristics of H-1B Workers FY2020
Characteristics of H-1B Workers FY2019
Characteristics of H-1B Workers FY2018
Characteristics of H-1B Workers FY2005
Characteristics of H-1B Workers FY2004
Report on H-1B Petitions FY2024
Report on H-1B Petitions FY2023
Report on H-1B Petitions FY2019
H-1B Trend Tables FY2007-FY2017
Top 30 H-1B Employers FY2018
FY2025 E-Registration Process
H-1B Weighted Selection Compliance Guide
H-1B Characteristics FY2023 (supplemental)

Structured Memories (15 docs via upload.addMemory)

H-1B Program Overview: Visa category basics, dual intent, specialty occupation definition
Cap History & Legislative Changes: 65k/85k cap evolution, ACWIA, AC21, H-1B Reform Act
Petitions Filed/Approved FY2001-FY2024: Year-by-year petition volumes with approval counts
Petition Types FY2024: Initial vs continuing employment breakdown
Country of Birth Data: India, China, Canada share trends across 20 years
Age & Sex Demographics: Median age 34, gender split 70/30
Education Levels: Bachelor's vs master's shift from 57/31 to 33/46
Top Occupations: Computer occupations rising from 44% to 64%
Compensation Data: Median salary $52k (FY03) to $120k (FY24) by occupation
Industry Sectors: Professional services, tech, finance distribution
Registration & Lottery System: FY21-FY27 registration counts, selection rates, methods
Fees Structure: Filing fees, ACWIA, fraud prevention, asylum program fees
Premium Processing & RFE: I-907 stats, RFE rates, denial trends
Top Employers FY2007-2017: Infosys, Tata, Cognizant, Wipro petition volumes
FY2027 Weighted Lottery: Wage-level based selection probabilities
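This page says the structured memories are stored as markdown with `is_markdown: true` and `upsert: true`. The sketch below shows why an upsert flag makes re-ingestion idempotent. It uses a hypothetical in-memory store and does not call the HydraDB SDK; `MemoryDoc`, `MemoryStore`, and the `id`, `title`, and `content` fields are assumptions.

```typescript
// Hypothetical in-memory model; it does not call the HydraDB SDK.
interface MemoryDoc {
  id: string; // assumed stable identifier per memory
  title: string;
  content: string; // markdown body
  is_markdown: boolean;
}

class MemoryStore {
  private docs = new Map<string, MemoryDoc>();

  // With upsert=true an existing id is overwritten instead of duplicated,
  // so re-running the same ingestion script leaves one copy per memory.
  add(doc: MemoryDoc, opts: { upsert: boolean }): void {
    if (this.docs.has(doc.id) && !opts.upsert) {
      throw new Error(`memory ${doc.id} already exists`);
    }
    this.docs.set(doc.id, doc);
  }

  get(id: string): MemoryDoc | undefined {
    return this.docs.get(id);
  }

  count(): number {
    return this.docs.size;
  }
}
```

Under this model, updating a memory such as "Fees Structure" after a fee change is the same call as the first ingestion, which keeps the ingestion script safe to re-run.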

How it works: PDFs are auto-chunked and embedded by HydraDB. Structured memories are stored as markdown with is_markdown: true and upsert: true for idempotent updates. At query time, both sources are searched in parallel via full_recall + recall_preferences, with the alpha parameter controlling the semantic vs keyword balance. The top chunks are injected into the LLM system prompt.
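One common way for an alpha parameter to balance semantic and keyword relevance is a linear blend of the two scores. HydraDB's exact formula is not given here, so the sketch below is an assumption. It treats alpha = 1 as purely semantic and alpha = 0 as purely keyword, with both scores normalized to [0, 1]. `ScoredChunk`, `hybridScore`, and `rankHybrid` are hypothetical names.

```typescript
interface ScoredChunk {
  id: string;
  semantic: number; // assumed normalized to [0, 1]
  keyword: number; // assumed normalized to [0, 1]
}

// Assumed convention: alpha = 1 is purely semantic, alpha = 0 purely keyword.
function hybridScore(chunk: ScoredChunk, alpha: number): number {
  if (alpha < 0 || alpha > 1) throw new Error("alpha must be in [0, 1]");
  return alpha * chunk.semantic + (1 - alpha) * chunk.keyword;
}

// Return a new array ordered by descending blended score.
function rankHybrid(chunks: ScoredChunk[], alpha: number): ScoredChunk[] {
  return chunks
    .slice()
    .sort((a, b) => hybridScore(b, alpha) - hybridScore(a, alpha));
}
```

Under this assumption, a query with an exact term such as "I-907" benefits from a lower alpha, while a paraphrased question such as "how has the education mix changed" benefits from a higher one.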

Live Workbench

Query HydraDB in real time — tune parameters, compare endpoints

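Controls: a query box with quick queries, an endpoint selector, and three sliders.
Alpha: 0.70 (slider from Keyword to Semantic)
Recency Bias: 0.00 (slider from None to Recent first)
Chunks: 5 (range 1 to 10)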
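The Recency Bias and Chunks controls can be read as a re-ranking step followed by a cutoff. How HydraDB applies recency bias is not described on this page. The sketch below assumes a linear blend between relevance and a freshness score, and the freshness curve itself is invented for illustration. `DatedChunk`, `freshness`, and `topChunks` are hypothetical names.

```typescript
interface DatedChunk {
  id: string;
  relevance: number; // assumed in [0, 1]
  ageDays: number; // assumed document age in days
}

// Hypothetical freshness curve: 1 for brand-new content, 0.5 at one year old.
function freshness(ageDays: number): number {
  return 1 / (1 + ageDays / 365);
}

// recencyBias = 0 ranks by relevance only; recencyBias = 1 ranks by freshness only.
function topChunks(
  chunks: DatedChunk[],
  recencyBias: number,
  maxChunks: number,
): DatedChunk[] {
  const score = (c: DatedChunk): number =>
    (1 - recencyBias) * c.relevance + recencyBias * freshness(c.ageDays);
  return chunks
    .slice()
    .sort((a, b) => score(b) - score(a))
    .slice(0, maxChunks);
}
```

With the workbench default of 0.00, a highly relevant FY2004 report can outrank a weaker FY2024 match. Raising the bias shifts the ordering toward newer filings.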

Core Capabilities


Every tenant gets its own isolated namespace. Sub-tenants enable hierarchical data organization — perfect for per-user, per-org, or per-project knowledge separation.

TypeScript

// Create isolated tenant
await hydra.tenant.create({
  tenant_id: "H1B"
});

// Data always scoped to tenant
await hydra.upload.knowledge({
  files: [pdfBuffer],
  tenant_id: "H1B",
  // Optional sub-tenant for finer isolation
  sub_tenant_id: "user_123"
});

+ Complete data isolation between tenants
+ Hierarchical sub-tenant support for nested workspaces
+ 409 conflict handling for idempotent tenant creation
+ Tenant-scoped queries: no data leakage across boundaries
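The 409 handling listed above can be sketched as a wrapper that treats "already exists" as success. The create call is injected, so no HydraDB URL or payload shape is assumed. `CreateTenant`, `classifyCreateStatus`, and `ensureTenant` are hypothetical names, not part of the SDK.

```typescript
// The transport is injected so no HydraDB URL or payload shape is assumed.
type CreateTenant = (tenantId: string) => Promise<{ status: number }>;

type EnsureResult = "created" | "already_exists";

// Map an HTTP status to an outcome; 409 Conflict counts as success
// because the tenant we wanted is already there.
function classifyCreateStatus(status: number): EnsureResult {
  if (status >= 200 && status < 300) return "created";
  if (status === 409) return "already_exists";
  throw new Error(`tenant creation failed with status ${status}`);
}

// Safe to call on every deploy or cold start: repeated calls do not fail.
async function ensureTenant(
  tenantId: string,
  create: CreateTenant,
): Promise<EnsureResult> {
  const res = await create(tenantId);
  return classifyCreateStatus(res.status);
}
```

A setup script might call `ensureTenant("H1B", createFn)` before every ingestion run. Only statuses other than 2xx and 409 surface as errors.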

With HydraDB vs. Without

Without RAG

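// Assumes the Vercel AI SDK, which matches the streamText/google calls below
import { streamText } from "ai";
import { google } from "@ai-sdk/google";

const result = streamText({
  model: google("gemini-2.0-flash"),
  system: "You are an H-1B expert.",
  // No retrieved context: the LLM falls back on
  // outdated data and wrong numbers
  messages,
});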

LLM invents numbers, cites non-existent reports, mixes up fiscal years

With HydraDB RAG

// Retrieve grounding context
const chunks = await hydraRecall(query);

const result = streamText({
  model: google("gemini-2.0-flash"),
  system: `You are an H-1B expert.
    ## Retrieved USCIS Data:
    ${chunks.join("\n---\n")}`,
  messages,
});

Every answer grounded in official USCIS data with correct fiscal year citations
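Correct citations depend on each retrieved chunk carrying its source into the prompt. The sketch below numbers the chunks and labels each with a source title so the model has something concrete to cite. The shape returned by `hydraRecall` is not shown on this page, so `SourcedChunk`, its `source` field, and `buildSystemPrompt` are assumptions.

```typescript
interface SourcedChunk {
  source: string; // e.g. a document title; hypothetical field
  text: string;
}

// Build a system prompt in which every chunk is numbered and labeled,
// so answers can reference "[1]", "[2]", and so on.
function buildSystemPrompt(chunks: SourcedChunk[]): string {
  const header =
    "You are an H-1B expert. Cite sources by their bracketed number.";
  if (chunks.length === 0) {
    // Without context, tell the model to admit it rather than guess.
    return `${header}\nNo USCIS data was retrieved; say so instead of guessing.`;
  }
  const context = chunks
    .map((c, i) => `[${i + 1}] ${c.source}\n${c.text}`)
    .join("\n---\n");
  return `${header}\n## Retrieved USCIS Data:\n${context}`;
}
```

The empty-result branch matters in practice: if recall returns nothing, an unlabeled prompt would silently fall back to the ungrounded behavior shown in the "Without RAG" example.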
