Retrieval Adapters (Context & Memory)

What is Retrieval?

Retrieval is the mechanism for giving your LLM "Long-Term Memory" or access to private data. It is the core of RAG (Retrieval-Augmented Generation).

In llm-core, retrieval is not a single capability. It is a pipeline of four distinct stages:

  1. Embedders: Turn text into math (vectors).
  2. Vector Stores: Save (Ingest) those vectors for later.
  3. Retrievers: Read (Query) relevant vectors during a workflow.
  4. Rerankers: Refine the results using a high-precision model.

The RAG Pipeline

graph LR
    Doc[Document] -->|1. Embed| Vector[Vector]
    Vector -->|2. Ingest| DB[(Vector Store)]
    Query([Query]) -->|3. Search| DB
    DB -->|4. Retrieve| Context[Context]
    Context -->|5. Rerank| TopK[Top Results]

1. Retrievers

The Reading Interface

A Retriever takes a string query and returns a list of Documents. It is Read-Only.

When to use what?

  • LlamaIndex (fromLlamaIndexRetriever): Best for complex data. If you are parsing PDFs, building Knowledge Graphs, or using hierarchical indices, LlamaIndex is the gold standard. Use their retrievers to tap into that complexity.

  • LangChain (fromLangChainRetriever): Best for broad database support. If you need to connect to a specific database (Pinecone, Chroma, Qdrant) and do a simple similarity search, LangChain likely has the driver you need.
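
Whichever ecosystem the retriever comes from, the wrapped adapter looks the same to llm-core. Here is a minimal sketch using a LangChain in-memory store as the retriever; the Retriever type import and the retrieve(query) method name are assumptions for illustration, so check the adapter typings for the exact shape.

ts
import { fromLangChainRetriever } from "@geekist/llm-core/adapters";
import type { Retriever, Document } from "@geekist/llm-core/adapters";
import { MemoryVectorStore } from "@langchain/classic/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";

// 1. Wrap any LangChain BaseRetriever (here: an in-memory vector store exposed as a retriever)
const lcRetriever = new MemoryVectorStore(new OpenAIEmbeddings()).asRetriever();
const retriever: Retriever = fromLangChainRetriever(lcRetriever);

// 2. Query it inside a workflow step (read-only; `retrieve` is an assumed method name)
const docs: Document[] = await retriever.retrieve("What does Jason drink?");
void docs;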


2. Vector Stores

The Writing Interface

While a Retriever reads, a VectorStore adapter writes. It allows llm-core to Upsert (Add/Update) and Delete documents in a database-agnostic way.

Why use an adapter?

Every vector DB SDK has a different API for adding records. llm-core normalizes this so you can write a generic Ingestion Recipe that works with any supported backend.

Example: Indexing a Document

Here is how you wire up a LangChain Memory store to ingest data:

ts
// #region docs
import { fromLangChainVectorStore } from "@geekist/llm-core/adapters";
import type { VectorStore } from "@geekist/llm-core/adapters";
import { MemoryVectorStore } from "@langchain/classic/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";

// 1. Wrap the ecosystem Store
const store: VectorStore = fromLangChainVectorStore(new MemoryVectorStore(new OpenAIEmbeddings()));

// 2. Use it in an Ingestion Workflow
interface MyDoc {
  id: string;
  text: string;
  metadata: { author: string };
}

await store.upsert({
  documents: [
    { id: "doc-1", text: "Jason likes coffee.", metadata: { author: "Jason" } },
    { id: "doc-2", text: "Jason hates tea.", metadata: { author: "Jason" } },
  ] as MyDoc[],
});
// #endregion docs
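
Deletes go through the same adapter. The argument shape below is an assumption (the ids of previously upserted documents), so verify it against the VectorStore adapter typings.

ts
// Continuing from the upsert example above.
// Assumption: delete() accepts the ids of previously upserted documents.
await store.delete({ ids: ["doc-2"] });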

3. Indexing (The Sync Problem)

Why isn't store.upsert enough?

Naive RAG pipelines blindly upsert documents every time they run. This is dangerous because:

  1. Cost: You pay to re-embed text that hasn't changed.
  2. Duplicates: If a file moves or changes slightly, you might get ghost records.
  3. Deletions: If you delete a source file, the vector stays in the DB forever (hallucination risk).

The Solution: Indexing Adapters

An Indexing Adapter sits between your Source and your Vector Store. It tracks a hash of every document to ensure strict synchronization.

Source Docs -> Indexing Adapter -> Vector Store

Integrations

LangChain (RecordManager)

LangChain's Record Manager is the industry standard for this pattern. Our adapter expects a LangChain VectorStore instance (not the llm-core VectorStore adapter). You can still mix-and-match models and other adapters in the workflow, but indexing itself is currently LangChain-native.

ts
// #region docs
import { fromLangChainIndexing } from "@geekist/llm-core/adapters";
import type { Indexing, IndexingResult } from "@geekist/llm-core/adapters";
import type { RecordManagerInterface } from "@langchain/core/indexing";
import type { VectorStore } from "@langchain/core/vectorstores";

const recordManager: RecordManagerInterface = {
  createSchema: async () => {},
  getTime: async () => Date.now(),
  update: async () => {},
  exists: async (keys) => keys.map(() => false),
  listKeys: async () => [],
  deleteKeys: async () => {},
};
const langChainVectorStore = {} as unknown as VectorStore;
const myDocs = [{ id: "doc-1", text: "Hello" }];

// 1. Define the Indexing logic
const indexing: Indexing = fromLangChainIndexing(recordManager, langChainVectorStore);

// 2. Run the sync job
const result: IndexingResult = await indexing.index({
  documents: myDocs,
  options: {
    cleanup: "full",
    sourceIdKey: "source",
  },
});
// #endregion docs
void result;
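
LangChain's indexing API reports how many records were added, updated, skipped, and deleted on each run. Assuming IndexingResult surfaces similar counters (an assumption; the field names may differ), you can log the outcome of the sync job:

ts
// Assumption: IndexingResult mirrors LangChain's counters; check the adapter typings.
console.log(
  `added=${result.numAdded} updated=${result.numUpdated} skipped=${result.numSkipped} deleted=${result.numDeleted}`,
);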

4. Embedders

The Meaning Maker

Embedders are the hidden workhorses of RAG. They convert "Hello World" into a massive array of numbers (e.g., [0.1, -0.4, 0.8...]).

Dimensions Matter

When choosing an embedder, you must ensure the dimensions (e.g., 1536 for OpenAI text-embedding-3-small) match your Vector Store configuration.

Ecosystem Support

  • AI SDK (EmbeddingModel): Recommended for speed. The AI SDK provides a clean, Type-Safe interface for modern providers like OpenAI, Cohere, and Mistral.

  • LangChain / LlamaIndex: Useful if you are using their document splitting chains, as they often require their own Embeddings instances to be passed in.

ts
// #region docs
import { fromAiSdkEmbeddings } from "@geekist/llm-core/adapters";
import type { Embedder } from "@geekist/llm-core/adapters";
import { openai } from "@ai-sdk/openai";

// Create an embedder capable of batching
const embedder: Embedder = fromAiSdkEmbeddings(openai.embedding("text-embedding-3-small"));

// Embed a batch of text (fallback if embedMany is not available)
const vectors: number[][] = embedder.embedMany
  ? await embedder.embedMany(["Hello", "World"])
  : [await embedder.embed("Hello"), await embedder.embed("World")];
// #endregion docs
void vectors;
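
Because the output has to match your Vector Store configuration, it is worth asserting the dimensions early. A small sanity check on the vectors produced above (1536 is the size of text-embedding-3-small):

ts
// Fail fast if the embedder output does not match the store's configured dimensions.
const EXPECTED_DIMENSIONS = 1536; // text-embedding-3-small
if (vectors[0]?.length !== EXPECTED_DIMENSIONS) {
  throw new Error(`Unexpected embedding size: ${vectors[0]?.length}`);
}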

5. Rerankers

The Precision Refiner

Vector similarity is "fuzzy"—it finds things that sound alike, but not always things that answer the question. A Reranker takes the top (e.g., 50) results from a Retriever and uses a much smarter (but slower) model to sort them by actual relevance.

Why Rerank?

It significantly improves answer quality (reduces hallucinations) by ensuring the LLM only sees the absolute best context.

Implementation

We align with the AI SDK Reranker standard (RerankingModelV3).

ts
// #region docs
import { fromAiSdkReranker } from "@geekist/llm-core/adapters";
import type { Reranker, Document } from "@geekist/llm-core/adapters";
import type { RerankingModelV3 } from "@ai-sdk/provider";

const mockRerankerModel: RerankingModelV3 = {
  specificationVersion: "v3",
  provider: "mock",
  modelId: "rerank-mini",
  doRerank: async () => ({
    ranking: [{ index: 0, relevanceScore: 1 }],
  }),
};

const reranker: Reranker = fromAiSdkReranker(mockRerankerModel);

// Mocks
const userQuery = "query";
const retrievedDocs: Document[] = [];

// In a custom recipe step:
const refinedDocs: Document[] = await reranker.rerank(userQuery, retrievedDocs);
// #endregion docs

void refinedDocs;
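
A typical pattern is to retrieve generously (say, the top 50) and keep only a handful of reranked documents for the prompt. Continuing the example above:

ts
// Keep only the highest-ranked documents for the generation step.
const topDocs = refinedDocs.slice(0, 5);
void topDocs;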

6. Structured Queries (LangChain only)

The Filter Compiler

LangChain exposes a StructuredQuery type used by self-query retrievers and structured filters. We normalize it so you can keep the same filter shape while mixing in any model adapter.

ts
import {
  Comparison,
  StructuredQuery as LangChainStructuredQuery,
} from "@langchain/core/structured_query";
import { fromLangChainStructuredQuery } from "@geekist/llm-core/adapters";

const lcQuery = new LangChainStructuredQuery(
  "find docs",
  new Comparison("eq", "category", "policies"),
);

const query = fromLangChainStructuredQuery(lcQuery);

void query;

You can pass the resulting StructuredQuery into your own retriever filters or recipe steps, regardless of whether your Model adapter comes from AI SDK, LangChain, or LlamaIndex.


7. Query Engines (The "Black Box")

When to use a Query Engine?

In llm-core, we usually encourage you to build Recipes—explicit workflows where you control the Retrieve -> Rerank -> Generate chain.

However, frameworks like LlamaIndex offer pre-packaged "Engines" that encapsulate highly complex retrieval logic (e.g., Sub-Question Query Engines, Multi-Step Reasoning).

Use a Query Engine Adapter when:

  1. You want to use a specific, advanced LlamaIndex strategy.
  2. You don't want to reimplement the orchestration logic yourself.
  3. You treat the retrieval subsystem as an opaque "Oracle".

ts
import { fromLlamaIndexQueryEngine } from "@geekist/llm-core/adapters";
import type { QueryEngine, QueryResult } from "@geekist/llm-core/adapters";
import { BaseQueryEngine } from "@llamaindex/core/query-engine";
import { EngineResponse } from "@llamaindex/core/schema";

class DemoQueryEngine extends BaseQueryEngine {
  async _query(query: string) {
    return EngineResponse.fromResponse(`Answer for ${query}`, false, []);
  }

  protected _getPrompts() {
    return {};
  }

  protected _updatePrompts() {}

  protected _getPromptModules() {
    return {};
  }
}

// 1. Create the complex engine upstream
const complexEngine = new DemoQueryEngine();

// 2. Wrap it as a simple "Query In -> Answer Out" adapter
const queryEngine: QueryEngine = fromLlamaIndexQueryEngine(complexEngine);

// 3. Use it in your workflow
const result: QueryResult = await queryEngine.query("Compare Q1 revenue for Apple and Google");
console.log(result.text);

Response Synthesizers

A Response Synthesizer takes a query and a set of retrieved nodes, and generates a final response. It is the "Generation" half of RAG.

ts
import { fromLlamaIndexResponseSynthesizer } from "@geekist/llm-core/adapters";
import type { ResponseSynthesizer } from "@geekist/llm-core/adapters";
import { BaseSynthesizer } from "@llamaindex/core/response-synthesizers";
import { EngineResponse } from "@llamaindex/core/schema";

class DemoSynthesizer extends BaseSynthesizer {
  constructor() {
    super({});
  }

  async getResponse(query: string, _nodes: unknown[], _stream: boolean) {
    return EngineResponse.fromResponse(`Answer for ${query}`, false, []);
  }

  protected _getPrompts() {
    return {};
  }

  protected _updatePrompts() {}

  protected _getPromptModules() {
    return {};
  }
}

const synthesizerEngine: BaseSynthesizer = new DemoSynthesizer();

const synthesizer: ResponseSynthesizer = fromLlamaIndexResponseSynthesizer(synthesizerEngine);
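
A minimal usage sketch follows. The synthesize(query, documents) call is an assumption about the adapter's interface, so check the ResponseSynthesizer typings before relying on it.

ts
// Assumption: the adapter exposes synthesize(query, documents); verify against the typings.
const answer = await synthesizer.synthesize("What does Jason drink?", []);
void answer;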

Supported Integrations (Flex)

We support the full pipeline: Ingestion, Embedding, Storage, Retrieval, and Reranking.

Core RAG Components

| Capability   | Ecosystem  | Adapter Factory           | Upstream Interface     | Deep Link |
| ------------ | ---------- | ------------------------- | ---------------------- | --------- |
| Retrieval    | LangChain  | fromLangChainRetriever    | BaseRetriever          | Docs      |
| Retrieval    | LlamaIndex | fromLlamaIndexRetriever   | BaseRetriever          | Docs      |
| Vector Store | LangChain  | fromLangChainVectorStore  | VectorStore            | Docs      |
| Vector Store | LlamaIndex | fromLlamaIndexVectorStore | BaseVectorStore        | Docs      |
| Embeddings   | AI SDK     | fromAiSdkEmbeddings       | EmbeddingModel         | Docs      |
| Embeddings   | LangChain  | fromLangChainEmbeddings   | Embeddings             | Docs      |
| Embeddings   | LlamaIndex | fromLlamaIndexEmbeddings  | BaseEmbedding          | Docs      |
| Reranker     | AI SDK     | fromAiSdkReranker         | RerankingModelV3       | Docs      |
| Reranker     | LangChain  | fromLangChainReranker     | BaseDocumentCompressor | Docs      |
| Reranker     | LlamaIndex | fromLlamaIndexReranker    | BaseNodePostprocessor  | Docs      |

Ingestion Utilities (Loaders & Splitters)

We also wrap the upstream ETL tools so you can use them directly in llm-core pipelines.

| Capability | Ecosystem  | Adapter Factory            | Deep Link |
| ---------- | ---------- | -------------------------- | --------- |
| Loader     | LangChain  | fromLangChainLoader        | Docs      |
| Loader     | LlamaIndex | fromLlamaIndexLoader       | Docs      |
| Splitter   | LangChain  | fromLangChainTextSplitter  | Docs      |
| Splitter   | LlamaIndex | fromLlamaIndexTextSplitter | Docs      |
