
How we combine vector and full-text search with configurable weighting strategies, track search analytics, and build a 37-component dataset management UI with upload, processing, and search.
Part 1 covered the data model and document processing pipeline. Part 2 covered embedding generation and vector search. This post covers the layer that ties everything together — the search system and the UI.
Vector search finds semantically similar content: "shipping delays" matches "delivery postponement." But it misses exact terms. Searching for order number "ORD-12345" returns noise because the embedding doesn't capture literal string matching.
Full-text search finds exact and stemmed keyword matches: "return policy" matches "returns" and "returning." But it misses semantic equivalence: "refund process" won't match "how to get your money back."
Hybrid search runs both, combines the results, and ranks by a weighted score. The user gets semantic understanding and keyword precision. This is the approach that production RAG systems converge on, and it's what we built.
Full-text search uses the pre-computed searchVector column from Part 1. The tsvector is generated by PostgreSQL at insert time and indexed with GIN. At query time, we match against the index — no text processing needed.
// packages/lib/src/datasets/search/full-text-search.ts
export class FullTextSearchService {
  static async search(
    query: SearchQuery,
    datasetIds: string[],
    organizationId: string,
    _userId?: string
  ): Promise<{ results: SearchResult[]; metrics: SearchPerformanceMetrics }> {
    const searchResults = await FullTextSearchService.performFullTextSearch(
      query.query,
      datasetIds,
      organizationId,
      {
        fuzzySearch: true,
        phraseSearch: false,
        booleanMode: false,
        rankingMode: 'bm25',
        minScore: 0.1,
        filters: query.filters,
      }
    )
    // ... build `metrics` (omitted in this excerpt) ...
    // Pagination
    const limit = query.limit || 20
    const offset = query.offset || 0
    return { results: searchResults.slice(offset, offset + limit), metrics }
  }
}
Under the hood, the actual SQL uses plainto_tsquery and ts_rank_cd:
SELECT
  "DocumentSegment".id,
  "DocumentSegment".content,
  ts_rank_cd("searchVector", plainto_tsquery('english', $query)) AS rank
FROM "DocumentSegment"
WHERE "searchVector" @@ plainto_tsquery('english', $query)
  AND "indexStatus" = 'INDEXED'
  AND "documentId" IN (
    SELECT id FROM "Document"
    WHERE "datasetId" = ANY($datasetIds) AND enabled = true
  )
ORDER BY rank DESC
LIMIT 100
A few design decisions worth explaining.
Pre-computed searchVector, not runtime tsvector generation. Computing to_tsvector('english', content) at query time processes every candidate row's full text. A stored, GIN-indexed searchVector column turns this into an index lookup. For 100,000 segments, this is roughly 5ms vs 500ms.
plainto_tsquery over to_tsquery. to_tsquery requires the caller to handle boolean operators and escaping (cat & dog, not cat and dog). plainto_tsquery accepts natural language input and converts it. Users type questions, not boolean expressions.
ts_rank_cd for ranking. The _cd variant (cover density) considers proximity of matching terms, not just frequency. "return policy" ranks higher when the words appear near each other in the segment, not scattered across 500 words.
100-result cap per query. Full-text search can return thousands of matches. We cap at 100 server-side. The hybrid combiner and pagination handle the rest. This keeps memory bounded and response times consistent.
Filters applied in SQL, not in-memory. Document type, MIME type, date range, and status filters are in the WHERE clause, not post-query JavaScript filtering. PostgreSQL's query planner combines the GIN index scan with these filters efficiently.
This is where it gets interesting. Hybrid search runs vector and text search in parallel, then combines the results.
// packages/lib/src/datasets/search/hybrid-search.ts
export class HybridSearchService {
  static async search(
    query: SearchQuery,
    datasetConfigs: DatasetConfig[],
    organizationId: string,
    userId?: string
  ): Promise<{ results: SearchResult[]; metrics: SearchPerformanceMetrics }> {
    const datasetIds = datasetConfigs.map(d => d.id)
    // Read weights from query or use defaults
    const hybridOptions: HybridSearchOptions = {
      vectorWeight: query.vectorWeight ?? 0.6,
      textWeight: query.textWeight ?? 0.4,
      combineMethod: query.combineMethod || 'weighted_sum',
    }
    // Execute both searches in parallel
    const [vectorResult, textResult] = await Promise.allSettled([
      VectorSearchService.search(query, datasetConfigs, organizationId, userId),
      FullTextSearchService.search(query, datasetIds, organizationId, userId),
    ])
    // If both failed, throw
    if (vectorResult.status === 'rejected' && textResult.status === 'rejected') {
      throw new SearchError('Both vector and text searches failed')
    }
    // Handle partial failures gracefully: a failed side contributes no results
    const vectorResults = vectorResult.status === 'fulfilled' ? vectorResult.value.results : []
    const textResults = textResult.status === 'fulfilled' ? textResult.value.results : []
    // Combine and rank
    const combinedResults = HybridSearchService.combineSearchResults(
      vectorResults, textResults, hybridOptions
    )
    // ... build `metrics` (omitted in this excerpt) ...
    // Pagination (defaults assumed to match the full-text service)
    const limit = query.limit || 20
    const offset = query.offset || 0
    return { results: combinedResults.slice(offset, offset + limit), metrics }
  }
}
Parallel execution. Vector and text search are independent: neither needs the other's results. Running them concurrently cuts latency to max(vector_time, text_time) instead of vector_time + text_time. For typical dataset sizes, this is ~80ms instead of ~150ms.
Promise.allSettled, not Promise.all. If vector search fails (e.g., embedding API timeout), text search results still return. Promise.all would throw on the first failure and discard the successful results. A partial result set is better than an error page.
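The partial-failure behavior can be seen in isolation with a small sketch. The two search functions below are hypothetical stand-ins, not the real services; one always rejects to simulate an embedding API timeout.

```typescript
// Hypothetical stand-ins for the two search backends (not the real services).
type Hit = { id: string; score: number }

async function fakeVectorSearch(): Promise<Hit[]> {
  // Simulate an embedding API timeout.
  throw new Error('embedding API timeout')
}

async function fakeTextSearch(): Promise<Hit[]> {
  return [{ id: 'seg-1', score: 0.42 }]
}

// Run both searches concurrently and keep whatever succeeded.
async function searchBoth(): Promise<{ vector: Hit[]; text: Hit[] }> {
  const [vectorResult, textResult] = await Promise.allSettled([
    fakeVectorSearch(),
    fakeTextSearch(),
  ])

  // Only a double failure is treated as an error.
  if (vectorResult.status === 'rejected' && textResult.status === 'rejected') {
    throw new Error('Both vector and text searches failed')
  }

  return {
    vector: vectorResult.status === 'fulfilled' ? vectorResult.value : [],
    text: textResult.status === 'fulfilled' ? textResult.value : [],
  }
}
```

With Promise.all in place of Promise.allSettled, the rejected vector search would reject the whole call and the text hit would never reach the caller.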
The combiner supports three strategies for merging vector and text results. Each handles the "how do you combine scores from two different systems?" problem differently.
// packages/lib/src/datasets/search/hybrid-search.ts
private static combineByWeightedSum(
  vectorResult: SearchResult | undefined,
  textResult: SearchResult | undefined,
  vectorWeight: number,
  textWeight: number
): SearchResult | null {
  if (!vectorResult && !textResult) return null
  const baseResult = vectorResult || textResult!
  let combinedScore = 0
  if (vectorResult) combinedScore += (vectorResult.score || 0) * vectorWeight
  if (textResult) combinedScore += (textResult.score || 0) * textWeight
  return { ...baseResult, score: combinedScore, searchType: 'hybrid' }
}
finalScore = (vectorScore * 0.6) + (textScore * 0.4)
Simple, intuitive, configurable. The default 60/40 split favors semantic relevance while keeping keyword precision as a strong signal. Works well when both score distributions are roughly similar.
// packages/lib/src/datasets/search/hybrid-search.ts
private static combineByRRF(
  vectorResult: SearchResult | undefined,
  textResult: SearchResult | undefined,
  k: number = 60
): SearchResult | null {
  if (!vectorResult && !textResult) return null
  const baseResult = vectorResult || textResult!
  let rrfScore = 0
  // RRF formula: 1 / (k + rank)
  if (vectorResult) rrfScore += 1 / (k + (vectorResult.rank || 1))
  if (textResult) rrfScore += 1 / (k + (textResult.rank || 1))
  return { ...baseResult, score: rrfScore, searchType: 'hybrid' }
}
finalScore = 1/(k + vectorRank) + 1/(k + textRank) where k = 60.
RRF ignores raw scores entirely; only rank positions matter. This is the great equalizer. Vector scores (0-1 cosine similarity) and text scores (unbounded ts_rank values) are on completely different scales. Weighted sum requires the scores to be comparable. RRF sidesteps this by using ranks instead of scores. It's robust and requires no tuning beyond the k parameter.
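To make the rank-only behavior concrete, here is a minimal, self-contained sketch that fuses two ranked lists of segment IDs. It is a simplified illustration rather than the service code: the function name and the plain string IDs are assumptions, and ranks are taken as 1-based list positions.

```typescript
// Minimal RRF sketch: fuse two ranked lists of segment IDs (best first).
// Ranks are 1-based; k = 60 matches the default used in the post.
function reciprocalRankFusion(
  vectorIds: string[],
  textIds: string[],
  k: number = 60
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>()

  const addList = (ids: string[]): void => {
    ids.forEach((id, index) => {
      const rank = index + 1
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank))
    })
  }

  addList(vectorIds)
  addList(textIds)

  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
}

const fused = reciprocalRankFusion(['a', 'b', 'c'], ['b', 'd'])
console.log(fused.map(r => r.id)) // [ 'b', 'a', 'd', 'c' ]
```

Segment b wins because it appears in both lists (1/62 + 1/61), even though it is not first in the vector list. No raw score from either system is consulted.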
// packages/lib/src/datasets/search/hybrid-search.ts
private static combineByLinearCombination(
  vectorResult: SearchResult | undefined,
  textResult: SearchResult | undefined,
  vectorWeight: number,
  textWeight: number
): SearchResult | null {
  if (!vectorResult && !textResult) return null
  const baseResult = vectorResult || textResult!
  // Normalize scores to 0-1 before combining
  const vectorScore = vectorResult
    ? HybridSearchService.normalizeScore(vectorResult.score || 0, 'vector') : 0
  const textScore = textResult
    ? HybridSearchService.normalizeScore(textResult.score || 0, 'text') : 0
  const combinedScore = vectorScore * vectorWeight + textScore * textWeight
  return { ...baseResult, score: combinedScore, searchType: 'hybrid' }
}
Score-based like weighted sum, but with min-max normalization first. This handles the scale mismatch between cosine similarity and ts_rank by mapping both to 0-1 before combining. A middle ground between weighted sum (no normalization) and RRF (no scores).
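The body of normalizeScore is not shown above, so the following is only one way min-max normalization could look. The helper name and the batch-of-scores signature are assumptions made for illustration; the real method takes a single score and a type.

```typescript
// Hypothetical helper: min-max normalize a batch of raw scores into [0, 1].
// The post's normalizeScore is not shown; this is one possible approach.
function minMaxNormalize(scores: number[]): number[] {
  if (scores.length === 0) return []
  const min = Math.min(...scores)
  const max = Math.max(...scores)
  // All scores equal: avoid dividing by zero and treat them as equally relevant.
  if (max === min) return scores.map(() => 1)
  return scores.map(s => (s - min) / (max - min))
}

// Text ranks are unbounded, cosine similarities sit in a narrow band.
const normalizedText = minMaxNormalize([0.2, 1.4, 2.6])
const normalizedVector = minMaxNormalize([0.7, 0.8, 0.9])
```

After normalization both lists span 0 to 1 with the middle item at about 0.5, so a 60/40 weighting compares like with like. One caveat of min-max is that the lowest hit in each list always maps to 0, whatever its raw score was.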
A segment can appear in both vector and text results. The combiner handles this by indexing all results by segment ID into two maps, then iterating over the union of all segment IDs. Each segment gets one combined score, not two entries.
// packages/lib/src/datasets/search/hybrid-search.ts
const vectorMap = new Map<string, SearchResult>()
const textMap = new Map<string, SearchResult>()
const allSegmentIds = new Set<string>()
vectorResults.forEach(result => {
  vectorMap.set(result.segment.id, result)
  allSegmentIds.add(result.segment.id)
})
textResults.forEach(result => {
  textMap.set(result.segment.id, result)
  allSegmentIds.add(result.segment.id)
})
// Combine per segment
allSegmentIds.forEach(segmentId => {
  const vectorResult = vectorMap.get(segmentId)
  const textResult = textMap.get(segmentId)
  // ... combine using selected method
})
A segment that scores 0.9 in vector and 0.85 in text gets one combined entry — not two separate results pointing to the same content.
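Putting the union-by-ID step together with the weighted sum gives an end-to-end picture. This is a reduced sketch with an assumed, simplified result shape (segment ID and score only), not the actual combiner.

```typescript
// Simplified shape for illustration; the real SearchResult carries more fields.
type Scored = { segmentId: string; score: number }

function mergeByWeightedSum(
  vectorResults: Scored[],
  textResults: Scored[],
  vectorWeight: number = 0.6,
  textWeight: number = 0.4
): Scored[] {
  const vectorMap = new Map<string, number>()
  const textMap = new Map<string, number>()
  vectorResults.forEach(r => vectorMap.set(r.segmentId, r.score))
  textResults.forEach(r => textMap.set(r.segmentId, r.score))

  // Union of IDs: each segment appears exactly once in the output.
  const allIds = new Set<string>()
  vectorMap.forEach((_score, id) => allIds.add(id))
  textMap.forEach((_score, id) => allIds.add(id))

  const merged: Scored[] = []
  allIds.forEach(id => {
    // A segment missing from one side contributes 0 for that side.
    const score =
      (vectorMap.get(id) ?? 0) * vectorWeight + (textMap.get(id) ?? 0) * textWeight
    merged.push({ segmentId: id, score })
  })
  return merged.sort((a, b) => b.score - a.score)
}

const merged = mergeByWeightedSum(
  [{ segmentId: 's1', score: 0.9 }, { segmentId: 's2', score: 0.7 }],
  [{ segmentId: 's1', score: 0.85 }, { segmentId: 's3', score: 0.5 }]
)
```

Here s1 ends up with 0.9 * 0.6 + 0.85 * 0.4 = 0.88 and appears once, ahead of s2 (0.42, vector only) and s3 (0.20, text only).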
Every search is recorded. Not as a log line — as structured data in the database.
// packages/lib/src/datasets/services/search.service.ts
export class SearchService {
  static async search(
    query: SearchQuery,
    organizationId: string,
    userId?: string
  ): Promise<SearchResponse> {
    const startTime = Date.now()
    // ... validate, fetch datasets, execute search ...
    // Fire-and-forget analytics recording
    void SearchService.recordSearchAnalytics({
      query: query.query,
      queryType: query.searchType,
      resultsCount: results.length,
      responseTime: Date.now() - startTime,
      vectorSimilarityThreshold: query.similarityThreshold,
      maxResults: query.limit,
      filters: query.filters,
      organizationId,
      userId: finalUserId,
    })
    return { results, total, query: query.query, searchType, responseTime, hasMore }
  }
}
Fire-and-forget recording. The analytics write is started but never awaited, so the search response does not wait for it. A slow or failed DB write doesn't delay or block the search. The void prefix explicitly marks the un-awaited promise: this is intentional, not a missing await.
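One detail worth spelling out: an un-awaited promise that rejects still needs a handler somewhere, otherwise the runtime reports an unhandled rejection. The excerpt does not show whether recordSearchAnalytics catches its own errors, so the sketch below assumes it does not and attaches a catch at the call site. All names here are hypothetical.

```typescript
// Hypothetical analytics writer that can fail (e.g., the database is unavailable).
async function recordAnalytics(event: { query: string }): Promise<void> {
  if (event.query === '') throw new Error('analytics write failed')
}

const analyticsErrors: string[] = []

// Fire-and-forget: start the write, never await it, and swallow failures
// so a rejected promise cannot surface as an unhandled rejection.
function recordInBackground(event: { query: string }): void {
  void recordAnalytics(event).catch((err: unknown) => {
    analyticsErrors.push(err instanceof Error ? err.message : String(err))
  })
}

function search(query: string): string[] {
  const results = [`result for "${query}"`]
  recordInBackground({ query })
  return results // returned without waiting for the analytics write
}
```

The caller gets its results synchronously; the failure is collected later in analyticsErrors, where a real system would log it instead.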
Per-result score tracking. DatasetSearchResult stores rank and score for every result in every search. This enables "what score threshold captures 95% of clicked results?" — data-driven threshold tuning instead of guesswork.
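As a sketch of that threshold question: given the scores of results users clicked, find the highest cutoff that still keeps a target share of them. The function name is hypothetical, and the availability of click data joined to stored scores is an assumption, since the excerpt only mentions rank and score being stored.

```typescript
// Given scores of clicked results, return the highest threshold that still
// keeps at least `coverage` (a fraction between 0 and 1) of them.
function thresholdForCoverage(clickedScores: number[], coverage: number = 0.95): number {
  if (clickedScores.length === 0) throw new Error('no click data')
  const sorted = clickedScores.slice().sort((a, b) => b - a) // descending
  // Number of clicked results that must stay at or above the threshold.
  const wanted = Math.ceil(sorted.length * coverage)
  const keep = Math.min(sorted.length, Math.max(1, wanted))
  return sorted[keep - 1] as number
}

// Half of these four clicks have a score of at least 0.7.
const halfCoverage = thresholdForCoverage([0.9, 0.4, 0.7, 0.2], 0.5)
```

Setting the similarity threshold to the returned value would have hidden none of the clicks inside the chosen coverage band, which is the data-driven alternative to picking a number by feel.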
Search history with deduplication. getSearchHistory() returns a user's recent queries, deduplicated. Searching "return policy" three times shows up once. This powers the autocomplete/suggestions feature.
Aggregate analytics. getSearchAnalytics() computes: total queries, average response time, popular queries (by frequency), and search type distribution (vector/text/hybrid). This is the dashboard data for "is search working well?"
Suggestions from history + popularity. getSuggestions() combines the user's personal history with globally popular queries. Type "ret" and see "return policy" (popular across the org) and "retrieve order status" (from your history).
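A rough sketch of how such a merge could work, assuming prefix matching, case-insensitive deduplication, and personal history ranked ahead of popular queries. None of these details are confirmed by the excerpt, and the function name is hypothetical.

```typescript
// Hypothetical suggestion merge: personal history first, then org-wide
// popular queries, filtered by prefix and deduplicated.
function suggest(
  prefix: string,
  personalHistory: string[],
  popularQueries: string[],
  limit: number = 5
): string[] {
  const needle = prefix.trim().toLowerCase()
  if (needle === '') return []
  const seen = new Set<string>()
  const suggestions: string[] = []
  for (const candidate of personalHistory.concat(popularQueries)) {
    const key = candidate.trim().toLowerCase()
    if (!key.startsWith(needle) || seen.has(key)) continue
    seen.add(key)
    suggestions.push(candidate)
    if (suggestions.length >= limit) break
  }
  return suggestions
}

const suggestions = suggest(
  'ret',
  ['retrieve order status', 'shipping delays'],
  ['return policy', 'retrieve order status']
)
console.log(suggestions) // [ 'retrieve order status', 'return policy' ]
```

The query that appears in both sources is listed once, and the non-matching history entry is dropped.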
The SearchService is the single entry point for all search. It validates, routes, paginates, and records.
// packages/lib/src/datasets/services/search.service.ts
export class SearchService {
  static async search(
    query: SearchQuery,
    organizationId: string,
    userId?: string
  ): Promise<SearchResponse> {
    // 1. Validate query (non-empty, max 1000 chars)
    SearchService.validateQuery(query)
    // 2. Fetch accessible datasets with embedding configs
    const accessibleDatasetConfigs = await SearchService.getAccessibleDatasets(
      organizationId, finalUserId, query.datasetIds, query.includeInactive
    )
    // Full-text search only needs the IDs (same mapping the hybrid service uses)
    const accessibleDatasetIds = accessibleDatasetConfigs.map(d => d.id)
    // 3. Route to search implementation
    switch (query.searchType) {
      case 'vector':
        results = await VectorSearchService.search(query, accessibleDatasetConfigs, /* ... */)
        break
      case 'text':
        results = await FullTextSearchService.search(query, accessibleDatasetIds, /* ... */)
        break
      case 'hybrid':
      default:
        results = await HybridSearchService.search(query, accessibleDatasetConfigs, /* ... */)
        break
    }
    // 4. Record analytics (fire-and-forget)
    void SearchService.recordSearchAnalytics(/* ... */)
    // 5. Return response with metrics
    return { results, total, responseTime, searchType, hasMore }
  }
}
Dataset accessibility check. Not all datasets are searchable by all users. The service fetches datasets the user has access to (org-scoped, active status) and intersects with the requested datasetIds. This prevents searching archived or cross-org datasets.
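The intersection logic can be sketched in memory. In the real service this is presumably a database query; the record shape, status values, and function name below are assumptions for illustration.

```typescript
// Simplified dataset record; field names and statuses are assumptions.
type Dataset = { id: string; organizationId: string; status: 'ACTIVE' | 'ARCHIVED' }

function resolveSearchableDatasets(
  all: Dataset[],
  organizationId: string,
  requestedIds?: string[],
  includeInactive: boolean = false
): Dataset[] {
  // Org scoping and the status filter come first...
  const accessible = all.filter(
    d => d.organizationId === organizationId && (includeInactive || d.status === 'ACTIVE')
  )
  // ...then intersect with the caller's requested IDs, if any were given.
  if (!requestedIds || requestedIds.length === 0) return accessible
  const requested = new Set(requestedIds)
  return accessible.filter(d => requested.has(d.id))
}

const allDatasets: Dataset[] = [
  { id: 'd1', organizationId: 'org-a', status: 'ACTIVE' },
  { id: 'd2', organizationId: 'org-a', status: 'ARCHIVED' },
  { id: 'd3', organizationId: 'org-b', status: 'ACTIVE' },
]
const searchable = resolveSearchableDatasets(allDatasets, 'org-a', ['d1', 'd2', 'd3'])
```

Even though the caller asked for all three IDs, only d1 survives: d2 is archived and d3 belongs to another organization. Because the filter runs before the intersection, a requested ID can never widen access.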
Search type routing. The searchType parameter determines which implementation runs. The search service doesn't know how vector search works; it delegates. Adding a new search strategy (e.g., graph-based) means adding one implementation, not modifying the orchestrator.
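The same delegation idea can also be expressed as a lookup table instead of a switch. This is an alternative sketch, not what the service does; the registry and the stub strategies are hypothetical.

```typescript
type SearchType = 'vector' | 'text' | 'hybrid'
type SearchFn = (query: string) => Promise<string[]>

// Hypothetical registry; the post uses a switch statement instead.
const strategies: Record<SearchType, SearchFn> = {
  vector: async q => [`vector:${q}`],
  text: async q => [`text:${q}`],
  hybrid: async q => [`hybrid:${q}`],
}

async function route(query: string, searchType?: SearchType): Promise<string[]> {
  // Mirror the switch's default branch: a missing type means hybrid.
  const run = strategies[searchType ?? 'hybrid']
  return run(query)
}
```

With this shape, adding a graph-based strategy would mean extending the SearchType union and adding one entry to the registry, and the compiler flags the registry if an entry is missing.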
Every response includes responseTime. Measured at the orchestration layer — includes DB queries, embedding generation, result enrichment. The number the user actually experiences, not just the vector comparison time.
The dataset UI has 37 components across 6 areas. Each component has one job.
Two views: a visual grid (dataset-card.tsx) and a data-dense table (datasets-table-view.tsx). Toggle between them. Grid shows status badges, document counts, last indexed date. Table adds sorting by any column.
Stats cards at the top (datasets-stats-cards.tsx) show total datasets, total documents, total size, and average search time. A quick health check without drilling into any dataset.
The empty state (datasets-empty-state.tsx) isn't just "no datasets"; it's a guided creation flow. Upload files inline or create an empty dataset to configure first.
The detail page uses a provider pattern. dataset-detail-provider.tsx wraps everything in a React context. Dataset data, documents, and actions are available to all child components without prop drilling.
The document management area (documents/document-management.tsx) is the document table with filtering by status, type, and search. Batch operations — reprocess, delete, archive — via multi-select. Real-time processing progress shows via document-processing-progress.tsx.
The document detail drawer (documents/document-detail-drawer.tsx) opens as a side drawer showing the full document info: extracted content preview, chunk settings (dataset defaults or document overrides), processing metrics (time, chunk count), and error messages for failed documents.
The drag-and-drop zone (document-upload-zone.tsx) accepts PDF, DOCX, TXT, and HTML. It shows file type icons, validates file size, and warns about duplicates (checksum check against existing documents). Multiple files at once.
Processing feedback is split between toasts and inline status. document-processing-toast.tsx shows toast notifications for processing start/completion/failure. processing-status.tsx renders inline status badges (UPLOADED, PROCESSING with spinner animation, INDEXED, FAILED) in the document table.
The search method selector (search/search-method-selector.tsx) toggles between Vector, Text, and Hybrid. Each method shows its own options panel:
Advanced filters (search/advanced-search-options.tsx) let you filter by document type, MIME type, date range, file size, and metadata. The panel is collapsible so it doesn't clutter the default search.
Each result item (search/search-result-item.tsx) shows: matched content with highlights, similarity score, source document name, dataset name, and a search type badge.
Five settings sections, each in its own component:
Each section is independently saveable. Changing the embedding model triggers a warning that existing documents will need reprocessing.
// apps/web/src/server/api/routers/dataset.ts
create: protectedProcedure
  .input(createDatasetSchema)
  .mutation(async ({ ctx, input }) => {
    // Check feature access + limits
    await FeaturePermissionService.check(ctx.session.organizationId, 'datasets')
    // ... create dataset
  })
Feature gating, not just auth. Creating a dataset isn't just "is the user logged in?" It's "does this organization's plan include datasets, and are they under their dataset limit?" FeaturePermissionService handles both checks.
Processing status as a dedicated endpoint. getProcessingStatus returns a breakdown: { uploaded: 3, processing: 2, indexed: 45, failed: 1 }. The UI polls this during bulk uploads to show real-time progress without loading full document data.
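On the client, the breakdown is enough to derive a progress figure and decide when to stop polling. A small sketch, assuming that indexed and failed are the terminal states; the helper name and return shape are hypothetical.

```typescript
type ProcessingStatus = { uploaded: number; processing: number; indexed: number; failed: number }

// Derive a progress summary from the counts the endpoint returns.
function summarizeProgress(
  s: ProcessingStatus
): { total: number; done: number; percent: number; finished: boolean } {
  const total = s.uploaded + s.processing + s.indexed + s.failed
  const done = s.indexed + s.failed // terminal states (assumed)
  const percent = total === 0 ? 100 : Math.round((done / total) * 100)
  return { total, done, percent, finished: done === total }
}

const progress = summarizeProgress({ uploaded: 3, processing: 2, indexed: 45, failed: 1 })
console.log(progress) // { total: 51, done: 46, percent: 90, finished: false }
```

A polling loop would call the endpoint on an interval and stop once finished is true, without ever fetching the documents themselves.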
Stats aggregation in SQL. getStats computes document count, total size, average processing time, and total searches via SQL aggregations — not by loading all documents into memory. For a dataset with 10,000 documents, this is a 5ms query vs a 500ms data transfer.
| Decision | Trade-off | Why we chose it |
|---|---|---|
| Hybrid search as default | Runs two search queries instead of one | Neither approach alone is good enough for production RAG |
| RRF as an option | Ignores raw scores (only ranks) | Handles scale mismatch between cosine similarity and ts_rank |
| Pre-computed tsvectors | Slight write overhead per segment | 100x faster full-text search at query time |
| Fire-and-forget analytics | Analytics can be lost on crash | Search latency matters more than analytics completeness |
| 37 UI components | Many files | Each component has one job — maintainable as the feature grows |
| Promise.allSettled for hybrid | More complex error handling | Partial results are better than total failure |
| SQL-side filtering for FTS | More complex SQL | Avoids loading thousands of results into memory |
Across these three posts, the dataset engine breaks down to:
All running on PostgreSQL with pgvector. No external vector database. No separate search service. One database, one deployment, one set of credentials.
The dataset engine is open source as part of Auxx.ai. If you're building RAG for a SaaS product and debating whether to use Pinecone or build your own, PostgreSQL is probably enough.