VectorKLEngine¶

VectorKLEngine is a vector database-backed BaseKLEngine implementation that enables semantic similarity search over knowledge objects using vector embeddings. It leverages LlamaIndex's unified vector store abstraction for flexible, high-performance retrieval, and supports Filter operators for combining vector similarity with structured metadata constraints.

1. Introduction¶

1.2. When to Use VectorKLEngine¶

Ideal Use Cases:

  • Semantic Similarity Search: Find documents related to a concept without exact keyword matches (e.g., “machine learning” finds “neural networks”, “deep learning”)

  • Question Answering: Retrieve relevant context passages for questions

  • Recommendation Systems: Find similar items based on content semantics

  • Hybrid Search: Combine vector similarity with metadata filtering (e.g., “recent papers about transformers”)

  • Cross-Lingual Search: Find documents in different languages with multilingual embeddings

  • Multi-Modal Retrieval: Search across text, images, audio with appropriate embedding models

Not Suitable For:

  • Exact Attribute Matching: Use FacetKLEngine for precise filtering by type, status, dates, etc.

  • Entity Recognition: Use DAACKLEngine for finding known entity strings in text

  • Deterministic Filtering: Vector search is approximate and may miss exact matches

  • Low-Latency Requirements: Vector search is slower than database index lookups


1.3. Key Features¶

  • Semantic Similarity: Find conceptually related content using neural embedding models

  • Metadata Filtering: Combine vector similarity with structured filters through Filter operators

  • Multiple Vector Backends: Supports LanceDB, ChromaDB, Milvus, SimpleVectorStore through LlamaIndex

  • Flexible Encoding: Custom encoder functions to extract searchable text from knowledge objects

  • Configurable Embedders: Use any embedding model (OpenAI, local models, custom)

  • Two Operating Modes:

    • In-Place Mode (inplace=True): Directly query the attached VectorKLStore with zero overhead

    • Non-In-Place Mode (inplace=False): Create an optimized copy with schema subsetting

  • Relevance Scoring: Returns similarity scores for ranking and thresholding

  • Top-K Retrieval: Efficiently retrieve only the most relevant results


2. Understanding Filter Operators¶

The Filter class provides a fluent API for building metadata filter expressions that compile to LlamaIndex's MetadataFilters. These filters constrain vector search results to match specific metadata criteria. The expression syntax mirrors the KLOp operators used in FacetKLEngine (the examples below are written with KLOp.expr), but it generates LlamaIndex filter objects instead of SQL.

2.1. Comparison Operators¶

These operators compare metadata field values against constants:

Equality and Inequality:

from ahvn.utils.klop import KLOp

# Exact match (field == value)
KLOp.expr(type="tutorial")
# Generated: {"FIELD:type": {"==": "tutorial"}}
# LlamaIndex: ExactMatchFilter(key="type", value="tutorial")

# Not equal (field != value) - use NOT wrapper
KLOp.expr(status=KLOp.NOT("archived"))
# Generated: {"FIELD:status": {"NOT": {"==": "archived"}}}
# LlamaIndex: MetadataFilters(filters=[MetadataFilter(key="status", value="archived", operator="!=")], condition="not")

Numeric Comparisons:

# Less than
KLOp.expr(priority=KLOp.LT(5))
# Generated: {"FIELD:priority": {"<": 5}}
# LlamaIndex: MetadataFilter(key="priority", value=5, operator="<")

# Less than or equal
KLOp.expr(priority=KLOp.LTE(10))
# Generated: {"FIELD:priority": {"<=": 10}}

# Greater than
KLOp.expr(score=KLOp.GT(80))
# Generated: {"FIELD:score": {">": 80}}

# Greater than or equal
KLOp.expr(score=KLOp.GTE(90))
# Generated: {"FIELD:score": {">=": 90}}

DateTime Comparisons:

import datetime

# Filter by date range
KLOp.expr(created_at=KLOp.GTE(datetime.datetime(2024, 1, 1)))
# Generated: {"FIELD:created_at": {">=": datetime.datetime(2024, 1, 1)}}

KLOp.expr(updated_at=KLOp.LTE(datetime.datetime(2024, 12, 31)))
# Generated: {"FIELD:updated_at": {"<=": datetime.datetime(2024, 12, 31)}}

2.2. Pattern Matching Operators¶

Text pattern matching operators (mapped to LlamaIndex text matching):

Case-Sensitive Pattern Matching (LIKE → text_match):

# Wildcard matching
KLOp.expr(name=KLOp.LIKE("%Python%"))  # Contains "Python"
# Generated: {"FIELD:name": {"LIKE": "%Python%"}}
# LlamaIndex: MetadataFilter(key="name", value="%Python%", operator="text_match")

KLOp.expr(name=KLOp.LIKE("Python%"))   # Starts with "Python"
KLOp.expr(name=KLOp.LIKE("%Tutorial"))  # Ends with "Tutorial"

Case-Insensitive Pattern Matching (ILIKE → text_match_insensitive):

# Case-insensitive search
KLOp.expr(description=KLOp.ILIKE("%python%"))  # Matches "Python", "PYTHON", "python"
# Generated: {"FIELD:description": {"ILIKE": "%python%"}}
# LlamaIndex: MetadataFilter(key="description", value="%python%", operator="text_match_insensitive")

Note: Pattern matching behavior depends on the vector database backend. Some backends may not support text matching operators.


2.3. Range Operators¶

Filter values within numeric or datetime ranges:

BETWEEN Operator:

# Inclusive range [min, max]
KLOp.expr(score=KLOp.BETWEEN(0, 100))
# Generated: {"FIELD:score": {"AND": [{">=": 0}, {"<=": 100}]}}
# LlamaIndex: MetadataFilters with two filters (>= and <=) combined with AND

# Open-ended ranges with None
KLOp.expr(price=KLOp.BETWEEN(100, None))  # >= 100 (no upper limit)
# Generated: {"FIELD:price": {"AND": [{">=": 100}, {"<=": inf}]}}

KLOp.expr(age=KLOp.BETWEEN(None, 65))     # <= 65 (no lower limit)
# Generated: {"FIELD:age": {"AND": [{">=": -inf}, {"<=": 65}]}}

Tuple Shorthand:

# Tuples automatically convert to BETWEEN
KLOp.expr(priority=(1, 10))
# Equivalent to: KLOp.expr(priority=KLOp.BETWEEN(1, 10))
# Generated: {"FIELD:priority": {"AND": [{">=": 1}, {"<=": 10}]}}

2.4. Logical Operators¶

Combine multiple conditions with boolean logic:

AND Operator (Logical Conjunction):

# All conditions must be true
KLOp.expr(score=KLOp.AND([KLOp.GTE(80), KLOp.LTE(100)]))
# Generated: {"FIELD:score": {"AND": [{">=": 80}, {"<=": 100}]}}
# LlamaIndex: MetadataFilters(filters=[...], condition="and")

OR Operator (Logical Disjunction):

# At least one condition must be true
KLOp.expr(status=KLOp.OR(["active", "pending", "reviewing"]))
# Generated: {"FIELD:status": {"OR": [{"IN": ["active", "pending", "reviewing"]}]}}
# LlamaIndex: MetadataFilters(filters=[ExactMatchFilter(...), ...], condition="or")

# Mix operators in OR
KLOp.expr(priority=KLOp.OR([KLOp.GTE(8), KLOp.LIKE("urgent%")]))
# Generated: {"FIELD:priority": {"OR": [{">=": 8}, {"LIKE": "urgent%"}]}}

NOT Operator (Logical Negation):

# Negate any condition
KLOp.expr(status=KLOp.NOT("deleted"))
# Generated: {"FIELD:status": {"NOT": {"==": "deleted"}}}
# LlamaIndex: MetadataFilters(filters=[...], condition="not")

KLOp.expr(name=KLOp.NOT(KLOp.LIKE("%deprecated%")))
# Generated: {"FIELD:name": {"NOT": {"LIKE": "%deprecated%"}}}

IN Operator (Membership Test):

# Alias for OR with simple values
KLOp.expr(category=KLOp.IN(["tutorial", "guide", "reference"]))
# Equivalent to: KLOp.expr(category=KLOp.OR([...]))
# Generated: {"FIELD:category": {"OR": [{"IN": ["tutorial", "guide", "reference"]}]}}
# LlamaIndex: MetadataFilters with multiple ExactMatchFilter combined with OR

# Lists automatically convert to OR/IN
KLOp.expr(status=["active", "pending"])
# Generated: {"FIELD:status": {"OR": [{"IN": ["active", "pending"]}]}}

2.5. Multiple Field Expressions¶

Combine filters on multiple fields (implicit AND):

# Multiple fields create AND structure
KLOp.expr(
    type="documentation",
    priority=KLOp.GTE(5),
    status=KLOp.OR(["active", "reviewing"]),
    name=KLOp.LIKE("%Tutorial%")
)
# Generated: {
#   "AND": [
#     {"FIELD:type": {"==": "documentation"}},
#     {"FIELD:priority": {">=": 5}},
#     {"FIELD:status": {"OR": [{"IN": ["active", "reviewing"]}]}},
#     {"FIELD:name": {"LIKE": "%Tutorial%"}}
#   ]
# }
# LlamaIndex: MetadataFilters(filters=[...], condition="and")

2.6. Complex Nested Expressions¶

Build arbitrarily complex filter trees:

# Sophisticated multi-level filtering
KLOp.expr(
    # Pattern matching
    name=KLOp.LIKE("%agent%"),
    
    # Multiple status options
    status=KLOp.OR(["active", "pending", "reviewing"]),
    
    # Score range with explicit AND
    score=KLOp.AND([KLOp.GTE(80), KLOp.LTE(100)]),
    
    # Date range
    created_at=KLOp.GTE(datetime.datetime(2024, 1, 1)),
    
    # Negation
    description=KLOp.NOT(KLOp.LIKE("%deprecated%"))
)
# All conditions combined with AND logic
# LlamaIndex: Complex nested MetadataFilters structure

2.7. Key Differences from KLOp Operators¶

While Filter and KLOp have very similar APIs, there are important differences:

| Feature         | KLOp (FacetKLEngine)                           | Filter (VectorKLEngine)                |
|-----------------|------------------------------------------------|----------------------------------------|
| Target System   | SQL databases (SQLAlchemy)                     | Vector databases (LlamaIndex)          |
| Output Format   | SQLAlchemy ClauseElement                       | LlamaIndex MetadataFilters             |
| NF Operator     | Dimension table queries                        | Not supported*                         |
| Backend Support | PostgreSQL, MySQL, SQLite, DuckDB, MSSQL, etc. | LanceDB, ChromaDB, Milvus, etc.        |
| Primary Use     | Structured filtering                           | Metadata filtering + vector similarity |

Note: The NF operator is not supported in Filter because vector databases don’t have the same dimension table concept as relational databases. Complex nested metadata should be flattened into top-level fields for vector storage.
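Since nested metadata must be flattened into top-level fields before it can be filtered in a vector store, a small pre-processing step can do this up front. The following is a minimal, library-independent sketch; flatten_metadata is a hypothetical helper, not part of ahvn:

```python
# Hypothetical helper (not part of ahvn): flatten nested metadata into
# top-level "dotted" keys so it can be stored as filterable vector metadata.
def flatten_metadata(meta, parent_key="", sep="."):
    flat = {}
    for key, value in meta.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, prefixing keys with the parent path
            flat.update(flatten_metadata(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

nested = {"author": {"name": "Alice", "org": "Lab"}, "year": 2024}
print(flatten_metadata(nested))
# {'author.name': 'Alice', 'author.org': 'Lab', 'year': 2024}
```

The flattened keys (e.g. "author.name") can then be filtered like any other top-level field.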


3. Quick Start¶

3.1. Basic Usage with In-Place Mode¶

from ahvn.klstore import VectorKLStore
from ahvn.klengine import VectorKLEngine
from ahvn.utils.klop import KLOp
from ahvn.llm import LLM
from ahvn.ukf import BaseUKF

# Create embedder
embedder = LLM(preset="embedder")  # Uses default embedding model

# Create vector storage
store = VectorKLStore(
    provider="lancedb",
    uri="./knowledge_vectors",
    collection="documents",
    encoder=lambda kl: kl.content,  # Extract text for embedding
    embedder=embedder
)

# Populate with knowledge objects
kls = [
    BaseUKF(
        name="Python Tutorial", 
        type="tutorial", 
        priority=5,
        content="Learn Python programming from basics to advanced concepts",
        tags={"programming", "python", "beginner"}
    ),
    BaseUKF(
        name="Machine Learning Guide",
        type="guide",
        priority=8,
        content="Introduction to machine learning algorithms and neural networks",
        tags={"ai", "machine-learning", "tutorial"}
    ),
    BaseUKF(
        name="SQL Database Design",
        type="tutorial",
        priority=7,
        content="Database design principles and SQL query optimization",
        tags={"database", "sql", "advanced"}
    ),
]
store.batch_upsert(kls)
store.flush()

# Create VectorKLEngine in in-place mode (directly queries store)
engine = VectorKLEngine(storage=store, inplace=True)

# Semantic search without filters
results = engine.search(
    query="How do I learn programming?",
    topk=2,
    include=["id", "kl", "score"]
)
print(f"Found {len(results)} results")
for result in results:
    print(f"- {result['kl'].name} (score: {result['score']:.3f})")

# Semantic search WITH metadata filters
results = engine.search(
    query="neural network architectures",
    topk=5,
    include=["id", "kl", "score"],
    type="guide",  # Only guides
    priority=KLOp.GTE(7)  # High priority only
)
for result in results:
    kl = result['kl']
    print(f"- {kl.name} (priority: {kl.priority}, score: {result['score']:.3f})")

3.2. Initialization Parameters¶

Required Parameters:

  • storage (VectorKLStore): The vector-backed KLStore to query. Must be a VectorKLStore instance.

Mode Parameters:

  • inplace (bool, default: True): Operating mode

    • True: Query directly on storage vector database (zero overhead, no copying)

    • False: Create separate index collection with schema subsetting

Schema Parameters (for inplace=False only):

  • include (List[str], optional): List of BaseUKF field names to include in index. If None, includes all fields.

  • exclude (List[str], optional): List of BaseUKF field names to exclude from index. Applied after include.

Filter Parameters:

  • filters (Dict[str, Any], optional): Global filters applied to all searches. Uses same format as search filters.

Common Parameters:

  • name (str, optional): Engine instance name. Defaults to "{storage.name}_vec_idx".

  • condition (Callable, optional): Filter function for conditional indexing. Only UKFs satisfying the condition are indexed.

Vector Database Parameters (for inplace=False only):

  • provider (str, optional): Vector database provider (“lancedb”, “chroma”, “chromalite”, “milvuslite”). Uses config default if omitted.

  • collection (str, optional): Collection/table name in vector database. Defaults to engine name.

  • encoder (Callable or Tuple[Callable, Callable], optional): Text extraction function(s). Can be single function or (key_encoder, query_encoder) tuple.

  • embedder (Callable or LLM, optional): Embedding function or LLM instance to generate vectors.

  • Additional kwargs: Connection parameters specific to the vector database provider (uri, path, mode, etc.)
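The mode-dependent rules above can be summarized in a small standalone checker. This is a hypothetical illustration (check_engine_kwargs is not part of ahvn); it only encodes which parameters the list above marks as applying to inplace=False:

```python
# Hypothetical standalone sketch (not part of ahvn) of the parameter
# rules above: schema and vector-database parameters only take effect
# when inplace=False.
SCHEMA_PARAMS = {"include", "exclude"}
VECTOR_DB_PARAMS = {"provider", "collection", "encoder", "embedder"}

def check_engine_kwargs(inplace=True, **kwargs):
    """Return the names of passed parameters that would be ignored."""
    ignored = set()
    if inplace:
        # In in-place mode the engine queries storage directly, so
        # schema subsetting and separate-index backend options are moot.
        ignored = (SCHEMA_PARAMS | VECTOR_DB_PARAMS) & set(kwargs)
    return sorted(ignored)

print(check_engine_kwargs(inplace=True, include=["id", "name"], provider="chromalite"))
# ['include', 'provider']
print(check_engine_kwargs(inplace=False, include=["id", "name"]))
# []
```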


4. Operating Modes¶

VectorKLEngine supports two distinct operating modes with different performance characteristics:

4.1. In-Place Mode (inplace=True)¶

How It Works:

  • Engine directly queries the attached VectorKLStore without creating any additional structures

  • All operations (search, get) are routed to the storage backend

  • Zero setup time, zero storage overhead

  • Modifications to storage are immediately visible in search results

Characteristics:

  • Setup Time: Instant (no indexing required)

  • Storage Overhead: None (uses existing store)

  • Query Performance: Depends on storage vector index

  • Synchronization: Always up-to-date (no sync needed)

  • Schema: All metadata fields from storage are queryable

When to Use:

  • Development and prototyping (fastest setup)

  • Small to medium datasets (< 100K objects)

  • Dynamic data (frequent inserts/updates)

  • When storage has appropriate vector indexes

  • When all metadata fields are needed for filtering

Example:

store = VectorKLStore(provider="lancedb", uri="./vectors", embedder=embedder)
engine = VectorKLEngine(storage=store, inplace=True)

# Immediately ready to query (no indexing phase)
results = engine.search(query="machine learning", topk=5)

Note: In in-place mode, upsert(), insert(), remove(), clear() operations are no-ops since the engine doesn’t maintain separate state.


4.2. Non-In-Place Mode (inplace=False)¶

How It Works:

  • Engine creates a separate index collection with schema subsetting

  • Only specified metadata fields (include parameter) are copied to index

  • Requires explicit synchronization with storage when data changes

  • Optimized queries on subset of metadata fields

Characteristics:

  • Setup Time: Requires initial sync (copies data and embeddings to index)

  • Storage Overhead: Duplicate data (index collection)

  • Query Performance: Faster on subset schemas (fewer fields = better indexes)

  • Synchronization: Manual sync() required after storage changes

  • Schema: Only included metadata fields are queryable

When to Use:

  • Large datasets (> 100K objects) where schema subsetting improves performance

  • Static or slowly changing data (infrequent updates)

  • When only a subset of metadata is needed for filtering

  • To reduce index size and improve query speed

  • When the search index should use a different embedding model than the storage

Example:

store = VectorKLStore(provider="lancedb", uri="./vectors", embedder=embedder)

# Create separate index with subset of metadata
engine = VectorKLEngine(
    storage=store,
    inplace=False,
    include=["id", "name", "type", "priority"],  # Only these metadata fields
    provider="chromalite",  # Can use different backend
    collection="search_index"
)

# Initial synchronization (copies data and generates embeddings)
engine.sync()

# Query on subset schema
results = engine.search(query="transformers", topk=5, priority=KLOp.GTE(7))

# After storage changes, resync
store.upsert(new_kl)
engine.sync()  # Update index

4.3. Mode Comparison¶

| Feature            | In-Place Mode             | Non-In-Place Mode            |
|--------------------|---------------------------|------------------------------|
| Setup Time         | Instant                   | Requires sync                |
| Storage Overhead   | None                      | Duplicate vectors & metadata |
| Query Speed        | Depends on storage        | Optimized for subset         |
| Synchronization    | Automatic                 | Manual sync required         |
| Schema Flexibility | All metadata fields       | Subset only                  |
| Backend Choice     | Same as storage           | Can differ from storage      |
| Best For           | Development, dynamic data | Production, static data      |


5. Search Operations¶

5.1. Vector Similarity Search (Default)¶

The primary search method using semantic similarity with optional metadata filtering:

results = engine.search(
    query="transformer architectures for NLP",  # Natural language query
    topk=10,                                    # Number of results to return
    include=["id", "kl", "score"],             # Fields to include in results
    # Optional metadata filters (same syntax as Filter.expr)
    type="research_paper",
    priority=KLOp.GTE(7),
    status=KLOp.OR(["published", "peer_reviewed"])
)

Search Parameters:

  • query (str, required): Natural language query text to search for. Will be encoded and embedded.

  • topk (int, default: 10): Number of top results to return, ranked by similarity score.

  • include (Iterable[str], optional): Fields to include in results

    • "id": Knowledge object ID (int)

    • "kl": Full BaseUKF object (retrieved if recoverable)

    • "score": Similarity score (float, higher = more similar)

    • "filter": The applied metadata filters (for debugging)

    • "query": The VectorStoreQuery object (for debugging)

    • Default: ["id", "kl", "score"]

  • **kwargs: Metadata filter conditions using Filter operators or simple values

Return Value:

List[Dict[str, Any]]  # Each dict contains requested fields from `include`
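Because each result dict contains only the requested fields, and "kl" is populated only when the full object is recoverable, downstream code should access it defensively. A minimal sketch over fabricated result data:

```python
def describe(result):
    """Format one search result, falling back to the id when the full
    BaseUKF object could not be recovered ("kl" is None or absent)."""
    kl = result.get("kl")
    name = kl.name if kl is not None else f"<id {result['id']}>"
    return f"{name} (score: {result['score']:.3f})"

# Fabricated result dict for illustration
print(describe({"id": 123, "kl": None, "score": 0.89}))
# <id 123> (score: 0.890)
```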

5.2. Result Structures¶

Minimal Results (IDs and scores):

results = engine.search(query="deep learning", topk=5, include=["id", "score"])
# [
#   {"id": 123, "score": 0.89},
#   {"id": 456, "score": 0.85},
#   {"id": 789, "score": 0.82}
# ]

Full Results (with knowledge objects):

results = engine.search(query="neural networks", topk=5, include=["id", "kl", "score"])
# [
#   {"id": 123, "kl": <BaseUKF object>, "score": 0.89},
#   {"id": 456, "kl": <BaseUKF object>, "score": 0.85}
# ]

Debug Results (with filter and query objects):

results = engine.search(
    query="transformers", 
    topk=5, 
    include=["id", "score", "filter", "query"],
    type="paper"
)
# [
#   {
#     "id": 123, 
#     "score": 0.89,
#     "filter": <MetadataFilters object>,
#     "query": <VectorStoreQuery object>
#   },
#   ...
# ]

5.3. Hybrid Search: Semantic + Metadata Filtering¶

Combine vector similarity with structured metadata constraints:

# Find semantically similar papers with specific metadata
results = engine.search(
    query="attention mechanisms in transformers",
    topk=10,
    include=["id", "kl", "score"],
    # Metadata filters
    type="research_paper",
    year=KLOp.GTE(2020),  # Recent papers only
    venue=KLOp.OR(["NeurIPS", "ICLR", "ICML"]),  # Top conferences
    citations=KLOp.GTE(100)  # Highly cited
)

for result in results:
    kl = result["kl"]
    print(f"{kl.name} ({kl.year}) - Score: {result['score']:.3f}, Citations: {kl.citations}")

5.4. Global Filters¶

Apply persistent filters across all searches:

# Create engine with global filters
engine = VectorKLEngine(
    storage=store,
    inplace=True,
    filters={
        "type": "tutorial",                          # Only index tutorials
        "status": "published",                       # Only published content
        "language": KLOp.OR(["en", "zh"])         # English or Chinese only
    }
)

# All searches automatically include global filters
results = engine.search(
    query="python programming basics",
    topk=5,
    priority=KLOp.GTE(7)  # Additional filter on top of global filters
)
# Results will be: tutorials AND published AND (en OR zh) AND priority >= 7

5.5. Similarity Score Thresholding¶

Filter results by minimum similarity score:

# Get all results
results = engine.search(query="machine learning", topk=20, include=["id", "kl", "score"])

# Filter by score threshold
high_quality = [r for r in results if r["score"] >= 0.8]
medium_quality = [r for r in results if 0.6 <= r["score"] < 0.8]
low_quality = [r for r in results if r["score"] < 0.6]

print(f"High quality matches: {len(high_quality)}")
print(f"Medium quality matches: {len(medium_quality)}")
print(f"Low quality matches: {len(low_quality)}")

6. Complete Examples¶

6.2. Documentation Search with Metadata Filtering¶

from ahvn.klstore import VectorKLStore
from ahvn.klengine import VectorKLEngine
from ahvn.utils.klop import KLOp
from ahvn.llm import LLM
from ahvn.ukf import BaseUKF

# Setup
embedder = LLM(preset="embedder")
store = VectorKLStore(
    provider="chromalite",
    collection="docs",
    encoder=lambda kl: kl.content,
    embedder=embedder
)
engine = VectorKLEngine(storage=store, inplace=True)

# Create documentation
docs = [
    BaseUKF(
        name="Python Lists Tutorial",
        type="tutorial",
        difficulty="beginner",
        language="python",
        content="Learn how to use lists in Python: creating, indexing, slicing, and common operations.",
        tags={"data-structures", "basics"}
    ),
    BaseUKF(
        name="Advanced Dictionary Techniques",
        type="guide",
        difficulty="advanced",
        language="python",
        content="Deep dive into Python dictionaries: comprehensions, defaultdict, OrderedDict, and performance optimization.",
        tags={"data-structures", "optimization"}
    ),
    BaseUKF(
        name="JavaScript Promises",
        type="tutorial",
        difficulty="intermediate",
        language="javascript",
        content="Understanding asynchronous programming with Promises: then, catch, async/await patterns.",
        tags={"async", "modern-js"}
    ),
    BaseUKF(
        name="Python Async/Await Guide",
        type="guide",
        difficulty="advanced",
        language="python",
        content="Master asynchronous programming in Python using asyncio, async/await syntax, and coroutines.",
        tags={"async", "concurrency"}
    ),
]
store.batch_upsert(docs)
store.flush()

# Query 1: Beginner-friendly Python tutorials
results = engine.search(
    query="how to get started with Python programming",
    topk=3,
    include=["id", "kl", "score"],
    language="python",
    difficulty="beginner"
)
print("=== Beginner Python Tutorials ===")
for result in results:
    print(f"- {result['kl'].name} (score: {result['score']:.3f})")

# Query 2: Asynchronous programming in any language
results = engine.search(
    query="asynchronous programming patterns",
    topk=5,
    include=["id", "kl", "score"]
)
print("\n=== Async Programming Resources ===")
for result in results:
    kl = result['kl']
    print(f"- {kl.name} ({kl.language}, {kl.difficulty})")

# Query 3: Advanced Python content only
results = engine.search(
    query="data structures and algorithms",
    topk=3,
    include=["id", "kl", "score"],
    language="python",
    difficulty=KLOp.OR(["intermediate", "advanced"])
)
print("\n=== Advanced Python Content ===")
for result in results:
    print(f"- {result['kl'].name}")

6.3. E-commerce Product Recommendations¶

from ahvn.klstore import VectorKLStore
from ahvn.klengine import VectorKLEngine
from ahvn.utils.klop import KLOp
from ahvn.llm import LLM
from ahvn.ukf import BaseUKF

# Setup
embedder = LLM(preset="embedder")
store = VectorKLStore(
    provider="lancedb",
    uri="./products",
    collection="catalog",
    encoder=lambda kl: f"{kl.name}. {kl.content}",
    embedder=embedder
)
engine = VectorKLEngine(storage=store, inplace=True)

# Create products
products = [
    BaseUKF(
        name="Laptop Pro 15",
        type="electronics",
        category="laptop",
        brand="TechCorp",
        price=1299.99,
        rating=4.5,
        in_stock=True,
        content="High-performance laptop with 15-inch display, 16GB RAM, 512GB SSD, perfect for programming and content creation"
    ),
    BaseUKF(
        name="Wireless Mechanical Keyboard",
        type="electronics",
        category="accessory",
        brand="KeyMaster",
        price=129.99,
        rating=4.7,
        in_stock=True,
        content="Premium wireless mechanical keyboard with RGB lighting, perfect for programmers and gamers"
    ),
    BaseUKF(
        name="USB-C Hub 7-in-1",
        type="electronics",
        category="accessory",
        brand="ConnectAll",
        price=49.99,
        rating=4.3,
        in_stock=True,
        content="Versatile USB-C hub with HDMI, USB 3.0, SD card reader, and ethernet port for laptops"
    ),
    BaseUKF(
        name="Ergonomic Office Chair",
        type="furniture",
        category="chair",
        brand="ComfortSeating",
        price=399.99,
        rating=4.6,
        in_stock=False,
        content="Premium ergonomic chair with lumbar support, adjustable armrests, ideal for long programming sessions"
    ),
]
store.batch_upsert(products)
store.flush()

# Query 1: Find products for a programmer's setup
results = engine.search(
    query="best equipment for software developers",
    topk=5,
    include=["id", "kl", "score"],
    in_stock=True  # Only in-stock items
)
print("=== Developer Setup Recommendations ===")
for result in results:
    kl = result['kl']
    print(f"- {kl.name}: ${kl.price} (rating: {kl.rating}, score: {result['score']:.3f})")

# Query 2: Budget-friendly accessories
results = engine.search(
    query="affordable laptop accessories",
    topk=3,
    include=["id", "kl", "score"],
    category="accessory",
    price=KLOp.LTE(150.00),
    rating=KLOp.GTE(4.0)
)
print("\n=== Budget Accessories ===")
for result in results:
    kl = result['kl']
    print(f"- {kl.name}: ${kl.price}")

# Query 3: Premium electronics
results = engine.search(
    query="high quality professional equipment",
    topk=3,
    include=["id", "kl", "score"],
    type="electronics",
    price=KLOp.GTE(1000.00),
    in_stock=True
)
print("\n=== Premium Electronics ===")
for result in results:
    kl = result['kl']
    print(f"- {kl.name} by {kl.brand}: ${kl.price}")

6.4. Non-In-Place Mode with Different Backends¶

from ahvn.klstore import VectorKLStore
from ahvn.klengine import VectorKLEngine
from ahvn.utils.klop import KLOp
from ahvn.llm import LLM

# Storage with LanceDB
embedder = LLM(preset="embedder")
store = VectorKLStore(
    provider="lancedb",
    uri="./primary_store",
    collection="documents",
    encoder=lambda kl: kl.content,
    embedder=embedder
)

# Populate storage
# ... (add documents to store)

# Create search index with ChromaDB (different backend)
engine = VectorKLEngine(
    storage=store,
    inplace=False,
    include=["id", "name", "type", "priority", "tags"],  # Subset of metadata
    provider="chromalite",  # Different from storage provider
    collection="search_index",
    encoder=lambda kl: kl.content,  # Can use different encoding
    embedder=embedder  # Can use different embedding model
)

# Initial sync
print("Building search index...")
engine.sync()
print(f"Indexed {len(engine)} documents")

# Fast hybrid search on optimized index
results = engine.search(
    query="machine learning fundamentals",
    topk=10,
    include=["id", "score"],  # Minimal includes for speed
    type="tutorial",
    priority=KLOp.GTE(7)
)
print(f"Found {len(results)} matching documents")

# Close resources
engine.close()
store.close()

Further Exploration¶

Tip: For the base interface and common operations, see:

  • BaseKLEngine - Abstract base class defining the KLEngine interface and shared functionality

  • KLEngine Overview - Introduction to query engines and retrieval strategies

Tip: For other search methodologies, see:

  • FacetKLEngine - Structured filtering and faceted search using SQL predicates

  • DAACKLEngine - Multi-pattern string matching for entity recognition

Tip: For vector database integration, see:

  • VectorKLStore - Vector database-backed KLStore that serves as the storage backend for VectorKLEngine

Tip: For embedding and encoding, see:

  • LLM - Language models for embedding generation