BaseKLEngine¶

BaseKLEngine is the abstract base class that defines the common interface for all KLEngine implementations in AgentHeaven. It provides a standardized way to index, search, and retrieve knowledge objects (BaseUKF instances) across different search methodologies while ensuring consistent behavior and functionality.

1. Understanding KLEngine¶

1.1. What is KLEngine?¶

KLEngine is AgentHeaven’s query and retrieval layer for knowledge objects. Think of it as a specialized search system where:

Input is a query with search parameters (filters, keywords, embeddings, etc.)
Output is a list of matching knowledge objects with their metadata
Operations are primarily search-focused, with support for indexing and maintenance

KLEngine focuses on searching and retrieval — while it supports indexing operations (insert, update, remove) for maintaining the search index, its core purpose is to provide various ways to query and retrieve knowledge objects. KLEngine is not required to store the entirety of knowledge objects; instead, it can work in conjunction with a KLStore that handles persistent storage. Most likely, the KLEngine will only contain knowledge ids, with index metadata or vector embeddings to facilitate efficient searching.

1.2. Why Separate Storage from Searching?¶

This separation of concerns brings several benefits:

Flexibility in Search Methodologies: You can easily switch between or combine different search approaches (vector similarity, faceted search, pattern matching) based on your needs—accuracy, speed, or complexity—without changing your storage backend.

Storage Independence: KLEngines can work with or without attached KLStores. When attached, they retrieve full knowledge objects from storage on demand. When detached, they can still provide search results with IDs or cached metadata. This flexibility enables various architectural patterns.

In-Place vs. Standalone Modes: Some engines operate in-place, directly querying the storage backend (e.g., FacetKLEngine with DatabaseKLStore), while others maintain their own search indexes (e.g., DAACKLEngine). This design accommodates both lightweight and specialized search strategies.

Multi-Modal Retrieval: The modular design enables combining multiple search engines over the same knowledge base—using vector search for semantic queries, faceted search for structured filtering, and pattern matching for entity recognition—all working together to provide comprehensive retrieval capabilities.

2. Shared Functionality¶

All KLEngine implementations inherit the following common capabilities from BaseKLEngine:

2.1. Core Operations¶

Search Operations: Perform queries to retrieve matching knowledge objects
Index Maintenance: Insert, update, and remove knowledge objects in the search index
Storage Attachment: Optionally attach to a KLStore for retrieving full knowledge objects
Flexible Retrieval: Get knowledge objects by their ID from engine or attached storage

2.2. Batch Operations¶

Batch Insert: Insert multiple knowledge objects efficiently into the search index
Batch Upsert: Insert or update multiple knowledge objects in one operation
Batch Remove: Remove multiple knowledge objects from the search index simultaneously

2.3. Multiple Search Modes¶

KLEngines support multiple search methods through a routing mechanism:

Default Search: Use search() without a mode parameter to invoke _search()
Named Search: Use search(mode="xxx") to invoke _search_xxx()
Search Discovery: Use list_search() to discover available search modes

2.4. Flexible Key Handling¶

BaseKLEngine accepts three types of keys for all operations:

int: Direct numeric ID
str: String representation of numeric ID (automatically converted)
BaseUKF: Knowledge object instance (ID extracted automatically)

2.5. Conditional Indexing¶

All KLEngine implementations support optional condition filtering:

# Only index knowledge objects that meet specific criteria
engine = MyKLEngine(condition=lambda kl: kl.category == "important")

2.6. In-Place vs. Standalone Modes¶

KLEngines can operate in two modes:

Standalone Mode (inplace=False): The engine maintains its own search index separate from storage
In-Place Mode (inplace=True): The engine operates directly on the attached storage backend without maintaining a separate index

3. Core Interface Methods¶

BaseKLEngine defines the essential interface that all implementations must provide:

3.1. Required Abstract Methods¶

_search(include, *args, **kwargs): Perform the default search operation. Returns a list of dictionaries containing search results with keys limited to include. Conventionally, use "id" for BaseUKF.id and "kl" for the BaseUKF instance itself.
_upsert(kl): Insert or update a knowledge object in the search index.
_remove(key): Remove a knowledge object from the search index by its ID. If not applicable for the engine type, override with an empty function or raise an exception.
_clear(): Clear all knowledge objects from the search index.

3.2. Optional Methods¶

_get(key, default): Retrieve a knowledge object from the engine’s internal cache or index. Though not required, leaving this unimplemented may lead to unexpected behavior if knowledge objects should be returned by search() without an attached KLStore.
_post_search(results, include, *args, **kwargs): Postprocess search results. By default, returns results unchanged. Override to add ranking, filtering, or enrichment.
_search_xxx(include, *args, **kwargs): Named search methods for different search modes. For example, _search() for vector similarity search, _search_facet() for faceted filtering.

3.3. Optional Optimization Methods¶

_batch_upsert(kls), _batch_insert(kls), _batch_remove(keys): Optimized batch operations. The default implementations iterate through individual operations. Override for better performance with large datasets.
close(): Optional method to close any open connections or resources. Default is a no-op.
flush(): Optional method to flush any buffered data to persistent storage. Default is a no-op.
sync(): Synchronize the engine with its attached KLStore by clearing and re-indexing all knowledge objects. Useful when the storage has been batch modified externally.

4. Usage Patterns¶

4.1. Basic Operations¶

class MyKLEngine(BaseKLEngine):
    # Implement required methods here
    pass

# Create engine with optional storage attachment
store = MyKLStore("my_store")
engine = MyKLEngine(storage=store, name="my_engine")

# Insert knowledge objects into the index
engine.insert(knowledge_object)
engine.upsert(knowledge_object)  # Insert or update

# Search for knowledge objects
results = engine.search(query="example", include=["id", "kl"])
for result in results:
    kl_id = result["id"]
    kl_obj = result["kl"]
    
# Retrieve specific knowledge object
kl = engine.get(123)  # From engine cache or attached storage

# Remove knowledge objects from index
engine.remove(123)
del engine[123]

4.2. Batch Operations¶

# Batch insert (only if not exists)
engine.batch_insert([kl1, kl2, kl3])

# Batch upsert (insert or update)
engine.batch_upsert([kl1, kl2, kl3])

# Batch remove
engine.batch_remove([123, 456, 789])

4.3. Multiple Search Modes¶

# Discover available search modes
modes = engine.list_search()  # Returns [None, 'vector', 'facet', ...]

# Use default search
results = engine.search(query="example")

# Use named search mode
results = engine.search(query="example", mode="vector")  # this requires a _search implementation, typically by VectorKLEngine
results = engine.search(filters={"category": "science"}, mode="facet")  # this requires a _search_facet implementation, typically by FacetKLEngine

4.4. Storage Attachment¶

# Create engine without storage
engine = MyKLEngine(name="my_engine")

# Attach storage later
store = MyKLStore("my_store")
engine.attach(store)

# Search returns IDs, retrieval uses attached storage
results = engine.search(query="example", include=["id", "kl"])

# Synchronize engine with storage
engine.sync()  # Re-index all objects from storage

# Detach storage
engine.detach()

4.5. Flexible Result Inclusion¶

# Control what fields to include in search results
results = engine.search(
    query="example",
    include=["id", "kl", "score", "metadata"]
)

# Minimal results (IDs only)
results = engine.search(query="example", include=["id"])

# Full knowledge objects
results = engine.search(query="example", include=["id", "kl"])

# Full knowledge objects with search metadata
results = engine.search(query="example", include=["id", "kl", "score"])  # Typically from vector search
results = engine.search(query="example", include=["id", "kl", "matches"])  # Typically from string search

5. Implementation Guide¶

When creating a custom KLEngine implementation:

5.1. Extend BaseKLEngine¶

from ahvn.klengine.base import BaseKLEngine
from ahvn.ukf.base import BaseUKF
from typing import Any, Dict, List, Optional, Iterable

class MyKLEngine(BaseKLEngine):
    def __init__(
        self,
        storage=None,
        inplace=False,
        name=None,
        condition=None,
        **kwargs
    ):
        super().__init__(storage, inplace, name, condition, **kwargs)
        # Initialize your search index
        self._index = {}  # Example: simple dictionary index
    
    # Implement required abstract methods
    def _search(
        self,
        include: Optional[Iterable[str]] = None,
        query: str = "",
        **kwargs
    ) -> List[Dict[str, Any]]:
        """Your search implementation"""
        results = []
        # Perform search logic
        for kl_id, metadata in self._index.items():
            if self._matches(metadata, query):
                results.append({"id": kl_id, "score": 1.0})
        return results
    
    def _upsert(self, kl: BaseUKF):
        """Update search index"""
        self._index[kl.id] = self._extract_metadata(kl)
    
    def _remove(self, key: int):
        """Remove from search index"""
        self._index.pop(key, None)
    
    def _clear(self):
        """Clear search index"""
        self._index.clear()

5.2. Add Named Search Modes¶

class MyKLEngine(BaseKLEngine):
    # ... (previous code)
    
    def _search_exact(
        self,
        include: Optional[Iterable[str]] = None,
        keyword: str = "",
        **kwargs
    ) -> List[Dict[str, Any]]:
        """Exact keyword matching search mode"""
        results = []
        for kl_id, metadata in self._index.items():
            if keyword in metadata.get("content", ""):
                results.append({"id": kl_id, "score": 1.0})
        return results
    
    def _search_fuzzy(
        self,
        include: Optional[Iterable[str]] = None,
        keyword: str = "",
        threshold: float = 0.8,
        **kwargs
    ) -> List[Dict[str, Any]]:
        """Fuzzy matching search mode"""
        results = []
        # Fuzzy matching logic
        return results

5.3. Performance Optimization¶

Override optimization methods for better performance:

def _batch_upsert(self, kls: Iterable[BaseUKF]):
    """Optimized batch indexing"""
    # Use bulk operations if your backend supports them
    for kl in kls:
        self._index[kl.id] = self._extract_metadata(kl)
    self._rebuild_secondary_indexes()  # Example: rebuild once

def _post_search(
    self,
    results: List[Dict[str, Any]],
    include: Optional[Iterable[str]] = None,
    **kwargs
) -> List[Dict[str, Any]]:
    """Post-process search results"""
    # Add re-ranking, deduplication, or enrichment
    results = sorted(results, key=lambda r: r.get("score", 0), reverse=True)
    return results[:kwargs.get("limit", 100)]

5.4. In-Place Engine Implementation¶

from ahvn.klstore.database import DatabaseKLStore

class MyInPlaceKLEngine(BaseKLEngine):
    inplace = True  # Mark as in-place engine
    
    def __init__(self, storage: DatabaseKLStore, **kwargs):
        # In-place engines require a storage backend
        if storage is None:
            raise ValueError("In-place engines require a storage backend")
        super().__init__(storage=storage, inplace=True, **kwargs)
    
    def _search(
        self,
        include: Optional[Iterable[str]] = None,
        filters: Dict[str, Any] = None,
        **kwargs
    ) -> List[Dict[str, Any]]:
        """Search directly on storage backend"""
        # Query the database directly
        query = self.storage.session.query(self.storage.entity)
        
        # Apply filters
        if filters:
            for key, value in filters.items():
                query = query.filter(getattr(self.storage.entity, key) == value)
        
        # Execute and return results
        results = []
        for entity in query.all():
            results.append({"id": entity.id})
        return results
    
    def _upsert(self, kl: BaseUKF):
        """No-op for in-place engines"""
        pass  # Storage handles persistence
    
    def _remove(self, key: int):
        """No-op for in-place engines"""
        pass  # Storage handles removal
    
    def _clear(self):
        """No-op for in-place engines"""
        pass  # Storage handles clearing

6. Further Exploration¶

Tip: For concrete KLEngine implementations, see:

FacetKLEngine - Structured search with ORM-like filtering and SQL queries

DAACKLEngine - High-performance string matching using Aho-Corasick automaton

VectorKLEngine - Vector similarity search for semantic retrieval

Tip: For storage backends that work with KLEngines, see:

KLStore - Storage layer for knowledge objects

CacheKLStore - In-memory storage for fast access

DatabaseKLStore - Persistent relational storage

VectorKLStore - Vector database storage

Tip: For knowledge object fundamentals, see:

BaseUKF - Universal Knowledge Format for representing knowledge objects

UKF Data Types - Data type mappings between UKF, Pydantic, and databases