ahvn.klengine.daac_engine module

class ahvn.klengine.daac_engine.DAACKLEngine(storage, path, encoder=None, min_length=2, inverse=True, normalizer=None, name=None, condition=None, encoding='utf-8', *args, **kwargs)

Bases: BaseKLEngine

A Double Array AC Automaton-based KLEngine for efficient string search in BaseUKF objects.

This engine uses the Aho-Corasick automaton algorithm for fast multi-pattern string matching. It’s particularly useful for knowledge base applications where you need to find all occurrences of known entity strings within a given text query. The engine is designed to be inplace (storing only id and string, not full data) and requires external storage for BaseUKF objects.
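The multi-pattern matching described above can be illustrated with a small dictionary-based Aho-Corasick automaton. This is a sketch of the general algorithm only: it is not the engine's double-array implementation, and build_automaton and find_all are hypothetical names that are not part of ahvn.

```python
from collections import deque


def build_automaton(patterns):
    """Build a plain dict-based Aho-Corasick automaton (not a double array)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns that end at each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Breadth-first pass: depth-1 states keep the root as failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every occurrence in a single pass."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", automaton))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is scanned once regardless of how many patterns are indexed, which is why this family of algorithms suits lookups of many known entity strings in a query.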

Search Methods:

_search(query, conflict, whole_word, include, *args, **kwargs): AC automaton-based string search.

Abstract Methods (inherited from BaseKLEngine):

_upsert(kl): Insert or update a BaseUKF in the engine.

_remove(key): Remove a BaseUKF from the engine by its key (id).

_clear(): Clear all BaseUKF objects from the engine.

Parameters:
inplace: bool = False
recoverable: bool = False
__init__(storage, path, encoder=None, min_length=2, inverse=True, normalizer=None, name=None, condition=None, encoding='utf-8', *args, **kwargs)

Initialize the DAACKLEngine.

Parameters:
  • storage (BaseKLStore) – The storage backend for BaseUKF objects (required).

  • path (str) – Local directory path to store AC automaton files.

  • encoder (Callable[[BaseUKF], List[str]]) – Function to extract searchable strings from BaseUKF objects. The recommended pattern is to use lambda kl: kl.synonyms where kl.synonyms contains all string variants that should point to the same knowledge object.

  • min_length (int) – Minimum length of strings to include in the automaton. Default is 2.

  • inverse (bool) – If True, builds the automaton on reversed strings for suffix matching efficiency. Default is True.

  • normalizer (Optional[Union[Callable[[str], str], bool]]) – Function to normalize strings before indexing and searching. If True, uses a default text normalizer including tokenization, stop word removal, lemmatization, and lowercasing. If None or False, no normalization is applied. Default is None.

  • name (Optional[str]) – Name of the KLEngine instance. If None, defaults to “{storage.name}_daac_idx”.

  • condition (Optional[Callable]) – Optional upsert/insert condition to apply to the KLEngine. KLs that do not satisfy the condition will be ignored. If None, all KLs are accepted.

  • encoding (Optional[str]) – Encoding used for saving/loading files. Default is 'utf-8'. If None, HEAVEN_CM’s default encoding is used.

  • *args – Additional positional arguments passed to BaseKLEngine.

  • **kwargs – Additional keyword arguments passed to BaseKLEngine.
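One way to picture how encoder, normalizer, min_length, and inverse might interact is the sketch below. It is an assumption-laden illustration, not the engine's code: prepare_patterns is a hypothetical helper, and the order of the steps (normalize first, then filter by length, then reverse) is a guess.

```python
from typing import Callable, Iterable, List, Optional


def prepare_patterns(
    strings: Iterable[str],
    min_length: int = 2,
    normalizer: Optional[Callable[[str], str]] = None,
    inverse: bool = True,
) -> List[str]:
    """Hypothetical helper: turn encoder output into automaton patterns."""
    patterns = []
    for s in strings:
        if normalizer is not None:
            s = normalizer(s)          # assumed to run before the length check
        if len(s) < min_length:
            continue                   # strings that are too short are skipped
        patterns.append(s[::-1] if inverse else s)  # reversed when inverse=True
    return patterns


# Strings as an encoder such as `lambda kl: kl.synonyms` might return them.
synonyms = ["NYC", "A", "New York"]
print(prepare_patterns(synonyms, normalizer=str.lower))  # ['cyn', 'kroy wen']
```

Here str.lower stands in for a normalizer; the default normalizer selected by normalizer=True is described as doing considerably more (tokenization, stop word removal, lemmatization).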

__len__()

Returns the number of unique BaseUKF entities (IDs) currently indexed by the engine.

clear(**kwargs)

Clear all BaseUKF objects from the engine, resetting it to an empty state.

flush()

Apply pending deletions and rebuild the AC automaton.

This method processes lazy deletions and rebuilds the automaton to ensure all changes are reflected in the search index.
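The lazy-deletion pattern mentioned above can be sketched with a toy index in which removals are only recorded and the searchable structure is rebuilt on flush. LazyIndex and its attributes are hypothetical, and a plain dictionary stands in for the automaton; the engine's internals may differ.

```python
class LazyIndex:
    """Toy index: removals are deferred until flush() rebuilds the lookup."""

    def __init__(self):
        self.entries = {}     # id -> list of indexed strings
        self.pending = set()  # ids marked for deletion but not yet applied
        self.built = {}       # string -> set of ids (stand-in for the automaton)

    def upsert(self, key, strings):
        self.entries[key] = list(strings)
        self.pending.discard(key)  # a re-inserted id is no longer deleted

    def remove(self, key):
        self.pending.add(key)      # cheap: nothing is rebuilt here

    def flush(self):
        # Apply the deferred deletions, then rebuild the lookup from scratch.
        for key in self.pending:
            self.entries.pop(key, None)
        self.pending.clear()
        self.built = {}
        for key, strings in self.entries.items():
            for s in strings:
                self.built.setdefault(s, set()).add(key)


index = LazyIndex()
index.upsert(1, ["new york"])
index.upsert(2, ["boston"])
index.flush()
index.remove(1)                     # still searchable until the next flush
assert "new york" in index.built
index.flush()
assert "new york" not in index.built
```

Deferring the rebuild keeps individual removals cheap, at the cost that searches may see stale entries until flush is called.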

sync(batch_size=None, flush=True, progress=None, **kwargs)

Synchronize KLEngine with its attached KLStore, if applicable. Note that a full synchronization can involve large data uploads or downloads, which may cause performance issues or even errors on some backends. Parameters such as batch_size are therefore provided to control the synchronization process. Overriding this method is recommended for better performance.

Parameters:
  • batch_size (Optional[int]) – The batch size for synchronization. If None, use the default batch size from configuration (512). If <= 0, yields all KLs in a single batch.

  • flush (bool) – If True, saves the engine state after synchronization. Default is True.

  • progress (Type[Progress]) – Progress class used for reporting. Use None for silent operation or TqdmProgress for terminal output.

  • **kwargs – Additional keyword arguments.
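The batch_size semantics listed above (None falls back to a configured default, a value <= 0 means a single batch) can be sketched as a small generator. iter_batches and DEFAULT_BATCH_SIZE are hypothetical names used for illustration; only the value 512 is taken from the parameter description.

```python
from itertools import islice

DEFAULT_BATCH_SIZE = 512  # the configured default mentioned in the description


def iter_batches(items, batch_size=None):
    """Yield lists of at most batch_size items; <= 0 yields one single batch."""
    if batch_size is None:
        batch_size = DEFAULT_BATCH_SIZE
    if batch_size <= 0:
        yield list(items)
        return
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


print(list(iter_batches(range(5), batch_size=2)))  # [[0, 1], [2, 3], [4]]
print(list(iter_batches(range(5), batch_size=0)))  # [[0, 1, 2, 3, 4]]
```

Smaller batches bound how much data is in flight at once, which is the concern raised about large transfers on some backends.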

save(path=None)

Save the current state of the engine to disk.

Parameters:

path (str) – Directory path to save the data. If None, uses self.path.

load(path=None)

Load a previously saved engine state from disk.

Parameters:

path (str) – Directory path to load the data from. If None, uses self.path.

Returns:

True if loading was successful (files exist), False otherwise.

Return type:

bool
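The save/load contract described above (fall back to self.path when path is None, return False when no saved files exist) can be sketched with a toy class. TinyEngine, its FILENAME, and the JSON format are all assumptions made for illustration; the actual on-disk layout of the engine is not documented here.

```python
import json
import os


class TinyEngine:
    """Toy stand-in showing the save/load contract, not the real file format."""

    FILENAME = "state.json"  # hypothetical file name

    def __init__(self, path, encoding="utf-8"):
        self.path = path
        self.encoding = encoding
        self.state = {}

    def save(self, path=None):
        path = path or self.path  # fall back to the engine's own directory
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, self.FILENAME), "w", encoding=self.encoding) as f:
            json.dump(self.state, f)

    def load(self, path=None):
        path = path or self.path
        file = os.path.join(path, self.FILENAME)
        if not os.path.exists(file):
            return False          # nothing saved yet: report it, do not raise
        with open(file, encoding=self.encoding) as f:
            self.state = json.load(f)
        return True
```

Returning a bool lets callers choose between loading an existing index and building a fresh one without catching exceptions.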

close()

Close the engine and save current state to disk.