ahvn.llm package¶

class ahvn.llm.LLM(preset=None, model=None, provider=None, cache=True, cache_exclude=None, name=None, **kwargs)[source]¶

Bases: object

High-level chat LLM client with retry, caching, proxy, and streaming support.

This class wraps a litellm-compatible chat API and provides two access modes:

  • stream: incremental (delta) results as they arrive

  • oracle: full (final) result collected from the stream

Key features:

  • Retry: automatic retries via tenacity on retryable exceptions.

  • Caching: memoizes successful results keyed by all request inputs and a user-defined name. Excluded keys can be configured via cache_exclude.

  • Streaming-first: always uses stream=True under the hood for stability; oracle aggregates the stream.

  • Proxies: optional http_proxy and https_proxy support per-request.

  • Flexible messages: accepts multiple message formats and normalizes them.

  • Output shaping: include and reduce control what is returned and whether to flatten lists.

Parameters:
  • preset (str | None) – Named preset from configuration (if supported by resolve_llm_config).

  • model (str | None) – Model identifier (e.g., “gpt-4o”). Overrides preset when provided.

  • provider (str | None) – Provider name used by the underlying client.

  • cache (Union[bool, str, BaseCache] | None) – Cache implementation. Defaults to True. If True, uses DiskCache with the default cache directory (“core.cache_path”). If a string is provided, it is treated as the path for DiskCache. If None/False, uses NoCache (no caching).

  • cache_exclude (list[str] | None) – Keys to exclude from cache key construction.

  • name (str | None) – Logical name for this LLM instance. Used to namespace the cache. Defaults to “llm”.

  • **kwargs – Additional provider/client config (e.g., temperature, top_p, n, tools, tool_choice, http_proxy, https_proxy, and any litellm client options). These act as defaults and can be overridden per call.

Notes

  • Caching: Only successful executions are cached. The cache key includes the normalized messages, the full effective configuration, and name, minus any keys listed in cache_exclude.

  • Set name differently for semantically distinct use-cases to avoid cache collisions.
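
Example

A minimal construction sketch; the model name, instance name, and keyword values below are illustrative, not package defaults:

>>> from ahvn.llm import LLM
>>> llm = LLM(model="gpt-4o", name="summarizer", temperature=0.2)
>>> # Per-call kwargs override the construction-time defaults:
>>> summary = llm.oracle("Summarize: caching avoids repeated API calls.", temperature=0.0)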

__init__(preset=None, model=None, provider=None, cache=True, cache_exclude=None, name=None, **kwargs)[source]¶

Initialize the client; see the class parameters above.

stream(messages, tools=None, tool_choice=None, include=None, verbose=False, reduce=True, **kwargs)[source]¶

Stream LLM responses (deltas) for the given messages.

Features:

  • Retry: automatic retries for transient failures.

  • Caching: memoizes successful runs keyed by inputs and name.

  • Streaming-first: uses stream=True for stability; yields deltas as they arrive.

  • Tool support: when tools are provided, tool_calls are aggregated and yielded at the end.

  • Proxies: supports http_proxy and https_proxy in kwargs.

  • Flexible input: accepts multiple message formats and normalizes them.

  • Output shaping: control returned fields with include and flattening with reduce.

Parameters:
  • messages (Union[str, Dict[str, Any], Any, List[Union[str, Dict[str, Any], Any]]]) –

    Conversation content, normalized by format_messages:

    1. str -> treated as a single user message

    2. list -> each item is processed as follows:

    • litellm.Message -> converted via json()

    • str -> treated as user message

    • dict -> used as-is and must include “role”

  • tools (Optional[List[Union[Dict, ToolSpec]]]) – Optional list of tools, each can be a ToolSpec or jsonschema dict. When provided, include defaults to [“think”, “text”, “tool_calls”].

  • tool_choice (Optional[str]) – Tool choice setting. Defaults to “auto” if tools present, otherwise None.

  • include (Optional[List[Literal['text', 'think', 'tool_calls', 'content', 'message', 'structured', 'tool_messages', 'tool_results', 'delta_messages', 'messages']]]) – Fields to include in each streamed delta. Can be a str or list[str]. Allowed: “text”, “think”, “tool_calls”, “content”, “message”, “structured”, “tool_messages”, “tool_results”, “delta_messages”, “messages”. Default: [“text”] without tools, [“think”, “text”, “tool_calls”] with tools.

  • verbose (bool) – If True, logs the resolved request config.

  • reduce (bool) – If True and len(include) == 1, returns a single value instead of a dict. If False, always returns a dict.

  • **kwargs – Per-call overrides for LLM config (e.g., temperature, top_p, http_proxy, https_proxy, etc.).

Yields:

LLMResponse –

  • dict if len(include) > 1 or reduce == False

  • single value if len(include) == 1 and reduce == True

When tools are present, tool_calls/tool_messages/tool_results are yielded at the end, after all text.

Raises:
  • ValueError – if include is empty or contains unsupported fields (e.g., “messages”).

  • ValueError – if tool_messages or tool_results is in include but some tools are not ToolSpec.

Return type:

Generator[Union[str, Dict[str, Any], List[Union[str, Dict[str, Any]]]], None, None]
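
Example

A usage sketch, assuming the default include=["text"] with reduce=True so each delta is a plain string:

>>> for delta in llm.stream("Tell me a short joke."):
...     print(delta, end="", flush=True)
>>> # With multiple include fields, each yielded delta is a dict instead:
>>> for chunk in llm.stream("Think it over.", include=["think", "text"]):
...     print(chunk)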

async astream(messages, tools=None, tool_choice=None, include=None, verbose=False, reduce=True, **kwargs)[source]¶

Asynchronously stream LLM responses (deltas) for the given messages.

Mirrors stream() but returns an async generator suitable for async workflows.

Warning: tools are not yet supported in async mode and will raise NotImplementedError if provided.

Return type:

AsyncGenerator[Union[str, Dict[str, Any], List[Union[str, Dict[str, Any]]]], None]
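
Example

An async sketch mirroring the stream() loop above; note that passing tools here raises NotImplementedError:

>>> import asyncio
>>> async def consume():
...     async for delta in llm.astream("Stream this asynchronously."):
...         print(delta, end="", flush=True)
>>> asyncio.run(consume())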

oracle(messages, tools=None, tool_choice=None, include=None, verbose=False, reduce=True, **kwargs)[source]¶

Get the final LLM response for the given messages (aggregated from a stream).

Features:

  • Retry: automatic retries for transient failures.

  • Caching: memoizes successful runs keyed by inputs and name.

  • Streaming-first: uses stream=True under the hood and aggregates the result.

  • Tool support: can include tools and tool_results in the response.

  • Proxies: supports http_proxy and https_proxy in kwargs.

  • Flexible input: accepts multiple message formats and normalizes them.

  • Output shaping: control returned fields with include and flattening with reduce.

Parameters:
  • messages (Union[str, Dict[str, Any], Any, List[Union[str, Dict[str, Any], Any]]]) – Conversation content, normalized by format_messages.

  • tools (Optional[List[Union[Dict, ToolSpec]]]) – Optional list of tools, each can be a ToolSpec or jsonschema dict. When provided, include defaults to [“think”, “text”, “tool_calls”].

  • tool_choice (Optional[str]) – Tool choice setting. Defaults to “auto” if tools present.

  • include (Optional[List[Literal['text', 'think', 'tool_calls', 'content', 'message', 'structured', 'tool_messages', 'tool_results', 'delta_messages', 'messages']]]) – Fields to include in the final result. Can be a str or list[str]. Allowed: “text”, “think”, “tool_calls”, “content”, “message”, “structured”, “tool_messages”, “tool_results”, “delta_messages”, “messages”. Default: [“text”] without tools, [“think”, “text”, “tool_calls”] with tools.

  • verbose (bool) – If True, logs the resolved request config.

  • reduce (bool) – If True and len(include) == 1, returns a single value instead of a dict. If False, always returns a dict.

  • **kwargs – Per-call overrides for LLM config.

Returns:

  • dict if len(include) > 1 or reduce == False

  • single value if len(include) == 1 and reduce == True

Return type:

LLMResponse

Raises:
  • ValueError – if include is empty or contains unsupported fields.

  • ValueError – if tool_messages or tool_results is in include but some tools are not ToolSpec.
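
Example

A usage sketch illustrating the reduce semantics described above:

>>> text = llm.oracle("What is the capital of France?")  # str, with the default include=["text"]
>>> result = llm.oracle("Explain briefly.", include=["think", "text"])
>>> answer = result["text"]  # dict keyed by the include fields, since len(include) > 1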

async aoracle(messages, tools=None, tool_choice=None, include=None, verbose=False, reduce=True, **kwargs)[source]¶

Asynchronously retrieve the final LLM response (aggregated from the async stream).

Mirrors oracle() and shares its configuration, caching, and reduction semantics.

Return type:

Union[str, Dict[str, Any], List[Union[str, Dict[str, Any]]]]
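
Example

Same semantics as oracle(), awaited:

>>> import asyncio
>>> text = asyncio.run(llm.aoracle("One-line summary, please."))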

embed(inputs, verbose=False, **kwargs)[source]¶

Get embeddings for the given inputs.

Parameters:
  • inputs (Union[str, List[str]]) – A single string or a list of strings to embed.

  • verbose (bool) – If True, logs the resolved request config.

  • **kwargs – Additional parameters for the embedding request.

Returns:

A list of embeddings, one for each input string.

Return type:

List[List[float]]
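
Example

A usage sketch; the result is always a list with one vector per input string:

>>> vectors = llm.embed(["first sentence", "second sentence"])
>>> len(vectors)
2
>>> dim = len(vectors[0])  # embedding dimensionality, cf. the dim property below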

async aembed(inputs, verbose=False, **kwargs)[source]¶

Get embeddings for the given inputs asynchronously.

Provides parity with embed() using litellm.aembedding under the hood while respecting caching behavior.

Return type:

List[List[float]]

tooluse(messages, tools, tool_choice='required', include=None, verbose=False, reduce=True, **kwargs)[source]¶

Execute tool calls with the LLM.

This is a convenience method that forces the LLM to use tools and returns the executed tool messages. It sets tool_choice=“required” and returns tool_messages by default.

Parameters:
  • messages (Union[str, Dict[str, Any], Any, List[Union[str, Dict[str, Any], Any]]]) – Conversation content.

  • tools (List[Union[Dict, ToolSpec]]) – List of tools (ToolSpec instances required for execution).

  • tool_choice (str) – Tool choice setting. Defaults to “required”.

  • include (Union[str, List[str], None]) – Fields to include in the result. Defaults to [“tool_messages”].

  • verbose (bool) – If True, logs the resolved request config.

  • reduce (bool) – If True, simplifies the output when possible.

  • **kwargs – Per-call overrides for LLM config.

Returns:

List of tool result messages in OpenAI format:

[{“role”: “tool”, “tool_call_id”: …, “name”: …, “content”: …}, …]

Return type:

List[Dict]

Raises:

ValueError – if tools are not ToolSpec instances.

Example

>>> tool_messages = llm.tooluse("Calculate fib(10)", tools=[fib_tool])
>>> print(tool_messages)
[{"role": "tool", "tool_call_id": "...", "name": "fib", "content": "55"}]
>>> # For repeated tool use iteration:
>>> messages.append({"role": "assistant", "tool_calls": ...})
>>> messages.extend(tool_messages)
>>> tool_messages = llm.tooluse(messages, tools=[fib_tool])
async atooluse(messages, tools, verbose=False, **kwargs)[source]¶

Asynchronously execute tool calls with the LLM.

Mirrors tooluse() but awaits async streaming.

Return type:

List[Dict]

property dim¶

Get the dimensionality of the embeddings produced by this LLM. This is determined by making a one-off test embedding call (embedding the string “<TEST>”).

Warning

For efficiency, this value is computed only once and cached. If the LLM config is edited after the first access (which is not recommended), the cached value may be stale.

Returns:

The dimensionality of the embeddings.

Return type:

int

Raises:

ValueError – if the embedding dimension cannot be determined.

property embed_empty: List[float]¶

Get a fixed embedding vector for empty strings.

This is a simple heuristic embedding consisting of a 1 followed by zeros, with the length equal to the LLM’s embedding dimensionality.

Returns:

The embedding vector for an empty string.

Return type:

List[float]
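
Example

A sketch of the relationship between dim and embed_empty, following the definitions above:

>>> d = llm.dim                 # triggers (and caches) a one-off test embedding call
>>> empty = llm.embed_empty     # [1.0, 0.0, 0.0, ...] of length d
>>> len(empty) == d and empty[0] == 1.0 and not any(empty[1:])
True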

ahvn.llm.gather_assistant_message(message_chunks)[source]¶

Gather a list of assistant message chunks (each as returned by _LLMChunk.to_message()) into a single assistant message dictionary.

Parameters:

message_chunks (List[Dict]) – A list of message dictionaries to gather.

Returns:

A dictionary containing the gathered assistant message.

Return type:

Dict[str, Any]

ahvn.llm.resolve_llm_config(preset=None, model=None, provider=None, **kwargs)[source]¶

Compile an LLM configuration dictionary based on the following order of priority:

  1. kwargs

  2. preset

  3. provider

  4. model

  5. global configuration

When a parameter is specified in multiple places, the one with the highest priority is used. For example, if a parameter is specified in both kwargs and preset, the value from kwargs will be used. When missing, the preset falls back to the default preset, the model falls back to the default model, and the provider falls back to the default provider of the model.

Parameters:
  • preset (str, optional) – The preset name to use.

  • model (str, optional) – The model name to use.

  • provider (str, optional) – The provider name to use.

  • encrypt (bool, optional) – Whether to encrypt the configuration. Defaults to False.

  • **kwargs – Additional parameters to override in the configuration.

Returns:

The resolved LLM configuration dictionary.

Return type:

Dict[str, Any]
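
Example

A sketch of the priority order; the preset name “fast” is hypothetical:

>>> from ahvn.llm import resolve_llm_config
>>> cfg = resolve_llm_config(preset="fast", temperature=0.1)
>>> cfg["temperature"]  # kwargs outrank the preset and global configuration
0.1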

ahvn.llm.format_messages(messages)[source]¶

Unify messages for LLM in diverse formats to OpenAI message format.

  1. If messages is a single string, it is treated as a single user message.

  2. If messages is a list, each item is processed as follows:

    • If the item is a litellm.Message object, it is converted to dict using its json() method.

    • If the item is a string, it is treated as a user message.

    • If the item is a dict, it is used as is, but must contain a “role” field.

    • If the item is of any other type, a TypeError is raised.

  3. If a message dict contains “tool_calls”, its “function.arguments” field is converted to a JSON string if it is not already a string.

Parameters:

messages (Union[str, Dict[str, Any], Any, List[Union[str, Dict[str, Any], Any]]]) – A single string, a single message, or a list of items, each of which may be a str, dict, or litellm.Message object

Returns:

List of formatted messages in OpenAI format

Return type:

List[dict]

Raises:
  • ValueError – If messages are invalid or missing required fields

  • TypeError – If an unsupported message type is encountered
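
Example

A normalization sketch following the rules above, assuming the standard OpenAI message shape for plain strings:

>>> from ahvn.llm import format_messages
>>> format_messages("hello")
[{'role': 'user', 'content': 'hello'}]
>>> format_messages(["hi", {"role": "assistant", "content": "hey"}])
[{'role': 'user', 'content': 'hi'}, {'role': 'assistant', 'content': 'hey'}]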

Submodules¶