ahvn.utils.basic.rnd_utils module
Random utilities with stable seeding that don't interfere with global random state.
- ahvn.utils.basic.rnd_utils.stable_rnd(seed=42)
Generate a random float in [0.0, 1.0) without affecting the global random state.
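
One common way to draw random values without touching the global random state is to use a private `random.Random` instance. The sketch below illustrates that idea only; the names `sketch_stable_rnd` and `sketch_stable_rndint` are hypothetical, and this is not necessarily how `ahvn` implements `stable_rnd` or `stable_rndint`.

```python
import random


def sketch_stable_rnd(seed=42):
    """Hypothetical sketch: draw a float in [0.0, 1.0) from a private generator."""
    rng = random.Random(seed)  # private instance; the module-level generator is untouched
    return rng.random()


def sketch_stable_rndint(lo, hi, seed=42):
    """Hypothetical sketch: draw an integer in [lo, hi], both ends inclusive."""
    rng = random.Random(seed)
    return rng.randint(lo, hi)


# Check that the global generator's state is the same before and after the calls.
random.seed(0)
state_before = random.getstate()
value = sketch_stable_rnd(seed=42)
roll = sketch_stable_rndint(1, 6, seed=42)
state_after = random.getstate()
```

Because each call builds its own generator from the seed, the same seed always yields the same value, and code elsewhere that relies on `random.seed(...)` is not disturbed.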
- ahvn.utils.basic.rnd_utils.stable_rndint(min, max, seed=42)
Generate a random integer between min and max (inclusive) without affecting the global random state.
- Parameters:
- Returns:
Random integer in the specified range.
- Return type:
- ahvn.utils.basic.rnd_utils.stable_shuffle(seq, inplace=False, seed=42)
Shuffle a sequence without affecting the global random state.
- Parameters:
seq (Iterable[Any]) -- The sequence to shuffle.
inplace (bool, optional) -- If True, shuffle the input sequence in place (only works if seq is a mutable sequence). Default is False.
seed (int, optional) -- Seed for deterministic shuffling. Default is 42. If None, no salt is used and the result is unstable; passing a non-null seed is strongly recommended to ensure stability.
- Returns:
A new shuffled list containing the elements from seq.
- Return type:
List[Any]
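
A minimal sketch of a seeded shuffle that leaves the global random state alone, assuming a private `random.Random` instance is an acceptable stand-in. The name `sketch_stable_shuffle` is hypothetical, and its behavior when `inplace=True` (returning the same list object) is an assumption rather than documented `ahvn` behavior.

```python
import random


def sketch_stable_shuffle(seq, inplace=False, seed=42):
    """Hypothetical sketch: shuffle with a private generator."""
    rng = random.Random(seed)
    if inplace:
        rng.shuffle(seq)  # requires a mutable sequence such as a list
        return seq        # assumption: hand back the same object
    items = list(seq)     # copy so the caller's data is left untouched
    rng.shuffle(items)
    return items


data = [1, 2, 3, 4, 5]
first = sketch_stable_shuffle(data)
second = sketch_stable_shuffle(data)
```

With the default `inplace=False`, `data` keeps its original order, and repeated calls with the same seed return the same permutation.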
- ahvn.utils.basic.rnd_utils.stable_split(seq, r=0.10, seed=42)
Split a sequence into two parts based on a stable hash-based selection.
This function creates a stable split that is resilient to adding/removing items. Items are selected for the first group based on their hash values, ensuring that the same items are consistently selected even when the sequence changes.
Note that the actual split ratio may not be exactly r, because each item is assigned independently according to its hash value. Over large datasets, the ratio should approximate r. To get an exact ratio or count, consider using stable_sample instead.
- Parameters:
seq (Iterable[Any]) -- The sequence to split.
r (float, optional) -- Target fraction of items placed in the first group. Default is 0.10 (10%).
seed (int, optional) -- Seed for deterministic splitting. Default is 42. If None, no salt is used and the result is unstable; passing a non-null seed is strongly recommended to ensure stability.
- Returns:
A tuple containing (selected_items, remaining_items).
- Return type:
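
The hash-based selection described for stable_split can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `sketch_stable_split` and `_unit_hash` are hypothetical names, and hashing `repr(item)` with a seeded SHA-256 digest is an assumed choice. Python's built-in `hash()` is avoided because string hashes are randomized per process.

```python
import hashlib


def _unit_hash(item, seed=42):
    """Map an item to a float in [0, 1) using a seeded SHA-256 digest (assumed scheme)."""
    digest = hashlib.sha256(f"{seed}:{item!r}".encode("utf-8")).digest()
    # Keep the top 53 bits so the quotient is exactly representable and below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def sketch_stable_split(seq, r=0.10, seed=42):
    """Hypothetical sketch: an item joins the first group when its hash falls below r."""
    selected, remaining = [], []
    for item in seq:
        (selected if _unit_hash(item, seed) < r else remaining).append(item)
    return selected, remaining


items = [f"doc-{i}" for i in range(1000)]
selected, remaining = sketch_stable_split(items, r=0.10)

# Adding new items does not change the decision for existing ones.
grown = items + [f"new-{i}" for i in range(200)]
selected_grown, _ = sketch_stable_split(grown, r=0.10)
```

Each item's membership depends only on its own hash and the seed, which is why the split survives additions and removals, and also why the realized ratio only approximates `r`.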
- ahvn.utils.basic.rnd_utils.stable_sample(seq, n, seed=42)
Sample n elements without replacement in a stable manner by selecting the n items with the smallest hash values.
This function creates a stable sample that is resilient to adding/removing items. Items are selected based on their hash values, ensuring that the same items are consistently selected even when the sequence changes, as long as n remains the same.
- Parameters:
- Returns:
A list containing the sampled elements.
- Return type:
List[Any]
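
The "smallest n hash values" idea can be sketched with `heapq.nsmallest`. The names `sketch_stable_sample` and `_hash_key` are hypothetical, and the seeded SHA-256 key over `repr(item)` is an assumption, not the hash that `ahvn` necessarily uses.

```python
import hashlib
import heapq


def _hash_key(item, seed=42):
    """Seeded, process-independent hash key for an item (assumed scheme)."""
    return hashlib.sha256(f"{seed}:{item!r}".encode("utf-8")).hexdigest()


def sketch_stable_sample(seq, n, seed=42):
    """Hypothetical sketch: keep the n items whose hash keys are smallest."""
    return heapq.nsmallest(n, seq, key=lambda item: _hash_key(item, seed))


items = [f"doc-{i}" for i in range(100)]
sample = sketch_stable_sample(items, n=10)

# Removing an item that was not sampled leaves the sample unchanged.
dropped = next(x for x in items if x not in sample)
sample_after_removal = sketch_stable_sample([x for x in items if x != dropped], n=10)
```

Unlike a threshold-based split, this returns exactly `n` items whenever the input has at least `n` elements, and the result does not depend on the input order.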
- ahvn.utils.basic.rnd_utils.stable_rnd_vector(seed=42, dim=384, major_ratio=0.7)
Generate a stable random vector with a major value on a hashed dimension.
This function creates a deterministic embedding-like vector where:
- One dimension (determined by hashing the seed) has a major value
- Other dimensions have small random values
- The entire vector is normalized via softmax, then L2-normalized to unit length
This two-stage normalization (softmax followed by L2) better approximates the distribution of real embeddings compared to direct L2 normalization.
This is useful for creating mock embeddings in tests where you need deterministic but varied vectors that approximate the behavior of real embeddings.
- Parameters:
seed (int, optional) -- Seed value for deterministic generation. Default is 42. If None, 42 is used to ensure stability.
dim (int, optional) -- Dimensionality of the vector. Default is 384 (common embedding dimension).
major_ratio (float, optional) -- Approximate ratio of the major dimension before normalization. Default is 0.7. The major dimension will have this value while others have small random values, then the whole vector is normalized via softmax + L2.
- Returns:
A normalized vector of length dim with unit L2 norm.
- Return type:
List[float]
Example
>>> vec1 = stable_rnd_vector(seed=123, dim=5)
>>> vec2 = stable_rnd_vector(seed=123, dim=5)
>>> vec1 == vec2  # Same seed produces same vector
True
>>> vec3 = stable_rnd_vector(seed=456, dim=5)
>>> vec1 != vec3  # Different seed produces different vector
True