ahvn.utils.basic.serialize_utils module
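- ahvn.utils.basic.serialize_utils.load_txt(path, encoding=None, strict=False)
Load text from a file. If the file does not exist and strict is False, returns an empty string.
- Parameters:
path (str) – The path to the text file.
encoding (str) – The encoding to use for reading the file. Defaults to None, which will use the encoding in the config file (“core.encoding”).
strict (bool) – If True, raises an error if the file does not exist. Otherwise, returns an empty string.
- Returns:
The contents of the file or an empty string if the file does not exist.
- Return type:
str
- Raises: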
FileNotFoundError – If the file does not exist and strict is True.
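- ahvn.utils.basic.serialize_utils.iter_txt(path, encoding=None, strict=False)
Iterate over a text file, yielding each line with its trailing newline character stripped.
- Parameters:
path (str) – The path to the text file.
encoding (str) – The encoding to use for reading the file. Defaults to None, which will use the encoding in the config file (“core.encoding”).
strict (bool) – If True, raises an error if the file does not exist.
- Yields:
str – Each line in the text file.
- Raises:
FileNotFoundError – If the file does not exist and strict is True.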
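As a rough illustration of the documented behavior (trailing newline removed, strict controlling whether a missing file raises), iter_txt can be approximated with the standard library. This is a sketch, not the library's implementation: iter_txt_sketch is a hypothetical name, UTF-8 stands in for the configured “core.encoding”, and yielding nothing for a missing file in non-strict mode is an assumption.

```python
import os
import tempfile
from typing import Iterator, Optional


def iter_txt_sketch(path: str, encoding: Optional[str] = None, strict: bool = False) -> Iterator[str]:
    """Hypothetical stand-in for iter_txt: yield lines without their trailing newline."""
    if not os.path.exists(path):
        if strict:
            # Raised lazily, on the first next() call, because this is a generator.
            raise FileNotFoundError(path)
        return  # assumption: non-strict mode yields nothing for a missing file
    # Assumption: UTF-8 replaces the configured "core.encoding" default.
    with open(path, "r", encoding=encoding or "utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "notes.txt")
    with open(p, "w", encoding="utf-8") as f:
        f.write("alpha\nbeta\n")
    lines = list(iter_txt_sketch(p))
    missing = list(iter_txt_sketch(os.path.join(d, "missing.txt")))
    try:
        list(iter_txt_sketch(os.path.join(d, "missing.txt"), strict=True))
        strict_raised = False
    except FileNotFoundError:
        strict_raised = True

print(lines)  # ['alpha', 'beta']
```

Because the function is a generator, a strict-mode error only surfaces once iteration starts, not when iter_txt_sketch is called.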
- ahvn.utils.basic.serialize_utils.save_txt(obj, path, encoding=None)
Save text to a file. If the file does not exist, it will be created.
Warning
An extra newline will be added at the end of the string to be consistent with the behavior of append_txt.
- ahvn.utils.basic.serialize_utils.append_txt(obj, path, encoding=None)
Append text to a file. If the file does not exist, it will be created.
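The warning on save_txt is easier to see with a concrete sequence of calls. The sketch below uses hypothetical stand-ins (save_txt_sketch, append_txt_sketch) and assumes that save_txt overwrites an existing file and that append_txt also terminates each write with a newline; because both add the newline, a save followed by an append leaves one item per line.

```python
import os
import tempfile
from typing import Optional


def save_txt_sketch(obj: str, path: str, encoding: Optional[str] = None) -> None:
    """Hypothetical stand-in for save_txt: overwrite the file, then add one newline."""
    # Assumption: save_txt truncates an existing file (mode "w").
    with open(path, "w", encoding=encoding or "utf-8") as f:
        f.write(obj + "\n")


def append_txt_sketch(obj: str, path: str, encoding: Optional[str] = None) -> None:
    """Hypothetical stand-in for append_txt: append, creating the file if needed."""
    # Mode "a" creates the file when it does not exist yet.
    with open(path, "a", encoding=encoding or "utf-8") as f:
        f.write(obj + "\n")


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "log.txt")
    save_txt_sketch("first", p)
    append_txt_sketch("second", p)
    with open(p, encoding="utf-8") as f:
        content = f.read()

print(repr(content))  # 'first\nsecond\n'
```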
- ahvn.utils.basic.serialize_utils.loads_yaml(s, **kwargs)
Load a YAML string into a Python object.
- ahvn.utils.basic.serialize_utils.dumps_yaml(obj, sort_keys=False, indent=4, allow_unicode=True, **kwargs)
Serialize a Python object to a YAML string.
- Parameters:
obj (Any) – The Python object to serialize.
sort_keys (bool) – Whether to sort the keys in the YAML output. Defaults to False.
indent (int) – The number of spaces to use for indentation. Defaults to 4.
allow_unicode (bool) – Whether to allow Unicode characters in the output. Defaults to True.
**kwargs – Additional keyword arguments to pass to yaml.safe_dump.
- Returns:
The YAML string representation of the object.
- Return type:
str
- ahvn.utils.basic.serialize_utils.load_yaml(path, encoding=None, strict=False, **kwargs)
Load a YAML file into a Python object.
- Parameters:
path (str) – The path to the YAML file.
encoding (str) – The encoding to use for reading the file. Defaults to None, which will use the encoding in the config file (“core.encoding”).
strict (bool) – If True, raises an error if the file does not exist. Otherwise, returns an empty dictionary.
**kwargs – Additional keyword arguments to pass to yaml.safe_load.
- Returns:
The Python object represented by the YAML file.
- Return type:
Any
- Raises:
FileNotFoundError – If the file does not exist and strict is True.
- ahvn.utils.basic.serialize_utils.dump_yaml(obj, path, sort_keys=False, indent=4, allow_unicode=True, **kwargs)
Save a Python object to a YAML file.
- Parameters:
obj (Any) – The Python object to save.
path (str) – The path to the YAML file.
sort_keys (bool) – Whether to sort the keys in the YAML output. Defaults to False.
indent (int) – The number of spaces to use for indentation. Defaults to 4.
allow_unicode (bool) – Whether to allow Unicode characters in the output. Defaults to True.
**kwargs – Additional keyword arguments to pass to yaml.safe_dump.
- ahvn.utils.basic.serialize_utils.save_yaml(obj, path, sort_keys=False, indent=4, allow_unicode=True, **kwargs)
Alias for dump_yaml. Saves a Python object to a YAML file.
- Parameters:
obj (Any) – The Python object to save.
path (str) – The path to the YAML file.
sort_keys (bool) – Whether to sort the keys in the YAML output. Defaults to False.
indent (int) – The number of spaces to use for indentation. Defaults to 4.
allow_unicode (bool) – Whether to allow Unicode characters in the output. Defaults to True.
**kwargs – Additional keyword arguments to pass to yaml.safe_dump.
- ahvn.utils.basic.serialize_utils.load_pkl(path, strict=False, **kwargs)
Load a Python object from a pickle file.
- ahvn.utils.basic.serialize_utils.dump_pkl(obj, path, **kwargs)
Save a Python object to a pickle file.
- ahvn.utils.basic.serialize_utils.save_pkl(obj, path, **kwargs)
Alias for dump_pkl. Saves a Python object to a pickle file.
- ahvn.utils.basic.serialize_utils.load_hex(path, strict=False, **kwargs)
Load the binary contents of a file as a hexadecimal string.
- Parameters:
path (str) – The path to the file.
strict (bool) – If True, raises an error if the file does not exist. Defaults to False.
- Returns:
The hexadecimal string representation of the file’s contents.
- Return type:
str
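load_hex can be approximated by reading the file in binary mode and calling bytes.hex(). The sketch below is hypothetical (load_hex_sketch is not part of the module), and the empty string returned for a missing file in non-strict mode is an assumption; bytes.fromhex() reverses the encoding.

```python
import os
import tempfile


def load_hex_sketch(path: str, strict: bool = False) -> str:
    """Hypothetical stand-in for load_hex: read raw bytes and hex-encode them."""
    if not os.path.exists(path):
        if strict:
            raise FileNotFoundError(path)
        return ""  # assumption: non-strict mode returns an empty string
    with open(path, "rb") as f:
        return f.read().hex()


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "blob.bin")
    with open(p, "wb") as f:
        f.write(b"\x00\xffAB")
    hex_str = load_hex_sketch(p)

print(hex_str)  # 00ff4142
```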
- ahvn.utils.basic.serialize_utils.dump_hex(obj, path, **kwargs)
Save a string or bytes object as a hexadecimal string to a file.
- ahvn.utils.basic.serialize_utils.save_hex(obj, path, **kwargs)
Alias for dump_hex. Saves a string or bytes object as a hexadecimal string to a file.
- ahvn.utils.basic.serialize_utils.load_b64(path, strict=False)
Load the binary contents of a file as a Base64-encoded string.
- ahvn.utils.basic.serialize_utils.dump_b64(obj, path)
Save a Base64 string to a file by decoding it into binary content.
- ahvn.utils.basic.serialize_utils.save_b64(obj, path)
Alias for dump_b64. Saves a Base64 string to a file by decoding it into binary content.
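A Base64 round trip shows how load_b64 and dump_b64 pair up: one encodes raw bytes to text, the other decodes that text back to bytes. Both functions below are hypothetical stand-ins built on the standard base64 module, and the strict handling of the real load_b64 is omitted.

```python
import base64
import os
import tempfile


def load_b64_sketch(path: str) -> str:
    """Hypothetical stand-in for load_b64: read raw bytes, return them Base64-encoded."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def dump_b64_sketch(obj: str, path: str) -> None:
    """Hypothetical stand-in for dump_b64: decode the Base64 text, write the raw bytes."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(obj))


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "src.bin")
    dst = os.path.join(d, "dst.bin")
    with open(src, "wb") as f:
        f.write(b"hello")
    encoded = load_b64_sketch(src)
    dump_b64_sketch(encoded, dst)
    with open(dst, "rb") as f:
        restored = f.read()

print(encoded)  # aGVsbG8=
```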
- ahvn.utils.basic.serialize_utils.serialize_path(path)
Serialize the contents of a directory hierarchy into a dictionary mapping relative paths to Base64-encoded file contents.
Directories are recorded with a value of None so the structure can be rehydrated later.
- ahvn.utils.basic.serialize_utils.deserialize_path(serialized, path)
Materialize the files and directories described by serialized under path.
- Parameters:
serialized (Dict[str, Optional[str]]) – Mapping emitted by serialize_path().
path (str) – Destination directory where content should be written.
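The directory-to-dictionary format of serialize_path and deserialize_path can be sketched with os.walk and base64. The helper names are hypothetical, and using platform-specific relative paths as keys is an assumption about the real key format; the point is that None marks a directory, so empty directories survive the round trip.

```python
import base64
import os
import tempfile
from typing import Dict, Optional


def serialize_path_sketch(path: str) -> Dict[str, Optional[str]]:
    """Hypothetical: map relative paths to Base64 file contents; directories map to None."""
    result: Dict[str, Optional[str]] = {}
    for root, dirs, files in os.walk(path):
        for name in dirs:
            rel = os.path.relpath(os.path.join(root, name), path)
            result[rel] = None  # directory entry, kept so empty directories survive
        for name in files:
            full = os.path.join(root, name)
            with open(full, "rb") as f:
                result[os.path.relpath(full, path)] = base64.b64encode(f.read()).decode("ascii")
    return result


def deserialize_path_sketch(serialized: Dict[str, Optional[str]], path: str) -> None:
    """Hypothetical: recreate the directories and files described by serialized under path."""
    for rel, content in serialized.items():
        target = os.path.join(path, rel)
        if content is None:
            os.makedirs(target, exist_ok=True)
        else:
            # Create parent directories in case the file entry precedes its directory entry.
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as f:
                f.write(base64.b64decode(content))


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    os.makedirs(os.path.join(src, "sub", "empty"))
    with open(os.path.join(src, "sub", "a.txt"), "wb") as f:
        f.write(b"hi")
    snapshot = serialize_path_sketch(src)
    deserialize_path_sketch(snapshot, dst)
    roundtrip = serialize_path_sketch(dst)
```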
- ahvn.utils.basic.serialize_utils.serialize_func(func, **kwargs)
Serialize a function to a descriptor dictionary using dill for source code and cloudpickle for binary content.
- Parameters:
func (Callable) – The function to serialize.
**kwargs – Additional keyword arguments to pass to cloudpickle.dumps.
- Returns:
A dictionary representation of the serialized function. It contains the following attributes:
Built-in Attributes:
- name: The function’s name.
- qualname: The qualified name of the function.
- doc: The function’s docstring.
- module: The qualified name of the module where the function is defined.
- defaults: Default values for the function’s positional arguments.
- kwdefaults: Default values for the function’s keyword-only arguments.
- annotations: Type annotations for the function’s arguments and return value.
- code: The source code of the function (as a string, via dill).
- dict: The function’s __dict__ (excluding __source__), with all values stringified.
Extra Attributes:
- stream: Whether the function is a generator function (bool).
- hex_dumps: The function serialized as a hex string using cloudpickle.
- Return type:
Dict
- ahvn.utils.basic.serialize_utils.deserialize_func(func, prefer='hex_dumps')
Deserialize a function from a descriptor dictionary.
- Parameters:
func (Dict) – The function descriptor dictionary.
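prefer (Literal['code','hex_dumps']) – Which stored representation to try first: 'hex_dumps' (the cloudpickle payload) or 'code' (the source code). Defaults to 'hex_dumps'.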
- Returns:
The deserialized function.
- Return type:
Callable
- Raises:
FunctionDeserializationError – If deserialization fails.
- class ahvn.utils.basic.serialize_utils.AhvnJsonEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases: JSONEncoder
- class ahvn.utils.basic.serialize_utils.AhvnJsonDecoder(*args, **kwargs)
Bases: JSONDecoder
- __init__(*args, **kwargs)
object_hook, if specified, will be called with the result of every JSON object decoded and its return value will be used in place of the given dict. This can be used to provide custom deserializations (e.g. to support JSON-RPC class hinting).
object_pairs_hook, if specified, will be called with the result of every JSON object decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders. If object_hook is also defined, the object_pairs_hook takes priority.
parse_float, if specified, will be called with the string of every JSON float to be decoded. By default this is equivalent to float(num_str). This can be used to use another datatype or parser for JSON floats (e.g. decimal.Decimal).
parse_int, if specified, will be called with the string of every JSON int to be decoded. By default this is equivalent to int(num_str). This can be used to use another datatype or parser for JSON integers (e.g. float).
parse_constant, if specified, will be called with one of the following strings: -Infinity, Infinity, NaN. This can be used to raise an exception if invalid JSON numbers are encountered.
If strict is false (true is the default), then control characters will be allowed inside strings. Control characters in this context are those with character codes in the 0-31 range, including '\t' (tab), '\n', '\r' and '\0'.
- Return type:
None
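The keyword arguments described in the inherited __init__ docstring are the standard json.JSONDecoder options. Assuming AhvnJsonDecoder forwards *args and **kwargs to JSONDecoder.__init__ (suggested by its signature, not confirmed here), they can be passed through json.loads(..., cls=...) as in this sketch, which uses a hypothetical ForwardingDecoder in place of AhvnJsonDecoder.

```python
import json
from decimal import Decimal


class ForwardingDecoder(json.JSONDecoder):
    """Hypothetical subclass mirroring AhvnJsonDecoder's (*args, **kwargs) signature."""

    def __init__(self, *args, **kwargs):
        # Forward everything so the standard decoder keywords keep working.
        super().__init__(*args, **kwargs)


# parse_float: decode JSON floats as Decimal instead of float.
prices = json.loads('{"price": 19.99}', cls=ForwardingDecoder, parse_float=Decimal)

# object_pairs_hook takes priority when object_hook is also given.
pairs = json.loads(
    '{"b": 1, "a": 2}',
    cls=ForwardingDecoder,
    object_hook=lambda d: "never used",
    object_pairs_hook=list,
)

# strict=False accepts a raw control character (a literal tab) inside a string.
relaxed = json.loads('{"s": "a\tb"}', cls=ForwardingDecoder, strict=False)

print(prices, pairs, relaxed)
```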
- ahvn.utils.basic.serialize_utils.loads_json(s, **kwargs)
Load a JSON string into a Python object.
- ahvn.utils.basic.serialize_utils.dumps_json(obj, sort_keys=False, indent=4, ensure_ascii=False, **kwargs)
Serialize a Python object to a JSON string.
- Parameters:
obj (Any) – The Python object to serialize.
sort_keys (bool) – Whether to sort the keys in the JSON output. Defaults to False.
indent (int) – The number of spaces to use for indentation. Defaults to 4.
ensure_ascii (bool) – Whether to escape non-ASCII characters. Defaults to False.
**kwargs – Additional keyword arguments to pass to json.dumps.
- Returns:
The JSON string representation of the object.
- Return type:
str
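With its defaults, dumps_json corresponds roughly to json.dumps(obj, sort_keys=False, indent=4, ensure_ascii=False). The hypothetical dumps_json_sketch below shows the effect of those defaults: four-space indentation and non-ASCII characters kept as-is. Whether the real function also plugs in AhvnJsonEncoder is not stated in this reference, so the sketch does not.

```python
import json
from typing import Any


def dumps_json_sketch(
    obj: Any, sort_keys: bool = False, indent: int = 4, ensure_ascii: bool = False, **kwargs: Any
) -> str:
    """Hypothetical stand-in for dumps_json: json.dumps with this module's defaults."""
    return json.dumps(obj, sort_keys=sort_keys, indent=indent, ensure_ascii=ensure_ascii, **kwargs)


text = dumps_json_sketch({"city": "Zürich", "n": 1})
print(text)
# {
#     "city": "Zürich",
#     "n": 1
# }
```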
- ahvn.utils.basic.serialize_utils.load_json(path, encoding=None, strict=False, **kwargs)
Load a JSON file into a Python object.
- Parameters:
path (str) – The path to the JSON file.
encoding (str) – The encoding to use for reading the file. Defaults to None, which will use the encoding in the config file (“core.encoding”).
strict (bool) – If True, raises an error if the file does not exist. Otherwise, returns an empty dictionary.
**kwargs – Additional keyword arguments to pass to json.load.
- Returns:
The Python object represented by the JSON file.
- Return type:
Any
- Raises:
FileNotFoundError – If the file does not exist and strict is True.
- ahvn.utils.basic.serialize_utils.dump_json(obj, path, sort_keys=False, indent=4, encoding=None, ensure_ascii=False, **kwargs)
Save a Python object to a JSON file.
- Parameters:
obj (Any) – The Python object to save.
path (str) – The path to the JSON file.
sort_keys (bool) – Whether to sort the keys in the JSON output. Defaults to False.
indent (int) – The number of spaces to use for indentation. Defaults to 4.
encoding (str) – The encoding to use for writing the file. Defaults to None, which will use the encoding in the config file (“core.encoding”).
ensure_ascii (bool) – Whether to escape non-ASCII characters. Defaults to False.
**kwargs – Additional keyword arguments to pass to json.dump.
- ahvn.utils.basic.serialize_utils.save_json(obj, path, sort_keys=False, indent=4, encoding=None, ensure_ascii=False, **kwargs)
Alias for dump_json. Saves a Python object to a JSON file.
- Parameters:
obj (Any) – The Python object to save.
path (str) – The path to the JSON file.
sort_keys (bool) – Whether to sort the keys in the JSON output. Defaults to False.
indent (int) – The number of spaces to use for indentation. Defaults to 4.
encoding (str) – The encoding to use for writing the file. Defaults to None, which will use the encoding in the config file (“core.encoding”).
ensure_ascii (bool) – Whether to escape non-ASCII characters. Defaults to False.
**kwargs – Additional keyword arguments to pass to json.dump.
- ahvn.utils.basic.serialize_utils.escape_json(s, args, **kwargs)
Fix corrupted JSON by escaping the string values of known keys:
- Only keys listed in args are processed.
- Only string values are escaped; non-string values are left untouched.
- Unescaped quotes and newlines inside strings are handled.
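One way to implement the repair that escape_json describes is a regular expression per key that captures the raw value and re-quotes it with json.dumps. This is a hypothetical sketch, not the module's algorithm: it treats args as a list of key names, and it assumes a value ends at the first quote followed by a comma and another key, or by the closing brace.

```python
import json
import re
from typing import Iterable


def escape_json_sketch(s: str, keys: Iterable[str]) -> str:
    """Hypothetical: re-escape the raw string values of the given keys so s parses as JSON.

    Values that are already correctly escaped would be escaped twice,
    so this only suits text whose listed values are entirely raw.
    """
    for key in keys:
        # Group 1: the key and colon. Group 2: the raw value, which may span lines (DOTALL).
        # The lookahead only accepts a closing quote followed by ', "' or '}'.
        pattern = re.compile(
            r'("' + re.escape(key) + r'"\s*:\s*)"(.*?)"(?=\s*(?:,\s*"|\}))',
            re.DOTALL,
        )
        # json.dumps re-quotes the value, escaping quotes, backslashes and newlines.
        s = pattern.sub(lambda m: m.group(1) + json.dumps(m.group(2), ensure_ascii=False), s)
    return s


broken = '{"code": "print("hi")\nx = 1", "lines": 2}'  # raw quotes and a raw newline
fixed = escape_json_sketch(broken, ["code"])
print(json.loads(fixed))  # {'code': 'print("hi")\nx = 1', 'lines': 2}
```

A non-string value such as "lines": 2 never matches, because the pattern requires an opening quote right after the colon.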
- ahvn.utils.basic.serialize_utils.loads_jsonl(s, **kwargs)
Load a JSON Lines string into a list of Python objects.
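- ahvn.utils.basic.serialize_utils.dumps_jsonl(obj, sort_keys=False, ensure_ascii=False, **kwargs)
Serialize a list of Python objects to a JSON Lines string.
Warning
An extra newline is added at the end of the string for consistency with append_jsonl. indent is NOT a valid argument for this function, because JSON Lines requires each object to occupy a single line; if indent is passed, it is ignored.
- Parameters:
obj (List[Any]) – The list of Python objects to serialize.
sort_keys (bool) – Whether to sort the keys in the JSON output. Defaults to False.
ensure_ascii (bool) – Whether to escape non-ASCII characters. Defaults to False.
**kwargs – Additional keyword arguments for JSON serialization (indent is ignored).
- Returns:
The JSON Lines string representation of the list.
- Return type:
str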
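A minimal sketch of the JSON Lines format produced by dumps_jsonl, assuming each item is serialized with json.dumps: every object is written compactly on its own line, indent is discarded, and a newline follows the last record. dumps_jsonl_sketch is a hypothetical name.

```python
import json
from typing import Any, List


def dumps_jsonl_sketch(obj: List[Any], sort_keys: bool = False, ensure_ascii: bool = False, **kwargs: Any) -> str:
    """Hypothetical stand-in for dumps_jsonl: one compact JSON document per line."""
    kwargs.pop("indent", None)  # indentation would break the one-record-per-line format
    lines = [json.dumps(item, sort_keys=sort_keys, ensure_ascii=ensure_ascii, **kwargs) for item in obj]
    return "".join(line + "\n" for line in lines)  # trailing newline after the last record


text = dumps_jsonl_sketch([{"id": 1}, {"id": 2, "tag": "é"}], indent=4)
print(repr(text))  # '{"id": 1}\n{"id": 2, "tag": "é"}\n'
```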
- ahvn.utils.basic.serialize_utils.load_jsonl(path, encoding=None, strict=False, **kwargs)
Load a JSON Lines file into a list of Python objects.
- Parameters:
path (str) – The path to the JSON Lines file.
encoding (str) – The encoding to use for reading the file. Defaults to None, which will use the encoding in the config file (“core.encoding”).
strict (bool) – If True, raises an error if the file does not exist. Otherwise, returns an empty list.
**kwargs – Additional keyword arguments to pass to json.load.
- Returns:
A list of Python objects represented by the JSON Lines file.
- Return type:
List[Any]
- Raises:
FileNotFoundError – If the file does not exist and strict is True.
- ahvn.utils.basic.serialize_utils.iter_jsonl(path, encoding=None, strict=False, **kwargs)
Iterate over a JSON Lines file, yielding each Python object.
- Parameters:
path (str) – The path to the JSON Lines file.
encoding (str) – The encoding to use for reading the file. Defaults to None, which will use the encoding in the config file (“core.encoding”).
strict (bool) – If True, raises an error if the file does not exist. Otherwise, nothing is yielded.
**kwargs – Additional keyword arguments to pass to json.load.
- Yields:
Any – Each Python object represented by a line in the JSON Lines file.
- Raises:
FileNotFoundError – If the file does not exist and strict is True.
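iter_jsonl is useful when a file is too large to load at once, because only one line is parsed at a time. The hypothetical iter_jsonl_sketch below illustrates the pattern; skipping blank lines and using UTF-8 in place of “core.encoding” are assumptions.

```python
import json
import os
import tempfile
from typing import Any, Iterator, Optional


def iter_jsonl_sketch(path: str, encoding: Optional[str] = None, strict: bool = False) -> Iterator[Any]:
    """Hypothetical stand-in for iter_jsonl: parse and yield one record per line."""
    if not os.path.exists(path):
        if strict:
            raise FileNotFoundError(path)
        return  # non-strict: nothing is yielded
    with open(path, "r", encoding=encoding or "utf-8") as f:
        for line in f:
            if line.strip():  # assumption: blank lines are skipped
                yield json.loads(line)


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "rows.jsonl")
    with open(p, "w", encoding="utf-8") as f:
        f.write('{"id": 1}\n\n{"id": 2}\n')
    # Records are consumed one at a time; the whole file is never held as a list.
    total = sum(row["id"] for row in iter_jsonl_sketch(p))
    missing = list(iter_jsonl_sketch(os.path.join(d, "missing.jsonl")))

print(total)  # 3
```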
- ahvn.utils.basic.serialize_utils.dump_jsonl(obj, path, sort_keys=False, ensure_ascii=False, encoding=None, **kwargs)
Save a list of Python objects to a JSON Lines file.
Warning
An extra newline is added at the end of the file for consistency with append_jsonl. indent is NOT a valid argument for this function, because JSON Lines requires each object to occupy a single line; if indent is passed, it is ignored.
- Parameters:
obj (List[Any]) – The list of Python objects to save.
path (str) – The path to the JSON Lines file.
sort_keys (bool) – Whether to sort the keys in the JSON output. Defaults to False.
ensure_ascii (bool) – Whether to escape non-ASCII characters. Defaults to False.
encoding (str) – The encoding to use for writing the file. Defaults to None, which will use the encoding in the config file (“core.encoding”).
**kwargs – Additional keyword arguments to pass to json.dump.
- ahvn.utils.basic.serialize_utils.save_jsonl(obj, path, sort_keys=False, ensure_ascii=False, encoding=None, **kwargs)
Alias for dump_jsonl. Saves a list of Python objects to a JSON Lines file.
Warning
An extra newline is added at the end of the file for consistency with append_jsonl. indent is NOT a valid argument for this function, because JSON Lines requires each object to occupy a single line; if indent is passed, it is ignored.
- Parameters:
obj (List[Any]) – The list of Python objects to save.
path (str) – The path to the JSON Lines file.
sort_keys (bool) – Whether to sort the keys in the JSON output. Defaults to False.
ensure_ascii (bool) – Whether to escape non-ASCII characters. Defaults to False.
encoding (str) – The encoding to use for writing the file. Defaults to None, which will use the encoding in the config file (“core.encoding”).
**kwargs – Additional keyword arguments to pass to json.dump.
- ahvn.utils.basic.serialize_utils.append_jsonl(obj, path, sort_keys=False, ensure_ascii=False, encoding=None, **kwargs)
Append one or more Python objects to a JSON Lines file. If the file does not exist, it will be created. If the object is a dictionary, it is written as a single line of JSON. If the object is a list, each item in the list is written as a separate line.
- Parameters:
obj (Union[Dict,List[Any]]) – A dictionary, or a list of Python objects, to append.
path (str) – The path to the JSON Lines file.
sort_keys (bool) – Whether to sort the keys in the JSON output. Defaults to False.
ensure_ascii (bool) – Whether to escape non-ASCII characters. Defaults to False.
encoding (str) – The encoding to use for writing the file. Defaults to None, which will use the encoding in the config file (“core.encoding”).
**kwargs – Additional keyword arguments to pass to json.dump.
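The dict-versus-list rule of append_jsonl can be sketched as follows. append_jsonl_sketch is a hypothetical stand-in that omits sort_keys and extra keyword arguments; opening the file in append mode is what creates it when it does not exist.

```python
import json
import os
import tempfile
from typing import Any, Dict, List, Optional, Union


def append_jsonl_sketch(obj: Union[Dict, List[Any]], path: str, encoding: Optional[str] = None) -> None:
    """Hypothetical stand-in for append_jsonl: a dict adds one line, a list adds one line per item."""
    items = [obj] if isinstance(obj, dict) else list(obj)
    # Mode "a" creates the file if it is missing and never truncates existing records.
    with open(path, "a", encoding=encoding or "utf-8") as f:
        for item in items:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "events.jsonl")
    append_jsonl_sketch({"event": "start"}, p)
    append_jsonl_sketch([{"event": "a"}, {"event": "b"}], p)
    with open(p, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(len(lines))  # 3
```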