Pipeline API
The DELM class coordinates configuration validation, experiment setup, preprocessing, and batched extraction. Use this page to review the constructor arguments and helper methods.
delm.delm.DELM
Extraction pipeline with pluggable strategies.
__init__
__init__(
*,
config: DELMConfig,
experiment_name: str,
experiment_directory: Path,
overwrite_experiment: bool = False,
auto_checkpoint_and_resume_experiment: bool = True,
use_disk_storage: bool = True,
save_file_log: bool = True,
log_dir: Optional[Union[str, Path]] = None,
console_log_level: str = DEFAULT_CONSOLE_LOG_LEVEL,
file_log_level: str = DEFAULT_FILE_LOG_LEVEL,
override_logging: bool = True
) -> None
Initialize the DELM extraction pipeline.
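Because of the bare `*` in the signature, every constructor argument is keyword-only. The sketch below mirrors that calling convention with a hypothetical stand-in function, `describe_pipeline_args`, which is not part of DELM. It reuses the documented defaults but omits the two log-level parameters, whose default constants (`DEFAULT_CONSOLE_LOG_LEVEL`, `DEFAULT_FILE_LOG_LEVEL`) are not spelled out on this page. Normalizing `log_dir` to a `Path` is an illustrative assumption, not documented behavior.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Union


def describe_pipeline_args(
    *,  # everything after this marker must be passed by keyword
    config: object,
    experiment_name: str,
    experiment_directory: Path,
    overwrite_experiment: bool = False,
    auto_checkpoint_and_resume_experiment: bool = True,
    use_disk_storage: bool = True,
    save_file_log: bool = True,
    log_dir: Optional[Union[str, Path]] = None,
    override_logging: bool = True,
) -> Dict[str, Any]:
    """Hypothetical stand-in mirroring the keyword-only shape of DELM.__init__."""
    return {
        "config": config,
        "experiment_name": experiment_name,
        "experiment_directory": Path(experiment_directory),
        "overwrite_experiment": overwrite_experiment,
        "auto_checkpoint_and_resume_experiment": auto_checkpoint_and_resume_experiment,
        "use_disk_storage": use_disk_storage,
        "save_file_log": save_file_log,
        # Assumption for illustration: accept str or Path and normalize to Path.
        "log_dir": Path(log_dir) if log_dir is not None else None,
        "override_logging": override_logging,
    }


args = describe_pipeline_args(
    config={"placeholder": True},
    experiment_name="demo",
    experiment_directory=Path("experiments"),
    log_dir="logs",
)
print(args["log_dir"])  # logs

try:
    # Positional arguments are rejected because of the bare `*`.
    describe_pipeline_args({"placeholder": True}, "demo", Path("experiments"))
except TypeError as exc:
    print("TypeError:", exc)
```

Making the arguments keyword-only keeps call sites readable when a constructor has many boolean flags, since `overwrite_experiment=True` cannot be confused with a neighboring positional `True`.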
from_yaml (classmethod)
from_yaml(
config_path: Union[str, Path],
experiment_name: str,
experiment_directory: Path,
**kwargs: Any
) -> "DELM"
Create a DELM instance from a YAML configuration file.
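A file-based alternate constructor such as `from_yaml` is commonly written as a classmethod that reads the file into a dictionary and forwards any remaining keyword arguments to the keyword-only constructor. The sketch below assumes that flow. The class `FileConfiguredPipeline`, its `from_config_file` method, and the `batch_size` key are hypothetical. To stay within the standard library it parses JSON with `json.load`; a real YAML loader (for example PyYAML's `yaml.safe_load`) would take its place.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


class FileConfiguredPipeline:
    """Hypothetical stand-in; not the DELM class."""

    def __init__(
        self,
        *,
        config: Dict[str, Any],
        experiment_name: str,
        experiment_directory: Path,
        **options: Any,
    ) -> None:
        self.config = config
        self.experiment_name = experiment_name
        self.experiment_directory = Path(experiment_directory)
        self.options = options  # e.g. overwrite_experiment, use_disk_storage

    @classmethod
    def from_config_file(
        cls,
        config_path: Union[str, Path],
        experiment_name: str,
        experiment_directory: Path,
        **kwargs: Any,
    ) -> "FileConfiguredPipeline":
        # Accept either str or Path, as the documented from_yaml signature does.
        with Path(config_path).open(encoding="utf-8") as fh:
            config = json.load(fh)  # stand-in for a YAML parser
        # Extra keyword arguments pass straight through to __init__.
        return cls(
            config=config,
            experiment_name=experiment_name,
            experiment_directory=experiment_directory,
            **kwargs,
        )


with tempfile.TemporaryDirectory() as tmp:
    cfg_file = Path(tmp) / "config.json"
    cfg_file.write_text(json.dumps({"batch_size": 8}), encoding="utf-8")
    pipeline = FileConfiguredPipeline.from_config_file(
        str(cfg_file), "demo", Path(tmp), overwrite_experiment=True
    )

print(pipeline.config, pipeline.options)
```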
from_dict (classmethod)
from_dict(
config_dict: Dict[str, Any],
experiment_name: str,
experiment_directory: Path,
**kwargs: Any
) -> "DELM"
Create a DELM instance from a configuration dictionary.
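Since the class coordinates configuration validation, a dictionary-based constructor plausibly checks the mapping before building anything. The sketch below assumes that design: it converts a plain dict into a frozen dataclass, rejects unknown keys and invalid values, and forwards extra keyword arguments. `ExampleConfig`, its fields (`model_name`, `batch_size`), and `build_from_dict` are hypothetical and do not describe DELM's actual configuration schema.

```python
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Any, Dict


@dataclass(frozen=True)
class ExampleConfig:
    """Hypothetical configuration schema used only for this sketch."""

    model_name: str
    batch_size: int = 10

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ExampleConfig":
        allowed = {f.name for f in fields(cls)}
        unknown = set(data) - allowed
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        config = cls(**data)  # a missing required key raises TypeError here
        if config.batch_size <= 0:
            raise ValueError("batch_size must be positive")
        return config


def build_from_dict(
    config_dict: Dict[str, Any],
    experiment_name: str,
    experiment_directory: Path,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Validate the dict, then gather constructor arguments (hypothetical)."""
    return {
        "config": ExampleConfig.from_dict(config_dict),
        "experiment_name": experiment_name,
        "experiment_directory": experiment_directory,
        **kwargs,
    }


built = build_from_dict({"model_name": "example-model"}, "demo", Path("experiments"))
print(built["config"])  # ExampleConfig(model_name='example-model', batch_size=10)
```

Rejecting unknown keys catches typos such as `batchsize` early, before any data is processed.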
prep_data
prep_data(
data: Union[str, Path, pd.DataFrame],
sample_size: int = -1,
) -> pd.DataFrame
Preprocess the data using the instance configuration and always save the result to the experiment manager.
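`prep_data` accepts either a file path or an in-memory DataFrame, together with a `sample_size`. The stdlib-only sketch below illustrates one way to handle that union type; it uses lists of dict records in place of a DataFrame and CSV in place of whatever input formats DELM supports. Treating `sample_size=-1` as "keep every row" and drawing a seeded random sample otherwise are assumptions for illustration, not documented behavior. `load_records` and `sample_records` are hypothetical helpers.

```python
import csv
import random
import tempfile
from pathlib import Path
from typing import Dict, List, Union

Records = List[Dict[str, str]]


def load_records(data: Union[str, Path, Records]) -> Records:
    """Read a CSV path into records, or pass in-memory records through."""
    if isinstance(data, (str, Path)):
        with Path(data).open(newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))
    return list(data)


def sample_records(records: Records, sample_size: int = -1, seed: int = 0) -> Records:
    """Assumed semantics: -1 keeps all rows; otherwise sample without replacement."""
    if sample_size == -1 or sample_size >= len(records):
        return records
    if sample_size < 0:
        raise ValueError("sample_size must be -1 or non-negative")
    return random.Random(seed).sample(records, sample_size)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "input.csv"
    path.write_text("id,text\n1,alpha\n2,beta\n3,gamma\n", encoding="utf-8")
    from_file = load_records(path)

subset = sample_records(from_file, sample_size=2)
print(len(from_file), len(subset))  # 3 2
```

Seeding the sampler makes a sampled run reproducible, which matters when a later step compares results across experiments.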
process_via_llm
process_via_llm(
preprocessed_file_path: Optional[Path] = None,
) -> pd.DataFrame
Run LLM extraction over the preprocessed data using the configuration supplied to the constructor, checkpointing each batch so that an interrupted run can resume.
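Batch checkpointing means each completed batch is persisted before the next one starts, so a resumed run can skip batches that already have a checkpoint. The sketch below is a generic, stdlib-only illustration of that idea, not DELM's implementation: `run_with_checkpoints`, the JSON file naming scheme, and the `extract` callback (standing in for the LLM call) are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def run_with_checkpoints(
    items: List[str],
    batch_size: int,
    checkpoint_dir: Path,
    extract: Callable[[List[str]], List[str]],
) -> List[str]:
    """Process items batch by batch, reusing any batch already on disk."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    results: List[str] = []
    for start in range(0, len(items), batch_size):
        batch_id = start // batch_size
        ckpt = checkpoint_dir / f"batch_{batch_id:05d}.json"
        if ckpt.exists():
            # Resume: this batch finished in an earlier run.
            batch_out = json.loads(ckpt.read_text(encoding="utf-8"))
        else:
            batch_out = extract(items[start:start + batch_size])
            # Write to a temporary file, then rename it over the checkpoint,
            # so an interrupted write does not leave a truncated checkpoint.
            tmp = ckpt.with_suffix(".tmp")
            tmp.write_text(json.dumps(batch_out), encoding="utf-8")
            tmp.replace(ckpt)
        results.extend(batch_out)
    return results


calls: List[List[str]] = []


def fake_extract(batch: List[str]) -> List[str]:
    calls.append(batch)  # record every simulated LLM call
    return [text.upper() for text in batch]


with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp) / "checkpoints"
    first = run_with_checkpoints(["a", "b", "c"], 2, ckpt_dir, fake_extract)
    second = run_with_checkpoints(["a", "b", "c"], 2, ckpt_dir, fake_extract)

print(first, len(calls))  # ['A', 'B', 'C'] 2
```

The second run makes no extraction calls because both checkpoints already exist. This scheme assumes the input and batch size are unchanged between runs; if either changes, batch indices no longer line up with the stored checkpoints.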
get_extraction_results
get_extraction_results() -> pd.DataFrame
Return the extraction results held by the experiment manager.
get_cost_summary
get_cost_summary() -> dict[str, Any]
Return the cost summary recorded by the cost tracker.
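`get_cost_summary` returns a `dict[str, Any]`, but its keys are not listed on this page. The sketch below shows one plausible way a cost tracker could accumulate token usage across calls and report it as a dictionary. `SimpleCostTracker`, its per-1,000-token prices, and the summary keys are all hypothetical placeholders, not DELM's cost model.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SimpleCostTracker:
    """Hypothetical tracker; prices are placeholders in USD per 1,000 tokens."""

    input_price_per_1k: float
    output_price_per_1k: float
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Accumulate usage from one model call.
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.calls += 1

    def summary(self) -> Dict[str, Any]:
        cost = (
            self.input_tokens / 1000 * self.input_price_per_1k
            + self.output_tokens / 1000 * self.output_price_per_1k
        )
        return {
            "calls": self.calls,
            "input_tokens": self.input_tokens,
            "output_tokens": self.output_tokens,
            "total_cost_usd": round(cost, 6),
        }


tracker = SimpleCostTracker(input_price_per_1k=0.5, output_price_per_1k=1.5)
tracker.record(input_tokens=1200, output_tokens=300)
tracker.record(input_tokens=800, output_tokens=200)
print(tracker.summary())
# {'calls': 2, 'input_tokens': 2000, 'output_tokens': 500, 'total_cost_usd': 1.75}
```

Pricing input and output tokens separately reflects how many hosted LLM APIs bill, since output tokens are often priced higher than input tokens.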