Cost Estimation

Utilities for estimating API costs before running extractions.

estimate_input_token_cost()

Estimate cost based on input tokens only (free, no API calls).

from delm.utils.cost_estimation import estimate_input_token_cost

cost_report = estimate_input_token_cost(
    config: DELM | DELMConfig | str | Path,
    data_source: str | Path | pd.DataFrame,
    save_file_log: bool = False,
    log_dir: str | Path | None = ".delm/logs/cost_estimation",
    console_log_level: str = "INFO",
    file_log_level: str = "DEBUG"
) -> dict

Parameters:
- config: DELM instance, DELMConfig, or path to config YAML
- data_source: Input data (file path, directory, or DataFrame)
- save_file_log: Save log file
- log_dir: Log directory
- console_log_level: Console verbosity
- file_log_level: File verbosity

Returns: Dictionary with:
- estimated_input_tokens (int)
- estimated_input_cost (float)
- num_records (int)
- num_chunks (int)

Note: Cached requests are still counted toward the token cost, since on a first run they would be sent to the API before being cached.
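The relationship between a token count and a dollar figure can be sketched as follows. This is a minimal illustration, assuming the cost is simply the token count scaled by a per-1M-token price; `input_cost_usd` is a hypothetical helper and not part of the DELM API, and the library's internal calculation may differ.

```python
def input_cost_usd(input_tokens: int, cost_per_1m_tokens: float) -> float:
    """Convert a token count to dollars using a per-1M-token price.

    Hypothetical helper for illustration only; not part of DELM.
    """
    return input_tokens / 1_000_000 * cost_per_1m_tokens


# 2,000,000 input tokens at $0.15 per 1M tokens
print(f"${input_cost_usd(2_000_000, 0.15):.4f}")
```

Because no model output is involved, an estimate of this kind needs only tokenized input and a price, which is why it can be computed without API calls.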

estimate_total_cost()

Estimate total cost (input + output tokens) using sample API calls.

from delm.utils.cost_estimation import estimate_total_cost

cost_report = estimate_total_cost(
    config: DELM | DELMConfig | str | Path,
    data_source: str | Path | pd.DataFrame,
    sample_size: int = 10,
    save_file_log: bool = False,
    log_dir: str | Path | None = ".delm/logs/cost_estimation",
    console_log_level: str = "INFO",
    file_log_level: str = "DEBUG"
) -> dict

Parameters:
- config: DELM instance, DELMConfig, or path to config YAML
- data_source: Input data
- sample_size: Number of chunks to sample for estimation
- save_file_log, log_dir, console_log_level, file_log_level: Logging settings

Returns: Dictionary with:
- estimated_total_cost (float)
- estimated_input_tokens (int)
- estimated_output_tokens (int)
- estimated_input_cost (float)
- estimated_output_cost (float)
- sample_size (int)
- total_chunks (int)

Warning: This function makes real API calls for the sampled chunks, so provider charges apply.
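To make the idea of a sample-based estimate concrete, here is a rough sketch of how token usage measured on a sample could be extrapolated to the full dataset. It assumes simple linear scaling by total_chunks / sample_size, which is an assumption and not confirmed as DELM's method. `extrapolate_cost` and its parameters are hypothetical; only the returned key names mirror the report fields listed above.

```python
def extrapolate_cost(
    sample_input_tokens: int,
    sample_output_tokens: int,
    sample_size: int,
    total_chunks: int,
    input_cost_per_1m: float,
    output_cost_per_1m: float,
) -> dict:
    """Scale token usage from a sample up to all chunks (assumed linear)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")

    scale = total_chunks / sample_size
    est_in = round(sample_input_tokens * scale)
    est_out = round(sample_output_tokens * scale)

    in_cost = est_in / 1_000_000 * input_cost_per_1m
    out_cost = est_out / 1_000_000 * output_cost_per_1m

    return {
        "estimated_total_cost": in_cost + out_cost,
        "estimated_input_tokens": est_in,
        "estimated_output_tokens": est_out,
        "estimated_input_cost": in_cost,
        "estimated_output_cost": out_cost,
        "sample_size": sample_size,
        "total_chunks": total_chunks,
    }


# Made-up numbers: 10 sampled chunks used 5,000 input and 1,000 output tokens,
# and the dataset has 1,000 chunks in total.
report = extrapolate_cost(5_000, 1_000, 10, 1_000, 0.15, 0.60)
print(f"${report['estimated_total_cost']:.3f}")
```

Under this assumption, a larger sample_size gives a more representative estimate at the price of more paid calls.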

Example

from delm import DELM, Schema, ExtractionVariable
from delm.utils.cost_estimation import estimate_input_token_cost, estimate_total_cost

schema = Schema.simple(
    ExtractionVariable("price", "Price value", "number")
)

delm = DELM(
    schema=schema,
    provider="openai",
    model="gpt-4o-mini",
    model_input_cost_per_1M_tokens=0.15,  # Custom pricing
    model_output_cost_per_1M_tokens=0.60
)

# Free estimate (input tokens only)
input_cost = estimate_input_token_cost(delm, "data.csv")
print(f"Input cost: ${input_cost['estimated_input_cost']:.4f}")

# Sample-based estimate (costs ~$0.01)
total_cost = estimate_total_cost(delm, "data.csv", sample_size=10)
print(f"Total estimated cost: ${total_cost['estimated_total_cost']:.2f}")
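One possible use of the returned report is as a budget gate before launching a full extraction. The sketch below works on a plain dictionary with the documented estimated_total_cost key, so it runs without DELM installed. `within_budget` and the sample report values are hypothetical, not part of the library.

```python
def within_budget(report: dict, budget_usd: float) -> bool:
    """Return True if the estimated total cost does not exceed the budget."""
    return report["estimated_total_cost"] <= budget_usd


# Stand-in for the dict returned by estimate_total_cost(); values are made up.
example_report = {"estimated_total_cost": 0.135, "sample_size": 10, "total_chunks": 1000}

if within_budget(example_report, budget_usd=1.00):
    print("Estimate is within budget; proceed with extraction.")
else:
    print("Estimate exceeds budget; reduce the data or switch to a cheaper model.")
```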