System Constants

DELM system constants for column names and defaults.

System Columns

Column names automatically added by DELM.

from delm import (
    SYSTEM_FILE_NAME_COLUMN,
    SYSTEM_RAW_DATA_COLUMN,
    SYSTEM_RECORD_ID_COLUMN,
    SYSTEM_CHUNK_COLUMN,
    SYSTEM_CHUNK_ID_COLUMN,
    SYSTEM_SCORE_COLUMN,
    SYSTEM_BATCH_ID_COLUMN,
    SYSTEM_ERRORS_COLUMN,
    SYSTEM_EXTRACTED_DATA_JSON_COLUMN
)
| Constant | Value | Description |
| --- | --- | --- |
| SYSTEM_FILE_NAME_COLUMN | "delm_file_name" | Source filename (directories only) |
| SYSTEM_RAW_DATA_COLUMN | "delm_raw_data" | Original raw text (before splitting) |
| SYSTEM_RECORD_ID_COLUMN | "delm_record_id" | Unique record identifier |
| SYSTEM_CHUNK_COLUMN | "delm_text_chunk" | Text chunk (after splitting) |
| SYSTEM_CHUNK_ID_COLUMN | "delm_chunk_id" | Unique chunk identifier |
| SYSTEM_SCORE_COLUMN | "delm_score" | Relevance score (if scorer used) |
| SYSTEM_BATCH_ID_COLUMN | "delm_batch_id" | Batch number |
| SYSTEM_ERRORS_COLUMN | "delm_errors" | Extraction errors (if any) |
| SYSTEM_EXTRACTED_DATA_JSON_COLUMN | "delm_extracted_data_json" | Extracted JSON data |
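One practical use of these names is telling DELM-added columns apart from your own. The sketch below is a minimal illustration, not part of DELM: the values are copied from the table above so the snippet runs without the package installed (in real code you would import the constants from `delm`), and `split_columns` and the sample column list are hypothetical.

```python
# Local copies of the documented values; in real code, import the
# corresponding constants from `delm` instead of redefining them.
SYSTEM_COLUMNS = {
    "delm_file_name",
    "delm_raw_data",
    "delm_record_id",
    "delm_text_chunk",
    "delm_chunk_id",
    "delm_score",
    "delm_batch_id",
    "delm_errors",
    "delm_extracted_data_json",
}


def split_columns(columns):
    """Hypothetical helper: separate system columns from user columns."""
    system = [c for c in columns if c in SYSTEM_COLUMNS]
    user = [c for c in columns if c not in SYSTEM_COLUMNS]
    return system, user


# Made-up column list standing in for the columns of a results table.
columns = ["title", "delm_record_id", "delm_text_chunk", "year", "delm_score"]
system_cols, user_cols = split_columns(columns)
print("system:", system_cols)
print("user:", user_cols)
```

Matching against the full set of names, rather than the `delm_` prefix, avoids misclassifying a user column that happens to start with the same prefix.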

Experiment Directory

Constants for disk storage structure.

from delm import (
    DATA_DIR_NAME,
    PROCESSING_CACHE_DIR_NAME,
    BATCH_FILE_PREFIX,
    STATE_FILE_NAME,
    CONSOLIDATED_RESULT_FILE_NAME,
    PREPROCESSED_DATA_FILE_NAME
)
| Constant | Value | Description |
| --- | --- | --- |
| DATA_DIR_NAME | "delm_data" | Preprocessed data directory |
| PROCESSING_CACHE_DIR_NAME | "delm_llm_processing" | LLM processing cache directory |
| BATCH_FILE_PREFIX | "batch_" | Batch file prefix |
| STATE_FILE_NAME | "state.json" | Checkpoint state file |
| CONSOLIDATED_RESULT_FILE_NAME | "extraction_result.feather" | Final results file |
| PREPROCESSED_DATA_FILE_NAME | "preprocessed.feather" | Preprocessed data file |
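These names can be used to inspect an experiment directory without hard-coding strings. The following sketch searches recursively for the documented names, because this page does not say how the directories nest. The values are local copies of the table entries, while `summarize_experiment`, the throwaway layout, and the `batch_0.feather` style file names are assumptions made purely for illustration.

```python
import tempfile
from pathlib import Path

# Local copies of the documented values; import them from `delm` in real code.
DATA_DIR_NAME = "delm_data"
PROCESSING_CACHE_DIR_NAME = "delm_llm_processing"
BATCH_FILE_PREFIX = "batch_"
STATE_FILE_NAME = "state.json"


def summarize_experiment(root: Path) -> dict:
    """Hypothetical helper: report which documented artifacts exist under root."""
    batch_files = sorted(
        p.name for p in root.rglob(f"{BATCH_FILE_PREFIX}*") if p.is_file()
    )
    return {
        "has_data_dir": any(p.is_dir() for p in root.rglob(DATA_DIR_NAME)),
        "has_state": any(p.is_file() for p in root.rglob(STATE_FILE_NAME)),
        "batch_files": batch_files,
    }


# Build a throwaway layout to exercise the helper. The nesting and the
# batch file names below are invented, not DELM's actual on-disk format.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cache = root / PROCESSING_CACHE_DIR_NAME
    cache.mkdir()
    (root / DATA_DIR_NAME).mkdir()
    (cache / f"{BATCH_FILE_PREFIX}0.feather").touch()
    (cache / f"{BATCH_FILE_PREFIX}1.feather").touch()
    (cache / STATE_FILE_NAME).touch()
    summary = summarize_experiment(root)

print(summary)
```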

Other Constants

from delm import SYSTEM_RANDOM_SEED, IGNORE_FILES
| Constant | Value | Description |
| --- | --- | --- |
| SYSTEM_RANDOM_SEED | 42 | Random seed for sampling |
| IGNORE_FILES | [".DS_Store", ...] | Files to ignore when loading directories |
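To show how an ignore list and a fixed seed typically work together, here is a small standalone sketch. It is not DELM's loading code: which random number generator DELM seeds is not stated on this page, so the standard library `random.Random` is an assumption, `sample_files` is a hypothetical helper, and the local `IGNORE_FILES` holds only the one entry shown in the table because the rest of the list is elided.

```python
import random

# Local copies of the documented values; import them from `delm` in real code.
SYSTEM_RANDOM_SEED = 42
IGNORE_FILES = [".DS_Store"]  # the documented list is elided; only this entry is known


def sample_files(file_names, k):
    """Hypothetical helper: drop ignored files, then draw a reproducible sample."""
    # Sort first so the result does not depend on directory listing order.
    kept = sorted(name for name in file_names if name not in IGNORE_FILES)
    rng = random.Random(SYSTEM_RANDOM_SEED)  # fresh generator, fixed seed
    return rng.sample(kept, min(k, len(kept)))


names = ["b.txt", ".DS_Store", "a.txt", "c.txt"]
first = sample_files(names, 2)
second = sample_files(list(reversed(names)), 2)
print(first == second)
```

Creating a new seeded generator on each call, instead of reseeding the global one, keeps the sample reproducible without affecting other code that uses `random`.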

Usage Example

from delm import DELM, SYSTEM_EXTRACTED_DATA_JSON_COLUMN, SYSTEM_SCORE_COLUMN

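# `schema` and `scorer` are assumed to be defined earlier in your code.
delm = DELM(schema=schema, relevance_scorer=scorer)
results = delm.extract("data.csv")

# Access system columns through the constants instead of hard-coded strings
extracted_json = results[SYSTEM_EXTRACTED_DATA_JSON_COLUMN]  # Extracted JSON
scores = results[SYSTEM_SCORE_COLUMN]  # Relevance scores (present because a scorer was passed)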