DELM

Data Extraction with Language Models – A Python toolkit for extracting structured data from unstructured text using LLMs.

Why DELM?

Extracting structured data from documents at scale is harder than it should be. You need consistent prompts, validation logic, retry handling, cost tracking, and robust file processing—before you even get to your actual research questions.

DELM provides the infrastructure layer so you can focus on defining what to extract, not how to extract it:

  • Declare your schema, not your prompts – Specify fields with types, validation rules, and descriptions. DELM generates prompts, validates outputs, and handles malformed responses.
  • Test before you spend – Estimate costs on sample data, set hard budget limits, and automatically cache results to avoid paying for the same extraction twice (see the sketch after this list).
  • Scale without breaking – Process 100K+ documents with automatic checkpointing, concurrent batching, and text preprocessing (splitting, relevance filtering) built in.
  • Model independence – Switch between OpenAI, Anthropic, Google, or any provider Instructor supports without rewriting code.
  • Measure quality – Built-in precision/recall evaluation against ground truth, with field-level metrics for debugging.
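
A minimal sketch of the test-before-you-spend workflow, using the same constructor shown in the quick example below. The names estimate_cost, sample_size, and max_budget are assumptions for illustration, not confirmed API; see the API Reference for the actual interface.

from delm import DELM, Schema, ExtractionVariable

schema = Schema.simple(
    ExtractionVariable("company", "Company name", "string")
)

delm = DELM(schema=schema, provider="openai", model="gpt-4o-mini")

# Hypothetical method and argument names, shown for illustration only:
# estimate expected spend on a small sample, then cap the full run.
print(delm.estimate_cost("financial_reports.csv", sample_size=50))
results = delm.extract("financial_reports.csv", max_budget=10.0)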

Quick Example

from delm import DELM, Schema, ExtractionVariable

# Define what to extract
schema = Schema.simple(
    ExtractionVariable("company", "Company name", "string"),
    ExtractionVariable("price", "Stock price", "number")
)

# Configure extraction
delm = DELM(
    schema=schema,
    provider="openai",
    model="gpt-4o-mini"
)

# Extract from data
results = delm.extract("financial_reports.csv")
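
Switching providers, as noted above, is only a configuration change. This snippet continues the quick example and reuses its schema; the provider string and model name below are illustrative and may differ from what your installed version expects.

# Same schema, different provider: only the configuration changes.
delm = DELM(
    schema=schema,
    provider="anthropic",
    model="claude-3-5-haiku-latest"  # illustrative model name
)

results = delm.extract("financial_reports.csv")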

Getting Started

→ Installation & First Extraction

Install DELM, set up API keys, and run your first extraction in under 5 minutes.
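
Provider credentials are typically supplied through each provider SDK's standard environment variable (an assumed convention here; the installation guide covers the exact setup). For example:

import os

# Assumed convention: the OpenAI client used under the hood reads this variable.
os.environ["OPENAI_API_KEY"] = "your-api-key-here"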

Documentation

User Guide

Core concepts and common workflows:

Advanced Topics

Power user features for large-scale deployments:

API Reference

Complete technical documentation:

Support