Schema Reference

Schemas define the structured outputs that DELM extracts from your documents. The schema system supports progressive complexity levels, from simple key‑value extraction to complex nested structures.

Imports

All schema classes are available directly from the main package:

from delm import Schema, ExtractionVariable

Schema Types

DELM supports three levels of schema complexity, each building on the previous level.

Simple Schema (Level 1)

The simplest form of extraction: each variable is a key-value pair extracted once per chunk. A value can be a scalar or a list, as the example shows.

schema = Schema.simple(
    ExtractionVariable(
        name="company_names",
        description="Company names mentioned in the text",
        data_type="[string]",
        required=False
    ),
    ExtractionVariable(
        name="revenue_numbers",
        description="Revenue figures mentioned",
        data_type="[number]",
        required=False
    ),
    ExtractionVariable(
        name="forecast_year",
        description="Year for which forecast is made",
        data_type="integer",
        required=True,
        validate_in_text=True
    )
)

Output Format:

{
  "company_names": ["Apple", "Microsoft"],
  "revenue_numbers": [1500000000, 2000000000],
  "forecast_year": 2024
}
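To make the shape of this output concrete, the sketch below loads the JSON above with the standard library and checks that the one variable declared with required=True is present. The helper name missing_required is hypothetical and not part of DELM; it only illustrates how downstream code might consume a Level 1 result.

```python
import json

# Level 1 output copied from the example above.
raw = """
{
  "company_names": ["Apple", "Microsoft"],
  "revenue_numbers": [1500000000, 2000000000],
  "forecast_year": 2024
}
"""

# Names of the variables declared with required=True in the schema above.
REQUIRED = ["forecast_year"]


def missing_required(result, required):
    """Return the required keys that are absent or null in a result dict.

    Hypothetical helper for illustration; not a DELM function.
    """
    return [key for key in required if result.get(key) is None]


result = json.loads(raw)
missing = missing_required(result, REQUIRED)
print(missing)
```

An empty list means every required variable was extracted; optional variables such as company_names may be absent without being reported.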

Nested Schema (Level 2)

Extract structured objects with multiple related fields (a list of dictionaries).

schema = Schema.nested(
    container_name="companies",
    variables_list=[
        ExtractionVariable(
            name="name",
            description="Company name",
            data_type="string",
            required=True
        ),
        ExtractionVariable(
            name="revenue",
            description="Revenue figure in USD",
            data_type="number",
            required=False
        ),
        ExtractionVariable(
            name="sector",
            description="Business sector",
            data_type="string",
            required=False,
            allowed_values=["technology", "finance", "healthcare", "energy", "retail"]
        ),
        ExtractionVariable(
            name="growth_rate",
            description="Annual growth rate percentage",
            data_type="number",
            required=False,
            validate_in_text=True  # Only extract if explicitly mentioned
        ),
        ExtractionVariable(
            name="products",
            description="List of products offered by the company",
            data_type="[string]",
            required=False
        )
    ]
)

Output Format:

{
  "companies": [
    {
      "name": "Apple",
      "revenue": 1500000000,
      "sector": "technology",
      "growth_rate": 12.5,
      "products": ["iPhone", "MacBook", "iPad"]
    },
    {
      "name": "Microsoft",
      "revenue": 2000000000,
      "sector": "technology",
      "growth_rate": null,
      "products": ["Windows", "Office", "Azure"]
    }
  ]
}
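Because a nested result is a list of dictionaries under the container name, it converts naturally into table rows. The following is a minimal sketch using only the standard library; to_rows and the item_index column are hypothetical names chosen for illustration, not part of DELM.

```python
import json

# Level 2 output copied from the example above.
raw = """
{
  "companies": [
    {"name": "Apple", "revenue": 1500000000, "sector": "technology",
     "growth_rate": 12.5, "products": ["iPhone", "MacBook", "iPad"]},
    {"name": "Microsoft", "revenue": 2000000000, "sector": "technology",
     "growth_rate": null, "products": ["Windows", "Office", "Azure"]}
  ]
}
"""


def to_rows(result, container_name):
    """Return one row per extracted object, tagged with its list position.

    Hypothetical helper for illustration; not a DELM function.
    """
    rows = []
    for index, item in enumerate(result.get(container_name, [])):
        row = {"item_index": index}
        row.update(item)
        rows.append(row)
    return rows


rows = to_rows(json.loads(raw), "companies")
for row in rows:
    # A JSON null (Microsoft's growth_rate) becomes None in Python.
    print(row["name"], row["growth_rate"])
```

Optional fields that were not found arrive as None, so downstream code should handle missing values explicitly rather than assume every field is populated.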

Multiple Schemas (Level 3)

Extract multiple independent structured objects simultaneously. Each sub-schema can be simple, nested, or itself a multiple schema.

# Define sub-schemas first
companies_schema = Schema.nested(
    container_name="companies",
    variables_list=[
        ExtractionVariable(name="name", description="Company name", data_type="string", required=True),
        ExtractionVariable(name="revenue", description="Revenue figure", data_type="number", required=False)
    ]
)

products_schema = Schema.nested(
    container_name="products",
    variables_list=[
        ExtractionVariable(name="name", description="Product name", data_type="string", required=True),
        ExtractionVariable(name="price", description="Product price in USD", data_type="number", required=False),
        ExtractionVariable(
            name="category",
            description="Product category",
            data_type="string",
            allowed_values=["software", "hardware", "service", "consulting"]
        )
    ]
)

trends_schema = Schema.nested(
    container_name="trends",
    variables_list=[
        ExtractionVariable(name="trend_name", description="Market trend description", data_type="string", required=True),
        ExtractionVariable(
            name="impact",
            description="Expected impact",
            data_type="string",
            allowed_values=["positive", "negative", "neutral"]
        )
    ]
)

# Combine the sub-schemas into one multiple schema.
# Each keyword matches its sub-schema's container_name.
schema = Schema.multiple(
    companies=companies_schema,
    products=products_schema,
    trends=trends_schema
)

Output Format:

{
  "companies": [
    { "name": "Apple", "revenue": 1500000000 }
  ],
  "products": [
    { "name": "iPhone 15", "price": 999, "category": "hardware" }
  ],
  "trends": [
    { "trend_name": "AI adoption acceleration", "impact": "positive" }
  ]
}
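A multiple-schema result holds one top-level key per sub-schema. As a rough sketch of how such a result might be consumed, the code below flattens it into a single list of records tagged with their container. It assumes every sub-schema is nested (a list of dictionaries), as in the example above; to_long_format is a hypothetical helper, not part of DELM.

```python
# Level 3 output copied from the example above.
result = {
    "companies": [{"name": "Apple", "revenue": 1500000000}],
    "products": [{"name": "iPhone 15", "price": 999, "category": "hardware"}],
    "trends": [{"trend_name": "AI adoption acceleration", "impact": "positive"}],
}


def to_long_format(multi_result):
    """Flatten a multi-schema result into one list of tagged records.

    Assumes each top-level value is a list of dictionaries.
    Hypothetical helper for illustration; not a DELM function.
    """
    records = []
    for container, items in multi_result.items():
        for item in items:
            records.append({"container": container, **item})
    return records


records = to_long_format(result)
# Number of extracted objects per container.
counts = {name: len(items) for name, items in result.items()}
print(counts)
```

Keeping the container name on each record preserves which sub-schema produced it after the records are merged.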

Variable Configuration

Each ExtractionVariable can be configured with these arguments.

Required Arguments

| Argument | Type | Description |
|----------|------|-------------|
| name | string | Variable name (used as JSON key) |
| description | string | Human-readable description for LLM |
| data_type | string | Data type (see supported types below) |
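The data_type argument takes one of the type strings listed under Supported Data Types. The sketch below shows one way to read those strings as checks on extracted Python values. It is not DELM's implementation: matches_type is a hypothetical helper, and treating "date" as an ISO 8601 calendar date (YYYY-MM-DD) is an assumption based on the single example value in the table.

```python
from datetime import date


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_integer(value):
    return isinstance(value, int) and not isinstance(value, bool)


def _is_date(value):
    # Assumption: dates are ISO 8601 strings such as "2025-09-15".
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


SCALAR_CHECKS = {
    "string": lambda value: isinstance(value, str),
    "number": _is_number,
    "integer": _is_integer,
    "boolean": lambda value: isinstance(value, bool),
    "date": _is_date,
}

# Only the list types that appear in the table are accepted.
LIST_TYPES = {"[string]", "[number]", "[integer]", "[boolean]"}


def matches_type(value, data_type):
    """Return True if value fits the given type string (hypothetical helper)."""
    if data_type in LIST_TYPES:
        check = SCALAR_CHECKS[data_type[1:-1]]
        return isinstance(value, list) and all(check(item) for item in value)
    if data_type in SCALAR_CHECKS:
        return SCALAR_CHECKS[data_type](value)
    raise ValueError(f"Unsupported data_type: {data_type!r}")


print(matches_type(2024, "integer"))
print(matches_type([12.5, 42, 100], "[number]"))
```

Note that an integer such as 42 passes the "number" check here, which mirrors the table listing 1500000000 as an example number value.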

Optional Arguments

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| required | boolean | False | Whether field must be present |
| allowed_values | list | None | List of valid string values (enums) |
| validate_in_text | boolean | False | Only extract if value literally appears in text |
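To illustrate what allowed_values and validate_in_text describe, here is a minimal sketch of the two checks written as plain Python. It is not DELM's implementation: the function names are hypothetical, and the matching rule for validate_in_text (a case-insensitive substring match on the value's string form) is an assumption, since the exact rule is not specified here.

```python
def in_allowed(value, allowed_values):
    """True if no enum is configured or the value is one of the allowed strings."""
    return allowed_values is None or value in allowed_values


def appears_in_text(value, text):
    """Assumed rule: the value's string form occurs in the text, ignoring case."""
    return str(value).lower() in text.lower()


SECTORS = ["technology", "finance", "healthcare", "energy", "retail"]
text = "Apple, a technology company, expects 12.5% growth in 2024."

print(in_allowed("technology", SECTORS))   # value is in the enum
print(in_allowed("aerospace", SECTORS))    # value is outside the enum
print(appears_in_text(12.5, text))         # "12.5" occurs in the text
print(appears_in_text(2025, text))         # "2025" does not occur
```

A literal match like this is strict: a value the text expresses differently (for example "twelve and a half percent") would not pass, which is why validate_in_text suits fields that should never be inferred.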

Supported Data Types

| Type String | Description | Example Values |
|-------------|-------------|----------------|
| "string" | Text values | "Apple", "technology" |
| "number" | Floating-point numbers | 1500000000, 12.5 |
| "integer" | Whole numbers | 2024, 100 |
| "boolean" | True/false values | True, False |
| "date" | Date strings | "2025-09-15" |
| "[string]" | List of strings | ["Apple", "Google"] |
| "[number]" | List of numbers | [12.5, 42, 100] |
| "[integer]" | List of integers | [2024, 100, 7] |
| "[boolean]" | List of booleans | [True, False, True] |