Relevance Scorers¶

Score text chunks by relevance for filtering.

Base Class¶

from delm.strategies import RelevanceScorer

class CustomScorer(RelevanceScorer):
    def score(self, text_chunk: str) -> float:
        # Return 0.0-1.0 score
        return score

    def to_dict(self) -> dict:
        return {"type": "CustomScorer", ...}

    @classmethod
    def from_dict(cls, data: dict) -> "CustomScorer":
        return cls(...)

Required for disk storage: Implement to_dict() and from_dict(), then register:

from delm.strategies import SCORER_REGISTRY
SCORER_REGISTRY["CustomScorer"] = CustomScorer

Built-in Scorers¶

KeywordScorer¶

Binary score (0.0 or 1.0) based on keyword presence.

from delm import DELM

delm = DELM(
    schema=schema,
    relevance_scorer={
        "type": "keyword",
        "keywords": ["price", "revenue", "earnings"]
    },
    score_filter="delm_score > 0"  # Keep only matching chunks
)

Parameters: - keywords (List[str]): Keywords to search for (case-insensitive)

Score: - 1.0 if any keyword found - 0.0 otherwise

FuzzyScorer¶

Fuzzy matching score (0.0-1.0) using rapidfuzz.

delm = DELM(
    schema=schema,
    relevance_scorer={
        "type": "fuzzy",
        "target_phrases": ["quarterly earnings report", "financial statement"]
    },
    score_filter="delm_score > 0.7"  # Keep high-similarity chunks
)

Parameters: - target_phrases (List[str]): Phrases to fuzzy match against

Score: Max fuzzy match score (0.0-1.0) across all target phrases

Requirements: pip install rapidfuzz

Filtering¶

Use score_filter with pandas query syntax:

delm = DELM(
    schema=schema,
    relevance_scorer={"type": "keyword", "keywords": ["revenue"]},
    score_filter="delm_score > 0.5"  # Only process chunks with score > 0.5
)

Valid filters: - "delm_score > 0.5" - "delm_score >= 0.8" - "delm_score == 1.0"

Note: score_filter requires a relevance_scorer.

Class-based Definition¶

from delm.strategies import KeywordScorer, FuzzyScorer

scorer = KeywordScorer(keywords=["price", "cost"])
# Or
scorer = FuzzyScorer(target_phrases=["quarterly earnings"])

delm = DELM(
    schema=schema,
    relevance_scorer=scorer,
    score_filter="delm_score > 0"
)

Registry¶

Access all available scorers:

from delm.strategies import SCORER_REGISTRY

print(SCORER_REGISTRY.keys())
# dict_keys(['keyword', 'fuzzy', ...])