Parser

omniread.core.parser

Abstract parsing contracts for OmniRead.

This module defines the format-agnostic parser interface used to transform raw content into structured, typed representations.

Parsers are responsible for:

- Interpreting a single Content instance
- Validating compatibility with the content type
- Producing a structured output suitable for downstream consumers

Parsers are not responsible for:

- Fetching or acquiring content
- Performing retries or error recovery
- Managing multiple content sources

BaseParser

BaseParser(content: Content)

Bases: ABC, Generic[T]

Base interface for all parsers.

A parser is a self-contained object that owns the Content it is responsible for interpreting.

Implementations must:

- Declare supported content types via supported_types
- Raise parsing-specific exceptions from parse()
- Remain deterministic for a given input

Consumers may rely on:

- Early validation of content compatibility
- Type-stable return values from parse()

Initialize the parser with content to be parsed.

Parameters:

Name	Type	Description	Default
`content`	`Content`	Content instance to be parsed.	required

Raises:

Type	Description
`ValueError`	If the content type is not supported by this parser.
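The constructor contract above can be sketched as follows. This is a minimal illustration, not OmniRead's actual source: `Content`, `ContentType`, and the `BaseParser` shown here are simplified stand-ins defined inline, and the `raw` and `content_type` field names, the enum members, and `JSONParser` are all assumptions made for the example.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Generic, Set, TypeVar

T = TypeVar("T")


class ContentType(Enum):  # hypothetical stand-in for OmniRead's ContentType
    JSON = "application/json"
    HTML = "text/html"


@dataclass
class Content:  # hypothetical stand-in; the real Content fields may differ
    raw: bytes
    content_type: ContentType


class BaseParser(ABC, Generic[T]):  # simplified stand-in for the documented class
    supported_types: Set[ContentType] = set()

    def __init__(self, content: Content):
        self.content = content
        # Early validation: reject incompatible content at construction time.
        if not self.supports():
            raise ValueError(f"Unsupported content type: {content.content_type}")

    @abstractmethod
    def parse(self) -> T: ...

    def supports(self) -> bool:
        # An empty set means the parser is content-type agnostic.
        if not self.supported_types:
            return True
        return self.content.content_type in self.supported_types


class JSONParser(BaseParser[dict]):  # hypothetical concrete parser
    supported_types = {ContentType.JSON}

    def parse(self) -> dict:
        return json.loads(self.content.raw)
```

Under these assumptions, `JSONParser(Content(b"<p>", ContentType.HTML))` fails with `ValueError` before `parse()` is ever called, which is the early validation consumers are told they may rely on.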

supported_types `class-attribute` `instance-attribute`

supported_types: Set[ContentType] = set()

Set of content types supported by this parser.

An empty set indicates that the parser is content-type agnostic.
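The empty-set rule can be illustrated in isolation. The sketch below assumes a stand-in `ContentType` enum and a hypothetical helper, `is_supported`, that mirrors the documented semantics; neither is taken from OmniRead itself.

```python
from enum import Enum
from typing import Set


class ContentType(Enum):  # hypothetical stand-in for OmniRead's ContentType
    JSON = "application/json"
    HTML = "text/html"


def is_supported(supported_types: Set[ContentType], content_type: ContentType) -> bool:
    """Hypothetical helper mirroring the documented rule for supported_types."""
    # An empty set places no restriction on the content type.
    if not supported_types:
        return True
    return content_type in supported_types


agnostic = is_supported(set(), ContentType.HTML)
restricted = is_supported({ContentType.JSON}, ContentType.HTML)
```

Here `agnostic` is `True` because the empty set accepts every type, while `restricted` is `False` because HTML is not in the declared set.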

parse `abstractmethod`

parse() -> T

Parse the owned content into structured output.

Implementations must fully consume the provided content and return a deterministic, structured output.

Returns:

Type	Description
`T`	Parsed, structured representation.

Raises:

Type	Description
`Exception`	Parsing-specific errors as defined by the implementation.
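One way an implementation might satisfy the "parsing-specific errors" requirement is to wrap low-level failures in its own exception type. The sketch below is an assumption-laden example: `JSONParseError`, `MiniParser`, and `JSONObjectParser` are hypothetical names, and `MiniParser` is a reduced stand-in that takes raw bytes rather than a real `Content` instance.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, Generic, TypeVar

T = TypeVar("T")


class JSONParseError(Exception):
    """Hypothetical parsing-specific exception; not part of the documented API."""


class MiniParser(ABC, Generic[T]):  # reduced stand-in for BaseParser
    def __init__(self, raw: bytes):
        self.raw = raw

    @abstractmethod
    def parse(self) -> T: ...


class JSONObjectParser(MiniParser[Dict[str, Any]]):
    def parse(self) -> Dict[str, Any]:
        try:
            # Consume the whole payload; no external state is consulted,
            # so the same input always yields the same output.
            data = json.loads(self.raw)
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            raise JSONParseError(f"invalid JSON: {exc}") from exc
        if not isinstance(data, dict):
            # Keep the return type stable: only top-level objects are accepted.
            raise JSONParseError("expected a top-level JSON object")
        return data
```

Raising a dedicated exception type lets callers distinguish parsing failures from unrelated errors, and rejecting non-object payloads keeps the return type of `parse()` stable for consumers.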

supports

supports() -> bool

Check whether this parser supports the content's type.