Scraper
omniread.pdf.scraper
PDF scraping implementation for OmniRead.
This module provides a PDF-specific scraper that coordinates PDF byte
retrieval via a client and normalizes the result into a Content object.
The scraper implements the core BaseScraper contract while delegating
all storage and access concerns to a BasePDFClient implementation.
PDFScraper
PDFScraper(*, client: BasePDFClient)
Bases: BaseScraper
Scraper for PDF sources.
Delegates byte retrieval to a PDF client and normalizes output into Content.
The scraper:
- Does not perform parsing or interpretation
- Does not assume a specific storage backend
- Preserves caller-provided metadata
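The responsibilities listed above can be sketched in a few lines of Python. This is a minimal, self-contained sketch, not OmniRead's implementation. `BasePDFClient` and `Content` are hypothetical stand-ins, the `BaseScraper` base class is omitted, and the client method name `fetch_bytes`, the `Content` fields `raw`, `source`, and `metadata`, and the `InMemoryPDFClient` class are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Mapping, Optional


class BasePDFClient(ABC):
    """Hypothetical stand-in: a client's only job is to return raw PDF bytes."""

    @abstractmethod
    def fetch_bytes(self, source: Any) -> bytes:  # assumed method name
        ...


@dataclass(frozen=True)
class Content:
    """Hypothetical stand-in for OmniRead's Content; field names are assumed."""

    raw: bytes
    source: Any
    metadata: Mapping[str, Any] = field(default_factory=dict)


class PDFScraper:
    """Sketch of the delegation pattern: retrieve via the client, wrap in Content."""

    def __init__(self, *, client: BasePDFClient) -> None:
        self._client = client

    def fetch(
        self, source: Any, *, metadata: Optional[Mapping[str, Any]] = None
    ) -> Content:
        # Retrieval is delegated entirely to the client; the scraper itself
        # never opens files, URLs, or buckets.
        raw = self._client.fetch_bytes(source)
        # No parsing: the bytes pass through untouched. Copying the caller's
        # metadata into a read-only view is a design choice of this sketch,
        # so later mutation by the caller does not alter the returned Content.
        return Content(
            raw=raw,
            source=source,
            metadata=MappingProxyType(dict(metadata or {})),
        )


class InMemoryPDFClient(BasePDFClient):
    """Hypothetical dict-backed client, convenient for tests."""

    def __init__(self, documents: Mapping[str, bytes]) -> None:
        self._documents = dict(documents)

    def fetch_bytes(self, source: Any) -> bytes:
        return self._documents[source]


scraper = PDFScraper(client=InMemoryPDFClient({"report": b"%PDF-1.7 ..."}))
content = scraper.fetch("report", metadata={"origin": "unit-test"})
print(content.raw[:5], dict(content.metadata))
```

Keeping the scraper this thin means every storage-specific decision (paths, credentials, retries) lives in the client, which matches the separation of concerns described above.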
Initialize the PDF scraper.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| client | BasePDFClient | PDF client responsible for retrieving raw PDF bytes. | required |
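Because the scraper assumes no particular storage backend, each backend lives entirely inside a client. The following is a hypothetical filesystem-backed client. The `PDFByteClient` protocol, its `fetch_bytes` method, the `LocalFilePDFClient` class, and the `retrieve` helper are assumptions standing in for whatever `BasePDFClient` actually requires; the helper plays the role of the scraper's single call into its client.

```python
import tempfile
from pathlib import Path
from typing import Any, Protocol


class PDFByteClient(Protocol):
    """Hypothetical contract: map a source identifier to raw PDF bytes."""

    def fetch_bytes(self, source: Any) -> bytes: ...


class LocalFilePDFClient:
    """Hypothetical client that treats `source` as a path relative to a root directory."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def fetch_bytes(self, source: Any) -> bytes:
        # How `source` is interpreted is the client's concern, not the scraper's.
        return (self._root / str(source)).read_bytes()


def retrieve(client: PDFByteClient, source: Any) -> bytes:
    # Stand-in for the scraper's only dependency on its client.
    return client.fetch_bytes(source)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sample.pdf").write_bytes(b"%PDF-1.4\n%%EOF\n")
    data = retrieve(LocalFilePDFClient(root), "sample.pdf")
    print(data.startswith(b"%PDF-"))
```

Supporting another backend would mean writing another class with the same method; under this assumed contract, the scraper itself would not change.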
fetch
fetch(source: Any, *, metadata: Optional[Mapping[str, Any]] = None) -> Content
Fetch a PDF document from the given source.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | Any | Identifier of the PDF source, as understood by the configured PDF client. | required |
| metadata | Optional[Mapping[str, Any]] | Optional metadata to attach to the returned content. | None |
Returns:
| Type | Description |
|---|---|
| Content | A Content object wrapping the retrieved PDF bytes, with any caller-provided metadata attached. |
Raises:
| Type | Description |
|---|---|
| Exception | Retrieval-specific errors raised by the PDF client. |
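The Raises entry indicates that the scraper lets client errors reach the caller, so the concrete exception type depends on the configured client. The following minimal sketch illustrates that behavior. `FailingPDFClient`, `PDFScraperSketch`, and the `fetch_bytes` method are hypothetical, the sketch returns bytes rather than `Content` for brevity, and `FileNotFoundError` is only what this particular imaginary client raises.

```python
from typing import Any


class FailingPDFClient:
    """Hypothetical client whose backend cannot find the requested document."""

    def fetch_bytes(self, source: Any) -> bytes:
        raise FileNotFoundError(f"no PDF at {source!r}")


class PDFScraperSketch:
    """Minimal stand-in that, like the documented scraper, does not catch client errors."""

    def __init__(self, *, client: Any) -> None:
        self._client = client

    def fetch(self, source: Any) -> bytes:
        # No try/except here: whatever the client raises reaches the caller unchanged.
        return self._client.fetch_bytes(source)


scraper = PDFScraperSketch(client=FailingPDFClient())
try:
    scraper.fetch("missing.pdf")
except FileNotFoundError as exc:
    error = exc  # keep a reference; `exc` is unbound after the except block
    print(f"retrieval failed: {exc}")
```

Since the scraper does not translate errors, callers that swap clients should expect the set of exceptions to change with the backend.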