Client

omniread.pdf.client

PDF client abstractions for OmniRead.

This module defines the client layer responsible for retrieving raw PDF bytes from a concrete backing store.

Clients provide low-level access to PDF binaries and are intentionally decoupled from scraping and parsing logic. They do not perform validation, interpretation, or content extraction.

Typical backing stores include:

- Local filesystems
- Object storage (S3, GCS, etc.)
- Network file systems
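To illustrate the separation of concerns described above, here is a minimal sketch in which a downstream step performs its own check on the bytes while the client only retrieves them. The names `PDFClient`, `StubClient`, and `has_pdf_header` are hypothetical and are not part of OmniRead; the only thing taken from this page is the `fetch(source) -> bytes` shape.

```python
from typing import Any, Protocol


class PDFClient(Protocol):
    """Hypothetical structural type: anything exposing fetch(source) -> bytes."""

    def fetch(self, source: Any) -> bytes: ...


def has_pdf_header(client: PDFClient, source: Any) -> bool:
    """Downstream check. The client only fetches; interpretation lives here."""
    return client.fetch(source).startswith(b"%PDF-")


class StubClient:
    """Hypothetical client that always returns the same tiny payload."""

    def fetch(self, source: Any) -> bytes:
        return b"%PDF-1.7\n%%EOF\n"


result = has_pdf_header(StubClient(), "any-key")
```

Because the check depends only on `fetch`, swapping the backing store should not require changes to downstream code.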

BasePDFClient

Bases: ABC

Abstract client responsible for retrieving PDF bytes from a specific backing store (filesystem, S3, FTP, etc.).

Implementations must:

- Accept a source identifier appropriate to the backing store
- Return the full PDF binary payload
- Raise retrieval-specific errors on failure
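The three requirements above can be sketched with a small custom client. This is an illustration under stated assumptions, not the library's source: `BasePDFClient` is redefined locally as a stand-in that mirrors the documented interface, and `InMemoryPDFClient` and `PDFNotFoundError` are hypothetical names invented for the example.

```python
from abc import ABC, abstractmethod
from typing import Any


class BasePDFClient(ABC):
    """Local stand-in mirroring the documented interface (not the library source)."""

    @abstractmethod
    def fetch(self, source: Any) -> bytes:
        """Return the full PDF payload for `source`."""


class PDFNotFoundError(LookupError):
    """Hypothetical retrieval-specific error used by this sketch."""


class InMemoryPDFClient(BasePDFClient):
    """Hypothetical client backed by a dict mapping keys to PDF bytes."""

    def __init__(self, blobs: dict[str, bytes]) -> None:
        self._blobs = dict(blobs)

    def fetch(self, source: str) -> bytes:
        # The source identifier here is a plain string key.
        try:
            return self._blobs[source]
        except KeyError:
            # Surface a retrieval-specific error instead of a bare KeyError.
            raise PDFNotFoundError(f"no PDF stored under key {source!r}") from None


client = InMemoryPDFClient({"report": b"%PDF-1.7\n%%EOF\n"})
payload = client.fetch("report")
```

Instantiating `BasePDFClient` directly raises `TypeError`, since `fetch` is abstract.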

fetch abstractmethod

fetch(source: Any) -> bytes

Fetch raw PDF bytes from the given source.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| source | Any | Identifier of the PDF location, such as a file path, object storage key, or remote reference. | required |

Returns:

| Type | Description |
| --- | --- |
| bytes | Raw PDF bytes. |

Raises:

| Type | Description |
| --- | --- |
| Exception | Retrieval-specific errors defined by the implementation. |

FileSystemPDFClient

Bases: BasePDFClient

PDF client that reads from the local filesystem.

This client reads PDF files directly from disk and returns their raw binary contents.

fetch

fetch(path: Path) -> bytes

Read a PDF file from the local filesystem.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | Path | Filesystem path to the PDF file. | required |

Returns:

| Type | Description |
| --- | --- |
| bytes | Raw PDF bytes. |

Raises:

| Type | Description |
| --- | --- |
| FileNotFoundError | If the path does not exist. |
| ValueError | If the path exists but is not a file. |
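The documented behavior of `fetch` can be approximated with the standard library alone. The sketch below is an assumption about how such a method might be written, not the actual `FileSystemPDFClient` source; `read_pdf_bytes` is a hypothetical helper that follows the same contract (existence check, file check, then a binary read).

```python
import tempfile
from pathlib import Path


def read_pdf_bytes(path: Path) -> bytes:
    """Hypothetical helper following the documented fetch contract."""
    if not path.exists():
        raise FileNotFoundError(f"PDF path does not exist: {path}")
    if not path.is_file():
        # For example, the path points at a directory.
        raise ValueError(f"PDF path is not a file: {path}")
    return path.read_bytes()


# Usage: write a tiny placeholder payload to a temp file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    pdf_path = Path(tmp) / "sample.pdf"
    pdf_path.write_bytes(b"%PDF-1.4\n%%EOF\n")
    data = read_pdf_bytes(pdf_path)
```

As with the client described above, no validation of the contents takes place: whatever bytes are on disk are returned unchanged.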