
LLM

Wrapper around the OpenAI Python SDK for batch prompting.

The helper provides:

  • generate for plain or vision-enabled prompts with optional Pydantic validation.
  • _call_llm_fallback, used by the scraper when HTML and PDF heuristics fail.
  • Built-in exponential back-off and host-wide throttling via a semaphore.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| api_key | str | OpenAI (or proxy) API key. If None, the key is read from the environment. | None |
| model | str | Chat model name. | 'gpt-4.1' |
| base_url | str or None | Override the OpenAI endpoint. | None |
| max_workers | int | Maximum parallel workers for batch calls and for the global semaphore. | 8 |
| temperature | float | Sampling temperature used for all requests. | 0.0 |
| seed | int or None | Deterministic seed value passed to the API. | None |
| validation_attempts | int | Number of times to retry parsing LLM output into a Pydantic model. | 2 |
| timeout | float, Timeout, or None | HTTP timeout in seconds; None falls back to the OpenAI client default of 600 seconds. | 120 |

Attributes:

| Name | Type | Description |
|------|------|-------------|
| _sem | Semaphore | Global semaphore that limits concurrent requests to max_workers. |
| client | Client | Low-level SDK client configured with the API key and base URL. |

Examples:

llm_client = LLM(
    api_key="sk-...",
    model="gpt-4o-mini",
    temperature=0.2,
    timeout=600,
)
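
A minimal batch call with the client above; a sketch assuming, per the generate docs below, that calls without response_format return plain strings in prompt order:

    # One prompt per model call; results come back in the same order.
    answers = llm_client.generate([
        "What is the capital of France?",
        "List three prime numbers.",
    ])
    print(answers[0])  # a plain string, since no response_format was given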

estimate_tokens

estimate_tokens(texts, model=None)

Return token counts for the given texts using tiktoken.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| texts | list[str] or str | Input strings to tokenise. | required |
| model | str | Model name for selecting the encoding; None uses self.model. | None |

Returns:

| Type | Description |
|------|-------------|
| list[int] | Token counts in the same order as texts. |
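
A quick sketch; the counts in the comment are illustrative, since the exact numbers depend on the encoding tiktoken selects for the model:

    counts = llm_client.estimate_tokens(
        ["Hello world", "A longer sentence to count."]
    )
    # counts is a list[int] aligned with the inputs, e.g. [2, 6]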

generate

generate(
    prompts,
    images_list=None,
    response_format=None,
    max_workers=None,
    tqdm_extra_kwargs=None,
)

Run many prompts either sequentially or in parallel.

Parameters:

prompts : list[str]
    List of user prompts. One prompt per model call.

images_list : list[list[bytes]] or None, optional
    For vision models: a parallel list where each inner list holds
    base64-encoded JPEG pages for that prompt. Use None to send no
    images.

response_format : type[pydantic.BaseModel] or None, optional
    If provided, each response is parsed into that model via the
    beta/parse endpoint; otherwise a raw string is returned.

max_workers : int or None, optional
    Thread count just for this batch. None uses the instance-wide
    max_workers value. Defaults to None.

tqdm_extra_kwargs : dict or None, optional
    Extra keyword arguments forwarded to the tqdm progress bar.
    Defaults to None.
Returns:

list[pydantic.BaseModel or str]
    Results in the same order as prompts.

Raises:

openai.RateLimitError
    Raised only if the exponential back-off exhausts all retries.
openai.APIConnectionError
    Raised if network issues persist beyond the retry window.
openai.APITimeoutError
    Raised if the API repeatedly times out.
Examples:

    msgs = ["Summarise:\n\n" + txt for txt in docs]
    summaries = llm.generate(msgs)