Creating an LLM Client¶
PFD Toolkit uses a Large Language Model (LLM) client for advanced features. This page explains how to set up your LLM, including how to get an API key.
Setting up your LLM client¶
To use AI-powered features, you need to create an LLM client and supply your OpenAI API key (see below for how to get one). You do not need an LLM client simply to load report data (i.e. using load_reports).
Basic setup:
from pfd_toolkit import LLM
llm_client = LLM(api_key="YOUR-API-KEY") # Replace YOUR-API-KEY with your real API key
You can now use LLM-powered features! For example, to screen for reports about medication purchased online:
from pfd_toolkit import Screener
query = "Deaths that followed ordering medication(s) online."
screener = Screener(llm=llm_client, reports=reports)
online_med_reports = screener.screen_reports(user_query=query)
How do I get an OpenAI API key?¶
- Sign up or log in at OpenAI Platform.
- Go to API Keys.
- Click “Create new secret key” and copy the string.
- Store your key somewhere safe. Never share or publish it.
- Add credit to your account (just $5 is enough for most research uses).
For more information about usage costs, see OpenAI pricing.
Warning
OpenAI currently mandates that your account must be at least 48 hours old before you can run LLMs as normal. If you're setting up your account for the first time, you might have to wait a couple of days before using PFD Toolkit's advanced features.
Hide your API key¶
It's important that you never share your API key. This includes making sure you don't commit any code containing your API key to GitHub.
The code below shows how to load your API key without ever printing it in your code or console:
from pfd_toolkit import LLM
from dotenv import load_dotenv
import os
# Load OpenAI API key from file
load_dotenv("api.env")
openai_api_key = os.getenv("OPENAI_API_KEY")
# Initialise LLM client
llm_client = LLM(api_key=openai_api_key)
This code assumes you've got a file called api.env in the same directory as your script. api.env should look like this:

OPENAI_API_KEY=[...]

where [...] is your API key.
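If you're curious what load_dotenv is doing behind the scenes, here is a simplified stand-in using only the standard library. The helper load_env_file below is hypothetical (for illustration only; in practice, use python-dotenv as shown above):

```python
import os

def load_env_file(path):
    # Simplified stand-in for dotenv.load_dotenv: read KEY=VALUE lines
    # from a file and place them in the process environment.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip()

# Write a throwaway api.env purely for demonstration
with open("api.env", "w") as f:
    f.write("OPENAI_API_KEY=sk-example-not-a-real-key\n")

load_env_file("api.env")
print(os.getenv("OPENAI_API_KEY"))
```

Note that the real load_dotenv does not, by default, overwrite environment variables that are already set.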
Finally, create a file called .gitignore at the root of your directory. Inside it, simply list your api.env file:

api.env

This tells Git not to track your api.env file, keeping it protected.
For more information on .gitignore, see here.
Speed up your LLM¶
Process more reports in parallel by increasing the max_workers parameter. By default, this is set to 8, but larger values can lead to faster run-times.
Note
OpenAI enforces rate limits for each account and model. If you set max_workers too high, you may hit these limits and see errors or slowdowns. PFD Toolkit will automatically pause and retry if a rate limit is reached, but it's best to keep max_workers within a reasonable range (usually 8 to 20 for most users).
Your exact rate limit may depend on the 'tier' of your OpenAI account as well as the model you're using. If you need higher limits, you may be able to apply for an increase in your OpenAI account settings.
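To see why max_workers matters, here is an illustrative worker-pool sketch using Python's concurrent.futures. This is not PFD Toolkit's actual implementation, and screen_one is a hypothetical stand-in for a single LLM call:

```python
from concurrent.futures import ThreadPoolExecutor

def screen_one(report):
    # Stand-in for a single LLM call. Real calls are network-bound,
    # which is why running several in parallel speeds things up.
    return f"screened: {report}"

reports = [f"report-{i}" for i in range(20)]

# max_workers caps how many requests are in flight at once
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(screen_one, reports))

print(len(results))
```

Raising max_workers increases the number of simultaneous requests, which is also why it interacts with your account's rate limits.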
Change your model¶
By default, PFD Toolkit uses gpt-4.1. We love this model as it balances cost, speed, and accuracy. We also recommend its smaller equivalent, gpt-4.1-mini, which offers decent performance at a lower API cost.
See OpenAI's documentation for a complete list of their models.
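As a sketch, switching models might look like the following. The model keyword is an assumption here; check PFD Toolkit's API reference for the exact parameter name:

```python
from pfd_toolkit import LLM

# "model" is assumed to be the constructor keyword for choosing
# the underlying OpenAI model; verify against the API reference.
llm_client = LLM(api_key="YOUR-API-KEY", model="gpt-4.1-mini")
```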