Screener

Classifies a list of report texts against a user-defined topic using an LLM.

This class takes a DataFrame of reports, a user query, and various configuration options to classify whether each report matches the query. It can either filter the DataFrame to return only matching reports or add a classification column to the original DataFrame.

Parameters:

Name (type, default): description

llm (LLM, default None): An instance of the LLM class from pfd_toolkit.
reports (DataFrame, default None): A DataFrame containing Prevention of Future Deaths (PFD) reports.
verbose (bool, default False): If True, print more detailed logs.
include_date (bool, default False): Whether the 'date' column is included in the report text that is screened.
include_coroner (bool, default False): Whether the 'coroner' column is included in the report text that is screened.
include_area (bool, default False): Whether the 'area' column is included in the report text that is screened.
include_receiver (bool, default False): Whether the 'receiver' column is included in the report text that is screened.
include_investigation (bool, default True): Whether the 'investigation' column is included in the report text that is screened.
include_circumstances (bool, default True): Whether the 'circumstances' column is included in the report text that is screened.
include_concerns (bool, default True): Whether the 'concerns' column is included in the report text that is screened.
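
The include_* flags decide which report columns take part in screening. How Screener assembles them internally is not documented here, so the following is only a sketch of one plausible reading: a hypothetical helper, build_report_text (not part of pfd_toolkit), joins the selected columns of a single report into one labelled string. A plain dict stands in for a DataFrame row, and the sample values are invented.

```python
# Hypothetical illustration: build_report_text is NOT a pfd_toolkit function.
def build_report_text(
    row,
    include_date=False,
    include_coroner=False,
    include_area=False,
    include_receiver=False,
    include_investigation=True,
    include_circumstances=True,
    include_concerns=True,
):
    """Join the selected, non-empty columns of one report into a single string."""
    selected = {
        "date": include_date,
        "coroner": include_coroner,
        "area": include_area,
        "receiver": include_receiver,
        "investigation": include_investigation,
        "circumstances": include_circumstances,
        "concerns": include_concerns,
    }
    parts = [
        f"{col}: {row[col]}"
        for col, enabled in selected.items()
        if enabled and row.get(col)
    ]
    return "\n".join(parts)


# Invented sample report; a dict stands in for one DataFrame row.
report = {
    "date": "2024-01-15",
    "coroner": "A. Example",
    "area": "Exampleshire",
    "receiver": "Example NHS Trust",
    "investigation": "An investigation was opened.",
    "circumstances": "The patient received a double dose.",
    "concerns": "No system checks the dosage.",
}

default_text = build_report_text(report)                 # defaults: three narrative columns
with_date = build_report_text(report, include_date=True)  # also prepend the date
```

With the defaults, only the investigation, circumstances, and concerns text is kept; switching on a metadata flag such as include_date adds that column to the text as well.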

Examples:

from pfd_toolkit import LLM, Screener

# reports_df is assumed to be a DataFrame of PFD reports loaded beforehand
user_topic = "medication errors"
llm_client = LLM()
screener = Screener(llm=llm_client, reports=reports_df)
screened_reports = screener.screen_reports(user_query=user_topic)
print(f"Found {len(screened_reports)} report(s) on '{user_topic}'.")

screen_reports

screen_reports(
    reports=None,
    user_query=None,
    filter_df=True,
    result_col_name="matches_query",
    produce_spans=False,
    drop_spans=False,
)

Classifies reports in the DataFrame against the user-defined topic using the LLM.

Parameters:

Name (type, default): description

reports (DataFrame, default None): If provided, this DataFrame is screened instead of the one stored in the instance, for this call only.
user_query (str, default None): If provided, this query overrides the one stored in the instance for this call, and the prompt template is rebuilt.
filter_df (bool, default True): If True, the returned DataFrame contains only the matching reports.
result_col_name (str, default "matches_query"): Name of the boolean column added when filter_df is False.
produce_spans (bool, default False): If True, a spans_matches_topic column is created containing the text snippet that justified the classification.
drop_spans (bool, default False): If True and produce_spans is also True, the spans_matches_topic column is removed from the returned DataFrame.

Returns:

DataFrame: Either a filtered DataFrame (if filter_df is True), or the original DataFrame with an added classification column (if filter_df is False).
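
To make the documented return shapes concrete, the sketch below mimics them on plain dicts. It is a toy stand-in under stated assumptions, not the library's implementation: toy_classify and toy_screen are hypothetical names, a keyword check replaces the LLM call, and a list of dicts replaces the DataFrame. Only the parameter names and the column names matches_query and spans_matches_topic come from the documentation above.

```python
# Hypothetical stand-ins: neither function exists in pfd_toolkit.
def toy_classify(text, query):
    """Replace the LLM call with a keyword check; return (matches, snippet)."""
    for sentence in text.split(". "):
        if query.lower() in sentence.lower():
            return True, sentence.strip().rstrip(".")
    return False, ""


def toy_screen(rows, query, filter_df=True, result_col_name="matches_query",
               produce_spans=False, drop_spans=False):
    """Mimic the documented return shapes of screen_reports on plain dicts."""
    screened = []
    for row in rows:
        matches, span = toy_classify(row["concerns"], query)
        new_row = dict(row)  # leave the caller's data untouched
        if produce_spans and not drop_spans:
            new_row["spans_matches_topic"] = span
        if filter_df:
            if matches:
                screened.append(new_row)  # matching rows only, no flag column
        else:
            new_row[result_col_name] = matches  # every row, plus the flag
            screened.append(new_row)
    return screened


# Invented sample data.
rows = [
    {"id": 1, "concerns": "Staff were unaware of the policy. A medication error went unnoticed."},
    {"id": 2, "concerns": "A tree overhanging the road was not inspected."},
]

filtered = toy_screen(rows, "medication")
classified = toy_screen(rows, "medication", filter_df=False, produce_spans=True)
```

Here filtered holds only the first report and carries no classification column, while classified keeps both reports and adds matches_query together with the supporting snippet. Passing drop_spans=True alongside produce_spans=True leaves the snippet column out of the result.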

Examples:

import pandas as pd
from pfd_toolkit import LLM, Screener

# data is assumed to hold PFD report records loaded beforehand
reports_df = pd.DataFrame(data)
screener = Screener(LLM(), reports=reports_df)

# Screen the reports and keep only those matching the first query
filtered_df = screener.screen_reports(user_query="medication safety")

# Screen the same reports with a new query and add a classification column
classified_df = screener.screen_reports(user_query="tree safety", filter_df=False)