Cleaner

Batch-clean PFD report fields with an LLM.

The cleaner loops over selected columns, builds field-specific prompts and writes the returned text back into a copy of the DataFrame.
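The loop described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: a column-oriented dictionary stands in for the DataFrame, and `fake_llm`, `clean_columns` and the prompt wording are hypothetical names invented for the example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in: a column-oriented dict plays the role of the DataFrame.
Table = Dict[str, List[str]]


def fake_llm(prompt: str) -> str:
    """Pretend model: tidies whitespace and casing of the text on the last line."""
    text = prompt.rsplit("\n", 1)[-1]
    return " ".join(text.split()).title()


def clean_columns(
    reports: Table,
    prompts: Dict[str, str],
    llm: Callable[[str], str],
) -> Table:
    """Clean each column that has a prompt; leave all other columns untouched."""
    # Work on a copy so the caller's data is not modified.
    cleaned = {col: list(values) for col, values in reports.items()}
    for col, instruction in prompts.items():
        if col not in cleaned:
            continue  # skip fields the table does not contain
        cleaned[col] = [llm(f"{instruction}\n{value}") for value in cleaned[col]]
    return cleaned


reports = {
    "coroner": ["  dr  jane   SMITH ", "mr john doe"],
    "area": ["greater  manchester", "north  wales"],
}
# Only the coroner column is selected for cleaning in this example.
prompts = {"coroner": "Tidy this coroner name."}

cleaned = clean_columns(reports, prompts, fake_llm)
```

Here only `coroner` is rewritten; `area` passes through unchanged, and the input `reports` keeps its original values.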

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| reports | DataFrame | Input DataFrame to clean. | required |
| llm | LLM | Instance of the LLM helper used for prompting. | required |
| include_coroner | bool | Clean the coroner column. Defaults to True. | True |
| include_receiver | bool | Clean the receiver column. Defaults to True. | True |
| include_area | bool | Clean the area column. Defaults to True. | True |
| include_investigation | bool | Clean the investigation column. Defaults to True. | True |
| include_circumstances | bool | Clean the circumstances column. Defaults to True. | True |
| include_concerns | bool | Clean the concerns column. Defaults to True. | True |
| coroner_prompt | str or None | Custom prompt for the coroner field. Defaults to None. | None |
| area_prompt | str or None | Custom prompt for the area field. Defaults to None. | None |
| receiver_prompt | str or None | Custom prompt for the receiver field. Defaults to None. | None |
| investigation_prompt | str or None | Custom prompt for the investigation field. Defaults to None. | None |
| circumstances_prompt | str or None | Custom prompt for the circumstances field. Defaults to None. | None |
| concerns_prompt | str or None | Custom prompt for the concerns field. Defaults to None. | None |
| verbose | bool | Emit info-level logs for each batch when True. Defaults to False. | False |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| cleaned_reports | DataFrame | Result of the last call to clean_reports. |
| coroner_prompt_template, area_prompt_template, ... | str | Finalised prompt strings actually sent to the model. |

Examples:

cleaner = Cleaner(df, llm, include_coroner=False, verbose=True)
cleaned_df = cleaner.clean_reports()
cleaned_df.head()

clean_reports

clean_reports(anonymise=False)

Run LLM-based cleaning for the configured columns.

The method works on a copy of self.reports, so the original DataFrame is never mutated.

Returns:

| Type | Description |
| --- | --- |
| DataFrame | A new DataFrame in which the selected columns have been replaced by the LLM output (or left unchanged when the model returns an error marker). |
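The fallback mentioned in the return description (keep the original value when the model returns an error marker) might look something like the sketch below. The marker string `"N/A: Not found"` and the helper name `merge_outputs` are assumptions made for illustration; the marker the library actually checks for may differ.

```python
from typing import List

# Assumed marker: the string the model returns when it cannot produce a
# cleaned value. The real marker used by the library may differ.
ERROR_MARKER = "N/A: Not found"


def merge_outputs(
    originals: List[str],
    outputs: List[str],
    marker: str = ERROR_MARKER,
) -> List[str]:
    """Prefer the model output, but keep the original where the model gave up."""
    return [
        original if output.strip() == marker else output
        for original, output in zip(originals, outputs)
    ]


originals = ["HM Coroner for kent", "area: unknown??"]
outputs = ["Kent", "N/A: Not found"]

merged = merge_outputs(originals, outputs)
```

With these inputs the first value is replaced by the model output and the second keeps its original text.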

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| anonymise | bool | When True, append an instruction to anonymise names and pronouns in the investigation, circumstances and concerns fields. Defaults to False. | False |
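To illustrate what anonymise=True implies for the prompts, here is a small sketch that appends an extra instruction only to the three free-text fields. The instruction wording and the helper `apply_anonymise` are hypothetical; the text the library appends is not shown in this page.

```python
from typing import Dict

# Hypothetical wording; the instruction the library appends may differ.
ANONYMISE_INSTRUCTION = (
    "Replace personal names and pronouns with neutral alternatives such as "
    "'the deceased' and 'they'."
)
# Fields named in the parameter description as affected by anonymise=True.
ANONYMISED_FIELDS = {"investigation", "circumstances", "concerns"}


def apply_anonymise(prompts: Dict[str, str], anonymise: bool) -> Dict[str, str]:
    """Return a new prompt mapping, extended for the free-text fields if requested."""
    if not anonymise:
        return dict(prompts)
    return {
        field: f"{text} {ANONYMISE_INSTRUCTION}" if field in ANONYMISED_FIELDS else text
        for field, text in prompts.items()
    }


base_prompts = {
    "coroner": "Clean the coroner name.",
    "concerns": "Clean the concerns text.",
}
with_anonymise = apply_anonymise(base_prompts, anonymise=True)
without_anonymise = apply_anonymise(base_prompts, anonymise=False)
```

In this sketch the coroner prompt is identical in both results, while the concerns prompt gains the extra instruction only when anonymise is True.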

Examples:

cleaner = Cleaner(llm=llm_client, reports=reports)
cleaned = cleaner.clean_reports()

generate_prompt_template

generate_prompt_template()

Return the prompt templates used for each field.

The returned dictionary maps DataFrame column names to the full prompt text with a [TEXT] placeholder appended to illustrate how the prompt will look during clean_reports.
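The shape of that dictionary can be pictured with the sketch below. The helper `preview_templates`, the example prompt text and the blank-line separator before `[TEXT]` are all assumptions; only the column-name keys and the appended `[TEXT]` placeholder come from the description above.

```python
from typing import Dict


def preview_templates(prompts: Dict[str, str], placeholder: str = "[TEXT]") -> Dict[str, str]:
    """Map each column name to its prompt followed by the placeholder."""
    # The blank-line separator is an assumption made for readability.
    return {column: f"{prompt}\n\n{placeholder}" for column, prompt in prompts.items()}


templates = preview_templates({"area": "Standardise the coroner area name."})
print(templates["area"])
```

Printing a template this way is a quick check of what each field's prompt will look like before any reports are sent to the model.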