`Cleaner`

Batch-clean PFD report fields with an LLM.

The cleaner loops over selected columns, builds field-specific prompts and writes the returned text back into a copy of the DataFrame.
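
The loop described above can be pictured with a minimal, dependency-free sketch. It is not the library's implementation: a list of dicts stands in for the DataFrame, `fake_llm` is a hypothetical stand-in for the `LLM` helper, and the prompt strings in `PROMPTS` are invented for illustration.

```python
import copy

# Hypothetical field-specific prompts; the real defaults are not shown on this page.
PROMPTS = {
    "coroner": "Return only the coroner's name.",
    "area": "Return only the coroner area.",
}


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM call: tidies the text found after the last newline."""
    text = prompt.rsplit("\n", 1)[-1]
    return " ".join(text.split()).title()


def clean_rows(rows, columns, llm=fake_llm):
    """Clean the selected columns on a copy of rows, leaving the input untouched."""
    cleaned = copy.deepcopy(rows)
    for column in columns:
        for row in cleaned:
            # Build the field-specific prompt, then write the reply back into the copy.
            prompt = f"{PROMPTS[column]}\n{row[column]}"
            row[column] = llm(prompt)
    return cleaned


reports = [{"coroner": "  jane   DOE ", "area": "north  wales"}]
cleaned = clean_rows(reports, ["coroner", "area"])
```

Because the work happens on a deep copy, `reports` still holds the raw strings after the call, which mirrors the "copy of the DataFrame" behaviour described above.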

Parameters:

Name	Type	Description	Default
`reports`	`DataFrame`	Input DataFrame to clean.	required
`llm`	`LLM`	Instance of the `LLM` helper used for prompting.	required
`include_coroner`	`bool`	Clean the `coroner` column.	`True`
`include_receiver`	`bool`	Clean the `receiver` column.	`True`
`include_area`	`bool`	Clean the `area` column.	`True`
`include_investigation`	`bool`	Clean the `investigation` column.	`True`
`include_circumstances`	`bool`	Clean the `circumstances` column.	`True`
`include_concerns`	`bool`	Clean the `concerns` column.	`True`
`coroner_prompt`	`str or None`	Custom prompt for the coroner field.	`None`
`area_prompt`	`str or None`	Custom prompt for the area field.	`None`
`receiver_prompt`	`str or None`	Custom prompt for the receiver field.	`None`
`investigation_prompt`	`str or None`	Custom prompt for the investigation field.	`None`
`circumstances_prompt`	`str or None`	Custom prompt for the circumstances field.	`None`
`concerns_prompt`	`str or None`	Custom prompt for the concerns field.	`None`
`verbose`	`bool`	Emit info-level logs for each batch when `True`.	`False`

Attributes:

Name	Type	Description
`cleaned_reports`	`DataFrame`	Result of the last call to `clean_reports`.
`coroner_prompt_template, area_prompt_template, ...`	`str`	Finalised prompt strings actually sent to the model.

Examples:

cleaner = Cleaner(df, llm, include_coroner=False, verbose=True)
cleaned_df = cleaner.clean_reports()
cleaned_df.head()

clean_reports

clean_reports(anonymise=False)

Run LLM-based cleaning for the configured columns.

The method works on a copy of `self.reports`, so the original DataFrame is never mutated.

Returns:

Type	Description
`DataFrame`	A new DataFrame in which the selected columns have been replaced by the LLM output (or left unchanged when the model returns an error marker).
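
The "left unchanged" fallback can be sketched as follows, assuming the model signals failure with a sentinel string. The marker value and the helper name `apply_output` are hypothetical; this page does not state what the actual error marker is.

```python
ERROR_MARKER = "N/A: Not found"  # hypothetical sentinel; the real marker is not documented here


def apply_output(original: str, llm_output: str) -> str:
    """Keep the original text when the model output is the error marker."""
    if llm_output.strip() == ERROR_MARKER:
        return original
    return llm_output


kept = apply_output("raw coroner text", "N/A: Not found")
replaced = apply_output("raw coroner text", "Jane Doe")
```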

Parameters:

Name	Type	Description	Default
`anonymise`	`bool`	When `True`, appends an instruction to anonymise names and pronouns to the prompts for the `investigation`, `circumstances` and `concerns` fields.	`False`
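
A rough sketch of how the `anonymise` flag might shape the prompts. Only the three affected field names come from the description above; the instruction wording, the base prompts and the helper `build_prompt` are assumptions made for illustration.

```python
# Fields that receive the anonymisation instruction, per the parameter description.
ANONYMISED_FIELDS = {"investigation", "circumstances", "concerns"}

# Hypothetical wording; the text the library actually appends is not shown on this page.
ANONYMISE_INSTRUCTION = "Replace personal names and pronouns with neutral terms."


def build_prompt(field: str, base_prompt: str, anonymise: bool = False) -> str:
    """Append the anonymisation instruction only for the narrative fields."""
    if anonymise and field in ANONYMISED_FIELDS:
        return f"{base_prompt} {ANONYMISE_INSTRUCTION}"
    return base_prompt


concerns_prompt = build_prompt("concerns", "Tidy the concerns text.", anonymise=True)
coroner_prompt = build_prompt("coroner", "Return only the coroner's name.", anonymise=True)
```

In this sketch the `coroner` prompt is returned unchanged even with `anonymise=True`, since that field is not among the three listed.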

Examples:

cleaner = Cleaner(llm=llm_client, reports=reports)
cleaned = cleaner.clean_reports()

generate_prompt_template

generate_prompt_template()

Return the prompt templates used for each field.

The returned dictionary maps DataFrame column names to the full prompt text, with a `[TEXT]` placeholder appended to show where the field content is inserted when `clean_reports` runs.
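
The shape of that dictionary can be illustrated with a small sketch. The helper `preview_templates`, the sample prompt and the newline used as separator before `[TEXT]` are assumptions; only the column-name keys and the appended placeholder come from the description above.

```python
def preview_templates(prompts: dict) -> dict:
    """Map each column name to its prompt with a [TEXT] placeholder appended."""
    return {column: f"{prompt}\n[TEXT]" for column, prompt in prompts.items()}


# Hypothetical prompt keyed by column name.
templates = preview_templates({"coroner": "Return only the coroner's name."})
```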
