Cleaner
Batch-clean PFD report fields with an LLM.
The cleaner loops over selected columns, builds field-specific prompts and writes the returned text back into a copy of the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
reports | DataFrame | Input DataFrame to clean. | required |
llm | LLM | Instance of the | required |
include_coroner | bool | Clean the | True |
include_receiver | bool | Clean the | True |
include_area | bool | Clean the | True |
include_investigation | bool | Clean the | True |
include_circumstances | bool | Clean the | True |
include_concerns | bool | Clean the | True |
coroner_prompt | str or None | Custom prompt for the coroner field. Defaults to | None |
area_prompt | str or None | Custom prompt for the area field. Defaults to | None |
receiver_prompt | str or None | Custom prompt for the receiver field. Defaults to | None |
investigation_prompt | str or None | Custom prompt for the investigation field. Defaults to | None |
circumstances_prompt | str or None | Custom prompt for the circumstances field. Defaults to | None |
concerns_prompt | str or None | Custom prompt for the concerns field. Defaults to | None |
verbose | bool | Emit info-level logs for each batch when | False |
Attributes:
Name | Type | Description |
---|---|---|
cleaned_reports | DataFrame | Result of the last call to |
coroner_prompt_template, area_prompt_template, ... | str | Finalised prompt strings actually sent to the model. |
Examples:
cleaner = Cleaner(df, llm, include_coroner=False, verbose=True)
cleaned_df = cleaner.clean_reports()
cleaned_df.head()
clean_reports
Run LLM-based cleaning for the configured columns.
The method works on a copy of self.reports, so the original DataFrame is never mutated.
Returns:
Type | Description |
---|---|
DataFrame | A new DataFrame in which the selected columns have been replaced by the LLM output (or left unchanged when the model returns an error marker). |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
anonymise | bool | When | False |
Examples:
cleaner = Cleaner(llm=llm_client, reports=reports)
cleaned = cleaner.clean_reports()
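The copy-then-replace behaviour described for clean_reports can be illustrated with a minimal, library-free sketch. It uses a list of dicts in place of a DataFrame, and the names `ERROR_MARKER`, `clean_rows` and `fake_llm` are hypothetical stand-ins, not part of the Cleaner API; the actual error marker used by the library may differ.

```python
import copy

# Hypothetical marker; the value the library actually checks for may differ.
ERROR_MARKER = "N/A: LLM error"


def clean_rows(rows, columns, generate):
    """Return cleaned copies of `rows`; `generate` stands in for the LLM call."""
    cleaned = copy.deepcopy(rows)  # work on a copy so the input is untouched
    for column in columns:
        for row in cleaned:
            result = generate(column, row[column])
            if result != ERROR_MARKER:  # keep the original text on error
                row[column] = result
    return cleaned


def fake_llm(column, text):
    # Stand-in for a model: fails on blank input, otherwise tidies whitespace.
    return ERROR_MARKER if not text.strip() else " ".join(text.split())


rows = [{"coroner": "  Jane   Doe ", "area": " "}]
cleaned = clean_rows(rows, ["coroner", "area"], fake_llm)
```

Here the coroner value is replaced by the tidied text, the blank area value is left as it was because the stand-in model returned the error marker, and `rows` itself is unchanged.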
generate_prompt_template
Return the prompt templates used for each field.
The returned dictionary maps DataFrame column names to the full prompt text, with a [TEXT] placeholder appended to illustrate how the prompt will look during clean_reports.
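As a rough illustration of the shape of that dictionary, the sketch below appends a [TEXT] placeholder to a set of base prompts. The prompt wording, the `with_placeholder` helper and the blank-line separator are all assumptions made for the example; the real templates and their exact formatting come from the library.

```python
# Hypothetical base prompts keyed by column name; the real defaults are not shown here.
base_prompts = {
    "coroner": "Return only the coroner's name.",
    "area": "Return only the coroner area.",
}


def with_placeholder(prompts, placeholder="[TEXT]"):
    """Append the placeholder to each prompt (the separator is an assumption)."""
    return {
        column: f"{prompt}\n\n{placeholder}"
        for column, prompt in prompts.items()
    }


templates = with_placeholder(base_prompts)
print(templates["coroner"])
```

Each value is one complete prompt string, so inspecting the dictionary shows what would be sent for a given column once the placeholder is swapped for the field text.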