Additional options¶
Annotation vs. filtering¶
If filter_df is True (the default), Screener returns a trimmed DataFrame that contains only the reports the LLM marked as relevant to your query.
Setting it to False activates annotate mode: every report/row from your original DataFrame is kept, and a boolean column is added denoting whether each report met your query. You can also rename this column with result_col_name.
Annotate mode is useful when you want a column flagging which reports matched your query without losing the non-matching reports from your dataset.
screener = Screener(
    llm=llm_client,
    reports=reports,
)
annotated = screener.screen_reports(
    user_query=user_query,
    filter_df=False,  # <--- create annotation column instead of filtering
    result_col_name='custody_match'  # <--- name of annotation column
)
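Because annotate mode keeps every row, you can summarise or filter the result afterwards with ordinary pandas operations. A minimal sketch, assuming the returned object is a pandas DataFrame and reusing the custody_match column from above:

# Count how many reports matched the query
print(annotated['custody_match'].value_counts())

# Filter down to the matching reports later, without re-running the screen
matching_only = annotated[annotated['custody_match']]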
Choosing which columns the LLM 'sees'¶
By default the LLM reads the narrative heavyweight sections of each report: investigation, circumstances and concerns. You can expose or hide any field with the include_* flags.
For example, if you are screening for a specific cause of death, consider setting include_concerns to False, as including this section won't benefit your search.
By contrast, if you are searching for a specific concern, setting include_investigation and include_circumstances to False may improve accuracy, speed up your code, and lead to cheaper LLM calls.
user_query = "Death from insulin overdose due to misprogrammed insulin pumps."
screener = Screener(
    llm=llm_client,
    reports=reports,
    include_concerns=False  # <--- Our query doesn't need this section
)
result = screener.screen_reports(user_query=user_query)
In another example, let's say we are only interested in reports sent to a Member of Parliament. We'll want to turn off all default sections and only read from the receiver column.
user_query = "Whether the report was sent to a Member of Parliament (MP)"
screener = Screener(
    llm=llm_client,
    reports=reports,
    # Turn off the defaults...
    include_investigation=False,
    include_circumstances=False,
    include_concerns=False,
    include_receiver=True  # <--- Read from receiver section
)
result = screener.screen_reports(user_query=user_query)
All options and defaults¶
Flag | Report section | What it's useful for | Default |
---|---|---|---|
include_coroner | Coroner’s name | Simply the name of the coroner. Rarely needed for screening. | False |
include_area | Coroner’s area | Useful for geographic questions, e.g. deaths in South-East England. | False |
include_receiver | Receiver(s) of the report | Great for accountability queries, e.g. reports sent to NHS Wales. | False |
include_investigation | “Investigation & Inquest” section | Contains procedural detail about the inquest. | True |
include_circumstances | “Circumstances of Death” section | Describes what actually happened; holds key facts about the death. | True |
include_concerns | “Coroner’s Concerns” section | Lists the issues the coroner wants addressed — ideal for risk screening. | True |
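The non-default flags can be switched on to match the shape of your query. Below is a hypothetical geographic screen that exposes the coroner's area on top of the default sections; the query text is illustrative only:

user_query = "Deaths that occurred in South-East England"
screener = Screener(
    llm=llm_client,
    reports=reports,
    include_area=True  # <--- expose the coroner's area for geographic questions
)
result = screener.screen_reports(user_query=user_query)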
Returning text spans¶
Set produce_spans=True when calling screen_reports() to capture the exact snippets from the report text that justified whether a report was judged relevant. A new column called spans_matches_topic will be created containing these verbatim snippets.
screener = Screener(llm=llm_client, reports=reports)
filtered_reports = screener.screen_reports(
    user_query="Where the cause of death was determined to be suicide",
    produce_spans=True,
    drop_spans=False
)
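The spans column sits alongside the other report fields in the returned DataFrame, so you can review the supporting evidence directly. A minimal sketch, assuming a pandas DataFrame:

# Inspect the snippets the LLM extracted for the first few matching reports
print(filtered_reports[['spans_matches_topic']].head())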
If you only want to use the spans internally, pass drop_spans=True to remove the column from the returned dataset after screening.
Note
Producing and then dropping spans might seem pointless, but it is likely a great way of improving screening performance: the LLM generates these spans before deciding whether a report matches the query, which lets it judge whether the extracted evidence truly captures the search criteria.
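In practice, that pattern is just a matter of setting both flags in the same call. A sketch reusing the query from above:

filtered_reports = screener.screen_reports(
    user_query="Where the cause of death was determined to be suicide",
    produce_spans=True,  # <--- generate supporting snippets to ground the decision
    drop_spans=True      # <--- then remove the spans column from the output
)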