
Produce summaries of report text

Short, consistent summaries make it possible to skim hundreds or even thousands of reports. You might use these summaries to quickly identify notable case studies, or to get an 'at a glance' understanding of the breadth of content within PFD reports.

Producing summaries also paves the way for automated theme discovery (see Discover recurring themes).

Use summarise() to condense each report into a short text snippet. The trim_intensity option controls how terse the summary should be. Calling summarise adds a summary column to your stored reports and keeps a copy on the instance under extractor.summarised_reports for later reuse.


Getting started

You'll likely wish to screen/filter reports with Screener before generating summaries. For example:

from pfd_toolkit import load_reports, LLM, Screener

# Load reports
reports = load_reports()

# Set up your LLM client
llm_client = LLM(api_key="YOUR-API-KEY")  # Replace with your own API key

# Screen reports by user query
user_query = "Deaths in police custody **only**."

screener = Screener(
    llm=llm_client,
    reports=reports
)

police_df = screener.screen_reports(
    user_query=user_query)

Note

For more information on screening reports, see here.

Following this, we can generate summaries of our screened/filtered reports:

from pfd_toolkit import Extractor

# Set up Extractor
extractor = Extractor(
    llm=llm_client,
    reports=police_df  # Use the screened reports from above
)

summary_df = extractor.summarise(trim_intensity="medium")

The resulting DataFrame contains a new column (default name summary).
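
As a quick check, you can inspect the new column directly; a copy of the summarised DataFrame is also kept on the instance under extractor.summarised_reports (as noted above) for later reuse:

# Peek at the generated summaries
print(summary_df["summary"].head())

# The summarised reports are also cached on the extractor instance
cached_summaries = extractor.summarised_reports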

You can specify a different column name via result_col_name, and you can adjust trim_intensity (options range from low to very high).
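
For example, a brief sketch using a custom column name and a terser setting (the column name short_summary here is purely illustrative):

terse_df = extractor.summarise(
    result_col_name="short_summary",  # illustrative custom column name
    trim_intensity="high"             # terser than the earlier example
)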

Specify which sections to summarise

By default, the summarise() method will trim the Investigation, Circumstances of Death and Coroner's Concerns sections. You can override this by setting the include_* flags. For example:

# Set up Extractor
extractor = Extractor(
    llm=llm_client,
    reports=police_df,

    # Decide which sections to include:
    include_investigation=False,
    include_circumstances=False,
    include_concerns=True
)

summary_df = extractor.summarise(trim_intensity="medium")

All options and defaults

| Flag | Report section | What it's useful for | Default |
|------|----------------|----------------------|---------|
| include_coroner | Coroner’s name | Simply the name of the coroner. | False |
| include_area | Coroner’s area | The local area the coroner operates within. | False |
| include_receiver | Receiver(s) of the report | The recipient(s) of the report. | False |
| include_investigation | “Investigation & Inquest” section | Contains procedural detail about the inquest. | True |
| include_circumstances | “Circumstances of Death” section | Describes what actually happened; holds key facts about the death. | True |
| include_concerns | “Coroner’s Concerns” section | Lists the issues the coroner believes should be addressed. | True |
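
If you also want the coroner's name, area, or recipient(s) reflected in the summaries, switch on the corresponding flags at set-up. A brief sketch using the flags from the table above:

extractor = Extractor(
    llm=llm_client,
    reports=police_df,

    # Add report metadata on top of the default sections:
    include_coroner=True,
    include_area=True,
    include_receiver=True
)

summary_df = extractor.summarise(trim_intensity="medium")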

Estimating token counts

Token usage is important when working with paid APIs. The estimate_tokens() helper provides a quick approximation of how many tokens a text column will consume.

total = extractor.estimate_tokens()
print(f"Total tokens in summaries: {total}")

estimate_tokens defaults to the summary column, but you can pass any text series via col_name. Set return_series=True to get a per-row estimate instead of the total.
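
For example, a per-row estimate for the default summary column (assuming the per-row result is returned as a pandas Series):

# Estimate tokens for each summary individually
per_row = extractor.estimate_tokens(
    col_name="summary",
    return_series=True
)
print(per_row.head())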