What does an excerpt look like?

Every pathology lab in the Netherlands sends excerpts to Palga’s database on a daily basis. An excerpt is a summary of the original pathology report, containing relevant data, the pathologist’s conclusion and the diagnosis coding lines. The pathologist adds one or more diagnosis coding lines to each report; each line combines diagnostic terms from the Palga Thesaurus (localisation, method, abnormality), for instance lung*biopsy*adenocarcinoma. These diagnostic terms are automatically linked to one or more classification codes from a hierarchical coding system based on SNOMED(-CT).
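To make the structure of a diagnosis coding line concrete, the sketch below splits a line such as lung*biopsy*adenocarcinoma into its terms. It assumes that the terms are separated by asterisks and appear in the order localisation, method, abnormality, with any further terms treated as additional abnormalities. The names DiagnosisLine and parse_diagnosis_line are hypothetical, and the link from terms to the SNOMED(-CT) based classification codes is not modelled.

```python
from typing import NamedTuple


class DiagnosisLine(NamedTuple):
    """One diagnosis coding line split into its diagnostic terms (assumed layout)."""
    localisation: str
    method: str
    abnormalities: tuple[str, ...]


def parse_diagnosis_line(line: str, sep: str = "*") -> DiagnosisLine:
    """Split e.g. 'lung*biopsy*adenocarcinoma' into its terms.

    Assumption: term 1 is the localisation, term 2 the method, and all
    remaining terms are abnormalities.
    """
    terms = [t.strip() for t in line.split(sep) if t.strip()]
    if len(terms) < 3:
        raise ValueError(f"expected at least 3 terms, got {len(terms)}: {line!r}")
    return DiagnosisLine(terms[0], terms[1], tuple(terms[2:]))


print(parse_diagnosis_line("lung*biopsy*adenocarcinoma"))
# → DiagnosisLine(localisation='lung', method='biopsy', abnormalities=('adenocarcinoma',))
```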

Personal data is pseudonymised by a Trusted Third Party (ZorgTTP), ensuring that it cannot be traced back to patients. For example, one pseudonymisation method uses the first four letters of the patient’s name, their date of birth and their sex. Another method uses the first eight letters of the patient’s name.

Palga’s database offers unprecedented potential for scientific research. When researchers submit a data request to Palga, they receive an Excel file containing the excerpts that are relevant to them. These excerpts usually contain a standard selection of variables. In consultation with the data request consultants, this selection can be expanded to include additional variables if the researcher can demonstrate that these are relevant to the study.

Variable	Description
Palga patient no	Patient numbers in ascending order, newly assigned for each request. Excerpts are attributed to the same ‘patient’ when their pseudonyms are identical.
Palga excerpt no	Chronological numbering of the excerpts belonging to the same patient.
Selection	Method used to select the excerpt (most common values: 1 = goal selection excerpt, 3 = history excerpt).
Chance_admin_multiple	Likelihood that excerpts with the same Palga patient number do not belong to the same patient, on a scale from 1 (no chance) to 3 (real chance). Among other things, it depends on the amount of information available per excerpt.
Date of receipt	Date on which the pathology lab received the tissue samples.
Type of test	Type of test (T = histology, C = cytology, B = population cytology, S = autopsy).
Sex	Patient’s sex.
Age	Patient’s age on the date the pathology lab received the tissue samples.
Year of test	Year in which the pathology lab received the tissue samples.
Conclusion	Conclusion from the pathology report.
Palga diagnoses	Diagnosis lines (combinations of Thesaurus terms) associated with the pathology report.
Palga codes	Diagnosis coding lines associated with the pathology report, in coded form.
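Once a delivery is in tabular form, it can be explored with a few lines of code. The sketch below assumes that the Excel sheet has been exported to CSV with column headers identical to the variable names listed above; actual headers in a delivery may differ, and pandas.read_excel could also read the .xlsx file directly. It groups excerpts by Palga patient no, orders each group by Palga excerpt no, decodes Type of test, and lists patients for whom Chance_admin_multiple equals 3. The function names and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Codes for 'Type of test' as given in the variable list.
TEST_TYPES = {"T": "histology", "C": "cytology", "B": "population cytology", "S": "autopsy"}


def load_excerpts(csv_text: str) -> dict[int, list[dict]]:
    """Group excerpt rows by Palga patient no, each group ordered by Palga excerpt no.

    Assumes the CSV headers match the variable names exactly (hypothetical export).
    """
    by_patient: dict[int, list[dict]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Replace the one-letter code by its meaning; unknown codes are kept as-is.
        row["Type of test"] = TEST_TYPES.get(row["Type of test"], row["Type of test"])
        by_patient[int(row["Palga patient no"])].append(row)
    for rows in by_patient.values():
        rows.sort(key=lambda r: int(r["Palga excerpt no"]))
    return dict(by_patient)


def uncertain_patients(by_patient: dict[int, list[dict]]) -> list[int]:
    """Patient numbers where any excerpt has Chance_admin_multiple == 3 ('real chance')."""
    return sorted(
        pid
        for pid, rows in by_patient.items()
        if any(int(r["Chance_admin_multiple"]) == 3 for r in rows)
    )


# Invented sample rows, for illustration only.
SAMPLE = """Palga patient no,Palga excerpt no,Selection,Chance_admin_multiple,Type of test
1,2,1,1,T
1,1,3,1,C
2,1,1,3,T
"""

patients = load_excerpts(SAMPLE)
print([r["Palga excerpt no"] for r in patients[1]])  # ['1', '2']
print(patients[1][0]["Type of test"])  # cytology
print(uncertain_patients(patients))  # [2]
```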

By mutual agreement, the dataset may be expanded to include the following:

Additional ‘long text fields’: clinical data, macroscopy, microscopy, epicrisis.
(Anonymous) lab numbers, provided the labs have given their consent and the numbers cannot be traced back.
Whether the lab is academic or non-academic.
Date of the most recent authorisation of the report.
Whether the test was done as part of the colon cancer screening programme.
KOPAC-B code for cervical cytology.

Because the Palga database primarily contains excerpts from narrative reports, researchers must extract the desired data (e.g. tumour diameter, degree of differentiation) from the conclusion, or possibly from another ‘long text field’.
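As an illustration of extracting a value from free text, the following sketch pulls a tumour diameter out of a conclusion with a regular expression. The pattern, the English example sentences and the function name are assumptions: real conclusions are likely to be written in Dutch and phrased in many different ways, so any such pattern should be checked against a manual review of a sample of reports.

```python
import re
from typing import Optional

# Hypothetical pattern: 'diameter', up to 20 non-digit characters, a number with an
# optional decimal point or comma, then a unit (mm or cm).
DIAMETER_RE = re.compile(r"diameter\D{0,20}?(\d+(?:[.,]\d+)?)\s*(mm|cm)\b", re.IGNORECASE)


def tumour_diameter_mm(conclusion: str) -> Optional[float]:
    """Return the first diameter found in the text, converted to mm, or None if absent."""
    match = DIAMETER_RE.search(conclusion)
    if match is None:
        return None
    value = float(match.group(1).replace(",", "."))
    return value * 10 if match.group(2).lower() == "cm" else value


print(tumour_diameter_mm("Adenocarcinoma, tumour diameter: 1,5 cm."))  # 15.0
print(tumour_diameter_mm("Greatest diameter 18 mm, margins free."))  # 18.0
print(tumour_diameter_mm("No residual tumour."))  # None
```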

Protocol variables

Pathology labs are increasingly using standardised synoptic reporting based on national guidelines. Unlike narrative reports, where the pathologist dictates their findings, synoptic reporting involves filling in a template (protocol) to create a structured report. The Protocols page provides up-to-date information on the available protocols. Protocols are available for almost all tumours for which national guidelines have been established, as well as for a number of non-malignant conditions.

Labs and pathologists are not obliged to use standardised reporting, so not all reports will be protocolised.

Data from standardised reports is stored separately in the Palga database, making it highly suitable for scientific research. If a data delivery contains excerpts from standardised pathology reports, the researcher may also receive protocol variables. In that case, the researcher and the consultant can jointly assess which variables are relevant to the study.
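Because protocol variables are structured, they can be linked to excerpts without any text mining. The sketch below assumes, hypothetically, that protocol variables arrive as separate rows keyed by Palga patient no and Palga excerpt no, and that the variables agreed with the consultant are passed in as a list. The function name and the variable names tumour_size_mm and grade are invented for the example. Excerpts without a protocolised report get None for those variables, reflecting that not all reports are protocolised.

```python
from typing import Iterable

Key = tuple[int, int]  # (Palga patient no, Palga excerpt no): assumed join key


def attach_protocol_variables(
    excerpts: Iterable[dict],
    protocol_rows: Iterable[dict],
    wanted: Iterable[str],
) -> list[dict]:
    """Copy the selected protocol variables onto the matching excerpts.

    Excerpts without a matching protocol row get None for every wanted variable.
    """
    names = list(wanted)
    index: dict[Key, dict] = {
        (int(r["Palga patient no"]), int(r["Palga excerpt no"])): r for r in protocol_rows
    }
    merged = []
    for ex in excerpts:
        key = (int(ex["Palga patient no"]), int(ex["Palga excerpt no"]))
        proto = index.get(key, {})
        # Keep all excerpt fields and add only the agreed protocol variables.
        merged.append({**ex, **{name: proto.get(name) for name in names}})
    return merged


excerpts = [
    {"Palga patient no": 1, "Palga excerpt no": 1, "Conclusion": "..."},
    {"Palga patient no": 1, "Palga excerpt no": 2, "Conclusion": "..."},
]
protocol = [
    {"Palga patient no": 1, "Palga excerpt no": 2, "tumour_size_mm": 23, "grade": "2", "other": "x"},
]
rows = attach_protocol_variables(excerpts, protocol, ["tumour_size_mm", "grade"])
print([(r["Palga excerpt no"], r["tumour_size_mm"]) for r in rows])  # [(1, None), (2, 23)]
```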