NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents

NCT ID: NCT02795806

Last Updated: 2025-12-19

Study Results

Results pending

The study team has not published outcome measurements, participant flow, or safety data for this trial yet.

Basic Information

Recruitment Status

ENROLLING_BY_INVITATION

Total Enrollment

50000 participants

Study Classification

OBSERVATIONAL

Study Start Date

2016-05-25

Study Completion Date

2027-01-31

Brief Summary

Background: Electronic health records contain a vast amount of data about diseases and treatments. Researchers could use these data to test their ideas, but they would need records from more than just their own group of patients. However, access to those records is restricted to ensure patient privacy.

The U.S. National Library of Medicine (NLM) has created a computer tool called NLM Scrubber. This program recognizes and deletes personal information from health records. The researchers who developed this program now need access to the original records. This will allow them to see how well the program removes personal information from patient records and how they can make it more accurate.

Objectives:

To find ways to improve clinical text de-identification.

Eligibility:

No new participants. Researchers will review data that have already been collected.

Design:

Researchers will collect a random sample of reports. These will be from different doctors in different fields.

Researchers will manually remove personal information from the records.

Researchers will also use NLM Scrubber to automatically remove personal information from the original records.

Researchers will compare the results of the computer program versus the manual changes. They will note when the program has not been removing personal information correctly. They will also note when the program has been deleting nonpersonal health information incorrectly.

Researchers will use the results to revise the program. They will keep testing it until the de-identification process is complete.
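The comparison described in the design can be sketched in code. The following is a minimal illustration, not NLM Scrubber's actual evaluation procedure: it assumes each report is split into tokens and that both the manual review and the program yield the set of token positions they redacted. The function name compare_redactions, the Comparison record, and the sample token positions are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    missed: frozenset         # personal tokens the program left in place
    over_redacted: frozenset  # non-personal tokens the program deleted
    recall: float             # share of personal tokens the program removed
    precision: float          # share of the program's deletions that were personal


def compare_redactions(manual: set[int], automatic: set[int]) -> Comparison:
    """Compare token positions redacted manually with those redacted by the program."""
    correct = manual & automatic
    missed = manual - automatic
    over_redacted = automatic - manual
    # Treat an empty denominator as a perfect score: nothing to find, nothing wrongly removed.
    recall = len(correct) / len(manual) if manual else 1.0
    precision = len(correct) / len(automatic) if automatic else 1.0
    return Comparison(frozenset(missed), frozenset(over_redacted), recall, precision)


# Hypothetical report, tokenized as:
# 0:Mr. 1:Smith 2:seen 3:on 4:03/14/2015 5:for 6:stage 7:II 8:melanoma
manual = {1, 4}      # reviewer redacted the name and the date
automatic = {1, 7}   # program redacted the name and, wrongly, "II"

result = compare_redactions(manual, automatic)
print(sorted(result.missed))         # positions of personal tokens that were missed
print(sorted(result.over_redacted))  # positions of health information wrongly deleted
print(result.recall, result.precision)
```

In this sketch, the missed set corresponds to personal information the program failed to remove, and the over-redacted set corresponds to nonpersonal health information it deleted incorrectly. Counting tokens rather than whole identifiers is one possible design choice; an evaluation could equally count identifiers or reports.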

Detailed Description

This study is about the quality assessment, improvement, and monitoring of an automatic clinical text de-identification software application called NLM Scrubber, which has been developed at the National Library of Medicine (NLM). The application has been developed so that clinical reports can be used in secondary scientific studies (i.e., for secondary use) without breaching patient privacy. Research on methods for protecting patient privacy and on the development of NLM Scrubber has been conducted in compliance with, and following the guidelines of, HIPAA and the Privacy Act.

In order to further develop and improve NLM Scrubber and assess its de-identification performance effectively, the investigators require the original, unredacted samples from all potential clinical report types and sources. To this end, NLM investigators have been collaborating with entities within NIH, namely the NIH Clinical Center, BTRIS, and NCI, as well as outside entities: the Kentucky State Registry, administered by the University of Kentucky, and researchers from the University of Pittsburgh, who stated their interest in integrating NLM Scrubber into their application called Text Information Extraction System. These entities collect samples of various types of clinical reports for assessing and improving NLM Scrubber performance. However, we also need access to the original data in order to assess potential problems and improve the accuracy of NLM Scrubber.

Conditions

Personally Identifiable Information

Keywords

Address

Natural History

Study Design

Observational Model Type

OTHER

Study Time Perspective

RETROSPECTIVE

Study Groups

1

Everybody for whom a clinical narrative report is created.

No interventions assigned to this group

Eligibility Criteria

Inclusion Criteria

* No new participant enrollment. Researchers will review data that have already been collected.

Minimum Eligible Age

1 Day

Eligible Sex

ALL

Accepts Healthy Volunteers

No

Sponsors

National Cancer Institute (NCI)

NIH

Sponsor Role collaborator

National Institutes of Health Clinical Center (CC)

NIH

Sponsor Role collaborator

National Library of Medicine (NLM)

NIH

Sponsor Role lead

Responsible Party

Responsibility Role SPONSOR

Principal Investigators

Mehmet M Kayaalp, Ph.D.

Role: PRINCIPAL_INVESTIGATOR

National Library of Medicine (NLM)

Locations

National Library of Medicine

Bethesda, Maryland, United States

Countries

United States

References

Kayaalp M. Patient Privacy in the Era of Big Data. Balkan Med J. 2018 Jan 20;35(1):8-17. doi: 10.4274/balkanmedj.2017.0966. Epub 2017 Sep 13.

Reference Type BACKGROUND
PMID: 28903886

Kayaalp M, Browne AC, Dodd ZA, Sagan P, McDonald CJ. De-identification of Address, Date, and Alphanumeric Identifiers in Narrative Clinical Reports. AMIA Annu Symp Proc. 2014 Nov 14;2014:767-76. eCollection 2014.

Reference Type BACKGROUND
PMID: 25954383

Kayaalp M, Browne AC, Callaghan FM, Dodd ZA, Divita G, Ozturk S, McDonald CJ. The pattern of name tokens in narrative clinical text and a comparison of five systems for redacting them. J Am Med Inform Assoc. 2014 May-Jun;21(3):423-31. doi: 10.1136/amiajnl-2013-001689. Epub 2013 Sep 11.

Reference Type BACKGROUND
PMID: 24026308

Other Identifiers

16-LM-N122

Identifier Type: -

Identifier Source: secondary_id

999916122

Identifier Type: -

Identifier Source: org_study_id