Lead Institution: University of Illinois at Urbana-Champaign

Project Leaders: Carl Gunter and Bernard A’cs

Research Progress

  • Abstract
    The aim of the Pan-SHARP CCD De-identification Project is to take Continuity of Care Documents (CCDs) generated by an EMR system and de-identify them using the HIPAA Safe Harbor guidelines. A CCD is an XML-based document that summarizes a patient’s health information. The CCD schema is derived from the HL7 Clinical Document Architecture (CDA), which specifies the structure and encoding of health data. Such documents are generated routinely in primary care. A system that de-identifies them could make them available for sharing to support research.

  • Focus of the research/Market need for this project
    At a high level, HIPAA Safe Harbor de-identification of a document involves removing 18 categories of data elements that are considered potentially identifying. This can be challenging for two reasons: identifiers can appear in free-text fields, and the same element type is reused in different contexts (for example, a CDA name element may describe the patient, a provider, or an organization). The focus of this project was to show that the CCD standards can be used to address these problems and to construct a general-purpose tool that performs HIPAA Safe Harbor de-identification. Such a tool could be valuable for research that draws on systems already set up to generate CCDs, where the documents must be de-identified before they can be shared.
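    To make the two difficulties concrete, the following is a minimal Python sketch (standard library only) of rule-based Safe Harbor processing on a CCD. It is an illustration under stated assumptions, not the project's engine: it handles only a few of the 18 identifier categories (patient record numbers, address, telecom, patient name, and full dates, which are reduced to the year), it assumes the patient identifiers sit under `recordTarget/patientRole` in the `urn:hl7-org:v3` namespace, and it scrubs free text only for name strings it has already seen in structured fields. The function names, the `DATE_TAGS` list, the `[REDACTED]` token, and the sample record are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

NS = "urn:hl7-org:v3"            # HL7 v3 namespace used by CDA/CCD documents
NSMAP = {"hl7": NS}
ET.register_namespace("", NS)    # serialize CDA elements without a prefix

# Hypothetical, non-exhaustive set of elements whose @value holds an HL7 timestamp.
DATE_TAGS = {f"{{{NS}}}{t}" for t in ("birthTime", "effectiveTime", "low", "high", "center", "time")}


def mask(el):
    """Replace an element's content with the HL7 'masked' null flavor."""
    el.clear()                   # removes children, attributes, text and tail
    el.set("nullFlavor", "MSK")


def deidentify(xml_text):
    """Apply a few illustrative Safe Harbor rules to a CCD and return the new XML."""
    root = ET.fromstring(xml_text)
    identifiers = set()          # patient name parts to scrub from free text

    for role in list(root.iterfind(".//hl7:patientRole", NSMAP)):
        # Remember the patient's name parts before masking the structured fields.
        for part in role.iterfind("hl7:patient/hl7:name/*", NSMAP):
            if part.text and part.text.strip():
                identifiers.add(part.text.strip())
        # Structured identifiers: record numbers, address, phone/e-mail, name.
        for path in ("hl7:id", "hl7:addr", "hl7:telecom", "hl7:patient/hl7:name"):
            for el in role.findall(path, NSMAP):
                mask(el)

    # Dates: keep only the year (the first four characters of the timestamp).
    for el in root.iter():
        value = el.get("value", "")
        if el.tag in DATE_TAGS and len(value) > 4:
            el.set("value", value[:4])

    # Free text: the same name can reappear in narrative <text> blocks.
    patterns = [re.compile(rf"\b{re.escape(i)}\b") for i in identifiers]
    for el in root.iter():
        for attr in ("text", "tail"):
            s = getattr(el, attr)
            if s:
                for p in patterns:
                    s = p.sub("[REDACTED]", s)
                setattr(el, attr, s)
    return ET.tostring(root, encoding="unicode")


# Fictitious sample record for illustration only.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <recordTarget><patientRole>
    <id root="2.16.840.1.113883.19.5" extension="996-756-495"/>
    <addr><streetAddressLine>17 Daws Rd.</streetAddressLine><city>Urbana</city></addr>
    <telecom value="tel:+1-217-555-1212"/>
    <patient>
      <name><given>Jane</given><family>Doe</family></name>
      <birthTime value="19600127"/>
    </patient>
  </patientRole></recordTarget>
  <component><structuredBody><component><section>
    <text>Jane Doe reports no known allergies.</text>
    <entry><observation><effectiveTime><low value="20090915"/></effectiveTime></observation></entry>
  </section></component></structuredBody></component>
</ClinicalDocument>"""

if __name__ == "__main__":
    print(deidentify(SAMPLE))
```

    Masking with `nullFlavor="MSK"` instead of deleting elements is one way to keep required CDA elements in place; schema validity of the output was not checked in this sketch. Names recorded elsewhere (a guardian under `patient`, relatives in family-history entries) and other categories such as ages over 89 are not handled, which is the context problem described above.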

  • Project Aims/Goals (SHARPS)
    The aim is to design, develop, and implement a web service that: 1) consumes a fully identified Continuity of Care Document (CCD) as defined by the Clinical Document Architecture (CDA) framework; 2) performs transformations on the underlying XML to de-identify the CCD (implementing the logic needed to satisfy the HIPAA Safe Harbor guidelines); and 3) returns the transformed document to the client invoking the operation. The resulting service can be used by holders of medical data, in particular those qualifying for Meaningful Use benefits, to convert their patient records into a form that can be used for medical research.
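    The three-step contract in this aim (consume a CCD, transform it, return it) can be sketched as a small HTTP endpoint. The following Python sketch uses only the standard library; the POST interface, the 400 response for malformed XML, and the names `make_handler` and `serve` are assumptions for illustration, not the project's actual service interface. The de-identification logic is passed in as a function so it can be developed and tested separately.

```python
import threading
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def make_handler(transform: Callable[[str], str]):
    """Build a request handler that applies `transform` to a POSTed CCD."""

    class DeidHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # 1) Consume: read the identified CCD from the request body.
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length).decode("utf-8")
            try:
                ET.fromstring(body)  # reject input that is not well-formed XML
                # 2) Transform: hand the document to the pluggable de-identifier.
                result = transform(body).encode("utf-8")
                status, ctype = 200, "application/xml"
            except ET.ParseError as exc:
                result = f"malformed XML: {exc}".encode("utf-8")
                status, ctype = 400, "text/plain"
            # 3) Return: send the transformed document back to the caller.
            self.send_response(status)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(result)))
            self.end_headers()
            self.wfile.write(result)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return DeidHandler


def serve(transform, host="127.0.0.1", port=0):
    """Start the service in a background thread; port=0 picks a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(transform))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

    For example, `srv = serve(my_transform)` starts the service on a free local port (`srv.server_address[1]`); a client POSTs a CCD to it and reads the transformed document from the response body. Here `my_transform` stands for whatever de-identification function is plugged in, and `srv.shutdown()` stops the service.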

  • Key Conclusions/Significant Findings/Milestones reached/Deliverables
    This project was still ongoing when SHARPS support for it ended, but it has continued under other support. The main outputs from SHARPS were software to perform the de-identification and a plan for testing it on data at Carle Hospital in Urbana, Illinois. The diagram below provides a high-level illustration of the expected overall activity model.

    UIUC researchers will view both the identified and the de-identified CCD artifacts to verify and validate that the de-identification processor has removed every instance of the 18 HIPAA identifiers from each section of the CCD in which they may occur. Once the UIUC researchers are satisfied that the software filters out all of the HIPAA identifiers, a smaller sample of records will be made available to Carle’s expert reviewers for confirmation.
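    Part of this verification could be automated by checking whether identifier values found in an identified CCD still appear anywhere in its de-identified counterpart. A minimal Python sketch follows. It assumes the identifiers are located under `recordTarget/patientRole` in the `urn:hl7-org:v3` namespace, covers only a few identifier categories, and uses plain substring matching, so it may flag false positives (a short name inside a longer word) and cannot find identifiers that never appear in structured fields. The function names are hypothetical, and such a check would supplement, not replace, the manual review described above.

```python
import xml.etree.ElementTree as ET

NSMAP = {"hl7": "urn:hl7-org:v3"}  # HL7 v3 namespace used by CDA/CCD


def identifier_values(identified_xml):
    """Collect some patient identifier strings from an identified CCD (illustrative, not exhaustive)."""
    root = ET.fromstring(identified_xml)
    values = set()
    for role in root.iterfind(".//hl7:patientRole", NSMAP):
        # Attribute-valued identifiers: record numbers, phone/e-mail URIs, birth date.
        for path, attr in (("hl7:id", "extension"),
                           ("hl7:telecom", "value"),
                           ("hl7:patient/hl7:birthTime", "value")):
            for el in role.iterfind(path, NSMAP):
                if el.get(attr):
                    values.add(el.get(attr))
        # Text-valued identifiers: address lines and name parts.
        for path in ("hl7:addr", "hl7:patient/hl7:name"):
            for el in role.iterfind(path, NSMAP):
                values.update(t.strip() for t in el.itertext() if t.strip())
    return values


def leaked_identifiers(identified_xml, deidentified_xml):
    """Identifier values from the identified CCD that still occur in the de-identified text."""
    return sorted(v for v in identifier_values(identified_xml) if v in deidentified_xml)
```

    `leaked_identifiers(original, output)` returns an empty list when none of the collected values survive; a non-empty result points the reviewer at specific strings to look for in the side-by-side comparison.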

    IRB approvals for human subjects research were obtained through both the Carle IRB and the UIUC IRB. The persons handling the data completed formal human subjects research training; they have met site access requirements and received HIPAA training through Carle’s program.

  • Materials Available for Other Investigators/interested parties
    Once validation has been completed, we expect to deliver the de-identification engine together with persistent document storage services and a browser interface that provides a side-by-side viewer for document versions and the ability to record annotations on specific XML nodes within the subject document. Software developed as part of the project will belong to UIUC and will be released for the benefit of the public and the research community as part of the SHARPS project. We anticipate a technical report and/or other publications summarizing the results, the effectiveness of the de-identification processor, and lessons learned.
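    One simple way to attach annotations to specific XML nodes, and to line up the same node in the identified and de-identified versions of a side-by-side view, is to address each element by a positional path. The Python sketch below assumes that approach; the path format, the `node_paths` function, and the in-memory `AnnotationStore` are hypothetical and stand in for whatever persistent storage the delivered tool uses.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


def local(tag):
    """Strip the '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def node_paths(xml_text):
    """Map positional paths such as /ClinicalDocument[1]/component[2] to elements."""
    root = ET.fromstring(xml_text)
    paths = {}

    def walk(el, path):
        paths[path] = el
        counts = {}  # 1-based position of each child name under this parent
        for child in el:
            name = local(child.tag)
            counts[name] = counts.get(name, 0) + 1
            walk(child, f"{path}/{name}[{counts[name]}]")

    walk(root, f"/{local(root.tag)}[1]")
    return paths


@dataclass
class AnnotationStore:
    """In-memory annotations keyed by (document version, node path)."""
    notes: dict = field(default_factory=dict)

    def annotate(self, version, path, text):
        self.notes.setdefault((version, path), []).append(text)

    def for_node(self, version, path):
        return self.notes.get((version, path), [])
```

    Positional paths only match across versions if de-identification changes element content without adding or removing elements; a transformation that deletes elements would need a different alignment strategy.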

    There is no plan or intention for anyone to transfer, retain, or otherwise access any identified or de-identified artifacts outside the physical facilities of Carle. All work on and analysis of CCD artifacts will be performed on-site at Carle in strict adherence to all operational policies defined by the facility.

  • Market entry strategies
    The intent is to continue the project work to complete this proof-of-concept service implementation together with expert validation of the resulting artifacts. This exemplar could be highly significant: it would enable generating and/or archiving de-identified versions of CCD artifacts to support a wide range of research topics across encounters, cohorts, institutions, and/or providers, with a collection of servers conceptually working together as a cooperative repository of de-identified CCD artifacts. Designing, developing, and deploying the customized components that perform the value-added de-identification, with the objective of establishing a replicable operational service model, demonstrates significant promise for a wide range of research communities.

    The opportunity to work interactively with the staff at Carle Hospital and their HIE vendor gives this project a unique strategy to build on, given the objective of establishing HIE peering between the health care provider and a de-identification engine incorporated into another HIE instance. The CCD artifacts that are generated, collected, and processed could conceptually be persistently stored, versioned, and/or queried by extending the HIE service with interactive interfaces to the artifact repository, supporting academic and clinical research access to the resource library.