Automated Policy: ILHIE PROTOTYPE

Lead Institution: University of Illinois at Urbana-Champaign

Project Leader: Carl Gunter

Research Progress

  • Abstract
    Some patients are reluctant to share data on an “all or nothing” basis, commonly viewing some data as more sensitive than others. Furthermore, some state and federal laws stipulate that certain classes of information, such as mental health, sexually-transmitted disease, and substance abuse, cannot be shared without patient consent.

    In collaboration with the Illinois Health Information Exchange (ILHIE), this project defined a technical architecture and developed an open source prototype to leverage OpenCDS, a popular Clinical Decision Support (CDS) framework, for the identification and sequestration of certain types of sensitive information from patient records flowing through an HIE. The project adopted the name “DS2” – Decision Support for Data Segmentation – because of its unique focus on the ability to detect clinical facts that may imply a sensitive condition, in addition to detecting clinical facts that are directly related to the condition.

    The architecture is based on three core functions: predicates, which determine whether a clinical document reveals a particular condition; reducers, which redact documents to remove evidence of a condition; and consistency checkers, which compare the original and redacted documents to help ensure that certain properties – such as medication safety – are not violated.
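
    As a rough illustration of how these three functions could fit together, the following Python sketch models a clinical document as a set of coded facts. It is not the DS2 or OpenCDS implementation: the `Document` type, the code strings, the interaction table, and all function names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-in for a clinical document:
# a patient identifier plus a set of coded clinical facts.
@dataclass(frozen=True)
class Document:
    patient_id: str
    facts: frozenset

# Illustrative code set and interaction table (assumptions, not real value sets).
SUBSTANCE_ABUSE_CODES = frozenset({"dx:opioid_dependence", "rx:methadone"})
INTERACTS_WITH = {"rx:methadone": {"rx:fluconazole"}}

def predicate(doc):
    """Return True if the document reveals the targeted condition."""
    return bool(doc.facts & SUBSTANCE_ABUSE_CODES)

def reducer(doc):
    """Return a redacted copy with the facts tied to the condition removed."""
    return Document(doc.patient_id, doc.facts - SUBSTANCE_ABUSE_CODES)

def consistency_check(original, redacted):
    """Warn when a redacted drug interacts with a drug that remains visible."""
    warnings = []
    removed = original.facts - redacted.facts
    for drug in sorted(removed):
        visible_partners = INTERACTS_WITH.get(drug, set()) & redacted.facts
        for partner in sorted(visible_partners):
            warnings.append(
                f"{drug} was redacted but interacts with visible {partner}"
            )
    return warnings

def segment(doc):
    """Apply predicate, then reducer, then consistency checker."""
    if not predicate(doc):
        return doc, []
    redacted = reducer(doc)
    return redacted, consistency_check(doc, redacted)

if __name__ == "__main__":
    doc = Document(
        "p1",
        frozenset({"dx:opioid_dependence", "rx:methadone",
                   "rx:fluconazole", "dx:asthma"}),
    )
    redacted, warnings = segment(doc)
    print(sorted(redacted.facts))
    print(warnings)
```

    In this sketch the consistency checker only reports a warning; how a production system should respond to such a warning is a policy decision that the example does not attempt to settle.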

    Key contributions include the technical architecture and open source prototype, along with a suite of related software tools for creating, manipulating, converting, and testing standards-based clinical documents; a methodology for developing and implementing deterministic and probabilistic predicates; test results on a variety of machine learning techniques; a web-based “inference analyzer” for visualizing the effectiveness and the impact of predicates and reducers; and improvements to web-based software used with OpenCDS.

    The project team included researchers from UIUC, NYU, and Stanford, and software developers from HLN Consulting, LLC.

  • Focus of the research/Market need for this project
    The team worked closely with ILHIE throughout the project, helping to shape its policy by serving on its Data Security and Privacy Committee and informing its members with demonstrations of the prototype software. The team also participated in the federal Standards and Interoperability Framework on an initiative known as Data Segmentation for Privacy (DS4P). In both activities, the team witnessed a need for automated technologies to segment certain sensitive conditions in clinical documents, but found this task surprisingly difficult to perform effectively. Using anonymized data from Northwestern Memorial Hospital, as well as publicly available hospital discharge data, the team demonstrated that redaction of a condition and its related clinical facts often leaves residual facts – such as co-morbidities and co-occurrences – that still reveal the condition.

    Figure 1: Inference Analyzer Web Application

    For that reason, after the architecture and basic prototype were created, the team focused its work on experimentation and development of deterministic and probabilistic predicates to detect such residuals, and of optimal reducer strategies to redact them. At the same time, the NYU researchers on the team integrated insights from the contextual integrity framework into the ILHIE Prototype, and the Stanford researchers published work suggesting that certain sensitive chronic disease conditions may co-occur with patterns of related clinical facts that can be more easily identified by those with specialized medical domain or statistical knowledge.
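
    One simple way to picture a residual fact is as a fact whose presence makes the redacted condition highly likely. The Python sketch below estimates P(condition | fact) from co-occurrence counts and flags facts above a threshold. It is an illustrative single-antecedent association-rule confidence calculation, not the team's published method; the function name, the `threshold` and `min_support` parameters, and the toy records are all assumptions.

```python
from collections import Counter

def leaking_facts(records, condition, threshold=0.8, min_support=2):
    """Return {fact: P(condition | fact)} for facts that strongly imply the condition.

    records     -- iterable of sets of coded facts, one set per patient
    condition   -- the code being redacted
    threshold   -- minimum estimated confidence for a fact to be flagged
    min_support -- minimum number of records containing the fact
    """
    fact_count = Counter()   # records containing each fact
    joint_count = Counter()  # records containing both the fact and the condition
    for facts in records:
        others = set(facts) - {condition}
        fact_count.update(others)
        if condition in facts:
            joint_count.update(others)
    return {
        fact: joint_count[fact] / n
        for fact, n in fact_count.items()
        if n >= min_support and joint_count[fact] / n >= threshold
    }

if __name__ == "__main__":
    # Toy data, fabricated purely to exercise the function.
    records = [
        {"dx:hiv", "rx:antiretroviral", "lab:cd4_count"},
        {"dx:hiv", "rx:antiretroviral", "dx:asthma"},
        {"dx:hiv", "lab:cd4_count", "rx:antiretroviral"},
        {"dx:asthma", "rx:albuterol"},
        {"dx:asthma", "lab:cd4_count"},
        {"rx:albuterol", "dx:asthma"},
    ]
    print(leaking_facts(records, "dx:hiv"))
```

    A reducer that removes only the condition code would leave any flagged facts in place, which is the kind of residual described above. Real data would also call for multi-fact patterns and a more careful statistical treatment than this sketch provides.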

  • Project Aims/Goals
    The goal of the project was to develop a technical architecture and open source prototype for automated data segmentation in HIE, with a special focus on utilizing CDS to predict clinical inferencing that might defeat ordinary segmentation. The prototype is intended to be used for demonstrations and policy discussions, to serve as a platform for testing deterministic and probabilistic segmentation strategies, and to be extended by contributors. Ultimately, the goal is for the technology to be applied in production for HIEs and potentially in other domains such as clinical research, public health, and personal health records.

    Because the project involved a general-purpose clinical decision support platform (OpenCDS), another goal of the work was to contribute to that platform and its ecosystem of related software tools.

  • Key Conclusions/Significant Findings/Milestones reached/Deliverables
    The project demonstrated that the redaction of a condition and its related clinical facts sometimes leaves residual facts that, through clinical inference, can still reveal the condition. The team created, open sourced, and demonstrated prototype software leveraging OpenCDS and probabilistic classifiers to redact targeted conditions along with certain co-occurrences and co-morbidities. In using the software, the team found that deterministic rules combined with a Naive Bayes predicate can sometimes outperform more sophisticated methods such as Support Vector Machines, Decision Trees, and AdaBoost, and may be effective enough for production use. The project also resulted in the creation of an Inference Analyzer web application (see Figure 1), which can be used to visualize and evaluate a predicate against a data set in order to better understand its predictive capabilities, and to explore a patient data set, visualize co-occurrences, and leverage the predicate to make inferences and find missing concepts.
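
    To make the combination of deterministic rules with a Naive Bayes predicate more concrete, the sketch below pairs a direct code lookup with a small Bernoulli Naive Bayes model written in plain Python. It is a minimal illustration under stated assumptions, not the project's code: the class and function names, the code strings, the 0.5 threshold, and the training records are all hypothetical.

```python
import math

class NaiveBayesPredicate:
    """Bernoulli Naive Bayes over fact presence, with Laplace smoothing.

    Assumes the training data contains at least one example of each class.
    """

    def fit(self, records, labels):
        self.vocab = sorted(set().union(*records))
        self.log_prior = {}
        self.p_fact = {}
        for cls in (True, False):
            rows = [r for r, y in zip(records, labels) if y == cls]
            self.log_prior[cls] = math.log(len(rows) / len(records))
            # Smoothed estimate of P(fact present | class).
            self.p_fact[cls] = {
                f: (sum(f in r for r in rows) + 1) / (len(rows) + 2)
                for f in self.vocab
            }
        return self

    def probability(self, facts):
        """Estimate P(condition | facts); facts outside the vocabulary are ignored."""
        score = {}
        for cls in (True, False):
            s = self.log_prior[cls]
            for f in self.vocab:
                p = self.p_fact[cls][f]
                s += math.log(p if f in facts else 1 - p)
            score[cls] = s
        top = max(score.values())
        weight = {c: math.exp(v - top) for c, v in score.items()}
        return weight[True] / (weight[True] + weight[False])

# Hypothetical deterministic rule: codes that directly state the condition.
SENSITIVE_CODES = {"dx:depression"}

def reveals_condition(facts, model, threshold=0.5):
    """Deterministic rule first, then fall back to the probabilistic predicate."""
    if facts & SENSITIVE_CODES:
        return True
    return model.probability(facts) >= threshold

if __name__ == "__main__":
    # Toy training set: residual facts only, labeled by whether the
    # patient has the condition. Fabricated for illustration.
    train = [
        ({"rx:ssri", "proc:psychotherapy"}, True),
        ({"rx:ssri", "dx:insomnia"}, True),
        ({"proc:psychotherapy", "dx:insomnia"}, True),
        ({"dx:asthma", "rx:albuterol"}, False),
        ({"dx:asthma", "dx:insomnia"}, False),
        ({"rx:albuterol"}, False),
    ]
    model = NaiveBayesPredicate().fit([r for r, _ in train], [y for _, y in train])
    print(reveals_condition({"rx:ssri", "proc:psychotherapy"}, model))
    print(reveals_condition({"dx:asthma", "rx:albuterol"}, model))
```

    Checking the deterministic rule first keeps the direct cases exact and auditable, leaving the probabilistic model to cover only the inferred cases; whether that ordering matches the prototype's design is an assumption of this sketch.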

    The team conducted two face-to-face workshops in Chicago with ILHIE in addition to dozens of web conferences and presentations to inform ILHIE and its stakeholders on issues related to data segmentation and access control policy and practice.

    The team developed numerous enhancements to the “CDS Rule Manager” and “CDS Test Manager”, web-based software tools used in conjunction with OpenCDS – including a template-based CDA editor inside the Test Manager. To integrate an HIE with its OpenCDS-based prototype, the team also created a CCD-to-vMR converter in collaboration with the OpenCDS team at the University of Utah.

    Figure 2: CDS Test Manager & CDA Editor Web Application

  • Materials Available for Other Investigators/interested parties
    Prototype software and documentation are freely available to the general public under an open source (BSD) license. White papers are also available.

  • Market entry strategies
    The software has been released under an open source (BSD) license so that vendors in the marketplace can extend it or incorporate it into their own products. Team members have met with and presented to a number of organizations, including Systems Made Simple (SMS), SAMHSA, and Regenstrief Institute. Because the technology can be applied in a number of different Health IT domains and use cases, it can be marketed for use in HIE privacy and consent, personal health records, or as a risk management tool for researchers and public health.

Efficiently Discovering Privacy-Leaking Association Rules in Large Medical Discharge Databases
Ellick M. Chan, Peifung E. Lam, and John C. Mitchell
Under Review, 2014

Transparent Patients: Addressing Risk in Health Information Exchanges
Martin French and Helen Nissenbaum
Thematic Group (04) on the Sociology of Risk and Uncertainty, XVIII ISA World Congress of Sociology, Yokohama, Japan, July 13-19, 2014

The Politics of Personal Health Information Flows: PETs Alone Will Not Save Us
Martin French and Helen Nissenbaum
Politics of Surveillance Workshop, Ottawa, Canada, May 8-10, 2014

Caring for your Data Double: New Risks and Responsibilities in an Era of Ubiquitous Health Information Flows
Martin French and Helen Nissenbaum
The 6th Biannual Surveillance and Society Conference, Barcelona, Spain, April 24-26, 2014

Understanding the Challenges with Medical Data Segmentation for Privacy
Ellick M. Chan, Peifung E. Lam, and John C. Mitchell
USENIX Workshop on Health Information Technologies, August 2013

SHARPS Project and ILHIE Prototype
Carl A. Gunter and Mike Berry
Presentation to Illinois Patient Consent Management Workshop, Chicago, IL, June 26, 2013

Decision Support for Data Segmentation (DS2): Contextual Integrity Considerations
Martin French, Helen Nissenbaum, Mike Berry, Noam Arzt, and Carl A. Gunter

Decision Support for Data Segmentation (DS2): Technical and Architectural Considerations
Mike Berry, Noam Arzt, Carl Gunter, and Daryl Chertcoff

Governing Health Information in the Surveillance Age: Operationalizing Health Information Privacy in the United States
Martin French and Helen Nissenbaum
Annual Conference of the British Sociological Association, Grand Connaught Rooms, London, England, April 3, 2013

Operationalizing Privacy as Contextual Integrity in Health Information Exchanges
Martin French and Helen Nissenbaum
International Conference on Law & Society, Boston, MA, May 30, 2013

Operationalizing Privacy in Health Information Exchanges: A Preliminary Analysis
Martin French and Helen Nissenbaum
Privacy Research Group, New York University, New York, NY, June 12, 2013

Operationalizing Privacy in Health Information Exchanges
Martin French and Helen Nissenbaum
SHARPS/OHIT Workshop on Decision Support for Data Segmentation (HIE), Northwestern University, Chicago, IL, July 23, 2013

Report of Preliminary Findings and Recommendations
State of Illinois Health Information Exchange Authority Data Security and Privacy Committee, September 2012

From Epidemiological Surveillance to ‘Infodemiology’: Information Privacy in a New Public Health Paradigm
Martin French
SHARPS/OHIT Workshop on Audit and Health Information Exchange (HIE), Northwestern University, Chicago, IL, August 15, 2012