Audit: SIMILAR

Lead Institution: University of Illinois at Urbana-Champaign

Project Leader: Carl Gunter

Research Progress

  • Abstract

    This project focused on technologies for role engineering and anomaly detection in support of Experience-Based Access Management (EBAM). The primary strategy was to analyze the access histories of chart users and develop ways to tell when two chart users are similar. Similarity supports two tasks: grouping chart users into roles with similar privileges, and determining when an access or access pattern is unusual because it deviates from the behavior of similar users.
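    One simple way to make "similar chart users" concrete is sketched below. It assumes each user's history is a bag of accessed items (for example, patient record identifiers) compared by cosine similarity. The report does not specify the similarity measure, so the function names `cosine` and `anomaly_score` and the nearest-peer scoring rule are hypothetical illustrations, not the project's method.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two access histories, each a Counter of accessed items."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def anomaly_score(user, histories, k=3):
    """Hypothetical score: 1 minus the mean similarity to the user's k most similar peers.

    histories maps user id -> Counter of accessed items. Higher means more unusual.
    """
    target = histories[user]
    sims = sorted(
        (cosine(target, h) for u, h in histories.items() if u != user),
        reverse=True,
    )
    top = sims[:k]
    return 1.0 - sum(top) / len(top) if top else 1.0
```

    A user whose accesses overlap with no peer scores 1.0, the maximum. The same pairwise similarities could instead feed a clustering step when the goal is grouping users into roles.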

  • Focus of the research/Market need for this project

    See EBAM.

  • Project Aims/Goals

    The project comprised four lines of effort: (1) using audit logs to predict and evolve roles; (2) using topic modeling to develop new concepts of anomalous access patterns; (3) systematic risk assessment techniques for determining whether an access decision should be made at the time of the request or later, during an audit; and (4) techniques for determining medical specialties from the access behavior of chart users and physician NPI specialty codes.

  • Key Conclusions/Significant Findings/Milestones Reached

    Role prediction is the problem of deciding, from a chart user's access pattern, whether that user holds a given role. We studied this problem by applying a naïve Bayes classifier to the Cerner roles in the four-month NMH data set. The main observation was that many roles are so similar that they cannot easily be told apart; it therefore makes sense to group such roles to improve prediction accuracy. We developed a technique called the role-up algorithm to do this and showed how it can be parameterized to trade off accuracy of prediction against precision of role. In a subsequent study we examined redefining the privileges associated with roles (where a privilege is the right to invoke a specific reason for accessing a record) and developed an algorithm that trades off the coherence of a role (the degree to which its members behave similarly) against the deviation of recommended new roles from previously established ones.
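    As an illustration of role prediction, here is a minimal multinomial naive Bayes classifier in plain Python. It assumes each chart user's access pattern is reduced to a list of categorical access tokens; the report does not say which features were used, so the token choice, the smoothing parameter `alpha`, and the function names are assumptions. It does not implement the role-up algorithm or the role evolution algorithm.

```python
import math
from collections import Counter, defaultdict


def train_role_nb(histories, roles, alpha=1.0):
    """Multinomial naive Bayes with Laplace smoothing.

    histories: one list of access tokens per chart user (the token type is an
    assumption, e.g. the chart area or access reason); roles: that user's role label.
    """
    role_counts = Counter(roles)
    token_counts = defaultdict(Counter)
    vocab = set()
    for hist, role in zip(histories, roles):
        token_counts[role].update(hist)
        vocab.update(hist)
    model = {}
    for role, count in role_counts.items():
        # Smoothed denominator: total tokens for this role plus alpha per vocabulary item.
        denom = sum(token_counts[role].values()) + alpha * len(vocab)
        log_prior = math.log(count / len(roles))
        log_lik = {t: math.log((token_counts[role][t] + alpha) / denom) for t in vocab}
        model[role] = (log_prior, log_lik)
    return model


def predict_role(model, history):
    """Return the role with the highest posterior; tokens unseen in training are ignored."""
    def log_posterior(role):
        log_prior, log_lik = model[role]
        return log_prior + sum(log_lik[t] for t in history if t in log_lik)

    return max(model, key=log_posterior)
```

    With toy data in which two users labeled `nurse` mostly produce `lab` tokens and two labeled `pt` mostly produce `gait` and `therapy` tokens, `predict_role(model, ["gait"])` returns `"pt"`. Roles whose token distributions are nearly identical receive nearly equal posteriors, which is the confusability that motivates grouping similar roles.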

    Most studies of audit log analytics in the literature use a technique we call the Random Object Access Model (ROAM). ROAM assumes that illegitimate accesses look random because they are not made for a medical reason. This assumption has the strong virtue of allowing audit log analytics to be validated without involving domain experts (such as doctors who can score how inappropriate an access appears to be): an analytic technique is considered good if it achieves a high area under the ROC curve (AUC) when used to find random accesses injected into a real hospital access log. ROAM has the drawback that there is no direct evidence that all (or even most) illegitimate accesses appear random. We therefore developed an alternative model, the Random Topic Access Model (RTAM), in which accesses are random at the level of topics rather than individual accesses. We used Latent Dirichlet Allocation (LDA) to learn hospital topics and then used them generatively to create random topics for groups of chart user accesses. We studied the effectiveness of standard outlier detection techniques under RTAM, using LDA models of a group of Cerner positions in the four-month NMH data set. We found that LDA does quite well at finding hospital topics, but when these topics are applied to Cerner positions, detection effectiveness depends somewhat on whether the positions correspond to specialties. For instance, a position such as Attending Physician CPOE spans many different specialties. We aimed to address this problem in our studies of specialties.
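    The difference between the two intruder models can be sketched as synthetic-user generators: under ROAM each access picks a patient uniformly at random, while under an RTAM-style model a random topic mixture is drawn first and patients are then drawn from the chosen topics. The sketch assumes the per-topic patient distributions come from an already fitted LDA model and that the topic mixture is drawn from a symmetric Dirichlet with an arbitrary `alpha`; these choices and all function names are hypothetical. `roc_auc` computes the evaluation criterion as the probability that an injected user receives a higher anomaly score than a real one.

```python
import random


def roam_user(patients, n_accesses, rng):
    """ROAM: every access targets a patient drawn uniformly at random."""
    return [rng.choice(patients) for _ in range(n_accesses)]


def sample_dirichlet(alpha, k, rng):
    """Symmetric Dirichlet sample built from normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def rtam_user(topic_patient, n_accesses, rng, alpha=0.5):
    """RTAM-style intruder: random at the level of topics, not single accesses.

    topic_patient holds one {patient: probability} dict per LDA topic
    (assumed to come from an already fitted topic model).
    """
    theta = sample_dirichlet(alpha, len(topic_patient), rng)  # random topic mixture
    accesses = []
    for _ in range(n_accesses):
        z = rng.choices(range(len(topic_patient)), weights=theta)[0]  # pick a topic
        dist = topic_patient[z]
        # Draw a patient from the chosen topic's patient distribution.
        accesses.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return accesses


def roc_auc(injected_scores, real_scores):
    """AUC as the chance an injected user outscores a real one (ties count half)."""
    wins = 0.0
    for p in injected_scores:
        for n in real_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(injected_scores) * len(real_scores))
```

    In an evaluation, synthetic users from either generator would be mixed into the real log, scored by an outlier detector, and compared with real users via `roc_auc`.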

    The insider access problem can be addressed through two general strategies: (i) prospective methods, such as access control, that make a decision at the time of a request, and (ii) retrospective methods, such as post hoc auditing, that make a decision in light of knowledge gathered afterwards. Although each strategy is recognized to have distinct benefits and drawbacks, there has been little investigation into how to give system administrators practical guidance on when to apply one or the other. We developed a framework to compare these strategies on a common quantitative scale. To do this, we translate both strategies into classification problems over a context-based feature space that assesses the likelihood that an access request is legitimate. We then use a new technique, called bispective analysis, to compare the performance of the classification models when the costs of false positives and false negatives are not equal. This is a significant extension of traditional cost analysis techniques, such as analysis of the receiver operating characteristic (ROC) curve. Using domain-specific cost estimates and several months of access logs from the NMH data set, we demonstrated how bispective analysis can support meaningful decisions about the relative merits of prospective and retrospective decision making for specific types of hospital personnel. In particular, we applied bispective analysis to three job titles (Patient Care Assistive Staff, Anesthesia CPOE, and Rehabilitation – Physical Therapist), estimating key risk parameters for each and then determining whether a prospective or a retrospective model should be applied to it. We show that, for some jobs, choosing a prospective model minimizes cost, a conclusion that disagrees with techniques that do not take cost into account.
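    Bispective analysis itself is not reproduced here. As a simpler stand-in, the sketch below performs a plain minimum-expected-cost comparison: for each strategy's classifier it searches all score thresholds for the lowest average cost given unequal false positive and false negative costs, then picks the cheaper strategy. Labels of 1 mark illegitimate accesses; the cost values, the threshold search, and the function names are assumptions for illustration.

```python
def min_expected_cost(scores, labels, c_fp, c_fn):
    """Lowest average cost over all thresholds.

    An access is flagged when its score >= threshold; label 1 = illegitimate.
    c_fp: cost of flagging a legitimate access; c_fn: cost of missing an illegitimate one.
    """
    n = len(labels)
    best = float("inf")
    # Candidate thresholds: every observed score, plus "flag nothing".
    for t in sorted(set(scores)) + [float("inf")]:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        best = min(best, (c_fp * fp + c_fn * fn) / n)
    return best


def choose_strategy(prospective, retrospective):
    """Each argument is (scores, labels, c_fp, c_fn) for that strategy's classifier.

    Returns the cheaper strategy's name and both minimum costs.
    """
    p = min_expected_cost(*prospective)
    r = min_expected_cost(*retrospective)
    return ("prospective" if p <= r else "retrospective"), p, r
```

    Because each strategy carries its own cost pair, the cheaper strategy need not be the one with the better ROC curve.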

    Medical specialties provide essential information about which providers have the skills needed to carry out key procedures or make critical judgments. They are useful for training and staffing, and they give patients confidence that their providers have the experience needed to address their problems. In one line of study on EBAM, we showed how machine learning classifiers can be trained on treatment histories to recognize medical specialties. Such classifiers can be used to evaluate staffing and workflows and have applications to safety and security. We focused on treatment histories consisting of patient diagnoses. We found that some specialties, such as urology, can be learned with good precision and recall, while others, such as anesthesiology, are less easily recognized. We call the former diagnosis specialties. For these, we explored four machine learning techniques and compared them to a naive baseline based on the diagnoses most commonly treated by specialists in a training set. These techniques improved substantially on the baseline, and the best one, which uses Latent Dirichlet Allocation (LDA), achieved precision and recall above 80% for many diagnosis specialties in a study with one year of chart accesses and discharge diagnoses from the NMH data set. In a further set of studies, we considered whether these techniques can be used to discover “new” medical specialties: groups of providers that resemble certified specialties but have not been officially certified as such. This provides a resource for evolving medical specialty classifications and aids EBAM access analysis, because it allows us to work with de facto specialties rather than only officially recognized ones. We studied additional machine learning techniques for this task and found at least two that achieve reasonable discovery. Both led us to propose a new breast cancer specialty, distinct from the specialties associated with oncology.
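    The naive baseline can be sketched as follows, assuming each provider is represented by the set of diagnoses they treated. A specialty's signature is its `k` most common diagnoses in the training set, and a provider is predicted to belong to the specialty if their diagnoses overlap the signature in at least `min_overlap` codes. The parameters, the overlap rule, the toy diagnosis labels, and the function names are hypothetical, since the report only says the baseline is based on the most commonly treated diagnoses; none of the four machine learning techniques (including the LDA-based one) is shown.

```python
from collections import Counter


def top_diagnoses(train, specialty, k):
    """Signature: the k diagnoses treated by the most providers of this specialty.

    train is a list of (specialty, diagnoses) pairs, one per provider.
    """
    counts = Counter()
    for spec, dx in train:
        if spec == specialty:
            counts.update(set(dx))  # count each provider at most once per diagnosis
    return {d for d, _ in counts.most_common(k)}


def baseline_predict(dx, signature, min_overlap):
    """Predict membership if the provider's diagnoses share enough codes with the signature."""
    return len(set(dx) & signature) >= min_overlap


def precision_recall(test, specialty, signature, min_overlap):
    """Precision and recall of the baseline for one specialty on held-out providers."""
    tp = fp = fn = 0
    for spec, dx in test:
        predicted = baseline_predict(dx, signature, min_overlap)
        actual = spec == specialty
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

    A learned model would replace `baseline_predict`, while `precision_recall` stays the same, so both approaches can be compared on identical held-out providers.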

  • Available Materials for Other Investigators/Interested parties

    Details of the work on role prediction and role evolution were published at AMIA ’11 and SACMAT ’13, respectively. We reported our work on RTAM at ISI ’13 and in Siddharth Gupta’s 2013 MS thesis. Our work on bispective analysis and specialty learning is under review; the specialty work is also described in Xun Lu’s 2014 MS thesis.

  • Market entry strategies

    See EBAM.

Bibliography
Learning a Medical Specialty from a Provider Treatment History
Xun Lu, Aston Zhang, Carl A. Gunter, Daniel Fabbri, David Liebovitz, and Bradley Malin
Under Review

Learning to Discover New Medical Specialties via Patient Treatment Histories
Xun Lu, Aston Zhang, Carl A. Gunter, Daniel Fabbri, David Liebovitz, and Bradley Malin
Under Review

Decide Now or Decide Later? Quantifying the Tradeoff between Prospective and Retrospective Access Decisions
Wen Zhang, You Chen, Ted Cybulski, Carl A. Gunter, Daniel Fabbri, Patrick Lawlor, David Liebovitz, and Bradley Malin
Under Review

Diagnosis Based Specialist Identification in the Hospital
Xun Lu
Master of Science Thesis, University of Illinois at Urbana-Champaign, May 2014

Modeling and Detecting Anomalous Topic Access in EMR Audit Logs
Siddharth Gupta
Master of Science Thesis, University of Illinois at Urbana-Champaign, May 2013

Modeling and Detecting Anomalous Topic Access
Siddharth Gupta, Casey Hanson, Carl A. Gunter, Mario Frank, David Liebovitz, and Bradley Malin
IEEE Intelligence and Security Informatics (ISI ’13), June 2013

Evolving Role Definitions through Permission Invocation Patterns
Wen Zhang, You Chen, Carl A. Gunter, David Liebovitz, and Bradley Malin
ACM Symposium on Access Control Models and Technologies (SACMAT ’13), June 2013

Role Prediction using Electronic Medical Record System Audits
Wen Zhang, Carl A. Gunter, David Liebovitz, Jian Tian, and Bradley Malin
AMIA 2011 Annual Symposium, Washington, DC, October 2011

Experience-Based Access Management: A Life-Cycle Framework for Identity and Access Management Systems
Carl A. Gunter, David M. Liebovitz, and Bradley Malin
IEEE Security & Privacy, September/October 2011