
[Image: Descriptive Zoopraxography – Athlete, Running Long Jump (animated)]

Healthcare Data Issues – Part 3 (Data Model vs Dataset)

A series of articles penned by Dr. Navin Ramachandran, Dr. Wai Keong Wong and Dr. Ian McNicoll. We examine the current approaches to data modelling which have led to the issues we face with interoperability.

The other parts of the article can be found here: Part 1, Part 2.

We have mentioned the terms “data model” and “dataset” many times. It is extremely important to understand the difference between the two terms as they have often been conflated, resulting in a lack of understanding of the importance of the data model. We believe that the absence of a proper data model in most medical systems has led to the current problems with interoperability.

The concept of the data model (as opposed to a dataset) is a difficult one to grasp at first, so let’s look at the simpler concept of the dataset first. In a particular dataset, an organisation may stipulate the collection of blood pressure readings – a systolic and a diastolic value – in the pre-surgery clinic. But these two values are just single readings, and only some of the variables that can be defined for blood pressure. The blood pressure data model should account for all the potential variables related to blood pressure and their variance with time. For example, the blood pressure archetype defined in openEHR (a mini data model – link: http://openehr.org/ckm/#showArchetype_1013.1.130_MINDMAP ) looks like this on a mind map:

[Figure: openEHR blood pressure archetype mind map]

The basic blood pressure dataset maps easily to the systolic and diastolic components of the data model (top right of the mind map). Not all of the components of the model have to be used, but they are in place, ready for future expansion of the system.

If this data model is used across institutions, different systems may use different components of the model, but the relationships and meanings of each component would be defined from the outset, making transfer of information between systems easier.

“The data model should account for all the potential variables and their variance with time.”

To look at it in reverse, the dataset is actually an expression of the underlying data model – a two-dimensional representation of a three- or four-dimensional entity. A good analogy is that the dataset is like a photograph, representing only a snapshot of a whole being at one moment in time. The data model describes the whole being; the dataset is just one view of it.
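To make the distinction concrete, here is a minimal sketch in Python. It is not the actual openEHR archetype – the field list is illustrative only, loosely inspired by the mind map above. The class plays the role of the data model; the small dictionary returned at the end is the dataset, a single snapshot of it.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional


@dataclass
class BloodPressureModel:
    """The 'whole being': every variable the model can describe, over time."""
    time: datetime
    systolic_mmhg: Optional[float] = None
    diastolic_mmhg: Optional[float] = None
    mean_arterial_pressure_mmhg: Optional[float] = None
    position: Optional[str] = None               # eg "sitting", "standing", "lying"
    cuff_size: Optional[str] = None              # eg "adult", "paediatric"
    measurement_location: Optional[str] = None   # eg "right arm"
    confounding_factors: Optional[str] = None


def pre_surgery_dataset(reading: BloodPressureModel) -> dict:
    """The dataset: a single 'photograph' of the model, keeping only the two
    values the pre-surgery clinic asked for."""
    return {
        "systolic": reading.systolic_mmhg,
        "diastolic": reading.diastolic_mmhg,
    }


reading = BloodPressureModel(
    time=datetime(2015, 6, 1, 9, 30),
    systolic_mmhg=120,
    diastolic_mmhg=80,
    position="sitting",
    cuff_size="adult",
)

print(pre_surgery_dataset(reading))   # {'systolic': 120, 'diastolic': 80}
print(asdict(reading))                # the full model: most components unused, but defined
```

The dataset can always be regenerated from the model, but the model can never be recovered from the dataset – which is exactly why defining only datasets leaves systems unable to share anything beyond them.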

 


 

[Image: Social network analysis visualization]

Healthcare Data Issues – Part 2 (Data Transfer)

A series of articles penned by Dr. Navin Ramachandran, Dr. Wai Keong Wong and Dr. Ian McNicoll. We examine the current approaches to data modelling which have led to the issues we face with interoperability. Main image from: Grandjean, Martin (2014). “La connaissance est un réseau”. Les Cahiers du Numérique 10 (3): 37–54. Retrieved 2014-10-15.

The other parts of the article can be found here: Part 1, Part 3.

The majority of current approaches to resolving data transfer issues revolve around either:

  • Replacing all the current systems in a region with a single vendor, using a single data model across the region. This circumvents the need for interoperability of systems.
  • Developing common datasets and messaging standards to maximise interoperability of systems.

Interoperability is a term that is often used in medical informatics, exalted as the answer to all our woes. However, interoperability exists on a spectrum from partial to complete. Unfortunately, we believe that even the most recent attempts at interoperability will achieve only partial success, and we explore the underlying reasons in this section.

 

Building a bigger silo

Also known as the monolithic model. In its most basic form, you would replace all the disparate systems in one centre with a single vendor solution:
[Figure: Monolith 1]
This has happened at many institutions, using solutions such as Cerner, Epic, Allscripts and Intersystems. However, this does not resolve the issue of data transfer between hospitals or with community systems, so in some regions all the institutions have moved to the same vendor using a similar or identical data model:
[Figure: Monolith 2]
However, this approach is ultimately problematic because:

  • It may be politically and technologically difficult to transition all the institutions to the same provider.
  • One system will not be able to satisfy the varied needs of all the different institutions. As we have identified previously, the documentation accuracy of smaller “best of breed” solutions is often better than the equivalent module of the monolithic solution, as they are more responsive to local needs. Therefore we often find multiple smaller systems being run concurrently with the monolithic solution (eg cancer management systems separate from the Cerner EHR at The Royal Free Hospital), which breaks the monolithic model.
  • What happens outside this region? eg how do San Francisco’s hospitals communicate with those in Los Angeles? The complexity of the healthcare environment means that there are always more boundaries, and the logical end-point of the single-system approach is a single system for an entire nation.
  • How will this system speak to social care, police and government systems? This and the previous issue are partially overcome by using common datasets and messaging standards, but that approach brings its own problems (see below).
  • The adverse effects of monopoly: the power of the apps revolution lies in innovation and the ability to “fail fast”, both of which a single-vendor monopoly stifles.

“The complexity of the healthcare environment means that there are always more boundaries, and the logical end-point of the single-system approach is a single system for an entire nation.”

 

Using common datasets, messaging standards and APIs (eg using FHIR)

This method works in the following manner. A central body defines the dataset for a particular medical condition or procedure and publishes it. Each individual institution then implements the dataset within its system, usually by mapping it to existing fields in existing tables, or adding new fields.

[Figure: Central dataset]

The central body then defines messaging datasets for the transmission of data between organisations – these may be the same as the initial dataset or a subset of it. The messaging dataset defines how and what data will be transmitted and received in different scenarios (eg a referral form, a discharge summary).

For example the Royal College of Surgeons has released a dataset for The National Prostate Cancer Audit: http://www.npca.org.uk/wp-content/uploads/2013/07/NPCA_MDS-specification_-V2-0_-17-12-14.pdf

A very important point to note is that this dataset covers only a small portion of all the data that will be collected during a patient’s care; the rest has never been properly defined and will naturally vary between hospitals. Therefore, if everything works well, the data defined in the central dataset will be available for analysis and transfer, but the peripheral data that has not been defined will never be meaningfully stored or transmitted.

This is the major problem with this model: only a very small dataset can be defined centrally – the lowest common denominator – which means that other potentially valuable data (which may provide great insight into the causes of, or treatments for, a disease) will be lost to analysis.
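To illustrate the lowest-common-denominator effect, here is a minimal sketch in Python. The field names and the central dataset definition are hypothetical, not taken from the NPCA specification; the point is simply that anything the central body did not define is silently dropped at submission time.

```python
# The centrally defined dataset: only these fields are specified for transfer.
CENTRAL_DATASET_FIELDS = {"nhs_number", "date_of_diagnosis",
                          "gleason_score", "psa_at_diagnosis"}

# A richer local record: everything the hospital actually collected.
local_record = {
    "nhs_number": "XXX XXX XXX",
    "date_of_diagnosis": "2015-03-02",
    "gleason_score": "3+4",
    "psa_at_diagnosis": 8.2,
    # Peripheral data, never defined centrally:
    "mri_pirads_score": 4,
    "family_history": "father, age 68",
    "performance_status": 1,
}

def build_submission(record: dict) -> dict:
    """Keep only what the central dataset defines; everything else is dropped."""
    return {k: v for k, v in record.items() if k in CENTRAL_DATASET_FIELDS}

submission = build_submission(local_record)
lost = set(local_record) - set(submission)

print(submission)  # the only data that will be analysable across hospitals
print(lost)        # {'mri_pirads_score', 'family_history', 'performance_status'}
```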

Furthermore, many of these central bodies are not versed in informatics and are still very much thinking in the paper paradigm. Their datasets are often no more sophisticated than a digital representation of a paper form, and are not derived from an overarching data model. Sadly this leads to a situation where the paper form is the dataset, and the dataset is the data model!

“Only a very small dataset can be defined centrally – the lowest common denominator – which means that other potentially valuable data (which may provide great insight into the causes of, or treatments for, a disease) will be lost to analysis.”

Other significant drawbacks to this model are:

  • Loss of meaning at each stage.
    • When the central dataset is deployed locally, it has to be translated into the local data model. When data is transmitted, it is translated into the messaging model and then, finally, into the model of the receiving institution’s system. At each translation there will likely be some loss of meaning.

[Figure: Transfer message]

  •  Updating pathways / datasets across hospitals, while maintaining functioning APIs / messaging standards.
    • An updated data model can take anywhere between 1 and 12 months to deploy in a hospital (this is the reality on the ground), because it has to be translated into the hospital systems’ data models, and any system interfaces and user interfaces updated, without breaking the underlying tables.
    • In the app world, there can be many app updates in a month. But even if we very conservatively say that we make two updates to an information model in a year, some hospitals may not have deployed the first change by the end of that year. Some hospitals would therefore still be on version 1, others on version 2, and the most efficient on version 3 – three different models in operation at the same time at different hospitals. Which common API / messaging standard would now be the correct one to use? (See the sketch below.)
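As a small illustration of this version skew, consider a receiving system that must cope with three versions of a blood pressure message at once. The message structures and version numbers below are invented for illustration; the point is that every receiver must keep every historical parser alive for as long as any hospital lags behind.

```python
def parse_v1(msg: dict) -> dict:
    # v1: blood pressure sent as a single string, eg "120/80"
    sbp, dbp = msg["bp"].split("/")
    return {"systolic": int(sbp), "diastolic": int(dbp)}

def parse_v2(msg: dict) -> dict:
    # v2: split into two numeric fields
    return {"systolic": msg["sbp"], "diastolic": msg["dbp"]}

def parse_v3(msg: dict) -> dict:
    # v3: nested structure with explicit units
    bp = msg["blood_pressure"]
    return {"systolic": bp["systolic"]["value"], "diastolic": bp["diastolic"]["value"]}

PARSERS = {1: parse_v1, 2: parse_v2, 3: parse_v3}

def receive(msg: dict) -> dict:
    """Dispatch on the model version carried by the message; fail loudly if a
    hospital sends a version we have never been programmed to understand."""
    version = msg.get("model_version")
    parser = PARSERS.get(version)
    if parser is None:
        raise ValueError(f"No parser for model version {version!r}")
    return parser(msg)

print(receive({"model_version": 1, "bp": "120/80"}))
print(receive({"model_version": 3,
               "blood_pressure": {"systolic": {"value": 120, "unit": "mmHg"},
                                  "diastolic": {"value": 80, "unit": "mmHg"}}}))
```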

 

The emergence of the granular API – HL7 FHIR

For the last decade, cross-system interoperability in the US has largely focused on efforts to use the HL7 v3 and CDA messaging standards. In spite of significant investment via the “Meaningful Use” program, useful interoperability has remained elusive, in part due to the complexity of the HL7 v3 standard. In response, the HL7 community developed FHIR (Fast Healthcare Interoperability Resources), adopting modern software approaches such as granular APIs and a RESTful interface, in line with influential industry opinion such as the JASON report.
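For a flavour of what this granular, RESTful approach looks like in practice, here is a minimal Python sketch of a FHIR read and search. The server base URL and patient id are hypothetical, the LOINC code shown (85354-9, blood pressure panel) is simply an illustrative search parameter, and a real client would also handle authentication, paging and errors.

```python
import requests

FHIR_BASE = "https://fhir.example-hospital.org/fhir"   # hypothetical endpoint

# Fetch one patient resource...
patient = requests.get(f"{FHIR_BASE}/Patient/123",
                       headers={"Accept": "application/fhir+json"}).json()

# ...and search for that patient's blood pressure observations.
bundle = requests.get(
    f"{FHIR_BASE}/Observation",
    params={"patient": "123", "code": "http://loinc.org|85354-9"},
    headers={"Accept": "application/fhir+json"},
).json()

# Each matching Observation carries its components (systolic, diastolic) as
# coded values with explicit units, rather than locally named table fields.
for entry in bundle.get("entry", []):
    obs = entry["resource"]
    for component in obs.get("component", []):
        print(component["code"]["coding"][0].get("display"),
              component["valueQuantity"]["value"],
              component["valueQuantity"]["unit"])
```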

FHIR has found favour amongst most of the major US system vendors, represented by the Argonaut Project, and is without doubt a major step forward in reducing the barriers to data exchange between systems.

http://www.hl7.org/documentcenter/public_temp_EF13C1F7-1C23-BA17-0CB343BB6F370C99/pressreleases/HL7_PRESS_20141204.pdf

We strongly support the industry-led adoption of FHIR; nevertheless, in our view, the scope of FHIR falls short of the changes required to support a vibrant eHealth industry.

  • FHIR is expressly engineered to support data exchange, not data persistence and querying.
  • Support for each new FHIR resource needs to be programmed by each vendor, unlike openEHR, where new archetype models are consumed automatically (ie in openEHR, data conforming to new models can be stored and queried immediately, without any re-programming).
  • FHIR is expressly designed to meet only key, common interoperability needs. It does have a capacity for extension, but this remains immature and beyond the core scope of the project.
  • FHIR resources remain technical artefacts, whilst archetypes are designed to be authored and maintained by clinicians.

This assessment may seem to put FHIR directly in opposition to openEHR, but our view is that each, playing to its strengths, makes for a compelling combination. openEHR can provide the methodology and toolset to build the clinical content definitions needed to underpin FHIR – this is the approach being used by NHS England.

“openEHR can provide the methodology and toolset to build the clinical content definitions needed to underpin FHIR.”

[Image: MediaWiki 1.24.1 database schema]

Healthcare Data Issues – Part 1 (Internal Data Modelling)

A series of articles penned by Dr. Navin Ramachandran, Dr. Wai Keong Wong and Dr. Ian McNicoll. We examine the current approaches to data modelling which have led to the issues we face with interoperability.

The other parts of the article can be found here: Part 2, Part 3.

The current healthcare data landscape is incredibly complex, with great variations between institutions, and even between systems within a single institution. This means that healthcare data can be incredibly difficult to collect and share. Here we explore the underlying problems.

 

Common Problematic Approaches To Medical Data Models And Storage

The major obstacles to efficient sharing of health information are that:

  • Data is stored in multiple different systems using many different technology stacks. While this fosters innovation and competition, it does of course make data sharing difficult.
  • Many companies benefit commercially from the lack of interoperability of their systems. Data is generally “locked-in” to a particular vendor solution.
  • Even where companies are minded to adopt an “open data” policy, the current technical and regulatory environment makes interoperability highly demanding.
  • Even if interconnectivity is resolved, the variable information models underlying the data storage severely limit the richness of transmitted data, since complex and clinically-risky transforms are required.

 

Multiplicity of information systems within a hospital

Within a single hospital, there are usually multiple systems which store data including:

  • PAS – patient administration system.
  • RIS – radiology information systems.
  • Pathology systems.
  • TomCat – Cardiology system.
  • Cancer reporting and management systems.

At some hospitals, this is extreme. For example, at University College London Hospitals NHS Trust, it is estimated that there are currently over 300 different such systems, some user-created (eg with MS Access). This is rightly seen by the IT department as an unsustainable long-term solution. But it reflects the inability of the main patient record system to capture the nuanced data that these different teams require.

Even in hospitals that have installed a large megasuite EHR such as Epic or Cerner, we often find multiple other systems being run concurrently (eg cancer management systems separate from the Cerner EHR at The Royal Free Hospital). This is because the documentation accuracy of smaller “best of breed” solutions is often better than the equivalent module of the monolithic solution:

http://www.informationweek.com/healthcare/electronic-health-records/physicians-prefer-best-of-breed-emergency-department-ehrs/d/d-id/1108610

The continued development of the smaller systems tends to be more agile and reactive to users’ needs, especially at a local level, than that of the monolithic system. The advantage of the monolithic system, however, is – at least theoretically – a (more) consistent data model across the whole institution, allowing easier sharing of data. The “EHR” in this context has come to mean an all-encompassing solution, providing the varied application needs of a wide variety of users, along with a coherent data platform, all provided by a single vendor.

“Separating the data layer from the applications layer would allow us to maintain a consistent data model, while permitting more reactive / agile development of applications locally.”

Separating the data layer from the applications layer would allow us to maintain a consistent data model, while permitting more reactive / agile development of applications locally. A significant proviso is that the information models within the data layer must also be amenable to agile development, to meet this faster demand-cycle. Indeed new data models should be developed, adapted and incorporated into the data platform without any need to re-program the underlying database. New or updated data models must be implemented as a configuration change, not as a piece of new software development.
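As a rough sketch of this “models as configuration” idea, consider a data layer that stores generic entries validated against model definitions loaded at runtime. The model format below is invented for illustration – in openEHR this role is played by archetypes and templates – but it shows how a new or updated model can be taken on board without any database re-programming.

```python
MODEL_REGISTRY = {}   # (model id, version) -> field definitions (plain data, not code)

def register_model(model_id: str, version: int, fields: dict) -> None:
    """Adding or updating a model is a configuration change, not a schema migration."""
    MODEL_REGISTRY[(model_id, version)] = fields

DATA_STORE = []       # generic store: every entry carries its model id and version

def save_entry(model_id: str, version: int, data: dict) -> None:
    """Validate incoming data against the registered model, then store it generically."""
    fields = MODEL_REGISTRY[(model_id, version)]
    unknown = set(data) - set(fields)
    if unknown:
        raise ValueError(f"Fields not in model {model_id} v{version}: {unknown}")
    DATA_STORE.append({"model": model_id, "version": version, "data": data})

# Deploying a brand new or updated model is just more configuration:
register_model("blood_pressure", 1, {"systolic": "integer", "diastolic": "integer"})
register_model("blood_pressure", 2, {"systolic": "integer", "diastolic": "integer",
                                     "position": "text"})

save_entry("blood_pressure", 2, {"systolic": 120, "diastolic": 80, "position": "sitting"})
print(DATA_STORE)
```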

Inconsistent data models within a single institution

There are usually multiple different systems running concurrently within a hospital. These systems each have their own data schema and information models. They may contain multiple tables which themselves may not be standardised in their nomenclature, even within a single system.

For example, here is an extremely abbreviated dataset for a few of the many different systems at one theoretical hospital. Vertical columns contain fields for data collected in a system, the “ID” (pink) being the primary key:

[Figure: Hospital 1 schema]

If we were looking to extract the blood pressure (yellow) and temperature (red) readings, we can immediately see the difficulty, as the different systems often have different standards for field nomenclature and formatting:

  • Many different names for fields containing the same information type. eg “Temp” vs “Temperature”
  • Different data units / formatting. eg “BP = 120/80” vs “SBP=120 and DBP=80”; “XXX XXX XXX” vs “XXX-XXX-XXX”.

The data is often stored in multiple tables, with minimal relations or meaning established (usually only the primary key establishes any relationship or meaning). There is therefore no way of addressing data in a field without knowing the exact path (table and field name) – extracting even a simple value means hand-writing a mapping for each system, as sketched below.
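Here is a minimal sketch of such mappings in Python. The table and field names are invented, but the shape of the problem is real: each system needs its own extractor before the values can be compared at all, because nothing apart from the primary key gives the data any shared meaning.

```python
def from_pas(row: dict) -> dict:
    # The PAS stores a combined string, eg "BP = 120/80", and "Temp" in Celsius
    sbp, dbp = row["BP"].replace("BP = ", "").split("/")
    return {"patient_id": row["ID"], "systolic": int(sbp),
            "diastolic": int(dbp), "temperature_c": row["Temp"]}

def from_theatre_system(row: dict) -> dict:
    # The theatre system stores separate numeric fields with different names
    return {"patient_id": row["ID"], "systolic": row["SBP"],
            "diastolic": row["DBP"], "temperature_c": row["Temperature"]}

# One hand-written extractor per system, and every new system adds another.
EXTRACTORS = {"pas": from_pas, "theatre": from_theatre_system}

rows = [
    ("pas", {"ID": "100234", "BP": "BP = 120/80", "Temp": 36.8}),
    ("theatre", {"ID": "100234", "SBP": 118, "DBP": 76, "Temperature": 37.1}),
]

for system, row in rows:
    print(EXTRACTORS[system](row))
```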

Inconsistent data models across institutions

When we are considering transfer of data between institutions, the situation is even more complex. For example, these are theoretical datasets for 2 different hospitals:
[Figures: Hospital 1 schema and Hospital 2 schema]

This illustrates the following issues that we may encounter:

  • Tables may:
    • Be present in some hospitals and absent in others.
    • Have different names in different institutions.
    • Contain different fields in different institutions.
    • Contain fields with different formatting (eg XXX XXX XXX ; XXX-XXX-XXX).
    • Have different content or terminology bindings (eg ICD-10 vs SNOMED-CT).
  • Some specialities may not be present at some institutions, and therefore not have any corresponding data.
  • Even the same vendor often has different information models / data schema at different institutions – core data such as demographics will likely be the same, but the data collected by each department is often configured during deployment and tailored to that department.

Therefore again there is no way of addressing data in a field without knowing the exact path (table and field name). Furthermore, when this information is extracted and transferred to the next institution, it becomes difficult to map it to the correct fields on the second system.

Structured vs unstructured data

We have seen above how difficult it is to extract structured meaningful data from one system and then map it correctly to another system, due to the poor information modelling that is pervasive throughout healthcare.

In practice this structured transfer of information is nigh on impossible, and therefore the lowest common denominator is often used – a wholesale data dump of unstructured information. This is usually in the form of a PDF file, or even copy-pasted information into a spreadsheet / word-processed document.

Whilst this unstructured method can convey all the information required, it has very significant problems:

  • The most important information is often hidden amongst numerous pages of irrelevant data. This invariably leads to the information being incompletely processed or even ignored. Imagine a doctor parsing a hundred-page PDF during a 15-minute consultation – this time would be better spent speaking to and examining the patient.
  • This data is very difficult to computationally analyse.
  • The copy-paste method is prone to significant error.

Note:
We must acknowledge that unstructured data itself can have its uses. Most recorded clinical notes are saved in an unstructured form, and this can help to convey other features of an encounter that are difficult to codify. For example, we may record the fact that the patient looks more dishevelled than usual at an appointment, which may be a marker for a significant change in mood in a patient with depression.

However, these notes are often filled in using unique shorthand, which differs at each site, is specific to subspecialties, and cannot easily be interpreted, even by sophisticated natural language processing algorithms. See the example below in yellow. This is typical of the recorded outcome from a cancer meeting regularly attended by one of the authors – it would make sense to the small group of people who attend the meeting, but to very few people outside it.

[Figure: Example of unstructured data]

How did we get to the current situation?

Traditionally, most doctors and medical staff have worked with a paper medical record. Indeed the majority of clinical records worldwide are still recorded on paper. And though most hospitals within developed countries have access to at least a basic electronic record, paper entry still plays a large role.

Our early attempts at producing electronic models have been basic at best. In most cases, there has been no business process modelling, and the data schema are literally electronic copies of paper forms that have been used clinically.

“If the core infrastructure is inadequate, there is only so much patching you can do before you reach the limits of the platform.”

Most importantly there are no true information models underlying the data schema, to give them intrinsic meaning. This is a critical aspect that has been ignored in most systems, and we will discuss this fully later.

These basic schema have usually been expressed using the relatively limited functionality of relational databases, where “the table is the data model”. When clinical pathways change over time, we are required to change the underlying data schema, but this can often break the relations in the tables, so changes can take a long time to implement (ie we have lost agility). Some leading systems use NoSQL solutions such as MUMPS (Epic, Intersystems, VA VISTA), which are more flexible, but the data models remain locked in to each system and the specific NoSQL database.

To cover the inadequacies / inefficiencies of these systems, many APIs and messaging standards have been developed, but we feel that they can only ever partly overcome the underlying problems. If the core infrastructure is inadequate, there is only so much patching you can do before you reach the limits of the platform.

To compound these problems, every institution / vendor has taken slightly different approaches to modelling their systems. The reasons for this are complex:

  • Some companies / institutions want to lock in their data to maintain competitive advantage. For example in the US, there is growing frustration that the big EHR vendors have built proprietary networks using public (HITECH) funding, and are charging exorbitant fees to build interfaces to access them. http://geekdoctor.blogspot.co.uk/2015/06/so-what-is-interoperability-anyway.html
  • The demands of local project timescales often make broad collaboration difficult.
  • Clinicians often exert considerable local power and insist on doing things “their way”. Clinical evidence for such variation is often lacking, and clinicians are often unaware of their own local variance.
  • Different branches of the profession may have quite different information granularity needs. For example, the “family history of breast cancer” will be recorded at different levels of depth in a patient-held record, a family doctor’s record and the records of a geneticist or research institution. For information to flow between the systems, the models supporting these needs have to be aligned.
  • Recent changes in practice and technology mean that large volumes of data (“-omics” data such as a highly complex genetic profile – genomics) may need to be consumed and displayed to the patient. Therefore centres with the ability to process “omics” data now need the further ability to support a “big data” platform.

We are still relatively early in the transition from paper to a fully electronic record, and, we believe, at a crucial point. The next steps will decide who controls the platforms and how easy it will be to innovate.

We also draw parallels to other systems that have already undergone this process. Within the medical field, PACS systems in medical imaging previously used proprietary models for their metadata. This metadata (including demographics and study details) is crucial to the study as it gives it meaning. It is only a few kilobytes in size per study and the total amount of metadata in a large PACS system will only be a few hundred gigabytes at the most.

As PACS systems have matured, we are now changing vendors and therefore migrating our data onto new systems. However due to the proprietary nature of the metadata, this has led to problematic corruption during transfer, which has resulted in significant clinical impact. Increasingly PACS platforms are required to be Vendor Neutral Archives (VNAs), based on standardised data and metadata formats to avoid vendor lock-in.

“We believe that current data storage should be within a vendor neutral architecture.”

When EHR platforms have matured to the same point, and people are considering moving to a second vendor, this migration will be a lot more problematic as we will be dealing with petabytes or more of meaningful data. Therefore we believe that current data storage should be within a vendor neutral architecture. This critical aspect is often underestimated, as the deleterious effects will only be evident in the medium term.

 

Over-reliance on terminologies alone

Terminologies such as LOINC, RxNorm, dm+d and SNOMED CT play a vital role in allowing shared meaning to be passed between systems, and help to power features such as clinical decision support.
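As a simple illustration of how terminology binding carries shared meaning, here is a minimal sketch of a coded data element. The LOINC codes used (8480-6 for systolic and 8462-4 for diastolic blood pressure) are real codes, but the surrounding structure is ours for illustration rather than any particular standard.

```python
from dataclasses import dataclass


@dataclass
class CodedValue:
    system: str     # which terminology the code comes from
    code: str
    display: str
    value: float
    unit: str


systolic = CodedValue(system="http://loinc.org", code="8480-6",
                      display="Systolic blood pressure", value=120, unit="mm[Hg]")
diastolic = CodedValue(system="http://loinc.org", code="8462-4",
                       display="Diastolic blood pressure", value=80, unit="mm[Hg]")


def is_systolic(element: CodedValue) -> bool:
    # A receiving system can match on (terminology, code) instead of guessing
    # from locally chosen field names such as "BP", "SBP" or "Sys_BP".
    return element.system == "http://loinc.org" and element.code == "8480-6"


print(is_systolic(systolic), is_systolic(diastolic))   # True False
```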

More recently, terminologies have started to incorporate some aspects of “ontology” as a means of embedding even richer meaning into terminology, allowing complex computational analysis. This approach works well where the ontology is describing entities in the biomedical domain e.g. diseases, organisms, bodily structures, medications. These are scientific concepts grounded in “truth”, at least as far as is understood by current science.

Unfortunately the enthusiasm of many ontology researchers for this powerful approach has led some to promote its use as a panacea for all clinical information modelling. But it is ill-suited to describing the working arrangements, culturally and nationally determined clinical processes, and sheer human variation inherent in clinical practice, which largely evade logical analysis. In this regard the conceptualisation of medical practice has more in common with political or religious classification: ever-changing, nuanced and contentious. There are significant parallels with Clay Shirky’s critical analysis of the Semantic Web: http://www.shirky.com/writings/herecomeseverybody/semantic_syllogism.html

In addition, the development of logical ontologies requires a deep understanding of description logics, and of tooling and trains of thought, that are well beyond the expertise of average clinicians, system developers, or even many experienced clinical informaticians. Even if logics provided the answer, we simply do not have enough clinical logicians to build and maintain the required ontologies.

Current clinical informatics thinking has retreated from attempts to apply ontology wholesale, preferring instead to find ways of making optimal use of terminologies and ontologies within the context of conventional information models. Over time, ontological methods will undoubtedly grow in significance and ease of use, but an over-concentration on this approach during the past decade has been to the detriment of achieving practical healthcare interoperability.