Healthcare Data Issues – Part 3 (Data Model vs Dataset)

A series of articles penned by Dr. Navin Ramachandran, Dr. Wai Keong Wong and Dr. Ian McNicoll. We examine the current approaches to data modelling which have led to the issues we face with interoperability.

The other parts of the article can be found here: Part 1, Part 2.

We have mentioned the terms “data model” and “dataset” many times. It is extremely important to understand the difference between the two terms as they have often been conflated, resulting in a lack of understanding of the importance of the data model. We believe that the absence of a proper data model in most medical systems has led to the current problems with interoperability.

The concept of the data model (as opposed to a dataset) is a difficult one to grasp at first, so let’s look at the simpler concept of the dataset first. In a particular dataset, an organisation may stipulate the collection of blood pressure readings in the pre-surgery clinic. But the two values recorded, systolic and diastolic pressure, are just single readings, and they are only some of the variables that can be defined for blood pressure. The blood pressure data model should account for all the potential variables related to blood pressure and their variance with time. For example, the blood pressure archetype that has been defined on openEHR (a mini data model) looks like this on a mind map:

[Figure: mind map of the openEHR blood pressure archetype]

The basic blood pressure dataset can easily be mapped to the systolic and diastolic components of the data model (top right of the mind map). Not all the components of the model have to be used, but they are in place, ready for future expansion of the system.
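As a rough illustration of this mapping, here is a minimal Python sketch. The class `BloodPressureReading`, its field names and the helper `from_basic_dataset` are all hypothetical and heavily simplified; they are not the actual definitions in the openEHR archetype. The point is only that the two-value dataset fills the systolic and diastolic slots, while the other components stay empty until a system needs them.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BloodPressureReading:
    """Illustrative data model; field names are assumptions, not the archetype's."""
    systolic_mmhg: Optional[float] = None
    diastolic_mmhg: Optional[float] = None
    # Components the basic dataset does not use yet, kept for future expansion.
    measured_at: Optional[datetime] = None
    patient_position: Optional[str] = None
    cuff_size: Optional[str] = None


def from_basic_dataset(row: dict) -> BloodPressureReading:
    """Map a two-value dataset row onto the systolic and diastolic components."""
    return BloodPressureReading(
        systolic_mmhg=float(row["systolic"]),
        diastolic_mmhg=float(row["diastolic"]),
    )


# A hypothetical pre-surgery clinic record holding only the two stipulated values.
reading = from_basic_dataset({"systolic": 120, "diastolic": 80})
print(reading)
```

A system that later starts recording, say, patient position would populate an existing field rather than invent a new structure, which is what keeps its data comparable with everyone else's.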

If this data model is used across institutions, different systems may use different components of the model, but the relationships and meanings of each component would be defined from the outset, making transfer of information between systems easier.

“The data model should account for all the potential variables and their variance with time.”

To look at it in reverse, the dataset is actually an expression of the underlying data model: a 2-dimensional representation of a 3- or 4-dimensional entity. A good analogy is that the dataset is like a photograph, representing only a snapshot of a whole being at one moment in time. The data model describes the whole being. The dataset is just one view of it.
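The photograph analogy can also be sketched in code. In the Python example below, the records, their field names and the `snapshot` helper are invented purely for illustration. The list stands in for data held against a fuller model over time, and the dataset is derived from it by choosing one moment and a subset of fields.

```python
from datetime import datetime

# Hypothetical readings held against a fuller model, recorded over time.
readings = [
    {"measured_at": datetime(2015, 1, 5, 9, 0), "systolic": 135,
     "diastolic": 85, "position": "sitting"},
    {"measured_at": datetime(2015, 3, 2, 9, 30), "systolic": 128,
     "diastolic": 82, "position": "lying"},
]


def snapshot(records, fields):
    """Return the chosen fields of the most recent record: one view of the model."""
    latest = max(records, key=lambda r: r["measured_at"])
    return {name: latest[name] for name in fields}


# The pre-surgery dataset keeps only two fields from a single moment in time.
pre_surgery_dataset = snapshot(readings, ["systolic", "diastolic"])
print(pre_surgery_dataset)
```

A different clinic could ask for a different set of fields, or a different moment, and obtain another dataset from the same underlying data.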



