A Semantic Model is a conceptual data model that describes the meaning of its instances. This is a hot topic in web standards. Being able to describe metadata in an electronic, reusable way opens up many opportunities for better quality and control.
Kerstin Forsberg is an expert in Semantic Models who currently works at AstraZeneca. She has 25+ years of experience in information and knowledge management strategies, standards and solutions across the pharmaceutical, news and automotive sectors.
I met Kerstin Forsberg at the 2012 CDISC EU Interchange, where she and Frederik Malfait, working for Roche, presented on how they used 'Semantic Models for CDISC Based Standards and Metadata Management.' Frederik described how Roche's Global Standards Office is well on its way to achieving its goal of creating a metadata repository using semantic web standards such as OWL/RDF (Web Ontology Language, using RDF, Resource Description Framework, defined by W3C's stack of semantic web standards). Kerstin described how OWL/RDF is used by Google, Bing and Yahoo to publish a joint vocabulary, schema.org. It is also used by NCI to publish their thesaurus, which is the source for CDISC's controlled terminology. They hope to use this technology to improve operational activities, such as electronically building CRFs, and to make strategic changes, such as maximizing cross-study analysis capabilities.
In January 2013, Kerstin announced that the CDISC2RDF team had released the CDISC SDTM Model/IG, and the SDTM, CDASH, ADaM and Terminology standards, as OWL/RDF files for demonstration and discussion with the clinical data standards community. In the past, these standards were available only as PDF documents, which left programmers to recreate the information on their own in Excel or database formats.
Semantic Web Standards and Clinical Research
Making the standards machine-processable allows them to be used directly in programs. "Programmatically" here means more than writing traditional programs: once clinical data standards and metadata are machine-processable, they can be:
- Directly queried using SPARQL (a W3C semantic web standard).
- Used to check consistency between different standards by expressing the rules in SPIN (SPARQL Inferencing Notation, a W3C Member Submission based on RDF for specifying rules and logical constraints).
- Enriched by inferring relationships, and extended using SPARQL CONSTRUCT.
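To make the idea of querying a standard concrete, here is a minimal sketch in plain Python of triple-pattern matching, the mechanism behind a SPARQL SELECT query. It is not SPARQL itself and it does not read the CDISC2RDF files: the triples and every `ex:` name are hypothetical, made up for illustration, and a real project would load the published OWL/RDF files into an RDF library and run actual SPARQL against them.

```python
# Hypothetical (subject, predicate, object) triples. The ex: names are
# illustrative only and are NOT taken from the published CDISC2RDF files.
TRIPLES = [
    ("ex:AE.AETERM", "rdf:type", "ex:Variable"),
    ("ex:AE.AETERM", "ex:inDomain", "ex:AE"),
    ("ex:AE.AESEV", "rdf:type", "ex:Variable"),
    ("ex:AE.AESEV", "ex:inDomain", "ex:AE"),
    ("ex:AE.AESEV", "ex:codelist", "ex:SeverityCodelist"),
    ("ex:DM.SEX", "rdf:type", "ex:Variable"),
    ("ex:DM.SEX", "ex:inDomain", "ex:DM"),
    ("ex:DM.SEX", "ex:codelist", "ex:SexCodelist"),
]


def match(pattern, triples):
    """Yield variable bindings for one triple pattern.

    Terms that start with '?' are variables; all other terms must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within one pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(patterns, triples):
    """Join several triple patterns, similar to a SPARQL basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute variables that earlier patterns already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(bound, triples):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


# Roughly equivalent to:
#   SELECT ?var ?cl WHERE { ?var ex:inDomain ex:AE . ?var ex:codelist ?cl }
results = query(
    [("?var", "ex:inDomain", "ex:AE"), ("?var", "ex:codelist", "?cl")],
    TRIPLES,
)
print(results)
# [{'?var': 'ex:AE.AESEV', '?cl': 'ex:SeverityCodelist'}]
```

The same pattern-matching idea underlies the consistency rules and CONSTRUCT-style enrichment mentioned above: a rule is a pattern that should (or should not) match, and an enrichment adds new triples for each match.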
Some of the benefits BioClinica sees in the OWL/RDF files are the ability to:
- Programmatically generate CDASH datasets with all required and optional fields.
- Check existing datasets against the SDTM standard to make sure all expected fields are present.
- Check all code lists against the terminology to make sure all values are correct.
- Automatically create SDTM output from CDASH output with the addition of code to create the fields that are different.
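The two checks in the list above (expected fields present, code list values valid) can be sketched as follows. This is an illustrative outline only: `EXPECTED_VARIABLES`, `CODELISTS` and `check_dataset` are hypothetical names, and the small hand-typed excerpts stand in for metadata that would in practice be read from the published standard and terminology files. The code list values shown are examples, not the official terminology.

```python
# Hypothetical, hand-typed excerpts. In practice these would be loaded from
# the published standard and terminology files, not typed in.
EXPECTED_VARIABLES = {"DM": ["STUDYID", "USUBJID", "SEX"]}
CODELISTS = {"SEX": {"M", "F", "U"}}  # illustrative values only


def check_dataset(domain, rows):
    """Return human-readable findings for one dataset (a list of row dicts)."""
    findings = []

    # Check 1: every expected variable appears somewhere in the dataset.
    present = set().union(*(row.keys() for row in rows))
    for name in EXPECTED_VARIABLES[domain]:
        if name not in present:
            findings.append(f"{domain}: missing expected variable {name}")

    # Check 2: every coded value belongs to its code list.
    for index, row in enumerate(rows, start=1):
        for name, allowed in CODELISTS.items():
            value = row.get(name)
            if value is not None and value not in allowed:
                findings.append(
                    f"{domain} row {index}: {name}={value!r} not in codelist"
                )
    return findings


dm_rows = [
    {"STUDYID": "S1", "USUBJID": "S1-001", "SEX": "M"},
    {"STUDYID": "S1", "USUBJID": "S1-002", "SEX": "X"},
]
print(check_dataset("DM", dm_rows))
# ["DM row 2: SEX='X' not in codelist"]
```

Keeping the expected variables and code lists as data, separate from the checking logic, is the point: when the standard is published in a machine-processable form, only the loading step changes and the checks stay the same.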
Semantic Model Q&A with Kerstin Forsberg
Q: What is your involvement with this project?
I'm the initiator, together with Frederik, and act as a kind of networking hub between the CDISC community, the W3C and academia. As I am also engaged in the IMI project EHR4CR, this is also part of my networking.
Q: What do you see as the future of this technology?
In the same way as the first two generations of the web drastically changed the way we share documents and interact, I foresee that this third generation of web standards and technologies will drastically change the way we use, and reuse, data.
BioClinica is interested in the future of this technology and would like to thank Kerstin for all of the volunteer work that has gone into this project.
Kerstin is based in Sweden. Feel free to contact Kerstin with any input and questions on Semantic Models and web standards as they relate to clinical research.