Int J Med Inform 2019 09 7;129:133-145. Epub 2019 Jun 7.
Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, South Korea.
Background: Standardized healthcare documents are widely adopted in today's hospitals. Processing them at scale strains the infrastructure, and their structural complexity compounds the problem, which is why big data techniques are needed. However, partitioning health documents for big data processing can cause accuracy and semantic loss. This semantic loss is critical for clinical use as well as for insurance and medical education.
Methods: In this paper we propose a novel technique to avoid the semantic loss that occurs during conventional partitioning of healthcare documents in big data processing. The technique uses a constraint model based on conformance to the clinical document standard and on user-based use cases. We used clinical document architecture (CDAR) datasets on the Hadoop Distributed File System (HDFS) in a uniquely configured setup. After partitioning, we identified the documents affected by semantic loss and separated the documents into two sets: conflict-free documents and conflicted documents. Conflicted documents were resolved using different resolution strategies mapped according to the CDAR specification. The first part of the technique focuses on identifying the type of conflict that arises in the blocks after partitioning. The second part focuses on mapping each conflict to a resolution based on the constraints applied, depending on the validation and user scenario.
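The two parts described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a "conflict" means a fixed-size block boundary falling inside an XML element that must stay whole, and it uses a single hypothetical resolution strategy (moving the boundary to the end of that element). The tag name `section`, the tiny block size, and all function names are illustrative assumptions, and a real setup would work on HDFS blocks rather than in-memory byte strings.

```python
import re

def block_boundaries(total_len, block_size):
    """Byte offsets at which a fixed-size partitioner would cut the data."""
    return list(range(block_size, total_len, block_size))

def element_spans(data, tag):
    """(start, end) byte offsets of each non-nested <tag>...</tag> element.

    Assumption: elements of this tag are not nested inside each other.
    """
    name = re.escape(tag.encode())
    pattern = re.compile(rb"<" + name + rb"\b.*?</" + name + rb">", re.DOTALL)
    return [m.span() for m in pattern.finditer(data)]

def find_conflicts(data, tag, block_size):
    """Part 1: list elements that a block boundary would split in two."""
    cuts = block_boundaries(len(data), block_size)
    conflicts = []
    for start, end in element_spans(data, tag):
        inside = [c for c in cuts if start < c < end]
        if inside:
            conflicts.append((start, end, inside))
    return conflicts

def classify(documents, tag, block_size):
    """Separate documents into conflict-free and conflicted sets."""
    conflict_free, conflicted = [], []
    for doc_name, data in documents.items():
        if find_conflicts(data, tag, block_size):
            conflicted.append(doc_name)
        else:
            conflict_free.append(doc_name)
    return conflict_free, conflicted

def resolve_cuts(data, tag, block_size):
    """Part 2 (one hypothetical strategy): shift each conflicting cut to the
    end of the element it falls in, so that no element is split."""
    spans = element_spans(data, tag)
    resolved = []
    for cut in block_boundaries(len(data), block_size):
        for start, end in spans:
            if start < cut < end:
                cut = end
                break
        if cut < len(data) and cut not in resolved:
            resolved.append(cut)
    return resolved
```

In this sketch the constraint is simply "a `section` element must not span two blocks"; the constraint model in the paper is driven by the document standard and user scenarios, which would replace that single rule with a mapping from conflict type to resolution strategy.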
Results: We used a publicly available dataset of CDAR documents, identified all conflicted documents, and resolved all of them successfully to avoid any semantic loss. In our experiment we tested up to 87,000 CDAR documents, successfully identifying the conflicts and resolving the semantic issues.
Conclusion: We have presented a novel study that focuses on the semantics of big data; the approach did not compromise performance and resolved the semantic issues that arose during the processing of clinical documents.