Concept acquisition and improved in-database similarity analysis for medical data

2018 | journal article. A publication with affiliation to the University of Göttingen.

Cite this publication

Concept acquisition and improved in-database similarity analysis for medical data
Wiese, I.; Sarna, N.; Wiese, L.; Tashkandi, A. & Sax, U. (2018)
Distributed and Parallel Databases 37(2), pp. 297-321. DOI: https://doi.org/10.1007/s10619-018-7249-x

Documents & Media

License

GRO License

Details

Authors
Wiese, Ingmar; Sarna, Nicole; Wiese, Lena; Tashkandi, Araek; Sax, Ulrich
Abstract
Efficient identification of cohorts of similar patients is a major precondition for personalized medicine. In order to train prediction models on a given medical data set, similarities have to be calculated for every pair of patients, which results in a roughly quadratic data blowup. In this paper we discuss the topic of in-database patient similarity analysis, ranging from data extraction to implementing and optimizing the similarity calculations in SQL. In particular, we introduce the notion of chunking that uniformly distributes the workload among the individual similarity calculations. Our benchmark comprises the application of one similarity measure (cosine similarity) and one distance metric (Euclidean distance) on two real-world data sets; it compares the performance of a column store (MonetDB) and a row store (PostgreSQL) with two external data mining tools (ELKI and Apache Mahout).
Issue Date
2018
Journal
Distributed and Parallel Databases 
ISSN
0926-8782
Language
English
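
Illustration

The two measures named in the abstract, and the quadratic growth in the number of patient pairs, can be illustrated with a minimal sketch. It is written in plain Python rather than the in-database SQL the paper discusses, and it does not reproduce the authors' implementation or their chunking scheme. The function names, the patient identifiers, and the feature vectors are hypothetical toy values chosen only for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise(patients, measure):
    """Apply `measure` to every unordered pair of patients.

    For n patients this yields n * (n - 1) / 2 results, which is the
    roughly quadratic blowup mentioned in the abstract.
    """
    return {
        (p, q): measure(patients[p], patients[q])
        for p, q in combinations(sorted(patients), 2)
    }


# Hypothetical toy data: patient id -> numeric feature vector.
patients = {
    "p1": [1.0, 0.0, 2.0],
    "p2": [2.0, 0.0, 4.0],
    "p3": [0.0, 3.0, 0.0],
    "p4": [1.0, 1.0, 1.0],
}

cos = pairwise(patients, cosine_similarity)
euc = pairwise(patients, euclidean_distance)
```

With four patients the sketch produces six pairs per measure; doubling the number of patients to eight would produce 28, which is why the pairwise step dominates the cost on larger data sets.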
