The development of clinical decision support and statistical predictive models has historically relied on manually selecting and tuning sets of predictive variables. This research axis aims to go beyond these historical approaches by learning from real-world data, such as electronic health records (EHRs) or cohort data. EHRs contain structured data, such as demographics, diagnoses, medication exposures, and omics data, as well as unstructured data, such as clinical notes or pathology reports. However, using EHR data for any precision medicine application first poses a significant information extraction challenge because of the data's heterogeneity, incompleteness, and dynamic nature.
The aim of this axis is to develop methods and tools for leveraging patient data in all its variety and complexity. This encompasses extracting and transforming raw data into high-quality engineered features and learned representations that will enable or facilitate the development of clinical decision support and knowledge discovery approaches, such as those presented in Axes 2 and 3.
Keywords: Information extraction, Natural language processing, Phenotyping, Community-based evaluation of phenotyping, Patient representation learning
Seminal references:
Jouffroy J, Feldman SF, Lerner I, Rance B, Burgun A, Neuraz A. (2021). MedExt: combining expert knowledge and deep learning for medication extraction from French clinical texts. JMIR Medical Informatics, 17934. DOI: 10.2196/17934.
Lerner I, Jouffroy J, Burgun A, Neuraz A. (2020). Learning the grammar of prescription: recurrent neural network grammars for medication information extraction in clinical texts. arXiv preprint arXiv:2004.11622.
Garcelon N, Neuraz A, Salomon R, Bahi-Buisson N, Amiel J, Picard C, Mahlaoui N, Benoit V, Burgun A, Rance B. (2018). Next generation phenotyping using narrative reports in a rare disease clinical data warehouse. Orphanet Journal of Rare Diseases, 13(1), 85. DOI: 10.1186/s13023-018-0830-6.
Digan W, Névéol A, Neuraz A, Wack M, Baudoin D, Burgun A, Rance B. (2020). Can reproducibility be improved in clinical natural language processing? A study of 7 clinical NLP suites. J Am Med Inform Assoc. Dec 15:ocaa261. DOI: 10.1093/jamia/ocaa261.