Workshop Schedule

Opening

Dr. Ali Hasnain

Keynote: Design patterns for neuro-symbolic medical decision support systems

Prof. Dr. Frank van Harmelen

Vrije Universiteit Amsterdam

Coffee Break

Hyperbolic Embedding of Subsumption Groups in Large-Scale Biomedical Ontologies with Containment Cone Constraints

Shervin Mehryar and Michel Dumontier

Embedding methods are a key enabler of recent advances in language models and retrieval-augmented systems. For biomedical ontologies in particular, embedding data in a geometric space with non-zero curvature can capture the structure of hierarchies more robustly. In this work, we propose a hyperbolic embedding method equipped with containment constraints for modeling subsumption groups. On synthetically generated directed graphs, we find that arbitrarily low distortions can be achieved. Through experiments on a real-world biomedical ontology, we show that our method satisfies a larger portion of containment properties than existing models.
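To illustrate why non-zero curvature suits hierarchies, the sketch below computes geodesic distance in the Poincaré ball, one common model of hyperbolic space (an assumption; the paper's specific model and its containment cone constraints are not reproduced here). Distances blow up near the boundary, which is what lets trees embed with low distortion: a root sits near the origin and leaves spread toward the boundary.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points in the Poincaré ball model
    of hyperbolic space. All points must have Euclidean norm < 1."""
    nu = sum(x * x for x in u)            # ||u||^2
    nv = sum(x * x for x in v)            # ||v||^2
    diff = sum((a - b) ** 2 for a, b in zip(u, v))  # ||u - v||^2
    return math.acosh(1 + 2 * diff / ((1 - nu) * (1 - nv)))
```

Moving a point from radius 0.5 to 0.9 more than doubles its distance from the origin, reflecting the exponential growth of hyperbolic volume.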

Synthetic Biomedical Data Generation Via a Beam Tree Strategy for Large Language Models

Juan Cano de Benito, Sven Hertling, Fabian Berns and Heiko Paulheim

In recent years, the generation of synthetic data in the biomedical domain has become increasingly important in addressing the limited availability of public datasets and the imbalance that exists in certain datasets. However, generative models often present problems, such as repetitive phrases and hallucinations, that limit the quality and reliability of the data produced. This paper proposes a method for creating synthetic biomedical sentences using a combination of beam tree search and knowledge graphs. The beam tree search explores multiple generation paths to maximise the diversity of the generated phrases, while the incorporation of knowledge graphs guides the identification of valid biomedical entities and balances their distribution. The method is evaluated using three benchmark datasets (BC5CDR, ChemProt, and MedMentions) to measure its effectiveness in generating synthetic data tailored to each dataset.
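For readers unfamiliar with the underlying search procedure, standard beam search keeps only the top-k partial sequences at each expansion step. The minimal sketch below shows that skeleton; the `expand` and `score` callables are placeholders, not the paper's KG-guided beam tree strategy.

```python
def beam_search(start, expand, score, beam_width=3, max_len=5):
    """Generic beam search: at every step, expand each kept sequence and
    retain only the top `beam_width` candidates by score."""
    beams = [start]
    for _ in range(max_len):
        candidates = []
        for seq in beams:
            nexts = expand(seq)
            if not nexts:          # dead end: carry the sequence forward
                candidates.append(seq)
            else:
                candidates.extend(seq + [tok] for tok in nexts)
        beams = sorted(candidates, key=score, reverse=True)[:beam_width]
    return beams
```

In a generation setting, `expand` would propose next tokens from a language model and `score` would combine fluency with diversity or knowledge-graph validity.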

Enabling Clinical Research with Semantic Knowledge Graphs: The Cancer Virtual Lab Platform

Antonella Carbonaro, Luca Giorgetti, Lorenzo Ridolfi, Roberto Pasolini, Andrea Pagliarani, Paolo De Angelis, Alice Andalò and Nicola Gentili

The secondary use of clinical data for research in life sciences is still hindered by data fragmentation, heterogeneity, and limited semantic interoperability across healthcare systems. Semantic technologies and knowledge graphs have emerged as promising enablers to overcome these challenges, yet their adoption in operational research platforms remains limited. In this paper, we present the Cancer Virtual Lab (CVL), a semantic platform designed to enable clinical research through the integration of standardized data representations, biomedical ontologies, and knowledge graph technologies. CVL leverages HL7 FHIR–based data models and RDF/OWL representations to transform heterogeneous real-world oncology data into interoperable, provenance-aware semantic knowledge graphs. The platform has been applied to a large-scale, real-world oncology dataset comprising 36,335 patient records, in which 1,093,705 hospital stay records were successfully converted into 1,151,559 distinct RDF-based FHIR resources. This semantic backbone supports advanced querying, ontology-driven reasoning, and explainable inference over clinical cohorts, enabling reproducible and transparent research workflows. Beyond data integration, CVL provides user-facing tools for researchers and clinicians, including semantic cohort identification, interactive knowledge graph exploration, and natural-language access to clinical data mediated by AI-based agents. Through architectural descriptions and illustrative screenshots, we demonstrate the feasibility and practical impact of semantic knowledge graphs as a foundation for advanced analytics, AI-driven decision support, and large-scale reuse of clinical data in life sciences research.
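As a toy illustration of the kind of FHIR-to-RDF lifting described, the sketch below flattens a FHIR-style resource dict into subject-predicate-object triples. The base URI, field names, and flat structure are hypothetical; real pipelines follow the FHIR RDF representation and use an RDF library to handle nested elements, datatypes, and provenance.

```python
def fhir_to_triples(resource, base="http://example.org/fhir/"):
    """Toy lifting of a flat FHIR-style dict into (subject, predicate,
    object) triples. Illustrative only; not the CVL implementation."""
    subj = f"{base}{resource['resourceType']}/{resource['id']}"
    triples = [(subj, "rdf:type", base + resource["resourceType"])]
    for key, value in resource.items():
        if key not in ("resourceType", "id"):
            triples.append((subj, base + key, str(value)))
    return triples
```

Once lifted into triples, resources from different hospital systems become queryable in one graph, which is what enables the cohort identification and reasoning the platform describes.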

A Two-Method Framework for Aligning Medical Terminologies

Stan Ostaszewski, Ömer Durukan Kılıç, Ensar Emir Erol, Michel Dumontier and Remzi Celebi

Healthcare data interoperability relies on aligning disparate terminologies to a unified ontology, such as SNOMED CT. In this work, we present a framework consisting of two methods that leverages expert-curated reference sets for direct 1-to-1 mappings, augmented by algorithmic strategies for incomplete or 1-to-M cases. First, for unmapped procedure (CPT) and medication (NDC) codes, we use medBERT text embeddings to power logistic regression-based imputation and non-contextual ranking. Second, for aligning ambiguous diagnosis (ICD-9) codes, a patient-context-aware ranking exploits SNOMED CT's hierarchical structure via three proximity metrics: textual cosine similarity, exact shortest-path distances, and Node2Vec embeddings. Text-based imputation yields high AUCs (0.95+ for NDC, 0.85 for CPT), and contextual ranking with Node2Vec for ICD-9's generic mappings achieves Hits@1 of 0.37-0.45 and Hits@5 of 0.82-0.88 on curated EHR-like tests, outperforming text-only methods while delivering a significant cost reduction over computing exact distances. Interpretable confidence scores (κᵢ) are calculated to reward context-specific outliers, enabling robust entity resolution in graph-triple pipelines. With this framework, we bridge gaps between medical terminologies, reducing mapping ambiguity and increasing interoperability. By supporting expert-curated reference sets with algorithmic and statistical methods, our framework advances scalable semantic integration within the healthcare domain.
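The context-aware ranking idea can be sketched as blending a text-similarity signal with graph proximity to a patient-context concept. The sketch below uses exact BFS shortest paths and a fixed blending weight `alpha`; both are simplifications (the paper also uses Node2Vec embeddings precisely to avoid computing exact distances at scale), and the toy graph and vectors are hypothetical.

```python
import math
from collections import deque

def shortest_path_len(graph, src, dst):
    """BFS hop count over an undirected adjacency dict."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in graph.get(node, ()):
            if nb == dst:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return math.inf

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def rank_candidates(query_vec, context_node, candidates, vecs, graph, alpha=0.5):
    """Rank candidate concepts by a blend of text similarity and
    hierarchy proximity to the patient-context node."""
    def score(c):
        sim = cosine(query_vec, vecs[c])
        prox = 1.0 / (1.0 + shortest_path_len(graph, context_node, c))
        return alpha * sim + (1 - alpha) * prox
    return sorted(candidates, key=score, reverse=True)
```

When two candidates are textually indistinguishable, the proximity term breaks the tie in favour of the concept closer to the patient's context in the hierarchy.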

The evolution of artificial intelligence in healthcare education: A temporal and methodological analysis

Jodie Finn, Srivarshini Sankar, Taha Farooqi, Mohamed Elhassadi and Ali Hasnain

Artificial intelligence (AI) has rapidly emerged as a transformative technology in higher education, with particularly significant implications for healthcare training. The recent introduction of large language models (LLMs), such as ChatGPT and GPT-4, has accelerated experimentation with AI-assisted learning tools, yet the structural evolution of research in this domain remains poorly understood. This study conducts an exploratory analysis of published research examining AI applications in healthcare education to identify temporal trends, shifts in model adoption, methodological patterns, and thematic emphases in reported outcomes. A curated dataset of studies published between 2015 and 2025 was analysed using descriptive statistics, cross-tabulation, and chi-square tests. Results demonstrate a pronounced increase in publication activity after 2022, coinciding with the public release of ChatGPT. GPT-based systems dominated the literature, accounting for the majority of studies, while traditional machine learning approaches declined proportionally. Despite this rapid expansion, methodological rigour has not increased significantly, with descriptive study designs remaining predominant and randomised controlled trials relatively rare. Thematic analysis further indicates a strong emphasis on performance-related outcomes, with comparatively limited attention to ethical considerations, academic integrity, and governance issues. These findings suggest that while AI-driven educational research is expanding rapidly, methodological maturation and critical evaluation frameworks have yet to develop at the same pace.
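For reference, the Pearson chi-square statistic used in such cross-tabulations can be computed directly from a contingency table; a minimal sketch (the study's actual tables are not reproduced, and real analyses would also report degrees of freedom and a p-value, e.g. via scipy):

```python
def chi_square(table):
    """Pearson chi-square statistic for a 2D contingency table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat
```

A large statistic relative to the degrees of freedom, here (rows-1)×(cols-1), indicates that the two categorical variables (e.g. publication period versus study design) are not independent.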

Closing

Dr. Ali Hasnain