Abstract

Multimodal electronic health record (EHR) data provide richer, complementary insights into patient health than single-modality data. However, effectively integrating diverse data modalities for clinical prediction modeling remains challenging due to the substantial data requirements. We introduce a novel architecture, Mixture-of-Multimodal-Agents (MoMA), designed to leverage multiple large language model (LLM) agents for clinical prediction tasks using multimodal EHR data. MoMA employs specialized LLM agents ("specialist agents") to convert non-textual modalities, such as medical images and laboratory results, into structured textual summaries. These summaries, together with clinical notes, are combined by another LLM ("aggregator agent") to generate a unified multimodal summary, which is then used by a third LLM ("predictor agent") to produce clinical predictions. Evaluated with different modality combinations and prediction settings, MoMA outperforms existing methods on three prediction tasks using private datasets, demonstrating enhanced accuracy and flexibility across tasks.
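
A minimal Python sketch of the three-stage pipeline the abstract describes (specialist agents, then an aggregator, then a predictor). The `call_llm` stub, prompt wording, task name, and toy inputs are illustrative assumptions for demonstration, not the authors' implementation; a real system would replace `call_llm` with an actual LLM API call.

```python
from typing import Dict

# Hypothetical stand-in for any chat-completion call; a real system would
# wrap a hosted or locally served LLM here.
def call_llm(prompt: str) -> str:
    return f"[LLM output for a prompt of {len(prompt)} characters]"

def specialist_agent(modality: str, raw_data: str) -> str:
    """Specialist agent: turn one non-textual modality into structured text."""
    prompt = (
        f"You are a {modality} specialist. Summarize the following "
        f"{modality} data as a structured textual report:\n{raw_data}"
    )
    return call_llm(prompt)

def aggregator_agent(clinical_notes: str, summaries: Dict[str, str]) -> str:
    """Aggregator agent: merge specialist summaries with the clinical notes."""
    parts = "\n".join(f"{m} summary:\n{s}" for m, s in summaries.items())
    prompt = (
        "Combine the clinical notes and modality summaries into one "
        f"unified multimodal summary.\nClinical notes:\n{clinical_notes}\n{parts}"
    )
    return call_llm(prompt)

def predictor_agent(unified_summary: str, task: str) -> str:
    """Predictor agent: produce a clinical prediction from the unified summary."""
    prompt = f"Task: {task}\nPatient summary:\n{unified_summary}\nPrediction:"
    return call_llm(prompt)

if __name__ == "__main__":
    # Toy inputs standing in for real EHR modalities.
    summaries = {
        "chest X-ray": specialist_agent("chest X-ray", "<imaging findings>"),
        "laboratory": specialist_agent("laboratory", "WBC 14.2, lactate 3.1"),
    }
    unified = aggregator_agent("72M admitted with dyspnea ...", summaries)
    print(predictor_agent(unified, "30-day readmission risk"))
```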

Publication Info

Year: 2025
Type: article
Citations: 0
Access: Closed

Citation Metrics

OpenAlex: 0

Cite This

Jifan Gao, Md Mahmudur Rahman, John Caskey et al. (2025). MoMA: a mixture-of-multimodal-agents architecture for enhancing clinical prediction modelling. npj Digital Medicine. https://doi.org/10.1038/s41746-025-02219-4

Identifiers

DOI: 10.1038/s41746-025-02219-4