Abstract

Eye tracking scanpaths encode the temporal sequence and spatial distribution of eye movements, offering insights into visual attention and aesthetic perception. However, analysing scanpaths still requires substantial manual effort and specialised expertise, which limits the scalability and constrains the objectivity of eye tracking methods. This paper examines whether and how multimodal large language models (MLLMs) can provide objective, expert-level scanpath interpretations. Using GPT-4o as a case study, we developed an eye tracking scanpath analysis (ETSA) approach that integrates (1) structural information extraction to parse scanpath events, (2) a knowledge base of visual-behaviour expertise, and (3) least-to-most and few-shot chain-of-thought prompt engineering to guide reasoning. We conducted two studies to evaluate the reliability and effectiveness of the approach, together with an ablation analysis to quantify the contribution of the knowledge base and a cross-model evaluation to assess generalisability across different MLLMs. The results of the repeated-measures experiment show a high semantic similarity of 0.884, moderate feature-level agreement with expert scanpath interpretations (F1 = 0.476), and no significant difference from expert annotations under the exact McNemar test (p = 0.545). Together with the ablation and cross-model findings, this study contributes a generalisable and reliable pipeline for MLLM-based scanpath interpretation, supporting efficient analysis of complex eye tracking data.
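The abstract reports feature-level agreement as an F1 score and compares model and expert annotations with an exact McNemar test. The abstract does not define the features or the pairing scheme, so the sketch below is only an illustration under stated assumptions: each interpretation is reduced to a set of discrete feature labels (a hypothetical representation), and the McNemar test receives the two discordant counts, `b` and `c`, from paired binary judgements. Semantic similarity is not covered here, because it would require an embedding model that the abstract does not name.

```python
from math import comb


def feature_f1(predicted: set[str], reference: set[str]) -> float:
    """Set-based F1 between model-extracted and expert-annotated features.

    Assumption: interpretations are reduced to sets of feature labels;
    the paper's actual feature scheme is not given in the abstract.
    """
    if not predicted and not reference:
        return 1.0  # both empty: treat as perfect agreement
    tp = len(predicted & reference)  # features named by both model and expert
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts.

    b: pairs where only the model marks the feature (hypothetical pairing)
    c: pairs where only the expert marks the feature
    Under H0 the discordant pairs follow Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Lower-tail binomial probability, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical feature labels, for illustration only.
model = {"left-to-right sweep", "central bias", "revisit"}
expert = {"left-to-right sweep", "central bias", "long dwell"}
print(round(feature_f1(model, expert), 3))
print(exact_mcnemar_p(3, 5))
```

With the labels above, two of the three features agree, so both precision and recall are 2/3 and F1 is about 0.667. The exact variant of McNemar's test fits small numbers of discordant pairs better than the chi-squared approximation. This is plausibly why the authors chose it, though the abstract does not say so.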

Publication Info

Year: 2025
Type: article
Volume: 6
Issue: 4
Pages: 164-164

Cite This

Xiangdong Li, Kailin Yin, Yuxin Gu (2025). Approach to Eye Tracking Scanpath Analysis with Multimodal Large Language Model. Modelling—International Open Access Journal of Modelling in Engineering Science, 6(4), 164-164. https://doi.org/10.3390/modelling6040164

Identifiers

DOI
10.3390/modelling6040164