Abstract

Large-scale invasions of alien species pose severe challenges to forest ecosystems and increasingly threaten tree health. Accurately detecting and counting trees affected by invasive species has therefore become a critical task in forest conservation and resource management. Traditional detection methods usually rely on a single image modality and lack linguistic or semantic guidance. They are also typically trained to model one specific type of diseased tree, so they struggle to distinguish among, and generalize to, multiple diseased tree types, which limits their practicality. To address these challenges, we propose an end-to-end multimodal diseased tree detection model. In the visual encoder, we introduce rotational positional encoding to enhance the model’s ability to perceive the detailed structure of trees in images. This design enables more accurate extraction of features related to diseased trees, especially in images of complex environments. We further introduce a cross-attention mechanism between the image and text modalities so that the model deeply fuses visual and textual information, improving detection accuracy by understanding and recognizing the semantics of the disease. The method also generalizes well, enabling recognition from textual descriptions even when no samples are available. Our model achieves the best results on the Larch Casebearer dataset and the Pests and Diseases Tree dataset, verifying the effectiveness and generalizability of the method.
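
The two mechanisms named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives no architectural details, so the sketch assumes that "rotational positional encoding" refers to the rotary scheme (RoPE) applied along a single token axis, and it leaves out the learned query, key, and value projections a real model would use. The function names `rotary_encode` and `cross_attention` are hypothetical.

```python
import math


def rotary_encode(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles (RoPE-style).

    Assumption: a 1D token position; image models often use a 2D variant.
    """
    if len(vec) % 2 != 0:
        raise ValueError("feature dimension must be even")
    out = []
    for i in range(len(vec) // 2):
        # Each pair gets its own frequency, decreasing with the pair index.
        theta = position * base ** (-2.0 * i / len(vec))
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_tokens, text_tokens):
    """Each image token (query) attends over the text tokens (keys and values).

    Assumption: no learned projections, so keys and values are the raw text tokens.
    """
    dim = len(text_tokens[0])
    scale = math.sqrt(dim)
    fused = []
    for query in image_tokens:
        weights = softmax([dot(query, key) / scale for key in text_tokens])
        fused.append(
            [sum(w * value[d] for w, value in zip(weights, text_tokens)) for d in range(dim)]
        )
    return fused


# Toy data: three image tokens sharing one feature vector but at different positions,
# and two text tokens standing in for an encoded disease description.
image_tokens = [rotary_encode([1.0, 0.0, 0.5, 0.5], pos) for pos in range(3)]
text_tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
fused = cross_attention(image_tokens, text_tokens)
```

Each row of `fused` is a weighted average of the text tokens, with the weights set by how well the position-encoded image token matches each text token. This is the sense in which the visual features become conditioned on the textual description.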

Publication Info

Year: 2025
Type: article
Volume: 17
Issue: 24
Pages: 3971-3971
Citations: 0
Access: Closed

Cite This

Rui Zhang, Zhibo Chen, Guangyu Huo et al. (2025). A Multimodal Visual–Textual Framework for Detection and Counting of Diseased Trees Caused by Invasive Species in Complex Forest Scenes. Remote Sensing, 17(24), 3971-3971. https://doi.org/10.3390/rs17243971

Identifiers

DOI: 10.3390/rs17243971