EarthVQA: Bridging Earth Vision and Geoscience Language

1.Description

  The multi-modal, multi-task VQA dataset (EarthVQA) is designed to bridge Earth vision and geoscience language. It includes co-registered remote sensing images, land-cover semantic masks, and task-driven language text.
  The EarthVQA dataset contains 6000 images at 0.3 m spatial resolution and 208,593 QA pairs that embed urban and rural governance requirements. The QA pairs cover four question types (judging, counting, object situation analysis, and comprehensive analysis) across reasoning tasks such as traffic conditions, educational facilities, green ecology, and cultivated land status. This multi-modal, multi-task dataset poses new challenges, requiring geospatial relational reasoning and induction over remote sensing images.


Fig. 1. EarthVQA overview

2.Annotation format

  Semantic category labels: background-1, building-2, road-3, water-4, barren-5, forest-6, agriculture-7, playground-8. No-data regions are assigned 0. The QA pairs are constructed as illustrated below:
    "275.png": [
        {
            "Type": "Basic Judging",
            "Question": "Are there any buildings in this scene?",
            "Answer": "Yes"
        },
        {
            "Type": "Comprehensive Analysis",
            "Question": "What are the road materials around the village?",
            "Answer": "There are cement roads, and asphalt roads"
        }
    ]
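
  The annotation format above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: it assumes the QA file is a JSON object keyed by image name (the excerpt shows only one entry, so the top-level wrapper is an assumption), and the helper names `class_histogram` and `questions_by_type` are hypothetical. The toy mask is made up for illustration and does not come from the dataset.

```python
import json
from collections import Counter

# Label IDs as listed in the annotation format; 0 marks no-data pixels.
CLASS_NAMES = {
    0: "no-data",
    1: "background",
    2: "building",
    3: "road",
    4: "water",
    5: "barren",
    6: "forest",
    7: "agriculture",
    8: "playground",
}


def class_histogram(mask_rows):
    """Count pixels per class name in a mask given as rows of integer label IDs."""
    counts = Counter(label for row in mask_rows for label in row)
    return {CLASS_NAMES[label]: n for label, n in sorted(counts.items())}


def questions_by_type(qa_by_image):
    """Group (image, question, answer) triples by the "Type" field."""
    grouped = {}
    for image_name, qa_list in qa_by_image.items():
        for qa in qa_list:
            grouped.setdefault(qa["Type"], []).append(
                (image_name, qa["Question"], qa["Answer"])
            )
    return grouped


# Inline sample mirroring the excerpt; the enclosing {...} is an assumption.
# A real annotation file would be read with json.load(open(path)) instead.
SAMPLE = json.loads("""
{"275.png": [
  {"Type": "Basic Judging",
   "Question": "Are there any buildings in this scene?",
   "Answer": "Yes"},
  {"Type": "Comprehensive Analysis",
   "Question": "What are the road materials around the village?",
   "Answer": "There are cement roads, and asphalt roads"}
]}
""")

grouped = questions_by_type(SAMPLE)

# Made-up 2x3 mask of label IDs, only to exercise the label mapping.
toy_mask = [[0, 2, 2], [3, 3, 7]]
hist = class_histogram(toy_mask)

print(sorted(grouped))
print(hist)
```

  Grouping by "Type" is convenient because the question types have different answer forms (short answers for judging and counting, free-form sentences for comprehensive analysis), so they are usually inspected or scored separately.
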
3.Download

  We hope that the release of the EarthVQA dataset will promote the development of multi-modal remote sensing, especially land-cover classification and visual question answering. The data can be downloaded from the links below.

● Baidu Drive: download
● Google Drive: download


4.Evaluation Server

  To obtain test scores, please join our hosted benchmark platforms: Semantic Segmentation and Visual Question Answering.

5.Copyright

  The copyright belongs to the Intelligent Data Extraction, Analysis and Applications of Remote Sensing (RSIDEA) academic research group, State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing (LIESMARS), Wuhan University, China. The EarthVQA dataset may be used for academic purposes only, and any such use must cite the following papers. Commercial use is prohibited. Any form of secondary development of this dataset, including annotation work, is strictly prohibited. RSIDEA of Wuhan University reserves the right to pursue legal action against violations.

[1] Junjue Wang, Zhuo Zheng, Zihang Chen, Ailong Ma, Yanfei Zhong*. EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, pp. 5481-5489, 2024.
[2] Junjue Wang, Ailong Ma*, Zihang Chen, Zhuo Zheng, Yuting Wan, Liangpei Zhang, Yanfei Zhong. EarthVQANet: Multi-task visual question answering for remote sensing image understanding. ISPRS Journal of Photogrammetry and Remote Sensing, vol. 212, pp. 422-439, 2024.

6.Contact

  If you have any problems or feedback when using the EarthVQA dataset, please contact:
  Mr. Junjue Wang: kingdrone@whu.edu.cn
  Prof. Ailong Ma: maailong007@whu.edu.cn
  Prof. Yanfei Zhong: zhongyanfei@whu.edu.cn

 
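Copyright @RS-IDEA | Address: 129 Luoyu Road, Wuhan | Affiliation: State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University | Office: Room 709, Xinghu Building | zhongyanfei@whu.edu.cn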