RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation

1 Nanjing Forestry University, 2 Nanjing University of Posts and Telecommunications, 3 Nanjing University of Aeronautics and Astronautics

Abstract

The development of high-resolution remote sensing satellites has greatly facilitated remote sensing research. Segmenting and extracting specific targets of interest are essential tasks when working with vast and complex remote sensing imagery. Recently, the Segment Anything Model (SAM) has provided a universal pre-trained model for image segmentation tasks. Because applying SAM directly to remote sensing image segmentation does not yield satisfactory results, we propose RSAM-Seg (Remote Sensing SAM with Semantic Segmentation), a modification of SAM tailored to the remote sensing field that eliminates the need for manually provided prompts. Adapter-Scale, a set of supplementary scaling modules, is added to the multi-head attention blocks in the SAM encoder. Furthermore, Adapter-Feature modules are inserted between the Vision Transformer (ViT) blocks. These modules aim to incorporate high-frequency image information and image embedding features to generate image-informed prompts. Experiments are conducted on four distinct remote sensing scenarios: cloud detection, field monitoring, building detection, and road mapping. The results show improvements over the original SAM and U-Net across the cloud, building, field, and road scenarios, and also highlight the capacity of RSAM-Seg to identify areas missing from the ground truth of certain datasets, which supports its potential as an auxiliary annotation method. In addition, RSAM-Seg performs well in few-shot scenarios, underscoring its potential for dealing with limited datasets.

Overview

Network

We propose RSAM-Seg, in which feature information extracted from specific domains is inserted into the ViT blocks of the encoder to improve performance in the remote sensing field.
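The abstract mentions high-frequency image information as one input to the prompts. As a rough illustration of what such a signal can look like, the sketch below suppresses low frequencies in the Fourier domain. This is an assumption made for illustration: the page does not state how RSAM-Seg obtains high-frequency components, and the function name `high_frequency_component` and the parameter `cutoff_ratio` are hypothetical.

```python
import numpy as np


def high_frequency_component(image, cutoff_ratio=0.25):
    """Return the high-frequency part of a 2-D grayscale image.

    Hypothetical helper: a centred block of low frequencies is zeroed in the
    Fourier domain. `cutoff_ratio` is an illustrative knob, not a value taken
    from the RSAM-Seg paper.
    """
    image = np.asarray(image, dtype=np.float64)
    if image.ndim != 2:
        raise ValueError("expected a 2-D array")
    h, w = image.shape

    # Shift the spectrum so the zero frequency sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Build a mask that removes a centred block of low frequencies.
    half_h = int(h * cutoff_ratio / 2)
    half_w = int(w * cutoff_ratio / 2)
    cy, cx = h // 2, w // 2
    mask = np.ones((h, w))
    mask[cy - half_h: cy + half_h + 1, cx - half_w: cx + half_w + 1] = 0.0

    # Back to the spatial domain; the imaginary part is numerical noise.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real
```

A uniform image carries no high-frequency content and maps to (numerically) zero, while fine textures and edges pass through largely unchanged.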

Details

[Figures: Vit1, Vit2]

The ViT blocks of the encoder are modified in two ways: Adapter-Scale modules are incorporated inside the blocks, and Adapter-Feature modules are embedded between ViT layers to extract image information.
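The placement of the two adapter types can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the bottleneck design of `Adapter`, the zero initialisation, the `scale` factor, and the way the prompt is added to the tokens are all hypothetical choices, and layer normalisation is omitted for brevity.

```python
import numpy as np


class Adapter:
    """Hypothetical bottleneck adapter: down-project, ReLU, up-project."""

    def __init__(self, dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(0.0, 0.02, size=(dim, hidden))
        # Zero-initialised up-projection: the adapter outputs zeros at first,
        # so the pretrained encoder's behaviour is unchanged at initialisation.
        self.w_up = np.zeros((hidden, dim))

    def __call__(self, x):
        return np.maximum(x @ self.w_down, 0.0) @ self.w_up


def vit_block(tokens, attention, mlp, adapter_scale, scale=0.5):
    """One ViT block with a scaled adapter branch on the attention output.

    Assumed placement for the Adapter-Scale idea; `attention` and `mlp` are
    arbitrary callables standing in for the pretrained sub-layers.
    """
    attended = attention(tokens)
    attended = attended + scale * adapter_scale(attended)
    tokens = tokens + attended          # residual around attention
    return tokens + mlp(tokens)         # residual around the MLP


def encode(tokens, blocks, feature_adapters, hf_tokens):
    """Run the blocks in sequence, adding a prompt before each one.

    Assumed form of the Adapter-Feature idea: the prompt is computed from the
    current embedding features plus high-frequency features (`hf_tokens`).
    """
    for block, adapter_feature in zip(blocks, feature_adapters):
        tokens = tokens + adapter_feature(tokens + hf_tokens)
        tokens = block(tokens)
    return tokens
```

In a setup like this, typically only the adapter weights would be trained while the pretrained ViT weights stay frozen; whether RSAM-Seg follows exactly that recipe is not stated on this page.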

Results

[Figure: Result1]

[Figure: Result2]

BibTeX

@inproceedings{Zhang2023,
        title={RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation},
        author={Jie Zhang and Xubing Yang and Rui Jiang and Wei Shao and Li Zhang},
        booktitle={},
        year={2024}
      }