CRFT: Transformer-Based Cross-Modal Image Registration

CRFT: Consistent-Recurrent Feature Flow Transformer for Cross-Modal Image Registration

Source: arXiv:2604.05689v1 (announce type: cross)

Abstract

We present Consistent-Recurrent Feature Flow Transformer (CRFT), a unified coarse-to-fine framework based on feature flow learning for robust cross-modal image registration.
CRFT learns a modality-independent feature flow representation within a transformer-based architecture that jointly performs feature alignment and flow estimation.
The coarse stage establishes global correspondences through multi-scale feature correlation, while the fine stage refines local details via hierarchical feature fusion and adaptive spatial reasoning.
To enhance geometric adaptability, an iterative discrepancy-guided attention mechanism with a Spatial Geometric Transform (SGT) recurrently refines the flow field, progressively capturing subtle spatial inconsistencies and enforcing feature-level consistency.
This design enables accurate alignment under large affine and scale variations while maintaining structural coherence across modalities.
Extensive experiments on diverse cross-modal datasets demonstrate that CRFT consistently outperforms state-of-the-art registration methods in both accuracy and robustness.
Beyond registration, CRFT provides a generalizable paradigm for multimodal spatial correspondence, offering broad applicability to remote sensing, autonomous navigation, and medical imaging.
Code and datasets are publicly available at https://github.com/NEU-Liuxuecong/CRFT.

Introduction

Cross-modal image registration is a critical task in various fields including medical imaging, remote sensing, and autonomous navigation.
Traditional methods often struggle to align images from different modalities, because the same scene looks different to each sensor and the images may also differ by large affine and scale changes.
CRFT addresses these challenges with a transformer-based architecture that learns a modality-independent feature flow representation and performs feature alignment and flow estimation jointly.

Methodology

The CRFT framework consists of two main stages, a coarse stage and a fine stage, complemented by an iterative refinement mechanism.
Each component plays a distinct role in producing an accurate and consistent flow field:

  • Coarse Stage: Establishes global correspondences between images by utilizing multi-scale feature correlation. This allows the model to capture broad patterns and structures across different modalities.
  • Fine Stage: Refines the alignment by focusing on local details through hierarchical feature fusion. Adaptive spatial reasoning is applied to enhance the precision of the registration.
  • Iterative Discrepancy-Guided Attention: Works together with a Spatial Geometric Transform (SGT) to recurrently refine the flow field, progressively capturing subtle spatial inconsistencies and enforcing feature-level consistency.
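
The coarse-then-refine idea above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the coarse step below is a single-scale cosine-similarity correlation with an argmax (CRFT uses multi-scale correlation inside a transformer), and the refinement step is a plain damped residual update that stands in for the discrepancy-guided attention and SGT. All function names, the toy features, and the `steps` and `rate` parameters are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def correlation_matrix(feats_a, feats_b):
    """Cosine similarity between every feature in A and every feature in B."""
    a = [l2_normalize(f) for f in feats_a]
    b = [l2_normalize(f) for f in feats_b]
    return [[sum(x * y for x, y in zip(fa, fb)) for fb in b] for fa in a]


def coarse_matches(feats_a, feats_b):
    """For each position in A, return the index of the most similar feature in B."""
    corr = correlation_matrix(feats_a, feats_b)
    return [max(range(len(row)), key=row.__getitem__) for row in corr]


def refine_flow(src_pts, dst_pts, steps=10, rate=0.5):
    """Recurrently update a per-point flow using the remaining discrepancy.

    Stand-in for iterative refinement: each pass moves the flow a fraction
    `rate` of the way toward closing the gap between warped source and target.
    """
    flow = [(0.0, 0.0)] * len(src_pts)
    for _ in range(steps):
        updated = []
        for (sx, sy), (tx, ty), (fx, fy) in zip(src_pts, dst_pts, flow):
            # Discrepancy between the warped source point and its match.
            rx, ry = tx - (sx + fx), ty - (sy + fy)
            updated.append((fx + rate * rx, fy + rate * ry))
        flow = updated
    return flow


# Toy data: image B holds the same three features as A (up to scale),
# stored in a different order and shifted by (0.5, 0.5).
feats_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feats_b = [[0.0, 2.0], [3.0, 3.0], [5.0, 0.0]]
pts_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pts_b = [(1.5, 0.5), (2.5, 0.5), (0.5, 0.5)]

matches = coarse_matches(feats_a, feats_b)   # global correspondences
targets = [pts_b[j] for j in matches]        # matched positions in B
flow = refine_flow(pts_a, targets)           # iteratively refined flow
print(matches, flow)
```

Normalizing the features before correlating them makes the match depend on feature direction rather than magnitude, which is one simple way to reduce sensitivity to intensity differences between modalities. The damped update shrinks the residual by a constant factor on every pass, so the flow approaches the true shift of (0.5, 0.5) without reaching it exactly.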

Results

According to the authors, extensive experiments on diverse cross-modal datasets show that CRFT consistently outperforms state-of-the-art registration methods in both accuracy and robustness.
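
The summary does not say which metrics the paper reports. As an illustration only, the sketch below shows two measures commonly used for flow-based registration: mean endpoint error as an accuracy measure, and the fraction of points under an error threshold as a rough robustness measure. The function names, the toy flows, and the 3.0 threshold are assumptions, not values from the paper.

```python
import math


def endpoint_errors(pred_flow, ref_flow):
    """Euclidean distance between each predicted and reference flow vector."""
    return [math.hypot(px - rx, py - ry)
            for (px, py), (rx, ry) in zip(pred_flow, ref_flow)]


def mean_endpoint_error(pred_flow, ref_flow):
    """Average endpoint error over all points (lower is better)."""
    errors = endpoint_errors(pred_flow, ref_flow)
    return sum(errors) / len(errors)


def success_rate(pred_flow, ref_flow, threshold=3.0):
    """Fraction of points whose error is below `threshold` (an arbitrary choice)."""
    errors = endpoint_errors(pred_flow, ref_flow)
    return sum(e < threshold for e in errors) / len(errors)


# Toy flows: the per-point errors are 1, 0, 5 and 10.
pred = [(1.0, 0.0), (0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
ref = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]

mean_epe = mean_endpoint_error(pred, ref)
rate = success_rate(pred, ref, threshold=3.0)
print(mean_epe, rate)
```

Reporting both numbers is useful because a single large failure can dominate the mean, while the thresholded rate shows how often the method lands close to the reference.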

Applications

The authors present CRFT as a generalizable paradigm for multimodal spatial correspondence, not only a registration method. They point to three application areas:

  • Remote Sensing: Accurate alignment of satellite images for environmental monitoring and analysis.
  • Autonomous Navigation: Enhanced perception systems for vehicles by aligning data from diverse sensors.
  • Medical Imaging: Improved integration of images from different modalities, aiding in diagnosis and treatment planning.

Conclusion

CRFT combines a transformer-based architecture with feature flow learning in a unified coarse-to-fine framework for cross-modal image registration.
Its recurrent, discrepancy-guided refinement targets a longstanding difficulty in the field: aligning images from different modalities under large affine and scale variations.
The code and datasets are publicly available at https://github.com/NEU-Liuxuecong/CRFT, which should make the results easier to reproduce and build on.


Lazarus Omolua