Abstract: To address the low accuracy of current homography estimation methods and their limited adaptability to large-baseline and motion-blur scenes, an end-to-end unsupervised homography estimation method with an attention mechanism is proposed for large-baseline scenes. First, the SE channel attention module is introduced to construct a homography regression network layer with an attention mechanism, enabling the network to learn the inter-channel correlations of images. Second, a binary unsupervised loss based on a mask and a perceptual loss metric is proposed to enlarge the network's perception range and improve its adaptability to large-baseline scenes. Finally, a synthetic dataset, Homo-COCO, is created, and data augmentation is used to improve the model's robustness to changes in lighting and to motion blur, resulting in stronger generalization in real-world scenes. Extensive comparative and ablation experiments demonstrate that the method outperforms existing methods in both accuracy and scene adaptability. It can effectively estimate image homography and provide accurate parameter estimates for subsequent computer vision tasks such as image stitching and image correction.
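The channel attention step mentioned above can be illustrated with a minimal sketch of a generic squeeze-and-excitation (SE) block. This is not the network described in this paper: the function name `se_block`, the reduction ratio `r`, the random weights, and the omission of biases are all assumptions made for illustration, and the feature map is a plain nested list rather than a tensor.

```python
import math
import random

def se_block(feature_map, w1, w2):
    """Generic SE block over a C x H x W feature map given as nested lists.

    w1: (C // r) x C weights of the reduction layer (hypothetical values)
    w2: C x (C // r) weights of the expansion layer (hypothetical values)
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields per-channel gates in (0, 1).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
             for row in w2]
    # Scale: reweight every channel by its gate.
    scaled = [[[g * x for x in row] for row in ch]
              for ch, g in zip(feature_map, gates)]
    return scaled, gates

# Toy input: 4 channels of 3 x 3 features, reduction ratio r = 2 (assumed).
random.seed(0)
C, H, W, r = 4, 3, 3, 2
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
y, gates = se_block(x, w1, w2)
```

Each gate depends on the pooled statistics of all channels, which is how a block of this kind lets a network model inter-channel correlations. In a trained network the weights would be learned jointly with the regression layers rather than drawn at random.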