The aim of infrared and visible image fusion is to produce a composite image that simultaneously highlights infrared targets and preserves abundant texture details. Despite the promising fusion performance of current deep-learning-based algorithms, most rely heavily on convolution operations, which limits their ability to represent long-range contextual information. To overcome this limitation, we design a novel infrared and visible image fusion network based on Res2Net and a multiscale Transformer, called RMTFuse. Specifically, we devise a local feature extraction ...
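To make the dual-branch idea concrete, the following is a minimal PyTorch sketch of how a Res2Net-style block (hierarchical grouped convolutions for multiscale local features) can be paired with a Transformer branch (long-range context) in a fusion network. All class names (`Res2NetBlock`, `GlobalContextBranch`, `FusionSketch`) and hyperparameters here are hypothetical illustrations of the general technique, not the authors' actual RMTFuse modules.

```python
# A minimal sketch, assuming PyTorch; module names and hyperparameters
# are hypothetical, not the RMTFuse implementation.
import torch
import torch.nn as nn


class Res2NetBlock(nn.Module):
    """Res2Net-style block: splits channels into groups and applies
    hierarchical 3x3 convolutions so each group sees a progressively
    larger receptive field -- multiscale *local* feature extraction."""

    def __init__(self, channels: int, scales: int = 4):
        super().__init__()
        assert channels % scales == 0
        self.scales = scales
        width = channels // scales
        # One 3x3 conv per group except the first (identity passthrough).
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, kernel_size=3, padding=1)
            for _ in range(scales - 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        groups = torch.chunk(x, self.scales, dim=1)
        out = [groups[0]]
        prev = groups[0]
        for conv, g in zip(self.convs, groups[1:]):
            prev = conv(g + prev)  # hierarchical residual connection
            out.append(prev)
        return torch.cat(out, dim=1) + x


class GlobalContextBranch(nn.Module):
    """Flattens the feature map into tokens and runs a Transformer
    encoder, modelling the long-range dependencies that plain
    convolutions miss."""

    def __init__(self, channels: int, heads: int = 4, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C)
        tokens = self.encoder(tokens)
        return tokens.transpose(1, 2).view(b, c, h, w)


class FusionSketch(nn.Module):
    """Toy dual-branch fusion: a shared shallow embedding, then a
    Res2Net-style local branch and a Transformer global branch per
    modality, concatenated and projected back to one channel."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.embed = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.local = Res2NetBlock(channels)
        self.global_ = GlobalContextBranch(channels)
        # 2 modalities x 2 branches x `channels` features each.
        self.reconstruct = nn.Conv2d(4 * channels, 1, kernel_size=1)

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        feats = []
        for img in (ir, vis):
            f = self.embed(img)
            feats += [self.local(f), self.global_(f)]
        return self.reconstruct(torch.cat(feats, dim=1))


if __name__ == "__main__":
    ir = torch.rand(1, 1, 64, 64)    # single-channel infrared image
    vis = torch.rand(1, 1, 64, 64)   # single-channel visible image
    fused = FusionSketch()(ir, vis)
    print(fused.shape)               # torch.Size([1, 1, 64, 64])
```

The design choice this sketch illustrates is the complementarity the abstract motivates: the convolutional branch is biased toward local textures, while the token-based branch attends across the whole spatial extent, so concatenating both feature sets lets the reconstruction draw on local detail and global context at once.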