We present a system that jointly registers and fuses RGB and infrared (IR) video pairs. RGB imagery corresponds closely to human visual perception, whereas IR imagery captures heat but often lacks contour and texture information. Fusing visible and IR images aims to combine the complementary information of the two modalities, which requires the two images to be precisely aligned. However, classical methods that assume ideal imaging conditions fail to achieve satisfactory performance in real-world cases. From a data-dependent modeling point of view, labeling the dataset is costly and impractical. In this context, we present a framework that tackles two challenging tasks: first, a video registration procedure that aligns the IR and RGB videos; second, a fusion method that brings the essential information from the two video modalities into a single video. We evaluate our approach on a challenging dataset of RGB and IR video pairs collected to help firefighters work effectively in poor visibility conditions, such as heavy smoke after a fire; see our project page.