Foreground detection using attention modules and a video encoding

Foreground detection is the task of labelling each pixel of a video sequence as foreground (moving objects) or background (static scene), a distinction that depends on the context of the scene. For many years, background-model-based methods have been the most widely used approaches for foreground detection; however, they are sensitive to error propagation from the initial background model estimates. To address this problem, we propose a U-Net-based architecture with a feature attention module, in which an encoding of the entire video sequence serves as the attention context to extract features related to the background model. Furthermore, we add three spatial attention modules to highlight regions with relevant features. We tested our network on sixteen scenes from the CDnet2014 dataset, obtaining an average F-measure of 97.84. The results also show that our model outperforms both traditional and neural network methods. Thus, we demonstrate that feature and spatial attention modules in a U-Net-based architecture can address the challenges of foreground detection.
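To make the two attention mechanisms concrete, here is a minimal NumPy sketch of (a) feature attention, where a context vector summarizing the whole video gates encoder channels, and (b) spatial attention, which reweights locations in a feature map. The sigmoid gating form, the projection matrix `W`, and the avg/max pooling choice are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feature_attention(features, context, W):
    """Gate each channel of `features` using the video-level context.

    features: (C, H, Wsp) encoder feature map
    context:  (D,) encoding of the entire video sequence
    W:        (C, D) hypothetical projection from context to channel gates
    """
    gates = sigmoid(W @ context)            # (C,) one gate per channel
    return features * gates[:, None, None]  # broadcast over spatial dims

def spatial_attention(features):
    """Reweight spatial locations with a 1-channel attention map."""
    avg = features.mean(axis=0, keepdims=True)  # channel-wise average pool
    mx = features.max(axis=0, keepdims=True)    # channel-wise max pool
    attn = sigmoid(avg + mx)                    # (1, H, Wsp) weights in (0, 1)
    return features * attn
```

Both operations preserve the feature-map shape, so they can be dropped between encoder and decoder stages of a U-Net without changing the skip-connection wiring.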