Most digital cameras today place a Bayer color filter array in front of the sensor. Creating a true-color image therefore requires a demosaicing step, which introduces blur and artifacts. Special sensors such as the Foveon X3 circumvent the demosaicing problem by stacking pixels on top of each other. However, they are not commonly used because of their high production cost and low flexibility. In this work, a multi-color multi-view approach is presented for creating true-color images. To this end, the red-filtered left view and the blue-filtered right view are registered and projected onto the green-filtered center view. Because of the camera offset and the slightly different viewing angles, parts of the scene may be occluded in the side channels, so the missing information has to be reconstructed. For this purpose, a novel local linear regression method based on disparity and color similarity is proposed. Simulation results show that the proposed method outperforms existing reconstruction techniques by 5 dB on average.
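The abstract does not spell out the regression itself, so the following is only a minimal sketch of one way a local linear regression weighted by disparity and color similarity could fill occluded side-channel pixels. The function name `fill_occluded`, the window radius, the Gaussian weighting, and the parameters `sigma_c` and `sigma_d` are all assumptions made for illustration, not the formulation used in the paper.

```python
import numpy as np

def fill_occluded(side, center, disparity, known,
                  radius=2, sigma_c=10.0, sigma_d=1.0):
    """Fill pixels of `side` where `known` is False (hypothetical helper).

    For each missing pixel, fit side ~ a * center + b over the known
    neighbours in a square window. Neighbours count more when their
    center-view intensity and disparity are close to those of the
    missing pixel (assumed Gaussian weights).
    """
    out = side.astype(float).copy()
    h, w = side.shape
    for y, x in zip(*np.nonzero(~known)):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        m = known[y0:y1, x0:x1]
        if m.sum() < 2:
            continue  # too little support: leave the pixel untouched
        g = center[y0:y1, x0:x1][m].astype(float)
        s = side[y0:y1, x0:x1][m].astype(float)
        d = disparity[y0:y1, x0:x1][m].astype(float)
        # Color-similarity weight times disparity-similarity weight.
        wgt = (np.exp(-((g - center[y, x]) ** 2) / (2 * sigma_c ** 2))
               * np.exp(-((d - disparity[y, x]) ** 2) / (2 * sigma_d ** 2)))
        # Weighted least squares via row scaling by sqrt(weight).
        A = np.stack([g, np.ones_like(g)], axis=1)
        sw = np.sqrt(wgt)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], s * sw, rcond=None)
        out[y, x] = coef[0] * center[y, x] + coef[1]
    return out
```

In this sketch, `side` would be the registered red or blue view, `center` the green view, and `known` a boolean mask that is False at occluded pixels. A real pipeline would also need the registration and projection steps, which are not shown.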
Face pose carries rich information about a person's intent, so estimating it is important for assessing a driver's attention. Most pose estimation methods extract image features and then either model the appearance (in 2D or 3D) or apply regression to the features, which incurs a high computational cost. In contrast, we aim to estimate pose from the facial landmark locations alone. In most driver monitoring systems, the important facial landmarks are readily available because they are essential for assessing driver drowsiness. We therefore use the existing eye landmarks, together with nose and mouth landmarks, to estimate the face pose. Specifically, we propose to apply linear regression to features derived only from the 2D facial landmark locations. Instead of relying on a single linear regression model, we first apply a global linear model to predict the pose and then refine the prediction with a local model built for that pose region. The local models are built from partially overlapping subsets of the training samples. Experiments on the Pointing'04, MultiPIE, and Biwi Kinect datasets show that the proposed two-level models achieve accuracy comparable to that of state-of-the-art methods. At the same time, the proposed method can process 2000 frames per second in Octave.