Personal real-time video monitoring devices have become popular in recent years, especially among people who live in larger houses and care for babies. Caregivers use such a device to watch a baby's activities while they are occupied with other tasks. However, danger can arise at any time and in any situation, so a baby monitoring device needs a motion detection function that can trigger recording or alert the guardians. This paper introduces a solution to the motion detection problem. The solution combines statistical methods, kernel density estimation and histogram analysis, with a deep learning method for detecting the salient object. The final outputs are motion levels, which indicate the severity of the motion, and bounding boxes, which indicate where in the video frame the motion occurs. The motion levels can later serve as triggers for different functions built into the device, such as starting video recording or sounding a siren, and the bounding boxes give guardians reference focus areas for checking the details of the motion and deciding whether to take action.
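To make the two outputs concrete, the following minimal sketch shows what a motion level and a bounding box could look like for a pair of grayscale frames. It is not the method of the paper: it swaps in plain frame differencing for the kernel density estimation, histogram analysis, and salient-object network, and the function name `motion_summary`, the pixel threshold, and the level cutoffs are all illustrative assumptions.

```python
def motion_summary(prev_frame, frame, pixel_threshold=25, level_cutoffs=(0.01, 0.10)):
    """Return (motion_level, bounding_box) for two equally sized grayscale frames.

    Frames are lists of rows of intensities. All thresholds are assumed values
    chosen for illustration, not taken from the paper.
    Level 0 means no changed pixels; levels 1 to 3 grow with the fraction of
    changed pixels. The box is (top, left, bottom, right), inclusive, or None.
    """
    # Collect coordinates of pixels whose intensity changed noticeably.
    changed = [
        (row, col)
        for row, (prev_row, cur_row) in enumerate(zip(prev_frame, frame))
        for col, (p, c) in enumerate(zip(prev_row, cur_row))
        if abs(c - p) > pixel_threshold
    ]
    if not changed:
        return 0, None

    # Map the fraction of changed pixels to a coarse severity level.
    fraction = len(changed) / (len(frame) * len(frame[0]))
    level = 1 + sum(fraction >= cutoff for cutoff in level_cutoffs)

    # The tightest box around all changed pixels marks where motion occurred.
    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return level, (min(rows), min(cols), max(rows), max(cols))


if __name__ == "__main__":
    still = [[0] * 4 for _ in range(4)]
    moved = [row[:] for row in still]
    moved[1][2] = 255
    moved[2][3] = 255
    print(motion_summary(still, moved))
```

A device could then map each level to an action, for example recording at a low level and sounding an alert at a high one, while the box tells the guardian which region of the frame to inspect.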
Video-based detection of moving and foreground objects is a key computer vision task. Temporal differencing of video frames is often used to detect objects in motion, but it fails to detect objects that are stationary or slow-moving relative to the video frame rate. Adaptive background estimation is an alternative to temporal frame differencing that builds and maintains statistical models of background pixel behavior; however, it requires careful tuning of a learning rate parameter that controls how quickly the model is updated. We propose an algorithm for statistical background modeling that selectively updates the model based on the previously detected foreground. We demonstrate empirically that the proposed approach is less sensitive to the choice of learning rate, which enables support for an extended range of object motion speeds while still adapting quickly to fast changes in the appearance of the scene.
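The idea of selective updating can be illustrated with a deliberately simplified sketch. It assumes a running-average background (one mean per pixel) rather than the full statistical model of the paper, and it assumes that "selective" means pixels flagged as foreground in the previous frame are left out of the update. The function names, the threshold, and the learning rate `alpha` are hypothetical.

```python
def detect_foreground(background, frame, threshold=25.0):
    """Flag pixels that differ from the background estimate by more than threshold."""
    return [
        [abs(p - b) > threshold for b, p in zip(b_row, p_row)]
        for b_row, p_row in zip(background, frame)
    ]


def update_background(background, frame, prev_foreground, alpha=0.05):
    """Blend the new frame into the background, skipping previous foreground pixels.

    alpha is the learning rate. Pixels that were foreground in the previous
    mask keep their old background value, so a slow or stationary object is
    not absorbed into the background even when alpha is large.
    """
    return [
        [b if fg else (1.0 - alpha) * b + alpha * p
         for b, p, fg in zip(b_row, p_row, fg_row)]
        for b_row, p_row, fg_row in zip(background, frame, prev_foreground)
    ]


if __name__ == "__main__":
    background = [[100.0, 100.0]]
    frame = [[110.0, 200.0]]  # slight lighting drift on the left, an object on the right
    mask = detect_foreground(background, frame)
    background = update_background(background, frame, mask, alpha=0.5)
    print(mask, background)
```

In this simplified form the background pixel under the object is frozen, so the object stays detectable however long it remains still, while unflagged pixels continue to track gradual scene changes at the rate set by `alpha`. A hard freeze like this cannot recover from a false foreground detection, which is one reason a practical scheme would likely use a softer rule.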