Personal real-time video monitoring devices have become popular in recent years, especially among people who live in larger houses and care for babies. Caregivers use such a device to watch a baby's activities while they are occupied with other tasks. However, danger can arise at any time and in any situation, so a baby monitoring device needs a motion detection function that can trigger recording or alert the guardians. This paper introduces a solution to the motion detection problem. The solution combines statistical methods, namely kernel density estimation and histogram analysis, with a deep learning method for salient object detection. The final outputs are motion levels, which indicate the severity of the motion, and bounding boxes, which indicate where in the video frame the motion occurs. The motion levels can later be used as triggers for different functions built into the device, such as starting video recording or sounding a siren, and the bounding boxes give guardians reference focus areas in which to check the details of the motion and decide whether to take action.
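As a rough illustration of the two outputs described above (a motion level and a bounding box), the following minimal sketch uses a per-pixel Gaussian kernel density estimate over recent grayscale frames to flag foreground pixels. It is not the paper's implementation: the histogram analysis and the deep salient object detector are omitted, and the function names, bandwidth, density threshold, and level cutoffs are all hypothetical values chosen for the example.

```python
import numpy as np

def kde_foreground_mask(history, frame, bandwidth=10.0, threshold=1e-3):
    """Flag pixels whose current value is unlikely under a per-pixel KDE.

    history: (N, H, W) array of past grayscale frames (assumed background).
    frame:   (H, W) array, the current grayscale frame.
    bandwidth and threshold are illustrative assumptions, not tuned values.
    """
    diff = frame[None, :, :] - history
    # Gaussian kernel evaluated against every stored sample of each pixel.
    kernel = np.exp(-0.5 * (diff / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    density = kernel.mean(axis=0)   # KDE estimate of p(pixel value | background)
    return density < threshold      # low density -> treated as motion

def motion_level(mask, cutoffs=(0.01, 0.05, 0.20)):
    """Map the fraction of moving pixels to a level 0..3 (hypothetical cutoffs)."""
    ratio = mask.mean()
    return int(np.searchsorted(cutoffs, ratio, side="right"))

def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) around moving pixels, or None."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Synthetic example: a static scene, then a bright 10x15 patch appears.
history = np.full((20, 40, 60), 100.0)
frame = history[0].copy()
frame[10:20, 30:45] = 200.0

mask = kde_foreground_mask(history, frame)
level = motion_level(mask)   # 150 of 2400 pixels moved (6.25%) -> level 2
box = bounding_box(mask)     # (30, 10, 44, 19)
```

In a device, a level like this could be compared against per-function triggers (for example, recording at one level and a siren at a higher one), while the box would mark the region a guardian should inspect.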