To improve the workout efficiency and to provide the body movement suggestions to users in a “smart gym” environment, we propose to use a depth camera for capturing a user’s body parts and mount multiple inertial sensors on the body parts of a user to generate deadlift behavior models generated by a recurrent neural network structure. The contribution of this paper is trifold: 1) The multimodal sensing signals obtained from multiple devices are fused for generating the deadlift behavior classifiers, 2) the recurrent neural network structure can analyze the information from the synchronized skeletal and inertial sensing data, and 3) a Vaplab dataset is generated for evaluating the deadlift behaviors recognizing capability in the proposed method.