Hand gesture recognition is a crucial but challenging task in Virtual Reality (VR) and Human-Computer Interaction (HCI). This paper proposes a skeleton-based approach to dynamic hand gesture recognition. The hand skeleton captured by a 3D depth sensor is first used to extract spatiotemporal multi-fused features, which concatenate four skeleton-based hand shape features and one hand direction feature. The hand shape features are then encoded as a Fisher Vector derived from a Gaussian Mixture Model (GMM). To incorporate temporal information, the hand shape Fisher Vector and the hand direction feature are represented with a Temporal Pyramid (TP), yielding the final feature vectors that are fed into a linear SVM classifier for recognition. The proposed approach is evaluated on a challenging dataset containing eight gestures performed by ten participants. Compared with state-of-the-art dynamic hand gesture recognition methods, the proposed method achieves a relatively high recognition accuracy of 90.0%.
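
To make the Temporal Pyramid step concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes that pyramid level l splits a gesture sequence into 2^l near-equal temporal segments and that each segment is mean-pooled; the number of levels, the pooling operator, and the names `mean_pool` and `temporal_pyramid` are illustrative assumptions that the abstract does not specify. Plain Python lists are used to keep the sketch dependency-free.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty run of per-frame descriptors into one vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def temporal_pyramid(frames: Sequence[Sequence[float]], levels: int = 3) -> Vector:
    """Concatenate mean-pooled descriptors over 1, 2, 4, ... temporal segments.

    Assumption (not stated in the abstract): level l uses 2**l near-equal
    segments, and each segment is summarized by its mean descriptor.
    """
    if not frames:
        raise ValueError("need at least one frame")
    t = len(frames)
    pooled: Vector = []
    for level in range(levels):
        n_seg = 2 ** level
        for s in range(n_seg):
            start = (s * t) // n_seg
            end = ((s + 1) * t) // n_seg
            # Very short sequences can yield empty segments; reuse one frame.
            end = max(end, start + 1)
            pooled.extend(mean_pool(frames[start:end]))
    return pooled


# Four frames of a hypothetical 2-D per-frame descriptor.
sequence = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
descriptor = temporal_pyramid(sequence, levels=2)
```

With `levels=2`, the descriptor concatenates the mean over the whole sequence with the means over its first and second halves, so its length is three times the per-frame dimension. A fixed-length vector of this kind is what a linear SVM (for example, scikit-learn's `LinearSVC`) would take as input.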