Micro-expression (ME) analysis has recently become an attractive research topic. Nevertheless, most ME studies focus on the recognition task, while the spotting task is rarely addressed. Although micro-expression recognition methods have achieved promising results by applying deep learning techniques, the performance of ME spotting still needs substantial improvement. Most existing approaches rely on traditional techniques, such as distance measurement between handcrafted features of frames, which are not robust enough to locate MEs correctly. In this paper, we propose a novel method for ME spotting based on a deep sequence model. Our framework consists of two main steps: 1) from each position in a video, we extract a spatial-temporal feature that discriminates MEs from extrinsic movements; 2) we use an LSTM network that exploits both the local and global correlations of the extracted features to predict the score of the ME apex frame (see the sketch below). Experiments on two public ME spotting databases demonstrate the effectiveness of the proposed method.
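As a rough illustration of the second step, the following sketch shows how a bidirectional LSTM could map a sequence of per-frame spatial-temporal features to per-frame apex scores. It is written in PyTorch; the feature dimension, hidden size, and sigmoid scoring head are assumptions for illustration and are not details taken from this paper.

```python
# Illustrative sketch only: a bidirectional LSTM that maps a sequence of
# per-frame spatial-temporal features to a per-frame apex score in [0, 1].
# Feature dimension, hidden size, and the scoring head are assumed values.
import torch
import torch.nn as nn


class ApexScorer(nn.Module):
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        # A bidirectional LSTM captures temporal correlations in both directions.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # one score per frame

    def forward(self, feats):
        # feats: (batch, num_frames, feat_dim) spatial-temporal features
        out, _ = self.lstm(feats)
        scores = torch.sigmoid(self.head(out)).squeeze(-1)  # (batch, num_frames)
        return scores


# Usage: frames whose score peaks are treated as candidate apex frames.
model = ApexScorer()
dummy = torch.randn(1, 200, 256)       # one clip of 200 frames (dummy features)
apex_scores = model(dummy)             # (1, 200)
candidate = apex_scores.argmax(dim=1)  # index of the highest-scoring frame
```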