This paper proposes a guitar fingering assessment system based on CNN (Convolutional Neural Network) hand pose estimation and SVR (Support Vector Regression) evaluation. First, a CNN architecture is proposed to estimate the temporal 3D positions of 16 hand joints; then, based on a DCT (Discrete Cosine Transform) feature and SVR, the guitarist's fingering is scored to indicate how well the guitarist played. We also release a new dataset for professional guitar playing analysis, with significant advantages in the total number of videos, professional, expert-guitarist judgement, and accurate annotation of hand pose and guitar performance score. Experiments using videos of multiple persons playing guitar under different conditions demonstrate that the proposed method outperforms the current state of the art with (1) low mean error (Euclidean distance of 6.1 mm) and high computational efficiency for hand pose estimation; (2) high rank correlation (0.68) for assessing the fingering (C major scale and symmetrical exercise) of guitarists.
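As a rough illustration of the scoring stage, the sketch below turns a sequence of estimated 3D joint positions into a DCT feature vector and measures rank agreement between predicted and expert scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the unnormalized DCT-II, the choice of keeping the first 4 coefficients per coordinate channel, and the use of Spearman's coefficient as the rank correlation are all assumptions that the abstract does not specify. The SVR step, which would map each feature vector to a score, is omitted.

```python
import math


def dct2(signal):
    """Unnormalized DCT-II of a 1D sequence (pure Python, O(n^2))."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (t + 0.5) * k / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def dct_features(trajectory, num_coeffs=4):
    """Build a flat feature vector from a joint trajectory.

    trajectory: list of frames; each frame is a list of (x, y, z) joints
    (16 joints in the paper). num_coeffs is a hypothetical setting: the
    number of low-frequency DCT coefficients kept per coordinate channel.
    """
    num_joints = len(trajectory[0])
    features = []
    for joint in range(num_joints):
        for axis in range(3):
            # Time series of one coordinate of one joint across all frames.
            channel = [frame[joint][axis] for frame in trajectory]
            features.extend(dct2(channel)[:num_coeffs])
    return features


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman rank correlation (assumed metric): Pearson correlation of ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    ra, rb = ranks(a), ranks(b)
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Synthetic trajectory: 8 frames, 16 joints, made-up coordinates in mm.
    frames = [
        [(float(t + j), float(t * j), float(j)) for j in range(16)]
        for t in range(8)
    ]
    print(len(dct_features(frames)))
    # Made-up predicted vs. expert scores, for illustration only.
    print(spearman([3.0, 1.5, 4.2, 2.0], [2.8, 1.0, 4.0, 2.5]))
```

Keeping only low-frequency DCT coefficients gives a fixed-length, compact description of each joint's motion regardless of small frame-level jitter, which is one common reason to pair a DCT feature with a regressor such as SVR.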
Zhao Wang, Jun Ohya, "A 3D Guitar Fingering Assessing System Based on CNN-Hand Pose Estimation and SVR-Assessment" in Proc. IS&T Int’l. Symp. on Electronic Imaging: Intelligent Robotics and Industrial Applications using Computer Vision, 2018, pp 204-1 - 204-5, https://doi.org/10.2352/ISSN.2470-1173.2018.09.IRIACV-204