Activity recognition and pose estimation are generally closely related in practical applications, even though they are usually treated as independent tasks. In this paper, we propose an approach based on artificial 3D coordinates and a CNN that combines activity recognition and pose estimation from 2D and 3D static and dynamic images (a dynamic image is composed of a set of video frames). In other words, we show that the proposed algorithm can be used to solve both problems, activity recognition and pose estimation. End-to-end optimization shows that the proposed approach is superior to one that performs activity recognition and pose estimation separately. Performance is evaluated by calculating the recognition rate. The proposed approach also enables us to perform learning procedures using different datasets.