Recent studies have shown that steganalytic approaches based on deep learning frameworks cannot surpass their rich-model feature-based counterparts in performance. According to our analysis, one of the main causes of this unsatisfactory performance is that the training procedure tends to get stuck at local plateaus, or even diverge, when starting from a non-ideal initial state. In this paper we investigate how to fit a deep neural network to a rich-model feature set. We regard this as a pre-training procedure and study its effect on deep learning for steganalysis. The state-of-the-art JPEG steganalytic feature set DCTR is selected as the target, and its feature extraction procedure is divided into multiple sub-modules. A deep learning framework with similar sub-networks is proposed. In the pre-training procedure we train the framework from the bottom up, fitting the output of each sub-network to the actual output of the corresponding sub-module of DCTR. The motivation is that we force the proposed framework to learn the nonlinear mapping implicit in DCTR, and we expect that, when training starts from an initial state that represents an approximate solution of DCTR, it can achieve better performance than DCTR itself.
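The bottom-up fitting idea can be illustrated with a deliberately tiny sketch. It is not the proposed framework and does not implement DCTR: the two "sub-modules" below (`target_stage1`, `target_stage2`) are hypothetical scalar affine maps standing in for stages of a hand-crafted feature extractor, and each "sub-network" is a single affine unit fitted by plain gradient descent on the mean squared error. What it shows is only the training order: the first stage is fitted to the reference intermediate output, and the second stage is then fitted on top of the already-fitted first stage.

```python
def fit_affine(inputs, targets, lr=0.1, steps=500):
    """Fit y = w * x + b to (inputs, targets) by gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(steps):
        errors = [w * x + b - t for x, t in zip(inputs, targets)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, inputs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical stand-ins for two sub-modules of a hand-crafted extractor.
def target_stage1(x):
    return 2.0 * x + 1.0


def target_stage2(h):
    return -0.5 * h + 3.0


# Toy training inputs and the reference outputs of each sub-module.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
h_ref = [target_stage1(x) for x in xs]
y_ref = [target_stage2(h) for h in h_ref]

# Stage 1: fit the bottom sub-network to the first sub-module's output.
w1, b1 = fit_affine(xs, h_ref)

# Stage 2: feed the fitted stage-1 output forward and fit the next
# sub-network to the second sub-module's output.
h_fit = [w1 * x + b1 for x in xs]
w2, b2 = fit_affine(h_fit, y_ref)


def pretrained_net(x):
    """Composition of the two fitted stages, usable as an initial state."""
    return w2 * (w1 * x + b1) + b2
```

In this toy setting the composed `pretrained_net` closely reproduces the reference pipeline, which is the sense in which the fitted parameters form an approximate solution from which further end-to-end training could start.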