The rate-distortion (RD) adaptive mechanisms of MPEG-HEVC (High Efficiency Video Coding) and its derivatives are an incremental improvement in the software reference encoder: they provide a selective choice of the Lagrangian parameter that varies by encoding mode (intra or inter) and picture reference
level. This weighting factor, and the balanced cost functions it affects, is crucial to the RD optimization process: it influences several encoder decisions and, through them, both the coding efficiency and the quality of the encoded stream. We therefore investigate improving it with modern reinforcement learning methods.
We develop a neural agent that learns a real-valued control policy to maximize rate savings according to the input signal pattern, mapping pixel intensity values of the picture at the coding tree unit level to the appropriate weighting parameter. Tests on the reference software show improved
coding efficiency across different video sequences in multiple classes of video.
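
For reference, the Lagrangian cost in which this weighting parameter appears is conventionally written as below. This is a sketch in assumed notation, not a formulation taken from the text above: D is the distortion, R the rate in bits, \lambda the Lagrangian weighting parameter, \mathcal{M} the set of candidate coding decisions, x the pixel intensities of a coding tree unit, and \pi_{\theta} a hypothetical learned policy with parameters \theta.

\[
J(m) = D(m) + \lambda\, R(m), \qquad
m^{*} = \arg\min_{m \in \mathcal{M}} J(m), \qquad
\lambda = \pi_{\theta}(x)
\]

A larger \lambda penalizes rate more heavily and so favors cheaper decisions at higher distortion, while a smaller \lambda does the opposite, which is why the choice of \lambda propagates into the encoder's decisions.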