Cone-beam computed tomography (CT) is an attractive tool for many kinds of non-destructive evaluation (NDE). Model-based iterative reconstruction (MBIR) has been shown to improve reconstruction quality and reduce scan time. However, the computational burden and storage of the system matrix is challenging. In this paper we present a separable representation of the system matrix that can be completely stored in memory and accessed cache-efficiently. This is done by quantizing the voxel position for one of the separable subproblems. A parallelized algorithm, which we refer to as zipline update, is presented that speeds up the computation of the solution by about 50 to 100 times on 20 cores by updating groups of voxels together. The quality of the reconstruction and algorithmic scalability are demonstrated on real cone-beam CT data from an NDE application. We show that the reconstruction can be done from a sparse set of projection views while reducing artifacts visible in the conventional filtered back projection (FBP) reconstruction. We present qualitative results using a Markov Random Field (MRF) prior and a Plug-and-Play denoiser.