Work Presented at Electronic Imaging 2024
Volume: 67 | Article ID: 060405
Computational Image Formation: Simulators in the Deep Learning Era
DOI: 10.2352/J.ImagingSci.Technol.2023.67.6.060405 | Published Online: November 2023
Abstract

At the pinnacle of computational imaging is the co-optimization of camera and algorithm. This, however, is not the only form of computational imaging. In problems such as imaging through adverse weather, the bigger challenge is how to accurately simulate the forward degradation process so that we can synthesize data to train reconstruction models and/or integrate the forward model as part of the reconstruction algorithm. This article introduces the concept of computational image formation (CIF). Compared to the standard inverse problems where the goal is to recover the latent image x from the observation y = G(x), CIF shifts the focus to designing an approximate mapping Hθ such that Hθ ≈ G while giving a good image reconstruction result. The word “computational” highlights the fact that the image formation is now replaced by a numerical simulator. While matching Mother Nature remains an important goal, CIF pays even greater attention to strategically choosing an Hθ so that the reconstruction performance is maximized.

The goal of this article is to conceptualize the idea of CIF by elaborating on its meaning and implications. The first part of the article is a discussion on the four attributes of a CIF simulator: accurate enough to mimic G, fast enough to be integrated as part of the reconstruction, provides a well-posed inverse problem when plugged into the reconstruction, and differentiable to allow backpropagation. The second part of the article is a detailed case study based on imaging through atmospheric turbulence. A plethora of simulators, old and new ones, are discussed. The third part of the article is a collection of other examples that fall into the category of CIF, including imaging through bad weather, dynamic vision sensors, and differentiable optics. Finally, thoughts about the future direction and recommendations to the community are shared.

  Cite this article 

Stanley H. Chan, "Computational Image Formation: Simulators in the Deep Learning Era," in Journal of Imaging Science and Technology, 2023, pp. 1–17, https://doi.org/10.2352/J.ImagingSci.Technol.2023.67.6.060405

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2023
 Open access
  Article timeline 
  • received July 2023
  • accepted November 2023
  • published November 2023
1. Introduction
The cameras we use today are largely a variant of the pinhole camera which, according to some scientists, can be traced back to nomadic tribes of North Africa thousands of years ago. A pinhole camera is easy to understand. Many of you have probably seen Gemma Frisius’s drawing [1] (See Figure 1): Light is emitted from the source, propagating through the medium, and finally arriving at a tiny pinhole. Assuming that light travels along a straight line, a scaled and inverted image is formed on the screen. Today’s cameras are certainly more complicated but they arguably follow the same principle, just with slightly more advanced optical elements to bend light and a sensor to record the image intensity.
Figure 1.
A schematic drawing of the pinhole camera made by Gemma Frisius in 1544 [1].
Over the past century, imaging has evolved to a much more diverse form, which we now call computational imaging [2, 3]. Unlike pinhole cameras that aim to produce a sharp image, a computational imaging system co-designs the sensing unit with the algorithm. The signal generated at the junction of sensor and algorithm may not necessarily be a sharp and clean image. It can be an unrecognizable signal, but with the magic of the algorithm, we can recover the image.
However, computational imaging should not be bound to the co-design of hardware and algorithm. In this paper, I want to introduce the concept of computational image formation (CIF). CIF, in my mind, is a branch of computational imaging that focuses on selecting and optimizing the image formation process. This can be a confusing (or even redundant) idea, because the image formation process is defined by either Mother Nature or hardware. If it is Nature, e.g., shot noise due to the Poisson arrival process, we have a very precise equation describing how images are formed. If it is hardware, we have already seen many great examples such as coded aperture, coded exposure, CT, MRI, etc. So, what does CIF mean?
The core substance of CIF is a simulator: a differentiable, fast, and accurate simulator that can be integrated into the image reconstruction framework. You may wonder: “A simulator? We have a ton of simulators in physics, and they are great. Why do we need your simulator?” The attributes of the simulator(s) I would like to discuss here are quite different from what a physics simulator looks like. In physics, a simulator’s only job is to mimic nature, or processes we can observe but cannot control. In CIF, the focus of the simulator is not about matching Mother Nature unconditionally, but about maximizing the image quality of the image reconstruction algorithm. To this end, a simulator in CIF needs to have four properties:
(1) Accurate. It should be accurate enough to mimic the true image formation process. The level of precision is determined by the image reconstruction goal.
(2) Fast. It needs to be fast enough to be used to synthesize training data and/or integrated in the image reconstruction loop.
(3) Benefits reconstruction. It needs to improve the performance of the image reconstruction. To do so, the simulator’s parameters can be updated during inference.
(4) Differentiable. It needs to be differentiable, so that the reconstruction neural network can be trained while having the simulator in the loop.
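To make the four attributes above concrete, here is a minimal, hypothetical sketch of what a CIF simulator interface could look like in Python. The class and method names are my own illustration, not an API from the literature.

```python
from abc import ABC, abstractmethod
import numpy as np

class CIFSimulator(ABC):
    """Hypothetical interface summarizing the four attributes of a CIF simulator."""

    @abstractmethod
    def simulate(self, x: np.ndarray, theta: np.ndarray) -> np.ndarray:
        """Attributes 1-2: produce y_hat = H_theta(x), accurately and quickly enough."""

    @abstractmethod
    def update_theta(self, x_hat: np.ndarray, y: np.ndarray, theta: np.ndarray) -> np.ndarray:
        """Attribute 3: refine the state vector during inference to help reconstruction."""

    @property
    @abstractmethod
    def differentiable(self) -> bool:
        """Attribute 4: whether gradients can be backpropagated through simulate()."""
```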
My goal in this paper is to conceptualize CIF and invite discussions from peers. I want to explain why CIF is important in our time. I will use atmospheric turbulence as a case study, but a few other examples will also be introduced to support my discussions. I will conclude the paper with some thoughts about moving the field forward.
2. Concept of Computational Image Formation
2.1 Pinhole Camera
In a pinhole camera¹, we can think of the camera as a passive device. If we use x ∈ ℝd to denote the ground truth image in the object plane, and y ∈ ℝm to denote the observed image in the image plane, the camera can be mathematically described as a mapping H from x to y:
¹A distinction is made between a conventional camera and a computational camera. As the two names appear similar, the former is referred to as a pinhole camera.
(1)
$\underbrace{y}_{\text{observed image}} = \underbrace{H}_{\text{camera}}\big(\underbrace{x}_{\text{true image}}\big).$
The camera model H can involve lenses, color filter arrays, image sensors, etc.
A post-processing image reconstruction algorithm for a pinhole camera finds the best estimate x̂ by solving a certain optimization (e.g., maximum likelihood estimation or maximum-a-posteriori estimation) or by running the measurement through a neural network. For generality, the reconstruction is considered as an operator R:
(2)
$\underbrace{\hat{x} = R(y, H)}_{\text{reconstruction}}.$
The reconstruction method requires two inputs; the measurement y and the camera model H. As a quick example, for those who do image deconvolution, R can be a total variation minimization:
(3)
$\hat{x} = R(y, H) = \arg\min_{x} \; \|H(x) - y\|^2 + \lambda \|x\|_{\mathrm{TV}},$
where H(x) denotes the blurring operation, and ∥⋅∥TV denotes the total variation norm. We can come up with many other examples which I shall skip for brevity.
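As a hedged illustration of how an operator like Eq. (3) can be realized in practice (this is my sketch, not the paper's implementation), the following PyTorch snippet minimizes the data-fidelity term plus an anisotropic total variation penalty by gradient descent, assuming the blur H is a known convolution kernel:

```python
import torch
import torch.nn.functional as F

def tv_deconvolve(y, kernel, lam=0.01, steps=300, lr=0.1):
    """Minimize ||H(x) - y||^2 + lam * TV(x) by gradient descent (anisotropic TV)."""
    x = y.clone().requires_grad_(True)              # initialize with the blurry image
    k = kernel[None, None]                          # kernel shape (1, 1, kh, kw)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        Hx = F.conv2d(x[None, None], k, padding="same")[0, 0]
        fidelity = ((Hx - y) ** 2).sum()
        tv = (x[:, 1:] - x[:, :-1]).abs().sum() + (x[1:, :] - x[:-1, :]).abs().sum()
        (fidelity + lam * tv).backward()
        opt.step()
    return x.detach()

# usage: x_hat = tv_deconvolve(y, blur_kernel)  # y and blur_kernel are 2D float tensors
```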
The key observation from the above equations is that in a pinhole camera, we perform a capture-then-reconstruct operation. The camera is pre-defined. While we use the model H during reconstruction, we never send any feedback signal to H. Furthermore, the design of H is separated from the reconstruction. If we ask a camera engineer to build a camera H, he/she will never ask whether we are using total variation minimization or a generative adversarial network for image reconstruction. With probability one, his/her goal is to make an H that produces the ideal image, for good reasons.
2.2 Computational Camera
In a computational camera, it is well known that the above isolation of the camera and the algorithm is replaced by a co-design philosophy. To accomplish this goal, I parameterize H with a state vector θ so that the model becomes Hθ.
The presence of the state vector θ makes the overall design interesting, as shown in Figure 2. Instead of solving a two-stage feed forward problem (design H and then design R), the reconstructed image x̂ will be used to guide the design of H so that we can close the loop. By parameterizing R as Rψ, the end-to-end design can be written as
(4)
$(\hat{\psi}, \hat{\theta}) = \arg\min_{\psi, \theta} \; \mathbb{E}_x\Big[\mathrm{ReconLoss}\big(\underbrace{R_\psi(y, H_\theta)}_{\hat{x}}, \; x\big)\Big],$
where ReconLoss (x̂,x) denotes the reconstruction loss between the predicted image x̂ and the ground truth x. I keep the definition of the loss function vague because the choice of the loss function depends on applications. It can be the squared loss, the cross-entropy loss, or any other loss that would make sense for the application.
Figure 2.
In a computational camera, the goal is to configure the camera parameters and the reconstruction’s parameters so that the quality of the final reconstructed image is maximized.
Example 1.
(Coded aperture/lensless imaging) Consider a coded aperture camera where we are interested in recovering the full signal x ∈ ℝd from a coded measurement y ∈ ℝm. The coding scheme we use is a fat matrix H ∈ ℝm×d with m ≪ d, so that the measured signal
$y = H_\theta(x) \overset{\text{def}}{=} \underbrace{Hx}_{\text{encoded signal}} + \underbrace{n}_{\text{noise term}}$
has a lower dimension than the true signal x.
To reconstruct the signal, we create a reconstruction algorithm (e.g., a neural network) that does
x̂=Rψ(y).
Here, we can think of ψ being the weights of the neural network.
To train the neural network and simultaneously find the optimal sensing matrix H, we perform the joint optimization
$(\hat{H}, \hat{\psi}) = \arg\min_{H, \psi} \; \mathbb{E}_{x,n}\big[\mathrm{ReconLoss}\big(R_\psi(Hx + n), \; x\big)\big].$
The expectation is taken with respect to both the signal x and noise n because they are random.
If we consider the hardware feasibility, we can further pose constraints on H. For example, we can require H to be binary so that it can be implemented through the digital micro-mirror devices (DMD).
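A minimal sketch of this joint optimization, assuming a learnable dense sensing matrix H and a small fully connected decoder; the random signals below stand in for a real training set, and the binary/DMD constraint is omitted. This is my illustration, not the code of any cited work.

```python
import torch
import torch.nn as nn

d, m = 256, 64                                   # signal and measurement dimensions
H = nn.Parameter(torch.randn(m, d) / d ** 0.5)   # learnable sensing matrix
decoder = nn.Sequential(nn.Linear(m, 256), nn.ReLU(), nn.Linear(256, d))
opt = torch.optim.Adam([H, *decoder.parameters()], lr=1e-3)

for step in range(1000):
    x = torch.randn(32, d)                       # stand-in for training signals
    n = 0.01 * torch.randn(32, m)                # measurement noise
    y = x @ H.T + n                              # y = Hx + n
    x_hat = decoder(y)
    loss = ((x_hat - x) ** 2).mean()             # ReconLoss, averaged over x and n
    opt.zero_grad(); loss.backward(); opt.step() # update both H and the decoder
```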
2.3 Nature and Simulator
The biggest difference between a computational camera and computational image formation (CIF) is the presence of a simulator. To explain the idea it would be useful to consider the problem of imaging through an undesirable environment, e.g., fog, haze, turbulence, or scattering medium.
Nature. The subject of CIF is often concerned with nature. For the time being let this be called a general degradation process, denoted by G:
(5)
$\underbrace{y}_{\text{observed image}} = \underbrace{G}_{\text{nature}}\big(\underbrace{x}_{\text{true image}}\big).$
The degradation process is unknown to us. We have no idea how each point in the object plane is mapped to a digital value in the image plane. We can in theory run procedures such as ray tracing to find out how each ray is distorted. But there would be infinitely many rays to trace, so realistically we can never perform such calculations. In some situations, such as a random scattering medium, we cannot even trace rays because it is impossible to know how each molecule affects the light.
Example 2.
(Modeling haze) One of the landmark papers in imaging through weather is the dark channel prior by He et al. published in CVPR 2009 [4], although the modeling can be traced back to Fattal [5], Rossum and Nieuwenhuizen [6] and Koschmieder [7].
When light propagates through a scattering medium, the water molecules along the propagation path will cause attenuation of the light. The exact image formation process is governed by nature, and is unknown. We denote it as G.
To simulate the image formation, people use the so-called radiative transport equation by modeling the overall degradation as a sum of the airlight term and the direct attenuation term. This gives us a simulator Hθ:
$H_\theta(x) = x \odot t + A(1 - t),$
where the state vector θ contains the airlight color A and the transmission map t, with ⊙ denoting elementwise multiplication. Note that Hθ is a very good model for G, but Hθ ≠ G.
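A minimal sketch of this haze simulator, assuming a constant airlight A and a toy depth-dependent transmission map; the parameter values are placeholders for illustration only.

```python
import numpy as np

def simulate_haze(x, A=np.array([0.9, 0.9, 0.95]), beta=1.0, depth=None):
    """H_theta(x) = x * t + A * (1 - t), with t = exp(-beta * depth)."""
    h, w, _ = x.shape
    if depth is None:                                  # toy depth map: increases from top to bottom
        depth = np.linspace(0.2, 2.0, h)[:, None] * np.ones((h, w))
    t = np.exp(-beta * depth)[..., None]               # transmission map, shape (h, w, 1)
    return x * t + A * (1.0 - t)

# usage: hazy = simulate_haze(clean_rgb)  # clean_rgb: (h, w, 3) array in [0, 1]
```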
Simulator. Since G is unknown, we need to approximate it using a model Hθ. I call it a simulator. A simulator, by construction, reproduces part of nature. Therefore, the output of the simulator is not y but
(6)
ŷ=Hθ(x).
I emphasize that Hθ is a parameterization of nature. The parameterization is a dimensionality reduction of the operator G from an infinite-dimensional space to a finite (and often low) dimensional space. The choice of the parametric model is often based on physics, but there are also man-made parameterizations. The following is an example.
Example 3.
(Environment parameterization) In Ref. [8], Gnanasambandam et al. proposed the idea of optically perturbing the appearance of an object by using structured illumination. The concept was that given a ground truth image x associated with a class label ℓ, an optical system Hθ can be built such that the distorted image ŷ = Hθ(x) will be misclassified as a different label ℓ′. The optical system was implemented using a projector-camera setup, where the projector illuminates the real 3D object using a calculated pattern. This provides a mechanism to test the resilience of the classifier to physically manipulated attacks.
What is relevant to CIF is that the same optical setup can be used to analyze the robustness against natural perturbations, such as shadow or overcast. These natural perturbations are G, of which the exact forms are unknown to us. Some people use graphics rendering (e.g., for flight simulators) to synthesize the scenes. With the projector-camera system, we can use multiple projectors to parameterize nature with a finite number of tunable knobs that reproduce the outdoor environment in a controlled setting, hence providing an approximation Hθ ≈ G.
The presence of nature and a simulator changes the problem scope from designing a hardware camera to designing a numerical simulator, as shown in Figure 3. Because of its middle-person role between nature and algorithm, the selection and optimization of the simulator is significantly more complicated than configuring a camera, such as deciding which coded aperture pattern to use. An analogy here is that selecting a simulator is like selecting a neural network, whereas configuring a camera is like learning the weights of the neural network.
Figure 3.
In Computational Image Formation (CIF), the true image formation process is determined by nature G. The design task here is to select and optimize a simulator Hθ such that it not only achieves a desirable modeling accuracy, but more importantly maximizes the reconstruction quality.
2.4 Four Attributes of a CIF Simulator
In what follows, I will describe the four attributes of the simulator in CIF.
Attribute 1: Accurate. The first attribute of a CIF simulator Hθ is that it is accurate enough to approximate the true image formation process G. The form of Hθ can be anything. It can be a one-line equation, an algorithmic procedure, or even a neural network. The performance of Hθ is measured in terms of how good the simulated image is when compared to the true image. Mathematically, we define the loss as
(7)
$\mathcal{E}_{\text{sim}}(H_\theta) \overset{\text{def}}{=} \underbrace{\mathbb{E}_x\big[\mathrm{SimLoss}\big(H_\theta(x), \, G(x)\big)\big]}_{\text{simulator loss}}.$
The equation says that for a given image x, the simulated image Hθ(x) needs to be sufficiently similar to G(x). The goodness of Hθ is measured according to the loss function SimLoss. For example, in atmospheric turbulence, SimLoss can be the pixelwise distance between the simulated long/short exposure function and the measured (real) long/short exposure function. It can also be the differential tilt statistic or the z-tilt statistic between the simulation and the theoretically derived statistical average. Or, SimLoss can be measured using target patterns seen through a heat chamber. There are also instruments, such as the scintillometer, to measure the turbulence profile. In any case, SimLoss is an application-specific metric.
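In code, Eq. (7) amounts to a Monte Carlo average of an application-specific SimLoss over sample images. A small sketch, assuming paired clean inputs and real observations are available (the function names are hypothetical):

```python
import numpy as np

def estimate_sim_loss(simulator, paired_data, sim_loss):
    """Monte Carlo estimate of E_x[SimLoss(H_theta(x), G(x))] from (x, real observation) pairs."""
    losses = [sim_loss(simulator(x), y_real) for x, y_real in paired_data]
    return float(np.mean(losses))

# example SimLoss: pixelwise mean-squared error between simulated and measured images
mse = lambda a, b: float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))
```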
Attribute 2: Fast. The second attribute of a CIF simulator is that it needs to be fast so that we can integrate it as a part of the reconstruction algorithm. But speed depends on which computing platform we use. (See discussions in the next paragraph.) A slightly better way is to measure the complexity of the model, with the following notation
(8)
$\mathcal{E}_{\text{complexity}}(H_\theta).$
The definition of the complexity can take various forms. If Hθ is implemented using a deep neural network, then the complexity can be measured in terms of the number of hidden layers, width of hidden layers, number of parameters, number of filters, size of the filters, etc. The complexity of Hθ can also be viewed at two levels: (1) The intrinsic capacity of the model and (2) The effective model capacity given a specific choice of θ.
Model complexity is usually linked to speed (aka runtime), but some caveats should be noted. For example, 2D convolutions today can be efficiently implemented because of the dedicated hardware architectures on graphics processing units. In contrast, depth-wise 3D convolutions are substantially slower even if the number of parameters is the same as a 2D convolution because of the lack of specialized hardware. While this problem will likely be solved in the near future, the complexity is not translated to runtime directly.
Attribute 3: Benefits Reconstruction. The third attribute of a CIF simulator is that it needs to benefit the reconstruction performance, because in CIF the ultimate goal is to maximize the quality of the final image produced by the entire system.
The reconstruction loss is abstractly defined as
(9)
$\mathcal{E}_{\text{recon}}(H_\theta) \overset{\text{def}}{=} \underbrace{\min_{R_\psi} \; \mathbb{E}_x\big[\mathrm{ReconLoss}\big(R_\psi(y, H_\theta), \, x\big)\big]}_{\text{reconstruction loss}}.$
The reconstruction loss in Eq. (9) is reminiscent of Eq. (4), but the goals are very different. In Eq. (4), the camera Hθ is already chosen except for its parameter θ. Thus, the joint minimization in Eq. (4) is to simultaneously optimize for the reconstruction (realized via a neural network) and the state vector θ of the camera. In contrast, the minimization problem in Eq. (9) says that we do not even know which simulator to use. Using atmospheric turbulence as an example (see Section 3), the decision could be the choice between two completely different models, e.g., the pixel-jitter model or the deformable model. Therefore, in CIF simulator selection, the difficulty is how to build a simulator that fulfills the criteria instead of adjusting the parameters of some simple equations.
Attribute 4: Differentiable. The fourth attribute of a simulator is that it needs to be differentiable. The metric for differentiability can be a simple indicator function:
(10)
$\mathcal{E}_{\text{diff}}(H_\theta) \overset{\text{def}}{=} \begin{cases} 0, & \text{if } H_\theta \text{ is differentiable}, \\ \infty, & \text{if } H_\theta \text{ is not differentiable}. \end{cases}$
Why do we need differentiability? The reason is that most, if not all, image reconstruction algorithms today are based on deep neural networks. For the simulator to be part of the reconstruction framework, it is necessary that the gradient of the loss function can be backpropagated to the input. Therefore, being able to take the gradient of the simulator with respect to its model parameters becomes essential.
Enforcing differentiability can be realized in multiple ways. The easiest way, of course, is to build a simulator using a deep neural network. However, speaking from experience, physics-based simulators often offer significantly better guarantees of the theoretical properties and are more interpretable. Yet, physics-based simulators are often complicated, and enforcing differentiability is not a trivial task. For example, iterative procedures such as Newton’s method, which is widely used to find the intersection of a ray and the lens surface, need to be replaced by a non-iterative alternative before we can make the simulator differentiable.
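The role of differentiability is easiest to see in a standard training loop: the simulator sits between the clean image and the reconstruction network, so the gradient of the loss must flow through it. A hedged sketch, assuming both the simulator and the reconstruction are PyTorch modules and the optimizer holds the parameters of both:

```python
import torch

def train_step(x, simulator, recon_net, optimizer):
    """One training step with the simulator in the loop; the simulator must support autograd."""
    y_hat = simulator(x)                     # forward model H_theta(x)
    x_hat = recon_net(y_hat)                 # reconstruction R_psi
    loss = ((x_hat - x) ** 2).mean()         # ReconLoss(x_hat, x)
    optimizer.zero_grad()
    loss.backward()                          # gradient flows through recon_net AND the simulator
    optimizer.step()
    return loss.item()
```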
2.5 Simulator Selection and Parameter Optimization
After stating the four attributes of the simulator, I would like to elaborate on the simulator selection problem and the parameter optimization problem. The two problems are different. The former is more about building a simulator that fulfills the criteria. The latter assumes that the candidate simulator has already been decided, but during the inference process, we would like to update the state vector to better match the observation. A simple analogy is that simulator selection picks the number of partitions in a spatially varying blur system (i.e., how many pixels share one blur kernel), whereas the parameter update assumes that the partition is already fixed and we need to estimate the individual blur kernels.
Simulator Selection. The simulator selection problem can be formulated as a constrained optimization.
(11)
$\arg\min_{H_\theta} \;\; \mathcal{E}_{\text{recon}}(H_\theta) + \mathcal{E}_{\text{diff}}(H_\theta) \quad \text{subject to} \quad \mathcal{E}_{\text{complexity}}(H_\theta) \le \tau_{\text{complexity}}, \;\; \mathcal{E}_{\text{sim}}(H_\theta) \le \tau_{\text{sim}}.$
The objective function of the problem is the reconstruction loss Erecon(Hθ) plus the indicator function Ediff(Hθ). Since by construction the simulator needs to be differentiable, the indicator function will serve the purpose of only allowing differentiable simulators to be considered. The main objective is therefore the reconstruction loss.
There are two constraints in the optimization. The complexity constraint limits how complex the simulator can be. The simulator loss constraint controls the match between the simulator Hθ and the ground truth model G. The reason why these two criteria are listed as constraints instead of the objective is that in practice, the simulator designer would seldom navigate the four criteria simultaneously when choosing the simulator. It is far more common to develop a simulator with a certain level of complexity and simulation accuracy in mind, while leaving some design freedom to the reconstruction algorithm.
I should stress that while the simulator selection problem appears like an ordinary constrained optimization, the problem is never solved using any gradient-based method. Instead, the problem is more often solved manually based on experience and creativity. So it is more of an art.
Parameter Optimization. To stimulate the parameter update discussion, it is best to first consider an example.
Example 4.
(Blind deconvolution) In the classical blind image deconvolution problem, we are often interested in the alternating minimization
$h^{k+1} = \arg\min_{h} \; \|h \circledast x^k - y\|^2 + \mathrm{prior}(h),$
$x^{k+1} = \arg\min_{x} \; \|h^{k+1} \circledast x - y\|^2 + \mathrm{prior}(x),$
where prior(⋅) is a generic notation representing the prior model of the respective variable.
The first equation, put into the CIF framework, can be rewritten as
$\theta^{k+1} = \arg\min_{\theta} \; \mathrm{SimLoss}\big(H_\theta(x^k), \, G(x)\big).$
Here, G is the true image formation process given by nature, and Hθ(x) = h ⊛ x + n is the approximation. The second equation can be written as
$x^{k+1} = R_\psi\big(y, H_{\theta^{k+1}}\big),$
which is a generic form of the image reconstruction method.
To be precise about the state vector θ, one can think of it as the parametric description of the forward model Hθ. For example, in Ref. [9], key points are used to characterize the motion where a continuous curve is drawn to connect the key points to generate a motion blur kernel. The key points are coordinates in the 2D space, and the state vector θ is the collection of all key points.
The above example illustrates the typical procedure of updating the simulator’s parameters and estimating the latent image. These two steps can be summarized by the following pair of iterative updates:
(12)
$x^{k+1} = R_\psi\big(y, H_{\theta^{k}}\big),$
(13)
$\theta^{k+1} = \arg\min_{\theta} \; \mathrm{SimLoss}\big(H_\theta(x^{k+1}), \, G(x)\big).$
As mentioned earlier, this pair of updates is performed after the simulator has been selected.
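A hedged sketch of this alternating scheme is shown below. Since G(x) is only accessible through the measurement y, the SimLoss in the θ-update is approximated here by a data-fit between the re-simulated image and y; the `recon` and `fit_theta` callables are hypothetical placeholders for Eqs. (12) and (13).

```python
def alternate(y, simulator, recon, fit_theta, theta0, iters=10):
    """Alternate between reconstruction (Eq. 12) and simulator parameter update (Eq. 13)."""
    theta = theta0
    x_hat = y                                      # crude initialization with the observation
    for _ in range(iters):
        x_hat = recon(y, simulator, theta)         # Eq. (12): x_{k+1} = R_psi(y, H_{theta_k})
        theta = fit_theta(simulator, x_hat, y)     # Eq. (13): theta_{k+1} reduces SimLoss against y
    return x_hat, theta
```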
2.6 Discussions
Ideal G? At this point, it should be clear why a perfect simulator Hθ = G may not be a good simulator in CIF. If G is intrinsically complicated, then matching it would require a complex Hθ. This will violate Attribute 2 (fast), and possibly Attribute 4 (differentiable). There is also a possibility that Attribute 3 (benefits reconstruction) suffers, because a complex Hθ often leads to an ill-posed inverse problem. The reconstruction model Rψ has to be even more complex in order to invert the effect of G. This means that we have a good forward model, but we cannot solve the inverse problem.
Compared to a physics simulator. Compared to a pure physics simulator, a CIF simulator has additional emphasis on speed, reconstruction, differentiability and compatibility with the reconstruction algorithm. This leads to the summary in Table I.
Table I.
Comparison between a physics simulator and CIF simulator.
| | Physics simulator | CIF simulator |
| --- | --- | --- |
| Goal | Describe nature | Help image reconstruction |
| Accurate? | As much as possible, because we need it to reproduce nature. | No need to be absolutely accurate; as long as it meets a certain level, it is okay. |
| Fast? | Fast is certainly welcome, but slow is also okay if it is justified by the application. | Needs to be very fast, for synthesizing datasets and integrating with inverse algorithms. |
| Benefits reconstruction? | Irrelevant, because the simulator's goal is to reproduce nature. | Simulator in the reconstruction loop is the key of CIF. |
| Differentiable? | Irrelevant; not needed. | Needed, especially if we want to co-optimize reconstruction and simulator. |
Compared to computational camera. How is CIF different from a computational camera? In a computational camera, the optimization variable is the hardware camera. For example, we adjust the lens parameters, the mask in coded aperture, or the pattern of the exposure multiplexing scheme. CIF can, in principle, be a superset because these hardware elements can always be described by mathematical models. However, the bigger difference (and also the problem context) is that CIF is more relevant to situations where G is not easily parameterized by simple equations. Weather is the best example to think about. In those cases, choosing which Hθ to use is more relevant than optimizing θ. In short, CIF is not only about the co-optimization of the image formation elements, but also about the selection of the image formation model.
Compared to model selection literature. The subject of simulator selection is in many ways similar to the classical topic of model selection [10]. The core difference, however, is that in model selection, the models are more or less simple equations; for example, the 1-norm (LASSO) or the 2-norm (ridge). In CIF, the selection of the simulator can be drastically more difficult. Oftentimes, there may not be any readily available simulators to “choose from”. The designer needs to develop a simulator that matches the complexity and simulation loss targets while maximizing the image reconstruction performance. This is not a simple gradient search problem, but more of a design problem.
Because of the different goals and different problem settings, the methodologies used in model selection largely do not apply. For example, tools such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) are not applicable because (i) even though there is an underlying probability distribution of the true data, we do not know what it is, (ii) most simulators are complex numerical procedures, and analyzing them or computing the quantities required by AIC and BIC is not feasible, and (iii) there exist domain-specific assessment techniques that are much more reliable and direct than AIC and BIC.
Trade-off. Because of the conflicting goals in the simulator selection problem, it would be useful to see the tension via the illustration in Figure 4. In this figure, the y-axis defines the reconstruction loss whereas the x-axis defines the simulator loss (or complexity). The question of CIF is about which point on the trade-off curve to pick.
Figure 4.
A conceptual illustration of the trade-off between the performance of the simulator and the image reconstruction.
On the far right of the figure, the simulator is too simple to be of any use. Even though the reconstruction is easy, the reconstructed image will not be good. Moving slightly to the left, the simulator is improved and the reconstruction is also improved, thanks to a more meaningful simulator. The benefit can come from data synthesis (i.e., generating a good training dataset) as well as from an in-the-loop inverse algorithm. The optimal operating point is somewhere in the middle, where the simulator is reasonably accurate and the reconstruction is also decent. As the simulator moves toward the left-hand side, the reconstruction becomes more difficult. Not only do we have a harder time synthesizing enough data because the simulation can be time-consuming, but we also have a harder time putting the simulator in the reconstruction loop. At the far left, the simulator is extremely good but it offers little to no help to the reconstruction. Thus, the performance can be very bad. In Fig. 4, a colored region is marked with a cutoff τ. Any simulator that gives a performance better than τ will be inside the region. Therefore, among all the possible simulators, only those that meet the simulator criteria can be adopted for reconstruction.
3. Case Study: Imaging through Turbulence
Now that we have seen what CIF means, it is time to look at a concrete example. I will use turbulence as an illustration because of the richness of the literature. To give you an idea of the accuracy of each model, the number of stars ⋆ is used to indicate how accurate the simulator is. A simulator with one star (⋆) is cheap but inaccurate, whereas a simulator with five stars (⋆⋆⋆⋆⋆) is extremely accurate. The number of stars is partially objective because we can measure the quality, but it is also partially subjective because much of this is based on my experience.
3.1 Wave Propagation Physics
Our atmosphere is a complicated medium because of its turbulent nature. Factors such as temperature, wind velocity, humidity, and other weather conditions can affect the strength of turbulence. As light propagates, the random index of refraction will cause phase delays along the optical paths, leading to pixel displacements and blurs [11, 12]. The exact image formation for one specific turbulent instance cannot be determined because it is stochastic.
Split-step (“Gold standard”) (⋆⋆⋆⋆⋆). Let me start by discussing the most accurate model so that we have a reference point. In physics, the most accurate model for atmospheric turbulence, to date, is the split-step propagation [13]. The model says that we can partition the propagation path into a sequence of segments where each segment has a random phase screen determining the phase distortion. The propagation is performed by integrating the electromagnetic field u, the squared magnitude of which produces the image |u|².
Without diving into the technical details, we can think of the split-step propagation as a sequence of operations along the light propagation path. At the ith segment, the electromagnetic field propagates according to the equation
(14)
$u_{i+1} = \mathrm{Fresnel}\Big(\underbrace{u_i}_{i\text{th field}} \odot \underbrace{e^{j\phi_i}}_{i\text{th phase screen}}\Big),$
where ⊙ denotes elementwise multiplication, Fresnel(⋅) denotes the Fresnel diffraction integral [14, 15], and ϕi denotes the ith phase screen drawn from the Kolmogorov power spectral density [16–18]. (Assuming a “typical” situation with a small to moderate Rytov number so that we can ignore the amplitude effect, and the power spectral density roughly follows the 5∕3-power law predicted by Kolmogorov [19].) The process is nonlinear because for every pixel in the object plane, the equation has to be evaluated repeatedly (which includes Fourier transforms, phase cropping, and multiplication), as illustrated in Figure 5. For completeness, the equation is written as
(15)
$\hat{y} = \Big|\underbrace{\mathrm{Fresnel}\big(\mathrm{Kolmgrv}\big(\cdots\mathrm{Fresnel}\big(\mathrm{Kolmgrv}(x)\big)\cdots\big)\big)}_{H_\theta^{\text{split}}(x)}\Big|^2,$
where Hθsplit denotes the split-step model, with θ defining the random phase screens along the propagation path.
Figure 5.
The split-step propagation model Hθsplit is considered as the “gold standard” in modeling atmospheric turbulence. While the model is accurate, its numerical wave propagation equation makes the implementation extremely difficult and very slow. Thus, it is a good simulator for mimicking nature, but not so much for image restoration.
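For readers who want a feel for how such a simulator is organized, below is a drastically simplified, uncalibrated sketch of the split-step recipe of Eqs. (14)–(15): angular-spectrum (Fresnel) propagation alternating with random phase screens drawn from a Kolmogorov-like spectrum. The constants and scalings are illustrative placeholders, not a substitute for the validated packages in [20–30].

```python
import numpy as np

rng = np.random.default_rng()

def kolmogorov_screen(n, r0=0.1, dx=0.01):
    """Toy random phase screen with a Kolmogorov-like -11/3 power spectrum (uncalibrated)."""
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    f = np.hypot(FX, FY)
    f[0, 0] = fx[1]                                   # avoid division by zero at the DC bin
    psd = 0.023 * r0 ** (-5.0 / 3.0) * f ** (-11.0 / 3.0)
    c = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) * np.sqrt(psd)
    return np.real(np.fft.ifft2(c)) * n               # scaling is illustrative only

def fresnel(u, wavelength, dz, dx):
    """Angular-spectrum (Fresnel) propagation of a complex field u over a distance dz."""
    fx = np.fft.fftfreq(u.shape[0], d=dx)
    FX, FY = np.meshgrid(fx, fx)
    transfer = np.exp(-1j * np.pi * wavelength * dz * (FX ** 2 + FY ** 2))
    return np.fft.ifft2(np.fft.fft2(u) * transfer)

def split_step(x, n_screens=5, wavelength=525e-9, dz=200.0, dx=0.01):
    """Toy version of Eqs. (14)-(15): y_hat = |Fresnel(... exp(j*phi_i) ...)|^2; x is square."""
    u = np.sqrt(x.astype(np.complex128))              # field magnitude from image intensity
    for _ in range(n_screens):
        phi = kolmogorov_screen(x.shape[0], dx=dx)
        u = fresnel(u * np.exp(1j * phi), wavelength, dz, dx)
    return np.abs(u) ** 2
```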
For decades, physicists have agreed that the split-step propagation is very reliable and accurate for simulating atmospheric turbulence. With a sufficient number of phase screens along the path, and assuming that the turbulence is at most moderately strong, it is safe to say that
(16)
$G(x) \approx H_\theta^{\text{split}}(x).$
Various simulation packages have been built [20–30], and they are widely used to study optical communication, high-energy beam propagation, and astronomical applications. There are also attempts to use these simulators for image processing tasks, but scholars generally find split-step too slow for any real-time application. The bigger problem arises if we want to use Hθsplit for image restoration. Let’s see why.
3.2 Image Restoration
Split-step propagation is never used as part of the image restoration pipeline. I will illustrate this using two reconstruction methods.
The first approach is the optimization approach. Using total variation as one of the (many) examples, we have
(17)
$\hat{x} = R_\psi(y) = \arg\min_{x} \; \|H_\theta^{\text{split}}(x) - y\|^2 + \lambda \|x\|_{\mathrm{TV}}.$
But solving this problem can be difficult because Hθsplit is complicated. If we use variable-splitting techniques such as the ADMM algorithm, we have no way of computing the proximal map because the inverse (or regularized inverse) (Hθsplit)⁻¹ is not possible to obtain. Note that I am not even talking about estimating the state vector θ, which would add more difficulties to the problem.
Another approach is deep learning. The idea is to train a deep neural network Rψ that performs
(18)
x̂=Rψ(y,Hθsplit).
However, the problem is in generating the clean-noisy pair for training. We can send the clean image through Hθsplit, and repeat the process until we have created a dataset of training samples. This is in theory doable, but since the simulator is so slow, there is no way we can synthesize enough data [31].
There is one more issue with deep learning. If we want to incorporate Hθsplit into the reconstruction neural network Rψ, the simulator needs to be differentiable. Since the split-step simulation is a serial chain of complex operations going back and forth with Fourier transforms per pixel, backpropagating the gradient of the loss function through the simulator is nearly impossible unless we introduce additional approximations.
3.3 Lightweight Simulators
Having discussed the “gold standard” split-step simulator, let us move on to something faster.
Pixel Jitter Model [20]. (⋆) The simplest simulator reported in the literature is the pixel shift + blur model. This model says that
(19)
$\hat{y} = \underbrace{h \circledast \big(\mathrm{jitter}(x)\big)}_{H_\theta^{\text{jitter}}(x)},$
where jitter(⋅) denotes the pixel jittering, in which the displacement at each pixel is an independent and identically distributed (i.i.d.) random vector. The operator h is a spatially invariant Gaussian blur, and ⊛ is the convolution. Figure 6 shows an example. Hθjitter is very cheap and fast: just draw a random i.i.d. motion vector field (pick any distribution, e.g., Gaussian), and convolve the image with an invariant blur kernel. The state vector θ contains the blur and the motion vector field.
Figure 6.
Random jitter and spatially invariant Gaussian blur. The simulated turbulence effect is quite far from the reality.
Needless to say, inverting Hθjitter is easy, at least in the non-blind case where we know the random vector field and the blur: just remove the blur using any off-the-shelf non-blind deconvolution algorithm, and then undo the pixel displacement. We can plug it into a neural network. We can also plug it into a classical optimization so we can solve the proximal map. The only caveat (and a big caveat) is that Hθjitter ≠ G, not even close.
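A sketch of the pixel jitter + invariant blur model; the jitter and blur magnitudes below are arbitrary illustration values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def jitter_blur(x, sigma_jitter=1.5, sigma_blur=2.0):
    """H_theta^jitter: i.i.d. per-pixel random displacement, then a spatially invariant Gaussian blur."""
    rng = np.random.default_rng()
    h, w = x.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    dr, dc = sigma_jitter * rng.standard_normal((2, h, w))        # i.i.d. Gaussian jitter field
    jittered = map_coordinates(x, [rows + dr, cols + dc], order=1, mode="reflect")
    return gaussian_filter(jittered, sigma_blur)

# usage: y_hat = jitter_blur(clean_gray)   # clean_gray: (H, W) float array
```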
Deformable Model [32–36]. (⋆⋆) A slightly better version of the simulator is the deformable grid model. Instead of assuming an i.i.d. random displacement field, we select a small set of anchor points in the image grid and perform a non-rigid deformation. The overall equation is
(20)
$\hat{y} = \underbrace{h \circledast \big(\mathrm{deform}(x)\big)}_{H_\theta^{\text{deform}}(x)}.$
The benefit of the deformable model is that it is physically more meaningful because the pixel displacement caused by turbulence is spatially correlated. The deformable model provides a way to enforce the spatial correlation. Figure 7 illustrates an example, where we can see that the deformable grid is more structured. The sampling process is reasonably easy because we only need to identify a few anchor points.
Figure 7.
Deformable field and spatially invariant Gaussian blur. The simulated turbulence effect is better than the random jitter, but still not accurate enough to match the real turbulence.
The deformable grid facilitates the estimation of the pixel displacement because the number of anchor points is finite. The drawback, though, is that the blur remains spatially invariant. It is still valid for isoplanatic turbulence, but definitely not true for anisoplanatic turbulence where the blur must be spatially varying.
Varying Blur Model. (⋆⋆⋆, incomplete) To model anisoplanatic turbulence, people propose to adopt a spatially varying blur. Keeping the deformable grid idea (which can be replaced by other pixel displacement models), the output image is defined on a per-pixel basis:
(21)
$\hat{y}_i = \underbrace{h_i \circledast \big(\mathrm{deform}(x)\big)}_{H_\theta^{\text{varying}}(x)},$
where i denotes the ith pixel of the image ŷ and hi denotes the blur for the ith pixel. Because of the per-pixel blur, this model can now describe a wider set of turbulence effects. However, the generation of these per-pixel blurs remains unclear (that’s why I mark it as “incomplete”). In the example blur kernels I show in Figure 8, I use the phase-over-aperture model (to be discussed in the next subsection) to generate the spatially varying blur. These blurs are then integrated into this model Hθvarying to perform the distortion.
Figure 8.
Spatially varying blur kernels used in Hθvarying. The kernels are generated using the phase-over-aperture model. Note that the shape of the blur changes from one location to another.
The inverse problem associated with Hθvarying is not easy, because spatially varying blurs do not have simple inverses using standard tools such as the Fourier transform. Therefore, additional approximations are needed to reduce complexity.
Flipped Model [35, 37, 38]. (⋆⋆⋆) Because of the difficulty of inverting the spatially varying blur, a work-around has been proposed by observing that turbulence-distorted images exhibit a lucky effect. From time to time, and from pixel to pixel, there are instants where the turbulence distortion is weak. The lucky pixels/patches existing in the raw video input are blur-free and displacement-free. Therefore, if we can identify these lucky patches, then we can mitigate the majority of the turbulence effects, leaving only the diffraction-limited blur. This two-step procedure can be summarized as
(22)
$\hat{x} = \underbrace{\mathrm{deblur}\big(\mathrm{lucky}(y)\big)}_{\text{reconstruction}}.$
There are many variations of this approach, such as [32, 36, 39–41].
The two-step procedure inspired researchers to consider an alternative model where we flip the order of blur and pixel displacement in the simulator. This gives us the flipped model:
(23)
$\hat{y}_i = \underbrace{\mathrm{deform}\big(h_i \circledast x\big)}_{H_\theta^{\text{flipped}}(x)},$
which according to [42], is called the blur-tilt model. The blur-tilt model is not the same as the tilt-blur model because it can be proved that wave propagation follows tilt-blur but not blur-tilt. However, the blur-tilt model is easier from the angle of inverse problem because inverting the deformation can be realized by lucky imaging algorithms.
Figure 9.
Two models: Tilt-then-blur Hθvarying and blur-then-tilt Hθflipped. They generate similar result for most regions but not for edge pixels. As shown in [42], Hθflipped has a tendency to “destroy” the blurs.
Summary. As you can see, the consideration of which Hθ to use is not a simple question of matching physics. If it were, then any of the cheap models would have no value. But the fact that they have been used in the literature and have sometimes generated satisfactory results speaks to their validity. They are justified not from the angle of physics, but from the angle of image reconstruction.
3.4 Differentiable Simulator
Over the past few years, there has been an increasing amount of effort to build faster and more accurate turbulence simulators with the goal of maximizing the image reconstruction performance. One of these efforts is a series of works on random sampling in the phase domain. In this subsection, I briefly comment on their basic principles and highlight a few key attributes, particularly differentiability and scalability.
Phase-over-aperture [43] (⋆ ⋆ ⋆  ⋆, not differentiable) One of the biggest hurdles of the spatially varying blur model Hθvarying is that there is no simple way of constructing the spatially varying blur kernels while satisfying the turbulence physics. In addition, the pixel displacement motion field is also not a simple deformable grid. These two limitations require a new method to model the turbulence.
The idea behind the phase-over-aperture model, proposed in Ref. [43], was to recognize that the overall distortion (blur+displacement) is constructed from the Fourier magnitude square of the phase:
(24)
$h_i = \big|\mathrm{Fourier}\big(e^{j\phi_i}\big)\big|^2,$
where ϕi is the phase function of the ith pixel. The displacement motion vector can be obtained by identifying the centroid of hi: the offset of the centroid is marked as the pixel displacement vector. Therefore, for every pixel i, there is a one-to-one mapping from the spatial domain to the phase domain, as illustrated in Figure 10.
Figure 10.
Phase-over-aperture model assumes that every point spread function has a corresponding phase function. Thus, as long as we can generate the phase functions efficiently, we can construct the point spread function without using numerical propagation techniques.
For any i, the phase function ϕi has a basis representation using the Zernike polynomials:
(25)
$\underbrace{\phi_i}_{\text{phase function}} = \sum_{j=1}^{M} \underbrace{\alpha_{i,j}}_{\text{coefficients}} \; \underbrace{Z_j}_{\text{Zernike polynomials}}.$
Since the Zernike polynomials are known functions, the blur kernel hi is constructed as soon as the coefficients α = {αi, j | i = 1, …, N, j = 1, …, M} become available.
The construction of the coefficient αi, j can be done by sampling it from an enormous covariance matrix if we follow Tatarskii’s assumption that the underlying random process is Gaussian [11]. In this case, by following the tedious derivations shown in Ref. [43], the coefficients can be obtained by
(26)
$\alpha \sim \mathcal{N}\big(0, \; \mathbb{E}[\alpha\alpha^T]\big),$
where E[ααT] denotes the correlation matrix. The correlation matrix can be derived from the turbulence statistics, for which there are formulae to employ. Depending on how much turbulence statistics is utilized, α can maintain some degree of spatial correlation from one pixel in the scene to another, and from one Zernike basis to another. If the turbulence condition changes, then the correlation matrix changes, and so the random coefficients change too.
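A hedged sketch of this sampling step: correlated Zernike coefficients are drawn via a Cholesky factor of the correlation matrix, and each PSF follows from Eqs. (24)–(25). The correlation matrix, the precomputed Zernike basis, and the aperture mask are assumed to be given; this does not reproduce the exact formulation of [43].

```python
import numpy as np

def sample_psfs(corr, zernike_basis, aperture, n_psf, rng=None):
    """Sample correlated Zernike coefficients (Eq. 26) and build PSFs via Eqs. (24)-(25).

    corr          : (n_psf*M, n_psf*M) correlation matrix E[alpha alpha^T] (assumed given)
    zernike_basis : (M, n, n) precomputed Zernike polynomials on the aperture grid
    aperture      : (n, n) binary pupil mask
    """
    rng = rng or np.random.default_rng()
    M = zernike_basis.shape[0]
    L = np.linalg.cholesky(corr + 1e-8 * np.eye(corr.shape[0]))  # small jitter for numerical safety
    alpha = (L @ rng.standard_normal(corr.shape[0])).reshape(n_psf, M)
    psfs = []
    for a in alpha:                                              # one PSF per image location
        phi = np.tensordot(a, zernike_basis, axes=1)             # phi_i = sum_j alpha_ij Z_j
        field = aperture * np.exp(1j * phi)
        psf = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2   # h_i = |Fourier(e^{j phi_i})|^2
        psfs.append(psf / psf.sum())
    return alpha, np.stack(psfs)
```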
Figure 11 shows a random motion field generated by the phase-over-aperture model, as well as the resulting image.
Figure 11.
Phase-over-aperture [43]. (a) The displacement motion field, and (b) The resulting image. Note that the spatially varying blur in Figure 9 is generated using this phase-over-aperture model.
The bottom line of the simulator is that it turns the split-step propagation equation into a random sampling method. This gives us a model
(27)
$\hat{y}_i = \underbrace{h_i \circledast \big(t_i \rhd x_i\big)}_{H_\theta^{\text{phase}}(x_i)},$
where ti is the pixel displacement vector that can be extracted from the first two Zernike polynomial bases, and the operator ⊳ denotes the pixel displacement operation.
Phase to Space (⋆ ⋆ ⋆  ⋆, differentiable) [31]. The improvement brought by the phase-over-aperture model is substantial. But in order for it to become useful for image restoration, the speed needs to be further improved and the simulator needs to be differentiable.
To enable a differentiable simulator while being fast, scholars recognized that the blur can be efficiently represented in a low-dimensional space using fixed and pre-defined basis functions, as follows:
(28)
$h_i = \sum_{\ell=1}^{L} \beta_{i,\ell} \, \varphi_\ell.$
Here, φℓ is the ℓth basis function of the blur, and βi,ℓ is the ℓth coefficient for the ith blur kernel. A common choice of the basis functions is the Gaussian basis and its derivatives, although the authors of [31] showed that one can also perform principal component analysis (PCA) to construct the basis functions.
Why do we need to write hi as a linear combination of the basis functions? The reason is that we can now represent all the hi’s using the coefficient vector β = {βi,ℓ | i = 1, …, N, ℓ = 1, …, L}. Then, as long as the relationship between α and β is established, we can go back and forth between the two representations.
(29)
$\alpha \;\; \xrightarrow{\;\;\text{Phase to Space}\;\;} \;\; \beta.$
The forward mapping from α to β is called the phase-to-space transform (P2S) [31].² P2S is implemented using a very shallow neural network because the dimensionality of the input is typically α ∈ ℝ36 whereas that of the output is around β ∈ ℝ100. Thus, a shallow 3-layer fully connected network is sufficient to learn the mapping. Figure 12 illustrates the conceptual diagram of P2S.
²The inverse mapping from β to α is known as the space-to-phase transform (S2P). The implementation of S2P remains an open problem to date.
Figure 12.
Phase-to-space (P2S) simulator HθP2S defines a mapping from the phase domain to the spatial domain using two different forms of linear representations. The mapping is implemented using a shallow neural network, and hence the model is differentiable.
With P2S, the turbulence model is now fully differentiable. This is because the distorted pixel can be obtained through the linear equation
(30)
$\hat{y}_i = \underbrace{\sum_{\ell=1}^{L} \beta_{i,\ell}\, \big(\varphi_\ell \circledast x\big)_i}_{H_\theta^{\text{P2S}}(x_i)},$
where θ = β. As long as β is known, the simulator only needs to perform the basis convolutions. Since β is obtained from α via P2S, the simulation model is completely characterized by the Zernike coefficients α. Moreover, since the P2S model HθP2S is linear in β, any gradient with respect to β can be computed. Even if we need to compute the gradient with respect to α, we can do so by backpropagating the gradient through the P2S neural network (which is a three-layer fully connected network).
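A hedged sketch of the P2S idea: a small fully connected network maps Zernike coefficients α to PSF-basis coefficients β, and the distorted pixel is a per-pixel linear combination of basis-convolved images as in Eq. (30). The layer widths loosely follow the dimensions quoted above; everything else (shapes, basis filters) is illustrative and not the implementation of [31].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

p2s = nn.Sequential(nn.Linear(36, 64), nn.ReLU(),    # alpha in R^36  ->  beta in roughly R^100
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 100))

def apply_p2s(x, alpha, basis):
    """x: (1,1,H,W) image; alpha: (H*W, 36) per-pixel Zernike coeffs; basis: (100,1,k,k), k odd."""
    beta = p2s(alpha)                                           # (H*W, 100), differentiable in alpha
    stacks = F.conv2d(x, basis, padding=basis.shape[-1] // 2)   # (1, 100, H, W) basis convolutions
    stacks = stacks.flatten(2).squeeze(0).T                     # (H*W, 100): each pixel, each basis
    y = (beta * stacks).sum(dim=1)                              # Eq. (30): per-pixel combination
    return y.reshape(x.shape[-2:])
```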
Further Improvements. There are additional modifications scholars have proposed to improve the speed and generality of the P2S simulators. For example, in [44], it was shown that the off-diagonal blocks of the correlation matrix E[ααT] can be truncated without hurting the performance. In [45], it was shown that with a carefully designed reformulation, the simulator can be generalized to an arbitrary turbulence profile along the path (i.e., the structure constant Cn² changes with respect to the distance). Neither change affects the differentiability of the simulator, but they make the simulator even faster and more accurate. Another modification is the recognition of the scattering/gathering forms of convolution in [46]. It was shown that for the physics to be valid, the convolution needs to be implemented in the scattering form.
3.5 Simulator-in-Reconstruction
An important aspect of CIF is to use simulators in reconstruction. The goal of this subsection is to explain how this can be (has been) achieved.
Simulator as Synthesizer for Training Data. The most straightforward application of the simulators is to use them to synthesize training data. Starting with a collection of clean images, the simulator generates the turbulence effect to produce a distortion-clean pair.
One often overlooked usage of a simulator is its flexibility in synthesizing data according to the needs of the training:
Multiple scales: A simulator can synthesize turbulence across different resolutions. For example, for a fixed propagation distance and object size, a lower-resolution image requires a smaller displacement vector and a smaller blur kernel. These images are often easier to restore. If the simulator can produce multi-resolution distortions, the overall restoration will benefit.
Tilt-free, blur-free, all-free: A simulator can generate tilt-only distortions, blur-only distortions, or no distortion. This provides a powerful way to disentangle the coupled effects of turbulence and object movement. For approaches such as knowledge distillation or student-teacher learning, the decoupling capability offered by a simulator is the key enabler because there are no alternative ways to train those models.
How much improvement does a good simulator offer when compared to a bad simulator? Various studies have reported a consistent observation that the difference is significant [47, 48]. Figure 13 shows a comparison between TSRWGAN [49] and a variant of P2S [47], reported in [48]. A common neural network architecture is chosen, and it is trained using two different datasets. TSRWGAN uses a more rudimentary simulator with a simple deformable grid and blurs, whereas P2S is more advanced. The results shown in Fig. 13 provide strong evidence that a better simulator indeed makes a big difference in terms of image restoration.
Figure 13.
How much difference does a simulator make? This figure shows the reconstruction result of a network trained using datasets generated by two different simulators: TSRWGAN [49] and P2S [47]. With a fixed network architecture, the effect of the simulator is evident.
Simulator inside a Generative Adversarial Network. A simulator can be directly used in image restoration, for example through a generative adversarial network (GAN) [50, 51]. In the GAN setting, the simulator can be used as part of the generative branch to synthesize what a distorted image should look like. This mirrors nature, where the image formation is determined by physics and the image sensor. The performance of the simulator has a direct impact on the performance of the GAN. If the simulator fails to mimic nature, then the simulated distorted image will not appear similar to the true distorted image. This adds more burden to the generator, which needs to compensate for the mismatch error caused by the simulator in addition to generating the latent unknown image. An accurate split-step propagation would not work either, because it is simply too slow and it is not differentiable.
Simulator for Re-degradation Loss. Besides GAN, another approach to use simulator in the reconstruction loop is to consider the consistency loss, defined as
(31)
$\mathrm{Consistency}\Big(\underbrace{H_\theta\big(R_\psi(y)\big)}_{H_\theta(\hat{x})}, \; G(x)\Big).$
This is an additional loss put on top of the traditional reconstruction loss ReconLoss (x̂,x).
The performance of the image reconstruction depends on two factors: (i) How good is Hθ compared to G? (ii) How good is x̂ compared to the ground truth x? Many scholars have asked why ReconLoss(x̂, x) alone is insufficient. The answer is that the reconstruction loss never provides any explicit knowledge about the forward model. Since G and Hθ are often ill-conditioned, many x can be mapped to the same y; the consistency loss helps the reconstruction by preventing it from creating artifacts that cannot be explained by the forward model. The benefit of the consistency loss is supported by numerical evidence in Figure 14.
Figure 14.
What happens when a simulator is not used? The image reconstruction performance is significantly worse. The figure is adapted from [52].
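In training code, the consistency (re-degradation) loss of Eq. (31) is one extra term: re-apply the simulator to the reconstruction and compare against the observation y, which plays the role of G(x). A hedged sketch, assuming a differentiable simulator and a squared-error choice for both losses:

```python
def training_loss(x, y, simulator, recon_net, gamma=0.1):
    """ReconLoss(x_hat, x) + gamma * Consistency(H_theta(x_hat), y), cf. Eq. (31)."""
    x_hat = recon_net(y)
    recon_loss = ((x_hat - x) ** 2).mean()
    consistency = ((simulator(x_hat) - y) ** 2).mean()   # re-degrade x_hat and compare with y
    return recon_loss + gamma * consistency
```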
Looking Forward. The results of the latest reconstruction methods are promising. However, the full power of CIF is yet to be explored. Here are a few observations:
State vector θ update. As I explained previously, the state vector update is analogous to the blur kernel estimation problem in blind deconvolution. Given a distorted image, having the ability to estimate the turbulence parameters could significantly reduce the uncertainty of the reconstruction. However, to date, these ideas are yet to be developed.
State encoding. The state vector θ is often a high dimensional vector, e.g., the collection of Zernike coefficients over the entire image. However, it is likely that θ has a low dimensional representation. Presently, there is little understanding of how these state vectors can be encoded more efficiently.
Bijective mapping from pixel space to embeddings. An open problem today is the non-uniqueness of the low-dimensional representation. Using Zernike coefficients as an example, it is possible that two sets of Zernike coefficients can give the same pixel-level image distortion. Therefore, while P2S is easy to do, the inverse mapping S2P can be significantly harder. For many problems dealing with optics through different environments, such a bijective mapping is an important technical challenge.
Ultra-fast simulator. Advanced simulators such as P2S can achieve 40 frames per second for a 512-by-512 image. However, for P2S to be used as part of an iterative algorithm or as an integral part of the reconstruction neural network, the runtime would probably need to be reduced to the microsecond range. This is a major challenge for both hardware (GPU) and algorithm.
4. More Examples
In this section, I would like to elaborate more on how CIF could fit other imaging problems by discussing a few other examples in addition to the turbulence example above.
4.1 De-raining and de-hazing
Consider the problem of imaging through rain and haze. As with imaging through atmospheric turbulence, the ground truth image formation y = G(x) cannot be determined exactly due to the stochastic nature of the process [53, 54].
In the literature, the degradation can be modeled in different ways [55]. For example, if the distortion is caused by rain streak, then
(32)
$H_\theta^{\text{streak}}(x) = x + b,$
where b is a sparse vector representing the line streak effect of the rain. If the distortion is caused by adherent raindrops, [56] proposed a model
(33)
$H_\theta^{\text{raindrop}}(x) = (1 - M) \odot x + d,$
where M is a binary mask indicating whether a pixel has a raindrop, ⊙ is the elementwise multiplication, and d is a sparse vector with localized scattering raindrops. If the distortion is caused by haze and rain streak, then the model is
(34)
$H_\theta^{\text{haze}}(x) = x \odot t + A(1 - t) + b,$
where t is the transmission map (see Example 2), A is the airlight color transformation, and b is a sparse vector representing the streak.
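The three degradation models in Eqs. (32)–(34) can be sketched as small functions; the streak layer, drop mask, and transmission values below are random stand-ins chosen for illustration, not calibrated rain or haze statistics.

```python
import numpy as np

rng = np.random.default_rng()

def rain_streak(x, density=0.005):
    """Eq. (32): H(x) = x + b, with b a sparse streak layer (toy vertical streaks)."""
    b = (rng.random(x.shape) < density).astype(float)
    b = np.maximum(b, np.roll(b, 1, axis=0)) * 0.6        # stretch droplets into short streaks
    return np.clip(x + b, 0.0, 1.0)

def raindrop(x, density=0.01):
    """Eq. (33): H(x) = (1 - M) * x + d, with M a binary drop mask and d localized scattering."""
    M = (rng.random(x.shape) < density).astype(float)
    d = M * rng.uniform(0.5, 1.0, x.shape)
    return (1.0 - M) * x + d

def haze_streak(x, t=0.6, A=0.9, density=0.005):
    """Eq. (34): H(x) = x * t + A * (1 - t) + b."""
    b = (rng.random(x.shape) < density).astype(float) * 0.6
    return np.clip(x * t + A * (1.0 - t) + b, 0.0, 1.0)
```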
In recent years, there have been increased efforts to use neural networks to learn the model so that Hθ can be more similar to G. For example, [57] proposed the model
(35)
$y_t = H_\theta^{\text{dynamic}}(x_t) = x + b_t + n_t,$
where t denotes the time, and nt ∼Gaussian(0, σ2) is the noise vector. The rain model bt is defined through some variations of the Markov process such that
(36)
bt=Aα(bt1),
where Aα is a neural network that provides a memoryless update based on the previous time stamp. Because Aα is a neural network, it inherits several desirable properties such as being differentiable. Since the simulator is co-optimized with the reconstruction algorithm, the reconstruction performance is evidently better.
Besides these examples, the general direction of imaging through adverse weather today is to inject more physics into the problem [58]. In the areas of rain, snow, and fog, there is an increasing number of high-quality physics-based simulators that can reproduce these optical effects [59, 60]. Their usage highlights the relevance of CIF.
4.2 Differentiable Optics
While CIF is mostly concerned with degradation processes arising from nature, the concept can be applied to other forms of optics-algorithm co-design problems such as the one summarized in Figure 15.
Figure 15.
Differentiable optics. In a conventional camera system, there is an optical component, an image signal processing (ISP) unit which controls the exposure, color filter, etc, and an image reconstruction algorithm. Differentiable optics aims at approximating the physical optics module and the ISP with neural networks so that the entire system is differentiable. This will enable end-to-end training of the image reconstruction algorithm.
(1) Differentiable optics for lens systems [61–64]. Traditional lens design is a standalone process where people use ray-tracing tools such as Zemax to optimize the parameters of the lenses. If one wants to co-design the downstream image reconstruction algorithm, however, these ray-tracing tools are incompatible with the reconstruction. To overcome the difficulty, various methods have been proposed to approximate the true lens system G with neural networks Hθ. Since the reconstruction is usually a neural network, having a neural network Hθ gives us an end-to-end differentiable camera system.
Today, Hθ is mainly used to improve the image reconstruction. There is relatively little work on co-designing the lens parameters. The reason is that co-designing the lens parameters would require a method to “translate” the weights of the neural network Hθ to the lens parameters. This is largely an open problem.
(2) Metasurface design [65, 66]. Another usage of CIF is the design of metasurfaces. Metasurfaces are nanoscale materials where each element can be engineered to perform a specific phase operation. Compared to traditional glass-based lenses, which are bulky, metasurfaces are substantially thinner while achieving competitive optical performance. The design of metasurfaces is often performed together with the image reconstruction algorithm, because metasurfaces today still have many limitations in terms of chromatic aberration control and spatially varying point spread functions. Therefore, it is necessary to co-design a deconvolution algorithm.
(3) Differentiable rendering or computational light transport [67–69]. The problem here is more concerned with a realistic rendering of objects in computer graphics. For example, as light propagates through milk and wax, what does the image look like? Or, if the light source is located at a certain position in the room, how will the light bounce among the walls and eventually reach the camera? Because of the complexity of the actual environment and the underlying image formation process (which sometimes requires solving partial differential equations), newer approaches approach the problem by approximating the ground truth G with a neural network Hθ.
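To illustrate the end-to-end idea in item (1), the toy sketch below replaces the lens system with a differentiable proxy, here simply a convolution with a learnable point spread function rather than a learned network, followed by a small reconstruction network; a single backpropagation pass updates both. The PSF-as-convolution proxy and every name in the code are illustrative assumptions, not the methods of [61–64].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentiableOptics(nn.Module):
    """Toy optics proxy H_theta: blur the scene with a learnable PSF."""
    def __init__(self, psf_size=9):
        super().__init__()
        self.psf_logits = nn.Parameter(torch.zeros(1, 1, psf_size, psf_size))

    def forward(self, x):
        # Softmax keeps the PSF non-negative and energy preserving.
        psf = torch.softmax(self.psf_logits.flatten(), dim=0)
        psf = psf.view_as(self.psf_logits)
        return F.conv2d(x, psf, padding=self.psf_logits.shape[-1] // 2)

optics = DifferentiableOptics()
reconstruction = nn.Sequential(            # placeholder reconstruction network
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(
    list(optics.parameters()) + list(reconstruction.parameters()), lr=1e-3)

# One co-optimization step on random data standing in for training images.
x = torch.rand(4, 1, 64, 64)
y = optics(x)                 # simulated measurement through the proxy optics
x_hat = reconstruction(y)     # reconstructed image
loss = F.mse_loss(x_hat, x)
loss.backward()               # gradients reach both the optics and the algorithm
optimizer.step()
```

Swapping the convolution for a learned network Hθ or a differentiable ray tracer changes nothing in this training loop, which is the appeal of making the optics differentiable.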
4.3
Image Sensor Circuit Model
Thus far I have been mainly talking about optics. But CIF can be extended to other components, such as the circuit-level modeling of image sensors, from photodiodes, comparators, and capacitors to the output signal.
The top row of Figure 16 shows the circuit diagram of a dynamic vision sensor (DVS) [70]. The exact signal formation process y=G(x) is both stochastic and complex. However, it is possible to approximate the transient behavior using an ordinary differential equation and the probabilistic events by drawing samples from a pre-defined covariance matrix. This leads to the bottom row of Figure 16, where there are two ordinary differential equation blocks and two autoregressive blocks. Simpler models have been proposed [71], but their performance was shown to be insufficient.
Figure 16.
What happens when a simulator is not used? The image reconstruction performance will be significantly worse. The figure is adapted from [70].
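To give a flavor of this style of modeling, here is a minimal, generic single-pixel sketch, deliberately much simpler than the circuit model of [70]: a first-order ODE, integrated with forward Euler, stands in for the photoreceptor transient, and an event is emitted whenever the noisy log-intensity change crosses a contrast threshold. The time constant, threshold, and noise level are illustrative assumptions.

```python
import numpy as np

def simulate_events(log_intensity, dt=1e-3, tau=5e-3,
                    threshold=0.2, noise_std=0.02, rng=None):
    """Generic DVS-style event generation for a single pixel.

    log_intensity : 1D array of log photocurrent samples over time.
    Returns a list of (time_index, polarity) events.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = log_intensity[0]      # filtered photoreceptor state
    v_ref = v                 # state at the last emitted event
    events = []
    for k in range(1, len(log_intensity)):
        # Forward-Euler step of the first-order ODE: tau * dv/dt = u - v.
        v += (dt / tau) * (log_intensity[k] - v)
        # Noisy comparison against the contrast threshold.
        delta = v - v_ref + rng.normal(0.0, noise_std)
        if abs(delta) >= threshold:
            events.append((k, int(np.sign(delta))))
            v_ref = v
    return events

t = np.arange(0.0, 0.2, 1e-3)
log_I = np.log(1.0 + 0.5 * (1.0 + np.sin(2 * np.pi * 10 * t)))  # toy stimulus
print(len(simulate_events(log_I)))
```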
Besides DVS, there is also a growing interest in novel digital image sensors including photon counting devices, quanta image sensors (QIS), and single-photon avalanche diodes. Using QIS as an example, a series of 1-bit and multi-bit models have been proposed [72–74] and analyzed [75, 76], together with various image reconstruction algorithms [77–81] and applications [82, 83].
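As one concrete instance of such a forward model, the sketch below implements the commonly used 1-bit quanta-sensor measurement: photon counts are Poisson distributed with a mean proportional to the scene intensity, and each jot outputs a 1 when the count reaches a threshold q. The gain alpha and threshold q here are illustrative; see [72–76] for the actual models and their analysis.

```python
import numpy as np

def qis_1bit_forward(x, alpha=4.0, q=1, frames=8, rng=None):
    """1-bit quanta-sensor style forward model.

    x      : clean image with values in [0, 1]
    alpha  : sensor gain (mean photon count per frame at x = 1)
    q      : comparator threshold (counts >= q produce a binary 1)
    frames : number of binary frames to simulate
    Returns a stack of binary frames of shape (frames,) + x.shape.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = rng.poisson(alpha * x, size=(frames,) + x.shape)
    return (counts >= q).astype(np.uint8)

x = np.random.rand(64, 64)
binary_frames = qis_1bit_forward(x)
# A crude estimate of x is the fraction of ones across frames, whose mean
# follows the truncated-Poisson statistics of the 1-bit measurement.
x_hat = binary_frames.mean(axis=0)
```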
5.
Thoughts and Discussions
In this section I summarize a few commonly asked questions about CIF.
Is CIF = model-based image reconstruction (MBIR)? My view is that CIF and MBIR aim at two different goals. In MBIR, the premise of the problem is that someone has given us a model ŷ=Hθ(x). Our job is to find the best algorithm to solve for x by exploiting various signal priors such as sparsity or generative models. CIF, in contrast, primarily focuses on the design of Hθ. As illustrated in the previous sections, there are accurate but complex choices of Hθ and less accurate but effective choices of Hθ for the inverse solver. Constructing a meaningful Hθ while maintaining computational efficiency is what CIF is about.
Why not just use neural networks to approximate G? With the growing practice of building deep neural networks to mimic the optics, we might be tempted to think that a good CIF simulator must be a neural network (so that everything is differentiable). I personally think that this is not the best (nor the only) direction. While I completely acknowledge the power of a neural network, I do not think today’s neural networks have advanced to the stage where they can perform every task without resorting to a much larger model. For some specialized tasks, such as solving an ordinary differential equation, a neural network could offer a powerful approximation. But for equations such as Fresnel propagation, Fourier transforms are much more efficient.
I envision that future CIF simulators will most likely be hybrid models where neural networks are used as building blocks to complete specific sub-tasks. Differentiability can be ensured without a neural network: in the turbulence case study described above, for example, the differentiability is enabled by a different representation of the phase function. Even for tasks such as ray tracing, it will be far more interesting (and impactful) to derive new equations that preserve differentiability without relying on automatic differentiation over a computational graph of any sort.
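To illustrate the Fresnel-propagation remark, the sketch below implements the standard transfer-function (FFT-based) Fresnel propagator in PyTorch. Because every step is an FFT or an elementwise multiplication, the operator is differentiable end-to-end without any neural network. The grid size, wavelength, and propagation distance are illustrative values.

```python
import cmath
import math
import torch

def fresnel_propagate(u0, wavelength, z, dx):
    """Fresnel propagation via the transfer-function (FFT) method.

    u0         : complex field at the source plane, shape (N, N)
    wavelength : wavelength in meters
    z          : propagation distance in meters
    dx         : grid spacing in meters
    """
    N = u0.shape[-1]
    fx = torch.fft.fftfreq(N, d=dx)                    # spatial frequencies
    FX, FY = torch.meshgrid(fx, fx, indexing="ij")
    k = 2.0 * math.pi / wavelength
    # Transfer function H(fx, fy) = exp(ikz) exp(-i*pi*lambda*z*(fx^2 + fy^2)).
    phase = math.pi * wavelength * z * (FX**2 + FY**2)
    H = cmath.exp(1j * k * z) * torch.polar(torch.ones_like(phase), -phase)
    return torch.fft.ifft2(torch.fft.fft2(u0) * H)

# Differentiability check: gradients flow back to the input field.
N, dx = 256, 5e-6
u0 = torch.ones(N, N, dtype=torch.complex64, requires_grad=True)
u1 = fresnel_propagate(u0, wavelength=0.5e-6, z=0.05, dx=dx)
u1.abs().sum().backward()
print(u0.grad is not None)   # True
```

Such a propagator can serve as a building block of a hybrid Hθ, with learned components attached only where closed-form physics is unavailable.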
Is sensor-algorithm co-optimization always needed? Following the past few years of publications in computational imaging, I observe a trend that whenever a sensor and an algorithm appear together, sensor-algorithm co-optimization follows. I can see the necessity of co-optimization if the goal is to maximize the system’s performance unconditionally. However, from a practical point of view, we should not forget about feasibility and physical constraints. In a recent paper [84], it was shown that co-optimization brings negligible benefits to the actual performance in some specific problems. This counterexample is perhaps a good reminder of the practical reality.
6.
Conclusion
Computational imaging is the intersection of image acquisition, signal priors, and numerical algorithms. Forty years ago, when we were still at the beginning of solving inverse imaging problems, our attention was mostly spent on developing better and more powerful priors (i.e., regularization functions such as the ℓ1 norm, total variation, Markov random fields, etc.) together with faster numerical algorithms (e.g., gradient descent, alternating minimization, operator splitting, etc.). As we continue to advance computational imaging in 2023, it is perhaps time to rethink the role of the forward model that describes the image acquisition process.
This article describes the concept of computational image formation (CIF). CIF highlights the choice of a simulator Hθ that approximates the true image formation process G. Unlike a physics simulator whose goal is to match G unconditionally, the simulator Hθ in CIF needs to maximize the image reconstruction performance while matching G to a sufficient degree. Moreover, the simulator needs to be very fast so that it can be used to generate data, and it needs to be differentiable so that it can be used in the reconstruction loop. Several examples are elaborated to illustrate CIF.
We all stand on the shoulders of giants. CIF is no exception. It is a concept summarizing decades of collective efforts of the computational imaging community. As we look forward to the future of imaging, I envision that simulators will play an unprecedented role in the deep learning era.
Acknowledgment
The research is based on work supported in part by the Intelligence Advanced Research Projects Activity (IARPA) under Contract No. 2022–21102100004, and in part by the National Science Foundation under the grants 2133032, 2030570, 2134209 and 1763896. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.
References
1FrisiusR. G.De Radio Astronomico et Geometrico Liber1558ap. Gul CavellatISBN-10: 1166638855
2BoumanC.Foundations of Computational Imaging: A Model-Based Approach2022SIAMPhiladelphia, PAISBN: 978-1-61197-712-7
3BhandariA.KadambiA.RaskarR.Computational Imaging2022MIT PressCambridge, MA
4HeK.SunJ.TangX.Single image haze removal using dark channel priorIEEE Conf. Computer Vision and Pattern Recognition (CVPR)2009IEEEPiscataway, NJ195619631956–6310.1109/CVPR.2009.5206515
5FattalR.2008Single image dehazingACM Trans. Graph. (TOG)27191–910.1145/1360612.1360671
6van RossumM. C. W.NieuwenhuizenT. M.1999Multiple scattering of classical waves: Microscopy, mesoscopy, and diffusionRev. Mod. Phys.71313371313–7110.1103/RevModPhys.71.313
7KoschmiederH.1924Theorie der horizontalen sichtweiteBeitrage zur Physik der freien Atmosphare12335333–53
8GnanasambandamA.ShermanA. M.ChanS. H.2021Optical adversarial attackIEEE/CVF Int’l. Conf. on Computer Vision (ICCV) Workshops9210192–101IEEEPiscataway, NJ10.48550/arXiv.2108.06247
9SanghviY.MaoZ.ChanS. H.2023Structured kernel estimation for photon-limited deconvolutionIEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR)986398729863–72IEEEPiscataway, NJ10.48550/arXiv.2303.03472
10DingJ.TarokhV.YangY.2018Model selection techniques: An overviewIEEE Signal Process. Mag.35163416–3410.1109/MSP.2018.2867638
11TatarskiiV. I.Wave Propagation in a Turbulent Medium1961Dover PublicationsNew York, NY
12RoggemannM. C.WelshB. M.Imaging through Atmospheric Turbulence1996Taylor & FrancisOxfordshire
13SchmidtJ. D.Numerical Simulation of Optical Wave Propagation: With Examples in MATLAB2010SPIE PressBellingham, WA
14GoodmanJ. W.Introduction to Fourier Optics20053rd ed.Roberts and CompanyGreenwood Village, CO
15GoodmanJ. W.Statistical Optics20152nd ed.John Wiley and Sons Inc.Hoboken, NJ
16KolmogorovA. N.1941The local structure of turbulence in incompressible viscous fluid for very large Reynolds numbersDokl. Akad. Nauk SSSR30301305301–5
17FrischU.Turbulence: The Legacy of A. N. Kolmogorov1995Cambridge University PressCambridge
18LaneR. G.GlindemannA.DaintyJ. C.1992Simulation of a Kolmogorov phase screenWaves Random Media2209224209–2410.1088/0959-7174/2/3/003
19BosJ. P.RoggemannM. C.GudimetlaV. S. R.2015Anisotropic non-kolmogorov turbulence phase screens with variable orientationAppl. Opt.54203920452039–4510.1364/AO.54.002039
20LeonardK. R.HoweJ.OxfordD. E.2012Simulation of atmospheric turbulence effects and mitigation algorithms on stand-off automatic facial recognitionProc. SPIE85461181–18
21PotvinG.Luc ForandJ.DionD.2011A simple physical model for simulating turbulent imagingProc. SPIE801480140Y
22HardieR. C.PowerJ. D.LeMasterD. A.DroegeD. R.GladyszS.Bose-PillaiS.2017Simulation of anisoplanatic imaging through optical turbulence using numerical wave propagation with new validation analysisOpt. Eng.5607150210.1117/1.oe.56.7.071502
23BosJ. P.RoggemannM. C.2012Technique for simulating anisoplanatic image formation over long horizontal pathsOpt. Eng.5110170410.1117/1.OE.51.10.101704
24SchwartzmanA.AltermanM.ZamirR.SchechnerY. Y.2017Turbulence-indueced 2D correlated image distortionProc. Int’l. Conf. on Computational Photography1121–12IEEEPiscataway, NJ10.1109/ICCPHOT.2017.7951490
25VorontsovM. A.KolosovV.2005Target-in-the-loop beam control: Basic considerations for analysis and wave-front sensingJ. Opt. Soc. Am. A22126141126–4110.1364/JOSAA.22.000126
26LachinovaS. L.VorontsovM. A.DudorovV. V.KolosovV. V.ValleyM. T.2007Anisoplanatic imaging through atmospheric turbulence: Brightness function approachProc. SPIE6708
27RoggemannM. C.WelshB. M.MonteraD.RhoadarmerT. A.1995Method for simulating atmospheric turbulence phase effects for multiple time slices and anisoplanatic conditionsAppl. Opt.34403740514037–5110.1364/AO.34.004037
28RepasiE.WeissR.2008Analysis of image distortions by atmospheric turbulence and computer simulation of turbulence effectsProc. SPIE694169410S
29PotvinG.2020Linear perturbation model for simulating imaging through weak turbulenceOpt. Eng.5903410510.1117/1.OE.59.3.034105
30HardieR. C.RucciM. A.DaporeA. J.KarchB. K.2017Block matching and wiener filtering approach to optical turbulence mitigation and its application to simulated and real imagery with quantitative error analysisOpt. Eng.56071503
31MaoZ.ChimittN.ChanS. H.2021Accelerating atmospheric turbulence simulation via learned phase-to-space transformIEEE/CVF Int’l. Conf. on Computer Vision147591476814759–68IEEEPiscataway, NJ10.48550/arXiv.2107.11627
32ShimizuM.YoshimuraS.TanakaM.OkutomiM.2008Super-resolution from image sequence under influence of hot-air optical turbulenceIEEE Conf. on Computer Vision and Pattern Recognition181–8IEEEPiscataway, NJ10.1109/CVPR.2008.4587525
33MaoY.GillesJ.2012Non rigid geometric distortions correction - application to atmospheric turbulence stabilizationInverse Probl. Imaging3531546531–4610.3934/ipi.2012.6.531
34GillesG.OsherS.2016Wavelet burst accumulation for turbulence mitigationJ. Electron. Imaging2503300310.1117/1.JEI.25.3.033003
35LouY.KangS. H.SoattoS.BertozziA.2013Video stabilization of atmospheric turbulence distortionInverse Prob. Imaging7839861839–6110.3934/ipi.2013.7.839
36ZhuX.MilanfarP.2013Removing atmospheric turbulence via space-invariant deconvolutionIEEE Trans. Pattern Anal. Mach. Intell.35157170157–7010.1109/TPAMI.2012.82
37LauC. P.LaiY. H.LuiL. M.2019Restoration of atmospheric turbulence-distorted images via RPCA and quasiconformal mapsJ. Inverse Probl.351331–33
38LauC. P.LuiL. M.2021Subsampled turbulence removal networkMath. Comput. Geom. Data11331–3310.4310/MCGD.2021.v1.n1.a1
39MaoZ.ChimittN.ChanS. H.2020Image reconstruction of static and dynamic scenes through anisoplanatic turbulenceIEEE Trans. Comput. Imaging6141514281415–2810.1109/TCI.2020.3029401
40AnantrasirichaiN.AchimA.KingsburyN. G.BullD. R.2013Atmospheric turbulence mitigation using complex wavelet-based fusionIEEE Trans. Image Process.22239824082398–40810.1109/TIP.2013.2249078
41AnantrasirichaiN.AchimA.BullD.2018Atmospheric turbulence mitigation for sequences with moving objects using recursive image fusionIEEE Int’l. Conf. on Image Processing289528992895–9IEEEPiscataway, NJ10.48550/arXiv.1808.03550
42ChanS. H.2022Tilt-then-blur or blur-then-tilt? clarifying the atmospheric turbulence modelIEEE Signal Process. Lett.29183318371833–710.1109/LSP.2022.3200551
43ChimittN.ChanS. H.2020Simulating anisoplanatic turbulence by sampling intermodal and spatially correlated Zernike coefficientsOpt. Eng.5908310110.1117/1.OE.59.8.083101
44ChimittN.ZhangX.MaoZ.ChanS. H.2022Real-time dense field phase-to-space simulation of imaging through atmospheric turbulenceIEEE Trans. Comput. Imaging8115911691159–6910.48550/arXiv.2210.06713
45ChimittN.ChanS. H.2023Anisoplanatic optical turbulence simulation for near-continuous cn2 profiles without wave propagationSPIE Opt. Eng.6207810310.48550/arXiv.2305.09036
46ChimittN.ZhangX.ChiY.ChanS. H.“Scattering and gathering for spatially varying blurs,” Preprint, arXiv:2303.05687 (2023)
47MaoZ.JaiswalA.WangZ.ChanS. H.2022Single frame atmospheric turbulence mitigation: A benchmark study and a new physics-inspired transformer modelEuropean Conf. on Computer Vision430446430–46SpringerCham10.48550/arXiv.2207.10040
48ZhangX.MaoZ.ChimittN.ChanS. H.“Imaging through the atmosphere using turbulence mitigation transformer,” Available online: https://arxiv.org/abs/2207.06465. (2022)
49JinD.ChenY.LuY.ChenJ.WangP.LiuZ.GuoS.BaiX.2021Neutralizing the impact of atmospheric turbulence on complex scene imaging via deep learningNature Mach. Intell.3876884876–8410.1038/s42256-021-00392-1
50LauC. P.SouriH.ChellappaR.2021Atfacegan: Single face semantic aware image restoration and recognition from atmospheric turbulenceIEEE Trans. Biometrics Behav. Identity Sci.3240251240–5110.1109/TBIOM.2021.3058316
51FengB. Y.XieM.MetzlerC. A.2022Turbugan: An adversarial learning approach to spatially-varying multiframe blind deconvolution with applications to imaging through turbulenceIEEE J. Sel. Areas Inf. Theory3543556543–5610.1109/JSAIT.2023.3234225
52JaiswalA.ZhangX.ChanS. H.WangZ.Physics-driven turbulence image restoration with stochastic refinementIEEE Int’l. Conf. Computer Vision (ICCV)2023IEEEPiscataway, NJ121701218112170–8110.48550/arXiv.2307.10603
53CantorA.1978Optics of the atmosphere–scattering by molecules and particlesIEEE J. Quantum Electron.14698699698–910.1109/JQE.1978.1069864
54GargK.NayarS. K.When does a camera see rain?IEEE Int’l. Conf. on Computer Vision (ICCV)2005Vol. 2IEEEPiscataway, NJ106710741067–7410.1109/ICCV.2005.253
55LiS.AraujoI. B.RenW.WangZ.TokudaE. K.JuniorR. H.Cesar-JuniorR.ZhangJ.GuoX.CaoX.2019Single image deraining: A comprehensive benchmark analysisIEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR)383338423833–42IEEEPiscataway, NJ10.1109/CVPR.2019.00396
56YouS.TanR. T.KawakamiR.MukaigawaY.IkeuchiK.2016Adherent raindrop modeling, detection and removal in videoIEEE Trans. Pattern Anal. Mach. Intell.38172117331721–3310.1109/TPAMI.2015.2491937
57YueZ.XieJ.ZhaoQ.MengD.2021Semi-supervised video deraining with dynamical rain generatorIEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR)642652642–52IEEEPiscataway, NJ10.1109/CVPR46437.2021.00070
58KadambiA.de MeloC.HsiehC.-J.SrivastavaM.SoattoS.2023Incorporating physics into data-driven computer visionNature Mach. Intell.5572580572–8010.1038/s42256-023-00662-0
59BaY.ZhangH.YangE.SuzukiA.PfahnlA.ChandrappaC. C.de MeloC. M.YouS.SoattoS.WongA.KadambiA.AvidanS.BrostowG.CisséM.FarinellaG. M.HassnerT.Not just streaks: Towards ground truth for single image derainingComputer Vision – ECCV 2022. ECCV 2022Lecture Notes in Computer Science2022Vol. 13667SpringerCham723740723–4010.1007/978-3-031-20071-7_42
60ZhangH.BaY.YangE.MehraV.GellaB.SuzukiA.PfahnlA.ChandrappaC. C.WongA.KadambiA.2023Weatherstream: Light transport automation of single image deweatheringIEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR)134991350913499–509IEEEPiscataway, NJ
61TsengE.MoslehA.MannanF.St-ArnaudK.SharmaA.PengY.BraunA.NowrouzezahraiD.LalondeJ.-F.HeideF.2021Differentiable compound optics and processing pipeline optimization for end-to-end camera designACM Trans. Graph. (TOG)4018.118.1918.1–18.1910.1145/3446791
62WangC.ChenN.HeidrichW.2022dO: A differentiable engine for deep lens design of computational imaging systemsIEEE Trans. Comput. Imaging8905916905–1610.1109/TCI.2022.3212837
63ChenN.CaoL.PoonT.-C.LeeB.LamE. Y.2023Differentiable imaging: A new tool for computational optical imagingAdv. Phys. Res.2220011810.1002/apxr.202200118
64SitzmannV.DiamondS.PengY.DunX.BoydS.HeidrichW.HeideF.WetzsteinG.2018End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imagingACM Trans. Graphics (TOG)37114.1114.13114.1–114.1310.1145/3197517.3201333
65HazinehD. S.LimS. W. D.ShiZ.CapassoF.ZicklerT.GuoQ.“D-flat: A differentiable flat-optics framework for end-to-end metasurface visual sensor design,” Preprint, arXiv:2207.14780 (2022)
66ChenW. T.ZhuA. Y.SanjeevV.KhorasaninejadM.ShiZ.LeeE.CapassoF.2018A broadband achromatic metalens for focusing and imaging in the visibleNature Nanotechnol.13220226220–610.1038/s41565-017-0034-6
67VeltenA.WuD.MasiaB.JaraboA.BarsiC.JoshiC.LawsonE.BawendiM.GutierrezD.RaskarR.2016Imaging the propagation of light through scenes at picosecond resolutionCommun. ACM59798679–8610.1145/2975165
68O’TooleM.KutulakosK. N.2010Optical computing for fast light transport analysisACM Trans. Graph. (TOG)29164.1164.12164.1–164.12
69GkioulekasI.LevinA.DurandF.ZicklerT.2015Micron-scale light transport decomposition using interferometryACM Trans. Graph. (TOG)3437.137.1437.1–37.1410.1145/2766928
70SuessA.GuoM.JiangR.MouX.HuangQ.YangW.ChenS.Physical modeling and parameter extraction for event-based vision sensorsInt’l. Image Sensor Workshop (IISW)2023IISWEdinburghR5.17R5.24R5.17–24
71MouX.FengK.YiA.WangS.ChenH.HuX.GuoM.ChenS.SuessA.2022Accurate event simulation using high-speed videoElectron. Imaging34242-1242-1242-1–10.2352/EI.2022.34.7.ISS-242
72ChanS. H.2022What does a one-bit quanta image sensor offer?IEEE Trans. Comput. Imaging8770783770–8310.1109/TCI.2022.3202012
73ElgendyO. A.ChanS. H.2018Optimal threshold design for Quanta Image SensorIEEE Trans. Comput. Imaging49911199–11110.1109/TCI.2017.2781185
74GnanasambandamA.ChanS. H.2020HDR imaging with Quanta Image Sensors: Theoretical limits and optimal reconstructionIEEE Trans. Comput. Imaging6157115851571–8510.1109/TCI.2020.3041093
75GnanasambandamA.ChanS. H.2022Exposure-referred signal-to-noise ratioIEEE Trans. Comput. Imaging8561575561–7510.1109/TCI.2022.3187657
76ChanS. H.2023On the insensitivity of bit density to read noise in one-bit quanta image sensorsIEEE Sensors J.23366636743666–7410.1109/JSEN.2023.3235493
77ChanS. H.ElgendyO. A.WangX.2016Images from bits: Non-iterative image reconstruction for Quanta Image SensorsMDPI Sensors16196110.3390/s16111961
78ChiY.GnanasambandamA.KoltunV.ChanS. H.2020Dynamic low-light imaging with quanta image sensorsEuropean Conf. on Computer Vision122138122–38SpringerCham10.1007/978-3-030-58589-1_8
79GnanasambandamA.ElgendyO.MaJ.ChanS. H.2019Megapixel photon-counting color imaging using Quanta Image SensorOSA Optics Express27172981731017298–31010.1364/OE.27.017298
80ChoiJ. H.ElgendyO. A.ChanS. H.2018Image reconstruction for Quanta Image Sensors using deep neural networksIEEE Int’l. Conf. Acoustics, Speech, and Signal Processing654365476543–7IEEEPiscataway, NJ10.1109/ICASSP.2018.8461685
81ElgendyO. A.GnanasambandamA.ChanS. H.MaJ.2021Low-light demosaicking and denoising for small pixels using learned frequency selectionIEEE Trans. Comput. Imaging7137150137–5010.1109/TCI.2021.3052694
82GnanasambandamA.ChanS. H.2020Image classification in the dark using Quanta Image SensorsEuropean Conf. on Computer Vision484501484–501SpringerCham10.1007/978-3-030-58598-3_29
83LiC.QuX.GnanasambandamA.ElgendyO. A.MaJ.ChanS. H.2021Photon-limited object detection using non-local feature matching and knowledge distillationIEEE/CVF Int’l. Conf. on Computer Vision Workshops (ICCVW)395939703959–70IEEEPiscataway, NJ10.1109/ICCVW54120.2021.00443
84QuX.ChiY.ChanS. H.“Spatially varying exposure with 2-by-2 multiplexing: Optimality and universality,” Preprint, arXiv:2306.17367 (2023)