Work for IS&T 75th Anniversary
Volume: 6 | Article ID: 000401
Pictures: Crafting and Beholding
DOI: 10.2352/J.Percept.Imaging.2023.6.000401 | Published online: July 2023

The psychogenesis of visual awareness is an autonomous process in the sense that you do not “do” it. However, you have some control due to your acting in the world. We share this process with many animals. Pictorial awareness appears to be truly human. Here situational awareness splits into an “everyday vision” and a “pictorial” mode. Here we focus mainly on spatial aspects of pictorial art. You have no control whatever over the picture’s structure. The pictorial awareness is pure imagery, constrained by the (physical) structure of the picture. Crafting pictures and beholding pictures are distinct, but closely related, acts. We present an account from experimental and formal phenomenology. It results in a generic model that accounts for the bulk of formal (rare) and informal (common) observations.


Jan Koenderink, Andrea van Doorn, "Pictures: Crafting and Beholding," in Journal of Perceptual Imaging, 2023, pp. 1-11.

Copyright © Society for Imaging Science and Technology 2023
  Article timeline 
  • received October 2022
  • accepted May 2023
  • published July 2023

Preprint submitted to:
Journal of Perceptual Imaging
This paper sketches our time path in arriving at a formal understanding of “pictorial space”. We are writing in retrospect. So we are backed up with extensive empirical evidence gathered over decades. Hence we are able to outline a much more coherent perspective than was possible during the course of our quest.
We “discovered” relevant facts in arbitrary order and dealt with them conceptually as best we could at the time. Now we are able to trace interrelations and “post-dictions” that we unfortunately passed up as pre-dictions.
So we present a neat formal account that should provide a useful framework for further investigations.
Daily Vision and Pictorial Vision
“Everyday vision” is a process that uses optical structure (both “received” and “sought for”) to efficaciously act in the world. It may be regarded as a kind of user-interface [20, 27, 32]. For the most part human everyday vision is not essentially different from the vision of other animals. It is for a large part dynamic and autonomous. It involves the whole body¹ and a huge store of experience gathered over evolutionary timespans [33, 34].
¹ It may be remembered that the eyes are mounted roughly on top of the platform and are mechanically connected to the world by the feet.
“Pictorial vision” is different in that binocularity, parallaxes and other “physiological cues” serve only to reveal a picture as the planar object it is (think of a painting on the wall). A picture is not a window in that you don’t look through it, but at it, at least if you see it as a physical object [10]. However, many humans (in this we are different from animals) are able to look into a (2D-)picture and experience a virtual, or imaginary, 3D space. This is not “inverse optics”,² but something science is, by definition, unable to handle.
² The optics is not invertible because it is a projection that discards one dimension (range) [6].
Pictorial space is an enigma for the sciences. Indeed, it is frequently dismissed as self-contradictory since there is no relevant disparity. Most dictionaries define “stereopsis” as binocular stereopsis, whereas vision science sometimes refers to “paradoxical monocular stereopsis” [7]. There is hardly a reason to doubt the experience of pictorial (thus monocular) depth [3, 4, 22, 40]. No visual artist doubts it, whereas many vision scientists do.
This has been the reason for our interest in pictorial vision since the early 1990s. It involved a shift from physiology and psychophysics (in its proper sense) to experimental phenomenology [1]. Most of our efforts were spent on the development of novel methods to obtain precise, structured, quantitative data on pictorial qualities like shape and depth. This led to the discovery of lawful “beholder’s shares” [15] that allow the essentially idiosyncratic optical awareness of different people to be formally accounted for.
Only modulo a group of “mental movements” (see A.5.1) can such awarenesses be quantitatively compared. It led to a novel type of theoretical (or formal) phenomenology with considerable predictive power. Although not “science” in the strict sense, it opens up an academic endeavor with both formal and empirical underpinnings.
We use the ancient spelling “opticks” to distinguish our topic from optics. It is indeed fundamentally different from what you find in the “Optics” textbooks. The first text on opticks is by Euclid [13].
Euclid’s opticks is construed either as a defective theory of physical optics, or as a defective theory of linear perspective. It is neither;³ it is a geometrical information theory involving “visual rays”.
³ It has nothing to do with the propagation of electromagnetic disturbances, nor with projections on picture planes [21].
The theory has obvious applications to terrestrial animal vision, indeed so obvious that Euclid is rarely mentioned.
Bishop Berkeley [6] put his finger on the crux:
… distance, … cannot be seen. For distance being a line directed end-wise to the eye, it projects only one point in the fundus of the eye, which point remains invariably the same, whether the distance be longer or shorter.
Opticks involves a projection, which is why it is not invertible. Thomas Reid [37] most incisively researched the consequences in his phantasy of the Idomenians.
The Frontal Observer
Euclid and Reid considered a formal momentary point observer with a perfectly isotropic and homogeneous structure. However, humans are not like that. Our system (like most organic systems) is neither homogeneous, nor isotropic [42].
We are bipedal animals with an optical system that samples the half-space in front of us. So we’d better adopt a coordinate system that recognizes that. Here we use Cartesian coordinates, with the X-axis in the principal viewing direction. The canonical viewing direction is the (horizontal) forward direction.
Euclid’s opticks is the same in all planes that contain the viewing direction. Thus — for the sake of succinctness (and intuition!) — we will discuss mainly the horizontal plane at eye height in this paper. It does not sacrifice generality.
Whereas Alberti’s “linear perspective” [2, 8] appears to be a mere artistic convention, it actually fits the human condition quite well.⁴
⁴ The reason is generic biology, rather than something special due to our singular position relative to fellow animals [42, 43].
So this is the preferred system: X-axis forwards, Y-axis horizontal to the right, Z-axis vertically upwards. Here we discuss and illustrate the XY-plane, the “horizontal plane”. The sagittal (XZ) plane has the same geometry. The frontal (YZ) plane is special. It is the boundary of the “scene” and the space behind the observer (which is optically elsewhere).
The generalization from XY to XYZ is trivial and adds nothing new.
Specialization of the Euclidean Projection to the Human Case
We deal with what is essentially Alberti’s perspective [2]. However, we prefer (because formally most simple) a perhaps unfamiliar representation. Overall, we prefer synthetic geometrical methods, but the conventional algebraic method has its uses.
The optickal projection is implemented by a hyperbolic involution in projective space (see Appendices A.3.1 and A.3.2, [9], Figure 1). For a forward looking observer, we define the plane of the “viewport” as x = 1, thus we use the “viewing distance” as the natural unit of length. It is the distance (see appendix A.1) to the nearest relevant points of the scene and may vary greatly. The size of the viewport is nearly equal to the viewing distance for a “normal view”, smaller for a “narrow angle” view and larger for a “wide-angle” view (see A.3.4).
On the X-axis (principal visual ray) we have that u = 1∕x. Thus the elsewhere region x < 0 is mapped onto itself and so is the scene x > 0. Within the scene the map swaps the empty region 0 < x < 1 and the visual scene proper 1 ≤ x < +∞ (“proper” because it singles out the part of the scene in front of the observer). See Fig. 1.
The central projection on any frontal plane⁵ is nothing but Alberti’s linear perspective (see A.3.2). Thus the projection contains Alberti’s perspective, but retains the distance (x) order and maps it on a “nearness” (u ∈ (0,1)) order. We refer to 1 − u as the “depth”.
⁵ Useful choices are the frontal plane through the eye, the frontal point, or the principal vanishing point. One choice may be preferred over others according to the context of the discussion.
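As a minimal numerical sketch of the map described above, assume that in the horizontal plane the involution acts as (x, y) ↦ (u, v) = (1∕x, y∕x). This algebraic form is one reading of the stated properties (u = 1∕x on the principal ray, the viewport x = 1 fixed point-wise, rays through the eye mapped to lines parallel to the principal direction). It is not quoted from the appendix, and the function name `optickal_map` is hypothetical.

```python
import math

def optickal_map(x, y):
    """Map a scene point (x, y) with x != 0 to (u, v) = (1/x, y/x).

    Assumed algebraic form of the involution, chosen to match the
    properties stated in the text; not quoted from the paper's appendix.
    """
    if x == 0:
        raise ValueError("points in the frontal plane through the eye map to infinity")
    return 1.0 / x, y / x

# The viewport x = 1 is fixed point-wise.
assert optickal_map(1.0, 0.3) == (1.0, 0.3)

# Applying the map twice returns the original point (it is an involution).
u, v = optickal_map(4.0, 2.0)
x, y = optickal_map(u, v)
assert math.isclose(x, 4.0) and math.isclose(y, 2.0)

# Points on one visual ray (y = 0.5 * x) share the same v: the ray becomes
# a line parallel to the principal direction.
ray_images = [optickal_map(x, 0.5 * x) for x in (1.0, 2.0, 8.0)]
assert all(math.isclose(v, 0.5) for _, v in ray_images)

# Nearness u and depth 1 - u for points on the principal ray.
for x in (1.0, 2.0, 10.0):
    u, _ = optickal_map(x, 0.0)
    print(f"distance {x:5.1f}  nearness {u:.2f}  depth {1 - u:.2f}")
```

Under this assumption the order of distances is preserved as an order of depths, while the whole half-line x ≥ 1 is compressed into a unit interval.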
Figure 1.
The hyperbolic involution that defines “perspective”. The red objects (point J, and red line through the frontal point F) are invariant. Vertical lines and the lines through J are conserved as lines, but generically not point-wise. The points P and Q which are mutually swapped by the involution lie on the orange line through J. The origin is the “eye” (point E). Rays through the origin (green arrow) map on horizontal lines (blue), they meet at the principal vanishing point V at infinity. The involution swaps the eye and the principal vanishing point, and likewise P and Q. The constructions of P given Q and of Q given P work in exactly the same way.
For a canonical view (size equals the viewing distance⁶) one has a field of view of approximately 53°. We refer to the image of the frustum as the “viewbox”. For the normal view it is a cubical region. The visual rays in the viewbox are parallel to the principal view direction, the “eye” being swapped with the principal vanishing point (see A.3.1 and Figs. 1 and 2).⁷
⁶ Examples are the “normal focal length” in photography and the traditional sizes of letter paper.
⁷ Formally, the viewbox is what is known as the NDC (Normalized Device Coordinates, or depth buffer) in Computer Graphics.
Figure 2.
The basic optickal projection. Here E denotes the eye, F the frontal point of the viewport, V the principal vanishing point. The “elsewhere” region is the space behind the back, the “out of view” region (which is connected!) is what is cut off by the viewport. Although not optically specified, such regions are in situational awareness. The white region is the empty space between the eye and the viewport. Visual rays are plotted in blue, note their direction! The zigzags signify an infinite interval. The yellow area indicates the frustum (left) and the viewbox (right). It may take some thought to get familiar with the topology.
Figure 3.
The viewbox is spanned by the boundary P, Q of the viewport and the boundary R, S of the field of view. The principal viewing direction (red line) runs from the frontal point F to the principal vanishing point V. The centre is the hyperfocal point H, the hyperfocal plane is u = 1∕2. The red lines are loci of constant y, the blue lines represent equidistance planes. Transversal scaling is u, longitudinal scaling u², so things get smaller and flatter with depth. Note the shears in oblique directions.
Figure 4.
The metric can conveniently be judged by mapping circles of equal size. We indicate the images of diameters in the X- and Y-directions. From a formal standpoint one constructs the Riemann metric tensor. This also allows one to find the curvature (the viewbox is “flat”) and geodesics (straight lines with projective parameterization).
Of course, the viewbox is a severely squashed version of the scene (Figures 3 and 4). However, it has at least some thickness, whereas the Albertian perspective is completely flat.
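The scalings quoted in the caption of Figure 3 can be checked numerically. The sketch below assumes the map (x, y) ↦ (1∕x, y∕x), an assumed algebraic form consistent with u = 1∕x on the principal ray, and estimates by finite differences how a small step in the scene shrinks in the viewbox. The helper names are hypothetical.

```python
def to_viewbox(x, y):
    # Assumed form of the projection: nearness u = 1/x, transversal v = y/x.
    return 1.0 / x, y / x

def local_scalings(x, h=1e-6):
    """Finite-difference shrink factors at distance x on the principal ray."""
    u0, v0 = to_viewbox(x, 0.0)
    u1, _ = to_viewbox(x + h, 0.0)   # small step in depth (X-direction)
    _, v1 = to_viewbox(x, h)         # small step sideways (Y-direction)
    longitudinal = abs(u1 - u0) / h
    transversal = abs(v1 - v0) / h
    return transversal, longitudinal

for x in (1.0, 2.0, 4.0):
    u = 1.0 / x
    trans, lon = local_scalings(x)
    print(f"x={x}: transversal {trans:.4f} (u={u:.4f}), "
          f"longitudinal {lon:.4f} (u^2={u * u:.4f})")
```

Since the longitudinal factor falls off as the square of the transversal one, a small circle in the scene maps to an ellipse that is both smaller and flatter the deeper it lies.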
We interpret the projection in a special way. Its domain is physical space (coordinates {x, y}), whereas its range is a mental space (coordinates {u, v}), referred to as “visual space”. Thus {x, y} and {u, v} are incommensurable because they are ontologically distinct [25]. This differentiates our interpretation from scenography, which treats the domain and range on ontologically identical terms [16]. The formal description is identical, but the meaning is categorically different.
A Further Specialization to Pictorial Vision
A picture is a plane covered with pigments in some simultaneous order [10]. Thus there is no scene. A picture does not come with “ground truth”, that would appeal to a God’s Eye view.
Since there is no scene, the viewbox cannot be the image of some (non-existing) frustum. It is a mental figment, some form of imagery. We refer to the space-frame of this imagery — in case it exists in awareness — as a “picturebox”.
In order to formalize this, we define the depth dimension in the picturebox as isotropic (see A.5), that is to say the full real line “rolled up in a point”. This allows for an (imaginary) 3D-space packaged in a (2 + 1)D picture plane. This non-Euclidean “thick plane” is a well known geometrical object with attractive⁸ properties [38, 39, 41, 44, 45].
⁸ Indeed, simpler than Euclidean, because the distance and angle metrics are both parabolic. In contradistinction, the Euclidean plane has an elliptic angle metric which leads to numerous pesky “exceptions”.
The group M of (special) similarities conserves the (physical) picture plane but affects the (imaginary) depth (see A.5.1 and Figure 5). Because these similarities conserve the picture plane point-wise, they are essentially arbitrary, although they conserve the geometry (the projective structure). Thus the picturebox should be taken modulo arbitrary similarities M. Pictorial observers may apply arbitrary M-transformations as “beholder’s share”. We refer to these as “mental movements” [28].
The group M largely coincides with the group of ambiguities for the familiar Shape from X algorithms. For instance, the so-called “bas-relief ambiguities” identified for shape from shading are elements of M (see A.6, [5, 23]).
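To make the action of M concrete, the sketch below assumes that a mental movement changes only the depth d of a point at picture-plane position (p, q), as d′ = offset + tilt_p·p + tilt_q·q + gain·d. This parameterization (a depth translation, an isotropic “rotation” that acts as a depth shear, and a depth scaling) is an illustrative assumption consistent with Figure 5, not the formal definition of A.5.1. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MentalMovement:
    """Assumed depth-only map d' = offset + tilt_p*p + tilt_q*q + gain*d."""
    offset: float = 0.0   # depth translation
    tilt_p: float = 0.0   # isotropic "rotation" (a depth shear) along p
    tilt_q: float = 0.0   # isotropic "rotation" (a depth shear) along q
    gain: float = 1.0     # depth scaling

    def apply(self, p, q, d):
        # Picture-plane coordinates (p, q) are returned unchanged.
        return p, q, self.offset + self.tilt_p * p + self.tilt_q * q + self.gain * d

    def then(self, other):
        """Compose: apply self first, then other; the result has the same form."""
        return MentalMovement(
            offset=other.offset + other.gain * self.offset,
            tilt_p=other.tilt_p + other.gain * self.tilt_p,
            tilt_q=other.tilt_q + other.gain * self.tilt_q,
            gain=other.gain * self.gain,
        )

beads = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 3.0)]
shear = MentalMovement(tilt_p=0.5)
stretch = MentalMovement(gain=2.0, offset=-1.0)

moved = [stretch.apply(*shear.apply(*b)) for b in beads]
combined = [shear.then(stretch).apply(*b) for b in beads]
assert moved == combined  # two movements in a row equal one composed movement
print(moved)
```

The closure check at the end is the property that matters: in this sketch the transformations compose into one of the same form, which is what makes it meaningful to compare depth fields modulo such movements.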
Experimental Phenomenology of Pictorial Space
Here are the most essential findings distilled from a huge set of (quantitative and/or qualitative) empirical phenomenological observations:
  • Typically, observers yield (often very) different depth fields. However, once suitable mental movements are applied these differences often become rather minor. Depth correlations may change from non-significant to 0.99. Without such corrections the observations are essentially meaningless and will lead to faulty conclusions [28].
  • Observers relate spatial attitude to the local visual ray direction [24]. This explains a large part of the well known “deformations” of linear perspective renderings.
  • Observers assume a default “normal field of view” of 40°–60°. This explains another large part of the deformations often ascribed to linear perspective renderings [24, 29].
  • Observers experience “depth”, a projective image of distance, as an image of range (see A.1) [25].
These rules (so far) seem to have no exceptions and may (at least in this stage of the investigation) be taken as “Laws of Pictorial Perception”. We view them as an additional chapter to the well known Gestalt Laws [12, 26, 30, 35]. Of course, there are numerous useful facts of a different nature that might eventually make it to such a status.
The rules predict various huge(!) deviations from “veridical perception” — not even mentioned in traditional textbooks — in a quantitative way. They may also be used to arrange (or even deform) scenes and pose actors in such a way as to yield intended pictorial awareness in viewers of the resulting pictures [36].
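The first finding suggests a practical recipe for comparing two observers: fit the best mental movement from one depth field to the other, and correlate only after that correction. The sketch below does this by ordinary least squares on synthetic data. The affine depth model d′ = a + b·p + c·q + g·d, the function names and the toy numbers are assumptions for illustration, not the procedure of [28].

```python
import math
import random

def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_mental_movement(points, depth_a, depth_b):
    """Least-squares (a, b, c, g) with a + b*p + c*q + g*depth_a ~ depth_b."""
    rows = [(1.0, p, q, d) for (p, q), d in zip(points, depth_a)]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    rhs = [sum(r[i] * t for r, t in zip(rows, depth_b)) for i in range(4)]
    params = solve(normal, rhs)
    fitted = [sum(c * x for c, x in zip(params, r)) for r in rows]
    return params, fitted

def correlation(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
# Observer A: a synthetic relief sampled at the picture-plane points.
depth_a = [math.sin(3 * p) * math.cos(2 * q) for p, q in points]
# Observer B: the same relief seen through a different, hypothetical mental movement.
depth_b = [0.4 - 3.0 * p + 2.0 * q + 0.5 * d + random.gauss(0, 0.01)
           for (p, q), d in zip(points, depth_a)]

raw_r = correlation(depth_a, depth_b)
params, depth_a_moved = fit_mental_movement(points, depth_a, depth_b)
corrected_r = correlation(depth_a_moved, depth_b)
print(f"raw correlation {raw_r:.2f}, after mental movement {corrected_r:.3f}")
```

The raw correlation here reflects mostly the difference in mental movement, not a difference in perceived relief, which is the reason uncorrected comparisons can mislead.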
Figure 5.
Examples of “mental movements”. The red lines are the wires of the abacus model (Figure 6). Starting with a fiducial configuration we have, from left to right: a translation, a scaling and a rotation. Note that only the depth is affected, the picture plane coordinates remain invariant.
A Formal Model of the Psychogenesis of Pictorial Space
How does the creative imagination fill the picturebox so as to present one with a pictorial space? This is the problem of the psychogenesis of pictorial space. The process runs in pre-awareness, the results simply happen (like sneezing), one doesn’t “do” it in reflective thought. It is obviously “creative” (the opticks being non-invertible), but equally obviously constrained by the physical structure of the picture (pixels, paint strokes, or whatever). The relevant structures are conventionally referred to as (pictorial) “cues”.
Here we present a simple quasi-geometrical model of such psychogenesis. The model is generic in the sense that it does not cover the nature or use of specific cues, that would be a subsequent stage. Thus it leaves open the topic of whether pictorial vision is predominantly “constrained hallucination” or some form of (“regularized”) “inverse optics”. (Our choice would put the emphasis decisively on the former.) We refer to it as the “Abacus Model” (see A.7 and Fig. 6).
The picturebox is a rectangular volume⁹ with two opposite edges parallel to the viewing direction, thus taking account of the fact that all visual rays in the picturebox are mutually parallel. The box is capped by a hither pane and a yonder pane.
⁹ Although we do not mention it repeatedly, remember that this immediately transfers to the 3D case. The 3D viewbox and picturebox are cuboids.
The hither pane is the mental image of the picture surface. For ease of reference we will mutually identify those, although it should be kept in mind that they are ontologically different. Then the visual rays may be labelled as “pixels”. Each pixel can be thought of as extended “in depth”. The depths are essentially arbitrary. For instance, when viewing a picture obliquely, one may notice the hither pane to break apart from the perceived physical picture surface [31].
The yonder pane is a “backdrop” at perceptual infinity, yet the “total depth” of the picturebox is experienced as finite. The yonder pane is perhaps best understood as the plane of vanishing points, or — what amounts to the same — the space of directions of visual rays in a (in this case imaginary) scene.
Figure 6.
A simple example of a representation in the Abacus Model, here a convex object in front of a backdrop. Regular beads are shown in red, “virtual beads” in white and “split beads” in blue. The split beads are singular, here they are points on the occluding contour. The virtual beads are actually of different kinds. Some are on the backsides of objects, others are invisible due to occlusion with a nearer object.
Typical pictorial objects are opaque surfaces of volumetric entities. One experiences the frontal sides of these, simultaneously being aware of their volumes and hidden backsides. Thus psychogenesis somehow assigned a depth-field along the rays belonging to an area of pixels. It is as if it shifted a bead along the ray until it was “stopped” by some opaque surface [19]. Thus one may conceive of psychogenesis as a “bead-shifting game”.¹⁰ This is indeed how it has been characterized by practicing visual artists [11, 17].
¹⁰ See the introductory chapter of Hermann Hesse’s Das Glasperlenspiel [18].
We refer to this model as the “Abacus Model”, because it is obviously reminiscent of the counting frames often used to teach children the arithmetic of addition. Note that there may be more than one bead on a thread, generically either one (a surface), two (an occluding contour), or three (a T-junction). There may also be “virtual beads”¹¹ representing the backsides of visual objects or otherwise occluded surfaces.
¹¹ Remember that all beads are imaginary!
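One possible way to hold such a representation in a program is sketched below. It is a hypothetical data structure, not the formalization of A.7: each pixel carries a thread with a list of beads, each bead has a depth and a flag marking it as virtual, and the number of non-virtual beads on a thread gives the generic classification named above.

```python
from dataclasses import dataclass, field

@dataclass
class Bead:
    depth: float
    virtual: bool = False   # imagined but not seen (backside or occluded surface)

@dataclass
class Thread:
    """One visual ray ("pixel") of the picturebox with the beads strung on it."""
    beads: list = field(default_factory=list)

    def visible(self):
        return sorted((b for b in self.beads if not b.virtual), key=lambda b: b.depth)

    def kind(self):
        # Generic cases named in the text; anything else is left unclassified.
        names = {1: "surface", 2: "occluding contour", 3: "T-junction"}
        return names.get(len(self.visible()), "unclassified")

# A thread through the interior of an object: visible front, virtual backside.
interior = Thread([Bead(0.3), Bead(0.6, virtual=True)])
# A thread grazing the rim of the object, stored here as two beads
# (one on the object, one on the backdrop).
rim = Thread([Bead(0.45), Bead(0.9)])

print(interior.kind(), "/", rim.kind())
```

Keeping virtual beads on the same thread as visible ones mirrors the remark that backsides and occluded surfaces are part of the pictorial object even though nothing in the picture specifies them.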
The group of mental movements M (see A.5.1) is essential in the experimental phenomenology of pictorial space. It by and large accounts for observer variations and the variations found for a single observer under repeated viewings.
Some Consequences of the Formalism
We mention just a few remarkable cases. Of course, the formalism has a much wider applicability.
Windows and Pictures
The view of a picture has again and again been compared with the experience of looking through a window. It describes the experience of pictorial space at least to some extent. However, it is very rare, and takes very sophisticated measures, to confuse the experience of looking through a window with that of looking at (or even into) a picture. Such experiences (if any) are limited to professionally executed instances. Generically, when looking into a picture, you have the simultaneous situational awareness of looking at a picture surface. This is (very much!) part of the charm of pictorial perception. “Illusions” are the meal tickets of Madame Tussaud’s and Disneyland. They have little or nothing to do with enjoying (or even using) pictures [19].
The difference is due to the fact that you are not one of Euclid’s, Berkeley’s or Reid’s momentary point observers. You probably have both eyes open and move your body, head and eyes continuously. This immediately identifies a window as an aperture and a picture as an opaque surface in your situational awareness.
Oblique Viewing
Figure 7.
Lord Kitchener points straight at the spectator. An oblique sideways view shows the head as thinner, but the pointing is not affected. A really weird viewpoint renders the picture illegible.
Keyhole vision is an extreme case of looking through a window. The crux is that any vantage point yields a distinct view. For a picture, the case is different. The vantage point only deforms the optical structure a bit, but it does not add or remove anything.
Is it possible to “see the same scene” as you change your vantage point with respect to the window frame? Yes, it is, but you have to change the physical scene for that. A scene and some affine deformation of it may be set up so as to provide the same optickal data. This was famously proven by de la Gournerie in the mid nineteenth century (see A.4).
The proof has often been used to explain certain “deformations” due to the oblique viewing of pictures [36]. This is nonsense for a variety of reasons, most importantly ontological.
When you view a picture obliquely, the abacus model predicts that your pictorial awareness will only change by a mental movement, whereas your situational awareness, including the awareness of the physical picture surface, will reflect the usual effect of a change of vantage point. Thus “constancy mechanisms” will apply to the latter, but not to the former. This has the following experiential effects (Figure 7)
A pictorial face will “follow you”. The reason is simply that an en-face view will remain that, it cannot become a (say three-quarters) profile.
The hither pane will break free of the apparent picture surface. Note that the two are in ontologically distinct spaces, thus cannot formally be compared. Your awareness manages to deal with simultaneously present mutually distinct spatial frameworks.
Oblique viewing leaves the apparent size of the picture frame invariant (“size constancy”), but the pictorial object will apparently shrink in one direction [31].
It is easy enough to check such predictions in front of a suitable picture with a few minutes of mindful observation.
Various Well Known Problems with Linear Perspective Renderings
Ever since its advent in the early Italian Renaissance, artists had to deal with complaints concerning “perfect” linear perspective renderings. By the end of the Renaissance artists had figured out how to beat these problems using various intentional deviations from the correct methods. However, there never was something like a formal canon for these.
We find that the bulk of the effects is captured by the simple assumption that psychogenesis treats the geometry of the viewbox and picturebox much as the space they move in. No doubt there will be various corrections to this model, that might account for the observations in more detail. However, we are convinced (on the basis of our own, rather extensive, observations over some decades) that this simplest model is surprisingly effective and generic.
More extensive and precise observations — unlikely to become available in the near future — will almost certainly turn up deviations, although these may well turn out to be of an idiosyncratic nature.
Apparent Rotations
Consider a row of mutually translated, but otherwise identical objects. Think of a row of persons lined up in perfect military order. If the field of view is say 90° or more, most observers spontaneously notice that the outermost persons appear to look to the side. They appear “rotated” [24].
The amount of apparent rotation is well predicted by the eccentricity, even up to eccentricities of a hundred degrees.
This is an immediate consequence of the Law of Locality. In order to make a photograph that “looks right”, all actors need to view the camera.
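As a rough illustration, suppose that the apparent turn of an actor simply equals the eccentricity of the visual ray through that actor. This is an assumption drawn from the statement that eccentricity predicts the apparent rotation; the text does not give the exact relation. For a row of actors on a frontal line, all physically facing straight ahead, this gives:

```python
import math

def eccentricity_deg(x, y):
    """Angle between the principal viewing direction (X-axis) and the ray to (x, y)."""
    return math.degrees(math.atan2(y, x))

distance = 2.0                       # the row stands in the frontal plane x = 2
for y in (-2.0, -1.0, 0.0, 1.0, 2.0):
    turn = eccentricity_deg(distance, y)
    # Assumed rule: predicted apparent rotation = eccentricity. To "look right"
    # the actor would have to counter-rotate by this amount, i.e. face the camera.
    print(f"actor at y={y:+.1f}: eccentricity {turn:+6.1f} deg")
```

With this row the outermost actors sit at 45° eccentricity, so the total field of view is 90°, the regime in which the effect is reported to be spontaneously noticed.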
Apparent Sizes
Depth may reflect range or distance, depending upon current situational awareness. That is another reason why the Albertian perspective breaks down for wide-angle views.
In order to experience an arrangement of actors as being posed in perfect military order, they have to face the camera and be at equal range from the camera. But Albertian perspective translates this into weird pictorial size variations.
In order to “look right” one needs to use Guido Hauck’s Plattkarte (“equirectangular projection”).¹² That this works very well can easily be demonstrated using a fish-eye¹³ instead of a regular photographic objective [24].
¹² The Plattkarte is the simplest example. Many cylindrical projections work just as well and may have advantages in certain settings.
¹³ At least on the horizon, most fish-eye lenses are “angle true”. Full optical array cameras often yield perfect equirectangular maps.
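The size variations can be made explicit with a small calculation. The sketch assumes actors of equal height standing at equal range from the camera, a picture plane at unit distance for linear perspective, and an equirectangular map that plots azimuth and elevation linearly. Function names and numbers are illustrative.

```python
import math

def perspective_height(actor_height, range_, azimuth_deg):
    """Image height on the plane x = 1 for an actor at the given range and azimuth."""
    x = range_ * math.cos(math.radians(azimuth_deg))   # distance, not range
    return actor_height / x

def equirectangular_height(actor_height, range_, azimuth_deg):
    """Image height (radians of elevation) in an equirectangular map.

    The azimuth is accepted only to mirror perspective_height; it has no effect.
    """
    return 2.0 * math.atan(actor_height / (2.0 * range_))

for az in (0, 30, 50):
    lin = perspective_height(1.8, 5.0, az)
    eqr = equirectangular_height(1.8, 5.0, az)
    print(f"azimuth {az:2d} deg: linear perspective {lin:.3f}, equirectangular {eqr:.3f}")
```

In linear perspective the pictorial height grows as 1∕cos(azimuth) for actors at equal range, whereas the equirectangular height stays constant, in line with the expectation that equally remote actors should look equally large.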
Apparent Frontal Planes
Frontal planes typically appear as curved with the convexity turned towards the eye. This may be due to the fact that one has to look back and forth in order to cover the horizon. The planar wall somehow disappears into depth at the horizontal vanishing points to the left and right. A concave circular wall with the eye at its centre actually looks frontoparallel. This is a consequence of the locality principle [24].
Apparent Depth Dilations and Contractions
If the viewing distance relative to the size of the main object is varied the perspective changes. The immediate impression is that for a short viewing distance near objects become drawn out in depth (Fig. 8 left), whereas, for a long viewing distance they are flattened (Fig. 8 right).
The reason why wide-angle views appear to have an expanded space, whereas narrow-angle views appear to be contracted in depth is shown in Figure 9. (See appendix A.3.4.)
Strangely enough, perhaps, there really is something like a “canonical scene” (Fig. 9 left). It contains a major topic set in a scene. The major topic will fill at least half of the scope and is typically roughly isotropic (sphere-like or cube-like). This is crucial for the segmentation of depth.
If the main topic extends beyond the hyperfocal point (“halfway infinity”, see A.3.3) there is only a foreground which intrudes into the background. If the main topic does not reach the hyperfocal point there is room for a middle ground between the foreground and the background. The former case applies to wide-angle views, the latter to narrow-angle views (Figs. 8 and 9).
Figure 8.
Bill Brandt’s (1961) Nude on the Beach extends far into the background. Its space is strongly dilated. In contrast, Mantegna’s Dead Christ (ca. 1480) is just a shallow foreground with a featureless dark backdrop at indeterminate depth. Its space is strongly contracted.
Figure 9.
At left a typical layout: a roughly isotropic, convex “main topic” embedded in a scene. The frontal plane is placed such as to touch the main topic, so all of the setting is behind it. The main topic is given half the width of the viewport, leaving a quarter width for both wings. This layout lets us define the foreground, middle ground and background depth layers. In the “normal view” (viewing distance f equals width) the field of view is 53°. At right we show what happens for wide-angle (113°) and narrow-angle (19°) views. (Note that the viewport is the same in all cases.) This explains the effects seen in Figure 8.
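The three angles follow from elementary trigonometry: a viewport of width w seen from viewing distance f subtends 2·arctan(w∕2f). The ratios f∕w = 1∕3 and f∕w = 3 used below are assumptions, chosen because they reproduce the quoted wide-angle and narrow-angle values; the caption itself does not state them.

```python
import math

def field_of_view_deg(width, viewing_distance):
    """Full horizontal angle subtended by a viewport of the given width."""
    return math.degrees(2.0 * math.atan(width / (2.0 * viewing_distance)))

for label, f in (("wide-angle", 1 / 3), ("normal", 1.0), ("narrow-angle", 3.0)):
    print(f"{label:12s} f/w = {f:.2f}: {field_of_view_deg(1.0, f):.0f} deg")
```

Because the viewport is kept the same in all cases, only the viewing distance changes, and with it the portion of the scene that falls before or beyond the hyperfocal point.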
Effective Methods of Depiction
The various problems discussed above are easy to solve in practice. Apparent rotations are cured by counter rotation. Either all actors face the vantage point, or one uses orthographic projection. If an impression of “correct perspective” is desired, one uses the correct construction on the architecture, but treats the actors as seen frontally in orthographic projection.¹⁴
¹⁴ A conventional example is Raphael’s fresco Scuola di Atene (1509–1511, Stanza della Segnatura, Apostolic Palace, Vatican City).
Apparent sizes are trivially corrected by adjusting pictorial sizes as indicated by eye measure.
Apparent frontal planes are treated differently, depending upon the scope of the depicted scene and the planned field of view during viewing. The viewing situation may be “forced”, for instance by hanging a painting relative to an entrance and so forth.
Apparent depth dilations and contractions are easily controlled by juggling planned depth layers. For a typical rendering one may use up to three proper (having finite thickness) layers (foreground, middle ground and background) and a (flat) backdrop. Each layer is treated orthographically at a magnification that helps indicate its place in the depth order. So linear perspective is simplified and discretized in a well planned manner (Figure 10). This leads to renderings that “read well”, often better than a real scene would. A well crafted picture is like a “Reader’s Digest” version of the scene (as in Fig. 10).
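A minimal sketch of such a layered rendering, under stated assumptions: each layer is assigned one representative distance (here the mean distance of its points), every point of the layer is drawn orthographically at the magnification 1∕distance that linear perspective would give at that representative distance, and layers are ordered from far to near. The layer names, the sample coordinates and the choice of the mean are hypothetical.

```python
def render_layers(layers):
    """Project each layer orthographically at a single per-layer scale.

    `layers` maps a name to a list of scene points (x, y, z) with x the distance.
    Returns a list of (name, scale, picture points), ordered far to near so that
    nearer layers can be painted over deeper ones.
    """
    rendered = []
    for name, points in layers.items():
        mean_distance = sum(x for x, _, _ in points) / len(points)
        scale = 1.0 / mean_distance          # one magnification for the whole layer
        picture = [(scale * y, scale * z) for _, y, z in points]
        rendered.append((name, scale, picture))
    return sorted(rendered, key=lambda item: item[1])   # small scale = far away

scene = {
    "foreground": [(1.0, -0.2, 0.0), (1.5, 0.3, 0.4)],
    "middle ground": [(4.0, -1.0, 0.0), (6.0, 2.0, 1.0)],
    "background": [(18.0, -5.0, 0.0), (22.0, 6.0, 3.0)],
}
for name, scale, picture in render_layers(scene):
    print(f"{name:13s} scale {scale:.3f} -> {picture}")
```

Within a layer all perspective foreshortening is discarded; the depth order is carried only by the jump in scale from one layer to the next and by the overlap of nearer layers on deeper ones.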
In the scene the observer has to construct cues,¹⁵ whereas in the pictures the artist communicates them. This assumes a common cultural background of picture crafter and beholder.
¹⁵ Cues need to be constructed, rather than “found”, because cues are not physical objects. They are intentional entities and require a mind.
Figure 10.
The use of depth layers. At top a foreground with a backdrop. At bottom foreground, middle ground and backdrop. Each single layer is drawn orthographically, though the scale changes with (average) depth. The layers (or coulisses) are treated as trellises, so deeper ones are seen through the apertures of nearer ones. In the picture plane the layers meet and their planar interrelations are important in design. As Hildebrand says the layers “shake hands in the picture plane”, although staggered in depth. In more sophisticated constructions, one uses objects such as roads that cover large depth ranges continuously and interact with various layers. But Hokusai’s constructions have the advantage of simplicity and speak clearly.
We present a generic model of pictorial space based on a huge base of data from experimental phenomenology. The model is “generic” in the sense of leaving open all kinds of specializations, it is hardly more than a succinct formalization of general observations. So we do not consider it “speculative”, but we believe that it will be part of the framework of any future specialization.
Although very general, the model predicts many observations that can easily be made by informal means. Perhaps surprisingly, such observations include huge discrepancies between visual awareness and what is referred to as “veridical perception”. Because they are not proper psychophysics, many huge effects never made it into the textbooks. Even if they experience them, many scientists will dismiss the presentations in their awareness. This may be okay from the standpoint that textbooks should only treat topics of proper science, but it is at odds with the requirements of many practical problems. For issues involving human mental processes, one relies on experimental phenomenology, rather than science proper.
Many of the effects mentioned here have been familiar to visual artists for centuries and are widely used, but are essentially unknown — or at least ignored — by vision science. There have hardly been serious attempts at formalization, it remains “how–to” knowledge propagated through workshop practice.
The present paper at least offers an attempt.
We acknowledge structural funding from the Flemish government awarded to Johan Wagemans (METH/22/02). The authors are thankful to numerous students and collaborators over half a century of research.
Appendix A.
Various Formal Matters
We succinctly introduce the formal structure that underlies some of the statements in the main text. The topics appear roughly in the order in which they arise in the main text, which is rather arbitrary from an overall perspective. We keep the account as simple and short as possible. All the relevant math is readily available from introductory texts.
Distance and Range
The simplest concept of “remoteness from the self” is range. Range is simply the Euclidean distance from the eye.
In contradistinction, “distance” is a formal concept that tends to be not intuitive at first blush. Distance is defined as the separation of any frontoparallel plane from the frontal plane through the eye. It fits the notions of geometrical optics, so distance (not range!) is what is marked on photographic lenses. Distance is the measure of separation from the self that is naturally adapted to the notion of a “frontal observer” (humans, cats and other predators as opposed to sheep and other prey animals).
In our notation the x–coordinate is distance, whereas $\sqrt{x^2 + y^2}$ measures range. They coincide on the principal viewing direction but, even for a canonical view, may differ very significantly.
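The difference between the two measures is easily made concrete. The following is a minimal sketch, assuming the eye at the origin and the principal viewing direction along x; the function names `distance` and `range_from_eye` are ours, chosen for illustration only.

```python
import math

def distance(x, y):
    """Distance: separation of the frontoparallel plane through {x, y}
    from the frontal plane through the eye."""
    return x

def range_from_eye(x, y):
    """Range: Euclidean distance of {x, y} from the eye at the origin."""
    return math.hypot(x, y)

# On the principal viewing direction (y = 0) distance and range coincide.
on_axis = (distance(2.0, 0.0), range_from_eye(2.0, 0.0))

# At 45 degrees off-axis the range exceeds the distance by a factor sqrt(2).
off_axis = (distance(1.0, 1.0), range_from_eye(1.0, 1.0))
ratio = off_axis[1] / off_axis[0]
```

For a wide-angle view, the corners of the viewport are thus at a markedly larger range than the centre, although all points of the frontal plane share one distance.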
2D and 3D
Because opticks relies on straight “rays”, it works out the same in all planes through the eye. Thus a 2D formalism serves as a 3D formalism too. For example, the geometry in the horizontal (XY) and sagittal (XZ) planes is identical. The formalism is easily adjusted by adding or removing the Z and W dimensions and treating the {y, v} and {z, w} coordinate relations as fully analogous.
Projective Geometry and Homogeneous Coordinates
We use homogeneous coordinates {p, q, r} to represent points {x, y} = {p, q}∕r (if $r \neq 0$) and directions {p, q} (if $r = 0 \wedge p^2 + q^2 \neq 0$). We set it up such that {0,0,1} represents the eye, {1,0,0} the principal viewing direction and the plane x = 1 the frontal plane. Points on the frontal plane define visual rays; this is Alberti’s linear perspective (see A.3.2).
The Basic Hyperbolic Involution
The backward-identity $J_3 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$ describes the basic opticks. We have $J_3 \cdot \{x, y, 1\} = \{1, y, x\}$, which defines the hyperbolic involution $\Pi : \{x, y\} \mapsto \{1, y\}/x$.
Note that $J_3^2 = I_3$, thus $J_3$ is its own inverse. It is a “special square root of the identity”. It has an invariant point {−1, 0} and an invariant line x = 1 (frontal plane of the frustum). The point {0,0,1} (the eye) and the direction {1,0,0} (the principal vanishing point) are mutually exchanged. Likewise, points {x, y} and {1, y}∕x on lines through {−1, 0} are mutually swapped.
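The stated properties of the involution are easily checked. The following is a minimal sketch in Python using exact rational arithmetic; the function names `j3_apply`, `involution` and `collinear_with_invariant_point` are ours, introduced for illustration only.

```python
from fractions import Fraction

def j3_apply(h):
    """Apply the backward-identity to homogeneous coordinates {p, q, r}:
    the first and last entries are swapped."""
    p, q, r = h
    return (r, q, p)

def involution(x, y):
    """Pi : {x, y} -> {1, y}/x, computed via homogeneous coordinates {x, y, 1}."""
    p, q, r = j3_apply((x, y, 1))
    if r == 0:
        raise ValueError("the point maps to a direction (r = 0), not to a finite point")
    return (Fraction(p) / r, Fraction(q) / r)

def collinear_with_invariant_point(a, b):
    """True if a, b and the invariant point {-1, 0} lie on one line."""
    return (a[0] + 1) * b[1] - (b[0] + 1) * a[1] == 0

point = (Fraction(2), Fraction(1))
image = involution(*point)      # the point {1/2, 1/2}
back = involution(*image)       # applying Pi twice returns the original point
```

The same functions confirm that {−1, 0} and every point of the line x = 1 are left in place, and that the eye {0,0,1} is exchanged with the direction {1,0,0}.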
Alberti’s “Linear Perspective”
A projection of the viewbox on either the hither or yonder pane, $\{u, v\} \mapsto v$, yields Alberti’s linear perspective. One has $v = y/x$. A projection on the hither pane describes Alberti’s notion of the rete, conventionally illustrated in textbooks. A projection on the yonder pane reveals the points as the vanishing points defined by the visual rays.
The Hyperfocal Point
Consider a viewport PQ, $-d/2 < y < +d/2$, in the frontal plane. It defines a quadrangle (the 2D frustum) PQRS, where R, S are the directions $\{1, \mp d/2\}$. The intersection of the diagonals is {2,0}, independent of d. It is known as the “hyperfocal point” (Fig. 3). It is important because it is (by construction) the centre of the frustum and is located exactly “halfway infinity”.
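That the diagonals meet at {2,0} for any viewport width can be verified directly. The sketch below assumes that the viewport end points are {1, ∓d∕2} on the frontal plane x = 1 and that each diagonal joins one end point to the direction of the opposite frustum edge; the helper names are ours.

```python
from fractions import Fraction

def cross(a, b):
    """Planar cross product a_x * b_y - a_y * b_x."""
    return a[0] * b[1] - a[1] * b[0]

def diagonal_intersection(d):
    """Intersect the diagonals of the 2D frustum for a viewport of width d on x = 1."""
    half = Fraction(d) / 2
    if half <= 0:
        raise ValueError("the viewport width must be positive")
    lower, upper = (Fraction(1), -half), (Fraction(1), half)          # viewport end points
    dir_lower, dir_upper = (Fraction(1), -half), (Fraction(1), half)  # frustum edge directions
    # One diagonal runs from the lower end point along the upper direction,
    # the other from the upper end point along the lower direction.
    offset = (upper[0] - lower[0], upper[1] - lower[1])
    s = cross(offset, dir_lower) / cross(dir_upper, dir_lower)
    return (lower[0] + s * dir_upper[0], lower[1] + s * dir_upper[1])

points = [diagonal_intersection(d) for d in (Fraction(1, 2), 1, 2, 7)]
```

All entries of `points` equal {2,0}, whatever the width d.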
Figure A1.
At left we illustrate a change of the viewing distance, keeping the size of the main topic (the square) invariant (section A.3.4). Compare the location of the square relative to the hyperfocal point for the three cases. The viewbox (at right) has unit width in all cases.
Wide- and Narrow-angle Views
For a fixed viewport the viewing distance defines the extent of the field of view $\Phi = 2\arctan(\delta/2)$, where δ is the ratio of the viewport width to the viewing distance. The viewing distances shown in Figure A1 left are 1∕2, 1 and 2. This is about the range used in regular photographic cameras.
Consider a “main topic” that fills the viewport (Fig. A1). We suppose the main topic to be isotropic (as deep as wide). For a viewing distance d, it extends into the viewbox from the hither pane to the depth 1∕(1 + d), here 2∕3, 1∕2 and 1∕3 for d = 1∕2, 1 and 2, respectively. This is very noticeable for common topics.
For wide-angle views, the topic extends beyond the hyperfocal point. Conversely, for narrow-angle views, the topic does not extend to the hyperfocal point, but is concentrated near the hither pane. If the “background” is defined as the region behind the hyperfocal depth (“halfway infinity”), one sees that narrow-angle views allow for a “middle ground”, whereas wide-angle views do not (Figs. 8 and 9).
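The relations between viewing distance, field of view and the depth taken up by the main topic can be tabulated with a few lines of code. This is a sketch under the assumption of a viewport of unit width (so that δ = 1∕d); the names `field_of_view_deg`, `topic_depth` and `describe` are ours.

```python
import math

def field_of_view_deg(viewing_distance, viewport_width=1.0):
    """Field of view Phi = 2 arctan(delta / 2), delta = viewport width / viewing distance."""
    delta = viewport_width / viewing_distance
    return math.degrees(2.0 * math.atan(delta / 2.0))

def topic_depth(viewing_distance):
    """Viewbox depth 1 / (1 + d) reached by an isotropic topic filling a unit-width viewport."""
    return 1.0 / (1.0 + viewing_distance)

HYPERFOCAL_DEPTH = 0.5  # the hyperfocal point {2, 0} lies midway in the viewbox

def describe(viewing_distance):
    """Return field of view, topic depth and its relation to the hyperfocal point."""
    depth = topic_depth(viewing_distance)
    if depth > HYPERFOCAL_DEPTH:
        relation = "extends beyond the hyperfocal point"
    elif depth < HYPERFOCAL_DEPTH:
        relation = "stays in front of the hyperfocal point"
    else:
        relation = "reaches the hyperfocal point exactly"
    return field_of_view_deg(viewing_distance), depth, relation

for d in (0.5, 1.0, 2.0):
    fov, depth, relation = describe(d)
    print(f"d = {d}: field of view {fov:.1f} deg, topic depth {depth:.3f}, {relation}")
```

The wide-angle case (d = 1∕2) leaves no room between the topic and the background, whereas the narrow-angle case (d = 2) leaves a middle ground.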
Scenographic Transformations of Window Views
In 2D, the empty region between the eye and the frustum is a triangle; in 3D, a pyramid. If the eye is moved with respect to the viewport, this region changes shape. An “equivalent window view” should also transform the empty region appropriately. The immediate geometrical inference is that such equivalences must be due to affine (not projective!) transformations.
This was famously proved by De La Gournerie in the mid nineteenth century. It is often applied to discussions involving the oblique viewing of pictures. Such applications are spurious due to the ontological distinction between window and picture viewing.
Nil-square Infinitesimals
Consider the imaginary unit ε with $\varepsilon^2 = 0$, $\varepsilon \neq 0$. It is definitely small. The number ε has no sign, but is different from zero. Thus ε is not a real number.
The “dual numbers” $z = u + \varepsilon v$, $u, v \in \mathbb{R}$, span a non-Euclidean plane with useful properties. One has that $|z_1 - z_2| = u_1 - u_2$, which is zero for points with $u_1 = u_2$ that differ only in that $v_1 \neq v_2$. Such points are mutually “parallel.” They may be assigned a special distance $v_1 - v_2$.
All lines of constant u have zero length. These are called “isotropic”. If one treats depth as an isotropic dimension, say $\varepsilon d$, with $d \in \mathbb{R}$, then the depth may range over all real numbers, yet is contained in a point, that is, a line of length zero. Thus the picture plane can be flat, but still carry a full depth line at each point.
This avoids the inconsistencies that would occur if depth were treated on a par with the picture plane dimensions.
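The algebra of the nil-square unit takes only a few lines to write down. The following is a minimal sketch of dual-number arithmetic; the class `Dual` and the helpers `proper_distance` and `special_distance` are our own illustrative names, not an existing library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """Dual number u + eps*v with eps**2 == 0."""
    u: float  # real part
    v: float  # dual (nil-square) part

    def __add__(self, other):
        return Dual(self.u + other.u, self.v + other.v)

    def __sub__(self, other):
        return Dual(self.u - other.u, self.v - other.v)

    def __mul__(self, other):
        # (u1 + eps v1)(u2 + eps v2) = u1 u2 + eps (u1 v2 + v1 u2); the eps**2 term vanishes.
        return Dual(self.u * other.u, self.u * other.v + self.v * other.u)

EPS = Dual(0, 1)
ZERO = Dual(0, 0)

def proper_distance(z1, z2):
    """Real part of z1 - z2; it vanishes for mutually parallel points."""
    return (z1 - z2).u

def special_distance(z1, z2):
    """Difference of the dual parts, meaningful only for parallel points (equal real parts)."""
    if z1.u != z2.u:
        raise ValueError("the special distance is defined only for parallel points")
    return z1.v - z2.v

a, b = Dual(3, 5), Dual(3, -2)  # two distinct points on one isotropic line
```

Here `EPS * EPS` equals `ZERO` although `EPS` itself does not, and the points `a` and `b` are at zero proper distance while their special distance is 7.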
Mental Transformations
In the picturebox we use coordinates {u, v}, which we may treat as dual numbers $v + \varepsilon u$. Note that transformations of the dual parts have no metrical effects in the picture plane. Specifically, transformations like $u \mapsto u + \tau + \rho v$ are (special) congruences (they do not change the special distance), whereas transformations like $u \mapsto \sigma u$ with $\sigma \neq 0$ are (special) similarities (these scale the special distance). We denote the group M of special congruences and similarities the “mental movements” (Fig. 5).
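The effect of the two kinds of mental movement on the special distance can be illustrated as follows. This is a sketch in which a point of the picturebox is a pair (u, v), with u the depth and v the picture-plane coordinate; the function names are ours.

```python
def congruence(point, tau, rho):
    """Special congruence u -> u + tau + rho * v; the picture-plane coordinate v is untouched."""
    u, v = point
    return (u + tau + rho * v, v)

def similarity(point, sigma):
    """Special similarity u -> sigma * u with sigma != 0."""
    if sigma == 0:
        raise ValueError("sigma must be non-zero")
    u, v = point
    return (sigma * u, v)

def special_distance(a, b):
    """Depth difference of two points at the same picture-plane location."""
    if a[1] != b[1]:
        raise ValueError("the special distance is defined only for equal v")
    return a[0] - b[0]

a, b = (3, 2), (1, 2)  # same location v = 2, depths 3 and 1

before = special_distance(a, b)
after_congruence = special_distance(congruence(a, 5, -4), congruence(b, 5, -4))
after_similarity = special_distance(similarity(a, 3), similarity(b, 3))
```

The depth shift τ and the shear ρ cancel in the difference, so the congruence leaves the special distance at 2, whereas the similarity with σ = 3 scales it to 6.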
Ambiguities of “Shape from X”
Many “shape from X” algorithms are bilinear problems of the type $C E^T = D$, where C is a calibration matrix, E an estimation matrix and D an observation matrix.
A special solution is immediately obtained through singular value decomposition $D = U W V^T$, where W is diagonal. So an “honest” solution is $C = U\sqrt{W}$ and $E = V\sqrt{W}$.
But then $C' = CA$ and $E' = E\bar{A}$ with $\bar{A} = (A^{-1})^T$ and $\det A \neq 0$ is a solution too. There is no further constraint on A, thus we obtain a group of ambiguities $\mathcal{A}$. This is typical for many sfx problems [23].
Thus the estimated structure (E) remains observable only up to such ambiguities. For “Shape from Shading” one immediately obtains the well known “relief transformation”.
In generic cases the group of ambiguities $\mathcal{A}$ coincides with the group M of mental movements (see A.5.1), which is important from a biological perspective.
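The ambiguity of the bilinear problem is easily demonstrated on a toy example. The sketch below skips the singular value decomposition and simply starts from one hand-picked solution; the matrices C, E and A are hypothetical values chosen for illustration, and the small matrix helpers are ours.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inverse_2x2(A):
    """Exact inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("A must be invertible (det A != 0)")
    return [[d / det, -b / det], [-c / det, a / det]]

C = [[1, 2], [3, 4], [5, 6]]   # hypothetical calibration matrix
E = [[1, 0], [2, 1]]           # hypothetical estimation matrix
D = matmul(C, transpose(E))    # the observations, D = C E^T

A = [[2, 1], [1, 1]]                     # any invertible matrix will do
A_bar = transpose(inverse_2x2(A))        # inverse transpose of A
C_alt = matmul(C, A)
E_alt = matmul(E, A_bar)
D_alt = matmul(C_alt, transpose(E_alt))  # observations predicted by the alternative pair
```

The alternative pair `C_alt`, `E_alt` differs from `C`, `E`, yet `D_alt` equals `D`: the observations cannot tell the two apart.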
The “Abacus Model”
The “Abacus Model” (Fig. 6) essentially treats pictorial space as a fiber bundle. The base space is the mental image of the picture plane. It has (by and large) the structure of the Euclidean plane $\mathbb{E}^2$. The fibers are the depth dimension, which is approximately an affine line $\mathbb{A}^1$. There is no depth origin, nor something like a natural “unit”. The best one can do, depending on the settings, is to judge ratios of depth differences. Thus a formal model of pictorial space might be $\mathbb{E}^2 \times \mathbb{A}^1$.
Singly isotropic space (see A.5) serves as an effective algebraic model.
The Default Metric
Assume the Euclidean metric in the viewbox. This metric induces a non-Euclidean metric in the scene (Figure A2). The metric tensor is $G = \begin{pmatrix} \dfrac{1+y^2}{x^4} & -\dfrac{y}{x^3} \\ -\dfrac{y}{x^3} & \dfrac{1}{x^2} \end{pmatrix}$.
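The components of the metric tensor follow in a few lines from the hyperbolic involution Π : {x, y} ↦ {1, y}∕x, which yields the viewbox coordinates u = 1∕x and v = y∕x. The derivation below is our own worked sketch of this step.

```latex
\begin{align}
u &= \frac{1}{x}, & v &= \frac{y}{x}, \\
du &= -\frac{dx}{x^{2}}, & dv &= \frac{dy}{x} - \frac{y\,dx}{x^{2}}, \\
ds^{2} &= du^{2} + dv^{2}
        = \frac{1 + y^{2}}{x^{4}}\,dx^{2}
        - \frac{2y}{x^{3}}\,dx\,dy
        + \frac{1}{x^{2}}\,dy^{2}. &&
\end{align}
% Reading off the coefficients of dx^2, dx dy and dy^2:
%   G_xx = (1 + y^2) / x^4,   G_xy = G_yx = -y / x^3,   G_yy = 1 / x^2.
```

Because the metric is the pull-back of a Euclidean metric under a change of coordinates, its flatness is immediate.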
Figure A2.
Non-Euclidean metric in the frustum induced by a Euclidean metric in the viewbox. Here we plot equal sized circular disks with diameters in the u and v directions from the viewbox to the frustum.
All components of the Riemann tensor are zero, so the scene is a flat space in this metric, though evidently not Euclidean. The geodesic equations are readily integrated. The geodesics are straight lines with a projective parameterization.
For instance, frontal geodesics at distance $x_0$ with $y \in [y_0, y_1]$ are $\{x_0,\; y_0 + s(y_1 - y_0)\}$ with $s \in [0,1]$ (like in the Euclidean metric). The principal viewing direction $x \in [1, \infty]$ is a geodesic $\{1/(1 - s),\; 0\}$ with $s \in [0,1]$ (a nonlinear division, unlike the Euclidean metric).
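The nonlinear division of the principal viewing direction can be checked numerically. On the axis y = 0 the line element of the induced metric reduces to ds = dx∕x², so the length of a segment from x0 to x1 is 1∕x0 − 1∕x1. The sketch below uses this closed form; the function names are ours.

```python
from fractions import Fraction

def axis_geodesic(s):
    """Point {1/(1 - s), 0} on the principal viewing direction, for 0 <= s < 1."""
    s = Fraction(s)
    if not 0 <= s < 1:
        raise ValueError("s must lie in [0, 1); s = 1 corresponds to the point at infinity")
    return (1 / (1 - s), Fraction(0))

def axis_length(x0, x1):
    """Length of the axis segment [x0, x1] in the induced metric,
    i.e. the integral of dx / x**2 from x0 to x1."""
    return 1 / Fraction(x0) - 1 / Fraction(x1)

# Equal steps in s give equal metric lengths, but unequal steps in x.
steps = [axis_geodesic(Fraction(k, 4)) for k in range(4)]
lengths = [axis_length(steps[k][0], steps[k + 1][0]) for k in range(3)]
halfway = axis_geodesic(Fraction(1, 2))
```

The points at s = 0, 1∕4, 1∕2, 3∕4 lie at x = 1, 4∕3, 2, 4 and are separated by equal lengths of 1∕4; the parameter value s = 1∕2 lands on the hyperfocal point {2,0}, which is “halfway infinity” in this sense.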
[1] Albertazzi, L. (2013). Experimental phenomenology: An introduction. In L. Albertazzi (Ed.), Handbook of Experimental Phenomenology: Visual Perception of Shape, Space and Appearance (pp. 1–37). John Wiley & Sons, Chichester, UK.
[2] Alberti, L. B. (1972). Della Pittura (in English: On Painting; original 1435). Penguin Classics, London, UK.
[3] Ames Jr., A. (1925a). Depth in pictorial art. Art Bull. 8, 5–24. doi:10.1080/00043079.1925.11409464
[4] Ames Jr., A. (1925b). The illusion of depth from single pictures. J. Opt. Soc. Am. 10, 137–148. doi:10.1364/JOSA.10.000137
[5] Belhumeur, P. N., Kriegman, D. J., Yuille, A. L. (1999). The bas–relief ambiguity. Int. J. Comput. Vis. 35, 33–44. doi:10.1023/A:1008154927611
[6] Berkeley, G. (1709). An Essay towards a New Theory of Vision. Printed for Aaron Rhames, for Jeremy Pepyat, Bookseller at Skinner-Row, Dublin, Ireland.
[7] Claparède, E. (1904). Stereoscopie monoculaire paradoxale. Ann. d’Oculistique 132, 465–466.
[8] Cole, R. (1921). Perspective. Seeley, Service & Co., London, UK.
[9] Coxeter, D. (1989). Introduction to Geometry, 2nd ed. John Wiley, New York, NY, USA.
[10] Denis, M. (1890). Définition du néo-traditionnisme. Art Crit. 65, 556–558.
[11] Dunn, C. R. (1995). Conversations in Paint: A Notebook of Fundamentals. Workman Publishing Company, New York, NY, USA.
[12] Ehrenfels, C. von (1890). Über Gestaltqualitäten. Vierteljahrsschrift für wissenschaftliche Philosophie 14, 249–292.
[13] Euclid (1945). The Optics of Euclid (translated by H. E. Burton). J. Opt. Soc. Am. 35, 357–372.
[14] Foley, J. D., van Dam, A., Feiner, S. K., Hughes, J. (1995). Computer Graphics: Principles and Practice in C, 2nd ed. Addison–Wesley, Reading, MA.
[15] Gombrich, E. H. (1961). Art and Illusion: A Study in the Psychology of Pictorial Representation. Princeton University Press, Princeton, NJ.
[16] De La Gournerie, J. (1859). Traité de perspective linéaire contenant les tracés pour les tableaux, plans et courbes, les bas-reliefs et les décorations théâtrales, avec une théorie des effets de perspective. Dalmont et Dunod, Paris, France.
[17] Ham, J. (1972). Drawing Scenery: Landscapes and Seascapes. Perigee Books, New York, NY, USA.
[18] Hesse, H. (1943). Das Glasperlenspiel. Versuch einer Lebensbeschreibung des Magister Ludi Josef Knecht samt Knechts hinterlassenen Schriften. Herausgegeben von Hermann Hesse. 2 Bände. Fretz & Wasmuth, Zürich, Switzerland.
[19] Hildebrand, A. von (1893). Das Problem der Form in der Bildenden Kunst. Heitz & Mündel, Strassburg, Germany.
[20] Hoffman, D. (2009). The interface theory of perception: Natural selection drives true perception to swift extinction. In Object Categorization: Computer and Human Vision Perspectives (pp. 148–166). doi:10.1017/CBO9780511635465.009
[21] Koenderink, J. J. (1982). Different concepts of “ray” in optics: link between resolving power and radiometry. Am. J. Phys. 50, 1012–1015. doi:10.1119/1.12955
[22] Koenderink, J. J., van Doorn, A. J., Kappers, A. M. L. (1992). Surface perception in pictures. Perception Psychophysics 52, 487–496. doi:10.3758/BF03206710
[23] Koenderink, J. J., van Doorn, A. J. (1997). The generic bilinear calibration-estimation problem. Int. J. Comput. Vis. 23, 217–234. doi:10.1023/A:1007971132346
[24] Koenderink, J. J., van Doorn, A. J., Ridder, H. de, Oomes, S. (2010). Visual rays are parallel. Perception 39, 1163–1171. doi:10.1068/p6530
[25] Koenderink, J. J., van Doorn, A. J., Wagemans, J. (2011a). Depth. i–Perception 2, 541–564. doi:10.1068/i0438aap
[26] Koenderink, J. J. (2011b). Gestalts and pictorial worlds. Gestalt Theory 33, 289–324.
[27] Koenderink, J. J. (2011c). Vision as a user interface. Proc. SPIE 7865, 786504. doi:10.1117/12.881671
[28] Koenderink, J. J., van Doorn, A. J. (2012a). Gauge fields in pictorial space. SIAM J. Imaging Sci. 5, 1213–1233. doi:10.1137/120861151
[29] Pont, S. C., Nefs, H. T., van Doorn, A. J., Wijntjes, M. W. A., Pas, S. F. te, Ridder, H. de, Koenderink, J. J. (2012b). Depth in box spaces. Seeing Perceiving 25, 339–349. doi:10.1163/187847611X595891
[30] Koenderink, J. J. (2015). Ontology of the mirror world. Gestalt Theory 37, 119–140.
[31] Koenderink, J. J., van Doorn, A. J., Pinna, B., Pepperell, R. (2016). Facing the spectator. i-Perception 7, 1–29. doi:10.1177/2041669516675181
[32] Koenderink, J. J. (2019a). Vision, an optical user interface. Perception 48, 545–601. doi:10.1177/0301006619853758
[33] Koenderink, J. J. (2019b). Sentience. Clootcrans Press, Trajectum.
[34] Koenderink, J. J. (2019c). Between Sentience & Sapience. Clootcrans Press, Trajectum.
[35] Metzger, W. (1953). Gesetze des Sehens. Herausgegeben von der Senckenbergischen Naturforschenden Gesellschaft zu Frankfurt am Main. Waldemar Kramer, Frankfurt am Main, Germany.
[36] Pirenne, M. H. (1970). Optics, Painting & Photography. Cambridge University Press, Cambridge, UK.
[37] Reid, T. (1769). An Inquiry into the Human Mind, 3rd edn. Cadell, London, UK.
[38] Sachs, H. (1987). Ebene Isotrope Geometrie. Vieweg–Verlag, Braunschweig and Wiesbaden, Germany.
[39] Sachs, H. (1990). Isotrope Geometrie des Raumes. Vieweg–Verlag, Braunschweig and Wiesbaden, Germany.
[40] Schlosberg, H. (1941). Stereoscopic depth from single pictures. Am. J. Psychol. 54, 601–605. doi:10.2307/1417214
[41] Strubecker, K. (1962). Geometrie in einer isotropen Ebene. Math. Naturwiss. Unterricht 15, 297–306, 343–351 and 385–394.
[42] Uexküll, J. von (1920). Theoretische Biologie. Springer, Berlin.
[43] Uexküll, J. von (1940). Bedeutungslehre (Bios, Abhandlungen zur theoretischen Biologie und ihrer Geschichte sowie zur Philosophie der organischen Naturwissenschaften. Bd. 10). Verlag von J.A. Barth, Leipzig.
[44] Yaglom, I. M. (1968). Complex Numbers in Geometry (Abe Shenitzer, trans.). Springer, New York, NY, USA.
[45] Yaglom, I. M. (1979). A Simple Non-Euclidean Geometry and its Physical Basis: An Elementary Account of Galilean Geometry and the Galilean Principle of Relativity (E. J. F. Primrose, trans.). Springer, New York, NY, USA.