A work of art can evoke an aesthetic experience, but so can a grand ballroom or a sunset. Like fine art, everyday scenes contain aesthetic qualities, with some scenes being preferred over others. The general meaning of a scene, known as scene gist, is extracted rapidly and automatically, with just a brief glance, mainly from the low spatial frequency information in the image. We asked whether such a rapid and coarse overall representation also allows for a stable aesthetic impression. In a series of experiments, we investigated to what extent intrinsic (image type) and extrinsic (temporal and spatial resolution) factors affect the aesthetic response to real-world images. We varied these factors in different groups of participants who rated sets of images for aesthetic pleasure. Here we show that aesthetic responses are extracted rapidly, consistently and automatically with just a glance at a scene. Moreover, participants preferred natural scenes over urban or indoor scenes, at both rapid and unlimited exposures. This pattern of preference interacted significantly with self-similarity and anisotropy, two image statistics previously shown to correlate with the aesthetic response to artworks. The results presented here allow for a deeper understanding of the aesthetic response to our everyday surroundings.