We present a novel bi-modal system based on deep networks to address the problem of learning associations and simple meanings of objects depicted in "authored" images, such as fine art paintings and drawings. Our overall system processes both the images and associated texts in order to learn associations between images of individual objects, their identities, and the abstract meanings they signify. Unlike past deep networks that describe depicted objects and infer predicates, our system identifies meaning-bearing objects ("signifiers") and their associations ("signifieds"), as well as basic overall meanings for target artworks. Our system achieved a precision of 48% and a recall of 78%, for an F1 score of 0.6, on a curated set of Dutch vanitas paintings, a genre celebrated for its concentration on conveying meanings of great import at the time of the works' execution. We developed and tested our system on fine art paintings, but our general methods can be applied to other authored images.
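The reported F1 score is consistent with the stated precision and recall, since F1 is their harmonic mean. A minimal check in Python, using only the figures given above:

```python
# F1 as the harmonic mean of precision and recall,
# computed from the values reported in the abstract.
precision = 0.48  # reported precision (48%)
recall = 0.78     # reported recall (78%)

f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2f}")  # prints "F1 = 0.59", rounded to 0.6 in the abstract
```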
Gregory Kell, Ryan-Rhys Griffiths, Anthony Bourached, and David G. Stork, "Extracting associations and meanings of objects depicted in artworks through bi-modal deep networks," in Proc. IS&T Int’l. Symp. on Electronic Imaging: Computer Vision and Image Analysis of Art, 2022, pp. 170-1–170-14. https://doi.org/10.2352/EI.2022.34.13.CVAA-170