For image recognition, our model takes two images, x and y, as inputs. If x and y are slightly distorted versions of the same image, the model is trained to produce a low energy on its output. For example, x could be a photo of a car, and y a photo of the same car that was taken from a slightly different location at a different time of day, so that the car in y is shifted, rotated, larger, smaller, and displaying slightly different colors and shadows than the car in x.