How do we recognize objects, and how do we do it so quickly?
Memories of objects and faces are not stored like memories of experiences and places. The difference reflects the different needs we have for remembering them and the value of fast recall. Experiences are stored based on their emotional and logical content. They are stored in associative memory, and they are recalled through the associations that matter to each experience, especially the people in them. Places are remembered in place memory, which is a grid of cell bundles that support navigation in and around the given place. That’s all I’ll say about them.
Objects and faces are different from experiences and places in two important ways. First, they are primarily sense memories, mostly visual but also auditory. As such, they have invariant objective features that can be optimized for recall. Second, we benefit from immediate recognition of faces and objects. We expect to recognize a ball or banana as soon as we see it, not after a delay, however brief. The sooner the better.
Objects and faces are largely recognized with the same sort of mechanism, though different areas of the brain seem to specialize in each. I’ll describe the shared mechanism, but I’ll concentrate on the object path since that’s what the question is about.
Before talking about the mechanism, I want to describe what it means for the brain to “recognize” something. Understanding this is critical to understanding the explanation.
There are basically two ways that people recognize things, whether objects or faces. In one way, we see something for the first time, not knowing what it is. We have to take it in and make sense of it. This happens for totally new things, and it also happens whenever we see a new instance that is like some previous instance. This is also what happens when we see a new face.
The second way applies when the object or person is one we have seen before. In this case, we have some reason to expect that the thing or face will be present. We are waiting for it, anticipating it.
Psychologists have known about these two ways for decades. The difference was discovered when researchers attempted to measure recognition time and found that it wasn’t predictable in their early experiments. What they discovered was that expectation or foreknowledge changes the response time radically. They learned to structure psychological tests so that the appearance of objects and faces in experiments was carefully planned. The term “priming” is used to describe a situation where the subject knows what to expect.
The general mechanism for recognizing objects and faces involves a pipeline of visual processing areas. Each segment in the pipeline involves what is called a feedforward network, embedded with a feedbackward network. These are really completely intertwined, but we have found that speaking of them separately is useful.
In the case of object recognition, the pipeline is called the ventral visual pathway. It runs from V1 in the very back of the brain forward along the underside of each hemisphere. Primate evidence suggests that it consists of six segments.
The feedforward circuitry of the ventral visual pathway is for making sense of new objects. The feedbackward circuitry is for anticipating previously seen objects, i.e., priming. Another way of looking at these directions starts from the fact that most of the time we are seeing things we have seen before. The feedbackward pathway allows V1 cortex to confirm that we are seeing a known object. The feedforward pathway consists of detailed error signals, starting from V1, that allow learning about new objects and variations on old objects.
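To make the two directions concrete, here is a toy sketch in Python. It is only an illustration of the idea of "prediction going one way, error going the other," not a model of real cortex. The feature vectors, the threshold, the learning rate, and all function names are assumptions I made up for the example.

```python
# Toy sketch of one pipeline segment. Feedback carries a prediction;
# feedforward carries the prediction error. All values are hypothetical.

def prediction_error(observed, predicted):
    """Feedforward signal: element-wise difference between input and prediction."""
    return [o - p for o, p in zip(observed, predicted)]

def error_size(error):
    """Mean absolute error, used to decide whether the prediction was confirmed."""
    return sum(abs(e) for e in error) / len(error)

def update_prediction(predicted, error, learning_rate=0.5):
    """Learning step: nudge the stored prediction toward the observed input."""
    return [p + learning_rate * e for p, e in zip(predicted, error)]

def perceive(observed, predicted, threshold=0.1):
    """Return (confirmed, new_prediction) for a single observation."""
    error = prediction_error(observed, predicted)
    if error_size(error) <= threshold:
        return True, predicted  # expectation confirmed, nothing to learn
    return False, update_prediction(predicted, error)

banana = [1.0, 0.8, 0.1]    # made-up feature vector for a known object
expected = [1.0, 0.8, 0.1]  # primed: we expect a banana
confirmed_known, _ = perceive(banana, expected)

novel = [0.2, 0.1, 0.9]     # made-up feature vector for an unfamiliar object
confirmed_novel, learned = perceive(novel, expected)
```

In the primed case the error is zero, so the prediction is confirmed with no further work. In the novel case the error is large, so the sketch falls through to the slower learning step. That is the only point the example is meant to carry.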
Now that we’ve considered the first question, let’s go on to the second.
The ventral visual pathway (purple in the image above) is used for all objects. In effect, it is considering all objects you have previously seen at the same time. The segments in the pipeline are finding and predicting commonalities of all the different objects you have ever seen as the visual information passes through them. This means there is no serial consideration of what an object is, as the question suggests.
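A rough analogy in code may help show what "no serial consideration" means. In this sketch, each seen feature votes for every known object that has it, so all candidates gather evidence in one pass over the features. There is no loop that asks "is it a ball? is it a banana?" one object at a time. The objects, the features, and the voting scheme are all hypothetical and chosen for illustration. The brain does not literally keep a lookup table.

```python
from collections import defaultdict

# Hypothetical objects and features, for illustration only.
KNOWN_OBJECTS = {
    "ball": {"round", "smooth"},
    "banana": {"yellow", "curved", "smooth"},
    "lemon": {"yellow", "round"},
}

def build_feature_index(objects):
    """Map each feature to every known object that has it."""
    index = defaultdict(set)
    for name, features in objects.items():
        for feature in features:
            index[feature].add(name)
    return index

def recognize(seen_features, index):
    """One pass over the seen features; every matching object gains a vote."""
    votes = defaultdict(int)
    for feature in seen_features:
        for name in index.get(feature, ()):
            votes[name] += 1
    # Return the object with the most votes, or None if nothing matched.
    return max(votes, key=votes.get) if votes else None

INDEX = build_feature_index(KNOWN_OBJECTS)
best = recognize({"yellow", "curved", "smooth"}, INDEX)
```

The work done in `recognize` depends on how many features are seen, not on how many objects are known. That is the sense in which the pathway considers everything at once.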
Recognizing an object, whether a new one after considering all its characteristics or a previously known one by confirmation of a prediction, is nothing like a memory of an experience. Experiences tend to interfere with one another since their associations are subject to interference with each recall. The recognition of objects is seldom lost because the complexities of the ventral visual stream allow recognition of a vast number of objects simultaneously.
As a side note, recognition of a new object through the feedforward network takes 200–400 milliseconds, while recognition of a known object with priming by the feedbackward network can be done in as few as 60 milliseconds.
Complicating this explanation is that object recognition takes place independently in the two hemispheres. Rather than duplicating these abilities, the hemispheres provide complementary functionality. The hemispheres look at different features and use different parts of the visual field.
The right hemisphere uses the entire visual field, central and peripheral vision, left and right sides. This allows it to recognize the object in context, and allows the entire object to be considered as a whole. It also allows larger objects to be recognized.
The left hemisphere only perceives the right half of the central (foveal) visual field. This means that it tends to recognize smaller parts of objects, with the object itself as context, but not the area around the object. The relatively consistent high resolution in this part of the visual field allows it to concentrate on details of the object that the right hemisphere might not use.
These two complementary views of an object work together. The left side can often recognize an object somewhat earlier since its part of the visual field isn’t delayed by passing across the corpus callosum. Also, the left side can sometimes recognize objects in new orientations and with moderate visual interference because it is recognizing parts of the object that might escape interference. The right hemisphere takes slightly longer, but since it is more likely to recognize the object correctly, it can correct left hemisphere mistakes.
Now I hope I have waved my hands sufficiently at this essential, complex process of object recognition. I hope you can see now why you can recognize a vast number of objects almost instantaneously.