Why do we perceive things in 3D when the images on our retinas are 2D?
When we are born, we have no concept of how the world is organized. We don’t know 2D or 3D, and certainly not time. We have senses, and we have lots of uncommitted neurons that can learn. I’ll leave time for another story.
The first thing we notice is that the world is lumpy. We notice lumpiness by moving and paying attention to the variety of touch, pressure, pain, heat, cold, and other sensors all over our skin.
The second thing we notice is the onrush of visual information, which initially makes no sense. What we inherit at birth is a neural mapping of the retina onto the V1 visual cortex, lots of uncommitted visual cortex, and a neural algorithm that can recognize boundaries/edges in the visual field. This algorithm allows us to recognize the lumpiness of the visual field.
The third thing we notice is that sometimes the lumpiness of the tactile world matches the lumpiness of the visual world. Like good scientists, we are attracted to those similarities. We not only recognize them, we attempt to find more of them. We reach out and touch things. We move (what little we can) to touch more things. We put things in our mouths so that the mouth’s superior tactility (at that point) can understand the 3D nature of small objects.
The fourth thing we notice is the 3D nature of small objects from the tactile senses. They have a front side and a back side, a left side and a right side, a top side and a bottom side, and there is distance between the sides. As we continue to explore tactilely, we discover that this distance is (mostly) rotationally and translationally invariant. Here we also learn that the width and height of the object correlate with the 2D size of the object in our near visual field.
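The invariance the infant discovers is exactly the defining property of rigid motions: rotating or translating an object never changes the distances between its sides. A minimal sketch of that geometric fact (the block shape, rotation angle, and translation are arbitrary illustrative values, not anything from the text):

```python
import math

def rotate_z(p, theta):
    """Rotate a 3D point about the z-axis by angle theta (radians)."""
    x, y, z = p
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta),
            z)

def translate(p, t):
    """Shift a 3D point by the offset vector t."""
    return tuple(pi + ti for pi, ti in zip(p, t))

def dist(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Two opposite corners of a small block; the distance between them
# is the object's "size" that the infant measures by touch.
front, back = (0.0, 0.0, 0.0), (3.0, 4.0, 12.0)
d0 = dist(front, back)  # 13.0

# Apply the same rigid motion (rotation, then translation) to both corners...
front2 = translate(rotate_z(front, 0.7), (5.0, -2.0, 1.0))
back2  = translate(rotate_z(back, 0.7), (5.0, -2.0, 1.0))
d1 = dist(front2, back2)

# ...and the distance between the sides is unchanged.
assert abs(d0 - d1) < 1e-9
```

However the object is turned or carried, `d1` stays equal to `d0`: the object's dimensions are properties of the object, not of its pose.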
The fifth thing we notice is how small objects, with their apparently invariable dimensions, appear as they move in the visual field. The movement may result from a caregiver moving them or the child moving them. A newborn’s eyes are optimized for looking at near objects, no farther than mom’s eyes when nursing. Again the little scientist notices correlations between tactile and visual aspects of objects which can be touched and seen. Among the objects being observed are the parts of mom’s face, body and clothing, along with various toys. At this point, an important lesson is that things appear smaller in the visual field as they get farther away.
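That lesson has a simple geometric form: an object of fixed height subtends a visual angle of 2·arctan(h/2d) at distance d, which for small angles falls off roughly as 1/d. A quick sketch, with the toy size and distances as hypothetical illustrative numbers:

```python
import math

def angular_size(height_m, distance_m):
    """Visual angle (radians) subtended by an object of a given height
    viewed face-on from a given distance."""
    return 2.0 * math.atan(height_m / (2.0 * distance_m))

toy = 0.10                         # a 10 cm toy (hypothetical value)
near = angular_size(toy, 0.25)     # ~25 cm: roughly a newborn's focal range
far  = angular_size(toy, 1.00)     # the same toy, four times farther away

# The same object fills a smaller part of the visual field when farther away,
assert far < near
# and in the small-angle regime its apparent size scales about as 1/distance:
assert abs(near / far - 4.0) < 0.2
```

Quadrupling the distance shrinks the apparent size by almost exactly a factor of four, which is the regularity the infant's model eventually captures.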
The sixth thing we notice is that apparently invariable shapes can begin to disappear when other objects become visible near them. With some additional research, we learn that one object can block our view of another object (“peekaboo!”). Additional study suggests that there is distance between these objects in the radial direction. This is the beginning of an egocentric 3D view of the world; what we pay attention to is left/right and distance from self.
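The peekaboo observation amounts to a depth-ordering rule: where two objects' images overlap, the one at smaller radial distance is the one we see. A toy sketch of that rule, with the scene contents and coordinates invented for illustration:

```python
def visible_object(objects, x_img):
    """Of the objects whose image interval covers x_img, return the name of
    the nearest (smallest depth); the rest are occluded. None if nothing
    covers that spot."""
    covering = [o for o in objects if o["left"] <= x_img <= o["right"]]
    if not covering:
        return None
    return min(covering, key=lambda o: o["depth"])["name"]

# Hypothetical scene: a hand held closer than a toy, overlapping it in view.
scene = [
    {"name": "hand", "depth": 0.2, "left": -1.0, "right": 1.0},
    {"name": "toy",  "depth": 0.6, "left": -0.5, "right": 0.5},
]

assert visible_object(scene, 0.0) == "hand"   # the nearer hand hides the toy
assert visible_object(scene[1:], 0.0) == "toy"  # hand removed: "peekaboo!"
```

The crucial inference the infant makes is that occlusion implies a *radial* ordering: the hidden object has not vanished, it is merely behind the nearer one.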
The seventh thing we notice is that we can change the position of the eyes, either by turning the head or by moving it left/right, to perform 3D experiments on the world. These experiments confirm and extend our understanding of the relation between the 2D visual field and the 3D world immediately surrounding the child.
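The head-movement experiment exploits motion parallax: under a simple pinhole-projection model (a sketch I am assuming here, not something from the text), sliding the viewpoint sideways by a baseline b shifts a point's image by about f·b/Z, so nearby things sweep across the view faster than distant ones:

```python
def image_x(X, Z, f=1.0):
    """Pinhole projection: horizontal image coordinate of a point at
    lateral position X and depth Z, for focal length f."""
    return f * X / Z

b = 0.05  # head moves 5 cm to the side (hypothetical value)

# Image shift of a point straight ahead, seen before and after the move.
near_shift = abs(image_x(0.0, 0.3) - image_x(0.0 - b, 0.3))  # object 30 cm away
far_shift  = abs(image_x(0.0, 3.0) - image_x(0.0 - b, 3.0))  # object 3 m away

# Parallax: the near object's image moves much more than the far one's,
# so the amount of shift reveals radial distance.
assert near_shift > far_shift
```

The shift is inversely proportional to depth, which is why a single moving eye is enough to recover the 3D layout of the nearby scene.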
From then on, further experimentation leads to the idea that not just the local environment but the whole world can be understood by interpreting it with a 3D cognitive model. The 3D model seems to successfully predict where to reach to touch or grab something, so it is further refined, and the successes of the model lead to more experimentation, which leads to further refinement, until we become experts at mentally manipulating that 3D model.
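The predict-reach-refine loop can be caricatured as error-driven updating; the specific update rule below is a hypothetical illustration of the idea, not a claim about how the brain does it:

```python
def refine(estimate, actual, rate=0.5, steps=8):
    """Toy 'infant scientist' loop: each reach reveals the prediction error,
    and the model's distance estimate is nudged a fraction of the way
    toward what the reach actually found."""
    for _ in range(steps):
        error = actual - estimate   # how far off the predicted reach was
        estimate += rate * error    # correct the model by part of the error
    return estimate

# Start with a poor guess of an object's distance; experiment repeatedly.
final = refine(estimate=0.10, actual=0.30)

# Repeated experiments drive the model's prediction toward reality.
assert abs(final - 0.30) < 0.01
```

Each cycle shrinks the remaining error, which is the sense in which the model's successes feed further refinement.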
Consider what this explanation didn’t include. At no point did it mention binocular vision. All of the steps in learning about 3D modeling of the world can be done with one eye, so a child who is blind in one eye is not limited in developing the 3D sense. Binocular vision is an optimization that most of us benefit from, but it has no developmental effect.
The explanation also didn’t include total blindness. A blind infant goes through all the same stages of learning. The tactile issues are all the same, but the visual cues are replaced by audio cues. (Sighted infants also use audio cues, which I left out for the sake of simplicity.) A blind infant becomes highly attuned to loudness levels and directionality, which provide corroboration and insight into the concept of distance from self; these cues are then correlated with tactile sensations, as with any child.
The 3D model of the world that we carry around in our heads appears and persists because it is a successful infant-scientist theory of how the world is constructed.