The auditory cortex underlies our effortless ability to discriminate and remember complex sounds, including speech. Our findings in monkeys have raised the possibility that, like the occipitotemporal visual areas, superior temporal auditory areas send highly processed stimulus quality information to downstream targets via a multisynaptic corticocortical pathway that is important for stimulus recognition. The auditory core (areas A1, R, and RT) on the supratemporal plane (STP) constitutes the first stage of cortical processing, followed by a stepwise serial projection from A1 to R to RT to the rostrotemporal polar field (RTp) and then into the medial temporal rhinal cortices. The core areas receive their primary input from the medial geniculate nucleus of the thalamus, while the more rostral fields receive little input from the auditory thalamus, suggesting that their physiological responses to sound are mediated by the corticocortical pathways along the STP. We investigated the nature and emergence of specialization for auditory stimuli, particularly vocalizations, by measuring auditory evoked field potentials to species-specific vocalizations along the caudal-to-rostral processing stream. We found that neural discrimination performance among vocalizations, compared to matched control stimuli in which only the frequency spectra or temporal content was preserved, was highest in the most rostral sector of STP, while this difference was minimal in the core areas. The most rostral sector had greater representation for complex stimuli in particular vocalization categories, illustrating the progression in differential coding of conspecific vocalizations along the ventral auditory pathway. To investigate the importance of this ventral stream in auditory memory, monkeys were trained on an auditory recognition task.
We found that their memory performance was limited to short-term memory and was unaffected by lesions of the rhinal cortex; this is in sharp contrast to their memory performance in vision, which extends to long-term memory and is severely disrupted by a rhinal lesion. These studies suggest that monkeys may be unable to store acoustic signals in long-term memory, raising the possibility that they may therefore also lack auditory working memory (WM). A stimulus trace may be temporarily retained either actively (i.e., in WM) or by the weaker mnemonic process we have termed passive short-term memory, in which a given stimulus trace is highly susceptible to overwriting by a subsequent stimulus. It has been suggested that WM is the more robust process because it exploits long-term memory (i.e., a current stimulus activates a stored representation of that stimulus, which can then be actively maintained). We tested monkeys on a serial delayed match-to-sample (DMS) task. There was a steep drop in performance with a single intervening stimulus between the sample and the match. This drop in accuracy was not due to passive decay of the sample's trace, but to retroactive interference from the intervening nonmatch stimulus. This overwriting effect was far greater than that observed previously in serial DMS with visual stimuli. The results indicate that monkeys perform serial DMS in audition remarkably poorly and that whatever success they had on this task depended largely on the retention of stimulus traces in the passive form of short-term memory. Reliance on a passive sensory trace could render memory particularly susceptible to confusion between sounds that are similar in some acoustic dimension. If so, in the DMS task, the monkey's performance should be predicted by the similarity in the salient acoustic dimension between the sample and the test stimulus. We examined the pattern of errors made while performing the auditory DMS task.
Manipulation of the stimuli showed that removal of spectral cues was more disruptive to matching behavior than removal of temporal cues. This suggests that the passively retained trace is not only highly susceptible to overwriting but is also vulnerable to similarity-based confusion. The neural underpinnings of this passive trace are unknown but, by analogy to sensory memory in vision and touch, they are likely to engage non-primary auditory cortex, e.g., the rostral STP and the rostral superior temporal gyrus (rSTG). Single-unit activity was recorded across these regions while monkeys performed the DMS task. We identified two phenomena potentially associated with mnemonic tasks: modulation of the sensory response by task context (match suppression, MS, or match enhancement, ME), and modulation of activity during the delay interval (delay suppression, DS, or delay enhancement, DE). Firing rates represented acoustic features of the stimuli, but seldom signaled a categorical match or nonmatch. The absence of excitatory response modulation following the first nonmatch sound coincided with the marked increase in behavioral error rate, raising the intriguing possibility that these signals aid match detection. However, any stimulus-specific trace spanning the delay interval appears not to be carried by spiking. In contrast to monkeys, humans are very proficient in auditory recognition memory. Because humans seem to have such robust long-term auditory memory, the possibility that monkeys lack it is surprising and raises the question of whether apes possess this ability. We tested adult chimpanzees, which had previously undergone extensive testing on a variety of cognitive tasks, on two different long-term auditory recognition memory paradigms. For comparison, the chimps were also tested on a corresponding paradigm with visual stimuli. The chimps, like monkeys, had great difficulty in learning an auditory memory task, but easily learned the equivalent visual recognition task.
These data suggest that, like monkeys, chimps have no long-term auditory memory. To examine whether this modality difference extends to another form of learning, viz. habit formation, we tested monkeys on their ability to learn auditory discriminations. Ventrocaudal neostriatal (VCN) lesions result in deficits in visual discrimination learning, and since this same portion of the neostriatum receives a major projection from the auditory areas in the rSTG, we also examined the effects of rSTG lesions on auditory discrimination. As with most comparisons of auditory and visual behavioral tasks, it took many more trials to train monkeys on the auditory discriminations than it takes on visual discriminations. Postoperatively, all animals showed only a mild retention deficit for the previously learned pairs. In learning new problems, however, both groups had great difficulty, failing to reach criterion on a single discrimination even after several hundred trials. The results indicate that the rSTG-VCN connection is an essential pathway for auditory habit formation, as it is with visual habit formation. The impoverished auditory memory ability in monkeys contrasts not only with their excellent memory in vision but also with the human facility to encode auditory stimuli in long-term memory (LTM), thus raising the question of whether the human ability is supported in some way by speech and language. To test this possibility, we asked whether humans could store representations of speech sounds that can be neither repeated nor labeled. Our results indicate that the less articulation and verbal labeling can be used, the poorer the memory performance. This in turn has led us to propose that human speech and human auditory memory evolved together, possibly as a result of the evolution of the arcuate fasciculus from a primitive connection between the auditory and oromotor systems present in nonhuman primates to the dense and complex linkage in humans.