The auditory cortex underlies our effortless ability to discriminate and remember complex sounds, including speech. It integrates spectral and temporal acoustic features to support the perception of complex sounds such as conspecific vocalizations. We investigated the coding of vocal stimuli by chronically measuring auditory evoked potentials over a large region of primary and higher-order auditory cortex along the supratemporal plane (STP), using high-density microelectrocorticographic (ECoG) arrays. In the caudal areas, the neural information about vocalizations was similar to the information about synthetic stimuli that contained only the spectral or only the temporal features of the original vocalizations. In the rostral sectors, however, classification of vocalizations was significantly better than classification of the synthetic stimuli, suggesting that conjoined spectral and temporal features are necessary to explain the differential coding of vocalizations in the rostral areas.

Acoustic variability in calls among and within individual monkeys, however, has not been evaluated extensively. We therefore aimed to quantify acoustic variability in Coo calls from a large sample obtained from several monkeys. The monkeys were placed in a sound-attenuating testing chamber, and 1,000-10,000 Coo calls were recorded from each monkey over several months. We found that the fundamental frequency discriminates a monkey's identity more reliably than other acoustic features such as spectral entropy or duration. Thus, neuronal mechanisms sensitive to caller identity may extract precisely those acoustic features most useful for discriminating caller identity.

During vocal production, an individual's own voice is perceived without being confused with sounds produced by external sources. To achieve normal perception of self-generated sounds, the auditory cortex must differentiate self-generated sounds from sounds produced externally.
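As an aside, the caller-identity comparison can be sketched as a single-feature nearest-centroid classifier. Everything below is simulated for illustration (the number of callers, feature values, and the classifier itself are assumptions, not the analysis actually used): only the mean fundamental frequency (F0) differs across the simulated callers, while spectral entropy and duration overlap, mimicking the finding that F0 is the more reliable identity cue.

```python
import numpy as np

rng = np.random.default_rng(0)
n_calls = 200  # calls per monkey (illustrative; the study recorded 1,000-10,000)

# Hypothetical per-call features for three monkeys (assumed values, in Hz).
mean_f0 = {"A": 480.0, "B": 560.0, "C": 640.0}
features, labels = [], []
for monkey, f0 in mean_f0.items():
    f0_vals = rng.normal(f0, 25.0, n_calls)    # caller-specific
    entropy = rng.normal(3.0, 0.5, n_calls)    # caller-independent
    duration = rng.normal(0.4, 0.1, n_calls)   # caller-independent
    features.append(np.column_stack([f0_vals, entropy, duration]))
    labels += [monkey] * n_calls
X, y = np.vstack(features), np.array(labels)

def single_feature_accuracy(col):
    """Train a nearest-centroid classifier on even-indexed calls using one
    feature; report accuracy on the held-out odd-indexed calls."""
    train_x, train_y = X[::2, col], y[::2]
    test_x, test_y = X[1::2, col], y[1::2]
    centroids = {m: train_x[train_y == m].mean() for m in mean_f0}
    preds = [min(centroids, key=lambda m: abs(v - centroids[m])) for v in test_x]
    return float(np.mean(np.array(preds) == test_y))

acc = {name: single_feature_accuracy(i)
       for i, name in enumerate(["f0", "entropy", "duration"])}
print(acc)  # F0 classifies caller identity far above chance (1/3)
```

Under these assumptions, classification from F0 alone approaches ceiling, while entropy or duration alone stays near the three-way chance level.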
Previous studies have shown that the primary auditory cortex responds to mismatches between expected and actual auditory feedback during vocal production, but the coding properties of this mismatch signal and the underlying cortical network interactions are not well understood. We recorded from monkeys trained to vocalize Coo calls for water rewards, with or without playback of loud background white noise. The two most caudal sites on the STP showed robust increases in lower-gamma-band power after call onset in the presence of noise, whereas gamma-band power at these sites decreased after call onset in the absence of noise. Furthermore, we were able to decode the fundamental frequency of Coo calls produced in noise on a single-trial basis. Together, these results suggest that gamma-band power in primary auditory cortex carries information about the mismatch between the spectral content of expected and actual auditory feedback. Interestingly, we did not find this mismatch activity in the higher-order auditory cortex on the rostral STP, suggesting that it was not relayed from higher-order to primary auditory cortex. We also found a robust increase of gamma-band power in primary motor cortex. This increase generally started 500-1,000 ms before call onset and thus could encode motor commands associated with vocal production; motor cortical areas are therefore also potential sources of the mismatch signal found in primary auditory cortex.

Vocal production is an example of controlled motor behavior with high temporal precision. Previous studies have decoded auditory evoked cortical activity while monkeys listened to vocalization sounds, but there have been few attempts to decode motor cortical activity during vocal production. We recorded cortical activity during vocal production in the monkey and detected robust activity in motor cortex.
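The noise-dependent rise in lower-gamma power after call onset can be illustrated with a minimal band-power computation. The sampling rate, band limits, and synthetic LFP below are assumptions chosen for illustration only, not parameters from the recordings:

```python
import numpy as np

fs = 1000.0                      # Hz, assumed sampling rate
t = np.arange(-1.0, 1.0, 1/fs)   # time relative to call onset (s)

rng = np.random.default_rng(1)
# Synthetic LFP: white background noise, plus a 40 Hz (lower-gamma) burst
# after call onset standing in for the mismatch-related power increase.
lfp = rng.normal(0.0, 1.0, t.size)
lfp[t >= 0] += 2.0 * np.sin(2 * np.pi * 40.0 * t[t >= 0])

def band_power(x, fs, lo, hi):
    """Mean power spectral density of x within [lo, hi] Hz."""
    freqs = np.fft.rfftfreq(x.size, 1/fs)
    psd = np.abs(np.fft.rfft(x))**2 / x.size
    band = (freqs >= lo) & (freqs <= hi)
    return psd[band].mean()

pre = band_power(lfp[t < 0], fs, 30.0, 60.0)
post = band_power(lfp[t >= 0], fs, 30.0, 60.0)
print(post / pre)  # ratio well above 1: gamma power rises after call onset
```

In practice, trial-by-trial spectral estimates of this kind (pre- versus post-onset band power) are what support single-trial decoding of call features.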
Using a nonlinear dynamical model of the vocal organ to reduce the dimensionality of Coo calls produced by the monkey, we could account for approximately 65% of the variance in the reduced sound representations, supporting the feasibility of using the dynamical model to decode motor cortical activity during vocal production.

As indicated above, the ventral stream may be important for processing the stimulus-quality information needed for stimulus recognition in auditory memory. However, we found auditory memory in monkeys to be extremely impoverished: limited to a passive short-term trace and unaffected by lesions of the rhinal cortex. This is in sharp contrast to their memory performance in vision, which extends to long-term memory and is severely disrupted by a rhinal lesion. We tested monkeys on a serial delayed match-to-sample (DMS) task. There was a steep drop in performance with a single intervening stimulus between the sample and the match. This drop in accuracy was not due to passive decay of the sample's trace, but to retroactive interference from the intervening nonmatch stimulus. The neural underpinnings of this putative trace are unknown, but they are likely to engage non-primary auditory cortex, e.g., the rostral superior temporal plane and gyrus. We recorded single-unit activity and local field potentials (LFPs) across these regions while monkeys performed a serial DMS task. In the unit activity, we identified two phenomena. First, 35% of units exhibited a sustained change in firing rate (excitation or suppression) during the delay interval. Second, the auditory response was modulated by task context, with some units showing match enhancement (relative to the sample presentation) and others match suppression. Similar characteristics were mirrored in the LFP power: during the first delay period, LFP power at a given site could be suppressed or enhanced relative to the pre-trial baseline.
By contrast, only suppression was observed in the second delay period, following a nonmatch stimulus. The delay-period modulation in the LFP spanned multiple frequency bands, suggesting that the suppression is a network-wide effect. Taken together, we find that evoked LFPs are modulated by task demands and complement the mnemonic effects observed in single-unit activity.

In contrast to monkeys, humans are very proficient in auditory recognition memory. That monkeys appear to lack the robust long-term auditory memory of humans is surprising and raises the question of whether apes possess this ability. We therefore tested adult chimpanzees, which had undergone extensive testing on a variety of cognitive tasks, on long-term auditory recognition memory. For comparison, the chimpanzees were tested on a corresponding paradigm with visual stimuli. Like monkeys, the chimpanzees had great difficulty learning the auditory memory task, but they easily learned the equivalent visual recognition task. These data suggest that, like monkeys, chimpanzees have no long-term auditory memory.

To examine whether this modality difference extends to another form of learning, viz. habit formation, we tested monkeys on their ability to learn auditory discriminations. Because ventrocaudal neostriatal (VCN) lesions produce deficits in visual discrimination learning, and because this same portion of the neostriatum receives a major projection from the auditory areas in the rostral superior temporal gyrus (rSTG), we examined the effects on auditory discrimination of both VCN and rSTG lesions. Postoperatively, all animals showed only a mild retention deficit for the previously learned pairs. In learning new problems, however, both groups had great difficulty, failing to reach criterion on a single discrimination even after several hundred trials. The results indicate that the rSTG-VCN connection is an essential pathway for auditory habit formation, just as the VCN is for visual habit formation.
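Returning to the DMS experiments: match enhancement and suppression are often quantified with a simple modulation index contrasting the response to a stimulus as match against the response to the same stimulus as sample. The firing rates below are invented for illustration, and this particular index definition is a common convention rather than necessarily the measure used in these experiments:

```python
def match_modulation_index(sample_rate, match_rate):
    """(match - sample) / (match + sample): positive values indicate match
    enhancement, negative values indicate match suppression."""
    return (match_rate - sample_rate) / (match_rate + sample_rate)

# Hypothetical trial-averaged firing rates (spikes/s) for two example units.
enhanced_unit = match_modulation_index(sample_rate=12.0, match_rate=18.0)
suppressed_unit = match_modulation_index(sample_rate=12.0, match_rate=7.0)

print(enhanced_unit)    # positive: match enhancement
print(suppressed_unit)  # negative: match suppression
```

Bounding the index to [-1, 1] makes units with very different overall firing rates directly comparable, which is why ratio-of-sums contrasts of this kind are popular for population summaries.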