Several diverse lines of evidence suggest that auditory long-term memory may require the assistance of the oromotor system. The first set of findings comes from a series of studies carried out on the 3-generational KE family, half of whose members suffer from an inherited speech and language disorder. The core deficit associated with this disorder is in executing orofacial, especially articulatory, movements, and results from a mutation of the FOXP2 gene. The second line of evidence is that monkeys can readily store visual and other sensory stimuli in long-term memory, but are unable to do so with auditory stimuli. Although seemingly unrelated, these pieces of evidence from the human and animal studies suggest the following proposal. Because natural acoustic stimuli such as speech sounds fluctuate rapidly in time, it may be that their neural representations cannot be packaged for long-term storage in the sensory system alone, because the sensory system may not contain an integration time-window long enough to represent the full duration of the fluctuating stimulus. Consequently, packaging such stimuli may require the aid of the oromotor system, which is uniquely organized to chain-link rapid sequences. If storing speech sounds requires transposing rapidly fluctuating sound waves into more easily encoded oromotor sequences, then the classical speech areas in the caudal-most portion of the superior temporal gyrus (pSTG) and in the inferior frontal gyrus (IFG) may be critical for performing this acoustic-oromotor transposition. We tested this proposal by applying repetitive transcranial magnetic stimulation to each of these left-hemisphere loci while participants listened to pseudowords. Compared with control-site stimulation, pSTG stimulation produced a highly significant increase in recognition error rate. By contrast, IFG stimulation led only to a non-significant trend toward recognition-memory impairment.
Importantly, the impairment after pSTG stimulation was not due to interference with perception, since the same stimulation failed to affect pseudoword discrimination examined with short interstimulus intervals. Our findings suggest that the pSTG is essential for transforming speech sounds into stored motor plans for reproducing those sounds. It is clear that auditory cortex underlies humans' effortless ability to discriminate and remember complex sounds, including speech. In monkeys, however, we found auditory memory to be extremely impoverished, limited to a passive short-term trace and unaffected by lesions of the rhinal cortex (RhC). In our previous study, a mild impairment in auditory memory was obtained following bilateral ablation of the entire medial temporal lobe, including the RhC, and an equally mild effect was observed after bilateral ablation of the auditory cortical areas in the rostral superior temporal gyrus (rSTG). To test the hypothesis that each of these mild impairments was due to partial disconnection of acoustic input to a common target (e.g., prefrontal cortex), we examined the effects of a more complete auditory disconnection by combining the removals of both the rSTG and the RhC. We found that the combined lesion nearly abolished auditory recognition memory, leaving behind only a residual echoic memory. Thus it appears that this more complete disconnection of the temporal-lobe auditory areas from the frontal cortex is responsible for the near-complete loss of auditory short-term memory. Auditory short-term memory (STM) comprises at least two components: an active working memory and a sensory trace that may be passively retained. Working memory relies on representations recalled from long-term memory, and their rehearsal may require phonological mechanisms unique to humans. Monkeys appear to employ passive STM to solve recognition-memory tasks, as evidenced by the impact of interfering stimuli on memory performance.
Neural correlates of delayed match-to-sample (DMS) performance have been observed throughout the auditory and prefrontal cortex, defining a network of areas supporting auditory STM with parallels to that supporting visual STM. Individual primates can be identified by the sound of their voice: macaques have demonstrated an ability to recognize conspecific identity from a harmonically structured 'coo' call. Voice recognition presumably requires the integrated perception of multiple acoustic features. However, it is unclear how this is achieved, given considerable variability across utterances. Specifically, the extent to which information about caller identity is distributed across multiple features remains elusive; reliance on such distributed cues would, moreover, implicate long-term auditory memory. We examined these issues by recording and analyzing a large sample of calls from eight macaques. Single acoustic features, including fundamental frequency, duration, and Wiener entropy, were informative but unreliable for the statistical classification of caller identity. A combination of multiple features, however, allowed for highly accurate caller identification. A regularized classifier that learned to identify callers from the modulation power spectrum of calls found that specific regions of spectrotemporal modulation were informative for caller identification. These regions correspond to acoustic features such as the call's fundamental frequency and FM sweep direction. We further found that the low-frequency spectrotemporal modulation component contained an indexical cue to the caller's body size. Thus, cues for caller identity are distributed across identifiable spectrotemporal components corresponding to laryngeal and supralaryngeal components of vocalizations, and the integration of those cues can enable highly reliable caller identification.
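The contrast between single-feature and combined-feature classification can be sketched in code. The following is an illustrative simulation, not the study's actual analysis: the caller "signatures", utterance variability, and feature values are all synthetic, and an L2-regularized multinomial logistic regression stands in for the regularized classifier described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 8 callers, 3 acoustic features per call
# (stand-ins for fundamental frequency, duration, Wiener entropy).
n_callers, n_feat = 8, 3
caller_means = rng.normal(0.0, 1.0, size=(n_callers, n_feat))  # per-caller signature

def simulate(n_per_caller, noise=0.6):
    """Draw noisy utterances around each caller's mean feature vector."""
    X, y = [], []
    for c in range(n_callers):
        X.append(caller_means[c] + rng.normal(0.0, noise, size=(n_per_caller, n_feat)))
        y.append(np.full(n_per_caller, c))
    return np.vstack(X), np.concatenate(y)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def logreg_accuracy(Xtr, ytr, Xte, yte, lam=1e-2, lr=0.5, steps=500):
    """L2-regularized multinomial logistic regression, trained by gradient descent."""
    n, d = Xtr.shape
    W, b = np.zeros((d, n_callers)), np.zeros(n_callers)
    Y = np.eye(n_callers)[ytr]  # one-hot labels
    for _ in range(steps):
        P = softmax(Xtr @ W + b)
        W -= lr * (Xtr.T @ (P - Y) / n + lam * W)
        b -= lr * (P - Y).mean(axis=0)
    pred = (Xte @ W + b).argmax(axis=1)
    return (pred == yte).mean()

X_train, y_train = simulate(60)
X_test, y_test = simulate(20)

# Classify from one feature alone vs. all features combined.
single = logreg_accuracy(X_train[:, :1], y_train, X_test[:, :1], y_test)
combined = logreg_accuracy(X_train, y_train, X_test, y_test)
print(f"single-feature accuracy:   {single:.2f}")
print(f"combined-feature accuracy: {combined:.2f}")
```

In this toy setting, each feature overlaps heavily across callers, so any one feature classifies poorly, but the joint distribution separates callers well, mirroring the finding that identity cues are distributed across multiple acoustic dimensions.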
Our results demonstrate a clear acoustic basis by which individual macaque vocalizations can be recognized, but we have not yet been successful in demonstrating this recognition behaviorally. In the ventral stream of the primate auditory cortex, corticocortical projections emanate from the primary auditory cortex (AI) along two principal axes: one mediolateral, the other caudorostral. Connections in the mediolateral direction, from core to belt to parabelt, have been well described, but less is known about the flow of information along the caudorostral dimension. Our results from a series of neuroanatomical studies describe a pathway comprising stepwise projections from AI through the rostral and rostrotemporal fields of the core (R and RT), continuing to the recently identified rostrotemporal polar field (RTp) and the dorsal temporal pole. Each area is strongly and reciprocally connected with the areas immediately caudal and rostral to it. The results support a rostrally directed flow of auditory information with complex and recurrent connections, similar to the ventral stream of macaque visual cortex. In addition to this serial cascade of corticocortical connections, every region of auditory cortex receives parallel thalamocortical projections from the medial geniculate nucleus (MGN), with AI and R being the primary recipients of input from its ventral division (MGv). Whereas AI and R each receive nearly 90% of their thalamic inputs from the MGv, RT receives only 45% from the MGv and an equal share from the dorsal subdivision (MGd). Area RTp receives 25% of its thalamic inputs from the MGv, but 30% arise from multisensory areas outside the MGN. In accord with the laminar patterns evident in the corticocortical connections in the same cases, these thalamocortical connections support a model in which AI and R lie at the same hierarchical level, but RT and RTp lie at a higher level, perhaps between that of the core and belt.
These results support an expanded hierarchical model whose complexity may well exceed that of the primate ventral visual stream.