Several diverse pieces of evidence suggest that auditory long-term memory may require the assistance of the oromotor system. The first set of findings comes from a series of neurobehavioral studies carried out on the three-generation KE family, half of whose members suffer from an inherited speech and language disorder. The core deficit associated with their widespread speech and language impairments lies in executing orofacial, especially articulatory, movements. Identification of the neurobehavioral phenotype in the affected family members led to the work of a genetics team and to a second line of evidence: identification of the genotype, a mutation in FOXP2, thereby implicating FOXP2 in the development of speech and language. The third line of evidence is less momentous than the first two but just as unexpected: monkeys are easily able to store visual and other sensory stimuli in long-term memory, yet they appear to be unable to do so with auditory stimuli. Although seemingly unrelated, these pieces of evidence from the human and animal studies, taken in combination, suggest the following proposal. Because natural acoustic stimuli such as speech sounds fluctuate rapidly in time, their neural representations, unlike those of stationary sensory stimuli, may not be packaged for long-term storage by the sensory system alone, because the sensory system may lack an integration time window long enough to represent the full duration of the fluctuating stimulus. Consequently, packaging such stimuli may require the aid of the oromotor system, which is uniquely organized to chain-link rapid sequences. An alternative mechanism, particularly for a nonverbal stimulus that cannot be easily mimicked, would be to associate it with a previously stored verbal label or nonauditory stimulus, as in voice-name or voice-face association, and then to recognize the stimulus later on the basis of its learned associate.
The corollary of this hypothesis is that a sound that can be neither mimicked nor associated with an already stored stimulus cannot be stored in long-term memory. While the core problem underlying this neurodevelopmental speech impairment has been identified as an orofacial and verbal dyspraxia, we also investigated whether an additional core deficit, namely an auditory processing impairment, is associated with the KE family's speech difficulties. To compare auditory processing in affected KE family members, unaffected KE family members, and controls, we tested subjects on a series of complex listening tasks that measured the ability to process the temporal and spectral aspects of auditory information. A trend toward a significant group effect was observed for frequency discrimination, driven by a significantly lower, and thus better, frequency discrimination threshold in affected KE family members than in controls. None of the other psychoacoustic tests yielded a significant group effect. These results suggest that auditory processing difficulties do not contribute to the orofacial dyspraxia of the affected KE family members. However, the orofacial motor control deficits do not sufficiently explain the wide-ranging language impairments. In humans, the FOXP2 gene is expressed in brainstem, thalamic, and neostriatal structures. We aimed to determine whether there are functional deficits in the subcortical auditory pathway by assessing lower brainstem, neural, and cochlear auditory function in affected individuals. Affected members of the KE family were tested with pure-tone audiometry, acoustic impedance tests, transient evoked otoacoustic emissions, and auditory brainstem responses (ABRs). Affected members demonstrated normal audiometric thresholds for their age range. Otoacoustic emissions were present and normal in all cases, as were acoustic reflexes and ABRs.
These results suggest that subcortical auditory pathways are not involved at a clinically significant level in KE family members. The auditory cortex underlies our effortless ability to discriminate and remember complex sounds, including speech. However, in monkeys we found auditory memory to be extremely impoverished, limited to a passive short-term trace and unaffected by lesions of the rhinal cortex; this is in sharp contrast to their memory performance in vision, which extends to long-term memory and is severely disrupted by a rhinal lesion. We tested monkeys on a serial delayed match-to-sample (DMS) task. Performance dropped steeply when even a single stimulus intervened between the sample and the match. This drop in accuracy was due not to passive decay of the sample's trace but to retroactive interference from the intervening non-match stimulus. The neural underpinnings of this putative trace are unknown, but they are likely to engage non-primary auditory cortex, e.g., the rostral superior temporal plane (STP) and gyrus. We recorded from the rostral STP as monkeys performed auditory DMS. A subset of neurons exhibited modulations of their firing rate during the delay between sounds, during the sensory response, or during both. This distributed subpopulation carried a predominantly sensory signal modulated by the mnemonic context of the stimulus. Excitatory and suppressive effects on match responses were dissociable in their timing and in their resistance to sounds intervening between the sample and the match. Like the monkeys' behavioral performance, these neuronal effects differ from those reported in the same species during visual DMS, suggesting different neural mechanisms for retaining dynamic sounds and static images in short-term memory (STM). The auditory cortex integrates spectral and temporal acoustic features to support the perception of complex sounds, including conspecific vocalizations.
We investigated the coding of vocal stimuli by simultaneously and chronically measuring auditory evoked potentials over a large region of primary and higher-order auditory cortex along the STP, using high-density microelectrocorticographic (ECoG) arrays. In the caudal areas, the neural information about vocalizations was similar to the information about synthetic stimuli that contained only the spectral or only the temporal features of the original vocalizations. In the rostral sectors, however, classification of vocalizations was significantly better than that of the synthetic stimuli, suggesting that conjoined spectral and temporal features were necessary to explain the differential coding of vocalizations in the rostral areas. We also found a robust increase of gamma-band power in primary motor cortex. This increase generally started 500-1000 ms before the onset of the call, and thus this activity could encode motor commands associated with vocal production. Vocal production is an example of controlled motor behavior with high temporal precision. Previous studies have decoded auditory evoked cortical activity while monkeys listened to vocalization sounds, but there have been few attempts at decoding motor cortical activity during vocal production. We recorded cortical activity in the monkey and detected robust activity during vocal production. Using a nonlinear dynamical model of the vocal organ to reduce the dimensionality of coo calls produced by the monkey, we could account for approximately 65% of the variance in the reduced sound representations, supporting the feasibility of using the dynamical model for decoding motor cortical activity during vocal production. Thus, while the auditory ventral stream may be important for complex processing of auditory stimuli, such as vocal processing, it does not appear to be sufficient to support short-term auditory recognition memory in non-human primates.