Understanding speech is one of the most important functions of the human brain. We use information from both the auditory modality (the voice of the person we are talking to) and the visual modality (the facial movements of the person we are talking to) to understand speech. This is advantageous because combining the independent information available from the auditory and visual modalities is more accurate than relying on either modality in isolation. An evolving tenet of multisensory integration (of which audiovisual speech perception is one of the most important examples) is that multisensory integration is Bayes-optimal: that is, the reliability of the different sensory modalities is taken into account when integrating them. A key obstacle to progress is our lack of knowledge about whether audiovisual speech perception is also Bayes-optimal. For instance, if we hear a talker say ba (auditory modality) but see them say ga (visual modality), we often perceive a completely different syllable, da. This illusion, known as the McGurk effect, provides a useful demonstration of how incompletely we understand multisensory speech perception. We will construct a computational model of how the brain should combine auditory and visual speech information, and fit it to behavioral data collected as subjects listen to audiovisual speech. Then, we will test the model's predictions using the two most powerful methods for examining human brain function: blood oxygen-level dependent functional magnetic resonance imaging (BOLD fMRI) and direct neural recording using implanted electrodes (electrocorticography or ECoG). Some subjects perceive the McGurk effect and others do not. Our model accounts for this behavioral variability by positing that subjects' encoding of audiovisual speech is corrupted by sensory noise; across-subject differences in variability are modeled as different levels of sensory noise.
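The reliability weighting described above can be illustrated with the standard textbook form of Bayes-optimal cue combination for two independent Gaussian cues. This is a minimal sketch and not the proposed model itself: the one-dimensional "syllable axis", the function name combine_cues, and the numeric values are all assumptions made for illustration.

```python
import math


def combine_cues(mu_a, sigma_a, mu_v, sigma_v):
    """Combine an auditory and a visual estimate, weighting each by reliability.

    Assumes each cue is an independent Gaussian estimate of the same quantity.
    Reliability is defined as inverse variance (1 / sigma**2).
    Returns (combined mean, combined sd, auditory weight, visual weight).
    """
    rel_a = 1.0 / sigma_a ** 2
    rel_v = 1.0 / sigma_v ** 2
    w_a = rel_a / (rel_a + rel_v)   # weight on the auditory cue
    w_v = 1.0 - w_a                 # weight on the visual cue
    mu = w_a * mu_a + w_v * mu_v
    sigma = math.sqrt(1.0 / (rel_a + rel_v))
    return mu, sigma, w_a, w_v


# Hypothetical example: the auditory cue (sd 1.0) is more reliable than the
# visual cue (sd 2.0), so the combined estimate lies closer to the auditory one.
mu, sigma, w_a, w_v = combine_cues(mu_a=0.0, sigma_a=1.0, mu_v=2.0, sigma_v=2.0)
print(mu, sigma, w_a, w_v)
```

Under these assumptions the combined estimate is never less reliable than the better single cue, which is the sense in which using both modalities is more accurate than using either alone. Raising the noise on one cue lowers its weight, which is one simple way that differing noise levels could produce differing percepts across subjects.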
We will measure neuronal response variability to audiovisual syllables using BOLD fMRI, a method well-suited to testing large numbers of subjects. We expect that subjects with greater neuronal response variability (more sensory noise) will show greater behavioral variability in speech perception. In order to predict how subjects will perceive speech, our model estimates subjects' internal representations of different audiovisual syllables. Our hypothesis is that this model estimate will correspond to the neural representation of audiovisual speech. We will test this hypothesis by constructing a neural dissimilarity index (using the high gamma band response as an index of neuronal firing) and comparing it with the model's estimate of behavioral dissimilarity. Our preliminary results suggest that there are both early (stimulus-driven) and late (cognitive-related) neural responses to audiovisual speech with different representational properties; only ECoG has the necessary temporal resolution to distinguish these components. Our model estimates the weights applied to individual sensory modalities during multisensory integration but does not specify the source of these weights. Eye movements are a possible moderator of individual differences in these weights. Our hypothesis, supported by preliminary data, is that subjects who weight the visual modality strongly (and do perceive the McGurk effect) preferentially fixate the mouth of the talker in an audiovisual speech stimulus, while subjects who weight the visual modality weakly (and do not perceive the McGurk effect) fixate the eyes of the talker.
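The comparison between a neural dissimilarity index and the model's estimate of dissimilarity can be sketched as follows. This is only one plausible way to set up such a comparison, not the analysis the project specifies: the choice of 1 minus Pearson correlation as the dissimilarity measure, the function names, and the randomly generated "response patterns" are all assumptions for illustration.

```python
import numpy as np


def dissimilarity_matrix(patterns):
    """Pairwise dissimilarity between conditions as 1 - Pearson correlation.

    patterns: array of shape (n_conditions, n_features), e.g. one row of
    high-gamma response amplitudes per audiovisual syllable (hypothetical).
    """
    return 1.0 - np.corrcoef(patterns)


def compare_dissimilarity(neural_dm, model_dm):
    """Correlate the off-diagonal upper triangles of two dissimilarity matrices."""
    upper = np.triu_indices(neural_dm.shape[0], k=1)
    return np.corrcoef(neural_dm[upper], model_dm[upper])[0, 1]


# Hypothetical data: 4 syllables x 20 response features for the neural side,
# and a noisy copy of those patterns standing in for the model's estimate.
rng = np.random.default_rng(0)
neural_patterns = rng.normal(size=(4, 20))
model_patterns = neural_patterns + 0.5 * rng.normal(size=(4, 20))

neural_dm = dissimilarity_matrix(neural_patterns)
model_dm = dissimilarity_matrix(model_patterns)
print(compare_dissimilarity(neural_dm, model_dm))
```

A high correlation between the two matrices would indicate that syllable pairs the model treats as dissimilar also evoke dissimilar neural responses. Computing the neural matrix separately in early and late time windows would be one way to examine components with different representational properties.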