Multi-feature modeling of the neural representation of emotion- and identity-related information derived from facial and vocal cues.

People signal their internal emotional state through a range of cues, including facial expressions, non-verbal vocalizations, and the tone and content of speech. Much work to date on emotional signaling has focused on a small number of emotions and has relied on stimulus sets with limited numbers of exemplars and poor representation of individuals of different ages and ethnicities. The computer vision literature has long recognized the need to compare models, for example of face recognition, across complex, diverse, and naturalistic stimulus sets. Here, we argue that a parallel approach is needed if we are to achieve a robust and generalizable understanding of how the human brain represents emotion- and identity-related information derived from facial and vocal cues. We propose three multi-session functional magnetic resonance imaging (fMRI) experiments. In the first, participants will view a large corpus (~2000) of faces of individuals varying in age, gender and ethnicity and showing a wide range of emotional expressions. In the second, participants will be presented with a large (~1000) and equally diverse set of emotional vocalizations. The third experiment will use an even more complex and naturalistic stimulus set comprising ~1000 video clips of individuals expressing emotions through facial expression, non-verbal vocalizations and emotion-laden speech. This set will be divided into three parts, with content matched as closely as possible. These parts will be presented to participants in audio-only, visual-only and audio-visual (bimodal) conditions, with the stimuli allocated to each condition balanced across participants.
For each experiment, data collection will comprise separate Model Estimation and Model Validation periods, with the majority of stimuli presented at Validation being distinct from those presented at Estimation. A range of models will be fit to the fMRI data acquired during Estimation runs and then tested and compared using data acquired during Validation runs. These models contain sets of features that describe the emotional content of the stimuli in terms of either dimensional or categorical models of emotion. By comparing the fit of these models, we can determine which capture the most variance in voxel response profiles and examine how this varies across brain regions. Across the three experiments, we also seek to establish whether there are regions where voxels show common coding of emotional state regardless of whether the information is carried by facial cues, vocal cues, or a combination of both. Further, by contrasting models with and without terms for characteristics such as age, gender and ethnicity, we can investigate the (in)dependence of the representation of emotion- and identity-related features. The proposed research has the potential to greatly advance our understanding of how cues to others' emotional state are represented in the human brain and the extent to which this representation is influenced by the characteristics of the person we are interacting with. In the medium term, we plan to extend this work to model listener/viewer characteristics (both demographics and predisposition to anxiety or depression), in the hope of advancing our understanding of how biases in the interpretation of emotional signals can arise.
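
The estimate-then-validate comparison described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration rather than a detail of the proposal: the proposal does not name a regression method, so ridge regression is used here as one common choice; the data are simulated; and the two feature sets (a two-dimensional valence/arousal description and a four-category one-hot description) are hypothetical stand-ins for dimensional and categorical models of emotion.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Closed-form ridge weights, one column of weights per voxel."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def prediction_accuracy(Y_true, Y_pred):
    """Pearson correlation between observed and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

def evaluate_model(X_est, Y_est, X_val, Y_val, alpha=1.0):
    """Fit on Estimation data, score on held-out Validation data."""
    W = fit_ridge(X_est, Y_est, alpha)
    return prediction_accuracy(Y_val, X_val @ W)

# --- Simulated data (all sizes and values are illustrative assumptions) ---
rng = np.random.default_rng(0)
n_est, n_val, n_voxels = 800, 200, 50
n_total = n_est + n_val

# Hypothetical dimensional features: valence and arousal for each stimulus.
dims = rng.standard_normal((n_total, 2))

# Hypothetical categorical features: one-hot code for the valence/arousal quadrant.
quadrant = (dims[:, 0] > 0).astype(int) * 2 + (dims[:, 1] > 0).astype(int)
cats = np.eye(4)[quadrant]

# Simulated voxel responses driven linearly by the dimensional features, plus noise.
true_w = rng.standard_normal((2, n_voxels))
Y = dims @ true_w + rng.standard_normal((n_total, n_voxels))

# Validation stimuli are disjoint from Estimation stimuli.
est, val = slice(0, n_est), slice(n_est, n_total)

acc_dimensional = evaluate_model(dims[est], Y[est], dims[val], Y[val])
acc_categorical = evaluate_model(cats[est], Y[est], cats[val], Y[val])

print("mean validation r, dimensional model:", acc_dimensional.mean())
print("mean validation r, categorical model:", acc_categorical.mean())
```

Because the simulated responses are generated from the dimensional features, the dimensional model is expected to predict held-out responses better here; with real data, the per-voxel accuracies would instead be compared region by region. The same pattern could be extended to contrast feature sets with and without identity-related terms such as age, gender and ethnicity.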