In fluent speech, speakers begin to pronounce the next sound before they have finished pronouncing the previous one. As a result, speech sounds not only occur right next to one another but actually overlap, and pauses occur only between whole phrases, not between individual sounds. This characteristic of fluent speech presents the listener with two formidable problems: separating the overlapping sounds, and then recognizing sounds whose acoustics have been distorted by overlap with their neighbors' pronunciations. This proposal pursues the hypothesis that both separation and recognition are possible because successive intervals in the signal contrast with one another perceptually. For example, after an interval in which most of the sound energy is at high frequencies, a sound whose energy is at mid frequencies will sound relatively low, or after a relatively long interval, an interval of intermediate duration will sound relatively short. The experiments test a version of this hypothesis in which sequential contrast is exaggerated in this way during the initial auditory evaluation of the sounds, before the listener has assigned any linguistic value to them, i.e. before the sounds are recognized as instances of particular categories. If sequential contrast arises before the sounds are recognized, then it will be impervious to any linguistic knowledge the listener may have, e.g. of whether the current sound forms a word with its context, occurs frequently in that context, is phonotactically legal in that context, etc. A separate, prelinguistic, auditory stage of phonetic processing is diagnosed by better discrimination of sound sequences that differ in the direction of their sequential contrast, e.g. high-low vs low-high, than of sequences that do not, i.e. high-high vs low-low. If linguistic knowledge is instead used at all stages of processing, these two pairs of sequences should be equally easy to distinguish, because all the intervals will have been assigned to categories and will therefore differ equally. The results of these experiments therefore permit a choice between interactive models of speech sound recognition, in which listeners use their linguistic knowledge at all stages of processing the speech sounds they hear, and autonomous models, in which they use only the psychoacoustic properties of the signal during the first stage and only later apply what they know linguistically to the output of that stage. If the autonomous model is supported, then the robustness of speech perception under adverse conditions or by impaired listeners can be improved more by enhancing signal quality than by adding redundant linguistic information.
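
To make the discrimination diagnostic concrete, the sketch below illustrates, under purely hypothetical assumptions, how a simple subtractive contrast rule would pull high-low and low-high sequences apart perceptually while pushing high-high and low-low sequences closer together. The constants HIGH, LOW, MID, the contrast strength K, and the subtractive rule itself are illustrative assumptions, not the model or the stimuli of the proposed experiments.

```python
# Illustrative only: a toy subtractive model of sequential contrast in which
# each interval's perceived value is pushed away from its predecessor's value.

HIGH, LOW = 1.0, 0.0        # schematic "high" and "low" spectral values (assumed)
MID = (HIGH + LOW) / 2
K = 0.3                     # strength of sequential contrast (assumed)

def perceive(seq, k=K):
    """First interval is taken at face value; each later interval is shifted
    away from the value of the interval that precedes it."""
    out = [seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        out.append(cur - k * (prev - MID))
    return out

def distance(a, b):
    """Summed absolute perceptual difference between two sequences."""
    return sum(abs(x - y) for x, y in zip(perceive(a), perceive(b)))

# Sequences that differ in the direction of contrast become MORE discriminable...
print(distance([HIGH, LOW], [LOW, HIGH]))   # 2.3 with K = 0.3, vs 2.0 with K = 0
# ...while sequences that do not differ in direction become LESS discriminable.
print(distance([HIGH, HIGH], [LOW, LOW]))   # 1.7 with K = 0.3, vs 2.0 with K = 0
```

Under the interactive alternative, in which the intervals are assigned to categories before they are compared, both pairs differ by the same two category labels, so no such asymmetry between the pairs is predicted.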