Rare epilepsies are a devastating group of diseases that begin in childhood and often cause profound neurologic, medical, and psychiatric disabilities. There are reliable epidemiologic estimates for only a few of these diseases. Tuberous sclerosis (TS), for example, affects 1 per 5700 live births. However, for some diseases like Aicardi syndrome, estimates are limited to counts of known cases (900 in the US). And for others, such as MERRF (Myoclonic Epilepsy with Ragged Red Fibers), there are no estimates at all. Several obstacles have impeded surveillance and epidemiology of this vulnerable and medically complex population. First, identifying affected individuals in large datasets is difficult. Although some diseases can be identified from specific billing codes (e.g., ICD-9 759.5 for TS together with 345.x for epilepsy indicates TS with epilepsy), most are coded with nonspecific diagnoses like 345.9 (epilepsy unspecified) or 780.39 (other convulsions). Second, although caregivers have formed advocacy groups for individual rare epilepsies, these groups only recently united to support research. Third, many individuals seek care at multiple centers, so clinical data from a single center cannot provide a full assessment of their history. There are new opportunities to study these diseases. First, broad use of electronic health records (EHRs) now allows researchers to analyze large volumes of clinical notes with text processing tools. A regular expression, for example, is a robust, easy-to-share way to specify a text search. Second, the Rare Epilepsy Network (REN) has unified advocacy groups for rare epilepsies into a federally funded research consortium. Third, multi-institutional clinical data research networks such as the New York City Clinical Data Research Network (NYC-CDRN) are gathering medical records from multiple institutions. The central idea of our proposal is that text processing of clinical notes will improve surveillance and epidemiology of the rare epilepsies.
We will use the NYC-CDRN to identify affected individuals using EHRs from multiple academic medical centers. We will describe the incidence, prevalence, comorbidities, mortality, and quality of ambulatory care for these individuals. Finally, we will develop, characterize, and disseminate specifications for searching text (a set of regular expressions) to find affected individuals in clinical notes. Improved epidemiological estimates will guide clinical care, prioritize research initiatives, spur development of therapies by industry, and help caregivers understand these devastating diseases. The text searching specifications (regular expressions) will help centers identify rare epilepsies to support surveillance, research, quality improvement, care management, and referral to advocacy organizations. This work aligns with recent Institute of Medicine (IOM) recommendations (1, 2, 4, 8, 9, 13) and with the 2014 National Institute of Neurological Disorders and Stroke (NINDS) Epilepsy Benchmarks (IC, IIC, IIF, and IVD).
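
As a minimal illustration of what such text-searching specifications might look like, the Python sketch below pairs a few regular expressions with the billing-code rule described above (ICD-9 759.5 together with 345.x). The patterns, the names RARE_EPILEPSY_PATTERNS, find_mentions, and has_ts_with_epilepsy, and the sample note are hypothetical and chosen only for illustration; they are not the validated specifications this project would develop, and they do not handle negation ("no evidence of ...") or family history.

```python
import re
from typing import Iterable, List

# Hypothetical, unvalidated patterns for illustration only.
# Each pattern tolerates variable whitespace and letter case.
RARE_EPILEPSY_PATTERNS = {
    "tuberous sclerosis": re.compile(
        r"\btuberous\s+sclerosis(?:\s+complex)?\b", re.IGNORECASE
    ),
    "Aicardi syndrome": re.compile(
        r"\baicardi(?:'s)?\s+syndrome\b", re.IGNORECASE
    ),
    "MERRF": re.compile(
        r"\bMERRF\b|\bmyoclonic\s+epilepsy\s+with\s+ragged[\s-]+red\s+fib(?:er|re)s\b",
        re.IGNORECASE,
    ),
}


def find_mentions(note: str) -> List[str]:
    """Return the names of the diseases whose pattern matches the note text."""
    return [
        name
        for name, pattern in RARE_EPILEPSY_PATTERNS.items()
        if pattern.search(note)
    ]


def has_ts_with_epilepsy(icd9_codes: Iterable[str]) -> bool:
    """Apply the billing-code rule: 759.5 (TS) plus any 345.x (epilepsy) code."""
    codes = {code.strip() for code in icd9_codes}
    has_ts = "759.5" in codes
    has_epilepsy = any(c == "345" or c.startswith("345.") for c in codes)
    return has_ts and has_epilepsy


if __name__ == "__main__":
    sample_note = "Pt with Tuberous  Sclerosis complex and refractory seizures."
    print(find_mentions(sample_note))
    print(has_ts_with_epilepsy(["759.5", "345.9"]))
```

Because each specification is a plain pattern string, it could in principle be shared across centers and applied with any regular expression engine that supports the same syntax; characterizing such patterns against chart review would still be needed before relying on them.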