Healthcare information is increasingly distributed across many independent databases and systems, both within and among organizations, as separate islands with different patient identifiers. This is the case for data collected within a single institution, where there may be multiple identifiers, and for data collected about the same patient at different health care institutions, pharmacy systems, payers, public health agencies, and so on. This situation hinders the aggregation of information about individuals across such databases, as needed for clinical decision support, clinical care, public health reporting, clinical research, and outcomes management. Aggregation is important not only for determining a patient's health care status, but also for clinical effectiveness research, drug safety research, and other population-based studies requiring comprehensive data. While health information exchanges (HIEs) are an increasingly common source of comprehensive clinical data, formal recommendations explicitly addressing HIE data aggregation approaches are lacking. Consequently, HIEs currently use a variety of differing data aggregation approaches. Because HIEs represent complex "melting pots" of heterogeneous clinical information sources with varying data quality and characteristics, they present unique data aggregation challenges and opportunities. Therefore, clear documentation and dissemination of concrete, real-world methods for accurate and efficient data aggregation are crucial to developing a robust and reliable National Health Information Network (NHIN). We will formally document and disseminate two distinct, existing classes of linkage methodologies, deterministic and probabilistic, currently used in the context of a long-standing, operational health information exchange. We will implement and evaluate extensions to the probabilistic method that are designed to improve algorithm accuracy.
Extensions will include: stochastic and closed-form solutions for parameter estimation; generalization of the probabilistic method to accommodate statistical dependence between fields; and evaluation of novel nearness comparators, along with continuous and discrete modifications allowing formal inclusion of comparators. We will evaluate and extend methods for creating synthetic linkage data that closely reflect the statistical characteristics of the underlying data. We will evaluate methods that detect the presence or absence of specific data characteristics that inform the selection of extensions to the underlying probabilistic matching model. We will develop and evaluate processes for identifying data element combinations that fail the test for statistical independence. We will evaluate and characterize the technical performance and the clinical and operational value of linking real-world HIE data sources for a variety of scenarios using both deterministic and probabilistic methods.
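As an illustration only, the contrast between the two classes of linkage method can be sketched as follows. The sketch assumes that the probabilistic method is a likelihood-ratio score of the Fellegi-Sunter type, which the text above does not state. The field names, the per-field agreement probabilities `m` and `u`, and the example records are hypothetical values chosen for illustration, not parameters of any operational system.

```python
import math

# Hypothetical per-field parameters (illustrative values only):
#   m = P(field agrees | records belong to the same patient)
#   u = P(field agrees | records belong to different patients)
PARAMS = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.97, 0.005),
    "zip": (0.90, 0.05),
}


def deterministic_match(a, b, fields=("last_name", "birth_date", "zip")):
    """Deterministic rule: link only if every listed field agrees exactly."""
    return all(a[f] == b[f] for f in fields)


def probabilistic_score(a, b, params=PARAMS):
    """Sum of log2 likelihood ratios over fields.

    Assumes that field agreements are statistically independent given
    match status, the assumption that the proposed extensions would relax.
    """
    score = 0.0
    for field, (m, u) in params.items():
        if a[field] == b[field]:
            score += math.log2(m / u)              # agreement weight
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement weight
    return score


# Two hypothetical records that differ only in ZIP code.
rec_a = {"last_name": "smith", "birth_date": "1970-01-02", "zip": "46202"}
rec_b = {"last_name": "smith", "birth_date": "1970-01-02", "zip": "46204"}

exact = deterministic_match(rec_a, rec_b)
score = probabilistic_score(rec_a, rec_b)
```

In this sketch the deterministic rule rejects the pair because one field disagrees, while the probabilistic score remains positive because agreement on name and birth date outweighs the ZIP disagreement. Whether such a pair would be linked depends on a score threshold, which is not specified here.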