Privacy is a fundamental right that needs to be protected. For health-care-related information, there are regulations governing disclosure. These regulations were motivated by public concern about breaches of confidentiality that might result in discrimination. Recent progress in electronic medical record technology, the Internet, and the genetic revolution, together with media reports of privacy violations, has generated increasing interest in this topic. A common belief is that sensitive information is more easily available when networked computers are used. Since total nondisclosure is not realistic, current regulations require that only the "minimal amount" of information be given to a particular party. However, a thorough study of what constitutes "minimal" for particular types of applications is lacking, as is a "usefulness index" for disclosed data. An exact quantification of the potential for privacy breach in de-identified or anonymized databases is also lacking. Defining and quantifying these indices is important for decision-making. As we demonstrate, de-identified data sets can still be used for inference and may therefore disclose sensitive information. Using machine learning methods to verify the functional dependencies that remain in a de-identified data set leads to a better understanding of the possible inferences. Anonymization techniques based on logic, statistics, database theory, and machine learning can help protect privacy. We will formally define and study anonymity in databases from both a theoretical and a practical standpoint. We will develop and implement algorithms that anonymize data sets while balancing anonymity against the "usefulness" of the disclosed data. We will also develop and implement algorithms that verify the anonymity of a given data set and identify the types of records at highest risk of a privacy attack.
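The abstract does not commit to a formal definition of anonymity. As an illustration only, one widely used notion is k-anonymity, which is not named in the text and is assumed here. A data set is k-anonymous with respect to a chosen set of quasi-identifiers (attributes such as ZIP code, age band, and sex that could be linked to outside data) if every combination of their values is shared by at least k records. The minimal sketch below computes k for a toy table and flags the records in equivalence classes smaller than a target k, which are the ones most exposed to re-identification. The function names, field names, and sample rows are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical record type: attribute name -> (generalized) value.
Record = Dict[str, str]


def _key(record: Record, quasi_identifiers: Sequence[str]) -> Tuple[str, ...]:
    """Project a record onto its quasi-identifier values."""
    return tuple(record[q] for q in quasi_identifiers)


def equivalence_classes(records: List[Record],
                        quasi_identifiers: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(_key(r, quasi_identifiers) for r in records)


def k_anonymity(records: List[Record], quasi_identifiers: Sequence[str]) -> int:
    """Return the size of the smallest equivalence class (the data set's k)."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min(classes.values()) if classes else 0


def highest_risk_records(records: List[Record],
                         quasi_identifiers: Sequence[str],
                         k: int) -> List[int]:
    """Return indices of records whose equivalence class has fewer than k members."""
    classes = equivalence_classes(records, quasi_identifiers)
    return [i for i, r in enumerate(records)
            if classes[_key(r, quasi_identifiers)] < k]


# Hypothetical de-identified table: names removed, ZIP and age generalized.
QUASI_IDENTIFIERS = ["zip", "age", "sex"]
RECORDS: List[Record] = [
    {"zip": "021**", "age": "30-39", "sex": "F", "diagnosis": "asthma"},
    {"zip": "021**", "age": "30-39", "sex": "F", "diagnosis": "diabetes"},
    {"zip": "021**", "age": "40-49", "sex": "M", "diagnosis": "hypertension"},
    {"zip": "021**", "age": "40-49", "sex": "M", "diagnosis": "asthma"},
    {"zip": "021**", "age": "50-59", "sex": "F", "diagnosis": "cancer"},
]

if __name__ == "__main__":
    print(k_anonymity(RECORDS, QUASI_IDENTIFIERS))              # 1
    print(highest_risk_records(RECORDS, QUASI_IDENTIFIERS, 2))  # [4]
```

Here the last record is the only woman aged 50-59 in that ZIP prefix, so anyone who knows those three facts about a person in the table learns her diagnosis even though her name was removed. Further generalizing or suppressing such records raises k but lowers the usefulness of the released data, which is the trade-off the abstract describes.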
We will make our methods and documented tools freely available to researchers via the WWW.