Recent technological advances have led to drastically increased amounts of genetic data available to researchers, resulting in an unprecedented escalation in the number of statistical hypotheses routinely tested in a single study. Instead of following a carefully crafted set of scientific hypotheses with statistical analysis, researchers can now test many possible relations and let P-values or other statistical summaries generate hypotheses for them. Driven by these advances, testing a handful of genetic variants in relation to a health outcome has been largely abandoned in favor of agnostic screening of the entire genome followed by selection of the most significant results. The overwhelming majority of statistical testing is done within the traditional framework of significance testing, in which the evidence of every test is summarized via a P-value. The P-value is then compared to a significance threshold, adjusted to accommodate the number of tests in a study.

Partly due to their widespread use, P-values have been at the center of the replicability crisis. The uncertainty inherent in statistical inference imposes limitations on the reliability of conclusions that can be drawn from data, but misuse of statistical methods and summaries is a growing concern. Significance testing, hypothesis testing, and the accompanying P-values have come under scrutiny as being among the most widely applied and abused practices. P-values have been described as inherently unfit to fulfill their ostensible role as measures of the credibility of a scientific hypothesis. Rather than adopting the view that P-values should be abandoned because they are poorly suited for how they are used in practice, we have been developing statistical methods for extracting information from them, so that, when augmented with external (prior) information about the effect size distribution, a P-value can be transformed into a complete posterior distribution for a standardized effect size.
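The transformation just described can be sketched under simple conjugate assumptions (a minimal illustration, not the full method): treat the test statistic behind a two-sided P-value as z | mu ~ N(mu, 1), place a N(0, tau2) prior on the standardized effect mu, and the posterior for mu is available in closed form. The function name and default prior variance below are our own choices for illustration.

```python
from statistics import NormalDist

def posterior_from_pvalue(p, sign=1.0, tau2=1.0):
    """Turn a two-sided P-value into a posterior for the standardized
    effect mu, assuming z | mu ~ N(mu, 1) and a N(0, tau2) prior on mu."""
    nd = NormalDist()
    z = sign * nd.inv_cdf(1 - p / 2)   # recover the z-statistic from the P-value
    post_var = tau2 / (1 + tau2)       # conjugate normal-normal update
    post_mean = post_var * z           # the prior shrinks z toward zero
    # Posterior probability that mu exceeds zero:
    prob_positive = 1 - NormalDist(post_mean, post_var ** 0.5).cdf(0.0)
    return post_mean, post_var, prob_positive
```

For example, p = 0.05 with a standard normal prior (tau2 = 1) yields a posterior mean of about 0.98 with variance 0.5: the z-statistic of roughly 1.96 is shrunk halfway toward zero, and probabilistic statements such as P(mu > 0 | data) follow directly from the posterior rather than from a reject/accept decision.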
Approximate posterior distributions can also be obtained for non-standardized parameters. P-values and null hypothesis significance testing are commonly associated with assuming a point null hypothesis, for example that the difference in susceptibility to disease between carriers of the minor and the major alleles of a genetic locus is exactly zero. The point null is a largely unrealistic assumption, and consequently a common line of criticism of P-values is that no matter how tiny the actual difference might be, the null hypothesis will be rejected with increasingly high probability as the sample size grows. Posterior distributions derived from the information summarized by P-values, on the other hand, allow straightforward testing of any type of hypothesis and lead to interval inference in terms of probabilistic bounds for the effect size.

The main focus of our research this year was on posterior inference using top-ranking association statistics in studies with a very large number of tests, such as genetic association studies. While studying the expected behavior of these statistics, we found that an increase in the number of tested hypotheses increases the proportion of true signals among the epidemiological predictors with the smallest P-values. This happens regardless of whether multiple-testing corrections are applied, and it challenges the common belief that more testing leads to more spurious findings.
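The behavior of top-ranking statistics can be checked with a minimal simulation sketch (our own illustration, not the derivation behind the result): a fixed fraction of tests carry a true effect, test statistics are normal, and we track how often the smallest P-values belong to true signals as the total number of tests grows. All settings (1% true signals, effect size 3, top 10 hits) are illustrative assumptions.

```python
import random
from statistics import NormalDist

def top_hits_true_fraction(m, pi=0.01, effect=3.0, k=10, reps=20, seed=1):
    """Average fraction of true signals among the k smallest P-values
    out of m two-sided z-tests, a fraction pi of which carry a true
    effect (z drawn from N(effect, 1); null z drawn from N(0, 1))."""
    rng = random.Random(seed)
    nd = NormalDist()
    n_true = int(m * pi)
    total = 0.0
    for _ in range(reps):
        tests = []
        for i in range(m):
            z = rng.gauss(effect if i < n_true else 0.0, 1.0)
            p = 2.0 * (1.0 - nd.cdf(abs(z)))   # two-sided P-value
            tests.append((p, i < n_true))
        tests.sort()                           # smallest P-values first
        total += sum(is_true for _, is_true in tests[:k]) / k
    return total / reps
```

With these illustrative settings, the fraction of true signals among the top 10 hits grows as the number of tests rises from 2,000 to 20,000, even though the proportion of true effects stays fixed: the smallest order statistics of the non-null P-values shrink much faster than those of the uniform null P-values, so the top of the list is increasingly dominated by real signals.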