Title: Valid and informative p-values from big data, illustrated in an epigenomic cross-over experiment
Abstract: A common issue in current analyses of epigenomic data is the repeated use of statistical tests. For example, consider a randomized experiment in which 17 people are each exposed to two treatment conditions (e.g., clean air and ozone), with measurements at 484,531 epigenomic locations, where the aim is to find the locations that show an epigenetic effect of ozone relative to clean air. Here, we describe the use of randomization-based tests to obtain a Fisher exact p-value that is valid whatever the correlational structure of the data across the epigenomic locations. The power of the resulting test to detect real differences, however, depends on the careful a priori selection of a single test statistic.
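
The randomization-based test described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions that the abstract does not state: each participant's treatment order is taken to be assigned independently by a fair coin, so that under Fisher's sharp null hypothesis every within-person swap of the two condition labels is equally likely, and the single test statistic is taken to be the largest absolute mean difference across locations. The names `exact_randomization_p_value` and `max_abs_mean` are hypothetical, and the data are a toy example rather than real measurements.

```python
import numpy as np


def max_abs_mean(diffs):
    """One scalar summary over all locations (an assumed choice of statistic):
    the largest absolute mean within-person difference."""
    return float(np.max(np.abs(diffs.mean(axis=0))))


def exact_randomization_p_value(diffs, statistic):
    """Exact randomization p-value for a paired (cross-over) design.

    diffs: array of shape (n_people, n_locations) holding each person's
           difference between the two conditions at each location.
    Assumes every within-person label swap is equally likely under the
    null, so the reference set is all 2**n_people sign patterns.
    """
    n_people = diffs.shape[0]
    observed = statistic(diffs)
    total = 2 ** n_people
    at_least_as_extreme = 0
    for mask in range(total):
        # Bit i of mask decides whether person i's two labels are swapped,
        # which flips the sign of that person's whole row of differences.
        signs = np.array(
            [-1.0 if (mask >> i) & 1 else 1.0 for i in range(n_people)]
        )
        # The same sign is applied to every location of a person, so any
        # correlation between locations is preserved in the reference set.
        if statistic(diffs * signs[:, None]) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Toy data: 6 people, 3 locations, a constant difference at location 0 only.
toy_diffs = np.zeros((6, 3))
toy_diffs[:, 0] = 5.0
p_value = exact_randomization_p_value(toy_diffs, max_abs_mean)
# Only 2 of the 64 sign patterns reach the observed value, so p = 2/64 = 0.03125.
print(p_value)
```

Because a single statistic summarizes all locations at once, the sketch yields one p-value and needs no multiple-testing correction; with 17 people the reference set would have 2^17 = 131,072 sign patterns, which can still be enumerated exactly.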