A Direct Approach to False Discovery Rates

Summary

Multiple-hypothesis testing involves guarding against much more complicated errors than single-hypothesis testing. Whereas we typically control the type I error rate for a single-hypothesis test, a compound error rate is controlled for multiple-hypothesis tests. For example, controlling the false discovery rate FDR traditionally involves intricate sequential _p_-value rejection methods based on the observed data. Whereas a sequential _p_-value method fixes the error rate and estimates its corresponding rejection region, we propose the opposite approach—we fix the rejection region and then estimate its corresponding error rate. This new approach offers increased applicability, accuracy and power. We apply the methodology to both the positive false discovery rate pFDR and FDR, and provide evidence for its benefits. It is shown that pFDR is probably the quantity of interest over FDR. Also discussed is the calculation of the _q_-value, the pFDR analogue of the _p_-value, which eliminates the need to set the error rate beforehand as is traditionally done. Some simple numerical examples are presented that show that this new approach can yield an increase of over eight times in power compared with the Benjamini–Hochberg FDR method.

1. Introduction

The basic paradigm for single-hypothesis testing works as follows. We wish to test a null hypothesis _H_0 versus an alternative _H_1 based on a statistic X. For a given rejection region Γ, we reject _H_0 when _X_∈Γ and we accept _H_0 when _X_∉Γ. A type I error occurs when _X_∈Γ but _H_0 is really true; a type II error occurs when _X_∉Γ but _H_1 is really true. To choose Γ, the acceptable type I error is set at some level α; then all rejection regions are considered that have a type I error that is less than or equal to α. The one that has the lowest type II error is chosen. Therefore, the rejection region is sought with respect to controlling the type I error. This approach has been fairly successful, and often we can find a rejection region with nearly optimal power (power = 1 − type II error) while maintaining the desired _α_-level type I error.

When testing multiple hypotheses, the situation becomes much more complicated. Now each test has type I and type II errors, and it becomes unclear how we should measure the overall error rate. The first measure to be suggested was the familywise error rate FWER, which is the probability of making one or more type I errors among all the hypotheses. Instead of controlling the probability of a type I error at level α for each test, the overall FWER is controlled at level α. None-the-less, α is chosen so that FWER ≤α, and then a rejection region Γ is found that maintains level α FWER but also yields good power. We assume for simplicity that each test has the same rejection region, such as would be the case when using the _p_-values as the statistic.

In pioneering work, Benjamini and Hochberg (1995) introduced a multiple-hypothesis testing error measure called the false discovery rate FDR. This quantity is the expected proportion of false positive findings among all the rejected hypotheses. In many situations, FWER is much too strict, especially when the number of tests is large. Therefore, FDR is a more liberal, yet more powerful, quantity to control. In Storey (2001), we introduced the positive false discovery rate pFDR. This is a modified, but arguably more appropriate, error measure to use.

Benjamini and Hochberg (1995) provided a sequential _p_-value method to control FDR. This is really what an FDR controlling _p_-value method accomplishes: using the observed data, it estimates the rejection region so that on average FDR ≤ α for some prechosen α. The product of a sequential _p_-value method is an estimate k^ that tells us to reject p(1), p(2), …, p(k^), where p(1) ≤ p(2) ≤ … ≤ p(m) are the ordered observed _p_-values.

What can we say about k^? Is there any natural way to provide an error measure on this random variable? It is a false sense of security in multiple-hypothesis testing to think that we have a 100% guaranteed upper bound on the error. The reality is that this process involves estimation. The more variable the estimate k^ is, the worse the procedure will work in practice. Therefore, the expected value may be such that FDR ≤ α, but we do not know how reliable the methods are case by case. If point estimation only involved finding unbiased estimators, then the field would not be so successful; the variability of an estimator matters as well. Therefore, the reliability of k^ case by case does matter, even though it has not been explored.

Another weakness of the current approach to false discovery rates is that the error rate is controlled for all values of _m_0 (the number of true null hypotheses) simultaneously without using any information in the data about _m_0. Surely there is information about _m_0 in the observed _p_-values. In our proposed method, we use this information, which yields a less stringent procedure and more power, while maintaining strong control. Often, the power of the multiple-hypothesis testing method decreases with increasing m. This should not be so, especially when the tests are independent. The larger m, the more information we have about _m_0, and this should be used.

In this paper, we propose a new approach to false discovery rates. We attempt to use more traditional and straightforward statistical ideas to control pFDR and FDR. Instead of fixing α and then estimating k^ (i.e. estimating the rejection region), we fix the rejection region and then estimate α. Fixing the rejection region may seem counter-intuitive in the context of traditional multiple-hypothesis testing. We argue in the next section that it can make sense in the context of false discovery rates.

A natural objection to our proposed approach is that it does not offer ‘control' of FDR. Actually, control is offered in the same sense as the former approach—our methodology provides a conservative bias in expectation. Moreover, since in taking this new approach we are in the more familiar point estimation situation, we can use the data to estimate _m_0, obtain confidence intervals on pFDR and FDR, and gain flexibility in the definition of the error measure.

We show that our proposed approach is more effective, flexible and powerful. The multiple-hypothesis testing methods that we shall describe take advantage of more information in the data, and they are conceptually simpler. In Section 2, we discuss pFDR and its relationship to FDR, as well as using fixed rejection regions in multiple-hypothesis testing. In Section 3 we formulate our approach, and in Section 4 we make a heuristic comparison between the method proposed and that of Benjamini and Hochberg (1995). Section 5 provides numerical results, comparing our approach with the current one. Section 6 describes several theoretical results pertaining to the proposed approach, including a maximum likelihood estimate interpretation. Section 7 describes a quantity called the _q_-value, which is the pFDR analogue of the _p_-value, and Section 8 argues that the pFDR and the _q_-value are the most appropriate false discovery rate quantities to use. Section 9 shows how to pick a tuning parameter in the estimates automatically. Section 10 is the discussion, and Appendix A provides technical comments and proofs of the theorems.

2. The positive false discovery rate and fixed rejection regions

As mentioned in Section 1, two error measures are commonly used in multiple-hypothesis testing: FWER and FDR. FWER is the traditional measure used; Benjamini and Hochberg (1995) recently introduced FDR. Table 1 summarizes the various outcomes that occur when testing m hypotheses.

Table 1. Outcomes when testing m hypotheses

| Hypothesis | Accept | Reject | Total |
| --- | --- | --- | --- |
| Null true | U | V | _m_0 |
| Alternative true | T | S | _m_1 |
| Total | W | R | m |

V is the number of type I errors (or false positive results). Therefore, FWER is defined to be Pr(_V_≥1). Controlling this quantity offers a very strict error measure. In general, as the number of tests increases, the power decreases when controlling FWER. FDR is defined to be

$$\mathrm{FDR} = E\left(\frac{V}{R} \,\middle|\, R > 0\right) \Pr(R > 0),$$

(1)

i.e. the expected proportion of false positive findings among all rejected hypotheses times the probability of making at least one rejection. Benjamini and Hochberg (1995) and Benjamini and Liu (1999) provided sequential _p_-value methods to control this quantity. FDR offers a much less strict multiple-testing criterion than FWER and therefore leads to an increase in power.

In Storey (2001), we define a new false discovery rate, pFDR.

Definition 1.

$$\mathrm{pFDR} = E\left(\frac{V}{R} \,\middle|\, R > 0\right).$$

(2)

The term ‘positive' has been added to reflect the fact that we are conditioning on the event that positive findings have occurred. When controlling FDR at level α, and positive findings have occurred, then FDR has really only been controlled at level α/Pr(_R_>0). This can be quite dangerous, and it is not the case for pFDR. See Storey (2001) for a thorough motivation of pFDR over FDR.

Benjamini and Hochberg (1995) precisely define FDR to be expression (1) because this quantity can be controlled by a sequential _p_-value method. (Note, however, that weak control of FWER is implicitly embedded in this definition, i.e. FWER is controlled when all null hypotheses are true.) pFDR is identically 1 when all null hypotheses are true (_m_=_m_0), so the usual sequential _p_-value approach cannot be applied to this quantity. Therefore, to control pFDR, it must be estimated for a particular rejection region.

A sequential _p_-value method allows us to fix the error rate beforehand and to estimate the rejection region, which is what has traditionally been done in multiple-hypothesis testing. In the context of FWER this makes sense. Because FWER measures the probability of making one or more type I errors, which is essentially a ‘0–1' event, we can set a priori the rate at which this should occur. False discovery rates, in contrast, are more of an exploratory tool. For example, suppose that we are testing 1000 hypotheses and decide beforehand to control FDR at level 5%. Whether this was an appropriate choice largely depends on the number of hypotheses that are rejected. If 100 hypotheses are rejected, then clearly this was a good choice. If only two hypotheses are rejected, then clearly this was a less useful choice.

Therefore fixing the rejection region beforehand can be more appropriate when using pFDR or FDR. For example, when performing many hypothesis tests, it can make sense to reject all _p_-values that are less than 0.05 or 0.01. Also, expert knowledge in a particular field may allow us to decide which rejection regions should be used.

It will also be seen that this approach allows us to control pFDR, which we find to be a more appropriate error measure. Probably the most important reason for fixing the rejection region is that it allows us to take a conceptually simpler approach to complicated compound error measures such as pFDR and FDR.

The _q_-value (Section 7) is the pFDR analogue of the _p_-value and allows the rejection regions to be determined by the observed _p_-values. This is more appropriate than either fixing the rejection region or fixing the error rate beforehand. But, by first fixing the rejection region in our approach, we can formulate the _q_-values quite easily.

3. Estimation and inference for the positive false discovery rate and false discovery rate

In this section, we apply the proposed approach to both pFDR and FDR. We first present a few simple facts about pFDR under independence to motivate our estimates. Suppose that we are testing m identical hypothesis tests H_1, H_2, …, H_m with independent statistics T_1, T_2, …, T_m. We let H_i = 0 when null hypothesis i is true, and H_i = 1 otherwise. We assume that the null statistics T_i | H_i = 0 are identically distributed, and likewise for the alternative statistics T_i | H_i = 1. We assume that the same rejection region is used for each test, which makes the tests ‘identical'. Finally, we assume that the H_i are independent Bernoulli random variables with Pr(H_i = 0) = _π_0 and Pr(H_i = 1) = _π_1. Let Γ be the common rejection region for each hypothesis test.

The following is theorem 1 from Storey (2001). It allows us to write pFDR in a very simple form that does not depend on m. For this theorem to hold we must assume that the H_i are random; however, for large m this assumption makes little difference.

Theorem 1.

Suppose that m identical hypothesis tests are performed with the independent statistics T_1, …, T_m and rejection region Γ. Also suppose that a null hypothesis is true with a priori probability _π_0. Then

$$\mathrm{pFDR}(\Gamma) = \frac{\pi_0 \Pr(T \in \Gamma \mid H = 0)}{\Pr(T \in \Gamma)},$$

(3)

where Pr(_T_∈Γ)=_π_0 Pr(_T_∈Γ|_H_=0)+_π_1 Pr(_T_∈Γ|_H_=1).

This paper will be limited to the case where we reject on the basis of independent _p_-values. See Storey and Tibshirani (2001) for a treatment of more general situations. It follows that for rejections based on _p_-values all rejection regions are of the form [0,_γ_] for some _γ_≥0. (See remark 1 in Appendix A for a justification of this.) For the remainder of the paper, instead of denoting rejection regions by the more abstract Γ, we denote them by γ, which refers to the interval [0,_γ_].

In terms of _p_-values we can write the result of theorem 1 as

$$\mathrm{pFDR}(\gamma) = \frac{\pi_0 \Pr(P \le \gamma \mid H = 0)}{\Pr(P \le \gamma)} = \frac{\pi_0\,\gamma}{\Pr(P \le \gamma)},$$

(5)

where P is the random _p_-value resulting from any test. Under independence the _p_-values are exchangeable in that each comes from the null distribution (i.e. uniform[0,1]) with probability _π_0 and from the alternative distribution with probability _π_1. It is easiest to think about this in terms of simple versus simple hypothesis tests, but the theory also works for composite alternative hypotheses with a random effect (Storey, 2001).
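As a concrete check of equation (5), consider one-sided tests of N(0, 1) versus N(2, 1), the set-up that is used in the numerical study of Section 5. The following short Python sketch (the calculation is ours, but the resulting values can be compared with Table 2) evaluates pFDR(γ) directly from the mixture form:

```python
from scipy.stats import norm

pi0, gamma = 0.8, 0.01            # prior null probability; rejection region [0, gamma]
z_gamma = norm.isf(gamma)         # statistic cut-off giving a p-value of exactly gamma
g = norm.sf(z_gamma - 2)          # power g(gamma) = Pr(P <= gamma | H = 1) under a N(2,1) alternative
pr_reject = pi0 * gamma + (1 - pi0) * g   # Pr(P <= gamma), the null/alternative mixture
pfdr = pi0 * gamma / pr_reject            # equation (5)
print(round(g, 3), round(pfdr, 3))        # 0.372 and 0.097, matching the pi0 = 0.8 rows of Table 2
```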

Since _π_0 m of the _p_-values are expected to be null, the largest _p_-values are most likely to come from the null, uniformly distributed _p_-values. Hence, a conservative estimate of _π_0 is

$$\hat{\pi}_0(\lambda) = \frac{\#\{p_i > \lambda\}}{(1-\lambda)\,m} = \frac{W(\lambda)}{(1-\lambda)\,m}$$

(6)

for some well-chosen λ, where p_1, …, p_m are the observed _p_-values and W(λ) = #{p_i > λ}. (Recall the definitions of W and R from Table 1.) For now we assume that λ is fixed; however, we show how to pick the optimal λ in Section 9. (Efron et al. (2001) used a different estimate of _π_0 in an empirical Bayes method that is related to pFDR.) A natural estimate of Pr(P ≤ γ) is

$$\hat{\Pr}(P \le \gamma) = \frac{R(\gamma)}{m},$$

(7)

where R(γ) = #{p_i ≤ γ}. Therefore, a good estimate of pFDR(γ) for fixed λ is

$$\hat{Q}_\lambda(\gamma) = \frac{\hat{\pi}_0(\lambda)\,\gamma}{\hat{\Pr}(P \le \gamma)} = \frac{W(\lambda)\,\gamma}{(1-\lambda)\,R(\gamma)}.$$

(8)

pFDR and FDR are asymptotically equivalent for a fixed rejection region. We see in Section 6 that Q^λ(γ) shows good asymptotic properties for pFDR. In fact, we show that it is a maximum likelihood estimate. However, because of finite sample considerations, we must make two slight adjustments to estimate pFDR. When R(γ) = 0, the estimate would be undefined, which is undesirable for finite samples. Therefore, we replace R(γ) with R(γ)∨1. This is equivalent to making a linear interpolation between the estimate at [0, p(1)] and the origin. Also, 1−(1−γ)^m is clearly a lower bound for Pr{R(γ)>0}. Since pFDR is conditioned on R(γ)>0, we divide by 1−(1−γ)^m. In other words, γ/{1−(1−γ)^m} is a conservative estimate of the type I error, conditional on R(γ)>0. (See Section 8 for more on why we do this.) Therefore, we estimate pFDR as

$$\widehat{\mathrm{pFDR}}_\lambda(\gamma) = \frac{\hat{\pi}_0(\lambda)\,\gamma}{\hat{\Pr}(P \le \gamma)\,\{1-(1-\gamma)^m\}} = \frac{W(\lambda)\,\gamma}{(1-\lambda)\{R(\gamma)\vee 1\}\{1-(1-\gamma)^m\}}.$$

(9)

Since FDR is not conditioned on at least one rejection occurring, we can set

$$\widehat{\mathrm{FDR}}_\lambda(\gamma) = \frac{\hat{\pi}_0(\lambda)\,\gamma}{\hat{\Pr}(P \le \gamma)} = \frac{W(\lambda)\,\gamma}{(1-\lambda)\{R(\gamma)\vee 1\}}.$$

(10)
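For concreteness, here is a minimal sketch of the estimates (6), (9) and (10) in Python (the function name and defaults are ours, not part of the paper):

```python
import numpy as np

def pfdr_fdr_estimates(p_values, gamma, lam=0.5):
    """Point estimates of pi0, pFDR(gamma) and FDR(gamma) for a fixed region [0, gamma]."""
    p = np.asarray(p_values)
    m = p.size
    W = np.sum(p > lam)                        # W(lambda): p-values above the tuning point
    R = np.sum(p <= gamma)                     # R(gamma): number of rejections
    pi0_hat = W / ((1.0 - lam) * m)            # equation (6)
    fdr_hat = pi0_hat * gamma * m / max(R, 1)  # equation (10), with R(gamma) v 1
    pfdr_hat = fdr_hat / (1.0 - (1.0 - gamma) ** m)   # equation (9)
    # Truncate at 1, as recommended below, since pFDR and FDR cannot exceed 1
    return pi0_hat, min(pfdr_hat, 1.0), min(fdr_hat, 1.0)
```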

For large m these two estimates are equivalent, but we find the pFDR quantity to be more appropriate and think that it should be used. When γ = 1/m, Pr{R(γ)>0} can be as small as 0.632, so FDR can be quite misleading as mentioned in the previous section. For fixed m and γ → 0, FDR(γ) and FDR^λ(γ) show unsatisfying properties, and we show this in Section 8.

We show in Section 6 that pFDR^λ(γ) and FDR^λ(γ) offer an analogous property to strong control in that they are conservatively biased for all _π_0. However, as we argued in Section 1, the expected value of a multiple-hypothesis testing procedure is not a sufficiently broad picture. Since the _p_-values are independent, we can sample them with replacement to obtain standard bootstrap samples. From these we can form bootstrap versions of our estimate and provide upper confidence limits for pFDR and FDR. This allows us to make much more precise statements about how much multiple-hypothesis testing control is being offered. The full details of the estimation and inference of pFDR(γ) are given in algorithm 1. The same algorithm holds for the estimation and inference of FDR(γ), except that we obviously use FDR^λ(γ) instead. In Section 9, we extend our methodology to include an automatic method for choosing the optimal λ.

If pFDR^λ(γ) > 1, we recommend setting pFDR^λ(γ) = 1 since obviously pFDR(γ) ≤ 1. We could smooth the estimate so that it is always less than or equal to 1, but we have taken a simpler approach here. The same comment holds for FDR^λ(γ).

Even though the estimates presented in this section are new, the approach has implicitly been taken before. Yekutieli and Benjamini (1999) introduced the idea of estimating FDR under dependence within the Benjamini and Hochberg (1995) framework. Also, Benjamini and Hochberg (2000) incorporated an estimate of _m_0 into their original algorithm in a post hoc fashion. Tusher et al. (2001) fixed the rejection region and estimated FDR.

3.1. Algorithm 1: estimation and inference for pFDR(γ) and FDR(γ)

1. For the m hypothesis tests, calculate their respective _p_-values p_1, …, p_m.

2. For the fixed λ, estimate _π_0 by π^0(λ) = W(λ)/{(1−λ)m} of equation (6).

3. Estimate pFDR(γ) by pFDR^λ(γ) of equation (9). For FDR(γ), instead use

$$\widehat{\mathrm{FDR}}_\lambda(\gamma) = \frac{\hat{\pi}_0(\lambda)\,\gamma}{\hat{\Pr}(P \le \gamma)}.$$

(11)

4. For B bootstrap samples of p_1, …, p_m, calculate the corresponding bootstrap estimates pFDR^{*b}_λ(γ), b = 1, …, B.

5. Take the 1−α quantile of the bootstrap estimates as a 1−α upper confidence limit for pFDR(γ).
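A sketch of algorithm 1 in Python follows (the function name and defaults are ours; the resampling follows the bootstrap description above):

```python
import numpy as np

def pfdr_with_upper_limit(p_values, gamma, lam=0.5, B=1000, alpha=0.05, seed=None):
    """Point estimate of pFDR(gamma) and a bootstrap 1-alpha upper confidence limit."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p_values)
    m = p.size

    def pfdr_hat(pv):
        pi0 = np.sum(pv > lam) / ((1.0 - lam) * m)              # equation (6)
        R = max(np.sum(pv <= gamma), 1)                         # R(gamma) v 1
        return min(pi0 * gamma * m / (R * (1 - (1 - gamma) ** m)), 1.0)  # equation (9)

    estimate = pfdr_hat(p)
    boot = [pfdr_hat(rng.choice(p, size=m, replace=True)) for _ in range(B)]
    return estimate, float(np.quantile(boot, 1 - alpha))
```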

4. A connection between the two approaches

In this section we present a heuristic connection between the sequential _p_-value method of Benjamini and Hochberg (1995) and the approach presented in the previous section. The goal is to provide insight into the increased power and effectiveness of our proposed approach.

The basic point that we make is that using the Benjamini and Hochberg (1995) method to control FDR at level α/_π_0 is equivalent to (i.e. rejects the same _p_-values as) using the proposed method to control FDR at level α. The gain in power from our approach is clear: we control a smaller error rate (α ≤ α/_π_0), yet reject the same number of tests.

Let p(1) ≤ … ≤ p(m) be the ordered, observed _p_-values for the m hypothesis tests. The method of Benjamini and Hochberg (1995) finds k^ such that

$$\hat{k} = \max\{k : p_{(k)} \le (k/m)\,\alpha\}.$$

(12)

Rejecting p(1), …, p(k^) provides FDR ≤ α.

Now suppose that we use our method with the most conservative estimate π^0 = 1. Then FDR^(γ) ≤ α if we reject p(1), …, p(l^) such that

$$\hat{l} = \max\{l : \widehat{\mathrm{FDR}}(p_{(l)}) \le \alpha\}.$$

(13)

But since

$$\widehat{\mathrm{FDR}}(p_{(l)}) = \frac{\hat{\pi}_0\, m\, p_{(l)}}{l},$$

(14)

this is equivalent to (with π^0 = 1)

$$\hat{l} = \max\{l : p_{(l)} \le (l/m)\,\alpha\}.$$

(15)

Therefore, k^ = l^ when π^0 = 1. Moreover, if we take the better estimate π^0(λ) = W(λ)/{(1−λ)m}, then l^ > k^ with high probability.

Therefore, we have shown that l^⩾k^⁠. In other words, using our approach, we reject a greater number of hypotheses while controlling the same error rate, which leads to greater power. The operational difference between FDR^λ(γ) and the Benjamini and Hochberg (1995) procedure is the inclusion of π^0(λ)⁠. It is important to note, however, that we did not simply reverse their method and stick in π^0(λ)⁠. Rather, we took a very different approach, starting from simple facts about pFDR under independence with fixed rejection regions. Benjamini and Hochberg (1995) did not give us much insight into why they chose their particular sequential _p_-value method. This comparison sheds some light on why it works.
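This equivalence is easy to verify computationally. The following sketch (the helper names are ours) computes the Benjamini–Hochberg step-up count k^ and the count l^ obtained by thresholding the FDR^ estimate at the same level:

```python
import numpy as np

def bh_count(p_values, alpha):
    """Benjamini-Hochberg: the largest k with p_(k) <= (k/m) * alpha."""
    p = np.sort(p_values)
    m = p.size
    hits = np.nonzero(p <= alpha * np.arange(1, m + 1) / m)[0]
    return hits[-1] + 1 if hits.size else 0

def proposed_count(p_values, alpha, lam=0.5):
    """The largest l with FDR_hat(p_(l)) = pi0_hat * m * p_(l) / l <= alpha."""
    p = np.sort(p_values)
    m = p.size
    pi0_hat = np.sum(p > lam) / ((1.0 - lam) * m)
    hits = np.nonzero(pi0_hat * m * p / np.arange(1, m + 1) <= alpha)[0]
    return hits[-1] + 1 if hits.size else 0

# With pi0_hat forced to 1 the two counts coincide; with pi0_hat < 1,
# proposed_count rejects at least as many hypotheses as bh_count.
```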

5. A numerical study

In this section we present some numerical results to compare the power of the Benjamini and Hochberg (1995) procedure with our proposed method. As mentioned in Section 4, it is not straightforward to compare these two methods, since Benjamini and Hochberg (1995) estimated the rejection region whereas our method estimates FDR. We circumvent this problem by using the Benjamini–Hochberg procedure to control FDR at level FDR^λ(γ) for each iteration.

We looked at the two rejection regions γ = 0.01 and γ = 0.001 over several values of _π_0. The values of γ and _π_0 were chosen to cover a wide variety of situations. We performed m = 1000 hypothesis tests of μ = 0 versus μ = 2 for independent random variables Z_i ∼ N(μ, 1), i = 1, …, 1000, over 1000 iterations. The null hypothesis for each test is that μ = 0, so the proportion of Z_i ∼ N(0, 1) was set to _π_0; hence, the proportion _π_1 of the statistics have the alternative distribution N(2, 1). For each test the _p_-value is defined as p_i = Pr{N(0, 1) ≥ z_i}, where z_i is the observed value of Z_i.

To calculate the power of our method, test i was rejected if p_i ≤ γ, and the power was calculated accordingly. Also, FDR^(γ) was calculated as we outlined in Section 3. The Benjamini–Hochberg method was performed at level FDR^(γ), and the power was calculated. This approach should put the two methods on equal ground for comparison; reporting FDR^(γ) is equivalent in practice to using the Benjamini–Hochberg method to control FDR at level FDR^(γ).

The simulations were performed for _π_0 = 0.1, 0.2, …, 0.9. Even though here we know the alternative distribution of the _p_-values, we did not use this knowledge. Instead, we estimated FDR as if the alternative distribution were unknown. Therefore, we had to choose a value of λ to estimate _π_0; we used λ = 1/2 in all calculations for simplicity.

Table 2 shows the results of the simulation study. The first half of the table corresponds to γ = 0.01, and the second half corresponds to γ = 0.001. It can be seen that there is a substantial increase in power by using the proposed method. One case even gives an increase of over 800% in power. The power is constant over each case of our method because the same rejection region is used. The power of the Benjamini–Hochberg method increases as _π_0 grows larger because the procedure becomes less conservative. In fact, it follows from Section 4 that, as _π_0 → 1, the Benjamini–Hochberg method becomes the proposed method.
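One iteration of this simulation can be sketched as follows (seeded for reproducibility; parameters as stated above, shown for _π_0 = 0.8 and γ = 0.01):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m, pi0, gamma, lam = 1000, 0.8, 0.01, 0.5
is_null = rng.random(m) < pi0                    # true nulls with probability pi0
z = rng.normal(loc=np.where(is_null, 0.0, 2.0))  # N(0,1) nulls, N(2,1) alternatives
p = norm.sf(z)                                   # one-sided p-values
power = np.mean(p[~is_null] <= gamma)            # proportion of alternatives rejected
pi0_hat = np.sum(p > lam) / ((1 - lam) * m)      # equation (6)
fdr_hat = min(pi0_hat * gamma * m / max(np.sum(p <= gamma), 1), 1.0)
print(power, pi0_hat, fdr_hat)                   # compare with the pi0 = 0.8 rows of Table 2
```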

Table 2. Numerical comparison between the Benjamini–Hochberg and proposed methods

| _π_0 | FDR | Power (proposed) | Power (Benjamini–Hochberg) | E(FDR^) (proposed) | E(π^0) (proposed) | E(γ^) (Benjamini–Hochberg) |
| --- | --- | --- | --- | --- | --- | --- |
| **γ = 0.01** | | | | | | |
| 0.1 | 0.003 | 0.372 | 0.074 | 0.004 | 0.141 | 0.0003 |
| 0.2 | 0.007 | 0.372 | 0.122 | 0.008 | 0.236 | 0.0008 |
| 0.3 | 0.011 | 0.372 | 0.164 | 0.013 | 0.331 | 0.001 |
| 0.4 | 0.018 | 0.372 | 0.203 | 0.019 | 0.426 | 0.002 |
| 0.5 | 0.026 | 0.372 | 0.235 | 0.027 | 0.523 | 0.003 |
| 0.6 | 0.039 | 0.372 | 0.268 | 0.040 | 0.618 | 0.004 |
| 0.7 | 0.060 | 0.371 | 0.295 | 0.061 | 0.714 | 0.005 |
| 0.8 | 0.097 | 0.372 | 0.319 | 0.099 | 0.809 | 0.007 |
| 0.9 | 0.195 | 0.372 | 0.344 | 0.200 | 0.905 | 0.008 |
| **γ = 0.001** | | | | | | |
| 0.1 | 0.0008 | 0.138 | 0.016 | 0.001 | 0.141 | 1×10−5 |
| 0.2 | 0.002 | 0.138 | 0.031 | 0.002 | 0.236 | 5×10−5 |
| 0.3 | 0.003 | 0.137 | 0.046 | 0.003 | 0.331 | 0.0001 |
| 0.4 | 0.005 | 0.138 | 0.060 | 0.005 | 0.426 | 0.0002 |
| 0.5 | 0.007 | 0.138 | 0.074 | 0.008 | 0.523 | 0.0003 |
| 0.6 | 0.011 | 0.138 | 0.088 | 0.011 | 0.618 | 0.0004 |
| 0.7 | 0.017 | 0.138 | 0.101 | 0.017 | 0.714 | 0.0005 |
| 0.8 | 0.028 | 0.138 | 0.129 | 0.030 | 0.809 | 0.0006 |
| 0.9 | 0.061 | 0.137 | 0.133 | 0.066 | 0.905 | 0.0008 |
The fifth column of Table 2 shows E(FDR^) for our method. It can be seen that this is very close to the true FDR in the second column (usually within 0.1%), and it is always conservative. The proposed method is nearly optimal in that it estimates FDR(γ) essentially as closely and conservatively as possible for each rejection region. Therefore, we essentially lose no power regardless of the value of _π_0. Moreover, the method becomes better as the number of tests increases; the opposite has been true of methods in the past. The sixth column shows E(π^0) for our method. It can be seen that this estimate is always conservative and very close to the actual value. Recall that the Benjamini–Hochberg method essentially estimates the rejection region [0, γ]. The final column shows E(γ^) over the 1000 realizations of γ^. It can be seen that these estimates are quite conservative. The power comparisons are also shown graphically in Fig. 1.

Fig. 1. Average power versus _π_0 for the Benjamini–Hochberg method (dotted curve) and the proposed method (solid curve): (a) rejection region defined by γ = 0.01; (b) rejection region defined by γ = 0.001. There is a substantial increase in power under the proposed method in both cases.

The success of our method also largely depends on how well we can estimate pFDR(γ) and FDR(γ). It is seen in this simulation that the estimates are very good. This is especially due to the fact that the power–type I error curve is well behaved in the sense discussed in Section 6. If we choose λ more adaptively, the estimation is even better. This is the topic of Section 9.

6. Theoretical results

In this section, we provide finite sample and large sample results for pFDR^λ(γ) and FDR^λ(γ)⁠. Our goal of course is to provide conservative estimates of pFDR(γ) and FDR(γ). For example, we want pFDR^λ(γ)⩾pFDR(γ) as much as possible without being too conservative. The following is our main finite sample result.

Theorem 2.

E{pFDR^λ(γ)} ⩾ pFDR(γ) and E{FDR^λ(γ)} ⩾ FDR(γ) for all γ and _π_0.

This result is analogous to showing ‘strong control' of our method. The theorem is stated under the assumption that we do not truncate the estimates at 1. Of course in practice we would truncate the estimates at 1 since FDR ≤ pFDR ≤ 1, but the expected value of the estimates nevertheless behaves as we would want it to. The following result shows that truncating the estimates is a good idea when taking into account both bias and variance.

Theorem 3.

$$E[\{\widehat{\mathrm{pFDR}}_\lambda(\gamma) - \mathrm{pFDR}(\gamma)\}^2] > E[\{\widehat{\mathrm{pFDR}}_\lambda(\gamma) \wedge 1 - \mathrm{pFDR}(\gamma)\}^2]$$

and

$$E[\{\widehat{\mathrm{FDR}}_\lambda(\gamma) - \mathrm{FDR}(\gamma)\}^2] > E[\{\widehat{\mathrm{FDR}}_\lambda(\gamma) \wedge 1 - \mathrm{FDR}(\gamma)\}^2].$$

We now present large sample results for pFDR^λ(γ). (These results also hold for FDR^λ(γ) since FDR^λ(γ) ~ pFDR^λ(γ).) The tightness with which pFDR^λ(γ) converges to an upper bound of pFDR(γ) largely depends on how the power changes with the type I error. For this, let g(λ) = Pr(P ≤ λ | H = 1) be the power as a function of the type I error λ. Note that g(⋅) is just the cumulative distribution function of the alternative _p_-values. For m identical simple tests, g(λ) is the same for each test. If the alternative hypothesis is composite, then g(λ) must be defined as the appropriate mixture. We assume that g(0) = 0, g(1) = 1 and g(λ) ≥ λ for 0 < λ < 1.

Theorem 4.

With probability 1, we have

$$\lim_{m\to\infty} \widehat{\mathrm{pFDR}}_\lambda(\gamma) = \frac{\pi_0 + \pi_1\{1 - g(\lambda)\}/(1-\lambda)}{\pi_0}\, \mathrm{pFDR}(\gamma) \ \ge\ \mathrm{pFDR}(\gamma).$$

(16)

This theorem can be understood graphically in terms of the plot of power against type I error for each rejection region [0,_λ_]. The function g(λ) gives the power over the rejection region [0,_λ_], and of course the type I error over this region is λ. The estimate of π_0 is taken over the interval (λ,1], so 1−_g(λ) is the probability that a p_-value from the alternative distribution falls into (λ,1]. Likewise, 1−_λ is the probability that the null _p_-value falls into (λ,1]. The estimate of _π_0 is better the more g(λ)>λ. This is the case since the interval (λ,1] will contain fewer alternative _p_-values, and hence the estimate will be less conservative. Fig. 2 shows a plot of g(λ) versus λ for a concave g. For concave g, the estimate of _π_0 becomes less conservative as _λ_→1. This is formally stated in the following corollary.
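To illustrate the theorem numerically, the following sketch (our own illustration; the exact values depend on the alternative) evaluates the limiting inflation factor [π_0 + π_1{1−g(λ)}/(1−λ)]/π_0 for the one-sided N(0, 1) versus N(2, 1) example:

```python
from scipy.stats import norm

pi0 = 0.8
def g(lam):
    # power over [0, lam] for a one-sided test with a N(2,1) alternative
    return norm.sf(norm.isf(lam) - 2)

for lam in (0.2, 0.5, 0.8, 0.95):
    factor = (pi0 + (1 - pi0) * (1 - g(lam)) / (1 - lam)) / pi0
    print(lam, round(factor, 4))   # the conservative factor shrinks towards 1 as lambda -> 1
```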

Fig. 2. Power g(λ) versus type I error λ: since g is concave, {1−g(λ)}/(1−λ) grows smaller as λ → 1; the plotted line has slope lim_{λ→1}[{1−g(λ)}/(1−λ)], which is the smallest value of {1−g(λ)}/(1−λ) that can be attained for concave g.

Corollary 1.

For concave g

$$\inf_\lambda\, \lim_{m\to\infty} \widehat{\mathrm{pFDR}}_\lambda(\gamma) = \lim_{\lambda\to 1}\, \lim_{m\to\infty} \widehat{\mathrm{pFDR}}_\lambda(\gamma) = \frac{\pi_0 + g'(1)\,\pi_1}{\pi_0}\, \mathrm{pFDR}(\gamma) \quad \text{almost surely},$$

(17)

where g ′ (1) is the derivative of g evaluated at 1.

In other words, the right-hand side of equation (17) is the tightest upper bound that pFDR^λ(γ) can attain on pFDR as m → ∞ for concave g. The corollary can be seen graphically in Fig. 3. A plot of {1−g(λ)}/(1−λ) versus λ is shown for a concave g. It can be seen that the minimum is attained at λ = 1. The minimum value is g′(1), which happens to be 1/4 in this graph. Whenever the rejection regions are based on a monotone function of the likelihood ratio between the null and alternative hypotheses, g is concave. If g is not concave, then the optimal λ used in the estimate of _π_0 may not be λ = 1.

Fig. 3. {1−g(λ)}/(1−λ) versus λ for a concave g: the minimum is attained at λ = 1 with value g′(1) = 1/4.

A nice property of this last result is that g′(1) = 0 whenever we perform a one-sided test of a single parameter of an exponential family. Therefore, in many of the common cases, we can achieve exact convergence as λ → 1.

Recall the estimate Q^λ(γ) = π^0(λ)γ/Pr^(P ⩽ γ) from equation (8) in Section 3. pFDR^λ(γ) and FDR^λ(γ) are modified versions of this that show good finite sample properties, as we have seen. It turns out, moreover, that Q^λ(γ) is a maximum likelihood estimate of the limiting quantity in theorem 4.

Theorem 5.

Under the assumptions of theorem 1, Q^λ(γ) is the maximum likelihood estimate of

$$\frac{\pi_0 + \pi_1\{1 - g(\lambda)\}/(1-\lambda)}{\pi_0}\, \mathrm{pFDR}(\gamma).$$

(18)

This quantity is slightly greater than pFDR(γ) for powerful tests. In situations where g is unknown, this estimate is, loosely speaking, ‘optimal' in that the bias can usually be made arbitrarily small (see corollary 1) while the smallest asymptotic variance is attained among estimators with that bias. pFDR^λ(γ) has good finite sample properties (avoiding the inconveniences of the pure maximum likelihood estimate), but it is asymptotically equivalent to Q^λ(γ), so it has the same large sample properties.

7. The _q_-value

We now discuss a natural pFDR analogue of the _p_-value, which we call the q-value. This quantity was first developed and investigated in Storey (2001). The _q_-value gives the scientist a hypothesis testing error measure for each observed statistic with respect to pFDR. The _p_-value accomplishes the same goal with respect to the type I error, and the adjusted _p_-value with respect to FWER.

Even though we are only considering hypothesis testing with independent _p_-values, it helps to introduce the _q_-value formally in a general setting to motivate its definition better. We shall also define the _q_-value in terms of _p_-values. For a nested set of rejection regions {Γ} (for example, {Γ} could be all sets of the form [c, ∞) for −∞ ≤ c ≤ ∞), the _p_-value of an observed statistic T = t is defined to be

$$\text{p-value}(t) = \min_{\{\Gamma:\, t \in \Gamma\}} \Pr(T \in \Gamma \mid H = 0).$$

(19)

This quantity gives a measure of the strength of the observed statistic with respect to making a type I error: it is the minimum type I error rate that can occur when rejecting a statistic with value t for the set of nested rejection regions. In a multiple-testing situation, we can adjust the _p_-values of several statistics to control FWER. The adjusted _p_-values give a measure of the strength of an observed statistic with respect to making one or more type I errors. As a natural extension to pFDR, the _q_-value has the following definition.

Definition 2.

For an observed statistic _T_=t, the q-value of t is defined to be

$$q(t) = \inf_{\{\Gamma:\, t \in \Gamma\}} \mathrm{pFDR}(\Gamma).$$

(20)

In words, the _q_-value is a measure of the strength of an observed statistic with respect to pFDR—it is the minimum pFDR that can occur when rejecting a statistic with value t for the set of nested rejection regions.

The definition is simpler when the statistics are independent _p_-values. The nested set of rejection regions take the form [0,_γ_] and pFDR can be written in a simple form. Therefore, in terms of independent _p_-values, the following is the definition of the _q_-value of an observed _p_-value p.

Definition 3.

For a set of hypothesis tests conducted with independent _p_-values, the _q_-value of the observed _p_-value p is

$$q(p) = \inf_{\gamma \ge p}\, \mathrm{pFDR}(\gamma) = \inf_{\gamma \ge p}\, \frac{\pi_0\, \gamma}{\Pr(P \le \gamma)}.$$

(21)

The right-hand side of the definition only holds when the H_i are random, as in theorem 1. See Storey (2001) for more theoretical details about the _q_-value. Here, we propose the following algorithm (algorithm 2, detailed in Section 7.1) for calculating q(p_i) in practice.

This procedure ensures that q^(p(1)) ⩽ … ⩽ q^(p(m)), which is necessary according to our definition. The _q_-values can be used in practice in the following way: q^(p(i)) gives us the minimum pFDR that we can achieve for rejection regions containing [0, p(i)], for i = 1, …, m. In other words, for each _p_-value there is a rejection region with pFDR equal to q(p(i)) so that at least p(1), …, p(i) are rejected. Note that we write the calculated _q_-values as q^(p(i)); this is because q^(p(i)) is an estimate of q(p(i)). The exact operating characteristics of q^(p(i)) are left as an open problem, but simulations show that it behaves conservatively, as we would want.

7.1. Algorithm 2: calculating the q-value

1. For the m hypothesis tests, order the observed _p_-values p(1) ≤ … ≤ p(m).

2. Set q^(p(m)) = pFDR^(p(m)).

3. For i = m−1, m−2, …, 1, set q^(p(i)) = min{pFDR^(p(i)), q^(p(i+1))}.
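In code, the algorithm amounts to a running minimum taken from the largest _p_-value downwards; a sketch (the function name is ours):

```python
import numpy as np

def q_values(p_values, lam=0.5):
    """Estimated q-values via algorithm 2; returned in the original order."""
    p = np.asarray(p_values)          # assumes continuous p-values, so none is exactly 0
    m = p.size
    order = np.argsort(p)
    p_sorted = p[order]
    pi0_hat = np.sum(p > lam) / ((1.0 - lam) * m)       # equation (6)
    ranks = np.arange(1, m + 1)                         # R(p_(i)) = i
    pfdr = pi0_hat * p_sorted * m / (ranks * (1 - (1 - p_sorted) ** m))  # equation (9)
    q = np.minimum.accumulate(pfdr[::-1])[::-1]         # enforce q(p_(1)) <= ... <= q(p_(m))
    q = np.minimum(q, 1.0)
    out = np.empty(m)
    out[order] = q
    return out
```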

8. The advantages of pFDR^λ(γ) and q^ over FDR^λ(γ)

In this section, we take a closer look at the differences between pFDR^λ(γ) and FDR^λ(γ)⁠, and why it makes sense to use pFDR and the _q_-value. Consider the following fact for fixed m:

$$\lim_{\gamma\to 0}\, \widehat{\mathrm{pFDR}}_\lambda(\gamma) = \hat{\pi}_0(\lambda).$$

(22)

In other words, as we make the rejection region increasingly small, we eventually estimate pFDR as π^0(λ). This is the conservative thing to do: without parametric assumptions, little more can be concluded about _p_-values lying arbitrarily close to the origin. For example, suppose that we take a very small rejection region. Then it is most likely that only one _p_-value falls into that region. Without information from other _p_-values, and without parametric information about the alternative distribution, there is little that we can say about whether this one _p_-value is null or alternative. Therefore, it makes sense to estimate pFDR by the prior probability π^0(λ) in extremely small rejection regions.

Note in contrast that

$$\lim_{\gamma\to 0}\, \widehat{\mathrm{FDR}}_\lambda(\gamma) = 0.$$

(23)

Does this make sense? It does in that lim_{γ→0} FDR(γ) = 0. But the only reason why we always achieve this convergence is the extra term Pr{R(γ)>0} in FDR. Therefore, as γ becomes small, FDR is driven by the fact that the rejection region is small rather than by the fact that the ‘rate that discoveries are false' is small. After all, as we said above, there is not enough information about the alternative distribution in these small intervals to know how likely it is that a _p_-value is null or alternative.

Therefore, if we were to define the _q_-value in terms of FDR, then for small _p_-values it would be driven to 0 just because the _p_-value is small, even though we know little about how likely it came from the alternative hypothesis without serious assumptions. Consider Fig. 4. We performed 1000 hypothesis tests of N(0,1) versus N(2,1). 800 came from the null N(0,1) distribution and 200 came from the alternative N(2,1) distribution. Fig. 4(a) shows pFDR^λ(γ) and FDR^λ(γ) as a function of γ, as well as the _q_-value as a function of the observed _p_-values. It can be seen that all three functions look similar except close to the origin.

Fig. 4. Plot of pFDR^λ(γ) (solid curve), FDR^λ(γ) (broken curve) and q^ for the N(0,1) versus N(2,1) example: pFDR^λ(γ) and q^ behave more reasonably than FDR^λ(γ) near the origin.

Fig. 4(b) zooms in near zero, where we see that pFDR^λ(γ) shoots up to π^0(λ)⁠, and FDR^λ(γ) shoots down to 0. The _q_-value, however, sits steady where pFDR^λ(γ) reaches its minimum (at about p(10)). In other words, the _q_-value calibrates where we start to receive enough information to make good statements about pFDR. FDR says nothing about ‘the rate that discoveries are false' near the origin and merely measures the fact that we are near the origin. Moreover, the area near zero is arguably the most important region since this is where the most significant _p_-values lie. Therefore, by using pFDR^λ(γ) and the _q_-value, we obtain robust estimates of pFDR, which we argue is the more appropriate error measure. The _q_-value bypasses our having fixed the rejection regions and makes the rejection regions random in the appropriate way. It also bypasses any need to fix the error rate beforehand, as must be done in the traditional framework.
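The contrast near the origin is easy to reproduce; a small sketch under the same N(0, 1) versus N(2, 1) set-up (seeded, so the exact numbers are illustrative only):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
m, lam = 1000, 0.5
z = np.concatenate([rng.normal(0, 1, 800), rng.normal(2, 1, 200)])
p = norm.sf(z)                                   # one-sided p-values
pi0_hat = np.sum(p > lam) / ((1 - lam) * m)
for gamma in (1e-2, 1e-4, 1e-12):
    R = max(np.sum(p <= gamma), 1)
    fdr = min(pi0_hat * gamma * m / R, 1.0)      # equation (10)
    pfdr = min(fdr / (1 - (1 - gamma) ** m), 1.0)  # equation (9)
    print(gamma, round(pfdr, 3), fdr)
# As gamma -> 0, pFDR_hat climbs to pi0_hat while FDR_hat is driven to 0.
```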

9. Calculating the optimal λ

In Section 3 we showed how to estimate pFDR(γ) and FDR(γ), using the fixed parameter λ for the estimate of _π_0. In this section, we show how to pick λ optimally to minimize the mean-squared error of our estimates. We present the methodology for pFDR^λ(γ)⁠, although the same procedure works for FDR^λ(γ)⁠. We provide an automatic way to estimate

$$\lambda_{\text{best}} = \arg\min_{\lambda \in [0,1]}\, E[\{\widehat{\mathrm{pFDR}}_\lambda(\gamma) - \mathrm{pFDR}(\gamma)\}^2].$$

(24)

We use the bootstrap method to estimate λ_best: we calculate an estimate of the mean-squared error E[{pFDR^λ(γ) − pFDR(γ)}^2] over a range of λ. (Call this range R; for example, we may take R = {0, 0.05, 0.10, …, 0.95}.) As mentioned in Section 3, we can produce bootstrap versions pFDR^{*b}_λ(γ) (for b = 1, …, B) of the estimate pFDR^λ(γ) for any fixed λ. Ideally we would like to know pFDR(γ), in which case the bootstrap estimate of MSE(λ) would be

$$\frac{1}{B} \sum_{b=1}^{B} \{\widehat{\mathrm{pFDR}}{}^{*b}_\lambda(\gamma) - \mathrm{pFDR}(\gamma)\}^2.$$

(25)

However, we do not know pFDR(γ), so we must form a plug-in estimate of this quantity (Efron and Tibshirani, 1993). For any λ we have

$$E\{\widehat{\mathrm{pFDR}}_\lambda(\gamma)\} \ \ge\ \min_{\lambda'}\, E\{\widehat{\mathrm{pFDR}}_{\lambda'}(\gamma)\} \ \ge\ \mathrm{pFDR}(\gamma),$$

(26)

as was shown in Section 6. Therefore, our plug-in estimate of pFDR(γ) is min_{λ′∈R} pFDR^{λ′}(γ). The estimate of MSE(λ) is then

$$\widehat{\mathrm{MSE}}(\lambda) = \frac{1}{B} \sum_{b=1}^{B} \Big[\widehat{\mathrm{pFDR}}{}^{*b}_\lambda(\gamma) - \min_{\lambda' \in R}\, \widehat{\mathrm{pFDR}}_{\lambda'}(\gamma)\Big]^2.$$

(27)

This method can easily be incorporated into the main method described in Section 3 in a computationally efficient way. Our proposed method for choosing λ is formally detailed in algorithm 3, and a sketch in code is given below. Finally, note that, when choosing λ for the _q_-values, we can instead minimize MSE(λ) averaged over all the _q_-values and adjust algorithm 3 accordingly.
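A sketch of this bootstrap selection in Python (the names and defaults are ours):

```python
import numpy as np

def choose_lambda(p_values, gamma, lam_grid=np.arange(0.0, 1.0, 0.05), B=500, seed=None):
    """Estimate MSE(lambda) by the plug-in bootstrap and return the minimising lambda."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p_values)
    m = p.size

    def pfdr_hat(pv, lam):
        pi0 = np.sum(pv > lam) / ((1.0 - lam) * m)
        R = max(np.sum(pv <= gamma), 1)
        return min(pi0 * gamma * m / (R * (1 - (1 - gamma) ** m)), 1.0)  # equation (9)

    estimates = np.array([pfdr_hat(p, lam) for lam in lam_grid])
    plug_in = estimates.min()                       # plug-in stand-in for pFDR(gamma), as in (26)
    boot = np.array([[pfdr_hat(rng.choice(p, m, replace=True), lam) for lam in lam_grid]
                     for _ in range(B)])
    mse = np.mean((boot - plug_in) ** 2, axis=0)    # equation (27)
    best = lam_grid[int(np.argmin(mse))]
    return best, pfdr_hat(p, best)
```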

We provide some numerical results under the following set-up. We tested m hypotheses of N(0, 1) versus N(1, 1) with the rejection region Γ = [c, ∞). Each statistic is independent; therefore, when we form bootstrap statistics, we simply sample from the m statistics. We calculated λ_best from the true mean-squared error for each case. For each set of parameters, we performed the bootstrap procedure on 100 data sets with B = 500 and then averaged their predicted mean-squared error curves. λ^ was chosen by taking the minimum of the averaged mean-squared error curves. Taking the median of the 100 values of λ^ produces nearly identical results.

Fig. 5 shows the results for m = 1000 and c = 2 over the values _π_0 = 1, 0.8, 0.5, 0.2. Averaging over only 100 applications of the procedure gives us the correct λ_best for the first three cases. It is not important to predict the whole mean-squared error curve, but rather the location of its minimum. It can also be seen from the plots that the bootstrap procedure produces a conservative estimate of the mean-squared error for any λ. Table 3 shows simulation results for several other sets of parameters. It can be seen that, even when λ^ ≠ λ_best, the difference in their true mean-squared errors is not very drastic, so the minimum mean-squared error is nearly attained in almost all the situations that we simulated.

Fig. 5. Plots of MSE(λ) versus λ for Γ = [2, ∞) for (a) _π_0 = 1, (b) _π_0 = 0.8, (c) _π_0 = 0.5 and (d) _π_0 = 0.2 (solid curve, true mean-squared error; dotted curve, mean-squared error predicted by the bootstrap procedure averaged over 100 applications).

Table 3. Simulation results for the bootstrap procedure to pick the optimal λ

| _π_0 | m | Cut point c | λ_best | λ^ | MSE(λ_best) | MSE(λ^) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 1000 | 2 | 0 | 0 | 0.0602 | 0.0602 |
| 0.8 | 1000 | 2 | 0.8 | 0.8 | 0.00444 | 0.00444 |
| 0.5 | 1000 | 2 | 0.9 | 0.9 | 0.000779 | 0.000779 |
| 0.2 | 1000 | 2 | 0.95 | 0.9 | 0.000318 | 0.000362 |
| 0.8 | 100 | 2 | 0.75 | 0.65 | 0.123 | 0.127 |
| 0.8 | 500 | 2 | 0.75 | 0.75 | 0.00953 | 0.00953 |
| 0.8 | 10000 | 2 | 0.9 | 0.9 | 0.000556 | 0.000556 |
| 0.8 | 1000 | 0 | 0.7 | 0.85 | 0.00445 | 0.00556 |
| 0.8 | 1000 | 1 | 0.7 | 0.8 | 0.00361 | 0.00385 |
| 0.8 | 1000 | 3 | 0.85 | 0.9 | 0.0323 | 0.0326 |

9.1. Algorithm 3: estimation and inference of pFDR(γ) and FDR(γ) with optimal λ

1. For each λ in the range R (e.g. R = {0, 0.05, 0.10, …, 0.95}), calculate pFDR^λ(γ) as in algorithm 1.

2. Form B bootstrap samples of the _p_-values and calculate the bootstrap estimates pFDR^{*b}_λ(γ) (b = 1, …, B) for each λ ∈ R.

3. Calculate MSE^(λ) of equation (27), using min_{λ′∈R} pFDR^{λ′}(γ) as the plug-in estimate of pFDR(γ).

4. Set λ^ = arg min_{λ∈R} MSE^(λ), and take pFDR^{λ^}(γ) as the overall estimate of pFDR(γ).

5. Take the 1−α quantile of the bootstrap estimates at λ^ as a 1−α upper confidence limit for pFDR(γ).

10. Discussion

In this paper, we have proposed a new approach to multiple-hypothesis testing. Instead of setting the error rate at a particular level and estimating the rejection region, we have proposed fixing the rejection region and estimating the error rate. This approach allows a more straightforward analysis of the problem. We have seen that the result is a more powerful and applicable methodology. For example, we estimated a newly defined quantity, the positive false discovery rate, which we argued is usually the more appropriate error measure, and we successfully ‘controlled' it in the sense of conservative bias. By using theoretical results about pFDR with fixed rejection regions, we could derive well-behaved estimates of both pFDR and FDR. Interestingly, the Benjamini and Hochberg (1995) step-up method naturally falls out of these results.

Everything that we have discussed in this paper has been under the assumption that we are working with independent _p_-values. In more general cases, such as under dependence or in nonparametric situations, it is possible to apply very similar ideas to obtain accurate estimates of pFDR and FDR. See Storey and Tibshirani (2001) for a treatment of this. There are several other open questions that this approach brings to light. Other, better, estimates may be available. One could also possibly prove optimality theorems with respect to estimating pFDR within certain frameworks.

The _q_-value was discussed, which is the pFDR analogue of the _p_-value. Whereas it can be inconvenient to have to fix the rejection region or the error rate beforehand, the _q_-value requires us to do neither. By developing our methodology with fixed rejection regions, we could formulate the _q_-value in a conceptually simple manner. As an open problem, it is of interest to investigate the operating characteristics of the calculated _q_-values, which we called q^.

In a very interesting paper, Friedman (2001) discusses the role that statistics can play in the burgeoning field of data mining. Data mining involves investigating huge data sets in which ‘interesting' features are discovered. The classical example is determining which products tend to be purchased together in a grocery store. Often the rules for determining interesting features have no simple statistical interpretation. It is understandable that hypothesis testing has not played a major role in this field because, the more hypotheses we test, the less power there is to discover effects. The methodology presented here has the opposite property: the more tests we perform, the better the estimates are. Therefore, it is an asset under this approach to have large data sets with many tests. The only requirement is that the tests must be exchangeable in the sense that the _p_-values have the same null distribution.

Even if the tests are dependent, our approach can be fully applied. It was shown in Storey (2001) that the effect of dependence is negligible for large m over a large class of dependence structures. Also, Storey and Tibshirani (2001) treated the case where dependence cannot be ignored. Therefore, we hope that this proposed multiple-hypothesis testing methodology is useful not only in fields like genomics or wavelet analysis but also in data mining, where it is desired to find several interesting features out of many while limiting the rate of false positive findings.

Acknowledgements

Thanks go to Bradley Efron and Ji Zhu for helpful comments and suggestions. I am especially grateful for the ideas and encouragement of my advisor, Robert Tibshirani. This research was supported in part by a National Science Foundation graduate research fellowship.

References

1. Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B, 57, 289–300.

2. Benjamini, Y. and Hochberg, Y. (2000) On the adaptive control of the false discovery rate in multiple testing with independent statistics. J. Educ. Behav. Statist., 25, 60–83.

3. Benjamini, Y. and Liu, W. (1999) A step-down multiple-hypothesis procedure that controls the false discovery rate under independence. J. Statist. Planng Inf., 82, 163–170.

4. Efron, B. and Tibshirani, R. J. (1993) An Introduction to the Bootstrap. New York: Chapman and Hall.

5. Efron, B., Tibshirani, R., Storey, J. D. and Tusher, V. (2001) Empirical Bayes analysis of a microarray experiment. J. Am. Statist. Ass., 96, 1151–1160.

6. Friedman, J. H. (2001) The role of statistics in the data revolution? Int. Statist. Rev., 69, 5–10.

7. Storey, J. D. (2001) The positive false discovery rate: a Bayesian interpretation and the q-value. Technical report. Department of Statistics, Stanford University, Stanford. (Available from http://wwwstat.stanford.edu/jstorey/.)

8. Storey, J. D. and Tibshirani, R. (2001) Estimating false discovery rates under dependence, with applications to DNA microarrays. To be published. (Available from http://wwwstat.stanford.edu/jstorey/.)

9. Tusher, V., Tibshirani, R. and Chu, G. (2001) Significance analysis of microarrays applied to transcriptional responses to ionizing radiation. Proc. Natn. Acad. Sci. USA, 98, 5116–5121.

10. Yekutieli, D. and Benjamini, Y. (1999) Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. J. Statist. Planng Inf., 82, 171–196.

Appendix A: Remarks and proofs

Remark 1.

Here, we explain why rejection regions for _p_-values should be of the form [0,_γ_]. Recall that, for a nested set of rejection regions {Γ}, the _p_-value of _X_=x is defined to be

$$\text{p-value}(x) = \inf_{\{\Gamma:\, x \in \Gamma\}} \Pr(X \in \Gamma \mid H = 0).$$

(30)

Therefore, for two _p_-values _p_1 and _p_2, _p_1 ≤ _p_2 implies that the respective observed statistics _x_1 and _x_2 are such that _x_2 ∈ Γ implies _x_1 ∈ Γ. Therefore, whenever _p_2 is rejected, _p_1 should also be rejected.

Proof of theorem 1. See Storey (2001).

Proof of theorem 2. Recall pFDR^λ(γ) from equation (9). Also note that

$$\mathrm{pFDR}(\gamma) = \frac{1}{\Pr\{R(\gamma)>0\}}\, E\left\{\frac{V(\gamma)}{R(\gamma)\vee 1}\right\}.$$

(31)

Therefore,

$$E\{\widehat{\mathrm{pFDR}}_\lambda(\gamma)\} - \mathrm{pFDR}(\gamma) \ \ge\ E\left[\frac{\{W(\lambda)/(1-\lambda)\}\gamma - V(\gamma)}{\{R(\gamma)\vee 1\}\,\Pr\{R(\gamma)>0\}}\right],$$

(32)

since Pr{R(γ)>0} ≥ 1−(1−γ)^m under independence. Conditioning on R(γ), it follows that

$$E\left[\frac{\{W(\lambda)/(1-\lambda)\}\gamma - V(\gamma)}{\{R(\gamma)\vee 1\}\,\Pr\{R(\gamma)>0\}} \,\middle|\, R(\gamma)\right] = \frac{[E\{W(\lambda)\mid R(\gamma)\}/(1-\lambda)]\,\gamma - E\{V(\gamma)\mid R(\gamma)\}}{\{R(\gamma)\vee 1\}\,\Pr\{R(\gamma)>0\}}.$$

(33)

By independence, E{W(λ)|R(γ)} is a linear non-increasing function of R(γ), and E{V(γ)|R(γ)} is a linear non-decreasing function of R(γ). Thus, by Jensen's inequality on R(γ) it follows that

$$E\left[\frac{\{W(\lambda)/(1-\lambda)\}\gamma - V(\gamma)}{R(\gamma)\,\Pr\{R(\gamma)>0\}} \,\middle|\, R(\gamma)>0\right] \ \ge\ \frac{E[\{W(\lambda)/(1-\lambda)\}\gamma - V(\gamma)\mid R(\gamma)>0]}{E\{R(\gamma)\mid R(\gamma)>0\}\,\Pr\{R(\gamma)>0\}}.$$

(34)

Since E{R(γ)}=E{R(γ)|R(γ)>0} Pr{R(γ)>0}, it follows that

$$\frac{E[\{W(\lambda)/(1-\lambda)\}\gamma - V(\gamma)\mid R(\gamma)>0]}{E\{R(\gamma)\mid R(\gamma)>0\}\,\Pr\{R(\gamma)>0\}} = \frac{E[\{W(\lambda)/(1-\lambda)\}\gamma - V(\gamma)\mid R(\gamma)>0]}{E\{R(\gamma)\}}.$$

(35)

Also, note that

$$E\left[\frac{\{W(\lambda)/(1-\lambda)\}\gamma - V(\gamma)}{\{R(\gamma)\vee 1\}\,\Pr\{R(\gamma)>0\}} \,\middle|\, R(\gamma)=0\right] = E\left[\frac{W(\lambda)\,\gamma}{(1-\lambda)\,\Pr\{R(\gamma)>0\}} \,\middle|\, R(\gamma)=0\right]$$

(36)

$$\ge\ E\left[\frac{W(\lambda)\,\gamma}{(1-\lambda)\,E\{R(\gamma)\}} \,\middle|\, R(\gamma)=0\right],$$

(37)

where the last inequality holds since E{R(γ)} ≥ Pr{R(γ)>0}. Therefore,

$$E\{\widehat{\mathrm{pFDR}}_\lambda(\gamma)\} - \mathrm{pFDR}(\gamma) \ \ge\ E\left[\frac{\{W(\lambda)/(1-\lambda)\}\gamma - V(\gamma)}{\{R(\gamma)\vee 1\}\,\Pr\{R(\gamma)>0\}}\right]$$

(38)

$$\ge\ \frac{E[\{W(\lambda)/(1-\lambda)\}\gamma - V(\gamma)\mid R(\gamma)>0]}{E\{R(\gamma)\}}\,\Pr\{R(\gamma)>0\}$$

(39)

$$+\ \frac{E[\{W(\lambda)/(1-\lambda)\}\gamma \mid R(\gamma)=0]}{E\{R(\gamma)\}}\,\Pr\{R(\gamma)=0\}$$

(40)

$$=\ \frac{E[\{W(\lambda)/(1-\lambda)\}\gamma - V(\gamma)]}{E\{R(\gamma)\}}.$$

(41)

Now

$$E[\{W(\lambda)/(1-\lambda)\}\gamma - V(\gamma)] \ \ge\ \{m\pi_0(1-\lambda)/(1-\lambda)\}\gamma - m\pi_0\gamma = 0.$$

(42)

Thus, E{pFDR^λ(γ)} ⩾ pFDR(γ).

Since we showed that

$$E\left[\frac{\{W(\lambda)/(1-\lambda)\}\gamma - V(\gamma)}{\{R(\gamma)\vee 1\}\,\Pr\{R(\gamma)>0\}}\right] \ \ge\ 0$$

(43)

and

$$\frac{1}{\Pr\{R(\gamma)>0\}}\,\big[E\{\widehat{\mathrm{FDR}}_\lambda(\gamma)\} - \mathrm{FDR}(\gamma)\big] = E\left[\frac{\{W(\lambda)/(1-\lambda)\}\gamma - V(\gamma)}{\{R(\gamma)\vee 1\}\,\Pr\{R(\gamma)>0\}}\right],$$

(44)

it follows that E{FDR^λ(γ)}⩾FDR(γ)⁠.

Remark 2.

It was implicitly used in the previous proof that the H_i are random. However, this assumption is unnecessary, and in fact the assumption that the alternative statistics are independent is also unnecessary. See Storey and Tibshirani (2001) for a proof of theorem 2 under these weaker assumptions.

Proof of theorem 3.

The proof of theorem 3 easily follows by noting that

$$E[\{\widehat{\mathrm{pFDR}}_\lambda(\gamma) - \mathrm{pFDR}(\gamma)\}^2 \mid \widehat{\mathrm{pFDR}}_\lambda(\gamma) > 1] \ >\ E[\{\widehat{\mathrm{pFDR}}_\lambda(\gamma)\wedge 1 - \mathrm{pFDR}(\gamma)\}^2 \mid \widehat{\mathrm{pFDR}}_\lambda(\gamma) > 1]$$

(45)

since pFDR(γ)≤1. The proof for FDR^λ(γ) follows similarly.

Proof of theorem 4.

Recall pFDR^λ(γ) from equation (9). By the strong law of large numbers, Pr^(P ⩽ γ) → Pr(P ⩽ γ) almost surely. Also, Pr(P ≥ λ | H = 0) = 1−λ and Pr(P ≥ λ | H = 1) = 1−g(λ), where g(λ) is the power of rejecting over [0, λ] as described in Section 6. Therefore, by the strong law of large numbers, W(λ)/m → (1−λ)_π_0 + {1−g(λ)}_π_1 almost surely. Thus, it follows that

$$\lim_{m\to\infty} \widehat{\mathrm{pFDR}}_\lambda(\gamma) = \frac{[\pi_0 + \pi_1\{1-g(\lambda)\}/(1-\lambda)]\,\gamma}{\Pr(P\le\gamma)} = \frac{\pi_0 + \pi_1\{1-g(\lambda)\}/(1-\lambda)}{\pi_0}\, \mathrm{pFDR}(\gamma) \ \ge\ \mathrm{pFDR}(\gamma).$$

(46)

Proof of corollary 1.

Since g(λ) is concave in λ, {1−g(λ)}/(1−λ) is non-increasing in λ. Therefore, the infimum of {1−g(λ)}/(1−λ) is lim_{λ→1}[{1−g(λ)}/(1−λ)]. By L'Hôpital's rule,

$$\lim_{\lambda\to 1}\, \frac{1-g(\lambda)}{1-\lambda} = g'(1).$$

Proof of theorem 5.

We can observe both R(γ) and R(λ). Under the assumptions of theorem 1, it follows that the likelihood of the data can be written as

$$\{\pi_0\gamma + (1-\pi_0)\,g(\gamma)\}^{R(\gamma)}\, \{1 - \pi_0\gamma - (1-\pi_0)\,g(\gamma)\}^{m-R(\gamma)},$$

(47)

and also

$$\{\pi_0\lambda + (1-\pi_0)\,g(\lambda)\}^{R(\lambda)}\, \{1 - \pi_0\lambda - (1-\pi_0)\,g(\lambda)\}^{m-R(\lambda)}.$$

(48)

The result follows from standard methods.

Remark 3.

If g(⋅) is known then the maximum likelihood estimate of pFDR(γ) is

$$\tilde{Q}(\gamma) = \frac{\tilde{\pi}_0\,\gamma}{\hat{F}(\gamma)} = \frac{\{g(\gamma) - R(\gamma)/m\}\,\gamma}{\{g(\gamma)-\gamma\}\,R(\gamma)/m}.$$

(49)

© 2002 Royal Statistical Society