Evaluation of Biases of Satellite Rainfall Estimation Algorithms over the Continental United States

Introduction

Satellite rainfall estimates are used for a variety of applications, including climate modeling and diagnostics, weather forecasting, and hydrologic modeling. Satellite observations have an advantage over radar and rain gauge data because of their global spatial coverage. However, the relationship between satellite-measured radiances and rainfall reaching the ground is difficult to determine, so it is important to quantify the magnitudes of the errors of the satellite estimates.

The most recent major algorithm intercomparison project, the AIP-3, revealed that algorithms of the same class (i.e., visible/infrared, microwave, combinations) tend to perform similarly, but the differences between classes are significant (Ebert et al. 1996). However, it has been difficult to draw quantitative conclusions on algorithm performance from major intercomparison projects, because the spatial and temporal extent of the ground reference data used in these projects to evaluate the satellite estimates is usually limited and the accuracy of the data is often questionable.

There is a scarcity of validation data over the oceans, which makes it difficult to quantify the accuracy of oceanic satellite estimates. However, the quality and quantity of validation data over land continue to improve, especially over the United States, which has extensive radar and rain gauge coverage. For example, Conner and Petty (1998) used a rain gauge network of approximately 2700 sites and the Radar Data Processor-II (RADAP-II) archive to validate and to intercompare five microwave algorithms. With the advent of the National Weather Service's Next-Generation Weather Radar (NEXRAD; Crum and Alberty 1993) network, there are around 120 Weather Surveillance Radars-1988 Doppler (WSR-88D) covering most of the United States and providing data useful for satellite rainfall validation.

In this study, we attempt to quantify the biases of two satellite algorithms. We use radar and rain gauge data from U.S. networks over a period of three years. The algorithms are the Geostationary Operational Environmental Satellite (GOES) multispectral rainfall algorithm (GMSRA; Ba and Gruber 2001) and the National Oceanic and Atmospheric Administration National Environmental Satellite, Data, and Information Service (NOAA/NESDIS) Special Sensor Microwave Imager (SSM/I) scattering algorithm (Ferraro 1997). As in Conner and Petty (1998), we address the quality of the radar and gauge data first. Then we use a bias-corrected radar rainfall product to assess the biases of the satellite estimates.

We give special attention to the NESDIS SSM/I microwave algorithm, because microwave estimates are usually assumed to have low bias in comparison with geostationary estimates and so they are used quantitatively for many applications. The NESDIS SSM/I algorithm described here is used for bias adjustment of infrared estimates in the land portion of the Global Precipitation Climatology Project (GPCP) multisatellite algorithm (Huffman et al. 1997). In addition, this algorithm was implemented by the U.S. Navy's Fleet Numerical Meteorology and Oceanography Center in 1995 as the operational SSM/I rain-rate algorithm and continues to operate in that capacity. Also, the algorithm has been chosen as the prelaunch rainfall algorithm for the Defense Meteorological Satellite Program (DMSP) Special Sensor Microwave Imager/Sounder, which is expected to launch in 2002.

The NESDIS algorithm has been modified slightly to be used in the Goddard profiling algorithm (GPROF) (Kummerow et al. 2001) with other microwave data, specifically Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) and Advanced Microwave Scanning Radiometer (AMSR) data. By design, the biases of the land portions of these algorithms will be almost identical to those of the SSM/I algorithm. The version-5 TMI land rainfall estimates currently produced at the National Aeronautics and Space Administration (NASA) use this modified algorithm. However, based on the results of this and other studies, McCollum and Ferraro (2002) recalibrated the algorithm for the upcoming version-6 TMI release and for the first U.S. AMSR team land rainfall algorithm for the Aqua satellite, launched in May of 2002.

Although the quantity of radar and gauge data over the continental United States is high, it is important to assess the relatively unknown quality of these data so that we have confidence in our estimates of the satellite algorithm biases, which we calculate using the radar and gauge data. The question of how best to use radar and gauge data to evaluate satellite algorithms is still very open ended. Here we present what we believe to be a useful approach for our problem, that is, evaluation of long-term satellite biases, realizing that it does not answer all questions.

The three sensors involved, rain gauges, radar, and satellites, sample the space–time domain in vastly different ways (e.g., North and Nakamoto 1989). Radar, despite having a sampling area some eight orders of magnitude greater than that of the rain gauges, represents a compromise between good spatial and temporal coverage. In our approach, we use the continuous sampling ability of rain gauges to adjust the bias of the radar. Then, in turn, we use the bias-adjusted hourly rainfall estimates to assess the bias of the satellite products.

We made many subjective decisions in the study in choosing parameters whose optimal values were difficult, if not impossible, to determine. In the following sections, we present results describing the quality of the radar and gauge data and how we use the gauge data to create a bias-adjusted radar rainfall product used for bias assessment of the satellite products. We begin with a brief description of the data and the satellite rainfall estimation algorithms.

Data and algorithms

GMSRA

NOAA/NESDIS continuously operates two GOES satellites over the Western Hemisphere. These satellites collect data from five spectral channels that range from visible to infrared at a spatial resolution of 4 km and a temporal resolution of 15 min. Geostationary satellites offer near-continuous observation of cloud systems from approximately 60°N to 60°S, and the two GOES satellites observe all of the continental United States (CONUS).

Infrared (IR) information provided by the GOES satellites can be used to estimate rainfall rate based on cold cloud tops. IR-only methods are based on the observation that very cold cloud tops are generally associated with thunderstorms. However, these methods have difficulty in screening all nonraining cold cirrus. Also, the IR-only methods are not applicable to warm cloud tops, because the precipitation associated with these clouds depends primarily on cloud microphysical properties (i.e., the effective radius of cloud particles), to which IR brightness temperatures are insensitive. The GMSRA, which is based on the combined IR/visible and near-IR channels, attempts to identify these properties, making it possible to derive rainfall for warm, raining clouds.

The GMSRA combines techniques developed from previous algorithms into a comprehensive rainfall algorithm. It utilizes the basic technique of using cold IR cloud-top temperatures to estimate rainfall rate (Arkin and Meisner 1987; Vicente et al. 1998). The key to the GMSRA is the rain–no-rain screening technique, and the most innovative approach in the screening is to use an estimate of effective radii of cloud particles developed by Rosenfeld and Gutman (1994) to classify warm cloud tops as raining or nonraining. It also uses a convective–stratiform separation technique using spatial and temporal gradients of cloud-top temperature similar to Adler and Negri (1988) and Vicente et al. (1998). Another innovation is a regional calibration of rain-rate-to-cloud-top-temperature relationships, dividing the continental United States into four quadrants and using different relationships derived from radar rainfall estimates for each quadrant. The hourly estimate is calculated as the trimean of the GOES observations at the start, middle, and end of the hour. Ba and Gruber (2001) show that these improvements appear to result in much more accurate rainfall estimates than the IR-only GOES precipitation index method of Arkin and Meisner (1987).

The GMSRA has been running routinely at NESDIS since 1997, and the estimates used here were collected daily from a rotating archive from 1998 to 2001. Thus, the algorithm was undergoing development throughout this period into the algorithm presented in Ba and Gruber (2001). For example, a programming error was found at the end of 1999, when it was realized that the Rosenfeld and Gutman (1994) screening method was not being implemented. Also, the regional calibrations were added in August of 1999; only one relationship was used prior to this date.

SSM/I

The SSM/I, on the DMSP platform, became operational in July of 1987. The DMSP satellites are polar orbiting; that is, any point on the earth is observed at most twice per day by a single satellite at approximately the same local morning and evening times. The data that we used for this study are from the three most recently launched DMSP satellites, the F-13, F-14, and F-15 (collection of F-15 data began in spring of 2000). At present, the F-13 data are for approximately 0600 and 1800 LT, the F-14 data are for approximately 0900 and 2100 LT, and the F-15 data are for approximately 0915 and 2115 LT.

The SSM/I land rainfall algorithm developed at NOAA/NESDIS is based on the work of Grody (1991), who developed a global scattering index (SI) at 85 GHz for use with the SSM/I sensor. Further refinement of the technique is described in Ferraro (1997). This and other microwave land algorithms use scattering by precipitation ice and raindrops as their basis, because emission is difficult to detect over land owing to the emissivity of the land surface. Thus, land algorithms are less accurate than ocean algorithms, which can utilize the emission signal from raindrops over water.

The relationship between ground-based radar-estimated rainfall rate and microwave scattering determined by Ferraro and Marks (1995) was used to develop the NESDIS rainfall-rate algorithm. The instantaneous rain rate (RR) over land in millimeters per hour is given by

$$\mathrm{RR} = 0.00513\,\mathrm{SI}^{1.9468} \tag{1}$$

unless the computed RR is greater than 35 mm h−1, in which case RR is set to 35 mm h−1.
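The capped power law of Eq. (1) can be sketched in a few lines. This is only an illustration: the function name and the treatment of nonpositive SI as no rain are assumptions, the coefficient 0.00513 follows the published form of the relationship and should be checked against Ferraro (1997), and the rain/no-rain screening that precedes the rain-rate step is not shown.

```python
def ssmi_land_rain_rate(si, coeff=0.00513, exponent=1.9468, cap=35.0):
    """Instantaneous land rain rate (mm/h) from the 85-GHz scattering index SI.

    Hypothetical helper: the name and default arguments are illustrative.
    """
    if si <= 0.0:
        # Assumption: no scattering signal is treated as no rain.
        return 0.0
    # Power law of Eq. (1), limited to the 35 mm/h maximum stated in the text.
    return min(coeff * si ** exponent, cap)


if __name__ == "__main__":
    for si in (10.0, 50.0, 200.0):
        print(si, round(ssmi_land_rain_rate(si), 2))
```

Because the exponent is close to 2, the rain rate grows roughly with the square of the scattering index until the cap is reached.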

The SSM/I estimates we used are from orbital data; that is, estimates are made at 12.5-km spacing centered on the location of the approximately 15 × 13 km2 85.5-GHz field of view (FOV). There is a unique time (in seconds) assigned to each estimate, corresponding to the time at which the microwave measurement of the FOV is taken. In this study, we compare the SSM/I estimates with hourly estimates of the other sensors for the hours containing the microwave estimates. This is an arbitrary choice dictated mainly by convenience, because the optimal scale of temporal integration of rainfall that corresponds to satellite estimates is not understood well. In terms of bias, the focus of our study, we have a large sample so that the large random errors should still cancel out.

However, the second-order accuracy will be lowered from the comparisons with hourly estimates. For example, Habib and Krajewski (2002) used 2-km pixel-based radar rainfall estimates for locations with 1-min rain gauge data to find the optimal scales for matching. They found that time integration scales of 5–20 min, with an offset of approximately 5 min after the radar observation, give the best correspondence of gauge data with the instantaneous radar estimates. The correspondence decreased more rapidly with mismatch in time, with correlation dropping from near 1 with a time offset of approximately 5 min to around 0.2 as the offset is changed by ±20 min. Although the results would be different in matching satellite FOVs with radar pixels, the finding that the rainfall correlation can change significantly for a 20-min offset applies here, because we are comparing instantaneous products with hourly products.

Radar rainfall estimates

We obtained the NEXRAD radar rainfall product from the National Centers for Environmental Prediction (NCEP), which provides hourly digital precipitation (HDP) radar rainfall estimates on a national grid. The HDP product is created after receiving estimates from nearly 100 WSR-88Ds that report to NCEP in real time. The HDP algorithm, described in detail in Fulton et al. (1998), uses a power-law Z–R relationship to estimate rainfall rate R from radar reflectivity Z. The rainfall-rate intensity maps are integrated over time to produce hourly accumulations. Quality-control algorithms attempt to identify and to remove outliers and ground returns from the radar precipitation estimates. The individual radar estimates are merged onto the Hydrologic Rainfall Analysis Project (HRAP) grid of approximately 4 × 4 km2 covering the CONUS. Rainfall for bins that contain more than one radar estimate is calculated using a simple inverse-distance weighted average. Evaluations of HDP products have been carried out by Smith et al. (1996) for the southern plains and by Young et al. (1999) for the complex mountainous terrain of the northern Appalachians.
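The merging step for bins seen by more than one radar can be illustrated with a minimal inverse-distance weighted average. The sketch assumes weights proportional to 1/distance and a hypothetical function name; the exact weighting and distance definition used in the operational HDP mosaic are not given here.

```python
def idw_average(values, distances):
    """Inverse-distance weighted mean of overlapping radar estimates for one bin.

    values    -- rainfall estimates (mm) from each radar covering the bin
    distances -- distance from the bin to each radar (same order, any unit)
    Illustrative only: weights of 1/distance are an assumption.
    """
    pairs = list(zip(values, distances))
    if not pairs:
        raise ValueError("at least one radar estimate is required")
    # A bin located exactly at a radar site: use that radar's value directly
    # to avoid dividing by zero.
    exact = [v for v, d in pairs if d == 0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for _, d in pairs]
    weighted_sum = sum(w * v for w, (v, _) in zip(weights, pairs))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Two radars report 2 mm and 4 mm; the first is three times closer.
    print(idw_average([2.0, 4.0], [1.0, 3.0]))
```

The closer radar dominates the merged value, which is the intended effect because beam height and sampling volume grow with range.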

Rain gauge data

Automated 24-h rainfall accumulation observations from 13 832 gauges over the CONUS are available via the GOES data collection platform. The data are from the hydrometeorological automated data system (HADS), a real-time data acquisition and data distribution system operated by the Office of Hydrology of the National Weather Service. These data have limited previous quality control; thus, prior to calculating the bias-corrected radar product, we had to screen them for obvious problems. We discuss this procedure below.

Development of the bias-adjusted radar rainfall product

Approaches to validation

There are many viable approaches for validating satellite rainfall algorithms, depending on the goals. Here, the primary goal is to understand the temporal and spatial variations of the bias over large timescales and space scales. Two other approaches to validation are the use of only gauges, because ultimately we use the gauges to assess the bias, and the use of multisensor (gauge and radar) estimates created by the National Weather Service River Forecast Centers in near-real-time applications (Fulton et al. 1998).

Rozumalski (2000) used the National Weather Service NCEP stage-III multisensor (hourly radar estimates adjusted and merged with hourly gauge reports) product for validation of the NESDIS autoestimator (A-E) rainfall algorithm (Vicente et al. 1998). The stage-III product is useful for validation of the A-E, because the A-E biases are large when compared with those of the algorithms investigated in this study. Thus, the biases in the stage-III product are negligible when compared with those of the A-E. Young et al. (2000) investigated the stage-III product and found that range-dependent biases still may be present in areas of low gauge densities and that, overall, the stage-III estimates were approximately 20% low when compared with gauge observations. A preliminary finding of Fortune (2002) is that the estimates from hourly gauges and, consequently, the estimates of National Weather Service multisensor products such as stage III are approximately 25% low in comparison with 24-h gauge reports. This result presumably is due to hourly gauge underestimation, indicating again that standard hourly gauge-adjusted products might not be the best method for long-term bias studies when the bias of the algorithms is assumed to be small.

Conner and Petty (1998) used their own gauge-adjusted hourly product, and similar problems were encountered but in the opposite direction in terms of bias. They divided the radar estimates by radar–gauge ratios with means ranging from 0.18 to 0.56, giving much higher adjusted estimates when compared with the original radar estimates. Their final satellite validation results were that satellite algorithms (including the NESDIS SSM/I algorithm used here) were unbiased relative to gauges but underestimated rainfall by about 50% relative to the gauge-calibrated radar. The main purpose of their study was to compare different predictive indices, not to estimate bias, so their product may still have been the best available, as was the stage-III product in Rozumalski (2000). However, we have chosen to focus on bias here, which requires a different approach.

An approach that should theoretically give the most bias-free validation products is the use of gauge data only for comparisons at the collocated satellite pixels. The reasons that we have not chosen this approach relate to sampling and quality control. The temporal sampling error (diurnal bias and/or random sampling error) is large for comparisons of the daily mean of infrequent (for SSM/I) satellite overpasses with the 24-h gauge accumulation, so we wish to make comparisons more closely matched in time.

Hourly gauge data are also available from the HADS, and these data were collected for this study for quantitative validation. We believe that these data are poor, because there were frequent hourly gauge reports over all ranges of magnitudes (even exceeding 1000 mm h−1) for which the gauge reported rainfall and the radar and satellites estimated no rainfall. Numerous attempts at automatic quality control of these data were unsuccessful because of the amount of apparently poor data, and manual quality control is not feasible for such a large dataset. These problems, which were not encountered with the 24-h data, combined with the uncertainty in bias assessment found in previous studies influenced us to eliminate the hourly gauge data from our bias correction procedure. Other gauge data sources also exist, but data quality is usually unknown. In addition to these problems with gauge comparisons, the spatial sampling error is also significant for comparisons of the point gauge estimates with ∼15 × 13 km2 SSM/I satellite pixels.

Another traditional approach is the use of monthly, gridded gauge estimates. This approach was used in Krajewski et al. (2000) and Adler et al. (2001), but only for selected 2.5° grid boxes for which the gauge density was high. Again, the differences in the temporal and spatial sampling of satellites versus gauges still come into play in that kind of analysis; the SSM/I observations cover a very small percentage of the month that the gauges continuously observe the rainfall.

Therefore, we decided to create an hourly gauge-adjusted radar product with the requirement that this product is unbiased with respect to the 24-h gauges that are used to adjust the radar rainfall estimates. The validation statistics are calculated by comparing this hourly product with the collocated satellite estimates of SSM/I overpasses. We only use the locations of the 24-h gauges for the validation data, because we are more confident in the quality of our bias adjustments at the gauge locations. This method should produce a more accurate assessment of bias than in previous studies while still using a product with temporal and spatial scales that are similar to the satellite estimates themselves. In the ideal, this and gauge-only analyses should yield similar results for overall bias, providing more confidence in our quantitative assessment.

Quality control

The first aspect of creating a surface reference product is the removal of data of bad or poor quality. To do this removal effectively for our objectives, we consider the rain gauge and radar data jointly. For obvious reasons, we do not want to use bad rain gauge data to adjust the radar products. By the same token, we do not want to “correct” the radar products for the locations at which radar has difficulty detecting precipitation (e.g., because of beam blockage and/or overshooting the clouds). Other estimation problems arise because the equivalent rainfall rate for snowfall is underestimated significantly by radar, whereas gauges report the liquid water equivalent of the melted snow but suffer from undercatch. Snow fortunately is a minor problem in our study because the SSM/I algorithm does not attempt to make estimates over snow-covered surfaces and so these data are excluded from our analysis. The quality of the rain gauge data is affected by recording and transmission errors, undetected failures of the gauge mechanism, debris inside of the gauge orifice, and so on (Steiner et al. 1999; Guard et al. 2001, manuscript submitted to J. Atmos. Oceanic Technol., hereinafter GKK). These are some of the possible major error sources; other error sources also exist, such as gauge undercatch because of wind, which Krajewski et al. (1998) estimated to be on the order of 5%.

A major error source discovered by the National Weather Service Office of Hydrology is a truncation problem in processing NEXRAD rainfall rates (see the interagency memorandum of understanding final report for 1 June 1999–31 May 2000, which was available online at http://www.nws.noaa.gov/oh/hrl/papers/2000mou/Report/Index.html). A coding error causes truncation, rather than rounding off, of the rainfall amount in each 6-min radar scan period into 0.1-mm intervals. To make matters worse, for rainfall events lasting more than 1 h, a running hourly total is calculated that results in diminished rain rates of 1.5–2.0 mm h−1 for each hour that rain continues steadily. The effect of the truncation error is greatest during persistent, light precipitation events and is least during heavy events of short duration. The results presented here in subsequent figures show evidence of this truncation problem. Although this coding error was easy to fix, there are many regulations for changing operational codes for NEXRAD, so the corrected data are only now, at the time of this writing, becoming available for the operational products. There are no plans to correct the archived products suffering from the coding error.
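The following sketch illustrates why truncation to 0.1-mm intervals hurts light, persistent rain far more than heavy rain. It is a toy model under stated assumptions: each 6-min scan amount is quantized independently and ten scans are summed per hour. It does not reproduce the NEXRAD running-total logic, and the function name and sample amounts are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def hourly_total(scan_amounts_mm, rounding):
    """Sum per-scan rainfall after quantizing each scan amount to 0.1 mm.

    rounding -- ROUND_DOWN mimics truncation; ROUND_HALF_UP mimics rounding off.
    Decimal arithmetic is used so that the 0.1-mm steps are exact.
    """
    step = Decimal("0.1")
    quantized = (
        Decimal(str(amount)).quantize(step, rounding=rounding)
        for amount in scan_amounts_mm
    )
    return sum(quantized, Decimal("0"))


if __name__ == "__main__":
    light = [0.09] * 10   # steady light rain, 0.9 mm in the hour
    heavy = [2.57] * 10   # heavy rain, 25.7 mm in the hour
    for name, scans in (("light", light), ("heavy", heavy)):
        print(name,
              "truncated:", hourly_total(scans, ROUND_DOWN),
              "rounded:", hourly_total(scans, ROUND_HALF_UP))
```

In this toy case the light event is lost entirely under truncation, whereas the heavy event loses only a few percent of its total.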

Our starting point for the data quality control is estimation of the probability of detection of precipitation of the 24-h accumulated HDP products. We analyze data for the 3-yr period from 21 July 1998 to 22 July 2001. Figure 1 shows the map of probability of daily precipitation (POP) calculated as the number of 24-h accumulations with nonzero rainfall reported for a given location (i.e., a 4 × 4 km2 HRAP pixel) divided by the number of reported accumulations. Some range-dependent rings around the radar locations can be seen, especially in regions without overlapping radar coverage. We summarize the POP data in a histogram in Fig. 2a. The double-peaked shape of the histogram suggests that POPs may be lowered because of radar rainfall detection problems. If the rainfall regime were the same for the entire country, the true POP histogram would be a single peak. Varied rainfall regimes would result in a smooth distribution of the POP. We use collocated 24-h rain gauge accumulations for additional quality control of the radar data described next, resulting in the distribution of Fig. 2b.
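The POP defined above is a simple ratio for each pixel. A minimal sketch follows, assuming that missing 24-h accumulations are stored as `None` (a hypothetical convention) and are excluded from the count of reported accumulations.

```python
def probability_of_precipitation(accumulations):
    """Fraction of reported 24-h accumulations that are nonzero for one pixel.

    accumulations -- daily totals in mm; None marks a day with no report
                     (an assumed convention for this sketch).
    Returns None when the pixel has no reports at all.
    """
    reported = [a for a in accumulations if a is not None]
    if not reported:
        return None
    wet_days = sum(1 for a in reported if a > 0)
    return wet_days / len(reported)


if __name__ == "__main__":
    # Five days, one of them missing: two wet days out of four reports.
    print(probability_of_precipitation([0.0, 1.2, None, 0.0, 3.0]))
```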

Consider the following conditional probabilities: the probability P that radar observes rainfall at a given location given that a collocated rain gauge observes rainfall, that is,

$$P_{RG} = P(R_T > 0 \mid G_T > 0) \tag{2}$$

and the probability that rain gauge observes rainfall given that radar observes rainfall at the same location, that is,

$$P_{GR} = P(G_T > 0 \mid R_T > 0) \tag{3}$$

In (2) and (3), the subscript T denotes the rainfall accumulation period. For small T (on the order of minutes), we would expect $P_{RG} > P_{GR}$ because radar can “see” a larger area. For longer T, on the order of days, the difference between the two probabilities should diminish and both probabilities should tend to increase. We calculated both conditional probabilities $P_{RG}$ and $P_{GR}$ using daily accumulations. The minimum accumulation in the data is 1 mm, but this threshold is not significant, because nonzero 24-h accumulations are generally much greater than 1 mm.
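For a single gauge location, the two conditional probabilities can be estimated from the collocated daily radar and gauge accumulations as sketched below. The function name and the (radar, gauge) pair layout are assumptions made for illustration.

```python
def conditional_detection_probabilities(pairs):
    """Estimate P_RG and P_GR from collocated daily (radar_mm, gauge_mm) pairs.

    P_RG -- fraction of gauge-wet days on which the radar also reports rain
    P_GR -- fraction of radar-wet days on which the gauge also reports rain
    Either value is None when its conditioning event never occurs.
    """
    gauge_wet = sum(1 for r, g in pairs if g > 0)
    radar_wet = sum(1 for r, g in pairs if r > 0)
    both_wet = sum(1 for r, g in pairs if r > 0 and g > 0)
    p_rg = both_wet / gauge_wet if gauge_wet else None
    p_gr = both_wet / radar_wet if radar_wet else None
    return p_rg, p_gr


if __name__ == "__main__":
    days = [(0, 0), (2, 3), (0, 5), (4, 0), (1, 1), (0, 2)]
    print(conditional_detection_probabilities(days))
```

In this small example the radar misses two of the four gauge-wet days, so the first probability is lower than the second, which is the pattern that flags a radar detection problem.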

We performed the calculations on a gauge-by-gauge basis for the 13 832 daily gauges of the 3-yr period. It is difficult, if not impossible, to determine the correct minimum conditional probability threshold for quality-control purposes; thus, we had to make a subjective decision based on a sensitivity study. We varied the conditional probability thresholds (both $P_{RG}$ and $P_{GR}$ were required to be greater than the threshold) from 0.0 to 0.9 in 0.1 increments and used the validation results derived from the procedures described in the following sections to produce some of the final statistics.
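The threshold sweep can be sketched as a filter applied at each level from 0.0 to 0.9. The dictionary layout, gauge identifiers, and function names are hypothetical; only the rule that both probabilities must exceed the threshold comes from the text.

```python
def gauges_passing(probabilities, threshold):
    """Return the gauge ids whose two conditional probabilities both exceed threshold.

    probabilities -- dict mapping gauge id -> (p_rg, p_gr); None means undefined.
    """
    return [
        gauge_id
        for gauge_id, (p_rg, p_gr) in probabilities.items()
        if p_rg is not None and p_gr is not None
        and p_rg > threshold and p_gr > threshold
    ]


def sensitivity_sweep(probabilities):
    """Count the surviving gauges for thresholds 0.0, 0.1, ..., 0.9."""
    return {t / 10: len(gauges_passing(probabilities, t / 10)) for t in range(10)}


if __name__ == "__main__":
    example = {"A": (0.9, 0.8), "B": (0.65, 0.55), "C": (0.0, 0.7), "D": (0.3, 0.95)}
    for threshold, count in sensitivity_sweep(example).items():
        print(threshold, count)
```

Because the comparison is strict, a gauge with either probability equal to 0.0 is already removed at the 0.0 threshold.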

Figure 3 shows the results of the sensitivity study. Figure 3a shows the number of gauges left after the quality control; over 1000 of the initial gauges are eliminated by the 0.0 threshold, meaning that at least one of the conditional probabilities was 0.0. The gauges cover different lengths of time over the three years; shorter records are more prone to this initial removal. As the threshold is increased, gauges are removed presumably because of their combination of 1) distance from the radar and 2) climate regime; radars will miss more rain at gauges in stratiform regimes far from the radar than at gauges in convective regimes close to the radar.

Figure 3b shows the validation statistics with respect to matchups with the hourly satellite estimates. As the quality control becomes more stringent, the false-alarm ratio (FAR) decreases as expected. The satellite has better correspondence with the radar product for the more stringent levels of the quality control, but the sample size is reduced.
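For reference, the FAR can be computed from satellite and reference matchups with the standard contingency-table definition, false alarms divided by the sum of hits and false alarms. The sketch below assumes that any positive value counts as rain; the rain/no-rain thresholds actually used for Fig. 3b are not stated here.

```python
def false_alarm_ratio(satellite, reference):
    """FAR = false alarms / (hits + false alarms) for paired estimates.

    satellite -- satellite rainfall estimates for each matchup
    reference -- collocated radar (reference) rainfall for the same matchups
    Assumption: rain is defined as any value greater than zero.
    Returns None when the satellite never reports rain.
    """
    hits = 0
    false_alarms = 0
    for sat, ref in zip(satellite, reference):
        if sat > 0:
            if ref > 0:
                hits += 1
            else:
                false_alarms += 1
    detections = hits + false_alarms
    return false_alarms / detections if detections else None


if __name__ == "__main__":
    print(false_alarm_ratio([1.0, 2.0, 0.0, 3.0], [1.0, 0.0, 2.0, 0.0]))
```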

Figure 3 does not provide clear guidance for choosing a level of stringency for the quality control; there is a trade-off between the amount of data available for validation and the amount of rainfall possibly missed by the radar. We wanted to keep enough locations to cover most of the eastern United States but eliminate those locations at which the radar often missed rainfall. We chose 0.6 as a compromise.

After this quality control, there are 7589 gauges, whose locations and conditional probabilities are shown in Fig. 4. Most of the data are in the eastern and central United States, because most of the gauges are located there. In addition, a majority of the data in the west was removed by the quality control; presumably there is extensive beam blockage by the mountains and/or orographic/stratiform rainfall missed by radar. The highest conditional probabilities occur in a north–south band through the central United States, an area known for convective storms, which the radars can see a high percentage of the time.

The results of Fig. 4 are somewhat surprising, because theoretically the conditional probabilities of nonzero rainfall should be higher for radar than for gauges, because the radar observes a much larger area. The time series (averaged over the radar–gauge matchups) of both conditional probabilities are shown in Fig. 5, and the results suggest that two sources are contributing to the low radar probabilities of detection. First, the low values correspond to colder seasons, which have fewer convective events and more shallow, stratiform-type rainfall. This type of rainfall is more subject to beam overshooting. In addition, the lighter stratiform-type rainfall results in truncation error in the radar rainfall product, which contributes to lower radar detection probabilities.

To investigate better the factors that contribute to radar detection problems, such as mean storm duration and mean rain rate during storms, we used the high-quality, high-temporal-resolution, 15-min Oklahoma Mesonet rain gauge dataset (Brock et al. 1995). We used data observed from February through October of 1999, consisting of 114 gauges. Figure 6 (top panel) shows the conditional probabilities of precipitation for daily accumulations using the mesonet gauge–radar matchups, similar to the nationwide data of Fig. 5. The results are consistent with Fig. 5, although the Oklahoma radars perform better with respect to the gauge detection. The lower two panels of Fig. 6 show the mean storm duration and mean 15-min conditional rainfall rate, conditional on nonzero rainfall, calculated from all mesonet gauge data. The poor radar detection with respect to the gauges occurs during the lightest rainfall in late winter–early spring, and the radar does best with respect to the gauges during a period of short, heavy storms during midsummer. This result supports the hypothesis that the poor radar rainfall detection is, in part, due to the truncation error in the NEXRAD rainfall processing.

Correction factors

The radar rainfall estimation biases have significant variations in both time and space, and the biases are due to errors in both the rainfall rate estimation algorithm and rainfall detection. Figure 7 shows the 3-month running mean of all available quality-controlled radar–gauge matchups for each day of the 3-yr period, along with the bias-corrected radar rainfall estimates, created with the procedure described in the following sections. Each day contains matchup data from an average of around 2000 gauges, because neither the gauge nor radar data record is complete for any particular day. The correspondence is poor between the uncorrected radar and gauge means, but this is not unexpected. The difference between gauge and radar is much greater over the winter months than over the summer months because both sensors have major difficulties in detecting and/or estimating winter precipitation. Smith et al. (1996) found that gauge precipitation exceeded radar estimates by 14%–100% in the southern plains, and these numbers are similar to those we present in Fig. 7. The procedure described in this section appears to work as intended, because the bias-corrected radar rainfall estimates are unbiased with respect to the gauges.

Now we turn attention to the question of how best to represent the above discrepancy. Radar rainfall estimation bias can be expressed as an additive or multiplicative bias. In an additive bias formulation, the same correction is added to each radar rainfall estimate to produce the corrected estimate; using multiplicative bias entails multiplying the radar rainfall estimate by a multiplicative correction. We investigated both methods by calculating the additive and multiplicative gauge–radar bias as a function of gauge rainfall rate. To maximize the sample size, we used the 24-h radar–gauge matchups as well as three years of hourly radar–gauge matchups that were collected for this study but were not used for bias assessment. We believe that the hourly data are useful for assessment of additive versus multiplicative radar bias, because this does not require anywhere near the accuracy that is needed for quantitative satellite bias assessment. We grouped the data according to the nearest integral value of the rain gauge rainfall and calculated the biases for each group; the results are plotted in Fig. 8. The multiplicative correction is more applicable to our radar rainfall estimates, because the multiplicative bias is relatively constant over the range of rain rates, particularly for the hourly data matching the timescale of our hourly corrections.
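The comparison of additive and multiplicative bias by rain-rate group can be sketched as follows. The definitions used here are assumptions for illustration: the additive bias is the mean gauge-minus-radar difference in each group, and the multiplicative bias is the ratio of the gauge sum to the radar sum.

```python
import math
from collections import defaultdict


def bias_by_gauge_bin(pairs):
    """Group (gauge_mm, radar_mm) matchups by the nearest integer gauge value.

    Returns {bin: (additive_bias, multiplicative_bias)} where
      additive_bias       = mean(gauge - radar)        (assumed definition)
      multiplicative_bias = sum(gauge) / sum(radar)    (assumed definition)
    The multiplicative bias is None when the radar sum is zero.
    """
    bins = defaultdict(list)
    for gauge, radar in pairs:
        # floor(x + 0.5) rounds halves upward, unlike Python's round().
        bins[math.floor(gauge + 0.5)].append((gauge, radar))
    result = {}
    for key, members in sorted(bins.items()):
        gauge_sum = sum(g for g, _ in members)
        radar_sum = sum(r for _, r in members)
        additive = (gauge_sum - radar_sum) / len(members)
        multiplicative = gauge_sum / radar_sum if radar_sum > 0 else None
        result[key] = (additive, multiplicative)
    return result


if __name__ == "__main__":
    # Synthetic matchups in which the radar reports half of the gauge amount.
    sample = [(2.0, 1.0), (2.2, 1.1), (10.0, 5.0), (9.8, 4.9)]
    for rain_bin, biases in bias_by_gauge_bin(sample).items():
        print(rain_bin, biases)
```

In the synthetic sample the multiplicative bias is the same in both groups while the additive bias grows with rain rate, which is the kind of behavior that favors a multiplicative correction.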

The radar rainfall bias varies in time and space, so the bias-corrected radar rainfall product should account for the variations. However, it is difficult to determine how much averaging in time and space is needed so that the differences between gauge and radar rainfall are primarily from the bias and not from random fluctuations owing to a small amount of comparison data. The goal is to have enough data to calculate the corrections so that they are statistically stable. On the other hand, the space–time averaging domain should be small enough to avoid smoothing the effects of seasonality and spatial climate regimes. To get a better idea of the appropriate scales for calculating bias, we performed a sensitivity study with the Oklahoma Mesonet data.

The mean and standard deviation of the bias calculated by grouping the mesonet gauge–radar matchups into different spatial and temporal scales are shown in Fig. 9. For spatial averaging, groups of five gauges in the vicinity of each other were chosen. The purpose of this plot is to illustrate that averaging is necessary; otherwise, some corrections might turn out to be very large because of small sample size. For example, the left-most points represent the mean and standard deviation of the biases calculated from the first day in the record, and there is a wide variation among the different means in addition to a high mean bias. In extreme cases, including or excluding an extra day leads to a large difference in the calculated bias. As the timescale increases, biases are calculated by including subsequent days so that the groups of data become larger. The plots suggest that if no spatial grouping is performed, the bias stabilizes after about 75 days of averaging, whereas with grouping this timescale decreases to about 60 days. These results suggest that some degree of spatial averaging is necessary and that several weeks of temporal averaging produce more stable bias values.
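The sensitivity test can be illustrated with a short Python sketch. It is only an approximation of the procedure under stated assumptions: the function name `bias_spread_by_days` is hypothetical, each group's bias is taken as the ratio of summed gauge to summed radar rainfall over the first n days, and the spread is the sample standard deviation across groups.

```python
import statistics


def bias_spread_by_days(groups, day_counts):
    """Mean and spread of the multiplicative bias versus averaging length.

    groups: list of daily series, one per gauge (or gauge group); each series
            is a list of (gauge_mm, radar_mm) tuples in chronological order.
    day_counts: averaging lengths (in days) to evaluate.
    Returns {n_days: (mean_bias, stdev_bias)} for lengths with at least two
    groups that have nonzero radar rainfall.
    """
    result = {}
    for n in day_counts:
        biases = []
        for series in groups:
            g_sum = sum(pair[0] for pair in series[:n])
            r_sum = sum(pair[1] for pair in series[:n])
            if r_sum > 0:
                biases.append(g_sum / r_sum)
        if len(biases) >= 2:
            result[n] = (statistics.mean(biases), statistics.stdev(biases))
    return result


# Illustrative series only (not mesonet data): two groups, four days each.
groups = [
    [(2.0, 1.0), (1.0, 1.0), (1.0, 1.0), (2.0, 2.0)],
    [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (2.0, 2.0)],
]
spread = bias_spread_by_days(groups, [1, 4])
```

With these made-up numbers the spread among groups shrinks as more days are included, mirroring the stabilization described for Fig. 9.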

Based on these findings, we use 2-month (61-day) temporal and 1.1° spatial averaging windows to calculate the gauge-to-radar multiplicative biases. The biases are calculated for all 0.1° grid boxes using the 2 months of daily gauge–radar matchups from the gauge locations within the 1.1° window (±5 grid boxes) centered on the 0.1° grid box of interest. Corrections are calculated for all 0.1° grid boxes for the 2-month periods with sufficient quality-controlled radar and gauge data pairs within the 1.1° window to contain a minimum of 30 nonzero values. Additional manual quality control was done for the 61-day periods, because gauge or radar data are sometimes bad (e.g., repeating values, including repeating values of zero) for an isolated period of time. Figure 10 (upper panel) shows an example of the corrections calculated for the 61-day period in April–May of 1999. The map shows climatological features of higher corrections in the north, where April is still often dominated by stratiform precipitation. In contrast, the lower panel shows the corrections calculated for the June–July 1999 period. They clearly show that the corrections become lower as the rainfall regime becomes more convective. The validation study is only done for these regions with corrections, so most of the western United States is excluded from the following analyses.
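A minimal sketch of the windowed correction follows. The function name `window_correction`, the tuple layout of the matchups, and the integer grid indexing are hypothetical. Two further assumptions are made: the correction is the ratio of summed gauge to summed radar rainfall, and a pair counts toward the 30-value minimum only when both gauge and radar are nonzero (the text does not specify this detail).

```python
def window_correction(matchups, ix, iy, day_lo, day_hi,
                      half_width=5, min_nonzero=30):
    """Gauge-to-radar multiplicative correction for one 0.1-degree grid box.

    matchups: iterable of (day, gx, gy, gauge_mm, radar_mm), where gx and gy
              are integer 0.1-degree grid indices of the gauge location.
    ix, iy:   grid indices of the box of interest.
    day_lo, day_hi: inclusive day range of the 2-month (61-day) period.
    Returns sum(gauge) / sum(radar) over the space-time window, or None when
    the window holds fewer than min_nonzero nonzero pairs.
    """
    g_sum = 0.0
    r_sum = 0.0
    nonzero = 0
    for day, gx, gy, gauge, radar in matchups:
        if not (day_lo <= day <= day_hi):
            continue  # outside the temporal window
        if abs(gx - ix) > half_width or abs(gy - iy) > half_width:
            continue  # outside the 1.1-degree (+/- 5 boxes) spatial window
        g_sum += gauge
        r_sum += radar
        if gauge > 0 and radar > 0:
            nonzero += 1
    if nonzero < min_nonzero or r_sum == 0:
        return None
    return g_sum / r_sum
```

Returning `None` for sparsely sampled boxes corresponds to leaving those boxes out of the validation, as is done for most of the western United States.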

Validation results

Comparisons of SSM/I and GMSRA rainfall rates with the corrected rainfall estimates are shown in the following tables. The data are from the hours of SSM/I overpasses of the daily gauge locations, because these locations are the ones at which we are most confident with our gauge-corrected radar rainfall estimates. We collected the SSM/I estimates, valid over the ∼15 × 13 km2 85.5-GHz channel footprint, for those locations containing the gauges and those days on which the gauges report data, so that the SSM/I estimates correspond to the days used to calculate the corrections. We also collected the 3 × 3 grids of both the ∼4-km NEXRAD HDP estimates and the ∼4-km GOES estimates corresponding to the SSM/I footprint for the corresponding hours, producing a matchup dataset with relatively small spatial and temporal sampling differences. The estimates were collected daily for the three years from rotating archives. We compared data for the hours with no missing estimates from any of the products, resulting in a total of 1 750 230 hourly data points.

In the tables of results, the multiplicative bias is the ratio of the mean satellite estimate to the mean corrected radar rainfall estimate over the period. The correlation coefficient ρ is calculated from all the hourly data. The probability of detection (POD) is calculated as the fraction of the total number of nonzero radar estimates for which the algorithms estimate nonzero rainfall. The FAR is calculated as the fraction of nonzero estimates from the algorithms for which the radar estimates no rainfall. Note that these FAR values are most likely higher than in reality, because the sensitivity analysis of Fig. 3 showed that with the 0.6 threshold we are including data from locations for which we know the radar sometimes misses rain. The adjusted radar rainfall value is the product of the multiplicative factor and the HDP; it is always zero where the radar indicates no rainfall, because gauges cannot compensate for detection problems by radar. Also, we are comparing the instantaneous satellite image(s) (one per hour for SSM/I, two images per hour for GMSRA) with hourly radar accumulations, so the satellite might not see rainfall that the radar observed during the hour, reducing the satellite POD. As a result, the POD and FAR values for the satellite algorithms presented here are biased, and the true statistics should be better.
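The four statistics defined above can be written compactly in Python. This is a sketch under the definitions given in the text; the function name `validation_stats` and the convention of returning `None` for undefined ratios are our own assumptions.

```python
import math


def validation_stats(sat, ref):
    """Multiplicative bias, correlation, POD, and FAR for hourly matchups.

    sat: satellite estimates; ref: bias-corrected radar estimates
    (same length, non-negative, same units).
    """
    n = len(sat)
    mean_s = sum(sat) / n
    mean_r = sum(ref) / n

    # Multiplicative bias: ratio of mean satellite to mean reference rainfall.
    bias = mean_s / mean_r if mean_r > 0 else None

    # Pearson correlation coefficient from all hourly pairs.
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(sat, ref))
    var_s = sum((s - mean_s) ** 2 for s in sat)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    rho = cov / math.sqrt(var_s * var_r) if var_s > 0 and var_r > 0 else None

    # POD: fraction of nonzero reference hours with nonzero satellite rain.
    ref_wet = sum(1 for r in ref if r > 0)
    hits = sum(1 for s, r in zip(sat, ref) if s > 0 and r > 0)
    pod = hits / ref_wet if ref_wet > 0 else None

    # FAR: fraction of nonzero satellite hours with zero reference rain.
    sat_wet = sum(1 for s in sat if s > 0)
    false_alarms = sum(1 for s, r in zip(sat, ref) if s > 0 and r == 0)
    far = false_alarms / sat_wet if sat_wet > 0 else None

    return {"bias": bias, "rho": rho, "pod": pod, "far": far}


# Illustrative values only (not data from the study).
stats = validation_stats([0.0, 2.0, 1.0, 0.0, 3.0], [0.0, 1.0, 0.0, 1.0, 2.0])
```

Because the bias is a ratio of means rather than a mean of hourly ratios, hours with zero reference rainfall contribute to it without causing division by zero.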

The previous figures have shown the regional variation in rainfall characteristics over the continental United States. To test regional differences, we divided the continental United States into four quadrants, similar to the calibration regions of Ba and Gruber (2001); here we use 40°N and 90°W as boundaries. The largest amount of data is in the southeast quadrant, because the most original gauges are there and the fewest were removed by quality control. There are 704 518 data points (mean rain rate = 0.09 mm h−1) from the southeast, 543 523 points (mean rain rate = 0.10 mm h−1) from the southwest, 362 270 points (mean rain rate = 0.08 mm h−1) from the northeast, and 139 919 points (mean rain rate = 0.11 mm h−1) from the northwest quadrants. The results for the different regions are shown in Table 1; the four numbers per table cell are arranged based on location.
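As a small illustration, the quadrant assignment and the sample counts quoted above can be checked in Python. The function name `quadrant` and the treatment of points lying exactly on a boundary are assumptions; the counts are the ones listed in the text.

```python
def quadrant(lat_deg, lon_deg):
    """Assign a location to a quadrant using 40N and 90W as boundaries.

    Longitude is in degrees east (negative in the Western Hemisphere).
    Points exactly on a boundary go to the north or east (assumed convention).
    """
    north_south = "north" if lat_deg >= 40.0 else "south"
    east_west = "west" if lon_deg < -90.0 else "east"
    return north_south + east_west


# Number of hourly data points per quadrant, as quoted in the text.
counts = {
    "southeast": 704518,
    "southwest": 543523,
    "northeast": 362270,
    "northwest": 139919,
}
total_points = sum(counts.values())  # equals the 1 750 230 hourly data points
```

The four quadrant counts add up to the total number of hourly data points reported for the matchup dataset.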

The biases are tied to geographical region, with the major feature of lower biases in the east. There is a slight bias increase from south to north, possibly owing to more stratiform-type rainfall, which does not have the cloud signature needed for satellite rainfall estimation. The GMSRA has the potential to estimate this type of rainfall, but these results indicate the need for improvement. As a visual illustration of the SSM/I bias results in Table 1, the 3-yr mean SSM/I multiplicative (SSM/I to bias-corrected HDP) bias smoothed to 2.5° resolution is shown in Fig. 11. In general, the bias increases from east to west, consistent with the results of Table 1.

It is interesting to look also at the comparisons with the unadjusted radar HDP. The multiplicative HDP biases of 0.54 and 0.55 imply mean correction factors of almost 2 for the eastern quadrants as compared with smaller corrections in the western quadrants. Upon first glance, it might appear that the differences in satellite bias may be an artifact of the bias adjustments, but the mean corrected rain rates are higher in the western quadrants (0.112 and 0.103 mm h−1, as compared with 0.076 and 0.088 mm h−1). This implies that the discrepancy between uncorrected HDP values is even higher between the eastern and western quadrants. This is consistent with the generalization of more convective storms in the western quadrants than the eastern, requiring less correction because the radar has better detection of the deeper, convective storms. These are just generalizations for this dataset, and the quantitative results may not be applicable to the entire land region covering the United States, because these results are only for the regions remaining in the analysis after quality control and depend on the gauge locations.

One reason for creating the hourly adjusted product rather than doing comparison with daily gauge accumulations is to account for a possible diurnal cycle in the bias. The F13 has approximate 0600/1800 local overpass times, whereas the F14 and F15 are closer to 0900/2100 or 1000/2200. Table 2 shows validation statistics using the estimates corresponding to data from overpasses of either the F13 (first number) or F14–15 (second number). Because of the addition of the F15 satellite in 2000, there are more data points (921 896) for the F14–15 combination than for F13 (828 334). There is more bias-adjusted radar rainfall (0.10 mm h−1) for the F13 data than the F14–15 (0.08 mm h−1) data, because the F13 tends to have overpasses during more convectively active times of the diurnal cycle.

The results for the two overpass times differ, especially the GMSRA bias. This is presumably due in part to the daytime effective cloud radius screening, which is always applied for the F14–15 a.m. overpasses but only sometimes for the F13 overpasses. There also must be some difference in the rainfall rate–to–cold cloud relationships within the diurnal cycle not accounted for in the algorithm.

A similar discrepancy in the GMSRA bias was found using the last year only (Table 3), when all the programming bugs were fixed and the F15 was already in orbit. However, the biases were higher (1.04 and 1.65, as compared with 0.84 and 1.47 for the entire period). Ba and Gruber (2001) did not investigate long-term bias or the diurnal cycle, but case studies indicated sometimes nearly negligible and sometimes significant GMSRA overestimation of gauge rainfall, depending on the case. For the SSM/I, there was a larger diurnal effect considering the last year only; the F13 and F14–15 biases for the last year are 1.22 and 1.49, respectively, as compared with 1.23 and 1.31 for three years of data in Table 2. The other statistics are similar for both tables; there is a slight increase in the correlation coefficient and POD for the last year of the GMSRA after the corrections were made to the algorithm.

These results are consistent with previous knowledge of the algorithms. The GOES–microwave combination algorithms are based on the principle that the FOV-based microwave estimates are more accurate than the GOES estimates, so the microwave estimates are used to adjust the bias of the more frequent but less accurate GOES observations. All of the SSM/I statistics show better correspondence than the GMSRA estimates with the bias-corrected radar rainfall estimates on the hourly timescale.

The time series of the estimates is shown in Fig. 12. The seasonal bias pattern of the SSM/I algorithm is consistent with physical principles, because the algorithm relies on the scattering produced by ice particles more likely found in convective rainfall during warmer seasons. One single equation [(2)] relating rainfall rate to microwave scattering by ice particles, calibrated using a radar rainfall dataset, is used to estimate rainfall for all locations and times of the year. However, the relationship between rainfall reaching the ground and microwave scattering depends on many regional and seasonal factors, such as atmospheric moisture and aerosols (McCollum et al. 2000), thermodynamics (Petersen et al. 2002), and ice particle size distributions (Bennartz and Petty 2001). The variations of these factors can result in biased satellite rainfall estimates. The development of the GMSRA was ongoing throughout the 3-yr period, and improvement throughout the period can be seen. In particular, improvement can be seen in the time series at the end of 1999, when several errors in the code were found and fixed.

The rainfall rate distributions are shown in Fig. 13. The y-axis values are the contribution of each rainfall rate to the total rainfall accumulation for the 3-yr dataset. One obvious feature of the plots is the high number of 35 mm h−1 (the maximum estimated rain rate) SSM/I estimates. This is from the radar rainfall calibration dataset of Ferraro and Marks (1995), which had a maximum radar rainfall rate of 35 mm h−1. The bias-adjusted radar rainfall dataset of this study probably should not be used to judge the rain-rate distribution, especially at high rates, because, as the sensitivity study showed, our nonzero rain rates are too high (by design) in order to offset the radar nondetection problem. However, the dramatic jump in the observed SSM/I rain-rate histogram does suggest that some kind of recalibration at the high rain rates would be beneficial.
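The quantity plotted on the y axis can be sketched as follows. The function name `accumulation_contribution` and the 1 mm h−1 bin width are assumptions made for illustration; the binning used for Fig. 13 is not stated in the text.

```python
from collections import defaultdict


def accumulation_contribution(rates, bin_width=1.0):
    """Fraction of the total accumulation contributed by each rain-rate bin.

    rates: hourly rain rates (mm/h); zero rates add nothing and are skipped.
    Returns {bin_lower_edge: fraction_of_total}, with fractions summing to 1.
    """
    totals = defaultdict(float)
    for rate in rates:
        if rate <= 0:
            continue
        lower_edge = int(rate // bin_width) * bin_width
        totals[lower_edge] += rate  # weight each bin by rainfall, not by count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {edge: value / grand_total for edge, value in sorted(totals.items())}


# Illustrative values only: two estimates capped at 35 mm/h dominate the total.
shares = accumulation_contribution([0.0, 0.5, 0.5, 1.0, 35.0, 35.0])
```

Weighting by rainfall rather than by frequency is what makes a cluster of estimates at the 35 mm h−1 cap stand out, even though such estimates are rare.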

For the GMSRA, the rainfall rate distribution is lower by design; the maximum 4-km instantaneous rain rate is only 20 mm h−1 for the western United States and 40 mm h−1 for the eastern United States (Ba and Gruber 2001). This fact makes it nearly impossible to reach high rainfall rates on the 12-km scale, especially when averaging three estimates per hour. It is difficult to calibrate algorithms at the high-rain-rate end because of the lack of calibration data for these rare events.

Comparison with other results

Although our results suggest overestimation by the microwave algorithm, most others show underestimation for this latitude region. Kummerow et al. (2001) found a zonal bias when comparing the NESDIS land algorithm with GPCP satellite–gauge estimates; they observed underestimation starting around 28°N with increasing underestimation up to 35°N. This latitude band overlaps our region of the continental United States up to 35°N, but their results also include the Eastern Hemisphere. Adler et al. (2001) also found a significantly low bias for the NESDIS land algorithm over land areas 30°–60°N, based on comparisons with many 2.5° grid boxes with the highest number of monthly rain gauge reports over this latitude band. The AIP-2, conducted over a large region of western Europe, showed significant underestimation by the NESDIS algorithm (Ebert et al. 1996). The Krajewski et al. (2000) validation study of GPCP monthly rainfall shows overestimation in the central United States (western quadrants of our study) and underestimation in the east for the warm-season rainfall from the multisatellite algorithm, which should, by design, have a bias similar to that of the NESDIS microwave algorithm.

The results of these studies may differ because their data cover different regions. For example, in Adler et al. (2001), the distribution of boxes was fairly uniform over the United States, whereas most of the Eastern Hemisphere boxes were in Europe because of the quantity of available gauges. Kummerow et al. (2001) used the GPCP estimates from all land grid boxes, but we focus on the areas of the continental United States with sufficient radar–gauge data. If the bias characteristics differ for different regions, these studies may give different conclusions.

The underestimation in western Europe found in AIP-2 is presumably due to less ice scattering in less-convective, more-maritime systems coming from the Atlantic. A related concept was proposed by McCollum et al. (2000), who suggested that satellite algorithms overestimate rainfall in drier but convectively active climates because of significant evaporation of rain between the cloud and the ground. They found the greatest overestimation over the dry air masses of central equatorial Africa, and in this study there similarly is overestimation in the drier climate of the central United States in comparison with the east. McCollum et al. (2000) and Kummerow et al. (2001) show clear overestimation in the convectively active Tropics. We surmise that similar processes are occurring over North America; Table 1 indicates overestimation in the more convective and dry central United States in comparison with the East Coast. We unfortunately do not have many data for the U.S. West Coast, but the limited data we have suggest underestimation (Fig. 13).

Therefore, for this latitude band, conclusions on the bias are dependent on the location of the validation data: continental regimes may have overestimation, and coastal regimes, especially west coasts, will result in underestimation. All the results may still be consistent, even though Kummerow et al. (2001) and Adler et al. (2001) show an overall underestimation for these latitudes, whereas we show overestimation.

Discussion

We apply several subjective decisions for the bias-correction procedure. The conditional probability threshold of 0.6 is chosen even though it eliminates nearly one-half of the gauges in the original dataset. It is difficult to determine the appropriate threshold; the 0.6 threshold appears to retain enough gauges to cover most of the eastern and central United States without leaving significant gaps while eliminating those that we feel do not have a sufficient radar-to-gauge correspondence.

Another issue is how best to apply the bias corrections so that the error introduced by multiplying the HDP estimate by a multiplicative factor is minimized. If a small amount of data is used, the sampling error is high and multiplicative factors can be high. Based upon guidance from the Oklahoma Mesonet data shown in Fig. 9, we decided that 1.1° × 1.1° and 2-month averaging should provide sufficient data to calculate stable correction factors while preserving the seasonal and regional variation in these factors. These subjective decisions are all open to further investigation, but we have confidence that the procedure is working correctly, because Fig. 7 shows that our bias-corrected estimates are unbiased with respect to the gauges.

We require the validation product to be overall unbiased with respect to the gauges. The sensitivity study suggested that we would have to eliminate almost all of the data in order to use only locations at which the radar did not miss gauge rainfall. Because we use a multiplicative adjustment (based on Fig. 8) and we know radar often misses the rain, the missed rain is compensated by higher nonzero rain rates. If the goal of the validation product is a best estimate in a least squares sense, a different adjustment procedure would be more appropriate, but we have chosen bias as our top priority and so the second-order differences between satellite estimates and our validation product are greater.

Conclusions

We have used three years of radar and rain gauge data to create a bias-adjusted hourly radar rainfall product, which is compared with coincident SSM/I and GOES satellite estimates. Like previous validation studies, we faced many challenges in determining how best to use the validation data to draw meaningful conclusions.

We feel confident in making general conclusions about the diurnal, seasonal, and regional patterns in the SSM/I and GMSRA algorithm bias and in concluding that the improvements in the GMSRA algorithm have led to a better product. On a diurnal basis, there is more rainfall and a smaller positive bias for the data corresponding to the SSM/I F13, which has approximate 0600/1800 local overpass times, as compared with the F14–F15 SSM/Is, which have closer to 0900/2100 or 1000/2200 local overpass times. For the GMSRA, the difference is dramatic, with an overestimation of 45% for the F14–15 as compared with 15% underestimation for F13. General conclusions for the seasonal and regional cycles are that there is more rainfall and a positive bias for the SSM/I in the summer months as compared with the winter months, and the bias increases from the East Coast to the central United States.

These results, together with other studies, suggest that the microwave rainfall estimation bias over the continental United States is dependent upon regime. It appears that the maritime west coast regimes result in significant underestimation, which is countered by overestimation in the more convective, dry, central continental regimes. We unfortunately do not have many gauge data for the western United States, and many of these data were removed after quality control, so we can only surmise that the U.S. West Coast has characteristics similar to what was found from AIP-2 over western Europe. To address this issue, the U.S. Advanced Microwave Sounding Radiometer team has created a validation site using the Eureka, California, radar, which is the NEXRAD radar closest to the coast that still receives substantial rainfall in comparison with those in southern California. The coastal coverage of this radar, located in northern California, stands out clearly in Fig. 1.

In creating the adjusted radar validation product, we found that radar had more problems identifying rainfall than expected. This study is consistent with others in the conclusion that the quality of the information available for validation is, in general, poor. Despite applying conservative quality-control procedures, we remain unsure about the data that are the necessary base for all validation efforts. It seems to us that if the hydrometeorology and hydroclimatology research community wants confidence in rainfall products based on remote sensing technologies, we need to organize special validation setups. Two specific ideas would go a long way. The first is the use of dual rain gauges for rainfall measurement. This approach was suggested earlier by Ciach and Krajewski (1999) and Steiner et al. (1999), and the benefits of such a procedure have been demonstrated by GKK. The second is the use of high-density clusters of rain gauges for calibration and validation of radar and satellite rainfall algorithms and products. First implemented in the TRMM ground validation and field campaigns, this approach provides information on the spatial and temporal variability of rainfall, enhanced quality-control possibilities, and more reliable (i.e., less uncertain) surface reference data and products (e.g., Habib and Krajewski 2002; Anagnostou and Morales 2001).

Acknowledgments

We thank our NOAA/NESDIS colleagues for their help in many different aspects of this study, and we thank the three anonymous reviewers for the suggestions that were critical to the quality of the final version. The first author was partially supported by the National Research Council Research Associateship Program, and partially by NASA Grant S-87398-F.

REFERENCES

Fig. 1.

Map of the probability of precipitation for 24-h accumulations calculated by dividing the number of days with nonzero accumulation by the number of days of data. Based on three years (Aug 1999–Jul 2001) of HDP radar rainfall data

Citation: Journal of Applied Meteorology 41, 11; 10.1175/1520-0450(2002)041<1065:EOBOSR>2.0.CO;2

Fig. 4.

Locations and conditional rainfall probabilities over the 3-yr record for daily gauges (G) compared with the corresponding 24-h HDP radar (R) accumulations using the 7589 gauge–radar matchup locations that passed the quality-control procedure

Fig. 5.

Running mean of the conditional probabilities of detection for the quality-controlled matchups. Each value is calculated using the data from a 61-day window centered on a given date. Shaded regions correspond to cold-season (Nov–Apr) months

Fig. 6.

Running mean of the (top) conditional probabilities of detection, (middle) storm duration, and (bottom) conditional mean 15-min rainfall rates for the Oklahoma Mesonet gauge–radar matchups. Each value is calculated using a 61-day window. Shaded regions correspond to cold-season months

Fig. 12.

A 3-yr time series of 61-day running mean rainfall from the matchup data based on SSM/I overpasses over 24-h gauge locations. The arrow indicates the time at which the GMSRA error was corrected. Shaded regions correspond to cold-season months

Table 1.

Statistics of the algorithms with respect to the bias-corrected radar rainfall data for all hourly matchup data from 21 Jul 1998 to 22 Jul 2001 for northwest (top left), northeast (top right), southwest (bottom left), and southeast (bottom right) quadrants of the United States

Table 2.

Statistics of the algorithms with respect to the bias-corrected radar rainfall data for all hourly matchup data from 21 Jul 1998 to 22 Jul 2001, separating the data from the F13 (left) from the F14 and F15 data together (right)

Table 3.

Statistics of the algorithms with respect to the bias-corrected radar rainfall data for all hourly matchup data from 22 Jul 2000 to 22 Jul 2001, separating the data from the F13 (left) from the F14 and F15 data together (right)
