Track Dependence of Tropical Cyclone Intensity Forecast Errors in the COAMPS-TC Model

1. Introduction

Reliable tropical cyclone (TC) intensity prediction is vital to operational TC forecasting centers. While TC track forecast skill has improved steadily over the last few decades, progress in the official TC intensity forecast skill by numerical models has been slow to date, with only marginal improvements (Bender et al. 2007; DeMaria et al. 2007, 2014; National Hurricane Center or NHC; Tallapragada et al. 2014, 2015; Doyle et al. 2017). For example, the NHC’s official reports have shown that the maximum surface wind (V_MAX) forecast errors have improved little since the 1990s and are currently ~12–15 kt (1 kt = 0.514 m s−1) at 4–5-day lead times in both the North Atlantic (NA) and eastern North Pacific (EP) basins. With the continuous reduction of track errors at all forecast lead times, such a discrepancy in improvement between track and intensity forecast skill is intriguing, given that TC intensity is a strong function of environmental factors such as sea surface temperature (SST), vertical wind shear, topography, or cold air intrusion, as shown in previous observational, theoretical, and modeling studies (e.g., Gray 1968; George and Gray 1976; Emanuel 1986; Holland 1997; Mandal et al. 2007; Hill and Lackmann 2009; Wang 2009; Zhan et al. 2012).

From a physical perspective, the dependence of TC intensity on TC motion is expected. By analyzing the best track data and the global forecasting system forecasts/analyses during the 2002–09 seasons, DeMaria (2010) demonstrated that the intensity forecast errors by the statistical-dynamical Logistic Growth Equation Model (LGEM) could be reduced by about 35% at the 5-day lead time after confining TC intensity verification to the best track (i.e., the forecasts with track errors close to zero). Using cloud-resolving simulations for the TC cases that had small track errors during 2007–10, Du et al. (2013) also found that reducing track forecast errors by 55% at 3-day lead times can reduce intensity errors by ~15%–19%, with no significant intensity improvement at 1-day lead times in their model. In addition, they observed that the dependence of intensity errors on track errors is more pronounced in the western North Pacific (WP) basin than in the NA basin. These statistical analyses and hindcast studies suggested that improving TC track forecast skill would be beneficial to intensity forecasts, yet the key question of why the intensity error reduction is substantially smaller than the track error reduction remains unanswered.

While the answer to this question may be rooted in the fact that TC intensity has a maximum potential intensity (PI) limit while TC tracks do not have any restriction on where they can go, this fact alone is insufficient to ensure that TC intensity errors must be bounded. This is because one has to prove that the PI is globally stable with a sufficiently large basin of attraction such that all TC intensity states must be confined around this PI state (i.e., PI is a bounded attractor rather than a point). To our knowledge, the existence of such a bounded PI attractor has not been understood to date, other than through numerical demonstrations in previous studies. The above question is therefore of significance, especially for isolating how much of the TC intensity error is caused by TC intrinsic dynamics as compared to what is caused by TC tracks. Addressing this question will help determine the maximum intensity improvement that one can obtain in TC models by improving track forecasts.

Indeed, such TC intrinsic intensity variability has been quantified in recent idealized and observational studies (Hakim 2013; Brown and Hakim 2013; Kieu and Moon 2016; Judt et al. 2016; Kieu et al. 2018, hereinafter K18). Using TC scale analyses, Kieu and Moon (2016) proposed that TC dynamics possess some chaotic characteristics that could be responsible for intrinsic intensity variability, even when all favorable environmental conditions for TC development are fixed. This inherent variability of TC intensity is linked to the so-called PI chaotic attractor, which is the regime in the phase space of TC dynamics where intensity could take any value with equal probability around the PI equilibrium. The existence of such a chaotic attractor, if proven, will have profound implications for the practical bound of TC intensity forecast errors that one can achieve with numerical weather prediction (NWP) models, as discussed in K18.

From a practical perspective, TC intensity errors in any limited-area dynamical model can be attributed to several factors beyond the intrinsic variability, which include 1) vortex initial conditions; 2) model errors due to inadequate representations of TC physical and coupling processes; and 3) imperfect boundary guidance from global models (see e.g., Bender et al. 2007; Gall et al. 2013; Du et al. 2013; K18). Among these three factors, vortex initial conditions and model errors have been the most extensively studied due to their dominant impacts on TC intensity in real-time forecasts. In fact, previous studies of various vortex initialization and data assimilation schemes have shown that a poorly initialized vortex often experiences some unrealistic adjustment during the first 0–12 h before it can develop more consistent dynamics (Bender et al. 1993; Davidson and Weber 2000; Kwon et al. 2002; Tong et al. 2018). Likewise, inadequate model physics or resolution may lead to an incorrect response of the model vortex to its ambient environment, or produce a track that steers the vortex into a wrong environment. These model errors can result in a very different TC intensity, even with perfect vortex initial conditions (see, e.g., Liu et al. 1997; Braun and Tao 2000; Shen et al. 2000; Davis and Bosart 2002; Pattnaik and Krishnamurti 2007; Osuri et al. 2011).

For limited-area TC models, the third factor, related to the external control from global model guidance (mostly via lateral and surface boundary conditions), is another significant source of errors that has, however, received less attention. This type of error due to global boundary conditions is noteworthy, because a regional model may produce an erroneous track and intensity forecast due to erroneous boundary guidance from global models, even if the regional model and its initial conditions are perfect. While a large domain configuration in a regional model is often used to help alleviate the direct boundary impact, the fact that global models dictate the large-scale flow via the lateral boundaries implies that the role of global models will be increasingly important for TC forecasts at longer lead times as the lateral boundary information sweeps through the domain.

Given the potential roles of boundary conditions in governing TC motion in operational regional models, in this study we examine a central question of how intensity forecast errors in an operational limited-area model depend on the boundary conditions derived from global models. Specifically, we wish to quantify the relationship between track and intensity errors for the Coupled Ocean–Atmosphere Mesoscale Prediction System for Tropical Cyclones (COAMPS-TC) model. This question is important not only for future model development but also for further research on extracting the intrinsic intensity variability from real-time TC forecasts in different basins, as discussed in K18. Thus, the main objectives of this study are to 1) estimate the maximum reduction of intensity errors in the COAMPS-TC model when conditioned on track errors for each ocean basin, and 2) assess the intrinsic variability of TC intensity at the TC mature stage, which can account for the minimum 4–5-day intensity error threshold that the COAMPS-TC model can at best achieve in the future.

The remainder of the paper is organized as follows. Section 2 describes the input data and methodology for estimating intensity uncertainties caused by track errors in COAMPS-TC retrospective experiments. Section 3 describes a model configuration for a set of idealized experiments that help further extract the intrinsic variability of TC intensity in the COAMPS-TC model. The main results are provided in section 4, and conclusions are given in the final section.

2. Data and methodology

a. Data

In this study, the retrospective TC forecasts for the 2015–17 seasons as well as the real-time forecasts during 2018 were analyzed, using the 2018 version (C218) of the COAMPS-TC model. The C218 retrospective experiments and real-time forecasts use the NCEP Global Forecast System (GFS) initial and boundary conditions for all TCs in the NA, WP and EP basins. Following the standard procedure in an operational mode, all C218 retrospective experiments produce track and intensity forecasts up to 5 days for a list of TCs during 2015–17 in the Automated Tropical Cyclone Forecasting System format. Table 1 lists the number of TC cases conducted in the C218 retrospective experiments, while C218 real-time forecasts include all TC cases for the 2018 season in all three basins. This C218 version presented a number of upgrades as compared to the previous version, including refinements to the balanced vortex initialization method and surface drag coefficient for the high wind regime, improved representation of snow–radiation interaction, and a modified shallow convection parameterization.

Table 1.

List of TCs during 2015–17 seasons included in retrospective experiments with the COAMPS-TC model.


For the best track dataset used to verify the retrospective experiments, the postanalysis TC databases provided by NHC for the EP and NA basins and by the Joint Typhoon Warning Center (JTWC) for the WP basin were used. Although the postanalysis best track data contain some uncertainties in the track, intensity, and surface wind radii information (e.g., Landsea et al. 2014; Torn and Snyder 2012), the main goal of this study is to examine the relative improvement of the COAMPS-TC intensity forecasts with different track error thresholds, based on the same best track verifying data. Thus, the evaluation of the relative change in intensity errors as a function of track errors is expected to be valid so long as the same best track data are used for all verifications.

b. Methodology

To isolate the impact of global models on C218 intensity forecasts, the accuracy of track forecasts by the global GFS model, which was used to initialize C218, was treated in this study as a condition to stratify C218 intensity verification. For this aim, all GFS track forecasts were first verified against the best track data. Given the GFS track errors, the GFS forecasts were then classified into seven different groups based on their 5-day track error thresholds, which are defined as 500, 400, 300, 200, 150, 100, and 80 n mi (1 n mi = 1.852 km). For example, a group with a 5-day track error of 500 n mi means that only GFS track forecasts whose 5-day track errors are less than or equal to 500 n mi would be used for C218 intensity verification. Given these thresholds for the GFS 5-day track errors, all TC cases corresponding to each threshold were subsequently used for C218 intensity verification.
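The stratification described above can be illustrated with a minimal Python sketch. It is not the authors' verification code: the haversine formula on a spherical Earth of radius 6371 km, the function names, and the cycle identifiers are all assumptions made for illustration. Only the seven thresholds and the n mi conversion come from the text.

```python
import math

NMI_PER_KM = 1.0 / 1.852   # 1 n mi = 1.852 km, as stated in the text
EARTH_RADIUS_KM = 6371.0   # mean Earth radius (assumption)
THRESHOLDS_NMI = [500, 400, 300, 200, 150, 100, 80]  # thresholds from the text


def track_error_nmi(lat_f, lon_f, lat_b, lon_b):
    """Great-circle distance (n mi) between a forecast and a best-track position."""
    phi1, phi2 = math.radians(lat_f), math.radians(lat_b)
    dphi = phi2 - phi1
    dlam = math.radians(lon_b - lon_f)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a)) * NMI_PER_KM


def group_by_threshold(cycles, thresholds=THRESHOLDS_NMI):
    """Map each threshold to the cycles whose 5-day GFS track error is <= threshold.

    `cycles` is a hypothetical dict: cycle id -> 5-day GFS track error (n mi).
    The groups are nested: the 80 n mi group is a subset of the 100 n mi group.
    """
    return {t: sorted(cid for cid, err in cycles.items() if err <= t)
            for t in thresholds}


# Made-up cycle ids and errors, for illustration only.
cycles = {"c1": 620.0, "c2": 450.0, "c3": 120.0, "c4": 75.0}
groups = group_by_threshold(cycles)
print(groups[500])  # cycles retained for the 500 n mi threshold
print(groups[80])   # cycles retained for the 80 n mi threshold
```

Because the groups are cumulative, lowering the threshold can only shrink the sample, which is the trade-off between track quality and sample size discussed next.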

By decreasing the 5-day track error threshold from 500 to 80 n mi, one generally obtains forecast cycles in which the GFS model tracks are closer and closer to the best track. In principle, one could lower the 5-day track error threshold to 0 n mi such that only cases with perfect GFS track forecasts would be verified. However, choosing too small a 5-day track error threshold for the GFS forecasts would rapidly decrease the sample size to just a few cases, which cannot ensure any statistical significance. Figure 1 shows the sample size that corresponds to each GFS 5-day track error threshold in the entire C218 dataset during 2015–18. For the 80 n mi threshold, the total number of cases drops by about 50% to 131, 169, and 316 cases in the NA, EP, and WP basins, respectively. For 5-day track error thresholds smaller than 80 n mi, these numbers drop so rapidly that no statistical significance can be obtained. As such, the smallest 5-day track error threshold for the GFS model is limited to 80 n mi in this study, which corresponds to ~65%, 55%, and 70% track error reductions in the NA, EP, and WP basins, respectively.

Fig. 1.


(a) Distribution of the number of TC cases during 2015–18 as a function of the GFS 5-day track error thresholds for the NA (black), EP (gray), and WP (light gray) basins. (b) As in (a), but using the C218 5-day track error thresholds. In all global-model track-conditioned selections, the 4-day track error thresholds are set to 80% of the 5-day thresholds to eliminate cases with complicated tracks. See text for more details.

Citation: Weather and Forecasting 36, 2; 10.1175/WAF-D-20-0085.1

By choosing TC cycles with smaller GFS track forecast errors as outlined above, we intentionally choose the TC forecasts where the global model supposedly provides better initial and boundary conditions for the regional model. While the initial condition in a regional model is often further enhanced by an additional assimilation step, the boundary guidance provided by global models is always fixed and completely external during the entire course of integration. Therefore, the impacts of lateral boundary conditions become significant at longer lead times, depending on how large the outermost domain of a regional forecast model is and how fast the storm translation speed is. By filtering out the GFS forecasts with large 5-day track errors, it is therefore expected that the large-scale forcing through the lateral boundary conditions from the global model will be more accurate, thus allowing us to evaluate how intensity forecast errors in the COAMPS-TC model depend on its GFS global model input.

Although the above analyses with different GFS track forecast error thresholds could indicate the impacts of the global model on the accuracy of TC intensity forecasts in the COAMPS-TC model, a question of how TC intensity errors depend on track errors that are inherent to the COAMPS-TC model is still not fully known. This is because a more accurate global model forecast does not necessarily imply a more accurate track forecast in a regional model. Likewise, a regional model may still be able to produce a good track forecast even when a global model forecast has relatively poor track performance. In this regard, the use of the global GFS track errors as a sole condition for the C218 intensity verification may not fully capture the relationship between track and intensity errors in the COAMPS-TC model.

To further address the question of how TC intensity errors in the COAMPS-TC model depend on its own track errors, we also carry out an analysis in which the track errors of the COAMPS-TC model, instead of the global GFS track errors, were used to condition its own intensity verification. By looking into how C218 intensity errors behave when its own track errors are reduced, it is possible to provide more conclusive information about the relationship between track and intensity errors in the COAMPS-TC model. This track-intensity relationship will help answer the question of how much TC intensity improvement one would expect from the improvement of track forecasts alone, e.g., via large-scale physical parameterizations or boundary condition settings. A conclusion such as "a 50% track error reduction in the COAMPS-TC model due to an improvement in the large-scale environment could lead to a 20% reduction in TC intensity errors" would be as important as improvements related to TC inner-core physics or dynamics.

In this regard, the same 5-day track error thresholds as used for the global GFS track forecasts are applied to C218 retrospective intensity verification. As seen in Fig. 1b, these 5-day track thresholds for C218 forecasts show that a smaller track error threshold also yields fewer cases. However, the sample size for C218 data decreases faster when applying smaller 5-day track error thresholds for C218 track forecasts. For example, the 80 n mi threshold for the GFS tracks retains 131 cases in the NA basin, but only 121 cases remain when the same threshold is applied to the C218 5-day track errors. A similar change in the sample size is also seen in the EP and WP basins. This behavior is typical for regional models and indicates that regional models may not inherit more accurate track forecasts from global models, as seen in real-time forecasts.

In all global or regional track-conditioned intensity verifications carried out in this study, one extra step was taken to exclude possible outliers for which the 5-day track errors are small, yet the track errors at shorter lead times are very large. Such cases may be related to storms that move too fast or too slowly during their life cycle, or those making multiple loops such as Hurricane Joaquin (2015) or Hurricane Jose (2017). To exclude these abnormal tracks, we imposed a condition that the 4-day track errors must be less than or equal to 80% of the 5-day track errors. By imposing both conditions on the 4- and 5-day track errors, a smaller 5-day track error in one cycle will indeed give a better overall track forecast across other lead times, as expected. For the entire C218 dataset, these 4- and 5-day track error thresholds suffice to ensure the selection of good tracks for the purpose of this study.
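The two-condition selection can be sketched as follows. The wording of the 4-day condition admits more than one reading; this sketch follows the one given in the Fig. 1 caption, in which the 4-day threshold is 80% of the 5-day threshold. The function name, the data layout, and the sample values are hypothetical and are not taken from the study.

```python
def select_good_tracks(cycles, threshold_5day_nmi, ratio_4day=0.8):
    """Return the cycle ids that pass both the 5-day and 4-day track-error limits.

    `cycles` is a hypothetical dict: cycle id -> (4-day error, 5-day error) in n mi.
    The 4-day limit is ratio_4day * threshold_5day_nmi (Fig. 1 caption reading).
    """
    limit_4day = ratio_4day * threshold_5day_nmi
    return sorted(
        cid for cid, (err4, err5) in cycles.items()
        if err5 <= threshold_5day_nmi and err4 <= limit_4day
    )


# Made-up errors, for illustration only.
cycles = {
    "steady": (50.0, 70.0),    # small errors at both lead times
    "looping": (210.0, 60.0),  # small 5-day error but large 4-day error
    "poor": (150.0, 300.0),    # fails the 5-day threshold
}
selected = select_good_tracks(cycles, threshold_5day_nmi=80)
print(selected)
```

In this toy sample, the "looping" cycle would pass a filter based on the 5-day error alone and is rejected only by the additional 4-day condition, which is the kind of outlier the extra step is meant to remove.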

We should mention that there is no conclusive way to attribute the impacts of global models on the intensity forecasts of a regional model exclusively to either lateral boundary or initial conditions, using only retrospective or real-time forecast data described above. Regardless of this relative role between boundary and initial conditions, choosing a set of better global track forecasts still captures the degree of impact that the GFS forecasts can impose on the overall intensity forecast accuracy in the COAMPS-TC model. Such information is helpful in estimating the external influence of the GFS model on the skill of the COAMPS-TC model, thus allowing us to better determine the intensity variability that is inherent to the COAMPS-TC model.

3. Idealized experimental design

An apparent issue with extracting intrinsic intensity variation from real-data TC cases is that real TCs experience various external environmental factors that could mask intrinsic intensity variability. For example, constant TC movement into different environments, such as varying wind shear, dry air intrusion, cold SST, or landfall, can affect TC intensity much more strongly than the intrinsic variation of TC dynamics. In addition, real TC forecast data contain a mix of intensifying and weakening cycles as well as initial adjustment of a model vortex during the early stage of model integration. Thus, intensity errors obtained directly from real-time forecasts or retrospective experiments cannot fully capture the intrinsic variability of TC intensity that we also wish to quantify in this study.

While the intensity verification conditioned on track errors as presented in section 2 could help alleviate some of these issues by filtering out the TC cases that either the global model or the COAMPS-TC model could not predict accurately, the intensity errors obtained from the good track forecasts still include environmental factors from global models. As a step to evaluate TC intensity variability inherent to the COAMPS-TC model, a set of idealized experiments was carried out in which all external impacts related to environmental inhomogeneity are minimized.

For this purpose, an idealized configuration of the COAMPS-TC model was designed with a homogeneous 9-km horizontal resolution. The model is configured with 40 sigma vertical levels, and a large domain of ~4100 km × 4100 km in the (x, y) direction. The subgrid-scale moist convective processes were parameterized using an improved Kain–Fritsch scheme (Pan and Wu 1995), and the planetary boundary layer and free atmospheric turbulent mixing and diffusion used Mellor–Yamada’s level 2.5 formulation (Mellor and Yamada 1982). Microphysics were specified via prognostic equations for mixing ratios of cloud droplets, ice particles, rain, snow, graupel, and drizzle following a modified parameterization scheme based on Rutledge and Hobbs (1983) and Lin et al. (1983). An advanced four-stream radiation parameterization of Fu and Liou (1993) was employed to allow for aerosol, cloud, and radiation interactions. The surface layer scheme was based on Wang et al. (2002), along with a shallow convection parameterization scheme. A detailed description of the COAMPS-TC model is provided by Doyle et al. (2012, 2014), and overviews of COAMPS in Hodur (1997), Chen and Yau (2003), and Jin et al. (2014).

Given the above model configuration, a set of control ensemble simulations (CTL) was first conducted in which each level of the sounding profile used to initialize the model was perturbed by adding or subtracting a temperature perturbation in the range of 0–0.2 K or a relative humidity perturbation in the range of 0%–0.2%. A total of 30 ensemble members was generated for the temperature perturbation experiments, and another set of 30 ensemble members was generated for the relative humidity perturbation experiments. The difference in V_MAX among these ensemble members (i.e., the ensemble spread) will indicate how different types of random perturbations could affect the overall intrinsic variability of TC intensity under idealized environments.
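The ensemble generation can be sketched in a few lines of Python. Several choices here are assumptions rather than details from the study: the perturbations are drawn independently per level from a uniform distribution, the amplitude in the example is ±0.1 K (the value quoted in the Fig. 2 caption), the spread is measured as the sample standard deviation of V_MAX across members, and the sounding and V_MAX values are made up.

```python
import random
import statistics


def perturb_sounding(profile, amplitude, rng):
    """Add an independent uniform perturbation in [-amplitude, +amplitude] per level."""
    return [value + rng.uniform(-amplitude, amplitude) for value in profile]


def make_ensemble(profile, amplitude, n_members=30, seed=0):
    """Generate n_members perturbed copies of a sounding profile."""
    rng = random.Random(seed)  # fixed seed so the ensemble is reproducible
    return [perturb_sounding(profile, amplitude, rng) for _ in range(n_members)]


def ensemble_spread(vmax_by_member):
    """Sample standard deviation of V_MAX (kt) across members at one output time."""
    return statistics.stdev(vmax_by_member)


# Hypothetical control temperature profile (K), surface to upper levels.
control_temperature = [300.0, 294.0, 287.5, 280.0, 268.0, 250.0, 225.0, 205.0]
members = make_ensemble(control_temperature, amplitude=0.1)

# Made-up V_MAX values (kt) from four members at one lead time.
print(ensemble_spread([100.0, 102.0, 98.0, 100.0]))
```

The same functions apply to the relative humidity ensemble by passing a humidity profile and a perturbation amplitude in percent.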

Along with the control ensemble, three additional sets of ensemble experiments with different environmental conditions were also examined. In the first set, the lapse rate of the control sounding was modified such that the mean static stability of the troposphere was decreased (i.e., less stable) from ~6.5° to ~6.1°C km−1, while maintaining the same surface and tropopause temperature (Fig. 2). This experiment (hereinafter referred to as the WN2 experiment) is motivated by recent studies by Kieu and Wang (2017) and Kieu and Zhang (2018), which showed that PI is sensitive to the tropospheric static stability in the absence of a strict moist neutrality condition, even under the same SST and outflow temperature. Similar to the CTL ensembles, two 30-member ensembles were conducted for this set of less stable troposphere experiments: one for temperature perturbations and the other for relative humidity perturbations.

Fig. 2.


Initial soundings for three sets of ensemble experiments including the control experiment (CTL; black), the moist experiment (HRH; red), and the weaker stratification experiment (WN2; blue). Thick solid lines are for the temperature, while thick dashed lines are for the dewpoint temperature profiles at the initial time. Ensemble members for each experiment are generated from the corresponding sounding by adding random temperature or relative humidity perturbations of magnitude ±0.1 K and ±0.1% to each level of the initial soundings.

For the second set of sensitivity experiments, the relative humidity profile in the control sounding was modified such that the tropospheric moisture content was increased from 75% in the CTL sounding to 90% from the surface to 500 hPa, while retaining the same moisture content aloft (hereinafter referred to as the low-level high-moisture experiments, or HRH). Two 30-member ensembles in which either small temperature or relative humidity random perturbations were added to the mean profile were then conducted, similar to the CTL and WN2 experiments. This moist lower-troposphere experiment was designed to examine how the environmental moisture could affect TC intensity variability in the COAMPS-TC model. By initializing the model with such a moistened lower troposphere, the onset of the model rapid intensification is expected to take place earlier while the PI limit is unchanged, as previously shown in, e.g., Kieu et al. (2014) and Kieu and Zhang (2018). Such an earlier onset of rapid intensification without changing the PI limit may expose some additional property of TC intensity variability that we wish to examine.

The last set of sensitivity experiments was designed to verify the dependence of the error variability on the model resolution. In this set of experiments (5 km), a setup identical to that of the CTL ensemble, including model physics and ensemble design, was used, except for a finer horizontal resolution of 5 km. In accordance with this higher resolution, the model time step was reduced from 90 to 45 s to ensure numerical stability. This 5-km resolution is comparable to the highest resolution (4 km) in the current setting of the operational COAMPS-TC model.

Although a higher resolution is desirable for idealized experiments, the 9-km resolution in the idealized CTL ensemble and other sensitivity experiments is a compromise we had to adopt herein due to the large number of ensemble experiments at a single homogeneous resolution carried out in this study. This homogeneous resolution requires not only a great amount of computational resources but also significant storage for data output and analyses. Except for the 5-km sensitivity ensemble, the results from the 9-km resolution are presented hereinafter, with the caveat that any estimate of TC intrinsic variability at this resolution may only approximate the actual TC intensity variability in the COAMPS-TC model.

4. Results

a. Intensity error saturation

As a first step before examining the intensity variability in the 2018 version of the COAMPS-TC model, Fig. 3 shows the intensity verification of the C218 retrospective experiments during 2015–18, using a combination of C218 data for all three major ocean basins including the NA, EP, and WP. One notices that unlike the track errors, which display a linear error increase with forecast lead times, intensity errors show a rapid increase for the first 2 days and then quickly saturate at 3–5-day lead times. As discussed in K18, this fundamentally different characteristic between the track and intensity error growth curves may have a profound implication for the understanding of predictability limits. Indeed, it is well known that any dynamical system with limited predictability must possess a saturated error growth curve, which dictates the limit that represents the so-called attractor dimension. The fact that the intensity error curve levels off with the forecast lead times suggests that TC intensity possesses some inherent limited predictability, which prevents intensity forecasts from improving much further in the future, as proposed in Kieu and Moon (2016) and K18. In contrast, the track forecast problem does not seem to display such a limited predictability, at least up to the 5-day lead time, and so can potentially be further improved in the future, depending on the improvement of large-scale steering flows.

Fig. 3.


(a) Homogeneous 5-day verification of the absolute intensity errors (solid line; kt) and intensity bias (columns; kt) for the NA basin (black), EP basin (cyan), and WP basin (red) as obtained during the 2015–18 period. (b) As in (a), but for the track forecast errors (n mi).

It should be noted that the intensity error saturation curve is not always apparent from real-data TCs, as it can vary from basin to basin and season to season. This is because real-time TC intensity forecasts contain a mix of different TCs with various basin-dependent characteristics (see, e.g., Tallapragada et al. 2014, 2015; K18). In fact, the seemingly linear growth of the intensity errors in the EP basin shown in Fig. 3 turns out to be specific to the TCs in the C218 dataset in the EP basin, which is composed mostly of weak storms. As an illustration, Fig. 4 shows separate intensity verification for a group of strong TCs (initial intensity ≥ 50 kt) and a group of weak TCs (initial intensity < 50 kt) in each individual basin. Consistent with previous findings in K18, one notices that the weak TC group displays a slower intensity error growth rate during the first 3 days (i.e., a smaller slope) as compared to the strong TC group. In addition, the intensity error increases roughly linearly with lead time for the weak TC group, whereas the error growth curve displays a typical error saturation for the strong TC group. Physically, such a linear error growth curve for weak TCs indicates that this type of error growth is just a transient property, which cannot be used to quantify the intensity error saturation related to a predictability limit. Such a distinct error growth characteristic between the strong and weak TC groups is important to highlight here, because the linear growth of TC intensity errors for the weak TC group may lead to overestimation of the predictability range for TC intensity (see, e.g., Emanuel and Zhang 2016).
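A minimal sketch of the strong/weak stratified verification is given below. The 50-kt cutoff on initial intensity is taken from the text; the record layout, the function name, and the sample numbers are hypothetical, and the study's actual verification software is not described in this section.

```python
from collections import defaultdict

STRONG_THRESHOLD_KT = 50  # initial-intensity cutoff from the text


def verify_by_group(forecasts):
    """Mean absolute error and bias (kt) for each (group, lead time) pair.

    `forecasts` is a hypothetical list of tuples:
    (initial_vmax_kt, lead_hours, forecast_vmax_kt, best_track_vmax_kt).
    """
    errors = defaultdict(list)
    for init_vmax, lead, fcst, obs in forecasts:
        group = "strong" if init_vmax >= STRONG_THRESHOLD_KT else "weak"
        errors[(group, lead)].append(fcst - obs)
    return {
        key: {
            "mae": sum(abs(e) for e in errs) / len(errs),
            "bias": sum(errs) / len(errs),
            "n": len(errs),
        }
        for key, errs in errors.items()
    }


# Made-up forecasts at the 72-h lead time, for illustration only.
forecasts = [
    (65, 72, 90, 100),
    (80, 72, 110, 100),
    (50, 72, 60, 60),
    (35, 72, 45, 50),
]
stats = verify_by_group(forecasts)
print(stats[("strong", 72)])
print(stats[("weak", 72)])
```

Reporting the bias next to the mean absolute error matters for the discussion that follows: errors of opposite sign cancel in the bias but not in the absolute error, so a saturated absolute error with a large bias points to a systematic model error rather than intrinsic variability alone.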

Fig. 4.


As in Fig. 3, but for the verification of the C218 model with a strong/weak TC stratification that is based on the initial maximum 10-m wind being larger or smaller than 50 kt.

Despite the strong evidence of intensity error saturation for strong TC statistics as shown in Fig. 4, it should be emphasized that it is not sufficient to attribute this intensity error saturation entirely to the intrinsic variability of TC dynamics. This is because the COAMPS-TC model, like all other operational models, tends to have a larger intensity bias at 4–5-day lead times as seen, e.g., in the EP basin, where the underestimated intensity bias of several strong storms dominates the statistics during the 2015–18 period (Fig. 3a). Together with the uncertainties in the guidance from lateral boundary conditions and the physical representation of the large-scale environment, the intensity bias at 4–5-day lead times could account for large intensity errors at longer lead times beyond the pure intrinsic intensity variability. It is the existence of such intensity bias in real-data forecasts that renders the estimation of TC intensity intrinsic variability from real-time forecasts or retrospective experiments challenging.

b. Global boundary dependence of intensity errors

Given the potential effects of lateral boundary conditions on the C218 intensity error saturation at 4–5-day lead times, our first aim is to examine how the global GFS input can influence intensity errors in the COAMPS-TC model as shown in Figs. 3 and 4. Note that the GFS model provides both the initial and boundary conditions for the C218 version of the COAMPS-TC model. Thus, C218 can have a large intensity error due to bad initial and boundary conditions from GFS forecasts, even if COAMPS-TC is a perfect model. In general, it is not possible to attribute the C218 intensity errors entirely to the GFS boundary conditions simply by choosing the cases with better GFS track forecasts. Nonetheless, verifying C218 intensity forecasts conditioned on the GFS track forecast errors could at least indicate how much the GFS model can overall influence intensity forecasts in an embedded regional model. By analyzing these conditional intensity statistics, it is thus expected that the influence of the GFS model on C218’s performance can be evaluated.

In this regard, Fig. 5 shows the verification of C218 intensity errors as the GFS 5-day track errors are reduced to below 500, 400, 300, 200, 150, 100, and 80 n mi. One notices generally that a better GFS track forecast does lead to a better COAMPS-TC intensity forecast, especially at longer lead times. In the NA basin, a 70% GFS track error reduction (i.e., from 500 to 150 n mi) could lead to a ~37% reduction (from 14 to 9 kt) in C218 intensity errors up to day 3. At days 4–5, the intensity improvement is mixed, with no significant improvement at day 4 but a more noticeable intensity error reduction at day 5. In contrast, the intensity improvement in the EP and WP basins is more consistent when the GFS track forecasts are better (cf. Figs. 5c,e). That is, the intensity error reduction is in the range of 20%–23% at the 3–5-day lead times in the EP basin, and is ~19% in the WP basin.
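The percentage reductions quoted for the NA basin follow from a simple relative-change formula, sketched here with a hypothetical helper function. With the rounded end points given in the text, the track reduction is exactly 70%, while the intensity reduction comes out near 36%, close to the quoted ~37%; the small difference presumably reflects rounding of the 14- and 9-kt values.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


track_reduction = percent_reduction(500, 150)   # n mi, thresholds from the text
intensity_reduction = percent_reduction(14, 9)  # kt, rounded values from the text

print(f"track error reduction:     {track_reduction:.1f}%")
print(f"intensity error reduction: {intensity_reduction:.1f}%")
```

The ratio of the two, roughly one-half, is one way to express that the intensity error responds less than proportionally to a given track error reduction.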

Additional separation between strong and weak TC groups shows that most of this GFS track-related error reduction in C218 at the 3–5-day lead times is from the strong TC group, especially in the NA and EP basins (Figs. 6a–c). Such substantial intensity error reduction for the strong TC group suggests that the intensity of strong TCs is more sensitive to the GFS track guidance than that of weak TCs, although the results are still inconclusive due to the small sample size of strong TCs after applying the track error filtering.

Fig. 5.

Verification of the (left) C218 absolute intensity errors and (right) intensity bias with different thresholds of the GFS 5-day track errors for (a),(b) the NA; (c),(d) the EP; and (e),(f) the WP basins. The black solid line/numbers on top of each panel denote the C218 intensity verification and the corresponding number of cases using all TC cases, and the red solid line/numbers denote the C218 intensity verification and the corresponding number of cases for which the GFS 5-day track errors < 80 n mi.

Citation: Weather and Forecasting 36, 2; 10.1175/WAF-D-20-0085.1

Fig. 6.

Verification of the C218 absolute intensity errors for the strong (gray) and weak TC groups (blue) as in Fig. 4, using the 80 n mi threshold for the 5-day GFS track errors for (a) NA basin, (b) EP basin, and (c) WP basin. (d)–(f) As in (a)–(c), but using the 2-day GFS track threshold of 30 n mi, which corresponds to ~70% track error reduction at the 2-day lead time (gray shaded column), as compared to a similar 70% track error reduction (80 n mi) for the 5-day lead time (blue shaded column). Black columns are the reference C218 intensity errors using the entire C218 dataset without any track filtering for each basin.

To further confirm the control of the GFS track errors on C218 intensity errors, a sensitivity analysis is carried out in which a 70% reduction threshold is applied to the 2-day GFS track errors (i.e., all GFS 2-day track errors < 30 n mi), instead of applying a similar 70% GFS track error reduction to the 5-day lead time (<80 n mi). Because a 70% track error filter at 2-day lead time does not guarantee the same 70% track error reduction at 5-day lead time, one notices in Figs. 6d–f that the C218 intensity error reduction is consequently smaller for the 2-day threshold than for the 5-day threshold. This difference is most apparent in the NA basin (Fig. 6a) at all forecast lead times, whereas the difference appears to be small at 3–5-day lead times in the EP and WP basins. Physically, such a smaller C218 intensity error reduction when conditioned on 2-day GFS track errors, which holds true even with the strong/weak TC stratification (not shown), reflects the fact that TC intensity at the shorter lead time is less sensitive to lateral boundary conditions, as expected. Although a better GFS track forecast presumably indicates overall better boundary and initial conditions, the consistently larger improvement of C218 intensity at 4–5-day lead times across the basins suggests that the GFS model tends to have a larger influence on the COAMPS-TC intensity forecast skill via lateral boundary control than via initial conditions.

Except for the NA basin at the 5-day lead time, it is of interest to notice in Fig. 5 that there appears to be a limit beyond which reducing the GFS track 5-day errors does not help lower C218 intensity errors further. On average, the most significant intensity improvement is obtained when the GFS track errors are reduced from <500 to <150 n mi. This improvement is most clear between day-3 and day-5 lead times in the EP and WP basins, and between day 2 and day 4 in the NA basin. This result is noteworthy, because it suggests a limit in TC intensity variability inherent to the COAMPS-TC model, which governs TC intensity beyond the GFS boundary or initial input. Of course, the precise limit of this intrinsic intensity variability cannot be determined from the C218 data, because the limited sample size prevents us from reducing the GFS track errors to zero for C218 intensity verification. Moreover, such an internal intensity variability after removing all possible track influences still contains a combination of both model errors and intrinsic variability of TC dynamics related to initial condition errors that cannot be separated. However, the indication of the existence of such an internal intensity error as obtained in the C218 data is itself important because it indicates how much of the GFS errors contribute to the overall C218 intensity errors as compared to those caused by the COAMPS-TC model internal dynamics.

Also of interest is that by choosing the forecast cases with smaller GFS track forecast errors, the C218 intensity bias is shifted from negative to positive bias at 2–3-day lead times and approaches zero bias at 4–5-day lead times in the NA basin. This substantial change in the C218 intensity bias is likely due to the dominance of weak TCs in the NA basin as pointed out in K18. When choosing cases with better GFS track forecasts, C218 intensity errors are gradually reduced and the TC intensity errors inherent to the COAMPS-TC model are now better isolated. Thus, the change in intensity bias shown in Fig. 5 suggests a tendency to overestimate TC intensity for weak storms in the C218 data. In this regard, it appears that the C218 negative intensity biases in the NA basin are also due to TC movement guided by the GFS model beyond the inherent physics, data assimilation, or dynamics of the COAMPS-TC model.

In the EP and WP basins, the behaviors of TC intensity bias are somewhat different, with a more consistent negative intensity bias for all thresholds of GFS track errors. This is because TCs in these two basins are generally stronger than those in the NA. For these strong TC cases, C218 tends to underestimate TC intensity regardless of the accuracy of global GFS track forecasts. Thus, reducing the GFS global track errors does not seem to change the negative intensity bias much in these basins, especially at 3–5-day lead times. As a result, the intensity forecasts for the cases with better GFS track forecasts still possess a negative bias in these two basins, suggesting that the intensity biases in the EP and WP basins are related more to the COAMPS-TC’s internal issues with strong TCs than to GFS position errors.
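The distinction between intensity bias (mean signed error) and absolute intensity error, and how the two can differ between strong and weak storms, can be illustrated with a short sketch. The 64-kt cutoff, the choice to stratify by observed intensity, and the toy forecast/observation pairs are assumptions made for illustration only; the strong/weak definition used for C218 may differ.

```python
from statistics import mean

STRONG_CUTOFF_KT = 64.0  # hypothetical cutoff; the paper's definition may differ

# Toy (forecast, observed) VMAX pairs in kt, invented for illustration only.
PAIRS = [(95.0, 110.0), (80.0, 90.0), (45.0, 40.0), (38.0, 35.0)]


def bias_and_mae(pairs):
    """Return (bias, mae) in kt for (forecast_vmax, observed_vmax) pairs.

    Bias is the mean signed error (forecast minus observed), so a negative
    value means the forecasts underestimate intensity on average.
    """
    signed = [f - o for f, o in pairs]
    return mean(signed), mean([abs(e) for e in signed])


def stratify(pairs, cutoff_kt=STRONG_CUTOFF_KT):
    """Split pairs into (strong, weak) groups by observed intensity (an assumption)."""
    strong = [(f, o) for f, o in pairs if o >= cutoff_kt]
    weak = [(f, o) for f, o in pairs if o < cutoff_kt]
    return strong, weak


if __name__ == "__main__":
    strong, weak = stratify(PAIRS)
    print("strong (bias, MAE):", bias_and_mae(strong))
    print("weak   (bias, MAE):", bias_and_mae(weak))
    print("all    (bias, MAE):", bias_and_mae(PAIRS))
```

In this toy sample the strong group carries a negative bias and the weak group a positive one, so the sign of the pooled bias depends on the mix of storms, which is the kind of basin dependence discussed above.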

c. Track dependence of intensity errors

As mentioned in section 2, C218 intensity verification conditioned on GFS track forecasts does not preclude the situations in which C218’s tracks are different from the observed tracks, even with good GFS track forecasts. This is because regional models always develop their own dynamics that may depart from the global model forecasts due to different model resolution, physics, numerical procedures, and boundary settings. Therefore, good track guidance from global models does not necessarily imply that a regional model will inherit or develop a similarly accurate track forecast. For example, one forecast cycle of Hurricane Florence (2018) for which the GFS track forecast was very accurate is shown in Fig. 7, yet most regional models had a very different track at 4–5-day lead times. Such different track forecasts between GFS and regional models that use the GFS input data for initial and boundary conditions are commonly realized in real-time forecasts across ocean basins.3 The question of how intensity errors depend on track errors in models such as COAMPS-TC therefore cannot be fully answered by intensity verification conditioned on GFS track forecast errors alone.

Fig. 7.

An example of real-time track forecasts of Hurricane Florence (2018) valid at 1200 UTC 8 Sep 2018 from three different operational regional models HWRF (red), HMON (blue), and CTCX (cyan) that used the GFS global forecast (AVNO; green) as initial and boundary conditions, which were provided in real time by NOAA/NCEP. The thick black solid line indicates the best track, and the orange solid line denotes the NHC official track forecast.

To further examine the relationship between track and intensity errors, Fig. 8 shows C218 intensity verification conditioned on C218’s own track errors, using the same 5-day track error thresholds of <500, 400, 300, 200, 150, 100, and 80 n mi as in the analyses with GFS tracks. Despite the use of its own track errors as a condition for intensity verification, one notices overall consistent behaviors in the C218 intensity error reduction as a function of C218 track errors, similar to those conditioned on the GFS track errors. First, it is observed again that most of the C218 intensity error reduction is realized at longer lead times between 3 and 5 days in all basins. Second, the percentage of the TC intensity error reduction is substantially smaller than that of the track error reduction. That is, a decrease of 50%–70% in the track forecast errors leads only to about 18%–23% reduction in C218 absolute intensity errors among the three basins. In addition, reducing the 5-day track errors below 100 n mi does not seem to further improve intensity forecasts (except for the 5-day intensity error in the EP basin). These consistent characteristics of the dependence of C218 intensity errors on its track errors thus confirm that a significant portion of C218 intensity errors is related to TC movement into a wrong environment.

We note that unlike the intensity verification conditioned on the GFS track errors, the verification conditioned on C218’s own track errors generally better captures the dependence of intensity errors on track errors. This is because applying a filtering condition directly to the C218 track error allows for selecting better C218 track forecasts, whereas choosing a TC case with good GFS track forecasts does not always imply a similarly good track forecast in regional models driven by the GFS input data (see, e.g., Fig. 7). Therefore, the consistently smaller percentage of the C218 intensity error improvement in both verifications, which are conditioned on two different sets of tracks (i.e., Fig. 5 versus Fig. 8), strongly supports the existence of an internal TC intensity variability that is inherent to TC dynamics and model physics, rather than merely from the track influence.

Because landfall could still interfere with the C218 error reduction even after removing all TC cases with bad track forecasts, Fig. 9 shows additional analyses of C218 intensity errors before and after removing all forecast cycles that contain landfall during any part of the C218 5-day track forecast, using one specific filter threshold of 80 n mi for 5-day track errors. Because the impacts of landfall on TC intensity are most apparent at long lead times, only 4- and 5-day intensity errors are presented in Fig. 9. Except for the 5-day lead time in the WP basin, it is seen from Fig. 9 that removing landfalling cycles does not change the overall characteristics of the C218 error saturation for the 80 n mi threshold. That is, whether landfalling cycles are removed or not, the 4–5-day C218 intensity errors cannot be reduced further. We note here that the WP basin is somewhat of an outlier (Fig. 9b) in the sense that the C218 intensity errors could be further reduced after removing all landfalling cycles, owing to many supertyphoon cases in the C218 retrospective database that have different landfall timing in the forecasts, such as Dujuan (2015), Koppu (2015), Mindulle (2016), or Namtheun (2016). Thus, removing the landfalling forecasts of these TCs could further reduce C218 intensity errors, yet the smallest absolute intensity error is still saturated around 12–13 kt at 4–5-day lead times as shown in Fig. 9.
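The two-stage selection described above (a track-error threshold followed by removal of landfalling cycles) can be sketched as follows. This is an illustrative outline under stated assumptions, not the procedure used to produce Fig. 9: the record fields, the precomputed `over_land` flag, and the toy values are hypothetical, and how a forecast position is classified as over land is left outside the sketch.

```python
from statistics import mean

# Hypothetical forecast-cycle records (toy values, NOT C218 data).
# track_err_5d: 5-day track error in n mi; abs_int_err_5d: 5-day |VMAX error| in kt;
# track: forecast positions carrying a precomputed over_land flag (an assumption).
CYCLES = [
    {"id": "A", "track_err_5d": 60.0, "abs_int_err_5d": 11.0,
     "track": [{"over_land": False}, {"over_land": False}]},
    {"id": "B", "track_err_5d": 75.0, "abs_int_err_5d": 19.0,
     "track": [{"over_land": False}, {"over_land": True}]},
    {"id": "C", "track_err_5d": 240.0, "abs_int_err_5d": 25.0,
     "track": [{"over_land": False}, {"over_land": False}]},
]


def has_landfall(track_points):
    """True if any point of the forecast track is flagged as over land."""
    return any(p["over_land"] for p in track_points)


def select_cycles(cycles, track_thr_nmi=80.0, drop_landfall=False):
    """Keep cycles below the track-error threshold, optionally dropping landfalling ones."""
    kept = []
    for c in cycles:
        if c["track_err_5d"] >= track_thr_nmi:
            continue  # fails the track-error filter
        if drop_landfall and has_landfall(c["track"]):
            continue  # landfall occurs somewhere along the 5-day forecast track
        kept.append(c)
    return kept


def mean_abs_error(cycles):
    """Mean 5-day absolute intensity error (kt), or None for an empty selection."""
    return mean(c["abs_int_err_5d"] for c in cycles) if cycles else None


if __name__ == "__main__":
    all_tracks = select_cycles(CYCLES)
    no_landfall = select_cycles(CYCLES, drop_landfall=True)
    print("all tracks :", len(all_tracks), mean_abs_error(all_tracks))
    print("no landfall:", len(no_landfall), mean_abs_error(no_landfall))
```

Comparing the two selections side by side mirrors the before/after comparison in the text; each added filter also reduces the sample size, so the case counts should be reported with the errors.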

Fig. 9.

Comparison of the C218 intensity error reduction at 5-day lead time between all TC tracks and no-landfalling tracks (columns) when the C218 5-day track error filter is set to be less than 80 n mi for (a) NA, (b) EP, and (c) WP basins.

This important finding regarding the potential existence of a lower bound in the absolute TC intensity errors, even after conditioning on 5-day track forecast errors < 80 n mi, is worth emphasizing. On the one hand, it indicates that improving TC intensity forecasts requires improvement not only in representing model TC physics or initial conditions but also in the large-scale environment governing TC motion. On the other hand, this result suggests that there is a limit in intensity improvement related to TC intrinsic variability that one cannot overcome in regional models. Such an intrinsic variability always exists due to the stochastic nature of the atmosphere, which prevents one from obtaining a perfect initial condition; we turn to this issue in the next section.

d. Intrinsic intensity variability in the COAMPS-TC model

From a practical perspective, choosing a TC case with a small track error could not generally capture the variability of intensity intrinsic to TC dynamics, even for cases with zero track errors. This is because these perfect track cases, in reality, always have some initial condition errors as well as model errors that cannot be eliminated by simply selecting perfect track forecasts. In addition, the best track dataset contains observational errors that contribute further to uncertainties in TC intensity verification. Thus, the intensity error saturation shown in Figs. 5 or 8 after reducing either the GFS or the C218 track errors would not be able to represent the truly intrinsic intensity variability related to TC dynamics.

As a step to eliminate such external factors, Fig. 10 shows the time series of _V_MAX for a set of ensemble experiments under a perfect model scenario as described in section 3. One notices first in Fig. 10a that the CTL ensemble shows an expected vortex development with an initial period of moistening in the lower troposphere from 0 to 72 h, followed by a period of rapid intensification from 72 to 90 h, and finally a steady stage that is well maintained around 50 m s−1 during the rest of the simulation. Such a model vortex development is typical in all idealized simulations of TC development, confirming the tendency of the TC idealized vortex toward a stable potential intensity equilibrium as found in previous studies (e.g., Rotunno and Emanuel 1987; Hakim 2013; Kieu 2015).

Fig. 10.

Time series of the maximum 10-m wind during the course of idealized simulations under the COAMPS-TC perfect model scenario that are perturbed by temperature and relative humidity random perturbations, which are obtained from (a) the CTL experiments, (b) WN2 experiments, (c) HRH experiments, and (d) 5-km resolution. The black lines are the ensemble spread standard deviations of the surface maximum wind speed.

Of specific relevance to this study is, however, the behavior of the ensemble spread of TC intensity at the maximum intensity4 stage. We note that the intensity fluctuation at the mature stage is of more interest here, because it represents the intrinsic variability of TC dynamics that we wish to quantify. The existence of such a quasi-stationary mature stage is essential for any dynamical system that possesses chaotic dynamics. While rapid intensification is certainly an important feature of TC development that is critical in operational forecast, the evolving background dynamics during TC intensification (the so-called noncentral orbit in Lorenz 1963) does not generally possess a well-defined error growth saturation and related statistics needed to examine TC predictability. Thus, we hereinafter focus only on the intensity variation at the mature stage, which is assumed to roughly represent the intrinsic intensity errors at 4–5-day lead times in real-time forecasts. The validity of this assumption is certainly an open issue, but it is the closest one can have to compare idealized _V_MAX variability with the 4–5-day intensity errors from real TC forecasts.

As seen in Fig. 10a, the CTL ensemble spread shows a quick growth with time during the early development, which displays a transient property before approaching the PI equilibrium as pointed out in K18. The ensemble spread reaches its peak near the end of the TC rapid intensification period and eventually settles down to a stationary value at the PI state after 154 h into integration. For the CTL ensemble, this stationary ensemble spread is ~2 m s−1, regardless of whether the CTL ensemble is perturbed by moisture or temperature noise. Under the perfect model scenario, such variability of TC intensity at the PI equilibrium is inherent to the COAMPS-TC model, and thus represents the internal variability due to random fluctuations of the atmosphere that can never be eliminated. Indeed, an observational uncertainty of ±0.1% for relative humidity or ±0.1 K for temperature is virtually impossible to avoid in any practical model initialization. Yet, this small uncertainty is capable of introducing a variability of TC intensity of ~2 m s−1 as shown in Fig. 10. While this uncertainty may appear to be specific to the COAMPS-TC model, we note that a similar study by K18 using the Hurricane Weather Research and Forecasting (HWRF) Model captured the same magnitude of intrinsic intensity variability in their idealized experiments. As such, the intensity variability of ~2 m s−1 at the PI equilibrium may reflect the chaotic dynamics of TCs as recently proposed in K18, instead of model specifics.

For the WN2 and HRH sensitivity ensemble experiments, one notices several changes in the characteristics of TC intensity. First, the ensemble mean of PI in the WN2 ensemble (~57 m s−1, Fig. 10b) is significantly higher than that in the CTL ensemble (~50 m s−1, Fig. 10a), while the PI limit in the HRH experiment is almost the same as the CTL (Fig. 10c). Such a higher PI for a less stable troposphere as seen in the WN2 ensemble is consistent with recent studies by Kieu and Wang (2017) and Kieu and Zhang (2018), in which they showed that the tropospheric static stability can affect the TC maximum intensity beyond Emanuel’s classical PI theory. Specifically, the strict assumption of the slantwise moist neutrality in Emanuel’s PI theory appears to be too strong for real TC development (e.g., Ooyama 1969; Peng et al. 2018). Relaxing this assumption leads to an explicit dependence of PI on the static stability with a larger PI for a less stable troposphere, even when SST and outflow temperatures are fixed. On the other hand, increasing the initial moisture profile in the HRH ensemble does not affect the PI limit, although the onset of rapid intensification occurs much earlier in the HRH ensemble, similar to what was obtained in previous modeling studies (e.g., Kieu et al. 2014; Kieu and Zhang 2018). This early intensification onset in the moist ensemble is expected, because an initially moister troposphere could allow the storm central region to be saturated earlier and develop deep convection more efficiently.

The second noteworthy aspect that can be drawn from the sensitivity ensembles is the characteristic of TC intensity ensemble spread at the PI equilibrium. Despite the change in the PI value, it is apparent that there is insignificant change in the fluctuation of _V_MAX around the PI equilibrium across all sensitivity experiments. For example, the averaged ensemble spread decreases slightly from ~2.0 m s−1 in the CTL to ~1.8 m s−1 in the HRH ensemble during the last 36 h of model simulation, but stays about the same in the WN2 ensemble (Figs. 10b,c). Likewise, the 5-km resolution ensemble exhibits almost the same spread of approximately 2 m s−1 at the mature stage. In this regard, changing the initial moisture content in the HRH ensemble or increasing the model resolution in the 5-km ensemble does not affect the fluctuation of TC intensity around the PI attractor, which reflects the intrinsic intensity variability as expected.
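For concreteness, the ensemble spread discussed above can be computed as the standard deviation of _V_MAX across members at each output time, and then averaged over the final 36 h. The sketch below is only an illustration: the function names, the 12-h output interval, the use of the sample (rather than population) standard deviation, and the toy member values are all assumptions, not details taken from the experiments.

```python
from statistics import mean, stdev

MS_PER_KT = 0.514  # conversion quoted in the paper: 1 kt = 0.514 m s-1

# Toy 3-member ensemble of VMAX (m/s) sampled every 12 h; invented for illustration.
MEMBERS = [
    [48.0, 50.0, 52.0, 49.0],
    [50.0, 52.0, 50.0, 51.0],
    [52.0, 48.0, 48.0, 53.0],
]


def ensemble_spread(vmax_by_member):
    """Sample standard deviation of VMAX across members at each output time.

    vmax_by_member: list of equal-length time series (one per member), in m/s.
    """
    return [stdev(values_at_t) for values_at_t in zip(*vmax_by_member)]


def mean_spread_last_hours(spread, dt_hours, window_hours=36):
    """Average the spread over the final `window_hours` of the simulation."""
    n = max(1, int(window_hours / dt_hours))
    return mean(spread[-n:])


if __name__ == "__main__":
    spread = ensemble_spread(MEMBERS)
    print("spread by time (m/s):", spread)
    print("mean over last 36 h (m/s):", mean_spread_last_hours(spread, dt_hours=12))
    # A 2 m/s spread expressed in knots, for comparison with errors quoted in kt.
    print("2 m/s in kt:", 2.0 / MS_PER_KT)
```

Using the conversion above, a spread of 2 m s−1 corresponds to roughly 3.9 kt, which gives a sense of scale when the idealized spread is set against intensity errors reported in knots.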

Although the _V_MAX ensemble spread during rapid intensification is not the focus of our idealized experiments in this study, it is of interest to note here some unique behaviors of the _V_MAX spread among different sensitivity experiments (Fig. 10). For example, the peak in the intensity ensemble spread in the WN2 and 5-km experiments is shifted much earlier with an amplitude that is roughly half of that in the CTL ensemble, whereas the peak in the intensity spread almost vanishes in the HRH ensemble. This substantial change in the intensity spread is likely rooted in the fact that the onset of rapid intensification in these ensembles takes place at a significantly earlier time (around 66 and 30 h into integration, respectively) as compared to 108 h in the CTL ensemble. As a result, the environmental conditions in these ensembles tend to be more homogeneous, thus accounting for the smaller ensemble spreads. In contrast, the long incubating time before intensification in the CTL ensemble leads to a significantly different ambient environment and locations where the model vortex reaches its maximum intensity (not shown).

The implication of the potential dependence of the intrinsic intensity variability on environmental conditions is nontrivial. Indeed, one direct consequence of this result is that the minimum absolute intensity error that the COAMPS-TC model can achieve is no longer a constant, but changes with the environment. For example, in the WP basin where the tropospheric static stability is less than that in the NA basin (e.g., Knutson and Manabe 1995; He et al. 2017), the smallest absolute intensity error that one can achieve with the COAMPS-TC model may differ from that in the NA basin, assuming a climatologically warmer SST in the WP basin. While the results presented in our idealized experiments herein could not explicitly capture such a change in the _V_MAX error saturation as a function of environment, this is central for studying TC intensity predictability and warrants further study.

5. Conclusions

In this study, the dependence of TC intensity errors on track errors for the COAMPS-TC model was examined, using the 2015–17 retrospective experiments and the 2018 real-time forecast database (C218). By conditioning the C218 intensity verification on global GFS 5-day track errors, it was found that the global GFS forecasts have a significant impact on the accuracy of the COAMPS-TC intensity forecast. Specifically, C218 intensity errors at the 4–5-day lead times can be reduced by ~20% for the cycles in which the GFS 5-day track errors are reduced from 500 to 80 n mi (corresponding to about 65%–70% track error reduction, depending on the basin). This impact of the GFS control on the COAMPS-TC model is consistent in all three major ocean basins including the NA, EP, and WP basins, thus highlighting the notable impact of global models on the quality of intensity forecast by regional models. While it is generally difficult to separate the effects of GFS initial and boundary conditions in an operational model, the finding that GFS 5-day track errors have the largest impact on C218 intensity errors at 4–5-day lead times indicates that GFS lateral boundary conditions can account for a significant amount of C218 intensity errors in real-time forecasts.

It is of interest to note, however, that choosing cycles with GFS 5-day track errors less than 80 n mi does not seem to show much further improvement in C218 intensity forecasts, except for the EP basin where improving the global GFS track forecast could reduce the C218 intensity errors persistently at all lead times. This result is significant, because it indicates a limit to how much TC tracks can control the intensity errors in the COAMPS-TC model. Although the 80 n mi threshold for the GFS 5-day track errors is not sufficient to eliminate the contribution of TC tracks to intensity errors, the consistency of the intensity error reduction rate across the basins as found herein strongly supports the existence of internal intensity variation beyond the impacts of TC tracks.

Because regional models tend to develop their own dynamics that may not inherit all benefits from better global track forecasts, additional conditional intensity verification was conducted for the C218 data in which its own track errors were used. By applying the same 5-day track error thresholds, it was found that the dependence of intensity errors on track errors is similar to that obtained using the GFS tracks. That is, reducing overall C218 track errors by 60%–70% can improve TC intensity errors by roughly 20% at 3–5-day lead times in all three basins, while no significant improvement in the TC intensity errors was obtained at shorter lead times. Moreover, reducing the C218 5-day track errors below 80 n mi does not help improve C218 intensity errors further. Such consistent behaviors of C218 intensity errors between the two intensity verifications conditioned on two different track datasets indicate the limit that the TC motion can impose on TC intensity.

The finding from the above analyses that reducing track errors by 65%–70% could help improve C218 intensity errors by only 18%–25% has noteworthy implications. On the one hand, this result provides an estimate of the portion of TC intensity errors in the C218 experiments that is caused by inaccurate TC tracks that drive TCs into different environments. On the other hand, this same result also reveals the fraction of C218 intensity errors that is inherent to the COAMPS-TC model (model errors) and TC dynamics (intrinsic variability related to model initial conditions and stochastic fluctuations). In fact, a saturated intensity error in the range of 8–12 kt, even after removing the track influence, is quite persistent in all basins (within a given uncertainty of the sample size). This internal variability of TC intensity is critical for future operational model development, as one needs to know how much of the intensity error is related to the intrinsic TC dynamics, which can never be improved, versus the COAMPS-TC model error and/or representation that can be improved in the future.

To isolate the intrinsic intensity variability in the COAMPS-TC model, without uncertainties of TC intensity observation or model errors as in real TC cases, a range of idealized experiments under a perfect model scenario with different environmental settings and resolution were presented. By perturbing each environment with small-amplitude temperature and relative humidity noise, we showed that the COAMPS-TC model possesses an inherent intensity variability of ~2 m s−1 in the absence of all intensity observational errors and other external factors. In addition, this intrinsic intensity variability changes when the environmental conditions are varied, thus supporting the conclusion that the limit to which one can reduce TC absolute intensity errors in the COAMPS-TC model is not universal, but varies from basin to basin due to the difference in large-scale environments. Practically, this inherent intensity variability, along with the observational intensity errors, therefore puts a limit of around 9–12 kt on the minimum absolute intensity errors that the COAMPS-TC model can achieve, which one has to take into account in future model development.

Acknowledgments

This research was supported by the ONR Young Investigator Award (N000141812588), and the ONR/TCRI Award (N000142012411). The first author also wishes to thank the NCAR/ASP visiting program for their summer support and hospitality during the preparation of this work. The NRL coauthors were supported by the Chief of Naval Research through the NRL Base Program (PE 61153N) and the Office of Naval Research TC Rapid Intensification DRI (PE 0601153N). We also wish to thank three anonymous reviewers for their constructive comments and suggestions, which helped improve this work substantially.

Data availability statement

All real-time and retrospective datasets produced by the COAMPS-TC model and used in this study are available for public release upon request. The real-time hurricane forecasts from other operational models are provided directly by the National Hurricane Center data portal, which is also freely accessible.

REFERENCES