In our analysis, we compare several methods for estimating the unrealized U.S.-Cuban trade potential in the context of the gravity trade model. We are the first to focus on the Hausman-Taylor method for out-of-sample trade projections, and we find this seldom-used method to be the superior choice. The Hausman-Taylor method eliminates the heterogeneity bias that plagues ordinary least squares (OLS) estimation and the correlation between included variables and the individual error term that introduces bias in random-effects estimation. Further, unlike fixed-effects estimation, the Hausman-Taylor method allows for the inclusion of time-invariant explanatory variables.
The U.S.-Cuban trade relation provides a unique opportunity to estimate trade potentials. The economic relationship between the United States and Cuba was very strong prior to the socialist period. Sixty-seven percent of Cuban exports and 70 percent of imports were with the United States in 1958.1 The U.S. was also the main source of both private and official capital for Cuba.2 Since the Cuban revolution and the subsequent U.S.-imposed economic sanctions, trade between the two countries has been effectively eliminated, at least until recently (in the case of agricultural exports to Cuba). In addition to analyzing competing estimators based on their econometric properties, the unrealized trade potential between the U.S. and Cuba allows for a more practical assessment. The final trade potential estimates should be comparable to those of similar countries in the region, as well as to the historical (pre-1959) U.S.-Cuban trading pattern.
The gravity trade model is the obvious choice for this analysis; since the early 1960s it has been utilized to estimate trade flows.3 The model is based on the assumption that trade can be explained by size (GDP or GDP per capita), distance (physical distance and/or various measures of economic distance), and other measures of preferences (common border, common language, etc.). In various forms, it has been applied in studies analyzing the border effect on trade,4 as well as estimating the impact of currency unions, preferential trading agreements, free trade agreements, and removing trade barriers.5
In predicting trade potential, the gravity model has been used in two different ways. The first strategy is based on in-sample predictions.6 In this method, the country pair(s) under examination is included in the sample. The residual is then interpreted as the difference between potential and actual bilateral trade relations. Recent research has been critical of this approach. In the context of trade potential between EU and former COMECON countries, Egger (2002) shows that large systematic differences between residuals among country groups are not found when the proper estimation technique (one with white-noise residuals) is used. Egger (2002) suggests “that any systematic difference between observed and in-sample predicted trade flows indicates misspecification of the econometric model instead of unused (or overused) trade potentials.”
The second strategy, and the one employed here, is the out-of-sample approach. The gravity trade model is estimated excluding the trade flows of interest. The model’s parameters are then used to project natural trade relations between countries outside the sample. The difference between the observed and the predicted trade flows can be interpreted as unrealized trade potential. This approach is similar to that used in Wang and Winters (1991), Hamilton and Winters (1992), and Brulhart and Kelley (1999).
As alluded to earlier, the choice of estimation technique is extremely important in correctly estimating trade potentials. The most common techniques used to estimate the gravity trade model have been questioned in recent literature. Among others, Cheng and Wall (2002) demonstrate that OLS estimation of the gravity model is susceptible to heterogeneity biases. That is, if trading partners are heterogeneous in ways not accounted for in the model, and if that heterogeneity is somehow related to the variables that are included in the regression, then the resulting estimates will be biased. They suggest the fixed-effects estimator based on a data panel, that is, cross-sectional observations on two or more years.
Fixed-effects estimation allows for individual effects by estimating a separate intercept for each country pair. However, this technique does not allow for the inclusion of time-invariant variables. Their effect on trade is captured by country-pair-specific constant terms. “This modeling assumes that there are fixed pair-specific factors that may be correlated with levels of (trade) and with the right-hand-side variables. It is in this sense fixed-effects modeling is a result of ignorance: we do not have a good idea which variables are responsible for the heterogeneity bias, so we simply allow each trading pair to have its own dummy variable.” 7 This estimation method has severe limitations when estimating potential trade flows using the out-of-sample technique. Much information needed for an accurate prediction of potential trade flows is contained in the country-specific constant terms. Estimation of a constant for out-of-sample countries is problematic, and at best ad hoc.
Another method that allows for the inclusion of individual effects is the random-effects estimator. Random-effects estimation has the added benefit of allowing the inclusion of time-invariant variables. This specification is based on the assumption that individual effects can be included as part of the error term; however, this method is susceptible to bias if there is correlation between these effects and the regressors. This is often the case empirically. Nonetheless, it has been used as an alternative to fixed-effects estimation when the effect of time-invariant explanatory variables is of importance or when no bias has been detected.8
Hausman and Taylor (1981) suggest an alternative that combines the beneficial aspects of both the random-effects and fixed-effects estimators. The major shortcoming of the random-effects model is the assumption that the included explanatory variables are uncorrelated with the error term. The Hausman-Taylor method is an instrumental-variable technique that uses only information already contained in the model to eliminate the correlation between the explanatory variables and the country-specific component of the error term. Unlike the fixed-effects estimator, this approach does not necessitate the elimination of time-invariant explanatory variables. Egger (2002) is the first to apply this approach to the gravity model in his critique of in-sample trade potential estimation.
In our analysis, we employ the out-of-sample approach to estimating the trade potential between the U.S. and Cuba. We compare the OLS, fixed-effects, random-effects, and Hausman-Taylor estimation of the gravity trade model and provide substantial evidence that the Hausman-Taylor estimator is the superior choice in this setting.
The remainder of the paper is organized as follows. The next section contains a detailed description of the methodology used and a description of our data set. In the following section, we summarize results. The last section concludes and offers ideas for future research.
DATA AND METHODOLOGY
We estimate the gravity model separately using four different techniques: OLS, fixed-effects (FE), random-effects (RE), and the Hausman-Taylor method (HTM). The OLS (equation 1), fixed-effects (equation 2), and random-effects (equation 3) estimators are straightforward and are as follows:
where α0 is an overall constant and εijt is a mean zero error term;
where αij is a specific country-pair effect between trading partners and captures the effect of all time invariant variables;
We define the dependent variable, Yijt, as imports of country i from country j in year t. The data set contains annual trade flows9 between 101 trading partners (see Appendix A) for the time period 1996 to 2000. Numerous individual trading pairs were eliminated due to missing data, and the final data set consists of 9,230 country pairs. This translates to 46,150 trade-flow observations over the five-year period.
The explanatory variables are divided into two groups, those that change through time and those that are constant.
Xijt = [ xit xjt …] is a 1 x 9 row vector of country-specific variables that change through time. These include the standard gravity-model variables: GDPs per capita and populations of both countries.10 We also include a measure of economic freedom for each country, the Heritage Foundation’s Index of Economic Freedom. In this index, a higher value indicates less economic freedom. High levels of economic freedom are associated with low levels of governmental, social, and/or political barriers to trade. Therefore, we expect negative coefficients for these variables. In addition, we include the absolute value of the difference of the two trading partners’ freedom index and trade freedom index scores.11 The coefficients of these difference variables are also expected to be negative; the closer two countries are in terms of their freedom level, the more likely they are to trade. Lastly, we include a variable to indicate both countries’ membership in a preferential trading agreement.12 Member countries enjoy the benefits of reduced transaction costs (such as tariffs), which would presumably lead to higher levels of trade.
Zij is a 1 x 4 row vector of time-invariant country-pair-specific variables. These include the direct-line distance between capitals and common border.13 We also include dummy variables for past or present communist affiliation. We include a variable that takes on the value of one if both of the trading parties have past communist affiliation and zero elsewhere. A different indicator variable takes on the value of one if both trading partners are not former communist countries.
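With Yijt, Xijt, and Zij defined, the three baseline specifications can be sketched in standard panel notation. This is a reconstruction from the definitions given in the text, not a quotation: the coefficient vectors β (9 x 1) and γ (4 x 1) are symbols assumed here, and μij denotes the random country-pair effect that enters the error term of the random-effects model.

```latex
\begin{align}
Y_{ijt} &= \alpha_0 + X_{ijt}\beta + Z_{ij}\gamma + \varepsilon_{ijt} \tag{1} \\
Y_{ijt} &= \alpha_{ij} + X_{ijt}\beta + \varepsilon_{ijt} \tag{2} \\
Y_{ijt} &= \alpha_0 + X_{ijt}\beta + Z_{ij}\gamma + \mu_{ij} + \varepsilon_{ijt} \tag{3}
\end{align}
```

In (2), Zij does not appear separately because its effect is absorbed by the pair-specific intercept αij.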
The Hausman-Taylor method is an extension of the random-effects estimator. The main assumption of the Hausman-Taylor method is that the explanatory variables that are correlated with μij can be identified. Equation (3) is augmented as follows:
where X1 are the variables that are time varying and uncorrelated with μij; X2 are time varying and correlated with μij; Z1 are time invariant and uncorrelated with μij; and Z2 are time invariant and correlated with μij.
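Under this partition, the augmented random-effects specification can be sketched as follows. The coefficient symbols β1, β2, γ1, and γ2 are assumed here for illustration; only X1, X2, Z1, Z2, α0, μij, and εijt come from the text.

```latex
\begin{equation}
Y_{ijt} = \alpha_0 + X_{1,ijt}\beta_1 + X_{2,ijt}\beta_2
        + Z_{1,ij}\gamma_1 + Z_{2,ij}\gamma_2 + \mu_{ij} + \varepsilon_{ijt} \tag{4}
\end{equation}
```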
The presence of X2 and Z2 is the cause of bias in the random-effects estimator. The strategy proposed by Hausman and Taylor (1981)14 is to use information already contained in the model to instrument for the problematic variables, X2 and Z2. Hausman and Taylor show that the needed set of instrumental variables can be constructed as follows:
The group mean deviations of X1 and X2 can be used as instrumental variables. This is based on the same logic as the fixed-effects estimator. The transformation to deviations from the group means removes the part of the disturbance term that is correlated with X2. By definition, Z1 is uncorrelated with the error term and can therefore be included in the set of instrumental variables. The final set of instrumental variables is the group means of X1. The availability of these variables as instruments is not intuitive, but an econometric explanation is provided by Hausman and Taylor. The model is identified as long as the number of variables in X1 is at least as large as the number of variables in Z2.
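The instrument set described in this paragraph can be illustrated with a short NumPy sketch. It assumes a balanced panel stored as arrays indexed by country pair and year; the function name and array layout are hypothetical, and the sketch only assembles the instruments, it does not perform the full Hausman-Taylor estimation.

```python
import numpy as np

def ht_instruments(X1, X2, Z1):
    """Assemble Hausman-Taylor instruments for a balanced panel (illustrative).

    X1, X2 : arrays of shape (n_pairs, n_years, k1) and (n_pairs, n_years, k2)
    Z1     : array of shape (n_pairs, g1), time-invariant and exogenous
    Returns an (n_pairs * n_years, k1 + k2 + g1 + k1) matrix whose columns are
    [deviations of X1, deviations of X2, Z1, group means of X1].
    """
    n_pairs, n_years, _ = X1.shape
    X1_mean = X1.mean(axis=1, keepdims=True)      # group means over years
    X2_mean = X2.mean(axis=1, keepdims=True)
    X1_dev = X1 - X1_mean                         # within (group-mean) deviations
    X2_dev = X2 - X2_mean
    X1_mean_rep = np.broadcast_to(X1_mean, X1.shape)     # repeat means per year
    Z1_rep = np.repeat(Z1[:, None, :], n_years, axis=1)  # repeat Z1 per year
    blocks = [X1_dev, X2_dev, Z1_rep, X1_mean_rep]
    # Stack pair-major: the first n_years rows belong to the first pair.
    return np.concatenate(
        [b.reshape(n_pairs * n_years, -1) for b in blocks], axis=1
    )
```

The group means of X1 supply the instruments for Z2, which is why the number of columns in X1 must be at least the number of columns in Z2.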
The selection of the variables that should be included in X2 and Z2 is not obvious. Hausman and Taylor (1981) base their selection on economic intuition. In our model, intuition alone does not point to a set of variables. We propose a process to select the set of variables to instrument that goes beyond economic intuition. The goal is to identify the variables that are correlated with the individual effects. If the Hausman-Taylor method is being considered, then the random-effects estimator has been shown to contain bias. The fixed-effects model, however, includes the proper modeling of the individual effects. Therefore, we estimate the fixed-effects model; this gives us an individual-specific constant term for each country pair in the sample. We then test for correlation between this term and the explanatory variables. Table 1 shows the correlations for each explanatory variable and αij. The variables separate into three groups: variables with high, medium, and low correlation. The relatively high-correlation group (over 0.1 in absolute value) includes border, distance, population of country i, population of country j, membership in a preferential trading agreement, per capita GDP of country j, and the freedom index score of country j. The medium-correlation group (0.05 to 0.1 in absolute value) is much smaller, and includes the absolute value of the difference-in-trade-freedom score and both countries having a communist history. The low-correlation group contains variables with correlations of less than 0.05 in absolute value. This group includes the absolute value of the difference-in-freedom score, per capita GDP of country i, the freedom-index score of country i, and both countries having a non-communist history.
Given the identification restriction on the number of variables in X1 relative to Z2, the selection of variables to instrument for is not difficult. We select the variables from the high-correlation group as follows: Z2 (border and distance) and X2 (population of country i, population of country j, membership in a preferential trading agreement, per capita GDP of country j, and the freedom-index score of country j). In this way, we have been able to identify the variables that are correlated with the individual effects in the data.
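The screening step can be sketched as a small routine that correlates each explanatory variable with the estimated fixed effects and sorts the variables by the thresholds quoted in the text (0.1 and 0.05 in absolute value). The function and variable names are hypothetical, and the inputs are assumed to be aligned one-dimensional arrays.

```python
import numpy as np

def screen_by_correlation(alpha, regressors, high=0.1, medium=0.05):
    """Group regressors by |corr| with the estimated fixed effects (illustrative).

    alpha      : 1-D array of estimated pair-specific intercepts, aligned
                 row by row with every regressor array
    regressors : dict mapping a variable name to a 1-D array
    """
    groups = {"high": [], "medium": [], "low": []}
    for name, values in regressors.items():
        r = abs(np.corrcoef(alpha, values)[0, 1])
        if r > high:
            groups["high"].append(name)      # candidates for X2 or Z2
        elif r >= medium:
            groups["medium"].append(name)
        else:
            groups["low"].append(name)       # treated as exogenous (X1 or Z1)
    return groups
```

Variables landing in the high group would be the ones instrumented, subject to the identification condition on X1 and Z2.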
The results will be discussed in three sections. First, we will summarize the various econometric tests to determine the appropriateness of each of the estimators. Next, the parameter estimates will be discussed, and finally, estimates of U.S.-Cuban trade flows will be given based on each estimator.
Comparison of the Estimators’ Econometric Properties
Past research has shown that OLS is susceptible to heterogeneity bias. An examination of our residuals confirms the presence of heterogeneity bias in our data, as well. Figure 1 contains the residuals from OLS estimation. When graphed against imports, the residuals form a clear pattern. As the magnitude of the trade flow increases, the errors are positive and increasing. At low levels of trade, the residuals are consistently negative. In contrast, Figures 2-4 contain the residuals from the fixed-effect, random-effect and Hausman-Taylor methods, respectively. It is clear that for each of these estimation techniques, the heterogeneity bias is eliminated.
The next step in selecting the appropriate estimator is to use an F-statistic to test for individual and time effects. If individual effects are present, then OLS is not appropriate and another method that allows for individual effects (the fixed-effects, random-effects, or Hausman-Taylor method) should be selected. We find strong evidence indicating the presence of individual effects15 and evidence against time (or period) effects.16 The results of the F-tests and the presence of heterogeneity bias are clear evidence against the use of OLS, suggesting that a more appropriate estimator should allow for individual effects.
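As an illustration, the F-test for individual effects can be computed from the residual sums of squares of the pooled and fixed-effects regressions. The sketch below uses the textbook degrees-of-freedom convention, which may differ from the convention behind the statistics reported in the footnotes; the function name and arguments are hypothetical.

```python
def f_stat_individual_effects(rss_pooled, rss_fe, n_pairs, n_obs, n_regressors):
    """F-statistic for H0: all pair-specific intercepts are equal (illustrative).

    rss_pooled   : residual sum of squares from pooled OLS (restricted model)
    rss_fe       : residual sum of squares from fixed effects (unrestricted)
    n_regressors : number of slope coefficients in the fixed-effects model
    """
    df_num = n_pairs - 1                      # number of restrictions
    df_den = n_obs - n_pairs - n_regressors   # residual df of the FE model
    f_value = ((rss_pooled - rss_fe) / df_num) / (rss_fe / df_den)
    return f_value, df_num, df_den
```

A value far above the critical value of the corresponding F distribution is evidence against pooled OLS.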
Next, we test to determine if there is correlation between included variables in the model and the error terms. If correlation is detected, the random-effects estimator can be eliminated as a possible estimation technique. First we perform a Hausman (1978) test comparing the fixed and random-effects estimators.17 We conclude that there is correlation between the included variables and the error terms, and therefore fixed-effects is a better choice than random-effects.
An additional Hausman (1978) test is conducted using the fixed-effects and the Hausman-Taylor methods to determine if the instrumental variable technique has eliminated the correlation that plagued the random-effects estimator.18 We find that the correlation has been removed, and conclude that, of the two alternatives considered here, the Hausman-Taylor estimator is the better choice. That is, the problematic correlation between variables included in the model (X2 and Z2) and the individual component of the error term that introduced bias into the random-effects estimator has been removed through the use of instrumental variables.
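Both comparisons rely on the same Hausman statistic, a quadratic form in the difference between two coefficient vectors. A minimal sketch follows, assuming the coefficient vectors and covariance matrices for the common (time-varying) regressors have already been estimated; the function name is hypothetical.

```python
import numpy as np

def hausman_statistic(b_consistent, b_efficient, v_consistent, v_efficient):
    """Hausman (1978) statistic for two estimators of the same coefficients.

    b_consistent : estimates that remain consistent under the alternative
                   (here, fixed effects)
    b_efficient  : estimates that are efficient under the null
                   (random effects or Hausman-Taylor)
    v_*          : the corresponding covariance matrices
    """
    diff = np.asarray(b_consistent, dtype=float) - np.asarray(b_efficient, dtype=float)
    v_diff = np.asarray(v_consistent, dtype=float) - np.asarray(v_efficient, dtype=float)
    # Quadratic form diff' (V_c - V_e)^{-1} diff, computed without an explicit inverse.
    return float(diff @ np.linalg.solve(v_diff, diff))
```

The result is compared with a chi-squared critical value; with 8 degrees of freedom, the 5 percent critical value cited in the footnotes is 15.51.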
Comparison of Parameter Estimates
Table 2 contains the parameter estimates of the gravity model using the four different estimation techniques (equations 1-4). As expected, the parameter estimates for the fixed-effects and Hausman-Taylor methods are very similar. This confirms that we are able to separate the effects of time-invariant variables using the Hausman-Taylor estimator without compromising the parameter estimates of the time-varying variables. Comparing the parameter estimates for the fixed and random-effects estimators shows that if the random-effects estimator were chosen under the premise that the inclusion of time-invariant variables is crucial to the analysis, the time-varying parameter estimates would be compromised. Specifically, the parameter estimates for population of country i, per capita GDP of country j, population of country j, the freedom-index score of country j, and membership in a preferential trading agreement are quite different for the fixed and random-effects estimators.
Further, the Hausman-Taylor method is able to provide statistically significant parameter estimates for 2 out of the 4 time-invariant variables. Therefore, we are able to successfully estimate the effect of time-invariant explanatory variables that, under fixed-effects estimation, would be consolidated in the country-specific constant term. In addition, it is of particular interest to note that the distance and common border variables are not statistically significant. This is consistent with the results of Egger (2002), who also finds the effect of distance (as measured by physical distance between capitals and by the border variable) to be insignificant. These results clearly call into question the use of this type of measure in gravity-model estimation.19
It is also interesting to note that membership in a preferential trading agreement has a statistically significant effect for the OLS and random-effects methods, but not for the Hausman-Taylor or fixed-effects methods. In many cases, countries that enter a preferential trading agreement have similar characteristics. In the random-effects and OLS specifications, this variable may be capturing effects that are included in the individual effects in the properly specified models (fixed-effects and Hausman-Taylor methods).
Trade Flow Estimates
Table 3 contains trade-flow estimates. We apply the out-of-sample technique to calculate these estimates. The out-of-sample approach to estimating trade potential between the U.S. and Cuba is straightforward for the OLS, random-effects, and Hausman-Taylor estimators and is calculated as follows:
The parameter estimates for OLS and random-effects have been shown to be biased; however, the Hausman-Taylor parameter estimates are not, and we are therefore able to use the out-of-sample method of trade projections and include time-invariant variables.
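The mechanics of an out-of-sample projection can be sketched for the simplest case, least squares. The model is fitted on country pairs that exclude the pair of interest, and the fitted coefficients are then applied to that pair's explanatory variables. For the random-effects or Hausman-Taylor case, the coefficients from that estimator would be plugged in instead; the function name and array layout below are hypothetical.

```python
import numpy as np

def out_of_sample_projection(X, Z, y, x_new, z_new):
    """Fit y = a0 + X b + Z g on the in-sample pairs, then project for an
    excluded pair with characteristics (x_new, z_new). Illustrative only.

    X : (n_obs, k) time-varying regressors
    Z : (n_obs, g) time-invariant regressors
    y : (n_obs,)   observed trade flows
    """
    design = np.column_stack([np.ones(len(y)), X, Z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)   # [a0, b, g]
    new_row = np.concatenate([[1.0], np.atleast_1d(x_new), np.atleast_1d(z_new)])
    return float(new_row @ coef)
```

The gap between this projection and the observed flow is what the out-of-sample approach interprets as unrealized trade potential.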
In the case of the fixed-effects estimator, the approach is much more complex, and relies on the ad hoc assignment of an individual dummy variable for the U.S.–Cuba trading pairs. The trade flow estimate is achieved as follows:
The constants from equation (6) can be recovered using the OLS normal equations as follows:
The same notation is followed for X. All of the needed information is present in the data set except for Yij for Cuba. As a proxy, we substitute the individual-specific mean for the Dominican Republic, the country that arguably most closely matches Cuba. This underscores the ad hoc nature of using the out-of-sample method with the fixed-effects estimator.
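The recovery of the pair-specific constants can be illustrated with a short sketch, assuming the usual within-estimator relationship in which each intercept equals the pair's mean trade flow minus the pair's mean regressors times the slope vector. The function names and the balanced-panel array layout are hypothetical.

```python
import numpy as np

def within_beta(y, X):
    """Fixed-effects (within) slope estimates for a balanced panel.

    y : (n_pairs, n_years) trade flows
    X : (n_pairs, n_years, k) time-varying regressors
    """
    y_dev = (y - y.mean(axis=1, keepdims=True)).ravel()
    X_dev = (X - X.mean(axis=1, keepdims=True)).reshape(-1, X.shape[2])
    beta, *_ = np.linalg.lstsq(X_dev, y_dev, rcond=None)
    return beta

def fe_intercepts(y, X, beta):
    """Recover each pair's intercept as mean(y) - mean(X) @ beta."""
    return y.mean(axis=1) - X.mean(axis=1) @ beta

def fe_projection(alpha_proxy, x_new, beta):
    """Project a flow for an out-of-sample pair using a borrowed intercept."""
    return float(alpha_proxy + np.asarray(x_new) @ beta)
```

For a pair with no observed trade, the mean of Y is unavailable, so `alpha_proxy` has to be borrowed from another pair, which is the ad hoc step the text describes.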
Table 3 contains the trade flow estimates for each technique, along with the 95 percent confidence interval for the estimation.20 The OLS and random-effects estimates are very similar, and tend to be less than those of fixed-effects and Hausman-Taylor methods. Although the parameter estimates are very similar for the fixed-effects and Hausman-Taylor estimators, the trade projections are quite different. This highlights the benefits of using the Hausman-Taylor method, which yields a more precise estimate than fixed-effects due to the inclusion of more explanatory variables. In addition, the Hausman-Taylor method does not require an ad hoc specification of the individual specific constant term for Cuba.
Table 4 places the trade-potential projections in both historical and regional perspective. The trade flow percentages included in this table are based on the assumption that 50 percent of the trade projected between the U.S. and Cuba would displace existing Cuban trade.21 In the case of imports, the OLS and random-effects estimators consistently underestimate the U.S.-Cuban trade flow (52 percent) as compared to the historical U.S.-Cuban trading pattern (70 percent) and that of the Dominican Republic (62 percent), the country most like Cuba in the region. The projections based on fixed-effects estimation (74 percent) seem to be more reasonable, but overestimate the level of imports. On the other hand, the Hausman-Taylor method produces estimates that are nearly identical to regional trading patterns (61 percent compared to 62 percent for the Dominican Republic) and very similar to the historical U.S.-Cuban relationship.
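One way to read the displacement assumption is as a simple accounting identity. In the sketch below, a fraction of the projected U.S. trade replaces existing Cuban trade with other partners and the remainder is new trade. This accounting is an assumption made here for illustration and is not necessarily the exact calculation behind Table 4; the function name and the figures in the usage note are hypothetical.

```python
def us_share_after_displacement(projected_us_trade, existing_trade, displacement=0.5):
    """Share of total Cuban trade conducted with the U.S. (assumed accounting).

    projected_us_trade : projected U.S.-Cuban flow from the gravity model
    existing_trade     : current Cuban trade with all other partners
    displacement       : fraction of the projected flow that replaces
                         existing trade rather than adding to it
    """
    displaced = displacement * projected_us_trade
    total_trade = existing_trade - displaced + projected_us_trade
    return projected_us_trade / total_trade
```

With purely hypothetical figures of 50 for the projected flow and 100 for existing trade, a 50 percent displacement rate gives a U.S. share of 50 / 125, or 40 percent.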
In terms of Cuban export projections, each of the estimation techniques produces estimates that are reasonably close to those of the regional trading patterns and the historical relation between the U.S. and Cuba. However, the fixed-effects and Hausman-Taylor estimators produce projections considerably higher than those of the OLS and the random-effects estimators.
It is important to keep in mind that the Hausman-Taylor and the fixed-effects methods are the only estimators that properly model the individual effects in the data. The consistently lower projections of the OLS and random-effects estimators may be due to the various forms of bias introduced with these methods. Based on the historical Cuban data and trading patterns of the region (especially the Dominican Republic), we conclude that the Hausman-Taylor estimator produces the most plausible trade potential predictions. In addition, the Hausman-Taylor method is the only estimator with projections that are reasonable for both imports and exports.
In our analysis, we compare several methods for estimating the unrealized U.S.-Cuban trade potential in the context of the gravity-trade model. We find the seldom-used Hausman-Taylor method to be the superior choice for estimating trade flows using the out-of-sample approach. The Hausman-Taylor method is ideal because it allows for the inclusion of time-invariant variables in trade projections and circumvents the problem of an ad hoc estimation of the country-specific dummy variable needed for a projection based on the fixed-effects estimator. In addition, based on a Hausman (1978) specification test comparing the Hausman-Taylor method and the fixed-effects estimator, the Hausman-Taylor method proved to be a superior specification given our data. Examining the trade potential projections of the various estimators in both historical and regional contexts, it is clear that the Hausman-Taylor estimator produces more plausible projections than the OLS, random-effects, and fixed-effects estimators. This result holds for both Cuban imports and exports.
This research could be extended in a number of ways. First, our results, combined with those of Egger (2002), call into question the use of physical distance and border in the gravity model, at least in their current forms. The use of the distance between the capitals or economic centers of two countries does not seem to reflect important issues involved in the likelihood of trade, such as transportation costs and political environment. Variables that better capture economic distance or actual transportation costs seem to be better suited to measure the distance between potential trading partners. Therefore, there is room for improvement in this area. In addition, the border variable could be improved. For example, the addition of length of border may prove informative.
Further, an interesting topic for future research is the amount of trade displacement that would occur if the U.S.-Cuban trading relationship were based on economic fundamentals and not political policy. That is, to what extent would free trade between the U.S. and Cuba merely substitute for trade already occurring with Europe? We leave this topic for future research.
1. United Nations Economic Commission for Latin America (ECLAC), Economic Survey of Latin America, 1963 (New York: United Nations, 1965), p.273.
2. Economic Impact of U.S. Sanctions with Respect to Cuba: Chapter 3: “Overview of the Cuban Economy and the Impact of U.S. Sanctions,” U.S. International Trade Commission, February 2001.
3. See Tinbergen (1962) and Poyhonen (1963).
4. See, among others, Helliwell (1998); Helliwell and Verdier (2001); Wolf (2000); and Anderson and Wincoop (2003).
5. See Pakko and Wall (2001).
6. See Baldwin (1994) and Nilsson (2000).
7. Wall (2000).
8. See Baldwin (1994), Gros and Gonciarz (1996), Matyas (1997), and Egger (2000).
9. Trade statistics were obtained from Statistics Canada’s World Trade Analyzer dataset.
10. These data were obtained from the World Bank’s Development Indicators Database.
11. These data were obtained from the Heritage Foundation / Wall Street Journal Index of Economic Freedom. http://www.heritage.org.
12. This variable is based on World Trade Organization records. It includes properly notified and recognized customs unions, free trade agreements, and service agreements. The included agreements are EC, BANG, ASEAN, ECO, GCC, LAIA, SPARTEC, MERCOSU, CEFTA, EFTA, CARICOM, CACM, CIS, BAFTA, NAFTA, PATCRA, CER, EAC, CEMAC, WAEMU, MSG, COMESA, SAPTA, and AFTA.
13. These data were obtained from Direct-Line Distances International Edition.
14. This strategy is explained in detail in Greene (2002, pp. 303 to 309).
15. We use an F[9228,36903] statistic to test if all of the individual effects are equal across groups. The test statistic of 212.91 is far larger than the critical value, and we can conclude that there are indeed individual effects in the data and OLS estimation is not appropriate.
16. The F[4,46126] statistic value of 0.74 is far less than the critical value of 2.37, indicating that there are no significant trade flow differences across periods that are not accounted for by our explanatory variables.
17. A test statistic of 34.45 is far larger than the critical value of a chi-squared with 8 degrees of freedom.
18. A test statistic of 12.63 (less than the critical value of 15.51) indicates the hypothesis that the individual effects are uncorrelated with the other regressors in the model cannot be rejected.
19. Trumbull (2001) summarizes a number of issues related to the use of this measure of distance and border.
20. A confidence interval is not included for the fixed-effects estimator projection due to the ad hoc estimation procedure.
21. For reference, Appendix B contains the percentages of trade that would be with the U.S. assuming various levels of displacement. Determining the amount of trade that would be displaced by U.S. trade is a complicated issue, and beyond the scope of this paper. The USITC, Economic Impact of U.S. Sanctions with Respect to Cuba, circumvented this issue with the ad hoc assumption that US-Cuban trade should be restricted to a percentage of current Cuban trade levels. We feel this specification is overly simplistic and the assumption naive.