
Modelling spatially dependent data

Ord (1975) gives the Maximum Likelihood methods for estimating the spatial lag and spatial error SAR models; no satisfactory alternatives have been found subsequently, chiefly because of the important role of the Jacobian expressing the spatial transformation of either the dependent variable in the spatial lag model, or the disturbance in the spatial error model. Unlike the time series case, the logarithm of the determinant of the $N \times N$ asymmetric matrix $(I - \rho W)$ or $(I - \lambda W)$ does not tend to zero with increasing sample size; it constrains the parameter values to their feasible range between the inverses of the smallest and largest eigenvalues of $W$, since for positive autocorrelation, as $\rho \to 1$, $\ln |I - \rho W| \to -\infty$, and analogously for $\lambda$. The log-likelihood function for the spatial lag model is:

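\begin{eqnarray*}
\ell(\beta, \rho, \sigma^2) & = & -\frac{N}{2} \ln 2\pi - \frac{N}{2} \ln \sigma^2 + \ln |I - \rho W| \\
& & - \frac{1}{2\sigma^2} (y - \rho W y - X\beta)'(y - \rho W y - X\beta)
\end{eqnarray*}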

and $\hat{\beta} = (X'X)^{-1} X'(I - \hat{\rho} W) y$, where $\hat{\rho}$ is the ML estimate, and for the spatial error model:

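\begin{eqnarray*}
\ell(\beta, \lambda, \sigma^2) & = & -\frac{N}{2} \ln 2\pi - \frac{N}{2} \ln \sigma^2 + \ln |I - \lambda W| \\
& & - \frac{1}{2\sigma^2} (y - X\beta)'(I - \lambda W)'(I - \lambda W)(y - X\beta)
\end{eqnarray*}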
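As a rough illustration of how the Jacobian term enters estimation, the following Python sketch evaluates the concentrated log-likelihood of the spatial lag model over a grid of values of the autoregressive coefficient, using Ord's eigenvalue form of the log-determinant. Everything in it is an assumption made for illustration: the toy weights matrix (units on a line, row-standardised), the simulated data, the grid search, and the names `path_weights` and `lag_profile_loglik` are hypothetical and are not taken from Ord (1975) or from any package.

```python
import numpy as np

def path_weights(n):
    """Row-standardised weights for n units on a line (a toy, asymmetric W)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A / A.sum(axis=1, keepdims=True)

def lag_profile_loglik(rho, y, X, W, eigs):
    """Concentrated log-likelihood of the spatial lag model at a given rho."""
    n = y.shape[0]
    # Spatially filtered response (I - rho W) y
    z = y - rho * (W @ y)
    # OLS of the filtered response on X gives beta conditional on rho
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    sigma2 = resid @ resid / n
    # Log-determinant from the eigenvalues of W: sum of ln(1 - rho * omega_i)
    logdet = np.sum(np.log(1.0 - rho * eigs))
    return -0.5 * n * (np.log(2.0 * np.pi) + np.log(sigma2) + 1.0) + logdet

# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(0)
n = 60
W = path_weights(n)
# Eigenvalues are real here because this W is similar to a symmetric matrix
eigs = np.linalg.eigvals(W).real
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, rho_true = np.array([1.0, 2.0]), 0.6
y = np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true + rng.normal(size=n))

# Feasible range: between the inverses of the extreme eigenvalues of W
lower, upper = 1.0 / eigs.min(), 1.0 / eigs.max()

# Crude grid search inside the feasible range
rhos = np.linspace(-0.9, 0.9, 181)
ll = np.array([lag_profile_loglik(r, y, X, W, eigs) for r in rhos])
rho_hat = rhos[ll.argmax()]
print(f"feasible range ({lower:.3f}, {upper:.3f}), rho_hat = {rho_hat:.2f}")
```

The eigenvalues are computed once and reused for every trial value, which is the practical appeal of this form of the log-determinant; a one-dimensional optimiser would normally replace the grid.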

To complete the model, the variance-covariance matrix of the parameters needs to be estimated. In many cases it is approximated numerically following non-linear optimization of the likelihood function, but SpaceStat derives its estimates of the asymptotic standard errors analytically (Anselin, 1995b; Anselin and Hudak, 1992). For larger N, this can take considerable time, requiring the inversion of an $N \times N$ matrix.

As Pace and Barry (1997, 1997b, 1997c) have conclusively demonstrated, a feasible solution to modelling situations with large N is to exploit the sparse nature of the spatial weighting matrix, both saving memory and making computation practical in reasonable time without supercomputer resources. They were able to compute results for a model of the median price of dwellings over all the 20,640 block groups in California from census data, improving the fit of the model over OLS results, halving the median absolute residual, finding a highly significant spatial lag coefficient estimate, and recording several significant sign changes among the independent variables (1997b). They also provide a profile likelihood solution to the calculation of coefficient estimate standard errors, avoiding the computation of the information matrix.
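The gain from sparsity can be sketched by computing the log-determinant of the Jacobian from a sparse LU factorisation instead of from dense eigenvalues. This is only a minimal illustration of the general idea, not Pace and Barry's own code: the weights matrix is again a toy row-standardised line of units, the number of units is borrowed from the California example purely to set the scale, and `sparse_path_weights` and `sparse_logdet` are hypothetical names.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

def sparse_path_weights(n):
    """Sparse row-standardised weights for n units on a line (toy example)."""
    ones = np.ones(n - 1)
    A = sparse.diags([ones, ones], offsets=[-1, 1], format="csr")
    deg = np.asarray(A.sum(axis=1)).ravel()
    return (sparse.diags(1.0 / deg) @ A).tocsr()

def sparse_logdet(rho, W):
    """ln|I - rho W| from a sparse LU factorisation."""
    n = W.shape[0]
    A = (sparse.identity(n, format="csc") - rho * W).tocsc()
    lu = splu(A)
    # L has a unit diagonal, so |det A| is the product of |diag(U)|;
    # inside the feasible range the determinant is positive, so the
    # logarithm of the absolute value is the log-determinant itself.
    return float(np.sum(np.log(np.abs(lu.U.diagonal()))))

# Same order of magnitude as the California block group example
n_big = 20640
W_big = sparse_path_weights(n_big)
logdet_big = sparse_logdet(0.5, W_big)
print(f"stored non-zeros: {W_big.nnz} of {n_big * n_big} cells")
print(f"ln|I - 0.5 W| = {logdet_big:.4f}")
```

Only the non-zero weights are stored and factorised, so memory use grows with the number of neighbour links and not with the square of the number of observations.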

Hepple (1995, 1995b), LeSage and Pan (1995), and LeSage (1997) propose the widening of spatial econometrics to include Bayesian techniques, not least because of the information that this yields around the specific point estimates reached in standard modelling. Pinkse and Slade (1996) and Dubin (1997) have begun work on the application of spatial econometric techniques to discrete-choice models, noting that non-spherical disturbances are extremely difficult to handle in the limited dependent variable context. Pinkse and Slade are concerned to detect spatial clustering or dispersion in retail gasoline contract types across branded service stations in Vancouver, while Dubin models the behaviour of automobile dealers.

Simply in order to give a flavour of the kinds of issues involved, I will briefly run through one of the standard examples, first analysed in this context by Hepple (1976).

Table 4: Modelling used car prices in 1960, 49 U.S. states (t-values in parentheses).

Hanna (1966) proposed that the 1960 value of 1955-9 used cars would be higher in states that had higher sales taxes and/or higher transport charges added to the price of new vehicles, a hypothesis confirmed by his ordinary least squares results (Table 4). Hepple (1976) used Hanna's study to illustrate the effects of error dependence in regression modelling, and demonstrated that this finding was spurious. The price variable is significantly autocorrelated (the standard variate of Moran's I is 8.07 under randomisation, prob. < 0.001), as is the least squares error term (standard variate of Moran's I = 4.25, prob. < 0.001). Hepple drew the conclusion that the problem was in the error term, not least because at that time other tests were not available.

Testing the OLS model using the standard Lagrange multiplier tests gives highly significant results for both of the alternative specifications, but using the new LM tests accommodating the alternative non-zero nuisance parameters yields values of 8.42 for the test for an underlying spatial lag model ($\chi^2$ with 1 d.f., prob. = 0.004), and of 0.035 for an underlying spatial error model (prob. = 0.851). A likelihood ratio test between the estimated spatial lag and spatial error models just fails to find in favour of the spatial lag model (LR = 1.80, prob. = 0.121).
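The probability values attached to the two adjusted LM statistics can be checked by hand, since a chi-squared variate with 1 d.f. is the square of a standard normal variate. The short Python sketch below does this using only the standard library; the function name `chi2_1df_pvalue` is a hypothetical helper, and because the statistics are quoted to a few significant figures the results agree with the reported probabilities only up to rounding.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-squared variate with 1 d.f."""
    # If Z ~ N(0, 1) then Z^2 ~ chi-squared(1), so
    # P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2.0))

# Adjusted LM statistics quoted in the text
p_lag = chi2_1df_pvalue(8.42)    # test for an underlying spatial lag model
p_err = chi2_1df_pvalue(0.035)   # test for an underlying spatial error model
print(f"adjusted LM (lag): p = {p_lag:.4f}")
print(f"adjusted LM (error): p = {p_err:.4f}")
```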

The consequences of taking spatial dependence into account are quite clear. The error variance of the two spatial models is much smaller than that of the least squares regression estimates, and the proportion of the variance in used car prices explained has risen from a quarter to three quarters. The coefficient of the cost variable is no longer significant at the $\alpha = 0.05$ level. Perhaps unsurprisingly, the spatial lag $\rho$ and error $\lambda$ coefficient estimates are highly significant. Were we to prefer the spatial lag model, we could interpret $\rho$ as representing the influence of the average price in contiguous states, indicating that price setting involves the comparison of prices across state lines. From the final column in Table 4, we see that the autoregressive model, which drops the tax/charges variable altogether, has a residual variance very nearly as low as that of the spatial error model, and indeed the LR test to differentiate between the autoregression and the spatial lag model does not come down strongly for the latter (LR = 2.60, prob. = 0.067).



Roger Bivand
Fri Mar 5 08:30:34 CET 1999