Weighted Distributional Lp Estimates

TARSITANO, Agostino
2010-01-01

Abstract

The success of ordinary least squares rests on the myth that the disturbances are normally distributed. For example, when the shape of the distribution is unknown because too few measurements have been taken, there may be a temptation to assume that a normal distribution is applicable. A good reason for the prevalence of the normality assumption is that least squares is the best unbiased estimator under normality, whereas its relative efficiency deteriorates when the error distribution is asymmetric, has a finite lower and/or upper bound, or has heavier tails than the normal. Nevertheless, natural phenomena often depart from normality (outliers that appear on one side of the data are a clear indication that the errors are not normally distributed), and many recent findings suggest that the most commonly used estimation methods exhibit varying degrees of nonrobustness to certain violations of the normality assumption. One solution is to develop efficient estimators of the coefficients of a multiple linear regression when the underlying distribution is non-normal. Naturally, one would prefer a model of the error distribution that can assume a wide variety of curve shapes (the normal included) and that uses a single general formula over the entire range of the data. Following the line of thought indicated by Parzen (1979), we believe that such a result can be obtained more usefully in the quantile plane [p, Q(p)] than in the distribution plane [x, F(x)]. In particular, we intend to implement a least squares regression procedure based on a distributional approach in which the stochastic component is defined parametrically by a five-parameter version of the generalized lambda distribution (FPLD). The FPLD is of practical relevance in many fields and applications because it has been found to adapt to a wide variety of theoretical and empirical distributions.
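To make the idea of working in the quantile plane concrete, the sketch below evaluates an FPLD quantile function Q(p). The abstract does not give the parameterization, so the form used here is an assumption: a commonly used five-parameter lambda form with location λ1, scale λ2 > 0, a tail-weighting parameter -1 ≤ λ3 ≤ 1, and tail-shape parameters λ4 and λ5. The function name `fpld_quantile` is hypothetical.

```python
import numpy as np


def fpld_quantile(p, lam1, lam2, lam3, lam4, lam5):
    """Quantile function Q(p) of a five-parameter lambda distribution (FPLD).

    ASSUMED parameterization (not taken from the paper):
        Q(p) = lam1 + (lam2 / 2) * [ (1 - lam3) * (p**lam4 - 1) / lam4
                                     - (1 + lam3) * ((1 - p)**lam5 - 1) / lam5 ]
    with lam2 > 0 and -1 <= lam3 <= 1. A zero shape parameter is replaced by
    its limiting value, log(p) or log(1 - p).
    """
    p = np.asarray(p, dtype=float)

    def box_cox(u, lam):
        # (u**lam - 1) / lam, with the logarithmic limit when lam == 0
        return np.log(u) if lam == 0 else (u**lam - 1.0) / lam

    left = (1.0 - lam3) * box_cox(p, lam4)         # controls the left tail
    right = (1.0 + lam3) * box_cox(1.0 - p, lam5)  # controls the right tail
    return lam1 + 0.5 * lam2 * (left - right)
```

Because the model is specified directly through Q(p), error terms can be simulated by inverse transform sampling, for instance `fpld_quantile(np.random.default_rng(0).uniform(size=1000), 0.0, 1.0, 0.2, 0.1, 0.3)`. Setting λ3 = 0 and λ4 = λ5 gives a symmetric shape, and λ4 = λ5 ≈ 0.14 gives a shape close to the normal. This is how a single FPLD formula can cover many curve shapes.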
Because of this flexibility, the FPLD is useful for representing data when the underlying model is unknown: it avoids the need to make an a priori choice among the embedded cases. One unrealistic assumption underlying the standard use of regression is that every observation, on whichever side of the regression plane it lies, provides equally precise information about the deterministic part of the response variable. Elementary accounts of statistical methods commonly give little attention to the possibility that the experimental values analysed may not all be equally reliable. This is a simplification that we cannot afford in many practical applications. In situations where it is not reasonable to treat every observation equally, a weighting system can often be used to maximize the efficiency of parameter estimation. In this paper, a new approach is proposed that obtains, simultaneously, a weighting scheme for the observations in the data set and an Lp estimate of the regression parameters by the iteratively reweighted least squares method.
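The estimation step can be illustrated with the textbook iteratively reweighted least squares (IRLS) scheme for Lp regression, which minimizes Σ|y_i - x_i'β|^p by repeatedly solving a weighted least squares problem with weights w_i = |r_i|^(p-2). This is a generic sketch, not the paper's method: the weighting scheme proposed in the paper is tied to the FPLD error model and is not reproduced here. The function name `lp_regression_irls`, the `eps` floor on residuals, the range 1 ≤ p ≤ 2 and the stopping rule are all assumptions.

```python
import numpy as np


def lp_regression_irls(X, y, p=1.5, n_iter=200, tol=1e-10, eps=1e-6):
    """Approximately minimize sum_i |y_i - x_i' beta|**p by IRLS.

    Generic sketch (assumed 1 <= p <= 2), NOT the paper's weighting scheme.
    Returns the coefficient vector and the IRLS weights at the final estimate.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start from ordinary least squares, i.e. the p = 2 solution.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        # Residual-based weights |r_i|**(p - 2); eps keeps them finite as r_i -> 0.
        w = np.maximum(np.abs(y - X @ beta), eps) ** (p - 2.0)
        sw = np.sqrt(w)
        # Weighted least squares: scale each row of X and y by sqrt(w_i).
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) <= tol * (1.0 + np.max(np.abs(beta)))
        beta = beta_new
        if converged:
            break
    weights = np.maximum(np.abs(y - X @ beta), eps) ** (p - 2.0)
    return beta, weights
```

With p = 2 all weights equal 1 and the estimate reduces to ordinary least squares. As p moves toward 1, observations with large residuals receive smaller weights, which is one simple way to see how reweighting limits the influence of outliers lying on one side of the data.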
ISBN: 9781584887119
Keywords: quantile approach; Lp regression; non-Gaussian error distributions

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/151177