An Asymmetric Bimodal Distribution with Application to Quantile Regression / July 2019 / Diego I. Gallardo, Emilio Gómez-Déniz, Héctor W. Gómez, Osvaldo Venegas, Yolanda M. Gómez

Abstract:

Continuous data that are bimodal arise frequently in practice, and they cannot be modeled by the usual unimodal distributions. It is therefore of interest to investigate distributions with more flexible modality that will be useful to professionals working in different areas of knowledge.

In unimodal distributions, flexibility comes from the asymmetry and kurtosis of the data. In this context, Azzalini [1] introduced the skew-normal (SN) distribution, with asymmetry parameter λ. Its probability density function (pdf) is given by

$$f(y;\mu,\sigma,\lambda)=\frac{2}{\sigma}\,\phi\left(\frac{y-\mu}{\sigma}\right)\Phi\left(\frac{\lambda(y-\mu)}{\sigma}\right),\qquad y,\mu,\lambda\in\mathbb{R},\ \sigma>0,\tag{1}$$

where ϕ and Φ denote, respectively, the density and cumulative distribution functions of the N(0,1) distribution. This is denoted as Y ∼ SN(λ), and SN(0) is the standard normal distribution.

Bimodal distributions generated from skew distributions can be found in Ma and Genton [2], Kim [3], Lin et al. [4,5], Elal-Olivero et al. [6], Arnold et al. [7], Arnold et al. [8], and Venegas et al. [9], among others. These distributions are important because they do not have identifiability problems and can be used as alternative parametric models in place of mixtures of distributions, which present estimation problems from either the classical or the Bayesian point of view (see McLachlan and Peel [10]; Marin et al. [11]). One difficulty with these distributions is that, in general, their cumulative distribution function (cdf) has no closed-form expression. This makes it harder to generate data from them for simulation studies or to carry out quantile regression. Additionally, many such bimodal distributions have complicated expressions for a general quantile (say, the q-th).

A variety of bimodal data sets and appropriate models have been presented by many authors. For example, Cobb et al. [12] used the quartic exponential density presented by Fisher [13] to model crude birth rate data; Rao et al. [14] used a bimodal distribution to analyze fish length data; Famoye et al. [15] used the beta-normal distribution to analyze egg diameter data; Everitt and Hand [16] discussed some mixture distributions for modeling bimodal data; Chatterjee et al. [17] and Weisberg [18] presented two bimodal data sets on the eruption and interruption times of the Old Faithful geyser; Bansal et al. [19] discussed the bimodality of quantum dot size distributions; and Famoye et al. [15] cited a variety of bimodal distributions that arise in different areas of science.

On the other hand, the sinh Cauchy (SC) distribution has pdf

$$f(z;\Lambda)=\frac{\lambda\cosh(z)}{\sigma\pi\left(1+\{\lambda\sinh(z)\}^{2}\right)},$$

where Λ = (λ, μ, σ), z = (y − μ)/σ, z ∈ ℝ, μ ∈ ℝ is a location parameter, σ > 0 is a scale parameter, and λ > 0 is a symmetric parameter. The SC distribution produces unimodal and bimodal densities. Its disadvantage is that it is symmetric, which limits it to modeling only symmetric bimodal data. The main objective of this article is therefore to study a bimodal skew-symmetric model with a closed-form cdf, in order to apply it to quantile regression. To do this, we use an extension of the SC distribution that we call the gamma–sinh Cauchy (GSC) distribution, which has flexible modes as well as a closed-form cdf. The GSC distribution belongs to the gamma-G generator family introduced by Zografos and Balakrishnan [20]. For any baseline cdf G(y; Λ), y ∈ ℝ, they defined the gamma-G generator by the pdf and cdf

$$f(y;\phi,\Lambda)=\frac{g(y;\Lambda)}{\Gamma(\phi)}\left\{-\log\left[1-G(y;\Lambda)\right]\right\}^{\phi-1},\tag{2}$$

and

$$F(y;\phi,\Lambda)=\frac{\gamma\left(-\log\left[1-G(y;\Lambda)\right],\phi\right)}{\Gamma(\phi)}=\frac{1}{\Gamma(\phi)}\int_{0}^{-\log\left[1-G(y;\Lambda)\right]}u^{\phi-1}e^{-u}\,du,\tag{3}$$

respectively, where ϕ > 0 is a skewness parameter, Λ is a vector of parameters, g(y) = dG(y)/dy, $\gamma(y,a)=\int_{0}^{y}t^{a-1}e^{-t}\,dt$ is the (lower) incomplete gamma function, and Γ(a) = γ(+∞, a) is the usual gamma function. We remark that in the literature, there are many models that can accommodate bimodal distributions.
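Equations (2) and (3) become concrete once the SC baseline cdf G is written down. Integrating the SC pdf with the substitution u = λ sinh z gives G(y) = 1/2 + arctan(λ sinh z)/π with z = (y − μ)/σ; we assume this is the baseline used for the GSC, since the excerpt does not display it. The sketch below plugs that G into (2) and (3) using only the Python standard library. It is an illustration, not the authors' code: the function names (`reg_lower_gamma`, `sc_survival`, `gsc_pdf`, `gsc_cdf`) are hypothetical, and the ratio γ(x, ϕ)/Γ(ϕ) is evaluated with its textbook power series, where in practice a library routine such as `scipy.special.gammainc` would normally be used.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) = gamma(x, a) / Gamma(a), as in (3).

    Power series P(a, x) = x^a e^{-x} / Gamma(a + 1) * sum_{n>=0} x^n / ((a + 1)...(a + n)),
    whose terms are all positive; it converges for every x >= 0.
    """
    if x <= 0.0:
        return 0.0
    term, total, n = 1.0, 1.0, 0
    while term > 1e-16 * total and n < 10_000:
        n += 1
        term *= x / (a + n)
        total += term
    log_value = a * math.log(x) - x - math.lgamma(a + 1.0) + math.log(total)
    return min(1.0, math.exp(log_value))


def sc_survival(z: float, lam: float) -> float:
    """1 - G(z) for the standardized SC cdf G(z) = 1/2 + arctan(lam * sinh z) / pi (assumed)."""
    u = lam * math.sinh(z)
    if u > 0.0:
        # arctan(u) + arctan(1/u) = pi/2 for u > 0; avoids cancellation in the upper tail
        return math.atan(1.0 / u) / math.pi
    return 0.5 - math.atan(u) / math.pi


def gsc_pdf(y: float, phi: float, lam: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """GSC density via (2): g(y) * {-log[1 - G(y)]}^(phi - 1) / Gamma(phi)."""
    z = (y - mu) / sigma
    g = lam * math.cosh(z) / (sigma * math.pi * (1.0 + (lam * math.sinh(z)) ** 2))
    h = -math.log(sc_survival(z, lam))  # h = -log[1 - G(y)]
    return g * h ** (phi - 1.0) / math.gamma(phi)


def gsc_cdf(y: float, phi: float, lam: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """GSC cdf via (3): gamma(-log[1 - G(y)], phi) / Gamma(phi)."""
    z = (y - mu) / sigma
    return reg_lower_gamma(phi, -math.log(sc_survival(z, lam)))


# phi = 1 recovers the SC baseline, so F(mu) = G(mu) = 1/2 up to rounding
print(gsc_cdf(0.0, phi=1.0, lam=0.5))
```

Setting ϕ = 1 makes the factor {−log[1 − G]}^(ϕ−1) in (2) equal to 1, so the GSC reduces to the SC distribution; this gives a quick sanity check, and a central difference of `gsc_cdf` should match `gsc_pdf`.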
However, in only a few of the bimodal models available in the literature do the parameters have an interpretation in terms of measures of central tendency (the mean or median, for instance) or of a general q-th quantile. As we will show in Section 3, the main advantage of the GSC is that, under a certain restriction on ϕ, the location parameter is the corresponding q-th quantile, which makes the model very convenient in a quantile regression framework.

The paper is organized as follows. Section 2 develops the GSC distribution, its basic properties, and quantile regression. In Section 3, we perform a small-scale simulation study of the maximum likelihood (ML) estimators of the parameters. Section 4 discusses two applications to real data that illustrate the usefulness of the proposed model. Finally, conclusions are given in Section 5.
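The quantile property of the location parameter can be seen directly from (3). Assuming the SC baseline cdf G(z) = 1/2 + arctan(λ sinh z)/π, at y = μ we have z = 0 and G = 1/2, so −log[1 − G] = log 2 and F(μ) = γ(log 2, ϕ)/Γ(ϕ), which does not depend on λ or σ. Hence μ is the q-th quantile exactly when ϕ solves γ(log 2, ϕ)/Γ(ϕ) = q. This appears to be the restriction on ϕ referred to above, but the sketch below derives it independently and may differ in presentation from the paper; `prob_below_location` and `phi_for_quantile` are hypothetical names, and the bisection bracket [10⁻³, 10³] is an arbitrary choice.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) = gamma(x, a) / Gamma(a) via its power series."""
    if x <= 0.0:
        return 0.0
    term, total, n = 1.0, 1.0, 0
    while term > 1e-16 * total and n < 10_000:
        n += 1
        term *= x / (a + n)
        total += term
    log_value = a * math.log(x) - x - math.lgamma(a + 1.0) + math.log(total)
    return min(1.0, math.exp(log_value))


def prob_below_location(phi: float) -> float:
    """F(mu) for the GSC: at y = mu, z = 0, G = 1/2 and -log[1 - G] = log 2."""
    return reg_lower_gamma(phi, math.log(2.0))


def phi_for_quantile(q: float, lo: float = 1e-3, hi: float = 1e3, tol: float = 1e-12) -> float:
    """Solve gamma(log 2, phi) / Gamma(phi) = q for phi by bisection on log(phi).

    P(phi, log 2) decreases as phi grows, so a root is bracketed
    whenever P(hi, log 2) < q < P(lo, log 2).
    """
    if not prob_below_location(hi) < q < prob_below_location(lo):
        raise ValueError("q is not reachable with phi in [lo, hi]")
    a, b = math.log(lo), math.log(hi)
    while b - a > tol:
        m = 0.5 * (a + b)
        if prob_below_location(math.exp(m)) > q:
            a = m  # F(mu) still too large: the root lies at a larger phi
        else:
            b = m
    return math.exp(0.5 * (a + b))


for q in (0.25, 0.5, 0.75):
    phi = phi_for_quantile(q)
    print(q, phi, prob_below_location(phi))
```

For q = 0.5 the solution is ϕ = 1, because γ(log 2, 1)/Γ(1) = 1 − e^(−log 2) = 1/2; this is the symmetric SC case, where μ is the median. Quantiles above the median require ϕ < 1 and those below require ϕ > 1.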

https://www.mdpi.com/2073-8994/11/7/899

DOI:

In this article, we study an extension of the sinh Cauchy model in order to obtain asymmetric bimodality. The distribution may be either unimodal or bimodal. We calculate its cumulative distribution function and use it to carry out quantile regression. We compute the maximum likelihood estimators and carry out a simulation study. Two applications based on real data are analyzed to illustrate the flexibility of the distribution for modeling unimodal and bimodal data.
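The closed-form cdf also makes simulation straightforward. By (3), F(y) = γ(h(y), ϕ)/Γ(ϕ) with h(y) = −log[1 − G(y)], so h(Y) follows a Gamma(ϕ, 1) distribution; conversely, drawing H ∼ Gamma(ϕ, 1) and solving G(z) = 1 − e^(−H) yields a GSC variate. The sketch below uses this route, together with the log-likelihood built from (2), which is the quantity one would maximize to obtain ML estimators. It assumes the SC baseline cdf G(z) = 1/2 + arctan(λ sinh z)/π, the function names are hypothetical, and no optimizer is included, so it is not the estimation procedure used in the paper.

```python
import math
import random


def sc_survival(z: float, lam: float) -> float:
    """1 - G(z) for the standardized SC cdf G(z) = 1/2 + arctan(lam * sinh z) / pi (assumed)."""
    u = lam * math.sinh(z)
    if u > 0.0:
        return math.atan(1.0 / u) / math.pi  # cancellation-free upper tail
    return 0.5 - math.atan(u) / math.pi


def gsc_sample(n: int, phi: float, lam: float, mu: float, sigma: float,
               rng: random.Random) -> list[float]:
    """Draw n GSC variates: with H ~ Gamma(phi, 1), Y = mu + sigma * G^{-1}(1 - e^{-H}) follows (3)."""
    draws = []
    for _ in range(n):
        s = math.exp(-rng.gammavariate(phi, 1.0))  # s = 1 - G(z)
        # Invert G(z) = 1 - s: lam * sinh(z) = tan(pi * (1/2 - s)) = cot(pi * s)
        t = 1.0 / math.tan(math.pi * s) if s < 0.5 else math.tan(math.pi * (0.5 - s))
        draws.append(mu + sigma * math.asinh(t / lam))
    return draws


def gsc_loglik(ys: list[float], phi: float, lam: float, mu: float, sigma: float) -> float:
    """Sum of log f(y_i), with f from (2) and the SC baseline."""
    total = 0.0
    for y in ys:
        z = (y - mu) / sigma
        u = lam * math.sinh(z)
        log_g = math.log(lam * math.cosh(z) / (sigma * math.pi)) - math.log1p(u * u)
        h = -math.log(sc_survival(z, lam))  # h = -log[1 - G(y)]
        total += log_g + (phi - 1.0) * math.log(h) - math.lgamma(phi)
    return total


rng = random.Random(2019)
ys = gsc_sample(20_000, phi=2.0, lam=0.3, mu=1.0, sigma=0.5, rng=rng)
frac_below_mu = sum(y <= 1.0 for y in ys) / len(ys)
# Should be close to gamma(log 2, 2) / Gamma(2) = 1/2 - (log 2) / 2, about 0.153
print(frac_below_mu)
# The true phi should usually fit better than phi = 1 (the plain SC model)
print(gsc_loglik(ys, 2.0, 0.3, 1.0, 0.5) > gsc_loglik(ys, 1.0, 0.3, 1.0, 0.5))
```

Sampling through the gamma variable avoids numerically inverting the incomplete gamma function; with λ = 0.3 the SC baseline is bimodal, and the fraction of draws at or below μ depends only on ϕ.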