Introduction to Kernel Density Estimation

Kernel Density Estimation is a statistical method for estimating how a variable is distributed, based on a sample of data. Related to histograms, Kernel Density Estimation offers a way to estimate the distribution of a variable in a population. The method is relatively sophisticated, but the result is a smooth curve that gives a visual interpretation of the variable's probability density: roughly, how frequently values near each point appear in the population.

Uses
- Kernel Density Estimation estimates the shape of a probability density function. A density function shows how frequently values of a variable appear in a random sample from a population. Kernel Density Estimation is considered a non-parametric method. In statistics, there are parametric and non-parametric methods. Parametric methods make more assumptions than non-parametric ones: they assume the data follow a particular family of distributions described by parameters such as the mean and standard deviation, while non-parametric methods make no such assumption about the shape of the distribution. For example, suppose you wanted to know whether the tenth test in a classroom would score higher than all of the first nine. With parametric reasoning, you would have to know the mean and standard deviation of the scores to derive an answer. With non-parametric reasoning, if the ten scores are assumed to be independent, to come from the same distribution and to contain no ties, each of the ten is equally likely to be the highest, so simply knowing the number of tests is enough to say the last test has a 1-in-10, or 10 percent, chance of being above all the previous scores.
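The 10 percent figure in the test example can be checked with a quick simulation. The sketch below is a Python illustration under the assumption that the ten scores are independent draws from one continuous distribution (so ties essentially never happen); the function name chance_last_is_highest and the two score distributions are hypothetical choices for the demonstration, not part of any standard method.

```python
import random
from typing import Callable


def chance_last_is_highest(draw: Callable[[random.Random], float],
                           n_tests: int = 10, trials: int = 50_000,
                           seed: int = 0) -> float:
    """Estimate how often the last of n_tests scores beats every earlier score.

    Assumes the scores are independent draws from one continuous
    distribution, supplied by `draw`, so ties essentially never occur.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = 0
    for _ in range(trials):
        scores = [draw(rng) for _ in range(n_tests)]
        # Count the trial only if the final score is strictly the highest.
        if scores[-1] > max(scores[:-1]):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Two very different (hypothetical) score distributions.
    uniform_scores = lambda rng: rng.uniform(0, 100)
    normal_scores = lambda rng: rng.gauss(70, 10)
    print(chance_last_is_highest(uniform_scores))
    print(chance_last_is_highest(normal_scores))
```

Both results should come out close to 0.1 even though one distribution is uniform and the other normal, which is the point of the non-parametric argument: the answer does not depend on the mean, standard deviation or shape of the distribution.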
Kernel
- Kernel Density Estimation has two crucial components: the kernel and the bandwidth. The kernel is a function, usually itself a symmetric probability density, that is centered on each data point; the estimate is built by adding up these kernels and rescaling the total. There are seven common kernels in non-parametric statistics: normal (Gaussian), uniform, triangular, Epanechnikov, quartic, triweight and cosine. Any of these functions can be used to estimate the density of a random variable in a population.
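For reference, the seven kernels listed above can be written as simple functions of a standardized distance u (the gap between a point and a data value, divided by the bandwidth). The formulas below are the standard textbook forms (the quartic kernel is also known as the biweight); the Python names KERNELS and area are hypothetical helpers for this sketch. Each kernel is itself a density, so the area under its curve is one, which the small numerical check is meant to confirm.

```python
import math
from typing import Callable, Dict


def _inside(u: float) -> bool:
    """Every kernel here except the normal one is zero outside [-1, 1]."""
    return abs(u) <= 1.0


# Standard forms of the seven kernels, as functions of the standardized distance u.
KERNELS: Dict[str, Callable[[float], float]] = {
    "normal": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi),
    "uniform": lambda u: 0.5 if _inside(u) else 0.0,
    "triangular": lambda u: 1 - abs(u) if _inside(u) else 0.0,
    "epanechnikov": lambda u: 0.75 * (1 - u * u) if _inside(u) else 0.0,
    "quartic": lambda u: 15 / 16 * (1 - u * u) ** 2 if _inside(u) else 0.0,
    "triweight": lambda u: 35 / 32 * (1 - u * u) ** 3 if _inside(u) else 0.0,
    "cosine": lambda u: math.pi / 4 * math.cos(math.pi * u / 2) if _inside(u) else 0.0,
}


def area(kernel: Callable[[float], float], lo: float = -6.0,
         hi: float = 6.0, steps: int = 60_000) -> float:
    """Approximate the area under a kernel with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(kernel(lo + (i + 0.5) * width) for i in range(steps)) * width


if __name__ == "__main__":
    for name, k in KERNELS.items():
        print(f"{name:>12}: area = {area(k):.4f}")
```

Each line printed should show an area very close to 1.0; the kernels differ only in how they spread that area around the data point.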
Bandwidth
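- The second component, the bandwidth, controls how far each kernel spreads around its data point, and therefore how smooth the resulting estimate is. The bandwidth strongly affects the visual representation of the data: as it grows, a jagged line becomes progressively smoother, until the data have been smoothed so much that the estimate is no longer useful, while a bandwidth that is too small leaves a noisy, spiky curve. In the kernel density estimation formula, f(x) = 1/(nh) × Σ K((x - xᵢ)/h), where n is the number of data points xᵢ and K is the kernel, the bandwidth is represented by the letter h. It must be positive. Because the kernel itself has an area of one and the formula divides by nh, the estimate always has a total area of one, as any probability distribution must, whatever positive bandwidth is chosen.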
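A minimal Python sketch of the estimator, assuming a normal (Gaussian) kernel and a small made-up sample; kde, area and count_peaks are hypothetical helper names, and the grid limits are chosen only to cover this sample. It illustrates the two properties described above: the area under the estimate stays close to one for every positive h, while larger values of h leave fewer peaks in the curve.

```python
import math
from typing import Callable, List, Sequence


def gaussian(u: float) -> float:
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde(data: Sequence[float], h: float,
        kernel: Callable[[float], float] = gaussian) -> Callable[[float], float]:
    """Build f(x) = 1/(n*h) * sum_i K((x - x_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(data)

    def f_hat(x: float) -> float:
        # Each data point contributes one kernel "bump" of width h;
        # dividing by n*h keeps the total area under f_hat equal to one.
        return sum(kernel((x - xi) / h) for xi in data) / (n * h)

    return f_hat


def grid(lo: float, hi: float, step: float) -> List[float]:
    """Midpoints of equal-width cells covering [lo, hi]."""
    count = int(round((hi - lo) / step))
    return [lo + (i + 0.5) * step for i in range(count)]


def area(f: Callable[[float], float], lo: float = -10.0,
         hi: float = 20.0, step: float = 0.01) -> float:
    """Midpoint-rule approximation of the area under f."""
    return sum(f(x) for x in grid(lo, hi, step)) * step


def count_peaks(f: Callable[[float], float], lo: float = -10.0,
                hi: float = 20.0, step: float = 0.01) -> int:
    """Number of local maxima of f on the grid."""
    ys = [f(x) for x in grid(lo, hi, step)]
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1])


if __name__ == "__main__":
    sample = [1.0, 1.3, 2.1, 4.8, 5.0, 5.4, 7.9]  # hypothetical data
    for h in (0.1, 0.5, 2.0):
        f = kde(sample, h)
        print(f"h={h}: area={area(f):.3f}, peaks={count_peaks(f)}")
```

The areas should all print as approximately 1.000, while the peak count shrinks as h grows; h = 0 is rejected because the formula divides by h.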
Advantages
- Kernel Density Estimation has advantages over other non-parametric estimation methods, especially histograms. Histograms represent the distribution of a variable as bins along a horizontal range. Taller bins represent a greater density of the variable in that range of the data. Because histograms group data into bins, the variable is compartmentalized and the resulting distribution is jagged and discrete, misrepresenting the continuous distribution the variable actually has in a population; the picture can also change depending on where the bin edges are placed. Kernel Density Estimation represents this continuity better with a smooth curve, whose smoothness is determined by the bandwidth chosen in the kernel density formula.
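The contrast with histograms can be sketched in a few lines of Python, assuming a small made-up sample, a bin width of 1.0 and a Gaussian kernel with h = 0.5 (all illustrative choices; the function names are hypothetical). The histogram estimate is a step function whose value jumps at bin edges and depends on where the first bin starts, while the kernel estimate changes only gradually.

```python
import math
from typing import Callable, Sequence


def histogram_density(data: Sequence[float], width: float,
                      origin: float = 0.0) -> Callable[[float], float]:
    """Histogram scaled to a density: (count in x's bin) / (n * width)."""
    n = len(data)

    def f(x: float) -> float:
        b = math.floor((x - origin) / width)  # index of the bin containing x
        count = sum(1 for xi in data if math.floor((xi - origin) / width) == b)
        return count / (n * width)

    return f


def gaussian_kde(data: Sequence[float], h: float) -> Callable[[float], float]:
    """Kernel density estimate with a normal kernel and bandwidth h."""
    n = len(data)
    norm = n * h * math.sqrt(2 * math.pi)

    def f(x: float) -> float:
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm

    return f


def largest_jump(f: Callable[[float], float], lo: float = 0.0,
                 hi: float = 9.0, step: float = 0.01) -> float:
    """Biggest change in f between neighbouring grid points."""
    xs = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    ys = [f(x) for x in xs]
    return max(abs(b - a) for a, b in zip(ys, ys[1:]))


if __name__ == "__main__":
    sample = [1.0, 1.3, 2.1, 4.8, 5.0, 5.4, 7.9]  # hypothetical data
    hist = histogram_density(sample, width=1.0)
    shifted = histogram_density(sample, width=1.0, origin=0.5)
    smooth = gaussian_kde(sample, h=0.5)
    print("histogram jump:", largest_jump(hist))
    print("KDE jump:", largest_jump(smooth))
    # Same data and bin width, different bin start: different height at x = 4.9.
    print(hist(4.9), shifted(4.9))
```

For this sample the histogram jumps by 2/7 at its steepest bin edge, and its height at x = 4.9 is 1/7 or 3/7 depending only on where the bins start; the kernel estimate never changes by more than about 0.01 between neighbouring grid points.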
