Averages
The most basic form of statistical analysis is the average. There are three kinds of average: the mean, the median, and the mode. The mean is found by summing the data and dividing the sum by the number of data points. The median is found by arranging all the data points in order of size and selecting the middle one; if the number of data points is even, the median is the mean of the two middle values. The mode is the data value that appears most frequently. All three types of average are used in population genetics, but the most common is the mean.
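As a minimal sketch, the three averages can be computed with Python's standard statistics module. The data set below is hypothetical and chosen only for illustration; multimode is used instead of mode so that ties for the most frequent value would all be reported.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
values = [2, 3, 3, 5, 7]

# Mean: the sum of the values divided by the number of values.
mean_value = statistics.mean(values)

# Median: the middle value once the data are sorted.
median_value = statistics.median(values)

# Mode: the most frequent value; multimode returns every value tied for first.
modes = statistics.multimode(values)

print(mean_value, median_value, modes)
```

For this data set the mean is 4, the median is 3, and the only mode is 3.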
Variance
The variance is a measure of how much the values in a set of data vary about their mean. The variance of a sample is symbolized by s^2, and the variance of a whole population by a sigma squared symbol (σ^2). The sample variance is calculated by summing the squared differences between each value and the mean, and then dividing by one less than the number of data points. As an example, consider the number of spots on a particular species of beetle. Assume there are five beetles in the sample, with 5, 5, 6, 6 and 8 spots. The mean is:
(5 + 5 + 6 + 6 + 8)/5 = 6
The variance is calculated as follows:
[(5 - 6)^2 + (5 - 6)^2 + (6 - 6)^2 + (6 - 6)^2 + (8 - 6)^2]/(5 - 1) = 1.5
This gives us an indication of the spread of the values.
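The same calculation can be sketched in Python. The variable names are illustrative; the hand-written steps mirror the formula above, and statistics.variance from the standard library (which also divides by one less than the number of data points) is used as a cross-check.

```python
import statistics

# Spot counts for the five beetles in the example above.
spots = [5, 5, 6, 6, 8]

# Step 1: the mean of the data.
mean_spots = sum(spots) / len(spots)

# Step 2: squared difference between each value and the mean.
squared_deviations = [(x - mean_spots) ** 2 for x in spots]

# Step 3: divide the sum by one less than the number of data points.
sample_variance = sum(squared_deviations) / (len(spots) - 1)

print(sample_variance)             # 1.5
print(statistics.variance(spots))  # 1.5, the library's sample variance
```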
Standard Deviation
The standard deviation is another measure of spread. It is the square root of the variance, and so has the advantage of being expressed in the same units as the data it is derived from. In the example given above the standard deviation is:
(1.5)^(0.5) = 1.22474
This indicates that the standard deviation of the number of spots in this sample of beetles is approximately 1.22 spots.
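As a short sketch, the standard deviation of the beetle data can be obtained in Python either by taking the square root of the sample variance or by calling statistics.stdev directly. The variable names are illustrative.

```python
import math
import statistics

# Spot counts for the five beetles in the example above.
spots = [5, 5, 6, 6, 8]

# Square root of the sample variance (1.5 for this data set).
sample_sd = math.sqrt(statistics.variance(spots))

# The library's sample standard deviation, for comparison.
library_sd = statistics.stdev(spots)

print(round(sample_sd, 5))  # 1.22474
```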
Correlation
Correlation is a measure of the strength and direction of the association between two sets of data. In population genetics, correlations might be used to study the relationship between the possession of particular genes and the occurrence of particular characteristics. The correlation coefficient is symbolized by an r with the subscript xy, where x and y are the variables of the two sets of data. It always lies between -1 and 1. A value of -1 indicates a perfect negative correlation, 0 indicates no linear correlation, and 1 indicates a perfect positive correlation; values in between indicate weaker associations.
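The text does not give a formula for r, so the sketch below assumes the usual Pearson product-moment coefficient: the sum of the products of the deviations of x and y from their means, divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the paired data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and sums of squared deviations.
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either variable is constant.
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical paired measurements, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r_xy = pearson_r(x, y)
print(r_xy)  # close to 0.8: a fairly strong positive correlation
```

Reversing the trend, for example with y = [10, 8, 6, 4, 2], gives a value of -1, a perfect negative correlation.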
Regression
Regression is a statistical tool used to describe how one variable changes in relation to another. The regression coefficient is given the symbol b with the subscript yx, where y and x are the variables of the two sets of data. The regression coefficient measures the predicted change in variable y per unit change in variable x.
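As an illustration, the sketch below assumes simple linear regression fitted by ordinary least squares, which the text does not state explicitly. Under that assumption the regression coefficient is the sum of the products of the deviations of x and y from their means, divided by the sum of squared deviations of x. The function name and the data are hypothetical.

```python
def regression_coefficient(x, y):
    """Least-squares slope of y on x (b_yx) for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    # Undefined (division by zero) if x is constant.
    return sum_xy / sum_xx


# Hypothetical paired measurements, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b_yx = regression_coefficient(x, y)
print(b_yx)  # close to 0.6
```

Here a one-unit increase in x predicts an increase of about 0.6 units in y.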
Genetic Disorders Nomenclature
When discussing genetic disorders, there are several technical terms that describe the pattern of a disorder in the general population. These terms include incidence, prevalence, mortality, and lifetime risk. The incidence of a genetic disorder is the rate at which new cases arise in a population over a given period; for genetic disorders it is often expressed as the proportion of newborns affected. The prevalence of a genetic disorder is the proportion of people in a population, or in a specific subgroup such as a particular age group, who have the disorder at a given time. Mortality refers to the number of people in a particular group who die from a particular disorder in a given period, usually a year. An example of a mortality statistic might be "10 000 people in the United States died from syndrome X in 2010." Lifetime risk is the average probability that an individual will develop a particular genetic disorder at some point in his or her life.