Scatter Plots - Statistics
To access this screen:
-
In the Scatter Plots screen, select a chart from the scatter plot thumbnails pane, then select the Statistics tab.
The Statistics tab is used to view the selected chart's summary statistics and define which statistics are displayed on the chart.
To configure these settings:
-
Check Display Parameters Graphically to add vertical lines to the graph for selected statistics such as the mean, geometric mean, log estimate mean, and selected percentile positions.
-
When this option is active, choose a color for each graphical statistic that you want to display.
-
Check the statistics that you want to display in the chart statistics box.
-
Check Column Header Rows to include the NAME, X Axis, Y Axis, and DECIMALS column headers.
-
Use the DECIMALS column to control the number of decimal places shown for each statistic.
-
Review the preview after each selection. The preview updates automatically with the relevant information.
-
Drag the displayed statistics if you want to reposition them on the chart. By default, statistics are positioned at the top left of the chart.
Available Statistics
Use the descriptions below to decide which statistics to display for the selected scatter plot chart.
-
Total Records: the total number of data records for the selected object or file. This includes all keys and records with absent data values.
-
Total Samples: the total number of samples used to create the current chart. This takes into account any key fields that have been specified for the chart, but does not include records with absent Value or Weight fields.
-
No. of Missing Values: the number of records not used to create the current chart. This is the difference between Total Records and Total Samples.
-
No. of Values > Trace: the number of values greater than the trace value. The trace value is defined as 0.10E-29, so values greater than trace are effectively values greater than zero.
-
Maximum: the maximum value used to create the current chart.
-
Minimum: the minimum value used to create the current chart.
-
Range: the range of data values. This is equal to Maximum-Minimum.
-
Total: the sum total of all values used to create the current chart.
-
Mean: the mean of all values used to create the current chart.
-
Variance: the statistical variance of the values used to create the current chart. This is calculated as:
Variance = ∑(x_i - x̄)^2 / n = [∑x_i^2 - (∑x_i)^2 / n] / n
where x_i are the sample values, x̄ is the mean of the samples, and n is the number of samples.
-
Standard Deviation: the square root of the variance.
-
Standard Error: also known as the standard error of the mean. It is calculated as the Standard Deviation divided by the square root of Total Samples.
-
Coefficient of Variation: the ratio of the Standard Deviation to the Mean.
-
Skewness: a measure of the asymmetry of the probability distribution of a variable. A negative skewness indicates that the left tail is longer than the right. A positive skewness indicates that the right tail is longer than the left. A Standard Normal distribution has a skewness of zero.
-
Kurtosis: a measure of the peakedness of the probability distribution. A high kurtosis distribution has a sharper peak and longer, thinner tails, while a low kurtosis distribution has a more rounded peak with wider shoulders. A Standard Normal distribution has a kurtosis of zero, which means the value is reported as excess kurtosis (kurtosis minus 3):
Image showing a high-kurtosis peak in red, and lower-kurtosis results in blue.
-
Geometric Mean: a type of average calculated by multiplying the n sample values together and then taking the nth root of the product.
-
Sum of logs: the sum of the logs, base e, of the sample values.
-
Mean of logs: the mean of the logs, base e, of the sample values.
-
Logarithmic Variance: the variance of the logs, base e, of the sample values.
-
Log Estimate of Mean: an estimate of the arithmetic mean of the samples, assuming a lognormal distribution.
-
Correlation Coefficient: a measure of the degree of linear correlation between two variables, here the Y Axis and X Axis field values. These two variables are said to be correlated if the scatter plot shows a significant rectilinear, or straight-line, trend. Correlation coefficient values range from -1, a straight line with negative slope, to 1, a straight line with positive slope. Both ends of this range indicate strong correlation between the variables; a lack of straight-line correlation is indicated by values close to zero.
The formula used to calculate the correlation coefficient (cc) is as follows:
cc = (N * ∑XY - ∑X*∑Y) / sqrt((N*∑XX - ∑X*∑X) * (N*∑YY - ∑Y*∑Y))
where:
-
N is the number of pairs.
-
∑X is the sum of the X values.
-
∑Y is the sum of the Y values.
-
∑XY is the sum of the product of X and Y.
-
∑XX is the sum of the product of X and X.
-
∑YY is the sum of the product of Y and Y.
-
5th ... 95th Percentile: the value of the variable (the X Axis or Y Axis field) below which the stated percentage of values falls. For example, 5% of the values fall below the 5th percentile. Percentile values are calculated separately for the X Axis and Y Axis values.
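As an illustration only, the definitions above can be sketched in Python. This is a minimal sketch under stated assumptions, not the application's own code. The names summary_stats and correlation are hypothetical, as are the use of None to mark an absent value and the restriction of the log statistics to values above trace.

```python
import math

TRACE = 0.10e-29  # trace value quoted under "No. of Values > Trace"


def summary_stats(records):
    """Summarise one field. None marks an absent value (assumption).

    Assumes at least one value above trace and a non-zero mean.
    """
    values = [v for v in records if v is not None]
    n = len(values)          # Total Samples
    total = sum(values)      # Total
    mean = total / n
    # Variance with divisor n, in both forms of the Variance formula.
    variance = sum((x - mean) ** 2 for x in values) / n
    variance_alt = (sum(x * x for x in values) - total ** 2 / n) / n
    std_dev = math.sqrt(variance)
    # Log statistics use only values above trace (assumption), so the
    # natural logarithm is always defined.
    logs = [math.log(x) for x in values if x > TRACE]
    mean_logs = sum(logs) / len(logs)
    return {
        "total_records": len(records),
        "total_samples": n,
        "missing": len(records) - n,
        "above_trace": len(logs),
        "maximum": max(values),
        "minimum": min(values),
        "range": max(values) - min(values),
        "total": total,
        "mean": mean,
        "variance": variance,
        "variance_alt": variance_alt,
        "standard_deviation": std_dev,
        "standard_error": std_dev / math.sqrt(n),
        "coefficient_of_variation": std_dev / mean,
        # exp(mean of logs) equals the nth root of the product of the
        # values when every value is positive.
        "geometric_mean": math.exp(mean_logs),
        "sum_of_logs": sum(logs),
        "mean_of_logs": mean_logs,
        "log_variance": sum((v - mean_logs) ** 2 for v in logs) / len(logs),
    }


def correlation(xs, ys):
    """Correlation coefficient built from the sums named in the formula."""
    n = len(xs)                                 # N, the number of pairs
    sx, sy = sum(xs), sum(ys)                   # ∑X and ∑Y
    sxy = sum(x * y for x, y in zip(xs, ys))    # ∑XY
    sxx = sum(x * x for x in xs)                # ∑XX
    syy = sum(y * y for y in ys)                # ∑YY
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx * sx) * (n * syy - sy * sy)
    )


stats = summary_stats([2.0, 4.0, 8.0, None])
cc = correlation([1, 2, 3, 4], [2, 4, 6, 8])
```

In this sample input, one of the four records has no value, so it counts towards Total Records but not Total Samples. The two variance entries should agree to within rounding error, which is a quick way to check the shortcut form of the formula.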
Changing the Formatting of Statistics
You can enhance the display of your scatter plot to show vertical lines on the graph for the mean, log estimate mean, geometric mean, and any selected percentile values. Use Display Parameters Graphically to enable or disable this display. If selected, vertical lines are displayed in the selected color.
Displaying the Statistics on the Chart
Selected statistics are shown above each chart, aligned to the left border.
Select a check box on this panel to automatically update the preview with the relevant information. The values to be displayed are shown in the green columns in the table, formatted to the number of decimals shown in the DECIMALS column.
By default, the displayed statistics are positioned at the top left of the chart. You can reposition them by dragging them with the mouse cursor.
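The effect of the DECIMALS column can be pictured with a short Python sketch. It is illustrative only: the function name format_statistics and the (name, value, decimals) layout are hypothetical and do not describe how the application stores or draws its statistics box.

```python
def format_statistics(rows):
    """Format each (name, value, decimals) row to its own decimal places."""
    return [f"{name}: {value:.{decimals}f}" for name, value, decimals in rows]


# Hypothetical rows: each statistic carries its own DECIMALS setting.
box_lines = format_statistics([
    ("Mean", 4.66667, 2),
    ("Total Samples", 3, 0),
])
```

Each statistic is rounded independently, so a count can be shown with no decimal places while a mean on the same chart is shown with two.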