Process Help STATS - compute summary parametric statistics on multiple numeric fields |
Process Name |
Menu Path |
Link to Command Table |
STATS |
Command line only |
Introduction
Calculates general summary parametric statistics on numeric fields in a file.
How to use
Individual fields for statistics may be selected using the *F1, *F2, etc. fields, or may be specified in the &FIELDLST file. If no fields are selected, statistics are calculated for all numeric fields.
Ten optional keyfields are provided. If no keyfields are specified, a single set of statistics is calculated for all data. If keyfields are specified and either KEYSORT=1 or the input file is already sorted by keyfield, statistics are calculated for each unique combination of key values. If keyfields are specified, KEYSORT=0 and the input file is not sorted by keyfield, data is read from the input file until the value of one of the keyfields changes, and statistics are then calculated for that data subset; a key combination that recurs later in the file therefore produces a separate set of statistics.
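The effect of KEYSORT on grouping can be illustrated with a minimal Python sketch. It is not the STATS implementation: the records, the single keyfield and the group_means function are hypothetical, and only the mean is computed per group.

```python
from itertools import groupby

# Hypothetical records: (keyfield value, numeric value). Not taken from any real file.
records = [("A", 1.0), ("B", 2.0), ("A", 3.0), ("B", 4.0)]

def group_means(rows, keysort):
    """Return (key, mean) per group; sort by key first only when keysort == 1."""
    if keysort == 1:
        rows = sorted(rows, key=lambda r: r[0])
    out = []
    # groupby starts a new group each time the key changes between consecutive rows
    for key, grp in groupby(rows, key=lambda r: r[0]):
        values = [v for _, v in grp]
        out.append((key, sum(values) / len(values)))
    return out

sorted_result = group_means(records, keysort=1)    # one group per unique key
unsorted_result = group_means(records, keysort=0)  # one group per run of equal keys
```

With the unsorted input above, keysort=1 gives two groups (A and B), while keysort=0 gives four, because the key changes on every record.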
A limit of 256 fields is imposed. If more than 256 fields exist in &IN, the process will not complete.
An optional weighting field (*WEIGHT) is available to weight the sample data. For example, in a desurveyed drillhole file the LENGTH field could be used as the weighting field to give length-weighted grades.
When calculating MAD and percentile statistics, a specified WEIGHT field is ignored. |
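As an illustration of length weighting, the following Python sketch computes a weighted mean as sum(w * v) / sum(w). This is the conventional definition and is assumed here; the exact weighting formulas used by STATS are not stated on this page. The grades and lengths are hypothetical values.

```python
def weighted_mean(values, weights):
    """Conventional weighted mean: sum(w * v) / sum(w)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(w * v for v, w in zip(values, weights)) / total_weight

grades = [1.0, 3.0]    # hypothetical sample grades
lengths = [1.0, 3.0]   # hypothetical LENGTH values used as weights

plain_mean = sum(grades) / len(grades)             # unweighted: 2.0
length_weighted = weighted_mean(grades, lengths)   # (1*1 + 3*3) / 4 = 2.5
```

The longer sample pulls the weighted mean towards its grade, which is the intended effect of using LENGTH as the weighting field.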
The variance and other moments are calculated using the large sample method, i.e. the variance uses a divisor of N (not N-1), where N is the number of samples.
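A minimal Python sketch of moment statistics with a divisor of N follows. The variance matches the description above; the skewness and kurtosis formulas (standardised third and fourth central moments, with 3 subtracted from the kurtosis so that a Gaussian gives 0) are assumptions consistent with the Notes section, not a confirmed description of the STATS internals. The function name is hypothetical.

```python
import math

def large_sample_moments(values):
    """Moment statistics using a divisor of N throughout (unweighted)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # variance, divisor N
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "variance": m2,
        "stddev": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,        # assumed definition
        "kurtosis": m4 / m2 ** 2 - 3.0,    # assumed excess kurtosis (Gaussian = 0)
    }

result = large_sample_moments([1.0, 2.0, 3.0, 4.0, 5.0])
```

For this symmetric data the variance is 2.0 (the N-1 divisor would give 2.5), the skewness is 0 and the kurtosis is negative.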
The following statistics are calculated for each numeric variable:
- total number of records in the file that meet the retrieval criteria, if specified.
- number of samples (excluding absent data).
- number of absent data values.
- minimum, maximum, range and mid-range.
- total and mean.
- variance, standard deviation and standard error.
- skewness and kurtosis.
- geometric mean.
- sum, mean and variance of natural logs.
- log estimate of mean.
- coefficient of variation in percent.
- number of records equal to zero.
- number of negative values.
These results are displayed in the Command window and can also be saved to an optional &OUT file.
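The log-based statistics in the list above can be sketched in Python as follows. Several points are assumptions rather than documented behaviour: the geometric mean is taken as exp(mean of logs), the log estimate of the mean uses the common lognormal estimator exp(mean of logs + variance of logs / 2), and the trace argument stands in for the system trace value, whose actual value is not given here.

```python
import math

def log_statistics(values, trace=0.0):
    """Log statistics over values strictly greater than the (hypothetical) trace value."""
    threshold = max(trace, 0.0)              # logs are only defined for positive values
    logs = [math.log(v) for v in values if v > threshold]
    n = len(logs)
    log_sum = sum(logs)
    log_mean = log_sum / n
    log_var = sum((x - log_mean) ** 2 for x in logs) / n   # divisor N
    return {
        "n": n,
        "log_sum": log_sum,
        "log_mean": log_mean,
        "log_variance": log_var,
        "geometric_mean": math.exp(log_mean),                 # assumed definition
        "log_estimate_of_mean": math.exp(log_mean + log_var / 2.0),  # assumed estimator
    }

# Hypothetical data: the zero is excluded from the log statistics.
stats = log_statistics([1.0, math.exp(2.0), 0.0])
```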
If the PCNTILES parameter is set to 1 then the 5, 10, 20, 25, 30, 40, 50, 60, 75, 80, 90 and 95 percentiles and the Median Absolute Deviation are also calculated. Processing will take longer. If this option is selected it is helpful to specify only the fields for which you wish to calculate the statistics. The Weight field is not used when calculating percentiles.
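For reference, the Median Absolute Deviation is conventionally the median of the absolute deviations from the median. The Python sketch below shows that definition together with one possible percentile rule (linear interpolation between order statistics). The interpolation rule used by STATS is not stated on this page, so the percentile function is an assumption. No weights are used, consistent with the text above.

```python
import statistics

def median_absolute_deviation(values):
    """Median of |x - median(x)|, unweighted."""
    med = statistics.median(values)
    return statistics.median([abs(x - med) for x in values])

def percentile(values, p):
    """p-th percentile by linear interpolation between order statistics (assumed rule)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # hypothetical values with one outlier
mad = median_absolute_deviation(data)
p25, p50 = percentile(data, 25), percentile(data, 50)
```

The outlier of 100 has no effect on the median or the MAD here, which is why these measures are useful alongside the moment statistics.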
Missing Values
When the STATS process is run with retrieval criteria, data that is excluded by those criteria will not be reported. This is a change from previous versions which classified excluded data as "Missing Values".
The following fields are also output:
- NSAMPLES: The number of samples in the chosen numeric (F1, etc.) fields that are non-absent and used to calculate statistics.
- NMISVALS: The number of missing records in the chosen numeric (F1, etc.) fields. Samples are classified as missing if they have an absent value.
- NUMTRACE: The number of samples in the chosen numeric (F1, etc.) fields that are equal to the TRACE value.
- EQUAL0: The number of records in the chosen numeric (F1, etc.) fields that contain values equal to 0.
- NEGATIVE: The number of records in the chosen numeric (F1, etc.) fields that contain negative values.
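The counting fields listed above can be illustrated with a short Python sketch. Absent values are represented by None and the trace value is passed as an argument; both are illustrative assumptions, since the internal representation of absent data and the system TRACE value are not given here.

```python
def count_fields(values, trace):
    """Counts for one numeric field; None stands in for an absent value."""
    present = [v for v in values if v is not None]
    return {
        "NSAMPLES": len(present),                          # non-absent samples
        "NMISVALS": len(values) - len(present),            # absent samples
        "NUMTRACE": sum(1 for v in present if v == trace), # equal to the trace value
        "EQUAL0": sum(1 for v in present if v == 0),
        "NEGATIVE": sum(1 for v in present if v < 0),
    }

# Hypothetical field values and trace value.
counts = count_fields([1.0, None, 0.0, -2.0, 0.001], trace=0.001)
```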
More information about this process can be found on Datamine's online Knowledge Base: https://datamine.freshdesk.com/en/support/solutions/articles/19000084724-stats-for-statistical-calculations-in-a-datamine-file |
Files, Fields and Parameters
Input Files
Name |
Description |
I/O Status |
Required |
Type |
IN |
Input file. |
Input |
Yes |
Table |
FIELDLST |
File to supply selected fields. |
Input |
No |
Undefined |
Output Files
Name |
I/O Status |
Required |
Type |
Description |
OUT |
Output |
No |
Table |
Output file. This will contain the fields:
If keyfields have been specified then they will also be included. There will be one record for each numeric field for every combination of keyfields. |
Fields
Name |
Description |
Source |
Required |
Type |
Default |
F1 |
First field for statistics. If no fields are specified then all fields will be used. |
IN |
No |
Numeric |
Undefined |
F2 |
Second field for statistics. |
IN |
No |
Numeric |
Undefined |
F3 |
Third field for statistics. |
IN |
No |
Numeric |
Undefined |
F4 |
Fourth field for statistics. |
IN |
No |
Numeric |
Undefined |
F5 |
Fifth field for statistics. |
IN |
No |
Numeric |
Undefined |
F6 |
Sixth field for statistics. |
IN |
No |
Numeric |
Undefined |
F7 |
Seventh field for statistics. |
IN |
No |
Numeric |
Undefined |
F8 |
Eighth field for statistics. |
IN |
No |
Numeric |
Undefined |
F9 |
Ninth field for statistics. |
IN |
No |
Numeric |
Undefined |
F10 |
Tenth field for statistics. |
IN |
No |
Numeric |
Undefined |
F11 |
Eleventh field for statistics. |
IN |
No |
Numeric |
Undefined |
F12 |
Twelfth field for statistics. |
IN |
No |
Numeric |
Undefined |
F13 |
Thirteenth field for statistics. |
IN |
No |
Numeric |
Undefined |
F14 |
Fourteenth field for statistics. |
IN |
No |
Numeric |
Undefined |
F15 |
Fifteenth field for statistics. |
IN |
No |
Numeric |
Undefined |
F16 |
Sixteenth field for statistics. |
IN |
No |
Numeric |
Undefined |
F17 |
Seventeenth field for statistics. |
IN |
No |
Numeric |
Undefined |
F18 |
Eighteenth field for statistics. |
IN |
No |
Numeric |
Undefined |
F19 |
Nineteenth field for statistics. |
IN |
No |
Numeric |
Undefined |
F20 |
Twentieth field for statistics. |
IN |
No |
Numeric |
Undefined |
FIELDNAM |
Field in FIELDLST to identify selected fields. |
FIELDLST |
No |
Character |
Undefined |
KEY1 |
Keyfield 1 for statistics. |
IN |
No |
Any |
Undefined |
KEY2 |
Keyfield 2 for statistics. |
IN |
No |
Any |
Undefined |
KEY3 |
Keyfield 3 for statistics. |
IN |
No |
Any |
Undefined |
KEY4 |
Keyfield 4 for statistics. |
IN |
No |
Any |
Undefined |
KEY5 |
Keyfield 5 for statistics. |
IN |
No |
Any |
Undefined |
KEY6 |
Keyfield 6 for statistics. |
IN |
No |
Any |
Undefined |
KEY7 |
Keyfield 7 for statistics. |
IN |
No |
Any |
Undefined |
KEY8 |
Keyfield 8 for statistics. |
IN |
No |
Any |
Undefined |
KEY9 |
Keyfield 9 for statistics. |
IN |
No |
Any |
Undefined |
KEY10 |
Keyfield 10 for statistics. |
IN |
No |
Any |
Undefined |
WEIGHT |
Weighting field. |
IN |
No |
Numeric |
Undefined |
Parameters
Name |
Description |
Required |
Default |
Range |
Values |
KEYSORT |
Set to 1 to automatically sort the data by key field. Only relevant if any key fields have been defined. =0 : Do not automatically sort by key fields. Use the record order of the input file to determine changes in key field values. =1 : Automatically sort the input data by key fields. |
No |
No |
Numeric |
Undefined |
KEYTOL |
The tolerance used to test whether numeric keyfields are equal. All key values are rounded to an integer multiple of this value. If set to zero then rounding will not be used. |
No |
0.00001 |
0,+ |
Undefined |
PCNTILES |
Set to 1 to calculate percentiles. When calculating percentiles the process will take longer to run. If this option is selected it is useful to specify only the fields for which you wish to calculate the statistics. If this option is selected the Median Absolute Deviation (MAD) value is also calculated. =0 : Do not calculate percentiles. Do not calculate the Median Absolute Deviation. =1 : Calculate the 5, 10, 20, 25, 30, 40, 50, 60, 75, 80, 90 and 95 percentiles and the Median Absolute Deviation. |
No |
0 |
0,1 |
0,1 |
SORTOUT |
Set to 1 to sort the output file by FIELD when key fields are being used. Sorting by FIELD makes it easier to compare values of variables across key fields when viewing the output file in the table editor. =0 : Do not sort the output file. =1 : Sort the output file by FIELD. |
No |
1 |
0,1 |
0,1 |
|
Print flag. Default (2). 0: minimum output. 1: minimum output plus keyfield progress list. 2: full output including stats for each keyfield group. |
No |
2 |
0-2 |
0,1,2 |
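The KEYTOL rounding described in the table above can be sketched in Python as follows. Rounding to the nearest integer multiple is an assumption (the description only says that key values are rounded to an integer multiple of KEYTOL), and the function name is hypothetical.

```python
def key_bucket(value, keytol):
    """Return a comparable key: the integer multiple of keytol nearest to value.

    With keytol == 0 no rounding is applied and the raw value is returned.
    """
    if keytol == 0:
        return value
    return round(value / keytol)

a, b = 10.000001, 10.000004   # hypothetical numeric key values

same_with_default = key_bucket(a, 0.00001) == key_bucket(b, 0.00001)  # treated as equal
same_without_rounding = key_bucket(a, 0) == key_bucket(b, 0)          # treated as different
```

Comparing the integer multiples rather than the rescaled values avoids spurious differences caused by floating-point representation.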
Notes
By default, statistics are calculated for all numeric variables. For example, in a typical drillhole data file containing sample co-ordinates, statistics are calculated for both the values and the co-ordinates. The first bin in the histogram plot contains all values up to MINIMUM. The last bin contains all values above the top value. The log statistics are based on all sample values strictly greater than the system trace value.
Values of skewness and kurtosis calculated are interpreted as:
SKEWNESS |
= 0. No distortion (Gaussian). |
|
> 0. Positive skew (to the right). |
|
< 0. Negative skew (to the left). |
KURTOSIS |
= 0. Mesokurtic (Gaussian). |
|
> 0. Leptokurtic (peaked). |
|
< 0. Platykurtic (flat). |
Example
!STATS
|
|
Error and Warning Messages
Message |
Description |
Solution |
>>> ERR 121 <<< ( fileno) IN STATS |
File read error. Fatal; the process is exited. |
Check the values of the fields in the specified &IN file. |
>>> ERR 122 <<< ( fileno) IN STATS |
No numeric fields in file, or fields specified were not numeric. Fatal; the process is exited. |
In the &IN file, check that the specified *Fn fields are numeric; check that the file contains numeric fields. |