Process Help

STATS - compute summary parametric statistics on multiple numeric fields

 

| Process Name | Menu Path | Link to Command Table |
|---|---|---|
| STATS | Command line only | Click here |

 

Introduction

Calculates general summary parametric statistics on numeric fields in a file.

How to use

Individual fields for statistics may be selected either with the *F1, *F2, etc. fields or by listing them in the &FIELDLST file. If no fields are selected, statistics are calculated for all fields.

Ten optional keyfields are provided. If no keyfields are specified, a single set of statistics is calculated for all data. If keyfields are specified and either KEYSORT=1 or the input file is already sorted by keyfield, statistics are calculated for each unique combination of key values. If keyfields are specified, KEYSORT=0 and the input file is not sorted by keyfield, data is read from the input file until the value of one of the keyfields changes, and statistics are then calculated for that subset of the data; the same combination of key values may therefore be reported more than once.
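The difference between the two KEYSORT settings can be sketched in Python. This is only an illustration of the grouping rule described above, not the STATS implementation; the record layout, the function name group_means and the use of the mean as the only statistic are assumptions made for the example.

```python
from itertools import groupby
from statistics import fmean

# Hypothetical records: (key value, sample value), not sorted by key.
records = [("A", 1.0), ("A", 3.0), ("B", 10.0), ("A", 5.0)]

def group_means(rows, keysort):
    """Return a list of (key, mean) pairs, one per group of records."""
    if keysort == 1:
        # Sorting first brings every record with the same key together.
        rows = sorted(rows, key=lambda r: r[0])
    # groupby starts a new group each time the key value changes,
    # which mirrors reading records until a keyfield value changes.
    return [(key, fmean([value for _, value in group]))
            for key, group in groupby(rows, key=lambda r: r[0])]

sorted_result = group_means(records, keysort=1)
unsorted_result = group_means(records, keysort=0)
print(sorted_result)    # one entry per unique key
print(unsorted_result)  # key "A" appears twice
```

With the unsorted input, key "A" produces two separate groups when keysort is 0, and a single group when keysort is 1.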

A limit of 256 fields is imposed. If more than 256 fields exist in &IN, the process will not complete.

An optional weighting field (*WEIGHT) is available to weight the sample data. For example, in a desurveyed drillhole file the LENGTH field could be used as the weighting field to give length-weighted grades.
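As a minimal sketch of what length weighting means, the Python below computes a weighted mean as the sum of weight times value divided by the sum of weights. The sample values and the function name weighted_mean are hypothetical; the source does not state how STATS applies the weight to statistics other than the mean.

```python
# Hypothetical drillhole samples: (LENGTH, AU grade).
samples = [(1.0, 2.0), (3.0, 4.0), (1.0, 1.0)]

def weighted_mean(pairs):
    """Weighted mean: sum(weight * value) / sum(weight)."""
    total_weight = sum(w for w, _ in pairs)
    return sum(w * v for w, v in pairs) / total_weight

length_weighted = weighted_mean(samples)
unweighted = sum(v for _, v in samples) / len(samples)
print(length_weighted, unweighted)
```

The long 3.0 m sample pulls the length-weighted mean towards its grade, whereas the unweighted mean treats every sample equally.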

Note: When calculating MAD and percentile statistics, a specified WEIGHT field is ignored.

 

The variance and other moments are calculated using the large-sample method; that is, for the variance a divisor of N is used, where N is the number of samples.
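The effect of the divisor can be illustrated with a short Python sketch. The data values are invented; the divisor-N form is the one described above, and the divisor N-1 form is shown only for comparison.

```python
from statistics import pvariance, variance

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(values)
mean = sum(values) / n
sum_sq_dev = sum((x - mean) ** 2 for x in values)

var_n = sum_sq_dev / n              # large-sample form, divisor N
var_n_minus_1 = sum_sq_dev / (n - 1)  # small-sample form, for comparison only

# The standard library provides both forms under these names.
print(var_n, pvariance(values))
print(var_n_minus_1, variance(values))
```

For small N the two forms differ noticeably; as N grows they converge.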

The following statistics are calculated for each numeric variable:

  • total number of records in the file that meet retrieval criteria, if specified.
  • number of samples (excluding absent data).
  • number of absent data values.
  • minimum, maximum, range and mid-range.
  • total and mean.
  • variance, standard deviation and standard error.
  • skewness and kurtosis.
  • geometric mean.
  • sum, mean and variance of natural logs.
  • log estimate of mean.
  • coefficient of variation in percent.
  • number of records equal to zero.
  • number of negative values.
These results are displayed in the Command window and can also be saved to an optional &OUT file.

If the PCNTILES parameter is set to 1, the 5, 10, 20, 25, 30, 40, 50, 60, 75, 80, 90 and 95 percentiles and the Median Absolute Deviation (MAD) are also calculated. Processing will take longer, so it is helpful to specify only the fields for which you wish to calculate the statistics. The WEIGHT field is not used when calculating percentiles.
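The Python below is a sketch of unweighted percentiles and the Median Absolute Deviation. The percentile function uses linear interpolation between ranked values, which is one of several common definitions and is an assumption here; the method STATS actually uses is not stated. MAD is computed as the median of the absolute deviations from the median.

```python
from statistics import median

PERCENTS = [5, 10, 20, 25, 30, 40, 50, 60, 75, 80, 90, 95]

def percentile(sorted_values, p):
    """Percentile by linear interpolation between ranked values."""
    pos = (len(sorted_values) - 1) * p / 100
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])

values = sorted([1.0, 2.0, 3.0, 4.0, 100.0])  # hypothetical samples
percentiles = {p: percentile(values, p) for p in PERCENTS}

med = median(values)
mad = median([abs(x - med) for x in values])
print(percentiles[50], mad)
```

The outlier of 100.0 has almost no effect on the median or the MAD, which is why these statistics are useful alongside the mean and standard deviation.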

Missing Values

When the STATS process is run with retrieval criteria, data that is excluded by those criteria will not be reported. This is a change from previous versions, which classified excluded data as "Missing Values".

The following fields are also output:

  • NSAMPLES: The number of samples in the chosen numeric (F1 etc) fields that are non-absent and used to calculate statistics.
  • NMISVALS: The number of missing records in the chosen numeric (F1 etc) fields. Samples are classified as missing if they have an absent value.
  • NUMTRACE: The number of samples in the chosen numeric (F1 etc) fields that are equal to the TRACE value.
  • EQUAL0: The number of records in the chosen numeric (F1 etc) fields that contain values equal to 0.
  • NEGATIVE: The number of records in the chosen numeric (F1 etc) fields that contain negative values.
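The counting fields above can be sketched in Python as follows. The representation is hypothetical: absent values are modelled as None, and TRACE is given a placeholder value purely for illustration. Whether trace values are also counted in NSAMPLES is an assumption of this sketch.

```python
TRACE = 0.001  # placeholder only; not the actual system trace value

# Hypothetical field values; None stands for an absent value.
values = [0.5, None, 0.0, -1.2, TRACE, None, 3.4]

present = [v for v in values if v is not None]

counts = {
    "NSAMPLES": len(present),                  # non-absent values
    "NMISVALS": len(values) - len(present),    # absent values
    "NUMTRACE": sum(1 for v in present if v == TRACE),
    "EQUAL0": sum(1 for v in present if v == 0),
    "NEGATIVE": sum(1 for v in present if v < 0),
}
print(counts)
```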

More information about this process can be found on Datamine's online Knowledge Base: https://datamine.freshdesk.com/en/support/solutions/articles/19000084724-stats-for-statistical-calculations-in-a-datamine-file

 


Files, Fields and Parameters

Input Files

| Name | Description | I/O Status | Required | Type |
|---|---|---|---|---|
| IN | Input file. | Input | Yes | Table |
| FIELDLST | File to supply selected fields. | Input | No | Undefined |

Output Files

| Name | I/O Status | Required | Type | Description |
|---|---|---|---|---|
| OUT | Output | No | Table | Output file. |

The OUT file will contain the fields:

  • FIELD: field for which statistics have been calculated
  • NRECORDS: total number of records in the input data (including those excluded by retrieval criteria)
  • NSAMPLES: number of records (samples) used to calculate the statistics
  • NMISVALS: number of missing values - either absent or excluded by retrieval criteria
  • NUMTRACE: number of samples equal to TRACE
  • MINIMUM: minimum sample value
  • MAXIMUM: maximum sample value
  • RANGE: range of the sample values
  • TOTAL: sum of the sample values
  • MEAN: mean of sample values
  • VARIANCE: variance of the sample values (Absent if fewer than two sample values)
  • STANDDEV: standard deviation of the input sample values
  • STANDERR: standard error of the input sample values
  • SKEWNESS:  skewness of the sample values
  • KURTOSIS: kurtosis of the sample values
  • GEOMEAN: geometric mean of the input sample values
  • SUMLOG: sum of the sample log values
  • MEANLOG: mean of the sample log values
  • LOGVAR: variance of the sample log values (If more than one sample value)
  • LOGESTM: log estimate of the mean
  • COVARTN%: coefficient of variation in percent
  • MIDRANGE: value mid-way between the minimum and maximum values
  • EQUAL0: number of samples with a value of zero
  • NEGATIVE: number of negative samples
  • WGTFIELD: weight field used
  • DATAFILE: name of the input data file

If keyfields have been specified then they will also be included. There will be one record for each numeric field for every combination of keyfields.

Fields

| Name | Description | Source | Required | Type | Default |
|---|---|---|---|---|---|
| F1 | First field for statistics. If no fields are specified then all fields will be used. | IN | No | Numeric | Undefined |
| F2 | Second field for statistics. | IN | No | Numeric | Undefined |
| F3 | Third field for statistics. | IN | No | Numeric | Undefined |
| F4 | Fourth field for statistics. | IN | No | Numeric | Undefined |
| F5 | Fifth field for statistics. | IN | No | Numeric | Undefined |
| F6 | Sixth field for statistics. | IN | No | Numeric | Undefined |
| F7 | Seventh field for statistics. | IN | No | Numeric | Undefined |
| F8 | Eighth field for statistics. | IN | No | Numeric | Undefined |
| F9 | Ninth field for statistics. | IN | No | Numeric | Undefined |
| F10 | Tenth field for statistics. | IN | No | Numeric | Undefined |
| F11 | Eleventh field for statistics. | IN | No | Numeric | Undefined |
| F12 | Twelfth field for statistics. | IN | No | Numeric | Undefined |
| F13 | Thirteenth field for statistics. | IN | No | Numeric | Undefined |
| F14 | Fourteenth field for statistics. | IN | No | Numeric | Undefined |
| F15 | Fifteenth field for statistics. | IN | No | Numeric | Undefined |
| F16 | Sixteenth field for statistics. | IN | No | Numeric | Undefined |
| F17 | Seventeenth field for statistics. | IN | No | Numeric | Undefined |
| F18 | Eighteenth field for statistics. | IN | No | Numeric | Undefined |
| F19 | Nineteenth field for statistics. | IN | No | Numeric | Undefined |
| F20 | Twentieth field for statistics. | IN | No | Numeric | Undefined |
| FIELDNAM | Field in FIELDLST to identify selected fields. | FIELDLST | No | Character | Undefined |
| KEY1 | Keyfield 1 for statistics. | IN | No | Any | Undefined |
| KEY2 | Keyfield 2 for statistics. | IN | No | Any | Undefined |
| KEY3 | Keyfield 3 for statistics. | IN | No | Any | Undefined |
| KEY4 | Keyfield 4 for statistics. | IN | No | Any | Undefined |
| KEY5 | Keyfield 5 for statistics. | IN | No | Any | Undefined |
| KEY6 | Keyfield 6 for statistics. | IN | No | Any | Undefined |
| KEY7 | Keyfield 7 for statistics. | IN | No | Any | Undefined |
| KEY8 | Keyfield 8 for statistics. | IN | No | Any | Undefined |
| KEY9 | Keyfield 9 for statistics. | IN | No | Any | Undefined |
| KEY10 | Keyfield 10 for statistics. | IN | No | Any | Undefined |
| WEIGHT | Weighting field. | IN | No | Numeric | Undefined |

Parameters

| Name | Description | Required | Default | Range | Values |
|---|---|---|---|---|---|
| KEYSORT | Set to 1 to automatically sort the data by key field. Only relevant if any key fields have been defined. =0 : Do not automatically sort by key fields. Use the record order of the input file to determine changes in key field values. =1 : Automatically sort the input data by key fields. | No | No | Numeric | Undefined |
| KEYTOL | The tolerance used to test whether numeric keyfields are equal. All key values are rounded to an integer multiple of this value. If set to zero then rounding will not be used. | No | 0.00001 | 0,+ | Undefined |
| PCNTILES | Set to 1 to calculate percentiles. When calculating percentiles the process will take longer to run. If this option is selected it is useful to specify only the fields for which you wish to calculate the statistics. If this option is selected the Median Absolute Deviation (MAD) value is also calculated. =0 : Do not calculate percentiles. Do not calculate the Median Absolute Deviation. =1 : Calculate the 5, 10, 20, 25, 30, 40, 50, 60, 75, 80, 90 and 95 percentiles and the Median Absolute Deviation. | No | 0 | 0,1 | 0,1 |
| SORTOUT | Set to 1 to sort the output file by FIELD when key fields are being used. Sorting by FIELD makes it easier to compare values of variables across key fields when viewing the output file in the table editor. =0 : Do not sort the output file. =1 : Sort the output file by FIELD. | No | 1 | 0,1 | 0,1 |
| PRINT | Print flag. =0 : minimum output. =1 : minimum output plus keyfield progress list. =2 : full output including stats for each keyfield group. | No | 2 | 0-2 | 0,1,2 |
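The KEYTOL rounding rule can be sketched in Python as below. The function name round_key is hypothetical, and Python's built-in round (which rounds exact halves to the nearest even integer) is an assumption; the source only states that key values are rounded to an integer multiple of the tolerance and that zero disables rounding.

```python
def round_key(value, keytol):
    """Round a numeric key to the nearest integer multiple of keytol.

    A keytol of zero means no rounding is applied.
    """
    if keytol == 0:
        return value
    return round(value / keytol) * keytol

# Two key values that differ by less than the tolerance fall in the same group.
a = round_key(10.04, 0.1)
b = round_key(9.96, 0.1)
c = round_key(10.2, 0.1)
print(a == b, a == c)
```

With a tolerance of 0.1, the values 10.04 and 9.96 are treated as the same key, while 10.2 is treated as a different one.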


Notes

By default, statistics are calculated for all numeric variables. For example, in a typical drillhole data file containing sample co-ordinates, statistics will be calculated for both the values and the co-ordinates.

The first bin in the histogram plot contains all values up to MINIMUM. The last bin contains all values above the top value.

The log statistics are based on all sample values strictly greater than the system trace value.

Values of skewness and kurtosis calculated are interpreted as:

SKEWNESS

  = 0 : No distortion (Gaussian).
  > 0 : Positive skew (to the right).
  < 0 : Negative skew (to the left).

KURTOSIS

  = 0 : Mesokurtic (Gaussian).
  > 0 : Leptokurtic (peaked).
  < 0 : Platykurtic (flat).
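The interpretation above can be checked with a small Python sketch based on central moments with a divisor of N. The formulas are the standard moment definitions and are assumed here rather than taken from the STATS code; kurtosis is expressed as excess kurtosis (the fourth standardised moment minus 3) so that a Gaussian distribution gives 0, which matches the table above.

```python
def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) using divisor-N central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis: 0 for a Gaussian
    return skewness, kurtosis

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]        # evenly spread, flat
right_tailed = [1.0, 1.0, 1.0, 2.0, 10.0]    # one large value on the right

skew_sym, kurt_sym = skewness_kurtosis(symmetric)
skew_right, kurt_right = skewness_kurtosis(right_tailed)
print(skew_sym, kurt_sym)
print(skew_right, kurt_right)
```

The symmetric data give a skewness of zero and a negative (flat) kurtosis, while the data with a long right tail give a positive skewness.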

 


Example

!STATS

&IN(ASSAYS), &OUT(DHSTATS), *F1(AU), *F2(AG), *F3(CU),
*WEIGHT(LENGTH)

 


Error and Warning Messages

| Message | Description | Solution |
|---|---|---|
| >>> ERR 121 <<< ( fileno) IN STATS | File read error. Fatal; the process is exited. | Check the values of the fields in the specified &IN file. |
| >>> ERR 122 <<< ( fileno) IN STATS | No numeric fields in file, or fields specified were not numeric. Fatal; the process is exited. | In the &IN file, check that the specified *Fn fields are numeric; check that the file contains numeric fields. |