JOIN Process

To access this process:

  • Data ribbon >> Data Tools >> Relational >> Join.

See this process in the Command Table.

Process Overview

Performs a relational join, subset join, weave or subset weave of two input files into an output file. The type of join required is controlled by the values of the optional parameters @SUBSETR AND @SUBSETF.

This process belongs to a group of four similar ones within the Datamine process collection; JOIN, SUBJOI, WEAVE and SUBWVE. Each provides a different outcome, as described by the following diagram:

By default, with @SUBSETR=0 and @SUBSETF=0, JOIN writes out all records and all fields from both input files. Records are compared on the specified keyfields, and if a match is found, the two records are combined into one on the output file. If both files have identical Data Definitions then the record from the second input file is the one written out. Thus in this case the second file updates the first. If no match is found then records are written out with absent data codes for the missing fields.

  • With @SUBSETR=1 and @SUBSETF=0, a relational subset join is carried out. See process SUBJOI for further details.

  • With @SUBSETR=0 and @SUBSETF=1, a relational weave is carried out. See process WEAVE for further details.

  • With @SUBSETR=1 and @SUBSETF=1, a relational subset weave is carried out. See process SUBWVE for further details.

At least one keyfield must be specified and must appear in both input files as an explicit field. The keyfield may be up to 5 words long, and support is provided for up to 30 fields. If a field is specified which does not exist in both input files, it is ignored, providing at least one field matches.

Both input files must be sorted in the order of the keyfields before they can be joined. If this is not the case, the process will exit with an error message.

Input Files

Name

Description

I/O Status

Required

Type

IN1

First file to be updated (sorted on required keyfields).

Input

Yes

Table

IN2

Second file (update file) (sorted on required keyfields).

Input

Yes

Table

Output Files

Name

I/O Status

Required

Type

Description

OUT

Output

Yes

Table

Output file.

Fields

Name

Description

Source

Required

Type

Default

KEY1 to 30

Keyfields for matching. Up to 30 keyfields can be specified.

IN1, IN2

No

Any

Undefined

Parameters

Name

Description

Required

Default

Range

Values

SUBSETR

Controls whether all records or a subset are written to the output file. If set to (0) all records are written.

No

0

0,1

0,1

SUBSETF

Controls whether all fields or a subset is written to the output file. If set to (0) all fields are written. With SUBSETR and SUBSETF set to 0 JOIN writes out all records and all fields from both input files. With SUBSETR=1 and SUBSETF=0 a relational subset join is carried out. With SUBSETR=0 and SUBSETF=1 a relational weave is carried out. With SUBSETR=1 and SUBSETF=1 a relational subset weave is carried out.

No

0

0,1

0,1

CARTJOIN

If set to (0) and if no keyfields are specified the process will terminate with an error. If set to 1 the full Cartesian product is produced and written to the output file. No keyfields should be specified to produce the Cartesian product.

No

0

0,1

0,1

KEYTOL

KEYTOL is the tolerance value used to test whether numeric key values are equal. It must be greater than or equal to zero. It replaces the previous heuristic comparison method.

If KEYTOL is set to a negative value then zero is used.

In a macro KEYTOL can be set to absent using -. "@KEYTOL=-" This will revert to legacy behaviour and use a heuristic comparison in relational commands and zero in sort.

No

0.00001

0,+

Undefined

PRINT

>=1 Display messages on Data definitions. Default is (0)

No

0

0,1

0,1

Example

!JOIN     &IN1(ASSAYS),&IN2(COLLARS.S),&OUT(HOLES),*KEY1(BHID)


Error and Warning Messages

Message

Description

Solution

>>> INPUT FILE nn NOT SORTED ON KEY FIELD(S)

>>> POSITION IN FILE 1 IS nnnnnnnn

>>> POSITION IN FILE 2 IS nnnnnnnn

Fatal; the process is exited. The position numbers are the record numbers reached in each file.

 

>>> FILE ffffffff CANNOT BE USED AS BOTH <<<

>>> INPUT AND OUTPUT BY THIS PROCESS <<<

>>> ERR 130 <<< ( fileno) IN JOIN

Either the first or second input file has the same name as the output file. Fatal; the process is exited.

 

>>> ERR 47 <<< ( 0) IN FNDKEY

Warning; none of the specified key fields exist in the input files. The full Cartesian product is produced and written to the output file.

 

>>> KEYFIELD aaaaaaaa MISSING FROM FILE ffffffff

A warning message that is produced if @PRINT >=1. The keyfield is ignored and processing continues.

 

Related topics and activities