Field Name Ambiguity in EXTRA

Technical Note: TN00452

Background

EXTRA is a general-purpose EXpression TRAnslator that transforms the contents of files by modifying existing fields and calculating new ones from the values of existing fields.

EXTRA is available as a separate process that can be run interactively or from a macro, and it is an integral function of many Studio superprocesses. It has a long history and is used in many customer macros worldwide.

There is a small chance that the behaviour of some existing macros may change as a result of our recent enhancements to EXTRA. This document describes the problem we have solved, how we have solved it, and the potential impact on a small subset of operational macros.

TL;DR

EXTRA now resolves function/procedure names before field names. As a result, if your input file contains a field whose name matches an EXTRA function/procedure (for example, type, field, erase, max or min), some existing transforms may fail or behave differently because an unbracketed token may be interpreted as a function/procedure rather than a field.

To fix affected macros and scripts, explicitly mark any such field references (and field creations) using square brackets, for example [type]=0 and [field]=.... Continue to call functions/procedures in the usual way (for example, type("BHID"), field("AU"), erase()). This matches the documented approach for disambiguating “field vs. function” usage.

What has Changed

Historically, EXTRA resolved ambiguous identifiers by treating an unqualified token as a field name first, and only then considering whether it might be a function or procedure. This meant that if a file contained a field called type, the token type(...) could be misinterpreted as a reference to the field type rather than the type() function (and therefore fail).

A recent change updates the resolver to check function/procedure names before field names. This improves consistency for function usage but introduces a compatibility risk: if your input file contains a field whose name matches a built-in function or procedure (for example, type, field, max or erase), unbracketed uses of that name may now be interpreted as a function/procedure rather than a field.
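The change in resolution order can be illustrated with a toy model. The Python sketch below is not EXTRA's implementation: the function resolve, its parameters and the five-name FUNCTIONS subset are hypothetical, and it assumes field names are compared without regard to case.

```python
# Toy model only: this is not EXTRA's implementation.
FUNCTIONS = {"type", "field", "max", "min", "erase"}  # subset named in this note


def resolve(token, file_fields, functions_first):
    """Classify an unbracketed token as 'function', 'field' or 'new field'."""
    name = token.lower()
    is_function = name in FUNCTIONS
    # Assumption: field names are compared case-insensitively.
    is_field = name in {f.lower() for f in file_fields}
    if functions_first:
        checks = [("function", is_function), ("field", is_field)]
    else:
        checks = [("field", is_field), ("function", is_function)]
    for kind, matched in checks:
        if matched:
            return kind
    return "new field"


fields = ["BHID", "type", "AU"]
print(resolve("type", fields, functions_first=False))  # previous order: field
print(resolve("type", fields, functions_first=True))   # current order: function
print(resolve("AU", fields, functions_first=True))     # no collision: field
```

Only tokens that are both a field name and a function/procedure name change classification between the two orders, which is why only a small subset of macros is affected.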

Datamine’s published guidance already recommends using square brackets to explicitly mark field names where ambiguity exists (for example, [max] = max([max], grade)).

For example, under the previous behaviour, if a file contained a field called “type”, a statement like this would fail:

_TMP;a4 = type("BHID")

This was because EXTRA treated “type” as a field name before checking whether it was a function name. Until this change, Studio software (including predecessors such as Studio 3) always behaved this way.

You can see a full list of EXTRA function names in the online help system and in the function lists on the Expression Translator screen.

The Fix

Rather than prohibit the use of field names that match function names, EXTRA can be forced to recognize a field name if you enclose it in square brackets.

Wherever a field name could be confused with a function/procedure name, explicitly mark it as a field using square brackets:

• Use [fieldname] when referencing an existing field.

• Use [fieldname] when creating a new field whose name collides with a function/procedure name.

Consider the following transform:

[type]=0
spec1;a4=type("type")
spec2;a4=TYPE("type")
[erase]=0
erase()

Taking each line of the transform in turn:

  1. “[type]” forces “type” to be recognised as a field name. This is legacy behaviour.

  2. “type” followed by parentheses is recognised as a function name, not a field.

  3. “TYPE” is also recognised as a function name, not a field, regardless of case.

  4. “[erase]” creates a field called “erase”.

  5. “erase” followed by parentheses is recognised as a procedure name, not a field.

Summary: To avoid the problem of field and function name ambiguity, it is necessary to explicitly indicate the presence of a field name using square brackets. This is true regardless of how EXTRA is run (interactively or via a macro).
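To help locate affected statements in existing macros, a simple scan along the following lines may be useful. This Python sketch is an illustration only, not a Datamine tool: find_unbracketed and RESERVED are hypothetical names, only a small subset of reserved names is included, and it assumes string literals use double quotes and that a name followed by an opening parenthesis is an intended function/procedure call.

```python
import re

# Small illustrative subset of the names listed in this note.
RESERVED = {"type", "field", "erase", "max", "min"}

STRING = re.compile(r'"[^"]*"')              # assumed literal syntax: "..."
BRACKETED = re.compile(r"\[[^\]]*\]")        # explicitly marked fields: [name]
NAME = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)(\s*\()?")


def find_unbracketed(transform, reserved=RESERVED):
    """Return (line number, name) for reserved names used without [] or ()."""
    findings = []
    for number, line in enumerate(transform.splitlines(), start=1):
        text = STRING.sub('""', line)        # ignore names inside string literals
        text = BRACKETED.sub(" ", text)      # bracketed names are already safe
        for match in NAME.finditer(text):
            name, call = match.group(1), match.group(2)
            if name.lower() in reserved and call is None:
                findings.append((number, name))
    return findings


sample = 'type=0\nspec1;a4=type("type")\n[erase]=0\nerase()\nx=max+1'
for number, name in find_unbracketed(sample):
    print(f"line {number}: consider writing [{name}]")
```

For the sample above, the scan reports "type" on line 1 and "max" on line 5. Control keywords such as if, else and end are normally written without parentheses, so they would need separate handling if added to the reserved set.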

EXTRA Function Names

Where possible, avoid using the following EXTRA names as field names. If you must use one, be sure to enclose it in square brackets:

  • abs

  • absent

  • acos

  • asin

  • atan

  • atan2

  • azimuth

  • concat

  • cos

  • decode

  • default

  • delete

  • else

  • elseif

  • end

  • erase

  • exit

  • exp

  • field

  • first

  • if

  • ijkget

  • ijknum

  • int

  • join

  • keep

  • last

  • lcase

  • len

  • log

  • loge

  • logn

  • match

  • max

  • maxia

  • min

  • minia

  • mod

  • modc

  • next

  • not

  • phi

  • pow

  • prev

  • rais

  • rand

  • randbetween

  • round

  • rownum

  • saveonly

  • sin

  • special

  • sqrt

  • string

  • substr

  • tan

  • trim

  • type

  • ucase

  • xyzijk
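The list above can also be used to check a file's field names before running a transform. The Python sketch below is an illustration only: EXTRA_NAMES is copied from the list in this note, colliding_fields is a hypothetical helper, and the comparison is assumed to be case-insensitive (consistent with TYPE being recognised as the type function).

```python
# Names copied from the list in this note.
EXTRA_NAMES = set("""
abs absent acos asin atan atan2 azimuth concat cos decode default delete
else elseif end erase exit exp field first if ijkget ijknum int join keep
last lcase len log loge logn match max maxia min minia mod modc next not
phi pow prev rais rand randbetween round rownum saveonly sin special sqrt
string substr tan trim type ucase xyzijk
""".split())


def colliding_fields(field_names):
    """Return the field names that match an EXTRA name, ignoring case."""
    return [name for name in field_names if name.lower() in EXTRA_NAMES]


# Hypothetical field list for illustration.
print(colliding_fields(["BHID", "FROM", "TO", "TYPE", "AU", "Max"]))
```

Any field reported by this check should be written in square brackets wherever it is referenced or created in a transform.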