Build Code Lookup Samples

Include all columns

This lookup connects as "keys" any columns whose names match, and then brings all other columns in as lookups.

lookup {
    cbase-input "other.cbase"
  }

Specify columns to remove

After removing "A" from other.cbase, this lookup connects as "keys" any columns whose names match, and then brings all other columns in as lookups.

lookup {
    cbase-input "other.cbase" {
      remove "A"
    }
  }

Specify columns to keep

This lookup adds columns "B" and "C", using "A" as the key.

lookup {
    cbase-input "other.cbase"
    key "A" "A"
    column "B"
    column "C"
  }

Specify Columns as Infos

This lookup flags the columns that are brought into the build as Info Fields for ProDiver. This is done by including the attribute keys-are-required-dimensions in the lookup block. For example:

build {
  
  text-input "file1.txt" {
    column "Customer" type="string"
    column "Product" type="string"
    column "Sales" type="integer"
    column "CustGrp" type="string" required-dimension="Customer"
  }
  
  text-input "file2.txt" {
    column "Customer" type="string"
    column "Product" type="string"
    column "Budget" type="integer"
    column "CustGrp" type="string" required-dimension="Customer"
  }

  lookup {
    text-input "CustomerGroup.txt" {
      column "Customer" type="string"
      column "Customer Group"
    }
    key "Customer" "Customer"
    keys-are-required-dimensions
  }
  output "qa-test.cbase"
}

Here you can see that ProDiver has set the info from the lookup as a dynamic dimension.

Auto dynamic dimension from lookup

Remove a Lookup column

This lookup adds column "B", using "A" as the key. Removing column "C" in the input block is the preferred use of the syntax.

lookup {
    cbase-input "other.cbase" {
      remove "C"
    }
    key "A" "A"
    column "B"
  }

Set dimensions as optional

This lookup flags column B as optional, that is, not a suggested dimension.

lookup {
  text-input "lookup.txt"
  key "A" "A"
  column "B" suggested-dimension=false
}

This lookup flags the columns other than B as optional. The columns can still be promoted for diving, but they are not dimcounted for the console.

lookup {
  text-input "lookup.txt" {
    include-other-columns suggested-dimension=false
  }
  key "A" "A"
  column "B"
}

An Updating Lookup

Suppose you have a column A in your input, and you want to bring in a possibly updated version of A from a lookup table. You could approach it as follows:

Perform the lookup, but bring in the column under a new name such as A_new.
Replace column A using the coalesce() function to combine the new and old value. Note that coalesce() returns the first non-null value it sees.
Remove column A_new.

build {
  text-input "data.txt" {
    column "Key" type="string"
    column "A" type="string"
  }
  lookup {
    text-input "lookup.txt" {
      column "Lookup Key" type="string"
      column "A" type="string"
      rename "A" "A_new"
    }
    key "Key" "Lookup Key"
    column "A_new"
  }
  replace "A" `coalesce(value("A_new"), value("A"))`
  remove "A_new"
  output "data.cbase"
}

Here's why this works.

When running the lookup:

A given key might have no match. In this case, A_new is null, and so coalesce() returns A and has no effect.
A given key does have a match, and it is not null. Then coalesce() returns this new value, A_new.
A given key does have a match, but it is null. Then coalesce() returns the original A, and has no effect.

That is how a Lookup of join type update works when using the Data Integrator.

You can even specify alternate null values to treat as null, the same as the Integrator Lookup object does when the attribute update_null_value is used:

text-input "lookup.txt" {
  column "Lookup Key"
  column "A" type="string" { 
    nulls "XXX"
  } 
  rename "A" "A_new"
}

Use Text Input

The input can be a simple or a more complicated text file. The text file needs to start with a row of column headers.

lookup {
  text-input "lookup-color.txt"
  key "Color" "Color"
  column "Wavelength"
  column "Hex"
  }

Use Text Input with a Range

A text file can include a range of values to use as part of the lookup process. For a lookup row to match, the main table's key value must be between the begin-range and end-range values, inclusively. The begin-range and end-range values can be null, in which case the range is open on either side (or both sides). Range lookup keys can be any type except string.

lookup {
  text-input "/data_sources/key_lookup_department_code.txt"{
    limit-similar-values-warnings 0
    column "Value" type="string"
    rename "Value" "Department Code"
    remove "Column Name"
    remove "Swim Lane"
    column "Eff Start Date" type="date" format="YYYY-MM-DD"
    column "Eff End Date" type="date" format="YYYY-MM-DD"
    column "Special Nursery Dept Key" type="string" suggested-dimension=false
    include-other-columns
  }
  range-key "Posting Date" "Eff Start Date" "Eff End Date"
  key "Department Code" "Department Code"
}

Use Dive Input

Results of a dive can be used to build the cBase, while the results of another dive can be used for lookup purposes.

build { 
  dive-input "/client-files/Monthly-Dive.dive" {
  }
  lookup {
    dive-input "/client-files/Master-Dive.dive" {
    }
    key "Drug" "Drug" 
  }
  output "/cbases/Monthly-Data-with-drugs.cbase" 
}

Use a Dive within a Lookup

The following sample Build script shows an example of a dive within a lookup. The first lookup is a regular lookup to get info values. The second lookup dives into another cBase to bring in summarized data. For example:

build {
  cbase-input "accounts_initial.cbase"
  
  lookup {
    text-input "days_since_last_discharge.txt" {
      column "Account ID" type="string"
      column "Last Discharge Date" type="date" format="YYYY-MM-DD"
      column "Days Since Last Discharge" type="integer"
    }
    key "Account ID" "Account ID"
  }
  
  lookup {
    dive-input {
      dive {
        cplan {
          cbase-input "charge_detail_initial.cbase"
        }
        window {
          dimension "Account ID"
          column "Charge Quantity"
          column "Charge Amount"
        }
      }
    }
    key "Account ID" "Account ID"
  }
  
  output "accounts.cbase"
}

The accounts and charges data sets have "Account ID" in common, but the charges data set has more detail, with one record per charge for the account. In order to bring charge information into the accounts data set, you need to summarize it—which is what the dive does.

The dive can be extracted and run separately.

Use Operations

This example sorts the data in order to identify duplicates, filters them out, and then adds a column to flag new items.

  lookup {
    text-input "/data/patient-data-*.dat" {
      column "Patient URN" type="string"
      sort {
        column "Patient URN"
      }
      add "dupe" ```
        if(on_previous_row(value("Patient URN"))
          !=value("Patient URN"),"N","Y")```
      filter `value("dupe")="N"`
      remove "dupe"
      add "New File Flag" `"Y"`
    }
    key "Patient URN" "Patient URN"
  }

Resolve Duplicates

Sometimes lookup tables have multiple rows with the same key.There is a resolve-duplicates tag for the lookup block so that you can specify how you want to handle the situation.

Consider this small data set listing 5 presidential names. You want to map the president's name to the number of his presidency.

Names of presidents

The lookup source could be like the following table (with two entries for Cleveland):

Lookup table

You can add the tag resolve-duplicates to return either value (any), the first value, or the last value. For example, this Dive would return 22 for president Cleveland:

dive {
     cplan {
       text-input "president-names.txt"
       lookup {
          text-input "presidents.csv"
          key "Name" "Name"
          column "Presidency"
          resolve-duplicates "first"
       }
     }
     window {
        dimension "Name"
        column "Presidency"
     }
}

Nested Lookups

The proper way to nest lookups is to put the inner lookup (and here the `add` it depends on) under the input tag of the outer lookup. For example:

build { 
  text-input "input.txt" { 
    column "A" type="string" 
    column "B" type="string" 
  } 
  lookup { 
    text-input "lookup_1.txt" { 
      column "A_lookup" type="string" 
      column "C" type="string" 
      add "D" `"D"` 
      lookup { 
        text-input "lookup_2.txt" {
          column "D_lookup" type="string" 
          column "E" type="string" 
        } 
        key "D" "D_lookup" 
      } 
    } 
    key "A" "A_lookup" 
  }
  output "my-output.cbase"
}

Concat Input for a Lookup

You can merge or concatenate two sources of the same type for a lookup using a consolidate block.

  lookup {
    consolidate {
      text-input "lookup-A1.txt"
      text-input "lookup-A2.txt"
    }
    key "A" "A"
    column "B"
  }

  lookup {
    consolidate {
      cbase-input "lookup-pre.cbase"
      cbase-input "lookup-post.cbase"
    }
    key "A" "A"
    column "B"
  }