Join Process Object
The Integrator Join process object performs SQL-style joins in Integrator. It discards records from one or both flows if they do not join any records from the other flow, and duplicates records when both flows contain multiple records. This expands on the ability of the Lookup process object in that it will implement standard SQL join types (inner, outer, left outer, right outer, merge), and will perform cartesian products for duplicate records. Unlike the Lookup object, it does not require the lookup flow to completely fit into memory. On the other hand, it may be slower if the lookup flow is not read into memory. It does require that both flows are sorted according to the join column(s). If the flows are not sorted properly, Integrator will display a warning message and the results of the join may not be correct. The autosort attribute can be used to automatically sort one or both of the Join inputs.
Join Attributes
| Attribute | Type | Description |
|---|---|---|
| process_type (required) |
String | Identifies the object as a Join process object. The value of this string is "join". |
| input (required) |
Array of Strings (size 2) | Defines the input flows for the join. The first flow is the left flow; the second flow is the right flow. Both flows should be sorted on the join column(s). |
| joins | Array of Strings (size 2) | Defines the columns that must match for the join. The first column name is a column in the left
flow; the second column name is a column in the right flow. Either the joins or multijoins attribute must be specified. They are mutually exclusive. |
| multijoins | Array of Strings | Indicates that each element of the array is a two-element array (pair) of column names. The first
element of the pair is a column name in the input (left flow). The second element of the pair is a
corresponding join column (right flow). Either the joins or multijoins attribute must be specified. They are mutually exclusive. |
| join_type | String |
Determines the type of join that is performed. If no join_type is given, then join_type defaults
to "inner". The join types in the Join object closely match classical relational database logic. The
following join types are supported:
NOTE: This attribute is Join Type in Visual Integrator. |
| autosort | String |
Automatically sorts one or both of the input flows in the Join object. It sorts an input flow by
creating an anonymous Sort object with sort columns matching the appropriate join columns, thus
saving the script writer the effort of writing the Sort object. The autosort attribute can have one
of the following values:
The sort uses default values for Sort attributes such as temp_directory and sort_size (see Sort Process Object). If these need to be set, then an explicit Sort object should be used. NOTE: This attribute is Auto Sort in Visual Integrator. |
| numeric_columns | Array of Strings | Identifies join columns that are sorted numerically as opposed to alphabetically. This is similar to the numeric_columns attribute in the Sort process object (see Sort Process Object). If a column is being joined using different names for the left and right input flows, only the name for the left input flow should appear in this list. |
| case_sensitive | Boolean |
Controls whether comparisons are case sensitive or case insensitive. The default is "false", so if the attribute is not given, comparisons will be case-insensitive. See locale below. NOTE: This attribute is Case Sensitive in Visual Integrator. |
| locale | Boolean | Controls whether or not comparisons for join columns are done in a locale-specific way. The default is "false". This is similar to the locale flag in the Sort process object (see Sort Process Object). When used, this attribute overrides the case_sensitive attribute. |
| trace_after | Sub-object | Traces data flows leaving the specified object, which makes debugging scripts easier. This is equivalent to adding a Trace process object immediately after the current object. See Embedded Trace Object for more on using trace sub-objects. |