Join Process Object

The Integrator Join process object performs SQL-style joins in Integrator. It discards records from one or both flows if they do not join any records from the other flow, and duplicates records when both flows contain multiple records. This expands on the ability of the Lookup process object in that it will implement standard SQL join types (inner, outer, left outer, right outer, merge), and will perform cartesian products for duplicate records. Unlike the Lookup object, it does not require the lookup flow to completely fit into memory. On the other hand, it may be slower if the lookup flow is not read into memory. It does require that both flows are sorted according to the join column(s). If the flows are not sorted properly, Integrator will display a warning message and the results of the join may not be correct. The autosort attribute can be used to automatically sort one or both of the Join inputs.

Join Attributes

Attribute Type Description
process_type
(required)
String Identifies the object as a Join process object. The value of this string is "join".
input
(required)
Array of Strings (size 2) Defines the input flows for the join. The first flow is the left flow; the second flow is the right flow. Both flows should be sorted on the join column(s).
joins Array of Strings (size 2) Defines the columns that must match for the join. The first column name is a column in the left flow; the second column name is a column in the right flow.
Either the joins or multijoins attribute must be specified. They are mutually exclusive.
multijoins Array of Strings Indicates that each element of the array is a two-element array (pair) of column names. The first element of the pair is a column name in the input (left flow). The second element of the pair is a corresponding join column (right flow).
Either the joins or multijoins attribute must be specified. They are mutually exclusive.
join_type String Determines the type of join that is performed. If no join_type is given, then join_type defaults to "inner". The join types in the Join object closely match classical relational database logic. The following join types are supported:
  • inner—Only rows that have matching join columns in both input flows are returned in the output flow. If a flow has duplicate rows with the same join column values, then a cartesian product of the matching rows are returned in the output flow. If a column exists on both flows, its "left" flow values are kept.
  • outer—All rows are returned in the output flow. If a row in one flow does not have a match in the other flow, then it is matched with a row of blank values. As with the inner join, a cartesian product is returned for duplicate rows when present in both flows.
  • left outer—All rows from the first (left) flow are returned. If the join column values for a row in the left flow does not match the join column values for any row in the right flow, it is matched with a row of blank values for right flow columns. If a flow has duplicate rows with the same join column values, then a cartesian product of the matching rows are returned in the output flow.
  • right outer—All rows from the second (right) flow are returned. If the join column values for a row in the right flow does not match the join column values for any row in the left flow, it is matched with a row of blank values for left flow columns. If a flow has duplicate rows with the same join column values, then a cartesian product of the matching rows are returned in the output flow.
  • merge—The two input flows are merged according to the sorted join columns. This is the same as using the Concat object on the two input flows, except that the rows from the input flow are interleaved based on the sort instead of one input flow appearing after the other. This join is useful in situations when the user wants to maintain the sort order of the rows. All rows from the input flows will appear in the output flow.

NOTE: This attribute is Join Type in Visual Integrator.

autosort String Automatically sorts one or both of the input flows in the Join object. It sorts an input flow by creating an anonymous Sort object with sort columns matching the appropriate join columns, thus saving the script writer the effort of writing the Sort object. The autosort attribute can have one of the following values:
  • false—Does not autosort. (default)
  • true—Both flows are sorted according to their respective join columns.
  • left—The left or first flow is sorted according to its join columns.
  • right—The right or second flow is sorted according to its join columns.

The sort uses default values for Sort attributes such as temp_directory and sort_size (see Sort Process Object). If these need to be set, then an explicit Sort object should be used.

NOTE: This attribute is Auto Sort in Visual Integrator.

numeric_columns Array of Strings Identifies join columns that are sorted numerically as opposed to alphabetically. This is similar to the numeric_columns attribute in the Sort process object (see Sort Process Object). If a column is being joined using different names for the left and right input flows, only the name for the left input flow should appear in this list.
case_sensitive Boolean

Controls whether comparisons are case sensitive or case insensitive. The default is "false", so if the attribute is not given, comparisons will be case-insensitive. See locale below.

NOTE: This attribute is Case Sensitive in Visual Integrator.

locale Boolean Controls whether or not comparisons for join columns are done in a locale-specific way. The default is "false". This is similar to the locale flag in the Sort process object (see Sort Process Object). When used, this attribute overrides the case_sensitive attribute.
trace_after Sub-object Traces data flows leaving the specified object, which makes debugging scripts easier. This is equivalent to adding a Trace process object immediately after the current object. See Embedded Trace Object for more on using trace sub-objects.