VI Squash Process Object

The Visual Integrator (VI) Squash process object combines rows that are identical for a set of dimension columns.

Optionally, the Squash process object summarizes numeric columns by calculating the total, minimum, maximum, average, or standard deviation. It is equivalent to the Builder squash attribute or an SQL GROUP BY statement.

NOTE:

There must be enough available memory to contain all the summarized rows at the same time.
For large amounts of data that require sorting, use a Sort process object before the Squash process object. For more information, see VI Sort Process Object.

The Squash process object has three panes where you set attributes.

VI Squash Object All Panes

Object Attributes

You set attributes for the Squash process object in the object attributes pane.

VI Squash Attributes

Attribute	Description
Input	Defines the object from which the data flow arrives. Use one of the following methods to attach the Squash object to an input data flow object: Right-click the input data flow object, and click New Object after this > Process > Squash. Place the Squash object on the task flow, and click its Input attribute's browse button. Move the pointer to the task flow, and note that a dotted-line connector is attached to the Squash object. Attach this connector to the input data flow object. NOTE: The Squash object's Input attribute is now populated with the name of the input data flow object.
Adjacent	This optional attribute determines whether or not the Squash object is limited to adjacent rows. true—Only combines adjacent rows that have identical dimension values. The squash can perform this operation by storing a single row in memory. This assumes that the data is sorted. false (default)—A sort is performed. All the data needs to fit into memory.
Auto_Sort	This optional attribute determines whether or not the Squash object automatically sorts the input flow according to the squash dimensions, which is needed to squash adjacent rows. This attribute is only recognized if the Adjacent attribute is set to true. true—The Squash object creates an anonymous Sort object with sort columns matching the squash dimensions. This uses default values for Sort attributes, such as temp_directory and sort_size. If you need to set these attributes or your data is larger than the available memory, use an explicit Sort object before the Squash object. false (default)—The Squash object does not perform an automatic sort on the input flow.
Count_Column	Defines an output column that will contain the count of rows with identical dimension values that have been squashed into a single output row. This option is equivalent to totaling a column containing the number 1.
Warn_Limit	This optional attribute controls how many data integrity warnings are displayed in the log. By default, this attribute is not set. If a limit is set and reached, the log displays the following message: `Further data integrity warnings suppressed for Squash object "<object_name>". Increase the warn_limit attribute to see more.`
Keep_All_Other_Columns	Determines whether or not all additional columns (beyond those that have check marks in the Dimension or Summary check boxes) are kept and passed to the output data flow. true—All input columns are kept even those that have a check mark in the Remove check box. false (default)—No additional columns are kept. Use the Other and Remove check boxes to select other individual input columns to be kept or removed.

Column Grid

The Squash column grid displays the input columns and the squash attributes, which you can set.

Attribute	Description
Input Column	Displays the name of each input column. This attribute is read-only.
Source Object	Displays the name and object type of the source object. Double-click the Source Object for a column to change the task flow focus to that object.
Dimension	Defines the input columns used to combine rows. Rows that have identical values for all the dimension columns are combined. Select the Dimension check box for columns that you want to set as dimensions. At least one input column must be marked as a dimension. A number is automatically entered in the row's Dim Order.
Dim Order	Defines the dimension order for input columns with a Dimension check mark. The Dim Order is automatically set in the order that you select the Dimension check box for input columns. You can edit these numbers to change the order. If you assign a Dim Order value for an input column not already marked as a dimension, a check mark is added to the Dimension check box.
Summary	Defines input columns that are totaled. These columns are summed into summarized output rows in the data flow. Select the Summary check box for numeric columns that you want summarized. These columns must be numeric or blank (null). Null values are treated as zeros and do not effect the summary calculation, but return null if all values are null.
Other	Defines input columns that are preserved and passed to the output data flow. Select the Other check box for input columns that you want to pass to the output data flow. This optional attribute is useful for keeping data intact. The Other and Remove check boxes are mutually exclusive. If a column has the Remove check mark, selecting the Other check box removes the Remove check mark.
Remove	Manages which columns are removed from the output data flow. Select the Remove check box for columns that you want to explicitly suppress from the output data flow. NOTE: Use the Remove check boxes only when no Keep check boxes are checked. The Remove and Other check boxes are mutually exclusive. If a column has the Other check mark, selecting the Remove check box removes the Other check mark.
Average	Defines input columns that are combined by calculating the average value of the squashed rows. Select the Average check box for numeric columns that you want to calculate the average values for. The data flow output column is named Average <column_name>. These columns must be numeric or blank (null). Null values are treated as having no data and do not effect the average calculation.
Max	Defines input columns that are combined by calculating the maximum value of the squashed rows. Select the Max check box for numeric columns where you want to calculate the maximum value. The data flow output column is named Max <column_name>. These columns must be numeric or blank (null). Null values are treated as having no data and do not effect the maximum calculation.
Min	Defines input columns that are combined by calculating the minimum value of the squashed rows. Select the Min check box for numeric columns where you want to calculate the minimum value. The data flow output column is named Min <column_name>. These columns must be numeric or blank (null). Null values are treated as having no data and do not effect the minimum calculation.
StdDev	Defines input columns that are combined by calculating the standard deviation of the column for the squashed rows. Select the StdDev check box for numeric columns where you want to calculate the standard deviation. The data flow output column is named Std Dev <column_name>. These columns must be numeric or blank (null). Null values are treated as having no data and do not effect the standard deviation calculation.

NOTE: Keep in mind the following while setting the check boxes in the Squash process object column grid:

Input columns can be set as one of the following: Dimension, Summary, or Other. At least one dimension is required.
Only input columns set for Dimension, Summary, Other, Average, Max, Min, or StdDev are passed into the output data flow.
Summary, Average, Max, Min, and StdDev can be set only for numeric columns (Summary check mark).
The Other and Remove check boxes are mutually exclusive.