Text, cBase, and Dive Input Build Tags
A Spectre Build script is defined using a build tag with optional tags between open and close { } braces. This page describes the build tags that can be used within the text-input, cbase-input, , and dive-input code blocks. The tags are listed in alphabetical order.
Text, cBase, and Dive Input Build Tags
Defines the block for this build description.
For example:
build {
...
}
- standard—Gregorian calendar; period type options are year, year-half, year-quarter, year-tertile, and year-month.
- offset <month>—Fiscal year similar to the standard calendar except starting on a different month; period type options are the same as standard calendars.
- iso 8601—ISO week-numbering system; period type options are year and year-week.
- custom—Custom calendar created from a text file with a contiguous range of dates; required columns are date and year, optional columns are half, quarter, tertile, month, and week.
See General Build Tags for a different use of calendar as a Build tag for calendar declarations.
For example:
column "Quarter" type="period" calendar="standard" period-type="year-quarter"
column "Fiscal Month" type="period" calendar="offset August" period-type="year-month"
Supports a period type column. Valid calendars are:
- gregorian month—Gregorian year-month
- gregorian quarter—Gregorian year-quarter
- iso8601 week—ISO-8601 year-week
- gregorian+<N> month—Offset Gregorian year-month for fiscal calendars. Specifying "gregorian+6 month" means "add six months to the real year-mo to get the fiscal year-mo", which is like saying "The fiscal year starts in July".
- <custom calendar file>—For more tailored fiscal calendars
See Spectre Calendars.
For example:
column "Sales Quarter" type="period" calendar="gregorian month" format="YYYY/MM" input-format="YYYY/M"
column "Accounting Period" type="period" calendar="iso8601 week"
column "Order Month" type="period" calendar="gregorian+9 month"
column "Sales Month" type="period" calendar="my-custom-calendar.cbase"
Specifies one or more cBases as a source of data. Can be combined with text-input. Columns do not need to match. Columns in secondary inputs need to be explicitly declared to be included. Missing columns are treated as nulls unless an add operation is included in the build block. See the text-input tag later in this table and the Build Operations Tags in General Build Tags.
- When using cbase-input, all columns are included in the output. If you need to eliminate some columns, use the remove build operation.
- cBase input is useful for working with intermediate data sets—already validated and typed so there is no need to repeat the labor intensive text parsing. Also, cBase input is useful for consolidating period based data sets.
For example:
build {
text-input "monthly.txt"
cbase-input "/cbases/revenue.cbase"
output "/cbases/revenue_plus.cbase"
}
Defines the columns for the cBase when using text-input.
- If column headings exist, Spectre attempts to auto-detect the type; however, it is a good idea to check these and correct as necessary to ensure faster processing and less memory usage.
- If column headings do not exist, you must specify all columns for Spectre to know the names. The empty string cannot be used as a column name. Column names are case insensitive.
- If no columns are specified, all are brought in.
- It is a best practice to specify which columns to keep rather than which to drop.
Column tags are often followed by type, calendar, format, or input-format options to clarify the data being read and added to the cBase. Specifying the type is a best practice.
When used in a lookup block, column supports the suggested-dimension tag. This allows you to flag the additional column as an optional dimension, avoiding the dimcount process used for ProDiver's console display.
NOTE: The column tag cannot be used with cbase-input.
For example:
column "Order Type" type="string"
column "Liters Per Case" type="double"
column "Event Day" type="date" format="MMMM D, YYYY"
column "Return Date" type="datetime"
column "Sales-Period" type="period" calendar="iso8601 week"
column "Revenue" type="fixed100" format="$#,#.00"
column "Invoice-Request" type="boolean" format="Y;N"
column "Ship Date" input-format="YYYY/MM/DD" format="YYYY-MM-DD" type="date"
column "Price" type="fixed100"
lookup {
text-input "lookup.txt"
key "A" "A"
column "B" suggested-dimension=false
}
Defines custom properties for the cBase. Custom properties are not used by Spectre but are available for users to add their own metadata to Spectre objects. Custom properties can be referenced in Spectre expressions using functions like column_custom_property(), table_custom_property(), or calc_custom_property(), depending on the context.
See Spectre Properties.
For example:
column "units_sold" type="integer" {
custom-property "My Prop" "A"
}
custom-property "Abbreviation" "Dir"
custom-property "Footer" "ABC Company®, Copyright 2019."
Specifies the delimiter for the flat file. If delimiter is not specified, Spectre attempts to auto-detect the value. Best practice is to specify the delimiter value. The value must be a string and is either a single character (for example, "\t" for tab, "," for comma, " " for space, ";" for semicolon, "|" for pipe) or the special strings "tab" or "comma".
For example:
build {
text-input "/data/sales.txt" delimiter="," {
...
}
}
Adds the system property Description for the column.
For example:
column "Str" {
property "Description" "Column Desc"
}
Turns off data warnings of similar values for this column. Use only after data validation steps have determined that the warnings are in fact not worrisome.
For example:
column "Address line 1" type="string" disable-similar-values-warnings=true
Specifies a dive file to use as input. The Dive is executed and the result used as a build input verbatim. Multiple dive-input tags are allowed and mix freely with the other inputs. Use a Dive as input to squash the data, that is, create a single record for a series of dimensions.
TIP: When an input lookup file has duplicate keys, you can dive into the input to remove the duplicates and lookup against that instead.
NOTE: There are limits to the types of dive windows that can be passed to Spectre build, for example, a crosstab Dive is not supported.
For example:
build {
cbase-input "Sales.cbase"
dive-input "Profit.dive"
output "Sales-and-Profit.cbase"
}
build {
cbase-input "Sales.cbase"
dive-input {
dive {
cplan "/cplans/transactions.cplan"
window {
dimension "Dept"
dimension "Item Type"
column "Total Charges"
column "Total Payments"
}
}
}
output "/cbases/SalesExpanded.cbase"
}
Drops the named column from the input file when creating the cBase.
When using text-input, if you want to specify only which columns to keep, so that the build drops all others, use the column tag to specify the columns.
Keep in mind that dropping columns only affects the detail file size and build time, it does not affect the performance of dives.
For example:
drop-column "Customer Name"
Defines the encoding used in the input and output files. Encoding types supported include: cp-1252, iso-8859-1, utf-16be, utf-16le, utf-8, and gb18030.
- A best practice is to specify the encoding of the input data.
- For text inputs, Spectre analyzes the first few megabytes of the file to attempt to determine the encoding.
- For databases, see ODBC Input Build Tags.
- Spectre Build's informational output indicates which encoding is in use.
- Starting with version 27, any incoming bytes that are not valid per the specified or assumed encoding are replaced with placeholders; these appear as squares in ProDiver.
- Non-Unicode DI clients are Unicode-aware.
- See also Integrator Unicode Data Support.
For example:
text-input "/data/sales.txt" encoding="utf-8" {
...
}
Adds a column with the input filename. When reading multiple files, this tag captures the filename on each row in the cBase. In this example, the resulting "Text Source" column might have values like "data-1.txt" and "data-2.txt".
For example:
text-input "data-*.txt" {
...
filename-column "Text Source"
}
Filters out records that are not needed for the cBase.
This example sorts on two columns and checks for a change in value to filter out all but the most recent record for the ID.
build {
text-input "data.txt" {
column "ID" type="string"
column "Status" type="string"
column "Status date" type="date"
}
sort {
column "ID"
column "Status date" reverse=true
}
filter `on_previous_row(value("ID")) != value("ID")`
output "data.cbase"
}
Sets both the display format and the input properties, unless a separate input-format is specified.
Formats the 'date' or 'period' data for the cBase column.
- To store dates like July 4, 2023, use format="MMMM D, YYYY".
- To include letters such as FQ (fiscal quarter), as in YYYY-FQQQ, backslashes are needed to force the characters to be taken literally, and another backslash is needed to escape each backslash in the string.
Formats numeric data, for example as currency.
Also indicates how to format Boolean data in the cBase column.
- The default format is "True;False".
- Other alternatives are "T;F", "Yes;No", and "true".
- If the format has a blank after the semicolon, or if there is no semicolon, then blank values are considered false. Otherwise, when there is a semicolon and the string after it is not empty, blank values are parsed as null.
Formats the datetime data.
- To indicate the data is zoned time and should be stored as GMT, use the zone option.
- To specify the default time zone for display, use the time-zone tag.
See also: Spectre Format Conventions.
For example:
column "Request Date" type="date" format="YYYY-MM-DD" input-format="YYYY-M-D"
column "fiscal quarter" type="period" calendar="MyFiscal" format="YYYY-\\F\\QQQ"
column "Ship Quarter" type="period" calendar="gregorian quarter" format="YYYY-\\QQ"
column "year_month" type="period" calendar="gregorian month" format="YYYY-MM"
column "Price" format="$#,0.00"
column "Revenue" type="fixed100" format="$#,#.00"
column "Active" type="boolean" format="T;F"
column "Times" type="datetime" format="YYYY/MM/DD hh:mm:ss {zone:short}" time-zone="America/New_York"
Indicates the presence or absence of column headings in the flat file. Columns with the heading "" are ignored. Set to False if the input file does not contain headings. Be sure to list each column by name and type.
For example:
text-input "/data/sales.txt" headers=false {
...
}
Ensures no columns are dropped. Use when you explicitly define a few columns with special properties and then want to ensure that all data is included in the build.
When used in a lookup block, include-other-columns supports the suggested-dimension tag. This allows you to flag the additional columns as optional dimensions, avoiding the dimcount process used for ProDiver's console display.
The include-other-columns tags are often followed by type and summary-type options to clarify the data being read and added to the cBase.
If no columns are specified in a Build script, include-other-columns is assumed to avoid building a cBase with no columns.
For example:
build {
text-input "basic.txt" {
column "Color" label "Favorite"
include-other-columns
}
output "basic.cbase"
}
lookup {
text-input "lookup.txt" {
include-other-columns suggested-dimension=false
}
key "A" "A"
column "B"
}
include-other-columns type="string"
include-other-columns type="integer" summary-type="sum"
Clarifies the incoming data.
- For dates and periods, the default format is YYYY/MM/DD. Use input-format to define the date format in the input file to ensure correct interpretation of date data.
- For Boolean data, use input-format to define the data as it appears in the input file.
Also see format.
For example:
column "Date" type="date" format="MMM D, YYYY" input-format="YYYY-MM-DD"
column "Active" type="boolean" format="yes;no" input-format="Y;N"
Defines the display label used for a column. This is a system property for the column.
For example:
column "Color" type="string" label="favorite"
Limits the number of lines with problems that are logged before the text reader stops logging the errors.
For example:
text-input "/data/sample.txt" {
limit-line-warnings 20
column "group" type="string"
...
}
Directs the build to use a specific number of rows from each input file. If multiple input files are used, the first 100 rows are read from each file, ensuring a complete sampling of data values.
TIP: Using limit-rows=1 is an easy way to ensure calcs find all needed columns in the input.
For example:
build {
text-input "basic.txt" limit-rows=100 {
...
}
}
Limits the 'similar-value' check for the entire text-input. The default is 20 warnings per column. Set to zero to turn off all warnings for the input. See also disable-similar-values-warning.
For example:
text-input "x.txt" {
limit-similar-values-warnings 2
column "Address line 1" type="string"
...
}
Limits the number of warnings logged for issues with values in the source file.
For example:
build {
text-input "/data/folder_properties.txt" {
column "folder" type="string"
column "group" type="string"
limit-value-warnings 10
}
...
}
Maps known values used to represent 'no data' to the null value in the cBase.
For example:
column "phone" type ="string" {
nulls "xxx" "N/A"
}
column "order date" type="date" {
nulls "0000/00/00"
}
Passes a parameter value from the Build to the Dive; the Dive must have take-parameter.
For example:
dive-input "insurances.dive" {
pass-parameter "file"
}
Indicates the period type being applied for the column. If the calendar tag is omitted, the default calendar is used. See the calendar tag description.
For example:
column "Ship Month" type="period" calendar="standard" period-type="year-month"
column "YearMo" type="period" period-type ="year-month"
Specifies a system property for the column.
See Spectre Properties.
For example:
column "units_sold" type="integer" format="#,#" label="Units Sold" {
property "Description" "How many units were sold"
}
Indicates how to handle quotes. By default, Spectre looks for quotes and keeps strings intact.
Set quote=false for Spectre to ignore any quotes found in the data file. This is useful when the input does not use quotes consistently.
For example:
text-input "/data/sales.txt" quotes=true {
...
}
text-input "notes.txt" quotes=false
Sets a column to behave like classic Info Fields. In ProDiver, this field is available as a column that you can add to other columns displayed in a dive. In this example, Product Status becomes an Info Field on Status in the client console. In other words, Product Status only displays if Status is in the dive window.
See Defining Info Relationships in Build Code Samples.
For example:
column "Product Status" type="string" required-dimension="Status"
Specifies an alternate sort. When specifying columns, use sort-by to order them based on another column's value.
- An alternate sort can only be applied to a string column, and the sort-by column must be a string or integer column.
- You cannot add, remove, or change alternate sorts after a cBase is built.
- The sort-by column does not need to be built into the cBase.
- Dive and cPlan files also have a sort tag, but for string data, rely on alternate sorts established (if any) when the cBase is built. Dives can also specify sorting on numeric columns.
-
Spectre uses "numeric collation" when sorting strings. When a subset of a string looks like a number, then the numeric value is considered during the sort. For example, Spectre sorts "a2" before "a10". See also sort-natural.
-
Null strings sort at the end.
- If two distinct strings in the column (to be sorted) are set to sort as the same value in the sort-by column, then the normal ordering of those distinct strings will resolve the tie.
-
If a row has the value null or the value unknown in the column with an alternate sort, then the sort-by column must also have the same value.
-
If the sort-by column has type string AND has the value null or the value unknown, then the column to be sorted must have the same value.
- Spectre issues a warning if you attempt to alternate sort null as non-null, or unknown as not-unknown.
For example:
column "Symbol" type="string" sort-by="Number"
column "Number" type="integer"
column "Priority Sort" type="integer"
column "Priority" type="string" sort-by="Priority Sort"
Disables natural string sorting in text-input. The default is true—Spectre interprets value of digits as decimal numbers and sorts according to the numeric values in the string.
TIP: This feature only makes sense together with sort-by if the sort-by column is also a string column.
For example:
column "ICD-10-code" type="string" sort-natural=false
add "diagnosis" `value("Diagnosis Code")+" -- "+value("Diagnosis Desc")` sort-natural=false
Allows you to promote (true) or demote (false) a column.
- Set to true to promote a numeric column type to become a diveable dimension and have it appear in the ProDiver console.
- Set to false to demote a column from classification as a dimension so it does not appear in the ProDiver console.
- When set to false, the new dimension does not appear in the console (which requires dimcount processing), but it can still be promoted to a dimension for diving—equivalent to the concept of a dynamic dimension for a classic model.
- Using required-dimension on a column effectively sets suggested-dimension=false.
For example:
column "Product Class" type="string" suggested-dimension=false
Sets the column's summary type explicitly. Any summary function that takes a single argument (except for count and percentile) are allowed. See Summary Functions.
Most common are sum, info, any, first, and last. The default for numeric data is sum; use kahan_sum if concerned about precision; use const for a column with only one value. The rest of the functions (for example, min, average, median) are better done as a separate calc (that is, not the default one).
For example:
column "Quantity In Stock" type="integer" summary-type="sum"
column "OrderNum" type="integer" summary-type="max"
column "CustomerNum" type="integer" summary-type="info"
include-other-columns type="integer" summary-type="sum"
Specifies one or more text files as a data source. Also note:
- Wildcard characters ("*" and "?") are supported.
- Zip files are supported, and the files within are treated as multiple files.
- Windows has a limit of 16,384 files. On Linux, the limit is set by an administrator.
- A build block can include both text-input and cbase-input tags.
- Columns from all inputs go into the cBase. Columns in secondary inputs need to be explicitly declared to be included. If an input does not have a column, the cBase column is created with nulls. If nulls are not appropriate, an add operation can be used under the input to specify the missing values.
- Operations outside the input block operate on the union of the inputs for the build.
- If multiple text files are listed using a single text-input tag, there is an option to parse the files in parallel when building the cBase. (See Spectre build Options and Build cBase Process Node.)
-
When there is no child block after text-input, all columns are used and their types are inferred. Use of a child block allows you to limit columns and apply build operations. See the table for Build Operations Tags under General Build Tags.
TIP: The input files can use a wildcard character in the filename, however, it does not allow recursively listing through subfolders. A Production extension can be used to process a list of files in subfolders and manipulate a Build script so that it names each individual input file.
For example:
build {
text-input "/data/sales.txt"
output "/cbases/sales.cbase"
}
build {
text-input "/months/jan.txt" "/months/feb.txt" "/months/mar.txt"
...
}
build {
text-input "/data/batch_item/2013*_HMA.dat" {
column "Batch Date" type="date" format="YYYY-MM-DD" input-format="YYYY-M-D"
column "Journal Code" type="string"
column "Batch Number" type="string"
...
}
output "/publish/cbases/HMA_batch_item.cbase"
}
build {
text-input "/data/dairy.csv" delimiter=","
output "/cbases/dairycsv.cbase"
}
build {
text-input "/data/months-data.zip"
output "/cbases/months.cbase"
}
Removes any leading or trailing spaces from the data. This is the default behavior for Spectre. Set trim=false to disable the trimming of white space.
For example:
text-input "/data/sales.txt" trim=true {
...
}
Specifies the type of data in the column.
- Valid values are boolean, date, datetime, double, fixed100, integer, period, and string. See Spectre Data Types.
- If the type is "period", a calendar tag is required. Here in the final snippet (deprecated), the format tag says to use the standard Gregorian counterparts—for example: 2016-01 (2015-Jul).
- The type for date assumes that the input has a formatted date, not the binary date value.
For example:
column "Qty Ordered" type="integer"
column "Telephone Number" type="string"
column "Order Date" type="date"
column "Ship Time" type = "datetime"
column "Margin" type="fixed100"
column "Account Closed" type="boolean" format="Yes;No"
include-other-columns type="string"
column "Quarter" type="period" calendar="standard" period-type="year-quarter"
column "Fiscal Month" type="period" calendar="offset August" period-type="year-month"
column "Fiscal Year-Mo" type="period" calendar="gregorian+6 month" format="YYYY-MM (GYYYY-GMMM)"
See also:
- ODBC Input Build Tags
- General Build Tags
- cBase Objects in Visual Integrator
- Build cBase Process Node in Production