Filein Input Object
An Integrator Filein input object accepts input from one or more external text files, described either by an external dictionary file or by column headers within the files. When an external dictionary is used, these files can be either variable (delimited) or fixed format. For input objects, input_type defaults to "filein". A Filein object can be as brief as a filename and a file_type (column_headers or DBF).
If the file is described by a DI dictionary, the column names and attributes are taken from the dictionary. If a list of file names is given, the input is formed by concatenating all of the files. Multiple files listed in the same input object must have the same format. If the file formats differ, consider using the Concat process object to create a single flow (see Concat Process Object).
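The concatenation rule above can be modeled in a few lines of Python. This is a minimal sketch, not Integrator's implementation: it assumes UTF-8, tab-delimited files that each begin with a header line, and the helper name concat_same_format is hypothetical. Each file's header is consumed rather than emitted as data, and a file whose header differs from the first is rejected, mirroring the same-format requirement.

```python
import csv
from typing import Dict, List, Optional


def concat_same_format(paths: List[str], delimiter: str = "\t") -> List[Dict[str, str]]:
    """Concatenate rows from files that share one header layout.

    Hypothetical helper modeling the documented behavior; assumes UTF-8,
    delimited files that each start with a header line.
    """
    header: Optional[List[str]] = None
    rows: List[Dict[str, str]] = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f, delimiter=delimiter)
            file_header = next(reader)  # consume the header; it is not a data row
            if header is None:
                header = file_header
            elif file_header != header:
                # Differing layouts are not concatenated here (see Concat)
                raise ValueError(f"{path}: header {file_header} differs from {header}")
            for fields in reader:
                rows.append(dict(zip(header, fields)))
    return rows
```

The explicit header comparison is a choice made for the sketch: it makes a layout mismatch fail loudly instead of silently misaligning columns.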
Filein Attributes
Attribute | Type | Description |
---|---|---|
input_type (required) | String | Identifies the object as a Filein input object. The value of this string is "filein". This is the default type for an INPT object if input_type is omitted. |
file_type (required) | String | Specifies the file type of the input file and how columns are named in the file. Values include "column_headers", "standard" (columns described by a dictionary; see dictfile and dictobj), and "DBF". |
filename | String | Defines the file name for a single input file. |
filenames | Array of Strings | Defines the file names for multiple input files. The files are logically concatenated before being read, and repeated column header lines are stripped. |
starname | String | Defines a "star name" (a wildcard file-match string) that selects a set of input files. For example, starname = "*.dat" and starname = "*.DAT" both match all .dat files. When the starname attribute is used, files are returned in the order in which they appear in the directory, which is not necessarily alphabetical. This order varies across systems and is likely related to how files have been added to and deleted from the directory. Scripts should not rely on the order of files matched by a starname. |
starnames | Array of Strings | Allows multiple starname strings to be used to specify file names. A starname string that does not match any files is ignored. NOTE: The attributes filename, filenames, starname, and starnames are mutually exclusive; use only one of them. One of these attributes, or the file_list_input attribute, is required. |
file_list_input | String | Specifies a separate input flow that generates a list of file names. This string is an object name, not a file name. The input flow should have a column named "filename"; the Filein object opens the values in this column as its list of input files. Using file_list_input therefore requires two input objects: one that produces the file list and a second that reads the listed files. |
union | Boolean | When "true", concatenates multiple input files and produces the union of their input columns. This attribute may be "true" only when reading files with column headers. The result is similar to the way the Concat process object combines multiple input flows. If the output flow includes a column that does not appear in every file, that column is blank in the rows that come from files lacking it. This lets the Filein object read a series of files to which columns were added over time without adding those columns to the earlier files; that is, the columns and their order can change over time. This attribute is optional, but recommended when reading a series of files. |
delimiter | String | Specifies the delimiter used to separate columns in variable-format files. If not specified, ASCII tab is used. Choices are: Required with file_type "column_headers" to read a variable-format file correctly. |
newline | String | Specifies a newline character, as a string containing exactly one character from the input file. The specified character is replaced with a new line. For example, before a newline character is specified, the data is displayed on a single line as 483574387548~4434839~4782939029~ After "~" is specified as the newline character (newline = "~"), the data is displayed as three lines: 483574387548, 4434839, and 4782939029. If not specified, the default newline is a carriage return (ASCII 13), a line feed (ASCII 10), or a carriage return followed by a line feed. |
require_crlf | Boolean | Determines whether a carriage return followed by a line feed is required to mark the end of a line. If this attribute is "false", a carriage return, a line feed, or a carriage return followed by a line feed ends a line. If this attribute is "true", both characters must be present. This is useful for reading the output of certain Microsoft software that exports single line feeds, without a carriage return, from within fields. If this attribute is "true", the newline attribute cannot be used. To replace the line endings embedded within a column value, you can use an expression such as translate(MyColumn, concat(chr(10),chr(13)), "\|%"), where MyColumn is the name of the column that contains the line endings, chr(10) is the line feed (LF) character, and chr(13) is the carriage return (CR) character. The line feed is replaced with \| and the carriage return with %. You can substitute any characters that suit your data for \| and %. NOTE: This attribute is Require CRLF in Visual Integrator. |
dictfile | String | Defines the file name of the dictionary that describes the file columns (for example, Sales.dic). This attribute is used with old-format dictionaries, which list both the column names and the data categories. When the file_type attribute is set to "standard", either this attribute or the dictobj attribute must be defined. The dictfile and dictobj attributes are mutually exclusive; do not define both. NOTE: This attribute is Dict File in Visual Integrator. |
dictobj | String | Defines the object name of a dictionary object that lists the file columns. This dictionary object should appear in a separate script file as a 'DICT' object. This attribute is used with new-format dictionaries. For information on the dictionary object, see Dictionary Input Object. When the file_type attribute is set to "standard", either this attribute or the dictfile attribute must be defined. The dictfile and dictobj attributes are mutually exclusive; do not define both. NOTE: This attribute is Dict Obj in Visual Integrator. |
filename_column | String | Indicates the name of a new column in the output flow that contains the input filename for each row of data. NOTE: This attribute is Filename Column in Visual Integrator. |
first | Integer | Limits the number of records read from the input file: Integrator reads up to the specified number of records. This limit is particularly useful for testing a script on a small number of input records. If this optional attribute is not used, all rows are returned. |
ignore_line_end | Boolean | Specifies whether parse errors involving the end of an input line (for example, "Fixed field occurs past end of line" or "Too few fields") are ignored. This attribute controls processing when a row has too few data items. If "true", Integrator processes the line. If "false" (the default), Integrator prints a warning message and skips the line. NOTE: This attribute is Ignore Line End in Visual Integrator. |
ignore_extra_columns | Boolean | Displays or ignores extra columns that appear in the input following the last column described in the column headers or dictionary. This attribute helps control processing when there are too many data items in a row of the input. Values include: If this attribute is not used, "false" is assumed. NOTE: This attribute is Ignore Extra Columns in Visual Integrator. |
ignore_quotes | Boolean | Turns off the special handling of beginning and ending double quotes, so that they are kept in the column value along with any embedded quotes. This optional attribute should be used only in special cases. By default, if a field starts with a double quote ("), that quote is stripped, along with any trailing double quotes. If the delimiter is a comma, commas within double quotes are kept as part of the column value rather than being parsed as delimiters. When ignore_quotes = "true", double quotes are passed through for processing, so they appear in the column values. NOTE: This attribute is Ignore Quotes in Visual Integrator. |
strict_quotes | Boolean | If this attribute is "true", delimiters found within a quoted string are always treated as part of the quoted string rather than as the start of a new field in a variable-format file. This is always the behavior when the delimiter is a space, a comma (,), or a semicolon (;). By default, Integrator treats other delimiters, such as tab, as the end of the field even within quotes, on the assumption that the quoting in such fields may be incorrect. |
aliases | Array of Strings | Defines new names for columns already defined in the input data, in the format "oldname=newname". Blanks before or after the column names are ignored; spaces within a column name are acceptable. If newname is blank, the column is deleted from the output flow. NOTE: This attribute is Alias Lines in Visual Integrator. |
prefix | String | Defines a prefix that is prepended to the names of all columns in the flow that are not renamed by the aliases array. If you want a space between the prefix and the column name, include the space at the end of the prefix string. |
keep_columns | Array of Strings | Defines a list of columns to be kept by the input object. If this attribute is not used, all columns are kept. The output flow of the object is limited to the listed columns, and excluded columns are not available to subsequent process objects. Give the column names in keep_columns as they appear after aliasing or prefixing. |
rename_duplicates | Boolean | Creates new names for duplicate column names in the input flow for this object. When several columns share a name, the second and later occurrences are renamed name_2 through name_n, in positional order. If a column in the input flow already has one of these names, that number is skipped. For example, if the input flow already has a column named "DESC_2", the object names the duplicate DESC column "DESC_3". Duplicate renaming occurs before the aliases, prefix, and keep_columns attributes are applied, so the generated column names can be aliased to other names. NOTE: This attribute is Rename Duplicates in Visual Integrator. |
encoding | String | Defines how files are read and interpreted in terms of character encoding. Values include: UCS-2 and UTF-8 files can include a byte order mark (BOM) at the beginning of the file to denote the file encoding. These file signatures are defined as follows: UTF-8, EF BB BF; UCS-2 little-endian, FF FE; UCS-2 big-endian, FE FF. File signatures are common for Unicode files on Windows operating systems. If the Filein object reads multiple files, the signature of each file determines its encoding. If encoding is "auto" and no signature is found, the encoding is assumed to be latin1 if no other object in the task handles Unicode data and the VI file is not encoded as UTF-8 (using the charset 1208 directive). Otherwise, the encoding is assumed to be UTF-8. See also Integrator Unicode Data Support. |
trace_after | Sub-object | Traces data flows leaving the specified object, which makes debugging scripts easier. This is equivalent to adding a Trace process object immediately after the current object. See Embedded Trace Object for more on using trace sub-objects. |
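Because the order of files matched by starname or starnames is not guaranteed, a script that depends on order can build an explicitly sorted list and supply it through filenames or a file_list_input flow. The Python sketch below shows one way to build such a list. The helper name sorted_matches is hypothetical, and glob's case sensitivity follows the host platform, which may differ from Integrator's own matching.

```python
import glob
import os
from typing import List


def sorted_matches(directory: str, patterns: List[str]) -> List[str]:
    """Return files in directory matching any pattern, de-duplicated and sorted.

    Hypothetical helper. As with starnames, a pattern that matches nothing
    contributes no files. Sorting makes the order deterministic, which
    directory order does not guarantee.
    """
    found = set()
    for pattern in patterns:
        # Escape the directory so characters such as "[" are not treated as wildcards
        found.update(glob.glob(os.path.join(glob.escape(directory), pattern)))
    return sorted(found)
```

Collecting matches in a set means that overlapping patterns (for example "*.dat" and "*.DAT" on a case-insensitive file system) do not list the same file twice.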
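The behavior described for the union attribute can be modeled as follows. This is a minimal Python sketch, not Integrator's implementation; it assumes each file has already been parsed into rows keyed by column name. The combined columns appear in the order they are first seen, which is an assumption, since the documentation does not specify the column order. The helper name union_concat is hypothetical.

```python
from typing import Dict, List


def union_concat(files: List[List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Concatenate parsed files, producing the union of their columns.

    Hypothetical helper. Columns are ordered by first appearance (an
    assumption); a column missing from a file is blank in that file's rows.
    """
    columns: List[str] = []
    for rows in files:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
    return [{name: row.get(name, "") for name in columns} for rows in files for row in rows]


older = [{"id": "1", "qty": "5"}]
newer = [{"qty": "7", "id": "2", "region": "East"}]
for row in union_concat([older, newer]):
    print(row)
```

This is the case the attribute targets: a region column added to later files does not have to be back-filled into earlier ones.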
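The example given for the newline attribute can be reproduced with a short Python sketch that splits raw text on a single custom end-of-line character. The helper name split_records is hypothetical, and treating a trailing separator as ending the last record (rather than starting an empty one) is an assumption.

```python
from typing import List


def split_records(data: str, newline: str) -> List[str]:
    """Split raw text on a one-character end-of-line marker."""
    if len(newline) != 1:
        raise ValueError("newline must be exactly one character")
    records = data.split(newline)
    # Assumption: a trailing marker ends the last record, so drop the empty tail
    if records and records[-1] == "":
        records.pop()
    return records


print(split_records("483574387548~4434839~4782939029~", "~"))  # ['483574387548', '4434839', '4782939029']
```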
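The require_crlf behavior, together with the translate expression given for it, can be modeled in Python. This sketch assumes that translate maps characters position by position, as described (line feed to the first replacement character, carriage return to the second); Python's str.translate with str.maketrans provides that mapping. The helper names are hypothetical, and this is not Integrator's implementation.

```python
from typing import List


def split_crlf_lines(data: str) -> List[str]:
    """Model of require_crlf = "true": only CR followed by LF ends a line.

    A bare LF (or bare CR) stays inside the current line, as happens when
    a field contains an embedded line break.
    """
    lines = data.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()  # a final CRLF terminates the last line
    return lines


def replace_line_endings(value: str) -> str:
    """Python analogue of translate(MyColumn, concat(chr(10),chr(13)), "|%")."""
    # Position-wise mapping: chr(10) (LF) -> "|", chr(13) (CR) -> "%"
    return value.translate(str.maketrans(chr(10) + chr(13), "|%"))


raw = "id\tnote\r\n1\tline one\nline two\r\n"
for line in split_crlf_lines(raw):
    print(replace_line_endings(line))
```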
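The ignore_line_end choice for short rows can be sketched in Python as follows. Two details are assumptions rather than documented behavior: missing trailing fields are treated as blank when the line is processed, and the warning text is invented for illustration. The helper name parse_line is hypothetical.

```python
import sys
from typing import Dict, List, Optional


def parse_line(line: str, columns: List[str], delimiter: str,
               ignore_line_end: bool = False) -> Optional[Dict[str, str]]:
    """Parse one delimited line; return None if the line is skipped."""
    fields = line.split(delimiter)
    if len(fields) < len(columns):
        if not ignore_line_end:
            # Default ("false"): warn and skip the short line
            print(f"warning: too few fields, skipping line: {line!r}", file=sys.stderr)
            return None
        # "true": process the line anyway (assumption: pad missing fields with blanks)
        fields += [""] * (len(columns) - len(fields))
    # Fields beyond the last column are dropped here; Integrator governs that
    # case separately with ignore_extra_columns.
    return dict(zip(columns, fields))
```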
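Python's csv module offers a close analogy to the default quote handling and to ignore_quotes = "true". It is an analogy, not Integrator's parser: csv's default mode strips enclosing quotes and keeps quoted commas in the value, while csv.QUOTE_NONE passes quote characters through as ordinary data. The documentation does not say how a comma inside quotes is split when ignore_quotes is "true"; the sketch assumes it then acts as a delimiter. The helper name split_csv_line is hypothetical.

```python
import csv
from typing import List


def split_csv_line(line: str, ignore_quotes: bool = False) -> List[str]:
    """Split one comma-delimited line, optionally treating quotes as plain text."""
    if ignore_quotes:
        # Quotes are ordinary characters, so they stay in the values
        # (assumption: commas inside them then act as delimiters)
        reader = csv.reader([line], quoting=csv.QUOTE_NONE)
    else:
        # Default: enclosing quotes are stripped; quoted commas stay in the value
        reader = csv.reader([line])
    return next(reader)


line = 'A17,"Smith, John",42'
print(split_csv_line(line))
print(split_csv_line(line, ignore_quotes=True))
```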
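The order in which the naming attributes apply (aliases first, then prefix on the columns that were not aliased, then keep_columns on the resulting names) can be modeled with a small Python function. This is a sketch under assumptions: the helper names are hypothetical, and keep_columns is assumed to preserve the flow's column order rather than the order of the keep_columns list.

```python
from typing import Dict, List, Optional


def parse_aliases(aliases: List[str]) -> Dict[str, str]:
    """Parse "oldname=newname" entries, ignoring blanks around each name."""
    mapping: Dict[str, str] = {}
    for entry in aliases:
        old, sep, new = entry.partition("=")
        if not sep:
            raise ValueError(f"expected oldname=newname, got {entry!r}")
        mapping[old.strip()] = new.strip()
    return mapping


def output_columns(columns: List[str], aliases: List[str], prefix: str = "",
                   keep_columns: Optional[List[str]] = None) -> List[str]:
    """Apply aliases, then prefix, then keep_columns to a list of column names."""
    mapping = parse_aliases(aliases)
    result: List[str] = []
    for name in columns:
        if name in mapping:
            if mapping[name]:
                result.append(mapping[name])
            # A blank newname deletes the column, so nothing is appended
        else:
            result.append(prefix + name)  # prefix applies only to non-aliased columns
    if keep_columns is not None:
        keep = set(keep_columns)  # names as they appear after aliasing/prefixing
        result = [name for name in result if name in keep]
    return result


cols = ["Cust ID", "Amt", "Notes", "Region"]
print(output_columns(cols, ["Cust ID = Customer", "Notes="], prefix="S "))
```

Note that keep_columns must name "S Amt", not "Amt", once a prefix is in effect, which matches the table's instruction to use names after aliasing or prefixing.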
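The rename_duplicates numbering, including the rule that a number already used by an existing column is skipped, can be sketched in Python. The function name is hypothetical. It checks generated names against every column name in the input, not only earlier ones; that reading of "already has this name" is an assumption.

```python
from typing import Dict, List, Set


def rename_duplicates(columns: List[str]) -> List[str]:
    """Rename second and later occurrences of a name to name_2, name_3, ...

    A suffix number is skipped if that name already exists in the input.
    """
    taken: Set[str] = set(columns)       # every name present in the input
    seen: Set[str] = set()               # original names emitted so far
    next_suffix: Dict[str, int] = {}     # next number to try per base name
    result: List[str] = []
    for name in columns:
        if name not in seen:
            seen.add(name)
            result.append(name)
            continue
        n = next_suffix.get(name, 2)
        while f"{name}_{n}" in taken:
            n += 1                       # skip numbers that are already in use
        new_name = f"{name}_{n}"
        taken.add(new_name)
        next_suffix[name] = n + 1
        result.append(new_name)
    return result


print(rename_duplicates(["DESC", "DESC_2", "DESC"]))  # ['DESC', 'DESC_2', 'DESC_3']
```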
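The encoding = "auto" rule (use a file's signature if it has one, otherwise fall back according to the rest of the task) can be sketched in Python with the BOM constants from the codecs module. The function names and the Python codec names are assumptions made for the sketch; UCS-2 data is decoded with Python's UTF-16 codec, which covers the UCS-2 range and strips the BOM.

```python
import codecs


def auto_fallback(task_handles_unicode: bool, script_is_utf8: bool) -> str:
    """Encoding assumed when a file has no signature (encoding = "auto")."""
    # latin1 only if nothing else in the task handles Unicode and the
    # VI file is not UTF-8 (charset 1208); otherwise UTF-8
    if not task_handles_unicode and not script_is_utf8:
        return "latin-1"
    return "utf-8"


def detect_encoding(head: bytes, fallback: str) -> str:
    """Pick a Python codec from a file's first bytes; call once per file."""
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # decodes UTF-8 and drops the signature
    if head.startswith(codecs.BOM_UTF16_LE) or head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"      # reads the signature to choose the byte order
    return fallback


data = codecs.BOM_UTF16_LE + "id\tname".encode("utf-16-le")
enc = detect_encoding(data[:4], auto_fallback(False, False))
print(enc, repr(data.decode(enc)))
```

Calling detect_encoding separately for each file mirrors the documented rule that, when several files are read, each file's own signature determines its encoding.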