String Functions in Integrator
The Integrator Calc object supports the following string functions:
Returns the integer value of the first character in the string char; inverse of the chr function. On EBCDIC platforms, returns the integer value in the EBCDIC collating sequence.
ascii(char : string) : numeric
For example, assuming an ASCII platform:
ascii("A") returns 65
ascii("Foo") returns 69
ascii(" ") returns 32
Capitalizes the first letter of every word or token in the string to uppercase, and changes the remaining characters in each word to lowercase. Single quote characters (’) appearing in a word will be interpreted as an apostrophe, and will not cause the following character to be changed to uppercase.
capitalize(string : string) : string
For example:
capitalize("friends") returns Friends
capitalize("john f. kennedy") returns John F. Kennedy
capitalize("let’s get going") returns Let’s Get Going
capitalize("PLEASE STOP SHOUTING") returns Please Stop Shouting
capitalize("pre-sales") returns Pre-Sales
Returns a single-character string with the character represented by the number value in the current character set (ASCII or EBCDIC).
chr(value : numeric) : string
For example, assuming an ASCII platform:
chr(65) returns A
chr(32) returns space
chr(92) returns \
Concatenates strings together. It takes a variable number of arguments, and concatenates them together. Numbers will be interpreted as strings.
NOTE: In the first example, note the use of the \", \" to insert a comma followed by a blank space between the city and state (e.g. Allentown, PA).
concat(string1 : string, string2 : string, ...) : string
This example takes the Customer City and Customer State and concatenates them together into a new column called Cust CityState, with a comma and space between them.
object 'PROC' "city" {
process_type = "calc",
input = "sales",
calcs = {
name = "Cust CityState",
value = "concat(Customer City, \", \",Customer State)"
}
}
};
For some applications, the escape sequence may be quote-quote. For example:
concat("<a href=""",calc[URL],""">",Info[Summary],"</a>")
Compares two strings and returns an integer. If string1 is equal to string2, this function returns 1. Otherwise, it returns 0. The comparison is case-sensitive.
equals(string1 : string, string2 : string) : numeric
For example:
equals("0.0", "0") returns 1
equals(string("0.0","0")) returns 0
equals(string("0.0"),string("0")) returns 0
Functions like the equals function above, except that string comparison is not case-sensitive.
NOTE: Both equals and equals_ignore_case work for numeric values. In the case of calculated numbers, to allow for inexact floating point calculations, the tolerance is 0.0000001. That is, equals and equals_ignore_case return true (1) if the two numbers are within 0.0000001 of each other.
equals_ignore_case(string1 : string, string2 : string) : string
Returns the starting position (index) of string2 in string string1. If string1 does not contain string2, it returns 0. If string2 is null, it returns 1. This function is case sensitive.
index(string1 : string, string2 : string) : numeric
Following are examples of index and the position returned:
The index function can be used with the substr function on an inexact string (or scan).
substr("Boston, MA",1,(index("Boston, MA",",") - 1)) returns Boston
The index function, in conjunction with the sign function, can be used as a substitute for an "IN_String" function.
Sign(Index("ME,MA,CT,VT,NH,RI", State)) if State is TX, 0 is returned; if State is MA, 1 is returned
Function | Result |
---|---|
index("New York, NY","n") | 0 |
index("New York, NY","N") | 1 |
index("New York, NY",",") | 9 |
index("New York, NY", "NY") | 11 |
index("New York, NY", "NV") | 0 |
index("New York, NY","") | 1 |
Determines the length of the string string. Returns an integer.
length(string : string) : numeric
For example:
length("four") returns 4
Returns the string string with all uppercase characters converted to lowercase.
lower(string : string) : string
For example:
lower("PLEASE STOP SHOUTING") returns please stop shouting
Pads the beginning of the given string repeatedly with pad_string until the result is length characters long. If pad_string is omitted, spaces are assumed.
lpad(string : string, length : numeric, pad_string : string) : string
For example:
lpad("foo", 5, "x") returns xxfoo
lpad("James", 10, "+-") returns -+-+-James
lpad("Long string", 5, "x") returns Long string
lpad("bar", 5) returns bar (2 spaces precede bar)
Returns string stripped of any characters in trim_chars that appear at the beginning of the string. If trim_chars is omitted, it defaults to spaces.
ltrim(string : string, trim_chars : string) : string
For example:
ltrim(" test") returns test
ltrim("xxxab", "x") returns ab
ltrim("* * **Hello", " *") returns Hello
ltrim("xxxab", "xab") returns (empty string)
Maps (i.e. changes) special characters to underscores (_). Any character that is not numeric, alphabetic, ’-’, ’.’,’~’, or ’_’ will be mapped to ’_’. This prevents creating invalid filenames and paths. This function can also be used to strip a field of any special characters.
TIP: Properly handles UTF-8 encoded strings.
map_filename(string : string) : string
For example:
map_filename("mth**01") returns mth__01
map_filename("ma*_*1*") returns ma___1_
map_filename("*bos*2*") returns _bos_2_
Checks whether a string appears in a set of other strings, and returns a positive number based on which string it matches. If match_string is equal to string1, this function returns 1; if it is equal to string2, it returns 2. If match_string does not match any string, it returns 0. This function can be used to simplify boolean expressions by matching several strings simultaneously.
match(match_string : string, string1 : string, string2 : string, ...) : numeric
For example:
match("alice", "alice", "bob", "charlie") returns 1
match("fred", "alice", "bob", "charlie") returns 0
match("charlie", "alice", "bob", "charlie") returns 3
if(match(month, "January", "February", "March"), "keep", "discard") if month is March, returns keep
Performs a search of string based on regular expression, regexp. This function returns 1 (true) if a match is found, and 0 (false) if no match is found or the regular expression is invalid.
regexp(string : string, regexp : string, options : string) : numeric
For example:
regexp("Joe Smo","^jo") returns 0
regexp("Joe Smo","^jo","ignore_case") returns 1
See Regular Expressions in Integrator for more information.
Performs a search of string based on regular expression, regexp. This function returns the portion of the string that matches the given regular expression.
regexp_value(string : string, regexp : string, options : string) : string
For example:
If the Dimension "Company" looks like:
Company |
---|
Wholesale, Inc. (182) |
Lloyd and Louis Glasses, LLC. (299) |
Neverland 123 (0321) |
Dumbells & Company () |
regexp_value(Company, "([0-9][0-9]*)") returns
Result |
---|
(182) |
(299) |
(0321) |
NULL |
The fourth item is null since at least one digit is required for a match.
See Regular Expressions in Integrator for more information.
Replaces occurrences of src_string in string with dest_string; will add or delete characters as necessary.
replace(string : string, src_string : string, dest_string : string) : string
For example:
replace("We had an bogus time", "bogus", "excellent") returns We had an excellent time
replace("A|B|C", "|", "-") returns A-B-C
replace("We can merge words", " ", "") returns Wecanmergewords
replace(text, chr(10), \" \") turns the newlines in the text column into spaces
replace(text, chr(13), \" \") turns the carriage returns in the text column into spaces
Returns the reversed contents of string.
reverse(string : string) : string
For example:
reverse("abc") returns cba
Pads the end of the given string repeatedly with pad_string until the result is length characters long. If pad_string is omitted, spaces are assumed. If too short, no padding occurs.
NOTE: The command-line interpreter on UNIX (the shell) treats multiple spaces as a single space. To preserve multiple spaces, the argument needs to be quoted.
rpad(string : string, length L numeric, pad_string : string) : string
For example:
rpad("foo", 5, "x") returns fooxx
rpad("James", 10, "+-") returns James+-+-+
rpad("Long string", 5, "x") returns Long string
rpad("bar", 5) returns bar (bar followed by 2 spaces)
Returns string stripped of any characters in trim_chars that appear at the end of the string. If trim_chars is omitted, it defaults to spaces.
rtrim(string : string, trim_chars : string) : string
For example:
rtrim("test ") returns test
rtrim("abxxx", "x") returns ab
rtrim("Hello * -*", " -*") returns Hello
rtrim("xxxab", "xab") returns (empty string)
Parses the string into separate segments based on the set of characters in delimiters, and returns the segment indicated by token_number, with the first segment identified by 1. It will also accept negative segment numbers, and interpret them as segments from the end of the string: -1 would indicate the last segment, -2 would indicate the next-to-last segment. If the number is so negative that it identifies a segment before the first segment, a null string is returned. If delimiters is omitted, it defaults to spaces.
scan(string : string, token_number : numeric, delimiters : string) : string
For example:
scan("first, second, third", 2,",") returns second
scan("first, second,,fourth",4,",") returns fourth
scan("foo bar",4) returns (null string)
scan("foo bar",1) returns foo
scan("foo|bar/baz",3,"|/") returns baz
scan("c:\models\data\sales.mdl", -1, "\") returns sales.mdl
scan("c:\models\data\sales.mdl", -2, "\") returns data
scan("c:\models\data\sales.mdl", -4, "\") returns c:
scan("c:\models\data\sales.mdl", -5, "\") returns (null)
scan("c:/models/data\sales.mdl", -2, "/\") returns data
Converts value to a string (if necessary), and makes the return value be evaluated as a string in any subsequent operation. This function controls whether certain functions, like comparison functions, treat their arguments as strings or numbers. Normally, a comparison is done by first performing a numeric comparison by trying to convert both sides into numbers. If that fails, a string comparison is then performed. The string function provides a way to skip the numeric comparison.
string(value : string) : numeric
For example:
equals(string("0.0"),string("0")) returns 0
sin(string("0")) returns a null value (the sin of any string is returns a null value)
string("23") returns 23
string("19") < "2" returns 1
"19" < "2" returns 0
Allows separation of a portion of a string from the entire string. It accepts two or three arguments; start_pos specifies which character in the string to begin at (the first character is 1), and num_chars specifies how many characters from the starting position (this argument is optional). If num_chars is given, up to that number of characters is returned. If it is not included, substring will return all characters in string after the starting position. (A negative starting position is assumed to be 1.)
substr(string : string, start_pos : numeric, num_chars : numeric) : string
For example:
substr("this is a test",2,5) returns his i
substr("this is a test",3) returns is a test
This object takes a substring of the month column (e.g. "M01 January") and puts the first 3 characters of the month name into a new column called "Month Abbr."
object 'PROC' "month" {
process_type = "calc",
input = "sales",
calcs = {
{Name = "Month Abbr",Value = "substr(Month,5,3)"}
}
};
Counts the number of tokens in string based on the set of characters in delimiters, corresponding to the tokens parsed by the scan() function. If delimiters is omitted, it defaults to spaces.
This function allows the Integrator to find the number of words or tokens in a string. This function, along with the scan() function, allows the Integrator to parse a string, for example, "ak,ny,ma", into separate rows for each state.
token_count(string : string, delimiters : string) : numeric
For example:
token_count("first, second, third", ",") returns 3
token_count("foo|bar/baz", "|/") returns 3
token_count("foo bar") returns 2
token_count(",,,", “,”) returns 4
Translates characters in string that appear in src_chars with the character in the same position in dst_chars; will not remove characters from a string.
translate(string : string, src_chars : string, dst_chars : string) : string
For example:
translate("Mary had a little lamb", " ", "x") returns Maryxhadxaxlittlexlamb
translate("able", "abcde", "bcdea") returns bcla
translate("Dimensional Insight", "DI", "d") returns dimensional Insight
Returns the string string with all case characters converted to uppercase.
upper(string : string) : string
For example:
upper("New York") returns NEW YORK
Define a way to perform pattern-matching on strings. A regular expression is a character string that combines text and special symbols to define a match. The special symbols used are:
Character | Description |
---|---|
^ | Caret. When this appears as the first character in a regular expression, it matches the beginning of the string. When this appears anywhere else in a regular expression, it matches a literal '^'. |
. | Period. Matches an arbitrary single character. |
* | Asterisk. Matches 0 or more occurrences of the character or class immediately preceding the '*'. It will match as many characters as possible. |
$ |
Dollar sign. When this appears as the last character in a regular expression, it matches the end of the string. When this appears anywhere else in a regular expression, it matches a literal '$'. NOTE: Standard use of regexp requires literal $ characters to be escaped in some way. |
[] | Square brackets. Defines a class of matching characters. All characters following a '[' until a
closing ']' will match the regular expression at the given point. Within a class,
the '-' is used to define a range of characters, based on ASCII order. [A-z] will match any alphabetic character. [0-9] will match any digit. [0-9a-fA-F] will match any hex digit. |
\ | Backslash. Will escape the next character. It is used for the case where the next character is one of the special characters listed above. This should not be used to escape characters in a class definition. |
Any other character in a regular expression matches its literal value.
-
ignore_case—Regular expression match will be case-insensitive.
Character | Description |
---|---|
^$$ | will match an empty string. |
^.*$$ | will match any string. |
^This | will match any string beginning with 'This'. |
that | will match the string 'that'. |
^..$$ | will match any string containing exactly two characters. |
end$$ | will match any string ending with 'end'. |
^[0-9]*$$ | will match any string only consisting of numbers (including an empty string). |
^[0-9][0-9]*$$ | will match any string only consisting of numbers (not including an empty string). |
\^^^ | will match a string that contains '^^^' |
^[aeiou] | will match any string starting with a vowel. |
\$$[0–9] | will match a numeric string with a leading dollar sign character. |
concat(chr(36), $[0–9] | will match a numeric string with a leading dollar sign character. |
\$$ | will match a dollar sign character. |
\0$$ | will match 0 at the end of a string. |