What is a cBase?
A cBase is a proprietary columnar database optimized for aggregating and summarizing large volumes of data. It contains the raw data in binary format. Everything is stored in columns, and each column has a type. A cBase is intended to be kept mostly or entirely in memory and shared among all processors on a server. For best performance, DI recommends that cBases be located on local SSDs (solid-state storage devices that use integrated circuit assemblies as memory to store data persistently).
A cBase file supports date and period types with calendar support. It contains metadata for:
- Spectre Build scripts and logs
- Column types and statistics
- Data integrity errors and warnings
There are two methods for creating cBases:
- Using a Spectre Build script. See Creating cBases with Spectre Build Scripts.
- Using a Visual Integrator script. See Creating cBases with Visual Integrator Scripts.
You can iteratively edit the Build or Visual Integrator (cBase output object) scripts and rebuild the cBase to fine-tune it.
- If you are working with large data sets, you can use the Limit Rows property in a Build file to limit how much data is processed while you iterate.
TIP: The Limit Rows property in Spectre Build scripts sets the row limit for each file in your data set, ensuring that you get a complete sampling of your data.
- Unlike a classic Model, a cBase has no Dimensions, Info Fields, or Summaries; all data is stored in columns. However, you can indicate a preference for how a client such as ProDiver handles the cBase columns.
- Column typing is done by default, but it is configurable. Set each column's type to affect how the data is stored internally and displayed in the clients (for example, ProDiver or DivePort). Best practice is to always declare the type of data in a field as it is built into a cBase.
- String—Columns are treated as dimensions by default
- Date, Period, and Datetime—Columns are treated as dimensions by default
- Double—Columns are treated as summaries by default
- Integer and Fixed100—Columns are treated as summaries and available as dynamic dimensions
- Boolean—Columns are treated as dimensions by default
- You do not need to define consolidated multimodels (as with the classic Model structure). It is more efficient to merge files that have like structures first, and then build a single cBase.
- A multilevel multimerge via a cPlan is possible when the cBases do not share common summary columns.
- When you run a Spectre build and do not specify the encoding of the data, Spectre reads the first 4 megabytes of the file and attempts to determine the encoding from this sample. If there are no high-bit characters in this sample, Spectre uses latin1 (ISO-8859-1) encoding.
- The maximum number of rows a single cBase may contain is 2^32 - 1, which is 4,294,967,295.
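The encoding fallback and the row limit described above can be checked before a build with a short script. The following is a minimal Python sketch, not Spectre's implementation. The function names are hypothetical, "4 megabytes" is assumed to mean 4 x 1024 x 1024 bytes, and a "high bit character" is assumed to mean any byte with a value of 0x80 or above. Because the text above does not say which encoding Spectre chooses when high-bit bytes are present, the sketch returns None in that case rather than guessing.

```python
# Illustrative pre-build checks; all names here are hypothetical, not part of Spectre.

SAMPLE_BYTES = 4 * 1024 * 1024  # assumed meaning of "first 4 megabytes"
MAX_CBASE_ROWS = 2**32 - 1      # 4,294,967,295 rows in a single cBase


def guess_from_sample(sample):
    """Return 'latin1' if the sample has no high-bit bytes, else None.

    None means "undetermined": the documentation only describes the
    outcome for samples without high-bit characters.
    """
    # bytes.isascii() is True when every byte is below 0x80 (Python 3.7+).
    return "latin1" if sample.isascii() else None


def guess_from_file(path, sample_bytes=SAMPLE_BYTES):
    """Apply guess_from_sample to the leading bytes of a file."""
    with open(path, "rb") as f:
        return guess_from_sample(f.read(sample_bytes))


def fits_in_single_cbase(row_count):
    """Return True if row_count does not exceed the single-cBase row limit."""
    return 0 <= row_count <= MAX_CBASE_ROWS
```

If guess_from_file returns None for a source file, that file contains high-bit bytes in its leading sample, so it may be worth specifying the encoding explicitly in the build instead of relying on detection.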
See also:
- Spectre Data Types
- cBase Objects in Visual Integrator
- Build cBase Process Node in Production