Important: Please review the ETL Licensing article for information regarding purchasing and licensing.
Within Exago, ETL (Extract, Transform, Load) functions as a scheduled process capable of compiling a collection of data from disparate data sources, transforming this data, and finally writing the cleansed data back to a target database. It performs these processes all while using Exago’s built-in capabilities of sorting, grouping, filtering, aggregation, and so on.
ETL will ultimately make the process of reporting off of this data more performant and user-friendly. For example, if a data warehouse is physically separate from Exago's implementation, ETL can be configured to consolidate relevant information on a local database, allowing for lower latency and faster reporting.
Note: Currently, ETL Jobs are only capable of writing to SQL databases.
With the addition of ETL to Exago, two sections have been added to the Admin Console:
An ETL Job is similar to an Advanced Report in terms of its setup, design, and scheduling, and is stored as a standard report, rather than as a data source. This is intended to simplify the implementation of ETL within Exago—for example, if folder management is implemented for reporting, then the same folder management scheme can be used to store ETL Jobs.
In order to create ETL Jobs, an ETL Reports Folder must be defined.
The ETL Settings menu
The ETL Reports Folder is the parent folder where the particular ETL Reports are stored. This folder has been configured to hide the ETL Reports that are used to generate ETL data, preventing users from accessing these objects outside of the Admin Console.
The ETL Schedule Manager is also located within the ETL Settings menu. By clicking the Show ETL Schedule Manager button, scheduled ETL Jobs can be viewed and managed.
The ETL Schedule Manager
The ETL Schedule Manager functions identically to the standard report Schedule Manager. Within the manager, scheduled ETL Jobs can be tracked, edited and deleted.
ETL Jobs are the foundation of ETL processes within Exago. These jobs collect and compile the data from specified data sources, structure and cleanse the compiled data, and define where and how often this data is written to the target database.
A list of ETL Jobs
To create a new ETL Job, highlight the ETL Jobs section of the Data menu in the Admin Console and click Add, then enter the following information:
Note: A Destination Data Source must be selected before an ETL Report can be edited.
Tip: The Destination Data Source can be set to any data source, regardless of the data sources the job compiles data from, allowing separate data sources to be used for reading and writing data.
Note: Only reports located within the ETL Reports Folder setting may be added as Data Source Reports.
ETL Jobs rewrite existing data tables every time they are executed, so administrators should be cautious when writing to pre-existing data tables.
To mitigate the accidental rewriting of data, whenever an existing data table is added to the Destination Data Object field, a warning icon will appear indicating that a table with this name already exists and that writing to it will rewrite all of its data.
A Destination Data Object set to a pre-existing data table
When executing an ETL Job on a pre-existing data table:
Pre-existing Columns:
New Columns:
The visual example below shows a possible outcome of an ETL Job rewriting the data of a pre-existing data table:
Pre-existing data table (left) and ETL Report data (right)
The data table after being rewritten by the ETL Job
The ETL Report designer functions similarly to the Advanced Report designer. In fact, it is an Advanced Report designer that is in “ETL Mode,” which removes some options that would not be necessary for the purposes of ETL reporting (e.g., visualizations).
To open the ETL Report designer, click the New Report or Edit Report button next to the Data Source Report Name dropdown; the button's label changes depending on whether an existing ETL Report is selected in the dropdown.
Editing an existing ETL Report
Creating a new ETL Report
In the Name tab, specify a report name and select a folder.
Next, in the Categories tab, select the categories that are to be included within the report.
As with Advanced Reports, Sorts and Filters may be added; however, these are not required.
Next, in the Layout tab, the layout of the report may be adjusted. ETL Reports must have a single report header, which is used to set the names of the columns being written to the target database, and a single visible detail section, which is used for writing data to each column. This limitation exists in order to replicate standard database table formatting and prevent different sources of data from writing to the same data columns.
The visible row may be a detail section, a report footer, or a group header or footer. Multiple detail rows can exist within the ETL Report as long as they are suppressed; data in suppressed rows can still be referenced in visible cells.
Tip: See the Grouping and Aggregation in ETL Reports section below for more details.
An ETL Report may then be executed in order to verify that the proper data is produced. A partial data set with a step size of 1,000 will be generated in order to optimize execution time; however, the step size may be incrementally increased if a larger sample size is desired. To return to the ETL Report designer, press the back button, which is located in the top left corner.
Finally, click the Finish button. The newly created report's name will then appear in the Data Source Report Name dropdown.
Note: As with Advanced Reports, any specifications made during the initial creation of the report may be changed in the Report Options after creation.
Once the ETL Report is created, there are many formatting options available for use, such as:
Other aesthetic formatting options, such as italicizing and coloring font, are also available for use within the ETL Report designer.
Note: Currency symbols and other aesthetic formatting options will not be written to the target database. These formatting options have only been included within ETL reporting to allow the application of user-friendly designs when testing the report.
Unlike Advanced Reports, however, ETL Reports have specific formatting requirements.
The formatting requirements for ETL Reports include:
An example of a properly formatted ETL Report
If these formatting requirements are not met, the report cannot function properly as a data source and will not be permitted to execute.
If there is an attempt to execute an improperly formatted ETL Report, the following popup will block the report execution:
Note: The report may still be saved even if it is improperly formatted.
Within an ETL Report, data can be cleansed through the application of filters and filtering by formula.
Useful applications of filters for data cleansing include:
For each column within the ETL Report, a Data Type must be defined. ETL Data Types will be automatically defined based on the information provided in the ETL Data Types Dictionary. This information specifies metadata for the type of data value that will be written to the target database.
To manually select a Data Type, right-click a column within the report and select a Data Type from the ETL Data Type menu.
An example of possible ETL Data Types
Once a Data Type is defined, the type will appear in parentheses in the column label to indicate that the typing has been changed.
If a Data Type is not set for a column, then the ETL Job will not be able to execute. The following warning message will appear when attempting to save an ETL Report containing a column with a blank Data Type:
The possible selection of Data Types is manually adjusted in the ETL Data Types section within the dbconfigs.json file. For more information see the ETL Data Types Dictionary section below.
Within ETL Reports, data can be grouped and aggregated. This allows groups and aggregated data to be created and written to the target database ahead of time, ultimately allowing reports based on these ETL Jobs to execute more quickly.
Tip: Reporting off of pre-aggregated and grouped data is often more performant than aggregating in the report.
An ETL Report with suppressed detail rows and grouped and aggregated data
There is a plethora of possible data types that can be written to different types of databases. For example, within SQL Server there are 12 different string data types, including char, varchar(max), ntext, varbinary, and so on. Making every possible data type available for selection as a Data Type would be overwhelming and redundant.
In order to mitigate confusion and simplify the list of selectable data types within the ETL Report, the ETL Data Types dictionary has been added. The ETL Data Types dictionary allows only the specified data types to be selectable within the ETL Report designer. This file can be manually adjusted to add, edit or remove specific data types, allowing for either a complete set or limited subset of data types to be defined.
The ETL Data Types dictionary is located within the dbconfigs.json file, which stores the settings information for each particular database. Within this file, which is located in the UI > Config > Other directory, data types can be added, edited, or removed using the following information:
The formatting of each dictionary entry should be as follows:
"<numKey>": {
    "Title": "<title>",
    "Value": "<value>",
    "Quoted": <true/false>,
    "MetadataDefaults": ["<list of metadata types>"]
}
An example of an ETL Data Types dictionary containing four data types
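As a concrete illustration, two entries following the format above might look like the following. The titles, values, and metadata types shown here are hypothetical placeholders chosen for this sketch, not values taken from an actual dbconfigs.json file:

```json
"0": {
    "Title": "String",
    "Value": "nvarchar(max)",
    "Quoted": true,
    "MetadataDefaults": ["string"]
},
"1": {
    "Title": "Integer",
    "Value": "int",
    "Quoted": false,
    "MetadataDefaults": ["int"]
}
```

Here, Title is the label shown in the ETL Data Type menu, Value is the underlying database type written to the target, and Quoted indicates whether values of this type must be wrapped in quotes when written.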
Important: The information detailed in the ETL Data Types dictionary should be consistent across every dbconfigs.json file, in both the web app and any managed schedulers.
Creating a data object from an ETL Job is a simple but separate process. These processes were separated to prevent data objects from being created unnecessarily from ETL Jobs that are being used solely for compiling, cleansing, and writing data (e.g., for use outside of Exago BI).
Creating a data object from an ETL Job follows the same process as creating a standard data object.
Tip: A Category specifically for ETL data objects can be created for organization purposes.
This newly created data object can now be used like any other standard data object within Exago.
To prevent administrators from having to create a new connection string for each client database, the server event OnEtlExecuteStart has been created. This server event allows administrators to define any number of databases and data connection strings, running the same ETL Report in a loop once for each database and connection string.
This server event would be executed for a single ETL Report that would target multiple client databases, all of which have the same underlying schema to pull from.
Namespaces: WebReports.Api.Etl
C# Code:
// Retrieve the tenant collection passed to the event
EtlTenantCollection etlTenants = args[0] as EtlTenantCollection;
// Execute twice while preserving data between tenants
etlTenants.ClearDataBetweenTenants = false;
// Add one tenant per client database to run the report against
EtlTenant tenant;
tenant = etlTenants.AddEtlTenant();
tenant = etlTenants.AddEtlTenant();
return true;
In the above code, each EtlTenant object contains information regarding the settable data connection strings. This process can be performed within a .NET Assembly so that the connection string information remains hidden.
For information on how to use ETL through the .NET API, please continue to this article.