ETL

Important: Please review the ETL Licensing article for information regarding purchasing and licensing.

Within Exago, ETL (Extract, Transform, Load) functions as a scheduled process capable of compiling a collection of data from disparate data sources, transforming this data, and finally writing the cleansed data back to a target database. It performs these processes all while using Exago’s built-in capabilities of sorting, grouping, filtering, aggregation, and so on.

ETL will ultimately make the process of reporting off of this data more performant and user-friendly. For example, if a data warehouse is physically separate from Exago's implementation, ETL can be configured to consolidate relevant information on a local database, allowing for lower latency and faster reporting.

Note: Currently, ETL Jobs are only capable of writing to SQL databases.

With the addition of ETL to Exago, two sections have been added to the Admin Console: ETL Settings and ETL Jobs.

 

ETL Settings

An ETL Job is similar to an Advanced Report in terms of its setup, design, and scheduling, and is stored as a standard report, rather than as a data source. This is intended to simplify the implementation of ETL within Exago—for example, if folder management is implemented for reporting, then the same folder management scheme can be used to store ETL Jobs.

In order to create ETL Jobs, an ETL Reports Folder must be defined.

screen.etlsettings.png

The ETL Settings menu

The ETL Reports Folder is the parent folder where the particular ETL Reports are stored. This folder has been configured to hide the ETL Reports that are used to generate ETL data, preventing users from accessing these objects outside of the Admin Console.

 

ETL Schedule Manager

The ETL Schedule Manager is also located within the ETL Settings menu. By clicking the Show ETL Schedule Manager button, scheduled ETL Jobs can be viewed and managed.

screen.etlschedulemanager.png

The ETL Schedule Manager

The ETL Schedule Manager functions identically to the standard report Schedule Manager. Within the manager, scheduled ETL Jobs can be tracked, edited and deleted.

 

ETL Jobs

ETL Jobs are the foundation of ETL processes within Exago. These jobs collect and compile the data from specified data sources, structure and cleanse the compiled data, and define where and how often this data is written to the target database.

screen.etljobtree.png

A list of ETL Jobs

 

Creating ETL Jobs

To create a new ETL Job, highlight the ETL Jobs section of the Data menu in the Admin Console and click Add, then provide the following information:

  1. Name: The name that identifies the particular ETL Job.
  2. Destination Data Source: Select the data source where the data will be written.
    Note: A Destination Data Source must be selected before an ETL Report can be edited.
    Tip: The Destination Data Source can be set to write to any data source, regardless of the data sources the Job is compiling data from, allowing separate data sources to be used for reading and writing data.
  3. Destination Data Object: Define the name of the data table that the data will be written to. Special characters such as ! @ # $ % ^ and [space] cannot be used.
  4. Data Source Report Name: The report that is created to generate the written data.
    Note: Only reports located within the ETL Reports Folder setting may be added as Data Source Reports.
  5. Execute Now: Immediately executes the ETL Job once, writing all data to the specified data source.
  6. Scheduling: ETL Jobs utilize the Scheduler within Exago. By selecting Scheduler Enabled, scheduling settings can be adjusted for when the ETL process is executed. These settings include:

 

Example

screen.etljob_example.png

 

Rewriting Data

ETL Jobs rewrite existing data tables every time they are executed, so administrators should be cautious when writing to pre-existing data tables.

To help prevent the accidental rewriting of data, whenever an existing data table is entered in the Destination Data Object field, a warning icon.tablewarning.png icon will appear, notifying you that a table with this name already exists and that writing to this data table will rewrite all of its data.

screen.preexistingdatatablewarning.png

A Destination Data Object set to a pre-existing data table 

When executing an ETL Job on a pre-existing data table:

Pre-existing Columns:

New Columns:

The visual example below shows a possible outcome of an ETL Job rewriting the data of a pre-existing data table:

example.preexistingtable.png      example.etlreport.png

Pre-existing data table (left) and ETL Report data (right)

example.overwrittendata.png

The data table after being rewritten by the ETL Job

 

ETL Report Designer

The ETL Report designer functions similarly to the Advanced Report designer. In fact, it is an Advanced Report designer that is in “ETL Mode,” which removes some options that would not be necessary for the purposes of ETL reporting (e.g., visualizations).

To open the ETL Report designer, click the New Report or Edit Report button next to the Data Source Report Name dropdown. The button's label changes dynamically based on whether an existing ETL Report is selected in the dropdown.

screen.etljob_editbuttonclick.png

Editing an existing ETL Report

screen.etljob_newbuttonclick.png

Creating a new ETL Report

 

Creating an ETL Report

In the Name tab, specify a report name and select a folder.

ETLreport.png

Next, in the Categories tab, select the categories that are to be included within the report.

As with Advanced Reports, Sorts and Filters may be added; however, these are not required.

Next, in the Layout tab, the layout of the report may be adjusted. ETL Reports must have a single report header, which is used to set the names of the columns being written to the target database, and a single visible detail section, which is used for writing data to each column. This limitation exists in order to replicate standard database table formatting and prevent different sources of data from writing to the same data columns.
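As an illustration, consider a report header row with three columns. Assuming a Destination Data Object named CustomerSales and the ETL Data Types noted in the comments (all names and types here are hypothetical; the actual DDL is generated by Exago), the resulting target table would be shaped roughly like this:

```sql
-- Hypothetical target table for an ETL Job whose Destination Data Object
-- is "CustomerSales". The column names come from the single report header
-- row, and the column types from each column's selected ETL Data Type.
CREATE TABLE CustomerSales (
    CustomerName varchar(255),   -- header cell "CustomerName", string type
    OrderCount   int,            -- header cell "OrderCount", integer type
    TotalSales   decimal(18, 2)  -- header cell "TotalSales", decimal type
);
```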

The single visible detail row may be a detail section, a report footer, or a group header or footer. Multiple detail rows can exist within the ETL Report as long as all but the visible one are suppressed; this allows data in suppressed rows to be referenced in visible cells.

Tip: See the Grouping and Aggregation in ETL Reports section below for more details.

An ETL Report may then be executed in order to verify that it produces the proper data. To optimize execution time, a partial data set with a step size of 1,000 will be generated; the step size may be incrementally increased if a larger sample is desired. To return to the ETL Report designer, press the back icon.backbutton.png button in the top left corner.

Finally, click the Finish button. The newly created report's name will then appear in the Data Source Report Name dropdown.

Note: As with Advanced Reports, any specifications made during the initial creation of the report may be changed in the Report Options after creation.

 

ETL Report Formatting

Once the ETL Report is created, there are many formatting options available for use, such as:

Other aesthetic formatting options, such as italicizing and coloring font, are also available for use within the ETL Report designer.

Note: Currency symbols and other aesthetic formatting options will not be written to the target database. These formatting options have only been included within ETL reporting to allow the application of user-friendly designs when testing the report.

 

Proper Formatting of an ETL Report

Unlike Advanced Reports, however, there are specific formatting requirements for ETL Reports.

The formatting requirements for ETL Reports include:

screen.etljob_properformatting.png

An example of a properly formatted ETL Report

If these formatting requirements are not met, the report cannot function properly as a data source and will not be allowed to execute.

If there is an attempt to execute an improperly formatted ETL Report, the following popup will block the report execution:

formattingpopup.png

Note: The report may still be saved even if it is improperly formatted.

 

Data Cleansing

Within an ETL Report, data can be cleansed through the application of filters and filtering by formula.

Useful applications of filters for data cleansing include:

 

ETL Data Types

For each column within the ETL Report, a Data Type must be defined. ETL Data Types will be automatically defined based on the information provided in the ETL Data Types Dictionary. This information specifies metadata for the type of data value that will be written to the target database.

To manually select a Data Type, right-click a column within the report and select a Data Type from the ETL Data Type menu.

screen.etldatatype_select.png

An example of possible ETL Data Types

Once a Data Type is defined, the type will appear in parentheses in the column label to indicate that the typing has been changed.

If a Data Type is not set for a column, then the ETL Job will not be able to execute. The following warning message will appear when attempting to save an ETL Report containing a column with a blank Data Type:

screen.etldatatype_notesetmsg.png

The set of selectable Data Types is adjusted manually in the ETL Data Types section of the dbconfigs.json file. For more information, see the ETL Data Types Dictionary section below.

 

Grouping and Aggregation in ETL Reports

Within ETL Reports, data can be grouped and aggregated, allowing groups and aggregates to be computed and written to the target database ahead of time. Reports based on the resulting data objects can then process and execute at higher speeds.

Tip: Reporting off of pre-aggregated and grouped data is often more performant than aggregating in the report.

screen.etljob_aggregationexample.png

An ETL Report with suppressed detail rows and grouped and aggregated data
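The performance benefit of this approach can be sketched in SQL (the table and column names below are hypothetical): a report backed by a pre-aggregated ETL table replaces a GROUP BY over the full source table with a simple read of the summary.

```sql
-- Aggregating at report time scans and groups the entire source table:
SELECT CustomerName, SUM(Amount) AS TotalSales
FROM Orders
GROUP BY CustomerName;

-- Reading an ETL-produced summary table returns the same result
-- without repeating the aggregation on every report run:
SELECT CustomerName, TotalSales
FROM CustomerSalesSummary;
```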

 

ETL Data Types Dictionary

There are a great many data types that can be written to the various supported databases. For example, SQL Server alone has around a dozen string data types, including char, varchar(max), ntext, and so on. Making every possible data type available for selection as a Data Type would be overwhelming and redundant.

In order to mitigate confusion and simplify the list of selectable data types within the ETL Report, the ETL Data Types dictionary has been added. The ETL Data Types dictionary allows only the specified data types to be selectable within the ETL Report designer. This file can be manually adjusted to add, edit or remove specific data types, allowing for either a complete set or limited subset of data types to be defined.

The ETL Data Types dictionary is located within the dbconfigs.json file, which stores the settings information for each particular database. Within this file, which is located in the UI > Config > Other directory, data types can be added, edited, or removed using the following information:

The formatting of each dictionary entry should be as follows:

"<numKey>": {
    "Title": "<title>",
    "Value": "<value>",
    "Quoted": <true/false>,
    "MetadataDefaults": ["<list of metadata types>"]
}
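For example, a single dictionary entry for an integer type might look like the following (the key, Title, and MetadataDefaults values here are illustrative, not required values):

```json
"4": {
    "Title": "Integer",
    "Value": "int",
    "Quoted": false,
    "MetadataDefaults": ["Int32"]
}
```

Here, Quoted presumably controls whether values of this type are wrapped in quotes when written, which would apply to string and date types but not numeric ones.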

 

Example

screen.etldatatypes_file.png

An example of an ETL Data Types dictionary containing four data types

Important: The information detailed in the ETL Data Types dictionary should be consistent among each dbconfigs.json file located in both the web app and any managed schedulers.

 

Creating a Data Object from an ETL Job

Creating a data object from an ETL Job is a simple but separate process. These processes were separated to prevent data objects from being created unnecessarily from ETL Jobs that are being used solely for compiling, cleansing, and writing data (e.g., for use outside of Exago BI).

Creating a data object from an ETL Job follows the same process as creating a standard data object.

  1. In the Admin Console, highlight the Objects section in the Data menu and click Add.
  2. Select the data source that the ETL Job references and then select the table that is created by the ETL Report.

    screen.etldataobject_datatableselection.png

  3. Next add an alias for the data object.
  4. Select the Unique Key Fields.
  5. Add the Category for the data object to be grouped in.
    Tip: A Category specifically for ETL data objects can be created for organization purposes.
  6. Enter a unique Id for the data object.
  7. Optionally add Parameters, Tenant Columns, a Description, and Column Metadata.
  8. Select a Schema Access Type.
  9. Optionally select a Filter Dropdown Object and set the Suppress Sort and Filter option.

    screen.etldataobject.png

  10. Click Apply.

This newly created data object can now be used like any other standard data object within Exago.

 

ETL Server Event

In order to prevent administrators from having to create a new connection string for each client database, the server event OnEtlExecuteStart has been created. This server event allows administrators to define any number of databases and data connection strings by running the same report in a loop, once for each required database and connection string.

Example

This server event would be executed for a single ETL Report that would target multiple client databases, all of which have the same underlying schema to pull from.

Namespace: WebReports.Api.Etl

C# Code:

EtlTenantCollection etlTenants = args[0] as EtlTenantCollection;

//Execute twice while preserving data between tenants
etlTenants.ClearDataBetweenTenants = false;
EtlTenant tenant;
tenant = etlTenants.AddEtlTenant();
tenant = etlTenants.AddEtlTenant();
return true;

In the above code, each EtlTenant object contains information regarding the settable data connection strings. This process can be performed within a .NET Assembly so that the connection string information will be hidden.

 

ETL through .NET API

For information on how to use ETL through the .NET API, please continue to this article.