The 2016 release of Exago introduces a powerful new feature to the Report Scheduler: the Scheduler Queue. The Queue is a custom-built application library that sits between the Exago core application and any number of scheduler instances and manages schedule traffic. The Queue is completely optional, but it is best suited to configurations with multiple scheduler instances where load balancing is a priority.
Default Setup (pre-2016)
First, some background. Historically, and still by default when no queue is used, Exago handles report scheduling as follows.
NOTE. For this discussion, it's important to define some terms:
A Schedule is a term for all of the information that is set when creating a schedule in the Schedule Manager. This information is usually stored as an xml file in a repository. Schedules can be accessed from the API using the ReportSchedule class.
Each Schedule contains some interpreted data that tells the schedulers when to run it. This information is called a Job. Jobs can also be stored separately from schedules. Jobs can be accessed from the API using the QueueApiJob class.
The process whereby a scheduler runs a report at a specified time and emails or saves the information is called an Execution.
Within the host application, all scheduler instances are listed in the configuration xml file:
When a schedule is created in the UI, the host application sends the job to schedulers starting with the first and moving down the list ("round-robin" style). The queried scheduler stores the schedule xml in a local working directory. This acts as a repository for the scheduler's unique set of jobs.
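The round-robin hand-off described above can be pictured with a minimal sketch. This is a language-neutral model in Python, not Exago code; the function name, job IDs, and scheduler names are all hypothetical.

```python
from itertools import cycle

def assign_round_robin(job_ids, scheduler_names):
    """Hand each new job to the next scheduler in list order, wrapping around."""
    assignments = {name: [] for name in scheduler_names}
    next_scheduler = cycle(scheduler_names)  # first, second, ..., then first again
    for job_id in job_ids:
        assignments[next(next_scheduler)].append(job_id)
    return assignments

# Hypothetical job IDs and scheduler names, for illustration only.
assignments = assign_round_robin(
    ["job1", "job2", "job3", "job4", "job5"],
    ["scheduler_a", "scheduler_b"],
)
print(assignments)
```

Note that the assignment depends only on list order; nothing about a scheduler's current workload enters into it.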
From this point, each scheduler acts independently. The host application has no idea what happens to schedules after they are sent out successfully. Likewise, the schedulers have no more communication with the host application with regard to report execution.
A word about the Schedule Manager: You can view and edit schedules from the UI using the schedule manager, but this is essentially a combined front-end for the schedulers' existing files. If a scheduler is offline you will simply not see its schedules in the list (there will be a warning message). The schedule manager has no impact on the host application.
Schedulers periodically scan their repository for job execute times. If a job is ready and the current time is equal to or past the execute time, the scheduler knows to run the job. The scheduler will perform its duty and then alter the schedule xml to indicate success or failure and the next execute time.
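As a rough illustration of one scan pass, the sketch below models jobs as Python dictionaries rather than schedule xml files. The field names, the run_report callback, and the fixed repeat interval are assumptions made for the example; real schedules carry their own recurrence rules.

```python
from datetime import datetime, timedelta

def is_due(job, now):
    """A job runs when it is ready and its execute time has arrived or passed."""
    return job["ready"] and now >= job["next_execute_time"]

def scan_repository(jobs, now, run_report, interval):
    """One scan pass: run each due job, then record the outcome and next time."""
    for job in jobs:
        if not is_due(job, now):
            continue
        try:
            run_report(job)
            job["last_result"] = "success"
        except Exception:
            job["last_result"] = "failure"
        # Assumption: a fixed repeat interval stands in for the real recurrence.
        job["next_execute_time"] = now + interval

now = datetime(2016, 1, 1, 9, 0)
jobs = [
    {"id": "due_job", "ready": True, "next_execute_time": datetime(2016, 1, 1, 8, 59)},
    {"id": "future_job", "ready": True, "next_execute_time": datetime(2016, 1, 1, 9, 30)},
]
scan_repository(jobs, now, run_report=lambda job: None, interval=timedelta(days=1))
```

After this pass, only the first job has a recorded result and a new execute time one day later; the second job is left untouched until a later scan.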
This default behavior may be adequate for most cases, but there can be issues. In particular, the scheduler queue sets out to solve the following two issues that can arise in default configurations: Load Balancing and Unexpected Outages.
Load Balancing issues: Ideally, unoccupied schedulers would receive new jobs, so that backlogs of unexecuted jobs do not build up on individual schedulers and cause imbalanced load and lost time. But the host application has no idea which schedulers will be busy when, and no idea how long jobs will take to run. Because round-robin assignment ignores how busy each scheduler is, jobs can build up disproportionately on one scheduler.
Outages: Once the host sends out a schedule, as far as it's concerned, it's finished. If a scheduler goes offline unexpectedly the host has no recovery function. The job will simply be delayed until the scheduler is restarted, which, to some extent, defeats the purpose of running jobs on a schedule. There is also no function to move schedules from one scheduler to another.
How the Queue Works
The Scheduler Queue is a custom .NET or Web Service library which aims to handle scheduling in a much more robust manner. It's important to note that the queue is entirely customizable. You are only required to implement all the applicable methods; how you do so is up to you. The following section will describe a typical setup which can improve load balancing and help resolve some common issues with multiple schedulers. Later on, we provide a pre-built Example that can be used as-is with minimal modifications, or altered as you see fit.
The queue sits in between the Exago host application and any number of scheduler services and handles logic for all scheduler requests and maintenance.
The host and scheduler applications all make calls to the queue at certain points during their runtime. In particular, schedulers will call the queue on three occasions: upon service startup, periodically while running, and when a job's status is changed. The host application calls the queue for various maintenance tasks related to schedule creation and populating the Schedule Manager. For now, we'll focus on the relationship of schedulers to the queue and how it can aid a typical multi-scheduler configuration.
When schedulers are configured to use the queue, their behavior changes somewhat.
Recall that in the default configuration, schedulers store their unique schedules in a local working directory, from which jobs are queried for execution.
Now, schedulers periodically query the queue, which has instructions (GetNextExecuteJob) for assigning jobs. (The query interval defaults to 15 seconds but is configurable.) In a typical setup, the queue pulls from a central repository of stored schedules. To prevent duplication, schedulers lock the queue so that only one may access it at a time. Additionally, the queue sets a job's status to "running" while it's active, so that other schedulers know to ignore it. (The provided Example also saves a temporary file in the job repository to indicate which scheduler is handling a running job.)
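The lock-and-mark behavior can be sketched as follows. This is a simplified in-memory model in Python, not the .NET queue itself: the class, the dictionary fields, and the now parameter are assumptions for illustration (the actual GetNextExecuteJob method takes only a service name and returns a string).

```python
import threading
from datetime import datetime

class SchedulerQueue:
    """Hypothetical in-memory stand-in for a central job repository."""

    def __init__(self, jobs):
        self._jobs = jobs
        self._lock = threading.Lock()  # only one scheduler may query at a time

    def get_next_execute_job(self, service_name, now):
        """Return the ID of the earliest due job, or None if nothing is due."""
        with self._lock:
            due = [j for j in self._jobs
                   if j["status"] == "ready" and j["next_execute_time"] <= now]
            if not due:
                return None
            job = min(due, key=lambda j: j["next_execute_time"])
            job["status"] = "running"      # other schedulers now skip this job
            job["service"] = service_name  # remember which scheduler took it
            return job["id"]

now = datetime(2016, 1, 1, 9, 0)
queue = SchedulerQueue([
    {"id": "job1", "status": "ready", "next_execute_time": datetime(2016, 1, 1, 8, 0)},
    {"id": "job2", "status": "ready", "next_execute_time": datetime(2016, 1, 1, 8, 30)},
    {"id": "job3", "status": "ready", "next_execute_time": datetime(2016, 1, 1, 10, 0)},
])
first = queue.get_next_execute_job("scheduler_a", now)
second = queue.get_next_execute_job("scheduler_b", now)
third = queue.get_next_execute_job("scheduler_a", now)
```

Here the two schedulers receive different jobs, and the third call returns nothing because the only remaining job is not yet due.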
NOTE. Schedulers still use a local working directory for temporary files.
This has several advantages. First, schedulers are no longer responsible for a unique set of schedules. This prevents outages from causing excessive missed executions. At most one job per scheduler can be left hanging, since a scheduler is responsible for only one job at a time. If a scheduler goes offline in the middle of a job, the queue can be used to gracefully handle incomplete jobs (this is not present in the provided Example).
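One possible recovery policy, sketched here in Python as an illustration only, is to release any job still marked "running" by a scheduler when that scheduler starts up again. This assumes the queue records which service took each job; the function and field names are hypothetical, and other policies (such as timeouts) are equally valid.

```python
def release_jobs_on_start(jobs, service_name):
    """Reset jobs left 'running' by this service so they can be picked up again."""
    released = []
    for job in jobs:
        if job["status"] == "running" and job.get("service") == service_name:
            job["status"] = "ready"  # make the job available to any scheduler
            job["service"] = None
            released.append(job["id"])
    return released

# Hypothetical repository state after scheduler_a went offline mid-job.
jobs = [
    {"id": "job1", "status": "running", "service": "scheduler_a"},
    {"id": "job2", "status": "running", "service": "scheduler_b"},
    {"id": "job3", "status": "ready", "service": None},
]
released = release_jobs_on_start(jobs, "scheduler_a")
```

Logic of this kind would fit naturally in the method that schedulers call on service startup, since a restarting service cannot still be running its previous job.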
Next, jobs are now distributed much more evenly between the schedulers. We no longer have the problem where, due to their independence, schedulers will build up excessive numbers of jobs. Jobs will only be assigned to available schedulers.
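To make the difference concrete, here is a small simulation comparing the two approaches. It is a toy model under stated assumptions: all jobs are due at time zero, run back to back, and have made-up durations; the function names are hypothetical.

```python
import heapq

def round_robin_finish_times(durations, scheduler_count):
    """Jobs are pre-assigned in list order, regardless of how long each takes."""
    finish = [0] * scheduler_count
    for index, duration in enumerate(durations):
        finish[index % scheduler_count] += duration
    return finish

def pull_based_finish_times(durations, scheduler_count):
    """Each job goes to whichever scheduler becomes free first."""
    free_at = [(0, scheduler) for scheduler in range(scheduler_count)]
    heapq.heapify(free_at)
    finish = [0] * scheduler_count
    for duration in durations:
        available, scheduler = heapq.heappop(free_at)  # earliest free scheduler
        finish[scheduler] = available + duration
        heapq.heappush(free_at, (finish[scheduler], scheduler))
    return finish

durations = [10, 1, 10, 1, 10, 1]  # alternating long and short jobs
print(round_robin_finish_times(durations, 2))  # [30, 3]
print(pull_based_finish_times(durations, 2))   # [21, 12]
```

With round-robin assignment, one scheduler ends up with every long job and finishes at time 30 while the other sits idle after time 3. When free schedulers pull the next job instead, all work is done by time 21.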
Finally, since this allows us to control what data is sent to and received from the schedulers and the file system, we could implement any custom load balancing solution we wanted.
Getting Set Up
Setting up the queue is a multi-part process which depends on your desired configuration. We'll discuss some constants and some potential variations.
First we need to write the scheduler queue. This is discussed in more detail in the Example section. This can be a .NET assembly or a web service, and it can be part of another library. All the following methods must be implemented in the queue interface:
public static string GetJobList(string viewLevel, string companyId, string userId)
Called from the Exago UI to populate the jobs in the Schedule Manager.
public static string GetJobData(string jobId)
Called from the Exago UI Schedule Manager to get the full job XML data for a job.
public static void DeleteReport(string reportId)
Called from the Exago UI when a report is deleted.
public static void RenameReport(string reportId, string reportName)
Called from the Exago UI when a report is renamed.
public static void UpdateReport(string reportId, string reportXml)
Called from the Exago UI when a report is updated.
public static void Flush(string viewLevel, string companyId, string userId)
Called from the Exago UI Schedule Manager in response to a click on the Flush button.
public static void Start(string serviceName)
Called from scheduler services to indicate when a specific service starts.
public static string GetNextExecuteJob(string serviceName)
Called from the scheduler services to return the next job to execute.
public static void SaveJob(string jobXml)
Called from both the scheduler services and the Exago UI to save the job. This method is called when a schedule is added, updated, completed, killed, etc.
The QueueApi and QueueApiJob helper classes have been added to the Api to facilitate writing the queue. You'll need to reference the WebReports.Api.Scheduler namespace. QueueApiJob wraps a Job object and a variety of useful methods for managing jobs. The QueueApiJob class will be used extensively in the following example.
The host application config and each scheduler config must contain the path to the scheduler queue assembly or web service class in the following format:
You can set the path in the host app by using the Admin Console and setting the following field in the Scheduler Settings:
Or by setting the field <schedulerqueueservice> in the config file,
Or by setting the field Api.SetupData.General.SchedulerQueueService via the API at runtime.
In each scheduler application, set the field <queue_service> in the scheduler config file.
Next, determine how you'll be accessing your schedules. A common solution uses a database to optimize lookup speed. The queue only needs to know the Job ID (filename), Next Execute Time, and the Running status to determine which schedules to run.
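As a sketch of the database approach, the following uses SQLite from Python to hold just those three fields. The table and column names are assumptions, not anything defined by Exago, and the example uses a single in-memory connection; locking across multiple scheduler processes is not addressed here.

```python
import sqlite3

def create_job_table(conn):
    """Hypothetical schema holding only the fields the queue needs."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               job_id TEXT PRIMARY KEY,
               next_execute_time TEXT NOT NULL,  -- ISO 8601, so text order is time order
               running INTEGER NOT NULL DEFAULT 0
           )"""
    )

def claim_next_job(conn, now_iso):
    """Mark the earliest due, non-running job as running and return its ID."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT job_id FROM jobs "
            "WHERE running = 0 AND next_execute_time <= ? "
            "ORDER BY next_execute_time LIMIT 1",
            (now_iso,),
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET running = 1 WHERE job_id = ?", (row[0],))
        return row[0]

conn = sqlite3.connect(":memory:")
create_job_table(conn)
conn.executemany(
    "INSERT INTO jobs (job_id, next_execute_time) VALUES (?, ?)",
    [("job_late", "2016-01-01T08:30:00"),
     ("job_early", "2016-01-01T08:00:00"),
     ("job_future", "2016-01-01T12:00:00")],
)
claimed = claim_next_job(conn, "2016-01-01T09:00:00")
```

The lookup touches only the three columns, so the full job data can stay in whatever storage the schedules already use.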
If you're using folder management, you can implement those methods in the queue assembly (see Report and Folder Storage/Management for more information).
Example
This example is provided for basic reference. It is not "plug-and-play." You will most likely have to make alterations to fit your configuration. This example uses a directory for schedule storage and fully implements the Schedule Manager. It supports unlimited scheduler services, handles basic load balancing, and writes a log file and temporary files to the storage directory.
Download the example here. To compile, set the QueueDirectory path, rename the file with a .cs extension, and add it to a Visual Studio project.