How to monitor service availability and performance
This tutorial shows how step can be used to monitor service availability and performance metrics by combining plan executions with the notification package, the scheduler, and the monitoring view used as a dashboard.
Before going through this tutorial, make sure you understand what step keywords and plans are.
You may also want to refer to the example usage of test sets and test cases to better understand how monitoring plans are defined.
Service availability and performance monitoring
Let’s assume you operate 3 different Windows services and want to monitor them by checking their status (running or stopped) and measuring the average response time needed to execute each check.
A simple keyword is used to check each service and return its health status (running / stopped) as an output.
The keyword has 2 inputs:
executablePath: the program used to run the check; in this tutorial we are using PowerShell
Register the WindowsServiceStatusKeyword keyword in step as described here.
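The source of WindowsServiceStatusKeyword is not reproduced here. As a rough sketch of the kind of check such a keyword performs, the Python code below builds a PowerShell command that reads a service's status with the `Get-Service` cmdlet and maps the result to running / stopped. The function names, the `status` output key and the `-NoProfile -Command` invocation are illustrative assumptions, not the actual keyword implementation.

```python
import subprocess


def build_status_command(executable_path, service_name):
    """Build the command line that asks PowerShell for a service's status."""
    # Get-Service returns a ServiceController; its .Status prints e.g. "Running" or "Stopped".
    script = f"(Get-Service -Name '{service_name}').Status"
    return [executable_path, "-NoProfile", "-Command", script]


def parse_status(stdout):
    """Map PowerShell's status text to the two values used in this tutorial."""
    # Simplification: anything other than "Running" (including pending states) counts as stopped.
    return "running" if stdout.strip() == "Running" else "stopped"


def check_service(executable_path, service_name):
    """Run the check and return a keyword-like output (requires a Windows host)."""
    result = subprocess.run(
        build_status_command(executable_path, service_name),
        capture_output=True,
        text=True,
        timeout=30,
    )
    # Hypothetical output layout: a single "status" field.
    return {"status": parse_status(result.stdout)}
```

On a Windows host, `check_service("powershell.exe", "Spooler")` would query the Print Spooler service, whose service name is `Spooler`.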
Define your monitoring plans
Set up your keywords and TestCases
For clarity, each keyword call has been labeled with the name of the service it checks:
Let’s define a simple plan of type TestSet containing 3 TestCase controls, one per service health check keyword:
A good practice is to wrap each check in its own TestCase control. This enables a per-test-case split view of each execution and gives you better control over what is executed in the monitoring test set.
In addition, test cases defined under a TestSet control are executed in parallel.
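To make the TestSet semantics concrete, here is a minimal Python model of independent checks running in parallel, each wrapped like a TestCase that records its own status and duration. It only illustrates the behavior described above and is not how step executes plans internally; the `ERROR` status, the result dictionary layout, the stub checks and the service names other than Print Spooler are hypothetical. The overall plan status is simplified to FAILED as soon as one test case does not pass.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_test_case(name, check):
    """Run one check and record its status and duration, like a TestCase node."""
    start = time.perf_counter()
    try:
        status = "PASSED" if check() else "FAILED"
    except Exception:
        status = "ERROR"  # assumption: any exception is reported as a generic error
    return {"name": name, "status": status, "duration_s": time.perf_counter() - start}


def run_test_set(test_cases):
    """Run every test case concurrently; results are returned in definition order."""
    with ThreadPoolExecutor(max_workers=max(1, len(test_cases))) as pool:
        futures = [pool.submit(run_test_case, name, check)
                   for name, check in test_cases.items()]
        return [f.result() for f in futures]


def overall_status(results):
    """Simplified plan status: FAILED as soon as one test case did not pass."""
    return "PASSED" if all(r["status"] == "PASSED" for r in results) else "FAILED"


# Usage with stub checks standing in for the three service health keywords.
results = run_test_set({
    "Check Print Spooler": lambda: False,  # stopped service
    "Check Service B": lambda: True,       # hypothetical service
    "Check Service C": lambda: True,       # hypothetical service
})
print(overall_status(results))
```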
Add an assertion on keyword output
To check the service status (is it running or not?) within our test plan, let’s add an Assert control under each keyword:
Let’s execute the plan and click on the Check Print Spooler test case to display its content:
As the screenshot above shows, the Check Print Spooler keyword node is red and marked as FAILED because the service is stopped.
These functional checks can now be used to monitor our services.
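Conceptually, each Assert control compares a keyword output field with an expected value. The sketch below models an equality assertion on a hypothetical `status` output key; the function name and return layout are assumptions, and it mirrors what the tutorial configures in the UI rather than step's own Assert implementation.

```python
def assert_equals(output, key, expected):
    """Compare one keyword output field with an expected value, like an Assert node."""
    actual = output.get(key)  # None if the keyword did not return this field
    if actual == expected:
        return {"status": "PASSED", "message": f"{key} = {expected!r}"}
    return {"status": "FAILED", "message": f"{key}: expected {expected!r} but was {actual!r}"}


# Usage: the Print Spooler check from this tutorial, with the service stopped.
print(assert_equals({"status": "stopped"}, "status", "running"))
```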
Schedule your plan
Now that we have some functional checks, let’s schedule them to run periodically using the scheduler.
From an execution of your plan, click the “Schedule” button in the top-right panel:
You can now define how often your monitoring plan should be executed. In this example, we use the “Every 5 minutes” preset (you can also enter any Java CRON expression of your choice):
Click the “OK” button: you are redirected to the step Scheduler tab, where you can see and edit all the scheduling entries you created:
Let’s have a look at the monitoring dashboard, which you can open by clicking the “Monitoring” tab in the top menu:
The latest execution of our monitoring plan ended as FAILED because the Print Spooler service is not running.
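Assuming the scheduler accepts Quartz-style CRON expressions (seconds, minutes, hours, day of month, month, day of week), an “every 5 minutes” schedule can be written as `0 0/5 * * * ?`. The sketch below does not parse CRON syntax; it only computes the fire times that this particular expression produces, to show when the monitoring plan would run. The function name and the `step_minutes` parameter are hypothetical.

```python
from datetime import datetime, timedelta

EVERY_5_MINUTES = "0 0/5 * * * ?"  # Quartz-style: sec min hour day-of-month month day-of-week


def next_fire_times(after, count, step_minutes=5):
    """Return the next `count` times at second 0 whose minute is a multiple of step_minutes."""
    # The simple arithmetic below only matches "0/N" semantics when N divides 60.
    assert 60 % step_minutes == 0, "step_minutes must divide 60"
    t = after.replace(second=0, microsecond=0)
    # Move forward to the next minute that is a multiple of the step.
    t += timedelta(minutes=(step_minutes - t.minute % step_minutes) % step_minutes)
    if t <= after:  # fire times are strictly after the reference time
        t += timedelta(minutes=step_minutes)
    return [t + timedelta(minutes=step_minutes * i) for i in range(count)]


print(next_fire_times(datetime(2024, 1, 1, 10, 2, 30), 3))
```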
To illustrate the behavior of the “Last status change” column, let’s fix the Print Spooler service and wait for the next plan execution.
Here is the monitoring view 5 minutes after the fix was applied to the service:
The “Last status change” column has been updated accordingly, as the overall status of the latest plan execution changed from FAILED to PASSED.
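The exact rule behind the “Last status change” column is not spelled out here. Assuming it records when the overall plan status last differed from that of the previous execution, it can be modeled as below; the function name and the history format (a chronological list of end time and overall status pairs) are hypothetical.

```python
from datetime import datetime


def last_status_change(history):
    """Return the end time of the most recent execution whose status differs from the previous one."""
    # history: chronological list of (end_time, overall_status) tuples
    if not history:
        return None
    change_time = history[0][0]  # the first execution establishes the initial status
    for (_, prev_status), (end_time, status) in zip(history, history[1:]):
        if status != prev_status:
            change_time = end_time
    return change_time


history = [
    (datetime(2024, 1, 1, 10, 0), "FAILED"),
    (datetime(2024, 1, 1, 10, 5), "FAILED"),
    (datetime(2024, 1, 1, 10, 10), "PASSED"),  # first run after the service was fixed
    (datetime(2024, 1, 1, 10, 15), "PASSED"),
]
print(last_status_change(history))
```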
Long-term trends / history
To display the performance metrics over time, open any execution of your monitoring test plan, switch to the “Performance” tab, then click “Interactive analytics”:
You are now redirected to RTM. The existing filter based on the execution id restricts the measurements to the selected execution. Since we are interested in all the measurements of this plan over time, remove that filter by clicking the red cross to its right:
Let’s now filter the results to display only the response time of the Check_Print_Spooler_Service_Health keyword by adding a simple “Text filter” on the keyword name:
The graph now shows the average response time of the selected keyword:
To retrieve the execution data of all our keywords, we can use a regular expression filter, still based on the keyword name (in our example, all keyword names start with “Check”):
The graph now shows the average response time of each service over time:
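The filtering and averaging that produce this graph can be sketched in plain Python. The measurement fields used below (`eId`, `name`, `begin` in epoch milliseconds, `value` as a duration in milliseconds) are assumptions about the data layout, and the text filter is treated as an exact name match; RTM's actual matching and aggregation rules may differ. The optional `execution_id` argument plays the role of the execution id filter removed earlier.

```python
from collections import defaultdict
from statistics import mean


def average_by_interval(measurements, name, interval_ms, execution_id=None):
    """Average response times of one keyword, bucketed by measurement start time."""
    buckets = defaultdict(list)
    for m in measurements:
        if m["name"] != name:
            continue  # text filter: exact keyword name match (assumption)
        if execution_id is not None and m["eId"] != execution_id:
            continue  # optional execution id filter, removed in this tutorial
        bucket_start = m["begin"] - m["begin"] % interval_ms
        buckets[bucket_start].append(m["value"])
    return {start: mean(values) for start, values in sorted(buckets.items())}


measurements = [
    {"eId": "e1", "name": "Check_Print_Spooler_Service_Health", "begin": 0, "value": 100},
    {"eId": "e2", "name": "Check_Print_Spooler_Service_Health", "begin": 60_000, "value": 300},
    {"eId": "e3", "name": "Check_Print_Spooler_Service_Health", "begin": 300_000, "value": 200},
    {"eId": "e1", "name": "Check_Other_Service_Health", "begin": 0, "value": 999},
]
print(average_by_interval(measurements, "Check_Print_Spooler_Service_Health", 300_000))
```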
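A regular expression filter such as `^Check` selects every keyword whose name starts with “Check”, and the graph then shows one series per keyword. The sketch below models that grouping under the same assumed measurement fields (`name`, `begin`, `value`); it is an illustration, not RTM's implementation, and the keyword names in the usage data are hypothetical apart from Check_Print_Spooler_Service_Health.

```python
import re
from collections import defaultdict
from statistics import mean


def average_per_keyword(measurements, pattern, interval_ms):
    """Return {keyword name: {bucket start: average value}} for names matching `pattern`."""
    regex = re.compile(pattern)
    series = defaultdict(lambda: defaultdict(list))
    for m in measurements:
        if regex.search(m["name"]):  # regular expression filter on the keyword name
            bucket = m["begin"] - m["begin"] % interval_ms
            series[m["name"]][bucket].append(m["value"])
    return {name: {b: mean(v) for b, v in sorted(buckets.items())}
            for name, buckets in sorted(series.items())}


measurements = [
    {"name": "Check_Print_Spooler_Service_Health", "begin": 0, "value": 10},
    {"name": "Check_Print_Spooler_Service_Health", "begin": 1_000, "value": 30},
    {"name": "Check_Service_B_Health", "begin": 0, "value": 5},
    {"name": "Login", "begin": 0, "value": 99},  # excluded by the ^Check pattern
]
print(average_per_keyword(measurements, r"^Check", 60_000))
```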