This tutorial will show you how to set up distributed system monitoring by leveraging keyword executions on dedicated token pools and analyzing the results as numerical measurements in RTM views. This example is based on Windows performance counters collected with the standard typeperf command. The same approach can be reused on Unix, for instance with vmstat, and can easily be extended and customized to suit the needs of any other platform.
The idea is to leverage the agent platform by executing a special keyword at regular intervals in order to poll system metric values and feed them back into step’s measurement database. Sample keyword code is provided in our GitHub sample repository under keywords/java/demo-system-monitoring.
It contains a generic keyword for executing managed processes from Java, which is extended to cover the Windows typeperf process.
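The keyword source lives in the repository mentioned above; the following is only a minimal, standalone sketch of the polling idea, not that code. It assumes typeperf’s usual CSV output (a quoted header row starting with "(PDH-CSV 4.0)" followed by one quoted row per sample, timestamp first). To stay runnable on any OS, it only builds the command line instead of starting the process. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TypeperfSample {

    // Builds a typeperf command line taking a single sample ("-sc 1") of the given counters.
    // This sketch only builds the arguments; a keyword would hand them to a ProcessBuilder.
    static List<String> buildCommand(List<String> counters) {
        List<String> cmd = new ArrayList<>();
        cmd.add("typeperf");
        cmd.addAll(counters);
        cmd.add("-sc");
        cmd.add("1");
        return cmd;
    }

    // Splits a line like "a","b","c" into unquoted fields.
    // Assumes no field contains the "," sequence itself.
    static List<String> splitQuoted(String line) {
        List<String> fields = new ArrayList<>();
        for (String raw : line.trim().split("\",\"")) {
            fields.add(raw.replace("\"", ""));
        }
        return fields;
    }

    // Maps each counter path of the header row to its value in the first data row.
    static Map<String, Double> parse(List<String> outputLines) {
        List<String> header = null;
        for (String line : outputLines) {
            if (!line.startsWith("\"")) {
                continue; // skip blank lines and status messages
            }
            List<String> fields = splitQuoted(line);
            if (header == null) {
                header = fields; // first quoted row: "(PDH-CSV 4.0)", then counter paths
                continue;
            }
            Map<String, Double> values = new LinkedHashMap<>();
            // Column 0 holds the sample timestamp; the others follow the header order.
            for (int i = 1; i < fields.size() && i < header.size(); i++) {
                values.put(header.get(i), Double.parseDouble(fields.get(i)));
            }
            return values;
        }
        return Map.of(); // no data row found
    }

    public static void main(String[] args) {
        List<String> output = List.of(
            "",
            "\"(PDH-CSV 4.0)\",\"\\\\HOST\\Processor(_Total)\\% Processor Time\","
                + "\"\\\\HOST\\Memory\\Available MBytes\"",
            "\"04/14/2023 10:00:00.123\",\"12.5\",\"2048.000000\"",
            "Exiting, please wait...");
        System.out.println(buildCommand(List.of("\\Processor(_Total)\\% Processor Time")));
        System.out.println(parse(output));
    }
}
```

In the actual keyword, the parsed values are then reported as measurements; how that is done is defined by the sample code, not by this sketch.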
Set up your agent
First, we will create dedicated execution slots (called token pools) in order to isolate the execution of this technical keyword from business keywords.
All you need to modify in your existing agent setup is the AgentConf.json configuration file. You’ll need to add an AGENT_TYPE attribute with the value “DEFAULT” to the existing group in tokenGroups, and create a second group with a unique AGENT_TYPE value and a capacity of 1, as shown in the example below.
For the controller and database servers, or any other server you’d like to monitor, install a dedicated agent with a single token group for monitoring, following the same rules (a capacity of 1 and a unique AGENT_TYPE value).
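To make these rules concrete, here is a hypothetical Java model of the token groups declared across your agents. It checks that every monitoring group has a capacity of 1 and an AGENT_TYPE value that no other group uses, and that no group lacks an AGENT_TYPE. It does not read AgentConf.json; the record names, host names and MONITOR_* values are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MonitoringTokenGroups {

    // Hypothetical view of one entry of tokenGroups: its capacity and token attributes.
    record TokenGroup(int capacity, Map<String, String> attributes) {}

    // Hypothetical view of one agent: the host it runs on and its token groups.
    record Agent(String host, List<TokenGroup> groups) {}

    // Returns the rule violations found across all agents (an empty list means none).
    static List<String> check(List<Agent> agents) {
        Set<String> usedTypes = new HashSet<>();
        List<String> problems = new ArrayList<>();
        for (Agent agent : agents) {
            for (TokenGroup group : agent.groups()) {
                String type = group.attributes().get("AGENT_TYPE");
                if (type == null) {
                    problems.add(agent.host() + ": token group without AGENT_TYPE");
                } else if (!type.equals("DEFAULT")) {
                    // Monitoring group: one token, and a value no other group shares.
                    if (group.capacity() != 1) {
                        problems.add(agent.host() + ": " + type + " should have a capacity of 1");
                    }
                    if (!usedTypes.add(type)) {
                        problems.add(agent.host() + ": " + type + " is used by more than one group");
                    }
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Agent> agents = List.of(
            new Agent("app-server", List.of(
                new TokenGroup(10, Map.of("AGENT_TYPE", "DEFAULT")),
                new TokenGroup(1, Map.of("AGENT_TYPE", "MONITOR_APP")))),
            new Agent("db-server", List.of(
                new TokenGroup(1, Map.of("AGENT_TYPE", "MONITOR_DB")))));
        System.out.println(check(agents));
    }
}
```

Uniqueness matters because the monitoring plan addresses each host through its own pool; two agents sharing a value would make it ambiguous which host a sample comes from.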
Now that a distinct pool is available for monitoring purposes, we will tell step to route all regular keywords to the default pool. Later we will also make sure to route the monitoring keywords to the other pool in our plan.
Key: route_to_AGENT_TYPE, value: DEFAULT
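As a simplified model of this routing (not step’s actual token selection code), a pool is eligible for a keyword when every routing attribute matches an attribute declared for that pool. With route_to_AGENT_TYPE set to DEFAULT, regular keywords can only reach the DEFAULT pools, while the monitoring keyword can target one host through its unique AGENT_TYPE. Pool names and MONITOR_* values are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TokenRouting {

    // Hypothetical token pool: a display name plus the attributes declared in AgentConf.json.
    record Pool(String name, Map<String, String> attributes) {}

    // A pool is eligible when every requested attribute (e.g. AGENT_TYPE=DEFAULT)
    // equals the value declared for that pool.
    static List<String> eligible(List<Pool> pools, Map<String, String> selection) {
        return pools.stream()
            .filter(p -> selection.entrySet().stream()
                .allMatch(e -> e.getValue().equals(p.attributes().get(e.getKey()))))
            .map(Pool::name)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Pool> pools = List.of(
            new Pool("app-server/business", Map.of("AGENT_TYPE", "DEFAULT")),
            new Pool("app-server/monitoring", Map.of("AGENT_TYPE", "MONITOR_APP")),
            new Pool("db-server/monitoring", Map.of("AGENT_TYPE", "MONITOR_DB")));
        // route_to_AGENT_TYPE=DEFAULT: regular keywords only reach the business pool.
        System.out.println(eligible(pools, Map.of("AGENT_TYPE", "DEFAULT")));
        // A monitoring keyword call targets a single host through its unique AGENT_TYPE.
        System.out.println(eligible(pools, Map.of("AGENT_TYPE", "MONITOR_DB")));
    }
}
```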
Metric configuration and post-processing
You may want to update the typeperf counters already defined in the included resource CSV file “typePerf.csv”. Its columns are:
Metric_name: the name of the metric as reported in RTM
Typeperf_counter: the typeperf counter to retrieve (refer to windows-commands/typeperf for more details about typeperf)
groovy: a Groovy expression evaluated during post-processing; its result is stored as the measurement’s value
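The following sketch shows how one row of typePerf.csv maps to a metric definition and how post-processing transforms a raw counter value. Two assumptions should be checked against the bundled file: the column separator (a semicolon here) and the variable name the Groovy expression uses for the raw value (value here). Groovy evaluation itself is replaced by a small lookup of Java functions, so this illustrates the data flow rather than implementing a Groovy evaluator; the metric name and expressions are hypothetical.

```java
import java.util.Map;
import java.util.function.DoubleUnaryOperator;
import java.util.regex.Pattern;

public class MetricConfig {

    // One row of typePerf.csv: Metric_name, Typeperf_counter, groovy.
    record MetricDef(String metricName, String typeperfCounter, String groovy) {}

    // Parses one data row. The separator is an assumption; check the bundled file.
    // The limit of 3 keeps any separator inside the expression column intact.
    static MetricDef parseRow(String row, String separator) {
        String[] cols = row.split(Pattern.quote(separator), 3);
        if (cols.length != 3) {
            throw new IllegalArgumentException("expected 3 columns: " + row);
        }
        return new MetricDef(cols[0].trim(), cols[1].trim(), cols[2].trim());
    }

    // Stand-in for Groovy evaluation: a few hypothetical expressions mapped to Java code.
    static final Map<String, DoubleUnaryOperator> EXPRESSIONS = Map.of(
        "value", v -> v,
        "value / 1024 / 1024", v -> v / 1024 / 1024);

    // Applies the row's post-processing to a raw counter value.
    static double postProcess(MetricDef def, double raw) {
        DoubleUnaryOperator op = EXPRESSIONS.get(def.groovy());
        if (op == null) {
            throw new IllegalArgumentException("unsupported expression: " + def.groovy());
        }
        return op.applyAsDouble(raw);
    }

    public static void main(String[] args) {
        MetricDef def = parseRow("AvailableMemoryMB;\\Memory\\Available Bytes;value / 1024 / 1024", ";");
        System.out.println(def.metricName() + " <- " + def.typeperfCounter());
        System.out.println(postProcess(def, 2147483648.0));
    }
}
```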
The step plan
The plan given as an example is designed to run for one hour, taking a sample for each defined agent every minute.
You’ll have to import the plan into your step instance, update the keyword reference and adapt the list of agents to be monitored in the JSON list:
Download plan: OSMonitoring.json
Note that the keyword execution is configured to take place on the new token pool reserved for monitoring (see the routing section of the keyword call node in the provided plan).
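A quick way to size a run is to divide the run duration by the sampling interval. The sketch below assumes one keyword call per monitored agent per sample and a first sample at the start of the run; with the plan’s defaults (one hour, one minute) this gives 60 samples per agent. The agent count of 3 is hypothetical.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class SamplingPlan {

    // Samples taken per agent during one run (integer division: a partial last interval is ignored).
    static long samplesPerAgent(Duration runDuration, Duration interval) {
        return runDuration.toMillis() / interval.toMillis();
    }

    // Offsets from the start of the run at which samples are taken, the first one at 0.
    static List<Duration> sampleOffsets(Duration runDuration, Duration interval) {
        long count = samplesPerAgent(runDuration, interval);
        List<Duration> offsets = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            offsets.add(interval.multipliedBy(i));
        }
        return offsets;
    }

    public static void main(String[] args) {
        Duration run = Duration.ofHours(1);
        Duration interval = Duration.ofMinutes(1);
        int agents = 3; // hypothetical number of monitored agents
        long perAgent = samplesPerAgent(run, interval);
        System.out.println(perAgent + " samples per agent, "
            + perAgent * agents + " keyword executions per run");
        List<Duration> offsets = sampleOffsets(run, interval);
        System.out.println("last sample at " + offsets.get(offsets.size() - 1));
    }
}
```

Since each monitoring pool has a capacity of 1, calls to the same agent cannot run concurrently, so a single typeperf sample should complete well within the chosen interval.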
You may run the plan once and tweak it to match your desired duration and sampling interval, or schedule it to restart every hour, as shown below:
Choosing a CRON expression that runs every hour results in the following entry in the scheduler:
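For reference, with an hourly schedule each new one-hour run starts on the hour, so consecutive runs form a continuous series of samples. Assuming the scheduler accepts Quartz-style expressions, such a trigger is typically written 0 0 * * * ? (seconds, minutes, hours, day of month, month, day of week). The sketch below does not parse CRON syntax; it simply lists the next fire times of an on-the-hour trigger with java.time, ignoring time zones and daylight saving changes.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

public class HourlySchedule {

    // Next n fire times of an "every hour, on the hour" trigger, strictly after 'from'.
    // Uses LocalDateTime, so time zones and DST transitions are ignored.
    static List<LocalDateTime> nextRuns(LocalDateTime from, int n) {
        LocalDateTime first = from.truncatedTo(ChronoUnit.HOURS).plusHours(1);
        List<LocalDateTime> runs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            runs.add(first.plusHours(i));
        }
        return runs;
    }

    public static void main(String[] args) {
        // Schedule created at 09:42: runs start at 10:00, 11:00 and 12:00.
        System.out.println(nextRuns(LocalDateTime.of(2024, 1, 15, 9, 42), 3));
    }
}
```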
Unlike other test executions, all measurements from this keyword are created using a common execution id (eId) unrelated to the actual step execution.
The common eId is “OSmonitor”, which is used in RTM to retrieve the related measurements. If you keep the monitoring running in the background, we recommend adding a time frame filter to your query. Besides the metric name, the “hostname” of the monitoring agent is automatically added to your measurements, so you can use it as a group clause:
This gives the following output:
Since all monitoring measurements share the common eId “OSmonitor”, unrelated to the actual step execution, you may clean up the measurement collection based on this eId and the timestamp. Refer to our Housekeeping section.
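The RTM query and the cleanup both rely on the same two criteria: the OSmonitor eId and the timestamp. The following in-memory sketch mirrors that logic. It keeps one metric within a time frame, groups by hostname and averages the values, and separately flags measurements older than a cutoff as candidates for cleanup. The Measurement record is a simplified stand-in, not RTM’s actual schema, and the CPU metric and host names are made up.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class MonitorMeasurements {

    static final String MONITOR_EID = "OSmonitor";

    // Simplified stand-in for a measurement; not RTM's actual schema.
    record Measurement(String eId, String name, String hostname, Instant timestamp, double value) {}

    // Average per hostname of one metric, for OSmonitor measurements within [from, to).
    static Map<String, Double> averageByHost(List<Measurement> all, String metric,
                                             Instant from, Instant to) {
        return all.stream()
            .filter(m -> m.eId().equals(MONITOR_EID) && m.name().equals(metric))
            .filter(m -> !m.timestamp().isBefore(from) && m.timestamp().isBefore(to))
            .collect(Collectors.groupingBy(Measurement::hostname, TreeMap::new,
                Collectors.averagingDouble(Measurement::value)));
    }

    // Cleanup criterion: OSmonitor measurements taken before the retention cutoff.
    static boolean purgeable(Measurement m, Instant cutoff) {
        return m.eId().equals(MONITOR_EID) && m.timestamp().isBefore(cutoff);
    }

    public static void main(String[] args) {
        List<Measurement> all = List.of(
            new Measurement(MONITOR_EID, "CPU", "app-server", Instant.parse("2024-01-15T10:00:00Z"), 10.0),
            new Measurement(MONITOR_EID, "CPU", "app-server", Instant.parse("2024-01-15T10:01:00Z"), 20.0),
            new Measurement(MONITOR_EID, "CPU", "db-server", Instant.parse("2024-01-15T10:00:00Z"), 50.0),
            new Measurement(MONITOR_EID, "CPU", "app-server", Instant.parse("2024-01-15T08:00:00Z"), 80.0),
            new Measurement("some-execution", "CPU", "app-server", Instant.parse("2024-01-15T08:30:00Z"), 99.0));
        Instant from = Instant.parse("2024-01-15T09:00:00Z");
        Instant to = Instant.parse("2024-01-15T11:00:00Z");
        System.out.println(averageByHost(all, "CPU", from, to));
        long toPurge = all.stream().filter(m -> purgeable(m, from)).count();
        System.out.println(toPurge + " measurement(s) to purge");
    }
}
```

Filtering on the eId first is what keeps the cleanup from touching measurements produced by regular test executions in the same time range.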