Android in the cloud for a large Swiss corporation
Estimated read time: 15 min
What you’ll learn: How exense automated and scaled complex Android scenarios for a large organization in Switzerland.
Ideal profile(s): Application owner, Security specialist, Load tester
Author: Dorian Cransac (exense GmbH)
Due to a shift toward new security standards and complex login scenarios involving multiple types of devices, a new set of problems is arising and challenging IT teams worldwide. In this case study, we describe how exense tackled these problems for a large corporation in Switzerland and enabled the delivery of a series of mission-critical automation projects.
In this first section, we’ll look at the context in which the need for automating Android devices in the cloud appeared and the different reasons behind the introduction of a hybrid test setup for a large Swiss corporation.
Our client’s primary goal was to roll out a set of services designed to support the FIDO2 protocol and a variety of new authentication scenarios. Because they had a critical impact on the client’s customer-facing applications, these services could not be released without completing a variety of functional and non-functional tests at scale (i.e., on a large number of devices).
Needless to say, security is a critical aspect for any large private corporation. In addition to the criticality of the project, ambitious deadlines had been set before we even got involved, which forced us to move fast and make a series of rapid technical decisions.
The FIDO2 Protocol
The FIDO2 protocol greatly improves security by opting to always keep sensitive authentication data on the client, while enhancing convenience through the support of unified authenticators.
Credit: the FIDO Alliance
The variety of hardware and software stacks (browser, OS, device type) as well as authenticators (physical peripheral with PIN code, fingerprint or face recognition, etc.) results in many test scenarios and automation needs. In many cases, providers and users will opt for passwordless authentication, which makes the user experience much smoother but adds complexity.
Credit: the FIDO Alliance
These improvements come at a relatively high technical cost and impact existing IT systems significantly. In addition to validating many combinations of software and hardware stacks, providers have to implement new services to support the new authentication scenarios, which also need to be tested.
Among the different software and hardware stacks, one emerged as particularly important and difficult to test: the mobile platform. It is clear that Android and iOS now account for a large share, if not the majority, of online traffic, including in the client’s sector. As far as the Swiss market goes, the focus could have been set on iOS, but an early risk-reward analysis suggested that Android-based automation would provide stakeholders with the necessary and sufficient conclusions, while being a more natural platform to start with, given the technical specificities of the client’s applications.
To make things a bit more difficult, the authenticator chosen on mobile relied heavily on asynchronous messaging, especially as part of the user’s device registration workflow. Testing the Android login required the implementation of Google’s Firebase APIs in order to push notifications from internal client applications onto the Android devices. These notifications would play a major role as they’d carry information required to authenticate a mobile user, enabling the use of the client-side cryptographic keys. From a performance standpoint, measuring the propagation time of these notifications turned out to be critical.
The following overview of the client’s architecture may give you a better sense of the key factors involved in the design of the test setup.
Other areas of concern included the choice of physical versus emulated devices, and where and how these could be operated at industrial scale. Sensitive data such as proprietary cryptographic code or customer data would not be allowed to travel anywhere (and certainly not outside of Switzerland).
Over the course of a PoC phase, several R&D iterations and joint workshops, a series of important decisions was made, leading to the choice of a hybrid architecture, which is discussed throughout this section. While the brain of the automation would be operated from the client’s network, the Android farm would be hosted in exense’s own cloud and made remotely available to the client.
Side note: the automation itself (including script development and execution) relied on exense’s automation platform, called step. Since a large amount of documentation on step is provided in our knowledge base, we’ll intentionally set our focus on aspects relating more specifically to the management of the Android farm, the app itself and other technical challenges such as the handling of asynchronous messaging in a hybrid environment.
Here’s a coarse overview of the final setup chosen to run the tests. In this image, the Push Client and Android App were custom-tailored automation artefacts, while the Wait For Notification and Event Broker components were provided out-of-the-box as part of the automation platform (step). The other entities represented in the diagram were either third-party components or components of the client’s own system, which was the subject of the tests.
The rectangles of different colors represent different network zones, highlighting the hybrid nature of the setup. It involves a mixture of cloud-based and on-premise components communicating with one another through (mostly) asynchronous messaging.
This approach allowed us to move quickly within budget and time constraints while satisfying other key criteria such as high technical accuracy, that is, our ability to reveal bugs and extrapolate the system’s behavior under production workload.
Let’s now take a closer look at the thought process and underlying justifications behind each major decision.
Hosting Android in the cloud
Having already decided to place the Android platform at the center of our automation efforts, determining how and where this automation would take place was key.
Creating and managing a farm of physical devices was deemed impractical, requiring the purchase and installation of smartphones. More importantly, it was unnecessary in our context since the generation of valid cryptographic keys without the use of a physical device was already possible. A first attempt was thus made to emulate Android devices on-premise, but meeting the hardware requirements on commodity client VMs proved to be excessively difficult.
The focus then shifted to exense’s private cloud in which the ability to operate Android devices had already been demonstrated. Operating the Android farm outside of the client’s network also made sense from an architectural perspective, since production users would effectively be using their devices from the internet’s public zone.
After clearing the usual networking roadblocks, a set of 50 emulated devices was provisioned and made remotely available. Basic calculations derived from an internal load model led to the conclusion that these 50 distinct devices would provide more than enough capacity to simulate and handle an expected total load of 2000 logins per minute.
To help clear some of the confidentiality concerns, exense was able to provide strong guarantees regarding the physical location of the datacenter (Switzerland). This aspect can be key for certain Swiss corporations which are subject to strict local regulations when it comes to data location.
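As a rough illustration of the sizing figures above, the per-device load can be worked out by spreading the target evenly across the farm. This is only back-of-the-envelope arithmetic: the internal load model itself is not described here, and the even distribution is an assumption.

```python
# Figures taken from the text above.
DEVICES = 50
TARGET_LOGINS_PER_MINUTE = 2000

# Assumption: the load is spread evenly across all emulated devices.
logins_per_device_per_minute = TARGET_LOGINS_PER_MINUTE / DEVICES
seconds_between_logins = 60 / logins_per_device_per_minute

print(f"{logins_per_device_per_minute:.0f} logins per device per minute")
print(f"one login every {seconds_between_logins:.1f} s per device")
```

Under this assumption each device handles 40 logins per minute, or one login every 1.5 seconds. Whether that leaves headroom depends on how long a single login takes on an emulator, a figure that is not stated here.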
Using a mock application
In order to reduce the need for sensitive client code or data, a mock Android application was written and deployed on the Android emulators. A minimal amount of functionality was implemented, including the following:
the ability to register the underlying emulated device against Firebase
the ability to receive notifications directed at the test user
the ability to forward events to the client’s private Test Center
Events forwarded to the Test Center were of two types:
an event carrying registration information (i.e., the device token assigned by Firebase upon starting the application)
an event carrying timestamp information marking the reception of the notification (required for End-To-End measurements)
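The two event types can be pictured as small, self-describing records. The sketch below is written in Python for brevity and is not the mock application's actual code (which runs on Android). All class names, field names and the JSON layout are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RegistrationEvent:
    device_id: str     # hypothetical identifier of the emulated device
    device_token: str  # token assigned by Firebase when the app starts


@dataclass(frozen=True)
class NotificationReceivedEvent:
    device_id: str        # hypothetical identifier of the emulated device
    notification_id: str  # hypothetical id used to correlate push and reception
    received_at_ms: int   # reception timestamp, in epoch milliseconds


def to_payload(event) -> str:
    """Serialize an event to JSON, tagged with its type name."""
    return json.dumps({"type": type(event).__name__, **asdict(event)}, sort_keys=True)


registration = RegistrationEvent("emulator-01", "example-token")
reception = NotificationReceivedEvent("emulator-01", "n-1", 1_700_000_000_123)
payloads = [to_payload(registration), to_payload(reception)]
```

Tagging each payload with its type lets a receiver such as the Test Center tell the two kinds of events apart without inspecting their fields.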
Interconnecting hybrid components
Connecting the client’s on-premise Test Center with the Android emulators running in the cloud via the Event Broker proved to be essential. This communication channel enabled real-time monitoring of the notification’s propagation time as well as a breakdown of each individual device’s performance in one consolidated view in step.
The following figure shows the average end-to-end propagation time of notifications for each individual device (in milliseconds). The value is produced by measuring the duration between the moment the notification is pushed by the central application and the moment it is received by the corresponding user’s device (and forwarded back to the Test Center).
The use of step’s Event Broker as a messaging proxy for external micro-polling allowed us to work around the strict networking policies of the organization. The white-listing of a couple of exense’s internet domains in the HTTP proxy was the only internal change requested in the client’s test environment.
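The measurement described above can be sketched as a per-device average of reception time minus push time. The sample layout, device names and values below are made up for illustration. The sketch also assumes that the clocks of the pushing application and of the devices are comparable.

```python
from collections import defaultdict
from statistics import mean


def average_propagation_ms(samples):
    """Return {device_id: average propagation time in milliseconds}.

    Each sample is a tuple (device_id, pushed_at_ms, received_at_ms).
    """
    per_device = defaultdict(list)
    for device_id, pushed_at_ms, received_at_ms in samples:
        per_device[device_id].append(received_at_ms - pushed_at_ms)
    return {device: mean(durations) for device, durations in per_device.items()}


# Illustrative samples only, not measured data.
samples = [
    ("emulator-01", 1_000, 1_180),
    ("emulator-01", 2_000, 2_220),
    ("emulator-02", 1_000, 1_350),
]
averages = average_propagation_ms(samples)
print(averages)
```

Grouping by device before averaging gives the per-device breakdown. A single global average would hide an individual emulator that is consistently slow.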
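To illustrate why polling fits a restrictive network, here is a minimal sketch of a short-interval polling loop. It is not step's API: `wait_for_event`, its parameters and the `fetch` callable are hypothetical. In a real setup `fetch` would stand in for an outbound HTTP request to the broker, so that no inbound connection into the client's network is needed.

```python
import time
from typing import Callable, Optional


def wait_for_event(fetch: Callable[[], Optional[dict]],
                   timeout_s: float = 5.0,
                   interval_s: float = 0.05) -> Optional[dict]:
    """Call `fetch` at short intervals until it returns an event or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        event = fetch()
        if event is not None:
            return event
        if time.monotonic() >= deadline:
            return None  # nothing arrived in time
        time.sleep(interval_s)


# Fake broker for illustration: empty twice, then an event is available.
responses = iter([None, None, {"type": "NotificationReceivedEvent"}])
event = wait_for_event(lambda: next(responses), timeout_s=1.0, interval_s=0.01)
print(event)
```

The trade-off is latency versus request volume. A shorter interval detects events sooner but sends more requests through the proxy.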
Operating Android at scale
The remote management of Android emulators was done via Keywords in step. All devices could be shut down, started and initialized automatically. The management of application deployments, reboots and updates was performed using Appium and also exposed as step Keywords.
The following figure illustrates the software stack installed on hosts contributing to the Android farm.
The release of FIDO2 services was achieved within the expected deadlines. In the process, exense built and exposed a remotely-managed Android emulation farm for use in a hybrid cloud/on-premise setup. In addition to providing factual information regarding the overall performance of the system, the tests of authentication scenarios revealed several important bugs, such as message duplication in Firebase and stability issues, which would have been very difficult to identify otherwise. Since then, more comprehensive coverage of FIDO2, such as registration and authentication with fingerprint-based keys, has been implemented and tested on top of the existing solution, and new issues have been identified and analyzed.
As a software and services provider, our main takeaway from this project, besides the purely technical success in scaling Android emulation and asynchronous testing, is the benefit of using hybrid solutions in corporate environments. Many clients are currently attempting to move much of their operations to the cloud, but the road to cloud-based IT is laden with traps. We therefore believe that flexibility at each level of the technical stack remains an invaluable asset at this point. By adding cloud components onto an existing on-premise test platform, our client benefited immediately from an external technological upgrade which would have been very difficult to achieve purely on premise.