Author: Razvan Rusu
Gen AI is a very powerful tool that simplifies complex tasks in many areas, including technology. This article tries to answer the question: Can Gen AI reduce the complexity of testing in telecom?
The short answer is Yes, in multiple ways, but AI won’t do all the work for us.
Mobile telephony is an easy-to-use service with a lot of complexity behind the scenes. Making a phone call is trivial, but this simple operation involves numerous systems and dozens of messages, exchanged from the initial device authorization until the call ends.
There are a few reasons why there are so many systems and messages involved:
- Security
The communication takes place over an unsecured medium (wireless). Authorization and the setting of encryption keys must be performed before any call or data session. Encryption makes sure nobody can listen to your conversation or see your data transfer. Authorization, on the other hand, makes sure your phone can't be cloned, so that no malicious device can receive or make calls as if it were your phone.
- Standardization
The standardization for GSM is done by 3GPP (https://www.3gpp.org/about-us). The main driver for this standardization is interoperability between operators and between vendors. A Network Element (NE) that is part of the GSM core network will work the same way for an operator in the United States as for an operator in Indonesia.
This standardization has some obvious advantages (roaming, for instance, a service we couldn't live without these days), but it also has some drawbacks. The architecture was split into multiple systems (Network Elements) with clearly defined functionality and message flows. All mobile operators must use these Network Elements in the same way. None of them can decide they don't like how things are working and choose to handle calls differently, for instance by having a single system perform all the logic. Everyone must stick to what the standard specifies.
- Mobility and multiple mobile network generations (2G/3G/4G/5G) that must coexist
We can make calls on a 2G/3G connection or over a 4G/5G connection, depending on the coverage provided by the mobile operator in the area where we are located. The type of connectivity used is not within our control, and we expect consistent behavior for our calls. For instance, we expect to be informed if the called party has been ported to another mobile operator and we expect to be charged the same way regardless of the connection used for making that call.
Moreover, a call may start under 4G coverage as a VoLTE call and continue as a 3G call once 4G coverage is lost. The caller shouldn't feel this transition; for them, it is the same call. However, for the mobile operator, switching from 4G to 3G is a big change that involves multiple systems and messages.
The Challenge
Testing a mobile service is as easy as making and answering a phone call. Or so it seems.
Testing using mobile phones has a few advantages:
- It doesn’t require any special equipment/system; no investment is needed, as normal GSM phones can be used.
- It doesn’t require specialized testing personnel. Anyone can use a phone, and the complexity of the systems involved in making a call is not visible during testing.
- It provides an end-to-end testing, validating the user experience.
This testing method appears to be simple and very effective, and it has therefore been adopted by many mobile operators. It has even been automated, either with specialized equipment or by remotely controlling mobile phones, and there are many solutions available for this type of automation.
If this method is effective, automated, and end-to-end, what more could be required? Well, let’s take a closer look at what this method does not cover. First of all, it checks only the edges of the solution. Did we notify all the systems that should have been notified about that call? We can’t say because this is not part of the test.
To make a parallel with testing an online shop: testing if the Place Order function works properly is done solely on the result page seen by the user. Whether the warehouse or the invoicing system was notified about that order is not checked. This would be unacceptable for testing an online shop. So why is it acceptable for mobile operators? We’ll discuss this a bit later.
The second big drawback of this mobile phone testing method is the limitation imposed by the device used for these tests. Several types of tests can’t be executed:
- Roaming tests. The test phone is typically located in-office, within the country of the mobile operator. Therefore, all calls/events initiated from that phone will be national. As a funny side note, I was discussing this problem with the test lead of a mobile operator. She mentioned that when they need to test changes impacting roaming flows, they sometimes drive to the nearest border. It's a one-and-a-half-hour drive, and they must be close to the border at midnight, when the maintenance window starts. It's not something they like or want to do, but there's no other way they can test roaming scenarios.
- Tests using the reference/test network instead of the live network. In these cases, the device must use the testing infrastructure, which may only be available in dedicated test sites, sometimes even requiring the terminal to be isolated in a Faraday cage.
- International and premium destinations. For international calls, someone needs to answer the call at the other end, which is difficult to do when the device is not under your control. Premium numbers are expensive to call or text, so they are typically skipped in manual or automated testing.
- Long calls. If you have an offering with 2000 national minutes included, testing what happens after these minutes are depleted requires 2000 minutes of calling (just over 33 hours). This makes it impossible to run such tests nightly, since they would not finish in time for the following day's testing.
A new question arises: With all these problems, what makes this testing method so widely adopted? The answer lies in the complexity of the systems involved and the difficulty of having a test team with the required specialized technical knowledge. When running acceptance tests for Network Elements, mobile operators rely on the supplier of that NE. The supplier’s engineers possess the deep technical knowledge, and the mobile operator typically only observes and validates the process, without performing any actual testing themselves.
At the same time, mobile operators focus on testing new functionalities, such as a new voice plan, or a new data offering (e.g. free access to Instagram and TikTok). Regression testing is only seen as a nice-to-have.
The Solution
There isn’t a simple solution. If one existed, it would already be in use by mobile operators. However, this doesn’t mean there is no solution. Since it’s a complex problem, the best approach is to split it: isolate the complex technical parts from the business-driven parts.
The technical parts hardly ever change in terms of the systems involved and the message flows; they must be compliant with the 3GPP standards, so there isn’t a lot of room for creativity. What changes from test to test are the attributes/parameters of the messages. If you have a parametrized module that sends the messages and validates the responses, all you need to do is call that module with the right parameter values. You don’t need to know the protocols involved or the specific messages that will be exchanged; the module handles this complexity for you. This allows the QA team to run proper and complete testing without requiring deep technical knowledge.
For instance, let’s consider the example above: a new voice plan where calls are charged differently. When a call is placed, a CAP session triggers a Diameter Ro session towards the OCS for 2G calls, while a SIP session triggers the Diameter session for VoLTE (4G) calls. If you have a module that receives as parameters the calling party (A#), the called party (B#), and the duration of the call, the QA team doesn’t need to know CAP, SIP, or Diameter, even though the test suite makes use of these protocols.
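As a rough illustration, here is a minimal sketch of what such a module's interface could look like. The `place_call` function, its parameters, and the `CallResult` type are hypothetical names used only for illustration; a real module would implement the actual CAP/SIP/Diameter exchanges behind this interface.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CallResult:
    """Outcome of a simulated call, as reported by the protocol module."""
    connected: bool
    charged_units: int         # units deducted by the OCS for this call
    protocol_trace: List[str]  # raw messages exchanged, kept for troubleshooting

def place_call(a_number: str, b_number: str, duration_seconds: int) -> CallResult:
    """Simulate a call of the given duration between A# and B#.

    In a real module this would drive the CAP + Diameter Ro flow for 2G
    calls, or the SIP + Diameter flow for VoLTE (4G) calls, and validate
    the responses against the expected message sequence. Stubbed here for
    illustration: it simply charges one unit per second.
    """
    return CallResult(connected=True, charged_units=duration_seconds,
                      protocol_trace=[])

# A test then becomes a single, readable call with business-level parameters:
result = place_call(a_number="+40721000001", b_number="+40721000002",
                    duration_seconds=300)
assert result.connected and result.charged_units == 300
```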
This separation allows the QA team to focus on testing functionality while the modules simulate and validate the flows and data exchanged over telco-specific protocols. Testing becomes a bit more complicated than making a phone call, but not significantly so. The modules need to be called with the right parameters and their output needs to be validated. This can be done by an orchestrator (for instance a Shell/Python script, sketched after the list below) that takes input text files in CSV format and outputs the results in CSV format. The CSV format has several advantages:
- It is in human-readable format
- It has a very clear structure
- It can be edited with well-known and widely used applications, like Excel, where data validation can be added to reduce the risk of human error
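As a sketch, such an orchestrator could be as simple as the script below. It assumes the hypothetical `place_call` module from the previous sketch and an input CSV with the columns `test_id,a_number,b_number,duration_seconds,expected_units`; the column layout and file names are illustrative only.

```python
import csv

# place_call is the hypothetical protocol module sketched earlier; in a real
# setup it would be imported from the telco-specific test library.

def run_suite(input_path: str, output_path: str) -> None:
    """Read test cases from a CSV file, execute them, and write a CSV report."""
    with open(input_path, newline="") as fin, open(output_path, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(
            fout, fieldnames=["test_id", "expected_units", "actual_units", "verdict"])
        writer.writeheader()
        for row in reader:
            result = place_call(a_number=row["a_number"],
                                b_number=row["b_number"],
                                duration_seconds=int(row["duration_seconds"]))
            expected = int(row["expected_units"])
            writer.writerow({
                "test_id": row["test_id"],
                "expected_units": expected,
                "actual_units": result.charged_units,
                "verdict": "PASS" if result.charged_units == expected else "FAIL",
            })

run_suite("test_cases.csv", "results.csv")
```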
Having the test data (input data and expected results) in files opens the door to automation. The test execution can be easily integrated into a CI/CD pipeline. However, there is one additional thing to consider before declaring the tests automated: the test scenarios need to be executed repeatedly and produce consistent results. They must be idempotent and repeatable to be added to an automated test suite. The steps of an idempotent test (illustrated in the sketch after this list) are:
- Setup/configure required data for the test.
- Execute the test steps.
- Validate the results.
- Delete/restore the data modified at step 1.
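As a sketch of how these four steps could be wired into an automated suite, here is a pytest-style test. The provisioning helpers are hypothetical placeholders for whatever interface configures subscriber data, and `place_call` is the hypothetical module sketched earlier.

```python
import pytest

# Hypothetical provisioning helpers: in practice these would call the
# operator's provisioning interface. Stubbed here for illustration.
def provision_subscriber(plan: str) -> str:
    return "+40721000001"

def deprovision_subscriber(msisdn: str) -> None:
    pass

@pytest.fixture
def subscriber():
    # Step 1: set up / configure the data required for the test
    msisdn = provision_subscriber(plan="NATIONAL_MINUTES_PLAN")
    yield msisdn
    # Step 4: delete / restore the data modified in step 1,
    # so the test can be re-run and always starts from the same state
    deprovision_subscriber(msisdn)

def test_national_call_deducts_seconds(subscriber):
    # Step 2: execute the test steps (place_call is the module sketched earlier)
    result = place_call(a_number=subscriber, b_number="+40721000002",
                        duration_seconds=300)
    # Step 3: validate the results
    assert result.charged_units == 300
```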
How can AI help
The success of Generative AI created a lot of hype. Enterprises are increasingly adopting Gen AI across their organizations. ChatGPT and GitHub Copilot have proven able to generate pieces of code and have become very useful tools for software developers.
Can Gen AI be used effectively in testing? Certainly it can, and there are two main areas where it can help. (Note: the use cases presented below are not theoretical; they have been successfully implemented.)
- Test case generation
This is considered the Holy Grail of Gen AI in testing: take as input a test plan, or even better the specification document, and generate the test suite. While Gen AI is not yet at this point, just as in software development, it can be used by QA engineers to develop test cases faster. The complexity isolation described above is very useful when generating test cases with AI.
Expecting Gen AI to generate the right messages, in the right order and with the right parameters according to 3GPP, is unrealistic. And even if it could, the benefit would be limited, as new business requirements don’t modify the 3GPP specifications. However, asking Gen AI to generate CSV files in a specific format from test data described in natural language is a realistic expectation. For instance, you can give Gen AI a prompt such as: “Verify that a national call of 5 minutes deducts 300 units from the NationalSeconds balance” or “A call of 2 minutes to +49123456789 should charge 0.012 EUR from the monetary balance”.
With some clever prompt engineering, Gen AI will generate CSV lines in the right format. This allows the QA team to focus on what they want to test rather than how the test is going to be conducted. Another benefit is significantly reducing the ramp-up effort required for new team members.
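Here is a minimal sketch of how such a prompt could be wired up, assuming the OpenAI Python client and the illustrative CSV column layout used earlier. The model name and prompt wording are examples only; in practice the system prompt would encode the operator's exact CSV format, numbering plan, and balance names.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You convert telecom test descriptions into CSV test cases with the columns "
    "test_id,a_number,b_number,duration_seconds,expected_units. "
    "Output only CSV lines, with no header and no commentary."
)

def description_to_csv(description: str) -> str:
    """Turn a natural-language test description into one or more CSV lines."""
    response = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model would do
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": description},
        ],
    )
    return response.choices[0].message.content

print(description_to_csv(
    "Verify that a national call of 5 minutes deducts 300 units "
    "from the NationalSeconds balance"))
```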
- Troubleshooting support
There are situations where it’s crucial to understand the specific details of what went wrong in a test case, especially during regression testing. A failing regression test most likely means something is wrong and prevents the new release from being deployed into production, so the issue must be investigated.
If the problem is related to the business logic introduced by the new release, it may be easier to identify the cause. On the other hand, issues related to telco-specific protocols used during regression testing pose greater challenges, especially when the QA team lacks deep knowledge of these protocols.
Another scenario where detailed telco understanding is crucial is when developing telco-specific modules. If the QA engineer writes a test that fails, is the failure a test problem or an application problem? The 3GPP standard and the application specifications should provide clarity in such cases. However, in practice, this isn’t always the case. Have you ever tried to read a 3GPP document? To put it mildly, it’s not the most easily readable documentation. The complexity arises because each document references another, which references another, and so on. This complexity, while justified by the technical intricacies of telco standards, can be daunting for newcomers to the field.
Besides the standards and the project/system-specific documentation, another important source of information for the QA team is the history of tickets previously reported for that project/system. Since, in the telco world, a system is used for many years (often more than 10), these tickets provide valuable information. However, the sheer volume of tickets can be overwhelming, making it difficult, if not impossible, for a QA engineer to determine if a current problem has been previously reported. As a result, new tickets are frequently created, further increasing the number of tickets and decreasing the likelihood of identifying similar or identical issues.
Gen AI proves to be very useful for this problem. All we need is to create a custom knowledge base that includes:
- Standards and protocol specifications (3GPP docs)
- Product and project documentation
- Tickets reported during the product/project lifecycle (from the ticketing system, e.g. JIRA)
This way, Gen AI can quickly provide relevant information about a particular situation, indicating which parts of the documents are applicable. This saves hours or even days of digging through standards. Identifying existing tickets similar to the current failure is also extremely valuable, as these tickets include details on how the problem was solved, which might be applicable to the current situation.
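One common way to build such a knowledge base is retrieval-augmented generation: embed the documents and tickets once, retrieve the chunks most similar to the engineer's question, and pass them to the model as context. Below is a minimal sketch, assuming the OpenAI embeddings API and an already-chunked corpus; chunking, persistent storage, and the final answer-generation step are omitted, and the corpus entries are illustrative placeholders.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of text chunks (3GPP excerpts, project docs, ticket summaries)."""
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

# Pre-chunked corpus: standards, product/project documentation, historical tickets.
corpus = [
    "Excerpt from a 3GPP charging specification ...",                        # placeholder
    "TICKET-1234: Diameter Ro update rejected mid-call after a plan change",  # placeholder
]
corpus_vectors = embed(corpus)

def top_matches(question: str, k: int = 3) -> list[str]:
    """Return the k corpus chunks most similar to the question (cosine similarity)."""
    q = embed([question])[0]
    scores = corpus_vectors @ q / (
        np.linalg.norm(corpus_vectors, axis=1) * np.linalg.norm(q))
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved chunks are then passed to the chat model as context,
# together with the QA engineer's question, to produce a grounded answer.
```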
Being able to ask questions in natural language makes the adoption of such a solution instantaneous.
Bottom Line
Even though using Gen AI in testing is not yet mainstream, it has already been proven to facilitate the testing process. Thus, I anticipate a gradual but continuous adoption of Gen AI in testing overall, and specifically in telecom testing.