The biggest problem with maintaining a good model is that it is an awful lot of work. Within an actuarial model life cycle, there are several common challenges that seem to come up again and again. These challenges can be grouped into four buckets:
- Calculation management
- Data management
- Runtime management
- Process management
But, as the saying goes, anything worth doing at all is worth doing well. So, this article provides a short overview of modern technical solutions to common problems that arise in each of these four areas.

Calculation Management

Calculation management is the process of managing what a model does—which mathematical calculations are performed, in what order and at what level of detail. The management of these calculations is often a major focus of model governance practices, to ensure calculations do what they are supposed to and are used appropriately. This is a key part of the new Actuarial Standard of Practice (ASOP) No. 56 on modeling, which goes into effect in October 2020.
The management of a model’s calculations can take very different practical forms depending on the underlying modeling platform. For a closed-box vendor platform, management may focus on the specific configuration and the version of the vendor platform in use; there is often an intrinsic link between vendor development cycles and model management cycles. On the opposite end of the spectrum are home-grown models, where the calculations are developed and coded in-house (this includes self-built models contained in Excel workbooks). In these situations, the calculation management cycle is more likely to be driven by business needs.
In both of these situations (closed-box and home-grown), the management of changes is critical for ensuring model updates are made appropriately.
Version-control systems, also known as revision control, track changes to files and documents. These systems provide workflows for the management of changes being made to files and documents, and they allow for the tracking of historical versions, as well.
Some commonly used examples include Amazon Web Services’ (AWS’) CodeCommit, Microsoft’s Team Foundation Server, Git and Subversion—just to name a few. When properly used, they provide a complete trail of all changes made to a software system or set of files through the use of a common shared repository. These systems frequently are used in the software development world. However, they are not applied as often to modeling applications.
Integrating a version-control system with existing models can be difficult, depending on how models are organized and stored. Since these systems rely on files and file structures to manage changes, some types of files—most notably binary data files—are harder to maintain because the version-control system doesn’t necessarily understand how to translate the binary data into meaningful business implications. The systems can still easily track different versions, but they may not provide meaningful insight into the nature of the changes.
This is also true for Excel workbooks, which are, by default, stored in either a binary or compressed format. Several companies now provide Excel-specific version control and tracking with similar capabilities to record different user revisions to formulas, data and workbook designs.
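The core idea behind all of these systems is that each version of a file is identified by its content, so any change—even to a binary file—is detectable. As a minimal, hypothetical sketch (not any particular vendor's product), the snippet below records content hashes of model files in a simple JSON manifest; a real version-control system adds workflows, diffs and shared repositories on top of this idea:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def snapshot(paths, manifest_path="manifest.json"):
    """Record a content hash for each tracked model file, appending a
    timestamped entry to a JSON manifest. Identical content always yields
    the same hash, so any change to a tracked file shows up as a new hash."""
    entries = []
    for p in map(Path, paths):
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        entries.append({"file": p.name, "sha256": digest})
    record = {"taken": datetime.now(timezone.utc).isoformat(), "files": entries}
    mp = Path(manifest_path)
    history = json.loads(mp.read_text()) if mp.exists() else []
    history.append(record)
    mp.write_text(json.dumps(history, indent=2))
    return record
```

Because the hash is computed over raw bytes, this works equally well for binary files and Excel workbooks—it just cannot say *what* changed, which is exactly the limitation described above.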
Data Management

Data management is the process of managing the data that passes through the model’s calculations. It includes inputs to the model, assumptions used by the model, tables and other fixed external data sources, as well as the model results.
In many types of models, managing data also can be handled through version-control systems to track changes and versions of various assumptions. However, an additional layer of complexity comes in when there are multiple sets of assumptions that feed into the same inputs within a model. For example, you may have a best-estimate mortality assumption that is used for planning or pricing purposes, but you may have a different mortality assumption for valuation that includes conservatism. Some modeling systems might require these assumptions to be stored in the same relative file location of the base model, and this can cause problems when multiple versions of assumptions are needed. Good management of the model assumptions is key.
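One common pattern is to key each assumption set by the purpose of the model run, so that the same model input (here, an invented mortality scalar) resolves to a different value depending on context. The set names and values below are hypothetical:

```python
# Hypothetical assumption sets: the same model inputs, with different
# values depending on the purpose of the run.
ASSUMPTION_SETS = {
    "pricing":   {"mortality_scalar": 1.00, "lapse_scalar": 1.00},  # best estimate
    "valuation": {"mortality_scalar": 1.10, "lapse_scalar": 0.90},  # includes conservatism
}

def load_assumptions(purpose):
    """Return the assumption set for a given run purpose, failing loudly
    if an unknown purpose is requested."""
    try:
        return ASSUMPTION_SETS[purpose]
    except KeyError:
        raise ValueError(f"No assumption set defined for purpose '{purpose}'")
```

Selecting assumptions explicitly by purpose, rather than by file location, avoids the collisions that arise when multiple versions must share one place in the base model.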
There also can be interdependencies among different assumption inputs that need to be managed as sets of data. For example, a complex dynamic lapse formula on an account value-based product may contain different sets of inputs for base-lapse vs. dynamic-lapse functions. But the full-lapse assumptions from a business perspective consist of a combination of all of these various inputs working in coordination with one another. This can make it more difficult to manage specific assumptions using just a version-control system.
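To make the interdependency concrete, here is a deliberately simplified, hypothetical dynamic lapse formula: the business-level lapse assumption only exists as the combination of a base lapse table and a dynamic adjustment, so the two inputs must be versioned and released together as one set. The table values and the multiplier form are illustrative, not an actual assumption:

```python
# Illustrative only: base lapse rates by policy year.
BASE_LAPSE = {1: 0.10, 2: 0.08, 3: 0.06}

def dynamic_multiplier(account_value, guaranteed_value, floor=0.25, cap=2.0):
    """Illustrative dynamic adjustment: policyholders lapse less when the
    guarantee is valuable (account value below guaranteed value) and more
    when it is not, bounded by a floor and a cap."""
    moneyness = account_value / guaranteed_value
    return min(cap, max(floor, moneyness))

def lapse_rate(policy_year, account_value, guaranteed_value):
    """The full business assumption: base table and dynamic function combined."""
    return BASE_LAPSE[policy_year] * dynamic_multiplier(account_value, guaranteed_value)
```

Changing the base table without re-validating the multiplier (or vice versa) silently changes the business assumption, which is why these inputs are best managed as a coordinated set rather than as independent files.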
Runtime Management

Generally speaking, most people want models that:
- Produce accurate results.
- Use precise methods for all features.
- Run quickly.
But often we find ourselves forced to pick just two out of these three characteristics. As a model starts to become complex enough, runtime will almost inevitably become an issue. As actuaries, we have a tendency to seek greater amounts of information with ever-increasing layers of complexity and detail. This leads to longer-running models.
There are many different methods that can help with runtime, and they largely fall into three categories:
- Reduce model complexity.
- Reduce model data.
- Increase computational capacity.
Reducing model complexity means cutting the number of calculations that must be performed for each model segment, thus reducing the strain on hardware. Reducing model data covers various model-reduction techniques, including clustering, model segmentation and reduced projection periods. Increasing computational capacity includes solutions such as purchasing newer and more robust hardware, distributed processing and cloud computing. I will only address the last of these.
Over the past few years, computer processors have not been getting much faster. For many years, advancements in computing power were driven by technology improvements that allowed the electronic circuits on microchips to be made ever smaller, letting the number of transistors on a chip double roughly every two years—an observation known as Moore’s Law. But today’s microchip designs are approaching the physical limits of what is possible. Instead, newer chips include more cores, which can process data in parallel to continue increasing computational power. This is why distributed processing methods have become more prevalent.
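As a small illustration of the model-reduction idea, the sketch below (a hypothetical segmentation, not a vendor's clustering algorithm) compresses seriatim policy records into model points by product and banded issue age, trading some precision for fewer calculation passes:

```python
from collections import defaultdict

def compress(policies, age_band=5):
    """Group seriatim records into model points by product and banded
    issue age, summing face amounts. Fewer model points means fewer
    projection passes, at the cost of some precision."""
    cells = defaultdict(float)
    for p in policies:
        band = (p["issue_age"] // age_band) * age_band
        cells[(p["product"], band)] += p["face_amount"]
    return [{"product": product, "age_band": band, "face_amount": face}
            for (product, band), face in sorted(cells.items())]
```

Real-world clustering techniques are considerably more sophisticated—matching on multiple risk drivers and calibrating the compressed model back to seriatim results—but the trade-off is the same.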
Distributed processing provides mechanisms for spreading workloads across multiple sets of hardware so that calculations run in parallel. It comes in several forms of increasing complexity. The simplest—supported by most modern laptop and desktop computers—is multithreading, which splits a workload into multiple threads of execution that run in parallel across the cores of a single processor.
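The pattern looks like this in Python's standard library: a pool of workers, each handed one independent slice of the workload. The `project_segment` function is a hypothetical stand-in for a real projection:

```python
from concurrent.futures import ThreadPoolExecutor

def project_segment(segment_id):
    """Stand-in for projecting one model segment; a real model would run
    its cash-flow calculations here."""
    return segment_id, sum(i * i for i in range(1_000))

def run_all(segment_ids, workers=4):
    # Split the segments across a pool of workers. (For CPU-bound Python
    # code, ProcessPoolExecutor sidesteps the interpreter lock and uses
    # multiple cores; the pattern is identical.)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(project_segment, segment_ids))
```

Grid and cloud computing extend the same idea—independent slices of work dispatched to a pool—across many machines instead of many cores.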
Grid computing takes this to the next level by splitting workloads across multiple computers connected via a closed network. These grids can be scalable, but because they are physically connected within a single company, the maintenance can be costly. Some grids also require special connections to manage the data movement among them, which can lead to complexity.
Cloud computing answers some of the problems with distributed processing. Cloud providers have a vast pool of resources available for computation, and they have reached economies of scale with their hardware that even the largest companies find hard to match.
Cloud technologies also have evolved over the years. At first, cloud servers were nothing more than an outsourced server space for hosting applications. They were not much different from owning your own computer, other than the fact that they were stored in someone else’s server room.
Virtualization has allowed for a more flexible solution. Virtual machines (VMs) are an abstraction of the hardware in a given computer. Multiple VMs can be set up on a single computer and share the same underlying hardware, providing the separation of multiple independent servers without each needing dedicated hardware. This allows large, high-capacity servers to provide resources to many guest machines simultaneously, driving higher utilization of the total available hardware.
The downside to a VM is there is a lot of redundancy in the setup. Each VM contains its own operating system, system files and drivers to interact with the hardware, which isn’t necessary on a single shared machine. Containers have solved this problem. They act similarly to VMs, but they don’t use that redundant overhead. A container can be set up so that it contains only the base set of code and specific required dependencies, which makes it much smaller. This allows for more operations to fit into a single set of hardware. One of the most common container systems used today is called Docker.
There are many container management systems that allow for management and deployment of containers across a large pool or grid of servers for dynamic management of resources. This includes systems like AWS Elastic Container Service (ECS) and, more recently, Kubernetes (“koo-burr-NET-eez”). These systems orchestrate a large number of smaller processes or resources and can effectively manage the distribution of this workload across the servers, regardless of the type of workload each individual container needs to perform.
Now, cloud providers also provide what they call “serverless offerings.” This is, of course, a misnomer, as servers are still processing the underlying applications. The difference is in the additional abstraction from the user of which servers are being used. Some common examples include Azure Functions, Google Cloud Functions and AWS Lambda. In these systems, code is executed against large pools of available hardware, and there is no explicit setup to define a server, VM, container or other aspects of the underlying infrastructure. Instead, the cloud provider handles all of the details of managing it.
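As a minimal sketch, a serverless function is just a handler that receives an event and returns a result—the shape AWS Lambda expects for Python code. The pricing logic and event fields here are hypothetical; the point is that nothing in the code references a server, VM or container:

```python
# Hypothetical serverless handler: one scenario valued per invocation.
# The cloud provider decides where (and on how many machines) the
# invocations actually run.
def handler(event, context=None):
    """Discount a hypothetical stream of cash flows at a flat rate and
    return the present value for this scenario."""
    cash_flows = event["cash_flows"]
    rate = event["discount_rate"]
    pv = sum(cf / (1 + rate) ** (t + 1) for t, cf in enumerate(cash_flows))
    return {"scenario_id": event["scenario_id"], "present_value": round(pv, 2)}
```

Running 1,000 stochastic scenarios then becomes 1,000 independent invocations, scaled out by the provider with no capacity planning on the modeler's part.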
Process Management

Models are used for many purposes within organizations. They can have many upstream and downstream effects, depending on how data flows into, through and out of them. As a result, a large number of steps may be required to support the modeling process.
Robotic process automation (RPA) is pretty much exactly what the name says. Robots (of the software-only kind) perform automated processes for you. If you have ever recorded and run a macro in Excel, you’ve built a very basic form of RPA: a clearly defined, repeatable process where the computer can perform a series of steps you’ve defined, often much faster than you could do on your own.
Modern RPA takes this a step further. The biggest problem with an Excel macro is that it lives inside one particular Excel file and has limited reach for what it can do, especially if you require steps outside of Excel. Modern RPA systems are installed on centralized systems, can be shared across multiple user processes and are software-agnostic, meaning they can run automated processes in your email system, customer relationship management (CRM) platform, Excel or your home-grown custom administration system equally well. Integrating an Excel macro that could do all of that would be quite a feat.
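A scripted example of the kind of clearly defined, repeatable step these tools automate—consolidating per-run model output files into one combined file—might look like the following (a hypothetical folder layout and file-naming convention, not any RPA product's syntax):

```python
import csv
from pathlib import Path

def consolidate_outputs(folder, combined_name="combined.csv"):
    """A macro-like automated step: gather every per-run output CSV in a
    folder into one combined file. An RPA tool would chain steps like this
    across many systems; here the 'process' is a single, well-defined task."""
    rows, header = [], None
    for f in sorted(Path(folder).glob("run_*.csv")):
        with f.open(newline="") as fh:
            reader = csv.reader(fh)
            file_header = next(reader)
            header = header or file_header  # assume consistent headers
            rows.extend(reader)
    if header is None:
        raise FileNotFoundError(f"no run_*.csv files found in {folder}")
    out_path = Path(folder) / combined_name
    with out_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return out_path
```

An RPA platform adds what a script alone lacks: centralized scheduling, credentials, audit logging and the ability to drive applications that expose no programming interface at all.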
Currently, RPA systems generally are limited to handling clearly defined, manual, repeatable processes. Most have not yet matured to the point where they can take on complex decision-making. But advances in machine learning (ML) and artificial intelligence (AI) are moving them in that direction. Perhaps, sometime in the future, more complex tasks also will be delegated to the robots!
Results and Analysis
Understanding what comes out of your models is important—really important. In fact, it’s most likely the reason why the model exists in the first place. However, getting to the truly useful information can sometimes be difficult.
Business intelligence tools can give insights into data by providing query and visualization tools that are easy to use and simple for end users to manipulate. Behind the scenes, these tools require some setup to connect to all of the data sources. But once done, and the data are loaded and available, they can be powerful tools for diving into the data, discovering important results, and conveying those results to others quickly and easily.
Multiple tools are available for business intelligence and data visualization. Some commonly used ones include Tableau and Microsoft’s Power BI, and various visualization libraries are available to R and Python programmers. Using data visualization tools for analytics can provide more powerful insight into results.
These are just a few of the technologies companies already use to address common problems in managing and running models. Incorporating some of them into your modeling processes can make model management, execution and analysis more efficient.
Copyright © 2020 by the Society of Actuaries, Schaumburg, Illinois.