Predictive analytics
Encompassing a wide variety of techniques, such as statistics, modelling, optimization, clustering and market research, predictive analytics helps businesses and organizations forecast unknown future events.
Actuaries in predictive analytics use data science tools and techniques to describe, predict and recommend courses of action that take into account consumer, provider and distributor behaviours. These highly skilled actuaries are responsible for:
- building tools for insurance underwriters and claim reviewers;
- calculating and setting up insurance premium categories;
- ensuring insurance companies are able to deliver on their promises to consumers; and
- nurturing Canadians’ confidence in financial institutions by ensuring these firms are not taking unnecessary risks with individuals’ savings and deposits.
Ultimately, actuaries working in predictive analytics help a variety of businesses better serve their customers by identifying opportunities and anticipating problems before they arise.
Predictive modelling resources
Introductory data science and actuarial resources
A Course in Machine Learning (free course)
Hal Daumé III
Structured like a textbook, this free course covers the mathematical foundations of modern machine learning in detail. The content is fairly technical and offers only limited discussion of how the methods are applied, but it remains an important overview that practitioners can reference when using machine learning software packages.
CAS Monograph #5: Generalized Linear Models for Insurance Rating, 2nd Edition (free document)
Mark Goldburd, Anand Khare, Dan Tevet and Dmitriy Guller, 2020
This monograph, best suited to a more technical actuarial audience, covers the mathematical basis of generalized linear models (GLMs) and their application to ratemaking. Section 1 provides an overview of how GLMs work from a technical standpoint, which will be of particular interest to stakeholders wishing to get a handle on what GLMs are and how to use them. The sections that follow go into detail about model building, validation and refinement, and are an invaluable resource for practitioners.
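To make the ratemaking idea concrete, here is a minimal sketch (our own, not code from the monograph) of fitting a Poisson claim-frequency GLM with a log link and a log-exposure offset by iteratively reweighted least squares (IRLS). The rating cells, the single "urban" indicator and the function names are hypothetical; in practice you would use a GLM routine such as R's glm() or Python's statsmodels rather than hand-rolled code.

```python
import math

def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_poisson_glm(X, claims, exposure, max_iter=50, tol=1e-10):
    """Fit log E[claims] = log(exposure) + X beta by iteratively reweighted least squares."""
    n, p = len(X), len(X[0])
    offset = [math.log(e) for e in exposure]
    beta = [0.0] * p
    for _ in range(max_iter):
        eta = [offset[i] + sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        mu = [math.exp(v) for v in eta]
        # Log link: IRLS weights are mu; working response is (eta - offset) + (y - mu) / mu.
        z = [eta[i] - offset[i] + (claims[i] - mu[i]) / mu[i] for i in range(n)]
        xtwx = [[sum(X[i][j] * mu[i] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(X[i][j] * mu[i] * z[i] for i in range(n)) for j in range(p)]
        new_beta = solve(xtwx, xtwz)
        done = max(abs(nb - b) for nb, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if done:
            break
    return beta

# Hypothetical rating cells: columns are an intercept and a 0/1 "urban" indicator.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
claims = [10, 14, 30, 26]
exposure = [100.0, 140.0, 150.0, 130.0]  # policy-years

beta = fit_poisson_glm(X, claims, exposure)
base_frequency = math.exp(beta[0])    # claims per policy-year for non-urban risks
urban_relativity = math.exp(beta[1])  # multiplicative effect of the urban indicator
print(base_frequency, urban_relativity)
```

With a single binary factor the fit simply reproduces the observed frequencies (24/240 = 0.10 and 56/280 = 0.20), so the relativity comes out at about 2.0; the value of a GLM shows up with many rating factors, where each relativity is adjusted for the others.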
mlcourse.ai (free course)
Yury Kashnitsky, 2022
With lessons, lectures and problems for each topic covered, this machine learning course is a light-hearted and free introduction to data exploration and applications of machine learning, with an emphasis on Python code.
Predictive Modeling Applications in Actuarial Science, Volume 1: Predictive Modeling Techniques and Volume 2: Case Studies in Insurance (paid documents)
Cambridge University Press, 2014 (Volume 1) and 2016 (Volume 2)
Highly detailed yet eminently readable, this two-volume textbook is intended for actuaries and other financial analysts looking to develop their predictive modelling expertise and knowledge of advanced statistical techniques that are particularly relevant to insurance. The first volume provides a detailed look at many predictive modelling techniques, specifically applied to actuarial problems, while the second goes over case studies of techniques applied to actual insurance data.
Kaggle (free machine learning and data science community)
Google
Part social network, part tutoring centre and sample data repository, Kaggle is the premier watering hole for data science practitioners. The online interface, which includes public datasets and code snippets as well as free micro-courses, allows learners to start applying cutting-edge techniques quickly. More advanced users may also enjoy participating in the online competitions.
Supervised Machine Learning: Regression and Classification (free course)
Andrew Ng, 2022
This Coursera course provides an entry-level overview of basic Python coding, fundamental math concepts for data science and a few introductory supervised machine learning concepts. It’s a great place for beginners to start learning data science in Python.
Basics of programming
R provides many packages for statistical learning. Often one’s first exposure to this language is a university statistics course that teaches the basics of linear models and testing for statistical significance. R is very popular amongst academics and is a great place to start building explainable predictive models.
R for Data Science (free textbook)
Hadley Wickham and Garrett Grolemund, 2017
Written by the creators and maintainers of some of the most popular R packages, including tidyverse and tidymodels, R4DS offers a solid foundation in how to do data science with R. You’ll learn how to import your data into R, tidy it into a useful structure, transform it, visualize it and model it, all while practicing as you go with numerous code examples and exercises.
R Basics – R Programming Language Introduction (free course)
For those who learn better through videos, this approachable course offers a series of introductory video tutorials on the basics of programming in R. That said, it does lack certain “newer” R practices that are covered in more detail in the R for Data Science textbook.
Python is now one of the most widely used programming languages for building predictive models, including within the actuarial community. To use Python for predictive modelling (PM), you’ll need to be familiar with both Python coding basics and the packages and libraries commonly used for data science.
LearnPython.org (free course)
Ron Reiter, 2022
These interactive tutorials are a convenient starting point for beginners, guiding students from Python basics to more advanced topics with instant feedback on exercises. No installation is required.
Python Data Science Handbook (free document)
Jake VanderPlas, 2016
This book is a great introduction to some of the most popular Python packages and libraries for data science, including pandas, NumPy, scikit-learn and Matplotlib. These powerful foundational libraries support different aspects of predictive modelling, from data cleaning to visualization to model building.
Ethics, fairness and bias
Weapons of Math Destruction (paid book)
Cathy O’Neil, 2016
Written by mathematician, data scientist and former Wall Street quant Cathy O’Neil, Weapons of Math Destruction is an introduction to the potential dark side of big data. O’Neil exposes how the algorithms that increasingly govern our economic and personal lives can result in harmful outcomes, including exacerbated social inequality, when left unregulated and unchecked.
Fairness and Algorithmic Decision Making (free document)
Aaron Fraenkel, 2020
Based on lecture notes from the author’s course on fairness and algorithmic decision making, this resource is aimed at the data science practitioner. It takes a holistic approach to discussing how data-driven systems interact with the populations they affect, providing practical approaches to identifying inequity in decision-making systems (e.g., parity measures) while probing the limits of these approaches.
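As a small illustration of a parity measure (the choice of demographic parity and the data below are our own assumptions, not taken from Fraenkel's notes), this sketch computes the rate of favourable decisions within each group and the largest gap between groups.

```python
from collections import defaultdict

def favourable_rates(decisions, groups):
    """Share of favourable (1) decisions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_gap(decisions, groups):
    """Largest difference in favourable-decision rates across groups (0 means parity)."""
    rates = favourable_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical approval decisions (1 = approved) for applicants in groups "A" and "B".
decisions = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

rates = favourable_rates(decisions, groups)
gap = demographic_parity_gap(decisions, groups)
print(rates, round(gap, 2))
```

A gap of zero says nothing about predictive accuracy or about legitimate differences between groups, which is why parity measures are usually read alongside other metrics.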
Fairness in Algorithmic Decision-Making (free document)
Mark MacCarthy, 2019
This report from the Brookings Institution examines the ways in which automated decision systems can exacerbate protected-class disparities, despite their promise to improve the accuracy and fairness of eligibility determination for various private- and public-sector benefits. A key recommendation of the paper is that companies in every sector must focus on the fairness of the algorithms they use, proactively measuring the extent to which their organizational systems create disparate impacts and fostering a tradition of disclosure and ongoing assessment.
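One simple way to measure disparate impact is the ratio of favourable-outcome rates between a protected group and a reference group. In the sketch below, the 0.8 screening threshold is the "four-fifths" rule of thumb from US employment-selection guidance, assumed here for illustration rather than drawn from the Brookings report; the counts are hypothetical.

```python
def disparate_impact_ratio(favourable_protected, total_protected,
                           favourable_reference, total_reference):
    """Ratio of the protected group's favourable-outcome rate to the reference group's."""
    protected_rate = favourable_protected / total_protected
    reference_rate = favourable_reference / total_reference
    return protected_rate / reference_rate

FOUR_FIFTHS = 0.8  # assumed screening threshold (US "four-fifths" rule of thumb)

# Hypothetical counts: 45 of 100 protected-group applicants approved vs 60 of 100 others.
ratio = disparate_impact_ratio(45, 100, 60, 100)
flagged = ratio < FOUR_FIFTHS
print(f"ratio = {ratio:.2f}, flagged = {flagged}")
```

A ratio below the threshold is a prompt for further investigation, not proof of unfairness on its own.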
IFoA Ethical and Professional Guidance on Data Science (free document)
IFoA, 2021
The Institute and Faculty of Actuaries (IFoA), in collaboration with the Royal Statistical Society (RSS), has produced non-mandatory ethical and professional guidance on data science for IFoA and RSS members and data science practitioners in general. The report contains a number of illustrations and case studies to help members who may be faced with ethical or professional issues when carrying out data science–related work.
Avoiding Unfair Bias in Insurance Applications of AI Models (free document)
SOA Research Institute, 2022
As artificial intelligence adoption in the insurance industry increases, so does the potential for unfair bias (defined as unexplained adverse outcomes for already vulnerable populations) in AI algorithms used in underwriting, pricing and claims processes. This research paper by the Society of Actuaries (SOA) Research Institute identifies methods to avoid or mitigate unfair bias unintentionally caused or exacerbated by the use of AI models and proposes a framework and mitigation strategies for insurance carriers to consider when looking to identify and reduce such bias.
Trustworthy AI: A Computational Perspective (free document)
Cornell University, 2021
Developing trustworthy AI requires careful consideration of how to avoid the unintended harm that automated decision-making can cause. This paper presents a comprehensive survey of trustworthy AI from a computational perspective, including the latest technologies for building safe, fair and reliable systems, with a focus on six crucial dimensions of trustworthy AI: safety and robustness; non-discrimination and fairness; explainability; privacy; accountability and auditability; and environmental well-being.
EIOPA Report on Artificial Intelligence Governance Principles (free document)
European Insurance and Occupational Pensions Authority (EIOPA), 2021
In response to the growing use of AI in insurance, the European Insurance and Occupational Pensions Authority convened an expert consultative group to identify opportunities and risks associated with digitization, including exploring possible limits to automation. This report presents their findings, setting out AI governance principles for the ethical use of AI in the European insurance sector along with additional guidance for insurance firms on how to implement them in practice.
AI and Society
The Canadian Institute for Advanced Research (CIFAR)
Canada-based global research organization CIFAR works to strengthen Canada’s technical and responsible leadership in AI through its partnership with the Government of Canada in the Pan-Canadian Artificial Intelligence Strategy. Its AI and Society program facilitates cross-sectoral discussion of the ethical, legal, political and social implications of AI’s expanding role in society, with insights published regularly in reports.
Practitioner data science and actuarial resources
XGBoost Documentation (free)
Available for both Python and R, and a favourite in machine learning competitions, XGBoost is an open-source software library that implements supervised learning algorithms under the gradient boosting machine (GBM) framework.
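The core GBM idea can be sketched in a few lines: start from a constant prediction, then repeatedly fit a weak learner to the current residuals (the negative gradient of squared-error loss) and add a shrunken copy of it to the ensemble. This sketch uses single-feature decision stumps and a learning rate of 0.1, both assumptions chosen for brevity; it is not XGBoost's implementation, which adds regularized objectives, second-order gradient information and many engineering optimizations.

```python
def fit_stump(x, residuals):
    """Least-squares two-leaf decision stump on a single feature."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # splitting at the maximum would leave an empty leaf
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda v: left_mean if v <= threshold else right_mean

def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Squared-error gradient boosting with stumps; returns a prediction function."""
    base = sum(y) / len(y)  # initial constant model
    stumps, pred = [], [base] * len(y)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual y - prediction.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return lambda v: base + learning_rate * sum(s(v) for s in stumps)

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]  # hypothetical noisy step pattern
model = gradient_boost(x, y)
print(round(model(2), 3), round(model(7), 3))
```

The small learning rate trades more rounds for smoother, less overfit models, a trade-off that libraries such as XGBoost expose as a tuning parameter.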
statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration. An extensive list of result statistics is available for each estimator.
scikit-learn is a free machine learning library for Python, built on NumPy, SciPy and Matplotlib, that offers simple and efficient tools for predictive data analysis. It aims to make machine learning accessible to everybody and reusable in various contexts.
PyTorch is an open source, end-to-end machine learning framework for building and training deep learning models, designed to accelerate the path from research prototyping to production deployment.
Hugging Face is a data science community and platform providing tools that enable users to build, train and deploy state-of-the-art machine learning models using its open-source libraries, particularly the Transformers library. The community aspect of the platform allows users to benefit from the experience of other practitioners.
The tidyverse is a collection of R packages designed for data science, all sharing an underlying design philosophy, grammar and data structure. It’s the industry standard for managing, transforming and visualizing data in the R environment. Core packages include dplyr, ggplot2, tidyr, readr, purrr, tibble, stringr and forcats.
The tidymodels framework is a collection of R packages for modelling, statistical analysis and machine learning using tidyverse design principles. Other packages with similar objectives include caret and mlr.
ChainLadder is an R package providing models for insurance claims reserving based on the chain-ladder method, which is most commonly used in P&C and health insurance. A Python counterpart, chainladder, offers many of the same popular actuarial tools.
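For readers new to the method, the basic volume-weighted chain-ladder calculation that these packages automate can be sketched in plain Python: estimate age-to-age development factors from a cumulative claims triangle, project each accident year to ultimate and take the difference from the latest observed value as the reserve. The triangle is made up, and the sketch omits tail factors and the stochastic extensions (such as Mack's model) that the packages provide.

```python
def development_factors(triangle):
    """Volume-weighted age-to-age factors from a cumulative triangle (rows = accident years)."""
    n_dev = max(len(row) for row in triangle)
    factors = []
    for j in range(n_dev - 1):
        # Only accident years observed at both development ages j and j + 1 contribute.
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(row[j + 1] for row in rows) / sum(row[j] for row in rows))
    return factors

def chain_ladder_reserves(triangle):
    """Project each accident year to ultimate; return (ultimates, reserves)."""
    factors = development_factors(triangle)
    ultimates, reserves = [], []
    for row in triangle:
        ultimate = row[-1]
        for f in factors[len(row) - 1:]:  # apply the factors not yet observed for this year
            ultimate *= f
        ultimates.append(ultimate)
        reserves.append(ultimate - row[-1])
    return ultimates, reserves

# Hypothetical cumulative paid claims: 3 accident years x 3 development years.
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 170.0],
    [120.0],
]
ultimates, reserves = chain_ladder_reserves(triangle)
print([round(u, 1) for u in ultimates], round(sum(reserves), 1))
```

Here the factors are 320/210 and 165/150, so the most recent accident year is developed by both while the oldest is treated as fully developed.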
insurancerating is a helpful R package for P&C pricing work, designed to help actuaries implement generalized linear models (GLMs) within the steps required to construct a risk premium from raw data.
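In the common frequency-severity approach, a risk (pure) premium is expected claim frequency times expected claim severity, and with log-link GLMs each rating factor's fitted relativity (the exponentiated coefficient) multiplies a base level. The sketch shows only that final arithmetic; the base values and relativities are invented, and the function is hypothetical rather than part of insurancerating's API.

```python
import math

def risk_premium(base_frequency, base_severity, frequency_relativities, severity_relativities):
    """Pure premium = (base frequency x frequency relativities) x (base severity x severity relativities)."""
    frequency = base_frequency * math.prod(frequency_relativities)
    severity = base_severity * math.prod(severity_relativities)
    return frequency * severity

# Hypothetical policy with two rating factors (e.g. territory and age band).
premium = risk_premium(
    base_frequency=0.08,                  # claims per policy-year at base levels
    base_severity=5000.0,                 # average cost per claim at base levels
    frequency_relativities=[1.25, 1.10],  # exp(GLM coefficient) for each factor
    severity_relativities=[1.05, 0.95],
)
print(round(premium, 2))
```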
Designed to add actuarial science functionality and support for heavy-tailed distributions, actuar provides a wide range of probability distributions not available in base R, most notably those listed in Appendix A of Loss Models: From Data to Decisions by Stuart Klugman, Harry Panjer and Gordon Willmot.
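As an example of such a heavy-tailed severity distribution, the two-parameter Pareto in Loss Models has survival function S(x) = (θ/(x + θ))^α and, for α ≠ 1, limited expected value E[X ∧ u] = θ/(α - 1) · [1 - (θ/(u + θ))^(α - 1)]. actuar provides R functions for these quantities (for example ppareto and levpareto); the Python sketch below simply codes the formulas and checks the limited expected value against numerical integration of the survival function. The parameter values are arbitrary.

```python
def pareto_survival(x, alpha, theta):
    """Survival function S(x) = (theta / (x + theta)) ** alpha of the two-parameter Pareto."""
    return (theta / (x + theta)) ** alpha

def pareto_lev(u, alpha, theta):
    """Limited expected value E[min(X, u)], valid for alpha != 1."""
    return theta / (alpha - 1) * (1 - (theta / (u + theta)) ** (alpha - 1))

def lev_by_integration(u, alpha, theta, steps=100_000):
    """E[min(X, u)] as the integral of S(x) from 0 to u, using the trapezoid rule."""
    h = u / steps
    total = 0.5 * (pareto_survival(0.0, alpha, theta) + pareto_survival(u, alpha, theta))
    total += sum(pareto_survival(i * h, alpha, theta) for i in range(1, steps))
    return total * h

alpha, theta = 2.5, 1000.0  # arbitrary illustrative parameters
limit = 5000.0
closed_form = pareto_lev(limit, alpha, theta)
numeric = lev_by_integration(limit, alpha, theta)
mean = theta / (alpha - 1)  # E[X], finite only when alpha > 1
print(round(closed_form, 2), round(numeric, 2), round(mean, 2))
```

Limited expected values like this underpin pricing of policy limits and excess layers, where the heavy tail drives much of the cost.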
Canadian actuarial resources
Big Data and Risk Classification – Understanding the Actuarial and Social Issues
Canadian Institute of Actuaries, July 2022
The Use of Predictive Analytics in the Canadian Life Insurance Industry
Canadian Institute of Actuaries, May 2019
The Use of Predictive Analytics in the Canadian Property and Casualty Insurance Industry
Canadian Institute of Actuaries, June 2022
Resources from other actuarial organizations and industry bodies
Actuaries Institute (AI)
The Actuaries Institute has recognized data analytics as a practice area for members and included a data science requirement in its qualification program through its Data Analytics Principles module. For continuing education, the Institute sponsors a Data Science Applications microcredential designed by and for actuaries. See the Institute’s data science page, Actuaries and Data Science, for more resources or to sign up for its data science e-newsletter.
Actuview is the first international streaming platform designed specifically for actuaries. Sponsored by the Actuarial Association of Europe (AAE) and corporate partners, it features live broadcasts of congresses and colloquia as well as online sessions from actuarial associations, universities, companies, partner institutions and individual experts from around the world, including a good number of data science presentations. Membership is free for members of the International Actuarial Association, AAE and other sponsoring organizations.
American Academy of Actuaries (AAA)
The American Academy of Actuaries’ Data Science and Analytics Committee (DSAC), the successor of the Big Data Task Force, was established to advance the actuarial profession’s involvement in big data and machine learning and to inform public policy decision-making on the use of advanced analytics technologies. The DSAC website houses its archive of valuable publications on actuarial and ethical uses of advanced analytics.
Other major AAA publications in the area of data science include:
Big Data and the Role of the Actuary (free document)
American Academy of Actuaries Big Data Task Force, 2018
An Actuarial View of Correlation and Causation (free document)
American Academy of Actuaries, July 2022
CAS Institute (iCAS)
A subsidiary of the Casualty Actuarial Society (CAS), the CAS Institute, or iCAS, offers innovative credentials and specialized professional education for quantitative professionals, including the Certified Specialist in Predictive Analytics (CSPA) credential.
Other CAS activities in data and analytics include its regular co-sponsorship (with the SOA and CIA) of the annual Predictive Analytics Seminar and its co-sponsorship (with the CIA) of the two-volume Predictive Modeling Applications in Actuarial Science textbook series.
Institute and Faculty of Actuaries (IFoA)
The UK’s Institute and Faculty of Actuaries is active in data science and analytics on both the education and research fronts, sponsoring the Certificate in Data Science and, in 2018, establishing the Data Science Working Party to research and develop data science techniques for actuarial applications. The working party’s Research Section is open to non-IFoA members who’d like to be involved in research case studies. Other helpful data science links and guidance can be found on the IFoA’s Data Science practice page.
International Actuarial Association (IAA)
In 2020, the International Actuarial Association established the Big Data Task Force with the mandate of facilitating “discussion and knowledge-sharing among Full Member Associations on issues of international relevance for actuaries working with Big Data.” Though the time-limited task force was disbanded in May of 2021 (and succeeded by the Data and Analytics Virtual Forum), recordings of presentations are available on its web page.
Society of Actuaries (SOA)
The Society of Actuaries is very active in both education and research on predictive analytics and AI. As an actuarial qualification body, the SOA requires candidates for Associateship to pass a Predictive Analytics exam. Further details on exams and education are available on the SOA’s Associate of the Society of Actuaries (ASA) page. Current Fellows of the SOA may take certification programs in either Predictive Analytics or Ethical and Responsible Use of Data and Predictive Models. In the near future, Fellows will also be offered the certificate program Advanced Predictive Analytics. Details are available on the SOA Certificate Programs page.
Links to tools, open data sources and other resources are available on the SOA’s Data Analytics Resources page, and examples of experience studies using predictive analytics techniques are offered on their Predictive Analytics Experience Studies page.
European Insurance and Occupational Pensions Authority (EIOPA)
EIOPA’s Artificial Intelligence and Big Data page includes links to its free reports on the use of big data analytics by financial institutions and in insurance, as well as its report on digital ethics setting out AI governance principles for the European insurance sector.
National Association of Insurance Commissioners (NAIC)
A number of NAIC working groups and research initiatives examine the use of AI and big data in the insurance industry. The organization’s Big Data page offers resources on emerging technologies, including how new sources of data can complement more traditional ones to benefit consumers and carriers, as well as some of the risks and challenges big data poses.
NAIC working groups also develop best practices that provide guidance to state regulators in their review of the use of predictive models by insurers. In a 2020 white paper, Regulatory Review of Predictive Models, for example, the Casualty Actuarial and Statistical (C) Task Force sets out best practices for the review of predictive models filed by insurers to justify rates.