There is a large gap between exploratory data science and building an intelligent application that continually learns from the data it encounters to provide business value. In this ACM Select, we highlight content to ease the transition from research to production and illuminate the hurdles you may come across in your journey.
Overview
Data science: challenges and directions
First published in Communications of the ACM, Vol. 60, No. 8, July 2017.
In this overview article, Prof. Longbing Cao describes the processes of data science, its overlap with other disciplines, and the challenges present in data-driven decision making.
Data Validation
Your machine learning model can break, degrade, and exhibit unwanted behaviour in numerous ways. The primary cause is usually issues and irregularities in your data; data cleaning and validation help to minimize them.
Putting Machine Learning into Production Systems
First published in ACM Queue, Vol. 17, Issue 4, October 7, 2019.
Adrian Colyer gives an overview of two papers concerned with data validation techniques and provides insight into data skew and drift, where the data you trained the model on is no longer representative of the data your system is seeing in real-world operation.
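To make the idea of drift concrete, here is a minimal sketch of one common check, the Population Stability Index (PSI), which compares how a single numeric feature is distributed in the training data and in the data seen in production. It is an illustration only, not necessarily the technique used in the papers Colyer reviews. The function name `population_stability_index`, the ten equal-width bins, and the alert threshold of 0.2 are all assumptions chosen for the example; tune them for your own data.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a training sample (expected) and a serving sample (actual)."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the training range; fall back to 1.0 if the range is zero.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the training range into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical data: the serving feature is shifted relative to training.
train = [i / 100 for i in range(1000)]
serving = [x + 5 for x in train]

DRIFT_THRESHOLD = 0.2  # assumed alert level, not a universal constant
drifted = population_stability_index(train, serving) > DRIFT_THRESHOLD
```

A PSI of zero means the two binned distributions match; larger values mean the serving data has moved away from what the model was trained on.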
Data Cleaning for Accurate, Fair, and Robust Models: A Big Data - AI Integration Approach
First presented at DEEM'19: Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning, June 2019.
Training your model on biased data results in a biased model. This paper describes methods for ensuring that your training data is accurate and free from bias.
Model Interpretability
Explanations of why a model arrived at its result help you understand whether it relied on true evidence or on the bias that widely exists in training data. Model interpretability is this ability to interpret the results of a model.
Techniques for interpretable machine learning
First published in Communications of the ACM, Vol. 63, No. 1, December 2019.
Interpretability can be classified as intrinsic or post-hoc, and each can be further broken down into global and local. This article describes these classifications and also discusses the larger goal of democratizing model explanations for end users, rather than offering them only for research intuition.
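As one illustration of the post-hoc, global category, the sketch below computes permutation feature importance: it treats the model as a black box, shuffles one feature at a time, and measures how much the error grows. This is a generic technique chosen for the example, not a method taken from the article. The names `permutation_importance` and `black_box`, the toy data, and the use of mean squared error are assumptions.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mean_squared_error(
    model: Callable[[Row], float], X: Sequence[Row], y: Sequence[float]
) -> float:
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(
    model: Callable[[Row], float],
    X: Sequence[Row],
    y: Sequence[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Average increase in error when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only feature j replaced by its shuffled values.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mean_squared_error(model, shuffled, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical black-box model that uses feature 0 and ignores feature 1.
def black_box(row: Row) -> float:
    return 3.0 * row[0]


X = [[float(i), float(i % 7)] for i in range(50)]
y = [3.0 * row[0] for row in X]
importances = permutation_importance(black_box, X, y)
```

Because the explanation summarises behaviour over the whole dataset it is global; a local explanation would instead ask why the model produced its output for one particular row.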
Bias
Algorithms increasingly help organize all aspects of our personal and professional lives, but you must be careful to keep pre-existing societal bias from seeping into your models as they make real-world decisions.
Algorithms, Platforms, and Ethnic Bias
First published in Communications of the ACM, Vol. 62, No. 11, November 2019.
In this article, Martin Kenney, a Distinguished Professor at UC Davis, describes types of bias, how they arise from training data, how to choose and interpret models to minimize bias, and the fine line between accuracy and fairness that a data scientist must walk.
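The tension between accuracy and fairness can be made measurable. The sketch below reports accuracy alongside a demographic parity gap, the difference in positive-prediction rates between groups. Demographic parity is only one of several fairness notions and is an assumption here, not necessarily the one the article uses; the function names and the toy labels, predictions, and groups are hypothetical.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def selection_rates(y_pred: Sequence[int], groups: Sequence[str]) -> Dict[str, float]:
    """Fraction of positive predictions within each group."""
    totals: Dict[str, int] = {}
    positives: Dict[str, int] = {}
    for pred, group in zip(y_pred, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(y_pred: Sequence[int], groups: Sequence[str]) -> float:
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(y_pred, groups).values()
    return max(rates) - min(rates)


# Hypothetical labels, predictions, and group membership.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

model_accuracy = accuracy(y_true, y_pred)
parity_gap = demographic_parity_gap(y_pred, groups)
```

In this toy data the model is right on seven of eight cases, yet it never selects anyone from group "b", which is the kind of disparity a single accuracy figure hides.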
Putting It All Together: A Case Study of AI Bots
A Decade of Social Bot Detection
First published in Communications of the ACM, Vol. 63, No. 10, October 2020.
To generate business value, your model will need to be operationalised as part of a broader system. However, such systems aren’t always used for good. In this article, social media researcher Stefano Cresci looks at the influx of AI ‘bots’, how they impact people’s online interactions, and approaches to combat them.