Getting Started with Data Science

November 3, 2020

There is a large gap between exploratory data science and building an intelligent application that continually learns from the data it encounters to provide business value. In this ACM Select, we highlight content to ease the transition from research to production and illuminate the hurdles you may come across in your journey.


Data science: challenges and directions

First published in Communications of the ACM, Vol. 60, No. 8, July 2017.

In this overview article, Prof. Longbing Cao describes the processes of data science, its overlap with other disciplines, and the challenges present in data-driven decision making.

[Read more]

Data Validation

Your machine learning model can break, degrade, and exhibit unwanted behaviour in numerous ways. The primary cause is irregularities in your data, which data cleaning and validation help to minimize.

Putting Machine Learning into Production Systems

First published in ACM Queue, Vol. 17, Issue 4, October 7, 2019.

Adrian Colyer gives an overview of two papers concerned with data validation techniques and provides insight into data skew and drift, where the data you trained the model on is no longer representative of the data your system is seeing in real-world operation.

[Read more]
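To make the idea of drift concrete, here is a minimal sketch of one way to compare a feature's training values against the values seen in production. It uses the two-sample Kolmogorov-Smirnov statistic, a common general-purpose check chosen here for illustration; it is not necessarily the method used in the papers above, and the function name and sample values are assumptions.

```python
from bisect import bisect_right


def ks_statistic(train, live):
    """Return the largest gap between the empirical CDFs of two samples.

    0.0 means the samples have identical distributions; 1.0 means
    they do not overlap at all.
    """
    if not train or not live:
        raise ValueError("both samples must be non-empty")
    train_sorted = sorted(train)
    live_sorted = sorted(live)
    gap = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in set(train_sorted) | set(live_sorted):
        cdf_train = bisect_right(train_sorted, x) / len(train_sorted)
        cdf_live = bisect_right(live_sorted, x) / len(live_sorted)
        gap = max(gap, abs(cdf_train - cdf_live))
    return gap


# Hypothetical feature values: what the model was trained on vs. what it sees now.
print(ks_statistic([1, 2, 3, 4], [1, 2, 3, 4]))  # 0.0 (no drift)
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5 (partial shift)
print(ks_statistic([1, 2, 3], [10, 11, 12]))     # 1.0 (no overlap)
```

In practice you would pick an alert threshold for the statistic per feature; what counts as too much drift depends on the application.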

Data Cleaning for Accurate, Fair, and Robust Models: A Big Data - AI Integration Approach

First presented at DEEM'19: Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning, June 2019. 

Training your model on biased data results in a biased model. This paper describes methods for ensuring that your training data is accurate and free from bias. 

[Read more]

Model Interpretability

Explaining why a model arrived at its result helps you understand whether it relied on true evidence or on the bias that widely exists in training data. Model interpretability is this ability to explain a model's results.

Techniques for interpretable machine learning

First published in Communications of the ACM, Vol. 63, No. 1, December 2019.

Interpretability can be classified as intrinsic or post-hoc, and each class can be further broken down into global and local. This article describes these classifications and discusses the larger goal of democratizing model explanations for end-users, rather than using them only to build researchers' intuition.

[Read more]
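As a rough illustration of the post-hoc, global category, the sketch below computes permutation importance: shuffle one feature at a time and measure how much the model's accuracy drops. This technique is chosen here as an example and is not taken from the article; the toy model, data, and function names are all hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)


def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Average accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            # Rebuild the rows with only feature j replaced by shuffled values.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical black-box model: uses feature 0 and ignores feature 1."""
    return row[0] > 0.5


rows = [[0.1, 5.0], [0.9, 1.0], [0.2, 3.0], [0.8, 4.0], [0.3, 2.0], [0.7, 6.0]]
labels = [toy_model(row) for row in rows]
importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

Because the toy model never reads feature 1, shuffling it cannot change any prediction and its importance is exactly zero. A local explanation would instead ask why the model produced its output for one particular row.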


Algorithms increasingly help organize all aspects of our personal and professional lives, but you must be careful to keep pre-existing societal bias from seeping into your models as they make real-world decisions.

Algorithms, Platforms, and Ethnic Bias

First published in Communications of the ACM, Vol. 62, No. 11, November 2019.

In this article, Martin Kenney, a Distinguished Professor at UC Davis, describes types of bias, how they arise from training data, choosing and interpreting models to minimize bias, and the fine line between accuracy and fairness that a data scientist must walk.

[Read more]

Putting It All Together: A Case Study of AI Bots

A Decade of Social Bot Detection

First published in Communications of the ACM, Vol. 63, No. 10, October 2020.

To generate business value, your model will need to be operationalised as part of a broader system. However, such systems aren’t always used for good. In this article, social media researcher Stefano Cresci looks at the influx of AI ‘bots’, how they impact people’s online interactions, and approaches to combat them. 

[Read more]

There's More

Recommended Selects

Getting Started Series

Getting Started with Internet of Things: IoT Applications

This Select concludes with an example application domain, the Industrial Internet of Things (IIoT), and a source for delving into state-of-the-art IoT research trends.
Getting Started Series

Getting Started with Internet of Things: Computing and Communication

The selection includes easy-to-read articles that describe and motivate the IoT, followed by deep dives into its major aspects, such as communication protocols, the edge-to-cloud continuum, AI and data analytics, and security and privacy.
Computing in Practice Series

Trustworthy AI in Healthcare #02

AI needs to be trustworthy. Trustworthiness means that healthcare organizations, doctors, and patients should be able to rely on the AI solution as being lawful, ethical, and robust.
