Getting Started with Data Science

Published November 3, 2020

There is a large gap between exploratory data science and building an intelligent application that continually learns from the data it encounters to provide business value. In this ACM Select, we highlight content to ease the transition from research to production and illuminate the hurdles you may come across in your journey.

Overview

Data science: challenges and directions

First published in Communications of the ACM, Vol. 60, No. 8, July 2017.

In this overview article, Prof. Longbing Cao describes the processes of data science, its overlap with other disciplines, and the challenges present in data-driven decision making.

[Read more]

Data Validation

Your machine learning model can break, degrade, and exhibit unwanted behaviour in numerous ways. The primary cause is usually issues and irregularities in your data; data cleaning and validation help to minimize them.

Putting Machine Learning into Production Systems

First published in ACM Queue, Vol. 17, Issue 4, October 7, 2019.

Adrian Colyer gives an overview of two papers concerned with data validation techniques and provides insight into data skew and drift, where the data you trained the model on is no longer representative of the data your system is seeing in real-world operation.

[Read more]
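As a rough illustration of the drift idea described above, the sketch below compares one numeric feature as seen at training time with the same feature as seen in operation. It uses a two-sample Kolmogorov-Smirnov statistic, which is one common choice and not necessarily the technique used in the papers Colyer reviews. The function names, the synthetic data, and the 0.2 threshold are all assumptions made for this example.

```python
import random
from bisect import bisect_right


def ks_statistic(reference, live):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    ref_sorted = sorted(reference)
    live_sorted = sorted(live)
    max_gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is reached at one of the observed values.
    for x in ref_sorted + live_sorted:
        cdf_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        cdf_live = bisect_right(live_sorted, x) / len(live_sorted)
        max_gap = max(max_gap, abs(cdf_ref - cdf_live))
    return max_gap


def has_drifted(reference, live, threshold=0.2):
    # The threshold is an illustrative assumption, not a recommended value.
    return ks_statistic(reference, live) > threshold


# Synthetic data standing in for one feature (hypothetical values).
rng = random.Random(0)
training_feature = [rng.gauss(0.0, 1.0) for _ in range(1000)]
same_distribution = [rng.gauss(0.0, 1.0) for _ in range(1000)]
shifted_distribution = [rng.gauss(1.5, 1.0) for _ in range(1000)]

print(has_drifted(training_feature, same_distribution))
print(has_drifted(training_feature, shifted_distribution))
```

In a real system a check like this would typically run per feature on a schedule, with thresholds tuned to the sample size and to how costly a missed drift is.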

Data Cleaning for Accurate, Fair, and Robust Models: A Big Data - AI Integration Approach

First presented at DEEM'19: Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning, June 2019. 

Training your model on biased data results in a biased model. This paper describes methods for ensuring that your training data is accurate and free from bias. 

[Read more]


Model Interpretability

Explanations of why a model arrived at its result help you understand whether a machine learning model relied on true evidence or on the bias that widely exists in training data. Model interpretability is this ability to interpret the results of a model.

Techniques for interpretable machine learning

First published in Communications of the ACM, Vol. 63, No. 1, December 2019.

Interpretability can be classified as intrinsic or post-hoc, both of which can be further broken down into global and local. This article describes these classifications, and also discusses the larger goal of democratizing model explanations for end-users rather than only for research intuitions.

[Read more]
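To make the post-hoc, global category more concrete, here is a minimal sketch of permutation importance: shuffle one feature across the dataset and measure how much the model's error grows. This technique is chosen here purely as an illustration and is not claimed to be the article's own method. The `toy_model` function is a hypothetical stand-in for a trained model, and the data is synthetic.

```python
import random


def mean_squared_error(model, rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, feature, rng):
    """Increase in error after shuffling one feature column.
    A larger increase suggests the model leans more on that feature."""
    baseline = mean_squared_error(model, rows, targets)
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    # Rebuild each row with the shuffled value in place of the original.
    permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return mean_squared_error(model, permuted, targets) - baseline


def toy_model(row):
    # Hypothetical stand-in for a trained model:
    # it uses feature 0 and ignores feature 1 entirely.
    return 3.0 * row[0]


rng = random.Random(0)
rows = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
targets = [toy_model(r) for r in rows]

importance_used = permutation_importance(toy_model, rows, targets, 0, rng)
importance_ignored = permutation_importance(toy_model, rows, targets, 1, rng)
print(importance_used, importance_ignored)
```

The explanation is global because it summarizes behaviour over the whole dataset, and post-hoc because it treats the model as a black box after training. A local explanation would instead ask why the model produced one particular prediction.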

Bias

Algorithms increasingly help organize all aspects of our personal and professional lives, but you must take care that pre-existing societal bias does not seep into your models as they make real-world decisions.

Algorithms, Platforms, and Ethnic Bias

First published in Communications of the ACM, Vol. 62, No. 11, November 2019.

In this article, Martin Kenney, a Distinguished Professor at UC Davis, describes types of bias, how they arise from training data, choosing and interpreting models to minimize bias, and the fine line between accuracy and fairness that a data scientist must walk.

[Read more]


Putting It All Together: A Case Study of AI Bots

A Decade of Social Bot Detection

First published in Communications of the ACM, Vol. 63, No. 10, October 2020.

To generate business value, your model will need to be operationalised as part of a broader system. However, such systems aren’t always used for good. In this article, social media researcher Stefano Cresci looks at the influx of AI ‘bots’, how they impact people’s online interactions, and approaches to combat them. 

[Read more]

There's More

Recommended Selects

Apr 29, 2021

Getting Started With HPC

This week's ACM Select features several introductory resources on HPC, with perspectives on the history, technical definitions of HPC terminologies, trends in HPC, and the implications of its use.
Apr 5, 2021

Getting Started with Data Science #2

This ACM Select highlights several resources on data science fundamentals and best practices, and compares different frameworks and tools to help you apply data science in your field.
Mar 25, 2021
Getting Started Series

Getting Started with Networks

We hope that this shortlist will be useful as an introduction to Networks and, depending on the stage of your career, set the foundation for your future in Networks or take you down a nostalgic route with some basics and fundamentals of Networks.

Help guide ACM Selects!

Let us know how we can improve your ACM Selects experience, which topics you would like us to cover in the future, and whether you would like to contribute or subscribe to our newsletter, by emailing selects-feedback@acm.org.
