Getting Started with Data Science #2

April 5, 2021

Data science has propelled us toward better decision making in numerous fields, including science & technology, healthcare, and manufacturing. This ACM Selects highlights several resources on data science fundamentals and best practices, and compares different frameworks and tools to help you apply data science in your field.

This is the second installment in our data science series; the first Selects article was published here.

We invite you to consider participating in ACM's activities on these topics, whether through our professional community, global policy activities, ongoing work in professional ethics, and/or our chapters, SIGs, local meetups, and/or conferences.

We value your feedback and look forward to your guidance on how we can continue to improve ACM Selects together. Your suggestions and opinions on how we can do better are welcome via email through

Best Practices in the Field

Rules of Machine Learning

Martin Zinkevich, a Research Scientist at Google, lays out best practices for developing data science and machine learning systems in production. It is a valuable collection of rules of thumb, heuristics, and pitfalls that can bring more structure and clarity to building such systems.

[Read more]

Data Science and Prediction

The article emphasizes the importance of predictive modeling in data science, because prediction makes new knowledge actionable for decision making rather than merely explaining past events. It then argues that becoming a good data scientist requires an integrated skill set spanning mathematics, machine learning, and software engineering, along with strong problem-formulation and problem-solving skills.

[Read more]
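The distinction between explaining the past and predicting the future can be made concrete: a predictive model is judged by its error on observations it was not fit on. The sketch below is our own illustration, not taken from the article; the toy time series, the straight-line model, the time-based train/test split, and the helper names `fit_line` and `mean_abs_error` are all hypothetical choices.

```python
# Illustrative sketch, not from the article: toy data and hypothetical helper
# names. Fit on "the past", then measure error on "the future".

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def mean_abs_error(model, xs, ys):
    """Average absolute gap between the line's predictions and the truth."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)

# Toy time series: t = 0..11, value roughly 3t + 5 plus small fixed noise.
noise = [0.4, -0.3, 0.1, -0.5, 0.2, 0.3, -0.1, 0.0, -0.4, 0.5, -0.2, 0.1]
t = list(range(12))
values = [3 * ti + 5 + e for ti, e in zip(t, noise)]

# Split by time: fit on the first 8 points, predict the last 4.
train_t, test_t = t[:8], t[8:]
train_v, test_v = values[:8], values[8:]

model = fit_line(train_t, train_v)
print("in-sample MAE:    ", round(mean_abs_error(model, train_t, train_v), 3))
print("out-of-sample MAE:", round(mean_abs_error(model, test_t, test_v), 3))
```

Splitting by time rather than at random mirrors the decision-making setting the article stresses: the model only ever sees data that would have been available before the predictions were needed.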

Tools and Technologies Involved

Applied Linear Algebra Methods for Data Science

Data science often relies on access to vast amounts of data. Such data is frequently high-dimensional and can contain information that is irrelevant to the question at hand. This paper introduces efficient algorithms for reducing the dimensionality of data. Often grouped under the heading of 'feature engineering', these techniques are a critical stage of the data science workflow.

[Read more]
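As one illustration of dimensionality reduction, the sketch below computes the first principal component (PCA) of a small dataset using power iteration, in plain Python with no dependencies. PCA is our choice of example and is not necessarily the method the paper emphasizes; the toy data and the helper names (`mean_center`, `covariance`, `top_component`, `project`) are hypothetical.

```python
import math
import random

# Illustrative sketch only: PCA via power iteration (our choice of technique,
# not necessarily the paper's). Helper names are hypothetical.

def mean_center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means

def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=200, seed=0):
    """Power iteration: repeatedly multiply a vector by the covariance matrix
    and renormalize; it converges to the direction of greatest variance."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.uniform(-1, 1) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, component):
    """Replace each d-dimensional row by its coordinate along `component`."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]

# Toy 3-D data that really varies along one direction, (1, 2, 0), plus noise.
rng = random.Random(42)
data = []
for _ in range(100):
    s = rng.gauss(0, 1)
    data.append([s + rng.gauss(0, 0.05),
                 2 * s + rng.gauss(0, 0.05),
                 rng.gauss(0, 0.05)])

centered, _ = mean_center(data)
pc1 = top_component(covariance(centered))
reduced = project(centered, pc1)  # 100 rows x 1 column instead of 100 x 3
print("first principal component:", [round(x, 3) for x in pc1])
```

In practice you would keep the top k components and rely on a library SVD routine rather than hand-rolled power iteration; the sketch only shows the core idea of replacing many correlated columns with a few directions of high variance.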

Python vs R for Data Science

Python and R are the two leading languages for data science. Want to understand the differences between them so you can decide where to focus your attention? This article is for you!

[Read more]

Scikit-learn: Machine Learning Without Learning the Machinery

Scikit-learn is a Python library that provides the tooling to build data science pipelines. From transforming data to training models, scikit-learn's modular approach makes it simple to compare a range of techniques on your dataset.

[Read more]
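To make the modular approach concrete, here is a minimal sketch, assuming scikit-learn is installed. The bundled iris dataset, the three classifiers, and 5-fold cross-validation are our illustrative choices, not something the article prescribes. Because every estimator shares the same `fit`/`predict` interface, swapping one model for another inside an identical pipeline is a one-line change.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Illustrative choices: the iris dataset ships with scikit-learn, so no download.
X, y = load_iris(return_X_y=True)

# Every estimator exposes the same interface, so candidates are interchangeable.
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in candidates.items():
    # Same preprocessing step for every model; only the final estimator changes.
    pipeline = make_pipeline(StandardScaler(), model)
    # 5-fold cross-validation returns one accuracy score per fold.
    scores = cross_val_score(pipeline, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Placing `StandardScaler` inside the pipeline means the scaler is refit on each training fold, so no information from the held-out fold leaks into preprocessing.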

Research and Coursework for Deep Dive

The Data Science Life Cycle: A Disciplined Approach to Advancing Data Science as a Science 

In this article, Victoria Stodden, an Associate Professor at the University of Southern California, explains why data science is interdisciplinary and outlines its scope as a discipline. Stodden proposes an intellectual framework, the Data Science Life Cycle, to describe the steps and processes involved, and highlights the coursework needed to build skills in each of those components.

[Read more]

Computing Competencies for Undergraduate Data Science Curricula

This report from the ACM Data Science Task Force lays out the topics that any comprehensive undergraduate data science program should cover. If you want to identify gaps in your knowledge or figure out what to learn next, this report is a great place to start.

[Read more]

There's More

Recommended Selects

Getting Started Series

Getting Started with Internet of Things: IoT Applications

This Selects concludes with an example application domain, the Industrial Internet of Things (IIoT), and a resource for delving into state-of-the-art IoT research trends.
Getting Started Series

Getting Started with Internet of Things: Computing and Communication

The selection includes easy-to-read articles that describe and motivate the IoT, then dives deep into its major aspects, such as communication protocols, the edge-to-cloud continuum, AI and data analytics, and security/privacy.
Computing in Practice Series

Trustworthy AI in Healthcare #02

AI needs to be trustworthy. Trustworthiness means that healthcare organizations, doctors, and patients should be able to rely on the AI solution as being lawful, ethical, and robust.

Help guide ACM Selects!

Let us know how we can improve your ACM Selects experience, what topics you would like us to cover in the future, and whether you would like to contribute and/or subscribe to our newsletter by emailing
