Getting Started with Data Science #2

Published April 5, 2021

Data science has propelled us toward better decision-making in numerous fields, including science and technology, healthcare, and manufacturing. This ACM Selects article highlights several resources on data science fundamentals and best practices, and compares frameworks and tools to help you apply data science in your field.

This is the second installment in our data science series; the first Selects article was published here.

We invite you to participate in ACM’s activities on these topics, whether through our professional community, global policy activities, ongoing work in professional ethics, or our chapters, SIGs, local meetups, and conferences.

We value your feedback and look forward to your guidance on how we can continue to improve ACM Selects together. Please send suggestions by email to selects-feedback@acm.org.

Best Practices in the Field

Rules of Machine Learning

Martin Zinkevich, a Research Scientist at Google, lays out best practices for developing data science and machine learning systems in production. This collection of rules of thumb, heuristics, and pitfalls can help bring more structure and clarity to building such systems.

Data Science and Prediction

The article emphasizes the importance of predictive modeling in data science: prediction makes new knowledge actionable for decision-making, rather than merely explaining past events. It then argues that becoming a good data scientist requires an integrated skill set spanning mathematics, machine learning, and software engineering, along with strong problem formulation and problem-solving skills.

Tools and Technologies Involved

Applied Linear Algebra Methods for Data Science

Data science often relies on having access to vast amounts of data. This data is likely to be high-dimensional and can contain information that is not relevant to answering the question at hand. This paper introduces efficient algorithms for reducing the dimensionality of data. Often called ‘feature engineering’ techniques, these algorithms are a critical stage of any data science workflow.
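To make dimensionality reduction concrete, here is a minimal sketch of one common technique, principal component analysis (PCA) computed through the singular value decomposition. This is an illustration only and is not necessarily one of the algorithms the paper covers. The function names `pca_reduce` and `make_demo_data`, and the synthetic data itself, are assumptions made for this example.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto its top principal components.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Center each feature so the SVD captures variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # Thin SVD: rows of Vt are principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of total variance captured by each kept component.
    explained_ratio = (S ** 2)[:n_components] / np.sum(S ** 2)
    return X_centered @ components.T, explained_ratio


def make_demo_data(n_samples=200, seed=0):
    """Synthetic 10-dimensional data driven by only 2 latent factors."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, 2))
    mixing = rng.normal(size=(2, 10))
    noise = 0.01 * rng.normal(size=(n_samples, 10))
    return latent @ mixing + noise


if __name__ == "__main__":
    X = make_demo_data()
    Z, ratio = pca_reduce(X, n_components=2)
    print("reduced shape:", Z.shape)
    print("variance explained:", ratio.sum())
```

Because the synthetic data is built from two latent factors plus a little noise, two components should capture nearly all of its variance. On real data, the explained-variance ratio is one rough guide to how many components are worth keeping.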

Python vs R for Data Science

Python and R are the two leading languages used for carrying out data science. Want to understand the differences between the two, so you can figure out where to focus your attention? This article is for you!

Scikit-learn: Machine Learning Without Learning the Machinery

Scikit-learn is a Python library that provides the tooling and frameworks to build data science pipelines. From transforming data to training models, Scikit-learn’s modular approach makes it simple to compare a range of techniques on your data set.
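As a rough sketch of that modular approach, the example below wraps two different classifiers in the same preprocessing pipeline and compares them with cross-validation. The choice of the Iris data set, the two models, and the helper name `compare_models` are assumptions made for illustration; they do not come from the linked article.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, cv=5):
    """Return the mean cross-validated accuracy for each candidate model.

    Hypothetical helper for illustration only.
    """
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    }
    scores = {}
    for name, model in candidates.items():
        # Same transformation step, different estimator: only one line changes.
        pipeline = make_pipeline(StandardScaler(), model)
        scores[name] = cross_val_score(pipeline, X, y, cv=cv).mean()
    return scores


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    for name, score in compare_models(X, y).items():
        print(f"{name}: mean accuracy {score:.3f}")
```

Placing the scaler inside the pipeline means it is refit on each training fold, so no information from the held-out fold leaks into preprocessing.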

Research and Coursework for Deep Dive

The Data Science Life Cycle: A Disciplined Approach to Advancing Data Science as a Science

In this article, Victoria Stodden, an Associate Professor at University of Southern California, motivates the interdisciplinarity and scope of data science as a discipline. Stodden proposes an intellectual framework, the Data Science Life Cycle, to describe the various steps and processes involved, and highlights the coursework needed to build a skill set in each of those components.

Computing Competencies for Undergraduate Data Science Curricula

This report from the ACM Data Science task force lays out the topics any comprehensive Data Science undergraduate course should cover. If you want to identify gaps in your knowledge, or figure out what to learn next, this report is a great place to start.

Prabhav Agrawal

Prabhav Agrawal is a Machine Learning Engineer in Facebook AI’s Speech team. He has 5+ years of experience researching and creating AI-powered products at leading companies such as Apple and Microsoft. At Apple, he led the efforts to create Siri Voice experiences across devices including iPhone, Apple Watch, and HomePod as part of the Text-to-Speech team. At Microsoft, he focused on improving search relevance and infrastructure for Bing Search, and also co-led the project to include Dictation in MS Office and Windows, starting from a hackathon prototype. Prabhav earned his Master's in Computer Science from University of California San Diego and his Bachelor's in Electrical Engineering from Indian Institute of Technology Delhi.

Sophie Watson

Sophie is a Data Scientist at Red Hat, where she helps customers use machine learning to solve business problems in the hybrid cloud.
