In a distributed system, multiple components are stored across different machines, which in turn coordinate to ensure that the whole system works as one. While it is challenging to deploy and maintain these systems, a properly implemented distributed system serves as a backbone for modern computing at scale. From telecommunication networks to mobile banking to the Internet, these systems are built to tolerate the failure of individual machines, ensuring that the services we rely on continue with little to no disruption.
This week's Selects collects several materials that can serve as a starting point to understand distributed computing. As always, we invite you to share your feedback and suggestions at selects-feedback@acm.org. For more resources in computing, we encourage you to explore the ACM Digital Library and Learning Center.
Decentralized Computing
First published in ACM Queue, Vol. 18, No. 5, October 2020.
Terence Kelly discusses the role that decentralized methods, which rely on local communication and computation, can play in distributed computing. He presents a decentralized protocol for self-organizing wireless networks and social networking problems, and provides example code for experimenting with both the protocol and a centralized solver.
Distributed Systems in One Lesson
Published through O'Reilly. Video lecture available to ACM members. Please refer to the following FAQ for any issues accessing the O'Reilly learning platform.
Simple tasks like running a program or storing and retrieving data become much more complicated when you do them on a collection of computers. In this 2015 O’Reilly video presentation, Tim Berglund (Senior Director of Developer Advocacy at Confluent) discusses five key areas of distributed systems that people need to know to get started. We recommend this video session as an introductory deep dive into the topic.
Distributed information processing in biological and computational systems
First published in Communications of the ACM, Vol. 58, No. 1, December 2014.
In this Communications of the ACM article, Saket Navlakha and Ziv Bar-Joseph compare how biological and computational systems solve distributed information processing problems. The authors also discuss constraints, goals and strategies used in both domains, as well as the opportunities for bidirectional research to improve both fields. We believe this article is a good comparative perspective on the applicability of distributed systems.
The verification of a distributed system
First published in Communications of the ACM, Vol. 59, No. 2, January 2016.
Validating that a distributed system is doing the right thing can be challenging. A failure in one computer can be hard to track because of the complexity and scale of these systems. In her 2016 Communications of the ACM article, Caitie McCaffrey (Architect and Developer Manager, Azure Sphere Security Services) explains the various aspects of a good verification strategy for distributed systems. We recommend this article as an entry point to good software engineering processes that can help improve both your own and your clients’ confidence in system correctness.
There is no getting around it: you are building a distributed system
First published in Communications of the ACM, Vol. 56, No. 6, June 2013.
In this Communications of the ACM article, Mark Cavage highlights the challenges of building a distributed system and offers practical tips, drawing on commonplace use cases such as scaling a multitenant enterprise web application or migrating an existing application to a cloud service provider. The author explains the decision points in architecting distributed systems, such as geographies, data segregation, service level agreements, security, usage tracking, and deployment.