Community Development Series

People of Computing #3: Computer Vision

Post by Community Development Series

Published January 21, 2021

In Computer Vision, we create algorithms that allow computers to extract information from imaging devices. To learn more about Computer Vision, please check out the ACM Selects on the topic.

Computer Vision has a profound impact on our world, unlocking paradigm-changing applications such as image search, a new agricultural revolution, democratized medical imaging, automated manufacturing and logistics facilities, and autonomous vehicles. In this People of Computing ACM Selects, we aim to recognize four outstanding Computer Scientists whose careers and contributions to Computer Vision highlight meaningful aspects of the field.

We highly encourage learning more about their contributions, and invite you to consider participating in ACM’s activities on these topics, whether through our professional community, global policy activities, ongoing work in professional ethics, or our chapters, SIGs, local meetups, and conferences.

We would love to hear your feedback and suggestions at selects-feedback@acm.org for how we can do better. We look forward to your guidance on how we can continue to improve ACM Selects together.

Fei-Fei Li

Professor Fei-Fei Li is a Computer Science professor at Stanford and Co-Director of Stanford’s Human-Centered AI Institute. She served as the Director of Stanford’s AI Lab. Previously, she was on the faculty at Princeton University and the University of Illinois Urbana-Champaign. During her sabbatical, Fei-Fei served as a VP at Google and Chief Scientist of AI/ML at Google Cloud. Fei-Fei’s ImageNet and ImageNet challenge are in many ways synonymous with the rise of deep learning for Computer Vision tasks. Professor Li has advised many notable PhD students, including Andrej Karpathy, Timnit Gebru, Olga Russakovsky, and Juan Carlos Niebles. Fei-Fei’s research interests include cognitively inspired AI, deep learning, computer vision and AI for healthcare.

[ Read their bio ]

Guiding computers, robots to see and think

First published in Communications of the ACM, Vol. 62, No. 3, February 2019.
In this interview, Fei-Fei Li, Co-Director of Stanford University’s Human-Centered AI Institute, discusses her background in and perspectives on the field, her work and collaborations in neuroscience and cognitive science, and the need for more diverse and interdisciplinary voices in the field.
[ Read more ]

ImageNet Large Scale Visual Recognition Challenge

First published in the International Journal of Computer Vision, Vol. 115, No. 3.
In many ways, the ImageNet Challenge and Dataset turned the tide toward the Deep Learning revolution in Computer Vision, thawing the AI winter and showing the promise of Convolutional Neural Networks and the importance of large, clean datasets. This paper, written 5 years into the yearly challenge, describes the creation of the dataset and the advancements that have resulted from it. Fei-Fei and her students have written many influential papers on dataset creation, crowdsourcing, and techniques such as active learning, unlocking the lifeblood of deep learning: large datasets.

You can hear more about ImageNet and its implications in Professor Li’s ACM Tech Talk.
[ Read more ]

Stanford CS231N

Every offering, hundreds of Stanford students and thousands of online students clamor to take Fei-Fei Li’s CS231N course. The course is a phenomenal introduction to, and survey of, cutting-edge Deep Learning for Computer Vision.
[ Spring 2017 Lecture Videos ]
[ Spring 2020 Course Materials ]

Amnon Shashua

Amnon Shashua is a Computer Science Professor at the Hebrew University, the CEO of Mobileye, and a co-founder of OrCam. Mobileye is an industry leader in Computer Vision powered autonomous vehicle technologies; the company was acquired by Intel for US$15.3 billion in 2017. OrCam develops Computer Vision powered assistive devices for visually impaired people. Shashua’s career spans many of the themes that we, in ACM Selects, find exciting in Computer Vision: namely autonomous vehicles, assistive technologies (and tech for good applications), the rise of custom silicon, and the intersection of academia and industry.

Self Driving Cars

First published in CVPR '06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1, June 2006.
This 2006 paper by Shashua et al., an early work by Mobileye, is a great case study in the early days of self-driving cars. The DARPA Grand Challenge was a phenomenal catalyst for the rise of Autonomous Vehicles (AVs), and many of the founding teams of today’s top AV companies competed. This paper builds on many of the techniques common at the time, such as hand-crafted kernels for creating description vectors and line fitting. Shashua and Mobileye’s more contemporary work can be seen in thousands of patents over the years. Mobileye is a dominant player in the self-driving car industry, selling to Nissan, Volkswagen, BMW, and more.
[ Read more ]

The Rise of Custom Silicon

First published as an ACM Tech Talk, September 2017.
In this tech talk, David Patterson discusses the rising importance of custom computer architectures for keeping up with our appetite for compute in the face of a new reality that is slowing performance gains for standard microprocessors: the end of Dennard scaling (energy consumption per transistor no longer shrinking) and the end of Moore's law (the number of transistors that fit on a chip no longer rising). An especially critical need for custom architectures arises in real-time and power-constrained applications, such as autonomous vehicles.

Patterson calls out Mobileye as an example of a company making heavy investments in custom silicon. As a leader in this industry shift towards custom hardware, Mobileye develops custom accelerators for a variety of computer-vision, signal-processing, and machine-learning tasks through its EyeQ line of SoCs. At CES 2021, Mobileye announced that it is developing a Lidar SoC: an exciting announcement for time-of-flight sensors, silicon photonics technology, and custom architectures for computer vision and self-driving cars.
[ Watch the Tech Talk ]

Assistive Technologies

First published in Computer Vision and Image Understanding, January 2017.
As computers gain the ability to ‘see’ the world in increasingly profound ways, they can relay some of that understanding to people who are visually impaired, empowering them to regain their independence. This review paper by Leo et al. describes some of the advancements made in the growing field of using Computer Vision to aid the visually impaired. One of those advancements is the wearable line created by OrCam, a company founded by Amnon Shashua and Ziv Aviram in 2010. OrCam creates a smart camera that attaches to glasses frames and describes the visual world around it through audio. OrCam can read aloud menus, street signs, and medicine labels, recognize different denominations of paper money, recognize faces, and respond to different hand gestures. At ACM Selects, we are very excited about the power of Computer Vision to improve people’s daily lives.

You can read some of the early theoretical work driving OrCam’s technology here.
[ Read more ]

Alexei “Alyosha” Efros

Alexei "Alyosha" A. Efros is an associate professor in the Computer Science Division in the Electrical Engineering and Computer Science Department at the University of California, Berkeley and part of the Berkeley Artificial Intelligence Research Lab (BAIR). Efros is widely recognized for his ground-breaking data-driven approaches to computer graphics and computer vision, with a focus on understanding, modeling and recreating the visual world around us. He is also a pioneer in combining huge image datasets drawn from the Internet with machine learning algorithms, and has also made significant contributions to texture synthesis.

Efros is the author of 100 publications on topics including computer vision, computer graphics and artificial intelligence. He is a Sloan Fellow and a Guggenheim Fellow, and has received the ACM SIGGRAPH Significant New Researcher Award, the IEEE PAMI Helmholtz Test of Time Prize and the 2016 ACM Prize in Computing, among other honors.
[ Read their bio ]

What makes Paris look like Paris?

First published in the Communications of the ACM, Vol. 58, No. 12, November 2015.

Understanding the visually distinct elements that define a place is easy for humans but difficult for machines. This poses a challenge when it comes to recreating geographically representative images, such as windows, balconies, and street signs that are specific to certain places such as London or Paris. This 2015 Communications of the ACM research article by Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros presents an approach that takes these features into account, leading to results that can be, and are being, used for a variety of computational geography tasks.

The original paper was published in SIGGRAPH 2012.
[ Read more ]

Portraiture in the age of big data: technical perspective

First published in the Communications of the ACM, Vol. 57, No. 9, September 2014.
In this Communications of the ACM Technical Perspective, Efros provides his thoughts on Moving Portraits, a 2014 research article that proposes using a large dataset of photographs of an individual to create smooth facial animations. These photographs are connected as nodes in a graph, with edges representing a high degree of visual similarity between two photos. The proposed approach then smoothly connects these photos into a slideshow, making each photo a frame in a continuous “movie”. From Alyosha’s point of view, this approach showcases a practical use of computer vision of a kind that is now commonplace across applications and products.

For details on the original paper, please refer to Exploring Photobios, first published in ACM Transactions on Graphics (SIGGRAPH), Vol. 30, No. 4, 2011.
[ Read the technical perspective ]
[ Read the research article ]
[ Read the original paper ]
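
The graph idea described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a hypothetical, hand-written table of dissimilarity scores between photos (lower means more alike) and uses an ordinary shortest-path search to pick the sequence of photos that changes as little as possible from one frame to the next. The function name `smoothest_path` and the toy photo labels are made up for illustration.

```python
import heapq


def smoothest_path(dissimilarity, start, goal):
    """Return the photo sequence from start to goal with the lowest total
    dissimilarity, found with Dijkstra's shortest-path algorithm.

    dissimilarity maps each photo id to a dict of {neighbor id: edge weight}.
    Returns an empty list when goal cannot be reached from start.
    """
    best = {start: 0.0}      # lowest known cost to reach each photo
    previous = {}            # back-pointers used to rebuild the path
    queue = [(0.0, start)]   # min-heap of (cost so far, photo id)
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in dissimilarity.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if goal not in best:
        return []
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]


# Hypothetical toy data: four photos of one person and made-up scores.
photos = {
    "child": {"teen": 1.0, "adult": 5.0},
    "teen": {"child": 1.0, "adult": 1.5},
    "adult": {"teen": 1.5, "child": 5.0, "senior": 1.0},
    "senior": {"adult": 1.0},
}

print(smoothest_path(photos, "child", "senior"))
# prints ['child', 'teen', 'adult', 'senior']
```

In this toy example the search prefers the gradual route through "teen" over the direct but visually abrupt jump from "child" to "adult", which is the intuition behind turning a photo collection into a continuous sequence.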

Technical perspective: When the adversary is your friend

First published in the Communications of the ACM, Vol. 63, No. 11, October 2020.

Generative models for machine learning (that is, models that approximate the process that generates data) have historically been difficult to use for the recreation of real-world imagery. Without a good metric for what these images should look like, otherwise known as the objective function or loss function, it is hard to quantify how good a generator is at producing realistic images that no one has seen before.

In their 2020 Communications of the ACM Technical Perspective, Efros and Hertzmann look back on Generative adversarial nets, the seminal paper first published in NIPS'14: Proceedings of the 27th International Conference on Neural Information Processing Systems, Vol. 2, December 2014. This paper presents the idea of simultaneously training a generator and a discriminator (which tries to tell real images from generated ones) in an adversarial manner, which reinforces and improves the output of these models. The authors then discuss the outcomes of the paper, both in practice and in its impact on imaging applications in society.
[ Read the technical perspective ]
[ Read the research article ]
[ Read the original paper ]
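
For readers who want the adversarial setup in symbols, the value function from the original Generative adversarial nets paper can be written as below. Here $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the distribution of real data, and $p_z$ the noise distribution that the generator samples from.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator is trained to push $V$ up by scoring real samples near 1 and generated samples near 0, while the generator is trained to push $V$ down by producing samples the discriminator scores as real. In effect, the discriminator acts as a learned loss function for the generator.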

Jitendra Malik

Jitendra Malik is the Arthur J. Chick Professor in the Department of Electrical Engineering and Computer Science at the University of California at Berkeley, with appointments in vision science, cognitive science and bioengineering. Jitendra's group has worked on computer vision, computational modeling of biological vision, computer graphics and machine learning. Several well-known concepts and pioneering algorithms arose in this work, such as anisotropic diffusion, normalized cuts, high dynamic range imaging, shape contexts and R-CNN.
[ Read his bio ]
[ Learn more about their work ]

Technical Perspective: What led computer vision to deep learning?

First published in Communications of the ACM, Vol. 60, No. 6, May 2017.
In 2012, Krizhevsky, Sutskever and Hinton published a landmark paper that shaped the trajectory of modern computer vision. Here Jitendra shares his technical perspective on the paper’s lasting impact, and how it caused the computer vision community to embrace deep learning. Later, Jitendra's group developed the state-of-the-art R-CNN architecture for object detection using a modified version of the architecture in the paper. It is an interesting case of 'strong opinions, loosely held': Jitendra, a skeptic of neural network techniques at the time, still encouraged his group to build on the recent developments in object recognition and apply them to other problems in computer vision.

ImageNet classification with deep convolutional neural networks was first published in NIPS'12: Proceedings of the 25th International Conference on Neural Information Processing Systems, Vol. 1, December 2012.
[ Read the technical perspective ]
[ Read the research article ]
[ Read the original paper ]

"This is one of the biggest contributions of computer vision to the rest of computer science — normalized cuts is a bona fide contribution to theoretical CS."
Jonathan Krause PhD, Google AI, Healthcare, in his CS231b Lecture 2: Introduction to Segmentation notes.

Normalized Cuts and Image Segmentation

First published in IEEE Transactions on Pattern Analysis and Machine Intelligence, August 2000.

Jitendra and his collaborators have done seminal work in the area of image segmentation. The Normalized Cuts algorithm deserves particular mention: the paper is highly cited, and the algorithm is still taught in courses today. We recommend the lecture notes for an overview of image segmentation techniques and the paper for an in-depth understanding of the algorithm.

[ Read the paper ]
[ Read the lecture notes ]

Noa Glaser

Noa Glaser is a Software Engineer in Google AI's Perception team. Noa has worked or conducted research at Harvard, MIT, Stanford's Computer Graphics lab, the San Diego Supercomputer Center, Facebook/Building 8/Oculus, Intel, Apropose (an a16z-backed startup), and Goldman Sachs Investment Banking. She graduated from Stanford University with a BS in Electrical Engineering and a Computer Science MS in AI, where she served as President of the Stanford Unmanned Aerial Vehicle club.

Prabhav Agrawal

Prabhav Agrawal is a Machine Learning Engineer in Facebook AI’s Speech team. He has 5+ years of experience researching and creating AI-powered products across leading companies such as Apple and Microsoft. At Apple, he led the efforts for creating Siri Voice experiences across devices including iPhone, Apple Watch and HomePod as part of the Text-to-Speech team. At Microsoft, he focused on improving search relevance and infrastructure for Bing Search, and also co-led the project for including Dictation as part of MS Office and Windows, starting from a hackathon prototype. Prabhav earned his Master's in Computer Science from the University of California San Diego and his Bachelor's in Electrical Engineering from the Indian Institute of Technology Delhi.

Juan Miguel de Joya

Juan de Joya is a Software Development Engineer for Autodesk Maya and Arnold, focused on advising and addressing priority issues in computer graphics, visualization and interactive techniques. Prior to this role, he worked at Oculus Meta, Google, DigitalFish, Pixar Animation Studios, the Walt Disney Animation Studios, and was the Project Officer responsible for digital strategy, research and assessment, and technical communications for AI for Good at the International Telecommunication Union, the United Nations agency for information and communications technologies. Juan was a researcher in computer graphics and physics at the Visual Computing Lab at the University of California, Berkeley. He serves on the ACM Practitioner's Board, Professional Development Committee, Future of Computing Academy, and is the Chair of the Practitioner Development Committee for ACM SIGGRAPH.
