Expect Deeper and Cheaper Machine Learning
Supercharged hardware will speed up deep learning in everything from tiny devices to massive data centers
By DAVID SCHNEIDER
Dec 29, 2016
Last March, Google’s computers roundly beat the world-class Go champion Lee Sedol, marking a milestone in artificial intelligence. The winning computer program, created by researchers at Google DeepMind in London, used an artificial neural network that took advantage of what’s known as deep learning, a strategy by which neural networks involving many layers of processing are configured in an automated fashion to solve the problem at hand.
Unknown to the public at the time was that Google had an ace up its sleeve. You see, the computers Google used to defeat Sedol contained special-purpose hardware—a computer card Google calls its Tensor Processing Unit.
Norm Jouppi, a hardware engineer at Google, announced the existence of the Tensor Processing Unit two months after the Go match, explaining in a blog post that Google had been outfitting its data centers with these new accelerator cards for more than a year. Google has not shared exactly what is on these boards, but it's clear that they represent an increasingly popular strategy for speeding up deep-learning calculations: using an application-specific integrated circuit, or ASIC.
Another tactic being pursued (primarily by Microsoft) is to use field-programmable gate arrays (FPGAs), which provide the benefit of being reconfigurable if the computing requirements change. The more common approach, though, has been to use graphics processing units, or GPUs, which can perform many mathematical operations in parallel. The foremost proponent of this approach is GPU maker Nvidia.
Indeed, advances in GPUs kick-started artificial neural networks back in 2009, when researchers at Stanford showed that such hardware made it possible to train deep neural networks in reasonable amounts of time.
“Everybody is doing deep learning today,” says William Dally, who leads the Concurrent VLSI Architecture group at Stanford and is also chief scientist for Nvidia. And for that, he says, perhaps not surprisingly given his position, “GPUs are close to being as good as you can get.”
Dally explains that there are three separate realms to consider. The first is what he calls “training in the data center.” He’s referring to the first step for any deep-learning system: adjusting perhaps many millions of connections between neurons so that the network can carry out its assigned task.
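That training step can be sketched in a few lines of code. What follows is a toy illustration only: gradient descent repeatedly nudges every connection weight so the network gets better at its task (here, learning the XOR function). The network, its size, and all the numbers are invented for this sketch; real systems adjust millions of weights, but the basic adjust-the-connections loop is the same idea.

```python
import numpy as np

# Toy sketch of "training": adjust connection weights until the
# network computes XOR. All details here are illustrative only.
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # targets (XOR)

W1 = rng.normal(0.0, 1.0, (2, 8))  # input-to-hidden connections
W2 = rng.normal(0.0, 1.0, (8, 1))  # hidden-to-output connections

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss():
    # mean squared error of the network's current predictions
    return float(np.mean((sigmoid(sigmoid(X @ W1) @ W2) - y) ** 2))

initial_loss = loss()
lr = 1.0
for step in range(2000):
    h = sigmoid(X @ W1)                  # forward pass
    out = sigmoid(h @ W2)
    d_out = (out - y) * out * (1 - out)  # backpropagate the error...
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out)             # ...and adjust the connections
    W1 -= lr * (X.T @ d_h)

final_loss = loss()
```

The expensive part is the matrix multiplications inside the loop, repeated millions of times over far larger matrices in a real data center, which is exactly the workload that GPUs and training ASICs are built to accelerate.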
In building hardware for that, a company called Nervana Systems, which was recently acquired by Intel, has been leading the charge. According to Scott Leishman, a computer scientist at Nervana, the Nervana Engine, an ASIC deep-learning accelerator, will go into production in early to mid-2017. Leishman notes that another computationally intensive task—bitcoin mining—went from being run on CPUs to GPUs to FPGAs and, finally, ASICs because of the gains in power efficiency from such customization. “I see the same thing happening for deep learning,” he says.
A second and quite distinct job for deep-learning hardware, explains Dally, is “inference at the data center.” The word inference here refers to the ongoing operation of cloud-based artificial neural networks that have previously been trained to carry out some job. Every day, Google’s neural networks are making an astronomical number of such inference calculations to categorize images, translate between languages, and recognize spoken words, for example. Although it’s hard to say for sure, Google’s Tensor Processing Unit is presumably tailored for performing such computations.
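Inference, by contrast, leaves the weights frozen and only runs the network forward on new inputs. A minimal sketch (the classifier and its weights below are invented for illustration; in production they would be loaded from a completed training run):

```python
import numpy as np

def softmax(z):
    # turn raw scores into a probability distribution
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical "already trained" weights for a 3-class classifier
# over 4 input features. No weight changes happen at inference time.
W = np.array([[ 2.0, -1.0,  0.5],
              [-0.5,  1.5, -1.0],
              [ 1.0,  0.0,  2.0],
              [ 0.0,  0.5, -0.5]])
b = np.array([0.1, -0.2, 0.0])

def infer(x):
    # Inference is just the forward pass: multiply, add, normalize,
    # and pick the most probable class.
    return int(np.argmax(softmax(x @ W + b)))

label = infer(np.array([1.0, 0.0, 0.0, 0.0]))  # → 0
```

Each such call is cheap on its own, but run billions of times a day across a data center, the aggregate cost is enormous, which is presumably why Google found it worthwhile to build custom inference hardware.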