TL;DR
In today’s issue we dive into the booming AI chip ecosystem. Driven by the astounding growth of deep learning over the past decade, AI chips have become the hottest semiconductor startup sector, with multiple acquisitions such as Intel’s purchase of Nervana for more than $400 million in 2016, only two years after the company was founded. Nvidia currently holds a dominant lead in the AI chip ecosystem, buttressed by its broadly adopted CUDA software framework, but a number of startups exploring novel architectures and technologies, alongside major players such as Google and Intel, are working to challenge that lead. Nvidia’s advantage could erode if the technological bets made by these challengers pay off. Meanwhile, China is investing heavily in bootstrapping a domestic AI chip ecosystem and has dozens of startups currently working in the space. At present these startups are behind their American counterparts, but continued Chinese governmental investment could rapidly change the state of affairs.
What is an AI Chip?
Deep learning architectures have different requirements than many other types of computer programs. Creating custom chips (ASICs) specialized for deep learning workloads can yield powerful advantages; for example, Nvidia’s GPUs yield dramatic speedups for deep learning training when compared with standard CPUs. Nvidia’s chips were originally designed for graphics (hence Graphics Processing Units, or GPUs) but have increasingly been adapted for deep learning applications. AI chip design bets that there are additional gains to be made from even more specialized architectures. As we noted in our earlier post on TSMC, there are some radical shifts underway, like tightly integrating memory and logic, that could yield even more dramatic improvements.
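To make this concrete, here is a minimal back-of-the-envelope sketch in plain NumPy (the layer sizes are illustrative assumptions, not measurements) of why deep learning rewards specialized hardware: a single dense layer is dominated by one large matrix multiplication, and chips built to parallelize that operation win.

```python
import numpy as np

# Illustrative sizes for one dense layer: a batch of 64 inputs,
# 4096 input features, 4096 output features (assumed, not measured).
batch, d_in, d_out = 64, 4096, 4096

x = np.random.randn(batch, d_in).astype(np.float32)   # activations
W = np.random.randn(d_in, d_out).astype(np.float32)   # weights

# The forward pass of the layer is essentially one matrix multiply.
y = x @ W

# Rough FLOP count: each output element needs d_in multiplies and adds.
flops = 2 * batch * d_in * d_out
print(f"~{flops / 1e9:.1f} GFLOPs for a single layer's forward pass")
# A full network repeats this thousands of times per training step,
# which is why hardware built around dense matrix math pays off.
```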
AI chips are broadly split into training chips and inference chips. Training chips are destined to live in datacenters and be well supplied with power and cooling. Their challenge, though, is that machine learning architectures come in many shapes and sizes, with new classes of architectures emerging every few years. For example, transformers have only really come into their own in the last few years, and there are other commonly used architectures like convolutional networks, graph neural networks, GANs, normalizing flows, autoencoders, recurrent networks, and many more. Designing one chip that can handle all of these architectures and yet remain specialized enough to yield large performance boosts is a delicate balancing act.
Inference chips, in general, “live on the edge.” That is, they will be deployed in settings where power usage and heat dissipation must be much more tightly constrained. For example, edge chips may have to rely on a small stream of power from a battery rather than a dedicated power supply. Inference chips don’t need to be as flexible as training chips, since only a more limited set of architectures will likely be chosen for deployment at the edge, though some minimal degree of flexibility will still be required.
AI Chip Startups
Nervana, founded in 2014, was an early pioneer in the AI chip space. Nervana took the tack of building its own deep learning framework, Neon, on top of Nvidia Titan X GPUs while working on its own custom silicon. Nervana was acquired by Intel in 2016 for over $400 million, an early and influential exit in the space (source). At Intel, Nervana released its chip, the Nervana Neural Network Processor. Unfortunately, the acquisition caused Nervana’s early momentum to sputter, and Nvidia rapidly consolidated its lead in the AI chip space. Intel acquired another chip startup, Habana Labs, in 2019 and later decided to close down Nervana in favor of Habana.
Another company worth highlighting is Graphcore, which has raised $682M in funding. Graphcore has built a novel architecture called the Intelligence Processing Unit (IPU), which holds the complete machine learning model inside the chip’s memory. Cerebras is another startup attempting a broadly similar strategy. Cerebras has raised $112 million (source) and has created a novel design that uses an entire wafer as a single chip. Process defects are challenging to manage at wafer scale, since every wafer will have some defective circuits. Cerebras is reputed to have custom routing technology that lets it route around the defects on each wafer.
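As a rough illustration of the constraint these designs attack, here is a hedged arithmetic sketch of whether a model’s weights fit entirely in on-chip memory rather than external DRAM. The 900 MB SRAM figure and fp16 weights are illustrative assumptions for this sketch; actual capacities and precisions vary by chip and generation.

```python
# Back-of-the-envelope check: does a model fit entirely on-chip?
# The SRAM figure below is an illustrative assumption, not a spec-sheet value.
on_chip_sram_bytes = 900e6          # assumed ~900 MB of on-chip memory
bytes_per_param = 2                 # assume fp16 weights

def fits_on_chip(num_params: int) -> bool:
    """Return True if the model's weights alone fit in on-chip SRAM."""
    return num_params * bytes_per_param <= on_chip_sram_bytes

# Approximate, well-known parameter counts for a few common models.
for name, params in [("ResNet-50", 25e6), ("BERT-large", 340e6), ("GPT-2", 1.5e9)]:
    print(f"{name}: {'fits' if fits_on_chip(int(params)) else 'needs off-chip memory'}")
```

The takeaway is that keeping the whole model on-chip is attractive for small and mid-sized networks but becomes hard as models grow, which is part of what pushes designs toward bigger dies or full wafers.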
Optical computing is starting to find a potential niche in the AI chip space, led by startups such as LightOn and LightMatter. Optical neural architectures bring some powerful advantages, such as the ability to perform and detect linear (and some nonlinear) transformations at rates up to 100 GHz (source). Passive optical systems can even perform matrix operations essentially without consuming power. As a result, optical neural networks could be dramatically more power efficient than conventional electronic implementations, especially for inference. The gains could be considerable: up to two orders of magnitude faster inference than is possible with traditional CMOS architectures.
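At the level of linear algebra, what a photonic system computes is a fixed matrix applied to an incoming vector of light signals. The sketch below is a purely digital stand-in for that idea, with made-up sizes and ignoring all photonic encoding details; it is meant only to show which part of the computation the optics would handle “for free.”

```python
import numpy as np

# One common photonic approach effectively "hard-wires" a matrix W into the
# optics: once configured, applying W to an incoming signal is done by the
# light itself rather than by clocked digital logic.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))       # matrix baked into the optics (illustrative)

def optical_layer(signal: np.ndarray) -> np.ndarray:
    """Digital stand-in for what the photonic hardware would compute passively."""
    linear = W @ signal                    # the 'free' linear transform
    return np.maximum(linear, 0.0)         # nonlinearity handled electronically here

x = rng.standard_normal(256)               # incoming activations
y = optical_layer(x)
print(y.shape)
```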
Major AI Chip Players
The biggest player in the AI chip ecosystem is of course Nvidia itself, which has continued an impressive streak of growth, shipping new chips at a furious rate. Nvidia has also pioneered the CUDA software ecosystem. CUDA is a programming platform that allows developers to write efficient programs for Nvidia GPUs. CUDA has penetrated throughout the industry and become the de facto foundation for building deep learning systems. The presence of CUDA and developers’ familiarity with it act as a giant moat around Nvidia’s lead, making it challenging for newcomers to compete.
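For most practitioners CUDA is invisible plumbing: frameworks like PyTorch call into it under the hood. Here is a minimal sketch of what that looks like at the framework level, assuming a machine with an Nvidia GPU and a CUDA-enabled PyTorch install (the tensor sizes are arbitrary).

```python
import torch

# PyTorch dispatches this work to CUDA kernels when a GPU is present;
# the same Python code silently falls back to the CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(64, 4096, device=device)
layer = torch.nn.Linear(4096, 4096).to(device)
y = layer(x)            # on a GPU, this matrix multiply runs as a CUDA kernel

print(device, y.shape)
```

This is exactly the moat in action: the developer never writes CUDA directly, but the entire stack beneath them assumes it is there.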
The second major player is Google, with its tensor processing units (TPUs), which have seen deep integration into TensorFlow and JAX. Google has continued to iterate on TPUs, steadily improving their design, and is now on its fourth generation (source). The TPU appears to stay far enough ahead of Nvidia’s offerings for Google’s needs that it is worth the internal investment. Google has the powerful advantage of maintaining TensorFlow, one of the major frameworks for deep learning, and is also working on XLA, a compiler backend for machine learning systems. XLA is something like Google’s answer to CUDA. By establishing a software foundation that’s within its control, Google gains a powerful hold over the deep learning ecosystem.
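JAX makes the XLA layer slightly more visible: jax.jit hands a Python function to the XLA compiler, which then emits code for whichever backend is available (CPU, GPU, or TPU). A minimal sketch, with arbitrary sizes:

```python
import jax
import jax.numpy as jnp

@jax.jit                            # compile this function through XLA
def dense(x, w, b):
    return jax.nn.relu(x @ w + b)

key = jax.random.PRNGKey(0)
kx, kw = jax.random.split(key)
x = jax.random.normal(kx, (64, 1024))
w = jax.random.normal(kw, (1024, 1024))
b = jnp.zeros(1024)

y = dense(x, w, b)                  # XLA targets whatever backend is present
print(jax.devices(), y.shape)       # e.g. CPU locally, TPU cores on Cloud TPU
```

The same user code runs on CPUs, GPUs, or TPUs, which is what makes controlling the compiler layer so strategically valuable.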
The dark horse entrant in the race is Apple, whose new M1 chip appears to include a capable deep learning accelerator (source). The M1 will undoubtedly help developers build deep learning systems on MacBooks, but it’s unclear whether it can achieve the same success in data centers, given that Apple devices are not broadly used in the cloud.
Microsoft has taken a slightly different tack from the other major American players by choosing field programmable gate arrays (FPGAs), generic chips that can be customized for specific use cases after manufacturing, instead of custom AI chips for its deep learning inference deployments. Microsoft has repeatedly been willing to take nonstandard bets. (Microsoft’s bet on topological quantum computing (source), for example, is a technical long shot that other quantum computing companies aren’t pursuing.)
Intel is an interesting case study, since it has stumbled repeatedly in bringing an AI chip to market, axing both its Xeon Phi line and Nervana’s neural network processor (source). Intel’s new Habana processors have recently been deployed on AWS, but it’s not yet clear whether they will gain major traction relative to Nvidia’s offerings. Intel claims Habana offers better performance per dollar than Nvidia’s chips, but it faces a steep climb to challenge Nvidia.
All of the major cloud providers are building their own in-house silicon to vertically integrate their computing stacks and reduce their dependence on Intel and Nvidia. These bets are primarily backed by TSMC’s manufacturing capability (source), which leaves the American economy systemically vulnerable to Chinese military action against Taiwan.
The Chinese AI Chip Ecosystem
In 2018, venture investment in Chinese AI chip companies surpassed that in their US counterparts (source). One of the largest players is Cambricon, which was founded in 2016 and IPO’ed on the Shanghai stock exchange in 2020 (source). Cambricon projects that it may post a profit in 2022 and is investing heavily in R&D to try to catch up to the leader, Nvidia (source).
The Chinese AI chip ecosystem now has a formidable foundation, with dozens of promising startups in the space (source). As we have highlighted previously, the Chinese government is investing very heavily in semiconductors, which gives Chinese companies powerful state allies and lets them plan on a longer horizon than their American startup peers. The situation is unsettling, especially since the Chinese market for AI chips carries an ominous twist: many chips are used to surveil and oppress minority populations like the Uyghur people in Xinjiang (source).
Discussion
The AI chip ecosystem is still in its early days. Nvidia has a powerful lead, thanks to its impressive execution and its control of the CUDA software ecosystem. Developers are often reluctant to give up tools they are comfortable with, which makes Nvidia’s position highly defensible. That said, inference may be a space where Nvidia cannot match the performance of next-generation optical designs with fundamentally better power consumption.
It’s too early to say how the AI chip market will shake out. Building a new semiconductor company is a long shot, so many, if not all, of the new AI chip startups may eventually be absorbed into one of the giant cloud players. The Chinese AI chip ecosystem is also formidable. American companies would be well advised to keep a careful eye on their Chinese counterparts to make sure they are not outcompeted in global markets by well-funded competitors. At present, the American AI chip ecosystem is ahead of its Chinese counterpart, but with the amount of money China is pouring in, the status quo could change rapidly without careful, sustained effort from the US government and American companies.
Highlights for the Week
https://cset.georgetown.edu/research/chinese-state-council-budget-tracker/, https://cset.georgetown.edu/research/chinese-talent-program-tracker/: CSET at Georgetown maintains a superb collection of trackers of the Chinese state. The Chinese talent programs have been a powerful avenue for the CCP to surveil research being done at American institutions and are worth checking out.
https://twitter.com/tshugart3/status/1373759752296013827: A sobering assessment of the Chinese threat to American military dominance. China now has the capability to pose a real challenge to American military hegemony. Without continued investment in its defensive capabilities, the US could find itself with a weak hand against CCP actions.
Feedback and Comments
Thank you for reading our subscriber-only newsletter! We’re still figuring out the rhythm for these posts, so if you have feedback on changes you’d like to see, please send them over to bharath@deepforestsci.com! If you’d like to see more financial analysis, or more technical analysis, or deeper dives into a particular industry let me know and I’ll see what we can do.
About
Deep Into the Forest is a newsletter by Deep Forest Sciences, Inc. We’re a deep tech R&D company specializing in the use of AI for deep tech development. We do technical consulting and joint development partnerships with deep tech firms. Get in touch with us at partnerships@deepforestsci.com! We’re always open to new ideas!
Credits
Author: Bharath Ramsundar, Ph.D.
Editor: Sandya Subramanian