At last week’s Hot Chips 2024, Trevor Cai, OpenAI’s head of infrastructure, took to the keynote stage to share his predictions for the future of AI scaling and infrastructure.


Trevor Cai of OpenAI takes the keynote podium at Hot Chips 2024. 

Cai comes from a research engineering background, joining the company in 2022 and leading GPT-4 training and vision execution. He has spent much of the past year with OpenAI’s silicon and infrastructure teams, working on the company’s compute strategy. That experience has given him insight into AI infrastructure and the scaling required for continued growth in the field. 

Mass Data and Mass Processing: The Key to AI Success?

Large language models require huge amounts of data. For an open-ended LLM like GPT-4, the base data set is essentially the internet. Building a model of this size starts with pre-training: tokenizing and analyzing the input data for word relationships, sentiment, and other language-based attributes. Pre-training involves repeatedly running matrix math and other algorithms over the data, creating the need for computing acceleration on a massive scale.
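
To give a rough sense of what tokenizing means here, the snippet below is a minimal word-level sketch in Python. Production systems use subword tokenizers such as byte-pair encoding, and the corpus and vocabulary below are illustrative assumptions, not anything from OpenAI’s pipeline.

```python
# Toy word-level tokenizer: maps each unique word to an integer ID.
# Real LLM pipelines use subword schemes (e.g., byte-pair encoding),
# but the idea is the same: turn raw text into token IDs a model can process.

def build_vocab(corpus):
    """Assign an integer ID to every unique word in the corpus."""
    vocab = {}
    for word in corpus.split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def tokenize(text, vocab):
    """Convert text into a list of token IDs, skipping unknown words."""
    return [vocab[w] for w in text.split() if w in vocab]

corpus = "the model predicts the next word in the sequence"
vocab = build_vocab(corpus)
print(vocab)                             # {'the': 0, 'model': 1, 'predicts': 2, ...}
print(tokenize("the next word", vocab))  # [0, 3, 4]
```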

According to Cai, pre-training is only part of the story. Post-training is also required to tune the algorithms and iterate the models for accuracy. Post-training adds human feedback and reinforcement learning to the computing load, and it demands just as much computing infrastructure, if not more.

Getting to the Next Word

Cai explained that while artificial intelligence sounds quite complex, it can be boiled down to a series of small operations. OpenAI reduces much of the generative AI challenge to the task of predicting the “next word.” Each individual word is essentially its own AI exercise. Early experimentation found that text generation becomes more difficult with each successive generated word. This was especially evident in 2017, when AI pioneer Alec Radford ran experiments on artificially generating product reviews.


Evolution of generated words from neutral to negative in an early AI-generated review. 

In the given example, the greener the highlight color, the more positive the sentiment; the redder, the more negative. As the text generation progresses, the generated sentiment turns increasingly negative, and words are chosen accordingly. Each slightly more negative word amplifies the negative tone of the review.
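
To make the “next word” framing concrete, the sketch below generates text one word at a time from simple bigram counts. This is an illustrative toy, not OpenAI’s method: a real LLM replaces the count table with a neural network, but the token-by-token generation loop is the same idea.

```python
from collections import Counter, defaultdict

# Toy "next word" predictor built from bigram counts.
corpus = "the product works well and the product ships fast and the price is fair"
words = corpus.split()

# Count which word follows which.
next_counts = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    next_counts[prev][nxt] += 1

def generate(start, length=6):
    """Generate text one word at a time, each choice conditioned on the last word."""
    out = [start]
    for _ in range(length):
        candidates = next_counts.get(out[-1])
        if not candidates:
            break
        # Greedy choice: pick the most likely next word.
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

print(generate("the"))  # "the product works well and the product"
```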

To counter this unnatural drift toward negativity, OpenAI created new inference models that account for human psychology and deductive reasoning. Post-processing includes recursive “likelihood loss” and “log loss” computations on the results and the human feedback. “Loss” here refers to the residual prediction error, bounded by irreducible entropy, or the probability limit in the recursion at which a word-token selection is deemed most likely correct. Both functions improve the machine’s understanding of tokens and token relationships.
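
As a rough illustration of what a “log loss” computation looks like, the snippet below averages the negative log-probabilities a model assigned to the tokens that actually occurred. The probability values are made-up assumptions; the exact loss formulations OpenAI uses were not detailed in the talk.

```python
import math

# Log loss (cross-entropy) for next-token prediction: the average negative
# log-probability the model assigned to each token that actually occurred.
# Lower is better, and it cannot fall below the data's irreducible entropy.

def log_loss(predicted_probs):
    """predicted_probs: probability the model gave to each correct token."""
    return -sum(math.log(p) for p in predicted_probs) / len(predicted_probs)

# Model A is more confident in the correct tokens than Model B.
model_a = [0.70, 0.55, 0.80, 0.60]
model_b = [0.30, 0.25, 0.40, 0.20]

print(f"Model A log loss: {log_loss(model_a):.3f}")  # ~0.422
print(f"Model B log loss: {log_loss(model_b):.3f}")  # ~1.279
```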

More Than Software: Hardware Insights

While the primary objective of this development process has been to improve the accuracy and relevance of the LLM’s generative output, OpenAI also gained significant insight into hardware infrastructure needs. The company ran experiments to determine what level of computing load was required and how results would change with increasing computing capability, and it found that compute requirements were predictable based on desired accuracy and entropy factors. To put that growth in perspective, the GPT-1 model was trained on a single box with a small group of GPUs over a few weeks, while GPT-3 used a cluster of 10,000 Nvidia V100 Tensor Core GPUs.


Training compute requirements (FLOP) as seen on AI models over time. Study by Sevilla and Roldán, 2024. 
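
One common way to estimate such training-compute requirements is the rule of thumb of roughly 6 x parameters x training tokens FLOP per run, a figure from the scaling-law literature rather than from Cai’s talk. The sketch below applies that approximation to illustrative model sizes to show how quickly training compute grows:

```python
# Rough training-compute estimate using the common heuristic
# FLOPs ~ 6 * (parameter count) * (training tokens).
# The model sizes below are illustrative assumptions, not OpenAI figures.

def training_flops(params, tokens):
    return 6 * params * tokens

models = {
    "small  (0.1B params, 10B tokens)": (0.1e9, 10e9),
    "medium (1B params, 100B tokens)":  (1e9, 100e9),
    "large  (100B params, 1T tokens)":  (100e9, 1e12),
}

for name, (params, tokens) in models.items():
    print(f"{name}: ~{training_flops(params, tokens):.1e} FLOP")
# small:  ~6.0e+18 FLOP
# medium: ~6.0e+20 FLOP
# large:  ~6.0e+23 FLOP
```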

OpenAI uses this data to justify big infrastructure bets on its research program. It has also raised the team’s concern about the chip industry’s ability to meet the hardware needs of OpenAI and of the AI industry in general.

Predictive Scaling of Computing Resources

It’s not just the processors; all of the supporting components need to be procured as well. With this much computing power concentrated in server farms, AI electrical demand becomes a first-order factor on the overall power grid. It becomes even more of an issue when the load changes: synchronized training across a data center can spike power draw. Today’s grid accommodates home and industrial use patterns, but not necessarily data-center surge loading.

Power, security, networking, and reliability are not secondary to compute performance; they stand on equal footing. All need concerted expansion efforts, especially the power grid and security. 

Bull Case for AI Compute

“We believe the world needs more infrastructure—fab capacity, energy, datacenters, etc.—than people are currently planning to build,” said Sam Altman, co-founder and CEO of OpenAI. “Building massive-scale AI infrastructure and a resilient supply chain is crucial to economic competitiveness.”

OpenAI can charge less for a given level of LLM capability as models improve, and the intelligence and output of AI systems still have significant room to grow. The computing load, however, is moving in the opposite direction, and industry fab and development capacity are not keeping pace. The AI industry needs significant new infrastructure to build the chips and support the supply chain. The algorithms are there, but OpenAI is concerned that hardware will become the primary constraint if the industry as a whole does not take clear action.

Computational scaling is verifiably predictable. Increasing intelligence is driving inference demand. AI technology and economics are ready to go. According to Cai, the missing piece of the puzzle is industry infrastructure.


Image used courtesy of Hot Chips.