Machine Learning

If you had studied machine learning in the 2000s, the most powerful methods of today, deep learning and neural networks, would have been almost absent from the curriculum. This is not only because there are more theorems to prove about Support Vector Machines and Bayesian methods; the compute infrastructure to support deep learning simply did not exist yet. If you want to create powerful deep learning models, you are constrained not by ideas about deep learning, but by two things: how fast you can train and run a model, and how fast you can iterate on it.

These two factors are not independent, and they matter for more than just SOTA models. For deep learning models that aim to be highly efficient, both the infrastructure the model runs on and the architecture of the model itself play a crucial role. An example of this is MobileNet, a CNN created to run locally on smartphones.
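
To make that concrete, here is a rough sketch of the kind of block MobileNet is built from: a depthwise convolution followed by a 1x1 pointwise convolution, which together approximate a standard convolution with far fewer multiply-adds. The layer sizes and names below are illustrative placeholders, not taken from the paper.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """One MobileNet-style block: a depthwise 3x3 conv (one filter per input
    channel, groups=in_channels) followed by a 1x1 pointwise conv that mixes
    channels. This factorization needs far fewer multiply-adds and parameters
    than a full 3x3 convolution over all channels."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.bn1(self.depthwise(x)))
        x = self.relu(self.bn2(self.pointwise(x)))
        return x

# Example: a 32-channel feature map passed through one block.
block = DepthwiseSeparableConv(32, 64)
out = block(torch.randn(1, 32, 56, 56))  # -> shape (1, 64, 56, 56)
```

The efficiency here comes entirely from the architecture: the same hardware runs the model, it just has much less arithmetic to do.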

You can speed up the training and inference of models by creating infrastructure that runs neural networks in parallel, using heterogeneous processors such as GPUs and TPUs, and distributing work across a set of machines connected over a network. You can speed up the iteration of models by increasing the speed of training through that infrastructure, and by providing an interface to leverage it. A million threads on a GPU beat out 16 threads on a CPU, clusters beat out single machines, and PyTorch, with its dynamic computation graph, beat out TensorFlow.
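
As a rough sketch of what that interface buys you (the network and sizes below are placeholders), PyTorch builds the computation graph as the forward pass executes, so ordinary Python control flow can depend on the data, and the same training step targets a CPU or a GPU just by moving the model and tensors. Scaling out further is a matter of wrapping the model in utilities such as torch.nn.parallel.DistributedDataParallel.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The graph is recorded as this Python code runs, so control flow
        # (ifs, loops) can depend on the values flowing through the model.
        h = torch.relu(self.fc1(x))
        if h.norm() > 1.0:
            h = torch.relu(self.fc1(h))
        return self.fc2(h)

model = TinyNet().to(device)                      # same code, CPU or GPU
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 128, device=device)
y = torch.randint(0, 10, (32,), device=device)

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()                                   # gradients via the dynamic graph
opt.step()
```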

But what excites me most here is that there is more work to do. Custom ASICs for serving specific architectures are being designed. Further scaling, test-time compute, and synthetic data generation will all need software to enable them. Bigger and bigger computing clusters are being built that will need to be utilized well. Furthermore, I believe that a key aspect of AGI will be program search, and that while we have the PyTorch for deep learning, we do not yet have the PyTorch for program search.
