Tracing the evolution of artificial intelligence, from its mathematical roots to large language models, and what this means for the professions that build software.
The first week of my quest for artificial intelligence is over. I left Silicon Valley today — temporarily — and headed to Edmonton, Canada. This is where Upper Bound takes place, a mix of academic conference and party for Canada's AI scene.
What did I learn in the Valley?
First of all, the topic of AI is everywhere. It's experiencing a boom right now. Investors report that they now receive hundreds of inquiries a day from founders who want to do something with artificial intelligence. I've talked to numerous founders who all want to jump on this bandwagon. Interestingly, the general attitude here is that the technology is no longer the problem. It's more about finding the right niche. Everyone seems to have someone on their team who worked at Google, Amazon or OpenAI and now wants to start their own startup.
Midjourney and ChatGPT have made the "nerd topic" of artificial intelligence mainstream, and even experts are having a hard time keeping track.
So it's worth pausing for a moment to reflect on where we've come from and what's new about this development.
Even though computer science has only existed as a discipline in its own right for about 60 years, its foundations lie in the much older algorithms of mathematics; Blaise Pascal built his first mechanical calculator as early as 1642. Computer science turned these mathematical algorithms into computer algorithms, which solve certain problems very reliably and quickly. In this classical approach, a human laboriously works out a solution and writes it down in detail. From then on, that specific problem can be solved automatically and reliably.
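To make this concrete: Euclid's algorithm for the greatest common divisor is a textbook example of such a hand-crafted algorithm. A minimal sketch in Python:

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: every step of the rule is spelled out by a human."""
    while b != 0:
        a, b = b, a % b  # replace (a, b) with (b, remainder) until b is zero
    return a

print(gcd(252, 105))  # -> 21, reliably the same answer every time
```

Every step was prescribed in advance by a person, and the same input always produces the same output.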
Next came stochastic algorithms, which no longer follow a fixed, prescribed path but use chance to find a solution. They include advanced search and optimization algorithms. These methods can solve much harder problems, even though the random component means success is no longer guaranteed in every single case. This class of algorithms was already being called artificial intelligence; it forms the basis of chess computers and navigation systems.
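To get a feel for how such an algorithm works, here is a minimal stochastic search in Python, a simple hill climber with random restarts; the objective function is invented purely for illustration:

```python
import math
import random

def f(x: float) -> float:
    """A bumpy objective function with several local minima."""
    return 0.1 * x * x + math.sin(3 * x)

def hill_climb(steps: int = 2000) -> float:
    """Start at a random point and keep any random step that improves."""
    x = random.uniform(-10, 10)
    for _ in range(steps):
        candidate = x + random.gauss(0, 0.2)  # take a random step
        if f(candidate) < f(x):               # keep it only if it is better
            x = candidate
    return x

# Several random restarts raise the odds of finding the global minimum,
# but the random component means there is no hard guarantee of success.
best = min((hill_climb() for _ in range(10)), key=f)
print(round(best, 2), round(f(best), 2))
```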
The profession that develops and implements such algorithms is the "software developer", and the result of this work is the "program".
For the last 30 years or so, there has been notable progress in the field of machine learning (ML). What's new is that humans no longer specify in detail the rules by which a specific problem is solved. Instead, they leave it to the computer to find those rules itself: the human trains the computer, and the computer learns. There are various approaches to machine learning, but over the last eight years deep learning methods have become the most popular. They are based on so-called multi-layer neural networks.
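The contrast with classical programming becomes clear in a small sketch. Assuming the scikit-learn library is installed, we can let the computer learn a classification rule from labeled examples instead of writing the rule ourselves; the spam-filter features and labels here are invented toy data:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data: [contains the word "win", number of exclamation marks]
X_train = [[1, 3], [1, 5], [0, 0], [0, 1], [1, 0], [0, 4]]
y_train = [1, 1, 0, 0, 0, 1]  # 1 = spam, 0 = not spam, labeled by a human

# No human writes the classification rule; the computer derives it from the data.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(model.predict([[1, 4]]))  # the learned rule applied to a new example
```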
Neural networks are actually not new; the first concepts date back to the 1950s. But they have only really been successful since about 2012. Why? Because neural networks can only solve complex problems when they are very large, and then they require an enormous amount of computing power and a great deal of training data. The necessary computing power has only been available since fast, freely programmable graphics cards arrived, and the Internet has made enough training data available in digital form. Since 2015, neural networks have therefore dominated the field of artificial intelligence and enabled impressive breakthroughs in areas such as speech recognition and image processing.
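To get a feel for why size and computing power go hand in hand, here is a rough back-of-the-envelope calculation of how the parameter count of a fully connected network grows with its layer widths; the layer sizes are illustrative only:

```python
def mlp_params(layer_sizes: list[int]) -> int:
    """Weights plus biases of a fully connected (multi-layer) network."""
    return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))

print(mlp_params([784, 100, 10]))         # small, 1990s-scale net: 79,510 parameters
print(mlp_params([784, 4096, 4096, 10]))  # wider modern layers: ~20 million parameters
```

Widening the layers makes the parameter count grow roughly quadratically, and with it the demand for compute and training data.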
Since humans no longer dictate rules to computers but instead prepare data for training, a new profession has emerged, the "data scientist", along with a new work product, the "model".
The data scientist can no longer prove that a model solves a specific problem, much less explain how it does so. What they can do is present statistics showing how reliably the model solves a given class of problems.
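A minimal sketch of what such a statistic looks like, again with scikit-learn and the same invented toy data: the model is judged on held-out examples it has never seen.

```python
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Toy data as above; in reality this would be thousands of labeled examples.
X_train = [[1, 3], [1, 5], [0, 0], [0, 1], [1, 0], [0, 4]]
y_train = [1, 1, 0, 0, 0, 1]
X_test = [[1, 2], [0, 0], [1, 6], [0, 3]]  # held-out examples the model never saw
y_test = [1, 0, 1, 0]

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Correct on {accuracy:.0%} of held-out cases")  # a statistic, not a proof
```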
The latest development is Generative Artificial Intelligence (GAI) and within it the subclass of Large Language Models (LLMs). Generative artificial intelligence is creative. How it achieves this is a topic for a separate post.
This brings us to ChatGPT. ChatGPT is a large language model, and GPT stands for Generative Pre-trained Transformer. So it is generative, it generates language, and it is based on the so-called Transformer architecture. More relevant in practice, though, is the P in GPT, which stands for Pre-trained: a GPT model needs no further training to solve complex problems. It is enough to describe the problem precisely enough, and the model finds a way to solve it on its own.
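In code, "programming" a pre-trained model shrinks to sending it a problem description. A minimal sketch, assuming the openai Python package (the pre-1.0 interface) and an OPENAI_API_KEY in the environment; the task text is invented:

```python
import openai  # pip install "openai<1.0" for this interface

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        # The entire "program" is this description of the problem:
        "content": (
            "Extract all dates from the following text and return them "
            "as a JSON list in ISO format: 'The contract runs from "
            "March 1, 2023 until the end of June 2024.'"
        ),
    }],
)
print(response["choices"][0]["message"]["content"])
```

No training step, no hand-written parsing rules: the description of the task is the whole program.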
Thus we again have a new profession, the "prompt engineer", and a new work product, the "prompt".
The prompt engineer does not give the computer rules for solving a problem, as the software developer does. Nor do they feed the computer training data so that it can learn its own way of solving the problem, as the data scientist does. Instead, the prompt engineer provides a description of the problem that the large language model understands: the prompt. To do this, they must understand the underlying model and phrase the problem in a way that fits how the model was trained. In practice, prompts are usually written in English, but they could just as well be in German or French, or a mixture, because current large language models switch fluidly between languages.
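What does such a problem description look like? A common pattern is to include a few worked examples in the prompt, so-called few-shot prompting. A hypothetical sketch; the reviews and labels are invented:

```python
# Few-shot prompting: the "source code" is a careful task description
# plus a handful of worked examples.
examples = [
    ("The delivery was late and the package was damaged.", "negative"),
    ("Great service, I will order again!", "positive"),
]

def build_prompt(text: str) -> str:
    """Turn a task description and worked examples into a prompt."""
    shots = "\n".join(f"Review: {t}\nSentiment: {s}" for t, s in examples)
    return (
        "Classify the sentiment of customer reviews as 'positive' or "
        "'negative'. Answer with one word.\n\n"
        f"{shots}\nReview: {text}\nSentiment:"
    )

print(build_prompt("The manual was confusing, but support was helpful."))
```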
A senior Google executive shared an interesting observation with me: He couldn't help but laugh when he recently realized that more than 75% of his "program" was actually plain English, and the remaining program code was rather trivial. English is increasingly seen as "the" new programming language in Silicon Valley.
The one small catch is that it is virtually impossible to predict what exactly the model will do and why. The models are black boxes we cannot see into, so we do not know how they arrive at their solutions. And their creativity rests on random sampling, which makes them even more unpredictable. This creates a new challenge in practical deployment: it is not enough to test the system before it goes into operation; it must be monitored continuously.
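What such continuous control can look like in practice: a small, hypothetical guardrail that validates every answer at runtime and escalates to a human when the model misbehaves. `ask_model` here stands in for whatever LLM call is actually used:

```python
ALLOWED = {"positive", "negative"}

def classify_with_guardrail(text: str, ask_model, retries: int = 3) -> str:
    """Validate every model answer at runtime instead of trusting a one-time test."""
    for _ in range(retries):
        answer = ask_model(text).strip().lower()
        if answer in ALLOWED:     # accept only well-formed answers
            return answer
    return "needs_human_review"   # escalate rather than pass a bad output on

# Usage (hypothetical): classify_with_guardrail(review, ask_model=my_llm_call)
```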
Large language models are thus a bit like people: creative, intelligent, and capable of solving complex problems, but they also make mistakes and sometimes fantasize, so for important tasks it is best to build in oversight.
The journey of artificial intelligence is far from over, and it will be interesting to see how this technology evolves and what new roles and professions it creates in the future. I look forward to following this development further, reporting on it and discussing it with you.