The brain inside a Large Language Model is a random number generator
Suppose you ask a Large Language Model (LLM) the same question several times. Why do you get different but related answers each time? For example, I asked ChatGPT the following question three times:
Question: Explain Artificial Intelligence in 20 words or less?
Answer 1: AI is technology that enables machines to learn, reason, and make decisions, mimicking human intelligence.
Answer 2: Artificial intelligence is the simulation of human intelligence processes by machines, including learning, reasoning, and problem solving.
Answer 3: AI is the creation of intelligent machines that can think, learn, and solve problems like humans, changing industries worldwide.
How can a machine give different answers to the same question? At first blush, this would suggest that the computer is thinking. The real reason, however, is astounding and explains why we are still a very long way from computers thinking for themselves.
I attended a seminar this week at which a PhD candidate in AI spilled the beans. LLMs use a weighting regime to calculate the probability of one word following another. For example, if the first word is ‘I’, the word that follows might be ‘am’ 99% of the time, and an LLM will follow ‘I’ with ‘am’ at that frequency unless the probabilities change. Suppose the second most common word to follow ‘I’ is ‘want’, occurring 1% of the time. Drawing a random number and adding it to the probability of ‘want’ can displace ‘am’ in the word string. Training the LLM on billions of passages of prose generates a reasonable stock of patterns to recognise when calculating response probabilities, and using random numbers to perturb those probabilities makes the responses differ, giving the impression that the computer has ‘thought’ of a different answer.
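To make this concrete, here is a minimal sketch of my own (not ChatGPT’s output) of that random step, written in Matlab to match the code later in this piece. It takes the simplest route, sampling the next word in proportion to its probability; the candidate words and numbers are invented for illustration.
% Sketch of next-word sampling: one random draw picks the next word.
% The candidate words and probabilities below are invented.
words = {'am', 'want', 'will'};    % hypothetical words that might follow 'I'
probs = [0.90, 0.07, 0.03];        % hypothetical next-word probabilities
u = rand();                        % one draw from the random number generator
idx = find(u <= cumsum(probs), 1); % pick the word whose probability band contains u
disp(['I ' words{idx}])            % usually 'I am', occasionally 'I want'
Run it repeatedly and the output changes from time to time, even though the probability table never does.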
To demonstrate the use of random numbers to simulate thought, I asked ChatGPT to generate some Matlab code to estimate an LLM. The first function that popped out was:
% Function to initialize weights and biases
function [W, b] = initialize_parameters(layer_sizes)
    W = {};  % Cell array for weights
    b = {};  % Cell array for biases
    for i = 2:length(layer_sizes)
        W{end+1} = rand(layer_sizes(i), layer_sizes(i-1)) * 0.01;  % Small random values
        b{end+1} = zeros(layer_sizes(i), 1);  % Zero initialization for biases
    end
end
where the line commented ‘Small random values’ draws random numbers to seed the weight matrix. This seeding is very important in leading the LLM to choose nuanced expressions in response to the same question. A random number generator is the brain behind an LLM.
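For the curious, the function might be called as follows; the layer sizes here are my own invention, chosen only to show the shapes that come out.
% Hypothetical usage: 4 inputs, one hidden layer of 8 units, 2 outputs
[W, b] = initialize_parameters([4, 8, 2]);
size(W{1})  % 8-by-4 matrix of small random weights
size(W{2})  % 2-by-8 matrix of small random weights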
Two things jump out of this exposé. First, the response of an LLM is actually deterministic: in the absence of a random number generator, the response to a given question would be the same forever. Second, and more concerning, an LLM lacks originality and will just parrot back results it has encountered during training. Unless new thought makes its way into the training set in sufficient abundance to materially impact the word-structure probabilities, the development of language will stagnate. An LLM will not generate anything new. This might be fine for some languages, such as French, where government departments are dedicated to preserving the language. But progressive and adaptive languages such as English will just morph into whatever ChatGPT’s internet crawlers have settled upon.
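The determinism point is easy to verify in Matlab: fix the seed of the random number generator with the built-in rng function, and every ‘random’ draw repeats exactly.
rng(0);  a = rand(1, 3);  % seed the generator, draw three numbers
rng(0);  b = rand(1, 3);  % re-seed and draw again
isequal(a, b)             % returns logical 1 (true): same seed, same 'answers', forever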