
Why AI Now?

Theory

Me and Language Models


I’ve been building language models since 2004. Rather, I built them back in 2004 for grad school work and then took a break.

They’re an interesting component in the whole arsenal of “AI/Machine Learning”. Basically, they tell you how likely a given sentence is to actually occur in a language or body of text.

Why would something like this be useful? Well, if you’re building a text translation system, an easy way to do it is to translate everything word-for-word and push the results through a language model to fix it. So, from French to English, you’d translate “le chat blanc” to “the cat white”, and the language model would tell you that’s very unlikely to be the right word order/phrasing, and would generate more likely alternatives. They have other uses (like autocomplete), but this is specifically what I was doing with them in 2004.
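
To make that concrete, here’s a minimal sketch of the rescoring idea in Python. The bigram probability table is completely made up for illustration (a real language model would estimate it from a huge corpus), but it shows how scoring candidate word orders lets the model prefer “the white cat” over “the cat white”.

    from itertools import permutations

    # Toy bigram probabilities P(next word | previous word), invented for
    # illustration. A real language model estimates these from a large corpus.
    BIGRAM_PROB = {
        ("<s>", "the"): 0.4,
        ("the", "white"): 0.05,
        ("the", "cat"): 0.02,
        ("white", "cat"): 0.3,
        ("cat", "white"): 0.001,
        ("cat", "</s>"): 0.2,
        ("white", "</s>"): 0.01,
    }

    def score(sentence):
        """Multiply bigram probabilities across the sentence (with boundary markers)."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        prob = 1.0
        for prev, nxt in zip(tokens, tokens[1:]):
            prob *= BIGRAM_PROB.get((prev, nxt), 1e-6)  # tiny floor for unseen pairs
        return prob

    # Word-for-word translation of "le chat blanc"
    literal = ["the", "cat", "white"]

    # Score every reordering and keep the most likely one.
    candidates = [" ".join(p) for p in permutations(literal)]
    print(max(candidates, key=score))  # -> "the white cat"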

I don’t intend to rehash the history of machine learning here, but basically from there, more and more tech and effort went into recognizing context. “He fired the ray gun” is an unlikely sentence if ‘he’ is underwater. It seems like there would be lots of information needed to understand this, but, nope, it’s just statistics. “ray gun” and “underwater” tend to not go together, so even if the sentence is OK, the pairing isn’t. We found clever ways of modelling these relationships.
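
One of the simplest of those clever ways is just counting co-occurrences. The sketch below uses pointwise mutual information over invented counts (the numbers and word pairs are mine, not from any real corpus) to show how statistics alone can flag “ray gun” plus “underwater” as a surprising pairing.

    import math

    # Invented co-occurrence counts over a hypothetical corpus of contexts.
    # In practice these are counted over millions of documents.
    TOTAL_CONTEXTS = 1_000_000
    COUNT = {
        "ray gun": 500,
        "underwater": 8_000,
        "submarine": 6_000,
        ("ray gun", "underwater"): 1,      # almost never appear together
        ("submarine", "underwater"): 900,  # routinely appear together
    }

    def pmi(a, b):
        """Pointwise mutual information: log2( P(a, b) / (P(a) * P(b)) )."""
        p_a = COUNT[a] / TOTAL_CONTEXTS
        p_b = COUNT[b] / TOTAL_CONTEXTS
        p_ab = COUNT[(a, b)] / TOTAL_CONTEXTS
        return math.log2(p_ab / (p_a * p_b))

    print(pmi("ray gun", "underwater"))    # negative: the pairing is surprising
    print(pmi("submarine", "underwater"))  # positive: the pairing is expected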

Convergence


Anyways, language models continued to evolve, but from the inside, it was just probabilities.

Actually, all of “AI/ML” was. The whole field felt like disconnected problems. We would dream up different problems (like navigating a maze, playing chess, differentiating cats from dogs, or translating languages) and say “oh, this is hard, I bet it takes intelligence to do this!”

Upon closer inspection, though, it seemed like each of these problems had long, tedious mathematical solutions that were ‘good enough’, and that many of these solutions had little to do with each other. That is, you used completely different approaches to translate languages than you did for differentiating cats from dogs. There was some overlap in the infrastructure needed for these processes, but not much. It felt like “AI” was just a weird umbrella term for “random problems we thought required intelligence but really just needed tons of data”.

Then, I had a little chat with ChatGPT.

It had been trained in the same way as its predecessors and with the same goal: predict text that makes sense in the given context. It does this quite well. From the ‘inside’, it’s clear that most of what it’s doing is just this. This is also why it ‘hallucinates’… the only objective it’s trained on is to identify text that could exist in the given context, not necessarily something true. So, again, just a really really good language model.

Except it can do other things. Things from other parts of the AI field. Things it shouldn’t be able to do. Like, sentiment analysis.

You can give it sentences like “I love ice-cream” and ask it whether this is a “happy” or “sad” sentence, and it’ll get it right. The impressive part isn’t that it gets it right (all the information is there in its embeddings); the impressive part is that you can just ask. You ask, and it does it, without ever having been trained to do it.

More technically, it’s given a document where a person asks an AI for a task, and then it predicts the likely completion of that document (where the task is done correctly). The only way it can possibly be making those predictions is if it’s modelling underlying human concepts and manipulating/transforming those concepts.
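
As a sketch of what that looks like in practice: the “document” is just a prompt, and the answer falls out as the most likely continuation. The complete() function below is a hypothetical stand-in for whatever model you’d call, not the interface of any specific library.

    # Zero-shot sentiment classification by prompting: a sketch.
    # `complete` is a hypothetical stand-in for a text-completion model.

    def complete(prompt: str) -> str:
        """Placeholder: a real implementation would call a language model here."""
        raise NotImplementedError("plug in a language model")

    def classify_sentiment(sentence: str) -> str:
        # The document the model sees: a task description plus the input.
        # Its only job is to predict a plausible continuation, which in
        # this context happens to be the correct label.
        prompt = (
            "Decide whether the sentence below is happy or sad.\n"
            f"Sentence: {sentence}\n"
            "Answer:"
        )
        return complete(prompt).strip().lower()

    # classify_sentiment("I love ice-cream")  # expected continuation: "happy"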

The unknowns


Basically, somewhere inside its neural network of weights and biases is a series of states that emulate bits of human reasoning. Not all of it, not a lot of it, but definitely some of the more interesting bits of it.

I think the implications of this are literally unimaginable – we won’t know how it changes everything until it has.

I can’t think of anything more meaningful and more exciting than standing at the front of that wave and seeing the new world it brings, so that’s exactly what I’ve decided to focus on and to do.

What a ride it’ll be.

(image generated with Stable Diffusion)
