My first experience was with an unreleased model, pre-ChatGPT, and I was seriously impressed. Although these early, small models often messed up, even generating streams of garbage text, when they worked, they worked very nicely. I completely understand why some people at the time thought they were sentient – but that’s a different discussion for another time.
People were saying that this meant the AI winter was over and a new era was beginning. For anyone who hasn’t heard the term before: the first AI winter followed a period when early AI research was producing significant results and there was a lot of hope, much as there is now, but the technology eventually stagnated. That first wave of AI was largely symbolic – meaning that attempts to model natural-language understanding and reasoning were essentially built on hard-coded rules. This worked up to a point, but it soon became clear that building a true AI this way was impractical. Human language is too messy for mechanized, rule-based parsing to handle reliably. Reasoning requires too much world knowledge to hand-code, and no one knew how to extract that knowledge without human intervention.
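To make ‘hard-coded rules’ concrete, here is a toy sketch – my own illustration, not any real system – of the kind of pattern-matching a symbolic-era language system relied on. Every new phrasing needs yet another hand-written rule, which is exactly why the approach never scaled:

```python
# Toy illustration of symbolic-era language understanding: hand-written rules.
import re

RULES = [
    # (pattern, canned interpretation)
    (re.compile(r"\bwhat time is it\b", re.I), "QUERY_TIME"),
    (re.compile(r"\bweather in (\w+)\b", re.I), "QUERY_WEATHER"),
    (re.compile(r"\bbook a (table|flight)\b", re.I), "MAKE_BOOKING"),
]

def parse(utterance: str) -> str:
    for pattern, intent in RULES:
        if pattern.search(utterance):
            return intent
    return "NO_RULE_MATCHED"   # the usual outcome for messy, real language

print(parse("Could you tell me the weather in Lisbon?"))  # QUERY_WEATHER
print(parse("Is it going to rain on my parade?"))         # NO_RULE_MATCHED
```

Note that a rule-based system at least knows when nothing matched – a point that will matter later.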
The other big problem with traditional AI was that many of its algorithms were NP-complete, which meant that sometimes you got a result, but often you didn’t, or the algorithm took an arbitrarily long time to finish. I doubt this can be proved – I certainly wouldn’t attempt it – but I strongly suspect that ‘true AI’, for any useful definition of the term, is at best NP-complete, and possibly much worse. Although quantum computing could theoretically offer some benefit here, nothing currently being built, or even considered feasible, is likely to help. There are nowhere near enough qubits to represent the kind of data that would need to be processed – this is a far harder problem than breaking encryption that relies on the difficulty of prime factorization.
Then came transformers, and real AI suddenly seemed within reach – or at least something advanced enough to be called real AI, with surprising capabilities. For beginners: a transformer is basically a big stack of linear algebra that takes a sequence of tokens and calculates a likely next token. More specifically, tokens are fed in one at a time, building up an internal state that ultimately guides the generation of the next token. This sounds strange and perhaps impossible, but the big research breakthrough was discovering that, by starting with essentially random coefficients (weights and biases) in the linear algebra and backpropagating errors during training, those weights and biases can eventually converge on something that works. Exactly why this works is still somewhat mysterious, although progress has been made.
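Here is a minimal sketch of one ‘turn of the handle’ – a single, untrained attention layer scoring every possible next token. All of the names and sizes below are my own illustrative choices; real models stack many such layers and shape the random weights through training:

```python
# One "turn of the handle": a toy single-layer transformer step in NumPy.
# The weights are random, exactly as they are at the start of training;
# training would nudge them, via backpropagated errors, toward useful values.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 100, 16                      # tiny, illustrative sizes

embed = rng.normal(size=(vocab_size, d_model))     # token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
W_out = rng.normal(size=(d_model, vocab_size))     # maps state back to token scores

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def next_token_logits(token_ids):
    x = embed[token_ids]                           # (sequence, d_model)
    q, k, v = x @ W_q, x @ W_k, x @ W_v            # self-attention projections
    attn = softmax(q @ k.T / np.sqrt(d_model))     # how much each position attends to the others
    h = attn @ v                                   # mix information across the sequence
    return h[-1] @ W_out                           # a score for every possible next token

context = [5, 42, 7]                               # some arbitrary token ids
print(int(np.argmax(next_token_logits(context)))) # the 'most plausible' next token
```

With untrained weights the output is meaningless, which is the point: everything interesting lives in where those weights end up after training.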
Transformers are not affected by the NP-completeness and scaling problems that caused the first AI winter. Technically, a single turn of the handle – going from the previous tokens to the next token and updating some internal state – always takes the same amount of time. This inner loop is not Turing-complete; a simple program with a while loop is more computationally powerful. If you allow a transformer to generate tokens indefinitely it is probably Turing-complete, although no one really does this because of the cost.
Transformers also solve the scaling problem, because their training can be unsupervised (although in practice they usually get additional supervised training to guard against dangerous behaviour). It is now standard practice to train new models on every book ever written and everything that can be scraped from the internet.
This was good news. It really was good news. But we are now past that point, and faced with the reality of widespread use of transformers.
All transformers have a fundamental limitation that cannot be eliminated by scaling to larger models, more training data, or finer fine-tuning; it is built into the way they operate. At each turn of the handle, the transformer emits a new token (a token is similar to a word, but in practice it can represent a fragment of a word or even a complete, commonly used short phrase – this is why chatbots are bad at spelling!). In practice, the transformer actually generates a number for each possible output token, and the highest-scoring one is chosen. That token is then fed back in, so the model can generate the next token in the sequence. The problem with this approach is that the model will always generate a token, regardless of whether the context bears any relation to its training data. To put it another way, the model generates whatever ‘looks most plausible’ as the next token. If that is a bad choice, and it gets fed back, the following tokens will be drawn to match the bad choice. And as the handle keeps turning, the model generates text that merely looks believable. Models are very good at this, because it is what they have been trained to do. In fact, it is all they can do. This is the root of the transformer’s hallucination problem, and it cannot be solved, because hallucinating is the only thing a transformer does.
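The loop just described is, in caricature, nothing more than this – a sketch reusing the hypothetical next_token_logits from the earlier example (real systems usually sample rather than always taking the highest score, but the point stands):

```python
# Greedy decoding: take the highest-scoring token, feed it back, repeat.
# Note that there is no branch for "I don't know" -- argmax always returns
# *something*, however poor the best available option is.
import numpy as np

def generate(context, n_tokens):
    tokens = list(context)
    for _ in range(n_tokens):
        logits = next_token_logits(tokens)     # a score for every vocabulary entry
        tokens.append(int(np.argmax(logits)))  # 'most plausible' is not 'correct'
    return tokens

print(generate([5, 42, 7], n_tokens=10))
```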
I would guess that this is another manifestation of the NP-completeness wall that symbolic AI slammed into, causing the first AI winter. You can always turn an NP-complete algorithm into one that runs faster, if you don’t mind it failing to produce any output when it hits a timeout. The transformer equivalent is producing plausible-looking but inaccurate, misleading output in cases where it cannot match a good result from its training. The difference is that with traditional AI algorithms you usually know when you have timed out, or when none of your knowledge rules matched. With transformers, producing the wrong output looks exactly the same as producing the right output, and there is no way to tell which is which.
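To put that contrast in the crudest possible terms – subset-sum here stands in for any NP-complete search, and is purely my own illustration:

```python
# A symbolic-style search can report failure explicitly; greedy decoding cannot.
import itertools
import time

def subset_sum(numbers, target, timeout_s=1.0):
    """Brute-force NP-complete search with a deadline."""
    deadline = time.monotonic() + timeout_s
    for r in range(1, len(numbers) + 1):
        for combo in itertools.combinations(numbers, r):
            if time.monotonic() > deadline:
                return None                # timed out: the failure is visible
            if sum(combo) == target:
                return combo               # a verifiably correct answer
    return None                            # no solution exists: also visible

print(subset_sum([3, 9, 8, 4, 5, 7], 15)) # either a checkable certificate or None
```

A transformer, by contrast, emits fluent tokens in both the good case and the bad case, and nothing in the output tells you which case you are in.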
In practice, this manifests itself as the transformer producing poor output some percentage of the time. Depending on the context, and on how readily good output can be distinguished from bad, the success rate might range from 60% to 95%, with the remaining 5%–40% being bad results. That is not good enough for most practical purposes. More worrying still, large transformer models produce extremely plausible bad outputs, which only genuine experts can identify as bad.
Rumor has it that about 95% of generative AI projects in the corporate world fail. This isn’t really a surprise to anyone who was around during the dot-com bubble, when corporate executives assumed that simply going online would somehow transform their business, and new ventures just needed user numbers, with the financials to be sorted out later. The same thing is happening again with generative AI, only the numbers are much larger. It is absolutely inevitable that the bubble will burst, and soon. Expect OpenAI to crash badly and its investors to be left out of pocket. Expect AI infrastructure spending to be cancelled and/or rolled back. Expect small AI startups that are not revenue-positive to disappear overnight. Expect the most hyped use cases to turn out to have been based on unrealistic expectations of LLM capabilities.
A good example is transformers used to assist programming or to generate code from scratch. This has convinced many non-programmers that they can program, but the results are consistently disastrous, because real expertise is still required to recognize the hallucinations. Plausible hallucinations in code tend to produce really nasty bugs, security holes, and the like, and these can be incredibly difficult to find and fix. My own suspicion is that it might get you closer to something that looks finished, but real engineering is still required to reach actual production code, and it is a terrible liability to maintain a codebase that no one on the team actually wrote.
Transformers should never be used for certain applications – their failure rates are unacceptable for anything that could directly or indirectly cause harm (or even significant discomfort) to a human. This means they should never be used in medicine, for assessment in school or university, for law enforcement, for tax assessment, or in a myriad of similar cases. The errors are difficult to recognize even if you are an expert, so non-expert users don’t stand a chance.
The technology will not disappear – existing models, especially in the open-source domain, will still be available and will still be used – and hopefully a few genuine ‘killer app’ use cases will persist while the rest die off. We are probably stuck with spammy AI slop, and with high-school kids using generative AI to shortcut their boring homework. We will probably keep the AI features in text editors and a few other places.
I know this is an unpopular opinion at present. However, it is based on solid science. For what it’s worth, I founded a chatbot company based on symbolic AI technology in the late 90s, which fell apart in the dot com crash. I’ve been around the block, and I’ve stayed up to date with the technology – I’ve built my own transformer from scratch, and experimented quite a bit.
My advice: de-risk yourself as much as possible ahead of the coming AI bubble crash.
Winter is coming, and it’s hard on tulips.