How AI and LLMs work

Artificial intelligence uses more than one strategy to learn and draw conclusions

Reductive thinking

Discourse on artificial intelligence is slowly evolving. From blissful fascination to justified mistrust, we're arriving at a curious skepticism as the field continues to innovate at high speed. However, users' general understanding of the tool often boils down to generalizations like "AI applies statistical algorithms". It does, but that is far from all it does. To advance our understanding, we need to look further.

The aim of AI researchers is to find ways to reproduce and apply virtually all the mental operations we perform. We observe, estimate, evaluate, draw conclusions and verify them, then retry and redo the process, assigning a confidence value to each element and each experience along the way.

For example, if we learn something and observe that it works three times out of four, we also ask ourselves why it failed the fourth time, what we should change, and what experiment we could run to identify what made it succeed. From there, we can work towards improvement.
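Here is a short Python sketch showing one way to attach a confidence value to an experience. It assumes a Beta-Bernoulli model with a uniform prior, which is a common choice but not one the article specifies. `SkillConfidence` is a hypothetical helper. After three successes in four trials the raw success rate is 0.75, but the estimate stays more cautious because four trials are not much evidence.

```python
from dataclasses import dataclass


@dataclass
class SkillConfidence:
    """Hypothetical helper tracking how often one learned behaviour works."""
    successes: int = 0
    failures: int = 0
    # Assumed Beta(1, 1) prior: before any trial, every success rate is equally plausible
    prior_a: float = 1.0
    prior_b: float = 1.0

    @property
    def trials(self) -> int:
        return self.successes + self.failures

    def record(self, worked: bool) -> None:
        # Each experience updates the counts
        if worked:
            self.successes += 1
        else:
            self.failures += 1

    def raw_rate(self) -> float:
        # Plain observed frequency, with no notion of how much evidence we have
        return self.successes / self.trials

    def confidence(self) -> float:
        # Posterior mean of the success rate under the Beta-Bernoulli model
        a = self.prior_a + self.successes
        b = self.prior_b + self.failures
        return a / (a + b)


skill = SkillConfidence()
for outcome in [True, True, False, True]:  # it works three times out of four
    skill.record(outcome)

print(f"raw rate:   {skill.raw_rate():.2f}")    # raw rate:   0.75
print(f"confidence: {skill.confidence():.3f}")  # confidence: 0.667
```

With 30 successes in 40 trials, the same model gives 31/42, about 0.738, which is closer to the raw rate. The more experience accumulates, the less the prior matters.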

This kind of data processing is what has enabled robots to develop more effective ways of moving than the ones they were originally programmed with. We don't walk the same way on a hard surface as we do on sand, snow or ice: we gather information, process it in several ways at once or in several stages, and develop adapted responses.

Researchers' publications

Here are seven publications, released between 2017 and 2024, that help us understand how Large Language Models (LLMs) work. They describe the theoretical foundations and practical ideas behind these models and offer empirical evidence.

They discuss the evolution of LLMs, how they are prepared (pre-training), how they are adjusted (fine-tuning), what strategies are used, and how their performance is evaluated. They also discuss how these models adapt to different contexts and scales: processing billions of data points on a single subject is not the same as processing a few hundred on each of thousands of subjects. When the stakes are high and you're not allowed to make many mistakes, you'd better learn fast.

Even more interesting are the techniques for running several points of attention in parallel. Not all data are equally important, and some are only used at certain stages of the reasoning process. The model has to determine which ones to use, where and when, much as some recipes call for mixing the dry ingredients before adding the liquids.
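The attention mechanism from "Attention Is All You Need" weighs data by importance and attends to several things at once. The Python sketch that follows is a deliberately simplified toy version. Each "head" attends over its own slice of made-up token vectors. Real multi-head attention differs in two ways, both omitted here: it first passes the inputs through learned projection matrices for each head, and it combines the heads with an output projection. The function names are hypothetical.

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating, for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # How relevant each key is to the query, scaled by sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)  # importance of each token, summing to 1
    # The output is the weighted average of the value vectors
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights


def toy_multi_head_attention(query, keys, values, num_heads):
    """Hypothetical simplified version: each head attends over its own slice of the vectors."""
    d = len(query)
    assert d % num_heads == 0, "vector size must be divisible by the number of heads"
    size = d // num_heads
    outputs, head_weights = [], []
    for h in range(num_heads):
        part = slice(h * size, (h + 1) * size)
        out, w = attention(query[part], [k[part] for k in keys], [v[part] for v in values])
        outputs.extend(out)  # concatenate the heads' outputs
        head_weights.append(w)
    return outputs, head_weights


# Made-up vectors for three tokens; the query is the token doing the "looking"
query = [1.0, 0.0, 0.0, 1.0]
keys = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
values = [
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0, 2.0],
    [3.0, 3.0, 3.0, 3.0],
]

out, heads = toy_multi_head_attention(query, keys, values, num_heads=2)
for h, w in enumerate(heads):
    print(f"head {h}:", [round(x, 2) for x in w])
# head 0: [0.5, 0.25, 0.25]
# head 1: [0.25, 0.5, 0.25]
```

Each head ends up focusing on a different token. This is the sense in which several points of attention operate in parallel over the same data.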

"These articles cover various aspects of LLMs, including their architectures, pre-training methods, scaling properties, few-shot learning capabilities and applications in tasks such as reasoning and transfer learning. They offer a comprehensive understanding of the underlying principles and cutting-edge techniques in this rapidly evolving field."

Read on for a better understanding of what's happening with artificial intelligence:
The 7 best arXiv papers to learn how LLMs work

Articles

"A Survey of Large Language Models" - https://arxiv.org/abs/2303.18223
"Scaling Laws for Transfer" - https://arxiv.org/abs/2102.01293
"Language Models are Few-Shot Learners" - https://arxiv.org/abs/2005.14165
"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" - https://arxiv.org/abs/1910.10683
"Large Language Models can Learn Rules" - https://arxiv.org/abs/2310.07064
"Attention Is All You Need" - https://arxiv.org/abs/1706.03762
"Large Language Models: A Survey" - https://arxiv.org/abs/2402.06196
