The Power of Transparent AI Models

As artificial intelligence rapidly advances, increasingly complex models have become a reality. With this complexity comes a pressing need for transparency. Transparent AI models not only demystify these intricate systems and alleviate fears, they also pave the way for more robust and responsible model development.

A common approach in the field is to study simplified models, which are easier to understand. The hope is that the insights gained will carry over to their more sophisticated counterparts. This is the path followed by Anthropic’s Interpretability team, who are paving the way for transparent AI solutions.

To truly grasp the intricacies of modern large language models, it’s often advantageous to start with a radically simplified model. One such approach involves stripping away all intermediate layers from a GPT-3-like decoder-only model. That leaves only two matrices: the embedding (encoder) matrix and the decoder matrix. When trained on the task of next-token prediction, these two matrices become a low-rank approximation of the bigram statistics. Here the “rank” is determined by the embedding dimension.
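The stripped-down model described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the tiny sizes, the random (untrained) weights, and the names W_E, W_U, next_token_probs and matrix_rank are all hypothetical and not taken from any real model. It shows two things: the prediction depends only on the current token, so the whole model is a lookup table over token pairs, and the table of logits (scores before normalisation) has rank at most the embedding dimension.

```python
import math
import random

VOCAB_SIZE = 6   # toy value; real vocabularies are far larger
EMBED_DIM = 2    # toy value; this bounds the rank of the logit table

rng = random.Random(0)

def random_matrix(rows, cols):
    """Random, untrained weights; used here only for illustration."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def softmax(xs):
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

W_E = random_matrix(VOCAB_SIZE, EMBED_DIM)  # embedding: token -> vector
W_U = random_matrix(EMBED_DIM, VOCAB_SIZE)  # decoder: vector -> logits

def next_token_probs(token_id):
    """Zero-layer forward pass: embed the token, then decode it directly."""
    hidden = W_E[token_id]
    logits = [
        sum(hidden[k] * W_U[k][j] for k in range(EMBED_DIM))
        for j in range(VOCAB_SIZE)
    ]
    return softmax(logits)

# The full table of logits for every (current token, next token) pair.
logit_table = [
    [sum(W_E[i][k] * W_U[k][j] for k in range(EMBED_DIM))
     for j in range(VOCAB_SIZE)]
    for i in range(VOCAB_SIZE)
]

def matrix_rank(matrix, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
        if rank == n_rows:
            break
    return rank

print("rank of logit table:", matrix_rank(logit_table))
print("probabilities after token 0:", next_token_probs(0))
```

Note that the low-rank property holds for the logits; the probabilities obtained after the softmax are generally not low-rank themselves.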

Let us reduce the embedding size

At its core, a bigram model captures which token is most likely to follow another. However, if we were to store this model as a full table for a vocabulary of 50,000 tokens (a typical size; vocabularies of large language models range from roughly 30,000 to 200,000 tokens), the resulting matrix would contain a staggering 2.5 billion parameters (50,000 × 50,000). Moreover, many token pairs would occur only rarely in the training set, rendering the statistics for those pairs unreliable. Reducing the embedding size to 1,000 brings the parameter count down to a more manageable 100 million (two matrices of 50,000 × 1,000 each).
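The arithmetic behind these figures is easy to check. The sketch below assumes separate (untied) embedding and decoder matrices and no bias terms, which is what yields the 100 million figure; the function names are made up for this example.

```python
def full_bigram_params(vocab_size):
    """One entry per (current token, next token) pair."""
    return vocab_size * vocab_size

def factored_params(vocab_size, embed_dim):
    """Embedding matrix plus decoder matrix, assumed untied and bias-free."""
    return 2 * vocab_size * embed_dim

VOCAB_SIZE = 50_000
EMBED_DIM = 1_000

full = full_bigram_params(VOCAB_SIZE)
factored = factored_params(VOCAB_SIZE, EMBED_DIM)

print(f"full bigram table: {full:,}")      # 2,500,000,000
print(f"two matrices:      {factored:,}")  # 100,000,000
print(f"reduction factor:  {full // factored}x")  # 25x
```

With these numbers the factored form is 25 times smaller than the full table. The price is that it can only represent bigram tables of rank at most 1,000.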

Of course, once this simplified model is placed back in a real-world context, with the intermediate layers present, the elegant picture begins to break down, though efficiency gains are realized. The decoder layer is no longer tasked with reconstructing the bigram statistics, but rather an improved version of them. Consequently, the optimal embedding is no longer the low-rank representation of the bigram statistics, but instead one that provides sufficient information to the intermediate layers.

While this merely scratches the surface of Anthropic’s research efforts, it highlights the significance of model transparency. By understanding the inner workings of these systems, we can harness their power while mitigating undesirable side effects, paving the way for safe and scalable artificial intelligence that truly benefits humanity.


Want to know more? Let us talk!