Transformers: The rise of generative AI

By 2022, almost everyone, from companies to students writing essays, could see the breakthrough potential of Large Language Models (LLMs) such as ChatGPT.

Built on a neural network architecture, transformer machine learning models have brought many AI applications from university research labs to the open market. This includes work from companies such as OpenAI, Google Research, Meta AI and others.

Image generated by Midjourney

LLMs' capabilities

In the last five years, advances in GPUs and the availability of computing power in general have allowed these models to get bigger and better. As a result, machines can now write, program and draw with believable, human-like results and at superhuman speeds.

With ChatGPT, you can interact with a pre-trained AI model and give it text-based tasks. The system models sequences of words (or, more generally, tokens) in a language and completes the content you start with your prompt. Since the training data set is huge, it can do this in many styles and draw on any type of data that was available when the model was trained.
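The idea of completing a prompt token by token can be illustrated with a deliberately tiny sketch. It is not how ChatGPT is implemented: a real LLM uses a transformer network trained on a vast corpus, whereas this toy simply counts which word follows which. The function names `train_bigram_model` and `complete` and the corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each token, which tokens follow it in the training text."""
    tokens = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following


def complete(model, prompt, max_new_tokens=5):
    """Extend the prompt one token at a time with the most frequent follower."""
    tokens = prompt.lower().split()
    if not tokens:
        return prompt  # nothing to continue from
    for _ in range(max_new_tokens):
        candidates = model.get(tokens[-1])
        if not candidates:
            break  # this token was never followed by anything in training
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)


# Tiny made-up corpus, for illustration only.
corpus = "the model reads the prompt and the model reads the data"
model = train_bigram_model(corpus)
print(complete(model, "the", max_new_tokens=3))  # prints "the model reads the"
```

The toy can only reproduce word pairs it has seen, which is also why the amount and variety of training data matters so much for real models.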

It can do the same with computer programming languages and complete your code. Since it is a probabilistic model, you may get a different answer each time you ask the same question. Language models are also the basis for text-to-image models such as DALL-E, Imagen [2] and others that became popular in early 2022. All of this is called generative AI, where the models produce new results, as opposed to analytical AI, which finds patterns in existing data.
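Why the same prompt can produce different answers can be sketched in a few lines. This is a simplified illustration, not any vendor's actual implementation: the next-token probabilities are invented, and `sample_next_token` and its `temperature` parameter are hypothetical names for a common way of controlling randomness.

```python
import random


def sample_next_token(probabilities, temperature=1.0, rng=random):
    """Draw one token at random, weighted by probability and temperature."""
    tokens = list(probabilities)
    # A low temperature sharpens the distribution, a high one flattens it.
    weights = [probabilities[t] ** (1.0 / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Invented probabilities for the token that follows "The sky is".
next_token_probs = {"blue": 0.6, "cloudy": 0.3, "falling": 0.1}

for _ in range(3):
    # Each run may print a different continuation.
    print("The sky is", sample_next_token(next_token_probs))
```

With a temperature close to zero, the most likely token wins almost every time, which makes answers more repeatable at the cost of variety.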

Applications of Large Language Models in business

What could be the potential business implications for developers and value-added IT resellers? According to Sonya Huang and Pat Grady of Sequoia Capital [3], GPT-3, ChatGPT and similar Large Language Models are just one platform. And as this platform layer matures, with “the increasingly better/faster/cheaper models and the trend towards free and open-source model access, the application layer is ripe for an explosion of creativity. ... And just as the mobile inflection point a decade ago created a market opening for a handful of killer apps, we expect killer apps to emerge for generative AI. The race is on.”

Today, the top 5 apps based on GPT-3 (the predecessor to ChatGPT; the latter is optimized for dialogue) all focus on improving customer communication with better chatbots or on helping developers with coding. The killer app has yet to be discovered and may well emerge in the next few years; LLMs are just the platform that enables this opportunity.

LLMs' shortcomings

However, Large Language Models are not without their drawbacks. Currently, they are black box systems that do not provide references or source citations, or name the original authors. Most of them can explain issues in a very convincing way, yet those explanations can be biased, ill-informed or factually incorrect. Deliberate human intervention and careful scrutiny of training data are required to avoid such cases.

Recently, an LLM-based system designed to help scientists write their articles was withdrawn by the authors after being online for only three days. Instead of helping, it mindlessly spewed out biased and incorrect nonsense. Another problem with texts that do not include references and source citations occurs in the educational field. OpenAI is working on a way to watermark AI-generated texts, and these techniques will most likely be incorporated into the next versions of plagiarism detection systems.

But apart from these isolated cases, the benefits of Large Language Model technology for IT companies are significant. Some examples are:

  • Developers can embed the new generative AI functionality into their own platform, application or service, turning what was previously a utility that solved only part of a workflow into a complete solution.
  • New AI-centric products and services based on LLMs can solve content creation, moderation, review (e.g. essay grading) and analysis problems and significantly reduce turnaround time.
  • For experienced developers, a generative AI tool can also be a great time-saver: in the early stages it can help with skeleton coding, research and feature comparison. It can also be useful for understanding an old code base, rewriting code to follow a specific style guide, or adding comments and documentation.
  • Even inexperienced developers or professionals from other fields can now use AI expert companions to help them create new web apps and reports, interpret the data they collect and get more done faster.
  • The functionality and adoption of existing apps can be improved with Reinforcement Learning from Human Feedback (RLHF).

Summary

Large Language Model technology is improving at a rapid pace. In the last five years, the number of parameters in LLMs has increased more than tenfold thanks to technological advances in GPUs and the availability of open-source models. The new GPT-4 is expected to continue this trend soon.

IT companies should start building their AI practice now. OpenAI is starting to offer ChatGPT Professional – a paid version of ChatGPT. For most developers and IT resellers, GPT-3.5, DaVinci, DALL-E and other OpenAI models are already available as a service on Azure via the ALSO Cloud Marketplace.

Check out our blog and learn more about AI use cases, sustainable IT, and much more!

References