Your AI Primer: Recent GPT-Based Research You Should Know
Various GPT research projects that can expand your arsenal
Artificial Intelligence, or AI, is the buzzword of the year. It seems like everyone, everywhere, is talking about it. Whether in real life or on social media, you will hear the term AI being thrown around. Much of this hype can be attributed to ChatGPT, which has transformed the landscape of generative AI.
ChatGPT was developed by OpenAI, the organization that introduced the Generative Pre-trained Transformer (GPT) concept. GPT is a type of AI designed to generate human-like text from a given input. To summarize how GPT works, it follows these stages:
Pre-training: During this stage, an artificial neural network based on the transformer architecture is trained on a large amount of unlabeled text data. Unlabeled means the text carries no human-written annotations or target labels; the model learns by predicting the next token in raw text, which also means it retains no record of which specific documents or sources were in its training set.
Fine-tuning: After pre-training, the model is fine-tuned on a narrower dataset generated with the help of human reviewers who follow specific guidelines; they review and rate possible model outputs for a range of example inputs.
Nearly all recent large language models (LLMs) have adopted this procedure, especially the fine-tuning part: because pre-training a model from scratch is costly, many teams start from an existing pre-trained model and fine-tune it for their intended tasks.
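To make the fine-tuning idea concrete, here is a minimal sketch of starting from a pre-trained model using the Hugging Face Transformers library; the model name, the toy texts, and the short training loop are illustrative placeholders, not a production recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load weights someone else already pre-trained, skipping the costly stage.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token   # GPT-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Toy task-specific examples; a real run would iterate over a full dataset.
texts = ["Q: What is GPT? A: A generative text model.",
         "Q: What is fine-tuning? A: Adapting a pre-trained model."]
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):  # a few gradient steps stand in for real training epochs
    loss = model(**batch, labels=batch["input_ids"]).loss  # causal LM loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```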
This article will explore some intriguing GPT-based models that can enhance our workflow. What are they? Let's dive in.
T2M-GPT
What if you could generate human motion from text input alone? That is exactly what Zhang et al. (2023) set out to do.
The T2M-GPT research merges a generative framework based on a Vector Quantised Variational AutoEncoder (VQ-VAE) with a Generative Pre-trained Transformer (GPT): the VQ-VAE compresses motion into discrete codes, and the GPT generates those codes from text. You can see the result in the GIF below.
The resulting human motion can be applied in many use cases, such as animation, game development, and more. If you want to try the model, use the demo notebook or the HuggingFace space.
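For a rough mental model of the pipeline, the sketch below shows the two-stage idea under toy assumptions; it is not the authors' code, and the layer sizes, frame counts, and feature dimensions are illustrative only.

```python
import torch
import torch.nn as nn

# Stage 1: a toy VQ-VAE. The encoder maps motion frames to latents, and each
# latent is snapped to its nearest codebook entry, giving discrete codes.
codebook = nn.Embedding(512, 64)            # 512 learnable motion codes
encoder = nn.Linear(72, 64)                 # 72 joint features per frame (toy)
decoder = nn.Linear(64, 72)

def quantize(latents):
    dists = torch.cdist(latents, codebook.weight)   # distance to every code
    return dists.argmin(dim=-1)                     # nearest code per frame

motion = torch.randn(30, 72)                # 30 frames of fake motion data
codes = quantize(encoder(motion))           # (30,) discrete code sequence

# Stage 2: in T2M-GPT, a causal transformer is trained to generate such code
# sequences conditioned on text; the sampled codes are then fed through the
# VQ-VAE decoder to produce motion frames, as below.
reconstructed = decoder(codebook(codes))    # (30, 72) motion frames
print(reconstructed.shape)
```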
SpeechGPT
SpeechGPT is research by Zhang et al. (2023) that creates a large language model with intrinsic cross-modal conversational abilities: the model can understand both text and speech input and respond with both text and speech.
This capability suits use cases such as personal assistants, tutors, and conversational trainers, and it could improve LLM accessibility. The model demo can be seen on their project page or in the video below.
To follow future releases of the model, keep an eye on the research page.
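A key ingredient for making one model handle both modalities is representing speech as discrete unit tokens in the same vocabulary as text. The sketch below is purely illustrative: the `speech_to_units` helper, the `<sosp>`/`<eosp>` markers, and the unit ids are hypothetical placeholders, not SpeechGPT's actual format.

```python
# Illustrative only: interleaving text with discrete speech-unit tokens so a
# single LLM vocabulary covers both modalities.

def speech_to_units(waveform):
    """Stand-in for a speech tokenizer that clusters self-supervised audio
    features into discrete units; here it just returns fake unit ids."""
    return [101, 7, 42, 42, 19]

units = speech_to_units(waveform=None)
speech_span = "<sosp>" + "".join(f"<unit_{u}>" for u in units) + "<eosp>"

# One token sequence, two modalities: the model can read or emit either.
prompt = f"[Human]: {speech_span} Please transcribe and reply. [SpeechGPT]:"
print(prompt)
```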
StructGPT
StructGPT is research by Jiang et al. (2023) that improves how large language models (LLMs) handle structured data. Most LLM research focuses on free-text generation and leaves structured data aside; with StructGPT, the authors want to bring the LLM's capabilities to structured sources.
Instead of training a new model, the team developed StructGPT as a fresh approach named Iterative Reading-then-Reasoning (IRR) for tackling tasks that rely on structured data. In short, the method uses specialized interfaces to gather evidence from structured sources, letting the LLM reason over the collected data and the posed question. You can see the overall structure of StructGPT in the image below.
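As a toy illustration of the IRR idea (with a fake table and a stubbed-out `call_llm` standing in for a real LLM call), the loop below alternates between reading slices of structured data and asking the model to reason over the evidence gathered so far. None of these functions are StructGPT's actual implementation.

```python
# Iterative Reading-then-Reasoning, sketched: read, reason, repeat.
TABLE = {
    "columns": ["city", "population"],
    "rows": [["Oslo", 709000], ["Bergen", 286000]],
}

def read_columns(table):                      # reading interface 1
    return table["columns"]

def read_rows(table, column):                 # reading interface 2
    return table["rows"]

def call_llm(prompt):                         # placeholder for a real LLM call
    return "NEED rows" if "rows" not in prompt else "Answer: Oslo"

question = "Which city has the largest population?"
evidence = f"columns={read_columns(TABLE)}"
while True:                                   # iterate: read, then reason
    decision = call_llm(f"{question}\nevidence: {evidence}")
    if decision.startswith("Answer:"):
        print(decision)
        break
    evidence += f"; rows={read_rows(TABLE, 'population')}"
```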
You can visit their page to follow the research and replicate the experiment.
DreamGPT
Many people point out that LLMs aren't always accurate: they have no grounded understanding of the natural world and can make things up, producing outputs that are effectively hallucinations created out of nothing. This is a problem when you're searching for facts or reading a news story and the model gets it wrong while sounding so confident that nobody notices the mistake.
Enter DreamGPT, an open-source project by DivergentAI that uses LLM hallucinations to generate fresh ideas. Rather than solving specific tasks, DreamGPT aims to explore as many potential avenues as it can. How DreamGPT works is summarized in the image below.
DreamGPT's operation is straightforward. First, it picks randomly from a collection of seed concepts, such as individual words like 'metal', 'hat', or 'song'. It then generates ideas from these concepts, giving each a title and a summary.
Next, DreamGPT scores each derived idea on criteria such as novelty and marketability, combines the scores, and selects the most creative versions as seeds for the next round of ideas (a minimal sketch of this loop follows the score definitions below). An example result can be seen in the image below.
The result above includes several scores indicating how promising the idea is. The scores are defined as follows:
noveltyScore: How unique the concept is compared to the other ideas.
marketScore: How much market potential the idea has.
usefulnessScore: How much benefit using the idea could bring.
easeOfImplementationScore: How easy the idea would be to implement.
impactScore: How positive the idea's potential impact on the world is.
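Putting the pieces together, here is a hedged sketch of a DreamGPT-style loop; `call_llm`, the prompt, and the random scores are placeholders rather than the project's actual implementation.

```python
import random

SEED_CONCEPTS = ["metal", "hat", "song", "river", "lamp"]
CRITERIA = ["noveltyScore", "marketScore", "usefulnessScore",
            "easeOfImplementationScore", "impactScore"]

def call_llm(prompt):
    # Placeholder: a real implementation would prompt an LLM for the idea
    # and its scores instead of returning random numbers.
    return {"title": f"Idea from: {prompt[:40]}", "summary": "placeholder",
            **{c: random.randint(1, 10) for c in CRITERIA}}

def dream_round(seeds):
    concepts = random.sample(seeds, k=2)                   # 1. pick random seed concepts
    idea = call_llm(f"Combine {concepts} into an idea")    # 2. generate title + summary
    idea["total"] = sum(idea[c] for c in CRITERIA)         # 3. combine the scores
    return idea

ideas = [dream_round(SEED_CONCEPTS) for _ in range(5)]
best = max(ideas, key=lambda idea: idea["total"])          # 4. best idea seeds next round
print(best["title"], best["total"])
```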
Conclusion
GPT-based research has grown significantly in recent years, leading to the development of many new models and approaches. In this newsletter, we've discussed several studies that may pique your interest, including:
T2M-GPT
SpeechGPT
StructGPT
DreamGPT
I hope it helps.
Don’t forget to comment if you find this post helpful, or share what you want me to write next. Also, follow me on social media: LinkedIn and Twitter.