Generative AI refers to artificial intelligence that can create human-like content, from pictures and videos to poetry and even computer code. To achieve this, several different techniques have emerged, mostly over the last decade, building on foundations such as neural networks, deep learning and transformer architectures.
All generative AI models rely on data to ‘learn’ how to generate new content. However, they use quite different underlying methodologies:
Large Language Models
LLMs are the core technology behind tools like ChatGPT, Claude and Google’s Gemini. They are neural networks trained on vast text datasets to “understand” the relationships between words and predict the next word (token) in a sequence. This allows them to generate fluent text, code, translations and more. LLMs can be further fine-tuned on domain-specific data.
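As a rough illustration of that next-token objective, here is a minimal sketch in PyTorch (an assumed choice of framework): a toy neural “bigram” model that, given one token, learns to predict the next one. Real LLMs use deep transformer stacks over long contexts, but the training signal, cross-entropy on the next token, is the same idea.

```python
# Toy next-token predictor: an embedding plus a linear layer over a tiny corpus.
# Illustrative only; real LLMs are transformers trained on vast datasets.
import torch
import torch.nn as nn

corpus = "the cat sat on the mat the cat ate the fish".split()
vocab = sorted(set(corpus))
stoi = {w: i for i, w in enumerate(vocab)}

# Build (current token, next token) training pairs.
xs = torch.tensor([stoi[w] for w in corpus[:-1]])
ys = torch.tensor([stoi[w] for w in corpus[1:]])

model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
opt = torch.optim.Adam(model.parameters(), lr=0.1)

for step in range(200):
    logits = model(xs)                                  # scores over the vocabulary
    loss = nn.functional.cross_entropy(logits, ys)      # next-token prediction loss
    opt.zero_grad()
    loss.backward()
    opt.step()

# Generate: repeatedly sample from the predicted next-token distribution.
tok = stoi["the"]
out = ["the"]
for _ in range(5):
    probs = torch.softmax(model(torch.tensor([tok]))[0], dim=-1)
    tok = torch.multinomial(probs, 1).item()
    out.append(vocab[tok])
print(" ".join(out))
```

Sampling repeatedly from that predicted distribution is, in miniature, how an LLM produces fluent text one token at a time.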
Diffusion Models
Used widely for image and video generation, diffusion models work through an iterative denoising process. During training, the model learns to remove noise that has been progressively added to real images. At generation time, it starts from pure random noise and gradually refines it, step by step, into a coherent image or video that matches the text prompt. Models like Stable Diffusion and DALL-E use diffusion to create photorealistic visuals.
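The denoising idea can be sketched on toy one-dimensional data. The snippet below is an illustrative setup, not Stable Diffusion or DALL-E: the noise schedule, network size and data are assumptions. A small network learns to predict the noise added at each step, then new samples are generated by running the process in reverse from pure noise.

```python
# Minimal 1-D diffusion sketch: forward noising, noise-prediction training,
# then reverse-process sampling from pure noise.
import torch
import torch.nn as nn

T = 50                                    # number of diffusion steps
betas = torch.linspace(1e-4, 0.05, T)     # noise schedule (assumed values)
alphas = 1.0 - betas
abar = torch.cumprod(alphas, dim=0)       # cumulative signal retention

# Toy data: two clusters around -2 and +2.
data = torch.cat([torch.randn(512) * 0.1 - 2, torch.randn(512) * 0.1 + 2]).unsqueeze(1)

net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    t = torch.randint(0, T, (data.shape[0], 1))
    noise = torch.randn_like(data)
    # Forward process: produce the noisy version of each sample at step t.
    x_t = abar[t].sqrt() * data + (1 - abar[t]).sqrt() * noise
    pred = net(torch.cat([x_t, t / T], dim=1))          # predict the added noise
    loss = ((pred - noise) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Reverse process: start from pure noise and iteratively denoise.
x = torch.randn(1000, 1)
for t in reversed(range(T)):
    t_col = torch.full((x.shape[0], 1), t / T)
    eps = net(torch.cat([x, t_col], dim=1))
    x = (x - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
    if t > 0:
        x = x + betas[t].sqrt() * torch.randn_like(x)   # add sampling noise except at the last step

print(x.mean().item(), x.std().item())   # generated samples should spread toward the two data modes
```

In a real image model the same loop runs over millions of pixels (or a learned latent space) and the network is conditioned on the text prompt, but the refine-from-noise mechanic is the same.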
Generative Adversarial Networks
GANs pit two neural networks against each other – a generator creating synthetic data and a discriminator trying to classify it as real or fake. As they improve through this adversarial process, the generator learns to produce outputs indistinguishable from real data across modalities like images, text and audio.
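A minimal sketch of that adversarial setup, on toy one-dimensional data with assumed network sizes and hyperparameters, looks like this:

```python
# Toy GAN: a generator maps random noise to samples; a discriminator scores
# samples as real or fake; each network is trained against the other.
import torch
import torch.nn as nn

real_data = torch.randn(256, 1) * 0.5 + 3.0      # "real" distribution: mean 3.0, std 0.5

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator (outputs logits)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(3000):
    # Train the discriminator: label real samples 1, generated samples 0.
    z = torch.randn(256, 8)
    fake = G(z).detach()
    d_loss = bce(D(real_data), torch.ones(256, 1)) + bce(D(fake), torch.zeros(256, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train the generator: try to fool the discriminator into predicting "real".
    z = torch.randn(256, 8)
    g_loss = bce(D(G(z)), torch.ones(256, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

samples = G(torch.randn(1000, 8))
print(samples.mean().item(), samples.std().item())   # should approach roughly 3.0 and 0.5
```

The same two-player dynamic scales up to images or audio by swapping the tiny networks for convolutional or transformer-based ones.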
Neural Radiance Fields
NeRFs use deep learning to generate 3D representations of scenes and environments from 2D data like images. They model a scene’s geometry, volumetric density and view-dependent colour, allowing the full spatial structure of scenes and objects to be rendered and explored from any angle.
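The core of a NeRF can be sketched as a small network that maps a 3D point and viewing direction to a colour and a density, plus a volume-rendering step that accumulates those values along a camera ray into a single pixel. The snippet below shows that structure with an untrained network and assumed shapes; an actual NeRF fits the network so that rendered pixels match a set of posed 2D photographs.

```python
# Structural sketch of a NeRF-style field plus volume rendering (untrained).
import torch
import torch.nn as nn

# Field: (x, y, z, view direction) -> (R, G, B, density).
field = nn.Sequential(nn.Linear(6, 128), nn.ReLU(),
                      nn.Linear(128, 128), nn.ReLU(),
                      nn.Linear(128, 4))

def render_ray(origin, direction, n_samples=64, near=0.1, far=4.0):
    """Volume-render one camera ray by sampling points along it."""
    ts = torch.linspace(near, far, n_samples)            # depths along the ray
    points = origin + ts.unsqueeze(1) * direction        # (n_samples, 3) query points
    dirs = direction.expand(n_samples, 3)
    out = field(torch.cat([points, dirs], dim=1))
    color = torch.sigmoid(out[:, :3])                    # per-point RGB
    density = torch.relu(out[:, 3])                      # per-point density
    delta = ts[1] - ts[0]
    alpha = 1.0 - torch.exp(-density * delta)            # opacity of each segment
    # Transmittance: how much light survives to reach each sample point.
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = trans * alpha
    return (weights.unsqueeze(1) * color).sum(dim=0)     # final pixel colour

pixel = render_ray(torch.tensor([0.0, 0.0, 0.0]), torch.tensor([0.0, 0.0, 1.0]))
print(pixel)   # arbitrary colour here, since the field is untrained
```

Because the rendering step is differentiable, the network can be trained end to end from photographs alone, which is what lets a NeRF reconstruct a scene it has only seen from a limited set of viewpoints.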
Hybrid Models
The latest generative AI advances blend multiple techniques into hybrid models that integrate their strengths: for example, combining the adversarial training of GANs with the denoising process of diffusion models, or fusing large language models with other neural networks for enhanced multi-modal generation.
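As a hedged sketch of one such pattern, the snippet below fuses a toy text encoder with a noise-conditioned generator so that a prompt embedding steers what gets generated. The modules, vocabulary and shapes are illustrative assumptions, not any specific published architecture.

```python
# Toy multi-modal fusion: a text encoder produces a prompt embedding,
# which conditions a generator alongside a random latent vector.
import torch
import torch.nn as nn

text_encoder = nn.Sequential(nn.Embedding(1000, 64), nn.Flatten(), nn.Linear(8 * 64, 128))
generator = nn.Sequential(nn.Linear(128 + 32, 256), nn.ReLU(), nn.Linear(256, 3 * 16 * 16))

tokens = torch.randint(0, 1000, (1, 8))          # a tokenized prompt (toy vocabulary)
text_emb = text_encoder(tokens)                  # (1, 128) prompt representation
noise = torch.randn(1, 32)                       # random latent, as in GANs or diffusion
image = generator(torch.cat([text_emb, noise], dim=1)).view(1, 3, 16, 16)
print(image.shape)                               # torch.Size([1, 3, 16, 16])
```

Production systems condition far larger image or video generators on embeddings from full language models, but the principle of feeding one model’s representation into another is the same.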
The field continues to rapidly evolve, pushing the boundaries of what AI can create across an expanding range of creative domains and applications.