
Rise of Generative AI in image generation

  • Yasin Uzun, MSc, PhD
  • Nov 28, 2024
  • 2 min read

Updated: Dec 1, 2024

Artificial Intelligence (AI) has found applications in science, research, and industry for decades. Perhaps the most widely used family of AI algorithms is machine learning, with supervised learning methods being the most commonly applied in practice. Supervised learning has traditionally focused on two main tasks: classification (such as categorizing images as cats or dogs) and regression (such as predicting the price of a stock for the next hour, day, month, or year).

In the 2010s, AI scientists began experimenting with using AI and machine learning for a third task: generation. The initial question was whether AI could learn to generate new images by processing existing ones (photographs). This can be compared to studying all of Van Gogh's paintings and then creating a new picture in his style.

One of the first AI algorithms that could generate images was the “Generative Adversarial Network” (GAN). The idea behind this algorithm is to train two neural networks in competition: one generates images (the generator), while the other (the discriminator) tries to distinguish generated images from real ones, and its judgments provide the feedback signal that improves the generator. The algorithm is trained on millions of images from image databases and can then produce new ones using this generator-discriminator framework. Examples of software that implement GAN architectures include NVIDIA’s StyleGAN and Google DeepMind’s BigGAN.
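The adversarial idea can be shown on a toy problem far simpler than images. The sketch below (a minimal illustration, not StyleGAN or BigGAN) trains a one-parameter-pair generator to match a target Gaussian distribution, with a logistic discriminator providing the feedback signal; the learning rate, step count, and target distribution are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def real_samples(n):
    # "Real data": samples from the target distribution N(4, 1)
    return rng.normal(4.0, 1.0, size=n)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Generator: affine map of noise z ~ N(0,1) -> a*z + b
a, b = 1.0, 0.0
# Discriminator: logistic score D(x) = sigmoid(w*x + c)
w, c = 0.0, 0.0

lr = 0.01
for step in range(3000):
    n = 64
    x_real = real_samples(n)
    z = rng.normal(size=n)
    x_fake = a * z + b

    # Discriminator ascent on: mean log D(real) + mean log(1 - D(fake))
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    w += lr * (np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator ascent on the non-saturating objective: mean log D(fake)
    d_fake = sigmoid(w * (a * z + b) + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

# After training, generated samples a*z + b should drift toward the
# real distribution's mean (4.0) as the two networks reach equilibrium.
print(f"generated mean parameter b = {b:.2f}")
```

Real GANs replace these scalar parameters with deep convolutional networks, but the alternating update (discriminator step, then generator step) is the same loop.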

Another AI-based image generation algorithm is the “Variational Autoencoder” (VAE), which is based on statistical inference. Like GANs, VAEs have two components: an encoder and a decoder. The encoder maps each image to a compressed statistical representation (a “latent” space, in technical terminology) described by vectors of means and variances. The decoder generates new images by sampling from these latent distributions. One well-known tool that combines VAEs with transformers for image generation is OpenAI’s DALL-E.
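The key mechanics of a VAE, encoding to a mean and variance, sampling a latent vector, and decoding, can be sketched in a few functions. This is a toy forward pass with random (untrained) weights, just to show the moving parts; in a real VAE all the weight matrices are learned by maximizing the evidence lower bound (ELBO), whose regularization term is the KL divergence shown last.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    # Encoder: map an input vector to the latent mean and log-variance
    return W_mu @ x, W_logvar @ x

def sample_latent(mu, logvar):
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    # which keeps sampling differentiable with respect to mu and sigma
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, W_dec):
    # Decoder: map a latent vector back to data space
    return W_dec @ z

def kl_to_standard_normal(mu, logvar):
    # KL divergence between N(mu, diag(sigma^2)) and the prior N(0, I);
    # this term pushes the latent codes toward the prior in the ELBO
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

# Demo with random weights: 8-dimensional input, 2-dimensional latent space
x = rng.normal(size=8)
W_mu, W_logvar = rng.normal(size=(2, 8)), rng.normal(size=(2, 8)) * 0.1
mu, logvar = encode(x, W_mu, W_logvar)
z = sample_latent(mu, logvar)
reconstruction = decode(z, rng.normal(size=(8, 2)))
```

At generation time, the encoder is dropped entirely: new images come from sampling z directly from the prior and running only the decoder.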

Recently, a new family of image-generating algorithms called “diffusion models” has gained popularity. During training, random noise is gradually added to the images, and the model learns to reverse this process by removing the noise step by step. In the generation phase, no actual picture is given as input: the model starts from pure random noise and progressively denoises it into a new image. The highly popular Midjourney tool uses diffusion models as its backbone algorithm.
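The two halves of a diffusion model, the fixed forward noising process and the learned reverse denoising step, can be sketched with a standard DDPM-style formulation (an assumption here; production systems vary). The noise schedule and step count are illustrative, and the noise-prediction network that real systems train is left abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 100
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # cumulative signal retention per step

def forward_noise(x0, t, eps):
    # Closed-form jump to step t of the forward process:
    # x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps,  eps ~ N(0, I)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def reverse_step(x_t, t, predicted_eps):
    # One denoising step: subtract the (predicted) noise contribution,
    # rescale, and re-inject a little fresh noise except at the last step.
    # In a real model, predicted_eps comes from a trained neural network.
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * predicted_eps) / np.sqrt(alphas[t])
    if t > 0:
        mean = mean + np.sqrt(betas[t]) * rng.normal(size=x_t.shape)
    return mean

# Forward demo: a clean "image" (here just a vector) gets noisier with t
x0 = np.ones(4)
eps = rng.normal(size=4)
x_mid = forward_noise(x0, T // 2, eps)
x_end = forward_noise(x0, T - 1, eps)
```

Training fits a network to predict `eps` from the noisy `x_t`; sampling then starts from pure noise and applies `reverse_step` for t = T-1 down to 0.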

Indeed, generative AI found widespread application in image generation even before becoming prominent in text generation, and it continues to evolve. Beyond images, generative models can also create audio and video, even movie trailers. Researchers are exploring the potential of these models for scientific investigations in both academia and industry. We are approaching a time when these algorithms could bring transformative benefits to many fields, including biomedical research.




© 2024 by Systems Biology Consulting & Analytics LLC.
