Lumina Image 2.0: High-Quality AI Text to Image Generator

In this article, I’ll walk you through the features and capabilities of Lumina Image 2.0, an AI image generator that delivers high-quality results despite its relatively small size. With only 2 billion parameters, it rivals larger models like Flux, which has 12 billion parameters, six times as many.

What is Lumina Image 2.0?

Lumina Image 2.0 is a compact AI image generator with 2 billion parameters, offering high-quality images up to 1024x1024 resolution. It features multi-language support, diverse artistic styles, and advanced multi-panel image generation, making it efficient, versatile, and globally accessible.

Lumina Image 2.0 Demo

Lumina Image 2.0 Model Overview:

Feature	Details
Model Name	Lumina Image 2.0
Functionality	High-quality AI text to image generation
GitHub Repository	github.com/Alpha-VLLM/Lumina-Image-2.0
Hugging Face Model	Alpha-VLLM/Lumina-Image-2.0
Hugging Face Space	Lumina Image 2.0 Space
Main Features	High-resolution image generation, Multi-language support, Artistic styles
Applications	Image generation, Artistic creation, Visual content production

What Makes Lumina Image 2.0 Special?

Compact Yet Powerful

Lumina Image 2.0 is a compact model with just 2 billion parameters, making it significantly smaller than many other AI image generators.

For comparison, Flux, another popular model, has 12 billion parameters. Despite its smaller size, Lumina Image 2.0 delivers high-quality results, supporting resolutions of up to 1024x1024 pixels. This makes it an excellent choice for users who want efficient performance without compromising on image quality.
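As a rough back-of-the-envelope check on what the size difference means in practice, the sketch below estimates the memory needed just to hold the weights at 16-bit precision (2 bytes per parameter, as with the bfloat16 type used later in this article). The helper name is made up for illustration, and the figures cover only the stated parameter counts: they ignore the text encoder, the VAE, and activation memory.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory (in GB, 10**9 bytes) needed to store the weights alone."""
    return num_params * bytes_per_param / 1e9


lumina_gb = weight_memory_gb(2e9)   # 2 billion parameters
flux_gb = weight_memory_gb(12e9)    # 12 billion parameters

print(f"Lumina Image 2.0: ~{lumina_gb:.0f} GB of weights")
print(f"Flux: ~{flux_gb:.0f} GB of weights")
```

Actual memory use during generation will be higher than these numbers, but the ratio gives a sense of why the smaller model is easier to run on consumer hardware.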

Advanced Architecture

Lumina Image 2.0 uses Gemma 2 as its text encoder and the VAE (Variational Autoencoder) from Flux. The text encoder interprets the prompt, and the VAE decodes the model’s latent output into the final image.

The integration of these components allows Lumina Image 2.0 to produce realistic scenes, intricate text within images, and a variety of artistic styles.

Finetuning Guide

1. Create a Conda Environment and Install PyTorch

conda create -n Lumina2 -y
conda activate Lumina2
conda install python=3.11 pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y

2. Install Dependencies

pip install -r requirements.txt

3. Install Flash-Attention

pip install flash-attn --no-build-isolation

4. Prepare Data

Add the links to your data files in ./configs/data.yaml.
Each training record uses this format:

{
  "image_path": "path/to/your/image",
  "prompt": "a description of the image"
}
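If your images and captions live in a list or spreadsheet, a few lines of Python can produce records in this shape. This is a minimal sketch under assumptions: it writes the records as a single JSON list, and the helper name, file name, and example paths are placeholders of mine. Check the repository for whether it expects a JSON list or one record per line.

```python
import json
from pathlib import Path


def write_annotations(pairs, out_path):
    """Write (image_path, prompt) pairs as a JSON list of training records."""
    records = [{"image_path": str(img), "prompt": text} for img, text in pairs]
    Path(out_path).write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return records


if __name__ == "__main__":
    # Placeholder paths and captions; replace with your own dataset.
    examples = [
        ("images/0001.jpg", "a red bicycle leaning against a brick wall"),
        ("images/0002.jpg", "a bowl of ramen on a wooden table"),
    ]
    write_annotations(examples, "train_annotations.json")
```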

5. Start Finetuning

bash scripts/run_1024_finetune.sh

Inference Guide

The project supports the Midpoint, Euler, and DPM solvers for inference.
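To give a feel for how two of these solvers differ, here is a toy sketch that is not taken from the Lumina codebase. Samplers of this kind step along an ordinary differential equation; in the sketch, the `velocity` function is a stand-in I made up for the network's prediction, using the simple equation dx/dt = -x so the result can be compared with the exact answer.

```python
import math


def velocity(x: float, t: float) -> float:
    """Toy stand-in for the model's predicted velocity: dx/dt = -x."""
    return -x


def euler_solve(x: float, steps: int, t0: float = 0.0, t1: float = 1.0) -> float:
    """Euler: one velocity evaluation per step, first-order accurate."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        x = x + h * velocity(x, t)
        t += h
    return x


def midpoint_solve(x: float, steps: int, t0: float = 0.0, t1: float = 1.0) -> float:
    """Midpoint: two velocity evaluations per step, second-order accurate."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        x_half = x + 0.5 * h * velocity(x, t)      # half step to the midpoint
        x = x + h * velocity(x_half, t + 0.5 * h)  # full step using the midpoint slope
        t += h
    return x


exact = math.exp(-1.0)  # closed-form solution at t = 1 for x(0) = 1
euler_error = abs(euler_solve(1.0, steps=10) - exact)
midpoint_error = abs(midpoint_solve(1.0, steps=10) - exact)
print(f"Euler error:    {euler_error:.5f}")
print(f"Midpoint error: {midpoint_error:.5f}")
```

For the same number of steps, the midpoint method lands closer to the exact value, at the cost of two evaluations per step instead of one. With a real model, each evaluation is a full forward pass, so the trade-off is between accuracy per step and time per step.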

Note:

Use .pth weight files (available on Google Drive or Hugging Face).
Pass the path to the weights file with the --ckpt argument.

Gradio Demo

python demo.py \
    --ckpt /path/to/your/ckpt \
    --res 1024 \
    --port 12123

Direct Batch Inference

bash scripts/sample.sh

Key Features of Lumina Image 2.0

1. High-Quality Realistic Images

Lumina Image 2.0 excels at generating realistic scenes. Whether you’re looking to create lifelike landscapes, portraits, or urban settings, this model delivers impressive results. For example, when prompted to generate a portrait of a woman in the city, the output is a detailed and visually appealing image that captures the essence of the prompt.

2. Multi-Language Support

One of the standout features of Lumina Image 2.0 is its ability to accept prompts in multiple languages. This makes it accessible to a global audience, allowing users from different linguistic backgrounds to create images effortlessly.

Whether you’re typing in English, Spanish, French, or another supported language, Lumina Image 2.0 can interpret your prompts accurately.

3. Artistic Styles

In addition to realistic images, Lumina Image 2.0 can generate images in various artistic styles. For instance, you can create impressionist paintings, abstract art, or even real designs. This versatility makes it a valuable tool for artists, designers, and creatives looking to explore different visual aesthetics.

4. Multi-Panel Image Generation

A unique feature of Lumina Image 2.0 is its ability to generate multiple images within a single frame. For example:

You can create a dual-panel image where the lower half displays a Canny edge map while the upper half retains the original image for direct visual comparison.
Another example is generating a two-panel depiction of a human face, where the left half shows a rough sketch and the right half transforms into a hyperrealistic portrait.

How to Use Lumina Image 2.0 on Hugging Face

Accessing the Model

Lumina Image 2.0 is available on Hugging Face Spaces, where you can use it online for free. Simply enter your prompt, and the model will generate the desired image. The platform also offers advanced settings for users who want more control over the output.

Lumina Image Demo

System Prompts

One of the strengths of Lumina Image 2.0 is the ability to use system prompts. These are overarching prompts that define the rules or role of the AI. For example:

Instead of a generic prompt, you can specify, “You are a professional photographer.” This ensures that all outputs are realistic, professional-quality photos.
Alternatively, you can set the system prompt to “You are an impressionist painter,” and the model will generate images in an impressionist style.

This feature allows users to tailor the AI’s behavior to suit their specific needs, making it highly versatile.
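Conceptually, a system prompt is just text placed ahead of your own prompt. The sketch below illustrates that idea under assumptions: the helper name is mine, and the separator string is an assumed delimiter, so the exact template used by the Space or the pipeline may differ.

```python
SEPARATOR = " <Prompt Start> "  # assumed delimiter, for illustration only


def build_full_prompt(system_prompt: str, user_prompt: str) -> str:
    """Prepend a role-defining system prompt to the user's image description."""
    return system_prompt.strip() + SEPARATOR + user_prompt.strip()


user_prompt = "A portrait of a woman in the city"
for role in ("You are a professional photographer.", "You are an impressionist painter."):
    print(build_full_prompt(role, user_prompt))
```

The same user prompt produces two different full prompts, which is why switching the system prompt alone can change the style of the output.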

Negative Prompts

Lumina Image 2.0 also supports negative prompts, which let you specify elements you want to exclude from the image. For example, if you don’t want certain objects, colors, or styles in your output, you can list them in the negative prompt section.

Adjusting Settings

Users can adjust various settings to fine-tune their results:

Width and Height: Customize the dimensions of the output image.
Advanced Settings: Familiar options for those experienced with image generation, such as adjusting sampling steps, guidance scale, and more.
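These settings map naturally onto a small dictionary of generation options. The sketch below is an illustration, not the Space's code: the helper name, the default values, and the rule of rounding dimensions down to a multiple of 16 are my assumptions (many latent-diffusion pipelines require dimensions divisible by a fixed factor, so check the value for this model). The keyword names follow common diffusers conventions, and `negative_prompt` is assumed to be accepted as a keyword.

```python
def build_generation_settings(
    prompt: str,
    negative_prompt: str = "",
    width: int = 1024,
    height: int = 1024,
    num_inference_steps: int = 50,
    guidance_scale: float = 4.0,
    multiple: int = 16,  # assumed divisibility requirement
) -> dict:
    """Validate user-facing settings and collect them into keyword arguments."""
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    if width < multiple or height < multiple:
        raise ValueError(f"width and height must be at least {multiple}")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        # Round each dimension down to the nearest allowed multiple.
        "width": (width // multiple) * multiple,
        "height": (height // multiple) * multiple,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }


settings = build_generation_settings(
    "A portrait of a woman in the city",
    negative_prompt="blurry, low quality",
    width=1000,
    height=768,
)
print(settings)
```

If the pipeline you are using accepts these keyword names, a dictionary like this could be passed along as `pipe(**settings)`.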

Using Hugging Face for Lumina Image 2.0

Follow these steps to generate images with the Lumina2Text2ImgPipeline class from Hugging Face's diffusers library:

1. Import Required Libraries

import torch
from diffusers import Lumina2Text2ImgPipeline

2. Load the Model

pipe = Lumina2Text2ImgPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Image-2.0", 
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # Optional: Saves VRAM by offloading the model to CPU.

3. Define Your Prompt

# Adjacent string literals are concatenated, so each one ends with a space
# to keep the sentences from running together.
prompt = (
    "Golden sunlight reflects on calm, rippling water, creating a shimmering trail to the horizon. "
    "A serene scene of the sun casting golden hues over gentle waves at sunset. "
    "Warm golden light dances on a vast, tranquil sea, with ripples leading to the horizon. "
    "The sun’s glow reflects across rhythmic waves, creating a peaceful golden-hour atmosphere. "
    "A photorealistic view of golden sunlight shimmering on a rippling, expansive ocean. "
    "Golden hour: the sun reflects warmly on calm water, with waves gently leading to infinity. "
    "A tranquil sea bathed in golden hues, the sun’s light shimmering across its rhythmic surface. "
    "The sun’s golden trail glimmers over textured waves, creating a serene, meditative scene."
)

4. Generate the Image

image = pipe(
    prompt,
    height=1024,  # Image height in pixels
    width=1024,   # Image width in pixels
    guidance_scale=4.0,  # Controls prompt adherence (higher = more adherence)
    num_inference_steps=50,  # Number of denoising steps (more steps = slower, often finer detail)
    cfg_trunc_ratio=0.25,  # Optional: Truncation ratio for classifier-free guidance
    cfg_normalization=True,  # Optional: Applies normalization for CFG
    generator=torch.Generator("cpu").manual_seed(0)  # Reproducible results
).images[0]

5. Save the Generated Image

image.save("lumina_demo.png")

Your generated image will be saved as lumina_demo.png in the working directory.

Practical Examples

Example 1: Portrait of a Woman in the City

Let’s start with a simple prompt: “A portrait of a woman in the city.” After entering the prompt and clicking Run, Lumina Image 2.0 generates a detailed and realistic image that matches the description.

Example 2: Impressionist Painting

Next, let’s experiment with the system prompt feature. Instead of using the default settings, we’ll set the system prompt to “You are an impressionist painter.” When we run the same prompt (“A portrait of a woman in the city”), the output is a beautiful impressionist-style painting. This demonstrates how the system prompt can dramatically alter the style and tone of the generated images.

Comparing Lumina Image 2.0 with Other Models

When compared to other models of similar size, such as SDXL, Stable Diffusion 3, DALL-E 3, OmniGen, or NVIDIA’s Sana, Lumina Image 2.0 scores higher on most benchmarks. Its ability to deliver high-quality results with fewer parameters makes it a standout choice for AI image generation.

Availability and Accessibility

Lumina Image 2.0 is completely free and open-source, making it accessible to everyone. The models are available for download on Hugging Face, and the GitHub page (github.com/Alpha-VLLM/Lumina-Image-2.0) is the place to go if you want to explore the technical details or contribute to the project.

Final Thoughts

Lumina Image 2.0 is a powerful and versatile AI image generator that punches above its weight. With its compact size, multi-language support, artistic capabilities, and unique features like multi-panel image generation, it’s a valuable tool for anyone interested in AI-generated imagery.