Illustrious: The AI Model That Wants to Rule Anime Art Generation

Illustrious, a text-to-image model based on Stable Diffusion XL, has become so dominant in the AI art community that Civitai, the largest hub for AI art models, had to create a separate category just to handle its massive ecosystem of resources.

And it all happened in three months. The secret behind its success? A return to the basics with a twist.

While newer models like SD 3.5 and Flux rely on lengthy natural language descriptions, Onoma AI, the developer of Illustrious, took a different approach: it leveraged Danbooru tags to help the model understand concepts without reinventing the wheel with complex captioning systems.

The model’s training on Danbooru’s vast library of tagged anime images gives it an edge in understanding visual concepts.

Each tag in the Danbooru system represents specific elements like character features, clothing items, poses, or backgrounds, allowing for precise control over the generated images without wasting precious tokens on lengthy descriptions.

These tags have been around for years and have become a de facto standard for image categorization among art and anime enthusiasts.

The model is highly accurate and efficient when it comes to understanding the characteristics of an image.

“It’s like having an artist who understands exactly what you want without having to explain it in paragraphs,” Vishnu, a Discord member who participates in a server focused on NSFW AI content, told Decrypt. “You just need to know the right tags.”

At its core, Illustrious uses the good old SDXL architecture with a sophisticated dual-encoder system that combines CLIP ViT-L and OpenCLIP ViT-bigG to understand words and associate them with their visual equivalent.
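To make the dual-encoder idea concrete, here is a minimal sketch of how stock SDXL combines the two encoders: each one produces a feature vector per token (768 values for CLIP ViT-L, 1280 for OpenCLIP ViT-bigG), and the two are joined side by side into a 2048-value vector. The function name and the all-zero dummy data are ours, purely for illustration, and the sizes are those of stock SDXL rather than anything confirmed for Illustrious specifically.

```python
# Feature sizes of the two text encoders in the stock SDXL design.
CLIP_VIT_L_DIM = 768
OPENCLIP_VIT_BIGG_DIM = 1280
MAX_TOKENS = 77  # CLIP-style context length per prompt chunk

def concat_token_features(features_l, features_g):
    """Join per-token features from both encoders along the feature axis."""
    if len(features_l) != len(features_g):
        raise ValueError("both encoders must produce the same number of tokens")
    return [vec_l + vec_g for vec_l, vec_g in zip(features_l, features_g)]

# Dummy all-zero features stand in for real encoder outputs.
dummy_l = [[0.0] * CLIP_VIT_L_DIM for _ in range(MAX_TOKENS)]
dummy_g = [[0.0] * OPENCLIP_VIT_BIGG_DIM for _ in range(MAX_TOKENS)]
combined = concat_token_features(dummy_l, dummy_g)
print(len(combined), len(combined[0]))  # 77 2048
```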

The model can process and generate images at an impressive 1536×1536 resolution, and can stretch up to 2048×2048 and even 3744×3744 without significant quality loss.

For context, the original SDXL was built around a native resolution of 1024×1024.
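To put those numbers in perspective, a quick back-of-the-envelope calculation compares pixel counts against SDXL's 1024×1024 baseline. This is plain arithmetic on the resolutions quoted above, not a measurement of quality or speed, and the function name is ours.

```python
# Compare the pixel counts of the resolutions quoted above
# against the 1024x1024 SDXL baseline. Pure arithmetic.

BASELINE_PIXELS = 1024 * 1024

def pixel_ratio(side: int) -> float:
    """How many times more pixels a side x side image has than 1024x1024."""
    return (side * side) / BASELINE_PIXELS

for side in (1024, 1536, 2048, 3744):
    print(f"{side}x{side}: {pixel_ratio(side):.2f}x the pixels of 1024x1024")
```

A 1536×1536 image carries 2.25 times the pixels of a 1024×1024 one, and 2048×2048 carries four times as many.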

Deep dive

The journey to create Illustrious was methodical and deliberate. The initial training phase, which produced version 0.1, processed 7.5M images at 1024×1024 resolution with a batch size of 192.

The team carefully balanced learning rates, running for 20 epochs (an epoch is one full pass over the training dataset) to establish a solid foundation. Once the results were satisfactory, the team increased the size of the dataset and the resolution for the next iterations.

In the advanced training phase, Illustrious truly began to shine. Version 1.0 expanded the dataset to 10M images and bumped the resolution to 1536×1536.

Though they reduced the batch size to 128, they introduced sophisticated tag manipulation strategies and register tokens, fundamental changes that define the model’s exceptional performance.

The final refinement phase for version 2.0 took things a bit further. Working with 20M images at the same high resolution but with a larger batch size of 512, the team incorporated a multi-caption method that dramatically improved text-image correspondence.
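The dataset and batch sizes quoted for each version translate into a rough number of optimizer steps per epoch. The sketch below does that arithmetic; it assumes the quoted batch size is the effective global batch with no gradient accumulation, which the figures above do not confirm, so treat the output as an estimate.

```python
import math

# (images, batch size) per version, as quoted in this article.
PHASES = {
    "v0.1": (7_500_000, 192),
    "v1.0": (10_000_000, 128),
    "v2.0": (20_000_000, 512),
}

def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Optimizer steps needed to see every image once (last batch may be partial)."""
    return math.ceil(num_images / batch_size)

for version, (images, batch) in PHASES.items():
    print(f"{version}: about {steps_per_epoch(images, batch):,} steps per epoch")
```

Under these assumptions, the 20 epochs quoted for version 0.1 would come to roughly 780,000 steps.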

The result was the best waifu generator known to man, with good finetuning capabilities, prompt adherence, decent aesthetics, and high-quality outputs.

For the more tech-savvy, the Illustrious devs also introduced several interesting techniques: a “No Dropout Token” approach, which ensures that specific tokens are never excluded during training; Quasi-Register Tokens, which help the model handle unknown or unusual concepts; a Cosine Annealing Scheduler for the learning rate; and a Multi-Level Dropout system combined with Input Perturbation Noise Augmentation. Together, they turn a simple AI model into a powerhouse.
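Of these techniques, cosine annealing is the easiest to show in a few lines. The sketch below implements the textbook schedule, in which the learning rate glides from a maximum to a minimum along half a cosine wave. The function name and the learning-rate values are illustrative assumptions, not Onoma AI's actual settings.

```python
import math

def cosine_annealing_lr(step: int, total_steps: int,
                        lr_max: float, lr_min: float = 0.0) -> float:
    """Textbook cosine annealing: lr_max at step 0, lr_min at total_steps."""
    # Clamp so out-of-range steps do not wrap around the cosine curve.
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# Hypothetical values, chosen only to show the shape of the curve.
for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_annealing_lr(step, 1000, lr_max=1e-5, lr_min=1e-7):.2e}")
```

The rate falls slowly at first, fastest around the midpoint, and slowly again near the end, which lets training settle gently instead of stopping abruptly.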

How to use Illustrious

Illustrious doesn’t need any additional steps to run.

The installation process is the same as with any other SDXL Model. Download the checkpoint and put it in the corresponding folder, depending on which UI you use.

Windows and Linux

For ComfyUI, the route is /models/checkpoints.
For A1111/Forge, the route is /models/Stable-diffusion.
For Fooocus, the route is also /models/checkpoints.
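For anyone scripting their setup, the routes above can be captured in a small lookup. This is a sketch under stated assumptions: the helper name, the install folder, and the checkpoint filename are hypothetical, and only the subfolder names come from the routes listed above.

```python
from pathlib import Path

# Checkpoint subfolders, relative to each UI's install folder.
CHECKPOINT_SUBDIRS = {
    "comfyui": Path("models") / "checkpoints",
    "a1111": Path("models") / "Stable-diffusion",
    "forge": Path("models") / "Stable-diffusion",
    "fooocus": Path("models") / "checkpoints",
}

def checkpoint_destination(ui: str, install_root: str, filename: str) -> Path:
    """Return where a downloaded checkpoint should be placed for the given UI."""
    try:
        subdir = CHECKPOINT_SUBDIRS[ui.lower()]
    except KeyError:
        raise ValueError(f"Unknown UI: {ui!r}") from None
    return Path(install_root) / subdir / filename

# Hypothetical install folder and filename, for illustration only.
print(checkpoint_destination("ComfyUI", "ComfyUI", "illustrious.safetensors"))
```

Because pathlib picks the right separator for the operating system, the same lookup works on both Windows and Linux.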

MacOS

Mac users have similar routes. However, some popular macOS-oriented UIs require additional steps.

Draw Things users will have to click on “Models,” go to “Customize,” and then click on “Import Model.”
From there, they can enter the URL to download Illustrious directly or click “Import Custom Model” to select the file if they downloaded the model and saved it on their local drives.
Users of Diffusion Bee must click on the hamburger icon in the top right corner, then click on “Settings,” then click on “Add new model,” and select their locally downloaded Illustrious checkpoint.

Once the model is loaded, there are three things to consider.

Do not use natural language. Remember to rely on Danbooru tags and stick to the old SDXL prompting style for better results.
Do not use Pony LoRAs. Since the two models use different approaches, it is better to use Illustrious LoRAs for best results.
Try not to use the original Illustrious model; instead, pick one of the most popular finetunes. The original Illustrious model is a base model, meant as a starting point for finetunes focused on the results you want to achieve. The same is true of SDXL, Pony, or Flux: finetunes tend to yield better results.
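As an illustration of the tag-based prompting style recommended above, the sketch below assembles a prompt from groups of Danbooru-style tags rather than a sentence. The helper name and the specific tags are our own examples, not an official recipe.

```python
def build_tag_prompt(*tag_groups):
    """Flatten tag groups into one comma-separated prompt, keeping order and dropping duplicates."""
    seen = set()
    ordered = []
    for group in tag_groups:
        for tag in group:
            cleaned = tag.strip()
            if cleaned and cleaned not in seen:
                seen.add(cleaned)
                ordered.append(cleaned)
    return ", ".join(ordered)

# Illustrative tags grouped by what they describe.
subject = ["1girl", "solo"]
appearance = ["long_hair", "school_uniform", "smile"]
scene = ["outdoors", "cherry_blossoms", "solo"]  # repeated "solo" is dropped

print(build_tag_prompt(subject, appearance, scene))
# 1girl, solo, long_hair, school_uniform, smile, outdoors, cherry_blossoms
```

Grouping tags by subject, appearance, and scene keeps a prompt easy to edit, since each element can be swapped without rewriting a paragraph.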

The best Illustrious models to choose

There are many models to choose from, all focusing on different styles, aesthetics, and characteristics.

There are even general-purpose models, like the ones from Noob AI, that were built on Illustrious and now serve as a base for other fine-tuners’ models.

However, here are our top picks for different needs. These are great at prompt understanding, output quality, and ease of use. All the samples are from the Civitai community and are copyright-free.

Best for Versatility: Mistoon_Anime