Synthetic art is the output of artificial intelligence models trained to generate images from text prompts. "Comic synthesis" is one such use case, in which comic illustrations are produced from textual descriptions. Previous attempts at comic synthesis have used conditional Generative Adversarial Networks (cGANs), but this approach struggles to generate consistent and visually appealing comic panels; strict data requirements and quality limitations have left room for improvement. We propose a novel approach to comic synthesis using Stable Diffusion, a powerful generative modelling technique. The study investigates fine-tuning the Stable Diffusion model specifically to generate Dilbert comics from textual prompts, exploring two fine-tuning techniques: DreamBooth and LoRA. Through extensive experimentation and analysis, LoRA outperformed DreamBooth, achieving an FID score of 123 and better capturing the Dilbert style, whereas DreamBooth struggled with multiple-subject training. We also compare our results with previous cGAN-based approaches: while quality and detail improved greatly, the shift from conditioning labels to free-text descriptions made the results less accurate with respect to the prompt. The results show the potential of Stable Diffusion for generating appealing Dilbert comic panels, while highlighting the need for further work to improve the alignment between textual descriptions and the generated images.
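The FID score used above measures the Fréchet distance between the feature statistics of real and generated images. As a point of reference, a minimal sketch of the underlying formula is shown below; in practice the means and covariances come from Inception-v3 activations over the two image sets, whereas here they are illustrated with synthetic Gaussian statistics.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    In the FID metric, mu/sigma are the mean and covariance of Inception-v3
    activations computed over the real and generated image sets.
    """
    diff = mu1 - mu2
    # Matrix square root of the covariance product; take the real part to
    # discard tiny imaginary components introduced by numerical error.
    covmean = linalg.sqrtm(sigma1 @ sigma2).real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

# Identical statistics give an FID of 0; diverging means raise the score.
mu, sigma = np.zeros(4), np.eye(4)
print(frechet_inception_distance(mu, sigma, mu, sigma))
print(frechet_inception_distance(mu, sigma, np.ones(4), sigma))
```

A lower FID therefore indicates that the generated panels are statistically closer to the real Dilbert strips in feature space, which is why it serves as the comparison metric between LoRA and DreamBooth here.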