StableYolo: Optimizing Image Generation for Large Language Models

Abstract:

AI-based image generation is bounded by system parameters and by the way users define prompts. Both prompt engineering and the tuning of model configurations are open research challenges, and both require significant manual effort to produce good-quality images. We tackle this problem by applying evolutionary computation to Stable Diffusion, tuning prompts and model parameters simultaneously. We guide the search process using Yolo. Our experiments show that our system, dubbed StableYolo, significantly improves image quality (52% on average compared to the baseline), helps identify relevant words for prompts, reduces the number of GPU inference steps per image (from 100 to 45 on average), and keeps the prompt short (≈ 7 keywords).
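The idea of evolving prompts and model parameters together can be illustrated with a toy genetic algorithm. This is only a minimal sketch under stated assumptions, not the StableYolo implementation: the keyword pool, the step range, the operators, and above all `mock_fitness` are hypothetical. In the real system the fitness would come from generating an image with Stable Diffusion and scoring it with Yolo; here a deterministic stand-in replaces both so the example runs without a GPU.

```python
import random

# Hypothetical keyword pool and parameter range (not taken from the paper).
KEYWORDS = ["photograph", "detailed", "sharp", "realistic",
            "4k", "studio", "cinematic", "bright"]
MIN_STEPS, MAX_STEPS = 1, 100


def mock_fitness(individual):
    """Stand-in for 'generate an image, then score it with a detector'.

    Assumption: quality rises with some keywords and saturates with the
    number of inference steps, while long prompts and extra steps cost.
    """
    keywords, steps = individual
    useful = {"photograph", "detailed", "sharp", "realistic"}
    keyword_score = len(useful & set(keywords)) / len(useful)
    step_score = min(steps, 50) / 50          # saturates after 50 steps
    cost = 0.002 * steps + 0.02 * len(keywords)
    return keyword_score + step_score - cost


def random_individual(rng):
    """An individual is (tuple of prompt keywords, inference steps)."""
    k = rng.randint(1, len(KEYWORDS))
    return tuple(rng.sample(KEYWORDS, k)), rng.randint(MIN_STEPS, MAX_STEPS)


def crossover(a, b, rng):
    """Mix the parents' keywords and inherit the step count from one parent."""
    pool = list(dict.fromkeys(a[0] + b[0]))   # union, order-preserving
    k = rng.randint(1, len(pool))
    return tuple(rng.sample(pool, k)), rng.choice([a[1], b[1]])


def mutate(individual, rng, rate=0.3):
    """Randomly toggle one keyword and/or nudge the step count."""
    keywords, steps = list(individual[0]), individual[1]
    if rng.random() < rate:
        word = rng.choice(KEYWORDS)
        if word in keywords and len(keywords) > 1:
            keywords.remove(word)
        elif word not in keywords:
            keywords.append(word)
    if rng.random() < rate:
        steps = min(MAX_STEPS, max(MIN_STEPS, steps + rng.randint(-10, 10)))
    return tuple(keywords), steps


def evolve(generations=30, pop_size=20, seed=0):
    """Elitist loop: keep the best half, refill with mutated offspring."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=mock_fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=mock_fitness)


if __name__ == "__main__":
    best_keywords, best_steps = evolve()
    print("prompt keywords:", ", ".join(best_keywords))
    print("inference steps:", best_steps)
```

Encoding the prompt keywords and the step count in a single individual is what lets one search trade prompt length against inference cost, which is the kind of joint tuning the abstract describes.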

Seminar by Héctor D. Menéndez at ITEFI on December 18, 2023.

Speaker:

Héctor D. Menéndez is currently a lecturer in Computer Science at King’s College London. He is a computer scientist (BSc, MSc and PhD) and a mathematician (BSc and MSc). He started working in machine learning during his PhD but, during his postdoc at University College London (UCL) under the mentorship of Dr. David Clark, he delved into the field of “Comprology”, with a primary focus on security, malware, diversity, and testing.

He is a researcher, developer and scientific disseminator. As a researcher, he collaborates with colleagues worldwide across three research domains: scaling mental health, validating AI-based diagnosis systems, and advancing machine learning testing. As a developer, he is leading the creation of the MLighter system for holistically testing machine learning. Finally, as a scientific disseminator, he leads the Endless Science initiative, which spreads scientific knowledge through video-papers on YouTube. This initiative reflects his commitment to making scientific information accessible and engaging to a broader audience.

Language

Spanish
