Creo: From One-Shot Image Generation to Progressive, Co-Creative Ideation
Text-to-image (T2I) systems have changed how people approach visual creativity. However, these systems often fit poorly with the way visual ideas naturally develop. A new preprint posted on arXiv (arXiv:2604.13956v1) introduces Creo, a multi-stage T2I system that lets users generate images through a more controlled, iterative process.
Traditional T2I systems generate outputs that make implicit visual decisions on behalf of the user. This can lead to premature anchoring on fine-grained details, limiting the user’s creativity and ability to explore various design options. Additionally, unintended changes during the editing process can be challenging to correct, diminishing the user’s sense of control over the final output.
Introducing Creo
Creo addresses these limitations by implementing a scaffolded approach to image generation. The system guides users through a process that begins with rough sketches and progresses to high-resolution outputs. This multi-stage method exposes intermediary abstractions, allowing users to make incremental changes throughout the creation process.
Some key features of Creo include:
- Sketch-like Abstractions: Users can edit rough sketches, which invite modifications and encourage the exploration of design options while ideas are still forming.
- Manual and AI-Assisted Modifications: Each stage in Creo can be altered with both manual inputs and AI-assisted operations, providing users with fine-grained control over the image development.
- Decision Locking Mechanism: This feature preserves prior decisions, allowing subsequent edits to affect only specific regions or attributes without the need to regenerate the entire image.
- User Involvement: Users remain actively engaged in the decision-making process at each stage, allowing them to make and verify choices as the image evolves.
- Reduced Drift: Rather than regenerating full images, Creo applies diffs, which limits unintended changes as image fidelity increases.
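To make the decision-locking and diff ideas above more concrete, here is a minimal sketch. It does not reflect Creo's actual implementation, which the summary here does not describe. It assumes a tiny grayscale image stored as nested lists and a boolean lock mask, and the names `apply_locked_edit` and `changed_cells` are hypothetical helpers invented for illustration.

```python
from typing import List, Tuple

# Assumption: a grayscale image is a list of rows of integer pixel values.
Image = List[List[int]]
Mask = List[List[bool]]


def apply_locked_edit(current: Image, proposed: Image, locked: Mask) -> Image:
    """Take pixels from `proposed` only where `locked` is False.

    Locked positions keep the value from `current`, so earlier decisions
    survive a later edit instead of being regenerated.
    """
    if not (len(current) == len(proposed) == len(locked)):
        raise ValueError("current, proposed, and locked must have the same height")
    result: Image = []
    for cur_row, new_row, lock_row in zip(current, proposed, locked):
        if not (len(cur_row) == len(new_row) == len(lock_row)):
            raise ValueError("rows must have the same width")
        result.append([
            cur if is_locked else new
            for cur, new, is_locked in zip(cur_row, new_row, lock_row)
        ])
    return result


def changed_cells(before: Image, after: Image) -> List[Tuple[int, int]]:
    """Return the (row, col) positions whose value differs between two images."""
    return [
        (r, c)
        for r, (row_b, row_a) in enumerate(zip(before, after))
        for c, (b, a) in enumerate(zip(row_b, row_a))
        if b != a
    ]


# Hypothetical 2x3 example: the left part of the image is locked.
current = [[10, 10, 10],
           [10, 10, 10]]
proposed = [[99, 99, 99],
            [99, 99, 99]]
locked = [[True, True, False],
          [True, False, False]]

result = apply_locked_edit(current, proposed, locked)
print(result)
print(changed_cells(current, result))
```

In this toy setup, the diff between the old and new image is confined to the unlocked cells, which is the property the decision-locking bullet describes. A real system would operate on regions or attributes of generated images rather than raw pixel grids.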
Study Findings
A comparative study of Creo against traditional one-shot T2I systems highlighted advantages for the multi-stage approach. Participants reported a stronger sense of ownership over the outputs generated with Creo, because they could clearly trace their decisions throughout the image-building process.
Furthermore, an embedding-based analysis revealed that outputs produced by Creo are less homogeneous than one-shot results. This suggests that multi-stage generation, combined with intermediate control and decision locking, enhances user agency, creativity, and output diversity in generative systems.
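The summary does not say which metric the embedding-based analysis used. One common way to quantify homogeneity, shown here purely as an assumed illustration, is the mean pairwise cosine similarity of a set of output embeddings: the higher the value, the more alike the outputs. The vectors below are made-up toy data, not results from the study, and `mean_pairwise_similarity` is a hypothetical helper name.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(embeddings: Sequence[Sequence[float]]) -> float:
    """Average cosine similarity over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Toy 2-D "embeddings" (assumed values, for illustration only).
clustered = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]   # outputs that look alike
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]     # outputs that differ more

print(mean_pairwise_similarity(clustered))
print(mean_pairwise_similarity(spread))
```

Under this assumed metric, a lower score for one system's outputs than for another's would indicate a less homogeneous, more diverse set of images.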
Conclusion
The introduction of Creo marks a significant advancement in the field of generative AI, emphasizing the importance of user control and creativity in the image generation process. As T2I technology continues to evolve, systems like Creo pave the way for more collaborative and iterative creative experiences that align more closely with human cognitive processes.
