Google has launched Whisk, its latest generative AI experiment, a tool that aims to transform the creative process by letting users generate images using other images as prompts.
Unlike traditional image generation tools that rely on detailed text descriptions, Whisk enables users to drag and drop images for the subject, scene, and style, and remix them to create unique visuals, said Google in its blog post.
According to the company, the process is powered by Google’s Gemini model, which automatically generates a detailed caption for each input image. These captions are then fed into Imagen 3, the company’s latest image generation model. Because Whisk works from captions rather than the original pixels, it captures the essence of the subject instead of producing an exact replica, enabling users to experiment with combinations in novel ways.
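The caption-then-generate flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Google's implementation: the names `WhiskInputs`, `build_prompt`, `remix`, `fake_captioner`, and `fake_generator` are hypothetical, the prompt template is an assumption, and the two stand-in functions merely mimic the roles that Gemini and Imagen 3 play in the real tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WhiskInputs:
    """Three reference images, identified here by file name (hypothetical)."""
    subject: str
    scene: str
    style: str


def build_prompt(inputs: WhiskInputs, captioner: Callable[[str], str]) -> str:
    """Caption each image, then merge the captions into one text prompt.

    The template below is an assumption; the real prompt format is not public.
    """
    subject = captioner(inputs.subject)
    scene = captioner(inputs.scene)
    style = captioner(inputs.style)
    return f"{subject}, set in {scene}, rendered in the style of {style}"


def remix(
    inputs: WhiskInputs,
    captioner: Callable[[str], str],
    generator: Callable[[str], str],
) -> str:
    """Run the full flow: images -> captions -> combined prompt -> new image."""
    return generator(build_prompt(inputs, captioner))


def fake_captioner(image_ref: str) -> str:
    """Stand-in for the captioning model: looks up a canned caption."""
    captions = {
        "cat.png": "a fluffy orange cat",
        "beach.png": "a sunny beach at low tide",
        "pin.png": "a glossy enamel pin",
    }
    return captions[image_ref]


def fake_generator(prompt: str) -> str:
    """Stand-in for the image model: returns a placeholder string, not pixels."""
    return f"<image generated from: {prompt}>"


if __name__ == "__main__":
    inputs = WhiskInputs(subject="cat.png", scene="beach.png", style="pin.png")
    print(remix(inputs, fake_captioner, fake_generator))
```

Since only the captions reach the generator in this sketch, any detail the captioner leaves out is lost, which mirrors why the real tool captures the gist of a subject rather than an exact copy.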
Google describes Whisk as a tool for rapid visual exploration, designed for users to quickly create and iterate on a wide range of visual concepts. The platform is not intended as a traditional image editor but as a space for creatives to explore ideas in a flexible, iterative manner, added the California-based company. The result is a mix of new possibilities, from digital plush toys to enamel pins and stickers.
However, Whisk’s ability to generate highly accurate images may be limited. As it extracts only a few key characteristics from the uploaded images, the final results may not always align with users’ expectations. For example, the generated subject might have subtle differences in attributes such as height, weight, or skin tone. Google acknowledges that these features can be important for users and provides the option to edit and refine the underlying prompts as needed.
The launch of Whisk follows Google’s introduction of its video generation model, Veo, earlier this year, and the subsequent release of Veo 2 and the latest iteration of Imagen 3. Both Veo and Imagen 3 have been lauded for achieving state-of-the-art results in their respective fields and are now available in various Google tools, including VideoFX, ImageFX, and Whisk.