“Prompt less, play more” is the slogan for Google’s new service, Whisk, fresh out of their Experiments lab. It brings features that blend the strengths of DALL-E and Adobe’s Firefly engines, along with new questions about copyright infringement.
Whisk UI and UX
From the start, Whisk has a simple drag-and-drop reference image interface where you can start out by turning something into a simple plushie. So, of course, I grabbed an image of Capt. Pikachu from the Rising Volt Tacklers to see if it could turn Cap into a believable stuffie my 8-year-old son would love.
And, of course, it did, complete with a tag and sitting in a white photo booth, ready to be posted for sale on a site somewhere. It was clearly trained on the style we see in product photography, which makes sense given Google’s commerce data.
However, users can take this and see the prompt it generated to create this image. That prompt can be edited, in a NightCafe-esque style, to further the user’s creativity. Again, this prompt is 100% AI created based on your reference images. Thus, “prompt less, and play more.” The AI, essentially, does your prompt engineering for you.
Gotta Catch ’em All
As many know, I have a routine set of tests to gauge any AI I work with. Pokémon is one of them. As a multi-generational cultural phenomenon, its little yellow electric mouse is just as globally recognizable as Disney’s Mickey Mouse, if not more so. However, unlike Steamboat Willie, Pikachu is anything but public domain. But, much like I wrote about with MKBHD’s observation of a plant in an AI video about a tech reviewer, it is not hard to say that Pokémon’s art style and cute tiny monsters are a style unto themselves, one that has defined a specific kind of anime and video game look. To check for this with Whisk, I started down a series of prompts, not giving it any Pokémon source material but rather using its prompt interface to check for that style, with a screenshot of a photo of me on a mountain as its subject (and to see what its computer vision can parse). My endgame in this experiment is to see how it interprets my photo and brings it into the world of the Pokémon style. After that, I’ll take it abstract and see what it will generate with just my photo and those prompts as a family photo in the world of Pokémon with a few of my favorite characters.
With Whisk, you can use a prompt to generate your source material.
So with me as the Subject,
“Pokémon video game” as the Scene,
and “Pokémon Anime” as the Style,
I Whisk’d it together to see how it draws inspiration, and asked it to “make me into a Pokémon Trainer.”
From my beard to my trekking poles half cut off in my source photo, it placed me within a square that the AI created from the Scene it generated.
The tonality of the reds and blues was pulled from the Style that it, again, generated. And we begin to see little monsters that could be a part of the Pokémon universe, but nothing directly from it: no Charizard, no Pikachu (even though it made something yellow and kinda close), and no real copyright infringement here…yet. But, of course, you know I’m going to push it further to see what it “knows” in its training data. I kept this prompt and had it generate a few more, and quickly you see it pull more and more IP from Pokémon, with Pikachu and Charizard making an appearance after I asked it to put them in as “my friends”.
A Pokémon Family Photo
I decided to regenerate my Scene prompt a few times to see if it would give me something with more people and Pokémon in it before making a family photo. After a few refreshes, Whisk pulled in the full Pokémon brand, and even a pair of Pikachus with a few folks standing by a home, seen through what looks like a tunnel. With that, my Scene was set for what I hoped would be my Pokémon Family Portrait.
Again, keeping the same photo of me, the new Scene photo, and the same Style photo, I fleshed out my family photo prompt and let it rip.
And it delivered.
A Whisk of Innovation
Early on with AI, there was talk of people becoming “prompt engineers.” And really, this was always going to be a short-lived concept, in my opinion, as it was inevitable that AI would be helping optimize our prompts to get us what we wanted to see it do anyway. It is no different than the early days of the internet, when we all had to learn what made a good search result; now, if you ask a college student what a Boolean operator is, they will more than likely look at you with a puzzled face. Whisk brings the notion that a reference to what we already have is way better than us trying to whip up the words to describe it. Yes, a picture is worth an actual 1,000 words. Combine that with the ability to prompt for reference images we don’t have, and we have the perfect imagination batter to whisk together inside our AI bowl (yes, you know I love my metaphors).
Firefly did add the ability to use a reference image to steer its output toward a style, but nothing to this level. Then again, Firefly is trained only on artists’ images within Adobe Stock, whereas it is clear that Google trained Whisk on the open internet. But what I find interesting with Whisk is that perhaps Google might NotebookLM this system; it would be great if creators could confine it to training on their own images.
Again, right out of the gate, I look at this and scratch my head as to how AI can legally create something like this using The Pokémon Company’s copyrighted materials. The Pokémon Company, in the last year alone, has already filed several lawsuits protecting their property, most notably against Palworld developer Pocketpair. But in all these instances, this was about their gaming IP and their ideas about their world. Will we see more of this within the AI image space? Will it lead to guidelines or legislation, or just more and more of a grey area on what is deemed copyright infringement, as a human isn’t “creating” this…a bot is…right? I feel like this is an inflection point for the metaphor of a tree falling in the forest: if nobody is around, does it make a sound? If an AI is trained on all the data to be able to make something, but nobody is around to push the keys to prompt it to do so, is it copyright infringement?
Other resource links
- Google Whisk
- Google Whisk FAQ
- You can favorite creations; they go to your Library