Midjourney is best at producing a diverse and aesthetically pleasing range of styles and doesn’t refuse “in the style of…” requests. However, it is worst at text-in-images, at avoiding uncanny AI artifacts (like extra fingers or unrealistic postures), and at precise instruction-following (it messes up the specifics). Another major downside is that Midjourney doesn’t offer an API.
GPT-5 produces less artistic outputs but is better at following precise instructions on text and composition details.
Gemini “Nano Banana” sits somewhere in the middle: ok-ish at everything, better at style than GPT-5 but worse than Midjourney, and better at instruction-following than Midjourney but worse than GPT-5.
Image-generation models are better at making some styles look good than others. Key characteristics of styles that look good are:
I find asking for aquarelle/watercolor paintings particularly effective, inspired by the LessWrong team.
Think about the resolution at which the image will be displayed. Models sometimes produce detailed images that look good at a glance, but the detail often falls apart when you zoom in.
Avoid anything where getting the detail exactly right makes or breaks the image (e.g. careful hand positioning).
Hopefully this will no longer be needed as models improve, but for now I still find it necessary to be conservative with the composition to avoid weird alien elements.
Post-processing images in Python is a useful hack for removing annoying LLM color artifacts. It often helps to automatically set all pixels of a certain color to white or your background color of choice (example code; a sketch is included at the end of this section). For example, I used this trick to generate this stylistically-consistent set of city illustrations on white backgrounds.
Here is an example Gemini output. Note the off-white background:
Here it is after the programmatic correction:
Unless you're doing this in bulk, you don't need to write the code yourself; just ask ChatGPT to process your image using Python (thank you Chris for the reminder).
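If you do want a reusable script, here is a minimal sketch of the kind of correction I mean, using Pillow and NumPy. The filenames, the corner-sampling heuristic, and the tolerance value are illustrative assumptions, not the exact code linked above:

```python
# Minimal sketch: snap near-background pixels to pure white with Pillow + NumPy.
# Filenames and the tolerance are placeholders; tune them for your images.
import numpy as np
from PIL import Image

def flatten_background(in_path, out_path, target=(255, 255, 255), tolerance=20):
    """Set every pixel within `tolerance` of the sampled background color to `target`."""
    img = Image.open(in_path).convert("RGB")
    pixels = np.asarray(img).astype(int)

    # Assumption: the top-left corner is representative of the off-white background.
    background = pixels[0, 0]

    # Mask pixels whose every channel is within `tolerance` of the background color.
    mask = np.all(np.abs(pixels - background) <= tolerance, axis=-1)
    pixels[mask] = target

    Image.fromarray(pixels.astype(np.uint8)).save(out_path)

flatten_background("gemini_output.png", "gemini_output_clean.png")
```

Sampling the background from a corner keeps the script generic; if your images have foreground detail in the corners, hard-code the off-white value instead.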