Rapid Realms: A Visual Novel with AI | Part 1

Using Stable Diffusion, Godot and GPT4, I’m going to try and make a Solarpunk visual novel in less than 30 hours



I’ve been messing with image synthesis tools since 2021 and have written about my experiences interacting with AI tools quite a bit since then.

I recently bought a MacBook Pro M2 Max. While setting it up and installing essentials like Homebrew and Xcode, I also introduced my device to Stable Diffusion/Automatic1111.

Having Stable Diffusion – a cutting-edge, open-source software marvel – ON MY LAPTOP feels surreal. It just flies on Apple Silicon too.

Matt Webb’s experiments with GPT4 have been on my radar, and I’ve contemplated employing AI as a creative co-pilot for quite some time. But the question has always been ‘what?’

That ‘what?’ answered itself on a recent Zoom call. We were talking about how the distance between Zero and One has shrunk in certain creative areas.

In the coming years as barriers to entry drop we are going to see an explosion in creative work built using game engines. I suspect interactive, single or multiplayer software worlds will almost certainly become the folk art/media of the early 21st century. No creative person will be working on their ‘novel’, they’ll be making a game.

The only way to really know just how quickly this near future is coming and how close the distance between Zero and One has become – is to find out.

Part 2 can be found here

I’ve Decided To Speed Run Making A Solarpunk Visual Novel

The plan is to harness Stable Diffusion for the artwork, open-source game engine Godot to make the thing, and GPT4 for any coding needs. I already checked, and GPT4 assures me that it’s well-acquainted with Godot’s scripting language, GDScript. So we’re good to go.

I aim to reconnect with my younger self – the one who dedicated a summer to crafting a ‘choose your own adventure’ game using Visual Basic on my first-ever desktop computer back in the late ’90s – whilst making this game.

I’m not interested in ‘the thing’ being any ‘good’, I’m interested in it being done. Plus along the way I will have learnt the basics of everything I would need to know for the next one.

I’m going to keep track of the time I spend on this as it’s important. For some reason I think I can get to something ‘ok-ish’ in about 30 hours. Should anyone else join this adventure, they’ll be on the clock as well.

Let’s begin.

Day 1 – Morning ~4 hours

What style?

First things first, what art style is the game going to use?

Shall I use a base Stable Diffusion 1.5 model and prompt that? Or a custom one off civitai.com?

In the event, I ended up downloading a bunch of models from Civitai that I liked the look of, and began by exploring them with some super simple prompts.

Shaking the magic 8 ball to see what comes back. Looking for some inspiration.



For some reason these images remind me of Northern France as you’re driving towards Calais Euro-tunnel?

How do people look from both the realistic model and the anime one?

I’m leaning towards anime at this point. However I’m not a big fan of the basic near future Solarpunk landscapes it generated above.


This looks rad as hell. Also notice: the image uses the same prompt and seed as the anime characters on their bikes above. As nice as this style looks, I’m not sure I want to stare at it for hours in a visual novel. It would feel a bit like being inside a barcode.

Still though. Making a mental note that this style is so easily achievable.


Now we’re talking!!!

These images instantly resonated with me – this is the game I want to play. These initial images are all a bit wonky and weird, with too many arms and hands and weird faces right now, but I like ’em.

So style found! This game is going to be SOOOOO good I know it already. Pre-Teen me is absolutely freaking out inside.


Model picked, I now need to try and find some ‘characters’. I also try to improve the level of detail in the images by doing some prompt engineering. I’m looking for a character design that is striking and recognisable. The sort of character you meet when you arrive at a scene in Phoenix Wright or whatever.

Side note – What the hell kind of image ratio was I using here?

These initial images look… a bit… weird… right?

Why do they have that weird Mona Lisa smile? The faces and eyes in particular are uncanny mashups of Disney and video game characters.

Also… this model must have seen so much porn in its time I’m surprised it’s not blind. It just can’t help itself can it?

I give these women some ‘big hats’ in the prompt to distract from their er … massive racks. I try to get other things dialled in.

Working on image fidelity and detail mostly, so I messed with the prompts a bit.

I was also trying out some ‘more details’ and ‘add details’ LoRAs I found on Civitai.

Results are below:

Much much better! Background details and the characters are much crisper and nicer. I like the blend between animated and realism.

The woman above is something like the character ‘Frannie’ from my BSFA-longlisted Solarpunk short story ‘In the Storm, A Fire’.

I really need to do something about the ‘boob’ problem though. It’s a bit out of hand. Tabled as a thing to address. Training data bias is a real problem.

For now though, I want to see just how much detail I can get out of the stack of embeddings, LoRA, and textual inversions I’m using on top of the base model.

How about some Solarpunk street food vendors trying their best in the climate-camps of a post disaster future?

I decide to use the high-rez latent detail upscaler to see how good the images can get.

I’m having whatever the guy on the right is selling. Looks delicious.

Each of these images took around 1m 15secs to pop out of the magic image-synthesis machine I now have installed on my laptop. Just remarkable.
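For the curious: the hi-res fix settings I was clicking around in the web UI also exist as plain fields in Automatic1111’s txt2img API payload. A minimal sketch – the prompt and values here are illustrative, not my exact settings:

```python
# Hi-res fix options as they appear in Automatic1111's txt2img API payload.
# Field names match the web UI; "Latent" is one of the built-in upscalers.
hires_options = {
    "enable_hr": True,           # turn on the two-pass hi-res fix
    "hr_scale": 2,               # upscale factor for the second pass
    "hr_upscaler": "Latent",     # latent-space upscaler, as used above
    "denoising_strength": 0.6,   # how much the second pass may repaint
}

# Merge into a base generation request (prompt text is illustrative).
base_payload = {"prompt": "solarpunk street food vendors, detailed", "steps": 25}
base_payload.update(hires_options)
```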

In 2021 it took over half an hour of waiting in a Google Colab doc to get something like this:

Cybernetic Meadow – 500 iterations CLIP (2021)

We’ve come so far in such a short amount of time.


Characters in ‘Places’

I remind myself that ‘the plan’ is to make a visual novel and question why I’m generating images in portrait. I change the aspect ratio back to landscape and start to explore an actual page/place/character/room that could make it into the game.

At this point I mess with variations of ‘big tits’ and ‘massive breasts’ as a negative prompt to see what happens. An important learning here is that prompts are essentially as coarse or as crude as the model’s training data. ‘Cleavage’ seems to be the best keyword to improve the ‘boob situation’.
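For anyone following along at home: negative prompts are just another field in the request if you drive a local Automatic1111 instance over its API (launched with the `--api` flag). A sketch, using only the standard library – the prompt text and settings are illustrative, not my exact ones:

```python
import json
import urllib.request

# Automatic1111's REST endpoint when the web UI is started with --api.
API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

payload = {
    "prompt": "solarpunk street food vendor, big hat, detailed background",
    "negative_prompt": "cleavage",  # the keyword that worked best for me
    "steps": 25,
    "width": 768,
    "height": 512,
    "seed": -1,  # -1 = a fresh random seed each run
}

def generate(payload):
    """POST a generation request to a local Automatic1111 server.

    Returns the parsed JSON response, which contains base64-encoded images.
    """
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# generate(payload)  # uncomment with a running server
```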

It is starting to dawn on me that I’m getting a somewhat consistent character design from the same prompt with different seeds. I have ‘made a move‘ in latent space.

This is a big success!

Learning to ‘make moves’ with DALL·E mini is about learning how to write prompts that get the best from the subsurface potential of the machine intelligence you are interacting with. The prompt both represents ‘the surface of the world’, and the thing that abracadabras it into existence.

Let’s see what the same prompt as above looks like in high detail with a hi-rez fix (both images below have different seeds, btw).

LOL @ the cucumbers? marrows? nailed(?) to the wall

I’m really happy with how everything is looking. I do a few more and pick the following image as my favourite.

I realise that in one morning, I’ve generated more images that look like ‘Solarpunk’ to me than probably exist on the internet in total. Having aphantasia, I don’t really have an idea of what it does look like. Tho I do have opinions.

This is exactly the vibe I want.

You arrive at the location in the game and meet this character doing their thing. I think this person is a ‘real character’ – the pose, background etc. The fact she’s not looking at the viewer instantly adds a little bit more to whatever story line I might write for her.

Next step is to get as quickly as possible to an animation test. Using in-painting and batch generation, I set Stable Diffusion to give me images of the character with her mouth open.

I also in-paint her eyebrows, add ‘surprised’ to the prompt, and generate another batch. I string the images together in DaVinci Resolve and get the following:

This is proof of concept! I can make a game with this. Lessgo!
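The playback order for those mouth frames is trivial to script, which bodes well for automating this later. A sketch of the open/closed alternation – the file names are hypothetical:

```python
import itertools

def talk_cycle(frames, duration_frames):
    """Build a looping mouth-flap sequence for a talking animation.

    frames: ordered list of frame file names (closed state first).
    duration_frames: total frames to emit at the target frame rate.
    """
    loop = itertools.cycle(frames)
    return [next(loop) for _ in range(duration_frames)]

# Two mouth states at 12 fps gives a simple one-second "muppet mouth" flap.
sequence = talk_cycle(["mouth_closed.png", "mouth_open.png"], 12)
```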

Lastly I wonder, what if I need/want to put the character in a different location?

Flush with the speaking-animation success, I generated this new image, and then attempted to use the in-painting tool to remove the character from the scene.

It was here I hit my first major roadblock…..

It’s pretty clear that I have absolutely no fucking idea what I’m doing.

Stable-diffusion-art.com’s tutorials and detailed walkthroughs suddenly become an invaluable resource and come to the rescue.

With much farting about I arrive at the following background. Not quite the rough and ready market garden we had before, and we’re not in the city limits anymore but Solarpunk Tuscany?

A quick matte of the original character from the animation gets made in Affinity Photo.

Then I place her in the image above and re-run it through SD. It’s only after it generates that I realise I had selected one of the facial expression images rather than the original – but I’m feeling thrilled, and that’s another thing ticked off the list!
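Under the hood, the matte-and-paste step is just per-pixel alpha blending. Affinity Photo does this for you, but the underlying math is tiny – a sketch of the blend for a single RGB pixel:

```python
def composite_pixel(fg, bg, alpha):
    """Blend one RGB foreground pixel over a background using a matte value.

    fg, bg: (r, g, b) tuples; alpha: matte value in [0, 1], 1 = character.
    """
    return tuple(round(f * alpha + b * (1 - alpha)) for f, b in zip(fg, bg))

# A fully opaque matte keeps the character pixel untouched.
pixel = composite_pixel((200, 100, 50), (0, 0, 0), 1.0)
```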

I close out my first 4 hours really pleased with the above images. But the matte-making was looong, and the in-painting disaster has left me a little deflated.

Thinking about the workflow for making and designing other characters, I wondered if it would be better to generate characters separately from backgrounds.

Over lunch I watched a few YouTube tutorials on how to get consistent characters in Stable Diffusion and fell down a daunting rabbit hole of LoRA training.

I also plug my laptop in to charge as I’d ripped through 16 hours of battery life in just 4. My friend who also has a new M2 Max told me in the pub the other day that he has never even heard its fan come on….

I’m right at the edge of compute doing all this stuff, that’s for sure.

I’ll cover my afternoon experimenting with generating characters on white backgrounds in the next post.

Next Post
