💯💝 emoji in a yellow circle
Charlie Gleason

Charlie Gleason

Designer, developer, creative coder, and musician

Back to projects

Lysterfield Lake

An interactive, AI-generated 3D music video.

Lysterfield Lake is a song about a place outside Melbourne, Australia. It's about the endless summers of your youth, and the tiny changes in you that you don't even notice adding up. It's also about memories, which, like polaroids, fade and change over time.

I don't remember exactly when I wrote it in much the same way I don't remember exactly when I started this project, which has spanned the better part of a year and has all but consumed the most creative bits of my brain.

I think it all began when I came across Replicate (and by extention, their open-source container for running machine learning models, Cog.) I was so inspired by their Explore page, and, if I'm being totally honest, I was worried about the cost of experimenting with AI. I'd set up a gaming PC in the weird limbo of the 2020 lockdowns, and these tools made it effortless (and for the most part, free) to try, and test, and many, many times, fail.

Project Metadata

Design, development, 3D modelling
Charlie Gleason
Support and Encouragement
Glen Maddern
Company
Side project
Timeline
Early 2023 - December 2023
Platform
Web
Industry
Music
Libraries
Vite, React, Three.js, React Three Fiber
Tools
Blender, After Effects
A selection of dreams from the final video.
A selection of dreams from the final video.
A selection of dreams from the final video.
Zoomable

When I decided to release new music (it's been seven years since the last Brightly record) I knew I wanted to build something with these bits and pieces I'd been noodling with. I knew I wasn't great at making traditional music videos, and I felt empowered by the flexibility of the browsers, and the creative opportunties afforded by using machine learning and AI. I think the result is something greater than the sum of its parts.

And there are a lot of parts.

(Also, if you haven't seen the video, you should go and check it out before I ruin the magic by shining a very bright light on exactly how it all works.)

A preview GIF of the experience.
A preview GIF of the experience.

(And if you'd rather check out a recording, you can jump over to YouTube to see a simplified version that should give you a pretty good idea.)

Three polaroids containing the singer and lyrics overlaid.
Three polaroids containing the singer and lyrics overlaid.

What is it?

Lysterfield Lake is a 3D generative interactive music video. It works in the browser, using Three.js and react-three-fiber to stitch together seven feeds of video frame by frame, turning a single piece of footage shot on an iPhone into a three-dimensional fever dream. It uses the accelerometer in your phone, if it's available, or your mouse if you're on a computer. It shows and hides the protagonist as you watch depending on your actions, offering a myriad of ways to experience it. New dreams can be added at any time.

Behind the scenes it was modelled in Blender, (using a polaroid model by Edoardo Galati), generated in python and shell scripts, and built in JavaScript. It is made up of entirely open-source projects that are freely available.

A frame of each of the initial dreams.
A frame of each of the initial dreams.

The process

So, how does it work? I made a diagram to help.

A diagram showing the process of how the technology works under the hood.
A diagram showing the process of how the technology works under the hood.

First, a single video file, shot on an iPhone, is fed through a series of python and shell scripts. The initial one splits the video into individual frames. These frames then go through a bunch of processes depending on their final output.

A preview of the finished process.
A preview of the finished process.

The original footage vs the final output. Note to self: I probably should've gotten a haircut before this.


The Avatar

A 3D-watercolour version of the subject matter (which is, in this case, me).

A diagram of how the avatar is generated.
A diagram of how the avatar is generated.
A diagram of how the avatar is generated.
Zoomable
  • DiffusionCLIP turns the image into a painted watercolour style.
  • RobustVideoMatting finds the subject in the video and creates an alpha mask of it.
  • In python, the mask is used with the original image and passed to ZoeDepth, which creates a depth mask.
  • These three elements (the watercolour image, the alpha mask, and the depth mask) are used to render the avatar.
A preview of the generated avatar, from frame to 3D avatar.
A preview of the generated avatar, from frame to 3D avatar.

The Background

A watercolour version of the original image, without the subject matter.

A diagram showing how the background is generated.
A diagram showing how the background is generated.
A diagram showing how the background is generated.
Zoomable
  • Using the alpha mask and the original image, Inpaint Anything creates a version of the scene without the avatar in it.
  • This inpainted image is then also painted in watercolour using DiffusionCLIP.
  • This element is then used to render the background.
A preview of the background, from frame to watercolour background.
A preview of the background, from frame to watercolour background.

The Outline

A comic-book style sketch of the subject matter.

A diagram showing how the outline is generated.
A diagram showing how the outline is generated.
A diagram showing how the outline is generated.
Zoomable
  • In python, the original image is combined with the alpha mask to create a version with no background.
  • This is then fed into artline-torch, which creates a rough outline sketch.
  • This sketch is then overlaid on the avatar during rendering (it also uses a bit of dithering in the custom shader in Three.js, which is what gives it the printed ink / slight comic book style).
A preview of the diagram, from frame to outline.
A preview of the diagram, from frame to outline.

The Dreams

Prompt-generated reinterpretations of the original image.

A diagram showing the dreams.
A diagram showing the dreams.
A diagram showing the dreams.
Zoomable

The original images are fed into Deforum Stable Diffusion, along with a prompt, creating unique scenes that mirror the subject matter. For example, trees and a lake that match the contours of the image.

A preview showing the dreams, from the original frames.
A preview showing the dreams, from the original frames.

Each frame is then manually reviewed one by one and removed if not appropriate. (The primary reason for this is that Stable Diffusion models are trained on image sets that include a large amount of waifu's. Which, because the input clearly contains the shape of a person, means a lot of waifu's end up in the output, even when politely asking Stable Diffusion not to.)

An example of the tendency AI has toward waifu. Turns out every creative process eventually takes a sharp detour into the uncanniest valleys.
An example of the tendency AI has toward waifu. Turns out every creative process eventually takes a sharp detour into the uncanniest valleys.

The remaining images are fed into rife-ncnn-vulkan, which interpolates the missing frames and generates new ones to fill the gaps.

A diagram showing the interpolation process.
A diagram showing the interpolation process.

The Lyrics

The lyrics that are get overlaid on the polaroids.
The lyrics that are get overlaid on the polaroids.

At the same time, I recorded myself hand writing the lyrics in Procreate, and then cleaned them up in Adobe's After Effects.

(Apologies for my handwriting. I clearly should've been a doctor.)


The Output

An example of each frame, with the different dream for each.
An example of each frame, with the different dream for each.

This gets stitched together into an ultra-wide video with 7 square images side-by-side. Originally I tried using a single video for each element, but it‘s impossible to guarantee frame sync across multiple videos in the browser. Fortunately, much like how there's no rule against eating a candy bar on a dance floor, there's no rule that videos need to be 16x9. Using custom shaders in the browser, you can simply grab the crop of footage you need for each element in real time.

And having all of these elements in the one shader affords new opportunities. By passing the depth mask, alpha mask, and the watercolour scene, I was able to generate a rough 3D model.

An example of how the elements of the avatar are stitched together.
An example of how the elements of the avatar are stitched together.

Custom shaders essentially let you paint with pixels in real time, and it's an incredibly exciting medium once you start to explore the possibilities. It is how I can use the pixels from the depth mask to directly impact how close or far each fragment of the 3D avatar is. And, if you'd like to learn more, The Book of Shaders is a great resource.

Frame-by-frame animation of the AI-generated depth filters. AI interprets the depth mask from a video, because we live in the future. (Also, I hand painted a lot of these frames, because technology is, sadly, fallible.)
Frame-by-frame animation of the AI-generated depth filters. AI interprets the depth mask from a video, because we live in the future. (Also, I hand painted a lot of these frames, because technology is, sadly, fallible.)

Each frame of each video includes the handwritten lyrics, the watercolour scene, the watercolour scene with the background removed, the alpha mask, the depth mask, the sketch, and the dream. And those ultra-wide videos form the backbone of the project.


The Conclusion

The selected dream, represented by one of these videos, is then chopped up and stitched back together in real time in the browser, using react-three-fiber, and many of the incredible resources provided as part of the drei project. You can explore each dream one by one, wandering through a landscape that has been imagined by a computer, listening to a song about a time that sits on the very edge of my conciousness. Memory, like AI, is funny like that.

The rendered final output.
The rendered final output.

This is a probably a good time to mention that, while this is the conclusion, this project took every possible detour, hit every possible technical hitch, and explored every possible weird twist and turn. What I've ended up with is absolutely nothing like what I imagined, because I didn't really know what I was going to get at the end. Which makes it a sort-of love letter to the creative process, and to the joys of pursing something purely for the “what-if” of it. There are so many ways that sticking with something can truly surprise you, and that doesn't have to involve making weird music videos — it could be baking, or cross stitch, or learning a language, or dance. Things you can experiment with, be creative with, grow with. Things that make you feel like you did something you previously couldn't. Maybe even something great. That might be the greatest feeling there is.

I hope you find it inspiring, or at the very least, mildly interesting, and I appreciate you taking the time to learn more about how the project works.


Okay, how do I find out more?

If you'd like to check it out, you can do so here: https://lysterfieldlake.com/

If you'd like to share this project, that would be greatly appreciated. The project itself is entirely open-source, and available at the following GitHub repositories:

👉 Client App
A React app, powered by React and react-three-fiber.

👉 Pipeline
The pipeline that generates the video used by the app, powered by python and shell-scripts.

(Author's note: Fair warning—it's not the cleanest code, and in the case of the pipeline, it's not something that will work locally out of the box. I definitely intended open-sourcing this to be educational, in the sense of “oh, that's how he did that!”, as opposed to aspirational, like “oh, that's how he thinks React should be written??” It was a passion project, so maybe bear that in mind. Cheers. 😅)

Typeset in Commuter Sans and the system font stack.

Built with NextJS. CSS with Tailwind. Hosted on Heroku.

Primary illustration inspired by The Design Squiggle.