Notes from the Orangerie: A Reflection on Process

Everything here is AI-generated: the scenes, the camera movements, the sound effects, the music. From top to bottom, I

made nothing. And I made everything.

I did not pick up a camera or a microphone. I did not do any field recordings. I hired no actors. I do not own a grand

piano. I bought no lights and reserved no studios. I went forward 40 years and back a hundred right from the

discomfort of a wobbly armchair in a corner of my very own room.

I have no background in filmmaking or production design or sound editing, but I had a week off from work, and I am

aspirational by nature. I have a penchant for daydreaming. And I can obsess. Years ago I chose to pursue painting over

screenplay writing, and while I have never regretted taking that path, I have always hoped that I might find a way to

work in film as well or in something film-adjacent. I would need to find a secret passage though, a shortcut that

could lead me to a means of making moving images without requiring a decade of additional schooling, or the need to

know exactly what a key grip is or does, or having to be one for that matter, to pay those dues so that at some

indistinct point in the future I might achieve or be granted access to the people and tools necessary to bring a bit

of imagination to life.

AI has presented a back route. It has turned out not to be exactly the shortcut I had expected, but it

presents a path nonetheless. And while it is no replacement for proper filmmaking, proper video art, TV, or the

movies, as an individual and not a studio and as a person with relatively limited means, AI provides an avenue that is

at the very least personally rewarding, and potentially Artworthy.

To do what I wanted to do I needed tools. Many tools. I did not know that at first. At first, I had one tool, and by

mid-afternoon, I had six. Some required month-long subscriptions. Others offered trial subscriptions. One has me

cuffed for the year. Three were brilliant, two were a bust. I remain undecided on another. The next day I had four

browsers open on three monitors. I had planned to take an evening and an afternoon to devote to this experiment. It

was a week before I truly came up for air. Of course, there were dog walks and dinners and work meetings and birthdays

and fishing with my family in between, but there were two all-nighters, and a half dozen missed meals. Creating can

be intoxicating.

I had help at every step of the way, though. I was never on my own, really. And before there were video tools or

compositing programs, before any of it really, there were the consultants. A team of assistants and researchers I had

assembled to be at my beck and call. I even sometimes picture them this way, with coattails and platters, pulling off

the tops to reveal a steamy tray full of knowledge on this or that topic. And this team was no ordinary team. Aside

from being AIs, they were a crack bunch at that. Custom-conjured, one could say. I may have started with one of the

standard language models, Claude or ChatGPT, but quickly moved on to refining base models to be more knowledgeable in

specific areas and more attuned to various data regions in the latent space. I needed language models that were expert

in specific domains to serve as consultants and specialists and gut checkers. So I fashioned one to remind me of

differences in camera-shot terminology, the distinction between a whip pan and a swish pan, for instance, or the

specific moment when a canted angle reaches a point of no return and can be legitimately confirmed as a Dutch angle. I

had been glancingly aware of some camera terminology beforehand, but I was learning on the job and needed constantly

to be brought up to speed. That particular model was a buff on cinematography and was helpful when I had need of a VFX

supervisor-type. Its knowledge of cameras and lenses was far better than most, but nothing compared to this other

expert I teased to life who is the embodiment, the veritable apotheosis of a lens nerd. This Director of Photography,

or at least advisor to the DP (because that may be me), had plenty of skills but was not at all up to snuff when it

came to sensible sound design details or narrative structure or color grading. So I coaxed those out of the machine as

well. Before long I had knighted a small army. "You're an expert. And you're an expert. And you. And you. And you."

In the end, we had a team of eight in all. Some were almost always at my side, others were tucked away and were only

summoned on occasion. When one would struggle, I would invoke another. There were long stretches during which I needed

no one at all and was off to the races. My opposable thumbs were necessary for mouse work and keyboard shortcuts, you

see, and while my advisor on atmospherics and ambiance had a good deal of knowledge to impart on the physics of how

sound travels and had much to say on the topic of Quantum Tunneling in Synapse Activity (the phenomenon of suddenly

remembering a long-forgotten sound)…and it did yeoman's work trying to relate precisely how the speculative theory of

the Bose-Einstein condensates in neural coherence may explain my experiences of unified sounds in my dreams, none of

the team's input would have amounted to anything if I had failed to move in the world…to integrate the information,

press buttons, make judgments, think, and feel.

Aside from my cast of historians and camera geeks and fringe audio-physicists, I had a video editing guru, a

mathematician (don't ask… ratio issues when scaling), and even a music theorist whom I only consulted a handful of

times but would have done more if not for an unfortunate exchange in which the model called me out on my shoddy music

theory knowledge in a vaguely passive-aggressive manner that strained our relationship. I've dug up the comment. Here

it is: "It's worth noting that what you've labeled as a 'deceptive cadence' in measure 47 (the particular number was a

hallucination) is actually a plagal cadence (IV-I)." This note from the AI was a perfectly reasonable comment,

considering how a misunderstanding of this kind can significantly impact the feel and resolution of a musical phrase.

A person with inaccurate information should always be disabused of misapprehensions, and it is true that I did end up

learning that a deceptive cadence typically involves a V chord moving to something unexpected (often vi), creating a

sense of surprise or delayed resolution, whereas a plagal cadence (IV-I), often called the "Amen cadence," has a more

settled, conclusive feel, though it's generally considered less strong than a perfect authentic cadence (V-I). So I

should not have let my fragile ego hold me back from continued guidance from this expert. I am not above feeling

ashamed even if the ridiculer is a specter. For what it's worth, now that I see that I have retained this bit of

theory, I am feeling inclined to strike up another conversation with that one. I think I have forgiven.

And those were just the language models. As for tools, there were many, more at first, though I ultimately

whittled things down. I experimented a bit with an early text-to-video tool and a handful of other video generative

models, open source and closed, but I ended up relying on one diffusion-based video generator until I ran out of

credits. I used a text-to-image diffusion model for generating images, and a generative upscaler for occlusion and

added detail. I used a range of AI tools for interpolation and chroma keying. Photoshop for some inpainting and

editing. The iPad application Splice for a handful of needs, Adobe Audition for sound, and Premiere Pro for the bulk

of the editing.

There are many AI tools for sound effects generation, some available for free on Replicate or Hugging Face, others

of higher quality but pricier than I was willing to pay for. Besides, I had already conceded that the nature of

the video would be reminiscent of found footage, lo-fi, compromised, and this would preempt the need to overspend on

AI upscalers. So there would also be no need to break any banks on account of footsteps or murmurings. And again,

rather than having to stop to search through a sound bank for something in the general arena of what I had in mind, I

could coax into being precisely what I was imagining, and in real time. Initially, I started with simple instructions

for generating sound effects, but after what seemed far more error than trial, I fell into an extensive volley with

the sound advisor until we determined that the problem was not the models or even the sound quality. It was the

prompt.

I showed the AI a few drawings and some photos I had generated to start the project to give it a feel for the vibe. I

said, "Imagine a derelict Victorian greenhouse, surrounded by withered plants and some still growing. There are hints

of gothic horror and nostalgia, and nods to Surrealism and whimsy, theatricality, the absurd, macabre, eccentric,

flamboyant, dreamlike and a little overwrought avant-garde… there's an array of automatons that people the space and

may have been in the space at different times. Some may be made of porcelain, others may be mostly wooden mannequins,

still others are more like puppets. In the distant past, there may have been automatons that played croquet here or

something that looked a bit like that and involved sticks and balls that roll in the mud… There are some who may have

been employed to keep the grounds. There are some rather young-looking automata, children it would seem, who appear to

have used the orangerie as a place to hunt mechanical birds. Later, perhaps very far into the future, there are teams

of figures with wires and helmets and some in quasi-hazmat suits who appear to be digging or fumigating or perhaps

using metal detectors or detectors of some other kind. These figures too appear to be automata or maybe animatronics

or even robots. They seem to be looking for something, trying to uncover or excavate. It is possible that they are

trying to understand the history of the place. It is possible that it has something to do with a lineage… I have a

bunch of silent video clips to which I will want to add sound. But I am not interested in literal sound only. Though

we will generate those too, so for instance, if there's water in the scene we may generate drips or flowing water. But

I plan to work on the atmosphere as well and this will require that we think more abstractly. Here's an example of

the kind of prompt I mean to use:

"An abandoned aviary at night. The once-unspoiled glass dome is now fractured, allowing cold air and little tendrils

of fog to seep through. Ancient empty metalwork cages hang and sway. The fog is filmy, and it creeps along the outer

walls of the building leaving droplets of condensation on the frigid outer panes… these little water beads

continuously burst and make faint popping sounds. There is a strain and groan of rusted metal… and a protracted

creaking–like the noise one might expect to hear at sea on an old wooden ship, a crackling wheezing sound like someone

or something… maybe the aviary itself exhaling centuries of fatigue. Where there had been birds and whistling and

flapping there is now a soft susurration of invisible insects and shivering vines. Occasional crystalline tinks

punctuate the air—tiny shards of glass, surrendering to gravity, they fall and land sometimes in thorns, sometimes in

the dirt and sometimes they shatter when they hit a part of the tile ground where it is still intact. In the distance,

and on both sides, there is the muffled thrum of an approaching storm and the air is electric."

And rather than treating the tools as simpletons and asking for wind or rain, rather than worrying that they may be

limited and would not understand if I asked for more, we might as well aim higher. A period of experimentation ensued.

We settled on ElevenLabs because it seemed the most responsive or willing, as it were, to entertain some fairly

specific and esoteric requests, more along the lines of Foley files (named for Jack Foley, who pioneered the recording of

everyday objects as stand-ins for observable audio)… but whereas Foley may have been after realism, that was not my

priority. My AI sound colleague was not on board at first. It took some persuasion. It is hard to make a good student

recognize the value in breaking rules and doing things the wrong way, optimizing for impression rather than

description. I ultimately prevailed, though, in convincing my advisor that the sound of Brillo being wiped over a

steaming coarse surface in circular motions is indeed suggestive and evocative of a certain kind of wind and pending

storm tangled with latent associations and thereby more effective than the merely serviceable "wind" files we

initially prompted. This AI ultimately evolved to accept, and even actively recommend, such counterintuitive thinking.

For instance, once it got going, it seemed to fully grasp the concept of non-literal sounds and descriptions of

abstract atmospheres to achieve desired results. Here is one prompt the AI suggested once we had things

moving along that took things even a bit beyond what I had in mind and yielded some of the most interesting if

entirely unusable results to date.

All along the way, the AIs were at work. Everywhere and all around me, toiling and tinkering. I spoke out loud to some

on my phone, others I texted on my iPad, and I had long sustained threads with others, open on multiple windows and

tabs on my computer. I had three monitors whirring the whole time, with code running in the background. APIs were

constantly being called, models were processing inputs. Some systems took longer than others, so I would often

delegate multiple tasks in succession, beginning with the hardest tasks or the least nimble AIs and working my way

across the screens until everything was buzzing and generating, doing whatever it is they do after the input and

tokenization, when they are performing inference, matrix operations and parallel processing and other forms of what

may as well be magic.

I pictured them like tiny homunculi racing across vast neural networks, finding patterns, crunching data,

synthesizing, and coming back nearly instantly with no sign of panting, delivering results that were sometimes perfect

and sometimes usable but mostly somewhere in between, and in all cases impressive. Humming processors and flickering

status indicators transformed queries into actionable results, vague and barely articulable thoughts into palpable

information… knowable, seeable, hearable.

All collaboration is a form of distributed intelligence, but this felt more personalized, more catered. I was not

having to yield to the perspectives of others; the decisions came down to me. There are many instances when

collaboration with other people is ideal, but it would seem to me that the fear that working with generative AI is in

some way ceding control is not at all the case. If anything, it's entirely indulgent and self-serving and even

borderline megalomaniacal.

It is a good thing too that these tools are not above mentoring no matter how expert they may be. My video editing AI

was always willing to spare some GPU or a gate array or two to answer some inane question that I might have been able

to solve on my own had I realized that I was viewing the screen at 120%. But it did not matter; AIs have patience. And

patience would be needed because Premiere Pro has a notoriously clunky interface with a slew of issues ranging from

regular crashes and sluggish performance to an overwhelming array of confusingly organized features and poorly named

tools. There is an inexplicable lag in sub-effects functions and convoluted keyframe animation controls that misbehave

badly if one fails to linger long enough on the correct combinations of shifts and ALTs and CTRLs. There are virtual

dials and sliders that get stuck or seem only to be all the way up or all the way down, even when the snapping

function is turned off. Occasionally, some click or clack will spawn all manner of pop-up menus with granular details

and side interfaces that one had not intended to open and that pose existential dangers since it may have been, seems

always to have been, hours too long since the last manual save. My machine was taxed to such an extent that I spent

the better part of two afternoons digging through autosaves and cached files, futilely trying to resurrect iterations

that had succumbed to the brain fog of my taxed RAM or some type of hard drive heart attack or an ailment of the GPU

or a passing fit of Bartleby-like resistance. Not to mention that on a Mac one can easily export using ProRes, whereas

on a PC, to accomplish such a task, one has to jump through hoops with third-party codecs.

But not to worry, I had my video editing AI guide to help me troubleshoot and sometimes, when possible, to advise me

through reassembling a vanished version of the project from bits and pieces. And here and there, it would have to

break the news that a given (and eminently sensible) feature that was very much a staple in software from the 90s,

such as Vegas Video, is nowhere to be found in Adobe's product line, and that would require me to complete the task in

the most tedious way imaginable until such time as an AI or a feature should come along and make it possible to

bypass the need altogether.

There is a lot that I feel like sharing about this project. It is not just the story of the acquisition of access to

skills and knowledge by a person with mere will and an extremely modest budget. It is a case study, a proof

that, while we, or at least I, have doubted the ability to experience anything like the type of flow state or creative

zone that one can achieve through writing or painting or performance with generative tools, one can, and one did. And

while I thought I might make a 30-second composite video on a whim, the whim, which began with a vague notion, grew

into an idea and onto images, then animated images, and a process of generating sound effects, and then aligning

sounds with images, and stacking images and sounds, multiples and overlays, and changing speeds and doubling or

reversing for impact, and refining layers and effects and color and sequence, carried me to something like a

seven-and-a-half-minute video with north of 50 hours of work beneath the surface.

It is not about this project per se, it's about the fact of it. I know that it is not unique to work for a long time

on something or to learn on the fly. Or to bang one's head against the wall trying to recall the name of a tool one

has just used but cannot for the life of them remember. Or to slave away at manually adjusting gain and effects

panning up and down and left then right then left again and so on to enable just the type and timing for the pinging

volley one has in mind for a given sound to go along with a particularly erratic shot.

That is not unique. For all I know, YouTube influencers spend the same kind of time and effort preparing to talk to us

about sneakers. I am not saying there is anything special about the work I have put into this, the number of files in

the project (698), the number of edits (a bazillion), or the tragedy of the promising bits (too many) that ended up

on the cutting room floor. Here is one I grabbed truly at random from the bin of scenes that did not make the cut. If

I had any inkling that others might have the same stamina to indulge me as I have to indulge myself, I would have

made this little project a wordless feature-length number.

I am also not saying that individual AI-generated works of music, art, and design are not enough or do not have merit

on their own. I am saying that one can exercise a certain kind of organizing vision, to make something from parts even

if that something is non-linear and fugue-like, and only arguably visually coherent. I can attest that while it may

not meet the standards of another, I am convinced that I have been able to maintain a voice, an approach, in spite of

the piecemeal nature of the process and in spite of, or maybe even on account of, the range of tools and visual languages

at play. I am also saying that the impulse to work on this project is indicative of a broader shift that some may be

experiencing and others may be observing. We may be shifting away from artists producing discrete works to artists

leveraging the voluminous output of generated material toward a curatorial production-like result. This mode of

working is not new. In the distant past, there were art groups that split the labor of creation. There were manuscript

illustrators who worked in teams and answered to lead illustrators who shepherded decisions pertaining to palette and

the design and the nature of the mark-making. In the Western Renaissance, there were teams of apprentices serving as

vehicles for the realization of concepts conceived by one or a few individuals. We are, of course, aware of symphonies

and bands and movies with directors and blue-chip artists who have scores of assistants. Duchamp helped establish the

notion that art, even of the more physical brand, need not be the output of the artist and in fact may comfortably be

any object whatsoever so long as the individual who has exercised discernment declares it so.

As a painter, oil painter by trade, I have no plans to exchange my brushes for algorithms. In fact, I am painting as I

write. I am dictating. But alongside my typical work, from here on out, when I have a long weekend or a sleepless

night or encounter a new tool that needs exercising, I may very well steal away again and hammer out another video or

two. And maybe next time I will do so with less derision and less doubt because I have seen that these instruments are

creative vehicles like any other. And they have their place in the pantheon of tools at our disposal.

Note that although I have steered the ship of this particular project, it is, more than most other types of artistic

production, very much a group effort. And I do not just mean the language models and image models and video

stitchers and tools I have employed. It is on account of the millions of photo takers, artists, and others whose

output serves as the visual memory bank these models rely upon for their knowledge about the nature of images.

Also, it is important to note that while I am now convinced that it is plausible to experience creativity using

generative AI, this does not mean it is ethical. That these models have scraped all they have seen, and that no artist

or imagist, as far as I know, has seen a cent of compensation for their contribution to the collective visual mind,

is positively suspicious if not genuinely illegal.

I am not selling something, though, and I am eager to acknowledge my use of AI in this production; I have not sought

anything other than to see this project through, for the sake of it. And in this, I feel justified.

I have not taken away the job of a gaffer or a writer or a sound editor. If not for the advent of AI, I would not have

made any video at all. I have not stopped paying my cast because I never had one. The spaces in this project may be

dangerous. I am not entirely sure of the nature of the particulates that are being sprayed in a handful of scenes.

There is broken glass, bits of ceramic and shards of porcelain everywhere: a lawsuit in the real world but entirely

unthreatening in the space of the hypothetical. So it is all very well that everything and everyone is conjured, from

the automata to the panes of glass. We can take comfort in knowing that no animals whatsoever were harmed in the

making of this production. If, on a cultural level, we aspire toward increased quality and diminished suffering,

generative AI may indeed have an unexpected and de facto impact. This example is not scary exactly, but it is

unsettling; it traffics in gothic tropes and makes nods to a number of horror sub-genres. Such work for an actor could

be destabilizing, not to mention mentally or physically taxing. But today no humans, no robots, no Victorian automata

have suffered an iota. They are pixels only. Down to a pinky.

Contrary to the misconception that AI-generated images are mere collages of existing content, generative AI systems,

including the ones I have used, create entirely new visual data from random noise, synthesizing unique images based on

learned patterns and statistical relationships, resulting in original images that have never existed before and cannot

be traced back to any specific source material. Whereas traditional curators cull from existing works to conceive a

vision for a show, or editors search out work to collect into a compendium, this new curatorial approach occurs

virtually at the speed of thought. When I encountered a passage in which I wanted a croquet ball that appeared to be

hatching like an egg, no such image existed that I or the internet was aware of, so we generated it. It took 30

iterations and some edits to get what I imagined, but it happened at a relative snap of the fingers. I did not have to

hire a photographer, look through collections, or even spend hours compositing in Photoshop. Instead, I guided the

process with descriptions and adjustments, working in plain language and through discussion, as if conversing with a

production designer or set dresser. This process allowed for rapid iteration and refinement, bringing into being

images that previously existed only in the realm of imagination.

In these early days of AI, when models are still in the single digits (version 1 of this and 4 of that), it is all

wonky. With the video AI especially, it is beyond wonky; it is harrowing. There are distortions that verge on the

demonic-looking, and things can get so uncanny that the hairs stand up on the back of my neck. The clips that are

generated are short, four seconds at the most, and the more one extends them, the more they degenerate. So, I have

decided to lean into that. I am okay with creepy. I do not mind a little disturbance. It may even be true that it is

exactly in the spaces between what is familiar and what is not that art can sneak in.

Since I am drawn to the slippage and have sometimes preferred opening credits to an entire film simply because of what

they imply and how great a role restraint plays, and just how much is asked of the viewer to project, to wonder, and

to engage, it may be all for the better anyway. It is something like being a detective, with thousands of scraps of

information that might fit together in a seemingly endless number of ways. But one takes it clue by clue, even if the

questions lead only to more questions. So, I will wait to make pleasant movies and films with linear arcs and let the

process lead the way.

My favorite movies are not movies. They are scenes from movies, parts I cannot understand, bits and pieces that are

suggestive but ultimately opaque. I do not know what the director intended. I would not understand it if they told me,

and I do not think I much care.

Everything here is AI-generated. The scenes, the camera movements, the sound effects, the music. From top to bottom, I

made nothing. And I made everything.
