Bass + AI: Improvisation (Python, Unity3D, and Kyma)

An improvised duet(?) with an AI agent trained on the “Embodied Musicking Dataset.”

In this performance, Python listens to live audio input from the bass, and, based on models trained with the dataset, sends out data to Unity3D and Kyma. Unity3D creates the visuals (the firework), and Kyma processes the audio from the bass.

First, though, the dataset used for training was collected from several pianists in the US and UK. As the pianists played, we recorded multiple aspects of their performance: audio, video of their hands, EEG, skeletal data, and galvanic skin response. Afterwards, the pianists listened back to their own performance and were asked to record their sense of “flow” over its course. All of these dimensions of data are aligned in time, so neural networks can be trained to learn associations between them.

This demonstration uses the trained models from Craig Vear’s Jess+ project to generate X/Y coordinates (the models were trained on the skeletal data) and “flow”, both from the amplitude of the live input. These XY coordinates, “flow”, and amplitude are sent out from Python as OSC data, which is received by both Unity3D (for visuals) and Kyma (for audio).
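For the curious, here is a minimal sketch of what the OSC side can look like in Python using the python-osc library. This is not the actual performance code; the addresses and ports are made up for illustration.

# Minimal sketch of the OSC sending, using python-osc.
# The addresses and ports are illustrative, not the performance patch's actual values.
from pythonosc.udp_client import SimpleUDPClient

unity = SimpleUDPClient("127.0.0.1", 9000)  # hypothetical Unity3D receiver
kyma = SimpleUDPClient("127.0.0.1", 8000)   # hypothetical Kyma receiver

def send_frame(x, y, flow, amplitude):
    """Send one frame of model output to both receivers."""
    for client in (unity, kyma):
        client.send_message("/xy", [x, y])
        client.send_message("/flow", flow)
        client.send_message("/amp", amplitude)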

In Unity, the XY data moves the “firework” around the screen, flow affects its color, and amplitude affects its size. The audio processing in Kyma is a bit more sophisticated, but in short: X position controls left/right pan, and flow affects the delay, reverb, and live granulation.

As you can see, the amplitude-to-XY mapping is limited, with the firework moving along a kind of diagonal. A possible next step would be to extract more features from the audio (e.g. pitch, spectral complexity, or delta values) and train on those.
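As a rough idea of what that richer feature extraction might look like, here is a sketch using librosa. This is not the project's actual pipeline, and the bass-range pitch bounds are just assumptions.

# Sketch of richer feature extraction with librosa (not the project's actual pipeline).
# y is a mono buffer of samples, sr its sample rate.
import librosa
import numpy as np

def extract_features(y: np.ndarray, sr: int) -> dict:
    rms = librosa.feature.rms(y=y)[0]                            # amplitude envelope
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]  # brightness
    flatness = librosa.feature.spectral_flatness(y=y)[0]         # noisiness / complexity
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("E1"),
                                 fmax=librosa.note_to_hz("G4"), sr=sr)  # pitch, bass range
    return {
        "rms": rms,
        "rms_delta": librosa.feature.delta(rms),                 # frame-to-frame change
        "centroid": centroid,
        "flatness": flatness,
        "f0": f0,
    }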

Applying models trained on pianists’ data to a bass performance (in a different genre) does not have the same goals as music-generation AI such as MusicGen or MusicLM. Instead of automatically generating music, the AI becomes a partner in performance: sometimes unpredictable, but not random, since its behavior is based on rules.

Pd Machine Learning Fail

A failed attempt at machine learning for real-time sound design in Pure Data Vanilla.

I’ve previously shown artificial neurons and neural networks in Pd, but here I attempted to take the next step and make a cybernetic system that demonstrates machine learning. It went okay, not great.

This system has a “target” waveform (what we’re trying to produce). The neuron takes in several different waveforms, combines them (with a nonlinearity), compares the result to the target waveform, and attempts to adjust its weights accordingly.
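If it helps to see the idea outside of Pd, here is the same loop sketched in Python with numpy: a single neuron with weighted inputs and a tanh nonlinearity, nudging its weights to shrink the error against a target waveform. This is plain gradient descent, which the Pd patch only approximates; the waveforms and rates below are my own assumptions.

# The same idea sketched in numpy (the real version is a Pd patch):
# mix a few waveforms with weights, squash with tanh, and nudge the
# weights toward a target waveform.
import numpy as np

rng = np.random.default_rng(0)
n = 4096
t = np.arange(n) / 44100.0

inputs = np.stack([
    np.sin(2 * np.pi * 110 * t),             # sine
    np.sign(np.sin(2 * np.pi * 220 * t)),    # square
    2 * (t * 330 % 1.0) - 1.0,               # saw
])
target = 0.5 * np.sin(2 * np.pi * 110 * t) + 0.3 * np.sign(np.sin(2 * np.pi * 220 * t))

weights = rng.normal(scale=0.1, size=inputs.shape[0])
learning_rate = 0.05

for step in range(5000):
    output = np.tanh(weights @ inputs)                    # nonlinear mix
    error = output - target
    grad = ((error * (1 - output ** 2)) @ inputs.T) / n   # squared-error gradient through tanh
    weights -= learning_rate * grad

print("final weights:", weights, "mse:", np.mean(error ** 2))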

While it fails to reproduce the waveform in most cases, the resulting audio of a poorly-designed AI failing might still hold expressive possibilities.

0:00 Intro / Concept
1:35 Single-Neuron Patch Explanation
3:23 The “Learning” Part
5:46 A (Moderate) Success!
7:00 Trying with Multiple Inputs
10:07 Neural Network Failure
12:20 Closing Thoughts, Next Steps

More music and sound with neurons and neural networks here:

Pure Data Artificial Neural Network Patch from Scratch

Coding (well, “patching”) an artificial neural network in Pure Data Vanilla to create some generative ambient filter pings.

From zero to neural network in about ten minutes!

In audio terms, an artificial neuron is just a nonlinear mixer, and, to create a network of these neurons, all we need to do is run them into each other. So, in this video, I do just that: we make our neuron, duplicate it out until we have 20 of them, and then send some LFOs through that neural network. In the end, we use the output to trigger filter “pings” of five different notes.
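Roughly, the structure is something like this numpy sketch. It simplifies the patch (fixed random weights, a single layer mixed back down, a simple threshold crossing standing in for [threshold~]), so treat it as an outline rather than a faithful port.

# Rough numpy sketch of the structure: a few LFOs feeding 20 "nonlinear
# mixer" neurons with fixed random weights, and a threshold crossing on
# the mixed output standing in for the [threshold~] filter pings.
import numpy as np

rng = np.random.default_rng(1)
sr = 1000                                    # control rate for the sketch
t = np.arange(0, 30, 1 / sr)                 # 30 seconds

lfos = np.stack([
    np.sin(2 * np.pi * 0.11 * t),
    np.sin(2 * np.pi * 0.23 * t + 1.0),
    np.sin(2 * np.pi * 0.37 * t + 2.0),
])

weights = rng.normal(size=(20, 3))           # 20 neurons, 3 inputs each
neurons = np.tanh(weights @ lfos)            # each row: one neuron's output
network_out = np.tanh(neurons.mean(axis=0))  # mix the neurons back down

threshold = 0.2
ping_times = t[1:][(network_out[1:] > threshold) & (network_out[:-1] <= threshold)]
print(len(ping_times), "pings in 30 seconds")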

There’s not really any kind of true artificial intelligence (or “deep learning”) in this neural network, because the output of the network, while it is fed back, doesn’t go back and affect the weights of the inputs in the individual neurons. If we wanted real machine learning, we would need some kind of desired goal (e.g. playing a Beethoven symphony or a major scale). Here, we just let the neural network provide us with some outputs for some Pure Data generative ambient pings. Add some delay, and you’re all set.

There’s no talking on this one, just building the patch, and listening to it go.

0:00 Demo
0:12 Building an artificial neuron
2:00 Networking our neurons
3:47 Feeding LFOs into the network
4:20 Checking the output of the network
5:00 Pinging filters with [threshold~]
8:55 Adding some feedback
10:18 Commenting our code
12:47 Playing with the network

Creating an artificial neuron in Pd:

Pinging Filters in Pd:

More no-talking Pure Data jams and patch-from-scratch videos:

Music and Synthesis with a Single Neuron

Recently, I’ve been hooked on the idea of neurons and electronic and digital models of them. As always, this interest is focused on how these models can help us make interesting music and sound design.

It all started with my explorations into modular synths, especially focusing on the weirdest modules that I could find. I’d already spent decades doing digital synthesis, so I wanted to know what the furthest reaches of analog synthesis had to offer, and one of the modules that I came across was the nonlinearcircuits “neuron” (which had the additional benefit that it was simple enough for me to solder together on my own for cheap).

Nonlinear Circuits “Dual Neuron” (Magpie Modular Panel)

Anyway, today, I don’t want to talk about this module in particular, but rather more generally about what an artificial neuron is and what it can do with audio.

I wouldn’t want to learn biology from a composer, so I’ll keep this in the simplest terms possible (so I don’t mess up). The concept here is that a neuron receives a bunch of signals into its dendrites, and, based on these signals, sends out its own signal through its axon.

Are you with me so far?

In the case of biological neurons, these “signals” are chemical or electrical; in these sonic explorations, the signals are the continuously changing voltages of an analog audio signal.

So, in audio, the way we combine multiple audio sources is a mixer:

Three signals in, one out

Now, the interesting thing here is that a neuron doesn’t just sum the signals from its dendrites and send them to the output. It gives these inputs different weights (levels), and combines them in a nonlinear way.

In our sonic models of neurons, this “nonlinearity” could be a number of things: waveshapers, rectifiers, etc.

Hyperbolic Tan Function (tanh)

In the case of our sonic explorations, different nonlinear transformations will lead to different sonic results, but there are no real “better” or “worse” choices (beyond what your aesthetic goals dictate). Now, if I wanted to train an artificial neural net to identify pictures or compose algorithmic music, I’d think more carefully about it (and there’s lots of literature about these activation-function choices).

But, OK! A mixer with the ability to control the input levels and a nonlinear transformation! That’s our neuron! That’s it!
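If you prefer code to patch diagrams, the whole thing fits in a few lines of Python. This is just a sketch, with tanh standing in for whichever nonlinearity you choose.

# A neuron, in audio terms: a weighted mix of the inputs pushed through
# a nonlinearity (tanh here, but it could be any waveshaper or rectifier).
import numpy as np

def neuron(inputs, weights):
    # inputs: array of shape (n_inputs, n_samples); weights: (n_inputs,)
    return np.tanh(weights @ inputs)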

Just one neuron

In this patch, our mixer receives three inputs: a sequenced sine wave, a chaotically-modulated triangle wave, and one more thing I’ll get back to in a sec. The mixer’s output is put through a hyperbolic tan function (soft clipping, basically), then run into a comparator (if the input is high enough, fire the synapse!); the comparator’s output is then filtered, run to a spring reverb, and the reverb is fed back into that third input of the mixer.
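In code, the topology is roughly the following Python sketch. It only shows the routing; the sequenced sine, chaotic modulation, filter, and spring reverb are crude stand-ins, since those live in the actual patch.

# Sketch of the routing: mixer -> tanh -> comparator -> (filter/reverb) -> feedback.
# The "reverb" here is just a one-pole smoother standing in for the real thing.
import numpy as np

sr = 1000
t = np.arange(0, 5, 1 / sr)
sine = np.sin(2 * np.pi * 2 * t)                    # stand-in for the sequenced sine
triangle = 2 * np.abs(2 * (3 * t % 1.0) - 1) - 1    # stand-in for the modulated triangle

weights = np.array([0.6, 0.4, 0.3])                 # sine, triangle, feedback
feedback = 0.0
fired = []

for s, tri in zip(sine, triangle):
    mixed = weights @ np.array([s, tri, feedback])  # the mixer
    squashed = np.tanh(mixed)                       # the nonlinearity
    spike = 1.0 if squashed > 0.5 else 0.0          # the comparator: fire the synapse!
    feedback = 0.7 * feedback + 0.3 * spike         # smoothed "reverb" return
    fired.append(spike)

print(sum(fired), "synapse firings in 5 seconds")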

Now, as it stands, this neuron doesn’t learn anything. That would require the neuron getting some kind of feedback about its output (it does feed back from the spring reverb, but that’s a little different): is the neuron delivering the result we want based on the inputs? If not, how can it change the weights of those inputs so that it does?

We’ll save that for another day, though.

EDIT 05.18.22 – Taking it on the road!