An improvised duet(?) with a AI agent trained on the “Embodied Musicking Dataset. “
In this performance, Python listens to live audio input from the bass, and, based on models trained with the dataset, sends out data to Unity3D and Kyma. Unity3D creates the visuals (the firework), and Kyma processes the audio from the bass.
First, though, the dataset used for training was collected from several pianists in the US and UK. As pianists played, we recorded multiple aspects of their performance: audio, video of their hands, EEG, skeletal data, and galvanic skin response. After playing, pianists listened to their own performance and were asked to record their state of “flow” over the course of the performance. All of these different dimensions of data, then, were associated over time, and so neural networks can be trained on these different dimensions to make associations.
This demonstration uses the trained models from Craig Vear’s Jess+ project to generate X&Y data (from the skeletal data), and “flow”, from the amplitude of the input. These XY coordinates, “flow”, and amplitude are sent out from Python as OSC Data, which is received by both Unity3D (for visuals) and Kyma (for audio).
In Unity, the XY data moves the “firework” around the screen. Flow data affects its color, and amplitude affects its size. Audio in Kyma is a bit more sophisticated, but X position is left/right pan, and the flow data affects the delay, reverb, and live granulation.
As you can see, amplitude to XY mapping is limited, with the firework moving along a kind of diagonal. Possible next steps would be to extract more features of the audio (e.g. pitch, spectral complexity, or delta values), and train with those.
Applying this data trained on pianists to a bass performance (in a different genre) does not have the same goals music-generation AI such as MusicGen or MusicLM. Instead of automatically generating music, the AI becomes a partner in performance. Sometimes unpredictable, but not random, since its behavior is based on rules.