Drum Synthesis and Rhythmic Transformation with Adversarial Autoencoders

Tomczak, M., Goto, M., and Hockman, J., Drum Synthesis and Rhythmic Transformation with Adversarial Autoencoders, Proceedings of the 28th ACM International Conference on Multimedia (ACM MM), Seattle, WA, USA, October 12–16, 2020.

Work conducted during an internship at the Media Interaction Group, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan.

1. Audio Synthesis with Trained Generator (G)

We demonstrate reconstruction of bar-length drum patterns by the generator model trained on real drum recordings. Each source example, at a 22.05 kHz sample rate, is resynthesized with the Griffin-Lim algorithm and presented alongside the corresponding output of the proposed AAE-GM model. Details of the data used here are given in Section 3.1 of the paper.

Source

Output Reconstruction

Source

Output Reconstruction

Source

Output Reconstruction

2. Latent Space Interpolation

**Figure: Interpolation between two rhythmic patterns from source to target**

The proposed model performs rhythmic transformation of bar-length drum patterns as follows:

Generator reconstruction of source input
Transformation into an intermediate rhythmic pattern
Resulting output transformation

The user is free to manipulate the structure within a bar through a continuous transformation, without relying on discrete identification of rhythmic boundaries.

Interpolations in the latent space allow for the mixing of two different drum patterns
A gradual change is achievable from the source rhythmic pattern to the target pattern
The intermediate latent codes are produced using a linear interpolation between source and target latent codes
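
The linear interpolation described above can be sketched as z_α = (1 − α) · z_source + α · z_target, so that α = 0.0 returns the source code and α = 1.0 returns the target code. The snippet below is a minimal illustration only: the function name `interpolate_latents`, the three-dimensional toy vectors, and the use of plain Python lists are assumptions made for this example and are not taken from the paper. In practice the latent codes would come from the trained encoder, and each interpolated code would be passed to the generator G.

```python
from typing import List, Sequence


def interpolate_latents(
    z_source: Sequence[float],
    z_target: Sequence[float],
    alphas: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> List[List[float]]:
    """Return one linearly interpolated latent code per alpha value.

    Hypothetical helper: alpha = 0.0 yields the source code and
    alpha = 1.0 yields the target code.
    """
    if len(z_source) != len(z_target):
        raise ValueError("source and target latent codes must have equal length")
    return [
        [(1.0 - a) * s + a * t for s, t in zip(z_source, z_target)]
        for a in alphas
    ]


# Toy latent codes standing in for encoder outputs (assumed values).
z_source = [0.0, 2.0, -1.0]
z_target = [4.0, 0.0, 1.0]

# One code per stage, for alpha = 0.0, 0.25, 0.5, 0.75 and 1.0.
codes = interpolate_latents(z_source, z_target)
for alpha, code in zip((0.0, 0.25, 0.5, 0.75, 1.0), codes):
    print(f"alpha = {alpha:.2f}: {code}")
```

The five default α values match the five stages presented in the examples that follow.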

Source Recording

Stage 1 | α = 0.0

Source reconstructed with generator G (not interpolated)

Stage 2 | α = 0.25

This example is similar to the source but begins to move towards the target

Stage 3 | α = 0.5

This example lies midway between the source and target patterns

Stage 4 | α = 0.75

This example is rhythmically closer to the target pattern than to the source

Stage 5 | α = 1.0

Target reconstructed with generator G (not interpolated)

Target Recording