Drum Synthesis and Rhythmic Transformation with Adversarial Autoencoders

Audio examples accompanying the paper presented at the ACM International Conference on Multimedia (ACM MM) 2020.

Tomczak, M., Goto, M., Hockman, J., Drum Synthesis and Rhythmic Transformation with Adversarial Autoencoders, Proceedings of the 28th ACM International Conference on Multimedia (ACM MM), Seattle, WA, USA, October 12–16, 2020.

Work conducted during an internship at the Media Interaction Group, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan.

1. Audio synthesis with trained generator (G)

We demonstrate reconstruction of bar-length drum patterns by the generator model trained on real drum recordings. Examples at a 22.05 kHz sample rate are recreated with the Griffin-Lim algorithm and presented together with their corresponding output from the proposed AAE-GM model. More detailed information about the data used here can be found in Section 3.1 of the paper.

Source
Output Reconstruction
Source
Output Reconstruction
Source
Output Reconstruction

2. Latent Space Interpolation

Figure: Interpolation between two rhythmic patterns from source to target

The proposed model performs rhythmic transformation of bar-length drum patterns as follows:

  • Generator reconstruction of source input
  • Transformation into an intermediate rhythmic pattern
  • Resulting output transformation

The user is free to manipulate the rhythmic structure within a bar through a continuous transformation, without relying on discrete identification of rhythmic boundaries.

  • Interpolations in the latent space allow for the mixing of two different drum patterns
  • A gradual change is achievable from the source rhythmic pattern to the target pattern
  • The intermediate latent codes are produced using a linear interpolation between source and target latent codes
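
The linear interpolation described in the list above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common form z(α) = (1 − α)·z_source + α·z_target, which matches the stage labels on this page (α = 0.0 gives the source, α = 1.0 gives the target). The function name `interpolate_latent` and the 4-dimensional example codes are hypothetical; in practice the latent codes would come from the trained encoder and each interpolated code would be passed to the generator G.

```python
from typing import List, Sequence


def interpolate_latent(z_source: Sequence[float],
                       z_target: Sequence[float],
                       alpha: float) -> List[float]:
    """Linear interpolation: (1 - alpha) * z_source + alpha * z_target."""
    if len(z_source) != len(z_target):
        raise ValueError("latent codes must have the same dimensionality")
    return [(1.0 - alpha) * s + alpha * t for s, t in zip(z_source, z_target)]


# Hypothetical 4-dimensional latent codes (assumption for illustration only);
# real codes would be produced by encoding the source and target recordings.
z_src = [0.0, 1.0, -2.0, 4.0]
z_tgt = [4.0, -1.0, 2.0, 0.0]

# The five interpolation stages listed on this page.
alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
stages = [interpolate_latent(z_src, z_tgt, a) for a in alphas]

for a, z in zip(alphas, stages):
    print(f"alpha = {a:.2f}: {z}")
```

With these toy values, the first and last stages reproduce the source and target codes exactly, and the intermediate stages move each latent dimension in equal steps from one to the other.
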
Source Recording
Stage 1 | α = 0.0
Source reconstructed with generator G (not interpolated)
Stage 2 | α = 0.25
This example is similar to the source but begins to move toward the target
Stage 3 | α = 0.5
This example lies halfway between the source and target patterns
Stage 4 | α = 0.75
This example is rhythmically closer to the target pattern than to the source
Stage 5 | α = 1.0
Target reconstructed with generator G (not interpolated)
Target Recording