Audio Style Transfer with Rhythmic Constraints

Tomczak, M., Southall, C., Hockman, J., Audio Style Transfer with Rhythmic Constraints, Proceedings of the 21st International Conference on Digital Audio Effects (DAFx), Aveiro, Portugal, September 4–8, 2018.

Audio examples

The 15 transformation pairs presented here were tested with the three loss terms L1, L2 and L3 defined in the paper, as well as several additional audio style transfer (AST) transformations. Transformations obtained with default parameters from the AST approaches by Barry et al. (2018), Mital (2017) and Ulyanov et al. (2016) are also included with the results. Inputs A and B correspond to the terms content and style used by the authors of the compared papers.
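As a rough illustration of how the three loss terms differ, the sketch below combines a content term and a style term in the way the labels on this page suggest (Loss 1: Content A + Style B; Loss 2: Style A + Style B; Loss 3: Style A + Style B + Content B). It is not the implementation from the paper: the Gram-matrix style term, the mean-squared content term, the equal weighting of all terms, and the random arrays standing in for network features are assumptions made for illustration only, and every function name is hypothetical.

```python
import numpy as np


def gram(features):
    """Gram matrix of a (channels, frames) feature map, averaged over frames."""
    return features @ features.T / features.shape[1]


def content_loss(x, target):
    """Mean squared difference between feature maps; sensitive to event timing."""
    return float(np.mean((x - target) ** 2))


def style_loss(x, target):
    """Mean squared difference between Gram matrices; ignores frame order."""
    return float(np.mean((gram(x) - gram(target)) ** 2))


# Equal weights are an assumption; the paper defines the actual terms.
def loss_1(x, a, b):
    """Vanilla AST: Content A + Style B."""
    return content_loss(x, a) + style_loss(x, b)


def loss_2(x, a, b):
    """Mashup: Style A + Style B."""
    return style_loss(x, a) + style_loss(x, b)


def loss_3(x, a, b):
    """Augmented Mashup: Style A + Style B + Content B."""
    return style_loss(x, a) + style_loss(x, b) + content_loss(x, b)


# Hypothetical stand-ins for features of Input A and Input B.
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 64))
b = rng.standard_normal((8, 64))

print(loss_1(a, a, b), loss_2(a, a, b), loss_3(a, a, b))
```

In this sketch the style term depends only on time-averaged feature correlations, so reordering the frames of a signal leaves it unchanged, while the content term penalises any change in where events occur.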

Mashup transformations using L2.

Input A: Marching In The Streets by Harvey Mason

Input B: Night and Day by Idris Muhammad with George Coleman

Output Loss 2 (Mashup - Style A + Style B)

Input A: Colours of the Season by Daudi Matsiko, Yung Veerp, Fazerdaze and others

Input B: Loop Trigger by Mathew Jonson, GPU Panic

Output Loss 2 - Loop Triggered Colours of the Season

Transformation comparisons.

Pair 1

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

Pair 2

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

Pair 3

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

Pair 4

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

Pair 5

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

Pair 6

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

Pair 7

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

Pair 8

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

Pair 9

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

Pair 10

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

Pair 11

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

Pair 12

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

Pair 13

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

Pair 14

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

Pair 15

Input A

Input B

Loss 1 (Vanilla AST - Content A + Style B)

Loss 2 (Mashup - Style A + Style B)

Loss 3 (Augmented Mashup - Style A + Style B + Content B)

Vanilla AST (Barry 2018)

Vanilla AST (Mital 2017)

Vanilla AST (Ulyanov 2016)

References:

Shaun Barry and Youngmoo Kim, “Style transfer for musical audio using multiple time-frequency representations,” 2018, Available at: https://github.com/anonymousiclr2018/Style-Transfer-for-Musical-Audio.
Parag K. Mital, “Time domain neural audio style transfer,” 2017, Available at: https://github.com/pkmital/time-domain-neural-audio-style-transfer.
Dmitry Ulyanov and Vadim Lebedev, “Audio texture synthesis and style transfer,” 2016, Available at: https://dmitryulyanov.github.io/audio-texture-synthesis-and-style-transfer/.