Tomczak, M., Southall, C., Hockman, J., Audio Style Transfer with Rhythmic Constraints, Proceedings of the 21st International Conference on Digital Audio Effects (DAFx), Aveiro, Portugal, September 4–8, 2018.
Audio examples
The 15 presented transformation pairs were tested with three loss terms, L1, L2 and L3, as defined in the paper, along with several additional audio style transfer (AST) transformations. Transformations obtained with the default parameters of the AST approaches by Barry et al. (2018), Mital (2017) and Ulyanov et al. (2016) are also included with the results. Inputs A and B correspond to the terms content and style used by the authors of the compared papers.
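The labels used in the comparisons describe each loss as a sum of content and style terms (Loss 1: Content A + Style B; Loss 2: Style A + Style B; Loss 3: Style A + Style B + Content B). The sketch below only illustrates that combination and is not the paper's definition. It assumes an unweighted sum, a mean-squared-error content term and a Gram-matrix style term as commonly used in style transfer; all function names are hypothetical, and the plain lists stand in for whatever feature representation the actual method uses.

```python
from typing import List

# Assumption: features are rows of channels by columns of time frames.
Matrix = List[List[float]]


def gram(features: Matrix) -> Matrix:
    """Channel-by-channel correlations, averaged over time frames."""
    n_frames = len(features[0])
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / n_frames for row_j in features]
        for row_i in features
    ]


def mse(x: Matrix, y: Matrix) -> float:
    """Mean squared difference between two equally shaped matrices."""
    diffs = [(a - b) ** 2 for rx, ry in zip(x, y) for a, b in zip(rx, ry)]
    return sum(diffs) / len(diffs)


def content_loss(output: Matrix, target: Matrix) -> float:
    """Assumed content term: direct feature mismatch."""
    return mse(output, target)


def style_loss(output: Matrix, target: Matrix) -> float:
    """Assumed style term: mismatch between Gram matrices."""
    return mse(gram(output), gram(target))


def loss_1(output: Matrix, a: Matrix, b: Matrix) -> float:
    """Vanilla AST: Content A + Style B."""
    return content_loss(output, a) + style_loss(output, b)


def loss_2(output: Matrix, a: Matrix, b: Matrix) -> float:
    """Mashup: Style A + Style B."""
    return style_loss(output, a) + style_loss(output, b)


def loss_3(output: Matrix, a: Matrix, b: Matrix) -> float:
    """Augmented Mashup: Style A + Style B + Content B."""
    return loss_2(output, a, b) + content_loss(output, b)


# Toy features for inputs A and B; the candidate output is A itself.
feat_a = [[1.0, 0.0], [0.0, 1.0]]
feat_b = [[2.0, 0.0], [0.0, 2.0]]
print(loss_1(feat_a, feat_a, feat_b))
print(loss_2(feat_a, feat_a, feat_b))
print(loss_3(feat_a, feat_a, feat_b))
```

In practice each term would usually carry its own weight, and the output would be optimised to minimise the chosen loss; both are left out here to keep the sketch short.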
Mashup transformations using L2.
Input A: Marching In The Streets by Harvey Mason
Input B: Night and Day by Idris Muhammad with George Coleman
Output: Loss 2 (Mashup - Style A + Style B)

Input A: Colours of the Season by Daudi Matsiko, Yung Veerp, Fazerdaze and others
Input B: Loop Trigger by Mathew Jonson, GPU Panic
Output: Loss 2 - Loop Triggered Colours of the Season

Transformation comparisons.
Pair 1
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

Pair 2
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

Pair 3
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

Pair 4
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

Pair 5
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

Pair 6
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

Pair 7
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

Pair 8
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

Pair 9
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

Pair 10
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

Pair 11
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

Pair 12
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

Pair 13
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

Pair 14
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

Pair 15
Input A; Input B; Loss 1 (Vanilla AST - Content A + Style B); Loss 2 (Mashup - Style A + Style B); Loss 3 (Augmented Mashup - Style A + Style B + Content B); Vanilla AST (Barry 2018); Vanilla AST (Mital 2017); Vanilla AST (Ulyanov 2016)

References:
Shaun Barry and Youngmoo Kim, “Style transfer for musical audio using multiple time-frequency representations,” 2018, Available at: https://github.com/anonymousiclr2018/Style-Transfer-for-Musical-Audio .
Parag K. Mital, “Time domain neural audio style transfer,” 2017, Available at: https://github.com/pkmital/time-domain-neural-audio-style-transfer .
Dmitry Ulyanov and Vadim Lebedev, “Audio texture synthesis and style transfer,” 2016, Available at: https://dmitryulyanov.github.io/audio-texture-synthesis-and-style-transfer/ .