
This is the demo page for the ISMIR 2022 paper "JukeDrummer: Conditional Beat-aware Audio-domain Drum Accompaniment Generation via Transformer VQ-VAE".

Authors: Yueh-Kao Wu, Ching-Yu Chiu, Yi-Hsuan Yang

Abstract

JukeDrummer generates a drum track in the audio domain to play along with a user-provided drum-free recording. Specifically, using paired data of drumless tracks and the corresponding human-made drum tracks, we train two vector-quantized variational autoencoders (VQ-VAEs) to discretize both the drumless and the drum Mel spectrograms. We then train a Transformer to improvise the drum part of an unseen drumless recording in terms of these discrete drum tokens. Finally, we use MelGAN as the vocoder to convert the Mel spectrogram produced by the VQ-VAE decoder into an audio waveform. This demo page contains several results for inputs from different domains.
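
Below is a minimal sketch of the inference pipeline described above, assuming PyTorch-style modules. The module names and their encode/sample/decode methods (drumless_vqvae, drum_vqvae, drum_lm, vocoder) are hypothetical placeholders used for illustration, not the actual API of our released code.

```python
# A minimal, hypothetical sketch of the inference pipeline; the modules and
# their encode/sample/decode methods are placeholders, not the released API.
import torch

@torch.no_grad()
def generate_drum_track(drumless_mel, drumless_vqvae, drum_vqvae, drum_lm, vocoder):
    # 1) Discretize the drum-free Mel spectrogram with the drumless VQ-VAE encoder.
    drumless_tokens = drumless_vqvae.encode(drumless_mel)      # sequence of code indices
    # 2) Autoregressively sample drum tokens conditioned on the drumless tokens.
    drum_tokens = drum_lm.sample(cond_tokens=drumless_tokens)  # sequence of code indices
    # 3) Decode the drum tokens into a drum Mel spectrogram with the drum VQ-VAE decoder.
    drum_mel = drum_vqvae.decode(drum_tokens)
    # 4) Invert the Mel spectrogram to a waveform with the MelGAN vocoder.
    return vocoder(drum_mel)
```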

Model Structure & Configurations

Our model design closely follows Jukebox. While there are hundreds of self-attention layers in Jukebox, there are only 9 layers in each of the encoder and the decoder in our work. In addition, we apply a so-called "Beat Information Extractor" that extracts beat information externally to help the model generate rhythmically consistent drum accompaniment audio.

Fig 1. The Flowchart of JukeDrummer
Fig 2. The language model of JukeDrummer
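
For concreteness, the sketch below shows how a 9-layer encoder, 9-layer decoder Transformer prior with an additional beat-embedding input could be wired up from standard PyTorch modules. The codebook size, model width, head count, and beat-feature dimension here are illustrative guesses, not the exact values used in the paper.

```python
# An illustrative configuration sketch of the Transformer language model
# (Fig. 2), using a standard PyTorch encoder-decoder Transformer.
# Hyperparameters other than the 9+9 layers are assumptions for illustration.
import torch.nn as nn

class DrumPrior(nn.Module):
    def __init__(self, codebook_size=2048, d_model=512, n_heads=8, n_layers=9, beat_dim=4):
        super().__init__()
        self.drumless_emb = nn.Embedding(codebook_size, d_model)  # drumless VQ tokens
        self.drum_emb = nn.Embedding(codebook_size, d_model)      # drum VQ tokens
        self.beat_emb = nn.Linear(beat_dim, d_model)               # external beat features
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_layers,   # 9 self-attention layers in the encoder
            num_decoder_layers=n_layers,   # 9 self-attention layers in the decoder
            batch_first=True,
        )
        self.head = nn.Linear(d_model, codebook_size)

    def forward(self, drumless_tokens, beat_feats, drum_tokens):
        # Condition on drumless tokens plus beat information; predict drum tokens.
        src = self.drumless_emb(drumless_tokens) + self.beat_emb(beat_feats)
        tgt = self.drum_emb(drum_tokens)
        mask = self.transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        return self.head(self.transformer(src, tgt, tgt_mask=mask))
```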

Demo audio

This demo section contains three parts. The first part compares different model configurations using test-set recordings as input. The second part shows the diversity of the drum accompaniment tracks generated by our best model on the test data. Finally, the third part provides results from our best model using external but well-known drumless tracks as input.

Part 1: Evaluation of Different Model Variants on the Test Set


  Drumless | Ground Truth | W/ Encoder, W/ BeatInfo | W/ Encoder, W/O BeatInfo | W/O Encoder, W/ BeatInfo | W/O Encoder, W/O BeatInfo
1.
2.
3.
4.
5.
6.

Part 2: Evaluation on Diversity


We use our best model (W/ Encoder, W/ BeatInfo) to repeatedly generate drum tracks for the same input with identical parameters and configuration.
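
Because the drum-token language model is sampled stochastically, repeated runs on the same input with the same settings yield different drum tracks. Conceptually, reusing the hypothetical generate_drum_track helper sketched in the Abstract section:

```python
# Draw four independent samples for the same drumless input; each call samples
# the drum tokens stochastically, so the resulting drum tracks differ.
samples = [
    generate_drum_track(drumless_mel, drumless_vqvae, drum_vqvae, drum_lm, vocoder)
    for _ in range(4)   # corresponds to Sample 1-4 below
]
```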

  Drumless | Ground Truth | Sample 1 | Sample 2 | Sample 3 | Sample 4
1.
2.
3.
4.

Part 3: Evaluation on External Data


We use our best model (W/ Encoder, W/ BeatInfo) to repeatedly generate drum tracks for external input data. We use Spleeter to extract the drumless versions of the first and second tracks.
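
As a rough sketch of how such a drumless input can be prepared, the snippet below uses Spleeter's 4-stem model and mixes back every stem except the drums. This assumes Spleeter 2.x's Python API; the file names are illustrative.

```python
# Separate a song into 4 stems with Spleeter, then sum everything but the
# drums to obtain a drumless mix. File names are illustrative.
from spleeter.separator import Separator
from spleeter.audio.adapter import AudioAdapter

sample_rate = 44100
audio_adapter = AudioAdapter.default()
waveform, _ = audio_adapter.load("september.mp3", sample_rate=sample_rate)

separator = Separator("spleeter:4stems")   # vocals / drums / bass / other
stems = separator.separate(waveform)       # dict: stem name -> waveform

# Sum every stem except the drums to obtain the drumless mix.
drumless = sum(v for name, v in stems.items() if name != "drums")
audio_adapter.save("september_drumless.wav", drumless, sample_rate)
```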

Earth, Wind & Fire - September

Drumless | Sample 1 | Sample 2 | Sample 3

伍佰 Wu Bai & China Blue - 挪威的森林 (Norwegian Forest)

Drumless | Sample 1 | Sample 2 | Sample 3

All of Me

Drumless | Sample 1 | Sample 2 | Sample 3

Coldplay - Viva La Vida

Drumless | Sample 1 | Sample 2 | Sample 3

Limitation

First, the generalizability of our model is limited. Based on our observations, the model performs reasonably on most of our test data, which was split from the combined dataset of MUSDB18, MedleyDB, and MixingSecret prior to training. However, the results are noticeably worse when the input is a drumless recording from outside this combined dataset. We conjecture that the model is sensitive to audio compression, the original sample rate, or the way the music is mixed and mastered.

Second, the stability of our model still needs improvement. At times, the model struggles to adapt its tempo across different sections of a song. Moreover, the generation may be out of sync with the input during the first few seconds, before the model has accumulated sufficient context.

Last but not least, it helps if the drumless input contains "rhythmic hints," such as a strong bass line, rhythm guitar, or any other sources that help our model locate beats and downbeats; in such cases the model is likely to perform better. Conversely, if the model cannot find enough cues in the input to locate the beats and tempo, the generated result degrades considerably.

To sum up, generalizability, stability, and rhythm dependency are issues that should be addressed in future work.

Contact


Yueh-Kao Wu yk.lego09@gmail.com