GitHub - deezer/playntell: Code to reproduce the experiments presented in the article "Data-Efficient Playlist Captioning With Musical and Linguistic Knowledge" (EMNLP 2022)

The current repository provides the code for PlayNTell, described in the article Data-Efficient Playlist Captioning With Musical and Linguistic Knowledge, presented at EMNLP 2022.

Installation

$ git clone git@github.com:deezer/playntell.git
$ cd playntell

Setup

Build the docker image and run it in a container while launching an interactive bash session:

$ make build
$ make run-bash

Building the Docker image also downloads the released data, currently hosted on Zenodo, into the data directory, so the build may take a while. The data directory also contains pre-computed embeddings for the Discogs tags, obtained with music-w2v (see Doh et al., 2020 in the paper).

Run algorithms

Running the code requires a CUDA environment with version >= 11.2. Inference is quite fast, but training can take up to 2 days.
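
Because training can run for days, it may be worth checking the CUDA version before launching it. The snippet below is a minimal sketch and not part of the repository: `version_ge` is a hypothetical helper, and it assumes `nvidia-smi` is on the PATH and prints its usual "CUDA Version: X.Y" header field. Note that `nvidia-smi` reports the highest CUDA version supported by the driver, which is not necessarily the toolkit version installed in the container.

```shell
# Hypothetical helper (not from the repository):
# succeeds when version $1 (major.minor) is >= version $2.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    ok = (x[1] + 0 > y[1] + 0) || (x[1] + 0 == y[1] + 0 && x[2] + 0 >= y[2] + 0)
    exit !ok
  }'
}

required=11.2
if command -v nvidia-smi > /dev/null 2>&1; then
  # Extract the "CUDA Version: X.Y" field from the nvidia-smi header.
  found=$(nvidia-smi 2>/dev/null | sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1)
  if [ -n "$found" ] && version_ge "$found" "$required"; then
    echo "CUDA $found satisfies >= $required"
  else
    echo "CUDA version '${found:-unknown}' does not satisfy >= $required" >&2
  fi
else
  echo "nvidia-smi not found; cannot check the CUDA version" >&2
fi
```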

PlayNTell: training & inference:

To train the model on preprocessed Deezer training data:

$ poetry run python3 playntell/training_experiments/train_playntell.py

Note: playntell accepts multiple parameters. Two useful ones are:

Note: playntell saves its outputs in: ../data/playlist-captioning/p/curated-deezer/algorithm-data/playntell/. In particular:

Once the model is trained, you can run inference on preprocessed data (from Deezer or Spotify playlists) with:

$ poetry run python3 playntell/infer.py --exp_name playntell --inference_dataset_name curated-deezer

Inference on new playlists:

You can use the playntell model to predict a caption for a playlist (given by its audio files, tags, and artist information) with:

$ poetry run python3 playntell/caption_playlist.py /data/playlist-captioning/p/test_playlist/playlist.json

The playlist.json file has two fields:

A dummy example of a playlist.json file, together with its audio files, is provided for testing in /data/playlist-captioning/p/test_playlist/ inside the Docker container.
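
When writing your own playlist.json, a quick syntax check before captioning can catch malformed files early. The snippet below is a sketch under stated assumptions: `is_valid_json` is a hypothetical helper that is not part of the repository, and it only verifies that the file parses as JSON (using Python's standard `json.tool` module). It does not check the field names or contents that caption_playlist.py expects.

```shell
# Hypothetical helper (not from the repository):
# succeeds when the given file exists and parses as JSON.
is_valid_json() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

playlist=/data/playlist-captioning/p/test_playlist/playlist.json
if is_valid_json "$playlist"; then
  poetry run python3 playntell/caption_playlist.py "$playlist"
else
  echo "Not running: $playlist is missing or is not valid JSON" >&2
fi
```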

Acknowledgements

This repo uses code from the following repositories, with some modifications:

Paper

Please cite our paper if you use this code in your work:

@InProceedings{Gabbolini2022,
  title={Data-Efficient Playlist Captioning With Musical and Linguistic Knowledge},
  author={Gabbolini, Giovanni and Hennequin, Romain and Epure, Elena},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  month={December},
  year={2022}
}