Founded in August 2007, Deezer is an online music streaming service based in Paris, France. While Spotify has gained market leadership in the online music streaming space, Deezer continues to hold its own with 14 million monthly active users (MAU) in over 180 countries as of January 2019. Per Similarweb.com estimates, 35% of Deezer’s users are in France, 11% are in Brazil, and the remainder are spread across the world.
What is Spleeter?
On November 4th 2019, Deezer released Spleeter, a machine learning tool for source separation. Spleeter is a project from Deezer’s research division and is made available online as a Python library based on TensorFlow. Although source separation remains a relatively obscure topic, its applications in music information retrieval (MIR) have the potential to make a far-reaching impact on the way we produce and consume music.
Source separation 101
At its core, source separation is the separation of a desired signal from a set of mixed signals. In music, this means deconstructing a recorded musical piece into its components by isolating each instrumental layer on the track. For example, 4-stem source separation on Coldplay’s hit single, “Yellow”, would yield the following layers (aka “stems”) in isolation:
- Vocals by Chris Martin, singing about the stars and such
- Drums/percussion by Will Champion
- Bass guitar by Guy Berryman
- Electric guitar by Jonny Buckland
While it may seem like a relatively straightforward process, accurate source separation is difficult to accomplish. Today, most professionally recorded music is made by recording each instrument on a separate channel, and then the final combined track is produced in a step called the “mixdown”. In this final step, all the individual tracks are blended together for mastering and then digitally compressed for delivery. All the sound waveforms are meshed together, in a process akin to an irreversible chemical reaction. Nonetheless, Spleeter has made the seemingly impossible task of source separation a lot easier using machine learning.
How does Spleeter work?
A common technique used in source separation is time-frequency (TF) masking. Different types of sounds in a musical track tend to occupy different frequencies at different moments. For instance, the lead vocals would largely occupy different frequency bands from the drums. Using TF masking, the mixture of frequencies that make up a piece of music is filtered over time, allowing us to pick and choose which frequencies to keep at each moment. What remains after this process is the separated stem of the instrument that we want to isolate.
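The masking idea can be illustrated with a minimal NumPy sketch. This is not Spleeter's implementation: the two "stems" are pure tones standing in for real instruments, the frames are non-overlapping, the tone frequencies are chosen to fall exactly on FFT bins, and the mask is built by hand from a 1,000 Hz cutoff. All names and values below are assumptions made for illustration.

```python
import numpy as np

def tf_mask_separate(mixture, keep_mask, frame_len):
    """Apply a binary time-frequency mask to a mono signal.

    mixture:   1-D array whose length is a multiple of frame_len
    keep_mask: boolean array of shape (n_frames, frame_len // 2 + 1)
    """
    frames = mixture.reshape(-1, frame_len)      # time axis: one row per frame
    spectrum = np.fft.rfft(frames, axis=1)       # frequency axis: one column per bin
    masked = spectrum * keep_mask                # zero out the bins we do not want
    return np.fft.irfft(masked, n=frame_len, axis=1).reshape(-1)

sr = 8000                 # sample rate in Hz (toy value)
frame_len = 400           # 50 ms frames, so bins are 20 Hz apart
t = np.arange(sr) / sr    # one second of audio

low = np.sin(2 * np.pi * 200 * t)     # stand-in for a low-pitched stem
high = np.sin(2 * np.pi * 2000 * t)   # stand-in for a high-pitched stem
mixture = low + high                  # the "mixdown"

# Hand-built mask: keep everything below 1,000 Hz in every frame.
freqs = np.fft.rfftfreq(frame_len, d=1 / sr)
n_frames = len(mixture) // frame_len
keep_low = np.tile(freqs < 1000, (n_frames, 1))

est_low = tf_mask_separate(mixture, keep_low, frame_len)
est_high = tf_mask_separate(mixture, ~keep_low, frame_len)
```

In this toy case the mask is known in advance, so the two tones are recovered almost exactly. With real music the stems overlap in frequency, and estimating a good mask is the hard part.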
The tricky part of this process is being able to approximate which frequencies correspond to which instruments. Given that the audible range of frequencies for humans is 20 to 20,000 Hz, a lot of processing is needed to accurately classify the broad range of frequencies contained in a musical track. Traditionally, this step was done manually by using snippets of isolated vocals (which are hard to find) to approximate the frequencies that should be left unmasked, thereby making a “minus one” track commonly used for bootleg karaoke.
Today, Spleeter does the heavy lifting as it comes with pretrained models for standard 2-, 4- and 5-stem separation. Using Spleeter is as simple as installing the package and running the separation command from a command-line interface, which then creates a .wav file for each stem. In addition, Spleeter allows users to train custom source separation models and evaluate the models against a benchmark dataset (MUSDB18, available on SigSep, another open source project).
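Beyond the command line, Spleeter can also be called from Python. The sketch below assumes the `Separator` class and its `separate_to_file` method as documented in the Spleeter repository, along with the default output layout of one folder per track; exact behavior may differ between versions. The helper names `expected_stem_files` and `separate` are hypothetical and exist only for this example.

```python
from pathlib import Path

# Stem names for each pretrained configuration (2, 4 and 5 stems).
STEMS = {
    2: ["vocals", "accompaniment"],
    4: ["vocals", "drums", "bass", "other"],
    5: ["vocals", "drums", "bass", "piano", "other"],
}

def expected_stem_files(audio_path, output_dir, n_stems=4):
    """List the .wav files we expect for one track (assumes the default layout)."""
    track_dir = Path(output_dir) / Path(audio_path).stem
    return [track_dir / f"{name}.wav" for name in STEMS[n_stems]]

def separate(audio_path, output_dir, n_stems=4):
    """Run a pretrained Spleeter model on one file (requires `pip install spleeter`)."""
    # Imported lazily: Spleeter pulls in TensorFlow, which is a heavy dependency.
    from spleeter.separator import Separator

    separator = Separator(f"spleeter:{n_stems}stems")
    separator.separate_to_file(str(audio_path), str(output_dir))
    return expected_stem_files(audio_path, output_dir, n_stems)
```

For a hypothetical file `yellow.mp3`, calling `separate("yellow.mp3", "output")` would be expected to produce `output/yellow/vocals.wav`, `drums.wav`, `bass.wav` and `other.wav`. The pretrained model is downloaded on first use.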
Applications and challenges
Deezer is in a unique position to build a generalized source separation engine because it has access to a large catalog of music, which few organizations have. However, a major challenge is that Deezer is not legally or ethically able to release the vast library of stems made from its own catalog due to copyright restrictions. So how does Deezer capture value?
Deezer can use its musical data assets to train its own machine learning models, which can then be deployed to improve its user value proposition. In addition, Spleeter enables Deezer to perform source separation at scale. On the GPU version, one can expect separation roughly 100 times faster than real time, so a four-minute track would take on the order of 2.4 seconds. This allows for rapid deployment on its growing catalog of music. As such, it is conceivable that Deezer can accomplish the following:
Make novel music recommendations. Deezer can use the finer data granularity to cluster music by similarity for a given instrument. For example, Deezer could identify bands whose lead singers have a vocal timbre similar to that of Coldplay’s lead singer, Chris Martin. Coldplay fans might hence get recommendations to give Blue Merle a listen.
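As a rough illustration of similarity by instrument, suppose each artist's vocal stems have already been summarized as a small feature vector. The sketch below ranks artists by cosine similarity to a query vector. The artist names and numbers are placeholders, not real measurements, and nothing here reflects how Deezer actually builds recommendations.

```python
import numpy as np

def most_similar(query, catalog, top_k=1):
    """Rank catalog entries by cosine similarity to the query vector."""
    names = list(catalog)
    matrix = np.array([catalog[name] for name in names], dtype=float)
    q = np.asarray(query, dtype=float)
    # Cosine similarity: dot product divided by the product of vector lengths.
    sims = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)[:top_k]
    return [(names[i], float(sims[i])) for i in order]

# Placeholder "vocal timbre" vectors (hypothetical values for illustration only).
catalog = {
    "artist_a": [0.9, 0.1, 0.4],
    "artist_b": [0.1, 0.9, 0.2],
    "artist_c": [0.5, 0.5, 0.5],
}
query = [1.0, 0.0, 0.5]   # timbre vector of the singer a listener already likes

ranked = most_similar(query, catalog, top_k=2)
```

In practice the feature vectors would come from a model applied to the separated vocal stems, which is where source separation adds value over analyzing the full mixdown.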
Machine learning-enabled classification of music by genre. Most music genres have unique instrumental characteristics. For instance, music in the drum and bass genre has a distinctive style of complex percussive syncopation, whereas the house music genre almost always has a steady rhythm in 4/4 time. Using Spleeter, Deezer can isolate the drum stems of the music in its catalog and classify them along the axes of style and speed. As a result, Deezer can use machine learning to automatically classify music by genre, since it is able to deconstruct a mixdown and find similarities across different songs for any given instrument or combination of instruments. Automatic classification would allow users to browse music by genre without requiring Deezer to manually label each track in its catalog.
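To make the "speed" axis concrete, here is a minimal sketch of tempo estimation on an isolated drum stem. It autocorrelates a simple energy-based onset envelope and picks the strongest lag within a plausible tempo range. This is a crude stand-in rather than a production beat tracker, and the drum stem is a synthetic click track at 120 BPM, not real audio; the function name and parameters are assumptions for this example.

```python
import numpy as np

def estimate_tempo(drums, sr, hop=256, bpm_range=(60.0, 200.0)):
    """Estimate tempo in BPM from a mono drum signal via autocorrelation."""
    # Short-time energy: one value per non-overlapping frame of `hop` samples.
    n_frames = len(drums) // hop
    energy = (drums[: n_frames * hop].reshape(n_frames, hop) ** 2).sum(axis=1)
    # Onset strength: keep only increases in energy, then remove the mean.
    onset = np.maximum(np.diff(energy), 0.0)
    onset = onset - onset.mean()
    # Autocorrelation for non-negative lags.
    ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]
    # Convert the BPM search range into a range of lags (in frames).
    frame_rate = sr / hop
    min_lag = int(np.ceil(frame_rate * 60.0 / bpm_range[1]))
    max_lag = int(np.floor(frame_rate * 60.0 / bpm_range[0]))
    best_lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return 60.0 * frame_rate / best_lag

# Synthetic "drum stem": a short click every 0.5 s, i.e. 120 beats per minute.
sr = 8000
clicks = np.zeros(sr * 8)
for start in range(0, len(clicks), sr // 2):
    clicks[start:start + 50] = 1.0

tempo = estimate_tempo(clicks, sr, hop=100)
```

A tempo estimate like this could be one input feature among many for a genre classifier. Capturing "style", such as syncopation patterns, would need richer rhythm features than a single number.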
Deezer should maintain Spleeter as an open source project that is kept autonomous from the consumer-facing business unit. In my opinion, the real value of a generalized source separation engine lies with applications in music production. Having access to high-quality stems allows music producers to experiment, stretch the boundaries of musical genres and create new affective experiences for listeners. By becoming a leader in the open source music community, Deezer would cement its position as the center of gravity for digital innovation in music. In doing so, it will capture a fair portion of the value created as it gains brand recognition over its famous rival, Spotify.
 “About Us”. 2019. Deezer. https://www.deezer.com/us/company.
 Rafii, Zafar, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Mimilakis, Derry FitzGerald, and Bryan Pardo. 2019. “An Overview of Lead and Accompaniment Separation in Music”. IEEE Transactions on Audio, Speech and Language Processing, no. 201: 3.
 “Releasing Spleeter: Deezer R&D Source Separation Engine”. 2019. Medium. https://deezer.io/releasing-spleeter-deezer-r-d-source-separation-engine-2b88985e797e.