The sound of music
In a September 2018 paper, Japanese data scientists Eita Nakamura of Kyoto University and Kunihiko Kaneko of the University of Tokyo found that Western classical music over the past several centuries has followed laws of evolution. This large-scale study has implications for the understanding of other cultural phenomena such as the evolution of language, fashion, and science. Evolution is an algorithmic process applied to populations in which certain traits are passed on to the next generation and others are culled. In music, as in art in general, this involves passing on past traditions while incorporating new features. Indeed, algorithmic music composition has a long-standing history, as in Chinese wind chimes, Greek wind-powered Aeolian harps, or the Japanese suikinkutsu. Iannis Xenakis used Markov chains, which select each new fragment of music according to probabilities that depend on the fragment before it, for his 1958 composition Analogique.
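To make the Markov chain idea concrete, here is a minimal sketch of a first-order chain over note names. The transition table, the note set, and the function name `generate_melody` are invented for illustration; they are not drawn from Xenakis's score or from any particular library.

```python
import random

# Hypothetical transition table: for each current note, the notes that may
# follow it and the probability of each. The values are illustrative only.
TRANSITIONS = {
    "C": (["D", "E", "G"], [0.5, 0.3, 0.2]),
    "D": (["C", "E"], [0.4, 0.6]),
    "E": (["D", "G"], [0.7, 0.3]),
    "G": (["C"], [1.0]),
}


def generate_melody(start, length, seed=None):
    """Walk the chain: each next note depends only on the current note."""
    rng = random.Random(seed)  # a seed makes the walk reproducible
    melody = [start]
    while len(melody) < length:
        choices, weights = TRANSITIONS[melody[-1]]
        melody.append(rng.choices(choices, weights=weights, k=1)[0])
    return melody


if __name__ == "__main__":
    print(generate_melody("C", 8, seed=0))
```

Changing the weights changes the character of the output without changing the procedure, which is what makes the approach algorithmic rather than hand-composed.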
I see in Magenta
Google Magenta’s NSynth takes this a step further and uses deep learning to aid music makers in their composition. NSynth specifically aims to use deep neural networks to provide artists with a vast array of instrument sounds (organized by instrument class, genre and complexity). A curious musician could then answer a question such as “What do you get when you cross a piano with a flute?” Being able to do so pushes the evolution of music further and provides artists with a wide range of tools through which they can expand the existing global discography. Algorithmic composition has also been shown to be helpful in addressing composer’s block, arguably every artist’s worst nightmare.
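One way to picture "crossing" two instruments is as a blend of the numeric representations a network has learned for each sound. The sketch below assumes each instrument is already summarized as a short vector; the vectors, their length, and the function `blend_embeddings` are placeholders for illustration and are not Magenta's actual API.

```python
def blend_embeddings(a, b, mix):
    """Linearly interpolate between two equal-length vectors.

    mix=0.0 returns a, mix=1.0 returns b, and values in between
    give a weighted blend of the two.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be between 0 and 1")
    return [(1.0 - mix) * x + mix * y for x, y in zip(a, b)]


if __name__ == "__main__":
    # Placeholder vectors standing in for learned piano and flute embeddings.
    piano = [0.0, 2.0, 1.0]
    flute = [2.0, 4.0, 1.0]
    print(blend_embeddings(piano, flute, 0.5))
```

In a real system a decoder network would then turn the blended vector back into audio; that step is omitted here.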
The main challenge that the team at Magenta faces is that available datasets are small, biased and not freely available, largely due to copyright restrictions. To solve this in the short term, the team is building on the Free Music Archive (FMA), a web-based repository, and using the existing AudioSet concept ontology to classify specific sounds. This ontology (or knowledge management structure) was originally motivated by a lack of large-scale annotated audio data for scientific research purposes and is derived from YouTube videos, with the goal of providing a testbed for identifying acoustic events. Lastly, the team has also conducted crowd-sourced annotation using the CrowdFlower platform to correct and verify the labels of clips identified as likely positives. The use of control examples at this stage is especially important in weeding out contributors whose accuracy drops below a certain threshold. Should they consistently answer obvious questions wrongly (such as labelling a piano sample as a trumpet), they are no longer eligible to take part in the exercise.
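The control-example check described above can be sketched as a simple accuracy filter. The data layout, the 0.7 threshold, and the function name `trusted_contributors` are assumptions made for this example; the source does not state the actual threshold or how the platform stores responses.

```python
from collections import defaultdict


def trusted_contributors(responses, gold_labels, min_accuracy=0.7):
    """Return contributors whose accuracy on control clips meets the threshold.

    responses: iterable of (contributor_id, clip_id, label) tuples.
    gold_labels: dict mapping control clip_id to its known correct label.
    Contributors who answered no control clips are left out, since their
    accuracy cannot be estimated.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for contributor, clip, label in responses:
        if clip in gold_labels:  # only control clips count toward accuracy
            seen[contributor] += 1
            if label == gold_labels[clip]:
                correct[contributor] += 1
    return {c for c in seen if correct[c] / seen[c] >= min_accuracy}


if __name__ == "__main__":
    gold = {"clip1": "piano", "clip2": "trumpet"}
    answers = [
        ("ann", "clip1", "piano"),
        ("ann", "clip2", "trumpet"),
        ("bob", "clip1", "trumpet"),  # labels a piano sample as a trumpet
        ("bob", "clip2", "trumpet"),
    ]
    print(trusted_contributors(answers, gold))
```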
However, given that the AudioSet collection is derived from YouTube videos, there are no guarantees on the legality of licensing, sharing, and archiving the content. As such, the use of the content is currently extremely limited. Moreover, most of the content comes from solo performances, making it difficult at present to model and evaluate ensemble performances. Nevertheless, this work will serve as a baseline model that the team can build on. In particular, they aim to focus on identifying the classes that correspond to musical instruments, resulting in a set of more than 70 relevant classes. For the sake of coverage, these are merged into “instruments”: for example, “Acoustic Guitar”, “Electric Guitar”, and “Tapping (guitar technique)” become guitar, while “Cello” and “Violin” remain distinct. The medium- to long-term goal of the team is to iteratively refine the concepts as the dataset grows and the acoustic model improves. Another novel approach they aim to use in designing their instrument dataset going forward is to consider instruments outside the current ‘vocabulary’ through additional crowd-sourcing, semi-supervised learning and incremental evaluation.
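The merging of fine-grained classes into instruments can be pictured as a lookup table. The five entries below are the examples named in the text; the dictionary name `MERGE_MAP`, the function `to_instrument`, and the choice to return `None` for classes that are not instruments are assumptions for illustration.

```python
# Mapping from fine-grained ontology classes to merged instrument labels.
# Only the examples mentioned in the text are listed; the full set of
# 70-plus classes is not reproduced here.
MERGE_MAP = {
    "Acoustic Guitar": "guitar",
    "Electric Guitar": "guitar",
    "Tapping (guitar technique)": "guitar",
    "Cello": "cello",
    "Violin": "violin",
}


def to_instrument(label):
    """Return the merged instrument for a class, or None if it is not mapped."""
    return MERGE_MAP.get(label)


if __name__ == "__main__":
    for name in ["Electric Guitar", "Cello", "Dog bark"]:
        print(name, "->", to_instrument(name))
```

Keeping the mapping in data rather than in code fits the stated goal of refining the concepts iteratively: adding an instrument is a one-line change.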
50 shades of audio
I would suggest that management in the medium term also look into different deep learning techniques that can be used for broader ethnomusicology. A risk of the current approach is that the dataset and instruments used are limited to those dominant in the West. As Magenta thinks about crowdsourcing, it could look at other models such as Vocaloids, software voicebanks that have grown over the past few years through collaborative content creation. Hatsune Miku, one of the Vocaloid personas, has performed sold-out concerts. In the same vein of thinking outside the box, I believe products such as NSynth can be extended into other cultural fields. It has been argued that music could be older than language. Training a deep neural network on a dead or dying language, for example, could be incredibly helpful in preserving the rich cultural diversity we currently enjoy (half of the approximately 6000 languages in the world today are projected to go extinct).
Interesting questions remain for Magenta and artists in general. To what degree can “art” be drawn from algorithmic composition based on computing power? What would the role of human composers look like in the future? How does the industry approach copyright law and proprietary rights in this context?
 MIT Technology Review, “Data mining reveals the hidden laws of evolution behind classical music,” Sept. 28, 2018, at https://www.technologyreview.com/s/612194/data-mining-reveals-the-hidden-laws-of-evolution-behind-classical-music
 Nakamura, E. and Kaneko, K. (2018), “Statistical Evolutionary Laws in Music Styles,” available at https://arxiv.org/abs/1809.05832.
 Schulkin, J., and Raglan, G. (2014), “The Evolution of Music and Human Social Capability,” available at https://www.frontiersin.org/articles/10.3389/fnins.2014.00292/full.
 Giuseppe Bandiera, Oriol Romani Picas, Hiroshi Tokuda, Wataru Hariya, Koji Oishi, and Xavier Serra. Good-sounds.org: A framework to explore goodness in instrumental sounds. In Proceedings of the 17th International Society for Music Information Retrieval Conference, pages 414–419, 2016.
 Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference, pages 591–596, 2011.
 Rachel M Bittner, Justin Salamon, Mike Tierney, Matthias Mauch, Chris Cannam, and Juan Pablo Bello. MedleyDB: A multitrack dataset for annotation intensive MIR research. In ISMIR, volume 14, pages 155–160, 2014.
 Juan J Bosch, Jordi Janer, Ferdinand Fuhrmann, and Perfecto Herrera. A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals. In ISMIR, pages 559–564, 2012.
 Mark Cartwright, Ayanna Seals, Justin Salamon, Alex Williams, Stefanie Mikloska, Duncan MacConnell, E Law, J Bello, and O Nov. Seeing sound: Investigating the effects of visualizations and complexity on crowdsourced audio annotations. Proceedings of the ACM on Human-Computer Interaction, 1(1), 2017.
 Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. The MIT Press, 1st edition, 2010.