In recent years, significant advances in music-generating deep learning models have revolutionized
the process of composing harmonically coherent music. One notable innovation is
the Music Transformer, a neural network that models context and tracks
relationships across sequential input. By leveraging transformer-based
architectures designed for sequential tasks with long-range dependencies, the
Music Transformer captures self-reference through relative self-attention [10] and excels at continuing
musical themes encountered during training. This attention-based model offers the advantage
of being straightforward to train and capable of generating musical performances with long-term
structure, as demonstrated by Google Brain's implementation [6]. In this study, I
will explore various instances and applications of the Music Transformer,
highlighting its ability to generate symbolic musical structures efficiently.
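
To make the attention mechanism concrete, the sketch below implements a single head of causal self-attention with relative position representations in PyTorch, in the spirit of Shaw et al. [10] and the Music Transformer [6]. It is a minimal illustration under stated assumptions, not the authors' code: it uses a direct gather over relative-distance embeddings rather than the paper's memory-efficient "skewing" computation, and all tensor names and sizes are mine.

```python
# Minimal sketch of single-head relative self-attention (after [10], [6]).
# Dimension names and the gather-based indexing are illustrative assumptions.
import torch
import torch.nn.functional as F

def relative_self_attention(x, w_q, w_k, w_v, rel_emb):
    """x: (L, d_model); rel_emb: (2L-1, d_head), one learned embedding
    per relative distance in [-(L-1), L-1]."""
    L = x.shape[0]
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # (L, d_head) each
    content = q @ k.T                             # content-based scores
    rel = q @ rel_emb.T                           # (L, 2L-1): query vs. every distance
    # Entry (i, j) selects the embedding for relative distance j - i.
    idx = torch.arange(L).unsqueeze(0) - torch.arange(L).unsqueeze(1) + L - 1
    position = rel.gather(1, idx)                 # (L, L) position-based scores
    scores = (content + position) / q.shape[-1] ** 0.5
    mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))  # causal mask for generation
    return F.softmax(scores, dim=-1) @ v

# Toy usage: 8 timesteps, model and head width 16.
L, d = 8, 16
x = torch.randn(L, d)
w = lambda: torch.randn(d, d) / d ** 0.5
out = relative_self_attention(x, w(), w(), w(), torch.randn(2 * L - 1, d))
print(out.shape)  # torch.Size([8, 16])
```

The position term is what lets the model reuse a learned notion of "n steps back" anywhere in the piece, which is why relative attention helps with repeated motifs and long-term structure.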
Additionally, I will examine another state-of-the-art model, TonicNet [9],
which features a layered architecture combining GRU and self-attention mechanisms.
TonicNet is particularly strong at generating music with
long-term structure, as evidenced by its performance on both objective
metrics and subjective evaluations. To improve TonicNet further, I will
evaluate its performance using the same metrics and propose modifications to
its hyperparameters, architecture, and dataset.
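
As a rough picture of that layered design, the sketch below stacks an embedding layer, a GRU, and a causal self-attention layer in PyTorch. It is a hypothetical illustration of the GRU-plus-attention pattern described above, not Peracha's TonicNet implementation [9]: the layer sizes, ordering, residual connection, and dropout rate are all assumptions for the sake of a runnable example.

```python
# Illustrative GRU + self-attention stack (a TonicNet-style pattern, not [9]'s code).
import torch
import torch.nn as nn

class GRUSelfAttention(nn.Module):
    def __init__(self, vocab_size, d_model=256, n_heads=4, n_gru_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Recurrent backbone; dropout [11] between GRU layers (illustrative rate).
        self.gru = nn.GRU(d_model, d_model, num_layers=n_gru_layers,
                          batch_first=True, dropout=0.3)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        h = self.embed(tokens)
        h, _ = self.gru(h)                        # recurrent pass over the sequence
        L = tokens.shape[1]
        causal = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
        a, _ = self.attn(h, h, h, attn_mask=causal)  # causal self-attention
        return self.out(h + a)                    # residual sum; per-step logits

# Toy usage: batch of 2 sequences of 32 tokens over a 100-symbol vocabulary.
model = GRUSelfAttention(vocab_size=100)
logits = model(torch.randint(0, 100, (2, 32)))    # shape (2, 32, 100)
```

The intuition behind such hybrids is that the GRU supplies a strong local, order-sensitive summary while the attention layer lets distant positions interact directly, which is the combination credited with TonicNet's long-term structure.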
References
[1] Bhagat, D., Bhatt, N., & Kosta, Y. (2012). Adaptive Multi-Rate Wideband Speech Codec Based on CELP Algorithm: Architectural Study, Implementation & Performance Analysis. In 2012 International Conference on Communication Systems and Network Technologies (pp. 547-551). IEEE. https://doi.org/10.1109/CSNT.2012.124
[2] Briot, J.-P., Hadjeres, G., & Pachet, F.-D. (2019). Deep Learning Techniques for Music Generation—A Survey. https://doi.org/10.48550/arXiv.1709.01620
[3] Chu, H., Kim, J., Kim, S., Lim, H., Lee, H., Jin, S., Lee, J., Kim, T., & Ko, S. (2022). An Empirical Study on How People Perceive AI-Generated Music. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management (pp. 304-314). Association for Computing Machinery. https://doi.org/10.1145/3511808.3557235
[4] Dua, M., Yadav, R., Mamgai, D., & Brodiya, S. (2020). An Improved RNN-LSTM Based Novel Approach for Sheet Music Generation. Procedia Computer Science, 171, 465-474. https://doi.org/10.1016/j.procs.2020.04.049
[5] Hsu, J.-L., & Chang, S.-J. (2021). Generating Music Transition by Using a Transformer-Based Model. Electronics, 10, Article 2276. https://doi.org/10.3390/electronics10182276
[6] Huang, C.-Z. A., Vaswani, A., Uszkoreit, J., Shazeer, N., Hawthorne, C., Dai, A., Hoffman, M. D., & Eck, D. (2019). Music Transformer: Generating Music with Long-Term Structure. https://magenta.tensorflow.org/music-transformer
[7] Jagannathan, A., Chandrasekaran, B., Dutta, S., Patil, U. R., & Eirinaki, M. (2022). Original Music Generation Using Recurrent Neural Networks with Self-Attention. In 2022 IEEE International Conference on Artificial Intelligence Testing (AITest) (pp. 56-63). IEEE. https://doi.org/10.1109/AITest55621.2022.00017
[8] Ji, S., Luo, J., & Yang, X. (2020). A Comprehensive Survey on Deep Music Generation: Multi-Level Representations, Algorithms, Evaluations, and Future Directions. https://doi.org/10.48550/arXiv.2011.06801
[9] Peracha, O. (2019). Improving Polyphonic Music Models with Feature-Rich Encoding. https://doi.org/10.48550/arXiv.1911.11775
[10] Shaw, P., Uszkoreit, J., & Vaswani, A. (2018). Self-Attention with Relative Position Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) (pp. 464-468). Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-2074
[11] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15, 1929-1958.
[12] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. https://doi.org/10.48550/arXiv.1706.03762