Critical Response #1: MusicLM
Why do we want text-based music generation? What is MusicLM good for? Let's start by observing what MusicLM is good AT.
Listening to the provided examples, MusicLM demonstrates competence in producing eclectic compositions of highly specified genres and instrumentation. Where a musician might get caught in the logistical nightmare that is "A fusion of reggaeton and electronic dance music, with a spacey, otherworldly sound. Induces the experience of being lost in space, and the music would be designed to evoke a sense of wonder and awe, while being danceable" (MusicLM, 2023), MusicLM adds everything up like a simple math equation and spits out a shockingly accurate soundscape. While a musician might sit in existential dread, wondering how to effectively voice a piece with the given instruments, MusicLM excels at ham-fisting into a song instruments that have no business being together. And that's great, really. MusicLM isn't inhibited by the pressure to make something that sounds good; that's not its job. Its job is to make something that sounds as described. And in this category of music vomit, MusicLM is in some ways more competent, or at least faster, than we are as human composers.
Before diving into the question "is this a good idea?", I want to briefly remark on the shortcomings of MusicLM. Looking at the story-mode feature, I would argue MusicLM is strictly incompetent at transitions, moving between prompts with the subtlety of a drunk roommate climbing into his bed, trying not to wake you up. And the vocals leave a lot to be desired--but the researchers address this, along with lyric generation, as a clear area for improvement in future models--so I won't dig in too hard here. Lastly, the generated music was fairly metronomic in tempo, which was great for electronic or arcade-style music but felt a bit robotic in genres like jazz.
Now, the question of the hour: "is this a good idea?" I surprise myself in concluding yes. As someone who has played classical music for most of my life, it took me a while to even come around to the creative merits of DJing and producing music without recording any live instruments. But when I consider the utility of text-based generative music, I see MusicLM strictly fulfilling the objective the researchers set out to achieve: "assist[ing] humans with creative music tasks." As it stands, MusicLM isn't coming for anyone's job. Rather, it can serve as a rapid, efficient tool for composers to experiment with instrumentation and draw genre-based inspiration. It would still take a talented musician to make good music out of the inspiration derived from a tool like MusicLM.
I think a natural follow-up to this strict evaluation of MusicLM's utility is to ask whether it takes the fun out of composing. And while I think some art forms, like painting and live music performance, would lose their lustre if automated, having more tools to compose music just means we can get to playing and listening to the good music faster.
Critical Response #2: Power to the People & Humans in the Loop
I'll be the first to admit that prior to this class, I maintained a blissful ignorance of the progress of AI and would instinctively tune out my CS colleagues any time buzzwords like 'machine learning' came up in conversation. In fact, I would usually leave the room or actively change the subject. I mention this to clarify that prior to this quarter, I had invested little to no time in the applications of AI, let alone the brainpower to ponder the great ethical dilemmas that human interaction with AI can present. My opinions on AI were, if not completely substanceless, generally negative. I would likely have espoused an opinion along the lines of "it would honestly just be easier for all of us if AI and machine learning weren't around, so we wouldn't have to worry about them." And to be fair, I still don't totally disagree with that notion, but I am coming around to understanding how AI could positively or negatively impact the society we've set up for ourselves.
I think a reasonable take on AI is that it's like a tool or weapon that we, as a society, have to wield responsibly. A weapon can be used with good intentions or nefarious ones, and the quality of the intentions doesn't necessarily dictate whether the results are positive or negative; improper use of a tool can cause damage. That's where the greatest nuance lies--in this definition of proper use. But before happily moving forward with the framing of AI as a tool that must be wielded properly to do good, I want to challenge this notion altogether. Are all tools useful? Do all tools have the potential to be used for good, or are some doomed to a negative impact? Let's get nuclear--literally. The invention of nuclear weaponry was one of the greatest scientific achievements of the 20th century, yet I think if you polled most of the globe, the consensus would be that the bombs had a net negative global impact. Of course, my reference to nuclear weaponry is not as shallow as "look at this technology that was misused and created a great deal of harm," because that's an easy claim to make, and one could argue that the harm resulted from mismanagement of the tools. What I find most fascinating about the narrative of nuclear weaponry is the way it all ended--underground. Fifty years after their invention, even with the mistakes of the past to learn from, we decided we couldn't think of a productive use for nuclear weapons--so we threw them away (disarmed and buried them).
I understand I've gotten away from the topic of AI a bit, but I've become fascinated with this question: does AI--in the long term (of course, short-term gains could be made)--even possess the potential to do good under perfect management? Or will AI come back to do harm no matter how we wield it?
Possible Applications of Interactive Machine Learning:
1. Positive reinforcement machine. I think this could be a very simple language model that acts as a supportive friend and gives positive feedback; it molds to the user by asking the user to rate how the response made them feel.
2. JamBot. (Inspiration from Soohyun's Featured Artist). I think AI could learn to jam with live musicians with either binary or scaled feedback from the artists on how well it performed during a given session.
3. Music Instructor. A model could be trained on professional (or simply higher level) musicians and compare students' performance to the database to give feedback; would need music instructors to supervise efficacy of feedback for early learning.
4. Personal Mood-Light. Displays different lighting colors/patterns depending on mood (facial expression recognition); would adapt to the user by requesting feedback on whether the lighting seemed accurate and whether the correct mood was detected.
5. Personal Soundtrack. Similar interaction and functionality as above, uses facial expression to track mood and plays a soundtrack like you're the protagonist in a movie--molds to user by requesting feedback on correct mood ID and song choice.
6. Personal Soundtrack 2.0. Same concept, this time for your car. Could be really cool for a manual car with gas pedal, brakes, and shifting inputs to influence a badass driving soundtrack.
7. Personalized Coaching. AI-generated training schedules and workout plans already exist for running/endurance sports like triathlon but I think these could be improved by requesting perceived effort feedback from the user in addition to scraping as much hard data (heart-rate, pace, etc.) from the workout.
8. Pet Companion. Robotic pet companion that responds to user expression and provides comfort--similar feedback request for mood detection success as other ideas.
9. Shock Training. Electronic stimulation applied to force muscle contraction at all phases of the pedal stroke (cycling), learns from video playback of stroke smoothness.
10. High Five Machine. Learns from iteration to maximize decibel level of high fives with users.
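Most of these ideas share the same interactive loop: the system makes a choice, the user rates it, and the system shifts toward what the user rated highly. As a minimal sketch of that pattern (the option names, ratings, and epsilon-greedy strategy here are hypothetical illustrations, not drawn from any of the projects above):

```python
import random

class FeedbackLearner:
    """Epsilon-greedy learner: present an option, collect a rating, update."""

    def __init__(self, options, epsilon=0.1):
        self.options = list(options)
        self.epsilon = epsilon                      # how often to explore
        self.counts = {o: 0 for o in self.options}  # ratings seen per option
        self.values = {o: 0.0 for o in self.options}  # running mean rating

    def choose(self):
        # Mostly exploit the best-rated option; occasionally explore.
        if random.random() < self.epsilon:
            return random.choice(self.options)
        return max(self.options, key=lambda o: self.values[o])

    def update(self, option, rating):
        # Incremental mean of the user's ratings for this option.
        self.counts[option] += 1
        n = self.counts[option]
        self.values[option] += (rating - self.values[option]) / n

# Hypothetical mood-light example: a simulated user who prefers cool blue.
learner = FeedbackLearner(["warm amber", "cool blue", "soft green"])
user_rating = {"warm amber": 2.0, "cool blue": 4.5, "soft green": 3.0}

# Seed each option with one rating, then run the interactive loop.
for option in learner.options:
    learner.update(option, user_rating[option])
for _ in range(200):
    pick = learner.choose()
    learner.update(pick, user_rating[pick])  # user "rates" the choice

best = max(learner.values, key=learner.values.get)
print(best)  # → cool blue
```

The same loop generalizes across the list: swap lighting colors for riffs (JamBot) or song choices (Personal Soundtrack), and swap the explicit rating for whatever feedback signal the user can give (binary, scaled, or inferred).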
Critical Response #3: Message in a Bottle
This message is intended for those who, like myself, have spent more energy attempting to stay UNaware of the progress of artificial intelligence, machine learning, or any topic that falls under the comp sci umbrella. For those with supreme apathy towards AI and a general distaste for computer programming, this is the class for you.
Having spent my entire undergraduate degree avoiding learning any programming beyond the MATLAB necessary for my mechanical engineering classes, I enrolled in this course because I thought that bringing music into the picture might help motivate me to learn a bit more about programming, as computer science continues to become a ubiquitous part of career and daily life. In a sense, music was the spoonful of sugar to help the medicine go down. And if this assumption had been correct, I probably would still have really enjoyed this class. But instead of sugar-laden cough syrup, I think CS 470 could be better described as ayahuasca--or whatever hip mind-bending psychedelic the kids are doing these days.
What I mean by this absurd analogy is that, more than simply providing a fun medium to learn useful programming tools, CS 470 has drastically reframed the way I look at programming. Previously, 'artificial intelligence' and 'machine learning' were just annoying buzzwords that my friends used in job interviews to make their projects seem more impressive. I found these topics intimidating and assumed they were only really relevant to the most cutting-edge applications in computing or design. But as we have implemented low-tech examples of machine learning in our projects, I now see ML as a tool analogous to a mill or a lathe. You can use a lathe for tightly toleranced aerospace components, and you can also use a lathe to make a fun wavy cylindrical handle for an ergonomic spatula. It's just one framework that can be used to make a product or an art piece. Part of this reframing has been thanks to the nature of the projects we've worked on. My experience with programming has always been as one step in an integrated process: I used MATLAB to graph the solutions to analytical problems I solved by hand, or Arduino to control a physical prototype that had been designed in CAD and assembled in the physical world. Working on projects purely based in the digital world has helped me draw stronger parallels to physical product design than I had made while working on products with integrated software.
I really gelled with the sentiment Ge put forward early in the class: that he had become an engineer not to solve problems, but to make beautiful things. This has always been the lane I've stayed in with physical product design, and taking CS 470 has helped me draw very clear parallels to digital product design. Not everyone responds to drugs the same way, and I imagine not everyone will walk away from CS 470 with the same takeaways I did, but I can guarantee CS 470 will be a trip worth taking. Don't do drugs, but this is me peer-pressuring you to take CS 470.