PHASE 1
feature-extract1.ck & genre-classify1.ck use only the centroid value to classify the songs. When testing Gustav Holst's The Planets, the classical metric stayed at zero almost the entire time, yet by humming a constant note I could keep the classical metric above 0.5 indefinitely. x-validate.ck shows that the guessing was nearly random across the genres (around a 17.5% success rate over the 10 genres, where pure chance would be 10%).
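For context, extracting the centroid in ChucK takes only a few lines; here is a minimal sketch of the idea (my reconstruction, not the actual feature-extract1.ck):

    // minimal centroid-only extraction sketch (my reconstruction of the
    // idea behind feature-extract1.ck, not its actual code)
    adc => FFT fft =^ Centroid centroid => blackhole;
    4096 => fft.size;
    Windowing.hann( fft.size() ) => fft.window;

    while( true )
    {
        // compute one spectral frame and read out the centroid value
        centroid.upchuck();
        <<< "centroid:", centroid.fval(0) >>>;
        fft.size()::samp => now;
    }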
feature-extract2.ck & genre-classify2.ck use the centroid, flux, and rms values to classify the songs. x-validate.ck shows that the guessing improved substantially over the centroid-only classifier, with around a 30% success rate. Testing classical music against humming, though, this model still performed fairly poorly: I could get higher classical readings from a sustained hum than from a symphony playing through my phone.
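The multi-feature versions route several analyzers into one frame through a FeatureCollector; a hedged sketch of that pattern (assuming the standard ChAI-style chain, not the actual feature-extract2.ck):

    // multi-feature extraction sketch using a FeatureCollector (my
    // reconstruction of the pattern, not the actual feature-extract2.ck)
    adc => FFT fft => blackhole;          // keep audio flowing into the FFT
    FeatureCollector combo => blackhole;  // gathers the features into one frame
    fft =^ Centroid centroid =^ combo;
    fft =^ Flux flux =^ combo;
    fft =^ RMS rms =^ combo;

    4096 => fft.size;
    Windowing.hann( fft.size() ) => fft.window;

    while( true )
    {
        // one frame: [centroid, flux, rms]
        combo.upchuck() @=> UAnaBlob frame;
        <<< frame.fval(0), frame.fval(1), frame.fval(2) >>>;
        (fft.size()/2)::samp => now;      // hop by half a window
    }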
feature-extract3.ck & genre-classify3.ck use the centroid, flux, rms, and the first 7 mfcc coefficients to classify the songs. x-validate.ck shows that the guessing improved substantially over the classifiers without mfcc coefficients, with around a 43% success rate. This was the first model to pass my non-robust classical test: when playing Holst, the classifier read above 0.5 for classical for the duration of the song. It was also the first model to hold at or near 1 for classical during silence; the other models' guesses were spread across the other genres during silence.
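Folding mfcc into that chain is a small extension (same assumptions as the sketch above):

    // appended to the feature chain in the previous sketch:
    fft =^ MFCC mfcc =^ combo;   // mel-frequency cepstral coefficients
    7 => mfcc.numCoeffs;         // keep the first 7, for a 10-dim frame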
feature-extract4.ck & genre-classify4.ck use the centroid, flux, rms, sfm, Kurtosis, ZeroX, XCorr, AutoCorr, Chroma, and RollOff values plus the first 7 mfcc coefficients to classify the songs. This was the first feature extraction that was clearly more computationally expensive than the previous models: it took much longer to extract all of these values, though classification was just as fast once the model was built. It did slightly worse on my own audio samples than the previous model (without sfm, Kurtosis, ZeroX, XCorr, AutoCorr, Chroma, and RollOff), though that was hard to tell just by observing. However, x-validate.ck shows that the guessing was FAR worse than any of the other models, averaging a classification rate below 10%, which is worse than random guessing. This appears to be the result of overfitting.
feature-extract5.ck & genre-classify5.ck use the centroid, flux, rms, and 40 mfcc coefficients to classify the songs. Despite trading 7 features for 33 extra mfcc coefficients, this was less computationally expensive during extraction. x-validate.ck shows diminishing returns compared to the classifier with only 7 mfcc coefficients: a classification rate around 42%, marginally lower. This model performed fairly well against my own audio samples.
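For reference, the success rates above come from cross-validation: repeatedly holding out examples, training on the rest, and scoring the held-out guesses. A rough sketch of one flavor of that idea (leave-one-out with a hand-rolled nearest-neighbor so it stays self-contained; not the actual x-validate.ck, which presumably uses the ChAI tools):

    // leave-one-out cross-validation sketch; features[i] is example i's
    // feature vector, labels[i] its genre index (assumed layout, not the
    // actual x-validate.ck code)
    fun float dist2( float a[], float b[] )
    {
        0.0 => float d;
        for( 0 => int i; i < a.size(); i++ )
        {
            a[i] - b[i] => float diff;
            diff * diff +=> d;
        }
        return d;
    }

    fun float leaveOneOut( float features[][], int labels[] )
    {
        features.size() => int N;
        0 => int correct;
        for( 0 => int held; held < N; held++ )
        {
            // find the nearest neighbor among all other examples
            -1 => int best;
            for( 0 => int i; i < N; i++ )
            {
                if( i == held ) continue;
                if( best < 0 || dist2( features[held], features[i] )
                              < dist2( features[held], features[best] ) )
                    i => best;
            }
            // a correct guess is a neighbor with the same genre label
            if( labels[best] == labels[held] ) correct++;
        }
        return ( correct $ float ) / N;
    }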
PHASE 2
Code: mosaic-extract-xanderphase2.ck
mosaic-synth-mic-xanderphase2.ck
Instructions:
Run mosaic-extract-XanderPhase2.ck:"data/combo.txt":"model1.txt" --silent
Then run mosaic-synth-mic-XanderPhase2.ck:"model1.txt"
Make noise
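If you're launching from the command line, the two steps look roughly like this (assuming the chuck executable is on your PATH; argument quoting may vary by shell):

    chuck --silent mosaic-extract-XanderPhase2.ck:data/combo.txt:model1.txt
    chuck mosaic-synth-mic-XanderPhase2.ck:model1.txt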
Acknowledgements: Avril Lavigne & Vivaldi
Overall, the biggest issue I faced was my mic picking up the laptop's speaker output and going into a feedback loop. This should be avoidable in Phase 3, as I don't intend to use the mic for input.
PHASE 3
Creative Statement: I'd like to use the direct system audio from a live game of Super Smash Bros. Melee (SSBM) as the input for the mosaic, rather than pre-recorded audio. I still need to tune the audio inputs to sound better for the live game. It would also be nice to work in some controller inputs (e.g., the D-pad for taunting sounds); see the sketch below.
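The mic-free audio routing can be handled outside ChucK by pointing chuck at the VB-Audio cable as its input device (chuck --probe lists the available devices; --adc:N selects one). For the controller side, here is a hypothetical sketch using ChucK's Hid joystick support; the button number and taunt sample are placeholders of mine, and a D-pad may arrive as hat/axis messages rather than buttons depending on the adapter:

    // hypothetical D-pad taunt trigger (button number and sample are
    // my placeholders, not from the actual project code)
    Hid hi;
    HidMsg msg;

    // open joystick 0 (assuming the controller adapter shows up as one)
    if( !hi.openJoystick( 0 ) ) me.exit();
    <<< "listening on:", hi.name() >>>;

    // taunt sample player
    SndBuf taunt => dac;
    "special:dope" => taunt.read;    // built-in stand-in sound
    taunt.samples() => taunt.pos;    // park at end so it doesn't auto-play

    while( true )
    {
        hi => now;                   // wait for HID events
        while( hi.recv( msg ) )
        {
            // button 12 is a stand-in for wherever the D-pad lands
            if( msg.isButtonDown() && msg.which == 12 )
                0 => taunt.pos;      // retrigger the taunt sound
        }
    }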
Code: extractssbm.ck
synthssbm.ck
extractssbmtron.ck
synthssbmtron.ck
Acknowledgements: Lots of starter code logic, use of the VB-Audio Virtual Audio Cable software, some of the TRON: Legacy soundtrack (Daft Punk), and copyright-free animal sounds from YouTube.
Featured Artist Reflection
In a project focused on system tuning, during a semester where I hardly have time to boot up Melee, I decided I might kill two birds with one stone by merging the two. In a way, this project was an excuse to play Melee: every parameter change meant I got to start a new game and work out some frustration by beating up a computer. How fitting.

But beyond this shallow excuse to play a game I love, the aesthetic motivation of this project was to revitalize a game almost as old as I am. For the past 22 years, Super Smash Bros. Melee has been untouched: no patches, no graphics updates, and the same soundtrack it shipped with back in 2001. It's a beautiful time capsule of video game development at the dawning of the millennium. And as much as I love everything about its vintage feel, nowadays when I play Melee, I usually turn off the sound and plug in music of my own. Usually I'll be listening to something fitting the mood: something upbeat if I'm taking it seriously, or something relaxing if I'm just trying to wind down after a long day. But sometimes my playlist ends and a new song comes on that wrecks the vibe. How can I avoid this? Well, what if the soundtrack adapted to the gameplay?

This idea excited me and motivated me to spend hours tuning the extraction and synthesis for different datasets. The first, more comical idea I came up with was to replace the character sounds with realistic animal noises. This system was tuned to short feature vectors and was highly responsive to the character inputs. However, it's definitely a bit hectic and not something you could play with for hours on end. So I made a second version, trained on the Tron soundtrack. I added several features for extraction and similarity retrieval, which helped avoid pulling the same few portions of the song, and I allowed for more layering of voices to get a smoother, ethereal effect. You can watch the videos to see how different the mood is between these two models.
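For the curious, the "layering of voices" mentioned above boils down to round-robin crossfading between grain players; a hypothetical sketch (voice count, fade time, and source sound are mine, not from synthssbmtron.ck, which would read the Tron tracks instead):

    // crossfaded grain voices: while one voice fades out, the next fades
    // in, smoothing transitions instead of hard-cutting between snippets
    3 => int NUM_VOICES;
    SndBuf grains[NUM_VOICES];
    Envelope envs[NUM_VOICES];

    for( 0 => int v; v < NUM_VOICES; v++ )
    {
        grains[v] => envs[v] => dac;
        100::ms => envs[v].duration;             // fade-in/out time
        "special:dope" => grains[v].read;        // built-in stand-in sound
        grains[v].samples() => grains[v].pos;    // silent until triggered
    }
    0 => int next;

    // jump the incoming voice to samplePos, fading it in while the
    // previous voice fades out
    fun void trigger( int samplePos )
    {
        (next + 1) % NUM_VOICES => int incoming;
        samplePos => grains[incoming].pos;
        envs[incoming].keyOn();
        envs[next].keyOff();
        incoming => next;
    }

    // demo: hop to a random spot twice a second
    while( true )
    {
        trigger( Math.random2( 0, grains[0].samples() - 1 ) );
        500::ms => now;
    }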