Hello, this is a mini guide on how to train a model for use with the rt-neural-generic LV2 plugin contained in the aidadsp-lv2 bundle.
Constraints: we will model a guitar amp and the resulting inference will run at 48 kHz.
Connect your audio card so that, ideally, you can stream audio into the guitar input and record directly at the speaker output (with a suitable adapter) or using a reactive load. The output of the amplifier should never clip your sound card input.
Record both the BassMix_pcm_f32le_48000 and GuitarMix_pcm_f32le_48000 tracks you'll find here through the device.
On a Colab instance, or any other machine suitable for NN training (torch, keras), clone this repo, branch aidadsp_devel.
For convenience, rename the files you recorded (and the ones provided) in
Copy these files to the machine instance and invoke this script to prepare the Dataset for training.
Under the Results dir the model JSON file will be created, but it's in the wrong format. We can use the following script to adjust it:
python3 modelToKeras.py -l RNN-aidadsp-1
You can now copy model_keras.json from the Results folder, rename it to MyVeryFooModel.json, and deploy it on the target (Mod Dwarf, Aida DSP OS) in user-files/Model SIMs. WARNING: this path doesn't exist yet on Mod, see PRs.
This work is HEAVILY based on the work of @chowdsp and Alec Wright, who wrote the real-time inference library and the training scripts. I also want to mention GuitarML (Keith Bloemer): the first version of my plugin was basically a port of his NeuralPi. Please support them so that they can continue their work.
There is a galaxy of stuff that I've omitted, but it would have been a mess otherwise. If you have questions, I will do my best to answer.
I now have an overall idea of the process but still have a lot of reading to do.
The first questions that come to my mind are about the hardware needed to record the training (BassMix & GuitarMix) audio.
Sending the audio to the guitar amp input:
Can I stream the audio from the Dwarf's or DUO's Audio File Player plugin, or do I need to stream from a DAW?
I'm asking because I believe that using the MOD devices would eliminate the need for a ReAmp box to match the signal of my interface (DAW) to the guitar amp input.
Recording the guitar amp speaker out:
I find this one a bit trickier…
Both of my amps have at least two speaker outputs (4 ohms).
This way I could have a cabinet connected to one of the outputs while recording the audio from the other output, without damaging the amp (I don't have a reactive load). Is this possible?
If it is, I imagine I will need to attenuate the speaker output signal to line level to be able to send it to my interface (DAW) and record it. Correct?
I've explained this badly. By speaker out I mean using an adapter (attenuator) to provide a suitable input level to the recording interface. If you have already done this successfully with the 2nd output of your amp, then it's fine. Another way is to use a reactive load, so that you don't need to play the amp loud to achieve the desired tone. A reactive load also should not damage your amp.
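As a rough sanity check (not a wiring guide), you can estimate how much attenuation is needed from the amp's rated power and the load impedance. The numbers below are illustrative assumptions, not measurements of any specific amp or interface:

```python
import math

# Hypothetical example: a 50 W amp into an 8 ohm load, recorded by an
# interface whose line input is comfortable around 1.228 Vrms (+4 dBu).
amp_power_w = 50.0
load_ohms = 8.0
line_level_vrms = 1.228

# Voltage at the speaker terminals at full rated power: V = sqrt(P * R)
speaker_vrms = math.sqrt(amp_power_w * load_ohms)  # 20.0 Vrms

# Attenuation (in dB) needed to bring that down to line level
atten_db = 20 * math.log10(speaker_vrms / line_level_vrms)

print(f"speaker level: {speaker_vrms:.1f} Vrms")
print(f"attenuation needed: {atten_db:.1f} dB")  # roughly 24 dB
```

So for this hypothetical 50 W amp you'd need on the order of 24 dB of attenuation before the interface input, which is why a plain cable from the speaker jack is a bad idea.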
The input files are needed because the training works roughly like this:
1. the trainable weights are randomly initialized
2. feed the input file to the NN, run the forward pass, and obtain an output
3. calculate the loss between the predicted and the recorded output
4. repeat over and over until we detect that we are no longer learning (the loss is no longer diminishing)
This loop runs thousands of times.
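The steps above can be sketched in PyTorch. This is only a minimal illustration: `TinyRNN` is a stand-in for the LSTM models the plugin actually runs, the tensors stand in for the recorded input/output tracks, and the real training scripts use more elaborate losses (e.g. ESR) and data handling:

```python
import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    """Stand-in for the plugin's LSTM models; sizes are illustrative."""
    def __init__(self, hidden=16):
        super().__init__()
        self.rnn = nn.LSTM(1, hidden, batch_first=True)
        self.lin = nn.Linear(hidden, 1)

    def forward(self, x):
        y, _ = self.rnn(x)
        return self.lin(y)

# Fake stand-ins for the recorded input and target tracks: (batch, time, 1)
inp = torch.randn(1, 512, 1)
target = torch.randn(1, 512, 1)

model = TinyRNN()                      # 1. weights start out random
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(20):                 # 4. repeat until the loss stops falling
    pred = model(inp)                  # 2. forward the input through the net
    loss = loss_fn(pred, target)       # 3. loss between prediction and recording
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the real scripts the "stop" condition is usually early stopping on a validation split rather than a fixed step count.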
The difference with respect to the profiling technique, which uses test signals, is that here we're using real audio, so the nuances of the amplifier should be better captured. The above training guide can also be expanded to create a conditioned model. Those models let you train over one or more parameters. Usually you record various target files while changing the gain of the amp (e.g. 0, 25, 50, 75, 100), and then in my plugin you change parameter1 and you're changing the gain of the model... still need to work a bit on this...
Does it make sense to train with multiple input files and merge the result for more accuracy? Let's say we have an amp with only one parameter, gain, that can be changed between 1 and 100. If we record 100 files, one for each value, can they be "merged" in the training session? It's kind of like having 100 profiling sessions with a Kemper, one for each value, but getting 1 single final profile.
In theory we could. However, the resulting conditioned model already learns to use the gain in a continuous way (it doesn't go in steps). Regarding accuracy, there is probably a point (a number of gain settings) where the difference from the original amp is already low and it doesn't make sense to record additional files. A bit like "velocity layers" in soundfonts: the more the better, but there is a knee beyond which added layers don't improve the final sound much.
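One common way to build such a conditioned model (a sketch under assumptions, not the exact training-script code) is to feed the normalized knob setting as a second input channel alongside the audio, so a single network sees the whole gain range:

```python
import torch
import torch.nn as nn

# Sketch: the gain setting (normalized 0..1) rides alongside the audio as a
# second input channel, so input_size=2. Names and sizes are illustrative.
rnn = nn.LSTM(input_size=2, hidden_size=16, batch_first=True)
lin = nn.Linear(16, 1)

audio = torch.randn(1, 1024, 1)        # dry guitar input, (batch, time, 1)
gain = torch.full((1, 1024, 1), 0.75)  # e.g. this take was recorded at gain 75%

x = torch.cat([audio, gain], dim=-1)   # (batch, time, 2)
y, _ = rnn(x)
out = lin(y)                           # predicted amp output, (1, 1024, 1)
```

Because the gain value is just a continuous input, the trained network can be asked for in-between settings (say 0.6) it never saw exactly, which is why the model doesn't go in steps.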
Sent you that beer on GitHub. At the moment that's all I can do; the exchange rate between BRL and EUR is really high!
I don't have the equipment to record the audio properly to train models of my amps at the moment. I would need to buy at least a resistive load with a line level output (attenuator).
The price for a reactive load here in Brazil is just crazy expensive (around 450 EUR).
I have two left hands considering soldering skills.
But... considering the cost of that "Torpedo" reactive load around here, this project could make a lot of sense.
I just remembered that my Mesa-Boogie MKIII does have a direct out. It has been so long since I last looked at the rear panel that I forgot about it.
Will need the speaker connected though. I have a room that is acoustically treated and isolated enough so that I may not disturb the neighbors.
I can probably run the model training with the amp up loud during the day and leave the room for a couple of hours.
Here is the transcript of the manual:
“DIRECT Previously known as the SLAVE, this feature provides a variable strength signal right from the speaker jack This way better tone is supplied, all Effects and Reverb are included, and there is absolutely no loss of the Boogie’s tone when running from the Direct to a mixing board or another amplifier. (Many players will still prefer a microphone “listening” to their speaker coloration.) In some sophisticated set-ups, players run their Direct into their Effects Rack and then from the Effects into other, external amplifiers. But such a set-up cannot route the Effects output back into the original Boogie. Also note that a speaker or load resistor should be plugged into a Speaker jack when using the Direct. Load resistor value…though not critical … can change the overall tone. Suggested value: 8 ohms, 50 watts minimum. And note that this resistor will get quite hot when running the Boogie “up loud” for long periods.”
The other amp is a DV Mark 50II (solid state). This one I need to investigate further regarding its speaker outputs.
Nice find! Note that you don't need to keep the amplifier loud for 2 hrs. It will be loud for 6-7 mins, the time for the Bass and Guitar tracks to be recorded. Then, while the training is running, you can fool around with your amp and maybe find where it sounds its best. Or you can record multiple tracks, noting down the settings for each track. One day a guy from the future will use them with his quantum multi-effect device. Or maybe we can just add another guide for conditioned models. Let's think about it.
@madmaxwell Thank you for your work, the whole project is absolutely awesome!
I have one question regarding the training. Is it feasible to train the network at home (I have a GTX 2070 Super) or do you need something more powerful like the Google Colab GPUs? I have some experience with PyTorch and training NNs for medical purposes, so setting up a Docker container with the needed dependencies wouldn't be a problem.
I've tried with my laptop, but it was blowing up, plus it was becoming super hot and impossible to work with. And I need to use it. I don't really know; I would say you can try it yourself and see how it goes. Nice to have people interested: I can share pre-recorded files if you want to focus on training only. Let me know!
If you have some training data for an easy proof of concept run it would be very much appreciated.
What is your workflow to make sure the audio files ( input and output ) are aligned perfectly, after recording the training examples?
You'll find an Excel file describing what the recordings are. This is subject to change, so make a backup. In addition, you'll find the output from training on Colab in Results. This is subject to change too.
Regarding track alignment, I can't really help: to eliminate all the trouble of real hardware recording and focus on the rest, I'm profiling plugins (Neural DSP). So in my case I use render track with Reaper and it's aligned without even worrying about buffers or latency. With real hardware, I would try Reaper's auto latency compensation feature and see how it goes. The problem is that a slight phase misalignment could be due to a filter in action, so aligning peaks and so on would not work...
Yes, that sounds like a good way to get the latency from the audio card.
I would like to see if this works for a loop with a distortion pedal connected to it.
Normally an analog pedal should introduce basically no latency, but that probably depends hugely on the pedal schematics...
I think there's also a good chance a combination of Dirac impulses and the auto latency compensation would give good results: send a few impulses through the loop, check the auto latency results, and use them to align future recordings. That sounds very much like something an automatic modeller would probably do.
Oh, and one additional question: I've never tried such an auto latency feature, so it would be really interesting to see how well it works. If I have time tomorrow, I'll try to record something, put it through some effects, move the recording, and try to align it again with the auto feature. Then we could see how trustworthy the feature really is.
Nope. When measuring latency, we're interested in the latency introduced by the recording system, not by gear in between like a distortion pedal. If the pedal or whatever you want to profile introduces latency, then that is something you want to model; it's part of what you're modeling. Once you get the magic number, you record audio through the device, and once finished you shift the track back in time by whatever was measured in the latency test (aka the magic number).
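The "magic number" workflow can be sketched like this. The signals are synthetic stand-ins (in reality you'd play a click through the loopback once with nothing in the chain and find where it comes back), and the 137-sample delay is an invented example:

```python
import numpy as np

sr = 48000
click = np.zeros(sr)
click[0] = 1.0                         # the test impulse we play out

latency = 137                          # pretend the recording rig delays by 137 samples
recorded = np.roll(click, latency)     # what comes back in from the loopback

# The loopback delay is where the recorded click peaks
measured = int(np.argmax(np.abs(recorded)))

def align(track, delay):
    """Shift a recorded track back in time by `delay` samples, zero-padding the tail."""
    return np.concatenate([track[delay:], np.zeros(delay)])

aligned = align(recorded, measured)
print(measured)  # 137
```

After this calibration, every track you record through the same setup gets shifted back by `measured` samples before training, so the input and target files line up sample-for-sample.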