Isolating or removing vocals from a song
This page describes some methods to try to isolate vocals in stereo tracks.
© Muse Group & contributors. Contents licensed under the Creative Commons-Attribution 4.0 license.
Note: There is no fully reliable way to separate vocals from a finished mix. The methods described in this article depend on where the vocals sit in the stereo field.
If the vocals are panned to the center of a stereo track, center removal can sometimes be effective: it removes what is common to both channels (that is, the vocals) and leaves behind what differs between them (that is, the instrumentals).
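The cancellation idea can be sketched in a few lines of Python. This is a minimal illustration of subtracting one channel from the other, not Audacity's actual implementation; the function name `remove_center` and the synthetic signals are hypothetical and exist only for this example.

```python
import math


def remove_center(left, right):
    """Return a mono signal in which content identical in both channels cancels."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    # Anything present equally in both channels subtracts to zero.
    return [l - r for l, r in zip(left, right)]


# Synthetic stereo clip (illustrative data only): a "vocal" that is identical
# in both channels, plus a "guitar" that appears only in the left channel.
n = 8
vocal = [math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
guitar = [0.5 * math.cos(2 * math.pi * 1 * i / n) for i in range(n)]
left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)

residual = remove_center(left, right)
```

Here `residual` matches `guitar` (up to floating-point rounding) because the identical `vocal` component cancels. Note that the result is mono, and anything else panned to the center cancels along with the vocals.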
Audacity includes the Vocal Reduction and Isolation effect, which provides a Remove Vocals option that you can use to try to remove vocals from a stereo track.
Vocal Reduction and Isolation also lets you specify the audio frequency range for vocals (by default 120 to 9000 Hz). This can help with the common problem where center-panned bass or hi-hat is removed along with the vocals.
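To see why a frequency range helps, consider the following sketch, which cancels center-panned content only inside a chosen band and leaves everything outside it untouched. It is a simplified illustration under stated assumptions, not the algorithm Audacity uses: the helper names (`dft`, `idft`, `band_limited`, `reduce_center_in_band`) are hypothetical, the filter is a crude brick-wall mask, and the naive DFT is only practical for very short signals.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2); for short demo signals only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def band_limited(x, rate, low_hz, high_hz):
    """Keep only the components of x between low_hz and high_hz (brick-wall)."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        # Bins above n/2 mirror the lower bins, so fold them back.
        freq = min(k, n - k) * rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    return idft(spectrum)


def reduce_center_in_band(left, right, rate, low_hz=120.0, high_hz=9000.0):
    """Subtract the in-band part of the opposite channel from each channel."""
    in_band_left = band_limited(left, rate, low_hz, high_hz)
    in_band_right = band_limited(right, rate, low_hz, high_hz)
    out_left = [l - r for l, r in zip(left, in_band_right)]
    out_right = [r - l for r, l in zip(right, in_band_left)]
    return out_left, out_right


# Illustrative data: a 100 Hz "bass" and a 1000 Hz "vocal", both centered.
rate, n = 6400, 64
bass = [math.sin(2 * math.pi * 100 * t / rate) for t in range(n)]
vocal = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(n)]
left = [b + v for b, v in zip(bass, vocal)]
right = list(left)

out_left, out_right = reduce_center_in_band(left, right, rate)
```

In this example the 1000 Hz component falls inside the 120 to 9000 Hz band and cancels, while the 100 Hz component lies below the band and survives in both output channels. With plain channel subtraction, both would have been removed.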
You can also use the Vocal Reduction and Isolation effect to attempt to isolate the vocals by choosing one of the Isolate Vocals options from the Action dropdown menu in the dialog.
Note that the end result may not be total vocal isolation or even satisfactory isolation of the vocals; it all depends on how the original recording was engineered.
Note: This is an experimental feature not yet part of the normal Audacity installation.
To use AI models in Audacity, you first need to download the current alpha build that includes this feature from https://interactiveaudiolab.github.io/project/audacity
Once you have installed this version, you can download and apply AI models via Effects → Deep Learning Effects.
Deep Learning Effects are computationally very intensive. Depending on the model used and your computer, applying the effect to a single song can take anywhere from several minutes to hours. It is highly recommended that you first test whether the model gives satisfactory results on a short section (less than 10 seconds) before applying it to an entire track.
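Within Audacity you can simply select a short region before applying the effect. If you would rather prepare a short test clip beforehand, the following sketch copies the first few seconds of an uncompressed WAV file using Python's standard `wave` module. The function name `first_seconds` is hypothetical, and the in-memory silent clip stands in for a real file purely for demonstration.

```python
import io
import wave


def first_seconds(src, dst, seconds=10.0):
    """Copy the first `seconds` of WAV audio from src to dst.

    src and dst may be file paths or binary file-like objects.
    """
    with wave.open(src, "rb") as reader:
        frames_wanted = int(reader.getframerate() * seconds)
        frames = reader.readframes(frames_wanted)
        with wave.open(dst, "wb") as writer:
            writer.setnchannels(reader.getnchannels())
            writer.setsampwidth(reader.getsampwidth())
            writer.setframerate(reader.getframerate())
            writer.writeframes(frames)


# Demonstration with a 12-second silent mono clip held in memory.
source = io.BytesIO()
with wave.open(source, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8000 * 12)
source.seek(0)

excerpt = io.BytesIO()
first_seconds(source, excerpt, seconds=10.0)

excerpt.seek(0)
with wave.open(excerpt, "rb") as r:
    excerpt_frames = r.getnframes()
```

With real files, you would pass paths instead of the in-memory buffers, for example `first_seconds("song.wav", "song_test.wav", seconds=8)`, where both file names are placeholders.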