Tags: posts polarity-music Bitwig Audio-FX Plugins

Creating a Robotic Voice with Formants in Bitwig Studio

Tutorial | Thu Mar 09 2023

In this video, I explained what formants are and how they can be used to simulate a robotic voice. I showed how this can be done in Bitwig using the XP filter and the Vector 8 modulator. I also demonstrated how to achieve it with Vital, a free subtractive wavetable synthesizer, and with a vocoder. Finally, I shared a free preset called Bitwig Lama that creates this effect.

You can watch the video on YouTube, and you can support me on Patreon.

Questions & Answers

If you don't want to watch the video, here are some important takeaways:

What are formants?

Roughly speaking, formants are static resonant frequencies of a body or a room. They shape the sound when a person speaks or sings, or when an instrument is played, and they show up as overtones that change independently of the pitch. They can be simulated in a studio with filters and synthesizers, and they can be used to create robotic voices and other sound effects.
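The source-filter picture in this answer (pitch from an oscillator, vowel color from fixed filters) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Bitwig implements its filters: a naive sawtooth stands in for the vocal cords, and two band-pass filters at fixed frequencies stand in for the mouth. The band-pass design (from Robert Bristow-Johnson's Audio EQ Cookbook), the Q of 10, and the 44.1 kHz sample rate are choices made for illustration.

```python
import cmath
import math

def saw(freq, sr, n):
    """Naive (aliasing) sawtooth in [-1, 1): stands in for the vocal cords."""
    return [2.0 * ((freq * i / sr) % 1.0) - 1.0 for i in range(n)]

def bandpass_coeffs(center, q, sr):
    """RBJ-cookbook band-pass with 0 dB peak gain, normalized so a0 = 1."""
    w0 = 2.0 * math.pi * center / sr
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)  # (a1, a2)
    return b, a

def biquad(x, b, a):
    """Run samples through the filter (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for s in x:
        out = b[0] * s + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y

def gain_at(b, a, freq, sr):
    """Magnitude response |H| of the filter at freq."""
    z = cmath.exp(-2j * math.pi * freq / sr)  # z^-1 on the unit circle
    return abs((b[0] + b[1] * z + b[2] * z * z) / (1.0 + a[0] * z + a[1] * z * z))

def formant_voice(pitch, formants, q=10.0, sr=44100, n=44100):
    """Saw at `pitch` through parallel band-passes at fixed formant frequencies.

    The filter centers never depend on `pitch`: that is the static-formant idea.
    """
    source = saw(pitch, sr, n)
    out = [0.0] * n
    for center in formants:
        b, a = bandpass_coeffs(center, q, sr)
        for i, s in enumerate(biquad(source, b, a)):
            out[i] += s
    return out
```

For example, formant_voice(110.0, [240.0, 2400.0]) and formant_voice(220.0, [240.0, 2400.0]) produce the same vowel color at two different pitches, because only the oscillator changes.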

How can formants be used to create robotic voices?

Formants can be used to create robotic voices by setting filters to static frequencies and then using modulation and an interface like the Vector 8 to switch between different vowel sounds. By changing the resonance and frequency of the filters, and by using an offset to shift all formants (for example by an octave), you get a robotic voice effect.

What is the best way to extract formants from a human voice?

The best way to extract formants from a human voice is to use a spectrum analyzer to visualize the frequencies of the voice, and then use this information to set up filters with the desired formants. Alternatively, you can record your own vowel sounds, load them into a sampler, and then use a vocoder to impose them on a synthesizer.


This is what I'm talking about in this video. The text was transcribed by AI, so it might not be perfect. If you find any mistakes, please let me know.
You can also click on the timestamps to jump to the right part of the video, which should be helpful.

[00:00.000] Hey folks, welcome back to another video. Today it's about formants. What are
[00:05.520] formants? Formants are, roughly speaking, static frequencies in a body
[00:13.960] or in a room. So for instance, you can see in the background when I speak, you
[00:20.040] can see all the frequencies changing. So when I say, for instance, a vowel, you can
[00:27.200] see we have the fundamental frequency, which is the pitch of my vocal cords.
[00:31.040] That's the oscillator. And on top of that, I can change my mouth for different
[00:36.640] vowel sounds while maintaining the same pitch. So I can change the overtones
[00:41.880] independently from the pitch. So I can go E, O, E, A, right? You can see how
[00:53.160] this stays the same, but the overtones are changing, which means my mouth is
[00:59.880] basically a filter, and my vocal cords are the oscillator. And we can simulate
[01:05.160] this in Bitwig Studio, of course, with filters, and create some kind
[01:12.240] of vowel sounds with synthesizers, so it sounds a bit like the Delay Lama
[01:18.120] VST plug-in. There's also a Wikipedia page about formants, and there's an
[01:24.240] interesting table on the right side where you can read the average
[01:28.600] vowel formants for a male voice. So when I want to create an "i" here, I use
[01:35.960] 240 Hz for one filter, that is, one frequency peak or band-pass filter. And the other
[01:42.480] one needs to be at 2,400 Hz, right? So I can create this kind of "i"
[01:50.240] vowel sound. And beneath that, I can change the oscillator pitch independently and
[01:57.120] play melodies with it, but I want to keep these static frequencies in
[02:01.520] there, so it always sounds like an "i" vowel sound. It's also the
[02:07.320] same when you create guitar presets, or when you want a
[02:12.360] physical-modeling guitar sound. You probably have to integrate, at some
[02:18.880] point in the chain, static frequencies, because the guitar body is the same
[02:24.440] all the time. You change the pitch with the strings, of course, and you play
[02:29.440] melodies, but the resonances of the body are always the same. Same with piano
[02:35.160] sounds: when you want to create a synthesized piano, you have
[02:39.720] to integrate all these formants, the static resonances of the body of the
[02:46.120] piano. So this is basically what formants are all about. So
[02:53.560] it's static frequencies. So here we can use this table to create basically a
[02:58.480] filter. So I'm just removing my audio input and using an instrument track, and here
[03:03.880] we use a Polysynth. It just plays a saw, so let's stay with that. Let's
[03:19.160] create some kind of robotic voice on top of that with an FX Grid. So we use
[03:28.080] not an SVF; I'm using the XP filter, two filters here, and we completely
[03:38.600] disable the pitch tracking because we want to have static frequencies.
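Why disabling pitch tracking makes the frequencies static can be shown with the usual exponential key-tracking formula. This is a generic sketch: the helper name, the reference note (MIDI 60), and the 0-to-1 tracking amount are assumptions for illustration, not Bitwig's documented scaling.

```python
def tracked_cutoff(base_hz, note, tracking, ref_note=60):
    """Cutoff after key tracking (hypothetical helper).

    tracking = 1.0: the cutoff follows the keyboard, doubling per octave.
    tracking = 0.0: the cutoff ignores the note, giving a static formant.
    """
    return base_hz * 2.0 ** (tracking * (note - ref_note) / 12.0)

for note in (48, 60, 72):
    # columns: note, cutoff with full tracking, cutoff with tracking off
    print(note, tracked_cutoff(850.0, note, 1.0), tracked_cutoff(850.0, note, 0.0))
# 48 425.0 850.0
# 60 850.0 850.0
# 72 1700.0 850.0
```

With tracking switched off, every note gets the same 850 Hz peak, which is exactly the fixed resonance a vowel needs.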
[03:42.280] So we send the signal into one filter and into the other one, and then just go
[03:53.720] straight to the output here. We probably also want to use a value knob here,
[04:01.520] because we want to change the resonance of the two filters
[04:11.040] at the same time. We go up to 100% here; more isn't needed. So we
[04:18.840] can change the resonance of both filters at the same time. Okay, now we have
[04:27.360] this. We can go back to Wikipedia and say I want to create an "a", an "a"
[04:32.280] vowel sound, so 850 Hz for one filter and 1,610 Hz for the other one. And now
[04:50.760] it should sound like an "a", right? So we can change the pitch. So we have basically
[05:04.200] one vowel sound. The interesting thing now is that we want to morph between
[05:10.160] different vowel sounds here. And I found the best method to do that is to use,
[05:15.360] as an interface, the Vector 4 or Vector 8. So you can create 8
[05:24.480] different vowel sounds; I'll go with the Vector 8 here for now. So in the top
[05:29.120] one here, we want to go to "a". So I dial back here to the initial frequency, and
[05:37.160] set this filter to band-pass 4 and this one to band-pass 4 as well. So now we have to
[05:45.200] do a bit of work. We want to create an "a" sound here, so again, we have to
[05:50.160] go to 850 here. And you can do that by hovering over your
[06:00.280] modulated input. Down here, in the brackets, you can read
[06:06.440] which frequency you end up on with the modulation.
[06:10.600] When we dial in the modulation, of course, you can see 579. That's not where
[06:15.960] we want to be. We want to go to 850. And that's a bit of a problem here: every time
[06:23.720] you modulate it, the readout goes away. So you have to modulate a bit,
[06:29.040] let go, and then you can read where you end up. So 850 here and 1,610 here.
[06:38.600] A little more. Sounds like this, right? So now we are on "a" here.
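Because the modulation destination isn't displayed while you drag, the distance can be worked out beforehand. This sketch assumes the cutoff modulation behaves like a pitch offset in semitones (treat that as an assumption about the device). It computes the interval from the 579 Hz readout mentioned here to the 850 Hz first formant that Wikipedia's table lists for "a".

```python
import math

def semitones_between(from_hz, to_hz):
    """Pitch distance in semitones from one frequency to another (12 per octave)."""
    return 12.0 * math.log2(to_hz / from_hz)

# Readout seen in the video vs. target F1 of the "a" vowel.
print(round(semitones_between(579.0, 850.0), 2))  # 6.65
```

A positive result means the modulation has to push the cutoff up; a negative one means down.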
[06:55.360] Okay. And on this one, let's go to, say, "e", which is 390 Hz. So we
[07:06.400] go down here, or actually up. We want to have this here: 390. And the
[07:16.920] other one is 2,300. That's too much. There it is. That's a bit of a hassle, actually.
[07:30.520] It would be nice if Bitwig actually showed you where you end up with the
[07:34.720] modulation here, right? This would be very, very helpful sometimes. Right? So now
[07:45.760] we basically have two vowels in there, and we can morph between these two. We can
[07:50.440] also bring in this one here. Let's see, we have an "e" here, so maybe we go
[07:57.960] for a "u": 250. That's a bit down. Just a tad. 250 and 595.
[08:12.440] That's okay. So now we have a "u" vowel sound. What else? We have "u", we have "a", we have, I think,
[08:40.960] "e"; maybe we can do an "i" here, down here. So "i" is 240, also a bit down.
[08:48.000] 240 and 2,400 here. Yeah, that's it. Okay. So now we have an "i" in this corner.
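With four vowels in place, the morph can be sketched as a bilinear blend on an X/Y pad. Assumptions: the corner layout is hypothetical, the F1/F2 values are the average male formants from the Wikipedia table used in the video, and the blend happens in log-frequency, so halfway between two vowels lands on the geometric mean of their frequencies. How the Vector 8 actually interpolates its modulation amounts is not modeled here.

```python
import math

# Average male F1/F2 in Hz, from the Wikipedia formant table used in the video.
VOWELS = {
    "a": (850.0, 1610.0),
    "e": (390.0, 2300.0),
    "u": (250.0, 595.0),
    "i": (240.0, 2400.0),
}

# Hypothetical pad layout: (x, y) corner -> vowel.
CORNERS = {(0, 0): "a", (1, 0): "e", (0, 1): "u", (1, 1): "i"}

def xy_morph(x, y):
    """Blend the four corner vowels for a pad position x, y in [0, 1].

    Bilinear weights, applied to log-frequency, so each formant moves in
    even musical steps rather than even Hz steps.
    """
    formants = []
    for k in range(2):  # k = 0 -> F1, k = 1 -> F2
        log_f = 0.0
        for (cx, cy), vowel in CORNERS.items():
            weight = (x if cx else 1.0 - x) * (y if cy else 1.0 - y)
            log_f += weight * math.log(VOWELS[vowel][k])
        formants.append(math.exp(log_f))
    return tuple(formants)

print(xy_morph(0.5, 0.5))  # a position between all four vowels
```

Interpolating in log-frequency is a design choice: it matches how cutoff knobs usually feel, but a linear-in-Hz blend would also work and sounds slightly different in the middle of the pad.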
[09:18.000] And we can call it Bitwig Lama. Maybe you know the VST, the Delay Lama.
[09:29.240] It works exactly the same: you just change vowels here. What we can also do is,
[09:35.720] instead of changing the resonance, duplicate this.
[09:42.680] You can change the offset here. Maybe we use this here and call it formant offset.
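A formant offset like this amounts to multiplying every formant frequency by the same factor, so the spacing between formants on a log scale, and with it the shape of the vowel, stays the same while the played pitch is untouched. A minimal sketch, assuming the offset is expressed in semitones; the helper name is hypothetical.

```python
def shift_formants(formants, semitones):
    """Move every formant by the same musical interval (hypothetical helper)."""
    factor = 2.0 ** (semitones / 12.0)
    return tuple(f * factor for f in formants)

a_vowel = (850.0, 1610.0)  # F1/F2 of "a" from the table
print(shift_formants(a_vowel, 12))   # (1700.0, 3220.0)
print(shift_formants(a_vowel, -12))  # (425.0, 805.0)
```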
[09:53.720] So we can offset all the frequencies here, maybe by one octave. Maybe we also use
[10:15.220] a macro here. You can also download this later on, of course, for free from the description below.
[10:20.040] I just want to give you a rough idea here. It's also not done very precisely, but you can see
[10:27.880] how much work it is to dial in a filter like this. It's basically a filter we
[10:33.400] just created here. This is the resonance, and this is the
[10:49.280] formant offset. It's actually fun to use. Add Supermassive and maybe a bit of modulation here.
[11:19.280] Okay, so that gives you a rough idea. This is basically how we create robotic
[11:44.320] voices, or this kind of Delay Lama VST thing, in Bitwig. I'll save this as a preset and call it, for now,
[11:54.240] Bitwig Lama. I don't know if it makes sense. Let's save it as a vocal thing here. No, let's just use
[12:05.440] filter. Yeah, filter makes sense. Soft, yeah. Delay Lama, ringing voice, okay, bam. Another example:
[12:22.160] Vital is a free subtractive wavetable synthesizer. In here we, of course, also have a saw. But in
[12:37.280] here, we also have filters, and you can select the formant filter. And you have two formant filter styles,
[12:43.280] like A-O-I-E, that you can select. And then you also have the offset. And you can morph between
[12:53.840] different vowels. Okay, so you have different options there. Yeah, we also have basically an X/Y
[13:18.800] pad, basically the same thing as we have here on the Vector 8, right? But with the Vector 8 you can
[13:26.000] choose eight different positions, if you want more vowels. I mean,
[13:31.520] you can see there are multiple vowels here. You can also extract more formants from your own voice
[13:38.080] and see how your voice sounds, what your formants are. Every human has different frequencies for
[13:45.120] these formants, of course, but they are roughly in the same
[13:54.320] frequency area; different humans differ slightly. So you can choose what kind of
[14:01.120] human you want to replicate with these formant frequencies. It's the same with guitar bodies,
[14:07.200] right? If you have a bigger piano body or guitar body, then of course the formants change.
[14:15.120] If the room where you play a piano is bigger, the resonances change. So with this, you can
[14:21.680] basically choose what kind of human you want to simulate, whether it's a woman or a man.
[14:29.200] But you can extract them from your own voice with the spectrum analyzer I showed you at the beginning.
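Reading formants off a spectrum analyzer works by eye. A common numerical alternative, swapped in here and not what the video does, is linear prediction (LPC): fit an all-pole filter to the signal and take the peaks of its spectral envelope as formant estimates. The sketch tests this on a synthetic signal, noise through two resonators at 850 Hz and 1,610 Hz (the "a" values from the table), so the expected answer is known. The model order, sample rate, and resonator radius are illustrative assumptions; a real voice recording typically needs a higher order and some preprocessing.

```python
import cmath
import math
import random

def resonate(x, freq, r, sr):
    """Two-pole resonator: poles at radius r and angle 2*pi*freq/sr."""
    c1 = 2.0 * r * math.cos(2.0 * math.pi * freq / sr)
    c2 = -r * r
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s + c1 * y1 + c2 * y2
        y2, y1 = y1, y
        out.append(y)
    return out

def autocorrelation(x, max_lag):
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def lpc(x, order):
    """Levinson-Durbin recursion; returns A(z) coefficients [1, a1, ..., a_order]."""
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def formant_peaks(x, order, sr, step_hz=5):
    """Frequencies (Hz) of local maxima of the LPC envelope 1/|A(e^jw)|."""
    a = lpc(x, order)
    freqs = list(range(step_hz, sr // 2, step_hz))
    env = []
    for f in freqs:
        z = cmath.exp(-2j * math.pi * f / sr)
        env.append(1.0 / abs(sum(c * z ** k for k, c in enumerate(a))))
    return [freqs[i] for i in range(1, len(env) - 1) if env[i - 1] < env[i] > env[i + 1]]

# Demo: recover two known resonances from filtered noise.
sr = 8000
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
vowel_like = resonate(resonate(noise, 850.0, 0.97, sr), 1610.0, 0.97, sr)
print(formant_peaks(vowel_like, order=4, sr=sr))
```

The printed peaks should sit close to 850 Hz and 1,610 Hz; those numbers could then be dialed into the two filters.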
[14:34.000] Or you can even sample your own voice, play it back in the sampler and then use a vocoder.
[14:41.360] That's also possible, of course: if you don't want to use filters, you can just use
[14:45.840] a vocoder for that. Maybe I'll actually show you this too. So maybe we use an
[14:55.920] audio track, just record it, and put this into a sampler. I also use a filter
[15:13.280] to cut all the unnecessary frequencies out, and my fancy loudness preset here to make it
[15:23.840] really loud. And just bounce this, post-fader. So we can use a sampler for that. Put this into the
[15:37.360] sampler. Whoops, wrong track. And we probably also want
[15:47.600] to disable the pitch tracking here. Okay, so then on the Polysynth we disable the
[16:01.840] Bitwig Lama thing and choose a Vocoder. Oh yeah, it has a special symbol. And here we have the
[16:09.120] carrier, I think. Let me put this in there. The carrier. This is actually the carrier. It's probably
[16:23.520] the other way around here: it's probably the modulator. I always mix this up.
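A channel vocoder, roughly, splits both signals into the same set of bands, follows the loudness of each modulator band, and uses it to scale the matching carrier band; here the recorded vowel is the modulator and the synth is the carrier. This is a minimal sketch under assumptions: 16 log-spaced RBJ band-pass bands, a simple peak envelope follower, and a Q of 4. Bitwig's Vocoder device will differ in its filters and band layout.

```python
import math

def bandpass(x, center, q, sr):
    """RBJ-cookbook band-pass biquad (0 dB peak gain), direct form I."""
    w0 = 2.0 * math.pi * center / sr
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0  # b1 is zero for this design
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for s in x:
        out = b0 * s + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y

def envelope(x, sr, release_ms=20.0):
    """Peak follower: jumps up instantly, decays exponentially."""
    decay = math.exp(-1000.0 / (sr * release_ms))
    level, env = 0.0, []
    for s in x:
        level = max(abs(s), decay * level)
        env.append(level)
    return env

def band_centers(n_bands, low, high):
    """Log-spaced band centers from low to high (inclusive)."""
    ratio = (high / low) ** (1.0 / (n_bands - 1))
    return [low * ratio ** i for i in range(n_bands)]

def vocode(modulator, carrier, sr, n_bands=16, low=100.0, high=6000.0, q=4.0):
    """Each carrier band is scaled by the loudness of the same modulator band."""
    if high >= sr / 2:
        raise ValueError("highest band must sit below the Nyquist frequency")
    n = min(len(modulator), len(carrier))
    out = [0.0] * n
    for center in band_centers(n_bands, low, high):
        mod_env = envelope(bandpass(modulator[:n], center, q, sr), sr)
        car_band = bandpass(carrier[:n], center, q, sr)
        for i in range(n):
            out[i] += car_band[i] * mod_env[i]
    return out
```

More bands follow the vowel's spectral envelope more finely, which is why this route can be more precise than two hand-tuned filters.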
[16:44.720] Right, so you can basically use your own voice to record this, but it's actually no secret.
[17:00.000] We can also put this into Textures mode and freeze it, so we can change the position, or the
[17:05.440] vowel sound. So we use a macro for that to change the position. I think that's okay.
[17:16.560] You get the idea, right? This is not the best recording.
[17:46.560] So this is also a method you can try out: with the sampler, you record all your
[18:01.760] vowel sounds into the sampler. Then you use the Textures
[18:07.360] mode to morph between these vowel sounds, and use the vocoder to put all these filters on top of
[18:14.160] your synthesizer. Then you can create kind of the same sound. Also, you have more bands.
[18:19.920] It's basically more precise. But for most effects, the first solution, with
[18:30.160] just two filters, is enough. It's also nice practice to build something like this in the Grid.
[18:51.280] Okay, I think that's it for this video. Like I said, the Bitwig Lama preset is in the
[18:56.880] description below. You can use it for whatever. Have some fun with it. Thanks for watching.
[19:02.800] Leave a like if you liked the video. Subscribe to the channel, of course. Think about
[19:08.480] subscribing on Patreon if you'd like to. And thanks for watching. See you in the next video. Bye.