Guides: How to auto-dub your videos into any language with AI
Boy, AI is moving fast, isn't it?
And some of the capabilities that we can now achieve effortlessly using
these AI tools are pretty unbelievable, including being able
to convert a video to speak a language that I can't even speak.
And that's what we're going to cover in this video: one workflow
that you can follow to automatically dub your video
content into another language that you don't even speak,
to expand your audience.
So you don't really need a ton of coding experience for this, but
we'll walk you through step by step how to get this workflow set up.
So let's get going.
I have this video where I told a pretty amazing joke.
And I'm just going to play it here.
I used to hate facial hair, but
then it grew on me.
(Groan) If you're not groaning, I guess I don't blame you.
But we have this dad joke video, and I want to make sure
that folks across the world can watch the content that I'm producing.
So to start, I have a Mux account, and I'm going to set up this Mux account
and get the values that I need to tackle this workflow.
I also want to introduce ElevenLabs, which is the audio platform
that we're going to be using to create the dubbing audio
files based off of the input that we'll be giving it.
And then n8n, which is a really neat way to set up workflows
that would be complicated if you were to write the code yourself.
But here we're just building nodes
and kind of visually assembling what this workflow can look like.
So let's walk through this solution for how we're able to use
these three tools in tandem to solve this problem that we're faced with.
I have an environment that I work in called “deleteme”,
and I can create a new asset within this environment.
But before I do that: if I were to upload
a video here, I would get a lot of features
natively within Mux. I would get automatically generated animated GIFs
and captions and thumbnails, but Mux does not do any dubbing right out of the box.
What Mux does do is allow you
to hook up your other AI tools and APIs
to introduce additional workflows where you need those extra features.
So what we need is a notification to come from Mux
and be sent over to the automation that we're going to build within n8n,
saying: hey, Mux says we have a new video.
Here's the information about that video.
Do whatever you want with it.
And within that workflow, that's where we're going to introduce
that new feature set.
So in order to do that I'm over in n8n,
and you can set these little triggers.
And this original trigger that we have here is kind of a handler
which creates a URL endpoint.
So this particular endpoint can be used
as the notification receiver from Mux.
So by setting these different values here I'm saying:
I'm expecting a POST request to come to this webhook receiver,
here's the path (which is just an automatically
generated UUID), and respond immediately.
Essentially this is going to collect the information about the video
that we've uploaded when Mux sends this notification out.
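By the way, if it helps to see what that trigger amounts to outside of n8n, here's a minimal TypeScript sketch of an equivalent receiver (using Express; the UUID path is a made-up stand-in for the one n8n generates for you):

```typescript
// A minimal sketch of the same receiver outside of n8n, using Express.
// n8n builds this endpoint for you; this just shows what it's doing.
import express from "express";

const app = express();
app.use(express.json());

// Mux will POST every event for the environment to this path.
// The UUID segment here is a placeholder, not a real n8n path.
app.post("/webhook/4f2a1c3e-example-uuid", (req, res) => {
  // Respond immediately so Mux doesn't retry; process the event afterward.
  res.status(200).send("ok");
  console.log("Received Mux event:", req.body.type);
});

app.listen(3000);
```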
So by creating this node within n8n,
I get this URL, and I can use that URL
over in Mux under the Settings, Webhooks section.
And I want to make sure I'm
in the exact same environment that we're working within.
Look, I've already added it here.
We can add this URL to notify and it matches the exact value
that came out of n8n.
All the events that are happening within Mux will be automatically sent
to this webhook handler that can now be acted upon
within the n8n workflow environment.
So that's cool.
So now we can take the next step so we can add these nodes within n8n.
And I'm using this conditional node within n8n.
It's an If node.
And it is basically going to check: is the static rendition of this video ready?
Now Mux has this whole concept of what a static rendition is,
but it's essentially creating multiple files based off of your input file
to ensure that you can have the flexibility to work with it as you need.
Anytime a new static rendition is created, Mux will send a notification
through that webhook system to the receiver to say: hey,
the static rendition that you requested
is now available, so you can act on it if you need to.
So that's what we're checking here:
does this payload, this webhook notification,
have a type equal to the value
video.asset.static_rendition.ready?
So this is the literal value.
This string value.
The different payloads that are coming in will have different types.
And we can kind of take a look at what that can look like here.
So this is an example webhook that was sent
using video.asset.static_rendition.ready as the type.
And we get additional information
about the static rendition within that webhook here.
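For reference, the payload we're matching on is shaped roughly like this (trimmed to the fields this workflow uses; the values are placeholders, not a real event):

```typescript
// Roughly the shape of the webhook payload this If node inspects
// (trimmed; these values are placeholders, not a real Mux event).
const exampleEvent = {
  type: "video.asset.static_rendition.ready",
  data: {
    asset_id: "ASSET_ID_PLACEHOLDER",
    // ...plus details about the static rendition itself...
  },
};

// The If node boils down to this one string comparison:
const isRenditionReady =
  exampleEvent.type === "video.asset.static_rendition.ready";
```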
So we will act on it.
So what's essentially happening here is:
if it is not that type, then we don't want to do anything.
We added this node here that is a No Op.
That would mean that
we received a webhook that was not the type that we cared about.
So we only want to act if the static rendition is ready.
Now, if it is ready, then we need to take the next step here.
And we're going to make an API call to Mux’s API.
We're going to use information from the webhook
to then request additional information from Mux related to that asset here.
And that particular information that we're looking for is called a Playback ID.
So I'm making this API call,
and I'm using this variable syntax
to say: hey, whatever the data.asset_id from the incoming webhook body
was, that's what we want additional information about.
So it's going to trigger off this GET request.
This is an HTTP Request node within n8n.
And as soon as it hits this point it's going to make that GET request
and then pass it on to the next node that we've set up here.
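Outside of n8n, that lookup amounts to something like this minimal sketch (Node 18+; the token variables come from a Mux access token, which we'll cover in a moment):

```typescript
// A sketch of the Get Playback ID step: fetch the asset from the Mux API
// and pull out a Playback ID.
const assetId = "YOUR_ASSET_ID"; // data.asset_id from the webhook payload

// Basic auth built from a Mux access token (Settings → Access Tokens).
const auth = Buffer.from(
  `${process.env.MUX_TOKEN_ID}:${process.env.MUX_TOKEN_SECRET}`
).toString("base64");

// GET /video/v1/assets/{ASSET_ID} returns the asset, including playback IDs.
const res = await fetch(`https://api.mux.com/video/v1/assets/${assetId}`, {
  headers: { Authorization: `Basic ${auth}` },
});
const { data: asset } = await res.json();
const playbackId = asset.playback_ids[0].id; // what the dubbing step needs
```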
And that next node that we've set up is the submit dubbing job node.
Again, it's just another HTTP request; n8n is kind of handling these
in a waterfall approach, cascading one at a time.
So once we get additional information about that asset
now we want to make another API call.
But this time the API call is not going to Mux.
It's going over to a third party tool called ElevenLabs.
And ElevenLabs has a really neat product.
They are able to introduce music creation and dubbing here,
all within the dashboard.
But you can also access that programmatically through their API.
And that's what we're going to do here.
So within ElevenLabs, you will need to make an API key.
And you can do that under Developers; I think it's under API Keys
that you can generate a new API key.
And you'll want to use that within your n8n instance
so that it can authenticate with the ElevenLabs API.
You can kind of see how I've set that up here,
using the Header Auth authentication type and selecting the value.
Now this was configured before recording this video
within the credentials area of n8n.
And this is true for the Mux API as well.
You can create new API keys within Mux.
Under Settings and Access Tokens, we have the ability to generate
new access tokens that will allow you to work within
the environment that you're working with over at Mux.
You can give your access token a name, and it will give you
the values that you need to plug in to a workflow like this.
I'll go back to the request here.
See, we're using a Basic Auth request type
and the Mux API credentials that we've set up.
So these are the different ways that you'll need to authenticate
with the APIs before you can actually interact with them.
So here we are making a POST request to the ElevenLabs API.
They have a dubbing endpoint.
And you can see that documented right here.
It's all relatively straightforward the way that it's set up here.
But if we take a closer look just at our request type,
we can see we're sending a Form-Data body type.
Here we are specifying the source URL so that it's specifically
locating the audio-only rendition of the video that I've uploaded.
And that was specified when I created the static rendition at upload time.
I'll show you that here in a little bit.
But we have a specific URL that starts with stream.mux.com.
Then, look at the previous JSON data
from the Playback ID request that we made over to Mux through the API,
grab that Playback ID, and put it here in this area.
So this is a variable.
And then also concatenate this string at the end,
so it says +/audio.m4a.
So this is a way that we are going to pass that audio off to ElevenLabs for dubbing.
Now there's additional information here.
We have a target language.
And I've selected Finnish as the target language.
Something maybe someday I'll speak.
But I speak zero Finnish right now.
The source language can be automatically determined.
And then they have a couple of different values here that are also just specified.
The mode is automatic, and the number of speakers is set to zero
to automatically detect the number of speakers, which is what I've done here.
If you go through the documentation, you should be able to see
the description for some of these fields.
Give the job a name.
So I've just called it "Dub for" plus the Mux Playback ID,
using the Playback ID as part of the name for this dubbing job.
Do we want to watermark it? Nope.
So I put false and that's it.
So again we're going to take the Playback ID information and use it
to construct the URL for the audio file that we want to be dubbed.
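Stepping outside n8n for a second, the request that node makes looks roughly like this sketch (field names per the ElevenLabs dubbing docs; the source language is left to auto-detection):

```typescript
// A sketch of the dubbing submission the n8n node is making, using the
// documented POST /v1/dubbing endpoint and a multipart form body.
const playbackId = "YOUR_PLAYBACK_ID"; // from the Get Playback ID step

const form = new FormData();
// The audio-only static rendition that Mux serves for this playback ID:
form.append("source_url", `https://stream.mux.com/${playbackId}/audio.m4a`);
form.append("target_lang", "fi"); // Finnish
form.append("mode", "automatic");
form.append("num_speakers", "0"); // 0 = auto-detect the number of speakers
form.append("name", `Dub for ${playbackId}`);
form.append("watermark", "false");

const res = await fetch("https://api.elevenlabs.io/v1/dubbing", {
  method: "POST",
  headers: { "xi-api-key": process.env.ELEVENLABS_API_KEY! },
  body: form,
});
const { dubbing_id } = await res.json();
```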
And as soon as that's sent off, we can enter this job-status-checking
loop here, which occasionally checks
in on the job status and determines whether or not the job is done
and ready to go. If it's not, then we wait a little bit
and go back around and check the job status again.
So I'll double-click in here,
and we can see that this is, again, just an HTTP Request node.
It's a GET method request.
Here's the API endpoint.
We're using the dubbing job ID that was returned from the previous step.
Again we are using the authentication types
and it's just a GET request to this endpoint.
Now if we were to look here, we can see the GET request
matches the same URL format.
So again we're checking is it ready.
Is it done.
Here's another if node right here.
Check the status field from what has been returned.
If it's equal to dubbed then yes the job has finished.
It's all set.
If it's not equal to that value, then we go over to this
Wait node, which says wait 10 seconds,
and then go back and enter this job status node once again.
So we'll exit out.
Go back and check the job status until it's finally dubbed.
If it's dubbed, then we can mark it as true, and it goes off to this
final node over here to create the new audio track on the Mux asset.
Now, there's probably another exit path
that should be defined here that I haven't done for this prototype.
We don't want it to loop forever saying, hey, it's not equal to dubbed,
let's wait and check again, when maybe the job failed for some reason
and no amount of checking is going to give a successful response.
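Here's a rough TypeScript sketch of that loop, including a failure exit like the one just mentioned (the "failed" check is my assumption of what such an exit could key on; check the ElevenLabs docs for the exact status values):

```typescript
// A rough sketch of the job-status polling loop, with a failure exit.
async function waitForDub(dubbingId: string): Promise<void> {
  while (true) {
    // GET /v1/dubbing/{dubbing_id} returns the job's current status.
    const res = await fetch(
      `https://api.elevenlabs.io/v1/dubbing/${dubbingId}`,
      { headers: { "xi-api-key": process.env.ELEVENLABS_API_KEY! } }
    );
    const { status } = await res.json();

    if (status === "dubbed") return; // done; move on to create the track
    if (status === "failed") throw new Error("Dubbing job failed"); // assumed failure status

    // Mirrors the Wait node: pause 10 seconds, then check again.
    await new Promise((resolve) => setTimeout(resolve, 10_000));
  }
}
```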
So we're checking if it's done.
If it is done, then we'll create the audio track.
Again, it's an HTTP Request node within n8n,
and here's the endpoint; now we're using the Mux API.
We're selecting the asset that was available
from the Get Playback ID step earlier in the workflow.
Here's how we were able to find that Asset ID within the JSON.
And we're making this POST request to the tracks endpoint.
Now this will create a new audio track on your Mux asset.
We have a URL, we have a language code, and we have some passthrough,
which is just metadata about the way that this was created.
That URL is going to be another n8n endpoint,
and we'll see where this is coming from momentarily.
But just take a look: it's a webhook, and we're making a dubbing request.
And in that dubbing request here we're using the Dubbing ID
and target language from the
Check Job Status step that we were using before.
So it is going to grab the Dubbing ID
and the target language and inject them into this URL as variables.
So the ID of the ElevenLabs dubbing job
and the target language that was specified here
will just appear in the final URL when the request is made.
And then again down here, the language code is the same value
that was specified in that check job status execution.
So we'll be able to just inject it straight from the response
that was coming right from ElevenLabs.
So what happens here? When we make this POST request over to Mux
it's going to say, hey, add a new audio track to this asset
and use this file
that is returned from this URL as the new audio track, right?
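In plain terms, that call looks something like this sketch (the N8N_WEBHOOK_BASE variable and the /dub/... path shape are stand-ins for your own serve-the-audio-file webhook URL):

```typescript
// A sketch of the create-track call: POST to the Mux tracks endpoint,
// pointing Mux at the webhook that will serve the dubbed audio.
const assetId = "YOUR_ASSET_ID"; // from the Get Playback ID step
const dubbingId = "YOUR_DUBBING_ID"; // from Check Job Status
const languageCode = "fi"; // also from Check Job Status

const auth = Buffer.from(
  `${process.env.MUX_TOKEN_ID}:${process.env.MUX_TOKEN_SECRET}`
).toString("base64");

await fetch(`https://api.mux.com/video/v1/assets/${assetId}/tracks`, {
  method: "POST",
  headers: {
    Authorization: `Basic ${auth}`, // same Basic auth as before
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    type: "audio",
    // N8N_WEBHOOK_BASE and this path shape are illustrative stand-ins:
    url: `${process.env.N8N_WEBHOOK_BASE}/dub/${dubbingId}/${languageCode}`,
    language_code: languageCode,
    passthrough: "Dubbed by ElevenLabs", // metadata about how this was made
  }),
});
```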
Now, where is this URL actually going?
Because Mux is going to check this URL for the audio file and use it as the input.
That's where I've set up this second little mini
workflow down here, whose job is to serve the audio file.
So this is another webhook trigger here
that is using these two variables, the Dubbing ID and the language code.
It's going to capture those values in the request that's made to this endpoint.
And we'll continue down
the path of this little workflow that was set up.
So anytime a request is made from Mux in this format,
it will capture the values that were specified here and make it
so that we can use them in the following steps.
So this is going to handle the request.
From here we're going to fetch the audio data.
It's an HTTP Request that's handed off right to ElevenLabs
using, again, their fantastic API.
It's a request to the dubbing endpoint using the Dubbing ID from the params
(the variables that were set up in that URL), plus audio and the language code.
So if we were to go here, we can see the Get dubbed audio endpoint.
Here's the endpoint for it.
So we have the format up here.
We're using the Dubbing ID and the language code.
And what is returned is the file stream.
That file is basically going to be proxied
right through our handler here and returned as the response to that request.
So this request, again, is coming from Mux, where it's saying:
hey, give me the audio file that you want to use as that second language track.
I'm proxying that request off to ElevenLabs to say: hey,
have you got that audio file from that dubbing job that we just ran?
Here's the information about the dubbing job and the language that I'm looking for.
And the final step here is a response
to the webhook using a binary file type,
with the response data source set to choose automatically from input.
You don't have to do anything special here.
It will just hand it off from the response in the previous step
and respond to the webhook with the appropriate file.
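As a standalone sketch, that little proxy workflow boils down to something like this (Express again; the route shape matches the stand-in URL from earlier):

```typescript
// A standalone sketch of the serve-audio-file webhook: capture the dubbing
// ID and language code from the path, fetch the dubbed audio from
// ElevenLabs, and hand the file straight back to the requester (Mux).
import express from "express";

const app = express();

app.get("/dub/:dubbingId/:languageCode", async (req, res) => {
  const { dubbingId, languageCode } = req.params;

  // Documented ElevenLabs endpoint for retrieving the dubbed audio file:
  const upstream = await fetch(
    `https://api.elevenlabs.io/v1/dubbing/${dubbingId}/audio/${languageCode}`,
    { headers: { "xi-api-key": process.env.ELEVENLABS_API_KEY! } }
  );

  // Proxy the binary body through, preserving the upstream content type.
  res.setHeader(
    "Content-Type",
    upstream.headers.get("content-type") ?? "application/octet-stream"
  );
  res.send(Buffer.from(await upstream.arrayBuffer()));
});

app.listen(3000);
```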
So this is the whole workflow.
I'm going to try to publish this
so that it can be used and imported into your n8n instance without really
any additional configuration, other than the authentication aspect.
But we should take a look at this whole thing in action and how this works.
Let me upload a video file to Mux.
So I'm going to go to Assets,
go to the correct environment,
and create the new asset here. I'm going to select the file
dad-joke.mp4,
call it dad-joke, and click Next.
We have settings in place.
But I want to make sure that under this Advanced tab, again,
this is where I'm going to create the static renditions.
Here in the Mux docs, it shows us how we can add static renditions,
which will make the audio file available for retrieval.
So I'm going to delete the "highest" resolution;
I only need an audio-only rendition available statically.
So by adjusting this little payload here and introducing
this static rendition, I'm going to kick off this upload.
It will make that audio file available for n8n.
And then it will send a notification over to n8n
when the audio file is ready
so we can kick off this entire workflow, which we'll watch in real time.
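For reference, the relevant piece of that payload looks roughly like this (just the static rendition setting; everything else in the upload dialog stays at its defaults):

```typescript
// Roughly the relevant piece of the asset settings payload at upload time:
// swap the default "highest" static rendition for an audio-only one, so
// that stream.mux.com/{PLAYBACK_ID}/audio.m4a exists for ElevenLabs to pull.
const assetSettings = {
  // ...the other upload settings stay as-is...
  static_renditions: [{ resolution: "audio-only" }],
};
```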
So I'll go ahead and hit start upload.
We're uploading the file.
We can see it's ready to go here.
It's ready pretty fast.
I used to hate facial...
But what we don't have is the additional language tracks just yet.
And it's in progress.
We can see the static rendition was created,
and as soon as the static rendition is ready,
then it will send that webhook over to our n8n workflow.
And kick off this whole dubbing process.
So we'll keep an eye on both areas for when the static rendition is ready.
And we should start to see this job
and this workflow kick off.
There we go.
Now we can start to see these events coming in.
Some of these are likely just webhooks indicating
that the video is ready,
but not related to the static rendition.
Now, this one looks like maybe it was the static rendition,
because it's taking longer than some of these other ones
that were probably discarded right at that first step.
And we can kind of click through here and see the path that it took.
So it said the static rendition was not ready.
So don't do anything here.
Up here, this was the long-running job,
and we can kind of watch the path that it took.
As you see the green path here. We got the Playback ID,
we submitted the job, we started a loop here.
And when it was done, it created the audio track.
And it continued down this path, serving out the rest of the workflow that we had described.
Now it looks like this job has completed.
So I'm going to go back over to my dad joke video.
Just do a quick reload
on this page and check it out.
We have a new icon here in the corner for audio.
So not only do you have the opportunity to listen to the English track
for my dad joke, but we have the Finnish track as well.
(Generated voice speaks Finnish.)
Pretty cool.
And I actually kind of sound like me too, which is pretty wild.
Now, here's the one caveat with all of this.
I don't know any Finnish. So did it get the joke right?
There's probably some context, especially when it comes to humor
or nuance, that these AI models can miss
when you're trying to communicate something.
You need to do some sort of QA.
To the extent that you are comfortable,
you will want to put a human in that process to ensure
that your content is meeting
the requirements for your application and for your business.
But, you know, for my case here, it's pretty neat to be able to
suddenly open my video up to a brand new audience
that I wouldn't have been able to reach before.
So there you have it.
If you have any questions
about how you might be able to use this or something similar in your application,
please don't hesitate to reach out to us.
I'll make sure that I have a fresh new dad joke for you as soon as you get in touch.
Cheers! Have a good one.