Guides: How to auto-dub your videos into any language with AI
Boy, AI is moving fast, isn't it?
And some of the capabilities that we can now achieve effortlessly using
these AI tools are pretty unbelievable, including being able
to convert a video to speak a language that I can't even speak.
And that's what we're going to cover in this video: one workflow
that you can follow to automatically dub your video
content into another language that you don't even speak,
to expand your audience.
So you don't really need a ton of coding experience for this, but
we'll walk you through step by step how to get this workflow set up.
So let's get going.
I have this video where I told a pretty amazing joke.
And I'm just going to play it here.
I used to hate facial hair, but
then it grew on me.
(Groan) If you're not groaning, I guess I don't blame you.
But we have this dad joke video, and I want to make sure
that folks across the world can watch the content that I'm producing.
So to start, I have a Mux account, and I'm going to set up this Mux account
and get the values that I need to tackle this workflow.
I also want to introduce ElevenLabs, which is the audio platform
that we're going to be using to create the dubbing audio
files based off of the input that we'll be giving it.
And then n8n, which is a really neat way to set up workflows
that would be complicated if you were to write the code yourself.
But here we're just building nodes
and kind of visually assembling what this workflow can look like.
So let's walk through this solution for how we're able to use
these three tools in tandem to solve this problem that we're faced with.
I have an environment that I work in called “deleteme”,
and I can create a new asset within this environment.
But before I do that: if I were to upload
a video here, I would get a lot of features
natively within Mux. I would get automatically generated animated GIFs
and captions and thumbnails, but Mux does not do any dubbing right out of the box.
What Mux does do is allow you
to hook up your other AI tools and APIs
to introduce additional workflows where you need those extra features.
So what we need is a notification to come from Mux
and be sent over to the automation that we're going to build within n8n,
saying: hey, Mux says we have a new video.
Here's the information about that video.
Do whatever you want with it.
And within that workflow, that's where we're going to introduce
that new feature set.
So in order to do that I'm over in n8n,
and you can set these little triggers.
And this original trigger that we have here is kind of a handler
which creates a URL endpoint.
So this particular endpoint can be used
as the notification receiver from Mux.
So by setting these different values here I'm saying:
I'm expecting a POST request to come to this webhook receiver,
here's the path (which is just an automatically
generated UUID), and respond immediately.
Essentially this is going to collect the information about the video
that we've uploaded when Mux sends this notification out.
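By the way, if it helps to see what that trigger amounts to outside of n8n, here's a minimal TypeScript sketch of an equivalent receiver (using Express; the UUID path is a made-up stand-in for the one n8n generates for you):

```typescript
// A minimal sketch of the same receiver outside of n8n, using Express.
// n8n builds this endpoint for you; this just shows what it's doing.
import express from "express";

const app = express();
app.use(express.json());

// Mux will POST every event for the environment to this path.
// The UUID segment here is a placeholder, not a real n8n path.
app.post("/webhook/4f2a1c3e-example-uuid", (req, res) => {
  // Respond immediately so Mux doesn't retry; process the event afterward.
  res.status(200).send("ok");
  console.log("Received Mux event:", req.body.type);
});

app.listen(3000);
```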
So by creating this node within n8n,
I get this URL, and I can use that URL
over in Mux under the Settings, Webhooks section.
And I want to make sure I'm
in the exact same environment that we're working within.
Look, I've already added it here.
We can add this URL to notify and it matches the exact value
that came out of n8n.
All the events that are happening within Mux will be automatically sent
to this webhook handler that can now be acted upon
within the n8n workflow environment.
So that's cool.
So now we can take the next step so we can add these nodes within n8n.
And I'm using this conditional node within n8n.
It's an If node.
And it is basically going to check: is the static rendition of this video ready?
Now Mux has this whole concept of what a static rendition is,
but it's essentially creating multiple files based off of your input file
to ensure that you can have the flexibility to work with it as you need.
Anytime a new static rendition is created, Mux will send a notification
through that webhook system to the receiver to say: hey,
the static rendition that you requested
is now available, so you can act on it if you need to.
So that's what we're checking here:
does this payload, this webhook notification,
have a type equal to the value
video.asset.static_rendition.ready?
So this is the literal value.
This string value.
The different payloads that are coming in will have different types.
And we can kind of take a look at what that can look like here.
So this is an example webhook that was sent
using video.asset.static_rendition.ready as the type.
And we get additional information
about the static rendition within that webhook here.
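For reference, the payload we're matching on is shaped roughly like this (trimmed to the fields this workflow uses; the values are placeholders, not a real event):

```typescript
// Roughly the shape of the webhook payload this If node inspects
// (trimmed; these values are placeholders, not a real Mux event).
const exampleEvent = {
  type: "video.asset.static_rendition.ready",
  data: {
    asset_id: "ASSET_ID_PLACEHOLDER",
    // ...plus details about the static rendition itself...
  },
};

// The If node boils down to this one string comparison:
const isRenditionReady =
  exampleEvent.type === "video.asset.static_rendition.ready";
```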
So we will act on it.
So what's essentially happening here is:
if it is not that type, then we don't want to do anything.
We added this node here that is a No Op.
That would mean that
we received a webhook that was not the type that we cared about.
So we only want to act if the static rendition is ready.
Now, if it is ready, then we need to take the next step here.
And we're going to make an API call to Mux’s API.
We're going to use information from the webhook
to then request additional information from Mux related to that asset here.
And that particular information that we're looking for is called a Playback ID.
So I'm making this API call,
and I'm using this variable syntax
to say: hey, whatever the data.asset_id from the incoming webhook body
was, that's what we want additional information about.
So it's going to trigger off this GET request.
This is an HTTP Request node within n8n.
And as soon as it hits this point it's going to make that GET request
and then pass it on to the next node that we've set up here.
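Outside of n8n, that lookup amounts to something like this minimal sketch (Node 18+; the token variables come from a Mux access token, which we'll cover in a moment):

```typescript
// A sketch of the Get Playback ID step: fetch the asset from the Mux API
// and pull out a Playback ID.
const assetId = "YOUR_ASSET_ID"; // data.asset_id from the webhook payload

// Basic auth built from a Mux access token (Settings → Access Tokens).
const auth = Buffer.from(
  `${process.env.MUX_TOKEN_ID}:${process.env.MUX_TOKEN_SECRET}`
).toString("base64");

// GET /video/v1/assets/{ASSET_ID} returns the asset, including playback IDs.
const res = await fetch(`https://api.mux.com/video/v1/assets/${assetId}`, {
  headers: { Authorization: `Basic ${auth}` },
});
const { data: asset } = await res.json();
const playbackId = asset.playback_ids[0].id; // what the dubbing step needs
```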
And that next node that we've set up is the submit dubbing job node.
Again, it's just another HTTP request; n8n is kind of handling these
in a waterfall approach, cascading one at a time.
So once we get additional information about that asset
now we want to make another API call.
But this time the API call is not going to Mux.
It's going over to a third party tool called ElevenLabs.
And ElevenLabs has a really neat product.
They are able to introduce music creation and dubbing here,
all within the dashboard.
But you can also access that programmatically through their API.
And that's what we're going to do here.
So within ElevenLabs, you will need to make an API key.
And you can do that under Developers; I think it's under API Keys
that you can generate a new API key.
And you'll want to use that within your n8n instance
so that it can authenticate with the ElevenLabs API.
You can kind of see how I've set that up here,
using the Header Auth authentication type and selecting the value.
Now this was configured before recording this video
within the credentials area of n8n.
And this is true for the Mux API as well.
You can create new API keys within Mux.
Under Settings and Access Tokens, we have the ability to generate
new access tokens that will allow you to work within
the environment that you're working with over at Mux.
You can give your access token a name, and it will give you
the values that you need to plug in to a workflow like this.
I'll go back to the request here.
See, we're using a Basic Auth request type
and the Mux API credentials that we've set up.
So these are the different ways that you'll need to authenticate
with the APIs before you can actually interact with them.
So here we are making a POST request to the ElevenLabs API.
They have a dubbing endpoint.
And you can see that documented right here.
It's all relatively straightforward the way that it's set up here.
But if we take a closer look just at our request type,
we can see we're sending a Form-Data body type.
Here we are specifying the source URL so that it's specifically
locating the audio-only rendition of the video that I've uploaded.
And that was specified when I created the static rendition at upload time.
I'll show you that here in a little bit.
But we have a specific URL that starts with stream.mux.com.
Then, look at the previous JSON data
from the Playback ID request that we made over to Mux through the API,
grab that Playback ID, and put it here in this area.
So this is a variable.
And then also concatenate this string at the end,
so it says +/audio.m4a.
So this is a way that we are going to pass that audio off to ElevenLabs for dubbing.
Now there's additional information here.
We have a target language.
And I've selected Finnish as the target language.
Something maybe someday I'll speak.
But I speak zero Finnish right now.
The source language can be automatically determined.
And then they have a couple of different values here that are also just specified.
The mode is automatic, and the number of speakers is set to zero
to automatically detect the number of speakers, which is what I've done here.
If you go through the documentation, you should be able to see
the description for some of these fields.
Give the job a name.
So I've just called it "Dub for" plus the Mux Playback ID,
using the Playback ID as part of the name for this dubbing job.
Do we want to watermark it? Nope.
So I put false and that's it.
So again we're going to take the Playback ID information and use it
to construct the URL for the audio file that we want to be dubbed.
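Stepping outside n8n for a second, the request that node makes looks roughly like this sketch (field names per the ElevenLabs dubbing docs; the source language is left to auto-detection):

```typescript
// A sketch of the dubbing submission the n8n node is making, using the
// documented POST /v1/dubbing endpoint and a multipart form body.
const playbackId = "YOUR_PLAYBACK_ID"; // from the Get Playback ID step

const form = new FormData();
// The audio-only static rendition that Mux serves for this playback ID:
form.append("source_url", `https://stream.mux.com/${playbackId}/audio.m4a`);
form.append("target_lang", "fi"); // Finnish
form.append("mode", "automatic");
form.append("num_speakers", "0"); // 0 = auto-detect the number of speakers
form.append("name", `Dub for ${playbackId}`);
form.append("watermark", "false");

const res = await fetch("https://api.elevenlabs.io/v1/dubbing", {
  method: "POST",
  headers: { "xi-api-key": process.env.ELEVENLABS_API_KEY! },
  body: form,
});
const { dubbing_id } = await res.json();
```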
And as soon as that's sent off, we can enter this job-status-checking
loop here, which occasionally checks
in on the job status and determines whether or not the job is done
and ready to go. If it's not, then we wait a little bit
and go back around and check the job status again.
So I'll double-click in here,
and we can see that this is, again, just an HTTP Request node.
It's a GET method request.
Here's the API endpoint.
We're using the dubbing job ID that was returned from the previous step.
Again we are using the authentication types
and it's just a GET request to this endpoint.
Now if we were to look here, we can see the GET request
matches the same URL format.
So again we're checking is it ready.
Is it done.
Here's another if node right here.
Check the status field from what has been returned.
If it's equal to dubbed then yes the job has finished.
It's all set.
If it's not equal to that value, then we go over to this
Wait node, which says wait 10 seconds,
and then go back and enter this job status node once again.
So we'll exit out.
Go back and check the job status until it's finally dubbed.
If it's dubbed, then we can mark it as true, and it goes off to this
final node over here to create the new audio track on the Mux asset.
Now, there's probably another exit path
that should be defined here that I haven't done for this prototype.
We don't want it to loop forever saying, hey, it's not equal to dubbed,
let's wait and check again, when maybe the job failed for some reason
and no amount of checking is going to give a successful response.
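Here's a rough TypeScript sketch of that loop, including a failure exit like the one just mentioned (the "failed" check is my assumption of what such an exit could key on; check the ElevenLabs docs for the exact status values):

```typescript
// A rough sketch of the job-status polling loop, with a failure exit.
async function waitForDub(dubbingId: string): Promise<void> {
  while (true) {
    // GET /v1/dubbing/{dubbing_id} returns the job's current status.
    const res = await fetch(
      `https://api.elevenlabs.io/v1/dubbing/${dubbingId}`,
      { headers: { "xi-api-key": process.env.ELEVENLABS_API_KEY! } }
    );
    const { status } = await res.json();

    if (status === "dubbed") return; // done; move on to create the track
    if (status === "failed") throw new Error("Dubbing job failed"); // assumed failure status

    // Mirrors the Wait node: pause 10 seconds, then check again.
    await new Promise((resolve) => setTimeout(resolve, 10_000));
  }
}
```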
So we're checking if it's done.
If it is done, then we'll create the audio track.
Again, it's an HTTP Request node within n8n,
and here's the endpoint; now we're using the Mux API.
We're selecting the asset that was available
from the Get Playback ID step earlier in the workflow.
Here's how we were able to find that Asset ID within the JSON.
And we're making this POST request to the tracks endpoint.
Now this will create a new audio track on your Mux asset.
We have a URL, we have a language code, and we have some passthrough,
which is just metadata about the way that this was created.
That URL is going to be another n8n endpoint,
and we'll see where this is coming from momentarily.
But just take a look: it's a webhook, and we're making a dubbing request.
And in that dubbing request here we're using the Dubbing ID
and target language from the
Check Job Status step that we were using before.
So it is going to grab the Dubbing ID
and the target language and inject them into this URL as variables.
So the ID of the ElevenLabs dubbing job
and the target language that was specified here
will just appear in the final URL when the request is made.
And then again down here, the language code is the same value
that was specified in that check job status execution.
So we'll be able to just inject it straight from the response
that was coming right from ElevenLabs.
So what happens here? When we make this POST request over to Mux
it's going to say, hey, add a new audio track to this asset
and use this file
that is returned from this URL as the new audio track, right?
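In plain terms, that call looks something like this sketch (the N8N_WEBHOOK_BASE variable and the /dub/... path shape are stand-ins for your own serve-the-audio-file webhook URL):

```typescript
// A sketch of the create-track call: POST to the Mux tracks endpoint,
// pointing Mux at the webhook that will serve the dubbed audio.
const assetId = "YOUR_ASSET_ID"; // from the Get Playback ID step
const dubbingId = "YOUR_DUBBING_ID"; // from Check Job Status
const languageCode = "fi"; // also from Check Job Status

const auth = Buffer.from(
  `${process.env.MUX_TOKEN_ID}:${process.env.MUX_TOKEN_SECRET}`
).toString("base64");

await fetch(`https://api.mux.com/video/v1/assets/${assetId}/tracks`, {
  method: "POST",
  headers: {
    Authorization: `Basic ${auth}`, // same Basic auth as before
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    type: "audio",
    // N8N_WEBHOOK_BASE and this path shape are illustrative stand-ins:
    url: `${process.env.N8N_WEBHOOK_BASE}/dub/${dubbingId}/${languageCode}`,
    language_code: languageCode,
    passthrough: "Dubbed by ElevenLabs", // metadata about how this was made
  }),
});
```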
Now, where is this URL actually going?
Because Mux is going to check this URL for the audio file and use it as the input.
That's where I've set up this second little mini
workflow down here, whose job is to serve the audio file.
So this is another webhook trigger here
that is using these two variables, the Dubbing ID and the language code.
It's going to capture those values in the request that's made to this endpoint.
And we'll continue down
the path of this little workflow that was set up.
So anytime a request is made from Mux in this format,
it will capture the values that were specified here and make it
so that we can use them in the following steps.
So this is going to handle the request.
From here we're going to fetch the audio data.
It's an HTTP Request that's handed off right to ElevenLabs
using, again, their fantastic API.
It's a request to the dubbing endpoint using the Dubbing ID from the params
(the variables that were set up in that URL), plus audio and the language code.
So if we were to go here, we can see the Get dubbed audio endpoint.
Here's the endpoint for it.
So we have the format up here.
We're using the Dubbing ID and the language code.
And what is returned is the file stream.
That file is basically going to be proxied
right through our handler here and returned as the response to that request.
So this request, again, is coming from Mux, where it's saying:
hey, give me the audio file that you want to use as that second language track.
I'm proxying that request off to ElevenLabs to say: hey,
have you got that audio file from that dubbing job that we just ran?
Here's the information about the dubbing job and the language that I'm looking for.
And the final step here is a response
to the webhook using a binary file type,
with the response data source set to choose automatically from input.
You don't have to do anything special here.
It will just hand it off from the response in the previous step
and respond to the webhook with the appropriate file.
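As a standalone sketch, that little proxy workflow boils down to something like this (Express again; the route shape matches the stand-in URL from earlier):

```typescript
// A standalone sketch of the serve-audio-file webhook: capture the dubbing
// ID and language code from the path, fetch the dubbed audio from
// ElevenLabs, and hand the file straight back to the requester (Mux).
import express from "express";

const app = express();

app.get("/dub/:dubbingId/:languageCode", async (req, res) => {
  const { dubbingId, languageCode } = req.params;

  // Documented ElevenLabs endpoint for retrieving the dubbed audio file:
  const upstream = await fetch(
    `https://api.elevenlabs.io/v1/dubbing/${dubbingId}/audio/${languageCode}`,
    { headers: { "xi-api-key": process.env.ELEVENLABS_API_KEY! } }
  );

  // Proxy the binary body through, preserving the upstream content type.
  res.setHeader(
    "Content-Type",
    upstream.headers.get("content-type") ?? "application/octet-stream"
  );
  res.send(Buffer.from(await upstream.arrayBuffer()));
});

app.listen(3000);
```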
So this is the whole workflow.
I'm going to try to publish this
so that it can be used and imported into your n8n instance without really
any additional configuration, other than the authentication aspect.
But we should take a look at this whole thing in action and how this works.
Let me upload a video file to Mux.
So I'm going to go to Assets,
go to the correct environment,
and create the new asset here. I'm going to select the file
dad-joke.mp4,
call it dad-joke, and click Next.
We have settings in place.
But I want to make sure that under this Advanced tab, again,
this is where I'm going to create the static renditions.
Here in the Mux docs, it shows us how we can add static renditions,
which will make the audio file available for retrieval.
So I'm going to delete the "highest" resolution;
I only need an audio-only rendition available statically.
So by adjusting this little payload here and introducing
this static rendition, I'm going to kick off this upload.
It will make that audio file available for n8n.
And then it will send a notification over to n8n
when the audio file is ready
so we can kick off this entire workflow, which we'll watch in real time.
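For reference, the relevant piece of that payload looks roughly like this (just the static rendition setting; everything else in the upload dialog stays at its defaults):

```typescript
// Roughly the relevant piece of the asset settings payload at upload time:
// swap the default "highest" static rendition for an audio-only one, so
// that stream.mux.com/{PLAYBACK_ID}/audio.m4a exists for ElevenLabs to pull.
const assetSettings = {
  // ...the other upload settings stay as-is...
  static_renditions: [{ resolution: "audio-only" }],
};
```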
So I'll go ahead and hit start upload.
We're uploading the file.
We can see it's ready to go here.
It's ready pretty fast.
I used to hate facial...
But what we don't have is the additional language tracks just yet.
And it's in progress.
We can see the static rendition was created,
and as soon as the static rendition is ready,
then it will send that webhook over to our n8n workflow.
And kick off this whole dubbing process.
So we'll keep an eye on both areas for when the static rendition is ready.
And we should start to see this job
and this workflow kick off.
There we go.
Now we can start to see these events coming in.
Some of these are likely just webhooks indicating
that the video is ready,
but not related to the static rendition.
Now, this one looks like maybe it was the static rendition,
because it's taking longer than some of these other ones
that were probably discarded right at that first step.
And we can kind of click through here and see the path that it took.
So it said the static rendition was not ready.
So don't do anything here.
Up here, this was the long-running job,
and we can kind of watch the path that it took.
As you see the green path here. We got the Playback ID,
we submitted the job, we started a loop here.
And when it was done, it created the audio track.
And it continued down this path, serving out the rest of the workflow that we had described.
Now it looks like this job has completed.
So I'm going to go back over to my dad joke video.
Just do a quick reload
on this page and check it out.
We have a new icon here in the corner for audio.
So not only do you have the opportunity to listen to the English track
for my dad joke, but we have the Finnish track as well.
(Generated voice speaks Finnish.)
Pretty cool.
And I actually kind of sound like me too, which is pretty wild.
Now, here's the one caveat with all of this.
I don't know any Finnish. So did it get the joke right?
There's probably some context, especially when it comes to humor
or nuance, that these AI models can miss
when you're trying to communicate something.
You need to do some sort of QA.
To the extent that you are comfortable,
you will want to put a human in that process to ensure
that your content is meeting
the requirements for your application and for your business.
But, you know, for my case here, it's pretty neat to be able to
suddenly open my video up to a brand new audience
that I wouldn't have been able to reach before.
So there you have it.
If you have any questions
about how you might be able to use this or something similar in your application,
please don't hesitate to reach out to us.
I'll make sure that I have a fresh new dad joke for you as soon as you get in touch.
Cheers! Have a good one.