TABLE OF CONTENTS
1. Overview
2. Translating the SRT Files into Different Languages
3. Set up a Trigger for S3
4. Translating the SRT, and Storing the files on S3
5. Lambda Code
6. Conclusion
7. CloudThat
8. FAQs
Overview
Streaming audio or video content is a great way to entertain, share information, and engage users. Many organizations have a large collection of audio and video files with captions and subtitles. To make this content accessible to more people, the subtitles and captions can be translated into multiple languages. This blog will show you how to use Amazon Translate to build an automated flow that translates subtitles and captions without losing their context.
Subtitles and captions allow people with hearing impairments to access video or audio content. They also make it easier to follow a video in quiet or noisy environments, and they support non-native speakers. Captions and subtitles are typically delivered as WebVTT (.vtt) or SRT (.srt) files. SRT, short for SubRip Subtitle, is the most popular file format for captions and subtitles. WebVTT stands for Web Video Text Tracks and is rapidly becoming a popular file format for the same purpose. This blog will discuss translating SRT files into other languages.
Translating the SRT Files into Different Languages
Amazon Translate is a neural machine translation service that delivers high-quality, customizable, and affordable language translation. Neural machine translation is a form of automated language translation that uses machine learning models to produce more accurate and natural-sounding translations than traditional rule-based translation algorithms.
Amazon Translate allows you to localize content, such as apps and websites, for different users, easily translate large volumes of text for analysis, and effectively facilitate communication between users who speak different languages.
In this article, we will translate the caption text of a subtitle file into different languages, using S3 triggers to automate the flow from start to finish. The steps are as follows:
Create a Lambda role with access to S3, CloudWatch, and Amazon Translate.
Create an S3 bucket to serve as both the input bucket and the output bucket.
Create a Lambda function using the Python runtime that extracts the caption text from a WebVTT or SRT file and creates an HTML tag-delimited text file.
Delimited text here means the caption text with the timestamps removed, converted into regular text in which the captions are separated by an HTML tag.
Next, we translate the delimited file into multiple languages.
After the translation is complete, we recreate the SRT files by taking the translated delimited file and adding the timestamps back.
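The delimited-text steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code used in this blog: the helper names (parse_srt, to_delimited, from_delimited) and the choice of "<span>" as the delimiter are assumptions made for the example, and the translation call itself is left out.

```python
import re

# Assumption: any HTML tag the translator leaves untouched could serve as the delimiter.
DELIMITER = "<span>"


def parse_srt(srt_text):
    """Split SRT content into (index, timestamp, caption) tuples."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # skip malformed blocks
        index, timestamp = lines[0].strip(), lines[1].strip()
        # Multi-line captions are joined into one line for translation.
        caption = " ".join(line.strip() for line in lines[2:])
        cues.append((index, timestamp, caption))
    return cues


def to_delimited(cues):
    """Drop the timestamps and join the captions with the delimiter."""
    return DELIMITER.join(caption for _, _, caption in cues)


def from_delimited(cues, translated_text):
    """Rebuild an SRT document from translated delimited text."""
    captions = [part.strip() for part in translated_text.split(DELIMITER)]
    if len(captions) != len(cues):
        raise ValueError("translated text does not match the number of cues")
    blocks = [
        f"{index}\n{timestamp}\n{caption}"
        for (index, timestamp, _), caption in zip(cues, captions)
    ]
    return "\n\n".join(blocks) + "\n"
```

Keeping the original cue list around means the timestamps never pass through the translator, so only the caption text can change.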
Set up a Trigger for S3
Click the ‘Add Trigger’ option on the Lambda function. Select ‘S3’ as the source and ‘PUT’ as the event type. The prefix is the folder and the suffix is the file type, so our Lambda will only be invoked when a matching file is uploaded to the “input” folder.
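If you prefer to set up the same trigger from code instead of the console, a sketch along these lines should work with boto3. The function names, the "input/" prefix, and the ".srt" suffix are assumptions for illustration. Note that, unlike the console, this call does not grant S3 permission to invoke the function, so that permission has to be added separately.

```python
def build_notification(lambda_arn, prefix="input/", suffix=".srt"):
    """Build an S3 notification config that fires on PUT under a prefix/suffix."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:Put"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }


def attach_trigger(bucket, lambda_arn):
    """Apply the notification config to the bucket (requires AWS credentials)."""
    import boto3  # imported here so the config builder can be used without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_notification(lambda_arn),
    )
```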
Translating the SRT, and Storing the files on S3
You can increase the Lambda timeout under the Configuration settings. By default, it is set to 3 seconds.
Next, we import the required libraries, such as boto3 and webvtt.
The code then reads the event and retrieves the bucket name and file name from it.
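As a rough sketch of that step, the bucket name and object key can be pulled out of the S3 event like this. The function name get_bucket_and_key is hypothetical. Object keys arrive URL-encoded in S3 events, so they are decoded before use.

```python
from urllib.parse import unquote_plus


def get_bucket_and_key(event):
    """Return the bucket name and decoded object key from an S3 event."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Keys in S3 event notifications are URL-encoded (spaces arrive as '+').
    key = unquote_plus(record["s3"]["object"]["key"])
    return bucket, key
```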
To get the object, we use the “get_object” API. Alternatively, you can use the “download_file” API to download the file.
Next, we decode the returned bytes to get the actual file contents.
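A minimal sketch of fetching and decoding the file, assuming the SRT file is UTF-8 encoded (the helper name read_s3_text is hypothetical). The client is passed in, so in the Lambda you would call it with boto3.client("s3").

```python
def read_s3_text(s3_client, bucket, key):
    """Fetch an object with get_object and decode its bytes to text."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    raw_bytes = response["Body"].read()
    # Assumption: the file is UTF-8; "utf-8-sig" also strips a leading BOM if present.
    return raw_bytes.decode("utf-8-sig")
```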
Here, the SRT file is assumed to be in English. To detect the source language automatically, you can call the Amazon Comprehend API.
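For the automatic detection option, a sketch using Comprehend's detect_dominant_language call could look like the following. The helper name and the fallback to "en" are assumptions. Comprehend limits the size of the input text, so passing a sample of the captions rather than the whole file is usually enough.

```python
def detect_language(comprehend_client, text, default="en"):
    """Return the most likely language code for the text, or a default."""
    response = comprehend_client.detect_dominant_language(Text=text)
    languages = response.get("Languages", [])
    if not languages:
        return default
    # Pick the candidate with the highest confidence score.
    best = max(languages, key=lambda lang: lang["Score"])
    return best["LanguageCode"]
```

In the Lambda, the client would be created with boto3.client("comprehend").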
Now we will translate the English SRT file into different languages, such as Hindi, Marathi, and Tamil.
Amazon Translate supports 75 languages, so you can modify the code according to your needs.
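A sketch of the translation loop, using the translate_text API with the language codes for Hindi (hi), Marathi (mr), and Tamil (ta). The function name and the TARGET_LANGUAGES mapping are assumptions for illustration. translate_text has a per-request size limit, so long caption files may need to be split into chunks first.

```python
# Assumption: the target languages used in this walkthrough; edit as needed.
TARGET_LANGUAGES = {"hi": "Hindi", "mr": "Marathi", "ta": "Tamil"}


def translate_to_all(translate_client, text, source_language="en",
                     targets=TARGET_LANGUAGES):
    """Translate text into each target language and return {code: translation}."""
    results = {}
    for code in targets:
        response = translate_client.translate_text(
            Text=text,
            SourceLanguageCode=source_language,
            TargetLanguageCode=code,
        )
        results[code] = response["TranslatedText"]
    return results
```

In the Lambda, the client would be created with boto3.client("translate"), and supporting another language only requires adding its code to the mapping.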
Amazon Translate doesn’t support SRT files directly, which is why the caption text is first converted into delimited text.