Making the current COVID-19 regulations more accessible to foreigners
As I write this article in the first week of December 2020, we are in the middle of the coronavirus pandemic's second wave. This is a very stressful time for everyone, but especially for those living abroad. I am currently in Hungary, where the current regulations are published only in Hungarian.
For this project, I have decided to translate the regulations into English and verbalize them for the foreigners, and possibly the visually impaired people, here in Hungary who feel very lost right now.
Here is how I’m going to do that.
There is an official website in Hungarian that lists the current regulations in the country. The page is updated whenever new regulations are published, so each time that happens we only need to rerun the code to share the new information.
From there, I will scrape the information using the Chrome Selector extension. That gives me the text from the website, which I can work with later in R. I will add the code for this in the body of this article.
Once I have the information in R, I can use AWS Translate and AWS Polly to translate and verbalize these regulations. I can then upload the results to GitHub to share them with the foreigners in Hungary and keep them updated.
What are these tools?
If you have not heard of Amazon Web Services’ (AWS) Translate or Polly tools, let me introduce them briefly.
AWS Translate is exactly what it sounds like: it takes a text and translates it into another language. You can access it through AWS or automate the process through R. It currently supports 113 languages and can easily be applied to text files, especially if they are already stored in the AWS environment. Companies use it for tasks such as text-based customer support in other languages and understanding comments left online in foreign languages.
I will be using the AWS Translate implementation in R. It keeps the task simple and repeatable: if the regulations change, I can rerun the code and get the new regulations in English.
AWS Polly converts text to speech. Visually impaired people often use this kind of software to read documents or navigate electronic devices. It currently offers 47 voices in 27 languages, with more in the works. You can access Polly through the AWS website or use its integration in R. I will be using the latter to verbalize the English version of the regulations.
Retrieving the data
I will be using the Hungarian Government's official coronavirus website. You can find that website here. The “Aktuális” page covers the current regulations and is updated whenever new ones are published. The regulations I scrape here are those listed on November 30th, 2020.
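One way the scraping step could look in R is sketched below. It assumes the rvest package is installed; the function name, the URL, and the CSS selector in the example call are placeholders for illustration, not the values used for the actual page.

```r
# Sketch only: scrape_regulations() is a hypothetical helper, and the URL and
# CSS selector passed to it must come from inspecting the real page
# (for example with a selector tool in the browser).
scrape_regulations <- function(url, css_selector) {
  page  <- rvest::read_html(url)                  # download and parse the page
  nodes <- rvest::html_nodes(page, css_selector)  # keep only the matching elements
  text  <- rvest::html_text(nodes, trim = TRUE)   # extract their visible text
  text[nzchar(text)]                              # drop entries that are empty
}

# Example call (needs internet access; both arguments are placeholders):
# regulations_hu <- scrape_regulations("https://example.org/aktualis", "div.content p")
```

The result is a character vector with one element per matched element on the page, which is the form the later steps work with.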
Now I have a list with all 33 regulations that are currently in place in Hungary.
Accessing AWS through R
To access AWS through R, I need to create an access key on the AWS platform. R uses this key as a token to prove that I am allowed to use these services. The option is under Users in IAM. Once the key is created, the security credentials tab should look something like this. I have blacked out all sensitive information.
Once I have this token, I can pass it to R to give it access to AWS. Use this code to load the access key into R.
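A minimal sketch of loading the key, assuming it was downloaded from IAM as a CSV file with "Access key ID" and "Secret access key" columns. The helper name, the file name in the example call, and the default region are assumptions; the environment variable names are the standard ones that AWS client libraries look for.

```r
# Sketch only: load_aws_credentials() is a hypothetical helper. The column
# names are what read.csv() produces from the headers "Access key ID" and
# "Secret access key"; check names(read.csv(key_file)) if your file differs.
load_aws_credentials <- function(key_file, region = "eu-central-1") {
  keys <- read.csv(key_file, stringsAsFactors = FALSE)
  needed <- c("Access.key.ID", "Secret.access.key")
  if (!all(needed %in% names(keys))) {
    stop("Expected columns not found; got: ", paste(names(keys), collapse = ", "))
  }
  # Expose the credentials to the current R session only
  Sys.setenv(
    AWS_ACCESS_KEY_ID     = keys$Access.key.ID[1],
    AWS_SECRET_ACCESS_KEY = keys$Secret.access.key[1],
    AWS_DEFAULT_REGION    = region   # assumed region; pick one that offers both services
  )
  invisible(TRUE)
}

# Example call (the file name is a placeholder):
# load_aws_credentials("accessKeys.csv")
```

Keeping the key in a separate file rather than in the script makes it harder to push it to GitHub by accident.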
Carry out the translation
Now that I have the data in R, I can start the translation.
First, I will check the length of the strings in the list that contains the text. Since AWS Translate can only handle 5000 characters per request, I need to create strings that do not exceed that length. This involves concatenating some of the text, which I will separate again later. For this, I will use the paste0() function with its collapse argument set to a special character, so that I can split the text back into the individual regulations afterwards.
Once I have these smaller strings, I move on to the translation. This is relatively simple because all the preparation is done. The translation will be done in three parts.
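The grouping described above can be sketched as follows. The function name and the separator " ||| " are hypothetical; any marker that never occurs in the regulations would work. The sketch measures length in bytes rather than characters, a deliberately conservative assumption, since accented Hungarian letters take more than one byte in UTF-8.

```r
# Sketch only: chunk_texts() is a hypothetical helper that greedily groups
# consecutive texts so that each joined chunk stays within `limit`.
chunk_texts <- function(texts, limit = 5000, sep = " ||| ") {
  sizes    <- nchar(texts, type = "bytes")  # bytes in the strings' current encoding
  sep_size <- nchar(sep, type = "bytes")
  if (any(sizes > limit)) stop("A single text exceeds the limit; split it first.")

  group         <- integer(length(texts))
  current_group <- 1L
  current_size  <- 0
  for (i in seq_along(texts)) {
    # A separator is only needed if the chunk already holds something
    added <- if (current_size > 0) sizes[i] + sep_size else sizes[i]
    if (current_size > 0 && current_size + added > limit) {
      current_group <- current_group + 1L   # start a new chunk
      current_size  <- 0
      added         <- sizes[i]
    }
    current_size <- current_size + added
    group[i]     <- current_group
  }
  # Join the members of each group with paste0() and the collapse separator
  vapply(split(texts, group), paste0, character(1), collapse = sep, USE.NAMES = FALSE)
}

# Example call: chunks <- chunk_texts(regulations_hu)
```

Because the original order is preserved, splitting the chunks on the separator later returns the regulations in the same sequence.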
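Translating the chunks one by one might look like the sketch below. It assumes the aws.translate package and its translate(text, from, to) function, with the credentials already set as environment variables. The wrapper name and the `translator` argument are hypothetical; the argument exists only so the function can be tried with a stand-in before calling AWS.

```r
# Sketch only: translate_chunks() is a hypothetical wrapper. By default it
# calls aws.translate::translate(), which needs valid AWS credentials.
translate_chunks <- function(chunks, from = "hu", to = "en",
                             translator = aws.translate::translate) {
  vapply(
    chunks,
    function(chunk) as.character(translator(chunk, from = from, to = to)),
    character(1),
    USE.NAMES = FALSE
  )
}

# Example call against AWS (billed per character):
# chunks_en <- translate_chunks(chunks)

# Dry run without AWS, using a stand-in translator:
# translate_chunks(c("szia", "hello"), translator = function(text, from, to) toupper(text))
```

With three chunks, this results in three requests, one per part.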
Once we have the translations, we will combine them into one block of regulations. Then we will split this big block into the 33 regulations again using the special character we used when collapsing. This will give us a table that lists the 33 regulations. To do all this, you can use the code below.
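Recombining and splitting could be sketched like this. The function name and the separator "|||" are hypothetical and must match whatever marker was used when collapsing. Since a translation service may alter whitespace around the marker, the sketch splits on the bare symbol and trims each piece; the optional count check is an added safeguard, not something the original workflow is known to include.

```r
# Sketch only: split_regulations() is a hypothetical helper that joins the
# translated chunks and splits them back into one row per regulation.
split_regulations <- function(translated_chunks, sep = "|||", expected_n = NULL) {
  combined <- paste(translated_chunks, collapse = sep)   # one big block of text
  parts    <- strsplit(combined, sep, fixed = TRUE)[[1]] # split on the literal marker
  parts    <- trimws(parts)                              # remove stray spaces
  if (!is.null(expected_n) && length(parts) != expected_n) {
    warning("Expected ", expected_n, " regulations but found ", length(parts),
            "; the separator may not have survived translation.")
  }
  data.frame(id = seq_along(parts), regulation = parts, stringsAsFactors = FALSE)
}

# Example call: regulations_en <- split_regulations(chunks_en, expected_n = 33)
```

If the warning fires, inspecting the translated chunks usually shows how the marker was changed.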
In the end, I have decided to export the file as a .txt since it will be easiest to share with the community. You can find that .txt file here on my GitHub Repository.
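Exporting to a plain text file takes only a few lines in base R. In this sketch the function name, the numbering format, and the file name in the example call are all assumptions.

```r
# Sketch only: export_regulations() is a hypothetical helper that writes one
# numbered regulation per line to a UTF-8 text file.
export_regulations <- function(regulations, path) {
  numbered <- sprintf("%d. %s", seq_along(regulations), regulations)
  con <- file(path, open = "w", encoding = "UTF-8")
  on.exit(close(con))        # close the file even if writing fails
  writeLines(numbered, con)
  invisible(path)
}

# Example call (the file name is a placeholder):
# export_regulations(regulations_en$regulation, "regulations_en.txt")
```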
Vocalize it using Polly
Now that we have the regulations, we can also use AWS Polly to vocalize them for anyone with visual impairments. This process creates an audio file that can be exported and shared along with the text file.
For this, I load the library and then list the available speakers. I decided to test one male and one female speaker to see which I preferred. In the test, the male voice seemed calmer and more soothing because of its deep tone.
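A comparison like this could be sketched as follows, assuming the aws.polly package with its list_voices() and synthesize(text, voice) functions, and tuneR for playback. The helper name and the two voice names in the example call are illustrative choices (both are US English Polly voices), not necessarily the ones tested here.

```r
# Sketch only: compare_voices() is a hypothetical helper that synthesizes the
# same sample text with several voices. By default it calls
# aws.polly::synthesize(), which needs valid AWS credentials.
compare_voices <- function(text, voices, synthesizer = aws.polly::synthesize) {
  samples <- lapply(voices, function(v) synthesizer(text, voice = v))
  names(samples) <- voices   # label each sample with its voice
  samples
}

# Example session against AWS:
# aws.polly::list_voices(language = "en-US")   # see which speakers exist
# samples <- compare_voices("Masks are mandatory in public spaces.",
#                           c("Joanna", "Matthew"))
# tuneR::play(samples[["Matthew"]])            # listen to one of them
```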
I ran into a problem when trying to verbalize the final regulations document. Since Polly can only verbalize 1500 characters at once, I would have to verbalize each section separately. Ultimately, I decided not to do that, but I left the code for Polly because I think it is interesting to see. Of course, you could use the code we used to create a sample and run it 33 times, each time on a different element of the list.
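For readers who want to try the per-regulation approach, here is a rough sketch. It assumes aws.polly::synthesize(text, voice) as the default synthesizer and takes the 1500-character limit from the text above; the function names are hypothetical, and splitting at sentence boundaries is a choice made for this sketch.

```r
# Sketch only: split_for_polly() is a hypothetical helper that breaks a text
# into pieces of at most `limit` characters, cutting after sentence endings.
split_for_polly <- function(text, limit = 1500) {
  sentences <- strsplit(text, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
  pieces  <- character(0)
  current <- ""
  for (s in sentences) {
    candidate <- if (nzchar(current)) paste(current, s) else s
    if (nchar(candidate) <= limit) {
      current <- candidate
    } else {
      if (nzchar(current)) pieces <- c(pieces, current)
      current <- s   # note: a single sentence above the limit is kept whole
    }
  }
  if (nzchar(current)) pieces <- c(pieces, current)
  pieces
}

# Hypothetical wrapper: one list entry per regulation, each holding the audio
# for that regulation's pieces. Calls AWS unless a stand-in is supplied.
synthesize_regulations <- function(regulations, voice, limit = 1500,
                                   synthesizer = aws.polly::synthesize) {
  lapply(regulations, function(reg) {
    lapply(split_for_polly(reg, limit), function(piece) synthesizer(piece, voice = voice))
  })
}

# Example call against AWS (the voice name is an illustrative choice):
# audio <- synthesize_regulations(regulations_en$regulation, voice = "Matthew")
```

Each regulation yields one or more audio objects, which could then be saved or played individually.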
Overall, AWS Translate worked very well for translating the regulations. It could also be beneficial for international communities in Hungary to use whenever new regulations come out. The code can easily be adapted to other countries, as long as their language is supported by AWS Translate (find the list of supported languages here).
I have pushed the final text to GitHub, which you can find here. Further, I will rerun the exercise after the 10th of December when there will be an announcement about the updated regulations. If you require a translation of those, you can save this article or star my GitHub repository and check back after the announcement.