There are multiple automatic translation services among our daily tools. The longer the translated text is, the more human finishing it needs. Or does it? Microsoft Azure has the AI Translator service, whose custom model needs at least 10,000 translated sentences to learn how your organization translates things. I thought it was just for text, but it actually translates a finished PDF into a finished PDF with the graphics in place.

Publication PDF translated with Azure AI Translator

I have an assignment for the Finnish Innovation Fund Sitra on artificial intelligence and Microsoft Power Platform development. Using Azure AI Translator is one of my tasks. I decided to use Custom Translator since I had the base data to teach the model. I could get the Finnish PDF documents, as well as their translations, directly from the Sitra website. Not all PDFs were published in full in another language, so it took some time to find about 15 publications. These publications contained more than 12,000 sentences, so the data was ready for training the model.

First, I needed a resource group and permission to create resources. I created the Translator resource, then logged in to the Custom Translator portal, created a workspace and a project, and started teaching. Please see the quick guide for more details.

Training vs. tuning

First I need the base data, which means at least 10,000 sentences from translated documents. I used training sets for that. Once the first version of the model is ready, I will ask Sitra's communications team to help me test the language and translations, and then we will use tuning and dictionary sets to finalize the model.
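To get a feeling for whether the collected publications reach the 10,000-sentence minimum before uploading them, a rough check like the one below can help. This is only a sketch: the function names are my own, and the sentence splitting is a naive approximation, not the way Custom Translator itself aligns and counts sentences.

```python
import re

# Minimum number of translated sentences mentioned above.
MIN_TRAINING_SENTENCES = 10_000


def count_sentences(text: str) -> int:
    """Rough estimate: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return sum(1 for part in parts if part)


def ready_for_training(documents: list[str],
                       minimum: int = MIN_TRAINING_SENTENCES) -> bool:
    """Return True if the documents together reach the sentence minimum."""
    total = sum(count_sentences(doc) for doc in documents)
    return total >= minimum
```

Feeding it the extracted text of each publication gives a quick yes or no. The real count in the portal may differ, because only sentences that can be paired with a translation are useful for training.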

Once I had set up all the document sets, it was time to train the model, which is actually just pressing a button.

The actual test could then be done with the API or a Windows app. Since the next phase is to build the solution, I chose to test with a ready-made translator app that connects to my trained model.
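For the API route, a minimal sketch of a text translation request against a custom model could look like this. It assumes the Translator Text API v3 on the global endpoint, where the trained model is selected with the category parameter. The function name, the Finnish-to-English defaults and all key, region and category values are placeholders of mine, not values from this project.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Global Translator endpoint (assumption: the resource is not bound to a custom domain).
GLOBAL_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_translate_request(text: str, key: str, region: str, category_id: str,
                            source: str = "fi", target: str = "en") -> Request:
    """Build (but do not send) a v3 translate request that uses a custom model."""
    query = urlencode({
        "api-version": "3.0",
        "from": source,
        "to": target,
        "category": category_id,  # category ID of the trained model
    })
    body = json.dumps([{"Text": text}]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json; charset=UTF-8",
    }
    return Request(f"{GLOBAL_ENDPOINT}/translate?{query}",
                   data=body, headers=headers, method="POST")
```

Sending it is then a matter of passing the request to urllib.request.urlopen and reading the JSON response. I kept the sending step out of the sketch so that it can be tried without a real key.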

I had to scratch my head a couple of times to figure out where to get which keys and URLs. The resource key is on the Keys and Endpoint page of the Azure resource. There is also the document translation endpoint, which is customized with the name of the translator resource. The storage connection string was a bit strange. I figured that maybe I should create a storage account and give the app that account's connection string. That worked. I have no clue why I need to do that. When I check what it saves there, there is nothing but one container, $logs. Maybe it wants to save logs there.
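To make the pieces a bit more concrete, here is a small sketch of how they might fit together. As far as I understand, batch document translation reads the source files from a blob container and writes the results to another one, which would explain the storage account, but how the app actually uses it is my assumption. The endpoint path is the v1.0 batch route as I understand it, and the helper names, container URLs and category ID are hypothetical.

```python
def parse_connection_string(conn: str) -> dict[str, str]:
    """Split 'Key=Value;Key=Value' into a dict (values may themselves contain '=')."""
    pairs = (part.split("=", 1) for part in conn.split(";") if "=" in part)
    return {key: value for key, value in pairs}


def batch_endpoint(resource_name: str) -> str:
    """Document translation endpoint, customized with the translator resource name."""
    return (f"https://{resource_name}.cognitiveservices.azure.com"
            "/translator/text/batch/v1.0/batches")


def build_batch_payload(source_url: str, target_url: str, category_id: str,
                        source: str = "fi", target: str = "en") -> dict:
    """Request body pointing at a source and a target blob container."""
    return {
        "inputs": [{
            "source": {"sourceUrl": source_url, "language": source},
            "targets": [{
                "targetUrl": target_url,
                "language": target,
                "category": category_id,  # selects the custom model
            }],
        }]
    }
```

The connection string parser is mostly there to show what the string contains: the account name and account key that a client would need to upload the PDF and fetch the translated one.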

Then I was ready to test my translator. I set up a profile for my creation, added the Finnish PDF, and a couple of minutes later I had a translated document that uses Sitra's way of saying things. It has the same graphics. At some points there might be mix-ups with complex graphics, but mostly it looked just the same, only in a different language.

Pretty cool.