Phase 2 - Translating Transcription Messages
Here’s a brief summary of my work in the 2nd phase of GSoC 2018:
What did I work on?
Json Message Extension
I continued working on a Message stanza extension element so that the required json-messages are sent from Jigasi in a separate packet extension instead of in the body of the message. The transcription messages being sent now look like this:
<message ...>
<json-message ...>
{
"type":"transcription-result",
"transcript":[
{
"confidence":0.0,
"text":"this is an example Json message"
}
],
"is_interim":false,
"language":"en-US",
"message_id":"14fcde1c-26f8-4c03-ab06-106abccb510b",
"event":"SPEECH",
"participant":{
"name":"Nik",
"id":"d62f8c36"
},
"stability":0.0,
"timestamp":"2017-08-24T11:04:05.637Z"
}
</json-message>
</message>
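To illustrate how a receiver might read such a stanza, here is a minimal Python sketch (not the actual Jitsi or Jigasi code). The namespace on json-message and the helper name extract_json_message are assumptions made for the example; only the element name and the JSON fields come from the message shown above.

```python
import json
import xml.etree.ElementTree as ET

# Sample stanza for illustration. The xmlns values are assumptions; the
# original post elides the attributes with "...".
STANZA = """<message xmlns="jabber:client">
  <json-message xmlns="http://jitsi.org/jitmeet">
    {"type": "transcription-result",
     "transcript": [{"confidence": 0.0, "text": "this is an example Json message"}],
     "is_interim": false,
     "language": "en-US",
     "participant": {"name": "Nik", "id": "d62f8c36"}}
  </json-message>
</message>"""


def extract_json_message(stanza_xml):
    """Return the decoded JSON payload of the first json-message child, or None."""
    root = ET.fromstring(stanza_xml)
    for child in root:
        # ElementTree prefixes tags with "{namespace}"; keep only the local name.
        local_name = child.tag.rsplit("}", 1)[-1]
        if local_name == "json-message":
            return json.loads(child.text)
    return None


payload = extract_json_message(STANZA)
print(payload["type"], "-", payload["transcript"][0]["text"])
```

Because the payload travels in its own element, a client that does not understand json-message can ignore it, while one that does can dispatch on the type field.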
This required the following changes in Jitsi and Jigasi
Sending json-messages from the front end components required minor changes in lib-jitsi-meet
Translation Layer
I continued adding an abstract translation layer for server-side translations in Jigasi. An abstract translation service was added so that any required translation service can be plugged in, and GoogleCloudTranslate was implemented.
A TranslationManager, which implements a TranscriptionListener, is used to keep a count of the languages required for translation, based on the participants in the conference. Once the TranslationManager is notified of a final TranscriptionResult, it then uses the TranslationService to get translations in all the required languages. We do not translate the interim messages because it would lead to higher costs and because we need the full context of a sentence for translation. All the translated results are then passed to the list of TranslationResultListeners.
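The flow described above can be sketched roughly as follows. This is a simplified Python illustration and not Jigasi's Java implementation: StubTranslationService, the method names, and the dictionary shapes are all hypothetical, chosen only to show the language counting, the skipping of interim results, and the fan-out to listeners.

```python
from collections import Counter


class StubTranslationService:
    """Hypothetical stand-in for a real backend such as a cloud translation API."""

    def translate(self, text, source_lang, target_lang):
        # Tag the text instead of translating it, so the sketch runs offline.
        return f"[{target_lang}] {text}"


class TranslationManager:
    def __init__(self, service):
        self.service = service
        self.language_counts = Counter()  # target language -> participants wanting it
        self.listeners = []

    def add_language(self, lang):
        self.language_counts[lang] += 1

    def remove_language(self, lang):
        self.language_counts[lang] -= 1
        if self.language_counts[lang] <= 0:
            del self.language_counts[lang]

    def add_listener(self, listener):
        self.listeners.append(listener)

    def notify(self, result):
        """Handle one transcription result; interim results are not translated."""
        if result["is_interim"]:
            return
        # Translate once per required language, however many participants want it.
        for lang in list(self.language_counts):
            text = self.service.translate(result["text"], result["language"], lang)
            for listener in self.listeners:
                listener({"language": lang, "text": text})


received = []
manager = TranslationManager(StubTranslationService())
manager.add_listener(received.append)
for wanted in ("hi", "hi", "fr"):
    manager.add_language(wanted)

manager.notify({"text": "hello", "language": "en-US", "is_interim": True})
manager.notify({"text": "hello", "language": "en-US", "is_interim": False})
print(received)
```

Keeping a count per language, rather than a list per participant, means each final result is translated once per distinct language, which is what keeps the number of translation requests down.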
We do not publish the translation results in the chatroom, as that would flood it with results in different languages. Instead, we send json-messages of the type translation-result, which can be parsed to show only the required results in the front end. This JSON looks like this:
<message ...>
<json-message ...>
{
"type":"translation-result",
"text":"नमस्ते आप कैसे हैं?",
"is_interim":false,
"language":"hi-IN",
"message_id":"14fcde1c-26f8-4c03-ab06-106abccb510b",
"event":"SPEECH",
"participant":{
"name":"Praveen",
"id":"d62f8c36"
},
"stability":0.0,
"timestamp":"201-07-10T11:04:05.637Z"
}
</json-message>
</message>
The next task was to send the target language preference from the front end. After discussions, we decided to use the presence stanza to send the language preference to Jigasi. As of now, this can be updated from the developer console in the browser with:
APP.conference._room.setLocalParticipantProperty('translation_language','hi');
This triggers JvbConference#memberPresenceChanged. We parse the target language using a custom presence stanza extension in Jitsi, set it as the language preference for the participant with the given id, and add it to the map of languages in TranslationManager.
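As a rough illustration of the bookkeeping this implies, the Python sketch below tracks one target language per participant and keeps the per-language counts in sync when a preference changes. The class and method names are hypothetical and the real logic lives in Jigasi's Java code; the sketch only assumes that each presence update carries a participant id and an optional language.

```python
from collections import Counter


class LanguagePreferences:
    """Hypothetical tracker: one target language per participant, plus counts."""

    def __init__(self):
        self.by_participant = {}  # participant id -> target language
        self.counts = Counter()   # target language -> number of participants

    def set_language(self, participant_id, language):
        """Apply a presence update; a falsy language clears the preference."""
        previous = self.by_participant.get(participant_id)
        if previous == language:
            return
        if previous is not None:
            self._release(previous)
        if language:
            self.by_participant[participant_id] = language
            self.counts[language] += 1
        else:
            self.by_participant.pop(participant_id, None)

    def _release(self, language):
        self.counts[language] -= 1
        if self.counts[language] <= 0:
            del self.counts[language]

    def required_languages(self):
        return sorted(self.counts)


prefs = LanguagePreferences()
prefs.set_language("d62f8c36", "hi")
prefs.set_language("a1b2c3d4", "hi")
prefs.set_language("d62f8c36", "fr")   # participant switches from hi to fr
after_switch = prefs.required_languages()
prefs.set_language("a1b2c3d4", None)   # participant clears the preference
after_clear = prefs.required_languages()
print(after_switch, after_clear)
```

Releasing the previous language before recording the new one ensures that a language nobody wants any more drops out of the set that gets translated.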
This was enabled in the following PRs :
What am I currently working on?
I am currently working on using the json-messages of type translation-result received in Jitsi-Meet to display the final results as subtitles only in the specified language.
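The filtering step could look something like the Python sketch below. It is only an illustration of the idea, not the Jitsi-Meet front-end code (which is JavaScript): the function name subtitle_for is hypothetical, and matching on the primary language subtag, so that a preference of hi accepts a result tagged hi-IN, is an assumption of this example.

```python
import json

# Sample payload modelled on the translation-result message shown earlier.
SAMPLE = json.dumps({
    "type": "translation-result",
    "text": "नमस्ते आप कैसे हैं?",
    "is_interim": False,
    "language": "hi-IN",
    "participant": {"name": "Praveen", "id": "d62f8c36"},
})


def primary_subtag(language):
    """Return the part of a language tag before the first hyphen, lowercased."""
    return language.split("-")[0].lower()


def subtitle_for(payload_json, preferred_language):
    """Return a subtitle line if the payload should be shown, otherwise None."""
    message = json.loads(payload_json)
    if message.get("type") != "translation-result":
        return None
    if message.get("is_interim"):
        return None
    # Assumption: compare only the primary subtag ("hi" matches "hi-IN").
    if primary_subtag(message.get("language", "")) != primary_subtag(preferred_language):
        return None
    return f'{message["participant"]["name"]}: {message["text"]}'


shown = subtitle_for(SAMPLE, "hi")
hidden = subtitle_for(SAMPLE, "fr")
print(shown)
```

Every participant receives the results for all languages, so dropping the ones that do not match the local preference is what keeps a single language on screen.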
Future Work
- The source language for transcription is hardcoded to en-US as of now. We initially decided to send this language in the iq stanza of the dial made for the transcriber, but it will now be sent in the same manner as the target language, as this allows us to set a different source language for each participant.
- The language preferences are set from the console as of now. UI elements to select the source and target languages from a list are yet to be designed.