Phase 2 - Translating Transcription Messages

Added json-message extensions and translated the transcription messages by working with Jigasi, the SIP component of Jitsi.

Here’s a brief summary of my work in the 2nd phase of GSoC 2018:

What did I work on?

Json Message Extension

I continued working on a Message stanza extension element that carries the required json-messages from Jigasi in a separate element instead of in the body of the message. The transcription messages now look like this:

<message ...>
  <json-message ...>
    {
      "type":"transcription-result",
      "transcript":[
          {
            "confidence":0.0,
            "text":"this is an example Json message"
          }
      ],
      "is_interim":false,
      "language":"en-US",
      "message_id":"14fcde1c-26f8-4c03-ab06-106abccb510b",
      "event":"SPEECH",
      "participant":{
          "name":"Nik",
          "id":"d62f8c36"
      },
      "stability":0.0,
      "timestamp":"2017-08-24T11:04:05.637Z"
    }
  </json-message>
</message>

This required the following changes in Jitsi and Jigasi:

Sending json-messages from the front-end components required minor changes in lib-jitsi-meet.
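A minimal Python sketch of the idea, not Jitsi's actual Java implementation: the JSON payload is placed in a json-message child of the message stanza rather than in the body, and the receiver reads it back from that element. The helper names build_message and extract_json are hypothetical, and the stanza attributes and namespaces (elided as "..." above) are left out.

```python
import json
import xml.etree.ElementTree as ET


def build_message(payload: dict) -> str:
    """Wrap a JSON payload in a <json-message> child of a <message> stanza.

    Hypothetical helper: attributes such as to/from/type and any namespace
    are omitted, since the examples in this post elide them.
    """
    message = ET.Element("message")
    json_el = ET.SubElement(message, "json-message")
    # The JSON text goes into the extension element, not into a <body>.
    json_el.text = json.dumps(payload)
    return ET.tostring(message, encoding="unicode")


def extract_json(stanza: str):
    """Return the decoded JSON payload of a stanza, or None if it has none."""
    message = ET.fromstring(stanza)
    json_el = message.find("json-message")
    if json_el is None or not json_el.text:
        return None
    return json.loads(json_el.text)


# Usage: round-trip a trimmed transcription-result payload.
result = {
    "type": "transcription-result",
    "transcript": [{"confidence": 0.0, "text": "this is an example Json message"}],
    "is_interim": False,
    "language": "en-US",
}
stanza = build_message(result)
decoded = extract_json(stanza)
```

Keeping the payload out of the body means ordinary chat clients that only render the body do not display raw JSON.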

Translation Layer

I continued adding an abstract translation layer for server-side translations in Jigasi. An abstract translation service was added so that any translation provider can be plugged in, and GoogleCloudTranslate was implemented on top of it.

A TranslationManager, which implements TranscriptionListener, keeps a count of the languages required for translation based on the participants in the conference. Once the TranslationManager is notified of a final TranscriptionResult, it uses the TranslationService to get translations in all the required languages. We do not translate interim messages because doing so would increase costs, and translation needs the full context of a sentence. All the translated results are then passed to the registered TranslationResultListeners.
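The flow above can be sketched in Python as follows. This is an illustrative model, not Jigasi's Java code: the class names mirror the ones mentioned in this post, but their fields, the translate() signature, and the use of plain callables as listeners are assumptions. EchoService is a hypothetical stand-in for a real provider such as Google Cloud Translation.

```python
from abc import ABC, abstractmethod
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TranscriptionResult:
    text: str
    language: str
    is_interim: bool


@dataclass
class TranslationResult:
    text: str
    language: str


class TranslationService(ABC):
    """Provider-agnostic interface; a cloud backend would be one subclass."""

    @abstractmethod
    def translate(self, text: str, source_lang: str, target_lang: str) -> str:
        ...


class TranslationManager:
    """Plays the TranscriptionListener role: it receives transcription results."""

    def __init__(self, service: TranslationService):
        self.service = service
        # Number of participants that currently want each target language.
        self.language_counts = Counter()
        self.listeners: List[Callable[[TranslationResult], None]] = []

    def add_language(self, lang: str) -> None:
        self.language_counts[lang] += 1

    def remove_language(self, lang: str) -> None:
        self.language_counts[lang] -= 1
        if self.language_counts[lang] <= 0:
            del self.language_counts[lang]  # nobody needs this language any more

    def notify(self, result: TranscriptionResult) -> None:
        if result.is_interim:
            return  # skip interim results: costly and missing full-sentence context
        for lang in self.language_counts:
            text = self.service.translate(result.text, result.language, lang)
            for listener in self.listeners:
                listener(TranslationResult(text, lang))


class EchoService(TranslationService):
    """Hypothetical stand-in that tags the text instead of calling a real API."""

    def translate(self, text, source_lang, target_lang):
        return f"[{target_lang}] {text}"


# Usage: two target languages; only the final result is translated.
received = []
manager = TranslationManager(EchoService())
manager.listeners.append(received.append)
manager.add_language("hi")
manager.add_language("fr")
manager.notify(TranscriptionResult("hello", "en-US", is_interim=True))
manager.notify(TranscriptionResult("hello", "en-US", is_interim=False))
```

Counting languages rather than storing a set lets a language be dropped only when the last participant who asked for it leaves or changes preference.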

We do not publish the translation results in the chat room, as that would flood it with results in many different languages. Instead, we send json-messages of type translation-result, which the front-end can parse to show only the results it needs. This JSON looks like this:

<message ...>
  <json-message ...>
    {
      "type":"translation-result",
      "text":"नमस्ते आप कैसे हैं?",
      "is_interim":false,
      "language":"hi-IN",
      "message_id":"14fcde1c-26f8-4c03-ab06-106abccb510b",
      "event":"SPEECH",
      "participant":{
          "name":"Praveen",
          "id":"d62f8c36"
      },
      "stability":0.0,
      "timestamp":"2018-07-10T11:04:05.637Z"
    }
  </json-message>
</message>
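On the front-end, the parsing step amounts to a filter over incoming payloads. The real front-end is JavaScript; this Python sketch only illustrates the logic. The function name subtitle_for and the rule of comparing primary language subtags (so a preference of "hi" matches a result tagged "hi-IN") are assumptions.

```python
import json
from typing import Optional


def subtitle_for(payload_json: str, preferred_lang: str) -> Optional[str]:
    """Return subtitle text for this participant, or None to ignore the message.

    Hypothetical helper: matches on the primary language subtag only.
    """
    msg = json.loads(payload_json)
    if msg.get("type") != "translation-result" or msg.get("is_interim"):
        return None
    lang = msg.get("language", "").split("-")[0].lower()
    if lang != preferred_lang.split("-")[0].lower():
        return None  # a translation meant for someone else
    return f'{msg["participant"]["name"]}: {msg["text"]}'


# Usage: a trimmed version of the translation-result payload above.
example = json.dumps({
    "type": "translation-result",
    "text": "नमस्ते आप कैसे हैं?",
    "is_interim": False,
    "language": "hi-IN",
    "participant": {"name": "Praveen", "id": "d62f8c36"},
})
shown = subtitle_for(example, "hi")
hidden = subtitle_for(example, "fr")
```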

The next task was to send the target-language preference from the front-end. After some discussion, we decided to use the presence stanza to send the language preference to Jigasi. For now, this can be set from the browser's developer console with: APP.conference._room.setLocalParticipantProperty('translation_language','hi');

This triggers JvbConference#memberPresenceChanged. There, we parse the target language using a custom presence stanza extension in Jitsi, set it as the language preference for the participant with the given id, and add it to the map of languages in the TranslationManager.
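The presence handling can be modelled roughly as below. This Python sketch is not the Jitsi extension itself: the element name in LANG_ELEMENT is a hypothetical placeholder for however lib-jitsi-meet serializes the translation_language property, and treating a presence without that element as "preference cleared" is an assumption. LanguagePreferences is a hypothetical class that keeps the per-participant map and the per-language counts in sync.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Optional

# Hypothetical element name; the real one depends on lib-jitsi-meet.
LANG_ELEMENT = "jitsi_participant_translation_language"


def parse_target_language(presence_xml: str) -> Optional[str]:
    """Extract the target language from a presence stanza, if present."""
    presence = ET.fromstring(presence_xml)
    el = presence.find(LANG_ELEMENT)
    return el.text.strip() if el is not None and el.text else None


class LanguagePreferences:
    """Tracks each participant's target language and how many want each one."""

    def __init__(self):
        self.by_participant: Dict[str, str] = {}
        self.counts = Counter()

    def on_presence(self, participant_id: str, presence_xml: str) -> None:
        new_lang = parse_target_language(presence_xml)
        old_lang = self.by_participant.get(participant_id)
        if new_lang == old_lang:
            return  # nothing changed
        if old_lang is not None:
            self.counts[old_lang] -= 1
            if self.counts[old_lang] <= 0:
                del self.counts[old_lang]
        if new_lang is None:
            self.by_participant.pop(participant_id, None)  # assumed: cleared
        else:
            self.by_participant[participant_id] = new_lang
            self.counts[new_lang] += 1
```

Because presence is re-sent whenever any participant property changes, the early return on an unchanged language keeps repeated presences from inflating the counts.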

This was enabled in the following PRs:

What am I currently working on?

I am currently working on using the translation-result json-messages received in Jitsi Meet to display the final results as subtitles, shown only in the language each participant has specified.

Future Work

  • The source language for transcription is hardcoded to en-US for now. We initially planned to send this language in the IQ stanza used to dial the transcriber, but it will now be sent the same way as the target language, since that allows a different source language for each participant.
  • The language preferences are currently set from the console. UI elements for selecting the source and target languages from a list are yet to be designed.