Nuance Translator SDK

Nuance Translator SDK is a platform for adding sophisticated speech recognition and speech synthesis (text-to-speech) capabilities to your mobile applications.

NuanceMessagingActivity uses the Nuance Translator SDK to provide ASR and TTS functionality to the messaging UI.

This guide covers the APIs that can be used to add ASR/TTS functionality to NuanceMessagingActivity.

The SDK also comes with WakeUp Word functionality.

Use the bool properties below to enable and configure the Translator SDK features.

Name	type	default	description
translator	bool	<bool name="translator">false</bool>	Use this property to turn on translator functionality.
continuousListening	bool	<bool name="continuousListening">false</bool>	Lets you configure how speech recognition starts listening. When set to true, the SDK starts listening after every agent message is played.
wakeupWord	bool	<bool name="wakeupWord">false</bool>	Use the property to turn on wake up word.
playTTSOnRestore	bool	<bool name="playTTSOnRestore">false</bool>	Lets you configure how the messages are read upon restore.
playTTSOnlyOnASRInput	bool	<bool name="playTTSOnlyOnASRInput">false</bool>	Lets you configure how the messages are read when keyboard is used instead of speech.
playOpener	bool	<bool name="playOpener">true</bool>	Lets you configure whether the opener should be played.
playASRSuccessChime	bool	<bool name="playASRSuccessChime">true</bool>	Lets you configure if chime sound should play after successful listening.
playASRNoInputChime	bool	<bool name="playASRNoInputChime">true</bool>	Lets you configure if chime sound should play after a no input error.
playRecordingChime	bool	<bool name="playRecordingChime">true</bool>	Lets you configure if chime sound should play before listening starts.

Initializing NinaMobileController
NinaServerConfigurationBuilder
Configuring NinaSetting
NinaSynthesisValuesBuilder
SessionValuesBuilder
RecognitionValuesBuilder
WakeUp word
Translator API

Initializing NinaMobileController

The NinaMobileController singleton class exposes the method that is used to initialize the Translator SDK.

public void initialize(Context context, NinaServerConfiguration ninaServerConfiguration)

context: The current Context object.

ninaServerConfiguration: An instance of NinaServerConfiguration. Use the NinaServerConfigurationBuilder class to construct a NinaServerConfiguration instance.

public NinaSetting getNinaSetting()

Use this method to retrieve the NinaSetting instance.

NinaServerConfigurationBuilder

To create builder instance: new NinaServerConfiguration.NinaServerConfigurationBuilder()

Public Methods

Use the Builder public methods to set the configuration values that are used by the SDK to do processing.

public NinaServerConfigurationBuilder setApplicationKey(String applicationKey)

The required name of the application that will map to the server configuration, in the format CompanyName_AppName, for example "JavaBeanz_Orders"
public NinaServerConfigurationBuilder setAuthenticationType(@AuthenticationType String authenticationType)

The type of authentication. Allowed values: "jws", "hashcode"
public NinaServerConfigurationBuilder setAuthenticationData(String authenticationData)

Authentication data based on your authentication method.
public NinaServerConfigurationBuilder setVerificationCode(String verificationCode)

The required verification code if using "verification code" as an authentication method.
public NinaServerConfigurationBuilder setGateWayAddress(String gateWayAddress)

Set the address of the server used for processing the speech and text.
public NinaServerConfigurationBuilder setGateWayPort(int gateWayPort)

Set the port of the server used for processing the speech and text.
public NinaServerConfigurationBuilder setGateWayPath(String gateWayPath)

Set the path of the server address used for processing the speech and text.
public NinaServerConfigurationBuilder setGateWayScheme(String gateWayScheme)

Set the scheme of the server address used for processing the speech and text.
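
The four gateway setters together describe a single endpoint. As a rough illustration only, the sketch below combines a scheme, address, port and path into one address using java.net.URI. GatewayUrl and its build method are hypothetical helpers, not part of the SDK, and how the SDK itself assembles the address is an assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of the SDK: shows how scheme, address, port
// and path could combine into one endpoint address.
final class GatewayUrl {
    private GatewayUrl() {}

    static String build(String scheme, String address, int port, String path) {
        // java.net.URI requires an absolute path when an authority is present,
        // so a leading slash is added if the caller left it out.
        String absolutePath = path.startsWith("/") ? path : "/" + path;
        try {
            return new URI(scheme, null, address, port, absolutePath, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid gateway settings", e);
        }
    }
}
```

Calling GatewayUrl.build("wss", "webapi-demo.nuance.mobi", 443, "webapi-platform/websocket") with the values from the sample application in this guide yields wss://webapi-demo.nuance.mobi:443/webapi-platform/websocket.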

Configuring NinaSetting

The NinaSetting singleton class can be used to configure the Translator SDK's TTS and ASR functionality.

public void setSynthesisValues(SynthesisValues synthesisValues)

synthesisValues: An instance of SynthesisValues. Use the NinaSynthesisValuesBuilder class to construct a SynthesisValues instance.

public void setSessionValues(SessionValues sessionValues)

sessionValues: An instance of SessionValues. Use the SessionValuesBuilder class to construct a SessionValues instance.

public void setRecognitionValues(RecognitionValues recognitionValues)

recognitionValues: An instance of RecognitionValues. Use the RecognitionValuesBuilder class to construct a RecognitionValues instance.

NinaSynthesisValuesBuilder

To create builder instance: new SynthesisValues.NinaSynthesisValuesBuilder()

Public Methods

Use the Builder public methods to update the commands that are used to play audio using text as input.

public NinaSynthesisValuesBuilder setType(String type)

The type of data you provided. If none provided, the default will be taken from your customer config file.
Default value: text
Allowed values: "text", "ssml"
public NinaSynthesisValuesBuilder setSensitivity(@Sensitivity.Type String sensitivity)

String that controls whether or not sensitive data for this command will be encrypted when logged to disk and reporting platform. Parameters description: "open" - non-encrypted, "masked" - replace everything with same character (Information will be lost forever), "encrypted" - data encrypted using a provided key
Default value: open
Allowed values: "encrypted", "masked", "open"
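
To make the difference between "open" and "masked" concrete, here is a minimal sketch of what each setting could do to a value before it is logged. SensitivityDemo and forLog are hypothetical names, the choice of '*' as the masking character is an assumption, and the "encrypted" mode is left out because it depends on a key that this guide does not describe. It is not the SDK's logging code.

```java
import java.util.Arrays;

// Hypothetical illustration, not SDK code: the effect "open" and "masked"
// could have on a logged value.
final class SensitivityDemo {
    private SensitivityDemo() {}

    static String forLog(String value, String sensitivity) {
        switch (sensitivity) {
            case "open":
                // Logged unchanged.
                return value;
            case "masked":
                // Every character becomes the same symbol, so the original
                // text cannot be recovered from the log.
                char[] masked = new char[value.length()];
                Arrays.fill(masked, '*');
                return new String(masked);
            default:
                throw new IllegalArgumentException("Not covered by this sketch: " + sensitivity);
        }
    }
}
```

With these assumptions, forLog("1234", "masked") returns "****", while forLog("1234", "open") returns the input unchanged.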
public NinaSynthesisValuesBuilder setStatistics(String statistics)

Whether or not to return statistics on the command's performance. Allowed values: "true" or "false"
public NinaSynthesisValuesBuilder setVoice(String voice)

The voice in which speech synthesis is to be done. If none provided, the default will be taken from your customer config file. Note this parameter works only with type "text". To set a voice in ssml, see the example below.

SessionValuesBuilder

To create builder instance: new SessionValuesBuilder()

Public Methods

Use the Builder public methods to update the commands that are used while establishing a session.

public SessionValuesBuilder setSpeechSynthesisCodec(@SessionValues.SpeechSynthesisCodec String speechSynthesisCodec)

The codec to be used during speech synthesis. If none provided, the default will be taken from your customer config file.
Default value: pcm_16_8k
Allowed values: "pcm_16_8k", "pcm_16_16k", "opus_nb", "opus_wb", "ulaw"
public SessionValuesBuilder setSpeechRecognitionCodec(@SessionValues.SpeechRecognitionCodec String speechRecognitionCodec)

The codec to be used during speech recognition. If none provided, the default will be taken from your customer config file.
Default value: pcm_16_8k
Allowed values: "pcm_16_8k", "pcm_16_16k", "opus_nb", "opus_wb"
public SessionValuesBuilder setClientUserId(String clientUserId)

String defined by the client application to identify itself.
public SessionValuesBuilder setClientDeviceId(String clientDeviceId)

String defined by the client application to identify the device.
public SessionValuesBuilder setClientSessionId(String clientSessionId)

String defined by the client application to identify its session.
public SessionValuesBuilder setClientOsName(String clientOsName)

String defined by the client application to identify the OS it runs on.
public SessionValuesBuilder setClientOsVersion(String clientOsVersion)

String defined by the client application to identify the version of the OS it runs on.

RecognitionValuesBuilder

To create builder instance: new RecognitionValues.RecognitionValuesBuilder()

Public Methods

Use the Builder public methods to update the commands that are used to recognize speech using audio as input.

public void setSpeechDetector(String speechDetector)

The speech detector to be used for voice activity detection. If none provided, the default will be taken from your customer config file.
Default value: adaptive
Allowed values: "legacy", "adaptive"
public void setBeginNoiseSampleFrames(int beginNoiseSampleFrames)

The number of frames taken at the start of recognition to assess the environment’s noise level. If none provided, the default will be taken from your customer config file.
Default value: 10
Size range: 1..2^31-1
public void setVoiceThreshold(double voiceThreshold)

The amount of energy required for a frame to be considered 'voiced', as a factor of the background noise level. If none provided, the default will be taken from your customer config file.
Default value: 4.0
Size range: -2^63..2^63-1
public void setStartOfSpeechVoicedFrames(int startOfSpeechVoicedFrames)

Of the most recent startOfSpeechHistoryFrames, the number of 'voiced' frames required to assess that user has started speaking. If none provided, the default will be taken from your customer config file.
Default value: 7
Size range: -2^31..2^31-1
public void setStartOfSpeechHistoryFrames(int startOfSpeechHistoryFrames)

The number of historical frames to use when assessing that the user has started speaking. If none provided, the default will be taken from your customer config file.
Default value: 15
Size range: -2^31..2^31-1
public void setEndOfSpeechVoicedFrames(int endOfSpeechVoicedFrames)

Of the most recent endOfSpeechHistoryFrames, the number of 'voiced' frames which will indicate the user has not finished speaking. If none provided, the default will be taken from your customer config file.
Default value: 5
Size range: -2^31..2^31-1
public void setEndOfSpeechHistoryFrames(int endOfSpeechHistoryFrames)

The number of historical frames to use when assessing that the user has finished speaking. If none provided, the default will be taken from your customer config file.
Default value: 50
Size range: -2^31..2^31-1
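
The noise-sample, threshold, voiced-frame and history-frame parameters above describe a sliding-window style of voice activity detection. The sketch below is one plausible reading of those descriptions for the start-of-speech case, not the SDK's actual detector: StartOfSpeechSketch, the per-frame energy array and the use of a simple mean for the noise estimate are all assumptions made for illustration.

```java
// Illustrative sketch only: a sliding-window start-of-speech check built from
// the parameter descriptions. The SDK's real detector may differ.
final class StartOfSpeechSketch {
    private StartOfSpeechSketch() {}

    /**
     * Returns the index of the frame at which speech is considered started,
     * or -1 if it never starts. frameEnergies holds one energy value per frame.
     */
    static int detect(double[] frameEnergies, int beginNoiseSampleFrames,
                      double voiceThreshold, int voicedFrames, int historyFrames) {
        if (beginNoiseSampleFrames < 1 || frameEnergies.length <= beginNoiseSampleFrames) {
            return -1;
        }
        // Estimate background noise as the mean energy of the first frames.
        double noise = 0;
        for (int i = 0; i < beginNoiseSampleFrames; i++) {
            noise += frameEnergies[i];
        }
        noise /= beginNoiseSampleFrames;

        boolean[] voiced = new boolean[frameEnergies.length];
        int voicedInWindow = 0;
        for (int i = beginNoiseSampleFrames; i < frameEnergies.length; i++) {
            // A frame is "voiced" when its energy exceeds noise * threshold.
            voiced[i] = frameEnergies[i] > noise * voiceThreshold;
            if (voiced[i]) {
                voicedInWindow++;
            }
            // Drop the frame that just fell out of the history window.
            int leaving = i - historyFrames;
            if (leaving >= 0 && leaving <= i && voiced[leaving]) {
                voicedInWindow--;
            }
            if (voicedInWindow >= voicedFrames) {
                return i;
            }
        }
        return -1;
    }
}
```

Under this reading, raising voicedFrames relative to historyFrames makes the detector slower to trigger but less sensitive to short bursts of noise. An end-of-speech check would follow the same pattern with the endOfSpeech parameters, triggering when the voiced count in the window drops below the configured number.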
public void setConsiderNegativeRatios(boolean considerNegativeRatios)

Whether or not to consider negative ratios when assessing speech activity. If none provided, the default will be taken from your customer config file.
Default value: false
public RecognitionValuesBuilder setEndPointDetection(boolean endPointDetection)

Whether or not to perform end-point detection. If none provided, the default will be taken from your customer config file.
Default value: true
public RecognitionValuesBuilder setWordStream(boolean wordStream)

Whether or not to send intermediate recognition results. If none provided, the default will be taken from your customer config file.
Default value: true
public RecognitionValuesBuilder setSensitivity(@Sensitivity.Type String sensitivity)

String that controls whether or not sensitive data for this command will be encrypted when logged to disk and reporting platform. Parameters description: "open" - non-encrypted, "masked" - replace everything with same character (Information will be lost forever), "encrypted" - data encrypted using a provided key
Default value: open
Allowed values: "encrypted", "masked", "open"
public RecognitionValuesBuilder setActiveDynamicVocabularySets(String[] activeDynamicVocabularySets)

The list of dynamic vocabularies (defined by their ids) to activate for this speech recognition command. NOTE: Do not upload more than 500 entries, as this may noticeably degrade performance.
public RecognitionValuesBuilder setStartOfSpeechTimeoutSeconds(int startOfSpeechTimeoutSeconds)

How long to wait for start-of-speech before the server cancels a recognition, in seconds. Not applicable if endPointDetection is false. Use 0 (zero) for no limit.
Default value: 5
Size range: 0..2^31-1
public RecognitionValuesBuilder setUtteranceMaxTimeSeconds(int utteranceMaxTimeSeconds)

The maximum allowed time for an utterance, in seconds. When endPointDetection is true, this time begins when start-of-speech is detected. Use 0 (zero) for no limit.
Default value: 30
Size range: 0..2^31-1
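
The two timeouts share the convention that 0 means no limit, and the start-of-speech timeout only applies when end-point detection is on. The sketch below restates those rules as code. RecognitionTimeouts and its methods are hypothetical, the checks actually run on the server, and treating a duration exactly equal to the limit as a timeout is an assumption.

```java
// Hypothetical restatement of the documented timeout rules; not SDK code.
final class RecognitionTimeouts {
    private RecognitionTimeouts() {}

    // True when waiting for start-of-speech has gone on too long.
    static boolean startOfSpeechTimedOut(boolean endPointDetection,
                                         int timeoutSeconds, long waitedMillis) {
        // Not applicable without end-point detection; 0 means no limit.
        return endPointDetection
                && timeoutSeconds > 0
                && waitedMillis >= timeoutSeconds * 1000L;
    }

    // True when the utterance has exceeded its maximum allowed time.
    static boolean utteranceTooLong(int maxTimeSeconds, long utteranceMillis) {
        // 0 means no limit.
        return maxTimeSeconds > 0 && utteranceMillis >= maxTimeSeconds * 1000L;
    }
}
```

With the documented defaults of 5 and 30 seconds, a recognition would be cancelled after 5 seconds of silence, and an utterance would be cut off 30 seconds after it starts.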
public RecognitionValuesBuilder setStatistics(boolean statistics)

Whether or not to return statistics on the command's performance.
public RecognitionValuesBuilder setNoiseLevel(NoiseLevel noiseLevel)

The specific background noise level energy values to use for speech detection.


public class Application extends android.app.Application {

    public NuanMessaging chatInstance;
    private RefWatcher refWatcher;

    @Override
    public void onCreate() {
        super.onCreate();
        refWatcher = LeakCanary.install(this);
        // Initialize messaging first, then the translator.
        chatInstance = NuanMessaging.getInstance();
        chatInstance.initialize(this, "west", "ceapiClientId", "ceapiClientSecret", "bestbrands");
        initializeNinaMobileController();
    }

    private void initializeNinaMobileController() {
        // Describe the application and the gateway that processes speech and text.
        NinaServerConfiguration ninaServerConfiguration = new NinaServerConfiguration.NinaServerConfigurationBuilder()
                .setApplicationKey("companyName_appName")
                .setVerificationCode("5796d9e101d5355f5dbf95a3681f6ca5317fd9e96e5ffacac5e1305361a4eb1c")
                .setGateWayScheme("wss")
                .setGateWayAddress("webapi-demo.nuance.mobi")
                .setGateWayPort(443)
                .setGateWayPath("webapi-platform/websocket")
                .build();
        NinaMobileController.getInstance().initialize(getApplicationContext(), ninaServerConfiguration);

        // Override selected recognition values; unset values keep their defaults.
        RecognitionValues recognitionValues = new RecognitionValues.RecognitionValuesBuilder()
                .setStartOfSpeechVoicedFrames(0)
                .setStartOfSpeechHistoryFrames(0)
                .build();
        NinaMobileController.getInstance().getNinaSetting().setRecognitionValues(recognitionValues);
        // Turn on SDK debug logging.
        NinaMobileController.getInstance().shouldDebugLog(true);
    }
}

Changing default translator action tone

interpretation_error.wav: Replace the default recognition failed tone by adding a raw file with the same name.

start_listening.wav: Replace the default recognition started tone by adding a raw file with the same name.

successful_recognition.wav: Replace the default recognition success tone by adding a raw file with the same name.

Styling Translator Animation

Properties for changing the appearance of Translator animated icon holder.

Name	type	default	description
msg_mic_permission	string	<![CDATA[Need Mic permission, go to Setting -> Apps -> <i>Your App</i> -> permission and provide mic permission]]>	Sets the message that is displayed when microphone permission is denied.
translatorLogBackground	color	#9000	Sets the translator icon connection status label background color.
translatorTextSize	dimen	12sp	Sets the translator icon connection status label text size.

You can override the default style classes below to customize the Translator animated icon fragment.


<!-- Translator container style -->
<style name="TranslatorContainerDefault.TranslatorContainer"></style>

<!-- Applied to the translator placeholder image view. TranslatorPlaceHolder is displayed while the Translator SDK is initializing and if permission is denied. -->
<style name="TranslatorPlaceHolderDefault.TranslatorPlaceHolder"></style>

<!-- Applied to the translator animated icon holder -->
<style name="AnimatedIconDefault.AnimatedIconDefaultHolder"></style>

<!-- Applied to the translator state label view, for states such as connecting and no internet -->
<style name="TranslatorStateDefault.TranslatorState"></style>

<!-- Processing text style -->
<style name="TranslatorProcessingContainerDefault.TranslatorProcessingContainer"></style>

Persona Animation

Persona animations for the various states are configured through a JSON file. The application can override the JSON config and customize it according to its needs.

Refer to the PDF below to understand the high-level items in the JSON config. An example JSON config and images can be found in Example11.


    <!-- Example: changing the translator persona layout to center -->
    <style name="TranslatorContainerDefault.TranslatorContainer">
        <item name="android:layout_centerVertical">true</item>
        <item name="android:layout_centerHorizontal">true</item>
        <item name="android:layout_height">75dp</item>
        <item name="android:layout_width">80dp</item>
        <item name="android:layout_alignParentLeft">false</item>
        <item name="android:layout_alignParentTop">true</item>
        <item name="android:layout_gravity">center_horizontal</item>
        <item name="android:layout_above">@null</item>
    </style>

    <style name="MessagingInputContainerDefault.MessagingInputContainer">
        <item name="android:layout_height">75dp</item>
        <item name="android:layout_alignParentBottom">true</item>
        <item name="android:layout_toRightOf">@null</item>
    </style>

    <style name="TranslatorProcessingContainerDefault.TranslatorProcessingContainer">
        <item name="android:layout_below">@+id/translator_container</item>
        <item name="android:layout_toRightOf">@null</item>
        <item name="android:padding">0dp</item>
    </style>

    <style name="MessagingFooterDefault.MessagingFooter">
        <item name="android:layout_height">150dp</item>
    </style>

Configuring WakeUp Word

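WakeUp Word functionality lets the customer wake up the translator engine with a spoken phrase, after which the engine immediately starts listening. It is controlled by the wakeupWord property, which defaults to false: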

<bool name="wakeupWord">false</bool>

To configure the wake up word phrase, override the default string-array resource.

<string-array name="nina_wakeup_word_phrases">
    <item>hello mike</item>
</string-array>

Gradle Settings

The following setting must be present in the Gradle build file so that the language pack is not compressed.

aaptOptions { noCompress 'bin' }

Translator API

The SDK exposes an API that allows the application to programmatically control the state of the Translator.

The following API methods from NinaMobileController.getInstance().getObserver() let the application register listeners for various Translator events.

public void registerRecordingListener(RecordingListener recordingListener)

RecordingListener fires to let the application know the state of the recorder.

void onRecordingStarted()

void onRecordingStopped()

void onRecordingError()
public void registerInterpretationListener(InterpretationListener interpretationListener)

InterpretationListener fires to let the application know the state of speech interpretation.

void onInterpretation()

void onInterpretationError()

void onInterpretationCancel()
public void registerEndpointingListener(EndpointingListener endpointingListener)

EndpointingListener fires to let the application know about speech events.

void onStartOfSpeech()

void onEndOfSpeech()
public void registerPlaybackListener(PlaybackListener playbackListener)

PlaybackListener fires to let the application know the state of TTS playback.

void onPlaybackStarted()

void onPlaybackStopped()

void onPlaybackQueueEmptied()

void onPlaybackError()
public void unregisterInterpretationListener(InterpretationListener interpretationListener)
public void unregisterEndpointingListener(EndpointingListener endpointingListener)
public void unregisterRecordingListener(RecordingListener recordingListener)
public void unregisterPlaybackListener(PlaybackListener playbackListener)
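
As an illustration of how a listener might be used, the sketch below tracks recorder state from the three RecordingListener callbacks. So that it compiles outside an Android project, it declares a stand-in RecordingListener interface that copies the documented callbacks; in an app you would implement the SDK's own interface instead. RecorderState and its accessors are hypothetical names.

```java
// Stand-in copying the documented callbacks; in an app, use the SDK's
// RecordingListener instead of declaring this interface.
interface RecordingListener {
    void onRecordingStarted();
    void onRecordingStopped();
    void onRecordingError();
}

// Hypothetical listener that remembers the recorder state, for example to
// drive a microphone indicator in the UI.
final class RecorderState implements RecordingListener {
    private boolean recording;
    private int errorCount;

    @Override
    public void onRecordingStarted() {
        recording = true;
    }

    @Override
    public void onRecordingStopped() {
        recording = false;
    }

    @Override
    public void onRecordingError() {
        // An error also ends the current recording.
        recording = false;
        errorCount++;
    }

    boolean isRecording() {
        return recording;
    }

    int errorCount() {
        return errorCount;
    }
}
```

In an app, the instance would be passed to NinaMobileController.getInstance().getObserver().registerRecordingListener(...) when the screen becomes active and to unregisterRecordingListener(...) when it goes away, so the listener does not outlive the screen it updates.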

The following API methods from NinaMobileController.getInstance() let the application trigger Translator functionality.

public void startListening()

The SDK starts listening to the speech input. The listeners from the section above are fired to notify the app about the state of the recognition.
public void stopListening()

The SDK stops listening to the speech. The onInterpretation event is fired with the final recognized text, if any.
public void cancelListening()

The SDK cancels listening to the speech.
public void playPrompt(String prompt)

The SDK requests TTS playback of the given string. The onPlaybackQueueEmptied event is fired when all queued prompts have finished playing.
public void stopPrompts()

The SDK tries to cancel the prompt, provided there is a prompt being played or queued.
public boolean isPlaying()

Checks the TTS playback state of the Translator SDK.