
Ananya Muralidhar

Out-Domain Utterance Detection For Bixby

Authors: Jeevan Kumar, Shreyas Acharya, Angel Paul, Ananya Muralidhar, Harsh Dutta Tewari

Affiliation: Ramaiah Institute of Technology

Company: Samsung R&D Institute India

Role: Research and Development Intern

Duration: Oct 2021 - Apr 2022


Summary 

In dialog systems, distinguishing in-domain (ID) utterances from out-of-domain (OOD) ones is crucial for a seamless user experience. This research tackles the challenge of training models when only ID sentences are available, which demands a solution that classifies OOD sentences without ever seeing them during training. Initially, multi-class models were employed, trained solely on in-domain datasets; any utterance the model could not confidently assign to a class was labeled out-of-domain. However, this approach scaled poorly as the number of classes grew, which motivated a shift to a binary classification model.


To bolster generalization performance, a novel approach was introduced: deep learning classification models that rely only on in-domain datasets to label utterances as in-domain or out-of-domain. Data transformation techniques such as one-hot encoding, GloVe, BERT, and Word2Vec were used to convert utterances into N-dimensional vectors. The research compared the performance of multi-class classification models such as LSTM and CNN against binary classification models including LSTM-Autoencoder, Bidirectional LSTM, one-class SVM, GAN, and others. To enforce a 75% confidence level, a minimum threshold of 0.75 on the predicted softmax probability was applied to each instance; predictions falling below it were treated as out-of-domain.
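The confidence-threshold rejection rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the intent count, probability values, and the `-1` marker for out-of-domain are assumptions chosen for the example.

```python
import numpy as np

# Minimum softmax probability required to accept an in-domain intent label.
CONFIDENCE_THRESHOLD = 0.75

def classify(probs: np.ndarray, threshold: float = CONFIDENCE_THRESHOLD) -> int:
    """Return the predicted intent index, or -1 (out-of-domain)
    when the top softmax probability falls below the threshold."""
    top = int(np.argmax(probs))
    return top if probs[top] >= threshold else -1

# Hypothetical softmax outputs for three utterances over four in-domain intents.
utterances = np.array([
    [0.90, 0.05, 0.03, 0.02],  # confident -> in-domain, intent 0
    [0.40, 0.30, 0.20, 0.10],  # diffuse   -> rejected as out-of-domain
    [0.10, 0.80, 0.05, 0.05],  # confident -> in-domain, intent 1
])

labels = [classify(p) for p in utterances]
print(labels)  # [0, -1, 1]
```

The diffuse second distribution illustrates why this rule degrades as the class count grows: with many intents, even correct in-domain predictions tend toward flatter softmax outputs and risk falling under the threshold.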


Ultimately, the LSTM-Autoencoder model emerged as the optimal binary classification method, consistently achieving the highest accuracy across all tests.
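The LSTM-Autoencoder's binary decision rests on reconstruction error: trained only on in-domain utterances, it reconstructs them well and reconstructs out-of-domain inputs poorly. The sketch below shows only that decision rule; the error values are synthetic and the mean-plus-three-standard-deviations threshold is an assumed convention, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in reconstruction errors from a validation set of in-domain
# utterances (synthetic values, for illustration only). In practice these
# would come from running the trained LSTM-Autoencoder on held-out ID data.
id_errors = rng.normal(loc=0.10, scale=0.02, size=500)

# With no OOD data available at training time, the decision threshold must
# be derived from in-domain errors alone, e.g. mean + 3 * std.
threshold = id_errors.mean() + 3 * id_errors.std()

def is_out_of_domain(reconstruction_error: float) -> bool:
    """Flag an utterance as out-of-domain when its reconstruction error
    exceeds the threshold learned from in-domain data."""
    return reconstruction_error > threshold

print(is_out_of_domain(0.09))  # typical in-domain error -> False
print(is_out_of_domain(0.45))  # poorly reconstructed    -> True
```

Deriving the threshold purely from in-domain statistics is what lets this method sidestep the core problem stated earlier: no OOD examples are needed for training.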


Keywords: Out-domain Utterance, In-domain Utterance, LSTM Autoencoder, BERT, GloVe, Word2Vec, GAN, Bidirectional LSTM

[Figures: System Architecture; Bidirectional LSTM Model]

Reflection:

This research is a crucial step in improving dialog systems, focusing on differentiating between in-domain and out-of-domain utterances. The main challenges were selecting appropriate datasets and developing a model that makes effective use of in-domain data alone for training. This work aligns with broader goals of advancing natural language understanding and, with it, the user experience of dialog systems.


Audience Relevance:

This research is highly relevant to developers and researchers working on dialog systems and natural language processing. The methodology and findings can be applied to improve the accuracy and efficiency of out-domain utterance detection in various dialog systems, enhancing the overall user experience.


Further Research:

- Explore adaptive thresholding and incremental learning to enhance model adaptability.

- Investigate advanced autoencoder architectures and cross-domain transfer learning for improved feature extraction, classification, and rapid adaptation to changing environments.

- Address privacy and ethical considerations while conducting real-world deployment testing and collaborating for expanded datasets.
