Self-Training for SA on CS Data using Parallel Processing

By Muhammad Umar Salman
Published in Sentiment Analysis
September 18, 2022

Code-switching is the phenomenon where speech or text contains more than one language within a single sentence or conversation. Learning representations for such code-switched text has become a crucial area of research for supporting a greater variety of language speakers in natural language processing (NLP) applications. Although a lot of research has been done in multilingual NLP, few studies focus on code-switching and its applications. The application we are interested in is sentiment analysis, which we chose because it is widely used in both industry and research. Its use cases include online shopping reviews, Twitter feeds, and product feedback. Businesses also need to quickly grasp which sentiment is most prevalent among the general population. However, because code-switched data is scarce, we look into more advanced techniques rather than the old-fashioned and convenient supervised-learning approach. Here we look into techniques such as self-supervised and unsupervised learning to train a model to detect sentiment.
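To make the self-training idea from the title concrete, here is a minimal sketch of a pseudo-labelling loop. It is an illustration under stated assumptions, not the pipeline used in this work: the tiny Naive Bayes classifier, the confidence threshold of 0.7, the function names, and the toy Roman Urdu-English sentences are all invented for the example. A real system would likely swap in a multilingual transformer as the classifier.

```python
import math
from collections import Counter, defaultdict

def train(labelled):
    """Fit a small multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labelled:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return (label, posterior probability) for a single text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)  # Laplace smoothing
        score = math.log(n / total)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(exp, key=exp.get)
    return best, exp[best] / sum(exp.values())

def self_train(labelled, unlabelled, threshold=0.7, max_rounds=5):
    """Repeatedly pseudo-label confident examples and retrain on them."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        model = train(labelled)
        confident, remaining = [], []
        for text in pool:
            label, prob = predict(model, text)
            if prob >= threshold:
                confident.append((text, label))  # trust the model's own label
            else:
                remaining.append(text)
        if not confident:  # nothing new to learn from, so stop early
            break
        labelled.extend(confident)
        pool = remaining
    return train(labelled), labelled

# Toy code-switched data, invented purely for illustration.
seed = [
    ("yeh movie bohat achi great", "pos"),
    ("yeh movie bohat buri terrible", "neg"),
]
unlabelled = [
    "achi film great acting",
    "buri film terrible acting",
    "film acting boring",
]
model, final_labelled = self_train(seed, unlabelled)
```

The threshold controls the trade-off at the heart of self-training: set it too low and the model reinforces its own mistakes, set it too high and very little unlabelled data is ever used. In this toy run the third sentence shares no discriminative words with the seed set, so it stays unlabelled.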

Resources

With the rise of new and innovative models that rely heavily on GPU resources, a large gap has opened between research and the methods used in industry. Models like BERT-Large require a powerful GPU just to fine-tune, and multiple larger GPUs to train from scratch. However, most businesses cannot afford these GPU requirements and are forced to settle for older, less compute-intensive models that yield lower accuracy. To address this problem, we introduce a parallel processing phase in which we train our model in a distributed manner on multiple Linux instances.
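One common way to train in a distributed manner is synchronous data parallelism: each worker computes a gradient on its own shard of the data, the gradients are averaged, and every worker applies the same update. The sketch below illustrates that idea only. It assumes a toy linear model, and it simulates the workers as threads in a single process rather than separate Linux instances; the function names and hyperparameters are hypothetical and are not taken from this work.

```python
from concurrent.futures import ThreadPoolExecutor

def shard_gradient(weights, shard):
    """Mean-squared-error gradient of the model y = w * x + b on one shard."""
    w, b = weights
    grad_w = grad_b = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        grad_w += 2 * err * x
        grad_b += 2 * err
    n = len(shard)
    return grad_w / n, grad_b / n

def data_parallel_sgd(data, num_workers=4, lr=0.1, steps=500):
    """Synchronous data-parallel gradient descent with gradient averaging."""
    # Each worker gets an equal-sized, disjoint shard of the data.
    shards = [data[i::num_workers] for i in range(num_workers)]
    weights = (0.0, 0.0)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for _ in range(steps):
            # Every worker computes a local gradient using the same weights.
            grads = list(pool.map(lambda s: shard_gradient(weights, s), shards))
            # Averaging stands in for the all-reduce step of a real cluster.
            grad_w = sum(g[0] for g in grads) / num_workers
            grad_b = sum(g[1] for g in grads) / num_workers
            weights = (weights[0] - lr * grad_w, weights[1] - lr * grad_b)
    return weights

# Toy data drawn from y = 2x + 1, invented for illustration.
data = [(i / 4, 2 * (i / 4) + 1) for i in range(8)]
w, b = data_parallel_sgd(data)
```

Because the shards are the same size, the averaged gradient equals the full-batch gradient, so the result matches single-machine training while the per-step work is split across workers. On real hardware the averaging step becomes network communication, which is usually the main cost to manage.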
