Emotion Detection From Text

About

Detecting emotions in the text of social media websites such as Facebook and Twitter has applications in business development, user interface design, content creation, and emergency response, among others.

Emotion analysis aims to detect and recognize the types of feelings expressed in text, such as anger, disgust, fear, happiness, sadness, and surprise. Emotion detection also has further useful applications, such as:

  • Gauging how happy our citizens are. Different indexes have different definitions; most revolve around economic, environmental, health, and social factors. Since the mid-2000s, governments and organizations around the world have been paying increasing attention to the happiness index.
  • Pervasive computing, to serve the individual better. This may include suggesting help when anxiety is detected through speech, or checking the tone of an email before it is sent.
  • Understanding the consumer. Improving how customers are perceived and understood, with the ultimate goal of increasing brand reputation and sales.

Introduction

Long before awareness of the World Wide Web became widespread, people would seek their friends’ opinions or consult consumer reports about products or services that they wanted to buy. They would also conduct informal verbal surveys to find out which candidate most people were planning to vote for in a local election. The Internet and the Web have since made it easier to collect such data. Text is widely used in communication between people on the web. It conveys informative content, opinions, and emotional states. Microblogs allow a vast pool of people who are neither personal acquaintances nor well-known professional critics to share their experiences, opinions, and emotions.

How

There are six emotion categories that are widely used to describe humans’ basic emotions, based on facial expressions: anger, disgust, fear, happiness, sadness, and surprise. These are mainly associated with negative sentiment, with “Surprise” being the most ambiguous, as it can be associated with either positive or negative feelings. Interestingly, the number of basic human emotions has recently been “reduced”, or rather re-categorized, to just four: happiness, sadness, fear/surprise, and anger/disgust. Many find it surprising that we have only four basic emotions. For the sake of simplicity in this code story, we will use the more widely used six emotions.

Challenge

One of the biggest challenges in determining emotion is the context-dependence of emotions within the text. A phrase can carry an element of anger without using the word “anger” or any of its synonyms, for example the phrase “Shut up!”. Another challenge is that emotion detection inherits the difficulties of other NLP components, such as word-sense disambiguation and co-reference resolution. It is difficult to anticipate the success rate of a machine learning approach without first trying it.

Data Collection

We collected our data from Twitter. The corpus includes 1776 tweets from more than 200 users. Tweets ranged from one-word tweets to 140-character tweets.

We filtered out non-Arabic tweets, retweets, and tweets including photos or videos. After this filtering, the corpus was ready for the annotation process.
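The filtering step above could be sketched roughly as follows. This is only an illustration: the tweet dictionary and its keys (`text`, `is_retweet`, `has_media`) are hypothetical, and the rule used to decide whether a tweet counts as Arabic (at least half of its letters fall in the basic Arabic Unicode range) is an assumption, not the criterion used in the project.

```python
import re

# Arabic letters in the basic Unicode Arabic block (U+0621 to U+064A).
ARABIC_LETTER = re.compile(r"[\u0621-\u064A]")


def is_mostly_arabic(text, threshold=0.5):
    """Return True if at least `threshold` of the letters in `text` are Arabic.

    The 0.5 threshold is an assumption made for this sketch.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    arabic = sum(1 for ch in letters if ARABIC_LETTER.match(ch))
    return arabic / len(letters) >= threshold


def keep_tweet(tweet):
    """Apply the three filters: no retweets, no photos or videos, Arabic only.

    `tweet` is a hypothetical dict with keys "text", "is_retweet", "has_media".
    """
    if tweet["is_retweet"] or tweet["has_media"]:
        return False
    return is_mostly_arabic(tweet["text"])
```

A list of collected tweets would then be reduced with something like `[t for t in tweets if keep_tweet(t)]`.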

Data Annotation

We created surveys in which the annotator’s task was to guess the emotion of the writer based on the provided tweet text.

The tweets collected from Twitter form an emotion-annotated Arabic tweet corpus. On average, 15 people labeled each tweet with the corresponding emotion. The emotions offered were the six basic emotions. We excluded annotated tweets with less than 50% annotator agreement. The final Twitter corpus consisted of 1605 tweets.
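The agreement filter can be illustrated with a short sketch. It assumes that "agreement" means the share of annotators who chose the most frequent label for a tweet; the text does not spell out the exact definition, so treat this as one plausible reading. The function name `majority_label` is hypothetical.

```python
from collections import Counter


def majority_label(labels, min_agreement=0.5):
    """Return the most frequent label if its share of the votes is at least
    `min_agreement`; otherwise return None, meaning the tweet is dropped.

    Agreement is assumed here to be the vote share of the top label.
    """
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    if votes / len(labels) < min_agreement:
        return None
    return label
```

For example, a tweet labeled "anger" by 8 of 15 annotators would be kept with the label "anger", while a tweet whose votes are split evenly across three emotions would be excluded.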

Data Preprocessing

The data was preprocessed using five different techniques: basic preprocessing, basic preprocessing plus the removal of a list of stopwords, the Lucene light Arabic stemmer, the Shereen Khoja Arabic stemmer, and a modified Khoja Arabic stemmer. The basic preprocessing includes the removal of non-Arabic letters, multiple spaces, and punctuation. The stopword list is based on a standard list of stopwords for standard Arabic. We also added their equivalents in the Egyptian slang dialect, along with some additional slang words that appeared in the collected dataset and carry no emotional significance. The Lucene light Arabic stemmer removes only the definite articles and a few prefixes and suffixes. The Khoja stemmer does all of the above and also reduces each word to its root; however, it handles the standard Arabic language only. We modified the Khoja stemmer to include the Egyptian slang dialect.
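A minimal sketch of the first two techniques (basic preprocessing and stopword removal) is shown below. The stopword set is a tiny illustrative sample, not the list used in the project, and stripping diacritics before the other steps is an assumption added so that vocalized words are not split apart. The stemmers are not reproduced here.

```python
import re

# Arabic diacritics (tashkeel); stripped first so words are not split (assumption).
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Anything that is not an Arabic letter (U+0621 to U+064A) or whitespace.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")
MULTI_SPACE = re.compile(r"\s+")

# Tiny illustrative stopword sample; NOT the list used in the project.
STOPWORDS = {"في", "من", "على", "ده"}


def basic_preprocess(text):
    """Remove non-Arabic letters and punctuation, then collapse whitespace."""
    text = DIACRITICS.sub("", text)
    text = NON_ARABIC.sub(" ", text)
    return MULTI_SPACE.sub(" ", text).strip()


def remove_stopwords(text, stopwords=STOPWORDS):
    """Drop every whitespace-separated token found in `stopwords`."""
    return " ".join(word for word in text.split() if word not in stopwords)
```

The second technique is then simply `remove_stopwords(basic_preprocess(tweet_text))`.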

Data Classification

Finally, different data classification techniques were tried out. Weka was used for the SMO classifier, which trains a support vector machine (SVM) using sequential minimal optimization, and for the NaïveBayes classifier. We also created a simple search-and-frequency algorithm based on the extracted sample word-emotion lexicon. This algorithm counts the number of words related to each emotion in the tweet, then assigns the tweet to the emotion category with the highest count. Ten-fold cross-validation was applied for the learning-based algorithms. We compared the performance of the different classification algorithms, and we also compared the effects of the five preprocessing techniques. Two different environments were tried: the random tweets environment and the limited-features tweets environment.
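The search-and-frequency algorithm described above can be sketched in a few lines. The lexicon below is a hypothetical miniature stand-in; the project's lexicon was extracted from its own corpus and is not reproduced here. How ties and tweets with no matching words were handled is not stated, so the choices in this sketch (first emotion encountered wins a tie, no match returns `None`) are assumptions.

```python
from collections import Counter

# Hypothetical miniature word-emotion lexicon, for illustration only.
LEXICON = {
    "سعيد": "happiness",
    "فرحان": "happiness",
    "حزين": "sadness",
    "خايف": "fear",
    "غاضب": "anger",
}


def classify_by_lexicon(text, lexicon=LEXICON):
    """Count lexicon hits per emotion and return the emotion with the most hits.

    Returns None when no word matches (assumption). On a tie, the emotion
    encountered first in the text wins (assumption).
    """
    counts = Counter(lexicon[word] for word in text.split() if word in lexicon)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

The input is expected to be already preprocessed text, so that tokens match the lexicon entries exactly.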

Results

The results show that emotions can be automatically detected from Arabic tweets after performing Arabic-specific preprocessing steps. The experiment demonstrates that the preprocessing steps added to the Khoja stemmer improved the classification results by 8% compared to the original Khoja stemmer. In addition, our sample word-emotion lexicon improved the emotion detection results by 27% compared to SMO classification using the train/test option. Finally, the communication style was found to be closely related to the emotion expressed in the case of the anger, disgust, fear, and happiness categories. The relationship can be thought of as a reciprocal one.
