Opinion Mining & Sentiment Analysis

Data mining and analytics is a field that seeks answers in an ocean of data. One of its interesting subfields is text mining. As I started exploring it, I looked for the inspiration behind Opinion Mining and Sentiment Analysis. This blog post is a quick introduction to the topic, organized around W-H questions (what, why, how).

What is Opinion Mining?

Opinion Mining is the mining of human-generated data. Humans act like sensors, expressing opinions when they use a product, watch a movie, or use a service. The output of these human sensors is unstructured data, which may be video, audio, or text. Opinion mining is subjective at two levels. First, the opinion provider interprets an experience of a service or object and gives feedback. Second, the text miner or data analyst interprets that feedback. Both parties work from their own interpretations and share their understanding and inferences subjectively. This means that opinion mining is inherently subjective, and an opinion cannot be called factually right or wrong.

What is it that we want to understand?

The basic questions that often come up are:

Who is talking about the product?
What is that product?

As we seek answers to these questions, curiosity leads to a few more questions:

What is the opinion?
What is the background under which this opinion was expressed?
Is it good for the product? Is it positive or negative?

How easy is this task of opinion mining?

Sometimes the information is readily available, such as who is talking about the product. At other times we have only a text passage, and the opinion holder and the target are hidden in it (for example, the passage may refer indirectly to a government official while the opinion holder belongs to an opposition party). Both must then be deduced from the passage. The problem gets more complex when the opinion provider is a group rather than an individual, when the target is someone else’s opinion or a set of products rather than a single entity, or when the opinion text or its context is highly complex.

Why Opinion Mining?

Some reasons that intuitively come to mind are:

To make better and improved decisions
To understand people
To improve and make targeted advertising
For Business Intelligence
For Market Research
For any other research…

What is Sentiment Analysis?

Basically, it’s a classification problem! Very often, we already know the opinion holder, the opinion target or product, and the context of the opinion. The only thing left is to analyze the sentiment. So the input is text data and the output is a sentiment. There are two types of analysis here:

a. Polarity Analysis: positive, negative, or neutral, or rank-ordered categories like 1, 2, 3, 4
b. Emotion Analysis: sad, happy, angry, scared, disgusted

Either way, it is a sentiment classification problem.

How do we do this analysis?

Feature identification comes to the rescue here. It is the most complex step, and identifying the right features can make a huge difference. Natural Language Processing is often described as a powerful tool for identifying the right features, although overly specific features can lead to overfitting.

Some commonly used features are:

Character n-grams: sequences of n consecutive characters, where n is any length the analyst finds relevant
Word n-grams: sequences of n consecutive words
Part-of-speech (POS) tag n-grams: sequences of n consecutive POS tags such as adjective, noun, or verb
Word classes: syntactic classes such as POS tags, semantic classes such as thesaurus entries, or other word clusters
Word patterns: frequently used word patterns
Sentence patterns: specific sets of sentences that recur
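
To make the first two feature types concrete, here is a minimal sketch in plain Python. The helper names char_ngrams and word_ngrams are made up for illustration, and splitting on whitespace is a simplifying assumption; a real tokenizer would also handle punctuation.

```python
def char_ngrams(text, n):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return every run of n consecutive whitespace-separated words."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


char_trigrams = char_ngrams("love", 3)              # character 3-grams of one word
word_bigrams = word_ngrams("I love my iPhone", 2)   # word 2-grams of a sentence

print(char_trigrams)
print(word_bigrams)
```

POS tag n-grams follow the same sliding-window idea, applied to a sequence of tags instead of characters or words.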

Choosing the right features can be a tough call! The selection of text features depends on the purpose of your mining task. For example, if a data analyst aims to classify text as positive or negative, a unigram (1-gram) word feature on its own can be a bad choice. Take two sentences: ‘I love my iPhone.’ and ‘I don’t love my iPhone as much as I love my MacBook.’ If the unigram ‘love’ were the feature under consideration, we would classify both sentences as positive toward the iPhone, even though the second sentence is negative about it.
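
The iPhone example can be checked with a few lines of Python. This is only a toy sketch: the tokenize and bigrams helpers are hypothetical, and looking for the single bigram ("don't", "love") is an assumption chosen to fit these two sentences, not a general way to handle negation.

```python
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def bigrams(text):
    """Return adjacent word pairs as tuples."""
    tokens = tokenize(text)
    return list(zip(tokens, tokens[1:]))


s1 = "I love my iPhone."
s2 = "I don't love my iPhone as much as I love my MacBook."

# The unigram 'love' fires on both sentences, so it cannot tell them apart.
unigram_s1 = "love" in tokenize(s1)
unigram_s2 = "love" in tokenize(s2)

# The bigram ("don't", "love") appears only in the second sentence.
negated_s1 = ("don't", "love") in bigrams(s1)
negated_s2 = ("don't", "love") in bigrams(s2)

print(unigram_s1, unigram_s2, negated_s1, negated_s2)
```

Even here the bigram only captures the negation. Working out that the second sentence is still positive about the MacBook would need features that tie each sentiment word to its target.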

How do we define the machine learning process for Sentiment Analysis?

The steps are like those of most other machine learning problems:

Select a set of features that you, as an analyst and domain expert, believe are appropriate
Train a model on your data using those features
Validate the model on new data and modify the features based on the errors
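
The three steps above can be sketched end to end in plain Python. Everything here is an assumption made for illustration: the tiny labeled dataset is invented, the features are word unigrams plus bigrams, and the classifier is a small Naive Bayes with add-one smoothing, which is one common choice and not necessarily the one you would use in practice.

```python
import math
from collections import Counter


def features(text):
    """Step 1: the chosen features, here word unigrams plus word bigrams."""
    words = text.lower().split()
    return words + [a + " " + b for a, b in zip(words, words[1:])]


def train(examples):
    """Step 2: count how often each feature occurs under each label."""
    counts, docs = {}, Counter()
    for text, label in examples:
        docs[label] += 1
        counts.setdefault(label, Counter()).update(features(text))
    return counts, docs


def predict(model, text):
    """Pick the label with the highest add-one smoothed log-probability."""
    counts, docs = model
    vocab = set().union(*counts.values())
    total_docs = sum(docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(counts):
        total = sum(counts[label].values())
        score = math.log(docs[label] / total_docs)
        for f in features(text):
            if f in vocab:  # ignore features never seen in training
                score += math.log((counts[label][f] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def validate(model, examples):
    """Step 3: return the held-out examples the model gets wrong."""
    return [(t, y) for t, y in examples if predict(model, t) != y]


# Invented toy data, for illustration only.
train_data = [
    ("i love this phone", "positive"),
    ("great camera and great battery", "positive"),
    ("i do not love this phone", "negative"),
    ("terrible battery and poor camera", "negative"),
]
held_out = [
    ("i love the camera", "positive"),
    ("poor battery", "negative"),
]

model = train(train_data)
errors = validate(model, held_out)
print(errors)
```

Any examples returned by validate are the ones to inspect when deciding which features to add, drop, or change before retraining.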
