Text sentiment analysis methods
Introduction to text sentiment analysis 
Given a piece of text as input, a system automatically reports its sentiment orientation, that is, whether the text is positive or negative. This task is text sentiment analysis, also known as opinion mining. It refers to the process of collecting, processing, analyzing, summarizing, and reasoning about subjective text that carries emotion, and it draws on research fields such as artificial intelligence, machine learning, data mining, and natural language processing.
Text sentiment analysis is an important branch of natural language processing. It is widely used in applications such as public opinion analysis and content recommendation, and it has been a hot research topic in recent years. According to the methods used, approaches fall into three groups: methods based on sentiment lexicons, methods based on traditional machine learning, and methods based on deep learning.
1. Introduction to lexicon-based sentiment analysis methods
Lexicon-based methods determine sentiment polarity at different levels of granularity from the polarity of the sentiment words listed in a sentiment lexicon.
First, the input text is pre-processed (including denoising and removing invalid characters) and then segmented into words. Next, the sentiment words of different types and intensity levels in the lexicon are fed into the model for training, and finally the sentiment type is output according to the sentiment judgment rules.
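The pipeline above (pre-process, segment into words, look up lexicon words, apply judgment rules) can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the lexicon entries, the intensifier and negation lists, and the function names `preprocess`, `score_tokens`, and `classify` are all hypothetical, and the judgment rule (sign of the summed score) is just one simple choice.

```python
import re

# Hypothetical toy lexicon: word -> polarity score (positive > 0, negative < 0).
SENTIMENT_LEXICON = {"good": 1.0, "great": 2.0, "happy": 1.5,
                     "bad": -1.0, "terrible": -2.0, "sad": -1.5}
# Hypothetical degree words that scale the next sentiment word.
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}
# Hypothetical negation words that flip the next sentiment word.
NEGATIONS = {"not", "never", "no"}


def preprocess(text):
    """Lowercase, replace everything except letters and whitespace, split into words."""
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    return cleaned.split()


def score_tokens(tokens):
    """Sum lexicon scores; pending negations and intensifiers modify the next sentiment word."""
    total = 0.0
    weight = 1.0
    for token in tokens:
        if token in NEGATIONS:
            weight *= -1.0
        elif token in INTENSIFIERS:
            weight *= INTENSIFIERS[token]
        elif token in SENTIMENT_LEXICON:
            total += weight * SENTIMENT_LEXICON[token]
            weight = 1.0  # modifiers are consumed by the sentiment word
    return total


def classify(text):
    """Judgment rule assumed here: the sign of the total score decides the label."""
    score = score_tokens(preprocess(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("The food was very good!"))
print(classify("This is not good."))
```

Because a word missing from the lexicon contributes nothing to the score, a sketch like this also makes the coverage problem discussed below easy to see.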
Most existing sentiment lexicons are constructed manually. According to the granularity of division, existing sentiment analysis tasks can be classified into the word, phrase, attribute, sentence, and document (chapter) levels, among others.
Constructing a sentiment lexicon manually is costly: it requires reading a large amount of relevant material and existing lexicons, collecting the words that carry sentiment, and labeling them with different levels of sentiment polarity and intensity.
Advantages and disadvantages:  
The lexicon-based approach can accurately reflect the unstructured features of text and is easy to analyze and understand. Its classification results are more accurate when the lexicon's sentiment words have high coverage and accuracy.
However, this method still has some defects. 
Lexicon-based sentiment classification depends mainly on how the lexicon is constructed. Because the internet is developing rapidly and information is updated quickly, many new words keep appearing online. Lexicons recognize these new words poorly, so existing lexicons must be expanded continuously to keep up.
The same sentiment word may express different meanings at different times, in different languages, or in different domains, so lexicon-based methods are not very effective in cross-domain and cross-language settings.
When a sentiment lexicon is used for classification, the semantic relationships between a word and its context are often not considered.
Lexicon-based methods therefore still need more thorough research.
2. Introduction to traditional machine learning-based sentiment analysis methods
Machine learning is an approach that trains a model on given data and then uses the model to predict results. It has been studied extensively and has produced many effective results.
Machine learning-based sentiment analysis extracts features from a large labeled or unlabeled corpus, applies statistical machine learning algorithms to them, and outputs the sentiment analysis result.
Machine learning-based sentiment classification methods fall into three main categories: supervised, semi-supervised, and unsupervised.
Supervised methods learn to distinguish sentiment categories from a sample set labeled with sentiment polarity. They depend heavily on the data samples, and labeling and processing those samples by hand takes considerable time. Common supervised methods include KNN (k-nearest neighbors), Naive Bayes, and SVM (support vector machine).
Semi-supervised methods improve text sentiment classification by also extracting features from unlabeled text, which effectively addresses the problem of scarce labeled data.
Unsupervised methods classify unlabeled text according to the similarity between texts; they are used less often in sentiment analysis.
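As an illustration of the supervised category, the following is a minimal sketch of a Naive Bayes sentiment classifier written with the Python standard library only. The class name `NaiveBayesSentiment`, the whitespace tokenization, and the four training sentences are assumptions made for the example; a practical system would use a much larger labeled corpus and more careful feature extraction.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum over known tokens of log P(token | label)
        total_docs = sum(self.label_counts.values())
        score = math.log(self.label_counts[label] / total_docs)
        denominator = sum(self.word_counts[label].values()) + len(self.vocabulary)
        for token in tokens:
            if token in self.vocabulary:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][token] + 1) / denominator)
        return score

    def predict(self, text):
        tokens = text.lower().split()
        return max(self.label_counts, key=lambda label: self._log_score(tokens, label))


# Hypothetical hand-labeled training samples.
train_texts = [
    "great movie loved it",
    "wonderful acting and great story",
    "terrible plot hated it",
    "boring and terrible acting",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("great story"))
print(model.predict("boring plot"))
```

The classifier treats each text as a bag of independent words, which is exactly why methods of this kind tend to miss contextual semantics.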
Advantages and disadvantages:
Traditional machine learning-based sentiment classification focuses mainly on extracting sentiment features and on combining classifiers, and the choice of classifier combination affects the results. These methods often cannot make full use of the contextual information in a text; because they ignore contextual semantics when analyzing content, their classification accuracy suffers.
3. Introduction to deep learning-based sentiment analysis methods
Deep learning-based sentiment analysis is performed with neural networks. Typical architectures include the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM).
Deep learning-based sentiment analysis methods can be further divided into four groups: methods using a single neural network, methods using hybrid (combined or fused) neural networks, methods that introduce an attention mechanism, and methods that use pre-trained models.
1. Single neural network sentiment analysis:
In 2003, Bengio et al. proposed a neural network language model that uses a three-layer feedforward neural network to model language. The network consists mainly of an input layer, a hidden layer, and an output layer.
Each neuron in the input layer represents a feature, the number of hidden layers and the number of neurons in each hidden layer are set manually, and the number of neurons in the output layer equals the number of class labels. A basic three-layer neural network is shown below.
[Figure: a basic three-layer neural network]
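The structure just described (one input neuron per feature, a manually sized hidden layer, one output neuron per class label) can be sketched as a plain forward pass. This is only an illustration under stated assumptions: the layer sizes, the `tanh` activation, the random untrained weights, and the helper names `dense`, `init_layer`, and `forward` are hypothetical and are not taken from Bengio et al.

```python
import math
import random


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases; a real network would learn these by training."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(features, hidden_layer, output_layer):
    """Input layer -> hidden layer (tanh) -> output layer (softmax over classes)."""
    hidden = [math.tanh(h) for h in dense(features, *hidden_layer)]
    return softmax(dense(hidden, *output_layer))


rng = random.Random(0)
N_FEATURES, N_HIDDEN, N_CLASSES = 4, 3, 2   # hypothetical sizes
hidden_layer = init_layer(N_FEATURES, N_HIDDEN, rng)
output_layer = init_layer(N_HIDDEN, N_CLASSES, rng)

probabilities = forward([1.0, 0.0, 2.0, 0.5], hidden_layer, output_layer)
print(probabilities)  # one probability per class label
```

Since the weights here are untrained, the printed probabilities carry no meaning; the point is only the shape of the computation.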
                                    
                                
In essence, a language model predicts the next word from contextual information and does not rely on a manually labeled corpus. Its advantage is therefore the ability to learn rich knowledge from large-scale corpora.
This approach effectively addresses the problem that traditional sentiment analysis methods ignore contextual semantics.
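To make the "predict the next word without manual labels" idea concrete, here is a deliberately simple sketch. Note the substitution: it uses a count-based bigram model rather than a neural network, so it is not the model of Bengio et al.; it only shows that raw text supplies its own training signal. The corpus and the function names `train_bigram_model` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for each word, which words follow it in raw, unlabeled text."""
    following = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()  # <s> marks the sentence start
        for previous, current in zip(tokens, tokens[1:]):
            following[previous][current] += 1
    return following


def predict_next(model, previous_word):
    """Return the most frequent follower of previous_word, or None if it was never seen."""
    candidates = model.get(previous_word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical unlabeled corpus: no sentiment labels are needed for training.
corpus = [
    "the movie was great",
    "the movie was long",
    "the movie was great fun",
    "the food was great",
]
model = train_bigram_model(corpus)
print(predict_next(model, "was"))
print(predict_next(model, "the"))
```

A neural language model replaces these counts with learned vector representations, which lets it generalize to word sequences it has never seen.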
2. Sentiment analysis by hybrid (combined, fused) neural networks:
Beyond single neural networks, a number of scholars have weighed the advantages of the different approaches, combined and improved them, and applied the resulting models to sentiment analysis.
Compared with methods based on sentiment lexicons and traditional machine learning, neural network approaches have a significant advantage in text feature learning: they learn features automatically and retain information about the words in the text, so they can better extract the semantic information of those words and classify the sentiment of the text effectively.
Since the concept of deep learning was proposed, many researchers have explored it and obtained many results, so the range of deep learning-based text sentiment classification methods keeps expanding.
3. Sentiment analysis with the introduction of an attention mechanism:
In 2006, building on neural networks, Hinton et al. pioneered the concept of deep learning, which improves learning performance by using deep network models to learn the key information in the data and thereby capture the data's characteristics.
Deep learning-based methods represent documents and words as continuous, low-dimensional vectors and can therefore effectively address the problem of sparse data. They are also end-to-end methods that extract text features automatically, which reduces the complexity of feature construction.
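The contrast between sparse and dense representations can be illustrated with a toy comparison. Everything in the sketch below is hypothetical: the three-dimensional embedding values are made up by hand (real embeddings are learned from data), and averaging word vectors is only one simple way to obtain a document vector.

```python
import math

# Hypothetical 3-dimensional word embeddings; real ones are learned from data.
EMBEDDINGS = {
    "good":  [0.8, 0.1, 0.3],
    "great": [0.9, 0.2, 0.3],
    "bad":   [-0.7, 0.1, 0.2],
    "movie": [0.0, 0.9, 0.1],
}
VOCABULARY = sorted(EMBEDDINGS)


def bag_of_words(tokens):
    """Sparse vector: one dimension per vocabulary word, mostly zeros for large vocabularies."""
    return [tokens.count(word) for word in VOCABULARY]


def average_embedding(tokens):
    """Dense document vector: mean of the embeddings of the known tokens."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * len(EMBEDDINGS["good"])
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


doc_a = "good movie".split()
doc_b = "great movie".split()
sparse_similarity = cosine(bag_of_words(doc_a), bag_of_words(doc_b))
dense_similarity = cosine(average_embedding(doc_a), average_embedding(doc_b))
print(sparse_similarity, dense_similarity)
```

In the sparse vectors, "good" and "great" occupy unrelated dimensions, so the two documents overlap only through "movie"; in the dense space the two words lie close together, so the documents come out as more similar. The size of the sparse vector also grows with the vocabulary, whereas the embedding dimension stays fixed.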
Beyond their remarkable results in speech and image processing, deep learning methods have made significant progress in natural language processing tasks such as machine translation, text classification, and entity recognition. Research on text sentiment analysis methods is a small branch of text classification.
Adding an attention mechanism to deep learning methods for sentiment analysis helps the model capture contextually relevant information, extract semantic information, and avoid losing important information, which can effectively improve the accuracy of text sentiment classification.
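One common way to realize this idea is attention pooling: each word vector receives a weight, and the sentence representation is the weighted sum. The sketch below uses dot-product scores against a single query vector, which is an assumption chosen for brevity; the two-dimensional word vectors, the query values, and the function name `attention_pool` are hypothetical, and in a real model both the vectors and the query would be learned.

```python
import math


def softmax(scores):
    """Normalize scores into attention weights that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(word_vectors, query):
    """Weight each word vector by softmax(dot(word, query)) and sum the results."""
    scores = [sum(w * q for w, q in zip(vector, query)) for vector in word_vectors]
    weights = softmax(scores)
    pooled = [sum(weight * vector[i] for weight, vector in zip(weights, word_vectors))
              for i in range(len(query))]
    return weights, pooled


# Hypothetical 2-dimensional vectors for the tokens of "the film was wonderful".
tokens = ["the", "film", "was", "wonderful"]
word_vectors = [[0.1, 0.0], [0.2, 0.3], [0.0, 0.1], [2.0, 1.5]]
query = [1.0, 1.0]   # hypothetical learned query vector

weights, sentence_vector = attention_pool(word_vectors, query)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.3f}")
```

With these made-up values the sentiment-bearing word "wonderful" receives the largest weight, so it dominates the sentence vector instead of being averaged away by the function words.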
Current research focuses more on fine-tuning and improving pre-trained models in order to obtain better experimental results.
4. Sentiment analysis using pre-trained models:
A pre-trained model is a model that has already been trained on a dataset. Fine-tuning a pre-trained model can achieve better sentiment classification results, so most recent methods use pre-trained models such as ELMo, BERT, XLNet, and ALBERT.
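The sketch below illustrates a lighter-weight relative of fine-tuning: the pre-trained encoder is treated as frozen, and only a small logistic-regression classification head is trained on its output vectors. Full fine-tuning would also update the encoder's own weights, which is not shown here. The feature vectors are hand-written stand-ins for encoder output, and the names `train_head` and `predict_proba` are hypothetical; no real pre-trained model or library API is used.

```python
import math

# Hypothetical frozen sentence vectors, standing in for the output of a
# pre-trained encoder; a real system would compute them with the encoder.
FEATURES = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.7]]
LABELS = [1, 1, 0, 0]   # 1 = positive, 0 = negative


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, vector):
    """Probability of the positive class for one sentence vector."""
    return sigmoid(sum(w * v for w, v in zip(weights, vector)) + bias)


def train_head(features, labels, epochs=200, learning_rate=0.5):
    """Fit a logistic-regression head by stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for vector, label in zip(features, labels):
            error = predict_proba(weights, bias, vector) - label
            weights = [w - learning_rate * error * v for w, v in zip(weights, vector)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train_head(FEATURES, LABELS)
predictions = [int(predict_proba(weights, bias, v) >= 0.5) for v in FEATURES]
print(predictions)
```

Only the handful of head parameters is learned here, which is why starting from a pre-trained model can work with far less labeled data than training a network from scratch.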
Compared with traditional methods, pre-training with language models makes full use of large-scale monolingual corpora and can model the multiple meanings of a word; the pre-training process can be regarded as learning sentence-level contextual word representations.
Pre-training a unified model on a large-scale corpus, or adding the resulting features to some simple models, has achieved good results in many NLP tasks, which indicates that this approach significantly alleviates the reliance on model structure.
There will be more research on natural language processing tasks in the future, especially on sentiment mining of text. Most of the latest approaches to sentiment analysis are based on fine-tuning pre-trained models and have achieved good results.
It can therefore be predicted that future sentiment analysis research will focus more on deep learning-based methods and on achieving better results by fine-tuning pre-trained models.
Conclusion 
From the overview above, we can predict that, as the scale of text data keeps expanding, deep learning-based sentiment analysis will be a research trend in natural language processing. Judging from how the different methods are developing, future research on text sentiment analysis needs to focus on the following aspects:
1. Comparing the different research methods shows that most existing sentiment analysis work targets a single domain, such as Twitter posts or hotel reviews. For personalized recommendation, how to combine content from multiple domains, classify its sentiment, achieve better recommendations, and improve the generalization performance of the model are all questions worth future research.
2. Most sentiment analysis research addresses explicit sentiment classification, using datasets that contain obvious sentiment words, while the detection and classification of implicit sentiment words remains ineffective. Research on implicit sentiment analysis is still at an early stage and is not yet adequate. In the future, better classification may be achieved by building an implicit sentiment lexicon or by using better deep learning methods to extract semantic information at a deeper level.
3. Research on sentiment analysis of complex utterances needs further improvement. As online phrases with sentiment tendencies appear more and more frequently, and especially when a text contains ironic or metaphorical words, detecting sentiment polarity becomes difficult; this also needs further research.
4. Multimodal sentiment analysis is also a recent research hotspot. The main research direction is how to extract and fuse the sentiment information in multiple modalities. When the sentiment expressed in different modalities is inconsistent, how to weight the information from each modality must also be considered. Whether external semantic information can be incorporated, and whether it improves the accuracy of sentiment analysis, also requires much more research.
5. Among sentiment analysis sub-tasks, most research is still based on simple binary sentiment classification; multi-class and more fine-grained sentiment analysis is also a hot topic for future research.
6. Pre-trained models are a hot research topic at this stage. They can effectively address problems of traditional methods, such as the inability to parallelize computation, and they can capture the relationships between words and achieve better results on downstream tasks through fine-tuning. However, they also suffer from large numbers of parameters and long training times. How to achieve good classification results with fewer parameters and a shorter training time is also a direction worth studying.