
KLASS_ Topic Detection and Comment Analysis on Social Media Platforms

Team members

Chen Pengdan (ESD), Li Jiaqi (ESD), Feng Han (ESD), Dong Jiajie (ISTD), Jin Ziqi (ISTD), Lu Qianxi (ISTD), Xiong Maihe (ISTD)

Instructors:

Matthieu De Mari, Ying Xu

Writing Instructors:

Grace Kong

Teaching Assistant:

Esra Oymak


Imagine you are an advertiser or a content creator.

You want to know which topics people are talking about, and how they feel about them, so that you can design your products, ads or content accordingly.

What would you do?

Our models can help!

In our project, we use Reddit Singapore as the platform to test our solutions.

 

 

Problem Statement

Here we have our problem statement.

We build models that detect hot trends on Reddit Singapore and reveal people's attitudes towards the trending topics through sentiment analysis of the comments.

 

 

 

Overall Structure


The flow on the left shows the overall structure of our solution.

First, we feed the posts from Reddit into the topic model. It outputs the hot topics detected in the posts, each with a topic description.

Then the comments are grouped by the topic they belong to and fed into the comment analysis model. The model extracts the objects being discussed in the comments and conducts sentiment analysis on them.
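The hand-off between the two stages can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: `post_topics`, `comments`, the `post_id` and `body` keys, and the sample texts are all hypothetical, and the topic assignment is assumed to have been produced by the topic model already.

```python
from collections import defaultdict

# Hypothetical output of the topic model: post id -> topic index.
post_topics = {"p1": 0, "p2": 1, "p3": 0}

# Hypothetical comments, each linked to the post it was written under.
comments = [
    {"post_id": "p1", "body": "The new MRT line is great."},
    {"post_id": "p2", "body": "Rental prices are too high."},
    {"post_id": "p3", "body": "Trains were delayed again."},
]


def group_comments_by_topic(comments, post_topics):
    """Collect comment texts under the topic of their parent post."""
    grouped = defaultdict(list)
    for comment in comments:
        topic = post_topics.get(comment["post_id"])
        if topic is not None:  # skip comments whose post has no topic
            grouped[topic].append(comment["body"])
    return dict(grouped)


comments_by_topic = group_comments_by_topic(comments, post_topics)
```

Each list in `comments_by_topic` would then be passed to the comment analysis model as one batch.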

Data Structure

Here is the data structure for both posts and comments.

We collected our data, both posts and the comments under each post, through the Reddit API.
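One possible way to hold such data in Python is sketched below. The field names follow the Reddit API's naming for posts and comments (`title`, `selftext`, `created_utc`, `num_comments`, `link_id`, `body`), but which fields the project actually kept is an assumption, and the `Post`, `Comment` and `attach_comments` names and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    id: str
    link_id: str        # full name of the parent post, e.g. "t3_abc123"
    body: str
    created_utc: float  # Unix timestamp


@dataclass
class Post:
    id: str
    title: str
    selftext: str
    created_utc: float
    num_comments: int
    comments: List[Comment] = field(default_factory=list)


def attach_comments(posts, comments):
    """Attach each comment to its parent post using the "t3_<post id>" link."""
    by_id = {post.id: post for post in posts}
    for comment in comments:
        kind, _, post_id = comment.link_id.partition("_")
        parent = by_id.get(post_id)
        if kind == "t3" and parent is not None:
            parent.comments.append(comment)
    return posts


# Made-up sample records for illustration only.
posts = [Post("abc123", "MRT delay", "Stuck at Jurong East.", 1659500000.0, 1)]
comments = [
    Comment("c1", "t3_abc123", "Same here.", 1659500600.0),
    Comment("c2", "t3_zzz999", "Unrelated.", 1659500700.0),
]
attach_comments(posts, comments)
```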

 


Topic Modelling

Detect hot topics on Reddit Singapore. The model takes the posts as input and groups posts with similar content into clusters. We then obtain the trending topics, each with its keywords and popularity.
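The page does not name the clustering algorithm, so the sketch below substitutes a deliberately simple one to show the idea of grouping similar posts: greedy grouping by Jaccard similarity of word sets. The function names, the `threshold` and `min_size` parameters, and the sample posts are all hypothetical, and a real system would likely use a stronger text representation.

```python
import re


def tokenize(text):
    """Lowercase a post and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_posts(posts, threshold=0.3, min_size=2):
    """Assign each post to the first cluster whose seed post is similar enough.

    Returns one label per post. Clusters with fewer than min_size posts
    are relabelled -1 and treated as outliers.
    """
    seeds = []   # word set of the first post in each cluster
    labels = []
    for post in posts:
        words = tokenize(post)
        for index, seed in enumerate(seeds):
            if jaccard(words, seed) >= threshold:
                labels.append(index)
                break
        else:
            seeds.append(words)
            labels.append(len(seeds) - 1)
    sizes = {label: labels.count(label) for label in set(labels)}
    kept = [label for label in sorted(sizes) if sizes[label] >= min_size]
    remap = {label: new for new, label in enumerate(kept)}
    return [remap.get(label, -1) for label in labels]


posts = [
    "MRT train delay on the east west line",
    "Another train delay on the MRT line today",
    "HDB resale flat prices keep rising",
    "Resale prices of HDB flat rising again",
    "Best chicken rice in Tampines?",
]
labels = cluster_posts(posts)
```

With these sample posts, the two transport posts and the two housing posts form clusters 0 and 1, and the food post is left as an outlier.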

 

Trending Topics


The table on the left is a brief summary of the trending topics.

The "Topic" column is the index of each hot topic; topic -1 is the cluster of outliers. Topics are sorted by the number of posts on each topic. The "Name" column contains the top four keywords representing each topic.
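A summary table of this shape could be assembled as in the sketch below. The inputs `labels` and `keywords` are made up, the function name is hypothetical, and joining the keywords with underscores is an assumption about how the "Name" column is formatted.

```python
from collections import Counter

# Hypothetical inputs: one topic label per post, and ranked keywords per topic.
labels = [0, 0, 1, -1, 0, 1, -1, -1, -1]
keywords = {
    -1: ["singapore", "people", "time", "like", "new"],
    0: ["mrt", "train", "delay", "line", "bus"],
    1: ["hdb", "flat", "resale", "price", "bto"],
}


def topic_summary(labels, keywords, top_n=4):
    """Return (topic, post_count, name) rows sorted by descending post count."""
    counts = Counter(labels)
    rows = []
    for topic, count in counts.most_common():
        name = "_".join(keywords[topic][:top_n])  # assumed naming format
        rows.append((topic, count, name))
    return rows


summary = topic_summary(labels, keywords)
```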

 

 

Visualizations

 

The picture on the right shows one of the hot topics with its representative keywords and their word scores. The higher the word score, the better the keyword represents the topic.

There are three key elements.

Topic: The algorithm decides the number of topics automatically and numbers them starting from 0. Here is one example.

Keywords: The words that appear most frequently in this topic. Some topics can be inferred directly from them.

Word Score: Word frequency in the clustered posts.
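As a rough illustration of a frequency-based word score, the sketch below counts words across the posts of one topic and reports each word's share of all counted tokens. Normalizing by the total and filtering with a small stopword list are assumptions; the page only states that the score is a word frequency. `STOPWORDS`, `word_scores` and the sample posts are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list.
STOPWORDS = {"the", "a", "on", "in", "of", "is", "and", "to"}


def word_scores(posts, top_n=4):
    """Score each word by its share of all non-stopword tokens in the posts."""
    tokens = [
        word
        for post in posts
        for word in re.findall(r"[a-z]+", post.lower())
        if word not in STOPWORDS
    ]
    counts = Counter(tokens)
    total = sum(counts.values())
    return [(word, count / total) for word, count in counts.most_common(top_n)]


topic_posts = [
    "MRT train delay on the east west line",
    "Train delay again",
    "MRT train fault",
]
scores = word_scores(topic_posts)
```

Here "train" gets the highest score because it occurs in all three posts.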

 

 

Topic example

 

 


The image on the left shows both the number of comments and the number of posts on each topic.

The popularity of each topic is measured by the total number of comments under the posts on that topic.
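This popularity measure amounts to a per-topic sum, as in the sketch below. The `posts` data and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical posts: (topic index, number of comments under the post).
posts = [(0, 120), (1, 45), (0, 30), (2, 200), (1, 5)]


def topic_popularity(posts):
    """Return {topic: (post_count, comment_count)}."""
    stats = defaultdict(lambda: [0, 0])
    for topic, num_comments in posts:
        stats[topic][0] += 1
        stats[topic][1] += num_comments
    return {topic: tuple(values) for topic, values in stats.items()}


popularity = topic_popularity(posts)
# Rank topics by total comments, most popular first.
ranking = sorted(popularity, key=lambda t: popularity[t][1], reverse=True)
```

Note that topic 2 ranks first despite having a single post, because popularity is counted in comments rather than posts.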


Comment Analysis

Analyze people's attitudes towards the objects being discussed under each topic; these objects are called aspects. The model takes the comments as input and outputs the aspects with their sentiments. Visualization is then used to explain people's opinions based on the comments.

 

Sentiment Analysis

 

After detecting hot topics, we analyzed the sentiments of the comments under each topic to understand people's attitudes. We used a span-based aspect sentiment triplet extraction (Span-ASTE) model to identify, in each comment sentence, the object being discussed (the aspect), its description, and the sentiment of that description.

On the right is an example of ASTE.

Example of input and output of the Span-ASTE model

 

What's unique?

Span-level information:

         Extract a phrase instead of a single word,

         e.g., "not enjoy" rather than "not" and "enjoy" separately.

Aspect-description match:

         Identify each aspect and its corresponding description.
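To make the two points concrete, the sketch below shows one way to represent the triplets such a model produces, using token spans so that a multi-word description stays intact. The triplets are written by hand and stand in for model output; the `Triplet` class, the inclusive span convention, and the `POS`/`NEG`/`NEU` labels are assumptions, not the model's actual interface.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Triplet:
    aspect: Tuple[int, int]   # inclusive token span of the aspect
    opinion: Tuple[int, int]  # inclusive token span of its description
    sentiment: str            # assumed labels: "POS", "NEG" or "NEU"


def span_text(tokens, span):
    """Join the tokens covered by an inclusive (start, end) span."""
    start, end = span
    return " ".join(tokens[start:end + 1])


def render(tokens, triplets):
    """Turn span-based triplets into readable (aspect, description, sentiment)."""
    return [
        (span_text(tokens, t.aspect), span_text(tokens, t.opinion), t.sentiment)
        for t in triplets
    ]


tokens = "I did not enjoy the food but the staff were friendly".split()
# Hand-written triplets standing in for model output.
triplets = [
    Triplet(aspect=(5, 5), opinion=(2, 3), sentiment="NEG"),
    Triplet(aspect=(8, 8), opinion=(10, 10), sentiment="POS"),
]
readable = render(tokens, triplets)
```

The first triplet keeps "not enjoy" as one description and pairs it with "food", while "friendly" is paired with "staff".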

 

Visualization

 

The visualization below summarizes the comments under one topic along four dimensions. Each bubble corresponds to an aspect.

The four dimensions are:

Colour: represents the sentiment towards the aspect. Warm colours indicate positive sentiment; cold colours indicate negative sentiment.

Size: represents word frequency, i.e. the popularity of the aspect.

X-axis: represents the time when the aspect first appeared in the comments.

Y-axis: represents the probability that an aspect is a ‘topic’. Aspects with higher scores are more likely to be topics; aspects with lower scores are less important.
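The four dimensions can be derived from the extracted triplets roughly as follows. This is a sketch under assumptions: the `records` and `topic_probability` values are made up, the colour is chosen by a simple majority vote (ties count as warm), and the topic probability is taken as a given input because the page does not say how it is computed.

```python
from collections import defaultdict

# Hypothetical records: (aspect, sentiment, comment time in Unix seconds).
records = [
    ("food", "POS", 1000),
    ("food", "POS", 1500),
    ("food", "NEG", 1200),
    ("price", "NEG", 1100),
    ("price", "NEG", 1300),
]
# Hypothetical per-aspect topic probabilities, supplied from elsewhere.
topic_probability = {"food": 0.8, "price": 0.4}


def bubble_specs(records, topic_probability):
    """Aggregate records into one bubble description per aspect."""
    grouped = defaultdict(list)
    for aspect, sentiment, timestamp in records:
        grouped[aspect].append((sentiment, timestamp))
    specs = {}
    for aspect, items in grouped.items():
        positive = sum(1 for s, _ in items if s == "POS")
        negative = sum(1 for s, _ in items if s == "NEG")
        specs[aspect] = {
            "colour": "warm" if positive >= negative else "cold",
            "size": len(items),               # number of mentions
            "x": min(t for _, t in items),    # first appearance
            "y": topic_probability.get(aspect, 0.0),
        }
    return specs


bubbles = bubble_specs(records, topic_probability)
```

Each entry in `bubbles` can then be handed to any plotting library as one bubble.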


Summary

The final deliverable of this project consists of the two parts above: topic modelling and comment analysis. We detect hot trends on Reddit Singapore and identify the sentiments expressed in the comments.

Our models help extract information from social media platforms at low cost. They can:

1. Facilitate precise advertising by identifying target customer groups

2. Help adjust inventory for retail stores

3. Facilitate product design through trending topic detection

4. Provide inspiration for content creators

5. Collect customer feedback to improve service levels


TEAM MEMBERS

Chen Pengdan, Engineering Systems and Design
Li Jiaqi, Engineering Systems and Design
Feng Han, Engineering Systems and Design
Dong Jiajie, Information Systems Technology and Design
Jin Ziqi, Information Systems Technology and Design
Lu Qianxi, Information Systems Technology and Design
Xiong Maihe, Information Systems Technology and Design