Analyzing Customer Feedback: Sentiment Analysis Results

Apr 29, 2024, 3:39 PM

Analyze the comments and explain the customers' intent

import pandas as pd

data = pd.read_csv('churn Data.csv')
data.head()
# Extracting comments from the data and displaying the first few comments
data['Comment'].head()
To analyze the comments and understand the customers' intent, we'll perform a sentiment analysis. This process will help us categorize the comments into positive, negative, or neutral sentiments. Let's proceed with this analysis to better understand the customer feedback.
from textblob import TextBlob

def analyze_sentiment(comment):
    # Polarity ranges from -1 (most negative) to 1 (most positive)
    polarity = TextBlob(comment).sentiment.polarity
    if polarity > 0:
        return 'Positive'
    elif polarity == 0:
        return 'Neutral'
    else:
        return 'Negative'

# Apply sentiment analysis to the comments
data['Sentiment'] = data['Comment'].apply(analyze_sentiment)
data[['Comment', 'Sentiment']].head()
We have analyzed customer comments from the dataset to determine the intent behind them through sentiment analysis. The sentiments were categorized as Positive, Neutral, or Negative based on the polarity of each comment. Most comments in the sample analyzed are Positive.
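The labeling rule above treats only a polarity of exactly 0 as Neutral, so a comment with a very slight lean is still counted as Positive or Negative. A minimal sketch of a more tolerant rule follows. The function name `label_polarity` and the `neutral_band` threshold of 0.05 are assumptions made for illustration; they are not part of the original analysis and the threshold has not been tuned on this dataset.

```python
def label_polarity(polarity, neutral_band=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label.

    neutral_band is a hypothetical tolerance: scores within
    +/- neutral_band of zero are treated as Neutral.
    """
    if polarity > neutral_band:
        return 'Positive'
    if polarity < -neutral_band:
        return 'Negative'
    return 'Neutral'


# Hypothetical polarity scores, for illustration only
labels = [label_polarity(p) for p in (0.6, 0.02, -0.4)]
print(labels)
```

With the notebook's data, this could be applied to the polarity score returned by TextBlob in place of the strict comparison against zero.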

Explore sentiment distribution further

import plotly.express as px

# Create a bar chart to visualize the distribution of sentiments
distribution = data['Sentiment'].value_counts()
fig = px.bar(distribution, title='Sentiment Distribution', labels={'index': 'Sentiment', 'value': 'Count'}, text_auto=True)
fig.update_layout(xaxis_title='Sentiment', yaxis_title='Count', showlegend=False)
fig.show()
The bar chart shows how many comments in the dataset fall into each sentiment category.
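Raw counts can be easier to compare when expressed as shares of the total. The sketch below uses only the standard library so it runs without the dataset. The helper name `sentiment_shares` and the example labels are hypothetical.

```python
from collections import Counter


def sentiment_shares(labels):
    """Return each label's share of the total as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Hypothetical labels, for illustration only (not taken from the dataset)
example_labels = ['Positive', 'Positive', 'Neutral', 'Negative']
shares = sentiment_shares(example_labels)
print(shares)
```

On the DataFrame itself, `data['Sentiment'].value_counts(normalize=True)` should give the same kind of result.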

Identify key words in positive and negative comments

from sklearn.feature_extraction.text import CountVectorizer

# Filter positive and negative comments
data_positive = data[data['Sentiment'] == 'Positive']['Comment']
data_negative = data[data['Sentiment'] == 'Negative']['Comment']

# Function to extract most frequent words
def get_top_words(comments, num_words=10):
    # Build a bag-of-words matrix, ignoring common English stop words
    vec = CountVectorizer(stop_words='english').fit(comments)
    bag_of_words = vec.transform(comments)
    # Sum each word's count across all comments
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:num_words]

# Get top words for positive and negative comments
top_positive_words = get_top_words(data_positive)
top_negative_words = get_top_words(data_negative)

# Display the results
'Positive Comments Top Words:', top_positive_words, 'Negative Comments Top Words:', top_negative_words
The key words in positive comments include 'okay', 'best', 'good', and 'awesome' among others. For negative comments, common terms are 'happy', 'bull', and 'shit'. These keywords can help identify the tone of customer feedback in the dataset.
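A word such as 'happy' appearing among the negative comments may come from negated phrases like "not happy", which single-word counts cannot capture. This is an assumption and has not been checked against the dataset. A minimal sketch of counting word pairs instead of single words follows. The helper `top_bigrams` and the sample comments are hypothetical, and stop words are deliberately kept so that "not" survives.

```python
from collections import Counter
import re


def top_bigrams(comments, num_bigrams=5):
    """Count adjacent word pairs across comments and return the most frequent."""
    counts = Counter()
    for comment in comments:
        # Lowercase and keep alphabetic tokens (apostrophes allowed)
        tokens = re.findall(r"[a-z']+", str(comment).lower())
        # Pair each token with the one that follows it
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(num_bigrams)


# Hypothetical comments, for illustration only (not taken from the dataset)
sample_comments = [
    'I am not happy with the service',
    'Not happy at all',
    'Very happy customer',
]
top = top_bigrams(sample_comments, num_bigrams=1)
print(top)
```

The same function could be called on `data_negative` to see which word pairs dominate the negative comments.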

Visualize this for me

from plotly.subplots import make_subplots
import plotly.graph_objects as go

# Prepare data for plotting
positive_words, positive_counts = zip(*top_positive_words)
negative_words, negative_counts = zip(*top_negative_words)

# Create subplots
fig = make_subplots(rows=1, cols=2, subplot_titles=('Top Words in Positive Comments', 'Top Words in Negative Comments'))

# Add traces for positive and negative words
top_words_positive_trace = go.Bar(x=positive_words, y=positive_counts, name='Positive')
top_words_negative_trace = go.Bar(x=negative_words, y=negative_counts, name='Negative')

# Plot positive words
fig.add_trace(top_words_positive_trace, row=1, col=1)

# Plot negative words
fig.add_trace(top_words_negative_trace, row=1, col=2)

# Update layout for better visualization
fig.update_layout(title_text='Top Words in Customer Comments by Sentiment', showlegend=False)
fig.show()
The first attempt at this visualization failed because the make_subplots import was missing. With the import added, the code ran and displayed side-by-side bar charts of the top word counts for positive and negative comments.

Build customer groups using clustering to show the high-churn customers based on the Comment column

import pandas as pd

data = pd.read_csv('churn Data.csv')
display(data.head())

Continue the analysis

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Filter rows for churned customers; copy so that adding a column later
# does not trigger pandas' SettingWithCopyWarning
data_churned = data[data['Churn'] == 'Yes'].copy()

# Vectorize comments using TF-IDF
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(data_churned['Comment'])

# Apply KMeans clustering (n_init set explicitly for consistent behavior across versions)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(X)

# Add cluster labels to the churned-customer rows
data_churned['Cluster'] = kmeans.labels_

# Display the first few entries with cluster labels
display(data_churned.head())
We've loaded the dataset and displayed its structure, which includes various customer attributes like tenure, services used, and churn status. Additionally, we have filtered comments from churned customers and applied TF-IDF vectorization followed by KMeans clustering to categorize these comments into three clusters.
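Cluster labels on their own do not say what each group of comments is about. One simple way to describe a cluster is to list its most frequent words. The sketch below is a standard-library illustration, not the method used in the notebook. The helper `top_terms_per_cluster` and the example comments and labels are hypothetical, and no stop-word filtering is applied.

```python
from collections import Counter, defaultdict
import re


def top_terms_per_cluster(comments, labels, num_terms=3):
    """Return the most frequent words for each cluster label."""
    term_counts = defaultdict(Counter)
    for comment, label in zip(comments, labels):
        # Lowercase and keep alphabetic tokens (apostrophes allowed)
        tokens = re.findall(r"[a-z']+", str(comment).lower())
        term_counts[label].update(tokens)
    return {
        label: [term for term, _ in counts.most_common(num_terms)]
        for label, counts in term_counts.items()
    }


# Hypothetical comments and cluster labels, for illustration only
example_comments = [
    'price too high',
    'price increase again',
    'slow support',
    'support never answers',
]
example_labels = [0, 0, 1, 1]
summary = top_terms_per_cluster(example_comments, example_labels, num_terms=1)
print(summary)
```

With the notebook's variables, a call such as `top_terms_per_cluster(data_churned['Comment'], kmeans.labels_)` would summarize the three clusters.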

Explore cluster characteristics

import plotly.express as px

# Count the number of comments in each cluster
data_churned['Cluster'] = data_churned['Cluster'].astype(str) # Convert cluster labels to string for better plotting
cluster_counts = data_churned['Cluster'].value_counts().reset_index()
cluster_counts.columns = ['Cluster', 'Count']

# Create a bar chart to show the number of comments per cluster
fig = px.bar(cluster_counts, x='Cluster', y='Count', title='Number of Comments per Cluster', labels={'Count':'Number of Comments', 'Cluster':'Cluster'})
fig.show()

Visualize this for me

import plotly.express as px

# Count the number of comments in each cluster
data_churned['Cluster'] = data_churned['Cluster'].astype(str) # Convert cluster labels to string for better plotting
cluster_counts = data_churned['Cluster'].value_counts().reset_index()
cluster_counts.columns = ['Cluster', 'Count']

# Create a bar chart to show the number of comments per cluster
fig = px.bar(cluster_counts, x='Cluster', y='Count', title='Number of Comments per Cluster', labels={'Count':'Number of Comments', 'Cluster':'Cluster'})
fig.show()
The user requested a visualization of the number of comments per cluster. A bar chart was created and displayed, showing the distribution of comments across different clusters.