Exploring Online Shopping Behavior: The Relationship Between Total Spending and Product Recommendations

Mar 30, 2024, 7:11 PM

The file shoppers.csv contains a dataset related to online shopping behavior. The dataset contains information about users and whether they clicked on personalised product recommendations (Yes or No), whether they made a purchase (Yes or No) and the amount of the item in question.

(a) Plot a histogram and boxplot of the Amounts. How does each provide a visual description of the variation in the data?
(b) Create a cross tabulation of recommendations and purchase decisions. Describe the association (if any) and what that suggests about the product recommendations.
(c) Use the following code to create an aggregated DataFrame: Create a scatter plot of Amount spent and proportion of product recommendations clicked. Label the axes. Calculate the correlation and describe the relationship. What does this tell you about product recommendations and how does it relate to part (a)?
The histogram shows the distribution of amounts spent by users, using 20 bins to give a reasonably fine view of the spread. Each bar's height is the number of purchases whose amount falls in that interval, so the plot shows where most values lie, how widely they are spread, and whether the distribution is skewed.
The boxplot provides a summary of the distribution, highlighting the median, quartiles, and potential outliers. It's a concise way to view the central tendency and variability, as well as to spot outliers that the histogram might not clearly show.
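The two plots described above could be produced along the following lines. This is a minimal sketch, not the notebook's original code: the column name `Amount` is an assumption about shoppers.csv, the function name `plot_amount_distribution` is hypothetical, and the few rows used here are made up purely for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_amount_distribution(df, column="Amount", bins=20):
    """Draw a histogram and a boxplot of one numeric column side by side."""
    # Drop missing values and hand matplotlib a plain NumPy array.
    amounts = df[column].dropna().to_numpy()

    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(12, 4))

    # Histogram: bar heights count the observations in each interval.
    ax_hist.hist(amounts, bins=bins, edgecolor="black")
    ax_hist.set_xlabel(column)
    ax_hist.set_ylabel("Frequency")
    ax_hist.set_title(f"Histogram of {column}")

    # Boxplot: median, quartiles, whiskers, and points beyond the whiskers.
    ax_box.boxplot(amounts)
    ax_box.set_ylabel(column)
    ax_box.set_title(f"Boxplot of {column}")

    fig.tight_layout()
    return fig, (ax_hist, ax_box)


# Made-up amounts for illustration only (assumed column name: "Amount").
demo = pd.DataFrame({"Amount": [5, 12, 18, 20, 22, 25, 31, 40, 150]})
fig, (ax_hist, ax_box) = plot_amount_distribution(demo)
```

With the real data, the same function could be called on `pd.read_csv("shoppers.csv")`, followed by `plt.show()` to display the figure, provided the amount column is actually named `Amount`.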
Now, let's create a cross tabulation of recommendations and purchase decisions to explore their association.
The cross tabulation shows the relationship between product recommendations and purchase decisions. Rows represent whether the user clicked on a personalised recommendation (Yes or No), and columns indicate whether a purchase was made (Yes or No).
From the table, we observe that a substantial number of purchases were made by users who did not click a recommendation, suggesting that while recommendations may influence some decisions, many purchases occur independently of them. To judge the association itself, the purchase rate within each row should be compared rather than the raw counts. This insight could imply that other factors, such as product visibility or inherent customer preferences, play an important role in purchase decisions.
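A cross tabulation of this kind can be sketched with `pandas.crosstab`. The column names `Recommendation` and `Purchase` are assumptions about shoppers.csv, and the six rows below are made up for illustration; they do not reproduce the table discussed above.

```python
import pandas as pd

# Made-up rows for illustration only (assumed column names).
demo = pd.DataFrame({
    "Recommendation": ["Yes", "Yes", "No", "No", "Yes", "No"],
    "Purchase":       ["Yes", "No",  "No", "Yes", "Yes", "No"],
})

# Raw counts: rows are recommendation clicks, columns are purchase decisions.
counts = pd.crosstab(demo["Recommendation"], demo["Purchase"])

# Row-normalised shares: the purchase rate within each recommendation group.
row_shares = pd.crosstab(
    demo["Recommendation"], demo["Purchase"], normalize="index"
)

print(counts)
print(row_shares.round(2))
```

Comparing the `Yes` column of `row_shares` across the two rows gives the purchase rate for users who clicked and for users who did not, which is a more direct view of the association than the raw counts.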
Next, we'll proceed with creating an aggregated DataFrame, followed by a scatter plot to analyze the relationship between the amount spent and the proportion of product recommendations clicked.
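The aggregation code supplied with the question is not reproduced in this thread, so the following is only a stand-in sketch of one way such a per-user table might be built and plotted. The column names `UserID`, `Amount`, and `Recommendation`, the function names, and the sample rows are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd


def aggregate_by_user(df):
    """Total amount and share of rows with a recommendation click, per user."""
    clicked = df["Recommendation"].eq("Yes")  # boolean: True where clicked
    return (
        df.assign(Clicked=clicked)
        .groupby("UserID")
        .agg(TotalAmount=("Amount", "sum"), PropClicked=("Clicked", "mean"))
        .reset_index()
    )


def plot_spend_vs_clicks(agg):
    """Scatter plot of total spend against proportion of recommendations clicked."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(agg["TotalAmount"], agg["PropClicked"])
    ax.set_xlabel("Total amount spent")
    ax.set_ylabel("Proportion of recommendations clicked")
    ax.set_title("Spending vs. recommendation clicks (per user)")
    return ax


# Made-up rows for illustration only (assumed column names).
demo = pd.DataFrame({
    "UserID":         [1, 1, 2, 2, 3],
    "Amount":         [10, 20, 5, 15, 40],
    "Recommendation": ["Yes", "No", "No", "No", "Yes"],
})

per_user = aggregate_by_user(demo)
ax = plot_spend_vs_clicks(per_user)
```

Taking the mean of a boolean column yields the fraction of `True` values, which is why `PropClicked` can be read directly as a proportion between 0 and 1.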
Now, let's calculate the correlation between the total amount spent and the proportion of product recommendations clicked to understand their relationship.
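As a rough illustration of the calculation, the Pearson correlation between two columns can be obtained with `Series.corr` and checked against the textbook formula. The column names `TotalAmount` and `PropClicked` and the four rows below are assumptions made up for this sketch, so the result is not the value obtained from shoppers.csv.

```python
import numpy as np
import pandas as pd

# Made-up per-user aggregates for illustration only (assumed column names).
per_user = pd.DataFrame({
    "TotalAmount": [10, 20, 30, 40],
    "PropClicked": [0.5, 0.25, 1.0, 0.75],
})

# Series.corr uses the Pearson coefficient by default.
r = per_user["TotalAmount"].corr(per_user["PropClicked"])

# The same quantity from the definition: the sum of products of deviations
# divided by the square root of the product of summed squared deviations.
dx = per_user["TotalAmount"] - per_user["TotalAmount"].mean()
dy = per_user["PropClicked"] - per_user["PropClicked"].mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(round(r, 3), round(r_manual, 3))  # both print 0.6 for these rows
```

A coefficient near 0 indicates little linear relationship, while values near +1 or -1 indicate a strong positive or negative linear relationship; it describes association only, not cause.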

Summary of Findings

  • Histogram and Boxplot of Amounts:
    • The histogram showed the distribution of amounts spent, indicating where most data points lie. The boxplot provided a summary of the distribution, highlighting the median, quartiles, and potential outliers.
  • Cross Tabulation of Recommendations and Purchase Decisions:
    • The analysis suggested that many purchases were made by users who did not click a recommendation. This implies that other factors, such as product visibility or inherent customer preferences, might play a significant role in purchase decisions.
  • Scatter Plot and Correlation:
    • The scatter plot of total amount spent against the proportion of product recommendations clicked showed a positive relationship. The correlation coefficient was approximately 0.414, indicating a moderate positive correlation. Users who spend more in total tend to click a larger share of recommendations, but the relationship is far from tight, and a correlation of this size does not by itself show that clicking recommendations causes higher spending.
Taken together, these findings suggest that product recommendations are associated with online purchasing behavior but do not fully explain it; other factors also influence purchase decisions.

We'll tackle this in three parts:
(a) Plot a histogram and boxplot of the Amounts: We'll start by visualizing the distribution of the amounts spent using a histogram and a boxplot. This will help us understand the variation in the data.
(b) Create a cross tabulation of recommendations and purchase decisions: Next, we'll examine the relationship between product recommendations and purchase decisions through a cross tabulation.
(c) Scatter plot and correlation: Finally, using the provided code snippet, we'll create an aggregated DataFrame to analyze the relationship between the amount spent and the proportion of product recommendations clicked. We'll create a scatter plot, label the axes, calculate the correlation, and discuss the relationship.
Let's begin with part (a) by plotting the histogram and boxplot.

Continue the analysis

Summary of Analysis

  • Histogram and Boxplot of Amounts:
    • The histogram and boxplot were successfully generated, providing visual insights into the distribution of amounts spent by users. The histogram showed the distribution across different amounts, while the boxplot highlighted the central tendency, variability, and potential outliers in the data.
Next steps include creating a cross tabulation of recommendations and purchase decisions, and analyzing the relationship between amount spent and the proportion of product recommendations clicked through a scatter plot and correlation calculation.