Exploratory Data Analysis: Sales and Profit Distributions in Superstore Dataset

Apr 4, 2024, 8:26 AM

Perform exploratory data analysis.

We've loaded the data and taken a preliminary look. It contains 10,194 entries across 21 columns, including sales, profit, and various categorical fields like Ship Mode and Segment. Now, let's dive deeper into the data by analyzing sales and profit distributions, and exploring relationships between different variables.
  • Loaded and previewed data from Sample - Superstore.xlsx - Orders.csv, containing 10,194 entries and 21 columns.
  • Data spans various categories, including sales, profit, shipping details, and customer segments.
  • Initial data overview revealed a mix of numerical and categorical data types.
  • Analyzed sales and profit distributions through histograms, which displayed the spread of these variables across the dataset.
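
The overview and distribution steps above can be sketched roughly as follows. This is a minimal illustration, not the code that was run in this session: the DataFrame is toy data generated on the spot, the column names Sales, Profit, and Segment are assumed to match the Orders sheet, and plot_histograms is a hypothetical helper that needs matplotlib.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Orders sheet (made-up values, for illustration only).
# The real file would be read with something like:
#   orders = pd.read_csv("Sample - Superstore.xlsx - Orders.csv")
rng = np.random.default_rng(0)
orders = pd.DataFrame({
    "Sales": rng.gamma(shape=2.0, scale=100.0, size=500),
    "Profit": rng.normal(loc=30.0, scale=80.0, size=500),
    "Segment": rng.choice(["Consumer", "Corporate", "Home Office"], size=500),
})

# Separate numerical from categorical columns, as in the initial overview.
numeric_cols = orders.select_dtypes(include="number").columns.tolist()
categorical_cols = orders.select_dtypes(exclude="number").columns.tolist()

# Histogram counts and bin edges for each numeric column (20 equal-width bins).
histograms = {col: np.histogram(orders[col], bins=20) for col in numeric_cols}


def plot_histograms(df, columns, bins=20):
    """Hypothetical helper: draw one histogram per column (needs matplotlib)."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, len(columns), figsize=(5 * len(columns), 4))
    for ax, col in zip(np.atleast_1d(axes), columns):
        ax.hist(df[col], bins=bins)
        ax.set_title(col)
    return fig
```

Calling plot_histograms(orders, numeric_cols) would draw the two distributions side by side; orders.describe() gives the matching summary statistics.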

Continue the analysis

To further explore the dataset, we can examine how sales and profits vary by category and sub-category. This will help us identify which products are the most profitable and which ones generate the most sales. We'll visualize this analysis using bar charts for clearer insights.
First, let's aggregate sales and profits by category and sub-category.
Now that we have the sales and profits aggregated by category and sub-category, let's visualize these metrics to better understand the performance of different product categories.
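
A minimal sketch of the aggregation step, assuming the columns are named Category, Sub-Category, Sales, and Profit. The rows below are toy values made up for illustration, not figures from the dataset.

```python
import pandas as pd

# Toy rows standing in for the Orders sheet (made-up values).
orders = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Technology", "Technology", "Office Supplies"],
    "Sub-Category": ["Chairs", "Tables", "Phones", "Phones", "Paper"],
    "Sales": [200.0, 350.0, 500.0, 150.0, 40.0],
    "Profit": [20.0, -35.0, 90.0, 30.0, 15.0],
})

# Total sales and profit per category.
by_category = orders.groupby("Category")[["Sales", "Profit"]].sum()

# Same totals per (category, sub-category), most profitable first.
by_subcategory = (
    orders.groupby(["Category", "Sub-Category"])[["Sales", "Profit"]]
    .sum()
    .sort_values("Profit", ascending=False)
)
```

With matplotlib installed, by_category.plot(kind="bar") would produce the kind of bar chart described above. Sorting by profit makes loss-making sub-categories easy to spot at the bottom of the table.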

Analyze trends in sales and profits over time

To analyze trends in sales and profits over time, we'll first convert the Order Date to a datetime format. Then, we'll aggregate sales and profits by month to observe how they vary over the period covered by the dataset. This approach will help us identify any seasonal trends or significant changes in performance.
  • Converted Order Date to datetime format and aggregated sales and profits by month.
  • Visualized monthly sales and profit trends, revealing fluctuations over time that may indicate seasonal patterns or other factors influencing performance.
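
The conversion and monthly aggregation can be sketched as below. The month/day/year date format and the toy rows are assumptions for illustration; the actual file may store dates differently.

```python
import pandas as pd

# Toy rows (made-up values); dates are assumed to be month/day/year strings.
orders = pd.DataFrame({
    "Order Date": ["1/5/2023", "1/20/2023", "2/3/2023", "3/15/2023", "3/28/2023"],
    "Sales": [100.0, 50.0, 80.0, 120.0, 30.0],
    "Profit": [10.0, 5.0, -8.0, 24.0, 3.0],
})

# Convert the text column to real datetimes.
orders["Order Date"] = pd.to_datetime(orders["Order Date"], format="%m/%d/%Y")

# Sum sales and profit per calendar month ("MS" labels each bucket
# with the first day of the month).
monthly = (
    orders.set_index("Order Date")[["Sales", "Profit"]]
    .resample("MS")
    .sum()
)
```

Resampling, rather than grouping on a formatted month string, keeps a proper datetime index, which the forecasting steps later in the session rely on.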

Forecast the sales and profit for next 12 months

To forecast sales and profit for the next 12 months, we'll use a time series forecasting model. A common approach is ARIMA (AutoRegressive Integrated Moving Average), which handles trends through differencing; capturing seasonal patterns requires its seasonal extension, SARIMA.
First, we'll prepare the data for modeling, then fit an ARIMA model to both the sales and profit data, and finally, forecast the next 12 months.
  • Installed necessary libraries and prepared data for ARIMA modeling.
  • Fitted ARIMA models for both sales and profit data, selecting for each series the model with the lowest AIC (Akaike Information Criterion).
  • Forecasted the next 12 months for sales and profits.
  • Encountered and resolved a dimension mismatch in the visualization step.
  • Visualized the historical data alongside the forecasted sales and profit for the next 12 months, providing insights into expected trends.
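
One way the AIC-based selection could look is sketched below, assuming statsmodels is the library in use (the session does not say which one was installed). The series is toy data, and select_arima_by_aic and forecast_index are hypothetical helper names. The second helper shows one plausible cause of a dimension mismatch when plotting: the forecast needs its own 12-entry date index rather than the historical one.

```python
import itertools

import numpy as np
import pandas as pd

# Toy monthly series with a trend plus noise (made-up values).
rng = np.random.default_rng(42)
index = pd.date_range("2020-01-01", periods=48, freq="MS")
monthly_sales = pd.Series(
    1000 + 10 * np.arange(48) + rng.normal(0, 50, size=48), index=index
)

# Small grid of candidate (p, d, q) orders to compare.
candidate_orders = list(itertools.product(range(3), range(2), range(3)))


def select_arima_by_aic(series, orders):
    """Fit each candidate order; return (order, fitted model) with lowest AIC."""
    from statsmodels.tsa.arima.model import ARIMA  # assumes statsmodels is installed

    best_order, best_fit = None, None
    for order in orders:
        try:
            fit = ARIMA(series, order=order).fit()
        except Exception:
            continue  # skip orders that fail to estimate
        if best_fit is None or fit.aic < best_fit.aic:
            best_order, best_fit = order, fit
    return best_order, best_fit


def forecast_index(series, steps=12):
    """Month-start dates for the forecast horizon, starting after the last observation."""
    return pd.date_range(series.index[-1], periods=steps + 1, freq="MS")[1:]
```

A typical call would be best_order, best_fit = select_arima_by_aic(monthly_sales, candidate_orders), followed by best_fit.forecast(steps=12). The same function can be applied to the profit series.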

Explore other forecasting models for comparison

For a comprehensive analysis, we can explore additional forecasting models such as Prophet by Facebook and LSTM (Long Short-Term Memory) networks. Prophet is designed for forecasting with daily observations that display patterns on different time scales. LSTM, a type of recurrent neural network, is well-suited for sequences of data and can capture long-term dependencies.
We'll start by applying the Prophet model to our sales data, followed by an LSTM model for comparison.

Apply Prophet model to sales data

  • Installed Prophet package successfully.
  • Prepared the sales data for the Prophet model, converting the date to the required format.
  • Initialized and fitted the Prophet model with the sales data.
  • Generated a forecast for the next 12 months.
  • Visualized the forecast along with its components, displaying trends, yearly seasonality, and weekly seasonality (if any).
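
The Prophet steps above might look roughly like this. Prophet expects a DataFrame with a ds column (dates) and a y column (values). The series is toy data, and forecast_with_prophet is a hypothetical wrapper that assumes the prophet package is installed.

```python
import pandas as pd

# Toy monthly sales series (made-up values).
monthly_sales = pd.Series(
    [1200.0, 900.0, 1500.0, 1100.0],
    index=pd.date_range("2023-01-01", periods=4, freq="MS"),
    name="Sales",
)

# Rename the values to "y" and the date index to "ds", then flatten to columns.
prophet_df = monthly_sales.rename("y").rename_axis("ds").reset_index()


def forecast_with_prophet(df, periods=12):
    """Hypothetical wrapper: fit Prophet and predict `periods` further months."""
    from prophet import Prophet  # assumes the prophet package is installed

    model = Prophet()
    model.fit(df)
    # freq="MS" extends the frame by month starts instead of the daily default.
    future = model.make_future_dataframe(periods=periods, freq="MS")
    forecast = model.predict(future)
    return model, forecast
```

After model, forecast = forecast_with_prophet(prophet_df) on a longer history, model.plot(forecast) and model.plot_components(forecast) would draw the forecast and its trend and seasonality components.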

Explore LSTM model for sales forecasting

To apply an LSTM model for sales forecasting, we'll first preprocess the data to fit the LSTM's requirements. This involves normalizing the sales data and transforming it into sequences suitable for time series forecasting. Then, we'll split the data into training and testing sets, design the LSTM model, train it on the sales data, and finally, make predictions for the next 12 months.
This process involves several steps and requires the use of deep learning libraries such as TensorFlow and Keras.
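
A minimal sketch of the preprocessing described above, using NumPy only. The sales values, the window length of 3, and the 80/20 split are arbitrary choices for illustration. The scaler here is fitted on the training portion only, which is a design choice of this sketch to avoid leaking test data, not necessarily the order used in the session. build_lstm is a hypothetical helper that assumes TensorFlow is installed.

```python
import numpy as np

# Toy monthly sales values (made-up).
sales = np.array([10.0, 12.0, 15.0, 11.0, 18.0, 20.0, 17.0, 22.0, 25.0, 21.0])

# Chronological split: the first 80% is used for training.
split = int(len(sales) * 0.8)
train, test = sales[:split], sales[split:]

# Min-max scaling parameters taken from the training data only.
lo, hi = train.min(), train.max()


def scale(values):
    """Map values so the training minimum is 0 and the training maximum is 1."""
    return (values - lo) / (hi - lo)


def make_windows(values, window):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    X = np.array([values[i:i + window] for i in range(len(values) - window)])
    y = values[window:]
    return X[..., np.newaxis], y


X_train, y_train = make_windows(scale(train), window=3)


def build_lstm(window, units=32):
    """Hypothetical helper: one LSTM layer and a single-value output."""
    import tensorflow as tf  # assumes TensorFlow is installed

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(window, 1)),
        tf.keras.layers.LSTM(units),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```

Training would then be build_lstm(3).fit(X_train, y_train, epochs=...). Forecasting 12 months ahead means predicting one step at a time and feeding each prediction back into the input window.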

Perform market basket analysis in this data.

  • Successfully installed the mlxtend package for market basket analysis.
  • Loaded the orders data and transformed it into a format suitable for market basket analysis, with each row representing a transaction and each column an item. Marked items bought together with 1 in their respective columns for a given transaction.
  • Due to the large number of unique products, the pivoted data has a very wide format, making it challenging to display in full here.
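
The transformation into a transaction-by-item table can be sketched as follows. The column names Order ID, Product Name, and Quantity are assumed, and the rows are toy data. This sketch marks purchases with True/False rather than 1/0, since mlxtend's apriori works best with boolean input; .astype(int) would give the 1/0 form.

```python
import pandas as pd

# Toy order lines (made-up values): one row per product within an order.
orders = pd.DataFrame({
    "Order ID": ["A1", "A1", "A2", "A3", "A3", "A3"],
    "Product Name": ["Stapler", "Paper", "Paper", "Stapler", "Paper", "Binder"],
    "Quantity": [2, 1, 5, 1, 3, 1],
})

# One row per order, one column per product, True where the product was bought.
basket = (
    orders.groupby(["Order ID", "Product Name"])["Quantity"]
    .sum()
    .unstack(fill_value=0)
    .gt(0)
)
```

With thousands of unique products the resulting table is very wide and mostly False, which is why it is hard to display in full.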

Analyze the association rules to identify strong product relationships

Generating the association rules failed because the DataFrame of frequent itemsets was empty. The most likely cause is that the min_support threshold was set too high, so no itemset met this criterion.
To address this, we can try lowering the min_support threshold to identify more frequent itemsets and subsequently generate association rules from these. Let's adjust the min_support value and attempt to find frequent itemsets again.
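
A quick way to see why a threshold yields nothing is to inspect single-item support first, as sketched below on a toy basket table. No itemset can have higher support than its least frequent item, so the largest single-item support is an upper bound for min_support. items_meeting and frequent_itemsets are hypothetical helper names; the latter assumes mlxtend is installed.

```python
import pandas as pd

# Toy one-hot basket table (made-up): rows are orders, columns are products.
basket = pd.DataFrame(
    {
        "Binder": [False, False, True, False],
        "Paper": [True, True, True, False],
        "Stapler": [True, False, True, True],
    },
    index=["A1", "A2", "A3", "A4"],
)

# Support of an item = share of transactions that contain it.
item_support = basket.mean().sort_values(ascending=False)


def items_meeting(min_support):
    """Items whose individual support reaches the threshold."""
    return item_support[item_support >= min_support].index.tolist()


def frequent_itemsets(table, min_support):
    """Hypothetical wrapper around mlxtend's apriori."""
    from mlxtend.frequent_patterns import apriori  # assumes mlxtend is installed

    return apriori(table, min_support=min_support, use_colnames=True)
```

Here items_meeting(0.9) is empty while items_meeting(0.5) is not, mirroring the situation above. On the real data, item_support.max() gives a ceiling to start from when lowering min_support.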