Exploring Statistics of 'antigüedad' Column in Retail Data

Mar 29, 2024, 12:15 PM

sacame la media aritmetica, geométrica, moda, mediana, desviación tipica muestral, varianza muestral, cuartil 1,2,3, percentil 33, coeficiente de asimetrica, coeficiente de curtosis, minimo, maximo, rango, amplictud intercuartil y coeficiente de variazión de la columna antiguedad

import pandas as pd

# Load the Excel file
file_path = 'Minoristas_pr2_2024.xlsx'
data = pd.read_excel(file_path, sheet_name=None)

# Since the Excel file might contain multiple sheets, we display the keys to identify the relevant sheet
print(data.keys())
dict_keys(['Datos', 'Funciones', 'Cálculos', 'Histogramas'])
# Loading the 'Datos' sheet which likely contains the data of interest
sheet_name = 'Datos'
df = pd.read_excel(file_path, sheet_name=sheet_name)

# Display the first few rows of the dataframe to inspect the structure
display(df.head())
from scipy import stats
import numpy as np

# Calculating various statistics for the 'antigüedad' column
data = df['antigüedad']
mean = data.mean()
geometric_mean = stats.gmean(data)
mode = data.mode()[0]
median = data.median()
std_dev = data.std(ddof=1) # Sample standard deviation
variance = data.var(ddof=1) # Sample variance
quartiles = data.quantile([0.25, 0.5, 0.75])
percentile_33 = np.percentile(data, 33)
skewness = data.skew()
kurtosis = data.kurtosis()
minimum = data.min()
maximum = data.max()
range_ = maximum - minimum
iqr = quartiles[0.75] - quartiles[0.25] # Interquartile range
cv = std_dev / mean # Coefficient of variation

# Displaying the results
display({
'Mean': mean,
'Geometric Mean': geometric_mean,
'Mode': mode,
'Median': median,
'Sample Standard Deviation': std_dev,
'Sample Variance': variance,
'Quartile 1 (25%)': quartiles[0.25],
'Quartile 2 (50%) - Median': quartiles[0.5],
'Quartile 3 (75%)': quartiles[0.75],
'Percentile 33': percentile_33,
'Skewness - Coefficient of Asymmetry': skewness,
'Kurtosis - Coefficient of Curtosis': kurtosis,
'Minimum': minimum,

Summary of Statistical Analysis for 'antigüedad' Column

  • Mean (Average): 85.16
  • Geometric Mean: 51.95
  • Mode (Most Frequent): 6
  • Median (Middle Value): 65.0
  • Sample Standard Deviation: 72.77
  • Sample Variance: 5295.89
  • Quartile 1 (25%): 20.0
  • Quartile 2 (Median): 65.0
  • Quartile 3 (75%): 140.0
  • Percentile 33: 34.88
  • Skewness (Asymmetry Coefficient): 0.83
  • Kurtosis (Curtosis Coefficient): -0.21
  • Minimum: 6
  • Maximum: 248
  • Range: 242
  • Interquartile Range (IQR): 120.0
  • Coefficient of Variation (CV): 0.85

de la hoja cálculos sacame la media, mediana, moda, desviación tipica muestral, varianza muestral, curtosis, asimetria, máximo, minimo, rango, percentil 25, amplictud intercuartil y coeficiente de variación de la columna empleados, antiguedad, facturación, impuestos y costes

# Loading the 'Cálculos' sheet which likely contains the data for further analysis
sheet_name = 'Cálculos'
df_calculos = pd.read_excel(file_path, sheet_name=sheet_name)

# Display the first few rows of the dataframe to inspect the structure
display(df_calculos.head())
# Adjusting the dataframe to skip initial rows that do not contain relevant data
# Skipping rows and setting the first row with data as header
df_calculos_adjusted = pd.read_excel(file_path, sheet_name='Cálculos', skiprows=2)

# Display the first few rows of the adjusted dataframe
display(df_calculos_adjusted.head())
columns_to_analyze = ['empleados', 'antigüedad', 'facturación', 'impuestos', 'costes']

# Creating a dictionary to hold the analysis results
analysis_results = {}

for column in columns_to_analyze:
# Extracting the column data
data = df_calculos_adjusted[column].astype(float) # Ensure numeric type for calculations
# Calculating statistics
mean = data.mean()
median = data.median()
mode = data.mode()[0]
std_dev = data.std(ddof=1) # Sample standard deviation
variance = data.var(ddof=1) # Sample variance
kurtosis = data.kurtosis()
skewness = data.skew()
maximum = data.max()
minimum = data.min()
range_ = maximum - minimum
percentile_25 = np.percentile(data, 25)
iqr = np.percentile(data, 75) - np.percentile(data, 25)
cv = std_dev / mean # Coefficient of variation

# Storing the results in the dictionary under the column name
analysis_results[column] = {
'Mean': mean,
'Median': median,
'Mode': mode,
'Sample Standard Deviation': std_dev,
'Sample Variance': variance,
'Kurtosis': kurtosis,
'Skewness': skewness,
'Maximum': maximum,
'Minimum': minimum,
'Range': range_,
columns_to_analyze = ['empleados', 'antigüedad', 'facturación', 'impuestos', 'costes']

# Creating a dictionary to hold the analysis results
analysis_results = {}

for column in columns_to_analyze:
# Extracting the column data
data = pd.to_numeric(df_calculos_adjusted[column], errors='coerce') # Convert to numeric, coercing errors
data = data.dropna() # Drop NaN values that result from coercion
# Calculating statistics
mean = data.mean()
median = data.median()
mode = data.mode()[0] if not data.mode().empty else 'No mode'
std_dev = data.std(ddof=1) # Sample standard deviation
variance = data.var(ddof=1) # Sample variance
kurtosis = data.kurtosis()
skewness = data.skew()
maximum = data.max()
minimum = data.min()
range_ = maximum - minimum
percentile_25 = np.percentile(data, 25)
iqr = np.percentile(data, 75) - np.percentile(data, 25)
cv = std_dev / mean # Coefficient of variation

# Storing the results in the dictionary under the column name
analysis_results[column] = {
'Mean': mean,
'Median': median,
'Mode': mode,
'Sample Standard Deviation': std_dev,
'Sample Variance': variance,
'Kurtosis': kurtosis,
'Skewness': skewness,
'Maximum': maximum,
'Minimum': minimum,

Statistical Analysis Summary for Selected Columns

empleados

  • Mean: 2.68
  • Median: 3.0
  • Mode: 2.0
  • Sample Standard Deviation: 0.80
  • Sample Variance: 0.64
  • Kurtosis: -0.52
  • Skewness: 0.14
  • Maximum: 4.0
  • Minimum: 1.0
  • Range: 3.0
  • Percentile 25: 2.0
  • Interquartile Range (IQR): 1.0
  • Coefficient of Variation (CV): 0.30

antigüedad

  • Mean: 85.16
  • Median: 65.0
  • Mode: 6.0
  • Sample Standard Deviation: 72.77
  • Sample Variance: 5295.89
  • Kurtosis: -0.21
  • Skewness: 0.83
  • Maximum: 248.0
  • Minimum: 6.0
  • Range: 242.0
  • Percentile 25: 20.0
  • Interquartile Range (IQR): 120.0
  • Coefficient of Variation (CV): 0.85

facturación

  • Mean: 7589.26
  • Median: 7628.49
  • Mode: 5189.92
  • Sample Standard Deviation: 2474.33
  • Sample Variance: 6121285.32
  • Kurtosis: 0.49
  • Skewness: -0.11
  • Maximum: 10126.84
  • Minimum: 2032.89
  • Range: 8093.95
  • Percentile 25: 6323.34
  • Interquartile Range (IQR): 1305.15
  • Coefficient of Variation (CV): 0.33

impuestos

  • Mean: 1447.06
  • Median: 1473.22
  • Mode: 605.07
  • Sample Standard Deviation: 376.84
  • Sample Variance: 141998.96
  • Kurtosis: 1.98
  • Skewness: 0.67
  • Maximum: 1951.41
  • Minimum: 605.07
  • Range: 1346.34
  • Percentile 25: 1186.55
  • Interquartile Range (IQR): 286.66
  • Coefficient of Variation (CV): 0.26

costes

  • Mean: 1167.27
  • Median: 1188.82
  • Mode: 760.57
  • Sample Standard Deviation: 222.53
  • Sample Variance: 49519.14
  • Kurtosis: 4.22
  • Skewness: 1.33
  • Maximum: 1884.72
  • Minimum: 760.57
  • Range: 1124.15
  • Percentile 25: 1025.7
  • Interquartile Range (IQR): 204.27
  • Coefficient of Variation (CV): 0.19

de la hoja calulos, sacame un historiograma teniendo en cuenta las clases de facturación y su frecuencia

import plotly.express as px

# Preparing the data for the histogram
facturacion_data = df_calculos_adjusted['facturación'].dropna()

# Creating the histogram using Plotly Expressig = px.histogram(facturacion_data, x='facturación', nbins=10, title='Histogram of Facturación',
labels={'facturación': 'Facturación'}, template='plotly_dark')
fig.update_layout(paper_bgcolor='#111', plot_bgcolor='#111', font_color='white')
fig.show()
import plotly.express as px

# Preparing the data for the histogram
facturacion_data = df_calculos_adjusted['facturaci\u00f3n'].dropna()

# Creating the histogram using Plotly Express
fig = px.histogram(facturacion_data, x='facturaci\u00f3n', nbins=10, title='Histogram of Facturaci\u00f3n',
labels={'facturaci\u00f3n': 'Facturaci\u00f3n'}, template='plotly_dark')
fig.update_layout(paper_bgcolor='#111', plot_bgcolor='#111', font_color='white')
fig.show()
Loading...

Histogram of Facturación

A histogram was created to visualize the distribution of the 'facturación' column from the 'Cálculos' sheet. The histogram, plotted using Plotly Express with a dark theme, shows the frequency of different facturación classes across the dataset. The data was divided into 10 bins to provide a clear overview of the distribution pattern.