Data Loading, Cleaning, and Manual Input Correction

Apr 2, 2024, 1:45 PM

Research Question: Can any of the Glibenclamide samples substitute for one another?

import pandas as pd

df = pd.read_csv('Bioequivalence in Glibenclamide Tablet Samples.csv')
display(df)
import pandas as pd
from io import StringIO

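# Note: the first data row (10.23,5.1,102) has no sample label, so its values
# shift one column to the left and '% Content to label claim' is read as NaN;
# the dropna step below discards it.
data = """Samples, Molar concentration x X 10-6,Mean Unit Content mg,% Content to label claim,,\n10.23,5.1,102,,\n1,9.79,4.8,96,,\n2,11.25,5.6,112,,\n3,11.25,5.6,112,,\n4,10.5,6.2,124,,\n5,10.47,5.2,104,,\n6,,,,,\n"""
# skipinitialspace=True strips the blank after the first comma in the header,
# so the column is named 'Molar concentration x X 10-6' without a leading space.
df = pd.read_csv(StringIO(data), skipinitialspace=True)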
display(df)
df_cleaned = df.dropna(subset=['Molar concentration x X 10-6', 'Mean Unit Content mg', '% Content to label claim'])
display(df_cleaned)
df_cleaned = df.dropna(subset=['Mean Unit Content mg', '% Content to label claim'])
display(df_cleaned)
  • Attempted to load and display the initial dataset from the CSV file, but the file turned out to be empty or incorrectly formatted.
  • Corrected the data input issue by manually creating a DataFrame from a provided string of data.
  • Cleaned the dataset by removing rows with missing values in critical columns ('Mean Unit Content mg', '% Content to label claim').
  • The cleaned dataset includes samples 1 through 5, with complete data on molar concentration, mean unit content, and content to label claim percentages.
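The loading and cleaning steps summarized above can be sketched in one self-contained pass. This is a minimal sketch, not the notebook's exact code: it uses only the labelled rows (samples 1 through 6) copied from the inline data, and the names `raw`, `value_cols`, and `implied_claim` are introduced here for illustration. The final check divides mean unit content by the percentage of label claim; the ratio suggests a 5 mg label claim, which the notebook itself does not state.

```python
import pandas as pd
from io import StringIO

# Labelled rows copied from the notebook's inline data (samples 1 to 6).
raw = (
    "Samples, Molar concentration x X 10-6,Mean Unit Content mg,% Content to label claim,,\n"
    "1,9.79,4.8,96,,\n"
    "2,11.25,5.6,112,,\n"
    "3,11.25,5.6,112,,\n"
    "4,10.5,6.2,124,,\n"
    "5,10.47,5.2,104,,\n"
    "6,,,,,\n"
)

# skipinitialspace=True drops the blank that follows a comma in the header.
df = pd.read_csv(StringIO(raw), skipinitialspace=True)
df.columns = df.columns.str.strip()  # defensive: remove any stray whitespace

# The trailing commas create two unnamed, all-NaN columns; remove them.
df = df.dropna(axis=1, how="all")

value_cols = [
    "Molar concentration x X 10-6",
    "Mean Unit Content mg",
    "% Content to label claim",
]

# Keep only rows with complete measurements (sample 6 is empty and is dropped).
df_cleaned = df.dropna(subset=value_cols).reset_index(drop=True)
print(df_cleaned)

# Sanity check on the manual input: content / (percent / 100) should give
# the same label claim for every sample if the rows were typed consistently.
implied_claim = df_cleaned["Mean Unit Content mg"] / (
    df_cleaned["% Content to label claim"] / 100
)
print(implied_claim.round(6))
```

If the implied label claim differed between rows, that would point to a typing error in the manually entered string rather than a real difference between samples.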

Null Hypothesis: There is no bioequivalence between the Glibenclamide tablet samples.

To test the null hypothesis that there is no bioequivalence between the Glibenclamide tablet samples, we compare their mean unit content and their percentage content to label claim. If these values are statistically similar across samples, we might reject the null hypothesis, suggesting potential bioequivalence. Note that one-way ANOVA tests for differences between group means, so a non-significant result is at best indirect support for equivalence.
from scipy.stats import f_oneway

# Extracting relevant data columns for analysis
data_mean_unit_content = df_cleaned['Mean Unit Content mg'].values
data_pct_content_label = df_cleaned['% Content to label claim'].values

# Performing one-way ANOVA on both sets of data
test_result_mean_content = f_oneway(*[df_cleaned[df_cleaned['Samples'] == sample]['Mean Unit Content mg'] for sample in df_cleaned['Samples'].unique()])
test_result_pct_label = f_oneway(*[df_cleaned[df_cleaned['Samples'] == sample]['% Content to label claim'] for sample in df_cleaned['Samples'].unique()])

# Displaying the ANOVA test results
test_result_mean_content, test_result_pct_label
  • The ANOVA returned NaN for both the statistic and the p-value. The most likely cause is that each sample contributes a single value per column, so every group passed to f_oneway holds one observation and the within-group variance cannot be estimated.
  • The intention was to use ANOVA to compare mean unit content and percentage content to label claim across samples, to test the null hypothesis of no bioequivalence between Glibenclamide Tablet samples.
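The NaN outcome can be reproduced in isolation. The sketch below passes `f_oneway` one mean unit content value per sample, as in the cleaned table, and then repeats the test on replicate measurements. The replicate values and the names `sample_a`, `sample_b`, and `sample_c` are hypothetical, invented purely to show the shape of input that one-way ANOVA needs; they are not data from this study.

```python
import math
import warnings

from scipy.stats import f_oneway

# One mean unit content value (mg) per sample, as in the cleaned table.
single_obs = [[4.8], [5.6], [5.6], [6.2], [5.2]]

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # SciPy warns about the degenerate input
    degenerate = f_oneway(*single_obs)

# With one observation per group there are zero within-group degrees of
# freedom, so both the statistic and the p-value come back as NaN.
print(degenerate)

# HYPOTHETICAL replicate assay values (mg), for illustration only.
replicates = {
    "sample_a": [4.7, 4.8, 4.9],
    "sample_b": [5.5, 5.6, 5.7],
    "sample_c": [6.1, 6.2, 6.3],
}

# With several observations per group the test is well defined.
valid = f_oneway(*replicates.values())
print(valid)
```

Running a meaningful ANOVA on the real samples would therefore need the individual tablet measurements behind each mean, which the summary table does not include.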