Executive Summary: Yield Prediction Analysis for Distillation Tower
Overview
This analysis aims to predict the yield of a distillation tower from its recorded process variables. We fit an XGBoost regression model to identify the key factors influencing the yield and to assess how well the yield can be predicted from them.
Data Summary
- Dataset: The dataset contains various process variables recorded over time, including temperatures, pressures, and flow rates.
- Total Features: 24 (excluding the date and yield columns)
- Target Variable: Yield
Key Findings
Correlation Analysis
- A correlation matrix was generated to understand the relationships between different process variables.
- Key observations include strong correlations between certain temperature and flow variables, indicating potential areas for process optimization.
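As an illustration of what each cell of such a correlation matrix contains, the following is a minimal sketch of a pairwise Pearson correlation in plain Python. The readings in `demo` are made up for the example and are not taken from the tower dataset; the helper names `pearson` and `correlation_matrix` are hypothetical and not part of the original analysis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {(a, b): r} for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Made-up readings, for illustration only (not from the tower dataset).
demo = {
    "FlowC1": [10.0, 12.0, 14.0, 16.0],
    "Temp1": [100.0, 104.0, 108.0, 112.0],
    "TempC2": [50.0, 49.0, 52.0, 48.0],
}
matrix = correlation_matrix(demo)
```

In this toy data FlowC1 and Temp1 move in lockstep, so their coefficient is essentially 1. Pairs like that are the ones worth reviewing, because a tree model can split importance between strongly correlated inputs.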
Model Performance
- Model Used: XGBoost Regressor
- Mean Squared Error (MSE): 0.881
- R-squared (R²): 0.190
The model captures only a small part of the variability in the yield. The R² value means that approximately 19% of the variance in yield is explained by the model, leaving about 81% unexplained, so there is substantial room for improvement.
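For reference, the two metrics can be computed as in the sketch below. It uses the standard definitions (mean of squared residuals, and one minus the ratio of residual to total sum of squares); the sample values are made up, and the function names `mse` and `r_squared` are illustrative rather than taken from the original analysis.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

demo_mse = mse(y_true, y_pred)
demo_r2 = r_squared(y_true, y_pred)

# A model that always predicts the mean scores R² = 0 by construction.
mean_only_r2 = r_squared(y_true, [2.5] * 4)
```

Because R² = 1 - MSE / Var(y) when both are computed on the same data, the reported figures would imply a yield variance of roughly 0.881 / 0.810 ≈ 1.09 on the evaluation set, assuming both metrics come from that same set.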
Feature Importance
The most important features affecting the yield according to the XGBoost model are:
- FlowC1 (Importance: 0.275)
- Temp1 (Importance: 0.181)
- TempC2 (Importance: 0.078)
- FlowC9 (Importance: 0.072)
- TempC3 (Importance: 0.070)
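To put these scores in context, the sketch below ranks the five reported importances and works out how much importance is left for the other features. It assumes the importances are normalized to sum to 1 across all 24 features, which the report does not state explicitly.

```python
# Importances as reported above; all other names are illustrative.
reported = {
    "FlowC1": 0.275,
    "Temp1": 0.181,
    "TempC2": 0.078,
    "FlowC9": 0.072,
    "TempC3": 0.070,
}
TOTAL_FEATURES = 24  # from the Data Summary

# Rank features from most to least important.
ranked = sorted(reported.items(), key=lambda kv: kv[1], reverse=True)

# Share of total importance carried by the five listed features.
top5_share = sum(reported.values())

# ASSUMPTION: importances sum to 1 over all 24 features.
remaining_share = 1.0 - top5_share
avg_remaining = remaining_share / (TOTAL_FEATURES - len(reported))

for name, score in ranked:
    print(f"{name:8s} {score:.3f}")
print(f"top-5 share: {top5_share:.3f}, average of the rest: {avg_remaining:.3f}")
```

Under that assumption, the five listed features carry about two thirds of the total importance, and FlowC1 and Temp1 together account for a little under half of it.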
SHAP Analysis
- SHAP (SHapley Additive exPlanations) values were used to interpret the model's predictions.
- The SHAP summary plot shows, for each feature, how its per-sample contributions to the model's output are distributed.
- Key insights include the significant influence of FlowC1 and Temp1 on yield predictions, with FlowC1 having the highest impact.
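To make the idea behind SHAP values concrete, here is a minimal sketch that computes exact Shapley values for a tiny hand-written function by averaging each feature's marginal contribution over all feature orderings. It is not the algorithm the SHAP library uses for tree models, and `toy_yield`, its coefficients, and the input values are hypothetical stand-ins rather than the fitted XGBoost model.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    Features not yet "switched on" keep their baseline value. The cost grows
    factorially with the number of features, so this only suits tiny examples.
    """
    names = list(x)
    contrib = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        prev = model(current)
        for name in order:
            current[name] = x[name]          # switch this feature to its real value
            value = model(current)
            contrib[name] += value - prev    # marginal contribution in this ordering
            prev = value
    return {name: total / len(orderings) for name, total in contrib.items()}


def toy_yield(f):
    # Hypothetical response with an interaction term; NOT the fitted model.
    return 2.0 * f["FlowC1"] + 1.0 * f["Temp1"] + 0.5 * f["FlowC1"] * f["TempC2"]


x = {"FlowC1": 3.0, "Temp1": 2.0, "TempC2": 4.0}
baseline = {"FlowC1": 1.0, "Temp1": 1.0, "TempC2": 1.0}
phi = shapley_values(toy_yield, x, baseline)
```

The contributions in `phi` add up to the difference between the prediction at `x` and the prediction at `baseline`. That additivity is what lets a SHAP plot attribute each prediction to individual features.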
Recommendations
- Focus on Key Variables: Given the importance of FlowC1 and Temp1, process engineers should prioritize monitoring and optimizing these variables to improve yield.
- Further Model Optimization: Consider additional feature engineering and model tuning to improve the predictive performance.
- Regular Monitoring: Implement real-time monitoring of key variables to ensure they remain within optimal ranges.
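A monitoring check along these lines could start as simply as the sketch below. The limits in `LIMITS` and the sample reading are placeholders invented for the example; real operating ranges would have to come from process knowledge or historical data, which this report does not provide.

```python
def out_of_range(reading, limits):
    """Return {name: (value, low, high)} for variables outside their limits.

    Variables missing from `reading` are skipped rather than flagged.
    """
    alerts = {}
    for name, (low, high) in limits.items():
        value = reading.get(name)
        if value is None:
            continue
        if not (low <= value <= high):
            alerts[name] = (value, low, high)
    return alerts


# HYPOTHETICAL limits and reading, for illustration only.
LIMITS = {"FlowC1": (400.0, 450.0), "Temp1": (130.0, 140.0)}
reading = {"FlowC1": 455.2, "Temp1": 135.0}

alerts = out_of_range(reading, LIMITS)
for name, (value, low, high) in alerts.items():
    print(f"{name} = {value} is outside [{low}, {high}]")
```

Here only FlowC1 is flagged, since its placeholder reading exceeds the placeholder upper limit.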
Visualizations
- Correlation Matrix: Provides an overview of relationships between process variables.
- Feature Importances: Highlights the most influential features for yield prediction.
- SHAP Summary Plot: Offers detailed interpretability of the model's predictions.
Conclusion
This analysis provides a foundational understanding of the factors influencing distillation tower yield. By focusing on key variables and further optimizing the model, process engineers can make data-driven decisions to enhance yield and overall process efficiency.