XGBoost stands for eXtreme Gradient Boosting. It's a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. It's designed for speed and performance.
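To illustrate what "gradient boosting" means here, the following is a minimal sketch for squared-error regression: each new shallow tree is fit to the residuals of the current ensemble and added with a small step size. It is not XGBoost's actual implementation, which adds regularization, second-order gradient information, and many engineering optimizations. The function names and default values are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_rounds=50, learning_rate=0.1, max_depth=3):
    """Fit a toy gradient-boosted ensemble for squared-error regression."""
    y = np.asarray(y, dtype=float)
    base = float(np.mean(y))              # start from a constant prediction
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction        # negative gradient of 0.5 * squared error
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return base, learning_rate, trees


def predict_boosted_trees(model, X):
    """Sum the constant base prediction and every tree's scaled correction."""
    base, learning_rate, trees = model
    return base + learning_rate * sum(tree.predict(X) for tree in trees)
```

The `max_depth` and `learning_rate` arguments play the same role as the two XGBoost hyperparameters tuned later: tree complexity and the size of each boosting step.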

GridSearchCV is a scikit-learn tool for tuning a model's hyperparameters. It exhaustively tries every combination of the candidate parameter values, cross-validating each one to determine which combination gives the best performance.
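To make "exhaustively" concrete, this sketch enumerates a grid with scikit-learn's `ParameterGrid` and counts the model fits a 5-fold search would need. The candidate values are hypothetical: the summary further down names only the two tuned parameters, the count of 16 combinations, and the winning pair, so the lists below are assumed values chosen to be consistent with those facts.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical candidate values; only the parameter names come from the text.
param_grid = {
    "max_depth": [3, 4, 5, 6],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
}
n_folds = 5

candidates = list(ParameterGrid(param_grid))  # every combination, one dict each
n_candidates = len(candidates)
total_fits = n_candidates * n_folds           # one fit per candidate per fold

print(n_candidates, total_fits)  # 16 80
```

By default GridSearchCV then refits the best candidate once more on the full training set, which is in addition to these cross-validation fits.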

R-squared is a statistical measure of the proportion of the variance in a dependent variable that is explained by the independent variable or variables in a regression model. It indicates how well the model fits the data: the closer the value is to 1, the better the fit.
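As a small worked example of that definition, R-squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The helper name and the sample values below are illustrative only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(r_squared(y_true, y_pred), 4))  # 0.9486
```

A model that always predicts the mean of the targets scores 0, and a model that does worse than that on held-out data scores below 0.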

**Data Preparation**: Loaded the California housing dataset as a substitute for the Boston dataset, which has been removed from scikit-learn. The features include median income, house age, and average rooms; the target variable is the median house value (called MEDV here; scikit-learn's own name for it is `MedHouseVal`).

**Data Splitting**: The dataset was split into training (80%) and testing (20%) sets.

**Model Training**: Used XGBoost with GridSearchCV to optimize the hyperparameters 'max_depth' and 'learning_rate'. The GridSearchCV was set up with 5-fold cross-validation and selected the combination that maximized the R-squared score.

**Results**: The GridSearchCV process evaluated 16 parameter combinations, each across 5 folds, for a total of 80 fits. The best parameters and R-squared score will be retrieved and displayed next.
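The steps above can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the code that produced the reported numbers: the candidate grid values, the random seed, and the helper names `tune_and_evaluate` and `main` are all hypothetical, and only the dataset, the 80/20 split, the 5-fold cross-validation, the R-squared scoring, and the two tuned parameter names come from the summary.

```python
from sklearn.model_selection import GridSearchCV, train_test_split


def tune_and_evaluate(estimator, param_grid, X, y, seed=42):
    """Grid-search `estimator` with 5-fold CV and report R-squared scores."""
    # 80% training / 20% testing split, as described in the summary.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    search = GridSearchCV(estimator, param_grid, cv=5, scoring="r2")
    search.fit(X_train, y_train)  # also refits the best candidate on all training rows
    return {
        "best_params": search.best_params_,
        "cv_r2": search.best_score_,  # mean R-squared over the held-out folds
        "train_r2": search.score(X_train, y_train),
        "test_r2": search.score(X_test, y_test),
    }


def main():
    # Imported here so the helper above works with any scikit-learn regressor.
    from sklearn.datasets import fetch_california_housing  # downloads on first use
    from xgboost import XGBRegressor

    X, y = fetch_california_housing(return_X_y=True)
    # Assumed 4 x 4 grid: the candidate values are not given in the text.
    param_grid = {
        "max_depth": [3, 4, 5, 6],
        "learning_rate": [0.01, 0.05, 0.1, 0.2],
    }
    results = tune_and_evaluate(XGBRegressor(random_state=42), param_grid, X, y)
    for name, value in results.items():
        print(name, value)
```

Calling `main()` runs the sketch end to end; it needs the `xgboost` package and network access for the first dataset download. Because the grid and seed are assumptions, the scores it prints will not necessarily match the figures reported here.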

**Best Parameters**: {'learning_rate': 0.2, 'max_depth': 6}

**Best R-squared Score**: 0.8343

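This score is the mean R-squared across the 5 cross-validation folds, each held out from the training set in turn, so it indicates strong predictive performance on data the model was not fitted to within each fold.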

**R-squared on Training Data**: 0.9285

**R-squared on Testing Data**: 0.8352

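These results indicate that the model fits the training data very well and generalizes reasonably to new, unseen data: the test score (0.8352) is close to the cross-validated score (0.8343). The gap of about 0.09 between training and testing R-squared suggests a modest degree of overfitting.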
