Installing Kaleido and Plotting Histograms with Plotly

Jan 22, 2025, 10:50 AM

Complete Dataset.csv

My PhD thesis topic is: "Driver Risk Profiling for Safety and Insurance Applications Using Open-Source Telematics Data". To implement the proposed methodology on the "Vehicle Telematics Dataset" by Mukul Bhele, follow these steps:

Dataset Preparation:
Explore Key Features: Identify columns related to speed, acceleration, braking, and timestamps.
Check for Missing Values: Assess the dataset for any missing or inconsistent data.
Visualize Data: Plot histograms for speed, acceleration, and braking to understand their distributions.

Behavior Analysis:
Define Thresholds for Driving Behaviors:
Harsh Braking: Deceleration below a certain threshold (e.g., -0.6g).
Aggressive Acceleration: Acceleration above a certain threshold (e.g., 0.5g).
Speeding: Speed exceeding a defined limit (e.g., 120 km/h in urban areas).
Label Data Points: Classify data points as risky or safe based on these thresholds.

Clustering Drivers:
Feature Extraction: Calculate the frequency of each behavior per driver.
Apply K-Means:
The dataset has been successfully loaded. It contains various columns related to vehicle telematics data, such as tripID, deviceID, timeStamp, accData, gps_speed, battery, cTemp, eLoad, speed, total_acceleration, hard_brake_event, angular_acceleration, and event among others.

Next Steps:

  1. Explore Key Features: We will focus on columns related to speed, acceleration, braking, and timestamps.
  2. Check for Missing Values: Assess the dataset for any missing or inconsistent data.
  3. Visualize Data: Plot histograms for speed, acceleration, and braking to understand their distributions.
  4. Behavior Analysis: Define thresholds for driving behaviors such as harsh braking, aggressive acceleration, and speeding.
  5. Label Data Points: Classify data points as risky or safe based on these thresholds.
  6. Clustering Drivers: Extract features and apply K-Means clustering to group drivers based on their behavior.
Let's start by exploring the key features and checking for missing values.
The dataset does not contain any missing values, which is great for analysis.
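For reference, a missing-value check of this kind can be sketched with pandas as below. The helper name missing_value_report and the sample values are illustrative assumptions; only the column names come from the dataset. Note that a CSV cleaned in Excel may encode gaps as empty strings or sentinel values, which isna() alone would not flag.

```python
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Put the columns with the most gaps first.
    return report.sort_values("missing", ascending=False)


# Tiny made-up frame; the column names mirror the dataset, the values do not.
sample = pd.DataFrame({
    "deviceID": [1, 1, 2, 2],
    "speed": [42.0, None, 55.0, 61.0],
    "gps_speed": [41.5, 43.0, 54.2, 60.8],
})
report = missing_value_report(sample)
print(report)
```

Running the same function on the full DataFrame gives one row per column, so a report with zeros throughout corresponds to the "no missing values" result above.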

Next Steps:

  • Visualize Data: Let's plot histograms for speed, acceleration, and braking to understand their distributions.
Let's proceed with visualizing the data.
It seems there was an error related to the kaleido package, which is required for exporting images using Plotly. Although we installed kaleido, the error persists, possibly due to an environment issue or a need to restart the kernel after installation.
To resolve this, you can try the following steps:
  1. Ensure that kaleido is installed correctly by running !pip install -U kaleido.
  2. Restart the Jupyter Notebook kernel to ensure all packages are loaded correctly.
  3. Re-run the plotting code to generate the visualizations.
Let's attempt to re-run the visualization code after confirming the installation of kaleido.
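One common cause of this symptom is that pip installed kaleido into a different Python interpreter than the one the notebook kernel is running. The sketch below uses only the standard library to report which interpreter is active and whether it can locate kaleido; the function name kaleido_status is a hypothetical helper, not part of Plotly.

```python
import importlib.util
import sys


def kaleido_status() -> dict:
    """Report the running interpreter and whether it can find kaleido."""
    spec = importlib.util.find_spec("kaleido")
    return {
        "python": sys.executable,           # interpreter used by this kernel
        "kaleido_found": spec is not None,  # True if the package is importable
        "location": spec.origin if spec is not None else None,
    }


status = kaleido_status()
print(status)
```

If kaleido_found is False even after installation, installing from inside the notebook with %pip install -U kaleido (the IPython magic that targets the kernel's own environment) and then restarting the kernel is usually worth trying.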

Vehicle Telematics (Cleaned using Excel).csv

Behavior Analysis

Define Thresholds for Driving Behaviors:
Harsh Braking: Deceleration below a certain threshold (e.g., -0.6g).
Aggressive Acceleration: Acceleration above a certain threshold (e.g., 0.5g).
Speeding: Speed exceeding a defined limit (e.g., 120 km/h in urban areas).
Label Data Points: Classify data points as risky or safe based on these thresholds.
The cleaned dataset has been successfully loaded. It contains columns such as tripID, deviceID, timeStamp, accData, gps_speed, battery, cTemp, dtc, eLoad, iat, imap, kpl, maf, rpm, speed, tAdv, tPos, x, y, z, mx, my, and mz.

Behavior Analysis

To proceed with the behavior analysis, we will:
  1. Define Thresholds for Driving Behaviors:
    • Harsh Braking: Deceleration below a certain threshold (e.g., -0.6g).
    • Aggressive Acceleration: Acceleration above a certain threshold (e.g., 0.5g).
    • Speeding: Speed exceeding a defined limit (e.g., 120 km/h in urban areas).
  2. Label Data Points: Classify data points as risky or safe based on these thresholds.
Let's start by defining these thresholds and labeling the data points.
The data points have been successfully labeled based on the defined thresholds for risky behaviors:
  • Harsh Braking: Deceleration below -0.6g.
  • Aggressive Acceleration: Acceleration above 0.5g.
  • Speeding: Speed exceeding 120 km/h.
In the sample displayed, all data points are labeled as Safe based on the current thresholds.
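A minimal sketch of this labeling step is shown below. It assumes a longitudinal acceleration column already expressed in g, here given the hypothetical name accel_g; the dataset itself exposes raw x, y, and z components, so that conversion would have to happen first. When a row meets several conditions, the first matching one in the list wins.

```python
import numpy as np
import pandas as pd


def label_behavior(df, accel_col="accel_g", speed_col="speed",
                   brake_g=-0.6, accel_limit_g=0.5, speed_limit_kmh=120):
    """Label each row with a behavior and a Risky/Safe flag."""
    conditions = [
        df[accel_col] < brake_g,          # harsh braking
        df[accel_col] > accel_limit_g,    # aggressive acceleration
        df[speed_col] > speed_limit_kmh,  # speeding
    ]
    names = ["Harsh Braking", "Aggressive Acceleration", "Speeding"]
    out = df.copy()
    # np.select picks the first condition that is True for each row.
    out["behavior"] = np.select(conditions, names, default="Safe")
    out["risk"] = np.where(out["behavior"] == "Safe", "Safe", "Risky")
    return out


# Made-up rows chosen to hit each branch once.
sample = pd.DataFrame({
    "accel_g": [-0.7, 0.1, 0.6, 0.0],
    "speed": [50, 130, 40, 60],
})
labeled = label_behavior(sample)
print(labeled)
```

Keeping the thresholds as keyword arguments makes it easy to re-run the labeling with different cut-offs if the defaults flag nothing.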

Next Steps:

  • Clustering Drivers: We can proceed to extract features and apply K-Means clustering to group drivers based on their behavior.
Would you like to proceed with clustering the drivers, or is there anything else you'd like to explore?

Clustering Drivers

Feature Extraction: Calculate the frequency of each behavior per driver.
Apply K-Means Clustering:
Input Features: Frequency of harsh braking, speeding, and aggressive acceleration.
Output: Cluster labels indicating risk levels (e.g., low, medium, high risk).
The error occurred because the columns 'Harsh Braking' and 'Aggressive Acceleration' are not present in the behavior_counts_corrected DataFrame. This indicates that these behaviors were not detected in the dataset based on the thresholds we defined.
To resolve this, we should:
  1. Verify the thresholds and ensure they are appropriate for the dataset.
  2. Recalculate the frequency of each behavior per driver, ensuring that all potential behaviors are considered.
  3. If necessary, adjust the thresholds to better capture the behaviors in the dataset.
Let's proceed by recalculating the behavior frequencies and ensuring all behaviors are included.
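One way to avoid the missing-column error regardless of the thresholds is to reindex the count table against the full list of behaviors, so that a behavior that never occurs becomes a column of zeros. The sketch below assumes a labeled DataFrame with a behavior column (a hypothetical name) and uses deviceID as the driver identifier.

```python
import pandas as pd

BEHAVIORS = ["Harsh Braking", "Aggressive Acceleration", "Speeding"]


def behavior_counts(labeled, driver_col="deviceID", label_col="behavior"):
    """Count each risky behavior per driver, keeping absent behaviors as 0."""
    counts = pd.crosstab(labeled[driver_col], labeled[label_col])
    # reindex adds any behavior that never occurred and drops the "Safe" column.
    return counts.reindex(columns=BEHAVIORS, fill_value=0)


# Made-up labels in which only "Speeding" ever occurs.
labeled = pd.DataFrame({
    "deviceID": [1, 1, 2, 2, 2],
    "behavior": ["Safe", "Speeding", "Safe", "Safe", "Speeding"],
})
counts = behavior_counts(labeled)
print(counts)
```

With this shape guaranteed, downstream code can select all three behavior columns without first checking which ones exist.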

Can you adjust the thresholds to better capture behaviors?

To better capture the driving behaviors, we can adjust the thresholds as follows:
  • Harsh Braking: Deceleration below a threshold of -0.3g.
  • Aggressive Acceleration: Acceleration above a threshold of 0.3g.
  • Speeding: Speed exceeding 100 km/h.
Let's apply these adjusted thresholds and recalculate the behavior frequencies.
The adjusted thresholds have been applied, and the data points have been re-labeled based on these new thresholds:
  • Harsh Braking: Deceleration below -0.3g.
  • Aggressive Acceleration: Acceleration above 0.3g.
  • Speeding: Speed exceeding 100 km/h.
The frequency of each behavior per driver has been calculated, and K-Means clustering has been applied to classify drivers into risk levels. The cluster labels indicate the risk level for each driver:
  • Cluster 0: Low risk
  • Cluster 1: Medium risk
  • Cluster 2: High risk
The table above shows the frequency of each behavior and the assigned risk level for a sample of drivers.
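K-Means assigns cluster numbers arbitrarily, so cluster 0 is not guaranteed to be the lowest-risk group. The sketch below shows one possible way to make the mapping explicit: cluster the per-driver counts with scikit-learn, then rank the clusters by their centroid so that the names Low, Medium, and High follow the behavior frequencies. The function name, the ranking rule, and the sample counts are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_risk(counts: pd.DataFrame, seed: int = 0) -> pd.Series:
    """Cluster drivers into three groups and name them by centroid size."""
    scaled = StandardScaler().fit_transform(counts.to_numpy(dtype=float))
    km = KMeans(n_clusters=3, n_init=10, random_state=seed).fit(scaled)
    # Rank clusters by the mean of their scaled centroid: lowest -> "Low".
    order = np.argsort(km.cluster_centers_.mean(axis=1))
    names = {int(c): name for c, name in zip(order, ["Low", "Medium", "High"])}
    return pd.Series([names[int(c)] for c in km.labels_],
                     index=counts.index, name="risk_level")


# Made-up counts for nine drivers forming three obvious groups.
counts = pd.DataFrame(
    {
        "Harsh Braking": [0, 1, 0, 10, 11, 9, 40, 42, 41],
        "Aggressive Acceleration": [1, 0, 1, 12, 11, 13, 45, 44, 46],
        "Speeding": [0, 0, 1, 11, 10, 12, 50, 49, 51],
    },
    index=pd.Index(range(1, 10), name="deviceID"),
)
risk = cluster_risk(counts)
print(risk)
```

Raw counts favor drivers with more recorded rows, so dividing each count by the driver's total number of observations before clustering may give a fairer comparison.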
Would you like to visualize the clustering results or explore further analysis?

Can you visualize the clustering results?

The error occurred because the Plotly visualization engine requires the kaleido package for image export, and it seems there was an issue with its installation or usage. The error message suggests that the package is not available, even though it was installed.
To resolve this, please ensure that the kaleido package is correctly installed and accessible in your Python environment. You might want to restart the kernel or your Python environment to ensure that all packages are properly loaded.
Let's try to visualize the clustering results again after confirming the installation of kaleido.

Analyze Transitions

Sequential Modeling: Use models like Long Short-Term Memory (LSTM) networks to analyze transitions between driving behaviors (e.g., from defensive to aggressive driving).
Input: Time-series data for speed, acceleration, and braking.
The time-series data has been prepared for sequential modeling using Long Short-Term Memory (LSTM) networks. Here's a summary of the steps taken:
  1. Feature Selection: We selected relevant features for modeling: speed, x, y, and z (acceleration components).
  2. Data Preprocessing:
    • Converted timeStamp to datetime format and sorted the data to ensure sequential order.
    • Normalized the features using MinMaxScaler to ensure they are within the range [0, 1].
  3. Data Preparation:
    • Created a dataset with a look-back period of 10 time steps, which means each input sample to the LSTM will consist of 10 consecutive time steps.
    • Reshaped the data to be suitable for LSTM input, resulting in a shape of (1048553, 10, 4) for the input features and (1048553,) for the target variable.
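The windowing described above can be sketched with NumPy as follows. The scaling function is a plain NumPy stand-in that has the same effect as MinMaxScaler with its default [0, 1] range, and using the next-step value of the first feature column as the target is an assumption, since the summary does not say what the target variable is.

```python
import numpy as np


def min_max_scale(values: np.ndarray) -> np.ndarray:
    """Scale each column to [0, 1]; constant columns map to 0."""
    lo = values.min(axis=0)
    span = values.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)  # avoid division by zero
    return (values - lo) / span


def make_windows(features: np.ndarray, look_back: int = 10, target_col: int = 0):
    """Build (samples, look_back, n_features) inputs and next-step targets."""
    n_samples = len(features) - look_back
    if n_samples <= 0:
        raise ValueError("need more rows than look_back")
    X = np.stack([features[i:i + look_back] for i in range(n_samples)])
    y = features[look_back:, target_col]  # value right after each window
    return X, y


# 25 made-up rows with 4 features (standing in for speed, x, y, z).
rng = np.random.default_rng(0)
scaled = min_max_scale(rng.normal(size=(25, 4)))
X, y = make_windows(scaled, look_back=10)
print(X.shape, y.shape)
```

With this construction, N rows and a look-back of 10 yield N - 10 samples. On multi-trip data it is probably worth building windows per tripID so that no window spans two different trips.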
Next, we can proceed to build and train an LSTM model to analyze transitions between driving behaviors. Would you like to proceed with training the LSTM model?

Visualize the LSTM model predictions.

Proceed with training the LSTM model.

Validate Results

Evaluate Clusters and Transitions:
Plot risk levels over time.
Compare with known risk patterns or events.
The plot above shows the predicted risk levels over time. The risk levels are categorized into 'Low', 'Medium', and 'High'. This visualization helps in understanding how the risk levels transition over time based on the driving behaviors captured in the dataset.
To further validate the results, you can compare these predicted risk levels with known risk patterns or events from the dataset or external sources. This comparison can help in assessing the accuracy and reliability of the LSTM model predictions.
If you have any specific known risk patterns or events, we can overlay them on this plot for a more detailed comparison. Would you like to proceed with any specific comparisons or analyses?

Visualize Results

Create Visualizations:
Display the distribution of risk levels across drivers.
Illustrate time-series transitions between driving behaviors.
It seems there is an issue with displaying the Plotly visualizations due to a missing or improperly installed package called kaleido. Plotly uses this package for static image export (for example fig.write_image() or fig.to_image()). An interactive fig.show() does not normally need it, so the error most likely arises because this environment renders figures as static images.
To resolve this issue, please ensure that kaleido is installed correctly. You can do this by running the following command in your environment: !pip install -U kaleido
After installing kaleido, restart the kernel and run the visualization code again. If the problem persists, ensure that your Python environment is correctly configured to use the installed packages.
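Until the installation is sorted out, a fallback that does not depend on kaleido can keep the analysis moving. The sketch below tries a static PNG export and, if that fails, writes a standalone HTML file instead. save_figure is a hypothetical helper; it relies only on the write_image and write_html methods that Plotly figures provide. The broad except clause is deliberate, since the exact exception raised when kaleido is unavailable may differ between Plotly versions.

```python
from pathlib import Path


def save_figure(fig, stem, out_dir="."):
    """Save a figure as PNG if possible, otherwise as interactive HTML."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    png_path = out / f"{stem}.png"
    try:
        fig.write_image(str(png_path))  # static export, needs kaleido
        return png_path
    except Exception as exc:  # assumption: any failure here means no static export
        html_path = out / f"{stem}.html"
        fig.write_html(str(html_path))  # works without kaleido
        print(f"Static export failed ({exc}); wrote {html_path} instead.")
        return html_path
```

For example, with plotly.express imported as px, save_figure(px.histogram(df, x="speed", nbins=50), "speed_hist") would produce either speed_hist.png or speed_hist.html, and the HTML file can be opened in any browser.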

Can you show the time-series transitions between driving behaviors?
