This work has been partially supported by the Spoke 9 “Digital Society & Smart Cities” of ICSC – Centro Nazionale di Ricerca in High Performance Computing, Big Data and Quantum Computing, funded by the European Union – NextGenerationEU (PNRR-HPC, CUP: E63C22000980007).
sustainability
aggressive driving can increase fuel consumption and CO₂ emissions by up to 40% [McConky et al., 2018; Xu et al., 2017].
green driving improves efficiency and reduces emissions [Zhou et al., 2016].
safety
aggressive driving is correlated with higher crash risk [Adavikottu & Velaga, 2021].
structured green driving programs have achieved up to 10% fuel savings and a 33% reduction in property-damage accidents [Nævestad, 2022].
Promoting and incentivizing green driving for sustainability and safety, while delivering clear benefits to insurers, fleet operators, and policymakers.
It is a project aimed at developing an eco-scoring system to
The ECOSCORING project stems from a partnership between Intesa Sanpaolo Insurance and the University of Naples Federico II.
Customers of the insurance company have telematics devices (black boxes) installed in their cars.
The insurance company provided access to the black-box data of 100,000 customers over 6 months (approx. 600 million records).
features recorded by the boxes:
date and timestamp of the recording
latitude and longitude
speed and distance traveled between two recordings
the boxes record data every 2 minutes during trips
customer-related features held by the insurance company:
car characteristics (e.g., fuel type, engine size, power, segment, registration year)
customer characteristics (e.g., age class, geographical area)
Driving behavior/style impacts emissions beyond “the more you drive, the more you pollute”.
Eco-driving features (e.g., smooth acceleration, steady speed) can reduce emissions: they are not directly observed in the boxes data.
a two-minute gap between recordings is too coarse to capture them.
detect aggressive driving patterns from the boxes data using API services (e.g., Google Maps or TomTom)
for each road segment (between two recordings), get the expected travel time at the same time of day and day of the week at which it was driven.
compare expected vs actual time → infer if the driver was aggressive or not.
what’s good: the expected time takes into account contextual factors (e.g., traffic, road type).
what’s bad:
free access to these services is limited (e.g., 2500 requests/day for Google Maps).
too expensive for large-scale applications.
use free services? e.g., OpenStreetMap:
it does not provide travel time estimates
the number of requests per minute and/or per day is still limited: too slow for large-scale applications.
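The expected-versus-actual comparison described above can be sketched as follows. This is only an illustration: the `Segment` class, the `is_aggressive` helper, and the 0.8 ratio threshold are hypothetical and not taken from the project, and the expected times are assumed to have already been obtained from a routing service.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    actual_s: float    # observed travel time between two recordings (seconds)
    expected_s: float  # expected travel time from a routing service (seconds)


def is_aggressive(seg: Segment, ratio_threshold: float = 0.8) -> bool:
    """Flag a segment driven much faster than expected.

    ratio_threshold is a hypothetical cut-off chosen for illustration only.
    """
    if seg.expected_s <= 0:
        raise ValueError("expected travel time must be positive")
    return seg.actual_s / seg.expected_s < ratio_threshold


# Toy segments: only the second one is covered far faster than expected.
segments = [Segment(120, 125), Segment(120, 170), Segment(120, 100)]
flags = [is_aggressive(s) for s in segments]
print(flags)
```

Every segment needs one routing request, which is why the request quotas listed above become the bottleneck at the scale of hundreds of millions of records.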
estimate emissions directly by enriching the boxes data with contextual information (e.g., road slope, road type, time of day/day of the week).
use a state-of-the-art emission model (pre-trained) to generate training data.
MOVES stands for Motor Vehicle Emission Simulator:
MOVES is the state of the art for emission modeling from telematics data [Koupal et al., 2003].
it is developed and continually updated by the U.S. Environmental Protection Agency (EPA), with extensive empirical validation against laboratory testing, roadside monitoring, and fuel consumption studies.
It provides legally defensible estimates [Park et al., 2016].
The MOVES framework is rigid and computationally intensive.
Impractical for real-time eco-scoring assessment.
Derivative tools (e.g., MOVEStar, [Wang et al., 2020]) improve usability but lose accuracy.
Surrogate models trained on MOVES outputs, combining telematics with contextual data, have been proposed in the literature [Xu et al., 2021; Chen et al., 2020].
Among them, NeuralMOVES shows desirable features [Ramirez-Sanchez et al., 2025].
A lightweight neural-network surrogate of the EPA’s MOVES model.
NeuralMOVES is trained on millions of MOVES-generated scenarios
it essentially learns to emulate MOVES.
the discrepancies between NeuralMOVES and MOVES predictions are limited for most pollutants and driving conditions.
the pre-trained NeuralMOVES model is publicly available
use the pre-trained NeuralMOVES model to estimate emissions from the boxes data.
input features:
Vehicle dynamics: speed, acceleration.
Vehicle characteristics: type, age, fuel type.
Environmental context: road grade, temperature, humidity, traffic.
to use NeuralMOVES for emission estimation, data pre-processing and enrichment are needed
NeuralMOVES required arguments
road slope
air temperature and humidity
acceleration
none of them is available in the boxes data.
Enrichment process
slope: the altitude is retrieved from latitude/longitude for each pair of recordings, and the slope is then derived via basic geometry (altitude difference over distance).
temperature and humidity: from a MeteoWeb.it dataset, each record is assigned the temperature and humidity of the nearest province (by straight-line distance) for the month of the recording.
acceleration: \(\frac{\mathrm{Speed}_2 - \mathrm{Speed}_1}{\Delta_{time}}\) between two successive records in a trip.
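A minimal sketch of the three enrichment steps, under stated assumptions: altitudes have already been retrieved for each recording, the distance recorded by the box is used as the horizontal run, speeds are in km/h, and the city coordinates below are approximate stand-ins for province centroids. All function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def road_grade(alt1_m, alt2_m, run_m):
    """Slope as rise over run; 0.0 when the vehicle did not move."""
    if run_m <= 0:
        return 0.0
    return (alt2_m - alt1_m) / run_m


def acceleration_ms2(speed1_kmh, speed2_kmh, dt_s):
    """Average acceleration in m/s^2 between two successive records."""
    return (speed2_kmh - speed1_kmh) / 3.6 / dt_s


def nearest_province(lat, lon, centroids):
    """Name of the province whose centroid is closest in straight-line distance."""
    return min(centroids, key=lambda name: haversine_m(lat, lon, *centroids[name]))


# Approximate coordinates, for illustration only.
centroids = {"Napoli": (40.85, 14.27), "Roma": (41.90, 12.50), "Milano": (45.46, 9.19)}

grade = road_grade(100.0, 110.0, 1500.0)      # 10 m climb over 1.5 km
accel = acceleration_ms2(36.0, 54.0, 120.0)   # 36 -> 54 km/h in 2 minutes
province = nearest_province(40.90, 14.30, centroids)
print(grade, accel, province)
```

With a two-minute sampling interval these quantities are averages over the segment, so short bursts of acceleration or steep stretches are smoothed out.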
NeuralMOVES produces emissions per segment (every 2 minutes).
We want a surrogate model to describe the drivers of emissions at the customer level.
Target: co2_g_km (grams per km).
Features:
Driving behavior (avg speed, variability, trip structure)
Vehicle (type, power, registration year)
Driver (age, geography)
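One way to move from per-segment emissions to a customer-level target is sketched below. It assumes `co2_g_km` is total grams divided by total kilometres per customer (a distance-weighted average); the source does not state the exact aggregation, and the tuple layout and the `co2_per_km` name are hypothetical.

```python
from collections import defaultdict


def co2_per_km(segments):
    """Aggregate (customer_id, co2_grams, distance_km) tuples per customer.

    Returns grams of CO2 per km; customers with zero distance are skipped.
    """
    grams = defaultdict(float)
    km = defaultdict(float)
    for customer_id, co2_g, dist_km in segments:
        grams[customer_id] += co2_g
        km[customer_id] += dist_km
    return {cid: grams[cid] / km[cid] for cid in grams if km[cid] > 0}


# Toy per-segment outputs, not project data.
segments = [("A", 300.0, 2.0), ("A", 150.0, 1.0), ("B", 500.0, 2.5)]
target = co2_per_km(segments)
print(target)
```

Dividing totals, rather than averaging per-segment ratios, keeps very short segments from dominating the customer-level value.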
The surrogate should be accurate but above all interpretable.
ensemble tree-based models
boosting (XGBoost)
Random Forest
additive models
XGBoost: best MAE, robust for typical drivers
Random Forest: best RMSE, stable predictions
GAM: interpretable effects, but less accurate
Elastic Net: weak baseline, shows need for nonlinear models
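Different models can win on MAE and on RMSE because RMSE penalizes large errors more heavily. The toy numbers below are invented to show the mechanism and are not project results.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_true = [100.0, 100.0, 100.0, 100.0]
model_a = [100.0, 100.0, 100.0, 120.0]  # exact on most cases, one large miss
model_b = [108.0, 92.0, 108.0, 92.0]    # moderately off everywhere

print(mae(y_true, model_a), rmse(y_true, model_a))
print(mae(y_true, model_b), rmse(y_true, model_b))
```

Model A has the lower MAE (accurate for the typical case), while model B has the lower RMSE (no large misses), which mirrors the trade-off between the two ensemble models in the list above.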
variable importance: average contribution to error reduction
Driving behavior dominates → duration, speed profile, short trips.
Vehicle features matter less once behavior is accounted for.
Registration year (proxy for emissions standards) also plays a role.
The surrogate confirms that how you drive matters at least as much as what you drive.
What are SHAP values?
from game theory to ML
Game → Model prediction function 𝑓
Players → Features
Coalition → Subset of features
Payout → Model output
Gain → discrepancy between the prediction with and without the feature
The Shapley value is the average marginal contribution of a feature value across all possible coalitions.
SHAP uses sampling-based approximations to estimate the Shapley values.
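For a handful of features the Shapley values can be computed exactly by enumerating all coalitions, as in the sketch below. It makes a simplifying assumption: features outside a coalition are set to a fixed baseline value, whereas SHAP averages over background data. The function `shapley_values` and the toy model `f` are illustrative and not part of the SHAP library.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = set(subset)
                # Weight of a coalition of size k in the Shapley average.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def f(z):
    # Toy model: one additive term plus one interaction.
    return 2 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(f, x, baseline)
print(phi)
```

The additive term is credited entirely to the first feature, the interaction is split equally between the other two, and the values sum to the gap between `f(x)` and `f(baseline)`. The enumeration grows exponentially with the number of features, which is why approximations are used in practice.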
SHAP beeswarm plot
ALE plots show the average effect of a feature on the model’s predictions,
computed locally within bins of the feature range.
Unlike PDPs (Partial Dependence Plots), ALE avoids bias from extrapolations into sparse regions.
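A minimal sketch of a first-order ALE computation: within each bin, the feature is moved from the lower to the upper edge for the observations that actually fall in that bin, the prediction differences are averaged, and the averages are accumulated. The `ale_1d` function, the bin edges, and the toy model are illustrative assumptions, and centring conventions vary between implementations.

```python
def ale_1d(f, X, j, edges):
    """Centred first-order ALE of feature j, evaluated at the upper bin edges."""
    effects, counts = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        is_last = (hi == edges[-1])
        rows = [r for r in X if lo <= r[j] < hi or (is_last and r[j] == hi)]
        diffs = []
        for r in rows:
            r_lo, r_hi = list(r), list(r)
            r_lo[j], r_hi[j] = lo, hi
            diffs.append(f(r_hi) - f(r_lo))  # local effect within the bin
        effects.append(sum(diffs) / len(diffs) if diffs else 0.0)
        counts.append(len(rows))

    # Accumulate the local effects across bins.
    ale, running = [], 0.0
    for e in effects:
        running += e
        ale.append(running)

    # Centre so that the count-weighted mean effect is zero.
    mean = sum(a * c for a, c in zip(ale, counts)) / sum(counts)
    return [a - mean for a in ale]


def f(r):
    # Toy model with two features.
    return 2 * r[0] + 5 * r[1]


# The two features are strongly correlated in this toy data.
X = [[0.5, 1.0], [1.5, 2.0], [2.5, 3.0], [3.5, 4.0]]
ale = ale_1d(f, X, j=0, edges=[0.0, 1.0, 2.0, 3.0, 4.0])
print(ale)
```

Only observed combinations of the other features are used inside each bin, so the curve recovers the slope of 2 for the first feature without evaluating the model at unrealistic points.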
Interpretation:
ALE complements SHAP:
Interpretation
- Longer trips → steadily increasing emissions per km.
- Indeed, the more you drive, the more you pollute.
Interpretation
- Very low speeds (0–10 km/h) → high emissions (stop-and-go traffic).
- Around 30–40 km/h → lowest emissions (efficient steady cruising).
Interpretation
- Small variability (steady driving) → lower emissions.
- Large variability (frequent acceleration/braking) → higher emissions.
Interpretation
- Very high variation → higher emissions (stop-and-go or erratic driving).
- Moderate variation (10–20 km/h) → lowest emissions.
Interpretation
- Moderate ratio (20–50%) → lowest emissions.
- High ratio (>80%) → strong increase in emissions (frequent cold starts).
Interpretation
Emissions decrease clearly with newer registration years.
Old vehicles dominate baseline CO₂, regardless of driving style.
In large-scale applications, using a pre-trained large network (when available) is often the way to go.
Interpretability is key to gathering insights and justifying predicted values.
translate models into actionable eco-driving recommendations for each customer.
Next step (we are already working on it): integrate the recommendations into the insurer's eco-scoring system.
To assess and identify aggressive driving patterns, higher granularity data (e.g., 1-second intervals) would be ideal.
It would then be possible to correlate aggressive driving directly with both safety and sustainability.