from SHAP explanations to behavioural archetypes and targeted recommendations
Joint work with A. Mancuso, R. Simone and F. Palumbo · Seminar at DSS, La Sapienza University of Rome
2026-03-06
Sustainability
Aggressive driving can increase fuel consumption and CO₂ emissions by up to 40% (McConky, Chen, & Gavi, 2018; Xu, Li, Liu, Rodgers, & Guensler, 2017). Green driving improves efficiency and reduces emissions (Zhou, Jin, & Wang, 2016).
Safety
Aggressive driving is correlated with higher crash risk (Adavikottu & Velaga, 2021). Structured green driving programs have achieved up to 10% fuel savings and a 33% reduction in property-damage accidents (Nævestad, 2022).
How to measure, explain, and improve driving behavior at scale?
Ecoscoring project
A partnership between Intesa Sanpaolo Insurance and University of Naples Federico II
project objective
Develop an eco-scoring system to:
Our focus in this work
design an interpretable emission recommendation framework that:
This system is complementary to the eco-score and enables actionable intervention.
from raw telematics to actionable recommendations
Data reconstruction and enrichment derive acceleration, slope, and environmental context from 2-minute GPS data
DL-based estimation of microscopic CO₂ emission rates
Customer-level surrogate modelling
structural + behavioural + contextual drivers
Local and global model-agnostic explanations
individual drivers and global effect shapes
Behavioural archetypes
segmentation of explanation patterns → targeted recommendations
Boxes Data
customers data (provided by the insurance company)
main structural limitation
Two-minute resolution is too coarse to directly observe microscopic driving dynamics (acceleration, harsh braking, slope effects).
The method
External map APIs (OpenStreetMap / Google Maps)
Why it’s attractive
The bottleneck
At our scale, continuous API calls are too slow (OpenStreetMap) or too expensive (Google Maps).
Free option
Need for plan B
External APIs do not scale to moderately high-frequency telematics.
Key point
Emissions are driven by vehicle dynamics and resistive forces, not only distance.
Vehicle Specific Power (VSP)
\[VSP_t = \frac{A v_t + B v_t^2 + C v_t^3 + m v_t (a_t + g \sin\theta)}{m}\]
A: rolling resistance; B: rotational resistance; C: aerodynamic drag.
\(m\) is the vehicle mass and \(g\) is the gravitational acceleration;
We needed: speed (\(v_t\)), acceleration (\(a_t\)), road grade (\(\theta\))
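The VSP formula can be evaluated directly once the road-load coefficients are known. A minimal sketch; the coefficient values in the example call are hypothetical placeholders, not fitted values from this project:

```python
import math

def vsp(v, a, theta, m, A, B, C, g=9.81):
    """Vehicle Specific Power (W/kg) from speed v (m/s), acceleration a (m/s^2),
    road grade theta (rad), vehicle mass m (kg) and road-load coefficients A, B, C."""
    return (A * v + B * v**2 + C * v**3 + m * v * (a + g * math.sin(theta))) / m

# Illustrative call with placeholder coefficients: 20 m/s, mild acceleration, flat road
power = vsp(v=20.0, a=0.5, theta=0.0, m=1500.0, A=150.0, B=2.0, C=0.4)
```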
Data gap
Boxes ping every 2 minutes → microscopic inputs must be reconstructed.
a validated physical model
The gold standard is MOVES
(Motor Vehicle Emission Simulator, US EPA) (Koupal, Cumberworth, Michaels, & Beardsley, 2003; Park, Lee, & Lee, 2016).
MOVES pros
MOVES cons
MOVES is designed for macroscopic studies (cities, regions)
Heavy database infrastructure
13+ complex input files
evaluating a single 10-minute trip: 2–5 minutes of processing
for hundreds of thousands of trips: computationally infeasible
(Ramirez-Sanchez et al., 2026)
Why NeuralMOVES
Trained on millions of MOVES-generated scenarios
Tiny (MB-scale package)
Millisecond-level evaluation
highly accurate
NeuralMOVES is a set of specialized neural networks, each trained on a vehicle-specific subset of the MOVES simulation data.
Architecture
Inputs: speed, acceleration, grade
Context: temperature, humidity
2 hidden layers
5 neurons each
Activation: hyperbolic tangent
rationale
Separate models by vehicle type
Avoid unrealistic mixing
Fast execution
Smooth, differentiable output
Training
70/30 train/test
Early stopping (~300 epochs)
avoid negative emissions:
\[ \text{NeuralMOVES}(x) = \max\{\hat{e}(x), e_{idle}\} \]
NeuralMOVES required inputs
reconstruct from boxes data:
road slope
air temperature & humidity
acceleration
Enrichment process
feed the neural network
Speed Conversion: From km/h to m/s, capped at highway extremes: \[v_k = \max(0, \min(speed_k \times \frac{1000}{3600}, 33))\]
Forward Acceleration: \[a_k = \frac{v_{k+1} - v_k}{t_{k+1} - t_k}\] (clipped to \([-4.5, 3]\) \(m/s^2\))
Road Grade (Slope): Derived from elevation differences and displacement \(\Delta pos\): \[grade_k = \arcsin\left(\frac{elevation_{k+1} - elevation_k}{\Delta pos_k}\right) \times \frac{180}{\pi}\] (capped at \(\pm 25^{\circ}\))
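The three reconstruction steps can be sketched in a few lines of NumPy; the array layout (one entry per ping, cumulative displacement in metres) is an assumption for illustration:

```python
import numpy as np

def enrich(speed_kmh, t, elevation, pos):
    """Reconstruct NeuralMOVES inputs from consecutive box pings.
    t in seconds, elevation in metres, pos = cumulative displacement in metres."""
    # Speed: km/h -> m/s, capped at highway extremes [0, 33] m/s
    v = np.clip(np.asarray(speed_kmh) * 1000.0 / 3600.0, 0.0, 33.0)
    # Forward acceleration between consecutive pings, clipped to [-4.5, 3] m/s^2
    a = np.clip(np.diff(v) / np.diff(t), -4.5, 3.0)
    # Road grade in degrees from elevation rise over displacement, capped at +/-25
    ratio = np.clip(np.diff(elevation) / np.diff(pos), -1.0, 1.0)
    grade = np.clip(np.degrees(np.arcsin(ratio)), -25.0, 25.0)
    return v, a, grade
```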
Locate the nearest province
We extract the coordinates for every Italian province.
Every single GPS ping is mapped to its nearest_prov in milliseconds.
weather matrix
extract the month from the ping’s timestamp.
join the record with the historical monthly average temperature (\(T\)) and humidity (\(H\)) of that specific province.
Time Integration
NeuralMOVES outputs an instantaneous emission rate (\(e_i\) in grams-per-second)
For each GPS point \(i\), compute the delta time to the next point: \[\Delta t_i = t_{i+1} - t_i\]
Then, calculate the absolute emissions for that specific segment: \[E_i = e_i \times \Delta t_i\]
Finally, the total trip emissions are \[E_{trip} = \sum_{i} (e_i \times \Delta t_i)\]
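The time-integration step amounts to a weighted sum; a minimal sketch, assuming each rate is held constant until the next ping:

```python
import numpy as np

def trip_emissions(e_rates, timestamps):
    """Total trip CO2 (grams) from instantaneous rates e_i (g/s) at GPS times t_i (s).
    Each rate is held constant over its segment: E_trip = sum_i e_i * (t_{i+1} - t_i)."""
    e = np.asarray(e_rates, dtype=float)
    t = np.asarray(timestamps, dtype=float)
    dt = np.diff(t)                      # seconds between consecutive pings
    return float(np.sum(e[:-1] * dt))    # last ping has no following segment

# e.g. three pings 120 s apart: 2 g/s then 3 g/s -> 2*120 + 3*120 = 600 g
total = trip_emissions([2.0, 3.0, 2.5], [0, 120, 240])
```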
We transform static 2-minute snapshots into a continuous, physics-backed emission profile for each trip.
what’s next?
customer-level understanding
key emission drivers
actionable guidance
Learning task
model NeuralMOVES-estimated emissions as a function of structural, behavioural, and contextual features.
setup
observations: box-recording -> trip -> customer
target: log-transformed NeuralMOVES estimated emissions per km
vehicle & driver context features
driving behaviour features
baseline
Tree-based models
Takeaway
XGBoost achieves the lowest RMSE (≈18% improvement over Random Forest), while Elastic Net underfits — nonlinear behavioural effects are essential.
Linear regression (observation \(i\))
\[ \hat y_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij} \]
Each term \(\beta_j x_{ij}\) is the contribution of feature \(j\) for observation \(i\).
SHAP decomposition (any model)
\[ f(\mathbf{x}_i) = \phi_0 + \sum_{j=1}^p \phi_{ij} \]
Each \(\phi_{ij}\) is the SHAP value associated with feature \(j\) for observation \(i\).
Cooperative game theory
SHAP stands for:
SHapley Additive exPlanations
It is based on Shapley values from cooperative game theory (Shapley, 1953).
The Shapley value is a method for assigning payouts to players depending on their contribution to the total payout. Players cooperate in a coalition and receive a certain profit from this cooperation (Molnar, 2025).
Mapping:
Question:
How do we fairly distribute the prediction among features?
formal definition
Let \(N = \{1,\dots,p\}\) be the feature set.
For observation \(i\) and feature \(j\):
\[ \phi_{ij} = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|! (p-|S|-1)!}{p!} \Big[ f_{S\cup\{j\}}(\mathbf{x}_i) - f_S(\mathbf{x}_i)\Big] \] with \(f_S(\mathbf{x}_i)=\mathbb{E}[f(X)\mid X_S=x_{iS}]\).
The weight
\[ \frac{|S|!(p-|S|-1)!}{p!} \]
equals the probability that subset \(S\) precedes \(j\) under a random permutation of features.
Think of features entering the model one by one
Example with three features: \(A,B,C\).
Possible arrival orders:
\[ ABC,\; ACB,\; BAC,\; BCA,\; CAB,\; CBA \]
When feature \(B\) enters
| Order | \(S\) (features before \(B\)) |
|---|---|
| \(ABC\) | \(\{A\}\) |
| \(ACB\) | \(\{A,C\}\) |
| \(BAC\) | \(\varnothing\) |
| \(BCA\) | \(\varnothing\) |
| \(CAB\) | \(\{C,A\}\) |
| \(CBA\) | \(\{C\}\) |
Averaging the marginal contribution
The Shapley value of feature \(j\) and observation \(i\) is the average increase in prediction when \(j\) enters the model,
\[ f_{S\cup\{j\}}(\mathbf{x}_i) - f_S(\mathbf{x}_i) \]
averaged across all possible feature orders (or equivalently, across all subsets \(S\)).
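For small \(p\), the permutation view can be implemented directly. A minimal sketch, using a fixed baseline for absent features (an interventional simplification of the conditional expectation \(f_S\)); the toy model is illustrative only:

```python
from itertools import permutations

def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating all feature orders.
    Absent features are held at a fixed baseline value."""
    p = len(x)
    phi = [0.0] * p
    orders = list(permutations(range(p)))
    for order in orders:
        z = list(baseline)
        prev = f(z)
        for j in order:
            z[j] = x[j]                               # feature j "enters the model"
            cur = f(z)
            phi[j] += (cur - prev) / len(orders)      # average marginal contribution
            prev = cur
    return phi

# Toy model: additive part goes entirely to feature 0,
# the interaction 3*z1*z2 is split equally by symmetry.
f = lambda z: 2 * z[0] + 3 * z[1] * z[2]
phi = shapley_values(f, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

Efficiency holds by construction: the attributions sum to \(f(\mathbf{x}) - f(\text{baseline})\).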
Four fairness principles
Shapley values are the unique attribution method satisfying:
Efficiency
All feature contributions sum to the prediction difference.
Symmetry
If two features contribute equally in all coalitions,
they receive the same attribution.
Dummy
If a feature never changes the prediction,
its contribution is zero.
Additivity (Linearity)
For additive models, explanations combine consistently across components.
baseline prediction plus a sum of feature contributions
Because of the efficiency property, every prediction can be written as
\[ f(\mathbf{x}_i) = \mathbb{E}[f(X)] + \sum_{j=1}^p \phi_{ij} \]
Important
This decomposition allows us to attribute predictions to individual features, even for complex models.
Structure of gradient boosting
An XGBoost model is additive:
\[ f(\mathbf{x}_i) = f_0 + \sum_{t=1}^{T} \lambda \, f_t(\mathbf{x}_i), \]
where:
Each tree is fitted to the pseudo-residuals.
SHAP and boosting
Shapley values are linear:
\(\phi_{ij}(f + g) = \phi_{ij}(f) + \phi_{ij}(g), \qquad \phi_{ij}(a f) = a \, \phi_{ij}(f).\)
Therefore,
\[ \phi_{ij}(f) = \sum_{t=1}^{T} \lambda \, \phi_{ij}\!\left(f_t\right). \]
SHAP values can be computed tree-by-tree and then aggregated.
A regression tree:
For a fixed observation \(\mathbf{x}_i\):
Key simplification
Shapley values require averaging over all feature subsets.
But in a tree:
TreeSHAP computational efficiency
So SHAP values for a tree can be computed in polynomial time.
What drives predicted emissions?
Takeaway
Emissions are not explained only by vehicle structure: they are also shaped by how people drive.
What drives predicted emissions?
by-feature effect curves
SHAP tells us which variables matter (and in which direction) for each customer.
To understand how a variable changes predictions across its range, we use effect curves.
Partial Dependence (PDP) (Friedman, 2001)
For a feature \(X_j\):
\[ \mathrm{PDP}_j(x) = \mathbb{E}_{X_{-j}}\!\left[f(x, X_{-j})\right] \;\approx\; \frac{1}{n}\sum_{i=1}^n f(x, x_{i,-j}). \]
Interpretation: average prediction when we “set” \(X_j=x\) and keep the others as observed.
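The PDP estimator is one loop over the grid; a minimal NumPy sketch with a toy model (names and data are illustrative):

```python
import numpy as np

def pdp(f, X, j, grid):
    """Partial dependence of feature j: force column j to each grid value
    and average the model's predictions over the observed data."""
    curve = []
    for val in grid:
        Xg = X.copy()
        Xg[:, j] = val          # "set" X_j = val for every observation
        curve.append(f(Xg).mean())
    return np.array(curve)

# toy check: a model linear in feature 0 yields a straight PDP with the same slope
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
f = lambda X: 2 * X[:, 0] + X[:, 1] ** 2
curve = pdp(f, X, 0, np.array([-1.0, 0.0, 1.0]))
```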
The issue: correlated predictors
If \(X_j\) is correlated with other features, PDP averages over unrealistic combinations:
\((X_j=x,\;X_{-j}=x_{i,-j})\) may have near-zero probability in the real data.
So PDP can create effects driven by extrapolation rather than the model.
Accumulated Local Effects (ALE) (Apley & Zhu, 2020)
ALE computes local prediction differences inside the data support and then accumulates them.
Result: a global effect curve that is much more robust under correlation.
definition
Split the range of \(X_j\) into intervals \([z_{k-1}, z_k]\) (often quantiles).
For each interval, compute the average local change:
\[ \Delta_k = \frac{1}{n_k}\sum_{i: x_{ij}\in[z_{k-1},z_k]} \Big[ f(z_k, {\bf x}_{i,-j}) - f(z_{k-1}, {\bf x}_{i,-j}) \Big]. \] where \({\bf x}_{i,-j}\) is the vector of observed values of the \(i^{th}\) observation for all features other than \(j\).
Then ALE is the cumulative sum up to \(x\), centered to have mean 0.
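A minimal first-order ALE sketch under the definition above; the centring here subtracts the plain mean of the accumulated curve, a simplification of the count-weighted centring:

```python
import numpy as np

def ale(f, X, j, n_bins=10):
    """First-order ALE for feature j: average local prediction differences
    inside quantile bins, then accumulate and centre to mean zero."""
    x = X[:, j]
    z = np.quantile(x, np.linspace(0, 1, n_bins + 1))        # interval edges
    bins = np.clip(np.searchsorted(z[1:-1], x), 0, n_bins - 1)
    deltas = np.zeros(n_bins)
    for k in range(n_bins):
        mask = bins == k
        if mask.any():
            lo, hi = X[mask].copy(), X[mask].copy()
            lo[:, j], hi[:, j] = z[k], z[k + 1]              # move only feature j
            deltas[k] = (f(hi) - f(lo)).mean()               # average local change
    curve = np.concatenate([[0.0], np.cumsum(deltas)])       # accumulate
    return z, curve - curve.mean()

# toy check: for a model linear in feature 0, ALE recovers the slope
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(400, 3))
f = lambda X: 3 * X[:, 0] + np.sin(X[:, 1])
z, curve = ale(f, X, 0)
```

Because only feature \(j\) is moved inside each bin, the \(\sin\) term cancels exactly and the accumulated curve has slope 3.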
takeaway
Longer average trip length is associated with substantially lower emissions per km, with the strongest inefficiency concentrated in very short trips.
takeaway
Average trip duration shows a nonlinear positive effect, with emissions per km increasing markedly beyond roughly 30 minutes.
takeaway
Median trip speed has a comparatively moderate and nonlinear impact, with slightly higher emissions at sustained higher speeds.
From explanations to behavioural profiles
SHAP provides individual-level explanations.
For each customer \(i\) we obtain a SHAP vector
\[ \boldsymbol{\phi}_i = (\phi_{i1}, \dots, \phi_{ip}) \]
describing how behavioural variables contribute to predicted emissions.
The scalability problem
With 100,000 customers, SHAP yields 100,000 explanations.
Insurers cannot design 100,000 personalized interventions.
Instead, we need behavioural profiles that summarize recurring emission-driving patterns.
Idea
Cluster explanations, not customers.
We apply Archetypal Analysis to the SHAP vectors.
Data representation
Each customer is described by a SHAP explanation vector \(\boldsymbol{\phi}_i = (\phi_{i1},\dots,\phi_{ip})\)
Collecting them yields the SHAP matrix
\[ \boldsymbol{\Phi} = \begin{bmatrix} \boldsymbol{\phi}_1 \\ \vdots \\ \boldsymbol{\phi}_n \end{bmatrix} \in \mathbb{R}^{n\times p}. \]
Archetypal decomposition (Cutler & Breiman, 1994)
Archetypal Analysis approximates
\[ \boldsymbol{\Phi} \approx \mathbf{A}\mathbf{B}\boldsymbol{\Phi} \]
where
A ∈ ℝ^{n×k} : expresses each customer as a mixture of archetypes
B ∈ ℝ^{k×n} : expresses each archetype as a mixture of customers
Rows of both matrices are non-negative and sum to one.
Archetypes
Archetypes are the rows of
\[ \mathbf{Z} = \mathbf{B}\boldsymbol{\Phi} \]
Each archetype is therefore a convex combination of SHAP vectors:
\[ \mathbf{z}_h = \sum_{\ell=1}^{n} B_{h\ell}\boldsymbol{\phi}_\ell \]
interpretation
Archetypes represent extreme emission mechanisms,
i.e. extreme patterns of feature contributions to predicted emissions.
Each customer is expressed as a convex combination of archetypes
\[ \boldsymbol{\phi}_i \approx \sum_{h=1}^{k} A_{ih}\mathbf{z}_h \]
→ enabling behavioural segmentation and targeted recommendations.
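The algebra of the decomposition can be checked on toy data. In practice A and B are fitted (e.g. by alternating constrained least squares, following Cutler & Breiman); the row-stochastic matrices below are random placeholders that only illustrate the structure:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 100, 5, 3                       # customers, features, archetypes

Phi = rng.normal(size=(n, p))             # toy SHAP matrix
# Row-stochastic weights (rows non-negative, summing to one);
# in practice these come from the AA optimisation, here they are random.
A = rng.dirichlet(np.ones(k), size=n)     # customer -> archetype mixture weights
B = rng.dirichlet(np.ones(n), size=k)     # archetype -> customer mixture weights

Z = B @ Phi                               # archetypes: convex combinations of SHAP vectors
Phi_hat = A @ Z                           # each customer as a convex mixture of archetypes
```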
Clustering vs archetypes
K-means
Archetypal Analysis
Takeaway
Emissions are primarily driven by structural vehicle factors (vehicle age), with behavioural effects playing a secondary role.
Takeaway
Sustained trip duration is the dominant emission mechanism, pushing predicted CO₂ intensity substantially above average.
Takeaway
Very short average trip length is the main emission driver, reflecting a classic short-trip inefficiency regime.
Takeaway
Long average trip length offsets duration effects, resulting in comparatively lower emissions per km.
Mechanism
Predicted emissions are mainly driven by structural vehicle characteristics
(e.g., vehicle age and technology).
Behavioural recommendations
Limited behavioural leverage; focus on vehicle efficiency improvements:
Insurer action
Target vehicle upgrade incentives or eco-bonus programs rather than behavioural coaching.
Mechanism
Long sustained trip duration drives emissions upward.
Behavioural recommendations
Insurer action
Introduce long-trip coaching through periodic feedback and driving goals.
Mechanism
Frequent short trips create inefficiency due to cold starts and incomplete engine warm-up.
Behavioural recommendations
Insurer action
Introduce a short-trip reduction challenge rewarding reductions in short-trip frequency.
Mechanism
Long trips tend to be efficient per km; the main risk comes from sustained high speeds.
Behavioural recommendations
Insurer action
Provide lightweight feedback and eco-driving recognition incentives.
Important
Instead of producing 100,000 individual recommendations,
we design archetype-specific guidance.
This enables:
What we did
Take-home messages
Outlook
Scan to access the slides
https://alfonsoiodicede.github.io/seminar_sapienza_march_26.html