Drive green, drive safe

Sustainability

Safety

  • Aggressive driving is correlated with higher crash risk (Adavikottu & Velaga, 2021).
  • Structured green driving programs have achieved up to 10% fuel savings and a 33% reduction in property-damage accidents (Nævestad, 2022).

How can we measure, explain, and improve driving behaviour at scale?

Green-driving assessment

Ecoscoring project

A partnership between Intesa Sanpaolo Insurance and University of Naples Federico II

project objective

Develop an eco-scoring system to:

  • assess driving behavior
  • incentivize eco-friendly practices
  • provide insurance premium discounts

Our focus in this work

design an interpretable emission recommendation framework that:

  • explains the drivers of emissions
  • identifies behavioural archetypes
  • generates targeted eco-driving recommendations

This system is complementary to the eco-score and enables actionable intervention.

Roadmap

from raw telematics to actionable recommendations

  1. Data reconstruction and enrichment

    derive acceleration, slope, and environmental context from 2-minute GPS data

  2. DL-based estimation of microscopic CO₂ emission rates

  3. Customer-level surrogate modelling

    structural + behavioural + contextual drivers

  4. Local and global model-agnostic explanations

    individual drivers and global effect shapes

  5. Behavioural archetypes

    segmentation of explanation patterns → targeted recommendations

Data reconstruction and enrichment

Data sources

Boxes Data

Telematics devices

  • 100,000 customers
  • 6 months
  • 600 million records

Recorded features

  • timestamp
  • latitude & longitude
  • speed
  • distance between recordings
  • data collected every 2 minutes

customers data (provided by the insurance company)

  • vehicle characteristics (fuel type, power, segment, registration year)
  • customer characteristics (age class, geographical area)

main structural limitation

Two-minute resolution is too coarse to directly observe microscopic driving dynamics (acceleration, harsh braking, slope effects).

First idea: contextual driving style via routing APIs

The method

  • start from black-box GPS + speed
  • query routing services (OpenStreetMap / Google Maps)
  • reconstruct context (traffic, road type, expected travel time)
  • compare expected vs actual → infer “aggressiveness”

Why it’s attractive

  • context-aware (traffic + road class)
  • behaviour measured relative to surroundings
  • intuitive for insurers and policymakers

The bottleneck

At our scale, continuous API calls are too slow (OpenStreetMap) or too expensive (Google Maps).

The API wall (reality check)

Free option

  • ~600,000,000 records
  • even at 1 request/second: \[600{,}000{,}000 \text{ seconds} \approx \mathbf{19 \text{ years}}\]

Need for plan B

External APIs do not scale to moderately high-frequency telematics.

DL-based estimation of microscopic CO₂ emission rates

Plan B: estimate emissions from vehicle physics

Key point

Emissions are driven by vehicle dynamics and resistive forces, not only distance.

Vehicle Specific Power (VSP)

\[VSP_t = \frac{A v_t + B v_t^2 + C v_t^3 + m v_t (a_t + g \sin\theta)}{m}\]

  • A: rolling resistance; B: rotation resistance; C: aerodynamic drag.

  • \(m\) is the vehicle mass and \(g\) is the gravitational acceleration.

We needed: speed (\(v_t\)), acceleration (\(a_t\)), road grade (\(\theta\))
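As a sketch, VSP can be computed directly from the formula above; the road-load coefficients \(A, B, C\) and the mass below are illustrative placeholders, not the values used in the project:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def vsp(v, a, grade_rad, A=0.156, B=0.002, C=0.00049, m=1500.0):
    """Vehicle Specific Power at one instant.

    v: speed (m/s), a: acceleration (m/s^2), grade_rad: road grade (radians).
    A, B, C: illustrative road-load coefficients; m: vehicle mass (kg).
    """
    resistive = A * v + B * v**2 + C * v**3
    inertial = m * v * (a + G * math.sin(grade_rad))
    return (resistive + inertial) / m
```

VSP is zero at standstill and grows with speed, acceleration, and uphill grade.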

Data gap

Boxes ping every 2 minutes → microscopic inputs must be reconstructed.

Plan B: physics-based emission models

a validated physical model

The gold standard is MOVES
(Motor Vehicle Emission Simulator, US EPA). (Koupal, Cumberworth, Michaels, & Beardsley, 2003; Park, Lee, & Lee, 2016)

MOVES pros

  • Regulatory model
  • Scientifically validated
  • Physics-based
  • Legally defensible

Why not MOVES?

MOVES cons

  • MOVES is designed for macroscopic studies (cities, regions)

  • Heavy database infrastructure

  • 13+ complex input files

  • evaluating a single 10-minute trip: 2–5 minutes of processing

  • for hundreds of thousands of trips: computationally infeasible

NeuralMOVES: scalable surrogate of MOVES

(Ramirez-Sanchez et al., 2026)

Why NeuralMOVES

  • Trained on millions of MOVES-generated scenarios

  • Tiny (MB-scale package)

  • Millisecond-level evaluation

  • highly accurate

NeuralMOVES architecture

NeuralMOVES is a set of specialized neural networks trained on vehicle-specific subsets of the MOVES simulation data.

Architecture

  • Inputs: speed, acceleration, grade

  • Context: temperature, humidity

  • 2 hidden layers

  • 5 neurons each

  • Activation: hyperbolic tangent

rationale

  • Separate models by vehicle type

  • Avoid unrealistic mixing

  • Fast execution

  • Smooth, differentiable output

pre-processing and enrichment

Training

  • 70/30 train/test

  • Early stopping (~300 epochs)

to avoid negative emission estimates, predictions are floored at the idle emission rate \(e_{idle}\):

\[ \text{NeuralMOVES}(x) = \max\{\hat{e}(x), e_{idle}\} \]

NeuralMOVES required inputs

reconstruct from boxes data:

  • road slope

  • air temperature & humidity

  • acceleration

Enrichment process

  • slope: Computed from altitude changes along the GPS path.
  • weather: External meteorological dataset (MeteoWeb.it); assigned by nearest province and month.
  • acceleration: \(\frac{Speed_{t+1} - Speed_t}{\Delta t}\) between consecutive records.

diving deeper: kinematics & topology

feed the neural network

  • Speed Conversion: From km/h to meters/second, capped at highway extremes: \[v_k = \max(0, \min(speed_k \times \frac{1000}{3600}, 33))\]

  • Forward Acceleration: \[a_k = \frac{v_{k+1} - v_k}{t_{k+1} - t_k}\] (Clipped safely between -4.5 and 3 \(m/s^2\))

  • Road Grade (Slope): Derived from elevation differences (\(\Delta elevation\)) and displacement (\(\Delta pos\)): \[grade_k = \arcsin\left(\frac{elevation_{k+1} - elevation_k}{\Delta pos_{k+1}}\right) \times \frac{180}{\pi}\] (Capped at \(\pm 25^{\circ}\))
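The three reconstruction steps above can be sketched as plain functions (the bounds are the ones stated on this slide; everything else is standard kinematics):

```python
import math

def speed_to_mps(speed_kmh, v_max=33.0):
    """km/h to m/s, floored at 0 and capped at highway extremes (~119 km/h)."""
    return max(0.0, min(speed_kmh * 1000.0 / 3600.0, v_max))

def forward_accel(v_k, v_next, t_k, t_next, lo=-4.5, hi=3.0):
    """Finite-difference acceleration, clipped to plausible bounds (m/s^2)."""
    return max(lo, min((v_next - v_k) / (t_next - t_k), hi))

def road_grade_deg(elev_k, elev_next, delta_pos, cap=25.0):
    """Road grade (degrees) from elevation change over displacement, capped."""
    ratio = max(-1.0, min((elev_next - elev_k) / delta_pos, 1.0))
    return max(-cap, min(math.degrees(math.asin(ratio)), cap))
```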

diving deeper: spatial context & weather

Locate the nearest province

  • We extract the coordinates for every Italian province.

  • Every single GPS ping is mapped to its nearest_prov in milliseconds.

weather matrix

  • extract the month from the ping’s timestamp.

  • join the record with the historical monthly average temperature (\(T\)) and humidity (\(H\)) of that specific province.
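A minimal sketch of this spatial join; the province centroids and climate values below are illustrative stand-ins for the full province list and the MeteoWeb.it monthly table:

```python
# hypothetical province centroids and monthly climate table; the real pipeline
# covers all Italian provinces with MeteoWeb.it monthly averages
PROVINCES = {"Napoli": (40.85, 14.27), "Roma": (41.89, 12.48), "Milano": (45.46, 9.19)}
CLIMATE = {("Napoli", 7): (26.0, 68.0), ("Roma", 7): (25.5, 60.0), ("Milano", 7): (24.5, 70.0)}

def nearest_province(lat, lon):
    """Closest centroid; squared-degree distance suffices for a nearest lookup."""
    return min(PROVINCES,
               key=lambda p: (PROVINCES[p][0] - lat) ** 2 + (PROVINCES[p][1] - lon) ** 2)

def weather_for_ping(lat, lon, month):
    """Attach (temperature in Celsius, humidity in %) by nearest province and month."""
    return CLIMATE[(nearest_province(lat, lon), month)]
```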

diving deeper: from rate to total emissions

Time Integration

NeuralMOVES outputs an instantaneous emission rate (\(e_i\) in grams-per-second)

  • For each GPS point \(i\), compute the delta time to the next point: \[\Delta t_i = t_{i+1} - t_i\]

  • Then, calculate the absolute emissions for that specific segment: \[E_i = e_i \times \Delta t_i\]

  • Finally, the total trip emissions are \[E_{trip} = \sum_{i} (e_i \times \Delta t_i)\]

This transforms static 2-minute snapshots into a continuous, physics-backed emission profile for each trip.
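The integration step in code; each instantaneous rate is held constant until the next ping:

```python
def trip_emissions(rates_g_per_s, timestamps_s):
    """Total trip CO2 (grams): each instantaneous rate e_i is held constant
    until the next ping, so E_trip = sum_i e_i * (t_{i+1} - t_i)."""
    return sum(e * (t_next - t)
               for e, t, t_next in zip(rates_g_per_s, timestamps_s, timestamps_s[1:]))
```

For example, rates of 1 and 2 g/s over two 120-second segments give 360 g for the trip.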

Customer-level surrogate modelling

From emissions to explanations

what’s next?

  • customer-level understanding

  • key emission drivers

  • actionable guidance

Learning task

model NeuralMOVES-estimated emissions as a function of structural, behavioural, and contextual features.

Surrogate model

setup

observations: box recording → trip → customer

target: log-transformed NeuralMOVES estimated emissions per km

  • aggregation of trip-level \(\mathrm{CO_2}\) g/km to obtain customer-level target values

vehicle & driver context features

  • fuel / segment / power
  • registration year (emission standards proxy)
  • age class
  • geography / area

driving behaviour features

  • trip duration / distance structure
  • speed level (median/mean)
  • speed variability
  • share of short trips
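The trip-to-customer aggregation can be sketched as follows; the trip tuple layout and the 2 km short-trip threshold are illustrative assumptions, not the project's actual definitions:

```python
from statistics import mean, median, pstdev

def customer_features(trips, short_km=2.0):
    """Aggregate trip summaries (distance_km, duration_min, mean_speed_kmh)
    into customer-level behavioural features. The field layout and the 2 km
    short-trip threshold are illustrative assumptions."""
    dist = [t[0] for t in trips]
    dur = [t[1] for t in trips]
    spd = [t[2] for t in trips]
    return {
        "mean_trip_km": mean(dist),
        "mean_trip_min": mean(dur),
        "median_speed": median(spd),
        "speed_sd": pstdev(spd),
        "share_short_trips": sum(d < short_km for d in dist) / len(dist),
    }
```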

Candidate models

baseline

Tree-based models

Surrogate model performance (RMSE, cross-validation)

Takeaway

XGBoost achieves the lowest RMSE (≈18% improvement over Random Forest), while Elastic Net underfits — nonlinear behavioural effects are essential.

Local model-agnostic explanations

intuition: from linear regression to SHAP (Lundberg & Lee, 2017)

Linear regression (observation \(i\))

\[ \hat y_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij} \]

Each term \(\beta_j x_{ij}\) is the contribution of feature \(j\) for observation \(i\).

SHAP decomposition (any model)

\[ f(\mathbf{x}_i) = \phi_0 + \sum_{j=1}^p \phi_{ij} \]

Each \(\phi_{ij}\) is the SHAP value associated with feature \(j\) for observation \(i\).

Why is it called SHAP?

Cooperative game theory

SHAP stands for:

SHapley Additive exPlanations

It is based on Shapley values from cooperative game theory (Shapley, 1953).

The Shapley value is a method for assigning payouts to players depending on their contribution to the total payout. Players cooperate in a coalition and receive a certain profit from this cooperation (Molnar, 2025).

Mapping:

  • Game → prediction function \(f\)
  • Players → features
  • Coalition \(S\) → subset of features
  • Payoff → model prediction

Question:

How do we fairly distribute the prediction among features?

Shapley value

formal definition

Let \(N = \{1,\dots,p\}\) be the feature set.

For observation \(i\) and feature \(j\):

\[ \phi_{ij} = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|! (p-|S|-1)!}{p!} \Big[ f_{S\cup\{j\}}(\mathbf{x}_i) - f_S(\mathbf{x}_i)\Big] \] with \(f_S(\mathbf{x}_i)=\mathbb{E}[f(X)\mid X_S=x_{iS}]\).

The weight

\[ \frac{|S|!(p-|S|-1)!}{p!} \]

equals the probability that subset \(S\) precedes \(j\) under a random permutation of features.

Intuition: feature arrival order

Think of features entering the model one by one

Example with three features: \(A,B,C\).

Possible arrival orders:

\[ ABC,\; ACB,\; BAC,\; BCA,\; CAB,\; CBA \]

When feature \(B\) enters

Order \(S\) (features before \(B\))
\(ABC\) \(\{A\}\)
\(ACB\) \(\{A,C\}\)
\(BAC\) \(\varnothing\)
\(BCA\) \(\varnothing\)
\(CAB\) \(\{C,A\}\)
\(CBA\) \(\{C\}\)

Averaging the marginal contribution

The Shapley value of feature \(j\) and observation \(i\) is the average increase in prediction when \(j\) enters the model,

\[ f_{S\cup\{j\}}(\mathbf{x}_i) - f_S(\mathbf{x}_i) \]

averaged across all possible feature orders (or equivalently, across all subsets \(S\)).
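The arrival-order view can be checked by brute force on a toy value function (here feature C is a dummy, so its Shapley value must come out zero):

```python
from itertools import permutations

# toy value function v(S): the "payoff" when exactly the features in S play;
# feature C is a dummy (adding it never changes v)
v = {frozenset(): 0, frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 0,
     frozenset("AB"): 40, frozenset("AC"): 10, frozenset("BC"): 20,
     frozenset("ABC"): 40}

def shapley(players, value):
    """Average each player's marginal contribution over all arrival orders."""
    phi = {j: 0.0 for j in players}
    orders = list(permutations(players))
    for order in orders:
        seen = frozenset()
        for j in order:
            phi[j] += value[seen | {j}] - value[seen]
            seen = seen | {j}
    return {j: phi[j] / len(orders) for j in players}

phi = shapley("ABC", v)
```

The attributions also satisfy efficiency: they sum to \(v(N) - v(\varnothing)\).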

Why Shapley values are special

Four fairness principles

Shapley values are the unique attribution method satisfying:

Efficiency
All feature contributions sum to the prediction difference.


Symmetry
If two features contribute equally in all coalitions,
they receive the same attribution.


Dummy
If a feature never changes the prediction,
its contribution is zero.


Additivity (Linearity)
For additive models, explanations combine consistently across components.

From axioms to prediction decomposition

baseline prediction plus a sum of feature contributions

Because of the efficiency property, every prediction can be written as

\[ f(\mathbf{x}_i) = \mathbb{E}[f(X)] + \sum_{j=1}^p \phi_{ij} \]

Important

This decomposition allows us to attribute predictions to individual features, even for complex models.

From Shapley theory to XGBoost

Structure of gradient boosting

An XGBoost model is additive:

\[ f(\mathbf{x}_i) = f_0 + \sum_{t=1}^{T} \lambda \, f_t(\mathbf{x}_i), \]

where:

  • \(f_0\) = initial prediction (constant baseline)
  • \(f_t\) = regression tree at iteration \(t\)
  • \(\lambda\) = learning rate

Each tree is fitted to the pseudo-residuals.

SHAP and boosting

Shapley values are linear:

\[ \phi_{ij}(f + g) = \phi_{ij}(f) + \phi_{ij}(g), \qquad \phi_{ij}(a f) = a \, \phi_{ij}(f). \]

Therefore,

\[ \phi_{ij}(f) = \sum_{t=1}^{T} \lambda \, \phi_{ij}\!\left(f_t\right). \]

SHAP values can be computed tree-by-tree and then aggregated.
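Linearity can be verified numerically with an exact (brute-force) Shapley computation over feature subsets; this is not TreeSHAP, just a sanity check on two stump-like "trees" under an interventional expectation:

```python
from itertools import combinations
from math import factorial

# two stump-like "trees" of two features, and a background sample used for the
# interventional expectation f_S(x) = E[f(X) | X_S = x_S] (features independent)
background = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def tree1(x): return 5.0 if x[0] > 0.5 else 1.0
def tree2(x): return 2.0 if x[1] > 0.5 else 0.0
def boosted(x): return tree1(x) + tree2(x)

def f_S(f, x, S):
    """Fix the features in S at x, average the rest over the background."""
    vals = [f(tuple(x[k] if k in S else b[k] for k in range(len(x))))
            for b in background]
    return sum(vals) / len(vals)

def shap_exact(f, x):
    """Exact Shapley values via the subset formula (exponential; tiny p only)."""
    p = len(x)
    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for r in range(p):
            for S in combinations(others, r):
                w = factorial(len(S)) * factorial(p - len(S) - 1) / factorial(p)
                total += w * (f_S(f, x, set(S) | {j}) - f_S(f, x, set(S)))
        phi.append(total)
    return phi

x = (1.0, 1.0)
phi_sum = shap_exact(boosted, x)
phi_parts = [a + b for a, b in zip(shap_exact(tree1, x), shap_exact(tree2, x))]
```

Attributing the ensemble directly or tree-by-tree gives the same SHAP values, which is exactly what TreeSHAP exploits at scale.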

Why a single tree is special

A regression tree:

  • Partitions the feature space via recursive splits
  • Assigns a constant prediction at each leaf

For a fixed observation \(\mathbf{x}_i\):

  • Exactly one root-to-leaf path is active.

Key simplification

Shapley values require averaging over all feature subsets.

But in a tree:

  • Only features used along the active path influence the prediction.
  • All other features are irrelevant for that tree.

Why a single tree is special

TreeSHAP computational efficiency

  • Follows the active path
  • Keeps track of split features
  • Computes marginal contributions efficiently via dynamic programming

So SHAP values for a tree can be computed in polynomial time.

Global SHAP importance

What drives predicted emissions?

Takeaway

Emissions are not explained only by vehicle structure: they are also shaped by how people drive.

Global model-agnostic explanations

From importance to effect shape

by-feature effect curves

SHAP tells us which variables matter (and in which direction) for each customer.

To understand how a variable changes predictions across its range, we use effect curves.

Partial Dependence (PDP) (Friedman, 2001)

For a feature \(X_j\):

\[ \mathrm{PDP}_j(x) = \mathbb{E}_{X_{-j}}\!\left[f(x, X_{-j})\right] \;\approx\; \frac{1}{n}\sum_{i=1}^n f(x, x_{i,-j}). \]

Interpretation: average prediction when we “set” \(X_j=x\) and keep the others as observed.
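A direct implementation of the estimator on a toy model where the answer is known in closed form (\(\mathrm{PDP}_1(x) = x^2 + \bar{x}_2\)):

```python
def pdp(f, data, j, grid):
    """PDP of feature j: set column j to each grid value, average predictions."""
    curve = []
    for x in grid:
        preds = [f(tuple(x if k == j else row[k] for k in range(len(row))))
                 for row in data]
        curve.append(sum(preds) / len(data))
    return curve

# toy model with a known answer: PDP_1(x) = x^2 + mean(x2)
f = lambda x: x[0] ** 2 + x[1]
data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]  # mean of x2 is 2
```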

Why not just PDP?

The issue: correlated predictors

If \(X_j\) is correlated with other features, PDP averages over unrealistic combinations:

\((X_j=x,\;X_{-j}=x_{i,-j})\) may have near-zero probability in the real data.

So PDP can create effects driven by extrapolation rather than the model.

ALE in one formula

Accumulated Local Effects (ALE) (Apley & Zhu, 2020)

ALE computes local prediction differences inside the data support and then accumulates them.

Result: a global effect curve that is much more robust under correlation.

definition

Split the range of \(X_j\) into intervals \([z_{k-1}, z_k]\) (often quantiles).

For each interval, compute the average local change:

\[ \Delta_k = \frac{1}{n_k}\sum_{i: x_{ij}\in[z_{k-1},z_k]} \Big[ f(z_k, {\bf x}_{i,-j}) - f(z_{k-1}, {\bf x}_{i,-j}) \Big]. \] where \({\bf x}_{i,-j}\) is the vector of observed values of all features other than \(X_j\) for the \(i^{th}\) observation, and \(n_k\) is the number of observations in the interval.

Then ALE is the cumulative sum up to \(x\), centered to have mean 0.
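A compact first-order ALE sketch; on a linear toy model the accumulated curve must recover the true slope:

```python
def ale(f, data, j, edges):
    """First-order ALE of feature j: average the local differences
    f(z_k, x_{-j}) - f(z_{k-1}, x_{-j}) within each interval, accumulate,
    then center the curve to mean zero over the data."""
    def with_j(row, val):
        return tuple(val if k == j else row[k] for k in range(len(row)))

    # assign each observation to an interval (z_k, z_{k+1}]
    bin_of = [min(len(edges) - 2,
                  next((i for i in range(len(edges) - 1) if row[j] <= edges[i + 1]),
                       len(edges) - 2))
              for row in data]
    acc = [0.0]  # accumulated effect evaluated at each edge
    for k in range(len(edges) - 1):
        pts = [row for row, b in zip(data, bin_of) if b == k]
        delta = (sum(f(with_j(r, edges[k + 1])) - f(with_j(r, edges[k])) for r in pts)
                 / len(pts)) if pts else 0.0
        acc.append(acc[-1] + delta)
    center = sum(acc[b + 1] for b in bin_of) / len(data)
    return [a - center for a in acc]

# linear toy model: the true effect of x0 is a straight line with slope 3
f = lambda x: 3.0 * x[0] + x[1]
data = [(0.0, 5.0), (1.0, 1.0), (2.0, 2.0), (3.0, 0.0)]
curve = ale(f, data, 0, [0.0, 1.0, 2.0, 3.0])
```

Because differences are taken only within the data support, correlated features do not force the model to extrapolate as in PDP.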

ALE in one plot

ALE: length by trip

takeaway

Longer average trip length is associated with substantially lower emissions per km, with the strongest inefficiency concentrated in very short trips.

ALE: duration by trip

takeaway

Average trip duration shows a nonlinear positive effect, with emissions per km increasing markedly beyond roughly 30 minutes.

ALE: median trip speed

takeaway

Median trip speed has a comparatively moderate and nonlinear impact, with slightly higher emissions at sustained higher speeds.

Behavioural archetypes

Towards actionable recommendations

From explanations to behavioural profiles

SHAP provides individual-level explanations.

For each customer \(i\) we obtain a SHAP vector

\[ \boldsymbol{\phi}_i = (\phi_{i1}, \dots, \phi_{ip}) \]

describing how behavioural variables contribute to predicted emissions.

The scalability problem

With 100,000 customers, SHAP yields 100,000 explanations.

Insurers cannot design 100,000 personalized interventions.

Instead, we need behavioural profiles that summarize recurring emission-driving patterns.

Idea

Cluster explanations, not customers.

We apply Archetypal Analysis to the SHAP vectors.

Archetypal Analysis on SHAP vectors

Data representation

Each customer is described by a SHAP explanation vector \(\boldsymbol{\phi}_i = (\phi_{i1},\dots,\phi_{ip})\)

Collecting them yields the SHAP matrix

\[ \boldsymbol{\Phi} = \begin{bmatrix} \boldsymbol{\phi}_1 \\ \vdots \\ \boldsymbol{\phi}_n \end{bmatrix} \in \mathbb{R}^{n\times p}. \]

Archetypal decomposition (Cutler & Breiman, 1994)

Archetypal Analysis approximates

\[ \boldsymbol{\Phi} \approx \mathbf{A}\mathbf{B}\boldsymbol{\Phi} \]

where

  • \(\mathbf{A} \in \mathbb{R}^{n\times k}\): customer → archetype mixture weights
  • \(\mathbf{B} \in \mathbb{R}^{k\times n}\): archetype → customer mixture weights

Rows of both matrices are non-negative and sum to one.

Interpretation of archetypes

Archetypes

Archetypes are the rows of

\[ \mathbf{Z} = \mathbf{B}\boldsymbol{\Phi} \]

Each archetype is therefore a convex combination of SHAP vectors:

\[ \mathbf{z}_h = \sum_{\ell=1}^{n} B_{h\ell}\boldsymbol{\phi}_\ell \]

interpretation

Archetypes represent extreme emission mechanisms,
i.e. extreme patterns of feature contributions to predicted emissions.

Each customer is expressed as a convex combination of archetypes

\[ \boldsymbol{\phi}_i \approx \sum_{h=1}^{k} A_{ih}\mathbf{z}_h \]

→ enabling behavioural segmentation and targeted recommendations.
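The decomposition can be illustrated on a toy SHAP matrix; in practice \(\mathbf{A}\) and \(\mathbf{B}\) are fitted by alternating constrained least squares (Cutler & Breiman, 1994); here \(\mathbf{B}\) is hand-picked to select the pure-archetype rows so the algebra stays transparent:

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

Z_true = [[4.0, 0.0], [0.0, 4.0], [-2.0, -2.0]]  # extreme SHAP patterns
A = [[1.0, 0.0, 0.0],                            # pure archetype 1
     [0.0, 1.0, 0.0],                            # pure archetype 2
     [0.0, 0.0, 1.0],                            # pure archetype 3
     [0.5, 0.5, 0.0],                            # 50/50 mixture
     [0.2, 0.3, 0.5]]                            # general mixture
Phi = matmul(A, Z_true)                          # n x p SHAP matrix
B = [[1.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0, 0.0]]                  # selects the pure rows of Phi
Z = matmul(B, Phi)                               # archetypes Z = B Phi
Phi_hat = matmul(A, Z)                           # reconstruction A B Phi
```

Each row of A sums to one, so every customer's explanation is a convex mixture of the recovered archetypes.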

Why Archetypal Analysis?

Clustering vs archetypes

K-means

  • identifies groups of average behaviour

Archetypal Analysis

  • identifies extreme behavioural profiles
  • observations are mixtures of archetypes

Selecting the number of archetypes

Archetype A1 — Structural Vehicle Driven


Takeaway

Emissions are primarily driven by structural vehicle factors (vehicle age), with behavioural effects playing a secondary role.

Archetype A2 — Long-Duration Intensive


Takeaway

Sustained trip duration is the dominant emission mechanism, pushing predicted CO₂ intensity substantially above average.

Archetype A3 — Short-Trip Inefficient


Takeaway

Very short average trip length is the main emission driver, reflecting a classic short-trip inefficiency regime.

Archetype A4 — Long-Trip Efficient


Takeaway

Long average trip length offsets duration effects, resulting in comparatively lower emissions per km.

Archetype A1 — Structural Vehicle Driven

Mechanism

Predicted emissions are mainly driven by structural vehicle characteristics
(e.g., vehicle age and technology).

Behavioural recommendations

Limited behavioural leverage; focus on vehicle efficiency improvements:

  • Maintenance reminders (filters, tyres)
  • Maintain correct tyre pressure
  • Encourage efficient driving modes where available

Insurer action

Target vehicle upgrade incentives or eco-bonus programs rather than behavioural coaching.

Archetype A2 — Long-Duration Intensive

Mechanism

Long sustained trip duration drives emissions upward.

Behavioural recommendations

  • Break very long trips with short cool-down pauses
  • Prefer routes with less stop–go congestion
  • Encourage steady cruising instead of speed oscillations

Insurer action

Introduce long-trip coaching through periodic feedback and driving goals.

Archetype A3 — Short-Trip Inefficient

Mechanism

Frequent short trips create inefficiency due to cold starts and incomplete engine warm-up.

Behavioural recommendations

  • Trip chaining: combine errands into one trip
  • Shift very short trips to active mobility
  • Encourage smooth acceleration at trip start

Insurer action

Introduce a short-trip reduction challenge rewarding reductions in short-trip frequency.

Archetype A4 — Long-Trip Efficient

Mechanism

Long trips tend to be efficient per km; the main risk comes from sustained high speeds.

Behavioural recommendations

  • Avoid very high sustained speeds
  • Encourage anticipatory driving
  • Maintain existing efficient driving habits

Insurer action

Provide lightweight feedback and eco-driving recognition incentives.

From archetypes to scalable interventions

Important

Instead of producing 100,000 individual recommendations,
we design archetype-specific guidance.

This enables:

  • nudging strategies
  • targeted coaching
  • eco-driving incentives aligned with emission mechanisms

Wrap-up

What we did

  • reconstructed microscopic inputs from coarse telematics
  • used NeuralMOVES to estimate high-frequency CO₂ emission rates at scale
  • built an accurate customer-level surrogate (XGBoost)
  • explained predictions with local (SHAP) and global (ALE) methods
  • segmented explanation patterns via Archetypal Analysis to enable targeted recommendations

Take-home messages

  • interpretability is not only about explaining models — it enables segmentation and action
  • applying AA to SHAP uncovers behavioural profiles aligned with emission mechanisms
  • recommendations become scalable without generating 100,000 individual plans

Outlook

  • integrate the recommendation layer into the eco-score pipeline
  • assess robustness across vehicle segments, seasons, and geography

References


Adavikottu, A., & Velaga, N. R. (2021). Analysis of factors influencing aggressive driver behavior and crash involvement. Traffic Injury Prevention, 22(sup1), S21–S26.
Apley, D. W., & Zhu, J. (2020). Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society: Series B, 82(4), 1059–1086.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
Cutler, A., & Breiman, L. (1994). Archetypal analysis. Technometrics, 36(4), 338–347.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.
Koupal, J., Cumberworth, M., Michaels, H., & Beardsley, M. (2003). Design and implementation of MOVES: EPA’s new generation mobile source emission model. Proceedings of the EMFAC Conference / Ann Arbor.
Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems (NeurIPS), 30.
McConky, K., Chen, R. B., & Gavi, G. R. (2018). A comparison of motivational and informational contexts for improving eco-driving performance. Transportation Research Part F: Traffic Psychology and Behaviour, 52, 62–74.
Molnar, C. (2025). Interpretable machine learning: A guide for making black box models explainable (3rd ed.). Retrieved from https://christophm.github.io/interpretable-ml-book
Nævestad, T.-O. (2022). Eco driving as a road safety measure: Before and after study of three companies. Transportation Research Part F: Traffic Psychology and Behaviour, 91, 95–115.
Park, S., Lee, J.-B., & Lee, C. (2016). State-of-the-art automobile emissions models: A review. KSCE Journal of Civil Engineering, 20(3), 1053–1065.
Ramirez-Sanchez, E., Tang, C., Xu, Y., Renganathan, N., Jayawardana, V., He, Z., & Wu, C. (2026). NeuralMOVES: A lightweight and microscopic vehicle emission estimation model based on reverse engineering and surrogate learning. Transportation Research Part C: Emerging Technologies, 168, 104014.
Shapley, L. S. (1953). A value for n-person games. In H. W. Kuhn & A. W. Tucker (Eds.), Contributions to the theory of games II (pp. 307–317). Princeton University Press.
Xu, Y., Li, H., Liu, H., Rodgers, M. O., & Guensler, R. L. (2017). Eco-driving for transit: An effective strategy to conserve fuel and emissions. Applied Energy, 194, 784–797.
Zhou, M., Jin, H., & Wang, W. (2016). A review of vehicle fuel consumption models to evaluate eco-driving and eco-routing. Transportation Research Part D: Transport and Environment, 49, 203–218.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67(2), 301–320.

Thank you

Scan to access the slides

https://alfonsoiodicede.github.io/seminar_sapienza_march_26.html