Model-agnostic interpretability of Deep Learning models for car emissions assessment

from SHAP explanations to behavioural archetypes and targeted recommendations

Alfonso Iodice D’Enza

Joint work with A. Mancuso, R. Simone and F. Palumbo

2026-06-03

Drive green, drive safe

Sustainability

Aggressive driving can increase fuel consumption and CO₂ emissions by up to 40% (McConky, Chen, & Gavi, 2018; Xu, Li, Liu, Rodgers, & Guensler, 2017).
Green driving improves efficiency and reduces emissions (Zhou, Jin, & Wang, 2016).

Safety

Aggressive driving is correlated with higher crash risk (Adavikottu & Velaga, 2021).
Structured green driving programs have achieved up to 10% fuel savings and a 33% reduction in property-damage accidents (Nævestad, 2022).

How to measure, explain, and improve driving behavior at scale?

Green-driving assessment

Ecoscoring project

A partnership between an Italian insurance company and the University of Naples Federico II

Project objective

Develop an eco-scoring system to:

assess driving behavior
incentivize eco-friendly practices
provide insurance premium discounts

Our focus in this work

Design an interpretable emission recommendation framework that:

explains the drivers of emissions
identifies behavioural archetypes
generates targeted eco-driving recommendations

This system is complementary to the eco-score and enables actionable intervention.

Roadmap

from raw telematics to actionable recommendations

Data reconstruction and enrichment: derive acceleration, slope, and environmental context from 2-minute GPS data
DL-based estimation of microscopic CO₂ emission rates
Customer-level surrogate modelling: structural + behavioural + contextual drivers
Model-agnostic explanations: individual drivers and global effect shapes
Behavioural archetypes: segmentation of explanation patterns → targeted recommendations

Data reconstruction and enrichment

Data sources

Boxes Data

Telematics devices

100,000 customers
6 months
600 million records

Recorded features

timestamp
latitude & longitude
speed
distance between recordings
data collected every 2 minutes

Customer data (provided by the insurance company)

vehicle characteristics (fuel type, power, segment, registration year)
customer characteristics (age class, geographical area)

main structural limitation

Two-minute resolution is too coarse to directly observe microscopic driving dynamics (acceleration, harsh braking, slope effects).

First idea: contextual driving style via routing APIs

The method

start from black-box GPS + speed
query routing services (OpenStreetMap / Google Maps)
reconstruct context (traffic, road type, expected travel time)
compare expected vs actual → infer “aggressiveness”

Why it’s attractive

context-aware (traffic + road class)
behaviour measured relative to surroundings
intuitive for insurers and policymakers

The bottleneck

At our scale, continuous API calls are too slow (OpenStreetMap) or too expensive (Google Maps).

The API wall (reality check)

Free option

~600,000,000 records
even at 1 request/second: \[600{,}000{,}000 \text{ seconds} \approx \mathbf{19 \text{ years}}\]
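The back-of-the-envelope figure above can be reproduced in a few lines. This is a sketch that assumes one sequential request per record and a 365.25-day year; the variable names are illustrative.

```python
# Rough sequential-time estimate for the free routing option.
N_RECORDS = 600_000_000        # one API request per record (assumption)
REQUESTS_PER_SECOND = 1.0      # rate quoted above

SECONDS_PER_YEAR = 365.25 * 24 * 3600

total_seconds = N_RECORDS / REQUESTS_PER_SECOND
total_years = total_seconds / SECONDS_PER_YEAR

print(f"{total_years:.1f} years")
```

Parallel requests would shorten the wall-clock time proportionally, subject to whatever rate limits the service imposes.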

Need for plan B

External APIs do not scale to telematics data of even moderately high frequency.

DL-based estimation of microscopic CO₂ emission rates

Plan B: estimate emissions from vehicle physics

Key point

Emissions are driven by vehicle dynamics and resistive forces, not only distance.

Vehicle Specific Power (VSP)

\[VSP_t = \frac{A v_t + B v_t^2 + C v_t^3 + m v_t (a_t + g \sin\theta)}{m}\]

\(A\): rolling resistance; \(B\): rotational resistance; \(C\): aerodynamic drag.
\(m\) is the vehicle mass and \(g\) is the gravitational acceleration.

We need: speed (\(v_t\)), acceleration (\(a_t\)), road grade (\(\theta\))
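A minimal sketch of the VSP formula as a function may help to see how the three inputs enter. The coefficient values below are placeholders chosen for illustration, not calibrated values, and the units of \(A\), \(B\), \(C\) and \(m\) are assumed to be mutually consistent.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def vsp(v, a, theta, A, B, C, m):
    """Vehicle Specific Power for speed v (m/s), acceleration a (m/s^2),
    road grade theta (radians), coefficients A, B, C and mass m."""
    resistive = A * v + B * v**2 + C * v**3
    inertial_and_grade = m * v * (a + G * math.sin(theta))
    return (resistive + inertial_and_grade) / m

# Placeholder coefficients for illustration only (not calibrated values).
A, B, C, m = 0.15, 0.002, 0.0005, 1.5

cruise = vsp(v=20.0, a=0.0, theta=0.0, A=A, B=B, C=C, m=m)
accelerating = vsp(v=20.0, a=1.0, theta=0.0, A=A, B=B, C=C, m=m)
uphill = vsp(v=20.0, a=0.0, theta=math.atan(0.05), A=A, B=B, C=C, m=m)
```

At the same speed, both acceleration and a positive grade raise VSP, which is why distance alone is not enough to estimate emissions.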

Plan B: physics-based emission models

a validated physical model

The gold standard is MOVES (Motor Vehicle Emission Simulator, US EPA) (Koupal, Cumberworth, Michaels, & Beardsley, 2003; Park, Lee, & Lee, 2016).

MOVES pros

Regulatory model
Scientifically validated
Physics-based
Legally defensible

Why not MOVES?

MOVES cons

MOVES is designed for macroscopic studies (cities, regions)
Heavy database infrastructure
13+ complex input files
evaluating a single 10-minute trip: 2–5 minutes of processing
for hundreds of thousands of trips: computationally infeasible

NeuralMOVES: scalable surrogate of MOVES (Ramirez-Sanchez et al., 2026)

NeuralMOVES is a set of specialized neural networks trained on vehicle-specific subsets of the MOVES simulation data.

Why NeuralMOVES

Trained on millions of MOVES-generated scenarios
Tiny (MB-scale package)
Millisecond-level evaluation
highly accurate

NeuralMOVES architecture

Architecture

Inputs: speed, acceleration, grade
Context: temperature, humidity
2 hidden layers
5 neurons each
Activation: hyperbolic tangent

rationale

Separate models by vehicle type
Avoid unrealistic mixing
Fast execution
Smooth, differentiable output

pre-processing and required features

Training

70/30 train/test
Early stopping (~300 epochs)

avoid negative emissions:

\[ \text{NeuralMOVES}(x) = \max\{\hat{e}(x), e_{idle}\} \]
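The architecture and the idle floor can be sketched as a plain forward pass. This is an illustration under assumptions: the weights are random placeholders rather than trained parameters, the five inputs are assumed to be already scaled, the output unit is assumed linear, and the helper names (`dense`, `random_layer`, `emission_rate`) are not part of the NeuralMOVES package.

```python
import math
import random

def dense(x, W, b, activation=None):
    """One fully connected layer: activation(W x + b)."""
    out = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [activation(o) for o in out] if activation else out

def random_layer(n_out, n_in, rng):
    """Untrained placeholder weights; a real model would load fitted ones."""
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-1, 1) for _ in range(n_out)]
    return W, b

rng = random.Random(0)
# 5 inputs -> 5 tanh units -> 5 tanh units -> 1 output
layers = [random_layer(5, 5, rng), random_layer(5, 5, rng), random_layer(1, 5, rng)]

def emission_rate(x, e_idle):
    """x = [speed, acceleration, grade, temperature, humidity] (scaled)."""
    h = dense(x, *layers[0], activation=math.tanh)
    h = dense(h, *layers[1], activation=math.tanh)
    e_hat = dense(h, *layers[2])[0]   # linear output unit (assumption)
    return max(e_hat, e_idle)         # floor at the idle emission rate
```

With two hidden layers of five units, each network has only a few dozen parameters, which is consistent with the millisecond-level evaluation noted above.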

NeuralMOVES required inputs

reconstruct from boxes data:

road slope
air temperature & humidity
acceleration

enrichment

Enrichment process

slope: Computed from altitude changes along the GPS path.
weather: External meteorological dataset (MeteoWeb.it); assigned by nearest province and month.
acceleration: \(\frac{Speed_{t+1} - Speed_t}{\Delta t}\) between consecutive records.
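The acceleration and slope steps can be sketched as follows. Assumptions: speed is recorded in km/h, an altitude value is available for each record, and the recorded distance between consecutive points is used as the horizontal run; the function name `enrich` is illustrative.

```python
import math

def enrich(speeds_kmh, altitudes_m, distances_m, dt_s=120.0):
    """Derive acceleration (m/s^2) and road grade (radians) between
    consecutive records; distances_m[t] is the distance from record t to t+1."""
    accel, grade = [], []
    for t in range(len(speeds_kmh) - 1):
        dv = (speeds_kmh[t + 1] - speeds_kmh[t]) / 3.6   # km/h -> m/s
        accel.append(dv / dt_s)
        rise = altitudes_m[t + 1] - altitudes_m[t]
        run = distances_m[t]
        # No movement between records: treat the segment as flat.
        grade.append(math.atan2(rise, run) if run > 0 else 0.0)
    return accel, grade

accel, grade = enrich(
    speeds_kmh=[36.0, 72.0, 72.0],
    altitudes_m=[100.0, 130.0, 130.0],
    distances_m=[1800.0, 2400.0],
)
```

Because the interval is two minutes, these are average values over each segment, not instantaneous ones.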

Customer-level surrogate modelling

From emissions to explanations

what’s next?

customer-level understanding
key emission drivers
actionable guidance

Learning task

model NeuralMOVES-estimated emissions as a function of structural, behavioural, and contextual features.

Surrogate model

setup

observations: box-recording -> trip -> customer

target: log-transformed NeuralMOVES estimated emissions per km

aggregation of trip-level \(\text{CO}_2\) emissions (g/km) to obtain customer-level target values
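One possible way to build the customer-level target is sketched below. The aggregation rule is an assumption (total grams divided by total kilometres, i.e. a distance-weighted average, followed by a log transform), and the records are invented for illustration.

```python
import math
from collections import defaultdict

# Hypothetical trip-level records: (customer_id, trip_id, co2_grams, distance_km).
trips = [
    ("c1", "t1", 1500.0, 10.0),
    ("c1", "t2", 400.0, 2.0),
    ("c2", "t3", 2400.0, 20.0),
]

def customer_targets(trips):
    """Distance-weighted g/km per customer, then log-transform."""
    grams = defaultdict(float)
    km = defaultdict(float)
    for customer_id, _trip_id, co2_g, dist_km in trips:
        grams[customer_id] += co2_g
        km[customer_id] += dist_km
    return {c: math.log(grams[c] / km[c]) for c in grams if km[c] > 0}

targets = customer_targets(trips)
```

A simple mean of trip-level g/km would weight a 2 km trip as much as a 20 km one; which rule is preferable depends on the intended use of the target.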

vehicle & driver context features

fuel / segment / power
registration year (emission standards proxy)
age class
geography / area

driving behaviour features

trip duration / distance structure
speed level (median/mean)
speed variability
share of short trips
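The behavioural features listed above could be derived from trip summaries roughly as follows. The field names and the 2 km short-trip cut-off are hypothetical choices made for this sketch.

```python
import statistics

SHORT_TRIP_KM = 2.0  # hypothetical cut-off for a "short" trip

def behaviour_features(trips):
    """trips: list of dicts with 'distance_km', 'duration_min', 'mean_speed_kmh'."""
    speeds = [t["mean_speed_kmh"] for t in trips]
    distances = [t["distance_km"] for t in trips]
    durations = [t["duration_min"] for t in trips]
    return {
        "median_speed": statistics.median(speeds),
        "speed_sd": statistics.pstdev(speeds),
        "mean_trip_km": statistics.fmean(distances),
        "mean_trip_min": statistics.fmean(durations),
        "share_short_trips": sum(d < SHORT_TRIP_KM for d in distances) / len(trips),
    }

features = behaviour_features([
    {"distance_km": 1.0, "duration_min": 5.0, "mean_speed_kmh": 12.0},
    {"distance_km": 15.0, "duration_min": 20.0, "mean_speed_kmh": 45.0},
    {"distance_km": 1.5, "duration_min": 6.0, "mean_speed_kmh": 15.0},
    {"distance_km": 30.0, "duration_min": 30.0, "mean_speed_kmh": 60.0},
])
```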

Candidate models

baseline

Elastic Net (Zou & Hastie, 2005)

Tree-based models

Random Forest (Breiman, 2001)
XGBoost (Chen & Guestrin, 2016)

Surrogate model performance (RMSE, cross-validation)

Takeaway

XGBoost achieves the lowest RMSE (≈18% improvement over Random Forest), while Elastic Net underfits — nonlinear behavioural effects are essential.

Local model-agnostic explanations

intuition: from linear regression to SHAP (Lundberg & Lee, 2017)

Linear regression (observation \(i\))

\[ \hat y_i = \mathbb{E}[\hat y] + \sum_{j=1}^{p} \hat{\beta}_j \left( x_{ij} - \mathbb{E}[X_j] \right) \]

Each term \(\hat{\beta}_j\left(x_{ij}-\mathbb{E}[X_j]\right)\) is the contribution of feature \(j\) for observation \(i\), measured as a deviation from the average prediction.
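A small numerical check of this decomposition, using made-up data and hypothetical coefficients, shows that the per-feature terms add up to the gap between an individual prediction and the average prediction.

```python
# Toy data: rows are observations, columns are features (illustrative values).
X = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
beta0, beta = 0.5, [2.0, -0.1]   # hypothetical fitted coefficients

def predict(x):
    return beta0 + sum(b * xj for b, xj in zip(beta, x))

n, p = len(X), len(beta)
feature_means = [sum(row[j] for row in X) / n for j in range(p)]
mean_prediction = sum(predict(row) for row in X) / n

def contributions(x):
    """beta_j * (x_j - mean_j): per-feature deviation from the average prediction."""
    return [beta[j] * (x[j] - feature_means[j]) for j in range(p)]

x_i = X[0]
phi = contributions(x_i)
reconstructed = mean_prediction + sum(phi)
```

Here `reconstructed` matches `predict(x_i)`; SHAP generalises exactly this additive accounting to models that are not linear.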

SHAP decomposition (any model)

\[ f(\mathbf{x}_i) = \phi_0 + \sum_{j=1}^p \phi_{ij} \]

Each \(\phi_{ij}\) is the SHAP value associated with feature \(j\) for observation \(i\).

Why is it called SHAP?

Cooperative game theory

SHAP stands for:

SHapley Additive exPlanations

It is based on Shapley values from cooperative game theory (Shapley, 1953).

game theory

The Shapley value is a method for assigning payouts to players depending on their contribution to the total payout. Players cooperate in a coalition and receive a certain profit from this cooperation (Molnar, 2025).

mapping

Game → prediction function \(f\)
Players → features
Coalition \(S\) → subset of features
Payoff → model prediction

Shapley value

Formal definition

For observation \(i\) and feature \(j\):

\[ \phi_{ij} = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|! (p-|S|-1)!}{p!} \Big[ f_{S\cup\{j\}}(\mathbf{x}_i) - f_S(\mathbf{x}_i) \Big], \]

where \(N = \{1,\dots,p\}\) and
\(f_S(\mathbf{x}_i)=\mathbb{E}[f(X)\mid X_S=x_{iS}]\).

Main intuition

The Shapley value is the marginal contribution of feature \(j\) to the prediction for observation \(i\), averaged over all possible feature orderings.

Weight

The term

\[ \frac{|S|!(p-|S|-1)!}{p!} \]

is the probability that, in a uniformly random ordering of the features, the features preceding \(j\) are exactly those in \(S\).
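As a quick worked check with \(p = 3\) features, the weights for a fixed feature \(j\) depend only on the size of \(S\):

\[ |S|=0:\ \frac{0!\,2!}{3!}=\frac{1}{3}, \qquad |S|=1:\ \frac{1!\,1!}{3!}=\frac{1}{6}\ \text{(two such subsets)}, \qquad |S|=2:\ \frac{2!\,0!}{3!}=\frac{1}{3} \]

Summing over the four subsets gives \(\tfrac{1}{3} + 2\cdot\tfrac{1}{6} + \tfrac{1}{3} = 1\), as expected for a probability distribution.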

Marginal contribution

For a subset \(S\), the quantity

\[ f_{S\cup\{j\}}(\mathbf{x}_i) - f_S(\mathbf{x}_i) \]

measures how much feature \(j\) changes the prediction when added to the features in \(S\).
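The definition can be evaluated by brute force for a small number of features. In the sketch below, \(f_S\) is estimated by fixing the features in \(S\) and averaging the others over a background sample; this marginal estimate coincides with the conditional expectation only when features are independent, which is an assumption of the example. The toy model and data are invented.

```python
from itertools import combinations
from math import factorial

def value(f, x, S, background):
    """Estimate f_S(x): fix features in S at x, average the rest over background rows."""
    total = 0.0
    for row in background:
        z = [x[j] if j in S else row[j] for j in range(len(x))]
        total += f(z)
    return total / len(background)

def shapley_values(f, x, background):
    p = len(x)
    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        phi_j = 0.0
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                S = set(S)
                gain = value(f, x, S | {j}, background) - value(f, x, S, background)
                phi_j += weight * gain
        phi.append(phi_j)
    return phi

# Toy model with an interaction term (illustrative only).
f = lambda z: 2.0 * z[0] + z[1] * z[2]
background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
x = [3.0, 2.0, 2.0]

phi = shapley_values(f, x, background)
baseline = value(f, x, set(), background)
```

The values satisfy `baseline + sum(phi) == f(x)` up to rounding. The cost grows as \(2^p\), which is why specialised algorithms are needed for realistic models.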

From Shapley theory to XGBoost

Structure of gradient boosting

An XGBoost model is additive:

\[ f(\mathbf{x}_i) = f_0 + \sum_{t=1}^{T} \lambda \, f_t(\mathbf{x}_i), \]

where:

\(f_0\) = initial prediction (constant baseline)
\(f_t\) = regression tree at iteration \(t\)
\(\lambda\) = learning rate

Each tree is fitted to the pseudo-residuals.

SHAP and boosting

Shapley values are linear:

\[\begin{equation*} \begin{split} \phi_{ij}(f + g) &= \phi_{ij}(f) + \phi_{ij}(g)\\ \phi_{ij}(a f)&= a \ \phi_{ij}(f) \end{split} \end{equation*}\]

Therefore,

\[ \phi_{ij}(f) = \sum_{t=1}^{T} \lambda \, \phi_{ij}\!\left(f_t\right). \]

SHAP values can be computed tree-by-tree and then aggregated.
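The tree-by-tree aggregation can be verified numerically on a tiny ensemble. The two stumps below are hand-written stand-ins for fitted trees, the baseline and learning rate are arbitrary, and absent features are averaged over a background sample (a marginal approximation assumed for simplicity).

```python
from itertools import combinations
from math import factorial

def shapley(f, x, background):
    """Brute-force Shapley values; absent features are averaged over background rows."""
    p = len(x)
    def v(S):
        rows = ([x[k] if k in S else r[k] for k in range(p)] for r in background)
        return sum(f(z) for z in rows) / len(background)
    phi = []
    for j in range(p):
        rest = [k for k in range(p) if k != j]
        phi.append(sum(
            factorial(s) * factorial(p - s - 1) / factorial(p)
            * (v(set(S) | {j}) - v(set(S)))
            for s in range(p) for S in combinations(rest, s)
        ))
    return phi

# Two hand-written stumps standing in for fitted trees (hypothetical).
tree_1 = lambda z: 1.0 if z[0] > 0.5 else -1.0
tree_2 = lambda z: 2.0 if z[1] > 1.5 else 0.0
f0, lam = 3.0, 0.1                      # baseline and learning rate
ensemble = lambda z: f0 + lam * tree_1(z) + lam * tree_2(z)

background = [[0.0, 1.0], [1.0, 2.0], [0.2, 3.0], [0.9, 0.5]]
x = [1.0, 2.0]

phi_ensemble = shapley(ensemble, x, background)
phi_by_tree = [lam * a + lam * b
               for a, b in zip(shapley(tree_1, x, background),
                               shapley(tree_2, x, background))]
```

`phi_ensemble` and `phi_by_tree` agree up to rounding. The constant \(f_0\) contributes nothing to any \(\phi_{ij}\) and is absorbed into the baseline \(\phi_0\).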

Why a single tree is special

Active path intuition

A regression tree partitions the feature space and assigns a constant prediction at each leaf.

For each observation \(\mathbf{x}_i\), only one root-to-leaf path is active.

Therefore, only the variables used along that path can affect the prediction for that tree.

TreeSHAP payoff

Instead of averaging over all possible feature subsets by brute force, TreeSHAP:

follows the active path
accounts for alternative paths induced by missing/present features
aggregates marginal contributions efficiently

This makes exact SHAP values feasible for tree ensembles such as XGBoost.

Global SHAP importance

What drives predicted emissions?

Takeaway

Emissions are not explained only by vehicle structure:
they are also shaped by how people drive.

Behavioural archetypes

Towards actionable recommendations

From explanations to behavioural profiles

SHAP provides individual-level explanations.

For each customer \(i\) we obtain a SHAP vector

\[ \boldsymbol{\phi}_i = (\phi_{i1}, \dots, \phi_{ip}) \]

describing how behavioural variables contribute to predicted emissions.

The scalability problem

With 100,000 customers, SHAP yields 100,000 explanations.

Insurers cannot design 100,000 personalized interventions.

Instead, we need behavioural profiles that summarize recurring emission-driving patterns.

Idea

Cluster explanations, not customers.

We apply Archetypal Analysis to the SHAP vectors.

Archetypal Analysis on SHAP vectors

Data representation

Each customer is described by a SHAP explanation vector \(\boldsymbol{\phi}_i = (\phi_{i1},\dots,\phi_{ip})\)

Collecting them yields the SHAP matrix

\[ \boldsymbol{\Phi} = \begin{bmatrix} \boldsymbol{\phi}_1 \\ \vdots \\ \boldsymbol{\phi}_n \end{bmatrix} \in \mathbb{R}^{n\times p}. \]

Archetypal decomposition (Cutler & Breiman, 1994)

Archetypal Analysis approximates

\[ \boldsymbol{\Phi} \approx \mathbf{A}\mathbf{B}\boldsymbol{\Phi} \]

where

\(\mathbf{A}_{n\times k}\): customer → archetype mixture weights
\(\mathbf{B}_{k\times n}\): archetype → customer mixture weights

Rows of both matrices are non-negative and sum to one.

Interpretation of archetypes

Archetypes on SHAP

Archetypes are the rows of

\[ \mathbf{Z} = \mathbf{B}\boldsymbol{\Phi} \]

Each archetype is therefore a convex combination of SHAP vectors:

\[ \mathbf{z}_h = \sum_{\ell=1}^{n} B_{h\ell}\boldsymbol{\phi}_\ell \]

Interpretation

Archetypes represent extreme emission mechanisms,
i.e. extreme patterns of feature contributions to predicted emissions.

Each customer is expressed as a convex combination of archetypes

\[ \boldsymbol{\phi}_i \approx \sum_{h=1}^{k} A_{ih}\mathbf{z}_h \]

→ enabling behavioural segmentation and targeted recommendations.
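The two convex-combination steps can be illustrated on a toy SHAP matrix. The matrices \(\mathbf{A}\) and \(\mathbf{B}\) are hand-picked here to show the structure; an actual fit would estimate them numerically by minimising the reconstruction error under the simplex constraints. All values are invented.

```python
def matmul(P, Q):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

# Toy SHAP matrix Phi (n = 4 customers, p = 2 features); values are illustrative.
Phi = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.25, 0.25]]

# B (k x n): each archetype is a convex combination of customers (hand-picked).
B = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

# A (n x k): each customer is a convex combination of archetypes (hand-picked).
A = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.5, 0.25, 0.25]]

Z = matmul(B, Phi)            # archetypes, k x p
Phi_hat = matmul(A, Z)        # reconstruction A B Phi, n x p

rss = sum((Phi[i][j] - Phi_hat[i][j]) ** 2
          for i in range(len(Phi)) for j in range(len(Phi[0])))
```

In this toy case the three archetypes are the extreme customers, and the fourth customer lies inside their convex hull, so the reconstruction is exact.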

Why Archetypal Analysis?

Clustering vs archetypes

K-means

yields group-average behavioural profiles; when groups overlap, the averages end up looking alike

Archetypal Analysis

identifies extreme behavioural profiles
observations are mixtures of archetypes

Selecting the number of archetypes

Archetype A1 — Structural vehicle disadvantage

Takeaway

Predicted emissions are mainly increased by vehicle age and fuel-type effects, while trip-structure variables partly offset the prediction. Speeding behaviour remains a secondary behavioural lever

Archetype A2 — Duration-intensive profile

Takeaway

Emissions are driven upward by sustained trip duration and vehicle age, partially offset by average trip length and speed-related effects

Archetype A3 — Short-trip inefficiency

Takeaway

Very short average trip length is the dominant positive contribution, consistent with a short-trip inefficiency regime

Archetype A4 — Long-duration compensated profile

Takeaway

Very long trip duration pushes emissions upward, but long average trip length and a recent vehicle compensate, producing a mixed profile

Archetype A1 — Structural vehicle disadvantage

Mechanism

Predicted emissions are mainly driven by structural vehicle characteristics
(e.g., vehicle age and technology).

Behavioural recommendations

Limited behavioural leverage; focus on vehicle efficiency improvements:

Maintenance reminders (filters, tyres)
Maintain correct tyre pressure
Encourage efficient driving modes where available

Insurer action

Target vehicle upgrade incentives or eco-bonus programs rather than behavioural coaching

Archetype A2 — Duration-intensive profile

Mechanism

Long duration for moderately long trips drives emissions upward

Behavioural recommendations

Prefer routes with less stop–go congestion
Encourage steady cruising instead of speed oscillations

Insurer action

Introduce long-trip coaching through periodic feedback and driving goals

Archetype A3 — Short-trip inefficiency

Mechanism

Frequent short trips create inefficiency due to cold starts and incomplete engine warm-up

Behavioural recommendations

Trip chaining: combine errands into one trip
Shift very short trips to active mobility
Encourage smooth acceleration at trip start

Insurer action

Introduce a short-trip reduction challenge rewarding reductions in short-trip frequency

Archetype A4 — Long-duration compensated profile

Mechanism

Long trips tend to be efficient per km; the main risk comes from sustained high speeds

Behavioural recommendations

Avoid very high sustained speeds
Encourage anticipatory driving
Maintain existing efficient driving habits

Insurer action

Provide lightweight feedback and eco-driving recognition incentives

From archetypes to scalable interventions

guidance

Instead of producing 100,000 individual recommendations,
we design archetype-specific guidance

This enables:

nudging strategies
targeted coaching
eco-driving incentives aligned with emission mechanisms
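One simple way to turn mixture weights into guidance is to route each customer to the archetype with the largest weight. The dominant-archetype rule is an assumption of this sketch, the mixture weights are invented, and the guidance strings abbreviate the recommendations listed for each archetype.

```python
ARCHETYPES = {
    "A1": "Structural vehicle disadvantage",
    "A2": "Duration-intensive profile",
    "A3": "Short-trip inefficiency",
    "A4": "Long-duration compensated profile",
}
GUIDANCE = {
    "A1": "Maintenance reminders; vehicle upgrade incentives",
    "A2": "Prefer less congested routes; steady cruising",
    "A3": "Trip chaining; shift very short trips to active mobility",
    "A4": "Avoid very high sustained speeds; recognition incentives",
}

def dominant_archetype(weights):
    """weights: one row of A, i.e. mixture weights over A1..A4 summing to one."""
    labels = list(ARCHETYPES)
    best = max(range(len(weights)), key=lambda h: weights[h])
    return labels[best]

# Hypothetical mixture weights for two customers.
A_rows = {"c1": [0.10, 0.15, 0.70, 0.05], "c2": [0.55, 0.30, 0.05, 0.10]}
plan = {c: GUIDANCE[dominant_archetype(w)] for c, w in A_rows.items()}
```

Customers with no clearly dominant weight could instead receive blended guidance; that choice is left open here.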

Wrap-up

What we did

reconstructed microscopic inputs from coarse telematics
used NeuralMOVES to estimate high-frequency CO₂ emission rates at scale
built an accurate customer-level surrogate (XGBoost)
explained predictions with SHAP
segmented explanation patterns via Archetypal Analysis to enable targeted recommendations

Take-home messages

interpretability is not only about explaining models — it enables segmentation and action
applying AA to SHAP uncovers behavioural profiles aligned with emission mechanisms
recommendations become scalable without generating 100,000 individual plans

Outlook

integrate the recommendation layer into the eco-score pipeline
assess robustness across vehicle segments, seasons, and geography

References

Adavikottu, A., & Velaga, N. R. (2021). Analysis of factors influencing aggressive driver behavior and crash involvement. Traffic Injury Prevention, 22(sup1), S21–S26.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.

Cutler, A., & Breiman, L. (1994). Archetypal analysis. Technometrics, 36(4), 338–347.

Koupal, J., Cumberworth, M., Michaels, H., & Beardsley, M. (2003). Design and implementation of MOVES: EPA’s new generation mobile source emission model. Proceedings of the EMFAC Conference / Ann Arbor.

Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems (NeurIPS), 30.

McConky, K., Chen, R. B., & Gavi, G. R. (2018). A comparison of motivational and informational contexts for improving eco-driving performance. Transportation Research Part F: Traffic Psychology and Behaviour, 52, 62–74.

Molnar, C. (2025). Interpretable machine learning: A guide for making black box models explainable (3rd ed.). Retrieved from https://christophm.github.io/interpretable-ml-book

Nævestad, T.-O. (2022). Eco driving as a road safety measure: Before and after study of three companies. Transportation Research Part F: Traffic Psychology and Behaviour, 91, 95–115.

Park, S., Lee, J.-B., & Lee, C. (2016). State-of-the-art automobile emissions models: A review. KSCE Journal of Civil Engineering, 20(3), 1053–1065.

Ramirez-Sanchez, E., Tang, C., Xu, Y., Renganathan, N., Jayawardana, V., He, Z., & Wu, C. (2026). NeuralMOVES: A lightweight and microscopic vehicle emission estimation model based on reverse engineering and surrogate learning. Transportation Research Part C: Emerging Technologies, 168, 104014.

Shapley, L. S. (1953). A value for n-person games. In H. W. Kuhn & A. W. Tucker (Eds.), Contributions to the theory of games II (pp. 307–317). Princeton University Press.

Xu, Y., Li, H., Liu, H., Rodgers, M. O., & Guensler, R. L. (2017). Eco-driving for transit: An effective strategy to conserve fuel and emissions. Applied Energy, 194, 784–797.

Zhou, M., Jin, H., & Wang, W. (2016). A review of vehicle fuel consumption models to evaluate eco-driving and eco-routing. Transportation Research Part D: Transport and Environment, 49, 203–218.

Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67(2), 301–320.