AI-Driven Double Materiality: Benchmarking Autonomous Agents Against Human Experts

In corporate sustainability, the Double Materiality Assessment (DMA) is the bedrock of compliance with the EU's Corporate Sustainability Reporting Directive (CSRD) and the European Sustainability Reporting Standards (ESRS). Typically, a comprehensive DMA requires hundreds of hours of consultant-led interviews, stakeholder surveys, and qualitative alignment sessions.

To evaluate whether technology can streamline this bottleneck, ExecutESG, the Hanken School of Economics (Sustainability Transformation Lab), and Ab Stormossen Oy conducted a joint academic study.

The project was carried out by a team of Hanken master's students, who served as the lead researchers and main contributors, under the academic guidance of Professor Martin Fougère and Nikodemus Solitander.

🎓 Hanken Student Research Team & Lead Contributors

This academic study was conducted by the following master's students as the primary researchers and core contributors of this project:

Aleksandra Uzunova (Master's Student, Hanken)
Wen Zhang (Master's Student, Hanken)
Henrik Skjolden (Master's Student, Hanken)
Trisha Shenoy (Master's Student, Hanken)

Academic Guidance & Advising: Professor Martin Fougère and Nikodemus Solitander (Sustainability Transformation Lab, Hanken School of Economics).

The core research question evaluated was:

Can autonomous, persona-driven AI agents produce a Double Materiality Assessment of comparable quality to human sustainability experts, suitable for board-level decision-making?

To answer this, the research team ran a side-by-side comparison of three distinct methodologies applied to Stormossen, a municipal waste management and circular economy operator in Vaasa, Finland.

📊 The Three Methodologies Compared

The study benchmarked three different DMA processes, each leveraging different inputs, degrees of stakeholder involvement, and levels of automation:

Feature / Dimension	1️⃣ Human-Driven DMA	2️⃣ ExecutESG Custom AI DMA	3️⃣ ChatGPT-Driven DMA (GPT-4)
Primary Inputs	Stakeholder workshops, pre-task surveys, manual discussions	Internal strategic presentations, academic papers, regulatory guidelines	Publicly available company website info, general sector assumptions
Stakeholder Input	Lived, qualitative input from 15 distinct stakeholder groups	Simulated by 8 distinct management personas (CEO, COO, HR, etc.)	None (zero-shot generative profiling)
Prioritization Method	Stakeholder voting rounds and group consensus	Proprietary Pairwise Comparison Engine (Playwright automation)	Direct top-down scores estimated by the model
Timeline	~3 months	~2 hours	< 15 minutes

📈 Key Findings: Merits & Limitations of Each Approach

The study revealed that all three approaches converged on the same core material topics (specifically Circular Economy, Pollution, Climate Change, and Own Workforce). However, they differed significantly in how materiality was constructed, yielding distinct strengths and limitations:

1️⃣ Human-Driven DMA

Merits: Captures exceptional operational nuance and local context. For example, the human participants identified that Stormossen's environmental success depends heavily on customer sorting habits and communication logistics, insights that help design effective physical interventions. It also surfaces how priorities differ across stakeholder sub-groups (e.g., the board vs. local residents).
Limitations: Highly resource-intensive, taking months of facilitation and incurring substantial advisory costs.
Strategic Value: The process itself builds essential organizational buy-in and psychological alignment. Discussing, negotiating, and voting on material topics prepares the board and management team to execute the resulting strategy.

2️⃣ ExecutESG Custom AI-Driven DMA

Merits: Grounded in internal documents (reducing generic LLM hallucinations) and highly systematic. It excelled at linking environmental impacts to financial consequences, for instance by mapping groundwater contamination risks to specific liabilities and insurance premium increases. It is extremely fast (running in hours) and structures hundreds of IROs (impacts, risks, and opportunities) consistently.
Limitations: Treats the company as a single unified entity, without the qualitative stakeholder granularity of human workshops. It also requires a second round of human review to correct occasional classification errors (e.g., misclassifying an own-workforce impact under circular economy).
Proprietary Advantage: Unlike generic AI prompting, ExecutESG uses a forced pairwise comparison algorithm. By making agents choose between two items at a time ("Which issue is more significant: A or B?"), it derives a "Win Rate" for every IRO, which is then calibrated to the 1–5 significance scale commonly used in ESRS assessments. This sidesteps the rater fatigue and arbitrary absolute scores that affect long manual scoring sessions.
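The general idea behind a pairwise "Win Rate" can be illustrated with a minimal Python sketch. ExecutESG's engine is proprietary and its calibration is not described here, so everything below is an assumption: the `win_rates`, `to_significance`, and `persona` helpers are hypothetical, the mapping to 1–5 uses five equal-width bands chosen for illustration, and the two toy personas with invented weights stand in for LLM agents. The sketch shows the technique only: every judge picks a winner in each pair, an IRO's win rate is the share of its comparisons it won, and that rate is binned onto a 1–5 scale.

```python
from itertools import combinations
from typing import Callable, Dict, List

# A judge receives two IRO labels and returns the one it considers more significant.
Judge = Callable[[str, str], str]


def win_rates(iros: List[str], judges: List[Judge]) -> Dict[str, float]:
    """Each judge (e.g. one simulated persona) compares every pair of IROs once."""
    wins = {iro: 0 for iro in iros}
    contests = {iro: 0 for iro in iros}
    for a, b in combinations(iros, 2):
        for judge in judges:
            winner = judge(a, b)
            if winner not in (a, b):
                raise ValueError(f"judge must pick {a!r} or {b!r}, got {winner!r}")
            wins[winner] += 1
            contests[a] += 1
            contests[b] += 1
    # Win rate = comparisons won / comparisons taken part in.
    return {iro: (wins[iro] / contests[iro] if contests[iro] else 0.0) for iro in iros}


def to_significance(rate: float) -> int:
    """Hypothetical calibration: five equal-width bands of [0, 1] mapped to 1-5."""
    return min(5, 1 + int(rate * 5))


def persona(weights: Dict[str, int]) -> Judge:
    """Toy stand-in for an LLM agent: prefers the IRO it weights higher."""
    return lambda a, b: a if weights[a] >= weights[b] else b


# Invented persona weights for illustration only; not data from the study.
ceo = persona({"IRO A": 4, "IRO B": 5, "IRO C": 3, "IRO D": 2})
hr = persona({"IRO A": 3, "IRO B": 4, "IRO C": 2, "IRO D": 5})

iros = ["IRO A", "IRO B", "IRO C", "IRO D"]
rates = win_rates(iros, [ceo, hr])
for iro, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{iro}: win rate {rate:.2f} -> significance {to_significance(rate)}")
```

In this toy run, IRO B wins 5 of its 6 comparisons (win rate 0.83, significance 5) while IRO C wins only 1 of 6 (significance 1). Equal-width bands are just one possible calibration; a production engine could instead fit the mapping against expert-rated reference IROs.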

3️⃣ ChatGPT-Driven DMA (GPT-4)

Merits: Offers a very low barrier to entry for a first-pass, ESRS-aligned baseline structure, and generates clean narrative summaries and structured tables.
Limitations: Lacks connection to company-specific realities, local operational nuances, or internal data. Without stakeholder validation, it remains generalized.

🧠 The Conclusion: AI as a Helper, Not a Replacement

The Hanken report concludes that AI-driven DMAs cannot fully replace human participation.

Because double materiality is fundamentally an interpretive and consensus-building process, human judgement is necessary to validate findings, resolve conflicting interests, and, most importantly, translate reporting outputs into actual strategic execution. Lived stakeholder experience cannot be fully simulated from a database.

However, the study highlights that AI-driven DMA can be highly valuable as a helper and complement:

For Resource-Constrained Companies (SMEs): Smaller businesses often lack the budget and time for months of consultant-led workshops. For these companies, an AI-driven DMA is highly relevant, providing a structured, ESRS-aligned, and rigorous baseline ("just to have something solid") to kickstart their sustainability journey.
As a Hybrid Workflow: For larger organizations, the optimal approach is a hybrid model. Use AI to ingest company documents, run initial context screenings, and generate candidate topic longlists. Then, use human workshops to validate, prioritize, and build organizational alignment around those topics.
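The hybrid workflow can be sketched as a simple triage step in Python. This is an illustrative assumption, not the study's or ExecutESG's implementation: the `CandidateTopic` fields, the `ai_shortlist` and `apply_workshop` helpers, the cut-off of 3 on a 1–5 scale, and all topic scores are invented. The one rule taken from ESRS itself is that a sustainability matter is material if it is material from the impact perspective, the financial perspective, or both. The sketch applies that rule to AI-generated scores to build a shortlist, then lets workshop decisions have the final say.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical materiality cut-off on a 1-5 scale (an assumption, not an ESRS value).
THRESHOLD = 3


@dataclass
class CandidateTopic:
    name: str
    impact_score: int     # 1-5 impact-materiality score from the AI screening
    financial_score: int  # 1-5 financial-materiality score from the AI screening


def ai_shortlist(longlist: List[CandidateTopic], threshold: int = THRESHOLD) -> List[CandidateTopic]:
    """Keep topics material from the impact perspective, the financial perspective, or both."""
    return [t for t in longlist
            if t.impact_score >= threshold or t.financial_score >= threshold]


def apply_workshop(shortlist: List[CandidateTopic],
                   decisions: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Humans have the final say: return (confirmed topics, topics awaiting review)."""
    confirmed = [t.name for t in shortlist if decisions.get(t.name) is True]
    pending = [t.name for t in shortlist if t.name not in decisions]
    return confirmed, pending


# Invented scores for illustration only; they are not results from the study.
longlist = [
    CandidateTopic("Topic A", impact_score=5, financial_score=4),
    CandidateTopic("Topic B", impact_score=2, financial_score=4),
    CandidateTopic("Topic C", impact_score=3, financial_score=1),
    CandidateTopic("Topic D", impact_score=2, financial_score=2),
    CandidateTopic("Topic E", impact_score=4, financial_score=3),
]
shortlist = ai_shortlist(longlist)
# Workshop outcome: A and E confirmed, B rejected, C not yet discussed.
confirmed, pending = apply_workshop(shortlist, {"Topic A": True, "Topic B": False, "Topic E": True})
print("Confirmed:", confirmed)
print("Awaiting review:", pending)
```

Topic B shows the intended division of labour: the AI screening flags it on financial grounds alone, but the workshop's rejection overrides that flag, so it appears in neither output list.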

By combining the structural efficiency and mathematical consistency of ExecutESG's pairwise consensus engine with the qualitative wisdom of human stakeholders, companies can produce superior, audit-ready disclosures in a fraction of the time.

📂 Downloads & Full Reports

You can download the official presentation and complete academic report here:

📄 Download ExecutESG Final Presentation (PDF)
📘 Download Hanken DMA Final Report (PDF)
