Developing Trust in AI for Financial Services: Current Progress and Future Directions
- Anshuman Prasad
- Jul 8
- 8 min read
Updated: Jul 23
The rapid integration of artificial intelligence (AI) into financial services has reached a critical point. Recently, Sam Altman highlighted significant limitations of AI systems, including issues such as "hallucinations"—instances where AI models produce plausible but incorrect information—and opaque reasoning, which obscures how decisions are made. These concerns underline the necessity for rigorous measures to ensure that AI systems, particularly in finance, are transparent, reliable, and trustworthy. Given that financial decisions have profound impacts on individuals and economies, establishing trust in AI is both critical and urgent. This article explores various strategies currently employed to foster trust in financial AI, examines key regulatory frameworks, and highlights areas requiring future innovation.

Usage of AI in Financial Services
AI has a long history of use in financial services, extending beyond the recent surge in generative AI applications. Traditional uses include algorithmic trading, fraud detection, and credit risk modeling, employing techniques such as random forests, gradient boosting machines, and neural networks.
The advent of large language models (LLMs) and generative AI has expanded AI's application scope significantly. Financial institutions—including banks, insurance companies, and fintech startups—are adopting these advanced models for tasks such as document summarization, customer support chatbots, internal workflow automation, and even programming assistance. Recent surveys indicate that 75% of financial services firms are already using AI, with an additional 10% planning implementation within three years. The adoption spans fraud detection (60% of European institutions), credit scoring systems (63% of firms), and algorithmic trading (over 50% of trading firms). These newer applications, despite their promise, introduce substantial risks, including reputational damage and potential financial losses, thereby underscoring the importance of robust trust frameworks.
The Trust Imperative in Financial AI
Historically, financial institutions built trust through human interactions and their established reputations. AI, however, challenges this traditional model by introducing systems whose decision-making processes are complex and not inherently transparent. Consumers demand transparency from AI-driven financial services, and industry practitioners specifically prioritize "explainability"—the ability to clearly understand how decisions are reached. Concern is greatest when AI makes decisions with serious financial consequences, such as loan denials, trading actions, or fraud alerts.
Current Trust-Building Frameworks in Financial Services
1. Regulatory-Driven Governance
The 2008 financial crisis exposed severe shortcomings in risk modeling, prompting regulations such as SR 11-7 in the United States, which set comprehensive standards for model risk management (MRM). Comprehensive as it was, however, SR 11-7 was not written with AI in mind, and its ambiguity has allowed some AI algorithms to fall outside MRM's scope, depending on how individual banks define and classify models.
Recent regulatory evolutions have increasingly included explicit guidelines for AI and machine learning (ML) applications:
EU AI Act mandates regular bias audits and classifies lending and credit scoring AI as "high-risk."
Monetary Authority of Singapore (MAS) AI MRM requires centralized model inventories and real-time monitoring capabilities for proactive management.
UK Prudential Regulation Authority (PRA) SS 1/23 insists on independent validation teams and detailed lifecycle documentation for all ML models.
New York Department of Financial Services (NYDFS) requires banks to disclose the use of AI in credit decisions explicitly, including the types of data utilized.
California's Consumer Privacy Act (CCPA) regulates AI-driven data collection and usage, mandating transparency about automated decision-making processes.
Canada's Directive on Automated Decision-Making mandates assessments for fairness, transparency, and bias mitigation in AI-based systems.
These evolving regulations emphasize the necessity of transparency, fairness, and accountability in AI systems within financial services.
2. Enhanced Model Risk Management (MRM)
Modern MRM practices now demand continuous, real-time monitoring instead of periodic checks. Regulatory frameworks like the UK's PRA SS 1/23 mandate:
Continuous drift detection in model performance
Ongoing bias assessments (such as demographic parity and equalized odds)
Comprehensive, transparent audit trails
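The two bias checks named above can be sketched in a few lines. This is a minimal illustration with toy predictions and group labels, not real lending data, and the 50/50 group split is invented for the example:

```python
def demographic_parity_diff(preds, groups):
    """Absolute gap in positive-prediction rates between two groups."""
    rate = lambda g: sum(p for p, grp in zip(preds, groups) if grp == g) / groups.count(g)
    return abs(rate("A") - rate("B"))

def equalized_odds_gap(preds, labels, groups):
    """Largest gap in true-positive or false-positive rates across groups."""
    def rates(g):
        tp = sum(1 for p, y, grp in zip(preds, labels, groups) if grp == g and y == 1 and p == 1)
        pos = sum(1 for y, grp in zip(labels, groups) if grp == g and y == 1)
        fp = sum(1 for p, y, grp in zip(preds, labels, groups) if grp == g and y == 0 and p == 1)
        neg = sum(1 for y, grp in zip(labels, groups) if grp == g and y == 0)
        return tp / pos, fp / neg
    tpr_a, fpr_a = rates("A")
    tpr_b, fpr_b = rates("B")
    return max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b))

# Toy data: model decisions (preds), true outcomes (labels), protected group.
preds  = [1, 0, 1, 1, 0, 1, 0, 0]
labels = [1, 0, 1, 0, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(demographic_parity_diff(preds, groups))          # 0.5 -- group A approved far more often
print(equalized_odds_gap(preds, labels, groups))       # 0.5 -- error rates also diverge
```

A continuous monitoring pipeline would recompute these gaps on each scoring batch and alert when they breach a policy threshold.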
Advanced platforms automate governance activities, facilitating dynamic oversight. These platforms use machine learning to monitor model performance continuously, detect anomalies, and rapidly assess impacts of drift and bias. Automated alerting ensures immediate response to issues, enhancing reliability and accountability.
Additionally, advanced analytics tools enable detailed scenario analyses, aligning AI models with regulatory and ethical standards. Financial institutions are increasingly adopting specialized MRM tools that automate document generation, template population for model development and validation, and workflow configurations that integrate controls throughout the model lifecycle.
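One common drift statistic such platforms compute is the population stability index (PSI), which compares a baseline score distribution with a recent one. The implementation below is a self-contained sketch; the bin count and the conventional "PSI above 0.25 means major shift" rule of thumb are illustrative choices, not regulatory requirements:

```python
import math

def psi(expected, actual, bins=5):
    """Population Stability Index between a baseline sample and a recent sample."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] += 1e-9                      # include the maximum value in the last bin

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            for i in range(bins):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break                   # values outside the baseline range are ignored
        return [max(c / len(sample), 1e-6) for c in counts]   # floor avoids log(0)

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]       # e.g. last quarter's score distribution
recent   = [0.5 + 0.5 * s for s in baseline]   # scores have drifted upward
print(round(psi(baseline, baseline), 4))       # ~0: identical distributions
print(psi(baseline, recent) > 0.25)            # True: large shift, would trigger an alert
```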
3. Mechanistic Interpretability
Explainability of machine learning models has long been a concern, and black-box evaluation using traditional methods such as LIME and SHAP has well-known shortcomings. Mechanistic interpretability is a newer, more advanced approach that seeks to deeply understand the internal workings of complex models, especially neural networks.

Key techniques include (see Tatsat et al., Beyond the Black Box: Interpretability of LLMs in Finance):
Sparse Autoencoders (SAEs): Neural networks that isolate specific internal features (e.g., neurons linked explicitly to credit risk), providing clarity on model decisions. Recent research shows that SAEs can decompose complex model activations into interpretable features, making it possible to systematically map model behavior to human-understandable concepts. As highlighted in Neuronpedia, this approach helps address the challenge of polysemantic neurons—where a single neuron encodes multiple, unrelated features—by enabling more granular, monosemantic representations.
Logit Lens: Tracing predictions layer-by-layer, enhancing transparency. This method allows researchers to examine how information propagates through each stage of the model, revealing the intermediate computations that lead to the final output.
Activation Patching: Identifying specific components responsible for decisions, allowing targeted adjustments. By editing internal activations and observing changes in model outputs, practitioners can causally attribute certain behaviors to specific features or circuits within the network, which is crucial for debugging and safety validation.
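To make the SAE idea concrete, here is a minimal forward-pass sketch in NumPy. The dimensions, random untrained weights, and fake activations are all toy assumptions; real SAEs are trained on activations captured from a live model, but the structure—a wide ReLU encoder, a linear decoder, and a reconstruction-plus-L1 loss—is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: d_model residual-stream activations are expanded into a wider
# dictionary of n_features sparsely-firing candidate "concepts".
d_model, n_features = 16, 64
W_enc = rng.normal(0, 0.1, (d_model, n_features))
b_enc = np.zeros(n_features)
W_dec = rng.normal(0, 0.1, (n_features, d_model))

def sae_forward(x, l1_coeff=1e-3):
    """Encode activations into sparse features, reconstruct, and score the loss."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)      # ReLU keeps features non-negative and sparse
    x_hat = f @ W_dec                           # linear decoder reconstructs the activations
    recon_loss = np.mean((x - x_hat) ** 2)      # reconstruction term
    sparsity_loss = l1_coeff * np.abs(f).sum()  # L1 term pushes most features to zero
    return f, x_hat, recon_loss + sparsity_loss

x = rng.normal(size=(4, d_model))               # a batch of fake model activations
features, x_hat, loss = sae_forward(x)
print(features.shape, x_hat.shape)              # (4, 64) (4, 16)
```

After training, individual columns of the feature dictionary can be inspected and labeled (e.g., "this feature fires on credit-risk language"), which is what makes the decomposition useful for audit.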
4. Adversarial Protections
Adversarial testing systematically exposes and mitigates AI vulnerabilities by simulating realistic attacks. Red-team events at DEF CON evaluate models against threats such as data poisoning and model theft, complemented by NIST standards involving reverse stress tests and synthetic fraud scenarios.
DEF CON is the world's largest annual cybersecurity convention, where hackers, security professionals, and researchers gather to share knowledge, techniques, and discoveries in digital security. The DEF CON AI Village hosts specialized events focused on AI security, including the Generative AI Red Team (GRT) challenges, where thousands of participants attempt to identify vulnerabilities in large language models from major AI companies. The GRT events have exposed significant security flaws; the largest involved 2,244 hackers evaluating 8 LLMs across 21 topics ranging from cybersecurity to misinformation. These events have contributed to improved AI safety measures across the industry.
The National Institute of Standards and Technology (NIST) is a U.S. federal agency that develops standards and guidelines to promote innovation and ensure public trust in technology. NIST's AI Risk Management Framework (AI RMF) provides voluntary guidance for organizations to identify, assess, and mitigate AI risks throughout the entire AI lifecycle. The framework includes four core functions—GOVERN, MAP, MEASURE, and MANAGE—and emphasizes trustworthy AI characteristics including validity, reliability, safety, security, accountability, explainability, and fairness. The framework includes standards for testing, evaluation, verification, and validation (TEVV) of AI systems, along with comprehensive documentation requirements for system transparency.
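A reverse stress test in this spirit can be sketched as a search for small input perturbations that flip a model's decision. Everything here is invented for illustration: the fraud_score stand-in model, its weights, the threshold, and the perturbation budget are all hypothetical:

```python
import random

random.seed(1)

# Hypothetical stand-in for a deployed fraud model: a weighted sum of two
# engineered signals, flagged when it crosses a threshold.
def fraud_score(amount_ratio, velocity):
    return 0.6 * amount_ratio + 0.4 * velocity

def perturbation_probe(point, eps=0.05, trials=200, threshold=0.5):
    """Search random perturbations within +/-eps for a decision flip."""
    base = fraud_score(*point) >= threshold
    flips = []
    for _ in range(trials):
        cand = tuple(v + random.uniform(-eps, eps) for v in point)
        if (fraud_score(*cand) >= threshold) != base:
            flips.append(cand)
    return flips

# A transaction sitting on the decision boundary flips under tiny perturbations...
near = perturbation_probe((0.5, 0.5))
# ...while a clearly fraudulent one is stable under the same budget.
far = perturbation_probe((0.9, 0.9))
print(len(near) > 0, len(far) == 0)   # True True
```

Reporting which inputs flip easily tells validators where the model's decisions are fragile and where synthetic fraud scenarios should be concentrated.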
5. Evolving Testing Metrics
The landscape of AI testing metrics is advancing rapidly, driven by the need for evaluation approaches sophisticated enough to address the complex challenges posed by modern AI systems. Frameworks like Sudjianto et al.'s HCAT and Liang et al.'s HELM propose advanced evaluation strategies emphasizing embedding-based metrics, robustness testing, and human calibration.
The HCAT framework introduces sophisticated embedding-based metrics that move beyond traditional n-gram approaches like BLEU and ROUGE. These metrics use contextual embeddings from models like BERT to measure semantic similarity, capturing paraphrases and contextual nuances that surface-level methods miss. HELM's comprehensive approach evaluates 30+ language models across 42 scenarios, measuring seven key metrics including accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency, ensuring that non-accuracy metrics receive proper attention.
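The core of an embedding-based metric is cosine similarity between sentence vectors. In the sketch below the "embeddings" are hand-made toy vectors so the example is self-contained; in practice they would come from a contextual encoder such as BERT:

```python
import math

# Toy embeddings, invented so that the two paraphrases point in a similar
# direction and the unrelated sentence does not.
emb = {
    "the loan was denied":          [0.90, 0.10, 0.30],
    "the application was rejected": [0.85, 0.15, 0.35],
    "the weather is sunny":         [0.10, 0.90, 0.20],
}

def cosine(u, v):
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

para  = cosine(emb["the loan was denied"], emb["the application was rejected"])
unrel = cosine(emb["the loan was denied"], emb["the weather is sunny"])
print(para > unrel)   # True: the paraphrase scores higher despite sharing no n-grams
```

This is exactly the failure mode of BLEU and ROUGE that embedding metrics fix: the two loan sentences share no surface tokens, so an n-gram overlap score would rate them as dissimilar.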
Persistent Challenges
Despite significant advances, several challenges remain:
Generative AI Risks: AI-generated "hallucinations" evade detection, representing a critical reliability issue.
Scalability Issues: Interpretability methods struggle with large-scale models exceeding 10 billion parameters, as computational requirements grow steeply with model size.
Bias Entrenchment: Persistent racial disparities in lending decisions despite bias mitigation efforts.
Regulatory Fragmentation: Lack of interoperability among global regulatory frameworks creates compliance complexity for multinational institutions.
The Trust Roadmap: 2025 and Beyond

Automated and ongoing AI Model Testing integrated into MRM
AI/ML models update their parameters dynamically and continuously, which drives the need to automate comprehensive MRM activities through new-age platforms. This includes real-time cataloguing, monitoring, and testing of models in use across a bank’s environment. Advanced platforms will maintain dynamic model inventories that automatically discover and catalog AI models, while intelligent workflow orchestration will provide configurable rule engines tailored to different risk tiers and regulatory jurisdictions. Models will undergo ongoing, automatically triggered testing, with human oversight reserved for studying outliers, investigating unusual patterns, and analyzing reports. This level of automation will help MRM teams expand the set of models under purview to accommodate the ever-expanding AI use-cases within financial services.
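The inventory-plus-trigger idea can be sketched as follows. The class names, fields, and check functions are illustrative, not drawn from any specific MRM platform:

```python
from dataclasses import dataclass, field

@dataclass
class ModelRecord:
    name: str
    risk_tier: str                      # e.g. "high" for credit decisioning
    tests: list = field(default_factory=list)

class ModelInventory:
    def __init__(self):
        self.records = {}

    def register(self, record):
        """Catalog a discovered model so it falls under MRM oversight."""
        self.records[record.name] = record

    def run_triggered_tests(self, name):
        """Run every check attached to a model; failures are routed to reviewers."""
        record = self.records[name]
        return [t.__name__ for t in record.tests if not t()]

# Hypothetical checks wired to a hypothetical credit model.
def drift_within_bounds(): return True
def bias_within_bounds():  return False    # simulate a flagged bias result

inv = ModelInventory()
inv.register(ModelRecord("credit_scorer_v3", "high",
                         [drift_within_bounds, bias_within_bounds]))
print(inv.run_triggered_tests("credit_scorer_v3"))   # ['bias_within_bounds']
```

In a real platform the registry would be populated by automated discovery rather than manual registration, and the rule engine would schedule different test suites per risk tier and jurisdiction.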
Embedded Cryptographic Trust: Zero-Knowledge Proofs
Zero-knowledge proofs (ZKPs) are advanced cryptographic techniques that allow one party to prove to another that a certain statement about an AI model is true—such as fairness, regulatory compliance, or correct execution—without revealing any sensitive details about the model itself or its underlying data. This is particularly important in financial services, where models often contain proprietary algorithms and use confidential customer data. ZKPs enable financial institutions to demonstrate to regulators and third parties that their AI models meet specific legal or ethical standards (e.g., absence of bias, adherence to lending rules) without exposing the model’s inner workings or sensitive training data. This preserves intellectual property and privacy while still ensuring accountability.
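The flavor of a ZKP can be illustrated with the textbook Schnorr identification protocol, in which a prover demonstrates knowledge of a secret x satisfying g^x = y (mod p) without revealing x. This is only a conceptual toy: proving model fairness in zero knowledge requires far more elaborate proof systems, the prime and generator here are sketch-sized choices, and a real Schnorr implementation would reduce the response modulo the group order:

```python
import random

random.seed(7)

p = 2**127 - 1                     # a Mersenne prime; toy-sized for the sketch
g = 3

x = random.randrange(2, p - 1)     # the secret witness (stand-in for "the model complies")
y = pow(g, x, p)                   # public value the verifier already holds

# One round of interaction:
r = random.randrange(2, p - 1)     # prover picks a one-time nonce
t = pow(g, r, p)                   # ...and sends the commitment t
c = random.randrange(2, p - 1)     # verifier replies with a random challenge
s = r + c * x                      # prover's response; the nonce r masks x

# Verifier accepts iff g^s == t * y^c (mod p), never learning x itself.
print(pow(g, s, p) == (t * pow(y, c, p)) % p)   # True
```

The check works because g^s = g^(r + c·x) = g^r · (g^x)^c = t · y^c (mod p), so a prover who did not know x could not have produced a consistent response to a random challenge.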
Cross-Industry Stress Testing
The Singapore AI Verify Foundation’s AI Assurance Pilot in 2025 brought together 17 organizations deploying GenAI applications across 10 industries—including banking, insurance, and technology—and paired them with 16 specialist AI testing firms from Singapore and eight other countries. These real-world pilots tested a diverse set of live applications, most with a human in the loop, and focused on surfacing and codifying emerging norms and best practices for technical evaluation. The pilot emphasized the importance of context-specific risk assessment, simulation testing for edge cases, and the value of independent, external evaluation to uncover systemic vulnerabilities.
In parallel, initiatives like the AI Safety Institute’s Turing Trials are advancing cross-industry stress testing by providing structured, multi-scenario evaluations that simulate adversarial threats and operational edge cases. Collectively, these efforts are shaping a more transparent, reliable, and collaborative approach to stress testing across the financial sector and beyond.
Future LLM Explainability Research
In their paper Beyond the Black Box: Interpretability of LLMs in Finance, Tatsat et al (2025) note that while current techniques like sparse autoencoders and feature attribution can reveal some internal mechanisms, future research must address the challenge of polysemantic neurons and emergent behaviors in models with trillions of parameters. This requires scalable methods to map internal model components to human-understandable concepts, especially as models become more complex and dynamic. They also stress the importance of developing causal tracing tools that can establish clear, auditable links between model inputs, internal reasoning steps, and outputs. This is crucial for meeting regulatory requirements and providing actionable explanations in regulated industries.
Future explainability research should focus on building domain-aware interpretability frameworks that can translate LLM reasoning into financial concepts, such as risk factors or compliance criteria, making explanations more relevant and actionable for practitioners.
Conclusion
Trust in financial AI is foundational. Institutions must integrate rigorous regulatory compliance, advanced interpretability, sophisticated adversarial protections, and robust evaluation frameworks. Future success depends on proactive industry-wide collaboration, alignment with global regulatory standards, and sustained investment in AI innovations.
Further Reading and References
European Commission. (2025). EU AI Act: Financial Services Annex.
Monetary Authority of Singapore. (2025). AI Model Risk Management Guidelines.
UK PRA. (2023). SS 1/23: Supervisory Statement on Model Risk Management.
NYDFS. (2024). AI Use in Financial Decision Making Regulations.
CCPA. (2020). Automated Decision-Making Transparency Requirements.
Canada Treasury Board Secretariat. (2021). Directive on Automated Decision-Making.
Sudjianto, A. et al. (2024). Human-Calibrated Automated Testing (HCAT).
Liang, P. et al. (2023). Holistic Evaluation of Language Models (HELM).
Tatsat, H. and Shater, A. (2025). Beyond the Black Box: Interpretability of LLMs in Finance.
DEFCON AI Red Team. (2025). Financial AI Vulnerability Database.
NIST (2024). Adversarial Testing Standards for AI Models.
Neuronpedia. (2024). Sparse Autoencoders. Available at https://docs.neuronpedia.org/sparse-autoencoder
Turing Trials. (2025). UK AI Safety Institute Cross-Industry Stress Testing. Available at https://www.turing.ac.uk/news/new-ai-security-initiative-set-boost-uks-resilience-against-hostile-threats
Singapore AI Verify Foundation. (2025). Main Report on AI Assurance Pilot of Technical Testing of Generative AI Applications.
Quantum Cryptography Report. (2024). Quantum Key Distribution and Post-Quantum Cryptography Standards.