Contents

  • I. Paper Information
  • II. Paper Structure
  • III. Paper Content
    • Abstract
    • Abstract (translation)

I. Paper Information

Title: A Search-Based Testing Approach for Deep Reinforcement Learning Agents

Year of publication: 2022

Journal/Conference: arXiv

Paper link: https://arxiv.org/abs/2206.07813

Authors: Amirhossein Zolfagharian, Manel Abdellatif, Lionel Briand, Mojtaba Bagherzadeh, and Ramesh S

II. Paper Structure

1. Introduction
2. Background
	2.1 Definitions
	2.2 State Abstraction
3. Problem Definition
	3.1 RL Agent Testing Challenges
	3.2 Assumptions
4. Approach
	4.1 Reformulation as a Search Problem
	4.2 Overview of the Approach
	4.3 Initial Population
	4.4 Fitness Computations
	4.5 Search Operators
	4.6 Execution of Final Results
5. Empirical Evaluation
	5.1 Research Questions
	5.2 Case Study
	5.3 Implementation
	5.4 Evaluation and Results
6. Discussions
7. Threats to Validity
8. Related Work
9. Conclusion

III. Paper Content

Abstract

Deep Reinforcement Learning (DRL) algorithms have been increasingly employed during the last decade to solve various decision-making problems such as autonomous driving and robotics. However, these algorithms have faced great challenges when deployed in safety-critical environments since they often exhibit erroneous behaviors that can lead to potentially critical errors.

One way to assess the safety of DRL agents is to test them to detect possible faults leading to critical failures during their execution. This raises the question of how we can efficiently test DRL policies to ensure their correctness and adherence to safety requirements.

Most existing works on testing DRL agents use adversarial attacks that perturb states or actions of the agent. However, such attacks often lead to unrealistic states of the environment. Their main goal is to test the robustness of DRL agents rather than testing the compliance of agents’ policies with respect to requirements.

Due to the huge state space of DRL environments, the high cost of test execution, and the black-box nature of DRL algorithms, the exhaustive testing of DRL agents is impossible. In this paper, we propose a Search-based Testing Approach of Reinforcement Learning Agents (STARLA) to test the policy of a DRL agent by effectively searching for failing executions of the agent within a limited testing budget. We use machine learning models and a dedicated genetic algorithm to narrow the search towards faulty episodes.

We apply STARLA on a Deep-Q-Learning agent which is widely used as a benchmark and show that it significantly outperforms Random Testing by detecting more faults related to the agent’s policy. We also investigate how to extract rules that characterize faulty episodes of the DRL agent using our search results. Such rules can be used to understand the conditions under which the agent fails and thus assess its deployment risks.

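Abstract (translation)

Over the last decade, deep reinforcement learning (DRL) algorithms have been increasingly used to solve various decision-making problems, such as autonomous driving, trading decisions, and robotics. However, these algorithms face great challenges when deployed in safety-critical environments, because they often exhibit erroneous behaviors that can lead to potentially critical errors.

One way to assess the safety of DRL agents is to test them in order to detect faults that could lead to critical failures during execution. This raises the question of how to test DRL policies efficiently to ensure their correctness and their adherence to safety requirements.

Most existing work on testing DRL agents uses adversarial attacks that perturb the agent's states or actions. However, such attacks often lead to unrealistic states of the environment. Moreover, their main goal is to test the robustness of DRL agents rather than the compliance of the agents' policies with respect to requirements.

Because of the huge state space of DRL environments, the high cost of test execution, and the black-box nature of DRL algorithms, exhaustive testing of DRL agents is impossible. The paper proposes a Search-based Testing Approach of Reinforcement Learning Agents (STARLA), which tests the policy of a DRL agent by effectively searching for failing executions of the agent within a limited testing budget. It relies on machine learning models and a dedicated genetic algorithm to narrow the search towards faulty episodes (that is, sequences of states and actions produced by the DRL agent). Applied to a Deep Q-Learning agent that is widely used as a benchmark, STARLA significantly outperforms random testing by detecting more faults related to the agent's policy.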
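To make the idea of searching for failing executions under a fixed budget more concrete, here is a minimal Python sketch. It is not STARLA: the environment, the `policy` function with its seeded blind spot, the single margin-based fitness, and every constant are assumptions made up for illustration, and the machine learning models mentioned in the abstract are left out entirely. It only shows the general shape of a genetic search next to random testing when both are given the same number of episode executions.

```python
import random

LIMIT = 1.5   # track half-width; leaving it counts as a failure (assumed)
STEPS = 50    # episode length (assumed)

def policy(pos, vel):
    """Hypothetical agent under test: pushes back towards the centre."""
    if 0.5 < pos < 0.9:
        return 1.0                      # seeded flaw: pushes outwards in this band
    return -1.0 if pos + vel > 0 else 1.0

def run_episode(pos, vel):
    """Execute one episode and return the smallest margin to the track edge."""
    margin = LIMIT - abs(pos)
    for _ in range(STEPS):
        vel += 0.1 * policy(pos, vel)
        pos += 0.1 * vel
        margin = min(margin, LIMIT - abs(pos))
        if margin < 0:                  # failure reached, stop early
            break
    return margin

def random_individual(rng):
    """A test input is an initial (position, velocity) pair."""
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))

def crossover(a, b, rng):
    """Swap one gene between two parents."""
    return (a[0], b[1]) if rng.random() < 0.5 else (b[0], a[1])

def mutate(ind, rng, sigma=0.1):
    """Add Gaussian noise and clamp back into the allowed input range."""
    return tuple(max(-1.0, min(1.0, g + rng.gauss(0.0, sigma))) for g in ind)

def genetic_search(budget=200, pop_size=20, seed=0):
    """Minimise the margin; pop_size is assumed to be even."""
    rng = random.Random(seed)
    half = pop_size // 2
    scored = sorted((run_episode(*ind), ind)
                    for ind in (random_individual(rng) for _ in range(pop_size)))
    evaluations = pop_size
    history = [scored[0][0]]            # best margin after each generation
    while evaluations + half <= budget:
        elites = scored[:half]          # keep the most failure-prone half
        parents = [ind for _, ind in elites]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(half)]
        scored = sorted(elites + [(run_episode(*c), c) for c in children])
        evaluations += half
        history.append(scored[0][0])
    return scored, history, evaluations

def random_search(budget=200, seed=0):
    """Baseline: spend the same budget on independent random inputs."""
    rng = random.Random(seed)
    return min(run_episode(*random_individual(rng)) for _ in range(budget))

if __name__ == "__main__":
    scored, history, used = genetic_search()
    print("GA best margin:", history[-1], "after", used, "episodes")
    print("random best margin:", random_search())
```

A negative margin means the toy agent left the track, so lower fitness is "closer to failure". Because the better half of each generation is carried over unchanged, the best margin can never get worse from one generation to the next; whether the search actually reaches a failure in this toy setting depends on the assumed dynamics and seed.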

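The authors also investigate how to use the search results to extract rules that characterize the faulty episodes of the DRL agent. Such rules can be used to understand the conditions under which the agent fails, and thus to assess the risks of deploying it.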
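The abstract does not say which learner produces these rules, so the following Python sketch swaps in a much simpler technique: a scan over single-condition rules of the form "the episode visits abstract state s, therefore it is faulty", ranked by precision and recall. The episode data, the state labels `S1` to `S4`, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter

def single_state_rules(episodes, min_support=2):
    """Rank rules 'episode visits state s => faulty'.

    episodes: list of (set_of_abstract_states, is_faulty) pairs.
    Returns (state, precision, recall) tuples, best rule first.
    """
    seen = Counter()      # episodes that visit each state
    faulty = Counter()    # faulty episodes that visit each state
    total_faulty = sum(1 for _, is_faulty in episodes if is_faulty)
    for states, is_faulty in episodes:
        for s in states:
            seen[s] += 1
            if is_faulty:
                faulty[s] += 1
    rules = []
    for s, n in seen.items():
        if n < min_support:           # ignore states seen too rarely
            continue
        precision = faulty[s] / n
        recall = faulty[s] / total_faulty if total_faulty else 0.0
        rules.append((s, precision, recall))
    # Highest precision first, then highest recall, then name for a stable order.
    rules.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rules

# Made-up labelled episodes, each reduced to the abstract states it visited.
episodes = [
    ({"S1", "S4"}, True),
    ({"S1", "S2"}, True),
    ({"S2", "S3"}, False),
    ({"S3"}, False),
    ({"S1", "S3"}, True),
    ({"S2", "S4"}, False),
]

if __name__ == "__main__":
    for state, precision, recall in single_state_rules(episodes):
        print(f"visits {state} => faulty  precision={precision:.2f} recall={recall:.2f}")
```

In this made-up data, every episode that visits `S1` is faulty and every faulty episode visits `S1`, so that rule ranks first. A rule of this kind is what would let a reader say under which conditions the agent tends to fail.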
