Table of Contents

    • I. A Look at the Official Docs (optional)
      • 1. Overview
      • 2. Configuration
        • 2.1. Setting up ResourceManager to use CapacityScheduler
        • 2.2. Setting capacity-scheduler.xml
      • 3. Changing Queue Configuration
      • 4. Updating a Container (Experimental - API may change in the future)
    • II. Hands-On: Setting Up Queues
      • 1. Enable the CapacityScheduler
      • 2. Configure capacity-scheduler.xml
        • 2.1. Configure queue resources
        • 2.2. Unified access control
      • 3. Apply the configuration
    • Full Configuration Example

By configuring YARN resource queues, you can isolate resources between different lines of business. Giving each queue an elastic range also lets a queue that is short on resources borrow idle resources from other queues.

Official docs: Hadoop CapacityScheduler

I. A Look at the Official Docs (optional)

1. Overview

First, a quick sense of what the CapacityScheduler is for: it targets multi-tenant scenarios; simply put, different lines of business can be assigned different queues and resources.

The CapacityScheduler is designed to run Hadoop applications as a shared, multi-tenant cluster in an operator-friendly manner while maximizing the throughput and the utilization of the cluster.

2. Configuration

2.1. Setting up ResourceManager to use CapacityScheduler

Set the following in yarn-site.xml:

Property: yarn.resourcemanager.scheduler.class
Value: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
2.2. Setting capacity-scheduler.xml

etc/hadoop/capacity-scheduler.xml is the configuration file for the CapacityScheduler.

Now configure capacity-scheduler.xml:

1. Setting up queues

Every queue we configure below is a child of the root queue. The child queues under a queue are specified as a comma-separated list.

The CapacityScheduler has a predefined queue called root. All queues in the system are children of the root queue.

Further queues can be setup by configuring yarn.scheduler.capacity.root.queues with a list of comma-separated child queues.

The queue path concept: a queue is identified by its queue path, the full path of the queue's hierarchy, starting at root, with . (dot) marking the parent-child relationship.

yarn.scheduler.capacity.<queue-path>.queues
The configuration for CapacityScheduler uses a concept called queue path to configure the hierarchy of queues. The queue path is the full path of the queue’s hierarchy, starting at root, with . (dot) as the delimiter.

For example:

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>a,b,c</value>
  <description>The queues at this level (root is the root queue).
  </description>
</property>

<property>
  <name>yarn.scheduler.capacity.root.a.queues</name>
  <value>a1,a2</value>
  <description>The queues at this level (root is the root queue).
  </description>
</property>

<property>
  <name>yarn.scheduler.capacity.root.b.queues</name>
  <value>b1,b2,b3</value>
  <description>The queues at this level (root is the root queue).
  </description>
</property>

2. Queue Properties

  • Resource Allocation
  • Resource Allocation using Absolute Resources configuration (see the sketch below)
  • Running and Pending Application Limits
  • Queue Administration & Permissions
  • Queue Mapping based on User or Group, Application Name or user defined placement rules
  • Queue lifetime for applications
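
Of the topics above, "Resource Allocation using Absolute Resources configuration" is worth a quick illustration: a queue's capacity can be given as absolute amounts of memory and vcores instead of percentages. A minimal sketch, assuming an online queue (values are illustrative; note that percentage and absolute modes cannot be mixed within one hierarchy):

<property>
  <name>yarn.scheduler.capacity.root.online.capacity</name>
  <!-- absolute resources instead of a percentage -->
  <value>[memory=10240,vcores=12]</value>
</property>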

3. Application Priority

Application priority works only along with the FIFO ordering policy; the default ordering policy is FIFO.
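
Two knobs matter here; a hedged sketch, with queue name and values purely illustrative: the cluster-wide maximum priority lives in yarn-site.xml, and a leaf queue's default priority lives in capacity-scheduler.xml.

<!-- yarn-site.xml: highest priority an application may request -->
<property>
  <name>yarn.cluster.max-application-priority</name>
  <value>10</value>
</property>

<!-- capacity-scheduler.xml: default priority for an online leaf queue -->
<property>
  <name>yarn.scheduler.capacity.root.online.default-application-priority</name>
  <value>5</value>
</property>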

4. Capacity Scheduler Container Preemption

The CapacityScheduler supports preempting containers from queues whose resource usage exceeds their guaranteed capacity.
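
Preemption is driven by the scheduling monitor, which is off by default. A minimal sketch of enabling it in yarn-site.xml, plus the per-queue opt-out in capacity-scheduler.xml (the online queue here is illustrative):

<!-- yarn-site.xml: enable the scheduling monitor, which drives preemption -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>

<!-- capacity-scheduler.xml: exempt one queue's containers from preemption -->
<property>
  <name>yarn.scheduler.capacity.root.online.disable_preemption</name>
  <value>true</value>
</property>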

5. Reservation Properties

6. Configuring ReservationSystem with CapacityScheduler

7. Dynamic Auto-Creation and Management of Leaf Queues
The CapacityScheduler supports automatically creating leaf queues under a parent queue via queue mappings.
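
A minimal sketch, assuming a parent queue named dynamic under root (the name and template capacity are illustrative); each user mapped into it gets a leaf queue auto-created from the template:

<!-- capacity-scheduler.xml -->
<property>
  <name>yarn.scheduler.capacity.root.dynamic.auto-create-child-queue.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dynamic.leaf-queue-template.capacity</name>
  <value>10</value>
</property>
<property>
  <!-- map each user to an auto-created leaf queue under dynamic -->
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <value>u:%user:dynamic.%user</value>
</property>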

8. Other Properties


3. Changing Queue Configuration

Changing queue/scheduler properties and adding/removing queues can be done in two ways: via file or via API. This behavior can be changed via yarn.scheduler.configuration.store.class in yarn-site.xml. Possible values are file, which allows modifying properties via file; memory, which allows modifying properties via API, but does not persist changes across restart; leveldb, which allows modifying properties via API and stores changes in a leveldb backing store; and zk, which allows modifying properties via API and stores changes in a zookeeper backing store. The default value is file.

So there are two ways to configure queues: via API or via file. Since a restart discards queue configuration modified through the API (unless the changes are persisted, e.g. in ZooKeeper), this article configures queues via file (see the store-class sketch after the steps below):

  1. Edit capacity-scheduler.xml and yarn-site.xml.
  2. Run yarn rmadmin -refreshQueues to make the queue configuration take effect.
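
For reference, the store backend is selected in yarn-site.xml; file is the default, while zk keeps API-made changes across RM restarts:

<property>
  <name>yarn.scheduler.configuration.store.class</name>
  <value>file</value>
</property>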

4. Updating a Container (Experimental - API may change in the future)

Something to look forward to.


II. Hands-On: Setting Up Queues

1. Enable the CapacityScheduler

Edit yarn-site.xml:

<!-- Use the CapacityScheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>


2. Configure capacity-scheduler.xml

2.1. Configure queue resources

  1. Define child queues: split the cluster's resources into three queues: default, online, and offline.
  2. Assign queue capacities: e.g. 20%, 30%, and 50% respectively; sibling capacities must sum to 100%.
  3. Configure elastic queues: e.g. the online queue is guaranteed 30% of cluster resources by default, with a cap of 50%; when other queues are idle it may use up to 50% of the cluster.

[root@bigdata01 hadoop]# vi capacity-scheduler.xml
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,online,offline</value>
    <description>List of child queues, comma-separated</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>20</value>
    <description>default queue: 20%</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.online.capacity</name>
    <value>30</value>
    <description>online queue: 30%</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.offline.capacity</name>
    <value>50</value>
    <description>offline queue: 50%</description>
  </property>
  <!-- Elastic queues: per-queue resource caps -->
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>40</value>
    <description>Upper bound of resources the default queue may use.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.online.maximum-capacity</name>
    <value>50</value>
    <description>Upper bound of resources the online queue may use.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.offline.maximum-capacity</name>
    <value>60</value>
    <description>Upper bound of resources the offline queue may use.</description>
  </property>


2.2. Unified access control

Once resources are assigned, access to a queue is strictly controlled: only authorized users may submit applications to it or administer them.
Permissions come in two kinds, submit and administer:

  • Submit permission: required to submit applications to the queue;
  • Administer permission: required to kill applications in the queue;

Submit permissions

 <!-- The three queues -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,online,offline</value>
    <description>The queues at this level (root is the root queue).</description>
  </property>

  <!-- queue-name=root -->
  <property>
    <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
    <value> </value> <!-- a single space means no one may submit to the root queue -->
  </property>
  <!-- queue-name=root.default -->
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>test,b1</value> <!-- only users test and b1 may submit to the default queue -->
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.online.acl_submit_applications</name>
    <value>test</value> <!-- only user test may submit to the online queue -->
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.offline.acl_submit_applications</name>
    <value>b1</value> <!-- only user b1 may submit to the offline queue -->
  </property>

Administer permissions:

  <!-- queue-name=root -->
  <property>
    <name>yarn.scheduler.capacity.root.acl_administer_queue</name>
    <value> </value> <!-- ACLs are inherited, so the parent queue must be locked down too -->
  </property>
  <!-- queue-name=root.default -->
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>test,a1</value> <!-- only users test and a1 may kill applications in the default queue -->
  </property>
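
Note that queue ACLs are only enforced when ACLs are enabled in yarn-site.xml (yarn.acl.enable defaults to false):

<property>
  <name>yarn.acl.enable</name>
  <value>true</value>
</property>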


3. Apply the configuration

`yarn rmadmin -refreshQueues` 
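
A quick way to check that the refreshed queues are live and to route a test job to one of them; a sketch only, in which the examples-jar path is illustrative, and the job must run as a user allowed by the queue's submit ACL (e.g. test for the online queue above):

[root@bigdata01 hadoop]# yarn queue -status online
[root@bigdata01 hadoop]# hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi -Dmapreduce.job.queuename=online 2 4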


Full Configuration Example

Note: this example uses three queues named default, test1, and test2 (30%/30%/40%); adapt the names and capacities to your own setup.

<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
    <description>
      Maximum number of applications that can be pending and running.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.1</value>
    <description>
      Maximum percent of resources in the cluster which can be used to run 
      application masters i.e. controls number of concurrent running
      applications.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    <description>
      The ResourceCalculator implementation to be used to compare 
      Resources in the scheduler.
      The default i.e. DefaultResourceCalculator only uses Memory while
      DominantResourceCalculator uses dominant-resource to compare 
      multi-dimensional resources such as Memory, CPU etc.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,test1,test2</value>
    <description>
      The queues at this level (root is the root queue).
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>30</value>
    <description>Default queue target capacity.</description>
  </property>
<property>
    <name>yarn.scheduler.capacity.root.test1.capacity</name>
    <value>30</value>
    <description>test1 queue target capacity.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.test2.capacity</name>
    <value>40</value>
    <description>test2 queue target capacity.</description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description>
      Default queue user limit a percentage from 0.0 to 1.0.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.test1.user-limit-factor</name>
    <value>1</value>
    <description>
      test1 queue user limit a percentage from 0.0 to 1.0.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.test2.user-limit-factor</name>
    <value>1</value>
    <description>
      test2 queue user limit a percentage from 0.0 to 1.0.
    </description>
  </property>
  

  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>70</value>
    <description>
      The maximum capacity of the default queue. 
    </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.test1.maximum-capacity</name>
    <value>70</value>
    <description>
      The maximum capacity of the test1 queue.
    </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.test2.maximum-capacity</name>
    <value>70</value>
    <description>
      The maximum capacity of the test2 queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description>
      The state of the default queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.test1.state</name>
    <value>RUNNING</value>
    <description>
      The state of the test1 queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.test2.state</name>
    <value>RUNNING</value>
    <description>
      The state of the test2 queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
    <description>
      The ACL of who can submit jobs to the default queue.
    </description>
  </property>
   <property>
    <name>yarn.scheduler.capacity.root.test1.acl_submit_applications</name>
    <value>*</value>
    <description>
      The ACL of who can submit jobs to the test1 queue.
    </description>
  </property>
   <property>
    <name>yarn.scheduler.capacity.root.test2.acl_submit_applications</name>
    <value>*</value>
    <description>
      The ACL of who can submit jobs to the test2 queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>*</value>
    <description>
      The ACL of who can administer jobs on the default queue.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.test1.acl_administer_queue</name>
    <value>*</value>
    <description>
      The ACL of who can administer jobs on the test1 queue.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.test2.acl_administer_queue</name>
    <value>*</value>
    <description>
      The ACL of who can administer jobs on the test2 queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_application_max_priority</name>
    <value>*</value>
    <description>
      The ACL of who can submit applications with configured priority.
      For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
    </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.test1.acl_application_max_priority</name>
    <value>*</value>
    <description>
      The ACL of who can submit applications with configured priority.
      For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
    </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.test2.acl_application_max_priority</name>
    <value>*</value>
    <description>
      The ACL of who can submit applications with configured priority.
      For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
    </description>
  </property>
  

   <property>
     <name>yarn.scheduler.capacity.root.default.maximum-application-lifetime
     </name>
     <value>-1</value>
     <description>
        Maximum lifetime of an application which is submitted to a queue
        in seconds. Any value less than or equal to zero will be considered as
        disabled.
        This will be a hard time limit for all applications in this
        queue. If positive value is configured then any application submitted
        to this queue will be killed after exceeds the configured lifetime.
        User can also specify lifetime per application basis in
        application submission context. But user lifetime will be
        overridden if it exceeds queue maximum lifetime. It is point-in-time
        configuration.
        Note : Configuring too low value will result in killing application
        sooner. This feature is applicable only for leaf queue.
     </description>
   </property>

   <property>
     <name>yarn.scheduler.capacity.root.default.default-application-lifetime
     </name>
     <value>-1</value>
     <description>
        Default lifetime of an application which is submitted to a queue
        in seconds. Any value less than or equal to zero will be considered as
        disabled.
        If the user has not submitted application with lifetime value then this
        value will be taken. It is point-in-time configuration.
        Note : Default lifetime can't exceed maximum lifetime. This feature is
        applicable only for leaf queue.
     </description>
   </property>

  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>
      Number of missed scheduling opportunities after which the CapacityScheduler 
      attempts to schedule rack-local containers.
      When setting this parameter, the size of the cluster should be taken into account.
      We use 40 as the default value, which is approximately the number of nodes in one rack.
      Note, if this value is -1, the locality constraint in the container request
      will be ignored, which disables the delay scheduling.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
    <value>-1</value>
    <description>
      Number of additional missed scheduling opportunities over the node-locality-delay
      ones, after which the CapacityScheduler attempts to schedule off-switch containers,
      instead of rack-local ones.
      Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler will
      attempt rack-local assignments after 40 missed opportunities, and off-switch assignments
      after 40+20=60 missed opportunities.
      When setting this parameter, the size of the cluster should be taken into account.
      We use -1 as the default value, which disables this feature. In this case, the number
      of missed opportunities for assigning off-switch containers is calculated based on
      the number of containers and unique locations specified in the resource request,
      as well as the size of the cluster.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value></value>
    <description>
      A list of mappings that will be used to assign jobs to queues
      The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
      Typically this list will be used to map users to queues,
      for example, u:%user:%user maps all users to queues with the same name
      as the user.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
    <description>
      If a queue mapping is present, will it override the value specified
      by the user? This can be used by administrators to place jobs in queues
      that are different than the one specified by the user.
      The default is false.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
    <value>1</value>
    <description>
      Controls the number of OFF_SWITCH assignments allowed
      during a node's heartbeat. Increasing this value can improve
      scheduling rate for OFF_SWITCH containers. Lower values reduce
      "clumping" of applications on particular nodes. The default is 1.
      Legal values are 1-MAX_INT. This config is refreshable.
    </description>
  </property>


  <property>
    <name>yarn.scheduler.capacity.application.fail-fast</name>
    <value>false</value>
    <description>
      Whether RM should fail during recovery if previous applications'
      queue is no longer valid.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.workflow-priority-mappings</name>
    <value></value>
    <description>
      A list of mappings that will be used to override application priority.
      The syntax for this list is
      [workflowId]:[full_queue_name]:[priority][,next mapping]*
      where an application submitted (or mapped to) queue "full_queue_name"
      and workflowId "workflowId" (as specified in application submission
      context) will be given priority "priority".
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.workflow-priority-mappings-override.enable</name>
    <value>false</value>
    <description>
      If a priority mapping is present, will it override the value specified
      by the user? This can be used by administrators to give applications a
      priority that is different than the one specified by the user.
      The default is false.
    </description>
  </property>

</configuration>
