


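  • 1. Three common schedulers
    • 1.1. FIFO scheduler
    • 1.2. Capacity scheduler
    • 1.3. Fair scheduler
  • 2. Capacity Scheduler: multi-queue configuration
  • 3. Vocabulary
  • 4. Default configuration [capacity-scheduler.xml]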



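1. Three common schedulers

1.1. FIFO scheduler

  • FIFO Scheduler: first-in, first-out scheduler
  • A job that enters the queue later must wait until the jobs ahead of it have left the queue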


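1.2. Capacity scheduler

  • Capacity Scheduler
  • Roughly equivalent to several FIFO schedulers running side by side
  • Jobs in different queues can run in parallel (for example, 3 queues can run 3 jobs at once)
  • Jobs in the same queue are served in FIFO order, so by default a later job waits behind the jobs ahead of it
  • Each queue is given a share of the cluster's capacity (for example, queue a 40% and queue b 60%)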



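1.3. Fair scheduler

  • Fair Scheduler
  • Like the Capacity Scheduler, it supports multiple queues; the difference is that its leaf queues are not FIFO
  • Within a single leaf queue, all jobs can run concurrently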
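
Which of the three schedulers the ResourceManager actually uses is selected in yarn-site.xml. A minimal sketch, assuming a stock Apache Hadoop install; the file and property below are not mentioned in these notes, and the class names are the ones shipped with Hadoop:

```xml
<configuration>
  <!-- Pick exactly one scheduler implementation for the ResourceManager -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <!-- Capacity Scheduler. The alternatives are
         org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler and
         org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler -->
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
</configuration>
```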



2. Capacity Scheduler: multi-queue configuration


vim $HADOOP_HOME/etc/hadoop/capacity-scheduler.xml



3. Change the capacity share of the queue named default
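
A sketch of what this step typically looks like in capacity-scheduler.xml, assuming a second queue named hive sits next to default. The 40 and 60 figures are illustrative assumptions, not values taken from these notes; the property names are the ones used by the stock file.

```xml
<configuration>
  <!-- Declare the queues directly under root (hive is the added queue) -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,hive</value>
  </property>
  <!-- Shrink default from 100 so that sibling queues can still sum to 100 -->
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
  </property>
  <!-- Ceiling that default may grow to when other queues are idle (assumed value) -->
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>60</value>
  </property>
</configuration>
```

The capacities of all queues under the same parent have to add up to 100, so with default at 40 the hive queue would take the remaining 60.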



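    <description>Capacity share of the queue named hive under root</description>
    <description>Access control list: which users may access this queue</description>
    <description>Access control list: which users may administer jobs on this queue</description>
    <description>Maximum lifetime of an application submitted to this queue (-1 means unlimited)</description>
    <description>Default lifetime of an application submitted to this queue (-1 means unlimited; must not exceed the maximum lifetime)</description>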
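
The five descriptions above belong to per-queue properties. A hedged sketch of how they might look for the hive queue, using the property names from the stock capacity-scheduler.xml; every value here is an assumption chosen for illustration:

```xml
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.hive.capacity</name>
    <value>60</value>
    <description>Capacity share of the queue named hive under root</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.hive.acl_submit_applications</name>
    <!-- * lets every user submit to the queue -->
    <value>*</value>
    <description>Which users may access this queue</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.hive.acl_administer_queue</name>
    <value>*</value>
    <description>Which users may administer jobs on this queue</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.hive.maximum-application-lifetime</name>
    <!-- seconds; -1 disables the limit -->
    <value>-1</value>
    <description>Maximum lifetime of an application submitted to this queue</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.hive.default-application-lifetime</name>
    <!-- seconds; -1 disables the limit -->
    <value>-1</value>
    <description>Default lifetime of an application submitted to this queue</description>
  </property>
</configuration>
```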

5. Distribute the configuration file $HADOOP_HOME/etc/hadoop/capacity-scheduler.xml



  • In Java code, set the target queue on org.apache.hadoop.conf.Configuration:
// Route the job to the queue named hive (mapreduce.job.queuename is the current property name)
configuration.set("mapreduce.job.queuename", "hive");
  • HIVE


3. Vocabulary

  • FIFO: first-in first-out
  • ACL: Access Control Lists
  • schedule /ˈskedʒuːl/: n. plan, timetable; v. to arrange in advance
  • application /ˌæplɪˈkeɪʃn/: n. use, request, application program


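4. Default configuration [capacity-scheduler.xml]

<!-- Maximum number of pending and running applications in the Capacity Scheduler -->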
    <description>Maximum number of applications that can be pending and running.</description>

<!-- Maximum share of resources that can be used to run Application Masters -->
Maximum percent of resources in the cluster which can be used to run 
application masters i.e. controls number of concurrent running applications.
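
For orientation, the two descriptions above correspond to the following entries in the capacity-scheduler.xml that ships with Apache Hadoop. The property names and values are assumed from that stock file, since they do not survive in the text here:

```xml
<configuration>
  <!-- Cluster-wide cap on applications that are pending or running -->
  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
  </property>
  <!-- 0.1 means at most 10% of cluster resources may go to Application Masters -->
  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.1</value>
  </property>
</configuration>
```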

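<!-- The ResourceCalculator the Capacity Scheduler uses to compare resources. By default it compares memory only;
another calculator can be chosen to compare several dimensions (memory, CPU, and so on) -->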
The ResourceCalculator implementation to be used to compare Resources in the scheduler.
The default i.e. DefaultResourceCalculator only uses Memory while
DominantResourceCalculator uses dominant-resource to compare 
multi-dimensional resources such as Memory, CPU etc.
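
To make scheduling decisions account for CPU as well as memory, the calculator can be swapped. A minimal sketch, assuming the stock property name (it is not present in the surviving text):

```xml
<configuration>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <!-- The default is org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator (memory only) -->
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>
</configuration>
```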

<!-- Names of the queues under the queue named root (a single queue, default, out of the box; several queues can be configured) -->
    <description>The queues at this level (root is the root queue).</description>

<!-- Capacity share of the queue named default under root -->
    <description>Default queue target capacity.</description>

<!-- Upper bound on the share of this queue's resources that a single user may occupy (keeps one user from taking everything) -->
    <description>Default queue user limit, a percentage from 0.0 to 1.0.</description>

<!-- Maximum capacity share of this queue -->

<!-- State of this queue (RUNNING or STOPPED) -->

<!-- Access control list: which users may access this queue -->

<!-- Access control list: which users may administer jobs on this queue -->
    <description>The ACL of who can administer jobs on the default queue.</description>

The ACL of who can submit applications with configured priority.
For example: [user={name} group={name} max_priority={priority} default_priority={priority}]

<!-- Maximum lifetime of an application submitted to this queue (-1 means unlimited) -->
Maximum lifetime of an application which is submitted to a queue
in seconds. Any value less than or equal to zero will be considered as disabled.
This will be a hard time limit for all applications in this
queue. If a positive value is configured, any application submitted
to this queue will be killed once it exceeds the configured lifetime.
Users can also specify a lifetime per application in the
application submission context, but the user-specified lifetime will be
overridden if it exceeds the queue maximum lifetime. It is point-in-time configuration.
Note: Configuring too low a value will result in killing applications
sooner. This feature is applicable only to leaf queues.

<!-- Default lifetime of an application submitted to this queue (-1 means unlimited; must not exceed the maximum lifetime) -->
Default lifetime of an application which is submitted to a queue
in seconds. Any value less than or equal to zero will be considered as disabled.
If the user has not submitted the application with a lifetime value, then this
value will be taken. It is point-in-time configuration.
Note: Default lifetime can't exceed maximum lifetime. This feature is
applicable only to leaf queues.

Number of missed scheduling opportunities after which the CapacityScheduler 
attempts to schedule rack-local containers.
When setting this parameter, the size of the cluster should be taken into account.
We use 40 as the default value, which is approximately the number of nodes in one rack.
Note, if this value is -1, the locality constraint in the container request
will be ignored, which disables the delay scheduling.

Number of additional missed scheduling opportunities over the node-locality-delay
ones, after which the CapacityScheduler attempts to schedule off-switch containers,
instead of rack-local ones.
Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler will
attempt rack-local assignments after 40 missed opportunities, and off-switch assignments
after 40+20=60 missed opportunities.
When setting this parameter, the size of the cluster should be taken into account.
We use -1 as the default value, which disables this feature. In this case, the number
of missed opportunities for assigning off-switch containers is calculated based on
the number of containers and unique locations specified in the resource request,
as well as the size of the cluster.
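
The 40 + 20 = 60 example above could be written out as follows. The two property names are assumed from the stock file; 20 is the illustrative value used in the description, whereas the shipped default for the second property is -1:

```xml
<configuration>
  <!-- After 40 missed opportunities, try rack-local containers -->
  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
  </property>
  <!-- After 20 further misses (60 in total), try off-switch containers -->
  <property>
    <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
    <value>20</value>
  </property>
</configuration>
```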

A list of mappings that will be used to assign jobs to queues
The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
Typically this list will be used to map users to queues,
for example, u:%user:%user maps all users to queues with the same name
as the user.

If a queue mapping is present, will it override the value specified
by the user? This can be used by administrators to place jobs in queues
that are different than the one specified by the user.
The default is false.
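
A sketch of the mapping syntax in use. The property names are assumed from the stock file; the user alice and the group analysts are hypothetical names introduced only for this example:

```xml
<configuration>
  <!-- Mappings are evaluated left to right and the first valid one is used:
       user alice goes to hive, members of group analysts go to default -->
  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value>u:alice:hive,g:analysts:default</value>
  </property>
  <!-- true lets a mapping win over the queue named at submission time -->
  <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>true</value>
  </property>
</configuration>
```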

Controls the number of OFF_SWITCH assignments allowed
during a node's heartbeat. Increasing this value can improve
scheduling rate for OFF_SWITCH containers. Lower values reduce
"clumping" of applications on particular nodes. The default is 1.
Legal values are 1-MAX_INT. This config is refreshable.

Whether RM should fail during recovery if previous applications'
queue is no longer valid.
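
The last two descriptions map to the entries below. The names and default values are assumed from the stock capacity-scheduler.xml rather than taken from the surviving text:

```xml
<configuration>
  <!-- OFF_SWITCH assignments allowed per node heartbeat (legal range 1 to MAX_INT) -->
  <property>
    <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
    <value>1</value>
  </property>
  <!-- false: the ResourceManager does not fail during recovery when an application's queue is no longer valid -->
  <property>
    <name>yarn.scheduler.capacity.application.fail-fast</name>
    <value>false</value>
  </property>
</configuration>
```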
