  • Reference: https://nsq.io/overview/design.html

QUICK START

  • The following steps will run a small NSQ cluster on your local machine and walk through publishing, consuming, and archiving messages to disk.

    1. follow the instructions in the INSTALLING doc.

    2. in one shell, start nsqlookupd:

       $ nsqlookupd
      
    3. in another shell, start nsqd:

       $ nsqd --lookupd-tcp-address=127.0.0.1:4160
      
    4. in another shell, start nsqadmin:

       $ nsqadmin --lookupd-http-address=127.0.0.1:4161
      
    5. publish an initial message (creates the topic in the cluster, too):

       $ curl -d 'hello world 1' 'http://127.0.0.1:4151/pub?topic=test'
      
    6. finally, in another shell, start nsq_to_file:

       $ nsq_to_file --topic=test --output-dir=/tmp --lookupd-http-address=127.0.0.1:4161

    7. publish more messages to nsqd:

       $ curl -d 'hello world 2' 'http://127.0.0.1:4151/pub?topic=test'
       $ curl -d 'hello world 3' 'http://127.0.0.1:4151/pub?topic=test'
      
    8. to verify things worked as expected, in a web browser open http://127.0.0.1:4171/ to view the nsqadmin UI and see statistics. Also, check the contents of the log files (test.*.log) written to /tmp.

    (nsqd plays a broker-like role: it receives messages from producers and fans them out through one or more channels to consumers. nsqlookupd is the discovery service.)

  • The important lesson here is that nsq_to_file (the client) is not explicitly told where the test topic (the topic name the producer used in the curl commands above) is produced; it retrieves this information from nsqlookupd and, despite the timing of the connection, no messages are lost.
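
The curl commands in steps 5 and 7 are plain HTTP POSTs, so the same publish can be expressed in a few lines of Python. This is a minimal sketch using only the standard library; the helper name build_pub_request is made up for illustration, and the address 127.0.0.1:4151 is simply the one used in the quick start. The request is built but not sent, so the snippet runs without a cluster.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_pub_request(topic: str, body: bytes,
                      nsqd_http: str = "127.0.0.1:4151") -> Request:
    """Build (without sending) the POST that `curl -d ... /pub?topic=...` issues."""
    url = f"http://{nsqd_http}/pub?{urlencode({'topic': topic})}"
    return Request(url, data=body, method="POST")


req = build_pub_request("test", b"hello world 1")
print(req.get_method(), req.full_url)

# To actually publish (requires the nsqd from step 3 to be running):
#   from urllib.request import urlopen
#   urlopen(req).read()
```

Nothing in the request says where consumers are; that side is left entirely to nsqlookupd.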

Features And Guarantees

NSQ is a realtime distributed messaging platform.
Features
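  1. No single point of failure

  2. Horizontally scalable

  3. Low latency

  4. Load-balanced and multicast message routing

  5. Good at both streaming (high-throughput) and job-oriented (low-throughput) workloads

  6. Messages start in memory and spill transparently to disk past a threshold

    … (the remaining items are not summarized; see the full list below)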

  • support distributed topologies with no SPOF (single point of failure)

  • horizontally scalable (no brokers, seamlessly add more nodes to the cluster)

  • low-latency push based message delivery (performance)

  • combination load-balanced and multicast style message routing

  • excel at both streaming (high-throughput) and job oriented (low-throughput) workloads

  • primarily in-memory (beyond a high-water mark messages are transparently kept on disk)

  • runtime discovery service for consumers to find producers (nsqlookupd)

  • transport layer security (TLS)

  • data format agnostic

  • few dependencies (easy to deploy) and a sane, bounded, default configuration

  • simple TCP protocol supporting client libraries in any language

  • HTTP interface for stats, admin actions, and producers (no client library needed to publish)

  • integrates with statsd for realtime instrumentation

  • robust cluster administration interface (nsqadmin)

Guarantees
As with any distributed system, achieving your goal is a matter of making intelligent tradeoffs. By being transparent about the reality of these tradeoffs we hope to set expectations about how NSQ will behave when deployed in production.
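  1. Messages are not durable by default

    Messages live in memory by default, unless --mem-queue-size is set to 0.

    There is no built-in replication, but there are several ways to achieve fault tolerance.

  2. Messages are delivered at least once

    In many situations (client timeouts, disconnections, requeues, and so on) the same message may be delivered more than once, so the client must de-duplicate.

  3. Messages received are unordered

  4. Consumers eventually find all topic producers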

  • messages are not durable by default

    Although the system supports a “release valve” (--mem-queue-size) after which messages will be transparently kept on disk, it is primarily an in-memory messaging platform.

    --mem-queue-size can be set to 0 to ensure that all incoming messages are persisted to disk. In this case, if a node failed, you are susceptible to a reduced failure surface (i.e. did the OS or underlying IO subsystem fail).

    There is no built in replication. However, there are a variety of ways this tradeoff is managed such as deployment topology and techniques which actively slave and persist topics to disk in a fault-tolerant fashion.

  • messages are delivered at least once

    Closely related to above, this assumes that the given nsqd node does not fail.

    This means, for a variety of reasons, messages can be delivered multiple times (client timeouts, disconnections, requeues, etc.). It is the client’s responsibility to perform idempotent operations or de-dupe.

  • messages received are un-ordered

    You cannot rely on the order of messages being delivered to consumers.

    Similar to message delivery semantics, this is the result of (1) requeues, (2) the combination of in-memory and on disk storage, and (3) the fact that each nsqd node shares nothing.

    It is relatively straightforward to achieve loose ordering (i.e. for a given consumer its messages are ordered but not across the cluster as a whole) by introducing a window of latency in your consumer to accept messages and order them before processing (although, in order to preserve this invariant one must drop messages falling outside that window).

  • consumers eventually find all topic producers

    The discovery service (nsqlookupd) is designed to be eventually consistent. nsqlookupd nodes do not coordinate to maintain state or answer queries.

    Network partitions do not affect availability in the sense that both sides of the partition can still answer queries. Deployment topology has the most significant effect of mitigating these types of issues.
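
The "window of latency" idea for loose ordering can be sketched as a small reorder buffer. This is an illustrative model, not part of any NSQ client library: the class name ReorderWindow, the integer timestamps, and the choice to order by a timestamp carried with each message are all assumptions. Messages are held until they are at least `window` units older than the newest timestamp seen, then released in order; anything arriving older than what was already released is dropped, which is the price of keeping the ordering invariant.

```python
import heapq
from typing import List, Tuple

Item = Tuple[int, str]  # (timestamp, body); both are hypothetical fields


class ReorderWindow:
    def __init__(self, window: int) -> None:
        self.window = window
        self.heap: List[Item] = []
        self.newest = 0            # largest timestamp seen so far
        self.released_up_to = -1   # timestamp of the last released message
        self.dropped: List[Item] = []

    def push(self, ts: int, body: str) -> List[Item]:
        """Accept one message and return those that are now safe to process."""
        if ts < self.released_up_to:
            # Too late: releasing it now would break the ordering.
            self.dropped.append((ts, body))
            return []
        heapq.heappush(self.heap, (ts, body))
        self.newest = max(self.newest, ts)
        ready = []
        while self.heap and self.heap[0][0] <= self.newest - self.window:
            item = heapq.heappop(self.heap)
            self.released_up_to = item[0]
            ready.append(item)
        return ready

    def flush(self) -> List[Item]:
        """Release everything still buffered, in order."""
        return [heapq.heappop(self.heap) for _ in range(len(self.heap))]


buf = ReorderWindow(window=3)
arrivals = [(1, "a"), (3, "c"), (2, "b"), (6, "f"), (4, "d"), (0, "late"), (9, "i")]
out: List[Item] = []
for ts, body in arrivals:
    out.extend(buf.push(ts, body))
out.extend(buf.flush())
print(out)
print(buf.dropped)
```

A larger window tolerates more disorder but adds the same amount of latency to every message.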

FAQ

Deployment
  1. What is the recommended topology for nsqd?

    We strongly recommend running an nsqd alongside any service(s) that produce messages.

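    nsqd is a relatively lightweight process with a bounded memory footprint, which makes it well suited to “playing nice with others”. (In other words: run nsqd next to the process that produces messages, since nsqd is lightweight and has a bounded memory footprint. This implies inter-process communication between the two.)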

    This pattern aids in structuring message flow as a consumption problem rather than a production one.

    Another benefit is that it essentially forms an independent, sharded, silo of data for that topic on a given host.

    NOTE: this isn’t an absolute requirement though, it’s just simpler (see question below).

  2. Why can’t nsqlookupd be used by producers to find where to publish to?

    NSQ promotes a consumer-side discovery model that alleviates the upfront configuration burden of having to tell consumers where to find the topic(s) they need.

    However, it does not provide any means to solve the problem of where a service should publish to. This is a chicken and egg problem: the topic does not exist prior to the first publish.

    By co-locating nsqd (see question above), you sidestep this problem entirely (your service simply publishes to the local nsqd) and allow NSQ’s runtime discovery system to work naturally.

  3. I just want to use nsqd as a work queue on a single node, is that a suitable use case?

    Yep, nsqd can run standalone just fine.

    nsqlookupd is beneficial in larger distributed environments.

    In short: a single nsqd works fine as a single-node queue, and nsqlookupd comes into play in distributed environments.

  4. How many nsqlookupd should I run?

    Typically only a few depending on your cluster size, number of nsqd nodes and consumers, and your desired fault tolerance.

    3 or 5 works really well for deployments involving up to several hundred hosts and thousands of consumers.

    nsqlookupd nodes do not require coordination to answer queries. The metadata in the cluster is eventually consistent.

Publishing
  1. Do I need a client library to publish messages?

    No, HTTP is enough.

    NO! Just use the HTTP endpoints for publishing (/pub and /mpub). It’s simple, it’s easy, and it’s ubiquitous in almost any programming environment.

    In fact, the overwhelming majority of NSQ deployments use HTTP to publish.

  2. Why force a client to handle responses to the TCP protocol’s PUB and MPUB commands?

    We believe NSQ’s default mode of operation should prioritize safety and we wanted the protocol to be simple and consistent.

  3. When can a PUB or MPUB fail?

    (1) The topic name is not formatted correctly (to character/length restrictions). See the topic and channel name spec.

    (2) The message is too large (this limit is exposed as a parameter to nsqd).

    (3) The topic is in the middle of being deleted.

    (4) nsqd is in the middle of cleanly exiting.

    (5) Any client connection-related failures during the publish.

    (1) and (2) should be considered programming errors. (3) and (4) are rare and (5) is a natural part of any TCP based protocol.

  4. How can I mitigate scenario (3) above?

    Deleting topics is a relatively infrequent operation. If you need to delete a topic, orchestrate the timing such that publishes eliciting topic creations will never be performed until a sufficient amount of time has elapsed since deletion.

    Deleting a topic is rare. If you must delete one, leave a delay between the deletion and any publish that would re-create the topic.
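
Failure causes (1) and (2) are programming errors, so they can be caught before a request ever leaves the process. The sketch below validates a topic name and builds an /mpub request without sending it. Several things here are assumptions to verify against the official docs: the naming rule (1 to 64 characters from [.a-zA-Z0-9_-], optionally followed by "#ephemeral") is my reading of the topic and channel name spec, the newline-separated /mpub body is my understanding of the HTTP API, and max_msg_size is a hypothetical parameter standing in for whatever limit your nsqd is configured with.

```python
import re
from typing import List
from urllib.parse import urlencode
from urllib.request import Request

# Assumed rule; check the topic and channel name spec before relying on it.
_NAME_RE = re.compile(r"[.a-zA-Z0-9_-]+(#ephemeral)?")
MAX_NAME_LEN = 64


def is_valid_topic(name: str) -> bool:
    return 1 <= len(name) <= MAX_NAME_LEN and _NAME_RE.fullmatch(name) is not None


def build_mpub_request(topic: str, messages: List[bytes], max_msg_size: int,
                       nsqd_http: str = "127.0.0.1:4151") -> Request:
    """Build (without sending) a multi-publish request, rejecting bad input early."""
    if not is_valid_topic(topic):
        raise ValueError(f"invalid topic name: {topic!r}")        # failure (1)
    for msg in messages:
        if len(msg) > max_msg_size:
            raise ValueError("message exceeds max_msg_size")      # failure (2)
        if b"\n" in msg:
            # Assumption: messages in an /mpub body are separated by newlines.
            raise ValueError("message contains a newline")
    url = f"http://{nsqd_http}/mpub?{urlencode({'topic': topic})}"
    return Request(url, data=b"\n".join(messages), method="POST")


mreq = build_mpub_request("test", [b"hello world 2", b"hello world 3"],
                          max_msg_size=1024)
print(mreq.full_url, mreq.data)
```

Failures (3) to (5) cannot be ruled out locally; the caller still has to check the response and retry.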

Design and Theory
  1. How do you recommend naming topics and channels?

    A topic name should describe the data in the stream.

    A channel name should describe the work performed by its consumers.

    For example, good topic names are encodes, decodes, api_requests, page_views and good channel names are archive, analytics_increment, spam_analysis.

  2. Are there any limitations to the number of topics and channels a single nsqd can support?

    No built-in limit; only the host's CPU and memory bound it.

    There are no built-in limits imposed. It is only limited by the memory and CPU of the host nsqd is running on (per-client CPU usage was greatly reduced in issue #236).

  3. How are new topics announced to the cluster?

    The first PUB or SUB to a topic will create the topic on an nsqd. Topic metadata will then propagate to the configured nsqlookupd. Other readers will discover this topic by periodically querying the nsqlookupd.

  4. Can NSQ do RPC?

    Yes, it’s possible, but NSQ was not designed with this use case in mind.

    We intend to publish some docs on how this could be structured but in the meantime reach out if you’re interested.

DESIGN

  • NSQ is a successor to simplequeue (part of simplehttp) and as such is designed to (in no particular order):

    1. support topologies that enable high-availability and eliminate SPOFs

    2. address the need for stronger message delivery guarantees

    3. bound the memory footprint of a single process (by persisting some messages to disk)

      nsqd keeps its memory usage bounded by moving some messages to disk

    4. greatly simplify configuration requirements for producers and consumers

    5. provide a straightforward upgrade path

    6. improve efficiency

Simplifying Configuration and Administration
  • A single nsqd instance is designed to handle multiple streams of data at once. Streams are called “topics” and a topic has 1 or more “channels”. Each channel receives a copy of all the messages for a topic. In practice, a channel maps to a downstream service consuming a topic.

    A single nsqd instance can handle data for multiple topics. Each topic can have one or more channels, and each channel receives a copy of every message on the topic it is bound to.

  • Topics and channels are not configured a priori. Topics are created on first use by publishing to the named topic or by subscribing to a channel on the named topic. Channels are created on first use by subscribing to the named channel.

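    Topics are created on the first publish, or on the first subscription to one of their channels.

    Channels are created on the first subscription.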

  • Topics and channels all buffer data independently of each other, preventing a slow consumer from causing a backlog for other channels (the same applies at the topic level).

    Topics and channels each buffer on their own (similar in spirit to buffers in NIO), so one slow consumer does not back up the others.

  • A channel can, and generally does, have multiple clients connected. Assuming all connected clients are in a state where they are ready to receive messages, each message will be delivered to a random client. For example:

    [Figure unavailable: …/resources/f1434dc8-6029-11e3-8a66-18ca4ea10aca.gif]

    A channel usually has one or more clients connected, and each message is delivered to one randomly chosen client that is ready.

    To summarize, messages are multicast from topic -> channel (every channel receives a copy of all messages for that topic) but evenly distributed from channel -> consumers (each consumer receives a portion of the messages for that channel).

    In other words: every channel receives all messages of its topic, but a channel may have several consumers, so each consumer ends up with only part of the topic's messages. Taken together the consumers see everything (possibly with duplicates, because of the at-least-once delivery described earlier).

      1 topic   --> 1 or more channels
      1 channel --> 1 or more clients
    
  • NSQ also includes a helper application, nsqlookupd, which provides a directory service where consumers can lookup the addresses of nsqd instances that provide the topics they are interested in subscribing to. In terms of configuration, this decouples the consumers from the producers (they both individually only need to know where to contact common instances of nsqlookupd, never each other), reducing complexity and maintenance.

    nsqlookupd lets consumers discover nsqd instances, which decouples consumers from producers.

  • At a lower level each nsqd has a long-lived TCP connection to nsqlookupd over which it periodically pushes its state. This data is used to inform which nsqd addresses nsqlookupd will give to consumers. For consumers, an HTTP /lookup endpoint is exposed for polling.

    Each nsqd instance keeps a long-lived TCP connection to nsqlookupd and periodically pushes its state over it, much like a heartbeat.

  • To introduce a new distinct consumer of a topic, simply start up an NSQ client configured with the addresses of your nsqlookupd instances. There are no configuration changes needed to add either new consumers or new publishers, greatly reducing overhead and complexity.

    To add a new consumer for a topic, just start a client instance that is configured with the nsqlookupd addresses.

  • NOTE: in future versions, the heuristic nsqlookupd uses to return addresses could be based on depth, number of connected clients, or other “intelligent” strategies. The current implementation is simply all. Ultimately, the goal is to ensure that all producers are being read from such that depth stays near zero.

    It is important to note that the nsqd and nsqlookupd daemons are designed to operate independently, without communication or coordination between siblings.

    We also think that it’s really important to have a way to view, introspect, and manage the cluster in aggregate. We built nsqadmin to do this. It provides a web UI to browse the hierarchy of topics/channels/consumers and inspect depth and other key statistics for each layer. Additionally it supports a few administrative commands such as removing and emptying a channel (which is a useful tool when messages in a channel can be safely thrown away in order to bring depth back to 0).

Straightforward Upgrade Path
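This section explains why NSQ is easy to adopt and extend: it ships Go and Python libraries, adding nsqd instances is simple, and it comes with a few small utilities.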
  • This was one of our highest priorities. Our production systems handle a large volume of traffic, all built upon our existing messaging tools, so we needed a way to slowly and methodically upgrade specific parts of our infrastructure with little to no impact.

  • First, on the message producer side we built nsqd to match simplequeue. Specifically, nsqd exposes an HTTP /put endpoint, just like simplequeue, to POST binary data (with the one caveat that the endpoint takes an additional query parameter specifying the “topic”). Services that wanted to switch to start publishing to nsqd only have to make minor code changes.

  • Second, we built libraries in both Python and Go that matched the functionality and idioms we had been accustomed to in our existing libraries. This eased the transition on the message consumer side by limiting the code changes to bootstrapping. All business logic remained the same.

  • Finally, we built utilities to glue old and new components together. These are all available in the examples directory in the repository:

      nsq_pubsub - expose a pubsub like HTTP interface to topics in an NSQ cluster
    
      nsq_to_file - durably write all messages for a given topic to a file
    
      nsq_to_http - perform HTTP requests for all messages in a topic to (multiple) endpoints
    
Eliminating SPOFs
  • NSQ is designed to be used in a distributed fashion. nsqd clients are connected (over TCP) to all instances providing the specified topic. There are no middle-men, no message brokers, and no SPOFs:

    Consumers connect (over TCP) to every nsqd instance that provides the topic. There are no middlemen, no message brokers, and no single point of failure.

    [Figure unavailable: …/resources/tumblr_mat85kr5td1qj3yp2.png]

  • This topology eliminates the need to chain single, aggregated, feeds. Instead you consume directly from all producers. Technically, it doesn’t matter which client connects to which NSQ, as long as there are enough clients connected to all producers to satisfy the volume of messages, you’re guaranteed that all will eventually be processed.

  • For nsqlookupd, high availability is achieved by running multiple instances. They don’t communicate directly to each other and data is considered eventually consistent. Consumers poll all of their configured nsqlookupd instances and union the responses. Stale, inaccessible, or otherwise faulty nodes don’t grind the system to a halt.
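
"Poll every nsqlookupd and union the responses" can be sketched without a network by working on canned response bodies. The JSON shape used here (a "producers" list whose entries carry "broadcast_address" and "tcp_port", optionally wrapped in a "data" object by older versions) is my understanding of the /lookup endpoint and should be checked against the nsqlookupd docs; the host names are invented.

```python
import json
from typing import Iterable, Set, Tuple

Address = Tuple[str, int]  # (broadcast_address, tcp_port)


def producers_from_lookup(payload: str) -> Set[Address]:
    """Extract nsqd addresses from one /lookup response body."""
    doc = json.loads(payload)
    doc = doc.get("data", doc)  # tolerate a wrapped response format
    return {(p["broadcast_address"], p["tcp_port"]) for p in doc.get("producers", [])}


def union_lookups(payloads: Iterable[str]) -> Set[Address]:
    """Union the answers of all lookupd instances, skipping faulty ones."""
    found: Set[Address] = set()
    for payload in payloads:
        try:
            found |= producers_from_lookup(payload)
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # a stale or broken lookupd must not stop discovery
    return found


resp_a = json.dumps({"channels": ["archive"], "producers": [
    {"broadcast_address": "host-a", "tcp_port": 4150, "http_port": 4151},
    {"broadcast_address": "host-b", "tcp_port": 4150, "http_port": 4151},
]})
resp_b = json.dumps({"channels": ["archive"], "producers": [
    {"broadcast_address": "host-b", "tcp_port": 4150, "http_port": 4151},
    {"broadcast_address": "host-c", "tcp_port": 4150, "http_port": 4151},
]})
resp_bad = "<html>502 Bad Gateway</html>"

nsqd_addresses = union_lookups([resp_a, resp_bad, resp_b])
print(sorted(nsqd_addresses))
```

Because the result is a set union, it does not matter that the two healthy instances disagree or that one instance is broken; the consumer simply connects to every address it has learned about.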

Message Delivery Guarantees
  • NSQ guarantees that a message will be delivered at least once, though duplicate messages are possible. Consumers should expect this and de-dupe or perform idempotent operations.

    NSQ guarantees that a message is delivered at least once, and possibly more than once.

  • This guarantee is enforced as part of the protocol and works as follows (assume the client has successfully connected and subscribed to a topic):

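    These three steps give at-least-once delivery: nsqd keeps a local copy of each message it sends; the client acknowledges with FIN once it has received and processed the message, or asks nsqd to re-queue it with REQ if something went wrong; if neither happens, the message times out and is re-queued automatically.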

    1. client indicates they are ready to receive messages

    2. NSQ sends a message and temporarily stores the data locally (in the event of re-queue or timeout)

    3. client replies FIN (finish) or REQ (re-queue) indicating success or failure respectively. If the client does not reply, NSQ will time out after a configurable duration and automatically re-queue the message.

  • This ensures that the only edge case that would result in message loss is an unclean shutdown of an nsqd process. In that case, any messages that were in memory (or any buffered writes not flushed to disk) would be lost.

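    Messages can be lost in only one case: an nsqd process shuts down uncleanly, and any messages not yet flushed to disk are gone. The remedy is to run redundant nsqd processes.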

    If preventing message loss is of the utmost importance, even this edge case can be mitigated. One solution is to stand up redundant nsqd pairs (on separate hosts) that receive copies of the same portion of messages. Because you’ve written your consumers to be idempotent, doing double-time on these messages has no downstream impact and allows the system to endure any single node failure without losing messages.

  • The takeaway is that NSQ provides the building blocks to support a variety of production use cases and configurable degrees of durability.
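
The three protocol steps, and the reason consumers must be idempotent, can be seen in a toy model of a single channel. Everything below is an in-process simulation under simplifying assumptions: ToyChannel, its method names, and the explicit `now` clock are invented for illustration and do not correspond to nsqd internals or to any client library.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple

Message = Tuple[str, str, int]  # (id, body, attempts)


class ToyChannel:
    def __init__(self, msg_timeout: float) -> None:
        self.msg_timeout = msg_timeout
        self.queue: Deque[Message] = deque()
        self.in_flight: Dict[str, Tuple[Message, float]] = {}  # id -> (msg, deadline)

    def put(self, msg_id: str, body: str) -> None:
        self.queue.append((msg_id, body, 0))

    def deliver(self, now: float) -> Optional[Message]:
        """Step 2: send a message and keep a copy until the client answers."""
        if not self.queue:
            return None
        msg_id, body, attempts = self.queue.popleft()
        msg = (msg_id, body, attempts + 1)
        self.in_flight[msg_id] = (msg, now + self.msg_timeout)
        return msg

    def fin(self, msg_id: str) -> None:
        """Step 3, success: forget the stored copy."""
        self.in_flight.pop(msg_id, None)

    def req(self, msg_id: str) -> None:
        """Step 3, failure: put the stored copy back on the queue."""
        entry = self.in_flight.pop(msg_id, None)
        if entry is not None:
            self.queue.append(entry[0])

    def expire(self, now: float) -> None:
        """No reply before the deadline counts as a failure."""
        late = [i for i, (_, deadline) in self.in_flight.items() if deadline <= now]
        for msg_id in late:
            self.req(msg_id)


seen: Set[str] = set()
log: List[Tuple[str, int]] = []


def handle(msg: Message) -> bool:
    """Idempotent handler: returns False when the message is a duplicate."""
    msg_id, _, attempts = msg
    log.append((msg_id, attempts))
    if msg_id in seen:
        return False
    seen.add(msg_id)
    return True


chan = ToyChannel(msg_timeout=60)
chan.put("m1", "charge order 1")
chan.put("m2", "charge order 2")

m = chan.deliver(now=0); handle(m); chan.fin(m[0])   # m1 succeeds
m = chan.deliver(now=1); handle(m)                   # m2 is processed, but FIN is never sent
chan.expire(now=61)                                  # timeout: m2 is re-queued
m = chan.deliver(now=62)
was_new = handle(m); chan.fin(m[0])                  # second delivery is detected as a duplicate
print(log, was_new)
```

The second delivery of m2 is not a bug in the queue; it is what "at least once" means, and the `seen` set is the client-side half of the guarantee.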

Bounded Memory Footprint
  • nsqd provides a configuration option --mem-queue-size that will determine the number of messages that are kept in memory for a given queue. If the depth of a queue exceeds this threshold messages are transparently written to disk. This bounds the memory footprint of a given nsqd process to mem-queue-size * #_of_channels_and_topics:

    So the maximum number of messages a single nsqd process holds in memory is:

      mem-queue-size * number of topics and channels
    

    [Figure unavailable: …/resources/tumblr_mavte17V3t1qj3yp2.png]

  • Also, an astute observer might have identified that this is a convenient way to gain an even higher guarantee of delivery by setting this value to something low (like 1 or even 0). The disk-backed queue is designed to survive unclean restarts (although messages might be delivered twice).

    If mem-queue-size is set low, nearly all messages go to disk, which makes delivery more robust across restarts.

  • Also, related to message delivery guarantees, clean shutdowns (by sending a nsqd process the TERM signal) safely persist the messages currently in memory, in-flight, deferred, and in various internal buffers.

  • Note, a topic/channel whose name ends in the string #ephemeral will not be buffered to disk and will instead drop messages after passing the mem-queue-size. This enables consumers which do not need message guarantees to subscribe to a channel. These ephemeral channels will also disappear after their last client disconnects. For an ephemeral topic, this implies that at least one channel has been created, consumed, and deleted (typically an ephemeral channel).
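
A toy queue makes the effect of --mem-queue-size concrete. This is a simplified model, not how nsqd is implemented: ToyQueue and its fields are invented, "disk" is just a second in-process deque, and reads here always prefer memory, which is only one possible policy. It shows the three behaviours described in this section: overflow to disk, everything on disk when the size is 0, and dropping for #ephemeral names.

```python
from collections import deque
from typing import Any, Deque, Optional


class ToyQueue:
    def __init__(self, name: str, mem_queue_size: int) -> None:
        self.ephemeral = name.endswith("#ephemeral")
        self.mem_queue_size = mem_queue_size
        self.memory: Deque[Any] = deque()
        self.disk: Deque[Any] = deque()   # stand-in for the disk-backed queue
        self.dropped = 0

    def put(self, msg: Any) -> None:
        if len(self.memory) < self.mem_queue_size:
            self.memory.append(msg)
        elif self.ephemeral:
            self.dropped += 1             # ephemeral: never spills to disk
        else:
            self.disk.append(msg)         # past the threshold: spill to disk

    def get(self) -> Optional[Any]:
        if self.memory:
            return self.memory.popleft()
        if self.disk:
            return self.disk.popleft()
        return None


q = ToyQueue("orders", mem_queue_size=2)
for i in range(1, 6):
    q.put(i)
print(list(q.memory), list(q.disk))

first = q.get()
q.put(6)                                  # lands in memory, ahead of 3, 4, 5 on disk
rest = [q.get() for _ in range(5)]
print(first, rest)

# Memory bound in messages: mem-queue-size * (topics + channels).
# The numbers are illustrative, not defaults.
bound = 10_000 * (2 + 5)
print(bound)
```

Message 6 overtaking messages 3 to 5 in this model is one way the mix of in-memory and on-disk storage leads to unordered delivery.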

Efficiency
  • NSQ was designed to communicate over a “memcached-like” command protocol with simple size-prefixed responses. All message data is kept in the core including metadata like number of attempts, timestamps, etc. This eliminates the copying of data back and forth from server to client, an inherent property of the previous toolchain when re-queueing a message. This also simplifies clients as they no longer need to be responsible for maintaining message state.

    Message state (attempts, timestamps, etc.) stays on the server, which keeps message handling on the client simple.

  • Also, by reducing configuration complexity, setup and development time is greatly reduced (especially in cases where there are >1 consumers of a topic).

  • For the data protocol, we made a key design decision that maximizes performance and throughput by pushing data to the client instead of waiting for it to pull. This concept, which we call RDY state, is essentially a form of client-side flow control.

  • When a client connects to nsqd and subscribes to a channel it is placed in a RDY state of 0. This means that no messages will be sent to the client. When a client is ready to receive messages it sends a command that updates its RDY state to some # it is prepared to handle, say 100. Without any additional commands, 100 messages will be pushed to the client as they are available (each time decrementing the server-side RDY count for that client).

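    In this design the client sends a command to update how many messages it is able to receive. Once that number is greater than 0, nsqd pushes up to that many messages as soon as the subscribed channel has them, without waiting for the client to ask.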

  • Client libraries are designed to send a command to update RDY count when it reaches ~25% of the configurable max-in-flight setting (and properly account for connections to multiple nsqd instances, dividing appropriately).

    [Figure unavailable: …/resources/tumblr_mataigNDn61qj3yp2.png]

    This is a significant performance knob as some downstream systems are able to more-easily batch process messages and benefit greatly from a higher max-in-flight.

  • Notably, because it is both buffered and push based with the ability to satisfy the need for independent copies of streams (channels), we’ve produced a daemon that behaves like simplequeue and pubsub combined. This is powerful in terms of simplifying the topology of our systems where we would have traditionally maintained the older toolchain discussed above.
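
The RDY mechanism can be followed step by step in a small simulation. This is a toy under stated assumptions: ToyConn and consume are invented names, one connection talks to one nsqd, a message counts as handled the moment it is pushed, and the client refreshes RDY when the remaining count falls to 25% of max_in_flight, as described above. Real client libraries track in-flight messages more carefully.

```python
from collections import deque
from typing import Deque, List, Tuple


class ToyConn:
    """Server-side view of one client connection."""

    def __init__(self) -> None:
        self.rdy = 0                      # a new subscriber starts at RDY 0
        self.delivered: List[int] = []

    def set_rdy(self, count: int) -> None:
        self.rdy = count                  # what an RDY command changes

    def push_one(self, backlog: Deque[int]) -> bool:
        """Push one message if the client is ready and one is available."""
        if self.rdy == 0 or not backlog:
            return False
        self.delivered.append(backlog.popleft())
        self.rdy -= 1                     # server decrements per message sent
        return True


def consume(backlog: Deque[int], max_in_flight: int) -> Tuple[ToyConn, int]:
    conn = ToyConn()
    low_water = max_in_flight // 4        # ~25% of max_in_flight
    conn.set_rdy(max_in_flight)
    rdy_commands = 1
    while conn.push_one(backlog):
        if conn.rdy <= low_water:
            conn.set_rdy(max_in_flight)   # top up before running dry
            rdy_commands += 1
    return conn, rdy_commands


idle = ToyConn()
sent_while_rdy_zero = idle.push_one(deque([1, 2, 3]))

conn, rdy_commands = consume(deque(range(20)), max_in_flight=8)
print(len(conn.delivered), rdy_commands, conn.rdy)
```

Twenty messages needed only four RDY commands here, and nothing was pushed to the connection that never raised its RDY count above 0. Raising max_in_flight lowers the command count further, which is why it acts as a performance knob.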

Go
  • We made a strategic decision early on to build the NSQ core in Go. We recently blogged about our use of Go at bitly and alluded to this very project - it might be helpful to browse through that post to get an understanding of our thinking with respect to the language.

  • Regarding NSQ, Go channels (not to be confused with NSQ channels) and the language’s built in concurrency features are a perfect fit for the internal workings of nsqd. We leverage buffered channels to manage our in memory message queues and seamlessly write overflow to disk.

  • The standard library makes it easy to write the networking layer and client code. The built in memory and cpu profiling hooks highlight opportunities for optimization and require very little effort to integrate. We also found it really easy to test components in isolation, mock types using interfaces, and iteratively build functionality.

INTERNALS

  • NSQ is composed of 3 daemons:

      1. nsqd is the daemon that receives, queues, and delivers messages to clients.
    
      2. nsqlookupd is the daemon that manages topology information and provides an eventually consistent discovery service.
    
      3. nsqadmin is a web UI to introspect the cluster in realtime (and perform various administrative tasks).
    
  • Data flow in NSQ is modeled as a tree of streams and consumers. A topic is a distinct stream of data. A channel is a logical grouping of consumers subscribed to a given topic.

  • A single nsqd can have many topics and each topic can have many channels. A channel receives a copy of all the messages for the topic, enabling multicast style delivery while each message on a channel is distributed amongst its subscribers, enabling load-balancing.

    These primitives form a powerful framework for expressing a variety of simple and complex topologies.
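
The two routing rules (a copy to every channel, then one consumer per channel) fit in a few lines. This is a toy model with invented names (ToyTopic, the consumer labels); it picks a random consumer per message and ignores readiness, RDY counts, and multiple nsqd nodes. The topic and channel names are the ones suggested in the FAQ.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Dict, List


class ToyTopic:
    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self.channels: Dict[str, List[str]] = {}          # channel -> consumers
        self.received: DefaultDict[str, List[str]] = defaultdict(list)

    def subscribe(self, channel: str, consumer: str) -> None:
        # Subscribing creates the channel on first use.
        self.channels.setdefault(channel, []).append(consumer)

    def publish(self, msg: str) -> None:
        for channel, consumers in self.channels.items():  # multicast: every channel
            consumer = self._rng.choice(consumers)        # load-balance: one consumer
            self.received[f"{channel}/{consumer}"].append(msg)


page_views = ToyTopic()
page_views.subscribe("archive", "a1")
for name in ("b1", "b2", "b3"):
    page_views.subscribe("analytics_increment", name)

msgs = [f"view-{i}" for i in range(10)]
for msg in msgs:
    page_views.publish(msg)

archive_msgs = page_views.received["archive/a1"]
analytics_msgs = [m for name in ("b1", "b2", "b3")
                  for m in page_views.received[f"analytics_increment/{name}"]]
print(len(archive_msgs), len(analytics_msgs))
```

The archive channel's single consumer sees every message, while the three analytics consumers split the same ten messages between them.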
