This post is written by Vignesh Selvam (Senior Product Manager – Amazon MQ), Simon Unge (Senior software development engineer – Amazon MQ).
Amazon MQ for RabbitMQ announced support for quorum queues, a type of replicated queue designed for higher availability and data safety. This post presents an overview of this queue type, describes when you should use it, and best practices you can follow. The post also describes how Amazon MQ has also improved quorum queues in the open-source RabbitMQ community.
Overview of quorum queues
A quorum queue is a replicated first in, first out queue type offered by open-source RabbitMQ that uses the Raft consensus algorithm to maintain data consistency. Each quorum queue has a leader and multiple followers (replicas), which ensure that messages are replicated and persisted across a majority of nodes, thus providing resilience against node failures. Quorum queues only need a majority of member nodes (a quorum) to make decisions about data. If a RabbitMQ node hosting a leader becomes unavailable, another node hosting one of the followers is automatically elected as the leader. Once the node becomes available again, the node will become a follower for the quorum queue and catch up or synchronize with the new leader. Quorum queues can detect network failures faster and recover quicker than classic mirrored queues, thus improving the resiliency of the message broker as a whole.
Quorum queues share most of the fundamental features that are key to RabbitMQ replicated queue types such as consumption, consumer acknowledgements, cancelling consumers, purging and deletion. Poison message handling is a unique feature of quorum queues which help developers manage unprocessed messages more efficiently. A poison message is a message that cannot be processed and ends up being repeatedly requeued. Quorum queues keep track of the number of unsuccessful delivery attempts and expose it in the ‘x-delivery-count’ header that is included with any redelivered message. A delivery limit can be set using a policy argument for ’delivery-limit’. If the limit is reached, the message can be dropped or put in a dead-letter queue. This feature further improves the data reliability of a quorum queue.
You can get started with quorum queues by explicitly specifying the ‘x-queue-type’ parameter as ’quorum’ on a RabbitMQ broker running version 3.13 and above. We recommend that you change the default vhost queue type to ’quorum’ to ensure that all queues are created as quorum queues by default inside a vhost.
When should you use quorum queues?
You should use quorum queues when you need higher availability and consistency for their messaging infrastructure. Quorum queues are ideal for scenarios where data durability and fault tolerance are critical, such as financial transaction systems, e-commerce data processing systems, or any application requiring high reliability. They are particularly beneficial in environments where node failures are more likely or where maintaining data consistency across distributed systems is essential.
When should you NOT use quorum queues?
Quorum queues are not meant to be temporary. They do not support transient or exclusive queues and are not meant to be used in scenarios with high queue churn (declaration and deletion rates). They are also not recommended for unreplicated queues.
Best practices for quorum queues
Quorum queues perform better when the queues are short. You can set the maximum queue length using a policy or queue arguments to limit the total memory usage by queues (max-length, max-length-bytes).
Amazon MQ recommends publishers to use publisher confirms and consumers to use manual acknowledgements on quorum queues. Publisher confirms will only be issued once a published message has been successfully replicated to a quorum of nodes and is considered safe within the context of the system. Publisher confirms can also serve as a form of back pressure and protect the availability of the broker during periods of high workload. Manual acknowledgements are used to ensure messages that are not processed can be returned to the queue for reprocessing.
Open-source improvements by Amazon MQ
Amazon MQ contributed multiple improvements to the open-source RabbitMQ community to improve quorum queues for operators and users.
Automatic membership reconciliation
Quorum queues depend on a majority of replicas being available for the Raft consensus algorithm. Amazon MQ identified that many users and operators would prefer to maintain a certain minimal number of replicas (generally 3 or 5) at all times to ensure a majority always exists. The quorum queue replica management was also initially available only via CLI tools. Amazon MQ engineers introduced automatic membership reconciliation to improve this experience. Now, RabbitMQ can be configured to identify any queues that are below a target group member size, and automatically grow or add a node to the queue members. Thus ensuring a certain minimum number of replicas always exist.
Voter status
RabbitMQ considers a quorum queue member node to be a full member even if the member has not caught up or fully synced to the quorum. The CLI command rabbitmq-queues check_if_node_is_ quorum_critical can provide a false positive, and indicate a node is safe to remove, even though another node has queue members that are still synchronizing to the quorum. Amazon MQ introduced a new ‘non-voter’ state for a queue member node to indicate a member that is still catching up or synchronizing to the quorum. If a queue has a member in this state, it is not considered a full member. Once the member is fully synchronized, it is automatically promoted to the voter status, and is considered a full member. The command rabbitmq-queues check_if_node_is_quorum_critical now takes this into account and correctly reports if a node can be safely terminated without any queues becoming unavailable due to a loss of majority.
Inconsistent state management
When a broker is overloaded, a quorum queue can end up in an inconsistent state, where the quorum queue membership state stored in the Raft state machine differs from the RabbitMQ internal state for the queue. Amazon MQ introduced a periodic check per quorum queue that identifies if a queue has an inconsistent state and takes action to fix it.
Default queue type
The default queue type for a RabbitMQ broker vhost was classic queues. You could declare a different queue type by explicitly stating the ’x-queue-type’ as a queue creation argument. Amazon MQ introduced a global default queue type in the configuration file (rabbit.conf) that provides the ability to define a default queue type at the broker level. Now, an operator can change the default queue type to quorum queues if not specified during creation.
Membership management permissions
RabbitMQ users are able to configure the quorum queue membership using the management API. This can interfere with automatic membership reconciliation. Amazon MQ introduced the ability for an operator to turn off the membership management permissions available through the management API. Thus, preventing customers from accidentally affecting their broker.
Conclusion
Quorum queues on RabbitMQ provide a robust solution for scenarios requiring high availability and resilience. By leveraging the Raft consensus protocol, quorum queues ensure that messages are safely stored and replicated across a quorum of nodes, making them an excellent choice for modern, distributed message queuing systems.
Amazon MQ recommends that you adopt quorum queues as the preferred replicated queue type on RabbitMQ 3.13 brokers. For more details, see Amazon MQ documentation. To know more about the open-source feature, see quorum queues.
Get started with quorum queues on Amazon MQ for RabbitMQ 3.13 with a few clicks.