Replicated
The engine is based on the Atomic engine. It supports replication of metadata via DDL log being written to ZooKeeper and executed on all of the replicas for a given database.
One ClickHouse server can have multiple replicated databases running and updating at the same time. But there can't be multiple replicas of the same replicated database.
Creating a database
Engine Parameters
- zoo_path— ZooKeeper path. The same ZooKeeper path corresponds to the same database.
- shard_name— Shard name. Database replicas are grouped into shards by- shard_name.
- replica_name— Replica name. Replica names must be different for all replicas of the same shard.
Parameters can be omitted, in such case missing parameters are substituted with defaults.
If zoo_path contains macro {uuid}, it is required to specify explicit UUID or add ON CLUSTER to create statement to ensure all replicas use the same UUID for this database.
For ReplicatedMergeTree tables if no arguments provided, then default arguments are used: /clickhouse/tables/{uuid}/{shard} and {replica}. These can be changed in the server settings default_replica_path and default_replica_name. Macro {uuid} is unfolded to table's uuid, {shard} and {replica} are unfolded to values from server config, not from database engine arguments. But in the future, it will be possible to use shard_name and replica_name of Replicated database.
Specifics and recommendations
DDL queries with Replicated database work in a similar way to ON CLUSTER queries, but with minor differences.
First, the DDL request tries to execute on the initiator (the host that originally received the request from the user). If the request is not fulfilled, then the user immediately receives an error, other hosts do not try to fulfill it. If the request has been successfully completed on the initiator, then all other hosts will automatically retry until they complete it. The initiator will try to wait for the query to be completed on other hosts (no longer than distributed_ddl_task_timeout) and will return a table with the query execution statuses on each host.
The behavior in case of errors is regulated by the distributed_ddl_output_mode setting, for a Replicated database it is better to set it to null_status_on_timeout — i.e. if some hosts did not have time to execute the request for distributed_ddl_task_timeout, then do not throw an exception, but show the NULL status for them in the table.
The system.clusters system table contains a cluster named like the replicated database, which consists of all replicas of the database. This cluster is updated automatically when creating/deleting replicas, and it can be used for Distributed tables.
When creating a new replica of the database, this replica creates tables by itself. If the replica has been unavailable for a long time and has lagged behind the replication log — it checks its local metadata with the current metadata in ZooKeeper, moves the extra tables with data to a separate non-replicated database (so as not to accidentally delete anything superfluous), creates the missing tables, updates the table names if they have been renamed. The data is replicated at the ReplicatedMergeTree level, i.e. if the table is not replicated, the data will not be replicated (the database is responsible only for metadata).
ALTER TABLE FREEZE|ATTACH|FETCH|DROP|DROP DETACHED|DETACH PARTITION|PART queries are allowed but not replicated. The database engine will only add/fetch/remove the partition/part to the current replica. However, if the table itself uses a Replicated table engine, then the data will be replicated after using ATTACH.
In case you need only configure a cluster without maintaining table replication, refer to Cluster Discovery feature.
Usage example
Creating a cluster with three hosts:
Creating database on cluster with implicit parameters:
Running the DDL-query:
Showing the system table:
Creating a distributed table and inserting the data:
Adding replica on the one more host:
Adding replica on the one more host if macro {uuid} is used in zoo_path:
The cluster configuration will look like this:
The distributed table also will get data from the new host:
Settings
The following settings are supported:
| Setting | Default | Description | 
|---|---|---|
| max_broken_tables_ratio | 1 | Do not recover replica automatically if the ratio of staled tables to all tables is greater | 
| max_replication_lag_to_enqueue | 50 | Replica will throw exception on attempt to execute query if its replication lag greater | 
| wait_entry_commited_timeout_sec | 3600 | Replicas will try to cancel query if timeout exceed, but initiator host has not executed it yet | 
| collection_name | A name of a collection defined in server's config where all info for cluster authentication is defined | |
| check_consistency | true | Check consistency of local metadata and metadata in Keeper, do replica recovery on inconsistency | 
| max_retries_before_automatic_recovery | 10 | Max number of attempts to execute a queue entry before marking replica as lost recovering it from snapshot (0 means infinite) | 
| allow_skipping_old_temporary_tables_ddls_of_refreshable_materialized_views | false | If enabled, when processing DDLs in Replicated databases, it skips creating and exchanging DDLs of the temporary tables of refreshable materialized views if possible | 
| logs_to_keep | 1000 | Default number of logs to keep in ZooKeeper for Replicated database. | 
| default_replica_path | /clickhouse/databases/{uuid} | The path to the database in ZooKeeper. Used during database creation if arguments are omitted. | 
| default_replica_shard_name | {shard} | The shard name of the replica in the database. Used during database creation if arguments are omitted. | 
| default_replica_name | {replica} | The name of the replica in the database. Used during database creation if arguments are omitted. | 
Default values may be overwritten in the configuration file
