# AvroConfluent
| Input | Output | Alias |
|-------|--------|-------|
| ✔     | ✗      |       |
## Description
Apache Avro is a row-oriented serialization format that uses binary encoding for efficient data processing. The AvroConfluent format supports decoding single-object, Avro-encoded Kafka messages serialized using the Confluent Schema Registry (or API-compatible services).
Each Avro message embeds a schema ID that ClickHouse automatically resolves by querying the configured schema registry. Once resolved, schemas are cached for optimal performance.
## Data type mapping
The table below shows all data types supported by the Apache Avro format, and their corresponding ClickHouse data types in INSERT and SELECT queries.
| Avro data type `INSERT` | ClickHouse data type | Avro data type `SELECT` |
|---|---|---|
| `boolean`, `int`, `long`, `float`, `double` | Int(8\|16\|32), UInt(8\|16\|32) | `int` |
| `boolean`, `int`, `long`, `float`, `double` | Int64, UInt64 | `long` |
| `boolean`, `int`, `long`, `float`, `double` | Float32 | `float` |
| `boolean`, `int`, `long`, `float`, `double` | Float64 | `double` |
| `bytes`, `string`, `fixed`, `enum` | String | `bytes` or `string` \* |
| `bytes`, `string`, `fixed` | FixedString(N) | `fixed(N)` |
| `enum` | Enum(8\|16) | `enum` |
| `array(T)` | Array(T) | `array(T)` |
| `map(V, K)` | Map(V, K) | `map(string, K)` |
| `union(null, T)`, `union(T, null)` | Nullable(T) | `union(null, T)` |
| `union(T1, T2, …)` \** | Variant(T1, T2, …) | `union(T1, T2, …)` \** |
| `null` | Nullable(Nothing) | `null` |
| `int (date)` \*** | Date, Date32 | `int (date)` \*** |
| `long (timestamp-millis)` \*** | DateTime64(3) | `long (timestamp-millis)` \*** |
| `long (timestamp-micros)` \*** | DateTime64(6) | `long (timestamp-micros)` \*** |
| `bytes (decimal)` \*** | DateTime64(N) | `bytes (decimal)` \*** |
| `int` | IPv4 | `int` |
| `fixed(16)` | IPv6 | `fixed(16)` |
| `bytes (decimal)` \*** | Decimal(P, S) | `bytes (decimal)` \*** |
| `string (uuid)` \*** | UUID | `string (uuid)` \*** |
| `fixed(16)` | Int128/UInt128 | `fixed(16)` |
| `fixed(32)` | Int256/UInt256 | `fixed(32)` |
| `record` | Tuple | `record` |
\* `bytes` is the default; this is controlled by the setting `output_format_avro_string_column_pattern`.

\** The Variant type implicitly accepts `null` as a field value, so, for example, the Avro `union(T1, T2, null)` will be converted to `Variant(T1, T2)`. As a result, when producing Avro from ClickHouse, the `null` type must always be included in the Avro union type set, since schema inference cannot determine whether any value is actually `null`.
\*** Unsupported Avro logical data types:

- `time-millis`
- `time-micros`
- `duration`
## Format settings

| Setting | Description | Default |
|---------|-------------|---------|
| `input_format_avro_allow_missing_fields` | Whether to use a default value instead of throwing an error when a field is not found in the schema. | `0` |
| `input_format_avro_null_as_default` | Whether to use a default value instead of throwing an error when inserting a `null` value into a non-nullable column. | `0` |
| `format_avro_schema_registry_url` | The Confluent Schema Registry URL. For basic authentication, URL-encoded credentials can be included directly in the URL path. | |
## Examples

### Using a schema registry

To read an Avro-encoded Kafka topic using the Kafka table engine, use the `format_avro_schema_registry_url` setting to provide the URL of the schema registry.
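A minimal sketch of such a table definition, assuming a hypothetical broker `kafka-broker:9092`, topic `topic1`, consumer group `group1`, registry endpoint `http://schema-registry:8081`, and column layout:

```sql
CREATE TABLE topic1_stream
(
    field1 String,
    field2 String
)
ENGINE = Kafka()
SETTINGS kafka_broker_list = 'kafka-broker:9092',
         kafka_topic_list = 'topic1',
         kafka_group_name = 'group1',
         kafka_format = 'AvroConfluent',
         format_avro_schema_registry_url = 'http://schema-registry:8081';
```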
### Using basic authentication

If your schema registry requires basic authentication (e.g., if you're using Confluent Cloud), you can provide URL-encoded credentials in the `format_avro_schema_registry_url` setting.
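For example, with placeholder credentials and a hypothetical registry hostname (any special characters in the username or password must be URL-encoded):

```sql
SET format_avro_schema_registry_url = 'https://<username>:<password>@schema-registry.example.com';
```

The same URL can instead be supplied in the `SETTINGS` clause of a Kafka table engine definition.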
## Troubleshooting

To monitor ingestion progress and debug errors with the Kafka consumer, you can query the `system.kafka_consumers` system table. If your deployment has multiple replicas (e.g., ClickHouse Cloud), you must use the `clusterAllReplicas` table function.
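A query of roughly this shape works on multi-replica deployments (the cluster name `default` is an assumption; adjust it for your deployment):

```sql
SELECT * FROM clusterAllReplicas('default', system.kafka_consumers)
ORDER BY assignments.partition_id ASC;
```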
If you run into schema resolution issues, you can use kafkacat with clickhouse-local to troubleshoot:
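One possible invocation, assuming hypothetical broker, topic, registry URL, and column types: fetch a few raw messages from the topic and decode them locally, bypassing the Kafka table engine entirely.

```bash
kafkacat -b kafka-broker -C -t topic1 -o beginning -f '%s' -c 3 | \
  clickhouse-local --input-format AvroConfluent \
    --format_avro_schema_registry_url 'http://schema-registry' \
    -S "field1 Int64, field2 String" \
    -q 'SELECT * FROM table'
```

If the registry lookup fails here as well, the problem is likely network access or credentials rather than the table engine configuration.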