2024 Changelog

ClickHouse release 24.8 LTS, 2024-08-20

Backward Incompatible Change

  • clickhouse-client and clickhouse-local now default to multi-query mode (instead of single-query mode). As an example, clickhouse-client -q "SELECT 1; SELECT 2" now works, whereas users previously had to add --multiquery (or -n). The --multiquery/-n switch became obsolete. INSERT queries in multi-query statements are treated specially based on their FORMAT clause: If the FORMAT is VALUES (the most common case), the end of the INSERT statement is represented by a trailing semicolon ; at the end of the query. For all other FORMATs (e.g. CSV or JSONEachRow), the end of the INSERT statement is represented by two newlines \n\n at the end of the query. #63898 (FFish).
  • In previous versions, it was possible to use an alternative syntax for LowCardinality data types by appending WithDictionary to the name of the data type. It was an initial working implementation, and it was never documented or exposed to the public. Now, it is deprecated. If you have used this syntax, you have to ALTER your tables and rename the data types to LowCardinality. #66842 (Alexey Milovidov).
  • Fix logical errors with storage Buffer used with distributed destination table. It's a backward incompatible change: queries using Buffer with a distributed destination table may stop working if the table appears more than once in the query (e.g., in a self-join). #67015 (vdimir).
  • In previous versions, calling functions for random distributions based on the Gamma function (such as Chi-Squared, Student, Fisher) with negative arguments close to zero led to a long computation or an infinite loop. In the new version, calling these functions with zero or negative arguments will produce an exception. This closes #67297. #67326 (Alexey Milovidov).
  • The system table text_log is enabled by default. This is fully compatible with previous versions, but you may notice subtly increased disk usage on the local disk (this system table takes a tiny amount of disk space). #67428 (Alexey Milovidov).
  • In previous versions, arrayWithConstant could be slow if asked to generate very large arrays. In the new version, it is limited to 1 GB per array. This closes #32754. #67741 (Alexey Milovidov).
  • Fix REPLACE modifier formatting (forbid omitting brackets). #67774 (Azat Khuzhin).
  • Backported in #68349: Reimplement Dynamic type. Now, when the limit of dynamic data types is reached, new types are not cast to String but stored in a special data structure in binary format with a binary-encoded data type. Any type ever inserted into a Dynamic column can now be read from it as a subcolumn. #68132 (Kruglov Pavel).
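
The INSERT termination rule for multi-query mode described in the first item above can be modelled in a few lines. This Python sketch is illustrative only: the helper names insert_data_terminator and split_inline_data are hypothetical, and the code ignores quoting and escaping, so it is not how clickhouse-client actually parses queries.

```python
def insert_data_terminator(format_name: str) -> str:
    """Return the marker that ends an INSERT's inline data in multi-query mode."""
    # VALUES data ends at a semicolon; every other format ends at two newlines.
    return ";" if format_name.upper() == "VALUES" else "\n\n"


def split_inline_data(text: str, format_name: str) -> tuple[str, str]:
    """Split the text after the FORMAT clause into (inline data, remaining queries).

    Simplification (assumption): terminators inside string literals are not recognised.
    """
    data, _, rest = text.partition(insert_data_terminator(format_name))
    return data.strip(), rest.strip()


if __name__ == "__main__":
    print(split_inline_data("(1), (2); SELECT 1", "VALUES"))
    print(split_inline_data("1,a\n2,b\n\nSELECT 1", "CSV"))
```

The practical consequence is that inline CSV or JSONEachRow data in a script needs a blank line before the next query, while VALUES data only needs the usual semicolon.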

New Feature

  • Added a new MergeTree setting deduplicate_merge_projection_mode to control the projections during merges (for specific engines) and OPTIMIZE DEDUPLICATE query. Supported options: throw (throw an exception in case the projection is not fully supported for *MergeTree engine), drop (remove projection during merge if it can't be merged itself consistently) and rebuild (rebuild projection from scratch, which is a heavy operation). #66672 (jsc0218).
  • Add _etag virtual column for S3 table engine. Fixes #65312. #65386 (skyoct).
  • Added a tagging (namespace) mechanism for the query cache. The same queries with different tags are considered different by the query cache. Example: SELECT 1 SETTINGS use_query_cache = 1, query_cache_tag = 'abc' and SELECT 1 SETTINGS use_query_cache = 1, query_cache_tag = 'def' now create different query cache entries. #68235 (sakulali).
  • Support more variants of JOIN strictness (LEFT/RIGHT SEMI/ANTI/ANY JOIN) with inequality conditions which involve columns from both left and right table. e.g. t1.y < t2.y (see the setting allow_experimental_join_condition). #64281 (lgbo).
  • Interpret Hive-style partitioning for different engines (File, URL, S3, AzureBlobStorage, HDFS). Hive-style partitioning organizes data into partitioned sub-directories, making it efficient to query and manage large datasets. Currently, it only creates virtual columns with the appropriate name and data. The follow-up PR will introduce the appropriate data filtering (performance speedup). #65997 (Yarik Briukhovetskyi).
  • Add function printf for Spark compatibility (but you can use the existing format function). #66257 (李扬).
  • Add options restore_replace_external_engines_to_null and restore_replace_external_table_functions_to_null to replace external engines and table functions with the Null engine, which can be useful for testing. It should work for RESTORE and explicit table creation. #66536 (Ilya Yatsishin).
  • Added support for reading MULTILINESTRING geometry in WKT format using function readWKTLineString. #67647 (Jacob Reckhard).
  • Add a new table function fuzzQuery. This function allows the modification of a given query string with random variations. Example: SELECT query FROM fuzzQuery('SELECT 1') LIMIT 5;. #67655 (pufit).
  • Add a query ALTER TABLE ... DROP DETACHED PARTITION ALL to drop all detached partitions. #67885 (Duc Canh Le).
  • Add the rows_before_aggregation_at_least statistic to the query response when the new setting rows_before_aggregation is enabled. This statistic represents the number of rows read before aggregation. In the context of a distributed query, when using the group by or max aggregation function without a limit, rows_before_aggregation_at_least can reflect the number of rows hit by the query. #66084 (morning-color).
  • Support OPTIMIZE query on Join tables to reduce their memory footprint. #67883 (Duc Canh Le).
  • Allow running a query instantly in Play if you add &run=1 to the URL. #66457 (Aleksandr Musorin).
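
To make the Hive-style partitioning item above concrete: in that layout each partition directory is named key=value, and those pairs are what become virtual columns. The sketch below is a rough Python model of that naming convention only. The function name hive_partition_columns and the sample path are made up for illustration, values are kept as plain strings, and nothing here reflects ClickHouse's own implementation.

```python
from pathlib import PurePosixPath


def hive_partition_columns(path: str) -> dict[str, str]:
    """Extract key=value directory components from a path as column name -> value."""
    columns: dict[str, str] = {}
    # Only directories are inspected; the final component is the file name.
    for part in PurePosixPath(path).parts[:-1]:
        key, sep, value = part.partition("=")
        if sep and key:
            columns[key] = value
    return columns


if __name__ == "__main__":
    # Hypothetical path, used only to show the convention.
    print(hive_partition_columns("data/date=2024-08-20/country=US/part-0.parquet"))
```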

Experimental Feature

  • Implement a new JSON data type. #66444 (Kruglov Pavel).
  • Add the new TimeSeries table engine. #64183 (Vitaly Baranov).
  • Add new experimental Kafka storage engine to store offsets in Keeper instead of relying on committing them to Kafka. It makes the commit to ClickHouse tables atomic with regard to consumption from the queue. #57625 (János Benjamin Antal).
  • Use adaptive read task size calculation method (adaptive meaning it depends on read column sizes) for parallel replicas. #60377 (Nikita Taranov).
  • Added statistics type count_min (count-min sketches) which provide selectivity estimations for equality predicates like col = 'val'. Supported data types are string, date, datetime and numeric types. #65521 (JackyWoo).
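
For readers unfamiliar with the count-min sketch behind the count_min statistics type, here is a minimal, generic Python version of the data structure. It is a textbook sketch under assumed parameters (the class name, width, depth and hashing scheme are all choices made for this example) and says nothing about how ClickHouse sizes or stores its sketches.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter; estimates never undercount."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.total = 0
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row: int, value: str) -> int:
        # One hash function per row, derived by salting with the row index.
        digest = hashlib.blake2b(value.encode(), salt=row.to_bytes(8, "little")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, value: str) -> None:
        self.total += 1
        for row in range(self.depth):
            self.table[row][self._bucket(row, value)] += 1

    def estimate(self, value: str) -> int:
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.table[row][self._bucket(row, value)] for row in range(self.depth))

    def selectivity(self, value: str) -> float:
        """Estimated fraction of rows matching an equality predicate on `value`."""
        return self.estimate(value) / self.total if self.total else 0.0
```

Because the estimate is an upper bound on the true count, a planner using such a sketch may overestimate, but never underestimate, how many rows match col = 'val'.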

Performance Improvement

  • Setting optimize_functions_to_subcolumns is enabled by default. #68053 (Anton Popov).
  • Store the plain_rewritable disk directory metadata in __meta layout, separately from the merge tree data in the object storage. Move the plain_rewritable disk to a flat directory structure. #65751 (Julia Kartseva).
  • Improve columns squashing (an operation happening in INSERT queries) for String/Array/Map/Variant/Dynamic types by reserving required memory in advance for all subcolumns. #67043 (Kruglov Pavel).
  • Speed up SYSTEM FLUSH LOGS and flush logs on shutdown. #67472 (Sema Checherinda).
  • Improved overall performance of merges by reducing the overhead of the scheduling steps of merges. #68016 (Anton Popov).
  • Speed up tables removal for DROP DATABASE query, increased the default value for database_catalog_drop_table_concurrency to 16. #67228 (Nikita Mikhaylov).
  • Avoid allocating too much capacity for array column while writing ORC. Performance speeds up 15% for an Array column. #67879 (李扬).
  • Speed up mutations for non-replicated MergeTree significantly. #66911 #66909 (Alexey Milovidov).

Improvement

  • Setting allow_experimental_analyzer is renamed to enable_analyzer. The old name is preserved in a form of an alias. This signifies that Analyzer is no longer in beta and is fully promoted to production. #66438 (Nikita Mikhaylov).
  • Improve schema inference of date times. Now DateTime64 used only when date time has fractional part, otherwise regular DateTime is used. Inference of Date/DateTime is more strict now, especially when date_time_input_format='best_effort' to avoid inferring date times from strings in corner cases. #68382 (Kruglov Pavel).
  • ClickHouse server now supports a new setting, max_keep_alive_requests. For keep-alive HTTP connections to the server, it works in tandem with keep_alive_timeout: if the idle timeout has not expired but more than max_keep_alive_requests requests have already been made through the given connection, the server closes it. #61793 (Nikita Taranov).
  • Various improvements in the advanced dashboard. This closes #67697. This closes #63407. This closes #51129. This closes #61204. #67701 (Alexey Milovidov).
  • Do not require a grant for REMOTE when creating a Distributed table: a grant for the Distributed engine is enough. #65419 (jsc0218).
  • Do not pass logs for keeper explicitly in the Docker image to allow overriding. #65564 (Azat Khuzhin).
  • Introduced use_same_password_for_base_backup settings for BACKUP and RESTORE queries, allowing to create and restore incremental backups to/from password protected archives. #66214 (Samuele).
  • Ignore async_load_databases for ATTACH query (previously it was possible for ATTACH to return before the tables had been attached). #66240 (Azat Khuzhin).
  • Added logs and metrics for rejected connections (where there are not enough resources). #66410 (Alexander Tokmakov).
  • Support proper UUID type for MongoDB engine. #66671 (Azat Khuzhin).
  • Add replication lag and recovery time metrics. #66703 (Miсhael Stetsyuk).
  • Add DiskS3NoSuchKeyErrors metric. #66704 (Miсhael Stetsyuk).
  • Ensure the COMMENT clause works for all table engines. #66832 (Joe Lynch).
  • Function mapFromArrays now accepts Map(K, V) as first argument, for example: SELECT mapFromArrays(map('a', 4, 'b', 4), ['aa', 'bb']) now works and returns {('a',4):'aa',('b',4):'bb'}. Also, if the 1st argument is an Array, it can now also be of type Array(Nullable(T)) or Array(LowCardinality(Nullable(T))) as long as the actual array values are not NULL. #67103 (李扬).
  • Read configuration for clickhouse-local from ~/.clickhouse-local. #67135 (Azat Khuzhin).
  • Rename setting input_format_orc_read_use_writer_time_zone to input_format_orc_reader_timezone and allow the user to set the reader timezone. #67175 (kevinyhzou).
  • Decrease level of the Socket is not connected error when HTTP connection immediately reset by peer after connecting, close #34218. #67177 (vdimir).
  • Add ability to load dashboards for system.dashboards from config (once set, they override the default dashboards preset). #67232 (Azat Khuzhin).
  • The window functions in SQL are traditionally in snake case. ClickHouse uses camelCase, so new aliases denseRank() and percentRank() have been created. These new functions can be called the exact same as the original dense_rank() and percent_rank() functions. Both snake case and camelCase syntaxes remain usable. A new test for each of the functions has been added as well. This closes #67042 . #67334 (Peter Nguyen).
  • Autodetect configuration file format if it is not .xml, .yml or .yaml. If the file begins with < it might be XML, otherwise it might be YAML. It is useful when providing a configuration file from a pipe: clickhouse-server --config-file <(echo "hello: world"). #67391 (sakulali).
  • Functions parseDateTime and parseDateTimeInJodaSyntax now treat their format parameter as optional. If it is not specified, format strings %Y-%m-%d %H:%i:%s and yyyy-MM-dd HH:mm:ss are assumed. Example: SELECT parseDateTime('2021-01-04 23:12:34') now returns DateTime value 2021-01-04 23:12:34 (previously, this threw an exception). #67399 (Robert Schulze).
  • Automatically retry Keeper requests in KeeperMap if they happen because of timeout or connection loss. #67448 (Antonio Andelic).
  • Add -no-pie to Aarch64 Linux builds to allow proper introspection and symbolizing of stacktraces after a ClickHouse restart. #67916 (filimonov).
  • Added profile events for merges and mutations for better introspection. #68015 (Anton Popov).
  • Fix settings and current_database in system.processes for async BACKUP/RESTORE. #68163 (Azat Khuzhin).
  • Remove unnecessary logs for non-replicated MergeTree. #68238 (Daniil Ivanik).
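
The configuration format autodetection rule in the list above (known extension first, otherwise look at the first character) is simple enough to write down. The following Python sketch only restates that rule as described. The function name guess_config_format is hypothetical, and the real server may handle details such as leading whitespace differently.

```python
from pathlib import Path


def guess_config_format(file_name: str, content: str) -> str:
    """Return "xml" or "yaml" for a configuration source."""
    suffix = Path(file_name).suffix.lower()
    if suffix == ".xml":
        return "xml"
    if suffix in (".yml", ".yaml"):
        return "yaml"
    # Unknown extension (for example a pipe such as /dev/fd/63): inspect the content.
    return "xml" if content.startswith("<") else "yaml"


if __name__ == "__main__":
    print(guess_config_format("/dev/fd/63", "hello: world"))
```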

Build/Testing/Packaging Improvement

  • Integration tests flaky check will now run each test case multiple times to find more issues in tests and make them more reliable. It uses the pytest-repeat library to run a test case multiple times in the same environment. It is important to clean up tables and other entities at the end of a test case for it to pass. Repeating works much faster than several pytest runs, as it starts the necessary containers only once. #66986 (Ilya Yatsishin).
  • Unblock the usage of CLion with ClickHouse. In previous versions, CLion froze for a minute on every keypress. This closes #66994. #66995 (Alexey Milovidov).
  • getauxval: avoid a crash under a sanitizer re-exec due to high ASLR entropy in newer Linux kernels. #67081 (Raúl Marín).
  • Some parts of client code are extracted to a single file and highest possible level optimization is applied to them even for debug builds. This closes: #65745. #67215 (Nikita Mikhaylov).

Bug Fix

  • Only relevant to the experimental Variant data type. Fix crash with Variant + AggregateFunction type. #67122 (Kruglov Pavel).
  • Fix crash in DistributedAsyncInsert when connection is empty. #67219 (Pablo Marcos).
  • Fix crash of uniq and uniqTheta with tuple() argument. Closes #67303. #67306 (flynn).
  • Fixes #66026. Avoid unresolved table function arguments traversal in ReplaceTableNodeToDummyVisitor. #67522 (Dmitry Novik).
  • Fix potential stack overflow in JSONMergePatch function. Renamed this function from jsonMergePatch to JSONMergePatch because the previous name was wrong. The previous name is still kept for compatibility. Improved diagnostic of errors in the function. This closes #67304. #67756 (Alexey Milovidov).
  • Fixed a NULL pointer dereference, triggered by a specially crafted query, that crashed the server via hopEnd, hopStart, tumbleEnd, and tumbleStart. #68098 (Salvatore Mesoraca).
  • Fixed Not-ready Set in some system tables when filtering using subqueries. #66018 (Michael Kolupaev).
  • Fixed reading of subcolumns after ALTER ADD COLUMN query. #66243 (Anton Popov).
  • Fix boolean literals in query sent to external database (for engines like PostgreSQL). #66282 (vdimir).
  • Fix formatting of query with aliased JOIN ON expression, e.g. ... JOIN t2 ON (x = y) AS e ORDER BY x should be formatted as ... JOIN t2 ON ((x = y) AS e) ORDER BY x. #66312 (vdimir).
  • Fix cluster() for inter-server secret (preserve initial user as before). #66364 (Azat Khuzhin).
  • Fix possible runtime error while converting Array field with nulls to Array(Variant). #66727 (Kruglov Pavel).
  • Fix for occasional deadlock in Context::getDDLWorker. #66843 (Alexander Gololobov).
  • Fix creating KeeperMap table after an incomplete drop. #66865 (Antonio Andelic).
  • Fix broken part error while restoring to a s3_plain_rewritable disk. #66881 (Vitaly Baranov).
  • In rare cases ClickHouse could consider parts as broken because of some unexpected projections on disk. Now it's fixed. #66898 (alesapin).
  • Fix invalid format detection in schema inference that could lead to logical error Format {} doesn't support schema inference. #66899 (Kruglov Pavel).
  • Fix possible deadlock on query cancel with parallel replicas. #66905 (Nikita Taranov).
  • Forbid create as select even when database_replicated_allow_heavy_create is set. It was unconditionally forbidden in 23.12 and accidentally allowed under the setting in unreleased 24.7. #66980 (vdimir).
  • Reading from the numbers could wrongly throw an exception when the max_rows_to_read limit was set. This closes #66992. #66996 (Alexey Milovidov).
  • Add proper type conversion to lagInFrame and leadInFrame window functions - fixes msan test. #67091 (Yakov Olkhovskiy).
  • TRUNCATE DATABASE used to stop replication as if it was a DROP DATABASE query, it's fixed. #67129 (Alexander Tokmakov).
  • Use a separate client context in clickhouse-local. #67133 (Vitaly Baranov).
  • Fix error Cannot convert column because it is non constant in source stream but must be constant in result. for a query that reads from the Merge table over the Distributed table with one shard. #67146 (Nikolai Kochetov).
  • Correct behavior of ORDER BY all with disabled enable_order_by_all and parallel replicas (distributed queries as well). #67153 (Igor Nikonov).
  • Fix wrong usage of input_format_max_bytes_to_read_for_schema_inference in schema cache. #67157 (Kruglov Pavel).
  • Fix the memory leak for count distinct, when exception issued during group by single nullable key. #67171 (Jet He).
  • Fix an error in optimization which converts OUTER JOIN to INNER JOIN. This closes #67156. This closes #66447. The bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/62907. #67178 (Maksim Kita).
  • Fix error Conversion from AggregateFunction(name, Type) to AggregateFunction(name, Nullable(Type)) is not supported. The bug was caused by the optimize_rewrite_aggregate_function_with_if optimization. Fixes #67112. #67229 (Nikolai Kochetov).
  • Fix hung query when using empty tuple as lhs of function IN. #67295 (Duc Canh Le).
  • It was possible to create a very deep nested JSON data that triggered stack overflow while skipping unknown fields. This closes #67292. #67324 (Alexey Milovidov).
  • Fix attaching ReplicatedMergeTree table after exception during startup. #67360 (Antonio Andelic).
  • Fix segfault caused by incorrectly detaching from thread group in Aggregator. #67385 (Antonio Andelic).
  • Fix one more case when a non-deterministic function is specified in PK. #67395 (Nikolai Kochetov).
  • Fixed bloom_filter index breaking queries with mildly weird conditions like (k=2)=(k=2) or has([1,2,3], k). #67423 (Michael Kolupaev).
  • Correctly parse file name/URI containing :: if it's not an archive. #67433 (Antonio Andelic).
  • Fix wait for tasks in ~WriteBufferFromS3 in case WriteBuffer was cancelled. #67459 (Kseniia Sumarokova).
  • Protect temporary part directories from removing during RESTORE. #67491 (Vitaly Baranov).
  • Fix execution of nested short-circuit functions. #67520 (Kruglov Pavel).
  • Fix Logical error: Expected the argument №N of type T to have X rows, but it has 0. The error could happen in a remote query with constant expression in GROUP BY (with a new analyzer). #67536 (Nikolai Kochetov).
  • Fix join on tuple with NULLs: Some queries with the new analyzer and NULL inside the tuple in the JOIN ON section returned incorrect results. #67538 (vdimir).
  • Fix redundant reschedule of FileCache::freeSpaceRatioKeepingThreadFunc() in case of full non-evictable cache. #67540 (Kseniia Sumarokova).
  • Fix inserting into stream like engines (Kafka, RabbitMQ, NATS) through HTTP interface. #67554 (János Benjamin Antal).
  • Fix for function toStartOfWeek which returned the wrong result with a small DateTime64 value. #67558 (Yarik Briukhovetskyi).
  • Fix creation of view with recursive CTE. #67587 (Yakov Olkhovskiy).
  • Fix Logical error: 'file_offset_of_buffer_end <= read_until_position' in filesystem cache. Closes #57508. #67623 (Kseniia Sumarokova).
  • Fixes #62282. Removed the call to convertFieldToString() and added datatype specific serialization code. Parameterized view substitution was broken for multiple datatypes when parameter value was a function or expression returning datatype instance. #67654 (Shankar).
  • Fix crash on percent_rank. percent_rank's default frame type is changed to range unbounded preceding and unbounded following. IWindowFunction's default window frame is considered and now window functions without window frame definition in SQL can be put into different WindowTransformers properly. #67661 (lgbo).
  • Fix reloading SQL UDFs with UNION. Previously, restarting the server could make UDF invalid. #67665 (Antonio Andelic).
  • Fix possible logical error "Unexpected return type from if" with experimental Variant type and enabled setting use_variant_as_common_type in function if with Tuples and Maps. #67687 (Kruglov Pavel).
  • Due to a bug in the Linux kernel, a query could hang in TimerDescriptor::drain. This closes #37686. #67702 (Alexey Milovidov).
  • Fix completion of RESTORE ON CLUSTER command. #67720 (Vitaly Baranov).
  • Fix dictionary hang in case of CANNOT_SCHEDULE_TASK while loading. #67751 (Azat Khuzhin).
  • Queries like SELECT count() FROM t WHERE cast(c = 1 or c = 9999 AS Bool) SETTINGS use_skip_indexes=1 with bloom filter indexes on c now work correctly. #67781 (jsc0218).
  • Fix wrong aggregation result in some queries with aggregation without keys and filter, close #67419. #67804 (vdimir).
  • Validate experimental/suspicious data types in ALTER ADD/MODIFY COLUMN. #67911 (Kruglov Pavel).
  • Fix DateTime64 parsing after constant folding in distributed queries, close #66773. #67920 (vdimir).
  • Fix wrong count() result when there is non-deterministic function in predicate. #67922 (János Benjamin Antal).
  • Fixed the calculation of the maximum thread soft limit in containerized environments where the usable CPU count is limited. #67963 (Robert Schulze).
  • Now ClickHouse doesn't consider part as broken if projection doesn't exist on disk but exists in checksums.txt. #68003 (alesapin).
  • Fixed skipping of untouched parts in mutations with new analyzer. Previously with enabled analyzer data in part could be rewritten by mutation even if mutation doesn't affect this part according to predicate. #68052 (Anton Popov).
  • Removes an incorrect optimization to remove sorting in subqueries that use OFFSET. Fixes #67906. #68099 (Graham Campbell).
  • Attempt to fix Block structure mismatch in AggregatingStep stream: different types for aggregate projection optimization. #68107 (Nikolai Kochetov).
  • Try to fix a PostgreSQL crash when a query is cancelled. #68288 (Kseniia Sumarokova).
  • Fix missing sync replica mode in query SYSTEM SYNC REPLICA. #68326 (Duc Canh Le).

ClickHouse release 24.7, 2024-07-30

Backward Incompatible Change

  • Forbid CREATE MATERIALIZED VIEW ... ENGINE Replicated*MergeTree POPULATE AS SELECT ... with Replicated databases. #63963 (vdimir).
  • clickhouse-keeper-client will only accept paths in string literals, such as ls '/hello/world', not bare strings such as ls /hello/world. #65494 (Alexey Milovidov).
  • Metric KeeperOutstandingRequets was renamed to KeeperOutstandingRequests. #66206 (Robert Schulze).
  • Remove is_deterministic field from the system.functions table. #66630 (Alexey Milovidov).
  • Function tuple will now try to construct named tuples in query (controlled by enable_named_columns_in_function_tuple). Introduce function tupleNames to extract names from tuples. #54881 (Amos Bird).
  • Change how deduplication for Materialized Views works. Fixed many cases, such as: (1) on the destination table: data is split into two or more blocks, and those blocks are considered duplicates when they are inserted in parallel; (2) on the MV destination table: equal blocks are deduplicated, which happens when the MV often produces equal data for different input data because of aggregation; (3) on the MV destination table: equal blocks that come from different MVs are deduplicated. #61601 (Sema Checherinda).

New Feature

  • Add ASOF JOIN support for full_sorting_join algorithm. #55051 (vdimir).
  • Support JWT authentication in clickhouse-client (will be available only in ClickHouse Cloud). #62829 (Konstantin Bogdanov).
  • Add SQL functions changeYear, changeMonth, changeDay, changeHour, changeMinute, changeSecond. For example, SELECT changeMonth(toDate('2024-06-14'), 7) returns date 2024-07-14. #63186 (cucumber95).
  • Introduce startup scripts, which allow the execution of preconfigured queries at the startup stage. #64889 (pufit).
  • Support accept_invalid_certificate in client's config in order to allow for client to connect over secure TCP to a server running with self-signed certificate - can be used as a shorthand for corresponding openSSL client settings verificationMode=none + invalidCertificateHandler.name=AcceptCertificateHandler. #65238 (peacewalker122).
  • Add system.error_log which contains history of error values from table system.errors, periodically flushed to disk. #65381 (Pablo Marcos).
  • Add aggregate function groupConcat. About the same as arrayStringConcat(groupArray(column), ','). Can receive 2 parameters: a string delimiter and the number of elements to be processed. #65451 (Yarik Briukhovetskyi).
  • Add AzureQueue storage. #65458 (Kseniia Sumarokova).
  • Add a new setting to disable/enable writing page index into parquet files. #65475 (lgbo).
  • Introduce logger.console_log_level server config to control the log level to the console (if enabled). #65559 (Azat Khuzhin).
  • Automatically append a wildcard * to the end of a directory path with table function file. #66019 (Zhidong (David) Guo).
  • Add --memory-usage option to client in non-interactive mode. #66393 (vdimir).
  • Make an interactive client for clickhouse-disks, add local disk from the local directory. #64446 (Daniil Ivanik).
  • When a lightweight delete happens on a table with projection(s), users can choose either to throw an exception (the default) or to drop the projection. #65594 (jsc0218).
  • Add system tables with main information about all detached tables. #65400 (Konstantin Morozov).
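
As a rough illustration of what the groupConcat item above describes (join a group's values with a delimiter, optionally using only the first N of them), here is a small Python model. The function name group_concat and its argument order are inventions for this sketch and do not mirror the SQL call syntax. A Python list also has a fixed order, which an aggregate over table rows does not necessarily have.

```python
from typing import Iterable, Optional


def group_concat(values: Iterable[object], delimiter: str, limit: Optional[int] = None) -> str:
    """Join values into one string, optionally using only the first `limit` of them."""
    items = [str(v) for v in values]
    if limit is not None:
        items = items[:limit]
    return delimiter.join(items)


if __name__ == "__main__":
    print(group_concat([1, 2, 3], ","))
    print(group_concat(["a", "b", "c"], "/", limit=2))
```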

Experimental Feature

  • Change binary serialization of the Variant data type: add compact mode to avoid writing the same discriminator multiple times for granules with single variant or with only NULL values. Add MergeTree setting use_compact_variant_discriminators_serialization that is enabled by default. Note that Variant type is still experimental and backward-incompatible change in serialization is ok. #62774 (Kruglov Pavel).
  • Support on-disk backend storage for clickhouse-keeper. #56626 (Han Fei).
  • Refactor JSONExtract functions, support more types including experimental Dynamic type. #66046 (Kruglov Pavel).
  • Support null map subcolumn for Variant and Dynamic subcolumns. #66178 (Kruglov Pavel).
  • Fix reading Dynamic subcolumns from altered Memory table. Previously if max_types parameter of a Dynamic type was changed in Memory table via alter, further subcolumns reading can return wrong result. #66066 (Kruglov Pavel).
  • Add support for cluster_for_parallel_replicas when using custom key parallel replicas. It allows you to use parallel replicas with custom key with MergeTree tables. #65453 (Antonio Andelic).
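
The idea behind the compact Variant discriminator mode mentioned in the first item above is that a granule whose rows all carry the same discriminator (a single variant, or only NULLs) does not need that value repeated per row. The Python sketch below is a conceptual model only. The tuple layout, the mode names "compact" and "plain", and the function names are assumptions for illustration and are not the actual on-disk format.

```python
from typing import List, Tuple, Union

# (mode, row count, payload): payload is one discriminator or the full list.
Encoded = Tuple[str, int, Union[int, List[int]]]


def encode_granule(discriminators: List[int]) -> Encoded:
    """Encode one granule's discriminators, collapsing a uniform run to one value."""
    if discriminators and len(set(discriminators)) == 1:
        return ("compact", len(discriminators), discriminators[0])
    return ("plain", len(discriminators), list(discriminators))


def decode_granule(encoded: Encoded) -> List[int]:
    """Inverse of encode_granule."""
    mode, count, payload = encoded
    return [payload] * count if mode == "compact" else list(payload)


if __name__ == "__main__":
    print(encode_granule([3, 3, 3, 3]))
    print(encode_granule([0, 1, 0]))
```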

Performance Improvement

  • Replace int to string algorithm with a faster one (from a modified amdn/itoa to a modified jeaiii/itoa). #61661 (Raúl Marín).
  • Sizes of hash tables created by join (parallel_hash algorithm) are collected and cached now. This information will be used to preallocate space in hash tables for subsequent query executions and save time on hash table resizes. #64553 (Nikita Taranov).
  • Optimized queries with ORDER BY primary key and WHERE that have a condition with high selectivity by using buffering. It is controlled by setting read_in_order_use_buffering (enabled by default) and can increase memory usage of query. #64607 (Anton Popov).
  • Improve performance of loading plain_rewritable metadata. #65634 (Alexey Milovidov).
  • Attaching tables on read-only disks will use fewer resources by not loading outdated parts. #65635 (Alexey Milovidov).
  • Support minmax hyperrectangle for Set indices. #65676 (AntiTopQuark).
  • Unload primary index of outdated parts to reduce total memory usage. #65852 (Anton Popov).
  • Functions replaceRegexpAll and replaceRegexpOne are now significantly faster if the pattern is trivial, i.e. contains no metacharacters, pattern classes, flags, grouping characters etc. (Thanks to Taiyang Li). #66185 (Robert Schulze).
  • s3 requests: Reduce retry time for queries, increase retries count for backups. 8.5 minutes and 100 retries for queries, 1.2 hours and 1000 retries for backup restore. #65232 (Sema Checherinda).
  • Support query plan LIMIT optimization. Support LIMIT pushdown for PostgreSQL storage and table function. #65454 (Maksim Kita).
  • Improved ZooKeeper load balancing. The current session doesn't expire until the optimal nodes become available despite fallback_session_lifetime. Added support for AZ-aware balancing. #65570 (Alexander Tokmakov).
  • DatabaseCatalog drops tables faster by using up to database_catalog_drop_table_concurrency threads. #66065 (Sema Checherinda).

Improvement

  • Improved ZooKeeper load balancing. The current session doesn't expire until the optimal nodes become available despite fallback_session_lifetime. Added support for AZ-aware balancing. #65570 (Alexander Tokmakov).
  • The setting optimize_trivial_insert_select is disabled by default. In most cases, it should be beneficial. Nevertheless, if you are seeing slower INSERT SELECT or increased memory usage, you can enable it back or SET compatibility = '24.6'. #58970 (Alexey Milovidov).
  • Print stacktrace and diagnostic info if clickhouse-client or clickhouse-local crashes. #61109 (Alexander Tokmakov).
  • The result of SHOW INDEX | INDEXES | INDICES | KEYS was previously sorted by the primary key column names. Since this was unintuitive, the result is now sorted by the position of the primary key columns within the primary key. #61131 (Robert Schulze).
  • Change how deduplication for Materialized Views works. Fixed many cases, such as: (1) on the destination table: data is split into two or more blocks, and those blocks are considered duplicates when they are inserted in parallel; (2) on the MV destination table: equal blocks are deduplicated, which happens when the MV often produces equal data for different input data because of aggregation; (3) on the MV destination table: equal blocks that come from different MVs are deduplicated. #61601 (Sema Checherinda).
  • Support reading partitioned DeltaLake data. Infer DeltaLake schema by reading metadata instead of data. #63201 (Kseniia Sumarokova).
  • In composable protocols TLS layer accepted only certificateFile and privateKeyFile parameters. https://clickhouse.com/docs/en/operations/settings/composable-protocols. #63985 (Anton Ivashkin).
  • Added profile event SelectQueriesWithPrimaryKeyUsage which indicates how many SELECT queries use the primary key to evaluate the WHERE clause. #64492 (0x01f).
  • StorageS3Queue related fixes and improvements. Deduce a default value of s3queue_processing_threads_num according to the number of physical cpu cores on the server (instead of the previous default value as 1). Set default value of s3queue_loading_retries to 10. Fix possible vague "Uncaught exception" in exception column of system.s3queue. Do not increment retry count on MEMORY_LIMIT_EXCEEDED exception. Move files commit to a stage after insertion into table fully finished to avoid files being committed while not inserted. Add settings s3queue_max_processed_files_before_commit, s3queue_max_processed_rows_before_commit, s3queue_max_processed_bytes_before_commit, s3queue_max_processing_time_sec_before_commit, to better control commit and flush time. #65046 (Kseniia Sumarokova).
  • Support aliases in parametrized view function (only new analyzer). #65190 (Kseniia Sumarokova).
  • Updated to mask account key in logs in azureBlobStorage. #65273 (SmitaRKulkarni).
  • Partition pruning for IN predicates when filter expression is a part of PARTITION BY expression. #65335 (Eduard Karacharov).
  • arrayMin/arrayMax can be applicable to all data types that are comparable. #65455 (pn).
  • Improved memory accounting for cgroups v2 to exclude the amount occupied by the page cache. #65470 (Nikita Taranov).
  • Do not create format settings for each row when serializing chunks to insert to EmbeddedRocksDB table. #65474 (Duc Canh Le).
  • Reduce clickhouse-local prompt to just :). getFQDNOrHostName() takes too long on macOS, and we don't want a hostname in the prompt for clickhouse-local anyway. #65510 (Konstantin Bogdanov).
  • Avoid printing a message from jemalloc about per-CPU arenas on low-end virtual machines. #65532 (Alexey Milovidov).
  • Disable filesystem cache background download by default. It will be enabled back when we fix the issue with possible "Memory limit exceeded" because memory deallocation is done outside of query context (while buffer is allocated inside of query context) if we use background download threads. Plus we need to add a separate setting to define max size to download for background workers (currently it is limited by max_file_segment_size, which might be too big). #65534 (Kseniia Sumarokova).
  • Add a new config option <config_reload_interval_ms>, which specifies how often ClickHouse reloads the config. #65545 (alesapin).
  • Implement binary encoding for ClickHouse data types and add its specification in docs. Use it in Dynamic binary serialization, allow to use it in RowBinaryWithNamesAndTypes and Native formats under settings. #65546 (Kruglov Pavel).
  • Server settings compiled_expression_cache_size and compiled_expression_cache_elements_size are now shown in system.server_settings. #65584 (Robert Schulze).
  • Add support for user identification based on x509 SubjectAltName extension. #65626 (Anton Kozlov).
  • clickhouse-local will respect the max_server_memory_usage and max_server_memory_usage_to_ram_ratio from the configuration file. It will also set the max memory usage to 90% of the system memory by default, like clickhouse-server does. #65697 (Alexey Milovidov).
  • Add a script to back up your files to ClickHouse. #65699 (Alexey Milovidov).
  • PostgreSQL source to support query cancellations. #65722 (Maksim Kita).
  • Make allow_experimental_analyzer be controlled by the initiator for distributed queries. This ensures compatibility and correctness during operations in mixed version clusters. #65777 (Nikita Mikhaylov).
  • Respect cgroup CPU limit in Keeper. #65819 (Antonio Andelic).
  • Allow calling the concat function with no arguments, e.g. SELECT concat(). #65887 (李扬).
  • Allow controlling named collections in clickhouse-local. #65973 (Alexey Milovidov).
  • Improve Azure-related profile events. #65999 (alesapin).
  • Support ORC file read by writer's time zone. #66025 (kevinyhzou).
  • Add settings to control connections to PostgreSQL. The setting postgresql_connection_attempt_timeout specifies the value passed to connect_timeout parameter of connection URL. The setting postgresql_connection_pool_retries specifies the number of retries to establish a connection to the PostgreSQL end-point. #66232 (Dmitry Novik).
  • Reduce inaccuracy of input_wait_elapsed_us/elapsed_us in the system.processors_profile_log. #66239 (Azat Khuzhin).
  • Improve ProfileEvents for the filesystem cache. #66249 (zhukai).
  • Add settings to ignore the ON CLUSTER clause in queries for named collection management with the replicated storage. #66288 (MikhailBurdukov).
  • Function generateSnowflakeID now allows to specify a machine ID as a parameter to prevent collisions in large clusters. #66374 (ZAWA_ll).
  • Disable suspending on Ctrl+Z in interactive mode. This is a common trap and is not expected behavior for almost all users. I imagine only a few extreme power users could appreciate suspending terminal applications to the background, but I don't know any. #66511 (Alexey Milovidov).
  • Add option for validating the primary key type in Dictionaries. Without this option for simple layouts any column type will be implicitly converted to UInt64. #66595 (MikhailBurdukov).

Bug Fix (user-visible misbehavior in an official stable release)

  • Check cyclic dependencies on CREATE/REPLACE/RENAME/EXCHANGE queries and throw an exception if there is a cyclic dependency. Previously such cyclic dependencies could lead to a deadlock during server startup. Also fix some bugs in dependencies creation. #65405 (Kruglov Pavel).
  • Fix unexpected sizes of LowCardinality columns in function calls. #65298 (Raúl Marín).
  • Fix crash in maxIntersections. #65689 (Raúl Marín).
  • Fix the VALID UNTIL clause in the user definition resetting after a restart. #66409 (Nikolay Degterinsky).
  • Fix the remaining time column in SHOW MERGES. #66735 (Alexey Milovidov).
  • The message "Query was cancelled" might have been printed twice in clickhouse-client. This behaviour is fixed. #66005 (Nikita Mikhaylov).
  • Fixed crash while using MaterializedMySQL (which is an unsupported, experimental feature) with TABLE OVERRIDE that maps MySQL NULL field into ClickHouse not NULL field. #54649 (Filipp Ozinov).
  • Fix logical error when PREWHERE expression read no columns and table has no adaptive index granularity (very old table). #59173 (Alexander Gololobov).
  • Fix bug with the cancellation buffer when canceling a query. #64478 (Sema Checherinda).
  • Fix filling parts columns from metadata (when columns.txt does not exist). #64757 (Azat Khuzhin).
  • Fix crash for ALTER TABLE ... ON CLUSTER ... MODIFY SQL SECURITY. #64957 (pufit).
  • Fix crash on destroying AccessControl: add explicit shutdown. #64993 (Vitaly Baranov).
  • Eliminate injective function in argument of functions uniq* recursively. This used to work correctly but was broken in the new analyzer. #65140 (Duc Canh Le).
  • Fix unexpected projection name when query with CTE. #65267 (wudidapaopao).
  • Require dictGet privilege when accessing dictionaries via direct query or the Dictionary table engine. #65359 (Joe Lynch).
  • Fix user-specific S3 auth with incremental backups. #65481 (Antonio Andelic).
  • Disable non-intersecting-parts optimization for queries with FINAL when read-in-order optimization is enabled. This could lead to an incorrect query result. As a workaround, disable do_not_merge_across_partitions_select_final and split_parts_ranges_into_intersecting_and_non_intersecting_final before this fix is merged. #65505 (Nikolai Kochetov).
  • Fix getting exception Index out of bound for blob metadata in case all files from list batch were filtered out. #65523 (Kseniia Sumarokova).
  • Fix NOT_FOUND_COLUMN_IN_BLOCK for deduplicate merge of projection. #65573 (Yakov Olkhovskiy).
  • Fixed bug in MergeJoin. Column in sparse serialisation might be treated as a column of its nested type though the required conversion wasn't performed. #65632 (Nikita Taranov).
  • Fixed a bug that compatibility level '23.4' was not properly applied. #65737 (cw5121).
  • Fix odbc table with nullable fields. #65738 (Rodolphe Dugé de Bernonville).
  • Fix data race in TCPHandler, which could happen on fatal error. #65744 (Kseniia Sumarokova).
  • Fix invalid exceptions in function parseDateTime with %F and %D placeholders. #65768 (Antonio Andelic).
  • For queries that read from PostgreSQL, cancel the internal PostgreSQL query if the ClickHouse query is finished. Otherwise, ClickHouse query cannot be canceled until the internal PostgreSQL query is finished. #65771 (Maksim Kita).
  • Fix a bug in short circuit logic when old analyzer and dictGetOrDefault is used. #65802 (jsc0218).
  • Fix a bug that caused EmbeddedRocksDB with TTL to write corrupted SST files. #65816 (Duc Canh Le).
  • Functions bitTest, bitTestAll, and bitTestAny now return an error if the specified bit index is out-of-bounds. #65818 (Pablo Marcos).
  • Setting join_any_take_last_row is supported in any query with hash join. #65820 (vdimir).
  • Better handling of join conditions involving IS NULL checks (for example ON (a = b AND (a IS NOT NULL) AND (b IS NOT NULL) ) OR ( (a IS NULL) AND (b IS NULL) ) is rewritten to ON a <=> b), fix incorrect optimization when conditions other than IS NULL are present. #65835 (vdimir).
  • Functions bitShiftLeft and bitShiftRight return an error for out of bounds shift positions. #65838 (Pablo Marcos).
  • Fix growing memory usage in S3Queue. #65839 (Kseniia Sumarokova).
  • Fix tie handling in arrayAUC to match sklearn. #65840 (gabrielmcg44).
  • Fix possible issues with MySQL server protocol TLS connections. #65917 (Azat Khuzhin).
  • Fix possible issues with MySQL client protocol TLS connections. #65938 (Azat Khuzhin).
  • Fix handling of SSL_ERROR_WANT_READ/SSL_ERROR_WANT_WRITE with zero timeout. #65941 (Azat Khuzhin).
  • Add missing settings input_format_csv_skip_first_lines/input_format_tsv_skip_first_lines/input_format_csv_try_infer_numbers_from_strings/input_format_csv_try_infer_strings_from_quoted_tuples in schema inference cache because they can change the resulting schema. This prevents incorrect schema inference results when these settings are changed. #65980 (Kruglov Pavel).
  • Column _size in s3 engine and s3 table function denotes the size of a file inside the archive, not a size of the archive itself. #65993 (Daniil Ivanik).
  • Fix resolving dynamic subcolumns in analyzer, avoid reading the whole column on dynamic subcolumn reading. #66004 (Kruglov Pavel).
  • Fix config merging for from_env with replace overrides. #66034 (Azat Khuzhin).
  • Fix a possible hanging in GRPCServer during shutdown. #66061 (Vitaly Baranov).
  • Fixed several cases in function has with non-constant LowCardinality arguments. #66088 (Anton Popov).
  • Fix for groupArrayIntersect. It had incorrect behavior in the merge() function. Also, fixed behavior in deserialise() for numeric and general data. #66103 (Yarik Briukhovetskyi).
  • Fixed buffer overflow bug in unbin/unhex implementation. #66106 (Nikita Taranov).
  • Disable the merge-filters optimization introduced in #64760. It may cause an exception if optimization merges two filter expressions and does not apply a short-circuit evaluation. #66126 (Nikolai Kochetov).
  • Fixed the issue when the server failed to parse Avro files with negative block size arrays encoded, which is now allowed by the Avro specification. #66130 (Serge Klochkov).
  • Fixed a bug in ZooKeeper client: a session could get stuck in unusable state after receiving a hardware error from ZooKeeper. For example, this might happen due to "soft memory limit" in ClickHouse Keeper. #66140 (Alexander Tokmakov).
  • Fix issue in SumIfToCountIfVisitor and signed integers. #66146 (Raúl Marín).
  • Fix rare case with missing data in the result of distributed query. #66174 (vdimir).
  • Fix order of parsing metadata fields in StorageDeltaLake. #66211 (Kseniia Sumarokova).
  • Don't throw TIMEOUT_EXCEEDED for none_only_active mode of distributed_ddl_output_mode. #66218 (Alexander Tokmakov).
  • Fix handling limit for system.numbers_mt when no index can be used. #66231 (János Benjamin Antal).
  • Fixed how the ClickHouse server detects the maximum number of usable CPU cores as specified by cgroups v2 if the server runs in a container such as Docker. In more detail, containers often run their process in the root cgroup which has an empty name. In that case, ClickHouse ignored the CPU limits set by cgroups v2. #66237 (filimonov).
  • Fix the Not-ready set error when a subquery with IN is used in the constraint. #66261 (Nikolai Kochetov).
  • Fix error reporting while copying to S3 or AzureBlobStorage. #66295 (Vitaly Baranov).
  • Prevent watchdog from keeping descriptors of unlinked (rotated) log files. #66334 (Aleksei Filatov).
  • Fix the bug where LogicalExpressionOptimizerPass lost the logical type of a constant. #66344 (pn).
  • Fix Column identifier is already registered error with group_by_use_nulls=true and new analyzer. #66400 (Nikolai Kochetov).
  • Fix possible incorrect result for queries joining and filtering a table with an external engine (like PostgreSQL), due to too aggressive filter pushdown. From now on, conditions from the WHERE section won't be sent to the external database in case of an outer join with an external table. #66402 (vdimir).
  • Added missing column materialization for cross join. #66413 (lgbo).
  • Fix Cannot find column error for queries with constant expression in GROUP BY key and new analyzer enabled. #66433 (Nikolai Kochetov).
  • Avoid possible logical error during import from Npy format in case of bad array nesting level, fix testing of other kinds of errors. #66461 (Yarik Briukhovetskyi).
  • Fix wrong count() result when there is non-deterministic function in predicate. #66510 (Duc Canh Le).
  • Correctly track memory for Allocator::realloc. #66548 (Antonio Andelic).
  • Fix reading of uninitialized memory when hashing empty tuples. #66562 (Alexey Milovidov).
  • Fix an invalid result for queries with WINDOW. This could happen when PARTITION columns have sparse serialization and window functions are executed in parallel. #66579 (Nikolai Kochetov).
  • Fix removing named collections in local storage. #66599 (János Benjamin Antal).
  • Fix column_length is not updated in ColumnTuple::insertManyFrom. #66626 (lgbo).
  • Fix Unknown identifier and Column is not under aggregate function errors for queries with the expression (column IS NULL). The bug was triggered by #65088, with the disabled analyzer only. #66654 (Nikolai Kochetov).
  • Fix Method getResultType is not supported for QUERY query node error when scalar subquery was used as the first argument of IN (with new analyzer). #66655 (Nikolai Kochetov).
  • Fix possible PARAMETER_OUT_OF_BOUND error during reading variant subcolumn. #66659 (Kruglov Pavel).
  • Fix rare case of stuck merge after drop column. #66707 (Raúl Marín).
  • Fix assertion isUniqTypes when insert select from remote sources. #66722 (Sema Checherinda).
  • Fix logical error in PrometheusRequestHandler. #66621 (Vitaly Baranov).
  • Fix indexHint function case found by fuzzer. #66286 (Anton Popov).
  • Fix AST formatting of 'create table b empty as a'. #64951 (Michael Kolupaev).
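
The join-condition entry above (#65835) says that the long IS NULL form is rewritten to ON a <=> b. As a minimal sketch of why the two forms agree, the Python below models SQL three-valued logic with None standing in for NULL and compares both conditions over a few sample values. The helper names are hypothetical and this is not ClickHouse code.

```python
from itertools import product

def sql_eq(a, b):
    # SQL '=': the result is NULL (None) if either side is NULL.
    if a is None or b is None:
        return None
    return a == b

def sql_and(x, y):
    # Three-valued AND: FALSE dominates, then NULL, else TRUE.
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True

def sql_or(x, y):
    # Three-valued OR: TRUE dominates, then NULL, else FALSE.
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False

def long_form(a, b):
    # (a = b AND a IS NOT NULL AND b IS NOT NULL) OR (a IS NULL AND b IS NULL)
    left = sql_and(sql_and(sql_eq(a, b), a is not None), b is not None)
    right = sql_and(a is None, b is None)
    return sql_or(left, right)

def null_safe_eq(a, b):
    # a <=> b: NULLs compare equal to each other; the result is never NULL.
    if a is None or b is None:
        return a is None and b is None
    return a == b

def forms_agree(values=(None, 1, 2)):
    # Compare both conditions on every pair of sample values.
    return all(long_form(a, b) == null_safe_eq(a, b)
               for a, b in product(values, repeat=2))

print(forms_agree())
```

The comparison only covers the exact pattern quoted in the entry; once other conditions are mixed in, the equivalence no longer has to hold, which is the case the fix addresses.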

ClickHouse release 24.6, 2024-07-01

Backward Incompatible Change

  • Enable asynchronous load of databases and tables by default. See the async_load_databases in config.xml. While this change is fully compatible, it can introduce a difference in behavior. When async_load_databases is false, as in the previous versions, the server will not accept connections until all tables are loaded. When async_load_databases is true, as in the new version, the server can accept connections before all the tables are loaded. If a query is made to a table that is not yet loaded, it will wait for the table's loading, which can take considerable time. It can change the behavior of the server if it is part of a large distributed system under a load balancer. In the first case, the load balancer can get a connection refusal and quickly failover to another server. In the second case, the load balancer can connect to a server that is still loading the tables, and the query will have a higher latency. Moreover, if many queries accumulate in the waiting state, it can lead to a "thundering herd" problem when they start processing simultaneously. This can make a difference only for highly loaded distributed backends. You can set the value of async_load_databases to false to avoid this problem. #57695 (Alexey Milovidov).
  • Setting replace_long_file_name_to_hash is enabled by default for MergeTree tables. #64457 (Anton Popov). This setting is fully compatible, and no action is needed during upgrade. The new data format is supported by all versions starting from 23.9. After enabling this setting, you can no longer downgrade to version 23.8 or older.
  • Some invalid queries will fail earlier during parsing. Note: disabled the support for inline KQL expressions (the experimental Kusto language) when they are put into a kql table function without a string literal, e.g. kql(garbage | trash) instead of kql('garbage | trash') or kql($$garbage | trash$$). This feature was introduced unintentionally and should not exist. #61500 (Alexey Milovidov).
  • Rework parallel processing in Ordered mode of storage S3Queue. This PR is backward incompatible for Ordered mode if you used settings s3queue_processing_threads_num or s3queue_total_shards_num. Setting s3queue_total_shards_num is deleted, previously it was allowed to use only under s3queue_allow_experimental_sharded_mode, which is now deprecated. A new setting is added - s3queue_buckets. #64349 (Kseniia Sumarokova).
  • New functions snowflakeIDToDateTime, snowflakeIDToDateTime64, dateTimeToSnowflakeID, and dateTime64ToSnowflakeID were added. Unlike the existing functions snowflakeToDateTime, snowflakeToDateTime64, dateTimeToSnowflake, and dateTime64ToSnowflake, the new functions are compatible with function generateSnowflakeID, i.e. they accept the snowflake IDs generated by generateSnowflakeID and produce snowflake IDs of the same type as generateSnowflakeID (i.e. UInt64). Furthermore, the new functions default to the UNIX epoch (aka. 1970-01-01), just like generateSnowflakeID. If necessary, a different epoch, e.g. Twitter's/X's epoch 2010-11-04 aka. 1288834974657 msec since UNIX epoch, can be passed. The old conversion functions are deprecated and will be removed after a transition period: to use them regardless, enable setting allow_deprecated_snowflake_conversion_functions. #64948 (Robert Schulze).
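
The Snowflake ID entry above describes conversions relative to an epoch (UNIX by default, optionally Twitter's 1288834974657 ms). The sketch below shows that arithmetic in Python. It assumes the common Twitter-style layout, in which the millisecond timestamp sits above 22 low bits (10 machine-id bits plus 12 counter bits); that layout and the Python function names are assumptions made for illustration, not a statement of the ClickHouse implementation.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TIMESTAMP_SHIFT = 22               # assumed: 10 machine-id bits + 12 counter bits
TWITTER_EPOCH_MS = 1288834974657   # epoch value quoted in the entry above

def datetime_to_snowflake_id(dt, epoch_ms=0):
    # Milliseconds since the UNIX epoch, computed with integer arithmetic.
    unix_ms = (dt - UNIX_EPOCH) // timedelta(milliseconds=1)
    # Shift the epoch-relative timestamp above the machine-id and counter bits.
    return (unix_ms - epoch_ms) << TIMESTAMP_SHIFT

def snowflake_id_to_datetime(snowflake_id, epoch_ms=0):
    # Drop the low 22 bits, then add the epoch offset back.
    unix_ms = (snowflake_id >> TIMESTAMP_SHIFT) + epoch_ms
    return UNIX_EPOCH + timedelta(milliseconds=unix_ms)

moment = datetime(2024, 7, 1, 12, 30, 0, tzinfo=timezone.utc)
id_unix = datetime_to_snowflake_id(moment)
id_twitter = datetime_to_snowflake_id(moment, TWITTER_EPOCH_MS)

# The same moment yields different IDs under different epochs, so the
# epoch used for encoding must also be used for decoding.
print(id_unix, id_twitter)
print(snowflake_id_to_datetime(id_unix))
print(snowflake_id_to_datetime(id_twitter, TWITTER_EPOCH_MS))
```

Decoding an ID with the wrong epoch shifts the result by the difference between the two epochs, which is why the epoch is passed explicitly in both directions here.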

New Feature

  • Allow to store named collections in ClickHouse Keeper. #64574 (Kseniia Sumarokova).
  • Support empty tuples. #55061 (Amos Bird).
  • Add Hilbert Curve encode and decode functions. #60156 (Artem Mustafin).
  • Add support for index analysis over hilbertEncode. #64662 (Artem Mustafin).
  • Added support for reading LINESTRING geometry in the WKT format using function readWKTLineString. #62519 (Nikita Mikhaylov).
  • Allow to attach parts from a different disk. #63087 (Unalian).
  • Added a new SQL function generateSnowflakeID for generating Twitter-style Snowflake IDs. #63577 (Danila Puzov).
  • Added merge_workload and mutation_workload settings to regulate how resources are utilized and shared between merges, mutations and other workloads. #64061 (Sergei Trifonov).
  • Add support for comparing IPv4 and IPv6 types using the = operator. #64292 (Francisco J. Jurado Moreno).
  • Support decimal arguments in binary math functions (pow, atan2, max2, min2, hypot). #64582 (Mikhail Gorshkov).
  • Added SQL functions parseReadableSize (along with OrNull and OrZero variants). #64742 (Francisco J. Jurado Moreno).
  • Add server settings max_table_num_to_throw and max_database_num_to_throw to limit the number of databases or tables on CREATE queries. #64781 (Xu Jia).
  • Add _time virtual column to file-like storages (s3/file/hdfs/url/azureBlobStorage). #64947 (Ilya Golshtein).
  • Introduced new functions base64URLEncode, base64URLDecode and tryBase64URLDecode. #64991 (Mikhail Gorshkov).
  • Add new function editDistanceUTF8, which calculates the edit distance between two UTF8 strings. #65269 (LiuNeng).
  • Add http_response_headers configuration to support custom response headers in custom HTTP handlers. #63562 (Grigorii).
  • Added a new table function loop to support returning query results in an infinite loop. #63452 (Sariel). This is useful for testing.
  • Introduced two additional columns in the system.query_log: used_privileges and missing_privileges. used_privileges is populated with the privileges that were checked during query execution, and missing_privileges contains required privileges that are missing. #64597 (Alexey Katsman).
  • Added a setting output_format_pretty_display_footer_column_names which when enabled displays column names at the end of the table for long tables (50 rows by default), with the threshold value for minimum number of rows controlled by output_format_pretty_display_footer_column_names_min_rows. #65144 (Shaun Struwig).
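
The editDistanceUTF8 entry above concerns edit distance between UTF-8 strings. As a rough illustration of why a UTF-8-aware variant matters, the Python sketch below computes a plain Levenshtein distance twice: once over Unicode code points and once over raw UTF-8 bytes. It assumes that editDistanceUTF8 counts code points; the helper is hypothetical and is not the ClickHouse implementation.

```python
def edit_distance(a, b):
    # Levenshtein distance: insertions, deletions and substitutions cost 1 each.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

s1 = "na\u00efve"  # contains U+00EF, which takes two bytes in UTF-8
s2 = "naive"

by_code_point = edit_distance(s1, s2)
by_byte = edit_distance(s1.encode("utf-8"), s2.encode("utf-8"))
print(by_code_point, by_byte)  # prints "1 2"
```

Measured in code points the two strings differ by one substitution, while the byte-wise comparison also has to delete the second byte of the encoded character.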

Experimental Feature

  • Introduce statistics of type "number of distinct values". #59357 (Han Fei).
  • Support statistics with ReplicatedMergeTree. #64934 (Han Fei).
  • If "replica group" is configured for a Replicated database, automatically create a cluster that includes replicas from all groups. #64312 (Alexander Tokmakov).
  • Add settings parallel_replicas_custom_key_range_lower and parallel_replicas_custom_key_range_upper to control how parallel replicas with dynamic shards parallelize queries when using a range filter. #64604 (josh-hildred).

Performance Improvement

  • Add the ability to reshuffle rows during insert to optimize for size without violating the order set by PRIMARY KEY. It's controlled by the setting optimize_row_order (off by default). #63578 (Igor Markelov).
  • Add a native parquet reader, which can read parquet binary to ClickHouse Columns directly. It's controlled by the setting input_format_parquet_use_native_reader (disabled by default). #60361 (ZhiHong Zhang).
  • Support partial trivial count optimization when the query filter is able to select exact ranges from merge tree tables. #60463 (Amos Bird).
  • Reduce max memory usage of multithreaded INSERTs by collecting chunks of multiple threads in a single transform. #61047 (Yarik Briukhovetskyi).
  • Reduce the memory usage when using Azure object storage by using fixed memory allocation, avoiding the allocation of an extra buffer. #63160 (SmitaRKulkarni).
  • Reduce the number of virtual function calls in ColumnNullable::size. #60556 (HappenLee).
  • Speed up splitByRegexp when the regular expression argument is a single character. #62696 (Robert Schulze).
  • Speed up aggregation by 8-bit and 16-bit keys by keeping track of the min and max keys used. This allows to reduce the number of cells that need to be verified. #62746 (Jiebin Sun).
  • Optimize operator IN when the left hand side is LowCardinality and the right is a set of constants. #64060 (Zhiguo Zhou).
  • Use a thread pool to initialize and destroy hash tables inside ConcurrentHashJoin. #64241 (Nikita Taranov).
  • Optimized vertical merges in tables with sparse columns. #64311 (Anton Popov).
  • Enabled prefetches of data from remote filesystem during vertical merges. It improves latency of vertical merges in tables with data stored on remote filesystem. #64314 (Anton Popov).
  • Reduce redundant calls to isDefault of ColumnSparse::filter to improve performance. #64426 (Jiebin Sun).
  • Speed up find_super_nodes and find_big_family keeper-client commands by making multiple asynchronous getChildren requests. #64628 (Alexander Gololobov).
  • Improve function least/greatest for nullable numeric type arguments. #64668 (KevinyhZou).
  • Allow merging two consecutive filtering steps of a query plan. This improves filter-push-down optimization if the filter condition can be pushed down from the parent step. #64760 (Nikolai Kochetov).
  • Remove bad optimization in the vertical final implementation and re-enable vertical final algorithm by default. #64783 (Duc Canh Le).
  • Remove ALIAS nodes from the filter expression. This slightly improves performance for queries with PREWHERE (with the new analyzer). #64793 (Nikolai Kochetov).
  • Re-enable OpenSSL session caching. #65111 (Robert Schulze).
  • Added settings to disable materialization of skip indexes and statistics on inserts (materialize_skip_indexes_on_insert and materialize_statistics_on_insert). #64391 (Anton Popov).
  • Use the allocated memory size to calculate the row group size and reduce the peak memory of the parquet writer in the single-threaded mode. #64424 (LiuNeng).
  • Improve the iterator of sparse columns to reduce calls to size. #64497 (Jiebin Sun).
  • Update condition to use server-side copy for backups to Azure blob storage. #64518 (SmitaRKulkarni).
  • Optimized memory usage of vertical merges for tables with high number of skip indexes. #64580 (Anton Popov).

Improvement

  • SHOW CREATE TABLE executed on top of system tables will now show the super handy comment unique for each table which will explain why this table is needed. #63788 (Nikita Mikhaylov).
  • The second argument (scale) of functions round(), roundBankers(), floor(), ceil() and trunc() can now be non-const. #64798 (Mikhail Gorshkov).
  • Hot reload storage policy for Distributed tables when adding a new disk. #58285 (Duc Canh Le).
  • Avoid possible deadlock during MergeTree index analysis when scheduling threads in a saturated service. #59427 (Sean Haynes).
  • Several minor corner case fixes to S3 proxy support & tunneling. #63427 (Arthur Passos).
  • Improve io_uring resubmit visibility. Rename profile event IOUringSQEsResubmits -> IOUringSQEsResubmitsAsync and add a new one IOUringSQEsResubmitsSync. #63699 (Tomer Shafir).
  • Added a new setting, metadata_keep_free_space_bytes to keep free space on the metadata storage disk. #64128 (MikhailBurdukov).
  • Add metrics to track the number of directories created and removed by the plain_rewritable metadata storage, and the number of entries in the local-to-remote in-memory map. #64175 (Julia Kartseva).
  • The query cache now considers identical queries with different settings as different. This increases robustness in cases where different settings (e.g. limit or additional_table_filters) would affect the query result. #64205 (Robert Schulze).
  • Support the non standard error code QpsLimitExceeded in object storage as a retryable error. #64225 (Sema Checherinda).
  • Forbid converting a MergeTree table to replicated if the zookeeper path for this table already exists. #64244 (Kirill).
  • Added a new setting input_format_parquet_prefer_block_bytes to control the average output block bytes, and modified the default value of input_format_parquet_max_block_size to 65409. #64427 (LiuNeng).
  • Allow proxy to be bypassed for hosts specified in no_proxy env variable and ClickHouse proxy configuration. #63314 (Arthur Passos).
  • Always start Keeper with sufficient amount of threads in global thread pool. #64444 (Duc Canh Le).
  • Settings from the user's config don't affect merges and mutations for MergeTree on top of object storage. #64456 (alesapin).
  • Support the non standard error code TotalQpsLimitExceeded in object storage as a retryable error. #64520 (Sema Checherinda).
  • Updated Advanced Dashboard for both open-source and ClickHouse Cloud versions to include a chart for 'Maximum concurrent network connections'. #64610 (Thom O'Connor).
  • Improve progress report on zeros_mt and generateRandom. #64804 (Raúl Marín).
  • Add an asynchronous metric jemalloc.profile.active to show whether sampling is currently active. This is an activation mechanism in addition to prof.active; both must be active for the calling thread to sample. #64842 (Unalian).
  • Remove mark of allow_experimental_join_condition as important. This mark may have prevented distributed queries in a mixed versions cluster from being executed successfully. #65008 (Nikita Mikhaylov).
  • Added server asynchronous metrics DiskGetObjectThrottler* and DiskPutObjectThrottler* reflecting the requests-per-second rate limit defined with the s3_max_get_rps and s3_max_put_rps disk settings, and the currently available number of requests that could be sent without hitting the throttling limit on the disk. Metrics are defined for every disk that has a configured limit. #65050 (Sergei Trifonov).
  • Initialize global trace collector for Poco::ThreadPool (needed for Keeper, etc). #65239 (Kseniia Sumarokova).
  • Add a validation when creating a user with bcrypt_hash. #65242 (Raúl Marín).
  • Add profile events for number of rows read during/after PREWHERE. #64198 (Nikita Taranov).
  • Print query in EXPLAIN PLAN with parallel replicas. #64298 (vdimir).
  • Rename allow_deprecated_functions to allow_deprecated_error_prone_window_functions. #64358 (Raúl Marín).
  • Respect max_read_buffer_size setting for file descriptors as well in the file table function. #64532 (Azat Khuzhin).
  • Disable transactions for unsupported storages even for materialized views. #64918 (alesapin).
  • Forbid QUALIFY clause in the old analyzer. The old analyzer ignored QUALIFY, so it could lead to unexpected data removal in mutations. #65356 (Dmitry Novik).

Bug Fix (user-visible misbehavior in an official stable release)

  • A bug in Apache ORC library was fixed: Fixed ORC statistics calculation, when writing, for unsigned types on all platforms and Int8 on ARM. #64563 (Michael Kolupaev).
  • Restored the previous behaviour of how ClickHouse interprets Tuples in CSV format. This change effectively reverts https://github.com/ClickHouse/ClickHouse/pull/60994 and makes it available only under a few settings: output_format_csv_serialize_tuple_into_separate_columns, input_format_csv_deserialize_separate_columns_into_tuple and input_format_csv_try_infer_strings_from_quoted_tuples. #65170 (Nikita Mikhaylov).
  • Fix a permission error where a user in a specific situation can escalate their privileges on the default database without necessary grants. #64769 (pufit).
  • Fix crash with UniqInjectiveFunctionsEliminationPass and uniqCombined. #65188 (Raúl Marín).
  • Fix a bug in ClickHouse Keeper that causes digest mismatch during closing session. #65198 (Aleksei Filatov).
  • Use correct memory alignment for Distinct combinator. Previously, crash could happen because of invalid memory allocation when the combinator was used. #65379 (Antonio Andelic).
  • Fix crash with DISTINCT and window functions. #64767 (Igor Nikonov).
  • Fixed 'set' skip index not working with IN and indexHint(). #62083 (Michael Kolupaev).
  • Support executing function during assignment of parameterized view value. #63502 (SmitaRKulkarni).
  • Fixed parquet memory tracking. #63584 (Michael Kolupaev).
  • Fixed reading of columns of type Tuple(Map(LowCardinality(String), String), ...). #63956 (Anton Popov).
  • Fix a Cyclic aliases error for cyclic aliases of different types (expression and function). #63993 (Nikolai Kochetov).
  • Use a proper redefined context with the correct definer for each individual view in the query pipeline. #64079 (pufit).
  • Fix the "Not found column" error when using INTERPOLATE (with the analyzer). #64096 (Yakov Olkhovskiy).
  • Fix creating backups to S3 buckets with different credentials from the disk containing the file. #64153 (Antonio Andelic).
  • The query cache now considers two identical queries against different databases as different. The previous behavior could be used to bypass missing privileges to read from a table. #64199 (Robert Schulze).
  • Fix possible abort on uncaught exception in ~WriteBufferFromFileDescriptor in StatusFile. #64206 (Kruglov Pavel).
  • Fix duplicate alias error for distributed queries with ARRAY JOIN. #64226 (Nikolai Kochetov).
  • Fix unexpected accurateCast from string to integer. #64255 (wudidapaopao).
  • Fixed CNF simplification, in case any OR group contains mutually exclusive atoms. #64256 (Eduard Karacharov).
  • Fix Query Tree size validation. #64377 (Dmitry Novik).
  • Fix Logical error: Bad cast for Buffer table with PREWHERE. #64388 (Nikolai Kochetov).
  • Prevent recursive logging in blob_storage_log when it's stored on object storage. #64393 (vdimir).
  • Fixed CREATE TABLE AS queries for tables with default expressions. #64455 (Anton Popov).
  • Fixed optimize_read_in_order behaviour for ORDER BY ... NULLS FIRST / LAST on tables with nullable keys. #64483 (Eduard Karacharov).
  • Fix the Expression nodes list expected 1 projection names and Unknown expression or identifier errors for queries with aliases to GLOBAL IN. #64517 (Nikolai Kochetov).
  • Fix an error Cannot find column in distributed queries with constant CTE in the GROUP BY key. #64519 (Nikolai Kochetov).
  • Fix the crash loop when restoring from backup is blocked by creating an MV with a definer that hasn't been restored yet. #64595 (pufit).
  • Fix the output of function formatDateTimeInJodaSyntax when a formatter generates an uneven number of characters and the last character is 0. For example, SELECT formatDateTimeInJodaSyntax(toDate('2012-05-29'), 'D') now correctly returns 150 instead of 15 as before. #64614 (LiuNeng).
  • Do not rewrite aggregation if -If combinator is already used. #64638 (Dmitry Novik).
  • Fix type inference for float (in case of small buffer, i.e. --max_read_buffer_size 1). #64641 (Azat Khuzhin).
  • Fix bug which could lead to non-working TTLs with expressions. #64694 (alesapin).
  • Fix removing the WHERE and PREWHERE expressions, which are always true (for the new analyzer). #64695 (Nikolai Kochetov).
  • Fixed excessive part elimination by token-based text indexes (ngrambf, full_text) when filtering by the result of startsWith, endsWith, match, multiSearchAny. #64720 (Eduard Karacharov).
  • Fixes incorrect behaviour of ANSI CSI escaping in the UTF8::computeWidth function. #64756 (Shaun Struwig).
  • Fix a case of incorrect removal of ORDER BY / LIMIT BY across subqueries. #64766 (Raúl Marín).
  • Fix (experimental) unequal join with subqueries for sets which are in the mixed join conditions. #64775 (lgbo).
  • Fix crash in a local cache over plain_rewritable disk. #64778 (Julia Kartseva).
  • Keeper fix: return correct value for zk_latest_snapshot_size in mntr command. #64784 (Antonio Andelic).
  • Fix Cannot find column in distributed query with ARRAY JOIN by Nested column. Fixes #64755. #64801 (Nikolai Kochetov).
  • Fix memory leak in slru cache policy. #64803 (Kseniia Sumarokova).
  • Fixed possible incorrect memory tracking in several kinds of queries: queries that read any data from S3, queries via http protocol, asynchronous inserts. #64844 (Anton Popov).
  • Fix the Block structure mismatch error for queries reading with PREWHERE from the materialized view when the materialized view has columns of different types than the source table. Fixes #64611. #64855 (Nikolai Kochetov).
  • Fix rare crash when table has TTL with subquery + database replicated + parallel replicas + analyzer. It's really rare, but please don't use TTLs with subqueries. #64858 (alesapin).
  • Fix duplicating Delete events in blob_storage_log in case of large batch to delete. #64924 (vdimir).
  • Fixed Session moved to another server error from [Zoo]Keeper that might happen after server startup when the config has includes from [Zoo]Keeper. #64986 (Alexander Tokmakov).
  • Fix ALTER MODIFY COMMENT query that was broken for parameterized VIEWs in https://github.com/ClickHouse/ClickHouse/pull/54211. #65031 (Nikolay Degterinsky).
  • Fix host_id in DatabaseReplicated when cluster_secure_connection parameter is enabled. Previously all the connections within the cluster created by DatabaseReplicated were not secure, even if the parameter was enabled. #65054 (Nikolay Degterinsky).
  • Fixing the Not-ready Set error after the PREWHERE optimization for StorageMerge. #65057 (Nikolai Kochetov).
  • Avoid writing to finalized buffer in File-like storages. #65063 (Kruglov Pavel).
  • Fix possible infinite query duration in case of cyclic aliases. Fixes #64849. #65081 (Nikolai Kochetov).
  • Fix the Unknown expression identifier error for remote queries with INTERPOLATE (alias) (new analyzer). Fixes #64636. #65090 (Nikolai Kochetov).
  • Fix pushing arithmetic operations out of aggregation. In the new analyzer, optimization was applied only once. #65104 (Dmitry Novik).
  • Fix aggregate function name rewriting in the new analyzer. #65110 (Dmitry Novik).
  • Respond with 5xx instead of 200 OK in case of receive timeout while reading (parts of) the request body from the client socket. #65118 (Julian Maicher).
  • Fix possible crash for hedged requests. #65206 (Azat Khuzhin).
  • Fix the bug in Hashed and Hashed_Array dictionary short circuit evaluation, which may read uninitialized number, leading to various errors. #65256 (jsc0218).
  • Ensure that the type of the constant (the IN operator's second parameter) is always visible during the IN operator's type conversion process. Otherwise, losing type information may cause some conversions to fail, such as the conversion from DateTime to Date. This fixes #64487. #65315 (pn).

Build/Testing/Packaging Improvement

ClickHouse release 24.5, 2024-05-30

Backward Incompatible Change

  • Renamed "inverted indexes" to "full-text indexes" which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make sure to drop such indexes before upgrade and re-create them after upgrade. #62884 (Robert Schulze).
  • Usage of the functions neighbor, runningAccumulate, runningDifferenceStartingWithFirstValue, and runningDifference is deprecated (because they are error-prone). Proper window functions should be used instead. To enable them back, set allow_deprecated_error_prone_window_functions = 1 or set compatibility = '24.4' or lower. #63132 (Nikita Taranov).
  • Queries from system.columns will work faster if there is a large number of columns, but many databases or tables are not granted for SHOW TABLES. Note that in previous versions, if you grant SHOW COLUMNS to individual columns without granting SHOW TABLES to the corresponding tables, the system.columns table will show these columns, but in a new version, it will skip the table entirely. Remove trace log messages "Access granted" and "Access denied" that slowed down queries. #63439 (Alexey Milovidov).

New Feature

  • Adds the Form format to read/write a single record in the application/x-www-form-urlencoded format. #60199 (Shaun Struwig).
  • Added possibility to compress in CROSS JOIN. #60459 (p1rattttt).
  • Added possibility to do CROSS JOIN in temporary files if the size exceeds limits. #63432 (p1rattttt).
  • Support joins with inequality conditions which involve columns from both the left and right tables, e.g. t1.y < t2.y. To enable, SET allow_experimental_join_condition = 1. #60920 (lgbo).
  • Maps can now have Float32, Float64, Array(T), Map(K, V) and Tuple(T1, T2, ...) as keys. Closes #54537. #59318 (李扬).
  • Introduce bulk loading to EmbeddedRocksDB by creating and ingesting SST files instead of relying on the rocksdb built-in memtable. This helps to increase importing speed, especially for long-running insert queries to StorageEmbeddedRocksDB tables. Also, introduce EmbeddedRocksDB table settings. #59163 #63324 (Duc Canh Le).
  • User can now parse CRLF with TSV format using a setting input_format_tsv_crlf_end_of_line. Closes #56257. #59747 (Shaun Struwig).
  • A new setting input_format_force_null_for_omitted_fields that forces NULL values for omitted fields. #60887 (Constantine Peresypkin).
  • Previously, the S3 storage and the s3 table function didn't support selecting from files inside archives, such as tarballs, zip, 7z. Now they can iterate over files inside archives in S3. #62259 (Daniil Ivanik).
  • Support for conditional function clamp. #62377 (skyoct).
  • Add NPy output format. #62430 (豪肥肥).
  • Raw format as a synonym for TSVRaw. #63394 (Unalian).
  • Added a new SQL function generateUUIDv7 to generate version 7 UUIDs aka. timestamp-based UUIDs with random component. Also added a new function UUIDToNum to extract bytes from a UUID and a new function UUIDv7ToDateTime to extract timestamp component from a UUID version 7. #62852 (Alexey Petrunyaka).
  • On Linux and macOS, if the program has stdout redirected to a file with a compression extension, use the corresponding compression method instead of nothing (making it behave similarly to INTO OUTFILE). #63662 (v01dXYZ).
  • Change warning on high number of attached tables to differentiate tables, views and dictionaries. #64180 (Francisco J. Jurado Moreno).
  • Provide support for azureBlobStorage function in ClickHouse server to use Azure Workload identity to authenticate against Azure blob storage. If use_workload_identity parameter is set in config, workload identity is used for authentication. #57881 (Vinay Suryadevara).
  • Add TTL information in the system.parts_columns table. #63200 (litlig).
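
To illustrate what "timestamp-based UUIDs with random component" means for generateUUIDv7 and UUIDv7ToDateTime above, here is a minimal Python sketch. It assumes the generic version 7 layout (a 48-bit Unix timestamp in milliseconds in the top bits, then version, random, and variant bits). The helper names make_uuid_v7 and uuid_v7_to_datetime are hypothetical, and ClickHouse's own implementation may fill the non-timestamp bits differently.

```python
import os
import uuid
from datetime import datetime, timezone


def make_uuid_v7(unix_ms: int) -> uuid.UUID:
    """Build a version 7 UUID from a Unix timestamp in milliseconds (hypothetical helper)."""
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 random bits
    value = (
        ((unix_ms & ((1 << 48) - 1)) << 80)  # 48-bit millisecond timestamp
        | (0x7 << 76)                        # version field = 7
        | (rand_a << 64)
        | (0b10 << 62)                       # variant bits "10"
        | rand_b
    )
    return uuid.UUID(int=value)


def uuid_v7_to_datetime(u: uuid.UUID) -> datetime:
    """Recover the timestamp component: the top 48 bits are milliseconds since the epoch."""
    return datetime.fromtimestamp((u.int >> 80) / 1000, tz=timezone.utc)


u = make_uuid_v7(1717027200000)  # 2024-05-30 00:00:00 UTC in milliseconds
print(u.version, uuid_v7_to_datetime(u))
```

Because the timestamp occupies the most significant bits, UUIDs generated later compare greater, which is the property that makes version 7 UUIDs attractive as sortable keys.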

Experimental Features

  • Implement Dynamic data type that allows to store values of any type inside it without knowing all of them in advance. Dynamic type is available under a setting allow_experimental_dynamic_type. Reference: #54864. #63058 (Kruglov Pavel).
  • Allowed to create MaterializedMySQL database without connection to MySQL. #63397 (Kirill).
  • Automatically mark a replica of Replicated database as lost and start recovery if some DDL task fails more than max_retries_before_automatic_recovery (100 by default) times in a row with the same error. Also, fixed a bug that could cause skipping DDL entries when an exception is thrown during an early stage of entry execution. #63549 (Alexander Tokmakov).
  • Account failed files in s3queue_tracked_file_ttl_sec and s3queue_tracked_files_limit for StorageS3Queue. #63638 (Kseniia Sumarokova).

Performance Improvement

  • Less contention in filesystem cache (part 4). Allow to keep filesystem cache not filled to the limit by doing additional eviction in the background (controlled by keep_free_space_size(elements)_ratio). This allows to release pressure from space reservation for queries (on tryReserve method). Also this is done in a lock free way as much as possible, e.g. should not block normal cache usage. #61250 (Kseniia Sumarokova).
  • Skip merging of newly created projection blocks during INSERT-s. #59405 (Nikita Taranov).
  • Process ...UTF8 string functions as plain ASCII if all input strings consist of ASCII characters only. Inspired by https://github.com/apache/doris/pull/29799. Overall speed up by 1.07x~1.62x. Note that peak memory usage decreased in some cases. #61632 (李扬).
  • Improved performance of selection ({}) globs in StorageS3. #62120 (Andrey Zvonov).
  • HostResolver kept each IP address several times. If a remote host has several IPs and, for some reason (firewall rules, for example), access is allowed on some IPs and forbidden on others, then only the first record of a forbidden IP was marked as failed, and on each try these IPs had a chance to be chosen (and to fail again). Even with this fixed, the DNS cache is dropped every 120 seconds, and the IPs can be chosen again. #62652 (Anton Ivashkin).
  • Add a new configuration prefer_merge_sort_block_bytes to control the memory usage and speed up sorting 2 times during merging when there are many columns. #62904 (LiuNeng).
  • clickhouse-local will start faster. In previous versions, it was not deleting temporary directories by mistake. Now it will. This closes #62941. #63074 (Alexey Milovidov).
  • Micro-optimizations for the new analyzer. #63429 (Raúl Marín).
  • Index analysis will work if DateTime is compared to DateTime64. This closes #63441. #63443 #63532 (Alexey Milovidov).
  • Speed up indices of type set a little (around 1.5 times) by removing garbage. #64098 (Alexey Milovidov).
  • Remove copying data when writing to the filesystem cache. #63401 (Kseniia Sumarokova).
  • Now backups with azure blob storage will use multicopy. #64116 (alesapin).
  • Allow to use native copy for azure even with different containers. #64154 (alesapin).
  • Finally enable native copy for azure. #64182 (alesapin).

Improvement

  • Allow using clickhouse-local and its shortcuts clickhouse and ch with a query or queries file as a positional argument. Examples: ch "SELECT 1", ch --param_test Hello "SELECT {test:String}", ch query.sql. This closes #62361. #63081 (Alexey Milovidov).
  • Enable plain_rewritable metadata for local and Azure (azure_blob_storage) object storages. #63365 (Julia Kartseva).
  • Support English-style Unicode quotes, e.g. “Hello”, ‘world’. This is questionable in general but helpful when you type your query in a word processor, such as Google Docs. This closes #58634. #63381 (Alexey Milovidov).
  • Allow trailing commas in the columns list in the INSERT query. For example, INSERT INTO test (a, b, c, ) VALUES .... #63803 (Alexey Milovidov).
  • Better exception messages for the Regexp format. #63804 (Alexey Milovidov).
  • Allow trailing commas in the Values format. For example, this query is allowed: INSERT INTO test (a, b, c) VALUES (4, 5, 6,);. #63810 (Alexey Milovidov).
  • Make rabbitmq nack broken messages. Closes #45350. #60312 (Kseniia Sumarokova).
  • Fix a crash in asynchronous stack unwinding (such as when using the sampling query profiler) while interpreting debug info. This closes #60460. #60468 (Alexey Milovidov).
  • Distinct messages for s3 error 'no key' for cases disk and storage. #61108 (Sema Checherinda).
  • The progress bar will work for trivial queries with LIMIT from system.zeros, system.zeros_mt (it already works for system.numbers and system.numbers_mt), and the generateRandom table function. As a bonus, if the total number of records is greater than the max_rows_to_read limit, it will throw an exception earlier. This closes #58183. #61823 (Alexey Milovidov).
  • Support for "Merge Key" in YAML configurations (this is a weird feature of YAML, please never mind). #62685 (Azat Khuzhin).
  • Enhance error message when non-deterministic function is used with Replicated source. #62896 (Grégoire Pineau).
  • Fix interserver secret for Distributed over Distributed from remote. #63013 (Azat Khuzhin).
  • Support include_from for YAML files. However, it is better to use config.d. #63106 (Eduard Karacharov).
  • Keep previous data in terminal after picking from skim suggestions. #63261 (FlameFactory).
  • Width of fields (in Pretty formats or the visibleWidth function) now correctly ignores ANSI escape sequences. #63270 (Shaun Struwig).
  • Update the usage of error code NUMBER_OF_ARGUMENTS_DOESNT_MATCH by more accurate error codes when appropriate. #63406 (Yohann Jardin).
  • os_user and client_hostname are now correctly set up for queries for command line suggestions in clickhouse-client. This closes #63430. #63433 (Alexey Milovidov).
  • Automatically correct max_block_size to the default value if it is zero. #63587 (Antonio Andelic).
  • Add a build_id ALIAS column to trace_log to facilitate auto renaming upon detecting binary changes. This is to address #52086. #63656 (Zimu Li).
  • Enable truncate operation for object storage disks. #63693 (MikhailBurdukov).
  • The loading of the keywords list is now dependent on the server revision and will be disabled for the old versions of ClickHouse server. CC @azat. #63786 (Nikita Mikhaylov).
  • ClickHouse disks have to read the server settings to obtain the actual metadata format version. #63831 (Sema Checherinda).
  • Disable pretty format restrictions (output_format_pretty_max_rows/output_format_pretty_max_value_width) when stdout is not TTY. #63942 (Azat Khuzhin).
  • Exception handling now works when ClickHouse is used inside AWS Lambda. Author: Alexey Coolnev. #64014 (Alexey Milovidov).
  • Throw CANNOT_DECOMPRESS instead of CORRUPTED_DATA on invalid compressed data passed via HTTP. #64036 (vdimir).
  • A tip for a single large number in Pretty formats now works for Nullable and LowCardinality. This closes #61993. #64084 (Alexey Milovidov).
  • Add metrics, logs, and thread names around parts filtering with indices. #64130 (Alexey Milovidov).
  • Ignore allow_suspicious_primary_key on ATTACH and verify on ALTER. #64202 (Azat Khuzhin).
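
The item above about field width ignoring ANSI escape sequences can be pictured with a small Python sketch. It is not ClickHouse's UTF8::computeWidth. The function name visible_width is hypothetical, the sketch only strips CSI sequences (ESC [ ... final byte), and it counts code points, so wide or combining characters are not handled.

```python
import re

# CSI sequence: ESC [ , parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F,
# and one final byte 0x40-0x7E.
CSI_SEQUENCE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def visible_width(text: str) -> int:
    """Width of the text as displayed, assuming one column per remaining code point."""
    return len(CSI_SEQUENCE.sub("", text))


colored = "\x1b[1;31mred\x1b[0m"
print(len(colored), visible_width(colored))
```

Without the stripping step, a colored three-letter value would be treated as much wider than it appears, which misaligns columns in Pretty formats.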

Build/Testing/Packaging Improvement

Bug Fix

ClickHouse release 24.4, 2024-04-30

Upgrade Notes

  • clickhouse-odbc-bridge and clickhouse-library-bridge are now separate packages. This closes #61677. #62114 (Alexey Milovidov).
  • Don't allow to set max_parallel_replicas (for the experimental parallel reading from replicas) to 0 as it doesn't make sense. Closes #60140. #61201 (Kruglov Pavel).
  • Remove support for INSERT WATCH query (part of the deprecated LIVE VIEW feature). #62382 (Alexey Milovidov).
  • Removed the optimize_monotonous_functions_in_order_by setting. #63004 (Raúl Marín).
  • Remove experimental tag from the Replicated database engine. Now it is in Beta stage. #62937 (Justin de Guzman).

New Feature

  • Support recursive CTEs. #62074 (Maksim Kita).
  • Support QUALIFY clause. Closes #47819. #62619 (Maksim Kita).
  • Table engines are grantable now, and it won't affect existing users behavior. #60117 (jsc0218).
  • Added a rewritable S3 disk which supports INSERT operations and does not require locally stored metadata. The main use case is for system tables. #61116 (Julia Kartseva).
  • The syntax highlighting while typing in the client will work on the syntax level (previously, it worked on the lexer level). #62123 (Alexey Milovidov).
  • Supports dropping multiple tables at the same time like DROP TABLE a, b, c;. #58705 (zhongyuankai).
  • Modifying memory table settings through ALTER MODIFY SETTING is now supported. Example: ALTER TABLE memory MODIFY SETTING min_rows_to_keep = 100, max_rows_to_keep = 1000;. #62039 (zhongyuankai).
  • Added role query parameter to the HTTP interface. It works similarly to SET ROLE x, applying the role before the statement is executed. This allows for overcoming the limitation of the HTTP interface, as multiple statements are not allowed, and it is not possible to send both SET ROLE x and the statement itself at the same time. It is possible to set multiple roles that way, e.g., ?role=x&role=y, which will be an equivalent of SET ROLE x, y. #62669 (Serge Klochkov).
  • Add SYSTEM UNLOAD PRIMARY KEY to free up memory usage for a table's primary key. #62738 (Pablo Marcos).
  • Added value1, value2, ..., value10 columns to system.text_log. These columns contain values that were used to format the message. #59619 (Alexey Katsman).
  • Added persistent virtual column _block_offset which stores the original row number within the block that was assigned at insert. Persistence of column _block_offset can be enabled by the MergeTree setting enable_block_offset_column. Added virtual column _part_data_version which contains either the min block number or the mutation version of the part. Persistent virtual column _block_number is not considered experimental anymore. #60676 (Anton Popov).
  • Add a setting input_format_json_throw_on_bad_escape_sequence, disabling it allows saving bad escape sequences in JSON input formats. #61889 (Kruglov Pavel).
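
As a sketch of the role query parameter described above, the following Python snippet builds a request URL that repeats role once per role. The helper name build_query_url is hypothetical, and the base URL http://localhost:8123 is an assumption for a local server. Only the URL is constructed, and no request is sent.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, query: str, roles: list[str]) -> str:
    """Return a URL with one role=... pair per role, followed by the query itself."""
    params = [("role", role) for role in roles] + [("query", query)]
    # urlencode accepts a sequence of pairs, so repeated keys are preserved in order.
    return f"{base_url}/?{urlencode(params)}"


print(build_query_url("http://localhost:8123", "SELECT 1", ["x", "y"]))
```

Per the changelog entry, passing role=x&role=y this way is equivalent to running SET ROLE x, y before the statement.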

Performance Improvement

  • JOIN filter push down improvements using equivalent sets. #61216 (Maksim Kita).
  • Convert OUTER JOIN to INNER JOIN optimization if the filter after JOIN always filters default values. Optimization can be controlled with setting query_plan_convert_outer_join_to_inner_join, enabled by default. #62907 (Maksim Kita).
  • Improvement for AWS S3. Client has to send header 'Keep-Alive: timeout=X' to the server. If a client receives a response from the server with that header, client has to use the value from the server. Also for a client it is better not to use a connection which is nearly expired in order to avoid connection close race. #62249 (Sema Checherinda).
  • Reduce overhead of the mutations for SELECTs (v2). #60856 (Azat Khuzhin).
  • More frequently invoked functions in PODArray are now force-inlined. #61144 (李扬).
  • Speed up parsing of JSON by skipping the rest of the object when all required columns are read. #62210 (lgbo).
  • Improve trivial insert select from files in file/s3/hdfs/url/... table functions. Add separate max_parsing_threads setting to control the number of threads used in parallel parsing. #62404 (Kruglov Pavel).
  • Functions to_utc_timestamp and from_utc_timestamp are now about 2x faster. #62583 (KevinyhZou).
  • Functions parseDateTimeOrNull, parseDateTimeOrZero, parseDateTimeInJodaSyntaxOrNull and parseDateTimeInJodaSyntaxOrZero now run significantly faster (10x - 1000x) when the input contains mostly non-parseable values. #62634 (LiuNeng).
  • SELECTs against system.query_cache are now noticeably faster when the query cache contains lots of entries (e.g. more than 100.000). #62671 (Robert Schulze).
  • Less contention in filesystem cache (part 3): execute removal from filesystem without lock on space reservation attempt. #61163 (Kseniia Sumarokova).
  • Speed up dynamic resize of filesystem cache. #61723 (Kseniia Sumarokova).
  • Dictionary source with INVALIDATE_QUERY is not reloaded twice on startup. #62050 (vdimir).
  • Fix an issue where the primary index was not used when a redundant = 1 or = 0 was added after a boolean expression involving the primary key. For example, both SELECT * FROM <table> WHERE <primary-key> IN (<value>) = 1 and SELECT * FROM <table> WHERE <primary-key> NOT IN (<value>) = 0 performed a full table scan, although the primary index could be used. #62142 (josh-hildred).
  • Return stream of chunks from system.remote_data_paths instead of accumulating the whole result in one big chunk. This allows to consume less memory, show intermediate progress and cancel the query. #62613 (Alexander Gololobov).

Experimental Feature

  • Support parallel write buffer for Azure Blob Storage managed by setting azure_allow_parallel_part_upload. #62534 (SmitaRKulkarni).
  • Userspace page cache works with static web storage (disk(type = web)) now. Use client setting use_page_cache_for_disks_without_file_cache=1 to enable. #61911 (Michael Kolupaev).
  • Don't treat Bool and number variants as suspicious in the Variant type. #61999 (Kruglov Pavel).
  • Implement better conversion from String to Variant using parsing. #62005 (Kruglov Pavel).
  • Support Variant in JSONExtract functions. #62014 (Kruglov Pavel).
  • Mark type Variant as comparable so it can be used in primary key. #62693 (Kruglov Pavel).

Improvement

  • For convenience, SELECT * FROM numbers() will work in the same way as SELECT * FROM system.numbers - without a limit. #61969 (YenchangChan).
  • Introduce separate consumer/producer tags for the Kafka configuration. This avoids warnings from librdkafka (a bad C library with a lot of bugs) that consumer properties were specified for producer instances and vice versa (e.g. Configuration property session.timeout.ms is a consumer property and will be ignored by this producer instance). Closes: #58983. #58956 (Aleksandr Musorin).
  • Functions date_diff and age now calculate their result at nanosecond instead of microsecond precision. They now also offer nanosecond (or nanoseconds or ns) as a possible value for the unit parameter. #61409 (Austin Kothig).
  • Added nano-, micro-, milliseconds unit for date_trunc. #62335 (Misz606).
  • Reload certificate chain during certificate reload. #61671 (Pervakov Grigorii).
  • Try to prevent an error #60432 by not allowing a table to be attached if there is an active replica for that replica path. #61876 (Arthur Passos).
  • Implement support for input for clickhouse-local. #61923 (Azat Khuzhin).
  • Join table engine with strictness ANY is consistent after reload. When several rows with the same key are inserted, the first one will have higher priority (before, it was chosen randomly upon table loading). close #51027. #61972 (vdimir).
  • Automatically infer Nullable column types from Apache Arrow schema. #61984 (Maksim Kita).
  • Allow to cancel parallel merge of aggregate states during aggregation. Example: uniqExact. #61992 (Maksim Kita).
  • Use system.keywords to fill in the suggestions and also use them in all places internally. #62000 (Nikita Mikhaylov).
  • OPTIMIZE FINAL for ReplicatedMergeTree now will wait for currently active merges to finish and then reattempt to schedule a final merge. This will put it more in line with ordinary MergeTree behaviour. #62067 (Nikita Taranov).
  • When reading data from a Hive text file, the first line of the file was used to determine the number of input fields, and sometimes the number of fields in the first line does not match the Hive table definition. For example, if the Hive table is defined with 3 columns, like test_tbl(a Int32, b Int32, c Int32), but the first line of the text file has only 2 fields, the input fields were resized to 2. If the next line of the text file had 3 fields, the third field could not be read and was set to the default value 0, which is not right. #62086 (KevinyhZou).
  • CREATE AS copies the table's comment. #62117 (Pablo Marcos).
  • Add query progress to table zookeeper. #62152 (JackyWoo).
  • Add ability to turn on trace collector (Real and CPU) server-wide. #62189 (alesapin).
  • Added setting lightweight_deletes_sync (default value: 2 - wait all replicas synchronously). It is similar to setting mutations_sync but affects only behaviour of lightweight deletes. #62195 (Anton Popov).
  • Distinguish booleans and integers while parsing values for custom settings: SET custom_a = true; SET custom_b = 1;. #62206 (Vitaly Baranov).
  • Support S3 access through AWS Private Link Interface endpoints. Closes #60021, #31074 and #53761. #62208 (Arthur Passos).
  • Do not create a directory for UDF in clickhouse-client if it does not exist. This closes #59597. #62366 (Alexey Milovidov).
  • The query cache now no longer caches results of queries against system tables (system.*, information_schema.*, INFORMATION_SCHEMA.*). #62376 (Robert Schulze).
  • MOVE PARTITION TO TABLE query can be delayed or can throw TOO_MANY_PARTS exception to avoid exceeding limits on the part count. The same settings and limits are applied as for the INSERT query (see max_parts_in_total, parts_to_delay_insert, parts_to_throw_insert, inactive_parts_to_throw_insert, inactive_parts_to_delay_insert, max_avg_part_size_for_too_many_parts, min_delay_to_insert_ms and max_delay_to_insert settings). #62420 (Sergei Trifonov).
  • Changed the default installation directory on macOS from /usr/bin to /usr/local/bin. This is necessary because Apple's System Integrity Protection introduced with macOS El Capitan (2015) prevents writing into /usr/bin, even with sudo. #62489 (haohang).
  • Make transform always return the first match. #62518 (Raúl Marín).
  • Added the missing hostname column to system table blob_storage_log. #62456 (Jayme Bird).
  • For consistency with other system tables, system.backup_log now has a column event_time. #62541 (Jayme Bird).
  • Table system.backup_log now has the "default" sorting key which is event_date, event_time, the same as for other _log table engines. #62667 (Nikita Mikhaylov).
  • Avoid evaluating table DEFAULT expressions while executing RESTORE. #62601 (Vitaly Baranov).
  • S3 storage and backups also need the same default keep alive settings as s3 disk. #62648 (Sema Checherinda).
  • Add librdkafka's (that infamous C library, which has a lot of bugs) client identifier to log messages to be able to differentiate log messages from different consumers of a single table. #62813 (János Benjamin Antal).
  • Allow special macros {uuid} and {database} in a Replicated database ZooKeeper path. #62818 (Vitaly Baranov).
  • Allow quota key with different auth scheme in HTTP requests. #62842 (Kseniia Sumarokova).
  • Reduce the verbosity of command line argument --help in clickhouse client and clickhouse local. The previous output is now generated by --help --verbose. #62973 (Yarik Briukhovetskyi).
  • log_bin_use_v1_row_events was removed in MySQL 8.3, and we adjust the experimental MaterializedMySQL engine for it #60479. #63101 (Eugene Klimov). Author: Nikolay Yankin.
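
The "transform always returns the first match" item above matters when the source array contains duplicates. A minimal Python sketch of that first-match rule follows. It models the four-argument form transform(x, array_from, array_to, default) and is only an illustration of the semantics, not ClickHouse's implementation.

```python
def transform(x, array_from, array_to, default):
    """Return the array_to element paired with the first array_from element equal to x."""
    for source, target in zip(array_from, array_to):
        if source == x:
            return target  # stop at the first match, ignoring later duplicates
    return default


# The value 1 appears twice in array_from; the first pairing (1 -> 10) wins.
print(transform(1, [1, 2, 1], [10, 20, 30], 0))
print(transform(5, [1, 2, 1], [10, 20, 30], 0))
```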

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehavior in an official stable release)

ClickHouse release 24.3 LTS, 2024-03-27

Upgrade Notes

  • The setting allow_experimental_analyzer is enabled by default and it switches the query analysis to a new implementation, which has better compatibility and feature completeness. The feature "analyzer" is considered beta instead of experimental. You can return to the old behavior by setting the compatibility to 24.2 or disabling the allow_experimental_analyzer setting. Watch the video on YouTube.
  • ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. This is controlled by the settings, output_format_parquet_string_as_string, output_format_orc_string_as_string, output_format_arrow_string_as_string. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases. Parquet/ORC/Arrow supports many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools lack support for the faster lz4 compression method, that's why we set zstd by default. This is controlled by the settings output_format_parquet_compression_method, output_format_orc_compression_method, and output_format_arrow_compression_method. We changed the default to zstd for Parquet and ORC, but not Arrow (it is emphasized for low-level usages). #61817 (Alexey Milovidov).
  • In the new ClickHouse version, the functions geoDistance, greatCircleDistance, and greatCircleAngle will use 64-bit double precision floating point data type for internal calculations and return type if all the arguments are Float64. This closes #58476. In previous versions, the function always used Float32. You can switch to the old behavior by setting geo_distance_returns_float64_on_float64_arguments to false or setting compatibility to 24.2 or earlier. #61848 (Alexey Milovidov). Co-authored with Geet Patel.
  • The obsolete in-memory data parts have been deprecated since version 23.5 and have not been supported since version 23.10. Now the remaining code is removed. Continuation of #55186 and #45409. It is unlikely that you have used in-memory data parts because they were available only before version 23.5 and only when you enabled them manually by specifying the corresponding SETTINGS for a MergeTree table. To check if you have in-memory data parts, run the following query: SELECT part_type, count() FROM system.parts GROUP BY part_type ORDER BY part_type. To disable the usage of in-memory data parts, do ALTER TABLE ... MODIFY SETTING min_bytes_for_compact_part = DEFAULT, min_rows_for_compact_part = DEFAULT. Before upgrading from old ClickHouse releases, first check that you don't have in-memory data parts. If there are in-memory data parts, disable them first, then wait while there are no in-memory data parts and continue the upgrade. #61127 (Alexey Milovidov).
  • Changed the column name from duration_ms to duration_microseconds in the system.zookeeper table to reflect the reality that the duration is in the microsecond resolution. #60774 (Duc Canh Le).
  • Reject incoming INSERT queries in case when query-level settings async_insert and deduplicate_blocks_in_dependent_materialized_views are enabled together. This behaviour is controlled by a setting throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert and enabled by default. This is a continuation of https://github.com/ClickHouse/ClickHouse/pull/59699 needed to unblock https://github.com/ClickHouse/ClickHouse/pull/59915. #60888 (Nikita Mikhaylov).
  • Utility clickhouse-copier is moved to a separate repository on GitHub: https://github.com/ClickHouse/copier. It is no longer included in the bundle but is still available as a separate download. This closes #60734, #60540, #60250, #52917, #51140, #47517, #47189, #46598, #40257, #36504, #35485, #33702, and #26702.
  • To increase compatibility with MySQL, the compatibility alias locate now accepts arguments (needle, haystack[, start_pos]) by default. The previous behavior (haystack, needle[, start_pos]) can be restored by setting function_locate_has_mysql_compatible_argument_order = 0. #61092 (Robert Schulze).
  • Forbid SimpleAggregateFunction in ORDER BY of MergeTree tables (like AggregateFunction is forbidden, but they are forbidden because they are not comparable) by default (use allow_suspicious_primary_key to allow them). #61399 (Azat Khuzhin).
  • The Ordinary database engine is deprecated. You will receive a warning in clickhouse-client if your server is using it. This closes #52229. #56942 (shabroo).
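
The locate argument-order change above can silently alter results for existing queries. The Python sketch below models the MySQL-compatible order (needle, haystack[, start_pos]) with 1-based positions and 0 for "not found". Those conventions are assumptions taken from the MySQL function, and the sketch works on Python strings rather than bytes.

```python
def locate(needle: str, haystack: str, start_pos: int = 1) -> int:
    """1-based position of needle in haystack at or after start_pos, or 0 if absent."""
    # str.find returns -1 when nothing is found, so adding 1 yields 0.
    return haystack.find(needle, start_pos - 1) + 1


print(locate("ll", "hello"))  # new default order: needle first
print(locate("hello", "ll"))  # a call written for the old order now searches the wrong way round
```

A query written for the old (haystack, needle) order still runs after the upgrade but searches for the long string inside the short one, so it typically returns 0 rather than raising an error.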

New Feature

  • Support reading and writing backups as tar (in addition to zip). #59535 (josh-hildred).
  • Implemented support for S3 Express buckets. #59965 (Nikita Taranov).
  • Allow to attach parts from a different disk (using copy instead of hard link). #60112 (Unalian).
  • Size-capped Memory tables: controlled by their settings, min_bytes_to_keep, max_bytes_to_keep, min_rows_to_keep and max_rows_to_keep. #60612 (Jake Bamrah).
  • Separate limits on number of waiting and executing queries. Added new server setting max_waiting_queries that limits the number of queries waiting due to async_load_databases. Existing limits on number of executing queries no longer count waiting queries. #61053 (Sergei Trifonov).
  • Added a table system.keywords which contains all the keywords from the parser. It is mostly needed for, and will be used for, better fuzzing and syntax highlighting. #51808 (Nikita Mikhaylov).
  • Add support for ATTACH PARTITION ALL. #61107 (Kirill Nikiforov).
  • Add a new function, getClientHTTPHeader. This closes #54665. Co-authored with @lingtaolf. #61820 (Alexey Milovidov).
  • Add generate_series as a table function (compatibility alias for PostgreSQL to the existing numbers function). This function generates a table with an arithmetic progression of natural numbers. #59390 (divanik).
  • Add a mode for topK/topKWeighted that returns the count of each value and its error. #54508 (UnamedRus).
  • Added function toMillisecond which returns the millisecond component for values of type DateTime or DateTime64. #60281 (Shaun Struwig).
  • Allow configuring HTTP redirect handlers for clickhouse-server. For example, you can make / redirect to the Play UI. #60390 (Alexey Milovidov).
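
As a rough illustration of the arithmetic progression produced by generate_series, here is a small Python sketch. The signature (start, stop, step) with an inclusive upper bound is an assumption made for this example and is not stated in the entry above; consult the table function's documentation for the exact semantics.

```python
def generate_series(start: int, stop: int, step: int = 1) -> list[int]:
    """Toy model: natural numbers start, start + step, ... up to stop (inclusive)."""
    if start < 0 or stop < 0:
        raise ValueError("this sketch only models natural numbers")
    if step <= 0:
        raise ValueError("step must be positive")
    # range() excludes its upper bound, so extend it by one to include stop.
    return list(range(start, stop + 1, step))


rows = generate_series(0, 10, 3)
```

Here rows is [0, 3, 6, 9]: the progression stops at the last value that does not exceed the upper bound.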

Performance Improvement

  • Optimized function dotProduct to omit unnecessary and expensive memory copies. #60928 (Robert Schulze).
  • 30x faster printing for 256-bit integers. #61100 (Raúl Marín).
  • If the table's primary key contains mostly useless columns, don't keep them in memory. This is controlled by a new setting primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns with the value 0.9 by default, which means: for a composite primary key, if a column changes its value in at least 0.9 of all cases, the columns after it will not be loaded. #60255 (Alexey Milovidov).
  • Improve the performance of serialized aggregation methods when involving multiple Nullable columns. #55809 (Amos Bird).
  • Lazy builds JSON's output to improve performance of ALL JOIN. #58278 (LiuNeng).
  • Make HTTP/HTTPS connections with external services, such as AWS S3, reusable for all use cases, even when the response is 3xx or 4xx. #58845 (Sema Checherinda).
  • Improvements to aggregate functions argMin / argMax / any / anyLast / anyHeavy, as well as ORDER BY {u8/u16/u32/u64/i8/i16/i32/i64} LIMIT 1 queries. #58640 (Raúl Marín).
  • Trivial optimization for column's filter. Peak memory can be reduced to 44% of the original in some cases. #59698 (李扬).
  • Execute multiIf function in a columnar fashion when the result type's underlying type is a number. #60384 (李扬).
  • Faster (almost 2x) mutexes. #60823 (Azat Khuzhin).
  • Drain multiple connections in parallel when a distributed query is finishing. #60845 (lizhuoyu5).
  • Optimize data movement between columns of a Nullable number or a Nullable string, which improves some micro-benchmarks. #60846 (李扬).
  • Operations with the filesystem cache will suffer less from the lock contention. #61066 (Alexey Milovidov).
  • Optimize array join and other JOINs by preventing a wrong compiler optimization. Closes #61074. #61075 (李扬).
  • If a query with a syntax error contained the COLUMNS matcher with a regular expression, the regular expression was compiled each time during the parser's backtracking, instead of being compiled once. This was a fundamental error. The compiled regexp was put to AST. But the letter A in AST means "abstract" which means it should not contain heavyweight objects. Parts of AST can be created and discarded during parsing, including a large number of backtracking. This leads to slowness on the parsing side and consequently allows DoS by a readonly user. But the main problem is that it prevents progress in fuzzers. #61543 (Alexey Milovidov).
  • Add a new analyzer pass to optimize the IN operator for a single value. #61564 (LiuNeng).
  • DNSResolver shuffles the set of resolved IPs, which is needed to uniformly utilize multiple endpoints of AWS S3. #60965 (Sema Checherinda).
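
The rule behind primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns can be sketched as follows. This is a simplified model based only on the description above, not the server's actual algorithm: the function columns_to_keep and its inputs are hypothetical, and it compares each column value with the value in the previous index row.

```python
def columns_to_keep(index_rows: list[tuple], threshold: float = 0.9) -> int:
    """Return how many leading primary key columns this toy model keeps in memory."""
    if not index_rows:
        return 0
    num_columns = len(index_rows[0])
    transitions = len(index_rows) - 1
    if transitions == 0:
        return num_columns
    for col in range(num_columns):
        # Count how often this column's value differs from the previous row.
        changes = sum(
            1 for prev, cur in zip(index_rows, index_rows[1:])
            if prev[col] != cur[col]
        )
        if changes / transitions >= threshold:
            # This column changes almost every time, so later columns
            # add little to the index and are skipped.
            return col + 1
    return num_columns


# First column changes on every row: only it is kept.
mostly_unique_first = columns_to_keep([(i, i % 3) for i in range(100)])
# First column rarely changes, second always does: two columns are kept.
coarse_first = columns_to_keep([(i // 50, i, 2 * i) for i in range(100)])
```

In this model mostly_unique_first is 1 and coarse_first is 2, which matches the intent of the setting: suffix columns are dropped only after a column that is already nearly unique.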

Experimental Feature

  • Support parallel reading for Azure blob storage. This improves the performance of the experimental Azure object storage. #61503 (SmitaRKulkarni).
  • Add asynchronous WriteBuffer for Azure blob storage similar to S3. This improves the performance of the experimental Azure object storage. #59929 (SmitaRKulkarni).
  • Use managed identity for backups IO when using Azure Blob Storage. Add a setting to prevent ClickHouse from attempting to create a non-existent container, which requires permissions at the storage account level. #61785 (Daniel Pozo Escalona).
  • Add a setting parallel_replicas_allow_in_with_subquery = 1 which allows subqueries for IN to work with parallel replicas. #60950 (Nikolai Kochetov).
  • A change for the "zero-copy" replication: all zero copy locks related to a table have to be dropped when the table is dropped. The directory which contains these locks has to be removed also. #57575 (Sema Checherinda).

Improvement

  • Use MergeTree as a default table engine. #60524 (Alexey Milovidov).
  • Enable output_format_pretty_row_numbers by default. It is better for usability. #61791 (Alexey Milovidov).
  • In the previous version, some numbers in Pretty formats were not pretty enough. #61794 (Alexey Milovidov).
  • A long value in Pretty formats won't be cut if it is the single value in the resultset, such as in the result of the SHOW CREATE TABLE query. #61795 (Alexey Milovidov).
  • Similarly to clickhouse-local, clickhouse-client will accept the --output-format option as a synonym to the --format option. This closes #59848. #61797 (Alexey Milovidov).
  • If stdout is a terminal and the output format is not specified, clickhouse-client and similar tools will use PrettyCompact by default, similarly to the interactive mode. clickhouse-client and clickhouse-local will handle command line arguments for input and output formats in a unified fashion. This closes #61272. #61800 (Alexey Milovidov).
  • Underscore digit groups in Pretty formats for better readability. This is controlled by a new setting, output_format_pretty_highlight_digit_groups. #61802 (Alexey Milovidov).
  • Add ability to override initial INSERT settings via SYSTEM FLUSH DISTRIBUTED. #61832 (Azat Khuzhin).
  • Enable processors profiling (time spent/in and out bytes for sorting, aggregation, ...) by default. #61096 (Azat Khuzhin).
  • Support files without format extension in Filesystem database. #60795 (Kruglov Pavel).
  • Make all format names case insensitive, like Tsv, or TSV, or tsv, or even rowbinary. #60420 (豪肥肥). I appreciate if you will continue to write it correctly, e.g., JSON 😇, not Json 🤮, but we don't mind if you spell it as you prefer.
  • Added none_only_active mode for distributed_ddl_output_mode setting. #60340 (Alexander Tokmakov).
  • The advanced dashboard has slightly better colors for multi-line graphs. #60391 (Alexey Milovidov).
  • The Advanced dashboard now has controls always visible on scrolling. This allows you to add a new chart without scrolling up. #60692 (Alexey Milovidov).
  • While running the MODIFY COLUMN query for materialized views, check the inner table's structure to ensure every column exists. #47427 (sunny).
  • String types and Enums can be used in the same context, such as: arrays, UNION queries, conditional expressions. This closes #60726. #60727 (Alexey Milovidov).
  • Allow declaring Enums in the structure of external data for query processing (this is an immediate temporary table that you can provide for your query). #57857 (Duc Canh Le).
  • Consider lightweight deleted rows when selecting parts to merge, so the disk size of the resulting part will be estimated better. #58223 (Zhuo Qiu).
  • Added comments for columns for more system tables. Continuation of https://github.com/ClickHouse/ClickHouse/pull/58356. #59016 (Nikita Mikhaylov).
  • Now we can use virtual columns in PREWHERE. It's worthwhile for non-const virtual columns like _part_offset. #59033 (Amos Bird). Improved overall usability of virtual columns. Now it is allowed to use virtual columns in PREWHERE (it's worthwhile for non-const virtual columns like _part_offset). Built-in documentation is now available for virtual columns as a column comment in the DESCRIBE query when the setting describe_include_virtual_columns is enabled. #60205 (Anton Popov).
  • Instead of using a constant key, object storage now generates a key for determining the capability to remove objects. #59495 (Sema Checherinda).
  • Allow "local" as object storage type instead of "local_blob_storage". #60165 (Kseniia Sumarokova).
  • Parallel flush of pending INSERT blocks of Distributed engine on DETACH/server shutdown and SYSTEM FLUSH DISTRIBUTED (Parallelism will work only if you have multi-disk policy for a table (like everything in the Distributed engine right now)). #60225 (Azat Khuzhin).
  • Add a setting to force read-through cache for merges. #60308 (Kseniia Sumarokova).
  • An improvement for the MySQL compatibility protocol. The issue #57598 mentions a variant behaviour regarding transaction handling. An issued COMMIT/ROLLBACK when no transaction is active is reported as an error contrary to MySQL behaviour. #60338 (PapaToemmsn).
  • Function substring now has a new alias byteSlice. #60494 (Robert Schulze).
  • Renamed server setting dns_cache_max_size to dns_cache_max_entries to reduce ambiguity. #60500 (Kirill Nikiforov).
  • SHOW INDEX | INDEXES | INDICES | KEYS no longer sorts by the primary key columns (which was unintuitive). #60514 (Robert Schulze).
  • Keeper improvement: abort during startup if an invalid snapshot is detected to avoid data loss. #60537 (Antonio Andelic).
  • Update tzdata to 2024a. #60768 (Raúl Marín).
  • Keeper improvement: support leadership_expiry_ms in Keeper's settings. #60806 (Brokenice0415).
  • Always infer exponential numbers in JSON formats regardless of the setting input_format_try_infer_exponent_floats. Add setting input_format_json_use_string_type_for_ambiguous_paths_in_named_tuples_inference_from_objects that allows to use String type for ambiguous paths instead of an exception during named Tuples inference from JSON objects. #60808 (Kruglov Pavel).
  • Add support for START TRANSACTION syntax typically used in MySQL syntax, resolving https://github.com/ClickHouse/ClickHouse/discussions/60865. #60886 (Zach Naimon).
  • Add a flag for the full-sorting merge join algorithm to treat null as biggest/smallest, so that the behavior can be compatible with other SQL systems, like Apache Spark. #60896 (loudongfeng).
  • Support detecting the output format by file extension in clickhouse-client and clickhouse-local. #61036 (豪肥肥).
  • Update memory limit in runtime when Linux's CGroups value changed. #61049 (Han Fei).
  • Add the function toUInt128OrZero, which was missed by mistake (the mistake is related to https://github.com/ClickHouse/ClickHouse/pull/945). The compatibility aliases FROM_UNIXTIME and DATE_FORMAT (they are not ClickHouse-native and only exist for MySQL compatibility) have been made case insensitive, as expected for SQL-compatibility aliases. #61114 (Alexey Milovidov).
  • Improvements for the access checks, allowing the revocation of unpossessed rights in case the target user doesn't have the grants being revoked either. Example: GRANT SELECT ON *.* TO user1; REVOKE SELECT ON system.* FROM user1;. #61115 (pufit).
  • Fix has() function with Nullable column (fixes #60214). #61249 (Mikhail Koviazin).
  • Now it's possible to specify the attribute merge="true" in config substitutions for subtrees <include from_zk="/path" merge="true">. If this attribute is specified, ClickHouse will merge the subtree with the existing configuration; otherwise, the default behavior is to append the new content to the configuration. #61299 (alesapin).
  • Add async metrics for virtual memory mappings: VMMaxMapCount & VMNumMaps. Closes #60662. #61354 (Tuan Pham Anh).
  • Use temporary_files_codec setting in all places where we create temporary data, for example external memory sorting and external memory GROUP BY. Before it worked only in partial_merge JOIN algorithm. #61456 (Maksim Kita).
  • Add a new setting max_parser_backtracks which allows to limit the complexity of query parsing. #61502 (Alexey Milovidov).
  • Less contention during dynamic resize of the filesystem cache. #61524 (Kseniia Sumarokova).
  • Disallow sharded mode of StorageS3 queue, because it will be rewritten. #61537 (Kseniia Sumarokova).
  • Fixed typo: from use_leagcy_max_level to use_legacy_max_level. #61545 (William Schoeffel).
  • Remove some duplicate entries in system.blob_storage_log. #61622 (YenchangChan).
  • Added current_user function as a compatibility alias for MySQL. #61770 (Yarik Briukhovetskyi).
  • Fix inconsistent floating point aggregate function states in mixed x86-64 / ARM clusters. #60610 (Harry Lee).

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehavior in an official stable release)

ClickHouse release 24.2, 2024-02-29

Backward Incompatible Change

  • Validate suspicious/experimental types in nested types. Previously we didn't validate such types (except JSON) in nested types like Array/Tuple/Map. #59385 (Kruglov Pavel).
  • Add sanity check for number of threads and block sizes. #60138 (Raúl Marín).
  • Don't infer floats in exponential notation by default. Add a setting input_format_try_infer_exponent_floats that will restore previous behaviour (disabled by default). Closes #59476. #59500 (Kruglov Pavel).
  • Allow alter operations to be surrounded by parentheses. The emission of parentheses can be controlled by the format_alter_operations_with_parentheses config. By default, in formatted queries the parentheses are emitted as we store the formatted alter operations in some places as metadata (e.g.: mutations). The new syntax clarifies some of the queries where alter operations end in a list. E.g.: ALTER TABLE x MODIFY TTL date GROUP BY a, b, DROP COLUMN c cannot be parsed properly with the old syntax. In the new syntax the query ALTER TABLE x (MODIFY TTL date GROUP BY a, b), (DROP COLUMN c) is obvious. Older versions are not able to read the new syntax, therefore using the new syntax might cause issues if newer and older versions of ClickHouse are mixed in a single cluster. #59532 (János Benjamin Antal).
  • Fix for the materialized view security issue, which allowed a user to insert into a table without required grants for that. Fix validates that the user has permission to insert not only into a materialized view but also into all underlying tables. This means that some queries, which worked before, now can fail with Not enough privileges. To address this problem, the release introduces a new feature of SQL security for views https://clickhouse.com/docs/en/sql-reference/statements/create/view#sql_security. #54901 #60439 (pufit).

New Feature

  • Added new syntax which allows to specify definer user in View/Materialized View. This allows to execute selects/inserts from views without explicit grants for underlying tables. So, a View will encapsulate the grants. #54901 #60439 (pufit).
  • Try to detect file format automatically during schema inference if it's unknown in file/s3/hdfs/url/azureBlobStorage engines. Closes #50576. #59092 (Kruglov Pavel).
  • Implement auto-adjustment for asynchronous insert timeouts. The following settings are introduced: async_insert_poll_timeout_ms, async_insert_use_adaptive_busy_timeout, async_insert_busy_timeout_min_ms, async_insert_busy_timeout_max_ms, async_insert_busy_timeout_increase_rate, async_insert_busy_timeout_decrease_rate. #58486 (Julia Kartseva).
  • Allow to set up a quota for maximum sequential login failures. #54737 (Alexey Gerasimchuck).
  • A new aggregate function groupArrayIntersect. Follows up: #49862. #59598 (Yarik Briukhovetskyi).
  • Backup & Restore support for AzureBlobStorage. Resolves #50747. #56988 (SmitaRKulkarni).
  • The user can now specify the template string directly in the query using format_schema_rows_template as an alternative to format_template_row. Closes #31363. #59088 (Shaun Struwig).
  • Implemented automatic conversion of merge tree tables of different kinds to replicated engine. Create empty convert_to_replicated file in table's data directory (/clickhouse/store/xxx/xxxyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/) and that table will be converted automatically on next server start. #57798 (Kirill).
  • Added query ALTER TABLE table FORGET PARTITION partition that removes ZooKeeper nodes, related to an empty partition. #59507 (Sergei Trifonov). This is an expert-level feature.
  • Support JWT credentials file for the NATS table engine. #59543 (Nickolaj Jepsen).
  • Implemented system.dns_cache table, which can be useful for debugging DNS issues. #59856 (Kirill Nikiforov).
  • The codec LZ4HC will accept a new level 2, which is faster than the previous minimum level 3, at the expense of less compression. In previous versions, LZ4HC(2) and less was the same as LZ4HC(3). Author: Cyan4973. #60090 (Alexey Milovidov).
  • Implemented system.dns_cache table, which can be useful for debugging DNS issues. New server setting dns_cache_max_size. #60257 (Kirill Nikiforov).
  • Support single-argument version for the merge table function, as merge(['db_name', ] 'tables_regexp'). #60372 (豪肥肥).
  • Support negative positional arguments. Closes #57736. #58292 (flynn).
  • Support specifying a set of permitted users for specific S3 settings in config using user key. #60144 (Antonio Andelic).
  • Added table function mergeTreeIndex. It represents the contents of index and marks files of MergeTree tables. It can be used for introspection. Syntax: mergeTreeIndex(database, table, [with_marks = true]) where database.table is an existing table with MergeTree engine. #58140 (Anton Popov).

Experimental Feature

  • Added function seriesOutliersDetectTukey to detect outliers in series data using Tukey's fences algorithm. #58632 (Bhavna Jindal). Keep in mind that the behavior will be changed in the next patch release.
  • Add function variantType that returns Enum with variant type name for each row. #59398 (Kruglov Pavel).
  • Support LEFT JOIN, ALL INNER JOIN, and simple subqueries for parallel replicas (only with analyzer). New setting parallel_replicas_prefer_local_join chooses local JOIN execution (by default) vs GLOBAL JOIN. All tables should exist on every replica from cluster_for_parallel_replicas. New settings min_external_table_block_size_rows and min_external_table_block_size_bytes are used to squash small blocks that are sent for temporary tables (only with analyzer). #58916 (Nikolai Kochetov).
  • Allow concurrent table creation in the Replicated database during adding or recovering a new replica. #59277 (Konstantin Bogdanov).
  • Implement comparison operator for Variant values and proper Field inserting into Variant column. Don't allow creating Variant type with similar variant types by default (allow under a setting allow_suspicious_variant_types). Closes #59996. Closes #59850. #60198 (Kruglov Pavel).
  • Disable parallel replicas JOIN with CTE (not analyzer). #59239 (Raúl Marín).
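
For readers unfamiliar with Tukey's fences, the technique behind seriesOutliersDetectTukey, here is a minimal Python sketch of the textbook method. It does not reproduce the ClickHouse function's return format, defaults, or quantile method: the helper tukey_outliers, the multiplier k = 1.5, and the "inclusive" quartile method are choices made for this example.

```python
import statistics


def tukey_outliers(series: list[float], k: float = 1.5) -> list[float]:
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _median, q3 = statistics.quantiles(series, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in series if x < lower or x > upper]


outliers = tukey_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

For this series Q1 is 3 and Q3 is 7, so the fences are -3 and 13 and the only outlier is 100.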

Performance Improvement

  • Primary key will use less memory. #60049 (Alexey Milovidov).
  • Improve memory usage for primary key and some other operations. #60050 (Alexey Milovidov).
  • The tables' primary keys will be loaded in memory lazily on first access. This is controlled by the new MergeTree setting primary_key_lazy_load, which is on by default. This provides several advantages: the primary key will not be loaded for tables that are not used; and if there is not enough memory, an exception will be thrown on first use instead of at server startup. It also has a disadvantage: the latency of loading the primary key will be paid on the first query rather than before accepting connections, which theoretically may introduce a thundering-herd problem. This closes #11188. #60093 (Alexey Milovidov).
  • Vectorized distance functions used in vector search. #58866 (Robert Schulze).
  • Vectorized function dotProduct which is useful for vector search. #60202 (Robert Schulze).
  • Add short-circuit ability for dictGetOrDefault function. Closes #52098. #57767 (jsc0218).
  • Keeper improvement: cache only a certain amount of logs in-memory controlled by latest_logs_cache_size_threshold and commit_logs_cache_size_threshold. #59460 (Antonio Andelic).
  • Keeper improvement: reduce size of data node even more. #59592 (Antonio Andelic).
  • Continue optimizing branch miss of if function when result type is Float*/Decimal*/*Int*, follow up of https://github.com/ClickHouse/ClickHouse/pull/57885. #59148 (李扬).
  • Optimize if function when the input type is Map, the speed-up is up to ~10x. #59413 (李扬).
  • Improve performance of the Int8 type by implementing strict aliasing (we already have it for UInt8 and all other integer types). #59485 (Raúl Marín).
  • Optimize performance of sum/avg conditionally for bigint and big decimal types by reducing branch miss. #59504 (李扬).
  • Improve performance of SELECTs with active mutations. #59531 (Azat Khuzhin).
  • Optimized function isNotNull with AVX2. #59621 (李扬).
  • Improve ASOF JOIN performance for sorted or almost sorted data. #59731 (Maksim Kita).
  • The previous default value of async_insert_max_data_size, 1 MB, appeared to be too small. The new default is 10 MiB. #59536 (Nikita Mikhaylov).
  • Use multiple threads while reading the metadata of tables from a backup while executing the RESTORE command. #60040 (Vitaly Baranov).
  • Now if StorageBuffer has more than 1 shard (num_layers > 1) background flush will happen simultaneously for all shards in multiple threads. #60111 (alesapin).

Improvement

  • When the output format is a Pretty format and a block consists of a single numeric value that exceeds one million, a readable number is printed to the right of the table. #60379 (rogeryk).
  • Added settings split_parts_ranges_into_intersecting_and_non_intersecting_final and split_intersecting_parts_ranges_into_layers_final. These settings are needed to disable optimizations for queries with FINAL and needed for debug only. #59705 (Maksim Kita). Actually not only for that - they can also lower memory usage at the expense of performance.
  • Rename the setting extract_kvp_max_pairs_per_row to extract_key_value_pairs_max_pairs_per_row. The issue (unnecessary abbreviation in the setting name) was introduced in https://github.com/ClickHouse/ClickHouse/pull/43606. Fix the documentation of this setting. #59683 (Alexey Milovidov). #59960 (jsc0218).
  • Running ALTER COLUMN MATERIALIZE on a column with DEFAULT or MATERIALIZED expression now precisely follows the semantics. #58023 (Duc Canh Le).
  • Enabled an exponential backoff logic for errors during mutations. It will reduce the CPU usage, memory usage and log file sizes. #58036 (MikhailBurdukov).
  • Add improvement to count the InitialQuery Profile Event. #58195 (Unalian).
  • Allow to define volume_priority in storage_configuration. #58533 (Andrey Zvonov).
  • Add support for the Date32 type in the T64 codec. #58738 (Hongbin Ma).
  • Allow trailing commas in types with several items. #59119 (Aleksandr Musorin).
  • Settings for the Distributed table engine can now be specified in the server configuration file (similar to MergeTree settings), e.g. <distributed> <flush_on_detach>false</flush_on_detach> </distributed>. #59291 (Azat Khuzhin).
  • Retry disconnects and expired sessions when reading system.zookeeper. This is helpful when reading many rows from system.zookeeper table especially in the presence of fault-injected disconnects. #59388 (Alexander Gololobov).
  • Do not interpret numbers with leading zeroes as octals when input_format_values_interpret_expressions=0. #59403 (Joanna Hulboj).
  • At startup and whenever config files are changed, ClickHouse updates the hard memory limits of its total memory tracker. These limits are computed based on various server settings and cgroups limits (on Linux). Previously, setting /sys/fs/cgroup/memory.max (for cgroups v2) was hard-coded. As a result, cgroup v2 memory limits configured for nested groups (hierarchies), e.g. /sys/fs/cgroup/my/nested/group/memory.max were ignored. This is now fixed. The behavior of v1 memory limits remains unchanged. #59435 (Robert Schulze).
  • New profile events added to observe the time spent on calculating PK/projections/secondary indices during INSERT-s. #59436 (Nikita Taranov).
  • Allow to define a starting point for S3Queue with Ordered mode at the creation using a setting s3queue_last_processed_path. #59446 (Kseniia Sumarokova).
  • Made comments for system tables also available in system.tables in clickhouse-local. #59493 (Nikita Mikhaylov).
  • system.zookeeper table: previously the whole result was accumulated in memory and returned as one big chunk. This change should help to reduce memory consumption when reading many rows from system.zookeeper, allow showing intermediate progress (how many rows have been read so far) and avoid hitting connection timeout when result set is big. #59545 (Alexander Gololobov).
  • Now the dashboard understands both compressed and uncompressed state of URL's #hash (backward compatibility). Continuation of #59124. #59548 (Amos Bird).
  • Bumped Intel QPL (used by codec DEFLATE_QPL) from v1.3.1 to v1.4.0. Also fixed a bug in the polling timeout mechanism: in some cases the timeout did not work properly, and if a timeout happened, IAA and CPU could process the buffer concurrently. For now, we make sure the IAA codec status is not QPL_STS_BEING_PROCESSED before falling back to the SW codec. #59551 (jasperzhu).
  • Do not show a warning about the server version in ClickHouse Cloud because ClickHouse Cloud handles seamless upgrades automatically. #59657 (Alexey Milovidov).
  • After self-extraction, the temporary binary is moved instead of copied. #59661 (Yakov Olkhovskiy).
  • Fix stack unwinding on Apple macOS. This closes #53653. #59690 (Nikita Mikhaylov).
  • Check for stack overflow in parsers even if the user misconfigured the max_parser_depth setting to a very high value. This closes #59622. #59697 (Alexey Milovidov). #60434
  • Unify XML and SQL created named collection behaviour in Kafka storage. #59710 (Pervakov Grigorii).
  • When merge_max_block_size_bytes is small enough and tables contain wide rows (strings or tuples), background merges could get stuck in an endless loop. This behaviour is fixed. Follow-up for https://github.com/ClickHouse/ClickHouse/pull/59340. #59812 (Nikita Mikhaylov).
  • Allow uuid in replica_path if CREATE TABLE explicitly has it. #59908 (Azat Khuzhin).
  • Add column metadata_version of ReplicatedMergeTree table in system.tables system table. #59942 (Maksim Kita).
  • Keeper improvement: send only Keeper related metrics/events for Prometheus. #59945 (Antonio Andelic).
  • The dashboard will display metrics across different ClickHouse versions even if the structure of system tables has changed after the upgrade. #59967 (Alexey Milovidov).
  • Allow loading AZ info from a file. #59976 (Konstantin Bogdanov).
  • Keeper improvement: add retries on failures for Disk related operations. #59980 (Antonio Andelic).
  • Add new config setting backups.remove_backup_files_after_failure: <clickhouse> <backups> <remove_backup_files_after_failure>true</remove_backup_files_after_failure> </backups> </clickhouse>. #60002 (Vitaly Baranov).
  • When copying an S3 file on GCP, fall back to buffer copy if GCP returns an Internal Error with the GATEWAY_TIMEOUT HTTP error code. #60164 (Maksim Kita).
  • Short circuit execution for ULIDStringToDateTime. #60211 (Juan Madurga).
  • Added query_id column for tables system.backups and system.backup_log. Added error stacktrace to error column. #60220 (Maksim Kita).
  • Connections through the MySQL port now automatically run with setting prefer_column_name_to_alias = 1 to support QuickSight out-of-the-box. Also, settings mysql_map_string_to_text_in_show_columns and mysql_map_fixed_string_to_text_in_show_columns are now enabled by default, affecting also only MySQL connections. This increases compatibility with more BI tools. #60365 (Robert Schulze).
  • Fix a race condition in JavaScript code leading to duplicate charts on top of each other. #60392 (Alexey Milovidov).
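
The exponential backoff for mutation errors mentioned in the list above follows a general pattern that can be sketched in a few lines of Python. The function name, the base delay, and the cap are hypothetical values chosen for illustration; the entry does not state the parameters ClickHouse actually uses.

```python
def backoff_delay_ms(consecutive_failures: int,
                     base_ms: int = 100,
                     max_ms: int = 60_000) -> int:
    """Delay before the next retry: doubles with each failure, up to a cap."""
    if consecutive_failures <= 0:
        return 0  # no failures yet, so retry immediately
    # 1st failure -> base, 2nd -> 2 * base, 3rd -> 4 * base, ...
    return min(max_ms, base_ms * 2 ** (consecutive_failures - 1))


delays = [backoff_delay_ms(n) for n in range(5)]
```

With these illustrative defaults, delays is [0, 100, 200, 400, 800]. A repeatedly failing operation is retried less and less often, which is what reduces CPU usage and log volume.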

Build/Testing/Packaging Improvement

  • Added builds and tests with coverage collection with introspection. Continuation of #56102. #58792 (Alexey Milovidov).
  • Update the Rust toolchain in corrosion-cmake when the CMake cross-compilation toolchain variable is set. #59309 (Aris Tritas).
  • Add some fuzzing to ASTLiterals. #59383 (Raúl Marín).
  • If you want to run initdb scripts every time the ClickHouse container starts, you should set the environment variable CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS. #59808 (Alexander Nikolaev).
  • Remove ability to disable generic clickhouse components (like server/client/...), but keep some that require extra libraries (like ODBC or keeper). #59857 (Azat Khuzhin).
  • Query fuzzer will fuzz SETTINGS inside queries. #60087 (Alexey Milovidov).
  • Add support for building ClickHouse with clang-19 (master). #60448 (Alexey Milovidov).

Bug Fix (user-visible misbehavior in an official stable release)

  • Fix a "Non-ready set" error in TTL WHERE. #57430 (Nikolai Kochetov).
  • Fix a bug in the quantilesGK function #58216 (李扬).
  • Fix a wrong behavior with intDiv for Decimal arguments #59243 (Yarik Briukhovetskyi).
  • Fix translate with FixedString input #59356 (Raúl Marín).
  • Fix digest calculation in Keeper #59439 (Antonio Andelic).
  • Fix stacktraces for binaries without debug symbols #59444 (Azat Khuzhin).
  • Fix ASTAlterCommand::formatImpl in case of column specific settings… #59445 (János Benjamin Antal).
  • Fix SELECT * FROM [...] ORDER BY ALL with Analyzer #59462 (zhongyuankai).
  • Fix possible uncaught exception during distributed query cancellation #59487 (Azat Khuzhin).
  • Make MAX use the same rules as permutation for complex types #59498 (Raúl Marín).
  • Fix corner case when passing update_insert_deduplication_token_in_dependent_materialized_views #59544 (Jordi Villar).
  • Fix incorrect result of arrayElement / map on empty value #59594 (Raúl Marín).
  • Fix crash in topK when merging empty states #59603 (Raúl Marín).
  • Fix distributed table with a constant sharding key #59606 (Vitaly Baranov).
  • Fix KQL issue found by WingFuzz #59626 (Yong Wang).
  • Fix error "Read beyond last offset" for AsynchronousBoundedReadBuffer #59630 (Vitaly Baranov).
  • Maintain function alias in RewriteSumFunctionWithSumAndCountVisitor #59658 (Raúl Marín).
  • Fix query start time on non initial queries #59662 (Raúl Marín).
  • Validate types of arguments for minmax skipping index #59733 (Anton Popov).
  • Fix leftPad / rightPad function with FixedString input #59739 (Raúl Marín).
  • Fix AST fuzzer issue in function countMatches #59752 (Robert Schulze).
  • RabbitMQ: fix having neither acked nor nacked messages #59775 (Kseniia Sumarokova).
  • Fix StorageURL doing some of the query execution in single thread #59833 (Michael Kolupaev).
  • S3Queue: fix uninitialized value #59897 (Kseniia Sumarokova).
  • Fix parsing of partition expressions surrounded by parens #59901 (János Benjamin Antal).
  • Fix crash in JSONColumnsWithMetadata format over HTTP #59925 (Kruglov Pavel).
  • Do not rewrite sum to count if the return value differs in Analyzer #59926 (Azat Khuzhin).
  • UniqExactSet read crash fix #59928 (Maksim Kita).
  • ReplicatedMergeTree invalid metadata_version fix #59946 (Maksim Kita).
  • Fix data race in StorageDistributed #59987 (Nikita Taranov).
  • Docker: run init scripts when option is enabled rather than disabled #59991 (jktng).
  • Fix INSERT into SQLite with single quote (by escaping single quotes with a quote instead of backslash) #60015 (Azat Khuzhin).
  • Fix several logical errors in arrayFold #60022 (Raúl Marín).
  • Fix optimize_uniq_to_count removing the column alias #60026 (Raúl Marín).
  • Fix possible exception from S3Queue table on drop #60036 (Kseniia Sumarokova).
  • Fix formatting of NOT with single literals #60042 (Raúl Marín).
  • Use max_query_size from context in DDLLogEntry instead of hardcoded 4096 #60083 (Kruglov Pavel).
  • Fix inconsistent formatting of queries containing tables named table. Fix wrong formatting of queries with UNION ALL, INTERSECT, and EXCEPT when their structure wasn't linear. This closes #52349. Fix wrong formatting of SYSTEM queries, including SYSTEM ... DROP FILESYSTEM CACHE, SYSTEM ... REFRESH/START/STOP/CANCEL/TEST VIEW, SYSTEM ENABLE/DISABLE FAILPOINT. Fix formatting of parameterized DDL queries. Fix the formatting of the DESCRIBE FILESYSTEM CACHE query. Fix incorrect formatting of the SET param_... (a query setting a parameter). Fix incorrect formatting of CREATE INDEX queries. Fix inconsistent formatting of CREATE USER and similar queries. Fix inconsistent formatting of CREATE SETTINGS PROFILE. Fix incorrect formatting of ALTER ... MODIFY REFRESH. Fix inconsistent formatting of window functions if frame offsets were expressions. Fix inconsistent formatting of RESPECT NULLS and IGNORE NULLS if they were used after a function that implements an operator (such as plus). Fix idiotic formatting of SYSTEM SYNC REPLICA ... LIGHTWEIGHT FROM .... Fix inconsistent formatting of invalid queries with GROUP BY GROUPING SETS ... WITH ROLLUP/CUBE/TOTALS. Fix inconsistent formatting of GRANT CURRENT GRANTS. Fix inconsistent formatting of CREATE TABLE (... COLLATE). Additionally, I fixed the incorrect formatting of EXPLAIN in subqueries (#60102). Fixed incorrect formatting of lambda functions (#60012). Added a check so there is no way to miss these abominations in the future. #60095 (Alexey Milovidov).
  • Fix inconsistent formatting of explain in subqueries #60102 (Alexey Milovidov).
  • Fix cosineDistance crash with Nullable #60150 (Raúl Marín).
  • Allow casting of bools in string representation to true bools #60160 (Robert Schulze).
  • Fix system.s3queue_log #60166 (Kseniia Sumarokova).
  • Fix arrayReduce with nullable aggregate function name #60188 (Raúl Marín).
  • Hide sensitive info for S3Queue #60233 (Kseniia Sumarokova).
  • Fix http exception codes. #60252 (Austin Kothig).
  • S3Queue: fix a bug (also fixes flaky test_storage_s3_queue/test.py::test_shards_distributed) #60282 (Kseniia Sumarokova).
  • Fix use-of-uninitialized-value and invalid result in hashing functions with IPv6 #60359 (Kruglov Pavel).
  • Fix OptimizeDateOrDateTimeConverterWithPreimageVisitor with null arguments #60453 (Raúl Marín).
  • Fixed a minor bug that prevented distributed table queries sent from either KQL or PRQL dialect clients from being executed on replicas. #59674. #60470 (Alexey Milovidov) #59674 (Austin Kothig).

ClickHouse release 24.1, 2024-01-30

Backward Incompatible Change

  • The setting print_pretty_type_names is turned on by default. You can turn it off to keep the old behavior or SET compatibility = '23.12'. #57726 (Alexey Milovidov).
  • The MergeTree setting clean_deleted_rows is deprecated, it has no effect anymore. The CLEANUP keyword for OPTIMIZE is not allowed by default (unless allow_experimental_replacing_merge_with_cleanup is enabled). #58316 (Alexander Tokmakov).
  • The function reverseDNSQuery is no longer available. This closes #58368. #58369 (Alexey Milovidov).
  • Enable various changes to improve the access control in the configuration file. These changes affect the behavior, and you can check them in the access_control_improvements section of config.xml. If you are not confident, keep the values in the configuration file as they were in the previous version. #58584 (Alexey Milovidov).
  • Improve the operation of sumMapFiltered with NaN values. NaN values are now placed at the end (instead of randomly) and considered different from any values. -0 is now also treated as equal to 0; since 0 values are discarded, -0 values are discarded too. #58959 (Raúl Marín).
  • The function visibleWidth will behave according to the docs. In previous versions, it simply counted code points after string serialization, like the lengthUTF8 function, but didn't consider zero-width and combining characters, full-width characters, tabs, and deletes. Now the behavior is changed accordingly. If you want to keep the old behavior, set function_visible_width_behavior to 0, or set compatibility to 23.12 or lower. #59022 (Alexey Milovidov).
  • Kusto dialect is disabled until these two bugs are fixed: #59037 and #59036. Any attempt to use Kusto will result in an exception. #59305 (Alexey Milovidov).
  • More efficient implementation of the FINAL modifier no longer guarantees preserving the order even if max_threads = 1. If you counted on the previous behavior, set enable_vertical_final to 0 or compatibility to 23.12.

New Feature

  • Implement Variant data type that represents a union of other data types. Type Variant(T1, T2, ..., TN) means that each row of this type has a value of either type T1 or T2 or ... or TN or none of them (NULL value). Variant type is available under a setting allow_experimental_variant_type. Reference: #54864. #58047 (Kruglov Pavel).
  • Certain settings (currently min_compress_block_size and max_compress_block_size) can now be specified at column-level where they take precedence over the corresponding table-level setting. Example: CREATE TABLE tab (col String SETTINGS (min_compress_block_size = 81920, max_compress_block_size = 163840)) ENGINE = MergeTree ORDER BY tuple();. #55201 (Duc Canh Le).
  • Add quantileDD aggregate function as well as the corresponding quantilesDD and medianDD. It is based on DDSketch (https://www.vldb.org/pvldb/vol12/p2195-masson.pdf). #56342 (Srikanth Chekuri).
  • Allow to configure any kind of object storage with any kind of metadata type. #58357 (Kseniia Sumarokova).
  • Added null_status_on_timeout_only_active and throw_only_active modes for distributed_ddl_output_mode that allow to avoid waiting for inactive replicas. #58350 (Alexander Tokmakov).
  • Add function arrayShingles to compute subarrays, e.g. arrayShingles([1, 2, 3, 4, 5], 3) returns [[1,2,3],[2,3,4],[3,4,5]]. #58396 (Zheng Miao).
  • Added functions punycodeEncode, punycodeDecode, idnaEncode and idnaDecode which are useful for translating international domain names to an ASCII representation according to the IDNA standard. #58454 (Robert Schulze).
  • Added string similarity functions damerauLevenshteinDistance, jaroSimilarity and jaroWinklerSimilarity. #58531 (Robert Schulze).
  • Add two settings output_format_compression_level to change output compression level and output_format_compression_zstd_window_log to explicitly set compression window size and enable long-range mode for zstd compression if output compression method is zstd. Applied for INTO OUTFILE and when writing to table functions file, url, hdfs, s3, and azureBlobStorage. #58539 (Duc Canh Le).
  • Automatically disable ANSI escape sequences in Pretty formats if the output is not a terminal. Add new auto mode to setting output_format_pretty_color. #58614 (Shaun Struwig).
  • Added function sqidDecode which decodes Sqids. #58544 (Robert Schulze).
  • Allow to read Bool values into String in JSON input formats. It's done under a setting input_format_json_read_bools_as_strings that is enabled by default. #58561 (Kruglov Pavel).
  • Added function seriesDecomposeSTL which decomposes a time series into a season, a trend and a residual component. #57078 (Bhavna Jindal).
  • Introduced MySQL Binlog Client for MaterializedMySQL: One binlog connection for many databases. #57323 (Val Doroshchuk).
  • Intel QuickAssist Technology (QAT) provides hardware-accelerated compression and cryptography. ClickHouse got a new compression codec ZSTD_QAT which utilizes QAT for zstd compression. The codec uses Intel's QATlib and Intel's QAT ZSTD Plugin. Right now, only compression can be accelerated in hardware (a software fallback kicks in if QAT could not be initialized); decompression always runs in software. #57509 (jasperzhu).
  • Implement a new way of generating object storage keys for s3 disks. The format can now be defined in re2 regex syntax with the key_template option in the disk description. #57663 (Sema Checherinda).
  • Table system.dropped_tables_parts contains parts of system.dropped_tables tables (dropped but not yet removed tables). #58038 (Yakov Olkhovskiy).
  • Add setting max_materialized_views_size_for_table to limit the number of materialized views attached to a table. #58068 (zhongyuankai).
  • clickhouse-format improvements: support INSERT queries with VALUES; support comments (use --comments to output them); support --max_line_length option to format only long queries in multiline. #58246 (vdimir).
  • Attach all system tables in clickhouse-local, including system.parts. This closes #58312. #58359 (Alexey Milovidov).
  • Support for Enum data types in function transform. This closes #58241. #58360 (Alexey Milovidov).
  • Add table system.database_engines. #58390 (Bharat Nallan). Allow registering database engines independently in the codebase. #58365 (Bharat Nallan). Allow registering interpreters independently. #58443 (Bharat Nallan).
  • Added FROM <Replicas> modifier for SYSTEM SYNC REPLICA LIGHTWEIGHT query. The FROM modifier ensures we wait for fetches and drop-ranges only for the specified source replicas, as well as any replica not in zookeeper or with an empty source_replica. #58393 (Jayme Bird).
  • Added setting update_insert_deduplication_token_in_dependent_materialized_views. This setting allows to update insert deduplication token with table identifier during insert in dependent materialized views. Closes #59165. #59238 (Maksim Kita).
  • Added statement SYSTEM RELOAD ASYNCHRONOUS METRICS which updates the asynchronous metrics. Mostly useful for testing and development. #53710 (Robert Schulze).
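
The arrayShingles entry above can be illustrated with a minimal Python sketch of the sliding-window idea. This is only a model of the semantics described in the changelog, not ClickHouse's implementation; the helper name array_shingles is hypothetical, and its handling of edge cases (for example, a length larger than the array) is a choice made for this sketch and has not been checked against ClickHouse.

```python
def array_shingles(values, length):
    """Return every contiguous sub-list of `values` that has exactly `length` items.

    Hypothetical helper modelling the arrayShingles description in the changelog.
    """
    if length < 1:
        raise ValueError("length must be positive")
    # One window starts at each index that still leaves `length` items to take.
    # If `length` exceeds len(values), the range is empty and so is the result
    # (a choice of this sketch, not verified against ClickHouse).
    return [values[i:i + length] for i in range(len(values) - length + 1)]


# The example from the changelog entry.
print(array_shingles([1, 2, 3, 4, 5], 3))  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```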

Performance Improvement

  • Coordination for parallel replicas is rewritten for better parallelism and cache locality. It has been tested for linear scalability on hundreds of replicas. It also got support for reading in order. #57968 (Nikita Taranov).
  • Replace HTTP outgoing buffering with the native ClickHouse buffers. Add bytes counting metrics for interfaces. #56064 (Yakov Olkhovskiy).
  • Large aggregation states of uniqExact will be merged in parallel in distributed queries. #59009 (Nikita Taranov).
  • Lower memory usage after reading from MergeTree tables. #59290 (Anton Popov).
  • Lower memory usage in vertical merges. #59340 (Anton Popov).
  • Avoid huge memory consumption during Keeper startup for more cases. #58455 (Antonio Andelic).
  • Keeper improvement: reduce Keeper's memory usage for stored nodes. #59002 (Antonio Andelic).
  • More cache-friendly final implementation. Note on the behaviour change: previously queries with FINAL modifier that read with a single stream (e.g. max_threads = 1) produced sorted output without explicitly provided ORDER BY clause. This is no longer guaranteed when enable_vertical_final = true (and it is so by default). #54366 (Duc Canh Le).
  • Bypass extra copying in ReadBufferFromIStream which is used, e.g., for reading from S3. #56961 (Nikita Taranov).
  • Optimize array element function when input is Array(Map)/Array(Array(Num))/Array(Array(String))/Array(BigInt)/Array(Decimal). The previous implementations did more allocations than needed. The speedup is up to ~6x, especially when the input type is Array(Map). #56403 (李扬).
  • Read column once while reading more than one subcolumn from it in compact parts. #57631 (Kruglov Pavel).
  • Rewrite the AST of sum(column + constant) function. This is available as an optimization pass for Analyzer #57853 (Jiebin Sun).
  • The evaluation of function match now utilizes skipping indices ngrambf_v1 and tokenbf_v1. #57882 (凌涛).
  • The evaluation of function match now utilizes inverted indices. #58284 (凌涛).
  • MergeTree FINAL does not compare rows from same non-L0 part. #58142 (Duc Canh Le).
  • Speed up iota calls (filling array with consecutive numbers). #58271 (Raúl Marín).
  • Speedup MIN/MAX for non-numeric types. #58334 (Raúl Marín).
  • Optimize the combination of filters (like in multi-stage PREWHERE) with BMI2/SSE intrinsics #58800 (Zhiguo Zhou).
  • Use one thread less in clickhouse-local. #58968 (Alexey Milovidov).
  • Improve the multiIf function performance when the type is Nullable. #57745 (KevinyhZou).
  • Add SYSTEM JEMALLOC PURGE for purging unused jemalloc pages, SYSTEM JEMALLOC [ ENABLE | DISABLE | FLUSH ] PROFILE for controlling jemalloc profile if the profiler is enabled. Add jemalloc-related 4LW command in Keeper: jmst for dumping jemalloc stats, jmfp, jmep, jmdp for controlling jemalloc profile if the profiler is enabled. #58665 (Antonio Andelic).
  • Lower memory consumption in backups to S3. #58962 (Vitaly Baranov).

Improvement

  • Added comments (brief descriptions) to all columns of system tables. There are several reasons for this: - We use system tables a lot, and sometimes it can be very difficult for a developer to understand the purpose and meaning of a particular column. - We change system tables a lot (adding new ones or modifying existing ones), and the documentation for them is always outdated. For example, the documentation page for system.parts misses a lot of columns. - We would like to eventually generate documentation directly from ClickHouse. #58356 (Nikita Mikhaylov).
  • Allow queries without aliases for subqueries for PASTE JOIN. #58654 (Yarik Briukhovetskyi).
  • Enable MySQL/MariaDB integration on macOS. This closes #21191. #46316 (Alexey Milovidov) (Robert Schulze).
  • Disable max_rows_in_set_to_optimize_join by default. #56396 (vdimir).
  • Add <host_name> config parameter that allows avoiding resolving hostnames in ON CLUSTER DDL queries and Replicated database engines. This mitigates the possibility of the queue being stuck in case of a change in cluster definition. Closes #57573. #57603 (Nikolay Degterinsky).
  • Increase load_metadata_threads to 16 for the filesystem cache. It will make the server start up faster. #57732 (Alexey Milovidov).
  • Add ability to throttle merges/mutations (max_mutations_bandwidth_for_server/max_merges_bandwidth_for_server). #57877 (Azat Khuzhin).
  • Replaced undocumented (boolean) column is_hot_reloadable in system table system.server_settings by (Enum8) column changeable_without_restart with possible values No, Yes, IncreaseOnly and DecreaseOnly. Also documented the column. #58029 (skyoct).
  • Cluster discovery supports setting username and password, close #58063. #58123 (vdimir).
  • Support query parameters in ALTER TABLE ... PART. #58297 (Azat Khuzhin).
  • Create consumers for Kafka tables on the fly (but keep them for some period since last use, kafka_consumers_pool_ttl_ms). This should fix the problem with statistics for system.kafka_consumers (which are not consumed when nobody reads from the Kafka table, leading to a live memory leak and slow table detach). This PR also enables stats for system.kafka_consumers by default again. #58310 (Azat Khuzhin).
  • sparkBar as an alias to sparkbar. #58335 (凌涛).
  • Avoid sending ComposeObject requests after upload to GCS. #58343 (Azat Khuzhin).
  • Correctly handle keys with dot in the name in configurations XMLs. #58354 (Azat Khuzhin).
  • Make function format return constant on constant arguments. This closes #58355. #58358 (Alexey Milovidov).
  • Adding a setting max_estimated_execution_time to separate max_execution_time and max_estimated_execution_time. #58402 (Zhang Yifan).
  • Provide a hint when an invalid database engine name is used. #58444 (Bharat Nallan).
  • Add settings for better control of indexes type in Arrow dictionary. Use signed integer type for indexes by default as Arrow recommends. Closes #57401. #58519 (Kruglov Pavel).
  • Support the CLICKHOUSE_PASSWORD_FILE environment variable when running the docker image. Implements #58575. #58583 (Eyal Halpern Shalev).
  • When executing some queries that require a lot of streams for reading data, the error "Paste JOIN requires sorted tables only" was previously thrown. Now the number of streams is resized to 1 in that case. #58608 (Yarik Briukhovetskyi).
  • Better message for INVALID_IDENTIFIER error. #58703 (Yakov Olkhovskiy).
  • Improved handling of signed numeric literals in normalizeQuery. #58710 (Salvatore Mesoraca).
  • Support Point data type for MySQL. #58721 (Kseniia Sumarokova).
  • When comparing a Float32 column and a const string, read the string as Float32 (instead of Float64). #58724 (Raúl Marín).
  • Improve S3 compatibility, add ECloud EOS storage support. #58786 (xleoken).
  • Allow KILL QUERY to cancel backups / restores. This PR also makes running backups and restores visible in system.processes. Also, there is a new setting in the server configuration now - shutdown_wait_backups_and_restores (default=true) which makes the server either wait on shutdown for all running backups and restores to finish or just cancel them. #58804 (Vitaly Baranov).
  • Avro format to support ZSTD codec. Closes #58735. #58805 (flynn).
  • MySQL interface gained support for net_write_timeout and net_read_timeout settings. net_write_timeout is translated into the native send_timeout ClickHouse setting and, similarly, net_read_timeout into receive_timeout. Fixed an issue where it was possible to set MySQL sql_select_limit setting only if the entire statement was in upper case. #58835 (Serge Klochkov).
  • A better exception message when creating a dictionary and a table with the same name conflicts. #58841 (Yarik Briukhovetskyi).
  • Make sure that for custom (created from SQL) disks either filesystem_caches_path (a common directory prefix for all filesystem caches) or custom_cached_disks_base_directory (a common directory prefix for only filesystem caches created from custom disks) is specified in the server config. custom_cached_disks_base_directory has higher priority for custom disks over filesystem_caches_path, which is used if the former is absent. The filesystem cache setting path must lie inside that directory; otherwise an exception is thrown, preventing the disk from being created. This does not affect disks created on an older version before the server was upgraded: in that case the exception is not thrown, so that the server can start successfully. custom_cached_disks_base_directory is added to the default server config as /var/lib/clickhouse/caches/. Closes #57825. #58869 (Kseniia Sumarokova).
  • MySQL interface gained compatibility with SHOW WARNINGS/SHOW COUNT(*) WARNINGS queries, though the returned result is always an empty set. #58929 (Serge Klochkov).
  • Skip unavailable replicas when executing parallel distributed INSERT SELECT. #58931 (Alexander Tokmakov).
  • Display word-descriptive log level while enabling structured log formatting in json. #58936 (Tim Liou).
  • MySQL interface gained support for CAST(x AS SIGNED) and CAST(x AS UNSIGNED) statements via data type aliases: SIGNED for Int64, and UNSIGNED for UInt64. This improves compatibility with BI tools such as Looker Studio. #58954 (Serge Klochkov).
  • Change working directory to the data path in docker container. #58975 (cangyin).
  • Added setting azure_max_unexpected_write_error_retries for Azure Blob Storage; it can also be set from the config under the azure section. #59001 (SmitaRKulkarni).
  • Allow server to start with broken data lake table. Closes #58625. #59080 (Kseniia Sumarokova).
  • Allow to ignore schema evolution in the Iceberg table engine and read all data using schema specified by the user on table creation or latest schema parsed from metadata on table creation. This is done under a setting iceberg_engine_ignore_schema_evolution that is disabled by default. Note that enabling this setting can lead to incorrect result as in case of evolved schema all data files will be read using the same schema. #59133 (Kruglov Pavel).
  • Prohibit mutable operations (INSERT/ALTER/OPTIMIZE/...) on read-only/write-once storages with a proper TABLE_IS_READ_ONLY error (to avoid leftovers). Avoid leaving leftovers on write-once disks (format_version.txt) on CREATE/ATTACH. Ignore DROP for ReplicatedMergeTree (same as for MergeTree). Fix iterating over s3_plain (MetadataStorageFromPlainObjectStorage::iterateDirectory). Note that the read-only storage is the web disk, and the write-once storage is s3_plain. #59170 (Azat Khuzhin).
  • Fix bug in the experimental _block_number column which could lead to logical error during complex combination of ALTERs and merges. Fixes #56202. Replaces #58601. #59295 (alesapin).
  • Play UI understands when an exception is returned inside JSON. Adjustment for #52853. #59303 (Alexey Milovidov).
  • /binary HTTP handler allows to specify user, host, and optionally, password in the query string. #59311 (Alexey Milovidov).
  • Support backups for compressed in-memory tables. This closes #57893. #59315 (Alexey Milovidov).
  • Support the FORMAT clause in BACKUP and RESTORE queries. #59338 (Vitaly Baranov).
  • Function concatWithSeparator now supports arbitrary argument types (instead of only String and FixedString arguments). For example, SELECT concatWithSeparator('.', 'number', 1) now returns number.1. #59341 (Robert Schulze).
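
The concatWithSeparator change above can be modelled with a short Python sketch: non-string arguments are converted to text before joining. The helper name concat_with_separator is hypothetical, and Python's str() is only a stand-in for ClickHouse's own value serialization, which may render other types (floats, dates, arrays) differently.

```python
def concat_with_separator(separator, *args):
    """Join the text form of every argument with `separator`.

    Hypothetical helper; str() stands in for ClickHouse's value serialization.
    """
    return separator.join(str(arg) for arg in args)


# The example from the changelog entry: a string and an integer argument.
print(concat_with_separator('.', 'number', 1))  # number.1
```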

Build/Testing/Packaging Improvement

  • Improve aliases for clickhouse binary (now ch/clickhouse is clickhouse-local or clickhouse depending on the arguments) and add bash completion for new aliases. #58344 (Azat Khuzhin).
  • Add settings changes check to CI to check that all settings changes are reflected in settings changes history. #58555 (Kruglov Pavel).
  • Use tables directly attached from S3 in stateful tests. #58791 (Alexey Milovidov).
  • Save the whole fuzzer.log as an archive instead of the last 100k lines. tail -n 100000 often removes lines with table definitions. #58821 (Dmitry Novik).
  • Enable Rust on macOS with Aarch64 (this will add fuzzy search in client with skim and the PRQL language, though I don't think that there are people who host ClickHouse on darwin, so it is mostly for fuzzy search in client I would say). #59272 (Azat Khuzhin).
  • Fix aggregation issue in mixed x86_64 and ARM clusters #59132 (Harry Lee).

Bug Fix (user-visible misbehavior in an official stable release)

Changelog for 2023