Introduction

Home page — Fast Open-Source OLAP DBMS - ClickHouse

clickhouse-local is an easy-to-use version of ClickHouse for developers who need fast SQL processing of local and remote files without installing a full database server. SQL commands (in the ClickHouse SQL dialect) run directly from the command line, so ClickHouse features are available without a server deployment. clickhouse-local is already included when installing clickhouse-client, so there is no separate installation step.

curl https://clickhouse.com/ | sh will download a single clickhouse binary, which can be used as clickhouse-local:

# read the whole file as tab-separated values
./clickhouse local -q "SELECT * FROM file('reviews.tsv', 'TabSeparated')"
# to see the inferred schema
./clickhouse local -q "DESCRIBE file('reviews.tsv')"
# title of a product with the highest star rating, plus that rating
./clickhouse local -q "SELECT
    argMax(product_title, star_rating),
    max(star_rating)
FROM file('reviews.tsv')"

clickhouse-local can also query data in a Parquet file on S3 storage:

./clickhouse local -q "
SELECT count()
FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/house_parquet/house_0.parquet')"
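The command-line calls above can also be driven from a script. The following is a minimal sketch, assuming the binary downloaded by the curl command sits at ./clickhouse; the helper names build_local_command and run_local_query are hypothetical and not part of ClickHouse. It only assembles the same `local --query` invocation shown above and adds the `--format` option to pick the output format.

```python
import subprocess


def build_local_command(query, binary="./clickhouse", output_format="TSV"):
    """Return the argv list for a one-shot clickhouse-local query.

    The binary path is an assumption: it matches the file that the
    curl installer places in the current directory.
    """
    return [binary, "local", "--format", output_format, "--query", query]


def run_local_query(query, binary="./clickhouse", output_format="TSV"):
    """Run the query and return stdout as text.

    Raises subprocess.CalledProcessError if clickhouse-local exits
    with a non-zero status (for example on a SQL syntax error).
    """
    completed = subprocess.run(
        build_local_command(query, binary, output_format),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

Calling run_local_query("SELECT count() FROM file('reviews.tsv')") would return the count as a string, provided the binary and the file exist. Passing the query as a single argv element avoids shell quoting problems with the nested quotes in SQL.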

chDB

Embedded SQL OLAP engine powered by ClickHouse

The birth of chDB - auxten — “rocket engine on a bicycle”. github.

  • In-process SQL OLAP Engine, powered by ClickHouse
  • No need to install ClickHouse
  • Minimized data copy from C++ to Python with Python memoryview
  • Input and output support for Parquet, CSV, JSON, Arrow, ORC and 60+ more formats (samples)
  • Supports Python DB API 2.0 (example)
  • Install with: pip install chdb

import chdb

query = "SELECT DISTINCT city FROM 'employees.csv'"
res = chdb.query(query, "CSV")
print(res)
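The DB API 2.0 support listed above could be used roughly as follows. This is a sketch, not the project's official example: it assumes chdb exposes a chdb.dbapi module with a connect() function that returns a standard DB API 2.0 connection, and the helper names fetch_all and chdb_fetch_all are hypothetical. The first helper relies only on the DB API 2.0 cursor interface, so it works with any compliant driver.

```python
def fetch_all(connection, sql):
    """Run sql on any DB API 2.0 connection and return all rows."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


def chdb_fetch_all(sql):
    """Run sql through chDB's DB API driver (assumed: chdb.dbapi)."""
    # Imported lazily so this module loads even when chdb is not installed.
    import chdb.dbapi as dbapi

    connection = dbapi.connect()
    try:
        return fetch_all(connection, sql)
    finally:
        connection.close()
```

With chdb installed, chdb_fetch_all("SELECT version()") would return the rows of the result. Keeping the cursor handling in a driver-agnostic helper means the same code can be exercised against another DB API 2.0 driver, such as the standard library's sqlite3.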

See Trying chDB, an embeddable ClickHouse engine for more code examples.

TODO: How does chDB compare to DuckDB, and where would one shine over the other?

Comparisons

ClickHouse or StarRocks? Here is a DETAILED Comparison | by Tianyi Wang | Dec, 2021 | Medium (archive: https://archive.is/llUmZ).

Articles

Papers

ClickHouse - Lightning Fast Analytics for Everyone (PDF) (2024). YouTube talk.

This paper presents an overview of ClickHouse, a popular open-source OLAP database designed for high-performance analytics over petabyte-scale data sets with high ingestion rates. Its storage layer combines a data format based on traditional log-structured merge (LSM) trees with novel techniques for continuous transformation (e.g. aggregation, archiving) of historical data in the background. Queries are written in a convenient SQL dialect and processed by a state-of-the-art vectorized query execution engine with optional code compilation. ClickHouse makes aggressive use of pruning techniques to avoid evaluating irrelevant data in queries. Other data management systems can be integrated at the table function, table engine, or database engine level. Real-world benchmarks demonstrate that ClickHouse is amongst the fastest analytical databases on the market.

Internals

ByConity is a fork of ClickHouse by ByteDance (TikTok's parent company). It is a distributed, cloud-native SQL data warehouse engine that excels at interactive and ad hoc queries, with support for multi-table queries, cluster expansion without noticeable disruption, and unified aggregation of offline batch data and real-time data streams.