ClickHouse Quick Start
This page helps you set up open-source ClickHouse on your own machine. The fastest way to deploy ClickHouse and get access to our exclusive SQL Console is to use ClickHouse Cloud; new users get $300 in free trial credits when they sign up.
1: Download the binary
ClickHouse runs natively on Linux, FreeBSD, and macOS, and runs on Windows via WSL (Windows Subsystem for Linux).
The simplest way to download ClickHouse locally is to run the following curl command. It determines whether your operating system is supported, then downloads an appropriate ClickHouse binary:
curl https://clickhouse.com/ | sh
2: Start the server
Run the following command to start the ClickHouse server:
./clickhouse server
3: Start the client
Use the clickhouse-client to connect to your ClickHouse service. Open a new terminal, change directories to where your clickhouse binary is saved, and run the following command:
./clickhouse client
You should see a smiling face as it connects to your service running on localhost:
my-host :)
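To verify the connection, you can run any trivial query; for example, version() simply returns the server version:
SELECT version()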
4: Create a table
Use CREATE TABLE to define a new table. Typical SQL DDL commands work in ClickHouse with one addition: tables in ClickHouse require an ENGINE clause. Use MergeTree to take advantage of the performance benefits of ClickHouse:
CREATE TABLE my_first_table
(
    user_id UInt32,
    message String,
    timestamp DateTime,
    metric Float32
)
ENGINE = MergeTree
PRIMARY KEY (user_id, timestamp)
5: Insert data
You can use the familiar INSERT INTO TABLE command with ClickHouse, but it is important to understand that each insert into a MergeTree table creates a part (a folder) in storage. To minimize parts, bulk insert lots of rows at a time (tens of thousands, or even millions, at once).
INSERT INTO my_first_table (user_id, message, timestamp, metric) VALUES
(101, 'Hello, ClickHouse!', now(), -1.0 ),
(102, 'Insert a lot of rows per batch', yesterday(), 1.41421 ),
(102, 'Sort your data based on your commonly-used queries', today(), 2.718 ),
(101, 'Granules are the smallest chunks of data read', now() + 5, 3.14159 )
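You can see the parts an insert creates by querying the system.parts system table. This check is illustrative and not required for the tutorial:
SELECT name, rows
FROM system.parts
WHERE table = 'my_first_table' AND active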
6: Query your new table
You can write a SELECT query just like you would with any SQL database:
SELECT *
FROM my_first_table
ORDER BY timestamp
Notice the response comes back in a nice table format:
┌─user_id─┬─message────────────────────────────────────────────┬───────────timestamp─┬──metric─┐
│     102 │ Insert a lot of rows per batch                     │ 2022-03-21 00:00:00 │ 1.41421 │
│     102 │ Sort your data based on your commonly-used queries │ 2022-03-22 00:00:00 │   2.718 │
│     101 │ Hello, ClickHouse!                                 │ 2022-03-22 14:04:09 │      -1 │
│     101 │ Granules are the smallest chunks of data read      │ 2022-03-22 14:04:14 │ 3.14159 │
└─────────┴────────────────────────────────────────────────────┴─────────────────────┴─────────┘
4 rows in set. Elapsed: 0.008 sec.
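The interactive client defaults to this pretty table output; append a FORMAT clause to choose another output format. For example, JSONEachRow emits one JSON object per row:
SELECT *
FROM my_first_table
ORDER BY timestamp
FORMAT JSONEachRow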
7: Insert your own data
The next step is to get your existing data into ClickHouse. We have lots of table functions and integrations for ingesting data. There are some examples in the sections below, or check out our Integrations page for a long list of technologies that integrate with ClickHouse.
- S3
- GCS
- Web
- Local
- PostgreSQL
- MySQL
- ODBC/JDBC
- Message Queues
- Data Lakes
- Other
Use the s3 table function to read files from S3. It's a table function, meaning that the result is a table that can be:
- used as the source of a SELECT query (allowing you to run ad-hoc queries and leave your data in S3), or
- inserted into a MergeTree table (when you are ready to move your data into ClickHouse)
An ad-hoc query looks like:
SELECT
passenger_count,
avg(toFloat32(total_amount))
FROM s3(
'https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_0.gz',
'TabSeparatedWithNames'
)
GROUP BY passenger_count
ORDER BY passenger_count;
Moving the data into a ClickHouse table looks like the following, where nyc_taxi is a MergeTree table:
INSERT INTO nyc_taxi
SELECT * FROM s3(
'https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_0.gz',
'TabSeparatedWithNames'
)
SETTINGS input_format_allow_errors_num=25000;
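If you have not created nyc_taxi yet, one way to do so is to let ClickHouse infer the schema from the file itself. This is a minimal sketch: ORDER BY tuple() is a placeholder sorting key, and a real table should sort by the columns you query most often:
-- LIMIT 0 creates the table with the inferred schema without inserting any rows
CREATE TABLE nyc_taxi
ENGINE = MergeTree
ORDER BY tuple() AS
SELECT * FROM s3(
    'https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_0.gz',
    'TabSeparatedWithNames'
)
LIMIT 0;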
View our collection of AWS S3 documentation pages for lots more details and examples of using S3 with ClickHouse.
The s3 table function used for reading data in AWS S3 also works on files in Google Cloud Storage. For example:
SELECT *
FROM s3(
'https://storage.googleapis.com/my-bucket/trips.parquet',
'MY_GCS_HMAC_KEY',
'MY_GCS_HMAC_SECRET_KEY',
'Parquet'
)
LIMIT 1000
Find more details on the s3 table function page.
The url table function reads files accessible from the web:
-- By default, ClickHouse prevents redirects, to protect against SSRF attacks.
-- The URL below requires a redirect, so we must set max_http_get_redirects > 0.
SET max_http_get_redirects=10;
SELECT *
FROM url(
'http://prod2.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv',
'CSV'
);
Find more details on the url table function page.
Use the file table function to read a local file. For simplicity, copy the file to the user_files directory (which is found in the directory where you downloaded the ClickHouse binary):
DESCRIBE TABLE file('comments.tsv')
Query id: 8ca9b2f9-65a2-4982-954a-890de710a336
┌─name──────┬─type────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ id        │ Nullable(Int64)         │              │                    │         │                  │                │
│ type      │ Nullable(String)        │              │                    │         │                  │                │
│ author    │ Nullable(String)        │              │                    │         │                  │                │
│ timestamp │ Nullable(DateTime64(9)) │              │                    │         │                  │                │
│ comment   │ Nullable(String)        │              │                    │         │                  │                │
│ children  │ Array(Nullable(Int64))  │              │                    │         │                  │                │
└───────────┴─────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
Notice that ClickHouse infers the names and data types of your columns by analyzing a large batch of rows. If ClickHouse cannot determine the file format from the filename, you can specify it as the second argument:
SELECT count()
FROM file(
'comments.tsv',
'TabSeparatedWithNames'
)
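When you are ready to move the data into ClickHouse, the same table function works as the source of an INSERT. The comments table below is hypothetical; you would first create a MergeTree table with a matching schema:
-- 'comments' is a hypothetical MergeTree table whose schema matches the file
INSERT INTO comments
SELECT * FROM file(
    'comments.tsv',
    'TabSeparatedWithNames'
)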
View the file table function docs page for more details.
Use the postgresql table function to read data from a table in PostgreSQL:
SELECT *
FROM postgresql(
    'localhost:5432',
    'my_database',
    'my_table',
    'postgresql_user',
    'password'
);
View the postgresql table function docs page for more details.
Use the mysql table function to read data from a table in MySQL:
SELECT *
FROM mysql(
    'localhost:3306',
    'my_database',
    'my_table',
    'mysql_user',
    'password'
);
View the mysql
table function docs page for more details.
ClickHouse can read data from any ODBC or JDBC data source:
SELECT *
FROM odbc(
    'DSN=mysqlconn',
    'my_database',
    'my_table'
);
View the odbc table function and the jdbc table function docs pages for more details.
Message queues can stream data into ClickHouse using the corresponding table engine (a minimal Kafka sketch follows this list), including:
- Kafka: integrate with Kafka using the Kafka table engine
- Amazon MSK: integrate with Amazon Managed Streaming for Apache Kafka (MSK)
- RabbitMQ: integrate with RabbitMQ using the RabbitMQ table engine
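As a sketch of what the Kafka table engine looks like, the following assumes a local broker at localhost:9092 and a JSONEachRow topic named events; the broker address, topic, and columns are illustrative, not part of this tutorial:
-- Consumes messages from the 'events' topic; columns must match the message schema
CREATE TABLE events_queue
(
    user_id UInt32,
    message String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'localhost:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse_consumer',
         kafka_format = 'JSONEachRow';
In practice, a Kafka engine table is usually paired with a materialized view that moves each consumed batch into a MergeTree table.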
ClickHouse has table functions to read data from the following sources (an iceberg sketch follows this list):
- Hadoop: integrate with Apache Hadoop using the hdfs table function
- Hudi: read from existing Apache Hudi tables in S3 using the hudi table function
- Iceberg: read from existing Apache Iceberg tables in S3 using the iceberg table function
- DeltaLake: read from existing Delta Lake tables in S3 using the deltaLake table function
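For example, reading an Apache Iceberg table in S3 looks like the following sketch; the bucket path and credentials are placeholders:
SELECT *
FROM iceberg(
    'https://my-bucket.s3.amazonaws.com/iceberg_table',
    'MY_ACCESS_KEY',
    'MY_SECRET_KEY'
)
LIMIT 10;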
Check out our long list of ClickHouse integrations to find how to connect your existing frameworks and data sources to ClickHouse.
What's Next?
- Check out the Advanced Tutorial, which takes a much deeper dive into the key concepts and capabilities of ClickHouse
- Continue your learning by taking our free on-demand training courses at the ClickHouse Academy
- We have a list of example datasets with instructions on how to insert them
- If your data is coming from an external source, view our collection of integration guides for connecting to message queues, databases, pipelines and more
- If you are using a UI/BI visualization tool, view the user guides for connecting a UI to ClickHouse
- The user guide on primary keys is everything you need to know about primary keys and how to define them