Creating Tables in ClickHouse
Like most databases, ClickHouse logically groups tables into databases. Use the CREATE DATABASE
command to create a new database in ClickHouse:
CREATE DATABASE IF NOT EXISTS helloworld
Similarly, use CREATE TABLE
to define a new table. (If you do not specify the database name, the table will be in the
default
database.) The following table named is my_first_table
in the helloworld
database:
CREATE TABLE helloworld.my_first_table
(
user_id UInt32,
message String,
timestamp DateTime,
metric Float32
)
ENGINE = MergeTree()
PRIMARY KEY (user_id, timestamp)
In the example above, my_first_table
is a MergeTree
table with four columns:
user_id
: a 32-bit unsigned integermessage
: aString
data type, which replaces types likeVARCHAR
,BLOB
,CLOB
and others from other database systemstimestamp
: aDateTime
value, which represents an instant in timemetric
: a 32-bit floating point numbernoteThe table engine determines:
- How and where the data is stored
- Which queries are supported
- Whether or not the data is replicated
There are many engines to choose from, but for a simple table on a single-node ClickHouse server, MergeTree is your likely choice.
A Brief Intro to Primary Keys
Before you go any further, it is important to understand how primary keys work in ClickHouse (the implementation of primary keys might seem unexpected!):
- primary keys in ClickHouse are not unique for each row in a table
The primary key of a ClickHouse table determines how the data is sorted when written to disk. Every 8,192 rows or 10MB of data (referred to as the index granularity) creates an entry in the primary key index file. This granularity concept creates a sparse index that can easily fit in memory, and the granules represent a stripe of the smallest amount of column data that gets processed during
SELECT
queries.The primary key can be defined using the
PRIMARY KEY
parameter. If you define a table without aPRIMARY KEY
specified, then the key becomes the tuple specified in theORDER BY
clause. If you specify both aPRIMARY KEY
and anORDER BY
, the primary key must be a subset of the sort order.The primary key is also the sorting key, which is a tuple of
(user_id, timestamp)
. Therefore, the data stored in each column file will be sorted byuser_id
, thentimestamp
.
For more details, check out the Modeling Data training module in ClickHouse Academy.