Talk by Josh Berkus, core member of PostgreSQL.

Big data features

Data sizing:

9.5 status:

9.5 big data features:

UPSERT: INSERT ... ON CONFLICT DO { UPDATE | IGNORE }, similar to MERGE only better. Merge syntax was complex and did not support some of the use cases (single row insert or update). Merge is available in other DBs (Oracle?)

Why you want Upsert:

How it works – It probes the index (which means BTree index), (caveat?), locks the leaf page. If we try concurrent upserts they will fail because of lock, but will not overwrite data (??).

BRIN – Block-Range INdex:

GROUPING, SETS, ROLLUP and CUBE:

ROLLUP – get subtotals and totals with detail in one query.

SELECT country, city, count(*)
FROM lwn_subscribers
GROUP by rollup (country, city);

CUBE – explosiion of all summaries for use with OLAP tools (e.g: Mondrian, Pentaho etc.,). Earlier we had to do this externally (say Java). (CUBE output is not meant to be human-readable).

GROUPING SETS - superset/programmable version of both ROLLUP and CUBE.

SELECT ...
FROM ...
GROUP by grouping sets ((city, level), (level), ());

Abbreviated key sorting

Foreign Schema

CREATE FOREIGN TABLE myredishash
(key text, val text[])
SERVER redis_server
...

INSERT INTO myredishash (key, val)
VALUES ('mytable:r1, '{prop1, val1..}');

FDW to query other PostgreSQL instances,Hadoop, cassandra, kafka, spreadsheet, twitter(REST), multicorn (Python).

Import Foreign Schema (SQL DBs):

IMPORT FOREIGN SCHEMA public
EXCEPT (reports, audit)
FROM server acctng_server into acctng;

currently on “postgresql-like” databases.

Foreign Table inheritence:

CREATE FOREIGN TABLE users_shard_1 ()
INHERITS (users) server shard_1;

CREATE FOREIGN TABLE users_shard_2 ()
INHERITS (users) server shard_2;

citusdb uses FTI to store in columnstore (cstore) tables.

More features:

SET LOGGED/UNLOGGED:

GiST index-only Scan:

Index-only scan for PostGIS, Exclusion

WAL compression:

Reduce PostgreSQL log writes during data loading; less writing => faster load.

TABLESAMPLE:

Get a “quick look” at data in a big table.

SELECT * FROM user_profiles
TABLESAMPLE BERNOULLI (0.001);

Beyond 9.5:

Parallel seq scan; finally, parallel query for postgresql.

set max_parallel_degree = 4 ;
select * from pgbench_accounts
where filler like '%a%';

cstore & pg_shard

Big data forks go OSS

Questions:

Links:

Additional reference: