Talk by Josh Berkus, core member of PostgreSQL.
Big data features
Data sizing:
9.5 status:
9.5 big data features:
UPSERT: INSERT ... ON CONFLICT DO { UPDATE | IGNORE }
, similar to
MERGE
only better. Merge syntax was complex and did not support some
of the use cases (single row insert or update). Merge is available in other DBs (Oracle?)
Why you want Upsert:
How it works – It probes the index (which means BTree index), (caveat?), locks the leaf page. If we try concurrent upserts they will fail because of lock, but will not overwrite data (??).
BRIN – Block-Range INdex:
BTree:2142MB::BRIN:(64kb for 512, 192kb for 64)
)95:BTree::18:BRIN
)GROUPING, SETS, ROLLUP and CUBE:
ROLLUP – get subtotals and totals with detail in one query.
SELECT country, city, count(*)
FROM lwn_subscribers
GROUP by rollup (country, city);
CUBE – explosiion of all summaries for use with OLAP tools (e.g: Mondrian, Pentaho etc.,). Earlier we had to do this externally (say Java). (CUBE output is not meant to be human-readable).
GROUPING SETS - superset/programmable version of both ROLLUP and CUBE.
SELECT ...
FROM ...
GROUP by grouping sets ((city, level), (level), ());
Abbreviated key sorting
Foreign Schema
CREATE FOREIGN TABLE myredishash
(key text, val text[])
SERVER redis_server
...
INSERT INTO myredishash (key, val)
VALUES ('mytable:r1, '{prop1, val1..}');
FDW to query other PostgreSQL instances,Hadoop, cassandra, kafka, spreadsheet, twitter(REST), multicorn (Python).
Import Foreign Schema (SQL DBs):
IMPORT FOREIGN SCHEMA public
EXCEPT (reports, audit)
FROM server acctng_server into acctng;
currently on “postgresql-like” databases.
Foreign Table inheritence:
CREATE FOREIGN TABLE users_shard_1 ()
INHERITS (users) server shard_1;
CREATE FOREIGN TABLE users_shard_2 ()
INHERITS (users) server shard_2;
citusdb uses FTI to store in columnstore (cstore) tables.
More features:
SET LOGGED/UNLOGGED:
ALTER TABLE july_data SET LOGGED
GiST index-only Scan:
Index-only scan for PostGIS, Exclusion
WAL compression:
Reduce PostgreSQL log writes during data loading; less writing => faster load.
TABLESAMPLE:
Get a “quick look” at data in a big table.
SELECT * FROM user_profiles
TABLESAMPLE BERNOULLI (0.001);
system
, bernoulli
.Beyond 9.5:
Parallel seq scan; finally, parallel query for postgresql.
set max_parallel_degree = 4 ;
select * from pgbench_accounts
where filler like '%a%';
cstore
& pg_shard
cstore
: column store for PostgreSQLpg_shard
: automated DW sharding – automated sharding of tables to remote shards. Aimed at big data use case.Big data forks go OSS
Questions:
Links:
Additional reference: