Apache Phoenix

Notes

Where does Phoenix fit in?

Phoenix demo

What is HBase?

Why use HBase?

Why use Phoenix?

Phoenix versus Hive performance

Use cases

Why is it important?

Archival problem set

Archive pilot demonstration

Monitoring use case

Phoenix under the hood

Roadmap (Nov, 2013)

State of the union (Jun 2015)

Join and subquery support

Functional indexes

SELECT AVG(response_time) FROM SERVER_METRICS
WHERE DAYOFMONTH(create_time) = 1

Without an index on the expression, this query needs a full table scan. Adding the following functional index turns it into a range scan:

CREATE INDEX day_of_month_idx
ON SERVER_METRICS(DAYOFMONTH(create_time))
INCLUDE(response_time)
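
To illustrate why the index changes the access path, here is a minimal sketch in Python. It is not Phoenix code: it models the table as a list of rows and the functional index as entries sorted by day of month, with response_time stored alongside as the INCLUDE column would be. The names ROWS, avg_full_scan, build_index and avg_range_scan, and the sample data, are assumptions made up for this illustration.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

# Toy stand-in for SERVER_METRICS: (create_time, response_time) pairs.
# The sample values are invented for illustration.
ROWS = [
    (datetime(2015, 6, 1, 9, 0), 120),
    (datetime(2015, 6, 2, 9, 0), 80),
    (datetime(2015, 7, 1, 9, 0), 100),
    (datetime(2015, 7, 15, 9, 0), 60),
]


def avg_full_scan(rows, day):
    """No index: the day of month must be computed for every row."""
    hits = [rt for ts, rt in rows if ts.day == day]
    return sum(hits) / len(hits) if hits else None


def build_index(rows):
    """Model of the functional index: entries sorted by day of month,
    with response_time kept next to each key (the INCLUDE column)."""
    entries = sorted((ts.day, rt) for ts, rt in rows)
    keys = [day for day, _ in entries]
    values = [rt for _, rt in entries]
    return keys, values


def avg_range_scan(index, day):
    """With the index: binary-search the key range and read only that slice."""
    keys, values = index
    lo = bisect_left(keys, day)
    hi = bisect_right(keys, day)
    hits = values[lo:hi]
    return sum(hits) / len(hits) if hits else None


index = build_index(ROWS)
print(avg_full_scan(ROWS, 1))    # 110.0
print(avg_range_scan(index, 1))  # 110.0
```

Both functions return the same average, but the second touches only the contiguous slice of index entries whose key equals the requested day. That is the sense in which the query becomes a range scan.
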

User-defined functions (UDFs)

CREATE FUNCTION WOEID_DISTANCE(INTEGER, INTEGER)
RETURNS INTEGER AS 'org.apache.geo.woeidDistance'
USING JAR '/lib/geo/geoloc.jar'

Usage:

SELECT * FROM woeid a JOIN woeid b ON a.country = b.country
WHERE woeid_distance(a.ID, b.ID) < 5;

The function is evaluated on the server side. UDFs are tenant aware, which means that two tenants can each define a UDF with the same name without any conflict between them.
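
One way to picture tenant awareness is a function catalog keyed by tenant as well as by function name. The Python sketch below is a conceptual model only, not how Phoenix is implemented. FunctionRegistry, its methods, the tenant IDs, and the second class name and jar path are all hypothetical.

```python
class FunctionRegistry:
    """Toy UDF catalog keyed by (tenant_id, function_name)."""

    def __init__(self):
        self._functions = {}

    def create_function(self, tenant_id, name, class_name, jar_path):
        # The tenant ID is part of the key, so the same function name
        # can be registered once per tenant.
        key = (tenant_id, name.upper())
        if key in self._functions:
            raise ValueError(f"{name} already defined for tenant {tenant_id}")
        self._functions[key] = (class_name, jar_path)

    def resolve(self, tenant_id, name):
        # Lookup is scoped to the calling tenant.
        return self._functions[(tenant_id, name.upper())]


registry = FunctionRegistry()
# Class and jar taken from the CREATE FUNCTION statement above.
registry.create_function(
    "tenant_a", "WOEID_DISTANCE",
    "org.apache.geo.woeidDistance", "/lib/geo/geoloc.jar",
)
# Hypothetical second tenant with its own implementation under the same name.
registry.create_function(
    "tenant_b", "WOEID_DISTANCE",
    "com.example.OtherDistance", "/lib/other/other.jar",
)

print(registry.resolve("tenant_a", "woeid_distance")[0])  # org.apache.geo.woeidDistance
print(registry.resolve("tenant_b", "woeid_distance")[0])  # com.example.OtherDistance
```

Each tenant resolves the name to its own definition, which is the property the note above describes.
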

Apache Calcite

Installation

Setting up a local Phoenix cluster using Docker

Download and install Docker beta for Mac

We will set up a three-node Hadoop cluster using this recipe as a template.

git clone git@github.com:kiwenlau/hadoop-cluster-docker.git
cd hadoop-cluster-docker
docker build .

A lot of output flies by …

Successfully built ed4ece2b19f2

Create the hadoop network

sudo docker network create --driver=bridge hadoop
Password:
f678bdfc5918e15a578b909692e6f04a4ef5f730a95ebb0f16da5a30c38354b1