
AgensGraph: A Multi-Model Graph Database Based on PostgreSQL

Kisung Kim (kskim@bitnine.net)


Bitnine R&D Center
2017-1-14
Who am I
Kisung Kim, Ph.D. - Chief Technology Officer of Bitnine Global Inc.

Researched query optimization for graph-structured data during doctoral studies

Developed a distributed relational database engine at TmaxSoft

Leads the development of a new graph database, AgensGraph, at Bitnine Global
What is a Graph Database?

Images from http://www.slideshare.net/debanjanmahata/an-introduction-to-nosql-graph-databases-and-neo4j


What is a Graph Database?
Relationships are first-class citizens in a graph database
Make your data connected in a graph database

                Relational Database    Graph Database
Entity          Row                    Node (Vertex)
Relationship    Row                    Relationship (Edge)


What is a Graph Database?
Handles data from a different point of view
Data model is similar to the entity-relationship model
Gartner says it represents a radical change in how data is organized and processed
Cypher Query Language
Declarative query language for the property graph model
Inspired by SQL and SPARQL
Designed to be a human-readable query language
Developed by Neo Technology, Inc. since 2011
Current version is 3.0
OpenCypher.org (http://opencypher.org)
Participate in developing the query language
Cypher Query Example
Make two nodes
CREATE (:person {id: 1, name: 'Kisung Kim', birthday: '1980-01-05'});
CREATE (:company {id: 1, name: 'Bitnine Global'});

Make a relationship between the two nodes

MATCH (p:person {id: 1}), (c:company {id: 1})
CREATE (p)-[:workFor {title: 'CTO', since: 2014}]->(c);

(Kisung Kim)-[:workFor]->(Bitnine Global)
Cypher Query Example
Querying
MATCH (p:person {name: 'Kisung Kim'})-[:workFor]->(c:company)
RETURN (p), (c)

(Kisung Kim)-[:workFor]->(?)

Query with variable-length relationships

MATCH (p:person {name: 'Kisung Kim'})-[:knows*..3]->(f:person)
RETURN (f)

(Kisung Kim)-[:knows]->(?)-[:knows]->(?)-[:knows]->(?)

No Table Definitions and No Joins
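
To make the variable-length pattern concrete, here is a minimal Python sketch of what `-[:knows*..3]->` asks for: every person reachable over one to three `knows` edges. The data and the helper `reachable_within` are hypothetical and exist only for illustration. Cypher itself returns one row per matching path, whereas this sketch deduplicates vertexes for brevity.

```python
from collections import deque

# Hypothetical toy data: directed "knows" edges between person names.
KNOWS = {
    "Kisung Kim": ["Alice"],
    "Alice": ["Bob"],
    "Bob": ["Carol"],
    "Carol": ["Dave"],
}


def reachable_within(edges, start, max_hops):
    """Return vertexes reachable from `start` via 1..max_hops edges (BFS order)."""
    found = []
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in edges.get(vertex, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                frontier.append((neighbor, depth + 1))
    return found


friends = reachable_within(KNOWS, "Kisung Kim", 3)
print(friends)
```

"Dave" is four hops away, so the hop limit of 3 excludes him.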


GraphDB to PostgreSQL Case
From Hipolabs

http://engineering.hipolabs.com/graphdb-to-postgresql/
Graph Database and Hybrid Database

Magic Quadrant for Operational Database Management Systems, Gartner, 2016


So, What We Want to Make is
Hybrid database engine with graph and relational model
Cypher query processing on PostgreSQL
Online transactional graph database
Disk-based persistent graph storage

( ) -[:processes]->(Cypher)
Why We Choose PostgreSQL?
Fully-featured, enterprise-ready open-source database

Graph processing actually uses relational algebra

Graph is serialized as tables on disk
Every graph traversal step is in principle a join
(from LDBC documentation)

It is important to optimize joins and speed up join processing

PostgreSQL has an excellent query optimizer

And the abundant PostgreSQL ecosystem
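
The claim that each traversal step is a join can be sketched with a small relational example. The snippet below uses Python's built-in `sqlite3` module purely as a stand-in relational engine (not PostgreSQL or AgensGraph), and the `vertex` and `edge` tables are hypothetical, not the AgensGraph schema. A two-hop traversal becomes two joins against the edge table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertex (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edge (start_id INTEGER, end_id INTEGER);
    INSERT INTO vertex VALUES (1, 'Kisung Kim'), (2, 'Alice'), (3, 'Bob');
    INSERT INTO edge VALUES (1, 2), (2, 3);
""")

# Two-hop traversal: each hop adds one more join against the edge table.
two_hop_rows = conn.execute("""
    SELECT v2.name
    FROM vertex AS v0
    JOIN edge AS e1 ON e1.start_id = v0.id
    JOIN edge AS e2 ON e2.start_id = e1.end_id
    JOIN vertex AS v2 ON v2.id = e2.end_id
    WHERE v0.name = 'Kisung Kim'
""").fetchall()

two_hop_names = [row[0] for row in two_hop_rows]
print(two_hop_names)
```

Longer paths add more joins, which is why join ordering and join performance matter so much for graph workloads.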


Challenges
How to store graph data
Efficient structure for graph pattern matching
At the same time, efficient for transaction processing

How to process graph queries

Processing complex graph pattern matching: variable-length paths, shortest paths
Mismatches between graph data model & relational data model
Graph query optimization
Graph Storage
Graph data is stored on disk, decomposed into vertexes and edges

When processing graph pattern matching, it is essential to find adjacent vertexes or edges efficiently
Given a start vertex, find end vertexes
Given an end vertex, find start vertexes
Two Graph Databases

Solution   Company          Latest Version   Features
Neo4j      Neo Technology   3.1              Most famous graph database, Cypher;
                                             O(1) access using fixed-size array
Titan      Datastax         -                Distributed graph system based on Cassandra
Graph Storage - Neo4j
Fixed-size array for nodes and relationships
Relationships for a node are organized as a doubly-linked list
Index-free adjacency
O(1) access for adjacent edges: follow the pointer

From Graph Databases, 2nd ed., O'Reilly, 2015
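
The following is a simplified Python sketch of the idea, not Neo4j's actual record layout. The record size, field names, and helper functions are assumptions for illustration, and only the "next" pointers of the relationship chain are modeled (a real doubly-linked chain also keeps "previous" pointers). It shows the two properties named above: a record is located by multiplying its ID by a fixed size, and a node's relationships are found by following pointers rather than consulting an index.

```python
# Assumed record size in bytes, for illustration only.
NODE_RECORD_SIZE = 16


def node_offset(node_id):
    """With fixed-size records, locating a record is a multiplication: O(1)."""
    return node_id * NODE_RECORD_SIZE


# first_rel[n] is the ID of the first relationship in node n's chain.
first_rel = [0, 0, 1]

# Each relationship record points to the next relationship of its start node
# and the next relationship of its end node (None marks the end of a chain).
rels = [
    {"start": 0, "end": 1, "start_next": 2, "end_next": 1},        # rel 0: 0 -> 1
    {"start": 1, "end": 2, "start_next": None, "end_next": 2},     # rel 1: 1 -> 2
    {"start": 0, "end": 2, "start_next": None, "end_next": None},  # rel 2: 0 -> 2
]


def relationships_of(node_id):
    """Walk the relationship chain of a node by following pointers only."""
    result = []
    rel_id = first_rel[node_id]
    while rel_id is not None:
        rel = rels[rel_id]
        result.append(rel_id)
        # Pick the pointer that belongs to this node's chain.
        if rel["start"] == node_id:
            rel_id = rel["start_next"]
        else:
            rel_id = rel["end_next"]
    return result


print(node_offset(2), relationships_of(0), relationships_of(2))
```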


Graph Storage - Titan (DSE Graph)
Titan stores graphs in adjacency list format
Each edge is stored twice
Vertex and edge lists are stored in backend storage such as HBase, Cassandra, or BerkeleyDB

From http://s3.thinkaurelius.com/docs/titan/1.0.0/data-model.html
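
A rough Python sketch of the adjacency list idea follows. It is not Titan's actual storage format; the `adjacency` dictionary and `add_edge` helper are hypothetical. Each vertex owns one "row" listing its incident edges, so every edge is written twice: once in the row of its start vertex and once in the row of its end vertex.

```python
from collections import defaultdict

# vertex ID -> list of (direction, edge label, other vertex ID)
adjacency = defaultdict(list)


def add_edge(start, label, end):
    """Store the edge in both endpoint rows so either side can find it locally."""
    adjacency[start].append(("OUT", label, end))
    adjacency[end].append(("IN", label, start))


add_edge(1, "workFor", 2)
add_edge(3, "workFor", 2)

# Two logical edges produce four stored entries.
stored_entries = sum(len(row) for row in adjacency.values())
print(adjacency[2], stored_entries)
```

Reading the row of vertex 2 yields all of its incoming edges without touching any other row, at the cost of duplicated storage and rows that grow with vertex degree.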
Graph Storage - AgensGraph
Fixed-size array is hard to implement in PostgreSQL
Tuples are moved when updated
Titan's big-row approach is also inadequate
We chose B-tree indexes for graph traversal

Graph
  Vertex table: Vertex ID, Properties
    B-tree on Vertex ID
  Edge table: Edge ID, Start Vertex ID, End Vertex ID, Properties
    B-tree on (Start, End)
    B-tree on (End, Start)
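
The role of the two composite edge indexes can be sketched in Python, with sorted lists and `bisect` standing in for B-trees. This is an illustration under assumptions, not AgensGraph code: the edge data, integer vertex IDs, and the `range_scan` helper are hypothetical. An index ordered by (Start, End) answers "given a start vertex, find end vertexes" with one range scan, and the (End, Start) index answers the reverse direction.

```python
import bisect

# Hypothetical edges as (start vertex ID, end vertex ID) pairs.
edges = [(1, 2), (1, 3), (2, 3), (4, 1)]

# Sorted lists emulate the key order of the two composite B-tree indexes.
by_start = sorted(edges)                                # (Start, End)
by_end = sorted((end, start) for start, end in edges)   # (End, Start)


def range_scan(index, key):
    """Return the second column of all entries whose first column equals `key`.

    Assumes integer keys, so (key + 1,) is an exclusive upper bound.
    """
    lo = bisect.bisect_left(index, (key,))
    hi = bisect.bisect_left(index, (key + 1,))
    return [second for _, second in index[lo:hi]]


out_neighbors = range_scan(by_start, 1)  # start vertex 1 -> end vertexes
in_neighbors = range_scan(by_end, 3)     # end vertex 3 -> start vertexes
print(out_neighbors, in_neighbors)
```

Because both columns are in the index key, the neighbor IDs come straight from the index entries, which is the access pattern an index-only scan would exploit.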


Index Problems
Current B-tree has several disadvantages for our workload
Composite index is preferable, but the size increases
There are many duplicate keys (vertex IDs) on start_ID or end_ID
Property updates incur insertions into B-trees

We are developing a new index with a bucket structure (like the GIN index), indirect indexing, and support for index-only scans for graph traversals
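
To illustrate why duplicate keys motivate a bucket structure, the sketch below contrasts a flat composite index, which repeats the start vertex ID in every entry, with a bucketed layout that stores each key once alongside a list of end vertex IDs. This is only a conceptual Python illustration with made-up data; it does not describe the actual index being developed for AgensGraph.

```python
from collections import defaultdict

# Hypothetical edges as (start vertex ID, end vertex ID) pairs.
edges = [(1, 2), (1, 3), (1, 4), (2, 3)]

# Flat composite index: one entry per edge, start ID repeated each time.
flat_index = sorted(edges)

# Bucketed index: each start ID appears once, followed by its end IDs.
bucket_index = defaultdict(list)
for start, end in flat_index:
    bucket_index[start].append(end)

flat_key_count = len(flat_index)      # start ID stored once per edge
bucket_key_count = len(bucket_index)  # start ID stored once per vertex
print(flat_key_count, bucket_key_count, bucket_index[1])
```

For high-degree vertexes the saving grows with the number of edges sharing the same key.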
Graph Storage - AgensGraph
Vertexes and edges are grouped into labels
Labels are organized as a label hierarchy
We use PostgreSQL's table inheritance feature

ag_vertex (Vertex ID, Properties)
  Person (Vertex ID, Properties)    Message (Vertex ID, Properties)
  Comment (Vertex ID, Properties)   Post (Vertex ID, Properties)
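
The effect of a label hierarchy can be sketched in Python: scanning a parent label also returns the vertexes stored under its descendant labels, much as querying a parent table in PostgreSQL includes rows from its child tables. The parent links below are an assumption made for illustration (Comment and Post placed under Message, in the style of the LDBC schema); the slide lists the labels but this transcript does not preserve which label inherits from which. The vertex IDs and the `scan` helper are likewise hypothetical.

```python
# Assumed hierarchy: child label -> parent label.
PARENT = {
    "person": "ag_vertex",
    "message": "ag_vertex",
    "comment": "message",
    "post": "message",
}

# Vertex IDs stored directly under each label (made-up data).
ROWS = {
    "ag_vertex": [],
    "person": [1],
    "message": [],
    "comment": [2],
    "post": [3],
}


def scan(label):
    """Return vertex IDs stored under `label` and all of its descendant labels."""
    ids = list(ROWS.get(label, []))
    for child, parent in PARENT.items():
        if parent == label:
            ids.extend(scan(child))
    return ids


print(sorted(scan("message")), sorted(scan("ag_vertex")))
```

Under this assumption, matching on the message label finds both comments and posts, while matching on person touches only the person rows.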
Current Status
AgensGraph v0.9
(https://github.com/bitnine-oss/agens-graph or http://bitnine.net/downloads/)
Graph data model and DDL on PostgreSQL 9.6
Cypher query processing (70% of OpenCypher spec.)
Integrated query processing (Cypher + SQL)
Client libraries (JDBC, ODBC, Python)
Monitoring and development using Tadpole DB-hub
Tadpole for AgensGraph
Tadpole DB Hub is an open-source project for managing unified infrastructure (https://github.com/hangum/TadpoleForDBTools)

Supports various databases, including PostgreSQL and AgensGraph

Features of Tadpole for AgensGraph

Monitoring the AgensGraph server
Cypher query browser and graph visualization
Tadpole for AgensGraph
Future Roadmap
Distributed graph database
Plan to exploit Postgres-XL
Specialized storage and index for graph traversals
Dictionary compression for JSONB (ZSON)
Graph query optimization using graph statistics
Integration with big data systems
HDFS Storage
Graph analysis using GraphX
Join Us
AgensGraph is an open-source project: https://github.com/bitnine-oss/agens-graph

We also wish to contribute to the PostgreSQL community

Graph database meetup in Silicon Valley


http://www.meetup.com/Graph-Database-in-Silicon-Valley/
Thank You
kskim@bitnine.net

:likes
