Apache Kudu
Introduction
❑ Introduction
❑ Architecture
❑ History
❑ Why Kudu
❑ Use Case
❑ Kudu vs HBase
Introduction
❑ Apache Kudu is an open-source, column-oriented data store in the Apache Hadoop
ecosystem.
❑ Kudu is a storage system for fast analytics on fast data.
❑ Kudu provides fast inserts and updates alongside efficient columnar scans, enabling
multiple real-time analytic workloads on a single storage layer.
❑ Kudu fills the gap between HDFS and Apache HBase, a gap that was formerly addressed with
complex hybrid architectures.
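The combination described above, fast inserts and updates plus efficient columnar scans, can be illustrated with a toy Python model. It is a minimal sketch under stated assumptions: rows are located by primary key so an insert or update touches one row position, while values are stored column by column so a scan reads only the column it needs. The class `ToyColumnarTable` and its methods are hypothetical; this is not Kudu's storage engine or client API, which are far more involved.

```python
from __future__ import annotations

from typing import Any


class ToyColumnarTable:
    """Hypothetical toy model: rows keyed by a primary key, stored column by column.

    Illustrative only; this is not Kudu's storage engine or client API.
    """

    def __init__(self, columns: list[str], key: str) -> None:
        if key not in columns:
            raise ValueError("primary key must be one of the columns")
        self.key = key
        # Columnar layout: one list of values per column.
        self.data: dict[str, list[Any]] = {c: [] for c in columns}
        # Primary key -> row position, for fast random-access updates.
        self.index: dict[Any, int] = {}

    def upsert(self, row: dict[str, Any]) -> None:
        """Insert a new row, or update the existing row with the same key."""
        k = row[self.key]
        if k in self.index:
            pos = self.index[k]
            for col, val in row.items():
                self.data[col][pos] = val  # in-place update by key
        else:
            self.index[k] = len(self.data[self.key])
            for col, values in self.data.items():
                values.append(row.get(col))  # absent columns become None

    def scan(self, column: str) -> list[Any]:
        """Read a single column without touching the other columns."""
        return list(self.data[column])


t = ToyColumnarTable(["id", "metric", "value"], key="id")
t.upsert({"id": 1, "metric": "cpu", "value": 10})
t.upsert({"id": 2, "metric": "mem", "value": 20})
t.upsert({"id": 1, "value": 30})  # updates row 1; no new row is added
print(t.scan("value"))  # [30, 20]
```

The primary-key index stands in for the random-access strength usually associated with HBase, and the per-column lists stand in for the scan-friendly layout usually associated with columnar files on HDFS.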
Architecture
The diagram shows a Kudu cluster with three masters and multiple tablet servers, each
serving multiple tablets. It illustrates how Raft consensus provides leader and follower
replicas for both the masters and the tablet servers. In addition, a tablet server can be
the leader for some tablets and a follower for others. Leaders are shown in gold, while
followers are shown in grey.
[Figure: Kudu cluster showing the master tablet and Tablets 1 and 2, each replicated as one leader and several followers]
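The leader/follower layout described for the diagram can be sketched in Python under simplifying assumptions: three tablet servers, three tablets, a replication factor of 3, and leaders assigned round-robin so that each server leads one tablet and follows the others. In a real cluster, leaders are elected through Raft rather than assigned; the `place_replicas` function and the server and tablet names are hypothetical. The `majority` helper shows the standard Raft arithmetic: a group of n voters needs n // 2 + 1 votes, so three replicas (or three masters) stay available with one failure.

```python
from __future__ import annotations

from collections import defaultdict


def majority(replicas: int) -> int:
    """Votes needed for a Raft majority among `replicas` voters."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can fail while a majority remains available."""
    return replicas - majority(replicas)


def place_replicas(tablets: list[str], servers: list[str], rf: int = 3) -> dict[str, dict[str, str]]:
    """Hypothetical toy placement: return {server: {tablet: role}}.

    Each tablet gets `rf` replicas on consecutive servers; the first replica
    is labeled LEADER. Real Kudu elects leaders with Raft instead.
    """
    if rf > len(servers):
        raise ValueError("replication factor exceeds number of servers")
    roles: dict[str, dict[str, str]] = defaultdict(dict)
    for i, tablet in enumerate(tablets):
        for j in range(rf):
            server = servers[(i + j) % len(servers)]
            roles[server][tablet] = "LEADER" if j == 0 else "FOLLOWER"
    return dict(roles)


servers = ["ts1", "ts2", "ts3"]
layout = place_replicas(["tablet1", "tablet2", "tablet3"], servers)
for server in servers:
    print(server, layout[server])  # each server leads one tablet, follows two
print(majority(3), tolerated_failures(3))  # 2 1
```

Odd replica counts are the usual choice: going from 3 to 4 replicas raises the majority from 2 to 3 without tolerating any additional failure.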
History
❑ Kudu is compatible with most of the data processing frameworks in the Hadoop
environment.
❑ The open-source project to build Apache Kudu began as an internal project at Cloudera. The
first version, Apache Kudu 1.0, was released on 19 September 2016.
Why Kudu
❑ Apache Kudu is the disruptive technology for real-time analytics on fast data that
we have all been waiting for.
❑ Kudu is completely different from other big data analytics solutions: it combines fast
inserts and updates with efficient columnar scans in a single storage layer.
❑ Kudu takes advantage of next-generation hardware.
❑ Kudu supports SQL through Apache Spark or Apache Impala.
❑ Kudu enables killer “Big Data” apps.
❑ Kudu should be part of your Big Data strategy.
Use Case
Until 1-3 years ago, the big data landscape was dominated by a few storage systems: first
Hadoop HDFS, followed later by Apache HBase, a NoSQL database. HDFS is great for high-speed
writes and scans, while HBase is well suited for random-access queries. Apache Kudu, a newer
storage engine, tries to bridge the gap between these two use cases. Kudu is a distributed,
columnar database for structured, real-time data. Because Kudu has a schema, it is suited
only to structured data, unlike HBase, which is schemaless.
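The schema point above can be made concrete with a toy Python contrast; it uses neither Kudu's nor HBase's real API. The hypothetical `SCHEMA` and `validate` function model schema-on-write: unknown columns and wrongly typed values are rejected, so every accepted row has the same shape. As a simplification, every column is treated as nullable here, whereas real Kudu primary-key columns cannot be null. The `schemaless` dictionary loosely mimics HBase's model, where each row key can carry an arbitrary set of `family:qualifier` columns holding raw bytes.

```python
from __future__ import annotations

from typing import Any

# Hypothetical schema: column name -> required Python type.
SCHEMA: dict[str, type] = {"host": str, "ts": int, "cpu": float}


def validate(row: dict[str, Any], schema: dict[str, type] = SCHEMA) -> dict[str, Any]:
    """Toy schema-on-write check in the spirit of a Kudu table.

    Rejects unknown columns and values of the wrong type; missing columns
    are filled with None (all columns treated as nullable for brevity).
    """
    unknown = set(row) - set(schema)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    for col, typ in schema.items():
        val = row.get(col)
        if val is not None and not isinstance(val, typ):
            raise TypeError(f"column {col!r} expects {typ.__name__}")
    return {col: row.get(col) for col in schema}


# Structured table: every accepted row has exactly the schema's columns.
structured: list[dict[str, Any]] = [validate({"host": "a", "ts": 1, "cpu": 0.5})]

# Schemaless store (HBase-like): any row key can carry any set of columns.
schemaless: dict[str, dict[str, bytes]] = {
    "row1": {"cf:host": b"a", "cf:anything": b"goes"},
}

try:
    validate({"host": "b", "ts": "yesterday"})
except TypeError as err:
    print(err)  # column 'ts' expects int
```

The trade-off follows directly: the fixed schema lets a columnar engine store each column with a known type, while the schemaless model accepts semi-structured data at the cost of that guarantee.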
Kudu vs HBase