
Apache Kudu
Introduction

❑ Introduction
❑ Architecture
❑ History
❑ Why Kudu
❑ Use Case
❑ Kudu vs HBase
Introduction

❑ Apache Kudu is an open source, column-oriented data store in the Apache Hadoop
ecosystem.
❑ Kudu is storage for fast analytics on fast data.
❑ Kudu provides a combination of fast inserts and updates alongside efficient columnar
scans to enable multiple real-time analytic workloads across a single storage layer.
❑ Kudu fills the gap between HDFS and Apache HBase, a gap formerly bridged with complex
hybrid architectures.
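
To make "efficient columnar scans" concrete, here is a minimal sketch in plain Python. It does not use Kudu or its client API. The table, the column names (host, ts, cpu) and the helper to_columns are all made up for illustration. It only shows why an aggregate over one column needs less data in a column-oriented layout.

```python
# Toy data: three rows of a hypothetical metrics table (names are illustrative).
rows = [
    {"host": "a", "ts": 1, "cpu": 0.25},
    {"host": "b", "ts": 1, "cpu": 0.5},
    {"host": "c", "ts": 1, "cpu": 0.75},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row-oriented layout: the scan walks whole rows to pick out one field.
avg_cpu_rows = sum(row["cpu"] for row in rows) / len(rows)

# Column-oriented layout: the scan reads only the "cpu" column.
avg_cpu_columns = sum(columns["cpu"]) / len(columns["cpu"])

print(avg_cpu_rows, avg_cpu_columns)
```

Both layouts give the same answer. The difference is how much data a scan has to read to get it, which is the property the bullets above attribute to Kudu.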
Architecture

The diagram shows a Kudu cluster with three masters and multiple tablet servers, each
serving multiple tablets. It illustrates how Raft consensus is used to allow for both leaders
and followers for both the masters and tablet servers. In addition, a tablet server can be a
leader for some tablets and a follower for others. Leaders are shown in gold, while
followers are shown in grey.

[Figure: Kudu cluster diagram. Each server row shows a Master tablet or Tablet 1, Tablet 2, ... Tablet n, and each replica is labeled LEADER or FOLLOWER.]
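
The leader/follower layout described above can be sketched as a small data model. This is a toy illustration only. It does not implement Raft or leader election, and it does not reflect how Kudu actually places replicas. The names Replica, place_replicas, roles_by_server, and the server and tablet names are hypothetical. The round-robin placement is an assumption chosen to show one server leading some tablets and following others.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    tablet: str
    server: str
    role: str  # "LEADER" or "FOLLOWER"


def place_replicas(tablets, servers, replication_factor=3):
    """Toy placement: give each tablet `replication_factor` replicas on
    distinct servers, rotating which server holds the leader.
    Assumes replication_factor <= len(servers)."""
    replicas = []
    for i, tablet in enumerate(tablets):
        for j in range(replication_factor):
            server = servers[(i + j) % len(servers)]
            role = "LEADER" if j == 0 else "FOLLOWER"
            replicas.append(Replica(tablet, server, role))
    return replicas


def roles_by_server(replicas):
    """Group replicas as {server: {tablet: role}}."""
    roles = defaultdict(dict)
    for replica in replicas:
        roles[replica.server][replica.tablet] = replica.role
    return dict(roles)


replicas = place_replicas(
    ["tablet-1", "tablet-2", "tablet-3"], ["ts-a", "ts-b", "ts-c"]
)
layout = roles_by_server(replicas)

for server, roles in sorted(layout.items()):
    print(server, roles)
```

In this toy layout, server ts-a leads tablet-1 and follows tablet-2 and tablet-3, which mirrors the statement that a tablet server can be a leader for some tablets and a follower for others.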
History

❑ Apache Kudu is a free and open source column-oriented data store in the Apache Hadoop
ecosystem. It is compatible with most of the data processing frameworks in the Hadoop
environment.
❑ The open source project to build Apache Kudu began as an internal project at Cloudera. The
first version, Apache Kudu 1.0, was released on 19 September 2016.
Why Kudu

❑ Apache Kudu is the disruptive technology for real-time analytics on fast data that
we have all been waiting for.
❑ Kudu is completely different from other big data analytics solutions.
❑ Kudu takes advantage of next-generation hardware.
❑ Kudu supports SQL through Spark or Impala.
❑ Kudu enables killer “Big Data” apps.
❑ Kudu should be part of your big data strategy.
Use Case

Until 1-3 years ago, the big data landscape was dominated by several storage systems, the
first of which was Hadoop HDFS, later followed by Apache HBase, a NoSQL database. HDFS is
great for high-speed writes and scans, while HBase is well suited for random-access
queries. A new storage engine, Apache Kudu, tries to bridge the gap between those two
use cases. Apache Kudu is a distributed, columnar database for structured, real-time data.
Because Kudu has a schema, it is suited only for structured data, in contrast to HBase, which is
schemaless.
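
As a rough illustration of what "has a schema" means in practice, the sketch below checks rows against a fixed set of typed columns before accepting them. It is plain Python, not the Kudu client API. SCHEMA, validate_row, try_insert and the column names are hypothetical and exist only for this example.

```python
# Hypothetical schema: column name -> required Python type.
SCHEMA = {"host": str, "ts": int, "cpu": float}


def validate_row(row, schema=SCHEMA):
    """Return the row if it has exactly the schema's columns with the
    expected types; otherwise raise ValueError or TypeError."""
    missing = set(schema) - set(row)
    extra = set(row) - set(schema)
    if missing or extra:
        raise ValueError(
            f"row does not match schema: missing={sorted(missing)}, "
            f"extra={sorted(extra)}"
        )
    for name, expected in schema.items():
        if not isinstance(row[name], expected):
            raise TypeError(f"column {name!r} expects {expected.__name__}")
    return row


def try_insert(table, row):
    """Append the row to the table only if it passes validation."""
    try:
        table.append(validate_row(row))
        return True
    except (ValueError, TypeError):
        return False


table = []
accepted = try_insert(table, {"host": "a", "ts": 1, "cpu": 0.25})
rejected = try_insert(table, {"host": "b", "note": "free-form"})

print(accepted, rejected, len(table))
```

A schemaless store would accept the second, free-form row. A store with a schema rejects it, which is why the text restricts Kudu to structured data.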
Kudu vs HBase

❑ Apache HBase is an open-source, distributed, versioned, column-oriented store modeled
after Google's Bigtable, as described in "Bigtable: A Distributed Storage System for
Structured Data". Just as Bigtable leverages the distributed data storage provided by the
Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.
❑ Performance
● OLTP
● Fast Point Queries
❑ HBase is fast for updates and inserts but not for analytics.
❑ A new addition to the open source Apache Hadoop ecosystem, Kudu completes Hadoop's
storage layer to enable fast analytics on fast data.
❑ Real-time analytics
❑ Kudu is meant to do both well: fast updates and inserts, and fast analytics.
