
Discovering ElasticSearch

A little about me

23 years old

Jess' boyfriend

Zoe's dad

Mountain biker

Scuba diver

Scuba biker?

Martial Arts Instructor

I'm also a developer.
I run a development agency in Australia,
Webcomm

Finally, I'm a Laracon addict

who has travelled

132,000 km

a third of the distance to the moon.

Flashcard

@ben_corlett

http://github.com/bencorlett

http://webcomm.com.au

Search is

filtering information
and determining relevance

Picture this scenario

"I want to find hotels called
Renaissance for under 150, within
500m of Bimhuis so I am close to
Laracon EU 2014. The hotel needs to
have disability access and ideally
provide Wi-Fi and have rooms above
ground level."

Requirements
1. Name of hotel called Renaissance.
2. Under 150. ^1
3. Within 500m of Bimhuis. ^1
4. Disability access.

^1 Perceived requirements may be flexible.

Wants
1. Wi-Fi. ^1
2. Rooms above ground level.

^1 Let's be honest, this should be a requirement ;)

If search is filtering information and
determining relevance
And humans think with expression and
emotion, why do your apps operate like
robots?
How can we tailor our apps to think like our
users?

Use the right toolset; and

Know your data.

Introducing ElasticSearch

ElasticSearch
1. Powerful search and analytics engine. ^1
2. Object-based document store where every field is indexed and searchable.
3. Distributed; ready to scale. ^2

^1 ElasticSearch may be used for much more than just a search engine.
^2 Webscalez FTW (trollolol).

SQL vs. ElasticSearch

SQL and ElasticSearch; a comparison

SQL is a relational database, ElasticSearch is a search engine.

Where SQL is great at filtering on a binary level, ElasticSearch
thrives on both binary data and full text relevance.

SQL indexes are always up to date with your primary store,
ElasticSearch needs to be synced.

ElasticSearch is very easy to horizontally scale for performance
and redundancy.

SQL and ElasticSearch; a comparison
SQL uses the following structure for its data store:
database > table > row

ElasticSearch has a different, yet comparable structure:
index > type > document

ElasticSearch 101

It's all about documents
1. ElasticSearch is document-oriented.
2. Documents are represented using JSON.
3. Data can be in nested JSON objects, arrays and is all searchable.

It's all about documents
{
    "name": "Renaissance Hotel Amsterdam",
    "company": "Marriott",
    "location": [52.3712561, 4.9005577],
    "floor_levels": 6,
    "features": [
        "disability_access",
        "wifi",
        "smoking_allowed",
        "pool"
    ]
}

ElasticSearch Requires

Java

You have 3 seconds to sulk and complain, then shut up.

Installation is easy
1. Install Java.
2. Download and run ElasticSearch in 3 bash commands. ^1
3. Debian or RPM packages available.
4. Puppet & Chef scripts available.

^1 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_installation.html

Scaling isn't scary
1. Near-zero configuration to build a cluster of ElasticSearch instances.
2. Easy to scale horizontally.
3. Each ElasticSearch instance is referred to as a node.
4. Any node is capable of handling any request and delegating load.

In very basic terms, horizontal scaling is adding

more servers
to build a cluster, or cloud...

...where vertical scaling is throwing

more resources
at an individual server.

ElasticSearch exposes a

RESTful API

Win.

Communicating with ElasticSearch
1. HTTP verbs - GET, POST, PUT, DELETE, etc...
2. Send & receive JSON payloads.

:VERB /:index/:type/:document
{
    "key": "value",
    "complex": ["foo", "bar"]
}
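
As a rough sketch of the pattern above, the following Python builds (but does not send) a request for one document. The `build_request` helper and the `http://localhost:9200` host are assumptions made for illustration; they are not part of the talk.

```python
import json
from urllib.request import Request


def build_request(verb, index, doc_type, doc_id=None, body=None,
                  host="http://localhost:9200"):
    """Build (but do not send) an HTTP request for a single document."""
    # Follows the :VERB /:index/:type/:document shape; the id is optional.
    path = "/".join(part for part in (index, doc_type, doc_id) if part)
    # The payload, when present, is sent as JSON.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(f"{host}/{path}", data=data, method=verb,
                   headers={"Content-Type": "application/json"})


request = build_request("PUT", "myapp", "hotel", "1", {"key": "value"})
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it to a running node.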

Creating a document
POST /myapp/hotel
{
    "name": "Renaissance Hotel Amsterdam",
    "company": "Marriott",
    "location": [52.3712561, 4.9005577],
    "floor_levels": 6,
    "features": [
        "disability_access",
        "wifi",
        "smoking_allowed",
        "pool"
    ]
}

Updating a document
PUT /myapp/hotel/1 # ^1

{
    "name": "Renaissance Hotel Amsterdam",
    "company": "Marriott",
    "location": [52.3712561, 4.9005577],
    "floor_levels": 10,
    "features": [
        "disability_access",
        "wifi",
        "pool",
        "restaurant"
    ]
}

^1 Actually an upsert; create or update depending on existence.

Deleting a document
DELETE /myapp/hotel/1

There's lots more to documents
1. Partial document updating.
2. Document versioning.
3. Conflict resolution for distributed documents.
4. Bulk CRUD methods to avoid HTTP bottleneck.

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs.html

ElasticSearch makes

searching fun

Searching in ElasticSearch
1. Every single field can be searchable.
2. Perform structured queries or filters, against fields.
3. Perform full text queries to find documents.
4. Queries and filters represented using JSON.
5. Organise results by relevance.

SQL-like approach.

Index
1. Index (noun) - refers to the equivalent of a database in an SQL system.
2. Index (verb) - refers to the process of storing a document in an index.
3. Inverted index - list of all terms inside ElasticSearch and the documents in which they appear.

Analysis

Character filters simplify data, such as changing:

"&" to "and".

"é" to "e".

Data is split into terms through a process called tokenisation.

Analysis

Token filters tweak and normalise terms, such as:

Cast to lowercase.

Remove stop-words like "a" or "the".

Add synonyms.
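
The three analysis stages described above can be sketched as a toy Python function. The mapping tables are assumptions chosen for illustration (including the accented-character mapping and the single synonym); real analyzers are far more configurable.

```python
import re

# Hypothetical character-filter mappings (assumed for illustration).
CHAR_MAP = {"&": " and ", "é": "e"}
# Hypothetical stop-word and synonym tables; real ones are much larger.
STOP_WORDS = {"a", "the"}
SYNONYMS = {"leap": "jump"}


def analyse(text):
    """Toy analyzer: character filters, then tokenisation, then token filters."""
    # 1. Character filters rewrite the raw string before it is split.
    for src, dst in CHAR_MAP.items():
        text = text.replace(src, dst)
    # 2. Tokenisation splits the string into terms on non-word characters.
    tokens = re.findall(r"\w+", text)
    # 3. Token filters: lowercase, drop stop-words, map synonyms.
    terms = []
    for token in tokens:
        term = token.lower()
        if term in STOP_WORDS:
            continue
        terms.append(SYNONYMS.get(term, term))
    return terms


print(analyse("The Café & a Quick Leap"))
```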

Inverted index
1. Analysis process extremely configurable. ^1
2. Multilingual support (33 languages in total), interchangeable per index.
3. Any fields not indexed are not searchable.
4. The same analysis process occurs at search time.

^1 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html

Example inverted index
Consider the two following sentences:

"The quick brown fox jumped over the lazy dog"
"Quick brown foxes leap over lazy dogs in summer"

Example inverted index
Term      Doc_1   Doc_2
------------------------
brown   |   X   |   X
dog     |   X   |   X
fox     |   X   |   X
in      |       |   X
jump    |   X   |   X
lazy    |   X   |   X
over    |   X   |   X
quick   |   X   |   X
summer  |       |   X
the     |   X   |
------------------------
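
A minimal Python sketch of how such an index could be built from the two sentences. The `NORMALISE` table is a hypothetical stand-in for stemming and synonyms; the search helper analyses the query the same way as the documents.

```python
import re
from collections import defaultdict

# Hypothetical normalisation table standing in for stemming and synonyms.
NORMALISE = {"jumped": "jump", "leap": "jump", "foxes": "fox", "dogs": "dog"}

DOCS = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}


def analyse(text):
    """Lowercase each word, then collapse known variants to one term."""
    return [NORMALISE.get(w, w) for w in re.findall(r"\w+", text.lower())]


def build_inverted_index(docs):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyse(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term of the analysed query."""
    postings = [index.get(term, set()) for term in analyse(query)]
    return set.intersection(*postings) if postings else set()


index = build_inverted_index(DOCS)
for term in sorted(index):
    print(f"{term:<8}{sorted(index[term])}")
```

Because the query passes through the same analysis, a search for "Quick FOXES" finds both documents even though neither contains that exact text.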

Scoring
1. Term frequency - the more often a term appears in a field, the more relevant.
2. Inverse document frequency - the more often a term appears in the inverted index, the less relevant.
3. Field length norm - the longer the field, the less relevant each term in it is.
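
To make the three factors concrete, here is a small Python sketch. The formulas are modelled on Lucene's classic TF/IDF similarity; the sample fields are invented, and the real scoring pipeline involves more than this product.

```python
import math


def term_frequency(term, field_terms):
    """More occurrences of the term in the field means a higher weight."""
    return math.sqrt(field_terms.count(term))


def inverse_document_frequency(term, all_fields):
    """Terms found in many documents carry less weight."""
    doc_freq = sum(1 for terms in all_fields if term in terms)
    return 1.0 + math.log(len(all_fields) / (doc_freq + 1))


def field_length_norm(field_terms):
    """Longer fields dilute the weight of each term they contain."""
    return 1.0 / math.sqrt(len(field_terms))


def score(term, field_terms, all_fields):
    """Combine the three factors into a single relevance weight."""
    return (
        term_frequency(term, field_terms)
        * inverse_document_frequency(term, all_fields)
        * field_length_norm(field_terms)
    )


# Hypothetical, already analysed "name" fields for three hotels.
names = [
    ["renaissance", "hotel", "amsterdam"],
    ["renaissance", "renaissance", "suites"],
    ["grand", "hotel", "amsterdam", "centre", "by", "the", "canal"],
]
for terms in names:
    print(terms, round(score("renaissance", terms, names), 3))
```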

Scoring
1. Fields are "boostable" to increase relevance.
2. Functions (inbuilt and scripted) can be used to increase/decrease relevance.
3. Altering analysis to fine-tune scoring.
4. Very important to know your data.

Queries and filters
1. Both are modular; think of building blocks.
2. Both can be nested inside one another.
3. Syntax does not change, regardless of position or nesting.
4. Entire JSON object is the ElasticSearch Query DSL.

Querying in ElasticSearch
1. There are 37 queries (as of August 2014). ^1
2. Queries are intelligent; they score all results according to a relevance algorithm.
3. Any nesting passes relevance back to parents.

^1 http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/relevance-intro.html

Filters
1. You will find 27 filters (as of August 2014). ^1
2. Filters are binary; either a field matches or it doesn't.
3. Filters don't affect relevance scoring.

^1 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html

Querying in ElasticSearch
GET /myapp/hotel/_search
{
    "query": {
        "match": {
            "name": "Renaissance"
        }
    }
}

This is a match query. It is the go-to full text query.

Querying in ElasticSearch
GET /myapp/hotel/_search

{
    "query": {
        "filtered": {
            "filter": {
                "term": {
                    "features": "disability_access"
                }
            }
        }
    }
}

This is a filtered query, containing a term filter.

Back to our scenario

"I want to find hotels called
Renaissance for under 150, within
500m of Bimhuis so I am close to
Laracon EU 2014. The hotel needs to
have disability access and ideally
provide Wi-Fi and have rooms above
ground level."

Two approaches
1. Possible with both SQL and ElasticSearch.
2. Much easier with ElasticSearch.
3. ElasticSearch understands the concept of relevance.
4. ElasticSearch can severely outperform SQL.

SQL approach

First, we'll build the obvious...
select *
from `hotels`
where `name` like "%Renaissance%"
and `price` <= 150
and `disability_access` = 1

Performance & relevance
Consider the following:
select *
from `hotels`
where `name` like "%Renaissance%"

1. This query will be slow.
2. This query only accounts for terms which contain the (correctly) spelt Renaissance.

Full text search
1. Add a full text index. ^1
2. Alter the query, and search across both name and company.
select *
from `hotels`
where match (`name`, `company`) against ("Renaissance")
and `price` <= 150
and `disability_access` = 1

^1 http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

Checklist
1. Name of hotel called Renaissance.
2. Under 150.
3. Within 500m of Bimhuis. (outstanding)
4. Disability access.
5. Wi-Fi. (outstanding)
6. Above ground level. (outstanding)

Adding "wants" in
select *, if (`floor_levels` > 1, 1, 0) as `has_multiple_floor_levels`
from `hotels`
where match (`name`, `company`) against ("Renaissance")
and `price` <= 150
and `disability_access` = 1
order by `wifi` desc, `has_multiple_floor_levels` desc -- ^1

^1 We're prioritising Wi-Fi over multiple floor levels...

Checklist
1. ~Name of hotel called Renaissance.
2. Under 150.
3. Within 500m of Bimhuis. (outstanding)
4. Disability access.
5. Wi-Fi.
6. Rooms above ground level.
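
The "wants as ordering" idea from the SQL approach can be tried locally with a rough sketch using Python's built-in SQLite. This is an approximation under stated assumptions: the rows are invented, and a plain `like` stands in for MySQL's `match ... against`, which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table hotels (name text, price integer, disability_access integer,"
    " wifi integer, floor_levels integer)"
)
# Hypothetical sample rows.
conn.executemany(
    "insert into hotels values (?, ?, ?, ?, ?)",
    [
        ("Renaissance Budget", 90, 1, 0, 1),
        ("Renaissance Canal", 140, 1, 1, 1),
        ("Renaissance Tower", 120, 1, 1, 8),
        ("Renaissance Palace", 300, 1, 1, 5),
    ],
)
# Requirements go in the where clause; wants only influence the ordering.
rows = conn.execute(
    """
    select name, floor_levels > 1 as has_multiple_floor_levels
    from hotels
    where name like '%Renaissance%'
      and price <= 150
      and disability_access = 1
    order by wifi desc, has_multiple_floor_levels desc
    """
).fetchall()
names = [row[0] for row in rows]
print(names)
```

Note that the 300-priced hotel is excluded outright, however well it satisfies the wants; that hard cut-off is what the "What if..." questions probe.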

Spatial awareness...
1. Not so easy.
2. PostGIS for PostgreSQL. ^1
3. Possible with MySQL with MyISAM tables only. ^2
4. Very finite; either a match or not a match.
5. Outside the scope of this talk.

^1 http://postgis.net
^2 Possible with other engines in new versions of MySQL.

Checklist
1. ~Name of hotel called Renaissance.
2. Under 150.
3. Within 500m of Bimhuis.
4. Disability access.
5. Wi-Fi.
6. Rooms above ground level.

What if...
1. Somebody searched for "Residence Inn" as the hotel name?
2. There was an appropriate hotel for 151?
3. A brilliant candidate could be found 501m away from Bimhuis?
4. Somebody cared more about having rooms above ground level than being provided Wi-Fi?

ElasticSearch approach

Populating ElasticSearch
POST /myapp/hotel
{
    "name": "Renaissance Hotel Amsterdam",
    "company": "Marriott",
    "location": [52.3712561, 4.9005577],
    "floor_levels": 10,
    "features": [
        "disability_access",
        "wifi",
        "pool",
        "restaurant"
    ]
}

Rinse and repeat for as many hotels as required

The bool query
{
    "bool": {
        "must": {},
        "must_not": {},
        "should": {}
    }
}

We specify conditions which must and must not match. Terms that
should match make a document more relevant.
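
The must / must not / should semantics can be illustrated with a toy Python evaluator. This is only a sketch: clauses are plain predicates, and the "one point per matching clause" score is an assumption for illustration, not how ElasticSearch actually computes relevance.

```python
def evaluate_bool(doc, must=(), must_not=(), should=()):
    """Return a toy score for doc, or None when the document is excluded.

    Each clause is a predicate taking the document and returning True/False.
    """
    if not all(clause(doc) for clause in must):
        return None  # a required condition failed
    if any(clause(doc) for clause in must_not):
        return None  # a forbidden condition matched
    # Every matching clause adds one point; "should" clauses are optional.
    return len(must) + sum(1 for clause in should if clause(doc))


# Hypothetical predicates mirroring the scenario.
def has_access(doc):
    return "disability_access" in doc["features"]


def has_wifi(doc):
    return "wifi" in doc["features"]


def multi_level(doc):
    return doc["floor_levels"] > 1


hotels = [
    {"name": "A", "floor_levels": 6, "features": ["disability_access", "wifi"]},
    {"name": "B", "floor_levels": 1, "features": ["disability_access"]},
    {"name": "C", "floor_levels": 9, "features": ["wifi"]},
]
for hotel in hotels:
    result = evaluate_bool(hotel, must=[has_access], should=[has_wifi, multi_level])
    print(hotel["name"], result)
```

Hotel B still matches without any of the wants; it simply ranks below hotel A.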

Prepare a bool query
{
    "bool": {
        "must": [
            {"multi_match": {"query": "Renaissance", "fields": ["name^2", "company"]}},
            {"term": {"features": "disability_access"}}
        ],
        "should": [
            {"term": {"features": "wifi"}},
            {"range": {"floor_levels": {"gt": 1}}}
        ]
    }
}

A field boost of 2 was applied to name to increase relevance.

Checklist
1. ~Name of hotel called Renaissance.
2. Under 150. (outstanding)
3. Within 500m of Bimhuis. (outstanding)
4. Disability access.
5. Wi-Fi.
6. Have rooms above ground level.

What if...
1. There was an appropriate hotel for 151?
2. A brilliant candidate could be found 501m away from Bimhuis?

This is all possible with ElasticSearch,

plus it's easy.

Controlling relevance
{
    "gauss": {
        "location": {
            "origin": "52.3712561,4.9005577",
            "offset": "0.5km",
            "decay": 20
        }
    }
}

Set the origin to Bimhuis, allowing locations
of hotels within 500m. Outside that, a steep
decay of relevance occurs.

Controlling relevance
{
    "gauss": {
        "price": {
            "origin": 0,
            "offset": 100,
            "decay": 20
        }
    }
}

Any price over 100 suffers a similar, severe
relevance penalty.
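
For intuition, the shape of a Gaussian decay can be sketched in Python. ElasticSearch's documented decay functions take an `origin`, a `scale`, an optional `offset` and a `decay` fraction between 0 and 1; the sketch below follows that parameterisation, and treating the slides' value of 20 as the `scale` is an assumption.

```python
import math


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gaussian decay: 1.0 within `offset` of `origin`, `decay` at offset + scale."""
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must be between 0 and 1 (exclusive)")
    # Distance beyond the offset; anything inside the offset is not penalised.
    distance = max(0.0, abs(value - origin) - offset)
    # Choose sigma^2 so that the score equals `decay` when distance == scale.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


# Hypothetical numbers: full score up to 100, halved 20 units beyond that.
for price in (80, 100, 120, 150):
    print(price, round(gauss_decay(price, origin=0, scale=20, offset=100), 4))
```

Unlike a hard `<=` filter, a hotel slightly over the limit keeps most of its score, while one far beyond it is pushed towards zero.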

{
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "must": [
                        {"multi_match": {"query": "Renaissance", "fields": ["name", "company"]}},
                        {"term": {"features": "disability_access"}}
                    ],
                    "should": [
                        {"term": {"features": "wifi"}},
                        {"range": {"floor_levels": {"gt": 1}}}
                    ]
                }
            },
            "functions": [
                {
                    "gauss": {
                        "location": {
                            "origin": "52.3712561,4.9005577",
                            "offset": "0.5km",
                            "decay": 20
                        }
                    }
                },
                {
                    "gauss": {
                        "price": {
                            "origin": 0,
                            "offset": 100,
                            "decay": 20
                        }
                    }
                }
            ]
        }
    }
}

See example in detail over at
http://git.io/V4Hm6w

Integrating with

WordPress
*Joking*

Integrating with

Laravel

Install via Composer
{
    "require": {
        "elasticsearch/elasticsearch": "1.1.*"
    }
}

Creating / updating documents
$client = new Elasticsearch\Client();
$client->index([
    'index' => 'myapp',
    'type'  => 'hotel',
    'id'    => '1',
    'body'  => [
        'name'         => 'Renaissance Hotel Amsterdam',
        'company'      => 'Marriott',
        'location'     => [52.3712561, 4.9005577],
        'floor_levels' => 10,
        'features'     => ['disability_access', 'wifi', 'pool', 'restaurant'],
    ],
]);

You can create or update in the same request.

Partially updating documents
$client = new Elasticsearch\Client();
$client->update([
    'index' => 'myapp',
    'type'  => 'hotel',
    'id'    => '1',
    'body'  => [
        // Partial updates are sent under the "doc" key.
        'doc' => [
            'floor_levels' => 11,
        ],
    ],
]);

Deleting documents
$client = new Elasticsearch\Client();
$client->delete([
    'index' => 'myapp',
    'type'  => 'hotel',
    'id'    => '1',
]);

Searching documents
$client = new Elasticsearch\Client();
$client->search([
    'index' => 'myapp',
    'type'  => 'hotel',
    'body'  => [
        'query' => [
            // A match query targets a field; here, the hotel name.
            'match' => ['name' => 'Renaissance'],
        ],
    ],
]);

Create/update/delete eloquent documents
Hotel::created(function ($hotel) {
    $client = new Elasticsearch\Client();
    $client->index([
        'index' => 'myapp',
        'type'  => 'hotel',
        'id'    => $hotel->id,
        'body'  => $hotel->toArray(),
    ]);
});

Create/update/delete eloquent documents
Hotel::updated(function ($hotel) {
    $client = new Elasticsearch\Client();
    $client->index([
        'index' => 'myapp',
        'type'  => 'hotel',
        'id'    => $hotel->id,
        'body'  => $hotel->toArray(),
    ]);
});

Create/update/delete eloquent documents
Hotel::deleted(function ($hotel) {
    $client = new Elasticsearch\Client();
    $client->delete([
        'index' => 'myapp',
        'type'  => 'hotel',
        'id'    => $hotel->id,
    ]);
});

Searching - two approaches
1. Acceptable to have a search() method directly on Eloquent to use ElasticSearch.
2. Better approach is to decorate a repository; decouple and remove vendor lock-in.

Eloquent repository
class EloquentHotelRepository implements HotelRepository
{
    public function create($name)
    {
        // Create and save the model
    }

    public function search($name, array $filters)
    {
        // Perform search as best you can without ElasticSearch...
    }

    // Truncated for brevity...
}

Decorating the repository
class ElasticSearchHotelRepository implements HotelRepository
{
    protected $eloquent;

    public function __construct(EloquentHotelRepository $eloquent)
    {
        $this->eloquent = $eloquent;
    }

    public function create($name)
    {
        // Delegate to the decorated repository and pass its result through.
        return $this->eloquent->create($name);
    }

    public function search($name, array $filters)
    {
        // Truncated for brevity...
    }
}

Decorating the repository
class ElasticSearchHotelRepository implements HotelRepository
{
    public function search($name, array $filters)
    {
        $client = new Elasticsearch\Client();
        $results = $client->search([
            // ...
        ]);

        // Matching documents are nested under hits.hits in the response.
        return array_map(function ($result) {
            $hotel = new Hotel([
                //
            ]);
            $hotel->exists = true;

            return $hotel;
        }, $results['hits']['hits']);
    }
}

To dig deeper, please visit

http://git.io/CRW6Mg ^1

^1 Repository will be live soon

Why I chose ElasticSearch
1. Built for realtime search applications.
2. Handles concurrent read/write much better than competitors.
3. Download and execute a single binary as a bare minimum.
4. Easy to configure; you don't need to configure anything.
5. JSON over a RESTful API. Need I say more?

Why I chose ElasticSearch over "X"
1. Solr - I dislike how you communicate with it; maybe I'm not enterprise enough for XML. I also don't like its realtime performance. ^1
2. Sphinx - query language was peculiar, SQL-like. Was never built as a realtime search engine.

^1 http://blog.socialcast.com/realtime-search-solr-vs-elasticsearch/

Things I haven't told you about
1. Partial matching - matching partial words using ngrams.
2. How easy and fast autocomplete can be.
3. Fuzzy search - misspelt words.
4. Fine-tuning analysis for specific data sets.
5. Analytics - aggregating statistics to produce things like reports or faceted filtering - part of a query.

One more thing...

ElasticSearch is coming soon to

Laravel Homestead
Run vagrant box update to get the awesomeness.

Further learning
1. http://www.elasticsearch.org
2. http://shop.oreilly.com/product/0636920028505.do
3. http://git.io/CRW6Mg

http://joind.in/11691
https://github.com/bencorlett/laracon-eu-2014
