A little about me
23 years old
Jess' boyfriend
Zoe's dad
Mountain biker
Scuba diver
Scuba biker?
Martial Arts Instructor
I'm also a developer.
I run a development agency in Australia, Webcomm
Finally, I'm a Laracon addict who has travelled 132,000 km
a third of the distance to the moon.
Flashcard
@ben_corlett
http://github.com/bencorlett
http://webcomm.com.au
Search is filtering information and determining relevance
Picture this scenario
"I want to find hotels called Renaissance for under 150, within 500m of Bimhuis so I am close to Laracon EU 2014. The hotel needs to have disability access and ideally provide Wifi and have rooms above ground level."
Requirements
1. Name of hotel called Renaissance.
2. Under 150.
3. Within 500m of Bimhuis.
4. Disability access.
Perceived requirements may be flexible.
Wants
1. Wifi.
2. Rooms above ground level.
Let's be honest, this should be a requirement ;)
If search is filtering information and determining relevance
And humans think with expression and emotion, why do your apps operate like robots?
How can we tailor our apps to think like our users?
Use the right toolset; and
Know your data.
Introducing ElasticSearch
ElasticSearch
1. Powerful search and analytics engine. ^1
2. Object-based document store where every field is indexed and searchable.
3. Distributed; ready to scale. ^2
^1 ElasticSearch may be used for much more than just a search engine.
^2 Webscalez FTW (trollolol).
SQL vs. ElasticSearch
SQL and ElasticSearch; a comparison
SQL is a relational database, ElasticSearch is a search engine.
Where SQL is great at filtering on a binary level, ElasticSearch thrives on both binary data and full text relevance.
SQL indexes are always up to date with your primary store, ElasticSearch needs to be synced.
ElasticSearch is very easy to horizontally scale for performance and redundancy.
SQL and ElasticSearch; a comparison
SQL uses the following structure for its data store:
database > table > row
ElasticSearch has a different, yet comparable structure:
index > type > document
ElasticSearch 101
It's all about documents
1. ElasticSearch is document-oriented.
2. Documents are represented using JSON.
3. Data can be in nested JSON objects or arrays, and is all searchable.
It's all about documents
{
    "name": "Renaissance Hotel Amsterdam",
    "company": "Marriott",
    "location": [52.3712561, 4.9005577],
    "floor_levels": 6,
    "features": [
        "disability_access",
        "wifi",
        "smoking_allowed",
        "pool"
    ]
}
ElasticSearch Requires Java
You have 3 seconds to sulk and complain, then shut up.
Installation is easy
1. Install Java.
2. Download and run ElasticSearch in 3 bash commands. ^1
3. Debian or RPM packages available.
4. Puppet & chef scripts available.
^1 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_installation.html
Scaling isn't scary
1. Near zero configuration to build a cluster of ElasticSearch instances.
2. Easy to scale horizontally.
3. Each ElasticSearch instance is referred to as a node.
4. Any node is capable of handling any request and delegating load.
In very basic terms, horizontal scaling is adding more servers to build a cluster, or cloud...
...where vertical scaling is throwing more resources at an individual server.
ElasticSearch exposes a RESTful API
Win.
Communicating with ElasticSearch
1. HTTP verbs - GET, POST, PUT, DELETE, etc...
2. Send & receive JSON payloads.
:VERB /:index/:type/:document
{
    "key": "value",
    "complex": ["foo", "bar"]
}
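As a rough sketch of the request shape above, the following Python builds (but does not send) an HTTP request for one document. The base URL is an assumption (a local node on the default port 9200), and `build_request` is a hypothetical helper written for this example, not part of any ElasticSearch client.

```python
import json
import urllib.request

# Assumption: a local ElasticSearch node listening on its default HTTP port.
BASE_URL = "http://localhost:9200"


def build_request(verb, index, doc_type, doc_id=None, body=None):
    """Build (but do not send) a :VERB /:index/:type/:document request."""
    path = "/".join(part for part in (index, doc_type, doc_id) if part)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        f"{BASE_URL}/{path}",
        data=data,
        method=verb,
        headers={"Content-Type": "application/json"},
    )


request = build_request(
    "PUT", "myapp", "hotel", "1",
    {"key": "value", "complex": ["foo", "bar"]},
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Nothing here talks to a server; it only shows that the verb, the path segments and the JSON payload are the whole interface.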
Creating a document
POST /myapp/hotel
{
    "name": "Renaissance Hotel Amsterdam",
    "company": "Marriott",
    "location": [52.3712561, 4.9005577],
    "floor_levels": 6,
    "features": [
        "disability_access",
        "wifi",
        "smoking_allowed",
        "pool"
    ]
}
Updating a document
PUT /myapp/hotel/1 # ^1
{
    "name": "Renaissance Hotel Amsterdam",
    "company": "Marriott",
    "location": [52.3712561, 4.9005577],
    "floor_levels": 10,
    "features": [
        "disability_access",
        "wifi",
        "pool",
        "restaurant"
    ]
}
^1 Actually an upsert; create or update depending on existence.
Deleting a document
DELETE /myapp/hotel/1
There's lots more to documents
1. Partial document updating.
2. Document versioning.
3. Conflict resolution for distributed documents.
4. Bulk CRUD methods to avoid HTTP bottleneck.
See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs.html
ElasticSearch makes searching fun
Searching in ElasticSearch
1. Every single field can be searchable.
2. Perform structured queries or filters, against fields. ^1
3. Perform full text queries to find documents.
4. Queries and filters represented using JSON.
5. Organise results by relevance.
^1 SQL-like approach.
Index
1. Index (noun) - refers to the equivalent of a database in an SQL system.
2. Index (verb) - refers to the process of storing a document in an index.
3. Inverted index - list of all terms inside ElasticSearch and the documents in which they appear.
Analysis
Character filters simplify data, such as changing:
"&" to "and".
"é" to "e".
Data is split into terms through a process called tokenisation.
Analysis
Token filters tweak and normalise terms, such as:
Cast to lowercase.
Remove stop-words like "a" or "the".
Add synonyms.
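The three analysis stages can be sketched in a few lines of Python. This is a toy pipeline, not ElasticSearch's analyser: the stop-word list, the synonym mapping and the accented-character replacement are all illustrative assumptions.

```python
import re

# Illustrative data only; real analysers ship far richer lists.
CHAR_REPLACEMENTS = {"&": " and ", "é": "e"}
STOP_WORDS = {"a", "the"}
SYNONYMS = {"leap": "jump"}


def analyse(text):
    # 1. Character filters: simplify the raw text.
    for old, new in CHAR_REPLACEMENTS.items():
        text = text.replace(old, new)
    # 2. Tokenisation: split the text into individual terms.
    tokens = re.findall(r"\w+", text)
    # 3. Token filters: lowercase, drop stop-words, map synonyms.
    terms = []
    for token in tokens:
        term = token.lower()
        if term in STOP_WORDS:
            continue
        terms.append(SYNONYMS.get(term, term))
    return terms


print(analyse("The Café & Bar"))
```

Running the same function over the search text is what lets a query for "cafe" find a document that says "Café".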
Inverted index ^1
1. Analysis process extremely configurable.
2. Multilingual support (33 languages in total), interchangeable per index.
3. Any fields not indexed are not searchable.
4. The same analysis process occurs at search time.
^1 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html
Example inverted index
Consider the two following sentences:
"The quick brown fox jumped over the lazy dog"
"Quick brown foxes leap over lazy dogs in summer"
Example inverted index
Term      Doc_1  Doc_2
------------------------
brown   |   X   |  X
dog     |   X   |  X
fox     |   X   |  X
in      |       |  X
jump    |   X   |  X
lazy    |   X   |  X
over    |   X   |  X
quick   |   X   |  X
summer  |       |  X
the     |   X   |
------------------------
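A minimal sketch of how such an index could be built from the two sentences. The `NORMALISE` mapping is a hand-written stand-in for stemming and synonyms (an assumption for this example); it is what folds "jumped" and "leap" into the single term "jump".

```python
from collections import defaultdict

# Hand-written stand-in for stemming and synonym filters.
NORMALISE = {"jumped": "jump", "leap": "jump", "foxes": "fox", "dogs": "dog"}


def analyse(text):
    # Lowercase, split on whitespace, then normalise each word.
    return [NORMALISE.get(word, word) for word in text.lower().split()]


def build_inverted_index(docs):
    # Map every term to the set of documents it appears in.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyse(text):
            index[term].add(doc_id)
    return index


docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}
index = build_inverted_index(docs)
for term in sorted(index):
    print(f"{term:<8}{sorted(index[term])}")
```

Looking up a search term is then a dictionary access, which is why this structure suits full text search.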
Scoring
1. Term frequency - the more often a term appears in a field, the more relevant.
2. Inverse document frequency - the more often a term appears in the inverted index, the less relevant.
3. Field length norm - the longer the field, the less relevant each term in it is.
Scoring
1. Fields are "boostable" to increase relevance.
2. Functions (inbuilt and scripted) can be used to increase/decrease relevance.
3. Altering analysis to fine-tune scoring.
4. Very important to know your data.
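The three scoring ingredients can be written down as small functions. The formulas below follow the classic Lucene TF/IDF similarity as I understand it; `simple_score` is a deliberately simplified combination for illustration, since the full formula contains further factors.

```python
import math


def term_frequency(occurrences):
    # More occurrences in the field means more relevant, with diminishing returns.
    return math.sqrt(occurrences)


def inverse_document_frequency(total_docs, docs_with_term):
    # Terms that appear in many documents carry less weight.
    return 1 + math.log(total_docs / (docs_with_term + 1))


def field_length_norm(terms_in_field):
    # A match in a short field counts for more than one in a long field.
    return 1 / math.sqrt(terms_in_field)


def simple_score(occurrences, total_docs, docs_with_term, terms_in_field, boost=1.0):
    # Simplified combination for illustration only.
    return (
        term_frequency(occurrences)
        * inverse_document_frequency(total_docs, docs_with_term)
        * field_length_norm(terms_in_field)
        * boost
    )


rare = simple_score(1, total_docs=1000, docs_with_term=3, terms_in_field=4)
common = simple_score(1, total_docs=1000, docs_with_term=900, terms_in_field=4)
print(rare > common)
```

The `boost` argument is where a field boost would multiply in.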
Queries and filters
1. Both are modular; think of building blocks.
2. Both can be nested inside one another.
3. Syntax does not change, regardless of position or nesting.
4. Entire JSON object is the ElasticSearch Query DSL.
Querying in ElasticSearch
1. There are 37 queries (as of August 2014).
2. Queries are intelligent; they score all results according to a relevance algorithm. ^1
3. Any nesting passes relevance back to parents.
^1 http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/relevance-intro.html
Filters
1. You will find 27 filters (as of August 2014). ^1
2. Filters are binary; either a field matches or it doesn't.
3. Filters don't affect relevance scoring.
^1 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html
Querying in ElasticSearch
GET /myapp/hotel/_search
{
    "query": {
        "match": {
            "name": "Renaissance"
        }
    }
}
This is a match query against the name field. It is the go-to full text query.
Querying in ElasticSearch
GET /myapp/hotel/_search
{
    "query": {
        "filtered": {
            "filter": {
                "term": {
                    "features": "disability_access"
                }
            }
        }
    }
}
This is a filtered query, containing a term filter.
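Because queries and filters are plain JSON building blocks, a request body can be composed from small helpers. The functions below are hypothetical helpers written for this sketch (they are not part of any client library); they only produce the dictionaries that would be serialised and sent.

```python
import json


def match(field, text):
    # Full text query block.
    return {"match": {field: text}}


def term(field, value):
    # Exact-value filter block.
    return {"term": {field: value}}


def filtered(query=None, filter_=None):
    # Wrap an optional query and an optional filter in a filtered query.
    body = {}
    if query is not None:
        body["query"] = query
    if filter_ is not None:
        body["filter"] = filter_
    return {"filtered": body}


search_body = {
    "query": filtered(
        query=match("name", "Renaissance"),
        filter_=term("features", "disability_access"),
    )
}
print(json.dumps(search_body, indent=4))
```

The nesting does not change the syntax of the inner blocks, which is the point of the "building blocks" idea.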
Back to our scenario
"I want to find hotels called Renaissance for under 150, within 500m of Bimhuis so I am close to Laracon EU 2014. The hotel needs to have disability access and ideally provide Wifi and have rooms above ground level."
Two approaches
1. Possible with both SQL and ElasticSearch.
2. Much easier with ElasticSearch.
3. ElasticSearch understands the concept of relevance.
4. ElasticSearch can significantly outperform SQL.
SQL approach
First, we'll build the obvious...
select *
from `hotels`
where `name` like "%Renaissance%"
    and `price` <= 150
    and `disability_access` = 1
Performance & relevance
Consider the following:
select *
from `hotels`
where `name` like "%Renaissance%"
1. This query will be slow.
2. This query only accounts for terms which contain the (correctly) spelt Renaissance.
Full text search
1. Add a full text index. ^1
2. Alter the query, and search across both name and company.
select *
from `hotels`
where match (`name`, `company`) against ("Renaissance")
    and `price` <= 150
    and `disability_access` = 1
^1 http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Checklist
1. [x] Name of hotel called Renaissance.
2. [x] Under 150.
3. [ ] Within 500m of Bimhuis.
4. [x] Disability access.
5. [ ] Wifi.
6. [ ] Above ground level.
Adding "wants" in
select *, if (`floor_levels` > 1, 1, 0) as `has_multiple_floor_levels`
from `hotels`
where match (`name`, `company`) against ("Renaissance")
    and `price` <= 150
    and `disability_access` = 1
order by `wifi` desc, `has_multiple_floor_levels` desc -- ^1
^1 We're prioritising Wifi over multiple floor levels...
Checklist
1. [x] ~Name of hotel called Renaissance.
2. [x] Under 150.
3. [ ] Within 500m of Bimhuis.
4. [x] Disability access.
5. [x] Wifi.
6. [x] Rooms above ground level.
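The SQL approach with "wants" can be tried end to end with Python's built-in sqlite3 module. This is a sketch under stated assumptions: it uses SQLite rather than MySQL, so `like` stands in for the full text `match ... against`, a `case` expression replaces `if()`, and the four hotels are made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table hotels (
        name text, price integer, disability_access integer,
        wifi integer, floor_levels integer
    )
""")
# Made-up sample data for illustration.
conn.executemany(
    "insert into hotels values (?, ?, ?, ?, ?)",
    [
        ("Renaissance A", 140, 1, 0, 6),
        ("Renaissance B", 120, 1, 1, 1),
        ("Renaissance C", 130, 1, 1, 4),
        ("Renaissance D", 160, 1, 1, 9),
    ],
)
rows = conn.execute("""
    select name,
           case when floor_levels > 1 then 1 else 0 end as has_multiple_floor_levels
    from hotels
    where name like '%Renaissance%'
      and price <= 150
      and disability_access = 1
    order by wifi desc, has_multiple_floor_levels desc
""").fetchall()
names = [name for name, _ in rows]
print(names)
```

Hotel D has Wifi and nine floors but is dropped entirely for being over 150; the "wants" can only reorder rows that survive the hard filters.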
Spatial awareness...
1. Not so easy.
2. PostGIS for PostgreSQL. ^1
3. Possible with MySQL with MyISAM tables only. ^2
4. Very finite; either a match or not a match.
5. Outside the scope of this talk.
^1 http://postgis.net
^2 Possible with other engines in new versions of MySQL
Checklist
1. [x] ~Name of hotel called Renaissance.
2. [x] Under 150.
3. [x] Within 500m of Bimhuis.
4. [x] Disability access.
5. [x] Wifi.
6. [x] Rooms above ground level.
What if...
1. Somebody searched for "Residence Inn" as the hotel name?
2. There was an appropriate hotel for 151?
3. A brilliant candidate could be found 501m away from Bimhuis?
4. Somebody cared more about having rooms above ground level than being provided Wifi?
ElasticSearch approach
Populating ElasticSearch
POST /myapp/hotel
{
    "name": "Renaissance Hotel Amsterdam",
    "company": "Marriott",
    "location": [52.3712561, 4.9005577],
    "floor_levels": 10,
    "features": [
        "disability_access",
        "wifi",
        "pool",
        "restaurant"
    ]
}
Rinse and repeat for as many hotels as required
The bool query
{
    "bool": {
        "must": {},
        "must_not": {},
        "should": {}
    }
}
We specify conditions which must and must not match. Terms that should match make a document more relevant.
Prepare a bool query
{
    "bool": {
        "must": [
            {"multi_match": {"query": "Renaissance", "fields": ["name^2", "company"]}},
            {"term": {"features": "disability_access"}}
        ],
        "should": [
            {"term": {"features": "wifi"}},
            {"range": {"floor_levels": {"gt": 1}}}
        ]
    }
}
A field boost of 2 was applied to name to increase relevance.
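To make the must/should distinction concrete, here is a toy evaluator in Python. It is only an illustration of the idea: each matching should clause adds one point, whereas ElasticSearch computes real relevance scores for every clause. The hotel data is invented for the example.

```python
def bool_search(docs, must, should):
    """Keep docs passing every must clause; rank by matching should clauses."""
    results = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue  # a failed must clause excludes the document
        score = sum(1 for clause in should if clause(doc))
        results.append((score, doc["name"]))
    # Highest score first; sorted() keeps the input order for ties.
    return sorted(results, key=lambda pair: pair[0], reverse=True)


# Invented sample documents.
hotels = [
    {"name": "Renaissance Lodge", "features": ["disability_access"], "floor_levels": 1},
    {"name": "Renaissance Hotel Amsterdam", "features": ["disability_access", "wifi"], "floor_levels": 6},
    {"name": "Renaissance Suites", "features": ["wifi"], "floor_levels": 3},
]
must = [
    lambda doc: "renaissance" in doc["name"].lower(),
    lambda doc: "disability_access" in doc["features"],
]
should = [
    lambda doc: "wifi" in doc["features"],
    lambda doc: doc["floor_levels"] > 1,
]
ranked = bool_search(hotels, must, should)
print(ranked)
```

The lodge still appears, just lower down, because the "wants" only influence ordering; the suites are excluded because they fail a requirement.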
Checklist
1. [x] ~Name of hotel called Renaissance.
2. [ ] Under 150.
3. [ ] Within 500m of Bimhuis.
4. [x] Disability access.
5. [x] Wifi.
6. [x] Have rooms above ground level.
What if...
1. There was an appropriate hotel for 151?
2. A brilliant candidate could be found 501m away from Bimhuis?
This is all possible with ElasticSearch, plus it's easy.
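Controlling relevance
{
    "gauss": {
        "location": {
            "origin": "52.3712561,4.9005577",
            "offset": "0.5km",
            "decay": 20
        }
    }
}
Set the origin to Bimhuis, allowing locations of hotels within 500m. Outside that, a steep decay of relevance occurs.
Note that ElasticSearch's decay functions also require a scale, and decay is meant to be a value between 0 and 1, so treat the numbers here as illustrative.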
Controlling relevance
{
    "gauss": {
        "price": {
            "origin": 0,
            "offset": 100,
            "decay": 20
        }
    }
}
Any price over 100 suffers a similar, severe relevance penalty.
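The shape of a gauss decay can be sketched in Python. The formula follows ElasticSearch's documented Gaussian decay as I understand it; the `scale=20` and `decay=0.5` values are assumptions chosen for this example, not values taken from the slides.

```python
import math


def gauss_decay(value, origin, offset, scale, decay):
    """Relevance multiplier: 1.0 inside the offset, falling off beyond it."""
    # Distance past the offset; anything within the offset is not penalised.
    distance = max(0.0, abs(value - origin) - offset)
    # Chosen so the multiplier equals `decay` at `scale` beyond the offset.
    sigma_squared = -(scale ** 2) / (2 * math.log(decay))
    return math.exp(-(distance ** 2) / (2 * sigma_squared))


# Assumed parameters: prices up to 100 are free of penalty, and a price
# 20 above that is worth half as much.
for price in (90, 100, 120, 151):
    multiplier = gauss_decay(price, origin=0, offset=100, scale=20, decay=0.5)
    print(price, round(multiplier, 3))
```

With these assumed values a hotel at 151 is heavily penalised but not removed, which is exactly the flexibility the hard `price <= 150` filter lacks.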
{
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "must": [
                        {"multi_match": {"query": "Renaissance", "fields": ["name", "company"]}},
                        {"term": {"features": "disability_access"}}
                    ],
                    "should": [
                        {"term": {"features": "wifi"}},
                        {"range": {"floor_levels": {"gt": 1}}}
                    ]
                }
            },
            "functions": [
                {
                    "gauss": {
                        "location": {
                            "origin": "52.3712561,4.9005577",
                            "offset": "0.5km",
                            "decay": 20
                        }
                    }
                },
                {
                    "gauss": {
                        "price": {
                            "origin": 0,
                            "offset": 100,
                            "decay": 20
                        }
                    }
                }
            ]
        }
    }
}
See example in detail over at http://git.io/V4Hm6w
Integrating with WordPress
*Joking*
Integrating with Laravel
Install via Composer
{
    "require": {
        "elasticsearch/elasticsearch": "1.1.*"
    }
}
Creating / updating documents
$client = new Elasticsearch\Client();
$client->index([
    'index' => 'myapp',
    'type'  => 'hotel',
    'id'    => '1',
    'body'  => [
        'name'         => 'Renaissance Hotel Amsterdam',
        'company'      => 'Marriott',
        'location'     => [52.3712561, 4.9005577],
        'floor_levels' => 10,
        'features'     => ['disability_access', 'wifi', 'pool', 'restaurant'],
    ],
]);
You can create or update in the same request.
Partially updating documents
$client = new Elasticsearch\Client();
$client->update([
    'index' => 'myapp',
    'type'  => 'hotel',
    'id'    => '1',
    'body'  => [
        // Partial updates send the changed fields under a "doc" key
        'doc' => [
            'floor_levels' => 11,
        ],
    ],
]);
Deleting documents
$client = new Elasticsearch\Client();
$client->delete([
    'index' => 'myapp',
    'type'  => 'hotel',
    'id'    => '1',
]);
Searching documents
$client = new Elasticsearch\Client();
$client->search([
    'index' => 'myapp',
    'type'  => 'hotel',
    'body'  => [
        'query' => [
            // A match query needs the field to search in
            'match' => ['name' => 'Renaissance'],
        ],
    ],
]);
Create/update/delete eloquent documents
Hotel::created(function ($hotel) {
    $client = new Elasticsearch\Client();
    $client->index([
        'index' => 'myapp',
        'type'  => 'hotel',
        'id'    => $hotel->id,
        'body'  => $hotel->toArray(),
    ]);
});
Create/update/delete eloquent documents
Hotel::updated(function ($hotel) {
    $client = new Elasticsearch\Client();
    $client->index([
        'index' => 'myapp',
        'type'  => 'hotel',
        'id'    => $hotel->id,
        'body'  => $hotel->toArray(),
    ]);
});
Create/update/delete eloquent documents
Hotel::deleted(function ($hotel) {
    $client = new Elasticsearch\Client();
    $client->delete([
        'index' => 'myapp',
        'type'  => 'hotel',
        'id'    => $hotel->id,
    ]);
});
Searching - two approaches
1. Acceptable to have a search() method directly on Eloquent to use ElasticSearch.
2. Better approach is to decorate a repository; decouple and remove vendor lock-in.
Eloquent repository
class EloquentHotelRepository implements HotelRepository
{
    public function create($name)
    {
        // Create and save the model
    }

    public function search($name, array $filters)
    {
        // Perform search as best you can without ElasticSearch...
    }

    // Truncated for brevity...
}
Decorating the repository
class ElasticSearchHotelRepository implements HotelRepository
{
    protected $eloquent;

    public function __construct(EloquentHotelRepository $eloquent)
    {
        $this->eloquent = $eloquent;
    }

    public function create($name)
    {
        // Delegate persistence to the decorated repository
        return $this->eloquent->create($name);
    }

    public function search($name, array $filters)
    {
        // Truncated for brevity...
    }
}
Decorating the repository
class ElasticSearchHotelRepository implements HotelRepository
{
    public function search($name, array $filters)
    {
        // Assumes an Elasticsearch\Client is available as $this->client
        $results = $this->client->search([
            // ...
        ]);

        // Matching documents are under hits.hits in the response
        return array_map(function ($result) {
            $hotel = new Hotel([
                //
            ]);
            $hotel->exists = true;

            return $hotel;
        }, $results['hits']['hits']);
    }
}
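The decorator idea is independent of Laravel, so here is a small language-neutral sketch of it in Python. Every class name is hypothetical: `InMemoryHotelRepository` stands in for the Eloquent repository and `FakeSearchClient` stands in for a search client, so the example runs without a database or an ElasticSearch node.

```python
class InMemoryHotelRepository:
    """Stands in for the Eloquent-backed repository."""

    def __init__(self):
        self.hotels = []

    def create(self, name):
        hotel = {"id": len(self.hotels) + 1, "name": name}
        self.hotels.append(hotel)
        return hotel

    def search(self, name):
        return [h for h in self.hotels if name.lower() in h["name"].lower()]


class FakeSearchClient:
    """Hypothetical stand-in for a search client; matches on any word."""

    def __init__(self):
        self.documents = {}

    def index(self, doc_id, body):
        self.documents[doc_id] = body

    def search(self, text):
        words = text.lower().split()
        return [
            body for body in self.documents.values()
            if any(word in body["name"].lower() for word in words)
        ]


class SearchIndexedHotelRepository:
    """Decorator: delegates writes to the inner repository, reads to search."""

    def __init__(self, inner, client):
        self.inner = inner
        self.client = client

    def create(self, name):
        hotel = self.inner.create(name)        # primary store first
        self.client.index(hotel["id"], hotel)  # then keep the index in sync
        return hotel

    def search(self, name):
        return self.client.search(name)


repository = SearchIndexedHotelRepository(InMemoryHotelRepository(), FakeSearchClient())
repository.create("Renaissance Hotel Amsterdam")
repository.create("Residence Inn")
print(repository.search("renaissance amsterdam"))
```

Callers depend only on `create` and `search`, so swapping the search backend means swapping the decorator rather than touching calling code.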
To dig deeper, please visit ^1
http://git.io/CRW6Mg
^1 Repository will be live soon
Why I chose ElasticSearch
1. Built for realtime search applications.
2. Handles concurrent read/write much better than competitors.
3. Download and execute a single binary as a bare minimum.
4. Easy to configure; you don't need to configure anything.
5. JSON over a RESTful API. Need I say more?
Why I chose ElasticSearch over "X"
1. Solr - I dislike how you communicate with it; maybe I'm not enterprise enough for XML. I also don't like its realtime performance. ^1
2. Sphinx - query language was peculiar, SQL-like. Was never built as a realtime search engine.
^1 http://blog.socialcast.com/realtime-search-solr-vs-elasticsearch/
Things I haven't told you about
1. Partial matching - matching partial words using ngrams.
2. How easy and fast autocomplete can be.
3. Fuzzy search - misspelt words.
4. Fine-tuning analysis for specific data sets.
5. Analytics - aggregating statistics to produce things like reports or faceted filtering - part of a query.
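As a small taste of partial matching, this Python sketch shows what ngrams and edge ngrams of a term look like. The function names and size limits are choices made for this example; in ElasticSearch the equivalent work is done by configurable analysis components.

```python
def ngrams(term, min_size=2, max_size=3):
    """Every substring of the term between min_size and max_size characters."""
    return [
        term[start:start + size]
        for size in range(min_size, max_size + 1)
        for start in range(len(term) - size + 1)
    ]


def edge_ngrams(term, min_size=1):
    """Prefixes of the term only; a natural fit for autocomplete."""
    return [term[:size] for size in range(min_size, len(term) + 1)]


print(ngrams("hotel"))
print(edge_ngrams("ren"))
```

Indexing these fragments as terms lets a search for "ote" or "re" find the original word through an ordinary inverted index lookup.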
One more thing...
ElasticSearch is coming soon to Laravel Homestead
Run vagrant box update to get the awesomeness.
Further learning
1. http://www.elasticsearch.org
2. http://shop.oreilly.com/product/0636920028505.do
3. http://git.io/CRW6Mg
http://joind.in/11691
https://github.com/bencorlett/laracon-eu-2014