
C++ vector class library

2013 Agner Fog, Gnu public license. Version 1.03. www.agner.org/optimize

Table of Contents
Introduction  2
    How it works  2
    Platforms supported  3
    Instruction sets supported  3
    Compilers supported  3
    Features  3
    Intended use  4
    Availability  4
    License  4
The basics  4
    How to compile  4
    Overview of vector classes  5
    Constructing vectors and loading data into vectors  7
    Reading data from vectors  10
Operators  12
    Arithmetic operators  12
    Logic operators  14
    Integer division  17
Functions  20
    Integer functions  20
    Floating point simple mathematical functions  22
    Floating point categorization functions  28
    Floating point control word manipulation functions  30
    Floating point mathematical library functions  32
    Permute, blend, lookup and change sign functions  40
    Number ↔ string conversion functions  46
    Boolean operations and per-element branches  49
Conversion between vector types  53
Special applications  59
    3-dimensional vectors  59
    Complex number vectors  62
    Quaternions  66
Instruction sets and CPU dispatching  69
Performance considerations  73
    Comparison of alternative methods for writing SIMD code  73
    Choice of compiler and function libraries  75
    Choosing the optimal vector size and precision  75
    Putting data into vectors  76
    When the data size is not a multiple of the vector size  78
    Using multiple accumulators  81
    Using multiple threads  83
Error conditions  83
    Runtime errors  83
    Compile-time errors  84
    Link errors  84
File list  84
Examples  86

Introduction
This vector class library is a tool that makes it simpler to utilize Single-Instruction-Multiple-Data (SIMD) instruction sets such as SSE2 or AVX in C++ programs. This is best explained with an example:

    // Example 1a. Adding list of numbers
    float a[8], b[8], c[8];           // declare arrays
    ...                               // put values into arrays
    for (int i = 0; i < 8; i++) {     // loop for 8 elements
        c[i] = a[i] + b[i]*1.5f;      // operations on each element
    }

The vector class library allows you to write this code as vectors:

    // Example 1b. Adding list of numbers as vectors
    #include "vectorclass.h"          // use vector class library
    float a[8], b[8], c[8];           // declare arrays
    ...                               // put values into arrays
    Vec8f avec, bvec, cvec;           // define vectors
    avec.load(a);                     // load array a into vector
    bvec.load(b);                     // load array b into vector
    cvec = avec + bvec * 1.5f;        // do operations on vectors
    cvec.store(c);                    // save result in array c

Example 1b does the same as example 1a, but more efficiently because it utilizes SIMD instructions that do eight additions and/or eight multiplications in a single instruction. Modern microprocessors have these instructions, which may give you a throughput of eight floating point additions and eight multiplications per clock cycle. A good optimizing compiler may actually convert example 1a automatically to use the SIMD instructions, but in more complicated cases you cannot be sure that the compiler is able to vectorize your code automatically.

How it works
The type Vec8f in example 1b is a class that encapsulates the intrinsic type __m256, which represents a 256-bit vector register holding 8 floating point numbers of 32 bits each. The overloaded operators + and * represent the SIMD instructions for adding and multiplying vectors. These operators are inlined so that no extra code is generated other than the SIMD instructions. All you have to do to get access to these vector operations is to include "vectorclass.h" in your C++ code and specify the desired instruction set (e.g. SSE2 or AVX) in your compiler options.

The code in example 1b can be reduced to just 4 machine instructions if the instruction set AVX or higher is enabled. The SSE2 instruction set will give 8 machine instructions because the maximum vector register size is half as big for instruction sets prior to AVX. The code in example 1a will generate approximately 44 instructions if the compiler does not automatically vectorize the code.

Platforms supported
Windows, Linux and Mac, 32-bit and 64-bit, with Intel, AMD or VIA processor.

Instruction sets supported


x86 and x86-64 with SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, XOP, FMA3, FMA4. The AVX and later instruction sets can only run on newer operating system versions (Windows 7 SP1, Windows Server 2008 R2 SP1, Linux kernel version 2.6.30, Apple OS X Snow Leopard 10.6.8).

Compilers supported
Microsoft, Intel and Gnu C++ compilers. It is recommended to use the newest version of the compiler if the newest instruction sets are used. Older compiler versions can be used up to the SSE4.2 instruction set.

Features
- vectors of 8, 16, 32 and 64-bit integers, signed and unsigned
- vectors of single and double precision floating point numbers
- total vector size 128 or 256 bits
- defines almost all common operators
- boolean operations and branches on vector elements
- defines many arithmetic functions
- permute, blend and table-lookup functions
- fast integer division
- many mathematical functions (requires external library)
- can build code for different instruction sets from the same source code
- CPU dispatching to utilize higher instruction sets when available
- uses metaprogramming (including preprocessing directives and templates) to find the best implementation for the selected instruction set and parameter values of a given operator or function
- includes several extra header files for special purposes and applications

Intended use
This vector class library is intended for experienced C++ programmers. It is useful for improving code performance where speed is critical and where the compiler is unable to vectorize the code automatically in an optimal way. Combining explicit vectorization by the programmer with other kinds of optimization done by the compiler, it has the potential for generating highly efficient code. This can be useful for optimizing library functions and critical innermost loops (hotspots) in CPU-intensive programs. There is no reason to use it in less critical parts of the program.

Availability
The newest version of the vector class library is available from http://www.agner.org/optimize/vectorclass.zip
There is a discussion board for the vector class library at http://www.agner.org/optimize/vectorclass/

License
This vector class library, function library and examples are free to use in open source software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3 or any later version. See the file license.txt. Commercial licenses are available on request.

The basics
How to compile
Copy the header files (*.h) from vectorclass.zip to the same folder as your C++ source files. The header files in the subfolder named "special" should only be included if needed. Include the header file vectorclass.h in your C++ source file:

    #include "vectorclass.h"

Several other header files will be included automatically. Set your compiler options to the desired instruction set. The instruction set must be at least SSE2. See page 70 for a list of compiler options. You may compile multiple versions for different instruction sets as explained in the chapter starting at page 69.

The following simple C++ example may help you get started:

    // Simple vector class example C++ file
    #include <stdio.h>
    #include "vectorclass.h"

    int main() {
        // define and initialize integer vectors a and b
        Vec4i a(10,11,12,13);
        Vec4i b(20,21,22,23);
        // add the two vectors
        Vec4i c = a + b;
        // Print the results
        for (int i = 0; i < 4; i++) {
            printf(" %5i", c[i]);
        }
        printf("\n");
        return 0;
    }

Overview of vector classes


The following vector classes are defined:

Integer vector classes:

vector class   integer size, bits   signed     elements per vector   total bits   recommended instruction set
Vec16c         8                    signed     16                    128          SSE2
Vec16uc        8                    unsigned   16                    128          SSE2
Vec8s          16                   signed     8                     128          SSE2
Vec8us         16                   unsigned   8                     128          SSE2
Vec4i          32                   signed     4                     128          SSE2
Vec4ui         32                   unsigned   4                     128          SSE2
Vec2q          64                   signed     2                     128          SSE2
Vec2uq         64                   unsigned   2                     128          SSE2
Vec32c         8                    signed     32                    256          AVX2
Vec32uc        8                    unsigned   32                    256          AVX2
Vec16s         16                   signed     16                    256          AVX2
Vec16us        16                   unsigned   16                    256          AVX2
Vec8i          32                   signed     8                     256          AVX2
Vec8ui         32                   unsigned   8                     256          AVX2
Vec4q          64                   signed     4                     256          AVX2
Vec4uq         64                   unsigned   4                     256          AVX2

Floating point vector classes:

vector class   precision   elements per vector   total bits   recommended instruction set
Vec4f          single      4                     128          SSE2
Vec2d          double      2                     128          SSE2
Vec8f          single      8                     256          AVX
Vec4d          double      4                     256          AVX

Vector classes that can be used for Boolean operations:

vector class   for use with       elements per vector   total bits   recommended instruction set
Vec128b        Vec128b            128                   128          SSE2
Vec16c         Vec16c, Vec16uc    16                    128          SSE2
Vec8s          Vec8s, Vec8us      8                     128          SSE2
Vec4i          Vec4i, Vec4ui      4                     128          SSE2
Vec2q          Vec2q, Vec2uq      2                     128          SSE2
Vec256b        Vec256b            256                   256          AVX2
Vec32c         Vec32c, Vec32uc    32                    256          AVX2
Vec16s         Vec16s, Vec16us    16                    256          AVX2
Vec8i          Vec8i, Vec8ui      8                     256          AVX2
Vec4q          Vec4q, Vec4uq      4                     256          AVX2
Vec4fb         Vec4f              4                     128          SSE2
Vec2db         Vec2d              2                     128          SSE2
Vec8fb         Vec8f              8                     256          AVX
Vec4db         Vec4d              4                     256          AVX

Constructing vectors and loading data into vectors


There are many ways to create vectors and put data into vectors. These methods are listed here.

method:       default constructor
defined for:  all vector classes
description:  the vector is created but not initialized. The value is unpredictable
efficiency:   good
Example:
    Vec4i a;       // creates a vector of 4 signed integers

method:       constructor with one parameter
defined for:  all vector classes
description:  all elements get the same value
efficiency:   good for constant. Medium for variable as parameter
Examples:
    Vec4i a(7);    // all four elements = 7
    Vec4i b = 8;   // all four elements = 8

method:       constructor with one parameter for each vector element
defined for:  all vector classes, except Vec128b, Vec256b
description:  each element gets a specified value. The parameter for element number 0 comes first
efficiency:   good for constant. Medium for variables as parameters
Examples:
    Vec4i a(10,11,12,13);            // a = (10,11,12,13)
    Vec4i b = Vec4i(20,21,22,23);    // b = (20,21,22,23)

method:       constructor with one parameter for each half vector
defined for:  all 256-bit vector classes
description:  concatenates two 128-bit vectors into one 256-bit vector
efficiency:   good
Example:
    Vec4i a(10,11,12,13);
    Vec4i b(20,21,22,23);
    Vec8i c(a, b);                   // c = (10,11,12,13,20,21,22,23)

method:       insert(index, value)
defined for:  all vector classes, except Vec128b, Vec256b
description:  changes the value of element number (index) to (value). The index starts at 0.
efficiency:   medium to poor, depending on instruction set
Example:
    Vec4i a(0);
    a.insert(2, 9);                  // a = (0,0,9,0)

method:       load(const pointer)
defined for:  all vector classes, except Vec4fb, Vec8fb, Vec2db, Vec4db
description:  loads all elements from an array
efficiency:   good, except immediately after inserting elements separately into the array.
This is the preferred way of putting values into a vector, except immediately after values have been put into the array one by one (see page 76).
Example:
    int list[8] = {10,11,12,13,14,15,16,17};
    Vec4i a, b;
    a.load(list);                    // a = (10,11,12,13)
    b.load(list+4);                  // b = (14,15,16,17)

method:       load_a(const pointer)
defined for:  all vector classes, except Vec4fb, Vec8fb, Vec2db, Vec4db
description:  loads all elements from an aligned array
efficiency:   good, except immediately after inserting elements separately into the array.
This method does the same as the load method (see above), but requires that the pointer points to an address divisible by 16 for 128-bit vectors, or divisible by 32 for 256-bit vectors. If you are not certain that the array is properly aligned then use load instead of load_a. load_a is more efficient than load on Intel Atom processor.

method:       load_partial(int n, const pointer)
defined for:  all integer and floating point vector classes
description:  loads n elements from an array into a vector. Sets remaining elements to 0. 0 ≤ n ≤ (vector size).
efficiency:   medium
Example:
    float list[3] = {1.0f, 1.1f, 1.2f};
    Vec4f a;
    a.load_partial(2, list);         // a = (1.0, 1.1, 0.0, 0.0)

method:       cutoff(int n)
defined for:  all integer and floating point vector classes
description:  leaves the first n elements unchanged and sets the remaining elements to zero. 0 ≤ n ≤ (vector size).
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    a.cutoff(2);                     // a = (10, 11, 0, 0)

method:       set_bit(index, value)
defined for:  all integer vector classes
description:  changes a single bit to 0 or 1. index starts at bit 0 of element 0 and ends with the last bit of the last element. value = 0 or 1.
efficiency:   medium
Example:
    Vec4i a(10);
    a.set_bit(34, 1);                // a = (10,14,10,10)
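The bit numbering used by set_bit can be easier to follow in scalar form. The following is only a sketch, assuming a plain array of four 32-bit integers as a stand-in for a Vec4i. The names kElements, kBitsPerElement, set_bit_model and get_bit_model are hypothetical and are not part of the library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of a vector of four 32-bit integers.
// These names are illustrative only; they are not library identifiers.
const int kElements = 4;
const int kBitsPerElement = 32;

// Set or clear bit number `index`, counted from bit 0 of element 0
// up to the last bit of the last element.
void set_bit_model(uint32_t (&v)[kElements], int index, int value) {
    assert(index >= 0 && index < kElements * kBitsPerElement);
    int element = index / kBitsPerElement;   // element that holds the bit
    int bit     = index % kBitsPerElement;   // position inside that element
    uint32_t mask = uint32_t(1) << bit;
    if (value) v[element] |= mask;
    else       v[element] &= ~mask;
}

// Read bit number `index` using the same numbering.
int get_bit_model(const uint32_t (&v)[kElements], int index) {
    assert(index >= 0 && index < kElements * kBitsPerElement);
    return int((v[index / kBitsPerElement] >> (index % kBitsPerElement)) & 1u);
}
```

With all four elements equal to 10, index 34 selects bit 2 of element 1, so setting it changes that element from 10 to 14, in agreement with the example above.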

Reading data from vectors


There are many ways to extract elements or parts of a vector. These methods are listed here.

method:       store(pointer)
defined for:  all vector classes, except Vec4fb, Vec8fb, Vec2db, Vec4db
description:  stores all elements into an array
efficiency:   good
This is the preferred way of getting the individual elements of a vector.
Example:
    Vec4i a(10,11,12,13);
    Vec4i b(20,21,22,23);
    int list[8];
    a.store(list);
    b.store(list+4);                 // list contains (10,11,12,13,20,21,22,23)

method:       store_a(pointer)
defined for:  all vector classes, except Vec4fb, Vec8fb, Vec2db, Vec4db
description:  stores all elements into an aligned array
efficiency:   good
This method does the same as the store method (see above), but requires that the pointer points to an address divisible by 16 for 128-bit vectors, or divisible by 32 for 256-bit vectors. If you are not certain that the array is properly aligned then use store instead of store_a. store_a is more efficient than store on Intel Atom processor.

method:       store_partial(int n, pointer)
defined for:  all integer and floating point vector classes
description:  stores the first n elements into an array. 0 ≤ n ≤ (vector size).
efficiency:   medium
Example:
    float list[3] = {9.0f, 9.0f, 9.0f};
    Vec4f a(1.0f, 1.1f, 1.2f, 1.3f);
    a.store_partial(2, list);        // list contains (1.0, 1.1, 9.0)

method:       extract(index)
defined for:  all vector classes, except Vec128b, Vec256b
description:  gets a single element from a vector
efficiency:   medium
Example:
    Vec4i a(10,11,12,13);
    int b = a.extract(2);            // b = 12

method:       operator []
defined for:  all vector classes, except Vec128b, Vec256b
description:  gets a single element from a vector
efficiency:   medium
The operator [] does exactly the same as the extract method. Note that you can read a vector element with the [] operator, but not write an element.
Example:
    Vec4i a(10,11,12,13);
    int b = a[2];                    // b = 12
    a[3] = 5;                        // not allowed!

method:       get_bit(index)
defined for:  all integer vector classes
description:  reads a single bit. index starts at bit 0 of element 0 and ends with the last bit of the last element.
efficiency:   medium
Example:
    Vec4i a(10);
    int b = a.get_bit(34);           // b = 0

method:       get_low()
defined for:  all 256-bit vector classes
description:  gets the lower half of a 256-bit vector as a 128-bit vector
efficiency:   good
Example:
    Vec8i a(10,11,12,13,14,15,16,17);
    Vec4i b = a.get_low();           // b = (10,11,12,13)

method:       get_high()
defined for:  all 256-bit vector classes
description:  gets the upper half of a 256-bit vector as a 128-bit vector
efficiency:   good
Example:
    Vec8i a(10,11,12,13,14,15,16,17);
    Vec4i b = a.get_high();          // b = (14,15,16,17)

Operators
Arithmetic operators

operator:     +, ++, +=
defined for:  all vector classes except Booleans
description:  addition
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b(20, 21, 22, 23);
    Vec4i c = a + b;                 // c = (30, 32, 34, 36)

operator:     -, --, -=, unary -
defined for:  all vector classes except Booleans
description:  subtraction
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b(20, 21, 22, 23);
    Vec4i c = a - b;                 // c = (-10, -10, -10, -10)

operator:     *, *=
defined for:  all vector classes except Booleans
description:  multiplication
efficiency:   good for vectors of float, double, and 16-bit integers, poor for vectors of 8-bit integers and 64-bit integers, good for vectors of 32-bit integers if SSE4.1 or higher instruction set
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b(20, 21, 22, 23);
    Vec4i c = a * b;                 // c = (200, 231, 264, 299)

operator:     /, /= (floating point)
defined for:  Vec4f, Vec8f, Vec2d, Vec4d
description:  division
efficiency:   poor
Example:
    Vec4f a(1.0f, 1.1f, 1.2f, 1.3f);
    Vec4f b(2.0f, 2.1f, 2.2f, 2.3f);
    Vec4f c = a / b;                 // c = (0.500f, 0.524f, 0.545f, 0.565f)

operator:     /, /= (integer vector divided by scalar)
defined for:  all integer vector classes, except 64-bit integers
description:  division by scalar. All elements are divided by the same divisor. See page 17 for explanation
efficiency:   poor
Example:
    Vec4i a(10, 11, 12, 13);
    int b = 3;
    Vec4i c = a / b;                 // c = (3, 3, 4, 4)

operator:     /, /= (integer vector divided by constant)
defined for:  all integer vector classes, except 64-bit integers
description:  division by compile-time constant. All elements are divided by the same divisor. See page 17 for explanation
efficiency:   poor, but better than division by scalar variable. Good if divisor is a power of 2
Example:
    // signed
    Vec4i a(10, 11, 12, 13);
    Vec4i b = a / const_int(3);      // b = (3, 3, 4, 4)
    // unsigned
    Vec4ui c(10, 11, 12, 13);
    Vec4ui d = c / const_uint(3);    // d = (3, 3, 4, 4)

Logic operators
operator:     <<, <<=
defined for:  all integer vector classes
description:  logical shift left. All vector elements are shifted by the same amount. Shifting left by n is a fast way of multiplying by 2^n
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b = a << 2;                // b = (40, 44, 48, 52)

operator:     >>, >>=
defined for:  all integer vector classes
description:  shift right. All vector elements are shifted by the same amount. Unsigned integers use logical shift, signed integers use arithmetic shift (i.e. sign bit is copied)
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b = a >> 2;                // b = (2, 2, 3, 3)

operator:     ==
defined for:  all integer and floating point vector classes
description:  test if equal. Result is a Boolean vector (true is represented by an element where all bits are 1)
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b(14, 13, 12, 11);
    Vec4i c = a == b;                // c = (0, 0, -1, 0)
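Representing true as an element with all bits set means that a comparison result can be combined with the bitwise operators to choose between two values without a branch. The following scalar sketch illustrates this for one 32-bit element. The function names equal_mask and blend_by_mask are hypothetical and only model the idea; they are not library functions.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one element of an integer comparison result:
// all bits set (-1) for true, all bits clear (0) for false.
int32_t equal_mask(int32_t a, int32_t b) {
    return (a == b) ? int32_t(-1) : int32_t(0);
}

// Return `a` where the mask is true and `b` where it is false,
// using only bitwise operations (no branch on the mask).
int32_t blend_by_mask(int32_t mask, int32_t a, int32_t b) {
    assert(mask == 0 || mask == -1);   // this model only handles full masks
    return (mask & a) | (~mask & b);
}
```

For example, blend_by_mask(equal_mask(12, 12), 100, 200) gives 100, while a false mask gives 200.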

operator:     !=
defined for:  all integer and floating point vector classes
description:  test if not equal. Result is a Boolean vector (true is represented by an element where all bits are 1)
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b(14, 13, 12, 11);
    Vec4i c = a != b;                // c = (-1, -1, 0, -1)

operator:     >
defined for:  all integer and floating point vector classes
description:  test if bigger. Result is a Boolean vector (true is represented by an element where all bits are 1)
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b(14, 13, 12, 11);
    Vec4i c = a > b;                 // c = (0, 0, 0, -1)

operator:     >=
defined for:  all integer and floating point vector classes
description:  test if bigger or equal. Result is a Boolean vector (true is represented by an element where all bits are 1)
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b(14, 13, 12, 11);
    Vec4i c = a >= b;                // c = (0, 0, -1, -1)

operator:     <
defined for:  all integer and floating point vector classes
description:  test if smaller. Result is a Boolean vector (true is represented by an element where all bits are 1)
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b(14, 13, 12, 11);
    Vec4i c = a < b;                 // c = (-1, -1, 0, 0)

operator:     <=
defined for:  all integer and floating point vector classes
description:  test if smaller or equal. Result is a Boolean vector (true is represented by an element where all bits are 1)
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b(14, 13, 12, 11);
    Vec4i c = a <= b;                // c = (-1, -1, -1, 0)

operator:     &, &=
defined for:  all vector classes
description:  bitwise and
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b(20, 21, 22, 23);
    Vec4i c = a & b;                 // c = (0, 1, 4, 5)

operator:     |, |=
defined for:  all vector classes
description:  bitwise or
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b(20, 21, 22, 23);
    Vec4i c = a | b;                 // c = (30, 31, 30, 31)

operator:     ^, ^=
defined for:  all vector classes
description:  bitwise exclusive or
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b(20, 21, 22, 23);
    Vec4i c = a ^ b;                 // c = (30, 30, 26, 26)

operator:     ~
defined for:  all integer and Boolean vector classes
description:  bitwise not
efficiency:   good
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b = ~a;                    // b = (-11, -12, -13, -14)

operator:     !
defined for:  all integer and floating point vector classes
description:  logical not
efficiency:   good
Example:
    Vec4i a(-1, 0, 1, 2);
    Vec4i b = !a;                    // b = (0, -1, 0, 0)

Integer division
There are no instructions in the x86 instruction set and its extensions that are useful for integer vector division, and such instructions would be quite slow if they existed. Therefore, the vector class library is using an algorithm for fast integer division. The basic principle of this algorithm can be expressed in this formula:

    a / b ≈ a * (2^n / b) >> n

This calculation goes through the following steps:
1. find a suitable value for n
2. calculate 2^n / b
3. calculate necessary corrections for rounding errors
4. do the multiplication and shift-right and apply corrections for rounding errors

This formula is advantageous if multiple numbers are divided by the same divisor b. Steps 1, 2 and 3 need only be done once while step 4 is repeated for each value of the dividend a. The mathematical details are described in the file vectori128.h. (See also T. Granlund and P. L. Montgomery: Division by Invariant Integers Using Multiplication, Proceedings of the SIGPLAN 1994 Conference on Programming Language Design and Implementation.)

The implementation in the vector class library uses various variants of this method with appropriate corrections for rounding errors to get the exact result truncated towards zero.

The way to use this in your code depends on whether the divisor b is a variable or constant, and whether the same divisor is applied to multiple vectors. This is illustrated in the following examples:

    // Division example A:
    // A variable divisor is applied to one vector
    Vec4i a(10, 11, 12, 13);         // dividend is an integer vector
    int b = 3;                       // divisor is an integer variable
    Vec4i c = a / b;                 // result c = (3, 3, 4, 4)

    // Division example B:
    // The same divisor is applied to multiple vectors
    int b = 3;                       // divisor
    Divisor_i divb(b);               // this object contains the results
                                     // of calculation steps 1, 2, and 3
    for (...) {                      // loop through multiple vectors
        Vec4i a = ...                // get dividend
        a = a / divb;                // do step 4 of the division
        ...                          // store results
    }

    // Division example C:
    // The divisor is a constant, known at compile time
    Vec4i a(10, 11, 12, 13);         // dividend is integer vector
    Vec4i c = a / const_int(3);      // result c = (3, 3, 4, 4)

Explanation: The class Divisor_i in example B takes care of the calculation steps 1, 2 and 3 in the algorithm described above. The overloaded / operator takes a vector on the left hand side and an object of class Divisor_i on the right hand side. This object is created before the loop with the divisor as parameter to the constructor. We are saving time by doing this time-consuming calculation only once while step 4 in the calculation is done multiple times inside the loop by a = a / divb;.

In example A, we are also creating an object of class Divisor_i, but this is done implicitly. The compiler sees an integer on the right hand side of the / operator where it needs an object of class Divisor_i, and therefore converts the integer b to such an object by calling the constructor Divisor_i(int).

The following divisor classes are available:

Dividend vector type   Divisor class required
Vec16c, Vec32c         Divisor_s
Vec16uc, Vec32uc       Divisor_us
Vec8s, Vec16s          Divisor_s
Vec8us, Vec16us        Divisor_us
Vec4i, Vec8i           Divisor_i
Vec4ui, Vec8ui         Divisor_ui

If the divisor is a constant and the value is known at compile time, then we can use the method in example C. The implementation here uses macros and templates to do the calculation steps 1, 2 and 3 at compile time rather than at execution time. This makes the code even faster. The expression to put on the right-hand side of the / operator looks as follows:

Dividend vector type   Divisor expression
Vec16c, Vec32c         const_int
Vec16uc, Vec32uc       const_uint
Vec8s, Vec16s          const_int
Vec8us, Vec16us        const_uint
Vec4i, Vec8i           const_int
Vec4ui, Vec8ui         const_uint

The compiler will generate an error message if the parameter to const_int or const_uint is not a valid compile-time constant. (A valid compile time constant can contain integer literals and operators, as well as macros that are expanded to compile time constants, but not function calls.)

A further advantage of the method in example C is that the code is able to use different methods for different values of the divisor. The division is particularly fast if the divisor is a power of 2. Make sure to use const_int or const_uint on the right hand side of the / operator if you are dividing by 2, 4, 8, 16, etc.

Division is faster for vectors of 16-bit integers than for vectors of 8-bit or 32-bit integers. There is no support for division of vectors of 64-bit integers. Unsigned division is faster than signed division.
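The four calculation steps can be tried out in scalar code. The following is a minimal sketch for 16-bit unsigned operands only, not the implementation in vectori128.h. It assumes n = 16 + ceil(log2(b)) and rounds the multiplier up, which is one simple way to do the correction in step 3 when the product is computed with 64 bits. The names ScalarDivisor16, divide and check_all_dividends are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of a precomputed divisor for 16-bit unsigned
// dividends. It is not the library's Divisor_us class.
struct ScalarDivisor16 {
    uint32_t multiplier;   // steps 2 and 3: 2^n / b, rounded up
    int      shift;        // step 1: n

    explicit ScalarDivisor16(uint16_t b) {
        assert(b != 0);                              // division by zero is not modelled
        int log2b = 0;                               // becomes ceil(log2(b))
        while ((uint32_t(1) << log2b) < b) log2b++;
        shift = 16 + log2b;                          // step 1: choose n
        // step 2: floor(2^n / b); step 3: add 1 so that truncation never
        // makes the final quotient too small
        multiplier = uint32_t((uint64_t(1) << shift) / b) + 1;
    }
};

// Step 4: one multiplication and one shift per dividend.
inline uint16_t divide(uint16_t a, const ScalarDivisor16& d) {
    return uint16_t((uint64_t(a) * d.multiplier) >> d.shift);
}

// Compare against the built-in division for every 16-bit dividend.
bool check_all_dividends(uint16_t b) {
    ScalarDivisor16 d(b);
    for (uint32_t a = 0; a <= 0xFFFFu; a++) {
        if (divide(uint16_t(a), d) != uint16_t(a / b)) return false;
    }
    return true;
}
```

As in example B, the ScalarDivisor16 object would be constructed once outside a loop and divide called for each dividend. Signed division and other element sizes need additional corrections that this sketch does not cover.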

Functions
Integer functions
function:     horizontal_add
defined for:  all integer vector classes
description:  calculates the sum of all vector elements
efficiency:   medium
Example:
    Vec4i a(10, 11, 12, 13);
    int b = horizontal_add(a);               // b = 46

function:     horizontal_add_x
defined for:  all 8-bit, 16-bit and 32-bit integer vector classes
description:  calculates the sum of all vector elements. The sum is calculated with a higher number of bits to avoid overflow
efficiency:   medium (slower than horizontal_add)
Example:
    Vec4i a(10, 11, 12, 13);
    int64_t b = horizontal_add_x(a);         // b = 46
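The difference between summing in the element width and summing in a wider type only shows when the sum does not fit in one element. The following scalar sketch illustrates this for 32-bit elements, assuming ordinary wrap-around in the narrow sum. The function names sum_same_width and sum_extended are hypothetical and only model the behavior described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sum in the element width. Unsigned arithmetic is used so that
// wrap-around on overflow is well defined. The final conversion back to
// int32_t assumes the usual two's complement behavior.
int32_t sum_same_width(const int32_t* v, size_t n) {
    assert(v != nullptr || n == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) sum += uint32_t(v[i]);
    return int32_t(sum);
}

// Sum in a 64-bit accumulator, so a handful of 32-bit elements
// cannot overflow.
int64_t sum_extended(const int32_t* v, size_t n) {
    assert(v != nullptr || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    return sum;
}
```

For the elements (10, 11, 12, 13) both functions give 46. They differ only for inputs such as two elements near the maximum 32-bit value.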

function:     add_saturated
defined for:  all 8-bit, 16-bit and 32-bit integer vector classes
description:  same as operator +. Overflow is handled by saturation rather than wrap-around
efficiency:   fast for 8-bit and 16-bit integers. Medium for 32-bit integers
Example:
    Vec4i a(0x10000000, 0x20000000, 0x30000000, 0x40000000);
    Vec4i b(0x30000000, 0x40000000, 0x50000000, 0x60000000);
    Vec4i c = add_saturated(a, b);
    // c = (0x40000000, 0x60000000, 0x7FFFFFFF, 0x7FFFFFFF)
    Vec4i d = a + b;
    // d = (0x40000000, 0x60000000, -0x80000000, -0x60000000)
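Saturation means that a result outside the representable range is replaced by the nearest representable value. The following scalar sketch shows this for one signed 32-bit element by computing the sum in 64 bits and clamping it. The name add_saturated_model is hypothetical, and the sketch says nothing about how the library implements the vector version.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Scalar model of saturated addition for one signed 32-bit element.
int32_t add_saturated_model(int32_t a, int32_t b) {
    const int64_t lowest  = std::numeric_limits<int32_t>::min();
    const int64_t highest = std::numeric_limits<int32_t>::max();
    int64_t sum = int64_t(a) + int64_t(b);   // cannot overflow in 64 bits
    if (sum < lowest)  sum = lowest;         // clamp negative overflow
    if (sum > highest) sum = highest;        // clamp positive overflow
    assert(sum >= lowest && sum <= highest);
    return int32_t(sum);
}
```

With the values from the example above, 0x30000000 + 0x50000000 is clamped to 0x7FFFFFFF instead of wrapping around to a negative number.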

function:     sub_saturated
defined for:  all 8-bit, 16-bit and 32-bit integer vector classes
description:  same as operator -. Overflow is handled by saturation rather than wrap-around
efficiency:   fast for 8-bit and 16-bit integers. Medium for 32-bit integers
Example:
    Vec4i a(-0x10000000, -0x20000000, -0x30000000, -0x40000000);
    Vec4i b( 0x30000000,  0x40000000,  0x50000000,  0x60000000);
    Vec4i c = sub_saturated(a, b);
    // c = (-0x40000000, -0x60000000, -0x80000000, -0x80000000)
    Vec4i d = a - b;
    // d = (-0x40000000, -0x60000000, -0x80000000,  0x60000000)

function:     max
defined for:  all integer vector classes
description:  returns the biggest of two values
efficiency:   fast for Vec16uc, Vec32uc, Vec8s, Vec16s, medium for other integer vector classes
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b(14, 13, 12, 11);
    Vec4i c = max(a, b);             // c = (14, 13, 12, 13)

function:     min
defined for:  all integer vector classes
description:  returns the smallest of two values
efficiency:   fast for Vec16uc, Vec32uc, Vec8s, Vec16s, medium for other integer vector classes
Example:
    Vec4i a(10, 11, 12, 13);
    Vec4i b(14, 13, 12, 11);
    Vec4i c = min(a, b);             // c = (10, 11, 12, 11)

function:     abs
defined for:  all signed integer vector classes
description:  calculates the absolute value
efficiency:   medium
Example:
    Vec4i a(-1, 0, 1, 2);
    Vec4i b = abs(a);                // b = (1, 0, 1, 2)

function:     abs_saturated
defined for:  all signed integer vector classes
description:  calculates the absolute value. Overflow saturates to make sure the result is never negative when the input is INT_MIN
efficiency:   medium (slower than abs)
Example:
    Vec4i a(-0x80000000, -1, 0, 1);
    Vec4i b = abs_saturated(a);      // b = (0x7FFFFFFF, 1, 0, 1)
    Vec4i c = abs(a);                // c = (-0x80000000, 1, 0, 1)

function:     vector = rotate_left(vector, int)
defined for:  all integer vector classes
description:  rotates the bits of each element. Use a negative count to rotate right
efficiency:   medium
Example:
    Vec4i a(0x12345678, 0x0000FFFF, 0xA000B000, 0x00000001);
    Vec4i b = rotate_left(a, 8);
    // b = (0x34567812, 0x00FFFF00, 0x00B000A0, 0x00000100)
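A rotation differs from a shift in that the bits pushed out at one end re-enter at the other end. The following scalar sketch models this for one 32-bit element, including the convention that a negative count rotates right. The name rotate_left_model is hypothetical. The sketch reduces the count modulo 32, which is an assumption about counts outside the range -31 to 31 that the text above does not specify.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of a bit rotation for one 32-bit element.
uint32_t rotate_left_model(uint32_t x, int count) {
    // Reduce the count modulo 32. For a negative count this yields the
    // equivalent left rotation, e.g. -8 becomes 24 (same as right by 8).
    unsigned n = unsigned(count) & 31u;
    assert(n < 32u);
    if (n == 0) return x;                    // avoid an undefined shift by 32
    return (x << n) | (x >> (32u - n));      // bits leaving the top re-enter at the bottom
}
```

Rotating 0x12345678 left by 8 gives 0x34567812, as in the example above, and rotating it by -8 gives 0x78123456.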

Floating point simple mathematical functions


function      horizontal_add
defined for   all floating point vector classes
description   calculates the sum of all vector elements
efficiency    medium
Example:
Vec4f a(1.0, 1.1, 1.2, 1.3);
float b = horizontal_add(a); // b = 4.6

function      max
defined for   all floating point vector classes
description   returns the biggest of two values
efficiency    good
Example:
Vec4f a(1.0, 1.1, 1.2, 1.3);
Vec4f b(1.4, 1.3, 1.2, 1.1);
Vec4f c = max(a, b); // c = (1.4, 1.3, 1.2, 1.3)

function      min
defined for   all floating point vector classes
description   returns the smallest of two values
efficiency    good
Example:
Vec4f a(1.0, 1.1, 1.2, 1.3);
Vec4f b(1.4, 1.3, 1.2, 1.1);
Vec4f c = min(a, b); // c = (1.0, 1.1, 1.2, 1.1)

function      abs
defined for   all floating point vector classes
description   gets the absolute value
efficiency    good
Example:
Vec4f a(-1.0, 0.0, 1.0, 2.0);
Vec4f b = abs(a); // b = (1.0, 0.0, 1.0, 2.0)

function      sqrt
defined for   all floating point vector classes
description   calculates the square root
efficiency    poor
Example:
Vec4f a(0.0, 1.0, 2.0, 3.0);
Vec4f b = sqrt(a); // b = (0.000, 1.000, 1.414, 1.732)

function      square
defined for   all floating point vector classes
description   calculates the square
efficiency    good
Example:
Vec4f a(0.0, 1.0, 2.0, 3.0);
Vec4f b = square(a); // b = (0.0, 1.0, 4.0, 9.0)

function      pow(vector, int)
defined for   all floating point vector classes
description   raises all vector elements to the same integer power
efficiency    medium
Example:
Vec4f a(0.0, 1.0, 2.0, 3.0);
int b = 3;
Vec4f c = pow(a, b); // c = (0.0, 1.0, 8.0, 27.0)

function      pow(vector, const_int)
defined for   all floating point vector classes
description   raises all vector elements to the same integer power, where the integer is a compile-time constant
efficiency    medium, often better than pow(vector,int)
Example:
Vec4f a(0.0, 1.0, 2.0, 3.0);
Vec4f b = pow(a, const_int(3)); // b = (0.0, 1.0, 8.0, 27.0)

function      round
defined for   all floating point vector classes
description   round to nearest integer (even value if two values are equally near). The value is returned as a floating point vector
efficiency    good if SSE4.1 instruction set
Example:
Vec4f a(1.0, 1.4, 1.5, 1.6);
Vec4f b = round(a); // b = (1.0, 1.0, 2.0, 2.0)
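The rule "even value if two values are equally near" only matters for inputs that lie exactly halfway between two integers. The sketch below models it for one element with the standard function std::nearbyint, assuming the rounding mode has not been changed from the default (round to nearest). The helper name round_scalar is hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cmath>

// Scalar model of round() for one element.
// Hypothetical helper for illustration, not library code.
// std::nearbyint follows the current rounding mode, which is
// round-to-nearest-even unless the program has changed it.
inline float round_scalar(float x) {
    return std::nearbyint(x);
}
```

With this model, 1.5 and 2.5 both round to 2.0, while 3.5 rounds to 4.0.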

function      round_to_int
defined for   all floating point vector classes
description   round to nearest integer (even value if two values are equally near). The value is returned as an integer vector
efficiency    good
Example:
// single precision:
Vec4f a(1.0, 1.4, 1.5, 1.6);
Vec4i b = round_to_int(a);    // b = (1, 1, 2, 2)
// double precision:
Vec2d a(1.0, 1.4);
Vec2d b(1.5, 1.6);
Vec4i c = round_to_int(a, b); // c = (1, 1, 2, 2)

function      truncate
defined for   all floating point vector classes
description   truncates number towards zero. The value is returned as a floating point vector
efficiency    good if SSE4.1 instruction set
Example:
Vec4f a(1.0, 1.5, 1.9, 2.0);
Vec4f b = truncate(a); // b = (1.0, 1.0, 1.0, 2.0)

function      truncate_to_int
defined for   all floating point vector classes
description   truncates number towards zero. The value is returned as an integer vector
efficiency    good if SSE4.1 instruction set
Example:
// single precision:
Vec4f a(1.0, 1.5, 1.9, 2.0);
Vec4i b = truncate_to_int(a);    // b = (1, 1, 1, 2)
// double precision:
Vec2d a(1.0, 1.5);
Vec2d b(1.9, 2.0);
Vec4i c = truncate_to_int(a, b); // c = (1, 1, 1, 2)

function      truncate_to_int64
defined for   Vec2d, Vec4d
description   truncates number towards zero. The value is returned as an integer vector
efficiency    poor
Example:
Vec2d a(1.5, 1.9);
Vec2q b = truncate_to_int64(a); // b = (1, 1)

function      floor
defined for   all floating point vector classes
description   rounds number towards -∞. The value is returned as a floating point vector
efficiency    good if SSE4.1 instruction set
Example:
Vec4f a(-0.5, 1.5, 1.9, 2.0);
Vec4f b = floor(a); // b = (-1.0, 1.0, 1.0, 2.0)

function      ceil
defined for   all floating point vector classes
description   rounds number towards +∞. The value is returned as a floating point vector
efficiency    good if SSE4.1 instruction set
Example:
Vec4f a(-0.5, 1.1, 1.9, 2.0);
Vec4f b = ceil(a); // b = (0.0, 2.0, 2.0, 2.0)

function      approx_recipr
defined for   Vec4f, Vec8f
description   fast approximate calculation of reciprocal. Relative accuracy better than 2^-11
efficiency    good
Example:
Vec4f a(0.5, 1.0, 2.0, 3.0);
Vec4f b = approx_recipr(a); // b = (2.0, 1.0, 0.5, 0.333)

function      approx_rsqrt
defined for   Vec4f, Vec8f
description   fast approximate calculation of value to the power of -0.5. Relative accuracy better than 2^-11
efficiency    good
Example:
Vec4f a(1.0, 2.0, 3.0, 4.0);
Vec4f b = approx_rsqrt(a); // b = (1.0, 0.707, 0.577, 0.500)

function      exponent
defined for   all floating point vector classes
description   extracts the exponent part of a floating point number. Result is an integer vector. exponent(a) = floor(log2(abs(a))), except for a = 0
efficiency    medium
Example:
// single precision:
Vec4f a(1.0, 2.0, 3.0, 4.0);
Vec4i b = exponent(a); // b = (0, 1, 1, 2)
// double precision:
Vec2d a(1.0, 2.0);
Vec2q b = exponent(a); // b = (0, 1)

function      fraction
defined for   all floating point vector classes
description   extracts the fraction part of a floating point number. a = pow(2, exponent(a)) * fraction(a), except for a = 0
efficiency    medium
Example:
Vec4f a(2.0, 3.0, 4.0, 5.0);
Vec4f b = fraction(a); // b = (1.00, 1.50, 1.00, 1.25)
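The relation a = pow(2, exponent(a)) * fraction(a) can be checked with a scalar model built on the standard functions std::ilogb and std::scalbn. This is a sketch under the assumption that the input is a positive, normal, nonzero number. The helper names exponent_scalar and fraction_scalar are hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cmath>

// Scalar models for one positive, normal, nonzero element.
// Hypothetical helpers for illustration, not library code.
inline int exponent_scalar(float a) {
    return std::ilogb(a);                  // floor(log2(a)) for positive normal a
}

inline float fraction_scalar(float a) {
    return std::scalbn(a, -std::ilogb(a)); // scale a into the interval [1, 2)
}
```

For a = 5.0 the model gives exponent 2 and fraction 1.25, and std::ldexp(1.25f, 2) reproduces 5.0 exactly.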

function      exp2
defined for   all floating point vector classes
description   calculates integer powers of 2. The input is an integer vector, the output is a floating point vector. Overflow gives +INF, underflow gives zero. This function will never produce denormals, and never raise exceptions
efficiency    medium
Example:
// single precision:
Vec4i a(-1, 0, 1, 2);
Vec4f b = exp2(a); // b = (0.5, 1.0, 2.0, 4.0)
// double precision:
Vec2q a(-1, 0);
Vec2d b = exp2(a); // b = (0.5, 1.0)

Floating point categorization functions


function      sign_bit
defined for   all floating point vector classes
description   returns true for elements that have the sign bit set, including -0.0, -INF and -NAN
efficiency    good
Example:
// single precision:
Vec4f a(-1.0, 0.0, 1.0, 2.0);
Vec4fb b = sign_bit(a); // b = (true, false, false, false)
// double precision:
Vec2d a(-1.0, 0.0);
Vec2db b = sign_bit(a); // b = (true, false)

function      is_finite
defined for   all floating point vector classes
description   returns true for elements that are normal, denormal or zero, false for INF and NAN
efficiency    medium
Example:
Vec4f a(0.0, 1.0, 2.0, 3.0);
Vec4f b(-1.0, 0.0, 1.0, 2.0);
Vec4f c = a / b;
Vec4fb d = is_finite(c); // d = (true, false, true, true)

function      is_inf
defined for   all floating point vector classes
description   returns true for elements that are +INF or -INF, false for all other values, including NAN
efficiency    good
Example:
Vec4f a(0.0, 1.0, 2.0, 3.0);
Vec4f b(-1.0, 0.0, 1.0, 2.0);
Vec4f c = a / b;
Vec4fb d = is_inf(c); // d = (false, true, false, false)

function      is_nan
defined for   all floating point vector classes
description   returns true for all types of NAN, false for all other values, including INF
efficiency    medium
Example:
Vec4f a(-1.0, 0.0, 1.0, 2.0);
Vec4f b = sqrt(a);
Vec4fb c = is_nan(b); // c = (true, false, false, false)

function      is_denormal
defined for   all floating point vector classes
description   returns true for denormal numbers, false for normal numbers, zero, INF and NAN
efficiency    medium
Example:
Vec4f a(1.0, 1.0E-10, 1.0E-20, 1.0E-30);
Vec4f b = a * a;           // b = (1., 1.E-20, 1.E-40, 0.)
Vec4fb c = is_denormal(b); // c = (false, false, true, false)

function      infinite4f, infinite8f, infinite2d, infinite4d
defined for   all floating point vector classes
description   returns positive infinity
efficiency    good
Example:
Vec4f a = infinite4f(); // a = (+INF, +INF, +INF, +INF)

function      nan4f, nan8f, nan2d, nan4d
defined for   all floating point vector classes
description   returns positive not-a-number
efficiency    good
Example:
Vec4f a = nan4f(); // a = (NAN, NAN, NAN, NAN)

function      snan4f, snan8f, snan2d, snan4d
defined for   all floating point vector classes
description   returns a signalling NAN. (Note: you cannot always rely on a signalling NAN causing an exception)
efficiency    good
Example:
Vec4f a = snan4f(); // a = (NAN, NAN, NAN, NAN)

Floating point control word manipulation functions

MXCSR is a control word that controls floating point exceptions, rounding mode and denormal numbers. The MXCSR has the following bits:

bit index   meaning
0           Invalid Operation Flag
1           Denormal Flag
2           Divide-by-Zero Flag
3           Overflow Flag
4           Underflow Flag
5           Precision Flag
6           Denormals Are Zeros
7           Invalid Operation Mask
8           Denormal Operation Mask
9           Divide-by-Zero Mask
10          Overflow Mask
11          Underflow Mask
12          Precision Mask
13-14       Rounding control:
            00: round to nearest or even
            01: round down towards -infinity
            10: round up towards +infinity
            11: round towards zero (truncate)
            If the rounding mode is temporarily changed then it must be set back to 00 for the vector class library to work correctly.
15          Flush to Zero

Please see programming manuals from Intel or AMD for further explanation.

function      get_control_word
description   reads the MXCSR control word
efficiency    medium
Example:
int m = get_control_word(); // default value m = 0x1F80

function      set_control_word
description   writes the MXCSR control word
efficiency    medium
Example:
set_control_word(0x1980); // enable overflow and divide by zero exceptions
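The two control word values used in the examples above can be decoded by hand from the bit table. The sketch below does this with plain integer arithmetic. It does not read or write the real MXCSR register, and the names ControlWordFields, decode_control_word and exception_enabled are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <cstdint>

// Fields of the control word, following the bit table above.
// Hypothetical type for illustration, not library code.
struct ControlWordFields {
    bool     denormals_are_zeros; // bit 6
    unsigned exception_masks;     // bits 7-12, one mask bit per exception type
    unsigned rounding_control;    // bits 13-14
    bool     flush_to_zero;       // bit 15
};

inline ControlWordFields decode_control_word(uint32_t w) {
    ControlWordFields f;
    f.denormals_are_zeros = ((w >> 6) & 1u) != 0;
    f.exception_masks     = (w >> 7) & 0x3Fu;
    f.rounding_control    = (w >> 13) & 3u;
    f.flush_to_zero       = ((w >> 15) & 1u) != 0;
    return f;
}

// An exception is enabled when its mask bit (index 7 to 12) is cleared.
inline bool exception_enabled(uint32_t w, int mask_bit) {
    return ((w >> mask_bit) & 1u) == 0;
}
```

The default value 0x1F80 has all six mask bits set, so no exception is enabled. The value 0x1980 clears bits 9 and 10, which enables the divide-by-zero and overflow exceptions.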

function      reset_control_word
description   sets the MXCSR control word to the default value
efficiency    medium
Example:
reset_control_word();

function      no_denormals
description   Disables the use of denormal values. Floating point numbers with an absolute value below 1.18·10^-38 for single precision or 2.22·10^-308 for double precision are represented by denormal numbers. The handling of denormal numbers is extremely time-consuming on many CPUs. The no_denormals function sets the "denormals are zeros" and "flush to zero" mode to avoid the use of denormal numbers. It is recommended to call this function at the beginning of a program or thread if extremely low numbers are likely to occur and it is acceptable to replace these numbers by zero.
efficiency    medium
Example:
no_denormals();

Floating point mathematical library functions

Mathematical functions such as logarithms, exponential functions, trigonometric functions, etc. are available through external function libraries. You get access to the vector math functions by including the header file "vectormath.h" from the "special" subfolder. You can choose between the following mathematical function libraries and indicate your choice through the define VECTORMATH:

VECTORMATH value   Function library
0   Uses the standard math library that is included with the compiler. You do not have to include any extra libraries. The library function is called once for each vector element. This is slow (especially for the Gnu library). Use this option for testing purposes or where performance is not critical.
1   AMD LIBM library. The LIBM library is available for 64-bit Linux and 64-bit Windows, but not for 32-bit systems. Filename: amdlibm.lib or libamdlibm.a. Performance is good for AMD processors with FMA4, but inferior for processors without FMA4. Currently, the FMA4 instruction set is supported only in AMD processors.
2   Use Intel SVML library (Short Vector Math Library) with any compiler. The SVML library is available for all platforms relevant to the vector class library. It is included with Intel C++ compilers but can be used with other compilers as well. Filename: svml_dispmt.lib or libsvml.a. Be sure to choose the 32-bit version or 64-bit version according to the platform you are compiling for. Performance is good on Intel processors. Performance is inferior on other brands of processors unless you replace Intel's own CPU dispatch function. Link in the library libircmt.lib to use Intel's own CPU dispatch function for Intel processors, or use an object file from the asmlib library under "inteldispatchpatch" for best performance on all brands of processors. See my blog and my C++ manual for details.
3   Use Intel SVML library with an Intel compiler. You do not have to link in any extra libraries. The Intel compiler gives access to different versions with different precision. Performance is good on Intel processors, but inferior on other brands of processors unless you link in the dispatch patch from the asmlib library as described above.

The value of VECTORMATH can be defined on the compiler command line or by a define statement:
#define VECTORMATH 2
#include "vectormath.h"

The chosen function library must be linked into the project if the value of VECTORMATH is 1 or 2. The use of a vector math function is straightforward. Example:
#include <stdio.h>
#define VECTORMATH 0
#include "vectorclass.h"
#include "vectormath.h"

int main() {
    Vec4f a(0.0, 0.5, 1.0, 1.5);
    Vec4f b = sin(a);            // call sin function
    // b = (0.0000, 0.4794, 0.8415, 0.9975)
    for (int i = 0; i < 4; i++) {
        printf("%6.4f ", b[i]);  // output results
    }
    printf("\n");
    return 0;
}

The available vector math functions are listed below. The efficiency is listed as poor because these functions take longer time to execute than simple arithmetic functions, but the vector math libraries are nevertheless much faster than alternatives.

Powers, exponential functions and logarithms:

function      exp
defined for   all floating point vector classes, all values of VECTORMATH
description   exponential function
efficiency    poor

function      expm1
defined for   all floating point vector classes, all values of VECTORMATH, except 0 for some libraries
description   exp(x) - 1. Useful to avoid loss of precision if x is close to 0
efficiency    poor

function      exp2
defined for   all floating point vector classes, all values of VECTORMATH, except 0 for some libraries
description   2^x
efficiency    poor

function      exp10
defined for   all floating point vector classes, all values of VECTORMATH
description   10^x
efficiency    poor

function      pow
defined for   all floating point vector classes, all values of VECTORMATH
description   pow(a,b) = a^b where a and b are both vectors. See also pow function page 24.
efficiency    poor
Example:
Vec4f a(1.0, 2.0, 3.0, 4.0);
Vec4f b(0.0, -1.0, 0.5, 2.0);
Vec4f c = pow(a, b); // c = (1.0000 0.5000 1.7321 16.0000)

function      log
defined for   all floating point vector classes, all values of VECTORMATH
description   natural logarithm
efficiency    poor

function      log1p
defined for   all floating point vector classes, all values of VECTORMATH, except 0 for some libraries
description   log(1+x). Useful to avoid loss of precision if x is close to 0
efficiency    poor

function      log2
defined for   all floating point vector classes, all values of VECTORMATH
description   logarithm base 2
efficiency    poor

function      log10
defined for   all floating point vector classes, all values of VECTORMATH
description   logarithm base 10
efficiency    poor

function      cubic_root
defined for   all floating point vector classes, VECTORMATH = 1, 2, 3
description   cubic root = pow(x, 1./3.)
efficiency    poor

function      recipr_sqrt
defined for   all floating point vector classes, VECTORMATH = 2, 3
description   reciprocal square root = pow(x, -0.5)
efficiency    poor

function      cexp
defined for   all floating point vector classes, VECTORMATH = 0, 2, 3
description   complex exponential function. Even-numbered vector elements are real part, odd-numbered vector elements are imaginary part.
efficiency    poor

Trigonometric functions (angles in radians):

function      sin
defined for   all floating point vector classes, all values of VECTORMATH
description   sine function
efficiency    poor

function      cos
defined for   all floating point vector classes, all values of VECTORMATH
description   cosine function
efficiency    poor

function      sincos
defined for   all floating point vector classes, all values of VECTORMATH
description   sine and cosine computed simultaneously
efficiency    poor
Example:
Vec4f a(0.0, 0.5, 1.0, 1.5);
Vec4f s, c;
s = sincos(&c, a);
// s = (0.0000, 0.4794, 0.8415, 0.9975)
// c = (1.0000, 0.8776, 0.5403, 0.0707)

function      tan
defined for   all floating point vector classes, all values of VECTORMATH
description   tangent function
efficiency    poor

Inverse trigonometric functions:

function      asin
defined for   all floating point vector classes, VECTORMATH = 0, 2, 3
description   inverse sine function
efficiency    poor

function      acos
defined for   all floating point vector classes, VECTORMATH = 0, 2, 3
description   inverse cosine function
efficiency    poor

function      atan
defined for   all floating point vector classes, VECTORMATH = 0, 2, 3
description   inverse tangent function.
              atan(a) = inverse tangent(a)
              atan(a, b) = inverse tangent(a / b)
efficiency    poor

Hyperbolic functions and inverse hyperbolic functions:

function      sinh
defined for   all floating point vector classes, VECTORMATH = 0, 2, 3
description   hyperbolic sine
efficiency    poor

function      cosh
defined for   all floating point vector classes, VECTORMATH = 0, 2, 3
description   hyperbolic cosine
efficiency    poor

function      tanh
defined for   all floating point vector classes, VECTORMATH = 0, 2, 3
description   hyperbolic tangent
efficiency    poor

function      asinh
defined for   all floating point vector classes, VECTORMATH = 2, 3
description   inverse hyperbolic sine
efficiency    poor

function      acosh
defined for   all floating point vector classes, VECTORMATH = 2, 3
description   inverse hyperbolic cosine
efficiency    poor

function      atanh
defined for   all floating point vector classes, VECTORMATH = 2, 3
description   inverse hyperbolic tangent
efficiency    poor

Error function, etc.:

function      erf
defined for   all floating point vector classes, VECTORMATH = 2, 3, and some libraries VECTORMATH = 0
description   error function
efficiency    poor

function      erfc
defined for   all floating point vector classes, VECTORMATH = 2, 3, and some libraries VECTORMATH = 0
description   error function complement
efficiency    poor

function      erfinv
defined for   all floating point vector classes, VECTORMATH = 2, 3
description   inverse error function
efficiency    poor

function      cdfnorm
defined for   all floating point vector classes, VECTORMATH = 2, 3
description   cumulative normal distribution function
efficiency    poor

function      cdfnorminv
defined for   all floating point vector classes, VECTORMATH = 2, 3
description   inverse cumulative normal distribution function
efficiency    poor

Permute, blend, lookup and change sign functions


Permute functions:

function      permute..<i0, i1, ...>(vector)
defined for   all integer and floating point vector classes
description   permutes vector elements
efficiency    depends on parameters and instruction set

The permute functions can move any element of a vector into any position, copy the same element to multiple positions, and set any element to zero.

The name of the permute function is "permute" + the vector type suffix, for example permute4i for Vec4i. The permute function for a vector of n elements has n indexes, which are entered as template parameters in angle brackets. Each index indicates the desired contents of the corresponding element in the result vector. An index i in the interval 0 ≤ i ≤ n-1 indicates that element number i from the input vector should be placed in the corresponding position in the result vector. An index i = -1 gives a zero in the corresponding position. An index i = -256 means don't care (i.e. use whatever implementation is fastest, regardless of what value it puts in this position). The value you get with "don't care" may be different for different implementations or different instruction sets. Example:
Vec4i a(10, 11, 12, 13);
Vec4i b = permute4i<2,2,3,0>(a);   // b = (12, 12, 13, 10)
Vec4i c = permute4i<-1,-1,1,1>(a); // c = ( 0,  0, 11, 11)
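The index rules can be summarized in a scalar model that uses a plain array in place of a vector register. This sketch only mirrors the semantics described above. The name permute4_model is hypothetical, and the model chooses to return zero for a "don't care" index, which the library does not promise.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<int, 4>;

// Scalar model of a 4-element permute with compile-time indexes.
// Hypothetical helper for illustration, not library code.
template <int i0, int i1, int i2, int i3>
Vec4 permute4_model(const Vec4& a) {
    const int idx[4] = {i0, i1, i2, i3};
    Vec4 r{};
    for (int k = 0; k < 4; ++k) {
        // 0..3 selects an input element. -1 gives zero.
        // -256 (don't care) also gives zero in this model.
        r[k] = (idx[k] >= 0 && idx[k] < 4) ? a[idx[k]] : 0;
    }
    return r;
}
```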

The indexes in angle brackets must be compile-time constants, they cannot contain variables or function calls. If you need variable indexes then use the lookup functions (see page 42).

The permute functions contain a lot of metaprogramming code which is used for finding the best implementation for the given set of indexes and the specified instruction set. This metaprogramming produces a lot of extra code when compiling in debug mode, but it is reduced out when compiling for release mode with optimization on. The call to a permute function is reduced to just one or a few machine instructions in favorable cases. But in unfavorable cases where the selected instruction set has no machine instruction that matches the desired permutation pattern, it may produce many machine instructions.

The performance is generally good when the instruction set SSSE3 or higher is enabled. The performance for permuting vectors of 16-bit integers is medium, and the performance for permuting vectors of 8-bit integers is poor for instruction sets lower than SSSE3.

Blend functions:

function      blend..<i0, i1, ...>(vector, vector)
defined for   all integer and floating point vector classes
description   permutes and blends elements from two vectors
efficiency    depends on parameters and instruction set

The blend functions are similar to the permute functions, but with two input vectors. An index i in the interval 0 ≤ i ≤ n-1 indicates that element number i from the first input vector should be placed in the corresponding position in the result vector. An index i in the interval n ≤ i ≤ 2*n-1 indicates that element number i-n from the second input vector should be placed in the corresponding position in the result vector. An index i = -1 gives a zero in the corresponding position. An index i = -256 means don't care. Example:
Vec4i a(10, 11, 12, 13);
Vec4i b(20, 21, 22, 23);
Vec4i c = blend4i<4,0,4,3>(a, b); // c = (20, 10, 20, 13)

If you want to blend input from more than two vectors, there are three different methods you can use:

1. A binary tree of blend calls, where unused values are set to don't care (-256). Example:
Vec4i a(10, 11, 12, 13);
Vec4i b(20, 21, 22, 23);
Vec4i c(30, 31, 32, 33);
Vec4i d(40, 41, 42, 43);
Vec4i r = blend4i<0,5,-256,-256>(a, b); // r = (10,21,?,?)
Vec4i s = blend4i<-256,-256,2,7>(c, d); // s = (?,?,32,43)
Vec4i t = blend4i<0,1,6,7>(r, s);       // t = (10,21,32,43)

2. Set unused values to zero, and OR the results. Example:
Vec4i a(10, 11, 12, 13);
Vec4i b(20, 21, 22, 23);
Vec4i c(30, 31, 32, 33);
Vec4i d(40, 41, 42, 43);
Vec4i r = blend4i<0,5,-1,-1>(a, b); // r = (10,21,0,0)
Vec4i s = blend4i<-1,-1,2,7>(c, d); // s = (0,0,32,43)
Vec4i t = r | s;                    // t = (10,21,32,43)
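The second method works because OR with zero leaves the other operand unchanged. A scalar model with plain arrays shows this. The names blend4_model and or_model are hypothetical helpers for this sketch, and a "don't care" index is simply treated as zero here.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<int, 4>;

// Scalar model of a 4-element blend of two inputs with compile-time indexes.
// Hypothetical helper for illustration, not library code.
template <int i0, int i1, int i2, int i3>
Vec4 blend4_model(const Vec4& a, const Vec4& b) {
    const int idx[4] = {i0, i1, i2, i3};
    Vec4 r{};
    for (int k = 0; k < 4; ++k) {
        if (idx[k] >= 0 && idx[k] < 4)      r[k] = a[idx[k]];     // from first input
        else if (idx[k] >= 4 && idx[k] < 8) r[k] = b[idx[k] - 4]; // from second input
        else                                r[k] = 0;             // -1, or don't care in this model
    }
    return r;
}

// Element-wise OR, standing in for operator | on vectors.
inline Vec4 or_model(const Vec4& x, const Vec4& y) {
    Vec4 r{};
    for (int k = 0; k < 4; ++k) r[k] = x[k] | y[k];
    return r;
}
```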

3. If the input vectors are stored sequentially in memory then use the lookup functions shown below.

Lookup functions:

function      Vec16c lookup16(Vec16c, Vec16c)
              Vec32c lookup32(Vec32c, Vec32c)
              Vec8s lookup8(Vec8s, Vec8s)
              Vec16s lookup16(Vec16s, Vec16s)
              Vec4i lookup4(Vec4i, Vec4i)
              Vec8i lookup8(Vec8i, Vec8i)
              Vec4q lookup4(Vec4q, Vec4q)
defined for   Vec16c, Vec32c, Vec8s, Vec16s, Vec4i, Vec8i, Vec4q
description   permutation with variable indexes. The first input vector contains the indexes, the second input vector is the data source. Each index must be in the range 0 ≤ i ≤ n-1 where n is the number of elements in a vector.
efficiency    good for AVX2, medium for lower instruction sets

function      Vec16c lookup32(Vec16c, Vec16c, Vec16c)
              Vec8s lookup16(Vec8s, Vec8s, Vec8s)
              Vec4i lookup8(Vec4i, Vec4i, Vec4i)
              Vec4i lookup16(Vec4i, Vec4i, Vec4i, Vec4i, Vec4i)
defined for   Vec16c, Vec8s, Vec4i
description   blend with variable indexes. The first input vector contains the indexes, the following two or four input vectors contain the data source. Each index must be in the range 0 ≤ i ≤ n-1 where n is the number indicated by the name.
efficiency    good for AVX2, medium for lower instruction sets

function      Vec4f lookup4(Vec4i, Vec4f)
              Vec8f lookup8(Vec8i, Vec8f)
              Vec2d lookup2(Vec2q, Vec2d)
              Vec4d lookup4(Vec4q, Vec4d)
defined for   all floating point vector classes
description   permutation of floating point vectors with integer indexes. Each index must be in the range 0 ≤ i ≤ n-1 where n is the number of elements in a vector.
efficiency    good for AVX2, medium for lower instruction sets

function      Vec4f lookup8(Vec4i, Vec4f, Vec4f)
              Vec2d lookup4(Vec2q, Vec2d, Vec2d)
defined for   Vec4f, Vec2d
description   blend of floating point vectors with integer indexes. Each index must be in the range 0 ≤ i ≤ 2*n-1 where n is the number of elements in a vector.
efficiency    medium

function      Vec16c lookup<n>(Vec16c index, void const * table)
              Vec32c lookup<n>(Vec32c index, void const * table)
              Vec8s lookup<n>(Vec8s index, void const * table)
              Vec16s lookup<n>(Vec16s index, void const * table)
              Vec4i lookup<n>(Vec4i index, void const * table)
              Vec8i lookup<n>(Vec8i index, void const * table)
              Vec4q lookup<n>(Vec4q index, void const * table)
              Vec4f lookup<n>(Vec4i index, float const * table)
              Vec8f lookup<n>(Vec8i const & index, float const * table)
              Vec2d lookup<n>(Vec2q index, double const * table)
              Vec4d lookup<n>(Vec4q const & index, double const * table)
defined for   all floating point and signed integer vector classes
description   permute, blend, table lookup or gather data from array with an integer vector of indexes. Each index must be in the range 0 ≤ i ≤ n-1, where n is indicated as a template parameter (n must be a positive compile-time constant).
efficiency    good for AVX2, medium for lower instruction sets

The lookup functions are similar to the permute and blend functions, but with variable indexes. They cannot be used for setting an element to zero, and there is no "don't care" option. The lookup functions can be used for several purposes:
1. permute with variable indexes
2. blend with variable indexes
3. blend from more than two sources
4. table lookup
5. gather non-contiguous data from an array

The index is always an integer vector. The input can be one or more vectors or an array. The result is a vector of the same type as the input. All elements in the index vector must be in the specified range. The behavior for an index out of range is implementation-dependent.

The lookup functions are not defined for unsigned integer vector types, but the corresponding signed versions can be used. You don't have to worry about overflow when converting unsigned integers to signed here, as long as the result vector is converted back to unsigned.

Example of permutation with variable indexes:
Vec4f a(1.0, 1.1, 1.2, 1.3);
Vec4i b(2, 3, 3, 0);
Vec4f c = lookup4(b, a); // c = (1.2, 1.3, 1.3, 1.0)

Example of blending with variable indexes:
Vec4f a(1.0, 1.1, 1.2, 1.3);
Vec4f b(2.0, 2.1, 2.2, 2.3);
Vec4i c(4, 3, 2, 7);
Vec4f d = lookup8(c, a, b); // d = (2.0, 1.3, 1.2, 2.3)

Example of blending from more than two sources:
float sources[12] = {1.0,1.1,1.2,1.3,2.0,2.1,2.2,2.3,3.0,3.1,3.2,3.3};
Vec4i i(11, 0, 5, 5);
Vec4f c = lookup<12>(i, sources); // c = (3.3, 1.0, 2.1, 2.1)

A function with a limited number of possible input values can be replaced by a lookup table. This is useful if table lookup is faster than calculating the function. This example has a table of the function y = x^2 - 1:
// table of the function x*x-1
int table[6] = {-1,0,3,8,15,24};
Vec4i x(4,2,0,5);
Vec4i y = lookup<6>(x, table); // y = (15, 3, -1, 24)

Example of gathering non-contiguous data from an array:
float x[16] = { ... };
Vec4i i(0,4,8,12);
Vec4f y = lookup<16>(i, x); // y = (x[0],x[4],x[8],x[12])
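What a table lookup or gather computes can be written as a scalar loop. The sketch below is a model of the semantics only. The name lookup_model is hypothetical, and the model stops with an assertion on an out-of-range index, whereas the library leaves that case implementation-dependent.

```cpp
#include <array>
#include <cassert>

// Scalar model of lookup<n>(index, table) for four float elements.
// Hypothetical helper for illustration, not library code.
template <int n>
std::array<float, 4> lookup_model(const std::array<int, 4>& index, const float* table) {
    std::array<float, 4> r{};
    for (int k = 0; k < 4; ++k) {
        assert(index[k] >= 0 && index[k] < n); // every index must be in the range 0..n-1
        r[k] = table[index[k]];                // element k is read from position index[k]
    }
    return r;
}
```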

Shift byte functions:

function      vector shift_bytes_up(vector, int)
              vector shift_bytes_down(vector, int)
defined for   Vec16c, Vec32c
description   shifts the bytes of a vector up or down and inserts zeroes at the vacant places
efficiency    medium. (you may use permute functions instead if the shift count is a compile-time constant)
Example:
Vec16c a(10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25);
Vec16c b = shift_bytes_up(a,5);
// b = (0,0,0,0,0,10,11,12,13,14,15,16,17,18,19,20)
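Shifting bytes up moves every element to a higher position and fills the vacated low positions with zero. A scalar model for a 16-byte vector might look like this. The name shift_bytes_up_model is hypothetical, and the model is only meant for shift counts from 0 to 16.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes16 = std::array<int8_t, 16>;

// Scalar model of shift_bytes_up for a 16-byte vector.
// Hypothetical helper for illustration, not library code.
inline Bytes16 shift_bytes_up_model(const Bytes16& a, int count) {
    Bytes16 r{}; // all elements start as zero
    for (int k = 0; k < 16; ++k) {
        const int src = k - count;                // source position for result element k
        if (src >= 0 && src < 16) r[k] = a[src];  // positions without a source stay zero
    }
    return r;
}
```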

Change sign functions:

function      change_sign<i0, i1, ...>(vector)
defined for   all floating point vector classes
description   changes sign of vector elements
efficiency    good

Each template parameter is 1 for changing sign of the corresponding element, and 0 for no change. Example:
Vec4f a(10.0f, 11.0f, 12.0f, 13.0f);
Vec4f b = change_sign<0,1,1,0>(a); // b = (10,-11,-12,13)

Number ↔ string conversion functions


These functions require the header file "decimal.h" from the subfolder named "special".

Binary to binary-coded-decimal (BCD) conversion:

function      vector bin2bcd(vector)
defined for   All unsigned integer vector types
description   Each vector element is converted to BCD code. The behavior in case of overflow is implementation dependent.
efficiency    medium
Example:
#include "decimal.h"
...
Vec4ui a(100,101,102,103);
Vec4ui b = bin2bcd(a); // b = (0x100, 0x101, 0x102, 0x103)
// maximum value without overflow = 99999999
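In BCD code each decimal digit occupies four bits, so the hexadecimal representation of the result shows the same digits as the decimal input. A scalar model for one 32-bit element could be written as below. The name bin2bcd_scalar is hypothetical, and the model silently drops digits beyond the eighth, which is just one possible overflow behavior.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of bin2bcd for one 32-bit element.
// Hypothetical helper for illustration, not library code.
inline uint32_t bin2bcd_scalar(uint32_t x) {
    uint32_t result = 0;
    // Peel off decimal digits from the low end and place each one
    // in the next four-bit group. Eight groups fit in 32 bits.
    for (int shift = 0; shift < 32 && x != 0; shift += 4) {
        result |= (x % 10u) << shift;
        x /= 10u;
    }
    return result;
}
```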

Binary to decimal ASCII string conversion:

function      int bin2ascii (vector a, char * string, int fieldlen, int numdat, bool signd, char ovfl, char separator, bool term)
defined for   Vec16c, Vec32c, Vec8s, Vec16s, Vec4i, Vec8i
description   Makes an ASCII string of numbers, where each vector element is converted to a human-readable decimal ASCII representation, right-justified in a field of specified length.
parameters    a          Vector of signed or unsigned integers to convert
              string     Character array that will receive the string. Must be big enough to contain the worst-case string length, including separators and terminating zero.
              fieldlen   Desired length of each field in the output string. (default = 2, 4, or 8 depending on vector type)
              numdat     Number of vector elements to convert. (default = number of elements in a)
              signd      Each number will be interpreted as signed if signd = true, unsigned if false. (default = true)
              ovfl       Specifies how to handle cases where a number is too big to fit into a field of length fieldlen.
                         ovfl = 0: the size of the field will be made big enough to hold the number (max 11 characters).
                         ovfl = ASCII character: the field will be filled with this character if the number is too big to fit into the field. (default = '*')
              separator  Specifies an ASCII character to insert between fields (but not after the last field). 0 for no separator. (default = ',')
              term       Writes a zero-terminated ASCII string if term is true. The string has no terminator if term is false. (default = true)
return value  The returned value is the length of the string written. The terminating zero is not included in the count.
efficiency    poor, but better than alternatives. Improved by instruction sets SSSE3, SSE4.1, AVX2.
Example:
#include "decimal.h"
...
Vec4ui a(123, 123456, 0, -78);
char text[50];
bin2ascii(a, text, 5, 4, true, '*', ',', true);
// text = "  123,*****,    0,  -78"

Binary to hexadecimal ASCII string conversion:

function      int bin2hex_ascii (vector a, char * string, int numdat, char separator, bool term)
defined for   All signed integer vector types
description   Makes an ASCII string of hexadecimal numbers, where each vector element is converted to an unsigned hexadecimal ASCII representation in a field of 8, 4 or 2 characters, depending on the vector type.
parameters    a          Vector of integers to convert
              string     Character array that will receive the string. Must be big enough to contain the string length, including separators and terminating zero.
              numdat     Number of vector elements to convert. (default = number of elements in a)
              separator  Specifies an ASCII character to insert between fields (but not after the last field). 0 for no separator. (default = ',')
              term       Writes a zero-terminated ASCII string if term is true. The string has no terminator if term is false. (default = true)
return value  The returned value is the length of the string written. The terminating zero is not included in the count.
efficiency    Medium. Improved by instruction sets SSSE3, AVX2.
Example:
#include "decimal.h"
...
Vec4ui a(256, 0x1234abcd, 0, -1);
char text[50];
bin2hex_ascii(a, text, 4, ',', true);
// text = "00000100,1234ABCD,00000000,FFFFFFFF"

Decimal ASCII string to binary number conversion:

function      Vec4i ascii2bin(Vec32c string)
defined for   Vec32c
description   The input vector contains an ASCII string, organized as four fields of 8 characters each. Each field contains a decimal number. There are no separator or terminator characters. Each number must be right-justified in its field. Spaces and a minus sign are allowed to the left of each number. No other characters are allowed. The function returns a vector of four signed integers. A syntax error is indicated by the value 0x80000000, which cannot occur otherwise.
efficiency    medium

?ac, %iel! must be e9actl) 1 c,aracters wi!e. +,e number must ,a(e one or more !igits Z0Z 6 Z7Z. 8paces an! one minus sign are allowe! to t,e le%t o% t,e number. 2ot,ing is allowe! to t,e rig,t o% t,e number. 2o ot,er c,aracters t,an !igits, spaces an! minus sign are allowe!. +,e s)nta9 o% t,e input string can be !e%ine! wit, t,e %ollowing ?52F !escriptionD OstringP O%iel!P OspaceP OminusP O!igitP DDL O%iel!P O%iel!P O%iel!P O%iel!P DDL [ OspaceP \ M OminusP N [ OspaceP \ O!igitP [ O!igitP \ DDL Z Z DDL Z6Z DDL Z0Z S Z1Z S Z2Z S Z3Z S Z'Z S Z.Z S Z4Z S Z/Z S Z1Z S Z7Z

A s)nta9 error in a %iel! will set t,e correspon!ing number to 2+J< 2 L 0910000000. +,is will not a%%ect t,e ot,er numbers. t is -] to input a string w,ere onl) part o% t,e string contains (ali! numbers an! ignore t,e rest because t,ere is no per%ormance penalt) %or s)nta9 errors. +,e error6(alue 0910000000 L 621'/'134'1 cannot occur wit, a correct input because it reGuires more t,an eig,t !igits to represent. +,e numbers cannot be bigger t,an 77777777 or smaller t,an 67777777 because t,e) ,a(e to %it into eig,t c,aracters. +,e %ollowing e9ample ,as a s)nta9 error in t,e last %iel! because t,ere are spaces to t,e rig,t o% t,e numberD
+include ,decimal.), ... c)ar str[] ! , 1/6 >5(0;8 """"5 -ec6/c string ! -ec6/c %.load str%; -ec5i a ! ascii/bin string%; // a ! 1/6, >5(0;8, 5, "x8"""""""%

,;
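The field syntax described above can be illustrated with a scalar sketch. This is not the library's vectorized implementation; the names parse_field8, parse_fields32 and kSyntaxError are hypothetical helpers introduced only for illustration. The sketch assumes each field is exactly 8 characters and returns 0x80000000 for any field that violates the EBNF rules, without affecting the other fields.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical error marker, equal to INT_MIN = 0x80000000
constexpr int32_t kSyntaxError = INT32_MIN;

// Parse one 8-character field: { space } [ minus ] { space } digit { digit }
inline int32_t parse_field8(const char* f) {
    assert(f != nullptr);
    int i = 0;
    while (i < 8 && f[i] == ' ') ++i;                    // leading spaces
    bool negative = false;
    if (i < 8 && f[i] == '-') { negative = true; ++i; }  // optional minus sign
    while (i < 8 && f[i] == ' ') ++i;                    // spaces after the minus
    if (i == 8) return kSyntaxError;                     // at least one digit is required
    int32_t value = 0;
    for (; i < 8; ++i) {
        // Anything that is not a digit here (including a trailing space) is an error
        if (f[i] < '0' || f[i] > '9') return kSyntaxError;
        value = value * 10 + (f[i] - '0');               // at most 8 digits, cannot overflow
    }
    return negative ? -value : value;
}

// Parse a 32-character buffer as four independent fields
inline void parse_fields32(const char* s, int32_t out[4]) {
    for (int k = 0; k < 4; ++k) out[k] = parse_field8(s + 8 * k);
}
```

A field such as "  12    " is rejected because the spaces follow the digits, which matches the rule that nothing is allowed to the right of the number.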

Boolean operations and per-element branches

Consider this piece of C++ code:
int a[4], b[4], c[4], d[4];
...
for (int i = 0; i < 4; i++) {
d[i] = (a[i] > 0 && a[i] < 10) ? b[i] : c[i];
}

We can do this with vectors in the following way:

Vec4i a, b, c, d;
...
d = select(a > 0 & a < 10, b, c);

The select function is similar to the ? : operator. It has three vector parameters: the first parameter is interpreted as a Boolean vector, which chooses between the elements of the second and third vector parameter. The relational operators >, >=, <, <=, ==, != produce Boolean vectors, which accept the Boolean operations &, |, ^, ~. There is a table on page 6 showing which vector classes can be used as Boolean vectors for these operators. In the above example, the Boolean vectors must have the same number of elements per vector as a, b, c and d.

A value of true is represented by -1, i.e. all bits in the vector element are 1. False is represented by 0. It is important to note that the select function and the Boolean operators will not work correctly here if a vector element that is supposed to represent a Boolean value has any other value than 0 or -1. The behavior for other values than 0 and -1 is implementation dependent and different for different instruction sets.

The vector elements that are not selected are calculated anyway because you cannot calculate a part of a vector. For example:
Vec4f a(-1.0f, 0.0f, 1.0f, 2.0f);
Vec4f b = select(a >= 0.0f, sqrt(a), 0.0f);

Here, we will be calculating the square root of -1 even though we are not using it. This could possibly generate an exception if floating point exceptions are not masked. A better solution would therefore be:
Vec4f a(-1.0f, 0.0f, 1.0f, 2.0f);
Vec4f b = sqrt(max(a, 0.0f));

Likewise, the & and | operators are calculating both input operands, even if the second operand is not used. The following example illustrates this:
// array version:
float a[4] = {-1.0f, 0.0f, 1.0f, 2.0f};
float b[4];
for (int i = 0; i < 4; i++) {
if (a[i] > 0.0f && sqrt(a[i]) < 8.0f) b[i] = a[i];
else b[i] = 1.0f;
}

and the vector version of the same:

// vector version:
Vec4f a(-1.0f, 0.0f, 1.0f, 2.0f);
Vec4f b = select((a > 0.0f) & (sqrt(a) < 8.0f), a, 1.0f);

In the array version, we will never call sqrt(-1) because the && operator doesn't evaluate the second operand when the first operand is false. But in the vector version we are indeed calculating sqrt(-1) because the & operator always evaluates both operands. The vector class library defines the operators && and || as synonyms to & and | for convenience, but they are still doing a bitwise AND or OR operation, so & and | are actually more representative of what these operators really do.

The vector class library defines separate vector classes for use as Boolean vectors in connection with floating point vectors. For example, in the above example where a is an object of class Vec4f, the expression (a > 0.0f) is an object of class Vec4fb. The class Vec4fb does exactly the same as the class Vec4i when used as Boolean. Both are Boolean vectors with 4 elements of 32 bits each. The reason why we have defined a separate Boolean vector class for use with floating point vectors is that it enables us to produce faster code. (Many modern CPUs have separate execution units for integer vectors and floating point vectors. It is sometimes possible to do the Boolean operations in the floating point unit and thereby avoid the delay from moving data between the two units).

The following table shows the Boolean vector classes that are intended for use with floating point vectors:

Boolean vector for float   Use with floating point vector class   Equivalent Boolean vector for integers
Vec4fb                     Vec4f                                  Vec4i
Vec8fb                     Vec8f                                  Vec8i
Vec2db                     Vec2d                                  Vec2q
Vec4db                     Vec4d                                  Vec4q

The equivalent Boolean vectors can be converted to each other, for example:
Vec4f a(-1.0f, 0.0f, 1.0f, 2.0f);
Vec4f b(1.0f, 0.0f, -1.0f, -2.0f);
Vec4fb c = a > b; // c = (false, false, true, true)
Vec4i d = c; // d = (0, 0, -1, -1)

However, you should remember when converting an integer vector to Boolean, that the only allowed values are 0 and -1. For example:
Vec4i a(-1, 0, 1, 2);
Vec4fb b = a; // Will not work: values 1 and 2 are wrong

It is possible to use the bitwise operators &, |, ^, ~ with any data vectors. For example, if b is a Boolean vector and a is some other kind of data, then b & a will be equal to a if b is true, and zero if b is false:

Vec4f a(-1.0f, 0.0f, 1.0f, 2.0f);
Vec4f c(1.0f, 0.0f, -1.0f, -2.0f);
Vec4fb b = a > c;
Vec4f d = Vec4f(b) & a; // d = (0.0, 0.0, 1.0, 2.0);
Vec4f e = select(b, a, 0.0f); // same, but slower

The vector classes Vec128b and Vec256b contain 128 and 256 Booleans of 1 bit each, respectively. These classes can be used with the operators &, |, ^, ~ but they are not useful for selecting data from the other vector classes.

function: vector select(boolean vector s, vector a, vector b)
defined for: all integer and floating point vector classes
description: branch per element. result[i] = s[i] ? a[i] : b[i]
efficiency: good

Example:
Vec4i a(-1, 0, 1, 2);
Vec4i b = select(a > 0, a, a + 10); // b = (9,10,1,2)
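The reason why Boolean elements must be exactly 0 or -1 becomes clearer with a scalar sketch of a bitwise select. This is only one possible way to express the idea, not the library's actual code; Int4, greater_than and select_bits are hypothetical names, and std::array stands in for a 4-element integer vector.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Int4 = std::array<int32_t, 4>;   // hypothetical stand-in for a 4-element integer vector

// Element-wise a > b. True is -1 (all bits set), false is 0.
inline Int4 greater_than(const Int4& a, const Int4& b) {
    Int4 m{};
    for (int i = 0; i < 4; ++i) m[i] = (a[i] > b[i]) ? -1 : 0;
    return m;
}

// Bitwise select: each result bit comes from a where the mask bit is 1, else from b.
inline Int4 select_bits(const Int4& mask, const Int4& a, const Int4& b) {
    Int4 r{};
    for (int i = 0; i < 4; ++i) {
        assert(mask[i] == 0 || mask[i] == -1);   // other values would mix bits of a and b
        r[i] = (mask[i] & a[i]) | (~mask[i] & b[i]);
    }
    return r;
}
```

With a mask element of -1 every bit is taken from a, and with 0 every bit is taken from b. Any other mask value would pick some bits from each operand, which is why such values are not allowed.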

function: bool horizontal_and(vector)
defined for: all integer vector classes, Vec4fb, Vec8fb, Vec2db, Vec4db
description: the input is a vector used as boolean. The output is the AND combination of all elements
efficiency: medium

Example:
Vec4i a(-1, 0, 1, 2);
bool b = horizontal_and(a > 0); // b = false

function: bool horizontal_or(vector)
defined for: all integer vector classes, Vec4fb, Vec8fb, Vec2db, Vec4db
description: the input is a vector used as boolean. The output is the OR combination of all elements
efficiency: medium

Example:
Vec4i a(-1, 0, 1, 2);
bool b = horizontal_or(a > 0); // b = true
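As a rough scalar picture of these two reductions, the following sketch combines all elements of a Boolean vector into a single bool. The names Mask4, all_true and any_true are hypothetical and only illustrate the semantics, assuming every element is 0 or -1.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Mask4 = std::array<int32_t, 4>;   // hypothetical Boolean vector: each element is 0 or -1

// AND combination of all elements (the idea behind horizontal_and)
inline bool all_true(const Mask4& m) {
    int32_t acc = -1;
    for (int32_t e : m) acc &= e;
    return acc != 0;
}

// OR combination of all elements (the idea behind horizontal_or)
inline bool any_true(const Mask4& m) {
    int32_t acc = 0;
    for (int32_t e : m) acc |= e;
    return acc != 0;
}
```

A typical use is an early-out test, for example checking whether any element of a vector is out of range before entering a slow error path.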

function: vector andnot(vector, vector)
defined for: Vec128b, Vec256b, Vec4fb, Vec8fb, Vec2db, Vec4db
description: andnot(a,b) = a & ~ b
efficiency: good (better than a & ~ b)

Conversion between vector types


Below is a list of methods and functions for conversion between different vector types, vector sizes or precisions.

method: conversion between vector class and intrinsic vector type
defined for: all vector classes
description: conversion between a vector class and the corresponding intrinsic vector type __m128, __m128d, __m128i, __m256, __m256d, __m256i can be done implicitly or explicitly.
efficiency: good

Example:
Vec4i a(0,1,2,3);
__m128i b = a; // b = 0x00000003000000020000000100000000
Vec4i c = b; // c = (0,1,2,3)

method: conversion from scalar to vector
defined for: all vector classes
description: conversion from a scalar (single value) to a vector can be done explicitly by calling a constructor, or implicitly by putting a scalar where a vector is expected. All vector elements get the same value.
efficiency: good for constant. Medium for variable as parameter

Example:
Vec4i a, b;
a = Vec4i(5); // explicit conversion. a = (5,5,5,5)
b = a + 3; // implicit conversion to Vec4i. b = (8,8,8,8)

Implicit conversion is convenient in the example b = a + 3, which adds 3 to all elements of the vector. Use explicit conversion where there is ambiguity about the desired vector type.

method: conversion between signed and unsigned integer vectors
defined for: all integer vector classes
description: signed <-> unsigned conversion can be done implicitly or explicitly. Overflow and underflow wrap around
efficiency: good

Example:
Vec4i a(-1,0,1,2); // signed vector
Vec4ui b = a; // implicit conversion to unsigned. b = (0xFFFFFFFF,0,1,2)
Vec4ui c = Vec4ui(a); // same, with explicit conversion
Vec4i d = c; // convert back to signed

method: conversion between different integer vector types
defined for: all integer vector classes
description: conversion can be done implicitly or explicitly between all integer vector classes with the same total number of bits. This conversion does not change any bits, just the grouping of bits into elements is changed
efficiency: good

Example:
Vec8s a(0,1,2,3,4,5,6,7);
Vec4i b = Vec4i(a); // b = (0x10000, 0x30002, 0x50004, 0x70006)

method: reinterpret_d, reinterpret_f, reinterpret_i
defined for: all vector classes
description: reinterprets a vector as a different type without changing any bits. reinterpret_d is used for converting to Vec2d or Vec4d, reinterpret_f is used for converting to Vec4f or Vec8f, reinterpret_i is used for converting to any integer vector type
efficiency: good

Example:
Vec4f a(1.0f, 1.5f, 2.0f, 2.5f);
Vec4i b = reinterpret_i(a); // b = (0x3F800000, 0x3FC00000, 0x40000000, 0x40200000)
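Both the conversion between integer vector types and the reinterpret functions leave the bits untouched and only change how they are grouped or interpreted. The scalar sketch below illustrates this with std::memcpy; it is not how the library does it, the function names regroup_16_to_32 and float_bits are hypothetical, and the element order shown assumes a little-endian machine such as x86.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// View eight 16-bit elements as four 32-bit elements without changing any bits.
// On a little-endian machine, element 2*i becomes the low half of result i.
inline std::array<uint32_t, 4> regroup_16_to_32(const std::array<int16_t, 8>& a) {
    static_assert(sizeof(std::array<int16_t, 8>) == sizeof(std::array<uint32_t, 4>),
                  "both views must have the same total number of bits");
    std::array<uint32_t, 4> b{};
    std::memcpy(b.data(), a.data(), sizeof(b));
    return b;
}

// Return the bit pattern of a float as an unsigned 32-bit integer.
inline uint32_t float_bits(float x) {
    static_assert(sizeof(float) == sizeof(uint32_t), "float is assumed to be 32 bits");
    uint32_t u;
    std::memcpy(&u, &x, sizeof(u));
    return u;
}
```

Because no arithmetic is involved, such conversions cost little or nothing, which is consistent with the efficiency rating of good.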

method: Vec4i round_to_int(Vec4f)
        Vec4i round_to_int(Vec2d, Vec2d)
        Vec8i round_to_int(Vec8f)
        Vec4i round_to_int(Vec4d)
defined for: all floating point vector classes
description: rounds floating point numbers to nearest integer and returns integer vector. (where two integers are equally near, the even integer is returned)
efficiency: medium

Example:
Vec4f a(1.0f, 1.5f, 2.0f, 2.5f);
Vec4i b = round_to_int(a); // b = (1,2,2,2)
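The tie-breaking rule (round half to even) differs from the "round half up" rule many people expect. A scalar sketch using the standard library shows the same behavior, assuming the floating point environment is in its default round-to-nearest mode. The function names are hypothetical, and plain truncation is included only for contrast.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <cstdint>

// Round to nearest integer; ties go to the even integer.
inline int32_t round_nearest_even(float x) {
    assert(std::fegetround() == FE_TONEAREST);   // default rounding mode is assumed
    return static_cast<int32_t>(std::nearbyint(x));
}

// Truncate towards zero, as a C++ float-to-int cast does.
inline int32_t truncate_toward_zero(float x) {
    return static_cast<int32_t>(x);
}
```

Here 1.5 and 2.5 both round to 2, whereas truncation gives 1 and 2.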

method: Vec2q round_to_int64(Vec2d)
        Vec4q round_to_int64(Vec4d)
defined for: Vec2d, Vec4d
description: rounds floating point numbers to nearest integer and returns integer vector. (where two integers are equally near, the even integer is returned)
efficiency: poor

Example:
Vec4d a(1.0, 1.5, 2.0, 2.5);
Vec4q b = round_to_int64(a); // b = (1,2,2,2)

method: Vec4i truncate_to_int(Vec4f)
        Vec4i truncate_to_int(Vec2d, Vec2d)
        Vec8i truncate_to_int(Vec8f)
        Vec4i truncate_to_int(Vec4d)
defined for: all floating point vector classes
description: truncates floating point numbers towards zero and returns integer vector.
efficiency: medium

Example:
Vec4f a(1.0f, 1.5f, 2.0f, 2.5f);
Vec4i b = truncate_to_int(a); // b = (1,1,2,2)

method: Vec2q truncate_to_int64(Vec2d)
        Vec4q truncate_to_int64(Vec4d)
defined for: Vec2d, Vec4d
description: truncates floating point numbers towards zero and returns integer vector.
efficiency: poor

Example:
Vec4d a(1.0, 1.5, 2.0, 2.5);
Vec4q b = truncate_to_int64(a); // b = (1,1,2,2)

method: Vec4f to_float(Vec4i)
        Vec8f to_float(Vec8i)
defined for: Vec4i, Vec8i
description: converts integers to single precision float
efficiency: medium

Example:
Vec4i a(0, 1, 2, 3);
Vec4f b = to_float(a); // b = (0.0f, 1.0f, 2.0f, 3.0f)

method: Vec4d to_double(Vec4i)
defined for: Vec4i
description: converts 32-bit integers to double precision float
efficiency: medium

Example:
Vec4i a(0, 1, 2, 3);
Vec4d b = to_double(a); // b = (0.0, 1.0, 2.0, 3.0)

method: Vec2d to_double(Vec2q)
        Vec4d to_double(Vec4q)
defined for: Vec2q, Vec4q
description: converts 64-bit integers to double precision float
efficiency: poor

Example:
Vec2q a(0, 1);
Vec2d b = to_double(a); // b = (0.0, 1.0)

method: Vec2d to_double_low(Vec4i)
        Vec2d to_double_high(Vec4i)
defined for: Vec4i
description: converts 32-bit integers to double precision float
efficiency: medium

Example:
Vec4i a(0, 1, 2, 3);
Vec2d b = to_double_low(a); // b = (0.0, 1.0)
Vec2d c = to_double_high(a); // c = (2.0, 3.0)

method: concatenating vectors
defined for: all 128-bit vector classes
description: two 128-bit vectors can be concatenated into one 256-bit vector of the corresponding type by calling a constructor
efficiency: good

Example:
Vec4i a(10,11,12,13);
Vec4i b(20,21,22,23);
Vec8i c(a, b); // c = (10,11,12,13,20,21,22,23)

method: get_low, get_high
defined for: all 256-bit vector classes
description: one 256-bit vector can be split into two 128-bit vectors by calling the methods get_low and get_high
efficiency: good

Example:
Vec8i a(10,11,12,13,14,15,16,17);
Vec4i b = a.get_low(); // b = (10,11,12,13)
Vec4i c = a.get_high(); // c = (14,15,16,17)

method: extend_low, extend_high
defined for: Vec16c, Vec16uc, Vec8s, Vec8us, Vec4i, Vec4ui, Vec32c, Vec32uc, Vec16s, Vec16us, Vec8i, Vec8ui
description: extends integers to a larger number of bits per element. Unsigned integers are zero-extended, signed integers are sign-extended.
efficiency: good

Example:
Vec8s a(-2, -1, 0, 1, 2, 3, 4, 5);
Vec4i b = extend_low(a); // b = (-2, -1, 0, 1)
Vec4i c = extend_high(a); // c = (2, 3, 4, 5)

method: extend_low, extend_high
defined for: Vec4f, Vec8f
description: extends single precision floating point numbers to double precision
efficiency: good

Example:
Vec4f a(1.0f, 1.1f, 1.2f, 1.3f);
Vec2d b = extend_low(a); // b = (1.0, 1.1)
Vec2d c = extend_high(a); // c = (1.2, 1.3)

method: compress
defined for: Vec8s, Vec8us, Vec4i, Vec4ui, Vec2q, Vec2uq, Vec16s, Vec16us, Vec8i, Vec8ui, Vec4q, Vec4uq
description: reduces integers to a lower number of bits per element. Overflow and underflow wrap around
efficiency: medium

Example:
Vec4i a(10, 11, 12, 13);
Vec4i b(20, 21, 22, 23);
Vec8s c = compress(a, b); // c = (10,11,12,13,20,21,22,23)

method: compress
defined for: Vec2d, Vec4d
description: reduces double precision floating point numbers to single precision
efficiency: medium

Example:
Vec2d a(1.0, 1.1);
Vec2d b(2.0, 2.1);
Vec4f c = compress(a, b); // c = (1.0f, 1.1f, 2.0f, 2.1f)

method: compress_saturated
defined for: Vec8s, Vec8us, Vec4i, Vec4ui, Vec2q, Vec2uq, Vec16s, Vec16us, Vec8i, Vec8ui, Vec4q, Vec4uq
description: reduces integers to a lower number of bits per element. Overflow and underflow saturate
efficiency: medium (worse than compress in most cases)

Example:
Vec4i a(10, 11, 12, 13);
Vec4i b(20, 21, 22, 23);
Vec8s c = compress_saturated(a, b); // c = (10,11,12,13,20,21,22,23)
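The examples above use values that fit in 16 bits, so compress and compress_saturated give the same result. The difference only shows when a value is out of range. The following scalar sketch for a single 32-bit to 16-bit element illustrates the two policies; narrow_wrap and narrow_saturate are hypothetical names, not library functions.

```cpp
#include <cassert>
#include <cstdint>

// Wrap-around: keep only the low 16 bits.
// The unsigned-to-signed cast is modular on all mainstream compilers
// (and guaranteed to be so from C++20 on).
inline int16_t narrow_wrap(int32_t x) {
    return static_cast<int16_t>(static_cast<uint16_t>(x));
}

// Saturation: clamp to the nearest representable 16-bit value.
inline int16_t narrow_saturate(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return static_cast<int16_t>(x);
}
```

For the input 40000, wrap-around gives -25536 while saturation gives 32767. Saturation is often the safer choice for signal or image data, where a wrapped value would show up as a large error.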

Special applications
3-dimensional vectors
The header file "vector3d.h" in the subfolder named "special" defines 3-dimensional vectors for use in geometry and physics. Vector classes defined in vector3d.h:

vector class   precision   elements per vector   total bits   recommended instruction set
Vec3f          single      3                     128          SSE3
Vec3d          double      3                     256          AVX

These vector classes are actually using vector registers that can hold 4 floats or 4 doubles, respectively. The last element in the vector register is not used. Most operators and functions are similar to those of Vec4f and Vec4d. A constructor with the three coordinates is defined:

method: constructor with 3 elements as parameter
defined for: Vec3f, Vec3d
description: contents is initialized with x, y, z coordinates

Note that some operators and functions inherited from Vec4f and Vec4d make little or no sense. For example, the > operator will make a not very useful element-by-element comparison rather than comparing vector lengths:
Vec3f a(10,11,12);
Vec3f b(12,11,10);
Vec4fb c = a > b; // c = (false,false,true,false)
bool d = vector_length(a) > vector_length(b); // d = false

Member functions:

member function: get_x(), get_y(), get_z()
defined for: Vec3f, Vec3d
description: extract a single coordinate

member function: extract(index)
defined for: Vec3f, Vec3d
description: extracts coordinate x, y or z for index = 0, 1 or 2, respectively

Arithmetic operators:

operators: +, -, *, /
defined for: Vec3f, Vec3d
description: element-by-element operation

Comparison operators:

operators: ==, !=
defined for: Vec3f, Vec3d
description: returns a boolean telling if vectors are equal or not equal. The unused fourth element is ignored.

There are several different ways to multiply 3-dimensional vectors:

operator: vector * vector
defined for: Vec3f, Vec3d
description: element-by-element multiplication

operator: vector * scalar, scalar * vector
defined for: Vec3f, Vec3d
description: all elements are multiplied by the scalar

function: dot_product(vector, vector)
defined for: Vec3f, Vec3d
description: returns the dot-product as a scalar

function: cross_product(vector, vector)
defined for: Vec3f, Vec3d
description: returns the cross-product as a vector perpendicular to the two input vectors

Other functions:

function: vector_length(vector)
defined for: Vec3f, Vec3d
description: returns the length as a scalar

function: normalize_vector(vector)
defined for: Vec3f, Vec3d
description: returns a vector with unit length and same direction as the input vector

function: rotate(vector c0, vector c1, vector c2, vector a)
defined for: Vec3f, Vec3d
description: rotates vector a by multiplying with the matrix defined by the columns (c0,c1,c2). (If the rotation matrix is defined by rows then it must first be transposed to get the column vectors, see page 86 for an example).

Example:
Vec3f a(11,22,33);
Vec3f c0(0,1,0), c1(0,0,1), c2(1,0,0);
Vec3f b = rotate(c0, c1, c2, a); // b = (22,33,11)

function: to_single
defined for: Vec3d
description: converts to Vec3f

function: to_double
defined for: Vec3f
description: converts to Vec3d

Complex number vectors

The header file "complexvec.h" in the subfolder named "special" defines classes for complex numbers and complex vectors for use in mathematics and electronics. Classes defined in complexvec.h:

vector class   precision   complex numbers per vector   total bits   recommended instruction set
Complex2f      single      1                            128          SSE2
Complex4f      single      2                            128          SSE2
Complex8f      single      4                            256          AVX
Complex2d      double      1                            128          SSE2
Complex4d      double      2                            256          AVX

The class Complex2f uses the lower half of a 128-bit register, while the upper half of the register is unused. The other complex classes use a full 128-bit or 256-bit register. The minimum instruction set is SSE2. The performance of multiplication is improved by compiling for the SSE3 instruction set. The performance of multiplication and division is improved by compiling for the FMA3 or FMA4 instruction set.

Constructors:

method: default constructor
defined for: all complex classes
description: contents is not initialized

method: constructor with real and imaginary parts
defined for: all complex classes
description: all elements are initialized with real and imaginary parts

Example:
Complex4f a(1.0f, 2.0f, 3.0f, 4.0f); // a = (1+2i, 3+4i)

method: constructor with one real and one imaginary part
defined for: all complex classes
description: all elements are initialized with the same (real,imaginary) pair

method: constructor with one real part only
defined for: all complex classes
description: all elements are initialized with the same real number. The imaginary parts are set to zero

method: constructor with one Complex2f or Complex2d
defined for: Complex4f, Complex8f, Complex4d
description: all elements are initialized with the same (real,imaginary) pair

method: constructor with two Complex4f or four Complex2f
defined for: Complex8f
description: vectors are concatenated

Member functions:

method: load
defined for: all complex classes
description: all elements are initialized from a float or double array containing alternating real and imaginary parts

Example:
double x[4] = {1.0, 2.0, 3.0, 4.0};
Complex4d a;
a.load(x); // a = (1+2i, 3+4i)

method: get_low, get_high
defined for: Complex4f, Complex8f, Complex4d
description: get lower or upper half of the vector as a Complex2f, Complex4f, Complex2d, respectively

method: extract(index)
defined for: all complex classes
description: extract a single real or imaginary part. index = 0 gives real part of first element, index = 1 gives imaginary part of first element, etc.

Operators:

operators: +, +=, -, -=, unary minus, *, *=, /, /=
defined for: all complex classes
description: arithmetic functions between two complex numbers:
(a+i*b) + (c+i*d) = (a+c) + i*(b+d)
(a+i*b) - (c+i*d) = (a-c) + i*(b-d)
(a+i*b) * (c+i*d) = (a*c-b*d) + i*(a*d+b*c)
(a+i*b) / (c+i*d) = ((a*c+b*d) + i*(b*c-a*d)) / (c*c+d*d)
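The multiplication and division formulas above can be checked with a small scalar sketch. This only restates the formulas for a single complex number and says nothing about how the library arranges the work in vector registers; the struct Cplx and the functions cmul and cdiv are hypothetical names.

```cpp
#include <cassert>

struct Cplx { double re, im; };   // hypothetical scalar complex number a + i*b

// (a+i*b) * (c+i*d) = (a*c-b*d) + i*(a*d+b*c)
inline Cplx cmul(Cplx x, Cplx y) {
    return { x.re * y.re - x.im * y.im,
             x.re * y.im + x.im * y.re };
}

// (a+i*b) / (c+i*d) = ((a*c+b*d) + i*(b*c-a*d)) / (c*c+d*d)
inline Cplx cdiv(Cplx x, Cplx y) {
    const double n = y.re * y.re + y.im * y.im;
    assert(n != 0.0);             // division by zero is not handled in this sketch
    return { (x.re * y.re + x.im * y.im) / n,
             (x.im * y.re - x.re * y.im) / n };
}
```

For example, (1+2i) * (3+4i) gives -5+10i, and dividing that product by 3+4i returns 1+2i. Each product needs four real multiplications, which fits the remark that fused multiply-and-add instructions improve multiplication and division.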

Operators combining complex and real:

operators: +, +=, -, -=, *, *=, /, /=
defined for: all complex classes
description: arithmetic functions between a complex number and a real:
(a+i*b) + c = (a+c) + i*b
(a+i*b) - c = (a-c) + i*b
(a+i*b) * c = (a*c) + i*(b*c)
(a+i*b) / c = (a/c) + i*(b/c)
c / (a+i*b) = ((a*c) - i*(b*c)) / (a*a+b*b)

Complex conjugate:

operators: ~
defined for: all complex classes
description: complex conjugate of all vector elements: ~(a+i*b) = (a-i*b)

Comparison operators:

operators: ==, !=
defined for: all complex classes
description: returns a boolean for Complex2f and Complex2d. Returns a boolean vector for Complex4f, Complex8f, Complex4d. The output can be used in the select function

Functions:

function: abs
defined for: all complex classes
description: abs(a+i*b) = sqrt(a*a+b*b)

function: sqrt
defined for: all complex classes
description: square root of complex number

function: select
defined for: all complex classes
description: selects between the elements of two vectors

Example:
Complex4f a(1,2,3,4);
Complex4f b(1,2,5,6);
Complex4f c = select(a == b, Complex4f(0), b); // c = (0+i*0, 5+i*6)

function: to_single
defined for: Complex2d, Complex4d
description: converts to Complex2f, Complex4f

function: to_double
defined for: Complex2f, Complex4f
description: converts to Complex2d, Complex4d

function: cexp
defined for: all complex classes
description: complex exponential function: cexp(a+i*b) = exp(a)*(cos(b)+i*sin(b)). For best performance, include vectormath.h before complexvec.h and use Intel SVML library as explained on page 32.

Quaternions
The header file "quaternion.h" in the subfolder named "special" defines classes for quaternions (hypercomplex numbers) for use in mathematics and geometry. Classes defined in quaternion.h:

vector class    precision   quaternions per vector   total bits   recommended instruction set
Quaternion4f    single      1                        128          SSE2
Quaternion4d    double      1                        256          AVX

Constructors:

method: default constructor
defined for: Quaternion4f, Quaternion4d
description: contents is not initialized

method: constructor with real and imaginary parts
defined for: Quaternion4f, Quaternion4d
description: initialized with real and imaginary parts

Example:
Quaternion4f a(1.0f, 2.0f, 3.0f, 4.0f); // a = (1 + 2*i + 3*j + 4*k)

method: constructor with one real part only
defined for: Quaternion4f, Quaternion4d
description: initialized with the real number. The imaginary parts are set to zero

method: constructor with two Complex2f or two Complex2d
defined for: Quaternion4f, Quaternion4d
description: The quaternion is constructed from two complex numbers: Quaternion((a+b*i),(c+d*i)) = (a+b*i) + (c+d*i)*j = a+b*i+c*j+d*k

method: constructor with vector
defined for: Quaternion4f(Vec4f), Quaternion4d(Vec4d)
description: The four vector elements go into the real part and the three imaginary parts

method: constructor from 3-dimensional vector
defined for: Quaternion4f(Vec3f), Quaternion4d(Vec3d)
description: (x,y,z) is converted to (x*i+y*j+z*k). Conversion from quaternion to 3-dimensional vector is also possible. The cross_product function for Vec3f and Vec3d corresponds to the operator * for Quaternion4f and Quaternion4d. Note that these conversions are only available if vector3d.h is included before quaternion.h

Member functions:

method: load(pointer)
defined for: Quaternion4f, Quaternion4d
description: The quaternion is read from a float or double array containing the real part followed by the imaginary parts

method: store(pointer)
defined for: Quaternion4f, Quaternion4d
description: The quaternion is stored as four values in a float or double array

method: get_low(), get_high()
defined for: Quaternion4f, Quaternion4d
description: Split the quaternion into two complex numbers. q = q.get_low() + q.get_high()*j

method: real()
defined for: Quaternion4f, Quaternion4d
description: Get the real part as a float or double

method: imag()
defined for: Quaternion4f, Quaternion4d
description: Get the imaginary parts, with the real part set to zero

method: extract(index)
defined for: Quaternion4f, Quaternion4d
description: extract a single real or imaginary part. index = 0 gives real part of first element, index = 1 gives first imaginary part, etc.

method: to_vector()
defined for: Quaternion4f, Quaternion4d
description: Convert to a vector Vec4f or Vec4d containing the real part and the imaginary parts.

Operators:

operators: +, +=, -, -=, unary minus, *, *=, /, /=
defined for: Quaternion4f, Quaternion4d
description: Arithmetic functions between two quaternions. Multiplication is not commutative. Division of quaternions is ambiguous. Here, division is defined as q / r = q * reciprocal(r).
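To make the non-commutativity and the division convention concrete, here is a scalar sketch of the standard Hamilton product together with q / r = q * reciprocal(r). It uses the textbook definition of the reciprocal (conjugate divided by squared norm), which is an assumption about the mathematics rather than a statement about the library's source code. The struct Quat and the functions qmul, qreciprocal and qdiv are hypothetical names.

```cpp
#include <cassert>

struct Quat { double a, b, c, d; };   // hypothetical scalar quaternion a + b*i + c*j + d*k

// Hamilton product, using i*i = j*j = k*k = i*j*k = -1
inline Quat qmul(const Quat& p, const Quat& q) {
    return { p.a*q.a - p.b*q.b - p.c*q.c - p.d*q.d,
             p.a*q.b + p.b*q.a + p.c*q.d - p.d*q.c,
             p.a*q.c - p.b*q.d + p.c*q.a + p.d*q.b,
             p.a*q.d + p.b*q.c - p.c*q.b + p.d*q.a };
}

// reciprocal(q) = conjugate(q) / (a*a + b*b + c*c + d*d)
inline Quat qreciprocal(const Quat& q) {
    const double n = q.a*q.a + q.b*q.b + q.c*q.c + q.d*q.d;
    assert(n != 0.0);                 // the zero quaternion has no reciprocal
    return { q.a / n, -q.b / n, -q.c / n, -q.d / n };
}

// Division as defined in the text: q / r = q * reciprocal(r)
inline Quat qdiv(const Quat& q, const Quat& r) {
    return qmul(q, qreciprocal(r));
}
```

With this definition i*j = k but j*i = -k, so the order of the operands matters. The other possible convention for division, reciprocal(r) * q, would in general give a different result, which is why the text calls quaternion division ambiguous.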

Operators combining quaternion and real:

operators: +, +=, -, -=, *, *=, /, /=
defined for: Quaternion4f, Quaternion4d
description: Arithmetic functions between a quaternion and a real

Complex conjugate:

operators: ~
defined for: Quaternion4f, Quaternion4d
description: The conjugate is defined as ~(a+b*i+c*j+d*k) = (a-b*i-c*j-d*k)

Comparison operators:

operators: ==, !=
defined for: Quaternion4f, Quaternion4d
description: returns a boolean for Complex2f and Complex2d. Returns a boolean vector for Complex4f, Complex8f, Complex4d. The output can be used in the select function

Functions:

function: abs
defined for: Quaternion4f, Quaternion4d
description: abs(a+i*b) = sqrt(a*a+b*b)

function: sqrt
defined for: Quaternion4f, Quaternion4d
description: square root of complex number

function: select
defined for: Quaternion4f, Quaternion4d
description: selects between two quaternions

function: to_single
defined for: Quaternion4d
description: converts Quaternion4d to Quaternion4f

function: to_double
defined for: Quaternion4f
description: converts Quaternion4f to Quaternion4d

Instruction sets and CPU dispatching

Almost every new generation of microprocessors has a new extension to the instruction set. Most of the new instructions relate to vector operations. We can take advantage of these new instructions to make vector code more efficient. The vector class library requires the SSE2 instruction set as a minimum, but it makes more efficient code when a higher instruction set is used. The following table indicates things that are improved for each successive instruction set extension.

Instruction set   Year introduced   Functions that are improved
SSE2              2001              minimum requirement for vector class library
SSE3              2004              floating point horizontal_add
SSSE3             2006              permute, blend and lookup functions, integer horizontal_add, integer abs
SSE4.1            2007              select, blend, horizontal_and, horizontal_or, integer max/min, integer multiply (32 and 64 bit), integer divide (32 bit), 64-bit integer compare (==, !=), floating point round, truncate, floor, ceil.
SSE4.2            2008              64-bit integer compare (>, >=, <, <=). 64 bit integer max, min
AVX               2011              all operations on 256-bit floating point vectors: Vec8f, Vec4d
XOP (AMD only)    2011              on 128-bit integer vectors: compare, horizontal_add_x, rotate_left, blend, lookup
FMA4 (AMD only)   2011              floating point code containing multiplication followed by addition
FMA3              2012              floating point code containing multiplication followed by addition
AVX2              2013              All operations on 256-bit integer vectors: Vec32c, Vec32uc, Vec16s, Vec16us, Vec8i, Vec8ui, Vec4q, Vec4uq

The vector class library makes it possible to compile for different instruction sets from the same source code by using preprocessing branches. Different versions are made simply by recompiling the code with different compiler options. The desired instruction set can be specified on the compiler command line as follows:

Instruction set   Gnu compiler                   Intel compiler Linux   Intel compiler Windows   MS compiler
SSE2              -msse2                         -msse2                 /arch:sse2               /arch:sse2
SSE3              -msse3                         -msse3                 /arch:sse3               /arch:sse2 -D__SSE3__
SSSE3             -mssse3                        -mssse3                /arch:ssse3              /arch:sse2 -D__SSSE3__
SSE4.1            -msse4.1                       -msse4.1               /arch:sse4.1             /arch:sse2 -D__SSE4_1__
SSE4.2            -msse4.2                       -msse4.2               /arch:sse4.2             /arch:sse2 -D__SSE4_2__
AVX               -mavx -fabi-version=0          -mavx                  /arch:avx                /arch:avx
XOP               -mavx -mxop -fabi-version=0    not available          not available            /arch:avx -D__XOP__
FMA4              -mfma4                         not available          not available            not available
FMA3              -mfma                          -mfma                  /Qfma                    not available
AVX2              -mavx2 -fabi-version=0         -mavx2                 /arch:avx2               /arch:avx -D__AVX2__

The Microsoft compiler supports only a few of the instruction sets, but the remaining instruction sets can be specified as defines which are detected in the preprocessing directives of the vector class library.

The FMA3 and FMA4 instruction sets are not handled directly by any code in the vector class library, but by the compiler. The compiler will automatically combine a floating point multiplication and a subsequent addition or subtraction into a single instruction.

There is no advantage in using the 256-bit floating point vector classes (Vec8f, Vec4d) unless the AVX instruction set is specified, but it can be convenient to use these classes anyway if the same source code is used with and without AVX. Each 256-bit vector will simply be split up into two 128-bit vectors when compiling without AVX. Likewise, a 256-bit integer vector (e.g. Vec8i) will be split up into two 128-bit vectors when compiling without AVX2.

It is recommended to make an automatic CPU dispatcher that detects at runtime which instruction sets are supported by the actual CPU and operating system, and selects the best version of the code accordingly. For example, you may compile the code three times for the three different instruction sets: SSE2, SSE4.1 and AVX. The CPU dispatcher should then set a function pointer to point to the appropriate version. You can use the function instrset_detect (see below, page 72) to detect the supported instruction set.

The file dispatch_example.cpp shows an example of how to make a CPU dispatcher that selects the appropriate code version. The critical part of the program is called through a function pointer. This function pointer initially points to the CPU dispatcher, which is activated the first time the function is called. The CPU dispatcher changes the function pointer to point to the best version of the code, and then continues in the selected code. The next time the function is called, the call goes directly to the right version of the code without calling the CPU dispatcher first.

It is probably not necessary to make a branch for instruction sets prior to SSE2 because old computers without SSE2 are rarely in use today, and certainly not for demanding applications.

The AVX2 instruction set is not yet available in any CPU (July 2012), but it is supported in the compilers. It is not recommended to make automatic CPU dispatching for AVX2 until it can be properly tested, because the AVX2 compiler support is not fully stable yet. (The AVX2 support in the vector class library has been tested with Intel's emulator).

There is an important restriction when you are combining code compiled for different instruction sets: Do not transfer any data as vectors between different pieces of code that are compiled for different instruction sets, because the vectors may be represented differently under the different instruction sets. More specifically, 256-bit floating point vectors are represented differently when compiled with and without AVX, and 256-bit integer vectors are represented differently when compiled with and without AVX2. It is recommended to transfer the data as arrays instead between different parts of the program that are compiled for different instruction sets.

The following functions, defined in the file instrset_detect.cpp, can be used for detecting at runtime which instruction set is supported.

function:    int instrset_detect(void)
description: returns one of these values:
             0 = 80386 instruction set
             1 or above = SSE supported by CPU (not testing for O.S. support)
             2 or above = SSE2
             3 or above = SSE3
             4 or above = Supplementary SSE3 (SSSE3)
             5 or above = SSE4.1
             6 or above = SSE4.2
             7 or above = AVX supported by CPU and O.S.
             8 or above = AVX2
efficiency:  poor

function:    bool hasFMA3(void)
description: returns true if FMA3 is supported
efficiency:  poor

function:    bool hasFMA4(void)
description: returns true if FMA4 is supported
efficiency:  poor

function:    bool hasXOP(void)
description: returns true if XOP is supported
efficiency:  poor
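The dispatching scheme described above can be sketched in plain C++ without the library. This is only a minimal sketch and not the contents of dispatch_example.cpp: the names detect_level, sum_generic, sum_fast, sum_dispatch and sum_ptr are hypothetical, and detect_level is a stub that stands in for instrset_detect so that the sketch compiles on its own. In a real program, each version of the critical function would be placed in a separate source file compiled for its own instruction set.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for instrset_detect(). A real program would call
// the function from instrset_detect.cpp instead. This stub reports SSE2.
int detect_level() { return 2; }

// Two versions of the same critical function. In practice each version
// would be compiled separately for a different instruction set.
float sum_generic(const float* p, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; i++) s += p[i];
    return s;
}
float sum_fast(const float* p, std::size_t n) {
    return sum_generic(p, n);          // placeholder for e.g. an AVX build
}

float sum_dispatch(const float* p, std::size_t n);   // forward declaration

// The critical function is called through this pointer.
// It initially points to the dispatcher.
float (*sum_ptr)(const float*, std::size_t) = sum_dispatch;

// Dispatcher: runs on the first call only, selects a version,
// and then continues in the selected code.
float sum_dispatch(const float* p, std::size_t n) {
    sum_ptr = (detect_level() >= 7) ? sum_fast : sum_generic;
    assert(sum_ptr != sum_dispatch);   // must not point back to the dispatcher
    return sum_ptr(p, n);
}
```

Note that the data are passed as a plain array, not as a vector object, in line with the restriction on transferring vectors between code compiled for different instruction sets.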

Performance considerations
Comparison of alternative methods for writing SIMD code
The SIMD (Single Instruction Multiple Data) instructions play an important role when software performance has to be optimized. Several different ways of writing SIMD code are discussed below.

Assembly code
Assembly programming is the ultimate way of optimizing code. Almost everything is possible in assembly code, but it is quite tedious and error-prone. There are far more than a thousand different instructions, and it is quite difficult to remember which instruction belongs to which instruction set extension. Assembly code is difficult to document, difficult to debug and difficult to maintain.

Intrinsic functions
Several compilers support intrinsic functions that are direct representations of machine instructions. A big advantage of using intrinsic functions rather than assembly code is that the compiler takes care of register allocation, function calling conventions and other details which are often difficult to keep track of when writing assembly code. Another advantage is that the compiler can optimize the code further by such methods as scheduling, interprocedural optimization, function inlining, constant propagation, common subexpression elimination, loop invariant code motion, induction variables, etc. Such optimizations are not always used in assembly code because they make the code unwieldy and unmanageable. Consequently, the combination of intrinsic code and a good optimizing compiler can often produce more efficient code than what a decent assembly programmer would do.

A disadvantage of intrinsic functions is that these functions have long names that are difficult to remember and which make the code look awkward.

Intel vector classes
Intel has published a number of vector classes in the form of three C++ header files named fvec.h, dvec.h and ivec.h. These are simpler to use than the intrinsic functions, but unfortunately the Intel vector classes provide only the most basic functionality, and Intel has done very little to promote, support or develop them. The Intel vector classes have no way of converting data between arrays and vectors. This leaves us with no way of putting data into a vector other than specifying each element separately, which pretty much destroys the advantage of using vectors. The Intel vector classes work only with Intel and MS compilers.

This vector class library
The present vector class library has several important features, listed on page 3. It provides the same level of optimization as the intrinsic functions, but it is much easier to use. This makes it possible to make optimal use of the SIMD instructions without the need to remember the 1000+ different instructions or intrinsic functions. It also takes away the hassle of remembering which instruction belongs to which instruction set extension and making different code versions for different instruction sets.

Automatic vectorization
A good optimizing compiler is able to automatically transform linear code to vector code in simple cases. Typically, a good compiler will vectorize an algorithm that loops through an array and does some calculations on each array element.

Automatic vectorization is the easiest way of generating SIMD code, and I would recommend using this method when it works. Automatic vectorization may fail or produce suboptimal code in the following cases:

• when the algorithm is too complex
• when data have to be re-arranged in order to fit into vectors and it is not obvious to the compiler how to do this or when other parts of the code need to be changed to handle the re-arranged data
• when it is not known to the compiler which data sets are bigger or smaller than the vector size
• when it is not known to the compiler whether the size of a data set is a multiple of the vector size or not
• when the algorithm involves calls to functions that are defined elsewhere or cannot be inlined and which are not readily available in vector versions
• when the algorithm involves branches that are not easily vectorized
• when floating point operations have to be reordered or transformed and it is not known to the compiler whether these transformations are permissible with respect to precision, overflow, etc.

The present vector class library is intended as a good alternative when automatic vectorization fails to produce optimal code for any of these reasons.

Choice of compiler and function libraries

The vector class library has support for the following three compilers:

Microsoft Visual Studio
This is a very popular compiler for Windows because it has a good and user friendly IDE (Integrated Development Environment). Make sure you are compiling for the "unmanaged" version, i.e. not using the .net framework. The Microsoft compiler optimizes reasonably well, but not as well as the Intel and Gnu compilers.

Intel Studio / Intel Composer
This compiler optimizes very well. Intel also provides some of the best optimized function libraries for mathematical and other purposes. Unfortunately, the Intel compilers and some of the function libraries favor Intel CPUs, and often produce code that runs slower than necessary on CPUs of any other brand than Intel. It is possible to work around this limitation for the Intel function libraries and in some cases also for the compiler. See my blog and my C++ manual for details. Intel's compilers are available for Windows, Linux and Mac platforms.

Gnu C++ compiler
This compiler produced the best optimizations in my tests. The g++ compiler is available for all x86 and x86-64 platforms. The math functions in the glibc library are currently not fully optimized.

Choosing the optimal vector size and precision

The time it takes to make a vector operation such as addition or multiplication typically depends on the total number of bits in the vector rather than the number of elements. For example, it takes the same time to make a vector addition with vectors of four single precision floats (Vec4f) as with vectors of two double precision floats (Vec2d). Likewise, it takes the same time to add two integer vectors whether the vectors have four 32-bit integers (Vec4i) or eight 16-bit integers (Vec8s). Therefore, it is advantageous to use the lowest precision or resolution that fits the data. It may even be worthwhile to modify a floating point algorithm to reduce loss of precision if this allows you to use single precision vectors rather than double precision vectors. However, you should also take into account the time it takes to convert data from one precision to another. Therefore, it is not good to mix different precisions.

The total vector size is either 128 bits or 256 bits. Whether it is advantageous to use the biggest vector size depends on the instruction set. The 256-bit floating point vectors (Vec8f and Vec4d) are only advantageous when the AVX instruction set is available and enabled. The 256-bit integer vectors (Vec32c, Vec16s, Vec8i, Vec4q, etc.) are only advantageous under the AVX2 instruction set.

Putting data into vectors


The different ways of putting data into vectors are listed on page 7. If the vector elements are constants known at compile time, then the fastest way is to use a constructor:

    Vec4i a(1);             // a = (1, 1, 1, 1)
    Vec4i b(2, 3, 4, 5);    // b = (2, 3, 4, 5)

If the vector elements are not constants then the fastest way is to load from an array with the method load or load_a. However, it is not good to load data from an array immediately after writing the data elements to the array one by one, because this causes a "store forwarding stall" (see my microarchitecture manual). This is illustrated in the following examples:
    // Example 1. Make vector with constructor
    int MakeMyData(int i);             // make whatever data we need
    void DoSomething(Vec4i & data);    // handle these data
    const int datasize = 1000;         // total number data elements
    ...
    for (int i = 0; i < datasize; i += 4) {
        Vec4i d(MakeMyData(i), MakeMyData(i+1),
                MakeMyData(i+2), MakeMyData(i+3));
        DoSomething(d);
    }

    // Example 2. Load from small array
    int MakeMyData(int i);             // make whatever data we need
    void DoSomething(Vec4i & data);    // handle these data
    const int datasize = 1000;         // total number data elements
    ...
    for (int i = 0; i < datasize; i += 4) {
        int data4[4];
        for (int j = 0; j < 4; j++) {
            data4[j] = MakeMyData(i+j);
        }
        // store forwarding stall here!
        Vec4i d = Vec4i().load(data4);
        DoSomething(d);
    }

    // Example 3. Make array a little bigger
    int MakeMyData(int i);             // make whatever data we need
    void DoSomething(Vec4i & data);    // handle these data
    const int datasize = 1000;         // total number data elements
    ...
    for (int i = 0; i < datasize; i += 8) {
        int data8[8];
        for (int j = 0; j < 8; j++) {
            data8[j] = MakeMyData(i+j);
        }
        Vec4i d;
        for (int k = 0; k < 8; k += 4) {
            d.load(data8 + k);
            DoSomething(d);
        }
    }

    // Example 4. Make array full size
    int MakeMyData(int i);             // make whatever data we need
    void DoSomething(Vec4i & data);    // handle these data
    const int datasize = 1000;         // total number data elements
    ...
    int data1000[datasize];
    int i;
    for (i = 0; i < datasize; i++) {
        data1000[i] = MakeMyData(i);
    }
    Vec4i d;
    for (i = 0; i < datasize; i += 4) {
        d.load(data1000 + i);
        DoSomething(d);
    }

In example 1, we are combining four data elements into vector d by calling a constructor with four parameters. This may not be the most efficient way because it requires several instructions to combine the four numbers into a single vector.

In example 2, we are putting the four values into an array and then loading the array into a vector. This is causing a so-called store forwarding stall. A store forwarding stall occurs in the CPU hardware when doing a large read (here 128 bits) immediately after a smaller write (here 32 bits) to the same address range. This causes a delay of 10 - 20 clock cycles.

In example 3, we are putting eight values into an array and then reading four elements at a time. If we assume that it takes more than 10 - 20 clock cycles to call MakeMyData four times then the first four elements of the array will have sufficient time to make it into the level-1 cache while we are writing the next four elements. This delay is sufficient to avoid the store forwarding stall.

In example 4, we are putting a thousand elements into an array before loading them. This is certain to avoid the store forwarding stall.

Examples 3 and 4 are likely to be the best solutions. A disadvantage of example 3 is that we need an extra loop. A disadvantage of example 4 is that the large array takes more cache space.

When the data size is not a multiple of the vector size

It is obviously easier to vectorize a data set when the number of elements in the data set is a multiple of the vector size. Here, we will discuss different ways of handling the situation when the data do not fit into a whole number of vectors. We will use the simple example of adding 134 integers stored in an array.

1. handling the remaining data one by one

    const int datasize = 134;
    const int vectorsize = 8;
    const int regularpart = datasize & (-vectorsize);   // = 128
    // (AND-ing with -vectorsize will round down to nearest
    // lower multiple of vectorsize. This works only if
    // vectorsize is a power of 2)
    int mydata[datasize];
    ...                                 // initialize mydata
    Vec8i sum1(0), temp;
    int i;
    // loop for 8 numbers at a time
    for (i = 0; i < regularpart; i += vectorsize) {
        temp.load(mydata+i);            // load 8 elements
        sum1 += temp;                   // add 8 elements
    }
    int sum = 0;
    // loop for the remaining 6 numbers
    for (; i < datasize; i++) {
        sum += mydata[i];
    }
    sum += horizontal_add(sum1);        // add the vector sum

2. handling the remaining data with a smaller vector size

    const int datasize = 134;
    const int vectorsize = 8;
    const int regularpart = datasize & (-vectorsize);   // = 128
    int mydata[datasize];
    ...                                 // initialize mydata
    Vec8i sum1(0), temp;
    int sum = 0;
    int i;
    // loop for 8 numbers at a time
    for (i = 0; i < regularpart; i += vectorsize) {
        temp.load(mydata+i);            // load 8 elements
        sum1 += temp;                   // add 8 elements
    }
    sum = horizontal_add(sum1);         // sum of first 128 numbers
    if (datasize - i >= 4) {
        // get four more numbers
        Vec4i sum2;
        sum2.load(mydata+i);
        i += 4;
        sum += horizontal_add(sum2);
    }
    // loop for the remaining 2 numbers
    for (; i < datasize; i++) {
        sum += mydata[i];
    }

3. use partial load for the last vector

    const int datasize = 134;
    const int vectorsize = 8;
    int mydata[datasize];
    ...                                 // initialize mydata
    Vec8i sum1(0), temp;
    // loop for 8 numbers at a time
    for (int i = 0; i < datasize; i += vectorsize) {
        if (datasize - i >= vectorsize) {
            temp.load(mydata+i);        // load 8 elements
        }
        else {
            // load the last 6 elements
            temp.load_partial(datasize-i, mydata+i);
        }
        sum1 += temp;                   // add 8 elements
    }
    int sum = horizontal_add(sum1);     // vector sum

4. read past the end of the array and ignore excess data

    const int datasize = 134;
    const int vectorsize = 8;
    int mydata[datasize];
    ...                                 // initialize mydata
    Vec8i sum1(0), temp;
    // loop for 8 numbers at a time, reading 136 numbers
    for (int i = 0; i < datasize; i += vectorsize) {
        temp.load(mydata+i);            // load 8 elements
        if (datasize - i < vectorsize) {
            // set excess data to zero
            // (this is faster than load_partial)
            temp.cutoff(datasize - i);
        }
        sum1 += temp;                   // add 8 elements
    }
    int sum = horizontal_add(sum1);     // vector sum

5. make array bigger and set excess data to zero

    const int datasize = 134;
    const int vectorsize = 8;
    // round up datasize to 136
    const int arraysize = (datasize + vectorsize - 1) & (-vectorsize);
    int mydata[arraysize];
    int i;
    ...                                 // initialize mydata
    // set excess data to zero
    for (i = datasize; i < arraysize; i++) {
        mydata[i] = 0;
    }
    Vec8i sum1(0), temp;
    // loop for 8 numbers at a time, reading 136 numbers
    for (i = 0; i < arraysize; i += vectorsize) {
        temp.load(mydata+i);            // load 8 elements
        sum1 += temp;                   // add 8 elements
    }
    int sum = horizontal_add(sum1);     // vector sum
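The rounding arithmetic used in the five cases above can be checked in isolation. The following is a minimal sketch in plain C++ that does not use the vector class library; the helper names round_down, round_up and remainder_count are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// Round n down to the nearest multiple of vectorsize.
// Works only if vectorsize is a power of 2 and n is not negative.
int round_down(int n, int vectorsize) {
    assert(vectorsize > 0 && (vectorsize & (vectorsize - 1)) == 0);
    return n & (-vectorsize);
}

// Round n up to the nearest multiple of vectorsize (same precondition).
int round_up(int n, int vectorsize) {
    return round_down(n + vectorsize - 1, vectorsize);
}

// Number of leftover elements that do not fill a whole vector.
int remainder_count(int n, int vectorsize) {
    return n - round_down(n, vectorsize);
}
```

With datasize = 134 and vectorsize = 8 this gives a regular part of 128 elements, 6 remaining elements, and a padded array size of 136, which matches the numbers in the comments of the examples.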

It is clearly advantageous to increase the array size to a multiple of the vector size, as in case 5 above. Likewise, if you are storing vector data to an array, then it is an advantage to make the result array bigger to hold the excess data. If this is not possible then use store_partial to write the last partial vector to the array.

It is usually possible to read past the end of an array, as in case 4 above, without causing problems. However, there is a theoretical possibility that the array is placed at the very end of the readable data area so that the program will crash when attempting to read from an illegal address past the end of the valid data area. To consider this problem, we need to look at each possible method of data storage:

a) An array declared inside a function, and not static, is stored on the stack. The subsequent addresses on the stack will contain the return address and parameters for the function, followed by local data, parameters, and return address of the next higher function all the way up to main. In this case there is plenty of extra data to read from.

b) A static or global array is stored in static data memory. The static data area is often followed by library data, exception handler tables, link tables, etc. These tables can be seen by requesting a map file from the linker.

c) Data allocated with the operator new are stored on the heap. I have no information of the size of the end node in a heap.

d) If an array is declared inside a class definition then case (a), (b) or (c) above applies, depending on how the class instance (object) is created.

These problems can be avoided either by making the array bigger or by aligning the array to an address divisible by 16 for 128-bit vectors or divisible by 32 for 256-bit vectors. The memory page size is at least 4 kbytes, and always a power of 2. If the array is aligned by the vector size (16 or 32) then the page boundaries are certain to coincide with vector boundaries. This makes sure that there is no memory page boundary between the end of the array and the next vector-size boundary. Therefore, we can read up to the next vector-size boundary without the risk of crossing a boundary to an invalid memory page.

A further advantage of aligning the array by 16 or 32 is that reading and writing vectors from an aligned array may be faster.

To align an array by 16 in Windows, write:

    __declspec(align(16)) int mydata[134];

In unix-like systems, write:

    int mydata[134] __attribute__((aligned(16)));

It is always recommended to align large arrays for performance reasons if the code uses vectors. Unfortunately, it may be more complicated to align arrays created with operator new.
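The page-boundary argument above can be verified with a little address arithmetic. The sketch below is an illustration only and makes two assumptions that are not part of the original text: it uses the standard C++11 alignas specifier instead of the compiler-specific keywords, and the helper names is_aligned and overread_stays_on_page are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// C++11 alternative to the compiler-specific alignment keywords (assumption:
// a C++11 compiler is available).
alignas(16) int mydata[134];

// True if p is aligned by 'alignment' bytes. alignment must be a power of 2.
bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// An array starts at address 'base' and holds 'nbytes' bytes of valid data.
// Whole vectors of 'vecbytes' bytes are read from 'base' until all valid
// data are covered. Returns true if the last byte read is on the same
// memory page as the last valid byte.
bool overread_stays_on_page(std::uintptr_t base, std::uintptr_t nbytes,
                            std::uintptr_t vecbytes, std::uintptr_t pagesize) {
    std::uintptr_t last_valid = base + nbytes - 1;
    std::uintptr_t read_size  = (nbytes + vecbytes - 1) & ~(vecbytes - 1);
    std::uintptr_t last_read  = base + read_size - 1;
    return last_valid / pagesize == last_read / pagesize;
}
```

For example, an array of 134 integers (536 bytes) that ends just before a 4096-byte page boundary is read safely when it starts at the aligned address 3552, but the read crosses into the next page when the array starts at the unaligned address 3560.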

Using multiple accumulators

Consider this function which adds a long list of floating point numbers:

    double add_long_list(double const * p, int n) {
        int n1 = n & (-4);              // round down n to multiple of 4
        Vec4d sum(0.0);
        int i;
        for (i = 0; i < n1; i += 4) {
            sum += Vec4d().load(p + i); // add 4 numbers
        }
        // add any remaining numbers
        sum += Vec4d().load_partial(n - i, p + i);
        return horizontal_add(sum);
    }

In this example, we have a loop-carried dependency chain (see my C++ manual). The vector addition inside the loop has a latency of typically 3 - 5 clock cycles. As each addition has to wait for the result of the previous addition, the loop will take 3 - 5 clock cycles per iteration. However, the throughput of floating point additions is typically one vector addition per clock cycle. Therefore, we are far from fully utilizing the capacity of the floating point adder. In this situation, we can double the speed by using two accumulators:

    double add_long_list(double const * p, int n) {
        int n2 = n & (-8);              // round down n to multiple of 8
        Vec4d sum1(0.0), sum2(0.0);
        int i;
        for (i = 0; i < n2; i += 8) {
            sum1 += Vec4d().load(p + i);       // add 4 numbers
            sum2 += Vec4d().load(p + i + 4);   // 4 more numbers
        }
        if (n - i >= 4) {
            // add 4 more numbers
            sum1 += Vec4d().load(p + i);
            i += 4;
        }
        // add any remaining numbers
        sum2 += Vec4d().load_partial(n - i, p + i);
        return horizontal_add(sum1 + sum2);
    }

Here, the addition to sum2 can begin before the addition to sum1 is finished. The loop still takes 3 - 5 clock cycles per iteration, but the number of additions done per loop iteration is doubled. It may even be worthwhile to have three or four accumulators in this case if n is very big.

In general, if we want to predict whether it is advantageous to have more than one accumulator, we first have to see if there is a loop-carried dependency chain. If the performance is not limited by a loop-carried dependency chain then there is no need for multiple accumulators. Next, we have to look at the latency and throughput of the instructions inside the loop. Floating point addition, subtraction and multiplication all have latencies of typically 3 - 5 clock cycles and a throughput of one vector addition or subtraction plus one vector multiplication per clock cycle. Therefore, if the loop-carried dependency chain involves floating point addition, subtraction or multiplication, and the total number of floating point operations per loop iteration is lower than the maximum throughput, then it may be advantageous to have two accumulators, or perhaps more than two.

There is rarely any reason to have multiple accumulators in integer code, because an integer vector addition has a latency of just 1 or 2 clock cycles.
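The idea of splitting one dependency chain into two does not depend on vectors. The following scalar sketch in plain C++ shows the same transformation without the vector class library; the function names add_one_acc and add_two_acc are hypothetical. Whether the second version is actually faster depends on the CPU and on whether the compiler has already reordered the additions, so this should be measured rather than assumed.

```cpp
#include <cassert>
#include <cstddef>

// One accumulator: every addition has to wait for the previous one.
double add_one_acc(const double* p, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; i++) sum += p[i];
    return sum;
}

// Two accumulators: two independent dependency chains that can overlap.
double add_two_acc(const double* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    double sum1 = 0.0, sum2 = 0.0;
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        sum1 += p[i];        // chain 1
        sum2 += p[i + 1];    // chain 2, independent of chain 1
    }
    if (i < n) sum1 += p[i]; // one element left over if n is odd
    return sum1 + sum2;
}
```

Note that the two versions add the numbers in a different order. The results can therefore differ slightly in the last bits because floating point addition is not associative.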

Using multiple threads

Performance can be improved by dividing the work between multiple threads on processors with multiple CPU cores. This technique is outside the scope of the present manual.

The vector class library is thread-safe as long as the same vector is not accessed from multiple threads simultaneously. The floating point control word (see p. 30) is not shared between threads.

Error conditions
Runtime errors
The vector class library generally does not produce runtime error messages. An index that is out of range produces behavior that is implementation-dependent. This means that the behavior may be different for different instruction sets or for different versions of the vector class library. For example, an attempt to read a vector element with an index that is out of range may result in various behaviors, such as producing zero, taking the index modulo the vector size, giving the last element, or producing an arbitrary value. Likewise, an attempt to write a vector element with an index that is out of range may variously take the index modulo the vector size, write the last element, or do nothing. This applies to functions such as insert, extract, load_partial, store_partial, cutoff, permute, blend and lookup. The same applies to a bit-index that is out of range in functions like set_bit, get_bit, rotate, and shift operators (<<, >>).

The only allowed values for a Boolean vector element are 0 (false) and -1 (true). The behavior for other values is implementation dependent and possibly inconsistent. For example, the behavior of the select function when the Boolean selector input is a mixture of 0 and 1 bits depends on the instruction set. For instruction sets prior to SSE4.1, it will select between the operands bit-by-bit. For SSE4.1 and higher it will select integer vectors byte-by-byte, using the leftmost bit of each byte in the selector input. For floating point vectors under SSE4.1 and higher, it will use only the leftmost bit (sign bit) of the selector.

An integer division by a variable that is zero will usually produce a runtime exception.

A floating point overflow will usually produce infinity, floating point underflow produces zero, and an invalid floating point operation may produce not-a-number (NAN). Floating point exceptions can occur only if exceptions are unmasked.
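The floating point behavior described above is the same for scalar code and can be observed without the library. The following is a minimal sketch in plain C++, assuming IEEE 754 arithmetic, masked floating point exceptions (the usual default), and no fast-math compiler options. The function names are hypothetical. The volatile qualifiers are there only to prevent the compiler from evaluating the expressions at compile time or in higher precision.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Overflow: twice the largest finite float gives +infinity.
float make_overflow() {
    assert(std::numeric_limits<float>::is_iec559);   // IEEE 754 assumed
    volatile float big = std::numeric_limits<float>::max();
    volatile float r = big * 2.0f;
    return r;
}

// Underflow: the square of the smallest normal float is too small
// to represent, even as a denormal number, and becomes zero.
float make_underflow() {
    volatile float tiny = std::numeric_limits<float>::min();
    volatile float r = tiny * tiny;
    return r;
}

// Invalid operation: zero divided by zero gives not-a-number.
float make_invalid() {
    volatile float zero = 0.0f;
    volatile float r = zero / zero;
    return r;
}
```

The results can be tested with std::isinf and std::isnan from the standard header cmath.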

Compile-time errors
Integer vector division by a const_int or const_uint can produce a compile-time error message when the divisor is zero or out of range. The error message may not be as informative as we could wish, due to the limitations of template metaprogramming. The error message may possibly contain the text "Static_error_check<false>".

Combination of incompatible vector classes, or other syntax errors, produce compile-time error messages. These error messages may be quite long and confusing due to overloading and templates, but they generally indicate the line number of the error.

"error C2719: formal parameter with __declspec(align('16')) won't be aligned". The Microsoft compiler cannot handle vectors as function parameters. The easiest solution is to change the parameter to a const reference, e.g.:

    Vec4f my_function(Vec4f const & x) {
        ...
    }

Link errors
"unresolved external symbol __intel_cpu_indicator". This link error occurs when you are using Intel's SVML library without including a CPU dispatcher. Link in the library libircmt.lib to use Intel's own CPU dispatch function for Intel processors, or use an object file from the asmlib library under "inteldispatchpatch" for best performance on all brands of processors. See my blog and my C++ manual for details.

File list
file name: purpose

VectorClass.pdf: instructions (this file)
vectorclass.h: top-level C++ header file. This will include several other header files, according to the indicated instruction set.
instrset.h: detection of which instruction set the code is compiled for, and various common definitions. Included by vectorclass.h
vectori128.h: defines classes, operators and functions for integer vectors with a total size of 128 bits. Included by vectorclass.h
vectori256.h: defines classes, operators and functions for integer vectors with a total size of 256 bits for the AVX2 instruction set. Included by vectorclass.h if appropriate
vectori256e.h: defines classes, operators and functions for integer vectors with a total size of 256 bits for instruction sets lower than AVX2. Included by vectorclass.h if appropriate
vectorf128.h: defines classes, operators and functions for floating point vectors with a total size of 128 bits. Included by vectorclass.h
vectorf256.h: defines classes, operators and functions for floating point vectors with a total size of 256 bits for the AVX and later instruction sets. Included by vectorclass.h if appropriate
vectorf256e.h: defines classes, operators and functions for floating point vectors with a total size of 256 bits for instruction sets lower than AVX. Included by vectorclass.h if appropriate
special/vectormath.h: optional header file for mathematical vector function libraries
special/decimal.h: optional header file for conversion of integer vectors to decimal and hexadecimal ASCII number strings and vice versa
special/vector3d.h: optional header file for 3-dimensional vectors
special/complexvec.h: optional header file for complex numbers and complex vectors
special/quaternion.h: optional header file for quaternions
instrset_detect.cpp: optional functions for detecting which instruction set is supported at runtime
dispatch_example.cpp: example of how to make automatic CPU dispatching
license.txt: Gnu general public license
changelog.txt: change log

Examples
This example calculates the polynomial 2*x^2 - 5*x + 1 on a floating point vector. The function parameter x is declared as a const reference in order to avoid alignment problems in the Microsoft compiler. The parameters a, b and c are declared static so that they don't need to be initialized at every function call.

    Vec4f polynomial(Vec4f const & x) {
        static const Vec4f a(2.0f), b(-5.0f), c(1.0f);
        return (a * x + b) * x + c;
    }

The next example transposes a 4x4 matrix.

    void transpose(float matrix[4][4]) {
        Vec8f row01, row23, col01, col23;
        // load first two rows
        row01.load(&matrix[0][0]);
        // load next two rows
        row23.load(&matrix[2][0]);
        // reorder into columns
        col01 = blend8f<0,4, 8,12,1,5, 9,13>(row01, row23);
        col23 = blend8f<2,6,10,14,3,7,11,15>(row01, row23);
        // store columns into rows
        col01.store(&matrix[0][0]);
        col23.store(&matrix[2][0]);
    }

The next example multiplies two 4x4 matrices.

    void matrixmul(float A[4][4], float B[4][4], float M[4][4]) {
        // calculates M = A*B
        Vec4f Brow[4], Mrow[4];
        int i, j;
        // load B as rows
        for (i = 0; i < 4; i++) {
            Brow[i].load(&B[i][0]);
        }
        // loop for A and M rows
        for (i = 0; i < 4; i++) {
            Mrow[i] = Vec4f(0.0f);
            // loop for A columns, B rows
            for (j = 0; j < 4; j++) {
                Mrow[i] += Brow[j] * A[i][j];
            }
        }
        // store M
        for (i = 0; i < 4; i++) {
            Mrow[i].store(&M[i][0]);
        }
    }
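The row-wise formulation used in the matrix multiplication example can be written out in scalar code, which may be useful as a reference when testing a vectorized version. This is only a sketch in plain C++ under the assumption that the output matrix does not overlap the input matrices; the function name matrixmul_scalar is hypothetical.

```cpp
#include <cassert>

// Scalar reference: M = A*B, built row by row.
// Row i of M is the sum over j of (row j of B) multiplied by A[i][j].
void matrixmul_scalar(float A[4][4], float B[4][4], float M[4][4]) {
    assert(M != A && M != B);    // output must not overlap the inputs
    for (int i = 0; i < 4; i++) {
        for (int k = 0; k < 4; k++) M[i][k] = 0.0f;
        for (int j = 0; j < 4; j++) {
            // this inner loop over k corresponds to one vector
            // multiplication and addition on a row of four floats
            for (int k = 0; k < 4; k++) {
                M[i][k] += B[j][k] * A[i][j];
            }
        }
    }
}
```

The point of this formulation is that each row of B is used as a whole, so no transposition or horizontal addition is needed.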

The next example makes a table of the sin function and gets sin(x) and cos(x) by table lookup.

    #include <math.h>
    #ifndef M_PI            // define pi if not defined
    #define M_PI 3.14159265358979323846
    #endif

    // length of table. Must be a power of 2.
    #define sin_tablelen 1024
    // the accuracy of table lookup is +/- pi/sin_tablelen

    class SinTable {
    protected:
        float table[sin_tablelen];
        float resolution;
        float rres;         // 1./resolution
    public:
        SinTable();         // constructor
        Vec4f sin(Vec4f const & x);
        Vec4f cos(Vec4f const & x);
    };

    SinTable::SinTable() {  // constructor
        // compute resolution
        resolution = 2.0 * M_PI / sin_tablelen;
        rres = 1.0f / resolution;
        // initialize table (no need to use vectors
        // here because this is calculated only once)
        for (int i = 0; i < sin_tablelen; i++) {
            table[i] = sinf((float)i * resolution);
        }
    }

    Vec4f SinTable::sin(Vec4f const & x) {
        // calculate sin by table lookup
        Vec4i index = round_to_int(x * rres);
        // modulo tablelen equivalent to modulo 2*pi
        index &= sin_tablelen - 1;
        // look up in table
        return lookup<sin_tablelen>(index, table);
    }

    Vec4f SinTable::cos(Vec4f const & x) {
        // calculate cos by table lookup
        Vec4i index = round_to_int(x * rres) + sin_tablelen/4;
        // modulo tablelen equivalent to modulo 2*pi
        index &= sin_tablelen - 1;
        // look up in table
        return lookup<sin_tablelen>(index, table);
    }

    int main() {
        SinTable sintab;
        Vec4f a(0.0f, 0.5f, 1.0f, 1.5f);
        Vec4f b = sintab.sin(a);
        // b = (0.0000 0.4768 0.8416 0.9973)
        // accuracy +/- 0.003
        ...
        return 0;
    }
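The index arithmetic in the table lookup example can be tried out with a scalar version that does not need the vector class library. The sketch below is an illustration under stated assumptions, not part of the library: the names ScalarSinTable, tablelen and pi are hypothetical, and std::lround is used as a stand-in for round_to_int, so values exactly halfway between two table entries may be rounded differently.

```cpp
#include <cassert>
#include <cmath>

const int tablelen = 1024;                    // must be a power of 2
const double pi = 3.14159265358979323846;

struct ScalarSinTable {
    float table[tablelen];
    float resolution;                         // angle step between entries
    float rres;                               // 1 / resolution

    ScalarSinTable() {
        static_assert((tablelen & (tablelen - 1)) == 0,
                      "tablelen must be a power of 2");
        resolution = float(2.0 * pi / tablelen);
        rres = 1.0f / resolution;
        for (int i = 0; i < tablelen; i++) {
            table[i] = std::sin(float(i) * resolution);
        }
    }
    float sin(float x) const {
        assert(std::isfinite(x));
        int index = int(std::lround(x * rres));
        index &= tablelen - 1;                // modulo tablelen = modulo 2*pi
        return table[index];
    }
    float cos(float x) const {
        assert(std::isfinite(x));
        // cos(x) = sin(x + pi/2), i.e. a quarter of the table further on
        int index = int(std::lround(x * rres)) + tablelen / 4;
        index &= tablelen - 1;
        return table[index];
    }
};
```

The AND operation with tablelen - 1 also handles negative indexes correctly on two's complement machines, which is why the table length must be a power of 2.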
