
DigitalResources

Electronic Survey Report 2013-010

Linguistic Comparison of
Semai Dialects

Timothy C. Phillips

SIL International
2013

SIL Electronic Survey Report 2013-010, July 2013


© 2013 Timothy C. Phillips and SIL International
All rights reserved

Abstract
The Semai language of peninsular Malaysia is rich in diversity: each Semai village speaks its own variety.
This report1,2 documents the various Semai dialects and explores the relationships between them, with
regard to choice of words as well as the phonological changes that affect the pronunciation of each.
Systematic analysis shows the dialects to be different enough from each other that it is unlikely that any
one dialect can be adequately understood by all speakers. Thus, in the strictest sense of the terms, it may
be argued that the Semai varieties are a cluster of closely related languages rather than dialects.
Following local convention, however, the term dialect is maintained in this paper. This report lays the
foundation for establishing which dialect or dialects could be used to provide an optimal means of
communication across all the Semai dialects.

1
This research was carried out under the auspices of the Economic Planning Unit (EPU), part of the Prime Minister's
Department, Malaysia. My Malaysian counterpart was Dr Wong Bee Eng, lecturer at Universiti Putra Malaysia. The
original report was filed with the EPU in 2005. This version has been revised for publication in 2013.
2
I would like to thank the people of Malaysia for welcoming me and my family to their beautiful country and for their
patience as we have struggled to learn their cultures and languages. I would especially like to thank Dr Wong Bee Eng
for being willing to oversee my research and for her interest in the project. I would also like to thank the JHEOA for its
gracious assistance in opening up its research library and helping me to gain access to even the most remote areas. I
also must thank the many colleagues who suffered through early drafts of this paper offering their suggestions, and
especially Cal Rensch who spent enormous effort helping me fine-tune the final draft. Finally, I wish most of all to
thank the Semai people themselves who so willingly shared their language, their time, and often even their homes and
food. It is my deepest hope that the information presented in this report will be able to help in the preservation and
development of the Semai language and culture.

Contents

Abstract
1 Introduction
1.1 Background
1.2 Preservation of the Semai language
1.3 The contribution of Semai to historical linguistics
2 Methodology
2.1 Collection of wordlists
2.2 Language assistant questionnaires
2.3 Linguistic comparison and analysis
3 Results
3.1 Basic phonology
3.1.1 Words
3.1.2 Syllables
3.1.3 Basic inventory: consonants
3.1.4 Basic inventory: vowels
3.1.5 Deviations from basic inventory: the preploded nasals
3.1.6 Deviations from basic inventory: shifted vowels
3.1.7 Deviations from basic inventory: possible merged vowel
3.1.8 Deviations from basic inventory: diphthongs
3.1.9 Deviations from basic inventory: resyllabification
3.1.10 Deviations from basic inventory: other vowel differences
3.2 Shared word families
3.3 Basic lexical comparison
3.3.1 Simple lexical similarity
3.3.2 Average lexical similarity
3.3.3 Limitations of lexical similarity analysis
3.4 Shared phonological changes
3.4.1 Phonological changes
3.4.2 Other phonological changes
3.4.3 Summary of phonological changes
3.4.4 Aggregate phonological changes
3.4.5 Summary of phonological changes
3.4.6 Limitations of phonological change analysis
3.5 Comparative reconstruction of Proto-Semai
3.5.1 Independent phonological changes
3.5.2 The Malay peninsula as a linguistic area
3.5.3 Genetic relationships of Semai dialects
3.5.4 Reconstruction of Proto-Semai lexical items
3.6 Word borrowings
3.7 Some other observations
3.7.1 Language preservation and vitality
3.7.2 Morphology reduplication
4 Conclusions
4.1 Language vitality
4.2 The complex dialect situation
5 Recommendations for further research

Appendix A: Map of dialects sampled
Appendix B: Wordlist employed in study
Appendix C: Language assistant questionnaires
Appendix D: Isoglosses as determined by phonological changes
Appendix E: Proto-Semai lexical items
Appendix F: Phonology of Semai, Betau Dialect
References

1 Introduction
1.1 Background
The languages of the Orang Asli in Malaysia are classified into two groups: Aslian and Proto-Malay. The
Aslian languages form a branch of the Mon-Khmer language family and geographically range through
most of the Malay Peninsula. Aslian is divided into three groups: Northern, Central, and Southern. The
following languages make up the Central Aslian group: Semai, Temiar, Lanoh, Semnam, and Sabüm.3 The
target of this research is Semai, ISO code [sea], the Aslian language that has the largest number of
speakers (Lewis 2009).
[Tree diagram with the nodes Mon-Khmer, Aslian, North Aslian, Jah Hut, Central Aslian (Semnam, Lanoh, Temiar, Semai), and South Aslian.]

Figure 1. Aslian languages of Malaysia.4


The speakers of Semai are located primarily in the Malaysian states of Perak and Pahang.
Traditionally, these people were swidden agriculturists living in the dense jungle, often moving their
places of residence as well as moving their gardens. The Semai do hunting and gathering as well to
supplement their diets. In more recent years, some Semai have resettled near towns and taken jobs as
laborers, resulting in a more sedentary lifestyle.

3
No Sabüm speakers have been located in many years; it is likely extinct. Historical documents indicate that Sabüm
was quite close to Semnam (Phillips, forthcoming).
4
This chart is derived from Phillips (forthcoming).

Figure 2. Approximate location of Semai on the Malay peninsula.


The exact number of speakers is difficult to ascertain because many Semai continue to move about in
the deeper forest areas, and some others have moved to towns. However, the total number of Semai
speakers has recently been estimated at 42,383.5

1.2 Preservation of the Semai language


Semai speakers are proud of their language and culture and have sought to preserve their language and
culture despite interacting with other peoples for centuries. The importance of preserving indigenous
languages has received much attention in recent years at the local, national, and even international level.
The arguments for being involved in such preservation include the safeguarding of linguistic diversity,
contributing to a knowledge base for language universals, and the belief that knowledge in and of itself
is valuable. The languages of the Orang Asli in Malaysia should be considered a national treasure.
For the Semai language to be preserved, it must be studied and documented. The Semai language has
been studied by only a few researchers, and while some quality work has been done,6 the Semai
language remains largely undocumented. This project collected wordlists representative of many of the
dialects of Semai.7 The wordlists were compared linguistically in order to determine how similar the
various dialects are. In order to preserve a written record of the Semai dialects, copies of the compiled
wordlists have been turned over to the Jabatan Hal Ehwal Orang Asli as well as the Economic Planning
Unit in the Prime Minister's Department.

5
According to population data for the year 2008 as provided by the JHEOA and displayed at the Orang Asli Museum
in Gombak, Selangor.
6
For example, Gérard Diffloth spent many years studying the Semai language and has written a number of papers
regarding some of his findings.
7
Diffloth (1977) has estimated there are more than forty dialects of Semai.

Ultimately, for the usage of Semai to be preserved, some form of standardization will need to take
place so that important decisions, such as orthography, can be effectively made. One of the key questions
regards determining the optimal dialect or dialects that allow adequate communication with all speakers
of Semai. Identification, documentation, and systematic comparison of the Semai dialects are critical first
steps for standardizing Semai.

1.3 The contribution of Semai to historical linguistics


The Semai language, true to its Mon-Khmer heritage, has a rich set of vowels, nearly thirty when
counting all the nasal and length features. Furthermore, as Diffloth (1976a) has noted, Semai has
preserved a number of disyllabic and polysyllabic words, features that have largely been lost in other
Mon-Khmer languages in Southeast Asia. Thus the Semai people, as well as other speakers of Aslian
languages in Malaysia, have much to offer humanity as we endeavor to reconstruct the history of the
Mon-Khmer languages.
It is hoped that the documentation and the reconstruction of the Semai ancestor language in this
report will help to further such efforts.

2 Methodology
2.1 Collection of wordlists
A wordlist of 436 words was constructed, including words from the basic 200 Swadesh wordlist, words
that are typical of Southeast Asian languages, and words that are culturally and linguistically specific to
the speakers of Central Aslian languages. The items in the wordlist were arranged by semantic categories
and listed in Malay and English.
This wordlist was then used to elicit words from twenty-seven dialects of Semai. Dialects were
selected based on a combination of information gleaned from existing literature on Semai and from
asking the Semai themselves which areas spoke dialects different from their own. The following table
shows the locations of the dialects selected for this research. A map showing the geographic locations of
these villages is shown in Appendix A.

Table 1. Wordlist locations

Kampung           District             State
Batu 17           Batang Padang        Perak
Bidor             Batang Padang        Perak
Cluny             Batang Padang        Perak
Chinggung         Batang Padang        Perak
Rasau             Batang Padang        Perak
Sungkai           Batang Padang        Perak
Sungai Bil        Batang Padang        Perak
Tapah             Batang Padang        Perak
Gopeng            Kinta                Perak
Kampar            Kinta                Perak
Bota              Perak Tengah         Perak
Tangkai Cermin    Perak Tengah         Perak
Cenan Cerah       Cameron Highlands    Pahang
Relong            Cameron Highlands    Pahang
Sungai Ruil       Cameron Highlands    Pahang
Renglas           Cameron Highlands    Pahang
Terisu            Cameron Highlands    Pahang
Bertang           Lipis                Pahang
Cherong           Lipis                Pahang
Betau             Lipis                Pahang
Kuala Kenip       Lipis                Pahang
Lanai             Lipis                Pahang
Serau             Lipis                Pahang
Pagar             Lipis                Pahang
Simoi             Lipis                Pahang
Pos Buntu         Raub                 Pahang

The wordlists were generally elicited using direct questioning in Bahasa Malaysia (Malay). Once the
complete list was elicited, the data were rearranged according to the similar phonetic segments
encountered. The list was then rechecked. By grouping the elicited words together according to similar
sounds (for instance, all the words containing front vowels were put together), it was easier to hear the
often-subtle differences between similar sounds.
In some cases a recording was also made of the same Semai speaker pronouncing the words that had
just been elicited. These recordings were quite helpful in clearing up remaining inconsistencies later
discovered in the elicited words, and thus often avoided the need to return to the same village for further
checking.
The elicited wordlists were then used to determine the degree of linguistic similarity between
dialects. The comparison of wordlists was used to determine the number of phonetically similar lexical
items, to discover word families, to identify phonological changes in order to establish the linguistic
relationship between the speech communities, and finally, to propose a reconstruction of several
hundred lexical items for proto-Semai.

2.2 Language assistant questionnaires


Questionnaires were administered to many of the language assistants who gave the wordlists. The
questions were mainly developed in order to help establish the reliability of the data. Information
gathered from questionnaires was also used to determine which dialects still needed to be sampled,
dialects that had not initially been selected based on the literature search.

2.3 Linguistic comparison and analysis


A variety of comparisons and analyses were carried out on the data. These included the following:
- Establishing the phonemes for each dialect
- Comparing the number of lexically similar items in each wordlist
- Determining phonological changes in each wordlist
- Correlating phonological changes and postulating a family tree for the Semai dialects
- Reconstructing lexical forms of proto-Semai
- Estimating the percentage of borrowed words for each dialect

3 Results
This section presents the results of the analytical methods used to understand the relationships that exist
among the varieties of Semai represented by the wordlists collected during the current research. Despite
the simplicity of the data collection method, there is a great wealth of information that can be drawn
from the data.
First, although the Semai language has many dialects and variations, it is useful to consider the
basic, overall phonology of the language. The various dialects are then compared to this basic phonology
(see section 3.1 Basic Phonology).
Second, an attempt is made to discern dialect boundaries by comparing shared word families; that is,
looking for dialects that could be linked together by unique sets of related words (see section 3.2 Shared
Word Families).
Third, the wordlists are lexically compared to show the degrees of lexical similarity that exist
among the different varieties of Semai (see section 3.3 Basic Lexical Comparison).
Fourth, consistent phonological changes from the norm are compared as another reflection of dialect
boundaries (see section 3.4 Shared Phonological Changes).
Fifth, the historical comparative method is used to postulate a family tree for the Semai dialects and
to reconstruct several hundred Proto-Semai words (see section 3.5 Comparative Reconstruction of Proto-Semai).
Lastly, some observations are made on topics that were observed but not rigorously investigated as
part of this research (see section 3.7 Some Other Observations).

3.1 Basic phonology


This section presents a generalized description of words, syllables, and inventory of sounds in the Semai
language.8

3.1.1 Words

Semai words, in good Mon-Khmer tradition, fit the following syllable template:

(C3 V2 (C4))    C1 V1 C2
   Minor          Major

8
For a more complete summary of the phonology of one dialect, see Appendix F.

The final syllable is regarded as the major syllable; the penultimate syllable, if present, is regarded as the
minor syllable. Semai words always have ultimate stress; that is, on the major syllable. While many
Semai words have only one syllable, the majority of Semai words have two syllables. The minor vowel V2
is usually very short, nonphonemic, epenthetic [], and its enunciation in any given word is often
optional if the two consonants are easily pronounced without the epenthetic vowel. For this reason
Semai roots are sometimes called sesquisyllabic since the minor syllable does not carry the same
weight, phonetically or phonemically, as the major syllable. The following forms, examples from the
Tapah dialect, are illustrative of Semai word shapes.
/liip/9      to swallow
/.kuu/       thunder
/mt/         eye
/s.lec/      smooth
/m.nii/      rain
/mr.s/       tiger

There are a number of words that have minor syllables with minor vowel segments (V2) other than
[]; namely [], [i], and [u]. Diffloth (1968, 1976a) has shown that these segments may be
phonologically conditioned in some cases and morphemes in other cases.10 The following examples are
from the Bota dialect.
/m.muh/      to bathe
/s.miiw/     bear
/pi.nuuy/    wind (n)
/ti.ii/      snake
/ku.re/      to dig
/ku.rool/    knee

It is noteworthy that when the minor vowel segment (V2) is [], [i], or [u], it is pronounced with
greater length than when V2 is [], roughly equal in length with V1 in the major syllable when V1 is not a
long vowel. Stress remains on the ultimate syllable.
Also worth noting is that a reduced set of consonants are found in the C4 position; namely, /r/, /l/,
/m/, /n/, // and //. However, the voiceless stops (/p/, /t/, /c/ and /k/) and fricatives (/s/ and /h/)
do occur due to infixation, reduplication, and compound words.
While most roots are apparently either mono-, sesqui-, or disyllabic, there are examples of words
with three syllables. The antepenultimate syllable acts as a minor syllable. The following examples, and
for the rest of this section, are from the Terisu dialect.
/t.m.ii/     forehead
/b.r.poo/    to dream
/b.l.r/      green (an expressive11)
/hi.bt.bt/   is sleeping
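
The minor and major syllable roles described above can be illustrated with a short sketch. The Python below is only an illustration and not part of the original analysis: it assumes that forms are written with a period between syllables, as in the examples above, and the function names `syllable_roles` and `word_shape` are hypothetical.

```python
def syllable_roles(form):
    """Split a dotted phonemic form into (syllable, role) pairs.

    Assumption: syllables are separated by "." as in the examples above.
    The final syllable is the major (stressed) one; all others are minor.
    """
    syllables = form.strip("/").split(".")
    last = len(syllables) - 1
    return [
        (syllable, "major" if index == last else "minor")
        for index, syllable in enumerate(syllables)
    ]


def word_shape(form):
    """Return the number of syllables in a dotted form."""
    return len(syllable_roles(form))


print(syllable_roles("/liip/"))     # [('liip', 'major')]
print(syllable_roles("/ku.rool/"))  # [('ku', 'minor'), ('rool', 'major')]
print(word_shape("/pi.nuuy/"))      # 2
```

Because stress always falls on the final syllable, the role of each syllable follows from its position alone; no stress marking is needed in the input.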

9
In this paper short vowels are represented by a single letter, and long vowels by a double letter. The latter is a
departure from standard IPA. Another departure is that the palatal central approximant is represented by the symbol
y rather than the IPA standard j, which could easily be confused with the palatal voiced plosive and with local
orthographies, especially Bahasa Malaysia, which use j to represent a voiced alveopalatal affricate. Lastly, is
used for the unrounded open central vowel, and for the rounded open back vowel.
10
Diffloth (1976a) claims, for example, that /-a-/ in certain minor syllables is a morpheme.
11
Diffloth (1976b) discusses a word class called expressives, also known as ideophones. Diffloth demonstrates that
expressives in Semai have a phonology that is different from other word classes, exhibiting sequences of sounds not
found in the rest of Semai.

3.1.2 Syllables

In Semai every syllable has an obligatory nucleus and onset, and an optional coda. The nucleus is usually
a vowel; however, there are some nasals that are syllabic as well in the minor syllable.
/m.pc/       salt
/n.tooy/     big
/.cees/      shallow
/.kuu/       thunder

The onset and coda are consonants. The two basic syllable types are CV and CVC. The syllable type CV is
found only in the minor syllable.12 For example,
CV
/b.h.y/      crocodile
/t.wk/       butterfly
/k.rk/       to shiver

All major syllables, and some minor syllables, have the syllable type CVC. For example,
CVC
/pc/         to wait
/b.lk/       blunt
/kl.p/       brain

3.1.3 Basic inventory: consonants

When viewed as a whole, the Semai language can be said to have twenty-three consonants.
Table 2. Consonant phonemes in Semai

Places of articulation (columns): Bilabial, Alveolar, Palatal, Velar, Glottal

Plosive, voiceless
Plosive, voiced
Preploded nasal (a)
Nasal
Trill or flap           r
Lateral approximant     l
Fricative, voiceless    s
Central approximant     w

(a) The preploded nasals are found only at the end of major syllables. There is good evidence
to support the notion that preploded nasals occur only after oral (nonnasal) vowels.
Assuming this proves to be the case, then preploded nasals would represent allophones of
simple nasals at the end of major syllables, rather than contrastive phonemes. However,
phonemic preploded nasals are, in fact, found in some related languages, for example
Kensiw (Bishop 1996) and Temiar (Benjamin 1976a).

12
The Gopeng dialect, however, has lost glottal stops after long vowels in the major syllable, resulting in CV syllable
types in those words. This loss of the glottal stop is further noted in section 3.4.

3.1.4 Basic inventory: vowels

When viewed as a whole, the Semai language can be said to have thirty vowels. Semai has both short
and long vowels. The long vowels are not dramatically elongated. Indeed, it may be more accurate to
portray the long vowels as the more normal and the short vowels as extra short. Overall, there are
roughly twice as many words with long vowels as opposed to short in the major syllable.
Table 3. Oral vowels

Oral, long     Front    Central    Back
Close          ii                  uu
Close-mid      ee                  oo
Open-mid
Open

Oral, short    Front    Central    Back
Close
Close-mid
Open-mid
Open

Table 4. Nasal vowels

Nasal, long    Front    Central    Back
Close
Mid
Open

Nasal, short   Front    Central    Back
Close
Mid
Open

3.1.5 Deviations from basic inventory: the preploded nasals

The southeastern dialects of Semai13 generally have the full set of consonants. That is to say, they retain
the preploded nasals. These dialects include Betau, Cherong, Pos Buntu, and Bertang. In one dialect,
Kuala Kenip, the plosive is voiceless, but still has a voiced nasal release. For the rest of the dialects,14 all
of the preploded nasals have become simple voiceless plosives. Hence, for these dialects there are only
nineteen consonants. The following examples show the form of these endings in three representative
dialects.

13
This region is the Raub district and the southeastern half of the Lipis district in Pahang.
14
That is, the Cameron Highlands district of Pahang, the northwest part of the Lipis district in Pahang, and all
dialects in Perak.

English    Malay        Voiced preploded       Voiceless preploded     Voiceless plosive
                        nasals (e.g. Betau)    nasals (Kuala Kenip)    (e.g. Simoi)
blood      darah        [b.hiibm]              [b.hiipm]               [b.hiip]
to die     mati         [d n]                  [d n]                   [dt]
termite    anai-anai    [.r]                   [.rc]                   [.rc]
to fly     terbang      [h ]                   [h ]                    [hk]


Note that the reduction of the preploded nasal to a voiceless plosive has produced a number of
homonyms in these dialects, in those cases where there already existed a phonologically similar word
with a simple voiceless plosive. The following examples are given.
English      Malay     Dialects with preploded    Dialects where nasal is now a
                       nasals (e.g. Betau)        voiceless plosive (e.g. Simoi)
foot         kaki      [u]                        [uk]
to return    pulang    [uk]                       [uk]
skinny       kurus     [soo]                      [sook]
navel        pusat     [sook]                     [sook]
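
The way this reduction creates homonyms can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used in the study: final preploded nasals are written here as ASCII digraphs, of which only "bm" (as in [b.hiibm]) is taken from the examples above, while "dn", "jn", and "gn" and the form "soogn" are hypothetical stand-ins for symbols not reproduced in this text. The function names are likewise hypothetical.

```python
from collections import defaultdict

# Assumed ASCII notation for word-final preploded nasals and the voiceless
# plosives that replace them. Only "bm" -> "p" mirrors a form shown above;
# the other three digraphs are hypothetical placeholders.
PREPLODED_TO_PLOSIVE = {"bm": "p", "dn": "t", "jn": "c", "gn": "k"}


def reduce_preploded(form):
    """Replace a word-final preploded nasal with a simple voiceless plosive."""
    for cluster, plosive in PREPLODED_TO_PLOSIVE.items():
        if form.endswith(cluster):
            return form[: -len(cluster)] + plosive
    return form


def find_homonyms(lexicon):
    """Given gloss -> form, return reduced forms shared by two or more glosses."""
    by_form = defaultdict(list)
    for gloss, form in lexicon.items():
        by_form[reduce_preploded(form)].append(gloss)
    return {form: glosses for form, glosses in by_form.items() if len(glosses) > 1}


# "soogn" is a hypothetical spelling of the form with a final preploded nasal.
lexicon = {"skinny": "soogn", "navel": "sook", "blood": "b.hiibm"}
print(reduce_preploded("b.hiibm"))  # b.hiip
print(find_homonyms(lexicon))       # {'sook': ['skinny', 'navel']}
```

Applied to a full wordlist, the same check would list every pair of glosses that a dialect without preploded nasals can no longer distinguish by form alone.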

3.1.6 Deviations from basic inventory: shifted vowels

Two dialects surveyed have developed shifted vowels. What most dialects pronounce as a long, open
central vowel [] is now pronounced in Gopeng and Kampar dialects as a long, open back vowel [].
The entry of this vowel into the back region has caused the open-mid // to be phonetically raised to
[oo], and the close-mid /oo/ to be phonetically raised to [oo]. The following examples demonstrate this
phenomenon.
English      Malay        Most dialects    Gopeng, Kampar
shoulders    bahu         /l.pl/           /l.pl/
bone         tulang       /.k/             /.k/
fire         api          /s/              [oos] /s/
shadow       bayang       /wk/             [wook] /wk/
woman        perempuan    /kr.door/        [kr.door] /kr.door/
roof         atap         /p.look/         [p.look] /p.look/

3.1.7 Deviations from basic inventory: possible merged vowel

In at least one dialect near Tapah (Perak), the contrast between /oo/ and // appears to have been lost
and these reflexes have merged into /oo/. So for this dialect (Batu Dua), there is evidently one less
phonemic vowel than the rest. The neighboring village (Batu Tiga) still had this contrast, so it would
appear at least from this data that the phenomenon is not widespread. This phenomenon was noticed
during a chance encounter, however, and full wordlists from these villages were not collected.

English    Malay     Most dialects    Batu 2, Jalan Pahang (Perak)
hair       rambut    /sk/             /sook/
navel      pusat     /sook/           /sook/

3.1.8 Deviations from basic inventory: diphthongs

In general, diphthongs rarely occur in Semai. However, in the northern region of Pahang, the Telom
River dialects in Lipis District and up to the eastern edge of Cameron Highlands District in Pahang, the
Semai dialects (for example, Renglas, Lanai, and Serau) have a diphthong /u/. This occurs in words
where other dialects generally have words containing /oo/ before a glottal stop // or glottal fricative
/h/.
English        Malay     Most dialects    Telom area, Lipis (Pahang)
dog            anjing    /coo/            /cu/
to defecate    berak     /c[h].cooh/      /ch.cuh/

These same Telom River dialects (Renglas, Lanai, and Serau) also have the diphthong /ei/ before a
glottal stop // in a few words that in other dialects generally have /ii/ or /ee/ before //.

English        Malay     Most dialects            Telom area, Lipis (Pahang)
soil, earth    tanah     /tii/ or /tee/           /tei/
short          pendek    /ku.tii/ or /ku.tee/     /ku.tei/

3.1.9 Deviations from basic inventory: resyllabification

Diffloth (1977) noted that in the southern reaches of the Semai territory, the Semai dialects have
changed /oo/ to /uw/ before final alveolars and palatals (-t, -n, -r, -l, -s, -c, -, -y), and // to /iy/
before final labials and alveolars (-p, -m, -w, -t, -n, -r, -l, -s). These changes, plus the fact that the
southern dialects use quite a number of words unique to the region, give these dialects an especially
peculiar sound. These changes were also attested in the data collected for this report, although there
were not examples of the sound change before every segment listed above.
Resyllabification and sometimes simplification of the minor syllables have occurred to produce
words that fit better into Semai's sesquisyllabic constraints. Thus the consonants (y and w) in these
sequences now occupy the initial consonant position in the major syllable.
English     Malay        Most dialects    Tanjung Malim area (Perak)
dry         kering       /soot/           /su.wt/
woman       perempuan    /kr.door/        /k.du.wr/
to sleep    tidur        /bt/             /bi.yt/
bird        burung       /cp/             /ci.yp/

The approximants y and w are also found in the onset of major syllables in other words, so these
phonological changes and resyllabification do not introduce any new phonemes.

3.1.10 Deviations from basic inventory: other vowel differences


The Telom River dialects previously mentioned (Renglas, Lanai, and Serau) have markedly different
vowel qualities for certain vowels, as compared with the rest of the Semai dialects. Where most dialects
exhibit the long close central vowel //, in a good number of these words, the vowel is rounded to //
in Lanai and Serau. In Renglas this vowel is rounded and backed to /oo/ in these same words.
Furthermore, the segment that is commonly /oo/ in most other Semai dialects occurs unrounded and
more central in these three dialects, as //. The following examples illustrate these shifts.
English      Malay     Most dialects    Lanai, Serau    Renglas
house        rumah     /dk/             /dk/            /dook/
betel nut    pinang    /b.lk/           /b.lk/          /b.look/
rat          tikus     /p.rook/         /p.rk/          /p.rk/
wind (n)     angin     /pooy/           /py/            /py/

Finally, it should be noted that many of the dialects showed a reduction of vowel length for long
vowels before final laryngeals /h/ and //. However, the data is not totally consistent in showing this,
and there was often difficulty in hearing the distinction in vowel length. Furthermore, some dialects
seem to show this phonological change more consistently than others. These changes were also identified
by Diffloth (1977).

3.2 Shared word families


This section examines those dialects which are linked through shared word families. A word family in
the current study is a word form of apparently common origin that is shared by a set of dialects. In the
example below, the form /byk/ is shared by the dialects represented by the designations H, V, and W,
and represents a word family. The semantic item shadow elicited two word families, the word family
/byk/ and the word family /wk/. For the current analysis, the greater the number of word families
shared uniquely by a set of dialects, the more closely related those dialects are deemed to be.
Consider the following three items from the wordlist, demonstrating word families that indicate
distinct groupings of the various dialects.15
shadow    /wk/      (B,C,E,F,G,J,K,L,M,N,O,Q,R,S,T,U,X,Y,Z,AA,BB,CC,DD,EE)
          /byk/     (H,V,W)

black     /blk/     (B)
          /blk/     (C,E,L,M,N,Q,R,S,T,X,BB,CC,DD,EE)
          /blk/     (J,K)
          /h/       (F,T,Y,Z,AA)
          /ryh/     (G)
          /rh/      (O)
          /ct/      (U)
          /hitp/    (H,V,W)

some      //        (B,G,J,K,L,M,O,Q,T,X,AA,BB,CC,DD,EE)
          //        (C,E,N,U)
          /u/       (R,S,Y,Z)
          /n/       (H,V,W)

15
The capital letter codes are defined as follows: B Gopeng; C Rasau; E Bertang; F Kuala Kenip; G Tangkai
Cermin; H Cluny; J Tapah; L Batu 17; M Kampar; N Bidor; O Bota; Q Sungkai; R Pos Buntu; S Betau; T
Simoi Baru; U Cherong; V Chinggung; W Sungai Bil; X Sungai Ruil; Y Serau; Z Lanai; AA Renglas; BB Cenan
Cerah; CC Relong; DD Terisu; EE Pagar.


In these cases, the southern dialects (H, V, and W) share a lexical innovation that is distinct from the
other dialects. Indeed, there are a total of thirteen items in the wordlist for which the dialects (H, V, and
W) have a shared word that is distinct from the rest of the dialects. Another nine items have a common
word for the dialects (H, V, and W), but it so happens that one or two other dialects also have this word.
Furthermore, the dialects (V and W) share a lexical innovation unique to only these two dialects for an
additional twenty-three wordlist items. This evidence argues strongly for dialects V and W to be grouped
together, with dialect H being closely related as well.
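
The counting of uniquely shared word families can be sketched as follows. This is a minimal illustration, not the tool used in the study: the function name `uniquely_shared_items` and the data layout are assumptions, and the word forms are replaced by arbitrary labels because only the dialect groupings matter here. The groupings for "shadow" and "some" are taken from the examples above.

```python
def uniquely_shared_items(word_families, group):
    """Return the items for which one word family is used by exactly `group`.

    word_families: item -> {family label -> set of dialect codes}
    group: the set of dialect codes being tested
    """
    group = frozenset(group)
    return [
        item
        for item, families in word_families.items()
        if any(frozenset(dialects) == group for dialects in families.values())
    ]


# Family labels are arbitrary placeholders for the transcribed forms.
word_families = {
    "shadow": {
        "family_1": set("B C E F G J K L M N O Q R S T U X Y Z AA BB CC DD EE".split()),
        "family_2": {"H", "V", "W"},
    },
    "some": {
        "family_1": set("B G J K L M O Q T X AA BB CC DD EE".split()),
        "family_2": {"C", "E", "N", "U"},
        "family_3": {"R", "S", "Y", "Z"},
        "family_4": {"H", "V", "W"},
    },
}

print(uniquely_shared_items(word_families, {"H", "V", "W"}))  # ['shadow', 'some']
print(uniquely_shared_items(word_families, {"V", "W"}))       # []
```

Running the same function over the whole wordlist for each candidate set of dialects would give counts comparable to the thirteen items reported for H, V, and W. The looser criterion, in which one or two other dialects also share the word, would need a separate test for supersets of the group.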
Another pair of dialects that share a lexical innovation is from the northwest area: G (Tangkai
Cermin) and O (Bota). This pair shares sixteen words that are distinct from the other dialects, plus
another nine words that are found in this pair and at most two other dialects.
Beyond the two dialect clusters just discussed, the picture is less clear, mostly because lexical
innovations tend not to be unique for the other dialects. For example, while examination of the various
word families reveals that two of the eastern dialects (F and S) very often share the same word, that
word is almost always also found in a variety of other dialects, but never consistently the same set of
dialects.

3.3 Basic lexical comparison


This section analyzes the wordlists by utilizing basic lexical comparison. Two aspects are considered:
simple lexical similarity between each pair of wordlists and average lexical similarity for each dialect.16

3.3.1 Simple lexical similarity

The corresponding lexical items from each of the wordlists were compared to determine which are
lexically similar and which are dissimilar. A limited number of phonological changes (see section 3.4)
account for the vast majority of the differences between related words in the Semai dialects; hence it is
not generally difficult to judge whether corresponding words between wordlists are related.
The percentage of words that are lexically similar and apparently cognate was calculated for each
pair of dialects. The number of words compared between any two dialects averaged 410; however, the
exact number varied slightly for each pair compared because each list has a few missing items. The
resulting similarity percentage between each pair of dialects is presented in Table 5.
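
As a rough illustration of the calculation just described, the sketch below computes a pairwise similarity percentage in Python. It rests on assumptions that are not in the original: each wordlist is represented as a mapping from wordlist item to a word-family label assigned by the analyst (the cognate judgment itself is not computed), missing items are marked `None`, and the function name and toy data are hypothetical.

```python
def lexical_similarity(list_a, list_b):
    """Percentage of items, present in both wordlists, judged lexically similar.

    Each wordlist maps an item to an analyst-assigned word-family label,
    or to None when the item is missing from that list.
    """
    compared = [
        item
        for item in list_a
        if list_a.get(item) is not None and list_b.get(item) is not None
    ]
    if not compared:
        raise ValueError("the two wordlists have no items in common")
    similar = sum(1 for item in compared if list_a[item] == list_b[item])
    return round(100 * similar / len(compared))


# Toy data: labels "A", "B", "C" are placeholders for word families.
dialect_1 = {"shadow": "A", "black": "B", "some": "A", "dog": "A", "rat": None}
dialect_2 = {"shadow": "A", "black": "C", "some": "A", "dog": "A", "rat": "A"}

print(lexical_similarity(dialect_1, dialect_2))  # 75
```

Items missing from either list are excluded from the denominator, which is why the number of words compared varies slightly from pair to pair.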

16

In this section there are a few places where pairs of dialects that are geographically close and lexically closely
related are represented by just one of the dialects. This was done to conserve space and reduce clutter in tables and
figures. In these instances label Y represents both Y Serau and Z Lanai, and label BB represents both BB Cenan
Cerah and CC Relong.

Table 5. Percentage of lexical similarity between Semai dialects
H Cluny
59 B Gopeng
63 76 J Tapah

59 67 73 L Batu 17
61 68 70 69 M Kampar

64 67 73 68 67 N Bidor
61 68 70 67 68 67 G Tangkai Cermin
60 68 69 65 67 68 83 O Bota
61 72 65 59 57 64 60 59 C Rasau

67 66 70 62 62 70 64 65 73 Q Sungkai
69 63 67 58 60 67 61 62 64 62 V Chinggung

69 60 64 58 59 65 60 59 61 60 82 W Sungai Bil
57 62 63 56 54 62 56 58 68 68 56 55 U Cherong

61 61 63 59 56 63 56 58 68 69 60 58 69 R Pos Buntu
59 63 64 59 56 62 55 57 70 67 60 59 69 75 E Bertang

57 64 68 60 57 62 59 60 69 69 58 56 70 71 73 S Betau
57 63 67 61 57 64 57 59 66 67 57 57 70 68 67 75 T Simoi Baru

57 61 63 59 55 62 56 59 66 67 57 57 70 70 74 77 76 F Kuala Kenip
57 65 68 62 58 62 59 59 65 70 55 55 70 67 71 75 75 77 EE Pagar

59 62 66 61 60 63 60 63 62 67 60 61 66 67 67 71 75 74 69 Y Serau
54 62 65 60 56 61 58 61 60 63 55 56 65 62 64 67 70 70 70 71 AA Renglas

56 64 67 61 57 61 57 59 60 66 55 54 66 66 67 69 73 70 74 71 75 BB Cenan Cerah
57 67 70 64 61 65 63 62 60 67 59 58 64 63 65 68 71 68 70 70 75 75 DD Terisu
60 66 66 66 63 67 64 65 61 64 61 61 60 61 61 61 64 62 62 65 65 65 69 X Sungai Ruil

The following dialect pairs have similarity percentages above 80 percent: G-O, V-W. Not
surprisingly, both of these pairs are geographically close. The following sets all have similarity
percentages between 75 percent and 80 percent: B-J, E-R, F-S-T-EE, T-Y, and AA-BB-DD. While these
pairs of dialects are separated by a greater geographical distance, the pairs are always still neighbors.
Figure 3 depicts pairs of lexically similar dialects with a minimum of 71 percent similarity.
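The three similarity bands used in the figure can be sketched as a simple classification. The percentages below are copied from Table 5; the function name and the choice of sample pairs are illustrative assumptions only.

```python
# A few pairwise percentages copied from Table 5.
pairs = {
    ("G", "O"): 83,
    ("V", "W"): 82,
    ("B", "J"): 76,
    ("E", "R"): 75,
    ("C", "Q"): 73,
    ("H", "B"): 59,
}


def band(pct):
    """Return the band label for a percentage, or None if below 71."""
    if pct >= 80:
        return "80% and above"
    if pct >= 75:
        return "75-79%"
    if pct >= 71:
        return "71-74%"
    return None  # pairs below 71 percent are not drawn


bands = {}
for pair, pct in pairs.items():
    label = band(pct)
    if label is not None:
        bands.setdefault(label, []).append(pair)

print(bands)
```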

Figure 3. Map of lexical similarity between Semai dialects (Perak and Pahang; legend: 80% and above, 75-79%, 71-74%)


While some groupings of high similarity do stand out (encircled on the map), perhaps the most
salient feature of this map is the evidence of a classic dialect chain, or more accurately, a dialect
network.17 Moreover, the overall pattern shows a gap down the middle that strongly correlates with the
mountain range that runs along the border between Perak and Pahang. This is unsurprising since
mountains constitute natural barriers that hinder travel and thereby impede communication between
dialects.
One surprise is the relatively high lexical similarity (72 percent) between dialect B (Gopeng) and C
(Rasau), given that they are at opposite ends of the territory.
It should be noted that as with any analysis involving sampling, there is a range of error expected. A
calculation was made based on the sample size and the degree of confidence in the reliability of the data,
according to the procedure proposed by Simons (1977). Although the confidence level is estimated, it
does provide some idea of what the range of error is and therefore what degree of difference is
significant. For the current research, the value N = 400 was used (referring to the 400-plus samples in
the wordlist), and a confidence level of average was assigned, based on the fact that all of the Semai
speakers queried were adequately bilingual in Malay, and the researcher had some background in the
Semai language. Using these values, the resulting calculation reveals that, when considering the lexical
similarity statistics, a difference of three percentage points should be considered significant. This is an
important factor when considering the previous tables and figures. In essence, what this means for this
study is that one should not put too much emphasis on differences of just one or two percentage points.
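As a minimal sketch of how this threshold might be applied when reading Table 5, the check below treats two percentages as distinguishable only when they differ by at least three points. It does not reproduce the Simons (1977) procedure itself; the constant and function names are assumptions made for illustration.

```python
# Threshold stated in the text: three percentage points.
SIGNIFICANT_DIFFERENCE = 3


def differs_significantly(pct_a, pct_b, threshold=SIGNIFICANT_DIFFERENCE):
    """True when two similarity percentages differ by at least the threshold."""
    return abs(pct_a - pct_b) >= threshold


# G-O (83) versus V-W (82): one point apart, not a meaningful difference.
print(differs_significantly(83, 82))
# B-J (76) versus C-Q (73): three points apart, treated as significant.
print(differs_significantly(76, 73))
```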

17

In a dialect network each dialect is lexically most related to its near neighbors and increasingly different with
distance.

3.3.2 Average lexical similarity

Another useful indicator is the average percentage of lexical similarity with all other dialects. The
average percentage for all lists is 64.1; that is, on average, 64.1 percent of the lexical items of each
dialect are similar to those of other Semai dialects. The individual percentages for each dialect are shown
in Table 6.
Table 6. Average lexical similarity
Percentage  Code  Location (kampung, district, state)
67.2        J     Tapah, Batang Padang, Perak
66.6        DD    Terisu, Cameron Highlands, Pahang
66.5        Y     Lanai, Lipis, Pahang
66.5        T     Simoi, Lipis, Pahang
66.4        CC    Relong, Cameron Highlands, Pahang
66.3        EE    Pagar, Lipis, Pahang
66.2        Q     Sungkai, Batang Padang, Perak
65.9        S     Betau, Lipis, Pahang
65.8        F     Kuala Kenip, Lipis, Pahang
65.8        BB    Cenan Cerah, Cameron Highlands, Pahang
64.9        AA    Renglas, Cameron Highlands, Pahang
64.8        B     Gopeng, Kinta, Perak
64.6        E     Bertang, Lipis, Pahang
64.6        N     Bidor, Batang Padang, Perak
63.7        X     Sungai Ruil, Cameron Highlands, Pahang
63.7        R     Pos Buntu, Raub, Pahang
63.4        C     Rasau, Batang Padang, Perak
63.4        U     Cherong, Lipis, Pahang
62.4        O     Bota, Perak Tengah, Perak
62.2        L     Batu 17, Batang Padang, Perak
61.7        G     Tangkai Cermin, Perak Tengah, Perak
60.6        V     Chinggung, Batang Padang, Perak
60.4        M     Kampar, Kinta, Perak
59.8        H     Cluny, Batang Padang, Perak
59.5        W     Sungai Bil, Batang Padang, Perak
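The averaging step can be sketched as follows, using only four dialects from Table 5 (H, B, J, and L). Because the averages here are taken over this small subset rather than over all the wordlists, the results are not expected to match Table 6; the function name is an assumption for illustration.

```python
from collections import defaultdict

# Pairwise percentages for four dialects, copied from Table 5.
pairwise = {
    ("H", "B"): 59,
    ("H", "J"): 63,
    ("B", "J"): 76,
    ("H", "L"): 59,
    ("B", "L"): 67,
    ("J", "L"): 73,
}


def average_similarity(pairwise):
    """Average each dialect's similarity with every other dialect in the data."""
    scores = defaultdict(list)
    for (d1, d2), pct in pairwise.items():
        # Each pairwise figure counts once for both members of the pair.
        scores[d1].append(pct)
        scores[d2].append(pct)
    return {d: round(sum(v) / len(v), 1) for d, v in scores.items()}


averages = average_similarity(pairwise)
print(averages)
```

Even in this small subset, Tapah (J) comes out with the highest average, in line with its position at the top of Table 6.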

These averages can be placed on the map of the Semai territory to show the geographical
distribution.

Figure 4. Distribution of average lexical similarity (map of Perak and Pahang showing the rounded Table 6 average beside each dialect; legend bands: 67%, 65-66%, 63-64%, 62%)


Although the average percentages do not have a particularly wide range, it is notable that Tapah (J)
has the highest average lexical similarity with other dialects: 67.2 percent. Given the centrality of this
dialect, plus being at the crossroads of the common travel routes, the figure is not especially surprising.
However, by the same reasoning, Kampar (M) might also be expected to rank high on the list,
whereas, in fact, it ranks quite low. It is noteworthy that the speaker who gave the Kampar list had
been living in the Kuala Lumpur area for two years. Hence it seems reasonable to believe that he may be
beginning to forget some of his language. For instance, this speaker gave the Malay word /pli/ for
rainbow, whereas every other dialect gave the word /cdw/. Overall, twenty-two percent of the
words in the Kampar speaker's list were borrowed from Malay. For comparison, the neighboring
dialect of Tapah showed approximately twelve percent borrowed words. (This topic is explored further
in section 3.6.)
On the other side of the ranking, Sungai Bil (W) has the lowest average lexical similarity compared
with other dialects: 59.5 percent. This dialect represents the southern extreme of the range.
It is also noteworthy that the dialects that are statistically only weakly related tend to be at the
extreme reaches of the overall Semai territory. This is not surprising since not only would a dialect near
the outer edge of the territory have less contact with other dialects than if it were more centrally located,
but such a dialect would also presumably have more contact with neighboring languages and cultures.
Contact with other languages and cultures is often a major source of borrowed words.
However, once again the data also provides a few puzzling counter-examples. The Terisu (DD)
dialect, at the northern extreme of the territory and known to have significant contact with its
neighboring language, Temiar, has a surprisingly high position in the relatedness ranking. The Lanai (Y)
dialect, at the northeast extreme, also has a surprisingly high average lexical similarity.
Overall, the averages seem low, evoking the question of why even near neighbors have at best 83
percent lexical similarity. There are a number of possible explanations, some of which will be presented
here. First of all, it may indeed be the case that even near neighbors have a significant number of
different words. Diffloth (1977) reported visiting 117 Semai settlements and found no two settlements to
be identical in their choice of words.

Secondly, it is quite likely that a given item on the wordlist has a variety of possible correct
responses due either to synonymy or to specific-generic mismatches. Synonymy can come from
borrowings and from word taboos. Diffloth (1980) discusses at length Semai word taboos, especially
associated with animals. Regarding specific-generic mismatches, there may be a number of words for
spider, depending on the species or characteristics (e.g. large or small, indoor or outdoor, etc.). One
village visited (Kampung Leryar, near Kampung Relong (CC)) had at least four different words for
different types or sizes of spiders, while a village a few kilometers away (Renglas (AA)) had just one
word for all spiders.
spider
Kampung Leryar, near Relong (CC):  /m/ (smallest), /t.wiik/ (small), /.loow/ (large), /m.h l/ (largest)
Kampung Renglas (AA):              /t.wiik/ (all types)

Thirdly, some items on the wordlist sometimes present problems because they do not correspond
one-to-one with Semai words or semantic categories. For instance, finding the word for throw was
particularly problematic. The wordlist was changed to elicit particular types of throwing, such as
throwing sidearm, throwing a spear, tossing a ball, and throwing away garbage. Even then, problems
emerged, which would tend to indicate the Semai do not organize their words that involve throwing the
way that the Malays do. For more comments on problematic items in the wordlist, see Appendix B.
More than one Semai speaker north of the Tanjung Malim area believed that the southern Semai
dialects had borrowed significant numbers of words from Temuan, the aboriginal Malay language spoken
directly south of the Semai range. However, this is only hearsay evidence, and it may be hard to
distinguish the true source, since Temuan and Malay are closely related.
Dialect M (Kampar) has a relatively low average lexical similarity (60 percent) with other dialects.
The low average is unexpected since Kampar is centrally located along well-traveled routes. However, as
previously noted, the speaker from the Kampar dialect may have been starting to forget some of his
language.
The Kampar case demonstrates how a particular language speaker can have significant effect on the
overall percentage due to his or her personal history and knowledge. This is an important reminder that
it is not just the sample size that is important; the circumstances of the individual language
speakers also need to be considered, including factors such as how much a given speaker has moved around, which
other languages he or she speaks and in what contexts, education level, where the speaker currently lives
and works, and so forth.

3.3.3 Limitations of lexical similarity analysis

Lexicostatistics is but one tool for analyzing dialect similarities. While this tool is useful, it has its
drawbacks. For instance, the historical linguist may readily identify the phonological changes that have
taken place and thereby see the lexical similarity between two dialects for a given lexical item. However,
this type of information may well be beyond the grasp of the individual speakers. For instance, in the
Gopeng dialect the long // vowel has become backed and rounded to //; furthermore, the glottal
plosive after long vowels has often been dropped in this dialect. Hence the word /*c/ to eat, which
is still pronounced /c/ in almost every dialect, is pronounced /c/ in Gopeng. This word has
sometimes been known to be mistaken for /cooh/ to defecate by Semai from other dialects.

3.4 Shared phonological changes


There are a number of phonological changes that have affected various cognates in the Semai dialects.
Dialects that share a phonological change are more similar, at least for that change, than dialects that
did not undergo the same phonological change. The greater the number of phonological changes found in one
dialect but not in another, the greater the degree of distinction between those two dialects. This
distinction is in addition to differences in lexical items, as discussed in section 3.3.

3.4.1 Phonological changes

The following is a list of phonological changes found in the Semai dialects studied in this survey.18,19
(a) Final preploded nasals became voiceless plosives (in most dialects, except S Betau, E Bertang,
U Cherong, R Pos Buntu, and F Kuala Kenip, the last of which has a voiceless plosive but is
still a preploded nasal).
Example:
/*liibm/20 to swallow
/liibm/

(E,R,S,U)

/liipm/

(F)

/liip/

(B,C,G,H,J,L,M,N,O,Q,T,V,W,X,Y,Z,AA,BB,CC,DD,EE)

(b) Final glottal stops were lost after long vowels (B Gopeng).
Example:
/*c/ head louse
/c/

(C,F,G,H,J,K,L,N,O,Q,R,S,T,U,W,X,Y,AA,BB,CC,DD,EE)

/c/

(B)

(c) /*oo/ became unrounded and centralized to // (AA Renglas) or // (Y Serau, Z Lanai) in
all environments21 except before final /*-h/ or /*-/, where it became the diphthong /u/.
Example:
/coo/ rattan
/cook/

(B,C,G,H,J,L,M,N,O,Q,T,V,W,X,BB,CC,DD,EE)

/coo/

(E,R,S,U)

/coo /

(F)

/ck/

(Y,Z)

/ck/

(AA)

/coo/ dog
/coo/

(C,J,K,L,M,N,O,Q,R,U,V,W,X,BB,CC,DD)

/co/

(E,F,G,H,S,T,EE)

18

Nearly all of the data collected was congruent with Diffloth's analysis (1977). Indeed, I have relied heavily upon
Diffloth's proposed Proto-Semai forms as a basis for my hypotheses regarding proto-forms.
19
The phonological changes are presented in order of the average number of words affected, with the change
affecting the most words first. The relevance of this ordering will be addressed in the following section.
20
The asterisk (*) is used to designate a reconstructed element of a proto language, in this case a reconstructed
lexical item. A proto language is an ancestral form of a language from which modern varieties presumably have
developed.
21
In this section the phrase "in all environments" refers to the consonants on either side of the given segment.

/coo/

(B)

/cu/

(Y,Z,AA)

(d) /*/ became backed and rounded to // in all environments (B Gopeng and M Kampar).
Example:
/*krl/ man, male
/krl/

(C,E,F,G,H,J,L,N,O,Q,R,S,T,U,V,W,X,Y,Z,AA,BB,CC,DD,EE)

/krl/

(B,M)

(e) /*ee/ became centralized and raised to // in all environments (B Gopeng, M Kampar, and J
Tapah).
Example:
/*b[]hee/ satiated
/bhee/

(C,F,G,N,O,T,U,V,W,AA,BB,EE)

/bhe/

(E,H,R,S,Y,Z)

/bhee/

(L,Q,CC,DD)

/bhee/

(X)

/bah/

(B)

/bh/

(J,M)

(f) /*/ became rounded to /oo/ (AA Renglas), centralized, raised, and rounded to // (Y
Serau, Z Lanai), but simply centralized and raised to // in all other dialects. This change
happened in all environments.
Example:
/*d/ house
/dk/

(B,C,G,H,J,L,M,N,O,Q,T,V,W,X,BB,CC,DD,EE)

/d/

(E,R,S,U)

/d /

(F)

/dk/

(Y,Z)

/dook/

(AA)

(g) /*u/ became lowered to /o/ in all environments (M Kampar, X Sungai Ruil, and DD Terisu).
Example:
/*u/ foot
/ok/

(M,X,DD)

/uk/

(B,C,H,J,L,N,O,Q,T,V,W,Y,Z,AA,BB,CC,EE)

/u /

(E,R,S,U)

/uk/

(F)

(h) Final palatal consonants /*-c/ and /*-/ first merged as /*-c/, and then shifted to /-t/ or /-k/;22
also, final /*-/ shifted to /y/ (V Chinggung, W Sungai Bil, and some lexical items at H
Cluny). This phonological change may well still be in process, since a couple of the older people
interviewed seemed to still have a few palatal final consonants, whereas the younger adults
appeared to have shifted them (either in place of articulation or in manner of articulation). More
research is needed on this dialect to determine the demographics (e.g. age, gender, and location)
of those who are yet to shift the final palatal consonants, as opposed to those who have already
shifted them.
Examples:
/*p / to wait
/pt/

(V,W)

/p /

(E,R,S,U)

/pc/

(F)

/pc/

(B,C,G,H,J,L,M,N,O,Q,T,X,Y,Z,AA,BB,CC,DD,EE)

/*sc / flesh
/sk/

(H,V,W)

/sc/

(B,C,E,F,G,J,L,M,N,O,Q,R,S,T,U,X,Y,Z,AA,BB,CC,DD,EE)

/*lm/ tooth
/lmy/

(V, but also G)23

/lm/

(B,C,E,F,H,J,L,M,N,O,Q,R,S,T,U,W,X,Y,Z,AA,BB,CC,DD,EE)

(i) /*i/ became /ii/ (dialects E, F, J, L, M, N, R, S, T, U, X, BB, CC, DD, EE), /ee/ (dialects B, C,
G, H, O, Q, V, W, Y, Z), or // (dialect AA) in all environments.
Examples:
/*lhi/ saliva
/lhiik/

(J,K,L,M,N,T,X,Y,Z,BB,CC,DD,EE)

/lhii /

(E,S,U)

/lhiik/

(F)

/lheek/

(B,C,G,H,O,Q,V,W)

/lhk/

(AA)

/*ris/ root
/riis/

(E,F,H,M,N,R,S,U,BB,DD,EE)

/riiy/

(J,L,CC)24

22

So far there is not enough data to determine which environments produce /-t/ and which produce /-k/, if indeed
the shift is predictable from the environment.
23
The W Sungai Bil speaker was an older woman in her fifties. As mentioned in this section, some of the older
speakers still maintained some final palatal consonants, as attested in this example where one would expect
/lmy/ if all final palatals had shifted.
24
In some dialects final /-s/ has become /-y/ in some words. The conditions under which this happens are unclear.
Some words seem to have undergone this change completely; that is, they are consistently pronounced with a final

/rees/

(B,C,G,O,Q,W,Y,Z)

/rs/

(AA)

(j) /*oo/ became /w/ before alveolar (*-t, *-dn, *-n, *-r, *-l, *-s) and palatal (*-c, *- , *-) final
consonants (H Cluny, V Chinggung, and W Sungai Bil).

Examples:

/*croos/ fingernail
/cnws/

(H,V,W)

/cnoos/

(C,E,F,Q,R,S,T,U,BB,CC,EE)

/coos/

(B,G,M,N,O,X,DD)

/crooy/

(J,L)

/cns/

(Y,Z)

/cns/

(AA)

/*y/ lip
/wy/

(H,V,W)

/y /

(B,E,F,R,S,T,Q,U,Y,Z,BB,CC,EE)

/y/

(G,O,DD)

/y/

(C,J,N)

/y/

(X)

/y
/

(AA)

(k) /*Noo/25 became /Nuu/ (B Gopeng, G Tangkai Cermin, O Bota, X Sungai Ruil, and DD
Terisu), or /N/ (AA Renglas), or /N/ (all other dialects).

Example:

/*n/ path, road


/n/

(B,G,O,X,DD)

/n/

(C,E,F,H,J,L,M,N,Q,R,S,T,U,V,W,Y,Z,BB,CC,EE)

/n
/

(AA)

(l) /*/ and /*i/ became /y/ before final labials (*-p, *-bm, *-m, *-w) and final alveolars (*-t,
*-dn, *-n, *-r, *-l, *-s) (H Cluny, V Chinggung, and W Sungai Bil).
Examples:
/*bt/ to sleep
/biyt/

(H,V,W)

/bt/

(B,C,E,F,J,L,M,N,Q,R,S,T,U,X,Y,Z,AA,BB,CC,DD,EE)

/-y/. However, other words clearly still have a final /-s/, and another set of words seem to have free variation,
sometimes pronounced with a final /-s/ and sometimes with a final /-y/ by the same speaker.
25
The N symbol represents any nasal consonant (/m, n, , /). Hence /*Noo/ means proto-vowel /oo/ after a nasal
consonant.

/*sibm/ to forget
/siyp/

(H,V,W)

/siip/

(J,L,M,N,T,X,Y,Z,BB,CC,DD,EE)

/seep/

(O,Q,B,C,G)

/sp/

(AA)

/sii m/

(F)

/seebm/

(E,R,S,U)
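To make the notion of a consistent phonological change concrete, change (a) can be sketched as a rewrite rule on the plain-text transcriptions used in the examples above. This is a simplification under stated assumptions: only the bilabial preploded nasal /-bm/ is handled, because it is the one shown in the examples in this section, and the function name is hypothetical.

```python
def apply_change_a(form):
    """Change (a), bilabial case only: final preploded nasal -bm becomes -p."""
    if form.endswith("bm"):
        return form[:-2] + "p"
    # Forms without a final -bm are returned unchanged.
    return form


# /liibm/ (E,R,S,U) corresponds to /liip/ in most other dialects.
print(apply_change_a("liibm"))
# /seebm/ (E,R,S,U) corresponds to /seep/ (O,Q,B,C,G).
print(apply_change_a("seebm"))
```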

3.4.2 Other phonological changes

Diffloth (1977) proposed several other phonological changes that were also seen in the data collected for
this study. However, where there were fewer than five examples of these remaining changes in this
corpus, it was deemed insufficient to present in this report.

3.4.3 Summary of phonological changes

Table 7 summarizes the phonological changes for the dialects researched in this study.
Table 7. Summary of phonological changes
Change
Proto-form

B Gopeng
C Rasau
E Bertang
F Kuala Kenip
G Tangkai Cermin
H Cluny
J Tapah
L Batu 17
M Kampar
N Bidor
O Bota
Q Sungkai
R Pos Buntu
S Betau
T Simoi
U Cherong
V Chinggung
W Sungai Bil
X Sungai Ruil
Y Lanai
Z Serau
AA Renglas
BB Cenan Cerah
CC Relong
DD Terisu
EE Pagar

a
b
c
d
e
*-bm *-VV *oo * *ee
-p
-p
-bm
-pm
-p
-p
-p
-p
-p
-p
-p
-p
-bm
-bm
-p
-bm
-p
-p
-p
-p
-p
-p
-p
-p
-p
-p

lost

f
*

g
*u

oo

h
i
j
k
l
*-c,*- *i *oo
*N *,i
/-T,-C
/-P,-T
ee

ee

ii

ii

ee

some ee w

y
ii

ii

ii

ii

ee

ee

ii

ii

ii

ii

shifte ee w

y
shifte ee w

y
ii

ee

ee

ii

ii

ii

ii

In this header row the -P represents any final labial consonant (/-p, -m, -w/); the -T represents any final
alveolar consonant (/-t, -n, -r, -l, -s/); and the -C symbol represents any final palatal consonant (/-c, -,
-y/). Note also that change c refers to where the proto-form *oo changed in all environments except before
final /*-h/ or /*-/.

In Table 7 the letters in the first title row refer to the phonological changes listed in the previous section.
The second title row refers to the proto-Semai form. The table is filled in where the particular dialect is
different from the usual form found in the majority of other dialects. For instance, column f refers to the
proto-Semai vowel /*/. In the majority of dialects, this proto-vowel became present-day //, but this
is not shown in the table. Rather, the table is filled in where the proto-vowel became rounded to /oo/
(Renglas) or // (Lanai and Serau), which is different from the majority of the dialects.
When no one form is especially common, the form for each dialect is listed. For example, column i
refers to the proto-Semai vowel /*i/. Since no one present-day manifestation of this proto-vowel is
markedly more common than the others, the entire column is filled in.

Figure 5. Geographical distribution of phonological changes. (Map of Perak and Pahang. Capital letters indicate dialects; small letters represent the associated changes: X agk; DD agk; B abdeik; O aik; G aik; AA acfi; Y/Z acfi; M adeg; J ae; Q ai; C ai; H ahijl; V/W ahijl; L a; N a; T a; BB/CC a; EE a; F (a); R none.)


The phonological changes can also be displayed on a map, with regions of the shared phonological
changes encircled, as shown in Figure 5. Note that Figure 5 graphically demonstrates the complicated
relationship of each phonological change, often resulting in overlapping circles, either partially or in full.
For example, Gopeng (B) uniquely shares phonological change d with its neighbor Kampar (M); that is,
they both exhibit this phonological change. Hence, one of the lines encircles just these two dialects.
These two dialects together also share phonological change e, but this change is also shared with Tapah
(J). Hence another circle encompasses these three dialects. For another phonological change, k, Gopeng
shares this change with four different neighbors: Sungai Ruil, Terisu, Tangkai Cermin and Bota (X, DD,
G, and O, respectively). The circle for this change only partly intersects the two circles previously
mentioned.
Table 7 and Figure 5 allow us to readily see how many phonological changes exist between any one
dialect and its neighbors near and far. When two dialects lie within the same set of circles (that is, there
are no lines between them), those two dialects do not have any consistent phonological change
differences. The more lines between two dialects being compared, the more phonological changes that
distinguish them.
If Figure 5 illustrates just one thing, it is that the dialect situation for Semai is quite complicated.
There are no clusters of isogloss lines that clearly separate the Semai dialects into just two or three
groupings. Indeed, there are ten distinct regions in this distribution map.
To further complicate the analysis, not all phonological changes should be given equal weight
because they do not appear to have equal impact. For example, a particular phonological change may
amount to a relatively minor phonetic shift (as in /*/ changing to //), or it may instead result in a
fairly radical restructuring of the affected words (as in /*oo/ changing to /*uw/, which often resulted
in other changes in these words). Furthermore, while some phonological changes impact great swaths of
the lexicon (for example, the loss of the glottal stop after long vowels affects 13.3 percent of the words in
this study), other changes only involved a handful of words (such as /*/ and /*i/ changing to /y/,
which occurred in just 1.8 percent of the words).

Whereas it is difficult to quantify the former of these impacts (the nature of a phonological change,
whether slight or radical), the latter impact (the number of words affected by a phonological change)
can be calculated. Such an analysis is undertaken in the following section.

3.4.4 Aggregate phonological changes

Phonological changes cause words to take on different shapes and sounds. If two dialects do not share a
given phonological change, then they will have differing forms for those words that have a common
source. However, two dialects could have several phonological changes that are not shared between
them and yet these changes might affect only a very small total number of words. On the other hand,
two dialects could have just one phonological change that is not shared between them, yet that one
change may affect a large percentage of words.
Based on the assumption that the greater total number of words impacted by phonological changes
not shared between two dialects causes a greater degree of distinction between them, it is insightful to
look at the relative number of words affected by given phonological changes. Note that two dialects may
have both undergone several phonological changes that affected a large percentage of the words;
however, if both dialects share all of the same phonological changes, then presumably they still have
exactly the same form for each word they still have in common. (Obviously, this is not relevant for
lexical innovations, wherein the two dialects would by definition no longer have the same word.)
One way to analyze the impact of nonshared phonological changes is to examine the average total
number of words affected by phonological changes. Table 8 summarizes the phonological changes plus
the average number of occurrences per 100 words.26
Table 8. Percentage of words affected by each phonological change
Phonological change                            Average per 100 words
(a) Preploded nasal > voiceless plosive        18.0
(b) Glottal stop lost after long vowels        13.3
(c) /*oo/ > // , //                            13.0
(d) /*/ > //                                   10.5
(e) /*ee/ > //                                 5.5
(f) /*/ > // , /oo/                            5.5
(g) /*u/ > /o/                                 4.0
(h) Final palatal consonant shifted            3.8
(i) /*i/ > /ii/ , /ee/ , //                    3.5
(j) /*oo/ > /w/                                2.8
(k) /*oo/ after N > /uu/ , //                  2.5
(l) /*/, /*ie/ > /y/                           1.8

The letters associated with the phonological changes are the same as in the previous section. Note
that the list of phonological changes has been arranged in order of changes that affected the most words,
on average, per 100 words in this study. The phonological change that affected the most words is placed
at the top of the list. These percentages can then be applied to the map of geographic distribution
(Figure 5) to produce the contour map in Figure 6.

26

This average is based on counts of words within the wordlist and not on a frequency count of any particular text.


Figure 6. Contour map of average total number of phonological changes.


The ten dialect regions can now be compared to each other. (A dialect region here is defined as a set
of dialects that have no phonological change boundaries between them.) If two dialects lie within the
same set of circles, such as Pos Buntu and Betau (R and S, respectively), then there are no consistent
phonological changes between the two dialects. If two dialect regions have just one line between them,
the value associated with that contour line27 is the average number of instances per 100 words of
differences in the two regions as a result of one or more phonological changes. In the case of Sungkai (Q)
and Bidor (N), for example, the line dividing them has the value of four; hence, there are approximately
four instances per 100 words where there are differences in the two dialects as a result of an identified
phonological change.
This analysis can be extended to dialects that are separated by more than one contour line. In
principle the procedure is to add up the figures for the contour lines that are crossed. In practice this can
be a little tricky since the contour map in Figure 6 is actually a simplification of what amounts to a
multidimensional surface. If a line is drawn to avoid unnecessary crossing of contour lines, however, the
correct average aggregate number of phonological changes can be determined. For example, when
comparing Cluny (H) with Lanai (Y), a line can be drawn from H to Y that avoids going through the two
southeast regions so that only two contour lines are crossed. The sum of these two lines is 30, meaning
that on average there are 30 instances per 100 words where there are differences between Cluny and
Lanai as a result of phonological changes.
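The procedure of adding up the values of the contour lines crossed can be sketched as a sum over the changes that two regions do not share. The per-100-word figures are those of Table 8, and the change letters for each region follow section 3.4.1 and the labels in Figure 5. The sketch is a simplification: it omits regions where one change has several different outcomes (for example AA, Y, and Z), and sums computed this way may differ from the published matrix by about a tenth because of rounding. All names in the code are assumptions made for illustration.

```python
# Average number of affected words per 100, from Table 8.
PER_100 = {
    "a": 18.0, "b": 13.3, "c": 13.0, "d": 10.5, "e": 5.5, "f": 5.5,
    "g": 4.0, "h": 3.8, "i": 3.5, "j": 2.8, "k": 2.5, "l": 1.8,
}

# Change letters associated with some of the dialect regions.
REGION_CHANGES = {
    "C,Q": set("ai"),
    "L,N,T,BB,CC,EE": set("a"),
    "G,O": set("aik"),
    "J": set("ae"),
    "X,DD": set("agk"),
    "M": set("adeg"),
    "B": set("abdeik"),
}


def aggregate_difference(region1, region2):
    """Sum the per-100 figures of the changes found in only one of two regions."""
    nonshared = REGION_CHANGES[region1] ^ REGION_CHANGES[region2]
    return round(sum(PER_100[c] for c in nonshared), 1)


print(aggregate_difference("C,Q", "L,N,T,BB,CC,EE"))
print(aggregate_difference("B", "M"))
```

For Gopeng (B) and Kampar (M), the nonshared changes are b, g, i, and k, which gives the 23.3 reported for that pair in the matrix that follows.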
Every dialect region was compared to each of the others. The following matrix shows a summary of
the average aggregate number of instances of differences per 100 words, comparing each dialect region
to all the others.

27

A given contour line may well represent more than one phonological change. For instance, in the northeast region
dialects AA, Y, and Z are encircled by a contour line with the value 22. This actually represents the sum of three
phonological changes: c, f, i. These three changes have occurred in these three languages, but not outside of that
area.

Table 9. Average aggregate number of instances of differences per 100 words
Q,C Sungkai, Rasau
3.5 L,N,T,BB,EE Batu 17, Bidor, Simoi Baru, Cenan Cerah, Pagar
2.5 6.0 G,O Tangkai Cermin, Bota
9.0 5.5 11.5 J Tapah
10.0 6.5 7.5 12.0 X,DD Sungai Ruil, Terisu
8.3 11.8 10.8 17.3 18.3 H,V,W Cluny, Chinggung, Sungai Bil
23.6 20.1 26.1 14.5 18.5 31.8 M Kampar
21.6 18.0 24.1 23.6 24.6 29.8 38.1 S,R,U,E,F Betau, Pos Buntu, Cherong, K. Kenip, Bertang
21.1 24.6 18.5 30.1 26.1 29.3 44.6 42.6 AA,Y,Z Renglas, Lanai, Serau
31.8 35.3 29.3 29.8 36.8 40.1 23.3 53.4 47.9 B Gopeng

To read the matrix, simply find the intersection of any two dialect regions on the chart. For example,
the intersection of Row S,R,U,E,F and Column Q,C reads 21.6, meaning that on average, there are 21.6
instances of differences per 100 shared words between dialect region S (Betau, etc.) and dialect region Q
(Sungkai, etc.).
A low score indicates few instances of consistent phonological change between dialects and suggests
relatively good comprehension. Conversely, a high score indicates a large number of instances of
consistent phonological change. Note that this number is not a percentage, since if there were enough
phonological changes, there could be cases where there was an average of more than 100 instances of
phonological change per 100 words, since some words have multiple phonological changes. While there
are no cases of numbers greater than 100 in this chart, there are words that have multiple phonological
changes between certain dialects. For example, there are two phonological changes that affect the
following word, making the form from dialect Chinggung (V) different from that of dialect Betau (S).
/*sibm/ to forget

(V): /siyp/

[Preploded nasal > voiceless plosive;


plus /*i/ > /iy/]

(S): /seebm/

[/*i/ > /ee/]

Finally, an overall average for each dialect region can be calculated from the matrix.

Table 10. Average aggregate number of instances of phonological change per 100 words
Overall average  Dialect region
36.4             B
31.6             Y,Z,AA
30.6             E,F,R,S,U
26.7             M
21.9             H,V,W
17.8             X,DD
17.0             J
15.1             G,O
14.6             L,N,T,BB,CC,EE
14.6             C,Q

Table 10 dramatically demonstrates that some dialects have many more instances of phonological
change than others, when compared to all the dialect regions. For example, Gopeng (B), the dialect with
the highest aggregate, has on average nearly 2.5 times as many instances of phonological change as
dialects with the lowest aggregate (C, Q, L, N, T, BB, EE).
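The overall averages in Table 10 can be recomputed from the Table 9 matrix by averaging, for each region, its nine differences with the other regions. The sketch below stores the matrix as a lower triangle in the row order of Table 9; the variable and function names are assumptions made for illustration.

```python
REGIONS = ["C,Q", "L,N,T,BB,CC,EE", "G,O", "J", "X,DD",
           "H,V,W", "M", "E,F,R,S,U", "Y,Z,AA", "B"]

# Lower triangle of Table 9: row i holds the differences between
# region i and regions 0 .. i-1, in the order of REGIONS.
LOWER = [
    [],
    [3.5],
    [2.5, 6.0],
    [9.0, 5.5, 11.5],
    [10.0, 6.5, 7.5, 12.0],
    [8.3, 11.8, 10.8, 17.3, 18.3],
    [23.6, 20.1, 26.1, 14.5, 18.5, 31.8],
    [21.6, 18.0, 24.1, 23.6, 24.6, 29.8, 38.1],
    [21.1, 24.6, 18.5, 30.1, 26.1, 29.3, 44.6, 42.6],
    [31.8, 35.3, 29.3, 29.8, 36.8, 40.1, 23.3, 53.4, 47.9],
]


def overall_averages():
    """Average each region's differences with all of the other regions."""
    n = len(REGIONS)
    result = {}
    for i in range(n):
        # The matrix is symmetric, so read from whichever row holds the pair.
        values = [LOWER[i][j] if j < i else LOWER[j][i]
                  for j in range(n) if j != i]
        result[REGIONS[i]] = round(sum(values) / len(values), 1)
    return result


averages = overall_averages()
print(averages["B"], averages["C,Q"])
```

The ratio of the highest to the lowest average, 36.4 to 14.6, is the "nearly 2.5 times" mentioned above.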

3.4.5 Summary of phonological changes

This section has looked at the aspect of consistent phonological changes from a variety of angles. These
changes are important when establishing dialect boundaries as well as determining the degree of
difference between dialect regions.
The previous tables, matrices, and maps provide important information that should be useful in
choosing a standard dialect, if so required. Other factors being equal, dialects with a lower average
aggregate number of phonological changes would be better candidates for a standard dialect, since the
lower average aggregate implies there are fewer instances of nonshared phonological change, on
average, with other dialects. However, it must be stressed that other factors are not equal, and that this is
just one factor. All the factors, especially sociolinguistic factors, should be considered when making such
decisions. For example, the people's attitude toward their language and dialects can turn out to be even
more important than linguistic factors.28

3.4.6 Limitations of phonological change analysis

In addition to noting that phonological changes are just one indicator of dialect boundaries and
variation, it is worthwhile to point out other limitations of this approach. First, as pointed out earlier in
this section, not all phonological changes ought to be considered equal in that some changes have a
much more profound impact on the forms of words than others. That is to say, some changes result in a
dramatically different form of a word, compared to the same word that has not changed (or changed
differently) in another dialect, while other changes amount to a slight phonetic shift, easily recognizable
by speakers of other dialects. Unfortunately, it is rather difficult to quantify just how slight or radical a
particular phonological change is. Sociolinguistic factors are important, since it ultimately is the
perception of Semai speakers that determines whether a particular phonological change is profound or
not.

28

For instance, in countries with a caste system or with strong social classes, one would generally expect the
language/dialect of the lowest caste or class to be unacceptable as the standard language/dialect, even if it were the
most widely understood.

For the analysis regarding aggregate phonological changes cited earlier (Table 10), it is worth noting
that the averages were determined from the roughly 436 lexical items that were used in this study. A
larger corpus would theoretically produce more accurate figures. It should also be noted that each
dialect region was given equal weight when determining the averages. No adjustment was attempted
based on factors such as the total population in each dialect region. It may be a strategic factor to give
more weight to the more populous dialect regions if a standardized form of Semai is to emerge.
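
To make the weighting remark concrete, the following Python sketch compares an equal-weight average with a population-weighted one for a single candidate dialect. The region names, counts, and population figures are all hypothetical; no population data were used in this study.

```python
def weighted_average(counts, weights):
    """Average the per-region counts, weighting each region by weights[region]."""
    total_weight = sum(weights[region] for region in counts)
    return sum(counts[region] * weights[region] for region in counts) / total_weight


# Hypothetical nonshared-change counts between one candidate dialect
# and three other dialect regions (illustrative values only).
counts = {"R1": 2, "R2": 6, "R3": 4}

# Equal weighting, as was done for the averages in this study.
equal_weights = {"R1": 1, "R2": 1, "R3": 1}

# Hypothetical speaker populations for the same regions.
populations = {"R1": 6000, "R2": 1000, "R3": 3000}

equal_result = weighted_average(counts, equal_weights)
population_result = weighted_average(counts, populations)
print(equal_result, population_result)
```

In this invented case the candidate looks better under population weighting because the region it differs from most is also the smallest.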
Lastly, attention should be given to items of the wordlist in which dialect forms are not similar and
apparently are unrelated. When the forms are dissimilar, the matter of aggregate instances of
phonological changes is irrelevant. Therefore if two given dialects have a great number of unrelated
lexical items, the approach of looking at aggregate instances of phonological change is much less
insightful since it applies to much less data.

3.5 Comparative reconstruction of Proto-Semai


Comparative reconstruction involves using daughter languages and dialects to reconstruct the
proto-language or ancestor language, a single original language from which all of the daughter languages and
dialects descended. The most important tool for reconstruction is the comparative method. The previous
sections have already referred to the various consistent phonological changes found between dialects.
This section will consider whether certain phonological changes that are the same or nearly the same in
two dialects did in fact occur simultaneously or while those dialects were in contact.

3.5.1 Independent phonological changes

Generally, dialects that share the same set of phonological innovations are assumed to have a shared
history and therefore are more closely related. There are important exceptions, however. One exception
is when a phonological change is regarded as extremely common and therefore may well have occurred
as an independent innovation in two or more dialects after they separated from each other. Stated
another way, the phonological change did not occur before the dialects split, as might mistakenly be
presumed. Losses of a final -h or final -r are examples of such changes that may occur independently.
There have been several independent phonological changes in Semai. Diffloth (1977) shows that
Proto-Semai had more nasal vowels than do the current-day dialects. For instance, Proto-Semai had a
long nasal vowel //, which has been either raised to // or lowered to // in all dialects. Since only
two change strategies (to eliminate the long nasal //) were employed throughout the Semai dialects, it
would not be surprising to find that several dialects had undergone the same phonological change
independently. Indeed, Diffloth shows the raising of // in two geographically separate areas, and
claims that the same phonological change occurred in both areas independently, after the dialects had
split.29
In the current research, two of the identified phonological changes fall into this category, and as
such, offer weak evidence at best of a genetic relationship. The first is phonological change (i), where
/*i/ changes to /ii/, /ee/, or in one dialect //. The second is phonological change (k), where /*/
becomes // or // .

29

Unfortunately, on average this vowel was elicited fewer than twice per wordlist, which was not deemed sufficient
to verify Diffloth's findings. What data there was did mostly substantiate Diffloth's claim for this phonological
change, however.


3.5.2 The Malay peninsula as a linguistic area

The principle that shared phonological changes indicate relatedness can also exhibit exceptions when
dialects or languages come into contact and influence each other. A linguistic area30 is a geographical
region where languages come into contact with each other and various language features are borrowed;
not just lexical items or phonological changes are borrowed, but often other features such as
morphological and/or syntactic structures. Indeed, linguistic history is replete with examples of
phonological changes induced by language contact. For purposes of tracing linguistic history and
determining family relationships, it is critical to resolve which phonological features are inherited and
which are borrowed through language contact.
There is strong evidence that Semai and many of its surrounding languages constitute a linguistic
area. The Semai language has clearly borrowed many lexical items from Malay, as evidenced in the
collected wordlists. In addition, Semai has borrowed the Malay suffix -lah (with various pronunciations),
which stands out because Mon-Khmer languages are typically devoid of suffixes. Furthermore, the
southern dialects near Tanjung Malim (represented by wordlists from Chinggung and Sungai Bil) are
seen to be shifting the final palatal consonants to either the alveolar or velar points of articulation. This
shifting is evidently due to the influence of their neighboring language, Temuan, as well as Malay. Both
are Austronesian languages, and Austronesian languages in general do not have final palatal consonants.
However, although it is beyond the scope of this report, it is rather interesting to note that wordlists of
Temuan do show a few words with palatal final consonants (Baer 1999), and this feature evidently is
borrowed from Semai and/or other Mon-Khmer languages in the region. Temuan is not an isolated case:
Seidlitz (2005) reports that Jakun, another neighboring Austronesian language, also has some words
with palatal final consonants.31

30

A linguistic area has also been referred to in various literature by the following terms: Sprachbund, diffusion area,
adstratum relationship, and convergence area.
31
It is also possible that final palatals -c and -ɲ in Temuan and Jakun are evidence of substratum; that is, vestiges of
another language (presumably an Aslian language) spoken by Temuan and Jakun speakers before they essentially
switched to a variety of Malay.


Figure 7. Geographical range of Orang Asli languages.32


32
This map is reproduced from Dentan (1997).

In the linguistic area to which the Semai language belongs, probably the most interesting feature, or
at least the most relevant to this discussion, is the preploded nasal. Most Aslian languages, as well as
several local Malay dialects, exhibit nasal preplosion in some form or another (Phillips 2006a).
Proto-Semai clearly had preploded nasals, which derived from simple nasals at some point in its history. In the
present day, most dialects of Semai have reduced the preploded nasals to simple, voiceless plosives. The
line dividing the region between the areas that preplode and those that have simple plosives cuts across
the Semai territory more or less from north to south along the Perak-Pahang border, but with the
Cameron Highlands portion of Pahang exhibiting voiceless plosives as the Perak side does.
The crucial question remains of whether the preploded nasal in Semai is an inherited trait or an
areal feature, that is, a separate phenomenon sweeping across the Semai dialects. Diffloth (1977) does not
address the question directly, perhaps because his paper is specific to the Semai vowels; however, his
groupings imply that he views the preploded nasals as an areal feature. In particular, Diffloth treats his
entire set of NE dialects as a single branch of Semai, even though one of the dialects (NE) in that set
has preploded nasals and the others (NE1 and NE2) do not. Furthermore, he does not group NE with E or
SE, even though the latter two also exhibit preploded nasals.
The current research contains three wordlists that seem to correspond with Diffloth's NE
designations: Renglas, Lanai, and Serau. None of these wordlists contained preploded nasals. However,
upon listening to the Lanai and Serau speakers talking in Semai after wordlists were elicited, it became
clear that the speakers do indeed preplode. Apparently these speakers suppress the preploded nasal when
giving words in isolation. Whether the suppression is intentional or not is not known. It would be good
to work with these speakers again, and perhaps devise an elicitation method whereby words can be
elicited in a frame of text such that the preplosion is not suppressed.
The fact that several phonological changes unique to Diffloth's NE dialects were verified in this
research33 gives valuable evidence that the NE dialects should be grouped together and not genetically
split by the preplosion feature. Therefore, unless future research turns up contradicting evidence, the
reduction of preploded nasals to simple plosives is best viewed as an areal feature, not a genetic one.

3.5.3 Genetic relationships of Semai dialects

Once all genetic phonological changes are identified, a family tree relationship of the dialects can be
drawn up based on inherited phonological changes.
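[Figure 8: tree diagram rooted at Proto-Semai. Legend: dialect area; phonological change; areal feature.
Dialect area labels: Y,Z,AA; X,DD; G,O; E,F,R,S,U; N,L,BB,CC,EE; C,Q; H,V,W.
Phonological change labels: b,d,e; d,e,(g); c,f; i1; i2; h,j,l.]

Figure 8. Family tree for Semai dialects.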


33
Where data was too scant to be included in this paper, the data that did exist appeared to support Diffloth's (1977)
conclusions.

In the family tree in Figure 8, a number of observations can be made. First of all, two dialect areas are
rather divergent from the rest of the dialects. The bottom group, representing the southern dialects H, V,
and W, has three shared phonological changes not found in other dialects, indicating that these three
dialects link back to Proto-Semai and are not otherwise related to the other dialects. In the second
dialect area, the top dialect B Gopeng also has three phonological changes that distinguish it from most
of the other dialects; however, two of the changes are also shared with M Kampar. Gopeng and Kampar
share one of their phonological changes with J Tapah. Therefore it appears that B, M, and J could be
viewed as a grouping linked by sharing change e.
Secondly, two other dialect areas also exhibit phonological changes, albeit fewer, that separate them
from the rest. The dialects in the northeast corner of the Semai territory (Y Lanai, Z Serau, and AA
Renglas) have two phonological changes not found elsewhere. The other dialect area, in the northern
part of Cameron Highlands (X Sungai Ruil and DD Terisu), has just one phonological change setting it
apart from others. This change, where short vowel /u/ goes to /o/, happens also to be found in Kampar.
Since crossing genetic lines is not generally allowed when drawing family trees,34 this change has been
added only in parentheses to M Kampar. This may be an areal feature, an independent innovation, or
simply that the speaker who gave the Kampar list had been influenced by the dialect from Cameron
Highlands for some reason. In any case, this phonological change should be viewed as suspect until
further evidence can be brought to bear on the situation.
Next, the dialects roughly representing center and northwest regions all group together with no or
few phonological changes. Two exceptions have been drawn on this tree. First, the northwest dialects G
Tangkai Cermin and O Bota and the south-central dialects C Rasau and Q Bidor happen to share the
same phonological change. However, it seems probable that this change represents independent
innovations for two reasons: first, the geographic separation between these two regions argues against
linking the two dialect areas; and second, this phonological change (where /*i/ changes to /ii/ or /ee/)
was previously mentioned as likely being a nongenetic phonological change (i.e. an independent
innovation).
Finally, only one feature is shown dividing the remaining dialects, which roughly represent the
geographical center of the Semai territory. This phonological change, where the preploded nasals
become simple, voiceless plosives, was argued earlier to be an areal feature. So, while an areal feature is
not generally included on a genetic chart, it has been included here (with a dashed line) to highlight the
residual difference between these central dialects.
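
The grouping logic behind the tree can be illustrated with a short Python sketch that inverts a table of changes per dialect into a table of dialects per change. The letter assignments below are read off the Figure 8 labels and the discussion above as understood here, so they should be treated as an assumption; the parenthesized change g for Kampar, the suspected independent innovation i, and the areal preplosion feature are deliberately left out.

```python
from collections import defaultdict

# Phonological changes per dialect code, as inferred from Figure 8 and the
# accompanying discussion (an assumption of this sketch, not a data table
# from the study). Suspect and areal features are omitted.
changes_by_dialect = {
    "B": {"b", "d", "e"},   # Gopeng
    "M": {"d", "e"},        # Kampar; (g) set aside as suspect
    "J": {"e"},             # Tapah
    "Y": {"c", "f"},        # Lanai
    "Z": {"c", "f"},        # Serau
    "AA": {"c", "f"},       # Renglas
    "X": {"g"},             # Sungai Ruil
    "DD": {"g"},            # Terisu
    "H": {"h", "j", "l"},   # Cluny
    "V": {"h", "j", "l"},   # Chinggung
    "W": {"h", "j", "l"},   # Sungai Bil
}


def dialects_by_change(table):
    """Invert {dialect: changes} into {change: set of dialects sharing it}."""
    groups = defaultdict(set)
    for dialect, changes in table.items():
        for change in changes:
            groups[change].add(dialect)
    return dict(groups)


groups = dialects_by_change(changes_by_dialect)
for change in sorted(groups):
    print(change, sorted(groups[change]))
```

Each set with more than one member is a candidate subgroup; whether it reflects shared inheritance still has to be argued change by change, as done above.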

3.5.4 Reconstruction of Proto-Semai lexical items

Although there is some room for differing interpretations of the phonological changes that determine the
genetic tree, the reconstruction of Proto-Semai is a bit more clear-cut. Indeed, even if the preploded
nasals are considered to be a genetic feature rather than an areal one, the reconstruction produces the
same forms. The proto-forms determined from the data in the study can be found in Appendix E. The list
contains 429 lexical items. However, since many lexical items produced multiple words across the many
dialects, a total of about 760 unique words were reconstructable. Of this total, 150 words are from
Diffloth's data and congruent with the current findings, while 610 words are newly added.

3.6 Word borrowings


Another insightful analysis regards word borrowings. The most obvious source for Semai to borrow
words is from Bahasa Malaysia (Malay), which is the national language as well as the language of wider
communication. Table 11 gives the percentage of apparent borrowings from Malay found in the
wordlists elicited. As noted earlier, it would be hard to distinguish between Malay and Temuan as the
source of the borrowings since these two languages are so closely related.

34

Another method of representing phonological changes, called wave-theory, has no such limitation. Circles are used
to show the extent of each phonological change, and the various circles are permitted to overlap and cross each
other.

Table 11. Percentage of apparent borrowings from Malay

Percent   Dialect
26.5      W Sungai Bil
22.8      V Chinggung
21.8      M Kampar
21.1      H Cluny
20.1      N Bidor
19.2      G Tangkai Cermin
18.2      O Bota
16.2      L Batu 17
16.0      Z Serau
16.0      X Sungai Ruil
14.7      Y Lanai
12.5      CC Relong
12.1      R Pos Buntu
12.1      Q Sungkai
12.1      J Tapah
12.0      DD Terisu
11.2      T Simoi
10.7      E Bertang
10.2      B Gopeng
10.1      AA Renglas
10.0      F Kuala Kenip
9.7       C Rasau
8.0       S Betau
7.9       U Cherong
7.6       BB Cenan Cerah
7.5       EE Pagar

A number of salient observations can be made from Table 11. In broad strokes, the dialects with the
fewest borrowings tend to be in the remote interior, most often in Pahang. On the other hand, the
dialects with the most borrowings tend to be in the lowland areas of Perak that have ready access to
roads and the surrounding cultures.
Of particular interest are the southern dialects (Chinggung, Sungai Bil, and Cluny), which represent
three out of the four highest rates of borrowings among the dialects surveyed for this study. This high
rate of borrowing, coupled with the shift of final palatal consonants35 mentioned earlier (phonological
change h in section 3.4), is a strong indicator of the significant influence that Malay or Temuan has
been exerting on Semai dialects in this region.
35
Bahasa Malaysia does not have the final palatal consonants /-c/ and /-ɲ/, although these palatal consonants are
common in other positions. In fact, as a rule the entire Austronesian language family has a dearth of these final
palatal consonants. But a curious exception to this rule is Temuan, the Malayic language that is Semai's immediate
neighbor to the south. Temuan has several lexical items with palatal final consonants, which conceivably could have
been borrowed from Semai (or another Aslian language), or be substratum.

Also exhibiting relatively high rates of borrowing are Tangkai Cermin and Bota, at nineteen percent
and eighteen percent, respectively. This fact, plus the fact that these dialects also share several unique
lexical innovations, helps establish a link between them as well as a distinction of sorts from the other
Semai dialects.
At the lower end of the chart, six of the seven dialects with the lowest rates of apparent borrowing
from Malay are spoken in relatively remote villages in Pahang. The one surprising exception, Rasau, is in
a relatively accessible area in Perak; indeed, it is rather close to the southern dialects (Sungai Bil,
Chinggung, and Cluny) that have the highest rates of borrowing.
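
The rankings cited in these observations can be checked mechanically. The Python sketch below transcribes the figures of Table 11 and extracts the four highest and seven lowest rates; the variable names are chosen for this illustration, and the set of southern dialects follows the text above.

```python
# Percentages of apparent Malay borrowings, transcribed from Table 11.
borrowing = [
    (26.5, "W Sungai Bil"), (22.8, "V Chinggung"), (21.8, "M Kampar"),
    (21.1, "H Cluny"), (20.1, "N Bidor"), (19.2, "G Tangkai Cermin"),
    (18.2, "O Bota"), (16.2, "L Batu 17"), (16.0, "Z Serau"),
    (16.0, "X Sungai Ruil"), (14.7, "Y Lanai"), (12.5, "CC Relong"),
    (12.1, "R Pos Buntu"), (12.1, "Q Sungkai"), (12.1, "J Tapah"),
    (12.0, "DD Terisu"), (11.2, "T Simoi"), (10.7, "E Bertang"),
    (10.2, "B Gopeng"), (10.1, "AA Renglas"), (10.0, "F Kuala Kenip"),
    (9.7, "C Rasau"), (8.0, "S Betau"), (7.9, "U Cherong"),
    (7.6, "BB Cenan Cerah"), (7.5, "EE Pagar"),
]

# Sort from highest to lowest rate (the sort is stable, so ties keep table order).
ranked = sorted(borrowing, key=lambda row: row[0], reverse=True)
top_four = [name for _, name in ranked[:4]]
bottom_seven = [name for _, name in ranked[-7:]]

# The southern dialects named in the discussion of Table 11.
southern = {"W Sungai Bil", "V Chinggung", "H Cluny"}
southern_in_top_four = southern & set(top_four)

print(top_four)
print(bottom_seven)
print(len(southern_in_top_four))
```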
It would be premature, however, to draw strong conclusions from Table 11 until more research can
be done, since many factors can influence the number of apparent borrowings. Personal factors of people
providing the language data, including age, gender, education, mobility, and intermarriage, affect
responses. Even fatigue can come into play: in some cases the language speaker was able to produce the
Semai word quickly, going through the list for the first time; however, during subsequent checking, often
several hours later, the speaker was no longer able to remember the word easily and would substitute
the Malay word.
For future research it would also be useful to compare the wordlists from this research with those of
other languages in the region, especially those known to be in contact with or in geographical proximity
to the Semai, such as Temuan (to the south) and Temiar (to the northeast) (Benjamin 1976b:75).

3.7 Some other observations


3.7.1 Language preservation and vitality

One aim of this research project is the preservation of the national treasure that Malaysia has among its
wealth of indigenous languages, such as the Semai language. Language preservation has two aspects.
One aspect, to which this project has been largely dedicated, is the documentation of the language as it
exists today, a snapshot to be preserved for generations to come. The second aspect of language
preservation involves the continuation of the use of the language.
Although it was not the direct aim of this research project to document the vitality of Semai, several
noted factors indicate Semai is indeed a living and vital language. The following four observations were
made from informal discussion with the various Semai speakers who provided the wordlists for this
study. First, all of the Semai speakers seemed quite proud of their language. Second, the Semai claimed
that they always spoke their mother tongue when talking to other Semai, even with Semai from distant
dialects, the only exception being when the dialects were just too different, and they needed to
supplement their speech with Malay. Third, the Semai are diligently teaching the language to their
children and expressed their desire to see their grandchildren and future generations continue to learn
and use Semai. Fourth, most Semai interviewed are interested in seeing the language developed. They
desire to be able to read and write their mother tongue, and in most cases are delighted to see materials
such as school curricula and dictionaries in their language.
It is hoped that the results presented in this report will further the use, study, and development of
the Semai language.

3.7.2 Morphology: reduplication

In the course of any research involving digging for as yet unknown treasures, there often are gems of
discovery beyond the original scope of the research. Such was the case with this project. During the
wordlist collection it was not uncommon to encounter words, especially verbs, in a reduplicated form. It
was discovered that reduplication usually conveys the sense of continuative aspect, not an uncommon
function for languages that employ reduplication. But as more and more wordlists were collected, several
patterns of reduplication were noticed.

Some dialects have more than one type of reduplication, including full reduplication of the root
verb; for example, Betau has these forms.

    bi.bm          cries/cried
    bi.m. m        is crying
    bi..bm.bm      cries and cries

This last form is especially interesting in that it defies the norm for Semai words, in that normally
only the ultimate syllable bears stress and utilizes the full set of phonemes. Instead, in this type
of reduplication, both the ultimate and penultimate syllables are exact copies, each bearing
stress and neither having a reduced set of phonemes.

Some dialects in which final preploded nasals become voiceless have nasals reappear in
partially reduplicated bases. For example, Kampar has these forms.

    /bi.p/         cries/cried
    /bi.m.p/       is crying

Some dialects drastically reduce the amount of phonetic information in partially reduplicated
roots. For example, in the partial reduplication in the following forms from Kampar the vowel is
reduced and the oral plosive has become glottal.

    /bi.ciip/      walks/walked
    /bi.c.ciip/    is walking

It is reported that some dialects of Malay have a very similar type of reduplication (Kroeger
1989 and Zaharani 1991).

In some dialects reduplication appears to be far less productive; a number of apparently
reduplicated forms are frozen. For example, the following form from Sungai Bil contains what
appears to be a frozen reduplication, but no parallel form without reduplication has been found.

    /bi.t.th/ (but no /bi.th/)    spits

Perhaps the most intriguing aspect of reduplication in Semai is apparent in sesquisyllabic roots.
Rather than one of the syllables being copied, a copy of the final consonant is infixed between
the initial consonant and the first consonant of the major (ultimate) syllable. Matisoff (2003)
coined the term incopyfixation to describe this phenomenon. Here are two examples from the
Betau dialect.
    /bi.k.hl/      to cough
    /bi.kl.hl/     is coughing
    /bi.b.lh/      to see
    /bi.bh.lh/     is seeing

A preliminary search of the linguistic literature on this topic reveals that this form of
discontinuous reduplication is very rare, if not unique to Aslian languages.
A fuller discussion of this type of discontinuous reduplication in both Semai and in surrounding
languages is presented in Phillips (2006b).
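
The incopyfixation pattern described above for sesquisyllabic roots can be stated as a small Python function. This is a rough sketch under simplifying assumptions: the root is supplied as an initial consonant plus a major syllable, every consonant is a single character, the initial bi. element of the cited forms is left aside, and V is a placeholder standing in for the vowel of the major syllable, so the strings below are consonant skeletons rather than phonemic transcriptions.

```python
def incopyfix(initial_consonant, major_syllable):
    """Infix a copy of the root-final consonant after the initial consonant.

    initial_consonant: the single consonant of the minor syllable.
    major_syllable: the ultimate syllable, assumed to end in one consonant.
    """
    final_consonant = major_syllable[-1]
    # The copy sits between the initial consonant and the first consonant
    # of the major syllable; the syllable break follows the copy.
    return f"{initial_consonant}{final_consonant}.{major_syllable}"


# Skeletons modeled on the two Betau examples; "V" is a placeholder vowel.
cough = incopyfix("k", "hVl")
see = incopyfix("b", "lVh")
print(cough, see)
```

The function copies only one segment, which is what distinguishes this pattern from the syllable-copying types of reduplication listed earlier.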


4 Conclusions
4.1 Language vitality
The Semai language is very vital: the Semai people consistently use it among themselves, and they are
teaching the language to their children. The Semai people are proud of their language and see it as
something that identifies them as a people. In essence, the Semai people epitomize the Malay proverb:
Bahasa jiwa bangsa
Language is the soul of the race
Bilingualism in Malay does not appear to be threatening the vitality of Semai. Although it was not the
specific aim of this research, it was readily apparent that in general, the Semai people are fluent in
Bahasa Malaysia.

4.2 The complex dialect situation


Another richness of the Semai language is the wide diversity of its dialects. Every village speaks its own
variety, giving every region its own unique flavor. As this report has shown, most Semai locales share
many words with their neighboring villages, but the differences tend to grow with distance. The result is
a sort of dialect chain or dialect network spread over the Semai territory, with each village linked with
its near neighbors, except where physical barriers, such as the mountain range between the states of
Perak and Pahang, tend to impede travel and hence hinder communication.
As would be expected, the dialects that are the most different are at the boundaries of the Semai
territory. These dialects are the farthest on average from all the dialects as a whole and also more in
contact with neighboring language groups. For the Semai case, the extreme southern dialects are the
most notably different, but the northwest dialects also show a number of lexical innovations, and the
northeast dialects have sounds that are quite distinct from the rest.
The end result of all the differences among the Semai communities, both in unique words and in
phonological changes, is that most likely, one variety of Semai could not be used effectively to serve the
entire Semai-speaking population. The differences are just too great for any one Semai dialect to be
understood in all regions. Furthermore, the various analyses in this report have shown that the situation
is rather complex, and the Semai territory cannot be neatly divided into just two or three regions.
Ultimately, direct comprehension testing between dialects will be needed, in order to truly
determine how widely understood the dialects are. The preliminary groupings outlined in this report will
serve as a basis for choosing dialects for such testing.

5 Recommendations for further research


This research has laid the groundwork for much research yet to be done. First of all, if the Semai
language is to be developed and its vitality maintained, some form of standardization will need to take
place. This study of Semai dialects has revealed a plethora of differences from one dialect to the next.
The Semai people themselves are well aware of these differences and admit some dialects are harder to
understand than others. It is impractical and undesirable to develop each dialect; rather, one dialect will
need to be selected for standardization. As researcher Gérard Diffloth (1968) asserts,
In spite of many common features, the differences between the various dialects labeled
Semai are so numerous ... As a practical consequence, it would be extremely difficult to
implement a literacy campaign in their own idiom until one of the dialects has been
accepted by them as a standard.

A number of factors need to be considered in determining which dialect is to be promoted as the
standard. Ideally, the standard dialect is one which is highly comprehensible36 by all of the other
dialects. While wordlists, linguistic comparative analysis, and native speaker intuition are of value in
predicting comprehensibility, ultimately comprehensibility must be empirically tested.37
Secondly, the morphology is a good next level of research after the phonology has been investigated.
The syntax and semantics of Semai also need to be studied, not only for unlocking the secrets of how the
Semai language works, but also for documentation and preservation purposes. And as previously
mentioned, there appears to be something intriguing about reduplication in Semai, which may turn out
to have significant importance for linguists.38
Thirdly, a dictionary project,39 especially one that covers the dialectal differences, would immensely
aid efforts to:
Preserve the language;
Make the Semai language available to linguistic researchers around the world;
Raise the status of the Semai language;
Allow Semai people to understand one another across large dialect gaps;
Train government officials who relate to the Semai people.
Fourthly, a project to do text collection should be initiated. Not only would this produce material for
more linguistic research, but it also would encourage the preservation of the language and culture that is
still known to the older generation, but is gradually being lost by the younger generations. In addition to
the legacy of culture and language at risk of being forgotten, it is estimated that huge amounts of
knowledge about the rainforest and its useful resources will become extinct if attempts are not made to
preserve this knowledge.40
Fifthly, in the realm of biodiversity, a topic of ever-growing international concern, scientists are
becoming increasingly aware of the sophistication of traditional ecological knowledge among many
indigenous and local communities. Traditional healers in Southeast Asia rely on as many as 6,500
medicinal plants, and shifting cultivators throughout the tropics frequently sow more than 100 crops in
their forest farms (Posey 2001). There is an inseparable link between this indigenous knowledge of local
biodiversity and linguistic diversity in that such local knowledge is critically dependent on local
languages for cultural transmission between generations. In other words, preservation of the Semai
language will help preserve what the Semai know about the rainforest, which in turn will aid knowledge
of local biodiversity and how to maintain it.
Lastly, a sociolinguistic study should be undertaken so that the social factors, as they relate to
language, can be better understood. This kind of research would seek answers to questions such as the
following:

When people marry across dialects, which language do they speak with each other and to their
children?

What are their cultural and linguistic taboos, and how do these affect the language?

What are the linguistic differences related to gender, age, status, and time away from the home
village?

It is hoped that the research presented in this report will provide insights and pertinent information
to future research about, and for the sake of, the Semai people and the country of Malaysia.

36
The more common terms in linguistics literature are intelligible and intelligibility. However, these terms are sometimes
confused with intelligence; therefore, I have chosen terms such as comprehensible and comprehensibility,
referring to the inherent ability of people from one dialect to understand another dialect without already being
familiar with that dialect.
37
A dialect comprehension survey was carried out in 2005-2006 and resulted in the report Dialect comprehension
survey of the Semai language (Phillips 2006b), submitted to the Economic Planning Unit (EPU) of the Prime
Minister's Department, Malaysia. The results of this report were used to choose a standard dialect for Semai,
corresponding roughly to the Semai dialect spoken along the lower Jalan Pahang region of Perak.
38
The phonology of Betau Semai in Appendix F was later published (see Phillips 2007). Phillips (forthcoming)
outlines the morphology of Semai as well as what is currently known of the morphology of other Aslian languages in
his dissertation on Aslian. As mentioned earlier, Semai reduplication has been described in some detail (Phillips
2011).
39
A Semai dictionary project was begun in 2005 and resulted in a dictionary published in 2008 (see Basrim 2008). In
2012 a revision of this dictionary was initiated.
40
Since this report was originally submitted in 2006, there have been several efforts to collect stories and texts in
Semai. While these stories have not been officially published, some of them have been printed in limited quantities
and distributed among the Semai in an effort to stimulate an interest in literacy and in language preservation. As of
2012 these efforts are ongoing.

Appendix A: Map of dialects sampled



Appendix B: Wordlist employed in study


The major tool used in this research was a 429-item wordlist. The first part of this appendix presents the
full wordlist that was employed, shown in Table 12. The wordlist is organized largely in semantic order;
i.e., words have been grouped according to similar concepts. The second part of this appendix details the
various issues that arose in using the wordlist, shown in Table 13.

41

42

1
2

Malay

matahari

English

sun

40

star

42

bulan

moon

langit

sky

bintang

air

water

alir, mengalir

flow, to

4
6

sungai

8a

berenang (orang)

8b

river

swim, to
(human)
swim, to
(animal, eg. dog)

9a

berenang
(haiwan, contoh
anjing)
apung, terapung

10

tenggelam

float, to
(rise to surface)
sink, to

cetek

shallow, to be

9b

timbul

float, to (surface)

11

mandi

13

dalam (lawan
cetek)

14

hujan

deep, to be
(opposite
shallow)
rain

16

guruh, guntur

thunder

18

bayang

shadow, a

12

15
17

kilat

pelangi

bathe, to

lumpur

cave

soil, earth

27

pasir

sand

32

garam

salt

habuk

mud

dust

33

jalan (kecil)

path (small)

35

kayu api

firewood

34
36

api

ashes

bakar (kayu)

burn, to (wood)

asap

39

layur; bakar
(bulu)

38

fire, a

abu

37

root, a

smoke

burn off, to
(feathers)

leaf

45

menggali

dig, to

47

bunga

flower

46

cari, mencari

48

duri

49b

dahan

49a

batang pokok

50

kulit kayu

52

buah betik

51

53

54
55
57

62

63

look for, to

thorn

trunk, tree

branch, tree
bark, tree

buah

fruit

pisang

banana

61

stone, a

grass

akar

kapuk / kabu
(digunakan untuk
membuat bantal)
buluh

tanah

28

44

60

mountain

26

daun

forest

seed

rainbow

gunung

25

43

English

biji, benih

cendawan

23

gua batu

rumput

58

lightning

wind

24

41

56

angin
batu

hutan

kelapa muda /
nyiur muda
kelapa tua / nyiur
tua
terung

19

22

Malay

halia

rebung
rotan

papaya

coconut (unripe)
coconut (ripe)
eggplant
ginger

mushroom
(generic)
kapok (cotton)
bamboo

bamboo shoot
rattan

64

buah pinang

betel nut

66
67

kapur (digunakan
bersama buah
pinang)
ludah, meludah

68

getah

lime
(used with betel
nut)
spit, to (betel
juice)
rubber

65

69

daun sirih

toreh, potong
getah

betel nut leaf

tap, to (a rubber
tree)

No. | Malay | English
70 | pokok ipoh | tree used for dart poison
71 | sumpitan (alat) | blowgun
72 | sumpit, menyumpit | blow, to (blowgun)
73 | pangkal damak | dart head
74a | damak (yang ada racun) | dart (poison)
74b | damak (yang tiada racun) | dart (without poison)
75 | tabang (bekas untuk simpan damak) | quiver
76a | tombak, lembing (dibuat daripada besi) | spear (iron)
76b | seligi (dibuat daripada buluh) | spear (bamboo)
77 | buru, memburu | hunt, to
78a | bunuh, membunuh | kill, to
78b | matikan, mematikan | cause to die, to
79 | tikam, menikam | stab, to
80 | tembak, menembak | shoot, to
81 | beruang | bear, a
82 | babi | pig
83 | landak | porcupine
84 | monyet (ekor pendek) | monkey (short tail)
85 | arnab | rabbit
86 | rusa | deer (large)
87 | harimau | tiger
88 | ekor | tail
89 | anjing | dog
90 | menyalak | bark, to
91 | kucing | cat
92 | tikus | rat
93 | gigit, menggigit | bite, to (animal)
94 | burung | bird
95 | terbang | fly, to
96 | ayam | chicken
97 | bulu (ayam) | feather (chicken)
98 | telur | egg
99 | sayap | wing
100 | rama-rama | butterfly
101 | lebah | bee
102 | lalat | fly, a
104 | nyamuk | mosquito
105 | kutu (ayam) | louse (chicken)
106 | kutu (kepala) | louse (head)
107 | anai-anai (sejenis semut putih) | termite
108 | labah-labah | spider (small house)
109 | lipas | cockroach
110 | ular | snake
111 | biawak | monitor lizard
112a | siput | snail
112b | siput babi | snail, garden
112c | siput air | snail, water
113 | katak | frog
114a | penyu | turtle (sea)
114b | labi-labi | turtle (river)
115 | ikan | fish
116 | buaya | crocodile
117a | orang (asli) | person (Aslian)
117b | orang (bukan orang asli) | person (non-Aslian)
118 | lelaki | man
119 | perempuan | woman
120 | anak | child (offspring)
121 | kanak-kanak | child
122 | emak/ibu | mother
123 | bapa | father
124a | abang | sibling (elder brother)
124b | kakak | sibling (elder sister)
125a | adik lelaki | sibling (younger brother)
125b | adik perempuan | sibling (younger sister)
126 | anak bongsu | offspring (youngest)
127 | suami | husband
128 | isteri | wife
129 | balu (perempuan yang kematian suami) | widow
130 | kawan | friend
131 | nama | name
132 | kepala | head
133 | rambut | hair
134 | sikat | comb, a
135a | botak (tiada rambut kerana rambut telah gugur) | bald, to be (natural)
135b | botak (tiada rambut kerana telah dicukur) | bald, to be (shaved)
136a | menggunting rambut; potong rambut (pakai gunting) | cut hair, to (with scissors)
136b | potong rambut (pakai alat lain) | cut hair, to (with other instrument)
137 | mata | eye
138 | hidung | nose
139 | telinga | ear
140 | muka | face
141 | dahi | forehead
142 | kening | eyebrow
143 | pipi | cheek
144 | dagu | chin
145 | kerongkong, tekak | throat (internal)
146a | bibir | lip
146b | bibir atas | lip, upper
147c | bibir bawah | lip, lower
148 | gigi | tooth
149 | lidah | tongue
150 | gusi | gums
151 | otak | brain
152a | tengkok | neck (back of)
152b | leher | neck (front of)
153 | belakang (bahagian badan) | back
154 | bahu | shoulders
155 | ketiak | armpit
156 | siku | elbow
157 | tangan | hand
158 | tapak tangan | hand, palm of
159 | kuku | finger nail
160 | jari tangan | digit, finger
161a | perut | abdomen (generic)
161b | perut (bahagian di atas pusat) | abdomen, upper
162 | perut (bahagian di bawah pusat) | abdomen, lower
163 | punggung | buttocks
164 | pusat | navel
165 | paru-paru | lungs
166 | jantung | heart
167 | tali perut, usus | intestines
168 | hati | liver
169 | tulang | bone
170 | tulang rusuk | rib
171 | kulit (manusia) | skin (human)
174 | daging (manusia) | flesh (human)
175 | peha, paha | leg (upper)
176 | lutut | knee
177 | betis | leg (lower)
178 | kaki | foot
179 | tumit | heel
180 | jari kaki | toes
181 | kuat | strong, to be
182 | letih, penat | tired, to be
183a | tidur | sleep, to
183b | tidur lena, tidur nyenyak | sleep, to (soundly)
184 | berdengkur | snore, to
185 | menguap | yawn, to
186 | buta | blind, to be
187 | nampak (contoh, nampak seekor ayam) | see a chicken, to
188 | lihat | look, to
189 | kenyit / kelip mata (sebelah mata) | wink, to
190 | pekak, tuli | deaf, to be
191 | dengar | hear, to
192 | cium, menghidu (bunga) | smell, to (flower)
193a | reput | rotten/decayed, to be
193b | basi | rotten, to be (food)
194 | lapar | hungry, to be
195 | kenyang | full / satisfied, to be
196 | makan | eat, to
197 | haus, dahaga | thirsty, to be
198a | sedut, menyedut (siput air, sumsum tulang) | suck out, to (snail, bone marrow)
198b | hisap, menghisap (gula-gula) | suck on, to (candy, sweet)
199 | jilat, menjilat | lick, to
200 | mabuk | drunk, to be
201 | arak | liquor
202 | minum | drink, to
203 | telan, menelan | swallow, to
204 | muntah | vomit, to
205 | sakit (tangan), [bukan demam] | hurt / be painful, to
206 | bengkak | swollen, to be
207 | gatal | itchy
208 | garu, menggaru | scratch, to
209a | mencakar (seperti mencakar sampah atau daun) | rake, to
209b | kais, mengais (seperti ayam mengais untuk mencari makanan) | scratch, to (chicken)
210 | sejuk | cold, to feel
211 | menggigil, menggeletar | shiver, to
212 | batuk | cough, to
213 | bersin | sneeze, to
214 | panas | hot, to be
215 | air peluh | sweat
216 | air liur | saliva
217 | meludah (dekat) | spit close to self
218 | meludah (jauh) | spit far away
219 | air mata | tears (noun)
220 | menangis | cry, to
221 | air kencing | urine
222 | darah | blood
223 | tahi, najis | excrement
224 | buang air besar, berak | defecate, to
225 | nanah | pus
226 | bisul | boil, a / abscess
227 | bekas luka, parut | scar
228 | ubat | medicine
229 | menjampi | incant, to
230 | bomoh, pawang, dukun | herbal curer, shaman
231 | hidup | exist / be alive, to; live, to
232 | mati | die, to
233 | mengebumikan mayat, tanam mayat | bury, to (corpse)
234 | tua (orang) | old, to be (animate)
235 | gemuk | fat, to be
236 | kurus | skinny, to be (animate)
237a | tinggi (orang) | tall, to be (human)
237b | tinggi (pokok) | tall, to be (tree)
238a | pendek (orang) | short, to be (human)
239 | pendek (pokok) | short, to be (tree)
240 | besar | big, to be
241 | kecil | small, to be
242 | bernafas | breathe, to
244 | tiup, meniup (api) | blow on, to (fire)
245 | duduk | sit, to
246 | berdiri | stand, to
247 | berjalan | walk, to
248 | merangkak | crawl, to
249 | berlari | run, to
250 | cepat | fast, to be
251 | perlahan-lahan, lambat (lawan cepat) | slow, to be
252 | fikir | think, to
253a | tahu, faham | know, to (thing)
253b | kenal (orang) | know, to (person)
254 | lupa | forget, to
255 | bermimpi | dream, to
256 | pilih, memilih | choose, to
257 | sayang, mengasihi, kasih | love, to
258 | senyum | smile, to
259 | ketawa | laugh, to
260 | baik (orang) | good, to be (person)
261 | jahat (orang) | bad, to be (person)
263 | marah | angry, to be
264 | meradang, berang, sangat marah | furious, to be
265 | membohong, bercakap bohong | lie, to; fib, to
266 | curi, mencuri | steal, to
267 | berlawan (secara fizikal) | fight, to (physical)
268 | takut | afraid, to be
269 | betul | correct, to be
270 | salah | wrong, to be
271 | susah | difficult, to be
272 | pukul | hit, to
273 | bercakap | speak, to
274 | memanggil (untuk mendapatkan perhatian seseorang) | call, to (to get s.o.'s attention)
276 | cerita | story
277 | beritahu | tell, to
278 | bersiul | whistle, to
279 | jawab, menjawab | answer, to
280 | nyanyi, menyanyi | sing, to
281 | tari, menari | dance, to
282 | jantung, buluh centong (alat muzik) | stamper, striker (musical instrument)
283 | gendang | drum, a
284 | bermain | play, to
285 | tendang | kick, to
286 | jatuh (orang) | fall, to (human)
287 | gugur (buah-buahan) | fall down, to (fruit)
288 | jatuh dari tempat tinggi (seperti orang jatuh dari pokok kelapa) | fall down from a height, to
289a | jinjing (seperti beg plastik) | carry in the hand, to take, to (plastic bag)
289b | mendukung (anak) | carry in arms, to (child)
290 | mengambin | carry child in cloth on back, to
291 | junjung | carry on head, to
292 | pikul | carry on shoulder, to
293 | pulang | return, to
294 | datang | come, to
295 | masuk | enter, to
296 | tunggu | wait, to
297 | bekerja | work, to
299 | bayar | pay, to
300 | jual | sell, to
301 | beli | buy, to
302 | beri, bagi | give, to
303 | melempar | throw something sidearm, to
304 | baling (seperti lembing) | throw (overhand), to
305 | baling (seperti membaling bola dalam sepak takraw) | throw (underhand), to
306 | membuang (seperti sampah) | throw away, to
307 | tarik | pull, to
308 | tolak | push, to
309 | hari | day, a
310 | pagi | morning
311 | tengah hari | noon
312 | petang | afternoon (late)
313 | malam | night
314 | esok | tomorrow
315 | semalam | yesterday
316 | tahun | year
317 | pondok sementara dalam hutan, rumah pisang sesikat | lean-to, a
322 | rumah | house
323 | kolong (bahagian ruang di bawah rumah) | space under house
324 | atap | roof
325 | tingkap | window
326 | lantai | floor
327 | selimut | blanket
328 | tikar | mat
329 | cawat | loincloth
330 | kain pelikat, kain sarung lelaki | sarong (man's)
331 | kain sarung perempuan | sarong (woman's)
332 | ikat | tie, to
333 | mengikat (mengikat dengan memusing seperti mengikat sarung lelaki) | tie by twisting, to (tie a sarong)
334 | seluar | trousers
335 | kotor | dirty, to be
336 | cuci (tangan, pinggan) | wash, to (hands, dishes)
337 | basuh (kain, pakaian) | wash, to (clothes)
338 | gosok baju tanpa pakai berus (masa basuh baju) | rub, to
339 | basah | wet, to be
340 | kering | dry, to be
341 | jemur | dry in the sun, to
342 | lap, mengelap (meja) | wipe, to (table)
343 | sapu, menyapu | sweep, to
344 | jahit, menjahit | sew, to
345 | jarum | needle, a
346 | memasak | cook, to
347a | masak, didih (air) | boil, to (water)
347b | rebus (ubi kayu) | boil, to (tapioca)
348 | cerek | kettle
349 | periuk | cooking pot
350 | penuh | full, to be (container)
351a | senduk | ladle, a
351b | gayung | dipper (for bathing)
352a | lesung (dibuat daripada batu untuk menumbuk lada/rempah) | mortar (stone)
352b | lesung (untuk menumbuk padi) | mortar (for rice)
353a | antan (untuk menumbuk lada/rempah) | pestle (spices)
353b | antan (untuk menumbuk padi) | pestle (for pounding rice)
354a | tumbuk (seperti menumbuk lada/rempah) | pound in mortar, to (spices)
354b | tumbuk (seperti menumbuk padi) | pound in mortar, to (rice)
355 | pisau | knife
357 | tajam | sharp, to be
358 | tumpul | blunt, to be
359 | belah, membelah (kayu, buluh) | split, to (bamboo, wood)
360 | manis | sweet, to be
361 | masam | sour, to be
362 | pahit | bitter, to be
363 | hitam | black, to be
364 | putih | white, to be
365 | merah | red, to be
366 | hijau | green, to be
367 | kuning | yellow, to be
368 | terang | bright, to be
369 | gelap | dark, to be
370 | baru | new, to be
371 | lama (benda) | old, to be (thing)
372 | bulat | round, to be
373 | lurus | straight, to be
374 | sempit | narrow, to be
375 | tebal | thick, to be
376 | nipis | thin, to be (thing)
377 | licin | smooth, to be
378 | lebar | wide, to be
379 | panjang | long, to be (thing)
380 | keras | hard, to be
381 | berat | heavy, to be
382 | sama | same
385 | lain | other
386 | apa | what?
387 | siapa | who?
388 | bila | when?
389 | berapa | how many?
391 | satu | one
392 | dua | two
401 | tiga | three
402 | empat | four
403 | lima | five
404 | enam | six
405 | tujuh | seven
406 | lapan | eight
407 | sembilan | nine
408 | sepuluh | ten
393 | banyak (benda) | many (thing), much
394 | ramai, banyak (orang) | many (human)
395 | semua | all
396 | sedikit | some
397 | jauh | far, to be
398 | dekat | near, to be
399 | kanan | right (side)
400 | kiri | left (side)
509 | keluar | to go/come out
510 | pandang dgn tepat | to stare
512 | mengikuti | to follow/pursue
513 | memulas (air dari kain) | to wring
514 | menyembunyikan diri | to hide oneself
515 | bersendawa | to burp
516 | berkokok (ayam jantan) | to crow/sing (rooster)
517 | kesat, kasar (tak licin) | rough (to the touch)
518 | ada | there is
519 | bujang (lelaki) | bachelor
520 | bangun | to get up
521 | bau ikan segar (bukan busuk) | smell of fresh fish
528 | belum masak (buah) | still unripe (fruit)
522 | masak (buah) | ripe
523 | menyengat | to sting
524 | anak yatim, piatu | orphan
525 | membuat | to make
526 | di belakang, selepas, terakhir | in the back, after, the last
527 | senja, senjakala | to be twilight

Certain items on the wordlist caused problems. Table 13 is a compilation of the various issues that arose
regarding the wordlist. These comments include items that were removed from the list, in order to help
future researchers avoid some of the problems that were encountered.
Table 13. Comments on problems

No. | English | Malay | Comment
6 | river | sungai | Usually the word for "water" is given.
8b | swim (animal) | berenang (haiwan) | Usually the same word as for humans swimming.
9a, 9b | float: on the surface; float: rise to the surface | terapung; timbul | These two words were often confusing despite explanations.
11 | bathe | mandi | Care was needed to avoid eliciting the word for "rub".
16 | thunder | guntur, guruh | The usual word given seems actually to be the name of a legendary spirit in the clouds that is responsible for making the noise.
20 | east | timur | Usually BM. Removed from list.
21 | west | barat | Usually BM. Removed from list.
25 | earth, soil | tanah | Care was needed to stay with the BM "tanah", and avoid the various definitions of the English "earth".
29 | gold | emas | Usually BM. Removed from list.
30 | silver | perak | Usually BM. Removed from list.
31 | iron | besi | Usually BM. Removed from list.
33 | path (small) | jalan (kecil) | The addition of the word "small" was needed to avoid confusion with the alternate definition for the BM "jalan", that is, "to go".
36 | ashes | abu | There did not seem to be a word for this. Rather, the word for "dust" was usually given.
39 | burn off feathers | layur, bakar bulu | Often the word "layur" was not known. Needed to be careful with the alternate phrase "bakar bulu" in order to be sure it was not heard as "bakar buluh".
49a | tree trunk | batang pokok | Often seemed to be confusing.
59 | pepper (red) | lada | Usually BM. Removed from list.
61 | bamboo | buluh | It appears some of the variation in the response was due to whether the speaker was referring to young or old bamboo.
71 | blow-gun | sumpitan | Care was needed to get the object and avoid getting the verb "sumpit" (to blow).
73 | dart-head | pangkal damak | Care was needed because of the confusion between the "head" and the "butt", as to which is the sharp end and which the dull.
74a, 74b | dart (with poison); dart (without poison) | damak (yang ada racun); damak (yang tiada racun) | Some dialects did not seem to have a distinction, while others had as many as three names (e.g. with no poison, with a little poison, and with a lot of poison).
78a, 78b | kill; cause to die | bunuh; matikan | Attempts to distinguish between these two words caused confusion.
79 | stab | tikam | There may be a difference depending on the direction and force of the thrust, as well as the type of instrument.
83 | porcupine | landak | There appear to be two species: large and small.
84 | monkey (short-tailed) | monyet (ekor pendek) | There are many names for the different types of monkeys, but apparently not for monkeys in general. Asking for the name of the short-tailed monkey helped some, but "monyet" in BM implies a long-tailed monkey. Also, in some areas there exists more than one type of short-tailed monkey.
90 | to bark | menyalak | There may be more than one type of barking, perhaps akin to "yelp" and "howl".
92 | rat | tikus | The response seemed to be either a particular species name, or else the general term for "rodent".
101 | bee | lebah | It appears some dialects distinguish different species.
103 | insect | serangga | Apparently there is no general term. Removed from list.
104 | mosquito | nyamuk | Many dialects distinguish different species.
108 | spider | labah-labah | Some dialects appeared to have a general term, while others distinguished several species.
109 | cockroach | lipas | Most dialects appeared to distinguish at least large and small varieties.
111 | monitor lizard | biawak | At least one dialect distinguished between the type that lives on land versus the type that lives in the water. Hence the variation in response may reflect different species.
114a, 114b | turtle (sea); turtle (river) | penyu; labi-labi | Since the Semai do not live near the sea, the distinction between sea and river turtles was often confusing. There may be confusion with land tortoises as well.
117b | person, non-OA | orang, bukan-OA | Apparently there is no general term; rather, the response was often "others", "different", or a specific people group such as "Malay" or "Chinese".
119 | female | perempuan | Apparently there is no distinction between "woman" and "female".
123 | father | bapa | Diffloth's data implies there are different terms, perhaps one for the relationship and another for the term of address.
124a, 124b | elder brother; elder sister | abang; kakak | There is usually just one term for older sibling, not distinguishing gender.
125a, 125b | younger brother; younger sister | adik lelaki; adik perempuan | There is usually just one term for younger sibling, not distinguishing gender.
130 | friend | kawan | This word was narrowed to just "friend", as opposed to also including "companion".
135a, 135b | bald (natural); bald (shaved) | botak (telah gugur); botak (telah dicukur) | Most dialects did make a distinction between these two.
136a, 136b | cut hair with scissors; cut hair with something else | menggunting rambut; potong rambut (pakai alat lain) | Attempts to make a distinction between these two usually caused confusion.
141 | forehead | dahi | The response sometimes appeared to refer to the fontanel (the soft spot on a baby's head).
145, 152a, 152b | throat; neck (back of); neck (front of) | kerongkong; tengkok; leher | The Semai do not appear to make all three of these distinctions.
146 | mouth | mulut | There did not seem to be a general term, hence this caused confusion. Removed from the list.
147b, 147c | lip, upper; lip, lower | bibir (di atas); bibir (di bawah) | There did not appear to be different words for upper and lower lips.
160 | finger | jari | Initially, attempts were made to distinguish between "finger" and "digit"; however, this usually caused confusion. Nevertheless, some of the responses implied another (or similar?) distinction that was not well understood.
161b, 161c | abdomen, upper; abdomen, lower | perut, di atas pusat; perut, di bawah pusat | Usually there was just one general term. Attempts to elicit a distinction sometimes resulted in the words for "chest" and "waist", respectively.
165, 166, 168 | lungs; heart; liver | paru-paru; jantung; hati | There was sometimes confusion regarding these internal organs. Certainly some of the confusion was due to the center of emotions, and consequently quite a number of phrases expressing emotion, being the heart in English, but the liver in BM.
167 | intestines | tali perut, usus | One dialect made a distinction between the large and small intestines. Hence some of the variation in responses reflects this distinction.
181 | strong | kuat | The term "kuat" seemed to be too generic, since it can have rather different meanings depending on what it refers to (e.g. people, trees, rain, etc.). Hence the term was elicited in reference to a strong person.
183b | sleep soundly | tidur nyenyak | There did not seem to be a special word for this. Responses tended to describe the state.
184 | snore | berdengkur | The response was usually the word for "breathe", or related to this word.
185 | yawn | menguap | Needed to specify this is related to sleepiness, so that there is no confusion with being agape for other reasons, such as eating something hot or spicy.
189 | wink | kelip mata | Needed to specify that this was one eye only, not blinking both eyes.
192 | sniff | cium, menghidu | The example of sniffing a flower was specified so that there is no confusion with the other meaning of "cium", i.e. "to kiss".
193a | decayed | reput | Needed to specify this referred to wood, for example, and not food.
193b | rotten | basi | Needed to specify this referred to cooked food that has spoiled.
195 | satiated, full | kenyang | Needed to guard against confusion with the other meaning of the English word "full", e.g. a full container.
198a | to suck out | sedut | The term "sedut" was too general. Needed to specify the context of sucking marrow from bone, or sucking out snails.
198b | to suck on | hisap | The term "hisap" was too general. Needed to specify sucking on candy.
205 | hurt | sakit | The term "sakit" can also mean sickness; hence needed to specify the example of one's hand that is hurting.
207 | itchy | gatal | Needed to specify the condition of being itchy (e.g. after being bitten by a mosquito), rather than the property of causing itchiness (e.g. "miang", the fine hair on bamboo that feels itchy).
209a | rake | cakar | This term needed the example of raking leaves or rubbish.
209b | scratch (chicken) | kais | This term needed the example of a chicken scratching for food.
210 | cold | sejuk | Care was needed to separate this term from the condition of feeling cold from a fever.
234 | old (animate) | tua | Sometimes the response was "old man" and/or "old woman", rather than the generic word for simply "old".
235 | fat | gemuk | Care was needed to avoid confusion with the English noun, as in animal fat or lard.
237a, 237b | tall (person); tall (tree) | tinggi (orang); tinggi (pokok) | Usually the response was the same for both people and trees.
238a, 238b | short (person); short (tree) | pendek (orang); pendek (pokok) | Usually the response was the same for both people and trees.
243 | to blow | hembus | Too generic. Removed from list.
244 | to blow on | tiup | To avoid possible confusion because of the general term, the example of "to blow on a fire" was used.
258, 259 | to smile; to laugh | senyum; ketawa | Some dialects made this distinction, but others had the same word for both.
260 | good | baik | The example of "good person" was used.
261 | bad | jahat | The example of "bad person" was used.
267 | to fight | lawan, gaduh | This seemed to cause confusion, or there wasn't a ready term. Usually the response was the same as "hit".
274 | to call | memanggil | The term "memanggil" was too general. The example of "to call to get someone's attention" was used.
276 | story | cerita | This seemed to cause confusion, or there wasn't a ready term. Sometimes the response was the word "legend".
278 | to whistle | bersiul | Usually the term "bersiul" was not known. Needed to demonstrate by whistling.
286, 288 | to fall (animate); to fall from a height | jatuh (orang); jatuh dari tempat tinggi | Did not seem to be a distinction.
289a | to carry a bag | jinjing | Needed to specify the example of carrying a plastic bag dangling from one's hand.
289b, 290 | to carry in arms (child); to carry in cloth | mendukung (anak); ambin | There was sometimes unresolved confusion between these two terms despite trying to specify examples.
292 | to carry on shoulder | pikul | Care was needed to avoid confusion with carrying on a pole between two people.
303, 304, 305, 306 | to throw sidearm; to throw overhand; to toss; to throw away | lontar; baling, lempar; buang; buang (sampah) | The Semai appear to make different distinctions; that is, what they throw and the terms for how they throw it did not seem to be organized the same as BM. Responses were almost always confused.
315 | yesterday | kelmarin, semalam | The term "semalam" may have been interpreted as meaning "last night" only, and not the more generic "yesterday".
317 | lean-to | pondok sementara | Needed to specify a temporary shelter when one was forced to overnight in the jungle.
333 | to tie by twisting | simpul | Needed to specify the example of the twisting/tying of a man's sarong so that it does not fall down.
336 | to wash (hands) | cuci | Needed to give the example of washing hands, as opposed to clothes.
337 | to wash (clothes) | basuh | Needed to specify clothes.
338 | to rub (clothes while washing) | gosok | The term "gosok" was too general. The example of "to rub while washing clothes" was used.
340, 341 | dry; to dry | kering; jemur | Needed to avoid confusion in the English between the state of being dry, and to dry something.
346 | to cook | memasak | Needed to avoid confusion in BM with the adjective "masak", meaning "ripe".
347a, 347b | to boil (water); to boil (tapioca) | didih; rebus | Needed to distinguish between boiling water and cooking something by boiling it in water.
350 | full | penuh | Needed to guard against confusion with the other meaning of the English word "full", e.g. to be full (satiated).
355 | knife | pisau | Many Semai make a distinction between a knife (small) and a parang (large knife, or machete). This distinction was not known for the early lists.
356 | knife, hooked | pisau | This type of knife was unknown to the Semai. Removed from list.
361 | sour | masam | For many Semai, their word also could mean "salty".
384 | also | juga | Usually BM. Removed from list.
390 | count | bilang | Usually BM. Removed from list.
393 | much | banyak | This distinction, as opposed to "many", was merged into one item.
395 | everyone | semua | This distinction, as opposed to "all", was merged into one item.
511 | to grope | menyentuh | Removed from list because of somewhat embarrassing (e.g., sexual) connotations.
522 | ripe | masak | Needed to give the example of fruit, to avoid confusion with the other BM "masak", meaning "to cook".
529 | to spread | hampar | Too generic. Removed from list.

Appendix C: Language assistant questionnaires


Questionnaires were used in interviews with many of the language assistants who gave the wordlists.
The questions were developed in order to establish, among other things, the reliability of the data. For
example, it is reasonable to expect that a language assistant who has always lived in a certain language
community and has married someone also from that community will be able to provide more reliable
data than another individual who has recently moved there or has married someone from another dialect
area.
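The expectation described above can be pictured as a simple screening pass over the questionnaire answers. The sketch below is an illustration only, not the procedure used in the study: the `AssistantProfile` record and `reliability_flags` function are hypothetical names, village names are compared as plain strings, and a different village does not necessarily mean a different dialect area.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AssistantProfile:
    """Residence answers from one questionnaire (hypothetical structure)."""
    code: str
    born: str
    grew_up: str
    lives: str
    spouse_born: Optional[str]  # None when unmarried or unanswered


def reliability_flags(profile: AssistantProfile) -> List[str]:
    """Return notes on factors that might reduce the reliability of the data."""
    flags = []
    if not (profile.born == profile.grew_up == profile.lives):
        flags.append("has not always lived in the same village")
    if profile.spouse_born is not None and profile.spouse_born != profile.lives:
        flags.append("spouse born in a different village")
    return flags


# Values copied from the SEA-W and SEA-V questionnaires in this appendix.
sea_w = AssistantProfile("SEA-W", "Sg. Bil", "Sg. Bil", "Sg. Bil", "Kg. Rasau")
sea_v = AssistantProfile(
    "SEA-V",
    "Kg. Chinggung, Behrang",
    "Kg. Chinggung, Behrang",
    "Kg. Chinggung, Behrang",
    None,
)

for profile in (sea_w, sea_v):
    print(profile.code, reliability_flags(profile))
```

A flag here is only a prompt to look more closely at a wordlist; it is not a judgement that the data are unreliable.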
A few questions also probed language attitudes and language use. Again, this helps in establishing
reliability of the data. A good language assistant is one who loves his or her language and uses it as often
as possible.
Lastly, a number of questions asked the language assistant about how well he or she was able to
understand and/or communicate in other dialects. The answers to these questions may provide
preliminary direction for future research into how well the Semai speakers are able to communicate with
those from other dialects despite the many phonological and lexical differences.
The following pages provide the data that was collected. [Please note the following abbreviations:
BM = Bahasa Malaysia, BS = Bahasa Semai.]

Language Code
SEA-F
Gender
M
Age
30+
Born where?
Kg. K. Kenip, Lipis, Pahang
Grew up where?
Kg. K. Kenip, Lipis, Pahang
Language used at school
BM
Now lives where?
Kg. K. Kenip, Lipis, Pahang
Has ego lived elsewhere?
(no answer)
Where else?
Father born where?
Kg. Belida, Lipis, Pahang
Father grew up where?
Kg. Belida, Lipis, Pahang
Mother born where?
Kg. K. Kenip, Lipis, Pahang
Mother grew up where?
Kg. K. Kenip, Lipis
Spouse born where?
Kg. K. Kenip Ulu, Lipis
Spouse grew up where?
Kg. K. Kenip Ulu, Lipis
Ego used what lang. as a child?
BS only
What language does ego use with
Spouse
BS only
Parents
BS only
First Child
BS only
Second Child
BS only
Third Child
BS only
Fourth Child
BS only
Fifth Child
BS only
Sixth Child
BS only
What language does ego's spouse use with
Ego
BS only
First Child
BS only
Second Child
BS only
Third Child
BS only
Fourth Child
BS only
Fifth Child
BS only
Sixth Child
BS only
What language do the children use with
their grandparents
BS only
their friends
BS only
What language does ego use to
buy food
BM only
sell to others from the [local] lang.
BS only
sell to outsiders
BM only
speak to teachers
BM only
speak at a government office
BM only
Does ego mix language with friends?
No
If so, what languages?
Has ego written in the [local] language?
No
If so, what?
In what language does ego
think
BS
talk about health
BS
talk about finances
BS
pray
BS
talk about spiritual things
BS
dream
BS
count
BM

speak when startled


speak to animals
BS
Does ego want his/her grandchildren to speak
the [local] language?
Yes
Does ego think it is good if his/her
grandchildren can read the [local] language?
Yes
What advantages does ego see in being able to
speak the [local] language?
identity, better relationship within community
What advantages does ego see in being able to
read and write in the [local] language?
identity, language preservation
Which dialect is the hardest for this ego to
understand?
Where is that dialect spoken?
What language does ego use with adults from
The same kampung
(no answer)
Jelengkok, Betau
BS only
Simoi Baru, Lipis
BS only
Pos Buntu
BS only
Bertang
BS only
(no answer)
Batu 17, Tapah
BS only
Bidor
BS only
Kg Chinggung, T Malim
BM only
Gopeng, Kinta
BS only
Kampar
BS only
Has ego met people from? | Did they use their own language? | How well could ego understand?
Temiar
Temuan
Ja Hut
Jakun
No
0
No
0
No
0
No
0
No
0
Does ego listen to radio broadcasts in Semai,
Temiar, Jakun, and Semelai?
No
Which?
Radio 7
Can ego understand?
Yes
Is ego interested in learning to speak Temiar,
Jakun, Semelai?
Yes
If so, which ones?
Temiar
Other notes:

Language Code
SEA-W
Gender
female
Age
55
Born where?
Sg. Bil
Grew up where?
Sg. Bil
Language used at school
don't know
Now lives where?
Sg. Bil
Has ego lived elsewhere?
No
Where else?
Father born where?
Kuala Slim
Father grew up where?
Kuala Slim
Mother born where?
Sg. Bil
Mother grew up where?
Sg. Bil
Spouse born where?
Kg. Rasau
Spouse grew up where?
Kg. Rasau
Ego used what lang as a child?
BS only
What language does ego use with
Spouse
BS only
Parents
BS only
First Child
BS only
Second Child
(no answer)
Third Child
(no answer)
Fourth Child
(no answer)
Fifth Child
(no answer)
Sixth Child
(no answer)
What language does ego's spouse use with
Ego
BS only
First Child
BS only
Second Child
BS only
Third Child
(no answer)
Fourth Child
(no answer)
Fifth Child
(no answer)
Sixth Child
(no answer)
What language do the children use with
their grandparents
BS only
their friends
BS only
What language does ego use to
buy food
BM only
sell to others from the [local] lang.
BS only
sell to outsiders
BM only
speak to teachers
BM only
speak at a government office
BM only
Does ego mix language with friends?
No
If so, what languages?
Has ego written in the [local] language?
No
If so, what?
In what language does ego
think
BS
talk about health
BS
talk about finances
BS
pray
BS
talk about spiritual things
BS
dream
BS
count
BS

speak when startled


BS
speak to animals
BS
Does ego want his/her grandchildren to speak
the [local] language?
Yes
Does ego think it is good if his/her
grandchildren can read the [local] language?
Yes
What advantages does ego see in being able to
speak the [local] language?
people understand better
What advantages does ego see in being able to
read and write in the [local] language?
easy to understand
Which dialect is the hardest for this ego to
understand?
Jelai (Pahang), Semai in Selangor, Terisu (Perak)
Where is that dialect spoken?
Jelai (Pahang), Semai in Selangor, Terisu (Perak)
What language does ego use with adults from
The same kampung
BS only
Kluny, Slim River
BS only
Tapah
BS only
Pos Slim, Parit
BS only
Bota, Parit
BS only
Batu 7, Kampar
BS only
Betau, Pahang
BS only
Telom, Pahang
BM only
Jelai, Pahang
BM only
Batu 17, CH
BS only
Simoi, Raub
BM only
Has ego met people from? | Did they use their own language? | How well could ego understand?
Temiar
No
0
No
No
No
No
Does ego listen to radio broadcasts in Semai,
Temiar, Jakun, and Semelai?
Yes
Which?
Semai, Temiar
Can ego understand?
No
Is ego interested in learning to speak Temiar,
Jakun, Semelai?
No
If so, which ones?
Other notes:

Language Code
SEA-V
Gender
female
Age
24
Born where?
Kg. Chinggung, Behrang, T. Malim, Perak
Grew up where?
Kg. Chinggung, Behrang
Language used at school
BM
Now lives where?
Kg. Chinggung, Behrang
Has ego lived elsewhere?
(no answer)
Where else?
Father born where?
Father grew up where?
Langkap
Mother born where?
Kg. Chinggung, Behrang
Mother grew up where?
Kg. Chinggung
Spouse born where?
na
Spouse grew up where?
na
Ego used what lang as a child?
BS only
What language does ego use with
Spouse
(no answer)
Parents
BS only
First Child
(no answer)
Second Child
(no answer)
Third Child
(no answer)
Fourth Child
(no answer)
Fifth Child
(no answer)
Sixth Child
(no answer)
What language does ego's spouse use with
Ego
(no answer)
First Child
(no answer)
Second Child
(no answer)
Third Child
(no answer)
Fourth Child
(no answer)
Fifth Child
(no answer)
Sixth Child
(no answer)
What language do the children use with
their grandparents
(no answer)
their friends
(no answer)
What language does ego use to
buy food
BM only
sell to others from the [local] lang. BS only
sell to outsiders
BM only
speak to teachers
BM only
speak at a government office
BM only
Does ego mix language with friends?
No
If so, what languages?
Has ego written in the [local] language?
Yes
If so, what? letter
In what language does ego
think
BS
talk about health
BS (kampung), BM (with
doktor)
talk about finances BS(kampung), BM(outside)
pray
BS
talk about spiritual things BS (kampung), BM

(outside)
dream
BS
count
BS
speak when startled
BS
speak to animals
BS
Does ego want his/her grandchildren to speak
the [local] language?
Yes
Does ego think it is good if his/her
grandchildren can read the [local] lang.? Yes
What advantages does ego see in being able to
speak the [local] language?
easier to express self in own language
What advantages does ego see in being able to
read and write in the [local] language?
good if you have 2 languages, BM & BS
Which dialect is the hardest for this ego to
understand?
Where is that dialect spoken?
What language does ego use with adults from
The same kampung
(no answer)
Pos Slim, Parit
BM only
Kampar Gopeng
BM only
Batu 7, Kampar
BM only
Kluny, Slim River
BS only
Tapah
BS only
Raub, Pahang
BM only
Jelintoh, Pahang
BM only
Cherong, Pahang
BM only
Jelai
BM only
Telom
BM only
Has ego met people from? / Did they use their own language? / How well could ego understand?
Temiar
No
0
Jakun
No
little bit,
related to
Temuan
No
No
No
Does ego listen to radio broadcasts in Semai,
Temiar, Jakun, and Semelai?
Yes
Which?
Temuan, Semai; older helper: Jakun,
Temuan, Semai
Can ego understand?
Yes
Is ego interested in learning to speak Temiar,
Jakun, Semelai?
Yes
If so, which ones?
yang dari jauh [those from far away]
Other notes:

Language Code
SEA-U
Gender
male
Age
33
Born where?
Kg. Cherong, Sinderut, Pahang
Grew up where? Kg. Cherong, Sinderut, Pahang
Language used at school
BM
Now lives where?
Kg. Cherong, Sinderut
Has ego lived elsewhere?
No
Where else?
Father born where? Kg.Ganchar (near Cherong)
Father grew up where?
Kg. Ganchar
Mother born where?
Kg. Jakek, Sg. Cherong
Mother grew up where? Kg. Jakek, Sg. Cherong
Spouse born where?
Kg. Cherong, Sinderut
Spouse grew up where?
Kg. Cherong, Sinderut
Ego used what lang as a child?
BS only
What language does ego use with
Spouse
BS only
Parents
BS only
First Child
BS only
Second Child
(no answer)
Third Child
(no answer)
Fourth Child
(no answer)
Fifth Child
(no answer)
Sixth Child
(no answer)
What language does ego's spouse use with
Ego
BS only
First Child
BS only
Second Child
(no answer)
Third Child
(no answer)
Fourth Child
(no answer)
Fifth Child
(no answer)
Sixth Child
(no answer)
What language do the children use with
their grandparents
BS only
their friends
BS only
What language does ego use to
buy food
BM only
sell to others from the [local] lang. BS only
sell to outsiders
BM only
speak to teachers
BM only
speak at a government office
BM only
Does ego mix language with friends?
Yes
If so, what languages?
BS & BM
Has ego written in the [local] language?
No
If so, what?
In what language does ego
think
BS
talk about health
BS
talk about finances
BS
pray
BS
talk about spiritual things
BS
dream
BS
count
BS

speak when startled


BS
speak to animals
BS
Does ego want his/her grandchildren to speak
the [local] language?
Yes
Does ego think it is good if his/her
grandchildren can read the [local] language?
Yes
What advantages does ego see in being able to
speak the [local] language?
to not forget their own language, & knowledge
(med)
What advantages does ego see in being able to
read and write in the [local] language?
memberi erti terutamanya kpd yg tak tahu BM [gives meaning, especially to those who do not know BM]
Which dialect is the hardest for this ego to
understand?
Where is that dialect spoken?
Parit and Ipoh
What language does ego use with adults from
The same kampung
BM only
Kg. Tenau
BS only
Kg. Buntu, Kelang
BS only
Kg. Buntu, Kerayong,
(no answer)
Jelengek
Kg. Santat, Sempoh,
BS only
Labu
(no answer)
Parit
(no answer)
Gopeng, Kampar
BM only
Slim River
mostly BS
(no answer)
(no answer)
Has ego met people from?   Did they use their own language?   How well could ego understand?
Temiar                     No                                 0
Lanoh                      No                                 0
JaHut                      No                                 0
                           No
                           No
Does ego listen to radio broadcasts in Semai,
Temiar, Jakun, and Semelai?
No
Which?
Semai (faham), Temiar,
Jakun Semelai, Temuan
Can ego understand?
Yes
Is ego interested in learning to speak Temiar,
Jakun, Semelai?
Yes
If so, which ones?
Temuan
Other notes:

Language Code
SEA-S
Gender
female
Age
24
Born where?
Gombak
Grew up where? Gombak (till age 6) Jelangkok
(RPS) Betau
Language used at school
BM
Now lives where?
Kg. RPS, Betau
Has ego lived elsewhere?
Yes
Where else?
Kg. Simoi Lama, Jelai, Kg Sinderut
Father born where?
Bkt. Long, Kg. Sinderut
Father grew up where? Kg. RPS Betau, K. Lipis
Mother born where?
Baru Gandang, Jelai,
K. Lipis
Mother grew up where?
Kg. Simoi Lama, Jelai
Spouse born where?
Fiance Kg. Simoi Lama,
Jelai
Spouse grew up where?
Kg. Simoi Lama, Jelai
Ego used what lang as a child?
mostly BS
What language does ego use with
Spouse
mostly BS
Parents
mostly BS
First Child
(no answer)
Second Child
(no answer)
Third Child
(no answer)
Fourth Child
(no answer)
Fifth Child
(no answer)
Sixth Child
(no answer)
What language does ego's spouse use with
Ego
mostly BS
First Child
(no answer)
Second Child
(no answer)
Third Child
(no answer)
Fourth Child
(no answer)
Fifth Child
(no answer)
Sixth Child
(no answer)
What language do the children use with
their grandparents
BS only
their friends
mostly BS
What language does ego use to
buy food
BM only
sell to others from the [local] lang. BS only
sell to outsiders
BM only
speak to teachers
BM only
speak at a government office
BM only
Does ego mix language with friends?
Yes
If so, what languages?
BS & BM
Has ego written in the [local] language?
Yes
If so, what? letter, list
In what language does ego
think
BS
talk about health
BS
talk about finances
BS
pray
BS

talk about spiritual things


BS
dream
BS & BM
count
BS & BM
speak when startled
BS
speak to animals
BS
Does ego want his/her grandchildren to speak
the [local] language?
Yes
Does ego think it is good if his/her
grandchildren can read the [local] language?
Yes
What advantages does ego see in being able to
speak the [local] language?
Language preservation, identity
What advantages does ego see in being able to
read and write in the [local] language?
Language preservation
Which dialect is the hardest for this ego to
understand?
Where is that dialect spoken?
What language does ego use with adults from
The same kampung
mostly BS
Tenau
BS only
Batu 17, CH
BS only
CH
BS only
Raub
BS only
Telom
BS only
Gopeng, Perak
(no answer)
T. Malim
BM only
Slim River
BS only
Chinggung, Kluny
BS only
BM only
Has ego met people from?   Did they use their own language?   How well could ego understand?
                           No
                           No
                           No
                           No
                           No
Does ego listen to radio broadcasts in Semai,
Temiar, Jakun, and Semelai?
No
Which?
Can ego understand?
No
Is ego interested in learning to speak Temiar,
Jakun, Semelai?
No
If so, which ones?
Other notes:

Language Code
SEA-N
Gender
male
Age
49
Born where?
Kg. Chang Lama, Bidor, Perak
Grew up where?
Kg. Chang Lama, Bidor, Perak
Language used at school
BS & BM
Now lives where?
Kg. Chang Lama, Bidor
Has ego lived elsewhere?
No
Where else?
Father born where?
Kg. Chang Lama, Bidor
Father grew up where? Kg. Chang Lama, Bidor,
Perak
Mother born where?
no answer
Mother grew up where?
Spouse born where?
Kg. Chang Lama, Bidor
Spouse grew up where? Kg. Chang Lama, Bidor
Ego used what lang as a child? BS/BM equally
What language does ego use with
Spouse
BS only
Parents
BS only
First Child
BS only
Second Child
(no answer)
Third Child
(no answer)
Fourth Child
(no answer)
Fifth Child
(no answer)
Sixth Child
(no answer)
What language does ego's spouse use with
Ego
BS only
First Child
BS only
Second Child
(no answer)
Third Child
(no answer)
Fourth Child
(no answer)
Fifth Child
(no answer)
Sixth Child
(no answer)
What language do the children use with
their grandparents
BS only
their friends
BS/BM equally
What language does ego use to
buy food
BM only
sell to others from the [local] lang. BS only
sell to outsiders
BM only
speak to teachers
BM only
speak at a government office
BM only
Does ego mix language with friends?
No
If so, what languages?
Has ego written in the [local] language?
No
If so, what?
In what language does ego
think
BS
talk about health
BS
talk about finances
BS
pray
BS
talk about spiritual things
BS
dream
BS

count
BS
speak when startled
BS
speak to animals
BS
Does ego want his/her grandchildren to speak
the [local] language?
Yes
Does ego think it is good if his/her
grandchildren can read the [local] language?
Yes
What advantages does ego see in being able to
speak the [local] language?
easier to understand
What advantages does ego see in being able to
read and write in the [local] language?
better
Which dialect is the hardest for this ego to
understand?
Jelai, Pahang
Where is that dialect spoken?
What language does ego use with adults from
The same kampung
(no answer)
Tumbuh Hangat
BS only
Gopeng
BS only
Kampar
BS only
Sungkai
BS only
Kluny
BS only
Raub
BS only
Lipis
BS only
Betau
BS only
Jelai
BS only
Telom
BS only
Has ego met people from?   Did they use their own language?   How well could ego understand?
Temiar                     Yes                                little bit
Lanoh                      No                                 0
Sabm                       No                                 0
Semnam                     No                                 0
Ja Hut                     No                                 0
Does ego listen to radio broadcasts in Semai,
Temiar, Jakun, and Semelai?
Yes
Which?
some of it
Can ego understand?
No
Is ego interested in learning to speak Temiar,
Jakun, Semelai?
No
If so, which ones?
Other notes:

Language Code
SEA-T
Gender
male
Age
25
Born where?
Kg. Simoi Lama
Grew up where?
Kg Simoi Baru
Language used at school
BM
Now lives where?
Kg. Simoi Baru
Has ego lived elsewhere?
Yes
Where else?
Kg. Jelengkek (for 2 months)
Father born where?
Kg. Simoi Lama
Father grew up where?
Kg Simoi Baru
Mother born where?
Kg. Simoi Lama
Mother grew up where?
Kg. Simoi Lama
Spouse born where?
Spouse grew up where?
Ego used what lang as a child?
BS only
What language does ego use with
Spouse
(no answer)
Parents
(no answer)
First Child
(no answer)
Second Child
(no answer)
Third Child
(no answer)
Fourth Child
(no answer)
Fifth Child
(no answer)
Sixth Child
(no answer)
What language does ego's spouse use with
Ego
(no answer)
First Child
(no answer)
Second Child
(no answer)
Third Child
(no answer)
Fourth Child
(no answer)
Fifth Child
(no answer)
Sixth Child
(no answer)
What language do the children use with
their grandparents
(no answer)
their friends
(no answer)
What language does ego use to
buy food
(no answer)
sell to others from the [local] lang. (no ans.)
sell to outsiders
(no answer)
speak to teachers
(no answer)
speak at a government office
(no answer)
Does ego mix language with friends? (no answer)
If so, what languages?
Has ego written in the [local] language?
(no ans.)
If so, what?
In what language does ego
think
talk about health
talk about finances
pray
talk about spiritual things
dream

count
speak when startled
speak to animals
Does ego want his/her grandchildren to speak
the [local] language?
No
Does ego think it is good if his/her
grandchildren can read the [local] language?
No
What advantages does ego see in being able to
speak the [local] language?
What advantages does ego see in being able to
read and write in the [local] language?
Which dialect is the hardest for this ego to
understand?
Slim River & Batu 18, Perak
Where is that dialect spoken?
Slim River &
Batu 18, Perak
What language does ego use with adults from
The same kampung
BM only
Raub, Pahang
BS only
Telom (Sg. Serau)
BS only
Kg. Sinderut
BS only
Kg Buntu
BS only
Jelai
BS only
Kg. Ulu Groh, Perak
BS only
Kluny, Slim River
BS only
Senderiang
BS only
BS only
BS only
Has ego met people from?   Did they use their own language?   How well could ego understand?
Temiar                     No                                 little
Lanoh                      No                                 0
Ja Hut                     No                                 0
                           No
                           No
Does ego listen to radio broadcasts in Semai,
Temiar, Jakun, and Semelai?
Yes
Which?
Semai, Temiar, Jakun, Semelai
Can ego understand?
Yes
Is ego interested in learning to speak Temiar,
Jakun, Semelai?
Yes
If so, which ones?
all
Other notes: In question 47b (speaker only
understands Semai)

Language Code
SEA-Q
Gender
male
Age
42
Born where? Kg. Sungai Teras, Kuala Slim, Perak
Grew up where?
Kg. Sungai Teras &
Trolak Timur
Language used at school
BM
Now lives where?
Kg. Sg. Tisung, Sungkai,
Perak
Has ego lived elsewhere?
Yes
Where else?
Trolak Timur (10 yrs)
Father born where?
Kg. Rasau, Slim
Father grew up where?
Kg. Rasau, Slim
Mother born where?
Trolak Timur, Slim River
Mother grew up where?
Trolak Timur,
Slim River
Spouse born where?
Kg. Sg. Tisung, Sungkai,
Spouse grew up where? Kg. Sg. Tisung, Sungkai,
Ego used what lang as a child?
BS only
What language does ego use with
Spouse
BS only
Parents
BS only
First Child
BS only
Second Child
BS only
Third Child
BS only
Fourth Child
BS only
Fifth Child
BS only
Sixth Child
(no answer)
What language does ego's spouse use with
Ego
BS only
First Child
BS only
Second Child
BS only
Third Child
BS only
Fourth Child
BS only
Fifth Child
BS only
Sixth Child
(no answer)
What language do the children use with
their grandparents
BS only
their friends
BS only
What language does ego use to
buy food
BM only
sell to others from the [local] lang. BS only
sell to outsiders
BM only
speak to teachers
BM only
speak at a government office
BM only
Does ego mix language with friends? (no answer)
If so, what languages?
Has ego written in the [local] language?
Yes
If so, what? sermon
In what language does ego
think
BS & BM
talk about health
BS
talk about finances
BS
pray
BS

talk about spiritual things


BS
dream
BS
count
BS
speak when startled
BS
speak to animals
BS
Does ego want his/her grandchildren to speak
the [local] language?
Yes
Does ego think it is good if his/her
grandchildren can read the [local] language?
Yes
What advantages does ego see in being able to
speak the [local] language?
Identity, secret language
What advantages does ego see in being able to
read and write in the [local] language?
preservation of MT
Which dialect is the hardest for this ego to
understand?
na
Where is that dialect spoken?
What language does ego use with adults from
The same kampung
BS only
Chang, Bidor
BS only
Gopeng
BS only
Kampar
BS only
Sungkai
BS only
Kluny
BS only
Raub
BS only
Lipis
BS only
Betau
BS only
Jelai
BS only
Telom
BS only
Has ego met people from?   Did they use their own language?   How well could ego understand?
Temiar                     No
Lanoh                      No
Sabm                       No
Semnam                     No
JaHut                      No
Does ego listen to radio broadcasts in Semai,
Temiar, Jakun, and Semelai?
No
Which?
Can ego understand?
No
Is ego interested in learning to speak Temiar,
Jakun, Semelai?
No
If so, which ones?
Other notes:

Language Code
SEA-R
Gender
male
Age
23
Born where?
Kg Pos Buntu
Grew up where?
Kg Pos Buntu
Language used at school
BM
Now lives where?
Kg Pos Buntu
Has ego lived elsewhere?
Yes
Where else?
Bota (6 mos)
Father born where?
Slim River
Father grew up where?
Kg Pos Buntu
Mother born where?
Kg Pos Buntu
Mother grew up where?
Kg Pos Buntu
Spouse born where?
Kg Pos Buntu
Spouse grew up where?
Kg Pos Buntu
Ego used what lang as a child?
BS only
What language does ego use with
Spouse
BS only
Parents
BS only
First Child
BS only
Second Child
(no answer)
Third Child
(no answer)
Fourth Child
(no answer)
Fifth Child
(no answer)
Sixth Child
(no answer)
What language does ego's spouse use with
Ego
BS only
First Child
BS only
Second Child
(no answer)
Third Child
(no answer)
Fourth Child
(no answer)
Fifth Child
(no answer)
Sixth Child
(no answer)
What language do the children use with
their grandparents
BS only
their friends
BS only
What language does ego use to
buy food
BM only
sell to others from the [local] lang. BS only
sell to outsiders
BM only
speak to teachers
BS/BM equally
speak at a government office
BM only
Does ego mix language with friends?
Yes
If so, what languages?
BM & BS
Has ego written in the [local] language?
Yes
If so, what? stories, songs
In what language does ego
think
BS
talk about health
BS
talk about finances
BS
pray
BS
talk about spiritual things
BS
dream
BS
count
BM

speak when startled


BS
speak to animals
BS
Does ego want his/her grandchildren to speak
the [local] language?
Yes
Does ego think it is good if his/her
grandchildren can read the [local] language?
Yes
What advantages does ego see in being able to
speak the [local] language?
on tape
What advantages does ego see in being able to
read and write in the [local] language?
on tape
Which dialect is the hardest for this ego to
understand?
Gopeng
Where is that dialect spoken?
What language does ego use with adults from
The same kampung
BS only
K. Lipis
BS only
CH
BS only
Jelai
BS only
Kuala Medang
BS only
(no answer)
Gopeng
BS only
Bota
BS only
T. Malim
BS only
Bidor
BS only
Tapah
BS only
Has ego
met people
from?
Temiar

Did they use


How well
their own
could ego
language?
understand?
No
0
No
No
No
No
Does ego listen to radio broadcasts in Semai,
Temiar, Jakun, and Semelai?
Yes
Which?
Semai, Temiar, Semelai
Can ego understand?
Yes
Is ego interested in learning to speak Temiar,
Jakun, Semelai?
Yes
If so, which ones?
all
Other notes: Notes on 47: Semai-Gopeng: sekit [a little],
Temiar (mixed with BM): sekit, Semelai: sekit (dekat
dgn BM [close to BM])

Appendix D: Isoglosses as determined by phonological changes


The following is a list of the phonological changes found in the Semai dialects studied in this survey.
a) Final preploded nasals became voiceless plosives (in most dialects, except S Betau, E Bertang, U
Cherong, R Pos Buntu, and F Kuala Kenip).
b) Final glottal stops were lost after long vowels (B Gopeng).
c) /*oo/ became unrounded to // (AA Renglas) or // (Y Serau and Z Lanai) in all
environments except before final /*-h/ and /*-/, where it became the diphthong /u/.
d) /*/ became backed and rounded to // in all environments (B Gopeng and M Kampar).
e) /*ee/ became centralized and raised to // in all environments (B Gopeng, M
Kampar, and J Tapah).
f) /*/ became rounded to /oo/ (AA Renglas), centralized, raised, and rounded to
// (Y Serau and Z Lanai), but simply centralized and raised to // in all other dialects. This
change happened in all environments.
g) /*u/ became lowered to /o/ in all environments (M Kampar, X Sungai Ruil, and DD Terisu).
h) Final palatal consonants /*-c/ and /*-/ first merged to /*-c/, and then shifted to /-t/ or /-k/. Also,
final /*-/ shifted to /y/ (V Chinggung and W Sungai Bil, and some lexical items at H Cluny).
These changes occurred in all environments.
i) /*i/ became /ii/, /ee/, or //, in all environments.
j) /*oo/ became /w/ before alveolar (*-t, *-dn, *-n, *-r, *-l) and palatal (*-c, *- , *-, *-s) final
consonants (H Cluny, V Chinggung, and W Sungai Bil).
k) /*Noo/ became /Nuu/ (B Gopeng, G Tangkai Cermin, O Bota, X Sungai Ruil, DD Terisu), or
/N/ (AA Renglas), or /N/ (all other dialects).
l) /*/ and /*i/ became /y/ before final labials (*-p, *-bm, *-m, *-w) and final alveolars (*-t, *-dn,
*-n, *-r, *-l, *-s) (H Cluny, V Chinggung, and W Sungai Bil).
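
Where a change is attributed to a closed list of dialects, the isoglosses above can be treated as sets of dialect codes and compared directly. The following is a minimal sketch in Python and is not part of the original analysis. The names ISOGLOSSES, changes_for, and shared_changes are illustrative assumptions, and only changes (b), (d), (e), (g), (j), and (l) are included, since their dialect lists are stated explicitly and completely.

```python
# Dialect codes listed for each change, copied from the list above.
# Assumption: changes whose membership is partial or stated as
# "all other dialects" (a, c, f, h, i, k) are deliberately left out.
ISOGLOSSES = {
    "b": {"B"},             # B Gopeng
    "d": {"B", "M"},        # B Gopeng, M Kampar
    "e": {"B", "M", "J"},   # B Gopeng, M Kampar, J Tapah
    "g": {"M", "X", "DD"},  # M Kampar, X Sungai Ruil, DD Terisu
    "j": {"H", "V", "W"},   # H Cluny, V Chinggung, W Sungai Bil
    "l": {"H", "V", "W"},   # H Cluny, V Chinggung, W Sungai Bil
}


def changes_for(dialect):
    """Return the set of change labels under which a dialect code is listed."""
    return {label for label, codes in ISOGLOSSES.items() if dialect in codes}


def shared_changes(first, second):
    """Return the change labels that list both dialect codes."""
    return changes_for(first) & changes_for(second)


if __name__ == "__main__":
    print(sorted(changes_for("M")))          # changes listing M Kampar
    print(sorted(shared_changes("B", "M")))  # changes shared by Gopeng and Kampar
```

Within this subset, a pair of dialects that shares several labels falls on the same side of several isoglosses. The result says nothing about the changes that were left out.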

The following pages show the isogloss maps for these phonological changes.


Appendix E: Proto-Semai lexical items


Table 14 lists some proto-Semai words reconstructed on the basis of material included in the Semai
wordlists. A lexical item has been reconstructed only when cognate items were found in two or more
areas. Naturally, no reconstruction was attempted when a lexical item was found in only one dialect.
Note that the semantic range of a cognate in one language or dialect often does not completely match
that of the cognate in another. Therefore, the gloss postulated for a reconstructed lexical item may differ
from that of one or more of the cognates and may also differ from that of the wordlist item. When it
seems clear that the gloss of the reconstructed lexical item is different from that of the wordlist item, the
gloss postulated for the reconstructed item is enclosed in parentheses immediately following the item.
Borrowings from other languages are included but not marked separately.
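
The selection criterion described above (reconstruct only when cognates are found in two or more areas) can be sketched as a simple filter. This is a minimal illustration in Python under stated assumptions: the function name, the data layout, and the placeholder labels such as "set_A" and "area_1" are hypothetical and do not come from the wordlists.

```python
def reconstructible(cognate_sets, min_areas=2):
    """Return labels of cognate sets attested in at least min_areas distinct areas.

    cognate_sets maps a label to the list of areas where a cognate was recorded.
    """
    return [
        label
        for label, areas in cognate_sets.items()
        if len(set(areas)) >= min_areas
    ]


# Hypothetical placeholder data, for illustration only.
example = {
    "set_A": ["area_1", "area_2", "area_3"],
    "set_B": ["area_1"],             # one area only: no reconstruction
    "set_C": ["area_2", "area_2"],   # the same area recorded twice counts once
    "set_D": ["area_1", "area_3"],
}

if __name__ == "__main__":
    print(reconstructible(example))  # ['set_A', 'set_D']
```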
The symbols for proto-Semai proposed by Diffloth are used so as not to introduce conflicting symbols
into the literature. Where Diffloth has already postulated a proto-form, it is listed here followed by a
dagger (†).
Table 14. Reconstructed proto-Semai words
1

sun

moon

*c

sky

*blii, *rhuu, *swiik

star

water, river

4
7

flow

float (on surface)

*mt iis, *pr

*prlooy, *bint
*teew

*l

swim (human or animal)

*ly

10

sink

*krp, *clk, *snrp, *tlbm

12

shallow

*ncees

14

rain

*mn, *pk

11
13

bathe

deep (opposite shallow)

*timl

*mhmh
*ree

15

lightning

*bc

17

rainbow

*cdw

16

thunder

*r (thunderstorm), *kuu

18

shadow

*wk

22

stone

*suk, *btu

19

wind

23

mountain

25

earth

24
26

*pp

mud

*py

dust

*hbu

sand

32

salt

33

*loodn, *lml

cave

27

28

*pooy, *ps

path (small)

*ti

*smby

*m pc
*n

34

fire

36

ashes

*hbuu

38

burn (wood)

*tt, *p, *h

40

forest

*rs, *srk, *drt

42

seed

44

root

35
37
39
41
43
45

46

firewood

smoke

burn off (e.g. feathers)


grass

*s

*cs, *rs
*cs, *cuul

*r, *rbm

*sp, * , *llk

*mt, *kb, *bnh

leaf

*sl

dig

*coobm

*ris, *kr, *coo

look for

*k

48

thorn

*rl

49b

branch, tree

*kn hu, *c

51

fruit

*pl

53

banana

*tly

55

coconut (ripe)

*r / *r r

57

ginger

*kr

60

kapok

*kbuu

62

bamboo shoot

*rboo

47
49a
50

52
54
56

58
61

flower

trunk, tree
bark

papaya

coconut (unripe)

eggplant

mushroom
bamboo

*bu

*kee, *kn, *lloow, *ct

*ckp, *cwk, *h
*pl ptik

*r / *r m, *niy m
*tro

*btees

*wt, *poo, *leew

63

rattan

*coo

65

leaf (betel)

*sirih, *kluk, *rk

64
66

betel nut

*bl

lime

*kp

rubber

*cbr

67

spit (betel juice)

69

tap (a tree)

*ii, *c, *kooh, *mot, *wr

71

blow gun

*blw

darthead (blunt end)

*brl, *pkl

68
70
72
73

74a

ipoh tree

blow

dart (with poison)

*[]th

*kee, kn (tree), *dk (poison), *tnk


*tuh, *puut

*roo, *siseey, *sasee

74b

dart (without poison)

76a

spear (iron)

*trk

77

hunt

*lp, *uml

75

76b

quiver

spear (bamboo)

*smc

*lk

*mt

78

kill

*prddn

80

shoot

*bdil

82

pig (wild)

*w, *l, *uy, *mhr

porcupine, small

*pcr, *lok dk

79
81

stab

bear (honey)

*ck

*brw, *sml, *bahwoow

83

porcupine, large

84

monkey (short-tailed)

*d, *rw, *hl, *cbh

deer

*rus, *bh tb

83a
85

rabbit

87

tiger

86

*kuus, *bh ork

*rnp

*rk, *mrs, *, *t

88

tail

*snt

90

bark (to)

*l

92

rat

*kn, *prook

89

91

dog
cat

*coo

*kuci

93

bite

*kp

95

fly (v)

*h

97

feather, body hair

94
96

98

bird

*cbm

chicken

*puk

egg

*pnl, *kt

*sntl

99

wing

*kny

101

bee

*smc (stinging insect), *luwy, *dn

103

insect

*ch, *ls

105

louse (chicken)

100

102

104

butterfly

fly [n]
mosquito

*krkbk, *twk, *sirooy


*rooy

*kmt, *sb, *hp, *lul, *kbo

*c, *bruuy, *mc

106

louse (head)

*c

108

spider (small, house)

*twii, *tp, *mhl, *mn

107

108a

termite

spider (large)

*r

*b[]hl

109

cockroach

*sr, *sur, *pph

111

lizard (monitor)

*ri / ree, *p[]ree, *trkl, *hrk

110

snake

*tii, *tu

112b snail (garden)

*too, *kloo

112c

snail (water)

113

frog

114a

turtle (sea)

*too, *kloo, *k kp, *kmbuweey


*tbk, *kroo, *k uk, *op, *st, *br,
*k skk
*p[]s, *p, *koh

115

fish

*k

114b turtle (river)


116

117a

crocodile

person (OA)

117b person (non-OA)

*p[]s, *lpiil, *k h, *k pl

*bhy

*sy , *my

*my kil

118

man

*krl

120

child (offspring)

*knn

122

mother

119
121

123
124
125

woman, female

children
father

sibling, elder brother or


sister
sibling, younger

*krdoor

*syt / *st

*m, *mh
*bn (one's own), *bee, bh, *yh,
*p
*tn
*bn , *b (not one's own)

126

sibling, youngest

*iluc, *lc

128

wife

*knh

130

friend

*rnm, *kwdn

131

name

127

129

130x

husband
widow

*nsiir

*blu, *nd

companion

*roop

head

*kuuy

comb

*sikt

135b bald (shaved)

*lc

132
133
134

135a

hair

bald (natural)

*mh, *m h
*sk

*lc, *prl

136

scissors

*unti

137

eye

*mt

139

ear

*nt

141

forehead

*tmi[i], thi, *lbuudn (fontanelle)

136a
138
140

cut hair (with scissors)


nose

face

142

eyebrow

144

chin

143

cheek

*ko[o]h, *ts
*mh

*rns, *ulh

*sntl mt, *smpooy mt, *stst mt

*[k]m , *kp

*k, *ck

145
146

throat

mouth

*m pk

tooth

*lm

147

lip

149

tongue

151

brain

148
150
152a

gums

neck, (back of)

152b neck, front of


153

back

155

armpit

154

*rook, *krukuk, *t n, *rukuk

*yy
*lntk

*lsi

*kloobm
*tn
*ck

*cloodn, *knk

shoulders

*lpl, *pk

156

elbow

*kn

158

palm (of hand)

*pl, *tpr, *tpk

160

finger

157
159

hand

nail (finger)

*krndook, *sn
*t

*croos

*cnr, *rs

161a

abdomen

162

abdomen (lower)

*w

164

navel

*sook, *pdeek

heart

*noos, *ntu

161b abdomen (upper)


163

buttocks

*kt, *c

*kt, *nth (chest)


*kit, *skil

165

lungs

167

intestines

*coo c, *c pdl, *c wdn

169

bone

171

skin

*t, *kpoobm

166
168
170

liver

rib

173

dry

175

thigh

174

*suup

*riis, *klp

*crs

*soot, *tiil

flesh

*sc

176

knee

*kurool

178

foot

*u

180

toes

*cnr, *rs

tired

*hl, *sly

snore

*smr

177
179

calf, lower leg

heel

181

strong

183

sleep

182
184

*lmp[]

*km
*cn, *kul , *kn l
*kuwt, *tr

*bt

185
187

yawn

see

*kihy

*n , *blh, *loow

188

look

*n , *tiyw

190

deaf

*p rtee, *tip nt

189

wink

*kbr, *kp mt

191

hear

192

smell

193a

rotten, decayed

*rtee, *cry, *thn


,
*y , *y, *ht, *, *hn
*st / *st
*s, *bbh

194

hungry

*cuw, *cms / *cm s

196

eat

193b rotten (food)


195
197

satiated

thirsty
suck out (snail, bone
198a
marrow)
198b suck on (candy, sweet)
199

*c

*slt
*huc, *ht , *sb, *spt
*sb, *b, *kmm, *lt

*bc

liquor

*teew bl

drunk

202

drink

204

vomit

203

*b[]hee

lick

200

201

*s, *bri

*bl
*t

swallow

*liibm

205

hurt

*phoot, *t

207

itchy

*bh, *sc

206
208

swollen
scratch (v)

*k
*s

*ees, *eeh, *ih

209a

rake

*ps, *kpc, *ktc

210

cold, to feel

*dkt (cold, feverish), *sc

212

cough

*khl

214

hot

216

saliva

209b scratch (chicken)


211

213
215

shiver

sneeze

*ps, *kwc

*kr, *kr

*rmh, *cs, *brsidn

*bkt, *pliit

sweat

*sbm, *mkt

spit

*[]th

220

cry

*bm

222

blood

217

219
221

223

tears

*lhi

*teew mt

urine

*nm

excrement

*c

*bhiibm

224

defecate

*chcoh

226

boil, abscess

*dk

228

medicine

*ply

230

shaman, herbal curer

225
227
229

pus

scar

*bmb

*tm, *hl, *wk, *bkt, *dl

incant

*ch, *thr, *bicw

231

live, to

*rees (to be live, erect), *suuy

233

bury

235

fat

*hiidn, *bc, *bcoo, *hiit

237

tall (animate, inanimate)

*cr, *lsiik

240

big

242

breathe

232
234
236
238
241
244

*hl, *pw

die

*ddn

old (animate)

*liiw, *r, *mnh, *tt, *

skinny

*coobm

*soor, *soo

short (animate, inanimate)

*pti, *kuti, *lt, *l

small

*mct , *ncn, *kk, *ll

*ntooy

*lhm

blow on

*thool

stand

245

sit

247

walk

*ciip

249

run

*r

251

slow

253

know

246
248
250
252

254
255

256

crawl

*[]c, *drs, *culs, *tc / *tc

think

*s (awareness), *oo

forget

dream

choose

love

259

laugh

261

bad (person)

264

furious

266

steal

260
263

265
267

*wdn

fast

257

258

*y

*lih, *liiw, *prlhdn, *mnn , *lmt


*s (awareness), *pny
*sibm

*brpoo, *m poo, *bkee

*pileh

*hoo

smile

*[kr]m, *krm, *luk

good

*br, *il

*luk

*ds, *nc

angry

*r, *my, *ls

fib, lie

*loodn, *lnn, *dh

fight

*kh, *lhii

*bl, *ckh

*sic

268

afraid

*sh

270

wrong

*s

272

hit

274

call (get s.o.'s attention)

*cree

277

tell

*k pny

279

answer

269
271

273
276

278
280

correct
difficult

*pyh, *sush

speak

*yp, *doot, *cl

story

sing

*ul, *luu, *

stamper, striker

*cnt

play

*mn, *toodn

283

drum

285

286
288
289

289a

*crmr, *cnl

*h

dance

284

*kh, *t, *pt / *pt

whistle

281

282

*kml, *kn

kick
fall down (animate,
inanimate)
fall down from a height

take

carry in the hand, take

*brlt, *brl, *doot, *wp, *bls


*s, *ruen, *ut
*th, *tubuk, *ito, *rbn

*tk, *tnd[], *sip


*r, *th, *y
*y, *th

*kt, *, *ciduubm
*tk, *

289b carry in arms (child)

*cduubm, *b

291

*tkl, *unu[]

290
292

carry in cloth

carry on head

*b, *l

293

carry on shoulder
carry on shoulder, two
people
return

*uk, *ls

295

enter

*mc, *plt

292a
294

come

296

wait

298

earn

297
299

work

pay

*ulbm, *l

*[]ndr
*hool, *tl
*p

*kr

*kbm

*byr

300

sell

*uwl

302

give

*k

301

303
304

305

306

buy

throw (s.t. sidearm)


throw overhand

toss (throw underhand)


throw away

*bli, *kt
*pc, *uwl, *rw, *pk
*sdk, *pkh

*s, *tlh

*s, *wees

307

pull

*k

309

day

310

morning

311

noon

*iis, *hrii
*plp, *huplp, *bl iis, *biyh,
*[hu]pr / *[hu]pr
*l, * th hrii, *mkt

313

night

314

tomorrow

315

yesterday

317

lean-to

322

house

*thuudn
*d rp, *d dnm, *d spw,
*pndr / *pndr
*d

324

roof

*ploo

floor

*cikr, *ris

328

mat

*cruu

330

sarong, men's

*bdn, *bdn kb / kb

332

tie

*bk, *kool, *sk

334

trousers

*srwl, *sluwr

336

wash

*suuc

338

rub

*sk, *c

340

dry, to be

*soot

308

312

316

323

push

afternoon

year

space under house

325

window

327

blanket

326

329
331

333

335
337
339

341

342

loincloth

sarong, women's
tie by twisting
dirty

wash (clothes)

wet

dry, to
wipe

*dus, *tkuu, *sr

*duuy, *brkes, *brks


*klm, *sp, *sp, *bt , *ditp,
*lp
*hplp, *hlp, *huprdh, *chy,
*hupr [dh], *ypr
*sn
, *duuy n , *n[]tbm

*krbm d
*tikp

*tiluubm, *bdn, *br


*ldn

*bdn bti

*kool, *beer, *loor


*bct, *nc, *brit

*sh

*kc, *tc, *pk


*tiil

*iit, *spuu, *lp

343

sweep

*ps, *spuu

345

needle

*rubm

344
346

347a

sew

cook

boil (water)

347b boil (tapioca)

*cu, *ci, *yet

*brcdn

*bm, *nm

*bm, *rbus

348

pot

350

full

349
351a

cooking pot

*cre

*lee, *pryuk, *tl

*tbee, *kbm

ladle

*sndo, *mnh, *suduu

352a

mortar (stone)

*uul, *sikldn, *lsu

353a

pestle (spices)

*knh, *rn, *knr

354a

pound (in mortar, spices)

*sh, *rlit

355

knife

358

blunt

351b dipper (for bathing)


352b mortar (for rice)
353b pestle (for pounding rice)

*c[n]ibo, *sndok, *tk

*uul

*knh, *rn

354b pound in mortar (rice)

*sh

357

sharp

*cbt

split

*pk, *kh, *blh

359

360

*yooc, *yc, *pisw


*blk

sweet

*ct

bitter

*kdc

364

white

365

red

366

green

367

yellow

368

bright

*byk, *byk, *bk , *bkl, *puteh


*chr (bright red), *cl, *r, *h,
*ptlc
*blr, *hiw
*r[t]mt (yellow, turmeric), *klooy, *cmck,
*phook
*loow

370

new

*py

round

*kb / *kb, *bult, *blnl

361

sour

363

black

362

369

dark

371

old (thing)

373

straight

372

*kr, *kbm / *kbm


*blk, *rh, *hitbm

*lp, *klp, *sr, *sp, *sp

*liiw, *mnh, *r

*t, *ly, *lp, *lurus, *lk

374

narrow

376

thin

*nsy, *npis

378

wide

*l, *luws, *lebr

375
377

thick

smooth

379

long

381

heavy

380

hard

382

same

386

what?

385

other

*kp, *t, *p, *smpit, *t


*tbl, *nsee
*slc

*cr

*ch, *tr
*h
*sm

*kil / *kil, *lyidn, *psik


*mh, *l, *boo, *ml

387

who?

*boo, *boo imy

389

how many?

*m[]ribm, *mit

388
391

when?

one

*nn, *s

many (things), much

*y, *kbm, *smy / *sm y

392

two

394

many (human)

393
395

*lumpuu, *bil, *mpu

*nr, *duw

*[n]y, *kbm

all

*dic, *smw

far

396

some

398

near

*
*n
, *r / *r

left (side)

*[kn]wil

397
399

400
401

right (side)
three

*[kn]tbm

*n, *ti

402

four

*m pt

404

six

*nnm

eight

*lpn

403

five

405

seven

407

nine

406

*lim

*tuoh

*smbiln

408

ten

*spuloh

510

stare

*ldn

512

follow, pursue

*bsh, *dlk, *ooy

509
511

exit, go/come out


grope

*hool

*pbm

513

wring

*rit, *putt

515

burp

*rp

517

not oily; squeaky clean

*ckt

514
516

518

hide oneself
crow, sing

there is

519

bachelor

521

smell of fresh fish

520
522

get up

*brceebm

*tdr, *cbh

*m

*litw, *bu, *uleey

*kus, *kuu

*pls, *plh

ripe

*nbm

524

orphan

*rknk

526

last, after, in the back

523
525
527

528

sting, to

make, to

twilight
unripe

*sc

*bh, *uuy
*ktnt

*yp

*klbdn


Appendix F: Phonology of Semai, Betau Dialect


F.1 Introduction
The purpose of this paper is to briefly describe the phonology of the Betau dialect of Semai. Semai is
categorized with the following genetic affiliation:
Austroasiatic
  Mon-Khmer
    Aslian
      Central Aslian
        Semai
Semai is spoken by approximately 34,000 people,41 who live mostly in the remote areas of the Malaysian
peninsula, in the states of Perak and Pahang. Linguistic and anthropological research indicates that the
Semai and other Aslian groups on the peninsula lived there long before the arrival of the current Malay-speaking population.42

F.2 Word
Semai words, in good Mon-Khmer tradition, fit the following syllable template:
(C3 V2 (C4))   C1 V1 C2
    Minor        Major

The final syllable is regarded as the major syllable and the penultimate syllable, if present, is regarded as
the minor syllable. Semai words always have ultimate stress; that is, on the major syllable. While many
Semai words have only one syllable, the majority of Semai words have two syllables. The minor vowel V2
is usually very short, nonphonemic, epenthetic [], and its enunciation in any given word is often
optional if the two consonants are easily pronounced without the epenthetic vowel. For this reason
Semai roots are sometimes called sesquisyllabic since the minor syllable does not carry the same
weight, phonetically or phonemically, as the major syllable. The following forms are illustrative of Semai
word shapes.
/liim/43

to swallow

/.ku/

thunder

/mt/

eye

/s.lc/

smooth

/m.n/

rain

/.rii/

monitor lizard
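
The template above can be checked mechanically. The following is only a minimal sketch, not part of the original analysis: it assumes that a syllabified transcription uses "." as the syllable boundary, that plain ASCII vowel letters stand in for the full vowel inventory, and that a doubled letter is a single long nucleus. The names skeleton and fits_template are hypothetical.

```python
import re

# Assumption: plain ASCII vowel letters stand in for the full Betau Semai vowel set.
VOWELS = set("aeiou")

# (C3 V2 (C4)) C1 V1 C2: an optional minor syllable followed by a closed major syllable.
TEMPLATE = re.compile(r"(?:CVC?\.)?CVC")


def skeleton(word):
    """Reduce a syllabified transcription such as 'ku.rool' to a C/V skeleton."""
    shapes = []
    for syllable in word.split("."):
        shape = "".join("V" if ch in VOWELS else "C" for ch in syllable)
        # A doubled (long) vowel is one nucleus, so a run of V collapses to a single V.
        shapes.append(re.sub(r"V+", "V", shape))
    return ".".join(shapes)


def fits_template(word):
    """Return True if the word has the one- or two-syllable shape of the template."""
    return TEMPLATE.fullmatch(skeleton(word)) is not None


for form in ["liim", "ku.rool", "li.tow"]:
    print(form, skeleton(form), fits_template(form))
```

The sketch covers only the two-syllable template; the trisyllabic words mentioned in this section would need an additional optional syllable in the pattern.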

There are a number of words that have minor syllables with minor vowel segments (V2) other than
[]; namely [], [i], and [u]. It is claimed by some (Diffloth 1968) that these segments may be
phonologically conditioned in some cases and morphemes in other cases.44
/.l/

what

/t.p/

small spider

41 According to Population of Orang Asli Sub-Groups, 1999, Keene State University Orang Asli Archive website.
42 Carey (1976).
43 In this paper the short vowels will be represented by a single letter, and the long vowels by a double letter. The
latter is a departure from standard IPA. Another departure is that the palatal central approximant is represented by
the symbol y rather than the standard j, which could easily be confused with the palatal voiced plosive and with
local orthographies, especially Bahasa Malaysia. Further, r is used to symbolize the flap. Lastly, the symbol is
used for the unrounded open central vowel, and for the rounded open back vowel.
44 Diffloth (1976a) claims, for example, that /-a-/ in certain minor syllables is a morpheme.

/ki.l/

other

/li.tow/

bachelor

/ku.ti/

short

/ku.rool/

knee

It is noteworthy that when the minor vowel segment (V2) is [], [i], or [u], it is pronounced with
greater length than when V2 is [], roughly equal in length with V1 in the major syllable when V1 is not a
long vowel. Stress remains on the ultimate syllable.
While most roots are apparently either mono-, sesqui-, or disyllabic, there are examples of words
with three syllables.
/n.r.m/

friend (possibly a nominalization of Malay ramah friendly)

/b.r.po/

to dream

/k.rk.bk/

butterfly (possibly from proto-Austronesian *kali-baba)

/b.l.r/

green (an expressive45)

/p.n.m/

full moon

F.3 Syllable
In Semai every syllable has an obligatory nucleus and onset, and an optional coda. The nucleus is usually
a vowel; however, there are some nasals that are syllabic as well in the minor syllable.
[m.pc]

/m.pc/

salt

[.ku]

/.ku/

thunder

[. h]

/. h/

heavy

[n.tooy]

/n.tooy/

big

The onset and coda are consonants. The two basic syllable types are CV and CVC. The syllable type CV is
found only in the minor syllable. For example,
CV

/p.n.m/

full moon

/p.cr/

porcupine (small variety)

/k.r/

to shiver

All major syllables, and some minor syllables, have the syllable type CVC. For example,
CVC

/wt/

to split

/th/

to spit

/./

bone

/kl.oom/

brain

/kr.door/

woman

F.4 Phonemes
Betau Semai in this treatment has forty-five phonemes: nineteen are consonants, fourteen are oral
vowels, and twelve are nasal vowels. The consonants are /p, t, c, k, ʔ, b, d, ɟ, ɡ, m, n, ɲ, ŋ, s, h, l, r, y, w/,
the oral vowels are /i, ii, ee, , , , , , , u, uu, oo, , /, and the nasal vowels are /, , , , , ,
, , , , , /. Tables 15-17 show the Betau Semai phonemes.

45

Diffloth (1976b) discusses a word class for Mon-Khmer languages called expressives. He claims that expressives in
Semai have a phonology that is different from other word classes, exhibiting sequences of sounds not found in the
rest of Semai.

Table 15. Consonants

                       Bilabial  Alveolar  Palatal  Velar  Glottal
Plosive, voiceless     p         t         c        k      ʔ
Plosive, voiced        b         d         ɟ        ɡ
Nasal                  m         n         ɲ        ŋ
Fricative, voiceless                       s               h
Trill or flap                    r
Lateral approximant              l
Central approximant    w                   y

Table 16. Oral vowels

Oral, long
Close

Front
(unrounded)

Central
(unrounded)

ii

Back
(rounded)
uu

Close-mid

ee

oo

Open-mid

Open
Oral, short
Close
Mid

Front
(unrounded)

Central
(unrounded)

Open

Back
(rounded)
u

Table 17. Nasal vowels

Nasal, long
Close
Mid
Open
Nasal, short
Close
Mid

Front
(unrounded)

Central
(unrounded)

Back
(rounded)

Front
(unrounded)

Central
(unrounded)

Back
(rounded)

Open

The following is a description of the articulatory features of Betau Semai phonemes and their
allophones, as well as a description of their distributions.

F.4.1 Consonants

Voiceless plosives
The voiceless plosives /p, t, c, k/ all have two allophones, released and unreleased. The unreleased
allophone occurs word final, and the released allophone occurs elsewhere.
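
The distribution of the two allophones can be stated as a one-line rule. The sketch below is illustrative only; it assumes the IPA "no audible release" diacritic (U+031A) as the mark for the unreleased allophone, and the function name realize_plosives is hypothetical. The slight affrication of /c/ in onsets is not modelled.

```python
VOICELESS_PLOSIVES = set("ptck")
UNRELEASED = "\u031a"  # IPA diacritic for "no audible release" (assumed notation)


def realize_plosives(phonemic):
    """Mark a word-final /p, t, c, k/ as unreleased; all other positions stay released."""
    if phonemic and phonemic[-1] in VOICELESS_PLOSIVES:
        return phonemic + UNRELEASED
    return phonemic


# /soot/ 'to be dry' ends in a plosive; /teew/ 'water' does not.
print(realize_plosives("soot"))
print(realize_plosives("teew"))
```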
/p/ [p,p] is a voiceless unaspirated bilabial plosive and occurs in syllable onsets and codas.
[p.y]

/p.y/

mud

[pn.ly]

/pn.ly/

medicine

[m.pc]

/m.pc/

salt

[ cp. ciip]

/cp.ciip/

walking

[ cr.kp]

/cr.kp/

to stab

/t/ [t, t] is a voiceless unaspirated alveolar plosive and occurs in syllable onsets and codas.
[t]

/t/

hand

[t.leey]

/t.leey/

banana

[sn.tl]

/sn.tl/

feather

[soot]

/soot/

to be dry

[iit]

/iit/

to wipe

/c/ [ c, c] is a voiceless unaspirated palatal plosive with two allophones. It occurs in syllable onsets
slightly affricated as [ c ], and in syllable codas as the unreleased palatal plosive [c].
[ co]

/coo/

dog

[ c.t]

/c.t/

sweet

[m. ct ]

/m.ct /

small

[t.c]

/t.c/

wet

[sec]

/sc/

flesh

/k/ [k, k] is a voiceless unaspirated velar plosive and occurs in syllable onsets and codas.
[k.nh]

/k.nh/

wife

[k]

/k/

fish

[b.kt]

/b.kt/

hot

[.ku]

/.ku/

thunder

[ln.tk]

/ln.tk/

tongue

[b.lk]

/b.lk/

blunt

/ʔ/ [ʔ] is a voiceless glottal plosive and occurs in syllable onsets and codas.


[.be]

/.bee/

father

[k]

/k/

to give

[kl.oo m]

/kl.oom/

brain

[s.y ]

/s.y /

person

[po.o]

/p.oo/

bamboo

[sn.t]

/sn.t/

tail

Voiced plosives
The voiced plosives /b, d, ɟ, ɡ/ occur only in syllable onsets.
/b/ [b] is a voiced bilabial plosive and occurs in syllable onsets.
[bi.hiibm]

/b.hiim/

blood

[bt]

/bt/

to sleep

[bm.b]

/bm.b/

pus

[ c.bt]

/c.bt/

sharp

/d/ [d] is a voiced alveolar plosive and occurs in syllable onsets.


[d]

/d/

house

[d.k]

/d.k/

boil, abscess

[ c.dw]

/c.dw/

rainbow

[kr.door]

/kr.door/

woman, female

/ɟ/ [ ] is a voiced palatal plosive and occurs in syllable onsets. Akin to its voiceless counterpart, its
articulation is slightly affricated.
[ bm]

/m/

to cry

[ r.l]

/r.l/

thorn

[t. u]

/t.u/

snake

[kr. ]

/kr./

ginger

/ɡ/ [ɡ] is a voiced velar plosive and occurs in syllable onsets.


[u.]

/u./

some

[n.siir]

/n.siir/

husband

[m.it]

/m.it/

how many?

[t.h]

/t.h/

to fall down

Nasals
The nasal consonants /m, n, ɲ, ŋ/ all have two allophones, plain and preploded. The preploded
allophone occurs word final but only after nonnasal vowels, and the plain allophone occurs elsewhere.
The plosive in these preploded nasals is always at the same point of articulation as the following nasal.
There are also cases where the nasal segment bears syllabicity in words where the nasal segment stands
alone in the minor syllable.
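
The conditioning of the preploded allophone can be sketched as a small function. This is an illustration under stated assumptions, not the author's formalization: the entries for the palatal and velar nasals are inferred from the statement that the plosive shares the nasal's point of articulation, nasal vowels are assumed to be written with a combining tilde, plain ASCII letters stand in for the vowel inventory, and a vowel is treated as nasalized when the onset of its syllable is a nasal consonant. The name realize_final_nasal is hypothetical.

```python
import unicodedata

# Attested in the text: /m/ -> [bm], /n/ -> [dn]. The palatal and velar entries are
# assumptions that follow the "same point of articulation" statement.
PREPLODED = {"m": "bm", "n": "dn", "ɲ": "ɟɲ", "ŋ": "ɡŋ"}
VOWELS = set("aeiou")   # assumption: ASCII stand-ins for the vowel inventory
NASAL_MARK = "\u0303"   # assumption: nasal vowels carry a combining tilde


def realize_final_nasal(phonemic):
    """Prepload a word-final nasal unless the preceding vowel is nasal(ized)."""
    word = unicodedata.normalize("NFD", phonemic)
    if not word or word[-1] not in PREPLODED:
        return phonemic
    body, final = word[:-1], word[-1]
    i = len(body)
    nasal_vowel = False
    # Walk back over the vowel nucleus, noting any nasalization mark.
    while i > 0 and (body[i - 1] in VOWELS or body[i - 1] == NASAL_MARK):
        nasal_vowel = nasal_vowel or body[i - 1] == NASAL_MARK
        i -= 1
    if i == len(body):
        return phonemic  # no vowel directly before the final nasal
    onset = body[i - 1] if i > 0 else ""
    if nasal_vowel or onset in PREPLODED:
        return phonemic  # nasal or automatically nasalized vowel: plain allophone
    return unicodedata.normalize("NFC", body + PREPLODED[final])


print(realize_final_nasal("loon"))  # /loon/ 'mountain'
```

For /loon/ 'mountain' the sketch yields the preploded form given in the examples of this section; the form "noon" used in testing is a made-up string that only exercises the nasal-onset condition.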
/m/ [m, bm] is a bilabial nasal and occurs in syllable onsets and codas.
[mt]

/mt/

eye, seed

[t.m.i]

/t.m.i/

forehead

[sm.pooy]

/sm.pooy/

eyebrow

[bm.b]

/bm.b/

pus

[nm]

/nm/

urine

[kl.oo m]

/kl.oom/

brain

[sn.bm]

/sn.m/

sweat

[m.pc]

/m.pc/

salt

/n/ [n, dn] is an alveolar nasal and occurs in syllable onsets and codas.
[nr]

/nr/

two

[ln.tk]

/ln.tk/

tongue

[m.n]

/m.n/

rain

[d.nn]

/d.nn/

corpse

[hn]

/hn/

to smell something

[loodn]

/loon/

mountain

[l.p n]

/l.pn/

eight (Malay: lapan)

[n.tooy]

/n.tooy/

big

[n.seey]

/n.seey/

thin

/ɲ/ [ɲ, ɟɲ] is a palatal nasal and occurs in syllable onsets and codas.


[]

//

far

[.y ]

/.y /

lips

[s.p]

/s.p/

dark

[l.m]

/l.m/

tooth

[.r ]

/.r/

termite

[b.he]

/b.h/

itchy

[. h]

/. h/

heavy

/ŋ/ [ŋ, ɡŋ] is a velar nasal and occurs in syllable onsets and codas.


[t ]

/t /

to drink

[sr.]

/sr./

to think

[n]

/n/

path

[s.y ]

/s.y /

person

[t.p ]

/t.p/

spider

[bm.b]

/bm.b/

pus

[.ku]

/.ku/

thunder

Fricatives
/s/ [s] is a voiceless palatal fricative and occurs in syllable onsets and codas.
[suuc]

/suuc/

to wash

[sr.]

/sr./

to think

[n.siir]

/n.siir/

husband

[.s]

/.s/

to dance

[ns ]

/ns /

heart

[r.iis]

/r.iis/

root

/h/ [h] is a voiceless glottal fricative and occurs in syllable onsets and codas.
[hn]

/hn/

to smell something

[hn.lp]

/hn.lp/

tomorrow

[.hl]

/.hl/

tired

[l.hii ]

/l.hii/

saliva

[r.mh]

/r.mh/

to sneeze

[sh]

/sh/

to wash clothes

Trill / Flap
/r/ [r] is an alveolar flap and occurs in syllable onsets and codas. This segment is sometimes articulated
as a trill, especially during slow or emphatic speech, but never in contrast to a flap.
[rk]

/rk/

tiger

[rt.riit]

/rt.riit/

wringing

[r.p]

/r.p/

to burp

[ r.l]

/r.l/

thorn

[ r. r]

/r.r/

running

[ c.koor]

/c.koor/

to rake

Approximants
/l/ [l] is a lateral approximant and occurs in syllable onsets and codas.
[luk]

/luk/

to laugh

[l.si ]

/l.si/

gums

[kl.oobm]

/kl.oom/

brain

[l.pl]

/l.pl/

shoulder

[sn.tl]

/sn.tl/

feather

/y/ [y] is a palatal central approximant and occurs in syllable onsets and codas.
[yc]

/yc/

knife

[s.yt ]

/s.yt /

child

[b.h.y]

/b.h.y/

crocodile

[sn.m y]

/sn.m y/

many people

[suuy]

/suuy/

to live

/w/ [w] is a labiovelar central approximant and occurs in syllable onsets and codas.
[wk]

/wk/

shadow

[wt]

/wt/

to split

[sl.wl]

/sl.wl/

trousers

[kn.wiil]

/kn.wiil/

left side

[teew]

/teew/

water

[b.lw]

/b.lw/

blowgun

F.4.2 Oral vowels

As with most languages of Mon-Khmer heritage, Semai is endowed with many vowels. However, while
most Mon-Khmer languages have multiplied their vowel inventories with registers, tones, and
diphthongs, Semai's multitude of vowels is due to its having not only long and short vowels, but also
both oral and nasal vowels.
The long vowels are not dramatically elongated. Indeed, it may be more accurate to portray the long
vowels as the more normal, and the short vowels as extra short. Overall, there are roughly twice as
many words with long vowels as opposed to short in the major syllable.
The epenthetic vowel in the minor syllable often undergoes vowel harmony with the major vowel
when the intervening consonant is glottal (ʔ or h). This is discussed further in section F.6.1 of this
appendix.

Long oral vowels


/ii/ [ii] is a close front unrounded long vowel. It occurs only in closed major syllables.
[tiil]

/tiil/

to dry

[l.piil]

/l.piil/

river turtle

[lii m]

/liim/

to swallow

/ee/ [ee, e] is a close-mid front unrounded long vowel. It occurs only in closed major syllables. Usually
this segment is articulated as a long vowel, but before the glottal final consonants h and ʔ, it is shortened
to the allophone [e].
[teew]

/teew/

water

[b.tees]

/b.tees/

mushroom

[t.bee ]

/t.bee/

full

[t.leh]

/t.leeh/

only, just

[ .re]

/.ree/

deep

// [] is an open-mid front unrounded long vowel. It occurs only in closed major syllables.
[ cbm]

/cm/

bird

[b.kt]

/b.kt/

hot

[lp]

/lp/

to hunt

// [, ] is a close central unrounded long vowel. It occurs only in closed major syllables. Usually this
segment is articulated as a long vowel, but before the glottal final consonants h and ʔ, it is shortened to
the allophone [].
[sc]

/sc/

to sting

[p.lt]

/p.lt/

to enter

[i.pr]

/i.pr/

morning

[kh]

/kh/

to hit

[l]

/l/

pig

// [] is an open central unrounded long vowel. It occurs only in closed major syllables.
[py]

/py/

new

[ . ]

/./

bone

[k.rl]

/k.rl/

man, male

/uu/ [uu] is a close back rounded long vowel. It occurs only in closed major syllables.
[kuuy]

/kuuy/

head

[ .huu]

/.huu/

wood

[nm.puu]

/nm.puu/

when

/oo/ [oo, o] is a close-mid back rounded long vowel. It occurs only in closed major syllables. Usually this
segment is articulated as a long vowel, but before the glottal final consonants h and ʔ, it is shortened to
the allophone [o].
[loodn]

/loon/

mountain

[n.tooy]

/n.tooy/

big

[r.ook]

/r.ook/

throat

[koh]

/kooh/

to cut

[b.r.po]

/b.r.poo/

to dream
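
The shortening of certain long vowels before a final glottal consonant, described for /ee/, /oo/, and the long close central vowel, can be sketched as follows. This is a hedged illustration: the spelling "ɨɨ" for the long close central vowel is an assumption (the text states only its articulatory description), and the function name shorten_before_glottal is hypothetical.

```python
# /ee/ and /oo/ are spelled as in the text; "ɨɨ" is an assumed spelling of the
# long close central unrounded vowel.
SHORTENED = {"ee": "e", "oo": "o", "ɨɨ": "ɨ"}
GLOTTALS = ("h", "ʔ")


def shorten_before_glottal(phonemic):
    """Shorten /ee/, /oo/, /ɨɨ/ when the word ends in h or a glottal stop."""
    if len(phonemic) >= 3 and phonemic[-1] in GLOTTALS:
        nucleus = phonemic[-3:-1]
        if nucleus in SHORTENED:
            return phonemic[:-3] + SHORTENED[nucleus] + phonemic[-1]
    return phonemic


print(shorten_before_glottal("kooh"))  # /kooh/ 'to cut'
print(shorten_before_glottal("loon"))  # /loon/ 'mountain', no final glottal
```

Other long vowels, such as /uu/, are deliberately left untouched, since the text reports shortened allophones only for these three.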

// [] is an open-mid back rounded long vowel. It occurs only in closed major syllables.
[sk]

/sk/

hair

[p ]

/p/

to wait

[ki.hy]

/ki.hy/

to yawn

Short oral vowels


/i/ [i] is a close front unrounded vowel. It occurs in both open (minor) syllables and closed (minor and
major) syllables.
[ki.l]

/ki.l/

other

[b.dil]

/b.dil/

to shoot

[ku.ti]

/ku.ti/

short

// [, e] is an open-mid front unrounded vowel. It occurs only in closed major syllables. This segment is
usually realized as [], but before a palatal consonant, it is raised to the allophone [e].
[ c]

/c/

louse

[ c.h]

/c.h/

hard

[u.]

/u./

some

[pn.ls]

/pn.ls/

dart head

[s.lec]

/s.lc/

smooth

[b.he]

/b.h/

itchy
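
The raising of the short open-mid front vowel before a palatal consonant can be sketched as below. The sketch rests on assumptions: "ɛ" is taken as the symbol for the open-mid front unrounded vowel, the palatal consonants are taken to be c, ɟ, ɲ, and y, and a doubled "ɛɛ" is treated as the long vowel, which is left unchanged. The function name raise_before_palatal is hypothetical.

```python
# Assumed set of palatal consonants (y is the palatal approximant in this paper's notation).
PALATALS = set("cɟɲy")


def raise_before_palatal(phonemic):
    """Raise short /ɛ/ to [e] when the next segment is a palatal consonant."""
    chars = list(phonemic)
    for i in range(len(chars) - 1):
        # A preceding ɛ means this is the second half of long /ɛɛ/, which is not raised.
        is_short = i == 0 or chars[i - 1] != "ɛ"
        if chars[i] == "ɛ" and is_short and chars[i + 1] in PALATALS:
            chars[i] = "e"
    return "".join(chars)


print(raise_before_palatal("sɛc"))  # compare [sec] 'flesh' in the examples
print(raise_before_palatal("cɛh"))  # no palatal follows, so the vowel is unchanged
```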

// [] is a close-mid central unrounded vowel. It occurs in both open (minor) syllables and closed
(minor and major) syllables. In the minor syllable, this segment appears to be in free variation with [].
In the minor syllable, [] is often very short and sometimes dropped in fast speech where the resulting
consonant cluster is easy to pronounce without the vowel, such as when the onset of the major syllable is
a lateral approximant or a flap.
[lk]

/lk/

quiver

[sn. m]

/sn.m/

sweat

[s.l] ~ [sl]

/s.l/

leaf

// [] is an open central unrounded vowel. It occurs in both open (minor) syllables and closed (minor
and major) syllables.
[ddn]

/dn/

to die

[b.h.y]

/b.h.y/

crocodile

[ c. coh]

/c.cooh/

to defecate

[bl.k]

/bl.k/

black

/u/ [u] is a close back rounded vowel. It occurs in both open (minor) syllables and closed (minor and
major) syllables.
[su.d]

/su.d/

bamboo spear

[kus]

/kus/

to get up

[t. u]

/t.u/

snake

[.sur]

/.sur/

cockroach

// [] is an open-mid back rounded vowel. It occurs in closed (minor and major) syllables.

[k.rm.dk]

/k.rm.dk/

armpit46

[b]

/b/

to carry

[kn.tbm]

/kn.tm/

right (side)

[l]

/l/

to carry on shoulder

[ .ly]

/.ly/

straight

F.4.3 Nasal vowels

The nasal vowels are of two types: those that are automatically nasalized after a nasal consonant, and
others that are unpredictable and therefore contrastive. Both types are treated together here. While the
evidence supports that nasality is phonemic, nasal vowels (especially the unpredictable type) are rare,
and minimal pairs of words showing contrast with the oral vowels are rather hard to find. The following
examples, while not minimal pairs, clearly show that nasality is phonemic.
[lt ]

/lt /

to suck on something

[s.lt]

/s.lt/

thirsty

[lc]

/lc/

shaved bald (head)

46 But this may be a frozen form of krm under plus dk ???.

[t.c]

/t.c/

wet

[ku.wc]

/ku.wc/

to scratch like a chicken

[k.r.sh]

/k.r.sh/

rough to the touch

[sh]

/sh/

to wash (e.g. hands)

[t.kl]

/t.kl/

to carry on head

[sn.tl]

/sn.tl/

feather

[l]

/l/

to carry on shoulder

[plh]

/pl.h/

smell of freshly cut fish

[iih]

/iih/

to scratch

Long nasal vowels


// [] is a close front unrounded long nasal vowel. It occurs only in closed major syllables.
[m]

/m/

cheek

[k.ml]

/k.ml/

above

[rk.nk]

/rk.nk/

orphan

// [] is an open-mid front unrounded long nasal vowel. It occurs only in closed major syllables.
[s.yt ]

/s.yt /

child

[r.mt ]

/r.mt /

yellow

[n ]

/n /

to see

/ / [ ] is a close central unrounded long nasal vowel. It occurs only in closed major syllables. Usually
this segment is articulated as a long vowel, but before the glottal final consonants h and ʔ, it is shortened
to the allophone [ ].
[sn.m y]

/sn.m y/

many (people)

[s.m c]

/s.m c/

stinging insect

[c.m s]

/c.m s/

hungry

[m h]

/m h/

name

[m.m h]

/m.m h/

to bathe

// [] is an open central unrounded long nasal vowel. It occurs only in closed major syllables.
[t.c]

/t.c/

wet

[nr]

/nr/

two

// [] is a close back rounded long nasal vowel. It occurs only in closed major syllables.
[l.m]

/l.m/

tooth

[i.ny]

/i.ny/

something made

[k.nr]

/k.nr/

type of rattan used as a spice

// [] is an open-mid back rounded long nasal vowel. It occurs only in closed major syllables.
[hn]

/hn/

to smell/sniff something

[lt ]

/lt /

to suck on something

[m. ct ]

/m.ct /

small

[.y ]

/.y /

lip

Short nasal vowels


// [] is a close front unrounded nasal vowel. It occurs in both open (minor) syllables and closed (minor
and major) syllables.
[.y ]

/.y /

lip

[pl.h]

/pl.h/

smell of fresh fish

[n]

/n/

three

// [] is an open-mid front unrounded nasal vowel. There were only three examples in the data, all in
closed major syllables.
[t.n]

/t.n/

elder sibling

[.m]

/.m/

mother

[t mt]

/t mt/

to wink an eye

// [] is an open-mid central unrounded nasal vowel. So far it has only been found in two examples,
both in minor syllables.
[n.r.m]

/n.r.m/

friend

[mn.mn]

/mn.mn/

playing

// [] is an open central unrounded nasal vowel. It occurs in both open and closed syllables, and in
both major and minor syllables.
[mt]

/mt/

eye

[mn.mn]

/mn.mn/

playing

[k.r.sh]

/k.r.sh/

rough to the touch

[m.m h]

/m.m h/

to bathe

// [] is a close back rounded nasal vowel. It occurs in both open and closed syllables, and in both
major and minor syllables.
[mc]

/mc/

to get up very early in the morning

[nk]

/nuk/

type of bird

[nm.puu]

/nm.puu/

when

// [] is an open-mid back rounded nasal vowel. It occurs only in closed major syllables.
[t.kl]

/t.kl/

to carry on head

[s.h]

/s.h/

fear

[r.mh]

/r.mh/

to sneeze


F.5 Distribution charts of Betau Semai phonemes


The distribution of the phonemes is notably different for each position in the Semai word, which once
again is:
(C3 V2 (C4))   C1 V1 C2
    Minor        Major

The consonants have the following distribution:


Consonant C1
Bilabial

Alveolar

Palatal

Velar

Glottal

Plosive, voiceless

Plosive, voiced

Nasal

Fricative, voiceless
Flap
Lateral
approximant
Central
approximant

r
l
w

Consonant C2
Bilabial

Alveolar

Palatal

Velar

Glottal

Plosive, voiceless

Nasal

Fricative, voiceless
Flap
Lateral
approximant
Central
approximant

s
r
l
w

Consonant C3
Bilabial

Alveolar

Palatal

Velar

Glottal

Plosive, voiceless

Plosive, voiced

Nasal

[]a

Fricative, voiceless
Flap
Lateral
approximant
Central
approximantb

r
l
(w)

(y)

a The segment // is evidently rare as it was not found in the data; however, it is expected
for symmetry reasons.
b Note: In the C3 position, the central approximants (/w/ and /y/) were found only as a result of
reduplication.

Consonant C4a
Bilabial

Alveolar

Palatal

Velar

Glottal

Plosive, voiceless

(p)

(t)

(c)

(k)

Nasal
Fricative,
voiceless
Flap
Lateral
approximant

(s)

(h)

r
l

a In the C4 position, all occurrences of the voiceless stops (/p/, /t/, /c/ and /k/) and
fricatives (/s/ and /h/) appeared to be due to infixation, reduplication, or compounding.

The vowels have the following distribution:
Vowel V1, orala
Front
(unrounded)
Close

Central
(unrounded)

ii

Close-mid

Back
(rounded)

uu

ee

Open-mid

oo

Open

a The segment [] is found in some borrowed words, and reflects the Malay pronunciation.

Vowel V1, nasal

Close
Mid

Front
(unrounded)

Central
(unrounded)

Back
(rounded)

Open

Vowel V2, orala

Close

Front
(unrounded)

Central
(unrounded)

Mid

Open

Back
(rounded)

a There was one example of [], but it was in a closed syllable of a borrowed word, reflecting the
Malay pronunciation. Segments [] and [] were found, but only as vowel harmony with the major
syllable across the glottal segments /h/ or /ʔ/.

Vowel V2, nasal

Close

Front
(unrounded)

Central
(unrounded)

Back
(rounded)

Open

F.6 Prosodic features


F.6.1 Vowel harmony

The epenthetic vowel in the minor syllable often undergoes vowel harmony with the major vowel when
the intervening consonant is glottal (/ʔ/ or /h/). This feature results in free variation of pronunciation
for such words, with a tendency toward the epenthetic central vowel when the word is articulated slowly
and carefully, but a harmonized vowel when articulated during normal or fast speech.
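
The harmony just described can be sketched as a function that derives the normal- or fast-speech variant from the careful form. This is a minimal illustration under assumptions that are not in the original: the epenthetic vowel is written "ə", "." marks the syllable boundary, and the harmonized vowel is a short copy of the first letter of the major vowel. The function name harmonized_variant is hypothetical.

```python
GLOTTALS = ("ʔ", "h")
EPENTHETIC = "ə"               # assumed symbol for the epenthetic minor vowel
VOWELS = set("aeiouɛɔɨə")      # assumed vowel symbols


def harmonized_variant(phonemic):
    """Copy the major vowel quality into the minor syllable across a glottal onset."""
    minor, boundary, major = phonemic.rpartition(".")
    if not boundary or not minor.endswith(EPENTHETIC) or major[:1] not in GLOTTALS:
        return phonemic
    if len(major) < 2 or major[1] not in VOWELS:
        return phonemic
    # Replace the epenthetic vowel with a short copy of the major vowel.
    return minor[:-1] + major[1] + "." + major


print(harmonized_variant("pə.hoot"))  # compare [po.hoot] 'to be hurting'
print(harmonized_variant("pə.loot"))  # hypothetical form: onset is not glottal
```

Because the two pronunciations are in free variation, the careful form is simply the unchanged input; the sketch returns only the harmonized alternative.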

[.] ~ [.]

/./

bone

[p.o] ~ [po.o]

/p.oo/

bamboo

[t.hr] ~ [t.hr]

/t.hr/

to incant

[p.hoot] ~ [po.hoot]

/p.hoot/

to be hurting

F.6.2 Preploded nasals

This dialect is interesting in that it is representative of a relatively small area of the Semai territory that
has preserved the preploded nasals. Most of the Semai dialects have reduced these phonemes to simple
voiceless plosives. It is clear that these phonemes used to be simple nasals in the past, both from
comparison to other Mon-Khmer languages and to words borrowed from Malay. For example,
[l.pdn]

/l.pn/

Malay: lapan (eight)

[ku. ci]

/ku.ci/

Malay: kucing (cat)

[ .ru m]

/.rum/

Malay: jarum (needle)

This feature of preplosion has sometimes been labeled pre-denasalization. This is an apt term,
since preploded nasals seem to be part of a prosodic tendency to prevent nasalization from spreading
leftward in the word. Indeed, those dialects that have reduced the nasal to a simple voiceless plosive
have removed nasalization from the oral major syllable altogether, making the whole syllable nonnasal.
In the Betau dialect preploded nasals are found only after oral vowels. Note the following examples,
which have nasal vowels (or else vowels nasalized by the preceding nasal consonant) and hence do not
have preplosion.
[hn]

/hn/

to smell

[m]

/m/

cheek

[n ]

/n /

to look

There is one example, however, that appears to be an exception to the rule; namely, a simple nasal is
found after an oral vowel.
[pr.hm]

/pr.hm/

to breathe

It is possible that the major vowel was slightly nasalized, or that the preceding h comes into play
somehow.

F.7 Residue
F.7.1 Glottal-final words

A great many of the long vowels in words ending with the glottal segments /-h/ or /-ʔ/ appear to have been
shortened and hence phonetically are now short vowels.47 This helps explain why the long vowels /oo/,
/ee/, and // have shortened vowel allophones in these positions. This conclusion would be more
satisfactory if all long vowels were shortened before these glottal-final segments; the language assistant,
however, did pronounce a few /-ʔ/-final words with other long vowels (although not always
consistently). For example,

47 This phenomenon was discussed by Diffloth (1977).

[m.n] ~ [m.n]

/m.n/

rain

[sr.] ~ [sr.]

/sr./

to think

[d.k] ~ [d.k]

/d.k/

boil (n)

[ .huu]

/.huu/

wood, tree

The conclusion is that either this phonological change is not complete or not consistent in this
dialect, or else perhaps this speaker has picked up pronunciations from speakers of other dialects, and
uses these pronunciations occasionally. It would be good to check these words with other speakers of
this dialect.
For the sake of this paper, vowels that were sometimes heard as phonetically long before glottal stop
have been marked as long vowels phonemically. Likewise, vowels never heard long before a final glottal
stop have been marked as short vowels phonemically, except of course for the long vowels /oo/, /ee/,
and //, which do not have phonemic short vowel counterparts.

F.7.2 Syllabic nasals

Words with syllabic nasals at the beginning seem to be preceded by a glottal stop, which would allow
the minor syllable in these words to be analyzed as /-/. Further research is needed to see if this
analysis is correct. A study of the morphology of the language, particularly reduplication and affixation,
should shed light on these words.

F.7.3 Final consonants

In general, all Semai words have final consonants. However, one word appears to defy this rule:
[p.nii]

/p.nii/

to know48

In other dialects of Semai this word is p.ny or p.ny . Apparently a phonological change in this
dialect has shifted the proto-vowel * to in this language, with the odd side effect of essentially
swallowing the final semi-vowel y in this word, or at least rendering the final y indiscernible from the
vowel. Not only would it be interesting to find more words of this type, but there may well be a parallel
situation with the back vowels, if the proto-vowel * is shifted to before w.

48 This word was most likely originally borrowed from the Malay word pandai 'intelligent'.


References
Baer, Adela. 1999. A Temuan-English-Malay lexicon. Adela S. Baer papers, Orang Asli archive. Keene, New
Hampshire. http://www.keene.edu/library/orangasli/lexicon.pdf, accessed 2005.
Basrim bin Ngah Aching, ed. 2008. Kamus Semai. Bangi, Malaysia: Institut Alam dan Tamadun Melayu
[Institute of the Malay World and Civilization].
Benjamin, Geoffrey. 1976a. An outline of Temiar grammar. In Philip N. Jenner, Laurence C. Thompson,
and Stanley Starosta (eds.), Austroasiatic studies, Part I, 129-187. Honolulu: University Press of
Hawaii.
Benjamin, Geoffrey. 1976b. Austroasiatic subgroupings and prehistory in the Malay peninsula. In Philip
N. Jenner, Laurence C. Thompson, and Stanley Starosta (eds.), Austroasiatic studies, Part I, 37-128.
Honolulu: University Press of Hawaii.
Bishop, Nancy. 1996. A preliminary description of Kensiw (Maniq) phonology. Mon-Khmer studies: A
Journal of Southeast Asian linguistics and languages 25:227-253.
Carey, Iskandar. 1976. Orang Asli: The aboriginal tribes of peninsular Malaysia. Oxford: Oxford University
Press.
Dentan, Robert Knox, Kirk Endicott, Alberto G. Gomes, and M. B. Hooker. 1997. Malaysia and the original
people: A case study of the impact of development on indigenous peoples. Boston: Allyn and Bacon.
Diffloth, Gérard. 1968. Proto-Semai phonology. Federation museums journal 13:65-74. Kuala Lumpur,
Malaysia: Muzium Negara.
Diffloth, Gérard. 1976a. Minor-syllable vocalism in Senoic languages. In Philip N. Jenner, Laurence C.
Thompson, and Stanley Starosta (eds.), Austroasiatic studies, Part I, 229-247. Honolulu: University
Press of Hawaii.
Diffloth, Gérard. 1976b. Expressives in Semai. In Philip N. Jenner, Laurence C. Thompson, and Stanley
Starosta (eds.), Austroasiatic studies, Part I, 249-264. Honolulu: University Press of Hawaii.
Diffloth, Gérard. 1977. Towards a history of Mon-Khmer: Proto-Semai vowels. Tonan Azia Kenkyu
[Southeast Asian studies] 14:463-495.
Diffloth, Gérard. 1980. To taboo everything at all times. Proceedings of the Berkeley Linguistic Society
6:157-165. Berkeley.
Keene State University. 2002. Population of Orang Asli Sub-Groups, 1999. Orang Asli archive.
http://www.keene.edu/library/OrangAsli/pop2000.doc, accessed 7 February 2005.
Kroeger, Paul R. 1989. Discontinuous reduplication in vernacular Malay. Proceedings of the Berkeley
Linguistic Society 15. Berkeley.
Lewis, M. Paul, ed. 2009. Ethnologue: Languages of the world. Sixteenth edition. Dallas: SIL International.
Matisoff, James. 2003. Aslian: Mon-Khmer of the Malay peninsula. Mon-Khmer Studies: A Journal of
Southeast Asian linguistics and languages 33:1-57.
Phillips, Timothy C. 2006a. A survey of nasal preplosion in Aslian languages. Paper presented at the
International Conference on Indigenous People, Kuala Lumpur, Malaysia. 5 July 2006.
Phillips, Timothy C. 2006b. Dialect comprehension survey of the Semai language. Unpublished
manuscript. Economic Planning Unit, Prime Minister's Department, Malaysia.
Phillips, Timothy C. 2007. A brief phonology of Betau Semai. In Shin Chong, Karim Harun, Yabit Alas,
and James T. Collins (eds.), Reflections in Southeast Asian seas. Essays in Honour of Professor James T.
Collins 2. Pontianak, Indonesia: STAIN Pontianak Press.
Phillips, Timothy C. 2011. Retention and Reduction in Reduplication of Semai. In Sophana Srichampa,
Paul Sidwell, and Kenneth Gregerson (eds.), Austroasiatic studies: Papers from ICAAL4, 178-179.
Mon-Khmer Studies Journal Special Issue 3. Dallas: SIL International; Bangkok: Mahidol University;
Canberra: Pacific Linguistics.
Phillips, Timothy C. Forthcoming. Proto-Aslian: Towards an understanding of its historical linguistic
systems, principles, and processes. Ph.D. dissertation. Universiti Kebangsaan Malaysia.
Posey, Darrell A. 2001. Biological and cultural diversity: The inextricable, linked by language and
politics. In Luisa Maffi (ed.), On biocultural diversity: Linking language, knowledge, and the environment,
379-396. Washington, DC: Smithsonian Institution Press.
Seidlitz, Eric. 2005. A study of Jakun Malay: Some aspects of its phonology. M.A. thesis. Universiti
Kebangsaan Malaysia.
Simons, Gary F. 1977. Tables of significance for lexicostatistics. In Richard Loving and Gary F. Simons
(eds.), Language variation and survey techniques. Work papers in Papua New Guinea Languages 21,
75-106. Ukarumpa, Papua New Guinea: Summer Institute of Linguistics.
Zaharani bin Ahmad. 1991. The phonology and morphology of the Perak dialect. Kuala Lumpur: Dewan
Bahasa dan Pustaka.