
2 PASS ASSEMBLER FOR 8085

CONTENTS

1 ABSTRACT
2 INTRODUCTION
3 DESIGN ASPECTS OF ASSEMBLER
    A HYPOTHETICAL AND SIMPLE ASSEMBLY LANGUAGE
    EXAMPLE
    THE TASKS OF AN ASSEMBLER
    LITERAL HANDLING
    ONE PASS ASSEMBLER
4 BRIEF DESCRIPTION OF 8085
    SOME OF 8085 INSTRUCTIONS
5 THE DETAILED VERSION OF THE ALGORITHM FOR AN ASSEMBLER FOR 8085
    MAIN FUNCTIONS USED IN TWO PASSES
6 RESULTS
7 CONCLUSION
8 REFERENCES


ABSTRACT:

An assembler is a program that translates programs from assembly language to machine
language. Before an assembly program can be executed on a computer platform, it
must be translated into the machine language of the target hardware. The assembler
takes as input a stream of assembly commands and generates as output a stream of
equivalent binary instructions. The resulting code can be loaded as-is into the
computer's memory and then executed by the hardware. An assembler is essentially a
text-processing program designed to provide translation services. It carries out the
following operations: parse the symbolic command into its underlying fields; for each
field, generate the corresponding bits in the machine language; replace all symbolic
references (if any) with numeric addresses of memory locations; and assemble the binary
codes into a complete machine instruction. The translation of symbols to numeric
addresses is done in two conceptual stages. First, the assembler creates a symbol table,
associating each symbol with a designated memory address. Next, the assembler uses the
symbol table to translate each occurrence of each symbol in the program to its allocated
address. Inside the 8085, instructions are stored as binary numbers, which are extremely
difficult to read and decipher. An assembler lets you write instructions in a more or
less English-like form that is much easier to read and understand; these are then
converted, or assembled, into hex numbers and finally into binary numbers.

2 pass Assembler

Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives

Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing


INTRODUCTION:

Assemblers perform one-to-one translation of symbolic source statements written in assembly
language to the corresponding machine language instructions. Assembly language is
intermediate between high-level language and machine language. It is the symbolic
representation of a computer's binary encoding, that is, of machine language. Assembly
language is more readable than machine language because it uses symbols instead of bits. The
symbols in assembly language name commonly occurring bit patterns, such as opcodes and
register specifiers, so that they can be read and remembered. In addition, assembly language
permits programmers to use labels to identify and name particular memory words that hold
instructions or data.
An assembler reads a single assembly language source file and produces an object file
containing machine instructions and bookkeeping information that helps combine several
object files into a program. Figure 1 illustrates how a program is built. Most programs consist
of several files (also called modules) that are written, compiled, and assembled independently.
A program may also use prewritten routines supplied in a program library. A module typically
contains references to subroutines and data defined in other modules and in libraries. The code
in a module cannot be executed while it contains unresolved references to labels in other object
files or libraries. Another tool, called a linker, combines a collection of object and library files
into an executable file, which a computer can run.

Fig.1: THE PROCESS THAT PRODUCES AN EXECUTABLE FILE


A statement contains an operation name, which is mainly one of the following three types:

• Operation code (op-code): an easy-to-understand code name for a primitive machine
  instruction.
• Assembler directive (pseudo-op): a symbolic directive that tells the assembler how to
  translate a program but does not itself produce machine instructions.
• Macro name: a symbolic name that represents a group of assembly statements. The purpose
  of a macro is to allow a programmer to insert a set of instructions by means of a single
  symbolic name.

The operand field contains the address of the operand whose content is manipulated by the
op-code. It could also be the target label of a branch. In general, it could be a literal, the
name of a machine register or a simple expression involving addresses. In the case of such an
expression, the assembler calculates the effective address.
An assembly language program contains absolute, relative and externally defined entities.
Absolute entities, like op-codes and fixed addresses, are independent of the storage locations
the machine code will eventually occupy. Relative entities, like symbolic references, are fixed
only with respect to each other and can be stated relative to the starting address of the
program. Externally defined entities are used but not defined within the module. The output of
an assembler is primarily a file containing object code. In addition to the object file, the
assembler also produces some other files to help the user in debugging the program. The object
file contains machine codes for the mnemonics and the addresses, along with an identification
of whether each is relative, absolute or external.

DESIGN ASPECTS OF ASSEMBLER


Implementation of an assembler can be viewed as having three logical stages. During the first
stage, the macro definitions are collected and proper textual substitution is made for macro
calls. This stage is known as macro processing. The assembler proper performs the remaining
two stages. It first scans through the entire text to collect all the symbols, associating them
with addresses whenever possible. This stage is known as pass one. In the second pass, the
assembler replaces the mnemonics by machine codes and the symbols by the corresponding
addresses, and generates the output.

A HYPOTHETICAL AND SIMPLE ASSEMBLY LANGUAGE


Let us consider a model of a hypothetical computer in order to appreciate the functions of an
assembler. For simplicity of explanation and ease of understanding, our model has a very
small set of assembly-level instructions. We now present the details of the instructions and
their functions. In the examples cited below, A and B are the names of two variables, L is a
label and “ACC” stands for the accumulator. We also include two pseudo-operations: DEFW and
CONST. DEFW is used to reserve one word of storage, and CONST defines a one-word constant
with a given value. The instruction set of this assembly language is shown below.

Mnemonic   Machine   No. of     Length of     Example    Meaning
code       code      operands   instruction

ADD        01        1          2             ADD A      ACC := ACC + A
SUB        02        1          2             SUB A      ACC := ACC - A
MULT       03        1          2             MULT A     ACC := ACC * A
JMP        04        1          2             JMP L      GOTO L
JNEG       05        1          2             JNEG L     if ACC < 0 GOTO L
JPOS       06        1          2             JPOS L     if ACC > 0 GOTO L
JZ         07        1          2             JZ L       if ACC = 0 GOTO L
LOAD       08        1          2             LOAD A     ACC := A
STORE      09        1          2             STORE A    A := ACC
READ       10        1          2             READ A     A := Input Data
WRITE      11        1          2             WRITE A    Output A
STOP       12        0          1             STOP       STOP Execution

STOP has no operand and therefore occupies a single word, as the worked example below
confirms (STOP sits at word 30 and the next free word is 31).

EXAMPLE:
We present a small program in order to highlight the structure of a typical assembly
language program. Consider the problem of reading N numbers and finding their sum. The
program is listed below.

Line no   Label   Mnemonic code   Operand

1                 READ            N
2                 LOAD            ZERO
3                 STORE           COUNT
4                 STORE           SUM
5         LOOP    READ            X
6                 LOAD            X
7                 ADD             SUM
8                 STORE           SUM
9                 LOAD            COUNT
10                ADD             ONE
11                STORE           COUNT
12                SUB             N
13                JZ              OUTER
14                JMP             LOOP
15        OUTER   WRITE           SUM
16                STOP
17                ENDP
18        ZERO    CONST           0
19        ONE     CONST           1
20        SUM     DEFW
21        COUNT   DEFW
22        N       DEFW
23        X       DEFW
24                END

THE TASKS OF AN ASSEMBLER:

1. Replace symbolic codes by machine instruction codes.
2. Replace symbolic references by numeric addresses.
3. Reserve storage to be occupied by instructions and data.
4. Translate the literals into their internal representations.


The translation of the program proceeds as follows. Each line is typically of the form:

    Label    opcode field    operand(s)

where one or more of the fields may be missing in an instruction. We assume that the
program will be stored from the 0th word of memory onwards.
In line 1 there is the mnemonic READ and the operand N. It is easy to substitute the
machine code for the mnemonic. We maintain a list of all mnemonics and their
corresponding machine codes in a table usually known as the Machine Op-code Table
(MOT). Whenever a mnemonic is encountered, this table is searched to find the
appropriate machine code. The MOT also contains other important information, such as
the number of operands and the length of the instruction, which helps in identifying the
number of operands expected to follow the mnemonic in an assembly language statement.
For example, the entry for READ in the MOT tells us that there should be exactly one
operand (N in this case). Therefore, the occurrence of more than one operand, or its
absence, has to be treated as an error. However, it is not yet possible to replace N by the
address to which it refers. The assembler has not yet encountered any instruction that
reserves the storage pointed to by N. Moreover, as the size of the assembly language code
is still unknown, the part of memory where data can be stored cannot be fixed in advance.
So the assembler notes that N could not be replaced and postpones this action until the
address of N is resolved. This is achieved by the use of a Symbol Table (ST). Each entry
in this table contains the name of a symbol, its address and other attributes. Whenever a
new symbol is encountered, its name and other known attributes are inserted in the table.
Hence, the assembler puts N in the symbol table. Note that the other attributes of N
still remain undefined.
Recall that we have assumed the program will be stored from the 0th word onwards. The
assembler has to reserve two words of memory for line 1, namely word 0 for the op-code
(READ) and word 1, which will eventually hold the address of the operand (N).
Consequently, the assembler now finds word 2 onwards free for its use. The starting
location of this free part of the memory changes as we proceed along the source code. The
assembler keeps track of it using a variable called the location counter (LC). After each
instruction, LC is incremented by L, the length of the machine code of that instruction.
This is precisely the reason why L is included along with the mnemonic in the MOT.
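
The MOT lookup and the location counter update described above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function name `advance` are assumptions, and STOP is given a length of one word, as implied by the worked example (STOP at word 30, next free word 31).

```python
# Hypothetical MOT for the model language:
# mnemonic -> (machine code, number of operands, length in words)
MOT = {
    "ADD": (1, 1, 2), "SUB": (2, 1, 2), "MULT": (3, 1, 2), "JMP": (4, 1, 2),
    "JNEG": (5, 1, 2), "JPOS": (6, 1, 2), "JZ": (7, 1, 2), "LOAD": (8, 1, 2),
    "STORE": (9, 1, 2), "READ": (10, 1, 2), "WRITE": (11, 1, 2),
    "STOP": (12, 0, 1),  # assumption: STOP occupies a single word
}

def advance(lc, mnemonic, operands):
    """Check the operand count against the MOT and return the updated LC."""
    if mnemonic not in MOT:
        raise ValueError(f"unknown mnemonic: {mnemonic}")
    _code, n_operands, length = MOT[mnemonic]
    if len(operands) != n_operands:
        raise ValueError(
            f"{mnemonic} expects {n_operands} operand(s), got {len(operands)}")
    return lc + length

# Lines 1 to 4 of the example program.
lc = 0
for mnemonic, operands in [("READ", ["N"]), ("LOAD", ["ZERO"]),
                           ("STORE", ["COUNT"]), ("STORE", ["SUM"])]:
    lc = advance(lc, mnemonic, operands)
print(lc)  # 8
```

After the first four lines the counter stands at 8, which is the word that line 5 of the example program starts at.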

Processing of lines 2, 3 and 4 is similar. When we come to line 5, LC has become 8 and
the symbols ZERO, COUNT and SUM have been inserted in the ST. The special feature of
line 5 is that it has a label field, LOOP. The label LOOP can be used elsewhere in the
program only to transfer control to line 5, i.e. to word 8 (the current value of the
location counter). To remember this information for subsequent use, the name LOOP and
its address attribute (with value 8) are inserted in the symbol table. Proceeding in this
manner, we can continue up to line 17, inserting new symbols in the symbol table as they
are encountered and updating the location counter. The symbol table now contains the
following information.

Symbol Name   Type   Address   Other

N             Id     ---
ZERO          Id     ---
COUNT         Id     ---
SUM           Id     ---
X             Id     ---
LOOP          Id     08
ONE           Id     ---
OUTER         Id     28

Note that the type field identifies whether a symbol represents a label or an
identifier. The significance of the “Other” field will be clarified later. The current value
of LC is clearly 31. In line 17, the assembler comes across the pseudo-op ENDP. So the
assembler next searches a similar table called the Pseudo Operation Table (POT), which
contains information regarding all pseudo-ops. Unlike mnemonics for machine codes, a
pseudo-op does not always alter the value of the LC. Hence LC remains 31 at the end of
line 17. Seeing the ENDP assembler directive, which signifies the physical end of the
program segment, the assembler can conclude that the remaining free memory words
may be used to store the data.
In line 18, the pseudo-op CONST indicates that the label ZERO is defined as a constant
having the value 0. The assembler then searches for the entry of the symbol ZERO in the
ST (creating one if it is not found), puts the LC value (31) as its address and keeps a tag
to remember that it is a constant, so that any attempt to overwrite ZERO can be reported
as an error. LC is incremented by 1. When we come to line 20, a new pseudo-op, DEFW, is
found. Processing of DEFW is similar to that of CONST except that the symbol is marked
as a variable identifier. Once we come to END in line 24, which indicates the end of the
source text, the symbol table will have been updated as shown below.

Symbol Name   Type       Address   Other

N             Var Id     35
ZERO          CONST Id   31
COUNT         Var Id     34
SUM           Var Id     33
X             Var Id     36
LOOP          LABEL      08
ONE           CONST Id   32
OUTER         LABEL      28
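
As an illustration of how pass one arrives at these addresses, here is a minimal Python sketch run over the example program. The tuple representation of the source lines, the dictionary-based symbol table and the name `pass_one` are assumptions made for this sketch; STOP is taken to be one word long, in line with the LC value of 31 quoted in the text.

```python
# Instruction lengths in words; STOP is assumed to occupy one word.
LENGTH = {m: 2 for m in ("ADD", "SUB", "MULT", "JMP", "JNEG", "JPOS",
                         "JZ", "LOAD", "STORE", "READ", "WRITE")}
LENGTH["STOP"] = 1

# Lines 1 to 24 of the example as (label, operation, operand).
PROGRAM = [
    (None, "READ", "N"), (None, "LOAD", "ZERO"), (None, "STORE", "COUNT"),
    (None, "STORE", "SUM"), ("LOOP", "READ", "X"), (None, "LOAD", "X"),
    (None, "ADD", "SUM"), (None, "STORE", "SUM"), (None, "LOAD", "COUNT"),
    (None, "ADD", "ONE"), (None, "STORE", "COUNT"), (None, "SUB", "N"),
    (None, "JZ", "OUTER"), (None, "JMP", "LOOP"), ("OUTER", "WRITE", "SUM"),
    (None, "STOP", None), (None, "ENDP", None),
    ("ZERO", "CONST", "0"), ("ONE", "CONST", "1"),
    ("SUM", "DEFW", None), ("COUNT", "DEFW", None),
    ("N", "DEFW", None), ("X", "DEFW", None),
    (None, "END", None),
]

def pass_one(program):
    """Build the symbol table and return it with the final location counter."""
    symtab = {}  # name -> {"type": ..., "address": ...}
    lc = 0
    for label, op, operand in program:
        if op in LENGTH:                      # machine instruction (MOT)
            if label:
                symtab[label] = {"type": "LABEL", "address": lc}
            if operand is not None:
                # Address still unknown: enter the symbol with no address.
                symtab.setdefault(operand, {"type": "Id", "address": None})
            lc += LENGTH[op]
        elif op in ("CONST", "DEFW"):         # data pseudo-ops (POT)
            kind = "CONST Id" if op == "CONST" else "Var Id"
            symtab[label] = {"type": kind, "address": lc}
            lc += 1
        elif op == "END":
            break
        # ENDP leaves LC unchanged.
    return symtab, lc

symtab, lc = pass_one(PROGRAM)
for name, entry in symtab.items():
    print(f"{name:6s} {entry['type']:9s} {entry['address']}")
```

The order in which symbols are printed follows the order of first appearance in this sketch and may differ from the order of rows in the table.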

Now, after scanning the entire source text, we are in a position to state the amount of
storage necessary for the instructions, the storage required by the data segment and the
address each symbol should refer to. Recall that while scanning line 1, the assembler
reserved one word for the address of N. But since this address was not known, the content
of word 1 had to remain undefined until the address of N became available. In fact,
replacing such symbolic references is deferred until the end of pass one, when the
unresolved attributes of all symbols are expected to be complete. The assembler is now
equipped with all the necessary information, so it can start generating the object code by
going through the source code a second time. For example, the object code of the program
in the example can easily be generated as follows:

Line   Source Text                Word   Output

1              READ    N          0      10
                                  1      35
2              LOAD    ZERO       2      08
                                  3      31
3              STORE   COUNT      4      09
                                  5      34
4              STORE   SUM        6      09
                                  7      33
5      LOOP    READ    X          8      10
                                  9      36
6              LOAD    X          10     08
                                  11     36
7              ADD     SUM        12     01
                                  13     33
8              STORE   SUM        14     09
                                  15     33
9              LOAD    COUNT      16     08
                                  17     34
10             ADD     ONE        18     01
                                  19     32
11             STORE   COUNT      20     09
                                  21     34
12             SUB     N          22     02
                                  23     35
13             JZ      OUTER      24     07
                                  25     28
14             JMP     LOOP       26     04
                                  27     08
15     OUTER   WRITE   SUM        28     11
                                  29     33
16             STOP               30     12
17             ENDP
18     ZERO    CONST   0          31     00
19     ONE     CONST   1          32     01
20     SUM     DEFW               33     XX
21     COUNT   DEFW               34     XX
22     N       DEFW               35     XX
23     X       DEFW               36     XX
24             END
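
The second pass over the example can be sketched as follows. The symbol table is copied from the completed table given earlier; the function name `pass_two` and the use of a dictionary as the memory image are assumptions of this sketch, not part of the original design.

```python
OPCODE = {"ADD": 1, "SUB": 2, "MULT": 3, "JMP": 4, "JNEG": 5, "JPOS": 6,
          "JZ": 7, "LOAD": 8, "STORE": 9, "READ": 10, "WRITE": 11, "STOP": 12}

# Symbol table as completed at the end of pass one.
SYMTAB = {"N": 35, "ZERO": 31, "COUNT": 34, "SUM": 33, "X": 36,
          "LOOP": 8, "ONE": 32, "OUTER": 28}

# Instruction lines 1 to 16 as (mnemonic, operand); labels live in SYMTAB.
CODE = [("READ", "N"), ("LOAD", "ZERO"), ("STORE", "COUNT"), ("STORE", "SUM"),
        ("READ", "X"), ("LOAD", "X"), ("ADD", "SUM"), ("STORE", "SUM"),
        ("LOAD", "COUNT"), ("ADD", "ONE"), ("STORE", "COUNT"), ("SUB", "N"),
        ("JZ", "OUTER"), ("JMP", "LOOP"), ("WRITE", "SUM"), ("STOP", None)]

# Values given by CONST (lines 18 and 19); DEFW words stay uninitialised.
CONSTANTS = {"ZERO": 0, "ONE": 1}

def pass_two(code, symtab, constants):
    """Return a memory image {word: value} for the instructions and constants."""
    memory = {}
    lc = 0
    for mnemonic, operand in code:
        memory[lc] = OPCODE[mnemonic]       # op-code word
        lc += 1
        if operand is not None:
            memory[lc] = symtab[operand]    # address word taken from the ST
            lc += 1
    for name, value in constants.items():
        memory[symtab[name]] = value
    return memory

memory = pass_two(CODE, SYMTAB, CONSTANTS)
for word in sorted(memory):
    print(f"{word:2d}  {memory[word]:02d}")
```

Words 33 to 36 are deliberately absent from the image, mirroring the XX entries in the listing.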

Note that the output has been written in decimal for ease of understanding. In practice, the
assembler generates the output in binary. The data generation part (lines 18 to 23) needs
some more explanation. Consider line 18. The first pass assigned an address to ZERO, and
the value of the constant (equal to 0) is stored at that address. While processing the
directive DEFW in the second pass, only storage is reserved, and it is not necessarily
initialized with any specific value. To indicate this, the contents of words 33 to 36 have
been marked XX. However, some assemblers might as well assign some value, possibly 0, to
those words. The program in the example and its translated version identify the major
tasks of an assembler and also explain how those tasks are performed. Let us now present
an assembler more formally, in algorithmic form.

LITERAL HANDLING
An operand whose value is literally stated is called a literal. For example, consider the
following two ADD commands: (i) ADD A (ii) ADD @21.
In (i), A refers to some memory location whose content is to be added.
In (ii), the intention is to add the value 21 (not the content of some explicitly stated
memory address as in (i)). So 21 is a literal. The purpose of the special symbol @ is to
inform the assembler that what follows it is to be treated as a literal. Literals are very
useful for writing programs.
In the first pass, the assembler puts the literals in a table known as the Literal Table.
The literal table consists of three fields: the literal, its corresponding address and its
value. The address field is filled in at the end of pass 1. The data is also generated and
the value field is initialized. In pass 2, the assembler can easily replace the literals by
their addresses.
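
A literal table of this kind can be sketched as a small Python class. The class and method names are hypothetical, literal values are assumed to be decimal, and the starting address 37 is used purely for illustration.

```python
class LiteralTable:
    """Literal table with the three fields named above: literal, address, value."""

    def __init__(self):
        self.entries = []

    def add(self, literal):
        """Pass 1: record a literal such as '@21' once; its address comes later."""
        for entry in self.entries:
            if entry["literal"] == literal:
                return entry
        entry = {"literal": literal, "address": None, "value": int(literal[1:])}
        self.entries.append(entry)
        return entry

    def assign_addresses(self, lc):
        """End of pass 1: place the literals in consecutive words starting at lc."""
        for offset, entry in enumerate(self.entries):
            entry["address"] = lc + offset
        return lc + len(self.entries)

    def address_of(self, literal):
        """Pass 2: look up the address that replaces the literal operand."""
        for entry in self.entries:
            if entry["literal"] == literal:
                return entry["address"]
        raise KeyError(literal)

table = LiteralTable()
for operand in ["A", "@21", "@5", "@21"]:
    if operand.startswith("@"):       # '@' marks a literal operand
        table.add(operand)
next_free = table.assign_addresses(37)
print(table.address_of("@21"), table.address_of("@5"), next_free)  # 37 38 39
```

Repeated use of the same literal shares a single table entry, so `@21` is stored only once.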


ONE PASS ASSEMBLER

In pass 1, the assembler may fail to resolve symbolic references mainly for two
reasons:
(i) The program may contain forward references in jump instructions, and
(ii) The definition of data becomes available only at the end of the program text.
If these facilities are restricted, then pass 2 of the assembler can be eliminated. It
is not a major restriction to force the user to define the data at the beginning. Forward
references in jump instructions, however, cannot be given up. Even so, pass 2 of the
assembler can practically be eliminated. To implement the assembler in one pass, each
undefined symbol, along with the address of the operand of the corresponding jump
statement, can be entered into a table known as the Branch Table. Note that there may be
many references to the same symbol, because several branches to the same symbol may
occur in a program. At the end of pass 1, this table and the ST are consulted to settle
the unresolved references.
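
The branch table idea can be illustrated with a short Python sketch in the hypothetical model language. Data symbols are assumed to be defined beforehand, as the text suggests, and are passed in with made-up addresses (20, 21, 22); the function name and the list-based memory image are likewise assumptions.

```python
OPCODE = {"LOAD": 8, "SUB": 2, "JZ": 7, "JMP": 4, "WRITE": 11, "STOP": 12}

def assemble_one_pass(program, data_symbols):
    """Assemble in one scan, back-patching forward jumps from a branch table."""
    memory = []                   # generated words; list index = address
    symtab = dict(data_symbols)   # data is assumed to be defined up front
    branch_table = []             # (symbol, address of the operand word)
    for label, mnemonic, operand in program:
        if label:
            symtab[label] = len(memory)
        memory.append(OPCODE[mnemonic])
        if operand is None:
            continue
        if operand in symtab:
            memory.append(symtab[operand])
        else:
            # Forward reference: remember where the address must go.
            branch_table.append((operand, len(memory)))
            memory.append(None)
    # End of the pass: settle the unresolved references.
    for symbol, where in branch_table:
        if symbol not in symtab:
            raise ValueError(f"undefined symbol: {symbol}")
        memory[where] = symtab[symbol]
    return memory, branch_table

program = [
    ("LOOP", "LOAD", "N"),
    (None, "JZ", "OUTER"),      # forward reference
    (None, "SUB", "ONE"),
    (None, "JMP", "LOOP"),      # backward reference, resolved at once
    ("OUTER", "WRITE", "SUM"),
    (None, "STOP", None),
]
memory, branch_table = assemble_one_pass(program, {"N": 20, "ONE": 21, "SUM": 22})
print(memory)        # [8, 20, 7, 8, 2, 21, 4, 0, 11, 22, 12]
print(branch_table)  # [('OUTER', 3)]
```

Only the jump to OUTER needs an entry in the branch table; the jump back to LOOP is resolved immediately because LOOP is already in the symbol table.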

BRIEF DESCRIPTION OF 8085

The 8085 is an 8-bit general purpose microprocessor capable of addressing 64K bytes
of memory. The microprocessor requires a 5 V power supply and can operate with a
3 MHz single-phase clock. The functional diagram of the 8085 microprocessor is given in
Fig. 2. The Arithmetic Logic Unit (ALU) includes an 8-bit accumulator, a temporary
register, arithmetic and logic circuits and five flags. The processor has six general
purpose registers, identified as B, C, D, E, H and L. They can be combined as the register
pairs BC, DE and HL in order to perform sixteen-bit operations. The accumulator is
identified as A.


Fig.2: FUNCTIONAL ORGANIZATION OF THE 8085
[Block diagram; recoverable labels: Control Bus, Interrupt control, Accumulator (8),
Temp. Reg. (8), Flag (5) Flip-Flops]

The instruction set of the 8085 may be classified into the following functional categories:
(i) data transfer, (ii) arithmetic operations, (iii) logical operations, (iv) branching
operations, (v) machine control operations and (vi) assembler directives.
A brief description of a subset of the instruction set is given in the table below. Each
instruction contains an op-code and may also have an operand. The op-code is 8 bits
wide. The operand may be an internal register, 8-bit or 16-bit data, a memory location,
or an 8-bit or 16-bit address.


SOME OF 8085 INSTRUCTIONS

Opcode      Operands   Bytes   M/C code    Explanation

LDA         Addr       3       3A          ACC := (Addr)
STA         Addr       3       32          Addr := (ACC)
MOV R1,R2   ---        1       01DDDSSS    R1 := (R2)
LHLD        Addr       3       2A          HL := (Addr)
SHLD        Addr       3       22          Addr := (HL)
MOV R,M     ---        1       01DDD110    R := ((HL))
MOV M,R     ---        1       01110SSS    (HL) := (R)
ADD R       ---        1       10000SSS    ACC := (ACC) + (R)
ADD M       ---        1       86          ACC := (ACC) + ((HL))
SUB R       ---        1       10010SSS    ACC := (ACC) - (R)
SUB M       ---        1       96          ACC := (ACC) - ((HL))
INR R       ---        1       00DDD100    R := R + 1
DCR R       ---        1       00DDD101    R := R - 1
JZ          Label      3       CA          If (result = 0) then goto Label
JNZ         Label      3       C2          If (result != 0) then goto Label
JC          Label      3       DA          If (carry bit is 1) then goto Label
JNC         Label      3       D2          If (carry bit is 0) then goto Label
JMP         Label      3       C3          goto Label
HLT         ---        1       76          Stop

1   Addr         A valid 16 bit address
2   ACC          Accumulator
3   R, R1, R2    Register
4   (X)          Content of X
5   DDD          Destination
6   SSS          Source
7   Label        Target of a branch


Source/Destination SSS/ DDD

A 111
B 000
C 001
D 010
E 011
H 100
L 101
M 110
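
The bit patterns in the instruction table can be combined with these register codes mechanically. The short Python sketch below shows one way to do it; the helper names (`mov`, `add`, `sub`, `inr`, `dcr`) are chosen here for illustration only.

```python
# 3-bit register codes from the SSS/DDD table.
REG = {"B": 0b000, "C": 0b001, "D": 0b010, "E": 0b011,
       "H": 0b100, "L": 0b101, "M": 0b110, "A": 0b111}

def mov(dst, src):
    """MOV dst,src -> 01DDDSSS."""
    if dst == "M" and src == "M":
        # The pattern 01110110 is HLT, not a memory-to-memory move.
        raise ValueError("MOV M,M is not a valid instruction")
    return 0b01000000 | (REG[dst] << 3) | REG[src]

def add(src):
    """ADD src -> 10000SSS."""
    return 0b10000000 | REG[src]

def sub(src):
    """SUB src -> 10010SSS."""
    return 0b10010000 | REG[src]

def inr(reg):
    """INR reg -> register code in bits 5..3, low bits 100."""
    return 0b00000100 | (REG[reg] << 3)

def dcr(reg):
    """DCR reg -> register code in bits 5..3, low bits 101."""
    return 0b00000101 | (REG[reg] << 3)

for text, code in [("MOV A,M", mov("A", "M")), ("MOV M,A", mov("M", "A")),
                   ("MOV M,C", mov("M", "C")), ("ADD M", add("M")),
                   ("SUB M", sub("M")), ("INR C", inr("C"))]:
    print(f"{text:8s} {code:02X}")
```

The values for MOV A,M, MOV M,A, MOV M,C, ADD M and INR C agree with the object codes 7E, 77, 71, 86 and 0C shown in the RESULTS section.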

Some of the 8085 assembler directives are listed below.

Assembler directive    Example     Description

ORG                    ORG 0200    The next block of instructions should be stored in
                                   memory locations starting at 0200.
END                    END         End of assembly.
DB (Define Byte)       Y: DB 05    Reserves a byte symbolically referred to as Y and
                                   initializes it with 05.
DS (Define Storage)    L: DS 06    Reserves six bytes of memory locations for L.

The following data structures and databases are to be maintained to design the assembler for
8085.
1) A file containing the input source program.
2) The Machine Operation Table (MOT). An entry in MOT contains a
mnemonic, its machine code and the length of the instruction.
3) The Pseudo Operation Table (POT) which contains the list of all the
assembler directives.
4) The Symbol Table (ST).
5) The Literal Table (LT).
6) The intermediate file and the output file (.obj).


THE DETAILED VERSION OF THE ALGORITHM FOR AN ASSEMBLER FOR 8085

Pass 1:
Step 1 : LC := 0;
Step 2 : Read a line from the input file.
Step 3 : Analyze the statement.
         We have seen that a statement may contain three fields: label, op-code and
         operand. These parts are identified during the analysis of the statement. Let L,
         I, X denote the label, op-code and operand (if any) of the statement.
    Step 3.1 : If the statement contains a label L then
               begin
                   If (L is not found in ST) then insert L in ST;
                   The address field of L is set to LC;
               end
    Step 3.2 :
        Case 1 : I is found in MOT
                 LC := LC + l, where l is the length of I;
        Case 2 : I is found in POT
            Case 2.1 : I = ORG
                       LC := X;
            Case 2.2 : I = END
                       goto Step 5;
            Case 2.3 : I = DB
                       LC := LC + 1;
            Case 2.4 : I = DS
                       LC := LC + X;
    Step 3.3 : If (X is a literal) then
               begin
                   LC := LC + 1;
                   Insert X in LT if it is new
               end;
Step 4 : Goto Step 2
Step 5 : Set the address of the i-th literal in LT (i = 0, 1, ...) to LC + i.
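
A minimal Python sketch of Pass 1 is given below, run on the short program from the RESULTS section. The length table covers only the mnemonics that appear in this report, the statement splitter is a simplification, directive operands are assumed to be hexadecimal, and literal handling (Step 3.3 and Step 5) is left out for brevity.

```python
# Instruction lengths in bytes for the mnemonics used in this report.
LENGTH = {"LXI": 3, "MVI": 2, "MOV": 1, "INX": 1, "ADD": 1, "SUB": 1,
          "INR": 1, "DCR": 1, "LDA": 3, "STA": 3, "LHLD": 3, "SHLD": 3,
          "JZ": 3, "JNZ": 3, "JC": 3, "JNC": 3, "JMP": 3, "HLT": 1}

def split_statement(line):
    """Split 'LABEL: OP operands' into (label, op, operand text)."""
    label = None
    if ":" in line:
        label, line = line.split(":", 1)
        label = label.strip()
    parts = line.split(None, 1)
    op = parts[0].upper() if parts else None
    operand = parts[1].strip() if len(parts) > 1 else None
    return label, op, operand

def pass_one(lines):
    """Return (symbol table, final LC) following Steps 1 to 4 above."""
    lc = 0
    symtab = {}
    for line in lines:
        label, op, operand = split_statement(line)
        if label:
            symtab[label] = lc                # Step 3.1
        if op is None:
            continue
        if op in LENGTH:                      # Case 1: found in MOT
            lc += LENGTH[op]
        elif op == "ORG":                     # Case 2.1
            lc = int(operand, 16)
        elif op == "END":                     # Case 2.2
            break
        elif op == "DB":                      # Case 2.3
            lc += 1
        elif op == "DS":                      # Case 2.4
            lc += int(operand, 16)
        else:
            raise ValueError(f"unknown operation: {op}")
    return symtab, lc

SOURCE = ["LXI H, 0000", "MVI C, 00", "MOV A, M", "INX H", "ADD M",
          "JNC LAB1", "INR C", "LAB1: INX H", "MOV M, A", "INX H",
          "MOV M, C", "HLT"]
symtab, lc = pass_one(SOURCE)
print({name: f"{addr:04X}" for name, addr in symtab.items()})  # {'LAB1': '000C'}
```

The address found for LAB1 matches the symbol table reported in the RESULTS section.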


Pass 2:
Step 1 : LC := 0;
Step 2 : Read a line from the input file.
Step 3 : Analyze the statement.
    Step 3.1 :
        Case 1 : I is found in MOT
                 Write a line in the output file containing the current value
                 of LC, the machine code of I and the character ‘a’ (the last
                 character signifies that the content is ‘absolute’, which will
                 be needed by the loader).
        Case 2 : I is found in POT
            Case 2.1 : I = ORG
                       no action
            Case 2.2 : I = END
                       goto Step 5;
            Case 2.3 : I = DB
                       Write a line in the output file containing the current
                       value of LC, X and the character ‘a’.
    Step 3.2 :
        Case 1 : X is a literal
                 Write a line in the output file containing the current value
                 of LC, the address of X (as found from LT) and the character
                 ‘r’ (for “relative”).
        Case 2 : X is a symbol
                 Write a line in the output file containing the current value
                 of LC, the address of X (as found from ST) and the
                 letter ‘r’.
Step 4 : Goto Step 2
Step 5 : For i = 0 to total number of literals - 1 do
             Write a line containing LC + i, the i-th literal in LT and the letter ‘a’.
Step 6 : Stop.
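
Pass 2 can be sketched in Python along the same lines. The encoder below handles only the instructions of the program in the RESULTS section, using the standard 8085 encodings for LXI, MVI and INX in addition to those tabulated earlier. The record layout (address, hex bytes, flag) is an assumption for this sketch; it also advances LC after every record, a step the algorithm above leaves implicit.

```python
REG = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "M": 6, "A": 7}
PAIR = {"B": 0, "D": 1, "H": 2, "SP": 3}          # register-pair codes
JUMPS = {"JMP": 0xC3, "JZ": 0xCA, "JNZ": 0xC2, "JC": 0xDA, "JNC": 0xD2}

def encode(op, args, symtab):
    """Return (absolute bytes, 16-bit symbol address or None)."""
    if op == "MOV":
        return [0x40 | (REG[args[0]] << 3) | REG[args[1]]], None
    if op == "MVI":
        return [0x06 | (REG[args[0]] << 3), int(args[1], 16)], None
    if op == "LXI":
        value = int(args[1], 16)
        return [0x01 | (PAIR[args[0]] << 4), value & 0xFF, value >> 8], None
    if op == "INX":
        return [0x03 | (PAIR[args[0]] << 4)], None
    if op == "INR":
        return [0x04 | (REG[args[0]] << 3)], None
    if op == "ADD":
        return [0x80 | REG[args[0]]], None
    if op == "HLT":
        return [0x76], None
    if op in JUMPS:
        return [JUMPS[op]], symtab[args[0]]
    raise ValueError(f"unsupported operation: {op}")

def pass_two(lines, symtab):
    """Emit (address, hex bytes, 'a' or 'r') records for each statement."""
    records = []
    lc = 0
    for line in lines:
        statement = line.split(":", 1)[-1]         # drop the label, if any
        parts = statement.replace(",", " ").split()
        op, args = parts[0].upper(), parts[1:]
        absolute, target = encode(op, args, symtab)
        records.append((lc, "".join(f"{b:02X}" for b in absolute), "a"))
        lc += len(absolute)
        if target is not None:
            # Symbol address, low byte first, flagged as relative.
            records.append((lc, f"{target & 0xFF:02X}{target >> 8:02X}", "r"))
            lc += 2
    return records

SOURCE = ["LXI H, 0000", "MVI C, 00", "MOV A, M", "INX H", "ADD M",
          "JNC LAB1", "INR C", "LAB1: INX H", "MOV M, A", "INX H",
          "MOV M, C", "HLT"]
records = pass_two(SOURCE, {"LAB1": 0x000C})
for address, code, flag in records:
    print(f"{address:04X} {code} {flag}")
```

Concatenating the byte strings of the records gives the same object code as the OUTPUT FILE listing in the RESULTS section.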


MAIN FUNCTIONS USED IN TWO PASSES


We developed the following functions for the 8085 assembler.

PASS 1:
• READ1   Read the assembly source file.
• STSTO   Store a label and its value into the ST (if the symbol is not present in the ST).
• POTGET  Search the POT for a match with the operation field.
• MOTGET  Search the MOT for a match with the operation field.
• LTSTO   Store a literal into the LT.
• STGET   Search the ST for the entry corresponding to a specific symbol.

PASS 2:
• READ2   Read the assembly source file from the file copy.
• LTGEN   Generate code for literals.
• DCGEN   Process the fields of the DC (data constant) pseudo-op to generate object code.


RESULTS

1) INPUT FILE

LXI H, 0000
MVI C, 00
MOV A, M
INX H
ADD M
JNC LAB1
INR C
LAB1: INX H
MOV M, A
INX H
MOV M, C
HLT

2) SYMBOL TABLE

Index - symbol name - value


-----------------------------------------------------------------------
s1 - LAB1 - 000C

3) INTERMEDIATE FILE

m80-r6-C-A16-0000
m23-r3-C-A8-00
m22-r1-C-r9
m20-r6
m3-r9
m70-l-s1
m19-r3
l-s1-cn-m20-r6
m22-r9-C-r1
m20-r6
m22-r9-C-r3
m17


4) OUTPUT FILE

PC      Opcode             OBJCODE
________________________________________
0000    LXI H , 0000       210000
0003    MVI C , 00         0E00
0005    MOV A , M          7E
0006    INX H              23
0007    ADD M              86
0008    JNC LAB1           D20C00
000B    INR C              0C
000C    LAB1 : INX H       23
000D    MOV M , A          77
000E    INX H              23
000F    MOV M , C          71
0010    HLT                76


CONCLUSION

This project generates object code for the 8085 microprocessor. It takes an assembly
language program as input and generates object code as output, in two passes. During
pass 1 it generates the symbol table and an intermediate code. In pass 2 it takes the
intermediate file as input, updates the symbol table and generates the object code.


REFERENCES
[1] Donovan J. J., “System Programming”, McGraw-Hill, New York, 1972.
[2] Barron D. W., “Assemblers and Loaders, 2/e”, Elsevier, New York, 1972.
[3] Beck L. L., “System Software: An Introduction to Systems Programming”,
    Addison-Wesley, 1985.
[4] Ullman J. D., “Fundamental Concepts of Programming Systems”,
    Addison-Wesley, 1976.
[5] Nisan & Schocken, “The Digital Core”, 2003, www.idc.ac.il/csd.
[6] Early G., “Functional programming and the two-pass assembler”,
    Southwest Texas State University, San Marcos, Texas.
[7] Beck L. L., “System Software”.
[8] Salomon D., “Assemblers and Loaders”.
[9] Wegner P., “Programming Languages, Information Structure and Machine
    Organization”, McGraw-Hill, NY, 1968.
