
E-MAIL VIRUSES DETECTION: DETECT E-MAIL VIRUS BY NETWORK TRAFFIC
A Thesis in TCC402
Presented To
The Faculty of
School of Engineering and Applied Science
University of Virginia
In Partial Fulfillment
of the Requirement for the Degree
Bachelor of Science in Computer Science
By
Lap Fan Lam
March 24, 2002
On my honor as a University student, on this assignment I have neither given nor received unauthorized aid as defined by the Honor Guidelines for Papers in TCC Courses.

________________________________________
(Full Signature)
Approved: ________________________________________ (Technical Advisor)
(Type Full Name) (Signature)
Approved: ________________________________________ (TCC Advisor)
(Type Full Name) (Signature)

Technical Report Outline
GLOSSARY
ABSTRACT
1 INTRODUCTION
2 VIRUS DETECTION
3 ELECTRONIC MAIL VIRUS DETECTION METHODOLOGY
4 SIMULATION RESULTS
5 SIMULATION RESULTS ANALYSIS
6 CONCLUSION
REFERENCES
APPENDIX A
APPENDIX B

Table of Figures
FIGURE 1. CONTROL SIMULATION RESULTS
FIGURE 2. SINGLE VIRUS SIMULATION RESULTS
FIGURE 3. MULTIPLE VIRUS SIMULATION
FIGURE 4. HIGH E-MAIL MESSAGE VOLUME CAN POTENTIALLY TRIGGER A FALSE VIRUS ALERT
Glossary
E-mail: Electronic mail.
True negative: No virus is present in the system, and the anti-virus program signals that no virus is present. This is a correct signal from the anti-virus program.
False negative: A virus is present in the system, but the anti-virus program signals that no virus is present. This is an incorrect signal from the anti-virus program.
True positive: A virus is present in the system, and the anti-virus program signals that a virus is present. This is a correct signal from the anti-virus program.
False positive: No virus is present in the system, but the anti-virus program signals that a virus is present. This is an incorrect signal from the anti-virus program.
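The four glossary terms follow from just two facts: whether a virus is really present and whether the program raised an alert. The small function below (its name, classify_signal, is hypothetical) spells out that mapping; it is an illustration of the definitions, not code from this project.

```python
def classify_signal(virus_present: bool, alert_raised: bool) -> str:
    """Return the glossary term for one (ground truth, anti-virus signal) pair."""
    if virus_present:
        return "true positive" if alert_raised else "false negative"
    return "false positive" if alert_raised else "true negative"


# Usage: label three hypothetical runs.
outcomes = [classify_signal(v, a) for v, a in [(True, True), (False, True), (True, False)]]
print(outcomes)  # ['true positive', 'false positive', 'false negative']
```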
Abstract
Electronic mail viruses cause substantial damage, and the cost of traditional anti-virus methods is very high.
This report presents a new anti-virus method, which runs an anti-virus program on the mail server and detects e-mail viruses by monitoring network traffic. The program is called the e-mail traffic monitor. The e-mail traffic monitor can potentially reduce anti-virus cost, since it only needs to be installed on the mail server. It can also detect new viruses based on their behavior.
A simulation model and an e-mail traffic monitor prototype have been developed in this project to test whether this method is feasible. This report assesses its feasibility based on the simulation results.
1 Introduction
This report suggests detecting and stopping the spread of e-mail viruses at mail servers. A simulated network model and an e-mail traffic monitor prototype were developed to investigate whether it is possible to detect electronic mail viruses by monitoring the electronic mail passing through mail servers.
Importance of Detecting E-mail Viruses
The daily activities of both business and home users rely heavily on the Internet, especially on e-mail services. Disruptions in normal Internet operation can cause huge monetary damage to business and home users, in addition to inconvenience. In some extreme cases, disruption of Internet operations can put national security at risk. For example, the Department of Health Services experienced disruptions in e-mail services ranging from a few hours to a few days after the "Love Bug" infestation. If a biological outbreak had occurred simultaneously with the "Love Bug" infestation, the health and stability of the Nation would have been compromised by the lack of computer network communication [6].
In order to keep the Internet functioning normally, it is important to make sure that the Internet is free from harmful disruptions. Since e-mail viruses can easily disable a large number of computers within a short period of time, they have the ability to disrupt Internet activities. In addition, unlike a denial-of-service attack, which targets a specific network, an e-mail virus usually targets all Internet users.
Although anti-virus companies and organizations have developed many methods to detect electronic mail viruses, only four major methods are widely used: scanners, heuristic analysis, behavior blockers, and integrity checkers. Details of these four anti-virus methods are in Appendix A of this report. Appendix B gives background information on viruses.
Because anti-virus programs usually cannot detect new viruses without a software update, anti-virus companies and Internet users have to spend huge amounts of money to update their anti-virus programs every year. The time and money spent on anti-virus protection are a huge burden for all Internet users.
Even though software updates are expensive, it is essential that Internet users keep their anti-virus software up to date, because the cost of failing to detect and stop e-mail viruses can be very high. For example, "I love you," also called the "Love Bug," which is a hybrid between an e-mail virus and a worm, caused five to ten billion dollars of business damage worldwide [1]. The multiplication of these e-mail viruses creates huge amounts of network traffic, which increases the workload on mail servers. E-mail viruses also drag down networks and mail servers, similar to a denial-of-service attack [4]. As a result, many Internet users found that many of their favorite web sites were down, including some e-mail service pages.
The deadliest characteristic of modern e-mail viruses is that it is generally not hard to create a new one. For instance, the original suspect behind the "I love you" virus was a college dropout who did not even complete his computer science degree.
Luckily, studies have shown that if immunization is applied to selected computer nodes in the network, the number of computers infected and the infection rate can be effectively reduced [2]. This means that if anti-virus programs can detect and stop e-mail viruses in their early phase, we will be able to dramatically reduce the cost of e-mail virus damage.
Problems with Traditional Anti-virus Methods
There are four major methods to detect computer viruses: scanners, heuristic analysis, behavior blockers, and integrity checkers.
All of these anti-virus methods share the same major problems: incomplete protection and high cost. Anti-virus software has to be installed and run on every computer to give complete coverage, but even then it cannot guarantee that these computers are virus free. Loss of data due to incomplete e-mail virus protection can be disastrous. What would happen if Sprint lost its clients' monthly bills?
Running anti-virus software also costs computational power. In addition, installing anti-virus software on every computer costs software license fees. For a company of a hundred people, the cost of a hundred software licenses is a heavy extra financial burden.
Rationale/Scope
It might be possible to solve the problems above by detecting and stopping e-mail viruses at the mail server at an early stage of their spread, without software updates. Damage from e-mail viruses would then be greatly reduced. In addition, the cost of developing and maintaining anti-virus programs would be minimized.
Possible Solution for Problems
This report suggests building an e-mail traffic monitor that runs on a mail server. The monitor generates virus alerts based on the e-mail traffic passing through the mail server. Since a mail server is a single point of entrance and exit to every other destination, the monitor should be able to protect the network computers it serves by stopping e-mail viruses at the mail server.
Overview of the Contents of the Rest of the Report
Chapter two discusses related previous work on computer viruses. Chapter three explains the electronic mail virus detection methodology. Chapter four presents the simulation results, and chapter five discusses them. Finally, chapter six concludes the report.
2 Virus Detection
Refer to Appendix A for a description of traditional virus detection. Anti-virus organizations and companies have developed many innovative ideas to detect viruses. The following are two of those new methods.
"Data Mining Methods for Detection of New Malicious Executables" shows ways of using artificial intelligence to detect viruses. The authors created three learning algorithms in this project. Each learning algorithm is capable of extracting features from malicious executables and generating rule sets for detecting the corresponding viruses [12]. They then use the rule sets that the learning algorithms generated to detect viruses. This data mining approach proved fairly successful in detecting known viruses: it detects 97.76% of the known viruses, but none of the three algorithms is reliable in detecting new viruses. The false alarm rate of this data mining detection is almost the same as that of the four traditional anti-virus methods mentioned in chapter one.
In the second example, Balzer developed an e-mail wrapper to detect viruses in e-mail attachments [13]. His focus was on e-mail attachments because most viruses that propagate by electronic mail are sent as attachments. The wrapper provides run-time monitoring and authorization to ensure that the content executes safely, so that any harmful behavior is blocked. Monitoring and authorization are accomplished by mediating the interfaces that processes use to access and modify resources. In this way, the wrapper can detect violations of process-specific rules. When the rules are violated, the wrapper informs the user, who determines whether to allow or prohibit the offending operations. This approach proved very successful: it has stopped the small number of viruses received since it was deployed in September 2000 (including I Love You and the Anna Kournikova viruses) [13]. This approach is very similar to the way a behavior blocker works; the difference is that wrappers only monitor e-mail attachments, while behavior blockers monitor all computer programs.
The next chapter presents a virus detection method that monitors e-mail traffic.
3 Electronic Mail Virus Detection Methodology
The statistical data on e-mail viruses from MessageLabs, which captures daily and monthly virus activity, forms the foundation of this paper.
Detection Methodology
According to the virus activity statistics from MessageLabs, most known successful viruses spread exponentially during the first few days of their existence [15]. Human daily activities directly affect the activity of e-mail viruses. E-mail virus activity grows dramatically during the morning as people go to work and use e-mail. It then peaks around noon and starts to drop as people leave the office, reaching its minimum at midnight. Almost all e-mail viruses follow this activity pattern.
E-mail virus activity also has a life cycle that helps us identify it. First, an e-mail virus infects a host; then, the infected host sends the virus to infect other hosts; this cycle continues until there is an anti-virus solution or some other method to stop it. By identifying this life cycle, an anti-virus program may be able to detect a virus by building a tree structure that connects infected computers in chronological order. In this tree structure, the e-mails that contain the virus become the edges between tree nodes. By correctly defining the minimum size of an e-mail virus tree, an anti-virus program should be able to detect the presence of an e-mail virus.
However, an e-mail virus does not infect every host that receives it. For instance, if an e-mail virus is sent to an operating platform on which it cannot run, the host on that platform stays virus free. This situation may leave insufficient data to draw a tree. Fortunately, a large virus activity data set can solve this problem: since e-mail virus activity grows exponentially during its early stage, early e-mail virus activity can supply such a data set.
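The life-cycle idea can be made concrete with a small sketch. It assumes the messages passed in all carry the same attachment and are already in chronological order, and it assumes one possible attachment rule: a host's parent is whoever sent it the attachment first, and a sender that never received the attachment becomes a root. The names build_infection_tree and depth are hypothetical; this is not the project's code.

```python
from collections import defaultdict


def build_infection_tree(messages):
    """Build an infection forest from (time, sender, recipient) tuples that
    share one attachment and are sorted by time.

    A host joins the tree the first time it receives the attachment, as a child
    of that sender. A sender not yet in the tree is recorded as a root.
    """
    children = defaultdict(list)
    roots = []
    placed = set()
    for _time, sender, recipient in messages:
        if sender not in placed:          # never received it: possible origin
            placed.add(sender)
            roots.append(sender)
        if recipient not in placed:       # first delivery defines the edge
            placed.add(recipient)
            children[sender].append(recipient)
    return dict(children), roots


def depth(children, node):
    """Number of levels below `node` (0 for a leaf)."""
    kids = children.get(node, [])
    return 1 + max(depth(children, k) for k in kids) if kids else 0


# A -> B -> C -> E and A -> D; the late E -> A message adds no edge (A is already placed).
msgs = [(1, "A", "B"), (2, "B", "C"), (2, "A", "D"), (3, "C", "E"), (4, "E", "A")]
tree, roots = build_infection_tree(msgs)
print(roots, tree, depth(tree, "A"))
```

Skipping recipients that are already in the tree keeps the structure a tree even when infected hosts mail each other back and forth.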
Assumptions
Since the simulation abstracts the real system into a simpler model, it runs under several assumptions:
Every user within the simulated network is registered with only one e-mail service provider.
The e-mail service provider can access all the e-mails circulating between its clients within the network.
The number of users in the network is limited and stays constant.
Each user's mailbox, which resides on the server, has a maximum capacity.
Implementation
The simulation model has two parts: a simulated network based on Raptor, and an e-mail traffic monitor.
Raptor is a program that simulates a network environment [14]. This project uses Raptor as the basis for the network model. The e-mail traffic monitor intercepts messages passed between nodes within the network and generates appropriate virus alerts based on the intercepted messages.
The following describes the detailed implementation of the simulated network and the e-mail traffic monitor.
Simulated Network
The network is simulated using Raptor [14]. The simulated network has two layers: the lower layer is Raptor, and the upper layer is a network model.
Raptor
Raptor uses threads to represent nodes in a network: every thread represents a single node within the simulated network. Raptor can pass messages between different threads. Raptor also synchronizes every thread (node) within the simulated network, so that every thread has to wait for all the threads to finish the current task before it can execute the next task.
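The lockstep behavior described above (no node starts the next task until every node has finished the current one) can be illustrated with a barrier. This is a hypothetical sketch using Python's standard threading.Barrier; it does not reproduce Raptor's own interface, and run_lockstep is an invented name.

```python
import threading


def run_lockstep(num_nodes, num_steps):
    """Run num_nodes threads for num_steps rounds; no thread starts round k+1
    until every thread has finished round k."""
    barrier = threading.Barrier(num_nodes)
    lock = threading.Lock()
    log = []

    def node(node_id):
        for step in range(num_steps):
            with lock:
                log.append((step, node_id))  # stand-in for this node's task in the round
            barrier.wait()                   # block until all nodes reach this point

    threads = [threading.Thread(target=node, args=(i,)) for i in range(num_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run_lockstep(num_nodes=5, num_steps=3)
print([step for step, _ in log])  # step numbers never decrease
```

The order of nodes within a round is arbitrary, but every entry for round k is logged before any entry for round k+1, which is the synchronization property the text describes.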
Network Model
In all simulations, the network model creates a single thread to serve as the server for the other threads (client threads). The server thread receives messages from client threads and, according to each message's destination, directs the message to the destination thread. The server thread therefore acts as the medium of message exchange and can access every message it has received, which means it has access to all the messages in the network.
Each client thread in the simulated model has an object called machine, which stores information about the client thread. For example, the machine stores the name of the client thread and the address book of its parent client thread. The information stored in a machine object directly determines the behavior of its parent client thread: the parent client thread will not send virus e-mails if the information stored in its machine object specifies that it is virus free. The stored information changes over time. For example, an e-mail virus that infects a client thread changes the information stored in the machine, so that the client thread behaves differently.
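A minimal, single-threaded sketch of the server and machine roles is shown below. Only the idea of a per-client machine object and a single relaying server comes from the text; the classes Message, Machine and Server, their fields, and the rule that an infected machine mails its whole address book are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: int
    recipient: int
    attachment_size: int = 0          # 0 = no attachment (assumption)


@dataclass
class Machine:
    """Per-client state; the client's behavior is read from here each step."""
    name: int
    address_book: list
    infected: bool = False
    virus_size: int = 0               # attachment size of the virus copy it sends

    def infect(self, virus_size):
        # Infection changes the stored information, which changes behavior.
        self.infected = True
        self.virus_size = virus_size

    def outgoing(self):
        # A clean machine sends no virus e-mail; an infected one mails its address book.
        if not self.infected:
            return []
        return [Message(self.name, peer, self.virus_size) for peer in self.address_book]


@dataclass
class Server:
    """Single relay node: every message passes through route(), so it sees all traffic."""
    inboxes: dict = field(default_factory=dict)
    relayed: list = field(default_factory=list)

    def route(self, msg):
        self.relayed.append(msg)
        self.inboxes.setdefault(msg.recipient, []).append(msg)


server = Server()
client = Machine(name=1, address_book=[2, 3])
client.infect(virus_size=200)
for msg in client.outgoing():
    server.route(msg)
print(len(server.relayed), sorted(server.inboxes))  # 2 [2, 3]
```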
E-mail Traffic Monitor
The e-mail traffic monitor runs in the server thread; there is only one server thread in the simulated network. The monitor intercepts and stores the relevant e-mails that the server thread receives.
The monitor then groups the stored e-mails according to their attachment size. The e-mails in each group are sorted in the chronological order in which the monitor received them. Finally, the monitor tries to build a tree from the messages in each group and determines whether there is an e-mail virus by analyzing the tree structure.
Three major parts perform these actions, and the monitor also has three important values. The details are as follows.
Monitoring Range (value)
For simplification purposes, natural numbers represent IP addresses in the simulations. The monitoring range consists of two numbers that specify the range of numbers between them; for example, 1 and 9 specify all numbers between 1 and 9. The e-mail traffic monitor uses the monitoring range to determine which messages it should intercept. For instance, if there are 99 client threads, e-mails can only be sent to 99 computers, so every e-mail in this example is addressed to a number between 1 and 99, and the sender's computer number is also between 1 and 99. In this case, if the traffic monitor has a monitoring range of 4 to 9, it will only intercept e-mail messages sent to a computer numbered between 4 and 9, or whose sender's computer number is between 4 and 9.
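The interception rule amounts to a simple predicate on the sender and recipient numbers. The sketch below assumes both ends of the range are inclusive (the text says "between"); in_monitoring_range and intercept are hypothetical names.

```python
def in_monitoring_range(sender, recipient, low, high):
    """True if the sender or the recipient number lies in [low, high] (inclusive, assumed)."""
    return low <= sender <= high or low <= recipient <= high


def intercept(messages, low, high):
    """Keep the (sender, recipient) pairs the monitor would intercept."""
    return [(s, r) for s, r in messages if in_monitoring_range(s, r, low, high)]


# The example from the text: 99 clients, monitoring range 4 to 9.
traffic = [(1, 5), (7, 50), (1, 99), (12, 30), (9, 4)]
print(intercept(traffic, 4, 9))  # [(1, 5), (7, 50), (9, 4)]
```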
Message Storage
The e-mail traffic monitor stores a monitored message only if it has an attachment. It stores each message using the attachment size as an index, and all messages are stored in chronological order.
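One way to picture the storage is a dictionary keyed by attachment size whose values are lists of messages in arrival order. This is an illustrative sketch only: MessageStorage and its methods are hypothetical, and it assumes messages are handed to store() in non-decreasing time order, which is what keeps each group chronological without sorting.

```python
from collections import defaultdict


class MessageStorage:
    """Messages with attachments, grouped by attachment size (the index)."""

    def __init__(self):
        self._groups = defaultdict(list)  # size -> [(time, sender, recipient), ...]

    def store(self, time, sender, recipient, attachment_size):
        if attachment_size <= 0:           # no attachment: not kept
            return False
        self._groups[attachment_size].append((time, sender, recipient))
        return True

    def group(self, attachment_size):
        # Messages arrive in time order, so each list is already chronological.
        return list(self._groups.get(attachment_size, []))

    def sizes(self):
        return list(self._groups)


storage = MessageStorage()
storage.store(1, 1, 2, 200)
storage.store(2, 2, 3, 200)
storage.store(2, 5, 6, 0)        # ignored: no attachment
print(storage.group(200))        # [(1, 1, 2), (2, 2, 3)]
```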
Priority Index List
The monitor does not scan all the messages it has stored; it only scans messages according to their priority. The priority index list stores the message priorities.
The priority index list is a linked list of nodes. Each node contains three pieces of information: the number of occurrences, the last updated time, and the attachment size.
For example, if the monitor has stored four messages in the last four time units, each with an attachment of size 200, there will be a node in the list containing this information (number of occurrences is 4, last updated time is 4, and attachment size is 200). The index list keeps the nodes in descending order of number of occurrences, so the nodes at the front have the highest occurrences.
In order to reduce the accumulative effect of time on the number of occurrences, the index list reduces the number of occurrences of a node if it has not updated that node for a period of time. For example, if the scan range (see the following sections) is 4 time units and the current time is 10, the index list halves the occurrences of every node that was not updated between time 6 (10 minus 4) and time 10.
The index list deletes a node when that node's information is no longer useful to the traffic monitor. The traffic monitor does not need any node whose last updated time is more than two scan ranges old (see the following sections). For example, if the scan range of the traffic monitor is 4 and the current time is 56, the index list deletes any node whose last updated time is earlier than time 48 (56 minus two scan ranges of 4).
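The priority index list can be sketched as below. This is a hypothetical Python illustration, not the project's code: a Python list kept sorted by occurrences stands in for the linked list, and the names PriorityIndex, record, age and top are invented. It assumes one reading of the aging rule: each call to age() halves the count of any node idle for more than one scan range and deletes any node idle for more than two scan ranges.

```python
from dataclasses import dataclass


@dataclass
class IndexNode:
    attachment_size: int
    occurrences: int
    last_updated: int


class PriorityIndex:
    def __init__(self, scan_range):
        self.scan_range = scan_range
        self.nodes = []  # sorted by descending occurrences (a list stands in for the linked list)

    def record(self, attachment_size, now):
        """Count one more stored message with this attachment size."""
        for node in self.nodes:
            if node.attachment_size == attachment_size:
                node.occurrences += 1
                node.last_updated = now
                break
        else:
            self.nodes.append(IndexNode(attachment_size, 1, now))
        self._resort()

    def age(self, now):
        """Delete nodes idle > 2 scan ranges; halve nodes idle > 1 scan range."""
        self.nodes = [n for n in self.nodes if now - n.last_updated <= 2 * self.scan_range]
        for n in self.nodes:
            if now - n.last_updated > self.scan_range:
                n.occurrences //= 2
        self._resort()

    def top(self, k=10):
        """Attachment sizes with the highest occurrences, front of the list first."""
        return [n.attachment_size for n in self.nodes[:k]]

    def _resort(self):
        self.nodes.sort(key=lambda n: n.occurrences, reverse=True)


index = PriorityIndex(scan_range=4)
index.record(500, now=1)
for t in range(1, 5):               # four size-200 messages, last at time 4
    index.record(200, now=t)
print(index.top())                  # [200, 500]
index.age(now=10)                   # 500 idle 9 > 8: deleted; 200 idle 6 > 4: halved
print(index.top(), index.nodes[0].occurrences)  # [200] 2
```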
Scan Range (value)
The scan range is a number that specifies a time window. Time is measured in time steps: a time step passes when all the client threads have finished one turn. The monitor uses the scan range to decide which messages it scans. For example, if the scan range is 7 and the current time is 67, the traffic monitor scans all the messages received between time 60 (67 minus 7) and time 67.
The scan range also determines how many messages remain in message storage: the traffic monitor discards all messages received earlier than one scan range ago.
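Both uses of the scan range reduce to comparing a message's time with now - scan_range. The sketch below uses the text's example (scan range 7, current time 67) and assumes both window ends are inclusive; scan_window and discard_old are hypothetical names, and messages are (time, sender, recipient) tuples.

```python
def scan_window(messages, now, scan_range):
    """Messages received in [now - scan_range, now]."""
    start = now - scan_range
    return [m for m in messages if start <= m[0] <= now]


def discard_old(groups, now, scan_range):
    """Drop stored messages older than one scan range, and drop emptied groups."""
    start = now - scan_range
    for size in list(groups):
        kept = [m for m in groups[size] if m[0] >= start]
        if kept:
            groups[size] = kept
        else:
            del groups[size]
    return groups


msgs = [(58, 1, 2), (60, 2, 3), (63, 3, 4), (67, 4, 5)]
print(scan_window(msgs, now=67, scan_range=7))  # [(60, 2, 3), (63, 3, 4), (67, 4, 5)]
```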
Virus Detection Engine
The virus detection engine runs repeatedly according to a schedule: after each run, it waits for a specific time before running again. The user specifies how long the traffic monitor should wait.
When the virus detection engine runs, it first retrieves the ten most frequently occurring attachment sizes from the index list. It then uses each attachment size as an index to retrieve ten sets of messages. For each message set, the detection engine tries to build a tree structure from the messages and scores each tree it builds according to the number of tree branches, the tree depth, and the number of child nodes. If the resulting tree score is bigger than the monitor's default score, the engine gives a virus alert.
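The engine's control flow can be sketched independently of how trees are built and scored: take up to ten attachment sizes, score each message group, and alert when the score is strictly bigger than the threshold. In this hypothetical sketch the tree building and scoring are passed in as a callable (build_and_score), so run_detection and scheduled_runs only show the loop; the usage below plugs in len as a stand-in score, which is not the report's scoring rule.

```python
def run_detection(top_sizes, groups, build_and_score, threshold, now):
    """One engine pass over (at most) the ten highest-priority attachment sizes."""
    alerts = []
    for size in top_sizes[:10]:
        messages = groups.get(size, [])
        if not messages:
            continue
        score = build_and_score(messages)
        if score > threshold:                 # strictly bigger than the default score
            alerts.append((now, size, score))
    return alerts


def scheduled_runs(total_steps, wait, run):
    """Call run(now) every `wait` time steps, like the engine's wait-then-run loop."""
    return [run(now) for now in range(wait, total_steps + 1, wait)]


groups = {200: [(1, 1, 2), (2, 2, 3), (2, 2, 4)], 300: [(1, 5, 6)]}
print(run_detection([200, 300], groups, len, threshold=2, now=10))  # [(10, 200, 3)]
```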
Score Value
The score is a number that determines how sensitive the detection engine is to e-mail viruses. Changing the score changes the number of virus alerts, because the detection engine detects viruses by scoring the tree structures derived from received messages. If the score is too low, the detection engine has a high probability of giving false virus alerts; if it is too high, the engine has a high probability of failing to detect viruses. In the derived virus message tree, a connection to one child at the first level of the tree adds one point to the total score of the tree, and the score of a child node doubles each time its depth increases by one. If any resulting virus tree score is bigger than the monitor's default score, the scan engine generates a virus alert.
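Read literally, the scoring rule gives a child at depth d (depth 1 being a direct child of the root) a weight of 2^(d-1), and the tree score is the sum over all non-root nodes. The sketch below implements that reading; tree_score and the children-dictionary representation are assumptions for illustration.

```python
def tree_score(children, root):
    """Sum of 2 ** (depth - 1) over every non-root node; depth 1 = direct child of root."""
    total = 0
    stack = [(child, 1) for child in children.get(root, [])]
    while stack:
        node, depth = stack.pop()
        total += 2 ** (depth - 1)
        stack.extend((grandchild, depth + 1) for grandchild in children.get(node, []))
    return total


tree = {"A": ["B", "C", "D"], "B": ["E", "F"], "E": ["G"]}
print(tree_score(tree, "A"))  # 11 = 3*1 + 2*2 + 1*4
```

With the default score of 30 used in the simulations, this example tree would not trigger an alert; because deep chains weigh more than wide, shallow fan-outs, a tree has to show several generations of forwarding before it crosses the threshold.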
4 Simulation Results
Data Collection Method
Five different input parameters affect the simulation output. There are two types of simulations: control simulations and virus-contaminated simulations. In a control simulation, there is no e-mail virus; in a virus-contaminated simulation, different types of viruses are present within the network.
Input Parameters
Number of nodes in the network: the number of computers within the network.
Monitor Virus Scan Interval: how often the e-mail traffic monitor runs a virus scan.
Number of viruses present: the number of viruses present in the network; it indirectly determines the e-mail activity.
Monitor Range: the number of nodes (computers) that the e-mail traffic monitor monitors. A large monitor range increases the workload of the monitor.
Monitor Score: how sensitive the e-mail traffic monitor is to e-mail virus activity. The smaller the number, the more sensitive the monitor.
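The parameters above can be collected into a single configuration object. In this hypothetical sketch the defaults for nodes, score, scan range and monitor range are copied from the Initial Settings used throughout the simulation figures; the report gives no value for the scan interval, so it has no default, and the class and field names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationConfig:
    scan_interval: int              # time steps between virus scans (no value given in the report)
    number_of_nodes: int = 100      # one server node plus 99 clients
    number_of_viruses: int = 0      # 0 means a control simulation
    monitor_range: int = 4
    monitor_score: int = 30         # smaller = more sensitive
    scan_range: int = 4             # time window, listed in the Initial Settings

    def __post_init__(self):
        if self.number_of_nodes < 2:
            raise ValueError("need a server node and at least one client")
        if min(self.scan_interval, self.monitor_range, self.scan_range) < 1:
            raise ValueError("interval and ranges must be positive")

    @property
    def is_control(self):
        return self.number_of_viruses == 0


control = SimulationConfig(scan_interval=4)
two_virus = SimulationConfig(scan_interval=4, number_of_viruses=2)
print(control.is_control, two_virus.is_control)  # True False
```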
Control Simulations
Simulations run without e-mail viruses serve as the control experiment. Because no virus is present in the control experiment, any virus alert generated by the traffic monitor's detection engine is a false alert. As a result, the control experiments determine whether the traffic monitor gives false virus alerts.
Virus-Contaminated Simulations
Simulations run in two different virus environments:
Single virus: simulations run with a single virus present.
Multi-virus: simulations run with multiple types of e-mail viruses present.
Simulation Data
Each simulation writes its results to two files:
Log.txt: records e-mail activity. It selectively records information about the e-mails with the most frequently occurring attachment sizes.
VirusAlert.txt: records each virus alert that the e-mail traffic monitor generates.
Simulation Results
There are two types of simulations: control simulations and virus simulations. Both types begin with the same initial settings, except that control simulations have no e-mail viruses. The simulation results follow.
Control Simulations
There are seven control simulations. Each control simulation starts with a hundred nodes: 99 nodes are clients (e-mail users), and one node is the server (e-mail server). The monitor scan range is four. The monitor generates an alert if it can build a tree from the intercepted messages with a score bigger than 30. In this setting, ideally there should be no virus alert. Every client has a random chance of replying to and generating e-mails, and every client has a different probability of being infected by an e-mail virus. After a virus has infected a client, the virus changes the client's behavior according to the virus's characteristics. All the control simulations started with the same initial settings described above.
In one of the seven control simulations, the monitor generated a false virus alert, giving a false alert rate of about 14% (1 of 7) in the control experiment.
The results of the control simulations are shown below:
Initial Settings
Number of Nodes: 100
Score: 30
Scan range: 4
Monitor range: 4
Status: No virus

Simulation Trial | Virus Alert | Number of Alerts | Total Scan Messages | Total Time | Average Messages
1 | 0 | 0 | 452 | 18 | 25
2 | 0 | 0 | 545 | 18 | 30
3 | 1 | 24 | 3825 | 18 | 213
4 | 0 | 0 | 176 | 18 | 10
5 | 0 | 0 | 394 | 15 | 26
6 | 0 | 0 | 346 | 18 | 19
7 | 0 | 0 | 457 | 18 | 25
Figure 1. Control Simulation Results.
Virus Simulations
The virus-contaminated simulations have the same settings as the control experiment, except that e-mail viruses are present. There are two types: single virus simulations and multiple virus simulations. All of these simulations started with the same initial settings as the control experiment, except for the number of e-mail viruses present.
The following are the results of the single virus simulations; only one type of virus is present in each simulation. The monitor gave five correct virus alerts in seven single virus simulations, so it is 71% accurate in giving true positive alerts. The monitor failed to report the presence of a virus in two simulations; therefore, the false negative rate is 29%.
Figure 2. Single Virus Simulation Results.
Initial Settings
Number of nodes: 100
Score: 30
Scan range: 4
Monitor range: 4
Status: 1 virus

Simulation Number | Number of Viruses | Number of Virus Alerts | Number of Correct Virus Alerts | Accuracy (%) | Time Delay
1 | 1 | 1 | 1 | 100 | [illegible]
2 | 1 | 1 | 1 | 100 | 6
3 | 1 | 1 | 1 | 100 | 15
4 | 1 | 1 | 1 | 100 | 6
5 | 1 | 0 | 0 | 0 | None
6 | 1 | 1 | 1 | 100 | 12
7 | 1 | 0 | 0 | 0 | None
The following are the results of the multiple virus simulations; more than one type of virus is present in each simulation.
The first four simulations ran with two viruses each. The monitor gave six correct virus alerts for the eight viruses present, so the true positive rate of the monitor is 75%. The monitor failed to detect two viruses in one simulation; hence, the false negative rate is 25%.
The last simulation had three viruses. The monitor successfully detected two of them, so the true positive rate is about 67%. The false negative rate is about 33% because the monitor failed to detect one virus.
Figure 3. Multiple Virus Simulation.
Initial Settings
Number of Nodes: 100
Score: 30
Scan range: 4
Monitor range: 4
Status: 2 viruses

Simulation Number | Number of Viruses | Number of Virus Alerts | Number of Correct Virus Alerts | Accuracy (%) | Time Delay
1 | 2 | 2 | 2 | 100 | 3
2 | 2 | 2 | 2 | 100 | 6
3 | 2 | 2 | 0 | 0 | 6
4 | 2 | 2 | 2 | 100 | 6

Three Virus Simulation
Status: 3 viruses

Simulation Number | Number of Viruses | Number of Virus Alerts | Number of Correct Virus Alerts | Accuracy (%) | Time Delay
5 | 3 | 2 | 2 | 66 | 6
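The percentages quoted in this chapter can be recomputed from the per-trial rows of Figures 1 to 3. The short sketch below copies those rows into Python lists (the variable and function names are illustrative) and computes the true positive rate as correct alerts divided by viruses present, and the false negative rate as missed viruses divided by viruses present.

```python
# (viruses present, virus alerts, correct virus alerts) per trial, from Figures 2 and 3
single_virus = [(1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 0, 0), (1, 1, 1), (1, 0, 0)]
two_virus = [(2, 2, 2), (2, 2, 2), (2, 2, 0), (2, 2, 2)]
three_virus = [(3, 2, 2)]
control_virus_alert = [0, 0, 1, 0, 0, 0, 0]   # "Virus Alert" column of Figure 1


def percent(part, whole):
    return round(100 * part / whole, 1)


def detection_rates(trials):
    """Return (true positive %, false negative %) over all viruses in the trials."""
    present = sum(v for v, _, _ in trials)
    correct = sum(c for _, _, c in trials)
    return percent(correct, present), percent(present - correct, present)


print("control false alert rate:", percent(sum(control_virus_alert), len(control_virus_alert)))
print("single virus:", detection_rates(single_virus))    # (71.4, 28.6)
print("two viruses:", detection_rates(two_virus))        # (75.0, 25.0)
print("three viruses:", detection_rates(three_virus))    # (66.7, 33.3)
```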
5 Simulation Results Analysis
The simulation results from the previous chapter show that the monitor is fairly accurate in detecting e-mail viruses. However, it also has some weaknesses: first, it produces some false virus alerts; second, it fails to detect some viruses. This chapter examines these results.
False Positive Alert Analysis
Although no virus was present in the third control simulation trial, the monitor gave a false virus alert in that trial. What caused this false alert? The log data from that simulation reveals its origin. The graph below shows the e-mail activity per unit time. The number of messages per unit time in this trial is extremely high: it is roughly ten times higher than in the other six control simulations.
[Figure 4: bar chart titled "Number of Received Messages per Unit Time"; x-axis: trial simulation number (1 to 7); y-axis: number of received messages per time unit (0 to 250).]
Figure 4. High e-mail message volume can potentially trigger a false virus alert.
Therefore, a large amount of e-mail activity can potentially trigger a false virus alert. Using an attachment hash instead of the attachment length could reduce the chance of false virus alerts, because a hash reduces the chance of falsely treating two attachments with different content but the same length as the same attachment.
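The difference between grouping by length and grouping by hash is easy to demonstrate. In the sketch below, SHA-256 from Python's standard hashlib is an illustrative choice (the report does not name a hash function), and length_key, hash_key and group_count are hypothetical names. Two unrelated 200-byte attachments collapse into one group under the length key but stay separate under the hash key.

```python
import hashlib
from collections import Counter


def length_key(attachment: bytes) -> int:
    return len(attachment)


def hash_key(attachment: bytes) -> str:
    return hashlib.sha256(attachment).hexdigest()


def group_count(attachments, key):
    """How many distinct groups the monitor would form under a given key."""
    return len(Counter(key(a) for a in attachments))


report = b"A" * 200   # two unrelated attachments that happen to be 200 bytes long
photo = b"B" * 200
traffic = [report, photo, report]
print(group_count(traffic, length_key), group_count(traffic, hash_key))  # 1 2
```

The trade-off is cost: a hash requires the monitor to read every attachment's full content, whereas the length is available almost for free.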
False Negative Rate Analysis
The monitor failed to detect two viruses in trial simulation 3 of the virus-contaminated simulations. It also failed to detect one of the three viruses in trial simulation 5. On the other hand, the traffic monitor successfully detected most of the viruses in the other trial simulations. Because the monitor was able to detect most of the viruses, the monitor itself can detect viruses. Based on this fact, one possible explanation for the missed viruses is that only a few of the client threads within the traffic monitor's monitoring range were infected; the monitor then would not have enough infected tree nodes to build a tree that can trigger a virus alert.
True Positive Alert
The simulation results show that the monitor can detect e-mail viruses, although only with an accuracy of around 70%. This is relatively accurate, considering that the monitor has no prior knowledge of any e-mail virus.
6 Conclusion
This chapter gives the final conclusions of this project based on the simulation results. It also gives recommendations and directions for those interested in further research in this field.
Summary
The simulation result analysis shows that the monitor is able to detect e-mail viruses by monitoring e-mail traffic. However, it also shows that the monitor cannot detect all viruses and sometimes generates false virus alerts.
Interpretation
This project has succeeded in giving a theoretical foundation for detecting viruses by analyzing the e-mail traffic that passes through a mail server. The simulation results suggest that it is possible to detect e-mail viruses within a network. The method is robust in that it can detect new e-mail viruses as they spread.
However, the virus detection mechanism requires further improvement before practical use. Even when it is ready for practical use, it should not be used as the only protection against e-mail viruses; it should be used to strengthen existing protection.
Re%#--en"$!i#n
"his virus detection mechanism reJuires further im!rovements and modifications
before !ut it into !ractical usage$ (ince the re!ort on this virus detection mechanism
2>
comes from net3or4 simulation, it does not guarantee this virus detection mechanism is
going 3or4 e2actly the same on a real net3or4$ "his virus detection mechanism should
be tested on a !hysical net3or4$ "his is because this virus detection mechanism runs on
mail servers, 3hich are critical !oints in electronic communications$
Finally, two concerns about this virus detection method remain. First, a mail server in a real network could potentially have thousands of users; running the e-mail traffic monitor consumes extra computational resources on the mail server, which can delay e-mail service. Second, a computer user usually has several e-mail accounts. To protect the user's computer, every one of the user's e-mail service providers would have to install this traffic monitor.
References
[1] ZDNet UK. News page. 9 May 2000. ZDNet UK. 9 May 2000
L'tt,-..///01dnet0co0u2.ne/s."000.13.ns41$"#$0'tml50
[2] Chenxi Wang, John Knight, and M. Elder. "On Computer Viral Infection and the Effect of Immunization." Technical Report UVA-CS-99-32, Department of Computer Science, University of Virginia, 1999.
[3] Jake Ferry. "A Study and Evaluation of Virus Protection Software Marketed to Average Computer Users." Dissertation ES200006, Department of Computer Science, University of Virginia, 2000.
[4] David Moore, Geoffrey Voelker, and Stefan Savage. "Inferring Internet Denial-of-Service Activity." Proceedings of the 10th USENIX Security Symposium, August 2001.
[5] Brian Utt. "Detection and Identification of Intruders in Network Systems." Dissertation CS990033, Department of Computer Science, University of Virginia, 1999.
[6] Jack Brock. "'I Love You' Computer Virus Highlights Need for Improved Alert and Coordination Capabilities." In Proceedings of Critical Infrastructure Protection '00 (May 18), GAO.
[7] Eugene Kaspersky. "Viruses and the Internet - Whatever Next?" Virus Bulletin, pp. 14-17, February 1999.
[8] L.M. Adleman. "Advances in Cryptology." Crypto '88 Proceedings, Lecture Notes in Computer Science 403, Springer, Berlin, 1990, pp. 354-374.
[9] Richard Ford. "Malware: Troy Revisited." Computers & Security, v. 18, n. 2, 1999, pp. 105-108.
[10] Paul Docherty and Peter Simpson. "Macro Attacks: What Next After Melissa?" Computers & Security, v. 18, n. 5, 1999, pp. 391-395.
[11] Vesselin Bontchev. "Macro Virus Identification Problems." Computers & Security, v. 17, n. 1, 1998, pp. 69-89.
[12] Zadok, Stolfo, Schultz, and Eskin. "Data Mining Methods for Detection of New Malicious Executables." Technical Report, Department of Computer Science, University of Virginia, 2001.
[13] Robert Balzer. "Assuring the Safety of Opening E-mail Attachments." DARPA Information Survivability Conference & Exposition II, 2001. DISCEX '01. Proceedings, Volume 2, 2000.
[14] The Raptor Simulator. Home page. 19 Mar. 2002. The Raptor Simulator. 19 Mar. 2002 <http://www.cs.virginia.edu/~survive/raptor/>.
[15] MessageLabs. Home page. 19 Mar. 2002. MessageLabs. 19 Mar. 2002 <http://www.messagelabs.com/>.
Appendix A
Virus Detection Methods
A virus scanner is probably the most widely used method of detecting viruses, and it is by far the most accurate and effective way to detect computer viruses. However, a scanner requires constant updates, which add to its maintenance cost. A virus scanner alone cannot guarantee that a computer is virus free, because virus definition updates usually come out after e-mail viruses have already inflicted major damage. That is why anti-virus programs usually use the three methods described below to strengthen their ability to detect viruses. Even though heuristic analysis, behavior blocking, and integrity checking add strength to anti-virus programs, they share the same weakness: they tend to have a high false virus alert rate.
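The signature matching at the core of a virus scanner can be sketched as a substring search over a file's bytes. The signature names and byte patterns below are made up for illustration, and real scanners use much larger databases and more sophisticated matching; the sketch only shows why a scanner cannot recognize a virus until its signature has been added.

```python
from typing import Dict, List

# Hypothetical signature database: virus name -> byte pattern (invented values).
SIGNATURES: Dict[str, bytes] = {
    "Demo.A": bytes.fromhex("deadbeef"),
    "Demo.B": b"EVIL_MACRO",
}


def scan(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all known signatures that occur in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


print(scan(b"header EVIL_MACRO trailer"))  # ['Demo.B']
print(scan(b"a brand-new virus body"))     # [] because no signature exists yet
```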
Heuristic analysis examines a computer file and tries to predict what the file will do. If the predicted actions violate the rules of the heuristic analysis, it generates a virus alert. However, heuristic analysis cannot always predict exactly what a file will do, because computer files come in billions of variations. As a result, heuristic analysis produces many false positive and false negative alerts along with its true positive and true negative results.
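The four outcome categories can be made concrete by comparing each alert with whether the file was actually infected. The sketch below assumes a list of hypothetical (alert raised, actually infected) pairs and derives two common summary figures, the detection rate TP / (TP + FN) and the false alarm rate FP / (FP + TN); the function names are illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple


def tally(results: Iterable[Tuple[bool, bool]]) -> Counter:
    """Classify each (alert_raised, actually_infected) pair into one of four outcomes."""
    counts: Counter = Counter()
    for alert, infected in results:
        if alert and infected:
            counts["true_positive"] += 1
        elif alert:
            counts["false_positive"] += 1   # alert on a clean file
        elif infected:
            counts["false_negative"] += 1   # infected file missed
        else:
            counts["true_negative"] += 1
    return counts


def rates(counts: Counter) -> Tuple[float, float]:
    """Return (detection rate, false alarm rate); 0.0 when a denominator is empty."""
    infected = counts["true_positive"] + counts["false_negative"]
    clean = counts["false_positive"] + counts["true_negative"]
    detection = counts["true_positive"] / infected if infected else 0.0
    false_alarm = counts["false_positive"] / clean if clean else 0.0
    return detection, false_alarm


# Hypothetical outcomes for five files.
sample = [(True, True), (True, False), (False, True), (False, False), (True, True)]
print(rates(tally(sample)))  # (0.6666666666666666, 0.5)
```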
A behavior blocker monitors program behavior. If a program tries to do something it is not supposed to do, the behavior blocker blocks the action and fires a virus alert. A behavior blocker works much like heuristic analysis, except that it checks program behavior at run time, whereas heuristic analysis examines a file's actions before the file runs. Therefore, a behavior blocker has the same problems as heuristic analysis.
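A behavior blocker can be sketched as a run-time check of each requested action against a per-program policy. The program names, action names, and policy below are hypothetical; a real blocker would intercept operating system calls rather than receive action strings.

```python
from typing import Dict, List, Set

# Hypothetical policy: actions each program is expected to perform.
POLICY: Dict[str, Set[str]] = {
    "word_processor": {"open_document", "save_document"},
    "mail_client": {"read_mailbox", "send_mail"},
}


class BehaviorBlocker:
    def __init__(self, policy: Dict[str, Set[str]]):
        self.policy = policy
        self.alerts: List[str] = []

    def request(self, program: str, action: str) -> bool:
        """Allow the action if the policy permits it; otherwise block it and raise an alert."""
        if action in self.policy.get(program, set()):
            return True
        self.alerts.append(f"virus alert: blocked {action} from {program}")
        return False


blocker = BehaviorBlocker(POLICY)
blocker.request("word_processor", "save_document")  # allowed
blocker.request("word_processor", "send_mail")      # blocked, e.g. a mailing macro
print(blocker.alerts)  # ['virus alert: blocked send_mail from word_processor']
```

The same check that stops a document macro from mailing itself would also block a legitimate macro that sends mail, which illustrates how behavior blocking shares the false alert problem of heuristic analysis.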
An integrity checker verifies the integrity of computer files using checksums. If the checksum of a file does not match the old checksum value stored by the integrity checker, the integrity checker gives a virus alert. Nevertheless, because computer files are constantly modified by the computer and the user, an integrity checker does not give accurate virus alerts.
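The checksum comparison can be sketched with Python's standard `hashlib` module. SHA-256 is used here as the checksum purely as an assumption, and the function names are illustrative; `demo` writes a temporary file, records its baseline checksum, then appends text as a user edit would. The edit is flagged even though it is harmless, which is the source of the inaccurate alerts described above.

```python
import hashlib
import os
import tempfile
from typing import Dict, Iterable, List, Tuple


def checksum(path: str) -> str:
    """SHA-256 digest of the file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths: Iterable[str]) -> Dict[str, str]:
    """Record the current checksum of every file."""
    return {p: checksum(p) for p in paths}


def changed_files(baseline: Dict[str, str]) -> List[str]:
    """Return the files whose checksum no longer matches the stored value."""
    return [p for p, old in baseline.items() if checksum(p) != old]


def demo() -> Tuple[List[str], List[str]]:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "report.doc")
        with open(path, "wb") as f:
            f.write(b"original contents")
        baseline = build_baseline([path])
        before = changed_files(baseline)   # nothing has changed yet
        with open(path, "ab") as f:
            f.write(b" plus a legitimate edit")
        after = changed_files(baseline)    # the harmless edit is flagged anyway
        return ([os.path.basename(p) for p in before],
                [os.path.basename(p) for p in after])


print(demo())  # ([], ['report.doc'])
```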
Appendix B
Virus Background Information
Computer viruses are not a new topic in the computer field. They originally resulted both from programming bugs introduced by careless programmers and from malicious code written by malicious programmers. The first report of a computer virus was in 1981. Adleman is credited with coining the term "computer virus," and Cohen is credited with doing the first serious research on computer viruses [8].
In the last seven years, viruses have changed the way they infect their targets. E-mail has now become the most common medium for virus infection. Unlike the old way of virus propagation, which spread viruses through shared disks, electronic messaging can infect millions of computers in an hour without any physical contact.
Many of today's e-mail viruses use the "Trojan Horse" strategy. They contain hidden functions that can exploit the privileges of the user, resulting in a security threat. This all began when desktop platforms became homogeneous and people started sharing files [9]. In the case of the infamous virus "Melissa," the virus takes control of Outlook once the user clicks on the Melissa-infected attachment, and it then sends copies of itself to the first fifty people on the user's mailing list. However, "Melissa" was not the first virus to use such a technique; viruses such as Sharefun used the same technique [10].
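The fifty-recipient behavior described for Melissa explains how quickly such a virus can spread. The toy simulation below assumes randomly generated address books and assumes that every recipient opens the attachment immediately; the population size, address book size, and number of rounds are arbitrary illustrative values, not measurements. It returns the cumulative number of infected users after each round of mailing.

```python
import random
from typing import List


def simulate(population: int = 10_000, book_size: int = 100, fanout: int = 50,
             rounds: int = 4, seed: int = 1) -> List[int]:
    """Cumulative number of infected users after each round of mailing."""
    rng = random.Random(seed)
    # Hypothetical address books: each user knows book_size randomly chosen users.
    books = [rng.sample(range(population), book_size) for _ in range(population)]
    infected = {0}       # user 0 opens the infected attachment first
    newly = [0]
    history = [1]
    for _ in range(rounds):
        next_wave = []
        for user in newly:
            # Mail only the first `fanout` entries, as described for Melissa,
            # and assume every recipient opens the attachment.
            for target in books[user][:fanout]:
                if target not in infected:
                    infected.add(target)
                    next_wave.append(target)
        newly = next_wave
        history.append(len(infected))
    return history


print(simulate())
```

Because each newly infected user mails up to fifty others, the count grows roughly geometrically until most of the simulated population is infected.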
"Melissa" and "I love you" belong to a class of viruses called macro viruses. Macro viruses are usually programs embedded in Microsoft Office documents, and they are extremely tricky to remove. For example, if an anti-virus program improperly disinfects a macro virus, the improper disinfection process can create a new macro virus [11]. In that case, the disinfection process generates a new variant of the same virus whose behavior is unpredictable. Macro viruses also present another problem. When a macro in an old Office document format is converted to a new Office format, the macro virus can become hard to recognize, because the Office converter adds information to the macro. The same difficulty applies when a macro is converted from a new Office document format back to an old one [11].