The future of virus detection
Richard Ford, PhD.
Research Professor, Florida Institute of Technology
Dept. of Computer Science, Florida Institute of Technology, 150 W. University Blvd, Melbourne, FL 32901, USA.

Dr. Richard Ford graduated from the University of Oxford in 1992 with a D.Phil in Quantum Physics. Since that time, he has worked extensively in computer security and malicious mobile code. Previous projects include work on the Computer Virus Immune System at IBM Research and development of the largest web hosting system in the world, whilst Director of Engineering for Verio. He is currently Research Professor at Florida Institute of Technology's Center for Information Assurance, where he carries out research on a wide variety of security-related areas. He is Executive Editor of Reed-Elsevier's Computers & Security, and of Virus Bulletin, and Scientific Advisor to the European Institute of Computer Antivirus Research. Richard lives with his wife and two dogs in beautiful Florida and, when not trying to solve the world's malcode problems, plays both classical and jazz flute.
1. Introduction
When the average computer user can tell you the name of the latest major Internet worm, it becomes clear that we are not winning the war against malicious mobile code. With the billions of dollars spent annually by the antivirus community and operating system manufacturers on new and innovative countermeasures to stop viruses and worms, one is tempted to ask whether the war is even winnable. Is the issue simply one of careless users taking unnecessary risks, or is there something fundamentally broken in the way in which we attempt to defend against the onslaught of new viruses?
In this article, these issues will be examined by looking not at the current threat profile, but at the different ways in which we defend our systems and their weaknesses. By examining the problem from this direction, it is possible to build up a list of requirements for solutions of the future – essentially, the functionality tomorrow's anti-virus software will need in order to keep us one step ahead of the next generation of viruses and worms.
2. Overview of the solution space

While the antivirus industry has been steadily improving over the years, it may come as a surprise to some to realize that the fundamental technology used for detecting and removing viruses has changed very little over time. This section briefly outlines the most common current virus solutions, so that a better understanding can be gained of what one can realistically expect from such countermeasures.
When viruses first became common, most samples were simple boot viruses – that is, small blocks of code, usually less than a couple of kilobytes in length, that spread via the boot sectors of removable media. Due to the size limitations and the naivety of the virus writers of the time, each copy of a virus “looked” the same, and the machine code was not obfuscated in any serious way. Furthermore, outbreaks generally took several weeks or even months to build, as the time between infection events was large.

It was in this environment that the scanner made its debut. At the time, most viruses were relatively static and slow-moving. It seemed as though a signature-based virus scanning approach would be capable of arresting their spread, and this technology therefore formed the basis of most virus scanners. Here, the scanner would look for a particular sequence of bytes in boot sectors and files. This approach was fairly accurate, as the virus code was simplistic. Furthermore, due to the slow spread, updates could be issued monthly, often via regular mail.

As the threat changed, so did the solution. While the idea of “signature”-based scanning remained popular, scanner vendors improved performance by moving away from scanning every byte of an object to just scanning those around the entry point of the object. This optimization provided significant speed improvements, which were critical given the memory and processor limitations of the day. Furthermore, by restricting the area of search, the problem of false positives – where a clean file is inadvertently reported as infected – was reduced.
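To make the classic approach concrete, the sketch below shows roughly what early signature scanning amounted to: search an object for a fixed byte sequence, optionally restricted to a window around the entry point. It is a toy under stated assumptions – the signature bytes and names are invented, and real products used large, curated signature databases – but it captures both the bulk scan and the entry-point optimization described above.

```python
# A minimal sketch of fixed-signature scanning (illustrative only; the
# "signature" below is an invented placeholder, not a real virus pattern).

SIGNATURES = {
    "Example.Boot.A": bytes.fromhex("deadbeef90909090"),  # hypothetical
}

def scan_bytes(data: bytes) -> list[str]:
    """Bulk scan: return the names of all signatures found anywhere."""
    return [name for name, sig in SIGNATURES.items() if sig in data]

def scan_entry_region(data: bytes, entry_offset: int, window: int = 4096) -> list[str]:
    """The later optimization: scan only a window around the entry point,
    which is both faster and less prone to false positives."""
    return scan_bytes(data[entry_offset : entry_offset + window])

if __name__ == "__main__":
    sample = b"\x00" * 64 + bytes.fromhex("deadbeef90909090") + b"\x00" * 64
    print(scan_bytes(sample))            # ['Example.Boot.A']
    print(scan_entry_region(sample, 0))  # same hit, far less data searched
```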
The next large change in detection strategy arrived with the advent of so-called polymorphic viruses. These viruses remain operationally unchanged with each replication, but employ variable decryption routines so that there is no constant string of bytes that can be used for detection. As this technique became more popular, most vendors were forced to trace the path of execution into the virus code in order to detect an infected file. Perhaps the largest milestone – certainly the best known – was the release of the Mutation Engine. This was a simple but effective polymorphic engine designed to be linked into simple static file viruses; suddenly polymorphism was within the reach of any virus writer who knew how to leverage a link library.
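The effect of even trivial polymorphism on fixed-signature scanning is easy to demonstrate. In the toy sketch below, the same payload is XOR-encrypted with a different one-byte key for each “generation”: behaviour is preserved, but the stored bytes differ at every position, so no fixed signature can match every infected file. The payload and keys are invented stand-ins, not real virus code.

```python
# Toy illustration of why polymorphism defeats fixed signatures.

PAYLOAD = b"stand-in for a functionally constant virus body"

def generation(payload: bytes, key: int) -> bytes:
    """Encrypt the body with a per-replication key; only a small,
    variable "decryptor" (here, just the key byte) remains in the clear."""
    return bytes([key]) + bytes(b ^ key for b in payload)

gen_a = generation(PAYLOAD, 0x41)
gen_b = generation(PAYLOAD, 0x7F)
# Functionally identical generations with no code sequence in common:
print(gen_a != gen_b)                      # True
print(gen_a[1:9].hex(), gen_b[1:9].hex())  # completely different byte patterns
```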
3. Challenges in the solution space

From a description of the problem and solution space, it is fairly easy to see that there are a number of open issues for those interested in virus prevention. Note here the change in focus: anti-virus solutions are about return on investment and risk mitigation, not the complete removal of the virus problem. Thus, from a purely pragmatic worldview, the primary goal of anti-virus software is not ultimately virus detection, but reduction of the cost associated with viruses. While this almost always means detection, it does not have to be detection. This clarification is an important point, as we shall see later.

This clarification aside, there are a number of very unpleasant attacks one can carry out against current anti-virus products. In this section, we will examine two of the attacks we have already observed, as opposed to future threats, respecting the long-standing tension within the anti-virus world between informing users of risks and actually arming virus writers with new ideas.
3.1. Speed of spread

Modern malware is fast. In almost the blink of an eye, an infected machine can send infected documents worldwide; worms like SQL.Slammer can become pandemic in just minutes [ ]. Compare this to the virus problem before ubiquitous connectivity: organizations like The WildList tracked outbreaks around the world as infection spread from host to host via sneakernet. Those days are gone forever.
The current situation is not fundamentally changed from this distant past. While anti-virus vendors no longer use strings of static bytes to detect viruses, the vast majority of viruses are detected using techniques that identify viruses by particular attributes that must be known before detection and removal can take place. Because of this, signature distribution has moved from a monthly event to an ongoing race, with vendors issuing updates via the Internet in real time.
Of course, anti-virus software has taken advantage of connectivity too. Suspicious samples can be gathered from host machines quickly and solutions distributed worldwide in real time. There have even been some attempts – most notably the IBM Digital Immune System [ ] – to automate parts of the solution generation process. However, with outbreak times now being measured in minutes, not weeks, it is increasingly improbable that reactive solutions can hope to keep up with the virus problem. This was nicely illustrated in [ ], which showed that the lag between virus outbreak and solution dissemination is an important factor in determining outbreak size.
In addition to this work, we have created a network-aware simulator to study the challenges of solution dissemination in a degraded network environment. Detailed results of our work in this area are to be the subject of a later paper. Early results from this work, which is funded by a grant from the Cisco Critical Infrastructure Assurance group, indicate that network degradation during an outbreak further weakens the abilities of reactive solutions. While speed of spread is the result of a number of different factors, the result of increased spread speed is very simple: signature-based solutions are becoming increasingly untenable.
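The point about lag is easy to see in a toy model. The sketch below grows an infection logistically and only begins disinfection once the "signature" ships after a configurable delay; the rates and population are arbitrary assumptions, and this is a caricature, not the network-aware simulator described above.

```python
def peak_infections(lag: int, beta: float = 0.9, hosts: float = 1e6,
                    cure_rate: float = 0.5, steps: int = 80) -> int:
    """Toy discrete-time outbreak: logistic worm growth, with cleanup
    beginning only after the solution ships at time step `lag`."""
    infected, peak = 1.0, 1.0
    for t in range(steps):
        infected += beta * infected * (1.0 - infected / hosts)  # spread
        if t >= lag:
            infected *= 1.0 - cure_rate                         # cleanup
        peak = max(peak, infected)
    return round(peak)

for lag in (5, 15, 25, 35):
    print(lag, peak_infections(lag))  # outbreak size climbs sharply with lag
```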
3.2. Entry point obfuscation

The next big concern for scanner manufacturers is entry point obfuscation, or EPO. In order to understand why EPO is such a problem for the scanner developers, we must first gain a better understanding of how virus scanners work.
Once, when viruses were very simple, and each replication of a virus “looked” very much like every other, viruses could be detected by scanning files for simple hexadecimal strings. Such scanning, however, was slow and unwieldy – as any file that contained the magic hex signature was detected as infected. Given that most viruses modified the file in such a way that the virus code was located at a predictable position within the file, vendors moved from simplistic bulk scanners to scanners that only searched particular locations in files.
This improvement was rendered useless by the creation of “polymorphic” viruses. Such viruses contain a static code block that is trivially encrypted, and a “new” decryptor is built for each replication. Thus, as the static code is encrypted, the virus scanner is forced to search for the presence of the decryptor… which, as it is dynamically generated, is different for each replication of the virus. The result is that while each generation of a polymorphic virus is functionally identical, there may be no single code sequence found in every infected file.

In order to detect such files, anti-virus developers have shifted detection strategies and now rely heavily on tracing the execution path within scanned objects. During this process, the scanner can determine whether the code being executed is viral. This technique has proven very effective, and is significantly faster than bulk-scanning large files. However, its usefulness is predicated on the fact that binary file viruses are executed at the start of the execution process.

EPO viruses attack this assumption directly, embedding the jump to the virus code deep within a target executable. Thus, simply tracing the execution path of an EPO-infected file provides no guarantee that the virus code itself will ever be called. Even if the actual virus code is located at the end of the executable, the scanner is unable to determine its exact location in order to emulate it. Essentially, EPO raises the bar in terms of what is required for static detection, as it removes the ability for a scanner to trace into the virus code with certainty.
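To see why the entry point matters so much, the sketch below reads the declared entry point straight out of a Windows PE header using only the documented header offsets; this is where a trace-based scanner begins emulating. An EPO virus leaves this field untouched and instead patches a jump somewhere in the middle of the host's code, so a simplified reader like this one – and, more importantly, a tracer starting from it – never learns where the virus body lives.

```python
import struct

def pe_entry_point(path: str) -> int:
    """Return the declared AddressOfEntryPoint (an RVA) of a PE file.

    A trace-based scanner begins emulating here. An EPO-infected file
    leaves this field pointing at legitimate code and hides its jump
    deep in the body, so nothing in the header betrays the virus.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data[:2] != b"MZ":
        raise ValueError("not a DOS/PE executable")
    # e_lfanew at offset 0x3C holds the file offset of the PE header
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # AddressOfEntryPoint lies 16 bytes into the optional header, which
    # starts 24 bytes after the PE signature (4-byte magic + 20-byte COFF)
    (entry_rva,) = struct.unpack_from("<I", data, pe_offset + 24 + 16)
    return entry_rva
```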
4. New solutions and the future

Given that we have already demonstrated that reliable virus detection via static analysis is impossible and that current attacks are pushing the envelope in terms of manageability, new solutions to malcode are paramount if we are to create a robust Internet infrastructure. In the remainder of this paper, we will discuss the attributes of these new technologies and, from a requirements analysis position, attempt to work out what effective solutions might look like.
4.1. Speed of response, revisited

Given the speed of spread of current malware, one of the obvious requirements is that we need a solution that prevents pandemics from erupting. This implies either a solution that is capable of preventing fast attacks outright, or one that slows attacks sufficiently that they become amenable to slower, but trusted, reactive techniques.
The “killer” solution to the virus problem has always been a method of stopping viruses that were unknown to the solution provider – that is, to provide generic detection of viruses rather than solutions that rely on prior knowledge of the virus.
Numerous attempts to provide such protection have been investigated by the anti-virus industry. Static heuristic analysis of files has been proposed, either by direct binary analysis or by runtime simulation. In [ ], Norman Data Defense describes a system that executes samples in a virtual environment, allowing viruses to “replicate” safely in a virtual machine. Such a technology is potentially extremely powerful, except for three significant weaknesses.
First, such an approach is incapable of detecting worms that rely on code injection – that is, where the worm code is contained in a data stream.
Second, this approach presupposes that the virus code actually gets to execute. With an EPO virus, this is not trivially the case, and the emulator is not practically able to execute all code paths within the target executable. Thus, EPO can potentially render such approaches useless.
Third, the approach is susceptible to viruses that are able to detect that they are under emulation, and thereby change their execution order to hide their intentions. Once free of the emulator and certified as clean, the virus code can do its job and infect the system. This last point is extremely important in virus prevention; so much so that the general case is explored below.
4.2. Try, try, try again attacks

One of the challenges with developing virus solutions is that it is somewhat like playing “Paper, Rock, Scissors” when your opponent has to move before you choose. In such a scenario, the player who moves first always loses against best play. Similarly, with current generic solutions a virus writer is free to study the detection mechanism at his or her leisure and then play to win.
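The “move last” advantage can be caricatured in a few lines: given oracle access to any deterministic detector, an attacker simply keeps mutating a sample until the verdict is clean. The detector and the mutation below are invented stand-ins (a fixed-signature check and a whole-body XOR); the loop, not the details, is the point.

```python
import itertools

SIGNATURE = bytes.fromhex("deadbeef90909090")  # hypothetical, as before

def detector(sample: bytes) -> bool:
    """Stand-in for any deterministic scanner the attacker can query."""
    return SIGNATURE in sample

def evade(sample: bytes) -> bytes:
    """Mutate and re-test until the detector stops firing."""
    for key in itertools.count(1):
        mutated = bytes(b ^ (key & 0xFF) for b in sample)  # toy mutation
        if not detector(mutated):
            return mutated

infected = b"prefix" + SIGNATURE + b"suffix"
print(detector(infected), detector(evade(infected)))  # True False
```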
This drawback in detection leaves the door wide open to attacks. At the simplest level, an attacker may develop a virus that evades the proactive detection of a particular scanner. At a higher level, a nation state or similar entity could obtain copies of all proactive solutions and develop a single attack that avoids them all. In such a circumstance, proactive solutions become worthless and the network once again becomes susceptible to a crippling attack.

5. Generic virus detection?

Given the challenges faced by solutions that are based upon static analysis techniques, anti-virus researchers have been placing renewed effort into providing solutions that allow new viruses to be reliably detected. Indeed, a brief examination of most anti-virus vendors' web sites provides a raft of information that might lead one to think that the problem was well under control.
Unfortunately, a more in-depth analysis of current generic techniques shows that this is far from the truth: the state of generic virus detection is very weak. In this section, we will show some of the techniques currently used for generic detection of viruses and illustrate their susceptibility to one or both of the attacks mentioned in the preceding section. In no particular order, we will therefore examine Checksummers, the Digital Immune System, Static Heuristic Analysis, Fuzzy Detection, the Norman Sandbox and Behavior Blockers.
5.1. Checksummers

Checksummers have been a vastly underused part of virus countermeasures, although they are somewhat tricky to deploy on a widespread basis. Essentially, a checksummer relies on the fact that a virus must change an object it infects. By creating a cryptographically strong hash of objects before infection, subsequent changes can be easily detected. Unfortunately, this strength of checksummers is also their biggest weakness: they are change detectors, not virus detectors. Furthermore, they rely on the system environment in which they are running, making them vulnerable to viruses that implement a high degree of stealth. Finally, they are highly reactive, letting users know that something occurred, but doing little to prevent further infection.
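A checksummer reduces to very little code: record a cryptographically strong hash for every object while the system is presumed clean, then flag anything whose hash later differs. The sketch below uses SHA-256 from Python's standard library; the paths and baseline format are illustrative assumptions. Note that, exactly as described above, it detects change, not viruses.

```python
import hashlib
import json
from pathlib import Path

def file_hash(path: Path) -> str:
    """SHA-256 of a file's contents - a cryptographically strong hash."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def make_baseline(root: Path, db: Path) -> None:
    """Record hashes while the system is presumed clean."""
    hashes = {str(p): file_hash(p) for p in root.rglob("*") if p.is_file()}
    db.write_text(json.dumps(hashes, indent=2))

def check_baseline(db: Path) -> list[str]:
    """Return every object that changed (or vanished) since the baseline."""
    hashes = json.loads(db.read_text())
    return [p for p, h in hashes.items()
            if not Path(p).is_file() or file_hash(Path(p)) != h]
```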
5.2. Static heuristic analysis

Static heuristic analysis is the process whereby a file is searched for certain tell-tale indicators of viral infection. This technique can be useful, as infected files are often fairly easily identifiable. Unfortunately, such heuristics fall to both “multiple attempt” attacks and EPO. In the former case, the virus writer can experimentally determine which factors trigger a heuristic response; in the latter, one simply needs to recognize that the entry point of the virus code must be located for meaningful static analysis to occur.
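A static heuristic engine is essentially a weighted checklist. The sketch below scores a file against a few indicators of the kind described above; the indicators, weights, and threshold are invented for illustration and deliberately crude. The “multiple attempt” attack then amounts to tweaking a sample until its score drops below the threshold.

```python
# Toy static heuristics: score suspicious traits, flag above a threshold.
# Indicators and weights are illustrative inventions, not field-tested rules.

HEURISTICS = [
    ("entry point in last section", lambda f: f["entry_in_last_section"], 40),
    ("writable + executable section", lambda f: f["wx_section"], 30),
    ("no import table", lambda f: f["import_count"] == 0, 20),
    ("high-entropy code section", lambda f: f["code_entropy"] > 7.2, 25),
]

def heuristic_score(features: dict) -> tuple[int, list[str]]:
    hits = [(name, w) for name, test, w in HEURISTICS if test(features)]
    return sum(w for _, w in hits), [name for name, _ in hits]

features = {"entry_in_last_section": True, "wx_section": True,
            "import_count": 3, "code_entropy": 6.1}
score, reasons = heuristic_score(features)
print(score, reasons, "suspicious" if score >= 50 else "clean")
```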
5.3. Fuzzy detection

Most computer viruses are not entirely new; rather, a virus writer often takes an existing virus and modifies it in some way. Given the similarity between many different virus variants, there are some advantages to imprecisely detecting particular viruses – that is, detecting large groups of viruses and potential variants with one signature provides some level of proactive detection of new viruses based upon the original sample.
However, while imprecise detection sounds attractive, it can pose some difficulties with respect to disinfection. Furthermore, such an approach is pragmatic, but not effective against a targeted attack: as the detection algorithm can be obtained by the virus writer by simple trial and error, creating a variant that goes undetected is not difficult. Thus, while fuzzy detection is a useful aid in prevention, we should not consider it to be in any way a cure-all.
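Imprecise detection is often implemented with wildcarded patterns: one signature with “don't care” gaps matches a whole family of variants. The sketch below uses a regular expression over raw bytes as a stand-in; the pattern itself is an invented illustration, not a real family signature.

```python
import re

# One "fuzzy" family signature: fixed bytes with wildcard gaps (invented).
# The wildcards absorb the small edits that separate variants of a family.
FAMILY_PATTERN = re.compile(
    rb"\x55\x8b\xec.{2,16}\xb8...\x00.{0,32}\xcd\x21",
    re.DOTALL,
)

def matches_family(data: bytes) -> bool:
    return FAMILY_PATTERN.search(data) is not None

variant_a = b"\x55\x8b\xec" + b"\x90" * 4 + b"\xb8\x01\x02\x03\x00" + b"\xcd\x21"
variant_b = b"\x55\x8b\xec" + b"\x90" * 9 + b"\xb8\x07\x08\x09\x00" + b"\x00" * 5 + b"\xcd\x21"
print(matches_family(variant_a), matches_family(variant_b))  # True True
```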
5.4. The Norman Sandbox

One highly effective solution to proactive virus detection is to run a file in a virtual environment and see what changes it makes on the system. Such approaches are becoming increasingly attractive as processors become faster, as it is now feasible to carry out such simulation in real time.
Furthermore, if one takes the changes and simulates the result of those changes, false positives can be dramatically reduced, as only files that have some properties of self-replication can be tagged. One excellent – and highly effective – example of such an approach is the Norman Sandbox.
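The self-replication test at the heart of such a sandbox can be caricatured in a few lines: run the sample in an isolated environment, collect the writes it makes, and flag it only if its own code shows up inside other executables afterwards. Everything here – the recorded-write format and the isolation itself – is assumed machinery; real sandboxes emulate the CPU and operating system.

```python
def self_replicates(sample_code: bytes, writes: dict[str, bytes]) -> bool:
    """Given the file writes observed while a sample ran in an isolated
    environment, flag it only if its own code reappears in other
    executables - mere file modification is not enough to tag it."""
    return any(
        path.lower().endswith((".exe", ".com")) and sample_code in data
        for path, data in writes.items()
    )

# Hypothetical observations from one emulated run:
observed = {
    r"C:\apps\editor.exe": b"MZ...host bytes..." + b"<sample body>",
    r"C:\docs\readme.txt": b"harmless change",
}
print(self_replicates(b"<sample body>", observed))  # True -> tag as viral
```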