Dynamic Analysis Roadmap
Background
The discipline of dynamic program analysis can be summarized as covering techniques for extracting information from program/system execution runs to conclude properties about the program/system. A program can be regarded as emitting an execution trace during its execution, and it is this trace that can be analyzed.
Dynamic analysis can be specification-based or algorithm-based. Specification-based analsysis consists of checking the trace against a formal specification. Some approaches try to go the other way and generate the specification from a set of runs. In this way the specification represents true facts about all these runs. Algorithm-based analysis is used when specific errors are looked for, such as data races and deadlocks. In this case special algorithms can be hardwired to find these specific errors and the user does not have to provide a specification.
An interesting more recent trend is the incorporation of runtime verification techniques into aspect oriented programming where aspects become specifications using temporal logic, regular expressions, grammars or state machines. In this form of languages it becomes possible to express statements of the form "when this trace predicate is true about the trace seen so far, execute this piece of code". As can be seen, this amounts to monitoring a program execution against a specification and activating repair code when violations occur.
The long term goal of this effort is to develop a theory and a set of tools for performing dynamic analysis in the context of a verifying compiler. One can imagine that a verifying compiler, given a program and a specification, is able to only prove parts of the specification, and that the "left-overs" are thrown to a dynamic analysis tool to monitor during test or even operation of the tool.
The short term goal is to stimulate the scientific community to gather around this subject to discuss possible problems and solutions. One of the initial focus points of the effort is the problem of how to instrument code and how to monitor a program execution against a formal specification. This includes a discussion of what are good specification languages for dynamic analysis, or how one performs dynamic analysis using existing specification languages.
Topics
The following topics (amongst others) are of interest.
Specification languages and logics
Specification languages and logics have in the past been developed that are suitable for model checking and theorem proving. Monitoring can however reveal new requirements and may lead to new forms of observation-based foundational logics. Monitoring will always have an impact on the monitored system, especially if the evaluation of the specification happens as part of the program. Hence execution efficiency suddenly becomes crucial for such logics, in contrast to what is the case for example in theorem proving logics.
Various forms of logics can be considered, ranging from assertions, pre-post conditions, invariants, to more temporal logics such as future and past time temporal logics, regular expressions, grammars, state machines and timed automata, to mention a variety. Different logics have advantages and disadvantages wrt. expressiveness and monitoring efficiency, and the goal is to find the choices that optimize both. For example, a propositional temporal logic (without data) is more efficient to monitor than a quantified temporal logic that can refer to data values, but the latter is much more useful for practical purposes.
Program instrumentation
In order to monitor the execution of a program or system, the program or system needs to be instrumented, either manually or automatically. Several automated program instrumentation systems have seen the light of day over the last 5-10 years, some working at the source code level and some working at the byte code level or object code level. A research topic is how to connect specifications with code fragments in a flexible and non-intrusive manner. Assuming that a specification consists of a set of properties, and each property refers to a set of predicates, the specification must also state how these predicates relate to the code. A recent development in this direction is seen within the aspect oriented programming community, where pointcut languages are being extended with trace predicates, in the form of for example temporal logic, regular expressions, grammars or state machines.
Fault protection
The idea behind aspect oriented programming is to associate actions with properties about the program: "when this happens, do this". This idea can be generalized to more sophisticated forms of fault protection, where a system is able to "heal" itself when an error occurs during its execution. Detecting that an error has occurred (through dynamic analysis) obviously becomes a central element, but also the diagnosis of the error and the subsequent repair may require sophisticated analysis. Planning systems can be regarded as an example of such systems, where a planner is activated to plan the next steps when it is regarded necessary to do so, either because the previous plan executed to its end, or because an error occurred during its execution. The relationship between general planning and traditional software engineering becomes relevant.
Monitoring of distributed systems
Monitoring a single sequential program requires only a simple instrumentation strategy (ignoring here the efficiency issue), since events are generated to the monitor in a sequential and unambiguous order. Monitoring a concurrent program running on one computer is slightly more complicated, requiring access to the monitor to be protected by locks to enforce mutual exclusion. However, the ordering of events is still sequential, although their order may depend on the way the scheduler operates. Monitoring a distributed system violates the sequentiality of events in the sense that an event E1 that arrives before an event E2 not necessarily occurred before E2. Techniques for dealing with this problem involves regarding the exection trace as a partial order of events rather than a total order.
Algorithm-based dynamic analysis
Some properties can be regarded as generic in the sense that they are desired of all (or most) programs. Examples are data race freedom and deadlock freedom. In this case specialized algorithms can be designed that can detect these errors, and even is some cases can detect these errors even if they do not occur in the examined execution. As it is often the case, when the errors become generic in this manner it is also easier to design static analysis techniques that can prove their absence or precence. Such algroithms usually do not require the user to provide a specification.
Dynamic reverse engineering
Reverse engineering normally has the meaning of automatic creation of specifications from existing code. By dynamic reverse engineering we mean the extraction of specifications from execution traces (runs). The DAIKON system is perhaps the most well-known example of this kind of system. The idea is to collect information from each program run and then on the basis of several runs to automatically establish a specification of what was true about all the runs. Such a specification can then provide a respresentation of behavior observed"so far". Such extracted specifications cannot be used as true specifications about the code, but they can perhaps be used to identify deviations from nominel behavior, or they can be fed into theorem provers as initial specifications as done in the case of DAIKON.
Relationship between static and dynamic analysis
As mentioned earlier, an important question becomes how to monitor a program/system with the least possible impact. For this purpose static analysis can be used to reduce monitoring overhead by statically proving "as much as possible" only leaving a reduced part of the specification to be monitored. Another way of viewing the same activity is to try to prove the specification and what cannot be proven is left over for runtime monitoring. No matter what way one looks at it, the goal is the same: to use static analysis to reason about the relationship between program and specification in order to minimize what has to be checked during runtime.
Work Plan
Creation of a Working Group
We intend to create an initial group of scientists interested in this subject. This group will discuss the details of the future plan and will collaborate to create a system over the next 5 years. By trying to build a system over the next 5 years, it will be possible to correct and improve the rersults in the following 10 years (out of a 15 year total time horizon). The group will initially consist of around 10 people but is open to all that are interested. See contact information below and contact us in case you are interested.
Setting up dynamic analysis infra structure
In order to make collacoration possible, we plan to focus on runtime verification techniques for the Java programming language. However, this decision might appear to be too limiting. We plan to to set up infra structure for instrumenting code. This infra structure can then be used for extracting execution traces from program runs. The infra structure can be used to support a competition between various runtime verification technologies. It is intended to collaborate with the aspect oriented programming community.
Contact Information
Klaus Havelund : http://www.havelund.com
Hassen Saidi : http://www.sdl.sri.com/users/saidi/
Henny Sipma : http://theory.stanford.edu/~sipma/
The Runtime Verification website : http://www.runtime-verification.org