Personal tools
You are here: Home VSR Private VerifiedByConstruction roadmap.tex
Document Actions

roadmap.tex

This is now available as Iowa State University TR#06-21, July 2006.

Click here to get the file

Size 76.6 kB - File type text/x-tex

File contents

% $Id: roadmap.tex,v 1.54 2006/07/21 21:56:59 leavens Exp leavens $
\documentclass{article}  
\usepackage[plainpages=false]{hyperref}
\usepackage{url}
\setlength{\textheight}{9in}
\setlength{\headheight}{.60in}
\setlength{\headsep}{.40in}
\setlength{\topmargin}{-1in}
\begin{document}
\input{tr-cover}
\title{Roadmap for Enhanced Languages and Methods \\ to Aid Verification}
\author{Gary T. Leavens, Iowa State University, USA, \\
Jean-Raymond Abrial, ETH Z\"{u}rich, Switzerland, \\
Don Batory, University of Texas, Austin, USA, \\
Michael Butler, University of Southampton, UK, \\
Alessandro Coglio, Kestrel Institute, USA, \\
Kathi Fisler, Worcester Polytechnic Institute, USA, \\
Eric Hehner, University of Toronto, Canada, \\
Cliff Jones, Newcastle, UK, \\
Dale Miller, INRIA-Futurs, Polytechnique, France, \\
Simon Peyton-Jones, Microsoft Research, Cambridge, UK, \\
Murali Sitaraman, Clemson University, USA \\
Douglas R. Smith, Kestrel Institute, USA, \\
Aaron Stump, Washington University of St. Louis, USA
}

\maketitle

\begin {abstract}
This roadmap describes ways that researchers in four areas ---
specification languages, program generation, correctness by
construction, and programming languages ---
might help further the goal of verified software.
It also describes what advances the
``verified software'' grand challenge might anticipate or demand from
work in these areas.
That is, the roadmap is intended to help foster collaboration between
the grand challenge and these research areas.

A common goal for research in these areas
is to establish language designs and tool architectures that would
allow multiple annotations and tools to be used on a single
program. In the long term, researchers could try to unify these
annotations and integrate such tools.
\end {abstract}

\section{Introduction}

Hoare has proposed a grand challenge project, formerly called the 
``verifying compiler'' grand challenge \cite{Hoare03}, and now called
the ``verified software'' grand challenge by Hoare, Misra, and Shankar
\cite{Hoare-Misra-Shankar05}.
The original idea was to automatically check correctness of programs
that are ``specified by types, assertions, and other redundant
annotations.''  However, the current version of the grand challenge
recognizes the possibility of many tools, some of which may require
human intervention or assistance.  In any case, verification would be
based on the text of the program and the annotations contained within it.

\subsection{Audience}

This report is addressed to two audiences.  The first is researchers
interested in program verification, especially related to 
the ``verified software'' grand challenge.
The second is researchers in the following areas:
\begin{description}
\item[specification languages] that describe behavior or properties to
be verified, 

\item[program generation] that automatically synthesizes code,

\item[correctness by construction] that concerns development and documentation
of implementations especially to facilitate verification, and

\item[programming languages] that describe algorithms and data.
\end{description}
The report is addressed to researchers in these four areas
who are interested in verification, specifically how their work might
aid the verifying software grand challenge.
This report explains what these four areas 
might do to help the overall grand challenge project
and thus foster the goal of verified software within the scope of the
grand challenge project.
It is not intended to suggest an overall research agenda for any of
these areas. 

\subsection{Motivation}

There are many approaches to verification, all of which are embraced
by the grand challenge effort.  One can write or find code and verify
it using a variety of tools and approaches.  

While recognizing the value of many approaches to producing verified
software, researchers in the four areas mentioned above are often
motivated by the idea of gaining benefits (in ease, productivity, or power of
verification) by providing the verifier with more information than
just a bare program in some standard programming language.
Verifying a bare program after-the-fact has the following fundamental problems.
\begin{itemize}
\item
Without a specification or some annotations in the code, the
properties that one can verify must be implicit and thus very weak,
such as that the program will not crash or throw exceptions.  

\item
Even with a specification, a program can be
arbitrarily difficult to verify (due to lack of modularity or other
artificial complexities).
\end{itemize}

With regard to the first point,
even adding some partial specifications makes
the verification problem more interesting and the results more useful.
This is a potentially valuable technique for legacy code.
For example, one might specify that a function
returns a list of length equal to its input, which is 
only a partial specification of what the function does.
Indeed, there is an entire spectrum of properties that one might
consider, as shown in Figure~\ref{fig:spectrum}.
So there is not necessarily a unique best
specification for a function, since some kinds of properties,
such as resource consumption, its behavior in a transactional setting,
its real-time behavior, and so on, may best be thought of as outside
of the traditional specification of functional behavior.

\setlength{\unitlength}{1mm}
\thicklines
\begin{figure}[ht]
\begin{center}
\begin{picture}(100,13)(0,2)
\put(5,14){\vector(1,0){95}}
\put(5,14){\vector(-1,0){5}}
\put(3,5){\mbox{No crashes}}
\put(7,1){\mbox{No type errors}}
\put(23,5){\mbox{No exceptions}}
\put(33,10){\mbox{No deadlock}}
\put(50,5){\mbox{Preconditions}}
\put(50,2){\mbox{hold}}
\put(78,8){\mbox{Control and}}
\put(78,5){\mbox{functional}}
\put(78,2){\mbox{behavior}}
\put(96,5){\mbox{$\ldots$}}
\end{picture}
\end{center}
\caption{A spectrum of specification properties, from partial
  specifications on the left to more complete specifications on the right.}
\label{fig:spectrum}
\end{figure}

With regard to the second point, researchers believe that information
about design decisions made in the program's development can be of
great use to the verification process.  Well-known examples are
annotations for loops and object invariants, but information can also
be obtained from the process of generating a program (up to and
including a complete proof), and the process of constructing a program
and its proof hand in hand.  Intermediate modeling and refinement
steps are also believed to greatly aid verification and may in the limit
constitute a proof. Types in programming languages can also be
augmented with additional information related to correctness proofs,
and other program annotations, such as those describing ownership in
the heap, can be of great value.  To summarize, the motivation for all
these areas is to make such information available to a verifier.

\subsection{Limitations}

The ``research roadmap'' that follows is limited in several ways.

First, the roadmap focuses on the four research areas named above and
their relation to verification.  Other techniques and research areas
related to verified software are largely ignored. Furthermore, although
there are many ways in which these four research areas
might aid the general goal of more reliable software, this roadmap
only focuses on 
the specific ways that these areas might produce verified or
more easily verifiable software in the context of the grand challenge
project.  Much research is already going on in all of these areas to
promote more reliable software, and such research would also
contribute, indirectly, to the goal of making software easier to
verify.  However, discussing all such research would lead to a very
broad survey which would be of less use to the verified software
grand challenge.

The second way in which our roadmap is limited is that it has only
(thus far) drawn on the expertise of a very small sample of
researchers in each of the research areas.\footnote{
However, it also reflects feedback from the members of IFIP working
group 2.3, the mini-conference on verified software April 1--2,
2006 held at SRI, and the Dagstuhl workshop on ``The Challenge of
Software Verification'' July 10--13, 2006.
}
The authors of this report
were selected in the following way.  A conference on the verified
software grand challenge was held in Z\"{u}rich Switzerland in October
2005 \cite{VSTTE05}.
At that conference, the organizers --- Hoare, Shankar, and
Misra --- picked leaders for three committees to write research
roadmaps.  Leavens was picked to lead the committee writing this
report.  Leavens in turn picked the committee members, intentionally
aiming for a small committee, using a selection that was biased
toward people who had attended the conference in Z\"{u}rich.

Finally, the preceding limitations result in limitations on the
applicability of our roadmap.  First it is biased toward research
directly related to the verified software grand challenge.
Second, since the committee is small compared to the number of
researchers in the four research areas,
this report does not necessarily represent a
consensus of the researchers in any of the four research areas.

\subsection{Outline}

The next section gives some background about verification problems and
challenge problems.
Following that, Section~\ref{sect:commongoal} describes
the common goal of the four areas with respect to the
grand challenge, that is, what they might, overall, provide to it.
Sections~\ref{sect:sl}--\ref{sect:pld} describe the more specific
needs and potential research directions in each of the four areas.
Section~\ref{sect:conclusions} concludes this report.

\section{Background}
\label{sect:needs}

This section gives some background on verification problems and lays
out some needs that researchers in the four areas have for challenge problems.

\subsection{Verification Problems}

An enhanced language or tool is intended to work on some class of
verification problems.
A precise way to state a such class of \emph{verification problems} is to
describe:
\begin{itemize}
\item
a specification language, in which to state the assumptions and
guarantees of a correct implementation, and

\item
a programming language, in which to implement the specifications, and
whose code is to be verified.
\end{itemize}
A specification in the specification language together with a program
in the programming language constitute a problem for a verification
system.  A pair of a specification and programming language describe
the set of possible such problem instances that such a system should
be able to handle.

The specification language and programming
language might be integrated; there is no need to have two separate
languages.  Some examples of integrated languages are 
Gypsy \cite{Ambler-etal77},
Alphard \cite{Hilfinger-etal78,London-Shaw-Wulf76,Shaw81a},
Euclid \cite{Lampson-etal81,London-etal78},
Eiffel \cite{Meyer92b,Meyer97}, Resolve \cite{Sitaraman94},
and SPARK \cite{Barnes03}.

For various reasons the grand challenge project has not articulated,
and will probably not articulate, constraints on what verification
problems are of interest.  But verification problems of interest will
be described indirectly, through challenge problems. 

\subsection{Challenge problems}

Challenge problems can help stimulate research, especially in the
short term.  The following are some suggestions for such challenge
problems. 

To reward research that can handle problems of significant size, the
challenge problems should be big enough to require reusable modules
and structuring (at multiple levels).  

Challenge problems at a minimum need to have explicitly stated
(informal) requirements.  It will also be helpful to have
formal requirement models. 

A formal specification of the properties of interest for each
challenge problem is also needed by each of the four areas.
Those working in specification languages
could use the formal specification as a baseline for case studies that
compare their work against the notation used to state the properties
of the challenge problem.
The other areas need a formal specification as a starting point for
certain kinds of research.

As a practical matter, and as an aid to those working in all four
areas, challenge problems should also come with test cases.

To aid work on programming languages and some researchers in the
correctness by construction approach, it would also be helpful to
provide well-tested candidate implementations with each challenge problem.
Such implementations would be useful to researchers in programming languages,
who could try to devise alternative implementations or languages that
would allow easier verification of implementations.

\section{Common Goal: Verifiable Artifacts}
\label{sect:commongoal}

To set out goals for the four areas, we make some assumptions.
The main assumption is that the grand challenge is
interested in \emph{at least} the following:
\begin{itemize}
\item
specification of safety properties 
(e.g., the relation between inputs and outputs, lack of deadlock), and

\item
imperative programming languages (such as Pascal or C), including
object-oriented languages (such as Java).
\end{itemize}

On the one hand, although it is non-trivial and of some economic importance,
this is a rather small class of verification problems.
For example, most imperative programming languages have only limited
support for concurrency (e.g., threads in Java), but different models
of concurrency may become increasingly important in the next several years.
On the other hand it is still perhaps too large, because it
encompasses the entire spectrum of safety properties, including
everything in Figure~\ref{fig:spectrum}.
The reader should keep in mind that the grand challenge project may
indeed be interested in other kinds of specifications and programs.
In that case this report will most likely be missing some potentially
interesting research directions.

Assuming the goal of the project is to build tools that will be able
to handle at least verifying safety properties for imperative
languages, we see the following short-term and long-term
goals that are shared across the four areas. 

\subsection{Short Term: Extensible Languages and Tools}

In the short term (i.e., in the next 5-7 years), a common goal is to
allow for extension of tools and languages by other researchers (and
ultimately, by users). 

For specification and programming languages, this means
designing languages so that other researchers (and ultimately users)
can add new specification notations and new annotations to aid in
verification proofs.
These languages should allow specifications to be added (and proved)
incrementally.\footnote{In addition to the utility of such annotations in
verification, the more properties one proves, the more confidence one
has in a program.  This is an additional motivation for the goal of
allowing language extension.}

Such extensions should ideally not just describe syntax,
but also have access to information from the language processor
(e.g., a compiler).
User-extensible annotation mechanisms, such as those found in C\# and
Java may be a useful technique for achieving parts of this goal.

In all four areas, tool builders should strive to define
architectures that will permit other researchers to easily add new
specifications and other proof-oriented annotations, that will enable
other tools to cooperate on verification of the same program.
XML may be an aid for achieving parts of this goal.
Overall, the idea is to recognize that no one tool will have all the
necessary features for attacking all parts of a difficult verification
problem.  Tool (framework) builders
should make it easier to build new tools or extend existing tools.
This in turn will help other researchers gain much 
needed experience with their approaches, but at a lower cost.

Since efforts in building extensible tools can have a multiplicative
effect in enabling research, such efforts should be highly
encouraged by the project.

\subsection{Long Term: Unification}

In the long term (8-15 years), researchers should attempt some
consolidation of various languages and tools in their areas.
This is desirable because the software industry does not want to deal with
many different languages, notations, methods, and tools.
Furthermore, it is also theoretically unsatisfying to have to explain
a wide diversity of approaches.  Thus, while research will continue to
make progress by exploring a wide range of approaches to attacking
verification problems, in the second half of the project
some researchers should also build on and consolidate the ideas of
several tools and languages. 

\section{Research in Specification Languages}
\label{sect:sl}

This section was mainly written by:
Gary T. Leavens, Kathi Fisler, Cliff Jones, Douglas R. Smith, and
Murali Sitaraman. 

\subsection{Need for Specification Languages}

Research in (formal) specification languages is central to the grand challenge,
because interesting verification problems contain interesting
specifications.  Thus the grand challenge
project needs at least one specification language for stating
the properties that are to be verified in the class of verification
problems of interest.  Even if the class of verification problems only
encompasses very weak or partial specifications, such as those on the left side
of Figure~\ref{fig:spectrum}, there will still be the need for a
specification language (although in the extreme case, the
specification language might be trivial in the sense that it contains
just one sentence: ``the program should not crash'').

\subsection{Assumed Scope}

Since it is not clear what properties are of interest to the grand
challenge, this section assumes that the set of properties of interest
includes \emph{at least} safety properties for sequential and concurrent
programs.  That is, the remainder of this section assumes that the
grand challenge is interested in specifying at least:
\begin{itemize}
\item
assertions about states and data values, which allow one to
describe the functionality of procedures in imperative programming
languages, and

\item
properties of the history of events in a program's execution.
\end{itemize}

\subsection{Background: Kinds of Specification Languages}

This section defines terms used in
the description of short-term and long-term research directions,
particularly about different kinds of specification languages.

Specifications can be stated at many different abstraction levels.  At
the highest level of abstraction are \emph{requirements}
\cite{vanLamsweerde00},
which describe the behavior of entire programs
from the end-user's perspective, often including non-functional
properties, such as cost or time.  Requirements are initially
informal, but may  be (partially) formalized later.
What we hereafter refer to as \emph{specifications\/}
are statements that may describe or refer to a program's internal
states or events, which may not be directly visible to a program's user.
Such statements are usually formal and
describe a class of programs or program modules (components) that
have a design with features that can be related to the internal states
or events mentioned in the specification.
Thus what we call specifications are at a level of abstraction that is
more relevant to the detailed design of a program.
Such detailed-design specifications are capable of documenting
interfaces of individual program modules, such as procedures or
classes \cite{Wing90a}.

One technique for writing such specifications is algebraic 
\cite{Guttag-Horning78,Goguen-Thatcher-Wagner78,Ehrig-Mahr85,Futatsugi-etal85},
in which one writes axioms that relate operations to other operations.
While the early papers described non-imperative examples,
this technique has also been adapted to specification of imperative code
\cite{Boehm85,Goguen-Malcolm96,Hoare-etal87}.
The CLEAR language \cite{Burstall-Goguen77,Burstall-Goguen80},
which provides category-theoretic foundations for the structuring and
refinement of algebraic specifications.  In CLEAR, specification
morphisms are used to structure specifications, and colimits serve to
compose specifications.  Later examples of this approach include Specware
\cite{Specware04} and CASL \cite{Bidoit-Mosses04}.

Another technique for writing such specifications is the pre- and
postcondition style originated by Hoare \cite{Hoare69}.
In this technique, 
if a purely mathematical language, such as higher-order logic 
(as in the PVS theorem prover \cite{Owre-etal95} or
Isabelle/HOL \cite{Nipkow-Paulson-Menzel02})
or temporal logic \cite{Manna-Pnueli92} is used for specification of a
program, there must be some abstraction function (or relation) that
maps the states or events in the program's execution to the abstract
states or event models that the specification's formulas mention
\cite{Hoare72a,Lamport89,Zave-Jackson97}.
Many behavioral specification languages, such as VDM \cite{Jones90}, Z
\cite{Spivey92}, Object-Z \cite{Rose92}, and OCL
\cite{Warmer-Kleppe99} 
have more structuring mechanisms, many of which
resemble structures (such as procedures and classes) in programming
languages.  
% [[[Same for property-event languages?  Promela or whatnot?, refs]]]
Besides helping structure larger specifications,
such mechanisms constrain
what kinds of abstraction functions are considered in proofs.

Carrying these structuring mechanisms farther, by writing
specifications as annotations to programs in
some particular programming language, yields an
\emph{interface specification language} \cite{Wing87}.
In such a language, a correct implementation must have both the
specified interface and specified behavior (or properties), and thus the 
relation between a program's state (or events) and the abstract state
(or events) described by the specification is much more tightly constrained.
Examples of behavioral interface specification languages include
the Larch family \cite{Guttag-Horning93,Wing87},
the Resolve family \cite{Edwards-etal94,Ogden-etal94},
SPARK \cite{Barnes03},
Eiffel \cite{Meyer92b,Meyer97}, JML \cite{Burdy-etal05,Leavens-etal05},
and Spec\# \cite{Barnett-etal04,Barnett-Leino-Schulte04,Leino-Mueller04}.
Examples of history-based interface specification 
languages include Bandera \cite{Corbett-etal00} and Java Pathfinder
\cite{Havelund-Pressburger00}.
Interface specification languages, with their close relationship to a
programming language, seem likely to be important for the grand
challenge, especially in the short term.

\subsection{Short-Term Research Goals}

The following are some short-term (5-7 years) research goals for
specification language research.

\subsubsection{Open Languages and Tools}

Specification languages should be designed to be extensible and open,
so that researchers can more easily experiment with variations and
extensions.  Tools for specification languages, such as type checkers
or verification condition generators, should also be designed with an
architecture that makes for easy variation and extension.
Tools should also allow different analysis and verification systems
easy access to and manipulation of specifications, as these will aid
the verification efforts of the grand challenge. 

\subsubsection{Reasoning about Partial Specifications}

Tools for specification languages should make it easy to state and
prove logical consequences of specifications.  These can be used both
for debugging specifications and for proving connections with
formalizations of requirements, etc.  It should not be necessary to
have a complete specification in order to do such reasoning; in other
words, it should be possible to reason about partial specifications in
which many parts are underspecified, to permit early debugging of the
specification. 

\subsubsection{Refinement}

Tools for specification languages should make it easy to state 
refinements between specifications
\cite{Back-vonWright98,Hehner93,Morgan94,Morgan-Vickers94,deRoever-Englhardt98,Specware04}.
There should be automated support for both debugging and proving such
refinements, using techniques such as model checking for finding
problems with proposed refinements.
Section~\ref{sect:cxc} discusses both the posit-and-prove
and transformational approaches to proving refinements, and how 
these techniques can aid verification.

\subsubsection{Modularity and Reuse}

Specification languages should permit modular
descriptions of reusable interfaces.
While verified software does not have to be reusable,
reusable modules can make it easier to develop larger and more
interesting verified software.

\subsubsection{Specification of Resources}

If non-functional properties, such as time and space consumption, are
of interest to the grand challenge, then
specification and reasoning techniques for such nonfunctional
properties \cite{Hehner98,Krone-Ogden-Sitaraman01,Sitaraman-etal01}
should be further developed 
and integrated with other kinds of specification.

\subsubsection{Interface Specifications}

The design of interface specification languages poses some special
problems.  

\paragraph{Specification and Translation of Assertions}

Experience with Eiffel \cite{Meyer92b,Meyer97} and Larch seems to
suggest that programmers may find that specification languages like
Eiffel, in which assertions are written in the syntax of the
programming language, are easier to use than Larch-style
languages. (See also Finney's study of mathematical notations
\cite{Finney96}.)  However, other efforts in teaching mathematical
specifications to undergraduate students appear to be quite
successful, suggesting that the exact notations and language might
play a significant role in ease of understandability and use
\cite{Sitaraman-etal01a}.  Thus one research problem is to understand the ease
of use of different specification notations (both in practice and for
use in verification).

Another research problem is to study how to
translate assertions in different languages into logical formula that
are useful in reasoning (e.g., in a theorem prover)
\cite{Darvas-Mueller05,Leavens-etal05}.

\paragraph{Heap Structuring}

Better techniques for heap
structuring, using concepts such as ownership % [[[refs]]]
seem to hold promise for aiding verification of pointer-based and
object-oriented programs. 
At the very least, some way to prevent representation exposure
\cite{Liskov-Guttag86,Noble-Vitek-Potter98} seems
necessary to do modular reasoning about frame axioms and
invariants \cite{Kassios06,Mueller02,Mueller-Poetzsch-Heffter-Leavens03}.
Heap structuring also seems helpful for making sense of object
invariants in systems built from abstraction layers
\cite{Barnett-etal04,Leino-Mueller04,Mueller-Poetzsch-Heffter-Leavens05}.

It may be that other simplifications in reasoning can be obtained by 
introducing specifications that further restrict heap structures, for
example, to cycle-free-pointers, where such restrictions
are appropriate (e.g., in the implementation of lists and trees).
What are the right techniques for specifying such restrictions and
what kinds of reasoning benefits are obtainable?

\paragraph{Assistance in Writing Specifications}

To verify large programs that use many modules and libraries, it is
often necessary to specify large libraries or code.
Many such specification tasks are quite labor-intensive and somewhat
unrewarding intellectually.  Some automation would help.  Tools like
Daikon \cite{Ernst-etal01,Nimmer-Ernst01} and Houdini
\cite{Flanagan-etal01} have demonstrated that it is possible to
recover some formal specifications from code using various
heuristics.  It might be interesting to infer
specifications from examples or directly from test cases.
A research goal would be to have such tools work with
user-specified abstractions, so that they could be used to more
quickly write more abstract specifications.
Or perhaps some automatic abstraction heuristics could be used.
An environment for writing specifications could allow users to edit
out some cases in a specification, to achieve more abstraction by
underspecification. 

\paragraph{New Language Features}

If more advanced programming languages are of interest to the grand
challenge project, then how to specify properties of programs that use
advanced features, like advice in
aspect-oriented languages, will be important.

\subsection{Long-Term Research Goals}

The following are some longer term (8-15 years) goals for
specification languages.

\subsubsection{Integration of Data and Control}

An important challenge for specification language design is to integrate
the two disparate worlds of state-based and history-based (or
event-based) specification languages.  Typically, specification
languages either focus on sequential programs and describe properties
of data values, or they focus on concurrent programs and
described properties of event histories.
However, complete verification of concurrent programs demands
reasoning about both data and control.
Some potential approaches are to use atomicity
\cite{Liskov-Weihl86,Rodriguez-etal05} or
to use transitions over relations. % [[[refs?]]]

\subsubsection{Traceability}

Links between requirements and detailed design specifications should
be able to be explicitly stated and reasoned about.  One approach may
be to develop techniques for stating and proving refinement
relationships between (particular pairs of) requirement and
specification languages.  Another approach might be to design
languages that are good both for formalizing requirements and
for specification of the detailed design.

\subsubsection{Tool Frameworks that Support Integration}

Frameworks that would make it easy to build tools for specification
languages and to integrate different tools for reasoning about
specifications should be a long-term goal.
Integration among reasoning tools, such as model checkers and theorem
provers, would also be helpful.  % [[[Relation to verification bus?]]]

\subsubsection{Interface Specification Language Design}

A theory of how to design interface specification languages should be
developed that allows a new specification language to be
quickly designed for a new programming language, at least within a
fixed set of programming paradigms.
Ultimately such a theory should extend beyond the imperative and
object-oriented paradigms to other paradigms of interest to the grand
challenge.

Along the same lines, it may also be useful to understand how to
tailor the design of such a language to a specific architectural style.
This would potentially help with verification of programs written in
such styles. % [[[refs?  Something by Kramer and MaGee was mentioned,
             % but I'm not sure what...]]]

\section{Research in Program Generation}
\label{sect:pg}

This section was mainly written by:
Gary T. Leavens, Don Batory, Alessandro Coglio, and Douglas R. Smith.

\subsection{Background on Program Generation}

A program \emph{generator} \cite{Czarnecki-Eisenecker00}
is a tool that produces code from some higher-level
description of the code. Conventional compilers for languages such as C and
Java fit this characterization, because they generate lower-level assembly or
bytecode from higher-level programming languages. However, the term ``program
generator'' is typically used for tools that produce code in relatively
high-level languages such as C and Java, and where the higher-level
description of the code is a specification. Nonetheless, we do not
rule out the view of compilers as generators; in fact, the research directions
advocated here apply to compilers as well.

A program generator operates on the syntax of the source
(specification) and target (code) languages. Roughly speaking, the
generator reads the specification and writes 
the code, i.e.\ it \emph{transforms} the specification into the code. Program
generators are often written in conventional languages such as C or Java; they
manipulate data structures that encode abstract syntax trees of the source and
target languages. The pattern matching featured by languages like ML and
Haskell provides a convenient way to implement syntactic transformations.
Languages like Refine \cite{ReasoningInc06} and Stratego
\cite{Stratego06} provide even 
more convenient features to implement syntactic transformations in a more
declarative way, by means of rewriting rules, strategies, and
quotation/anti-quotation pattern matching.

\subsection{Relation to Model-Driven Development}

The premise of Model-Driven Development (MDD)
\cite{Batory06,Booch02,Turk-etal02}
is that a program has
multiple representations, expressed as models.  Transformations will
update models and map models to other models, and compose
models.\footnote{
Thus, roughly speaking, a model is an object and a transformation is a
method.
}
Since code is the most important kind of model in MDD, MDD 
falls within the scope of the program generation area.

\subsection{Motivation for Program Generation}

Program generation is useful for at least two reasons
\cite{Czarnecki-Eisenecker00}.
One is \emph{productivity}: instead of writing the code directly, the developer
writes and maintains the specification, which is supposedly shorter
and easier to read and write than the code. The other reason, which is
more relevant to our context, is that the code can be generated in
such a way as to be \emph{automatically verified\/}; that is, it will be
correct with respect to the specification.
The research directions advocated here aim at automatic verification.

Program generation also fits well with the use of software product lines.
A software product line describes a family of programs
\cite{Batory-Sarvela-Rauschmayer04,Czarnecki-Eisenecker00}. 
Using a product line gives a significant reduction
in artificial complexity, more regularity and structure in a program's
modules, and leads to modules are more likely to encapsulate increments in
program functionality. All three are key requirements for module
reusability, large scale synthesis, and verification.
Showing how to verify software product lines 
would illustrate the connection between scale, design, and verification.

\subsection{Problem: Verified Program Generation}

The problem is that even when using the most declarative syntax
transformation languages available, the semantics of the source
specification and of target code are not 
directly ``represented'' in the program generator. Thus, it is very possible
to generate code that is incorrect with respect to the specification, by doing
``wrong'' syntactic transformations.  Achieving correctness is thus
the overriding research problem for program generation with respect to
the grand challenge.

\subsection{Problem: Scalability}

There has been significant progress in algorithm synthesis and
automatic design optimization \cite{Smith90b},
especially in restricted domains;
examples include Planware \cite{Becker-Gilham-Smith03}, Amphion 
\cite{Stickel-etal94}, and AutoBayes \cite{Fischer-Schumann03}.  
While continued progress in the generation of moderate size programs
can be expected, a scalable approach to program generation must also
focus on how to generate verified compositions of reusable modules.  A
vast majority of practitioners and researchers who are automating
parts of program development are building tools that are compositional
in nature. COM, Java server pages, and Enterprise Java Beans are examples.
% [[[refs]]]
These tools stitch code modules together to synthesize
larger modules. Most code modules are written by hand, but some (e.g.,
parsers or boiler-plate interfaces) are generated by simple tools. In
effect, the specification languages for these code synthesizers 
are akin to module interconnection languages.

A module is more than just code; it encapsulates several different kinds of
information: specifications, code, formal models from which properties
can be inferred, documentation, performance models, etc. 
Specifications and performance models are especially important for
verification.  It is thus important to synthesize such information for
generated compositions of modules \cite{Batory-Sarvela-Rauschmayer04}.

A well-known example of the above is the work on query optimization in
relational databases \cite{Selinger-etal79}. An optimizer maps a
declarative specification (e.g., a SQL SELECT statement) to an
efficient implementation. A SELECT statement is first mapped to a
relational algebra expression, the expression is optimized, and then
code is generated from the optimized expression. Each relational
algebra operation is a module, and a relational algebra expression is
a composition of modules that represents a query evaluation
program. Each module (operation) encapsulates two different
representations: a performance model (which evaluates the efficiency
of the operation) and code (to implement the operation). The query
optimizer uses only the performance model of an operation to deduce
the most efficient composition. The program synthesizer uses only
the code representation to generate the implementation. A
similar organization (i.e., modules containing multiple formal models)
will be needed for program verification.

\subsection{Short-Term Research Goals}

The following are some short-term (5-7 year) research goals in the
area of program generation.

\subsubsection{Formalizing Language Semantics}

The first step to establish the correctness of generated code is to
formalize the semantics of the source and target language, along with a notion
of what it means for an artifact in the target language (the code) to be
correct with respect to an artifact in the source language (the
specification).  For example, the correctness notion could be that the
two artifacts have the same observable behavior (where the notion of
observable behavior must be also formalized). These formalizations
should be developed in a suitably expressive logical language with a
formal proof theory, such as the languages found in mundane theorem provers.
Examples include Project Bali \cite{Nipkow-Oheimb-Pusch00}
and the LOOP Project \cite{Jacobs-etal98,vandenBerg-Jacobs01}, both of
which formalize Java.

\subsubsection{Tool Development}

Current (meta-)languages and tools \cite{ReasoningInc06,Stratego06}
do not deal with
the semantics and proof aspects of transformations, but only with their
syntax.
Thus, an important research direction is to design
languages and tools, by which one can more directly
represent semantics and generate proofs and code in an integrated fashion.
% [[[Any research to point to along these lines?]]]

\subsubsection{Certified Code Generation}

Instead of directly verifying the generator, a promising
approach is to have the generator 
produce, along with the code, a machine-checkable proof of the correctness of
the output code with respect to the input specification
\cite{Coglio-Green05,Colby-etal00,Narayanan-Karsai06}.
The proof should use the inference rules of the logical language in
which the semantics of source and target language, as well as the notion of
correctness, are formalized.

Then, as in the well-known proof-carrying code technique
\cite{Necula97}, the proof is checked by a simple proof checker, so
that trust is shifted from a large and
complex generator to a small and simple checker.

\subsubsection{Transformation Patterns}

Proof-generating transformation patterns, which will emerge from
applying program generation in practice should be cataloged;
e.g. taxonomies of algorithm theories and datatype refinements
\cite{Smith99}. These catalogs will help others apply the ideas and build
tools more quickly.

\subsubsection{Better Algorithms to Aid in Program Generation}

To apply general design
principles and transformations to a concrete specification requires
some analysis (to verify applicability conditions) and constructive
inference (to extract expressions to fill in design templates).

More practical program generation  
requires low-order polynomial time algorithms for analysis and
constraint solving.  A promising approach is to compose
constraint-solvers and decision-procedures for various specialized
theories.  % [[[refs?]]]
Static analysis can also sometimes provide a fast
alternative to search-based theorem provers.

\subsection{Long-Term Research Goals in Program Generation}

The following are some long-term (8-15 year) goals for research in
program generation.

\subsubsection{Scalability}

To allow scalability of program generation, techniques for generating
compositional, well-structured designs are needed in each application
domain.  A complementary need is for techniques for composing
properties, specifications, and other non-code information in
modules.  It must be clear how such compositions preserve (or
affect) properties of interest.

\subsubsection{Taxonomy of Proof-Generating Transformations}

A collection of proof-generating patterns (or templates) should be
made into a library, categorized by various dimensions, such as
application domain, source and target language, etc.  This knowledge
would make it easier to develop future program generators.

\subsubsection{Better Tools and Frameworks}

Researchers could design better languages, tools, and frameworks, to
ease the task of building future program generators.
Such tools could both more directly support proof generation and could
also ease the proof of correctness for the program generator itself.

Such tools and languages could also more directly support
proof-generating patterns. 

\subsubsection{Factoring the Certification Process}

Establish sound techniques for incorporating formal proofs into the
certification process for program generators, in order to eliminate
some testing and reduce the need for other kinds of testing.
(Current practice is to perform extensive and expensive
testing, both to validate the generated code's functionality and performance,
and to test for vulnerabilities and flaws along various code paths.)
Given a complete specification from which the code is generated, together
with a proof of consistency between code and specification, there
should be little need to perform path testing to reveal flaws.  There will
still be a need to test that the specification meets intentions, but
that can be a more specialized activity.  Also, those requirements that are not
treated during generation or refinement (e.g. performance concerns) would
also still need to be tested.

\subsubsection{Allow Update of Running Systems}

For embedded systems, it is often necessary to update (fix) the
code while the system is running.  Supporting such updates in a system
where code is generated may be a matter of generating the code to
allow for eventual update.

\subsubsection{More Manual Control}

To allow users to operate outside a limited domain to some extent,
program generators could be designed to allow more manual input,
making them a blend of a program generator and a correctness by
construction system, as described in the next section.

\section{Research in Correctness by Construction}
\label{sect:cxc}

This section was mainly written by:
Michael Butler, Gary T. Leavens, Eric Hehner, Murali Sitaraman,
Jean-Raymond Abrial, and Cliff Jones.

\subsection{Motivation}

Much discussion on the need for a powerful program verifier seems to
contain the following underlying assumptions:

\begin{itemize}
\item
That a program verifier will be used mostly to verify completed programs.

\item
That when verification fails it is because the program contains errors.
\end{itemize}

While a powerful program verifier is a very valuable tool for
programmers, it does not help them construct a correct program in
the first place, nor does it help document and explain decisions
(e.g., those motivated by efficiency considerations)
made in existing code.

Equally important, the correctness of any verification is dependent
on the validity of the formal properties against which a program is
checked.  Since we cannot, in general, guarantee that such properties
are what users really want, we will, in the remainder of this section
use the phrase ``verification by construction,'' instead of
the more common phrase ``correctness by construction,'' to emphasize
the potential problems with the initial specification.

The verification by construction approach % [[[refs needed here]]]
helps developers who want to construct verified software systems by
addressing the following questions:

\begin{description}
\item[Q1]
How do we construct models and properties against which to verify our
software?

\item[Q2]
How do we ensure that our models and properties properly reflect the
requirements on the system?

\item[Q3]
How do we take account of the environment in which our software is
intended to operate?

\item[Q4]
How do we construct our software so that the verification will succeed?
\end{description}

In the following, we will largely ignore question Q2, since it too
large and important to be included in our grand challenge; it would
constitute a grand challenge on its own.

As can be seen from the other questions, the verification by
construction approach broadens the focus away from just verifying
a finished \emph{product} to analysis of
models at all stages of the development \emph{process}.
It encourages verification of designs and not just verification of
programs.  Verification of designs may lead to a greater payoff than
just verifying programs.  Introducing formal modeling early in the development
cycle helps to identify problems earlier, long before any code is developed,
thus helping to avoid expensive later rework.

As well as supporting verification of designs and implementations, the
formal modeling languages used in verification by construction
encourage a rational design process. We contend that
the use of good abstractions and
simple mathematical structures in modeling, and reuse of modules
with specifications can lead to cleaner, more
rational system architectures that are easier to verify (and maintain)
than architectures developed using less disciplined approaches. 
% [[[Any references to support this assertion?]]]

\subsection{How is Verification by Construction Achieved?}

Verification by construction can be achieved by having a formal
framework in which models are constructed at multiple levels of
abstraction; each level of abstraction is refined by the one below,
and this refinement relationships is documented by an abstraction
relation (typically in the form of a gluing invariant)
\cite{Abadi-Lamport88,Abrial96,Back-vonWright98,deRoever-Englhardt98,%
Hehner93,Jones90,Lamport94,Morgan94,Morgan-Vickers94,Morris80}.
The highest levels of
abstraction are used to express the required behavior in terms of the
problem domain. The closer it is to the problem domain, the easier it
is to validate against the informal requirements, i.e., ensure that it
is the right specification. The lowest level of abstraction
corresponds to either an implementation, a specification from which an
efficient implementation can be derived automatically, or to a
specification realized in hardware.

Also critical in this framework are mechanisms for composing and
decomposing models. Composition can be useful for building up
specifications by combining models incorporating different
requirements. Decomposition is important for relating system models to
architectures of subsystem models and also for subsequent separate
refinement of subsystems
\cite{Abrial-Hallerstede06,Abadi-Lamport93,Back89,Back-Sere90,Butler96,compos97}. 

Ensuring that a model $M2$ refines or implements $M1$
requires bridging the abstraction gap between them.
Typically there is a large abstraction gap between a good formal
specification, i.e., one that is easy to validate against the
requirements, and an efficient implementation.

Verification by construction
does not require that such abstraction gaps be bridged by a series of (small) 
transformations, done at the time that $M2$ is derived from $M1$,
each step of which guarantees refinement.
While this kind of \emph{transformational approach} is valuable
\cite{Hehner93,Morgan94,Morgan-Vickers94,Morris80},
verification by construction also includes a
\emph{posit-and-prove approach}, in which the developer provides both
$M1$ and $M2$ and uses tools to verify that $M1$ is refined by $M2$
\cite{Abadi-Lamport88,Abrial96,Jones90,Lamport94}.
The difference is not great, especially since in the transformational
approach, the transformation applied might
result in the generation of side conditions that will need to be
verified.  Conversely, if the abstraction gap between $M1$ and $M2$ is
small enough, or if the properties involved are limited, 
a tool can generate proof obligations that can be
verified, perhaps automatically using model checkers or powerful
theorem provers.  Tools are important for the transformational
approach, but tools are also useful in the posit-and-prove approach,
for example, to help one discover ancillary properties, such as invariants.

Through refinement it is often possible to model and reason
about how a strategy solves a problem in an abstract way
using abstract specifications that encapsulate algorithms and data
structures. At higher levels of abstraction one can focus
reasoning on design choices closely related to the problem domain and
less on coding details.  These abstract specifications can then be optimized
through refinements that select implementation modules, or that
introduce more concrete algorithms and data structures.
Reasoning about these optimizing refinements no longer
requires reasoning about the original problem as this will have been
dealt with by the earlier refinement. 

In this way, by keeping the models as
abstract as possible at each level, or by reusing modules, one will
often have simpler proof obligations to discharge. 
This contrasts with the situation that obtains when one verifies a
program (without annotations) and without intermediate refinement steps.
In doing such a proof, one must reason about a number of issues
simultaneously: the problem to be solved, the data structures, and the
algorithms used in the solution.  Using a series of refinement steps
helps factor out and modularize such decisions, allowing them to be
dealt with separately.  This often simplifies proof obligations and
helps make reasoning made more manageable.

When using refinement, one does not necessarily distinguish between
properties and models. Essentially we are working with models in a
modeling language and the important property to be proved of some
model $M2$ is that it is a refinement of some other model $M1$.
So the answer to the question ``what properties should we prove of a
model?''~is ``those properties that help show that it is a
refinement of its abstraction.'' For the most abstract models, the
important property is that they satisfy the requirements of the
problem domain. This is an informal check which can sometimes be aided
by checking required ancillary properties. With a refinement approach
the ``creative'' input in a development is a collection of explicit models
at different levels of abstraction. The invention of ancillary properties is 
dictated by the need to prove refinement between these explicit
models. Creating models at different levels of abstraction, or reusing
previously-available modules with specifications, fits well
with an engineering approach.

\subsection{The Goal of Verification by Construction}

Existing theories, languages, proof techniques and tools for
verification by construction need to be evolved to address more fully
questions Q1, Q3, and Q4 above. This will lead to powerful tools
that will:

\begin{itemize}
\item
Support the construction of models (specifications, designs, programs)
at multiple levels of abstraction,

\item
Support the verification of refinement between models,

\item
Support the verification of modules built from other modules,
and

\item
Support verified construction of complex systems consisting of
software and environments in which software operates.
\end{itemize}

The feasibility of these results will be demonstrated through their
application to the development of complex software systems. The long
term directions described later are intended to lead toward these
goals. We also suggest some short-term directions which can build
immediately on existing work in the area and will contribute to
elaboration of the longer term problems and their solutions.

\subsection{Short-Term Research Directions}

The following are some short-term (5-7 year) research goals.

\subsubsection{Range of Case Studies}

Develop and open for scrutiny several case studies of verification by
construction, using existing techniques and tools.  These case studies
should be selected from the class of verification problems considered
for the grand challenge project, and might include some of the overall
project's challenge problems.  Some case studies should focus on verification
of modules.  In all cases, the studies will help identify
particular areas for improvement in the approaches.

Researchers should consider developments in which not every part of
a design is mapped down to fresh code, rather some parts are
implemented by legacy systems.  The specifications of the legacy parts
need not appear at the highest level, rather they
could be introduced in later refinement steps.  The correctness
of the overall system implementation with respect to the abstract
specification would be conditional on the assumption that any legacy parts
satisfy their specification; an assumption whose discharge may be
tackled by other parts of the grand challenge.

Existing research projects and efforts have made requirements
documents and formal specifications available and these could be
used as starting points and built on further
\cite{Matisse03,Rodin06,Stepney-Cooper-Woodcock00}.

\subsubsection{Links between tools}

Build links between existing tools to support verification by
construction. In particular, build links between proof obligation
generators for refinement checking (as found in B and Z for example)
and
\begin{itemize}
\item
the latest powerful theorem provers, model checkers and SAT solvers, and

\item
automated invariant generation tools (such as Daikon \cite{Ernst-etal01}).
\end{itemize}

Existing work that could be used as a basis for tool integration
work includes the Eclipse-based Rodin platform for refinement \cite{Rodin06}
and the Community Z tools initiative \cite{CZT06}.

These experiments will guide the long term direction of a unified
tools framework for verification by construction.

\subsubsection{Programming Language Mappings}

Models at low levels of abstraction need to be converted to executable
software. The effective way of doing this is through tool-supported
mappings to existing programming languages such as Ada, Eiffel, Java
and C\#. 
In the medium term these mappings should be pragmatic and
their soundness provided through informal arguments. To increase
confidence in the resulting code, the mappings should also generate
appropriate formal annotations (e.g., SPARK, Eiffel, JML or Spec\#
assertions) from the models and ancillary properties. This allows the
generated code and annotations to be analyzed using existing program
analysis tools.  For some applications or domains it may be appropriate
to consider mapping low-level models direct to byte code by-passing
the compiler.
Since the code generation problem is essentially the problem of program
generation, the research directions pointed out in
Section~\ref{sect:pg} also apply to this problem.

Examples of automated mapping of models to code are found in
AtelierB \cite{ClearSy96}, which supports generation of C and Ada code
from low level B models,
and the B-Toolkit \cite{BCore99}, which supports generation of C code
from low level B models.

% [[[Would any of the following be useful directions: Extending tools
% upward toward program generation from models?  What about tools
% like B are not good enough?]]]

\subsection{Long-Term Research Directions}

The following are some long-term (8-15 year) research directions in
the verification by construction approach.

\subsubsection{Evolution + Refinement}

Refinement is never purely top down from most to least abstract, because
it is difficult to get the abstract model precisely
right. One usually starts with an idealistic abstract model because
that is easy to define. As refinement proceeds and more architectural
and environmental details are addressed it often becomes clearer how
the ideal abstract model needs to be modified to reflect reality
better. Modifications to some level of abstraction will ripple up and
down the refinement chain. This is not a weakness of the refinement
approach per se, rather a reflection of the reality of engineering of
complex systems. The theories, languages, proof techniques and tools
need to support evolution of designs during and after development with
minimal effort.

\subsubsection{Complex system design}

Control systems, interactive systems, and distributed systems involve
multiple agents  (users, environments, new programs, legacy
code) all of which contribute to the correctness of a system.
Individually the agents may be very complex, so reasoning about
compositions of agents in all their gory detail may be infeasible. Instead,
there is evidence that it will be feasible to reason about complex
systems through good use of abstraction, refinement and
module composition
\cite{Butler02,Butler-Sekerinski-Sere95,Hayes-Jackson-Jones03}.

The extent to which one must consider the operating
environment when developing software
depends on where one draws the boundaries of
the system.  To reason about the validity of any fault tolerance
mechanisms, it is useful to include some abstraction
of the environment in the formal models in order to verify the
effectiveness of these mechanisms.  For example,
when reasoning about the
effectiveness of a security protocol, it is usual to include some
abstraction of an attacker.  The goal is not to implement the
attacker, rather it is to show that the protocol achieves its
security goal even in the presence of an attacker, under some
assumptions about attacker behavior.  These assumptions about
attacker behavior can be encoded in the formal abstraction
of the attacker.

\subsubsection{Richer Refinement Theories}

Within a particular framework there may be differing strengths of
refinement. A weaker notion might capture the preservation of safety
behavior, while stronger notions might capture preservation of
liveness and/or fairness.  

Another important dimension is
resource usage.  A theory of refinement should ideally allow one to
prove tight bounds on resources, while still permitting abstract
reasoning.  Specifications of resource usage should also not require
reverification when the computing platform is changed.

The refinement relation should enjoy some
form of transitivity. Refinement is based on comparing models
according to some notion of what can be observed about them,
and it is useful to be able to
modify what can be observed at different levels of abstraction. In
particular, the interface to a system is usually described abstractly
and may need to be made much more concrete at decomposition or
implementation levels. In such cases, the observable behavior is not
directly comparable, but needs to be compared via some mapping and
transitivity of refinement is via composition of mappings.

\subsubsection{Refinement Patterns}


A halfway house between transformational and posit-and-prove can be
envisaged, where certain patterns of model and refinement can be
captured and used in the construction of refinements. This is a more
pragmatic idea than transformational refinement in that the pattern
might not guarantee the correctness of the refinement. Instead $M2$
would be constructed from $M1$ by application of a pattern and the
correctness of the refinement would be proved in the usual
posit-and-prove way. Ideally the pattern should provide much of the
ancillary properties (e.g., invariants, tactics) required to complete
the proof, or at least an indication of what kinds of properties might
be needed.

The aim of using such patterns is to minimize verification effort
when applying refinement.
A research goal is to identify such patterns through a range of case studies
and supporting the application of the patterns with tools.

\subsubsection{Integrated Tools Framework}

To a large extent the theory needed to support
verification by construction already exists. The challenge is to
provide a powerful set of tools to support abstraction, refinement,
decomposition and proof. Tools should strive to
achieve as much integration as possible and avoid isolation.  Such
tools should also exploit as much of the existing work in theorem
proving and model checking as possible and should be designed in
anticipation of future advances in these areas.  The same can be said
for using state-of-the-art methods in
programming language design, program verification, and automated
program generation.
As they evolve, the support tools should be
applied to the development of interesting software-based systems. 

\section{Research in Programming Languages}
\label{sect:pld}

This section was mainly written by 
Gary T. Leavens, Simon Peyton-Jones, Dale Miller, and Aaron Stump.

\subsection{Assumptions and Scope}

In this section we assume
that imperative languages are of interest.  
This is not meant to exclude research on other paradigms.
For example, functional languages and domain-specific languages each
have their own advantages for verification.

Also, this roadmap assumes that verifying a compiler (or other
programming language tools) is not a goal of the grand challenge.  This
is not to say that researchers in programming languages are not
concerned about correctness of the tools they produce.  On the
contrary, it is standard, for example, for all type systems in
programming language research papers to come with a formal proof of
correctness.  (The recent POPLmark challenge calls for such proofs to
be written in machine-checkable form~\cite{Aydemier-etal05a}.)
However, it seems likely that such verification problems will be outside
the emphasized areas of the grand challenge.

\subsection{Programming Language Approaches to Verification}

Aside from using refinement to derive programs that are
``correct by construction,'' 
program generation (including certifying compilers
\cite{Morrisett-etal99}), and direct use of semantics\footnote{ 
Besides use of Hoare logic, or ``axiomatic semantics'' \cite{Hoare69}
one can also specify and verify software using denotational \cite{Schmidt86}
or operational semantics \cite{Astesiano91}.
However, these styles are not typically well-suited for specification
purposes, at least for imperative programs.
}
we know of the following main approaches that directly aid
the verification of software.

% [[[Simon suggests that we compress the follow 3 subsections into
% bullets.  I still think it's helpful to have some representative
% sample of these ideas (e.g., for researchers in other areas), but I
% see the point that this may be irritating. -- Gary]]]

\subsubsection{Type systems}
Types are weak specifications \cite{Howard80} that are automatically
checked by compilers.

Type systems are a long-standing topic of interest
in programming language research.  Early work in type theory
\cite{Constable89,Nordstrom-Peterson83} showed how dependent types
allow a type system to express complete functional specifications as
well as constructive proofs of program correctness, at many levels of
detail.  Examples of dependently typed programming languages where
this idea is explored include ATS, RSP1, $\Omega$mega, Epigram,
Cayenne, and Martin-L\"{o}f type theory
% [[[Aaron, can you separate the following so that each is cited after
% the name?]]]
\cite{Augustsson99,Chen-Xi05,McBride-McKinna04,Nordstrom-Peterson-Smith90,Sheard04,Westbrook-Stump-Wehrman05}.
Work by Voda has similar goals \cite{Voda03}.  

\subsubsection{Program Analysis}

Program analysis gathers information that safely approximates what
programs will do at runtime.
Static type systems are a special case of static analysis
\cite{Nielson-Nielson-Hankin99}, but program analysis is not
restricted to obtaining information about types.
Like type checking, program analysis can be seen as a way of
doing weak verification; for example shape analysis can be seen as a way of
``computing a safe approximation to a statement's strongest postcondition''
\cite[p.\ 284]{Sagiv-Reps-Wilhelm02}.

Many interesting formal methods tools have checked various properties
using static analyses of various sorts. Examples include
partial correctness (checked by, e.g., TVLA \cite{Lev-Ami-Sagiv00}),
conformance to API protocols (checked by SLAM \cite{Ball-Rajamani02}),
memory safety (checked by Prefix and Prefast \cite{Larus-etal04}
and LCLint \cite{Evans96}),and 
absence of race conditions (checked by Autolocker \cite{McCloskey-etal06}).
% [[[add citations and lots more here]]]
(There are also several systems that look for error patterns,
including Metal \cite{Engler-etal00} and Findbugs \cite{Hovemeyer05}.)

\subsubsection{Assertions}

Assertions are logical properties of a system, usually expressed in
some extension of predicate logic or temporal logic.
Assertions can specify post-conditions for methods,
invariant properties for objects, and protocols that API calls should
obey. 

There has also been a historical strand of work that directly adds
Hoare-style specification and verification to programming
languages. Gypsy \cite{Ambler-etal77} and Alphard
\cite{Hilfinger-etal78,London-Shaw-Wulf76,Shaw81a} are early
examples. The Euclid language \cite{Lampson-etal81,London-etal78} was
notable along these lines; Euclid omitted or restricted several
features of Pascal, as an aid to formal verification.  For example,
Euclid introduced the notion of heap regions as a way to get some
control on aliasing, and also prohibited overlap among the parameters
to procedure calls.  The SPARK subset of Ada \cite{Barnes03}
continues this tradition.  Perhaps the most successful such language
is Eiffel \cite{Meyer92b,Meyer97}, which takes a very pragmatic
approach to specification and focuses on run-time assertion checking.
The ESC system \cite{Detlefs-etal98} is an interesting hybrid, since
it uses assertions, but in some ways is more like a static analysis system.

\subsection{Problems with Current Approaches}

We see several overall problems with the above
approaches to directly aiding verification.

\subsubsection{Effort Needed for Verification}

Programmers are less likely to use a technique if it does not allow
them to suppress proofs or details.

For example, when using a dependent type system, the need to provide
proofs of correctness along with executable code limits the appeal of
dependent type systems, since this demands substantially more work
than needed in currently popular programming languages, and the proofs
are not optional.  A potential way out of this difficulty for
dependent types is shown by Dependent ML, 
which, while also based on dependent types, has the goal of checking
properties without programmer-supplied proofs \cite{Xi03}.
Thus one research direction would be to explore how to gain the
advantages of dependent type systems without the need to explicitly
supply proofs.

Similarly, when using assertions, one often has to specify many
properties in addition to the property of interest.  
The Bandera system \cite{Corbett-etal00} and SLAM
\cite{Ball-Rajamani02} both use slicing \cite{Tip95,Weiser84} before
model checking to avoid state space explosion.
An interesting research direction would be to use slicing more
extensively in other kinds of verification.

\subsubsection{Lack of Extensibility}

% [[[Simon comments:it's hard to argue against the idea that "extensible type
% systems" would be a good thing, but so would world peace.  It seems too
% aspirational to me to deserve a section (a sentence perhaps).  Why do
% languages allow for just one type system?  Because we don't have any
% idea how to give them extensible type systems -- or perhaps a
% dependent-type person would say that a dependent type system *was*
% extensible.  Anyway, this section seems either vacuous or contentious to
% me.]]]
% [[[However, I'm leaving this in for now, as I think it's one of the
% few interesting recommendations we make; I'd put 

Current programming languages often fix a particular notation and
verification technique, and do not allow users to modify or add to it.
For example, it is hard to find a single level of specification beyond
types that all programmers would agree is worthwhile.
Indeed one might criticize most languages
where types play a central role for taking an important concept and
freezing it.  That is, if types are so important, why do languages
(like Java, Standard ML, and Haskell) allow for just one type system?
It would seem more valuable to first see a programming language as
describing an untyped computation and then allow for various ways to
infer the various kinds of typings as well as other static properties.
Also, types are open-ended: there is no one best type system,
and researchers will always be making new proposals for better systems.
Similar remarks apply to assertion languages and static analysis
frameworks.

Thus a research direction would be to find a more open architecture
for programming language definition (and implementation) that
allows the use of multiple type systems, multiple static analyses and
multiple different kinds of assertions.
Ideally, it would be best to allow these different kinds of
annotations to interact with each other. For example, it would be
great if specifications written using assertions could refer to
properties (such as what variables are assigned) that are covered by a
static analysis.

\subsection{Short-Term Research Directions}

In this section we describe some ideas for research directions in the
short term (5-7 years), with two goals: directly supporting
specification and verification, and eliminating much of its drudgery
by eliminating common problems.

\subsubsection{Supporting Specification and Verification Annotations}

Basic language features for supporting specification and verification
have been discussed above, in the section on specification languages.
These should be investigated for their interactions with programming
languages and systems.  For example to what extent can optimizing
compilers and other kinds of static analysis make use of such 
information?

There is one important aspect of programming language designs that
could greatly ease specification and verification, which is to design
languages so that expressions (or at least some identifiable subset of
expressions) have no side effects.  Side effects in expressions make
it difficult to follow Eiffel's lead in using programming language
expressions in assertions \cite{Meyer92b,Meyer97}.  While some
languages in the Pascal family (including Euclid \cite{Lampson-etal81}
and Ada \cite{Ada95b}) already do this, based on Pascal's separation
of functions and procedures \cite{Jensen-Wirth85,Wirth71a}, it
deserves to be more widely followed.

Tools for programming languages could also be designed to better
support specification and verification annotations.  Ideally
annotations should be provided in an open manner, which would allow
users and tool providers to add to the set of annotations.
Meta-information such as the annotations of Java and C\# are useful
for this purpose, but are weak in that they do not allow full use of
the language's expression syntax and are not hierarchical, and thus do
not support rich syntax for specification. Furthermore, to support
typing and verification, annotations must be permitted at all levels
of syntax; for example, adding annotations to statements is necessary
to specify the effect of a loop.

Another way that programming languages could aid working with
annotations is if they would allow annotations to substitute for
code.  That is, a tool should be able to manipulate a program in which
some parts are not implemented in the language, but are merely
specified with some annotations.  Achieving this kind of
``specification closure''
would help researchers working on compilers and interface specification.

\subsubsection{Eliminating Drudgery in Specification and Verification}

Programming language design can reduce the cost of specification and
verification by keeping the language simple,
by automating more of the work (e.g., by propagating
type information), and by eliminating common errors.
(Eliminating common errors would also help make programs more
reliable, even if programmers do not use verification techniques.)
Historical examples include elimination of dangling references by the
use of garbage collection, encapsulation of iteration idioms (such as
map or for loops), type systems that avoid null pointer
dereferences (as in Lisp or CLU \cite{Liskov-etal77} and the work of
F\"{a}hndrich and Leino \cite{Fahndrich-Leino03}), and SPARK's
elimination of conditional data flow errors (such as reading from
uninitialized variables) \cite{Barnes03}.

It seems like a fruitful research direction to try to eliminate other
common errors, such as array indexing errors, perhaps by using
dependent types or by using modulo arithmetic to map all integers back
to defined array elements.

It is perhaps also useful to look closely at verification technology
and to see what features of programming languages cause the most
trouble for verification efforts.  Following the lead of Euclid
\cite{Lampson-etal81,London-etal78}, and SPARK \cite{Barnes03}, it
may be interesting to try to design languages (or subsets) without
such features.  Another way of putting this research question is: what
features that are not in languages like SPARK can now be handled
without causing difficulty for verification?

Some common errors may not be problems with language itself, but may
be problems with use of libraries or simply mistakes that programmers
commonly make.  Can rules for automatically finding such common
errors, as is done in Metal \cite{Engler-etal00} and Findbugs
\cite{Hovemeyer05}, 
be added to a programming language, under the
control of tool builders or users?
One simple direction for achieving allowing such extensions
may be to add features like \texttt{declare error} and
\texttt{declare warning} from AspectJ
\cite{AspectJ04,Kiczales-etal01}, although such mechanisms may be too
simple to handle all the kinds of bugs detected by such tools.

\subsection{Long-Term Directions}

In the longer term (8-15 years), one can contemplate more integration
instead of just promoting extensible tools to aid specification and
verification.

\subsubsection{Integration of Tools and Languages}

Make the programming language's compiler a platform
that makes it easier to build and integrate multiple specification and
verification tools.  Eclipse may be an example of the kind of
development platform that is headed in the right direction, but it
would need to be substantially enhanced to allow for the addition of
multiple tools and to support their integration.

\subsubsection{More Integration of Types and Specifications}

Another goal is to find potential ``sweet spots'' that are
intermediate between full functional (or control) specifications and type
systems.  Dependent types might be
helpful as a technology for verification of such partial
specifications, but they must be 
made much more accessible to programmers.

\subsubsection{Integration of Rich Static Checking}

Support the integration of rich static checking
(verification of partial specifications) in the programming language.
Researchers could explore taking some existing programming languages
and providing support for flexible deduction to be allowed on source
code and any assertions that are associated with that code (either in
the code as type declarations, loop invariants, etc.) or separately.

Allow for possible community-based inference to be performed on a
module-by-module level.  Provide the elements of a computational logic
that could help in performing basic source-level manipulations such as
substitutions and unification.  An example of such a scheme can be
found in the work of the Ciao system \cite{Hermenegildo-etal05}.

\section{Conclusions}
\label{sect:conclusions}

This roadmap has described ways that researchers in four areas ---
specification languages, program generation, correctness by
construction, and programming languages ---
might help the verified software grand challenge project.
Researchers in these areas need challenge problems to be described in
many different ways, including requirements, source code, and test
cases.

In the short term, a common research goal shared by all four areas is
building extensible tool frameworks that would allow
researchers to more easily implement specification and verification
tools.  This could lead to the exploration of more research ideas and
to more careful evaluation of these ideas.  

In the long term, researchers can try to consolidate the best of these
ideas into new theories and tools.

\section*{Acknowledgments}

Thanks to the members of IFIP Working Group 2.3 (Programming
Methodology) for discussions and for comments on an earlier draft of
this material, presented at the Brugges meeting in March 2006.
Special thanks to Michael Jackson (the one involved in IFIP WG 2.3)
for his advice on narrowing the scope of this roadmap: ``specialize!''
(Yes, it was even broader previously.)
Thanks to Shriram Krishnamurthi for several discussions and suggestions.
Thanks to Rod Chapman for comments, ideas, and corrections relating to SPARK.
Thanks also to the participants at the SRI Mini-Conference on Verified
Software (April 1--2, 2006) and to the Dagstuhl workshop
on ``The Challenge of Software Verification'' (July 10--13, 2006)
for additional comments and suggestions.
Thanks to the US National Science Foundation for grants supporting
these meetings and for supporting, in part,
the work of Leavens (CCF-0428078 and CCF-0429567),
Fisler (CCR-0132659 and CCR-0305834), and Stump (CCF-0448275).

\nocite{FMIA95,FME03}
\bibliographystyle{plain}
\bibliography{biblio}

\end{document}
by Gary T. Leavens last modified 2006-07-21 16:17
« November 2008 »
Su Mo Tu We Th Fr Sa
1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30
What's up ?
Be notified when a document is published in this folder or below.
 

Powered by Plone, the Open Source Content Management System

This site conforms to the following standards: