File contents
% Areas we're missing, and possible people to help
% * hybrid systems - Rajeev Alur
% * theory of 'evidence' for system certification - John Rushby
% * ? verification of machine-learning systems? other special areas?
% * concurrency - Peter's text is perhaps missing some dimensions such as
% composable abstractions, distributed systems
% * can we cite survey papers or relevant conferences for each topic?
\documentclass[11pt]{article}
\addtolength{\textheight}{10ex} \addtolength{\voffset}{-5ex}
\addtolength{\textwidth}{4em} \addtolength{\hoffset}{-2em}
\title{Report of Theory Panel\\for Verified Software Grand Challenge}
\author{
Egon B\"{o}rger \\ Universit\`{a} di Pisa
\and Xavier Leroy \\ INRIA Rocquencourt
\and Markus M\"{u}ller-Olm \\ Westf\"{a}lische Wilhelms-Universit\"{a}t M\"{u}nster
\and David Naumann (chair) \\ Stevens Institute of Technology
\and Peter O'Hearn \\ Queen Mary, University of London
\and John Reynolds \\ Carnegie Mellon University
\and Peter Sestoft \\ Royal Veterinary and Agricultural University
\and Steve Zdancewic \\ University of Pennsylvania
}
\newtheorem{milestone}{Milestone}
\usepackage{html} \newcommand{\link}[2]{\htmladdnormallink{#2}{#1}}
\begin{document}
\maketitle
\begin{footnotesize}
\tableofcontents
\end{footnotesize}
\section*{Preface}
The Theory Panel was established at the VSTTE conference, Zurich, October 2005,
with the purpose of formulating, within nine months, a roadmap for the theory
dimension of the Verified Software grand challenge.
Such a roadmap would identify research challenges in the theory of program
construction and analysis that are critical for the overall goals of the
Verified Software grand challenge.
It would serve to coordinate activities of researchers who seek to meet those
challenges and to generate publicity to broaden the participation in the effort.
This report presents the findings of the panel, with additional contributions
by Tony Hoare, Mark-Oliver Stehr, and Jim Woodcock.
This report provides a broad overview of challenges, highlighting key long- and
short-term milestones and indicating the interested researchers and groups known to us.
Ultimately, a theory roadmap should enumerate and prioritize concrete activities
to be undertaken to address the challenges, in coordination with plans for experiments and
tool-building activities. But that level of detail can only be achieved as more
specific goals and activites emerge in the roadmaps for experiments and tools.
This report will serve as the initial version of an online document open to
ongoing contributions by all interested theorists.
% To this end, the panel will hibernate for a couple of years and then reconstitute itself.
\section{Introduction}
The Verified Software grand challenge was initiated by Tony Hoare to
develop an integrated automated tool suite for establishing the total
correctness of a significant body of software~\cite{Hoare-JACM-03}.
An IFIP Working Conference on Verified
Software: Theories, Tools, and Experiments was held at Zurich, Switzerland in
October 2005~\cite{VSTTE05}.
There have been a number of other meetings and related publications.
The reader is supposed to be familiar with the vision and goals as articulated
for example in Hoare's paper or the more recent article by
Jones, O'Hearn and Woodcock~\cite{Jones-OHearn-Woodcock-06}.
All of these will be superceded by the overarching Verified Software Roadmap to
appear soon.
This introductory section describes the role of theory,
highlights some accomplishments, and sketches some challenge areas.
Subsequent section delve into the theory challenges more deeply.
An appendix contains additional challenges and milestones which were discussed
by the panel but pertain mainly to experiments or tools.
\subsection{The role of theory}
% Hoare
Research into theory plays a literally fundamental role in the Verified Software
project. Theory provides the conceptual and mathematical
basis for design of all programming tools including verifiers, and it
provides the general scientific hypotheses which are tested by
experimentalists on a range of realistic programs. Theorists will be
the first to address verification challenges of new kinds, outside
the reach of current understanding. And their research results will
be the first to help the programmer in the earliest stages of system
design, where concepts and techniques can be useful even if not supported by
well developed tools.
%[[DN: Tony, is this what you meant by this sentence?]]
Theory will support the programming process by the provision of
reusable domain models in important areas of application.
It will provide the foundation for
construction and analysis of \emph{golden} or \emph{ground models}~\cite{BoergerStaerk},
i.e., abstract rigorous system blueprints that completely document system requirements.
% as starting point (contract) for further design leading to code.
It will
provide an underpinning for the design and use of special-purpose
program generators as well as compilers and other general-purpose tools.
It will provide the basis for the safe
exploitation of standard libraries and other software services such
as databases, GUIs, and various middleware. It will explore the problems
posed by new methods of metaprogramming. It will validate common
program design patterns and safe subsets of programming languages
that simplify the task of verification. It will extend the expressive
power of specification languages and assertions to deal with
intensional properties such as timing, security, and probabilities.
It will unify disparate forms of system description and
thereby support system evolution, wherein changes in component design, hardware
platform, requirements etc.\ necessitate revisions throughout the edifice of ground
model, designs, refinements, test suites, source code, configurtion files, etc.
Finally, work on the Verified Software project will contribute to one
of the long-term goals of every mature branch of science, namely the
unification of theories. This is not only a long term goal, but an
essential short-term prerequisite to ensure soundness of the
combination of verification technologies into a coherent programming
toolset.
Theory has to provide methods for linking in a verifiable way the
application-centric meaning of a system, which (ideally) is expressed by a ground model
(logically prior to coding), to the meaning of the implementing code that is
verified by the compiler. In particular, theory has to define and prove correct refinement
schemes that provide for connections between descriptions at different levels of
abstraction ranging from abstract models to machine code.
Theory also has to justify the integration
of heterogeneous tools necessary to construct and validate links in the
refinement chain, e.g., linking a model checker (that uses operational semantics
and first-order reasoning) with an interactive theorem prover (using higher
order logic to reason about a denotational semantics of the same target language).
\subsection{Background}
Computer science theory started in the first third of the 20-th century with
computability theory, whose main discovery was Turing's notion of a
programmable universal computing machine. With the advent of computers computer
science theory quickly focussed
\begin{itemize}
\item on the hardware side on the theory of circuits and computer
architectures,
\item on the software side on the theory of algorithms, data structures
and programming languages.
\end{itemize}
Numerous theoretical achievements in these fields are nowadays
incorporated into advanced tools for the design and the analysis of
modern programming languages and compilable programs. This includes
powerful static and dynamic analysis techniques, e.g. type inference
engines, mechanical verification of type correctness and similar
program properties, etc.
Nonetheless, many tools and technologies in common practice ---from UML design to
optimizing compilers to scripting languages--- have flaws that reflect
the lack of an adequate theory.
With the growing complexity and diversity of computer-based systems
one more abstraction level came into view, where prior to coding a
system has to be specified by an abstract model to capture the system
requirements in a faithful and complete manner, which can be used as starting
point for its implementation by executable code.
Indeed, the Verified Software challenge was originally conceived in terms of a
``verifying compiler'' that checks source code against detailed design
specifications expressed as assertions in the program text; but the challenge
has been broadened to encompass system certfication including, e.g., formal
analysis of high level models and even evidence from disparate sources such as
risk analyses and usability tests.
Computer science theory has provided a large
variety of high level specification languages. Some of them come with tools supporting
the development of abstract system models and their refinements to code.
Though some are quite general, there is no single standard specification
language. Moreover, their theoretical bases are not well integrated with the
theories that underly tools for analysis of low level source and machine
code.\footnote{Of course some bridges do exist between theories that are
distinct ---mathematically or in terms of research culture.
For example, CSP as a retract of CCS, dataflow analysis as model checking of
abstract interpretations, Stone duality, Curry-Howard isomorphism.}
Various tools are based on coherent individual theories (e.g., higher order
logic theorem provers; first order SMT tools; analyzers based on abstract
interpretation) but large experiments in verified software will require
integration of disparate tools. The software to be verified is itself
implemented in a variety of languages. A central challenge is for unification
of theories; this can take the form of broad general theories but also ``linking
theories'' that bridge particular gaps~\cite{HoareHe}.
\paragraph{Some successes.}
The most visible success of theory is probably the progress in techniques to
define and implement programming (including hardware design) languages and
the use of even more abstract modeling languages on top of programming
languages. In brief: progress in how we speak about and speak to computing
machines.
Theories of program verification [Floyd,Hoare,...] are manifest in the
ubiquitous use of specifications using assertions including pre- and post-conditions (if not often
mathematically precise ones).
Type theory led to the strong and expressive type systems of mainstream
languages like Java.
Algorithmic complexity theory provides fundamental algorithms and reductions
between problems that are the basis for decision procedures and combinations
thereof, which are at the core of verification systems.
Feasibility (or not) of solutions has been connected to expressiveness of the
logic needed for problem descriptions.
A key example is the notion of NP-completeness and the NP-complete problem of
boolean satisfiability (SAT)~\cite{GareyJohnson,Immerman}.
Deeper study of proof-theoretic complexity has led to increasingly sophisticated
type systems in mainstream programming languages~\cite{PierceAdvanced}.
Recent dramatic advances in the engineering of SAT
solvers, applied to analysis problems ranging from requirements consistency to
instruction scheduling, are a catalyst of the Verified Software project.
The theory of logics and their models, especially temporal logics and
operational semantics, is applied quite directly in model checking tools which
are widely used for verifying hardware (chip design), device drivers (SLAM
project) or certain (including security) protocols in cases where even the
smallest incorrectness could have disastrous effects.
The theory of resolution proving and logic programming is direct basis for
languages like Prolog and Datalog that are commonly integrated in commercial
knowledge base and management decision support software.
% also policy management in security
A well known success of theory in domain modeling,
the theory of relational algebra regularized database design and is the basis
for automatic generation and optimization of query programs from declarative specifications.
Semantic theories of domains, fixpoints, and abstract interpretation
[Scott,Cousot] have more or
less direct application in automatic static analysis tools. Model checking has
been extended to software, e.g., for checking of software protocols such as
device drivers [SLAM], by predicate abstraction and the integration with
automated theorem provers for abstraction refinement.
%Other successes are discussed throughout this report.
\subsection{Theory challenge areas}
%[[DN: High level discussion adapted from Tony's text. Flows nicely but doesn't
%reflect the structure of following sections.]]
\paragraph{Domain models.}
In any practical software project, theory begins to play a role long before the project starts. Its initial role is to construct a mathematical model of the real world domain in which the projected software will be deployed. A good experimental prototype of a domain model is one that describes the process of organisation of a scientific conference [Fisler], or the domain of scheduling algorithms [Doug Smith]. More serious applications (perhaps too serious for experimentation!) are in the service industries, including transport, the health service, and in e-government.
A well-structured domain model will describe all aspects of the real world that are relevant for any good software design in the area. They describe possible places to define the system boundary for any particular project.
%
A domain model makes explicit the preconditions about the real world that have to be made in any embedded software design, especially one that is going to be formally proved.
%
A domain model describes the whole range of possible designs for the software, and the whole range of technologies available for its realisation.
%
A domain model provides a logical framework for formalisation and full analysis of requirements; it reduces the risk of contamination of requirements by the choice of a particular technology of implementation.
%
A domain model enumerates and analyses the decisions that must be taken earlier or later any design project; it identifies those decisions that are independent and those that conflict, those decisions that must be taken early, and those that can be postponed. Late discovery of feature interactions can be prevented.
It will be significant challenge to conduct a broad survey of existing implementations, or devise and conduct other experiments to validate a domain model. But it will be very worth while, because domain models are much more re-usable than any of their implementations.
Challenges in theory for domain modeling are discussed in Section~\ref{sec:model}.
\paragraph{Program generation.}
When our understanding of an application domain has been formalised as a domain
model, the model itself can serve as a specification language and library for
the design of particular applications in the domain. It is often possible and
desirable to write an automatic translator from such a specification to object
code that implements it. Good examples are embedded systems implemented by
translation from Esterel [Berry] and Uppaal [Larsen]. Commercial generators of
this kind include Biztalk [Microsoft] and Mastercraft [Joseph]. Ideally, the
generator should routinely generate assertions with its object code, so that it
can be independently checked to ensure the correctness. These assertions have an
independent practical value when generated code has to be mixed with
hand-written code. The theorist must formalise the interface across which such
``own code'' (a.k.a.\ native code) interventions can be safely integrated with
the generated code.
Theorists have a fundamental role in providing the technology for verification
of the generators themselves. Firstly, they must provide guarantees of the
correctness of the object code generated by each construct in the specification.
This correspondence can then be taken as a specification against which the
generator itself, or each of its uses, is proved correct. Even in the case of
prior verification of the generator, assertions should still be generated to
provide an additional check. There are always applications sufficiently
critical that double checking is a serious engineering option.
Challenges in program generation are discussed in Section~\ref{sec:compile} and
under the next topic.
\paragraph{Metaprogramming.}
Many computer systems of the present day are put together
from a collection of standard utilities such as data bases, spread sheets, word
processors, etc. Obviously, correctness of such programs will depend on some
specification of these utilities.
In many cases, however, the calls on the utilities are actually generated by
string handling code at run time.
Moreover the generators are often written in scripting languages for which
formal semantics may be lacking.
Relatively well-studied languages like Java and C\# offer
metaprogramming capability (reflection) to analyse programs written in the same
language, but reflection is omitted from extant formalizations.
Theory to encompass multi-notation systems and multi-stage computations is at an
early stage of development.
Even modest advances may provide significant improvement in system reliability,
akin to the introduction of type systems in high level languages decades ago.
Challenges in metaprogramming are discussed in Section~\ref{sec:meta}.
\paragraph{Design patterns and libraries.}
Many applications systems of the present day consist mainly of calls to standard
libraries of classes and methods. They are often called in accordance with some
standard Program Design Pattern. Theorists can lead the way in making such
applications amenable to machine checks of correctness. The first requirement
is to make explicit specification of the functions of standard libraries such as
[Java] and [Microsoft Common Base Library]. At present, unfortunately, the only
way of finding out what these programs do is to read their texts.
A more idealistic pursuit would be to redesign the library interfaces in a neater and
more abstract way, following the example of Eiffel [Meyer].
A second task is to formalise the correctness criteria for familiar design
patterns. The obligations on the user of the pattern must be similarly
formalised; preferably these obligations can be mechanically checked so that the
using programs can be more easily proved correct.
But state-of-the-art specification techniques for patterns and extensible
components seem to require higher order or trace-based assertions which are
awkward and ill-suited to automation.
Theoretical advances in this area will have significant impact because
proofs of correctness of design patterns are highly reusable, not like the
proofs of specific individual programs.
Challenges in this area are discussed in Section~\ref{sec:despat}.
\paragraph{Safe subsets.}
Many programmers for critical applications now follow development guides that
restrict the programmer to some safe subset of an existing programming language.
Examples are SPARK ADA~\cite{HallChapman} and Misra C~\cite{MisraC}. The purpose of the subset is to
simplify the proof rules that can be validly applied to ensure correctness to
all programs that obey the restrictions.
Programming theorists have a role in the precise definition of a nested range of
safe subsets of interesting languages; they should formalise the checks that a
program falls within a particular subset, and the simplified proof rules
and semantic models appropriate for each of them. It will be interesting to
check which design patterns accord with which subsets.
This research may proceed either from the top down (starting with an existing
language and excluding its more complex features), or from the bottom up
(starting with an ideal simple experimental language, and adding features that
are recommended by the simplicity of their proof rules). By the end of the
Verified Software project, these two approaches will have met in the middle. By that time,
experiments in the use of the various subsets should give convincing scientific
evidence of the true cost of using the more complex features of some current
languages. Meanwhile, theorists can afford to adopt a scientifically neutral
stance about whether the use of verifiable subsets is morally superior to
current practices.
The formalisation of valid proof rules for all subsets including the complete
language is required to give confidence that the science is sufficiently
advanced to cover every possible requirement. The choice how far to exploit the
further reaches of the science should always be left to the engineer engaged on
a specific project for a particular client.
Relevant challenges are found throughout this report.
\paragraph{Assertion languages.}
For formulation of purely assertional specification languages, standard logic,
elementary set theory and simple discrete mathematics have proved to be widely
applicable. They describe testable relationships between the global variables of
the program, as observed before or after execution of modules of the program.
But there are many desirable ``intensional'' properties of programs, for example
timing, security, probability, interaction, that are not testable in terms of
actual program variables. They can be represented to a program verifier by
systematic introduction of ghost variables into both the specifications and the
programs; but this is error-prone to do by hand and cumbersome to do by machine.
Each of these more intensional properties has been much more elegantly
formulated and explored by programming theorists in isolation from each other.
Relevant execution models are often given in the operational semantics of a
process calculus like the pi calculus [Milner], often supplemented by typing
rules [e.g., Session types]. Suitable specification concepts are often given in
the form of modal logics, for example temporal logic [Manna Pnueli] or
separation logic [Pym Calcagno Yang].
The time is now ripe for exploitation of the results of these researches in the
context of a more integrated program verification tool. For example, we would
like to find a neat way of introducing special-purpose modal logics into a
standard assertional language. Modal logics can be given a semantics by making
explicit some relevant but unmentioned ghost variable, which can only be
quantified in the way permitted by the definition of the modal operator.
For modals to be used in ordinary assertions, the program semantics needs to
includes suitable updates of ghost state, e.g., appending actions to a history
trace.
A challenge is to hide these updates in code just as they are hidden by the
modality in formulas.
Section \ref{sec:despat} poses challenges for assertion languages.
\paragraph{Verifying the verifiers.}
A point of some discussion at the Verified Software meetings has been the
extent to which verification tools should themselves be verified.
On one hand, it is still challenging to functionally specify and verify
applications in seemingly modest domains such as business processes, so it could
be even more difficult to deal with highly optimized theorem provers and model
checkers.
It can be argued that priority should be given first to experimental
investigation to determine which methods and techniques are most useful, before
verifying their implementations. But J Moore's VSTTE talk pointed out that such
work can go in parallel with verification of the tools.
On the other hand, verification tools are based on well developed theory and
moreover are very familiar to researchers, so at least requirements
specification should be less difficult [B\"{o}rger JBook etc].
As in other application areas, analysis of high level designs can have high
payoff: here, that means validation of integrated decision procedures
[Deconstructing Shostak], type systems and the like.
The most widely used and highly complex tool is the compiler, for which
verification poses many theoretical challenges detailed in Section~\ref{sec:compile}.
Ultimately, interworking tools will be founded on linking-theories
that connect the different logics and semantic models that specify individual
tools.
This is discussed in Section~\ref{sec:utp} on unifying theories of programming
% A long term challenge is for a theory that gives separate accounts for
% individual notations, so that development and analysis of a system component can
% be carried out not in terms of a fixed ``language'', a subset of which is used
% in the component, but rather a selection of notations needed for that
% component. The theory should account for reasoning principles associated with
% these notations and in particular guide the composition of sound tools no more
% complex than necessary for the particular selection of notations.
% [[DN: tie to reqt's database]]
\section{Modeling and specification of requirements}\label{sec:model}
A large portion of current software has been developed on the basis of
requirements for which only informal and often quite incomplete specifications exist.
In cases such as a novel web-based service or decision support tools for a
financial analyst, deployed versions of the software serves as prototype for
exploring possible requirements. In such situations, formal specification can
still play a role for internal component interfaces, and verification of
low-level properties such as memory safety can be carried out in the absence of
explicit specifications. But in this section we address cases such as avionics,
automotive software, and eventually medical devices, where systems are subject
to rigorous certification.
For systems subject to certification and for other major pieces of software such
as an operating system kernel or programming language compiler, a ground model
is important.
Ground models are based on domain models.
Domain models for some areas are well understood, e.g., compilers and
perhaps operating system, relational databases, various financial processing and
other application areas;
but others lack well understood and accepted domain models, e.g., complex
business arrangements among mutually untrusting hosts in distributed setting,
subject to malicious attacks.
Theorists may contribute to the development of novel domain models and improved
notations and concepts for ground models.
But there are already many very general frameworks for modeling and
specification, including for example ASMs~\cite{BoergerStaerk} and B abstract
machines, higher order logics, Broy's theory, and CSP/Object-Z which combines
an applied process calculus with rich data modeling using set theory.
The challenges are mainly ones of engineering, experimentation and tool
development ---in particular, objectively assessing competing approaches and
synthesizing their best features.
The panel identified only two major challenge areas for theory per se:
intensional properties and integration of multi-perspective descriptions.
\paragraph{Multi-perspective descriptions.}
Because a ground model is the primary formalization of requirements that
otherwise have only informal description, its validation involves
experiments including simulation/prototyping for inspection from the different
points of view of the various stakeholders in the requirements.
Another sort of validation is by checking for inconsistencies (e.g., from
feature interactions) and by checking mutual consistency of redundant
descriptions by different means (e.g., checking that some high level property
such as ``the patient is never over-dosed'' is a consequence of the stated
requirements).
Modularity is crucial to facilitate various kinds of validation of domain models
and ground models (and for many other reasons discussed throughout this report).
To be useful, models and validation methods need to be sound and also to have
adequate capacity and performance; both soundness and performance are achieved
largely through decomposition. For this reason, modularity in many forms has
been a focus of research for decades. Many of the issues are relevant to
designs and intermediate models as well as to requirements; see section~\ref{sec:refine}.
As discussed in B\"{o}rger's VSTTE paper~\cite{} theory is needed for semantic
composability of modules addressing ``crosscutting concerns'' [Kiczales] which
appear also at the level of software design: a system decomposition is based on
some aspect or criterion and there are invariably other considerations that beg
for an orthogonal decomposition.
A conceptual and technical basis is needed for integration of views meaningful
to different stakeholders or at different levels of detail.
For hybrid systems, relevant views involve disparate models of computation
(discrete event, continuous time, synchronous dataflow, etc).
Abrial's VSTTE paper argues to construct a system, and in particular the models of its
domain and application requirements, as a database.
This perspective also raises the need the need for multiple orthogonal
decompositions and views of the ground model and the domain model on which it
rests.
Relevant work includes category-theoretic constructions [Goguen,Meseguer,?] but
these remain to be specialized to particular domains and connected with less
abstract specification notations such as set theory and temporal logic.
The focus here is on integrating multiple perspectives on the ground model.
A closely related area is refinement between models, for the purpose of
connecting the implementation with its requirements, or even constructing the
implementation based on requirements. Refinement is the topic of Section~\ref{sec:refine}.
\paragraph{Intensional properties.}
Functional specifications describe behavioral properties: a computation may be
observed to satisfy a specification or not.
Intensional properties go beyond ordinary observations of behavior.
For example, noninterference~\cite{GoguenM82} is about the flow of information within a program
and its formalization involves observation of multiple executions.
Time and space complexity involves use of resources that do not exist in the
more abstract models typically used for functional specification.
Such non-functional properties are important in practice.
Yet abstracting away from details is crucial for high-level modeling.
And escalating from ordinary per-computation properties to sets of computations
risks adding unintuitive complexity to specifications and models.
(Though the example of computation tree logic shows that it need not be so.)
In the case of security, it is not entirely clear what are the desired
properties, even intuitively.
For example, many notions of noninterference have been proposed but it remains
unclear how to account for downgrading of security labels, policies that are
updated at runtime, etc.
Another challenge in security is to account for covert flows of information,
e.g., by means of timing or power consumption.
This overlaps with the general problem of specifying resource assumptions and
consumption and the problem of consistent specification at multiple levels of
abstraction.
Given a means to refer to a resource, e.g., a ghost variable that represents
time, explicitly bounding resource consumption is merely a safety property.
But what about asymptotic bounds, e.g., for specification of tools and in case
specific bound functions may not be known?
Moreover, theory is needed for robust models of resources at various levels of
abstraction.
A particular challenge is the time and space complexity of lazy functional
programs, for which naive introduction of ghost variables is inadequate.
A problem of immediate practical importance is specification of
resource properties for embedded components (aiming for generation of executables based on resource parameters).
Beyond the modeling and specification of intensional properties there is the
problem of reasoning about them. [[DN: ....about which we have little to say?]]
\section{Refinement and compositionality}\label{sec:refine}
The basic idea of refinement is that requirements are connected to implementation by a chain of
intermediate system descriptions.
Following the initial requirements specification come more detailed
specifications that introduce system components and their interfaces as well as
other design decisions such as data representation and synchronization patterns.
The last links may be source code, then intermediate languages within stages of
a compiler, then bytecode and linkable assemblies, and finally linked
executables for one or more hardware components.
In current practice, many software projects have only vague and incomplete
documentation of high level requirements but some have nontrivial specifications
for internal component interfaces. Refinements may be present in the form of
refactorings and versions in an iterative development process.
A motivating ideal is of systems that are ``correct by construction'' because
each link is constructed by systematic means, largely automated, from its predecessor.
For links that bridge a relatively small gap, such as from Java source code to
bytecode or from a Simulink model to source code using a custom library,
the constructive approach is well established in practice though theoretical
challenges remain.
This topic is of such importance that there is a separate Verified Software
panel for it~\cite{Leavens-etal06a}.
The gap between application requirements and detailed design is typically
spanned by positing the design and later making significant revisions, along
with requirements revisions, when experimental results are obtained, sometimes
as late as beta-testing or later.
Even in entirely post-hoc verifications, the difficulty is greatly reduced by
decomposing the proof into a series of relatively small refinement steps.
For this to be possible, intermediate stages must be modular so that a step can
be small because it involves revision of a small unit which interfaces in a well
defined way with relatively independent other units.
%[[DN: horiz/vert terminology here]]
One of the catalysts of the Verified Software project is success in verification
of low level code using predicate abstraction and abstraction refinement.
This is an iterative technique wherein an automatically generated approximate
program semantics is used in a verification attempt; separately, accuracy of the
approximation is checked. If the abstraction is inaccurate, the failed check
guides construction of a better abstraction.
If successful, the process results in construction of an approximation that
refines the program and is refined by the specification.
Scientists need not take a position on the extent to which systems are to be
derived from their specifications or verified post hoc.
A theory of refinements accounts for the different levels of system description,
proof obligations for refinement steps, and monotonicity of system constructs
with respect to various notions of refinement ---regardless of which parts of
the chain are made first in a development process.
By taking refinement chains as object of study, theory can contribute to system
evolution: when one link is modified, how can the others be modified with
proportional effort?
\begin{milestone}(8-10 years)
Give semantics and annotation languages for a fragment of Java together with a
fragment of C.
Give rules for modular reasoning about Java programs that contain ``native code'' in C.
\end{milestone}
An issue here is that inline assembler or C exposes aspects of machine semantics
that in well designed programs are carefully localized.
Theory should account for this: providing sound reasoning about the fragments
that depend on the low-level semantics and sound rules that delimit the scope of
low-level effects and thereby justify reasoning outside that scope using a more
abstract semantics.
A specific issue is that inline assembler can use address arithmetic and other
ways to expose memory addresses, which can subvert expected encapsulation and
security properties achieved by treating pointers as an opaque type with no
operations other than field access and equality test.
\begin{milestone}(5 years)
Give a semantics that exposes a concrete model of pointers for use in inline C
together with a theory that delimits its use and justifies more abstract
reasoning about Java source code with well behaved native fragments.
\end{milestone}
The ``refinement generator goal'' is to define practical
model refinement schemes, which capture established system development and programming
knowledge, together with justifications of their correctness. Where the
refinement process is highly creative, interactive schemes can still
be helpful to guide the refinement process by a combination of
user-insight and mechanized tactics. In particular, these refinement
schemes should allow the verifier to turn model properties into
software interface assertions comprising behavioral component aspects,
to be used where state-based run-time features are crucial for a
satisfactory semantically founded correctness notion for
code.
\begin{milestone}(5 years)
Give a theory of refinement to connect sequential object-oriented programs
(in well behaved subsets of Java and C\#) with object models in some formalized
notation related to UML (say, Alloy).
The theory should provide proven-sound refactoring laws at the model and code
level:
given code that refines a model, and a refactored model, the theory should guide
construction of correspondingly refactored code.
\end{milestone}
\begin{milestone}
Refinement verifier challenge: enhance current computer-based verification
systems by means to prove the correctness of practical model refinement schemes.
\end{milestone}
\begin{milestone}
A well developed theory of design and reasoning principles to design and verify system components and their compositions (say, a web service model, design, or implementation) using only the interface specifications. By system component we emphasize components in the large, with complex interfaces and behaviors.
\end{milestone}
Intensional properties are often not preserved by extant notions of refinement.
For many purposes, useful notions of equivalence and refinement are deliberately
defined to ignore differences in resource consumption.
A notorious example is secure information flow [Jacob] which is not preserved
under reduction of nondeterminacy in general.
Hehner's theory of programming lends itself to resource-sensitive refinement.
There is recent work by Morgan and others on security-sensitive refinement.
Recent work on refinement and security has been done by Alur et al~\cite{AlurCernyZdancewic}.
To support refactoring and other code revision, simulation-based refinement
theory is used. For OO programs, all the issues that complicate reasoning about
invariants (see Section~\ref{sec:despat}) also pertain to simulation relations.
Some theories and tools, e.g., those based on automata, easily encompass
relational reasoning (i.e., properties of two runs) because their notion of
program is closed under cartesian product. But others, like JML, are closely
tied to a model of state for which neither theory nor implementation provides an
explicit cartesian product (e.g., Java and C runtime systems).
In the absence of an explicit product construct, it may still be possible to
reduce simulations to invariants by encoding a pair of programs as a single one
using available constructs such as sequential composition [Reynolds' method].
This becomes tricky for heap objects~\cite{Naumann06esorics} and effective use
with extant assertion-based tools depends on program
transformations~\cite{TerauchiAiken,Naumann06esorics}.
Interesting transformations, both for design restructing and for code
optimization, depend on conditions that are typically checked by iterative
approximation-based static analyses but which can be specified in terms of
relational Hoare logic [Benton].
%\begin{milestone}
%Give a theory for reasoning about relational properties of heap-manipulating
%programs by Reynolds' method. The theory should provide basic program
%transformations from which can be derived those used in compilation as well as
%those needed to apply Reynolds' method to verifying data refinements of source programs.
%\end{milestone}
\section{Concurrency}\label{sec:concur}
% There are numerous abstract models of concurrency suitable for requirements and
% specification and design refinement. For verification of code, however, we need
% good semantics for shared-memory concurrency with weakly consistent memory
% models. Lock-free programs and transactional memory appear promising and
% important but extant work is relatively informal. Theories like separation
% logic [Reynolds, O'Hearn, et al] and Boogie [Leino, Schulte et al] have good modularity
% properties but are only at the level of source code. A gap remains between
% designs and concrete code. Specification-oriented semantics, e.g., Reynolds'
% grainless concurrency model may help here.
% Rely-guarantee [Jones;Misra,Chandy] is one of the most effective techniques for
% modular reasoning in concurrency. The technique needs to be extended to a richer data
% model that includes the heap or heaps; a promising approach is to do so using
% separation logic [Parkinson,Winskel,...].
Reasoning about concurrency has always been extremely difficult, and the need for workable theories is set to rise with the increasing appearance of multi-core processors.
The difficulty has never been one of describing what concurrent processes might do in a theoretically adequate way. Rather, the core problem is to find tractable ways of reasoning about concurrent programs. In his classic paper 1972 "Towards a Theory of Parallel Programming", Hoare spoke of the problem of controlling, or understanding, the "fantastic number of of combinations involved in arbitrary interleaving". The problem remains, and has only got worse with weak memory models, which give rise to even more combinations. (Note that the difficulty is not to describe those interleavings, but rather is to understand or even avoid them in reasoning.)
There have been significant advances on reasoning about (shared memory)
concurrent programs, including the Owicki-Gries theory, temporal logic (Pnueli)
, and rely(or assume)-guarantee method (Jones, Chandy, Misra), reduction and
atomicity (Lipton, Lamport). More recently,
concurrent separation logic (O'Hearn, Brookes) and Boogie (Leino, Schulte et al) have emerged as methods that provide for modular reasoning about concurrent processes in the presence of heap-allocated data structures.
However, what we see is a mishmash of specific techniques, and very specific advances whose limitations and whose fundamental assumptions are not well understood. For example, Boogie and concurrent separation logic both speak of "ownership", but it is not easy to connect them (their modes of expression being somewhat distant). A proper theory would lay bare the primitive assumptions that give rise to modular reasoning, giving Boogie and separation logic as instances and, perhaps, other, improved methods. Such a theory does not seem near at hand.
Furthermore, all of the best known proof methods can show stress under the simplest example programs. (By "stress" is meant that the specifications and proofs become complex out of proportion with the problem, and detached from reasoning intuition.)
To find smooth proof methods with clear and general and simple foundations is a challlenge for theorists. This challenge should also be matched with the test of proving actual programs. Even now, we would benefit from a clear understanding of where various methods do not work well.
The problems with interleaving show up most vividly in model checking and program analysis. Starting from an abstract model, one can often, in principle, enumerate a finite number of the possible interleavings. But that finite number can be too large to be enumerable in reasonable time, in practice.
Most of the approaches mentioned above have been applied in tools only to safety properties. Liveness raises a host of other issues. It has sometimes been supposed that liveness, by its very nature, requires global knowledge, in a way that is an impediment to scalable reasoning. But some researchers are actively pursuing local reasoning about liveness properties. Automatic verification techniques for liveness properties of programs are in their infancy (e.g., Terminator; Cook, Podelski, Rybalchenko), but are needed because they approximate responsiveness properties of systems.
There has been important work on identifying variations on the concept of "atomicity", as a way to simplify reasoning about the use of concurrent objects. Early work by Lamport, Herlihy and others focussed on the semantic level, in the identification of such concepts as serializability and linearizability. Proof systems for establishing such properties are lacking. There is much that remains to be done, in the foundations and in applications and in tools that check and exploit notions of atomicity.
Finally, in an almost disjoint line of development, there have appeared numerous models of concurrency expressed as calculi or with algebra. Examples are CSP, CCS, pi-calculus and Petri nets. These models were purposely described in a way that is removed from computers, and programs. That was a good choice, but it is somewhat of an embarrassment that they have not been more thoroughly connected to the verification methods, such as those mentioned above, for the code level.
Ideally, one would like to have a refinement method that started from an abstract description of a system (say, using a process calculus) and ending up with verified real code. Again ideally, such a method should combat the "fantastic combinations" problem with interleaving. And it should not rule out low-level concurrent programming based on, say, fine-grained locking or non-blocking primitives. There have been nods in this direction (e.g., by Abrial), but the area is wide open.
To sum up, although there has been important progress over the years, basic
problems in concurrency remain, particularly on obtaining tractable reasoning
and specification methods, and just as particularly on the revealing of the
foundation of these methods.
With all this being said, the opposing position of Lamport - that modularity is
a way to make proofs harder - should be kept in mind.
[[DN: Perhaps this section over-emphasizes shared-memory and verification of
low-level code. What about design theory for scalable abstractions?
For distributed systems, does process algebra adequately model the APIs of
current languages and operating systems, i.e., do we have good semantics
for distributed programs implemented using shared memory and UNIX sockets etc?
]]
\section{Designs, patterns, and program logics}\label{sec:despat}
The verification of detailed designs and implementations using contemporary
programming languages, design patterns, and libraries, poses many challenges
besides the very difficult ones in concurrency.
Such refinement steps are most often expressed as annotated programs with
the design expressed by assertions attached to specific program fragments,
thereby expressing both satisfaction of component requirements and fine-grained
interfaces between relatively small program components.
The requisite semantic models and program logics are well understood in the case
of simple imperative languages annotated with state predicates, but extant theory falls
well short of dealing with the following:
\begin{description}
\item[programming language features] such as inheritance, code pointers,
aspects, and advanced module systems (as well as concurrency and multi-stage
computation which are topics of separate sections in this report);
\item[specification of methodologies] such as disciplines that provide modular
invariants for small configurations of objects by means of special global
invariants;
\item[specification of design patterns] such as the OO patterns
for generic and higher order functionality;
\item[intensional properties] including those like heap ownership/regions and
procedure call traces which facilitate modular specification of internal
interfaces, e.g., frame conditions, and those like noninterference that
originate in requirements specifications.
\end{description}
\paragraph{Language features.}
Semantic models are needed for the language features, including both models of
code in the full generality available with widely used compilers and simplified
models adequate for certain useful and well behaved subsets (together with type
systems or other static analyses that enforce the subsets).
Models are of direct use in verification, e.g., in model-checking and in
interactive proof of specialized rules for reasoning about reusable design
patterns. Models are also used by theorists to validate proof rules and other
forms of axiomatic semantics.
Assertion languages are needed that cater for intensional properties; for the
``protected'' interface associated with inheritance (as opposed to the usual
client interface); for higher order interfaces as found in design patterns and
language features such as aspects.
%[Ideally these models are themselves compositional -- see UTP section.]
One problematic language feature is inheritance. In conventional
object-oriented languages, the syntax does not distinguish between inheritance
used as an ad hoc mechanism (for code reuse) and subclassing intended to be
behavioral refinements
in a coherent classification scheme (e.g., AST node types or to reflect a domain model).
Moreover, although the general notion of behavioral
subtyping is well established, its use in modular reasoning about method calls
has been investigated in several proof and verification systems
[Abadi/Leino,M\"{u}ller/Poetszch-Heffter,O'Heimb,Parkinson,Pierik] but a definitive
theory remains to be developed.
Whereas established notions of encapsulation focus on fully encapsulated
internals versus the external view of an abstraction that is provided to
clients, inheritance involves a third interface: the ``protected'' interface by
which subclasses can access designated state and operations.
Another problematic language feature code pointers. Must first-order
specifications be abandoned? Can the problems with denotational semantics of
stored procedures, vs procedures as arguments as in Algol (work in the 70s by,
e.g., Olderog, Halpern, Damm and Josko,...), be dodged to get good proofsystems?
Can the recent semantic advances [game semantics, Abramsky, McCusker, Honda,
Yoshida, Levy] be integrated with practical verification techniques?
Code pointers are being revisited in low level code (e.g., Appel, Ahmed, Shao,
Thieleke).
\paragraph{Methodologies.}
To recover modular reasoning researchers in OO formal methods have
proposed several formal methodologies.
Each works for a subset of programs that
contain certain specifications and/or annotations
and satisfy certain proof obligations;
these obligations enforce some global property,
which validates some form of modular reasoning.
Methodologies are often tied with specific design patterns, e.g., ownership
pertains to the situation where a single object represents some abstraction
and is implemented using a number of other objects that it encapsulates.
Unfortunately, no single methodology fits all design patterns and coding
techniques.
Moreover, most extant methodologies have been studied and deployed in isolation,
with specialized notations and assumptions tied to particular tools.
For lack of a standard theoretical basis and understanding of the language
design issues, these methodologies
are difficult to compare, combine, and build into tools.
Modular treatment of object invariants is made difficult by the ubiquity of
shared mutable objects and by design patterns that involve reentrant callbacks.
The very notion of when a data structure invariant should hold is in question.
Partial solutions have been developed using complicated syntactic restrictions
[Leino,Nelson,M\"{u}ller] and disciplines based on ghost variables [Boogie].
For modular reasoning, a promising direction is to make the ``heap footprint'' of an
invariant or other assertion explicit, using separating conjunction
[Reynolds,O'Hearn,Yang] or in ghost variable [Kassios].
Heap footprints are also important for specifying frame conditions.
Theorists need to explore extensions of separation logic to richer data models
and language features and to make connections with ownership systems
[Leino; Mueller; Parkinson; Birkedal].
Inheritance is a complication here because in a given context the complete fields of an
object are usually not visible.
\paragraph{Design patterns.}
The lack of true higher order facilities in many conventional languages
allows simpler semantic models and proof rules but has the result that programs
use various design patterns that capture higher order structure such as
the map operation (Visitor pattern) or an application parameterized on an
algorithm or protocol (Strategy pattern). The natural means to specify library
components that implement such patterns is either higher order specifications
(which are unfamiliar and potentially costly compared with usual first order
assertions) or traces of internal method calls (which are intensional and risk
loss of abstraction)
Related to the problems of higher order specifications and reasoning about code
pointers is the use of procedures in assertions.
To cater for practical use of assertions, notations like Eiffel and JML
encourage program expressions in assertions.
Whereas the expressions allowed in early Hoare logics were confined to
arithmetic expressions with clear meaning in mathematical logic, these notations
admit expressions that invoke programmed procedures.
Such invocations are intended to be ``pure'' in some sense, but to cater for use
of extant libraries the notion of purity has been relaxed to allow allocation
and even update of suitably encapsulated objects [Naumann,Benton,Hofmann].
One challenge is that modular reasoning about method invocations is ordinarily
based on their specification, but an invocation in an assertion may be of little
use unless the definition of the method is available ---its implementation in
the programming language or in the language of an underlying
logic.\footnote{This and other challenges for reasoning about sequential
object-oriented programs are posed in a paper by Leavens, Leino and
M\"{u}ller~\cite{}.}
% TBD: besides code+assertions, another form of refinement step is replacement of
% one annotated code fragment by another. Well understood in terms of abstract
% models (transition systems, predicate transformers), this is less developed for
% source code. Reynolds' method involves an intermediate program that interleaves
% code from both versions and thereby reduces the simulation property to an
% ordinary program annotation. For better automation it is promising to use
% logic specifically tailored to simulation [Benton].
\paragraph{Intensional properties.}
Modal logics can provide more effective assertions by hiding ghost state or
making it explicit in a controlled way, e.g., in contrast with brittle and ad hoc
encodings of the heap by explicit ghost variables, separation logic uses a
``separating conjunction'', $*$, to reason about regions of the heap. In a technical
sense, $*$ is a modal operator~\cite{Calcagno-Gardner-Zarfaty-07},
and it may be that explicit ownership systems
like Boogie that rely on ghost state can be expressed in terms of modals.
%[[DN: one modality might be ``when-stable'' with reference to the ``inv'' field.]]
In addition to standard modalities such as knowledge (or ignorance [Morgan]), it
could be useful to have a general scheme for introducing such modalities in
assertions and reasoning tools, e.g., to encode the reasoning associated with
specific design patterns.
For intensional properties in requirements, see Section~\ref{sec:model}.
\begin{milestone}(5 years)
Give a machine-checked theory for reasoning about frame conditions and
[invariants, that justifies the axiomatic semantics used by a static verifier for
a large fragment of Java or C\#.
\end{milestone}
\begin{milestone}(5 years)
Give a theory to account for runtime and static reasoning about assertions that
invoke weakly pure methods and which guides their practical use in interface specification.
\end{milestone}
The topics in this section have mostly been adressed at the level of source code
but, motivated by proof-carrying-code, have also been studied for low level code
[Tan and Appel; M\"{u}ller; Pavlova,Hofmann in the Mobius project; Benton; Spec\#].
\section{Program generation and multi-stage computation}\label{sec:meta}
Programmer-controlled generation of program code and other documents is widely
practiced but there is almost no theory to support reasoning about functional
correctness of such programs. This topic has relations to correctness of code
generation and compilation; but whereas a lot of one-time effort can (and
should) be invested in proving a compiler correct, this section is about
improving the reliability *ad hoc* generators. Thousands of such generators are
written by practitioners, to generate J2EE deployment descriptors and
interfaces, web pages, Javascript functions, various forms of glue code, SQL
statements and so on. Also, such generators are essential to realize model
driven development. Type theories are needed to support checking of multistage
programs.
The challenge is to provide the theoretical foundation for moving away from the
classical static compile-link-run model of language semantics to where
meta-programming, generative programming, model-driven development, partial
evaluation and multistage programming are leading. Here programs are generated
from code patterns, from code fragments written in different languages or from
components, according to directives expressed through metadata, instantiating a
general problem solution to particular cases of the problem. What we need is a
practical theory that brings generative programming techniques, which at present
are mostly embedded in some tools or language features, under the explicit
control of the programmer, providing means to control the structure and the
semantical content of generated code.
There are a number of topics to consider:
\begin{enumerate}
\item More and more code is generated rather than written. By code we mean
here first and foremost source code in a higher level programming languages, but
also GUI descriptions, deployment descriptors, database queries, hyperlinked
documentation, web pages, Javascript code embedded in web pages, and so on.
The tools and notations used to write the code generators often do not
provide even the most elementary guarantees that the generated code is
correct.
\item Generated code may be checked for correctness on at least four levels:
(a) Syntactic well-formedness; (b) Scope well-formedness (when the generated
code contains variable binders, cross-references and similar); (c) type
correctness (when the generated notation has a static notion of type); and
(d) dynamic semantics correctness -- does the generated code mean or do what
it is intended to?
\item Static correctness guarantees are particularly important for runtime code
generation, where a system may fail at runtime due to the faulty generation of
code. For code generated at compiletime or build time, a compilation phase
will usually check at least correctness (a), (b) and (c) before deployment is
attempted, but this is not necessarily the case for code generated at runtime.
\item One can even consider runtime ``hardware'' generation: there are now
systems that include FPGA boards, so one can create and configure a
specialized co-processor at runtime. Such systems are rather hard to debug,
so static guarantees are highly desirable.
\end{enumerate}
Relevant work and people include: Staged computation (Joerring and Scherlis), Lisp/Scheme quasiquotation (backquote and comma), DynJava (Oiwa), MetaML (Sheard), MetaOCaml (Calcagno, Leroy, Taha), Metaphor (Neverov, Roe), dependent types (Martin-Lof, Coquand), the Coq system, Omega (Sheard), Pasalic, Linger, Pfenning, Davies, Nanevski, Pitts, Nielson, Batory, Czarnecki, Smaragdakis, Kamin, explicit substitutions, ...
\begin{milestone}
(5 years) Develop a theory of types for multistage languages that covers
correctness levels (a), (b), (c) and (d). The theory should support type
notations and mechanized consistency checking.
\end{milestone}
Relevant existing theory includes dependent types (Martin-Lof) and nominal logic (Pitts).
\begin{milestone}
(5 years) Develop a theory of types to support safe runtime reflection.
\end{milestone}
\begin{milestone}
(5 years) Develop theory to support typesafe database access from a mainstream programming language.
\end{milestone}
\section{Verification of compilers}\label{sec:compile}
Formal verification methods such as program proof, model checking and
static analyses increase tremendously the confidence we can have in
critical software. However, the code that is formally verified is
rarely, if ever, identical to the code that is eventually executed.
The latter is generally machine code for a hardware processor.
Formal verification methods are almost never applied directly to
machine code (with the notable exceptions of Foundational
Proof-Carrying Code and Typed Assembly Language; some work in verifying
cryptographic devices; [[DN: and the CLI stack?]]).
Instead, the form of the code that is formally verified -- the model of the code
-- is generally one of:
\begin{itemize}
\item
Source code written either in a general-purpose programming language
(C, C++, Java, etc) or a domain-specific language (Scade, Esteral,
Simulink diagrams, etc)
\item An intermediate representation such as Java or .Net virtual machine code.
\item An abstract presentation of the program as algorithmic pseudo-code,
labeled transition systems, functional or relational specifications
in constructive logics, TLA+ specifications, B abstract machines, etc.
\end{itemize}
In the first two cases, the executable code is obtained from the model
of the code by compilation, that is, automatic translation of a
high-level or intermediate language to machine code.
[[DN: well, bytecode linked with the mach code of VM.]]
While globally
well understood, compilation -- and especially optimizing compilation,
where the generated code must meet demanding speed and size constraints --
is a complex process involving delicate code transformations backed up
by static analyses. Bugs in compilers can and do happen, causing the
generated machine code to crash or behave differently from the source
code. An incorrect compiler therefore has the potential to completely
invalidate the guarantees obtained by formal verification of the
source code.
In the third case above, the model of the code used for verification
is further removed from the actual executable code. Sometimes,
source code is generated automatically from the model of the code
(e.g. code generation in the Atelier B; program extraction in the Coq
theorem prover). This can be viewed as an additional compilation
step, with the same risks described above for traditional compilation.
In other cases, the model of the code is extracted mechanically from
the actual source code, using abstract interpretation or static
analyses. The soundness of this extraction is again critical for the
formal guarantees to carry over to executable code. Often, the
model of the code and the source code are both written by hand,
leaving ample room for human error when one is transcribed from the
other.
In the remainder of this section, we focus on the questions
How to establish trust in the process of compilation from
source code to machine code?
How can we ensure that compilation preserves the guarantees obtained by formal
verification of the source code?
Two approaches have been considered to gain confidence in compilation processes:
\begin{itemize}
\item
Compiler verification. Formal methods are applied to the compiler
itself to prove that it meets its specifications. These
specifications can be full correctness properties of the form "the
generated code is observationally equivalent to the source code",
or weaker safety properties such as "the generated code is type- and
memory-safe".
\item A posteriori code verification. Here, the desired correctness property
is established by a separate tool, the verifier, which takes
as inputs the source and compiled code, plus annotations (also
called certificates) also produced by the compiler. The verifier
replies ``yes'' or ``no'' depending on whether it was able to establish
the desired property for this particular run of the compilation
process with the help of the compiler-generated annotations. This
approach encompasses:
\begin{itemize}
\item Translation validation [Pnueli,Zuck]. The annotation is empty or minimal
(debugging annotation) and the verifier proceeds by static
analysis over the source code and the generated code, e.g.
to construct and check valid a simulation relation between the
two codes.
\item Proof-carrying code (PCC). Here, the compiler-generated
certificate is more substantial: in the original presentation of PCC,
it is a proof term in a constructive logic. The verifier is
correspondingly simpler: it checks this proof against a
verification condition generated from the code.
\item Type-preserving compilation and typed intermediate or assembly
languages (TIL, TAL). In this approach, popularized by Java bytecode
verification, the desired property is type- and memory-safety
of the generated (virtual) machine code; it is established by
type-checking of the machine code using type annotations left
by the compiler.
\end{itemize}
\end{itemize}
The two approaches --a priori verification of the compiler or a
posteriori verification of the results of one run of the compiler--
are not exclusive. In particular, a verifier that was itself formally
verified (we have a proof that whenever it says "yes", the desired
property of the compiled code indeed holds) gives formal guarantees
as strong as those of a verified compiler. For each pass of a
compiler, we have a choice between verifying the compiler code for
this pass, or performing it by non-verified code and checking its
results a posteriori using a verified verifier.
Following are the key challenges.
[[DN: I would like to extract from the following the theory challenges;
some of this is really experiments that can be conducted with known theory and
perhaps incremental tool improvements; could go in appendix of the report.]]
\paragraph{Formal verification of realistic compilers usable for critical
embedded software.}
Many proofs of compiler correctness have been carried out already,
both on paper and on machine, but for non-optimizing compilers
operating on simplified source and target languages. The first
challenge is to scale these efforts all the way to the verification of
a realistic compiler that can be used in the context of critical
embedded software. The source and target languages must be among
those used in this context, e.g., C as source language and machine code
for popular embedded microprocessors as targets. The compiler must
generate code that is efficient enough and compact enough to meet
application constraints, meaning that a number of optimizations must
be deployed and proved correct.
\paragraph{Formal verification of static analyses.}
Static analysis plays a crucial role both in optimizing compilers and
in code verification tools such as static bug finders. Formal
verification of static analyzers is needed both
to prove the correctness of optimizing compilers and to establish
trust in these code verification tools. An early success in this area
was the verification of type-checkers for machine-level languages,
such as Java/.Net bytecode verification and Typed Assembly Language.
It remains to extend this success to finer, more diverse static
analyses.
Even for a low level source language like C, type systems are sometimes
used for intermediate languages or assembler (TIL/TAL) that include complex
higher order structure; especially complicated are the type systems for
multistage computation.
A particular difficulty is meta-logic and techniques for machine-checking is
binders (HOAS good [Crary,Harper ML formalization] but hard to do logical
relations, [Pollack06] summarizes alternatives: nominal logic, etc. This issue
is especially critical in operational semantics where substitution is
everywhere.
Many static analyses are described in the literature in very algorithmic or
operational terms, giving the impression that the analysis pertains to very
intensional properties. Some analyses have been described using relational
properties over an extensional semantics [Benton].
Relational reasoning is problemmatic in some approaches to machine checked
metatheory [Schuermann].
\paragraph{Adequate semantics.}
Verification of compilers and static analyzers is carried out relative
to the semantics for the source and target languages. It
is therefore crucial -- and definitely not obvious -- that these
semantics match reality.
For the target languages, the formal semantics of machine-level
languages are well understood and relatively easy to validate against
real hardware by testing. However, this is true only for sequential
code: the semantics of parallel machine code is often unclear,
especially for multiprocessors with shared memory and relaxed
consistency between each processor's view of the memory.
For source languages, it is essential that the semantics
used to prove compilers and static analyzers match, together and with
the semantics used to verify source code by program proof or
model-checking; otherwise, there is no hope that guarantees
established by source code verification carry through to the
executable code. In particular, the operational semantics typically
used for compiler proofs must be validated against the axiomatic
semantics typically used for program proofs. More generally,
a systematic effort should be made to share and reuse semantics across
compilers and verification tools, instead of the current practice of
developing one's own semantics for one's own purposes.
Another dimension is modularity. Good axiomatic semantics supports modular
reasoning in accord with program and specification structure. Operational
semantics needs to model things like separate compilation and multi-stage
computation.
\paragraph{Extension to finer properties of the generated code.}
Past and current work on compiler verification concentrates on two
classes of guarantees over the generated code: type- and memory-safety in the
context of security of mobile code, and observational equivalence or refinement
with the source code in the context of high-assurance software. Some
applications demand stronger guarantees. For instance, consider the compilation
of high-level security constructs (such as secure channels, authentication
logic, etc) down to low-level cryptographic operations. To ensure that no
attacks are possible on the generated code that are not also possible on the
source code, semantic preservation is insufficient: a full abstraction result
between the source and compiled code is needed [Abadi] (and the notion of
observation may need to be very low to address ``abstraction-violating''
attacks). Another example is the
compilation of code fragments written in different languages and eventually
combined by linking the generated machine code. Here, semantic preservation
results that apply only to whole programs are insufficient and must be refined
into properties that take the context into account.
\paragraph{Extension to code generators from higher-level source languages and
from specification languages.}
As mentioned previously, what we consider as "the source code"
(written in a conventional programming languages) is, in many cases,
automatically generated from higher-level, possibly domain-specific
programming or specification languages. Examples include the
generation of C code from reactive languages or Simulink diagrams,
or the code extraction facilities of proof assistants such as Atelier
B and Coq. This approach is to be encouraged as it automates
error-prone encodings of high-level specifications into low-level code.
The formal verification of the correctness of such code generators is
important for the same reasons that compiler verification is
important. More generally, all tools that participate in code
generation for a critical application should ideally be formally
verified at appropriate levels of confidence.
Taking a broad view on
tool verification in general, verification of code
generation tools is more important than verification of code
verification tools, as a bug in the latter can only mask the existence
of a bug in the software, while a bug in the former can introduce
additional bugs.
\begin{milestone}(2 years?)
A benchmark challenge has been formulated to stimulate research in this area:
\link{http://pauillac.inria.fr/~xleroy/listmachine/}{http://pauillac.inria.fr/~xleroy/listmachine/}.
\end{milestone}
\begin{milestone}(2 years?)
As mentioned earlier, a key question is effective manipulation of syntax in the
presence of binding constructs. This is the focus of a series of challenge
problems called the POPLmark challenge:
\link{http://fling-l.seas.upenn.edu/~plclub/cgi-bin/poplmark/}{http://fling-l.seas.upenn.edu/~plclub/cgi-bin/poplmark/}
\end{milestone}
\begin{milestone}(10 years)
Develop a theory of compiler verification that is generic as far
as possible for different target processors and (less important as
changes occur less often) source languages. Such a theory should
provide general theorems that are applicable in many contexts.
Current state-of-the-art are rather specific considerations for
specific scenarios.
\end{milestone}
Develop the theory in such a way that compiler verification interfaces directly to
\begin{itemize}
\item the specification from which the target processor is
designed/generated
\item the programming language model that is used in verifying source
programs relative to program specifications.
\end{itemize}
There is clearly a relationship to ``unifying theories'' here. (Some older
work in this direction was done by the ProCoS project in the early
90ties by the groups of Hoare/Langmaack/Olderog/Ravn/Bjoerner, see
e.g., LNCS 1283, and by a national German project, Verifix).
If the compilers translate to an intermediate format like to the JVM
the target processor becomes less critical for the compiler proof.
However, we then have to study how to come to provably correct
execution mechanisms/interpreters for the JVM.
Just-in-time compilation involves both.
\section{Verification algorithmics}
The impetus for a Verified Software grand challenge is in part the recent
dramatic advances in automated analysis and verification.
These achievements combine theoretical advances, such as pushdown model checking
[Esparza;Steffen?...], with highly engineered implementations.
An important theoretical challenge is to develop a systematic and useful theory
of \emph{practical precision} and \emph{practical running time} of automatic
analysis/verification methods.
Empirical observations show that certain automatic verification methods with bad
worst case-complexity and/or imperfect precision work well for typical practical
problems. [[DN: examples? broaden to include, e.g., type inference with
polymorphic recursion]] A ``systematic and useful theory'' should explain this
and also allow to predict the practical behavior by theoretical means. Such an
explanation could be of great practical importance, as it could direct research
towards practical methods. The current state-of-the-art in this area is rather
of a trial-and-error kind where analysis procedures are tuned on the basis of
benchmark examples.
This topic is different from specification and verification of intensional
properties, where the challenge is to find means to reason about certain
properties for a class of programs, e.g., run time for lazy functional
programs. Here, the challenge is to predict resource usage, and characterize
inputs on which performance is good, for specific algorithms.
But work on either challenge may well shed light on the other.
\section{Unification of foundational models and logics}\label{sec:utp}
(Tony Hoare wrote this section.)
Unifying theories is one of the major goals for every branch of science,
and it is considered worthy of pursuit as an intellectual achievement
for its own sake. A successful unification demonstrates that a number
of existing experimentally supported theories are special cases
derivable from some more general theory; and the general theory often
turns out to be the simplest member of the family. Study of such a
unifying theory often suggests new special theories that can
independently verified and applied. Confidence in the validity of all
the theories involved is greatly increased. Quantum theory is a superb
example of such a grand unification.
In the study of the theory of programming and the semantics of
programming languages, there are already a great many elegant and useful
theories. Some of these have been incorporated into tools supporting
the various phases of a software project -- design, development,
testing, verification and evolution of computer systems, both in
hardware and in software. Many successful tools have exploited
particular characteristics of their intended application area, or they
are restricted to a particular design pattern or paradigm of system
implementation. Many tools are rightly targeted at only one phase of
the software project. For this reason, the most successful tools are
often used in an ad hoc combination with other tools, some of which have
a less secure theoretical foundation.
Unifying the theories that underlie wider ranges of software tools will
give a theoretical foundation for the combined use of the tools, each of
which can be more confidently specialised to its particular function.
It will ensure that there are no gaps in the chain of evidence that
validates the software product. It will reduce the conceptual
mismatches that unnecessarily complicate the learning of new tools. And
it will be a worthwhile intellectual achievement for its own sake.
Research into unifying theories can proceed simultaneously from the
bottom up and from the top down. Bottom-up research can start with a
selection of existing specialised tools that have proved their value in
application; and it can explore the unifying theories that would support
their use in combination. Where tools are in direct competition, a
unifying theory can help to resolve the issues in a sound scientific
spirit. In all cases, the possibility of removing unnecessary
complexity and inconsistencies will be revealed.
Top-down research is more idealistic, less tied to existing languages,
tools, applications, and processes. The goal is to provide a general
theory of correctness of programs, including their necessary links to
specifications. The challenge is to combine, yet keep separate, the
main features of modern programming -- concurrency, non-determinism,
object orientation, transactions, multi-stage, etc. Its methods are those of
programming language semantics. A successful unification will link the
presentation of semantics in the operational, algebraic and denotational
styles. The operational style is essential for program testing and
model checking; the denotational style is appropriate for the design of
specification languages and the proof tools that support reasoning about
correctness; and the algebraic style provides a useful bridge between
the two. The unification is essential to validate the reliable use of
model checking in combination with proof, and to formalise and prove
correctness of program verifiers and programming language compilers. And
the intellectual achievement would serve as a foundation for a major
strand in the study of Computer Science.
\paragraph{Research directions in Unifying Theories of Programming}
(Jim Woodcock wrote this section.)
\begin{enumerate}
\item\label{item:1} A theory of pointer structures~\cite{pointers-1}.
Pointers are essential for many real programs, including embedded
systems. They need to be smoothly integrated into the UTP framework.
\item A theory of Object
orientation~\cite{objects-1,objects-2,objects-3,objects-4}. OO is
the only programming paradigm used in industry, and a unifying
theory not handling OO well will be regarded useless by engineers.
No theory of OO modelling has been widely accepted. The reason
perhaps lies in the inevitable pointer analysis (see~\ref{item:1},
which is an extremely challenging topic, but we must tackle it
head-on. There is no way around it.
\item A theory of infinite data structures. Two reasons:
\begin{enumerate}
\item In many applications we do not need to deal directly with
pointers but rather with the regularly infinite data structures
that can be represented using pointers. In these cases it is
desirable to drop the representation step to simplify reasoning
and then refine into pointer structures. This may be an outcome
of the work in~\ref{item:1}.
\item We need to deal directly with lazy data types that are not
necessarily representable by pointers and require closures to
model them. Again although these can be modelled by closures the
representation-as-closures step adds complexity to the reasoning.
\end{enumerate}
\item A foundational theory of UTP logic or algebra and a theorem
prover based upon these foundations\cite{tools-3}. UTP is loosely
based on predicate logic and relational calculus where predicates
and relations are sorted (as in many sorted algebra) by alphabets
(and possibly, in some cases by types). The exact logic, however,
is left imprecise and this is a barrier to building theorem proving
tools and libraries of results that can be transferred.
\item A theory of information confinement. Information confinement in
programming languages is essential to reasoning about security and
safety. A general theory for UTP would be a great help in reasoning
about security and safety of systems described in UTP. The theory
would not only need to define information confinement but how it is
preserved through various forms of refinement.
\item Cross-platform and cross-language compatibility. This seems to
be an issue neglected by the Grand Challenge working group. Most
modern software nowadays involve multiple platforms (OS) and
programming (or scripting) languages. Can we provide tools for such
modern software? What is the role we UTP people can play? A popular
solution is middleware like Java, but that cannot solve all the
problems. Language translation (including compilation) is seen as a
possible solution.
\item Synchronous systems~\cite{hardware-1}. A key piece of ongoing
work is the construction of a group of related theories, all
connected with the notion of systems that evolve within the
constraints of regular time-slots, globally synchronised. The
theories differ in the precise nature of what transpires during each
slot --- in one variant we might consider a slot as a mini-CSP-like
process with ordered events and refusals, whilst in another we may
assume that events are simultaneous. These 'slotted' theories are
parameterised by a 'module' that describes the details of the
internal slot activity. Given such a module, all the main machinery
of a UTP theory, including healthiness conditions, are automatically
generated. Different slotted-theories can be interrelated, as any
Galois connection at the individual slot level can be lifted to one
over the full slotted theories.
This modular approach to building similar theories offers more than
just simply building theories "on we will investigate the cheap".
For instance, given an intended application of this work, namely the
semantics of Handel-C (a language for hardware compilation), we can
envisage slot behaviours of increasing complexity used to model
layers of behaviour in the Handel-C language. This would allow
reasoners to choose the simplest theory that is strong enough to
reason about a given program --- why use a slotted theory designed
to handle the prialt statement in its full generality for reasoning
about a program with no use of that construct?
A long-term goal of UTP research would be to discover other families
or groups of relating theories that could be modularised in this
fashion.
\item A unifying theory of probabilistic
programming~\cite{probability-1}.
\item A unifying theory of rigorous software testing\cite{testing-1}.
\item Component-based software systems~\cite{components-1}. The main
feature of the work is the separation of concerns with
specifications of functionality using pre and post conditions,
interactions with traces, and deadlock and livelock with failures
and divergences (or guarded state machines/automata). UTP is used
to ensure the consistency of separated specifications. The
advantage of this separation is that it allows us to use different
verification techniques and tool for the analysis of different
properties. The model allows us to characterise the software
architecture, refinement, and design patterns and supports both
verification and transformation in software development. Further
work will be the treatment of quality of service, such as time
fault-tolerance, in a component-based development. In the long
term, a tool support will be investigated for the approach.
\item Unification of more programming paradigms. This includes
angelic nondeterminism~\cite{angels-1}, hybrid control
systems~\cite{control-laws-1,control-laws-2}, hardware compilation
and hardware/software
co-design~\cite{hardware-1,hardware-2,hardware-software-1,hardware-software-3},
mobility\cite{mobility-1,mobility-2}, state-rich
processes~\cite{state-rich-processes-1,state-rich-processes-2} and
time~\cite{real-time-1,time-1,time-2,time-3,time-4}.
\item Unification with industrial design techniques~\cite{uml-1}.
\end{enumerate}
\paragraph{Linking theories for theorem provers}
In addition to connections between alternative program semantics,
connections are needed between the logics implemented by different theorem
provers.
This is especially true in the case of interactive provers, which use a number
of significantly different higher order logics.
For example, suppose PVS has been used to model a processor and its instruction
set, and HOL has been used to specify and verify correctness of a compiler
from some source language to ``the same'' instruction set.
Now suppose we want to prove intensional properties of source programs,
exploiting the particulars of the compiler and properties of the instructions
that are proved with respect to the processor.
It could well be that the two models can be sensibly combined because they rely
on subsets of the PVS and HOL logics have semantics-preserving translations even
though the full logics are incomparable.
A near-term milestone would be a linking theory that accounts for the semantic
connection between these two sub-logics.
A recent advance in interworking of disparate logics links ACL2
and HOL, motivated by the need for high confidence cryptography
implementations~\cite{GordonHuntKaufmannReynolds}.
\begin{milestone}
A unifying theory of theorem proving serving as a framework for linking
theories connecting different logics and subsets,
and to guide the design of future logics and interactive verification systems.
This is the basis to justify particular integrations of tools that are sound in
virtue of using only subsets of the logics as needed for a particular
verification.
\end{milestone}
To-do: need link to tool interoperability roadmap items: bridging
semantic gaps. (E.g., verify a compiler in HOL and a processor in PVS; justify
the combination, which may be sound only owing to the subsets of PVS and HOL
being used.) Logosphere.org is relevant.
\section{Conclusion}
This report is intended to help coordinate researchers who seek to meet the
theoretical challenges raised by the grand challenge, and to generate publicity
to broaden participation in the effort.
It is expected that new milestones and priorities will emerge as progress is
made and plans refined for experiments and tool development.
The following questions need to be answered.
Which challenge problems should be addressed first?
%In which time frames (say, 1-5, 5-10, 10-15 years)?
Who are the researchers/groups interested?
What concrete activities are to be taken to address the challenges and how
will these activities be integrated with the other verified software activities?
Researchers are invited to join the discussion of priorities, to contribute new
challenges and milestones, and to report progress and accomplishments.
\appendix
\section{Findings of the panel on experiments and tools}\label{sec:findings}
Agendas for research in theory may be formed opportunistically, driven by needs
of existing research priorities, or driven by applications.
Experimental needs take the lead in the Verified Software project and the panel
found it difficult to contemplate research challenges and milestones for theory
separate from envisioned experiments and tool development.
\subsection{Experiments in machine-checked tools}
\begin{milestone}(5 years)
Secure information flow checker for bytecode. This encompasses a verifier for
bytecode with security type annotations as well as machine-checked proof that
typable code conforms to security policy.
(Work under way by Barthe et al., Mobius project.)
\end{milestone}
\begin{milestone}(10 years)
Verification of an individual analysis tool (e.g., SAT solver, or Java type
checker), from high level requirements in terms of program logic down to the
code of a production implementation. This involves specification of the
analyzed language and its analyis and also the semantics of the implementation
language.
\end{milestone}
\begin{milestone}(15 years)
Complete verification of functional correctness of a realistic runtime
system, for instance a runtime system for real-time Java, including its
intensional properties such as space usage (code will not overflow the stack)
and time usage (the code is fast enough, and will meet its deadlines).
\end{milestone}
\subsection{Experiments in machine-checked metatheory}
This section outlines some challenges to be addressed with respect to
formal verification of programming language metatheoretic properties
(such as type safety) using theorem provers and other proof assistant
technology.
\begin{itemize}
\item Developing appropriate theorem proving infrastructure for dealing
with alpha-conversion and manipulating language syntax with variable
binding structure. Several techniques, such as de Bruijn indices,
higher-order abstract syntax, and nominal logic, have been proposed,
but they have been mostly applied in specific settings. More
general support for structures-with-binding is critical.
\item Developing a library of modular components, such as type
environments, finite sets, etc, that can be reused in different
language formalization contexts. Such a library might consist of a
collection of domain-specific data structures for manipulating
programming language syntax, specifying inference rules, and
defining semantics, along with appropriate collections of lemmas and
tactics for manipulating those data structures.
\item Demonstrating the applicability of the tools and techniques
developed to address the first two challenges for problems relevant
to programming language type systems. Here, issues of scale and
expressiveness can be tested: how well do the mechanized proofs
scale to real programming language concepts and how well can they be
used for standard proof techniques such as logical relations