Open Proof FAQ
An algorithm must be seen to be believed. - Donald Knuth
This is the "open proof FAQ", which explains what open proofs are, why they are needed, and how the openproof.org website will (hopefully) help.
What's the underlying problem the "open proofs" site is trying to solve?
The problem we're trying to solve is that the world depends on having secure, accurate, and reliable software - but most software isn't. This is especially obvious for security, since the software we depend on is routinely subverted.
In the short term the world is using band-aids like patch distribution systems and firewalls. The IT security community has also developed tools that search for "likely vulnerabilities" (the webmaster's flawfinder is one such tool) or "likely errors" in source code. But none of these approaches ensure the absence of vulnerabilities or other problems - they just try to reduce their number or effect. If there's enough at stake, attackers will continue to attack systems - and as the world moves to using computers, there's plenty at stake.
Testing can find some problems, but testing by itself is inadequate because it's impractical to completely test real programs. Completely testing a trivial program that merely adds three 64-bit numbers would take about 49,700,000,000,000,000,000,000,000,000 years if you used a trillion superfast computers. Real programs are far more complicated.
In many circumstances we need "high confidence" (aka "high assurance") software - software were there's a reasoned argument that the software will always (or never) do something important. This in turn essentially requires the use of verified software, where formal methods (which apply mathematical techniques to prove properties of software) can show that the software really does what it's supposed to do, and nothing else.
Developing verified software is currently very difficult to do, or improve on, in part because there are few fully-public examples of verified software. Verified software is often highly classified, sensitive, and/or proprietary. In addition, research papers are too short and abstract to present meaningful examples. This impedes progress by everyone involved:
- Software developers who have never used them do not believe such approaches can be used in "real software development" (since there are few examples) and/or can't figure out how to apply them
- Tool developers often cannot improve tools by using real-world experience. They are also impeded in their ability to learn from other tools, and impeded in integrating their tools together
- Users do not know what to ask for, since they have difficulty determining what can (and should) be done
- Teachers have difficulty explaining them, and students have difficulty learning them, since they lack examples
- Current practitioners often have difficulty explaining their approaches to others, since they cannot provide detailed examples from their work.
In 1637 Pierre de Fermat wrote, "I have a truly marvelous proof of this proposition which this margin is too narrow to contain." Mathematicians didn't accept Fermat's word for it; they demanded public evidence. Today, people publish "experience reports", yet again they declare that the margins are too narrow to contain their implementation and proof. They get the same result; the general computing community doesn't accept those claims. Instead, there needs to be some way to make fully-worked examples of verified software available to all. Both small and large examples are needed. There are other tools and practices necessary for developing high assurance software (software configuration management, peer reviews, testing, and so on), but these are already in wide use.
Software has additional challenges, compared to a mathematical proof. Software normally undergoes change due to changing conditions and needs. Many potential users of verified software examples require that flexibility. So just publishing unchanging software, with an unchanging proof, isn't enough. We need a number of "open proofs", which can be modified and improved.
Why should code and evidence be published at all? Why not just announce "We did it"?
The FM community has been publishing "experience reports" for decades, which describe generic approaches or high-level descriptions of the problems they were trying to solve. But while entertaining, they typically do not really convey enough information to explain how to do it in real life, or to convince people that it might be worth doing in some circumstances. It's also difficult to build on their previous work.
Hundreds of years ago, mathematicians routinely hid their proofs, but mathematics has grown up since then. Fermat claimed that he had a proof for "Fermat's last theorem", but mathematicians refused to accept the theorem until it was publicly proved by Andrew Wiles. It's time for software development to grow up too, including published proofs of publicly-available programs.
What is an "open proof"?
Software or a system is an "open proof" if all of the following are free/ libre/ open source software (FLOSS):
- the entire implementation (requirements, design, code, required documentation for use/maintenance, etc.)
- automatically-verifiable proofs of at least one key criteria, and
- all required tools needed for use and modification of the above (e.g., compilers, editors, proof-checkers, proof assistants, etc.)
FLOSS is licensed so anyone has the freedom to use it for any purpose, to study and modify it, and to redistribute copies of both the original and modified information without having to pay royalties. FLOSS must meet the criteria of both the Free software definition and the open source definition. With such rights, developers can build on the examples to build larger works, teachers and students can use the examples for learning and research, and tool suppliers can use real examples to improve tools.
What's the advantage of having FLOSS code, proofs (including specifications), and evidence?
Publishing unchangeable implementations (including specifications and code) are much less useful as examples. We want developers, tool-builders, teachers, and even students to build on the examples, and have the examples improve over time. Indeed, we want at least some examples to grow into useful programs where appropriate. Besides, the environment changes over time, new FM techniques are discovered (and must be tried out), and requirements change too. That implies that FLOSS examples will be much more valuable than unchangeable "source available" programs.
What's the advantage of having FLOSS tools for open proofs?
Technological advancements in formal methods (FM) tools are sometimes held back by lack of published code. For example, "The Evolution from LIMMAT to NANOSAT",Technical Report #444 by Armin Biere (Dept. Computer Science, ETH Z¨urich, CH-8092 Z¨urich, Switzerland, Computer System Institute, April 2004) reports that their efforts to "cross check the impressive results the authors of CHAFF" had problems. "From the publications alone, without access to the source code, various details were still unclear... what we did not realize, and which hardly could be deduced from the literature, was [an optimization] employed in GRASP and CHAFF... Only [when CHAFF's source code became available did] our unfortunate design decision became clear... The lesson learned is, that important details are often omitted in publications and can only be extracted from source code. It can be argued, that making source code of SAT solvers available is as important to the advancement of the field as publications".
FLOSS tools enable advances to be quickly disseminated and improved on, accelerating the capabilities of such tools.
Many developers are wary of depending on proprietary tools (for a discussion of this, see Free But Shackled - The Java Trap). There are a variety of reasons for this, for example:
- it's hard to be sure the tool is correct. Having a tool you can examine, and change if you need to, greatly reduces your risk when using it.
- a tool vendor might go out of business, abandon the tool, change the tool in incompatible ways, or "raise rents" beyond what you can afford. This is not an idle concern; tools such as Z/Eves, ESC/Java, and Simplify were once widely used, but they were proprietary and when their backing disappeared, they became impractical (or illegal) to use.
What other properties should open proofs have?
To be useful as examples, examples should be clearly relevant or similar to typical software development. Both small and large examples are useful; small ones are better when teaching, and larger (more useful) ones are useful as building blocks.
The few examples that do exist usually don't meet this criteria; instead, they tend to be too trivial ("compute the factorial") and/or esoteric (chess puzzles, garbage collection algorithms, and security protocols). But we will identify the examples that exist, and over time we hope the list will grow.
Software or systems that are extremely different from "typical software development" are still open proofs if they meet the definition, and we'll be happy to identify them as such. But we'd like to especially encourage developing software that's more similar to "typical" software.
Will you accept "partly open proofs" in the list of open proofs?
Yes. To make that simpler, I propose using the following terminology where appropriate:
- Proof-available: the entire implementation and proofs are available for reading and local processing, sufficient to allow for re-verification of the proof. The required tools need not be FLOSS. This is similar to a common meaning of "source-available" (aka "open box"). This is enough to verify claims made by another, including those necessary for "proof-carrying-code".
- Proof-locally-maintainable: the entire implementation and proofs are available for reading, local processing, and local modification, sufficient to allow for user repair of implementations and proofs The required tools need not be FLOSS, and that there's no assumption that the implementation and proof be redistributable. This allows some kinds of limited classroom use, but for most academic purposes this is useless.
Form the point-of-view of users, open proofs are better than these alternatives, because with open proofs they can do more with the results. Open proofs can be reused (as libraries or as starting points) to build larger systems, they can be freely discussed and improved by a community of users, they can be publicly used with new tools (by modifying the open proof for a new notation), and so on.
But components with fewer rights still have some value for their users. Having at least the implementation and proof, even if it can't be changed, allows users to verify the proof of that implementation. That's a big advantage to users over current evaluation models, where evidence is secret and thus unverifiable by users. If the implementation and proofs can be modified locally, at least users can make emergency repairs themselves and verify those properties, as well as use them in some kinds of classroom exercises. In addition, there are so few examples that for educational purposes it's better to point to some partially-open proofs than to have none at all, so that they can learn from them (speeding the development of open proofs). Some business models may not be able to support development of fully open proofs, and in those cases what is possible should still be an improvement.
As more open proofs become available, lists of these partial proofs will be shunted to separate pages.
Do I have to believe that all software should be open proofs to participate?
No. Some people do believe that all software should be open proofs, but that's not a requirement to help us. Developers of classified or proprietary software can benefit from open proofs, because they can learn from them and then apply that knowledge to classified or proprietary software. Proprietary tool vendors also benefit from having open proofs, because they'll be able to (much more easily) examine the open proofs, discuss them in public forums, learn what is needed to better support them, and so on. Proprietary tool vendors can learn from the open proofs tools, and in many cases build on the tools themselves.
There's no requirement that all proofs need to be open proofs, but if we want to have more high assurance software, we need more open proofs to help us get there.
Are specific FLOSS licenses required?
No specific FLOSS license is required, but it needs to be a FLOSS license, and the license needs to be clearly marked.
We strongly encourage the use of existing widely-used licenses that are compatible with most FLOSS software - and strongly discourage creating new licenses. FLOSS components are much less useful if they cannot be easily combined with other FLOSS components. Incompatibility is much less likely if you use a common FLOSS license (e.g., GPL, LGPL, BSD-new, or MIT licenses), or release it to the public domain. See Make Your Open Source Software GPL-Compatible. Or Else. for more about why selecting a common FLOSS license is important, and what those common licenses are. The FLOSS license slide shows how some common FLOSS licenses are compatible.
Do I have to prove implementations (code) is correct?
No, proofs don't have to be at the code level. There need to be many examples, covering different types of programs and toolsuites, since we want people to be able to apply these ideas in many different circumstances.
In particular, there should eventually be examples of levels 0, 1, and 2 of FM, which some define as:
- Level 0: "Formal methods lite" - A formal specification is created (using mathematics), then the program is informally developed from it.
- Level 1: Level 0 plus (a) refining the specification deeper, and/or (b) proving important properties about the specification/model.
- Level 2: Prove that the code matches the specification. This "gold standard" is difficult with current technology.
However, to be an open proof, there still needs to be some implementation that's supposed to match the model, even if you're only doing a level 0 proof. That's because people need to be able to see how go from a high-level model to a specific implementation.
Are model-checking approaches acceptable?
Absolutely - model-checking is fine.
Aren't formal methods useless?
No. First of all, they're already being used for small-scale and narrowly-defined problems.
More broadly, current formal methods approaches have trouble scaling up to large systems, and also tend to have trouble adapting to requirements changes. But there's no fundamental reason this must be so. To be practical, FM must be automated by tools that can handle larger scales and can handle changes more gracefully.
What we hope to do is share OSS FM tools that are available, encourage development of implementations and proofs that use those tools, and through this mature both the tools and our understanding of how to best use them. By starting small, we can identify some of the key tool challenges, and feed those back to toolmakers so that they can improve. As those improvements are placed in tools, they can be used to handle larger and larger problems. In addition, larger proved systems can be built by starting with smaller proved systems and components. This will be a long-term process, but if we want it, we need to start now.
What's the difference between open proofs and proof-carrying code (PCC)?
Proof-Carrying Code (PCC) is a software mechanism that allows a host system to verify properties about an application via a formal proof that accompanies the application's executable code. In both open proofs and PCC, a user/host system can verify that a particular program meets a particular claimed property by re-examining the accompanying proof.
But there the similarity ends. If a host or end-user wants to prove a different property than what the PCC provided, the PCC need not include enough information to do that easily. If the host or end-user wants to change the executable code (to fix, improve, or customize it), it tends to be difficult; often the user does not even get the source code! If the host or end-user wants to understand or improve on the underlying tools used to develop the proofs, again, there's no guarantee that the end-user will receive tools either. Thus, PCC and open proofs are really quite different.
Why hasn't this been done before?
While this could have been done before, applying FM has become more feasible than it used to be:
- Newer tools, such as Why/Krakatoa/Caduceus and Coq, make some work much easier than before
- Older tools, such as ACL2, PVS, and Prover9, have had a number of refinements or re-implementations
- The availability of multiple FLOSS FM tools makes it much easier to integrate different tools (with different strengths) into larger, integrated wholes
- Improved algorithms and approaches (such as for SAT solving) have greatly increased the size of problems that can be automatically solved
- Much-increased CPU speeds and parallelism have made FM much more practical (particularly when using model-checkers) than they were in years past.
In addition, in the past many questioned if FLOSS approaches were practical methods to develop software or other information. The rise of FLOSS projects (such as the Apache web server, the Firefox web browser, and the Linux kernel) have ended that debate for software; even Microsoft now supports FLOSS development through its Codeplex site. FLOSS and FLOSS-like approaches have also become widespread in other information fields; the Creative Commons movement, Wikipedia, and NIH's embrace of public access all demonstrate this.
How do you plan to encourage development of open proofs?
- Defining a term ("open proofs") for the idea. Then establish a website explaining the concept and why it's needed. You're looking at that website now!
- Identify existing software which could be released as open proofs, and get them to release it. There are many existing examples where there's no valid reason to withhold it (such as classification, export control laws, or business models preventing it). In many cases, a government paid for its development, and has all the necessary rights to release it. Currently, many examples are rotting in storage like the conclusion of "Raiders of the Lost Ark"; let's get value for the money spent!
- Identify and briefly examine especially promising FM tools that could be used for this purpose, and publish some sort of document discussing them. That way, others can more quickly understand their options. Examples include Alloy, PVS, ACL2, Why/Krakatoa/Caduceus, Coq, and BLAST. Until there's such a list, see David A. Wheeler's "High Assurance (for Security or Safety) and Free-Libre / Open Source Software (FLOSS)... with Lots on Formal Methods"
- Encourage many to identify a few trivial OSS functions already in use (e.g., in the C library), and develop open proofs of them using extant tools. These small-scale experiments will help many others. The website can then serve as a way to let others know about these open proofs.
- Encourage creation of pre-packaged suites of tools (e.g., for Fedora Core). Installing these tools can be painful; make it easy, to encourage their use.
- Establish ways where computer science / software engineering academics who develop "open proofs" (including lengthy ones) can get peer review and academic credit. Currently, publishing a paper with an idea helps their career; developing examples/proofs, including the details, do not. The mathematics community has established norms where failing to publish the details of a proof is unacceptable, while in CS, not only are such details not expected, there isn't even a way TO publish them in a way that grants credit or that meets the "open proof" criteria. Currently I don't know how to do that; suggestions welcome.
- Find a way to get larger open proofs developed and published as open proofs. I think that until there are more small-scale open proofs, this will be harder to do. There are some potential starting points that already exist, such as Coyotos, L4 work, the GNAT Pro High-Integrity Edition (HIE), and the "Trusted Exemplar" project.
I expect that FM tool developers will improve their tools based on the lessons learned from open proofs as they become available. The same is true for professors and developers. Once there are examples that people can learn from, and improve on, there's reason to hope that it would become a "virtuous cycle". Many COTS OSS leaders actually have substantial mathematical training, and are very interested in making their software very secure or very reliable. But they will not use FM just on someone's say so; if we think they're useful, someone needs to "prove it".
Are test cases required?
To be an "open proof", no. You have to apply mathematical techniques in some way, but not test cases, to be an open proof.
Now, to be high assurance, as a practical matter you need many test cases. Proofs are only as good as their assumptions, and test cases are very useful for partly checking many assumptions - including assumptions you didn't realize you were making. Sometimes specifications have errors, and test cases can quickly spot many such errors.
It turns out that traditional testing, plus the use of FM, make a really good team. Tests can only test very specific sets of inputs, and as noted above, they cannot even begin to test all possible inputs. So formal methods can check "all possible inputs" against specifications given certain assumptions, while test cases can be useful in checking the specifications and assumptions.
But today, our problem isn't really a lack of knowledge of how to test. Most developers know (to at least some level) how to test programs, and there are many books and examples to show how to test programs. There are also a number of easily-available tools (such as gcov) to help create tests or determine the quality of a test suite, many test frameworks that ease automated testing, and examples of test suites are easy to find. In contrast, the knowledge of how to apply FM to develop verified software is not well-known, and examples are almost non-existant.
What are some similar projects?
The entire Verified Software "Grand Challenge" work is related, and in fact, the idea of "open proofs" is a complement to the "grand challenge" work. The "grand challenge" work wants to create proofs of larger systems, but it fails to make clear what people will be allowed to do with the works once they are created (at great expense). Wheeler thinks this is a terrible mistake; creators should never start developing software or evidence until they know how it will be released. If the license of verified software prevents building on it, then there is a very high risk that all that money and effort will be wasted. Thus, "open proofs" can be considered a practical strategy for implementing the verified software grand challenge, by eliminating legal roadblocks to implementing and using the results of the verified software grand challenge.
The Verified Software Repository is set up to store programs, but doesn't clearly define terms like "open proof".
FLOSS-FM International Workshop on Foundations and Techniques: bringing together Free/Libre/Open Source Software and Formal Methods.
The TYPES project does seem to be intentionally exploiting open source software approaches to improve software. The TYPES project is a "coordination action" in EU's 6th framework programme. The aim of its research is to "develop the technology of formal reasoning and computer programming based on Type Theory. This is done by improving the languages and computerised tools for reasoning, and by applying the technology in several domains such as analysis of programming languages, certified software, formalisation of mathematics and mathematics education." The popular press article Goodbye to faulty software? (ICT Results) describes the types project, noting that the "TYPES partners are also releasing open source software packages that anyone can download, use and modify. These packages include several "proof editors" that, in type theory, are the key to guaranteeing the correctness of programs... The open source principle... is fundamental to what they are trying to achieve. 'It's important that anyone can evaluate the code and check if it is correct, so it's inherent in this project that what we are doing should be open so that it can be discussed by everybody.' Results from type theory are already finding their way into other projects....".
The report "Shared Responsibilities: A national security strategy for the UK" by the Institute for Public Policy Research (ippr) Commission on National Security in the 21st Century (30 June 2009, ISBN 9781860303258) includes "Recommendation 60: The Government should also approach the European Commission and the incoming Swedish Presidency to sponsor a programme for the creation of a range of secure and reliable standard software modules (such as simple operating systems, database management systems and graphical user interfaces). These modules should be developed using formal methods and be made available free of charge through an open source licence to encourage their widespread use." The entire report has a small cost, but a summary (including its recommendations) can be obtained at no charge without registering.
The Computational Infrastructure for Operations Research (COIN-OR**, or simply COIN) project is an initiative to spur the development of open-source software for the operations research community. As they state on their web site, "Why open source? The Open Source Initiative explains it well. When people can read, redistribute, and modify the source code, software evolves. People improve it, people adapt it, people fix bugs. The results of open-source development have been remarkable. Community-based efforts to develop software under open-source licenses have produced high-quality, high-performance code---code on which much of the Internet is run. Why for OR? Consider the following scenario. You read about an optimization algorithm in the literature and you get an idea on how to improve it. Today, testing your new idea typically requires re-implementing (and re-debugging and re-testing) the original algorithm. Often, clever implementation details aren't published. It can be difficult to replicate reported performance. Now imagine the scenario if the original algorithm was publicly available in a community repository. Weeks of re-implementing would no longer be required. You would simply check out a copy of it for yourself and modify it. Imagine the productivity gains from software reuse! Our goal is to create for mathematical software what the open literature is for mathematical theory. We are building an open-source community for operations research software in order to speed development and deployment of models, algorithms, and cutting-edge computational research, as well as provide a forum for peer review of software similar to that provided by archival journals for theoretical research. This is a lofty goal, but we believe it's a worthwhile one. We have ideas, but we don't have all the answers. Only the community of users and contributors can define what is needed to make it a reality."
Openpacket.org is "a Web site whose mission is to provide a centralized repository of network traffic traces for researchers, analysts, and other members of the digital security community." Security tools for network analysis are difficult to maintain without some publicly-accessible dataset to work with. If organizations can manage to reveal their internal network traffic, then surely we can find a way to get program proofs out there.
Kind Software's software section says the following (as of 2008-08-05): "KindSoftware believes in producing and supporting Open Source Software (OSS). We are not in the 'information wants to be free' crowd, thus we don't do OSS for philosophical or pedagogical reasons. We believe that software created by collaborating participants in an open, non-proprietary, standard-compliant, brains-before-brawn fashion results in a higher quality product. We also believe that such a creation methodology coupled with OSS licensing is for the good of the user and consumer, and thus good for society. We manage the development of several pieces of OSS... [and] we are major contributors to the new version of PVS, which we hope will be available here in the future."
vdash.org is a trying to develop a "wiki of formally verified math".
The Open-DO Initiative is an Open Source initiative with the following goals:
- Address the "big-freeze" problem of safety-critical software;
- Ensure wide and long-term availability of qualified open-source tools and certifiable components for the main aspects of safety-critical software development;
- Decrease the barrier of entry for the development of safety-critical software;
- Encourage research in the area of safety-critical software development;
- Increase the availability of educational material for the development of safety-critical software in particular for academics and their students;
- Foster cross-fertilization between open-source and safety-critical software communities.
Project Hi-Lite aims to popularize formal methods for the development of high-integrity software; it is related to the Open-DO initiative.
Seneca College has the The Seneca Centre for Development of Open Technology "provides a physical and virtual environment for the development and research of open source software through collaboration with Seneca (College)". The Seneca project list is quite broad. Cory Doctorow mentions Seneca as a positive example of academic/FLOSS collaboration.
In general, students own the copyright to works they produce in universities and colleges, so typically students can release their works as open source software (as long as they are their works).
Projects which try to help collaborative development of critically important software are of interest, too. Forge.mil is a web-based software collaboration tool; GCN ran a July 2009 article about forge.mil.
Who do I blame for this?
Creating this site was David A. Wheeler's idea. He's thankful that others have come alongside and helped him make this a reality - thank you! In particular, Wheeler thanks Alan Dunn, who has done a lot of the legwork to make the initial site operational and creating the first packages (so people can do one-click installs of useful tools). Wheeler also thanks Mark Rader and many others who have helped.
How can I get in contact with you?
You can use the site's email list (http://lists.openproofs.org/mailman/listinfo/openproofs for information and signup) for discussion. No spam, please. Thanks.