Design for a Cross Compiler Regression Testing System

[Proposal #3]
Original Draft 16Sep88 - smh

The Problem.

All the Lisp code that runs on the Falcon depends on one or both of the two compilers and on their underlying runtime support. A compiler or the runtime support may fail to implement proper semantics in some way, whether because of an original implementation error or an unintended change. When that happens, unrelated code can fail in ways that are difficult to debug.

Therefore, it will be helpful to have a testing procedure for compiled code and the runtime support. This procedure can be invoked periodically, and particularly whenever these components have been changed, to verify that they still implement Lisp semantics. Such a test cannot be expected to identify every failure that could affect other parts of the system. It can, however, detect a large fraction of the compiler and runtime bugs that might otherwise consume great amounts of debugging time.

Alternatives and Cost.

The primary alternative is to do nothing, and to find compiler and runtime system bugs through normal debugging. This is seductive, since we will have to work this way anyway whenever a bug is not detected by the testing procedure. However, the testing procedure wins if it saves more debugging time than it costs to implement. (This ignores the less tangible payoff of being able to debug dependent code with greater confidence that the compiler works correctly.)

There are some additional payoffs which should not be ignored. For instance, some of the tests (but not the automatic procedures for running them) have already been created during compiler development and debugging, and more are being created. Also, a suite that tests the compiler against the language is a valuable long-term technical asset. It will likely someday be necessary to implement a new compiler, or to port this compiler to a new version of the processor, and the established test cases will be just as valuable the next time around.
One immediately foreseeable use for the test suite is to validate the port of the cross compiler to run natively on the Falcon.

It is difficult to estimate the cost in advance, and especially the payoff. However, it ``feels'' safe to say that the test suite and its execution support code would be worth at least a man-week of effort, since they are likely to save at least that much time in other debugging.

Test System Design.

The system can fail in essentially two ways. First, one of the numerous internal compiler consistency checks can signal an error at compile time. Second, the code can execute incorrectly at run (or load!) time.

Run-time failures are usefully subdivided into two types. For current purposes, let us define a ``test failure'' as an incorrect execution of a program construct, for instance, returning the wrong number of values from a COND clause. A ``test crash'' is a test which crashes the machine when it executes. Compared to (say) the Lambda, the Falcon's nature as a stack-frame machine perhaps makes crashes more likely than failures. For this reason the testing procedure must reliably report what it was about to do, even when the execution environment gets completely blown away.

It should also be possible to bypass selected conditionalized tests. Features known to be temporarily broken should not prevent the rest of the suite from running without special effort. This is important because during development a compiler or runtime system can be known to be broken with regard to some particular capability for a considerable period, and one wants to be able to run the test suite during that time.

The test suite will consist of a large number of test definitions. Most are expected to be DEFUNs. However, DEFMACROs, DEFTYPEs, DEFSTRUCTs, and other forms will also be necessary to test those constructs.
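The two requirements above are that the procedure report what it was about to do before a crash, and that tests known to be broken can be bypassed. Here is a minimal sketch of both in portable Common Lisp. It is not part of this proposal's design. RUN-TESTS, *BYPASSED-TESTS*, and the representation of a test as a (NAME . FUNCTION) pair are all hypothetical. The sketch also assumes that LOG is a stream whose output survives a crash of the machine under test, such as a file on the host. FINISH-OUTPUT only pushes the START line out to that stream before the test begins.

```lisp
;;; Minimal sketch of a crash-robust test runner (hypothetical names).
;;; Each test's name is written to LOG and forced out *before* the test
;;; runs, so if the machine crashes, the last START line names the culprit.

(defvar *bypassed-tests* '()
  "Names of tests known to be temporarily broken; RUN-TESTS skips them.")

(defun run-tests (tests log)
  "Run TESTS, a list of (NAME . FUNCTION) pairs, where FUNCTION returns true
on success.  Progress goes to the output stream LOG.  Returns the names of
the tests that failed or signalled an error, in order."
  (let ((failures '()))
    (dolist (test tests (nreverse failures))
      (destructuring-bind (name . fn) test
        (cond ((member name *bypassed-tests*)
               (format log "~&SKIP ~S~%" name)
               (finish-output log))
              (t
               ;; Record what we are about to do before doing it.
               (format log "~&START ~S~%" name)
               (finish-output log)
               ;; IGNORE-ERRORS turns a signalled error into NIL (a failure);
               ;; it cannot, of course, survive a real machine crash.
               (let ((ok (ignore-errors (funcall fn))))
                 (format log "~&~:[FAIL~;PASS~] ~S~%" ok name)
                 (finish-output log)
                 (unless ok
                   (push name failures)))))))))
```

The START line is written and flushed before each test is called. So in a log that ends with a START line and no matching PASS or FAIL, that line names the test that took the machine down.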
Each ``top-level'' form in the suite will be wrapped inside a test-suite control macro, which serves two purposes. First, it allows a unique identifier to be assigned to each test or sequence of tests. Second, it provides a hook for controlling test execution.

Here is a nonexhaustive illustration of some related test forms. Their complete specification will follow.

================================================================================
;; This is a useful helper function.  It is used instead of IDENTITY because
;; the compiler does not have an a priori optimizer for it.
(DEFTEST () :LOAD
  (DEFUN IDENTITY-1 (X) X))

;; This tests that neither the compiler nor the runtime stack is corrupted by
;; throwing out of the computation of a catch tag.  Both the native Lambda
;; compiler and the Falcon cross compiler trip up over this.  The test should
;; return THROWN; the inner catch body is never reached.
(DEFTEST THROW-FROM-CATCH-TAG-1 :RUN
  (DEFUN THROW-FROM-CATCH-TAG-1 ()
    (CATCH 'FOO
      (IDENTITY-1 (CATCH (THROW 'FOO 'THROWN)
                    'NOT-REACHED)))))
================================================================================

A fundamental problem in any test system is that whenever something breaks, it might cause a complete crash and thereby prevent execution of the rest of the suite. There is no completely general solution to this problem, but it is possible to minimize exposure to lossage.

Failures can be divided into two classes, depending on when they happen: failures during compilation, and failures during execution.

A failure during compilation is the signalling of an error by the (cross) compiler. This can be either an ``anticipated'' error from an internal consistency check, or an ``unanticipated'' exception in the compiler code. In either case, an error is signalled and the normal error mechanism receives control. The compiler's error handlers normally report the top-level form in which the error occurred.
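DEFTEST's complete specification is still to come. The following is therefore only a sketch, in portable Common Lisp, of one way the control macro could record each top-level form under its identifier while leaving the form itself to the compiler. The registry *TEST-REGISTRY* and the helper REGISTER-TEST are hypothetical names. The three-argument (DEFTEST name control form) syntax is inferred from the example forms above.

```lisp
;;; Hypothetical sketch of the DEFTEST control macro.  The registry layout
;;; is an assumption; the proposal has not yet specified it.

(defvar *test-registry* '()
  "Entries of the form (NAME CONTROL FORM), in the order last defined.")

(defun register-test (name control form)
  "Add or replace the registry entry for NAME.  Unnamed (NIL) tests are
always appended, since they cannot be told apart."
  (when name
    (setf *test-registry* (remove name *test-registry* :key #'first)))
  (setf *test-registry*
        (append *test-registry* (list (list name control form))))
  name)

(defmacro deftest (name control form)
  "Record FORM under NAME and CONTROL, then expand into FORM itself, so
the compiler still sees an ordinary top-level definition."
  `(progn
     (register-test ',name ',control ',form)
     ,form))
```

Expanding into a PROGN keeps a wrapped DEFUN at top level, so the compiler processes it as it would the unwrapped form. The unexpanded DEFTEST form also carries the test name. A compiler error handler that reports the top-level form should therefore report the test as well.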
It might be nice to have some automatic mechanism for recording the error and continuing with the next form, but this job can easily be done by a human. The only really important design consideration is that the test forms passed to the compiler be easily identifiable by names visible to the error handler.

A failure during execution on the Falcon is more serious for two reasons. The Falcon software system is not very robust in the face of runtime or compiler bugs, and the Falcon lacks a robust and mature debugger. The system needs to provide (at least) the following controls:

  :compile-only
  :load
  :exec

An alternative approach has been discussed in which the compiler output would be captured in source form and compared against previous output. This has been rejected here for two reasons. First, the details of the code generated by the compiler can be expected to change from time to time, and updating the expected output text would impose unpleasant overhead. Second, this approach would not verify correct execution of the code in conjunction with the runtime system. It is precisely this guarantee of correctness that other code developers require.
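The controls :compile-only, :load, and :exec are listed but not yet defined. This sketch of a dispatcher in portable Common Lisp assumes the following meanings:

  :compile-only  compiles the test form without evaluating it.
  :load          also evaluates the form, for example to define a helper such as IDENTITY-1.
  :exec          also calls the function the form defines.

PROCESS-TEST and COMPILE-TEST-FORM are hypothetical names, and CL's COMPILE stands in for the Falcon cross compiler. Treating :RUN (used in the example forms) as a synonym for :exec is likewise an assumption.

```lisp
;;; Hypothetical dispatcher for the three controls.  The meanings below are
;;; assumptions; the proposal has not yet specified them.

(defun compile-test-form (form)
  "Compile FORM inside a thunk without evaluating it.  Returns the
compiled thunk, or NIL if compilation failed."
  (handler-case
      (multiple-value-bind (thunk warnings-p failure-p)
          (compile nil `(lambda () ,form))
        (declare (ignore warnings-p))
        (if failure-p nil thunk))
    (error () nil)))

(defun process-test (name control form &optional (log *standard-output*))
  "Carry FORM through the stages CONTROL asks for, logging each stage
before it starts.  Returns the last stage completed (:COMPILED, :LOADED,
or :EXECUTED), or :FAILED, followed by that stage's result."
  ;; Assumption: :RUN, used in the example forms, is a synonym for :EXEC.
  (check-type control (member :compile-only :load :exec :run))
  (flet ((note (stage)
           ;; Log before acting, so a crash leaves the stage on record.
           (format log "~&~S ~A~%" name stage)
           (finish-output log)))
    (note "compile")
    (let ((thunk (compile-test-form form)))
      (unless thunk
        (return-from process-test (values :failed :compile)))
      (when (eq control :compile-only)
        (return-from process-test (values :compiled thunk)))
      (note "load")
      ;; Evaluating a DEFUN returns the name of the function it defines.
      (let ((defined (funcall thunk)))
        (when (eq control :load)
          (return-from process-test (values :loaded defined)))
        (note "exec")
        ;; Assumes the form was a DEFUN of no arguments.
        (values :executed (funcall defined))))))
```

Logging each stage before it starts lets a crash be attributed to compilation, loading, or execution. Here a compilation failure is returned as :FAILED. The proposal instead leaves compile-time errors to a human at the error handler, so that part is purely an illustrative choice.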