

The following Y2K material has been kept available by MITRE for historical purposes only and has not been updated unless noted.

MITRE - Y2K - Compiler Solutions Paper


One formulation of a century-compliant date-handling system for many legacy systems duplicates an Ada solution using predefined types and operations, tailored to the particular development environment and even to specific application formats. This would involve declaring appropriate types, appropriately mapping allocation and other implied operations, defining the extended operations needed for manipulating the converted values, and providing for all conversion and assignment contingencies. This is fairly complicated, and could be the subject of an extensive detailed design involving just Ada facilities.

Some appropriate type definitions might be:

   -- pragma Pack applies only to composite types, so the scalar
   -- sizes are set with size clauses instead.
   type CC is range 19..30;
   for CC'Size use 8;
   type YY is range 0..99;
   for YY'Size use 8;
   type MM is range 1..12;
   for MM'Size use 8;
   type DD is range 1..31;
   for DD'Size use 8;

   type CCYYMMDD is record
        Century : CC;
        Year    : YY;
        Month   : MM;
        Day     : DD;
   end record;

   pragma PACK (CCYYMMDD);

There is no obvious reason why compilers wouldn't represent this in four bytes, taking no more space than the "array(1..6) of character" that is the symptom of the Y2K problem. Why six-byte data types were used originally in COBOL is anybody's guess (i.e., explanations abound but none is really regarded as complete), but one overriding (if not entirely conscious) motive might have been the implicit constraint of modeling the input from users or other media (screens, messages, other devices?) with minimal conversion.

What is a legitimate question to some, given a recent trend in responses to our online solutions suggestion form, is the use of traditional compiling and building techniques, i.e., using linguistic approaches like "pragmas" and special date libraries able to deal with as many formats as are appropriate to an application.
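
As a rough illustration of the conversion from the six-character form into the record representation, the following sketch may help. It is not taken from the complete paper: the names Convert_Demo, YYMMDD and To_Date are hypothetical, and the sketch assumes the caller knows the century, since the six-character form does not carry it.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Convert_Demo is

   type CC is range 19 .. 30;
   type YY is range 0 .. 99;
   type MM is range 1 .. 12;
   type DD is range 1 .. 31;

   type CCYYMMDD is record
      Century : CC;
      Year    : YY;
      Month   : MM;
      Day     : DD;
   end record;

   --  The legacy six-character form, e.g. "991231".
   subtype YYMMDD is String (1 .. 6);

   --  Hypothetical helper (not from the paper). The caller supplies the
   --  century because the six-character form does not carry it.
   --  Constraint_Error is raised if a field is not a valid numeral
   --  or falls outside the range of its type.
   function To_Date (Text : YYMMDD; Century : CC) return CCYYMMDD is
   begin
      return (Century => Century,
              Year    => YY'Value (Text (1 .. 2)),
              Month   => MM'Value (Text (3 .. 4)),
              Day     => DD'Value (Text (5 .. 6)));
   end To_Date;

   D : constant CCYYMMDD := To_Date ("991231", Century => 19);

begin
   --  Print the four fields of the converted date.
   Put_Line (CC'Image (D.Century) & YY'Image (D.Year)
             & MM'Image (D.Month) & DD'Image (D.Day));
end Convert_Demo;
```

The range constraints on the field types do some of the validation for free: a month of "13" fails at the conversion rather than later in a calculation. How the century is actually chosen is an application decision that this sketch deliberately leaves to the caller.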

The apparent conceptual leap (at least for user-friendly DB interfaces in the 3-4GL world) is the kind of mapping needed to deal with useful internal representations like the one above. Of course, this is nothing new to compilers, but it is a significant issue in the data processing world, possibly as much for psychological as for overall technical and system reasons. The following addresses what modern compilers might do with the above strawman "updated representation". What organizations might do is another question, but is similarly indeterminable without a major improvement in modern multi-organizational communication. Why this is even worth raising as a point of consideration here might be best illustrated by the controversy about language choice so well known in the programming world, and more recently in the multi-system data exchange issues raised by the CORBA/ActiveX controversy.

The conversion issues are certainly where the complications lie. Most compiler generated code for operations would need to convert the days and months to ordinals locally before any manipulations, at least conceptually. And of course, the year has to be mapped to a four digit value. All the conversions are different, as well as the representation needed by many of the operations. Sorting is the most commonly cited special case, where the day and month conversions can sometimes avoid the intermediate ordinal representations, but the year expansion is crucial. Simplifying requires going to a Julian-like representation, which may be the closest to an obvious candidate for a universal intermediate representation. One interesting issue is how often actual code really has to use such a complete conversion (or more accurately when would it do so optimally).
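
The two cases described above, sorting without ordinals and arithmetic through a Julian-like intermediate, might be sketched as follows. This is only an illustration under stated assumptions: Date_Demo, Day_Count and Day_Number are hypothetical names, and the day number uses the widely published integer formula for the Julian Day Number of a Gregorian date, not anything specified in the paper.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Date_Demo is

   type CC is range 19 .. 30;
   type YY is range 0 .. 99;
   type MM is range 1 .. 12;
   type DD is range 1 .. 31;

   type CCYYMMDD is record
      Century : CC;
      Year    : YY;
      Month   : MM;
      Day     : DD;
   end record;

   --  Hypothetical type, wide enough for the intermediate values below.
   type Day_Count is range -10_000_000 .. 10_000_000;

   --  Sorting: a field-by-field comparison needs no ordinal conversion,
   --  but it does need the century as the most significant field.
   function "<" (L, R : CCYYMMDD) return Boolean is
   begin
      if L.Century /= R.Century then
         return L.Century < R.Century;
      elsif L.Year /= R.Year then
         return L.Year < R.Year;
      elsif L.Month /= R.Month then
         return L.Month < R.Month;
      else
         return L.Day < R.Day;
      end if;
   end "<";

   --  Arithmetic: map the date to a Julian Day Number (Gregorian
   --  calendar), so that differences are plain integer subtraction.
   function Day_Number (D : CCYYMMDD) return Day_Count is
      Full_Year : constant Day_Count :=
        Day_Count (D.Century) * 100 + Day_Count (D.Year);
      A : constant Day_Count := (14 - Day_Count (D.Month)) / 12;
      Y : constant Day_Count := Full_Year + 4800 - A;
      M : constant Day_Count := Day_Count (D.Month) + 12 * A - 3;
   begin
      return Day_Count (D.Day) + (153 * M + 2) / 5 + 365 * Y
             + Y / 4 - Y / 100 + Y / 400 - 32_045;
   end Day_Number;

   Earlier : constant CCYYMMDD :=
     (Century => 19, Year => 99, Month => 12, Day => 31);
   Later   : constant CCYYMMDD :=
     (Century => 20, Year => 0, Month => 1, Day => 1);

begin
   Put_Line ("Earlier < Later: " & Boolean'Image (Earlier < Later));
   Put_Line ("Days apart:"
             & Day_Count'Image (Day_Number (Later) - Day_Number (Earlier)));
end Date_Demo;
```

For 31 December 1999 and 1 January 2000 the comparison holds and the two day numbers differ by one. A comparison on the two-digit year alone would order the same pair the wrong way round, which is why the year expansion cannot be skipped even when the ordinal conversion can.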

Thinking of languages other than Ada really makes the problem interesting. Presumably anything that can be done in Ada by packaging, clever typing and operator definition (and generics if necessary) can be done in C++ and other OO languages by encapsulation and appropriate OO mechanisms (and templates if necessary). In C, the number of mappings needed is truly challenging, but of course programmers do it all the time (and are even known sometimes to suggest it is intuitively obvious what is going on). Whatever the actual complexity, it seems the problem we need to address involves finding a way to deal practically with it. One solution is to think of compilers as often allowing some perturbations of fundamental representations and processing, and try to think of our problem as minimizing the perturbations, subject to necessary local variations in how each individual compiler would find the least cost solution.

Whether existing vendors are likely to find this kind of analysis in any way useful or even interesting is the final quandary. Built-in data, types, and operations are not really a problem for compilers. The ability to tailor representations is traditionally a difficult barrier since data allocation is often one of the most rigid or highly tailored and optimized parts of a compiler. On the other hand, symbol tables and various intermediate representations are actually more flexible than vendors like to think of them as being. Their basic declarations are usually well supported by compilers, builders and flexible configuration control/scripting frameworks that make minor changes comparable to lexical and grammatical modification. Of course, the latter (lexical mechanisms and grammars) are the easiest to modify, though no vendor likes to take on the testing required even with such well controlled changes without considerable good reason. Perhaps one of the positive aspects of the Y2K date calculation dilemma is that it shows there may be reasons for thinking seriously about how flexible compilers really are. In particular, a case could be made (and maybe we should try to make it here) that compilers are really the most cost effective way to think about doing controlled modification of existing systems in order to fix the Y2K problem with the least risk and expense. (Of course, we know it may really be too late, but still the approach may be worth exploring for future problems.)

Another interesting issue is the actual choice of internal representation. Whether or not observers unfamiliar with compiler storage allocation would think the "strawman" representation above is useful, the actual details are to some extent ideally under user control (as in Ada record representation clauses, for example). Suggestions such as one bit to indicate the century in an otherwise familiar COBOL YYMMDD six-character representation are certainly interesting, possibly for reasons of user-interface compatibility over a range of special applications. Anyway, this perhaps only corresponds to the distributed architecture version of the original intent of the kind of representation control that was provided by record representation clauses in Ada (which generally have corresponding language/compiler convention facilities in most DoD development systems).
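
A record representation clause of the kind mentioned above might look like the following sketch. It is one possible layout rather than a recommendation: it assumes 8-bit storage units and gives each field one byte, and the names Layout_Demo and Sample are hypothetical.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Layout_Demo is

   type CC is range 19 .. 30;
   type YY is range 0 .. 99;
   type MM is range 1 .. 12;
   type DD is range 1 .. 31;

   type CCYYMMDD is record
      Century : CC;
      Year    : YY;
      Month   : MM;
      Day     : DD;
   end record;

   --  User-controlled layout: one byte per field, four bytes in all
   --  (assumes a storage unit of 8 bits).
   for CCYYMMDD use record
      Century at 0 range 0 .. 7;
      Year    at 1 range 0 .. 7;
      Month   at 2 range 0 .. 7;
      Day     at 3 range 0 .. 7;
   end record;
   for CCYYMMDD'Size use 32;

   Sample : constant CCYYMMDD :=
     (Century => 20, Year => 0, Month => 2, Day => 29);

begin
   Put_Line ("Size in bits:" & Integer'Image (CCYYMMDD'Size));
   Put_Line ("Sample year field:" & YY'Image (Sample.Year));
end Layout_Demo;
```

With the size clause in place the type occupies 32 bits. Changing the positions and bit ranges in the clause is how a different layout, for example one shared with another system, would be expressed without touching the code that uses the fields.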

A tangential issue is the relationship to software change management. The fixing step of the Y2K problem is generally acknowledged to be around ten per cent of the effort of the total project. This is the phase compilers are traditionally directly involved with, so the question of why bother becomes interesting. The reason is really that having the extended compilers do the changing must affect other phases. One fundamental issue (that gets involved in thinking about Y2K testing) is risk reduction, since a considerable amount of the remaining ninety per cent of effort is an attempt to deal with risk factors in identification of problem areas so that they can be fixed and then reintegrated. What a compiler-oriented solution might supply is control of the risk factors associated with:

Of course, all automation has some of this effect, but given the critical role of source code, controlling the effects of changing and rebuilding with it is arguably the best way of finessing these three risk areas. The complete paper has some detailed code for compiler extensions and is available for downloading.



For further information directly related to these issues, please contact Year2000@mitre.org


Last modified: Thursday, 14-Feb-2008 09:21:05 EST