.TL A C Language-Based Modular System for Analyzing and Displaying Gridded Numerical Data .AU David J. Raymond .AI Physics Department and Geophysical Research Center, R&D Division New Mexico Institute of Mining and Technology Socorro, NM 87801 .AB .ls 2 A system for analyzing and displaying gridded numerical data called Candis is described. The system is written in the C programming language, and is built on a standard way of representing such data. The analysis package is modular, hierarchical, and extensible. Facilities available on the .UX operating system enhance its ease of use. .AE .DA .ls 2 .EQ delim @@ define prx "\'" .EN .LP .bp .NH Introduction .PP Over the years, a number of formats have been introduced for representing numerical data produced by various observing systems and numerical models. These formats have the advantage of providing a well-defined standard for the benefit of analysis programs. Examples are the Block Data Set (BDS) of McPherron (1976), the Universal Format for meteorological radar data (Barnes, 1980), FLATDBMS of Smith and Clauer (1986), the Common Cartesian Format (CCF) of Mohr et al. (1986), and the Common Data Format (CDF) of Treinish and Gough (1987). These systems typically isolate the programmer from details of data representation by providing a standard set of access subroutines. Data files are generally self-defining, in that all the information required to interpret a data set is included in the data set. .PP A common feature of all these systems is that access routines are written in Fortran. One disadvantage of Fortran is that dynamic allocation of arrays is not available except possibly in non-portable extensions on particular operating systems. This makes the construction of general purpose analysis programs awkward, as the largest array ever expected must be allocated space at compile time. .PP With the spread of the UNIX operating system to a wide range of hardware, the C programming language (Kernighan and Ritchie, 1978) is becoming widely available. The C language is generally accompanied by a set of libraries to do mathematics, input and output, and other functions such as dynamic memory allocation. Though not strictly part of the language definition, these libraries are quite standardized. The ability to dynamically allocate memory plus the variable pointer facility of C allow the construction of compact and general analysis programs. .PP This paper describes a system written in the \fIC\fR language for the \fIan\fRalysis and \fIdis\fRplay of gridded numerical data (Candis). As with the systems described above, Candis is based on a standard way of representing numerical data, with associated standard access methods. In addition, the system is modular, with individual modules reading and writing files in standard format. It may be extended by creating new modules. The system is also heirarchical, in that applications are constructed by writing shell scripts invoking modules or other shell scripts. .PP Unlike some of the above systems, access to data files is purely sequential. Random access to files facilitates many operations in data analysis. However, the advent of computers with large virtual memories allows rather large files to be read completely into "memory". Subsequent addressing of different parts of such files corresponds to a form of random disk access, and has the advantage of being completely transparent to the user. Some understanding of how paging works is needed to use this method intelligently, but a similar comment can be made of more conventional forms of random access. .PP One benefit of using only sequential access is the ability to construct applications as sequences of modules in which the output of one module is fed directly into the input of another. This so-called pipe mechanism first appeared on the UNIX operating system, but is becoming available on other systems as well. Advantages are that a proliferation of intermediate files is avoided, and the heirarchical construction of applications using a shell or command processor is facilitated. In addition, new applications may require the development of only a small number of new programs, the bulk of the processing being done by existing software. This speeds development and makes debugging easier. .PP The organization of this paper is as follows: Section 2 describes the data format used with Candis. Section 3 illustrates the operation of Candis using examples of existing analysis programs. The analysis and display of Doppler radar data is described in section 4. Section 5 is a summary and discussion. .NH Common data format .PP In this section I specify the format of data files used by the system, and then introduce some additional concepts outside the formal specification, but of great utility. .LP .I a. Formal specification .R .PP The file structure used here became locally known as the \fIcommon data format\fR before the publication of Treinish and Gough's (1987) system of the same name. The collision in terminology is unfortunate, but I will try to avoid confusion by using lower case letters for the common data format of the Candis system. Changing our terminology would be difficult, as it has become deeply embedded in the documentation. .PP Common data format files contain three parts, which occur in sequence (see figure 1). The first part is the \fIheader\fR, which is a sequence of alphanumeric characters organized into lines. The second part is called the \fIstatic slice\fR, and contains data such as calibration fields. The third part contains one or more \fIvariable slices\fR, each slice containing a particular instance of a set of data fields. This partitioning is internally defined so that the file is nothing more than a stream of bytes to the underlying operating system. In particular, no dependence is made upon, say, the physical record structure of some device such as a magnetic tape or a disk drive. .PP As mentioned above, the header is organized into lines. Each line must be terminated by a newline character (i. e., linefeed) and must not exceed 81 characters in length, including the newline. White space (i. e., spaces and tabs) before the final newline is ignored. The maximum number of lines in a header is normally 300. .PP Figure 2 shows the contents of a typical header. Since the header is totally alphanumeric, its contents may be examined by text editors. As figure 2 shows, the header contains 5 sections, namely, 1) comments, 2) parameters, 3) static field descriptions, 4) variable field descriptions, and 5) file format. The header is then terminated with a single line containing an asterisk in the first position. .PP The comment section is free form, subject only to the restrictions on line length, number, and termination discussed above. Each line of the parameter section contains a parameter name-parameter value pair separated by white space. The value is in alphanumeric form, and need not even be numeric. However, it can't contain white space. Following the parameter value, and separated by white space, is an optional comment. The comment must begin with a pound sign. This provides commenting capability in addition to the comments found in the first section of the header. Use of parameters is up to the programmer -- they play no direct role in defining the rest of the file. .PP The static field section describes the data that occur in the static slice. Each line describes a \fIfield\fR. A field is an array of numbers in zero, one, two, three, or four dimensions, and is the basic unit of data in the common data format representation. Fields described in this section occur in the same order in the static slice. .PP A field description contains a series of alphanumeric strings separated by white space. These strings have the following meaning in sequence: .IP 1) The name of the field. .IP 2) The multiplicative scaling constant, @smul@. .IP 3) The additive scaling constant, @sadd@. .IP 4) The precision in packed integer format (@c@, @s@, or @l@). .IP 5) The dimensionality of the field (0, 1, 2, 3, or 4). .IP 6) Name of the first dimension, @dname1@. .IP 7) Array size of the first dimension, @dsize1@. .IP 8) Name of the second dimension, @dname2@. .IP 9) Array size of the second dimension, @dsize2@. .IP 10) Name of the third dimension, @dname3@. .IP 11) Array size of the third dimension, @dsize3@. .IP 12) Name of the fourth dimension, @dname4@. .IP 13) Array size of the fourth dimension, @dsize4@. .IP 14) An optional comment as in the parameter description lines. .LP Only as many of entries 6-13 need be included as is justified by the dimensionality of the field, e. g., a two dimensional field would require 6-9, while a zero dimensional field (a scalar field) would require none. The meanings of entries 2-4 will be explained below in the description of the format section. .PP The variable field section describes the structure of variable slices in the same format used for the static field section. Multiple variable slices can occur, but they all must have the same structure. This restriction makes the storage of variable length data somewhat awkward, but greatly simplifies analysis programs. .PP The format section contains just one line that has a string equal to "float", "int", or "ascii". This indicates the way in which data elements in the static and variable fields are represented. In the float format, data elements are stored sequentially in the internal single precision floating point format used by the subject computer. In ascii format data elements are in ASCII character set floating point form (g format in Fortran or C) separated by white space. In this case, white space can include newlines as well as space characters and tabs. In int format the field elements are stored sequentially as integers in binary representation at a level of precision specified by entry 4 in the associated field definition line. The one character codes here refer to associated types in the C language, namely, @c@ = char, or nominally 8 bits, @s@ = short, or nominally 16 bits, and @l@ = long, or nominally 32 bits. The quoted numbers of bits refer to the values used in C compilers on commonly available computers, but there is no guarantee that these values will always hold. .PP The integer format actually contains data that are scaled so as to be representable as an integer. The scaling parameters @smul@ and @sadd@ are defined in the header for each field, and it is up to the creator of the file to give these sensible values. The integer representation is obtained from the equation @I = F*smul + sadd + 0.5*s@ where @F@ is the floating point value and @I@ is the integer value. The inverse transformation is @F = (I - sadd)/smul@. The term 0.5*s in the first equation is to enforce rounding rather than truncation in the float to integer conversion. If @F*smul + sadd@ is positive, @s = +1@. If negative, @s = -1@ for computers that truncate negative floats toward zero, and @+1@ otherwise. The different integer precisions allow tradeoffs between precision and dynamic range on one hand, and data storage space on the other hand. Since a separate precision is defined for each field, this tradeoff can be made on a field-by-field basis. .PP A consequence of the above definitions is that the only file format that can generally be expected to transport from one computer type to another without modification is the ascii form. The float format will very rarely transport. Storage of data as 16 bit integers is frequently used to facilitate transport between computers, but even this can be tricky, as different computers may represent integers with different byte order. The main criterion with the ascii format is that both computers do indeed use the 7 bit ASCII standard to represent characters. Issues having to do with parity bits and extraneous characters such as carriage returns and nulls at the ends of lines need also to be considered. .PP Each slice has the following structure. At the beginning there is an 8 byte sub-header that contains the \fIelement count\fR, or the number of \fIelements\fR in the slice. The elements from each field then occur in sequence. The element sequence for multi-dimensional fields is the same as in the C language, i. e., the last dimension mentioned in the field definition is iterated most rapidly. .PP The element count is obtained by adding up the elements from all constituent fields. The number of elements in a field is simply unity times @dsize1@ times @dsize2@ ..., where as many dimension sizes are included as there are dimensions. For example, a scalar, or zero dimensional field would have an element size of one, whereas a two dimensional field would have @dsize1@ times @dsize2@. The element count is represented in the slice sub-header as an ASCII-coded decimal integer. .PP In the header, all section sub-headers (e. g., ***parameters*** -- see figure 2) must be present, even if there are no entries for that section. There must be one or more variable slices, and if there are no static fields, the static slice element count must still be present, albeit with a value of zero. .LP .I b. Useful constructs .R .PP I now discuss a number of concepts that are not a formal part of the common data format specification, but that turn out to be quite useful. .PP Even though successive variable slices are typically envisioned to represent fields at successive times, no special mechanism is provided to specify the time of each slice. Instead, one simply defines a scalar variable field called, for instance, "time", that contains the time information. This \fIsequence field\fR need not even be time -- it could be, for instance, elevation, with the idea that successive slices represent fields at different levels rather than different times. For use with certain software, it should be monotonically increasing with position in the file. .PP Another useful construct is the \fIindex field\fR. Index fields are one dimensional static fields with the same dimension name and field name. They are useful for specifying the domain over which data are defined. For instance, if data are defined for x between 0 and 10 km at intervals of 2 km, then one would define an index field named "x" with @dsize1@ = 6. The successive elements of this field would be assigned the values 0, 2, 4, 6, 8, 10. Index fields aid plotting routines, and it is good practice to define an index field for each dimension used in a common data format file. Note that the elements of index fields need not be equally spaced. For instance, if data points were closer together for small x in the above example, the index field might take on values 0, 0.5, 1, 2, 5, 10. .PP A more economical way of defining a domain is to specify \fIindex parameters\fR. These are typically used when equally spaced data points must be guaranteed. Some programs search the parameter section of the header for parameters of the form @dname@0 and d@dname@, where @dname@ is a dimension name that occurs in a field definition. For instance, if a dimension name "x" is found, parameters with the names "x0" and "dx" are sought. These are the index parameters for the dimension x, and are interpreted respectively as the starting value and increment for points in the x domain on which field values are defined. For example, if x0 = 3 and dx = 2.5, then the x values 3, 5.5, 8, ... are implied. Most programs that use index parameters assume default values of 0 and 1 respectively for the starting value and the increment if the corresponding parameters are not found. .PP Many observational data sets have regions of bad or missing data. The prime example is radar data, wherein data are only defined in regions containing precipitation particles. Many of the programs written for common data format files look for parameters called "bad" and "badlim". The value of the latter parameter is assumed to define the range of valid numerical data. Values larger than "badlim" in absolute magnitude are assumed to indicate that the datum is bad or missing. The parameter "bad" suggests a value (greater than the value of "badlim") to be used to indicate bad data. If these parameters are missing, default values of 1.e30 and 9.99e29 are respectively assumed. .NH Hierarchical approach to data analysis .PP Programs in the Candis system can be classified into one of three levels, namely \fIprimitive functions\fR, \fIfilters\fR, and \fIshell scripts\fR. Casual users should be able to obtain considerable utility from the system by programming at the highest and simplest level, namely the shell script level. However, new projects will often require the creation of new analysis programs, or filters. Generally these can be kept quite simple, as many standard functions will already be available to solve standard parts of the analysis problem. The shell script and pipe mechanisms provide an effective way of combining standard and non-standard operations. Recourse to the lowest level should rarely if ever be necessary. This is the level of direct manipulation of common data format files, and is adequately done by a library of primitive C language functions. .PP The primary exception to this rule occurs when a common data format file is created by a program in a language different from C. In this case the C primitives can't be used. However, creation of a particular common data format file is much easier than interpreting an arbitrary file, so this presents no particular problems. (A word of caution: It is generally not safe to assume that languages other than C produce binary data in a form that is compatible with the C representation. For instance, Fortran implementations sometimes write binary data in a record structure with embedded byte counts, checksums, etc. Thus, translation between \fIlanguages\fR, even on the same computer, can cause problems. Use of ascii format should minimize these problems. Recall, however, the different order in which multi-dimensional arrays are stored in Fortran and in most other languages, including C.) .PP I now describe the approaches used and specific functions developed for the three levels of programming. .LP .I a. Primitive functions .R .PP The required primitive functions fall into four categories, namely functions to create common data format headers, to interpret these headers, to read and write headers and data slices, and to access fields within slices. Header information may be needed throughout the period in which a data file is being accessed, so the philosophy is to read it into a user-defined buffer that can be retained. The imposition of a maximum number of header lines allows static allocation of buffer memory. This is memory-inefficient for small headers, but simplifies programming. .PP The technique for creating a new common data format header is to create a null header (with just the section labels) and then add comments, parameters, and fields on a line-by-line basis. Functions exist to perform each of these tasks. If a new header consists of an old header plus additions, it can be created by copying the old header to a new buffer and then applying the above functions. Partial copies of individual sections of headers can also be accomplished. .PP Interpretation of headers is perhaps the most difficult task. Functions exist to extract the comment section from a common data format header and to determine the format of the data file. In addition, there are functions to extract parameter values and field characteristics by either name or position in the header. If requested parameters or fields don't exist, a special code is returned, and appropriate action can be taken by the calling program. .PP There are functions to read and write headers and slices, as well as a function to query the header as to the expected number of elements in a static or variable slice. This information is needed so that consistency with the element count in the appropriate slice may be checked. As mentioned above, it is customary to statically allocate memory for header buffers. However, data slices of unpredictable size can occur, and dynamic allocation of memory is important here. Once the header is read, the memory required for static and dynamic slices is readily computed. Space can then be allocated using a C language library function. There is a Candis routine that simplifies this operation. .PP Accessing particular fields within a data slice is done by assigning a pointer to the start of the field of interest. Two functions exist to accomplish this task, one of which also returns additional information about the field in question. .PP Important declaration information is kept in a file named "cdfhdr.h". Any program calling the primitive functions needs to include this file. Visual inspection of the file can also enhance one's understanding of their operation. Table 1 gives the names and summarizes the uses of available primitive functions. .KF .TS box; c c l l . Name Use gethdr read header from designated stream to a header buffer getelcnt read an element count from designated stream getslice read data slice from designated stream puthdr write header to designated stream from a header buffer putslice write data slice to designated stream nullhdr create null header in specified buffer copycmt copy comment section to new header buffer copypar copy parameter section to new buffer copyfld copy static or variable field section to new buffer addcline add comment line to specified header buffer addpar add parameter entry to specified header buffer addfld add field description to specified header buffer getcmt extract comment from specified header buffer getpar extract parameter name and value by position from a header buffer seekpar extract parameter value by name from a header buffer getfld extract field characteristics by position from a header buffer seekfld extract field characteristics by name from a header buffer getfmt extract file format from a header buffer elemcnt extract expected element count from a header buffer getbuff allocate memory for a slice buffer getptr compute a pointer for a specified field getptr2 compute a pointer and return field information .TE .LP Table 1. Primitive functions and their use. Functions that deal with fields can be directed either to the static or variable section of the header. Getslice and putslice only work on files in float format. Special handling is required for slices in other formats. .sp 2 .KE .LP .I b. Standard filters .R .PP In the context of UNIX, a filter is a program that reads data from the \fIstandard input\fR, transforms it in some way, and sends the result to the \fIstandard output\fR. Standard input and output are pre-defined ports that normally read from and write to the user's terminal, but may be redirected to a file or another program. The operation of all but the simplest filters is controlled by command line arguments. If errors occur, error messages are sent to \fIstandard error\fR, another pre-defined output port. The standard error writes to the user's terminal, and normally isn't redirected like the standard output. This provides a mechanism for separating data from error messages. When filters are invoked with an incorrect number of arguments, a "usage" statement is printed to standard error, and the filter exits. This provides a simple form of on-line help. Candis filters typically read and write common data format files and record their actions in the comment section of the output file. Translating filters convert foreign data formats to common data format. Programs that read or write more than one data file don't fit into the filter paradigm and must obtain names of desired files from the command line. However, most desired operations on data can indeed be regarded as filters. A naming convention has been adopted, in which all general purpose filters begin with the prefix "cdf". .PP The natural data format for most numerical work is the float format. Most filters therefore only work on common data format files in this format. A filter (cdftrans) is provided to transform files from any format into any other. .PP One of the most commonly needed operations is to determine the contents of a common data format file. Cdflook fulfills this need by displaying the header and selected information about each slice on the standard output. .PP Several filters are available to limit the domain over which data are passed to the output file. Cdfwindow passes only data within requested limits for specified dimension names. Cdftsel passes only variable slices that have values of a specified sequence field within a particular range. Cdfextr passes only those fields specified as command line arguments. Cdfrdim passes only data defined at a particular value of a specified dimension name, and thus reduces the dimensionality of those fields with a dimension of that name. .PP Three programs combine data into bigger chunks. Cdfcat combines all variable slices of a particular file into a single variable slice. A specified sequence field is turned into an index field, and the dimensionality of all other fields is increased, with the new dimension being given the name of the new index field. Cdfcatf merges homogeneous files such as successive radar volumes into a single file. Cdfcatf is not a filter, because it obtains its input from files listed on the command line. Compatibility between files is checked. Cdfmerge combines heterogeneous files, each with only a single variable slice. Static and variable slices from each file are merged into a single file. This allows, for instance, the merging of aircraft and radar data for common display. (Unlike the CCF system mentioned in the Introduction, no conversion to a common Cartesian coordinate system is done by this program.) Collisions between field names are prevented by appending a unique suffix to fields from each input file. .PP Rtape, as its name suggests, reads magnetic tape files onto disk. Since some existing data formats depend on a particular record structure for their decoding, rtape passes on information about the physical record structure on the tape by prepending each record with a byte count for that record. Tape records of arbitrary size can be read, and different record sizes can be mixed within a particular tape file. A number of common data format filters use rtape output. .PP Numerous other filters have been written, but the above examples give the flavor of the Candis system. One filter not mentioned so far is a plotting filter called cdfplot. This is sufficiently complex that it is discussed separately. UNIX-style documentation exists for all of the above filters, and on the primitive functions as well. .LP .I c. Shell scripts .R .PP All operating systems have some form of command interpreter. Many are suitable for constructing complex applications by combining calls to programs in a script or text file which is read by the interpreter. The UNIX command interpreter is called the "shell", and I illustrate its use with an example taken from the analysis of the output of a time-dependent, two-dimensional numerical model. .PP The sample shell script, named "slice", reads as follows: .DS : make arbitrary slice and plot if test $# -lt 4 then echo "Usage: slice testnumber reducevar reduceval cmdlist ..." else if test -f txz$1 then echo -n "" else echo -n "making txz file ... " expand 0 < test_$1 | cdfcat time 0 201 51 > txz$1 echo "done" fi for i in $4 $5 $6 $7 $8 $9 do list="$list $i" done cdfrdim $2 $3 < txz$1 | cdfplot $list $PG fi .DE The function of slice is to make contour and vector plots over two-dimensional sub-spaces of the three-dimensional space of the model, x, z, and time. These sub-spaces include snapshots at a given time and time sections at constant x and constant z. The input to slice consists of a common data format file containing multiple variable slices, one per time level. This file is created by the numerical model, and is assumed to have the name "test_N" where N is the number of the test run. The first function of slice is to create a file named "txzN", in which all variable slices have been combined into one using cdfcat, and in which certain auxiliary fields have been computed by a special purpose program "expand". The vertical bar in the line containing these programs indicates that the output of expand is piped into the input of cdfcat. This sequence is only invoked if the file txzN doesn't already exist, so that this relatively time consuming operation need not be repeated. The sub-space extraction and plot generation is accomplished by the line containing cdfrdim and cdfplot. .PP The symbols "<" and ">" respectively indicate redirection of standard input and output from keyboard and terminal to the indicated files. Character sequences consisting of a dollar sign and a number are dummy variables replaced by the corresponding command line arguments to slice. Thus, $1 refers to the desired test, $2 to the dimension to be held constant, and $3 to the desired value of that dimension. Subsequent dummy variables contain instructions to the plotting routine, cdfplot. These are concatenated into a single string by the looping construct that begins with "for". $PG is a variable that is set previous to the invocation of slice indicating which graphics device should receive the plots. One of the features of slice is that if less than four command line arguments are typed, a usage statement is printed and the shell script exits. .PP Though relatively simple, this shell script illustrates the features needed to make a command processor useful in the context of Candis, namely dummy variable replacement, looping, and branching. It is also useful if shell scripts can invoke other shell scripts. It is evident from the above example that individual filters \fImust not\fR operate in an interactive manner through the user's terminal. All control over filter operation must be via command line arguments. This is necessary to avoid collisions between terminal input and output from different filters. If interactivity is desired, it should be limited to the uppermost level, i. e., the shell script itself. .LP .I d. The portable graphics system .R .PP The filter cdfplot referred to above invokes a locally developed graphics system called @Pgraf@. Candis makes no commitment to any particular graphics system. However, Pgraf (for "portable graphics") provides some useful lessons that are worth describing, even though its capabilities are relatively primitive. .PP Pgraf, as the name implies, was developed to port easily from one graphics device and computer to another. To facilitate this, a main program interfaces to hardware through six simple, low level subroutines. These are easily rewritten for each graphics device, and versions could be made to drive various graphics standards as well. User programs actually invoke subroutines that create a file containing a device-independent graphics metacode. A separate program then reads the metacode file and draws the graphics images on the desired device. Available graphics functions include station plots, line graphs, scatter plots, contour plots with optional hatching for emphasis, and vector plots. .PP The filter cdfplot serves as a general purpose link between Candis and the portable graphics system, allowing arbitrary one and two-dimensional fields to be graphed, contoured, etc. It therefore largely eliminates the need to write special purpose programs to generate plots. A complete description of the operation of cdfplot is beyond the scope of this paper, but examples of its use will be cited in the next section. .PP One important feature of cdfplot is that by default it prints out a copy of the entire comment section of the common data format header adjacent to the actual plot. Since Candis filters record their actions as comments, this provides a complete history of significant operations on the data with every plot. .NH Synthesized radar data .PP The National Center for Atmospheric Research (NCAR) has developed programs to synthesize multiple Doppler radar data and extract three dimensional particle and wind velocities (Mohr et al., 1986). One of the first major uses of the Candis system has been to further analyze output data from these programs. In this section I present our efforts in this area as a hopefully non-trivial example of the use of Candis. .PP Several specialized filters were developed to handle the output of NCAR's programs. The first, called radcedric, simply converts the output of the CEDRIC program into a common data format file. Radcedric requires that NCAR tapes be read onto disk by rtape, which is discussed in the previous section. CEDRIC presents data fields as a sequence of two dimensional arrays at different levels. Radcedric converts these into a single three dimensional field in x, y, and z for each variable at each analysis time. Typically one then has Cartesian components of particle velocities, reflectivities from one or more radars, and possibly vertical air motion obtained from integration of the continuity equation. CEDRIC field names are retained, but converted to lower case. The prefix "rad" indicates programs dealing exclusively with radar data. .PP Radvert recomputes the vertical air motion based on a locally-derived algorithm (Krehbiel, personal communication). The ambient pressure and density fields are computed and stored in the static slice in the course of these computations. .PP Figure 3 shows a contour plot of vertical and horizontal winds for a thunderstorm that occurred over Langmuir Laboratory in central New Mexico. It was created with the following command: .DS cdfrdim z 7 < b16 | cdfplot 6,6,t/u,v,3,3,v/wi,4,1,c/wi,-4,4,1,f ; $PG .DE Radcedric, radvert, and cdfwindow were used separately to create the file "b16". Cdfrdim then extracted a slice through the data at an elevation of 7 km. The result was passed to cdfplot, which made a vector plot of the horizontal wind components, u and v, and 4 m/s contours of the vertical wind, wi. Vertical hatching indicates wi > 4 m/s, while horizontal hatching indicates wi < -4 m/s. Horizontal wind vectors have a cross at their tail indicating the analysis point. Vector components one grid interval in length equal 3 m/s. .PP A filter called cdfocut can substitute for cdfrdim with the result that the sub-space is a vertical plane with arbitrary azimuth and location. Bilinear interpolation is made to the desired plane, and horizontal velocity components in and normal to the plane are also computed. Replacement of cdfrdim by cdfocut in the above script makes possible the examination of data along non-cardinal directions. .PP CEDRIC works with data at one analysis time (actually, the range of times over which all radars complete a single volume scan) and combines it into a single file called a volume. The resulting common data format files thus contain a single variable slice. Certain analyses such as the computation of Lagrangian parcel trajectories require data at different times. Cdfcatf provides a way to combine multiple volumes into a single file. Cdflagr then uses such a file to compute trajectories from specified starting points in space and time. Integrations can proceed both forward and backward in time, and either air parcel or particle trajectories can be computed. In addition to the actual trajectories, cdflagr interpolates and stores the values of all fields along the trajectories. .PP As an example of the use of cdflagr, we compute the trajectories of air parcels reaching z = 10 km at x = 4 km at the time of figure 3. Figure 4 shows air velocities and reflectivity in a vertical plane defined by x = 4 km, while figure 5 shows the projection into this plane of trajectories reaching the above-defined line at time = 40.9 ks (kiloseconds after midnight). Successive diamonds indicate parcel positions at intervals of 100 s. The results clearly show that some parcels originated from near cloud base at 4 km, even though vertical velocities at lower levels are quite small by the analysis time. Figure 5 was created with the following command: .DS cdflagr u v wi time 40.9 -0.1 21 x 4 z 10 < b0 | cdfplot -4,6,x/4,13,y/6,10,t/y,z,1,p/y,z,4,m ; $PG .DE The input file "b0" was created by concatenating several successive volumes with cdfcatf. .PP Cdflagr works by creating a new common data format file consisting of two dimensional fields, the two dimensions being parcel number and time. Three of these fields are called "x", "y", and "z", and in the above example, "y" and "z" are plotted against each other. In addition to these fields, cdflagr creates additional fields with the same names as fields in the input file. These contain the input fields sampled along the trajectories of the associated parcels. .PP One of the recognized problems in meteorological data analysis is combining data of disparate types. Mohr, et al.'s (1986) CCF system solves this problem by interpolating everything to a Cartesian grid. The Candis system offers the alternative of combining heterogeneous data without modification in the same common data format file using the cdfmerge utility mentioned above. Cdfplot can then be used to present these data sets on the same plot. Figure 6 shows an example of this procedure, wherein the trajectories of figure 5 are overlayed on the reflectivity field shown in figure 4. This was done by diverting the output cdflagr into a temporary file "lagr1" and then merging this file with the radar volume of interest, "b16": .DS cdfmerge b16 "" lagr1 ".l" > b16.l .DE The output file was given the name "b16.l", and all fields from the parcel trajectory file were given the suffix ".l". Fields from b16 were given a null suffix. Figure 6 was then made using the command .DS cdfrdim x 4 < b16.l | cdfplot 6,10,t/zcp4,10,3,c/zcp4,-30,30,3,f/y.l,z.l,1,p/y.l,z.l,4,m ; $PG .DE The merger process in this case is only graphical, whereas the CCF system is capable of, say, combining different sources of wind information on a common grid using an objective analysis scheme. The latter approach is also possible using the Candis system, but sometimes graphics overlays are all that is required. .NH Summary and discussion .PP The key feature of Candis is use of the C programming language for its implementation. Though this a potential barrier to use by programmers experienced only in Fortran, the ability to dynamically allocate memory and to manipulate pointers provide substantial advantages in the construction of general purpose analysis programs. The general purpose nature of Candis filters derives directly from these characteristics. .PP Another important feature of Candis is that applications can be built up in a heirarchical fashion using a mix of old and new filters in conjunction with a shell script. The key to this capability is the restriction of programs to strict serial access of data in the standard format. The availability of large virtual memory computers alleviates this restriction in cases where random access is needed; the entire file is simply read into memory. .PP The final feature that enhances the utility of Candis is the flexibility of the common data format. All data are represented on grids of from zero to four dimensions, with data defined over different spaces being kept separate by judicious use of index fields. Unlike CCF or the CDF system of Treinish and Gough (1987), heterogeneous data can easily be stored in the same file. The advantage is that different types of data can be made available to a common analysis program in a structured fashion without having to access more than one file. .PP One important consequence of the structure of the common data format is that variable length data are difficult to store efficiently. By allowing successive variable slices to be variable in length, this goal could be accomplished. However, the added complexity was not judged to be worth the gain in generality. One way variable length data, e. g., significant level sounding data from a collection of stations, could be handled is to allocate a variable slice of maximum plausible size and define all unused storage as bad. This approach does, of course, make files larger than they strictly need to be. .PP Efficiency of operation, narrowly defined in terms of minimizing demands on processor time and disk storage and access, was \fInot\fR a goal of this project. The point was to define a system that minimized the effort required to develop new applications. In light of the ever-increasing costs of software and the ever-decreasing costs of hardware, this seems like a reasonable point of view. In spite of this, response time in data-intensive applications such as Doppler radar is tolerable on modern workstations. There will, of course, be applications in which absolute maximum efficiency must be extracted from the hardware. Candis may not be suitable for such applications. .PP One area in which the Candis system may be useful is in the direct @generation@ of data sets in the field. The format is simple enough that very little overhead is imposed on the data collection system, and the ability to write an indefinite number of variable slices without knowing how many there will be beforehand is essential to field-generated data. Since it is easier to \fIcreate\fR common data format files than it is to interpret them, the real time system need not be restricted to use of the C language or the UNIX operating system. .PP Candis software will be made available to other users through the University Corporation for Atmospheric Research's UNIDATA project. .PP \fIAcknowledgments.\fR The critical comments of Bill Winn resulted in clearer and more useful constructs at all stages of this project. The comments of anonymous reviewers were also appreciated. Student programmers Sarah Bottomley, Dale Harris, Robert Solomon, and Dinh Ton That made significant contributions. This work was supported by National Science Foundation grants ATM-8311017, ATM-8611364, and ATM-8605136. .EQ REFERENCES .EN .XP Barnes, S. L., 1980: Report on a meeting to establish a common Doppler radar exchange format. \fIBull. Amer. Meteor. Soc.,\fB 61\fR, 1401-1404. .XP Kernighan, B. W., and D. M. Ritchie, 1978: The C programming language. Prentice-Hall, Inc., Englewood Cliffs, NJ, 228 pp. .XP McPherron, R. L., 1976: A self-documenting source-independent data format for computer processing of tensor time series. \fIPhys. Earth Planet. Inter.,\fB 12\fR, 103-111. .XP Mohr, C. G., L. J. Miller, R. L. Vaughan, and H. W. Frank, 1986: The merger of mesoscale datasets into a common Cartesian format for efficient and systematic analyses. \fIJ. Atmos. Oceanic Tech., \fB3\fR, 143-161. .XP Smith, A. Q., and C. R. Clauer, 1986: A versatile source-independent system for digital data management. \fIEOS, Trans. AGU, \fB67\fR, 188. .XP Treinish, L. A., and M. L. Gough, 1987: A software package for the data-independent management of multidimensional data. \fIEOS, Trans. AGU, \fB68\fR, 633-635. .EQ FIGURE~CAPTIONS .EN .XP Figure 1. Schematic layout of a common data format file. Variable fields have the same structure as the static field. .XP Figure 2. Example of a common data format header. .XP Figure 3. Example of a plot produced by the filter cdfplot. Radar-derived horizontal and vertical winds are shown in a horizontal plane at 7 km. A vector equal to a grid interval in length represents 3 m/s in horizontal wind. The cross defines the analysis point and indicates the tail of the vector. Vertical wind contours are at 4 m/s intervals, with horizontal hatching indicating vertical winds less than -4 m/s and vertical hatching for vertical winds greater than +4 m/s. .XP Figure 4. Vertical section at x = 4 km at the same time as figure 3. Wind components in the y-z plane are shown. Vector components equal to a horizontal grid interval represent 3 m/s, while the corresponding vertical scaling is 6 m/s. The difference reflects the different grid intervals in the vertical (500 m) and horizontal (250 m). Radar reflectivities exceeding 30 dBZ are indicated by vertical hatching. .XP Figure 5. Projection into the y-z plane of trajectories of parcels reaching the line x = 4 km, z = 10 km at the time of figures 3 and 4. Integration is backward in time for a maximum of 2000 s. Parcel positions at 100 s intervals are denoted by the diamonds. Note that some parcels originated from as low as 4 km. .XP Figure 6. Regions of high radar reflectivity as in figure 4, with the trajectories of figure 5 overlayed. Reflectivity is contoured at 10 dBZ intervals, and regions with reflectivity exceeding 30 dBZ are hatched.