--------------------------------------------------------------------------
***  For installation instructions, skip to Section 1.
--------------------------------------------------------------------------
  NIST SPARSE BLAS v. 0.9 (Sat Jul 6 14:27:21 EDT 1996)        
--------------------------------------------------------------------------
                                                        
  Authors:                                              
     Karin A. Remington and Roldan Pozo                 
     National Institute of Standards and Technology     
                                                        
  Based on the interface standard proposed in:          
   "A Revised Proposal for a Sparse BLAS Toolkit" by    
    S. Carney and K. Wu -- University of Minnesota      
    M. Heroux and G. Li -- Cray Research                
    R. Pozo and K.A. Remington -- NIST                  
                                                        
  Contact:                                              
     Karin A. Remington, email: kremington@nist.gov     
--------------------------------------------------------------------------
     
Contents:

0.  Release Notes
1.  Installation Instructions
2.  Toolkit Interface
3.  Lite Interface
4.  Source Code Generation

-----------------------------
Section 0.   Release Notes
-----------------------------

What's included:

    The package includes support for the "BASIC" Toolkit, including
  matrix-multiply and triangular solve routines for the following
  sparse matrix formats:
  
            csr -  compressed sparse row
            csc -  compressed sparse column 
            coo -  coordinate 
            bsr -  block sparse row
            bsc -  block sparse column 
            bco -  block coordinate 
            vbr -  variable block row
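
    To illustrate how two of the point-entry formats relate, the sketch
  below converts a matrix from coordinate (coo) storage to compressed
  sparse row (csr) storage.  It is plain C written for this README and
  does not call the library.  The function name coo_to_csr, the argument
  names, and the use of 0-based indices are assumptions made for the
  example and are not the Toolkit's calling sequence.

```c
#include <assert.h>

/*
 * Illustrative sketch only (not part of the library).
 * Convert nnz coordinate entries (row[k], col[k], val[k]) of an m-row
 * matrix into CSR arrays: rowptr (length m+1), colind and csrval
 * (length nnz).  Indices are 0-based (an assumption of this example).
 * Entries within a row keep the order they had in the COO input.
 */
void coo_to_csr(int m, int nnz,
                const int *row, const int *col, const double *val,
                int *rowptr, int *colind, double *csrval)
{
    int i, k;

    assert(m >= 0 && nnz >= 0);

    /* Count the entries in each row; rowptr[i+1] holds the count of row i. */
    for (i = 0; i <= m; i++) rowptr[i] = 0;
    for (k = 0; k < nnz; k++) rowptr[row[k] + 1]++;

    /* Prefix sum: rowptr[i] becomes the position where row i starts. */
    for (i = 0; i < m; i++) rowptr[i + 1] += rowptr[i];

    /* Scatter the entries, using rowptr[i] as the next free slot of row i. */
    for (k = 0; k < nnz; k++) {
        int dest = rowptr[row[k]]++;
        colind[dest] = col[k];
        csrval[dest] = val[k];
    }

    /* rowptr[i] now holds the start of row i+1; shift back by one row. */
    for (i = m; i > 0; i--) rowptr[i] = rowptr[i - 1];
    rowptr[0] = 0;
}
```

    The csc format can be obtained the same way by exchanging the roles
  of the row and column index arrays.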
          
What's  * NOT *  included:

    The following is **NOT** included in this release:  
      --  support for triangular solves for the block coordinate (bco) scheme
      --  support for non-contiguous block storage in the block formats

What's required:

    Minimum:  ANSI C compiler
              12 MB Free disk space
    Optional: Fortran compiler (for testing Fortran interfaces)
              AWK and SED (for re-generating kernel source code)
              
Testing:

    The testing directories contain both matrix-multiply and triangular
  solve testers for each supported storage scheme.  C and Fortran
  testers are both included, and can be used as examples for library usage. 

    This distribution has been tested under the following OS/compiler 
  configurations:

         sunos4.1.4:   gcc 2.7.0, gcc 2.7.2 and  acc 3.0.1
         sunsolaris2.4:gcc 2.7.0  (no RANLIB, see makefile.def)
         AIX.1.1:      xlc
         sgi-irix5.3:  gcc 2.7.0

Bug reports:

  Please send bug reports to kremington@nist.gov.

---------------------------------------
Section 1.   Installation Instructions
---------------------------------------

   The installation of the Sparse BLAS Toolkit is automated with
the "make" utility.  To use "make" to build the library:

   1.  Edit the file ./makefile.def to reflect your system setup:

         - The minimum installation requires an ANSI C compiler.  
         - An extended installation which includes Fortran 
           callable routines and testers is available.  
           If the presence of a Fortran compiler is indicated in 
           the makefile.def file, the extended version will be installed.
         - The archival process by default uses "ranlib". If this
           is not available on your system, set HASRANLIB to 'f'.

   2.  Type: 

       "make install" (**) to build the library AND make and run 
                           the C and Fortran testers 
       "make installc"     to build the library AND make and run the C testers 
       "make library"      to build the archive file ./lib/libsptk.a
                           (tests are not built)
       "make testc"        to build and run the C testers 
                           (library must be pre-built)
       "make testf77" (**) to build and run the Fortran testers
                           (library must be pre-built)

           (**) requires a Fortran compiler

   3.  For space-saving cleanup, type "make clean" to remove all .o files.

--------------------------------
Section 2.   Toolkit Interface
--------------------------------
    
    The Toolkit interface and the decision trees for calling
    the proper kernel routine for a given set of input values
    are implemented in the files  
            ./src_tkc/_xxxmm_c.c and _xxxsm_c.c       (C bindings)
            ./src_tkf/_xxxmm_f.c and _xxxsm_f.c    (Fortran bindings)
    where:
         xxx      is the matrix storage format (csr, csc, coo, etc.)
          mm      indicates matrix multiply routine
          sm      indicates triangular solve routine
         
    
    **********************************************************************
  *  For a complete description of the Sparse BLAS Toolkit interface,      *
 *    see: "A Revised Proposal for a Sparse BLAS Toolkit", an article by    *
 *    S. Carney, M. Heroux, G. Li, R. Pozo, K. Remington and K. Wu.         *
  *   http://www.cray.com/PUBLIC/APPS/SERVICES/ALGORITHMS/spblastk.ps      *
    **********************************************************************
    

---------------------------------------
Section 3.   Lite Interface
---------------------------------------
    
    FILE STRUCTURE:
    
        The FILE structure for the internal lightweight routines of the 
    NIST Sparse BLAS keys filenames to storage format and computation type.  
    The filenames follow these two templates:
      
         multiply:          _xxxyml.c    
         triangular solve:  _xxxytsl.c
    
    where:
         xxx      is the matrix storage format (csr, csc, coo, etc.)
          y       v - single column result   ( n = 1 )
                  m - multiple column result ( n > 1 )
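
    As a small illustration of the two templates, the sketch below
    assembles a filename from its parts.  The helper lite_filename is
    hypothetical and not part of the distribution.  It also assumes that
    the leading "_" stands for a one-letter precision prefix, such as the
    "d" seen in the file list of this section.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the library): build a "lite" source
 * filename from the templates _xxxyml.c and _xxxytsl.c.
 *   type      one-letter prefix replacing "_" (assumed: precision, e.g. 'd')
 *   fmt       three-letter storage format ("csr", "csc", "coo", ...)
 *   y         'v' for single column result, 'm' for multiple columns
 *   is_solve  nonzero for triangular solve, zero for multiply
 * Returns 0 on success, -1 on bad arguments or if buf is too small.
 */
int lite_filename(char *buf, size_t buflen, char type,
                  const char *fmt, char y, int is_solve)
{
    int n;

    assert(buf != NULL && fmt != NULL);
    if (strlen(fmt) != 3 || (y != 'v' && y != 'm'))
        return -1;

    n = snprintf(buf, buflen, "%c%s%c%s.c",
                 type, fmt, y, is_solve ? "tsl" : "ml");
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

    For example, type 'd', format "csr", y = 'v' and a multiply operation
    give "dcsrvml.c", which appears in the file list of this section.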
    
    
    ROUTINES:
    
      The routines in the NIST Sparse BLAS library follow a naming
      convention which encodes specific kernels drawn from the generic
      routine.

      The source for the library is divided into separate files for 
      each storage format and matrix or vector computation combination. 
      The following files are used in this distribution:

dbcomml.c	dbscvtsl.c	dcoomml.c	dcscvtsl.c	dutil.c
dbcovml.c	dbsrmml.c	dcoovml.c	dcsrmml.c	dvbrmml.c
dbscmml.c	dbsrmts.c	dcscmml.c	dcsrmts.c	dvbrmts.c
dbscmts.c	dbsrmtsl.c	dcscmts.c	dcsrmtsl.c	dvbrmtsl.c
dbscmtsl.c	dbsrvml.c	dcscmtsl.c	dcsrvml.c	dvbrvml.c
dbscvml.c	dbsrvts.c	dcscvml.c	dcsrvts.c	dvbrvts.c
dbscvts.c	dbsrvtsl.c	dcscvts.c	dcsrvtsl.c	dvbrvtsl.c


     VECTOR/MATRIX MULTIPLY ROUTINES:

      Each MULTIPLY file contains all of the "lite" kernel routines,
      either vector or matrix, for the following 6 kernels.  (dxxxvml.c
      contains the vector routines; dxxxmml.c contains the matrix,
      or multiple right-hand-side, routines.)

               CAB      =   C <- A*B 
               CABC     =   C <- A*B + C
               CaAB     =   C <- alpha*A*B 
               CaABC    =   C <- alpha*A*B + C
               CABbC    =   C <- A*B + beta*C
               CaABbC   =   C <- alpha*A*B + beta*C
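
      To make the relationship between the general kernel and its
      specializations concrete, here is a minimal sketch of the vector
      case for csr storage.  The function names, argument order and
      0-based indexing are assumptions for illustration; they are not
      the library's routines or calling sequences.

```c
#include <assert.h>

/*
 * Illustrative sketch only (not the library's code).
 * CaABbC, vector case:  c <- alpha*A*b + beta*c
 * A has m rows and is stored in CSR form (val, colind, rowptr), 0-based.
 */
void sketch_csr_vecmult_CaABbC(int m, const double *val, const int *colind,
                               const int *rowptr, double alpha,
                               const double *b, double beta, double *c)
{
    int i, k;

    assert(m >= 0);
    for (i = 0; i < m; i++) {
        double t = 0.0;
        for (k = rowptr[i]; k < rowptr[i + 1]; k++)
            t += val[k] * b[colind[k]];
        c[i] = alpha * t + beta * c[i];
    }
}

/*
 * CAB, vector case:  c <- A*b
 * Same loop with the alpha and beta terms removed, so c is written
 * but never read.
 */
void sketch_csr_vecmult_CAB(int m, const double *val, const int *colind,
                            const int *rowptr, const double *b, double *c)
{
    int i, k;

    assert(m >= 0);
    for (i = 0; i < m; i++) {
        double t = 0.0;
        for (k = rowptr[i]; k < rowptr[i + 1]; k++)
            t += val[k] * b[colind[k]];
        c[i] = t;
    }
}
```

      The specialized version saves two multiplications per row and does
      not depend on the incoming contents of c, which is one reason to
      provide it separately rather than call the general kernel with
      alpha = 1 and beta = 0.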

      In the cases where storage formats do not allow directly calling
      an alternate kernel for performing the transpose multiplication
      (all except CSR and CSC), the following kernels are also included:

               CATB      =   C <- A'*B 
               CATBC     =   C <- A'*B + C
               CaATB     =   C <- alpha*A'*B 
               CaATBC    =   C <- alpha*A'*B + C
               CATBbC    =   C <- A'*B + beta*C
               CaATBbC   =   C <- alpha*A'*B + beta*C
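
      CSR and CSC are excepted because the csr arrays of A, read as csc
      arrays, describe A'.  The sketch below shows the idea for the CATB
      vector kernel: a csc multiply is called on the csr data with the
      row and column counts exchanged.  All names here are hypothetical,
      and 0-based indexing is assumed.

```c
#include <assert.h>

/*
 * Illustrative sketch only (not the library's code).
 * c <- A*b for a matrix with nrows rows and ncols columns stored in
 * CSC form (val, rowind, colptr), 0-based.
 */
void sketch_csc_vecmult_CAB(int nrows, int ncols, const double *val,
                            const int *rowind, const int *colptr,
                            const double *b, double *c)
{
    int i, j, k;

    assert(nrows >= 0 && ncols >= 0);
    for (i = 0; i < nrows; i++)
        c[i] = 0.0;
    for (j = 0; j < ncols; j++)
        for (k = colptr[j]; k < colptr[j + 1]; k++)
            c[rowind[k]] += val[k] * b[j];
}

/*
 * c <- A'*b for an m-by-k matrix A stored in CSR form.
 * The CSR arrays of A are exactly the CSC arrays of the k-by-m
 * matrix A', so no separate transpose kernel is needed.
 */
void sketch_csr_vecmult_CATB(int m, int k, const double *val,
                             const int *colind, const int *rowptr,
                             const double *b, double *c)
{
    sketch_csc_vecmult_CAB(k, m, val, colind, rowptr, b, c);
}
```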

      For each of these kernels, there is a basic vector/matrix multiply,
      and a skew symmetric vector/matrix multiply:

       void XXX_<Vec|Mat>Mult_<kernel>_TYPE
       void XXXskew_<Vec|Mat>Mult_<kernel>_TYPE

      For the non-transpose kernels, there is also a symmetric vector/matrix
      multiply routine:

       void XXXsymm_<Vec|Mat>Mult_<kernel>_TYPE 

      Calling sequences for these routines are similar to the Toolkit
      interface, but with meaningless arguments for each special case
      eliminated.  See the User's Guide or the include header files for
      specific calling sequences.

                   
     VECTOR/MATRIX TRIANGULAR SOLVE ROUTINES:

      Each TRIANGULAR SOLVE file contains all of the "lite" kernel routines,
      either vector or matrix, for the following 24 kernels.  (dxxxvtsl.c
      contains the vector routines; dxxxmtsl.c contains the matrix,
      or multiple right-hand-side, routines.)

               CAB      =   C <- A*B 
               CaAB     =   C <- alpha*A*B 
               CABC     =   C <- A*B + C
               CaABC    =   C <- alpha*A*B + C
               CABbC    =   C <- A*B + beta*C
               CaABbC   =   C <- alpha*A*B + beta*C

               CDAB     =   C <- DL*A*B 
               CaDAB    =   C <- alpha*DL*A*B 
               CDABC    =   C <- DL*A*B + C
               CaDABC   =   C <- alpha*DL*A*B + C
               CDABbC   =   C <- DL*A*B + beta*C
               CaDABbC  =   C <- alpha*DL*A*B + beta*C

               CADB     =   C <- A*DR*B 
               CaADB    =   C <- alpha*A*DR*B 
               CADBC    =   C <- A*DR*B + C
               CaADBC   =   C <- alpha*A*DR*B + C
               CADBbC   =   C <- A*DR*B + beta*C
               CaADBbC  =   C <- alpha*A*DR*B + beta*C

               CDADB    =   C <- DL*A*DR*B 
               CaDADB   =   C <- alpha*DL*A*DR*B 
               CDADBC   =   C <- DL*A*DR*B + C
               CaDADBC  =   C <- alpha*DL*A*DR*B + C
               CDADBbC  =   C <- DL*A*DR*B + beta*C
               CaDADBbC =   C <- alpha*DL*A*DR*B + beta*C
       
      In the cases where storage formats do not allow directly calling
      an alternate kernel for performing the transpose multiplication
      (all except CSR and CSC), transpose kernels are also included.

      For each of these kernels, there are two unit-diagonal triangular 
      solve routines, and for point-entry formats there are also two
      non-unit-diagonal triangular solve routines.

       XXX_<Vec|Mat>TriangSlvUU_<kernel>_TYPE (Upper triangular, Unit diag.)
       XXX_<Vec|Mat>TriangSlvLU_<kernel>_TYPE (Lower triangular, Unit diag.)
       XXX_<Vec|Mat>TriangSlvUD_<kernel>_TYPE (Upper triangular, non-unit Diag.)
       XXX_<Vec|Mat>TriangSlvLD_<kernel>_TYPE (Lower triangular, non-unit Diag.)
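
      The difference between the unit and non-unit diagonal variants can
      be sketched with forward substitution on a lower triangular csr
      matrix.  This is an illustration only: the function names are
      hypothetical, indices are 0-based, and the storage choice (strictly
      lower part in csr, diagonal in a separate array) is an assumption
      of the example, not a statement about the library's layout.

```c
#include <assert.h>

/*
 * Illustrative sketch only (not the library's code).
 * Solve T*c = b where T is m-by-m lower triangular with an implicit
 * unit diagonal.  Only the strictly lower part of T is stored, in CSR
 * form (val, colind, rowptr), 0-based.
 */
void sketch_csr_vec_triangslv_LU(int m, const double *val, const int *colind,
                                 const int *rowptr, const double *b, double *c)
{
    int i, k;

    assert(m >= 0);
    for (i = 0; i < m; i++) {
        double t = b[i];
        /* Every stored column index in row i is less than i,
           so c[colind[k]] has already been computed. */
        for (k = rowptr[i]; k < rowptr[i + 1]; k++)
            t -= val[k] * c[colind[k]];
        c[i] = t;               /* diagonal entry is 1: no division */
    }
}

/*
 * Same solve with a non-unit diagonal supplied in diag[0..m-1]
 * (assumed nonzero).
 */
void sketch_csr_vec_triangslv_LD(int m, const double *val, const int *colind,
                                 const int *rowptr, const double *diag,
                                 const double *b, double *c)
{
    int i, k;

    assert(m >= 0);
    for (i = 0; i < m; i++) {
        double t = b[i];
        for (k = rowptr[i]; k < rowptr[i + 1]; k++)
            t -= val[k] * c[colind[k]];
        c[i] = t / diag[i];
    }
}
```

      The upper triangular variants run the same loop from the last row
      to the first.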

      Calling sequences for these routines are similar to the Toolkit
      interface, but with meaningless arguments for each special case
      eliminated.  See the User's Guide or the include header files for
      specific calling sequences.

    
--------------------------------------------------------------------------
                    
-----------------------------------
Section 4.   Source code generation
-----------------------------------

   The SRC_GEN directory contains generic source files:

bcomm.c         bsrmm.c         cscmm.c         csrmts.c
bscmm.c         bsrmts.c        cscmts.c        vbrmm.c
bscmts.c        coomm.c         csrmm.c         vbrmts.c

along with generator scripts for creating the NIST Sparse BLAS kernel 
routines from these generic source files.

   These source files are used as "master files", and are written in such
a way that special case routines can be generated by relatively simple
shell scripts which use "sed" and "awk" for text replacement.
The approach saves considerable programming effort by generating most
source files automatically, and reduces errors by ensuring that
any changes are propagated throughout all of the related source code.

   The master files provide working source code for the most general
version of the kernel routine.  This is where real programming effort
should be expended to optimize the library.  The code is commented
with tags which can be used to selectively delete code for special
case routines.   The "rules" for creating each special case file 
are defined in the SRC_GEN/kernels subdirectory.  The kernels subdirectory
contains the files

CAB		CADBbC		CDADBC		CaADB		CaDABbC
CABC		CDAB		CDADBbC		CaADBC		CaDADB
CABbC		CDABC		CaAB		CaADBbC		CaDADBC
CADB		CDABbC		CaABC		CaDAB		CaDADBbC
CADBC		CDADB		CaABbC		CaDABC

one representing each of the specializations from the generic master
code, along with kernel files for the master codes.  Each of these
kernel files contains pointers to appropriate "Definition" files,
in the directory SRC_GEN/Defs, which are used to build up the 
sed script for the text replacement to generate the kernel routines.
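
   The idea of deleting tagged code to obtain a special case can be
pictured with the small C sketch below.  It is not the generator itself,
which uses sed and awk scripts.  The function strip_tagged_lines and the
tag name used in the usage note are invented for this illustration, and
the real tag syntax in the master files may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative sketch only (the real generator uses sed/awk scripts).
 * Copy src to dst, dropping every line that contains the string tag.
 * Returns 0 on success, -1 if dst (of size dstlen) is too small.
 */
int strip_tagged_lines(const char *src, const char *tag,
                       char *dst, size_t dstlen)
{
    size_t out = 0;
    size_t taglen;

    assert(src != NULL && tag != NULL && dst != NULL);
    if (dstlen == 0)
        return -1;
    taglen = strlen(tag);

    while (*src != '\0') {
        const char *eol = strchr(src, '\n');
        /* Length of the current line, including its newline if present. */
        size_t len = eol ? (size_t)(eol - src) + 1 : strlen(src);
        size_t i;
        int tagged = 0;

        /* Look for the tag inside this line only. */
        for (i = 0; taglen > 0 && i + taglen <= len; i++) {
            if (memcmp(src + i, tag, taglen) == 0) {
                tagged = 1;
                break;
            }
        }

        if (!tagged) {
            if (out + len >= dstlen)
                return -1;          /* no room for line plus terminator */
            memcpy(dst + out, src, len);
            out += len;
        }
        src += len;
    }
    dst[out] = '\0';
    return 0;
}
```

   With a hypothetical tag such as TAG_ALPHA marking the line that scales
by alpha, removing the tagged lines from the general code yields the
source for a kernel without the alpha term.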

   For typical use, these kernel and definition files would never have 
to be touched.  Many modifications (say for optimization) can be made 
to the master source files without requiring any change whatsoever 
to the file generation mechanism.  The only source code changes which
would affect code generation would be those which alter the 
relationship between the comment tags and the related source.
A more detailed explanation of the mechanism, and requirements
for modifications, will be forthcoming in the 1.0 release.

   After making any necessary changes to these "master" source files, 
the library source files may be generated via the "create" script
(automated in the "make" process in this directory with "make install" 
or "make re-install").

                          ** IMPORTANT NOTE **

   Any changes to source for any routines below the Toolkit interface 
layer ** MUST ** be made in the ../SRC_GEN directory to be retained and 
propagated to all appropriate kernel routines. 

   Changes to the Toolkit interface routines, however, should be made 
directly in the directory ../src_tk[c|f].

                          ** IMPORTANT NOTE **


--------------------------------------------------------------------------