Last update: Fri Nov 8 06:30:22 MST 2024
Volume 1, Number 1, August, 1984
D. J. Evans Parallel SOR iterative methods . . . . . 3--18 W. Gentzsch Numerical algorithms in computational fluid dynamics on vector computers . . . 19--33 M. J. Kascic, Jr. Vorton dynamics: a case study of developing a fluid dynamics model for a vector processor . . . . . . . . . . . . 35--44 P. N. Swarztrauber FFT algorithms for vector computers . . 45--63 D. Parkinson and M. Wunderlich A compact algorithm for Gaussian elimination over GF(2) implemented on highly parallel computers . . . . . . . 65--73 W. Ronsch Stability aspects in using parallel algorithms . . . . . . . . . . . . . . . 75--98 F. J. Peters Parallel pivoting algorithms for sparse symmetric matrices . . . . . . . . . . . 99--110
C. C. Hsiung and W. Butscher A numerical seismic $3$-D migration model for vector multiprocessors . . . . 113--120 M. Kratz Vectorized finite-element stiffness generation: tuning the Noor-Lambiotte algorithm . . . . . . . . . . . . . . . 121--132 J. J. Dongarra and R. E. Hiromoto A collection of parallel linear equations routines for the Denelcor HEP 133--142 D. C. Sorensen Buffering for vector performance on a pipelined MIMD machine . . . . . . . . . 143--164 M. Bishop The Ultracomputer as a vehicle for polymer simulations . . . . . . . . . . 165--174 P. Frederickson and R. Hiromoto and T. L. Jordan and B. Smith and T. Warnock Pseudo-random trees in Monte Carlo . . . 175--180 J. Tappe The minimal average latency of multiconfigurable pipelines . . . . . . 181--183 J. Tappe Algorithms for pipeline control . . . . 185--188
Robert E. Hiromoto and Olaf M. Lubeck and James Moore Experiences with the Denelcor HEP . . . 197--206 Nisheeth R. Patel and Harry F. Jordan A parallelized point rowwise successive over-relaxation method on a multiprocessor . . . . . . . . . . . . . 207--222 Jack J. Dongarra and Ahmed H. Sameh On some parallel banded system solvers 223--235 T. Axelrod and P. Dubois and P. Eltgroth A simulator for MIMD performance prediction: application to the S-1 MkIIa multiprocessor . . . . . . . . . . . . . 237--274 Shao-Wen Mai and D. J. Evans A parallel algorithm for the enumeration of the spanning trees of a graph . . . . 275--286 Celso Ribeiro Performance evaluation of vector implementations of combinatorial algorithms . . . . . . . . . . . . . . . 287--294 F. W. Bobrowicz and J. E. Lynch and K. J. Fisher and J. E. Tabor Vectorized Monte Carlo photon transport 295--305 B. L. Buzbee and H. J. Raveché Conference on forefronts of large-scale computational problems . . . . . . . . . 307--315 Ad Emmen International supercomputer applications symposium . . . . . . . . . . . . . . . 317--319 Iain S. Duff Supercomputers in Europe . . . . . . . . 321--324 Marian Vajtersic Parallel marching Poisson solvers . . . 325--330 Alberto Pettorossi and Andrzej Skowron Higher-order communications for concurrent programming . . . . . . . . . 331--336 Ondrej Sýkora VLSI systems for some problems of computational geometry . . . . . . . . . 337--342 Anonymous Calendar . . . . . . . . . . . . . . . . 343--344 Anonymous Author index to volume 1 (1984) . . . . 345--346
Roger W. Hockney $(r_\infty,\,n_{1/2},\,s_{1/2})$ measurements on the 2-CPU CRAY X-MP . . 1--14 W. Handler Dynamic computer structures for manifold utilization . . . . . . . . . . . . . . 15--32 U. Meier A parallel partition method for solving banded systems of linear equations . . . 33--43 Daniel A. Reed and Merrell L. Patrick Parallel, iterative solution of sparse linear systems: models and architectures 45--67 J. J. Modi and J. S. Rollett An algorithm for inverse square-roots 69--71 Nikola K. Kasabov A method for SIMD/MIMD functionally reconfigurable multimicroprocessor systems design and parallel data exchange algorithms . . . . . . . . . . 73--78
Hiroshi Tamura and Sachio Kamiya and Takahiro Ishigai FACOM VP-100/200: supercomputers with ease of use . . . . . . . . . . . . . . 87--107 D. A. Calahan Task granularity studies on a many-processor CRAY X-MP . . . . . . . . 109--118 R. W. Hockney MIMD computing in the U.S.A.---1984 . . 119--136 J. A. Clausing and R. Hagstrom and E. L. Lusk and R. A. Overbeek A technique for achieving portability among multiprocessors: Implementation on the Lemur . . . . . . . . . . . . . . . 137--162 N. C. Kalra and P. C. P. Bhatt Parallel algorithms for tree traversals 163--171 Wilhelm Oberaigner Parallel algorithms for rounding exact evaluation of sums of products . . . . . 173--182
G. S. Almasi Overview of parallel processing . . . . 191--203 Garry Rodrigue Inner/outer iterative methods and numerical Schwarz algorithms . . . . . . 205--218 R. Ohbuchi Overview of parallel processing research in Japan . . . . . . . . . . . . . . . . 219--228 C. Ghezzi Concurrency in programming languages: a survey . . . . . . . . . . . . . . . . . 229--241 P. M. Kogge Function-based computing and parallelism: a review . . . . . . . . . 243--253 Paul O. Frederickson and Rondall E. Jones and Brian T. Smith Synchronization and control of parallel algorithms . . . . . . . . . . . . . . . 255--264 D. D. Gajski and J. K. Peir Comparison of five multiprocessor systems . . . . . . . . . . . . . . . . 265--282 S. E. Fahlman Parallel processing in artificial intelligence . . . . . . . . . . . . . . 283--286 P. C. Treleaven Control-driven, data-driven and demand-driven computer architecture . . 287--288
Arthur Rizzi Vector coding the finite-volume procedure for the CYBER 205 . . . . . . 295--312 D. J. Evans and S. Mai Two parallel algorithms for the convex hull problem in a two dimensional space 313--326 F. Seutter CEPROL: a cellular programming language 327--333 J. Staunstrup and J. O. Jespersen and O. V. Johansen Physical datarepresentation in a multiprocessor database machine . . . . 335--343 S. A. Williams The transformation of collections of communicating sequential processes that represent pipeline configurations . . . 345--351 S. Kutti Taxonomy of parallel processing and definitions . . . . . . . . . . . . . . 353--359
J. C. Browne Framework for formulation and analysis of parallel computation structures . . . 1--9 S. G. Akl and H. Schmeck Systolic sorting in a sequential input/output environment . . . . . . . . 11--23 J. J. Dongarra and A. H. Sameh and D. C. Sorensen Implementation of some concurrent algorithms for matrix factorization . . 25--34 M. K. Seager Parallelizing conjugate gradient for the Cray X-MP . . . . . . . . . . . . . . . 35--47 H. A. van der Vorst The performance of FORTRAN implementations for preconditioned conjugate gradients on vector computers 49--58 M. Sonnenschein An extension of the language C for concurrent programming . . . . . . . . . 59--71 O. Axelsson and V. Eijkhout A note on the vectorization of scalar recursions . . . . . . . . . . . . . . . 73--83 D. J. Evans and N. Y. Yousif The parallel neighbor sort and $2$-way merge algorithm . . . . . . . . . . . . 85--90
Harry F. Jordan Structuring parallel algorithms in an MIMD, shared memory environment . . . . 93--110 Robert Hiromoto Some issues in parallel processing as encountered on the Denelcor HEP . . . . 111--127 Tim S. Axelrod Effects of synchronization barriers on multiprocessor performance . . . . . . . 129--140 M. Goldapp Fast scan-line conversion using vectorisation . . . . . . . . . . . . . 141--152 Daniel Boley Solving the generalized eigenvalue problem on a synchronous linear processor array . . . . . . . . . . . . 153--166 A. Brass and G. S. Pawley Two- and three-dimensional FFTs on highly parallel computers . . . . . . . 167--184
B. L. Buzbee A strategy for vectorization . . . . . . 187--192 Iain S. Duff Parallel implementation of multifrontal schemes . . . . . . . . . . . . . . . . 193--204 U. Meier Two parallel SOR variants of the Schwarz alternating procedure . . . . . . . . . 205--215 C. B. Yang and R. C. T. Lee The mapping of $2$-D array processors to $1$-D array processors . . . . . . . . . 217--229 John P. Shen and John P. Hayes and Luigi Ciminiera and Angelo Serra Fault-tolerance and performance analysis of beta-networks . . . . . . . . . . . . 231--249 E. Katona A lattice model for cellular (systolic) algorithms . . . . . . . . . . . . . . . 251--258 V. Faber and Olaf M. Lubeck and Andrew B. White, Jr. Superlinear speedup of an efficient sequential algorithm is not possible . . 259--260 D. Parkinson Parallel efficiency can be greater than unity . . . . . . . . . . . . . . . . . 261--262 J. J. Modi and J. S. Rollett Some problems of exploiting a pipeline processor . . . . . . . . . . . . . . . 263--265
H.-C. Hoppe and H. Mühlenbein Parallel adaptive full-multigrid methods on message-based multiprocessors . . . . 269--287 D. J. Evans and G. M. Megson Romberg integration using systolic arrays . . . . . . . . . . . . . . . . . 289--304 D. Gannon and J. Panetta Restructuring SIMPLE for the CHiP architecture . . . . . . . . . . . . . . 305--326 J. W. H. Liu Computational models and task scheduling for parallel sparse Cholesky factorization . . . . . . . . . . . . . 327--342 W. Oed and O. Lange Modelling, measurement, and simulation of memory interference in the Cray X-MP 343--358
T. Yuba and H. Kashiwagi The Japanese national project for new generation supercomputing systems . . . 1--16 G. C. Fox and S. W. Otto and A. J. G. Hey Matrix algorithms on a hypercube. I. Matrix multiplication . . . . . . . . . 17--31 D. J. Evans and G. M. Megson Construction of extrapolation tables by systolic arrays for solving ordinary differential equations . . . . . . . . . 33--48 W. Ronsch and H. Strauss Timing results of some internal sorting algorithms on vector computers . . . . . 49--61 P. Moller-Nielsen and J. Staunstrup Problem-heap: a paradigm for multiprocessor algorithms . . . . . . . 63--74 H. Carlisle and A. Crawford and S. Sheppard ADA multitasking and the single source shortest path problem . . . . . . . . . 75--91 I. Parberry Some practical simulations of impractical parallel computers . . . . . 93--101 E. D. Brooks, III A butterfly processor-memory interconnection for a vector processing environment . . . . . . . . . . . . . . 103--110
D. Kamowitz SOR and MGR$(\nu)$ experiments on the Crystal multicomputer . . . . . . . . . 117--142 M. Louter-Nool Basic linear algebra subprograms (BLAS) on the CDC Cyber 205 . . . . . . . . . . 143--165 R. Suros and E. Montagne Optimizing systolic networks by fitting diagonals . . . . . . . . . . . . . . . 167--174 R. B. Simpson and A. Yazici An organization of the extrapolation method for vector processing . . . . . . 175--188 Z. Strakos Effectivity and optimizing of algorithms and programs on the host-computer/array-processor system . . 189--207 V. Faber and O. M. Lubeck and A. B. White, Jr. Comments on the paper `Parallel efficiency can be greater than unity' 209--210 R. Janssen A note on superlinear speedup . . . . . 211--213 H. Umeo and I. Nakatsuka A design of pipeline-interval-optimum systolic stack . . . . . . . . . . . . . 215--219 M. P. Bekakos and D. J. Evans A `rotating' and `folding' algorithm using a two-dimensional `systolic' communication geometry . . . . . . . . . 221--228 Michael Kaps and Michael Schlegl A short proof for the existence of the ${WZ}$-factorisation . . . . . . . . . . 229--232
K. H. Cheng and S. Sahni VLSI systems for band matrix multiplication . . . . . . . . . . . . . 239--258 C. A. Pogue and P. Willett Use of text signatures for document retrieval in a highly parallel environment . . . . . . . . . . . . . . 259--268 H. Mühlenbein and M. Gorges-Schleuter and O. Kramer New solutions to the mapping problem of parallel systems: the evolution approach 269--279 P. Frederickson and R. Hiromoto and J. Larson A parallel Monte Carlo transport algorithm using a pseudo-random tree to guarantee reproducibility . . . . . . . 281--290 Y. N. Srikant and P. Shankar A new parallel algorithm for parsing arithmetic infix expressions . . . . . . 291--304 Guang R. Gao A stability classification method and its application to pipelined solution of linear recurrences . . . . . . . . . . . 305--321 Hartmut Schwandt An interval arithmetic method for the solution of nonlinear systems of equations on a vector computer . . . . . 323--337 Rami Melhem Parallel Gauss--Jordan elimination for the solution of dense linear systems . . 339--343 J. Modi and R. Prager Implementation of bubble sort and the odd-even transposition sort on a rack of transputers . . . . . . . . . . . . . . 345--348 W. Gentzsch A fully vectorizable SOR variant . . . . 349--353
Anonymous International Conference on Vector and Parallel Computing --- Issues in Applied Research and Development . . . . . . . . ?? Petter E. Bjørstad A large scale, sparse, secondary storage, direct linear equation solver for structural analysis and its implementation on vector and parallel architectures . . . . . . . . . . . . . 3--12 E. Clementi and J. Detrich and S. Chin and G. Corongiu and D. Folsom and D. Logan and R. Caltabiano and A. Carnevali and J. Helin and M. Russo and A. Gnudi and P. Palamidese Large-scale computations on a scalar, vector and parallel `supercomputer' . . 13--44 Henk A. van der Vorst Large tridiagonal and block tridiagonal linear systems on vector and parallel computers . . . . . . . . . . . . . . . 45--54 Ameet K. Dave and Iain S. Duff Sparse matrix calculations on the CRAY-2 55--64 Eleanor Chu and Alan George Gaussian elimination with partial pivoting and load balancing on a multiprocessor . . . . . . . . . . . . . 65--74 D. Parkinson Organisational aspects of using parallel computers . . . . . . . . . . . . . . . 75--83 Alan George and Michael T. Heath and Esmond Ng and Joseph Liu Symbolic Cholesky factorization on a local-memory multiprocessor . . . . . . 85--95 R. W. Hockney Parametrization of computer performance 97--103 M. Itoh and K. Uchida Trends in Fujitsu large scale computer technology . . . . . . . . . . . . . . . 105--115 Oliver A. McBryan and Eric F. Van de Velde Matrix and vector operations on hypercube parallel processors . . . . . 117--125 Dianne P. O'Leary Parallel implementation of the block conjugate gradient algorithm . . . . . . 127--139 Catherine E. Houstis and Elias N. Houstis and John R. Rice Partitioning PDE computations: methods and performance evaluation . . . . . . . 141--163 William D. Gropp Solving PDEs on loosely-coupled parallel processors . . . . . . . . . . . . . . . 165--173 J. J. Dongarra and D. C. Sorensen A portable environment for developing parallel FORTRAN programs . . . . . . . 175--186 G. W. Stewart A parallel implementation of the $QR$-algorithm . . . . . . . . . . . . . 187--196 Paul N. Swarztrauber Multiprocessor FFTs . . . . . . . . . . 197--210 Merrell L. Patrick and Daniel A. Reed and Robert G. Voigt The impact of domain partitioning on the performance of a shared memory multiprocessor . . . . . . . . . . . . . 211--217 Jack J. Dongarra and Lennart Johnsson Solving banded systems on a parallel processor . . . . . . . . . . . . . . . 219--246 T. Watanabe Architecture and performance of NEC supercomputer SX system . . . . . . . . 247--255 Ruth Gonzalez and Mary Fanett Wheeler Domain decomposition for elliptic partial differential equations with Neumann boundary conditions . . . . . . 257--263
Gérard Meurant Multitasking the conjugate gradient method on the CRAY X-MP/48 . . . . . . . 267--280 Alan H. Karp and John Greenstadt An improved parallel Jacobi method for diagonalizing a symmetric matrix . . . . 281--294 Nikolaos M. Missirlis Scheduling parallel iterative methods on multiprocessor systems . . . . . . . . . 295--302 Henk A. van der Vorst Analysis of a parallel solution method for tridiagonal linear systems . . . . . 303--311 Jian Ping Shao and Li Shan Kang An asynchronous parallel mixed algorithm for linear and nonlinear equations . . . 313--321 M. A. de Bruijn EPS: an `elementary' programming system for the Delft Parallel Processor . . . . 323--337 Piyush Mehrotra and John Van Rosendale The BLAZE language: a parallel language for scientific programming . . . . . . . 339--361 T. Hoshino and T. Shirakawa and K. Tsuboi Mesh-connected parallel computer PAX for scientific applications . . . . . . . . 363--371 I. Stojmenovic and D. J. Evans Comments on two parallel algorithms for the planar convex hull problem . . . . . 373--375
H. P. Zima and H.-J. Bast and M. Gerndt SUPERB: a tool for semi-automatic MIMD/SIMD parallelization . . . . . . . 1--18 Joel H. Saltz and Vijay K. Naik Towards developing robust algorithms for solving partial differential equations on MIMD machines . . . . . . . . . . . . 19--44 Stavros A. Zenios and John M. Mulvey A distributed algorithm for convex network optimization problems . . . . . 45--56 M. Zubair and B. B. Maden Efficient systolic algorithm for finding bridges in a connected graph . . . . . . 57--61 David L. Cochrane and Donald G. Truhlar Strategies and performance norms for efficient utilization of vector pipeline computers as illustrated by the classical mechanical simulation of rotationally inelastic collisions . . . 63--85 Zahari Zlatev Treatment of some mathematical models describing long-range transport of air pollutants on vector processors . . . . 87--98 Clive Temperton Implementation of a prime factor FFT algorithm on CRAY-1 . . . . . . . . . . 99--108 Charles H. Romine and James M. Ortega Parallel solution of triangular systems of equations . . . . . . . . . . . . . . 109--114 P. Carnevali Timing results of some internal sorting algorithms on the IBM 3090 . . . . . . . 115--117 D. J. Evans and K. Margaritis Optical processing of banded matrix algorithms using outer product concepts 119--125
F. A. Lootsma and K. M. Ragsdell State-of-the-art in parallel nonlinear optimization . . . . . . . . . . . . . . 133--155 D. J. Silvester Optimising finite element matrix calculations using the general technique of element vectorisation . . . . . . . . 157--164 Rami Melhem Parallel solution of linear systems with striped sparse matrices . . . . . . . . 165--184 Willi Schönauer and Eric Schnepf FIDISOL: a ``black box'' solver for partial differential equations . . . . . 185--193 Yau Shu Wong Solving large elliptic difference equations on CYBER 205 . . . . . . . . . 195--207 T. Asano and H. Umeo Systolic algorithms for computing the visibility polygon and triangulation of a polygonal region . . . . . . . . . . . 209--216 Mike Ashworth and Andrew G. Lyne A segmented FFT algorithm for vector computers . . . . . . . . . . . . . . . 217--224 R. M. Chamberlain Gray codes, fast Fourier transforms and hypercubes . . . . . . . . . . . . . . . 225--233 E. D. Brooks, III The shared memory hypercube . . . . . . 235--245 C. R. Askew and D. B. Carpenter and J. T. Chalker and A. J. G. Hey and M. Moore and D. A. Nicole and D. J. Pritchard Monte Carlo simulation on transputer arrays . . . . . . . . . . . . . . . . . 247--258 M. Hatzopoulos and D. J. Evans Comments on the paper: ``A short proof for the existence of the WZ-factorisation'' [Parallel Comput. 4 (1987), no. 2, 229--232, MR 88j:65064a] by M. Kaps and M. Schlegl 259--259
William L. Briggs and Thomas Turnbull Fast Poisson solvers for MIMD computers 265--274 M. Cosnard and M. Marrakchi and Y. Robert and D. Trystram Parallel Gaussian elimination on an MIMD computer . . . . . . . . . . . . . . . . 275--296 H. Y. Chang and S. Utku and M. Salama and D. Rapp A parallel Householder tridiagonalization strategem using scattered square decomposition . . . . . 297--311 D. J. Evans and Jian Ping Shao and Li Shan Kang The convergence factor of the parallel Schwarz overrelaxation method for linear systems . . . . . . . . . . . . . . . . 313--324 B. W. Glickfeld and R. A. Overbeek Geometric specification of scheduling constraints: a simplified approach to multiprocessing . . . . . . . . . . . . 325--337 E. D. Brooks, III The indirect $k$-ary $n$-cube for a vector processing environment . . . . . 339--348 M. J. Quinn Parallel sorting algorithms for tightly coupled multiprocessors . . . . . . . . 349--357 Robert A. Wagner and Merrell L. Patrick A sparse matrix algorithm on the Boolean vector machine . . . . . . . . . . . . . 359--371 U. Harms and H. Luttermann Experiences in benchmarking the three supercomputers CRAY-1M, CRAY-X/MP, FUJITSU VP-200 compared with the CYBER 76 . . . . . . . . . . . . . . . . . . . 373--382 S. C. Kak A two-layered mesh array for matrix multiplication . . . . . . . . . . . . . 383--385
D. A. Poplawski Mapping rings and grids onto the FPS T-Series hypercube . . . . . . . . . . . 1--10 F. Darema and D. A. George and V. A. Norton and G. F. Pfister A single-program-multiple-data computational model for EPEX/FORTRAN . . 11--24 Manfred Kunde and Hans-Werner Lang and Manfred Schimmler and Hartmut Schmeck and Heiko Schröder The instruction systolic array and its relation to other models of parallel computers . . . . . . . . . . . . . . . 25--39 F. C. Kampe and T. M. Nguyen Performance comparison of the Cray-2 and Cray X-MP on a class of seismic data processing algorithms . . . . . . . . . 41--53 B. Steffen Implementation of a resonant cavity package on MIMD computers . . . . . . . 55--63 H. Mühlenbein and M. Gorges-Schleuter and O. Kramer Evolution algorithms in combinatorial optimization . . . . . . . . . . . . . . 65--85 Peter H. Michielse and Henk A. van der Vorst Data transport in Wang's partition method . . . . . . . . . . . . . . . . . 87--95 Mark Goldmann Vectorisation of the multiple shooting method for the nonlinear boundary value problem in ordinary differential equations . . . . . . . . . . . . . . . 97--110 D. J. Evans and M. P. Bekakos The solution of linear systems by the QIF algorithm on a wavefront array processor . . . . . . . . . . . . . . . 111--130
J. M. Ortega The $ijk$ forms of factorization methods. I. Vector computers . . . . . . 135--147 J. M. Ortega and C. H. Romine The $ijk$ forms of factorization methods. II. Parallel systems . . . . . 149--162 J. T. Feo An analysis of the computational and parallel complexity of the Livermore loops . . . . . . . . . . . . . . . . . 163--185 R. G. Babb and L. Storc and R. Hiromoto Developing a parallel Monte Carlo transport algorithm using large-grain data flow . . . . . . . . . . . . . . . 187--198 Earl Zmijewski and John R. Gilbert A parallel algorithm for sparse symbolic Cholesky factorization on a multiprocessor . . . . . . . . . . . . . 199--210 J. A. Kapenga and E. de Doncker A parallelization of adaptive task partitioning algorithms . . . . . . . . 211--225 J. L. Gaudiot and J. I. Pi and M. L. Campbell Program graph allocation in distributed multicomputers . . . . . . . . . . . . . 227--247 I. Stojmenovic and M. Miyakawa An optimal parallel algorithm for solving the maximal elements problem in the plane . . . . . . . . . . . . . . . 249--251 Yves Robert and Denis Trystram Comments on scheduling parallel iterative methods on multiprocessor systems . . . . . . . . . . . . . . . . 253--255
Anonymous 2nd International SUPRENUM Colloquium ?? K. Solchenbach and U. Trottenberg SUPRENUM: system essentials and grid applications . . . . . . . . . . . . . . 265--281 W. K. Giloi SUPRENUM: a trendsetter in modern supercomputer development . . . . . . . 283--296 K. Peinze The SUPRENUM preprototype: status and experiences . . . . . . . . . . . . . . 297--313 H. Kammer The SUPRENUM vector floating-point unit 315--323 W. Schroder PEACE: the distributed SUPRENUM operating system . . . . . . . . . . . . 325--333 G. Schaffler Connecting PEACE to UNIX . . . . . . . . 335--339 K. Solchenbach Grid applications on distributed memory architectures: implementation and evaluation . . . . . . . . . . . . . . . 341--356 O. Kolp and H. Mierendorff Performance estimations for SUPRENUM systems . . . . . . . . . . . . . . . . 357--366 M. D. Ercegovac Heterogeneity in supercomputer architectures . . . . . . . . . . . . . 367--372 F. Hossfeld Vector-supercomputers . . . . . . . . . 373--385 U. Kremer and H.-J. Bast and M. Gerndt and H. P. Zima Advanced tools and techniques for automatic parallelization . . . . . . . 387--393 L. Lehmann and F. Hopfl A model of distributed recovery for the SUPRENUM multiprocessor . . . . . . . . 395--401 B. Franke and R. Harneit and A. Kern and H. C. Zeidler The pipeline bus: an interconnection network for multiprocessor systems . . . 403--412 W. Ronsch and H. Strauss A linear algebra package for a local memory multiprocessor: problems, proposals and solutions . . . . . . . . 413--418 I. Gutheil SUPRENUM software for the symmetric eigenvalue problem . . . . . . . . . . . 419--424 U. Herzog Performance evaluation principles for vector- and multiprocessor systems . . . 425--438 R. Williams Free-Lagrange hydrodynamics with a distributed-memory parallel processor 439--443 D. Seldner and M. Alef and T. Westermann and E. Halter Parallel particle simulation in high voltage diodes (algorithms and concepts for implementation on SUPRENUM) . . . . 445--449 H. Capdevila Solution of $2$-D Euler equations with a parallel code . . . . . . . . . . . . . 451--460 J. Linden and B. Steckel and K. Stuben Parallel multigrid solution of the Navier--Stokes equations on general 2D domains . . . . . . . . . . . . . . . . 461--475 O. A. McBryan New architectures: performance highlights and new algorithms . . . . . 477--499
Anonymous International Conference on Vector and Parallel Processors in Computational Science III . . . . . . . . . . . . . . ?? A. Kashko and H. Buxton and B. F. Buxton and D. A. Castelow Parallel matching and reconstruction algorithms in computer vision . . . . . 3--17 C. Jesshope Transputers and switches as objects in OCCAM . . . . . . . . . . . . . . . . . 19--30 H. F. Jordan Programming language concepts for multiprocessors . . . . . . . . . . . . 31--40 J. J. Dongarra and D. C. Sorensen and K. Connolly and J. Patterson Programming methodology and performance issues for advanced computer architectures . . . . . . . . . . . . . 41--58 P. C. Treleaven Parallel architecture overview . . . . . 59--70 B. M. Forrest and D. Roweth and N. Stroud and D. J. Wallace and G. V. Wilson Neural network models . . . . . . . . . 71--83 R. G. Babb, II and L. Storc and P. G. Eltgroth Parallelization schemes for $2$-D hydrodynamics codes using the independent time step method . . . . . . 85--89 K. Miura and R. G. Babb, II Tradeoffs in granularity and parallelization for a Monte Carlo shower simulation code . . . . . . . . . . . . 91--100 C. F. Baillie Comparing shared and distributed memory computers . . . . . . . . . . . . . . . 101--110 Thomas Brandes Determination of dependencies in a knowledge-based parallelization tool . . 111--119 G. Carver A spectral meteorological method on the ICL DAP . . . . . . . . . . . . . . . . 121--126 M. Clint and D. Roantree and A. Stewart Towards the construction of an eigenvalue engine . . . . . . . . . . . 127--132 A. Corona and C. Martini and M. Morando and S. Ridella and C. Rolando Solving linear equation systems on vector computers with maximum efficiency 133--139 D. Crookes and P. J. Morrow and P. Milligan and P. L. Kilpatrick and N. S. Scott An array processing language for transputer networks . . . . . . . . . . 141--148 D. Dent and M. O'Neill Microtasking as a complement to macrotasking . . . . . . . . . . . . . . 149--154 Peter G. Eltgroth and Mark K. Seager The sub-implicit method: new multiprocessor algorithms for old implicit codes . . . . . . . . . . . . . 155--163 R. Francis and I. Mathieson Synchronised execution on shared memory multiprocessors . . . . . . . . . . . . 165--175 R. Gurke The approximate solution of the Euclidean traveling salesman problem on a CRAY X-MP . . . . . . . . . . . . . . 177--183 A. Inoue and A. Maeda The architecture of a multi-vector processor system, VVP . . . . . . . . . 185--193 T. Legendi and E. Katona and J. Toth and A. Zsoter Megacell machine . . . . . . . . . . . . 195--199 H. Mühlenbein and O. Kramer and F. Limburger and M. Mevenkamp and S. Streitz MUPPET: a programming environment for message-based multiprocessors . . . . . 201--221 W. E. Nagel Using multiple CPUs for problem solving: experiences in multitasking on the CRAY X-MP/48 . . . . . . . . . . . . . . . . 223--230 S. Katz and W. A. Ray and G. Walder Multiprocessor software for the CYBERPLUS high performance system . . . 231--244 J. B. G. Roberts and J. G. Harp and B. C. Merrifield and K. J. Palmer and P. Simpson and J. S. Ward and H. C. Webber Evaluating parallel processors for real-time applications . . . . . . . . . 245--254 D. F. Snelling and G.-R. Hoffmann A comparative study of libraries for parallel processing . . . . . . . . . . 255--266 D. A. Tanqueray and D. F. Snelling A distributed self-scheduler for partially ordered tasks . . . . . . . . 267--273 R. Wait Partitioning and preconditioning of finite element matrices on the DAP . . . 275--284 H. J. Wasserman and M. L. Simmons and O. M. Lubeck The performance of minisupercomputers: Alliant FX/8, Convex C-1, and SCS-40 . . 285--293 A. T. Brint and V. J. Gillet and M. F. Lynch and P. Willett and G. A. Manson and G. A. Wilson Chemical graph matching using transputer networks . . . . . . . . . . . . . . . . 295--300 Z. Zlatev and Phuong Vu and J. Wasniewski and K. Schaumburg Computations with symmetric, positive definite and band matrices on a parallel vector processor . . . . . . . . . . . . 301--312 J. Berntsen and T. O. Espelid A parallel global adaptive quadrature algorithm for hypercubes . . . . . . . . 313--323 R. Wait and N. G. Brown Overlapping block methods for solving tridiagonal systems on transputer arrays 325--333 A. J. Davies The boundary element method on the ICL DAP . . . . . . . . . . . . . . . . . . 335--343 J. J. Du Croz and P. J. D. Mayes and J. Wasniewski and S. Wilson Applications of Level 2 BLAS in the NAG library . . . . . . . . . . . . . . . . 345--350 C. H. Lai and H. M. Liddell Finite elements using long vectors of the DAP . . . . . . . . . . . . . . . . 351--361 A. McKerrell and L. M. Delves Monte Carlo simulation of neutron diffusion on SIMD architectures . . . . 363--370 R. Reuter Solving tridiagonal systems of linear equations on the IBM 3090 VF . . . . . . 371--376 G. Radicati and Y. Robert and P. Sguazzero Dense linear systems FORTRAN solvers on the IBM 3090 vector multiprocessor . . . 377--384 C. Froese Fischer and N. S. Scott and J. Yoo Multitasking the calculation of angular integrals on the CRAY-2 and CRAY X-MP 385--390 H. Finnemann and J. Brehm and E. Michel and J. Volkert Solution of the neutron diffusion equation through multigrid methods implemented on a memory-coupled 25-processor system . . . . . . . . . . 391--398 C. A. Pogue and E. M. Rasmussen and P. Willett Searching and clustering of databases using the ICL distributed array processor . . . . . . . . . . . . . . . 399--407 D. F. Snelling Standard FORTRAN 77 as a parallel language . . . . . . . . . . . . . . . . 409--414
O. A. McBryan The Connection Machine: PDE solution on 65536 processors . . . . . . . . . . . . 1--24 O. Brewer and J. Dongarra and D. Sorensen Tools to aid in the analysis of memory access patterns for FORTRAN programs . . 25--35 O. M. Lubeck and V. Faber Modeling the performance of hypercubes: a case study using the particle-in-cell application . . . . . . . . . . . . . . 37--52 T. Hoshino and R. Hiromoto and S. Sekiguchi and S. Majima Mapping schemes of the particle-in-cell method implemented on the PAX computer 53--75 Dieter Müller-Wichards Performance estimates for applications: an algebraic framework . . . . . . . . . 77--106 W. Gentzsch and F. Szelenyi and V. Zecca Use of parallel FORTRAN for engineering problems on the IBM 3090 vector multiprocessor . . . . . . . . . . . . . 107--115
Emile H. L. Aarts and Jan H. M. Korst Computations in massively parallel networks based on the Boltzmann machine: a review . . . . . . . . . . . . . . . . 129--145 J. K. Annot A deadlock free and starvation free network of packet switching communication processors . . . . . . . . 147--162 H. P. Barendregt and M. C. J. D. Van Eekelen and M. J. Plasmeijer and J. R. W. Glauert and J. R. Kennaway and M. R. Sleep LEAN: an intermediate language based on graph rewriting . . . . . . . . . . . . 163--177 D. I. Bevan An efficient reference counting solution to the distributed garbage collection problem . . . . . . . . . . . . . . . . 179--192 W. Damm and G. Dohmen Specifying distributed computer architectures in AADL . . . . . . . . . 193--211 O. Krämer and H. Mühlenbein Mapping strategies in message-based multiprocessor systems . . . . . . . . . 213--225 A. R. Martin and J. V. Tucker The concurrent assignment representation of synchronous systems . . . . . . . . . 227--256 P. H. Welch Emulating digital logic using transputer networks (very high parallelism=simplicity=performance) . . 257--272
R. Hockney Synchronization and communication overheads on the LCAP multiple FPS-164 computer system . . . . . . . . . . . . 279--290 Chandrika Kamath and Ahmed Sameh A projection method for solving nonsymmetric linear systems on multiprocessors . . . . . . . . . . . . 291--312 Robert A. Wagner Parallel solution of arbitrarily sparse linear systems . . . . . . . . . . . . . 313--331 Loyce M. Adams and Elizabeth G. Ong Additive polynomial preconditioners for parallel computers . . . . . . . . . . . 333--345 Israel Gottlieb The partitioning of QSDF computation graphs . . . . . . . . . . . . . . . . . 347--358 Ilio Galligani and Valeria Ruggiero Solving large systems of linear ordinary differential equations on a vector computer . . . . . . . . . . . . . . . . 359--365 M. Bessenrodt-Weberpals and H. Weberpals A fast vector algorithm for solving tridiagonal linear equations . . . . . . 367--372 D. J. Evans and K. Margaritis and M. P. Bekakos Systolic and holographic pyramidical soft-systolic designs for successive matrix powers . . . . . . . . . . . . . 373--384 M. Cosnard and A. G. Ferreira and H. Herbelin The two list algorithm for the knapsack problem on a FPS T20 . . . . . . . . . . 385--388
A. Greenbaum Synchronization costs on multiprocessors 3--14 Th. Ruppelt and G. Wirtz Automatic transformation of high-level object-oriented specifications into parallel programs . . . . . . . . . . . 15--28 C. McCrosky Realizing the parallelism of array-based computation . . . . . . . . . . . . . . 29--43 Y. Wolfstahl Mapping parallel programs to multiprocessors: a dynamic approach . . 45--50 J. Gary and L. Fosdick An optimizing precompiler for finite-difference computations on a vector computer . . . . . . . . . . . . 51--64 J.-F. Hake and W. Homberg Linear algebra software on a vector computer . . . . . . . . . . . . . . . . 65--81 Aydin Üresin and Michel Dubois Sufficient conditions for the convergence of asynchronous iterations 83--92 V. Eijkhout and P. Vassilevski Positive definiteness aspects of vectorizable preconditioners . . . . . . 93--100 Susumu Horiguchi and Willard L. Miranker A parallel algorithm for finding the maximum value . . . . . . . . . . . . . 101--108 Kam-Hoi Cheng and Sartaj Sahni A new VLSI system for adaptive recursive filtering . . . . . . . . . . . . . . . 109--115 Michel Cosnard and Maurice Tchuente and Bernard Tourancheau Systolic Gauss--Jordan elimination for dense linear systems . . . . . . . . . . 117--122 H. M. Amman Nonlinear control simulation on a vector machine . . . . . . . . . . . . . . . . 123--127
Nigel Dodd Graph matching by stochastic optimisation applied to the implementation of multi layer perceptrons on transputer networks . . . 135--142 E. Gallopoulos and Y. Saad A parallel block cyclic reduction algorithm for the fast solution of elliptic equations . . . . . . . . . . . 143--159 Wolfgang Pelz and Layne T. Watson Message length effects for solving polynomial systems on a hypercube . . . 161--176 Michael R. Leuze Independent set orderings for parallel matrix factorization by Gaussian elimination . . . . . . . . . . . . . . 177--191 D. J. Evans and A. M. S. Rahma The numerical solution of Fredholm integral equations on parallel computers 193--205 G. M. Megson and D. J. Evans Algorithmic fault tolerance for matrix operations on triangular arrays . . . . 207--219 S. Storoy Holistic algorithms: a paradigm for multiprocessor programming . . . . . . . 221--229 Ferng-Ching Lin and R. Charng Pin reduction through variable duplications and substitutions in a data dependence graph . . . . . . . . . . . . 231--238 John M. Conroy A note on the parallel Cholesky factorization of wide banded matrices 239--246 M. Cosnard and Y. Robert and B. Tourancheau Evaluating speedups on distributed memory architectures . . . . . . . . . . 247--253
J. J. Hack On the promise of general-purpose parallel computing . . . . . . . . . . . 261--275 R. W. Hockney and I. J. Curington $f_{1/2}$: a parameter to characterize memory and communication bottlenecks . . 277--286 Alan George and Joseph W. H. Liu and Esmond Ng Communication results for parallel sparse Cholesky factorization on a hypercube . . . . . . . . . . . . . . . 287--298 R. M. Hyatt and B. W. Suter and H. L. Nelson A parallel alpha/beta tree searching algorithm . . . . . . . . . . . . . . . 299--308 Tao Li Parallel implementation of rule-based expert systems for interactive applications . . . . . . . . . . . . . . 309--318 M. Malek and E. Opper The cylindrical banyan multicomputer: a reconfigurable systolic architecture . . 319--327 C. Holt and A. Stewart A parallel thinning algorithm with fine grain subtasking . . . . . . . . . . . . 329--334 James R. A. Allwright and D. B. Carpenter A distributed implementation of simulated annealing for the travelling salesman problem . . . . . . . . . . . . 335--338 G. de Biase and P. Ciucci and M. Cottone Vectorized algorithms for astronomical image processing . . . . . . . . . . . . 339--346 Jong-Chuang Tsay and Yodung-Chang Hou Generating function and equivalent transformation for systolic arrays . . . 347--356 M. P. Bekakos and D. J. Evans Relative performance comparisons for the group explicit class of methods on MIMD, SIMD and pipelined vector computers . . 357--364
K. Ohmaki and S. Tomura and K. Inoue and T. Ito and K. Ito and K. Torii TERM: a parallel executable graph reduction machine for equational language . . . . . . . . . . . . . . . . 1--16 M. E. Henderson and W. L. Miranker Synergy in parallel algorithms . . . . . 17--35 A. T. Chronopoulos and C. W. Gear On the efficient implementation of preconditioned $s$-step conjugate gradient methods on multiprocessors with memory hierarchy . . . . . . . . . . . . 37--53 Eleanor Chu and Alan George $QR$ factorization of a dense matrix on a shared-memory multiprocessor . . . . . 55--71 Joseph W. H. Liu Reordering sparse matrices for parallel elimination . . . . . . . . . . . . . . 73--91 David A. Carlson and Binay Sugla Adapting shuffle-exchange like parallel processing organizations to work as systolic arrays . . . . . . . . . . . . 93--106 C. Temperton Further measurements of $(r_\infty,n_{1/2})$ on the CRAY-1 and CRAY X-MP . . . . . . . . . . . . . . . 107--111 C. Lecot An algorithm for generating low discrepancy sequences on vector computers . . . . . . . . . . . . . . . 113--116 J. Moscinski and Z. A. Rycerz and P. W. M. Jacobs Timing results of some internal sorting algorithms on the ETA 10-P . . . . . . . 117--119 J. M. Troya and M. Ortega A study of parallel branch-and-bound algorithms with best-bound-first search 121--126
Youcef Saad and Martin H. Schultz Data communication in parallel architectures . . . . . . . . . . . . . 131--150 A. M. Frieze and J. Yadegar and S. El-Horbaty and D. Parkinson Algorithms for assignment problems on an array processor . . . . . . . . . . . . 151--162 E. Adamides and Ph. Tsalides and A. Thanailakis Synchronization of asynchronous concurrent processes using cellular automata . . . . . . . . . . . . . . . . 163--169 Christian H. Bischof Computing the singular value decomposition on a distributed system of vector processors . . . . . . . . . . . 171--186 J. P. Bonomo and W. R. Dyksen Pipelined iterative methods for shared memory machines . . . . . . . . . . . . 187--199 Gita Alaghband Parallel pivoting combined with parallel reduction and fill-in control . . . . . 201--221 G. Radicati di Brozolo and Y. Robert Parallel conjugate gradient-like algorithms for solving sparse nonsymmetric linear systems on a vector multiprocessor . . . . . . . . . . . . . 223--239 Stanley C. Eisenstat Comments on scheduling parallel iterative methods on multiprocessor systems. II . . . . . . . . . . . . . . 241--244 D. J. Evans and B. B. Sanugi A parallel Runge--Kutta integration method . . . . . . . . . . . . . . . . . 245--251
David E. Womble and Richard C. Allen, Jr. and Lorraine S. Baca Invariant imbedding and the method of lines for parallel computers . . . . . . 263--273 Xiaobo Li and Zhi Xi Fang Parallel clustering algorithms . . . . . 275--290 E. L. Zapata and F. F. Rivera and O. G. Plata and M. A. Ismail Parallel fuzzy clustering on fixed size hypercube SIMD computers . . . . . . . . 291--303 D. W. Lozier and R. G. Rehm Some performance comparisons for a fluid dynamics code . . . . . . . . . . . . . 305--320 G. S. Pawley and C. F. Baillie and E. Tenenbaum and W. Celmaster The BBN Butterfly used to simulate a molecular liquid . . . . . . . . . . . . 321--329 J. Glasgow and M. Jenkins and H. Meijer and C. McCrosky Expressing parallel algorithms in Nial 331--347 V. K. Murthy and H. Schröder Systolic arrays for parallel matrix $g$-inversion and finding Petri net invariants . . . . . . . . . . . . . . . 349--359 W. Ewinger and O. Haan and E. Haupenthal and C. Siemers Modelling and measurement of memory access in SIEMENS VP supercomputers . . 361--365 I.-C. Chang Jou Linear rotation based algorithm and systolic architecture for solving linear system equations . . . . . . . . . . . . 367--379 Jang-Ping Sheu and Chun-lien Wu and Gen-Huey Chen Selection of the first k largest processes in hypercubes . . . . . . . . 381--384 D. J. Evans A systolic design for the Aitken extrapolation formula . . . . . . . . . 385--388
Horace P. Flatt and Ken Kennedy Performance of parallel processors . . . 1--20 L. Brochard Efficiency of some parallel numerical algorithms on distributed systems . . . 21--44 R. S. Barr and R. V. Helgason and J. L. Kennington Minimal spanning trees: an empirical investigation of parallel algorithms . . 45--52 Kam Hoi Cheng and S. Sahni VLSI architectures for back substitution 53--69 Hussein M. Alnuweiri and V. K. Prasanna Kumar An efficient VLSI architecture with applications to geometric problems . . . 71--93 E. Babolian and L. M. Delves Parallel solution of Fredholm integral equations . . . . . . . . . . . . . . . 95--106 Manfred Schimmler and Heiko Schröder A simple systolic method to find all bridges of an undirected graph . . . . . 107--111 H. Bohr and K. S. Jensen and T. Petersen and B. Rathjen and E. Mosekilde and N.-H. Holstein-Rathlou Parallel computer simulation of nearest-neighbour interaction in a system of nephrons . . . . . . . . . . . 113--120 David J. Evans and Ivan Stojmenović On parallel computation of Voronoĭ diagrams . . . . . . . . . . . . . . . . 121--125
L. Hart and S. McCormick Asynchronous multilevel adaptive methods for solving partial differential equations on multiprocessors: basic ideas . . . . . . . . . . . . . . . . . 131--144 S. McCormick and D. Quinlan Asynchronous multilevel adaptive methods for solving partial differential equations on multiprocessors: performance results . . . . . . . . . . 145--156 N. S. Arenstorf and H. F. Jordan Comparing barrier algorithms . . . . . . 157--170 Theodore S. Papatheodorou and Yiannis G. Saridakis Parallel algorithms and architectures for multisplitting iterative methods . . 171--182 Mounir Marrakchi and Yves Robert Optimal algorithms for Gaussian elimination on an MIMD computer . . . . 183--194 Concettina Guerra and Rami Melhem Synthesis of systolic algorithm design 195--207 C. F. Baillie and G. S. Pawley A comparison of the CM with the DAP for lattice gauge theory . . . . . . . . . . 209--220 Frank Dehne and Anne-Lise Hassenklover and Jörg-Rüdiger Sack Computing the configuration space for a robot on a mesh-of-processors . . . . . 221--231 Hans-Jürgen Hotop New Kalman filter algorithms based on orthogonal transformations for serial and vector computers . . . . . . . . . . 233--247 A. Benaini and Y. Robert An even faster systolic array for matrix multiplication . . . . . . . . . . . . . 249--254
F. Hossfeld and R. Knecht and W. E. Nagel Multitasking: experience with applications on a CRAY X-MP . . . . . . 259--283 Hiroshi Umeo A design of time-optimum and register-number-minimum systolic convolvers . . . . . . . . . . . . . . . 285--299 N. Petkov and F. Sloboda A bit-level systolic array for digital contour smoothing . . . . . . . . . . . 301--313 E. Eskow and R. B. Schnabel Mathematical modeling of a parallel global optimization algorithm . . . . . 315--325 P. Fernandes and P. Girdinio A new storage scheme for an efficient implementation of the sparse matrix-vector product . . . . . . . . . 327--333 J. Berntsen Communication efficient matrix multiplication on hypercubes . . . . . . 335--342 I. Gladwell and R. I. Hay Vector- and parallelisation of ODE BVP codes . . . . . . . . . . . . . . . . . 343--350 T. L. Freeman Calculating polynomial zeros on a local memory parallel computer . . . . . . . . 351--358 George T. Papaspyropoulos and D. G. Maritsas Parallel discrete event simulation with SIMULA . . . . . . . . . . . . . . . . . 359--373 Tsung Chuan Huang and Jhing-Fa Wang and Chu Sing Yang and Jau-Yien Lee Graph theoretic characterization and reliability of the generalized Boolean $n$-cube network . . . . . . . . . . . . 375--385
P. Sadayappan and F. Ercal and J. Ramanujam Cluster partitioning approaches to mapping parallel programs onto a hypercube . . . . . . . . . . . . . . . 1--16 M. R. Exum and J. L. Gaudiot Network design and allocation considerations in the Hughes data-flow machine . . . . . . . . . . . . . . . . 17--34 P. Carnevali and M. Kindelan A simplified model to predict the performance of FORTRAN vector loops on the IBM 3090/VF . . . . . . . . . . . . 35--46 H. Weberpals Architectural approach to the IBM 3090E vector performance . . . . . . . . . . . 47--59 M. Zubair An optimal speedup algorithm for the measure problem . . . . . . . . . . . . 61--71 Ronald J. Leach and O. Michael Atogi and Razeyah R. Stephen The actual complexity of parallel evaluation of low degree polynomials . . 73--83 G. M. Megson Rank annihilation on a ring of processors . . . . . . . . . . . . . . . 85--94 J. Zerovnik A parallel variant of a heuristical algorithm for graph colouring . . . . . 95--100 Herbert Fischer Automatic differentiation: parallel computation of function, gradient, and Hessian matrix . . . . . . . . . . . . . 101--110 Gen-Huey Chen and Maw-Sheng Chern and Jin-Hwang Jang Pipeline architectures for dynamic programming algorithms . . . . . . . . . 111--117 J. C. Tsay and C. J. Lin A systolic design for generating combinations in lexicographic order . . 119--125
Z. C. Shih and R. C. T. Lee and S. N. Yang A parallel algorithm for finding congruent regions . . . . . . . . . . . 135--142 Sajal K. Das and Narsingh Deo and Sushil Prasad Parallel graph algorithms for hypercube computers . . . . . . . . . . . . . . . 143--158 H. Eckardt System performance and execution of scientific algorithms on the parallel computer Parawell . . . . . . . . . . . 159--173 R. R. Oldehoeft and J. R. McGraw Mixed applicative and imperative programs . . . . . . . . . . . . . . . . 175--191 A. De Matteis and S. Pagnutti A class of parallel random number generators . . . . . . . . . . . . . . . 193--198 G. A. Geist and G. J. Davis Finding eigenvalues and eigenvectors of unsymmetric matrices using a distributed-memory multiprocessor . . . 199--209 M. K. Stojčev and E. I. Milovanović and I. Ž. Milovanović An algorithm for multiplication of concatenated matrices . . . . . . . . . 211--223 W. E. Nagel Exploiting autotasking on a CRAY Y-MP: an improved software interface to multitasking . . . . . . . . . . . . . . 225--233 Gen Huey Chen and Hong Fa Ho and Shieu Hong Lin and Jang-Ping Sheu Data mapping of linear programming on fixed-size hypercubes . . . . . . . . . 235--243 Jang-Ping Sheu and Nan-Ling Kuo and Gen-Huey Chen Graph search algorithms and maximum bipartite matching algorithm on the hypercube network model . . . . . . . . 245--251 Chii Huah Shyu A parallel algorithm for finding a maximum weight clique of an interval graph . . . . . . . . . . . . . . . . . 253--256
J. Dantas De Melo and J. L. Calvet and J. M. Garcia Vectorization and multitasking of dynamic programming in control: experiments on a CRAY-2 . . . . . . . . 261--269 R. Morandi and F. Sgallari Parallel algorithms for the iterative solution of sparse least-squares problems . . . . . . . . . . . . . . . . 271--280 J. S. Weston and M. Clint Two algorithms for the parallel computation of eigenvalues and eigenvectors of large symmetric matrices using the ICL DAP . . . . . . . . . . . 281--288 Hyoung Joong Kim and Jang Gyu Lee A parallel algorithm solving a tridiagonal Toeplitz linear system . . . 289--294 S. J. Shyu and R. C. T. Lee Solving the set cover problem on a supercomputer . . . . . . . . . . . . . 295--300 E. V. Krishnamurthy and M. Kunde and M. Schimmler and H. Schröder Systolic algorithm for tensor products of matrices: implementation and applications . . . . . . . . . . . . . . 301--308 G. R. Gao Exploiting fine-grain parallelism on dataflow architectures . . . . . . . . . 309--320 R. Doallo and E. L. Zapata A VLSI Systolic Architecture for Solving DBT-Transformed Fuzzy Clustering Problems of Arbitrary Size . . . . . . . 321--335 P. Lenders and H. Schroder A programmable systolic device for image processing based on mathematical morphology . . . . . . . . . . . . . . . 337--344 D. W. Heermann and A. N. Burkitt Parallelization of the Ising model and its performance evaluation . . . . . . . 345--357 P. Michielse Parallel adaptive reservoir simulation 359--368 R. M. R. Page and S. F. Reddaway The DAP as a filestore search engine . . 369--376 Pierre Fraigniaud and Serge Miguet and Yves Robert Scattering on a ring of processors . . . 377--383
Pelle Olsson and S. Lennart Johnsson A dataparallel implementation of an explicit method for the three-dimensional compressible Navier--Stokes equations . . . . . . . . 1--30 Arno Krechel and Hans-Joachim Plum and Klaus Stüben Parallelization and vectorization aspects of the solution of tridiagonal linear systems . . . . . . . . . . . . . 31--49 F. F. Rivera and R. Doallo and J. D. Bruguera and E. L. Zapata and R. Peskin Gaussian elimination with pivoting on hypercubes . . . . . . . . . . . . . . . 51--60 U. Block and A. Frommer and G. Mayer Block colouring schemes for the SOR method on local memory parallel computers . . . . . . . . . . . . . . . 61--75 D. J. Evans and K. Margaritis Systolic designs for eigenvalue-eigenvector computations using matrix powers . . . . . . . . . . 77--87 Jau-Hsiung Huang and Leonard Kleinrock Optimal parallel merging and sorting algorithms using $\sqrt {N}$ processors without memory contention . . . . . . . 89--97 W. Hasselbring CELIP: a Cellular Language for Image Processing . . . . . . . . . . . . . . . 99--109 D. J. Evans A parallel sorting-merging algorithm for tightly coupled multiprocessors . . . . 111--121
Ramesh Natarajan A parallel algorithm for the generalized symmetric eigenvalue problem on a hybrid multiprocessor . . . . . . . . . . . . . 129--150 John R. Gilbert and Hjálmtýr Hafsteinsson Parallel symbolic factorization of sparse linear systems . . . . . . . . . 151--162 Sanjay V. Rajopadhye and Richard M. Fujimoto Synthesizing systolic arrays from recurrence equations . . . . . . . . . . 163--189 L. Brugnano and M. Marrone Vectorization of some block preconditioned conjugate gradient methods . . . . . . . . . . . . . . . . 191--198 G. M. Megson A systolic helix for matrix triangularisation with partial pivoting 199--206 A. de Matteis and S. Pagnutti Long-range correlations in linear and nonlinear random number generators . . . 207--210 J. Li and A. Brass and D. J. Ward and B. Robson A study of parallel molecular dynamics algorithms for $N$-body simulations on a transputer system . . . . . . . . . . . 211--222 Basile Louka and Maurice Tchuente Triangular matrix inversion on systolic arrays . . . . . . . . . . . . . . . . . 223--228 T. Theoharis and J. J. Modi Implementation of matrix multiplication on the T-RACK . . . . . . . . . . . . . 229--233 Liwu Li Systolic computation with fault diagnosis . . . . . . . . . . . . . . . 235--243
H. Mühlenbein Limitations of multi-layer perceptron networks-steps towards genetic neural networks . . . . . . . . . . . . . . . . 249--260 F. J. Smieja and H. Mühlenbein The geometry of multi-layer perceptron solutions . . . . . . . . . . . . . . . 261--275 J. Kindermann and A. Linden Inversion of neural networks by gradient descent . . . . . . . . . . . . . . . . 277--286 T. E. Lange Simulation of heterogeneous neural networks on serial and parallel machines 287--303 A. Singer Implementations of artificial neural networks on the Connection Machine . . . 305--315 Xiru Zhang and M. McKenna and J. P. Mesirov and D. L. Waltz The backpropagation algorithm on grid and hypercube architectures . . . . . . 317--327 M. Witbrock and M. Zagha An implementation of backpropagation learning on GF11, a large SIMD parallel computer . . . . . . . . . . . . . . . . 329--346 D. Whitley and T. Starkweather and C. Bogart Genetic algorithms and neural networks: optimizing connections and connectivity 347--361 M. F. da Mota Tenorio Topology synthesis networks: self-organization of structure and weight adjustment as a learning paradigm 363--380 K. Obermayer and H. Ritter and K. Schulten Large-scale simulations of self-organizing neural networks on parallel computers: application to biological modelling . . . . . . . . . . 381--404 R. W. Kentridge Neural networks for learning in the real world: representation, reinforcement and dynamics . . . . . . . . . . . . . . . . 405--414
S. Knecht and E. Laermann and W. E. Nagel Parallelizing QCD with dynamical fermions on a Cray multiprocessor system 3--20 Ibrahim N. Hajj and Stig Skelboe A multilevel parallel solver for block tridiagonal and banded linear systems 21--45 F. F. Van der Vlugt and D. A. van Delft and A. F. Bakker and T. H. van der Meer The implementation of a $3$D Navier--Stokes algorithm on an algorithm oriented processor . . . . . . . . . . . 47--60 Amir Averbuch and Eran Gabber and Boaz Gordissky and Yoav Medan A parallel FFT on an MIMD machine . . . 61--74 Michel Cosnard and Pierre Fraigniaud Finding the roots of a polynomial on an MIMD multicomputer . . . . . . . . . . . 75--85 I. Garcia and J. J. Merelo and J. D. Bruguera and E. L. Zapata Parallel quadrant interlocking factorization on hypercube computers . . 87--100 T. Z. Kalamboukis The symmetric tridiagonal eigenvalue problem on a transputer network . . . . 101--106 J. Boreddy and A. Paulraj On the performance of transputer arrays for dense linear systems . . . . . . . . 107--117 L. Bomans and D. Roose and R. Hempel The Argonne/GMD macros in FORTRAN for portable parallel programming and their implementation on the Intel iPSC/2 . . . 119--132 Igor Ž. Milovanović and Emina I. Milovanović and Mile K. Stojčev An optimal algorithm for Gaussian elimination of band matrices on an MIMD computer . . . . . . . . . . . . . . . . 133--145 Michael Thuné A partitioning strategy for explicit difference methods . . . . . . . . . . . 147--154 István Deák Uniform random number generators for parallel computers . . . . . . . . . . . 155--164 Peter J. Varman and Balakrishna R. Iyer and Donald J. Haderle and Stephen M. Dunn Parallel merging: algorithm and implementation results . . . . . . . . . 165--177 Sajal K. Das and Narsingh Deo and Sushil Prasad Two minimum spanning forest algorithms on fixed-size hypercube computers . . . 179--187 Ferng-Ching Lin and Kuo Liang Chung A cost-optimal parallel tridiagonal system solver . . . . . . . . 
. . . . . 189--199 F. Dehne and A. G. Ferreira and A. Rau-Chaplin Parallel branch and bound on fine-grained hypercube multiprocessors 201--209 Abdelhamid Benaini and Yves Robert Spacetime-minimal systolic arrays for Gaussian elimination and the algebraic path problem . . . . . . . . . . . . . . 211--225 K. Margaritis and D. J. Evans Systolic designs for Bernoulli's method 227--240 Sung Kwon Kim Parallel algorithms for planar dominance counting . . . . . . . . . . . . . . . . 241--246 D. Morris and C. J. Theaker and R. Phillips and D. G. Evans An experimental parallel system (EPS) 247--259 Evgenij E. Tyrtyshnikov New approaches to deriving parallel algorithms . . . . . . . . . . . . . . . 261--265 Chau-Jy Lin Parallel generation of permutations on systolic arrays . . . . . . . . . . . . 267--276 S. R. Das and N. H. Vaidya and L. M. Patnaik A systolic algorithm for hidden surface removal . . . . . . . . . . . . . . . . 277--289
Craig C. Douglas and Willard L. Miranker Beyond massive parallelism: numerical computation using associative tables . . 1--25 G. W. Stewart Communication and matrix computations on large message passing systems . . . . . 27--40 Chien Min Wang and Sheng-De Wang Structured partitioning of concurrent programs for execution on multiprocessors . . . . . . . . . . . . 41--57 Feng Gao and Beresford N. Parlett A note on communication analysis of parallel sparse Cholesky factorization on a hypercube . . . . . . . . . . . . . 59--60 Qian Ping Gu and Tadao Takaoka A sharper analysis of a parallel algorithm for the all pairs shortest path problem . . . . . . . . . . . . . . 61--67 Sathiamoorthy Manoharan and Nigel P. Topham A general bound on schedule length for independent tasks . . . . . . . . . . . 69--73 F. Dehne and M. Gastaldo A note on the load balancing problem for coarse grained hypercube dictionary machines . . . . . . . . . . . . . . . . 75--79 D. J. Evans and W. S. Yousif The implementation of the explicit block iterative methods on the Balance 8000 parallel computer . . . . . . . . . . . 81--97 D. P. O'Leary and P. Whitman Parallel $QR$ factorization by Householder and modified Gram--Schmidt algorithms . . . . . . . . . . . . . . . 99--112 M. F. X. B. van Swaaij and F. V. M. Catthoor and H. J. de Man Deriving ASIC architectures for the Hough transform . . . . . . . . . . . . 113--121
Eric F. Van de Velde Data redistribution and concurrency . . 125--138 John M. Conroy Parallel nested dissection . . . . . . . 139--156 Michael L. Dowling Optimal code parallelization using unimodular transformations . . . . . . . 157--171 B. Veltman and B. J. Lageweg and J. K. Lenstra Multiprocessor scheduling with communication delays . . . . . . . . . . 173--182 Jau Hsiung Huang and Leonard Kleinrock Distributed selectsort sorting algorithms on broadcast communication networks . . . . . . . . . . . . . . . . 183--190 G. M. Megson and D. J. Evans Systolic arrays for group explicit methods for solving first order hyperbolic equations . . . . . . . . . . 191--205 D. J. Evans and C. Li Successive underrelaxation (SUR) and generalised conjugate gradient (GCG) methods for hyperbolic difference equations on a parallel computer . . . . 207--220 Stephen J. Wright Solution of discrete-time optimal control problems on parallel computers 221--237 M. C. Counilh and J. Roman Expression for massively parallel algorithms-description and illustrative example . . . . . . . . . . . . . . . . 239--251 G. M. Megson and D. J. Evans An orthogonal systolic design for the assignment problem . . . . . . . . . . . 253--267 N. Dodd Slow annealing versus multiple fast annealing runs --- an empirical investigation . . . . . . . . . . . . . 269--272 Yen Chun Lin and Ferng-Ching Lin Parallel sorting with cooperating heaps in a linear array of processors . . . . 273--278 D. J. Evans and M. Adamopoulos and S. Kortesis and K. Tsouros Searching sets of properties with neural networks . . . . . . . . . . . . . . . . 279--285 T. Samad and P. Harper High-order Hopfield and Tank optimization networks . . . . . . . . . 287--292 Marc Garbey and David Levine Massively parallel computation of conservation laws . . . . . . . . . . . 293--304 K. Burrage An adaptive numerical integration code for a chain of transputers . . . . . . . 305--312 M. A. Baker and K. C. Bowler and R. D. 
Kenway MIMD implementations of linear solvers for oil reservoir simulation . . . . . . 313--334 A. Stewart and G. J. Shaw A parallel multigrid FAS scheme for transputer networks . . . . . . . . . . 335--342 S. J. Shyu and R. C. T. Lee The vectorization of the partition problem . . . . . . . . . . . . . . . . 343--350 Tanguy Risset Implementing Gaussian elimination on a matrix-matrix multiplication systolic array . . . . . . . . . . . . . . . . . 351--359 F. Reale A tridiagonal solver for massively parallel computer systems . . . . . . . 361--368 S. A. Levin A fully vectorized quicksort . . . . . . 369--373 C. Kamath and S. Weeratunga Implementation of two projection methods on a shared memory multiprocessor: DEC VAX 6240 . . . . . . . . . . . . . . . . 375--382
M. Alef Concepts for efficient multigrid implementation on SUPRENUM-like architectures . . . . . . . . . . . . . 1--16 S. Heydorn and P. Weidner Optimization and performance analysis of thinning algorithms on parallel computers . . . . . . . . . . . . . . . 17--27 P. Senechaud A MIMD Implementation of the Buchberger Algorithm for Boolean Polynomials . . . 29--37 N. Kockler and M. Simon Parallel singular value decomposition with cyclic storing . . . . . . . . . . 39--47 D. J. Evans and M. D. Levin A matrix-squaring variant of the power method on the DAP . . . . . . . . . . . 49--54 E. Bampis and J. C. Konig and D. Trystram Impact of communications on the complexity of the parallel Gaussian Elimination . . . . . . . . . . . . . . 55--61 S. Manoharan and P. Thanisch Assigning dependency graphs onto processor networks . . . . . . . . . . . 63--73 C.-J. Wang and V. P. Nelson Petri net performance modeling of a modified mesh-connected parallel computer . . . . . . . . . . . . . . . . 75--84 A. Torralba A systolic array with applications to image processing and wire-routing in VLSI circuits . . . . . . . . . . . . . 85--93 W. Dzwinel The search for an optimal multiprocessor interconnection network . . . . . . . . 95--100 M. Wheat and D. J. Evans Maintenance of shared data structures on tightly coupled multiprocessors . . . . 101--107 M. Simmen Comments on broadcast algorithms for two-dimensional grids . . . . . . . . . 109--112
Roland A. Sweet and William L. Briggs and Suely Oliveira and Jules L. Porsche and Tom Turnbull FFTs and three-dimensional Poisson solvers for hypercubes . . . . . . . . . 121--131 Marcin Paprzycki and Ian Gladwell Solving almost block diagonal systems on parallel computers . . . . . . . . . . . 133--153 P. Tervola and W. Yeung Parallel Jacobi algorithm for matrix diagonalisation on transputer networks 155--163 D. J. Evans and Wang Deren An asynchronous parallel algorithm for solving a class of nonlinear simultaneous equations . . . . . . . . . 165--180 S. M. Muller and D. Scheerer A method to parallelize tridiagonal solvers . . . . . . . . . . . . . . . . 181--188 F. A. Rabhi and G. A. Manson Divide-and-conquer and parallel graph reduction . . . . . . . . . . . . . . . 189--205 H. Schroder and P. Strazdins Program compression on the instruction systolic array . . . . . . . . . . . . . 207--219 Chang-Sung Jeong and Myung-Ho Kim Fast parallel simulated annealing for traveling salesman problem on SIMD machines with linear interconnections 221--228 Pao-Hsu Shih and Wu-Shung Feng An application of neural networks on channel routing problem . . . . . . . . 229--240 Chang-Sung Jeong Parallel Voronoĭ diagram in ${L}_1({L}_\infty)$ metric on a mesh-connected computer . . . . . . . . 241--252 L. Bacchelli Montefusco and C. Guerrini A domain decomposition method for scattered data approximation on a distributed memory multiprocessor . . . 253--263 Hong Zhang On the accuracy of the parallel diagonal dominant algorithm . . . . . . . . . . . 265--272 H. Schröder and E. V. Krishnamurthy Systolic computation of characteristic polynomials of Hessenberg matrices . . . 273--277 Gen Huey Chen and Maw Sheng Chern Synthesis of algorithms on processor arrays . . . . . . . . . . . . . . . . . 279--284 R. J. van der Pas and J. M. van Kats Parallelism in a multi-user environment 285--296 N. Honjou and K. Ohtsuki and M. Sekiya and F. 
Sasaki A parallelization technique for the speedup of configuration interaction computing . . . . . . . . . . . . . . . 297--310 J.-Fr. Hake and W. Homberg The impact of memory organization on the performance of matrix calculations . . . 311--327 H. Schwandt Memory access problems in block cyclic reduction on vector computers . . . . . 329--346 M. Kiehl A vector implementation of an ODE code for multi-point-boundary-value problems 347--352
T. Tollenaere and G. A. Orban Simulating modular neural networks on message-passing multiprocessors . . . . 361--379 Xiaobo Li Nearest neighbor classification on two types of SIMD machines . . . . . . . . . 381--407 Ilan Bar-On Efficient logarithmic time parallel algorithms for the Cholesky decomposition and Gram--Schmidt process 409--417 S. Bondeli Divide and conquer: a parallel algorithm for the solution of a tridiagonal linear system of equations . . . . . . . . . . 419--434 Fridrich Sloboda A projection method of the Cimmino type for linear algebraic systems . . . . . . 435--442 E. Taillard Robust taboo search for the quadratic assignment problem . . . . . . . . . . . 443--455 Yen-Chun Lin An FP-based tool for the synthesis of regular array algorithms . . . . . . . . 457--470 Z. Mahjoub and F. Karoui-Sahtout Parallel algorithms for redundant precedence relations elimination in task systems . . . . . . . . . . . . . . . . 471--481 E. V. Krishnamurthy and H. Schröder Systolic algorithm for multivariable approximation using tensor products of basis functions . . . . . . . . . . . . 483--492 H. Schroder and V. K. Murthy and E. V. Krishnamurthy Systolic algorithm for polynomial interpolation and related problems . . . 493--503 Chang-Sung Jeong An improved parallel algorithm for constructing Voronoĭ diagram on a mesh-connected computer . . . . . . . . 505--514 Yen-Chun Lin Array size anomaly of problem-size independent systolic arrays for matrix-vector multiplication . . . . . . 515--522 S. Storoy and T. Sorevik A note on an orthogonal systolic design for the assignment problem . . . . . . . 523--525 Sajal K. Das and Cui-Qing Yang Performance of parallel spanning tree algorithms on linear arrays of transputers and Unix systems . . . . . . 527--551 G. Pini A parallel algorithm for the partial eigensolution of sparse symmetric matrices on the CRAY Y-MP . . . . . . . 553--561 I. Gohberg and I. Koltracht and A. Averbuch and B. Shoham Timing analysis of a parallel algorithm for Toeplitz matrices on a MIMD parallel machine . . . . . . . . . . . . . . . . 563--577 U. Detert and G. Hofemann CRAY X-MP and Y-MP memory performance 579--590 M. D. Levin and D. J. Evans The inversion of matrices by the double-bordering algorithm on MIMD computers . . . . . . . . . . . . . . . 591--602
Paul N. Swarztrauber and Roland A. Sweet and William L. Briggs and Van Emden Henson and James Otto Bluestein's FFT for arbitrary ${N}$ on the hypercube . . . . . . . . . . . . . 607--617 H. Mühlenbein and M. Schomisch and J. Born The parallel genetic algorithm as function optimizer . . . . . . . . . . . 619--632 V. V. R. Prasad and C. Siva Ram Murthy Downloading node programs/data into hypercubes . . . . . . . . . . . . . . . 633--642 Constantine N. K. Osiakwan and Selim G. Akl Parallel computation of matchings in trees . . . . . . . . . . . . . . . . . 643--656 Manfred Schimmler Parallel strong orientation on a mesh connected computer . . . . . . . . . . . 657--664 Michael Thuné Straightforward partitioning of composite grids for explicit difference methods . . . . . . . . . . . . . . . . 665--672 T. L. Freeman and M. K. Bane Asynchronous polynomial zero-finding algorithms . . . . . . . . . . . . . . . 673--681 Stephan Olariu and Zhaofang Wen and Wei Xiong Zhang A faster optimal algorithm for the measure problem . . . . . . . . . . . . 683--687 S. Olariu and Z. Wen An efficient parallel algorithm for multiselection . . . . . . . . . . . . . 689--693 D. Fischer On superlinear speedups . . . . . . . . 695--697 J. Hagemann Combinatorial structures for multiprocessor-systems . . . . . . . . . 699--706 D. P. Bertsekas and D. A. Castanon Parallel synchronous and asynchronous implementations of the auction algorithm 707--732 D. Moncrieff and V. R. Saunders and S. Wilson Parallel processing using macro-tasking in a multi-job environment on a CRAY Y-MP computer . . . . . . . . . . . . . 733--750 C. Phillips The performance of the BLAS and LAPACK on a shared memory scalar multiprocessor 751--761 S. K. Kim and A. T. Chronopoulos A class of Lanczos-like algorithms implemented on parallel computers . . . 763--778 K. Wright Parallel algorithms for $QR$ decomposition on a shared memory multiprocessor . . . . . . . . . . . . . 779--790 F. Wiegand and B. S. Hoyle Development and implementation of real-time ultrasound process tomography using a transputer network . . . . . . . 791--807 A. Corana and A. Casaleggio and C. Rolando and S. Ridella Efficient computation of the correlation dimension from a time series on a LIW computer . . . . . . . . . . . . . . . . 809--820 C.-H. Wu and R. E. Hodges and C. J. Wang Parallelizing the self-organizing feature map on multiprocessor systems 821--832 D. J. Evans and S. Chikohora The alternating group explicit (AGE) method on a transputer network . . . . . 833--843
V. Topkar and O. Frieder and A. K. Sood Duplicate removal on hypercube engines: an experimental analysis . . . . . . . . 845--871 E. D. Chajakis and S. A. Zenios Synchronous and asynchronous implementations of relaxation algorithms for nonlinear network optimization . . . 873--894 Y. Huang and Y. Paker A parallel FFT algorithm for transputer networks . . . . . . . . . . . . . . . . 895--906 E. Francomano and A. Pecorella and A. Tortorici Macaluso Parallel experience on the inverse matrix computation . . . . . . . . . . . 907--912 H. Park A parallel algorithm for the unbalanced orthogonal Procrustes problem . . . . . 913--923 D. J. Evans The parallel AGE method for the elliptic problem in two dimensions . . . . . . . 925--940 Y.-H. Choi Reconfigurable VLSI/WSI multipipelines 941--952
D. Hutchinson and B. M. S. Khalaf Parallel algorithms for solving initial value problems: front broadening and embedded parallelism . . . . . . . . . . 957--968 A. De Gloria and P. Faraboschi A Boltzmann Machine approach to code optimization . . . . . . . . . . . . . . 969--982 Wen Tsuen Chen and Ming Yi Fang An efficient procedure for theorem proving in propositional logic on vector computers . . . . . . . . . . . . . . . 983--995 S. Horiguchi Hybrid systolic sorters . . . . . . . . 997--1007 S. Selvakumar and C. Siva Ram Murthy An efficient algorithm for mapping VLSI circuit simulation programs onto multiprocessors . . . . . . . . . . . . 1009--1016 L. Brugnano A parallel solver for tridiagonal linear systems for distributed memory parallel computers . . . . . . . . . . . . . . . 1017--1023 V. R. Saunders and S. Wilson ``Scavenger'' programming for the CRAY X-MP computer (Short communication) . . 1025--1034 M. Wheat and D. J. Evans Asynchronous parallel merging . . . . . 1035--1041 L. C. Waring and M. Clint Parallel Gram--Schmidt orthogonalisation on a network of transputers . . . . . . 1043--1050 J. Erhel and A. Traynard and M. Vidrascu An element-by-element preconditioned conjugate gradient method implemented on a vector computer . . . . . . . . . . . 1051--1065
J. Worlton Toward a taxonomy of performance metrics 1073--1092 Xian-He Sun and J. L. Gustafson Toward a better parallel performance metric . . . . . . . . . . . . . . . . . 1093--1109 R. Hockney Performance parameters and benchmarking of supercomputers . . . . . . . . . . . 1111--1130 W. Schonauer and H. Hafner Performance estimates for supercomputers: the responsibilities of the manufacturer and of the user . . . . 1131--1149 R. P. Weicker A detailed look at some popular benchmarks . . . . . . . . . . . . . . . 1153--1172 M. Berry and G. Cybenko and J. Larson Scientific benchmark characterizations 1173--1194 K. M. Dixit The SPEC benchmarks . . . . . . . . . . 1195--1209 A. J. van der Steen The benchmark of the EuroBen group . . . 1211--1221 D. Levine and D. Callahan and J. Dongarra A comparative study of automatic vectorizing compilers . . . . . . . . . 1223--1244 J. Dongarra and M. Furtney and S. Reinhardt and J. Russell Parallel loops --- a test suite for parallelizing compilers: description and example results . . . . . . . . . . . . 1247--1255 C. M. Grassl Parallel performance of applications on supercomputers . . . . . . . . . . . . . 1257--1273 A. J. G. Hey The Genesis distributed memory benchmarks . . . . . . . . . . . . . . . 1275--1283 T. H. Dunigan Performance of the Intel iPSC/860 and Ncube 6400 hypercubes . . . . . . . . . 1285--1302 W. E. Nagel and M. A. Linn Benchmarking parallel programs in a multiprogramming environment: the PAR-Bench system . . . . . . . . . . . . 1303--1321
S. Arvindam and V. Kumar and V. Nageshwara Rao and V. Singh Automatic test pattern generation on parallel processors . . . . . . . . . . 1323--1342 Jenn Yang Tien and Wei Pang Yang Hierarchical spanning trees and distributing on incomplete hypercubes 1343--1360 Dieter Müller-Wichards Problem size scaling in the presence of parallel overhead . . . . . . . . . . . 1361--1376 D. G. Feitelson Deadlock detection without wait-for graphs . . . . . . . . . . . . . . . . . 1377--1383 A. Chakraborty and D. C. S. Allison and C. J. Ribbens and L. T. Watson Note on unit tangent vector computation for homotopy curve tracking on a hypercube . . . . . . . . . . . . . . . 1385--1395 G. Bader and E. Gehrke On the performance of transputer networks for solving linear systems of equations . . . . . . . . . . . . . . . 1397--1407 A. Peters Sparse matrix vector multiplication techniques on the IBM 3090 VF . . . . . 1409--1424 Y. Escaig and W. Oed Analysis tools for Micro- and Autotasking programs on CRAY multiprocessor systems . . . . . . . . . 1425--1433
E. Chu and A. George A balanced submatrix merging algorithm for multiprocessor architectures . . . . 1--10 G. Lotti and M. Vajtersic The application of VLSI Poisson solvers to the biharmonic problem . . . . . . . 11--19 G. Horton and R. Knirsch A time-parallel multigrid-extrapolation method for parabolic partial differential equations . . . . . . . . . 21--29 D. Conforti and L. Grandinetti and R. Musmanno and M. Cannataro and G. Spezzano and D. Talia A model of efficient asynchronous parallel algorithms on multicomputer systems . . . . . . . . . . . . . . . . 31--45 C. Neusius and J. Olszewski and D. Scheerer An efficient distributed thinning algorithm . . . . . . . . . . . . . . . 47--55 A. De Gloria and P. Faraboschi and S. Ridella A dedicated massively parallel architecture for the Boltzmann machine 57--73 V. K. Murthy and E. V. Krishnamurthy and Pin Chen Systolic algorithm for rational interpolation and Padé approximation . . 75--83 Anatol G. Filin and Michael A. Frumkin A systolic array for inversion of a finite Radon transform . . . . . . . . . 85--90 M. Wheat and D. J. Evans An efficient parallel sorting algorithm for shared memory multiprocessors . . . 91--102 El-Sayed M. El-Horbaty and A. El-Din H. Mohamed A synchronous algorithm for shortest paths on a tree machine . . . . . . . . 103--107 W. Erhard and A. Grefe Improved parallel algorithms for the classification of electroencephalograms (EEGs) on the DAP510 . . . . . . . . . . 109--115
R. von Hanxleden and L. R. Scott Correctness and determinism of Parallel Monte Carlo Processes . . . . . . . . . 121--132 Tzung-Pei Hong and Shian-Shyong Tseng Parallel perceptron learning on a single-channel broadcast communication model . . . . . . . . . . . . . . . . . 133--148 D. Audet and Y. Savaria and J.-L. Houle Performance improvements to VLSI parallel systems, using dynamic concatenation of processing resources 149--167 M. Marrakchi Optimal parallel scheduling for the $2$-steps graph with constant task cost 169--176 Hong Shen Improved universal $k$-selection in hypercubes . . . . . . . . . . . . . . . 177--184 Ph. Clauss and C. Mongenet and G. R. Perrin Synthesis of size-optimal toroidal arrays for the Algebraic Path Problem: a new contribution . . . . . . . . . . . . 185--194 D. J. Evans A systolic array design for matrix system solution by the symmetric bordering method . . . . . . . . . . . . 195--205 T. Z. Kalamboukis A parallel algorithm for the dense symmetric eigenvalue problem on a transputer array . . . . . . . . . . . . 207--212 Przemysław Stpiczyński Parallel Cholesky factorization on orthogonal multiprocessors . . . . . . . 213--219 Chang-Sung Jeong and Jung-Ju Choi and Der Tsai Lee Parallel enclosing rectangle on SIMD machines . . . . . . . . . . . . . . . . 221--229 S. Kohlhoff and J. Krone Performance evaluation of SUPRENUM for the LINPACK benchmark (Short communication) . . . . . . . . . . . . . 231--238
R. Hiromoto and B. R. Wienke and R. G. Brickner The performance of asynchronous iteration schemes applied to the linearized Boltzmann transport equation 241--268 A. Schuller Parallelizing particle simulations based on the Boltzmann equation . . . . . . . 269--279 J. Andrew Holey and Oscar H. Ibarra Iterative algorithms for the planar convex hull problem on mesh-connected arrays . . . . . . . . . . . . . . . . . 281--296 B. Robic and P. Kolbezen and J. Silc Area optimization of dataflow-graph mappings . . . . . . . . . . . . . . . . 297--311 P. Casiccia and P. Castangia and S. Cincotti and G. Parodi Simulation of a molecular cellular array on a transputer-based parallel computer 313--324 K. G. Margaritis and D. J. Evans Systolic implementation of neural networks for searching sets of properties . . . . . . . . . . . . . . . 325--334 W. Loots and T. H. C. Smith A parallel three phase sorting procedure for a $k$-dimensional hypercube and a transputer implementation . . . . . . . 335--344 Eric Goles and Marcos Kiwi A lower bound on the computational complexity of the $QR$ decomposition on a shared memory SIMD computer . . . . . 345--354 G. M. Megson and D. J. Evans More on systolic line drawing . . . . . 355--358
P. H. Worley The effect of multiprocessor radius on scaling . . . . . . . . . . . . . . . . 361--376 Su Chu Hsu and Hsien Fen Hsieh and Shing Tsaan Huang A fully-pipelined systolic algorithm for finding bridges on an undirected connected graph . . . . . . . . . . . . 377--391 Hong Chich Chou and Chung Ping Chung A bound analysis of scheduling instructions on pipelined processors with a maximal delay of one cycle . . . 393--399 I. Mahadevan and L. M. Patnaik Performance evaluation of bidirectional associative memory on a transputer-based parallel system . . . . . . . . . . . . 401--413 G. M. Megson and O. Brudaru and D. Comish Systolic designs for Aitken's root finding method . . . . . . . . . . . . . 415--429 Pl. Iv. Piskoulijski Error analysis of parallel algorithm for the solution of a tridiagonal Toeplitz linear system of equations . . . . . . . 431--438 Gen-Huey Chen and Wei-Wen Liang Conflict-free broadcasting algorithms for graph traversals and their applications . . . . . . . . . . . . . . 439--448 C. P. Thompson and W. R. Cowell and G. K. Leaf On the parallelization of an adaptive multigrid algorithm for a class of flow problems . . . . . . . . . . . . . . . . 449--466 H. C. Burg and J. Helin 1991 International Conference on Supercomputing . . . . . . . . . . . . . 467 H.-C. Hege and R. Knecht Parallel Computing 91 . . . . . . . . . 473
Y. Robert and S. W. Song Revisiting cycle shrinking . . . . . . . 481--496 Yuh-Horng Shiau and Chung-Ping Chung Adoptability and effectiveness of microcode compaction algorithms in superscalar processing . . . . . . . . . 497--510 R. Lin and S. Olariu A fast cost-optimal parallel algorithm for the lowest common ancestor problem 511--516 E. D. Adamides and Ph. Tsalides and A. Thanailakis Hierarchical Cellular Automata structures . . . . . . . . . . . . . . . 517--524 D. J. Evans and M. Gusev Implementation of folding transformations on linear VLSI processor arrays . . . . . . . . . . . . . . . . . 525--542 R. S. Francis and L. J. H. Pannan A parallel partition for enhanced parallel QuickSort . . . . . . . . . . . 543--550 F. Suraweera and P. Bhattacharya A parallel cost-optimal algorithm to compute the supremum of max-min powers 551--556 H. Schreiber and O. Steinhauser and P. Schuster Parallel molecular dynamics of biomolecules . . . . . . . . . . . . . . 557--573 T. Dontje and Th. Lippert and N. Petkov and K. Schilling Statistical analysis of simulation-generated time series: Systolic vs. semi-systolic correlation on the Connection Machine . . . . . . . 575--588
Ajay K. Gupta and Susanne E. Hambrusch Load balanced tree embeddings . . . . . 595--614 Y. P. Boglaev Exact dynamic load balancing of MIMD architectures with linear programming algorithms . . . . . . . . . . . . . . . 615--623 Chien-Min Wang and Sheng-De Wang A hybrid scheme for efficiently executing nested loops on multiprocessors . . . . . . . . . . . . 625--637 J.-C. Bermond and P. Michallon and D. Trystram Broadcasting in wraparound meshes with parallel monodirectional links . . . . . 639--648 Ömer Eğecioğlu and Çetin K. Koç A parallel algorithm for generating discrete orthogonal polynomials . . . . 649--659 B. M. S. Khalaf and D. Hutchinson Parallel algorithms for initial value problems: parallel shooting . . . . . . 661--673 J. Andersen and G. Mitra and D. Parkinson The scheduling of sparse matrix-vector multiplication on a massively parallel DAP computer . . . . . . . . . . . . . . 675--697 J. M. D. Hill Parallel lexical analysis and parsing on the AMT distributed array processor . . 699--714
E. Rothberg and A. Gupta Parallel ICCG on a hierarchical memory multiprocessor --- Addressing the triangular solve bottleneck . . . . . . 719--741 T. Takeda and K. Tani and T. Tsunematsu and Y. Kishimoto and G. I. Kurita and S. Matsushita and T. Nakata Plasma simulator METIS for tokamak confinement and heating studies . . . . 743--765 L. Lopez and T. Politi Parallel methods in the numerical treatment of population dynamic models 767--777 Jianjian Song A distributed-termination experiment on a mesh-connected array of processors . . 779--791 D. Morris and D. G. Evans Modelling distributed and parallel computer systems . . . . . . . . . . . . 793--806 Laurence Boxer Finding congruent regions in parallel 807--810 Gen Huey Chen and Jin Hwang Jang An improved parallel algorithm for $0/1$ knapsack problem . . . . . . . . . . . . 811--821 Yung Chen Hung and Gen Huey Chen Distributed algorithms for the quickest path problem . . . . . . . . . . . . . . 823--834
Srinivas Aluru and G. M. Prabhu and John Gustafson A random number generator for parallel computers . . . . . . . . . . . . . . . 839--847 P. Sreenivasa Kumar and M. Kishore Kumar and A. Basu A parallel algorithm for elimination tree computation and symbolic factorization . . . . . . . . . . . . . 849--856 M. Gusev and J. Tasic Comparative analysis of methods for broadcast elimination . . . . . . . . . 857--866 M. Thune The partitioning problem for a class of data parallel algorithms . . . . . . . . 867--878 M. P. Bekakos and D. J. Evans The double alternating group explicit method for nonlinear parabolic equations on MIMD parallel computers . . . . . . . 879--895 J. Zerovnik and M. Kaufman A parallel variant of a heuristical algorithm for graph coloring --- Corrigendum (Short communication) . . . 897--900 K. Okamoto and Y. Kodama and S. Sakai and Y. Yamaguchi Methodologies in development and testing of the dataflow machine EM-4 . . . . . . 901--912 K. R. Tout and D. J. Evans Parallel forward chaining technique with dynamic scheduling, for rule-based expert systems . . . . . . . . . . . . . 913--930 R. Butel A Cray-2 versus CM-2 comparison using several polynomial benchmarks . . . . . 931--945 W. Oed Cray Y-MP C90: System features and early benchmark results (Short communication) 947--954
S. Stark and A. N. Beris LU decomposition optimized for a parallel computer with a hierarchical distributed memory . . . . . . . . . . . 959--971 Jack J. Dongarra and Robert A. van de Geijn Reduction to condensed form for the eigenvalue problem on distributed memory architectures . . . . . . . . . . . . . 973--982 Y. P. Chu and C. M. Hsieh An artificial neural network model with modified perceptron algorithm . . . . . 983--996 M. Gusev and D. J. Evans VLSI processor array IPS cells (Short communication) . . . . . . . . . . . . . 997--1007 G. Zhang and H. C. Elman Parallel sparse Cholesky factorization on a shared memory multiprocessor . . . 1009--1022 M. Bentley and C. Froese Fischer Hypercube conversion of serial codes for atomic structure calculations . . . . . 1023--1031 S. S. Nielsen and S. A. Zenios Data structures for network algorithms on massively parallel architectures . . 1033--1052 J. Tasic and M. Gusev and D. J. Evans Systolic implementation of preconditioned conjugate gradient method in adaptive transversal filters . . . . 1053--1065 R. W. Hockney and E. A. Carmona Comparison of communications on the Intel iPSC/860 and Touchstone Delta (Short communication) . . . . . . . . . 1067--1072 H. Strauss Parallel CFD'92 . . . . . . . . . . . . 1073
M. V. A. Hancu and K. Iwasaki and Y. Sato and M. Sugie Experimental results on the error detection capability of a concurrent test architecture for massively-parallel computers . . . . . . . . . . . . . . . 1079--1103 Peter Arbenz Divide and conquer algorithms for the bandsymmetric eigenvalue problem . . . . 1105--1128 A. Basermann and P. Weidner A parallel algorithm for determining all eigenvalues of large real symmetric tridiagonal matrices . . . . . . . . . . 1129--1141 Lih-Hsing Hsu and Peng Fei Wang and Chu Tao Wu Parallel algorithms for finding the most vital edge with respect to minimum spanning tree . . . . . . . . . . . . . 1143--1155 T. Chockalingam and S. Arunkumar A randomized heuristics for the mapping problem: The genetic approach . . . . . 1157--1165 E. Violard and G.-R. Perrin PEI: a language and its refinement calculus for parallel programming . . . 1167--1184 Y.-H. Choi An easily-diagnosable fault-tolerant binary tree architecture (Short communication) . . . . . . . . . . . . . 1185--1195
S. L. Johnsson and R. L. Krawitz Cooley--Tukey FFT on the Connection Machine . . . . . . . . . . . . . . . . 1201--1221 M. Zubair and S. N. Gupta and C. E. Grosch A variable precision approach to speedup iterative schemes on fine grained parallel machines (short communication) 1223--1231 Emmanouel A. Varvarigos and Dimitri P. Bertsekas Communication algorithms for isotropic tasks in hypercubes and wraparound meshes . . . . . . . . . . . . . . . . . 1233--1257 Roman G. Strongin and Yaroslav D. Sergeyev Global multidimensional optimization on parallel computer . . . . . . . . . . . 1259--1273 I. Vlahavas and P. Kefalas A parallel Prolog resolution based on multiple unifications . . . . . . . . . 1275--1283
F. J. Peters Preface . . . . . . . . . . . . . . . . 1289 T. Lippert and K. Schilling and N. Petkov Quark propagator on the Connection Machine . . . . . . . . . . . . . . . . 1291--1299 Mi Lu and Xiangzhen Qiao Applying parallel computer systems to solve symmetric tridiagonal eigenvalue problems . . . . . . . . . . . . . . . . 1301--1315 E. M. Daoudi and J. Lobry Implementation of a boundary element method on distributed memory computers 1317--1324 M. Clint and J. S. Weston and C. W. Bleakney A comparison of two Fortran dialects for expressing parallel solutions for a problem in linear algebra . . . . . . . 1325--1333 B. Khan and L. Hayes and A. P. Cracknell The optimisation of higher order resampling methods in a multiprocessor environment . . . . . . . . . . . . . . 1335--1347 P. Spee and W. F. Wong and M. Sato and E. Goto Evaluation of the continuation bit in the Cyclic Pipeline Computer . . . . . . 1349--1361 D. Sharp and M. Cripps and J. Darlington Parallel-architecture-directed program transformation . . . . . . . . . . . . . 1363--1380 D. K. Arvind On the detection of communication-related errors in concurrent programs . . . . . . . . . . 1381--1392 C. Ribeiro and D. El Baz A parallel optimal routing algorithm . . 1393--1402 A. W. G. Duller and R. Storer Simulation and verification of associative processor arrays . . . . . . 1403--1414 B. Quatember Concept of a crossbar switch for large-scale multiple processor systems in the field of process control . . . . 1415--1431
S. Foresti and S. Hassanzadeh and H. Murakami and V. Sonnad Parallel rapid operator for iterative finite element solvers on a shared memory machine . . . . . . . . . . . . . 1--7 P. Edmonds and E. Chu and A. George Dynamic programming on a shared-memory multiprocessor . . . . . . . . . . . . . 9--22 G. Lonsdale and A. Schuller Multigrid efficiency for complex flow simulations on distributed memory machines . . . . . . . . . . . . . . . . 23--32 H. Barada and A. El-Amawy A methodology for algorithm regularization and mapping into time-optimal VLSI arrays . . . . . . . . 33--61 N. Funabiki and Y. Takefuji A parallel multi-layer channel router on the HVH model . . . . . . . . . . . . . 63--77 D. J. Evans and C. R. Wan Parallel direct solution for $P$-cyclic matrix systems . . . . . . . . . . . . . 79--93 S. G. Akl and Ke Qiu A novel routing scheme on the star and pancake networks and its applications 95--101 J. Struckmeier and F. J. Pfreundt On the efficiency of simulation methods for the Boltzmann equation on parallel computers . . . . . . . . . . . . . . . 103--119
S. Sakai and Y. Kodama and Y. Yamaguchi Design and implementation of a circular omega network in the EM-4 . . . . . . . 125--142 P. S. Laursen Simple approaches to parallel Branch and Bound . . . . . . . . . . . . . . . . . 143--152 E. Ng Supernodal symbolic Cholesky factorization on a local-memory multiprocessor . . . . . . . . . . . . . 153--162 A. De Gloria and P. Faraboschi and M. Olivieri Clustered Boltzmann Machines: Massively parallel architectures for constrained optimization problems . . . . . . . . . 163--175 G. P. Balboni and G. P. Cabodi and S. Gai and M. Sonza Reorda A parallel system for test pattern generation . . . . . . . . . . . . . . . 177--185 P. Sreenivasa Kumar and M. K. Kumar and A. Basu Parallel algorithms for sparse triangular system solution . . . . . . . 187--196 M. Y. Mohd-Saman and D. J. Evans Investigation of a set of Bernstein Tests for the detection of loop parallelization . . . . . . . . . . . . 197--207 G. Horton A multi-level diffusion method for dynamic load balancing . . . . . . . . . 209--218 Yi-Bing Lin Parallel trace-driven simulation for packet loss in finite-buffered voice multiplexers . . . . . . . . . . . . . . 219--228 Stephan Olariu and James L. Schwing and Jingyuan Zhang Applications of reconfigurable meshes to constant-time computations . . . . . . . 229--237
E. Chu and A. George and D. Quesnel Parallel matrix inversion on a subcube-grid . . . . . . . . . . . . . . 243--256 Volker Mehrmann Divide and conquer methods for block tridiagonal systems . . . . . . . . . . 257--279 Bassem F. Beidas and George P. Papavassilopoulos Convergence analysis of asynchronous linear iterations with stochastic delays 281--302 C. R. Wan and D. J. Evans A systolic array architecture for linear and inverse matrix systems . . . . . . . 303--321 Zhiyong Liu and Jia-Huai You Conflict-free routing for BPC-permutations on synchronous hypercubes . . . . . . . . . . . . . . . 323--342 A. G. Chalmers and S. Gregory Constructing minimum path configurations for multiprocessor systems . . . . . . . 343--355
S. Lakshmivarahan and Jung Sing Jwo and S. K. Dhall Symmetry in interconnection networks based on Cayley graphs of permutation groups: a survey . . . . . . . . . . . . 361--407 A. R. Krommer and C. W. Ueberhuber Architecture adaptive algorithms . . . . 409--435 M. Mantharam and P. J. Eberlein New Jacobi-sets for parallel computations . . . . . . . . . . . . . . 437--454 M. Atiquzzaman and M. M. Banat Effect of hot-spots on the performance of crossbar multiprocessor systems . . . 455--461 M. Graca Ruano and D. F. Garcia Nocetti and P. J. Fish and P. J. Fleming Alternative parallel implementations of an AR-modified covariance spectral estimator for diagnostic ultrasonic blood flow studies . . . . . . . . . . . 463--476
Mythili Mantharam and P. J. Eberlein Block recursive algorithm to generate Jacobi-sets . . . . . . . . . . . . . . 481--496 Mokhtar Aboelaze and De-Lei L. Lee A method for data allocation and manipulation in hypercube computers . . 497--510 M. Bahi and J. C. Miellou Contractive mappings with maximum norms: comparison of constants of contraction and application to asynchronous iterations . . . . . . . . . . . . . . . 511--523 M. Misra and D. Nassimi and V. K. Prasanna Efficient VLSI implementation of iterative solutions to sparse linear systems . . . . . . . . . . . . . . . . 525--544 M. P. Bekakos and D. J. Evans Parallel cyclic odd-even reduction algorithms for solving Toeplitz tridiagonal equations on MIMD computers 545--561 G. Spaletta and D. J. Evans The Parallel Recursive Decoupling algorithm for solving tridiagonal linear systems . . . . . . . . . . . . . . . . 563--576 E. V. Krishnamurthy and Chen Pin Data parallel evaluation-interpolation algorithm for polynomial matrix inversion . . . . . . . . . . . . . . . 577--589
S. Chandra and M. Jain and A. Basu and P. S. Kumar Sorting algorithms on transputer arrays 595--607 T. B. Boffey and W. A. Essah Implementing a parallel constrained $\ell_1$ approximation algorithm . . . . 609--620 A. N. Choudhary and B. Narahari and R. Krishnamurti An efficient heuristic scheme for dynamic remapping of parallel computations (Short communication) . . . 621--632 H. Azaria and Y. Elovici Modeling and evaluation of a new message-passing system for parallel multiprocessor systems . . . . . . . . . 633--649 M. Paprzycki and I. Gladwell A parallel chopping algorithm for ODE boundary value problems . . . . . . . . 651--666 F. Pagano and G. Parodi and R. Zunino Parallel implementation of associative memories for image classification . . . 667--684 R. Campanini and I. D'Antone and G. Di Caro and G. Giusti A transputer-based parallel expert diagnostic system . . . . . . . . . . . 685--692 Y.-W. Leung On-line fault identification in multistage interconnection networks . . 693--702 E. J. Kontoghiorghes and M. R. B. Clarke Parallel reorthogonalization of the $QR$ decomposition after deleting columns (Short communication) . . . . . . . . . 703--707
S. J. Horng Computing dominators on a cube-connected machine . . . . . . . . . . . . . . . . 713--728 J. D. Bruguera and E. Antelo and E. L. Zapata Design of a pipelined radix 4 CORDIC processor . . . . . . . . . . . . . . . 729--744 C. N. Zhang and H. F. Li and R. Jayakumar A systematic approach for designing concurrent error-detecting systolic arrays using redundancy . . . . . . . . 745--764 Ren-Lianq Cheng and Chung-Ping Chung Reaching approximate agreement on hypercube . . . . . . . . . . . . . . . 765--775 P. A. Nelson Hypercube matrix multiplication . . . . 777--788 A. El-Amawy and R. Raja Split sequence generation algorithms for efficient identification of operational subcubes in faulty hypercubes . . . . . 789--805 Yung-Chang Wong and Shu-Yuen Hwang On parallelizing the Dempster-Shafer method using transputer network . . . . 807--822 S. D. Altekar and A. K. Ray and B. R. Wienke On the parallelization of a $S_n$ transport algorithm on a CRAY Y MP . . . 823--834
B. L. Menezes and I. L. M. Ricarte and R. Thurimella Analysis of pipelined external sorting on a reconfigurable message-passing multicomputer . . . . . . . . . . . . . 839--858 Nicolas Boissin and Jean-Luc Lutton A parallel simulated annealing algorithm 859--872 Louis Ibarra and Dana Richards Efficient parallel graph algorithms based on open ear decomposition . . . . 873--886 Jiawang Wei Parallel asynchronous iterations of least fixed points . . . . . . . . . . . 887--895 D. J. Evans and W. U. N. Butt Dynamic load balancing using task-transfer probabilities . . . . . . 897--916 Przemysław Stpiczyński Error analysis of two parallel algorithms for solving linear recurrence systems . . . . . . . . . . . . . . . . 917--923 John J. Buoni and Paul A. Farrell and Arden Ruttan Algorithms for ${LU}$ decomposition on a shared memory multiprocessor . . . . . . 925--937 Jianping Zhu $QR$ factorization for the regularized least squares problem on hypercubes . . 939--948 A. Matrone and P. Schiano and V. Puoti LINDA and PVM: a comparison between two environments for parallel programming 949--957
Zbigniew J. Czech and Marek Konopka and Bohdan S. Majewski Parallel algorithms for finding a suboptimal fundamental-cycle set in a graph . . . . . . . . . . . . . . . . . 961--971 I. W. Chan and D. K. Friesen Parallel algorithm for segment visibility reporting . . . . . . . . . . 973--978 L. Lopez Methods based on boundary value techniques for solving parabolic equations on parallel computers . . . . 979--991 Hong Shen A high performance interconnection network for multiprocessor systems . . . 993--1001 H. Caffey and L. Z. Liao and C. A. Shoemaker Parallel processing of large scale discrete-time unconstrained differential dynamic programming . . . . . . . . . . 1003--1017 D. El Baz Asynchronous implementation of relaxation and gradient algorithms for convex network flow problems . . . . . . 1019--1028 R. Trobec and I. Jerebic and D. Janezic Parallel algorithm for molecular dynamics integration . . . . . . . . . . 1029--1039 P. Altevogt and A. Linke Parallelization of the two-dimensional Ising model on a cluster of IBM RISC System/6000 workstations . . . . . . . . 1041--1052 A. Nanayakkara and D. Moncrieff and S. Wilson Performance of IBM RISC System/6000 workstation clusters in a quantum chemical application . . . . . . . . . . 1053--1062 A. Jakobs and R. W. Gerling Scaling aspects for the performance of parallel algorithms . . . . . . . . . . 1063--1073
Xiaobo Li and Paul Lu and Jonathan Schaeffer and John Shillington and Pok Sze Wong and Hanmao Shi On the versatility of parallel sorting by regular sampling . . . . . . . . . . 1079--1103 Rajesh Aggarwal and David R. Dellwo and Morton B. Friedman Parallel solution of Fredholm integral equations of the second kind by accelerated projection methods . . . . . 1105--1115 Maria Antonietta Pirozzi The fast numerical solution of mildly nonlinear elliptic boundary value problems on multiprocessors . . . . . . 1117--1128 Terry Bossomaier and Adrian Loeff Parallel computation of the Hausdorff distance between images . . . . . . . . 1129--1140 D. Busvine Implementing recursive functions as processor farms . . . . . . . . . . . . 1141--1153 Y. Kanada A method of vector processing for shared symbolic data . . . . . . . . . . . . . 1155--1175 M. Gusev and D. J. Evans New linear systolic arrays for the string comparison algorithm . . . . . . 1177--1193
J. De Keyser and D. Roose Load balancing data parallel programs on distributed memory computers . . . . . . 1199--1219 C. H. Cap and V. Strumpen Efficient parallel computing in distributed workstation environments . . 1221--1234 S. L. Johnsson Minimizing the communication time for matrix multiplication on multiprocessors 1235--1257 B. Hendrickson Parallel $QR$ factorization using the torus-wrap mapping . . . . . . . . . . . 1259--1271 P. Amodio and N. Mastronardi A parallel version of the cyclic reduction algorithm on a hypercube . . . 1273--1281 H. Dhrif and D. Sarkar Fuzzy arithmetic on systolic arrays . . 1283--1301 Çetin Kaya Koç and Peter Cappello Systolic arrays for integer Chinese remaindering . . . . . . . . . . . . . . 1303--1311 S. Hurley Taskgraph mapping using a genetic algorithm: a comparison of fitness functions . . . . . . . . . . . . . . . 1313--1317
T. Yang and A. Gerasoulis List scheduling with and without communication delays . . . . . . . . . . 1321--1344 F. B. Hanson and J.-D. Mei and C. Tier and H. Xu PDAC: a data parallel algorithm for the performance analysis of closed queueing networks . . . . . . . . . . . . . . . . 1345--1358 H. B. Zhou Two-stage $m$-way graph partitioning . . 1359--1373 K.-H. Hoffmann and J. Zou Parallel efficiency of domain decomposition methods . . . . . . . . . 1375--1391 M. Kumar and Y. Baransky and M. Denneau The GF11 parallel computer . . . . . . . 1393--1412 U. Gartel and W. Joppich and A. Schuller Parallelizing the ECMWF's weather forecast program: the 2D case . . . . . 1413--1425 U. Gartel and W. Joppich and A. Schuller First results with a parallelized $3$D weather prediction code . . . . . . . . 1427--1429 F.-H. Hebeker Parallel CFD'93 . . . . . . . . . . . . 1431
Shen Shen Wu and David Sweeting Heuristic algorithms for task assignment and scheduling in a processor network 1--14 J. Błażewicz and M. Drozdowski and G. Schmidt and D. de Werra Scheduling independent multiprocessor tasks on a uniform $k$-processor system 15--28 D. J. Evans and M. Gusev New linear systolic arrays for digital filters and convolution . . . . . . . . 29--61 Thomas Schreiber and Peter Otto and Fridolin Hofmann A new efficient parallelization strategy for the $QR$ algorithm . . . . . . . . . 63--75 (or 63--76??) R. Calinescu and D. J. Evans A parallel simulation model for load balancing in clustered distributed systems . . . . . . . . . . . . . . . . 77--91 Jaime Seguel and Dorothy Bollman Fast digit-reversal algorithms on a shared-memory machine . . . . . . . . . 93--99 Shyan-Ming Yuan An efficient fault-tolerant decentralized commit protocol . . . . . 101--114 Thomas Umland Parallel sorting revisited . . . . . . . 115--124 K. Nagel and A. Schleicher Microscopic traffic modeling on parallel high performance computers . . . . . . . 125--146
G. W. Stewart Updating URV decompositions in parallel 151--172 D. M. Beazley and P. S. Lomdahl Message-passing multi-cell molecular dynamics on the Connection Machine 5 . . 173--195 M. Angelaccio and M. Colajanni The row/column pivoting strategy on multicomputers . . . . . . . . . . . . . 197--213 Michael Conner and Richard Tolimieri Special purpose hardware for Discrete Fourier Transform implementation . . . . 215--232 Henry Ker-Chang Chang and Jonathan Jen-Rong Chen and Shyong-Jian Shyu A parallel algorithm for the knapsack problem using a generation and searching technique . . . . . . . . . . . . . . . 233--243 Antonio d'Acierno and Roberto Vaccaro On parallelizing recursive neural networks on coarse-grained parallel computers: a general algorithm . . . . . 245--256 M. Angelaccio and M. Colajanni Subcube matrix decomposition: a unifying view for LU factorization on multicomputers . . . . . . . . . . . . . 257--270
M. Kiehl Parallel multiple shooting for the solution of initial value problems . . . 275--295 Lujuan Chen and E. V. Krishnamurthy and Iain Macleod Generalised matrix inversion and rank computation by successive matrix powering . . . . . . . . . . . . . . . . 297--311 Jian-Jin Li Multiscattering on the Cube-Connected Cycles . . . . . . . . . . . . . . . . . 313--324 D. J. Evans and W. U. N. Butt Load balancing with network partitioning using host groups . . . . . . . . . . . 325--345 Tzung-Pei Hong and Shian-Shyong Tseng An optimal parallel perceptron learning algorithm for a large training set . . . 347--352 Jong-Chuang Tsay and Wei-Ping Lee An optimal parallel algorithm for generating permutations in minimal change order . . . . . . . . . . . . . . 353--361 M. L. Sawley and C. M. Bergman A comparative study of the use of the data-parallel approach for compressible flow calculations . . . . . . . . . . . 363--373 A. Asenov and D. Reid and J. R. Barker Speed-up of scalable iterative linear solvers implemented on an array of transputers . . . . . . . . . . . . . . 375--387 Roger W. Hockney The communication challenge for MPP: Intel Paragon and Meiko CS-2 . . . . . . 389--398 U. Kleis and J. M. Singer and I. Morgenstern and Th. Hußlein and H.-G. Matuttis Experiences with re-engineering and parallelizing a high-T$_c$ superconductivity code . . . . . . . . . 399--407 Anonymous Parallel Computing 93 . . . . . . . . . 409
Oliver A. McBryan An overview of message passing environments . . . . . . . . . . . . . . 417--443 (or 417--444??) Vasanth Bala and Jehoshua Bruck and Raymond Bryant and Robert Cypher and Peter de Jong and Pablo Elustondo and D. Frye and Alex Ho and Ching-Tien Ho and Gail Irwin and Shlomo Kipnis and Richard Lawrence and Marc Snir The IBM External User Interface for scalable parallel systems . . . . . . . 445--462 Paul Pierce The NX message passing interface . . . . 463--480 Lewis W. Tucker and Alan Mainwaring CMMD: Active messages on the CM-5 . . . 481--496 Eric Barton and James Cownie and Moray McLaren Message passing on the Meiko CS-2 . . . 497--507 M. Schmidt-Voigt Efficient parallel communication with the nCUBE 2S processor . . . . . . . . . 509--530 V. S. Sunderam and G. A. Geist and J. Dongarra and R. Manchek The PVM concurrent computing system: Evolution, experiences, and trends . . . 531--545 Ralph M. Butler and Ewing L. Lusk Monitors, messages, and clusters: The p4 parallel programming system . . . . . . 547--564 Anthony Skjellum and Steven G. Smith and Nathan E. Doss and Alvin P. Leung and Manfred Morari The design and evolution of Zipcode . . 565--596 Jon Flower and Adam Kolawa Express is not just a message passing system: Current and future directions in Express . . . . . . . . . . . . . . . . 597--614 R. Calkin and R. Hempel and H.-C. Hoppe and P. Wypior Portable programming with the PARMACS message-passing library . . . . . . . . 615--632 Nicholas J. Carriero and David Gelernter and Timothy G. Mattson and Andrew H. Sherman The Linda alternative to message-passing systems . . . . . . . . . . . . . . . . 633--655 David W. Walker The design of a standard message passing interface for distributed memory concurrent computers . . . . . . . . . . 657--673
Alain Darte and Yves Robert Mapping uniform loop nests onto distributed memory architectures . . . . 679--710 Jingling Xue Automating non-unimodular loop transformations for massive parallelism 711--728 David J. Lilja A multiprocessor architecture combining fine-grained and coarse-grained parallelism strategies . . . . . . . . . 729--751 Mark T. Jones and Paul E. Plassmann Scalable iterative solution of sparse linear systems . . . . . . . . . . . . . 753--773 Wei Ping Lee and Jong Chuang Tsay A systolic design for generating permutations in lexicographic order . . 775--785 D. J. Evans and W. S. Yousif The solution of unsymmetric tridiagonal Toeplitz systems by the strides reduction algorithm . . . . . . . . . . 787--798 E. V. Krishnamurthy and Vikram Krishnamurthy An ANN model perceptron algorithm using generalized matrix inversion . . . . . . 799--806 E. Montagne and M. Rukoz and R. Surós and F. Breant Modeling optimal granularity when adapting systolic algorithms to transputer based supercomputers . . . . 807--814 Y. F. Hu and R. J. Blake Numerical experiences with partitioning of unstructured meshes . . . . . . . . . 815--829
S. Selvakumar and C. Siva Ram Murthy Static task allocation of concurrent programs for distributed computing systems with processor and resource heterogeneity . . . . . . . . . . . . . 835--851 Jianjian Song A partially asynchronous and iterative algorithm for distributed load balancing 853--868 Dongseung Kim and Byung-Guoen Yi A two-pass scheduling algorithm for parallel programs . . . . . . . . . . . 869--885 Tien-Yu Huang and Jean-Lien C. Wu Alternate resolution strategy in multistage interconnection networks . . 887--896 Bao Lin Zhang and Wen Zhi Li On Alternating Segment Crank--Nicolson scheme (Short communication) . . . . . . 897--902 C. R. Wan and D. J. Evans A systolic array architecture for $QR$ decomposition of block structured sparse systems . . . . . . . . . . . . . . . . 903--914
Kapil K. Mathur and S. Lennart Johnsson Multiplication of matrices of arbitrary shape on a data parallel computer . . . 919--951 Inge Gutheil and Werner Krotz-Vogel Performance of a parallel matrix multiplication routine on Intel iPSC/860 953--974 H. Suman and K. Schilling A comparative study of gauge fixing procedures on the connection machines CM2 and CM5 . . . . . . . . . . . . . . 975--990 Chang-ming Ma Implementation of a Monte Carlo code on a parallel computer system . . . . . . . 991--1005 Hsiao-Hsi Wang and Ruei-Chuan Chang A distributed shared memory system with self-adjusting coherence scheme . . . . 1007--1025 Takenori Makino Shift-net and power shift-net for parallel processor systems . . . . . . . 1027--1039 Jean-Lien C. Wu and T.-Y. Huang A new bus contention scheme in S/NET with dynamic priority . . . . . . . . . 1041--1054 D. J. Evans and E. Galligani A parallel additive preconditioner for conjugate gradient method for $AX+XB=C$ 1055--1064
Johan De Keyser and Kurt Lust and Dirk Roose Run-time load balancing support for a parallel multiblock Euler/Navier--Stokes code with adaptive refinement on distributed memory computers . . . . . . 1069--1088 Hong Zhang and William F. Moss Using parallel banded linear system solvers in generalized eigenvalue problems . . . . . . . . . . . . . . . . 1089--1105 Sabine Van Huffel and Haesun Park Parallel tri- and bi-diagonalization of bordered bidiagonal matrices . . . . . . 1107--1128 T. F. Pena and E. L. Zapata and D. J. Evans Finite element simulation of semiconductor devices on multiprocessor computers . . . . . . . . . . . . . . . 1129--1159 Nicholas J. Higham and Pythagoras Papadimitriou A parallel algorithm for computing the polar decomposition . . . . . . . . . . 1161--1173 P. Yalamov and D. J. Evans On the forward stability of a modified `stride of $3$' reduction method . . . . 1175--1190 Amit J. Basu A parallel algorithm for spectral solution of the three-dimensional Navier--Stokes equations . . . . . . . . 1191--1204 Richard E. Overill and Stephen Wilson Performance of parallel algorithms for the evaluation of power series . . . . . 1205--1213 David W. Walker Erratum to: ``The design of a standard message passing interface for distributed memory concurrent computers'' . . . . . . . . . . . . . . 1215--1215
L. C. Polymenakos and D. P. Bertsekas Parallel shortest path auction algorithms . . . . . . . . . . . . . . . 1221--1247 Qi Gan and Qing Yang and Chen-Yi Hu Parallel all-row preconditioned interval linear solver for nonlinear equations on multiprocessors . . . . . . . . . . . . 1249--1268 Jeffrey T. Draper and Joydeep Ghosh The M-cache: a message-handling mechanism for multicomputer systems . . 1269--1288 Abdel Aziz Farrag Tolerating faulty edges in a multi-dimensional mesh . . . . . . . . . 1289--1301 Abhay Jain and N. S. Chaudhari Efficient parallel recognition of context-free languages . . . . . . . . . 1303--1321 (or 1303--1322??) Yen Chun Lin New systolic arrays for the longest common subsequence problem$^+$ . . . . . 1323--1334 Saulo R. M. Barros and Tuomo Kauranne On the parallelization of global spectral weather models . . . . . . . . 1335--1356 Jun Makino Lagged-Fibonacci random number generators on parallel computers . . . . 1357--1367 Frank Dehne and Afonso Ferreira and Andrew Rau-Chaplin A massively parallel knowledge-base server using a hypercube multiprocessor 1369--1382
Oliver A. McBryan The SUPRENUM and GENESIS projects . . . 1389--1396 Ulrich Trottenberg Some remarks on the SUPRENUM project . . 1397--1406 W. K. Giloi The SUPRENUM supercomputer: Goals, achievements, and lessons learned . . . 1407--1425 Oliver A. McBryan SUPRENUM: Perspectives and performance 1427--1442 Wolfgang K. Giloi Parallel supercomputer architectures and their programming models . . . . . . . . 1443--1470 Wolfgang Schröder-Preikschat PEACE --- a software backplane for parallel computing . . . . . . . . . . . 1471--1485 Hans P. Zima and Peter Brezany and Barbara M. Chapman SUPERB and Vienna Fortran . . . . . . . 1487--1517 R. Hempel Application programming interfaces for SUPRENUM . . . . . . . . . . . . . . . . 1519--1526 Hermann Mierendorff and Helmut Schwamborn and Maurizio Tazza Performance modelling of grid problems --- a case study on the SUPRENUM system 1527--1546 Manfred Alef Implementation of a multigrid algorithm on SUPRENUM and other systems . . . . . 1547--1557 Hubert Ritzdorf and Anton Schüller and Barbara A. Steckel and Klaus Stüben $L_i$SS --- an environment for the parallel multigrid solution of partial differential equations on general 2D domains . . . . . . . . . . . . . . . . 1559--1570 Ortwin Pätzold and Anton Schüller and Horst Schwichtenberg Parallel applications and performance measurements on SUPRENUM . . . . . . . . 1571--1582 Georg Fleischmann and Matthias Gente and Fridolin Hofmann and Gunter Bolch Performance analysis of parallel programs based on model calculations . . 1583--1603 Tony Hey The Genesis Esprit project --- an overview . . . . . . . . . . . . . . . . 1605--1612 Otto Kolp Performance estimation for a parallel system with a hierarchical switch network . . . . . . . . . . . . . . . . 1613--1626 Jon Beecroft and Mark Homewood and Moray McLaren Meiko CS-2 interconnect Elan-Elite design . . . . . . . . . . . . . . . . . 1627--1638 L. M. Delves and C. A. Addison and O. A. Aziz The design and implementation of a portable parallel numerical library . . 1639--1651 C. A. Addison and V. S. Getov and A. J. G. Hey and R. W. Hockney and I. C. Wolton Benchmarking for distributed memory parallel systems: Gaining insight from numbers . . . . . . . . . . . . . . . . 1653--1668 Karl Solchenbach and Clemens-August Thole and Ulrich Trottenberg GENESIS application software . . . . . . 1669--1673 Edgar A. Gerteisen Preliminary performance results of the massive parallel Aircraft Euler Method 1675--1683 Tuomo Kauranne Summary of GENESIS work at the European Centre for Medium-range Weather Forecasts (ECMWF) . . . . . . . . . . . 1685--1688 J. J. H. Miller and S. Wang On the implementation of a $3$-D semiconductor device simulator on distributed-memory MIMD/SIMD machines 1689--1691
A. Dubey and M. Zubair and C. E. Grosch A general purpose subroutine for fast Fourier transform on a distributed memory parallel machine . . . . . . . . 1697--1710 Ralf Östermark and Martin Saarinen Parallel implementation of a VARMAX algorithm . . . . . . . . . . . . . . . 1711--1720 Shu Hua Hu and Hsing Lung Chen An effective routing algorithm in incomplete hypercubes . . . . . . . . . 1721--1738 M. S. Horng and D. J. Chen and Kuo Lung Ku Parallel routing algorithms for incomplete hypercube interconnection networks . . . . . . . . . . . . . . . . 1739--1761 Kemal Efe and P. K. Blackwell and W. Slough and T. Shiau Topological properties of the crossed cube architecture . . . . . . . . . . . 1763--1775
Samir W. Mahfoud and David E. Goldberg Parallel recombinative simulated annealing: a genetic algorithm . . . . . 1--28 R. Van Driessche and D. Roose An improved spectral bisection algorithm and its application to dynamic load balancing . . . . . . . . . . . . . . . 29--48 Claus Bendtsen and Per Christian Hansen and Kaj Madsen and Hans Bruun Nielsen and Mustafa Pinar Implementation of $QR$ up- and downdating on a massively parallel computer . . . . . . . . . . . . . . . . 49--61 T. H. C. Smith and G. L. Thompson A parallel implementation of the column subtraction algorithm . . . . . . . . . 63--71 A. De Matteis and S. Pagnutti Controlling correlations in parallel Monte Carlo . . . . . . . . . . . . . . 73--84 Sathiamoorthy Manoharan and Nigel P. Topham An assessment of assignment schemes for dependency graphs . . . . . . . . . . . 85--107 D. J. Evans and S. A. Amin Systolic algorithms for digital image filtering . . . . . . . . . . . . . . . 109--119 Kuninobu Tanno and Toshihiro Taketa and Susumu Horiguchi Parallel FFT algorithms using radix 4 butterfly computation on an eight-neighbor processor array . . . . . 121--136 Chi-kin Lee and Mounir Hamdi Practical aspects and experiences: Parallel image processing applications on a network of workstations . . . . . . 137--160 Howard C. Elman and Dennis K.-Y. Lee Use of linear algebra kernels to build an efficient finite element solver . . . 161--173
J. De Keyser and D. Roose Run-time load balancing techniques for a parallel unstructured multi-grid Euler solver with adaptive grid refinement . . 179--198 Tilmann Bönniger and Rüdiger Esser and Dietrich Krekel CM-5E, KSR2, Paragon XP/S: a comparative description of massively parallel computers . . . . . . . . . . . . . . . 199--232 Juan C. Agüí and Javier Jiménez A binary tree implementation of a parallel distributed tridiagonal solver 233--241 Emmanouel A. Varvarigos and Dimitri P. Bertsekas Transposition of banded matrices in hypercubes: a nearly isotropic task . . 243--264 E. Lega and H. Scholl and J.-M. Alimi and A. Bijaoui and P. Bury A parallel algorithm for structure detection based on wavelet and segmentation analysis . . . . . . . . . 265--285 F. J. Muniz and E. J. Zaluska Parallel load-balancing: an extension to the gradient model . . . . . . . . . . . 287--301 Hong Shen An efficient permutation-based parallel algorithm for range-join in hypercubes 303--313 M. Y. Mohd-Saman and D. J. Evans Inter-procedural analysis for parallel computing . . . . . . . . . . . . . . . 315--338 Zaher Mahjoub and Mohamed Jemni Restructuring and parallelizing a static conditional loop . . . . . . . . . . . . 339--347
F. Desprez and B. Tourancheau Basic routines for the rank-$2k$ update: 2D torus vs. reconfigurable network . . 353--372 Jörg-Thomas Pfenning and Christoph Moll Optimized communication patterns on workstation clusters . . . . . . . . . . 373--388 Liu Yong and Kang Lishan and D. J. Evans The annealing evolution algorithm as function optimizer . . . . . . . . . . . 389--400 S. Crivelli and E. R. Jessup The cost of eigenvalue computation on distributed-memory MIMD multiprocessors 401--422 L. Nicastro and N. D'Amico An optimized mass storage FFT for vector computers . . . . . . . . . . . . . . . 423--432 R. Sridhar and N. Chandrasekharan Highly parallelizable problems on sorted intervals . . . . . . . . . . . . . . . 433--446 K. G. Kumar and D. B. Skillicorn Data parallel geometric operations on lists . . . . . . . . . . . . . . . . . 447--459 Zhaofang Wen Fast parallel algorithms for the maximum sum problem . . . . . . . . . . . . . . 461--466 D. Moncrieff and R. E. Overill and S. Wilson $\alpha_{\mbox{critical}}$ for parallel processors . . . . . . . . . . . . . . . 467--471 Pontus Matstoms Parallel sparse $QR$ factorization on shared memory architectures . . . . . . 473--486 Pasqua D'Ambra and Giulio Giunta Concurrent banded Cholesky factorization on workstation networks using PVM . . . 487--494 Frederic Desprez and Marc Garbey Numerical simulation of a combustion problem on a Paragon machine . . . . . . 495--508 Gerhard Globisch PARMESH --- a parallel mesh generator 509--524
David M. Nicol Noncommittal barrier synchronization . . 529--549 Rolf Borgeest and Bernward Dimke and Olav Hansen A trace based performance evaluation tool for parallel real time systems . . 551--564 Bai Zhongzhi and Wang Deren and D. J. Evans Models of asynchronous parallel matrix multisplitting relaxed iterations . . . 565--582 L. F. Romero and E. L. Zapata Data distributions for sparse matrix vector multiplication . . . . . . . . . 583--605 N. M. Bahoshy and D. J. Evans A general harness for explicit parallel programming . . . . . . . . . . . . . . 607--617 M. P. Bekakos A notational approach to formulation of systolic array programs (Short communication) . . . . . . . . . . . . . 619--626 Xiaodong Zhang Parallelizing an oil refining simulation: Numerical methods, implementations and experience . . . . . 627--647 Albert Y. Zomaya Parallel processing for robot dynamics computations . . . . . . . . . . . . . . 649--668 A. Asenov and D. Reid and J. R. Barker Speed-up of scalable iterative linear solvers implemented on an array of transputers . . . . . . . . . . . . . . 669--682 G. A. Kohring Dynamic load balancing for parallelized particle simulations on MIMD computers 683--693
Takuya Terasawa and Ou Yamamoto and Tomohiro Kudoh and Hideharu Amano A performance evaluation of the multiprocessor testbed ATTEMPT-0 . . . . 701--730 Susanne E. Hambrusch and Farooq Hameed and Ashfaq A. Khokhar Communication operations on coarse-grained mesh architectures . . . 731--751 (or 731--752??) Shuichi Sakai and Yuetsu Kodama and Mitsuhisa Sato and Andrew Shaw and Hiroshi Matsuoka and Hideo Hirono and Kazuaki Okamoto and Takashi Yokota Reduced interprocessor-communication architecture and its implementation on EM-4 . . . . . . . . . . . . . . . . . . 753--769 (or 753--770??) Dilip K. Saikia and Ranjan K. Sen Order preserving communication on a star network . . . . . . . . . . . . . . . . 771--782 M. A. de Rosa and G. Giunta and M. Rizzardi Parallel Talbot's algorithm for distributed memory machines . . . . . . 783--801 (or 783--802??) M. Cannataro and S. Di Gregorio and R. Rongo and W. Spataro and G. Spezzano and D. Talia A parallel cellular automata environment on multicomputers for computational science . . . . . . . . . . . . . . . . 803--823 (or 803--824??) K. G. Margaritis On the systolic implementation of associative memory artificial neural networks . . . . . . . . . . . . . . . . 825--840 Ling Chen and Henry Y. H. Chuang An efficient algorithm for complete Euclidean distance transform on mesh-connected SIMD (Short communication) . . . . . . . . . . . . . 841--852 Marek T. Michalewicz and Mark Priebatsch Perfect scaling of the electronic structure problem on a SIMD architecture 853--870
Robert B. Schnabel A view of the limitations, opportunities, and challenges in parallel nonlinear optimization . . . . 875--905 Kai Rothe and Heinrich Voss A fully parallel condensation method for generalized eigenvalue problems on distributed memory computers . . . . . . 907--921 Arkady Kanevsky and Chao Feng On the embedding of cycles in pancake graphs . . . . . . . . . . . . . . . . . 923--936 Dieter Müller-Wichards and Wolfgang Rönsch Scalability of algorithms: an analytic approach . . . . . . . . . . . . . . . . 937--952 Tzong Wann Kao and Shi Jinn Horng Optimal algorithms for computing articulation points and some related problems on a circular-arc graph (Short communication) . . . . . . . . . . . . . 953--969 John Brown and Jerzy Waśniewski and Zahari Zlatev Practical aspects and experiences. Running air pollution models on massively parallel machines . . . . . . 971--991 Vamsee Lakamsani and Laxmi N. Bhuyan and D. Scott Linthicum Practical aspects and experiences. Mapping molecular dynamics computations on to hypercubes . . . . . . . . . . . . 993--1013 Jun Makino and Osamu Miyamura Parallelized feedback shift register generators of pseudorandom numbers . . . 1015--1028
Tony F. Chan and Jian Ping Shao Parallel complexity of domain decomposition methods and optimal coarse grid size . . . . . . . . . . . . . . . 1033--1049 Hugo Embrechts and Dirk Roose MIMD divide-and-conquer algorithms for the distance transformation. Part I: City Block distance . . . . . . . . . . 1051--1076 Hugo Embrechts and Dirk Roose MIMD divide-and-conquer algorithms for the distance transformation. Part II. Chamfer $3$-$4$ distance . . . . . . . . 1077--1096 Pierluigi Amodio and Luigi Brugnano The parallel $QR$ factorization algorithm for tridiagonal linear systems 1097--1110 P. Yalamov and D. J. Evans The $WZ$ matrix factorisation method . . 1111--1120 Edward Rothberg Alternatives for solving sparse triangular systems on distributed-memory multiprocessors . . . . . . . . . . . . 1121--1136 N. Floros and J. S. Reeve Evaluation of a spectral element CFD code on parallel architectures . . . . . 1137--1150 A. Averbuch and M. Israeli and L. Vozovoi Parallel implementation of non-linear evolution problems using parabolic domain decomposition . . . . . . . . . . 1151--1183
Michael W. Berry and Jack J. Dongarra and Youngbae Kim A parallel algorithm for the reduction of a nonsymmetric matrix to block upper-Hessenberg form . . . . . . . . . 1189--1211 C. Trefftz and C. C. Huang and P. K. McKinley and T.-Y. Li and Z. Zeng A scalable eigenvalue solver for symmetric tridiagonal matrices . . . . . 1213--1240 Xian-He Sun Application and accuracy of the parallel diagonal dominant algorithm . . . . . . 1241--1267 H. R. Barada Modular matrix computations on multi-linear VLSI arrays . . . . . . . . 1269--1284 Paraskevas Evripidou and Jean-Luc Gaudiot Incorporating input/output operations into dynamic data-flow graphs . . . . . 1285--1311 Clark F. Olson Parallel algorithms for hierarchical clustering . . . . . . . . . . . . . . . 1313--1325 Tom Altman and Yoshihide Igarashi and Koji Obokata Hyper-ring connection machines . . . . . 1327--1338 J. P. Geschiere and H. A. G. Wijshoff Exploiting large grain parallelism in a sparse direct linear system solver . . . 1339--1364 G. Casciola and S. Morigi Graphics in parallel computation for rendering $3$D modelled scenes . . . . . 1365--1382
Jaeyoung Choi and Jack J. Dongarra and David W. Walker Parallel matrix transpose algorithms on distributed memory concurrent computers 1387--1405 Gita Alaghband Parallel sparse matrix solution and performance . . . . . . . . . . . . . . 1407--1430 Bassem F. Beidas and George P. Papavassilopoulos Distributed asynchronous algorithms with stochastic delays for constrained optimization problems with conditions of time drift . . . . . . . . . . . . . . . 1431--1450 Fotis Barlos and Ophir Frieder A load balanced multicomputer relational database system for highly skewed data 1451--1483 Akiyoshi Wakatani and Michael Wolfe Optimization of array redistribution for distributed memory multicomputers . . . 1485--1490 Umpei Nagashima and Sachiko Hyugaji and Satoshi Sekiguchi and Mitsuhisa Sato and Haruo Hosoya An experience with super-linear speedup achieved by parallel computing on a workstation cluster: Parallel calculation of density of states of large scale cyclic polyacenes . . . . . 1491--1504 Jesper Larsson Träff An experimental comparison of two distributed single-source shortest path algorithms . . . . . . . . . . . . . . . 1505--1532
J. Drake and I. Foster Guest Editorial: Parallel computing in climate and weather modeling . . . . . . 1537 J. Drake and I. Foster Introduction to the special issue on parallel computing in climate and weather modeling . . . . . . . . . . . . 1539--1544 James J. Hack and James M. Rosinski and David L. Williamson and Byron A. Boville and John E. Truesdale Computational design of the NCAR community climate model . . . . . . . . 1545--1569 John Drake and Ian Foster and John Michalakes and Brian Toonen and Patrick Worley Design and performance of a scalable parallel community climate model . . . . 1571--1591 Steven W. Hammond and Richard D. Loft and John M. Dennis and Richard K. Sato Implementation and performance issues of a massively parallel atmospheric model 1593--1619 S. R. M. Barros and D. Dent and L. Isaksen and G. Robinson and G. Mozdzynski and F. Wollenweber The IFS model: a parallel production weather code . . . . . . . . . . . . . . 1621--1638 J. G. Sela Weather forecasting on parallel architectures . . . . . . . . . . . . . 1639--1654 M. F. Wehner and A. A. Mirin and P. G. Eltgroth and W. P. Dannevik and C. R. Mechoso and J. D. Farrara and J. A. Spahr Performance of a distributed memory finite difference atmospheric general circulation model . . . . . . . . . . . 1655--1675 Philip W. Jones and Christopher L. Kerr and Richard S. Hemler Practical considerations in development of a parallel SKYHI general circulation model . . . . . . . . . . . . . . . . . 1677--1694 Rainer Bleck and Sumner Dean and Matthew O'Keefe and Aaron Sawdey A comparison of data-parallel and message-passing versions of the Miami Isopycnic Coordinate Ocean Model (MICOM) 1695--1720
Y. Nota An efficient parallel discrete PDE solver . . . . . . . . . . . . . . . . . 1725--1748 Chang Shu and Hilary Buxton Parallel path planning on the distributed array processor . . . . . . 1749--1767 Nathan Mattor and Timothy J. Williams and Dennis W. Hewett Algorithm for solving tridiagonal matrix problems in parallel . . . . . . . . . . 1769--1782 Suchendra M. Bhandarkar and Hamid R. Arabnia The REFINE multiprocessor --- Theoretical properties and algorithms 1783--1805 Ramachandran Vaidyanathan and Anand Padmanabhan Short communication: Bus-based networks for fan-in and uniform hypercube algorithms . . . . . . . . . . . . . . . 1807--1821 N. Floros and J. S. Reeve and J. Clinckemaillie and S. Vlachoutsis and G. Lonsdale Comparative efficiencies of domain decompositions . . . . . . . . . . . . . 1823--1835 Mats Holmström Practical aspects and experiences: Parallelizing the fast wavelet transform 1837--1848 M. Briscolini A parallel implementation of a $3$-D pseudospectral based code on the IBM 9076 scalable POWER parallel system . . 1849--1862
T. Dehn and M. Eiermann and K. Giebermann and V. Sperling Structured sparse matrix-vector multiplication on massively parallel SIMD architectures . . . . . . . . . . . 1867--1894 PeiZong Z. Lee Techniques for compiling programs on distributed memory multicomputers . . . 1895--1923 C. S. Yang and Y. M. Tsai and S. L. Chi and Shepherd S. B. Shi Adaptive wormhole routing in $k$-ary $n$-cubes . . . . . . . . . . . . . . . 1925--1943 J. Błażewicz and M. Drozdowski Short Communication: Scheduling divisible jobs on hypercubes . . . . . . 1945--1956 Sergio De Agostino Short communication: a parallel decoding algorithm for LZ2 data compression . . . 1957--1961 Chandra N. Sekharan and Vineet Goel and R. Sridhar Load balancing methods for ray tracing and binary tree computing using PVM . . 1963--1978 Gerhard Globisch On an automatically parallel generation technique for tetrahedral meshes . . . . 1979--1995 Murray Dow Transposing a matrix on a vector computer . . . . . . . . . . . . . . . . 1997--2005
Bruno Lang Parallel reduction of banded matrices to bidiagonal form . . . . . . . . . . . . 1--18 Francisco Argüello and Margarita Amor and Emilio L. Zapata FFTs on mesh connected computers . . . . 19--38 S. A. Savari and D. P. Bertsekas Finite termination of asynchronous iterative algorithms . . . . . . . . . . 39--56 E. de Sturler A performance model for Krylov subspace methods on mesh-based parallel computers 57--74 Himanshu Gupta and P. Sadayappan Communication-efficient matrix multiplication on hypercubes . . . . . . 75--99 A. Baronio and F. Zama A domain decomposition technique for spline image restoration on distributed memory systems . . . . . . . . . . . . . 101--110 Donald Dabdub and John H. Seinfeld Parallel computation in atmospheric chemical modeling . . . . . . . . . . . 111--130 R. Hempel and R. Calkin and R. Hess and W. Joppich and C. W. Oosterlee and H. Ritzdorf and P. Wypior and W. Ziegler and N. Koike and T. Washio and U. Keller Real applications on the new parallel system NEC Cenju-3 . . . . . . . . . . . 131--148 Andreas Uhl Wavelet packet best basis selection on moderate parallel MIMD architectures . . 149--158
C. S. Ierotheou and S. P. Johnson and M. Cross and P. F. Leggett Computer aided parallelisation tools (CAPTools) --- conceptual overview and performance on the parallelisation of structured mesh codes . . . . . . . . . 163--195 S. P. Johnson and M. Cross and M. G. Everett Exploitation of symbolic information in interprocedural dependence analysis . . 197--226 S. P. Johnson and C. S. Ierotheou and M. Cross Automatic parallel code generation for message passing on distributed memory systems . . . . . . . . . . . . . . . . 227--258 P. F. Leggett and A. T. J. Marsh and S. P. Johnson and M. Cross Integrating user knowledge with information from parallelisation tools to facilitate the automatic generation of efficient parallel FORTRAN code . . . 259--288 L. Colombet and Ph. Michallon and D. Trystram Parallel matrix-vector product on rings with a minimum of communications . . . . 289--310 Yu-Hua Lee and Shi-Jinn Horng and Tzong-Wann Kao and Ferng-Shi Jaung and Yuung-Jih Chen and Horng-Ren Tsai Parallel computation of exact Euclidean distance transform . . . . . . . . . . . 311--325 Theodore Johnson and Timothy A. Davis and Steven M. Hadfield A concurrent dynamic task graph . . . . 327--333
Jingling Xue Transformations of nested loops with non-convex iteration spaces . . . . . . 339--368 Bruce Boldon and Narsingh Deo and Nishit Kumar Minimum-weight degree-constrained spanning tree problem: Heuristics and implementation on an SIMD parallel machine . . . . . . . . . . . . . . . . 369--382 Peter Fiebach Cyclic block-algorithms for solving triangular systems on distributed-memory multiprocessors with mesh topology . . . 383--393 Imtiaz Ahmad and Muhammad K. Dhodhi Multiprocessor scheduling in a genetic paradigm . . . . . . . . . . . . . . . . 395--406 D. Moncrieff and R. E. Overill and S. Wilson Heterogeneous computing machines and Amdahl's law . . . . . . . . . . . . . . 407--413 Roland Wismüller and Michael Oberhuber and Johann Krammer and Olav Hansen Interactive debugging and performance analysis of massively parallel applications . . . . . . . . . . . . . . 415--442 F. Gutbrod and N. Attig and M. Weber The SU(2)-Lattice Gauge Theory simulation code on the Intel Paragon supercomputer . . . . . . . . . . . . . 443--463 M. M. Shearer Computational optimization of finite difference methods on the CM5 . . . . . 465--481
Samuel Kortas and Philippe Angot A practical and portable model of programming for iterative solvers on distributed memory machines . . . . . . 487--512 S. Oliveira Parallel multigrid methods for transport equations: the anisotropic case . . . . 513--537 Markus Hegland Real and complex fast Fourier transforms on the Fujitsu VPP 500 . . . . . . . . . 539--553 Roni Khardon and Shlomit S. Pinter Partitioning and scheduling to counteract overhead . . . . . . . . . . 555--593 Sotirios G. Ziavras and Arup Mukherjee Data broadcasting and reduction, prefix computation, and sorting on reduced hypercube parallel computers . . . . . . 595--606 Lin Chen Partitioning graphs into Hamiltonian ones . . . . . . . . . . . . . . . . . . 607--618
A. T. Chronopoulos and C. D. Swanson Parallel iterative ${S}$-step methods for unsymmetric linear systems . . . . . 623--641 D. Conforti and L. De Luca and L. Grandinetti and R. Musmanno A parallel implementation of automatic differentiation for partially separable functions using PVM . . . . . . . . . . 643--656 Y. Trémolet and F.-X. Le Dimet Parallel algorithms for variational data assimilation and coupling models . . . . 657--674 Dugki Min and Matt W. Mutka A model for analyzing interactions in $2$-D mesh wormhole-routed multicomputers . . . . . . . . . . . . . 675--699 Borut Robič and Boštjan Vilfan Improved schemes for mapping arbitrary algorithms onto processor meshes . . . . 701--724 Klaus Stüben and Hermann Mierendorff and Clemens-August Thole and Owen Thomas Industrial parallel computing with real codes . . . . . . . . . . . . . . . . . 725--737 Umakishore Ramachandran and Gautam Shah and S. Ravikumar and Jeyakumar Muthukumarasamy Scalability study of the KSR-1 . . . . . 739--759 G. Fabbretti and A. Farina and D. Laforenza and F. Vinelli Mapping the synthetic aperture radar signal processor on a distributed-memory MIMD architecture . . . . . . . . . . . 761--784
William Gropp and Ewing Lusk and Nathan Doss and Anthony Skjellum High-performance, portable implementation of the MPI Message Passing Interface Standard . . . . . . . 789--828 Y. F. Hu and D. R. Emerson and R. J. Blake The communication performance of the Cray T3D and its effect on iterative solvers . . . . . . . . . . . . . . . . 829--844 M. Chandwani and N. S. Chaudhari Formulation and analysis of parallel context-free recognition and parsing on a PRAM model . . . . . . . . . . . . . . 845--868 Mats Brorsson and Per Stenström Characterising and modelling shared memory accesses in multiprocessor programs . . . . . . . . . . . . . . . . 869--893 Sanjeev R. Rastogi and Norman J. Wagner A parallel algorithm for Lees-Edwards boundary conditions . . . . . . . . . . 895--901 Leszek Gąsieniec and Andrzej Pelc Adaptive broadcasting with faulty nodes 903--912
Zhiwei Xu and Kai Hwang Early prediction of MPP performance: The SP2, T3D, and Paragon experiences . . . 917--942 S. Lanteri Parallel solutions of compressible flows using overlapping and non-overlapping mesh partitioning strategies . . . . . . 943--968 Mark A. Franklin and Vasudha Govindan A general matrix iterative model for dynamic load balancing . . . . . . . . . 969--989 Paraskevi Fragopoulou and Selim G. Akl Spanning subgraphs with applications to communication on the multidimensional torus network . . . . . . . . . . . . . 991--1015 N. Bassiliades and I. Vlahavas Hierarchical query execution in a parallel object-oriented database system 1017--1048
M. Surridge and D. J. Tildesley and Y. C. Kong and D. B. Adolf Practical aspects and experiences. A parallel molecular dynamics simulation code for dialkyl cationic surfactants 1053--1071 Frank C. Wimberly and Michael H. Lambert and Nicholas A. Nystrom and Alex Ropelewski and William Young Porting third-party applications packages to the Cray T3D: Programming issues and scalability results . . . . . 1073--1089 Josep-Lluis Larriba-Pey and Juan J. Navarro and Angel Jorba and Oriol Roig Review of general and Toeplitz vector bidiagonal solvers . . . . . . . . . . . 1091--1125 (or 1091--1126??) Peter K. K. Loh Artificial intelligence search techniques as fault-tolerant routing strategies . . . . . . . . . . . . . . . 1127--1147 H. H. ten Cate and E. A. H. Vollebregt On the portability and efficiency of parallel algorithms and software . . . . 1149--1163
Ignacio Martín Llorente and Francisco Tirado and Luis Vázquez Some aspects about the scalability of scientific applications on parallel architectures . . . . . . . . . . . . . 1169--1195 Goran Lj. Djordjević and Milorad B. Tošić A heuristic for scheduling task graphs with communication delays onto multiprocessors . . . . . . . . . . . . 1197--1214 Jerry C. Yan and Sekhar R. Sarukkai Analyzing parallel program performance using normalized performance indices and trace transformation techniques . . . . 1215--1237 Abdel Aziz Farrag New algorithm for constructing fault-tolerant solutions of the circulant graph configuration . . . . . 1239--1253 (or 1239--1254??) C. Calvin Implementation of parallel FFT algorithms on distributed memory machines with a minimum overhead of communication . . . . . . . . . . . . . 1255--1279 Maria Antonietta Pirozzi A fast numerical method for mildly nonlinear parabolic initial boundary value problems. II: The parallel implementation on the Intel Touchstone Delta system . . . . . . . . . . . . . . 1281--1285
K. A. Gallivan and B. A. Marsolf and H. A. G. Wijshoff Solving large nonsymmetric sparse linear systems using MCSPARSE . . . . . . . . . 1291--1333 S. Hioki Construction of staples in lattice gauge theory on a parallel computer . . . . . 1335--1344 Rabi N. Mahapatra and Sudipta Mahapatra Mapping of neural network models onto two-dimensional processor arrays . . . . 1345--1357 Piyush Maheshwari Improving granularity and locality of data in multiprocessor execution of functional programs . . . . . . . . . . 1359--1372 Michèle Dion and Yves Robert Mapping affine loop nests . . . . . . . 1373--1397 Ingmar Neumann and Wolfgang Wilhelmi A parallel algorithm for achieving the Smith normal form of an integer matrix 1399--1412 C. Calvin and L. Colombet Performance evaluation and modeling of collective communications on Cray T3D 1413--1427
Yasushi Shinjo and Yasushi Kiyoki A lightweight process facility supporting meta-level programming . . . 1429--1454 A. Cichocki and A. Bargiela Neural networks for solving linear inequality systems . . . . . . . . . . . 1455--1475 M. Hamdi and C. K. Lee Dynamic load-balancing of image processing applications on clusters of workstations . . . . . . . . . . . . . . 1477--1492 N. P. Kruyt A conjugate gradient method for the spectral partitioning of graphs . . . . 1493--1502 R. Hess and W. Joppich A comparison of parallel multigrid and a fast Fourier transform algorithm for the solution of the Helmholtz equation in numerical weather prediction . . . . . . 1503--1512 William Gropp and Ewing Lusk A high-performance MPI implementation on a shared-memory vector supercomputer . . 1513--1526 Bodo Heise and Michael Jung Parallel solvers for nonlinear elliptic problems based on domain decomposition ideas . . . . . . . . . . . . . . . . . 1527--1544 Edward Walker and Gary Morgan and Bruce Cass and Zygmunt Ulanowski A note on compiling FORTRAN loop kernels onto a dataflow architecture . . . . . . 1545--1557
Dominique Barth Parallel matrix product algorithm in the de Bruijn network using emulation of meshes of trees . . . . . . . . . . . . 1563--1578 Jong-Uk Kim and Kyu-Hyun Shim and Kyu Ho Park A link-disjoint subcube for processor allocation in hypercube computers . . . 1579--1595 Dale M. Slone and Garry H. Rodrigue Efficient biased random bit generation for parallel lattice gas simulations . . 1597--1620 Jingling Xue Unimodular transformations of non-perfectly nested loops . . . . . . . 1621--1645 David J. Jackson and Chris W. Humphres A simple yet effective load balancing extension to the PVM software system . . 1647--1660 S. Mahapatra and R. N. Mahapatra and B. N. Chatterji A parallel formulation of back-propagation learning on distributed memory multiprocessors . . . . . . . . . 1661--1675 Satoko Sakata and Umpei Nagashima and Mitsuhisa Sato and Satoshi Sekiguchi and Haruo Hosoya Performance evaluation of a workstation cluster, TMC CM-5, and Intel Paragon/XP using a parallel homology analysis program . . . . . . . . . . . . . . . . 1677--1693
G. Haring and P. Kacsuk and G. Kotsis Distributed and parallel systems: Environments and tools . . . . . . . . . 1699--1701 G. Chiola and G. Ciaccio Implementing a low cost, low latency parallel platform . . . . . . . . . . . 1703--1717 F. Bergadano and A. Giallombardo and A. Puliafito and G. Ruffo and L. Vita Security agents for information retrieval in distributed systems . . . . 1719--1731 Rushed Kanawati LICRA: a replicated-data management algorithm for distributed synchronous group-ware applications . . . . . . . . 1733--1746 Péter Kacsuk and José C. Cunha and Gábor Dózsa and João Lourenço and Tibor Fadgyas and Tiago Antão A graphical development and debugging environment for parallel programs . . . 1747--1770 Gabriele Kotsis A systematic approach for workload modeling for parallel processing systems 1771--1787 J. Lüthi and S. Majumdar and G. Kotsis and G. Haring Performance bounds for distributed systems with workload variabilities and uncertainties . . . . . . . . . . . . . 1789--1806 Tamás Bartha and Endre Selényi Probabilistic system-level fault diagnostic algorithms for multiprocessors . . . . . . . . . . . . 1807--1821 T. Delaitre and G. R. Ribeiro-Justo and F. Spies and S. C. Winter A graphical toolset for simulation modelling of parallel systems . . . . . 1823--1836 H. Wabnig and G. Haring PAPS --- a testbed for performance prediction of parallel applications . . 1837--1851 Péter Kacsuk and Zsolt Németh and Zsolt Puskás Tools for mapping, load balancing and monitoring in the LOGFLOW parallel Prolog project . . . . . . . . . . . . . 1853--1881 E. Morel and J. Briat and J. Chassin de Kergommeaux Cuts and side-effects in distributed memory OR-parallel Prolog . . . . . . . 1883--1896 Szabolcs Ferenczi Parallel execution of object-oriented programs: Message handling strategies 1897--1912 László Böszörményi and Karl-Heinz Eder M3Set --- a language for handling of distributed and persistent sets of objects . . . . . . . . . . . . . . . . 1913--1925
Xiaodong Wang and Vwani P. Roychowdhury and Pratheep Balasingam Practical aspects and experiences. Scalable massively parallel algorithms for computational nanoelectronics . . . 1931--1963 Anthony Theodore Chronopoulos and Gang Wang Practical aspects and experiences. Parallel solution of a traffic flow simulation problem . . . . . . . . . . . 1965--1983 Der-Chyuan Lou and Chin-Chen Chang A parallel two-list algorithm for the knapsack problem . . . . . . . . . . . . 1985--1996 M. A. Amer and B. A. Abdel-Hamida and D. Fausett Parallel implementation of the Kronecker product technique for numerical solution of parabolic partial differential equations . . . . . . . . . . . . . . . 1997--2005 Edward A. Billard and Joseph C. Pasquale Load balancing to adjust for proximity in some network topologies . . . . . . . 2007--2023 D. M. Dhamdhere and Sridhar R. Iyer and E. Kishore Kumar Reddy Distributed termination detection for dynamic systems . . . . . . . . . . . . 2025--2045 Srabani Sen Gupta and Rajib K. Das and Krishnendu Mukhopadhyaya and Bhabani P. Sinha A family of network topologies with multiple loops and logarithmic diameter 2047--2064
J. Dongarra and B. Tourancheau Workshop on environments and tools for parallel scientific computing . . . . . 1 Tony Hey and Alistair Dunlop and Emilio Hernández Realistic parallel performance estimation . . . . . . . . . . . . . . . 5--21 Jesus Labarta and Sergi Girona and Toni Cortes Analyzing scheduling policies using Dimemas . . . . . . . . . . . . . . . . 23--34 Gilles Berger Sabbatel Hardware solutions for efficient distributed computing on ATM networks 35--48 Jack J. Dongarra and Sven Hammarling and David W. Walker Key concepts for parallel out-of-core $L U$ factorization . . . . . . . . . . . . 49--70 T. Brandes and S. Chaumette and M. C. Counilh and J. Roman and A. Darte and F. Desprez and J. C. Mignot HPFIT: a set of integrated tools for the parallelization of applications using High Performance Fortran. Part I: HPFIT and the TransTOOL environment . . . . . 71--87 T. Brandes and S. Chaumette and M. C. Counilh and J. Roman and F. Desprez and J. C. Mignot HPFIT: a set of integrated tools for the parallelization of applications using High Performance Fortran. Part II: Data-structure visualization and HPF extensions for irregular problems . . . 89--105 Loïc Prylli The CAPDYN environment and its message-passing library implementation 107--120 Vaidy Sunderam Heterogeneous network computing: The next generation . . . . . . . . . . . . 121--135 El Mostafa Daoudi and Abdelhak Lakhouaja Exploiting the symmetry in the parallelization of the Jacobi method . . 137--151 François Pellegrini Graph partitioning based methods and tools for scientific computing . . . . . 153--164 Jean-Yves Berthou and Laurent Colombet Which approach to parallelizing scientific codes --- That is the question . . . . . . . . . . . . . . . . 165--179 Karen L. Karavanic and Jussi Myllymaki and Miron Livny and Barton P. Miller Integrated visualization of parallel program performance data . . . . . . . . 181--198 D. Kranzlmüller and S. Grabner and J. Volkert Debugging with the MAD environment . . . 199--217 Bruno Gaujal and Alain Jean-Marie and Philippe Mussi and Gunther Siegel High speed simulation of discrete event systems by mixing process oriented and equational approaches . . . . . . . . . 219--233 Laurent Lefèvre Parallel programming on top of DSM system. An experimental study . . . . . 235--249 Pierre-Yves Calland and Alain Darte and Yves Robert and Frederic Vivien Plugging anti and output dependence removal techniques into loop parallelization algorithm . . . . . . . 251--266
Timo Hamalainen and Harri Klapuri and Jukka Saarinen and Kimmo Kaski Mapping of SOM and LVQ algorithms on a tree shape parallel computer system . . 271--289 Chao-Tung Yang and Shian-Shyong Tseng and Cheng-Der Chuang and Wen-Chung Shih Using knowledge-based techniques on loop parallelization for parallelizing compilers . . . . . . . . . . . . . . . 291--309 Yuh-Shyan Chen and Jang-Ping Sheu Tolerating faults in injured hypercubes using maximal fault-free subcube-ring 311--331 Plamen Y. Yalamov Stability of a partitioning algorithm for bidiagonal systems . . . . . . . . . 333--348 Sung Kwon Kim Rectangulating rectilinear polygons in parallel . . . . . . . . . . . . . . . . 349--367 C. K. Yuen Parallel programming --- a critique . . 369--380 A. Basermann and B. Reichel and C. Schelthoff Preconditioned CG methods for sparse matrices on massively parallel machines 381--398
David E. Womble and David S. Greenberg Parallel I/O: an introduction . . . . . 403--417 Ethan L. Miller and Randy H. Katz RAMA: An easy-to-use, high-performance parallel file system . . . . . . . . . . 419--446 Nils Nieuwejaar and David Kotz The Galley parallel file system . . . . 447--476 Jason A. Moore and Michael J. Quinn Enhancing disk-directed I/O for fine-grained redistribution of file data 477--499 Eric J. Schwabe and Ian M. Sutherland and Bruce K. Holmer Evaluating approximately balanced parity-declustered data layouts for disk arrays . . . . . . . . . . . . . . . . . 501--523 J. Carretero and F. Pérez and P. de Miguel and F. García and L. Alonso Performance increase mechanisms for parallel and distributed file systems 525--542 Ian Parsons and Ron Unrau and Jonathan Schaeffer and Duane Szafron PI/OT: Parallel I/O templates . . . . . 543--570 Thomas H. Cormen and Melissa Hirschl Early experiences in evaluating the parallel disk model with the ViC* implementation . . . . . . . . . . . . . 571--600 Rakesh D. Barve and Edward F. Grove and Jeffrey Scott Vitter Simple randomized mergesort on parallel disks . . . . . . . . . . . . . . . . . 601--631
M. Pakzad and J. L. Lloyd and C. Phillips Independent columns: a new parallel ILU preconditioner for the PCG method . . . 637--647 (or 637--648??) Mohan K. Kadalbajoo and A. Appaji Rao Parallel group explicit method for two-dimensional parabolic equations . . 649--666 J. Lopez and O. Plata and F. Arguello and E. L. Zapata Unified framework for the parallelization of divide and conquer based tridiagonal systems . . . . . . . 667--686 Sergei Gorlatch $N$-graphs: Scalable topology and design of balanced divide-and-conquer algorithms . . . . . . . . . . . . . . . 687--698 M. Cermele and M. Colajanni Non-uniform and dynamic domain decompositions for hypercomputing . . . 699--720 Roman Trobec and Izidor Jerebic Local diagnosis in massively parallel systems . . . . . . . . . . . . . . . . 721--731 G. Mitra and I. Hai and M. T. Hajian A distributed processing algorithm for solving integer programs using a cluster of workstations . . . . . . . . . . . . 733--753 Jiahong Wang and Jie Li and Hisao Kameda Simulation studies on concurrency control in parallel transaction processing systems . . . . . . . . . . . 755--775 Neeraj K. Sharma and Madhusudhana R. Pinnu An efficient implementation of bypass queue under bursty traffic . . . . . . . 777--781 Ishfaq Ahmad Express versus PVM: a performance comparison . . . . . . . . . . . . . . . 783--812 Anonymous Miscellaneous: Calendar of forthcoming conferences and events . . . . . . . . . 813
A. Chalmers and F. W. Jansen Parallel graphics and visualisation . . 817 Thomas W. Crockett An introduction to parallel rendering 819--843 Alan Heirich and James Arvo Scalable Monte Carlo image synthesis . . 845--859 Hyeon-Ju Yoon and Seongbae Eun and Jung Wan Cho Image parallel ray tracing using static load balancing and data prefetching . . 861--872 Erik Reinhard and Frederik W. Jansen Rendering large scenes using parallel ray tracing . . . . . . . . . . . . . . 873--885 Bruno Arnaldi and Thierry Priol and Luc Renambot and Xavier Pueyo Visibility masks for solving complex radiosity computations on multiprocessors . . . . . . . . . . . . 887--897 Christophe Renaud and François Rousselle Fast massively parallel progressive radiosity on the MP-1 . . . . . . . . . 899--913 Anton H. J. Koning and Karel J. Zuiderveld and Max A. Viergever Volume visualization on shared memory architectures . . . . . . . . . . . . . 915--925 Rüdiger Westermann and Thomas Ertl Distributed volume visualization: a step towards integrated data analysis and image synthesis . . . . . . . . . . . . 927--941 Cemal Köse and Alan Chalmers Profiling for efficient parallel volume visualisation . . . . . . . . . . . . . 943--952 David C. Banks Screen-parallel determination of intersection curves . . . . . . . . . . 953--960 Michael Krogh and James Painter and Charles Hansen Parallel sphere rendering . . . . . . . 961--974 Malte Zöckler and Detlev Stalling and Hans-Christian Hege Parallel line integral convolution . . . 975--989 Shaun Bangay and James Gain and Greg Watkins and Kevan Watkins Building the second generation of parallel/distributed virtual reality systems . . . . . . . . . . . . . . . . 991--1000
Guangye Li A block variant of the GMRES method on massively parallel processors . . . . . 1005--1019 P. Beraldi and F. Guerriero Parallel asynchronous implementation of the $\epsilon$-relaxation method for the linear minimum cost flow problem . . . . 1021--1044 Padma Raghavan Parallel ordering using edge contraction 1045--1067 Soren S. Nielsen and Stavros A. Zenios Scalable parallel Benders decomposition for stochastic linear programming . . . 1069--1088 Ajit Singh and Vincent Van Dongen An integrated performance analysis tool for SPMD data-parallel programs . . . . 1089--1112 Svetozara Petrova Parallel implementation of fast elliptic solver . . . . . . . . . . . . . . . . . 1113--1128 S. Chandra Sekhara Rao Existence and uniqueness of WZ factorization . . . . . . . . . . . . . 1129--1139 Xin Wang and Edward K. Blum and D. Stott Parker and Daniel Massey The dance party problem and its application to collective communication in computer networks . . . . . . . . . . 1141--1156 D. C. Hodgson and P. K. Jimack A domain decomposition preconditioner for a parallel finite element solver on distributed unstructured grids . . . . . 1157--1181 Mouloud Oussaidène and Bastien Chopard and Olivier V. Pictet and Marco Tomassini Parallel genetic programming and its application to trading model induction 1183--1198 Marco D'Apuzzo and Marco Lapegna and Almerico Murli Scalability and load balancing in adaptive algorithms for multidimensional integration . . . . . . . . . . . . . . 1199--1210
Michael Eldredge and Thomas J. R. Hughes and Robert M. Ferencz and Steven M. Rifai and Arthur Raefsky and Bruce Herndon High-performance parallel computing in industry . . . . . . . . . . . . . . . . 1217--1233 V. Kalro and T. Tezduyar Parallel $3$D computation of unsteady flows around circular cylinders . . . . 1235--1248 Y. Matsumoto and T. Tokumasu Parallel computing of diatomic molecular rarefied gas flows . . . . . . . . . . . 1249--1260 L. Paglieri and D. Ambrosi and L. Formaggia and A. Quarteroni and A. L. Scheinine Parallel computation for shallow water flow: a domain decomposition approach 1261--1277 S. E. Ray and G. P. Wren and T. E. Tezduyar Parallel implementations of a finite element formulation for fluid-structure interactions in interior flows . . . . . 1279--1292 N. Satofuka and M. Obata and T. Suzuki Parallel computation of super-/hypersonic flows on workstation network and Transputer arrays . . . . . 1293--1305 John Shadid and Scott Hutchinson and Gary Hennigan and Harry Moffat and Karen Devine and A. G. Salinger Efficient parallel computation of unstructured finite element reacting flow solutions . . . . . . . . . . . . . 1307--1325 M. S. Shephard and J. E. Flaherty and C. L. Bottasso and H. L. de Cougny and C. Ozturan and M. L. Simone Parallel automatic adaptive analysis . . 1327--1347 T. Tezduyar and V. Kalro and W. Garrard Parallel computational methods for $3$D simulation of a parafoil with prescribed shape changes . . . . . . . . . . . . . 1349--1363 Genki Yagawa and Yasushi Nakabayashi and Hiroshi Okuda Large-scale finite element fluid analysis by massively parallel processors . . . . . . . . . . . . . . . 1365--1377 Andrew Yeckel and Jeffrey J. Derby Parallel computation of incompressible flows in materials processing: Numerical experiments in diagonal preconditioning 1379--1400
Mark J. Clement and Michael J. Quinn Automated performance prediction for scalable parallel computing . . . . . . 1405--1420 P. Arbenz and W. Gander and M. Oettli The remote computation system . . . . . 1421--1428 W. J. Gutjahr and M. Hitz and T. A. Mueck Task assignment in Cayley interconnection topologies . . . . . . . 1429--1460 Aiichiro Nakano and Timothy Campbell Adaptive curvilinear-coordinate approach to dynamic load balancing of parallel multiresolution molecular dynamics . . . 1461--1478 Fabio Ancona and Stefano Rovetta and Rodolfo Zunino Transputer-based implementation of distributed associative memories . . . . 1479--1491 E. W. Evans and S. P. Johnson and P. F. Leggett and M. Cross Automatic code generation of overlapped communications in a parallelisation tool 1493--1523 X. Yuan and C. Salisbury and D. Balsara and R. Melhem Load balancing package on distributed memory systems and its application to particle-particle particle-mesh (P3M) methods . . . . . . . . . . . . . . . . 1525--1544 M. S. Bebbington Parallel implementation of an aggregation/disaggregation method for evaluating quasi-stationary behavior in continuous-time Markov chains . . . . . 1545--1559
M. Kutrib and R. Vollmar and Th. Worsch Introduction to the special issue on cellular automata . . . . . . . . . . . 1567--1576 J.-P. Allouche and F. v. Haeseler and E. Lange and A. Petersen and G. Skordev Linear cellular automata and automatic sequences . . . . . . . . . . . . . . . 1577--1592 G. Cattaneo and E. Formenti and L. Margara and G. Mauri Transformations of the one-dimensional cellular automata rule space . . . . . . 1593--1611 Klaus Sutner Linear cellular automata and Fischer automata . . . . . . . . . . . . . . . . 1613--1634 Mario Markus and Tomas Hahn and Ingo Kusch A novel quantification of cellular automata . . . . . . . . . . . . . . . . 1635--1642 Thomas Buchholz and Martin Kutrib Some relations between massively parallel arrays . . . . . . . . . . . . 1643--1662 Olivier Heen Efficient constant speed-up for one dimensional cellular automata calculators . . . . . . . . . . . . . . 1663--1671 Paola Flocchini and Frédéric Geurts and Nicola Santoro CA-like error propagation in fuzzy CA 1673--1682 Thomas Worsch On parallel Turing machines with multi-head control units . . . . . . . . 1683--1697 Jörg R. Weimar Cellular automata for reaction--diffusion systems . . . . . . 1699--1715
Divyesh Jadav and Chutimet Srinilta and Alok Choudhary Batching and dynamic allocation techniques for increasing the stream capacity of an on-demand media server 1727--1742 Jinsung Cho and Heonshik Shin Scheduling video streams in a large-scale video-on-demand server . . . 1743--1755 Valentin Rottmann and Petra Berenbrink and Reinhard Luling Simple distributed scheduling policy for parallel interactive continuous media servers . . . . . . . . . . . . . . . . 1757--1776 Constantin Arapis and Simon Gibbs and Christian Breiteneder Real-time segmentation of video on a multiprocessor platform . . . . . . . . 1777--1792 John A. Watlington and V. Michael Bove, Jr. A system for parallel media processing 1793--1809 Eddy De Greef and Francky Catthoor and Hugo De Man Memory size reduction through storage order optimization for embedded parallel multimedia applications . . . . . . . . 1811--1837 (or 1811--1838??) Wei Li and Xiaohu Huang and Nanning Zheng Parallel implementing OpenGL on PVM . . 1839--1850
Abdelsalam Heddaya and Kihong Park Congestion control for asynchronous parallel computing on workstation networks . . . . . . . . . . . . . . . . 1855--1875 P. S. Rao and G. Mouney Data communication in parallel block predictor-corrector methods for solving ODE's . . . . . . . . . . . . . . . . . 1877--1888 Weifa Liang and Xiaojun Shen Finding the $k$ most vital edges in the minimum spanning tree problem . . . . . 1889--1907 Yih Huang and Philip K. McKinley Adaptive global reduction algorithm for wormhole-routed 2D meshes . . . . . . . 1909--1936 Seong-Pyo Kim and Taisook Han Fault-tolerant wormhole routing in mesh with overlapped solid fault regions . . 1937--1962 M.-Tahar Kechadi and J.-Luc Dekeyser Analysis and simulation of an out-of-order execution model in vector multiprocessor systems . . . . . . . . . 1963--1986 Hong Shen Optimal parallel multiselection on EREW PRAM . . . . . . . . . . . . . . . . . . 1987--1992 Tong-Yee Lee Exploitation of image parallelism for ray tracing $3$D scenes on $2$D mesh multicomputers . . . . . . . . . . . . . 1993--2015 Jarmo Rantakokko Strategies for parallel variational data assimilation . . . . . . . . . . . . . . 2017--2039 Michael A. Lambert and Garry H. Rodrigue and Dennis W. Hewett Parallel DSDADI method for solution of the steady state diffusion equation . . 2041--2065 Ç. K. Koç Parallel $p$-adic method for solving linear systems of equations . . . . . . 2067--2074 Chunguang Sun Parallel solution of sparse linear least squares problems on distributed-memory multiprocessors . . . . . . . . . . . . 2075--2093 Daniela di Serafino Parallel implementation of a multigrid multiblock Euler solver on distributed memory machines . . . . . . . . . . . . 2095--2113 R. E. Overill and S. Wilson Data parallel evaluation of univariate polynomials by the Knuth-Eve algorithm 2115--2127
C. Baillie and J. Michalakes and R. Skålin Regional weather modeling on parallel computers . . . . . . . . . . . . . . . 2135--2142 S. J. Thomas and A. V. Malevsky and M. Desgagne and R. Benoit and P. Pellerin and M. Valin Massively parallel implementation of the mesoscale compressible community model 2143--2160 R. Skålin and D. Bjørge Implementation and performance of a parallel version of the HIRLAM limited area atmospheric model . . . . . . . . . 2161--2172 J. Michalakes MM90: a scalable parallel implementation of the Penn State/NCAR Mesoscale Model (MM5) . . . . . . . . . . . . . . . . . 2173--2186 Donald Dabdub and Rajit Manohar Performance and portability of an air quality model . . . . . . . . . . . . . 2187--2200 M. Ashworth and F. Foelkel and V. Gülzow and K. Kleese and D. P. Eppel and H. Kapitza and S. Unger Parallelization of the GESIMA mesoscale atmospheric model . . . . . . . . . . . 2201--2213 Ulrich Schättler and Elisabeth Krenzien Parallel `Deutschland-Modell' --- a message-passing version for distributed memory computers . . . . . . . . . . . . 2215--2226 Alan J. Wallcraft and Daniel R. Moore The NRL layered ocean model . . . . . . 2227--2242 A. Sathye and M. Xue and G. Bassett and K. Droegemeier Parallel weather modeling with the advanced regional prediction system . . 2243--2256
Thomas H. Cormen and David M. Nicol Performing out-of-core FFTs on parallel disk systems . . . . . . . . . . . . . . 5--20 Peter Triantafillou and Christos Faloutsos Overlay striping and optimal parallel I/O for modern applications . . . . . . 21--43 Daniel A. Ford and Robert J. T. Morris and Alan E. Bell Redundant arrays of independent libraries (RAIL): the StarFish tertiary storage system . . . . . . . . . . . . . 45--64 Carter T. Shock and Chialin Chang and Bongki Moon and Anurag Acharya and Larry Davis and Joel Saltz and Alan Sussman Design and evaluation of a high-performance earth science database 65--89 Shahram Ghandeharizadeh and Richard Muntz Design and implementation of scalable continuous media servers . . . . . . . . 91--122 Leana Golubchik and John C. S. Lui and Maria Papadopouli Survey of approaches to fault tolerant design of VOD servers: techniques, analysis and comparison . . . . . . . . 123--155 Ann L. Chervenak Challenges for tertiary storage in multimedia servers . . . . . . . . . . . 157--176
Manu Konchady and Arun Sood and Paul S. Schopf Implementation and performance evaluation of a parallel ocean model . . 181--203 Kangwoo Lee and Michel Dubois Empirical models of miss rates . . . . . 205--219 Luis Díaz de Cerio and Miguel Valero-García and Antonio González Method for exploiting communication/computation overlap in hypercubes . . . . . . . . . . . . . . . 221--245 Michael E. Houle and Gavin Turner Dimension-exchange token distribution on the mesh and the torus . . . . . . . . . 247--265 Jelena Mišić Unicast-based multicast algorithm in wormhole-routed star graph interconnection networks . . . . . . . . 267--286 K. Sumiyoshi and T. Ebisuzaki Performance of parallel solution of a block-tridiagonal linear system on Fujitsu VPP500 . . . . . . . . . . . . . 287--304 S. V. Kuznetsov Orthogonal reduction of dense matrices to bidiagonal form on computers with distributed memory architectures . . . . 305--313
Piyush Mehrotra and John Van Rosendale and Hans Zima High Performance Fortran: History, status and future . . . . . . . . . . . 325--354 Henk J. Sips and Will Denissen and Kees van Reeuwijk Analysis of local enumeration and storage schemes in HPF . . . . . . . . . 355--382 Michael Gerndt High-level programming of massively parallel computers based on shared virtual memory . . . . . . . . . . . . . 383--400 Brian Armstrong and Seon Wook Kim and Insung Park and Michael Voss and Rudolf Eigenmann Compiler-based tools for analyzing parallel programs . . . . . . . . . . . 401--420 Pierre Boulet and Alain Darte and Georges-André Silber and Frédéric Vivien Loop parallelization algorithms: From parallelism extraction to code generation . . . . . . . . . . . . . . . 421--444 Amy W. Lim and Monica S. Lam Maximizing parallelism and minimizing synchronization with affine partitions 445--475 Trung N. Nguyen and Zhiyuan Li Interprocedural analysis for loop scheduling and data allocation . . . . . 477--504 Wolfram Amme and Eberhard Zehendner Data dependence analysis in programs with pointers . . . . . . . . . . . . . 505--525 Lawrence Rauchwerger Run-time parallelization: Its time has come . . . . . . . . . . . . . . . . . . 527--556 Eduard Ayguadé and Jordi Garcia and Ulrich Kremer Tools and techniques for automatic data layout: a case study . . . . . . . . . . 557--578 Hironori Kasahara and Akimasa Yoshida A data-localization compilation scheme using partial-static task assignment for Fortran coarse-grain parallel processing 579--596 M. Kandemir and A. Choudhary and J. Ramanujam and R. Bordawekar Compilation techniques for out-of-core parallel computations . . . . . . . . . 597--628 B. Creusillet and F. Irigoin Interprocedural analyses of Fortran programs . . . . . . . . . . . . . . . . 629--648 Vincent Lefebvre and Paul Feautrier Automatic storage management for parallel programs . . . . . . . . . . . 649--671
A. Averbuch and L. Ioffe and M. Israeli and L. Vozovoi Two-dimensional parallel solver for the solution of Navier--Stokes equations with constant and variable coefficients using ADI on cells . . . . . . . . . . . 673--699 C. Ceron and J. Dopazo and E. L. Zapata and J. M. Carazo and O. Trelles Parallel implementation of DNAml program on message-passing architectures . . . . 701--716 P. Fisette and J. M. Péterkenne Contribution to parallel and vector computation in multibody dynamics . . . 717--728 A. A. Mirin and D. E. Shumaker and M. F. Wehner Efficient filtering techniques for finite-difference atmospheric general circulation models on parallel processors . . . . . . . . . . . . . . . 729--740 R. Aversa and A. Mazzeo and N. Mazzocca and U. Villano Developing applications for heterogeneous computing environments using simulation: a case study . . . . . 741--761 Mostafa M. Aref and Mohammed A. Tayyib Lana-Match algorithm: a parallel version of the Rete-Match algorithm . . . . . . 763--775 D. J. Evans and M. Barulli BSP linear solvers for dense matrices 777--795 Ananth Grama and Vipin Kumar and Ahmed Sameh Scalable parallel formulations of the Barnes--Hut method for $n$-body simulations . . . . . . . . . . . . . . 797--822 Zdeněk Hanzálek A parallel algorithm for gradient training of feedforward neural networks 823--839 Alain Jean-Marie and Sophie Lefebvre-Barbaroux and Zhen Liu An analytical approach to the performance evaluation of master--slave computational models . . . . . . . . . . 841--862 Zhen Liu Worst-case analysis of scheduling heuristics of parallel systems . . . . . 863--891 Piyush Maheshwari and Hong Shen An efficient clustering algorithm for partitioning parallel programs . . . . . 893--909 M. Marrocu and R. Scardovelli and P. Malguzzi Parallelization and performance of a meteorological limited area model . . . 911--922 Michael Mascagni Parallel linear congruential generators with prime moduli . . . . . . . . . . . 923--936 Tz. Ostromsky and P. C. Hansen and Z. Zlatev A coarse-grained parallel $QR$-factorization algorithm for sparse least squares problems . . . . . . . . . 937--964 Sung Kwon Kim Constant-time RMESH algorithms for the range minima and co-minima problems . . 965--977
F. Arbab and P. Ciancarini and C. Hankin Coordination languages for parallel programming . . . . . . . . . . . . . . 989--1004 Nicholas Carriero An implementation of Linda for a NUMA machine . . . . . . . . . . . . . . . . 1005--1021 Michel R. V. Chaudron and Arno C. N. van Duin The formal derivation of parallel triangular system solvers using a coordination-based design method . . . . 1023--1046 Lorenzo Donatiello and Alessandro Fabbri Generative coordination environments supporting parallel discrete event simulation . . . . . . . . . . . . . . . 1047--1080 Kees Everaars and Barry Koren Using coordination to parallelize sparse-grid methods for $3$-D CFD problems . . . . . . . . . . . . . . . . 1081--1106 Tom Holvoet and Thilo Kielmann Behaviour specification of parallel active objects . . . . . . . . . . . . . 1107--1135 George A. Papadopoulos Distributed and parallel systems engineering in MANIFOLD . . . . . . . . 1137--1160
Bouchaib Radi and Jean-François Estrade Adaptive parallelization techniques in global weather models . . . . . . . . . 1167--1175 Suchendra M. Bhandarkar and Salem Machaka and Sridhar Chirravuri and Jonathan Arnold Parallel computing for chromosome reconstruction via ordering of DNA sequences . . . . . . . . . . . . . . . 1177--1204 O. Benkahla and C. Aktouf and C. Robach Performance evaluation of distributed diagnosis algorithms in parallel systems 1205--1222 Elise de Doncker and Ajay Gupta Multivariate integration on hypercubic and mesh networks . . . . . . . . . . . 1223--1244 Qian-Ping Gu and Shietung Peng Node-to-set and set-to-set cluster fault tolerant routing in hypercubes . . . . . 1245--1261 Shahram Latifi and Pradip K. Srimani Wormhole broadcast in star graph networks . . . . . . . . . . . . . . . . 1263--1276
Mahlon Stacy and Dennis Hanson and Jon Camp and Richard A. Robb High performance computing in biomedical imaging research . . . . . . . . . . . . 1287--1321 Robert L. Galloway and W. Andrew Bass and Christopher E. Hockey Task-oriented asymmetric multiprocessing for interactive image-guided surgery . . 1323--1343 Simon K. Warfield and Ferenc A. Jolesz and Ron Kikinis A high performance computing approach to the registration of medical imaging data 1345--1368 Gary E. Christensen MIMD vs. SIMD parallel processing: a case study in $3$D medical image registration . . . . . . . . . . . . . . 1369--1383 Craig M. Wittenbrink Extensions to permutation warping for parallel volume rendering . . . . . . . 1385--1406 Chris Basoglu and Ravi Managuli and George York and Yongmin Kim Computing requirements of modern medical diagnostic ultrasound machines . . . . . 1407--1431 Paul Schimpf and Jens Haueisen and Ceon Ramon and Hannes Nowak Realistic computer modelling of electric and magnetic fields of human head and torso . . . . . . . . . . . . . . . . . 1433--1460 C. Laurent and F. Peyrin and J-M Chassery and M. Amiel Parallel image reconstruction on MIMD computers for three-dimensional cone-beam tomography . . . . . . . . . . 1461--1479 Jens Gregor and Dean A. Huff A computational study of the focus-of-attention EM-ML algorithm for PET reconstruction . . . . . . . . . . . 1481--1497 Chung-Ming Chen An efficient four-connected parallel system for PET image reconstruction . . 1499--1522 Habib Zaidi and Claire Labbé and Christian Morel Implementation of an environment for Monte Carlo simulation of fully $3$-D positron tomography on a high-performance parallel platform . . . 1523--1536 Bjorn De Sutter and Mark Christiaens and Koen De Bosschere and Jan Van Campenhout On the use of subword parallelism in medical image processing . . . . . . . . 1537--1556 Yuan-Ping Pang and Stephen Brimijoin Supercomputing-based dimeric analog approach for drug optimization . . . . . 1557--1566 Todd E. Scheetz and Terry A. Braun and Kyle J. Munn and Edwin M. Stone and Val C. Sheffield and Thomas L. Casavant GenoMap: a distributed system for unifying genotyping and genetic linkage analysis . . . . . . . . . . . . . . . . 1567--1592
Craig Chase and Prakash Arunachalam and Jacob Abraham Memory distribution: Techniques and practice for CAD applications . . . . . 1597--1615 Jih-H. Chen and Shu-Yun Le and Bruce A. Shapiro and Jacob V. Maizel Optimization of an RNA folding algorithm for parallel architectures . . . . . . . 1617--1634 Paul Caprioli and Mark H. Holmes A parallel quasi-Newton method for Gaussian data fitting . . . . . . . . . 1635--1651 E. Bampis and C. Delorme and J.-C. König Optimal schedules for $d-D$ grid graphs with communication delays . . . . . . . 1653--1664 Cyril Fonlupt and Philippe Marquet and Jean-Luc Dekeyser Data-parallel load balancing strategies 1665--1684 G. Haase Parallel incomplete Cholesky preconditioners based on the non-overlapping data distribution . . . 1685--1703
Greg Eisenhauer and Beth Plale and Karsten Schwan DataExchange: high performance communications in distributed laboratories . . . . . . . . . . . . . . 1713--1733 Ian Foster and Jonathan Geisler and William Gropp and Nicholas Karonis and Ewing Lusk and George Thiruvathukal and Steven Tuecke Wide-area implementation of the Message Passing Interface . . . . . . . . . . . 1735--1749 Matthias Brune and Jorn Gehring and Axel Keller and Burkhard Monien and Friedhelm Ramme and Alexander Reinefeld Specifying resources and services in metacomputing environments . . . . . . . 1751--1776 Henri Casanova and Jack Dongarra Using agent-based software for scientific computing in the NetSolve system . . . . . . . . . . . . . . . . . 1777--1790 Roy Williams and Bruce Sears A high-performance active digital library . . . . . . . . . . . . . . . . 1791--1806 A. W. van Halderen and B. J. Overeinder and P. M. A. Sloot and R. van Dantzig and D. H. J. Epema and M. Livny Hierarchical resource management in the Polder Metacomputing Initiative . . . . 1807--1825 Timothy J. Sheehan and William A. Shelton and Thomas J. Pratt and Philip M. Papadopoulos and Philip LoCascio and Thomas H. Dunigan The locally self-consistent multiple scattering code in a geographically distributed linked MPP environment . . . 1827--1846 Th Eickermann and J. Henrichs and M. Resch and R. Stoy and R. Volpel Metacomputing in gigabit environments: networks, tools, and applications . . . 1847--1872 Sharon Brunett and Thomas Gottschalk A large-scale metacomputing framework for the ModSAF real-time simulation . . 1873--1900 K. Mani Chandy and Joseph Kiniry and Adam Rifkin and Daniel Zimmerman A framework for structured distributed object computing . . . . . . . . . . . . 1901--1922
C. Vuik and R. R. P. van Nooyen and P. Wesseling Parallelism in ILU-preconditioned GMRES 1927--1946 Jonathan M. D. Hill and Bill McColl and Dan C. Stefanescu and Mark W. Goudreau and Kevin Lang and Satish B. Rao and Torsten Suel and Thanasis Tsantilas and Rob H. Bisseling BSPlib: The BSP programming library . . 1947--1980 Alina N. Moga and Bogdan Cramariuc and Moncef Gabbouj Parallel watershed transformation algorithms for image segmentation . . . 1981--2001 E. G. Talbi and Z. Hafidi and J-M. Geib A parallel adaptive tabu search approach 2003--2019 L. K. Lundin Computing the velocity of a rotating flow . . . . . . . . . . . . . . . . . . 2021--2034 Ravi Prakash and Dhabaleswar K. Panda Designing communication strategies for heterogeneous parallel systems . . . . . 2035--2052 B. Ciciani and M. Colajanni and C. Paolucci Performance evaluation of deterministic wormhole routing in $k$-ary $n$-cubes 2053--2075 Kuo-Pao Fan and Chung-Ta King Efficient barrier synchronization in wormhole-routed mesh networks supporting turn model . . . . . . . . . . . . . . . 2077--2099 Weng-Long Chang and Chih-Ping Chu The extension of the $I$ test . . . . . 2101--2127 Jos B. T. M. Roerdink and Michel A. Westenberg Data-parallel tomographic reconstruction: a comparison of filtered backprojection and direct Fourier reconstruction . . . . . . . . . . . . . 2129--2142 Joe Shang-Chieh Wu and Ying-Dar Lin An efficient and orderly implementation of bypass queue under bursty traffic . . 2143--2148
Jacques Verriet Scheduling interval-ordered tasks with non-uniform deadlines subject to non-zero communication delays . . . . . 3--21 Rolf H. Möhring and Markus W. Schäffter Scheduling series--parallel orders subject to $0/1$-communication delays 23--40 Alix Munier Approximation algorithms for scheduling trees with general communication delays 41--48 A. K. Amoura and E. Bampis and Y. Manoussakis and Zs. Tuza A comparison of heuristics for scheduling multiprocessor tasks on three dedicated processors . . . . . . . . . . 49--61 Cristina Boeres and Vinod E. F. Rebello A versatile cost modelling approach for multicomputer task scheduling . . . . . 63--86 Jacek Błażewicz and Maciej Drozdowski and Mariusz Markiewicz Divisible task scheduling --- Concept and verification . . . . . . . . . . . . 87--98
Christoph W. Keßler and Jesper Larsson Träff Language and library support for practical PRAM programming . . . . . . . 105--135 Horng-Ren Tsai and Shi-Jinn Horng and Tzong-Wann Kao and Shung-Shing Lee and Shun-Shan Tsai Fundamental data movement operations and its applications on a hyper-bus broadcast network . . . . . . . . . . . 137--157 Danny Krizanc and Anton Saarimaki Bulk synchronous parallel: practical experience with a model for parallel computing . . . . . . . . . . . . . . . 159--181 S. W. Chen and C. Y. Fang and K. E. Chang Neural simulation of Petri nets . . . . 183--207
Ravi Murty and Daniel Okunbor Efficient parallel algorithms for molecular dynamics simulations . . . . . 217--230 Vikramaditya Sen and Mrinal K. Sen and Paul L. Stoffa PVM based $3$-D Kirchhoff depth migration using dynamically computed travel-times: an application in seismic data processing . . . . . . . . . . . . 231--248 Mohamed Benmaiza and Abderezak Touzene One-to-all broadcast algorithm for constant degree 4 Cayley graphs . . . . 249--264 Cristina Corral and Isabel Giménez and José Marín and José Mas Parallel $m$-step preconditioners for the conjugate gradient method . . . . . 265--281 Sunil Kim and Alexander V. Veidenbaum Interconnection network organization and its impact on performance and cost in shared memory multiprocessors . . . . . 283--309 J. S. Reeve and M. Heath An efficient parallel version of the householder-QL matrix diagonalisation algorithm . . . . . . . . . . . . . . . 311--319 I. Vlahavas and P. Kefalas and C. Halatsis OASys: an AND/OR parallel logic programming system . . . . . . . . . . . 321--336
Edmund Chadwick A hybrid parallel algorithm for the spectral transform method which uses functional parallelism . . . . . . . . . 345--360 T. C. Clune and J. R. Elliott and M. S. Miesch and J. Toomre and G. A. Glatzmaier Computational aspects of a code to study rotating turbulent convection in spherical shells . . . . . . . . . . . . 361--380 Maciej Drozdowski and Włodzimierz Glazek Scheduling divisible loads in a three-dimensional mesh of processors . . 381--404 Akihiro Fujiwara and Michiko Inoue and Toshimitsu Masuzawa and Hideo Fujiwara A cost optimal parallel algorithm for weighted distance transforms . . . . . . 405--416 Y. F. Hu and R. J. Blake An improved diffusion algorithm for dynamic load balancing . . . . . . . . . 417--444 Zhiyong Liu and David W. Cheung Oblivious routing for LC permutations on hypercubes . . . . . . . . . . . . . . . 445--460 Roseli S. Wedemann and Valmir C. Barbosa and Raul Donangelo Defeasible time-stepping . . . . . . . . 461--489
Nicholas Giolmas and Daniel W. Watson and David M. Chelberg and Peter V. Henstock and June Ho Yi and Howard Jay Siegel Aspects of computational mode and data distribution for parallel range image segmentation . . . . . . . . . . . . . . 499--523 U. W. Rathe and P. Sanders and P. L. Knight A case study in scalability: An ADI method for the two-dimensional time-dependent Dirac equation . . . . . 525--533 H. Schwichtenberg and G. Winter and H. Wallmeier Acceleration of molecular mechanic simulation by parallelization and fast multipole techniques . . . . . . . . . . 535--546 Pierre Boulet and Jack Dongarra and Yves Robert and Frédéric Vivien Static tiling for heterogeneous computing platforms . . . . . . . . . . 547--568 W. Cai and K. Zhang and S. J. Turner and C. Sun Interlock avoidance in transparent and dynamic parallel program instrumentation using logical clocks . . . . . . . . . . 569--591 Giuseppe Passoni and Giancarlo Alfonsi and Giovanni Tula and Umberto Cardu A wavenumber parallel computational code for the numerical integration of the Navier--Stokes equations . . . . . . . . 593--611 M. Szularz and J. Weston and M. Clint Explicitly restarted Lanczos algorithms in an MPP environment . . . . . . . . . 613--631
Angelo Corana Parallel computation of the correlation dimension from a time series . . . . . . 639--666 Hermann Mierendorff and Helmut Schwamborn Automatic model generation for performance estimation of parallel programs . . . . . . . . . . . . . . . . 667--680 Zhong-Zhi Bai A class of asynchronous parallel multisplitting blockwise relaxation methods . . . . . . . . . . . . . . . . 681--701 S. Ramesh Implementation of communicating reactive processes . . . . . . . . . . . . . . . 703--727 Reiji Suda and Akira Nishida and Yoshio Oyanagi A high performance parallelization scheme for the Hessenberg double shift $QR$ algorithm . . . . . . . . . . . . . 729--744 Franco Zambonelli Exploiting biased load information in direct-neighbour load balancing policies 745--766 R. S. Wedemann and V. C. Barbosa and R. Donangelo Erratum to ``Defeasible time-stepping'' [Parallel Computing 25 (4) (April 1999) pp. 461--489] . . . . . . . . . . . . . 767--767
Anonymous Parallelization techniques for numerical modelling . . . . . . . . . . . . . . . 775--776 Gerhard Adrian Parallel processing in regional climatology: The parallel version of the ``Karlsruhe Atmospheric Mesoscale Model'' (KAMM) . . . . . . . . . . . . . 777--787 Ralf Diekmann and Andreas Frommer and Burkhard Monien Efficient schemes for nearest neighbor load balancing . . . . . . . . . . . . . 789--812 Ralf Ebner and Christoph Zenger A distributed functional framework for recursive finite element simulations . . 813--826 Michael Griebel and Gerhard Zumbusch Parallel multigrid in an adaptive PDE solver based on hashing and space-filling curves . . . . . . . . . . 827--843 Bruno Lang Efficient eigenvalue and singular value computations on shared memory machines 845--860 Ingrid Lenhardt and Thomas Rottner Krylov subspace methods for structural finite element analysis . . . . . . . . 861--875 Thomas Lippert Hyper-systolic algorithms for $N$-body computations and parallel level-$3$ BLAS libraries . . . . . . . . . . . . . . . 877--891 Wolfgang Mackens and Heinrich Voss General masters in parallel condensation of eigenvalue problems . . . . . . . . . 893--903 Reinhard Möller A systolic implementation of the MLEM reconstruction algorithm for positron emission tomography images . . . . . . . 905--920
S. J. Dodson and S. P. Walker and M. J. Bluck Parallelisation issues for high speed time domain integral equation analysis 925--942 W.-Y. Lin and C.-L. Chen Minimum communication cost reordering for parallel sparse Cholesky factorization . . . . . . . . . . . . . 943--967 B. Großer and B. Lang Efficient parallel reduction to bidiagonal form . . . . . . . . . . . . 969--986 G. S. Brodal Priority queues on parallel machines . . 987--1011 P. Sanders Analysis of nearest neighbor load balancing algorithms for random loads 1013--1033 D. Barth and C. Laforest Scattering and multi-scattering in trees and meshes, with local routing and without buffering . . . . . . . . . . . 1035--1057
M. E. Barrows and D. E. Gregory and L. Gao and A. L. Rosenberg and P. R. Cohen An empirical study of dynamic scheduling on rings of processors . . . . . . . . . 1063--1079 J. Yamamoto and others Performance evaluation of SNAIL: a multiprocessor based on the simple serial synchronized multistage interconnection network architecture . . 1081--1103 G.-H. Hwang and J. K. Lee Communication set generations with CSD calculus and expression-rewriting framework . . . . . . . . . . . . . . . 1105--1130 A. Clematis and A. Corana Modeling performance of heterogeneous parallel computing systems . . . . . . . 1131--1145 E. J. Kontoghiorghes and M. Clint and H.-H. Naegeli Recursive least-squares using a hybrid Householder algorithm on massively parallel SIMD systems . . . . . . . . . 1147--1159 G. Edjlali and M. Garbey and D. Tromeur-Dervout Interoperability parallel programs approach to simulate $3$D frontal polymerization processes . . . . . . . . 1161--1191
N. Cabibbo and Y. Iwasaki and K. Schilling High performance computing in lattice QCD . . . . . . . . . . . . . . . . . . 1197--1198 R. Gupta General physics motivations for numerical simulations of quantum field theory . . . . . . . . . . . . . . . . . 1199--1215 F. Rapuano Quenched physics on APE computers . . . 1217--1226 Stephan Güsken and Thomas Lippert and Klaus Schilling Lattice QCD with two dynamical Wilson fermions on APE100 parallel systems . . 1227--1242 S. Aoki and others Performance of lattice QCD programs on CP-PACS . . . . . . . . . . . . . . . . 1243--1255 Akira Ukawa Lattice QCD results from the CP-PACS computer . . . . . . . . . . . . . . . . 1257--1280 Robert D. Mawhinney The 1 Teraflops QCDSP computer . . . . . 1281--1296 R. Tripiccione APEmille . . . . . . . . . . . . . . . . 1297--1309 A. D. Kennedy The Hybrid Monte Carlo algorithm on parallel computers . . . . . . . . . . . 1311--1339 Philippe de Forcrand The MultiBoson method . . . . . . . . . 1341--1355 Th. Lippert Parallel SSOR preconditioning for lattice QCD . . . . . . . . . . . . . . 1357--1370 Stephan Güsken Stochastic estimator techniques and their implementation on distributed parallel computers . . . . . . . . . . . 1371--1381 G. Peter Lepage Improved discretizations for lattice QCD 1383--1393 Robert G. Edwards and Urs M. Heller and Rajamani Narayanan Chiral fermions on the lattice . . . . . 1395--1407
V. Annamalai and C. S. Krishnamoorthy and V. Kamakoti Adaptive finite element analysis on a parallel and distributed environment . . 1413--1434 G. Carré and S. Lanteri and Mark Loriot High performance simulations of compressible flows inside car engine geometries using the N3S-NATUR parallel solver . . . . . . . . . . . . . . . . . 1435--1458 Myron Ginsberg Influences, challenges, and strategies for automotive HPC benchmarking and performance improvement . . . . . . . . 1459--1476 S. Loucif and M. Ould-Khaoua and L. M. Mackenzie Analysis of fully adaptive wormhole routing in tori . . . . . . . . . . . . 1477--1487 Max Geigl and Martin Griebl and Christian Lengauer Termination detection in parallel loop nests with while loops . . . . . . . . . 1489--1510
Erich Strohmaier and Jack J. Dongarra and Hans W. Meuer and Horst D. Simon The marketplace of high-performance computing . . . . . . . . . . . . . . . 1517--1544 Yoshio Oyanagi Development of supercomputers in Japan: Hardware and software . . . . . . . . . 1545--1567 Enrico Clementi and Giorgina Corongiu Early parallelism with a loosely coupled array of processors: The ICAP experiment 1583--1600 Shunichi Uchida and Akira Aiba and Kazuaki Rokusawa and Takashi Chikayama and Ryuzo Hasegawa The parallel logic programming system in the FGCS project and its future directions . . . . . . . . . . . . . . . 1601--1633 Kisaburo Nakazawa and Hiroshi Nakamura and Taisuke Boku and Ikuo Nakata and Yoshiyuki Yamashita CP-PACS: a massively parallel processor at the University of Tsukuba . . . . . . 1635--1661 D. Sugimoto GRAPE: a parallel computer dedicated to astrophysical many-body problems . . . . 1663--1676 Paolo Cremonesi and Emilia Rosti and Giuseppe Serazzi and Evgenia Smirni Performance evaluation of parallel systems . . . . . . . . . . . . . . . . 1677--1698 V. S. Sunderam and G. A. Geist Heterogeneous parallel and distributed computing . . . . . . . . . . . . . . . 1699--1721 A. P. Willem Böhm and Jeffrey P. Hammes and Sumit S. Sur On the performance of pure and impure parallel functional programs . . . . . . 1723--1740 Rajiv Gupta and Santosh Pande and Kleanthis Psarris and Vivek Sarkar Compilation techniques for parallel systems . . . . . . . . . . . . . . . . 1741--1783 Siegfried Benkner and Hans Zima Compiling High Performance Fortran for distributed-memory architectures . . . . 1785--1825 B. Bacci and M. Danelutto and S. Pelagatti and M. Vanneschi SkIE: a heterogeneous environment for HPC applications . . . . . . . . . . . . 1827--1852 David E. Womble and others Massively parallel computing: A Sandia perspective . . . . . . . . . . . . . . 1853--1876 S. Lakshmivarahan and Sudarshan K. Dhall Ring, torus and hypercube architectures/algorithms for parallel computing . . . . . . . . . . . . . . . 1877--1906 Walid A. Najjar and Edward A. Lee and Guang R. Gao Advances in the dataflow computational model . . . . . . . . . . . . . . . . . 1907--1929 Iain S. Duff and Henk A. van der Vorst Developments and trends in the parallel solution of linear systems . . . . . . . 1931--1970 E. L. Zapata and O. Plata and R. Asenjo and G. P. Trabado Data-parallel support for numerical irregular problems . . . . . . . . . . . 1971--1994 Shun Doi and Takumi Washio Ordering strategies and related techniques to overcome the trade-off between parallelism and convergence in incomplete factorizations . . . . . . . 1995--2014 Clemens-August Thole and Klaus Stüben Industrial simulation on parallel computers . . . . . . . . . . . . . . . 2015--2037 Tayfun Tezduyar and Yasuo Osawa Methods for parallel computation of complex flow problems . . . . . . . . . 2039--2066 Richard A. Robb Visualization in biomedical computing 2067--2110 Kenneth C. Bowler and Anthony J. G. Hey Parallel computing and quantum chromodynamics . . . . . . . . . . . . . 2111--2134 Hermann Mierendorff and Wolfgang Joppich Empirical performance modeling for parallel weather prediction codes . . . 2135--2148 Stavros A. Zenios High-performance computing in finance: The last 10 years and the next . . . . . 2149--2175 Andreas Reuter Methods for parallel execution of complex database queries . . . . . . . . 2177--2188 Anonymous Index . . . . . . . . . . . . . . . . . 2189--2196
G. Ch. Pflug and A. Świętanowski Selected parallel optimization methods for financial management under uncertainty . . . . . . . . . . . . . . 3--25 Benoît Bourbeau and Teodor Gabriel Crainic and Bernard Gendron Branch-and-bound parallelization strategies applied to a depot location and container fleet management problem 27--46 Ricardo C. Corrêa A parallel approximation scheme for the multiprocessor scheduling problem . . . 47--72 Stella C. S. Porto and João Paulo F. W. Kitajima and Celso C. Ribeiro Performance evaluation of a parallel tabu search task scheduling algorithm 73--90 Michel Toulouse and Teodor Gabriel Crainic and K. Thulasiraman Global optimization properties of parallel cooperative search algorithms: a simulation study . . . . . . . . . . . 91--112 D. G. Morales and others Parallel dynamic programming and automata theory . . . . . . . . . . . . 113--134 M. D. Durand and Steve R. White Trading accuracy for speed in parallel simulated annealing with simultaneous moves . . . . . . . . . . . . . . . . . 135--150 I. Maros and G. Mitra Investigating the sparse simplex algorithm on a distributed memory multiprocessor . . . . . . . . . . . . . 151--170
Mohammed Atiquzzaman and Pradip K. Srimani Parallel computing on clusters of workstations . . . . . . . . . . . . . . 175--177 W.-M. Lin and W. Xie Load-skewing task assignment to minimize communication conflicts on network of workstations . . . . . . . . . . . . . . 179--197 Stephen R. Donaldson and Jonathan M. D. Hill and David B. Skillicorn BSP clusters: High performance, reliable and very low cost . . . . . . . . . . . 199--242 Ron Brightwell and others Massively parallel computing using commodity components . . . . . . . . . . 243--266 N. Melab and E.-G. Talbi Parallel adaptive computing on meta-systems including NOWs . . . . . . 267--284 John C. Chu and Patrick W. Dowd Adaptive cache coherence over a high bandwidth broadband mesh network . . . . 285--311 Edward K. Blum and Xin Wang and Patrick Leung Architectures and message-passing algorithms for cluster computing: Design and performance . . . . . . . . . . . . 313--332 G. Chiola and G. Ciaccio Efficient parallel processing on low-cost clusters with GAMMA active ports . . . . . . . . . . . . . . . . . 333--354 Yung-Lin Liu and Chung-Ta King EXPLORER: Supporting run-time parallelization of DOACROSS loops on general networks of workstations . . . . 355--375
N. Marco and S. Lanteri A two-level parallelization strategy for Genetic Algorithms applied to optimum shape design . . . . . . . . . . . . . . 377--397 Moez Ayed and Jean-Luc Gaudiot An efficient heuristic for code partitioning . . . . . . . . . . . . . . 399--426 Peter K. K. Loh and Wen Jing Hsu The Josephus cube: a novel interconnection network . . . . . . . . 427--453 Pao-Hwa Sui and Sheng-De Wang A fault-tolerant routing algorithm for wormhole routed meshes . . . . . . . . . 455--465 Taesoon Park and Heon Y. Yeom Application controlled checkpointing coordination for fault-tolerant distributed computing systems . . . . . 467--482 Costas S. Iliopoulos and James F. Reid Optimal parallel analysis and decomposition of partially occluded strings . . . . . . . . . . . . . . . . 483--494 A. Bevilacqua and E. Loli Piccolomini Parallel image restoration on parallel and distributed computers . . . . . . . 495--506
Erricos John Kontoghiorghes and Anna Nagurney and Berç Rustem Parallel computing in economics, finance and decision-making . . . . . . . . . . 507--509 S. A. MirHassani and C. Lucas and G. Mitra and E. Messina and C. A. Poojari Computational solution of capacity planning models under uncertainty . . . 511--538 G. Zanghirati and F. Cocco and G. Paruolo and F. Taddei A Cray T3E implementation of a parallel stochastic dynamic assets and liabilities management model . . . . . . 539--567 Cyril Godart Parallel implementation of a two-factor Cheyette-beta model calibration . . . . 569--586 Rodolphe Chatagny and Bastien Chopard A parallel model for the foreign exchange market . . . . . . . . . . . . 587--600 F. O. Bunnin and Y. Guo and Y. Ren and J. Darlington Design of high performance financial modelling environment . . . . . . . . . 601--622 S. C. Perry and R. H. Grimwood and D. J. Kerbyson and E. Papaefstathiou and G. R. Nudd Performance optimization of financial option calculations . . . . . . . . . . 623--639 Jenny X. Li and Gary L. Mullen Parallel computing of a quasi-Monte Carlo algorithm for valuing derivatives 641--653 Elias S. Manolakos and Haris M. Stellakis Systematic synthesis of parallel architectures for the computation of higher order cumulants . . . . . . . . . 655--676
E. W. Evans and S. P. Johnson and P. F. Leggett and M. Cross Automatic and effective multi-dimensional parallelisation of structured mesh based codes . . . . . . 677--703 R. Keppens and G. Tóth Using high performance Fortran for magnetohydrodynamic simulations . . . . 705--722 Keqin Li and Yi Pan and Mounir Hamdi Solving graph theory problems using reconfigurable pipelined optical buses 723--735 Arjen Schoneveld and Peter M. A. Sloot and Martin Lees and Erwan Karyadi A framework for dynamic load balancing: a case study on explosive containment simulation . . . . . . . . . . . . . . . 737--751 C. Rodríguez and J. L. Roda and F. Sande and D. G. Morales and F. Almeida A new parallel model for the analysis of asynchronous algorithms . . . . . . . . 753--767 Huan-Chao Keh and Jen-Chih Lin On fault-tolerant embedding of Hamiltonian cycles, linear arrays and rings in a Flexible Hypercube . . . . . 769--781 Jan Trdlička and Pavel Tvrdík Embedding complete $k$-ary trees into $k$-square $2$D meshes with optimal edge congestion . . . . . . . . . . . . . . . 783--790 Shijun Diao and T. Fujiwara Evaluation and strategy of different data parallel implementation methods of a stiff chemical non-equilibrium flow solver . . . . . . . . . . . . . . . . . 791--804 J. G. Liu and F. H. Y. Chan and F. K. Lam and H. F. Li A new approach to fast calculation of moments of $3$-D gray level images . . . 805--815
Jerzy Leszczynski Computational chemistry . . . . . . . . 817--818 Wanda Andreoni and Alessandro Curioni New advances in chemistry and materials science with CPMD and parallel computing 819--842 C. P. Sosa and G. Scalmani and R. Gomperts and M. J. Frisch Ab initio quantum chemistry on a ccNUMA architecture using openMP. III . . . . . 843--856 John D. Watts Parallel algorithms for coupled-cluster methods . . . . . . . . . . . . . . . . 857--867 Ross H. Nobes and Alistair P. Rendell and Jarek Nieplocha Computational chemistry on Fujitsu vector-parallel processors: Hardware and programming environment . . . . . . . . 869--886 Alistair P. Rendell and others Computational chemistry on Fujitsu vector-parallel processors: Development and performance of applications software 887--911 Piotr Piecuch and Joseph I. Landman Parallelization of multi-reference coupled-cluster method . . . . . . . . . 913--943 David E. Bernholdt Scalability of correlated electronic structure calculations on parallel computers: a case study of the RI-MP2 method . . . . . . . . . . . . . . . . . 945--963 Dennis M. Newns and others Molecular dynamics study of structure and gating of low molecular weight ion channels . . . . . . . . . . . . . . . . 965--976 Barry Robson Simplified models of protein folding exploiting the Lagrange radius of gyration of the hydrophobic component 977--998 Jacek Komasa and Jacek Rychlewski Solving quantum-mechanical problems on parallel systems . . . . . . . . . . . . 999--1009 Jon Baker and Matt Shirel Ab initio quantum chemistry on PC-based parallel supercomputers . . . . . . . . 1011--1024 Marc Pavese and Soonmin Jang and Gregory A. Voth Centroid molecular dynamics: a quantum dynamics method suitable for the parallel computer . . . . . . . . . . . 1025--1041 Leonid Gorb and Ilya Yanov and Jerzy Leszczynski High performance computing on the Cray T3E and IBM SP2 systems with the parallel version of GAUSSIAN 94 . . . . 1043--1060
Jacek Błażewicz and Klaus H. Ecker and Tao Yang New trends on scheduling in parallel and distributed systems . . . . . . . . . . 1061--1063 Jacques Verriet Scheduling outtrees of height one in the LogP model . . . . . . . . . . . . . . . 1065--1082 Welf Löwe and Wolf Zimmermann Scheduling balanced task-graphs to LogP-machines . . . . . . . . . . . . . 1083--1108 Tomasz Kalinowski and Iskander Kort and Denis Trystram List scheduling of general task graphs under LogP . . . . . . . . . . . . . . . 1109--1128 Chams Lahlou Approximation algorithms for scheduling with a limited number of communications 1129--1162 Philippe Chrétienne On Graham's bound for cyclic scheduling 1163--1174 Alain Darte On the complexity of loop fusion . . . . 1175--1193 Jacek Błażewicz and Maciej Drozdowski and Piotr Formanowicz and Wiesław Kubiak and Günter Schmidt Scheduling preemptable tasks on parallel processors with limited availability . . 1195--1211 Luis Miguel Campos and Isaac D. Scherson Rate of change load balancing in distributed and parallel systems . . . . 1213--1230
Alan D. George and Jeff Markwell and Ryan Fogarty Real-time sonar beamforming on high-performance distributed computers 1231--1252 J. Chassin de Kergommeaux and B. Stein and P. E. Bernard Pajé, an interactive visualization tool for tuning multi-threaded parallel applications . . . . . . . . . . . . . . 1253--1274 Weng-Long Chang and Chih-Ping Chu The infinity Lambda test: a multi-dimensional version of Banerjee infinity test . . . . . . . . . . . . . 1275--1295 David K. Lowenthal and Vincent W. Freeh Architecture-independent parallelism for both shared- and distributed-memory machines using the Filaments package . . 1297--1323 Minyi Guo and Ikuo Nakata and Yoshiyuki Yamashita Contention-free communication scheduling for array redistribution . . . . . . . . 1325--1343 Peter Benner and Ralph Byers and Enrique S. Quintana-Ortí and Gregorio Quintana-Ortí Solving algebraic Riccati equations on parallel computers using Newton's method with exact line search . . . . . . . . . 1345--1368
Peiyi Tang and Jingling Xue Generating efficient tiled code for distributed memory machines . . . . . . 1369--1410 Sajal K. Das and M. Cristina Pinotti Parallel priority queues based on binomial heaps . . . . . . . . . . . . . 1411--1428 Clémentin Tayou Djamégni and Patrice Quinton and Sanjay Rajopadhye and Tanguy Risset Derivation of systolic algorithms for the algebraic path problem by recurrence transformations . . . . . . . . . . . . 1429--1445 M. Manzur Murshed and Richard P. Brent Adaptive $AT^2$ optimal algorithms on reconfigurable meshes . . . . . . . . . 1447--1458 Tzung-Shi Chen and Nen-Chung Wang and Chih-Ping Chu Multicast communication in wormhole-routed star graph interconnection networks . . . . . . . . 1459--1490 J. A. Bakker Semantic partitioning as a basis for parallel I/O in database management systems . . . . . . . . . . . . . . . . 1491--1513
Rupak Biswas and Bruce Hendrickson and George Karypis Graph partitioning and parallel computing . . . . . . . . . . . . . . . 1515--1517 Bruce Hendrickson and Tamara G. Kolda Graph partitioning models for parallel computing . . . . . . . . . . . . . . . 1519--1534 N. Touheed and P. Selwood and P. K. Jimack and M. Berzins A comparison of some dynamic load-balancing algorithms for a parallel adaptive flow solver . . . . . . . . . . 1535--1554 Ralf Diekmann and Robert Preis and Frank Schlimbach and Chris Walshaw Shape-optimized mesh partitioning and load balancing for parallel adaptive FEM 1555--1581 Leonid Oliker and Rupak Biswas and Harold N. Gabow Parallel tetrahedral mesh adaptation with dynamic load balancing . . . . . . 1583--1608 Burkhard Monien and Robert Preis and Ralf Diekmann Quality matching and local improvement for multilevel graph-partitioning . . . 1609--1634 C. Walshaw and M. Cross Parallel optimisation algorithms for multilevel mesh partitioning . . . . . . 1635--1660 J. Rantakokko Partitioning strategies for structured multiblock grids . . . . . . . . . . . . 1661--1680
J. Chassin de Kergommeaux and P. J. Hatcher and L. Rauchwerger Parallel computing for irregular applications . . . . . . . . . . . . . . 1681--1684 Manuel Hermenegildo Parallelizing irregular and pointer-based computations automatically: Perspectives from logic and constraint programming . . . . . . . 1685--1708 E. Gutiérrez and R. Asenjo and O. Plata and E. L. Zapata Automatic parallelization of irregular applications . . . . . . . . . . . . . . 1709--1738 F. Warren Burton and David J. Simpson Memory requirements for parallel programs . . . . . . . . . . . . . . . . 1739--1763 Andras Laszloffy and Jingping Long and Abani K. Patra Simple data management, scheduling and solution strategies for managing the irregularities in parallel adaptive hp finite element simulations . . . . . . . 1765--1788 Frédéric Brégier and Marie-Christine Counilh and Jean Roman Scheduling loops with partial loop-carried dependencies . . . . . . . 1789--1806 Thomas Brandes and Cécile Germain-Renaud A schedule cache for data parallel unstructured computations . . . . . . . 1807--1823 Thomas Decker Virtual data space --- load balancing for irregular applications . . . . . . . 1825--1860 Hwansoo Han and Chau-Wen Tseng Efficient compiler and run-time support for parallel irregular reductions . . . 1861--1887 P. Beraldi and L. Grandinetti and R. Musmanno and C. Triki Parallel algorithms to solve two-stage stochastic linear programs with robustness constraints . . . . . . . . . 1889--1908 C. S. Pua and M. H. Williams and D. H. Marwick Modelling parallel databases with process algebra . . . . . . . . . . . . 1909--1924 Ming-Yang Su and Hui-Ling Huang and Gen-Huey Chen and Dyi-Rong Duh Node-disjoint paths in incomplete WK-recursive networks . . . . . . . . . 1925--1944 Roman Trobec Two-dimensional regular $d$-meshes . . . 1945--1953 Anonymous Index . . . . . . . . . . . . . . . . . 1955--1962
O. Yaşar and Y. Deng and R. E. Tuzun and D. Saltz New trends in high performance computing 1--2 R. Clint Whaley and Antoine Petitet and Jack J. Dongarra Automated empirical optimizations of software and the ATLAS project . . . . . 3--35 Dinshaw S. Balsara and Charles D. Norton Highly parallel structured adaptive mesh refinement using parallel language-based approaches . . . . . . . . . . . . . . . 37--70 Reginald L. Walker Search engine case study: searching the Web using genetic programming and MPI 71--89 Yuefan Deng and Alex Korobka The performance of a supercomputer built with commodity components . . . . . . . 91--108 Michael D. Letherwood and David D. Gunter Ground vehicle modeling and simulation of military vehicles using high performance computing . . . . . . . . . 109--140 Ting Chen and Vladimir Filkov and Steven S. Skiena Identifying gene regulatory networks from experimental data . . . . . . . . . 141--162 Alfredo U. Luccio Numerical simulation of particle accelerators . . . . . . . . . . . . . . 163--177 O. Yaşar A new ignition model for spark-ignited engine simulations . . . . . . . . . . . 179--200
César Rego Node-ejection chains for the vehicle routing problem: Sequential and parallel algorithms . . . . . . . . . . . . . . . 201--222 Antonio Corradi and Letizia Leonardi and Franco Zambonelli Parallel object allocation via user-specified directives: a case study in traffic simulation . . . . . . . . . 223--241 Patrick Dymond and Jieliang Zhou and Xiaotie Deng A $2$-D parallel convex hull algorithm with optimal communication phases . . . 243--255 Sathiamoorthy Manoharan Effect of task duplication on the assignment of dependency graphs . . . . 257--268 Masayoshi Aritsugi and Hiroki Fukatsu and Yoshinari Kanamori Several partitioning strategies for parallel image convolution in a network of heterogeneous workstations . . . . . 269--293 B. Di Martino and S. Briguglio and G. Vlad and P. Sguazzero Parallel PIC plasma simulation through particle decomposition techniques . . . 295--314 Avi Kavas and David Er-El and Dror G. Feitelson Using multicast to pre-load jobs on the ParPar cluster . . . . . . . . . . . . . 315--327
J. W. Manke Parallel computing in aerospace . . . . 329--336 William D. Gropp and Dinesh K. Kaushik and David E. Keyes and Barry F. Smith High-performance parallel implicit CFD 337--362 M. Garbey and Yu. V. Vassilevski A parallel solver for unsteady incompressible $3$D Navier--Stokes equations . . . . . . . . . . . . . . . 363--389 Jay Hoeflinger and Prasad Alavilli and Thomas Jackson and Bob Kuhn Producing scalable performance with OpenMP: Experiments with two CFD applications . . . . . . . . . . . . . . 391--413 P. Aumann and others MEGAFLOW: Parallel complete aircraft CFD 415--440 M. S. Fisher and M. Mani and D. Stookesberry Parallel processing with the Wind CFD code at Boeing . . . . . . . . . . . . . 441--456 Joseph W. Manke and G. David Kerlick and David Levine and Subhankar Banerjee and Eric Dillon Parallel performance of two applications in the Boeing high performance computing benchmark suite . . . . . . . . . . . . 457--475 Piyush Mehrotra and Hans Zima High Performance Fortran for aerospace applications . . . . . . . . . . . . . . 477--501 Paul D. Hovland and Lois C. McInnes Parallel simulation of compressible flow using automatic differentiation and PETSc . . . . . . . . . . . . . . . . . 503--519 James R. Taft Achieving 60 GFLOP/s on the production CFD code OVERFLOW-MLP . . . . . . . . . 521--536
Stefania Bandini and Giancarlo Mauri and Roberto Serra Cellular automata: From modeling to applications . . . . . . . . . . . . . . 537--538 S. Bandini and G. Mauri and R. Serra Cellular automata: From a theoretical parallel computational model to its application to complex systems . . . . . 539--553 Andreas Beckers and Thomas Worsch A perimeter-time CA for the queen bee problem . . . . . . . . . . . . . . . . 555--569 F. Jiménez Morales and J. P. Crutchfield and M. Mitchell Evolving two-dimensional cellular automata to perform density classification: a report on work in progress . . . . . . . . . . . . . . . . 571--585 Hiroshi Umeo Linear-time recognition of connectivity of binary images on $1$-bit inter-cell communication cellular automaton . . . . 587--599 Jörg R. Weimar Coupling microscopic and macroscopic cellular automata . . . . . . . . . . . 601--611 B. Ostrovsky and G. Crooks and M. A. Smith and Y. Bar-Yam Cellular automata for polymer simulation with application to polymer melts and polymer collapse including implications for protein folding . . . . . . . . . . 613--641 Stefania Bandini and Massimiliano Magagnini Parallel processing simulation of dynamic properties of filled rubber compounds based on cellular automata . . 643--661 Roberto Serra and Marco Villani and Anna Salvemini Continuous genetic networks . . . . . . 663--683 R. Cappuccio and G. Cattaneo and G. Erbacci and U. Jocher A parallel implementation of a cellular automata based model for coffee percolation . . . . . . . . . . . . . . 685--717 J. Wahle and L. Neubert and J. Esser and M. Schreckenberg A cellular automaton traffic flow model for online simulation of traffic . . . . 719--735
Th. Lippert and N. Petkov and P. Palazzari and K. Schilling Hyper-systolic matrix multiplication . . 737--759 Gundolf Haase and Michael Kuhn and Ulrich Langer Parallel multigrid $3$D Maxwell solvers 761--775 Yair Censor and Dan Gordon and Rachel Gordon Component averaging: an efficient iterative parallel algorithm for large and sparse unstructured problems . . . . 777--808 Alexandros V. Gerbessiotis and Constantinos J. Siniolakis Merging on the BSP model . . . . . . . . 809--822 Ishfaq Ahmad and Shahriar M. Akramullah and Ming L. Liou and Muhammad Kafil A scalable off-line MPEG-2 video encoding scheme using a multiprocessor system . . . . . . . . . . . . . . . . . 823--846 Paul N. Swarztrauber and Steven W. Hammond A comparison of optimal FFTs on torus and hypercube multicomputers . . . . . . 847--859 Muhammad H. Alsuwaiyel An optimal parallel algorithm for the multiselection problem . . . . . . . . . 861--865
Henk J. Sips and Ruud Sommerhalder and Erik D'Hollander Linear systems and associated problems 867--868 A. Basermann and J. Fingberg and G. Lonsdale and B. Maerten and C. Walshaw Dynamic multi-partitioning for parallel finite element applications . . . . . . 869--881 Roman Geus and Stefan Röllin Towards a fast parallel sparse symmetric matrix-vector multiplication . . . . . . 883--896 D. B. Heras and J. C. Cabaleiro and F. F. Rivera Modeling data locality for the sparse matrix-vector product using distance measures . . . . . . . . . . . . . . . . 897--912 A. Cooper and M. Szularz and J. Weston External selective orthogonalization for the Lanczos algorithm in distributed memory environments . . . . . . . . . . 913--923 H. X. Lin A unifying graph model for designing parallel algorithms for tridiagonal systems . . . . . . . . . . . . . . . . 925--939 Peter Christen and others Scalable parallel algorithms for surface fitting and data mining . . . . . . . . 941--961 Luca Bergamaschi and Giorgio Pini and Flavio Sartoretto Parallel preconditioning of a sparse eigensolver . . . . . . . . . . . . . . 963--976
Yuto Komeiji and Makoto Haraguchi and Umpei Nagashima Parallel molecular dynamics simulation of a protein . . . . . . . . . . . . . . 977--987 Mardochée Magolu monga Made and Henk A. van der Vorst Parallel incomplete factorizations with pseudo-overlapped subdomains . . . . . . 989--1008 Arnold Krechel and Klaus Stüben Parallel algebraic multigrid based on subdomain blocking . . . . . . . . . . . 1009--1031 Azzedine Boukerche and Carl Tropper Local versus global lookahead in conservative parallel simulations . . . 1033--1055 Byung S. Yoo and Chita R. Das Efficient processor management schemes for mesh-connected multicomputers . . . 1057--1078 Constantine Katsinis Performance analysis of the simultaneous optical multi-processor exchange bus . . 1079--1115 Weng-Long Chang and Chih-Ping Chu The generalized Direction Vector I test 1117--1144
M. Alabdulkareem and S. Lakshmivarahan and S. K. Dhall Scalability analysis of large codes using factorial designs . . . . . . . . 1145--1171 Daeyeon Park and Byeong Hag Seong and Rafael H. Saavedra Adaptive software prefetching in scalable multiprocessors using cache information . . . . . . . . . . . . . . 1173--1195 Paraskevas Evripidou $D^3$-Machine: a decoupled data-driven multithreaded architecture with variable resolution support . . . . . . . . . . . 1197--1225 Vittorio Cortellessa and Francesco Quaglia A checkpointing-recovery scheme for Time Warp parallel simulation . . . . . . . . 1227--1252 Dolors Royo and Miguel Valero-García and Antonio González Implementing the one-sided Jacobi method on a $2$D/$3$D mesh multicomputer . . . 1253--1271 Gen-Huey Chen and Shien-Ching Hwang and Hui-Ling Huang and Ming-Yang Su and Dyi-Rong Duh A general broadcasting scheme for recursive networks with complete connection . . . . . . . . . . . . . . . 1273--1278
Gabriel Antoniu and others The Hyperion system: Compiling multithreaded Java bytecode for distributed execution . . . . . . . . . 1279--1297 Eric Noulard and Nahid Emad A key for reusable parallel linear algebra software . . . . . . . . . . . . 1299--1319 Jeff Boleng and Manavendra Misra Load balanced parallel QR decomposition on shared memory multiprocessors . . . . 1321--1345 L. F. Romero and E. M. Ortigosa and E. L. Zapata Data-task parallelism for the VMEC program . . . . . . . . . . . . . . . . 1347--1364 O. Yu. Milyukova Parallel approximate factorization method for solving discrete elliptic equations . . . . . . . . . . . . . . . 1365--1379 J. Al-Sadi and K. Day and M. Ould-Khaoua Fault-tolerant routing in hypercubes using probability vectors . . . . . . . 1381--1399
Jack Dongarra and Masaaki Shimasaki and Bernard Tourancheau Clusters and computational grids for scientific computing . . . . . . . . . . 1401--1402 Cherri M. Pancake Performance tools for today's HPC: Are we addressing the right issues? . . . . 1403--1415 Ralph Butler and William Gropp and Ewing Lusk Components and interfaces of a process management system for parallel programs 1417--1429 Thilo Kielmann and Henri E. Bal and Sergei Gorlatch and Kees Verstoep and Rutger F. H. Hofman Network performance-aware collective communication for clustered wide-area systems . . . . . . . . . . . . . . . . 1431--1456 Michael D. Beynon and others Distributed processing of very large datasets with DataCutter . . . . . . . . 1457--1478 Graham E. Fagg and Antonin Bukovsky and Jack J. Dongarra HARNESS and fault tolerant MPI . . . . . 1479--1495 E. Caron and others Scilab to Scilab$_{//}$: The Ouragan project . . . . . . . . . . . . 1497--1519
Michael Florian and Michel Gendreau Applications of parallel computing in transportation . . . . . . . . . . . . . 1521--1522 S. C. Wong and C. K. Wong and C. O. Tong A parallelized genetic algorithm for the calibration of Lowry model . . . . . . . 1523--1536 Michelle R. Hribar and Valerie E. Taylor and David E. Boyce Implementing parallel shortest path for parallel transportation applications . . 1537--1568 N. Tremblay and M. Florian Temporal shortest paths: Parallel computing implementations . . . . . . . 1569--1609 Kai Nagel and Marcus Rickert Parallel implementation of the TRANSIMS micro-simulation . . . . . . . . . . . . 1611--1639 Michel Gendreau and Gilbert Laporte and Frédéric Semet A dynamic model and parallel tabu search heuristic for real-time ambulance relocation . . . . . . . . . . . . . . . 1641--1653
Laurent Hascoët A method for automatic placement of communications in SPMD parallelisation 1655--1664 Giuseppe Passoni and Paolo Cremonesi and Giancarlo Alfonsi Analysis and implementation of a parallelization strategy on a Navier--Stokes solver for shear flow simulations . . . . . . . . . . . . . . 1665--1685 B. V. Rathish Kumar and T. Yamaguchi and H. Liu and R. Himeno A parallel $3$D unsteady incompressible flow solver on VPP700 . . . . . . . . . 1687--1713 Ignacio M. Llorente and Manuel Prieto-Matías and Boris Diskin A parallel multigrid solver for $3$D convection and convection-diffusion problems . . . . . . . . . . . . . . . . 1715--1741 M. Arenaz and R. Doallo and J. Touriño and C. Vázquez Efficient parallel numerical solver for the elastohydrodynamic Reynolds-Hertz problem . . . . . . . . . . . . . . . . 1743--1765 Wahid Nasri and Zaher Mahjoub Optimal parallelization of a recursive algorithm for triangular matrix inversion on MIMD computers . . . . . . 1767--1782 Weng-Long Chang and Chih-Ping Chu and Jia-Hwa Wu A multi-dimensional version of the I test . . . . . . . . . . . . . . . . . . 1783--1799 H. Sarbazi-Azad and M. Ould-Khaoua and L. M. Mackenzie Communication delay in hypercubes in the presence of bit-reversal traffic . . . . 1801--1816 Jau-Der Shih Wormhole routing for torus networks with faults . . . . . . . . . . . . . . . . . 1817--1829
Takahiro Katagiri and Yasumasa Kanada An efficient implementation of parallel eigenvalue computation for massively parallel processing . . . . . . . . . . 1831--1845 Márcia A. Inda and Rob H. Bisseling A simple and efficient parallel FFT algorithm using the BSP model . . . . . 1847--1878 C. Bekas and E. Gallopoulos Cobra: Parallel path following for computing the matrix pseudospectrum . . 1879--1896 Wei Shi and Pradip K. Srimani A regular scalable fault tolerant interconnection network for distributed processing . . . . . . . . . . . . . . . 1897--1919 P. Dmitruk and L.-P. Wang and W. H. Matthaeus and R. Zhang and D. Seckel Scalable parallel FFT for spectral simulations on a Beowulf cluster . . . . 1921--1936 Anonymous Author index to volume 27 . . . . . . . 1937--1944
Gerhard R. Joubert Editorial . . . . . . . . . . . . . . . 1--2 Angela C. Sodan Applications on a multithreaded architecture: a case study with EARTH-MANNA . . . . . . . . . . . . . . 3--33 Hendrik L. Tolman Distributed-memory concepts in the wave model WAVEWATCH III . . . . . . . . . . 35--52 P. Wang and Karen Y. Liu and Tom Cwik and Robert Green MODTRAN on supercomputers and parallel computers . . . . . . . . . . . . . . . 53--64 Fusen He and Jie Wu An efficient parallel implementation of the Everglades Landscape Fire Model using checkpointing . . . . . . . . . . 65--82 Rajeev Thakur and William Gropp and Ewing Lusk Optimizing noncontiguous accesses in MPI-IO . . . . . . . . . . . . . . . . . 83--105 Hung-Chang Hsiao and Chung-Ta King Implementation and evaluation of directory hints in CC-NUMA multiprocessors . . . . . . . . . . . . 107--132 Huei-Huang Chang and Ge-Ming Chiu An improved fault-tolerant routing algorithm in meshes with convex faults 133--149
Erricos John Kontoghiorghes and Ahmed Sameh and Denis Trystram Special issue on parallel matrix algorithms and applications . . . . . . 151--153 Olivier Beaumont and Arnaud Legrand and Fabrice Rastello and Yves Robert Dense linear algebra kernels on heterogeneous platforms: Redistribution issues . . . . . . . . . . . . . . . . . 155--185 Olaf Schenk and Klaus Gärtner Two-level dynamic scheduling in PARDISO: Improved scalability on shared memory multiprocessing systems . . . . . . . . 187--197 Dany Mezher and Bernard Philippe Parallel computation of pseudospectra of large sparse matrices . . . . . . . . . 199--221 C. Bekas and E. Gallopoulos Parallel computation of pseudospectra by fast descent . . . . . . . . . . . . . . 223--242 M. Bečka and G. Okša and M. Vajteršic Dynamic ordering for a parallel block-Jacobi SVD algorithm . . . . . . . 243--262 Martin H. Gutknecht and Stefan Röllin The Chebyshev iteration revisited . . . 263--283 Ahmed H. Sameh and Vivek Sarin Parallel algorithms for indefinite linear systems . . . . . . . . . . . . . 285--299 P. Hénon and P. Ramet and J. Roman PaStiX: a high-performance parallel direct solver for sparse symmetric positive definite systems . . . . . . . 301--321 Y. Liang and J. Weston and M. Szularz Generalized least-squares polynomial preconditioners for symmetric indefinite linear equations . . . . . . . . . . . . 323--341 Joël M. Malard Parallel restricted maximum likelihood estimation for linear models with a dense exogenous matrix . . . . . . . . . 343--353 Wojciech Owczarz and Zahari Zlatev Parallel matrix computations in air pollution modelling . . . . . . . . . . 355--368
B. Nkonga and P. Charrier Generalized parcel method for dispersed spray and message passing strategy on unstructured meshes . . . . . . . . . . 369--398 Stephen H. Brill and George F. Pinder Parallel implementation of the Bi-CGSTAB method with block red-black Gauss--Seidel preconditioner applied to the Hermite collocation discretization of partial differential equations . . . 399--414 Harald J. Ehold and Wilfried N. Gansterer and Dieter F. Kvasnicka and Christoph W. Ueberhuber Optimizing Local Performance in HPF . . 415--432 Alain Girault Elimination of redundant messages with a two-pass static analysis algorithm . . . 433--453 Kleanthis Psarris Program analysis techniques for transforming programs for parallel execution . . . . . . . . . . . . . . . 455--469 Jen-Chih Lin and Nan-Chen Hsien Reconfiguring binary tree structures in a faulty supercube with unbounded expansion . . . . . . . . . . . . . . . 471--483 F. Quaglia and B. Ciciani and M. Colajanni Performance analysis of adaptive wormhole routing in a two-dimensional torus . . . . . . . . . . . . . . . . . 485--501 Yosi Ben-Asher The parallel client-server paradigm . . 503--523
Abdelkader Hameurlain and Franck Morvan CPU and incremental memory allocation in dynamic parallelization of SQL queries 525--556 A. Goscinski and M. Hobbs and J. Silcock GENESIS: an efficient, transparent and easy to use cluster operating system . . 557--606 Olivier Aumage and Luc Bougé and Jean-François Méhaut and Raymond Namyst Madeleine II: a portable and efficient communication library for high-performance cluster computing . . . 607--626 Petr Salinger and Pavel Tvrdík Optimal broadcasting and gossiping in one-port meshes of trees with distance-insensitive routing . . . . . . 627--647 Abderezak Touzene Edges-disjoint spanning trees on the binary wrapped butterfly network with applications to fault tolerance . . . . 649--666 Roberto Serra and Marco Villani and Anna Salvemini Erratum to ``Continuous genetic networks'' [Parallel Comput. 27(5) (2001) 663--683] . . . . . . . . . . . . 667--667
Domenico Talia and Pradip K. Srimani Guest editorial: Parallel data-intensive algorithms and applications . . . . . . 669--671 Mario Cannataro and Domenico Talia and Pradip K. Srimani Parallel data intensive computing in scientific and commercial applications 673--704 P. Sanders Reconciling simplicity and realism in parallel disk models . . . . . . . . . . 705--723 Renato Ferreira and Gagan Agrawal and Joel Saltz Data parallel language and compiler support for data intensive applications 725--748 Bill Allcock and others Data management and transfer in high-performance computational grid environments . . . . . . . . . . . . . . 749--771 Yanyan Yang and others Agent based data management in digital libraries . . . . . . . . . . . . . . . 773--792 Massimo Coppola and Marco Vanneschi High-performance data mining with skeleton-based structured parallel programming . . . . . . . . . . . . . . 793--813 D. B. Skillicorn Parallel frequent set counting . . . . . 815--825 Michael Beynon and others Processing large-scale multi-dimensional data in parallel and distributed environments . . . . . . . . . . . . . . 827--859
Pablo A. Estévez and Hélène Paugam-Moisy and Didier Puzenat and Manuel Ugarte A scalable parallel algorithm for training a hierarchical mixture of neural experts . . . . . . . . . . . . . 861--891 Tyng-Yeu Liang and Ce-Kuen Shieh and Jun-Qi Li Selecting threads for workload migration in software distributed shared memory systems . . . . . . . . . . . . . . . . 893--913 Jingling Xue and Wentong Cai Time-minimal tiling when rise is larger than zero . . . . . . . . . . . . . . . 915--939
Andreas Uhl and Peter Zinterhof Guest editorial: Parallel computing in image and video processing . . . . . . . 941--943 Cristina Nicolescu and Pieter Jonker A data and task parallel image processing environment . . . . . . . . . 945--965 F. J. Seinstra and D. Koelma and J. M. Geusebroek A software architecture for user transparent parallel image processing 967--993 A. Biancardi and A. Mérigot Extending the data parallel paradigm with data-dependent operators . . . . . 995--1021 Francisco Argüello and Juan López and María A. Trenas and Emilio L. Zapata Architecture for wavelet packet transform based on lifting steps . . . . 1023--1037 Ishfaq Ahmad and Yong He and Ming L. Liou Video compression with parallel processing . . . . . . . . . . . . . . . 1039--1078 Hazem M. Abbas and Mohamed M. Bayoumi Parallel codebook design for vector quantization on a message passing MIMD architecture . . . . . . . . . . . . . . 1079--1093 Rade Kutil Approaches to zerotree image and video coding on MIMD architectures . . . . . . 1095--1109 Aravind Dasu and Sethuraman Panchanathan Reconfigurable media processing . . . . 1111--1139 K. Benkrid and D. Crookes and A. Benkrid Towards a general framework for FPGA based image processing using hardware skeletons . . . . . . . . . . . . . . . 1141--1154 A. C. Zawada and N. L. Seed and P. A. Ivey Continuous and high coverage self-testing of dynamically re-configurable systems . . . . . . . . 1155--1178 Virginie Fresse and Olivier Deforges ARIAL: Rapid Prototyping for Mixed and Parallel Platforms 1179--1202 Edwige Pissaloux and Franck Amiot and Tharam Dillon A vision-application adaptable computer concept and its implementation in FreeTIV computer . . . . . . . . . . . . 1203--1219 Anonymous IFC --- Inside Front Cover (Editorial Board) . . . . . . . . . . . . . . . . . CO2--CO2
Mark Christiaens and Michiel Ronsse and Koen De Bosschere Bounding the number of segment histories during data race detection . . . . . . . 1221--1238 Mikhail S. Tarkov and Youngsong Mun and Jaeyoung Choi and Hyung-Il Choi Mapping adaptive fuzzy Kohonen clustering network onto distributed image processing system . . . . . . . . 1239--1256 Erricos John Kontoghiorghes Greedy Givens algorithms for computing the rank-$k$ updating of the QR decomposition . . . . . . . . . . . . . 1257--1273 Ke Chen and Choi H. Lai Parallel algorithms of the Purcell method for direct solution of linear systems . . . . . . . . . . . . . . . . 1275--1291 Shao Dong Chen and Hong Shen and Rodney Topor An efficient algorithm for constructing Hamiltonian paths in meshes . . . . . . 1293--1305 Yuan-Shin Hwang Parallelizing graph construction operations in programs with cyclic graphs . . . . . . . . . . . . . . . . . 1307--1328 PeiZong Lee and Wen-Yao Chen Generating communication sets of array assignment statements for block-cyclic distribution on distributed memory parallel computers . . . . . . . . . . . 1329--1368 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Alexey Lastovetsky Adaptive parallel computing on heterogeneous networks with mpC . . . . 1369--1407 Jeffrey Nesheiwat and Boleslaw K. Szymanski Instrumentation database system for performance analysis of parallel scientific applications . . . . . . . . 1409--1449 Chi Shen and Jun Zhang Parallel two level block ILU preconditioning techniques for solving large sparse linear systems . . . . . . 1451--1475 Lili Ju and Qiang Du and Max Gunzburger Probabilistic methods for centroidal Voronoi tessellations and their parallel implementations . . . . . . . . . . . . 1477--1500 Carlos Alberto Alonso Sanches and Nei Yoshihiro Soma and Horacio Hideki Yanasse Short communication: Comments on parallel algorithms for the knapsack problem . . . . . . . . . . . . . . . . 1501--1505 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Ian N. Dunn and Gerard G. L. Meyer QR factorization for shared memory and message passing . . . . . . . . . . . . 1507--1530 Jean-Guillaume Dumas and Jean-Louis Roch On parallel block algorithms for exact triangularizations . . . . . . . . . . . 1531--1548 Taesoon Park and Inseon Lee and Heon Y. Yeom An efficient causal logging scheme for recoverable distributed shared memory systems . . . . . . . . . . . . . . . . 1549--1572 Claire Hanen and Alix Munier Kordon Minimizing the volume in scheduling an out-tree with communication delays and duplication . . . . . . . . . . . . . . 1573--1585 S. A. Jarvis and J. M. D. Hill and C. J. Siniolakis and V. P. Vasilev Portable and architecture independent parallel performance tuning using BSP 1587--1609 Li-Chiu Chang and Fi-John Chang An efficient parallel algorithm for LISSOM neural network . . . . . . . . . 1611--1633 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Pasqua D'Ambra and Marco Danelutto and Daniela di Serafino Advanced environments for parallel and distributed computing . . . . . . . . . 1635--1636 Pasqua D'Ambra and Marco Danelutto and Daniela di Serafino and Marco Lapegna Advanced environments for parallel and distributed applications: a view of current status . . . . . . . . . . . . . 1637--1662 S. MacDonald and J. Anvik and S. Bromling and J. Schaeffer and D. Szafron and K. Tan From patterns to frameworks to parallel programs . . . . . . . . . . . . . . . . 1663--1683 Jocelyn Sérot and Dominique Ginhac Skeletons for parallel image processing: an overview of the SKIPPER project . . . 1685--1708 Marco Vanneschi The programming model of ASSIST, an environment for parallel and distributed portable applications . . . . . . . . . 1709--1732 D. Laforenza Grid programming: some indications where we are headed . . . . . . . . . . . . . 1733--1752 Nathalie Furmento and Anthony Mayer and Stephen McGough and Steven Newhouse and Tony Field and John Darlington ICENI: Optimisation of component applications within a Grid environment 1753--1772 Micah Beck and Dorian Arnold and Alessandro Bassi and Fran Berman and Henri Casanova and Jack Dongarra and Terry Moore and Graziano Obertelli and James Plank and Martin Swany Middleware for the use of storage in communication . . . . . . . . . . . . . 1773--1787 M. Di Santo and F. Frattolillo and W. Russo and E. Zimeo A component-based approach to build a portable and flexible middleware for metacomputing . . . . . . . . . . . . . 1789--1810 Boyana Norris and Satish Balay and Steven Benson and Lori Freitag and Paul Hovland and Lois McInnes and Barry Smith Parallel components for PDEs and optimization: some issues and experiences . . . . . . . . . . . . . . 1811--1831 Anonymous Author Index . . . . . . . . . . . . . . 1833--1839 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
E. A. H. Vollebregt and M. R. T. Roest and J. W. M. Lander Large scale computing at Rijkswaterstaat 1--20 Leo Chin Sim and Heiko Schroder and Graham Leedham MIMD--SIMD hybrid system----towards a new low cost parallel system . . . . . . 21--36 Hon F. Li and Gabriel Girard View consistencies and exact implementations . . . . . . . . . . . . 37--67 Ashok Srinivasan and Michael Mascagni and David Ceperley Testing parallel random number generators . . . . . . . . . . . . . . . 69--94 Ramachandran Vaidyanathan and Jerry L. Trahan and Chun-ming Lu Degree of scalability: scalable reconfigurable mesh algorithms for multiple addition and matrix--vector multiplication . . . . . . . . . . . . . 95--109 Salma A. Ghoneim and Hossam M. A. Fahmy Job preemption, fast subcube compaction, or waiting in hypercube systems? A selection methodology . . . . . . . . . 111--134 Heejo Lee and Jong Kim and Sung Je Hong and Sunggu Lee Task scheduling using a block dependency DAG for block-oriented sparse Cholesky factorization . . . . . . . . . . . . . 135--159 Oh-Han Kang and Si-Gwan Kim A task duplication based scheduling algorithm for shared memory multiprocessors . . . . . . . . . . . . 161--166 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Hongzhang Shan and Jaswinder P. Singh and Leonid Oliker and Rupak Biswas Message passing and shared address space parallelism on an SMP cluster . . . . . 167--186 Olaf Bonorden and Ben Juurlink and Ingo von Otte and Ingo Rieping The Paderborn University BSP (PUB) library . . . . . . . . . . . . . . . . 187--207 Fabrice Rastello and Amit Rao and Santosh Pande Optimal task scheduling at run time to exploit intra-tile parallelism . . . . . 209--239 D. González and F. Almeida and L. Moreno and C. Rodríguez Towards the automatic optimal mapping of pipeline algorithms . . . . . . . . . . 241--254 Cosimo Anglano and Claudio Casetti and Emilio Leonardi and Fabio Neri Network interface multicast protocols for wormhole-based networks of workstations . . . . . . . . . . . . . . 255--283 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Erik Reinhard and Dirk Bartz Parallel graphics and visualisation . . 285--288 Toshi Kato ``Kilauea''----parallel global illumination renderer . . . . . . . . . 289--310 M. Isard and M. Shand and A. Heirich Distributed rendering of interactive soft shadows . . . . . . . . . . . . . . 311--323 Wagner T. Corrêa and James T. Klosowski and Cláudio T. Silva Out-of-core sort-first parallel rendering for cluster-based tiled displays . . . . . . . . . . . . . . . . 325--338 Jürgen P. Schulze and Ulrich Lang The parallelized perspective shear-warp algorithm for volume rendering . . . . . 339--354 Li Chen and Issei Fujishiro and Kengo Nakajima Optimizing parallel performance of unstructured volume rendering for the Earth Simulator . . . . . . . . . . . . 355--371 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
A. Migdalas and G. Toraldo and V. Kumar Parallel computing in numerical optimization . . . . . . . . . . . . . . 373--373 A. Migdalas and G. Toraldo and V. Kumar Nonlinear optimization and parallel computing . . . . . . . . . . . . . . . 375--391 R. M. Aiex and S. Binato and M. G. C. Resende Parallel GRASP with path-relinking for job shop scheduling . . . . . . . . . . 393--430 Jörgen Blomvall A multistage stochastic programming algorithm suitable for parallel computing . . . . . . . . . . . . . . . 431--445 Ricardo C. Corrêa and Fernando C. Gomes and Carlos A. S. Oliveira and Panos M. Pardalos A parallel implementation of an asynchronous team to the point-to-point connection problem . . . . . . . . . . . 447--466 M. D'Apuzzo and M. Marino Parallel computational issues of an interior point method for solving large bound-constrained quadratic programming problems . . . . . . . . . . . . . . . . 467--483 C. Durazzi and V. Ruggiero Numerical solution of special linear and quadratic programs via a parallel interior-point method . . . . . . . . . 485--503 Cristian Gatu and Erricos J. Kontoghiorghes Parallel algorithms for computing all possible subset regression models using the QR decomposition . . . . . . . . . . 505--521 Susana Gómez and Nelson del Castillo and Longina Castellanos and Julio Solano The parallel tunneling method . . . . . 523--533 G. Zanghirati and L. Zanni A parallel solver for large quadratic programs in training support vector machines . . . . . . . . . . . . . . . . 535--551 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2 Anonymous Obituary: Harry F. Jordan . . . . . . . iii--iii
Gilbert Laporte and Roberto Musmanno Parallel computing in logistics . . . . 553--554 James F. Campbell and Gary Stiehr and Andreas T. Ernst and Mohan Krishnamoorthy Solving hub arc location problems on a cluster of workstations . . . . . . . . 555--574 Félix García-López and Belén Melián-Batista and José A. Moreno-Pérez and J. Marcos Moreno-Vega Parallelization of the scatter search for the $p$-median problem . . . . . . . 575--589 Bernard Gendron and Jean-Yves Potvin and Patrick Soriano A parallel hybrid heuristic for the multicommodity capacitated location problem with balancing requirements . . 591--606 T. K. Ralphs Parallel branch and cut for capacitated vehicle routing . . . . . . . . . . . . 607--629 Pierpaolo Caricato and Gianpaolo Ghiani and Antonio Grieco and Emanuela Guerriero Parallel tabu search for a pickup and delivery problem under track contention 631--639 A. Bortfeldt and H. Gehring and D. Mack A parallel tabu search algorithm for solving the container loading problem 641--662 F. Guerriero and M. Mancini A cooperative parallel rollout algorithm for the sequential ordering problem . . 663--677 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Daisuke Takahashi A parallel $1$-D FFT algorithm for the Hitachi SR8000 . . . . . . . . . . . . . 679--690 Coskun Mermer and Donglok Kim and Yongmin Kim Efficient 2D FFT implementation on mediaprocessors . . . . . . . . . . . . 691--709 P. H. Muir and R. N. Pancer and K. R. Jackson PMIRKDC: a parallel mono-implicit Runge--Kutta code with defect control for boundary value ODEs . . . . . . . . 711--741 A. Plastino and C. C. Ribeiro and N. Rodriguez Developing SPMD applications with load balancing . . . . . . . . . . . . . . . 743--766 Naya Nagy and Selim G. Akl The maximum flow problem: a real-time approach . . . . . . . . . . . . . . . . 767--794 Bassel R. Arafeh A task duplication scheme for resolving deadlocks in clustered DAGs . . . . . . 795--820 Jung-Sheng Fu Fault-tolerant cycle embedding in the hypercube . . . . . . . . . . . . . . . 821--832 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Patrick R. Amestoy and Iain S. Duff and Jean-Yves L'Excellent and Xiaoye S. Li Impact of the implementation of MPI point-to-point communications on the performance of two general sparse solvers . . . . . . . . . . . . . . . . 833--849 James Kohout and Alan D. George A high-performance communication service for parallel computing on distributed DSP systems . . . . . . . . . . . . . . 851--878 Christopher J. Freitas and Derrick B. Coffin and Richard L. Murphy The characterization of a wide area network computation . . . . . . . . . . 879--894 Lúcia M. A. Drummond and Valmir C. Barbosa On reducing the complexity of matrix clocks . . . . . . . . . . . . . . . . . 895--905 Manuel Prieto and Ruben S. Montero and Ignacio M. Llorente and Francisco Tirado A parallel multigrid solver for viscous flows on anisotropic structured grids 907--923 Manuel Díaz and Bartolomé Rubio and Enrique Soler and José M. Troya Domain interaction patterns to coordinate HPF tasks . . . . . . . . . . 925--951 Y. Tseng and R. F. DeMara and P. J. Wilder Distributed-sum termination detection supporting multithreaded execution . . . 953--968 Wolfgang Blochinger and Carsten Sinz and Wolfgang Küchlin Parallel propositional satisfiability checking with distributed dynamic learning . . . . . . . . . . . . . . . . 969--994 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
M. Govett and L. Hart and T. Henderson and J. Middlecoff and D. Schaffer The Scalable Modeling System: directive-based code parallelization for distributed and shared memory computers 995--1020 Jorge Buenabad-Chávez and Henk L. Muller and Paul W. A. Stallard and David H. D. Warren Virtual memory on data diffusion architectures . . . . . . . . . . . . . 1021--1052 M. Yamashita and K. Fujisawa and M. Kojima SDPARA: SemiDefinite Programming Algorithm paRAllel version . . . . . . . 1053--1067 V. Teulière and Olivier Brun Parallelisation of the particle filtering technique and application to Doppler-bearing tracking of maneuvering sources . . . . . . . . . . . . . . . . 1069--1090 Liang Peng and Weng-Fai Wong and Chung-Kwong Yuen SilkRoad II: mixed paradigm cluster computing with RC_dag consistency . . . 1091--1115 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Peter Arbenz and Efstratios Gallopoulos and Bernard Philippe and Yousef Saad Parallel Matrix Algorithms and Applications (PMAA '02) . . . . . . . . 1117--1119 Olivier Beaumont and Arnaud Legrand and Yves Robert Scheduling divisible workloads on heterogeneous platforms . . . . . . . . 1121--1152 Martin Bečka and Gabriel Okša On variable blocking factor in a parallel dynamic block-Jacobi SVD algorithm . . . . . . . . . . . . . . . 1153--1174 Olivier Coulaud and Michaël Dussere and Pascal Hénon and Erik Lefebvre and Jean Roman Optimization of a kinetic laser--plasma interaction code for large parallel systems . . . . . . . . . . . . . . . . 1175--1189 Abdou Guermouche and Jean-Yves L'Excellent and Gil Utard Impact of reordering on the memory of a multifrontal solver . . . . . . . . . . 1191--1218 Hemant Mahawar and Vivek Sarin Parallel iterative methods for dense linear systems in inductance extraction 1219--1235 James R. McCombs and Andreas Stathopoulos Parallel, multigrain iterative solvers for hiding network latencies on MPPs and networks of clusters . . . . . . . . . . 1237--1259 Sreekanth R. Sambavaram and Vivek Sarin and Ahmed Sameh and Ananth Grama Multipole-based preconditioners for large sparse linear systems . . . . . . 1261--1273 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Andrea Clematis and Mike Mineter and Richard Marciano High performance computing with geographical data . . . . . . . . . . . 1275--1279 K. C. Clarke Geocomputation's future at the extremes: high performance computing and nanoclients . . . . . . . . . . . . . . 1281--1295 Kenneth A. Hawick and P. D. Coddington and H. A. James Distributed frameworks and parallel algorithms for processing large-scale geographic data . . . . . . . . . . . . 1297--1333 Ann Chervenak and Ewa Deelman and Carl Kesselman and Bill Allcock and Ian Foster and Veronika Nefedova and Jason Lee and Alex Sim and Arie Shoshani and Bob Drach and others High-performance remote access to climate simulation data: a challenge problem for data grid technologies . . . 1335--1356 Giovanni Aloisio and Massimo Cafaro A dynamic earth observation system . . . 1357--1362 Asvin Ananthanarayan and Rajiv Balachandran and Robert Grossman and Yunhong Gu and Xinwei Hong and Jorge Levera and Marco Mazzucco Data webs for earth science data . . . . 1363--1379 Erik G. Hoel and Hanan Samet Data-parallel polygonization . . . . . . 1381--1401 Giuseppe Dattilo and Giandomenico Spezzano Simulation of a cellular landslide model with CAMELOT on high performance computers . . . . . . . . . . . . . . . 1403--1418 Apostolos Papadopoulos and Yannis Manolopoulos Parallel bulk-loading of spatial data 1419--1444 Mark Lanthier and Doron Nussbaum and Jörg-Rüdiger Sack Parallel implementation of geometric shortest path algorithms . . . . . . . . 1445--1479 Shaowen Wang and Marc P. Armstrong A quadtree approach to domain decomposition for spatial interpolation in Grid computing environments . . . . . 1481--1504 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Laurence T. Yang and Yi Pan and Minyi Guo Parallel and distributed scientific and engineering computing . . . . . . . . . 1505--1508 Yoshiyuki Iwamoto and Koichi Suga and Kanemitsu Ootsu and Takashi Yokota and Takanobu Baba Receiving message prediction method . . 1509--1538 Yudong Sun and Cho-Li Wang Solving irregularly structured problems based on distributed object model . . . 1539--1562 Weijian Fang and Cho-Li Wang and Francis C. M. Lau On the design of global object space for efficient multi-threading Java computing on clusters . . . . . . . . . . . . . . 1563--1587 Fan Chan and Jiannong Cao and Yudong Sun High-level abstractions for message-passing parallel programming . . 1589--1621 Xiaohui Shen and Alok Choudhary A distributed multi-storage I/O system for data intensive scientific computing 1623--1643 Patrick R. Amestoy and Iain S. Duff and Stéphane Pralet and Christof Vömel Adapting a parallel sparse direct solver to architectures with clusters of SMPs 1645--1668
Suchuan Dong and George Em Karniadakis Dual-level parallelism for high-order CFD methods . . . . . . . . . . . . . . 1--20 V. A. Pais and N. Fournier and M. A. Sutton and K. J. Weston and U. Dragosits Using High Performance Fortran to parallelise a multi-layer atmospheric transport model . . . . . . . . . . . . 21--33 Milan D. Mihajlović and David J. Silvester Efficient parallel solvers for the biharmonic equation . . . . . . . . . . 35--55 Michel Toulouse and Teodor Gabriel Crainic and Brunilde Sansó Systemic behavior of cooperative search algorithms . . . . . . . . . . . . . . . 57--79 Oliver Sinnen and Leonel Sousa List scheduling: extension for contention awareness and evaluation of node priorities for heterogeneous cluster architectures . . . . . . . . . 81--101 Frédéric Guinand and Aziz Moukrim and Eric Sanlaville Sensitivity analysis of tree scheduling on two machines with communication delays . . . . . . . . . . . . . . . . . 103--120 Yang-Suk Kee and Jin-Soo Kim and Soonhoi Ha Memory management for multi-threaded software DSM systems . . . . . . . . . . 121--138 Eric Violard A semantic framework to address data locality in data parallel languages . . 139--161 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Jörg Wensch and Ben Sommeijer Parallel simulation of axon growth in the nervous system . . . . . . . . . . . 163--186 Javier Cuenca and Domingo Giménez and José González Architecture of an automatically tuned linear algebra library . . . . . . . . . 187--210 Maria Calzarossa and Luisa Massari and Daniele Tessera A methodology towards automatic performance analysis of parallel applications . . . . . . . . . . . . . . 211--223 B. B. Fraguela and R. Doallo and J. Touriño and E. L. Zapata A compiler tool to predict memory hierarchy performance of scientific codes . . . . . . . . . . . . . . . . . 225--248 N. Tomov and E. Dempster and M. H. Williams and A. Burger and H. Taylor and P. J. B. King and P. Broughton Analytical response time estimation in parallel relational database systems . . 249--283 Kentaro Sano and Yusuke Kobayashi and Tadao Nakamura Differential coding scheme for efficient parallel image composition on a PC cluster system . . . . . . . . . . . . . 285--299 Alexandros V. Gerbessiotis Architecture independent parallel binomial tree option price valuations 301--316 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Lieven Eeckhout and Koen De Bosschere Efficient simulation of trace samples on parallel machines . . . . . . . . . . . 317--335 V. Blanco and J. A. González and C. León and C. Rodríguez and G. Rodríguez and M. Printista Predicting the performance of parallel programs . . . . . . . . . . . . . . . . 337--356 Eddy Caron and Gil Utard On the performance of parallel factorization of out-of-core matrices 357--375 Andrea Attanasio and Jean-François Cordeau and Gianpaolo Ghiani and Gilbert Laporte Parallel Tabu search heuristics for the dynamic multi-vehicle dial-a-ride problem . . . . . . . . . . . . . . . . 377--387 Murray Cole Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming . . . . . . . . . . 389--406 Sun-Yuan Hsieh and Chun-Hua Chen Pancyclicity on Möbius cubes with maximal edge faults . . . . . . . . . . . . . . 407--421 Jipeng Zhou and Francis C. M. Lau Multi-phase minimal fault-tolerant wormhole routing in meshes . . . . . . . 423--442 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Valerie Guralnik and George Karypis Parallel tree-projection-based sequence mining algorithms . . . . . . . . . . . 443--472 Gwan-Hwan Hwang An efficient algorithm for communication set generation of data parallel programs with block-cyclic distribution . . . . . 473--501 V. Dolean and S. Lanteri Parallel multigrid methods for the calculation of unsteady flows on unstructured grids: algorithmic aspects and parallel performances on clusters of PCs . . . . . . . . . . . . . . . . . . 503--525 Rong-Guey Chang and Tyng-Ruey Chuang and Jenq Kuen Lee Support and optimization for parallel sparse programs with array intrinsics of Fortran 90 . . . . . . . . . . . . . . . 527--550 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Albert Y. Zomaya and Fikret Ercal and El-ghazali Talbi Parallel and nature-inspired computational paradigms and applications 551--552 V. Di Martino and M. Mililotti Sub optimal scheduling in a grid using genetic algorithms . . . . . . . . . . . 553--565 Michelle Moore An accurate parallel genetic algorithm to schedule tasks on a cluster . . . . . 567--583 P. Morillo and J. M. Orduña and M. Fernández A comparison study of evolutive algorithms for solving the partitioning problem in distributed virtual environment systems . . . . . . . . . . 585--610 E. Alba and G. Luque and J. M. Troya Parallel LAN/WAN heuristics for optimization . . . . . . . . . . . . . . 611--628 Azzedine Boukerche and Kathia Regina Lemos Jucá and João Bosco Sobral and Mirela Sechi Moretti Annoni Notare An artificial immune based intrusion detection model for computer and telecommunication systems . . . . . . . 629--646 Sven E. Eklund A massively parallel architecture for distributed genetic algorithms . . . . . 647--676 S. Cahon and N. Melab and E.-G. Talbi Building with ParadisEO reusable parallel and distributed evolutionary algorithms . . . . . . . . . . . . . . . 677--697 E. Alba and F. Luna and A. J. Nebro and J. M. Troya Parallel heterogeneous genetic algorithms for continuous optimization 699--719 F. de Toro Negro and J. Ortega and E. Ros and S. Mota and B. Paechter and J. M. Martín PSFGA: Parallel processing and evolutionary computation for multiobjective optimisation . . . . . . 721--739 Xin-She Yang Pattern formation in enzyme inhibition and cooperativity with parallel cellular automata . . . . . . . . . . . . . . . . 741--751 Franciszek Seredynski and Pascal Bouvry and Albert Y. Zomaya Cellular automata computations and secret key cryptography . . . . . . . . 753--766 Tiago Sousa and Arlindo Silva and Ana Neves Particle swarm-based data mining algorithms for classification tasks . . 767--783 Peter Korošec and Jurij Šilc and Borut Robič Solving the mesh-partitioning problem with an ant-colony algorithm . . . . . . 785--801 Forbes J. Burkowski Proximity and priority: applying a gene expression algorithm to the Traveling Salesperson Problem . . . . . . . . . . 803--816 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Matthew L. Massie and Brent N. Chun and David E. Culler The ganglia distributed monitoring system: design, implementation, and experience . . . . . . . . . . . . . . . 817--840 Gerassimos Barlas and Bharadwaj Veeravalli Quantized load distribution for tree and bus-connected processors . . . . . . . . 841--865 Nihar R. Mahapatra and Shantanu Dutt Adaptive Quality Equalizing: High-performance load balancing for parallel branch-and-bound across applications and computing systems . . . 867--881 Ching-Wen Chen and Shih-Chang Fu A minimal links traversed dynamic rerouting network . . . . . . . . . . . 883--898 Michael Mascagni and Ashok Srinivasan Parameterizing parallel multiplicative lagged-Fibonacci generators . . . . . . 899--916 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Gerhard R. Joubert Editorial note . . . . . . . . . . . . . 917--918 Peter Korošec and Jurij Šilc and Borut Robič ``Solving the mesh-partitioning problem with an ant-colony algorithm'' [Parallel Computing 30 (2004) 785--801] . . . . . 919--921 Stéphane Genaud and Arnaud Giersch and Frédéric Vivien Load-balancing scatter operations for grid computing . . . . . . . . . . . . . 923--946 Ming Zhu and Constantine Katsinis and Wentong Cai and Bu-Sung Lee Key Messaging on SOME-Bus clusters . . . 947--971 Teofilo F. Gonzalez and David Serena $n$-Cube network: node disjoint shortest paths for maximal distance pairs of vertices . . . . . . . . . . . . . . . . 973--998 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Chun-Hsi Huang and Sanguthevar Rajasekaran High-performance parallel bio-computing 999--1000 Mark L. Green and Russ Miller Molecular structure determination on a computational and data grid . . . . . . 1001--1017 Werner Dubitzky and Damian McCourt and Mykola Galushka and Mathilde Romberg and Bernd Schuller Grid-enabled data warehousing for molecular engineering . . . . . . . . . 1019--1035 Alfredo Tirado-Ramos and Peter M. A. Sloot and Alfons G. Hoekstra and Marian Bubak An integrative approach to high-performance biomedical problem solving environments on the Grid . . . . 1037--1055 Mark L. Green and Russ Miller Evolutionary molecular structure determination using grid-enabled data mining . . . . . . . . . . . . . . . . . 1057--1071 David Piggott and Conor Teljeur and Alan Kelly Exploring the potential for using the grid to support health impact assessment modelling . . . . . . . . . . . . . . . 1073--1091 N. Jacq and C. Blanchet and C. Combet and E. Cornillot and L. Duret and K. Kurata and H. Nakamura and T. Silvestre and V. Breton Grid as a bioinformatic tool . . . . . . 1093--1107 Minyi Guo and Michael (Shan-Hui) Ho and Weng-Long Chang Fast parallel molecular solution to the dominating-set problem on massively parallel bio-computing . . . . . . . . . 1109--1125 Chain-Wu Lee and Chun-Hsi Huang Toward cooperative genomic knowledge inference . . . . . . . . . . . . . . . 1127--1135 John H. Miller and Fang Zheng Large-scale simulations of cellular signaling processes . . . . . . . . . . 1137--1149 Peter K. K. Loh and W. J. Hsu Fault-tolerant routing for complete Josephus Cubes . . . . . . . . . . . . . 1151--1167 Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Jorge Buenabad-Chávez and Henk L. Muller and Paul W. A. Stallard and David H. D. Warren The diffusion space of data diffusion architectures . . . . . . . . . . . . . 1169--1193 Alexey Lastovetsky and Ravi Reddy On performance analysis of heterogeneous parallel algorithms . . . . . . . . . . 1195--1216 Michael Mascagni and Hongmei Chi Parallel linear congruential generators with Sophie--Germain moduli . . . . . . 1217--1231 Suchendra M. Bhandarkar and Shankar R. Chandrasekaran Parallel parsing of MPEG video on a shared-memory symmetric multiprocessor 1233--1276
Masaaki Shimasaki and Hans P. Zima The Earth Simulator . . . . . . . . . . 1277--1278 Tetsuya Sato The Earth Simulator: roles and impacts 1279--1286 Shinichi Habata and Kazuhiko Umezawa and Mitsuo Yokokawa and Shigemune Kitawaki Hardware system of the Earth Simulator 1287--1313 Takashi Yanagawa and Kenji Suehiro Software system of the Earth Simulator 1315--1327 K. Itakura and A. Uno and M. Yokokawa and T. Ishihara and Y. Kaneda Scalability of hybrid programming for a CFD code on the Earth Simulator . . . . 1329--1343 Akiyoshi Wakatani A parallel and scalable algorithm for ADI method with pre-propagation and message vectorization . . . . . . . . . 1345--1359 Kentaro Sano and Shintaro Momose and Hiroyuki Takizawa and Hiroaki Kobayashi and Tadao Nakamura Efficient parallel processing of competitive learning algorithms . . . . 1361--1383
Jürg Hutter and Alessandro Curioni Dual-level parallelism for ab initio molecular dynamics: Reaching teraflop performance with the CPMD code . . . . . 1--17 Fumihiko Ino and Kanrou Ooyama and Kenichi Hagihara A data distributed parallel algorithm for nonrigid image registration . . . . 19--43 M. Salomon and F. Heitz and G.-R. Perrin and J.-P. Armspach A massively parallel approach to deformable matching of $3$D medical images via stochastic differential equations . . . . . . . . . . . . . . . 45--71 Stéphane Guyetant and Mathieu Giraud and Ludovic L'Hours and Steven Derrien and Stéphane Rubini and Dominique Lavenier and Frédéric Raimbault Cluster of re-configurable nodes for scanning large genomic banks . . . . . . 73--96 Andrea Di Blas and Arun Jagota and Richard Hughey Optimizing neural networks on SIMD parallel computers . . . . . . . . . . . 97--115 Michihiro Koibuchi and Akiya Jouraku and Hideharu Amano Path selection algorithm: the strategy for designing deterministic routing from alternative paths . . . . . . . . . . . 117--130 Hong-Chun Hsu and Liang-Chih Chiang and Jimmy J. M. Tan and Lih-Hsing Hsu Fault hamiltonicity of augmented cubes 131--145
Bruno Raffin and Han-Wei Shen and Dirk Bartz Parallel graphics and visualization . . 147--148 T. Furumura and L. Chen Parallel simulation of strong ground motions during recent and historical damaging earthquakes in Tokyo, Japan . . 149--165 Hongfeng Yu and Kwan-Liu Ma A study of I/O methods for parallel visualization of large-scale data . . . 167--183 Jinzhu Gao and Chaoli Wang and Liya Li and Han-Wei Shen A parallel multiresolution volume rendering algorithm for large data visualization . . . . . . . . . . . . . 185--204 M. Strengert and M. Magallón and D. Weiskopf and Stefan Guthe and T. Ertl Large volume visualization of compressed time-dependent datasets on GPU clusters 205--219 David E. DeMarle and Christiaan P. Gribble and Solomon Boulos and Steven G. Parker Memory sharing for interactive ray tracing on clusters . . . . . . . . . . 221--242 Kevin Liang and Patricia Monger and Hugh Couchman Interactive parallel visualization of large particle datasets . . . . . . . . 243--260
Erich Strohmaier and Jack J. Dongarra and Hans W. Meuer and Horst D. Simon Recent trends in the marketplace of high performance computing . . . . . . . . . 261--273 Iain S. Duff and Jennifer A. Scott Stabilized bordered block diagonal forms for parallel sparse solvers . . . . . . 275--289 Arijit Laha and Amitava Sen and Bhabani P. Sinha Parallel algorithms for identifying convex and non-convex basis polygons in an image . . . . . . . . . . . . . . . . 290--310 Bhanu Hariharan and Srinivas Aluru Efficient parallel algorithms and software for compressed octrees with applications to hierarchical methods . . 311--331 Li Chunlin and Li Layuan A distributed utility-based two level market solution for optimal resource scheduling in computational grid . . . . 332--351 Takashi Midorikawa and Daisuke Shiraishi and Masayoshi Shigeno and Yasuki Tanabe and Toshihiro Hanawa and Hideharu Amano The performance of SNAIL-2 a (S2SS-MIN connected multiprocessor with cache coherent mechanism) . . . . . . . . . . 352--370 Yuan-Hsiang Teng and Jimmy J. M. Tan and Lih-Hsing Hsu Honeycomb rectangular disks . . . . . . 371--388 Dong Xiang and Ai Chen and Jiaguang Sun Fault-tolerant routing and multicasting in hypercubes using a partial path set-up . . . . . . . . . . . . . . . . . 389--411
Daniel A. Reed and Mitsuhisa Sato and Denis Trystram Editorial . . . . . . . . . . . . . . . 413--413 Margreet Nool and Michael M. J. Proot A parallel least-squares spectral element solver for incompressible flow problems on unstructured grids . . . . . 414--438 Jacques M. Bahi and Sylvain Contassot-Vivier and Raphaël Couturier Evaluation of the asynchronous iterative algorithms in the context of distant heterogeneous clusters . . . . . . . . . 439--461 Ghazi Al-Rawi and John Cioffi and Mark Horowitz On task mapping optimization for parallel decoding of low-density parity-check codes on message-passing architectures . . . . . . . . . . . . . 462--490 Josef Kohout and Ivana Kolingerová and Jiří Žára Parallel Delaunay triangulation in $E^2$ and $E^3$ for computers with shared memory . . . . . . . . . . . . . . . . . 491--522 Z. Du and F. Lin A novel parallelization approach for hierarchical clustering . . . . . . . . 523--527
Sanya Tangpongprasit and Takahiro Katagiri and Kenji Kise and Hiroki Honda and Toshitsugu Yuba A time-to-live based reservation algorithm on fully decentralized resource discovery in Grid computing . . 529--543 Oscar Plata and Rafael Asenjo and Eladio Gutiérrez and Francisco Corbera and Angeles Navarro and Emilio L. Zapata On the parallelization of irregular and dynamic programs . . . . . . . . . . . . 544--562 J. Verkaik and H. X. Lin A class of novel parallel algorithms for the solution of tridiagonal systems . . 563--587 Robert W. Numrich Parallel numerical algorithms based on tensor notation and Co-Array Fortran syntax . . . . . . . . . . . . . . . . . 588--607 Marcello Balduccini and Enrico Pontelli and Omar Elkhatib and Hung Le Issues in parallel execution of non-monotonic reasoning systems . . . . 608--647
Alexey Kalinov and Alexey Lastovetsky and Yves Robert Heterogeneous computing . . . . . . . . 649--652 T. Hagras and J. Janeček A high performance, low complexity algorithm for compile-time task scheduling in heterogeneous systems . . 653--670 S. Shivle and P. Sugavanam and H. J. Siegel and A. A. Maciejewski and T. Banka and K. Chindam and S. Dussinger and A. Kutruff and P. Penumarthy and P. Pichumani and P. Satyasekaran and D. Sendek and J. Smith and J. Sousa and J. Sridharan and J. Velazco Mapping subtasks with multiple versions on an ad hoc grid . . . . . . . . . . . 671--690 Yoshinori Kishimoto and Shuichi Ichikawa Optimizing the configuration of a heterogeneous cluster with multiprocessing and execution-time estimation . . . . . . . . . . . . . . . 691--710 Javier Cuenca and Domingo Giménez and Juan-Pedro Martínez Heuristics for work distribution of a homogeneous parallel dynamic programming scheme on heterogeneous systems . . . . 711--735 Ioana Banicescu and Ricolindo L. Cariño and Jaderick P. Pabico and Mahadevan Balasubramaniam Design and implementation of a novel dynamic load balancing library for cluster computing . . . . . . . . . . . 736--756 M-Tahar Kechadi and Ilias K. Savvas Dynamic task scheduling for irregular network topologies . . . . . . . . . . . 757--776 A. Srinivasan and N. Chandra Latency tolerance through parallelization of time in scientific applications . . . . . . . . . . . . . . 777--796 Han Yu and Xin Bai and Dan C. Marinescu Workflow management and resource discovery for an intelligent grid . . . 797--811
Bruno Richard and Nicolas Maillard and César A. F. De Rose and Reynaldo Novaes The I-Cluster Cloud: distributed management of idle resources for intense computing . . . . . . . . . . . . . . . 813--838 Z. G. Wang and Y. S. Wong and M. Rahman Development of a parallel optimization method based on genetic simulated annealing algorithm . . . . . . . . . . 839--857 J. C. Pichel and D. B. Heras and J. C. Cabaleiro and F. F. Rivera Performance optimization of irregular codes based on the combination of reordering and blocking techniques . . . 858--876 G. L. Reijns and A. J. C. van Gemund Predicting the execution times of parallel-independent programs using Pearson distributions . . . . . . . . . 877--899 Uroš Čibej and Boštjan Slivnik and Borut Robič The complexity of static data replication in data grids . . . . . . . 900--912 Jürgen Dreher and Rainer Grauer Racoon: a parallel mesh-adaptive framework for hyperbolic conservation laws . . . . . . . . . . . . . . . . . . 913--932 Tao Dong A linear time pessimistic one-step diagnosis algorithm for hypercube multicomputer systems . . . . . . . . . 933--947 Hayedeh Ahrabian and Abbas Nowzari-Dalini Parallel generation of binary trees in $A$-order . . . . . . . . . . . . . . . 948--955
Barbara M. Chapman and Federico Massaioli OpenMP . . . . . . . . . . . . . . . . . 957--959 Xinmin Tian and Jay P. Hoeflinger and Grant Haab and Yen-Kuang Chen and Milind Girkar and Sanjiv Shah A compiler for exploiting nested parallelism in OpenMP programs . . . . . 960--983 R. Blikberg and T. Sørevik Load balancing and OpenMP implementation of nested parallelism . . . . . . . . . 984--998 C. S. Ierotheou and H. Jin and G. Matthews and S. P. Johnson and R. Hood Generating OpenMP code using an interactive parallelization environment 999--1012 Rocco Aversa and Beniamino Di Martino and Massimiliano Rak and Salvatore Venticinque and Umberto Villano Performance prediction through simulation of a hybrid MPI/OpenMP application . . . . . . . . . . . . . . 1013--1033 Rocco Aversa and Beniamino Di Martino and Nicola Mazzocca and Salvatore Venticinque A hierarchical distributed-shared memory parallel Branch & Bound application with PVM and OpenMP for multiprocessor clusters . . . . . . . . . . . . . . . . 1034--1047 Kengo Nakajima Parallel iterative solvers for finite-element methods using an OpenMP/MPI hybrid programming model on the Earth Simulator . . . . . . . . . . 1048--1065 Federico Massaioli and Filippo Castiglione and Massimo Bernaschi OpenMP parallelization of agent-based models . . . . . . . . . . . . . . . . . 1066--1081 Roland Norcen and Andreas Uhl High performance JPEG 2000 and MPEG-4 VTC on SMPs using OpenMP . . . . . . . . 1082--1098 Inho Park and Seon Wook Kim Study of OpenMP applications on the InfiniBand-based software distributed shared-memory system . . . . . . . . . . 1099--1113 Lei Huang and Barbara Chapman and Zhenying Liu Towards a more efficient implementation of OpenMP for clusters via translation to global arrays . . . . . . . . . . . . 1114--1139 Motonori Hirano and Mitsuhisa Sato and Yoshio Tanaka OpenGR: a directive-based grid programming environment . . . . . . . . 1140--1154 P. E. Hadjidoukas and T. S. Papatheodorou OpenMP extensions for master-slave message passing computing . . . . . . . 1155--1167
Anonymous Editorial Board . . . . . . . . . . . . iv--vi P. Wapperom and A. N. Beris and M. A. Straka A new transpose split method for three-dimensional FFTs: performance on an Origin2000 and Alphaserver cluster 1--13 Chun-Hsi Huang and Xin He and Min Qian Communication-optimal parallel parenthesis matching . . . . . . . . . . 14--23 Kazuhide Nakata and Makoto Yamashita and Katsuki Fujisawa and Masakazu Kojima A parallel primal-dual interior-point method for semidefinite programs using positive definite matrix completion . . 24--43 Valmir C. Barbosa and Fernando M. N. Miranda and Matheus C. M. Agostini Cell-centric heuristics for the classification of cellular automata . . 44--66 L. Carracciuolo and L. D'Amore and A. Murli Towards a parallel component for imaging in PETSc programming environment: a case study in $3$-D echocardiography . . . . 67--83 Sun-Yuan Hsieh Fault-tolerant cycle embedding in the hypercube with more both faulty vertices and faulty edges . . . . . . . . . . . . 84--91 Takahiro Katagiri and Kenji Kise and Hiroki Honda and Toshitsugu Yuba ABCLibScript: a directive to support specification of an auto-tuning facility for numerical software . . . . . . . . . 92--112
Maurice Clint and Efstratios Gallopoulos and Esmond Ng and Jean Roman Parallel Matrix Algorithms and Applications (PMAA'04) . . . . . . . . . 113--114 Asad Awan and Ronaldo A. Ferreira and Suresh Jagannathan and Ananth Grama Unstructured peer-to-peer networks for sharing processor cycles . . . . . . . . 115--135 Patrick R. Amestoy and Abdou Guermouche and Jean-Yves L'Excellent and Stéphane Pralet Hybrid scheduling for the parallel solution of linear systems . . . . . . . 136--156 Peter Arbenz and Martin Bečka and Roman Geus and Ulrich Hetmaniuk and Tiziano Mengotti On a parallel multilevel preconditioned Maxwell eigensolver . . . . . . . . . . 157--165 Gabriel Okša and Marián Vajteršic Efficient pre-processing in the parallel block-Jacobi SVD algorithm . . . . . . . 166--176 Eric Polizzi and Ahmed H. Sameh A parallel hybrid banded system solver: the SPIKE algorithm . . . . . . . . . . 177--194 Petko Yanev and Erricos John Kontoghiorghes Efficient algorithms for estimating the general linear model . . . . . . . . . . 195--204
P. Rajesh Kumar and K. Sridharan and S. Srinivasan A parallel algorithm, architecture and FPGA realization for landmark determination and map construction in a planar unknown environment . . . . . . . 205--221 Marc Hofmann and Erricos John Kontoghiorghes Pipeline Givens sequences for computing the QR decomposition on a EREW PRAM . . 222--230 Takahiro Katagiri and Kenji Kise and Hiroki Honda and Toshitsugu Yuba ABCLib_DRSSED: a parallel eigensolver with an auto-tuning facility . . . . . . 231--250 Nahid Emad and Ani Sedrakian Toward the reusability for iterative linear algebra software in distributed environment . . . . . . . . . . . . . . 251--266
R. S. Montero and E. Huedo and I. M. Llorente Benchmarking of high throughput computing applications on Grids . . . . 267--279 Makoto Satoh and Kiyoshi Negishi and Atsushi Kobayashi Analysis of two-level data mapping in an HPF compiler for distributed-memory machines . . . . . . . . . . . . . . . . 280--300 Prasanta K. Jana Polynomial interpolation and polynomial root finding on OTIS-mesh . . . . . . . 301--312 Silvia M. Figueira Optimal partitioning of nodes to space-sharing parallel tasks . . . . . . 313--324 R. Hatzky Domain cloning for a particle-in-cell (PIC) code on a cluster of symmetric-multiprocessor (SMP) computers 325--330
Xiao Qin and Hong Jiang A novel fault-tolerant scheduling algorithm for precedence constrained tasks in real-time heterogeneous systems 331--356 Gianluigi Folino and Giuseppe Mendicino and Alfonso Senatore and Giandomenico Spezzano and Salvatore Straface A model based on cellular automata for the parallel simulation of $3$D unsaturated flow . . . . . . . . . . . . 357--376 F. Luna and A. J. Nebro and E. Alba Observations in using Grid-enabled technologies for solving multi-objective optimization problems . . . . . . . . . 377--393 A. H. Baker and R. D. Falgout and U. M. Yang An assumed partition algorithm for determining processor inter-communication . . . . . . . . . . 394--414 E. Alba and F. Almeida and M. Blesa and C. Cotta and M. Díaz and I. Dorta and J. Gabarró and C. León and G. Luque and J. Petit and C. Rodríguez and A. Rojas and F. Xhafa Efficient parallel LAN/WAN algorithms for optimization. The MALLBA Project 415--440 Zhihua Du and Feng Lin pNJTree: a parallel program for reconstruction of neighbor-joining tree and its application in ClustalW . . . . 441--446
Herbert Kuchen and Murray Cole Editorial . . . . . . . . . . . . . . . 447--448 Marco Danelutto and Marco Aldinucci Algorithmic skeletons meeting grids . . 449--462 Xiao Yan Deng and Greg Michaelson and Phil Trinder Autonomous mobility skeletons . . . . . 463--478 Horacio González-Vélez Self-adaptive skeletal task farm for computational grids . . . . . . . . . . 479--490 Antonio Dorta and Pablo López and Francisco de Sande Basic skeletons in llc . . . . . . . . . 491--506 Clemens Grelck and Sven-Bodo Scholz Merging compositions of array skeletons in SaC . . . . . . . . . . . . . . . . . 507--522 Mercedes Hidalgo-Herrero and Yolanda Ortega-Mallén and Fernando Rubio Analyzing the influence of mixed evaluation on the performance of Eden skeletons . . . . . . . . . . . . . . . 523--538 F. Clément and V. Martin and A. Vodicka and R. Di Cosmo and P. Weis Domain decomposition and skeleton programming with OCamlP3l . . . . . . . 539--550 Rob H. Bisseling and Ildikó Flesch Mondriaan sparse matrix partitioning for attacking cryptosystems by a parallel block Lanczos algorithm --- a case study 551--567 E. Cesar and A. Moreno and J. Sorribes and E. Luque Modeling Master/Worker applications for automatic performance tuning . . . . . . 568--589 Kiminori Matsuzaki and Zhenjiang Hu and Masato Takeichi Parallel skeletons for manipulating general trees . . . . . . . . . . . . . 590--603 J. Falcou and J. Sérot and T. Chateau and J. T. Lapresté Quaff: efficient C++ design for parallel skeletons . . . . . . . . . . . . . . . 604--615 Paras Mehta and José Nelson Amaral and Duane Szafron Is MPI suitable for a generative design-pattern system? . . . . . . . . . 616--626
Jeff Linderoth and Roberto Musmanno Optimization on grids --- optimization for grids . . . . . . . . . . . . . . . 627--628 Lúcia M. A. Drummond and Eduardo Uchoa and Alexandre D. Gonçalves and Juliana M. N. Silva and Marcelo C. P. Santos and Maria Clícia S. de Castro A grid-enabled distributed branch-and-bound algorithm with application on the Steiner Problem in graphs . . . . . . . . . . . . . . . . . 629--642 N. Melab and M. Mezmaz and E.-G. Talbi Parallel cooperative meta-heuristics on the computational grid.: a case study: the bi-objective Flow-Shop problem . . . 643--659 Wahid Chrabakh and Rich Wolski GridSAT: a system for solving satisfiability problems using a computational grid . . . . . . . . . . . 660--687 Demetrio Laganá and Pasquale Legato and Ornella Pisacane and Francesca Vocaturo Solving simulation optimization problems on grid computing systems . . . . . . . 688--700 Andrea Attanasio and Gianpaolo Ghiani and Lucio Grandinetti and Francesca Guerriero Auction algorithms for decentralized parallel machine scheduling . . . . . . 701--709
Georgios Goumas and Nikolaos Drosinos and Maria Athanasaki and Nectarios Koziris Message-passing code generation for non-rectangular tiling transformations 711--732 Hon F. Li and Zunce Wei and Dhrubajyoti Goswami Quasi-atomic recovery for distributed agents . . . . . . . . . . . . . . . . . 733--758 Savina Bansal and Padam Kumar and Kuldip Singh An improved two-step algorithm for task and data parallel scheduling in distributed memory machines . . . . . . 759--774
H. Sarbazi-Azad and M. Ould-Khaoua and A. Y. Zomaya Performance evaluation of communication networks for parallel and distributed systems . . . . . . . . . . . . . . . . 775--776 Luca Gatani and Giuseppe Lo Re and Salvatore Gaglio An efficient distributed algorithm for generating and updating multicast trees 777--793 Rod Fatoohi and Ken Kardys and Sumy Koshy and Soundarya Sivaramakrishnan and Jeffrey S. Vetter Performance evaluation of high-speed interconnects using dense communication patterns . . . . . . . . . . . . . . . . 794--807 James Broberg and Zahir Tari and Panlop Zeephongsekul Task assignment with work-conserving migration . . . . . . . . . . . . . . . 808--830 Bahman Javadi and Mohammad K. Akbari and Jemal H. Abawajy A performance model for analysis of heterogeneous multi-cluster systems . . 831--851 Masaru Takesue The psi-cube: a bus-based cube-type clustering network for high-performance on-chip systems . . . . . . . . . . . . 852--869 A. Shahrabi Performance comparison of routing algorithms in wormhole-switched networks 870--885 M. Hoseiny Farahabady and F. Safaei and A. Khonsari and M. Fathy Characterization of spatial fault patterns in interconnection networks . . 886--901 Azzedine Boukerche and Caron Dzermajko and Lu Kaiyuan An enhancement towards dynamic grid-based DDM protocol for distributed simulation using multiple levels of data filtering . . . . . . . . . . . . . . . 902--919
Dan Reed Changes and updates . . . . . . . . . . 1--1 Jong Wook Kwak and Chu Shik Jhon Torus Ring: improving performance of interconnection network by modifying hierarchical ring . . . . . . . . . . . 2--20 Celso C. Ribeiro and Isabel Rosseti Efficient parallel cooperative implementations of GRASP heuristics . . 21--35 Meijie Ma and Guizhen Liu and Jun-Ming Xu Panconnectivity and edge-fault-tolerant pancyclicity of augmented cubes . . . . 36--42 James S. Hammonds and Faisal Saied and Mark A. Shannon Solving coupled $3$-D paraxial wave and thermal diffusion equations with mixed-mode parallel computations . . . . 43--53 Gregorio Bernabé and Ricardo Fernández and Jose M. García and Manuel E. Acacio and José González An efficient implementation of a $3$D wavelet transform based encoder on hyper-threading technology . . . . . . . 54--72 Jinn-Shyong Yang and Shyue-Ming Tang and Jou-Ming Chang and Yue-Li Wang Parallel construction of optimal independent spanning trees on hypercubes 73--79
Osman Yaşar and Hasan Dağ Trends in parallel computing . . . . . . 81--82 Hasan Dağ An approximate inverse preconditioner and its implementation for conjugate gradient method . . . . . . . . . . . . 83--91 Halis Sak and Süleyman Özekici and İlkay Boduroğlu Parallel computing in Asian option pricing . . . . . . . . . . . . . . . . 92--108 Omar Ramadan Three dimensional MPI parallel implementation of the PML algorithm for truncating finite-difference time-domain Grids . . . . . . . . . . . . . . . . . 109--115 Peter Rissland and Yuefan Deng Electrostatic force computation for bio-molecules on supercomputers with torus networks . . . . . . . . . . . . . 116--123 Ferat Sahin and M. Çetin Yavuz and Ziya Arnavut and Önder Uluyol Fault diagnosis for airplane engines using Bayesian networks and distributed particle swarm optimization . . . . . . 124--143
César A. F. De Rose and Hans-Ulrich Heiss and Barry Linnert Distributed dynamic processor allocation for multicomputers . . . . . . . . . . . 145--158 Alessia Gualandris and Simon Portegies Zwart and Alfredo Tirado-Ramos Performance analysis of direct N . . . . 159--173 Zeyao Mo and Xiaowen Xu Relaxed RS0 or CLJP coarsening strategy for parallel AMG . . . . . . . . . . . . 174--185 D. D'Ambrosio and W. Spataro Parallel evolutionary modelling of geological processes . . . . . . . . . . 186--212 Walfredo Cirne and Francisco Brasileiro and Daniel Paranhos and Luís Fabrício W. Góes and William Voorsluys On the efficacy, efficiency and emergent behavior of task replication in large distributed systems . . . . . . . . . . 213--234
Christophe Cérin Large scale grids . . . . . . . . . . . 235--237 Vandy Berten and Bruno Gaujal Brokering strategies in computational grids using stochastic prediction models 238--249 J. R. Bilbao-Castro and A. Merino and I. García and J. M. Carazo and J. J. Fernández Parameter optimization in $3$D reconstruction on a large scale grid . . 250--263 Benjamin Gaidioz and Birger Koblitz and Nuno Santos Exploring high performance distributed file storage using LDPC codes . . . . . 264--274 Denis Caromel and Alexandre di Costanzo and Clément Mathieu Peer-to-peer for computational grids: mixing clusters and desktop machines . . 275--288 Nicolas Jacq and Vincent Breton and Hsin-Yen Chen and Li-Yung Ho and Martin Hofmann and Vinod Kasam and Hurng-Chun Lee and Yannick Legré and Simon C. Lin and Astrid Maaß and Emmanuel Medernach and Ivan Merelli and Luciano Milanesi and Giulio Rastelli and Matthieu Reichstadt and Jean Salzemann and Horst Schwichtenberg and Ying-Ta Wu and Marc Zimmermann Virtual screening on large scale grids 289--301 M. Mezmaz and N. Melab and E.-G. Talbi An efficient load balancing strategy for grid-based branch and bound algorithm 302--313 Hiroshi Yamauchi and Dongyan Xu Portable virtual cycle accounting for large-scale distributed cycle sharing systems . . . . . . . . . . . . . . . . 314--327 Eun-Kyu Byun and Jin-Soo Kim DynaGrid: a dynamic service deployment and resource migration framework for WSRF-compliant applications . . . . . . 328--338 Moreno Marzolla and Matteo Mordacchini and Salvatore Orlando Peer-to-peer systems for discovering resources in a dynamic grid . . . . . . 339--358
Luis Paulo Santo and Bruno Raffin and Alan Heirich Parallel graphics and visualization . . 359--360 K. Debattista and A. Chalmers and R. Gillibrand and P. Longhurst and G. Mastoropoulou and V. Sundstedt Parallel selective rendering of high-fidelity virtual environments . . . 361--376 Bernhard Thomaszewski and Wolfgang Blochinger Physically based simulation of cloth on distributed memory architectures . . . . 377--390 Fábio F. Bernardon and Steven P. Callahan and João L. D. Comba and Cláudio T. Silva An adaptive framework for visualizing unstructured grids with time-varying scalar fields . . . . . . . . . . . . . 391--405 C. Müller and M. Strengert and T. Ertl Adaptive load balancing for raycasting of non-uniformly bricked volumes . . . . 406--419 D. Cotting and M. Waschbüsch and M. Duller and M. Gross WinSGL: synchronizing displays in parallel graphics using cost-effective software genlocking . . . . . . . . . . 420--437 Mario Lorenz and Guido Brunnett and Marcel Heinz Driving tiled displays with an extended Chromium system based on stream cached multicast communication . . . . . . . . 438--466
Chao-Tung Yang and Kuan-Wei Cheng and Wen-Chung Shih On development of an efficient parallel loop self-scheduling for grid computing environments . . . . . . . . . . . . . . 467--487 Jung-Sheng Fu Conditional fault-tolerant hamiltonicity of star graphs . . . . . . . . . . . . . 488--496 Henrique Andrade and Tahsin Kurc and Alan Sussman and Joel Saltz Active semantic caching to optimize multidimensional data analysis in parallel and distributed environments 497--520 V. Hernandez and J. E. Roman and A. Tomas Parallel Arnoldi eigensolvers with enhanced scalability via global communications rearrangement . . . . . . 521--540 T. Esposti Ongaro and C. Cavazzoni and G. Erbacci and A. Neri and M. V. Salvetti A parallel multiphase flow code for the $3$D simulation of explosive volcanic eruptions . . . . . . . . . . . . . . . 541--560 Isaac D. Scherson and Daniel S. Valencia and Enrique Cauich Service address routing: a network-embedded resource management layer for cluster computing . . . . . . 561--571 Wei Jie and Wentong Cai and Lizhe Wang and Rob Procter A secure information service for monitoring large scale grids . . . . . . 572--591
Bernd Mohr and Jesper Larsson Träff and Joachim Worringen Selected papers from EuroPVM/MPI 2006 593--594 William Gropp and Rajeev Thakur Thread-safety in an MPI implementation: Requirements and analysis . . . . . . . 595--604 Fabian Kulla and Peter Sanders Scalable parallel suffix array construction . . . . . . . . . . . . . . 605--612 Jelena Pješivac-Grbović and George Bosilca and Graham E. Fagg and Thara Angskun and Jack J. Dongarra MPI collective algorithm selection and quadtree encoding . . . . . . . . . . . 613--623 Torsten Hoefler and Peter Gottschling and Andrew Lumsdaine and Wolfgang Rehm Optimizing a conjugate gradient solver with non-blocking collective operations 624--633 Darius Buntinas and Guillaume Mercier and William Gropp Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem . . . . . . . . . . . . . . . 634--644
Wu-chun Feng and Dinesh Manocha High-performance computing using accelerators . . . . . . . . . . . . . . 645--647 Patrick McCormick and Jeff Inman and James Ahrens and Jamaludin Mohd-Yusof and Greg Roth and Sharen Cummins Scout: a data-parallel programming language for graphics processors . . . . 648--662 Naga K. Govindaraju and Dinesh Manocha Cache-efficient numerical algorithms using graphics hardware . . . . . . . . 663--684 Dominik Göddeke and Robert Strzodka and Jamaludin Mohd-Yusof and Patrick McCormick and Sven H. M. Buijssen and Matthias Grajewski and Stefan Turek Exploring weak scalability for FEM calculations on a GPU-enhanced cluster 685--699 Filip Blagojevic and Dimitrios S. Nikolopoulos and Alexandros Stamatakis and Christos D. Antonopoulos and Matthew Curtis-Maury Runtime scheduling of dynamic parallelism on accelerator-based multi-core systems . . . . . . . . . . . 700--719 David A. Bader and Virat Agarwal and Kamesh Madduri and Seunghwa Kang High performance combinatorial algorithm design on the Cell Broadband Engine processor . . . . . . . . . . . . . . . 720--740 Martin C. Herbordt and Josh Model and Bharat Sukhwani and Yongfeng Gu and Tom VanCourt Single pass streaming BLAST on FPGAs . . 741--756
Alexey Lastovetsky and Ravi Reddy Data distribution for dense factorization on computers with memory heterogeneity . . . . . . . . . . . . . 757--779 J. Xu Benchmarks on tera-scalable models for DNS of turbulent channel flow . . . . . 780--794 N. Botta and C. Ionescu Relation-based computations in a monadic BSP model . . . . . . . . . . . . . . . 795--821 M. Vanneschi and L. Veraldi Dynamicity in distributed applications: issues, problems and the ASSIST approach 822--845
V. Santhosh Kumar and R. Nanjundiah and M. J. Thazhuthaveetil and R. Govindarajan Impact of message compression on the scalability of an atmospheric modeling application on clusters . . . . . . . . 1--16 Yuhui Deng and Frank Wang and Na Helian and Sining Wu and Chenhan Liao Dynamic and scalable storage management architecture for Grid Oriented Storage devices . . . . . . . . . . . . . . . . 17--31 Jason Brazile and Rudolf Richter and Daniel Schläpfer and Michael E. Schaepman and Klaus I. Itten Cluster versus grid for operational generation of ATCOR's MODTRAN-based look up tables . . . . . . . . . . . . . 32--46 Albert Chan and Frank Dehne and Prosenjit Bose and Markus Latzel Coarse grained parallel algorithms for graph matching . . . . . . . . . . . . . 47--62 Fouad B. Chedid An optimal parallelization of the two-list algorithm of cost ${O}(2^{n/2})$ . . . . . . . . . . . . . 63--65 Anonymous Acknowledgement to reviewers . . . . . . 66--68
Andrzej M. Goscinski and Adam K. L. Wong A study of the concurrent execution of parallel and sequential applications on a non-dedicated cluster . . . . . . . . 69--91 Antonio Plaza and David Valencia and Javier Plaza An experimental comparison of parallel algorithms for hyperspectral analysis using heterogeneous and homogeneous networks of workstations . . . . . . . . 92--114 A. Murli and L. D'Amore and L. Carracciuolo and M. Ceccarelli and L. Antonelli High performance edge-preserving regularization in $3$D SPECT imaging . . 115--132
Eladio Gutiérrez and Oscar Plata and Emilio L. Zapata An analytical model of locality-based parallel irregular reductions . . . . . 133--157 Jean-François Pineau and Yves Robert and Frédéric Vivien The impact of heterogeneity on master-slave scheduling . . . . . . . . 158--176 S. Chandra Sekhara Rao and Sarita Parallel solution of large symmetric tridiagonal linear systems . . . . . . . 177--197
Volodymyr Kindratenko and Duncan Buell Reconfigurable Systems Summer Institute 2007 . . . . . . . . . . . . . . . . . . 199--200 Roger D. Chamberlain and Joseph M. Lancaster and Ron K. Cytron Visions for application development on hybrid computing systems . . . . . . . . 201--216 Seth Koehler and John Curreri and Alan D. George Performance analysis challenges and framework for high-performance reconfigurable computing . . . . . . . . 217--230 M. Wirthlin and D. Poznanovic and P. Sundararajan and A. Coppola and D. Pellerin and W. Najjar and R. Bruce and M. Babst and O. Pritchard and P. Palazzari and G. Kuzmanov OpenFPGA CoreLib core library interoperability effort . . . . . . . . 231--244 Proshanta Saha and Esam El-Araby and Miaoqing Huang and Mohamed Taher and Sergio Lopez-Buedo and Tarek El-Ghazawi and Chang Shu and Kris Gaj and Alan Michalski and Duncan Buell Portable library development for reconfigurable computing systems: a case study . . . . . . . . . . . . . . . . . 245--260 Yongfeng Gu and Tom VanCourt and Martin C. Herbordt Explicit design of FPGA-based coprocessors for short-range force computations in molecular dynamics simulations . . . . . . . . . . . . . . 261--277 Akila Gothandaraman and Gregory D. Peterson and G. L. Warren and Robert J. Hinde and Robert J. Harrison FPGA acceleration of a quantum Monte Carlo application . . . . . . . . . . . 278--291
Laura Grigori and Bernard Philippe and Ahmed Sameh and Damien Tromeur-Dervout and Marian Vajtersic Parallel matrix algorithms and applications . . . . . . . . . . . . . . 293--295 Emmanuel Agullo and Abdou Guermouche and Jean-Yves L'Excellent A parallel out-of-core multifrontal method: Storage of factors on disk and analysis of models for an out-of-core active memory . . . . . . . . . . . . . 296--317 C. Chevalier and F. Pellegrini PT-Scotch: a tool for efficient parallel graph ordering . . . . . . . . . . . . . 318--331 Guy Antoine Atenekeng Kahou and Laura Grigori and Masha Sosonkina A partitioning algorithm for block-diagonal matrices with overlap . . 332--344 Pascal Hénon and Pierre Ramet and Jean Roman On finding approximate supernodes for an efficient block-ILU$(k)$ factorization 345--362 L. Giraud and A. Haidar and L. T. Watson Parallel scalability study of hybrid preconditioners in three dimensions . . 363--379 Raphaël Couturier and Christophe Denis and Fabienne Jézéquel GREMLINS: a large sparse linear solver for grid environment . . . . . . . . . . 380--391 N. Yamanaka and T. Ogita and S. M. Rump and S. Oishi A parallel algorithm for accurate dot product . . . . . . . . . . . . . . . . 392--410 S. Hunold and T. Rauber and G. Rünger Combining building blocks for parallel multi-level matrix multiplication . . . 411--426 Kok Fu Ng and Norhashidah Hj. Mohd Ali Performance analysis of explicit group parallel algorithms for distributed memory multicomputer . . . . . . . . . . 427--440 C. Bekas and A. Curioni and W. Andreoni Atomic wavefunction initialization in ab initio molecular dynamics using distributed Lanczos . . . . . . . . . . 441--450 Petko I. Yanev and Erricos J. Kontoghiorghes Parallel algorithms for downdating the least squares estimator of the regression model . . . . . . . . . . . . 451--468 Maria Lucka and Igor Melichercik and Ladislav Halada Application of multistage stochastic programs solved in parallel in portfolio management . . . . . . . . . . . . . . . 469--485
Dajin Wang A linear-time algorithm for computing collision-free path on reconfigurable mesh . . . . . . . . . . . . . . . . . . 487--496 Yasheng Maimaitijiang and Mohammed Ali Roula and Stuart Watson and Ralf Patz and Robert J. Williams and Huw Griffiths Parallelization methods for implementation of a magnetic induction tomography forward model in symmetric multiprocessor systems . . . . . . . . . 497--507 Lee Kee Goh and Bharadwaj Veeravalli Design and performance evaluation of combined first-fit task allocation and migration strategies in mesh multiprocessor systems . . . . . . . . . 508--520 Wei-Ming Lin Performance modeling and analysis of correlated parallel computations . . . . 521--538 J. Sánchez-Curto and P. Chamorro-Posada On a faster parallel implementation of the split-step Fourier method . . . . . 539--549
Julien Straubhaar Parallel preconditioners for the conjugate gradient algorithm using Gram--Schmidt and least squares methods 551--569 Woo-Chul Jeun and Yang-Suk Kee and Soonhoi Ha and Changdon Kee Overcoming performance bottlenecks in using OpenMP on SMP clusters . . . . . . 570--592 Carlo Mastroianni and Domenico Talia and Oreste Verta Designing an information system for Grids: Comparing hierarchical, decentralized P2P and super-peer models 593--611
David A. Bader and Srinivas Aluru High-performance computational biology 613--615 Vipin Sachdeva and Michael Kistler and Evan Speight and Tzy-Hwa Kathy Tzeng Exploring the viability of the Cell Broadband Engine for bioinformatics applications . . . . . . . . . . . . . . 616--626 David A. Bader and Kamesh Madduri A graph-theoretic analysis of the human protein-interaction network using multicore parallel algorithms . . . . . 627--639 Sadaf R. Alam and Pratul K. Agarwal and Jeffrey S. Vetter Performance characteristics of biomolecular simulations on high-end systems with multi-core processors . . . 640--651 P. Brenner and J. M. Wozniak and D. Thain and A. Striegel and J. W. Peng and J. A. Izaguirre Biomolecular committor probability calculation enabled by processing in network storage . . . . . . . . . . . . 652--660 Michela Taufer and Ming-Ying Leung and Thamar Solorio and Abel Licon and David Mireles and Roberto Araiza and Kyle L. Johnson RNAVLab: a virtual laboratory for studying RNA secondary structures based on grid computing technology . . . . . . 661--680 Tim Oliver and Leow Yuan Yeow and Bertil Schmidt Integrating FPGA acceleration into HMMer 681--691
Alain Merigot and Alfredo Petrosino Parallel processing for image and video processing . . . . . . . . . . . . . . . 693--693 Alain Merigot and Alfredo Petrosino Parallel processing for image and video processing: Issues and challenges . . . 694--699 O. Kao On parallel image retrieval with dynamically extracted features . . . . . 700--709 Myeongsoo Oh and Kiyoharu Aizawa Large-scale image sensing by a group of smart image sensors . . . . . . . . . . 710--717 C. Colombo and A. Del Bimbo and A. Valli A real-time full body tracking and humanoid animation system . . . . . . . 718--726 Francesco Isgrò and Domenico Tegolo A distributed genetic algorithm for restoration of vertical line scratches 727--734 P. P. Jonker and J. G. E. Olk and C. Nicolescu Distributed bucket processing: a paradigm embedded in a framework for the parallel processing of pixel sets . . . 735--746 Radhika S. Grover and Qiang Li and H.-P. Dommel Performance study of data layout schemes for a SAN-based video server . . . . . . 747--756 Paolo Gamba and Luca Lombardi and Marco Porta Log-map analysis . . . . . . . . . . . . 757--764
X. Meng and V. Chaudhary Boosting data throughput for sequence database similarity searches on FPGAs using an adaptive buffering scheme . . . 1--11 Ricardo C. Corrêa and Valmir C. Barbosa Partially ordered distributed computations on asynchronous point-to-point networks . . . . . . . . 12--28 Lih-Yuan Deng and Huajiang Li and Jyh-Jen Horng Shiau Scalable parallel multiple recursive generators of large order . . . . . . . 29--37 Alfredo Buttari and Julien Langou and Jakub Kurzak and Jack Dongarra A class of parallel tiled linear algebra algorithms for multicore architectures 38--53 Anonymous Acknowledgement to reviewers . . . . . . 54--55
Anonymous Editorial Board . . . . . . . . . . . . ??
Fabrício A. B. da Silva and Hermes Senger Improving scalability of Bag-of-Tasks applications running on master-slave platforms . . . . . . . . . . . . . . . 57--71 Yuh-Rau Wang A novel $O(1)$ time algorithm for $3$D block-based medial axis transform by peeling corner shells . . . . . . . . . 72--82 Anne Benoit and Mourad Hakem and Yves Robert Contention awareness and fault-tolerant scheduling for precedence constrained tasks in heterogeneous systems . . . . . 83--108 L. K. S. Daldorff and B. Eliasson Parallelization of a Vlasov--Maxwell solver in four-dimensional phase space 109--115
Rupak Biswas and Leonid Oliker and Jeffrey Vetter Revolutionary technologies for acceleration of emerging petascale applications . . . . . . . . . . . . . . 117--118 David A. Bader and Virat Agarwal and Seunghwa Kang Computing discrete transforms on the Cell Broadband Engine . . . . . . . . . 119--137 Jakub Kurzak and Wesley Alvaro and Jack Dongarra Optimizing matrix multiplication for a short-vector SIMD architecture --- CELL processor . . . . . . . . . . . . . . . 138--150 Jeremy S. Meredith and Gonzalo Alvarez and Thomas A. Maier and Thomas C. Schulthess and Jeffrey S. Vetter Accuracy and performance of graphics processors: a Quantum Monte Carlo application case study . . . . . . . . . 151--163 David J. Hardy and John E. Stone and Klaus Schulten Multilevel summation of electrostatic potentials using graphics processing units . . . . . . . . . . . . . . . . . 164--177 Samuel Williams and Leonid Oliker and Richard Vuduc and John Shalf and Katherine Yelick and James Demmel Optimization of sparse matrix-vector multiplication on emerging multicore platforms . . . . . . . . . . . . . . . 178--194
Suresh Behara and Sanjay Mittal Parallel finite element computation of incompressible flows . . . . . . . . . . 195--212 Arquimedes Canedo and Ben A. Abderazek and Masahiro Sowa Efficient compilation for queue size constrained queue processors . . . . . . 213--225 Tien-Yien Li and Chih-Hsiung Tsai HOM4PS-2.0para: Parallelization of HOM4PS-2.0 for solving polynomial systems . . . . . . . . . . . . . . . . 226--238 Sid-Ahmed-Ali Touati and Zsolt Mathe Periodic register saturation in innermost loops . . . . . . . . . . . . 239--254
Won W. Ro and Jean-Luc Gaudiot A complexity-effective microprocessor design with decoupled dispatch queues and prefetching . . . . . . . . . . . . 255--268 Yaohang Li and Michael Mascagni and Andrey Gorin A decentralized parallel implementation for parallel tempering algorithm . . . . 269--283 L. Grinberg and D. Pekurovsky and S. J. Sherwin and G. E. Karniadakis Parallel performance of the coarse space linear vertex solver and low energy basis preconditioner for spectral/hp elements . . . . . . . . . . . . . . . . 284--304 Antonio Robles-Gómez and Aurelio Bermúdez and Rafael Casado and Åshild Grønstad Solheim A dynamic distributed mechanism for reconfiguring high-performance networks 305--312
Ching-Wen Chen and Chuan-Chi Weng and Chang-Jung Ku An overlapping and pipelining data transmission MAC protocol with multiple channels in ad hoc networks . . . . . . 313--330 Taro Konda and Yoshimasa Nakamura A new algorithm for singular value decomposition and its parallelization 331--344 Gerold Jäger and Clemens Wagner Efficient parallelizations of Hermite and Smith normal form algorithms . . . . 345--357 Julian Borrill and Leonid Oliker and John Shalf and Hongzhang Shan and Andrew Uselton HPC global file system performance analysis using a scientific-application derived benchmark . . . . . . . . . . . 358--373
Markus Geimer and Felix Wolf and Brian J. N. Wylie and Bernd Mohr A scalable tool architecture for diagnosing wait states in massively parallel applications . . . . . . . . . 375--388 Jay Smith and Vladimir Shestak and Howard Jay Siegel and Suzy Price and Larry Teklits and Prasanna Sugavanam Robust resource allocation in a cluster based imaging system . . . . . . . . . . 389--400 Yang Wang and Ming Zhu and Hua Li A distributed Key Message algorithm to optimize the communication in clusters 401--415 Hatem Ltaief and Marc Garbey A parallel Aitken-additive Schwarz waveform relaxation suitable for the grid . . . . . . . . . . . . . . . . . . 416--428
Cole Trapnell and Michael C. Schatz Optimizing data intensive GPGPU computations for DNA sequence alignment 429--440 Tz-Liang Kueng and Cheng-Kuan Lin and Tyne Liang and Jimmy J. M. Tan and Lih-Hsing Hsu Embedding paths of variable lengths into hypercubes with conditional link-faults 441--454 Arturo González-Escribano and Arjan J. C. van Gemund and Valentín Cardeñoso-Payo Performance implications of synchronization structure in parallel programming . . . . . . . . . . . . . . 455--474 Ananta Tiwari and Vahid Tabatabaee and Jeffrey K. Hollingsworth Tuning parallel applications in parallel 475--492
Diane Lingrand and Tristan Glatard and Johan Montagnat Modeling the latency on production grids with respect to the execution context 493--511 Anshu Dubey and Katie Antypas and Murali K. Ganapathy and Lynn B. Reid and Katherine Riley and Dan Sheeler and Andrew Siegel and Klaus Weide Extensible component-based architecture for FLASH, a massively parallel, multiphysics simulation code . . . . . . 512--522 I. Marín Carrión and E. Arias Antúnez and M. M. Artigao Castillo and J. J. Águila Guerrero and J. J. Miralles Canals Thread-based implementations of the false nearest neighbors method . . . . . 523--534 Hamid Mahini and Hamid Sarbazi-Azad Resource placement in three-dimensional tori . . . . . . . . . . . . . . . . . . 535--543 Henning Meyerhenke and Burkhard Monien and Stefan Schamberger Graph partitioning and disturbed diffusion . . . . . . . . . . . . . . . 544--569
Franck Cappello and Thomas Herault and Jack Dongarra Foreword . . . . . . . . . . . . . . . . 571--571 Bin Jia Process cooperation in multiple message broadcast . . . . . . . . . . . . . . . 572--580 Peter Sanders and Jochen Speck and Jesper Larsson Träff Two-tree algorithms for full bandwidth broadcast, reduction and scan . . . . . 581--594 Daniel Becker and Rolf Rabenseifner and Felix Wolf and John C. Linford Scalable timestamp synchronization for event traces of message-passing applications . . . . . . . . . . . . . . 595--607 Rajeev Thakur and William Gropp Test suite for evaluating performance of multithreaded MPI communication . . . . 608--617
Jeffrey K. Hollingsworth Editorial . . . . . . . . . . . . . . . 1--2 P. Amestoy and I. S. Duff and A. Guermouche and Tz. Slavova Analysis of the solution phase of a parallel multifrontal approach . . . . . 3--15 Shigeo Orii Metrics for evaluation of parallel efficiency toward highly parallel processing . . . . . . . . . . . . . . . 16--25 Juan Piernas-Canovas and Jarek Nieplocha Implementation and evaluation of active storage in modern parallel file systems 26--47 Rajesh Sudarsan and Calvin J. Ribbens Design and performance of a scheduling framework for resizable parallel applications . . . . . . . . . . . . . . 48--64 Carlos Alberto Alonso Sanches and Nei Yoshihiro Soma and Horacio Hideki Yanasse Observations on optimal parallelizations of two-list . . . . . . . . . . . . . . 65--67 Anonymous Acknowledgment to Reviewers . . . . . . 68--69 Anonymous Editorial Board . . . . . . . . . . . . ??
Javier Navaridas and Jose Miguel-Alonso and Francisco Javier Ridruejo and Wolfgang Denzel Reducing complexity in tree-like computer interconnection networks . . . 71--85 Hinde Lilia Bouziane and Christian Pérez and Thierry Priol Extending software component models with the master-worker paradigm . . . . . . . 86--103 Yi-Neng Lin and Ying-Dar Lin and Yuan-Cheng Lai Thread allocation in CMP-based multithreaded network processors . . . . 104--116 Mathieu Luisier and Gerhard Klimeck Numerical strategies towards peta-scale simulations of nanoelectronics devices 117--128 Yusuke Okitsu and Fumihiko Ino and Kenichi Hagihara High-performance cone beam reconstruction using CUDA compatible GPUs . . . . . . . . . . . . . . . . . . 129--141 J. Götz and K. Iglberger and C. Feichtinger and S. Donath and U. Rüde Coupling multibody dynamics and computational fluid dynamics on 8192 processor cores . . . . . . . . . . . . 142--151
Mauricio Marin and Veronica Gil-Costa and Carolina Bonacic and Ricardo Baeza-Yates and Isaac D. Scherson Sync/Async parallel search for the efficient design and construction of Web search engines . . . . . . . . . . . . . 153--168 Andrzej Karbowski and Maciej Remiszewski Assessment of the Cell Broadband Engine Architecture as a platform to solve closed-loop optimal control problems . . 169--180 M. Krotkiewski and M. Dabrowski Parallel symmetric sparse matrix-vector product on scalar multi-core CPUs . . . 181--198 J. Berlińska and M. Drozdowski Heuristics for multi-round divisible loads scheduling with limited memory . . 199--211
Costas Bekas and Pasqua D'Ambra and Ananth Grama and Yousef Saad and Petko Yanev Special issue on Parallel Matrix Algorithms and Applications . . . . . . 213--214 Joseph M. Elble and Nikolaos V. Sahinidis and Panagiotis Vouzis GPU computing with Kaczmarz's and other iterative algorithms for linear systems 215--231 Stanimire Tomov and Jack Dongarra and Marc Baboulin Towards dense linear algebra for hybrid GPU accelerated manycore systems . . . . 232--240 Aydın Buluç and John R. Gilbert and Ceren Budak Solving path problems on the GPU . . . . 241--253 Bora Uçar and Ümit V. Çatalyürek and Cevdet Aykanat A Matrix Partitioning Interface to PaToH in MATLAB . . . . . . . . . . . . . . . 254--272 T. Huckle and A. Kallischko and A. Roy and M. Sedlacek and T. Weinzierl An efficient parallel implementation of the MSPAI preconditioner . . . . . . . . 273--284 L. Giraud and A. Haidar and S. Pralet Using multiple levels of parallelism to enhance the performance of domain decomposition solvers . . . . . . . . . 285--296 Martin Bečka and Gabriel Okša and Marián Vajteršic and Laura Grigori On iterative QR pre-processing in the parallel block-Jacobi SVD algorithm . . 297--307 Fabrice Dupros and Florent De Martin and Evelyne Foerster and Dimitri Komatitsch and Jean Roman High-performance finite-element simulations of seismic wave propagation in three-dimensional nonlinear inelastic geological media . . . . . . . . . . . . 308--325 Maximilian Emans Performance of parallel AMG-preconditioners in CFD-codes for weakly compressible flows . . . . . . . 326--338 Jose E. Roman and Matthias Kammerer and Florian Merz and Frank Jenko Fast eigenvalue calculations in a massively parallel plasma turbulence code . . . . . . . . . . . . . . . . . . 339--358 T. Auckenthaler and M. Bader and T. Huckle and A. Spörl and K. Waldherr Matrix exponentials and parallel prefix computation in a quantum control problem 359--369
Ruppa K. Thulasiram Preface . . . . . . . . . . . . . . . . 371--371 Vladimir Surkov Parallel option pricing with Fourier space time-stepping method on graphics processing units . . . . . . . . . . . . 372--380 Manfred Gilli and Enrico Schumann Distributed optimisation of a portfolio's Omega . . . . . . . . . . . 381--389 S. Corsaro and P. L. De Angelis and Z. Marino and F. Perla and P. Zanetti On parallel asset-liability management in life insurance: a forward risk-neutral approach . . . . . . . . . 390--402 Gianluca Fusai and Daniele Marazzina and Marina Marena Option pricing, maturity randomization and distributed computing . . . . . . . 403--414 Giray Ökten and Matthew Willyard Parameterization based on randomized quasi-Monte Carlo methods . . . . . . . 415--422
Andrew V. Terekhov Parallel Dichotomy Algorithm for solving tridiagonal system of linear equations with multiple right-hand sides . . . . . 423--438 Daisuke Takahashi Parallel implementation of multiple-precision arithmetic and $2,576,980,370,000$ decimal digits of $\pi$ calculation . . . . . . . . . . . 439--448 Pavan Yalamanchili and Sumod Mohan and Rommel Jalasutram and Tarek Taha Acceleration of hierarchical Bayesian network based cortical models on multicore architectures . . . . . . . . 449--468 Tomas Hruz and Stefan Geisseler and Marcel Schöngens Parallelism in simulation and modeling of scale-free complex networks . . . . . 469--485
Qiankun Miao and Guangzhong Sun and Jiulong Shan and Guoliang Chen Parallelization and optimization of Mfold on shared memory system . . . . . 487--494 Dan Gordon and Rachel Gordon CARP--CG: a robust and efficient parallel solver for linear systems, applied to strongly convection dominated PDEs . . . . . . . . . . . . . . . . . . 495--515 Fei Xia and Yong Dou and Dan Zhou and Xin Li Fine-grained parallel RNA secondary structure prediction using SCFGs on FPGA 516--530 Sean Rul and Hans Vandierendonck and Koen De Bosschere A profile-based tool for finding pipeline parallelism in sequential programs . . . . . . . . . . . . . . . . 531--551
J. Ignacio Hidalgo and Francisco Fernandez and Juan Lanchares and Erick Cantú-Paz and Albert Zomaya Parallel Architectures and Bioinspired Algorithms . . . . . . . . . . . . . . . 553--554 M. Ruciński and D. Izzo and F. Biscani On the impact of the migration topology on the Island Model . . . . . . . . . . 555--571 José L. Risco-Martín and David Atienza and J. Manuel Colmenar and Oscar Garnica A parallel evolutionary algorithm to optimize dynamic memory managers in embedded systems . . . . . . . . . . . . 572--590 Marjan Rouhipour and Peter J. Bentley and Hooman Shayani Fast bio-inspired computation using a GPU-based systemic computer . . . . . . 591--617 Carlos Pérez-Miguel and Jose Miguel-Alonso and Alexander Mendiburu Porting Estimation of Distribution Algorithms to the Cell Broadband Engine 618--634 Una-May O'Reilly and Eric Robinson and Sanjeev Mohindra and Julie Mullen and Nadya Bliss Hogs and slackers: Using operations balance in a genetic algorithm to optimize sparse algebra computation on distributed architectures . . . . . . . 635--644
Stanimire Tomov and Rajib Nath and Jack Dongarra Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing 645--654 K. A. Hawick and A. Leist and D. P. Playne Parallel graph component labelling with GPUs and CUDA . . . . . . . . . . . . . 655--678 T. E. Athanaileas and G. E. Athanasiadou and G. V. Tsoulos and D. I. Kaklamani Parallel radio-wave propagation modeling with image-based ray tracing techniques 679--695 Marina Alonso and Salvador Coll and Juan-Miguel Martínez and Vicente Santonja and Pedro López and José Duato Power saving in regular interconnection networks . . . . . . . . . . . . . . . . 696--712
Bo Li and Koichi Wada Communication latency tolerant parallel algorithm for particle swarm optimization . . . . . . . . . . . . . . 1--10 Yung-Chang Chiu and Ce-Kuen Shieh and Tzu-Chi Huang and Tyng-Yeu Liang and Kuo-Chih Chu Data race avoidance and replay scheme for developing and debugging parallel programs on distributed shared memory systems . . . . . . . . . . . . . . . . 11--25 Sevin Varoglu and Stephen Jenks Architectural support for thread communications in multi-core processors 26--41 Rahul Nagpal and Y. N. Srikant Compiler-assisted power optimization for clustered VLIW architectures . . . . . . 42--59 Oleg V. Shylo and Timothy Middelkoop and Panos M. Pardalos Restart strategies in optimization: parallel and serial cases . . . . . . . 60--68
Robert W. Numrich and Michael A. Heroux Self-similarity of parallel machines . . 69--84 Brice Goglin High-performance message-passing over generic Ethernet hardware with Open-MX 85--100 Anshu Dubey and Katie Antypas and Christopher Daley Parallel algorithms for moving Lagrangian data on block structured Eulerian meshes . . . . . . . . . . . . 101--113 Alireza Poshtkohi and M. B. Ghaznavi-Ghoushchi DotDFS: a Grid-based high-throughput file transfer system . . . . . . . . . . 114--136 Anonymous Editorial Board . . . . . . . . . . . . ifc
Antonio Robles-Gómez and Aurelio Bermúdez and Rafael Casado Efficient network management applied to source routed networks . . . . . . . . . 137--156 Liangxiu Han and Chee Sun Liew and Jano van Hemert and Malcolm Atkinson A generic parallel processing model for facilitating data mining and integration 157--171 Eric Aubanel Scheduling of tasks in the parareal algorithm . . . . . . . . . . . . . . . 172--182 José I. Aliaga and Matthias Bollhöfer and Alberto F. Martín and Enrique S. Quintana-Ortí Exploiting thread-level parallelism in the iterative solution of sparse linear systems . . . . . . . . . . . . . . . . 183--202 Anonymous Editorial Board . . . . . . . . . . . . ??
Christian Konrad Two-constraint domain decomposition with Space Filling Curves . . . . . . . . . . 203--216 Robert W. Robey and Jonathan M. Robey and Rob Aulwes In search of numerical consistency in parallel programming . . . . . . . . . . 217--229 Omar Bouattane and Bouchaib Cherradi and Mohamed Youssfi and Mohamed O. Bensalah Parallel $c$-means algorithm for image segmentation on a reconfigurable mesh computer . . . . . . . . . . . . . . . . 230--243 David Díaz and Francisco José Esteban and Pilar Hernández and Juan Antonio Caballero and Gabriel Dorado and Sergio Gálvez Parallelizing and optimizing a bioinformatics pairwise sequence alignment algorithm for many-core architecture . . . . . . . . . . . . . . 244--259 Anonymous Editorial Board . . . . . . . . . . . . ??
Dimitrije Jevremović and Cong T. Trinh and Friedrich Srienc and Carlos P. Sosa and Daniel Boley Parallelization of Nullspace Algorithm for the computation of metabolic pathways . . . . . . . . . . . . . . . . 261--278 Fangzhou Wei and Ali E. Yilmaz A hybrid message passing/shared memory parallelization of the adaptive integral method for multi-core clusters . . . . . 279--301 Hao Wang and Xudong Fu and Guangqian Wang and Tiejian Li and Jie Gao A common parallel computing framework for modeling hydrological processes of river basins . . . . . . . . . . . . . . 302--315 Pablo D. Mininni and Duane Rosenberg and Raghu Reddy and Annick Pouquet A hybrid MPI--OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence . . . . . . . . . . . . 316--326 Anonymous Editorial Board . . . . . . . . . . . . ??
Jeffrey K. Hollingsworth In Memoriam: Angela C. Sodan, PhD (August 30, 1955--April 21, 2011) . . . 327--327 Yves Robert and Leonel Sousa and Denis Trystram Parallel Computing --- Special Issue . . 329--330 Anne Benoit and Henri Casanova and Veronika Rehn-Sonigo and Yves Robert Resource allocation for multiple concurrent in-network stream-processing applications . . . . . . . . . . . . . . 331--348 Cristina Boeres and Idalmis Milián Sardiña and Lúcia M. A. Drummond An efficient weighted bi-objective scheduling algorithm for heterogeneous systems . . . . . . . . . . . . . . . . 349--364 Anne Benoit and Yves Robert and Arnold Rosenberg and Frédéric Vivien Static worksharing strategies for heterogeneous computers with unrecoverable interruptions . . . . . . 365--378 Luis Garcés-Erice Admission control for a responsive distributed middleware using decision trees to model run-time parameters . . . 379--391 M. M. Khan and A. D. Rast and J. Navaridas and X. Jin and L. A. Plana and M. Luján and S. Temple and C. Patterson and D. Richards and J. V. Woods and J. Miguel-Alonso and S. B. Furber Event-driven configuration of a neural network CMP system over an homogeneous interconnect fabric . . . . . . . . . . 392--409 Anne Benoit and Alexandru Dobrila and Jean-Marc Nicod and Laurent Philippe Mapping workflow applications with types on heterogeneous specialized platforms 410--427 Jorge G. Barbosa and Belmiro Moreira Dynamic scheduling of a batch of parallel task jobs on heterogeneous clusters . . . . . . . . . . . . . . . . 428--438 Peter Benner and Pablo Ezzatti and Daniel Kressner and Enrique S. Quintana-Ortí and Alfredo Remón A mixed-precision algorithm for the solution of Lyapunov equations on hybrid CPU--GPU platforms . . . . . . . . . . . 439--450 Chenqi Wang and Neil Cafferkey and James Kennedy and John P. Morrison CG3DR: Coordination of icosahedral virus reconstruction using Condensed Graphs 451--465 Mathieu Giraud and Jean-Stéphane Varré Parallel Position Weight Matrices algorithms . . . . . . . . . . . . . . . 466--478 Anna Beletska and Włodzimierz Bielecki and Albert Cohen and Marek Palkowski and Krzysztof Siedlecki Coarse-grained loop parallelization: Iteration Space Slicing vs affine transformations . . . . . . . . . . . . 479--497 Anonymous Editorial Board . . . . . . . . . . . . ??
Leonid Oliker and Rajesh Nishtala and Rupak Biswas Emerging programming paradigms for large-scale scientific computing . . . . 499--500 Kamesh Madduri and Eun-Jin Im and Khaled Z. Ibrahim and Samuel Williams and Stéphane Ethier and Leonid Oliker Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms . . . . . . . . . . . 501--520 Wang Xian and Aoki Takayuki Multi-GPU performance of incompressible flow computation by lattice Boltzmann method on GPU cluster . . . . . . . . . 521--535 Christian Feichtinger and Johannes Habich and Harald Köstler and Georg Hager and Ulrich Rüde and Gerhard Wellein A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU--CPU clusters . . . . 536--549 Darren J. Kerbyson and Michael Lang and Scott Pakin Adapting wave-front algorithms to efficiently utilize systems with deep communication hierarchies . . . . . . . 550--561 Haoqiang Jin and Dennis Jespersen and Piyush Mehrotra and Rupak Biswas and Lei Huang and Barbara Chapman High performance computing using MPI and OpenMP on multi-core parallel systems 562--575 Rajesh Nishtala and Yili Zheng and Paul H. Hargrove and Katherine A. Yelick Tuning collective communication for Partitioned Global Address Space programming models . . . . . . . . . . . 576--591 David Gay and Joel Galenson and Mayur Naik and Kathy Yelick Yada: Straightforward parallel programming . . . . . . . . . . . . . . 592--609 Steven J. Plimpton and Karen D. Devine MapReduce in MPI for large-scale graph algorithms . . . . . . . . . . . . . . . 610--632 Michael Wilde and Mihael Hategan and Justin M. Wozniak and Ben Clifford and Daniel S. Katz and Ian Foster Swift: a language for distributed parallel scripting . . . . . . . . . . . 633--652 Anonymous Editorial Board . . . . . . . . . . . . ??
Lizhi Peng and Bo Yang and Lei Zhang and Yuehui Chen A parallel evolving algorithm for flexible neural tree . . . . . . . . . . 653--666 Min Yeol Lim and Vincent W. Freeh and David K. Lowenthal Adaptive, transparent CPU scaling algorithms leveraging inter-node MPI communication regions . . . . . . . . . 667--683 Tristan Glatard and Sorina Camarasu-Pop A model of pilot-job resource provisioning on production grids . . . . 684--692 Loris Marchal and Frédéric Vivien Editorial . . . . . . . . . . . . . . . 693--693 Naga Vydyanathan and Umit Catalyurek and Tahsin Kurc and Ponnuswamy Sadayappan and Joel Saltz Optimizing latency and throughput of application workflows on clusters . . . 694--712 Ioannis Riakiotakis and Florina M. Ciorba and Theodore Andronikos and George Papakonstantinou Distributed dynamic load balancing for pipelined computations on heterogeneous systems . . . . . . . . . . . . . . . . 713--729 Anonymous Editorial Board . . . . . . . . . . . . ??
Peter Arbenz and Yousef Saad and Ahmed Sameh and Olaf Schenk Special issue on Parallel Matrix Algorithms and Applications (PMAA'10) 731--732 Karan Mendiratta and Eric Polizzi A threaded SPIKE algorithm for solving general banded systems . . . . . . . . . 733--741 Daniel Maurer and Christian Wieners A parallel block LU decomposition method for distributed finite element matrices 742--758 Chenhan D. Yu and Weichung Wang and Dan'l Pierce A CPU--GPU hybrid approach for the unsymmetric multifrontal method . . . . 759--770 L. Karlsson and B. Kågström Parallel two-stage reduction to Hessenberg form using dynamic scheduling on shared-memory architectures . . . . . 771--782 T. Auckenthaler and V. Blum and H.-J. Bungartz and T. Huckle and R. Johanni and L. Krämer and B. Lang and H. Lederer and P. R. Willems Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations . . . . . . . . . 783--794 M. Petschow and P. Bientinesi MR$^3$-SMP: a symmetric tridiagonal eigensolver for multi-core architectures 795--805 A. N. Yzelman and Rob H. Bisseling Two-dimensional cache-oblivious sparse matrix-vector multiplication . . . . . . 806--819 Johannes Langguth and Md. Mostofa Ali Patwary and Fredrik Manne Parallel algorithms for bipartite matching problems on distributed memory computers . . . . . . . . . . . . . . . 820--845 Cyril Flaig and Peter Arbenz A scalable memory efficient multigrid solver for micro-finite element analyses based on CT images . . . . . . . . . . . 846--854 Anonymous Editorial Board . . . . . . . . . . . . ??
Torsten Hoefler Extensions for next-generation parallel programming models . . . . . . . . . . . 1--1 Nick Rutar and Jeffrey K. Hollingsworth Data centric techniques for mapping performance data to program variables 2--14 Joshua Hursey and Richard L. Graham Analyzing fault aware collective performance in a process fault tolerant MPI . . . . . . . . . . . . . . . . . . 15--25 Jesper Larsson Träff Alternative, uniformly expressive and more scalable interfaces for collective communication in MPI . . . . . . . . . . 26--36 George Bosilca and Aurelien Bouteiller and Anthony Danalis and Thomas Herault and Pierre Lemarinier and Jack Dongarra DAGuE: a generic distributed DAG engine for High Performance Computing . . . . . 37--51 Martin Sandrieser and Siegfried Benkner and Sabri Pllana Using explicit platform descriptions to support programming of heterogeneous many-core systems . . . . . . . . . . . 52--65 Phil Miller and Aaron Becker and Laxmikant Kalé Using shared arrays in message-driven parallel programs . . . . . . . . . . . 66--74 Pieter Hijma and Rob V. van Nieuwpoort and Ceriel J. H. Jacobs and Henri E. Bal Generating synchronization statements in divide-and-conquer programs . . . . . . 75--89 Anonymous Editorial Board . . . . . . . . . . . . ??
Lucas Mello Schnorr and Guillaume Huard and Philippe Olivier Alexandre Navaux A hierarchical aggregation model to achieve visualization scalability in the analysis of parallel applications . . . 91--110 Holger Scherl and Markus Kowarschik and Hannes G. Hofmann and Benjamin Keck and Joachim Hornegger Evaluation of state-of-the-art hardware architectures for fast cone-beam CT reconstruction . . . . . . . . . . . . . 111--124 A. Moreno and E. Cesar and A. Guevara and J. Sorribes and T. Margalef Load balancing in homogeneous pipeline based applications . . . . . . . . . . . 125--139 Aleksandr Ovcharenko and Daniel Ibanez and Fabien Delalondre and Onkar Sahni and Kenneth E. Jansen and Christopher D. Carothers and Mark S. Shephard Neighborhood communication paradigm to increase scalability in large-scale dynamic scientific applications . . . . 140--156 Andreas Klöckner and Nicolas Pinto and Yunsup Lee and Bryan Catanzaro and Paul Ivanov and Ahmed Fasih PyCUDA and PyOpenCL: a scripting-based approach to GPU run-time code generation 157--174 Anonymous Editorial Board . . . . . . . . . . . . ??
Minhaj Ahmad Khan Scheduling for heterogeneous systems using constrained critical paths . . . . 175--193 Kathryn Mohror and Karen L. Karavanic Trace profiling: Scalable event tracing on high-end parallel systems . . . . . . 194--225 Gerassimos Barlas Cluster-based optimized parallel video transcoding . . . . . . . . . . . . . . 226--244 H. M. Aktulga and J. C. Fogarty and S. A. Pandit and A. Y. Grama Parallel reactive molecular dynamics: Numerical methods and algorithmic techniques . . . . . . . . . . . . . . . 245--259 Roman Wyrzykowski and Krzysztof Rojek and Lukasz Szustak Model-driven adaptation of double-precision matrix multiplication to the Cell processor architecture . . . 260--276 Anonymous Editorial Board . . . . . . . . . . . . ??
F. Argüello and D. B. Heras and M. Bóo and J. Lamas-Rodríguez The split-and-merge method in general purpose computation on GPUs . . . . . . 277--288 Timothy D. R. Hartley and Erik Saule and Ümit V. Çatalyürek Improving performance of adaptive component-based dataflow middleware . . 289--309 Peng Di and Hui Wu and Jingling Xue and Feng Wang and Canqun Yang Parallelizing SOR for GPGPUs using alternate loop tiling . . . . . . . . . 310--328 Rahul Nagpal and Anasua Bhowmik Criticality guided energy aware speculation for speculative multithreaded processors . . . . . . . . 329--341 Anonymous Editorial Board . . . . . . . . . . . . ??
Volodymyr Kindratenko and Gregory D. Peterson Application accelerators in HPC --- Editorial introduction . . . . . . . . . 343--343 Andrew G. Schmidt and Siddhartha Datta and Ashwin A. Mendon and Ron Sass Investigation into scaling I/O bound streaming applications productively with an all-FPGA cluster . . . . . . . . . . 344--364 Frederico Pratas and Pedro Trancoso and Leonel Sousa and Alexandros Stamatakis and Guochun Shi and Volodymyr Kindratenko Fine-grain parallelism using multi-core, Cell/BE, and GPU Systems . . . . . . . . 365--390 Peng Du and Rick Weber and Piotr Luszczek and Stanimire Tomov and Gregory Peterson and Jack Dongarra From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming . . . . . 391--407 Francisco Vázquez and José Jesús Fernández and Ester M. Garzón Automatic tuning of the sparse matrix vector product on GPUs based on the ELLR-T approach . . . . . . . . . . . . 408--420 Depeng Yang and Gregory D. Peterson and Husheng Li Compressed sensing and Cholesky decomposition on FPGAs and GPUs . . . . 421--437 John R. Wernsing and Greg Stitt Elastic computing: a portable optimization framework for hybrid computers . . . . . . . . . . . . . . . 438--464 Anonymous Editorial Board . . . . . . . . . . . . ??
Basilio B. Fraguela and Ganesh Bikshandi and Jia Guo and María J. Garzarán and David Padua and Christoph von Praun Optimization techniques for efficient HTA programs . . . . . . . . . . . . . . 465--484 Takeshi Iwashita and Yu Hirotani and Takeshi Mifune and Toshio Murayama and Hideki Ohtani Large-scale time-harmonic electromagnetic field analysis using a multigrid solver on a distributed memory parallel computer . . . . . . . . . . . 485--500 Amit Amritkar and Danesh Tafti and Rui Liu and Rick Kufrin and Barbara Chapman OpenMP parallelism for fluid and fluid-particulate systems . . . . . . . 501--517 Wlodzimierz Bielecki and Marek Palkowski and Tomasz Klimek Free scheduling for statement instances of parameterized arbitrarily nested affine loops . . . . . . . . . . . . . . 518--532 Anonymous Editorial Board . . . . . . . . . . . . ??
Yong Chen and Huaiyu Zhu and Hui Jin and Xian-He Sun Algorithm-level Feedback-controlled Adaptive data prefetcher: Accelerating data access for high-performance processors . . . . . . . . . . . . . . . 533--551 Mickeal Verschoor and Andrei C. Jalba Analysis and performance estimation of the Conjugate Gradient method on multiple GPUs . . . . . . . . . . . . . 552--575 Ümit V. Çatalyürek and John Feo and Assefaw H. Gebremedhin and Mahantesh Halappanavar and Alex Pothen Graph coloring algorithms for multi-core and massively multithreaded architectures . . . . . . . . . . . . . 576--594 Anonymous Editorial Board . . . . . . . . . . . . ??
Madan Sathe and Olaf Schenk and Helmar Burkhart An auction-based weighted matching implementation on massively parallel architectures . . . . . . . . . . . . . 595--614 M. Etinski and J. Corbalan and J. Labarta and M. Valero Parallel job scheduling for power constrained HPC systems . . . . . . . . 615--630 Anonymous Editorial Board . . . . . . . . . . . . ??
Dana A. Jacobsen and Inanc Senocak Multi-level parallelism for incompressible flow computations on GPU clusters . . . . . . . . . . . . . . . . 1--20 Masha Sosonkina and Layne T. Watson and Nicholas R. Radcliffe and Rafael T. Haftka and Michael W. Trosset Adjusting process count on demand for petascale global optimization . . . . . 21--35 Diego Andrade and Basilio B. Fraguela and Ramón Doallo Accurate prediction of the behavior of multithreaded applications in shared caches . . . . . . . . . . . . . . . . . 36--57 Orlando Ayala and Lian-Ping Wang Parallel implementation and scalability analysis of $3$D Fast Fourier Transform using $2$D domain decomposition . . . . 58--77 Anonymous Editorial Board . . . . . . . . . . . . ??
Abhinav Sarje and Srinivas Aluru All-pairs computations on many-core graphics processors . . . . . . . . . . 79--93 Ferit Büyükkeçeci and Omar Awile and Ivo F. Sbalzarini A portable OpenCL implementation of generic particle-mesh and mesh-particle interpolation in $2$D and $3$D . . . . . 94--111 Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous Preface: Infrastructure for scalable tools . . . . . . . . . . . . . . . . . 113--113 Mark W. Krentel Libmonitor: a tool for first-party monitoring . . . . . . . . . . . . . . . 114--119 Nick Rutar and Jeffrey K. Hollingsworth Software techniques for negating skid and approximating cache miss measurements . . . . . . . . . . . . . . 120--131 Marc-André Hermanns and Sriram Krishnamoorthy and Felix Wolf A scalable infrastructure for the performance analysis of passive target synchronization . . . . . . . . . . . . 132--145 Michael O. Lam and Jeffrey K. Hollingsworth and G. W. Stewart Dynamic floating-point cancellation detection . . . . . . . . . . . . . . . 146--155 Barry Rountree and Todd Gamblin and Bronis R. de Supinski and Martin Schulz and David K. Lowenthal and Guy Cobb and Henry Tufo Parallelizing heavyweight debugging tools with {\tt mpiecho} . . . . . . . . . 156--166 J. D. Goehner and D. C. Arnold and D. H. Ahn and G. L. Lee and B. R. de Supinski and M. P. LeGendre and B. P. Miller and M. Schulz LIBI: a framework for bootstrapping extreme scale software systems . . . . . 167--176 Anonymous Editorial Board . . . . . . . . . . . . ??
Sen Su and Jian Li and Qingjia Huang and Xiao Huang and Kai Shuang and Jie Wang Cost-efficient task scheduling for executing large programs in the cloud 177--188 George Teodoro and Tony Pan and Tahsin M. Kurc and Jun Kong and Lee A. D. Cooper and Joel H. Saltz Efficient irregular wavefront propagation algorithms on hybrid CPU--GPU machines . . . . . . . . . . . 189--211 Jack Dongarra and Mathieu Faverge and Thomas Hérault and Mathias Jacquelin and Julien Langou and Yves Robert Hierarchical QR factorization algorithms for multi-core clusters . . . . . . . . 212--232 Wagner Kolberg and Pedro de B. Marcos and Julio C. S. Anjos and Alexandre K. S. Miyazaki and Claudio R. Geyer and Luciana B. Arantes MRSG --- a MapReduce simulator over SimGrid . . . . . . . . . . . . . . . . 233--244 Anonymous Editorial Board . . . . . . . . . . . . ??
Andrew V. Terekhov A fast parallel algorithm for solving block-tridiagonal systems of linear equations including the domain decomposition method . . . . . . . . . . 245--258 Christian Obrecht and Frédéric Kuznik and Bernard Tourancheau and Jean-Jacques Roux Scalable lattice Boltzmann solvers for CUDA GPU clusters . . . . . . . . . . . 259--270 Yuefan Deng and Peng Zhang and Carlos Marques and Reid Powell and Li Zhang Analysis of Linpack and power efficiencies of the world's TOP500 supercomputers . . . . . . . . . . . . . 271--279 Ichitaro Yamazaki and Hiroto Tadano and Tetsuya Sakurai and Tsutomu Ikegami Performance comparison of parallel eigensolvers based on a contour integral method and a Lanczos method . . . . . . 280--290 Anonymous Editorial Board . . . . . . . . . . . . ??
Yang Wang and Paul Lu DDS: a deadlock detection-based scheduling algorithm for workflow computations in HPC systems with storage constraints . . . . . . . . . . . . . . 291--305 A. Sandroos and I. Honkonen and S. von Alfthan and M. Palmroth Multi-GPU simulations of Vlasov's equation using Vlasiator . . . . . . . . 306--318 O. Fortmeier and H. M. Bücker and B. O. Fagginger Auer and R. H. Bisseling A new metric enabling an exact hypergraph model for the communication volume in distributed-memory parallel applications . . . . . . . . . . . . . . 319--335 Harald Servat and Germán Llort and Kevin Huck and Judit Giménez and Jesús Labarta Framework for a productive performance optimization . . . . . . . . . . . . . . 336--353 Anonymous Editorial Board . . . . . . . . . . . . ??
Fangyang Shen and Mei Yang and Maurizio Palesi Guest Editors' Introduction to the Special Issue on ``Novel On-Chip Parallel Architectures and Software Support'' . . . . . . . . . . . . . . . 355--356 Sandeep Pande and Fearghal Morgan and Gerard Smit and Tom Bruintjes and Jochem Rutgers and Brian McGinley and Seamus Cawley and Jim Harkin and Liam McDaid Fixed latency on-chip interconnect for hardware spiking neural network architectures . . . . . . . . . . . . . 357--371 Junghee Lee and Chrysostomos Nicopoulos and Hyung Gyu Lee and Jongman Kim Sharded Router: a novel on-chip router architecture employing bandwidth sharding and stealing . . . . . . . . . 372--388 Michael Opoku Agyeman and Ali Ahmadinia and Alireza Shahrabi Efficient routing techniques in heterogeneous $3$D Networks-on-Chip . . 389--407 Xiaohang Wang and Peng Liu and Mei Yang and Yingtao Jiang Avoiding request-request type message-dependent deadlocks in networks-on-chips . . . . . . . . . . . 408--423 Ashkan Beyranvand Nejad and Anca Molnos and Matias Escudero Martinez and Kees Goossens A hardware/software platform for QoS bridging over multi-chip NoC-based systems . . . . . . . . . . . . . . . . 424--441 José M. Andión and Manuel Arenaz and Gabriel Rodríguez and Juan Touriño A novel compiler support for automatic parallelization on multicore systems . . 442--460 Jiyang Yu and Peng Liu and Weidong Wang and Chunming Huang and Jie Yang and Yingtao Jiang and Qingdong Yao An efficient protocol with synchronization accelerator for multi-processor embedded systems . . . . 461--474 Carlos H. González and Basilio B. Fraguela A framework for argument-based task synchronization with automatic detection of dependencies . . . . . . . . . . . . 475--489 Guiyuan Jiang and Jigang Wu and Jizhou Sun Efficient reconfiguration algorithms for communication-aware three-dimensional processor arrays . . . . . . . . . . . . 490--503 Giovanni Mariani and Gianluca Palermo and Vittorio Zaccaria and Cristina Silvano ARTE: an Application-specific Run-Time managEment framework for multi-cores based on queuing models . . . . . . . . 504--519 Jingweijia Tan and Yang Yi and Fangyang Shen and Xin Fu Modeling and characterizing GPGPU reliability in the presence of soft errors . . . . . . . . . . . . . . . . . 520--532 Anonymous Editorial Board . . . . . . . . . . . . ??
Marcin Krotkiewski and Marcin Dabrowski Efficient $3$D stencil computations using CUDA . . . . . . . . . . . . . . . 533--548 J. Joven and A. Marongiu and F. Angiolini and L. Benini and G. De Micheli An integrated, programming model-driven framework for NoC--QoS support in cluster-based embedded many-cores . . . 549--566 Laiping Zhao and Yizhi Ren and Kouichi Sakurai Reliable workflow scheduling with less resource redundancy . . . . . . . . . . 567--585 Libo Huang and Nong Xiao and Zhiying Wang and Yongwen Wang and Mingche Lai Efficient multimedia coprocessor with enhanced SIMD engines for exploiting ILP and DLP . . . . . . . . . . . . . . . . 586--602 Dimitris Saougkos and George Manis Self adaptive run time scheduling for the automatic parallelization of loops with the C2$ \mu $TC/SL compiler . . . . 603--614 Agustín C. Caminero and Antonio Robles-Gómez and Salvador Ros and Roberto Hernández and Llanos Tobarra P2P-based resource discovery in dynamic grids allowing multi-attribute and range queries . . . . . . . . . . . . . . . . 615--637 Xiaoliang Wan and Guang Lin Hybrid parallel computing of minimum action method . . . . . . . . . . . . . 638--651 Anonymous Editorial Board . . . . . . . . . . . . ??
Gregory Tauer and Rakesh Nagi A map-reduce Lagrangian heuristic for multidimensional assignment problems with decomposable costs . . . . . . . . 653--668 G. R. Mudalige and M. B. Giles and J. Thiyagalingam and I. Z. Reguly and C. Bertolli and P. H. J. Kelly and A. E. Trefethen Design and initial performance of a high-level unstructured mesh framework on heterogeneous parallel systems . . . 669--692 Javier Navaridas and Steve Furber and Jim Garside and Xin Jin and Mukaram Khan and David Lester and Mikel Luján and José Miguel-Alonso and Eustace Painkras and Cameron Patterson and Luis A. Plana and Alexander Rast and Dominic Richards and Yebin Shi and Steve Temple and Jian Wu and Shufan Yang SpiNNaker: Fault tolerance in a power- and area- constrained large-scale neuromimetic architecture . . . . . . . 693--708 Hameed Hussain and Saif Ur Rehman Malik and Abdul Hameed and Samee Ullah Khan and Gage Bickler and Nasro Min-Allah and Muhammad Bilal Qureshi and Limin Zhang and Wang Yongji and Nasir Ghani and Joanna Kolodziej and Albert Y. Zomaya and Cheng-Zhong Xu and Pavan Balaji and Abhinav Vishnu and Fredric Pinel and Johnatan E. Pecero and Dzmitry Kliazovich and Pascal Bouvry and Hongxiang Li and Lizhe Wang and Dan Chenm and Ammar Rayes A survey on resource allocation in high performance distributed computing systems . . . . . . . . . . . . . . . . 709--736 Hoang-Vu Dang and Bertil Schmidt CUDA-enabled Sparse Matrix-Vector Multiplication on GPUs using atomic operations . . . . . . . . . . . . . . . 737--750
Yong Chen and Pavan Balaji and Abhinav Vishnu Special issue on programming models, systems software, and tools for High-End Computing . . . . . . . . . . . . . . . 751--752 Wei Tang and Dongxu Ren and Zhiling Lan and Narayan Desai Toward balanced and sustainable job scheduling for production supercomputers 753--768 Mark Gardner and Paul Sathre and Wu-chun Feng and Gabriel Martinez Characterizing the challenges and evaluating the efficacy of a CUDA-to-OpenCL translator . . . . . . . 769--786 Zhiyi Huang and Kai-Cheung Leung Performance evaluation of View-Oriented Transactional Memory . . . . . . . . . . 787--801 E. J. Otoo and Gideon Nimako and Daniel Ohene-Kwofie Chunked extendible dense arrays for scientific data storage . . . . . . . . 802--818 Shannon Steinfadt Fine-grained parallel implementations for SWAMP+ Smith--Waterman alignment . . 819--833 Jie Shen and Jianbin Fang and Henk Sips and Ana Lucia Varbanescu An application-centric evaluation of OpenCL on multi-core CPUs . . . . . . . 834--850 Hisham Mohamed and Stéphane Marchand-Maillet MRO-MPI: MapReduce overlapping using MPI and an optimized data exchange policy 851--866 Omer Erdil Albayrak and Ismail Akturk and Ozcan Ozturk Improving application behavior on heterogeneous manycore systems through kernel mapping . . . . . . . . . . . . . 867--878 Alexander Reinefeld and Robert Döbbelin and Thorsten Schütt Analyzing the performance of SMP memory allocators with iterative MapReduce applications . . . . . . . . . . . . . . 879--889
L. Yavits and A. Morad and R. Ginosar The effect of communication and synchronization on Amdahl's law in multicore systems . . . . . . . . . . . 1--16 Lois Curfman McInnes and Barry Smith and Hong Zhang and Richard Tran Mills Hierarchical Krylov and nested Krylov methods for extreme-scale computing . . 17--31
Pavan Balaji and Zhiyi Huang Special issue on programming models and applications for multicores and manycores --- Guest Editors' introduction . . . . . . . . . . . . . . 33--34 Mark Utting and Min-Hsien Weng and John G. Cleary The JStar language philosophy . . . . . 35--50 Weihua Sheng and Stefan Schürmans and Maximilian Odendahl and Mark Bertsch and Vitaliy Volevach and Rainer Leupers and Gerd Ascheid A compiler infrastructure for embedded heterogeneous MPSoCs . . . . . . . . . . 51--68 Vikas and Nasser Giacaman and Oliver Sinnen Multiprocessing with GUI-awareness using OpenMP-like directives in Java . . . . . 69--89 Oded Green and Yitzhak Birk Scheduling directives: Accelerating shared-memory many-core processor execution . . . . . . . . . . . . . . . 90--106 Zhenning Wang and Long Zheng and Quan Chen and Minyi Guo CPU + GPU scheduling with asymptotic profiling . . . . . . . . . . . . . . . 107--115 Yu Liu and Kento Emoto and Zhenjiang Hu A Generate-Test-Aggregate parallel programming library for systematic parallel programming . . . . . . . . . . 116--135 Zhijun Hao and Chenning Xie and Haibo Chen and Binyu Zang X10-FT: Transparent fault tolerance for APGAS language and runtime . . . . . . . 136--156
Mohammad Reza Selim and Mohammed Ziaur Rahman Carrying on the legacy of imperative languages in the future parallel computing era . . . . . . . . . . . . . 1--33 Jean-Yves L'Excellent and Wissam M. Sid-Lakhdar A study of shared-memory parallelism in a multifrontal solver . . . . . . . . . 34--46
Urban Borstnik and Joost VandeVondele and Valéry Weber and Jürg Hutter Sparse matrix multiplication: the distributed block-compressed sparse row library . . . . . . . . . . . . . . . . 47--58 Yuki Sugimoto and Fumihiko Ino and Kenichi Hagihara Improving cache locality for GPU-based volume rendering . . . . . . . . . . . . 59--69 Ray-Bing Chen and Yaohung M. Tsai and Weichung Wang Adaptive block size for dense $ Q R $ factorization in hybrid CPU--GPU systems via statistical modeling . . . . . . . . 70--85 Michael J. Hallock and John E. Stone and Elijah Roberts and Corey Fry and Zaida Luthey-Schulten Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations . . . . . . . . . . . . . . 86--99 Ivan Teixidó and Francesc Sebé and Josep Conde and Francesc Solsona MPI-based implementation of an enhanced algorithm to solve the LPN problem in a memory-constrained environment . . . . . 100--112 Alberto F. Martín and Ruymán Reyes and Rosa M. Badia and Enrique S. Quintana-Ortí Leveraging task-parallelism in message-passing dense matrix factorizations using SMPSs . . . . . . . 113--128 Jose A. Pascual and Jose Miguel-Alonso and Jose A. Lozano Application-aware metrics for partition selection in cube-shaped topologies . . 129--139 Robert Hallberg and Alistair Adcroft An order-invariant real-to-integer conversion sum . . . . . . . . . . . . . 140--143 Oscar Peredo and Julián M. Ortiz and José R. Herrero and Cristóbal Samaniego Tuning and hybrid parallelization of a genetic-based multi-point statistics simulation code . . . . . . . . . . . . 144--158 Anonymous Editorial Board . . . . . . . . . . . . IFC
Costas Bekas and Ananth Grama and Yousef Saad and Olaf Schenk Parallel matrix algorithms . . . . . . . 159--160 Robert Andrew and Nicholas Dingle Implementing $ Q R $ factorization updating algorithms on GPUs . . . . . . 161--172 Yiannis Cotronis and Elias Konstantinidis and Maria A. Louka and Nikolaos M. Missirlis A comparison of CPU and GPU implementations for solving the convection diffusion equation using the local modified SOR method . . . . . . . 173--185 T. Auckenthaler and T. Huckle and R. Wittmann A blocked $ Q R $-decomposition for the parallel symmetric eigenvalue problem 186--194 Hasan Metin Aktulga and Lin Lin and Christopher Haine and Esmond G. Ng and Chao Yang Parallel eigenvalue calculation based on multiple shift-invert Lanczos and contour integral based spectral projection method . . . . . . . . . . . 195--212 Marc Baboulin and Dulceneia Becker and George Bosilca and Anthony Danalis and Jack Dongarra An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems . . 213--223 P. Ghysels and W. Vanroose Hiding global synchronization latency in the preconditioned conjugate gradient algorithm . . . . . . . . . . . . . . . 224--238 Erhan Turan and Peter Arbenz Large scale micro finite element analysis of $3$D bone poroelasticity . . 239--250 Michele Martone Efficient multithreaded untransposed, transposed or symmetric sparse matrix-vector multiplication with the Recursive Sparse Blocks format . . . . . 251--270 L. Karlsson and B. Kågström and E. Wadbro Fine-grained bulge-chasing kernels for strongly scalable parallel $ Q R $ algorithms . . . . . . . . . . . . . . . 271--288 J. Langguth and A. Azad and M. Halappanavar and F. Manne On parallel push-relabel based algorithms for bipartite maximum matching . . . . . . . . . . . . . . . . 289--308 Jesús Cámara and Javier Cuenca and Luis-Pedro García and Domingo Giménez Auto-tuned nested parallelism: a way to reduce the execution time of scientific software in NUMA systems . . . . . . . . 309--327 Emanuel H. Rubensson and Elias Rudberg Chunks and Tasks: a programming model for parallelization of dynamic algorithms . . . . . . . . . . . . . . . 328--343 Anonymous Editorial Board . . . . . . . . . . . . IFC
María Botón-Fernández and Miguel A. Vega-Rodríguez and Francisco Prieto Castrillo Self-adaptivity for grid applications. An Efficient Resources Selection model based on evolutionary computation algorithms . . . . . . . . . . . . . . . 345--361 Chihiro Kodama and Masaaki Terai and Akira T. Noda and Yohei Yamada and Masaki Satoh and Tatsuya Seiki and Shin-ichi Iga and Hisashi Yashiro and Hirofumi Tomita and Kazuo Minami Scalable rank-mapping algorithm for an icosahedral grid system on the massive parallel computer with a $3$-D torus network . . . . . . . . . . . . . . . . 362--373 Angeles Navarro and Rafael Asenjo and Francisco Corbera and Antonio J. Dios and Emilio L. Zapata A case study of different task implementations for multioutput stages in non-trivial parallel pipeline applications . . . . . . . . . . . . . . 374--393 J. Sánchez-Curto and P. Chamorro-Posada and G. S. McDonald Efficient parallel implementation of the nonparaxial beam propagation method . . 394--407 Jie Chen and Tom L. H. Li and Mihai Anitescu A parallel linear solver for multilevel Toeplitz systems with possibly several right-hand sides . . . . . . . . . . . . 408--424 Roman Wyrzykowski and Lukasz Szustak and Krzysztof Rojek Parallelization of $2$D MPDATA EULAG algorithm on hybrid architectures with GPU accelerators . . . . . . . . . . . . 425--447 Anonymous Editorial Board . . . . . . . . . . . . IFC
Joao Andrade and Gabriel Falcao and Vitor Silva Optimized Fast Walsh--Hadamard Transform on GPUs for non-binary LDPC decoding . . 449--453 Ehsan Totoni and Michael T. Heath and Laxmikant V. Kale Structure-adaptive parallel solution of sparse triangular linear systems . . . . 454--470 Diego Arroyuelo and Carolina Bonacic and Veronica Gil-Costa and Mauricio Marin and Gonzalo Navarro Distributed text search using suffix arrays . . . . . . . . . . . . . . . . . 471--495 Yingchong Situ and Chandra S. Martha and Matthew E. Louis and Zhiyuan Li and Ahmed H. Sameh and Gregory A. Blaisdell and Anastasios S. Lyrintzis Petascale large eddy simulation of jet engine noise based on the truncated SPIKE algorithm . . . . . . . . . . . . 496--511 Lucas Mello Schnorr and Philippe Olivier Alexandre Navaux Best of SBAC--PAD 2012 . . . . . . . . . 512--513 Luiz Ramos and Ricardo Bianchini Robust performance in hybrid-memory cooperative caches . . . . . . . . . . . 514--525 Joefon Jann and R. Sarma Burugula and Ching-Farn E. Wu and Kaoutar El Maghraoui Towards an immortal operating system in virtual environments . . . . . . . . . . 526--535 Esteban Meneses and Osman Sarood and Laxmikant V. Kalé Energy profile of rollback-recovery strategies in high performance computing 536--547 Teo Milanez and Sylvain Collange and Fernando Magno Quintão Pereira and Wagner Meira, Jr. and Renato Ferreira Thread scheduling and memory coalescing for dynamic vectorization of SPMD workloads . . . . . . . . . . . . . . . 548--558 Anonymous Editorial Board . . . . . . . . . . . . IFC
Li Tan and Shashank Kothapalli and Longxiang Chen and Omar Hussaini and Ryan Bissiri and Zizhong Chen A survey of power and energy efficient techniques for high performance numerical linear algebra operations . . 559--573 Antonio J. Peña and Carlos Reaño and Federico Silla and Rafael Mayo and Enrique S. Quintana-Ortí and José Duato A complete and efficient CUDA-sharing solution for HPC clusters . . . . . . . 574--588 George Teodoro and Tony Pan and Tahsin Kurc and Jun Kong and Lee Cooper and Scott Klasky and Joel Saltz Region templates: Data representation and management for high-throughput image analysis . . . . . . . . . . . . . . . . 589--610 Yizhuo Wang and Yang Zhang and Yan Su and Xiaojun Wang and Xu Chen and Weixing Ji and Feng Shi An adaptive and hierarchical task scheduling scheme for multi-core clusters . . . . . . . . . . . . . . . . 611--627 Andrew White and Soo-Young Lee Derivation of optimal input parameters for minimizing execution time of matrix-based computations on a GPU . . . 628--645 Nicholas Horelik and Andrew Siegel and Benoit Forget and Kord Smith Monte Carlo domain decomposition for robust nuclear reactor analysis . . . . 646--660 Leandro A. J. Marzulo and Tiago A. O. Alves and Felipe M. G. França and Vítor Santos Costa Couillard: Parallel programming via coarse-grained Data-flow Compilation . . 661--680 Philip C. Roth and Yong Chen Guest Editors' introduction to the special issue on ``DISCS-2013'' . . . . 681--681 Jesse Weaver and Vito Giovanni Castellana and Alessandro Morari and Antonino Tumeo and Sumit Purohit and Alan Chappell and David Haglin and Oreste Villa and Sutanay Choudhury and Karen Schuchardt and John Feo Toward a data scalable solution for facilitating discovery of science resources . . . . . . . . . . . . . . . 682--696 Jiangling Yin and Junyao Zhang and Jun Wang and Wu-chun Feng SDAFT: a novel scalable data access framework for parallel BLAST . . . . . . 697--709 Yong Li and Dan Feng and Zhan Shi Heterogeneous-aware cache partitioning: Improving the fairness of shared storage cache . . . . . . . . . . . . . . . . . 710--721 Joong-Yeon Cho and Hyun-Wook Jin and Min Lee and Karsten Schwan Dynamic core affinity for high-performance file upload on Hadoop Distributed File System . . . . . . . . 722--737 P. Coetzee and M. Leeke and S. Jarvis Towards unified secure on- and off-line analytics at scale . . . . . . . . . . . 738--753 Dominique LaSalle and George Karypis MPI for Big Data: New tricks for an old dog . . . . . . . . . . . . . . . . . . 754--767 Lan Vu and Gita Alaghband Novel parallel method for association rule mining on multi-core shared memory systems . . . . . . . . . . . . . . . . 768--785 Anonymous Editorial Board . . . . . . . . . . . . IFC
Saiqin Long and Yuelong Zhao and Wei Chen and Yuanbin Tang A prediction-based dynamic file assignment strategy for parallel file systems . . . . . . . . . . . . . . . . 1--13 Tassadaq Hussain and Amna Haider and Shakaib A. Gursal and Eduard Ayguadé AMC: Advanced Multi-accelerator Controller . . . . . . . . . . . . . . . 14--30 Hugo Rito and João Cachopo Adaptive transaction scheduling for mixed transactional workloads . . . . . 31--49 Ren Xiaoguang and Xu Xinhai and Wang Qian and Chen Juan and Wang Miao and Yang Xuejun GS-DMR: Low-overhead soft error detection scheme for stencil-based computation . . . . . . . . . . . . . . 50--65 Dounia Khaldi and Pierre Jouvelot and Corinne Ancourt Parallelizing with BDSC, a resource-constrained scheduling algorithm for shared and distributed memory systems . . . . . . . . . . . . . 66--89 Alexandros V. Gerbessiotis Extending the BSP model for multi-core and out-of-core computing: MBSP . . . . 90--102 Anonymous Editorial Board . . . . . . . . . . . . IFC
Miguel A. Vega-Rodríguez and David L. González-Álvarez Parallelism in bioinformatics: a view from different parallelism-based technologies . . . . . . . . . . . . . . 1--3 Michael Bromberger and Fabian Nowak and Wolfgang Karl Combined hardware-software multi-parallel prefiltering on the Convey HC-1 for fast homology detection 4--17 Miquel Orobitg and Fernando Guirado and Fernando Cores and Jordi Llados and Cedric Notredame High performance computing improvements on bioinformatics consistency-based multiple sequence alignment tools . . . 18--34 Sérgio E. D. Dias and Abel J. P. Gomes Triangulating molecular surfaces over a LAN of GPU-enabled computers . . . . . . 35--47 Romain Vasseur and Stéphanie Baud and Luiz Angelo Steffenel and Xavier Vigouroux and Laurent Martiny and Michaël Krajecki and Manuel Dauchez Inverse docking method for new proteins targets identification: a parallel approach . . . . . . . . . . . . . . . . 48--59 Marco Ferretti and Mirto Musci Geometrical motifs search in proteins: a parallel approach . . . . . . . . . . . 60--74 Elmar Peise and Diego Fabregat-Traver and Paolo Bientinesi High performance solutions for big-data GWAS . . . . . . . . . . . . . . . . . . 75--87 Gonzalo Martín and David E. Singh and Maria-Cristina Marinescu and Jesús Carretero Towards efficient large scale epidemiological simulations in EpiGraph 88--102 Anonymous Editorial Board . . . . . . . . . . . . IFC
Daniel Chavarría-Miranda and Ajay Panyala and Wenjing Ma and Adrian Prantl and Sriram Krishnamoorthy Global transformations for legacy parallel applications via structural analysis and rewriting . . . . . . . . . 1--26 Kenli Li and Jing Liu and Lanjun Wan and Shu Yin and Keqin Li A cost-optimal parallel algorithm for the $0$--$1$ knapsack problem and its performance on multicore CPU and GPU implementations . . . . . . . . . . . . 27--42 Matthias Diener and Eduardo H. M. Cruz and Philippe O. A. Navaux and Anselm Busse and Hans-Ulrich Heiß Communication-aware process and thread mapping using online communication detection . . . . . . . . . . . . . . . 43--63 Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Jian Li and Sen Su and Xiang Cheng and Meina Song and Liyu Ma and Jie Wang Cost-efficient coordinated scheduling for leasing cloud resources on hybrid workloads . . . . . . . . . . . . . . . 1--17 Haifeng Wang and Yunpeng Cao Predicting power consumption of GPUs with fuzzy wavelet neural networks . . . 18--36 João V. F. Lima and Thierry Gautier and Vincent Danjean and Bruno Raffin and Nicolas Maillard Design and analysis of scheduling strategies for multi-CPU and multi-GPU architectures . . . . . . . . . . . . . 37--52 J. Iverson and C. Kamath and G. Karypis