Last update:
Sat Sep 6 08:12:11 MDT 2025
D. J. Evans Parallel SOR iterative methods . . . . . 3--18
W. Gentzsch Numerical algorithms in computational
fluid dynamics on vector computers . . . 19--33
M. J. Kascic, Jr. Vorton dynamics: a case study of
developing a fluid dynamics model for a
vector processor . . . . . . . . . . . . 35--44
P. N. Swarztrauber FFT algorithms for vector computers . . 45--63
D. Parkinson and
M. Wunderlich A compact algorithm for Gaussian
elimination over GF(2) implemented on
highly parallel computers . . . . . . . 65--73
W. Ronsch Stability aspects in using parallel
algorithms . . . . . . . . . . . . . . . 75--98
F. J. Peters Parallel pivoting algorithms for sparse
symmetric matrices . . . . . . . . . . . 99--110
C. C. Hsiung and
W. Butscher A numerical seismic $3$-D migration
model for vector multiprocessors . . . . 113--120
M. Kratz Vectorized finite-element stiffness
generation: tuning the Noor-Lambiotte
algorithm . . . . . . . . . . . . . . . 121--132
J. J. Dongarra and
R. E. Hiromoto A collection of parallel linear
equations routines for the Denelcor HEP 133--142
D. C. Sorensen Buffering for vector performance on a
pipelined MIMD machine . . . . . . . . . 143--164
M. Bishop The Ultracomputer as a vehicle for
polymer simulations . . . . . . . . . . 165--174
P. Frederickson and
R. Hiromoto and
T. L. Jordan and
B. Smith and
T. Warnock Pseudo-random trees in Monte Carlo . . . 175--180
J. Tappe The minimal average latency of
multiconfigurable pipelines . . . . . . 181--183
J. Tappe Algorithms for pipeline control . . . . 185--188
Robert E. Hiromoto and
Olaf M. Lubeck and
James Moore Experiences with the Denelcor HEP . . . 197--206
Nisheeth R. Patel and
Harry F. Jordan A parallelized point rowwise successive
over-relaxation method on a
multiprocessor . . . . . . . . . . . . . 207--222
Jack J. Dongarra and
Ahmed H. Sameh On some parallel banded system solvers 223--235
T. Axelrod and
P. Dubois and
P. Eltgroth A simulator for MIMD performance
prediction: application to the S-1 MkIIa
multiprocessor . . . . . . . . . . . . . 237--274
Shao-Wen Mai and
D. J. Evans A parallel algorithm for the enumeration
of the spanning trees of a graph . . . . 275--286
Celso Ribeiro Performance evaluation of vector
implementations of combinatorial
algorithms . . . . . . . . . . . . . . . 287--294
F. W. Bobrowicz and
J. E. Lynch and
K. J. Fisher and
J. E. Tabor Vectorized Monte Carlo photon transport 295--305
B. L. Buzbee and
H. J. Raveché Conference on forefronts of large-scale
computational problems . . . . . . . . . 307--315
Ad Emmen International supercomputer applications
symposium . . . . . . . . . . . . . . . 317--319
Iain S. Duff Supercomputers in Europe . . . . . . . . 321--324
Marian Vajtersic Parallel marching Poisson solvers . . . 325--330
Alberto Pettorossi and
Andrzej Skowron Higher-order communications for
concurrent programming . . . . . . . . . 331--336
Ondrej Sýkora VLSI systems for some problems of
computational geometry . . . . . . . . . 337--342
Anonymous Calendar . . . . . . . . . . . . . . . . 343--344
Anonymous Author index to volume 1 (1984) . . . . 345--346
Roger W. Hockney $(r_\infty,\,n_{1/2},\,s_{1/2})$
measurements on the 2-CPU CRAY X-MP . . 1--14
W. Handler Dynamic computer structures for manifold
utilization . . . . . . . . . . . . . . 15--32
U. Meier A parallel partition method for solving
banded systems of linear equations . . . 33--43
Daniel A. Reed and
Merrell L. Patrick Parallel, iterative solution of sparse
linear systems: models and architectures 45--67
J. J. Modi and
J. S. Rollett An algorithm for inverse square-roots 69--71
Nikola K. Kasabov A method for SIMD/MIMD functionally
reconfigurable multimicroprocessor
systems design and parallel data
exchange algorithms . . . . . . . . . . 73--78
Hiroshi Tamura and
Sachio Kamiya and
Takahiro Ishigai FACOM VP-100/200: supercomputers with
ease of use . . . . . . . . . . . . . . 87--107
D. A. Calahan Task granularity studies on a
many-processor CRAY X-MP . . . . . . . . 109--118
R. W. Hockney MIMD computing in the U.S.A.---1984 . . 119--136
J. A. Clausing and
R. Hagstrom and
E. L. Lusk and
R. A. Overbeek A technique for achieving portability
among multiprocessors: Implementation on
the Lemur . . . . . . . . . . . . . . . 137--162
N. C. Kalra and
P. C. P. Bhatt Parallel algorithms for tree traversals 163--171
Wilhelm Oberaigner Parallel algorithms for rounding exact
evaluation of sums of products . . . . . 173--182
G. S. Almasi Overview of parallel processing . . . . 191--203
Garry Rodrigue Inner/outer iterative methods and
numerical Schwarz algorithms . . . . . . 205--218
R. Ohbuchi Overview of parallel processing research
in Japan . . . . . . . . . . . . . . . . 219--228
C. Ghezzi Concurrency in programming languages: a
survey . . . . . . . . . . . . . . . . . 229--241
P. M. Kogge Function-based computing and
parallelism: a review . . . . . . . . . 243--253
Paul O. Frederickson and
Rondall E. Jones and
Brian T. Smith Synchronization and control of parallel
algorithms . . . . . . . . . . . . . . . 255--264
D. D. Gajski and
J. K. Peir Comparison of five multiprocessor
systems . . . . . . . . . . . . . . . . 265--282
S. E. Fahlman Parallel processing in artificial
intelligence . . . . . . . . . . . . . . 283--286
P. C. Treleaven Control-driven, data-driven and
demand-driven computer architecture . . 287--288
Arthur Rizzi Vector coding the finite-volume
procedure for the CYBER 205 . . . . . . 295--312
D. J. Evans and
S. Mai Two parallel algorithms for the convex
hull problem in a two dimensional space 313--326
F. Seutter CEPROL: a cellular programming language 327--333
J. Staunstrup and
J. O. Jespersen and
O. V. Johansen Physical data representation in a
multiprocessor database machine . . . . 335--343
S. A. Williams The transformation of collections of
communicating sequential processes that
represent pipeline configurations . . . 345--351
S. Kutti Taxonomy of parallel processing and
definitions . . . . . . . . . . . . . . 353--359
J. C. Browne Framework for formulation and analysis
of parallel computation structures . . . 1--9
S. G. Akl and
H. Schmeck Systolic sorting in a sequential
input/output environment . . . . . . . . 11--23
J. J. Dongarra and
A. H. Sameh and
D. C. Sorensen Implementation of some concurrent
algorithms for matrix factorization . . 25--34
M. K. Seager Parallelizing conjugate gradient for the
Cray X-MP . . . . . . . . . . . . . . . 35--47
H. A. van der Vorst The performance of FORTRAN
implementations for preconditioned
conjugate gradients on vector computers 49--58
M. Sonnenschein An extension of the language C for
concurrent programming . . . . . . . . . 59--71
O. Axelsson and
V. Eijkhout A note on the vectorization of scalar
recursions . . . . . . . . . . . . . . . 73--83
D. J. Evans and
N. Y. Yousif The parallel neighbor sort and $2$-way
merge algorithm . . . . . . . . . . . . 85--90
Harry F. Jordan Structuring parallel algorithms in an
MIMD, shared memory environment . . . . 93--110
Robert Hiromoto Some issues in parallel processing as
encountered on the Denelcor HEP . . . . 111--127
Tim S. Axelrod Effects of synchronization barriers on
multiprocessor performance . . . . . . . 129--140
M. Goldapp Fast scan-line conversion using
vectorisation . . . . . . . . . . . . . 141--152
Daniel Boley Solving the generalized eigenvalue
problem on a synchronous linear
processor array . . . . . . . . . . . . 153--166
A. Brass and
G. S. Pawley Two- and three-dimensional FFTs on
highly parallel computers . . . . . . . 167--184
B. L. Buzbee A strategy for vectorization . . . . . . 187--192
Iain S. Duff Parallel implementation of multifrontal
schemes . . . . . . . . . . . . . . . . 193--204
U. Meier Two parallel SOR variants of the Schwarz
alternating procedure . . . . . . . . . 205--215
C. B. Yang and
R. C. T. Lee The mapping of $2$-D array processors to
$1$-D array processors . . . . . . . . . 217--229
John P. Shen and
John P. Hayes and
Luigi Ciminiera and
Angelo Serra Fault-tolerance and performance analysis
of beta-networks . . . . . . . . . . . . 231--249
E. Katona A lattice model for cellular (systolic)
algorithms . . . . . . . . . . . . . . . 251--258
V. Faber and
Olaf M. Lubeck and
Andrew B. White, Jr. Superlinear speedup of an efficient
sequential algorithm is not possible . . 259--260
D. Parkinson Parallel efficiency can be greater than
unity . . . . . . . . . . . . . . . . . 261--262
J. J. Modi and
J. S. Rollett Some problems of exploiting a pipeline
processor . . . . . . . . . . . . . . . 263--265
H.-C. Hoppe and
H. Mühlenbein Parallel adaptive full-multigrid methods
on message-based multiprocessors . . . . 269--287
D. J. Evans and
G. M. Megson Romberg integration using systolic
arrays . . . . . . . . . . . . . . . . . 289--304
D. Gannon and
J. Panetta Restructuring SIMPLE for the CHiP
architecture . . . . . . . . . . . . . . 305--326
J. W. H. Liu Computational models and task scheduling
for parallel sparse Cholesky
factorization . . . . . . . . . . . . . 327--342
W. Oed and
O. Lange Modelling, measurement, and simulation
of memory interference in the Cray X-MP 343--358
T. Yuba and
H. Kashiwagi The Japanese national project for new
generation supercomputing systems . . . 1--16
G. C. Fox and
S. W. Otto and
A. J. G. Hey Matrix algorithms on a hypercube. I.
Matrix multiplication . . . . . . . . . 17--31
D. J. Evans and
G. M. Megson Construction of extrapolation tables by
systolic arrays for solving ordinary
differential equations . . . . . . . . . 33--48
W. Ronsch and
H. Strauss Timing results of some internal sorting
algorithms on vector computers . . . . . 49--61
P. Moller-Nielsen and
J. Staunstrup Problem-heap: a paradigm for
multiprocessor algorithms . . . . . . . 63--74
H. Carlisle and
A. Crawford and
S. Sheppard ADA multitasking and the single source
shortest path problem . . . . . . . . . 75--91
I. Parberry Some practical simulations of
impractical parallel computers . . . . . 93--101
E. D. Brooks, III A butterfly processor-memory
interconnection for a vector processing
environment . . . . . . . . . . . . . . 103--110
D. Kamowitz SOR and MGR$(\nu)$ experiments on the
Crystal multicomputer . . . . . . . . . 117--142
M. Louter-Nool Basic linear algebra subprograms (BLAS)
on the CDC Cyber 205 . . . . . . . . . . 143--165
R. Suros and
E. Montagne Optimizing systolic networks by fitting
diagonals . . . . . . . . . . . . . . . 167--174
R. B. Simpson and
A. Yazici An organization of the extrapolation
method for vector processing . . . . . . 175--188
Z. Strakos Effectivity and optimizing of algorithms
and programs on the
host-computer/array-processor system . . 189--207
V. Faber and
O. M. Lubeck and
A. B. White, Jr. Comments on the paper `Parallel
efficiency can be greater than unity' 209--210
R. Janssen A note on superlinear speedup . . . . . 211--213
H. Umeo and
I. Nakatsuka A design of pipeline-interval-optimum
systolic stack . . . . . . . . . . . . . 215--219
M. P. Bekakos and
D. J. Evans A `rotating' and `folding' algorithm
using a two-dimensional `systolic'
communication geometry . . . . . . . . . 221--228
Michael Kaps and
Michael Schlegl A short proof for the existence of the
${WZ}$-factorisation . . . . . . . . . . 229--232
K. H. Cheng and
S. Sahni VLSI systems for band matrix
multiplication . . . . . . . . . . . . . 239--258
C. A. Pogue and
P. Willett Use of text signatures for document
retrieval in a highly parallel
environment . . . . . . . . . . . . . . 259--268
H. Mühlenbein and
M. Gorges-Schleuter and
O. Kramer New solutions to the mapping problem of
parallel systems: the evolution approach 269--279
P. Frederickson and
R. Hiromoto and
J. Larson A parallel Monte Carlo transport
algorithm using a pseudo-random tree to
guarantee reproducibility . . . . . . . 281--290
Y. N. Srikant and
P. Shankar A new parallel algorithm for parsing
arithmetic infix expressions . . . . . . 291--304
Guang R. Gao A stability classification method and
its application to pipelined solution of
linear recurrences . . . . . . . . . . . 305--321
Hartmut Schwandt An interval arithmetic method for the
solution of nonlinear systems of
equations on a vector computer . . . . . 323--337
Rami Melhem Parallel Gauss--Jordan elimination for
the solution of dense linear systems . . 339--343
J. Modi and
R. Prager Implementation of bubble sort and the
odd-even transposition sort on a rack of
transputers . . . . . . . . . . . . . . 345--348
W. Gentzsch A fully vectorizable SOR variant . . . . 349--353
Anonymous International Conference on Vector and
Parallel Computing --- Issues in Applied
Research and Development . . . . . . . . ??
Petter E. Bjørstad A large scale, sparse, secondary
storage, direct linear equation solver
for structural analysis and its
implementation on vector and parallel
architectures . . . . . . . . . . . . . 3--12
E. Clementi and
J. Detrich and
S. Chin and
G. Corongiu and
D. Folsom and
D. Logan and
R. Caltabiano and
A. Carnevali and
J. Helin and
M. Russo and
A. Gnudi and
P. Palamidese Large-scale computations on a scalar,
vector and parallel `supercomputer' . . 13--44
Henk A. van der Vorst Large tridiagonal and block tridiagonal
linear systems on vector and parallel
computers . . . . . . . . . . . . . . . 45--54
Ameet K. Dave and
Iain S. Duff Sparse matrix calculations on the CRAY-2 55--64
Eleanor Chu and
Alan George Gaussian elimination with partial
pivoting and load balancing on a
multiprocessor . . . . . . . . . . . . . 65--74
D. Parkinson Organisational aspects of using parallel
computers . . . . . . . . . . . . . . . 75--83
Alan George and
Michael T. Heath and
Esmond Ng and
Joseph Liu Symbolic Cholesky factorization on a
local-memory multiprocessor . . . . . . 85--95
R. W. Hockney Parametrization of computer performance 97--103
M. Itoh and
K. Uchida Trends in Fujitsu large scale computer
technology . . . . . . . . . . . . . . . 105--115
Oliver A. McBryan and
Eric F. Van de Velde Matrix and vector operations on
hypercube parallel processors . . . . . 117--125
Dianne P. O'Leary Parallel implementation of the block
conjugate gradient algorithm . . . . . . 127--139
Catherine E. Houstis and
Elias N. Houstis and
John R. Rice Partitioning PDE computations: methods
and performance evaluation . . . . . . . 141--163
William D. Gropp Solving PDEs on loosely-coupled parallel
processors . . . . . . . . . . . . . . . 165--173
J. J. Dongarra and
D. C. Sorensen A portable environment for developing
parallel FORTRAN programs . . . . . . . 175--186
G. W. Stewart A parallel implementation of the
$QR$-algorithm . . . . . . . . . . . . . 187--196
Paul N. Swarztrauber Multiprocessor FFTs . . . . . . . . . . 197--210
Merrell L. Patrick and
Daniel A. Reed and
Robert G. Voigt The impact of domain partitioning on the
performance of a shared memory
multiprocessor . . . . . . . . . . . . . 211--217
Jack J. Dongarra and
Lennart Johnsson Solving banded systems on a parallel
processor . . . . . . . . . . . . . . . 219--246
T. Watanabe Architecture and performance of NEC
supercomputer SX system . . . . . . . . 247--255
Ruth Gonzalez and
Mary Fanett Wheeler Domain decomposition for elliptic
partial differential equations with
Neumann boundary conditions . . . . . . 257--263
Gérard Meurant Multitasking the conjugate gradient
method on the CRAY X-MP/48 . . . . . . . 267--280
Alan H. Karp and
John Greenstadt An improved parallel Jacobi method for
diagonalizing a symmetric matrix . . . . 281--294
Nikolaos M. Missirlis Scheduling parallel iterative methods on
multiprocessor systems . . . . . . . . . 295--302
Henk A. van der Vorst Analysis of a parallel solution method
for tridiagonal linear systems . . . . . 303--311
Jian Ping Shao and
Li Shan Kang An asynchronous parallel mixed algorithm
for linear and nonlinear equations . . . 313--321
M. A. de Bruijn EPS: an `elementary' programming system
for the Delft Parallel Processor . . . . 323--337
Piyush Mehrotra and
John Van Rosendale The BLAZE language: a parallel language
for scientific programming . . . . . . . 339--361
T. Hoshino and
T. Shirakawa and
K. Tsuboi Mesh-connected parallel computer PAX for
scientific applications . . . . . . . . 363--371
I. Stojmenovic and
D. J. Evans Comments on two parallel algorithms for
the planar convex hull problem . . . . . 373--375
H. P. Zima and
H.-J. Bast and
M. Gerndt SUPERB: a tool for semi-automatic
MIMD/SIMD parallelization . . . . . . . 1--18
Joel H. Saltz and
Vijay K. Naik Towards developing robust algorithms for
solving partial differential equations
on MIMD machines . . . . . . . . . . . . 19--44
Stavros A. Zenios and
John M. Mulvey A distributed algorithm for convex
network optimization problems . . . . . 45--56
M. Zubair and
B. B. Madan Efficient systolic algorithm for finding
bridges in a connected graph . . . . . . 57--61
David L. Cochrane and
Donald G. Truhlar Strategies and performance norms for
efficient utilization of vector pipeline
computers as illustrated by the
classical mechanical simulation of
rotationally inelastic collisions . . . 63--85
Zahari Zlatev Treatment of some mathematical models
describing long-range transport of air
pollutants on vector processors . . . . 87--98
Clive Temperton Implementation of a prime factor FFT
algorithm on CRAY-1 . . . . . . . . . . 99--108
Charles H. Romine and
James M. Ortega Parallel solution of triangular systems
of equations . . . . . . . . . . . . . . 109--114
P. Carnevali Timing results of some internal sorting
algorithms on the IBM 3090 . . . . . . . 115--117
D. J. Evans and
K. Margaritis Optical processing of banded matrix
algorithms using outer product concepts 119--125
F. A. Lootsma and
K. M. Ragsdell State-of-the-art in parallel nonlinear
optimization . . . . . . . . . . . . . . 133--155
D. J. Silvester Optimising finite element matrix
calculations using the general technique
of element vectorisation . . . . . . . . 157--164
Rami Melhem Parallel solution of linear systems with
striped sparse matrices . . . . . . . . 165--184
Willi Schönauer and
Eric Schnepf FIDISOL: a ``black box'' solver for
partial differential equations . . . . . 185--193
Yau Shu Wong Solving large elliptic difference
equations on CYBER 205 . . . . . . . . . 195--207
T. Asano and
H. Umeo Systolic algorithms for computing the
visibility polygon and triangulation of
a polygonal region . . . . . . . . . . . 209--216
Mike Ashworth and
Andrew G. Lyne A segmented FFT algorithm for vector
computers . . . . . . . . . . . . . . . 217--224
R. M. Chamberlain Gray codes, fast Fourier transforms and
hypercubes . . . . . . . . . . . . . . . 225--233
E. D. Brooks, III The shared memory hypercube . . . . . . 235--245
C. R. Askew and
D. B. Carpenter and
J. T. Chalker and
A. J. G. Hey and
M. Moore and
D. A. Nicole and
D. J. Pritchard Monte Carlo simulation on transputer
arrays . . . . . . . . . . . . . . . . . 247--258
M. Hatzopoulos and
D. J. Evans Comments on the paper: ``A short proof
for the existence of the
WZ-factorisation'' [Parallel Comput. \bf
4 (1987), no. 2, 229--232, MR
88j:65064a] by M. Kaps and M. Schlegl 259--259
William L. Briggs and
Thomas Turnbull Fast Poisson solvers for MIMD computers 265--274
M. Cosnard and
M. Marrakchi and
Y. Robert and
D. Trystram Parallel Gaussian elimination on an MIMD
computer . . . . . . . . . . . . . . . . 275--296
H. Y. Chang and
S. Utku and
M. Salama and
D. Rapp A parallel Householder
tridiagonalization stratagem using
scattered square decomposition . . . . . 297--311
D. J. Evans and
Jian Ping Shao and
Li Shan Kang The convergence factor of the parallel
Schwarz overrelaxation method for linear
systems . . . . . . . . . . . . . . . . 313--324
B. W. Glickfeld and
R. A. Overbeek Geometric specification of scheduling
constraints: a simplified approach to
multiprocessing . . . . . . . . . . . . 325--337
E. D. Brooks, III The indirect $k$-ary $n$-cube for a
vector processing environment . . . . . 339--348
M. J. Quinn Parallel sorting algorithms for tightly
coupled multiprocessors . . . . . . . . 349--357
Robert A. Wagner and
Merrell L. Patrick A sparse matrix algorithm on the Boolean
vector machine . . . . . . . . . . . . . 359--371
U. Harms and
H. Luttermann Experiences in benchmarking the three
supercomputers CRAY-1M, CRAY-X/MP,
FUJITSU VP-200 compared with the CYBER
76 . . . . . . . . . . . . . . . . . . . 373--382
S. C. Kak A two-layered mesh array for matrix
multiplication . . . . . . . . . . . . . 383--385
D. A. Poplawski Mapping rings and grids onto the FPS
T-Series hypercube . . . . . . . . . . . 1--10
F. Darema and
D. A. George and
V. A. Norton and
G. F. Pfister A single-program-multiple-data
computational model for EPEX/FORTRAN . . 11--24
Manfred Kunde and
Hans-Werner Lang and
Manfred Schimmler and
Hartmut Schmeck and
Heiko Schröder The instruction systolic array and its
relation to other models of parallel
computers . . . . . . . . . . . . . . . 25--39
F. C. Kampe and
T. M. Nguyen Performance comparison of the Cray-2 and
Cray X-MP on a class of seismic data
processing algorithms . . . . . . . . . 41--53
B. Steffen Implementation of a resonant cavity
package on MIMD computers . . . . . . . 55--63
H. Mühlenbein and
M. Gorges-Schleuter and
O. Kramer Evolution algorithms in combinatorial
optimization . . . . . . . . . . . . . . 65--85
Peter H. Michielse and
Henk A. van der Vorst Data transport in Wang's partition
method . . . . . . . . . . . . . . . . . 87--95
Mark Goldmann Vectorisation of the multiple shooting
method for the nonlinear boundary value
problem in ordinary differential
equations . . . . . . . . . . . . . . . 97--110
D. J. Evans and
M. P. Bekakos The solution of linear systems by the
QIF algorithm on a wavefront array
processor . . . . . . . . . . . . . . . 111--130
J. M. Ortega The $ijk$ forms of factorization
methods. I. Vector computers . . . . . . 135--147
J. M. Ortega and
C. H. Romine The $ijk$ forms of factorization
methods. II. Parallel systems . . . . . 149--162
J. T. Feo An analysis of the computational and
parallel complexity of the Livermore
loops . . . . . . . . . . . . . . . . . 163--185
R. G. Babb and
L. Storc and
R. Hiromoto Developing a parallel Monte Carlo
transport algorithm using large-grain
data flow . . . . . . . . . . . . . . . 187--198
Earl Zmijewski and
John R. Gilbert A parallel algorithm for sparse symbolic
Cholesky factorization on a
multiprocessor . . . . . . . . . . . . . 199--210
J. A. Kapenga and
E. de Doncker A parallelization of adaptive task
partitioning algorithms . . . . . . . . 211--225
J. L. Gaudiot and
J. I. Pi and
M. L. Campbell Program graph allocation in distributed
multicomputers . . . . . . . . . . . . . 227--247
I. Stojmenovic and
M. Miyakawa An optimal parallel algorithm for
solving the maximal elements problem in
the plane . . . . . . . . . . . . . . . 249--251
Yves Robert and
Denis Trystram Comments on scheduling parallel
iterative methods on multiprocessor
systems . . . . . . . . . . . . . . . . 253--255
Anonymous 2nd International SUPRENUM Colloquium ??
K. Solchenbach and
U. Trottenberg SUPRENUM: system essentials and grid
applications . . . . . . . . . . . . . . 265--281
W. K. Giloi SUPRENUM: a trendsetter in modern
supercomputer development . . . . . . . 283--296
K. Peinze The SUPRENUM preprototype: status and
experiences . . . . . . . . . . . . . . 297--313
H. Kammer The SUPRENUM vector floating-point unit 315--323
W. Schroder PEACE: the distributed SUPRENUM
operating system . . . . . . . . . . . . 325--333
G. Schaffler Connecting PEACE to UNIX . . . . . . . . 335--339
K. Solchenbach Grid applications on distributed memory
architectures: implementation and
evaluation . . . . . . . . . . . . . . . 341--356
O. Kolp and
H. Mierendorff Performance estimations for SUPRENUM
systems . . . . . . . . . . . . . . . . 357--366
M. D. Ercegovac Heterogeneity in supercomputer
architectures . . . . . . . . . . . . . 367--372
F. Hossfeld Vector-supercomputers . . . . . . . . . 373--385
U. Kremer and
H.-J. Bast and
M. Gerndt and
H. P. Zima Advanced tools and techniques for
automatic parallelization . . . . . . . 387--393
L. Lehmann and
F. Hopfl A model of distributed recovery for the
SUPRENUM multiprocessor . . . . . . . . 395--401
B. Franke and
R. Harneit and
A. Kern and
H. C. Zeidler The pipeline bus: an interconnection
network for multiprocessor systems . . . 403--412
W. Ronsch and
H. Strauss A linear algebra package for a local
memory multiprocessor: problems,
proposals and solutions . . . . . . . . 413--418
I. Gutheil SUPRENUM software for the symmetric
eigenvalue problem . . . . . . . . . . . 419--424
U. Herzog Performance evaluation principles for
vector- and multiprocessor systems . . . 425--438
R. Williams Free-Lagrange hydrodynamics with a
distributed-memory parallel processor 439--443
D. Seldner and
M. Alef and
T. Westermann and
E. Halter Parallel particle simulation in high
voltage diodes (algorithms and concepts
for implementation on SUPRENUM) . . . . 445--449
H. Capdevila Solution of $2$-D Euler equations with a
parallel code . . . . . . . . . . . . . 451--460
J. Linden and
B. Steckel and
K. Stuben Parallel multigrid solution of the
Navier--Stokes equations on general 2D
domains . . . . . . . . . . . . . . . . 461--475
O. A. McBryan New architectures: performance
highlights and new algorithms . . . . . 477--499
Anonymous International Conference on Vector and
Parallel Processors in Computational
Science III . . . . . . . . . . . . . . ??
A. Kashko and
H. Buxton and
B. F. Buxton and
D. A. Castelow Parallel matching and reconstruction
algorithms in computer vision . . . . . 3--17
C. Jesshope Transputers and switches as objects in
OCCAM . . . . . . . . . . . . . . . . . 19--30
H. F. Jordan Programming language concepts for
multiprocessors . . . . . . . . . . . . 31--40
J. J. Dongarra and
D. C. Sorensen and
K. Connolly and
J. Patterson Programming methodology and performance
issues for advanced computer
architectures . . . . . . . . . . . . . 41--58
P. C. Treleaven Parallel architecture overview . . . . . 59--70
B. M. Forrest and
D. Roweth and
N. Stroud and
D. J. Wallace and
G. V. Wilson Neural network models . . . . . . . . . 71--83
R. G. Babb, II and
L. Storc and
P. G. Eltgroth Parallelization schemes for $2$-D
hydrodynamics codes using the
independent time step method . . . . . . 85--89
K. Miura and
R. G. Babb, II Tradeoffs in granularity and
parallelization for a Monte Carlo shower
simulation code . . . . . . . . . . . . 91--100
C. F. Baillie Comparing shared and distributed memory
computers . . . . . . . . . . . . . . . 101--110
Thomas Brandes Determination of dependencies in a
knowledge-based parallelization tool . . 111--119
G. Carver A spectral meteorological method on the
ICL DAP . . . . . . . . . . . . . . . . 121--126
M. Clint and
D. Roantree and
A. Stewart Towards the construction of an
eigenvalue engine . . . . . . . . . . . 127--132
A. Corona and
C. Martini and
M. Morando and
S. Ridella and
C. Rolando Solving linear equation systems on
vector computers with maximum efficiency 133--139
D. Crookes and
P. J. Morrow and
P. Milligan and
P. L. Kilpatrick and
N. S. Scott An array processing language for
transputer networks . . . . . . . . . . 141--148
D. Dent and
M. O'Neill Microtasking as a complement to
macrotasking . . . . . . . . . . . . . . 149--154
Peter G. Eltgroth and
Mark K. Seager The sub-implicit method: new
multiprocessor algorithms for old
implicit codes . . . . . . . . . . . . . 155--163
R. Francis and
I. Mathieson Synchronised execution on shared memory
multiprocessors . . . . . . . . . . . . 165--175
R. Gurke The approximate solution of the
Euclidean traveling salesman problem on
a CRAY X-MP . . . . . . . . . . . . . . 177--183
A. Inoue and
A. Maeda The architecture of a multi-vector
processor system, VVP . . . . . . . . . 185--193
T. Legendi and
E. Katona and
J. Toth and
A. Zsoter Megacell machine . . . . . . . . . . . . 195--199
H. Mühlenbein and
O. Kramer and
F. Limburger and
M. Mevenkamp and
S. Streitz MUPPET: a programming environment for
message-based multiprocessors . . . . . 201--221
W. E. Nagel Using multiple CPUs for problem solving:
experiences in multitasking on the CRAY
X-MP/48 . . . . . . . . . . . . . . . . 223--230
S. Katz and
W. A. Ray and
G. Walder Multiprocessor software for the
CYBERPLUS high performance system . . . 231--244
J. B. G. Roberts and
J. G. Harp and
B. C. Merrifield and
K. J. Palmer and
P. Simpson and
J. S. Ward and
H. C. Webber Evaluating parallel processors for
real-time applications . . . . . . . . . 245--254
D. F. Snelling and
G.-R. Hoffmann A comparative study of libraries for
parallel processing . . . . . . . . . . 255--266
D. A. Tanqueray and
D. F. Snelling A distributed self-scheduler for
partially ordered tasks . . . . . . . . 267--273
R. Wait Partitioning and preconditioning of
finite element matrices on the DAP . . . 275--284
H. J. Wasserman and
M. L. Simmons and
O. M. Lubeck The performance of minisupercomputers:
Alliant FX/8, Convex C-1, and SCS-40 . . 285--293
A. T. Brint and
V. J. Gillet and
M. F. Lynch and
P. Willett and
G. A. Manson and
G. A. Wilson Chemical graph matching using transputer
networks . . . . . . . . . . . . . . . . 295--300
Z. Zlatev and
Phuong Vu and
J. Wasniewski and
K. Schaumburg Computations with symmetric, positive
definite and band matrices on a parallel
vector processor . . . . . . . . . . . . 301--312
J. Berntsen and
T. O. Espelid A parallel global adaptive quadrature
algorithm for hypercubes . . . . . . . . 313--323
R. Wait and
N. G. Brown Overlapping block methods for solving
tridiagonal systems on transputer arrays 325--333
A. J. Davies The boundary element method on the ICL
DAP . . . . . . . . . . . . . . . . . . 335--343
J. J. Du Croz and
P. J. D. Mayes and
J. Wasniewski and
S. Wilson Applications of Level 2 BLAS in the NAG
library . . . . . . . . . . . . . . . . 345--350
C. H. Lai and
H. M. Liddell Finite elements using long vectors of
the DAP . . . . . . . . . . . . . . . . 351--361
A. McKerrell and
L. M. Delves Monte Carlo simulation of neutron
diffusion on SIMD architectures . . . . 363--370
R. Reuter Solving tridiagonal systems of linear
equations on the IBM 3090 VF . . . . . . 371--376
G. Radicati and
Y. Robert and
P. Sguazzero Dense linear systems FORTRAN solvers on
the IBM 3090 vector multiprocessor . . . 377--384
C. Froese Fischer and
N. S. Scott and
J. Yoo Multitasking the calculation of angular
integrals on the CRAY-2 and CRAY X-MP 385--390
H. Finnemann and
J. Brehm and
E. Michel and
J. Volkert Solution of the neutron diffusion
equation through multigrid methods
implemented on a memory-coupled
25-processor system . . . . . . . . . . 391--398
C. A. Pogue and
E. M. Rasmussen and
P. Willett Searching and clustering of databases
using the ICL distributed array
processor . . . . . . . . . . . . . . . 399--407
D. F. Snelling Standard FORTRAN 77 as a parallel
language . . . . . . . . . . . . . . . . 409--414
O. A. McBryan The Connection Machine: PDE solution on
65536 processors . . . . . . . . . . . . 1--24
O. Brewer and
J. Dongarra and
D. Sorensen Tools to aid in the analysis of memory
access patterns for FORTRAN programs . . 25--35
O. M. Lubeck and
V. Faber Modeling the performance of hypercubes:
a case study using the particle-in-cell
application . . . . . . . . . . . . . . 37--52
T. Hoshino and
R. Hiromoto and
S. Sekiguchi and
S. Majima Mapping schemes of the particle-in-cell
method implemented on the PAX computer 53--75
Dieter Müller-Wichards Performance estimates for applications:
an algebraic framework . . . . . . . . . 77--106
W. Gentzsch and
F. Szelenyi and
V. Zecca Use of parallel FORTRAN for engineering
problems on the IBM 3090 vector
multiprocessor . . . . . . . . . . . . . 107--115
Emile H. L. Aarts and
Jan H. M. Korst Computations in massively parallel
networks based on the Boltzmann machine:
a review . . . . . . . . . . . . . . . . 129--145
J. K. Annot A deadlock free and starvation free
network of packet switching
communication processors . . . . . . . . 147--162
H. P. Barendregt and
M. C. J. D. Van Eekelen and
M. J. Plasmeijer and
J. R. W. Glauert and
J. R. Kennaway and
M. R. Sleep LEAN: an intermediate language based on
graph rewriting . . . . . . . . . . . . 163--177
D. I. Bevan An efficient reference counting solution
to the distributed garbage collection
problem . . . . . . . . . . . . . . . . 179--192
W. Damm and
G. Dohmen Specifying distributed computer
architectures in AADL . . . . . . . . . 193--211
O. Krämer and
H. Mühlenbein Mapping strategies in message-based
multiprocessor systems . . . . . . . . . 213--225
A. R. Martin and
J. V. Tucker The concurrent assignment representation
of synchronous systems . . . . . . . . . 227--256
P. H. Welch Emulating digital logic using transputer
networks (very high
parallelism=simplicity=performance) . . 257--272
R. Hockney Synchronization and communication
overheads on the LCAP multiple FPS-164
computer system . . . . . . . . . . . . 279--290
Chandrika Kamath and
Ahmed Sameh A projection method for solving
nonsymmetric linear systems on
multiprocessors . . . . . . . . . . . . 291--312
Robert A. Wagner Parallel solution of arbitrarily sparse
linear systems . . . . . . . . . . . . . 313--331
Loyce M. Adams and
Elizabeth G. Ong Additive polynomial preconditioners for
parallel computers . . . . . . . . . . . 333--345
Israel Gottlieb The partitioning of QSDF computation
graphs . . . . . . . . . . . . . . . . . 347--358
Ilio Galligani and
Valeria Ruggiero Solving large systems of linear ordinary
differential equations on a vector
computer . . . . . . . . . . . . . . . . 359--365
M. Bessenrodt-Weberpals and
H. Weberpals A fast vector algorithm for solving
tridiagonal linear equations . . . . . . 367--372
D. J. Evans and
K. Margaritis and
M. P. Bekakos Systolic and holographic pyramidical
soft-systolic designs for successive
matrix powers . . . . . . . . . . . . . 373--384
M. Cosnard and
A. G. Ferreira and
H. Herbelin The two list algorithm for the knapsack
problem on a FPS T20 . . . . . . . . . . 385--388
A. Greenbaum Synchronization costs on multiprocessors 3--14
Th. Ruppelt and
G. Wirtz Automatic transformation of high-level
object-oriented specifications into
parallel programs . . . . . . . . . . . 15--28
C. McCrosky Realizing the parallelism of array-based
computation . . . . . . . . . . . . . . 29--43
Y. Wolfstahl Mapping parallel programs to
multiprocessors: a dynamic approach . . 45--50
J. Gary and
L. Fosdick An optimizing precompiler for
finite-difference computations on a
vector computer . . . . . . . . . . . . 51--64
J.-F. Hake and
W. Homberg Linear algebra software on a vector
computer . . . . . . . . . . . . . . . . 65--81
Aydin Üresin and
Michel Dubois Sufficient conditions for the
convergence of asynchronous iterations 83--92
V. Eijkhout and
P. Vassilevski Positive definiteness aspects of
vectorizable preconditioners . . . . . . 93--100
Susumu Horiguchi and
Willard L. Miranker A parallel algorithm for finding the
maximum value . . . . . . . . . . . . . 101--108
Kam-Hoi Cheng and
Sartaj Sahni A new VLSI system for adaptive recursive
filtering . . . . . . . . . . . . . . . 109--115
Michel Cosnard and
Maurice Tchuente and
Bernard Tourancheau Systolic Gauss--Jordan elimination for
dense linear systems . . . . . . . . . . 117--122
H. M. Amman Nonlinear control simulation on a vector
machine . . . . . . . . . . . . . . . . 123--127
Nigel Dodd Graph matching by stochastic
optimisation applied to the
implementation of multi layer
perceptrons on transputer networks . . . 135--142
E. Gallopoulos and
Y. Saad A parallel block cyclic reduction
algorithm for the fast solution of
elliptic equations . . . . . . . . . . . 143--159
Wolfgang Pelz and
Layne T. Watson Message length effects for solving
polynomial systems on a hypercube . . . 161--176
Michael R. Leuze Independent set orderings for parallel
matrix factorization by Gaussian
elimination . . . . . . . . . . . . . . 177--191
D. J. Evans and
A. M. S. Rahma The numerical solution of Fredholm
integral equations on parallel computers 193--205
G. M. Megson and
D. J. Evans Algorithmic fault tolerance for matrix
operations on triangular arrays . . . . 207--219
S. Storoy Holistic algorithms: a paradigm for
multiprocessor programming . . . . . . . 221--229
Ferng-Ching Lin and
R. Charng Pin reduction through variable
duplications and substitutions in a data
dependence graph . . . . . . . . . . . . 231--238
John M. Conroy A note on the parallel Cholesky
factorization of wide banded matrices 239--246
M. Cosnard and
Y. Robert and
B. Tourancheau Evaluating speedups on distributed
memory architectures . . . . . . . . . . 247--253
J. J. Hack On the promise of general-purpose
parallel computing . . . . . . . . . . . 261--275
R. W. Hockney and
I. J. Curington $f_{1/2}$: a parameter to characterize
memory and communication bottlenecks . . 277--286
Alan George and
Joseph W. H. Liu and
Esmond Ng Communication results for parallel
sparse Cholesky factorization on a
hypercube . . . . . . . . . . . . . . . 287--298
R. M. Hyatt and
B. W. Suter and
H. L. Nelson A parallel alpha/beta tree searching
algorithm . . . . . . . . . . . . . . . 299--308
Tao Li Parallel implementation of rule-based
expert systems for interactive
applications . . . . . . . . . . . . . . 309--318
M. Malek and
E. Opper The cylindrical banyan multicomputer: a
reconfigurable systolic architecture . . 319--327
C. Holt and
A. Stewart A parallel thinning algorithm with fine
grain subtasking . . . . . . . . . . . . 329--334
James R. A. Allwright and
D. B. Carpenter A distributed implementation of
simulated annealing for the travelling
salesman problem . . . . . . . . . . . . 335--338
G. de Biase and
P. Ciucci and
M. Cottone Vectorized algorithms for astronomical
image processing . . . . . . . . . . . . 339--346
Jong-Chuang Tsay and
Yodung-Chang Hou Generating function and equivalent
transformation for systolic arrays . . . 347--356
M. P. Bekakos and
D. J. Evans Relative performance comparisons for the
group explicit class of methods on MIMD,
SIMD and pipelined vector computers . . 357--364
K. Ohmaki and
S. Tomura and
K. Inoue and
T. Ito and
K. Ito and
K. Torii TERM: a parallel executable graph
reduction machine for equational
language . . . . . . . . . . . . . . . . 1--16
M. E. Henderson and
W. L. Miranker Synergy in parallel algorithms . . . . . 17--35
A. T. Chronopoulos and
C. W. Gear On the efficient implementation of
preconditioned $s$-step conjugate
gradient methods on multiprocessors with
memory hierarchy . . . . . . . . . . . . 37--53
Eleanor Chu and
Alan George $QR$ factorization of a dense matrix on
a shared-memory multiprocessor . . . . . 55--71
Joseph W. H. Liu Reordering sparse matrices for parallel
elimination . . . . . . . . . . . . . . 73--91
David A. Carlson and
Binay Sugla Adapting shuffle-exchange like parallel
processing organizations to work as
systolic arrays . . . . . . . . . . . . 93--106
C. Temperton Further measurements of
$(r_\infty,n_{1/2})$ on the CRAY-1 and
CRAY X-MP . . . . . . . . . . . . . . . 107--111
C. Lecot An algorithm for generating low
discrepancy sequences on vector
computers . . . . . . . . . . . . . . . 113--116
J. Moscinski and
Z. A. Rycerz and
P. W. M. Jacobs Timing results of some internal sorting
algorithms on the ETA 10-P . . . . . . . 117--119
J. M. Troya and
M. Ortega A study of parallel branch-and-bound
algorithms with best-bound-first search 121--126
Youcef Saad and
Martin H. Schultz Data communication in parallel
architectures . . . . . . . . . . . . . 131--150
A. M. Frieze and
J. Yadegar and
S. El-Horbaty and
D. Parkinson Algorithms for assignment problems on an
array processor . . . . . . . . . . . . 151--162
E. Adamides and
Ph. Tsalides and
A. Thanailakis Synchronization of asynchronous
concurrent processes using cellular
automata . . . . . . . . . . . . . . . . 163--169
Christian H. Bischof Computing the singular value
decomposition on a distributed system of
vector processors . . . . . . . . . . . 171--186
J. P. Bonomo and
W. R. Dyksen Pipelined iterative methods for shared
memory machines . . . . . . . . . . . . 187--199
Gita Alaghband Parallel pivoting combined with parallel
reduction and fill-in control . . . . . 201--221
G. Radicati di Brozolo and
Y. Robert Parallel conjugate gradient-like
algorithms for solving sparse
nonsymmetric linear systems on a vector
multiprocessor . . . . . . . . . . . . . 223--239
Stanley C. Eisenstat Comments on scheduling parallel
iterative methods on multiprocessor
systems. II . . . . . . . . . . . . . . 241--244
D. J. Evans and
B. B. Sanugi A parallel Runge--Kutta integration
method . . . . . . . . . . . . . . . . . 245--251
David E. Womble and
Richard C. Allen, Jr. and
Lorraine S. Baca Invariant imbedding and the method of
lines for parallel computers . . . . . . 263--273
Xiaobo Li and
Zhi Xi Fang Parallel clustering algorithms . . . . . 275--290
E. L. Zapata and
F. F. Rivera and
O. G. Plata and
M. A. Ismail Parallel fuzzy clustering on fixed size
hypercube SIMD computers . . . . . . . . 291--303
D. W. Lozier and
R. G. Rehm Some performance comparisons for a fluid
dynamics code . . . . . . . . . . . . . 305--320
G. S. Pawley and
C. F. Baillie and
E. Tenenbaum and
W. Celmaster The BBN Butterfly used to simulate a
molecular liquid . . . . . . . . . . . . 321--329
J. Glasgow and
M. Jenkins and
H. Meijer and
C. McCrosky Expressing parallel algorithms in Nial 331--347
V. K. Murthy and
H. Schröder Systolic arrays for parallel matrix
$g$-inversion and finding Petri net
invariants . . . . . . . . . . . . . . . 349--359
W. Ewinger and
O. Haan and
E. Haupenthal and
C. Siemers Modelling and measurement of memory
access in SIEMENS VP supercomputers . . 361--365
I.-C. Chang Jou Linear rotation based algorithm and
systolic architecture for solving linear
system equations . . . . . . . . . . . . 367--379
Jang-Ping Sheu and
Chun-lien Wu and
Gen-Huey Chen Selection of the first k largest
processes in hypercubes . . . . . . . . 381--384
D. J. Evans A systolic design for the Aitken
extrapolation formula . . . . . . . . . 385--388
Horace P. Flatt and
Ken Kennedy Performance of parallel processors . . . 1--20
L. Brochard Efficiency of some parallel numerical
algorithms on distributed systems . . . 21--44
R. S. Barr and
R. V. Helgason and
J. L. Kennington Minimal spanning trees: an empirical
investigation of parallel algorithms . . 45--52
Kam Hoi Cheng and
S. Sahni VLSI architectures for back substitution 53--69
Hussein M. Alnuweiri and
V. K. Prasanna Kumar An efficient VLSI architecture with
applications to geometric problems . . . 71--93
E. Babolian and
L. M. Delves Parallel solution of Fredholm integral
equations . . . . . . . . . . . . . . . 95--106
Manfred Schimmler and
Heiko Schröder A simple systolic method to find all
bridges of an undirected graph . . . . . 107--111
H. Bohr and
K. S. Jensen and
T. Petersen and
B. Rathjen and
E. Mosekilde and
N.-H. Holstein-Rathlou Parallel computer simulation of
nearest-neighbour interaction in a
system of nephrons . . . . . . . . . . . 113--120
David J. Evans and
Ivan Stojmenović On parallel computation of Voronoĭ
diagrams . . . . . . . . . . . . . . . . 121--125
L. Hart and
S. McCormick Asynchronous multilevel adaptive methods
for solving partial differential
equations on multiprocessors: basic
ideas . . . . . . . . . . . . . . . . . 131--144
S. McCormick and
D. Quinlan Asynchronous multilevel adaptive methods
for solving partial differential
equations on multiprocessors:
performance results . . . . . . . . . . 145--156
N. S. Arenstorf and
H. F. Jordan Comparing barrier algorithms . . . . . . 157--170
Theodore S. Papatheodorou and
Yiannis G. Saridakis Parallel algorithms and architectures
for multisplitting iterative methods . . 171--182
Mounir Marrakchi and
Yves Robert Optimal algorithms for Gaussian
elimination on an MIMD computer . . . . 183--194
Concettina Guerra and
Rami Melhem Synthesis of systolic algorithm design 195--207
C. F. Baillie and
G. S. Pawley A comparison of the CM with the DAP for
lattice gauge theory . . . . . . . . . . 209--220
Frank Dehne and
Anne-Lise Hassenklover and
Jörg-Rüdiger Sack Computing the configuration space for a
robot on a mesh-of-processors . . . . . 221--231
Hans-Jürgen Hotop New Kalman filter algorithms based on
orthogonal transformations for serial
and vector computers . . . . . . . . . . 233--247
A. Benaini and
Y. Robert An even faster systolic array for matrix
multiplication . . . . . . . . . . . . . 249--254
F. Hossfeld and
R. Knecht and
W. E. Nagel Multitasking: experience with
applications on a CRAY X-MP . . . . . . 259--283
Hiroshi Umeo A design of time-optimum and
register-number-minimum systolic
convolvers . . . . . . . . . . . . . . . 285--299
N. Petkov and
F. Sloboda A bit-level systolic array for digital
contour smoothing . . . . . . . . . . . 301--313
E. Eskow and
R. B. Schnabel Mathematical modeling of a parallel
global optimization algorithm . . . . . 315--325
P. Fernandes and
P. Girdinio A new storage scheme for an efficient
implementation of the sparse
matrix-vector product . . . . . . . . . 327--333
J. Berntsen Communication efficient matrix
multiplication on hypercubes . . . . . . 335--342
I. Gladwell and
R. I. Hay Vector- and parallelisation of ODE BVP
codes . . . . . . . . . . . . . . . . . 343--350
T. L. Freeman Calculating polynomial zeros on a local
memory parallel computer . . . . . . . . 351--358
George T. Papaspyropoulos and
D. G. Maritsas Parallel discrete event simulation with
SIMULA . . . . . . . . . . . . . . . . . 359--373
Tsung Chuan Huang and
Jhing-Fa Wang and
Chu Sing Yang and
Jau-Yien Lee Graph theoretic characterization and
reliability of the generalized Boolean
$n$-cube network . . . . . . . . . . . . 375--385
P. Sadayappan and
F. Ercal and
J. Ramanujam Cluster partitioning approaches to
mapping parallel programs onto a
hypercube . . . . . . . . . . . . . . . 1--16
M. R. Exum and
J. L. Gaudiot Network design and allocation
considerations in the Hughes data-flow
machine . . . . . . . . . . . . . . . . 17--34
P. Carnevali and
M. Kindelan A simplified model to predict the
performance of FORTRAN vector loops on
the IBM 3090/VF . . . . . . . . . . . . 35--46
H. Weberpals Architectural approach to the IBM 3090E
vector performance . . . . . . . . . . . 47--59
M. Zubair An optimal speedup algorithm for the
measure problem . . . . . . . . . . . . 61--71
Ronald J. Leach and
O. Michael Atogi and
Razeyah R. Stephen The actual complexity of parallel
evaluation of low degree polynomials . . 73--83
G. M. Megson Rank annihilation on a ring of
processors . . . . . . . . . . . . . . . 85--94
J. Zerovnik A parallel variant of a heuristical
algorithm for graph colouring . . . . . 95--100
Herbert Fischer Automatic differentiation: parallel
computation of function, gradient, and
Hessian matrix . . . . . . . . . . . . . 101--110
Gen-Huey Chen and
Maw-Sheng Chern and
Jin-Hwang Jang Pipeline architectures for dynamic
programming algorithms . . . . . . . . . 111--117
J. C. Tsay and
C. J. Lin A systolic design for generating
combinations in lexicographic order . . 119--125
Z. C. Shih and
R. C. T. Lee and
S. N. Yang A parallel algorithm for finding
congruent regions . . . . . . . . . . . 135--142
Sajal K. Das and
Narsingh Deo and
Sushil Prasad Parallel graph algorithms for hypercube
computers . . . . . . . . . . . . . . . 143--158
H. Eckardt System performance and execution of
scientific algorithms on the parallel
computer Parawell . . . . . . . . . . . 159--173
R. R. Oldehoeft and
J. R. McGraw Mixed applicative and imperative
programs . . . . . . . . . . . . . . . . 175--191
A. De Matteis and
S. Pagnutti A class of parallel random number
generators . . . . . . . . . . . . . . . 193--198
G. A. Geist and
G. J. Davis Finding eigenvalues and eigenvectors of
unsymmetric matrices using a
distributed-memory multiprocessor . . . 199--209
M. K. Stojčev and
E. I. Milovanović and
I. Ž. Milovanović An algorithm for multiplication of
concatenated matrices . . . . . . . . . 211--223
W. E. Nagel Exploiting autotasking on a CRAY Y-MP:
an improved software interface to
multitasking . . . . . . . . . . . . . . 225--233
Gen Huey Chen and
Hong Fa Ho and
Shieu Hong Lin and
Jang-Ping Sheu Data mapping of linear programming on
fixed-size hypercubes . . . . . . . . . 235--243
Jang-Ping Sheu and
Nan-Ling Kuo and
Gen-Huey Chen Graph search algorithms and maximum
bipartite matching algorithm on the
hypercube network model . . . . . . . . 245--251
Chii Huah Shyu A parallel algorithm for finding a
maximum weight clique of an interval
graph . . . . . . . . . . . . . . . . . 253--256
J. Dantas De Melo and
J. L. Calvet and
J. M. Garcia Vectorization and multitasking of
dynamic programming in control:
experiments on a CRAY-2 . . . . . . . . 261--269
R. Morandi and
F. Sgallari Parallel algorithms for the iterative
solution of sparse least-squares
problems . . . . . . . . . . . . . . . . 271--280
J. S. Weston and
M. Clint Two algorithms for the parallel
computation of eigenvalues and
eigenvectors of large symmetric matrices
using the ICL DAP . . . . . . . . . . . 281--288
Hyoung Joong Kim and
Jang Gyu Lee A parallel algorithm solving a
tridiagonal Toeplitz linear system . . . 289--294
S. J. Shyu and
R. C. T. Lee Solving the set cover problem on a
supercomputer . . . . . . . . . . . . . 295--300
E. V. Krishnamurthy and
M. Kunde and
M. Schimmler and
H. Schröder Systolic algorithm for tensor products
of matrices: implementation and
applications . . . . . . . . . . . . . . 301--308
G. R. Gao Exploiting fine-grain parallelism on
dataflow architectures . . . . . . . . . 309--320
R. Doallo and
E. L. Zapata A VLSI Systolic Architecture for Solving
DBT-Transformed Fuzzy Clustering
Problems of Arbitrary Size . . . . . . . 321--335
P. Lenders and
H. Schröder A programmable systolic device for image
processing based on mathematical
morphology . . . . . . . . . . . . . . . 337--344
D. W. Heermann and
A. N. Burkitt Parallelization of the Ising model and
its performance evaluation . . . . . . . 345--357
P. Michielse Parallel adaptive reservoir simulation 359--368
R. M. R. Page and
S. F. Reddaway The DAP as a filestore search engine . . 369--376
Pierre Fraigniaud and
Serge Miguet and
Yves Robert Scattering on a ring of processors . . . 377--383
Pelle Olsson and
S. Lennart Johnsson A dataparallel implementation of an
explicit method for the
three-dimensional compressible
Navier--Stokes equations . . . . . . . . 1--30
Arno Krechel and
Hans-Joachim Plum and
Klaus Stüben Parallelization and vectorization
aspects of the solution of tridiagonal
linear systems . . . . . . . . . . . . . 31--49
F. F. Rivera and
R. Doallo and
J. D. Bruguera and
E. L. Zapata and
R. Peskin Gaussian elimination with pivoting on
hypercubes . . . . . . . . . . . . . . . 51--60
U. Block and
A. Frommer and
G. Mayer Block colouring schemes for the SOR
method on local memory parallel
computers . . . . . . . . . . . . . . . 61--75
D. J. Evans and
K. Margaritis Systolic designs for
eigenvalue-eigenvector computations
using matrix powers . . . . . . . . . . 77--87
Jau-Hsiung Huang and
Leonard Kleinrock Optimal parallel merging and sorting
algorithms using $\sqrt {N}$ processors
without memory contention . . . . . . . 89--97
W. Hasselbring CELIP: a Cellular Language for Image
Processing . . . . . . . . . . . . . . . 99--109
D. J. Evans A parallel sorting-merging algorithm for
tightly coupled multiprocessors . . . . 111--121
Ramesh Natarajan A parallel algorithm for the generalized
symmetric eigenvalue problem on a hybrid
multiprocessor . . . . . . . . . . . . . 129--150
John R. Gilbert and
Hjálmtýr Hafsteinsson Parallel symbolic factorization of
sparse linear systems . . . . . . . . . 151--162
Sanjay V. Rajopadhye and
Richard M. Fujimoto Synthesizing systolic arrays from
recurrence equations . . . . . . . . . . 163--189
L. Brugnano and
M. Marrone Vectorization of some block
preconditioned conjugate gradient
methods . . . . . . . . . . . . . . . . 191--198
G. M. Megson A systolic helix for matrix
triangularisation with partial pivoting 199--206
A. de Matteis and
S. Pagnutti Long-range correlations in linear and
nonlinear random number generators . . . 207--210
J. Li and
A. Brass and
D. J. Ward and
B. Robson A study of parallel molecular dynamics
algorithms for $N$-body simulations on a
transputer system . . . . . . . . . . . 211--222
Basile Louka and
Maurice Tchuente Triangular matrix inversion on systolic
arrays . . . . . . . . . . . . . . . . . 223--228
T. Theoharis and
J. J. Modi Implementation of matrix multiplication
on the T-RACK . . . . . . . . . . . . . 229--233
Liwu Li Systolic computation with fault
diagnosis . . . . . . . . . . . . . . . 235--243
H. Mühlenbein Limitations of multi-layer perceptron
networks-steps towards genetic neural
networks . . . . . . . . . . . . . . . . 249--260
F. J. Smieja and
H. Mühlenbein The geometry of multi-layer perceptron
solutions . . . . . . . . . . . . . . . 261--275
J. Kindermann and
A. Linden Inversion of neural networks by gradient
descent . . . . . . . . . . . . . . . . 277--286
T. E. Lange Simulation of heterogeneous neural
networks on serial and parallel machines 287--303
A. Singer Implementations of artificial neural
networks on the Connection Machine . . . 305--315
Xiru Zhang and
M. McKenna and
J. P. Mesirov and
D. L. Waltz The backpropagation algorithm on grid
and hypercube architectures . . . . . . 317--327
M. Witbrock and
M. Zagha An implementation of backpropagation
learning on GF11, a large SIMD parallel
computer . . . . . . . . . . . . . . . . 329--346
D. Whitley and
T. Starkweather and
C. Bogart Genetic algorithms and neural networks:
optimizing connections and connectivity 347--361
M. F. da Mota Tenorio Topology synthesis networks:
self-organization of structure and
weight adjustment as a learning paradigm 363--380
K. Obermayer and
H. Ritter and
K. Schulten Large-scale simulations of
self-organizing neural networks on
parallel computers: application to
biological modelling . . . . . . . . . . 381--404
R. W. Kentridge Neural networks for learning in the real
world: representation, reinforcement and
dynamics . . . . . . . . . . . . . . . . 405--414
S. Knecht and
E. Laermann and
W. E. Nagel Parallelizing QCD with dynamical
fermions on a Cray multiprocessor system 3--20
Ibrahim N. Hajj and
Stig Skelboe A multilevel parallel solver for block
tridiagonal and banded linear systems 21--45
F. F. Van der Vlugt and
D. A. van Delft and
A. F. Bakker and
T. H. van der Meer The implementation of a $3$D
Navier--Stokes algorithm on an algorithm
oriented processor . . . . . . . . . . . 47--60
Amir Averbuch and
Eran Gabber and
Boaz Gordissky and
Yoav Medan A parallel FFT on an MIMD machine . . . 61--74
Michel Cosnard and
Pierre Fraigniaud Finding the roots of a polynomial on an
MIMD multicomputer . . . . . . . . . . . 75--85
I. Garcia and
J. J. Merelo and
J. D. Bruguera and
E. L. Zapata Parallel quadrant interlocking
factorization on hypercube computers . . 87--100
T. Z. Kalamboukis The symmetric tridiagonal eigenvalue
problem on a transputer network . . . . 101--106
J. Boreddy and
A. Paulraj On the performance of transputer arrays
for dense linear systems . . . . . . . . 107--117
L. Bomans and
D. Roose and
R. Hempel The Argonne/GMD macros in FORTRAN for
portable parallel programming and their
implementation on the Intel iPSC/2 . . . 119--132
Igor Ž. Milovanović and
Emina I. Milovanović and
Mile K. Stojčev An optimal algorithm for Gaussian
elimination of band matrices on an MIMD
computer . . . . . . . . . . . . . . . . 133--145
Michael Thuné A partitioning strategy for explicit
difference methods . . . . . . . . . . . 147--154
István Deák Uniform random number generators for
parallel computers . . . . . . . . . . . 155--164
Peter J. Varman and
Balakrishna R. Iyer and
Donald J. Haderle and
Stephen M. Dunn Parallel merging: algorithm and
implementation results . . . . . . . . . 165--177
Sajal K. Das and
Narsingh Deo and
Sushil Prasad Two minimum spanning forest algorithms
on fixed-size hypercube computers . . . 179--187
Ferng-Ching Lin and
Kuo Liang Chung A cost-optimal parallel tridiagonal
system solver . . . . . . . . . . . . . 189--199
F. Dehne and
A. G. Ferreira and
A. Rau-Chaplin Parallel branch and bound on
fine-grained hypercube multiprocessors 201--209
Abdelhamid Benaini and
Yves Robert Spacetime-minimal systolic arrays for
Gaussian elimination and the algebraic
path problem . . . . . . . . . . . . . . 211--225
K. Margaritis and
D. J. Evans Systolic designs for Bernoulli's method 227--240
Sung Kwon Kim Parallel algorithms for planar dominance
counting . . . . . . . . . . . . . . . . 241--246
D. Morris and
C. J. Theaker and
R. Phillips and
D. G. Evans An experimental parallel system (EPS) 247--259
Evgenij E. Tyrtyshnikov New approaches to deriving parallel
algorithms . . . . . . . . . . . . . . . 261--265
Chau-Jy Lin Parallel generation of permutations on
systolic arrays . . . . . . . . . . . . 267--276
S. R. Das and
N. H. Vaidya and
L. M. Patnaik A systolic algorithm for hidden surface
removal . . . . . . . . . . . . . . . . 277--289
Craig C. Douglas and
Willard L. Miranker Beyond massive parallelism: numerical
computation using associative tables . . 1--25
G. W. Stewart Communication and matrix computations on
large message passing systems . . . . . 27--40
Chien Min Wang and
Sheng-De Wang Structured partitioning of concurrent
programs for execution on
multiprocessors . . . . . . . . . . . . 41--57
Feng Gao and
Beresford N. Parlett A note on communication analysis of
parallel sparse Cholesky factorization
on a hypercube . . . . . . . . . . . . . 59--60
Qian Ping Gu and
Tadao Takaoka A sharper analysis of a parallel
algorithm for the all pairs shortest
path problem . . . . . . . . . . . . . . 61--67
Sathiamoorthy Manoharan and
Nigel P. Topham A general bound on schedule length for
independent tasks . . . . . . . . . . . 69--73
F. Dehne and
M. Gastaldo A note on the load balancing problem for
coarse grained hypercube dictionary
machines . . . . . . . . . . . . . . . . 75--79
D. J. Evans and
W. S. Yousif The implementation of the explicit block
iterative methods on the Balance 8000
parallel computer . . . . . . . . . . . 81--97
D. P. O'Leary and
P. Whitman Parallel $QR$ factorization by
Householder and modified Gram--Schmidt
algorithms . . . . . . . . . . . . . . . 99--112
M. F. X. B. van Swaaij and
F. V. M. Catthoor and
H. J. de Man Deriving ASIC architectures for the
Hough transform . . . . . . . . . . . . 113--121
Eric F. Van de Velde Data redistribution and concurrency . . 125--138
John M. Conroy Parallel nested dissection . . . . . . . 139--156
Michael L. Dowling Optimal code parallelization using
unimodular transformations . . . . . . . 157--171
B. Veltman and
B. J. Lageweg and
J. K. Lenstra Multiprocessor scheduling with
communication delays . . . . . . . . . . 173--182
Jau Hsiung Huang and
Leonard Kleinrock Distributed selectsort sorting
algorithms on broadcast communication
networks . . . . . . . . . . . . . . . . 183--190
G. M. Megson and
D. J. Evans Systolic arrays for group explicit
methods for solving first order
hyperbolic equations . . . . . . . . . . 191--205
D. J. Evans and
C. Li Successive underrelaxation (SUR) and
generalised conjugate gradient (GCG)
methods for hyperbolic difference
equations on a parallel computer . . . . 207--220
Stephen J. Wright Solution of discrete-time optimal
control problems on parallel computers 221--237
M. C. Counilh and
J. Roman Expression for massively parallel
algorithms-description and illustrative
example . . . . . . . . . . . . . . . . 239--251
G. M. Megson and
D. J. Evans An orthogonal systolic design for the
assignment problem . . . . . . . . . . . 253--267
N. Dodd Slow annealing versus multiple fast
annealing runs --- an empirical
investigation . . . . . . . . . . . . . 269--272
Yen Chun Lin and
Ferng-Ching Lin Parallel sorting with cooperating heaps
in a linear array of processors . . . . 273--278
D. J. Evans and
M. Adamopoulos and
S. Kortesis and
K. Tsouros Searching sets of properties with neural
networks . . . . . . . . . . . . . . . . 279--285
T. Samad and
P. Harper High-order Hopfield and Tank
optimization networks . . . . . . . . . 287--292
Marc Garbey and
David Levine Massively parallel computation of
conservation laws . . . . . . . . . . . 293--304
K. Burrage An adaptive numerical integration code
for a chain of transputers . . . . . . . 305--312
M. A. Baker and
K. C. Bowler and
R. D. Kenway MIMD implementations of linear solvers
for oil reservoir simulation . . . . . . 313--334
A. Stewart and
G. J. Shaw A parallel multigrid FAS scheme for
transputer networks . . . . . . . . . . 335--342
S. J. Shyu and
R. C. T. Lee The vectorization of the partition
problem . . . . . . . . . . . . . . . . 343--350
Tanguy Risset Implementing Gaussian elimination on a
matrix-matrix multiplication systolic
array . . . . . . . . . . . . . . . . . 351--359
F. Reale A tridiagonal solver for massively
parallel computer systems . . . . . . . 361--368
S. A. Levin A fully vectorized quicksort . . . . . . 369--373
C. Kamath and
S. Weeratunga Implementation of two projection methods
on a shared memory multiprocessor: DEC
VAX 6240 . . . . . . . . . . . . . . . . 375--382
M. Alef Concepts for efficient multigrid
implementation on SUPRENUM-like
architectures . . . . . . . . . . . . . 1--16
S. Heydorn and
P. Weidner Optimization and performance analysis of
thinning algorithms on parallel
computers . . . . . . . . . . . . . . . 17--27
P. Senechaud A MIMD Implementation of the Buchberger
Algorithm for Boolean Polynomials . . . 29--37
N. Kockler and
M. Simon Parallel singular value decomposition
with cyclic storing . . . . . . . . . . 39--47
D. J. Evans and
M. D. Levin A matrix-squaring variant of the power
method on the DAP . . . . . . . . . . . 49--54
E. Bampis and
J. C. Konig and
D. Trystram Impact of communications on the
complexity of the parallel Gaussian
Elimination . . . . . . . . . . . . . . 55--61
S. Manoharan and
P. Thanisch Assigning dependency graphs onto
processor networks . . . . . . . . . . . 63--73
C.-J. Wang and
V. P. Nelson Petri net performance modeling of a
modified mesh-connected parallel
computer . . . . . . . . . . . . . . . . 75--84
A. Torralba A systolic array with applications to
image processing and wire-routing in
VLSI circuits . . . . . . . . . . . . . 85--93
W. Dzwinel The search for an optimal multiprocessor
interconnection network . . . . . . . . 95--100
M. Wheat and
D. J. Evans Maintenance of shared data structures on
tightly coupled multiprocessors . . . . 101--107
M. Simmen Comments on broadcast algorithms for
two-dimensional grids . . . . . . . . . 109--112
Roland A. Sweet and
William L. Briggs and
Suely Oliveira and
Jules L. Porsche and
Tom Turnbull FFTs and three-dimensional Poisson
solvers for hypercubes . . . . . . . . . 121--131
Marcin Paprzycki and
Ian Gladwell Solving almost block diagonal systems on
parallel computers . . . . . . . . . . . 133--153
P. Tervola and
W. Yeung Parallel Jacobi algorithm for matrix
diagonalisation on transputer networks 155--163
D. J. Evans and
Wang Deren An asynchronous parallel algorithm for
solving a class of nonlinear
simultaneous equations . . . . . . . . . 165--180
S. M. Muller and
D. Scheerer A method to parallelize tridiagonal
solvers . . . . . . . . . . . . . . . . 181--188
F. A. Rabhi and
G. A. Manson Divide-and-conquer and parallel graph
reduction . . . . . . . . . . . . . . . 189--205
H. Schröder and
P. Strazdins Program compression on the instruction
systolic array . . . . . . . . . . . . . 207--219
Chang-Sung Jeong and
Myung-Ho Kim Fast parallel simulated annealing for
traveling salesman problem on SIMD
machines with linear interconnections 221--228
Pao-Hsu Shih and
Wu-Shung Feng An application of neural networks on
channel routing problem . . . . . . . . 229--240
Chang-Sung Jeong Parallel Voronoĭ diagram in
${L}_1({L}_\infty)$ metric on a
mesh-connected computer . . . . . . . . 241--252
L. Bacchelli Montefusco and
C. Guerrini A domain decomposition method for
scattered data approximation on a
distributed memory multiprocessor . . . 253--263
Hong Zhang On the accuracy of the parallel diagonal
dominant algorithm . . . . . . . . . . . 265--272
H. Schröder and
E. V. Krishnamurthy Systolic computation of characteristic
polynomials of Hessenberg matrices . . . 273--277
Gen Huey Chen and
Maw Sheng Chern Synthesis of algorithms on processor
arrays . . . . . . . . . . . . . . . . . 279--284
R. J. van der Pas and
J. M. van Kats Parallelism in a multi-user environment 285--296
N. Honjou and
K. Ohtsuki and
M. Sekiya and
F. Sasaki A parallelization technique for the
speedup of configuration interaction
computing . . . . . . . . . . . . . . . 297--310
J.-Fr. Hake and
W. Homberg The impact of memory organization on the
performance of matrix calculations . . . 311--327
H. Schwandt Memory access problems in block cyclic
reduction on vector computers . . . . . 329--346
M. Kiehl A vector implementation of an ODE code
for multi-point-boundary-value problems 347--352
T. Tollenaere and
G. A. Orban Simulating modular neural networks on
message-passing multiprocessors . . . . 361--379
Xiaobo Li Nearest neighbor classification on two
types of SIMD machines . . . . . . . . . 381--407
Ilan Bar-On Efficient logarithmic time parallel
algorithms for the Cholesky
decomposition and Gram--Schmidt process 409--417
S. Bondeli Divide and conquer: a parallel algorithm
for the solution of a tridiagonal linear
system of equations . . . . . . . . . . 419--434
Fridrich Sloboda A projection method of the Cimmino type
for linear algebraic systems . . . . . . 435--442
E. Taillard Robust taboo search for the quadratic
assignment problem . . . . . . . . . . . 443--455
Yen-Chun Lin An FP-based tool for the synthesis of
regular array algorithms . . . . . . . . 457--470
Z. Mahjoub and
F. Karoui-Sahtout Parallel algorithms for redundant
precedence relations elimination in task
systems . . . . . . . . . . . . . . . . 471--481
E. V. Krishnamurthy and
H. Schröder Systolic algorithm for multivariable
approximation using tensor products of
basis functions . . . . . . . . . . . . 483--492
H. Schröder and
V. K. Murthy and
E. V. Krishnamurthy Systolic algorithm for polynomial
interpolation and related problems . . . 493--503
Chang-Sung Jeong An improved parallel algorithm for
constructing Voronoĭ diagram on a
mesh-connected computer . . . . . . . . 505--514
Yen-Chun Lin Array size anomaly of problem-size
independent systolic arrays for
matrix-vector multiplication . . . . . . 515--522
S. Storoy and
T. Sorevik A note on an orthogonal systolic design
for the assignment problem . . . . . . . 523--525
Sajal K. Das and
Cui-Qing Yang Performance of parallel spanning tree
algorithms on linear arrays of
transputers and Unix systems . . . . . . 527--551
G. Pini A parallel algorithm for the partial
eigensolution of sparse symmetric
matrices on the CRAY Y-MP . . . . . . . 553--561
I. Gohberg and
I. Koltracht and
A. Averbuch and
B. Shoham Timing analysis of a parallel algorithm
for Toeplitz matrices on a MIMD parallel
machine . . . . . . . . . . . . . . . . 563--577
U. Detert and
G. Hofemann CRAY X-MP and Y-MP memory performance 579--590
M. D. Levin and
D. J. Evans The inversion of matrices by the
double-bordering algorithm on MIMD
computers . . . . . . . . . . . . . . . 591--602
Paul N. Swarztrauber and
Roland A. Sweet and
William L. Briggs and
Van Emden Henson and
James Otto Bluestein's FFT for arbitrary ${N}$ on
the hypercube . . . . . . . . . . . . . 607--617
H. Mühlenbein and
M. Schomisch and
J. Born The parallel genetic algorithm as
function optimizer . . . . . . . . . . . 619--632
V. V. R. Prasad and
C. Siva Ram Murthy Downloading node programs/data into
hypercubes . . . . . . . . . . . . . . . 633--642
Constantine N. K. Osiakwan and
Selim G. Akl Parallel computation of matchings in
trees . . . . . . . . . . . . . . . . . 643--656
Manfred Schimmler Parallel strong orientation on a mesh
connected computer . . . . . . . . . . . 657--664
Michael Thuné Straightforward partitioning of
composite grids for explicit difference
methods . . . . . . . . . . . . . . . . 665--672
T. L. Freeman and
M. K. Bane Asynchronous polynomial zero-finding
algorithms . . . . . . . . . . . . . . . 673--681
Stephan Olariu and
Zhaofang Wen and
Wei Xiong Zhang A faster optimal algorithm for the
measure problem . . . . . . . . . . . . 683--687
S. Olariu and
Z. Wen An efficient parallel algorithm for
multiselection . . . . . . . . . . . . . 689--693
D. Fischer On superlinear speedups . . . . . . . . 695--697
J. Hagemann Combinatorial structures for
multiprocessor-systems . . . . . . . . . 699--706
D. P. Bertsekas and
D. A. Castanon Parallel synchronous and asynchronous
implementations of the auction algorithm 707--732
D. Moncrieff and
V. R. Saunders and
S. Wilson Parallel processing using macro-tasking
in a multi-job environment on a CRAY
Y-MP computer . . . . . . . . . . . . . 733--750
C. Phillips The performance of the BLAS and LAPACK
on a shared memory scalar multiprocessor 751--761
S. K. Kim and
A. T. Chronopoulos A class of Lanczos-like algorithms
implemented on parallel computers . . . 763--778
K. Wright Parallel algorithms for $QR$
decomposition on a shared memory
multiprocessor . . . . . . . . . . . . . 779--790
F. Wiegand and
B. S. Hoyle Development and implementation of
real-time ultrasound process tomography
using a transputer network . . . . . . . 791--807
A. Corana and
A. Casaleggio and
C. Rolando and
S. Ridella Efficient computation of the correlation
dimension from a time series on a LIW
computer . . . . . . . . . . . . . . . . 809--820
C.-H. Wu and
R. E. Hodges and
C. J. Wang Parallelizing the self-organizing
feature map on multiprocessor systems 821--832
D. J. Evans and
S. Chikohora The alternating group explicit (AGE)
method on a transputer network . . . . . 833--843
V. Topkar and
O. Frieder and
A. K. Sood Duplicate removal on hypercube engines:
an experimental analysis . . . . . . . . 845--871
E. D. Chajakis and
S. A. Zenios Synchronous and asynchronous
implementations of relaxation algorithms
for nonlinear network optimization . . . 873--894
Y. Huang and
Y. Paker A parallel FFT algorithm for transputer
networks . . . . . . . . . . . . . . . . 895--906
E. Francomano and
A. Pecorella and
A. Tortorici Macaluso Parallel experience on the inverse
matrix computation . . . . . . . . . . . 907--912
H. Park A parallel algorithm for the unbalanced
orthogonal Procrustes problem . . . . . 913--923
D. J. Evans The parallel AGE method for the elliptic
problem in two dimensions . . . . . . . 925--940
Y.-H. Choi Reconfigurable VLSI/WSI multipipelines 941--952
D. Hutchinson and
B. M. S. Khalaf Parallel algorithms for solving initial
value problems: front broadening and
embedded parallelism . . . . . . . . . . 957--968
A. De Gloria and
P. Faraboschi A Boltzmann Machine approach to code
optimization . . . . . . . . . . . . . . 969--982
Wen Tsuen Chen and
Ming Yi Fang An efficient procedure for theorem
proving in propositional logic on vector
computers . . . . . . . . . . . . . . . 983--995
S. Horiguchi Hybrid systolic sorters . . . . . . . . 997--1007
S. Selvakumar and
C. Siva Ram Murthy An efficient algorithm for mapping VLSI
circuit simulation programs onto
multiprocessors . . . . . . . . . . . . 1009--1016
L. Brugnano A parallel solver for tridiagonal linear
systems for distributed memory parallel
computers . . . . . . . . . . . . . . . 1017--1023
V. R. Saunders and
S. Wilson ``Scavenger'' programming for the CRAY
X-MP computer (Short communication) . . 1025--1034
M. Wheat and
D. J. Evans Asynchronous parallel merging . . . . . 1035--1041
L. C. Waring and
M. Clint Parallel Gram--Schmidt orthogonalisation
on a network of transputers . . . . . . 1043--1050
J. Erhel and
A. Traynard and
M. Vidrascu An element-by-element preconditioned
conjugate gradient method implemented on
a vector computer . . . . . . . . . . . 1051--1065
J. Worlton Toward a taxonomy of performance metrics 1073--1092
Xian-He Sun and
J. L. Gustafson Toward a better parallel performance
metric . . . . . . . . . . . . . . . . . 1093--1109
R. Hockney Performance parameters and benchmarking
of supercomputers . . . . . . . . . . . 1111--1130
W. Schönauer and
H. Häfner Performance estimates for
supercomputers: the responsibilities of
the manufacturer and of the user . . . . 1131--1149
R. P. Weicker A detailed look at some popular
benchmarks . . . . . . . . . . . . . . . 1153--1172
M. Berry and
G. Cybenko and
J. Larson Scientific benchmark characterizations 1173--1194
K. M. Dixit The SPEC benchmarks . . . . . . . . . . 1195--1209
A. J. van der Steen The benchmark of the EuroBen group . . . 1211--1221
D. Levine and
D. Callahan and
J. Dongarra A comparative study of automatic
vectorizing compilers . . . . . . . . . 1223--1244
J. Dongarra and
M. Furtney and
S. Reinhardt and
J. Russell Parallel loops --- a test suite for
parallelizing compilers: description and
example results . . . . . . . . . . . . 1247--1255
C. M. Grassl Parallel performance of applications on
supercomputers . . . . . . . . . . . . . 1257--1273
A. J. G. Hey The Genesis distributed memory
benchmarks . . . . . . . . . . . . . . . 1275--1283
T. H. Dunigan Performance of the Intel iPSC/860 and
Ncube 6400 hypercubes . . . . . . . . . 1285--1302
W. E. Nagel and
M. A. Linn Benchmarking parallel programs in a
multiprogramming environment: the
PAR-Bench system . . . . . . . . . . . . 1303--1321
S. Arvindam and
V. Kumar and
V. Nageshwara Rao and
V. Singh Automatic test pattern generation on
parallel processors . . . . . . . . . . 1323--1342
Jenn Yang Tien and
Wei Pang Yang Hierarchical spanning trees and
distributing on incomplete hypercubes 1343--1360
Dieter Müller-Wichards Problem size scaling in the presence of
parallel overhead . . . . . . . . . . . 1361--1376
D. G. Feitelson Deadlock detection without wait-for
graphs . . . . . . . . . . . . . . . . . 1377--1383
A. Chakraborty and
D. C. S. Allison and
C. J. Ribbens and
L. T. Watson Note on unit tangent vector computation
for homotopy curve tracking on a
hypercube . . . . . . . . . . . . . . . 1385--1395
G. Bader and
E. Gehrke On the performance of transputer
networks for solving linear systems of
equations . . . . . . . . . . . . . . . 1397--1407
A. Peters Sparse matrix vector multiplication
techniques on the IBM 3090 VF . . . . . 1409--1424
Y. Escaig and
W. Oed Analysis tools for Micro- and
Autotasking programs on CRAY
multiprocessor systems . . . . . . . . . 1425--1433
E. Chu and
A. George A balanced submatrix merging algorithm
for multiprocessor architectures . . . . 1--10
G. Lotti and
M. Vajtersic The application of VLSI Poisson solvers
to the biharmonic problem . . . . . . . 11--19
G. Horton and
R. Knirsch A time-parallel multigrid-extrapolation
method for parabolic partial
differential equations . . . . . . . . . 21--29
D. Conforti and
L. Grandinetti and
R. Musmanno and
M. Cannataro and
G. Spezzano and
D. Talia A model of efficient asynchronous
parallel algorithms on multicomputer
systems . . . . . . . . . . . . . . . . 31--45
C. Neusius and
J. Olszewski and
D. Scheerer An efficient distributed thinning
algorithm . . . . . . . . . . . . . . . 47--55
A. De Gloria and
P. Faraboschi and
S. Ridella A dedicated massively parallel
architecture for the Boltzmann machine 57--73
V. K. Murthy and
E. V. Krishnamurthy and
Pin Chen Systolic algorithm for rational
interpolation and Padé approximation . . 75--83
Anatol G. Filin and
Michael A. Frumkin A systolic array for inversion of a
finite Radon transform . . . . . . . . . 85--90
M. Wheat and
D. J. Evans An efficient parallel sorting algorithm
for shared memory multiprocessors . . . 91--102
El-Sayed M. El-Horbaty and
A. El-Din H. Mohamed A synchronous algorithm for shortest
paths on a tree machine . . . . . . . . 103--107
W. Erhard and
A. Grefe Improved parallel algorithms for the
classification of electroencephalograms
(EEGs) on the DAP510 . . . . . . . . . . 109--115
R. von Hanxleden and
L. R. Scott Correctness and determinism of Parallel
Monte Carlo Processes . . . . . . . . . 121--132
Tzung-Pei Hong and
Shian-Shyong Tseng Parallel perceptron learning on a
single-channel broadcast communication
model . . . . . . . . . . . . . . . . . 133--148
D. Audet and
Y. Savaria and
J.-L. Houle Performance improvements to VLSI
parallel systems, using dynamic
concatenation of processing resources 149--167
M. Marrakchi Optimal parallel scheduling for the
$2$-steps graph with constant task cost 169--176
Hong Shen Improved universal $k$-selection in
hypercubes . . . . . . . . . . . . . . . 177--184
Ph. Clauss and
C. Mongenet and
G. R. Perrin Synthesis of size-optimal toroidal
arrays for the Algebraic Path Problem: a
new contribution . . . . . . . . . . . . 185--194
D. J. Evans A systolic array design for matrix
system solution by the symmetric
bordering method . . . . . . . . . . . . 195--205
T. Z. Kalamboukis A parallel algorithm for the dense
symmetric eigenvalue problem on a
transputer array . . . . . . . . . . . . 207--212
Przemysław Stpiczyński Parallel Cholesky factorization on
orthogonal multiprocessors . . . . . . . 213--219
Chang-Sung Jeong and
Jung-Ju Choi and
Der Tsai Lee Parallel enclosing rectangle on SIMD
machines . . . . . . . . . . . . . . . . 221--229
S. Kohlhoff and
J. Krone Performance evaluation of SUPRENUM for
the LINPACK benchmark (Short
communication) . . . . . . . . . . . . . 231--238
R. Hiromoto and
B. R. Wienke and
R. G. Brickner The performance of asynchronous
iteration schemes applied to the
linearized Boltzmann transport equation 241--268
A. Schuller Parallelizing particle simulations based
on the Boltzmann equation . . . . . . . 269--279
J. Andrew Holey and
Oscar H. Ibarra Iterative algorithms for the planar
convex hull problem on mesh-connected
arrays . . . . . . . . . . . . . . . . . 281--296
B. Robic and
P. Kolbezen and
J. Silc Area optimization of dataflow-graph
mappings . . . . . . . . . . . . . . . . 297--311
P. Casiccia and
P. Castangia and
S. Cincotti and
G. Parodi Simulation of a molecular cellular array
on a transputer-based parallel computer 313--324
K. G. Margaritis and
D. J. Evans Systolic implementation of neural
networks for searching sets of
properties . . . . . . . . . . . . . . . 325--334
W. Loots and
T. H. C. Smith A parallel three phase sorting procedure
for a $k$-dimensional hypercube and a
transputer implementation . . . . . . . 335--344
Eric Goles and
Marcos Kiwi A lower bound on the computational
complexity of the $QR$ decomposition on
a shared memory SIMD computer . . . . . 345--354
G. M. Megson and
D. J. Evans More on systolic line drawing . . . . . 355--358
P. H. Worley The effect of multiprocessor radius on
scaling . . . . . . . . . . . . . . . . 361--376
Su Chu Hsu and
Hsien Fen Hsieh and
Shing Tsaan Huang A fully-pipelined systolic algorithm for
finding bridges on an undirected
connected graph . . . . . . . . . . . . 377--391
Hong Chich Chou and
Chung Ping Chung A bound analysis of scheduling
instructions on pipelined processors
with a maximal delay of one cycle . . . 393--399
I. Mahadevan and
L. M. Patnaik Performance evaluation of bidirectional
associative memory on a transputer-based
parallel system . . . . . . . . . . . . 401--413
G. M. Megson and
O. Brudaru and
D. Comish Systolic designs for Aitken's root
finding method . . . . . . . . . . . . . 415--429
Pl. Iv. Piskoulijski Error analysis of parallel algorithm for
the solution of a tridiagonal Toeplitz
linear system of equations . . . . . . . 431--438
Gen-Huey Chen and
Wei-Wen Liang Conflict-free broadcasting algorithms
for graph traversals and their
applications . . . . . . . . . . . . . . 439--448
C. P. Thompson and
W. R. Cowell and
G. K. Leaf On the parallelization of an adaptive
multigrid algorithm for a class of flow
problems . . . . . . . . . . . . . . . . 449--466
H. C. Burg and
J. Helin 1991 International Conference on
Supercomputing . . . . . . . . . . . . . 467
H.-C. Hege and
R. Knecht Parallel Computing 91 . . . . . . . . . 473
Y. Robert and
S. W. Song Revisiting cycle shrinking . . . . . . . 481--496
Yuh-Horng Shiau and
Chung-Ping Chung Adoptability and effectiveness of
microcode compaction algorithms in
superscalar processing . . . . . . . . . 497--510
R. Lin and
S. Olariu A fast cost-optimal parallel algorithm
for the lowest common ancestor problem 511--516
E. D. Adamides and
Ph. Tsalides and
A. Thanailakis Hierarchical Cellular Automata
structures . . . . . . . . . . . . . . . 517--524
D. J. Evans and
M. Gusev Implementation of folding
transformations on linear VLSI processor
arrays . . . . . . . . . . . . . . . . . 525--542
R. S. Francis and
L. J. H. Pannan A parallel partition for enhanced
parallel QuickSort . . . . . . . . . . . 543--550
F. Suraweera and
P. Bhattacharya A parallel cost-optimal algorithm to
compute the supremum of max-min powers 551--556
H. Schreiber and
O. Steinhauser and
P. Schuster Parallel molecular dynamics of
biomolecules . . . . . . . . . . . . . . 557--573
T. Dontje and
Th. Lippert and
N. Petkov and
K. Schilling Statistical analysis of
simulation-generated time series:
Systolic vs. semi-systolic correlation
on the Connection Machine . . . . . . . 575--588
Ajay K. Gupta and
Susanne E. Hambrusch Load balanced tree embeddings . . . . . 595--614
Y. P. Boglaev Exact dynamic load balancing of MIMD
architectures with linear programming
algorithms . . . . . . . . . . . . . . . 615--623
Chien-Min Wang and
Sheng-De Wang A hybrid scheme for efficiently
executing nested loops on
multiprocessors . . . . . . . . . . . . 625--637
J.-C. Bermond and
P. Michallon and
D. Trystram Broadcasting in wraparound meshes with
parallel monodirectional links . . . . . 639--648
Ömer Eğecioğlu and
Çetin K. Koç A parallel algorithm for generating
discrete orthogonal polynomials . . . . 649--659
B. M. S. Khalaf and
D. Hutchinson Parallel algorithms for initial value
problems: parallel shooting . . . . . . 661--673
J. Andersen and
G. Mitra and
D. Parkinson The scheduling of sparse matrix-vector
multiplication on a massively parallel
DAP computer . . . . . . . . . . . . . . 675--697
J. M. D. Hill Parallel lexical analysis and parsing on
the AMT distributed array processor . . 699--714
E. Rothberg and
A. Gupta Parallel ICCG on a hierarchical memory
multiprocessor --- Addressing the
triangular solve bottleneck . . . . . . 719--741
T. Takeda and
K. Tani and
T. Tsunematsu and
Y. Kishimoto and
G. I. Kurita and
S. Matsushita and
T. Nakata Plasma simulator METIS for tokamak
confinement and heating studies . . . . 743--765
L. Lopez and
T. Politi Parallel methods in the numerical
treatment of population dynamic models 767--777
Jianjian Song A distributed-termination experiment on
a mesh-connected array of processors . . 779--791
D. Morris and
D. G. Evans Modelling distributed and parallel
computer systems . . . . . . . . . . . . 793--806
Laurence Boxer Finding congruent regions in parallel 807--810
Gen Huey Chen and
Jin Hwang Jang An improved parallel algorithm for $0/1$
knapsack problem . . . . . . . . . . . . 811--821
Yung Chen Hung and
Gen Huey Chen Distributed algorithms for the quickest
path problem . . . . . . . . . . . . . . 823--834
Srinivas Aluru and
G. M. Prabhu and
John Gustafson A random number generator for parallel
computers . . . . . . . . . . . . . . . 839--847
P. Sreenivasa Kumar and
M. Kishore Kumar and
A. Basu A parallel algorithm for elimination
tree computation and symbolic
factorization . . . . . . . . . . . . . 849--856
M. Gusev and
J. Tasic Comparative analysis of methods for
broadcast elimination . . . . . . . . . 857--866
M. Thuné The partitioning problem for a class of
data parallel algorithms . . . . . . . . 867--878
M. P. Bekakos and
D. J. Evans The double alternating group explicit
method for nonlinear parabolic equations
on MIMD parallel computers . . . . . . . 879--895
J. Zerovnik and
M. Kaufman A parallel variant of a heuristical
algorithm for graph coloring ---
Corrigendum (Short communication) . . . 897--900
K. Okamoto and
Y. Kodama and
S. Sakai and
Y. Yamaguchi Methodologies in development and testing
of the dataflow machine EM-4 . . . . . . 901--912
K. R. Tout and
D. J. Evans Parallel forward chaining technique with
dynamic scheduling, for rule-based
expert systems . . . . . . . . . . . . . 913--930
R. Butel A Cray-2 versus CM-2 comparison using
several polynomial benchmarks . . . . . 931--945
W. Oed Cray Y-MP C90: System features and early
benchmark results (Short communication) 947--954
S. Stark and
A. N. Beris LU decomposition optimized for a
parallel computer with a hierarchical
distributed memory . . . . . . . . . . . 959--971
Jack J. Dongarra and
Robert A. van de Geijn Reduction to condensed form for the
eigenvalue problem on distributed memory
architectures . . . . . . . . . . . . . 973--982
Y. P. Chu and
C. M. Hsieh An artificial neural network model with
modified perceptron algorithm . . . . . 983--996
M. Gusev and
D. J. Evans VLSI processor array IPS cells (Short
communication) . . . . . . . . . . . . . 997--1007
G. Zhang and
H. C. Elman Parallel sparse Cholesky factorization
on a shared memory multiprocessor . . . 1009--1022
M. Bentley and
C. Froese Fischer Hypercube conversion of serial codes for
atomic structure calculations . . . . . 1023--1031
S. S. Nielsen and
S. A. Zenios Data structures for network algorithms
on massively parallel architectures . . 1033--1052
J. Tasic and
M. Gusev and
D. J. Evans Systolic implementation of
preconditioned conjugate gradient method
in adaptive transversal filters . . . . 1053--1065
R. W. Hockney and
E. A. Carmona Comparison of communications on the
Intel iPSC/860 and Touchstone Delta
(Short communication) . . . . . . . . . 1067--1072
H. Strauss Parallel CFD'92 . . . . . . . . . . . . 1073
M. V. A. Hancu and
K. Iwasaki and
Y. Sato and
M. Sugie Experimental results on the error
detection capability of a concurrent
test architecture for massively-parallel
computers . . . . . . . . . . . . . . . 1079--1103
Peter Arbenz Divide and conquer algorithms for the
bandsymmetric eigenvalue problem . . . . 1105--1128
A. Basermann and
P. Weidner A parallel algorithm for determining all
eigenvalues of large real symmetric
tridiagonal matrices . . . . . . . . . . 1129--1141
Lih-Hsing Hsu and
Peng Fei Wang and
Chu Tao Wu Parallel algorithms for finding the most
vital edge with respect to minimum
spanning tree . . . . . . . . . . . . . 1143--1155
T. Chockalingam and
S. Arunkumar A randomized heuristics for the mapping
problem: The genetic approach . . . . . 1157--1165
E. Violard and
G.-R. Perrin PEI: a language and its refinement
calculus for parallel programming . . . 1167--1184
Y.-H. Choi An easily-diagnosable fault-tolerant
binary tree architecture (Short
communication) . . . . . . . . . . . . . 1185--1195
S. L. Johnsson and
R. L. Krawitz Cooley--Tukey FFT on the Connection
Machine . . . . . . . . . . . . . . . . 1201--1221
M. Zubair and
S. N. Gupta and
C. E. Grosch A variable precision approach to speedup
iterative schemes on fine grained
parallel machines (short communication) 1223--1231
Emmanouel A. Varvarigos and
Dimitri P. Bertsekas Communication algorithms for isotropic
tasks in hypercubes and wraparound
meshes . . . . . . . . . . . . . . . . . 1233--1257
Roman G. Strongin and
Yaroslav D. Sergeyev Global multidimensional optimization on
parallel computer . . . . . . . . . . . 1259--1273
I. Vlahavas and
P. Kefalas A parallel Prolog resolution based on
multiple unifications . . . . . . . . . 1275--1283
F. J. Peters Preface . . . . . . . . . . . . . . . . 1289
T. Lippert and
K. Schilling and
N. Petkov Quark propagator on the Connection
Machine . . . . . . . . . . . . . . . . 1291--1299
Mi Lu and
Xiangzhen Qiao Applying parallel computer systems to
solve symmetric tridiagonal eigenvalue
problems . . . . . . . . . . . . . . . . 1301--1315
E. M. Daoudi and
J. Lobry Implementation of a boundary element
method on distributed memory computers 1317--1324
M. Clint and
J. S. Weston and
C. W. Bleakney A comparison of two Fortran dialects for
expressing parallel solutions for a
problem in linear algebra . . . . . . . 1325--1333
B. Khan and
L. Hayes and
A. P. Cracknell The optimisation of higher order
resampling methods in a multiprocessor
environment . . . . . . . . . . . . . . 1335--1347
P. Spee and
W. F. Wong and
M. Sato and
E. Goto Evaluation of the continuation bit in
the Cyclic Pipeline Computer . . . . . . 1349--1361
D. Sharp and
M. Cripps and
J. Darlington Parallel-architecture-directed program
transformation . . . . . . . . . . . . . 1363--1380
D. K. Arvind On the detection of
communication-related errors in
concurrent programs . . . . . . . . . . 1381--1392
C. Ribeiro and
D. El Baz A parallel optimal routing algorithm . . 1393--1402
A. W. G. Duller and
R. Storer Simulation and verification of
associative processor arrays . . . . . . 1403--1414
B. Quatember Concept of a crossbar switch for
large-scale multiple processor systems
in the field of process control . . . . 1415--1431
S. Foresti and
S. Hassanzadeh and
H. Murakami and
V. Sonnad Parallel rapid operator for iterative
finite element solvers on a shared
memory machine . . . . . . . . . . . . . 1--7
P. Edmonds and
E. Chu and
A. George Dynamic programming on a shared-memory
multiprocessor . . . . . . . . . . . . . 9--22
G. Lonsdale and
A. Schuller Multigrid efficiency for complex flow
simulations on distributed memory
machines . . . . . . . . . . . . . . . . 23--32
H. Barada and
A. El-Amawy A methodology for algorithm
regularization and mapping into
time-optimal VLSI arrays . . . . . . . . 33--61
N. Funabiki and
Y. Takefuji A parallel multi-layer channel router on
the HVH model . . . . . . . . . . . . . 63--77
D. J. Evans and
C. R. Wan Parallel direct solution for $P$-cyclic
matrix systems . . . . . . . . . . . . . 79--93
S. G. Akl and
Ke Qiu A novel routing scheme on the star and
pancake networks and its applications 95--101
J. Struckmeier and
F. J. Pfreundt On the efficiency of simulation methods
for the Boltzmann equation on parallel
computers . . . . . . . . . . . . . . . 103--119
S. Sakai and
Y. Kodama and
Y. Yamaguchi Design and implementation of a circular
omega network in the EM-4 . . . . . . . 125--142
P. S. Laursen Simple approaches to parallel Branch and
Bound . . . . . . . . . . . . . . . . . 143--152
E. Ng Supernodal symbolic Cholesky
factorization on a local-memory
multiprocessor . . . . . . . . . . . . . 153--162
A. De Gloria and
P. Faraboschi and
M. Olivieri Clustered Boltzmann Machines: Massively
parallel architectures for constrained
optimization problems . . . . . . . . . 163--175
G. P. Balboni and
G. P. Cabodi and
S. Gai and
M. Sonza Reorda A parallel system for test pattern
generation . . . . . . . . . . . . . . . 177--185
P. Sreenivasa Kumar and
M. K. Kumar and
A. Basu Parallel algorithms for sparse
triangular system solution . . . . . . . 187--196
M. Y. Mohd-Saman and
D. J. Evans Investigation of a set of Bernstein
Tests for the detection of loop
parallelization . . . . . . . . . . . . 197--207
G. Horton A multi-level diffusion method for
dynamic load balancing . . . . . . . . . 209--218
Yi-Bing Lin Parallel trace-driven simulation for
packet loss in finite-buffered voice
multiplexers . . . . . . . . . . . . . . 219--228
Stephan Olariu and
James L. Schwing and
Jingyuan Zhang Applications of reconfigurable meshes to
constant-time computations . . . . . . . 229--237
E. Chu and
A. George and
D. Quesnel Parallel matrix inversion on a
subcube-grid . . . . . . . . . . . . . . 243--256
Volker Mehrmann Divide and conquer methods for block
tridiagonal systems . . . . . . . . . . 257--279
Bassem F. Beidas and
George P. Papavassilopoulos Convergence analysis of asynchronous
linear iterations with stochastic delays 281--302
C. R. Wan and
D. J. Evans A systolic array architecture for linear
and inverse matrix systems . . . . . . . 303--321
Zhiyong Liu and
Jia-Huai You Conflict-free routing for
BPC-permutations on synchronous
hypercubes . . . . . . . . . . . . . . . 323--342
A. G. Chalmers and
S. Gregory Constructing minimum path configurations
for multiprocessor systems . . . . . . . 343--355
S. Lakshmivarahan and
Jung Sing Jwo and
S. K. Dhall Symmetry in interconnection networks
based on Cayley graphs of permutation
groups: a survey . . . . . . . . . . . . 361--407
A. R. Krommer and
C. W. Ueberhuber Architecture adaptive algorithms . . . . 409--435
M. Mantharam and
P. J. Eberlein New Jacobi-sets for parallel
computations . . . . . . . . . . . . . . 437--454
M. Atiquzzaman and
M. M. Banat Effect of hot-spots on the performance
of crossbar multiprocessor systems . . . 455--461
M. Graca Ruano and
D. F. Garcia Nocetti and
P. J. Fish and
P. J. Fleming Alternative parallel implementations of
an AR-modified covariance spectral
estimator for diagnostic ultrasonic
blood flow studies . . . . . . . . . . . 463--476
Mythili Mantharam and
P. J. Eberlein Block recursive algorithm to generate
Jacobi-sets . . . . . . . . . . . . . . 481--496
Mokhtar Aboelaze and
De-Lei L. Lee A method for data allocation and
manipulation in hypercube computers . . 497--510
M. Bahi and
J. C. Miellou Contractive mappings with maximum norms:
comparison of constants of contraction
and application to asynchronous
iterations . . . . . . . . . . . . . . . 511--523
M. Misra and
D. Nassimi and
V. K. Prasanna Efficient VLSI implementation of
iterative solutions to sparse linear
systems . . . . . . . . . . . . . . . . 525--544
M. P. Bekakos and
D. J. Evans Parallel cyclic odd-even reduction
algorithms for solving Toeplitz
tridiagonal equations on MIMD computers 545--561
G. Spaletta and
D. J. Evans The Parallel Recursive Decoupling
algorithm for solving tridiagonal linear
systems . . . . . . . . . . . . . . . . 563--576
E. V. Krishnamurthy and
Chen Pin Data parallel evaluation-interpolation
algorithm for polynomial matrix
inversion . . . . . . . . . . . . . . . 577--589
S. Chandra and
M. Jain and
A. Basu and
P. S. Kumar Sorting algorithms on transputer arrays 595--607
T. B. Boffey and
W. A. Essah Implementing a parallel constrained
$\ell_1$ approximation algorithm . . . . 609--620
A. N. Choudhary and
B. Narahari and
R. Krishnamurti An efficient heuristic scheme for
dynamic remapping of parallel
computations (Short communication) . . . 621--632
H. Azaria and
Y. Elovici Modeling and evaluation of a new
message-passing system for parallel
multiprocessor systems . . . . . . . . . 633--649
M. Paprzycki and
I. Gladwell A parallel chopping algorithm for ODE
boundary value problems . . . . . . . . 651--666
F. Pagano and
G. Parodi and
R. Zunino Parallel implementation of associative
memories for image classification . . . 667--684
R. Campanini and
I. D'Antone and
G. Di Caro and
G. Giusti A transputer-based parallel expert
diagnostic system . . . . . . . . . . . 685--692
Y.-W. Leung On-line fault identification in
multistage interconnection networks . . 693--702
E. J. Kontoghiorghes and
M. R. B. Clarke Parallel reorthogonalization of the $QR$
decomposition after deleting columns
(Short communication) . . . . . . . . . 703--707
S. J. Horng Computing dominators on a cube-connected
machine . . . . . . . . . . . . . . . . 713--728
J. D. Bruguera and
E. Antelo and
E. L. Zapata Design of a pipelined radix 4 CORDIC
processor . . . . . . . . . . . . . . . 729--744
C. N. Zhang and
H. F. Li and
R. Jayakumar A systematic approach for designing
concurrent error-detecting systolic
arrays using redundancy . . . . . . . . 745--764
Ren-Lianq Cheng and
Chung-Ping Chung Reaching approximate agreement on
hypercube . . . . . . . . . . . . . . . 765--775
P. A. Nelson Hypercube matrix multiplication . . . . 777--788
A. El-Amawy and
R. Raja Split sequence generation algorithms for
efficient identification of operational
subcubes in faulty hypercubes . . . . . 789--805
Yung-Chang Wong and
Shu-Yuen Hwang On parallelizing the Dempster-Shafer
method using transputer network . . . . 807--822
S. D. Altekar and
A. K. Ray and
B. R. Wienke On the parallelization of a $S_n$
transport algorithm on a CRAY Y-MP . . . 823--834
B. L. Menezes and
I. L. M. Ricarte and
R. Thurimella Analysis of pipelined external sorting
on a reconfigurable message-passing
multicomputer . . . . . . . . . . . . . 839--858
Nicolas Boissin and
Jean-Luc Lutton A parallel simulated annealing algorithm 859--872
Louis Ibarra and
Dana Richards Efficient parallel graph algorithms
based on open ear decomposition . . . . 873--886
Jiawang Wei Parallel asynchronous iterations of
least fixed points . . . . . . . . . . . 887--895
D. J. Evans and
W. U. N. Butt Dynamic load balancing using
task-transfer probabilities . . . . . . 897--916
Przemys\law Stpiczy\'nski Error analysis of two parallel
algorithms for solving linear recurrence
systems . . . . . . . . . . . . . . . . 917--923
John J. Buoni and
Paul A. Farrell and
Arden Ruttan Algorithms for ${LU}$ decomposition on a
shared memory multiprocessor . . . . . . 925--937
Jianping Zhu $QR$ factorization for the regularized
least squares problem on hypercubes . . 939--948
A. Matrone and
P. Schiano and
V. Puoti LINDA and PVM: a comparison between two
environments for parallel programming 949--957
Zbigniew J. Czech and
Marek Konopka and
Bohdan S. Majewski Parallel algorithms for finding a
suboptimal fundamental-cycle set in a
graph . . . . . . . . . . . . . . . . . 961--971
I. W. Chan and
D. K. Friesen Parallel algorithm for segment
visibility reporting . . . . . . . . . . 973--978
L. Lopez Methods based on boundary value
techniques for solving parabolic
equations on parallel computers . . . . 979--991
Hong Shen A high performance interconnection
network for multiprocessor systems . . . 993--1001
H. Caffey and
L. Z. Liao and
C. A. Shoemaker Parallel processing of large scale
discrete-time unconstrained differential
dynamic programming . . . . . . . . . . 1003--1017
D. El Baz Asynchronous implementation of
relaxation and gradient algorithms for
convex network flow problems . . . . . . 1019--1028
R. Trobec and
I. Jerebic and
D. Janezic Parallel algorithm for molecular
dynamics integration . . . . . . . . . . 1029--1039
P. Altevogt and
A. Linke Parallelization of the two-dimensional
Ising model on a cluster of IBM RISC
System/6000 workstations . . . . . . . . 1041--1052
A. Nanayakkara and
D. Moncrieff and
S. Wilson Performance of IBM RISC System/6000
workstation clusters in a quantum
chemical application . . . . . . . . . . 1053--1062
A. Jakobs and
R. W. Gerling Scaling aspects for the performance of
parallel algorithms . . . . . . . . . . 1063--1073
Xiaobo Li and
Paul Lu and
Jonathan Schaeffer and
John Shillington and
Pok Sze Wong and
Hanmao Shi On the versatility of parallel sorting
by regular sampling . . . . . . . . . . 1079--1103
Rajesh Aggarwal and
David R. Dellwo and
Morton B. Friedman Parallel solution of Fredholm integral
equations of the second kind by
accelerated projection methods . . . . . 1105--1115
Maria Antonietta Pirozzi The fast numerical solution of mildly
nonlinear elliptic boundary value
problems on multiprocessors . . . . . . 1117--1128
Terry Bossomaier and
Adrian Loeff Parallel computation of the Hausdorff
distance between images . . . . . . . . 1129--1140
D. Busvine Implementing recursive functions as
processor farms . . . . . . . . . . . . 1141--1153
Y. Kanada A method of vector processing for shared
symbolic data . . . . . . . . . . . . . 1155--1175
M. Gusev and
D. J. Evans New linear systolic arrays for the
string comparison algorithm . . . . . . 1177--1193
J. De Keyser and
D. Roose Load balancing data parallel programs on
distributed memory computers . . . . . . 1199--1219
C. H. Cap and
V. Strumpen Efficient parallel computing in
distributed workstation environments . . 1221--1234
S. L. Johnsson Minimizing the communication time for
matrix multiplication on multiprocessors 1235--1257
B. Hendrickson Parallel $QR$ factorization using the
torus-wrap mapping . . . . . . . . . . . 1259--1271
P. Amodio and
N. Mastronardi A parallel version of the cyclic
reduction algorithm on a hypercube . . . 1273--1281
H. Dhrif and
D. Sarkar Fuzzy arithmetic on systolic arrays . . 1283--1301
Çetin Kaya Koç and
Peter Cappello Systolic arrays for integer Chinese
remaindering . . . . . . . . . . . . . . 1303--1311
S. Hurley Taskgraph mapping using a genetic
algorithm: a comparison of fitness
functions . . . . . . . . . . . . . . . 1313--1317
T. Yang and
A. Gerasoulis List scheduling with and without
communication delays . . . . . . . . . . 1321--1344
F. B. Hanson and
J.-D. Mei and
C. Tier and
H. Xu PDAC: a data parallel algorithm for the
performance analysis of closed queueing
networks . . . . . . . . . . . . . . . . 1345--1358
H. B. Zhou Two-stage $m$-way graph partitioning . . 1359--1373
K.-H. Hoffmann and
J. Zou Parallel efficiency of domain
decomposition methods . . . . . . . . . 1375--1391
M. Kumar and
Y. Baransky and
M. Denneau The GF11 parallel computer . . . . . . . 1393--1412
U. Gärtel and
W. Joppich and
A. Schüller Parallelizing the ECMWF's weather
forecast program: the 2D case . . . . . 1413--1425
U. Gärtel and
W. Joppich and
A. Schüller First results with a parallelized $3$D
weather prediction code . . . . . . . . 1427--1429
F.-H. Hebeker Parallel CFD'93 . . . . . . . . . . . . 1431
Shen Shen Wu and
David Sweeting Heuristic algorithms for task assignment
and scheduling in a processor network 1--14
J. B\la\.zewicz and
M. Drozdowski and
G. Schmidt and
D. de Werra Scheduling independent multiprocessor
tasks on a uniform $k$-processor system 15--28
D. J. Evans and
M. Gusev New linear systolic arrays for digital
filters and convolution . . . . . . . . 29--61
Thomas Schreiber and
Peter Otto and
Fridolin Hofmann A new efficient parallelization strategy
for the $QR$ algorithm . . . . . . . . . 63--75 (or 63--76??)
R. Calinescu and
D. J. Evans A parallel simulation model for load
balancing in clustered distributed
systems . . . . . . . . . . . . . . . . 77--91
Jaime Seguel and
Dorothy Bollman Fast digit-reversal algorithms on a
shared-memory machine . . . . . . . . . 93--99
Shyan-Ming Yuan An efficient fault-tolerant
decentralized commit protocol . . . . . 101--114
Thomas Umland Parallel sorting revisited . . . . . . . 115--124
K. Nagel and
A. Schleicher Microscopic traffic modeling on parallel
high performance computers . . . . . . . 125--146
G. W. Stewart Updating URV decompositions in parallel 151--172
D. M. Beazley and
P. S. Lomdahl Message-passing multi-cell molecular
dynamics on the Connection Machine 5 . . 173--195
M. Angelaccio and
M. Colajanni The row/column pivoting strategy on
multicomputers . . . . . . . . . . . . . 197--213
Michael Conner and
Richard Tolimieri Special purpose hardware for Discrete
Fourier Transform implementation . . . . 215--232
Henry Ker-Chang Chang and
Jonathan Jen-Rong Chen and
Shyong-Jian Shyu A parallel algorithm for the knapsack
problem using a generation and searching
technique . . . . . . . . . . . . . . . 233--243
Antonio d'Acierno and
Roberto Vaccaro On parallelizing recursive neural
networks on coarse-grained parallel
computers: a general algorithm . . . . . 245--256
M. Angelaccio and
M. Colajanni Subcube matrix decomposition: a unifying
view for LU factorization on
multicomputers . . . . . . . . . . . . . 257--270
M. Kiehl Parallel multiple shooting for the
solution of initial value problems . . . 275--295
Lujuan Chen and
E. V. Krishnamurthy and
Iain Macleod Generalised matrix inversion and rank
computation by successive matrix
powering . . . . . . . . . . . . . . . . 297--311
Jian-Jin Li Multiscattering on the Cube-Connected
Cycles . . . . . . . . . . . . . . . . . 313--324
D. J. Evans and
W. U. N. Butt Load balancing with network partitioning
using host groups . . . . . . . . . . . 325--345
Tzung-Pei Hong and
Shian-Shyong Tseng An optimal parallel perceptron learning
algorithm for a large training set . . . 347--352
Jong-Chuang Tsay and
Wei-Ping Lee An optimal parallel algorithm for
generating permutations in minimal
change order . . . . . . . . . . . . . . 353--361
M. L. Sawley and
C. M. Bergman A comparative study of the use of the
data-parallel approach for compressible
flow calculations . . . . . . . . . . . 363--373
A. Asenov and
D. Reid and
J. R. Barker Speed-up of scalable iterative linear
solvers implemented on an array of
transputers . . . . . . . . . . . . . . 375--387
Roger W. Hockney The communication challenge for MPP:
Intel Paragon and Meiko CS-2 . . . . . . 389--398
U. Kleis and
J. M. Singer and
I. Morgenstern and
Th. Hußlein and
H.-G. Matuttis Experiences with re-engineering and
parallelizing a high-T$_c$
superconductivity code . . . . . . . . . 399--407
Anonymous Parallel Computing 93 . . . . . . . . . 409
Oliver A. McBryan An overview of message passing
environments . . . . . . . . . . . . . . 417--443 (or 417--444??)
Vasanth Bala and
Jehoshua Bruck and
Raymond Bryant and
Robert Cypher and
Peter de Jong and
Pablo Elustondo and
D. Frye and
Alex Ho and
Ching-Tien Ho and
Gail Irwin and
Shlomo Kipnis and
Richard Lawrence and
Marc Snir The IBM External User Interface for
scalable parallel systems . . . . . . . 445--462
Paul Pierce The NX message passing interface . . . . 463--480
Lewis W. Tucker and
Alan Mainwaring CMMD: Active messages on the CM-5 . . . 481--496
Eric Barton and
James Cownie and
Moray McLaren Message passing on the Meiko CS-2 . . . 497--507
M. Schmidt-Voigt Efficient parallel communication with
the nCUBE 2S processor . . . . . . . . . 509--530
V. S. Sunderam and
G. A. Geist and
J. Dongarra and
R. Manchek The PVM concurrent computing system:
Evolution, experiences, and trends . . . 531--545
Ralph M. Butler and
Ewing L. Lusk Monitors, messages, and clusters: The p4
parallel programming system . . . . . . 547--564
Anthony Skjellum and
Steven G. Smith and
Nathan E. Doss and
Alvin P. Leung and
Manfred Morari The design and evolution of Zipcode . . 565--596
Jon Flower and
Adam Kolawa Express is not just a message passing
system: Current and future directions in
Express . . . . . . . . . . . . . . . . 597--614
R. Calkin and
R. Hempel and
H.-C. Hoppe and
P. Wypior Portable programming with the PARMACS
message-passing library . . . . . . . . 615--632
Nicholas J. Carriero and
David Gelernter and
Timothy G. Mattson and
Andrew H. Sherman The Linda alternative to message-passing
systems . . . . . . . . . . . . . . . . 633--655
David W. Walker The design of a standard message passing
interface for distributed memory
concurrent computers . . . . . . . . . . 657--673
Alain Darte and
Yves Robert Mapping uniform loop nests onto
distributed memory architectures . . . . 679--710
Jingling Xue Automating non-unimodular loop
transformations for massive parallelism 711--728
David J. Lilja A multiprocessor architecture combining
fine-grained and coarse-grained
parallelism strategies . . . . . . . . . 729--751
Mark T. Jones and
Paul E. Plassmann Scalable iterative solution of sparse
linear systems . . . . . . . . . . . . . 753--773
Wei Ping Lee and
Jong Chuang Tsay A systolic design for generating
permutations in lexicographic order . . 775--785
D. J. Evans and
W. S. Yousif The solution of unsymmetric tridiagonal
Toeplitz systems by the strides
reduction algorithm . . . . . . . . . . 787--798
E. V. Krishnamurthy and
Vikram Krishnamurthy An ANN model perceptron algorithm using
generalized matrix inversion . . . . . . 799--806
E. Montagne and
M. Rukoz and
R. Surós and
F. Breant Modeling optimal granularity when
adapting systolic algorithms to
transputer based supercomputers . . . . 807--814
Y. F. Hu and
R. J. Blake Numerical experiences with partitioning
of unstructured meshes . . . . . . . . . 815--829
S. Selvakumar and
C. Siva Ram Murthy Static task allocation of concurrent
programs for distributed computing
systems with processor and resource
heterogeneity . . . . . . . . . . . . . 835--851
Jianjian Song A partially asynchronous and iterative
algorithm for distributed load balancing 853--868
Dongseung Kim and
Byung-Guoen Yi A two-pass scheduling algorithm for
parallel programs . . . . . . . . . . . 869--885
Tien-Yu Huang and
Jean-Lien C. Wu Alternate resolution strategy in
multistage interconnection networks . . 887--896
Bao Lin Zhang and
Wen Zhi Li On Alternating Segment Crank--Nicolson
scheme (Short communication) . . . . . . 897--902
C. R. Wan and
D. J. Evans A systolic array architecture for $QR$
decomposition of block structured sparse
systems . . . . . . . . . . . . . . . . 903--914
Kapil K. Mathur and
S. Lennart Johnsson Multiplication of matrices of arbitrary
shape on a data parallel computer . . . 919--951
Inge Gutheil and
Werner Krotz-Vogel Performance of a parallel matrix
multiplication routine on Intel iPSC/860 953--974
H. Suman and
K. Schilling A comparative study of gauge fixing
procedures on the connection machines
CM2 and CM5 . . . . . . . . . . . . . . 975--990
Chang-ming Ma Implementation of a Monte Carlo code on
a parallel computer system . . . . . . . 991--1005
Hsiao-Hsi Wang and
Ruei-Chuan Chang A distributed shared memory system with
self-adjusting coherence scheme . . . . 1007--1025
Takenori Makino Shift-net and power shift-net for
parallel processor systems . . . . . . . 1027--1039
Jean-Lien C. Wu and
T.-Y. Huang A new bus contention scheme in S/NET
with dynamic priority . . . . . . . . . 1041--1054
D. J. Evans and
E. Galligani A parallel additive preconditioner for
conjugate gradient method for $AX+XB=C$ 1055--1064
Johan De Keyser and
Kurt Lust and
Dirk Roose Run-time load balancing support for a
parallel multiblock Euler/Navier--Stokes
code with adaptive refinement on
distributed memory computers . . . . . . 1069--1088
Hong Zhang and
William F. Moss Using parallel banded linear system
solvers in generalized eigenvalue
problems . . . . . . . . . . . . . . . . 1089--1105
Sabine Van Huffel and
Haesun Park Parallel tri- and bi-diagonalization of
bordered bidiagonal matrices . . . . . . 1107--1128
T. F. Pena and
E. L. Zapata and
D. J. Evans Finite element simulation of
semiconductor devices on multiprocessor
computers . . . . . . . . . . . . . . . 1129--1159
Nicholas J. Higham and
Pythagoras Papadimitriou A parallel algorithm for computing the
polar decomposition . . . . . . . . . . 1161--1173
P. Yalamov and
D. J. Evans On the forward stability of a modified
`stride of $3$' reduction method . . . . 1175--1190
Amit J. Basu A parallel algorithm for spectral
solution of the three-dimensional
Navier--Stokes equations . . . . . . . . 1191--1204
Richard E. Overill and
Stephen Wilson Performance of parallel algorithms for
the evaluation of power series . . . . . 1205--1213
David W. Walker Erratum to: ``The design of a standard
message passing interface for
distributed memory concurrent
computers'' . . . . . . . . . . . . . . 1215--1215
L. C. Polymenakos and
D. P. Bertsekas Parallel shortest path auction
algorithms . . . . . . . . . . . . . . . 1221--1247
Qi Gan and
Qing Yang and
Chen-Yi Hu Parallel all-row preconditioned interval
linear solver for nonlinear equations on
multiprocessors . . . . . . . . . . . . 1249--1268
Jeffrey T. Draper and
Joydeep Ghosh The M-cache: a message-handling
mechanism for multicomputer systems . . 1269--1288
Abdel Aziz Farrag Tolerating faulty edges in a
multi-dimensional mesh . . . . . . . . . 1289--1301
Abhay Jain and
N. S. Chaudhari Efficient parallel recognition of
context-free languages . . . . . . . . . 1303--1321 (or 1303--1322??)
Yen Chun Lin New systolic arrays for the longest
common subsequence problem$^+$ . . . . . 1323--1334
Saulo R. M. Barros and
Tuomo Kauranne On the parallelization of global
spectral weather models . . . . . . . . 1335--1356
Jun Makino Lagged-Fibonacci random number
generators on parallel computers . . . . 1357--1367
Frank Dehne and
Afonso Ferreira and
Andrew Rau-Chaplin A massively parallel knowledge-base
server using a hypercube multiprocessor 1369--1382
Oliver A. McBryan The SUPRENUM and GENESIS projects . . . 1389--1396
Ulrich Trottenberg Some remarks on the SUPRENUM project . . 1397--1406
W. K. Giloi The SUPRENUM supercomputer: Goals,
achievements, and lessons learned . . . 1407--1425
Oliver A. McBryan SUPRENUM: Perspectives and performance 1427--1442
Wolfgang K. Giloi Parallel supercomputer architectures and
their programming models . . . . . . . . 1443--1470
Wolfgang Schröder-Preikschat PEACE --- a software backplane for
parallel computing . . . . . . . . . . . 1471--1485
Hans P. Zima and
Peter Brezany and
Barbara M. Chapman SUPERB and Vienna Fortran . . . . . . . 1487--1517
R. Hempel Application programming interfaces for
SUPRENUM . . . . . . . . . . . . . . . . 1519--1526
Hermann Mierendorff and
Helmut Schwamborn and
Maurizio Tazza Performance modelling of grid problems
--- a case study on the SUPRENUM system 1527--1546
Manfred Alef Implementation of a multigrid algorithm
on SUPRENUM and other systems . . . . . 1547--1557
Hubert Ritzdorf and
Anton Schüller and
Barbara A. Steckel and
Klaus Stüben $L_i$SS --- an environment for the
parallel multigrid solution of partial
differential equations on general 2D
domains . . . . . . . . . . . . . . . . 1559--1570
Ortwin Pätzold and
Anton Schüller and
Horst Schwichtenberg Parallel applications and performance
measurements on SUPRENUM . . . . . . . . 1571--1582
Georg Fleischmann and
Matthias Gente and
Fridolin Hofmann and
Gunter Bolch Performance analysis of parallel
programs based on model calculations . . 1583--1603
Tony Hey The Genesis Esprit project --- an
overview . . . . . . . . . . . . . . . . 1605--1612
Otto Kolp Performance estimation for a parallel
system with a hierarchical switch
network . . . . . . . . . . . . . . . . 1613--1626
Jon Beecroft and
Mark Homewood and
Moray McLaren Meiko CS-2 interconnect Elan-Elite
design . . . . . . . . . . . . . . . . . 1627--1638
L. M. Delves and
C. A. Addison and
O. A. Aziz The design and implementation of a
portable parallel numerical library . . 1639--1651
C. A. Addison and
V. S. Getov and
A. J. G. Hey and
R. W. Hockney and
I. C. Wolton Benchmarking for distributed memory
parallel systems: Gaining insight from
numbers . . . . . . . . . . . . . . . . 1653--1668
Karl Solchenbach and
Clemens-August Thole and
Ulrich Trottenberg GENESIS application software . . . . . . 1669--1673
Edgar A. Gerteisen Preliminary performance results of the
massive parallel Aircraft Euler Method 1675--1683
Tuomo Kauranne Summary of GENESIS work at the European
Centre for Medium-range Weather
Forecasts (ECMWF) . . . . . . . . . . . 1685--1688
J. J. H. Miller and
S. Wang On the implementation of a $3$-D
semiconductor device simulator on
distributed-memory MIMD/SIMD machines 1689--1691
A. Dubey and
M. Zubair and
C. E. Grosch A general purpose subroutine for fast
Fourier transform on a distributed
memory parallel machine . . . . . . . . 1697--1710
Ralf Östermark and
Martin Saarinen Parallel implementation of a VARMAX
algorithm . . . . . . . . . . . . . . . 1711--1720
Shu Hua Hu and
Hsing Lung Chen An effective routing algorithm in
incomplete hypercubes . . . . . . . . . 1721--1738
M. S. Horng and
D. J. Chen and
Kuo Lung Ku Parallel routing algorithms for
incomplete hypercube interconnection
networks . . . . . . . . . . . . . . . . 1739--1761
Kemal Efe and
P. K. Blackwell and
W. Slough and
T. Shiau Topological properties of the crossed
cube architecture . . . . . . . . . . . 1763--1775
Samir W. Mahfoud and
David E. Goldberg Parallel recombinative simulated
annealing: a genetic algorithm . . . . . 1--28
R. Van Driessche and
D. Roose An improved spectral bisection algorithm
and its application to dynamic load
balancing . . . . . . . . . . . . . . . 29--48
Claus Bendtsen and
Per Christian Hansen and
Kaj Madsen and
Hans Bruun Nielsen and
Mustafa Pinar Implementation of $QR$ up- and
downdating on a massively parallel
computer . . . . . . . . . . . . . . . . 49--61
T. H. C. Smith and
G. L. Thompson A parallel implementation of the column
subtraction algorithm . . . . . . . . . 63--71
A. De Matteis and
S. Pagnutti Controlling correlations in parallel
Monte Carlo . . . . . . . . . . . . . . 73--84
Sathiamoorthy Manoharan and
Nigel P. Topham An assessment of assignment schemes for
dependency graphs . . . . . . . . . . . 85--107
D. J. Evans and
S. A. Amin Systolic algorithms for digital image
filtering . . . . . . . . . . . . . . . 109--119
Kuninobu Tanno and
Toshihiro Taketa and
Susumu Horiguchi Parallel FFT algorithms using radix 4
butterfly computation on an
eight-neighbor processor array . . . . . 121--136
Chi-kin Lee and
Mounir Hamdi Practical aspects and experiences:
Parallel image processing applications
on a network of workstations . . . . . . 137--160
Howard C. Elman and
Dennis K.-Y. Lee Use of linear algebra kernels to build
an efficient finite element solver . . . 161--173
J. De Keyser and
D. Roose Run-time load balancing techniques for a
parallel unstructured multi-grid Euler
solver with adaptive grid refinement . . 179--198
Tilmann Bönniger and
Rüdiger Esser and
Dietrich Krekel CM-5E, KSR2, Paragon XP/S: a comparative
description of massively parallel
computers . . . . . . . . . . . . . . . 199--232
Juan C. Agüí and
Javier Jiménez A binary tree implementation of a
parallel distributed tridiagonal solver 233--241
Emmanouel A. Varvarigos and
Dimitri P. Bertsekas Transposition of banded matrices in
hypercubes: a nearly isotropic task . . 243--264
E. Lega and
H. Scholl and
J.-M. Alimi and
A. Bijaoui and
P. Bury A parallel algorithm for structure
detection based on wavelet and
segmentation analysis . . . . . . . . . 265--285
F. J. Muniz and
E. J. Zaluska Parallel load-balancing: an extension to
the gradient model . . . . . . . . . . . 287--301
Hong Shen An efficient permutation-based parallel
algorithm for range-join in hypercubes 303--313
M. Y. Mohd-Saman and
D. J. Evans Inter-procedural analysis for parallel
computing . . . . . . . . . . . . . . . 315--338
Zaher Mahjoub and
Mohamed Jemni Restructuring and parallelizing a static
conditional loop . . . . . . . . . . . . 339--347
F. Desprez and
B. Tourancheau Basic routines for the rank-$2k$ update:
2D torus vs.\ reconfigurable network . . 353--372
Jörg-Thomas Pfenning and
Christoph Moll Optimized communication patterns on
workstation clusters . . . . . . . . . . 373--388
Liu Yong and
Kang Lishan and
D. J. Evans The annealing evolution algorithm as
function optimizer . . . . . . . . . . . 389--400
S. Crivelli and
E. R. Jessup The cost of eigenvalue computation on
distributed-memory MIMD multiprocessors 401--422
L. Nicastro and
N. D'Amico An optimized mass storage FFT for vector
computers . . . . . . . . . . . . . . . 423--432
R. Sridhar and
N. Chandrasekharan Highly parallelizable problems on sorted
intervals . . . . . . . . . . . . . . . 433--446
K. G. Kumar and
D. B. Skillicorn Data parallel geometric operations on
lists . . . . . . . . . . . . . . . . . 447--459
Zhaofang Wen Fast parallel algorithms for the maximum
sum problem . . . . . . . . . . . . . . 461--466
D. Moncrieff and
R. E. Overill and
S. Wilson $\alpha_{\mbox{critical}}$ for parallel
processors . . . . . . . . . . . . . . . 467--471
Pontus Matstoms Parallel sparse $QR$ factorization on
shared memory architectures . . . . . . 473--486
Pasqua D'Ambra and
Giulio Giunta Concurrent banded Cholesky factorization
on workstation networks using PVM . . . 487--494
Frederic Desprez and
Marc Garbey Numerical simulation of a combustion
problem on a Paragon machine . . . . . . 495--508
Gerhard Globisch PARMESH --- a parallel mesh generator 509--524
David M. Nicol Noncommittal barrier synchronization . . 529--549
Rolf Borgeest and
Bernward Dimke and
Olav Hansen A trace based performance evaluation
tool for parallel real time systems . . 551--564
Bai Zhongzhi and
Wang Deren and
D. J. Evans Models of asynchronous parallel matrix
multisplitting relaxed iterations . . . 565--582
L. F. Romero and
E. L. Zapata Data distributions for sparse matrix
vector multiplication . . . . . . . . . 583--605
N. M. Bahoshy and
D. J. Evans A general harness for explicit parallel
programming . . . . . . . . . . . . . . 607--617
M. P. Bekakos A notational approach to formulation of
systolic array programs (Short
communication) . . . . . . . . . . . . . 619--626
Xiaodong Zhang Parallelizing an oil refining
simulation: Numerical methods,
implementations and experience . . . . . 627--647
Albert Y. Zomaya Parallel processing for robot dynamics
computations . . . . . . . . . . . . . . 649--668
A. Asenov and
D. Reid and
J. R. Barker Speed-up of scalable iterative linear
solvers implemented on an array of
transputers . . . . . . . . . . . . . . 669--682
G. A. Kohring Dynamic load balancing for parallelized
particle simulations on MIMD computers 683--693
Takuya Terasawa and
Ou Yamamoto and
Tomohiro Kudoh and
Hideharu Amano A performance evaluation of the
multiprocessor testbed ATTEMPT-0 . . . . 701--730
Susanne E. Hambrusch and
Farooq Hameed and
Ashfaq A. Khokhar Communication operations on
coarse-grained mesh architectures . . . 731--751 (or 731--752??)
Shuichi Sakai and
Yuetsu Kodama and
Mitsuhisa Sato and
Andrew Shaw and
Hiroshi Matsuoka and
Hideo Hirono and
Kazuaki Okamoto and
Takashi Yokota Reduced interprocessor-communication
architecture and its implementation on
EM-4 . . . . . . . . . . . . . . . . . . 753--769 (or 753--770??)
Dilip K. Saikia and
Ranjan K. Sen Order preserving communication on a star
network . . . . . . . . . . . . . . . . 771--782
M. A. de Rosa and
G. Giunta and
M. Rizzardi Parallel Talbot's algorithm for
distributed memory machines . . . . . . 783--801 (or 783--802??)
M. Cannataro and
S. Di Gregorio and
R. Rongo and
W. Spataro and
G. Spezzano and
D. Talia A parallel cellular automata environment
on multicomputers for computational
science . . . . . . . . . . . . . . . . 803--823 (or 803--824??)
K. G. Margaritis On the systolic implementation of
associative memory artificial neural
networks . . . . . . . . . . . . . . . . 825--840
Ling Chen and
Henry Y. H. Chuang An efficient algorithm for complete
Euclidean distance transform on
mesh-connected SIMD (Short
communication) . . . . . . . . . . . . . 841--852
Marek T. Michalewicz and
Mark Priebatsch Perfect scaling of the electronic
structure problem on a SIMD architecture 853--870
Robert B. Schnabel A view of the limitations,
opportunities, and challenges in
parallel nonlinear optimization . . . . 875--905
Kai Rothe and
Heinrich Voss A fully parallel condensation method for
generalized eigenvalue problems on
distributed memory computers . . . . . . 907--921
Arkady Kanevsky and
Chao Feng On the embedding of cycles in pancake
graphs . . . . . . . . . . . . . . . . . 923--936
Dieter Müller-Wichards and
Wolfgang Rönsch Scalability of algorithms: an analytic
approach . . . . . . . . . . . . . . . . 937--952
Tzong Wann Kao and
Shi Jinn Horng Optimal algorithms for computing
articulation points and some related
problems on a circular-arc graph (Short
communication) . . . . . . . . . . . . . 953--969
John Brown and
Jerzy Was\'niewski and
Zahari Zlatev Practical aspects and experiences.
Running air pollution models on
massively parallel machines . . . . . . 971--991
Vamsee Lakamsani and
Laxmi N. Bhuyan and
D. Scott Linthicum Practical aspects and experiences.
Mapping molecular dynamics computations
on to hypercubes . . . . . . . . . . . . 993--1013
Jun Makino and
Osamu Miyamura Parallelized feedback shift register
generators of pseudorandom numbers . . . 1015--1028
Tony F. Chan and
Jian Ping Shao Parallel complexity of domain
decomposition methods and optimal coarse
grid size . . . . . . . . . . . . . . . 1033--1049
Hugo Embrechts and
Dirk Roose MIMD divide-and-conquer algorithms for
the distance transformation. Part I:
City Block distance . . . . . . . . . . 1051--1076
Hugo Embrechts and
Dirk Roose MIMD divide-and-conquer algorithms for
the distance transformation. Part II.
Chamfer $3$-$4$ distance . . . . . . . . 1077--1096
Pierluigi Amodio and
Luigi Brugnano The parallel $QR$ factorization
algorithm for tridiagonal linear systems 1097--1110
P. Yalamov and
D. J. Evans The $WZ$ matrix factorisation method . . 1111--1120
Edward Rothberg Alternatives for solving sparse
triangular systems on distributed-memory
multiprocessors . . . . . . . . . . . . 1121--1136
N. Floros and
J. S. Reeve Evaluation of a spectral element CFD
code on parallel architectures . . . . . 1137--1150
A. Averbuch and
M. Israeli and
L. Vozovoi Parallel implementation of non-linear
evolution problems using parabolic
domain decomposition . . . . . . . . . . 1151--1183
Michael W. Berry and
Jack J. Dongarra and
Youngbae Kim A parallel algorithm for the reduction
of a nonsymmetric matrix to block
upper-Hessenberg form . . . . . . . . . 1189--1211
C. Trefftz and
C. C. Huang and
P. K. McKinley and
T.-Y. Li and
Z. Zeng A scalable eigenvalue solver for
symmetric tridiagonal matrices . . . . . 1213--1240
Xian-He Sun Application and accuracy of the parallel
diagonal dominant algorithm . . . . . . 1241--1267
H. R. Barada Modular matrix computations on
multi-linear VLSI arrays . . . . . . . . 1269--1284
Paraskevas Evripidou and
Jean-Luc Gaudiot Incorporating input/output operations
into dynamic data-flow graphs . . . . . 1285--1311
Clark F. Olson Parallel algorithms for hierarchical
clustering . . . . . . . . . . . . . . . 1313--1325
Tom Altman and
Yoshihide Igarashi and
Koji Obokata Hyper-ring connection machines . . . . . 1327--1338
J. P. Geschiere and
H. A. G. Wijshoff Exploiting large grain parallelism in a
sparse direct linear system solver . . . 1339--1364
G. Casciola and
S. Morigi Graphics in parallel computation for
rendering $3$D modelled scenes . . . . . 1365--1382
Jaeyoung Choi and
Jack J. Dongarra and
David W. Walker Parallel matrix transpose algorithms on
distributed memory concurrent computers 1387--1405
Gita Alaghband Parallel sparse matrix solution and
performance . . . . . . . . . . . . . . 1407--1430
Bassem F. Beidas and
George P. Papavassilopoulos Distributed asynchronous algorithms with
stochastic delays for constrained
optimization problems with conditions of
time drift . . . . . . . . . . . . . . . 1431--1450
Fotis Barlos and
Ophir Frieder A load balanced multicomputer relational
database system for highly skewed data 1451--1483
Akiyoshi Wakatani and
Michael Wolfe Optimization of array redistribution for
distributed memory multicomputers . . . 1485--1490
Umpei Nagashima and
Sachiko Hyugaji and
Satoshi Sekiguchi and
Mitsuhisa Sato and
Haruo Hosoya An experience with super-linear speedup
achieved by parallel computing on a
workstation cluster: Parallel
calculation of density of states of
large scale cyclic polyacenes . . . . . 1491--1504
Jesper Larsson Träff An experimental comparison of two
distributed single-source shortest path
algorithms . . . . . . . . . . . . . . . 1505--1532
J. Drake and
I. Foster Guest Editorial: Parallel computing in
climate and weather modeling . . . . . . 1537
J. Drake and
I. Foster Introduction to the special issue on
parallel computing in climate and
weather modeling . . . . . . . . . . . . 1539--1544
James J. Hack and
James M. Rosinski and
David L. Williamson and
Byron A. Boville and
John E. Truesdale Computational design of the NCAR
community climate model . . . . . . . . 1545--1569
John Drake and
Ian Foster and
John Michalakes and
Brian Toonen and
Patrick Worley Design and performance of a scalable
parallel community climate model . . . . 1571--1591
Steven W. Hammond and
Richard D. Loft and
John M. Dennis and
Richard K. Sato Implementation and performance issues of
a massively parallel atmospheric model 1593--1619
S. R. M. Barros and
D. Dent and
L. Isaksen and
G. Robinson and
G. Mozdzynski and
F. Wollenweber The IFS model: a parallel production
weather code . . . . . . . . . . . . . . 1621--1638
J. G. Sela Weather forecasting on parallel
architectures . . . . . . . . . . . . . 1639--1654
M. F. Wehner and
A. A. Mirin and
P. G. Eltgroth and
W. P. Dannevik and
C. R. Mechoso and
J. D. Farrara and
J. A. Spahr Performance of a distributed memory
finite difference atmospheric general
circulation model . . . . . . . . . . . 1655--1675
Philip W. Jones and
Christopher L. Kerr and
Richard S. Hemler Practical considerations in development
of a parallel SKYHI general circulation
model . . . . . . . . . . . . . . . . . 1677--1694
Rainer Bleck and
Sumner Dean and
Matthew O'Keefe and
Aaron Sawdey A comparison of data-parallel and
message-passing versions of the Miami
Isopycnic Coordinate Ocean Model (MICOM) 1695--1720
Y. Nota An efficient parallel discrete PDE
solver . . . . . . . . . . . . . . . . . 1725--1748
Chang Shu and
Hilary Buxton Parallel path planning on the
distributed array processor . . . . . . 1749--1767
Nathan Mattor and
Timothy J. Williams and
Dennis W. Hewett Algorithm for solving tridiagonal matrix
problems in parallel . . . . . . . . . . 1769--1782
Suchendra M. Bhandarkar and
Hamid R. Arabnia The REFINE multiprocessor ---
Theoretical properties and algorithms 1783--1805
Ramachandran Vaidyanathan and
Anand Padmanabhan Short communication: Bus-based networks
for fan-in and uniform hypercube
algorithms . . . . . . . . . . . . . . . 1807--1821
N. Floros and
J. S. Reeve and
J. Clinckemaillie and
S. Vlachoutsis and
G. Lonsdale Comparative efficiencies of domain
decompositions . . . . . . . . . . . . . 1823--1835
Mats Holmström Practical aspects and experiences:
Parallelizing the fast wavelet transform 1837--1848
M. Briscolini A parallel implementation of a $3$-D
pseudospectral based code on the IBM
9076 scalable POWER parallel system . . 1849--1862
T. Dehn and
M. Eiermann and
K. Giebermann and
V. Sperling Structured sparse matrix-vector
multiplication on massively parallel
SIMD architectures . . . . . . . . . . . 1867--1894
PeiZong Z. Lee Techniques for compiling programs on
distributed memory multicomputers . . . 1895--1923
C. S. Yang and
Y. M. Tsai and
S. L. Chi and
Shepherd S. B. Shi Adaptive wormhole routing in $k$-ary
$n$-cubes . . . . . . . . . . . . . . . 1925--1943
J. Błażewicz and
M. Drozdowski Short Communication: Scheduling
divisible jobs on hypercubes . . . . . . 1945--1956
Sergio De Agostino Short communication: a parallel decoding
algorithm for LZ2 data compression . . . 1957--1961
Chandra N. Sekharan and
Vineet Goel and
R. Sridhar Load balancing methods for ray tracing
and binary tree computing using PVM . . 1963--1978
Gerhard Globisch On an automatically parallel generation
technique for tetrahedral meshes . . . . 1979--1995
Murray Dow Transposing a matrix on a vector
computer . . . . . . . . . . . . . . . . 1997--2005
Bruno Lang Parallel reduction of banded matrices to
bidiagonal form . . . . . . . . . . . . 1--18
Francisco Argüello and
Margarita Amor and
Emilio L. Zapata FFTs on mesh connected computers . . . . 19--38
S. A. Savari and
D. P. Bertsekas Finite termination of asynchronous
iterative algorithms . . . . . . . . . . 39--56
E. de Sturler A performance model for Krylov subspace
methods on mesh-based parallel computers 57--74
Himanshu Gupta and
P. Sadayappan Communication-efficient matrix
multiplication on hypercubes . . . . . . 75--99
A. Baronio and
F. Zama A domain decomposition technique for
spline image restoration on distributed
memory systems . . . . . . . . . . . . . 101--110
Donald Dabdub and
John H. Seinfeld Parallel computation in atmospheric
chemical modeling . . . . . . . . . . . 111--130
R. Hempel and
R. Calkin and
R. Hess and
W. Joppich and
C. W. Oosterlee and
H. Ritzdorf and
P. Wypior and
W. Ziegler and
N. Koike and
T. Washio and
U. Keller Real applications on the new parallel
system NEC Cenju-3 . . . . . . . . . . . 131--148
Andreas Uhl Wavelet packet best basis selection on
moderate parallel MIMD architectures . . 149--158
C. S. Ierotheou and
S. P. Johnson and
M. Cross and
P. F. Leggett Computer aided parallelisation tools
(CAPTools) --- conceptual overview and
performance on the parallelisation of
structured mesh codes . . . . . . . . . 163--195
S. P. Johnson and
M. Cross and
M. G. Everett Exploitation of symbolic information in
interprocedural dependence analysis . . 197--226
S. P. Johnson and
C. S. Ierotheou and
M. Cross Automatic parallel code generation for
message passing on distributed memory
systems . . . . . . . . . . . . . . . . 227--258
P. F. Leggett and
A. T. J. Marsh and
S. P. Johnson and
M. Cross Integrating user knowledge with
information from parallelisation tools
to facilitate the automatic generation
of efficient parallel FORTRAN code . . . 259--288
L. Colombet and
Ph. Michallon and
D. Trystram Parallel matrix-vector product on rings
with a minimum of communications . . . . 289--310
Yu-Hua Lee and
Shi-Jinn Horng and
Tzong-Wann Kao and
Ferng-Shi Jaung and
Yuung-Jih Chen and
Horng-Ren Tsai Parallel computation of exact Euclidean
distance transform . . . . . . . . . . . 311--325
Theodore Johnson and
Timothy A. Davis and
Steven M. Hadfield A concurrent dynamic task graph . . . . 327--333
Jingling Xue Transformations of nested loops with
non-convex iteration spaces . . . . . . 339--368
Bruce Boldon and
Narsingh Deo and
Nishit Kumar Minimum-weight degree-constrained
spanning tree problem: Heuristics and
implementation on an SIMD parallel
machine . . . . . . . . . . . . . . . . 369--382
Peter Fiebach Cyclic block-algorithms for solving
triangular systems on distributed-memory
multiprocessors with mesh topology . . . 383--393
Imtiaz Ahmad and
Muhammad K. Dhodhi Multiprocessor scheduling in a genetic
paradigm . . . . . . . . . . . . . . . . 395--406
D. Moncrieff and
R. E. Overill and
S. Wilson Heterogeneous computing machines and
Amdahl's law . . . . . . . . . . . . . . 407--413
Roland Wismüller and
Michael Oberhuber and
Johann Krammer and
Olav Hansen Interactive debugging and performance
analysis of massively parallel
applications . . . . . . . . . . . . . . 415--442
F. Gutbrod and
N. Attig and
M. Weber The SU(2)-Lattice Gauge Theory
simulation code on the Intel Paragon
supercomputer . . . . . . . . . . . . . 443--463
M. M. Shearer Computational optimization of finite
difference methods on the CM5 . . . . . 465--481
Samuel Kortas and
Philippe Angot A practical and portable model of
programming for iterative solvers on
distributed memory machines . . . . . . 487--512
S. Oliveira Parallel multigrid methods for transport
equations: the anisotropic case . . . . 513--537
Markus Hegland Real and complex fast Fourier transforms
on the Fujitsu VPP 500 . . . . . . . . . 539--553
Roni Khardon and
Shlomit S. Pinter Partitioning and scheduling to
counteract overhead . . . . . . . . . . 555--593
Sotirios G. Ziavras and
Arup Mukherjee Data broadcasting and reduction, prefix
computation, and sorting on reduced
hypercube parallel computers . . . . . . 595--606
Lin Chen Partitioning graphs into Hamiltonian
ones . . . . . . . . . . . . . . . . . . 607--618
A. T. Chronopoulos and
C. D. Swanson Parallel iterative ${S}$-step methods
for unsymmetric linear systems . . . . . 623--641
D. Conforti and
L. De Luca and
L. Grandinetti and
R. Musmanno A parallel implementation of automatic
differentiation for partially separable
functions using PVM . . . . . . . . . . 643--656
Y. Trémolet and
F.-X. Le Dimet Parallel algorithms for variational data
assimilation and coupling models . . . . 657--674
Dugki Min and
Matt W. Mutka A model for analyzing interactions in
$2$-D mesh wormhole-routed
multicomputers . . . . . . . . . . . . . 675--699
Borut Robič and
Boštjan Vilfan Improved schemes for mapping arbitrary
algorithms onto processor meshes . . . . 701--724
Klaus Stüben and
Hermann Mierendorff and
Clemens-August Thole and
Owen Thomas Industrial parallel computing with real
codes . . . . . . . . . . . . . . . . . 725--737
Umakishore Ramachandran and
Gautam Shah and
S. Ravikumar and
Jeyakumar Muthukumarasamy Scalability study of the KSR-1 . . . . . 739--759
G. Fabbretti and
A. Farina and
D. Laforenza and
F. Vinelli Mapping the synthetic aperture radar
signal processor on a distributed-memory
MIMD architecture . . . . . . . . . . . 761--784
William Gropp and
Ewing Lusk and
Nathan Doss and
Anthony Skjellum High-performance, portable
implementation of the MPI Message
Passing Interface Standard . . . . . . . 789--828
Y. F. Hu and
D. R. Emerson and
R. J. Blake The communication performance of the
Cray T3D and its effect on iterative
solvers . . . . . . . . . . . . . . . . 829--844
M. Chandwani and
N. S. Chaudhari Formulation and analysis of parallel
context-free recognition and parsing on
a PRAM model . . . . . . . . . . . . . . 845--868
Mats Brorsson and
Per Stenström Characterising and modelling shared
memory accesses in multiprocessor
programs . . . . . . . . . . . . . . . . 869--893
Sanjeev R. Rastogi and
Norman J. Wagner A parallel algorithm for Lees-Edwards
boundary conditions . . . . . . . . . . 895--901
Leszek Gąsieniec and
Andrzej Pelc Adaptive broadcasting with faulty nodes 903--912
Zhiwei Xu and
Kai Hwang Early prediction of MPP performance: The
SP2, T3D, and Paragon experiences . . . 917--942
S. Lanteri Parallel solutions of compressible flows
using overlapping and non-overlapping
mesh partitioning strategies . . . . . . 943--968
Mark A. Franklin and
Vasudha Govindan A general matrix iterative model for
dynamic load balancing . . . . . . . . . 969--989
Paraskevi Fragopoulou and
Selim G. Akl Spanning subgraphs with applications to
communication on the multidimensional
torus network . . . . . . . . . . . . . 991--1015
N. Bassiliades and
I. Vlahavas Hierarchical query execution in a
parallel object-oriented database system 1017--1048
M. Surridge and
D. J. Tildesley and
Y. C. Kong and
D. B. Adolf Practical aspects and experiences. A
parallel molecular dynamics simulation
code for dialkyl cationic surfactants 1053--1071
Frank C. Wimberly and
Michael H. Lambert and
Nicholas A. Nystrom and
Alex Ropelewski and
William Young Porting third-party applications
packages to the Cray T3D: Programming
issues and scalability results . . . . . 1073--1089
Josep-Lluis Larriba-Pey and
Juan J. Navarro and
Angel Jorba and
Oriol Roig Review of general and Toeplitz vector
bidiagonal solvers . . . . . . . . . . . 1091--1125 (or 1091--1126??)
Peter K. K. Loh Artificial intelligence search
techniques as fault-tolerant routing
strategies . . . . . . . . . . . . . . . 1127--1147
H. H. ten Cate and
E. A. H. Vollebregt On the portability and efficiency of
parallel algorithms and software . . . . 1149--1163
Ignacio Martín Llorente and
Francisco Tirado and
Luis Vázquez Some aspects about the scalability of
scientific applications on parallel
architectures . . . . . . . . . . . . . 1169--1195
Goran Lj. Djordjević and
Milorad B. Tošić A heuristic for scheduling task graphs
with communication delays onto
multiprocessors . . . . . . . . . . . . 1197--1214
Jerry C. Yan and
Sekhar R. Sarukkai Analyzing parallel program performance
using normalized performance indices and
trace transformation techniques . . . . 1215--1237
Abdel Aziz Farrag New algorithm for constructing
fault-tolerant solutions of the
circulant graph configuration . . . . . 1239--1253 (or 1239--1254??)
C. Calvin Implementation of parallel FFT
algorithms on distributed memory
machines with a minimum overhead of
communication . . . . . . . . . . . . . 1255--1279
Maria Antonietta Pirozzi A fast numerical method for mildly
nonlinear parabolic initial boundary
value problems. II: The parallel
implementation on the Intel Touchstone
Delta system . . . . . . . . . . . . . . 1281--1285
K. A. Gallivan and
B. A. Marsolf and
H. A. G. Wijshoff Solving large nonsymmetric sparse linear
systems using MCSPARSE . . . . . . . . . 1291--1333
S. Hioki Construction of staples in lattice gauge
theory on a parallel computer . . . . . 1335--1344
Rabi N. Mahapatra and
Sudipta Mahapatra Mapping of neural network models onto
two-dimensional processor arrays . . . . 1345--1357
Piyush Maheshwari Improving granularity and locality of
data in multiprocessor execution of
functional programs . . . . . . . . . . 1359--1372
Michèle Dion and
Yves Robert Mapping affine loop nests . . . . . . . 1373--1397
Ingmar Neumann and
Wolfgang Wilhelmi A parallel algorithm for achieving the
Smith normal form of an integer matrix 1399--1412
C. Calvin and
L. Colombet Performance evaluation and modeling of
collective communications on Cray T3D 1413--1427
Yasushi Shinjo and
Yasushi Kiyoki A lightweight process facility
supporting meta-level programming . . . 1429--1454
A. Cichocki and
A. Bargiela Neural networks for solving linear
inequality systems . . . . . . . . . . . 1455--1475
M. Hamdi and
C. K. Lee Dynamic load-balancing of image
processing applications on clusters of
workstations . . . . . . . . . . . . . . 1477--1492
N. P. Kruyt A conjugate gradient method for the
spectral partitioning of graphs . . . . 1493--1502
R. Hess and
W. Joppich A comparison of parallel multigrid and a
fast Fourier transform algorithm for the
solution of the Helmholtz equation in
numerical weather prediction . . . . . . 1503--1512
William Gropp and
Ewing Lusk A high-performance MPI implementation on
a shared-memory vector supercomputer . . 1513--1526
Bodo Heise and
Michael Jung Parallel solvers for nonlinear elliptic
problems based on domain decomposition
ideas . . . . . . . . . . . . . . . . . 1527--1544
Edward Walker and
Gary Morgan and
Bruce Cass and
Zygmunt Ulanowski A note on compiling FORTRAN loop kernels
onto a dataflow architecture . . . . . . 1545--1557
Dominique Barth Parallel matrix product algorithm in the
de Bruijn network using emulation of
meshes of trees . . . . . . . . . . . . 1563--1578
Jong-Uk Kim and
Kyu-Hyun Shim and
Kyu Ho Park A link-disjoint subcube for processor
allocation in hypercube computers . . . 1579--1595
Dale M. Slone and
Garry H. Rodrigue Efficient biased random bit generation
for parallel lattice gas simulations . . 1597--1620
Jingling Xue Unimodular transformations of
non-perfectly nested loops . . . . . . . 1621--1645
David J. Jackson and
Chris W. Humphres A simple yet effective load balancing
extension to the PVM software system . . 1647--1660
S. Mahapatra and
R. N. Mahapatra and
B. N. Chatterji A parallel formulation of
back-propagation learning on distributed
memory multiprocessors . . . . . . . . . 1661--1675
Satoko Sakata and
Umpei Nagashima and
Mitsuhisa Sato and
Satoshi Sekiguchi and
Haruo Hosoya Performance evaluation of a workstation
cluster, TMC CM-5, and Intel Paragon/XP
using a parallel homology analysis
program . . . . . . . . . . . . . . . . 1677--1693
G. Haring and
P. Kacsuk and
G. Kotsis Distributed and parallel systems:
Environments and tools . . . . . . . . . 1699--1701
G. Chiola and
G. Ciaccio Implementing a low cost, low latency
parallel platform . . . . . . . . . . . 1703--1717
F. Bergadano and
A. Giallombardo and
A. Puliafito and
G. Ruffo and
L. Vita Security agents for information
retrieval in distributed systems . . . . 1719--1731
Rushed Kanawati LICRA: a replicated-data management
algorithm for distributed synchronous
group-ware applications . . . . . . . . 1733--1746
Péter Kacsuk and
José C. Cunha and
Gábor Dózsa and
João Lourenço and
Tibor Fadgyas and
Tiago Antão A graphical development and debugging
environment for parallel programs . . . 1747--1770
Gabriele Kotsis A systematic approach for workload
modeling for parallel processing systems 1771--1787
J. Lüthi and
S. Majumdar and
G. Kotsis and
G. Haring Performance bounds for distributed
systems with workload variabilities and
uncertainties . . . . . . . . . . . . . 1789--1806
Tamás Bartha and
Endre Selényi Probabilistic system-level fault
diagnostic algorithms for
multiprocessors . . . . . . . . . . . . 1807--1821
T. Delaitre and
G. R. Ribeiro-Justo and
F. Spies and
S. C. Winter A graphical toolset for simulation
modelling of parallel systems . . . . . 1823--1836
H. Wabnig and
G. Haring PAPS --- a testbed for performance
prediction of parallel applications . . 1837--1851
Péter Kacsuk and
Zsolt Németh and
Zsolt Puskás Tools for mapping, load balancing and
monitoring in the LOGFLOW parallel
Prolog project . . . . . . . . . . . . . 1853--1881
E. Morel and
J. Briat and
J. Chassin de Kergommeaux Cuts and side-effects in distributed
memory OR-parallel Prolog . . . . . . . 1883--1896
Szabolcs Ferenczi Parallel execution of object-oriented
programs: Message handling strategies 1897--1912
László Böszörményi and
Karl-Heinz Eder M3Set --- a language for handling of
distributed and persistent sets of
objects . . . . . . . . . . . . . . . . 1913--1925
Xiaodong Wang and
Vwani P. Roychowdhury and
Pratheep Balasingam Practical aspects and experiences.
Scalable massively parallel algorithms
for computational nanoelectronics . . . 1931--1963
Anthony Theodore Chronopoulos and
Gang Wang Practical aspects and experiences.
Parallel solution of a traffic flow
simulation problem . . . . . . . . . . . 1965--1983
Der-Chyuan Lou and
Chin-Chen Chang A parallel two-list algorithm for the
knapsack problem . . . . . . . . . . . . 1985--1996
M. A. Amer and
B. A. Abdel-Hamida and
D. Fausett Parallel implementation of the Kronecker
product technique for numerical solution
of parabolic partial differential
equations . . . . . . . . . . . . . . . 1997--2005
Edward A. Billard and
Joseph C. Pasquale Load balancing to adjust for proximity
in some network topologies . . . . . . . 2007--2023
D. M. Dhamdhere and
Sridhar R. Iyer and
E. Kishore Kumar Reddy Distributed termination detection for
dynamic systems . . . . . . . . . . . . 2025--2045
Srabani Sen Gupta and
Rajib K. Das and
Krishnendu Mukhopadhyaya and
Bhabani P. Sinha A family of network topologies with
multiple loops and logarithmic diameter 2047--2064
J. Dongarra and
B. Tourancheau Workshop on environments and tools for
parallel scientific computing . . . . . 1
Tony Hey and
Alistair Dunlop and
Emilio Hernández Realistic parallel performance
estimation . . . . . . . . . . . . . . . 5--21
Jesus Labarta and
Sergi Girona and
Toni Cortes Analyzing scheduling policies using
Dimemas . . . . . . . . . . . . . . . . 23--34
Gilles Berger Sabbatel Hardware solutions for efficient
distributed computing on ATM networks 35--48
Jack J. Dongarra and
Sven Hammarling and
David W. Walker Key concepts for parallel out-of-core $L
U$ factorization . . . . . . . . . . . . 49--70
T. Brandes and
S. Chaumette and
M. C. Counilh and
J. Roman and
A. Darte and
F. Desprez and
J. C. Mignot HPFIT: a set of integrated tools for the
parallelization of applications using
High Performance Fortran. Part I: HPFIT
and the TransTOOL environment . . . . . 71--87
T. Brandes and
S. Chaumette and
M. C. Counilh and
J. Roman and
F. Desprez and
J. C. Mignot HPFIT: a set of integrated tools for the
parallelization of applications using
High Performance Fortran. Part II:
Data-structure visualization and HPF
extensions for irregular problems . . . 89--105
Loïc Prylli The CAPDYN environment and its
message-passing library implementation 107--120
Vaidy Sunderam Heterogeneous network computing: The
next generation . . . . . . . . . . . . 121--135
El Mostafa Daoudi and
Abdelhak Lakhouaja Exploiting the symmetry in the
parallelization of the Jacobi method . . 137--151
François Pellegrini Graph partitioning based methods and
tools for scientific computing . . . . . 153--164
Jean-Yves Berthou and
Laurent Colombet Which approach to parallelizing
scientific codes --- That is the
question . . . . . . . . . . . . . . . . 165--179
Karen L. Karavanic and
Jussi Myllymaki and
Miron Livny and
Barton P. Miller Integrated visualization of parallel
program performance data . . . . . . . . 181--198
D. Kranzlmüller and
S. Grabner and
J. Volkert Debugging with the MAD environment . . . 199--217
Bruno Gaujal and
Alain Jean-Marie and
Philippe Mussi and
Gunther Siegel High speed simulation of discrete event
systems by mixing process oriented and
equational approaches . . . . . . . . . 219--233
Laurent Lefèvre Parallel programming on top of DSM
system. An experimental study . . . . . 235--249
Pierre-Yves Calland and
Alain Darte and
Yves Robert and
Frederic Vivien Plugging anti and output dependence
removal techniques into loop
parallelization algorithm . . . . . . . 251--266
Timo Hamalainen and
Harri Klapuri and
Jukka Saarinen and
Kimmo Kaski Mapping of SOM and LVQ algorithms on a
tree shape parallel computer system . . 271--289
Chao-Tung Yang and
Shian-Shyong Tseng and
Cheng-Der Chuang and
Wen-Chung Shih Using knowledge-based techniques on loop
parallelization for parallelizing
compilers . . . . . . . . . . . . . . . 291--309
Yuh-Shyan Chen and
Jang-Ping Sheu Tolerating faults in injured hypercubes
using maximal fault-free subcube-ring 311--331
Plamen Y. Yalamov Stability of a partitioning algorithm
for bidiagonal systems . . . . . . . . . 333--348
Sung Kwon Kim Rectangulating rectilinear polygons in
parallel . . . . . . . . . . . . . . . . 349--367
C. K. Yuen Parallel programming --- a critique . . 369--380
A. Basermann and
B. Reichel and
C. Schelthoff Preconditioned CG methods for sparse
matrices on massively parallel machines 381--398
David E. Womble and
David S. Greenberg Parallel I/O: an introduction . . . . . 403--417
Ethan L. Miller and
Randy H. Katz RAMA: An easy-to-use, high-performance
parallel file system . . . . . . . . . . 419--446
Nils Nieuwejaar and
David Kotz The Galley parallel file system . . . . 447--476
Jason A. Moore and
Michael J. Quinn Enhancing disk-directed I/O for
fine-grained redistribution of file data 477--499
Eric J. Schwabe and
Ian M. Sutherland and
Bruce K. Holmer Evaluating approximately balanced
parity-declustered data layouts for disk
arrays . . . . . . . . . . . . . . . . . 501--523
J. Carretero and
F. Pérez and
P. de Miguel and
F. García and
L. Alonso Performance increase mechanisms for
parallel and distributed file systems 525--542
Ian Parsons and
Ron Unrau and
Jonathan Schaeffer and
Duane Szafron PI/OT: Parallel I/O templates . . . . . 543--570
Thomas H. Cormen and
Melissa Hirschl Early experiences in evaluating the
parallel disk model with the ViC*
implementation . . . . . . . . . . . . . 571--600
Rakesh D. Barve and
Edward F. Grove and
Jeffrey Scott Vitter Simple randomized mergesort on parallel
disks . . . . . . . . . . . . . . . . . 601--631
M. Pakzad and
J. L. Lloyd and
C. Phillips Independent columns: a new parallel ILU
preconditioner for the PCG method . . . 637--647 (or 637--648??)
Mohan K. Kadalbajoo and
A. Appaji Rao Parallel group explicit method for
two-dimensional parabolic equations . . 649--666
J. Lopez and
O. Plata and
F. Arguello and
E. L. Zapata Unified framework for the
parallelization of divide and conquer
based tridiagonal systems . . . . . . . 667--686
Sergei Gorlatch $N$-graphs: Scalable topology and design
of balanced divide-and-conquer
algorithms . . . . . . . . . . . . . . . 687--698
M. Cermele and
M. Colajanni Non-uniform and dynamic domain
decompositions for hypercomputing . . . 699--720
Roman Trobec and
Izidor Jerebic Local diagnosis in massively parallel
systems . . . . . . . . . . . . . . . . 721--731
G. Mitra and
I. Hai and
M. T. Hajian A distributed processing algorithm for
solving integer programs using a cluster
of workstations . . . . . . . . . . . . 733--753
Jiahong Wang and
Jie Li and
Hisao Kameda Simulation studies on concurrency
control in parallel transaction
processing systems . . . . . . . . . . . 755--775
Neeraj K. Sharma and
Madhusudhana R. Pinnu An efficient implementation of bypass
queue under bursty traffic . . . . . . . 777--781
Ishfaq Ahmad Express versus PVM: a performance
comparison . . . . . . . . . . . . . . . 783--812
Anonymous Miscellaneous: Calendar of forthcoming
conferences and events . . . . . . . . . 813
A. Chalmers and
F. W. Jansen Parallel graphics and visualisation . . 817
Thomas W. Crockett An introduction to parallel rendering 819--843
Alan Heirich and
James Arvo Scalable Monte Carlo image synthesis . . 845--859
Hyeon-Ju Yoon and
Seongbae Eun and
Jung Wan Cho Image parallel ray tracing using static
load balancing and data prefetching . . 861--872
Erik Reinhard and
Frederik W. Jansen Rendering large scenes using parallel
ray tracing . . . . . . . . . . . . . . 873--885
Bruno Arnaldi and
Thierry Priol and
Luc Renambot and
Xavier Pueyo Visibility masks for solving complex
radiosity computations on
multiprocessors . . . . . . . . . . . . 887--897
Christophe Renaud and
François Rousselle Fast massively parallel progressive
radiosity on the MP-1 . . . . . . . . . 899--913
Anton H. J. Koning and
Karel J. Zuiderveld and
Max A. Viergever Volume visualization on shared memory
architectures . . . . . . . . . . . . . 915--925
Rüdiger Westermann and
Thomas Ertl Distributed volume visualization: a step
towards integrated data analysis and
image synthesis . . . . . . . . . . . . 927--941
Cemal Köse and
Alan Chalmers Profiling for efficient parallel volume
visualisation . . . . . . . . . . . . . 943--952
David C. Banks Screen-parallel determination of
intersection curves . . . . . . . . . . 953--960
Michael Krogh and
James Painter and
Charles Hansen Parallel sphere rendering . . . . . . . 961--974
Malte Zöckler and
Detlev Stalling and
Hans-Christian Hege Parallel line integral convolution . . . 975--989
Shaun Bangay and
James Gain and
Greg Watkins and
Kevan Watkins Building the second generation of
parallel/distributed virtual reality
systems . . . . . . . . . . . . . . . . 991--1000
Guangye Li A block variant of the GMRES method on
massively parallel processors . . . . . 1005--1019
P. Beraldi and
F. Guerriero Parallel asynchronous implementation of
the $\epsilon$-relaxation method for the
linear minimum cost flow problem . . . . 1021--1044
Padma Raghavan Parallel ordering using edge contraction 1045--1067
Soren S. Nielsen and
Stavros A. Zenios Scalable parallel Benders decomposition
for stochastic linear programming . . . 1069--1088
Ajit Singh and
Vincent Van Dongen An integrated performance analysis tool
for SPMD data-parallel programs . . . . 1089--1112
Svetozara Petrova Parallel implementation of fast elliptic
solver . . . . . . . . . . . . . . . . . 1113--1128
S. Chandra Sekhara Rao Existence and uniqueness of WZ
factorization . . . . . . . . . . . . . 1129--1139
Xin Wang and
Edward K. Blum and
D. Stott Parker and
Daniel Massey The dance party problem and its
application to collective communication
in computer networks . . . . . . . . . . 1141--1156
D. C. Hodgson and
P. K. Jimack A domain decomposition preconditioner
for a parallel finite element solver on
distributed unstructured grids . . . . . 1157--1181
Mouloud Oussaidène and
Bastien Chopard and
Olivier V. Pictet and
Marco Tomassini Parallel genetic programming and its
application to trading model induction 1183--1198
Marco D'Apuzzo and
Marco Lapegna and
Almerico Murli Scalability and load balancing in
adaptive algorithms for multidimensional
integration . . . . . . . . . . . . . . 1199--1210
Michael Eldredge and
Thomas J. R. Hughes and
Robert M. Ferencz and
Steven M. Rifai and
Arthur Raefsky and
Bruce Herndon High-performance parallel computing in
industry . . . . . . . . . . . . . . . . 1217--1233
V. Kalro and
T. Tezduyar Parallel $3$D computation of unsteady
flows around circular cylinders . . . . 1235--1248
Y. Matsumoto and
T. Tokumasu Parallel computing of diatomic molecular
rarefied gas flows . . . . . . . . . . . 1249--1260
L. Paglieri and
D. Ambrosi and
L. Formaggia and
A. Quarteroni and
A. L. Scheinine Parallel computation for shallow water
flow: a domain decomposition approach 1261--1277
S. E. Ray and
G. P. Wren and
T. E. Tezduyar Parallel implementations of a finite
element formulation for fluid-structure
interactions in interior flows . . . . . 1279--1292
N. Satofuka and
M. Obata and
T. Suzuki Parallel computation of
super-/hypersonic flows on workstation
network and Transputer arrays . . . . . 1293--1305
John Shadid and
Scott Hutchinson and
Gary Hennigan and
Harry Moffat and
Karen Devine and
A. G. Salinger Efficient parallel computation of
unstructured finite element reacting
flow solutions . . . . . . . . . . . . . 1307--1325
M. S. Shephard and
J. E. Flaherty and
C. L. Bottasso and
H. L. de Cougny and
C. Ozturan and
M. L. Simone Parallel automatic adaptive analysis . . 1327--1347
T. Tezduyar and
V. Kalro and
W. Garrard Parallel computational methods for $3$D
simulation of a parafoil with prescribed
shape changes . . . . . . . . . . . . . 1349--1363
Genki Yagawa and
Yasushi Nakabayashi and
Hiroshi Okuda Large-scale finite element fluid
analysis by massively parallel
processors . . . . . . . . . . . . . . . 1365--1377
Andrew Yeckel and
Jeffrey J. Derby Parallel computation of incompressible
flows in materials processing: Numerical
experiments in diagonal preconditioning 1379--1400
Mark J. Clement and
Michael J. Quinn Automated performance prediction for
scalable parallel computing . . . . . . 1405--1420
P. Arbenz and
W. Gander and
M. Oettli The remote computation system . . . . . 1421--1428
W. J. Gutjahr and
M. Hitz and
T. A. Mueck Task assignment in Cayley
interconnection topologies . . . . . . . 1429--1460
Aiichiro Nakano and
Timothy Campbell Adaptive curvilinear-coordinate approach
to dynamic load balancing of parallel
multiresolution molecular dynamics . . . 1461--1478
Fabio Ancona and
Stefano Rovetta and
Rodolfo Zunino Transputer-based implementation of
distributed associative memories . . . . 1479--1491
E. W. Evans and
S. P. Johnson and
P. F. Leggett and
M. Cross Automatic code generation of overlapped
communications in a parallelisation tool 1493--1523
X. Yuan and
C. Salisbury and
D. Balsara and
R. Melhem Load balancing package on distributed
memory systems and its application to
particle-particle particle-mesh (P3M)
methods . . . . . . . . . . . . . . . . 1525--1544
M. S. Bebbington Parallel implementation of an
aggregation/disaggregation method for
evaluating quasi-stationary behavior in
continuous-time Markov chains . . . . . 1545--1559
M. Kutrib and
R. Vollmar and
Th. Worsch Introduction to the special issue on
cellular automata . . . . . . . . . . . 1567--1576
J.-P. Allouche and
F. v. Haeseler and
E. Lange and
A. Petersen and
G. Skordev Linear cellular automata and automatic
sequences . . . . . . . . . . . . . . . 1577--1592
G. Cattaneo and
E. Formenti and
L. Margara and
G. Mauri Transformations of the one-dimensional
cellular automata rule space . . . . . . 1593--1611
Klaus Sutner Linear cellular automata and Fischer
automata . . . . . . . . . . . . . . . . 1613--1634
Mario Markus and
Tomas Hahn and
Ingo Kusch A novel quantification of cellular
automata . . . . . . . . . . . . . . . . 1635--1642
Thomas Buchholz and
Martin Kutrib Some relations between massively
parallel arrays . . . . . . . . . . . . 1643--1662
Olivier Heen Efficient constant speed-up for one
dimensional cellular automata
calculators . . . . . . . . . . . . . . 1663--1671
Paola Flocchini and
Frédéric Geurts and
Nicola Santoro CA-like error propagation in fuzzy CA 1673--1682
Thomas Worsch On parallel Turing machines with
multi-head control units . . . . . . . . 1683--1697
Jörg R. Weimar Cellular automata for
reaction--diffusion systems . . . . . . 1699--1715
Divyesh Jadav and
Chutimet Srinilta and
Alok Choudhary Batching and dynamic allocation
techniques for increasing the stream
capacity of an on-demand media server 1727--1742
Jinsung Cho and
Heonshik Shin Scheduling video streams in a
large-scale video-on-demand server . . . 1743--1755
Valentin Rottmann and
Petra Berenbrink and
Reinhard Luling Simple distributed scheduling policy for
parallel interactive continuous media
servers . . . . . . . . . . . . . . . . 1757--1776
Constantin Arapis and
Simon Gibbs and
Christian Breiteneder Real-time segmentation of video on a
multiprocessor platform . . . . . . . . 1777--1792
John A. Watlington and
V. Michael Bove, Jr. A system for parallel media processing 1793--1809
Eddy De Greef and
Francky Catthoor and
Hugo De Man Memory size reduction through storage
order optimization for embedded parallel
multimedia applications . . . . . . . . 1811--1837 (or 1811--1838??)
Wei Li and
Xiaohu Huang and
Nanning Zheng Parallel implementing OpenGL on PVM . . 1839--1850
Abdelsalam Heddaya and
Kihong Park Congestion control for asynchronous
parallel computing on workstation
networks . . . . . . . . . . . . . . . . 1855--1875
P. S. Rao and
G. Mouney Data communication in parallel block
predictor-corrector methods for solving
ODE's . . . . . . . . . . . . . . . . . 1877--1888
Weifa Liang and
Xiaojun Shen Finding the $k$ most vital edges in the
minimum spanning tree problem . . . . . 1889--1907
Yih Huang and
Philip K. McKinley Adaptive global reduction algorithm for
wormhole-routed 2D meshes . . . . . . . 1909--1936
Seong-Pyo Kim and
Taisook Han Fault-tolerant wormhole routing in mesh
with overlapped solid fault regions . . 1937--1962
M.-Tahar Kechadi and
J.-Luc Dekeyser Analysis and simulation of an
out-of-order execution model in vector
multiprocessor systems . . . . . . . . . 1963--1986
Hong Shen Optimal parallel multiselection on EREW
PRAM . . . . . . . . . . . . . . . . . . 1987--1992
Tong-Yee Lee Exploitation of image parallelism for
ray tracing $3$D scenes on $2$D mesh
multicomputers . . . . . . . . . . . . . 1993--2015
Jarmo Rantakokko Strategies for parallel variational data
assimilation . . . . . . . . . . . . . . 2017--2039
Michael A. Lambert and
Garry H. Rodrigue and
Dennis W. Hewett Parallel DSDADI method for solution of
the steady state diffusion equation . . 2041--2065
Ç. K. Koç Parallel $p$-adic method for solving
linear systems of equations . . . . . . 2067--2074
Chunguang Sun Parallel solution of sparse linear least
squares problems on distributed-memory
multiprocessors . . . . . . . . . . . . 2075--2093
Daniela di Serafino Parallel implementation of a multigrid
multiblock Euler solver on distributed
memory machines . . . . . . . . . . . . 2095--2113
R. E. Overill and
S. Wilson Data parallel evaluation of univariate
polynomials by the Knuth-Eve algorithm 2115--2127
C. Baillie and
J. Michalakes and
R. Skålin Regional weather modeling on parallel
computers . . . . . . . . . . . . . . . 2135--2142
S. J. Thomas and
A. V. Malevsky and
M. Desgagne and
R. Benoit and
P. Pellerin and
M. Valin Massively parallel implementation of the
mesoscale compressible community model 2143--2160
R. Skålin and
D. Bjørge Implementation and performance of a
parallel version of the HIRLAM limited
area atmospheric model . . . . . . . . . 2161--2172
J. Michalakes MM90: a scalable parallel implementation
of the Penn State/NCAR Mesoscale Model
(MM5) . . . . . . . . . . . . . . . . . 2173--2186
Donald Dabdub and
Rajit Manohar Performance and portability of an air
quality model . . . . . . . . . . . . . 2187--2200
M. Ashworth and
F. Foelkel and
V. Gülzow and
K. Kleese and
D. P. Eppel and
H. Kapitza and
S. Unger Parallelization of the GESIMA mesoscale
atmospheric model . . . . . . . . . . . 2201--2213
Ulrich Schättler and
Elisabeth Krenzien Parallel `Deutschland-Modell' --- a
message-passing version for distributed
memory computers . . . . . . . . . . . . 2215--2226
Alan J. Wallcraft and
Daniel R. Moore The NRL layered ocean model . . . . . . 2227--2242
A. Sathye and
M. Xue and
G. Bassett and
K. Droegemeier Parallel weather modeling with the
advanced regional prediction system . . 2243--2256
Thomas H. Cormen and
David M. Nicol Performing out-of-core FFTs on parallel
disk systems . . . . . . . . . . . . . . 5--20
Peter Triantafillou and
Christos Faloutsos Overlay striping and optimal parallel
I/O for modern applications . . . . . . 21--43
Daniel A. Ford and
Robert J. T. Morris and
Alan E. Bell Redundant arrays of independent
libraries (RAIL): the StarFish tertiary
storage system . . . . . . . . . . . . . 45--64
Carter T. Shock and
Chialin Chang and
Bongki Moon and
Anurag Acharya and
Larry Davis and
Joel Saltz and
Alan Sussman Design and evaluation of a
high-performance earth science database 65--89
Shahram Ghandeharizadeh and
Richard Muntz Design and implementation of scalable
continuous media servers . . . . . . . . 91--122
Leana Golubchik and
John C. S. Lui and
Maria Papadopouli Survey of approaches to fault tolerant
design of VOD servers: techniques,
analysis and comparison . . . . . . . . 123--155
Ann L. Chervenak Challenges for tertiary storage in
multimedia servers . . . . . . . . . . . 157--176
Manu Konchady and
Arun Sood and
Paul S. Schopf Implementation and performance
evaluation of a parallel ocean model . . 181--203
Kangwoo Lee and
Michel Dubois Empirical models of miss rates . . . . . 205--219
Luis Díaz de Cerio and
Miguel Valero-García and
Antonio González Method for exploiting
communication/computation overlap in
hypercubes . . . . . . . . . . . . . . . 221--245
Michael E. Houle and
Gavin Turner Dimension-exchange token distribution on
the mesh and the torus . . . . . . . . . 247--265
Jelena Mišić Unicast-based multicast algorithm in
wormhole-routed star graph
interconnection networks . . . . . . . . 267--286
K. Sumiyoshi and
T. Ebisuzaki Performance of parallel solution of a
block-tridiagonal linear system on
Fujitsu VPP500 . . . . . . . . . . . . . 287--304
S. V. Kuznetsov Orthogonal reduction of dense matrices
to bidiagonal form on computers with
distributed memory architectures . . . . 305--313
Piyush Mehrotra and
John Van Rosendale and
Hans Zima High Performance Fortran: History,
status and future . . . . . . . . . . . 325--354
Henk J. Sips and
Will Denissen and
Kees van Reeuwijk Analysis of local enumeration and
storage schemes in HPF . . . . . . . . . 355--382
Michael Gerndt High-level programming of massively
parallel computers based on shared
virtual memory . . . . . . . . . . . . . 383--400
Brian Armstrong and
Seon Wook Kim and
Insung Park and
Michael Voss and
Rudolf Eigenmann Compiler-based tools for analyzing
parallel programs . . . . . . . . . . . 401--420
Pierre Boulet and
Alain Darte and
Georges-André Silber and
Frédéric Vivien Loop parallelization algorithms: From
parallelism extraction to code
generation . . . . . . . . . . . . . . . 421--444
Amy W. Lim and
Monica S. Lam Maximizing parallelism and minimizing
synchronization with affine partitions 445--475
Trung N. Nguyen and
Zhiyuan Li Interprocedural analysis for loop
scheduling and data allocation . . . . . 477--504
Wolfram Amme and
Eberhard Zehendner Data dependence analysis in programs
with pointers . . . . . . . . . . . . . 505--525
Lawrence Rauchwerger Run-time parallelization: Its time has
come . . . . . . . . . . . . . . . . . . 527--556
Eduard Ayguadé and
Jordi Garcia and
Ulrich Kremer Tools and techniques for automatic data
layout: a case study . . . . . . . . . . 557--578
Hironori Kasahara and
Akimasa Yoshida A data-localization compilation scheme
using partial-static task assignment for
Fortran coarse-grain parallel processing 579--596
M. Kandemir and
A. Choudhary and
J. Ramanujam and
R. Bordawekar Compilation techniques for out-of-core
parallel computations . . . . . . . . . 597--628
B. Creusillet and
F. Irigoin Interprocedural analyses of Fortran
programs . . . . . . . . . . . . . . . . 629--648
Vincent Lefebvre and
Paul Feautrier Automatic storage management for
parallel programs . . . . . . . . . . . 649--671
A. Averbuch and
L. Ioffe and
M. Israeli and
L. Vozovoi Two-dimensional parallel solver for the
solution of Navier--Stokes equations
with constant and variable coefficients
using ADI on cells . . . . . . . . . . . 673--699
C. Ceron and
J. Dopazo and
E. L. Zapata and
J. M. Carazo and
O. Trelles Parallel implementation of DNAml program
on message-passing architectures . . . . 701--716
P. Fisette and
J. M. Péterkenne Contribution to parallel and vector
computation in multibody dynamics . . . 717--728
A. A. Mirin and
D. E. Shumaker and
M. F. Wehner Efficient filtering techniques for
finite-difference atmospheric general
circulation models on parallel
processors . . . . . . . . . . . . . . . 729--740
R. Aversa and
A. Mazzeo and
N. Mazzocca and
U. Villano Developing applications for
heterogeneous computing environments
using simulation: a case study . . . . . 741--761
Mostafa M. Aref and
Mohammed A. Tayyib Lana-Match algorithm: a parallel version
of the Rete-Match algorithm . . . . . . 763--775
D. J. Evans and
M. Barulli BSP linear solvers for dense matrices 777--795
Ananth Grama and
Vipin Kumar and
Ahmed Sameh Scalable parallel formulations of the
Barnes--Hut method for $n$-body
simulations . . . . . . . . . . . . . . 797--822
Zdeněk Hanzálek A parallel algorithm for gradient
training of feedforward neural networks 823--839
Alain Jean-Marie and
Sophie Lefebvre-Barbaroux and
Zhen Liu An analytical approach to the
performance evaluation of master--slave
computational models . . . . . . . . . . 841--862
Zhen Liu Worst-case analysis of scheduling
heuristics of parallel systems . . . . . 863--891
Piyush Maheshwari and
Hong Shen An efficient clustering algorithm for
partitioning parallel programs . . . . . 893--909
M. Marrocu and
R. Scardovelli and
P. Malguzzi Parallelization and performance of a
meteorological limited area model . . . 911--922
Michael Mascagni Parallel linear congruential generators
with prime moduli . . . . . . . . . . . 923--936
Tz. Ostromsky and
P. C. Hansen and
Z. Zlatev A coarse-grained parallel
$QR$-factorization algorithm for sparse
least squares problems . . . . . . . . . 937--964
Sung Kwon Kim Constant-time RMESH algorithms for the
range minima and co-minima problems . . 965--977
F. Arbab and
P. Ciancarini and
C. Hankin Coordination languages for parallel
programming . . . . . . . . . . . . . . 989--1004
Nicholas Carriero An implementation of Linda for a NUMA
machine . . . . . . . . . . . . . . . . 1005--1021
Michel R. V. Chaudron and
Arno C. N. van Duin The formal derivation of parallel
triangular system solvers using a
coordination-based design method . . . . 1023--1046
Lorenzo Donatiello and
Alessandro Fabbri Generative coordination environments
supporting parallel discrete event
simulation . . . . . . . . . . . . . . . 1047--1080
Kees Everaars and
Barry Koren Using coordination to parallelize
sparse-grid methods for $3$-D CFD
problems . . . . . . . . . . . . . . . . 1081--1106
Tom Holvoet and
Thilo Kielmann Behaviour specification of parallel
active objects . . . . . . . . . . . . . 1107--1135
George A. Papadopoulos Distributed and parallel systems
engineering in MANIFOLD . . . . . . . . 1137--1160
Bouchaib Radi and
Jean-François Estrade Adaptive parallelization techniques in
global weather models . . . . . . . . . 1167--1175
Suchendra M. Bhandarkar and
Salem Machaka and
Sridhar Chirravuri and
Jonathan Arnold Parallel computing for chromosome
reconstruction via ordering of DNA
sequences . . . . . . . . . . . . . . . 1177--1204
O. Benkahla and
C. Aktouf and
C. Robach Performance evaluation of distributed
diagnosis algorithms in parallel systems 1205--1222
Elise de Doncker and
Ajay Gupta Multivariate integration on hypercubic
and mesh networks . . . . . . . . . . . 1223--1244
Qian-Ping Gu and
Shietung Peng Node-to-set and set-to-set cluster fault
tolerant routing in hypercubes . . . . . 1245--1261
Shahram Latifi and
Pradip K. Srimani Wormhole broadcast in star graph
networks . . . . . . . . . . . . . . . . 1263--1276
Mahlon Stacy and
Dennis Hanson and
Jon Camp and
Richard A. Robb High performance computing in biomedical
imaging research . . . . . . . . . . . . 1287--1321
Robert L. Galloway and
W. Andrew Bass and
Christopher E. Hockey Task-oriented asymmetric multiprocessing
for interactive image-guided surgery . . 1323--1343
Simon K. Warfield and
Ferenc A. Jolesz and
Ron Kikinis A high performance computing approach to
the registration of medical imaging data 1345--1368
Gary E. Christensen MIMD vs. SIMD parallel processing: a
case study in $3$D medical image
registration . . . . . . . . . . . . . . 1369--1383
Craig M. Wittenbrink Extensions to permutation warping for
parallel volume rendering . . . . . . . 1385--1406
Chris Basoglu and
Ravi Managuli and
George York and
Yongmin Kim Computing requirements of modern medical
diagnostic ultrasound machines . . . . . 1407--1431
Paul Schimpf and
Jens Haueisen and
Ceon Ramon and
Hannes Nowak Realistic computer modelling of electric
and magnetic fields of human head and
torso . . . . . . . . . . . . . . . . . 1433--1460
C. Laurent and
F. Peyrin and
J-M Chassery and
M. Amiel Parallel image reconstruction on MIMD
computers for three-dimensional
cone-beam tomography . . . . . . . . . . 1461--1479
Jens Gregor and
Dean A. Huff A computational study of the
focus-of-attention EM-ML algorithm for
PET reconstruction . . . . . . . . . . . 1481--1497
Chung-Ming Chen An efficient four-connected parallel
system for PET image reconstruction . . 1499--1522
Habib Zaidi and
Claire Labbé and
Christian Morel Implementation of an environment for
Monte Carlo simulation of fully $3$-D
positron tomography on a
high-performance parallel platform . . . 1523--1536
Bjorn De Sutter and
Mark Christiaens and
Koen De Bosschere and
Jan Van Campenhout On the use of subword parallelism in
medical image processing . . . . . . . . 1537--1556
Yuan-Ping Pang and
Stephen Brimijoin Supercomputing-based dimeric analog
approach for drug optimization . . . . . 1557--1566
Todd E. Scheetz and
Terry A. Braun and
Kyle J. Munn and
Edwin M. Stone and
Val C. Sheffield and
Thomas L. Casavant GenoMap: a distributed system for
unifying genotyping and genetic linkage
analysis . . . . . . . . . . . . . . . . 1567--1592
Craig Chase and
Prakash Arunachalam and
Jacob Abraham Memory distribution: Techniques and
practice for CAD applications . . . . . 1597--1615
Jih-H. Chen and
Shu-Yun Le and
Bruce A. Shapiro and
Jacob V. Maizel Optimization of an RNA folding algorithm
for parallel architectures . . . . . . . 1617--1634
Paul Caprioli and
Mark H. Holmes A parallel quasi-Newton method for
Gaussian data fitting . . . . . . . . . 1635--1651
E. Bampis and
C. Delorme and
J.-C. König Optimal schedules for $d-D$ grid graphs
with communication delays . . . . . . . 1653--1664
Cyril Fonlupt and
Philippe Marquet and
Jean-Luc Dekeyser Data-parallel load balancing strategies 1665--1684
G. Haase Parallel incomplete Cholesky
preconditioners based on the
non-overlapping data distribution . . . 1685--1703
Greg Eisenhauer and
Beth Plale and
Karsten Schwan DataExchange: high performance
communications in distributed
laboratories . . . . . . . . . . . . . . 1713--1733
Ian Foster and
Jonathan Geisler and
William Gropp and
Nicholas Karonis and
Ewing Lusk and
George Thiruvathukal and
Steven Tuecke Wide-area implementation of the Message
Passing Interface . . . . . . . . . . . 1735--1749
Matthias Brune and
Jorn Gehring and
Axel Keller and
Burkhard Monien and
Friedhelm Ramme and
Alexander Reinefeld Specifying resources and services in
metacomputing environments . . . . . . . 1751--1776
Henri Casanova and
Jack Dongarra Using agent-based software for
scientific computing in the NetSolve
system . . . . . . . . . . . . . . . . . 1777--1790
Roy Williams and
Bruce Sears A high-performance active digital
library . . . . . . . . . . . . . . . . 1791--1806
A. W. van Halderen and
B. J. Overeinder and
P. M. A. Sloot and
R. van Dantzig and
D. H. J. Epema and
M. Livny Hierarchical resource management in the
Polder Metacomputing Initiative . . . . 1807--1825
Timothy J. Sheehan and
William A. Shelton and
Thomas J. Pratt and
Philip M. Papadopoulos and
Philip LoCascio and
Thomas H. Dunigan The locally self-consistent multiple
scattering code in a geographically
distributed linked MPP environment . . . 1827--1846
Th. Eickermann and
J. Henrichs and
M. Resch and
R. Stoy and
R. Volpel Metacomputing in gigabit environments:
networks, tools, and applications . . . 1847--1872
Sharon Brunett and
Thomas Gottschalk A large-scale metacomputing framework
for the ModSAF real-time simulation . . 1873--1900
K. Mani Chandy and
Joseph Kiniry and
Adam Rifkin and
Daniel Zimmerman A framework for structured distributed
object computing . . . . . . . . . . . . 1901--1922
C. Vuik and
R. R. P. van Nooyen and
P. Wesseling Parallelism in ILU-preconditioned GMRES 1927--1946
Jonathan M. D. Hill and
Bill McColl and
Dan C. Stefanescu and
Mark W. Goudreau and
Kevin Lang and
Satish B. Rao and
Torsten Suel and
Thanasis Tsantilas and
Rob H. Bisseling BSPlib: The BSP programming library . . 1947--1980
Alina N. Moga and
Bogdan Cramariuc and
Moncef Gabbouj Parallel watershed transformation
algorithms for image segmentation . . . 1981--2001
E. G. Talbi and
Z. Hafidi and
J-M. Geib A parallel adaptive tabu search approach 2003--2019
L. K. Lundin Computing the velocity of a rotating
flow . . . . . . . . . . . . . . . . . . 2021--2034
Ravi Prakash and
Dhabaleswar K. Panda Designing communication strategies for
heterogeneous parallel systems . . . . . 2035--2052
B. Ciciani and
M. Colajanni and
C. Paolucci Performance evaluation of deterministic
wormhole routing in $k$-ary $n$-cubes 2053--2075
Kuo-Pao Fan and
Chung-Ta King Efficient barrier synchronization in
wormhole-routed mesh networks supporting
turn model . . . . . . . . . . . . . . . 2077--2099
Weng-Long Chang and
Chih-Ping Chu The extension of the $I$ test . . . . . 2101--2127
Jos B. T. M. Roerdink and
Michel A. Westenberg Data-parallel tomographic
reconstruction: a comparison of filtered
backprojection and direct Fourier
reconstruction . . . . . . . . . . . . . 2129--2142
Joe Shang-Chieh Wu and
Ying-Dar Lin An efficient and orderly implementation
of bypass queue under bursty traffic . . 2143--2148
Jacques Verriet Scheduling interval-ordered tasks with
non-uniform deadlines subject to
non-zero communication delays . . . . . 3--21
Rolf H. Möhring and
Markus W. Schäffter Scheduling series--parallel orders
subject to $0/1$-communication delays 23--40
Alix Munier Approximation algorithms for scheduling
trees with general communication delays 41--48
A. K. Amoura and
E. Bampis and
Y. Manoussakis and
Zs. Tuza A comparison of heuristics for
scheduling multiprocessor tasks on three
dedicated processors . . . . . . . . . . 49--61
Cristina Boeres and
Vinod E. F. Rebello A versatile cost modelling approach for
multicomputer task scheduling . . . . . 63--86
Jacek Błażewicz and
Maciej Drozdowski and
Mariusz Markiewicz Divisible task scheduling --- Concept
and verification . . . . . . . . . . . . 87--98
Christoph W. Keßler and
Jesper Larsson Träff Language and library support for
practical PRAM programming . . . . . . . 105--135
Horng-Ren Tsai and
Shi-Jinn Horng and
Tzong-Wann Kao and
Shung-Shing Lee and
Shun-Shan Tsai Fundamental data movement operations and
its applications on a hyper-bus
broadcast network . . . . . . . . . . . 137--157
Danny Krizanc and
Anton Saarimaki Bulk synchronous parallel: practical
experience with a model for parallel
computing . . . . . . . . . . . . . . . 159--181
S. W. Chen and
C. Y. Fang and
K. E. Chang Neural simulation of Petri nets . . . . 183--207
Ravi Murty and
Daniel Okunbor Efficient parallel algorithms for
molecular dynamics simulations . . . . . 217--230
Vikramaditya Sen and
Mrinal K. Sen and
Paul L. Stoffa PVM based $3$-D Kirchhoff depth
migration using dynamically computed
travel-times: an application in seismic
data processing . . . . . . . . . . . . 231--248
Mohamed Benmaiza and
Abderezak Touzene One-to-all broadcast algorithm for
constant degree 4 Cayley graphs . . . . 249--264
Cristina Corral and
Isabel Giménez and
José Marín and
José Mas Parallel $m$-step preconditioners for
the conjugate gradient method . . . . . 265--281
Sunil Kim and
Alexander V. Veidenbaum Interconnection network organization and
its impact on performance and cost in
shared memory multiprocessors . . . . . 283--309
J. S. Reeve and
M. Heath An efficient parallel version of the
householder-QL matrix diagonalisation
algorithm . . . . . . . . . . . . . . . 311--319
I. Vlahavas and
P. Kefalas and
C. Halatsis OASys: an AND/OR parallel logic
programming system . . . . . . . . . . . 321--336
Edmund Chadwick A hybrid parallel algorithm for the
spectral transform method which uses
functional parallelism . . . . . . . . . 345--360
T. C. Clune and
J. R. Elliott and
M. S. Miesch and
J. Toomre and
G. A. Glatzmaier Computational aspects of a code to study
rotating turbulent convection in
spherical shells . . . . . . . . . . . . 361--380
Maciej Drozdowski and
Włodzimierz Glazek Scheduling divisible loads in a
three-dimensional mesh of processors . . 381--404
Akihiro Fujiwara and
Michiko Inoue and
Toshimitsu Masuzawa and
Hideo Fujiwara A cost optimal parallel algorithm for
weighted distance transforms . . . . . . 405--416
Y. F. Hu and
R. J. Blake An improved diffusion algorithm for
dynamic load balancing . . . . . . . . . 417--444
Zhiyong Liu and
David W. Cheung Oblivious routing for LC permutations on
hypercubes . . . . . . . . . . . . . . . 445--460
Roseli S. Wedemann and
Valmir C. Barbosa and
Raul Donangelo Defeasible time-stepping . . . . . . . . 461--489
Nicholas Giolmas and
Daniel W. Watson and
David M. Chelberg and
Peter V. Henstock and
June Ho Yi and
Howard Jay Siegel Aspects of computational mode and data
distribution for parallel range image
segmentation . . . . . . . . . . . . . . 499--523
U. W. Rathe and
P. Sanders and
P. L. Knight A case study in scalability: An ADI
method for the two-dimensional
time-dependent Dirac equation . . . . . 525--533
H. Schwichtenberg and
G. Winter and
H. Wallmeier Acceleration of molecular mechanic
simulation by parallelization and fast
multipole techniques . . . . . . . . . . 535--546
Pierre Boulet and
Jack Dongarra and
Yves Robert and
Frédéric Vivien Static tiling for heterogeneous
computing platforms . . . . . . . . . . 547--568
W. Cai and
K. Zhang and
S. J. Turner and
C. Sun Interlock avoidance in transparent and
dynamic parallel program instrumentation
using logical clocks . . . . . . . . . . 569--591
Giuseppe Passoni and
Giancarlo Alfonsi and
Giovanni Tula and
Umberto Cardu A wavenumber parallel computational code
for the numerical integration of the
Navier--Stokes equations . . . . . . . . 593--611
M. Szularz and
J. Weston and
M. Clint Explicitly restarted Lanczos algorithms
in an MPP environment . . . . . . . . . 613--631
Angelo Corana Parallel computation of the correlation
dimension from a time series . . . . . . 639--666
Hermann Mierendorff and
Helmut Schwamborn Automatic model generation for
performance estimation of parallel
programs . . . . . . . . . . . . . . . . 667--680
Zhong-Zhi Bai A class of asynchronous parallel
multisplitting blockwise relaxation
methods . . . . . . . . . . . . . . . . 681--701
S. Ramesh Implementation of communicating reactive
processes . . . . . . . . . . . . . . . 703--727
Reiji Suda and
Akira Nishida and
Yoshio Oyanagi A high performance parallelization
scheme for the Hessenberg double shift
$QR$ algorithm . . . . . . . . . . . . . 729--744
Franco Zambonelli Exploiting biased load information in
direct-neighbour load balancing policies 745--766
R. S. Wedemann and
V. C. Barbosa and
R. Donangelo Erratum to ``Defeasible time-stepping''
[Parallel Computing 25 (4) (April 1999)
pp. 461--489] . . . . . . . . . . . . . 767--767
Anonymous Parallelization techniques for numerical
modelling . . . . . . . . . . . . . . . 775--776
Gerhard Adrian Parallel processing in regional
climatology: The parallel version of the
``Karlsruhe Atmospheric Mesoscale
Model'' (KAMM) . . . . . . . . . . . . . 777--787
Ralf Diekmann and
Andreas Frommer and
Burkhard Monien Efficient schemes for nearest neighbor
load balancing . . . . . . . . . . . . . 789--812
Ralf Ebner and
Christoph Zenger A distributed functional framework for
recursive finite element simulations . . 813--826
Michael Griebel and
Gerhard Zumbusch Parallel multigrid in an adaptive PDE
solver based on hashing and
space-filling curves . . . . . . . . . . 827--843
Bruno Lang Efficient eigenvalue and singular value
computations on shared memory machines 845--860
Ingrid Lenhardt and
Thomas Rottner Krylov subspace methods for structural
finite element analysis . . . . . . . . 861--875
Thomas Lippert Hyper-systolic algorithms for $N$-body
computations and parallel level-$3$ BLAS
libraries . . . . . . . . . . . . . . . 877--891
Wolfgang Mackens and
Heinrich Voss General masters in parallel condensation
of eigenvalue problems . . . . . . . . . 893--903
Reinhard Möller A systolic implementation of the MLEM
reconstruction algorithm for positron
emission tomography images . . . . . . . 905--920
S. J. Dodson and
S. P. Walker and
M. J. Bluck Parallelisation issues for high speed
time domain integral equation analysis 925--942
W.-Y. Lin and
C.-L. Chen Minimum communication cost reordering
for parallel sparse Cholesky
factorization . . . . . . . . . . . . . 943--967
B. Großer and
B. Lang Efficient parallel reduction to
bidiagonal form . . . . . . . . . . . . 969--986
G. S. Brodal Priority queues on parallel machines . . 987--1011
P. Sanders Analysis of nearest neighbor load
balancing algorithms for random loads 1013--1033
D. Barth and
C. Laforest Scattering and multi-scattering in trees
and meshes, with local routing and
without buffering . . . . . . . . . . . 1035--1057
M. E. Barrows and
D. E. Gregory and
L. Gao and
A. L. Rosenberg and
P. R. Cohen An empirical study of dynamic scheduling
on rings of processors . . . . . . . . . 1063--1079
J. Yamamoto and
others Performance evaluation of SNAIL: a
multiprocessor based on the simple
serial synchronized multistage
interconnection network architecture . . 1081--1103
G.-H. Hwang and
J. K. Lee Communication set generations with CSD
calculus and expression-rewriting
framework . . . . . . . . . . . . . . . 1105--1130
A. Clematis and
A. Corana Modeling performance of heterogeneous
parallel computing systems . . . . . . . 1131--1145
E. J. Kontoghiorghes and
M. Clint and
H.-H. Naegeli Recursive least-squares using a hybrid
Householder algorithm on massively
parallel SIMD systems . . . . . . . . . 1147--1159
G. Edjlali and
M. Garbey and
D. Tromeur-Dervout Interoperability parallel programs
approach to simulate $3$D frontal
polymerization processes . . . . . . . . 1161--1191
N. Cabibbo and
Y. Iwasaki and
K. Schilling High performance computing in lattice
QCD . . . . . . . . . . . . . . . . . . 1197--1198
R. Gupta General physics motivations for
numerical simulations of quantum field
theory . . . . . . . . . . . . . . . . . 1199--1215
F. Rapuano Quenched physics on APE computers . . . 1217--1226
Stephan Güsken and
Thomas Lippert and
Klaus Schilling Lattice QCD with two dynamical Wilson
fermions on APE100 parallel systems . . 1227--1242
S. Aoki and
others Performance of lattice QCD programs on
CP-PACS . . . . . . . . . . . . . . . . 1243--1255
Akira Ukawa Lattice QCD results from the CP-PACS
computer . . . . . . . . . . . . . . . . 1257--1280
Robert D. Mawhinney The 1 Teraflops QCDSP computer . . . . . 1281--1296
R. Tripiccione APEmille . . . . . . . . . . . . . . . . 1297--1309
A. D. Kennedy The Hybrid Monte Carlo algorithm on
parallel computers . . . . . . . . . . . 1311--1339
Philippe de Forcrand The MultiBoson method . . . . . . . . . 1341--1355
Th. Lippert Parallel SSOR preconditioning for
lattice QCD . . . . . . . . . . . . . . 1357--1370
Stephan Güsken Stochastic estimator techniques and
their implementation on distributed
parallel computers . . . . . . . . . . . 1371--1381
G. Peter Lepage Improved discretizations for lattice QCD 1383--1393
Robert G. Edwards and
Urs M. Heller and
Rajamani Narayanan Chiral fermions on the lattice . . . . . 1395--1407
V. Annamalai and
C. S. Krishnamoorthy and
V. Kamakoti Adaptive finite element analysis on a
parallel and distributed environment . . 1413--1434
G. Carré and
S. Lanteri and
Mark Loriot High performance simulations of
compressible flows inside car engine
geometries using the N3S-NATUR parallel
solver . . . . . . . . . . . . . . . . . 1435--1458
Myron Ginsberg Influences, challenges, and strategies
for automotive HPC benchmarking and
performance improvement . . . . . . . . 1459--1476
S. Loucif and
M. Ould-Khaoua and
L. M. Mackenzie Analysis of fully adaptive wormhole
routing in tori . . . . . . . . . . . . 1477--1487
Max Geigl and
Martin Griebl and
Christian Lengauer Termination detection in parallel loop
nests with while loops . . . . . . . . . 1489--1510
Erich Strohmaier and
Jack J. Dongarra and
Hans W. Meuer and
Horst D. Simon The marketplace of high-performance
computing . . . . . . . . . . . . . . . 1517--1544
Yoshio Oyanagi Development of supercomputers in Japan:
Hardware and software . . . . . . . . . 1545--1567
Enrico Clementi and
Giorgina Corongiu Early parallelism with a loosely coupled
array of processors: The ICAP experiment 1583--1600
Shunichi Uchida and
Akira Aiba and
Kazuaki Rokusawa and
Takashi Chikayama and
Ryuzo Hasegawa The parallel logic programming system in
the FGCS project and its future
directions . . . . . . . . . . . . . . . 1601--1633
Kisaburo Nakazawa and
Hiroshi Nakamura and
Taisuke Boku and
Ikuo Nakata and
Yoshiyuki Yamashita CP-PACS: a massively parallel processor
at the University of Tsukuba . . . . . . 1635--1661
D. Sugimoto GRAPE: a parallel computer dedicated to
astrophysical many-body problems . . . . 1663--1676
Paolo Cremonesi and
Emilia Rosti and
Giuseppe Serazzi and
Evgenia Smirni Performance evaluation of parallel
systems . . . . . . . . . . . . . . . . 1677--1698
V. S. Sunderam and
G. A. Geist Heterogeneous parallel and distributed
computing . . . . . . . . . . . . . . . 1699--1721
A. P. Willem Böhm and
Jeffrey P. Hammes and
Sumit S. Sur On the performance of pure and impure
parallel functional programs . . . . . . 1723--1740
Rajiv Gupta and
Santosh Pande and
Kleanthis Psarris and
Vivek Sarkar Compilation techniques for parallel
systems . . . . . . . . . . . . . . . . 1741--1783
Siegfried Benkner and
Hans Zima Compiling High Performance Fortran for
distributed-memory architectures . . . . 1785--1825
B. Bacci and
M. Danelutto and
S. Pelagatti and
M. Vanneschi SkIE: a heterogeneous environment for
HPC applications . . . . . . . . . . . . 1827--1852
David E. Womble and
others Massively parallel computing: A Sandia
perspective . . . . . . . . . . . . . . 1853--1876
S. Lakshmivarahan and
Sudarshan K. Dhall Ring, torus and hypercube
architectures/algorithms for parallel
computing . . . . . . . . . . . . . . . 1877--1906
Walid A. Najjar and
Edward A. Lee and
Guang R. Gao Advances in the dataflow computational
model . . . . . . . . . . . . . . . . . 1907--1929
Iain S. Duff and
Henk A. van der Vorst Developments and trends in the parallel
solution of linear systems . . . . . . . 1931--1970
E. L. Zapata and
O. Plata and
R. Asenjo and
G. P. Trabado Data-parallel support for numerical
irregular problems . . . . . . . . . . . 1971--1994
Shun Doi and
Takumi Washio Ordering strategies and related
techniques to overcome the trade-off
between parallelism and convergence in
incomplete factorizations . . . . . . . 1995--2014
Clemens-August Thole and
Klaus Stüben Industrial simulation on parallel
computers . . . . . . . . . . . . . . . 2015--2037
Tayfun Tezduyar and
Yasuo Osawa Methods for parallel computation of
complex flow problems . . . . . . . . . 2039--2066
Richard A. Robb Visualization in biomedical computing 2067--2110
Kenneth C. Bowler and
Anthony J. G. Hey Parallel computing and quantum
chromodynamics . . . . . . . . . . . . . 2111--2134
Hermann Mierendorff and
Wolfgang Joppich Empirical performance modeling for
parallel weather prediction codes . . . 2135--2148
Stavros A. Zenios High-performance computing in finance:
The last 10 years and the next . . . . . 2149--2175
Andreas Reuter Methods for parallel execution of
complex database queries . . . . . . . . 2177--2188
Anonymous Index . . . . . . . . . . . . . . . . . 2189--2196
G. Ch. Pflug and
A. Świętanowski Selected parallel optimization methods
for financial management under
uncertainty . . . . . . . . . . . . . . 3--25
Benoît Bourbeau and
Teodor Gabriel Crainic and
Bernard Gendron Branch-and-bound parallelization
strategies applied to a depot location
and container fleet management problem 27--46
Ricardo C. Corrêa A parallel approximation scheme for the
multiprocessor scheduling problem . . . 47--72
Stella C. S. Porto and
João Paulo F. W. Kitajima and
Celso C. Ribeiro Performance evaluation of a parallel
tabu search task scheduling algorithm 73--90
Michel Toulouse and
Teodor Gabriel Crainic and
K. Thulasiraman Global optimization properties of
parallel cooperative search algorithms:
a simulation study . . . . . . . . . . . 91--112
D. G. Morales and
others Parallel dynamic programming and
automata theory . . . . . . . . . . . . 113--134
M. D. Durand and
Steve R. White Trading accuracy for speed in parallel
simulated annealing with simultaneous
moves . . . . . . . . . . . . . . . . . 135--150
I. Maros and
G. Mitra Investigating the sparse simplex
algorithm on a distributed memory
multiprocessor . . . . . . . . . . . . . 151--170
Mohammed Atiquzzaman and
Pradip K. Srimani Parallel computing on clusters of
workstations . . . . . . . . . . . . . . 175--177
W.-M. Lin and
W. Xie Load-skewing task assignment to minimize
communication conflicts on network of
workstations . . . . . . . . . . . . . . 179--197
Stephen R. Donaldson and
Jonathan M. D. Hill and
David B. Skillicorn BSP clusters: High performance, reliable
and very low cost . . . . . . . . . . . 199--242
Ron Brightwell and
others Massively parallel computing using
commodity components . . . . . . . . . . 243--266
N. Melab and
E.-G. Talbi Parallel adaptive computing on
meta-systems including NOWs . . . . . . 267--284
John C. Chu and
Patrick W. Dowd Adaptive cache coherence over a high
bandwidth broadband mesh network . . . . 285--311
Edward K. Blum and
Xin Wang and
Patrick Leung Architectures and message-passing
algorithms for cluster computing: Design
and performance . . . . . . . . . . . . 313--332
G. Chiola and
G. Ciaccio Efficient parallel processing on
low-cost clusters with GAMMA active
ports . . . . . . . . . . . . . . . . . 333--354
Yung-Lin Liu and
Chung-Ta King EXPLORER: Supporting run-time
parallelization of DOACROSS loops on
general networks of workstations . . . . 355--375
N. Marco and
S. Lanteri A two-level parallelization strategy for
Genetic Algorithms applied to optimum
shape design . . . . . . . . . . . . . . 377--397
Moez Ayed and
Jean-Luc Gaudiot An efficient heuristic for code
partitioning . . . . . . . . . . . . . . 399--426
Peter K. K. Loh and
Wen Jing Hsu The Josephus cube: a novel
interconnection network . . . . . . . . 427--453
Pao-Hwa Sui and
Sheng-De Wang A fault-tolerant routing algorithm for
wormhole routed meshes . . . . . . . . . 455--465
Taesoon Park and
Heon Y. Yeom Application controlled checkpointing
coordination for fault-tolerant
distributed computing systems . . . . . 467--482
Costas S. Iliopoulos and
James F. Reid Optimal parallel analysis and
decomposition of partially occluded
strings . . . . . . . . . . . . . . . . 483--494
A. Bevilacqua and
E. Loli Piccolomini Parallel image restoration on parallel
and distributed computers . . . . . . . 495--506
Erricos John Kontoghiorghes and
Anna Nagurney and
Berç Rustem Parallel computing in economics, finance
and decision-making . . . . . . . . . . 507--509
S. A. MirHassani and
C. Lucas and
G. Mitra and
E. Messina and
C. A. Poojari Computational solution of capacity
planning models under uncertainty . . . 511--538
G. Zanghirati and
F. Cocco and
G. Paruolo and
F. Taddei A Cray T3E implementation of a parallel
stochastic dynamic assets and
liabilities management model . . . . . . 539--567
Cyril Godart Parallel implementation of a two-factor
Cheyette-beta model calibration . . . . 569--586
Rodolphe Chatagny and
Bastien Chopard A parallel model for the foreign
exchange market . . . . . . . . . . . . 587--600
F. O. Bunnin and
Y. Guo and
Y. Ren and
J. Darlington Design of high performance financial
modelling environment . . . . . . . . . 601--622
S. C. Perry and
R. H. Grimwood and
D. J. Kerbyson and
E. Papaefstathiou and
G. R. Nudd Performance optimization of financial
option calculations . . . . . . . . . . 623--639
Jenny X. Li and
Gary L. Mullen Parallel computing of a quasi-Monte
Carlo algorithm for valuing derivatives 641--653
Elias S. Manolakos and
Haris M. Stellakis Systematic synthesis of parallel
architectures for the computation of
higher order cumulants . . . . . . . . . 655--676
E. W. Evans and
S. P. Johnson and
P. F. Leggett and
M. Cross Automatic and effective
multi-dimensional parallelisation of
structured mesh based codes . . . . . . 677--703
R. Keppens and
G. Tóth Using high performance Fortran for
magnetohydrodynamic simulations . . . . 705--722
Keqin Li and
Yi Pan and
Mounir Hamdi Solving graph theory problems using
reconfigurable pipelined optical buses 723--735
Arjen Schoneveld and
Peter M. A. Sloot and
Martin Lees and
Erwan Karyadi A framework for dynamic load balancing:
a case study on explosive containment
simulation . . . . . . . . . . . . . . . 737--751
C. Rodríguez and
J. L. Roda and
F. Sande and
D. G. Morales and
F. Almeida A new parallel model for the analysis of
asynchronous algorithms . . . . . . . . 753--767
Huan-Chao Keh and
Jen-Chih Lin On fault-tolerant embedding of
Hamiltonian cycles, linear arrays and
rings in a Flexible Hypercube . . . . . 769--781
Jan Trdlička and
Pavel Tvrdík Embedding complete $k$-ary trees into
$k$-square $2$D meshes with optimal edge
congestion . . . . . . . . . . . . . . . 783--790
Shijun Diao and
T. Fujiwara Evaluation and strategy of different
data parallel implementation methods of
a stiff chemical non-equilibrium flow
solver . . . . . . . . . . . . . . . . . 791--804
J. G. Liu and
F. H. Y. Chan and
F. K. Lam and
H. F. Li A new approach to fast calculation of
moments of $3$-D gray level images . . . 805--815
Jerzy Leszczynski Computational chemistry . . . . . . . . 817--818
Wanda Andreoni and
Alessandro Curioni New advances in chemistry and materials
science with CPMD and parallel computing 819--842
C. P. Sosa and
G. Scalmani and
R. Gomperts and
M. J. Frisch Ab initio quantum chemistry on a ccNUMA
architecture using OpenMP. III . . . . . 843--856
John D. Watts Parallel algorithms for coupled-cluster
methods . . . . . . . . . . . . . . . . 857--867
Ross H. Nobes and
Alistair P. Rendell and
Jarek Nieplocha Computational chemistry on Fujitsu
vector-parallel processors: Hardware and
programming environment . . . . . . . . 869--886
Alistair P. Rendell and
others Computational chemistry on Fujitsu
vector-parallel processors: Development
and performance of applications software 887--911
Piotr Piecuch and
Joseph I. Landman Parallelization of multi-reference
coupled-cluster method . . . . . . . . . 913--943
David E. Bernholdt Scalability of correlated electronic
structure calculations on parallel
computers: a case study of the RI-MP2
method . . . . . . . . . . . . . . . . . 945--963
Dennis M. Newns and
others Molecular dynamics study of structure
and gating of low molecular weight ion
channels . . . . . . . . . . . . . . . . 965--976
Barry Robson Simplified models of protein folding
exploiting the Lagrange radius of
gyration of the hydrophobic component 977--998
Jacek Komasa and
Jacek Rychlewski Solving quantum-mechanical problems on
parallel systems . . . . . . . . . . . . 999--1009
Jon Baker and
Matt Shirel Ab initio quantum chemistry on PC-based
parallel supercomputers . . . . . . . . 1011--1024
Marc Pavese and
Soonmin Jang and
Gregory A. Voth Centroid molecular dynamics: a quantum
dynamics method suitable for the
parallel computer . . . . . . . . . . . 1025--1041
Leonid Gorb and
Ilya Yanov and
Jerzy Leszczynski High performance computing on the Cray
T3E and IBM SP2 systems with the
parallel version of GAUSSIAN 94 . . . . 1043--1060
Jacek Błażewicz and
Klaus H. Ecker and
Tao Yang New trends on scheduling in parallel and
distributed systems . . . . . . . . . . 1061--1063
Jacques Verriet Scheduling outtrees of height one in the
LogP model . . . . . . . . . . . . . . . 1065--1082
Welf Löwe and
Wolf Zimmermann Scheduling balanced task-graphs to
LogP-machines . . . . . . . . . . . . . 1083--1108
Tomasz Kalinowski and
Iskander Kort and
Denis Trystram List scheduling of general task graphs
under LogP . . . . . . . . . . . . . . . 1109--1128
Chams Lahlou Approximation algorithms for scheduling
with a limited number of communications 1129--1162
Philippe Chrétienne On Graham's bound for cyclic scheduling 1163--1174
Alain Darte On the complexity of loop fusion . . . . 1175--1193
Jacek Błażewicz and
Maciej Drozdowski and
Piotr Formanowicz and
Wiesław Kubiak and
Günter Schmidt Scheduling preemptable tasks on parallel
processors with limited availability . . 1195--1211
Luis Miguel Campos and
Isaac D. Scherson Rate of change load balancing in
distributed and parallel systems . . . . 1213--1230
Alan D. George and
Jeff Markwell and
Ryan Fogarty Real-time sonar beamforming on
high-performance distributed computers 1231--1252
J. Chassin de Kergommeaux and
B. Stein and
P. E. Bernard Pajé, an interactive visualization tool
for tuning multi-threaded parallel
applications . . . . . . . . . . . . . . 1253--1274
Weng-Long Chang and
Chih-Ping Chu The infinity Lambda test: a
multi-dimensional version of Banerjee
infinity test . . . . . . . . . . . . . 1275--1295
David K. Lowenthal and
Vincent W. Freeh Architecture-independent parallelism for
both shared- and distributed-memory
machines using the Filaments package . . 1297--1323
Minyi Guo and
Ikuo Nakata and
Yoshiyuki Yamashita Contention-free communication scheduling
for array redistribution . . . . . . . . 1325--1343
Peter Benner and
Ralph Byers and
Enrique S. Quintana-Ortí and
Gregorio Quintana-Ortí Solving algebraic Riccati equations on
parallel computers using Newton's method
with exact line search . . . . . . . . . 1345--1368
Peiyi Tang and
Jingling Xue Generating efficient tiled code for
distributed memory machines . . . . . . 1369--1410
Sajal K. Das and
M. Cristina Pinotti Parallel priority queues based on
binomial heaps . . . . . . . . . . . . . 1411--1428
Clémentin Tayou Djamégni and
Patrice Quinton and
Sanjay Rajopadhye and
Tanguy Risset Derivation of systolic algorithms for
the algebraic path problem by recurrence
transformations . . . . . . . . . . . . 1429--1445
M. Manzur Murshed and
Richard P. Brent Adaptive $AT^2$ optimal algorithms on
reconfigurable meshes . . . . . . . . . 1447--1458
Tzung-Shi Chen and
Nen-Chung Wang and
Chih-Ping Chu Multicast communication in
wormhole-routed star graph
interconnection networks . . . . . . . . 1459--1490
J. A. Bakker Semantic partitioning as a basis for
parallel I/O in database management
systems . . . . . . . . . . . . . . . . 1491--1513
Rupak Biswas and
Bruce Hendrickson and
George Karypis Graph partitioning and parallel
computing . . . . . . . . . . . . . . . 1515--1517
Bruce Hendrickson and
Tamara G. Kolda Graph partitioning models for parallel
computing . . . . . . . . . . . . . . . 1519--1534
N. Touheed and
P. Selwood and
P. K. Jimack and
M. Berzins A comparison of some dynamic
load-balancing algorithms for a parallel
adaptive flow solver . . . . . . . . . . 1535--1554
Ralf Diekmann and
Robert Preis and
Frank Schlimbach and
Chris Walshaw Shape-optimized mesh partitioning and
load balancing for parallel adaptive FEM 1555--1581
Leonid Oliker and
Rupak Biswas and
Harold N. Gabow Parallel tetrahedral mesh adaptation
with dynamic load balancing . . . . . . 1583--1608
Burkhard Monien and
Robert Preis and
Ralf Diekmann Quality matching and local improvement
for multilevel graph-partitioning . . . 1609--1634
C. Walshaw and
M. Cross Parallel optimisation algorithms for
multilevel mesh partitioning . . . . . . 1635--1660
J. Rantakokko Partitioning strategies for structured
multiblock grids . . . . . . . . . . . . 1661--1680
J. Chassin de Kergommeaux and
P. J. Hatcher and
L. Rauchwerger Parallel computing for irregular
applications . . . . . . . . . . . . . . 1681--1684
Manuel Hermenegildo Parallelizing irregular and
pointer-based computations
automatically: Perspectives from logic
and constraint programming . . . . . . . 1685--1708
E. Gutiérrez and
R. Asenjo and
O. Plata and
E. L. Zapata Automatic parallelization of irregular
applications . . . . . . . . . . . . . . 1709--1738
F. Warren Burton and
David J. Simpson Memory requirements for parallel
programs . . . . . . . . . . . . . . . . 1739--1763
Andras Laszloffy and
Jingping Long and
Abani K. Patra Simple data management, scheduling and
solution strategies for managing the
irregularities in parallel adaptive hp
finite element simulations . . . . . . . 1765--1788
Frédéric Brégier and
Marie-Christine Counilh and
Jean Roman Scheduling loops with partial
loop-carried dependencies . . . . . . . 1789--1806
Thomas Brandes and
Cécile Germain-Renaud A schedule cache for data parallel
unstructured computations . . . . . . . 1807--1823
Thomas Decker Virtual data space --- load balancing
for irregular applications . . . . . . . 1825--1860
Hwansoo Han and
Chau-Wen Tseng Efficient compiler and run-time support
for parallel irregular reductions . . . 1861--1887
P. Beraldi and
L. Grandinetti and
R. Musmanno and
C. Triki Parallel algorithms to solve two-stage
stochastic linear programs with
robustness constraints . . . . . . . . . 1889--1908
C. S. Pua and
M. H. Williams and
D. H. Marwick Modelling parallel databases with
process algebra . . . . . . . . . . . . 1909--1924
Ming-Yang Su and
Hui-Ling Huang and
Gen-Huey Chen and
Dyi-Rong Duh Node-disjoint paths in incomplete
WK-recursive networks . . . . . . . . . 1925--1944
Roman Trobec Two-dimensional regular $d$-meshes . . . 1945--1953
Anonymous Index . . . . . . . . . . . . . . . . . 1955--1962
O. Yaşar and
Y. Deng and
R. E. Tuzun and
D. Saltz New trends in high performance computing 1--2
R. Clint Whaley and
Antoine Petitet and
Jack J. Dongarra Automated empirical optimizations of
software and the ATLAS project . . . . . 3--35
Dinshaw S. Balsara and
Charles D. Norton Highly parallel structured adaptive mesh
refinement using parallel language-based
approaches . . . . . . . . . . . . . . . 37--70
Reginald L. Walker Search engine case study: searching the
Web using genetic programming and MPI 71--89
Yuefan Deng and
Alex Korobka The performance of a supercomputer built
with commodity components . . . . . . . 91--108
Michael D. Letherwood and
David D. Gunter Ground vehicle modeling and simulation
of military vehicles using high
performance computing . . . . . . . . . 109--140
Ting Chen and
Vladimir Filkov and
Steven S. Skiena Identifying gene regulatory networks
from experimental data . . . . . . . . . 141--162
Alfredo U. Luccio Numerical simulation of particle
accelerators . . . . . . . . . . . . . . 163--177
O. Yaşar A new ignition model for spark-ignited
engine simulations . . . . . . . . . . . 179--200
César Rego Node-ejection chains for the vehicle
routing problem: Sequential and parallel
algorithms . . . . . . . . . . . . . . . 201--222
Antonio Corradi and
Letizia Leonardi and
Franco Zambonelli Parallel object allocation via
user-specified directives: a case study
in traffic simulation . . . . . . . . . 223--241
Patrick Dymond and
Jieliang Zhou and
Xiaotie Deng A $2$-D parallel convex hull algorithm
with optimal communication phases . . . 243--255
Sathiamoorthy Manoharan Effect of task duplication on the
assignment of dependency graphs . . . . 257--268
Masayoshi Aritsugi and
Hiroki Fukatsu and
Yoshinari Kanamori Several partitioning strategies for
parallel image convolution in a network
of heterogeneous workstations . . . . . 269--293
B. Di Martino and
S. Briguglio and
G. Vlad and
P. Sguazzero Parallel PIC plasma simulation through
particle decomposition techniques . . . 295--314
Avi Kavas and
David Er-El and
Dror G. Feitelson Using multicast to pre-load jobs on the
ParPar cluster . . . . . . . . . . . . . 315--327
J. W. Manke Parallel computing in aerospace . . . . 329--336
William D. Gropp and
Dinesh K. Kaushik and
David E. Keyes and
Barry F. Smith High-performance parallel implicit CFD 337--362
M. Garbey and
Yu. V. Vassilevski A parallel solver for unsteady
incompressible $3$D Navier--Stokes
equations . . . . . . . . . . . . . . . 363--389
Jay Hoeflinger and
Prasad Alavilli and
Thomas Jackson and
Bob Kuhn Producing scalable performance with
OpenMP: Experiments with two CFD
applications . . . . . . . . . . . . . . 391--413
P. Aumann and
others MEGAFLOW: Parallel complete aircraft CFD 415--440
M. S. Fisher and
M. Mani and
D. Stookesberry Parallel processing with the Wind CFD
code at Boeing . . . . . . . . . . . . . 441--456
Joseph W. Manke and
G. David Kerlick and
David Levine and
Subhankar Banerjee and
Eric Dillon Parallel performance of two applications
in the Boeing high performance computing
benchmark suite . . . . . . . . . . . . 457--475
Piyush Mehrotra and
Hans Zima High Performance Fortran for aerospace
applications . . . . . . . . . . . . . . 477--501
Paul D. Hovland and
Lois C. McInnes Parallel simulation of compressible flow
using automatic differentiation and
PETSc . . . . . . . . . . . . . . . . . 503--519
James R. Taft Achieving 60 GFLOP/s on the production
CFD code OVERFLOW-MLP . . . . . . . . . 521--536
Stefania Bandini and
Giancarlo Mauri and
Roberto Serra Cellular automata: From modeling to
applications . . . . . . . . . . . . . . 537--538
S. Bandini and
G. Mauri and
R. Serra Cellular automata: From a theoretical
parallel computational model to its
application to complex systems . . . . . 539--553
Andreas Beckers and
Thomas Worsch A perimeter-time CA for the queen bee
problem . . . . . . . . . . . . . . . . 555--569
F. Jiménez Morales and
J. P. Crutchfield and
M. Mitchell Evolving two-dimensional cellular
automata to perform density
classification: a report on work in
progress . . . . . . . . . . . . . . . . 571--585
Hiroshi Umeo Linear-time recognition of connectivity
of binary images on $1$-bit inter-cell
communication cellular automaton . . . . 587--599
Jörg R. Weimar Coupling microscopic and macroscopic
cellular automata . . . . . . . . . . . 601--611
B. Ostrovsky and
G. Crooks and
M. A. Smith and
Y. Bar-Yam Cellular automata for polymer simulation
with application to polymer melts and
polymer collapse including implications
for protein folding . . . . . . . . . . 613--641
Stefania Bandini and
Massimiliano Magagnini Parallel processing simulation of
dynamic properties of filled rubber
compounds based on cellular automata . . 643--661
Roberto Serra and
Marco Villani and
Anna Salvemini Continuous genetic networks . . . . . . 663--683
R. Cappuccio and
G. Cattaneo and
G. Erbacci and
U. Jocher A parallel implementation of a cellular
automata based model for coffee
percolation . . . . . . . . . . . . . . 685--717
J. Wahle and
L. Neubert and
J. Esser and
M. Schreckenberg A cellular automaton traffic flow model
for online simulation of traffic . . . . 719--735
Th. Lippert and
N. Petkov and
P. Palazzari and
K. Schilling Hyper-systolic matrix multiplication . . 737--759
Gundolf Haase and
Michael Kuhn and
Ulrich Langer Parallel multigrid $3$D Maxwell solvers 761--775
Yair Censor and
Dan Gordon and
Rachel Gordon Component averaging: an efficient
iterative parallel algorithm for large
and sparse unstructured problems . . . . 777--808
Alexandros V. Gerbessiotis and
Constantinos J. Siniolakis Merging on the BSP model . . . . . . . . 809--822
Ishfaq Ahmad and
Shahriar M. Akramullah and
Ming L. Liou and
Muhammad Kafil A scalable off-line MPEG-2 video
encoding scheme using a multiprocessor
system . . . . . . . . . . . . . . . . . 823--846
Paul N. Swarztrauber and
Steven W. Hammond A comparison of optimal FFTs on torus
and hypercube multicomputers . . . . . . 847--859
Muhammad H. Alsuwaiyel An optimal parallel algorithm for the
multiselection problem . . . . . . . . . 861--865
Henk J. Sips and
Ruud Sommerhalder and
Erik D'Hollander Linear systems and associated problems 867--868
A. Basermann and
J. Fingberg and
G. Lonsdale and
B. Maerten and
C. Walshaw Dynamic multi-partitioning for parallel
finite element applications . . . . . . 869--881
Roman Geus and
Stefan Röllin Towards a fast parallel sparse symmetric
matrix-vector multiplication . . . . . . 883--896
D. B. Heras and
J. C. Cabaleiro and
F. F. Rivera Modeling data locality for the sparse
matrix-vector product using distance
measures . . . . . . . . . . . . . . . . 897--912
A. Cooper and
M. Szularz and
J. Weston External selective orthogonalization for
the Lanczos algorithm in distributed
memory environments . . . . . . . . . . 913--923
H. X. Lin A unifying graph model for designing
parallel algorithms for tridiagonal
systems . . . . . . . . . . . . . . . . 925--939
Peter Christen and
others Scalable parallel algorithms for surface
fitting and data mining . . . . . . . . 941--961
Luca Bergamaschi and
Giorgio Pini and
Flavio Sartoretto Parallel preconditioning of a sparse
eigensolver . . . . . . . . . . . . . . 963--976
Yuto Komeiji and
Makoto Haraguchi and
Umpei Nagashima Parallel molecular dynamics simulation
of a protein . . . . . . . . . . . . . . 977--987
Mardochée Magolu monga Made and
Henk A. van der Vorst Parallel incomplete factorizations with
pseudo-overlapped subdomains . . . . . . 989--1008
Arnold Krechel and
Klaus Stüben Parallel algebraic multigrid based on
subdomain blocking . . . . . . . . . . . 1009--1031
Azzedine Boukerche and
Carl Tropper Local versus global lookahead in
conservative parallel simulations . . . 1033--1055
Byung S. Yoo and
Chita R. Das Efficient processor management schemes
for mesh-connected multicomputers . . . 1057--1078
Constantine Katsinis Performance analysis of the simultaneous
optical multi-processor exchange bus . . 1079--1115
Weng-Long Chang and
Chih-Ping Chu The generalized Direction Vector I test 1117--1144
M. Alabdulkareem and
S. Lakshmivarahan and
S. K. Dhall Scalability analysis of large codes
using factorial designs . . . . . . . . 1145--1171
Daeyeon Park and
Byeong Hag Seong and
Rafael H. Saavedra Adaptive software prefetching in
scalable multiprocessors using cache
information . . . . . . . . . . . . . . 1173--1195
Paraskevas Evripidou $D^3$-Machine: a decoupled data-driven
multithreaded architecture with variable
resolution support . . . . . . . . . . . 1197--1225
Vittorio Cortellessa and
Francesco Quaglia A checkpointing-recovery scheme for Time
Warp parallel simulation . . . . . . . . 1227--1252
Dolors Royo and
Miguel Valero-García and
Antonio González Implementing the one-sided Jacobi method
on a $2$D/$3$D mesh multicomputer . . . 1253--1271
Gen-Huey Chen and
Shien-Ching Hwang and
Hui-Ling Huang and
Ming-Yang Su and
Dyi-Rong Duh A general broadcasting scheme for
recursive networks with complete
connection . . . . . . . . . . . . . . . 1273--1278
Gabriel Antoniu and
others The Hyperion system: Compiling
multithreaded Java bytecode for
distributed execution . . . . . . . . . 1279--1297
Eric Noulard and
Nahid Emad A key for reusable parallel linear
algebra software . . . . . . . . . . . . 1299--1319
Jeff Boleng and
Manavendra Misra Load balanced parallel QR decomposition
on shared memory multiprocessors . . . . 1321--1345
L. F. Romero and
E. M. Ortigosa and
E. L. Zapata Data-task parallelism for the VMEC
program . . . . . . . . . . . . . . . . 1347--1364
O. Yu. Milyukova Parallel approximate factorization
method for solving discrete elliptic
equations . . . . . . . . . . . . . . . 1365--1379
J. Al-Sadi and
K. Day and
M. Ould-Khaoua Fault-tolerant routing in hypercubes
using probability vectors . . . . . . . 1381--1399
Jack Dongarra and
Masaaki Shimasaki and
Bernard Tourancheau Clusters and computational grids for
scientific computing . . . . . . . . . . 1401--1402
Cherri M. Pancake Performance tools for today's HPC: Are
we addressing the right issues? . . . . 1403--1415
Ralph Butler and
William Gropp and
Ewing Lusk Components and interfaces of a process
management system for parallel programs 1417--1429
Thilo Kielmann and
Henri E. Bal and
Sergei Gorlatch and
Kees Verstoep and
Rutger F. H. Hofman Network performance-aware collective
communication for clustered wide-area
systems . . . . . . . . . . . . . . . . 1431--1456
Michael D. Beynon and
others Distributed processing of very large
datasets with DataCutter . . . . . . . . 1457--1478
Graham E. Fagg and
Antonin Bukovsky and
Jack J. Dongarra HARNESS and fault tolerant MPI . . . . . 1479--1495
E. Caron and
others Scilab to Scilab$_{//}$: The
Ouragan project . . . . . . . . . . . . 1497--1519
Michael Florian and
Michel Gendreau Applications of parallel computing in
transportation . . . . . . . . . . . . . 1521--1522
S. C. Wong and
C. K. Wong and
C. O. Tong A parallelized genetic algorithm for the
calibration of Lowry model . . . . . . . 1523--1536
Michelle R. Hribar and
Valerie E. Taylor and
David E. Boyce Implementing parallel shortest path for
parallel transportation applications . . 1537--1568
N. Tremblay and
M. Florian Temporal shortest paths: Parallel
computing implementations . . . . . . . 1569--1609
Kai Nagel and
Marcus Rickert Parallel implementation of the TRANSIMS
micro-simulation . . . . . . . . . . . . 1611--1639
Michel Gendreau and
Gilbert Laporte and
Frédéric Semet A dynamic model and parallel tabu search
heuristic for real-time ambulance
relocation . . . . . . . . . . . . . . . 1641--1653
Laurent Hascoët A method for automatic placement of
communications in SPMD parallelisation 1655--1664
Giuseppe Passoni and
Paolo Cremonesi and
Giancarlo Alfonsi Analysis and implementation of a
parallelization strategy on a
Navier--Stokes solver for shear flow
simulations . . . . . . . . . . . . . . 1665--1685
B. V. Rathish Kumar and
T. Yamaguchi and
H. Liu and
R. Himeno A parallel $3$D unsteady incompressible
flow solver on VPP700 . . . . . . . . . 1687--1713
Ignacio M. Llorente and
Manuel Prieto-Matías and
Boris Diskin A parallel multigrid solver for $3$D
convection and convection-diffusion
problems . . . . . . . . . . . . . . . . 1715--1741
M. Arenaz and
R. Doallo and
J. Touriño and
C. Vázquez Efficient parallel numerical solver for
the elastohydrodynamic Reynolds-Hertz
problem . . . . . . . . . . . . . . . . 1743--1765
Wahid Nasri and
Zaher Mahjoub Optimal parallelization of a recursive
algorithm for triangular matrix
inversion on MIMD computers . . . . . . 1767--1782
Weng-Long Chang and
Chih-Ping Chu and
Jia-Hwa Wu A multi-dimensional version of the I
test . . . . . . . . . . . . . . . . . . 1783--1799
H. Sarbazi-Azad and
M. Ould-Khaoua and
L. M. Mackenzie Communication delay in hypercubes in the
presence of bit-reversal traffic . . . . 1801--1816
Jau-Der Shih Wormhole routing for torus networks with
faults . . . . . . . . . . . . . . . . . 1817--1829
Takahiro Katagiri and
Yasumasa Kanada An efficient implementation of parallel
eigenvalue computation for massively
parallel processing . . . . . . . . . . 1831--1845
Márcia A. Inda and
Rob H. Bisseling A simple and efficient parallel FFT
algorithm using the BSP model . . . . . 1847--1878
C. Bekas and
E. Gallopoulos Cobra: Parallel path following for
computing the matrix pseudospectrum . . 1879--1896
Wei Shi and
Pradip K. Srimani A regular scalable fault tolerant
interconnection network for distributed
processing . . . . . . . . . . . . . . . 1897--1919
P. Dmitruk and
L.-P. Wang and
W. H. Matthaeus and
R. Zhang and
D. Seckel Scalable parallel FFT for spectral
simulations on a Beowulf cluster . . . . 1921--1936
Anonymous Author index to volume 27 . . . . . . . 1937--1944
Gerhard R. Joubert Editorial . . . . . . . . . . . . . . . 1--2
Angela C. Sodan Applications on a multithreaded
architecture: a case study with
EARTH-MANNA . . . . . . . . . . . . . . 3--33
Hendrik L. Tolman Distributed-memory concepts in the wave
model WAVEWATCH III . . . . . . . . . . 35--52
P. Wang and
Karen Y. Liu and
Tom Cwik and
Robert Green MODTRAN on supercomputers and parallel
computers . . . . . . . . . . . . . . . 53--64
Fusen He and
Jie Wu An efficient parallel implementation of
the Everglades Landscape Fire Model
using checkpointing . . . . . . . . . . 65--82
Rajeev Thakur and
William Gropp and
Ewing Lusk Optimizing noncontiguous accesses in
MPI-IO . . . . . . . . . . . . . . . . . 83--105
Hung-Chang Hsiao and
Chung-Ta King Implementation and evaluation of
directory hints in CC-NUMA
multiprocessors . . . . . . . . . . . . 107--132
Huei-Huang Chang and
Ge-Ming Chiu An improved fault-tolerant routing
algorithm in meshes with convex faults 133--149
Erricos John Kontoghiorghes and
Ahmed Sameh and
Denis Trystram Special issue on parallel matrix
algorithms and applications . . . . . . 151--153
Olivier Beaumont and
Arnaud Legrand and
Fabrice Rastello and
Yves Robert Dense linear algebra kernels on
heterogeneous platforms: Redistribution
issues . . . . . . . . . . . . . . . . . 155--185
Olaf Schenk and
Klaus Gärtner Two-level dynamic scheduling in PARDISO:
Improved scalability on shared memory
multiprocessing systems . . . . . . . . 187--197
Dany Mezher and
Bernard Philippe Parallel computation of pseudospectra of
large sparse matrices . . . . . . . . . 199--221
C. Bekas and
E. Gallopoulos Parallel computation of pseudospectra by
fast descent . . . . . . . . . . . . . . 223--242
M. Bečka and
G. Okša and
M. Vajteršic Dynamic ordering for a parallel
block-Jacobi SVD algorithm . . . . . . . 243--262
Martin H. Gutknecht and
Stefan Röllin The Chebyshev iteration revisited . . . 263--283
Ahmed H. Sameh and
Vivek Sarin Parallel algorithms for indefinite
linear systems . . . . . . . . . . . . . 285--299
P. Hénon and
P. Ramet and
J. Roman PaStiX: a high-performance parallel
direct solver for sparse symmetric
positive definite systems . . . . . . . 301--321
Y. Liang and
J. Weston and
M. Szularz Generalized least-squares polynomial
preconditioners for symmetric indefinite
linear equations . . . . . . . . . . . . 323--341
Joël M. Malard Parallel restricted maximum likelihood
estimation for linear models with a
dense exogenous matrix . . . . . . . . . 343--353
Wojciech Owczarz and
Zahari Zlatev Parallel matrix computations in air
pollution modelling . . . . . . . . . . 355--368
B. Nkonga and
P. Charrier Generalized parcel method for dispersed
spray and message passing strategy on
unstructured meshes . . . . . . . . . . 369--398
Stephen H. Brill and
George F. Pinder Parallel implementation of the Bi-CGSTAB
method with block red-black
Gauss--Seidel preconditioner applied to
the Hermite collocation discretization
of partial differential equations . . . 399--414
Harald J. Ehold and
Wilfried N. Gansterer and
Dieter F. Kvasnicka and
Christoph W. Ueberhuber Optimizing Local Performance in HPF . . 415--432
Alain Girault Elimination of redundant messages with a
two-pass static analysis algorithm . . . 433--453
Kleanthis Psarris Program analysis techniques for
transforming programs for parallel
execution . . . . . . . . . . . . . . . 455--469
Jen-Chih Lin and
Nan-Chen Hsien Reconfiguring binary tree structures in
a faulty supercube with unbounded
expansion . . . . . . . . . . . . . . . 471--483
F. Quaglia and
B. Ciciani and
M. Colajanni Performance analysis of adaptive
wormhole routing in a two-dimensional
torus . . . . . . . . . . . . . . . . . 485--501
Yosi Ben-Asher The parallel client-server paradigm . . 503--523
Abdelkader Hameurlain and
Franck Morvan CPU and incremental memory allocation in
dynamic parallelization of SQL queries 525--556
A. Goscinski and
M. Hobbs and
J. Silcock GENESIS: an efficient, transparent and
easy to use cluster operating system . . 557--606
Olivier Aumage and
Luc Bougé and
Jean-François Méhaut and
Raymond Namyst Madeleine II: a portable and efficient
communication library for
high-performance cluster computing . . . 607--626
Petr Salinger and
Pavel Tvrdík Optimal broadcasting and gossiping in
one-port meshes of trees with
distance-insensitive routing . . . . . . 627--647
Abderezak Touzene Edges-disjoint spanning trees on the
binary wrapped butterfly network with
applications to fault tolerance . . . . 649--666
Roberto Serra and
Marco Villani and
Anna Salvemini Erratum to ``Continuous genetic
networks'' [Parallel Comput. 27(5)
(2001) 663--683] . . . . . . . . . . . . 667--667
Domenico Talia and
Pradip K. Srimani Guest editorial: Parallel data-intensive
algorithms and applications . . . . . . 669--671
Mario Cannataro and
Domenico Talia and
Pradip K. Srimani Parallel data intensive computing in
scientific and commercial applications 673--704
P. Sanders Reconciling simplicity and realism in
parallel disk models . . . . . . . . . . 705--723
Renato Ferreira and
Gagan Agrawal and
Joel Saltz Data parallel language and compiler
support for data intensive applications 725--748
Bill Allcock and
others Data management and transfer in
high-performance computational grid
environments . . . . . . . . . . . . . . 749--771
Yanyan Yang and
others Agent based data management in digital
libraries . . . . . . . . . . . . . . . 773--792
Massimo Coppola and
Marco Vanneschi High-performance data mining with
skeleton-based structured parallel
programming . . . . . . . . . . . . . . 793--813
D. B. Skillicorn Parallel frequent set counting . . . . . 815--825
Michael Beynon and
others Processing large-scale multi-dimensional
data in parallel and distributed
environments . . . . . . . . . . . . . . 827--859
Pablo A. Estévez and
Hélène Paugam-Moisy and
Didier Puzenat and
Manuel Ugarte A scalable parallel algorithm for
training a hierarchical mixture of
neural experts . . . . . . . . . . . . . 861--891
Tyng-Yeu Liang and
Ce-Kuen Shieh and
Jun-Qi Li Selecting threads for workload migration
in software distributed shared memory
systems . . . . . . . . . . . . . . . . 893--913
Jingling Xue and
Wentong Cai Time-minimal tiling when rise is larger
than zero . . . . . . . . . . . . . . . 915--939
Andreas Uhl and
Peter Zinterhof Guest editorial: Parallel computing in
image and video processing . . . . . . . 941--943
Cristina Nicolescu and
Pieter Jonker A data and task parallel image
processing environment . . . . . . . . . 945--965
F. J. Seinstra and
D. Koelma and
J. M. Geusebroek A software architecture for user
transparent parallel image processing 967--993
A. Biancardi and
A. Mérigot Extending the data parallel paradigm
with data-dependent operators . . . . . 995--1021
Francisco Argüello and
Juan López and
María A. Trenas and
Emilio L. Zapata Architecture for wavelet packet
transform based on lifting steps . . . . 1023--1037
Ishfaq Ahmad and
Yong He and
Ming L. Liou Video compression with parallel
processing . . . . . . . . . . . . . . . 1039--1078
Hazem M. Abbas and
Mohamed M. Bayoumi Parallel codebook design for vector
quantization on a message passing MIMD
architecture . . . . . . . . . . . . . . 1079--1093
Rade Kutil Approaches to zerotree image and video
coding on MIMD architectures . . . . . . 1095--1109
Aravind Dasu and
Sethuraman Panchanathan Reconfigurable media processing . . . . 1111--1139
K. Benkrid and
D. Crookes and
A. Benkrid Towards a general framework for FPGA
based image processing using hardware
skeletons . . . . . . . . . . . . . . . 1141--1154
A. C. Zawada and
N. L. Seed and
P. A. Ivey Continuous and high coverage
self-testing of dynamically
re-configurable systems . . . . . . . . 1155--1178
Virginie Fresse and
Olivier Deforges ARIAL: Rapid Prototyping for
Mixed and Parallel Platforms 1179--1202
Edwige Pissaloux and
Franck Amiot and
Tharam Dillon A vision-application adaptable computer
concept and its implementation in
FreeTIV computer . . . . . . . . . . . . 1203--1219
Anonymous IFC --- Inside Front Cover (Editorial
Board) . . . . . . . . . . . . . . . . . CO2--CO2
Mark Christiaens and
Michiel Ronsse and
Koen De Bosschere Bounding the number of segment histories
during data race detection . . . . . . . 1221--1238
Mikhail S. Tarkov and
Youngsong Mun and
Jaeyoung Choi and
Hyung-Il Choi Mapping adaptive fuzzy Kohonen
clustering network onto distributed
image processing system . . . . . . . . 1239--1256
Erricos John Kontoghiorghes Greedy Givens algorithms for computing
the rank-$k$ updating of the QR
decomposition . . . . . . . . . . . . . 1257--1273
Ke Chen and
Choi H. Lai Parallel algorithms of the Purcell
method for direct solution of linear
systems . . . . . . . . . . . . . . . . 1275--1291
Shao Dong Chen and
Hong Shen and
Rodney Topor An efficient algorithm for constructing
Hamiltonian paths in meshes . . . . . . 1293--1305
Yuan-Shin Hwang Parallelizing graph construction
operations in programs with cyclic
graphs . . . . . . . . . . . . . . . . . 1307--1328
PeiZong Lee and
Wen-Yao Chen Generating communication sets of array
assignment statements for block-cyclic
distribution on distributed memory
parallel computers . . . . . . . . . . . 1329--1368
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Alexey Lastovetsky Adaptive parallel computing on
heterogeneous networks with mpC . . . . 1369--1407
Jeffrey Nesheiwat and
Boleslaw K. Szymanski Instrumentation database system for
performance analysis of parallel
scientific applications . . . . . . . . 1409--1449
Chi Shen and
Jun Zhang Parallel two level block ILU
preconditioning techniques for solving
large sparse linear systems . . . . . . 1451--1475
Lili Ju and
Qiang Du and
Max Gunzburger Probabilistic methods for centroidal
Voronoi tessellations and their parallel
implementations . . . . . . . . . . . . 1477--1500
Carlos Alberto Alonso Sanches and
Nei Yoshihiro Soma and
Horacio Hideki Yanasse Short communication: Comments on
parallel algorithms for the knapsack
problem . . . . . . . . . . . . . . . . 1501--1505
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Ian N. Dunn and
Gerard G. L. Meyer QR factorization for shared memory and
message passing . . . . . . . . . . . . 1507--1530
Jean-Guillaume Dumas and
Jean-Louis Roch On parallel block algorithms for exact
triangularizations . . . . . . . . . . . 1531--1548
Taesoon Park and
Inseon Lee and
Heon Y. Yeom An efficient causal logging scheme for
recoverable distributed shared memory
systems . . . . . . . . . . . . . . . . 1549--1572
Claire Hanen and
Alix Munier Kordon Minimizing the volume in scheduling an
out-tree with communication delays and
duplication . . . . . . . . . . . . . . 1573--1585
S. A. Jarvis and
J. M. D. Hill and
C. J. Siniolakis and
V. P. Vasilev Portable and architecture independent
parallel performance tuning using BSP 1587--1609
Li-Chiu Chang and
Fi-John Chang An efficient parallel algorithm for
LISSOM neural network . . . . . . . . . 1611--1633
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Pasqua D'Ambra and
Marco Danelutto and
Daniela di Serafino Advanced environments for parallel and
distributed computing . . . . . . . . . 1635--1636
Pasqua D'Ambra and
Marco Danelutto and
Daniela di Serafino and
Marco Lapegna Advanced environments for parallel and
distributed applications: a view of
current status . . . . . . . . . . . . . 1637--1662
S. MacDonald and
J. Anvik and
S. Bromling and
J. Schaeffer and
D. Szafron and
K. Tan From patterns to frameworks to parallel
programs . . . . . . . . . . . . . . . . 1663--1683
Jocelyn Sérot and
Dominique Ginhac Skeletons for parallel image processing:
an overview of the SKIPPER project . . . 1685--1708
Marco Vanneschi The programming model of ASSIST, an
environment for parallel and distributed
portable applications . . . . . . . . . 1709--1732
D. Laforenza Grid programming: some indications where
we are headed . . . . . . . . . . . . . 1733--1752
Nathalie Furmento and
Anthony Mayer and
Stephen McGough and
Steven Newhouse and
Tony Field and
John Darlington ICENI: Optimisation of component
applications within a Grid environment 1753--1772
Micah Beck and
Dorian Arnold and
Alessandro Bassi and
Fran Berman and
Henri Casanova and
Jack Dongarra and
Terry Moore and
Graziano Obertelli and
James Plank and
Martin Swany Middleware for the use of storage in
communication . . . . . . . . . . . . . 1773--1787
M. Di Santo and
F. Frattolillo and
W. Russo and
E. Zimeo A component-based approach to build a
portable and flexible middleware for
metacomputing . . . . . . . . . . . . . 1789--1810
Boyana Norris and
Satish Balay and
Steven Benson and
Lori Freitag and
Paul Hovland and
Lois McInnes and
Barry Smith Parallel components for PDEs and
optimization: some issues and
experiences . . . . . . . . . . . . . . 1811--1831
Anonymous Author Index . . . . . . . . . . . . . . 1833--1839
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
E. A. H. Vollebregt and
M. R. T. Roest and
J. W. M. Lander Large scale computing at Rijkswaterstaat 1--20
Leo Chin Sim and
Heiko Schroder and
Graham Leedham MIMD--SIMD hybrid system --- towards a
new low cost parallel system . . . . . . 21--36
Hon F. Li and
Gabriel Girard View consistencies and exact
implementations . . . . . . . . . . . . 37--67
Ashok Srinivasan and
Michael Mascagni and
David Ceperley Testing parallel random number
generators . . . . . . . . . . . . . . . 69--94
Ramachandran Vaidyanathan and
Jerry L. Trahan and
Chun-ming Lu Degree of scalability: scalable
reconfigurable mesh algorithms for
multiple addition and matrix--vector
multiplication . . . . . . . . . . . . . 95--109
Salma A. Ghoneim and
Hossam M. A. Fahmy Job preemption, fast subcube compaction,
or waiting in hypercube systems? A
selection methodology . . . . . . . . . 111--134
Heejo Lee and
Jong Kim and
Sung Je Hong and
Sunggu Lee Task scheduling using a block dependency
DAG for block-oriented sparse Cholesky
factorization . . . . . . . . . . . . . 135--159
Oh-Han Kang and
Si-Gwan Kim A task duplication based scheduling
algorithm for shared memory
multiprocessors . . . . . . . . . . . . 161--166
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Hongzhang Shan and
Jaswinder P. Singh and
Leonid Oliker and
Rupak Biswas Message passing and shared address space
parallelism on an SMP cluster . . . . . 167--186
Olaf Bonorden and
Ben Juurlink and
Ingo von Otte and
Ingo Rieping The Paderborn University BSP (PUB)
library . . . . . . . . . . . . . . . . 187--207
Fabrice Rastello and
Amit Rao and
Santosh Pande Optimal task scheduling at run time to
exploit intra-tile parallelism . . . . . 209--239
D. González and
F. Almeida and
L. Moreno and
C. Rodríguez Towards the automatic optimal mapping of
pipeline algorithms . . . . . . . . . . 241--254
Cosimo Anglano and
Claudio Casetti and
Emilio Leonardi and
Fabio Neri Network interface multicast protocols
for wormhole-based networks of
workstations . . . . . . . . . . . . . . 255--283
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Erik Reinhard and
Dirk Bartz Parallel graphics and visualisation . . 285--288
Toshi Kato ``Kilauea'' --- parallel global
illumination renderer . . . . . . . . . 289--310
M. Isard and
M. Shand and
A. Heirich Distributed rendering of interactive
soft shadows . . . . . . . . . . . . . . 311--323
Wagner T. Corrêa and
James T. Klosowski and
Cláudio T. Silva Out-of-core sort-first parallel
rendering for cluster-based tiled
displays . . . . . . . . . . . . . . . . 325--338
Jürgen P. Schulze and
Ulrich Lang The parallelized perspective shear-warp
algorithm for volume rendering . . . . . 339--354
Li Chen and
Issei Fujishiro and
Kengo Nakajima Optimizing parallel performance of
unstructured volume rendering for the
Earth Simulator . . . . . . . . . . . . 355--371
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
A. Migdalas and
G. Toraldo and
V. Kumar Parallel computing in numerical
optimization . . . . . . . . . . . . . . 373--373
A. Migdalas and
G. Toraldo and
V. Kumar Nonlinear optimization and parallel
computing . . . . . . . . . . . . . . . 375--391
R. M. Aiex and
S. Binato and
M. G. C. Resende Parallel GRASP with path-relinking for
job shop scheduling . . . . . . . . . . 393--430
Jörgen Blomvall A multistage stochastic programming
algorithm suitable for parallel
computing . . . . . . . . . . . . . . . 431--445
Ricardo C. Corrêa and
Fernando C. Gomes and
Carlos A. S. Oliveira and
Panos M. Pardalos A parallel implementation of an
asynchronous team to the point-to-point
connection problem . . . . . . . . . . . 447--466
M. D'Apuzzo and
M. Marino Parallel computational issues of an
interior point method for solving large
bound-constrained quadratic programming
problems . . . . . . . . . . . . . . . . 467--483
C. Durazzi and
V. Ruggiero Numerical solution of special linear and
quadratic programs via a parallel
interior-point method . . . . . . . . . 485--503
Cristian Gatu and
Erricos J. Kontoghiorghes Parallel algorithms for computing all
possible subset regression models using
the QR decomposition . . . . . . . . . . 505--521
Susana Gómez and
Nelson del Castillo and
Longina Castellanos and
Julio Solano The parallel tunneling method . . . . . 523--533
G. Zanghirati and
L. Zanni A parallel solver for large quadratic
programs in training support vector
machines . . . . . . . . . . . . . . . . 535--551
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Anonymous Obituary: Harry F. Jordan . . . . . . . iii--iii
Gilbert Laporte and
Roberto Musmanno Parallel computing in logistics . . . . 553--554
James F. Campbell and
Gary Stiehr and
Andreas T. Ernst and
Mohan Krishnamoorthy Solving hub arc location problems on a
cluster of workstations . . . . . . . . 555--574
Félix García-López and
Belén Melián-Batista and
José A. Moreno-Pérez and
J. Marcos Moreno-Vega Parallelization of the scatter search
for the $p$-median problem . . . . . . . 575--589
Bernard Gendron and
Jean-Yves Potvin and
Patrick Soriano A parallel hybrid heuristic for the
multicommodity capacitated location
problem with balancing requirements . . 591--606
T. K. Ralphs Parallel branch and cut for capacitated
vehicle routing . . . . . . . . . . . . 607--629
Pierpaolo Caricato and
Gianpaolo Ghiani and
Antonio Grieco and
Emanuela Guerriero Parallel tabu search for a pickup and
delivery problem under track contention 631--639
A. Bortfeldt and
H. Gehring and
D. Mack A parallel tabu search algorithm for
solving the container loading problem 641--662
F. Guerriero and
M. Mancini A cooperative parallel rollout algorithm
for the sequential ordering problem . . 663--677
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Daisuke Takahashi A parallel $1$-D FFT algorithm for the
Hitachi SR8000 . . . . . . . . . . . . . 679--690
Coskun Mermer and
Donglok Kim and
Yongmin Kim Efficient 2D FFT implementation on
mediaprocessors . . . . . . . . . . . . 691--709
P. H. Muir and
R. N. Pancer and
K. R. Jackson PMIRKDC: a parallel mono-implicit
Runge--Kutta code with defect control
for boundary value ODEs . . . . . . . . 711--741
A. Plastino and
C. C. Ribeiro and
N. Rodriguez Developing SPMD applications with load
balancing . . . . . . . . . . . . . . . 743--766
Naya Nagy and
Selim G. Akl The maximum flow problem: a real-time
approach . . . . . . . . . . . . . . . . 767--794
Bassel R. Arafeh A task duplication scheme for resolving
deadlocks in clustered DAGs . . . . . . 795--820
Jung-Sheng Fu Fault-tolerant cycle embedding in the
hypercube . . . . . . . . . . . . . . . 821--832
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Patrick R. Amestoy and
Iain S. Duff and
Jean-Yves L'Excellent and
Xiaoye S. Li Impact of the implementation of MPI
point-to-point communications on the
performance of two general sparse
solvers . . . . . . . . . . . . . . . . 833--849
James Kohout and
Alan D. George A high-performance communication service
for parallel computing on distributed
DSP systems . . . . . . . . . . . . . . 851--878
Christopher J. Freitas and
Derrick B. Coffin and
Richard L. Murphy The characterization of a wide area
network computation . . . . . . . . . . 879--894
Lúcia M. A. Drummond and
Valmir C. Barbosa On reducing the complexity of matrix
clocks . . . . . . . . . . . . . . . . . 895--905
Manuel Prieto and
Ruben S. Montero and
Ignacio M. Llorente and
Francisco Tirado A parallel multigrid solver for viscous
flows on anisotropic structured grids 907--923
Manuel Díaz and
Bartolomé Rubio and
Enrique Soler and
José M. Troya Domain interaction patterns to
coordinate HPF tasks . . . . . . . . . . 925--951
Y. Tseng and
R. F. DeMara and
P. J. Wilder Distributed-sum termination detection
supporting multithreaded execution . . . 953--968
Wolfgang Blochinger and
Carsten Sinz and
Wolfgang Küchlin Parallel propositional satisfiability
checking with distributed dynamic
learning . . . . . . . . . . . . . . . . 969--994
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
M. Govett and
L. Hart and
T. Henderson and
J. Middlecoff and
D. Schaffer The Scalable Modeling System:
directive-based code parallelization for
distributed and shared memory computers 995--1020
Jorge Buenabad-Chávez and
Henk L. Muller and
Paul W. A. Stallard and
David H. D. Warren Virtual memory on data diffusion
architectures . . . . . . . . . . . . . 1021--1052
M. Yamashita and
K. Fujisawa and
M. Kojima SDPARA: SemiDefinite Programming
Algorithm paRAllel version . . . . . . . 1053--1067
V. Teulière and
Olivier Brun Parallelisation of the particle
filtering technique and application to
Doppler-bearing tracking of maneuvering
sources . . . . . . . . . . . . . . . . 1069--1090
Liang Peng and
Weng-Fai Wong and
Chung-Kwong Yuen SilkRoad II: mixed paradigm cluster
computing with RC_dag consistency . . . 1091--1115
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Peter Arbenz and
Efstratios Gallopoulos and
Bernard Philippe and
Yousef Saad Parallel Matrix Algorithms and
Applications (PMAA '02) . . . . . . . . 1117--1119
Olivier Beaumont and
Arnaud Legrand and
Yves Robert Scheduling divisible workloads on
heterogeneous platforms . . . . . . . . 1121--1152
Martin Bečka and
Gabriel Okša On variable blocking factor in a
parallel dynamic block-Jacobi SVD
algorithm . . . . . . . . . . . . . . . 1153--1174
Olivier Coulaud and
Michaël Dussere and
Pascal Hénon and
Erik Lefebvre and
Jean Roman Optimization of a kinetic laser--plasma
interaction code for large parallel
systems . . . . . . . . . . . . . . . . 1175--1189
Abdou Guermouche and
Jean-Yves L'Excellent and
Gil Utard Impact of reordering on the memory of a
multifrontal solver . . . . . . . . . . 1191--1218
Hemant Mahawar and
Vivek Sarin Parallel iterative methods for dense
linear systems in inductance extraction 1219--1235
James R. McCombs and
Andreas Stathopoulos Parallel, multigrain iterative solvers
for hiding network latencies on MPPs and
networks of clusters . . . . . . . . . . 1237--1259
Sreekanth R. Sambavaram and
Vivek Sarin and
Ahmed Sameh and
Ananth Grama Multipole-based preconditioners for
large sparse linear systems . . . . . . 1261--1273
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Andrea Clematis and
Mike Mineter and
Richard Marciano High performance computing with
geographical data . . . . . . . . . . . 1275--1279
K. C. Clarke Geocomputation's future at the extremes:
high performance computing and
nanoclients . . . . . . . . . . . . . . 1281--1295
Kenneth A. Hawick and
P. D. Coddington and
H. A. James Distributed frameworks and parallel
algorithms for processing large-scale
geographic data . . . . . . . . . . . . 1297--1333
Ann Chervenak and
Ewa Deelman and
Carl Kesselman and
Bill Allcock and
Ian Foster and
Veronika Nefedova and
Jason Lee and
Alex Sim and
Arie Shoshani and
Bob Drach and
others High-performance remote access to
climate simulation data: a challenge
problem for data grid technologies . . . 1335--1356
Giovanni Aloisio and
Massimo Cafaro A dynamic earth observation system . . . 1357--1362
Asvin Ananthanarayan and
Rajiv Balachandran and
Robert Grossman and
Yunhong Gu and
Xinwei Hong and
Jorge Levera and
Marco Mazzucco Data webs for earth science data . . . . 1363--1379
Erik G. Hoel and
Hanan Samet Data-parallel polygonization . . . . . . 1381--1401
Giuseppe Dattilo and
Giandomenico Spezzano Simulation of a cellular landslide model
with CAMELOT on high performance
computers . . . . . . . . . . . . . . . 1403--1418
Apostolos Papadopoulos and
Yannis Manolopoulos Parallel bulk-loading of spatial data 1419--1444
Mark Lanthier and
Doron Nussbaum and
Jörg-Rüdiger Sack Parallel implementation of geometric
shortest path algorithms . . . . . . . . 1445--1479
Shaowen Wang and
Marc P. Armstrong A quadtree approach to domain
decomposition for spatial interpolation
in Grid computing environments . . . . . 1481--1504
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Laurence T. Yang and
Yi Pan and
Minyi Guo Parallel and distributed scientific and
engineering computing . . . . . . . . . 1505--1508
Yoshiyuki Iwamoto and
Koichi Suga and
Kanemitsu Ootsu and
Takashi Yokota and
Takanobu Baba Receiving message prediction method . . 1509--1538
Yudong Sun and
Cho-Li Wang Solving irregularly structured problems
based on distributed object model . . . 1539--1562
Weijian Fang and
Cho-Li Wang and
Francis C. M. Lau On the design of global object space for
efficient multi-threading Java computing
on clusters . . . . . . . . . . . . . . 1563--1587
Fan Chan and
Jiannong Cao and
Yudong Sun High-level abstractions for
message-passing parallel programming . . 1589--1621
Xiaohui Shen and
Alok Choudhary A distributed multi-storage I/O system
for data intensive scientific computing 1623--1643
Patrick R. Amestoy and
Iain S. Duff and
Stéphane Pralet and
Christof Vömel Adapting a parallel sparse direct solver
to architectures with clusters of SMPs 1645--1668
Suchuan Dong and
George Em Karniadakis Dual-level parallelism for high-order
CFD methods . . . . . . . . . . . . . . 1--20
V. A. Pais and
N. Fournier and
M. A. Sutton and
K. J. Weston and
U. Dragosits Using High Performance Fortran to
parallelise a multi-layer atmospheric
transport model . . . . . . . . . . . . 21--33
Milan D. Mihajlović and
David J. Silvester Efficient parallel solvers for the
biharmonic equation . . . . . . . . . . 35--55
Michel Toulouse and
Teodor Gabriel Crainic and
Brunilde Sansó Systemic behavior of cooperative search
algorithms . . . . . . . . . . . . . . . 57--79
Oliver Sinnen and
Leonel Sousa List scheduling: extension for
contention awareness and evaluation of
node priorities for heterogeneous
cluster architectures . . . . . . . . . 81--101
Frédéric Guinand and
Aziz Moukrim and
Eric Sanlaville Sensitivity analysis of tree scheduling
on two machines with communication
delays . . . . . . . . . . . . . . . . . 103--120
Yang-Suk Kee and
Jin-Soo Kim and
Soonhoi Ha Memory management for multi-threaded
software DSM systems . . . . . . . . . . 121--138
Eric Violard A semantic framework to address data
locality in data parallel languages . . 139--161
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Jörg Wensch and
Ben Sommeijer Parallel simulation of axon growth in
the nervous system . . . . . . . . . . . 163--186
Javier Cuenca and
Domingo Giménez and
José González Architecture of an automatically tuned
linear algebra library . . . . . . . . . 187--210
Maria Calzarossa and
Luisa Massari and
Daniele Tessera A methodology towards automatic
performance analysis of parallel
applications . . . . . . . . . . . . . . 211--223
B. B. Fraguela and
R. Doallo and
J. Touriño and
E. L. Zapata A compiler tool to predict memory
hierarchy performance of scientific
codes . . . . . . . . . . . . . . . . . 225--248
N. Tomov and
E. Dempster and
M. H. Williams and
A. Burger and
H. Taylor and
P. J. B. King and
P. Broughton Analytical response time estimation in
parallel relational database systems . . 249--283
Kentaro Sano and
Yusuke Kobayashi and
Tadao Nakamura Differential coding scheme for efficient
parallel image composition on a PC
cluster system . . . . . . . . . . . . . 285--299
Alexandros V. Gerbessiotis Architecture independent parallel
binomial tree option price valuations 301--316
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Lieven Eeckhout and
Koen De Bosschere Efficient simulation of trace samples on
parallel machines . . . . . . . . . . . 317--335
V. Blanco and
J. A. González and
C. León and
C. Rodríguez and
G. Rodríguez and
M. Printista Predicting the performance of parallel
programs . . . . . . . . . . . . . . . . 337--356
Eddy Caron and
Gil Utard On the performance of parallel
factorization of out-of-core matrices 357--375
Andrea Attanasio and
Jean-François Cordeau and
Gianpaolo Ghiani and
Gilbert Laporte Parallel Tabu search heuristics for the
dynamic multi-vehicle dial-a-ride
problem . . . . . . . . . . . . . . . . 377--387
Murray Cole Bringing skeletons out of the closet: a
pragmatic manifesto for skeletal
parallel programming . . . . . . . . . . 389--406
Sun-Yuan Hsieh and
Chun-Hua Chen Pancyclicity on Möbius cubes with maximal
edge faults . . . . . . . . . . . . . . 407--421
Jipeng Zhou and
Francis C. M. Lau Multi-phase minimal fault-tolerant
wormhole routing in meshes . . . . . . . 423--442
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Valerie Guralnik and
George Karypis Parallel tree-projection-based sequence
mining algorithms . . . . . . . . . . . 443--472
Gwan-Hwan Hwang An efficient algorithm for communication
set generation of data parallel programs
with block-cyclic distribution . . . . . 473--501
V. Dolean and
S. Lanteri Parallel multigrid methods for the
calculation of unsteady flows on
unstructured grids: algorithmic aspects
and parallel performances on clusters of
PCs . . . . . . . . . . . . . . . . . . 503--525
Rong-Guey Chang and
Tyng-Ruey Chuang and
Jenq Kuen Lee Support and optimization for parallel
sparse programs with array intrinsics of
Fortran 90 . . . . . . . . . . . . . . . 527--550
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Albert Y. Zomaya and
Fikret Ercal and
El-ghazali Talbi Parallel and nature-inspired
computational paradigms and applications 551--552
V. Di Martino and
M. Mililotti Sub optimal scheduling in a grid using
genetic algorithms . . . . . . . . . . . 553--565
Michelle Moore An accurate parallel genetic algorithm
to schedule tasks on a cluster . . . . . 567--583
P. Morillo and
J. M. Orduña and
M. Fernández A comparison study of evolutive
algorithms for solving the partitioning
problem in distributed virtual
environment systems . . . . . . . . . . 585--610
E. Alba and
G. Luque and
J. M. Troya Parallel LAN/WAN heuristics for
optimization . . . . . . . . . . . . . . 611--628
Azzedine Boukerche and
Kathia Regina Lemos Jucá and
João Bosco Sobral and
Mirela Sechi Moretti Annoni Notare An artificial immune based intrusion
detection model for computer and
telecommunication systems . . . . . . . 629--646
Sven E. Eklund A massively parallel architecture for
distributed genetic algorithms . . . . . 647--676
S. Cahon and
N. Melab and
E.-G. Talbi Building with ParadisEO reusable
parallel and distributed evolutionary
algorithms . . . . . . . . . . . . . . . 677--697
E. Alba and
F. Luna and
A. J. Nebro and
J. M. Troya Parallel heterogeneous genetic
algorithms for continuous optimization 699--719
F. de Toro Negro and
J. Ortega and
E. Ros and
S. Mota and
B. Paechter and
J. M. Martín PSFGA: Parallel processing and
evolutionary computation for
multiobjective optimisation . . . . . . 721--739
Xin-She Yang Pattern formation in enzyme inhibition
and cooperativity with parallel cellular
automata . . . . . . . . . . . . . . . . 741--751
Franciszek Seredynski and
Pascal Bouvry and
Albert Y. Zomaya Cellular automata computations and
secret key cryptography . . . . . . . . 753--766
Tiago Sousa and
Arlindo Silva and
Ana Neves Particle swarm-based data mining
algorithms for classification tasks . . 767--783
Peter Korošec and
Jurij Šilc and
Borut Robič Solving the mesh-partitioning problem
with an ant-colony algorithm . . . . . . 785--801
Forbes J. Burkowski Proximity and priority: applying a gene
expression algorithm to the Traveling
Salesperson Problem . . . . . . . . . . 803--816
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Matthew L. Massie and
Brent N. Chun and
David E. Culler The ganglia distributed monitoring
system: design, implementation, and
experience . . . . . . . . . . . . . . . 817--840
Gerassimos Barlas and
Bharadwaj Veeravalli Quantized load distribution for tree and
bus-connected processors . . . . . . . . 841--865
Nihar R. Mahapatra and
Shantanu Dutt Adaptive Quality Equalizing:
High-performance load balancing for
parallel branch-and-bound across
applications and computing systems . . . 867--881
Ching-Wen Chen and
Shih-Chang Fu A minimal links traversed dynamic
rerouting network . . . . . . . . . . . 883--898
Michael Mascagni and
Ashok Srinivasan Parameterizing parallel multiplicative
lagged-Fibonacci generators . . . . . . 899--916
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Gerhard R. Joubert Editorial note . . . . . . . . . . . . . 917--918
Peter Korošec and
Jurij Šilc and
Borut Robič ``Solving the mesh-partitioning problem
with an ant-colony algorithm'' [Parallel
Computing 30 (2004) 785--801] . . . . . 919--921
Stéphane Genaud and
Arnaud Giersch and
Frédéric Vivien Load-balancing scatter operations for
grid computing . . . . . . . . . . . . . 923--946
Ming Zhu and
Constantine Katsinis and
Wentong Cai and
Bu-Sung Lee Key Messaging on SOME-Bus clusters . . . 947--971
Teofilo F. Gonzalez and
David Serena $n$-Cube network: node disjoint shortest
paths for maximal distance pairs of
vertices . . . . . . . . . . . . . . . . 973--998
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Chun-Hsi Huang and
Sanguthevar Rajasekaran High-performance parallel bio-computing 999--1000
Mark L. Green and
Russ Miller Molecular structure determination on a
computational and data grid . . . . . . 1001--1017
Werner Dubitzky and
Damian McCourt and
Mykola Galushka and
Mathilde Romberg and
Bernd Schuller Grid-enabled data warehousing for
molecular engineering . . . . . . . . . 1019--1035
Alfredo Tirado-Ramos and
Peter M. A. Sloot and
Alfons G. Hoekstra and
Marian Bubak An integrative approach to
high-performance biomedical problem
solving environments on the Grid . . . . 1037--1055
Mark L. Green and
Russ Miller Evolutionary molecular structure
determination using grid-enabled data
mining . . . . . . . . . . . . . . . . . 1057--1071
David Piggott and
Conor Teljeur and
Alan Kelly Exploring the potential for using the
grid to support health impact assessment
modelling . . . . . . . . . . . . . . . 1073--1091
N. Jacq and
C. Blanchet and
C. Combet and
E. Cornillot and
L. Duret and
K. Kurata and
H. Nakamura and
T. Silvestre and
V. Breton Grid as a bioinformatic tool . . . . . . 1093--1107
Minyi Guo and
Michael (Shan-Hui) Ho and
Weng-Long Chang Fast parallel molecular solution to the
dominating-set problem on massively
parallel bio-computing . . . . . . . . . 1109--1125
Chain-Wu Lee and
Chun-Hsi Huang Toward cooperative genomic knowledge
inference . . . . . . . . . . . . . . . 1127--1135
John H. Miller and
Fang Zheng Large-scale simulations of cellular
signaling processes . . . . . . . . . . 1137--1149
Peter K. K. Loh and
W. J. Hsu Fault-tolerant routing for complete
Josephus Cubes . . . . . . . . . . . . . 1151--1167
Anonymous Editorial Board . . . . . . . . . . . . CO2--CO2
Jorge Buenabad-Chávez and
Henk L. Muller and
Paul W. A. Stallard and
David H. D. Warren The diffusion space of data diffusion
architectures . . . . . . . . . . . . . 1169--1193
Alexey Lastovetsky and
Ravi Reddy On performance analysis of heterogeneous
parallel algorithms . . . . . . . . . . 1195--1216
Michael Mascagni and
Hongmei Chi Parallel linear congruential generators
with Sophie--Germain moduli . . . . . . 1217--1231
Suchendra M. Bhandarkar and
Shankar R. Chandrasekaran Parallel parsing of MPEG video on a
shared-memory symmetric multiprocessor 1233--1276
Masaaki Shimasaki and
Hans P. Zima The Earth Simulator . . . . . . . . . . 1277--1278
Tetsuya Sato The Earth Simulator: roles and impacts 1279--1286
Shinichi Habata and
Kazuhiko Umezawa and
Mitsuo Yokokawa and
Shigemune Kitawaki Hardware system of the Earth Simulator 1287--1313
Takashi Yanagawa and
Kenji Suehiro Software system of the Earth Simulator 1315--1327
K. Itakura and
A. Uno and
M. Yokokawa and
T. Ishihara and
Y. Kaneda Scalability of hybrid programming for a
CFD code on the Earth Simulator . . . . 1329--1343
Akiyoshi Wakatani A parallel and scalable algorithm for
ADI method with pre-propagation and
message vectorization . . . . . . . . . 1345--1359
Kentaro Sano and
Shintaro Momose and
Hiroyuki Takizawa and
Hiroaki Kobayashi and
Tadao Nakamura Efficient parallel processing of
competitive learning algorithms . . . . 1361--1383
Jürg Hutter and
Alessandro Curioni Dual-level parallelism for ab initio
molecular dynamics: Reaching teraflop
performance with the CPMD code . . . . . 1--17
Fumihiko Ino and
Kanrou Ooyama and
Kenichi Hagihara A data distributed parallel algorithm
for nonrigid image registration . . . . 19--43
M. Salomon and
F. Heitz and
G.-R. Perrin and
J.-P. Armspach A massively parallel approach to
deformable matching of $3$D medical
images via stochastic differential
equations . . . . . . . . . . . . . . . 45--71
Stéphane Guyetant and
Mathieu Giraud and
Ludovic L'Hours and
Steven Derrien and
Stéphane Rubini and
Dominique Lavenier and
Frédéric Raimbault Cluster of re-configurable nodes for
scanning large genomic banks . . . . . . 73--96
Andrea Di Blas and
Arun Jagota and
Richard Hughey Optimizing neural networks on SIMD
parallel computers . . . . . . . . . . . 97--115
Michihiro Koibuchi and
Akiya Jouraku and
Hideharu Amano Path selection algorithm: the strategy
for designing deterministic routing from
alternative paths . . . . . . . . . . . 117--130
Hong-Chun Hsu and
Liang-Chih Chiang and
Jimmy J. M. Tan and
Lih-Hsing Hsu Fault hamiltonicity of augmented cubes 131--145
Bruno Raffin and
Han-Wei Shen and
Dirk Bartz Parallel graphics and visualization . . 147--148
T. Furumura and
L. Chen Parallel simulation of strong ground
motions during recent and historical
damaging earthquakes in Tokyo, Japan . . 149--165
Hongfeng Yu and
Kwan-Liu Ma A study of I/O methods for parallel
visualization of large-scale data . . . 167--183
Jinzhu Gao and
Chaoli Wang and
Liya Li and
Han-Wei Shen A parallel multiresolution volume
rendering algorithm for large data
visualization . . . . . . . . . . . . . 185--204
M. Strengert and
M. Magallón and
D. Weiskopf and
Stefan Guthe and
T. Ertl Large volume visualization of compressed
time-dependent datasets on GPU clusters 205--219
David E. DeMarle and
Christiaan P. Gribble and
Solomon Boulos and
Steven G. Parker Memory sharing for interactive ray
tracing on clusters . . . . . . . . . . 221--242
Kevin Liang and
Patricia Monger and
Hugh Couchman Interactive parallel visualization of
large particle datasets . . . . . . . . 243--260
Erich Strohmaier and
Jack J. Dongarra and
Hans W. Meuer and
Horst D. Simon Recent trends in the marketplace of high
performance computing . . . . . . . . . 261--273
Iain S. Duff and
Jennifer A. Scott Stabilized bordered block diagonal forms
for parallel sparse solvers . . . . . . 275--289
Arijit Laha and
Amitava Sen and
Bhabani P. Sinha Parallel algorithms for identifying
convex and non-convex basis polygons in
an image . . . . . . . . . . . . . . . . 290--310
Bhanu Hariharan and
Srinivas Aluru Efficient parallel algorithms and
software for compressed octrees with
applications to hierarchical methods . . 311--331
Li Chunlin and
Li Layuan A distributed utility-based two level
market solution for optimal resource
scheduling in computational grid . . . . 332--351
Takashi Midorikawa and
Daisuke Shiraishi and
Masayoshi Shigeno and
Yasuki Tanabe and
Toshihiro Hanawa and
Hideharu Amano The performance of SNAIL-2 a (S2SS-MIN
connected multiprocessor with cache
coherent mechanism) . . . . . . . . . . 352--370
Yuan-Hsiang Teng and
Jimmy J. M. Tan and
Lih-Hsing Hsu Honeycomb rectangular disks . . . . . . 371--388
Dong Xiang and
Ai Chen and
Jiaguang Sun Fault-tolerant routing and multicasting
in hypercubes using a partial path
set-up . . . . . . . . . . . . . . . . . 389--411
Daniel A. Reed and
Mitsuhisa Sato and
Denis Trystram Editorial . . . . . . . . . . . . . . . 413--413
Margreet Nool and
Michael M. J. Proot A parallel least-squares spectral
element solver for incompressible flow
problems on unstructured grids . . . . . 414--438
Jacques M. Bahi and
Sylvain Contassot-Vivier and
Raphaël Couturier Evaluation of the asynchronous iterative
algorithms in the context of distant
heterogeneous clusters . . . . . . . . . 439--461
Ghazi Al-Rawi and
John Cioffi and
Mark Horowitz On task mapping optimization for
parallel decoding of low-density
parity-check codes on message-passing
architectures . . . . . . . . . . . . . 462--490
Josef Kohout and
Ivana Kolingerová and
Jiří Žára Parallel Delaunay triangulation in $E^2$
and $E^3$ for computers with shared
memory . . . . . . . . . . . . . . . . . 491--522
Z. Du and
F. Lin A novel parallelization approach for
hierarchical clustering . . . . . . . . 523--527
Sanya Tangpongprasit and
Takahiro Katagiri and
Kenji Kise and
Hiroki Honda and
Toshitsugu Yuba A time-to-live based reservation
algorithm on fully decentralized
resource discovery in Grid computing . . 529--543
Oscar Plata and
Rafael Asenjo and
Eladio Gutiérrez and
Francisco Corbera and
Angeles Navarro and
Emilio L. Zapata On the parallelization of irregular and
dynamic programs . . . . . . . . . . . . 544--562
J. Verkaik and
H. X. Lin A class of novel parallel algorithms for
the solution of tridiagonal systems . . 563--587
Robert W. Numrich Parallel numerical algorithms based on
tensor notation and Co-Array Fortran
syntax . . . . . . . . . . . . . . . . . 588--607
Marcello Balduccini and
Enrico Pontelli and
Omar Elkhatib and
Hung Le Issues in parallel execution of
non-monotonic reasoning systems . . . . 608--647
Alexey Kalinov and
Alexey Lastovetsky and
Yves Robert Heterogeneous computing . . . . . . . . 649--652
T. Hagras and
J. Janeček A high performance, low complexity
algorithm for compile-time task
scheduling in heterogeneous systems . . 653--670
S. Shivle and
P. Sugavanam and
H. J. Siegel and
A. A. Maciejewski and
T. Banka and
K. Chindam and
S. Dussinger and
A. Kutruff and
P. Penumarthy and
P. Pichumani and
P. Satyasekaran and
D. Sendek and
J. Smith and
J. Sousa and
J. Sridharan and
J. Velazco Mapping subtasks with multiple versions
on an ad hoc grid . . . . . . . . . . . 671--690
Yoshinori Kishimoto and
Shuichi Ichikawa Optimizing the configuration of a
heterogeneous cluster with
multiprocessing and execution-time
estimation . . . . . . . . . . . . . . . 691--710
Javier Cuenca and
Domingo Giménez and
Juan-Pedro Martínez Heuristics for work distribution of a
homogeneous parallel dynamic programming
scheme on heterogeneous systems . . . . 711--735
Ioana Banicescu and
Ricolindo L. Cariño and
Jaderick P. Pabico and
Mahadevan Balasubramaniam Design and implementation of a novel
dynamic load balancing library for
cluster computing . . . . . . . . . . . 736--756
M-Tahar Kechadi and
Ilias K. Savvas Dynamic task scheduling for irregular
network topologies . . . . . . . . . . . 757--776
A. Srinivasan and
N. Chandra Latency tolerance through
parallelization of time in scientific
applications . . . . . . . . . . . . . . 777--796
Han Yu and
Xin Bai and
Dan C. Marinescu Workflow management and resource
discovery for an intelligent grid . . . 797--811
Bruno Richard and
Nicolas Maillard and
César A. F. De Rose and
Reynaldo Novaes The I-Cluster Cloud: distributed
management of idle resources for intense
computing . . . . . . . . . . . . . . . 813--838
Z. G. Wang and
Y. S. Wong and
M. Rahman Development of a parallel optimization
method based on genetic simulated
annealing algorithm . . . . . . . . . . 839--857
J. C. Pichel and
D. B. Heras and
J. C. Cabaleiro and
F. F. Rivera Performance optimization of irregular
codes based on the combination of
reordering and blocking techniques . . . 858--876
G. L. Reijns and
A. J. C. van Gemund Predicting the execution times of
parallel-independent programs using
Pearson distributions . . . . . . . . . 877--899
Uroš Čibej and
Boštjan Slivnik and
Borut Robič The complexity of static data
replication in data grids . . . . . . . 900--912
Jürgen Dreher and
Rainer Grauer Racoon: a parallel mesh-adaptive
framework for hyperbolic conservation
laws . . . . . . . . . . . . . . . . . . 913--932
Tao Dong A linear time pessimistic one-step
diagnosis algorithm for hypercube
multicomputer systems . . . . . . . . . 933--947
Hayedeh Ahrabian and
Abbas Nowzari-Dalini Parallel generation of binary trees in
$A$-order . . . . . . . . . . . . . . . 948--955
Barbara M. Chapman and
Federico Massaioli OpenMP . . . . . . . . . . . . . . . . . 957--959
Xinmin Tian and
Jay P. Hoeflinger and
Grant Haab and
Yen-Kuang Chen and
Milind Girkar and
Sanjiv Shah A compiler for exploiting nested
parallelism in OpenMP programs . . . . . 960--983
R. Blikberg and
T. Sørevik Load balancing and OpenMP implementation
of nested parallelism . . . . . . . . . 984--998
C. S. Ierotheou and
H. Jin and
G. Matthews and
S. P. Johnson and
R. Hood Generating OpenMP code using an
interactive parallelization environment 999--1012
Rocco Aversa and
Beniamino Di Martino and
Massimiliano Rak and
Salvatore Venticinque and
Umberto Villano Performance prediction through
simulation of a hybrid MPI/OpenMP
application . . . . . . . . . . . . . . 1013--1033
Rocco Aversa and
Beniamino Di Martino and
Nicola Mazzocca and
Salvatore Venticinque A hierarchical distributed-shared memory
parallel Branch & Bound application with
PVM and OpenMP for multiprocessor
clusters . . . . . . . . . . . . . . . . 1034--1047
Kengo Nakajima Parallel iterative solvers for
finite-element methods using an
OpenMP/MPI hybrid programming model on
the Earth Simulator . . . . . . . . . . 1048--1065
Federico Massaioli and
Filippo Castiglione and
Massimo Bernaschi OpenMP parallelization of agent-based
models . . . . . . . . . . . . . . . . . 1066--1081
Roland Norcen and
Andreas Uhl High performance JPEG 2000 and MPEG-4
VTC on SMPs using OpenMP . . . . . . . . 1082--1098
Inho Park and
Seon Wook Kim Study of OpenMP applications on the
InfiniBand-based software distributed
shared-memory system . . . . . . . . . . 1099--1113
Lei Huang and
Barbara Chapman and
Zhenying Liu Towards a more efficient implementation
of OpenMP for clusters via translation
to global arrays . . . . . . . . . . . . 1114--1139
Motonori Hirano and
Mitsuhisa Sato and
Yoshio Tanaka OpenGR: a directive-based grid
programming environment . . . . . . . . 1140--1154
P. E. Hadjidoukas and
T. S. Papatheodorou OpenMP extensions for master-slave
message passing computing . . . . . . . 1155--1167
Anonymous Editorial Board . . . . . . . . . . . . iv--vi
P. Wapperom and
A. N. Beris and
M. A. Straka A new transpose split method for
three-dimensional FFTs: performance on
an Origin2000 and Alphaserver cluster 1--13
Chun-Hsi Huang and
Xin He and
Min Qian Communication-optimal parallel
parenthesis matching . . . . . . . . . . 14--23
Kazuhide Nakata and
Makoto Yamashita and
Katsuki Fujisawa and
Masakazu Kojima A parallel primal-dual interior-point
method for semidefinite programs using
positive definite matrix completion . . 24--43
Valmir C. Barbosa and
Fernando M. N. Miranda and
Matheus C. M. Agostini Cell-centric heuristics for the
classification of cellular automata . . 44--66
L. Carracciuolo and
L. D'Amore and
A. Murli Towards a parallel component for imaging
in PETSc programming environment: a case
study in $3$-D echocardiography . . . . 67--83
Sun-Yuan Hsieh Fault-tolerant cycle embedding in the
hypercube with more both faulty vertices
and faulty edges . . . . . . . . . . . . 84--91
Takahiro Katagiri and
Kenji Kise and
Hiroki Honda and
Toshitsugu Yuba ABCLibScript: a directive to support
specification of an auto-tuning facility
for numerical software . . . . . . . . . 92--112
Maurice Clint and
Efstratios Gallopoulos and
Esmond Ng and
Jean Roman Parallel Matrix Algorithms and
Applications (PMAA'04) . . . . . . . . . 113--114
Asad Awan and
Ronaldo A. Ferreira and
Suresh Jagannathan and
Ananth Grama Unstructured peer-to-peer networks for
sharing processor cycles . . . . . . . . 115--135
Patrick R. Amestoy and
Abdou Guermouche and
Jean-Yves L'Excellent and
Stéphane Pralet Hybrid scheduling for the parallel
solution of linear systems . . . . . . . 136--156
Peter Arbenz and
Martin Bečka and
Roman Geus and
Ulrich Hetmaniuk and
Tiziano Mengotti On a parallel multilevel preconditioned
Maxwell eigensolver . . . . . . . . . . 157--165
Gabriel Okša and
Marián Vajteršic Efficient pre-processing in the parallel
block-Jacobi SVD algorithm . . . . . . . 166--176
Eric Polizzi and
Ahmed H. Sameh A parallel hybrid banded system solver:
the SPIKE algorithm . . . . . . . . . . 177--194
Petko Yanev and
Erricos John Kontoghiorghes Efficient algorithms for estimating the
general linear model . . . . . . . . . . 195--204
P. Rajesh Kumar and
K. Sridharan and
S. Srinivasan A parallel algorithm, architecture and
FPGA realization for landmark
determination and map construction in a
planar unknown environment . . . . . . . 205--221
Marc Hofmann and
Erricos John Kontoghiorghes Pipeline Givens sequences for computing
the QR decomposition on a EREW PRAM . . 222--230
Takahiro Katagiri and
Kenji Kise and
Hiroki Honda and
Toshitsugu Yuba ABCLib_DRSSED: a parallel eigensolver
with an auto-tuning facility . . . . . . 231--250
Nahid Emad and
Ani Sedrakian Toward the reusability for iterative
linear algebra software in distributed
environment . . . . . . . . . . . . . . 251--266
R. S. Montero and
E. Huedo and
I. M. Llorente Benchmarking of high throughput
computing applications on Grids . . . . 267--279
Makoto Satoh and
Kiyoshi Negishi and
Atsushi Kobayashi Analysis of two-level data mapping in an
HPF compiler for distributed-memory
machines . . . . . . . . . . . . . . . . 280--300
Prasanta K. Jana Polynomial interpolation and polynomial
root finding on OTIS-mesh . . . . . . . 301--312
Silvia M. Figueira Optimal partitioning of nodes to
space-sharing parallel tasks . . . . . . 313--324
R. Hatzky Domain cloning for a particle-in-cell
(PIC) code on a cluster of
symmetric-multiprocessor (SMP) computers 325--330
Xiao Qin and
Hong Jiang A novel fault-tolerant scheduling
algorithm for precedence constrained
tasks in real-time heterogeneous systems 331--356
Gianluigi Folino and
Giuseppe Mendicino and
Alfonso Senatore and
Giandomenico Spezzano and
Salvatore Straface A model based on cellular automata for
the parallel simulation of $3$D
unsaturated flow . . . . . . . . . . . . 357--376
F. Luna and
A. J. Nebro and
E. Alba Observations in using Grid-enabled
technologies for solving multi-objective
optimization problems . . . . . . . . . 377--393
A. H. Baker and
R. D. Falgout and
U. M. Yang An assumed partition algorithm for
determining processor
inter-communication . . . . . . . . . . 394--414
E. Alba and
F. Almeida and
M. Blesa and
C. Cotta and
M. Díaz and
I. Dorta and
J. Gabarró and
C. León and
G. Luque and
J. Petit and
C. Rodríguez and
A. Rojas and
F. Xhafa Efficient parallel LAN/WAN algorithms
for optimization. The MALLBA Project 415--440
Zhihua Du and
Feng Lin pNJTree: a parallel program for
reconstruction of neighbor-joining tree
and its application in ClustalW . . . . 441--446
Herbert Kuchen and
Murray Cole Editorial . . . . . . . . . . . . . . . 447--448
Marco Danelutto and
Marco Aldinucci Algorithmic skeletons meeting grids . . 449--462
Xiao Yan Deng and
Greg Michaelson and
Phil Trinder Autonomous mobility skeletons . . . . . 463--478
Horacio González-Vélez Self-adaptive skeletal task farm for
computational grids . . . . . . . . . . 479--490
Antonio Dorta and
Pablo López and
Francisco de Sande Basic skeletons in llc . . . . . . . . . 491--506
Clemens Grelck and
Sven-Bodo Scholz Merging compositions of array skeletons
in SaC . . . . . . . . . . . . . . . . . 507--522
Mercedes Hidalgo-Herrero and
Yolanda Ortega-Mallén and
Fernando Rubio Analyzing the influence of mixed
evaluation on the performance of Eden
skeletons . . . . . . . . . . . . . . . 523--538
F. Clément and
V. Martin and
A. Vodicka and
R. Di Cosmo and
P. Weis Domain decomposition and skeleton
programming with OCamlP3l . . . . . . . 539--550
Rob H. Bisseling and
Ildikó Flesch Mondriaan sparse matrix partitioning for
attacking cryptosystems by a parallel
block Lanczos algorithm --- a case study 551--567
E. Cesar and
A. Moreno and
J. Sorribes and
E. Luque Modeling Master/Worker applications for
automatic performance tuning . . . . . . 568--589
Kiminori Matsuzaki and
Zhenjiang Hu and
Masato Takeichi Parallel skeletons for manipulating
general trees . . . . . . . . . . . . . 590--603
J. Falcou and
J. Sérot and
T. Chateau and
J. T. Lapresté Quaff: efficient C++ design for parallel
skeletons . . . . . . . . . . . . . . . 604--615
Paras Mehta and
José Nelson Amaral and
Duane Szafron Is MPI suitable for a generative
design-pattern system? . . . . . . . . . 616--626
Jeff Linderoth and
Roberto Musmanno Optimization on grids --- optimization
for grids . . . . . . . . . . . . . . . 627--628
Lúcia M. A. Drummond and
Eduardo Uchoa and
Alexandre D. Gonçalves and
Juliana M. N. Silva and
Marcelo C. P. Santos and
Maria Clícia S. de Castro A grid-enabled distributed
branch-and-bound algorithm with
application on the Steiner Problem in
graphs . . . . . . . . . . . . . . . . . 629--642
N. Melab and
M. Mezmaz and
E.-G. Talbi Parallel cooperative meta-heuristics on
the computational grid. A case study:
the bi-objective Flow-Shop problem . . . 643--659
Wahid Chrabakh and
Rich Wolski GridSAT: a system for solving
satisfiability problems using a
computational grid . . . . . . . . . . . 660--687
Demetrio Laganá and
Pasquale Legato and
Ornella Pisacane and
Francesca Vocaturo Solving simulation optimization problems
on grid computing systems . . . . . . . 688--700
Andrea Attanasio and
Gianpaolo Ghiani and
Lucio Grandinetti and
Francesca Guerriero Auction algorithms for decentralized
parallel machine scheduling . . . . . . 701--709
Georgios Goumas and
Nikolaos Drosinos and
Maria Athanasaki and
Nectarios Koziris Message-passing code generation for
non-rectangular tiling transformations 711--732
Hon F. Li and
Zunce Wei and
Dhrubajyoti Goswami Quasi-atomic recovery for distributed
agents . . . . . . . . . . . . . . . . . 733--758
Savina Bansal and
Padam Kumar and
Kuldip Singh An improved two-step algorithm for task
and data parallel scheduling in
distributed memory machines . . . . . . 759--774
H. Sarbazi-Azad and
M. Ould-Khaoua and
A. Y. Zomaya Performance evaluation of communication
networks for parallel and distributed
systems . . . . . . . . . . . . . . . . 775--776
Luca Gatani and
Giuseppe Lo Re and
Salvatore Gaglio An efficient distributed algorithm for
generating and updating multicast trees 777--793
Rod Fatoohi and
Ken Kardys and
Sumy Koshy and
Soundarya Sivaramakrishnan and
Jeffrey S. Vetter Performance evaluation of high-speed
interconnects using dense communication
patterns . . . . . . . . . . . . . . . . 794--807
James Broberg and
Zahir Tari and
Panlop Zeephongsekul Task assignment with work-conserving
migration . . . . . . . . . . . . . . . 808--830
Bahman Javadi and
Mohammad K. Akbari and
Jemal H. Abawajy A performance model for analysis of
heterogeneous multi-cluster systems . . 831--851
Masaru Takesue The psi-cube: a bus-based cube-type
clustering network for high-performance
on-chip systems . . . . . . . . . . . . 852--869
A. Shahrabi Performance comparison of routing
algorithms in wormhole-switched networks 870--885
M. Hoseiny Farahabady and
F. Safaei and
A. Khonsari and
M. Fathy Characterization of spatial fault
patterns in interconnection networks . . 886--901
Azzedine Boukerche and
Caron Dzermajko and
Lu Kaiyuan An enhancement towards dynamic
grid-based DDM protocol for distributed
simulation using multiple levels of data
filtering . . . . . . . . . . . . . . . 902--919
Dan Reed Changes and updates . . . . . . . . . . 1--1
Jong Wook Kwak and
Chu Shik Jhon Torus Ring: improving performance of
interconnection network by modifying
hierarchical ring . . . . . . . . . . . 2--20
Celso C. Ribeiro and
Isabel Rosseti Efficient parallel cooperative
implementations of GRASP heuristics . . 21--35
Meijie Ma and
Guizhen Liu and
Jun-Ming Xu Panconnectivity and edge-fault-tolerant
pancyclicity of augmented cubes . . . . 36--42
James S. Hammonds and
Faisal Saied and
Mark A. Shannon Solving coupled $3$-D paraxial wave and
thermal diffusion equations with
mixed-mode parallel computations . . . . 43--53
Gregorio Bernabé and
Ricardo Fernández and
Jose M. García and
Manuel E. Acacio and
José González An efficient implementation of a $3$D
wavelet transform based encoder on
hyper-threading technology . . . . . . . 54--72
Jinn-Shyong Yang and
Shyue-Ming Tang and
Jou-Ming Chang and
Yue-Li Wang Parallel construction of optimal
independent spanning trees on hypercubes 73--79
Osman Yaşar and
Hasan Dağ Trends in parallel computing . . . . . . 81--82
Hasan Dağ An approximate inverse preconditioner
and its implementation for conjugate
gradient method . . . . . . . . . . . . 83--91
Halis Sak and
Süleyman Özekici and
İlkay Boduroğlu Parallel computing in Asian option
pricing . . . . . . . . . . . . . . . . 92--108
Omar Ramadan Three dimensional MPI parallel
implementation of the PML algorithm for
truncating finite-difference time-domain
Grids . . . . . . . . . . . . . . . . . 109--115
Peter Rissland and
Yuefan Deng Electrostatic force computation for
bio-molecules on supercomputers with
torus networks . . . . . . . . . . . . . 116--123
Ferat Sahin and
M. Çetin Yavuz and
Ziya Arnavut and
Önder Uluyol Fault diagnosis for airplane engines
using Bayesian networks and distributed
particle swarm optimization . . . . . . 124--143
César A. F. De Rose and
Hans-Ulrich Heiss and
Barry Linnert Distributed dynamic processor allocation
for multicomputers . . . . . . . . . . . 145--158
Alessia Gualandris and
Simon Portegies Zwart and
Alfredo Tirado-Ramos Performance analysis of direct N . . . . 159--173
Zeyao Mo and
Xiaowen Xu Relaxed RS0 or CLJP coarsening strategy
for parallel AMG . . . . . . . . . . . . 174--185
D. D'Ambrosio and
W. Spataro Parallel evolutionary modelling of
geological processes . . . . . . . . . . 186--212
Walfredo Cirne and
Francisco Brasileiro and
Daniel Paranhos and
Luís Fabrício W. Góes and
William Voorsluys On the efficacy, efficiency and emergent
behavior of task replication in large
distributed systems . . . . . . . . . . 213--234
Christophe Cérin Large scale grids . . . . . . . . . . . 235--237
Vandy Berten and
Bruno Gaujal Brokering strategies in computational
grids using stochastic prediction models 238--249
J. R. Bilbao-Castro and
A. Merino and
I. García and
J. M. Carazo and
J. J. Fernández Parameter optimization in $3$D
reconstruction on a large scale grid . . 250--263
Benjamin Gaidioz and
Birger Koblitz and
Nuno Santos Exploring high performance distributed
file storage using LDPC codes . . . . . 264--274
Denis Caromel and
Alexandre di Costanzo and
Clément Mathieu Peer-to-peer for computational grids:
mixing clusters and desktop machines . . 275--288
Nicolas Jacq and
Vincent Breton and
Hsin-Yen Chen and
Li-Yung Ho and
Martin Hofmann and
Vinod Kasam and
Hurng-Chun Lee and
Yannick Legré and
Simon C. Lin and
Astrid Maaß and
Emmanuel Medernach and
Ivan Merelli and
Luciano Milanesi and
Giulio Rastelli and
Matthieu Reichstadt and
Jean Salzemann and
Horst Schwichtenberg and
Ying-Ta Wu and
Marc Zimmermann Virtual screening on large scale grids 289--301
M. Mezmaz and
N. Melab and
E.-G. Talbi An efficient load balancing strategy for
grid-based branch and bound algorithm 302--313
Hiroshi Yamauchi and
Dongyan Xu Portable virtual cycle accounting for
large-scale distributed cycle sharing
systems . . . . . . . . . . . . . . . . 314--327
Eun-Kyu Byun and
Jin-Soo Kim DynaGrid: a dynamic service deployment
and resource migration framework for
WSRF-compliant applications . . . . . . 328--338
Moreno Marzolla and
Matteo Mordacchini and
Salvatore Orlando Peer-to-peer systems for discovering
resources in a dynamic grid . . . . . . 339--358
Luis Paulo Santos and
Bruno Raffin and
Alan Heirich Parallel graphics and visualization . . 359--360
K. Debattista and
A. Chalmers and
R. Gillibrand and
P. Longhurst and
G. Mastoropoulou and
V. Sundstedt Parallel selective rendering of
high-fidelity virtual environments . . . 361--376
Bernhard Thomaszewski and
Wolfgang Blochinger Physically based simulation of cloth on
distributed memory architectures . . . . 377--390
Fábio F. Bernardon and
Steven P. Callahan and
João L. D. Comba and
Cláudio T. Silva An adaptive framework for visualizing
unstructured grids with time-varying
scalar fields . . . . . . . . . . . . . 391--405
C. Müller and
M. Strengert and
T. Ertl Adaptive load balancing for raycasting
of non-uniformly bricked volumes . . . . 406--419
D. Cotting and
M. Waschbüsch and
M. Duller and
M. Gross WinSGL: synchronizing displays in
parallel graphics using cost-effective
software genlocking . . . . . . . . . . 420--437
Mario Lorenz and
Guido Brunnett and
Marcel Heinz Driving tiled displays with an extended
Chromium system based on stream cached
multicast communication . . . . . . . . 438--466
Chao-Tung Yang and
Kuan-Wei Cheng and
Wen-Chung Shih On development of an efficient parallel
loop self-scheduling for grid computing
environments . . . . . . . . . . . . . . 467--487
Jung-Sheng Fu Conditional fault-tolerant hamiltonicity
of star graphs . . . . . . . . . . . . . 488--496
Henrique Andrade and
Tahsin Kurc and
Alan Sussman and
Joel Saltz Active semantic caching to optimize
multidimensional data analysis in
parallel and distributed environments 497--520
V. Hernandez and
J. E. Roman and
A. Tomas Parallel Arnoldi eigensolvers with
enhanced scalability via global
communications rearrangement . . . . . . 521--540
T. Esposti Ongaro and
C. Cavazzoni and
G. Erbacci and
A. Neri and
M. V. Salvetti A parallel multiphase flow code for the
$3$D simulation of explosive volcanic
eruptions . . . . . . . . . . . . . . . 541--560
Isaac D. Scherson and
Daniel S. Valencia and
Enrique Cauich Service address routing: a
network-embedded resource management
layer for cluster computing . . . . . . 561--571
Wei Jie and
Wentong Cai and
Lizhe Wang and
Rob Procter A secure information service for
monitoring large scale grids . . . . . . 572--591
Bernd Mohr and
Jesper Larsson Träff and
Joachim Worringen Selected papers from EuroPVM/MPI 2006 593--594
William Gropp and
Rajeev Thakur Thread-safety in an MPI implementation:
Requirements and analysis . . . . . . . 595--604
Fabian Kulla and
Peter Sanders Scalable parallel suffix array
construction . . . . . . . . . . . . . . 605--612
Jelena Pješivac-Grbović and
George Bosilca and
Graham E. Fagg and
Thara Angskun and
Jack J. Dongarra MPI collective algorithm selection and
quadtree encoding . . . . . . . . . . . 613--623
Torsten Hoefler and
Peter Gottschling and
Andrew Lumsdaine and
Wolfgang Rehm Optimizing a conjugate gradient solver
with non-blocking collective operations 624--633
Darius Buntinas and
Guillaume Mercier and
William Gropp Implementation and evaluation of
shared-memory communication and
synchronization operations in MPICH2
using the Nemesis communication
subsystem . . . . . . . . . . . . . . . 634--644
Wu-chun Feng and
Dinesh Manocha High-performance computing using
accelerators . . . . . . . . . . . . . . 645--647
Patrick McCormick and
Jeff Inman and
James Ahrens and
Jamaludin Mohd-Yusof and
Greg Roth and
Sharen Cummins Scout: a data-parallel programming
language for graphics processors . . . . 648--662
Naga K. Govindaraju and
Dinesh Manocha Cache-efficient numerical algorithms
using graphics hardware . . . . . . . . 663--684
Dominik Göddeke and
Robert Strzodka and
Jamaludin Mohd-Yusof and
Patrick McCormick and
Sven H. M. Buijssen and
Matthias Grajewski and
Stefan Turek Exploring weak scalability for FEM
calculations on a GPU-enhanced cluster 685--699
Filip Blagojevic and
Dimitrios S. Nikolopoulos and
Alexandros Stamatakis and
Christos D. Antonopoulos and
Matthew Curtis-Maury Runtime scheduling of dynamic
parallelism on accelerator-based
multi-core systems . . . . . . . . . . . 700--719
David A. Bader and
Virat Agarwal and
Kamesh Madduri and
Seunghwa Kang High performance combinatorial algorithm
design on the Cell Broadband Engine
processor . . . . . . . . . . . . . . . 720--740
Martin C. Herbordt and
Josh Model and
Bharat Sukhwani and
Yongfeng Gu and
Tom VanCourt Single pass streaming BLAST on FPGAs . . 741--756
Alexey Lastovetsky and
Ravi Reddy Data distribution for dense
factorization on computers with memory
heterogeneity . . . . . . . . . . . . . 757--779
J. Xu Benchmarks on tera-scalable models for
DNS of turbulent channel flow . . . . . 780--794
N. Botta and
C. Ionescu Relation-based computations in a monadic
BSP model . . . . . . . . . . . . . . . 795--821
M. Vanneschi and
L. Veraldi Dynamicity in distributed applications:
issues, problems and the ASSIST approach 822--845
V. Santhosh Kumar and
R. Nanjundiah and
M. J. Thazhuthaveetil and
R. Govindarajan Impact of message compression on the
scalability of an atmospheric modeling
application on clusters . . . . . . . . 1--16
Yuhui Deng and
Frank Wang and
Na Helian and
Sining Wu and
Chenhan Liao Dynamic and scalable storage management
architecture for Grid Oriented Storage
devices . . . . . . . . . . . . . . . . 17--31
Jason Brazile and
Rudolf Richter and
Daniel Schläpfer and
Michael E. Schaepman and
Klaus I. Itten Cluster versus grid for operational
generation of ATCOR's MODTRAN-based
look up tables . . . . . . . . . . . . . 32--46
Albert Chan and
Frank Dehne and
Prosenjit Bose and
Markus Latzel Coarse grained parallel algorithms for
graph matching . . . . . . . . . . . . . 47--62
Fouad B. Chedid An optimal parallelization of the
two-list algorithm of cost
${O}(2^{n/2})$ . . . . . . . . . . . . . 63--65
Anonymous Acknowledgement to reviewers . . . . . . 66--68
Andrzej M. Goscinski and
Adam K. L. Wong A study of the concurrent execution of
parallel and sequential applications on
a non-dedicated cluster . . . . . . . . 69--91
Antonio Plaza and
David Valencia and
Javier Plaza An experimental comparison of parallel
algorithms for hyperspectral analysis
using heterogeneous and homogeneous
networks of workstations . . . . . . . . 92--114
A. Murli and
L. D'Amore and
L. Carracciuolo and
M. Ceccarelli and
L. Antonelli High performance edge-preserving
regularization in $3$D SPECT imaging . . 115--132
Eladio Gutiérrez and
Oscar Plata and
Emilio L. Zapata An analytical model of locality-based
parallel irregular reductions . . . . . 133--157
Jean-François Pineau and
Yves Robert and
Frédéric Vivien The impact of heterogeneity on
master-slave scheduling . . . . . . . . 158--176
S. Chandra Sekhara Rao and
Sarita Parallel solution of large symmetric
tridiagonal linear systems . . . . . . . 177--197
Volodymyr Kindratenko and
Duncan Buell Reconfigurable Systems Summer Institute
2007 . . . . . . . . . . . . . . . . . . 199--200
Roger D. Chamberlain and
Joseph M. Lancaster and
Ron K. Cytron Visions for application development on
hybrid computing systems . . . . . . . . 201--216
Seth Koehler and
John Curreri and
Alan D. George Performance analysis challenges and
framework for high-performance
reconfigurable computing . . . . . . . . 217--230
M. Wirthlin and
D. Poznanovic and
P. Sundararajan and
A. Coppola and
D. Pellerin and
W. Najjar and
R. Bruce and
M. Babst and
O. Pritchard and
P. Palazzari and
G. Kuzmanov OpenFPGA CoreLib core library
interoperability effort . . . . . . . . 231--244
Proshanta Saha and
Esam El-Araby and
Miaoqing Huang and
Mohamed Taher and
Sergio Lopez-Buedo and
Tarek El-Ghazawi and
Chang Shu and
Kris Gaj and
Alan Michalski and
Duncan Buell Portable library development for
reconfigurable computing systems: a case
study . . . . . . . . . . . . . . . . . 245--260
Yongfeng Gu and
Tom VanCourt and
Martin C. Herbordt Explicit design of FPGA-based
coprocessors for short-range force
computations in molecular dynamics
simulations . . . . . . . . . . . . . . 261--277
Akila Gothandaraman and
Gregory D. Peterson and
G. L. Warren and
Robert J. Hinde and
Robert J. Harrison FPGA acceleration of a quantum Monte
Carlo application . . . . . . . . . . . 278--291
Laura Grigori and
Bernard Philippe and
Ahmed Sameh and
Damien Tromeur-Dervout and
Marian Vajtersic Parallel matrix algorithms and
applications . . . . . . . . . . . . . . 293--295
Emmanuel Agullo and
Abdou Guermouche and
Jean-Yves L'Excellent A parallel out-of-core multifrontal
method: Storage of factors on disk and
analysis of models for an out-of-core
active memory . . . . . . . . . . . . . 296--317
C. Chevalier and
F. Pellegrini PT-Scotch: a tool for efficient parallel
graph ordering . . . . . . . . . . . . . 318--331
Guy Antoine Atenekeng Kahou and
Laura Grigori and
Masha Sosonkina A partitioning algorithm for
block-diagonal matrices with overlap . . 332--344
Pascal Hénon and
Pierre Ramet and
Jean Roman On finding approximate supernodes for an
efficient block-ILU$(k)$ factorization 345--362
L. Giraud and
A. Haidar and
L. T. Watson Parallel scalability study of hybrid
preconditioners in three dimensions . . 363--379
Raphaël Couturier and
Christophe Denis and
Fabienne Jézéquel GREMLINS: a large sparse linear solver
for grid environment . . . . . . . . . . 380--391
N. Yamanaka and
T. Ogita and
S. M. Rump and
S. Oishi A parallel algorithm for accurate dot
product . . . . . . . . . . . . . . . . 392--410
S. Hunold and
T. Rauber and
G. Rünger Combining building blocks for parallel
multi-level matrix multiplication . . . 411--426
Kok Fu Ng and
Norhashidah Hj. Mohd Ali Performance analysis of explicit group
parallel algorithms for distributed
memory multicomputer . . . . . . . . . . 427--440
C. Bekas and
A. Curioni and
W. Andreoni Atomic wavefunction initialization in ab
initio molecular dynamics using
distributed Lanczos . . . . . . . . . . 441--450
Petko I. Yanev and
Erricos J. Kontoghiorghes Parallel algorithms for downdating the
least squares estimator of the
regression model . . . . . . . . . . . . 451--468
Maria Lucka and
Igor Melichercik and
Ladislav Halada Application of multistage stochastic
programs solved in parallel in portfolio
management . . . . . . . . . . . . . . . 469--485
Dajin Wang A linear-time algorithm for computing
collision-free path on reconfigurable
mesh . . . . . . . . . . . . . . . . . . 487--496
Yasheng Maimaitijiang and
Mohammed Ali Roula and
Stuart Watson and
Ralf Patz and
Robert J. Williams and
Huw Griffiths Parallelization methods for
implementation of a magnetic induction
tomography forward model in symmetric
multiprocessor systems . . . . . . . . . 497--507
Lee Kee Goh and
Bharadwaj Veeravalli Design and performance evaluation of
combined first-fit task allocation and
migration strategies in mesh
multiprocessor systems . . . . . . . . . 508--520
Wei-Ming Lin Performance modeling and analysis of
correlated parallel computations . . . . 521--538
J. Sánchez-Curto and
P. Chamorro-Posada On a faster parallel implementation of
the split-step Fourier method . . . . . 539--549
Julien Straubhaar Parallel preconditioners for the
conjugate gradient algorithm using
Gram--Schmidt and least squares methods 551--569
Woo-Chul Jeun and
Yang-Suk Kee and
Soonhoi Ha and
Changdon Kee Overcoming performance bottlenecks in
using OpenMP on SMP clusters . . . . . . 570--592
Carlo Mastroianni and
Domenico Talia and
Oreste Verta Designing an information system for
Grids: Comparing hierarchical,
decentralized P2P and super-peer models 593--611
David A. Bader and
Srinivas Aluru High-performance computational biology 613--615
Vipin Sachdeva and
Michael Kistler and
Evan Speight and
Tzy-Hwa Kathy Tzeng Exploring the viability of the Cell
Broadband Engine for bioinformatics
applications . . . . . . . . . . . . . . 616--626
David A. Bader and
Kamesh Madduri A graph-theoretic analysis of the human
protein-interaction network using
multicore parallel algorithms . . . . . 627--639
Sadaf R. Alam and
Pratul K. Agarwal and
Jeffrey S. Vetter Performance characteristics of
biomolecular simulations on high-end
systems with multi-core processors . . . 640--651
P. Brenner and
J. M. Wozniak and
D. Thain and
A. Striegel and
J. W. Peng and
J. A. Izaguirre Biomolecular committor probability
calculation enabled by processing in
network storage . . . . . . . . . . . . 652--660
Michela Taufer and
Ming-Ying Leung and
Thamar Solorio and
Abel Licon and
David Mireles and
Roberto Araiza and
Kyle L. Johnson RNAVLab: a virtual laboratory for
studying RNA secondary structures based
on grid computing technology . . . . . . 661--680
Tim Oliver and
Leow Yuan Yeow and
Bertil Schmidt Integrating FPGA acceleration into HMMer 681--691
Alain Merigot and
Alfredo Petrosino Parallel processing for image and video
processing . . . . . . . . . . . . . . . 693--693
Alain Merigot and
Alfredo Petrosino Parallel processing for image and video
processing: Issues and challenges . . . 694--699
O. Kao On parallel image retrieval with
dynamically extracted features . . . . . 700--709
Myeongsoo Oh and
Kiyoharu Aizawa Large-scale image sensing by a group of
smart image sensors . . . . . . . . . . 710--717
C. Colombo and
A. Del Bimbo and
A. Valli A real-time full body tracking and
humanoid animation system . . . . . . . 718--726
Francesco Isgrò and
Domenico Tegolo A distributed genetic algorithm for
restoration of vertical line scratches 727--734
P. P. Jonker and
J. G. E. Olk and
C. Nicolescu Distributed bucket processing: a
paradigm embedded in a framework for the
parallel processing of pixel sets . . . 735--746
Radhika S. Grover and
Qiang Li and
H.-P. Dommel Performance study of data layout schemes
for a SAN-based video server . . . . . . 747--756
Paolo Gamba and
Luca Lombardi and
Marco Porta Log-map analysis . . . . . . . . . . . . 757--764
X. Meng and
V. Chaudhary Boosting data throughput for sequence
database similarity searches on FPGAs
using an adaptive buffering scheme . . . 1--11
Ricardo C. Corrêa and
Valmir C. Barbosa Partially ordered distributed
computations on asynchronous
point-to-point networks . . . . . . . . 12--28
Lih-Yuan Deng and
Huajiang Li and
Jyh-Jen Horng Shiau Scalable parallel multiple recursive
generators of large order . . . . . . . 29--37
Alfredo Buttari and
Julien Langou and
Jakub Kurzak and
Jack Dongarra A class of parallel tiled linear algebra
algorithms for multicore architectures 38--53
Anonymous Acknowledgement to reviewers . . . . . . 54--55
Anonymous Editorial Board . . . . . . . . . . . . ??
Fabrício A. B. da Silva and
Hermes Senger Improving scalability of Bag-of-Tasks
applications running on master-slave
platforms . . . . . . . . . . . . . . . 57--71
Yuh-Rau Wang A novel $O(1)$ time algorithm for $3$D
block-based medial axis transform by
peeling corner shells . . . . . . . . . 72--82
Anne Benoit and
Mourad Hakem and
Yves Robert Contention awareness and fault-tolerant
scheduling for precedence constrained
tasks in heterogeneous systems . . . . . 83--108
L. K. S. Daldorff and
B. Eliasson Parallelization of a Vlasov--Maxwell
solver in four-dimensional phase space 109--115
Rupak Biswas and
Leonid Oliker and
Jeffrey Vetter Revolutionary technologies for
acceleration of emerging petascale
applications . . . . . . . . . . . . . . 117--118
David A. Bader and
Virat Agarwal and
Seunghwa Kang Computing discrete transforms on the
Cell Broadband Engine . . . . . . . . . 119--137
Jakub Kurzak and
Wesley Alvaro and
Jack Dongarra Optimizing matrix multiplication for a
short-vector SIMD architecture --- CELL
processor . . . . . . . . . . . . . . . 138--150
Jeremy S. Meredith and
Gonzalo Alvarez and
Thomas A. Maier and
Thomas C. Schulthess and
Jeffrey S. Vetter Accuracy and performance of graphics
processors: a Quantum Monte Carlo
application case study . . . . . . . . . 151--163
David J. Hardy and
John E. Stone and
Klaus Schulten Multilevel summation of electrostatic
potentials using graphics processing
units . . . . . . . . . . . . . . . . . 164--177
Samuel Williams and
Leonid Oliker and
Richard Vuduc and
John Shalf and
Katherine Yelick and
James Demmel Optimization of sparse matrix-vector
multiplication on emerging multicore
platforms . . . . . . . . . . . . . . . 178--194
Suresh Behara and
Sanjay Mittal Parallel finite element computation of
incompressible flows . . . . . . . . . . 195--212
Arquimedes Canedo and
Ben A. Abderazek and
Masahiro Sowa Efficient compilation for queue size
constrained queue processors . . . . . . 213--225
Tien-Yien Li and
Chih-Hsiung Tsai HOM4PS-2.0para: Parallelization of
HOM4PS-2.0 for solving polynomial
systems . . . . . . . . . . . . . . . . 226--238
Sid-Ahmed-Ali Touati and
Zsolt Mathe Periodic register saturation in
innermost loops . . . . . . . . . . . . 239--254
Won W. Ro and
Jean-Luc Gaudiot A complexity-effective microprocessor
design with decoupled dispatch queues
and prefetching . . . . . . . . . . . . 255--268
Yaohang Li and
Michael Mascagni and
Andrey Gorin A decentralized parallel implementation
for parallel tempering algorithm . . . . 269--283
L. Grinberg and
D. Pekurovsky and
S. J. Sherwin and
G. E. Karniadakis Parallel performance of the coarse space
linear vertex solver and low energy
basis preconditioner for spectral/hp
elements . . . . . . . . . . . . . . . . 284--304
Antonio Robles-Gómez and
Aurelio Bermúdez and
Rafael Casado and
Åshild Grønstad Solheim A dynamic distributed mechanism for
reconfiguring high-performance networks 305--312
Ching-Wen Chen and
Chuan-Chi Weng and
Chang-Jung Ku An overlapping and pipelining data
transmission MAC protocol with multiple
channels in ad hoc networks . . . . . . 313--330
Taro Konda and
Yoshimasa Nakamura A new algorithm for singular value
decomposition and its parallelization 331--344
Gerold Jäger and
Clemens Wagner Efficient parallelizations of Hermite
and Smith normal form algorithms . . . . 345--357
Julian Borrill and
Leonid Oliker and
John Shalf and
Hongzhang Shan and
Andrew Uselton HPC global file system performance
analysis using a scientific-application
derived benchmark . . . . . . . . . . . 358--373
Markus Geimer and
Felix Wolf and
Brian J. N. Wylie and
Bernd Mohr A scalable tool architecture for
diagnosing wait states in massively
parallel applications . . . . . . . . . 375--388
Jay Smith and
Vladimir Shestak and
Howard Jay Siegel and
Suzy Price and
Larry Teklits and
Prasanna Sugavanam Robust resource allocation in a cluster
based imaging system . . . . . . . . . . 389--400
Yang Wang and
Ming Zhu and
Hua Li A distributed Key Message algorithm to
optimize the communication in clusters 401--415
Hatem Ltaief and
Marc Garbey A parallel Aitken-additive Schwarz
waveform relaxation suitable for the
grid . . . . . . . . . . . . . . . . . . 416--428
Cole Trapnell and
Michael C. Schatz Optimizing data intensive GPGPU
computations for DNA sequence alignment 429--440
Tz-Liang Kueng and
Cheng-Kuan Lin and
Tyne Liang and
Jimmy J. M. Tan and
Lih-Hsing Hsu Embedding paths of variable lengths into
hypercubes with conditional link-faults 441--454
Arturo González-Escribano and
Arjan J. C. van Gemund and
Valentín Cardeñoso-Payo Performance implications of
synchronization structure in parallel
programming . . . . . . . . . . . . . . 455--474
Ananta Tiwari and
Vahid Tabatabaee and
Jeffrey K. Hollingsworth Tuning parallel applications in parallel 475--492
Diane Lingrand and
Tristan Glatard and
Johan Montagnat Modeling the latency on production grids
with respect to the execution context 493--511
Anshu Dubey and
Katie Antypas and
Murali K. Ganapathy and
Lynn B. Reid and
Katherine Riley and
Dan Sheeler and
Andrew Siegel and
Klaus Weide Extensible component-based architecture
for FLASH, a massively parallel,
multiphysics simulation code . . . . . . 512--522
I. Marín Carrión and
E. Arias Antúnez and
M. M. Artigao Castillo and
J. J. Águila Guerrero and
J. J. Miralles Canals Thread-based implementations of the
false nearest neighbors method . . . . . 523--534
Hamid Mahini and
Hamid Sarbazi-Azad Resource placement in three-dimensional
tori . . . . . . . . . . . . . . . . . . 535--543
Henning Meyerhenke and
Burkhard Monien and
Stefan Schamberger Graph partitioning and disturbed
diffusion . . . . . . . . . . . . . . . 544--569
Franck Cappello and
Thomas Herault and
Jack Dongarra Foreword . . . . . . . . . . . . . . . . 571--571
Bin Jia Process cooperation in multiple message
broadcast . . . . . . . . . . . . . . . 572--580
Peter Sanders and
Jochen Speck and
Jesper Larsson Träff Two-tree algorithms for full bandwidth
broadcast, reduction and scan . . . . . 581--594
Daniel Becker and
Rolf Rabenseifner and
Felix Wolf and
John C. Linford Scalable timestamp synchronization for
event traces of message-passing
applications . . . . . . . . . . . . . . 595--607
Rajeev Thakur and
William Gropp Test suite for evaluating performance of
multithreaded MPI communication . . . . 608--617
Jeffrey K. Hollingsworth Editorial . . . . . . . . . . . . . . . 1--2
P. Amestoy and
I. S. Duff and
A. Guermouche and
Tz. Slavova Analysis of the solution phase of a
parallel multifrontal approach . . . . . 3--15
Shigeo Orii Metrics for evaluation of parallel
efficiency toward highly parallel
processing . . . . . . . . . . . . . . . 16--25
Juan Piernas-Canovas and
Jarek Nieplocha Implementation and evaluation of active
storage in modern parallel file systems 26--47
Rajesh Sudarsan and
Calvin J. Ribbens Design and performance of a scheduling
framework for resizable parallel
applications . . . . . . . . . . . . . . 48--64
Carlos Alberto Alonso Sanches and
Nei Yoshihiro Soma and
Horacio Hideki Yanasse Observations on optimal parallelizations
of two-list algorithm . . . . . . . . . 65--67
Anonymous Acknowledgment to Reviewers . . . . . . 68--69
Anonymous Editorial Board . . . . . . . . . . . . ??
Javier Navaridas and
Jose Miguel-Alonso and
Francisco Javier Ridruejo and
Wolfgang Denzel Reducing complexity in tree-like
computer interconnection networks . . . 71--85
Hinde Lilia Bouziane and
Christian Pérez and
Thierry Priol Extending software component models with
the master-worker paradigm . . . . . . . 86--103
Yi-Neng Lin and
Ying-Dar Lin and
Yuan-Cheng Lai Thread allocation in CMP-based
multithreaded network processors . . . . 104--116
Mathieu Luisier and
Gerhard Klimeck Numerical strategies towards peta-scale
simulations of nanoelectronics devices 117--128
Yusuke Okitsu and
Fumihiko Ino and
Kenichi Hagihara High-performance cone beam
reconstruction using CUDA compatible
GPUs . . . . . . . . . . . . . . . . . . 129--141
J. Götz and
K. Iglberger and
C. Feichtinger and
S. Donath and
U. Rüde Coupling multibody dynamics and
computational fluid dynamics on 8192
processor cores . . . . . . . . . . . . 142--151
Mauricio Marin and
Veronica Gil-Costa and
Carolina Bonacic and
Ricardo Baeza-Yates and
Isaac D. Scherson Sync/Async parallel search for the
efficient design and construction of Web
search engines . . . . . . . . . . . . . 153--168
Andrzej Karbowski and
Maciej Remiszewski Assessment of the Cell Broadband Engine
Architecture as a platform to solve
closed-loop optimal control problems . . 169--180
M. Krotkiewski and
M. Dabrowski Parallel symmetric sparse matrix-vector
product on scalar multi-core CPUs . . . 181--198
J. Berlińska and
M. Drozdowski Heuristics for multi-round divisible
loads scheduling with limited memory . . 199--211
Costas Bekas and
Pasqua D'Ambra and
Ananth Grama and
Yousef Saad and
Petko Yanev Special issue on Parallel Matrix
Algorithms and Applications . . . . . . 213--214
Joseph M. Elble and
Nikolaos V. Sahinidis and
Panagiotis Vouzis GPU computing with Kaczmarz's and other
iterative algorithms for linear systems 215--231
Stanimire Tomov and
Jack Dongarra and
Marc Baboulin Towards dense linear algebra for hybrid
GPU accelerated manycore systems . . . . 232--240
Aydın Buluç and
John R. Gilbert and
Ceren Budak Solving path problems on the GPU . . . . 241--253
Bora Uçar and
Ümit V. Çatalyürek and
Cevdet Aykanat A Matrix Partitioning Interface to PaToH
in MATLAB . . . . . . . . . . . . . . . 254--272
T. Huckle and
A. Kallischko and
A. Roy and
M. Sedlacek and
T. Weinzierl An efficient parallel implementation of
the MSPAI preconditioner . . . . . . . . 273--284
L. Giraud and
A. Haidar and
S. Pralet Using multiple levels of parallelism to
enhance the performance of domain
decomposition solvers . . . . . . . . . 285--296
Martin Bečka and
Gabriel Okša and
Marián Vajteršic and
Laura Grigori On iterative QR pre-processing in the
parallel block-Jacobi SVD algorithm . . 297--307
Fabrice Dupros and
Florent De Martin and
Evelyne Foerster and
Dimitri Komatitsch and
Jean Roman High-performance finite-element
simulations of seismic wave propagation
in three-dimensional nonlinear inelastic
geological media . . . . . . . . . . . . 308--325
Maximilian Emans Performance of parallel
AMG-preconditioners in CFD-codes for
weakly compressible flows . . . . . . . 326--338
Jose E. Roman and
Matthias Kammerer and
Florian Merz and
Frank Jenko Fast eigenvalue calculations in a
massively parallel plasma turbulence
code . . . . . . . . . . . . . . . . . . 339--358
T. Auckenthaler and
M. Bader and
T. Huckle and
A. Spörl and
K. Waldherr Matrix exponentials and parallel prefix
computation in a quantum control problem 359--369
Ruppa K. Thulasiram Preface . . . . . . . . . . . . . . . . 371--371
Vladimir Surkov Parallel option pricing with Fourier
space time-stepping method on graphics
processing units . . . . . . . . . . . . 372--380
Manfred Gilli and
Enrico Schumann Distributed optimisation of a
portfolio's Omega . . . . . . . . . . . 381--389
S. Corsaro and
P. L. De Angelis and
Z. Marino and
F. Perla and
P. Zanetti On parallel asset-liability management
in life insurance: a forward
risk-neutral approach . . . . . . . . . 390--402
Gianluca Fusai and
Daniele Marazzina and
Marina Marena Option pricing, maturity randomization
and distributed computing . . . . . . . 403--414
Giray Ökten and
Matthew Willyard Parameterization based on randomized
quasi-Monte Carlo methods . . . . . . . 415--422
Andrew V. Terekhov Parallel Dichotomy Algorithm for solving
tridiagonal system of linear equations
with multiple right-hand sides . . . . . 423--438
Daisuke Takahashi Parallel implementation of
multiple-precision arithmetic and
$2,576,980,370,000$ decimal digits of
$\pi$ calculation . . . . . . . . . . . 439--448
Pavan Yalamanchili and
Sumod Mohan and
Rommel Jalasutram and
Tarek Taha Acceleration of hierarchical Bayesian
network based cortical models on
multicore architectures . . . . . . . . 449--468
Tomas Hruz and
Stefan Geisseler and
Marcel Schöngens Parallelism in simulation and modeling
of scale-free complex networks . . . . . 469--485
Qiankun Miao and
Guangzhong Sun and
Jiulong Shan and
Guoliang Chen Parallelization and optimization of
Mfold on shared memory system . . . . . 487--494
Dan Gordon and
Rachel Gordon CARP--CG: a robust and efficient
parallel solver for linear systems,
applied to strongly convection dominated
PDEs . . . . . . . . . . . . . . . . . . 495--515
Fei Xia and
Yong Dou and
Dan Zhou and
Xin Li Fine-grained parallel RNA secondary
structure prediction using SCFGs on FPGA 516--530
Sean Rul and
Hans Vandierendonck and
Koen De Bosschere A profile-based tool for finding
pipeline parallelism in sequential
programs . . . . . . . . . . . . . . . . 531--551
J. Ignacio Hidalgo and
Francisco Fernandez and
Juan Lanchares and
Erick Cantú-Paz and
Albert Zomaya Parallel Architectures and Bioinspired
Algorithms . . . . . . . . . . . . . . . 553--554
M. Ruciński and
D. Izzo and
F. Biscani On the impact of the migration topology
on the Island Model . . . . . . . . . . 555--571
José L. Risco-Martín and
David Atienza and
J. Manuel Colmenar and
Oscar Garnica A parallel evolutionary algorithm to
optimize dynamic memory managers in
embedded systems . . . . . . . . . . . . 572--590
Marjan Rouhipour and
Peter J. Bentley and
Hooman Shayani Fast bio-inspired computation using a
GPU-based systemic computer . . . . . . 591--617
Carlos Pérez-Miguel and
Jose Miguel-Alonso and
Alexander Mendiburu Porting Estimation of Distribution
Algorithms to the Cell Broadband Engine 618--634
Una-May O'Reilly and
Eric Robinson and
Sanjeev Mohindra and
Julie Mullen and
Nadya Bliss Hogs and slackers: Using operations
balance in a genetic algorithm to
optimize sparse algebra computation on
distributed architectures . . . . . . . 635--644
Stanimire Tomov and
Rajib Nath and
Jack Dongarra Accelerating the reduction to upper
Hessenberg, tridiagonal, and bidiagonal
forms through hybrid GPU-based computing 645--654
K. A. Hawick and
A. Leist and
D. P. Playne Parallel graph component labelling with
GPUs and CUDA . . . . . . . . . . . . . 655--678
T. E. Athanaileas and
G. E. Athanasiadou and
G. V. Tsoulos and
D. I. Kaklamani Parallel radio-wave propagation modeling
with image-based ray tracing techniques 679--695
Marina Alonso and
Salvador Coll and
Juan-Miguel Martínez and
Vicente Santonja and
Pedro López and
José Duato Power saving in regular interconnection
networks . . . . . . . . . . . . . . . . 696--712
Bo Li and
Koichi Wada Communication latency tolerant parallel
algorithm for particle swarm
optimization . . . . . . . . . . . . . . 1--10
Yung-Chang Chiu and
Ce-Kuen Shieh and
Tzu-Chi Huang and
Tyng-Yeu Liang and
Kuo-Chih Chu Data race avoidance and replay scheme
for developing and debugging parallel
programs on distributed shared memory
systems . . . . . . . . . . . . . . . . 11--25
Sevin Varoglu and
Stephen Jenks Architectural support for thread
communications in multi-core processors 26--41
Rahul Nagpal and
Y. N. Srikant Compiler-assisted power optimization for
clustered VLIW architectures . . . . . . 42--59
Oleg V. Shylo and
Timothy Middelkoop and
Panos M. Pardalos Restart strategies in optimization:
parallel and serial cases . . . . . . . 60--68
Robert W. Numrich and
Michael A. Heroux Self-similarity of parallel machines . . 69--84
Brice Goglin High-performance message-passing over
generic Ethernet hardware with Open-MX 85--100
Anshu Dubey and
Katie Antypas and
Christopher Daley Parallel algorithms for moving
Lagrangian data on block structured
Eulerian meshes . . . . . . . . . . . . 101--113
Alireza Poshtkohi and
M. B. Ghaznavi-Ghoushchi DotDFS: a Grid-based high-throughput
file transfer system . . . . . . . . . . 114--136
Anonymous Editorial Board . . . . . . . . . . . . ifc
Antonio Robles-Gómez and
Aurelio Bermúdez and
Rafael Casado Efficient network management applied to
source routed networks . . . . . . . . . 137--156
Liangxiu Han and
Chee Sun Liew and
Jano van Hemert and
Malcolm Atkinson A generic parallel processing model for
facilitating data mining and integration 157--171
Eric Aubanel Scheduling of tasks in the parareal
algorithm . . . . . . . . . . . . . . . 172--182
José I. Aliaga and
Matthias Bollhöfer and
Alberto F. Martín and
Enrique S. Quintana-Ortí Exploiting thread-level parallelism in
the iterative solution of sparse linear
systems . . . . . . . . . . . . . . . . 183--202
Anonymous Editorial Board . . . . . . . . . . . . ??
Christian Konrad Two-constraint domain decomposition with
Space Filling Curves . . . . . . . . . . 203--216
Robert W. Robey and
Jonathan M. Robey and
Rob Aulwes In search of numerical consistency in
parallel programming . . . . . . . . . . 217--229
Omar Bouattane and
Bouchaib Cherradi and
Mohamed Youssfi and
Mohamed O. Bensalah Parallel $c$-means algorithm for image
segmentation on a reconfigurable mesh
computer . . . . . . . . . . . . . . . . 230--243
David Díaz and
Francisco José Esteban and
Pilar Hernández and
Juan Antonio Caballero and
Gabriel Dorado and
Sergio Gálvez Parallelizing and optimizing a
bioinformatics pairwise sequence
alignment algorithm for many-core
architecture . . . . . . . . . . . . . . 244--259
Anonymous Editorial Board . . . . . . . . . . . . ??
Dimitrije Jevremović and
Cong T. Trinh and
Friedrich Srienc and
Carlos P. Sosa and
Daniel Boley Parallelization of Nullspace Algorithm
for the computation of metabolic
pathways . . . . . . . . . . . . . . . . 261--278
Fangzhou Wei and
Ali E. Yilmaz A hybrid message passing/shared memory
parallelization of the adaptive integral
method for multi-core clusters . . . . . 279--301
Hao Wang and
Xudong Fu and
Guangqian Wang and
Tiejian Li and
Jie Gao A common parallel computing framework
for modeling hydrological processes of
river basins . . . . . . . . . . . . . . 302--315
Pablo D. Mininni and
Duane Rosenberg and
Raghu Reddy and
Annick Pouquet A hybrid MPI--OpenMP scheme for scalable
parallel pseudospectral computations for
fluid turbulence . . . . . . . . . . . . 316--326
Anonymous Editorial Board . . . . . . . . . . . . ??
Jeffrey K. Hollingsworth In Memoriam: Angela C. Sodan, PhD
(August 30, 1955--April 21, 2011) . . . 327--327
Yves Robert and
Leonel Sousa and
Denis Trystram Parallel Computing --- Special Issue . . 329--330
Anne Benoit and
Henri Casanova and
Veronika Rehn-Sonigo and
Yves Robert Resource allocation for multiple
concurrent in-network stream-processing
applications . . . . . . . . . . . . . . 331--348
Cristina Boeres and
Idalmis Milián Sardiña and
Lúcia M. A. Drummond An efficient weighted bi-objective
scheduling algorithm for heterogeneous
systems . . . . . . . . . . . . . . . . 349--364
Anne Benoit and
Yves Robert and
Arnold Rosenberg and
Frédéric Vivien Static worksharing strategies for
heterogeneous computers with
unrecoverable interruptions . . . . . . 365--378
Luis Garcés-Erice Admission control for a responsive
distributed middleware using decision
trees to model run-time parameters . . . 379--391
M. M. Khan and
A. D. Rast and
J. Navaridas and
X. Jin and
L. A. Plana and
M. Luján and
S. Temple and
C. Patterson and
D. Richards and
J. V. Woods and
J. Miguel-Alonso and
S. B. Furber Event-driven configuration of a neural
network CMP system over an homogeneous
interconnect fabric . . . . . . . . . . 392--409
Anne Benoit and
Alexandru Dobrila and
Jean-Marc Nicod and
Laurent Philippe Mapping workflow applications with types
on heterogeneous specialized platforms 410--427
Jorge G. Barbosa and
Belmiro Moreira Dynamic scheduling of a batch of
parallel task jobs on heterogeneous
clusters . . . . . . . . . . . . . . . . 428--438
Peter Benner and
Pablo Ezzatti and
Daniel Kressner and
Enrique S. Quintana-Ortí and
Alfredo Remón A mixed-precision algorithm for the
solution of Lyapunov equations on hybrid
CPU--GPU platforms . . . . . . . . . . . 439--450
Chenqi Wang and
Neil Cafferkey and
James Kennedy and
John P. Morrison CG3DR: Coordination of icosahedral virus
reconstruction using Condensed Graphs 451--465
Mathieu Giraud and
Jean-Stéphane Varré Parallel Position Weight Matrices
algorithms . . . . . . . . . . . . . . . 466--478
Anna Beletska and
Włodzimierz Bielecki and
Albert Cohen and
Marek Palkowski and
Krzysztof Siedlecki Coarse-grained loop parallelization:
Iteration Space Slicing vs affine
transformations . . . . . . . . . . . . 479--497
Anonymous Editorial Board . . . . . . . . . . . . ??
Leonid Oliker and
Rajesh Nishtala and
Rupak Biswas Emerging programming paradigms for
large-scale scientific computing . . . . 499--500
Kamesh Madduri and
Eun-Jin Im and
Khaled Z. Ibrahim and
Samuel Williams and
Stéphane Ethier and
Leonid Oliker Gyrokinetic particle-in-cell
optimization on emerging multi- and
manycore platforms . . . . . . . . . . . 501--520
Wang Xian and
Aoki Takayuki Multi-GPU performance of incompressible
flow computation by lattice Boltzmann
method on GPU cluster . . . . . . . . . 521--535
Christian Feichtinger and
Johannes Habich and
Harald Köstler and
Georg Hager and
Ulrich Rüde and
Gerhard Wellein A flexible Patch-based lattice Boltzmann
parallelization approach for
heterogeneous GPU--CPU clusters . . . . 536--549
Darren J. Kerbyson and
Michael Lang and
Scott Pakin Adapting wave-front algorithms to
efficiently utilize systems with deep
communication hierarchies . . . . . . . 550--561
Haoqiang Jin and
Dennis Jespersen and
Piyush Mehrotra and
Rupak Biswas and
Lei Huang and
Barbara Chapman High performance computing using MPI and
OpenMP on multi-core parallel systems 562--575
Rajesh Nishtala and
Yili Zheng and
Paul H. Hargrove and
Katherine A. Yelick Tuning collective communication for
Partitioned Global Address Space
programming models . . . . . . . . . . . 576--591
David Gay and
Joel Galenson and
Mayur Naik and
Kathy Yelick Yada: Straightforward parallel
programming . . . . . . . . . . . . . . 592--609
Steven J. Plimpton and
Karen D. Devine MapReduce in MPI for large-scale graph
algorithms . . . . . . . . . . . . . . . 610--632
Michael Wilde and
Mihael Hategan and
Justin M. Wozniak and
Ben Clifford and
Daniel S. Katz and
Ian Foster Swift: a language for distributed
parallel scripting . . . . . . . . . . . 633--652
Anonymous Editorial Board . . . . . . . . . . . . ??
Lizhi Peng and
Bo Yang and
Lei Zhang and
Yuehui Chen A parallel evolving algorithm for
flexible neural tree . . . . . . . . . . 653--666
Min Yeol Lim and
Vincent W. Freeh and
David K. Lowenthal Adaptive, transparent CPU scaling
algorithms leveraging inter-node MPI
communication regions . . . . . . . . . 667--683
Tristan Glatard and
Sorina Camarasu-Pop A model of pilot-job resource
provisioning on production grids . . . . 684--692
Loris Marchal and
Frédéric Vivien Editorial . . . . . . . . . . . . . . . 693--693
Naga Vydyanathan and
Umit Catalyurek and
Tahsin Kurc and
Ponnuswamy Sadayappan and
Joel Saltz Optimizing latency and throughput of
application workflows on clusters . . . 694--712
Ioannis Riakiotakis and
Florina M. Ciorba and
Theodore Andronikos and
George Papakonstantinou Distributed dynamic load balancing for
pipelined computations on heterogeneous
systems . . . . . . . . . . . . . . . . 713--729
Anonymous Editorial Board . . . . . . . . . . . . ??
Peter Arbenz and
Yousef Saad and
Ahmed Sameh and
Olaf Schenk Special issue on Parallel Matrix
Algorithms and Applications (PMAA'10) 731--732
Karan Mendiratta and
Eric Polizzi A threaded SPIKE algorithm for solving
general banded systems . . . . . . . . . 733--741
Daniel Maurer and
Christian Wieners A parallel block LU decomposition method
for distributed finite element matrices 742--758
Chenhan D. Yu and
Weichung Wang and
Dan'l Pierce A CPU--GPU hybrid approach for the
unsymmetric multifrontal method . . . . 759--770
L. Karlsson and
B. Kågström Parallel two-stage reduction to
Hessenberg form using dynamic scheduling
on shared-memory architectures . . . . . 771--782
T. Auckenthaler and
V. Blum and
H.-J. Bungartz and
T. Huckle and
R. Johanni and
L. Krämer and
B. Lang and
H. Lederer and
P. R. Willems Parallel solution of partial symmetric
eigenvalue problems from electronic
structure calculations . . . . . . . . . 783--794
M. Petschow and
P. Bientinesi MR$^3$-SMP: a symmetric tridiagonal
eigensolver for multi-core architectures 795--805
A. N. Yzelman and
Rob H. Bisseling Two-dimensional cache-oblivious sparse
matrix-vector multiplication . . . . . . 806--819
Johannes Langguth and
Md. Mostofa Ali Patwary and
Fredrik Manne Parallel algorithms for bipartite
matching problems on distributed memory
computers . . . . . . . . . . . . . . . 820--845
Cyril Flaig and
Peter Arbenz A scalable memory efficient multigrid
solver for micro-finite element analyses
based on CT images . . . . . . . . . . . 846--854
Anonymous Editorial Board . . . . . . . . . . . . ??
Torsten Hoefler Extensions for next-generation parallel
programming models . . . . . . . . . . . 1--1
Nick Rutar and
Jeffrey K. Hollingsworth Data centric techniques for mapping
performance data to program variables 2--14
Joshua Hursey and
Richard L. Graham Analyzing fault aware collective
performance in a process fault tolerant
MPI . . . . . . . . . . . . . . . . . . 15--25
Jesper Larsson Träff Alternative, uniformly expressive and
more scalable interfaces for collective
communication in MPI . . . . . . . . . . 26--36
George Bosilca and
Aurelien Bouteiller and
Anthony Danalis and
Thomas Herault and
Pierre Lemarinier and
Jack Dongarra DAGuE: a generic distributed DAG engine
for High Performance Computing . . . . . 37--51
Martin Sandrieser and
Siegfried Benkner and
Sabri Pllana Using explicit platform descriptions to
support programming of heterogeneous
many-core systems . . . . . . . . . . . 52--65
Phil Miller and
Aaron Becker and
Laxmikant Kalé Using shared arrays in message-driven
parallel programs . . . . . . . . . . . 66--74
Pieter Hijma and
Rob V. van Nieuwpoort and
Ceriel J. H. Jacobs and
Henri E. Bal Generating synchronization statements in
divide-and-conquer programs . . . . . . 75--89
Anonymous Editorial Board . . . . . . . . . . . . ??
Lucas Mello Schnorr and
Guillaume Huard and
Philippe Olivier Alexandre Navaux A hierarchical aggregation model to
achieve visualization scalability in the
analysis of parallel applications . . . 91--110
Holger Scherl and
Markus Kowarschik and
Hannes G. Hofmann and
Benjamin Keck and
Joachim Hornegger Evaluation of state-of-the-art hardware
architectures for fast cone-beam CT
reconstruction . . . . . . . . . . . . . 111--124
A. Moreno and
E. Cesar and
A. Guevara and
J. Sorribes and
T. Margalef Load balancing in homogeneous pipeline
based applications . . . . . . . . . . . 125--139
Aleksandr Ovcharenko and
Daniel Ibanez and
Fabien Delalondre and
Onkar Sahni and
Kenneth E. Jansen and
Christopher D. Carothers and
Mark S. Shephard Neighborhood communication paradigm to
increase scalability in large-scale
dynamic scientific applications . . . . 140--156
Andreas Klöckner and
Nicolas Pinto and
Yunsup Lee and
Bryan Catanzaro and
Paul Ivanov and
Ahmed Fasih PyCUDA and PyOpenCL: a scripting-based
approach to GPU run-time code generation 157--174
Anonymous Editorial Board . . . . . . . . . . . . ??
Minhaj Ahmad Khan Scheduling for heterogeneous systems
using constrained critical paths . . . . 175--193
Kathryn Mohror and
Karen L. Karavanic Trace profiling: Scalable event tracing
on high-end parallel systems . . . . . . 194--225
Gerassimos Barlas Cluster-based optimized parallel video
transcoding . . . . . . . . . . . . . . 226--244
H. M. Aktulga and
J. C. Fogarty and
S. A. Pandit and
A. Y. Grama Parallel reactive molecular dynamics:
Numerical methods and algorithmic
techniques . . . . . . . . . . . . . . . 245--259
Roman Wyrzykowski and
Krzysztof Rojek and
Lukasz Szustak Model-driven adaptation of
double-precision matrix multiplication
to the Cell processor architecture . . . 260--276
Anonymous Editorial Board . . . . . . . . . . . . ??
F. Argüello and
D. B. Heras and
M. Bóo and
J. Lamas-Rodríguez The split-and-merge method in general
purpose computation on GPUs . . . . . . 277--288
Timothy D. R. Hartley and
Erik Saule and
Ümit V. Çatalyürek Improving performance of adaptive
component-based dataflow middleware . . 289--309
Peng Di and
Hui Wu and
Jingling Xue and
Feng Wang and
Canqun Yang Parallelizing SOR for GPGPUs using
alternate loop tiling . . . . . . . . . 310--328
Rahul Nagpal and
Anasua Bhowmik Criticality guided energy aware
speculation for speculative
multithreaded processors . . . . . . . . 329--341
Anonymous Editorial Board . . . . . . . . . . . . ??
Volodymyr Kindratenko and
Gregory D. Peterson Application accelerators in HPC ---
Editorial introduction . . . . . . . . . 343--343
Andrew G. Schmidt and
Siddhartha Datta and
Ashwin A. Mendon and
Ron Sass Investigation into scaling I/O bound
streaming applications productively with
an all-FPGA cluster . . . . . . . . . . 344--364
Frederico Pratas and
Pedro Trancoso and
Leonel Sousa and
Alexandros Stamatakis and
Guochun Shi and
Volodymyr Kindratenko Fine-grain parallelism using multi-core,
Cell/BE, and GPU Systems . . . . . . . . 365--390
Peng Du and
Rick Weber and
Piotr Luszczek and
Stanimire Tomov and
Gregory Peterson and
Jack Dongarra From CUDA to OpenCL: Towards a
performance-portable solution for
multi-platform GPU programming . . . . . 391--407
Francisco Vázquez and
José Jesús Fernández and
Ester M. Garzón Automatic tuning of the sparse matrix
vector product on GPUs based on the
ELLR-T approach . . . . . . . . . . . . 408--420
Depeng Yang and
Gregory D. Peterson and
Husheng Li Compressed sensing and Cholesky
decomposition on FPGAs and GPUs . . . . 421--437
John R. Wernsing and
Greg Stitt Elastic computing: a portable
optimization framework for hybrid
computers . . . . . . . . . . . . . . . 438--464
Anonymous Editorial Board . . . . . . . . . . . . ??
Basilio B. Fraguela and
Ganesh Bikshandi and
Jia Guo and
María J. Garzarán and
David Padua and
Christoph von Praun Optimization techniques for efficient
HTA programs . . . . . . . . . . . . . . 465--484
Takeshi Iwashita and
Yu Hirotani and
Takeshi Mifune and
Toshio Murayama and
Hideki Ohtani Large-scale time-harmonic
electromagnetic field analysis using a
multigrid solver on a distributed memory
parallel computer . . . . . . . . . . . 485--500
Amit Amritkar and
Danesh Tafti and
Rui Liu and
Rick Kufrin and
Barbara Chapman OpenMP parallelism for fluid and
fluid-particulate systems . . . . . . . 501--517
Wlodzimierz Bielecki and
Marek Palkowski and
Tomasz Klimek Free scheduling for statement instances
of parameterized arbitrarily nested
affine loops . . . . . . . . . . . . . . 518--532
Anonymous Editorial Board . . . . . . . . . . . . ??
Yong Chen and
Huaiyu Zhu and
Hui Jin and
Xian-He Sun Algorithm-level Feedback-controlled
Adaptive data prefetcher: Accelerating
data access for high-performance
processors . . . . . . . . . . . . . . . 533--551
Mickeal Verschoor and
Andrei C. Jalba Analysis and performance estimation of
the Conjugate Gradient method on
multiple GPUs . . . . . . . . . . . . . 552--575
Ümit V. Çatalyürek and
John Feo and
Assefaw H. Gebremedhin and
Mahantesh Halappanavar and
Alex Pothen Graph coloring algorithms for multi-core
and massively multithreaded
architectures . . . . . . . . . . . . . 576--594
Anonymous Editorial Board . . . . . . . . . . . . ??
Madan Sathe and
Olaf Schenk and
Helmar Burkhart An auction-based weighted matching
implementation on massively parallel
architectures . . . . . . . . . . . . . 595--614
M. Etinski and
J. Corbalan and
J. Labarta and
M. Valero Parallel job scheduling for power
constrained HPC systems . . . . . . . . 615--630
Anonymous Editorial Board . . . . . . . . . . . . ??
Dana A. Jacobsen and
Inanc Senocak Multi-level parallelism for
incompressible flow computations on GPU
clusters . . . . . . . . . . . . . . . . 1--20
Masha Sosonkina and
Layne T. Watson and
Nicholas R. Radcliffe and
Rafael T. Haftka and
Michael W. Trosset Adjusting process count on demand for
petascale global optimization . . . . . 21--35
Diego Andrade and
Basilio B. Fraguela and
Ramón Doallo Accurate prediction of the behavior of
multithreaded applications in shared
caches . . . . . . . . . . . . . . . . . 36--57
Orlando Ayala and
Lian-Ping Wang Parallel implementation and scalability
analysis of $3$D Fast Fourier Transform
using $2$D domain decomposition . . . . 58--77
Anonymous Editorial Board . . . . . . . . . . . . ??
Abhinav Sarje and
Srinivas Aluru All-pairs computations on many-core
graphics processors . . . . . . . . . . 79--93
Ferit Büyükkeçeci and
Omar Awile and
Ivo F. Sbalzarini A portable OpenCL implementation of
generic particle-mesh and mesh-particle
interpolation in $2$D and $3$D . . . . . 94--111
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous Preface: Infrastructure for scalable
tools . . . . . . . . . . . . . . . . . 113--113
Mark W. Krentel Libmonitor: a tool for first-party
monitoring . . . . . . . . . . . . . . . 114--119
Nick Rutar and
Jeffrey K. Hollingsworth Software techniques for negating skid
and approximating cache miss
measurements . . . . . . . . . . . . . . 120--131
Marc-André Hermanns and
Sriram Krishnamoorthy and
Felix Wolf A scalable infrastructure for the
performance analysis of passive target
synchronization . . . . . . . . . . . . 132--145
Michael O. Lam and
Jeffrey K. Hollingsworth and
G. W. Stewart Dynamic floating-point cancellation
detection . . . . . . . . . . . . . . . 146--155
Barry Rountree and
Todd Gamblin and
Bronis R. de Supinski and
Martin Schulz and
David K. Lowenthal and
Guy Cobb and
Henry Tufo Parallelizing heavyweight debugging
tools with \tt mpiecho . . . . . . . . . 156--166
J. D. Goehner and
D. C. Arnold and
D. H. Ahn and
G. L. Lee and
B. R. de Supinski and
M. P. LeGendre and
B. P. Miller and
M. Schulz LIBI: a framework for bootstrapping
extreme scale software systems . . . . . 167--176
Anonymous Editorial Board . . . . . . . . . . . . ??
Sen Su and
Jian Li and
Qingjia Huang and
Xiao Huang and
Kai Shuang and
Jie Wang Cost-efficient task scheduling for
executing large programs in the cloud 177--188
George Teodoro and
Tony Pan and
Tahsin M. Kurc and
Jun Kong and
Lee A. D. Cooper and
Joel H. Saltz Efficient irregular wavefront
propagation algorithms on hybrid
CPU--GPU machines . . . . . . . . . . . 189--211
Jack Dongarra and
Mathieu Faverge and
Thomas Hérault and
Mathias Jacquelin and
Julien Langou and
Yves Robert Hierarchical QR factorization algorithms
for multi-core clusters . . . . . . . . 212--232
Wagner Kolberg and
Pedro de B. Marcos and
Julio C. S. Anjos and
Alexandre K. S. Miyazaki and
Claudio R. Geyer and
Luciana B. Arantes MRSG --- a MapReduce simulator over
SimGrid . . . . . . . . . . . . . . . . 233--244
Anonymous Editorial Board . . . . . . . . . . . . ??
Andrew V. Terekhov A fast parallel algorithm for solving
block-tridiagonal systems of linear
equations including the domain
decomposition method . . . . . . . . . . 245--258
Christian Obrecht and
Frédéric Kuznik and
Bernard Tourancheau and
Jean-Jacques Roux Scalable lattice Boltzmann solvers for
CUDA GPU clusters . . . . . . . . . . . 259--270
Yuefan Deng and
Peng Zhang and
Carlos Marques and
Reid Powell and
Li Zhang Analysis of Linpack and power
efficiencies of the world's TOP500
supercomputers . . . . . . . . . . . . . 271--279
Ichitaro Yamazaki and
Hiroto Tadano and
Tetsuya Sakurai and
Tsutomu Ikegami Performance comparison of parallel
eigensolvers based on a contour integral
method and a Lanczos method . . . . . . 280--290
Anonymous Editorial Board . . . . . . . . . . . . ??
Yang Wang and
Paul Lu DDS: a deadlock detection-based
scheduling algorithm for workflow
computations in HPC systems with storage
constraints . . . . . . . . . . . . . . 291--305
A. Sandroos and
I. Honkonen and
S. von Alfthan and
M. Palmroth Multi-GPU simulations of Vlasov's
equation using Vlasiator . . . . . . . . 306--318
O. Fortmeier and
H. M. Bücker and
B. O. Fagginger Auer and
R. H. Bisseling A new metric enabling an exact
hypergraph model for the communication
volume in distributed-memory parallel
applications . . . . . . . . . . . . . . 319--335
Harald Servat and
Germán Llort and
Kevin Huck and
Judit Giménez and
Jesús Labarta Framework for a productive performance
optimization . . . . . . . . . . . . . . 336--353
Anonymous Editorial Board . . . . . . . . . . . . ??
Fangyang Shen and
Mei Yang and
Maurizio Palesi Guest Editors' Introduction to the
Special Issue on ``Novel On-Chip
Parallel Architectures and Software
Support'' . . . . . . . . . . . . . . . 355--356
Sandeep Pande and
Fearghal Morgan and
Gerard Smit and
Tom Bruintjes and
Jochem Rutgers and
Brian McGinley and
Seamus Cawley and
Jim Harkin and
Liam McDaid Fixed latency on-chip interconnect for
hardware spiking neural network
architectures . . . . . . . . . . . . . 357--371
Junghee Lee and
Chrysostomos Nicopoulos and
Hyung Gyu Lee and
Jongman Kim Sharded Router: a novel on-chip router
architecture employing bandwidth
sharding and stealing . . . . . . . . . 372--388
Michael Opoku Agyeman and
Ali Ahmadinia and
Alireza Shahrabi Efficient routing techniques in
heterogeneous $3$D Networks-on-Chip . . 389--407
Xiaohang Wang and
Peng Liu and
Mei Yang and
Yingtao Jiang Avoiding request-request type
message-dependent deadlocks in
networks-on-chips . . . . . . . . . . . 408--423
Ashkan Beyranvand Nejad and
Anca Molnos and
Matias Escudero Martinez and
Kees Goossens A hardware/software platform for QoS
bridging over multi-chip NoC-based
systems . . . . . . . . . . . . . . . . 424--441
José M. Andión and
Manuel Arenaz and
Gabriel Rodríguez and
Juan Touriño A novel compiler support for automatic
parallelization on multicore systems . . 442--460
Jiyang Yu and
Peng Liu and
Weidong Wang and
Chunming Huang and
Jie Yang and
Yingtao Jiang and
Qingdong Yao An efficient protocol with
synchronization accelerator for
multi-processor embedded systems . . . . 461--474
Carlos H. González and
Basilio B. Fraguela A framework for argument-based task
synchronization with automatic detection
of dependencies . . . . . . . . . . . . 475--489
Guiyuan Jiang and
Jigang Wu and
Jizhou Sun Efficient reconfiguration algorithms for
communication-aware three-dimensional
processor arrays . . . . . . . . . . . . 490--503
Giovanni Mariani and
Gianluca Palermo and
Vittorio Zaccaria and
Cristina Silvano ARTE: an Application-specific Run-Time
managEment framework for multi-cores
based on queuing models . . . . . . . . 504--519
Jingweijia Tan and
Yang Yi and
Fangyang Shen and
Xin Fu Modeling and characterizing GPGPU
reliability in the presence of soft
errors . . . . . . . . . . . . . . . . . 520--532
Anonymous Editorial Board . . . . . . . . . . . . ??
Marcin Krotkiewski and
Marcin Dabrowski Efficient $3$D stencil computations
using CUDA . . . . . . . . . . . . . . . 533--548
J. Joven and
A. Marongiu and
F. Angiolini and
L. Benini and
G. De Micheli An integrated, programming model-driven
framework for NoC--QoS support in
cluster-based embedded many-cores . . . 549--566
Laiping Zhao and
Yizhi Ren and
Kouichi Sakurai Reliable workflow scheduling with less
resource redundancy . . . . . . . . . . 567--585
Libo Huang and
Nong Xiao and
Zhiying Wang and
Yongwen Wang and
Mingche Lai Efficient multimedia coprocessor with
enhanced SIMD engines for exploiting ILP
and DLP . . . . . . . . . . . . . . . . 586--602
Dimitris Saougkos and
George Manis Self adaptive run time scheduling for
the automatic parallelization of loops
with the C2$ \mu $TC/SL compiler . . . . 603--614
Agustín C. Caminero and
Antonio Robles-Gómez and
Salvador Ros and
Roberto Hernández and
Llanos Tobarra P2P-based resource discovery in dynamic
grids allowing multi-attribute and range
queries . . . . . . . . . . . . . . . . 615--637
Xiaoliang Wan and
Guang Lin Hybrid parallel computing of minimum
action method . . . . . . . . . . . . . 638--651
Anonymous Editorial Board . . . . . . . . . . . . ??
Gregory Tauer and
Rakesh Nagi A map-reduce Lagrangian heuristic for
multidimensional assignment problems
with decomposable costs . . . . . . . . 653--668
G. R. Mudalige and
M. B. Giles and
J. Thiyagalingam and
I. Z. Reguly and
C. Bertolli and
P. H. J. Kelly and
A. E. Trefethen Design and initial performance of a
high-level unstructured mesh framework
on heterogeneous parallel systems . . . 669--692
Javier Navaridas and
Steve Furber and
Jim Garside and
Xin Jin and
Mukaram Khan and
David Lester and
Mikel Luján and
José Miguel-Alonso and
Eustace Painkras and
Cameron Patterson and
Luis A. Plana and
Alexander Rast and
Dominic Richards and
Yebin Shi and
Steve Temple and
Jian Wu and
Shufan Yang SpiNNaker: Fault tolerance in a power-
and area-constrained large-scale
neuromimetic architecture . . . . . . . 693--708
Hameed Hussain and
Saif Ur Rehman Malik and
Abdul Hameed and
Samee Ullah Khan and
Gage Bickler and
Nasro Min-Allah and
Muhammad Bilal Qureshi and
Limin Zhang and
Wang Yongji and
Nasir Ghani and
Joanna Kolodziej and
Albert Y. Zomaya and
Cheng-Zhong Xu and
Pavan Balaji and
Abhinav Vishnu and
Fredric Pinel and
Johnatan E. Pecero and
Dzmitry Kliazovich and
Pascal Bouvry and
Hongxiang Li and
Lizhe Wang and
Dan Chen and
Ammar Rayes A survey on resource allocation in high
performance distributed computing
systems . . . . . . . . . . . . . . . . 709--736
Hoang-Vu Dang and
Bertil Schmidt CUDA-enabled Sparse Matrix-Vector
Multiplication on GPUs using atomic
operations . . . . . . . . . . . . . . . 737--750
Yong Chen and
Pavan Balaji and
Abhinav Vishnu Special issue on programming models,
systems software, and tools for High-End
Computing . . . . . . . . . . . . . . . 751--752
Wei Tang and
Dongxu Ren and
Zhiling Lan and
Narayan Desai Toward balanced and sustainable job
scheduling for production supercomputers 753--768
Mark Gardner and
Paul Sathre and
Wu-chun Feng and
Gabriel Martinez Characterizing the challenges and
evaluating the efficacy of a
CUDA-to-OpenCL translator . . . . . . . 769--786
Zhiyi Huang and
Kai-Cheung Leung Performance evaluation of View-Oriented
Transactional Memory . . . . . . . . . . 787--801
E. J. Otoo and
Gideon Nimako and
Daniel Ohene-Kwofie Chunked extendible dense arrays for
scientific data storage . . . . . . . . 802--818
Shannon Steinfadt Fine-grained parallel implementations
for SWAMP+ Smith--Waterman alignment . . 819--833
Jie Shen and
Jianbin Fang and
Henk Sips and
Ana Lucia Varbanescu An application-centric evaluation of
OpenCL on multi-core CPUs . . . . . . . 834--850
Hisham Mohamed and
Stéphane Marchand-Maillet MRO-MPI: MapReduce overlapping using MPI
and an optimized data exchange policy 851--866
Omer Erdil Albayrak and
Ismail Akturk and
Ozcan Ozturk Improving application behavior on
heterogeneous manycore systems through
kernel mapping . . . . . . . . . . . . . 867--878
Alexander Reinefeld and
Robert Döbbelin and
Thorsten Schütt Analyzing the performance of SMP memory
allocators with iterative MapReduce
applications . . . . . . . . . . . . . . 879--889
L. Yavits and
A. Morad and
R. Ginosar The effect of communication and
synchronization on Amdahl's law in
multicore systems . . . . . . . . . . . 1--16
Lois Curfman McInnes and
Barry Smith and
Hong Zhang and
Richard Tran Mills Hierarchical Krylov and nested Krylov
methods for extreme-scale computing . . 17--31
Pavan Balaji and
Zhiyi Huang Special issue on programming models and
applications for multicores and
manycores --- Guest Editors'
introduction . . . . . . . . . . . . . . 33--34
Mark Utting and
Min-Hsien Weng and
John G. Cleary The JStar language philosophy . . . . . 35--50
Weihua Sheng and
Stefan Schürmans and
Maximilian Odendahl and
Mark Bertsch and
Vitaliy Volevach and
Rainer Leupers and
Gerd Ascheid A compiler infrastructure for embedded
heterogeneous MPSoCs . . . . . . . . . . 51--68
Vikas and
Nasser Giacaman and
Oliver Sinnen Multiprocessing with GUI-awareness using
OpenMP-like directives in Java . . . . . 69--89
Oded Green and
Yitzhak Birk Scheduling directives: Accelerating
shared-memory many-core processor
execution . . . . . . . . . . . . . . . 90--106
Zhenning Wang and
Long Zheng and
Quan Chen and
Minyi Guo CPU + GPU scheduling with asymptotic
profiling . . . . . . . . . . . . . . . 107--115
Yu Liu and
Kento Emoto and
Zhenjiang Hu A Generate-Test-Aggregate parallel
programming library for systematic
parallel programming . . . . . . . . . . 116--135
Zhijun Hao and
Chenning Xie and
Haibo Chen and
Binyu Zang X10-FT: Transparent fault tolerance for
APGAS language and runtime . . . . . . . 136--156
Mohammad Reza Selim and
Mohammed Ziaur Rahman Carrying on the legacy of imperative
languages in the future parallel
computing era . . . . . . . . . . . . . 1--33
Jean-Yves L'Excellent and
Wissam M. Sid-Lakhdar A study of shared-memory parallelism in
a multifrontal solver . . . . . . . . . 34--46
Urban Borstnik and
Joost VandeVondele and
Valéry Weber and
Jürg Hutter Sparse matrix multiplication: the
distributed block-compressed sparse row
library . . . . . . . . . . . . . . . . 47--58
Yuki Sugimoto and
Fumihiko Ino and
Kenichi Hagihara Improving cache locality for GPU-based
volume rendering . . . . . . . . . . . . 59--69
Ray-Bing Chen and
Yaohung M. Tsai and
Weichung Wang Adaptive block size for dense $ Q R $
factorization in hybrid CPU--GPU systems
via statistical modeling . . . . . . . . 70--85
Michael J. Hallock and
John E. Stone and
Elijah Roberts and
Corey Fry and
Zaida Luthey-Schulten Simulation of reaction diffusion
processes over biologically relevant
size and time scales using multi-GPU
workstations . . . . . . . . . . . . . . 86--99
Ivan Teixidó and
Francesc Sebé and
Josep Conde and
Francesc Solsona MPI-based implementation of an enhanced
algorithm to solve the LPN problem in a
memory-constrained environment . . . . . 100--112
Alberto F. Martín and
Ruymán Reyes and
Rosa M. Badia and
Enrique S. Quintana-Ortí Leveraging task-parallelism in
message-passing dense matrix
factorizations using SMPSs . . . . . . . 113--128
Jose A. Pascual and
Jose Miguel-Alonso and
Jose A. Lozano Application-aware metrics for partition
selection in cube-shaped topologies . . 129--139
Robert Hallberg and
Alistair Adcroft An order-invariant real-to-integer
conversion sum . . . . . . . . . . . . . 140--143
Oscar Peredo and
Julián M. Ortiz and
José R. Herrero and
Cristóbal Samaniego Tuning and hybrid parallelization of a
genetic-based multi-point statistics
simulation code . . . . . . . . . . . . 144--158
Anonymous Editorial Board . . . . . . . . . . . . IFC
Costas Bekas and
Ananth Grama and
Yousef Saad and
Olaf Schenk Parallel matrix algorithms . . . . . . . 159--160
Robert Andrew and
Nicholas Dingle Implementing $ Q R $ factorization
updating algorithms on GPUs . . . . . . 161--172
Yiannis Cotronis and
Elias Konstantinidis and
Maria A. Louka and
Nikolaos M. Missirlis A comparison of CPU and GPU
implementations for solving the
convection diffusion equation using the
local modified SOR method . . . . . . . 173--185
T. Auckenthaler and
T. Huckle and
R. Wittmann A blocked $ Q R $-decomposition for the
parallel symmetric eigenvalue problem 186--194
Hasan Metin Aktulga and
Lin Lin and
Christopher Haine and
Esmond G. Ng and
Chao Yang Parallel eigenvalue calculation based on
multiple shift-invert Lanczos and
contour integral based spectral
projection method . . . . . . . . . . . 195--212
Marc Baboulin and
Dulceneia Becker and
George Bosilca and
Anthony Danalis and
Jack Dongarra An efficient distributed randomized
algorithm for solving large dense
symmetric indefinite linear systems . . 213--223
P. Ghysels and
W. Vanroose Hiding global synchronization latency in
the preconditioned conjugate gradient
algorithm . . . . . . . . . . . . . . . 224--238
Erhan Turan and
Peter Arbenz Large scale micro finite element
analysis of $3$D bone poroelasticity . . 239--250
Michele Martone Efficient multithreaded untransposed,
transposed or symmetric sparse
matrix-vector multiplication with the
Recursive Sparse Blocks format . . . . . 251--270
L. Karlsson and
B. Kågström and
E. Wadbro Fine-grained bulge-chasing kernels for
strongly scalable parallel $ Q R $
algorithms . . . . . . . . . . . . . . . 271--288
J. Langguth and
A. Azad and
M. Halappanavar and
F. Manne On parallel push-relabel based
algorithms for bipartite maximum
matching . . . . . . . . . . . . . . . . 289--308
Jesús Cámara and
Javier Cuenca and
Luis-Pedro García and
Domingo Giménez Auto-tuned nested parallelism: a way to
reduce the execution time of scientific
software in NUMA systems . . . . . . . . 309--327
Emanuel H. Rubensson and
Elias Rudberg Chunks and Tasks: a programming model
for parallelization of dynamic
algorithms . . . . . . . . . . . . . . . 328--343
Anonymous Editorial Board . . . . . . . . . . . . IFC
María Botón-Fernández and
Miguel A. Vega-Rodríguez and
Francisco Prieto Castrillo Self-adaptivity for grid applications.
An Efficient Resources Selection model
based on evolutionary computation
algorithms . . . . . . . . . . . . . . . 345--361
Chihiro Kodama and
Masaaki Terai and
Akira T. Noda and
Yohei Yamada and
Masaki Satoh and
Tatsuya Seiki and
Shin-ichi Iga and
Hisashi Yashiro and
Hirofumi Tomita and
Kazuo Minami Scalable rank-mapping algorithm for an
icosahedral grid system on the massive
parallel computer with a $3$-D torus
network . . . . . . . . . . . . . . . . 362--373
Angeles Navarro and
Rafael Asenjo and
Francisco Corbera and
Antonio J. Dios and
Emilio L. Zapata A case study of different task
implementations for multioutput stages
in non-trivial parallel pipeline
applications . . . . . . . . . . . . . . 374--393
J. Sánchez-Curto and
P. Chamorro-Posada and
G. S. McDonald Efficient parallel implementation of the
nonparaxial beam propagation method . . 394--407
Jie Chen and
Tom L. H. Li and
Mihai Anitescu A parallel linear solver for multilevel
Toeplitz systems with possibly several
right-hand sides . . . . . . . . . . . . 408--424
Roman Wyrzykowski and
Lukasz Szustak and
Krzysztof Rojek Parallelization of $2$D MPDATA EULAG
algorithm on hybrid architectures with
GPU accelerators . . . . . . . . . . . . 425--447
Anonymous Editorial Board . . . . . . . . . . . . IFC
Joao Andrade and
Gabriel Falcao and
Vitor Silva Optimized Fast Walsh--Hadamard Transform
on GPUs for non-binary LDPC decoding . . 449--453
Ehsan Totoni and
Michael T. Heath and
Laxmikant V. Kalé Structure-adaptive parallel solution of
sparse triangular linear systems . . . . 454--470
Diego Arroyuelo and
Carolina Bonacic and
Veronica Gil-Costa and
Mauricio Marin and
Gonzalo Navarro Distributed text search using suffix
arrays . . . . . . . . . . . . . . . . . 471--495
Yingchong Situ and
Chandra S. Martha and
Matthew E. Louis and
Zhiyuan Li and
Ahmed H. Sameh and
Gregory A. Blaisdell and
Anastasios S. Lyrintzis Petascale large eddy simulation of jet
engine noise based on the truncated
SPIKE algorithm . . . . . . . . . . . . 496--511
Lucas Mello Schnorr and
Philippe Olivier Alexandre Navaux Best of SBAC--PAD 2012 . . . . . . . . . 512--513
Luiz Ramos and
Ricardo Bianchini Robust performance in hybrid-memory
cooperative caches . . . . . . . . . . . 514--525
Joefon Jann and
R. Sarma Burugula and
Ching-Farn E. Wu and
Kaoutar El Maghraoui Towards an immortal operating system in
virtual environments . . . . . . . . . . 526--535
Esteban Meneses and
Osman Sarood and
Laxmikant V. Kalé Energy profile of rollback-recovery
strategies in high performance computing 536--547
Teo Milanez and
Sylvain Collange and
Fernando Magno Quintão Pereira and
Wagner Meira, Jr. and
Renato Ferreira Thread scheduling and memory coalescing
for dynamic vectorization of SPMD
workloads . . . . . . . . . . . . . . . 548--558
Anonymous Editorial Board . . . . . . . . . . . . IFC
Li Tan and
Shashank Kothapalli and
Longxiang Chen and
Omar Hussaini and
Ryan Bissiri and
Zizhong Chen A survey of power and energy efficient
techniques for high performance
numerical linear algebra operations . . 559--573
Antonio J. Peña and
Carlos Reaño and
Federico Silla and
Rafael Mayo and
Enrique S. Quintana-Ortí and
José Duato A complete and efficient CUDA-sharing
solution for HPC clusters . . . . . . . 574--588
George Teodoro and
Tony Pan and
Tahsin Kurc and
Jun Kong and
Lee Cooper and
Scott Klasky and
Joel Saltz Region templates: Data representation
and management for high-throughput image
analysis . . . . . . . . . . . . . . . . 589--610
Yizhuo Wang and
Yang Zhang and
Yan Su and
Xiaojun Wang and
Xu Chen and
Weixing Ji and
Feng Shi An adaptive and hierarchical task
scheduling scheme for multi-core
clusters . . . . . . . . . . . . . . . . 611--627
Andrew White and
Soo-Young Lee Derivation of optimal input parameters
for minimizing execution time of
matrix-based computations on a GPU . . . 628--645
Nicholas Horelik and
Andrew Siegel and
Benoit Forget and
Kord Smith Monte Carlo domain decomposition for
robust nuclear reactor analysis . . . . 646--660
Leandro A. J. Marzulo and
Tiago A. O. Alves and
Felipe M. G. França and
Vítor Santos Costa Couillard: Parallel programming via
coarse-grained Data-flow Compilation . . 661--680
Philip C. Roth and
Yong Chen Guest Editors' introduction to the
special issue on ``DISCS-2013'' . . . . 681--681
Jesse Weaver and
Vito Giovanni Castellana and
Alessandro Morari and
Antonino Tumeo and
Sumit Purohit and
Alan Chappell and
David Haglin and
Oreste Villa and
Sutanay Choudhury and
Karen Schuchardt and
John Feo Toward a data scalable solution for
facilitating discovery of science
resources . . . . . . . . . . . . . . . 682--696
Jiangling Yin and
Junyao Zhang and
Jun Wang and
Wu-chun Feng SDAFT: a novel scalable data access
framework for parallel BLAST . . . . . . 697--709
Yong Li and
Dan Feng and
Zhan Shi Heterogeneous-aware cache partitioning:
Improving the fairness of shared storage
cache . . . . . . . . . . . . . . . . . 710--721
Joong-Yeon Cho and
Hyun-Wook Jin and
Min Lee and
Karsten Schwan Dynamic core affinity for
high-performance file upload on Hadoop
Distributed File System . . . . . . . . 722--737
P. Coetzee and
M. Leeke and
S. Jarvis Towards unified secure on- and off-line
analytics at scale . . . . . . . . . . . 738--753
Dominique LaSalle and
George Karypis MPI for Big Data: New tricks for an old
dog . . . . . . . . . . . . . . . . . . 754--767
Lan Vu and
Gita Alaghband Novel parallel method for association
rule mining on multi-core shared memory
systems . . . . . . . . . . . . . . . . 768--785
Anonymous Editorial Board . . . . . . . . . . . . IFC
Saiqin Long and
Yuelong Zhao and
Wei Chen and
Yuanbin Tang A prediction-based dynamic file
assignment strategy for parallel file
systems . . . . . . . . . . . . . . . . 1--13
Tassadaq Hussain and
Amna Haider and
Shakaib A. Gursal and
Eduard Ayguadé AMC: Advanced Multi-accelerator
Controller . . . . . . . . . . . . . . . 14--30
Hugo Rito and
João Cachopo Adaptive transaction scheduling for
mixed transactional workloads . . . . . 31--49
Ren Xiaoguang and
Xu Xinhai and
Wang Qian and
Chen Juan and
Wang Miao and
Yang Xuejun GS-DMR: Low-overhead soft error
detection scheme for stencil-based
computation . . . . . . . . . . . . . . 50--65
Dounia Khaldi and
Pierre Jouvelot and
Corinne Ancourt Parallelizing with BDSC, a
resource-constrained scheduling
algorithm for shared and distributed
memory systems . . . . . . . . . . . . . 66--89
Alexandros V. Gerbessiotis Extending the BSP model for multi-core
and out-of-core computing: MBSP . . . . 90--102
Anonymous Editorial Board . . . . . . . . . . . . IFC
Miguel A. Vega-Rodríguez and
David L. González-Álvarez Parallelism in bioinformatics: a view
from different parallelism-based
technologies . . . . . . . . . . . . . . 1--3
Michael Bromberger and
Fabian Nowak and
Wolfgang Karl Combined hardware-software
multi-parallel prefiltering on the
Convey HC-1 for fast homology detection 4--17
Miquel Orobitg and
Fernando Guirado and
Fernando Cores and
Jordi Llados and
Cedric Notredame High performance computing improvements
on bioinformatics consistency-based
multiple sequence alignment tools . . . 18--34
Sérgio E. D. Dias and
Abel J. P. Gomes Triangulating molecular surfaces over a
LAN of GPU-enabled computers . . . . . . 35--47
Romain Vasseur and
Stéphanie Baud and
Luiz Angelo Steffenel and
Xavier Vigouroux and
Laurent Martiny and
Michaël Krajecki and
Manuel Dauchez Inverse docking method for new proteins
targets identification: a parallel
approach . . . . . . . . . . . . . . . . 48--59
Marco Ferretti and
Mirto Musci Geometrical motifs search in proteins: a
parallel approach . . . . . . . . . . . 60--74
Elmar Peise and
Diego Fabregat-Traver and
Paolo Bientinesi High performance solutions for big-data
GWAS . . . . . . . . . . . . . . . . . . 75--87
Gonzalo Martín and
David E. Singh and
Maria-Cristina Marinescu and
Jesús Carretero Towards efficient large scale
epidemiological simulations in EpiGraph 88--102
Anonymous Editorial Board . . . . . . . . . . . . IFC
Daniel Chavarría-Miranda and
Ajay Panyala and
Wenjing Ma and
Adrian Prantl and
Sriram Krishnamoorthy Global transformations for legacy
parallel applications via structural
analysis and rewriting . . . . . . . . . 1--26
Kenli Li and
Jing Liu and
Lanjun Wan and
Shu Yin and
Keqin Li A cost-optimal parallel algorithm for
the $0$--$1$ knapsack problem and its
performance on multicore CPU and GPU
implementations . . . . . . . . . . . . 27--42
Matthias Diener and
Eduardo H. M. Cruz and
Philippe O. A. Navaux and
Anselm Busse and
Hans-Ulrich Heiß Communication-aware process and thread
mapping using online communication
detection . . . . . . . . . . . . . . . 43--63
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Jian Li and
Sen Su and
Xiang Cheng and
Meina Song and
Liyu Ma and
Jie Wang Cost-efficient coordinated scheduling
for leasing cloud resources on hybrid
workloads . . . . . . . . . . . . . . . 1--17
Haifeng Wang and
Yunpeng Cao Predicting power consumption of GPUs
with fuzzy wavelet neural networks . . . 18--36
João V. F. Lima and
Thierry Gautier and
Vincent Danjean and
Bruno Raffin and
Nicolas Maillard Design and analysis of scheduling
strategies for multi-CPU and multi-GPU
architectures . . . . . . . . . . . . . 37--52
J. Iverson and
C. Kamath and
G. Karypis Evaluation of connected-component
labeling algorithms for
distributed-memory systems . . . . . . . 53--68
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Anonymous Best papers from ACM Computing Frontiers
2014 Conference . . . . . . . . . . . . 1
Ali JavadiAbhari and
Shruti Patil and
Daniel Kudrow and
Jeff Heckey and
Alexey Lvov and
Frederic T. Chong and
Margaret Martonosi ScaffCC: Scalable compilation and
analysis of quantum programs . . . . . . 2--17
Vladimir Gajinov and
Srdjan Stipić and
Igor Erić and
Osman S. Unsal and
Eduard Ayguadé and
Adrian Cristal DaSH: a benchmark suite for hybrid
dataflow and shared memory programming
models . . . . . . . . . . . . . . . . . 18--48
Javier Navaridas and
Mikel Luján and
Luis A. Plana and
Steve Temple and
Steve B. Furber SpiNNaker: Enhanced multicast routing 49--66
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Christian Feichtinger and
Johannes Habich and
Harald Köstler and
Ulrich Rüde and
Takayuki Aoki Performance modeling and analysis of
heterogeneous lattice Boltzmann
simulations on CPU--GPU clusters . . . . 1--13
Juan-Antonio Rico-Gallego and
Juan-Carlos Díaz-Martín $ \tau $-Lop: Modeling performance of
shared memory MPI . . . . . . . . . . . 14--31
Javier Prades and
Federico Silla and
Holger Fröning and
Mondrian Nüssle and
José Duato On the design of a new dynamic
credit-based end-to-end flow control
mechanism for HPC clusters . . . . . . . 32--59
Gonzalo Martín and
David E. Singh and
Maria-Cristina Marinescu and
Jesús Carretero Enhancing the performance of malleable
MPI applications by using
performance-aware dynamic
reconfiguration . . . . . . . . . . . . 60--77
Siew Yin Chan and
Teck Chaw Ling and
Eric Aubanel Performance modeling for hierarchical
graph partitioning in heterogeneous
multi-core environment . . . . . . . . . 78--97
Yan Y. Liu and
Shaowen Wang A scalable parallel genetic algorithm
for the Generalized Assignment Problem 98--119
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Aydin Buluç and
Leonid Oliker and
John Gilbert Special issue ``Graph analysis for
scientific discovery'' . . . . . . . . . 1--2
Ahmet Erdem Sariyüce and
Erik Saule and
Kamer Kaya and
Ümit V. Çatalyürek Incremental closeness centrality in
distributed memory . . . . . . . . . . . 3--18
Hao Lu and
Mahantesh Halappanavar and
Ananth Kalyanaraman Parallel heuristics for scalable
community detection . . . . . . . . . . 19--37
James P. Fairbanks and
Ramakrishnan Kannan and
Haesun Park and
David A. Bader Behavioral clusters in dynamic graphs 38--50
George M. Slota and
Kamesh Madduri Parallel color-coding . . . . . . . . . 51--69
Vince Lyzinski and
Daniel L. Sussman and
Donniell E. Fishkind and
Henry Pao and
Li Chen and
Joshua T. Vogelstein and
Youngser Park and
Carey E. Priebe Spectral clustering for
divide-and-conquer graph matching . . . 70--87
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Tatjana Davidović and
Teodor Gabriel Crainic Parallel Local Search to schedule
communicating tasks on identical
processors . . . . . . . . . . . . . . . 1--14
Ryan E. Grant and
Mohammad J. Rashti and
Pavan Balaji and
Ahmad Afsahi Scalable connectionless RDMA over
unreliable datagrams . . . . . . . . . . 15--39
Ivanoe De Falco and
Umberto Scafuri and
Ernesto Tarantino Mapping of time-consuming multitask
applications on a cloud system by
multiobjective Differential Evolution 40--58
M. Alonso and
S. Coll and
J. M. Martínez and
V. Santonja and
P. López Power consumption management in fat-tree
interconnection networks . . . . . . . . 59--80
Anna Sikora and
Tomàs Margalef and
Josep Jorba Online root-cause performance analysis
of parallel applications . . . . . . . . 81--107
Peng Zhang and
Ling Liu and
Yuefan Deng A data-driven paradigm for mapping
problems . . . . . . . . . . . . . . . . 108--124
Ryo Asai and
Andrey Vladimirov Intel Cilk Plus for complex parallel
algorithms: ``Enormous Fast Fourier
Transforms'' (EFFT) library . . . . . . 125--142
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Shinya Maeyama and
Tomohiko Watanabe and
Yasuhiro Idomura and
Motoki Nakata and
Masanori Nunami and
Akihiro Ishizawa Improved strong scaling of a
spectral/finite difference gyrokinetic
code for multi-scale plasma turbulence 1--12
Pablo Abad and
Pablo Prieto and
Valentin Puente and
Jose-Angel Gregorio Improving last level shared cache
performance through mobile insertion
policies (MIP) . . . . . . . . . . . . . 13--27
Xiaohua Shi and
Fredrick Park and
Lina Wang and
Jack Xin and
Yingyong Qi Parallelization of a color-entropy
preprocessed Chan--Vese model for face
contour detection on multi-core CPU and
GPU . . . . . . . . . . . . . . . . . . 28--49
S. Weise and
C. Hasse Reducing the memory footprint in Large
Eddy Simulations of reactive flows . . . 50--65
Berenger Bramas and
Olivier Coulaud and
Guillaume Sylvand Time-domain BEM for the wave equation on
distributed-heterogeneous architectures:
a blocking approach . . . . . . . . . . 66--82
Sylvain Collange and
David Defour and
Stef Graillat and
Roman Iakymchuk Numerical reproducibility for the
parallel reduction on multi- and
many-core architectures . . . . . . . . 83--97
Peter Arbenz and
Laura Grigori and
Rolf Krause and
Olaf Schenk Special issue on Parallel Matrix
Algorithms and Applications (PMAA'14) 99--100
I. E. Venetis and
A. Kouris and
A. Sobczyk and
E. Gallopoulos and
A. H. Sameh A direct tridiagonal solver based on
Givens rotations for GPU architectures 101--116
Dominik Göddeke and
Mirco Altenbernd and
Dirk Ribbrock Fault-tolerant finite-element multigrid
algorithms with hierarchically
compressed asynchronous checkpointing 117--135
Vedran Novaković and
Sanja Singer and
Sasa Singer Blocking and parallelization of the
Hari--Zimmermann variant of the
Falk--Langemeyer algorithm for the
generalized SVD . . . . . . . . . . . . 136--152
Martin Galgon and
Lukas Krämer and
Jonas Thies and
Achim Basermann and
Bruno Lang On the parallel iterative solution of
linear systems arising in the FEAST
algorithm for computing inner
eigenvalues . . . . . . . . . . . . . . 153--163
Ali Dorostkar and
Maya Neytcheva and
Björn Lund Numerical and computational aspects of
some block-preconditioners for saddle
point systems . . . . . . . . . . . . . 164--178
Weifeng Liu and
Brian Vinter Speculative segmented sum for sparse
matrix-vector multiplication on
heterogeneous processors . . . . . . . . 179--193
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Santiago Badia and
Alberto F. Martín and
Javier Principe On the scalability of inexact balancing
domain decomposition by constraints with
overlapped coarse/fine corrections . . . 1--24
Nuno Diegues and
Paolo Romano Self-tuning Intel Restricted
Transactional Memory . . . . . . . . . . 25--52
Eike Hermann Müller and
Robert Scheichl and
Eero Vainikko Petascale solvers for anisotropic PDEs
in atmospheric modelling on GPU clusters 53--69
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Pavan Balaji and
Abhinav Vishnu and
Yong Chen Special Issue on Parallel Programming
Models and Systems Software for High-End
Computing . . . . . . . . . . . . . . . 1--2
Huy Bui and
Eun-Sung Jung and
Venkatram Vishwanath and
Andrew Johnson and
Jason Leigh and
Michael E. Papka Improving sparse data movement
performance using multiple paths on the
Blue Gene/Q supercomputer . . . . . . . 3--16
S. Herbein and
S. McDaniel and
N. Podhorszki and
J. Logan and
S. Klasky and
M. Taufer Performance characterization of
irregular I/O at the extreme scale . . . 17--36
Lu Li and
Usman Dastgeer and
Christoph Kessler Pruning strategies in adaptive off-line
tuning for optimized composition of
components on heterogeneous systems . . 37--45
Antonio J. Peña and
Pavan Balaji A data-oriented profiler to assist in
data partitioning and distribution for
heterogeneous memory in HPC . . . . . . 46--55
Jiangzhou He and
Wenguang Chen and
Zhizhong Tang NestedMP: Enabling cache-aware thread
mapping for nested parallel shared
memory applications . . . . . . . . . . 56--66
Evan Balzuweit and
David P. Bunde and
Vitus J. Leung and
Austin Finley and
Alan C. S. Lee Local search to improve coordinate-based
task mapping . . . . . . . . . . . . . . 67--78
Lucas A. Wilson and
Jeffery von Ronne A task-uncoordinated distributed
dataflow model for scalable high
performance parallel program execution 79--87
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Hariswaran Sitaraman and
Ray Grout Balancing conflicting requirements for
grid and particle decomposition in
continuum-Lagrangian solvers . . . . . . 1--21
Julien Herrmann and
George Bosilca and
Thomas Hérault and
Loris Marchal and
Yves Robert and
Jack Dongarra Assessing the cost of redistribution
followed by a computational kernel:
Complexity and performance results . . . 22--41
T. Weinzierl and
B. Verleye and
P. Henri and
D. Roose Two particle-in-grid realisations on
spacetrees . . . . . . . . . . . . . . . 42--64
Jorge F. Fabeiro and
Diego Andrade and
Basilio B. Fraguela Writing a performance-portable matrix
multiplication . . . . . . . . . . . . . 65--77
Philipp Hupp and
Mario Heene and
Riko Jacob and
Dirk Pflüger Global communication schemes for the
numerical solution of high-dimensional
PDEs . . . . . . . . . . . . . . . . . . 78--105
Xiongwei Fei and
Kenli Li and
Wangdong Yang and
Keqin Li A secure and efficient file protecting
system based on SHA3 and parallel AES 106--132
Dan Ibanez and
Ian Dunn and
Mark S. Shephard Hybrid MPI-thread parallelization of
adaptive mesh operations . . . . . . . . 133--143
Mahmoud Meribout and
Ahmad Firadus A new systolic multiprocessor
architecture for real-time soft
tomography algorithms . . . . . . . . . 144--155
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
M. Llorens and
J. Oliver and
J. Silva and
S. Tamarit Dynamic slicing of concurrent
specification languages . . . . . . . . 1--22
Zhihao Lou and
John Reinitz Parallel simulated annealing using an
adaptive resampling interval . . . . . . 23--31
Michelle Mills Strout and
Alan LaMielle and
Larry Carter and
Jeanne Ferrante and
Barbara Kreaseck and
Catherine Olschanowsky An approach for code generation in the
Sparse Polyhedral Framework . . . . . . 32--57
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Anonymous Preface: 26th International Symposium on
Computer Architecture and High
Performance Computing . . . . . . . . . 1
Michail Alvanos and
Ettore Tiotto and
José Nelson Amaral and
Montse Farreras and
Xavier Martorell Using shared-data localization to reduce
the cost of inspector-execution in
unified-parallel-C programs . . . . . . 2--14
Francis B. Moreira and
Marco A. Z. Alves and
Matthias Diener and
Philippe O. A. Navaux and
Israel Koren A dynamic block-level execution profiler 15--28
Rachata Ausavarungnirun and
Chris Fallin and
Xiangyao Yu and
Kevin Kai-Wei Chang and
Greg Nazario and
Reetuparna Das and
Gabriel H. Loh and
Onur Mutlu A case for hierarchical rings with
deflection routing: an energy-efficient
on-chip communication substrate . . . . 29--45
Marcio Machado Pereira and
Matthew Gaudet and
J. Nelson Amaral and
Guido Araujo Study of hardware transactional memory
characteristics and serialization
policies on Haswell . . . . . . . . . . 46--58
Eduardo H. M. Cruz and
Matthias Diener and
Marco A. Z. Alves and
Laércio L. Pilla and
Philippe O. A. Navaux LAPT: a locality-aware page table for
thread and data mapping . . . . . . . . 59--71
Iván Cores and
Mónica Rodríguez and
Patricia González and
María J. Martín Reducing the overhead of an MPI
application-level migration approach . . 72--82
Olivier Beaumont and
Lionel Eyraud-Dubois and
Juan-Angel Lorenzo-del-Castillo Analyzing real cluster data for
formulating allocation algorithms in
cloud platforms . . . . . . . . . . . . 83--96
José I. Aliaga and
Rosa M. Badia and
Maria Barreda and
Matthias Bollhöfer and
Ernesto Dufrechou and
Pablo Ezzatti and
Enrique S. Quintana-Ortí Exploiting task and data parallelism in
ILUPACK's preconditioned CG solver on
NUMA architectures and many-core
accelerators . . . . . . . . . . . . . . 97--107
Márcio Castro and
Emilio Francesquini and
Fabrice Dupros and
Hideo Aochi and
Philippe O. A. Navaux and
Jean-François Méhaut Seismic wave propagation simulations on
low-power and performance-centric
manycores . . . . . . . . . . . . . . . 108--120
Yun R. Qu and
Viktor K. Prasanna Compact hash tables for decision-trees 121--127
Tuan Tu Tran and
Yongchao Liu and
Bertil Schmidt Bit-parallel approximate pattern
matching: Kepler GPU versus Xeon Phi . . 128--138
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Hank Childs and
Franck Cappello Preface: Visualization and data
analytics for scientific discovery . . . 1
William M. Putman and
Lesley Ott and
Anton Darmenov and
Arlindo daSilva A global perspective of atmospheric
carbon dioxide concentrations . . . . . 2--8
Paris Perdikaris and
Joseph A. Insley and
Leopold Grinberg and
Yue Yu and
Michael E. Papka and
George Em. Karniadakis Visualizing multiphysics,
fluid-structure interaction phenomena in
intracranial aneurysms . . . . . . . . . 9--16
John E. Stone and
Melih Sener and
Kirby L. Vandivort and
Angela Barragan and
Abhishek Singharoy and
Ivan Teo and
João V. Ribeiro and
Barry Isralewitz and
Bo Liu and
Boon Chong Goh and
James C. Phillips and
Craig MacGregor-Chatwin and
Matthew P. Johnson and
Lena F. Kourkoutis and
C. Neil Hunter and
Klaus Schulten Atomic detail visualization of
photosynthetic membranes with
GPU-accelerated ray tracing . . . . . . 17--27
Leigh Orf and
Robert Wilhelmson and
Louis Wicker Visualization of a simulated long-track
EF5 tornado embedded within a supercell
thunderstorm . . . . . . . . . . . . . . 28--34
Christopher Lewis and
Miguel Valenciano and
Charles Cornwell Visualizations of molecular dynamics
simulations of high-performance
polycrystalline structural ceramics . . 35--42
Patrick O'Leary and
James Ahrens and
Sébastien Jourdain and
Scott Wittenburg and
David H. Rogers and
Mark Petersen Cinema image-based in situ analysis and
visualization of MPAS-ocean simulations 43--48
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Stefan Engblom and
Dimitar Lukarski Fast Matlab compatible sparse assembly
on multicore computers . . . . . . . . . 1--17
Souley Madougou and
Ana Varbanescu and
Cees de Laat and
Rob van Nieuwpoort The landscape of GPGPU performance
modeling tools . . . . . . . . . . . . . 18--33
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Oguz Selvitopi and
Cevdet Aykanat Reducing latency cost in $2$D sparse
matrix partitioning models . . . . . . . 1--24
Alejandro Acosta and
Sergio Afonso and
Francisco Almeida Extending Paralldroid with object
oriented annotations . . . . . . . . . . 25--36
Keisuke Tsugane and
Taisuke Boku and
Hitoshi Murai and
Mitsuhisa Sato and
William Tang and
Bei Wang Hybrid-view programming of nuclear
fusion simulation code in the PGAS
parallel programming language XcalableMP 37--51
Ketan Date and
Rakesh Nagi GPU-accelerated Hungarian algorithms for
the Linear Assignment Problem . . . . . 52--72
Sean Wallace and
Zhou Zhou and
Venkatram Vishwanath and
Susan Coghlan and
John Tramm and
Zhiling Lan and
Michael E. Papka Application power profiling on IBM Blue
Gene/Q . . . . . . . . . . . . . . . . . 73--86
Emanuel H. Rubensson and
Elias Rudberg Locality-aware parallel block-sparse
matrix-matrix multiplication using the
Chunks and Tasks programming model . . . 87--106
Abhinav Vishnu and
Andres Marquez and
Dimitris Nikolopoulos Editorial of the Special issue: SI: E2SC 107
Ziming Zhang and
Michael Lang and
Scott Pakin and
Song Fu TracSim: Simulating and scheduling
trapped power capacity to maximize
machine room throughput . . . . . . . . 108--124
Lena Oden and
Benjamin Klenk and
Holger Fröning Analyzing GPU-controlled communication
with dynamic parallelism in terms of
performance and energy . . . . . . . . . 125--134
Peter Arbenz and
Laura Grigori and
Rolf Krause and
Olaf Schenk Special issue on Parallel Matrix
Algorithms and Applications (PMAA'14) 135--136
Radu Popescu and
Michael A. Heroux and
Simone Deparis Parallel subdomain solver strategies for
the algebraic additive Schwarz
preconditioner . . . . . . . . . . . . . 137--153
Lubomír Ríha and
Tomás Brzobohatý and
Alexandros Markopoulos and
Marta Jarosová and
Tomás Kozubek and
David Horák and
Václav Hapla Implementation of the efficient
communication layer for the highly
parallel total FETI and hybrid total
FETI solvers . . . . . . . . . . . . . . 154--166
Karl E. Prikopa and
Wilfried N. Gansterer and
Elias Wimmer Parallel iterative refinement linear
least squares solvers based on
all-reduce operations . . . . . . . . . 167--184
Gemma Sanjuan and
Tomàs Margalef and
Ana Cortés Applying domain decomposition to wind
field calculation . . . . . . . . . . . 185--196
Bruno Carpentieri and
Jia Liao and
Masha Sosonkina and
Aldo Bonfiglioli and
Sven Baars Using the VBARMS method in parallel
computing . . . . . . . . . . . . . . . 197--211
Martin Köhler and
Jens Saak On GPU acceleration of common solvers
for (quasi-) triangular generalized
Lyapunov equations . . . . . . . . . . . 212--221
Lars Karlsson and
Daniel Kressner and
André Uschmajew Parallel algorithms for tensor
completion in the CP format . . . . . . 222--234
Jean-Guillaume Dumas and
Thierry Gautier and
Clément Pernet and
Jean-Louis Roch and
Ziad Sultan Recursion based parallelization of exact
dense linear algebra routines for
Gaussian elimination . . . . . . . . . . 235--249
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
E. Calore and
A. Gabbana and
J. Kraus and
E. Pellegrini and
S. F. Schifano and
R. Tripiccione Massively parallel lattice-Boltzmann
codes on large GPU clusters . . . . . . 1--24
Michela Taufer and
Pavan Balaji and
Satoshi Matsuoka Special Issue on Cluster Computing . . . 25--26
Khaled Hamidouche and
Akshay Venkatesh and
Ammar Ahmad Awan and
Hari Subramoni and
Ching-Hsiang Chu and
Dhabaleswar K. Panda CUDA-Aware OpenSHMEM: Extensions and
Designs for High Performance OpenSHMEM
on GPU Clusters . . . . . . . . . . . . 27--36
Ashwin M. Aji and
Antonio J. Peña and
Pavan Balaji and
Wu-chun Feng MultiCL: Enabling automatic scheduling
for task-parallel workloads in OpenCL 37--55
Edgar A. León and
Ian Karlin and
Ryan E. Grant and
Matthew Dosanjh Program optimizations: the interplay
between power, performance, and energy 56--75
Jiaan Zeng and
Beth Plale Argus: a Multi-tenancy NoSQL store with
workload-aware resource reservation . . 76--89
Anthony Agelastos and
Benjamin Allan and
Jim Brandt and
Ann Gentile and
Sophia Lefantzi and
Steve Monk and
Jeff Ogden and
Mahesh Rajan and
Joel Stevenson Continuous whole-system monitoring
toward rapid understanding of production
HPC applications and systems . . . . . . 90--106
Zhou Zhou and
Xu Yang and
Dongfang Zhao and
Paul Rich and
Wei Tang and
Jia Wang and
Zhiling Lan I/O-aware bandwidth allocation for
petascale computing systems . . . . . . 107--116
Ariful Azad and
Aydin Buluç A matrix-algebraic formulation of
distributed-memory maximal cardinality
matching algorithms in bipartite graphs 117--130
Jianping Zeng and
Hongfeng Yu A study of graph partitioning schemes
for parallel graph community detection 131--139
Dong Dai and
Philip Carns and
Robert B. Ross and
John Jenkins and
Nicholas Muirhead and
Yong Chen An asynchronous traversal engine for
graph-based rich metadata management . . 140--156
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Oscar Vega-Gisbert and
Jose E. Roman and
Jeffrey M. Squyres Design and implementation of Java
bindings in Open MPI . . . . . . . . . . 1--20
Antonino Tumeo and
John Feo and
Oreste Villa Special Issue on Theory and Practice of
Irregular Applications (TaPIA) . . . . . 21--23
Andrea Marongiu and
Alessandro Capotondi and
Luca Benini Controlling NUMA effects in embedded
manycore applications with lightweight
nested parallelism support . . . . . . . 24--42
Yao Zhu and
David F. Gleich A parallel min-cut algorithm using
iteratively reweighted least squares
targeting at problems with
floating-point edge weights . . . . . . 43--59
Daming Feng and
Andrey N. Chernikov and
Nikos P. Chrisochoides Two-level locality-aware parallel
Delaunay image-to-mesh conversion . . . 60--70
Seher Acer and
Oguz Selvitopi and
Cevdet Aykanat Improving performance of sparse matrix
dense matrix multiplication on
large-scale parallel systems . . . . . . 71--96
Mohammed A. Al Farhan and
Dinesh K. Kaushik and
David E. Keyes Unstructured computational aerodynamics
on many integrated core architecture . . 97--118
J. Gmys and
M. Mezmaz and
N. Melab and
D. Tuyttens A GPU-based Branch-and-Bound algorithm
using Integer--Vector--Matrix data
structure . . . . . . . . . . . . . . . 119--139
Steven C. Rennich and
Darko Stosic and
Timothy A. Davis Accelerating sparse Cholesky
factorization on GPUs . . . . . . . . . 140--150
Cristina Montañola-Sales and
Bhakti S. S. Onggo and
Josep Casanovas-Garcia and
Jose María Cela-Espín and
Adriana Kaplan-Marcusán Approaching parallel computing to
simulating population dynamics in
demography . . . . . . . . . . . . . . . 151--170
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Chen Wang and
Ce Yu and
Shanjiang Tang and
Jian Xiao and
Jizhou Sun and
Xiangfei Meng A general and fast distributed system
for large-scale dynamic programming
applications . . . . . . . . . . . . . . 1--21
Martina Prugger and
Lukas Einkemmer and
Alexander Ostermann Evaluation of the partitioned global
address space (PGAS) model for an
inviscid Euler solver . . . . . . . . . 22--40
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Philip C. Roth and
R. Shane Canon Special Issue on Data-Intensive Scalable
Computing Systems . . . . . . . . . . . 1--2
Wei Xie and
Yong Chen and
Philip C. Roth ASA-FTL: an adaptive separation aware
flash translation layer for solid state
drives . . . . . . . . . . . . . . . . . 3--17
Pengfei Xuan and
Walter B. Ligon and
Pradip K. Srimani and
Rong Ge and
Feng Luo Accelerating big data analytics on HPC
clusters using two-level storage . . . . 18--34
Preeti Malakar and
Venkatram Vishwanath Data movement optimizations for
independent MPI I/O on the Blue Gene/Q 35--51
Francisco Rodrigo Duro and
Javier Garcia Blas and
Florin Isaila and
Jesus Carretero and
Justin M. Wozniak and
Rob Ross Experimental evaluation of a flexible
I/O architecture for accelerating
workflow engines in ultrascale
environments . . . . . . . . . . . . . . 52--67
Huansong Fu and
Haiquan Chen and
Yue Zhu and
Weikuan Yu FARMS: Efficient MapReduce speculation
for failure recovery in short jobs . . . 68--82
Lizhen Shi and
Zhong Wang and
Weikuan Yu and
Xiandong Meng A case study of tuning MapReduce for
efficient bioinformatics in the cloud 83--95
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Amandeep Verma and
Sakshi Kaushal A hybrid multi-objective Particle Swarm
Optimization for scientific workflow
scheduling . . . . . . . . . . . . . . . 1--19
Robert Speck and
Daniel Ruprecht Toward fault-tolerant parallel-in-time
integration with PFASST . . . . . . . . 20--37
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Jiaquan Gao and
Yuanshen Zhou and
Guixia He and
Yifei Xia A multi-GPU parallel optimization model
for the preconditioned conjugate
gradient algorithm . . . . . . . . . . . 1--16
Andrew Giuliani and
Lilia Krivodonova Face coloring in unstructured CFD codes 17--37
Boyu Zhang and
Trilce Estrada and
Pietro Cicotti and
Pavan Balaji and
Michela Taufer Enabling scalable and accurate
clustering of distributed ligand
geometries on supercomputers . . . . . . 38--60
Douglas Otstott and
Latchesar Ionkov and
Michael Lang and
Ming Zhao TCASM: an asynchronous shared memory
interface for high-performance
application composition . . . . . . . . 61--78
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Rupak Biswas and
David Donofrio and
Leonid Oliker High-End Computing for Next-Generation
Scientific Discovery . . . . . . . . . . 1--2
Thomas Bönisch and
Michael Resch and
Thomas Schwitalla and
Matthias Meinke and
Volker Wulfmeyer and
Kirsten Warrach-Sagi Hazel Hen --- leading HPC technology and
its impact on science in Germany and
Europe . . . . . . . . . . . . . . . . . 3--11
Fumiyoshi Shoji Lessons learned from development and
operation of the K computer . . . . . . 12--19
Eric J. Nielsen and
Boris Diskin High-performance aerodynamic
computations for aerospace applications 20--32
Vincent Cavé and
Romain Clédat and
Paul Griffin and
Ankit More and
Bala Seshasayee and
Shekhar Borkar and
Sanjay Chatterjee and
Dave Dunning and
Joshua Fryman Traleika Glacier: a hardware--software
co-designed approach to exascale
computing . . . . . . . . . . . . . . . 33--49
Protonu Basu and
Samuel Williams and
Brian Van Straalen and
Leonid Oliker and
Phillip Colella and
Mary Hall Compiler-based code generation and
autotuning for geometric multigrid on
GPU-accelerated supercomputers . . . . . 50--64
Sébastien Rumley and
Meisam Bahadori and
Robert Polster and
Simon D. Hammond and
David M. Calhoun and
Ke Wen and
Arun Rodrigues and
Keren Bergman Optical interconnects for extreme scale
computing systems . . . . . . . . . . . 65--80
Rupak Biswas and
Zhang Jiang and
Kostya Kechezhi and
Sergey Knysh and
Salvatore Mandrà and
Bryan O'Gorman and
Alejandro Perdomo-Ortiz and
Andre Petukhov and
John Realpe-Gómez and
Eleanor Rieffel and
Davide Venturelli and
Fedir Vasko and
Zhihui Wang A NASA perspective on quantum computing:
Opportunities and challenges . . . . . . 81--98
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
S. Cools and
W. Vanroose The communication-hiding pipelined
BiCGstab method for the parallel
solution of large unsymmetric linear
systems . . . . . . . . . . . . . . . . 1--20
Ryuji Yoshida and
Seiya Nishizawa and
Hisashi Yashiro and
Sachiho A. Adachi and
Yousuke Sato and
Tsuyoshi Yamaura and
Hirofumi Tomita CONeP: a cost-effective online nesting
procedure for regional atmospheric
models . . . . . . . . . . . . . . . . . 21--31
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Giovanni Mariani and
Andreea Anghel and
Rik Jongerius and
Gero Dittmann Classification of thread profiles for
scaling application behavior . . . . . . 1--21
Maria Predari and
Aurélien Esnard and
Jean Roman Comparison of initial partitioning
methods for multilevel direct $k$-way
graph partitioning with fixed vertices 22--39
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Milan B. Radulović and
Sylvain Girbal and
Milo V. Tomasević Low-level implementation of the SISC
protocol for thread-level speculation on
a multi-core architecture . . . . . . . 1--19
Nicholas Geneva and
Cheng Peng and
Xiaoming Li and
Lian-Ping Wang A scalable interface-resolved simulation
of particle-laden flow using the lattice
Boltzmann method . . . . . . . . . . . . 20--37
Marc Casas and
Greg Bronevetsky Prediction of the impact of network
switch utilization on application
performance via active measurement . . . 38--56
Roberto Peñaranda and
Crispín Gómez and
María Engracia Gómez and
Pedro López XOR-based HoL-blocking reduction routing
mechanisms for direct networks . . . . . 57--74
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Sunita Chandrasekaran and
Antonio J. Peña Special Issue on Topics on Heterogeneous
Computing . . . . . . . . . . . . . . . 1--2
Iman Faraji and
Seyed H. Mirsadeghi and
Ahmad Afsahi Exploiting heterogeneity of
communication channels for efficient GPU
selection on multi-GPU nodes . . . . . . 3--16
Joshua D. Booth and
Nathan D. Ellingwood and
Heidi K. Thornquist and
Sivasankaran Rajamanickam Basker: Parallel sparse $ L U $
factorization utilizing hierarchical
parallelism and data layouts . . . . . . 17--31
Hartwig Anzt and
Mark Gates and
Jack Dongarra and
Moritz Kreutzer and
Gerhard Wellein and
Martin Köhler Preconditioned Krylov solvers on GPUs 32--44
Valeria Cardellini and
Alessandro Fanfarillo and
Salvatore Filippone Coarray-based load balancing on
heterogeneous and many-core
architectures . . . . . . . . . . . . . 45--58
Luis Costero and
Francisco D. Igual and
Katzalin Olcoz and
Sandra Catalán and
Rafael Rodríguez-Sánchez and
Enrique S. Quintana-Ortí Revisiting conventional task schedulers
to exploit asymmetry in multi-core
architectures for dense linear algebra
operations . . . . . . . . . . . . . . . 59--76
John D. Leidel and
Yong Chen HMC-Sim-2.0: a co-design infrastructure
for exploring custom memory cube
operations . . . . . . . . . . . . . . . 77--88
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Hoang-Vu Dang and
Marc Snir and
William Gropp Eliminating contention bottlenecks in
multithreaded MPI . . . . . . . . . . . 1--23
Martin Ruefenacht and
Mark Bull and
Stephen Booth Generalisation of recursive doubling for
AllReduce: Now with simulation . . . . . 24--44
Ana Moreton-Fernandez and
Arturo Gonzalez-Escribano and
Diego R. Llanos A technique to automatically determine
Ad-hoc communication patterns at runtime 45--62
Weiming Lu and
Yaoguang Wang and
Jingyuan Jiang and
Jian Liu and
Yapeng Shen and
Baogang Wei Hybrid storage architecture and
efficient MapReduce processing for
unstructured data . . . . . . . . . . . 63--77
Ramzi Mahmoudi and
Mohamed Akil and
Mohamed Hédi Bedoui Concurrent computation of topological
watershed on shared memory parallel
machines . . . . . . . . . . . . . . . . 78--97
Alexandra Carpen-Amarie and
Sascha Hunold and
Jesper Larsson Träff On expected and observed communication
performance with MPI derived datatypes 98--117
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Jeff Hollingsworth Editorial . . . . . . . . . . . . . . . 1--1
Michael A. Heroux and
C. Kristopher Garrett Special Issue on SC16 Student Cluster
Competition Reproducibility Initiative 3--4
Rainier Ababao and
Joe A. Garcia and
Joseph Voss and
W. Cyrus Proctor and
R. Todd Evans Student Cluster Competition 2016
reproducibility challenge: Genomic
partitioning with ParConnect . . . . . . 5--10
Ying Hao Tan and
Yiyang Shao and
Siyuan Liu and
Bu-Sung Lee Student cluster competition: ParConnect
reproducibility task report . . . . . . 11--17
Marek Baranowski and
Braden Caywood and
Hannah Eyre and
Janaan Lake and
Kevin Parker and
Kincaid Savoie and
Hari Sundar and
Mary Hall Reproducing ParConnect for SC16 . . . . 18--21
Lei Yang and
Yilong Li and
Zhenxin Fu and
Zhuohan Li and
Wenbin Hou and
Haoze Wu and
Xiaolin Wang and
Yun Liang ParConnect reproducibility report . . . 22--26
G. R. Williams and
G. P. Behm and
T. Nguyen and
A. Esparza and
V. G. Haka and
A. Ramos and
B. Wright and
J. C. Otto and
C. P. Paolini and
M. P. Thomas SC16 student cluster competition
challenge: Investigating the
reproducibility of results for the
ParConnect application . . . . . . . . . 27--34
Edward Hutter and
Chung-Ting Huang ParConnect: Results from the student
cluster competition at SC16 . . . . . . 35--40
Alexander Ditter and
Jan Laukemann and
Benedikt Oehlrich Reproducibility report: Team SegFAUlt @
SCC 2016 . . . . . . . . . . . . . . . . 41--45
Maximilian Hornung and
Svilen Stefanov and
David Schneller and
Sharru Mòller Analysis of the ParConnect algorithm ran
on Intel Xeon Phi Knights Landing . . . 46--53
Patrick Flick and
Chirag Jain and
Tony Pan and
Srinivas Aluru Reprint of ``A parallel connectivity
algorithm for de Bruijn graphs in
metagenomic applications'' . . . . . . . 54--65
Anonymous Editorial Board . . . . . . . . . . . . ifc--ifc
Anonymous Reviewer acknowledgement 2017 . . . . . I--II
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Hartwig Anzt and
Thomas K. Huckle and
Jürgen Bräckle and
Jack Dongarra Incomplete Sparse Approximate Inverses
for Parallel Preconditioning . . . . . . 1--22
Minquan Fang and
Jianbin Fang and
Weimin Zhang and
Haifang Zhou and
Jianxing Liao and
Yuangang Wang Benchmarking the GPU memory at the warp
level . . . . . . . . . . . . . . . . . 23--41
Hassan Salehe Matar and
Erdal Mutlu and
Serdar Tasiran and
Didem Unat Output nondeterminism detection for
programming models combining dataflow
with shared memory . . . . . . . . . . . 42--57
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Hongzhi Wang and
Feng Xiong and
Jianing Li and
Shengfei Shi and
Jianzhong Li and
Hong Gao Data management on new processors: A
survey . . . . . . . . . . . . . . . . . 1--13
Joanna Berlińska and
Maciej Drozdowski Comparing load-balancing algorithms for
MapReduce under Zipfian data skews . . . 14--28
Junxiong Wang and
Hongzhi Wang and
Chenxu Zhao and
Jianzhong Li and
Hong Gao Iteration acceleration for distributed
learning systems . . . . . . . . . . . . 29--41
Anonymous Foreword for the special issue on the
best papers from the EuroMPI 2016
conference . . . . . . . . . . . . . . . 42--42
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Christos D. Antonopoulos and
Enrique S. Quintana-Ortí Parallel programming for resilience and
energy efficiency . . . . . . . . . . . 1--2
Li Tan and
Nathan DeBardeleben and
Qiang Guan and
Sean Blanchard and
Michael Lang Using virtualization to quantify power
conservation via near-threshold voltage
reduction for inherently resilient
applications . . . . . . . . . . . . . . 3--15
F. Rizzi and
K. Morris and
K. Sargsyan and
P. Mycek and
C. Safta and
O. Le Maître and
O. M. Knio and
B. J. Debusschere Exploring the interplay of resilience
and energy consumption for a task-based
partial differential equations
preconditioner . . . . . . . . . . . . . 16--27
Sandra Catalán and
José R. Herrero and
Enrique S. Quintana-Ortí and
Rafael Rodríguez-Sánchez Energy balance between voltage-frequency
scaling and resilience for linear
algebra routines on low-power multicore
architectures . . . . . . . . . . . . . 28--39
Patrick Judd and
Jorge Albericio and
Tayler Hetherington and
Tor Aamodt and
Natalie Enright Jerger and
Raquel Urtasun and
Andreas Moshovos Proteus: Exploiting precision
variability in deep neural networks . . 40--51
Panos Koutsovasilis and
Christos Kalogirou and
Christos Konstantas and
Manolis Maroudas and
Michalis Spyrou and
Christos D. Antonopoulos AcHEe: Evaluating approximate computing
and heterogeneity for energy efficiency 52--67
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Emmanuel Agullo and
Peter Arbenz and
Luc Giraud and
Olaf Schenk Special issue on parallel matrix
algorithms and applications (PMAA'16) 1--2
Mark Gates and
Stanimire Tomov and
Jack Dongarra Accelerating the SVD two stage
bidiagonal reduction and divide and
conquer using GPUs . . . . . . . . . . . 3--18
Wajih Halim Boukaram and
George Turkiyyah and
Hatem Ltaief and
David E. Keyes Batched $ Q R $ and SVD algorithms on
GPUs with applications in hierarchical
matrix compression . . . . . . . . . . . 19--33
Akira Imakura and
Tetsuya Sakurai Block SS-CAA: A complex moment-based
parallel nonlinear eigensolver using the
block communication-avoiding Arnoldi
procedure . . . . . . . . . . . . . . . 34--48
Chao Chen and
Hadi Pouransari and
Sivasankaran Rajamanickam and
Erik G. Boman and
Eric Darve A distributed-memory hierarchical solver
for general sparse linear systems . . . 49--64
Gustavo Chávez and
George Turkiyyah and
Stefano Zampini and
Hatem Ltaief and
David Keyes Accelerated Cyclic Reduction: A
distributed-memory fast solver for
structured linear systems . . . . . . . 65--83
Mathias Jacquelin and
Lin Lin and
Chao Yang PSelInv --- A distributed memory
parallel algorithm for selected
inversion: The non-symmetric case . . . 84--98
Shaden Smith and
Jongsoo Park and
George Karypis HPC formulations of optimization
algorithms for tensor completion . . . . 99--117
A. Lamas Daviña and
J. E. Roman MPI-CUDA parallel linear solvers for
block-tridiagonal matrices in the
context of SLEPc's eigensolvers . . . . 118--135
Vassilis Kalantzis and
A. Cristiano I. Malossi and
Costas Bekas and
Alessandro Curioni and
Efstratios Gallopoulos and
Yousef Saad A scalable iterative dense linear system
solver for multiple right-hand sides in
data analytics . . . . . . . . . . . . . 136--153
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Daisuke Takahashi Computation of the 100 quadrillionth
hexadecimal digit of $ \pi $ on a
cluster of Intel Xeon Phi processors . . 1--10
Germán Ceballos and
Thomas Grass and
Andra Hugo and
David Black-Schaffer Analyzing performance variation of task
schedulers with TaskInsight . . . . . . 11--27
János Végh Introducing the explicitly
many-processor approach . . . . . . . . 28--40
Henri Casanova and
Julien Herrmann and
Yves Robert Computing the expected makespan of task
graphs in the presence of silent errors 41--60
Beichuan Yan and
Richard A. Regueiro Superlinear speedup phenomenon in
parallel $3$D Discrete Element Method
(DEM) simulations of complex-shaped
particles . . . . . . . . . . . . . . . 61--87
Przemyslaw Spychalski and
Ryszard Arendt Machine Learning in Multi-Agent Systems
using Associative Arrays . . . . . . . . 88--99
Jesper Larsson Träff Practical, distributed, low overhead
algorithms for irregular gather and
scatter collectives . . . . . . . . . . 100--117
Alessandro Fanfarillo and
Davide Del Vento Notified access in coarray-based
hydrodynamics applications on many-core
architectures: Design and performance 118--129
Wayne Joubert and
James Nance and
Deborah Weighill and
Daniel Jacobson Parallel accelerated vector similarity
calculations for genomics applications 130--145
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
David L. González-Álvarez and
Miguel A. Vega-Rodríguez and
Álvaro Rubio-Largo Searching for common patterns on protein
sequences by means of a parallel hybrid
honey-bee mating optimization algorithm 1--17
Sandra Catalán and
José R. Herrero and
Enrique S. Quintana-Ortí and
Rafael Rodríguez-Sánchez Static scheduling of the $ L U $
factorization with look-ahead on
asymmetric multicore processors . . . . 18--27
Antonio Gómez-Iglesias and
Miguel Cárdenas-Montes Performance evaluation of the
three-point angular correlation function 28--41
Alcides Fonseca and
Bruno Cabral Overcoming the No Free Lunch Theorem in
Cut-off Algorithms for Fork--Join
programs . . . . . . . . . . . . . . . . 42--56
Ivy Bo Peng and
Roberto Gioiosa and
Gokcen Kestor and
Jeffrey S. Vetter and
Pietro Cicotti and
Erwin Laure and
Stefano Markidis Characterizing the performance benefit
of hybrid memory system for HPC
applications . . . . . . . . . . . . . . 57--69
Brice Goglin and
Emmanuel Jeannot and
Farouk Mansouri and
Guillaume Mercier Hardware topology management in MPI
applications through hierarchical
communicators . . . . . . . . . . . . . 70--90
Xuechen Zhang and
Song Jiang and
Alseny Diallo and
Lei Wang IR+: Removing parallel I/O interference
of MPI programs via data replication
over heterogeneous storage devices . . . 91--105
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Simon Pickartz and
Carsten Clauss and
Stefan Lankes and
Antonello Monti Revisiting locality-awareness in view of
dynamically changing topologies . . . . 1--18
Srinivasan Ramesh and
Aurèle Mahéo and
Sameer Shende and
Allen D. Malony and
Hari Subramoni and
Amit Ruhela and
Dhabaleswar K. (DK) Panda MPI performance engineering with the MPI
tool interface: the integration of
MVAPICH and TAU . . . . . . . . . . . . 19--37
Sergio Rivas-Gomez and
Roberto Gioiosa and
Ivy Bo Peng and
Gokcen Kestor and
Sai Narasimhamurthy and
Erwin Laure and
Stefano Markidis MPI windows on storage for HPC
applications . . . . . . . . . . . . . . 38--56
Kurt B. Ferreira and
Scott Levy and
Kevin Pedretti and
Ryan E. Grant Characterizing MPI matching via
trace-based simulation . . . . . . . . . 57--83
Alejandro Estaña and
Kevin Molloy and
Marc Vaisset and
Nathalie Sibille and
Thierry Siméon and
Pau Bernadó and
Juan Cortés Hybrid parallelization of a multi-tree
path search algorithm: Application to
highly-flexible biomolecules . . . . . . 84--100
Yonggang Che and
Meifang Yang and
Chuanfu Xu and
Yutong Lu Petascale scramjet combustion simulation
on the Tianhe-2 heterogeneous
supercomputer . . . . . . . . . . . . . 101--117
Siyuan Liu and
Meiru Hao and
Bu-Sung Lee Student Cluster Competition 2017, team
Nanyang Technological University:
Reproducing vectorization of the Tersoff
multi-body potential on the Intel
Broadwell architecture . . . . . . . . . 118--124
Sunita Chandrasekaran and
Antonio J. Peña Special issue on applications for the
heterogeneous computing era 2017 . . . . 125--127
James Lin and
Zhigeng Xu and
Linjin Cai and
Akira Nukada and
Satoshi Matsuoka Evaluating the SW26010 many-core
processor with a micro-benchmark suite
for performance optimizations . . . . . 128--143
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Harald Servat and
Jesús Labarta and
Hans-Christian Hoppe and
Judit Giménez and
Antonio J. Peña Understanding memory access patterns
using the BSC performance tools . . . . 1--14
Michael Wolfe and
Seyong Lee and
Jungwon Kim and
Xiaonan Tian and
Rengan Xu and
Barbara Chapman and
Sunita Chandrasekaran The OpenACC data model: Preliminary
study on its major challenges and
implementations . . . . . . . . . . . . 15--27
Zhenxin Fu and
Lei Yang and
Wenbin Hou and
Zhuohan Li and
Yifan Wu and
Yihua Cheng and
Xiaolin Wang and
Yun Liang Student Cluster Competition 2017, Team
Peking University: Reproducing
vectorization of the Tersoff multi-body
potential on the Intel Broadwell
architecture . . . . . . . . . . . . . . 28--32
Mehmet Deveci and
Christian Trott and
Sivasankaran Rajamanickam Multithreaded sparse matrix--matrix
multiplication for many-core and GPU
architectures . . . . . . . . . . . . . 33--46
Ka Cheong Jason Lau and
Yuxuan Li and
Lei Xie and
Qian Xie and
Beichen Li and
Yu Chen and
Guanyu Feng and
Jiping Yu and
Xinjian Yu and
Miao Wang and
Wentao Han and
Jidong Zhai Student cluster competition 2017, team
Tsinghua University: Reproducing
vectorization of the Tersoff multi-body
potential on the Intel Skylake and
NVIDIA Volta architectures . . . . . . . 47--53
Sergio Iserte and
Rafael Mayo and
Enrique S. Quintana-Ortí and
Vicenç Beltran and
Antonio J. Peña DMR API: Improving cluster productivity
by turning applications into malleable 54--66
Z. Marcus and
J. Booth and
C. Bunn and
M. Leger and
S. Hance and
T. Sweeney and
C. McCardwell and
D. Kaeli Student cluster competition 2017, team
Northeastern University: Reproducing
vectorization of the Tersoff multi-body
potential on the NVIDIA V100 . . . . . . 67--71
ChanJung Chang and
YungChing Lin and
YuHsuan Cheng and
YuCheng Wang and
LiYu Yu and
TienChi Yang and
Jerry Chou Student cluster competition 2017, team
NTHU: Reproducing vectorization of the
Tersoff multi-body potential on the
Intel Skylake and Nvidia P100
architecture . . . . . . . . . . . . . . 72--78
Lisa Marie Dreier and
Svilen Stefanov and
David Schneller and
Alexander Ditter SC17 student cluster competition, Team
Technical University of Munich and
Friedrich-Alexander University
Erlangen--Nürnberg: Reproducing
vectorization of the Tersoff multi-body
potential on the Intel Broadwell
architecture . . . . . . . . . . . . . . 79--83
Abdelhalim Amer and
Pavan Balaji and
Zhiyi Huang 8th International Workshop on
Programming Models and Applications for
Multicores and Manycores (PMAM'17) . . . 84--84
Pedro Alonso and
Sandra Catalán and
José R. Herrero and
Enrique S. Quintana-Ortí and
Rafael Rodríguez-Sánchez Two-sided orthogonal reductions to
condensed forms on asymmetric multicore
processors . . . . . . . . . . . . . . . 85--100
Xuhao Chen and
Cheng Chen and
Jie Shen and
Jianbin Fang and
Tao Tang and
Canqun Yang and
Zhiying Wang Orchestrating parallel detection of
strongly connected components on GPUs 101--114
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Janaan Lake and
Qixiang Chao and
Hannah Eyre and
Emerson Ford and
Kevin Parker and
Kincaid Savoie Student Cluster Competition 2017, Team
University of Utah: Reproducing
Vectorization of the Tersoff Multi-Body
Potential on the Intel Broadwell and
Intel Skylake Platforms . . . . . . . . 1--8
Ralph H. Castain and
Joshua Hursey and
Aurelien Bouteiller and
David Solt PMIx: Process management for exascale
environments . . . . . . . . . . . . . . 9--29
James Sullivan and
Collin Weir and
Austin Reichert and
R. Todd Evans and
W. Cyrus Proctor and
Nicolas Thorne Student cluster competition 2017, Team
University of Texas at Austin/Texas
State University: Reproducing
vectorization of the Tersoff multi-body
potential on the Intel Skylake and
NVIDIA V100 architectures . . . . . . . 30--35
Yuan Yirang and
Chang Luo and
Li Changfeng and
Sun Tongjun Domain decomposition modified with
characteristic mixed finite element of
compressible oil-water seepage
displacement and its numerical analysis 36--47
C. Kris Garrett and
Stephen Lien Harrell and
Michael A. Heroux Special Issue on SCC'17 Reproducibility
Initiative . . . . . . . . . . . . . . . 48--49
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Maciej Paszyński and
Leszek Siwik and
Maciej Woźniak Concurrency of three-dimensional refined
isogeometric analysis . . . . . . . . . 1--22
Mario Badr and
Natalie Enright Jerger A high-level model for exploring
multi-core architectures . . . . . . . . 23--35
Sebastian Eibl and
Ulrich Rüde A Local Parallel Communication Algorithm
for Polydisperse Rigid Body Dynamics . . 36--48
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
I. Masliah and
A. Abdelfattah and
A. Haidar and
S. Tomov and
M. Baboulin and
J. Falcou and
J. Dongarra Algorithms and optimization techniques
for high-performance matrix--matrix
multiplications of very small matrices 1--21
Andrés E. Tomás and
Rafael Rodríguez-Sánchez and
Sandra Catalán and
Rocío Carratalá-Sáez and
Enrique S. Quintana-Ortí Dynamic look-ahead in the reduction to
band form for the singular value
decomposition . . . . . . . . . . . . . 22--31
Daniel J. Holmes and
Bradley Morgan and
Anthony Skjellum and
Purushotham V. Bangalore and
Srinivas Sridharan Planning for performance: Enhancing
achievable performance for MPI through
persistent collective operations . . . . 32--57
Alessandro Fanfarillo and
Sudip Kumar Garain and
Dinshaw Balsara and
Daniel Nagle Resilient computational applications
using Coarray Fortran . . . . . . . . . 58--67
Chaitanya Talnikar and
Qiqi Wang A two-level computational graph method
for the adjoint of a finite volume based
compressible unsteady flow solver . . . 68--84
Yanhua Cao and
Li Lu and
Jiadi Yu and
Shiyou Qian and
Yanmin Zhu and
Minglu Li Online cost-rejection rate scheduling
for resource requests in hybrid clouds 85--103
J. W. Buurlage and
R. H. Bisseling and
K. J. Batenburg A geometric partitioning method for
distributed tomographic reconstruction 104--121
Remko van Wagensveld and
Tobias Wägemann and
Ralph Mader and
Ramin Tavakoli Kolagari and
Ulrich Margull Evaluation and modeling of the supercore
parallelization pattern in automotive
real-time systems . . . . . . . . . . . 122--130
Hartwig Anzt and
Jack Dongarra and
Goran Flegar and
Enrique S. Quintana-Ortí Variable-size batched Gauss--Jordan
elimination for block-Jacobi
preconditioning on graphics processors 131--146
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Abhinav Vishnu and
Pavan Balaji and
Yong Chen Guest Editor's Introduction: P2S2: SI
2016 . . . . . . . . . . . . . . . . . . 1--2
Neda Tavakoli and
Dong Dai and
Yong Chen Client-side straggler-aware I/O
scheduler for object-based parallel file
systems . . . . . . . . . . . . . . . . 3--18
Hiroshi Yoritaka and
Ken Matsui and
Masahiro Yasugi and
Tasuku Hiraishi and
Seiji Umatani Probabilistic guards: a mechanism for
increasing the granularity of
work-stealing programs . . . . . . . . . 19--36
Jason Mair and
Zhiyi Huang and
David Eyers Manila: Using a densely populated
PMC-space for power modelling within
large-scale systems . . . . . . . . . . 37--56
Xing Fan and
Oliver Sinnen and
Nasser Giacaman Supporting asynchronization in OpenMP
for event-driven programming . . . . . . 57--74
Dong Dai and
Forrest Sheng Bao and
Jiang Zhou and
Xuanhua Shi and
Yong Chen Vectorizing disks blocks for efficient
storage system via deep learning . . . . 75--90
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Pavan Balaji and
Abhinav Vishnu and
Yong Chen Foreword to the special issue for the
Workshop on Parallel Programming Models
and Systems Software for High-End
Computing (P2S2 2017) . . . . . . . . . 1--2
Juan J. Durillo and
Philipp Gschwandtner and
Klaus Kofler and
Thomas Fahringer Multi-Objective region-Aware
optimization of parallel programs . . . 3--21
Sai Narasimhamurthy and
Nikita Danilov and
Sining Wu and
Ganesan Umanesan and
Stefano Markidis and
Sergio Rivas-Gomez and
Ivy Bo Peng and
Erwin Laure and
Dirk Pleiter and
Shaun de Witt SAGE: Percipient Storage for Exascale
Data Centric Computing . . . . . . . . . 22--33
Zixi Quan and
Volker Haarslev A parallel computing architecture for
high-performance OWL reasoning . . . . . 34--46
Loris Marchal and
Erik Saule and
Oliver Sinnen Special Issue Proposal for the
Parallel Computing Journal:
HeteroPar 2016 and HCW 2016 Workshops 47--47
Dylan Machovec and
Bhavesh Khemka and
Nirmal Kumbhare and
Sudeep Pasricha and
Anthony A. Maciejewski and
Howard Jay Siegel and
Ali Akoglu and
Gregory A. Koenig and
Salim Hariri and
Cihan Tunc and
Michael Wright and
Marcia Hilton and
Rajendra Rambharos and
Christopher Blandin and
Farah Fargo and
Ahmed Louri and
Neena Imam Utility-based resource management in an
oversubscribed energy-constrained
heterogeneous environment executing
parallel applications . . . . . . . . . 48--72
T. Cojean and
A. Guermouche and
A. Hugo and
R. Namyst and
P. A. Wacrenier Resource aggregation for task-based
Cholesky Factorization on top of modern
architectures . . . . . . . . . . . . . 73--92
João Guerreiro and
Aleksandar Ilic and
Nuno Roma and
Pedro Tomás DVFS-aware application classification to
improve GPGPUs energy efficiency . . . . 93--117
Julio Proaño and
Carmen Carrión and
Blanca Caminero Empirical modeling and simulation of an
heterogeneous Cloud computing
environment . . . . . . . . . . . . . . 118--134
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Nawrin Sultana and
Martin Rüfenacht and
Anthony Skjellum and
Ignacio Laguna and
Kathryn Mohror Failure recovery for bulk synchronous
applications with MPI stages . . . . . . 1--14
Wayne Joubert and
James Nance and
Sharlee Climer and
Deborah Weighill and
Daniel Jacobson Parallel accelerated Custom Correlation
Coefficient calculations for genomics
applications . . . . . . . . . . . . . . 15--23
Javier López-Gómez and
Javier Fernández Muñoz and
David del Rio Astorga and
Manuel F. Dolz and
J. Daniel Garcia Exploring stream parallel patterns in
distributed MPI environments . . . . . . 24--36
Anton Shterenlikht and
Luis Cebamanos MPI vs Fortran coarrays beyond 100k
cores: $3$D cellular automata . . . . . 37--49
Pedro Valero-Lara and
Raül Sirvent and
Antonio J. Peña and
Jesús Labarta MPI + OpenMP tasking scalability for
multi-morphology simulations of the
human brain . . . . . . . . . . . . . . 50--61
William Gropp and
Rajeev Thakur Guest Editor's introduction: Special
issue on best papers from EuroMPI/USA
2017 . . . . . . . . . . . . . . . . . . 62--62
Scott Levy and
Kurt B. Ferreira and
Whit Schonbein and
Ryan E. Grant and
Matthew G. F. Dosanjh Using simulation to examine the effect
of MPI message matching costs on
application performance . . . . . . . . 63--74
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Valentin Le Fèvre and
Thomas Herault and
Yves Robert and
Aurelien Bouteiller and
Atsushi Hori and
George Bosilca and
Jack Dongarra Comparing the performance of rigid,
moldable and grid-shaped applications on
failure-prone HPC platforms . . . . . . 1--12
Amit Ruhela and
Hari Subramoni and
Sourav Chakraborty and
Mohammadreza Bayatpour and
Pouya Kousha and
Dhabaleswar K. (DK) Panda Efficient design for MPI asynchronous
progress without dedicated resources . . 13--26
Joel Matějka and
Björn Forsberg and
Michal Sojka and
Přemysl Šůcha and
Luca Benini and
Andrea Marongiu and
Zdeněk Hanzálek Combining PREM compilation and static
scheduling for high-performance and
predictable MPSoC execution . . . . . . 27--44
Michael Gowanlock and
Ben Karsin A hybrid CPU/GPU approach for optimizing
sorting throughput . . . . . . . . . . . 45--55
Martin Schreiber and
Nathanaël Schaeffer and
Richard Loft Exponential integrators with
parallel-in-time rational approximations
for the shallow-water equations on the
rotating sphere . . . . . . . . . . . . 56--65
Zhuo Liu and
Amit Kumar Nath and
Xiaoning Ding and
Huansong Fu and
Md. Muhib Khan and
Weikuan Yu Multivariate modeling and two-level
scheduling of analytic queries . . . . . 66--78
José I. Aliaga and
Ernesto Dufrechou and
Pablo Ezzatti and
Enrique S. Quintana-Ortí Accelerating the task/data-parallel
version of ILUPACK's BiCG in
multi-CPU/GPU configurations . . . . . . 79--87
P.-H. Tournier and
I. Aliferis and
M. Bonazzoli and
M. de Buhan and
M. Darbas and
V. Dolean and
F. Hecht and
P. Jolivet and
I. El Kanfoud and
C. Migliaccio and
F. Nataf and
Ch. Pichot and
S. Semenov Microwave tomographic imaging of
cerebrovascular accidents by using
high-performance computing . . . . . . . 88--97
William D. Gropp Using node and socket information to
implement MPI Cartesian topologies . . . 98--108
Liyang Xu and
Xiaoguang Ren and
Qian Wang and
Xinhai Xu and
Xuejun Yang Full-neighbor-list based numerical
reproducibility method for parallel
molecular dynamics simulations . . . . . 109--118
Marc-André Hermanns and
Nathan T. Hjelm and
Michael Knobloch and
Kathryn Mohror and
Martin Schulz The MPI_T events interface: an early
evaluation and overview of the interface 119--130
Angelika Schwarz and
Lars Karlsson Scalable eigenvector computation for the
non-symmetric eigenvalue problem . . . . 131--140
Ammar Ahmad Awan and
Karthik Vadambacheri Manian and
Ching-Hsiang Chu and
Hari Subramoni and
Dhabaleswar K. Panda Optimized large-message broadcast for
deep learning workloads: MPI, MPI +
NCCL, or NCCL2? . . . . . . . . . . . . 141--152
Kevin Sala and
Xavier Teruel and
Josep M. Perez and
Antonio J. Peña and
Vicenç Beltran and
Jesus Labarta Integrating blocking and non-blocking
MPI primitives with task-based
programming models . . . . . . . . . . . 153--166
P. Kůs and
A. Marek and
S. S. Köcher and
H.-H. Kowalski and
C. Carbogno and
Ch. Scheurer and
K. Reuter and
M. Scheffler and
H. Lederer Optimizations of the eigensolvers in the
ELPA library . . . . . . . . . . . . . . 167--177
Aristeidis Mastoras and
Thomas R. Gross Load-balancing for load-imbalanced
fine-grained linear pipelines . . . . . 178--189
Millad Ghane and
Sunita Chandrasekaran and
Margaret S. Cheung pointerchain: Tracing pointers to their
roots --- A case study in molecular
dynamics simulations . . . . . . . . . . 190--203
Julien Adam and
Maxime Kermarquer and
Jean-Baptiste Besnard and
Leonardo Bautista-Gomez and
Marc Pérache and
Patrick Carribault and
Julien Jaeger and
Allen D. Malony and
Sameer Shende Checkpoint/restart approaches for a
thread-based MPI runtime . . . . . . . . 204--219
Qiao Kang and
Jesper Larsson Träff and
Reda Al-Bahrani and
Ankit Agrawal and
Alok Choudhary and
Wei-keng Liao Scalable Algorithms for MPI Intergroup
Allgather and Allgatherv . . . . . . . . 220--230
Shi Sha and
Wujie Wen and
Gustavo A. Chaparro-Baquero and
Gang Quan Thermal-constrained energy efficient
real-time scheduling on multi-core
platforms . . . . . . . . . . . . . . . 231--242
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Dumitrel Loghin and
Yong Meng Teo The time and energy efficiency of modern
multicore systems . . . . . . . . . . . 1--13
Pavan Balaji and
Marc Casas Special issue on the Message Passing
Interface . . . . . . . . . . . . . . . 14--15
S. Cools Analyzing and improving maximal
attainable accuracy in the communication
hiding pipelined BiCGStab method . . . . 16--35
Thanh-Dang Diep and
Kien Trung Pham and
Karl Fürlinger and
Nam Thoai A time-stamping system to detect memory
consistency errors in MPI one-sided
applications . . . . . . . . . . . . . . 36--44
Ludovic A. R. Capelli and
Zhenjiang Hu and
Timothy A. K. Zakian and
Nick Brown and
J. Mark Bull iPregel: Vertex-centric programmability
vs memory efficiency and performance,
why choose? . . . . . . . . . . . . . . 45--56
Shengguo Li and
Jie Liu and
Yunfei Du A high performance implementation of
Zolo-SVD algorithm on distributed memory
systems . . . . . . . . . . . . . . . . 57--65
Ichitaro Yamazaki and
Edmond Chow and
Aurelien Bouteiller and
Jack Dongarra Performance of asynchronous optimized
Schwarz with one-sided communication . . 66--81
George Matheou and
Vassos Soteriou and
Paraskevas Evripidou Toward data-driven architectural support
in improving the performance of future
HPC architectures . . . . . . . . . . . 82--106
Anonymous Editorial Board . . . . . . . . . . . . ii--ii
Hajime Fujita and
Chongxiao Cao and
Sayantan Sur and
Charles Archer and
Erik Paulson and
Maria Garzaran Efficient implementation of MPI-3 RMA
over openFabrics interfaces . . . . . . 1--10
Shaolong Chen and
Miquel Angel Senar Exploring efficient data parallelism for
genome read mapping on multicore and
manycore architectures . . . . . . . . . 11--24
Weihao Liang and
Yong Chen and
Jialin Liu and
Hong An CARS: a contention-aware scheduler for
efficient resource management of HPC
storage systems . . . . . . . . . . . . 25--34
Olga Pearce Exploring utilization options of
heterogeneous architectures for
multi-physics simulations . . . . . . . 35--45
Ting Yu and
Mengchi Liu A memory efficient maximal clique
enumeration method for sparse graphs
with a parallel implementation . . . . . 46--59
Jeffrey S. Young and
Eric Hein and
Srinivas Eswar and
Patrick Lavin and
Jiajia Li and
Jason Riedy and
Richard Vuduc and
Tom Conte A microbenchmark characterization of the
Emu Chick . . . . . . . . . . . . . . . 60--69
Fang Zhou and
Song Wu and
Youchuang Jia and
Xiang Gao and
Hai Jin and
Xiaofei Liao and
Pingpeng Yuan VAIL: a Victim-Aware Cache Policy to
improve NVM Lifetime for hybrid memory
system . . . . . . . . . . . . . . . . . 70--76
J. Austin Ellis and
Thomas M. Evans and
Steven P. Hamilton and
C. T. Kelley and
Tara M. Pandya Optimization of processor allocation for
domain decomposed Monte Carlo
calculations . . . . . . . . . . . . . . 77--86
Guanghui Zhu and
Chen Guo and
Le Lu and
Zhi Huang and
Chunfeng Yuan and
Rong Gu and
Yihua Huang DGST: Efficient and scalable suffix tree
construction on distributed
data-parallel platforms . . . . . . . . 87--102
Min Si and
Zhiyi Huang and
Pavan Balaji International workshop on programming
models and applications for multicores
and manycores (PMAM 2018) . . . . . . . ??
Michael Rippl and
Bruno Lang and
Thomas Huckle Parallel eigenvalue computation for
banded generalized eigenvalue problems ??
Jean M. Favre and
Alexander Blass A comparative evaluation of three volume
rendering libraries for the
visualization of sheared thermal
convection . . . . . . . . . . . . . . . ??
Reuben D. Budiardja and
Christian Y. Cardall Targeting GPUs with OpenMP directives on
Summit: a simple and effective Fortran
experience . . . . . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous Publisher's Note . . . . . . . . . . . . ??
Jose Monsalve Diaz and
Kyle Friedline and
Swaroop Pophale and
Oscar Hernandez and
David E. Bernholdt and
Sunita Chandrasekaran Analysis of OpenMP 4.5 Offloading in
Implementations: Correctness and
Overhead . . . . . . . . . . . . . . . . ??
S. Mahdieh Ghazimirsaeed and
Ryan E. Grant and
Ahmad Afsahi A dynamic, unified design for dedicated
message matching engines for collective
and point-to-point communications . . . ??
Anton G. Artemov and
Elias Rudberg and
Emanuel H. Rubensson Parallelization and scalability analysis
of inverse factorization using the
chunks and tasks programming model . . . ??
Min Si and
Abhinav Vishnu and
Yong Chen Parallel programming models and systems
software for high-end computing (P2S2
2018) . . . . . . . . . . . . . . . . . ??
Alexander Heinecke and
Alexander Breuer and
Yifeng Cui Tensor-optimized hardware accelerates
fused discontinuous Galerkin simulations ??
Xinzhe Wu and
Serge G. Petiton A distributed and parallel asynchronous
unite and conquer method to solve large
scale non-Hermitian linear systems with
multiple right-hand sides . . . . . . . ??
Dimitris Palyvos-Giannas and
Vincenzo Gulisano and
Marina Papatriantafilou GeneaLog: Fine-grained data streaming
provenance in cyber-physical systems . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous November 2019 . . . . . . . . . . . . . ??
Yusuke Nagasaka and
Satoshi Matsuoka and
Ariful Azad and
Aydin Buluç Performance optimization, modeling and
analysis of sparse matrix-matrix
products on multi-core and many-core
processors . . . . . . . . . . . . . . . ??
Bian Wu and
Weiliang Heng and
Bu-Sung Lee Student Cluster Competition 2018, Team
Nanyang Technological University:
Reproducing performance of a
Multi-Physics Simulations of the
Tsunamigenic 2004 Sumatra Megathrust
Earthquake on the Intel Skylake
architecture . . . . . . . . . . . . . . ??
Nicole Brewer and
HyeJin Kim and
Claudia Li and
Heidi Anderson and
Jessica Lanum and
Jia Cheoh and
Betsy Hillery and
Trinity Overmyer Student cluster competition 2018, team
Ada Six of Purdue University:
Reproducing Extreme Scale Multi-Physics
Simulations of Tsunamigenic 2004 Sumatra
Megathrust Earthquake on Intel Skylake
architecture . . . . . . . . . . . . . . ??
Thaddeus Koehn and
Peter Athanas Data staging for efficient high
throughput stream processing . . . . . . ??
Julia Bazińska and
Maciej Korpalski and
Maciej Szpindler Student Cluster Competition 2018, Team
University of Warsaw, University of
Wroclaw, Warsaw University of
Technology: Reproducing performance of a
multi-physics simulations of the
Tsunamigenic 2004 Sumatra megathrust
earthquake on the Intel Skylake
architecture . . . . . . . . . . . . . . ??
C. Bunn and
H. Barclay and
A. Lazarev and
F. Yusuf and
J. Fitch and
J. Booth and
K. Shivdikar and
D. Kaeli Student cluster competition 2018, team
Northeastern University: Reproducing
performance of a multi-physics
simulations of the Tsunamigenic 2004
Sumatra Megathrust earthquake on the AMD
EPYC 7551 architecture . . . . . . . . . ??
ShaoFu Lin and
ChiChen Yang and
YuHsuan Cheng and
KengJui Hsu and
HungHsin Chen and
YuanChing Lin and
Jerry Chou Student Cluster Competition 2018, team
NTHU: Reproducing performance of
multi-physics simulations of the
tsunamigenic 2004 Sumatra megathrust
earthquake on the Intel Skylake
architecture . . . . . . . . . . . . . . ??
Jiaao He and
Chenggang Zhao and
Jiping Yu and
Xinjian Yu and
Liyan Zheng and
Chenyao Lou and
Shizhi Tang and
Wentao Han and
Jidong Zhai Student Cluster Competition 2018, Team
Tsinghua University: Reproducing
performance of multi-physics simulations
of the Tsunamigenic 2004 Sumatra
megathrust earthquake on the Intel
Skylake Architecture . . . . . . . . . . ??
Hai Ah Nam and
Elsa Gonsiorowski and
Scott Michael Special Issue on the SC'18 Student
Cluster Competition Reproducibility
Initiative . . . . . . . . . . . . . . . ??
Mateusz Starzec and
Grazyna Starzec and
Aleksander Byrski and
Wojciech Turek Distributed ant colony optimization
based on actor model . . . . . . . . . . ??
Afshin Zafari and
Elisabeth Larsson and
Martin Tillenius DuctTeip: an efficient programming model
for distributed task-based parallel
computing . . . . . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous December 2019 . . . . . . . . . . . . . ??
Antonio J. Peña and
Min Si Guest editorial: Special Issue on
Applications and System Software for
Hybrid Exascale Systems . . . . . . . . ??
Vasco Amaral and
Beatriz Norberto and
Miguel Goulão and
Marco Aldinucci and
Siegfried Benkner and
Andrea Bracciali and
Paulo Carreira and
Edgars Celms and
Luís Correia and
Clemens Grelck and
Helen Karatza and
Christoph Kessler and
Peter Kilpatrick and
Hugo Martiniano and
Ilias Mavridis and
Sabri Pllana and
Ana Respício and
José Simão and
Luís Veiga and
Ari Visa Programming languages for data-intensive
HPC applications: a systematic mapping
study . . . . . . . . . . . . . . . . . ??
Yu Huang and
Kai Gong and
Eric Mercer An efficient algorithm for match pair
approximation in message passing . . . . ??
Tobias Ribizel and
Hartwig Anzt Parallel selection on GPUs . . . . . . . ??
Valeriy Manin and
Bruno Lang Cannon-type triangular matrix
multiplication for the reduction of
generalized HPD eigenproblems to
standard form . . . . . . . . . . . . . ??
Vianney Kengne Tchendji and
Armel Nkonjoh Ngomade and
Jerry Lacmou Zeutouo and
Jean Frédéric Myoupo Efficient CGM-based parallel algorithms
for the longest common subsequence
problem with multiple
substring-exclusion constraints . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous March 2020 . . . . . . . . . . . . . . . ??
Takeshi Terao and
Katsuhisa Ozaki and
Takeshi Ogita $ L U$-Cholesky $ Q R$ algorithms for
thin $ Q R$ decomposition . . . . . . . ??
Md Maruf Hussain and
Noriyuki Fujimoto GPU-based parallel multi-objective
particle swarm optimization for large
swarms and high dimensional problems . . ??
Massimo Bernaschi and
Pasqua D'Ambra and
Dario Pasquini AMG based on compatible weighted
matching for GPUs . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous April 2020 . . . . . . . . . . . . . . . ??
Jakub Kruzik and
David Horak and
Vaclav Hapla and
Martin Cermak Comparison of selected FETI coarse space
projector implementation strategies . . ??
Carlos Junqueira-Junior and
João Luiz F. Azevedo and
Jairo Panetta and
William R. Wolf and
Sami Yamouni On the scalability of CFD tool for
supersonic jet flow configurations . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous May 2020 . . . . . . . . . . . . . . . . ??
Hitoshi Murai and
Mitsuhisa Sato Design and evaluation of efficient
global data movement in partitioned
global address space . . . . . . . . . . ??
Bengisu Elis and
Dai Yang and
Olga Pearce and
Kathryn Mohror and
Martin Schulz QMPI: a next generation MPI profiling
interface for modern HPC platforms . . . ??
Carolin Penke and
Andreas Marek and
Christian Vorwerk and
Claudia Draxl and
Peter Benner High performance solution of
skew-symmetric eigenvalue problems with
applications in solving the
Bethe--Salpeter eigenvalue problem . . . ??
Timon E. Knigge and
Rob H. Bisseling An improved exact algorithm and an
NP-completeness proof for sparse matrix
bipartitioning . . . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous August 2020 . . . . . . . . . . . . . . ??
Seiya Watanabe and
Takayuki Aoki and
Tomohiro Takaki A domain partitioning method using a
multi-phase-field model for block-based
AMR applications . . . . . . . . . . . . ??
JunKyu Lee and
Gregory D. Peterson and
Dimitrios S. Nikolopoulos and
Hans Vandierendonck AIR: Iterative refinement acceleration
using arbitrary dynamic precision . . . ??
Jaume Bosch and
Carlos Álvarez and
Daniel Jiménez-González and
Xavier Martorell and
Eduard Ayguadé Asynchronous runtime with distributed
manager for task-based programming
models . . . . . . . . . . . . . . . . . ??
Dunwei Gong and
Tian Tian and
Jinxin Wang and
Ying Du and
Zheng Li A novel method of grouping target paths
for parallel programs . . . . . . . . . ??
Shardul Natu and
Ketan Date and
Rakesh Nagi GPU-accelerated Lagrangian heuristic for
multidimensional assignment problems
with decomposable costs . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous September 2020 . . . . . . . . . . . . . ??
Nusrat Sharmin Islam and
Gengbin Zheng and
Sayantan Sur and
Akhil Langer and
Maria Garzaran Minimizing the usage of hardware
counters for collective communication
using triggered operations . . . . . . . ??
Rodolfo Pereira Araujo and
Igor Machado Coelho and
Leandro Augusto Justen Marzulo A multi-improvement local search using
dataflow and GPU to solve the minimum
latency problem . . . . . . . . . . . . ??
Yi Zhou and
Yuanqi Chen and
Shubbhi Taneja and
Ajit Chavan and
Xiao Qin and
Jifu Zhang ThermoBench: a thermal efficiency
benchmark for clusters in data centers ??
Cuong M. Nguyen and
Philip J. Rhodes Delaunay triangulation of large-scale
datasets using two-level parallelism . . ??
Jian Xiao and
Min Long and
Ce Yu and
Xin Zhou and
Li Ji Performance optimization of
non-equilibrium ionization simulations
from MapReduce and GPU acceleration . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Marco Aldinucci and
Valeria Cardellini and
Gabriele Mencagli and
Massimo Torquati Data stream processing in HPC systems:
New frameworks and architectures for
high-frequency streaming . . . . . . . . ??
Anonymous October 2020 . . . . . . . . . . . . . . ??
Herbert Jordan and
Philipp Gschwandtner and
Peter Thoman and
Peter Zangerl and
Alexander Hirsch and
Thomas Fahringer and
Thomas Heller and
Dietmar Fey The AllScale framework architecture . . ??
Jianguo Liang and
Rong Hua and
Hao Zhang and
Wenqiang Zhu and
You Fu Accelerated molecular dynamics
simulation of Silicon Crystals on
TaihuLight using OpenACC . . . . . . . . ??
Huan Zhou and
José Gracia and
Naweiluo Zhou and
Ralf Schneider Collectives in hybrid MPI+MPI code:
Design, practice and performance . . . . ??
Nirmal Kumbhare and
Ali Akoglu and
Aniruddha Marathe and
Salim Hariri and
Ghaleb Abdulla Dynamic power management for
value-oriented schedulers in
power-constrained HPC system . . . . . . ??
Jesper Larsson Träff and
Torsten Hoefler Special issue: Selected papers from
EuroMPI 2019 . . . . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous November 2020 . . . . . . . . . . . . . ??
Zhongming Fu and
Zhuo Tang and
Li Yang and
Kenli Li and
Keqin Li ImRP: a Predictive Partition Method for
Data Skew Alleviation in Spark Streaming
Environment . . . . . . . . . . . . . . ??
Shi Dong and
Pu Zhao and
Xue Lin and
David Kaeli Exploring GPU acceleration of Deep
Neural Networks using Block Circulant
Matrices . . . . . . . . . . . . . . . . ??
Massimiliano Lupo Pasini and
Bruno Turcksin and
Wenjun Ge and
Jean-Luc Fattebert A parallel strategy for density
functional theory computations on
accelerated nodes . . . . . . . . . . . ??
Andrew Reisner and
Markus Berndt and
J. David Moulton and
Luke N. Olson Scalable line and plane relaxation in a
parallel structured multigrid solver . . ??
Angelika Schwarz and
Carl Christian Kjelgaard Mikkelsen and
Lars Karlsson Robust parallel eigenvector computation
for the non-symmetric eigenvalue problem ??
Mohammad Almasri and
Walid Abu-Sufah CCF: an efficient SpMV storage format
for AVX512 platforms . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous December 2020 . . . . . . . . . . . . . ??
Xiongwei Fei and
Kenli Li and
Wangdong Yang and
Keqin Li Analysis of energy efficiency of a
parallel AES algorithm for CPU--GPU
heterogeneous platforms . . . . . . . . ??
Joshua Dennis Booth and
Gregory Bolet An on-node scalable sparse incomplete LU
factorization for a many-core iterative
solver with Javelin . . . . . . . . . . ??
Baicheng Yan and
Limin Xiao and
Guangjun Qin and
Zhang Yang and
Bin Dong and
Haonan Yu and
Hongyu Wu QTMS: a quadratic time complexity
topology-aware process mapping method
for large-scale parallel applications on
shared HPC system . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous June 2020 . . . . . . . . . . . . . . . ??
Benbo Zha and
Hong Shen Improved probabilistic I/O scheduling
for limited-size Burst-Buffers deployed
HPC . . . . . . . . . . . . . . . . . . ??
Qianqian Tong and
Guannan Liang and
Xingyu Cai and
Chunjiang Zhu and
Jinbo Bi Asynchronous parallel stochastic
Quasi-Newton methods . . . . . . . . . . ??
Mohammad K. Fallah and
Mahmood Fazlali Parallel branch and bound algorithm for
solving integer linear programming
models derived from behavioral synthesis ??
Jiaquan Gao and
Qi Chen and
Guixia He A thread-adaptive sparse approximate
inverse preconditioning algorithm on
multi-GPUs . . . . . . . . . . . . . . . ??
Esra Ruzgar Ateskan and
Kayhan Erciyes and
Mehmet Emin Dalkilic Parallelization of network motif
discovery using star contraction . . . . ??
Fatéma Zahra Benchara and
Mohamed Youssfi A new scalable distributed $k$-means
algorithm based on Cloud micro-services
for High-performance computing . . . . . ??
Yaling Xun and
Jifu Zhang and
Haifeng Yang and
Xiao Qin HBPFP-DC: a parallel frequent itemset
mining using Spark . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous April 2021 . . . . . . . . . . . . . . . ??
Melih Sener and
Stuart Levy and
John E. Stone and
AJ Christensen and
Barry Isralewitz and
Robert Patterson and
Kalina Borkiewicz and
Jeffrey Carpenter and
C. Neil Hunter and
Zaida Luthey-Schulten and
Donna Cox Multiscale modeling and cinematic
visualization of photosynthetic energy
conversion processes from electronic to
cell scales . . . . . . . . . . . . . . ??
Olaf Schenk and
Peter Arbenz and
Luc Giraud and
Wim Vanroose Guest editorial: Virtual special issue
on parallel matrix algorithms and
applications (PMAA'18) . . . . . . . . . ??
Chiheb-Eddine Ben Ncir and
Abdallah Hamza and
Waad Bouaguel Parallel and scalable Dunn Index for the
validation of big data clusters . . . . ??
Xin Long and
Jigang Wu and
Yalan Wu and
Long Chen and
Yidong Li Context switch cost aware joint task
merging and scheduling for deep learning
applications . . . . . . . . . . . . . . ??
Hiroyuki Takizawa and
Shinji Shiotsuki and
Naoki Ebata and
Ryusuke Egawa OpenCL-like offloading with
metaprogramming for SX-Aurora TSUBASA ??
Salvatore Cielo and
Luigi Iapichino and
Johannes Günther and
Christoph Federrath and
Elisabeth Mayer and
Markus Wiedemann Visualizing the world's largest
turbulence simulation . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous May 2021 . . . . . . . . . . . . . . . . ??
Mansour Khelghatdoust and
Vincent Gramoli A scalable and low latency probe-based
scheduler for data analytics frameworks ??
Ryoma Ohira and
Md. Saiful Islam and
Humayun Kayesh Speedup vs. quality: Asynchronous and
cluster-based distributed adaptive
genetic algorithms for ordered problems ??
Ronald Gonzales and
Yury Gryazin and
Yun Teck Lee Parallel FFT algorithms for high-order
approximations on three-dimensional
compact stencils . . . . . . . . . . . . ??
Akemi Shioya and
Yusaku Yamamoto Block red-black MILU(0) preconditioner
with relaxation on GPU . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous June 2021 . . . . . . . . . . . . . . . ??
Stephanie Brink and
Matthew Larsen and
Hank Childs and
Barry Rountree Evaluating adaptive and predictive power
management strategies for optimizing
visualization performance on
supercomputers . . . . . . . . . . . . . ??
Chahak Mehta and
Amarnath Karthi and
Vishrut Jetly and
Bhaskar Chaudhury Parallel Fast Multipole Method
accelerated FFT on HPC clusters . . . . ??
Jacob Lambert and
Seyong Lee and
Jeffrey S. Vetter and
Allen D. Malony Optimization with the OpenACC-to-FPGA
framework on the Arria 10 and Stratix 10
FPGAs . . . . . . . . . . . . . . . . . ??
Jared Brzenski and
Christopher Paolini and
Jose E. Castillo Improving the I/O of large geophysical
models using PnetCDF and BeeGFS . . . . ??
Massimiliano Lupo Pasini and
Junqi Yin and
Ying Wai Li and
Markus Eisenbach A scalable algorithm for the
optimization of neural network
architectures . . . . . . . . . . . . . ??
Christos Bellas and
Anastasios Gounaris HySet: a hybrid framework for exact set
similarity join using a GPU . . . . . . ??
Fareed Qararyah and
Mohamed Wahib and
Doga Dikbayir and
Mehmet Esat Belviranli and
Didem Unat A computational-graph partitioning
method for training memory-constrained
DNNs . . . . . . . . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous July 2021 . . . . . . . . . . . . . . . ??
Seher Acer and
Erik G. Boman and
Christian A. Glusa and
Sivasankaran Rajamanickam Sphynx: a parallel multi-GPU graph
partitioner for distributed-memory
systems . . . . . . . . . . . . . . . . ??
Joseph Schuchart and
Philipp Samfass and
Christoph Niethammer and
José Gracia and
George Bosilca Callback-based completion notification
using MPI Continuations . . . . . . . . ??
Cherifa Dad and
Jean-Philippe Tavella and
Stéphane Vialle Synthesis and feedback on the
distribution and parallelization of
FMI-CS-based co-simulations with the
DACCOSIM platform . . . . . . . . . . . ??
Mellila Bouam and
Charles Bouillaguet and
Claire Delaplace and
Camille Noûs Computational records with aging
hardware: Controlling half the output of
SHA-256 . . . . . . . . . . . . . . . . ??
Masahiro Nakao and
Maaki Sakai and
Yoshiko Hanada and
Hitoshi Murai and
Mitsuhisa Sato Graph optimization algorithm for
low-latency interconnection networks . . ??
Zhixing Yu and
Kejing He and
Xiuhong Zou PEAB: a pool-based distributed
evolutionary algorithm model with buffer ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous September 2021 . . . . . . . . . . . . . ??
Evan Schneider and
Brant Robertson and
Alexander Kuhn and
Christopher Lux and
Marc Nienhaus NVIDIA IndeX accelerated computing for
visualizing Cholla's galactic winds . . ??
Bipin Kumar and
Matt Rehme and
Neethi Suresh and
Nihanth Cherukuru and
Stanislaw Jaroszynski and
Samual Li and
Scott Pearse and
Tim Scheitlin and
Suryachandra A. Rao and
Ravi S. Nanjundiah Optimization of DNS code and
visualization of entrainment and mixing
phenomena at cloud edges . . . . . . . . ??
Andreas Jocksch and
Noé Ohana and
Emmanuel Lanti and
Eirini Koutsaniti and
Vasileios Karakasis and
Laurent Villard An optimisation of allreduce
communication in message-passing systems ??
John Lawson and
Mehdi Goli Performance portability through machine
learning guided kernel selection in SYCL
libraries . . . . . . . . . . . . . . . ??
Michael Orr and
Oliver Sinnen Optimal task scheduling for partially
heterogeneous systems . . . . . . . . . ??
Jesper Larsson Träff and
Sascha Hunold and
Guillaume Mercier and
Daniel J. Holmes MPI collective communication through a
single set of interfaces: a case for
orthogonality . . . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous October 2021 . . . . . . . . . . . . . . ??
Linchao Cai and
Junrong Yang and
Shoubin Dong and
Zhenyu Jiang GPU accelerated parallel
reliability-guided digital volume
correlation with automatic seed
selection based on $3$D SIFT . . . . . . ??
Kurt B. Ferreira and
Scott Levy Evaluating MPI resource usage summary
statistics . . . . . . . . . . . . . . . ??
Matthew G. F. Dosanjh and
Andrew Worley and
Derek Schafer and
Prema Soundararajan and
Sheikh Ghafoor and
Anthony Skjellum and
Purushotham V. Bangalore and
Ryan E. Grant Implementation and evaluation of MPI 4.0
partitioned communication libraries . . ??
Mirsaeid Hosseini Shirvani and
Reza Noorian Talouki A novel hybrid heuristic-based list
scheduling algorithm in heterogeneous
cloud computing environment for
makespan optimization . . . . . . . . . ??
David B. Williams-Young and
Abhishek Bagusetty and
Wibe A. de Jong and
Douglas Doerfler and
Hubertus J. J. van Dam and
Álvaro Vázquez-Mayagoitia and
Theresa L. Windus and
Chao Yang Achieving performance portability in
Gaussian basis set density functional
theory on accelerator based
architectures in NWChemEx . . . . . . . ??
Sean M. Couch and
Jared Carlson and
Michael Pajkos and
Brian W. O'Shea and
Anshu Dubey and
Tom Klosterman Towards performance portability in the
Spark astrophysical magnetohydrodynamics
solver in the Flash-X simulation
framework . . . . . . . . . . . . . . . ??
Richard Tran Mills and
Mark F. Adams and
Satish Balay and
Jed Brown and
Alp Dener and
Matthew Knepley and
Scott E. Kruger and
Hannah Morgan and
Todd Munson and
Karl Rupp and
Barry F. Smith and
Stefano Zampini and
Hong Zhang and
Junchao Zhang Toward performance-portable PETSc for
GPU-based exascale systems . . . . . . . ??
John R. Tramm and
Andrew R. Siegel Immortal rays: Rethinking random ray
neutron transport on GPU architectures ??
A. Myers and
A. Almgren and
L. D. Amorim and
J. Bell and
L. Fedeli and
L. Ge and
K. Gott and
D. P. Grote and
M. Hogan and
A. Huebl and
R. Jambunathan and
R. Lehe and
C. Ng and
M. Rowan and
O. Shapoval and
M. Thévenet and
J.-L. Vay and
H. Vincenti and
E. Yang and
N. Zaïm and
W. Zhang and
Y. Zhao and
E. Zoni Porting WarpX to GPU-accelerated
platforms . . . . . . . . . . . . . . . ??
Kenneth Moreland and
Robert Maynard and
David Pugmire and
Abhishek Yenpure and
Allison Vacanti and
Matthew Larsen and
Hank Childs Minimizing development costs for
efficient many-core visualization using
MCD$^3$ . . . . . . . . . . . . . . . . ??
Cody J. Balos and
David J. Gardner and
Carol S. Woodward and
Daniel R. Reynolds Enabling GPU accelerated computing in
the SUNDIALS time integration
library . . . . . . . . . . . . . . . . ??
Keren Zhou and
Laksono Adhianto and
Jonathon Anderson and
Aaron Cherian and
Dejan Grubisic and
Mark Krentel and
Yumeng Liu and
Xiaozhu Meng and
John Mellor-Crummey Measurement and analysis of
GPU-accelerated applications with
HPCToolkit . . . . . . . . . . . . . . . ??
Fabian Czappa and
Alexandru Calotoiu and
Thomas Höhl and
Heiko Mantel and
Toni Nguyen and
Felix Wolf Design-time performance modeling of
compositional parallel programs . . . . ??
Robert D. Falgout and
Ruipeng Li and
Björn Sjögreen and
Lu Wang and
Ulrike Meier Yang Porting hypre to heterogeneous
computer architectures: Strategies and
experiences . . . . . . . . . . . . . . ??
Ahmad Abdelfattah and
Valeria Barra and
Natalie Beams and
Ryan Bleile and
Jed Brown and
Jean-Sylvain Camier and
Robert Carson and
Noel Chalmers and
Veselin Dobrev and
Yohann Dudouit and
Paul Fischer and
Ali Karakus and
Stefan Kerkemeier and
Tzanio Kolev and
Yu-Hsiang Lan and
Elia Merzari and
Misun Min and
Malachi Phillips and
Thilina Rathnayake and
Robert Rieben and
Thomas Stitt and
Ananias Tomboulides and
Stanimire Tomov and
Vladimir Tomov and
Arturo Vargas and
Tim Warburton and
Kenneth Weiss GPU algorithms for Efficient Exascale
Discretizations . . . . . . . . . . . . ??
Yuta Hasegawa and
Takayuki Aoki and
Hiromichi Kobayashi and
Yasuhiro Idomura and
Naoyuki Onodera Tree cutting approach for domain
partitioning on forest-of-octrees-based
block-structured static adaptive mesh
refinement with lattice Boltzmann method ??
Atsushi Hori and
Emmanuel Jeannot and
George Bosilca and
Takahiro Ogura and
Balazs Gerofi and
Jie Yin and
Yutaka Ishikawa An international survey on MPI users . . ??
Wen Cheng and
Shijun Deng and
Lingfang Zeng and
Yang Wang and
André Brinkmann AIOC$^2$: a deep Q-learning approach to
autonomic I/O congestion control in
Lustre . . . . . . . . . . . . . . . . . ??
Igor Fontana de Nardin and
Rodrigo da Rosa Righi and
Thiago Roberto Lima Lopes and
Cristiano André da Costa and
Heon Young Yeom and
Harald Köstler On revisiting energy and performance in
microservices applications: a cloud
elasticity-driven approach . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous December 2021 . . . . . . . . . . . . . ??
Seonmyeong Bak and
Colleen Bertoni and
Swen Boehm and
Reuben Budiardja and
Barbara M. Chapman and
Johannes Doerfert and
Markus Eisenbach and
Hal Finkel and
Oscar Hernandez and
Joseph Huber and
Shintaro Iwasaki and
Vivek Kale and
Paul R. C. Kent and
JaeHyuk Kwack and
Meifeng Lin and
Piotr Luszczek and
Ye Luo and
Buu Pham and
Swaroop Pophale and
Kiran Ravikumar and
Vivek Sarkar and
Thomas Scogland and
Shilei Tian and
P. K. Yeung OpenMP application experiences: Porting
to accelerated nodes . . . . . . . . . . ??
Joachim Protze and
Marc-André Hermanns and
Matthias S. Müller and
Van Man Nguyen and
Julien Jaeger and
Emmanuelle Saillard and
Patrick Carribault and
Denis Barthou MPI detach --- Towards automatic
asynchronous local completion . . . . . ??
Stephane Bouhrour and
Thibaut Pepin and
Julien Jaeger Towards leveraging collective
performance with the support of MPI 4.0
features in MPC . . . . . . . . . . . . ??
Leonardo Solis-Vasquez and
Andreas F. Tillack and
Diogo Santos-Martins and
Andreas Koch and
Scott LeGrand and
Stefano Forli Benchmarking the performance of
irregular computations in AutoDock--GPU
molecular docking . . . . . . . . . . . ??
Stephen Timcheck and
Jeremy Buhler Reducing queuing impact in streaming
applications with irregular dataflow . . ??
Dong Zhong and
Qinglei Cao and
George Bosilca and
Jack Dongarra Using long vector extensions for MPI
reductions . . . . . . . . . . . . . . . ??
Mirko Mariotti and
Daniel Magalotti and
Daniele Spiga and
Loriano Storchi The BondMachine, a moldable computer
architecture . . . . . . . . . . . . . . ??
Boro Sofranac and
Ambros Gleixner and
Sebastian Pokutta Accelerating domain propagation: an
efficient GPU-parallel algorithm over
sparse matrices . . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous March 2022 . . . . . . . . . . . . . . . ??
Sunwoo Lee and
Kai-yuan Hou and
Kewei Wang and
Saba Sehrish and
Marc Paterno and
James Kowalkowski and
Quincey Koziol and
Robert B. Ross and
Ankit Agrawal and
Alok Choudhary and
Wei-keng Liao A case study on parallel HDF5
dataset concatenation for high energy
physics data analysis . . . . . . . . . ??
Rong Gu and
Jun Shi and
Xiaofei Chen and
Zhaokang Wang and
Yang Che and
Kai Zhang and
Yihua Huang Octopus-DF: Unified DataFrame-based
cross-platform data analytic system . . ??
Rafael F. Schmid and
Flávia Pisani and
Edson N. Cáceres and
Edson Borin An evaluation of fast segmented sorting
implementations on GPUs . . . . . . . . ??
Fei Teng and
Lei Yu and
Xiao Liu and
Pei Lai Tight Lower bound on power consumption
for scheduling real-time periodic tasks
in core-level DVFS systems . . . . . . . ??
Spiros N. Agathos and
Vassilios V. Dimakopoulos and
Ilias K. Kasmeridis Compiler-assisted, adaptive runtime
system for the support of OpenMP in
embedded multicores . . . . . . . . . . ??
Ian Bogle and
George M. Slota and
Erik G. Boman and
Karen D. Devine and
Sivasankaran Rajamanickam Parallel graph coloring algorithms for
distributed GPU environments . . . . . . ??
Pieter Ghysels and
Ryan Synk High performance sparse multifrontal
solvers on modern GPUs . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous May 2022 . . . . . . . . . . . . . . . . ??
Kasia Świrydowicz and
Eric Darve and
Wesley Jones and
Jonathan Maack and
Shaked Regev and
Michael A. Saunders and
Stephen J. Thomas and
Slaven Peleš Linear solvers for power grid
optimization problems: a review of
GPU-accelerated linear solvers . . . . . ??
Anonymous Special issue of Selected Papers from
EuroMPI/USA 2020 . . . . . . . . . . . . ??
Jianguo Liang and
Rong Hua and
Wenqiang Zhu and
Yuxi Ye and
You Fu and
Hao Zhang OpenACC + Athread collaborative
optimization of Silicon-Crystal
application on Sunway TaihuLight . . . . ??
Nitin Gawande and
Sayan Ghosh and
Mahantesh Halappanavar and
Antonino Tumeo and
Ananth Kalyanaraman Towards scaling community detection on
distributed-memory heterogeneous systems ??
Zhongyu Shen and
Jilin Zhang and
Tomohiro Suzuki Task-parallel tiled direct solver for
dense symmetric indefinite systems . . . ??
Yalan Wu and
Jigang Wu and
Peng Liu and
Yinhe Han and
Thambipillai Srikanthan Reconfiguration algorithms for
synchronous communication on switch
based degradable arrays . . . . . . . . ??
Terry Cojean and
Yu-Hsiang Mike Tsai and
Hartwig Anzt Ginkgo --- a math library designed for
platform portability . . . . . . . . . . ??
Johannes Pekkilä and
Miikka S. Väisälä and
Maarit J. Käpylä and
Matthias Rheinhardt and
Oskar Lappi Scalable communication for high-order
stencil computations using CUDA-aware
MPI . . . . . . . . . . . . . . . . . . ??
Keita Iwabuchi and
Karim Youssef and
Kaushik Velusamy and
Maya Gokhale and
Roger Pearce Metall: a persistent memory allocator
for data-centric analytics . . . . . . . ??
Yassine Ramdane and
Omar Boussaid and
Doulkifli Boukraà and
Nadia Kabachi and
Fadila Bentayeb Building a novel physical design of a
distributed big data warehouse over a
Hadoop cluster to enhance OLAP cube
query performance . . . . . . . . . . . ??
Robert Schade and
Tobias Kenter and
Hossam Elgabarty and
Michael Lass and
Ole Schütt and
Alfio Lazzaro and
Hans Pabst and
Stephan Mohr and
Jürg Hutter and
Thomas D. Kühne and
Christian Plessl Towards electronic structure-based
ab-initio molecular dynamics
simulations with hundreds of millions of
atoms . . . . . . . . . . . . . . . . . ??
Tetsuro Nakamura and
Shogo Saito and
Kei Fujimoto and
Masashi Kaneko and
Akinori Shiraga Spatial- and time-division multiplexing
in CNN accelerator . . . . . . . . . . . ??
Nuntipat Phisutthangkoon and
Jeeraporn Werapun Optimal ATAPE task scheduling on
reconfigurable and partitionable
hierarchical hypercube networks . . . . ??
Ziheng Wang and
Heng Chen and
Weiling Cai and
Xiaoshe Dong and
Xingjun Zhang $C$-Lop: Accurate contention-based
modeling of MPI concurrent communication ??
Vianney Kengne Tchendji and
Hermann Bogning Tepiele and
Mathias Akong Onabid and
Jean Frédéric Myoupo and
Jerry Lacmou Zeutouo A coarse-grained multicomputer parallel
algorithm for the sequential substring
constrained longest common subsequence
problem . . . . . . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous July 2022 . . . . . . . . . . . . . . . ??
Andrew Garmon and
Vinay Ramakrishnaiah and
Danny Perez Resource allocation for task-level
speculative scientific applications: a
proof of concept using Parallel
Trajectory Splicing . . . . . . . . . . ??
Daniel Bielich and
Julien Langou and
Stephen Thomas and
Kasia Świrydowicz and
Ichitaro Yamazaki and
Erik G. Boman Low-synch Gram--Schmidt with delayed
reorthogonalization for Krylov solvers ??
Busenur Aktilav and
Isil Öz Performance and accuracy predictions of
approximation methods for shortest-path
algorithms on GPUs . . . . . . . . . . . ??
Lena Oden and
Jörg Keller Improving cryptanalytic applications
with stochastic runtimes on GPUs and
multicores . . . . . . . . . . . . . . . ??
Zhong Liu and
Xin Xiao and
Chen Li and
Sheng Ma and
Deng Rangyu Optimizing convolutional neural networks
on multi-core vector accelerator . . . . ??
Wenhu Shi and
Hongjian Li and
Junzhe Guan and
Hang Zeng and
Rafe Misskat jahan Energy-efficient scheduling algorithms
based on task clustering in
heterogeneous spark clusters . . . . . . ??
Seo Jin Jang and
Wei Liu and
Wei Li and
Yong Beom Cho Parallel multi-view HEVC for
heterogeneously embedded cluster system ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous September 2022 . . . . . . . . . . . . . ??
Alessio Netti and
Michael Ott and
Carla Guillen and
Daniele Tafani and
Martin Schulz Operational Data Analytics in practice:
Experiences from design to deployment in
production HPC environments . . . . . . ??
J. Pronold and
J. Jordan and
B. J. N. Wylie and
I. Kitayama and
M. Diesmann and
S. Kunkel Routing brain traffic through the von
Neumann bottleneck: Efficient cache
usage in spiking neural network
simulation code on general purpose
computers . . . . . . . . . . . . . . . ??
Jiazhi Jiang and
Dan Huang and
Jiangsu Du and
Yutong Lu and
Xiangke Liao Optimizing small channel $3$D
convolution on GPU with tensor core . . ??
Gizen Mutlu and
Çigdem Inan Aci SVM-SMO-SGD: a hybrid-parallel support
vector machine algorithm using
sequential minimal optimization with
stochastic gradient descent . . . . . . ??
Tianshi Xu and
Vassilis Kalantzis and
Ruipeng Li and
Yuanzhe Xi and
Geoffrey Dillon and
Yousef Saad parGeMSLR: a parallel multilevel Schur
complement low-rank preconditioning and
solution package for general sparse
matrices . . . . . . . . . . . . . . . . ??
Qingxiao Sun and
Liu Yi and
Hailong Yang and
Mingzhen Li and
Zhongzhi Luan and
Depei Qian QoS-aware dynamic resource allocation
with improved utilization and energy
efficiency on GPU . . . . . . . . . . . ??
Jaemin Choi and
Zane Fink and
Sam White and
Nitin Bhat and
David F. Richards and
Laxmikant V. Kale Accelerating communication for parallel
programming models on GPU systems . . . ??
Yan Huang and
Qingbin Wang and
Minghao Lv and
Xingguang Song and
Jinkai Feng and
Xuli Tan and
Ziyan Huang and
Chuyuan Zhou Fast calculation of isostatic
compensation correction using the
GPU-parallel prism method . . . . . . . ??
Hao Wang and
Ce Yu and
Jian Xiao and
Shanjiang Tang and
Yu Lu and
Hao Fu and
Bo Kang and
Gang Zheng and
Chenzhou Cui A method for efficient radio
astronomical data gridding on multi-core
vector processor . . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous October 2022 . . . . . . . . . . . . . . ??
Lukas Spies and
Amanda Bienz and
David Moulton and
Luke Olson and
Andrew Reisner Tausch: a halo exchange library for
large heterogeneous computing systems
using MPI, OpenCL, and CUDA . . . . . . ??
Xinyuan Wang and
Hejiao Huang SGPM: a coroutine framework for
transaction processing . . . . . . . . . ??
Paul Fischer and
Stefan Kerkemeier and
Misun Min and
Yu-Hsiang Lan and
Malachi Phillips and
Thilina Rathnayake and
Elia Merzari and
Ananias Tomboulides and
Ali Karakus and
Noel Chalmers and
Tim Warburton NekRS, a GPU-accelerated spectral
element Navier--Stokes solver . . . . . ??
Masahiro Nakao and
Masaki Tsukamoto and
Yoshiko Hanada and
Keiji Yamamoto Graph optimization algorithm using
symmetry and host bias for low-latency
indirect network . . . . . . . . . . . . ??
Adel Dabah and
Ibrahim Chegrane and
Saïd Yahiaoui and
Ahcene Bendjoudi and
Nadia Nouali-Taboudjemat Efficient parallel branch-and-bound
approaches for exact graph edit distance
problem . . . . . . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous December 2022 . . . . . . . . . . . . . ??
Guilherme Andrade and
Renato Ferreira and
George Teodoro Spatial-aware data partition for
distributed memory parallelization of
ANN search in multimedia retrieval . . . ??
G. Patronas and
N. Vlassopoulos and
Ph. Bellos and
D. Reisis Accelerating the scheduling of the
network resources of the next-generation
optical data centers . . . . . . . . . . ??
Özcan Dülger and
Halit Oguztüzün and
Mübeccel Demirekler Uphill resampling for particle filter
and its implementation on graphics
processing unit . . . . . . . . . . . . ??
Guoqing Wu and
Hongyun Tian and
Guo Lu and
Wei Wang ParVoro++: a scalable parallel algorithm
for constructing $3$D Voronoi
tessellations based on $kd$-tree
decomposition . . . . . . . . . . . . . ??
Kuan Li and
Kang He and
Stef Graillat and
Hao Jiang and
Tongxiang Gu and
Jie Liu Multi-level parallel multi-layer block
reproducible summation algorithm . . . . ??
Phillip Allen Lane and
Joshua Dennis Booth Heterogeneous sparse matrix--vector
multiplication via compressed sparse row
format . . . . . . . . . . . . . . . . . ??
Valeriy Manin and
Bruno Lang Efficient parallel reduction of
bandwidth for symmetric matrices . . . . ??
Anonymous Reviewer acknowledgment . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous February 2023 . . . . . . . . . . . . . ??
Yidong Chen and
Chen Li and
Yonghong Hu and
Zhonghua Lu A parallel non-convex approximation
framework for risk parity portfolio
design . . . . . . . . . . . . . . . . . ??
Marek Palkowski and
Wlodzimierz Bielecki NPDP benchmark suite for the evaluation
of the effectiveness of automatic
optimizing compilers . . . . . . . . . . ??
Zeshi Liu and
Zhen Xie and
Wenqian Dong and
Mengting Yuan and
Haihang You and
Dong Li A heterogeneous processing-in-memory
approach to accelerate quantum chemistry
simulation . . . . . . . . . . . . . . . ??
Akira Nukada and
Taichiro Suzuki and
Satoshi Matsuoka Efficient checkpoint/restart of CUDA
applications . . . . . . . . . . . . . . ??
David Castells-Rufas GPU acceleration of Levenshtein distance
computation between long strings . . . . ??
Lukas Reitz and
Kai Hardenbicker and
Tobias Werner and
Claudia Fohry Lifeline-based load balancing schemes
for Asynchronous Many-Task runtimes in
clusters . . . . . . . . . . . . . . . . ??
Shelby Lockhart and
Amanda Bienz and
William D. Gropp and
Luke N. Olson Characterizing the performance of
node-aware strategies for irregular
point-to-point communication on
heterogeneous architectures . . . . . . ??
Lei Yu and
Tianqi Zhong and
Peng Bi and
Lan Wang and
Fei Teng Segment based power-efficient scheduling
for real-time DAG tasks on edge devices ??
Clément Foyer and
Brice Goglin and
Andrès Rubio Proaño A survey of software techniques to
emulate heterogeneous memory systems in
high-performance computing . . . . . . . ??
Andres Pastrana-Cruz and
Manuel Lafond A lightweight semi-centralized strategy
for the massive parallelization of
branching algorithms . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous July 2023 . . . . . . . . . . . . . . . ??
Srdan Daniel Simić and
Nikola Tanković and
Darko Etinger Big data BPMN workflow resource
optimization in the cloud . . . . . . . ??
Rene Halver and
Christoph Junghans and
Godehard Sutmann Using heterogeneous GPU nodes with a
Cabana-based implementation of MPCD . . ??
Bin Yu and
Xu Lu and
Cong Tian and
Meng Wang and
Chu Chen and
Ming Lei and
Zhenhua Duan Adaptively parallel runtime verification
based on distributed network for
temporal properties . . . . . . . . . . ??
Jiang Zheng and
Jiazhi Jiang and
Jiangsu Du and
Dan Huang and
Yutong Lu Optimizing massively parallel sparse
matrix computing on ARM many-core
processor . . . . . . . . . . . . . . . ??
Lih-Yuan Deng and
Bryan R. Winter and
Jyh-Jen Horng Shiau and
Henry Horng-Shing Lu and
Nirman Kumar and
Ching-Chi Yang Parallelizable efficient large order
multiple recursive generators . . . . . ??
Ami Marowka and
Przemysław Stpiczyński Editorial on Advances in High
Performance Programming . . . . . . . . ??
Jinliang Shi and
Dewu Chen and
Jiabi Liang and
Lin Li and
Yue Lin and
Jianjiang Li New YARN sharing GPU based on graphics
memory granularity scheduling . . . . . ??
Adam Sky and
César Polindara and
Ingo Muench and
Carolin Birk A flexible sparse matrix data format and
parallel algorithms for the assembly of
finite element matrices on shared memory
systems . . . . . . . . . . . . . . . . ??
Muhammad Kabeer and
Ibrahim Yusuf and
Nasir Ahmad Sufi Distributed software defined
network-based fog to fog collaboration
scheme . . . . . . . . . . . . . . . . . ??
Ou Wu and
Shanshan Li and
He Zhang and
Liwen Liu and
Haoming Li and
Yanze Wang and
Ziyi Zhang An optimal scheduling algorithm
considering the transactions worst-case
delay for multi-channel hyperledger
fabric network . . . . . . . . . . . . . ??
Ignacio Laguna and
Anh Tran and
Ganesh Gopalakrishnan Finding inputs that trigger
floating-point exceptions in
heterogeneous computing via Bayesian
optimization . . . . . . . . . . . . . . 103042:1--103042:13
Hao Zhang and
Zhiyi Huang and
Yawen Chen and
Jianguo Liang and
Xiran Gao ESA: an efficient sequence alignment
algorithm for biological database search
on Sunway TaihuLight . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous September 2023 . . . . . . . . . . . . . ??
Matthias Bolten and
Stephanie Friedhoff and
Jens Hahne Task graph-based performance analysis of
parallel-in-time methods . . . . . . . . ??
James D. Trotter and
Johannes Langguth and
Xing Cai Targeting performance and
user-friendliness: GPU-accelerated
finite element computation with
automated code generation in FEniCS . . ??
Zhexu Liu and
Shaofeng Liu and
Zhiyong Fan and
Zhen Zhao Low consumption automatic discovery
protocol for DDS-based large-scale
distributed parallel computing . . . . . ??
Yunqi Gao and
Zechao Zhang and
Bing Hu and
A-Long Jin and
Chunming Wu OF-WFBP: a near-optimal communication
mechanism for tensor fusion in
distributed deep learning . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous November 2023 . . . . . . . . . . . . . ??
Shushan Li and
Meng Wang and
Hong Zhang and
Yao Liu Program partitioning and deadlock
analysis for MPI based on logical clocks ??
Xi Liu and
Gizem Kayar and
Ken Perlin A GPU-based hydrodynamic simulator with
boid interactions . . . . . . . . . . . ??
Mohammad Norouzi and
Nicolas Morew and
Qamar Ilias and
Lukas Rothenberger and
Ali Jannesari and
Felix Wolf Fast data-dependence profiling through
prior static analysis . . . . . . . . . ??
Ke Liu and
Haonan Tong and
Zhongxiang Sun and
Zhixin Ren and
Guangkui Huang and
Hongyin Zhu and
Luyang Liu and
Qunyang Lin and
Chuang Zhang Integrating FPGA-based hardware
acceleration with relational databases ??
Anne Benoit Editorial for Parallel
Computing . . . . . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous February 2024 . . . . . . . . . . . . . ??
Jianjiang Li and
Lin Li and
Qingwei Wang and
Wei Xue and
Jiabi Liang and
Jinliang Shi Parallel optimization and application of
unstructured sparse triangular solver on
new generation of Sunway architecture ??
Kohei Yoshida and
Shinobu Miwa and
Hayato Yamaki and
Hiroki Honda Analyzing the impact of CUDA versions on
GPU applications . . . . . . . . . . . . ??
Kaihao Ma and
Zhenkun Cai and
Xiao Yan and
Yang Zhang and
Zhi Liu and
Yihui Feng and
Chao Li and
Wei Lin and
James Cheng PPS: Fair and efficient black-box
scheduling for multi-tenant GPU clusters ??
Sanjay Bhardwaj and
Da-Hye Kim and
Dong-Seong Kim Federated learning based modulation
classification for multipath channels ??
Fahimeh Yazdanpanah and
Mohammad Alaei An approach for low-power heterogeneous
parallel implementation of ALC-PSO
algorithm using OmpSs and CUDA . . . . . ??
Qingcai Jiang and
Zhenwei Cao and
Xinhui Cui and
Lingyun Wan and
Xinming Qin and
Huanqi Cao and
Hong An and
Junshi Chen and
Jie Liu and
Wei Hu and
Jinlong Yang Extending the limit of LR-TDDFT on two
different approaches: Numerical
algorithms and new Sunway heterogeneous
supercomputer . . . . . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous June 2024 . . . . . . . . . . . . . . . ??
Duo Yang and
Bing Hu and
An Liu and
A-Long Jin and
Kwan L. Yeung and
Yang You WBSP: Addressing stragglers in
distributed machine learning with
worker-busy synchronous parallel . . . . ??
Alexander Agathos and
Philip Azariadis Multi-GPU $3$D $k$-nearest neighbors
computation with application to ICP,
point cloud smoothing and normals
computation . . . . . . . . . . . . . . ??
Chunfeng Li and
Karim Soliman and
Fei Yin and
Jin Wei and
Feng Shi NxtSPR: a deadlock-free shortest path
routing dedicated to relaying for
Triplet-Based many-core Architecture . . ??
Gang Xian and
Wenxiang Yang and
Yusong Tan and
Jinghua Feng and
Yuqi Li and
Jian Zhang and
Jie Yu Mobilizing underutilized storage nodes
via job path: a job-aware file striping
approach . . . . . . . . . . . . . . . . ??
Jiří Klepl and
Adam Šmelko and
Lukáš Rozsypal and
Martin Kruliš Abstractions for C++ code optimizations
in parallel high-performance
applications . . . . . . . . . . . . . . ??
Dolores Miao and
Ignacio Laguna and
Giorgis Georgakoudis and
Konstantinos Parasyris and
Cindy Rubio-González An automated OpenMP mutation testing
framework for performance optimization ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous September 2024 . . . . . . . . . . . . . ??
Xingwang Huang and
Min Xie and
Dong An and
Shubin Su and
Zongliang Zhang Task scheduling in cloud computing based
on grey wolf optimization with a new
encoding mechanism . . . . . . . . . . . ??
Adrian Schmitz and
Semih Burak and
Julian Miller and
Matthias S. Müller Parallel Pattern Compiler for Automatic
Global Optimizations . . . . . . . . . . ??
Rahim Alizadeh and
Shahriar Bijani and
Fatemeh Shakeri Distributed consensus-based estimation
of the leading eigenvalue of a
non-negative irreducible matrix . . . . ??
Fenglong Cai and
Dong Yuan and
Zhe Yang and
Yonghui Xu and
Wei He and
Wei Guo and
Lizhen Cui FastPTM: Fast weights loading of
pre-trained models for parallel
inference service provisioning . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous November 2024 . . . . . . . . . . . . . ??
Xiaofeng Zou and
Yuanxi Peng and
Tuo Li and
Lingjun Kong and
Lu Zhang Seesaw: a 4096-bit vector processor for
accelerating Kyber based on RISC-V ISA
extensions . . . . . . . . . . . . . . . ??
Zheng Miao and
Jon C. Calhoun and
Rong Ge Towards resilient and energy efficient
scalable Krylov solvers . . . . . . . . ??
Kasia Świrydowicz and
Nicholson Koukpaizan and
Maksudul Alam and
Shaked Regev and
Michael Saunders and
Slaven Peleš Iterative methods in GPU-resident linear
solvers for nonlinear constrained
optimization . . . . . . . . . . . . . . ??
Xiran Gao and
Li Chen and
Haoyu Wang and
Huimin Cui and
Xiaobing Feng Scalable tasking runtime with
parallelized builders for explicit
message passing architectures . . . . . ??
Henri Casanova and
Arnaud Giersch and
Arnaud Legrand and
Martin Quinson and
Frédéric Suter Lowering entry barriers to developing
custom simulators of distributed
applications and platforms with SimGrid ??
Jaroslav Olha and
Jana Hozzová and
Matej Antol and
Jiří Filipovič Estimating resource budgets to ensure
autotuning efficiency . . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous March 2025 . . . . . . . . . . . . . . . ??
Anshuman Misra and
Ajay D. Kshemkalyani Byzantine-tolerant detection of
causality: There is no holy grail . . . ??
Siyang Xing and
Youmeng Li and
Zikun Deng and
Qijun Zheng and
Zeyu Lu and
Qinglin Wang Multi-level parallelism optimization for
two-dimensional convolution
vectorization method on multi-core
vector accelerator . . . . . . . . . . . ??
Wei Qian and
Zhengwei Zhu and
Chenyang Zhu and
Yanping Zhu FPGA-based accelerator for YOLOv5 object
detection with optimized computation and
data access for edge deployment . . . . ??
Rupinder Kaur and
Gurjinder Kaur and
Major Singh Goraya EESF: Energy-efficient scheduling
framework for deadline-constrained
workflows with computation speed
estimation method in cloud . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous June 2025 . . . . . . . . . . . . . . . ??
Harish Padmanaban and
Nurkasym Arkabaev and
Maher Ali Rusho and
Vladyslav Kozub and
Yurii Kozub Using Java to create and analyze models
of parallel computing systems . . . . . ??
Yuyao Niu and
Marc Casas ALBBA: an efficient ALgebraic Bypass BFS
Algorithm on long vector architectures ??
Hui Zhao and
Wentao Zhi and
Xiaoqin Lu and
Jing Wang and
Nan Luo and
Bo Wan and
Quan Wang Multi-workflow fault-tolerance
scheduling strategy considering
resources supply delay in WaaS platforms ??
Xiang Zhao and
Haitao Du and
Yi Kang Enable cross-iteration parallelism for
PIM-based graph processing with
vertex-level synchronization . . . . . . ??
Ali Nada and
Hazem Ismail Ali and
Liang Liu and
Yousra Alkabani Software acceleration of multi-user MIMO
uplink detection on GPU . . . . . . . . ??
Anonymous Editorial Board . . . . . . . . . . . . ??
Anonymous September 2025 . . . . . . . . . . . . . ??