Last update:
Tue Apr 15 08:52:10 MDT 2025
Joanne L. Martin An Invitation To Participate . . . . . . 3--4
Erich Bloch Supercomputing and the Growth of
Computational Science in the National
Science Foundation . . . . . . . . . . . 5--8
Richard A. Friesner and
Jean-Philippe Brunet and
Robert E. Wyatt and
Claude Leforestier and
Steven Binkley Computational Approach to Large Quantum
Dynamical Problems . . . . . . . . . . . 9--23
John M. Dawson and
Viktor K. Decyk and
Brendan McNamara Particle Modeling of Plasmas On
Supercomputers . . . . . . . . . . . . . 24--43
George N. Reeke, Jr. and
Gerald M. Edelman and
Dan Sulzbach Selective Neural Networks and Their
Implications for Recognition Automata 44--69
Rami Melhem and
Dennis Gannon Toward Efficient Implementation of
Preconditioned Conjugate Gradient
Methods on Vector Supercomputers . . . . 70--98
Francis Sullivan and
Jack Dongarra Algorithm Design for Large-Scale
Computations . . . . . . . . . . . . . . 99--105
Anonymous High-Speed Computing and Artificial
Intelligence Connection . . . . . . . . 106--110
Anonymous An Agenda for Improved Evaluation of
Supercomputer Performance . . . . . . . 110--111
Jack Dongarra Book Reviews: \booktitleThe Connection
Machine . . . . . . . . . . . . . . . . 112--112
Joanne L. Martin Book Reviews: \booktitleHigh-Speed
Computing: Scientific Applications and
Algorithm Design . . . . . . . . . . . . 113--113
Anonymous Software for High Performance Computers 114--115
Anonymous Advanced Computing Research Facility
Offers Opportunities for Experimentation
in Multiprocessing . . . . . . . . . . . 115--116
Anonymous Dispelling the ``No Software'' Myth . . . 116--116
Robert B. Wilhelmson A Walk Into the Future . . . . . . . . . 3--5
Merry Maisel Science At the San Diego Supercomputer
Center . . . . . . . . . . . . . . . . . 6--10
William R. Martin and
Forrest B. Brown Status of Vectorized Monte Carlo for
Particle Transport Analysis . . . . . . 11--32
David C. Torney and
Tony T. Warnock and
Peter Kollman Computer Simulation of Diffusion-Limited
Chemical Reactions in Three Dimensions 33--43
Boles\law K. Szymanski and
Dieter Mueller-Wichards Parallel Programming With Recurrent
Equations . . . . . . . . . . . . . . . 44--74
S. T. Kao and
E. L. Leiss and
Olin Johnson An Experimental Implementation of
Migration Algorithms on the Intel
Hypercube . . . . . . . . . . . . . . . 75--99
B. Buzbee Book Review: \booktitleSupercomputers:
Value and Trends, Bill Buzbee, Computer
Research and Applications Group,
Computing and Communications Division,
Los Alamos National Laboratory, Los
Alamos, New Mexico 87545 . . . . . . . . 100--103
Joanne L. Martin The Missing Pieces . . . . . . . . . . . 3--4
Tricia Nunns Supercomputing in Western Canada . . . . 5--11
Petter E. Bjørstad and
Jon Braekhus and
John Aldag Implementation and Performance of the
Large-Scale Finite Element Code Sesam On
a Wide Range of Scientific Computers . . 12--25
R. E. Benner and
G. R. Montry and
G. G. Weigand and
Iain Duff Concurrent Multifrontal Methods: Shared
Memory, Cache, and Frontwidth Issues . . 26--44
Misako Ishiguro and
Hiroo Harada and
Mitsuhiro Makino and
Joanne L. Martin Performance Analysis of Vectorized
Nuclear Codes on a FACOM VP-100 at the
Japan Atomic Energy Research Institute 45--56
William R. Martin and
Tzu-Chiang Wan and
Tarek S. Abdel-Rahman and
Trevor N. Mudge and
Kenichi Miura Monte Carlo Photon Transport On Shared
Memory and Distributed Memory Parallel
Processors . . . . . . . . . . . . . . . 57--74
Michelle Y. Kim and
Anil Nigam and
George Paul and
Robert J. Flynn and
Garry H. Rodrigue Disk Interleaving and Very Large Fast
Fourier Transforms . . . . . . . . . . . 75--96
Anonymous Networking Needs and Trends in Data
Communications . . . . . . . . . . . . . 97--100
Brendan McNamara The Mass Market for Supercomputing . . . 3--4
Dennis Meredith Science and Technology At Cornell's
Theory Center . . . . . . . . . . . . . 5--9
C. Cleveland Ashcraft and
Roger G. Grimes and
John G. Lewis and
Barry W. Peyton and
Horst D. Simon and
Petter E. Bjørstad Progress in Sparse Matrix Methods for
Large Linear Systems on Vector
Supercomputers . . . . . . . . . . . . . 10--30
Christian E. Petersen and
Christopher A. Sims Computer Simulation of Large Scale
Econometric Models: Project LINK . . . . 31--53
Albert Ando and
Paul Beaumont and
Matthew Ando and
Christopher A. Sims Efficiency of the CYBER 205 for
Stochastic Simulations of a
Simultaneous, Nonlinear, Dynamic
Econometric Model . . . . . . . . . . . 54--81
Diana Choi and
Creon Levit and
Steven E. Follin Implementation of a Distributed
Interactive Graphics System . . . . . . 82--95
Stuart E. Rogers and
Pieter G. Buning and
Fergus J. Merritt and
Steven E. Follin Distributed Interactive Graphics
Applications in Computational Fluid
Dynamics . . . . . . . . . . . . . . . . 96--105
David Salzman Visualization in Scientific Computing:
Summary of an NSF-Sponsored Panel Report
On Graphics, Image Processing, and
Workstations . . . . . . . . . . . . . . 106--108
Gregory J. McRae Book Reviews: \booktitleThe
Characteristics of Parallel Algorithms 109--110
Joanne L. Martin Book Reviews: \booktitleSupercomputers
and Their Use . . . . . . . . . . . . . 110--111
Dennis Gannon Programming Environments for
Supercomputing . . . . . . . . . . . . . 3--4
Ralph Z. Roskies and
Penny D. Sackett Science At the Pittsburgh Supercomputing
Center . . . . . . . . . . . . . . . . . 5--11
Kyle Gallivan and
William Jalby and
Ulrike Meier and
Ahmed H. Sameh Impact of Hierarchical Memory Systems on
Linear Algebra Algorithm Design . . . . 12--48
O. Terki-Hassaine and
E. L. Leiss Multitasking $3$-D Forward Modeling
Using High Order Finite Difference
Methods on the CRAY X-MP/416 . . . . . . 49--65
M. Lescrenier and
Ph. L. Toint Large Scale Unconstrained Optimization
on the FPS 164 and CRAY X-MP Vector
Processors . . . . . . . . . . . . . . . 66--81
David H. Bailey A High-Performance FFT Algorithm for
Vector Supercomputers . . . . . . . . . 82--87
Gene M. Amdahl Limits of Expectation . . . . . . . . . 88--94
John W. D. Connolly Book Review: \booktitleThe Supercomputer
Era . . . . . . . . . . . . . . . . . . 95--96
Robert A. Brown Supercomputers in Chemistry and Chemical
Engineering . . . . . . . . . . . . . . 3--4
Jan Almlöf and
Donald G. Truhlar and
H. T. Davis and
Klavs F. Jensen and
Matthew Tirrell and
Terry Lybrand Supercomputer Chemistry At the
University of Minnesota . . . . . . . . 5--15
Gregory J. McRae and
Jana B. Milford and
Barbara J. Slompak Changing Roles for Supercomputing in
Chemical Engineering . . . . . . . . . . 16--40
Donna A. Bassolino and
Fumio Hirata and
Douglas B. Kitchen and
Dorothea Kominos and
Arthur Pardi and
Ronald M. Levy Determination of Protein Structures in
Solution Using NMR Data and IMPACT . . . 41--61
David A. Dixon and
Frederic A. Van-Catledge A Molecular Model for the Helicity of
Polytetrafluoro Ethylene (Teflon\reg) 62--81
Joanne L. Martin Book Review: \booktitleSupercomputer
Research in Chemistry and Chemical
Engineering (ACS Symposium Series, Vol.
353) . . . . . . . . . . . . . . . . . . 82--83
Joanne L. Martin A Retrospective . . . . . . . . . . . . 3--5
William H. Allen Centers of Supercomputing --- Science at
the National Center for Supercomputing Applications 6--9
Arvind and
David E. Culler and
Gino K. Maa Assessing the Benefits of Fine-Grain
Parallelism in Dataflow Programs . . . . 10--36
Michael W. Berry and
Ahmed Sameh Multiprocessor Schemes for Solving Block
Tridiagonal Linear Systems . . . . . . . 37--57
Kazutami Tago and
Hiroki Kumahora and
Noriyuki Sadaoka and
Kinya Kobayashi Vectorized calculations and use of fast
semiconductor memories in the DV-X $
\alpha $ method . . . . . . . . . . . . 58--72
Heihachiro Hara and
Yoichi Kodera and
Kazuhiko Kanehiro Flow Simulations By Parallel Computer
MiPax . . . . . . . . . . . . . . . . . 73--80
Patrick Gaffney IBM Bergen Scientific Centre and the
International Conference On Vector and
Parallel Computing . . . . . . . . . . . 3--4
Gérard Meurant Domain Decomposition Methods for Partial
Differential Equations on Parallel
Computers . . . . . . . . . . . . . . . 5--12
James J. Little and
Tomaso Poggio and
Edward B. Gamble, Jr. Seeing in parallel: the Vision Machine 13--28
Linda G. Shapiro Programming Parallel Vision Algorithms:
a Dataflow Language Approach . . . . . . 29--44
Richard E. Ewing Large-Scale Computing in Reservoir
Simulation . . . . . . . . . . . . . . . 45--53
J. A. Hertz Statistical Mechanics of Neural
Computation . . . . . . . . . . . . . . 54--62
Wolfgang Gentzsch Comparison of Supercomputers and
Mini-Supercomputers for Computational
Fluid Dynamics Calculations . . . . . . 63--71
Tony F. Chan Domain Decomposition Algorithms and
Computational Fluid Dynamics . . . . . . 72--83
C. David Callahan and
Keith D. Cooper and
Robert T. Hood and
Ken Kennedy and
Linda Torczon ParaScope: a Parallel Programming
Environment . . . . . . . . . . . . . . 84--99
Jan Kok Parallel Programming With Ada . . . . . 100--108
Hermann Mierendorff and
Karl Solchenbach and
Ulrich Trottenberg On the SUPRENUM System . . . . . . . . . 109--117
Albert M. Erisman Supercomputing as a tool for product
development . . . . . . . . . . . . . . 118--121
Joanne L. Martin Supercomputers, Networks, and Privacy 3--4
Christopher Eoyang and
Raul H. Mendez Supercomputing in Japan: Institute for
Supercomputing Research . . . . . . . . 5--9
Linda Kaufman and
Norm Schryer Solving two-dimensional partial
differential equations on vector and
scalar machines . . . . . . . . . . . . 10--33
Anna Nagurney and
Dae-Shik S. Kim Parallel and Serial Variational
Inequality Decomposition Algorithms for
Multicommodity Market Equilibrium
Problems . . . . . . . . . . . . . . . . 34--58
Gary R. Montry Massively Parallel Mathematical Sieves 59--74
Swarn P. Kumar Solving tridiagonal linear systems on
the Butterfly parallel computer . . . . 75--81
K. J. M. Moriarty Parallel Processing of Large-Scale
Applications on Powerful Multiple
Processors . . . . . . . . . . . . . . . 82--87
John E. Aldag The Impact of Supercomputers: Global,
Pervasive, Positive . . . . . . . . . . 3--5
Iain S. Duff CERFACS: a European Center for
High-Performance Computation . . . . . . 6--9
J. A. Sethian and
James B. Salem Animation of Interactive Fluid Flow
Visualization Tools on a Data Parallel
Machine . . . . . . . . . . . . . . . . 10--39
Michel J. Daydé and
Iain S. Duff Level 3 BLAS in $ L U $ Factorization on
the CRAY-2, ETA-10P, and IBM 3090-200/VF 40--70
Daniel A. Menascé and
Virgilio A. F. Almeida Analytic Models of Supercomputer
Performance in Multiprogramming
Environments . . . . . . . . . . . . . . 71--91
David A. Mandell and
Harold E. Trease Parallel Processing a Three-Dimensional
Free-Lagrange Code: a Case History . . . 92--99
Frederic A. Van-Catledge Toward a General Model for Evaluating
the Relative Performance of Computer
Systems . . . . . . . . . . . . . . . . 100--108
Joanne L. Martin Supercomputing: Beyond the Daily Planet 3--4
M. Berry and
D. Chen and
P. Koss and
D. Kuck and
S. Lo and
Y. Pang and
L. Pointer and
R. Roloff and
A. Sameh and
E. Clementi and
S. Chin and
D. Schneider and
G. Fox and
P. Messina and
D. Walker and
C. Hsiung and
J. Schwarzmeier and
K. Lue and
S. Orszag and
F. Seidl and
O. Johnson and
R. Goodrum and
J. Martin The PERFECT Club Benchmarks: Effective
Performance Evaluation of Supercomputers 5--40
Patrick R. Amestoy and
Iain S. Duff Vectorization of a Multiprocessor
Multifrontal Code . . . . . . . . . . . 41--59
Armel de La Bourdonnaye The Element By Element Method as a
Preconditioner for Linear Systems Coming
From Finite Element Models . . . . . . . 60--68
Brendan McNamara Supercomputer Throughput Benchmarks for
the Cray-1s and Cyber 205 With Estimates
for Class VII Supercomputers . . . . . . 69--85
David H. Bailey and
Horst D. Simon and
John T. Barton and
Martin J. Fouts Floating Point Arithmetic in Future
Supercomputers . . . . . . . . . . . . . 86--90
P. Y.-T. Hsu and
B. R. Rau and
K. J. M. Moriarty Applications Development on the Very
Long Instruction Word CYDRA-5 . . . . . 91--98
Bill Buzbee Report From Trondheim . . . . . . . . . 3--5
Jack Dongarra Advanced Computing Research Facility,
Mathematics and Computer Science
Division, Argonne National Laboratory 6--8
Dennis C. Jespersen and
Creon Levit A Computational Fluid Dynamics Algorithm
on a Massively Parallel Computer . . . . 9--27
Verena Meiser Umar and
Charlotte Froese Fischer Multitasking the Davidson Algorithm for
the Large, Sparse Eigenvalue Problem . . 28--53
K. J. M. Moriarty Optimizing the SU(3) Lattice Gauge
Theory Algorithm on the NEC SX-2
Supercomputer . . . . . . . . . . . . . 54--63
Hwa A. Lim and
Gregory Riccardi and
Charles M. Bauer and
Sanjay Sharma A Vector Algorithm for Lattice Gas
Hydrodynamics . . . . . . . . . . . . . 64--67
R. C. Brower and
K. J. M. Moriarty and
P. Tamayo A Fast Algorithm To Simulate the
Microcanonical Dynamics of the Ising
Model . . . . . . . . . . . . . . . . . 68--72
Anna Nagurney Book Review: \booktitleParallel and
Distributed Computation: Numerical
Methods . . . . . . . . . . . . . . . . 73--74
Sidney Fernbach A U.S. high-performance computing
program . . . . . . . . . . . . . . . . 3--5
Alison Brown and
Ashley Burns and
Kevin Wohlever Centers of supercomputing --- research
at the Ohio Supercomputer Center . . . . 6--9
Kevin J. M. Moriarty and
Claudio Rebbi Supercomputer Methods for the Solution
of Fundamental Problems of Particle
Physics . . . . . . . . . . . . . . . . 10--30
Ernst L. Leiss and
Raj H. Thapar Three-Dimensional Dip Moveout on the
SX-2: An XMU Implementation . . . . . . 31--48
Anna Nagurney and
Dae-Shik Kim and
Alan G. Robinson Serial and Parallel Equilibration of
Large-Scale Constrained Matrix Problems
with Application to the Social and
Economic Sciences . . . . . . . . . . . 49--71
Abdulmannan Saati and
Sedat Biringen and
Charbel Farhat Solving Navier--Stokes Equations on a
Massively Parallel Processor: Beyond the
1 GFLOP Performance . . . . . . . . . . 72--80
Mary-Anne Mahaffy The Direction of Numerically Intensive
Computing in Higher Education . . . . . 81--87
Horst D. Simon Are Highly Parallel Systems Ready for
Prime Time? . . . . . . . . . . . . . . 88--94
Edward D. Lazowska and
Kenneth C. Sevcik Workshop on Scientific Computing
Performance Analysis . . . . . . . . . . 95--97
Steve Follin About This Issue . . . . . . . . . . . . 3--4
Warren M. Washington and
Thomas W. Bettge and
Gerald A. Meehl and
Jeffery B. Yost Computer Simulation of the Global
Climatic Effects of Increased Greenhouse
Gases . . . . . . . . . . . . . . . . . 5--19
Robert B. Wilhelmson and
Brian F. Jewett and
Crystal Shaw and
Louis J. Wicker and
Matthew Arrott and
Colleen B. Bushell and
Mark Bajuk and
Jeffrey Thingvold and
Jeffery B. Yost A Study of the Evolution of a
Numerically Modeled Severe Storm . . . . 20--36
Mark A. Johnson and
James J. O'Brien Modeling the Pacific Ocean . . . . . . . 37--47
W. Reid Thompson Global Four-Band Spectral Classification
of Jupiter's Clouds: Color/Albedo Units
and Trends . . . . . . . . . . . . . . . 48--65
Susumu Shirayama and
Kunio Kuwahara Flow Visualization in Computational
Fluid Dynamics . . . . . . . . . . . . . 66--80
Nateri K. Madavan and
Paul Kelaita and
Sharad Gavali Supercomputer Applications in Gas
Turbine Flowfield Simulation . . . . . . 81--95
Stuart E. Rogers and
Dochan Kwak and
Cetin Kiris and
I-Dee Chang Numerical Simulation of Flow Through
Biofluid Devices . . . . . . . . . . . . 96--106
Ichiro Hagiwara and
Masaaki Tsuda and
Yoshihiro Sato and
Yuichi Kitagawa Simulation of Automobile Side Member
Collapse for Crash Energy Management . . 107--114
Akio Koide Visual Simulation of a Chemical Reaction 115--123
Fumiko Yonezawa and
Shoichi Sakamoto and
Shuichi Nosé Glass Transition . . . . . . . . . . . . 124--133
David A. Dixon and
William B. Farnham and
Patrick J. Capobianco Quantum Chemical Molecular Models for
Fluorinated Polymers: Visualization of
Structures and Vibrational Motions . . . 134--149
Robert B. Haber Scientific Visualization: What's Beyond
the Vision? . . . . . . . . . . . . . . 150--153
James D. Foley Scientific Data Visualization Software:
Trends and Directions . . . . . . . . . 154--157
Tom Kitchens The U.S. Department of Energy's ``Grand
Challenge'' Program . . . . . . . . . . 3--5
Arthur A. Mirin The National Energy Research
Supercomputer Center . . . . . . . . . . 6--10
Brian E. Hingerty and
Suse Broyde Atomic Resolution Structures of DNA and
DNA Modified by Carcinogens . . . . . . 11--21
Rajiv K. Kalia and
Priya Vashishta and
Lin H. Yang and
Fred W. Dech and
John Rowlan Quantum Molecular Dynamics: a New
Algorithm for Linear and Nonlinear
Electron Transport in Disordered
Materials . . . . . . . . . . . . . . . 22--33
D. V. Anderson and
W. A. Cooper and
R. Gruber and
S. Merazzi and
U. Schwenn Methods for the Efficient Calculation of
the Magnetohydrodynamic (MHD) Stability
Properties of Magnetically Confined
Fusion Plasmas . . . . . . . . . . . . . 34--47
K. M. Bitar and
R. Edwards and
U. Heller and
A. D. Kennedy and
W. Liu and
T. A. DeGrand and
S. A. Gottlieb and
A. Krasnitz and
J. B. Kogut and
R. L. Renken and
M. C. Ogilvie and
P. Rossi and
D. K. Sinclair and
K. C. Wang and
R. L. Sugar and
M. Teper and
D. Toussaint The High Energy Monte Carlo Grand
Challenge: Simulating Quarks and Gluons 48--60
Claude Bernard and
Rajan Gupta and
Gregory Kilcup and
Stephen R. Sharpe and
Amarjit Soni Lattice Calculation of Electroweak
Amplitude . . . . . . . . . . . . . . . 61--71
Keh-Fei Liu Hadron Structure and Interaction from
Lattice Quantum Chromodynamics
Calculations . . . . . . . . . . . . . . 72--80
Andrew Pohorille and
Wilson S. Ross and
Ignacio Tinoco, Jr. DNA Dynamics in Aqueous Solution:
Opening the Double Helix . . . . . . . . 81--96
B. A. Carreras and
N. Dominguez and
J. B. Drake and
J.-N. Leboeuf and
L. A. Charlton and
J. A. Holmes and
D. K. Lee and
V. E. Lynch and
L. Garcia Plasma Turbulence Calculations On
Supercomputers . . . . . . . . . . . . . 97--110
Y.-Y. Ye and
C.-T. Chan and
K.-M. Ho and
B. N. Harmon Total Energy Calculations for Structural
Phase Transformations . . . . . . . . . 111--121
James W. Davenport and
Guo-Xin Qian and
Gayanath W. Fernando and
Michael Weinert First Principles Molecular Dynamics
Studies of Liquid and Solid Sodium . . . 122--130
Larry Lee and
Sunny Christensen The North Carolina Supercomputing
Center: a Study of Economic Development
Impact . . . . . . . . . . . . . . . . . 3--8
Sangback Ma and
Anthony T. Chronopoulos Implementation of Iterative Methods for
Large Sparse Nonsymmetric Linear Systems
on a Parallel Vector Machine . . . . . . 9--24
Marco Zaider and
David E. Orr and
John L. Fry Calculational Aspects of the Assessment
of Dielectric Response Function and
Energy Loss in Materials: Applications
to Ice and Polyacetylene . . . . . . . . 25--39
Jack C. M. Wang and
John M. Gary and
Hari K. Iyer A Technique to Evaluate Benchmarks: a
Case Study Using the Livermore Loops . . 40--55
B. McNamara and
K. J. M. Moriarty Computer-Aided Software Development
Tools for the Supercomputer Environment 56--70
Robert B. Haber and
David A. McNabb and
Robert A. Ellis Eliminating Distance in Scientific
Computing: An Experiment in
Televisualization . . . . . . . . . . . 71--89
Gary Demos Issues in Applying Massively Parallel
Computing Power . . . . . . . . . . . . 90--105
Charlotte Froese Fischer Concurrent Vector Algorithms for Spline
Solutions of the Helium Pair Equation 5--20
X. W. Wang and
Steven G. Louie and
Marvin L. Cohen Predicting High-Pressure and
Excited-State Properties of Real
Materials . . . . . . . . . . . . . . . 21--33
L. G. Ferreira and
S.-H. Wei and
Alex Zunger Stability, Electronic Structure, and
Phase Diagrams of Novel
Inter-Semiconductor Compounds . . . . . 34--56
Gregory J. Tawa and
Jules W. Moskowitz and
Paula A. Whitlock and
Kevin E. Schmidt Accurate First Principles Calculation of
Many-Body Interactions . . . . . . . . . 57--71
Mutsumi Aoyagi and
Ron Shepard and
Albert F. Wagner An Ab Initio Theoretical Study of the $
\hbox {CH} + \hbox {H}_2
\rightleftharpoons \hbox {CH}_3^*
\rightleftharpoons \hbox {CH}_2 + \hbox
{H} $ Reactions . . . . . . . . . . . . 72--89
Hans M. Amman and
David A. Kendrick Parallel Processing for Large-Scale
Nonlinear Control Experiments in
Economics . . . . . . . . . . . . . . . 90--95
Chris Barrett and
Frank Bobrowicz and
Ralph G. Brickner and
Bradley A. Clark and
Rajan Gupta and
Ann H. Hayes and
Harold Trease and
Andrew B. White, Jr. Centers of supercomputing ---
supercomputing at Los Alamos National
Laboratory . . . . . . . . . . . . . . . 3--9
Michael T. Heath and
George A. Geist and
John B. Drake Early Experience with the Intel iPSC/860
at Oak Ridge National Laboratory . . . . 10--26
James M. Hutchinson and
Stavros A. Zenios Financial Simulations on a Massively
Parallel Connection Machine . . . . . . 28--46
Maurice Yarrow and
Unmeel B. Mehta Multiprocessing on Supercomputers for
Computational Aerodynamics . . . . . . . 47--73
Hong-Qiang Ding Simulating Lattice QCD on a Caltech/JPL
Hypercube . . . . . . . . . . . . . . . 74--81
Hsieh-Lung Hsu and
Hojjat Adeli A Microtasking Algorithm for
Optimization of Structures . . . . . . . 82--91
Michael P. Persons and
Lawrence L. Halcomb Decoupled asynchronous I/O for data
processing applications on
supercomputers . . . . . . . . . . . . . 92--95
Nora H. Sabelli Perspectives: Role of High-Performance
Computing in Science Education . . . . . 95--98
Anonymous Meetings . . . . . . . . . . . . . . . . 102--103
Anonymous The International Journal of
Supercomputer Applications --- Information
for Contributors . . . . . . . . . . . . 104--105
Joanne L. Martin In Memoriam --- Sidney Fernbach
(1917-1991) . . . . . . . . . . . . . . 3--3
Dennis W. Duke Computational Science at the
Supercomputer Computations Research
Institute . . . . . . . . . . . . . . . 4--12
John M. Dawson and
Richard D. Sydora and
Viktor K. Decyk and
Paulette C. Liewer and
Robert D. Ferraro Physics Modeling of Tokamak Transport, a
Grand Challenge for Controlled Fusion 13--35
R. T. Scalettar and
D. J. Scalapino and
R. L. Sugar and
S. R. White Quantum Monte Carlo Simulations of a $
\hbox {CuO}_2 $ Model . . . . . . . . . 36--45
Misako Ishiguro Queuing Model Analysis of the Fujitsu
VP2000 with Dual Scalar Architecture . . 46--62
D. H. Bailey and
E. Barszcz and
J. T. Barton and
D. S. Browning and
R. L. Carter and
L. Dagum and
R. A. Fatoohi and
P. O. Frederickson and
T. A. Lasinski and
R. S. Schreiber and
H. D. Simon and
V. Venkatakrishnan and
S. K. Weeratunga The NAS Parallel Benchmarks . . . . . . 63--73
Anthony T. Chronopoulos and
C. R. Swaminathan and
V. R. Voller The Stefan Problem Solved via Conjugate
Gradient-like Iterative Methods on a
Parallel Vector Machine . . . . . . . . 74--91
M. J. Daydé and
I. S. Duff Use of Level 3 BLAS in $ L U $
Factorization in a Multiprocessing
Environment on Three Vector
Multiprocessors: The Alliant FX/80, the
CRAY-2, and the IBM 3090 VF . . . . . . 92--110
Anonymous Meetings . . . . . . . . . . . . . . . . 113--114
Anonymous The \booktitleInternational Journal of
Supercomputer Applications ---
Information for Contributors . . . . . . 115--116
Thomas A. Weber The National Science Foundation
Supercomputer Centers Program . . . . . 3--3
Lawrence E. Brandt Centers of supercomputing --- a history
and prospectus for the NSF supercomputer
centers . . . . . . . . . . . . . . . . 4--9
Paulette Clancy Computer Simulation of Crystal Growth
and Dissolution in Metals and
Semiconductors . . . . . . . . . . . . . 10--33
M. D. Smooke and
V. Giovangigli Numerical Modeling of Axisymmetric
Laminar Diffusion Flames by a Parallel
Boundary Value Method . . . . . . . . . 34--49
Steven A. Gottlieb and
A. Krasnitz and
U. M. Heller and
A. D. Kennedy and
W. Liu and
J. B. Kogut and
R. L. Renken and
D. K. Sinclair and
K. C. Wang and
R. L. Sugar and
D. Toussaint Hadron Thermodynamics on the Connection
Machine . . . . . . . . . . . . . . . . 50--60
Claude Bernard and
Michael C. Ogilvie and
Thomas A. DeGrand and
Carleton E. DeTar and
Steven A. Gottlieb and
A. Krasnitz and
R. L. Sugar and
D. Toussaint Studying Quarks And Gluons on MIMD
Parallel Computers . . . . . . . . . . . 61--70
Lars Hernquist The Fueling of Active Galaxies . . . . . 71--83
Herbert W. Hamber Simulations of Discrete Quantized
Gravity . . . . . . . . . . . . . . . . 84--97
Charles L. Brooks III and
William S. Young and
Douglas J. Tobias Molecular Simulations On Supercomputers 98--112
Anonymous Meetings . . . . . . . . . . . . . . . . 113--114
Joanne L. Martin Editorial . . . . . . . . . . . . . . . 3--3
Bahram Nassersharif Centers of Supercomputing --- Science
and Engineering at the Texas A&M
University Supercomputer Center . . . . 4--12
Michael W. Berry Large-Scale Sparse Singular Value
Computations . . . . . . . . . . . . . . 13--49
Lawrence Sirovich and
Richard Everson Management and Analysis of Large
Scientific Datasets . . . . . . . . . . 50--68
Krister Dackland and
Erik Elmroth and
Bo Kågström and
Charles Van Loan Parallel block matrix factorizations on
the shared-memory multiprocessor IBM
3090 VF/600J . . . . . . . . . . . . . . 69--97
S. K. Kim and
A. T. Chronopoulos An Efficient Parallel Algorithm for
Extreme Eigenvalues of Sparse
Nonsymmetric Matrices . . . . . . . . . 98--111
Cherri M. Pancake What Should We Expect from Parallel
Language Standards? . . . . . . . . . . 112--117
Anonymous Announcements . . . . . . . . . . . . . 118--119
Anonymous Meetings . . . . . . . . . . . . . . . . 120--121
Anonymous The International Journal of
Supercomputer Applications --- Information
for Contributors . . . . . . . . . . . . 122--123
Matthew Witten Editorial: the Frankenstein Project:
Building a Man in the Machine and the
Arrival of the Computational Physician 127--137
Robert Jones Protein Sequence and Structure
Comparison on Massively Parallel
Computers . . . . . . . . . . . . . . . 138--146
Dean F. Sittig and
Mark A. Shifman and
Prakash Nadkarni and
Perry L. Miller Parallel Computation for Medicine and
Biology: Applications of Linda at Yale
University . . . . . . . . . . . . . . . 147--163
Richard T. Hart and
Z. Maria Oden and
Susannah W. Parrish and
David B. Burr Computational Methods for Bone Mechanics
Studies . . . . . . . . . . . . . . . . 164--174
David Strip and
Michael Karasick Solid Modeling On a Massively Parallel
Processor . . . . . . . . . . . . . . . 175--192
Jianping Zhu and
Yung Ming Chen History Matching for Multiphase
Reservoir Models on Shared Memory
Supercomputers . . . . . . . . . . . . . 193--206
Anonymous Announcements . . . . . . . . . . . . . 207--207
Anonymous Meetings . . . . . . . . . . . . . . . . 208--209
Donald M. Austin Centers of supercomputing --- the
University of Minnesota Army High
Performance Computing Research Center 215--223
Hans-Georg Reusch Experiences with the Parallelization and
Vectorization of Simulation Codes for
Heavy-Ion Reactions . . . . . . . . . . 224--240
Jean-Philippe Brunet and
S. Lennart Johnsson All-to-All Broadcast and Applications on
the Connection Machine . . . . . . . . . 241--256
Matthew Witten and
Robert E. Wyatt Increasing Our Understanding of
Biological Models Through Visual and
Sonic Representations: a Cortical Case
Study . . . . . . . . . . . . . . . . . 257--280
R. C. Brower and
C. Rebbi and
P. Tamayo and
K. J. M. Moriarty and
S. Sanielevici Benchmarking High-Performance Computing
Systems by Means of Local-Creutz
Simulations of the $ d = 2 $ Ising Model 281--287
Scientific Supercomputing Subcommittee, Technical Committee on Supercomputing Applications, IEEE Computer Society NSF Supercomputer Center Study: February
1992 . . . . . . . . . . . . . . . . . . 288--303
Anonymous Largest-Known Prime Number Uncovered . . 304--304
Anonymous Meetings . . . . . . . . . . . . . . . . 305--307
Jack Dongarra Editorial . . . . . . . . . . . . . . . 313--313
Kevin Timson and
Ann Redelfs Centers of supercomputing --- Center for
Research on Parallel Computation . . . . 314--321
S. Lennart Johnsson and
Luis F. Ortiz Local Basic Linear Algebra Subroutines
(LBLAS) for distributed memory
architectures and languages with array
syntax . . . . . . . . . . . . . . . . . 322--350
Roberto Ansaloni and
Stefano Evangelisti and
Giuseppe Paruolo and
Elda Rossi Efficient Parallel Implementation of a
Full Configuration Interaction Algorithm
for Circular Polyenes on a CRAY Y-MP . . 351--360
K. J. M. Moriarty and
S. Sanielevici and
D. W. Kuba Parallel Processing and the Sustained
Production Performance of the CRAY Y-MP:
Benchmarks Using Optimized Microtasked
Lattice SU(3) Code . . . . . . . . . . . 361--370
S. Y. Moon and
C. S. Yoon and
T. J. Chung Multitasking for Local Parallelism in
Applications to Chemically Reacting
Supersonic Flows on CRAY Y-MP . . . . . 371--382
Skef Wholey and
Clifford Lasser and
Gyan Bhanot Correspondence: FLO67: a Case Study in
Scalable Programming . . . . . . . . . . 383--388
Anonymous Announcements . . . . . . . . . . . . . 389--389
Anonymous Meetings . . . . . . . . . . . . . . . . 390--391
Anonymous The International Journal of
Supercomputer Applications- . . . . . . 392--406
S. K. Kim and
A. T. Chronopoulos An Efficient Parallel Algorithm for
Extreme Eigenvalues of Sparse
Nonsymmetric Matrices . . . . . . . . . 407--420
Anonymous Perspectives . . . . . . . . . . . . . . 421--426
Anonymous Announcements . . . . . . . . . . . . . 427--428
Anonymous Meetings . . . . . . . . . . . . . . . . 429--430
Anonymous The International Journal of
Supercomputer Applications- . . . . . . 431--432
J. B. Drake and
G. A. Geist and
H. R. Hicks and
K. L. Kliewer and
G. M. Stocks and
L. E. Toran and
P. H. Worley The Center for Computational Sciences at
Oak Ridge National Laboratory . . . . . 3--14
Shiwei Zhang and
M. H. Kalos Exact Monte Carlo Calculations for
Fermions on a Parallel Machine . . . . . 15--24
C.-S. Chang and
G. De Titta and
H. Hauptman and
R. Miller and
P. Thuman and
C. Weeks Using Parallel Computers to Solve the
Phase Problem of X-Ray Crystallography 25--49
Fongray Frank Young and
Chwan-Hwa ``John'' Wu A Fully Vectorized Code for
Nonequilibrium RF Glow Discharge Fluid
Modeling and Its Parallel Processing on
a CRAY X-MP . . . . . . . . . . . . . . 50--63
Patrick R. Amestoy and
Iain S. Duff Memory Management Issues in Sparse
Multifrontal Methods on Multiprocessors 64--82
Juli Raw and
Donald C. Aston and
Karinne W. Gordon and
Kyle Wheeler The 0th Heterogeneous Computing
Challenge: Fun and (Sometimes Too Much)
Excitement . . . . . . . . . . . . . . . 91--96
Gary A. Mastin and
Steven J. Plimpton and
Dennis C. Ghiglia A Massively Parallel Digital Processor
for Spotlight Synthetic Aperture Radar 97--112
Alan Edelman Large Dense Numerical Linear Algebra in
1993: The Parallel Computing Influence 113--128
Mark T. Jones and
Paul E. Plassmann Computation of Equilibrium Vortex
Structures for Type-II Superconductors 129--143
Charis Gantes and
Jerome J. Connor and
Robert D. Logcher Simulation of the Deployment Process of
Multiunit Deployable Structures on a
CRAY-2 . . . . . . . . . . . . . . . . . 144--154
H. Adeli and
S. L. Hung A Concurrent Adaptive Conjugate Gradient
Learning Algorithm on MIMD Shared-Memory
Machines . . . . . . . . . . . . . . . . 155--166
Lincoln Gray and
Scott Klasky and
Robert Byers Visualizing Complex Patterns in the
Spread of Head and Neck Cancers . . . . 167--178
Anonymous Message-Passing Interface . . . . . . . 179--179
Anonymous Meetings . . . . . . . . . . . . . . . . 180--181
Anonymous The International Journal of
Supercomputer Applications- . . . . . . 182--183
Anna Nagurney Introduction To the Special Issue . . . 187--188
Mahmoud A. El-Gamal and
Richard D. McKelvey and
Thomas R. Palfrey Computational Issues in the Statistical
Design and Analysis of Experimental
Games . . . . . . . . . . . . . . . . . 189--200
Hans M. Amman and
David A. Kendrick Forward Looking Behavior and Learning in
Stochastic Control . . . . . . . . . . . 201--211
Ayse Imrohoroglu and
Selahattin Imrohoroglu and
Douglas H. Joines A Numerical Algorithm for Solving Models
with Incomplete Markets . . . . . . . . 212--230
Vassilis Argyrou Hajivassiliou Simulating normal rectangle
probabilities and their derivatives:
Effects of vectorization . . . . . . . . 231--253
Manfred Gilli and
Giorgio Pauletto Econometric Model Simulation On Parallel
Computers . . . . . . . . . . . . . . . 254--264
Agapi L. Somwaru and
Kenneth Hanson Globally convex agricultural production
system: parameter estimation . . . . . . 265--271
K. Lowther and
J. C. Salem and
J. A. Sethian Interactive, animated visualization
environment for three-dimensional fluid
flow . . . . . . . . . . . . . . . . . . 277--291
Yan Huo and
Robert Schreiber Efficient, Massively Parallel Eigenvalue
Computation . . . . . . . . . . . . . . 292--303
George Delic Performance Attributes for Code and
Workload Analysis on CRAY X-MP and Y-MP
Systems . . . . . . . . . . . . . . . . 304--336
C.-H. Lai Domain decomposition methods for
semiconductor device problems on a Cray
S-MP . . . . . . . . . . . . . . . . . . 337--348
Soren S. Nielsen and
Stavros A. Zenios Massively Parallel Proximal Algorithms
for Solving Linear Stochastic Network
Programs . . . . . . . . . . . . . . . . 349--364
Anonymous Meetings . . . . . . . . . . . . . . . . 365--366
Joanne L. Martin Editorial . . . . . . . . . . . . . . . 3--4
Frederick H. Hausheer Introduction to the Theme Issue . . . . 5--5
Terry R. Stouch and
Howard E. Alper and
Donna Bassolino-Klimas Supercomputing Studies of Biomembranes 6--23
Raul E. Cachau and
Rick Gussio and
John A. Beutler and
Gwendolyn N. Chmurny and
Bruce D. Hilton and
Gary M. Muschik and
John W. Erickson Solution Structure of Taxol Determined
Using a Novel Feedback-Scaling Procedure
for NOE-Restrained Molecular Dynamics 24--34
Salvatore Profeta, Jr. and
Rayomand J. Unwalla and
Daniel J. Russell Relative energies and structural
features of small amines and their
ammonium analogs: Results from 6-31G*
optimizations and an MM2 ammonium force
field . . . . . . . . . . . . . . . . . 35--46
John E. Mertz and
B. Montgomery Pettitt Molecular Dynamics At a Constant pH . . 47--53
Ai Chen and
Cynthia S. Hirtzel Massively Parallel Monte Carlo
Simulations on CM2 for Gas Adsorption in
Zeolite Molecular Sieves . . . . . . . . 54--63
Manish Deshpande and
Jinzhang Feng and
Charles L. Merkle and
Ashish Deshpande Application of a Distributed Network in
Computational Fluid Dynamic Simulations 64--67
Anonymous The International Journal of
Supercomputer Applications and High
Performance Computing . . . . . . . . . 68--69
Ken Kennedy and
Kevin Timson Centers of supercomputing --- making
parallel computing truly usable:
research, education, and knowledge
transfer at the Center for Research on
Parallel Computation . . . . . . . . . . 73--79
Mani Chandy and
Ian Foster and
Ken Kennedy and
Charles Koelbel and
Chau-Wen Tseng Integrated Support for Task and Data
Parallelism . . . . . . . . . . . . . . 80--98
Jaeyoung Choi and
Jack J. Dongarra and
Roldan Pozo and
Danny C. Sorensen and
David W. Walker CRPC Research into Linear Algebra
Software for High Performance Computers 99--118
Ulrich Kremer and
Marcelo Ramé Compositional Oil Reservoir Simulation
in Fortran D: a Feasibility Study On
Intel iPSC/860 . . . . . . . . . . . . . 119--128
John K. Salmon and
Michael S. Warren and
Gregoire S. Winckelmans Fast Parallel Tree Codes for
Gravitational and Fluid Dynamical
$N$-Body Problems . . . . . . . . . . . 129--142
B. Averick and
C. Bischof and
B. Bixby and
A. Carle and
J. Dennis and
M. El-Alem and
A. El-Bakry and
A. Griewank and
G. Johnson and
R. Lewis and
J. Moré and
R. Tapia and
V. Torczon and
K. Williamson Numerical Optimization at the Center for
Research on Parallel Computation . . . . 143--153
Anonymous Supercomputer Applications and High
Performance Computing . . . . . . . . . . 154--155
Anonymous MPI: a Message-Passing Interface
Standard . . . . . . . . . . . . . . . . 159--416
Luca F. Pavarino and
Marcelo Ramé Numerical Experiments With an
Overlapping Additive Schwarz Solver for
3-D Parallel Reservoir Simulation . . . 3--17
Marie-Odile Bristeau and
Jocelyne Erhel and
Philippe Féat and
Roland Glowinski and
Jacques Périaux Solving the Helmholtz Equation At
High-Wave Numbers On a Parallel Computer
With a Shared Virtual Memory . . . . . . 18--28
Bruce A. Shapiro and
Jih-Hsiang Chen and
Tim Busse and
Joseph Navetta and
Wojciech Kasprzak and
Jacob V. Maizel, Jr. Optimization and Performance Analysis of
a Massively Parallel Dynamic Programming
Algorithm for RNA Secondary Structure
Prediction . . . . . . . . . . . . . . . 29--39
Yu-Chung Chang and
Tony F. Chan Performance Modeling for High-Order
Finite Difference Methods on the
Connection Machine CM-2 . . . . . . . . 40--57
W. F. Wong and
Yoshio Oyanagi and
Eiichi Goto Evaluation of the Hitachi S-3800
Supercomputer Using Six Benchmarks . . . 58--70
U. Kremer and
M. Ramé Erratum: Compositional Oil Reservoir
Simulation in Fortran D: a Feasibility
Study on Intel iPSC/860 . . . . . . . . 71--71
Anonymous The International Journal of
Supercomputer Applications and High
Performance Computing . . . . . . . . . . 72--73
Louis H. Turcotte Introduction . . . . . . . . . . . . . . 77--78
Anthony Skjellum and
Ewing Lusk and
William Gropp Early applications in the
Message-Passing Interface (MPI) . . . . 79--94
Steven A. Moyer and
Vaidy S. Sunderam Parallel I/O as a Parallel Application 95--107
Adam Beguelin and
Jack Dongarra and
Al Geist and
Robert Manchek and
Vaidy Sunderam Recent Enhancements to PVM . . . . . . . 108--127
P. Dragovitsch and
X. Zhao and
L. C. Dennis and
G. A. Riccardi PVMGeant --- a Parallel Simulation Code
for the CLAS Detector at CEBAF . . . . . 128--137
Timothy G. Mattson Programming Environments for Parallel
and Distributed Computing: a Comparison
of P4, PVM, Linda, and TCGMSG . . . . . 138--161
Alexandre Ern and
Craig C. Douglas and
Mitchell D. Smooke Detailed Chemistry Modeling of Laminar
Diffusion Flames on Parallel Computers 167--186
Thomas A. Cortese and
S. Balachandar High Performance Spectral Simulation of
Turbulent Flows in Massively Parallel
Machines with Distributed Memory . . . . 187--204
Vincent Bouchitté and
Pierre Boulet and
Alain Darte and
Yves Robert Evaluating Array Expressions on
Massively Parallel Machines with
Communication/Computation Overlap . . . 205--219
Henry Ker-Chang Chang and
Chung-Yu Liou Parallel Implementation of Linear
Quadtree Codes Using the nCube 2
Supercomputer System . . . . . . . . . . 220--231
Anonymous The International Journal of
Supercomputer Applications and High
Performance Computing . . . . . . . . . 232--233
Ember Uziel and
Michael W. Berry Parallel Models of Animal Migration in
Northern Yellowstone National Park . . . 237--255
Santhosh Kumaran and
Robert N. Miller A Comparison of Parallelization
Techniques for a Finite Element
Quasigeostrophic Model of Regional Ocean
Circulation . . . . . . . . . . . . . . 256--279
Chris H. Walshaw and
Mark Cross and
Martin G. Everett A Localized Algorithm for Optimizing
Unstructured Mesh Partitions . . . . . . 280--295
Sridhar Chirravuri and
Suchendra M. Bhandarkar and
David Whitmire A Massively Parallel Algorithm for $K_2$
Entropy Computation: Case Studies of
Model Systems and In Vivo Data . . . . . 296--311
Ann-Marie Mårtensson-Pendrill Perspectives: Turnaround times at a
supercomputing center . . . . . . . . . 312--314
Yu Hu and
S. Lennart Johnsson A Data-Parallel Implementation of
Hierarchical $N$-Body Methods . . . . . 3--40
Andreas Stathopoulos and
Anders B. Ynnerman and
Charlotte Froese Fischer A PVM Implementation of the MCHF Atomic
Structure Package . . . . . . . . . . . 41--61
Thomas Rauber and
Gudula Rünger Parallel Implementations of Iterated
Runge--Kutta Methods . . . . . . . . . . 62--90
George Delic and
Richard I. Haller Factor Analysis of Applications
Performance Data for the Cray Y-MP . . . 91--113
Thomas A. DeFanti and
Ian Foster and
Michael E. Papka and
Rick Stevens and
Tim Kuhfuss Overview of the I-WAY: Wide-Area Visual
Supercomputing . . . . . . . . . . . . . 123--131
Michael L. Norman and
Peter Beckman and
Greg Bryan and
John Dubinski and
Dennis Gannon and
Lars Hernquist and
Kate Keahey and
Jeremiah P. Ostriker and
John Shalf and
Joel Welling and
Shelby Yang Galaxies Collide on the I-WAY: An
Example of Heterogeneous Wide-Area
Collaborative Supercomputing . . . . . . 132--144
Valerie E. Taylor and
Milana Huang and
Thomas Canfield and
Rick Stevens and
Daniel Reed and
Stephen Lamm Performance Modeling of Interactive,
Immersive Virtual Environments for
Finite Element Simulations . . . . . . . 145--156
George A. Geist II and
James A. Kohl and
Donald M. C. Nicholson and
Philip M. Papadopoulos and
Bart D. Semeraro and
William A. Shelton and
G. Malcolm Stocks and
Yang Wang Early Experiences with Distributed
Supercomputing on I-WAY: First
Principles Materials Science and
Parallel Acoustic Wave Propagation . . . 157--169
Stephen J. Young and
Gary Guo You Fan and
David Hessler and
Stephan Lamont and
T. Todd Elvins and
Martin Hadida-Hassan and
Gary Alan Hanyzewski and
James W. Durkin and
Philip Hubbard and
Gordon Kindlmann and
Eric Wong and
Donald Greenberg and
Sidney Karin and
Mark H. Ellisman Implementing a Collaboratory for
Microscopic Digital Anatomy . . . . . . 170--181
Gary D. Kerbel and
Tim Pierce and
J. L. Milovich and
Dan E. Shumaker and
Alan Verlo and
Ronald E. Waltz and
Gregory W. Hammett and
Mike A. Beer and
Bill Dorland Interactive Scientific Exploration of
Gyrofluid Tokamak Turbulence . . . . . . 182--198
Glen H. Wheless and
Cathy M. Lascara and
Arnoldo Valle-Levinson and
Donald P. Brutzman and
William Sherman and
William L. Hibbard and
Brian E. Paul The Chesapeake Bay Virtual Environment
(CBVE): Initial Results from the
Prototypical System . . . . . . . . . . 199--210
William L. Hibbard and
John Anderson and
Ian Foster and
Brian E. Paul and
Robert Jacob and
Chad Schafer and
Mary K. Tyree Exploring Coupled Atmosphere-Ocean
Models Using Vis5D . . . . . . . . . . . 211--222
Darin Diachin and
Lori Freitag and
Daniel Heath and
Jim Herzog and
William Michels and
Paul Plassmann Collaborative Virtual Environments Used
in the Design of Pollution Control
Systems . . . . . . . . . . . . . . . . 223--235
Richard M. Crutcher and
M. Pauline Baker and
George Baxter and
John Pixton and
Raymond Plante and
Harold Ravlin and
Douglas Roberts and
Randall Sharpe Radio Synthesis Imaging: a Grand
Challenge HPCC Project . . . . . . . . . 236--245
Anonymous Information for Contributors . . . . . . 246--247
Mark T. Nelson and
William F. Humphrey and
Attila Gursoy and
Andrew Dalke and
Laxmikant V. Kalé and
Robert D. Skeel and
Klaus Schulten NAMD: a Parallel Object-Oriented
Molecular Dynamics Program . . . . . . . 251--268
Susan Burgee and
Anthony A. Giunta and
Vladimir Balabanov and
Bernard Grossman and
William H. Mason and
Robert Narducci and
Raphael T. Haftka and
Layne T. Watson A Coarse-Grained Parallel
Variable-Complexity Multidisciplinary
Optimization Paradigm . . . . . . . . . 269--299
David Kramer and
S. Lennart Johnsson and
Yu Hu Local Basic Linear Algebra Subroutines
(LBLAS) for the CM-5/SE . . . . . . . . 300--335
Andrew Ilin and
L. Ridgway Scott Correspondence: Loop Splitting for High
Performance Computers . . . . . . . . . 336--340
Anonymous The International Journal of
Supercomputer Applications and High
Performance Computing: Information for
Contributors . . . . . . . . . . . . . . 341--342
Anonymous Index: Volume 10 . . . . . . . . . . . . 343--345
Jan Clinckemaillie and
Birgit Elsner and
Guy Lonsdale and
Serge Meliciani and
Stefanos Vlachoutsis and
Frank de Bruyne and
Michael Holzner Performance Issues of the Parallel
PAM-CRASH Code . . . . . . . . . . . . . 3--11
Susan E. Dorward and
Lesley R. Matheson and
Robert E. Tarjan Toward Efficient Unstructured Multigrid
Preprocessing . . . . . . . . . . . . . 12--33
Jeffrey M. Constantin and
Michael W. Berry and
Bradley T. Vander Zanden Parallelization of the Hoshen--Kopelman
Algorithm Using a Finite State Machine 34--48
Michael T. Heath and
Padma Raghavan Performance of a Fully Parallel Sparse
Solver . . . . . . . . . . . . . . . . . 49--64
Paul Fischer and
David Gottlieb On the Optimal Number of Subdomains for
Hyperbolic Problems on Parallel
Computers . . . . . . . . . . . . . . . 65--76
Jack Dongarra and
Bernard Tourancheau Preface To the Special Issue . . . . . . 83--83
Cherri M. Pancake Can Users Play an Effective Role in
Parallel Tools Research? . . . . . . . . 84--94
Jean-Luc Dekeyser and
Christian Lefebvre HPF-Builder: a Visual Environment to
Transform Fortran 90 Codes to HPF . . . 95--102
William Gropp and
Ewing Lusk Sowing MPICH: a Case Study in the
Dissemination of a Portable Environment
for Parallel Scientific Computing . . . 103--114
Ian Foster and
Carl Kesselman Globus: a Metacomputing Infrastructure
Toolkit . . . . . . . . . . . . . . . . 115--128
Andrew S. Grimshaw and
Anh Nguyen-Tuong and
Mark J. Lewis and
M. Hyett Campus-Wide Computing: Early Results
Using Legion at the University of
Virginia . . . . . . . . . . . . . . . . 129--143
Oleg Y. Nickolayev and
Philip C. Roth and
Daniel A. Reed Real-Time Statistical Clustering for
Event Trace Reduction . . . . . . . . . 144--159
Thomas Ludwig and
Roland Wismüller and
Michael Oberhuber and
Arndt Bode An Open Interface for the On-Line
Monitoring of Parallel and Distributed
Programs . . . . . . . . . . . . . . . . 160--174
Anonymous The International Journal of
Supercomputer Applications and High
Performance Computing: Information for
Contributors . . . . . . . . . . . . . . 175--176
Janice E. Cuny and
Robert A. Dunn and
Steven T. Hackstadt and
Christopher W. Harrop and
Harold H. Hersey and
Allen D. Malony and
Douglas R. Toomey Building Domain-Specific Environments
for Computational Science: a Case Study
in Seismic Tomography . . . . . . . . . 179--196
Françoise Tisseur Parallel Implementation of the Yau and
Lu Method for Eigenvalue Computation . . 197--204
Pierre Manneback Solving Irregular Sparse Linear Systems
on a Multicomputer Using the CGNR Method 205--211
Henri Casanova and
Jack Dongarra NetSolve: a Network-Enabled Server for
Solving Computational Science Problems 212--223
G. A. Geist II and
James Arthur Kohl and
Philip M. Papadopoulos CUMULVS: Providing Fault Tolerance,
Visualization, and Steering of Parallel
Applications . . . . . . . . . . . . . . 224--235
Karsten M. Decker and
Brian J. N. Wylie Software Tools for Scalable Multilevel
Application Engineering . . . . . . . . 236--250
Roldan Pozo Template Numerical Toolkit for Linear
Algebra: High Performance Programming
with C++ and the Standard Template
Library . . . . . . . . . . . . . . . . 251--263
Anonymous The International Journal of
Supercomputer Applications and High
Performance Computing: Information for
Contributors . . . . . . . . . . . . . . 264--265
Adrian Colbrook and
Iain Duff and
Tony Hey and
Klaus Stüben and
Clemens-August Thole Editorial . . . . . . . . . . . . . . . 275--276
Thierry Coupez and
Stéphane Marie From a Direct Solver to a Parallel
Iterative Solver in 3-D Forming
Simulation . . . . . . . . . . . . . . . 277--285
Michel Géradin and
Danielle Coulon and
Jean-Pierre Delsemme Parallelization of the SAMCEF Finite
Element Software through Domain
Decomposition and FETI Algorithm . . . . 286--298
Dag Fritzson and
Peter Fritzson and
Patrik Nordling and
Tommy Persson Rolling Bearing Simulation on MIMD
Computers . . . . . . . . . . . . . . . 299--313
C. Addison and
E. Appiani and
R. Cook and
M. Corvi and
P. G. N. Howard and
B. Stephens Parallel SAR Image Enhancement . . . . . 314--327
Markus Ast and
T. Jerez and
Jesus Labarta and
Hartmut Manz and
Andres Pérez and
Uwe Schulz and
Jaume Solé Runtime Parallelization of the Finite
Element Code PERMAS . . . . . . . . . . 328--335
Anders Ytterström A Tool for Partitioning Structured
Multiblock Meshes for Parallel
Computational Mechanics . . . . . . . . 336--343
Mike C. Dracopoulos and
Craig Glasgow and
A. Kevin Parrott and
Rick Janssen and
Piergiorgio Alotto and
John Simkin Bulk Synchronous Parallelization of
Industrial Electromagnetic Software . . 344--358
Anonymous Index to Volume 11 . . . . . . . . . . . 359--361
Anonymous The International Journal of
Supercomputer Applications and High
Performance Computing . . . . . . . . . 362--363
MPI Forum Special Issue: MPI2: a Message-Passing
Interface Standard . . . . . . . . . . . 1--299
David Mackay and
G. Mahinthakumar and
Ed D'Azevedo A Study of I/O in a Parallel Finite
Element Groundwater Transport Code . . . 307--319
P. Lockey and
R. Proctor and
I. D. James Characterization of I/O Requirements in
a Massively Parallel Shelf Sea Model . . 320--332
Ron A. Oldfield and
David E. Womble and
Curtis C. Ober Efficient Parallel I/O in Seismic
Processing . . . . . . . . . . . . . . . 333--344
Jarek Nieplocha and
Ian Foster and
Rick A. Kendall ChemIO: High Performance Parallel I/O
for Computational Chemistry Applications 345--363
Huseyin Simitci and
Daniel A. Reed A Comparison of Logical and Physical
Parallel I/O Patterns . . . . . . . . . 364--380
Anonymous The International Journal of
High Performance Computing Applications:
Information for Contributors . . . . . . 381--382
Rajeev Thakur and
Ewing Lusk and
William Gropp I/O in Parallel Applications: The
Weakest Link . . . . . . . . . . . . . . 389--395
G. Davis and
L. Lau and
R. Young and
F. Duncalfe and
L. Brebber Parallel Run Length Encoding
Compression: Reducing I/O in Dynamic
Environmental Simulations . . . . . . . 396--410
Meenakshi A. Kandaswamy and
Mahmut T. Kandemir and
Alok N. Choudhary and
David E. Bernholdt An Experimental Study to Analyze and
Optimize Hartree--Fock Application's I/O
with PASSION . . . . . . . . . . . . . . 411--439
Anonymous Meetings . . . . . . . . . . . . . . . . 440--445
Anonymous Index to International Journal of High
Performance Computing Applications . . . 446--447
Colin J. Aro and
Garry H. Rodrigue and
Douglas A. Rotman A High Performance Chemical Kinetics
Algorithm for 3-D Atmospheric Models . . 3--15
R. Alan McCoy and
Yuefan Deng Parallel Particle Simulations of
Thin-Film Deposition . . . . . . . . . . 16--32
Alex R. Carrillo and
John E. West and
David A. Horner and
John F. Peters Interactive Large-Scale Soil Modeling
Using Distributed High Performance
Computing Environments . . . . . . . . . 33--48
Ranieri Baraglia and
Renato Ferrini and
Domenico Laforenza and
Antonio Laganà On the Optimization of a Pipeline Model
to Integrate a Reduced-Dimensionality
Schrödinger Equation for Distributed
Memory Architectures . . . . . . . . . . 49--62
G. Wang and
Danesh K. Tafti Performance Enhancement on
Microprocessors with Hierarchical Memory
Systems for Solving Large Sparse Linear
Systems . . . . . . . . . . . . . . . . 63--79
S. F. Ashby and
W. J. Bosl and
R. D. Falgout and
S. G. Smith and
A. F. B. Tompson and
T. J. Williams A Numerical Simulation of Groundwater
Flow and Contaminant Transport on the
Cray T3D and C90 Supercomputers . . . . 80--93
Sandra Baldini and
Luc Giraud and
Javier G. Izaguirre and
Jose M. Jimenez and
Luis M. Matey High Performance Computing in Multibody
System Design . . . . . . . . . . . . . 99--106
Stephen T. Barnard and
Luis M. Bernardo and
Horst D. Simon An MPI Implementation of the SPAI
Preconditioner on the T3E . . . . . . . 107--123
Eleanor Chu Impact of Physical/Logical Network
Topology on Parallel Matrix Computation 124--145
Dror G. Feitelson On the Interpretation of Top500 Data . . 146--153
Aiichiro Nakano A Rigid-Body-Based Multiple Time Scale
Molecular Dynamics Simulation of
Nanophase Materials . . . . . . . . . . 154--162
Kevin R. Wadleigh High Performance FFT Algorithms for
Cache-Coherent Multiprocessors . . . . . 163--171
Jack J. Dongarra and
Bernard Tourancheau Special Issue Introduction: Clusters and
Computational Grids for Scientific
Computing . . . . . . . . . . . . . . . 179--179
Frederica Darema New Software Technologies for the
Development and Runtime Support of
Complex Applications . . . . . . . . . . 180--190
Thomas Sterling and
Daniel F. Savarese From Toys to Teraflops: Bridging the
Beowulf Gap . . . . . . . . . . . . . . 191--200
A. Chien and
M. Lauria and
R. Pennington and
M. Showerman and
G. Iannello and
M. Buchanan and
K. Connelly and
L. Giannini and
G. Koenig and
S. Krishnamurthy and
Q. Liu and
S. Pakin and
G. Sampemane Design and Evaluation of an HPVM-Based
Windows NT Supercomputer . . . . . . . . 201--219
Jim Basney and
Miron Livny Improving Goodput by Coscheduling CPU
and Network Capacity . . . . . . . . . . 220--230
Henri Casanova and
MyungHo Kim and
James S. Plank and
Jack J. Dongarra Adaptive Scheduling for Task Farming
with Grid Middleware . . . . . . . . . . 231--240
Paul A. Gray and
Vaidy S. Sunderam Metacomputing with the IceT System . . . 241--252
Alan Su and
Francine Berman and
Richard Wolski and
Michelle Mills Strout Using AppLeS to Schedule Simple SARA on
the Computational Grid . . . . . . . . . 253--262
Ariel Tamches and
Barton P. Miller Using Dynamic Kernel Instrumentation for
Kernel and Application Tuning . . . . . 263--276
Omer Zaki and
Ewing Lusk and
William Gropp and
Deborah Swider Toward Scalable Performance
Visualization with Jumpshot . . . . . . 277--288
Jeffrey L. Tilson and
Mike Minkoff and
Albert F. Wagner and
Ron Shepard and
Paul Sutton and
Robert J. Harrison and
Ricky A. Kendall and
Adrian T. Wong High-Performance Computational
Chemistry: Hartree--Fock Electronic
Structure Calculations on Massively
Parallel Processors . . . . . . . . . . 291--302
A. K. Dhingra and
M. Zhang and
R. Ratnam and
D. Suri A Coarse-Grained Parallel Homotopy for
Mechanism Design . . . . . . . . . . . . 303--319
Toshiya Kimura and
Hiroshi Takemiya Distributed Parallel Computing for
Fluid-Structure Coupled Simulations on a
Heterogeneous Parallel Computer Cluster 320--333
C. Walshaw and
M. Cross and
R. Diekmann and
F. Schlimbach Multilevel Mesh Partitioning for
Optimizing Domain Shape . . . . . . . . 334--353
George D. Byrne and
Alan C. Hindmarsh Correspondence: PVODE, an ODE Solver for
Parallel Computers . . . . . . . . . . . 354--365
Anonymous Index to International Journal of High
Performance Computing Applications,
Volume 13 . . . . . . . . . . . . . . . 366--368
Hany H. Ammar and
Zhouhui Miao Parallel Algorithms for the Training
Process of a Neural Network-Based System 3--25
M. Scot Breitenfeld and
Philippe H. Geubelle Parallel Implementation of a Spectral
Scheme for Simulations of 3-D Dynamic
Fracture Events . . . . . . . . . . . . 26--38
Sangback Ma Comparisons of the ILU(0), Point-SSOR,
and SPAI Preconditioners on the CRAY-T3E
for Nonsymmetric Sparse Linear Systems
Arising from PDEs on Structured Grids 39--48
Steve W. Bova and
Clay P. Breshears and
Christine E. Cuicchi and
Zeki Demirbilek and
Henry A. Gabb Dual-Level Parallel Analysis of Harbor
Wave Response Using MPI and OpenMP . . . 49--64
Weian Deng and
S. Sitharama Iyengar and
Nathan E. Brener A Fast Parallel Thinning Algorithm for
the Binary Image Skeletonization . . . . 65--81
Tony Chan and
Victor Eijkhout Design of a Library of Parallel
Preconditioners . . . . . . . . . . . . 91--101
William Gropp and
David Keyes and
Lois Curfman McInnes and
M. D. Tidriri Globalized Newton--Krylov--Schwarz
Algorithms and Software for Parallel
Implicit CFD . . . . . . . . . . . . . . 102--136
Kevin McManus and
Mark Cross and
Chris Walshaw and
Steve Johnson and
Peter Leggett A Scalable Strategy for the
Parallelization of Multiphysics
Unstructured Mesh-Iterative Codes on
Distributed-Memory Systems . . . . . . . 137--174
Frederica Darema and
Jack Dongarra and
Subhash Saini Preface . . . . . . . . . . . . . . . . 179--179
Frederica Darema Performance Engineering Technology for
the Design, Management, and Control of
Computing Systems . . . . . . . . . . . 180--188
S. Browne and
J. Dongarra and
N. Garner and
G. Ho and
P. Mucci A Portable Programming Interface for
Performance Evaluation on Modern
Processors . . . . . . . . . . . . . . . 189--204
Tony Hey and
David Lancaster The Development of Parkbench and
Performance Prediction . . . . . . . . . 205--215
Tahsin Kurc and
Mustafa Uysal and
Hyeonsang Eom and
Jeff Hollingsworth and
Joel Saltz and
Alan Sussman Efficient Performance Prediction for
Large-Scale Data-Intensive Applications 216--227
G. R. Nudd and
D. J. Kerbyson and
E. Papaefstathiou and
S. C. Perry and
J. S. Harper and
D. V. Wilcox PACE --- a Toolset for the Performance
Prediction of Parallel and Distributed
Systems . . . . . . . . . . . . . . . . 228--251
Lewis Mackenzie and
Mohamed Ould-Khaoua Comparative Modeling of Network
Topologies and Routing Strategies in
Multicomputers . . . . . . . . . . . . . 252--267
Kento Aida and
Atsuko Takefusa and
Hidemoto Nakada and
Satoshi Matsuoka and
Satoshi Sekiguchi and
Umpei Nagashima Performance Evaluation Model for
Scheduling in Global Computing Systems 268--279
J. C. Browne and
E. Berger and
A. Dube Compositional Development of Performance
Models in POEMS . . . . . . . . . . . . 283--291
Daniel A. Menascé Web Performance Modeling Issues . . . . 292--303
Vikram Adve and
Rizos Sakellariou Application Representations for
Multiparadigm Performance Modeling of
Large-Scale Parallel Scientific Codes 304--316
Bryan Buck and
Jeffrey K. Hollingsworth An API for Runtime Code Patching . . . . 317--329
Adolfy Hoisie and
Olaf Lubeck and
Harvey Wasserman Performance and Scalability Analysis of
Teraflop-Scale Parallel Architectures
Using Multidimensional Wavefront
Applications . . . . . . . . . . . . . . 330--346
Katarzyna Keahey and
Peter Beckman and
James Ahrens Ligature: Component Architecture for
High Performance Applications . . . . . 347--356
Jeffrey S. Vetter and
Daniel A. Reed Real-Time Performance Monitoring,
Adaptive Control, and Interactive
Steering of Computational Grids . . . . 357--366
Neil J. Gunther The Dynamics of Performance Collapse in
Large-Scale Networks and Computers . . . 367--372
Anonymous Index to International Journal
of High Performance Computing
Applications, Volume 14 . . . . . . . . 373--375
Rajive Bagrodia and
Ewa Deelman and
Thomas Phan Parallel Simulation of Large-Scale
Parallel Applications . . . . . . . . . 3--12
Jason Abate and
Peng Wang and
Kamy Sepehrnoori Parallel Compositional Reservoir
Simulation on Clusters of PCs . . . . . 13--21
David Kerlick and
Eric Dillon and
David Levine Performance Testing of a Parallel
Multiblock CFD Solver . . . . . . . . . 22--35
Luc Giraud and
Ronan Guivarch and
Joël Stein Parallel Distributed FFT-Based Solvers
for 3-D Poisson Problems in Meso-Scale
Atmospheric Simulations . . . . . . . . 36--46
Jen-Chih Lin and
Huan-Chao Keh Reconfiguration of Complete Binary Trees
in Full IEH Graphs and Faulty Hypercubes 47--55
Edmond Chow Parallel Implementation and Practical
Use of Sparse Approximate Inverse
Preconditioners with a Priori Sparsity
Patterns . . . . . . . . . . . . . . . . 56--74
Mihai Horoi and
Richard J. Enbody Using Amdahl's Law as a Metric to Drive
Code Parallelization: Two Case Studies 75--80
Mark Baker Preface . . . . . . . . . . . . . . . . 91--91
Thomas Sterling An Introduction to PC Clusters for High
Performance Computing . . . . . . . . . 92--101
Amy Apon and
Mark Baker Network Technologies . . . . . . . . . . 102--114
Steve Chapin and
Joachim Worringen Operating Systems . . . . . . . . . . . 115--123
Rajkumar Buyya and
Toni Cortes and
Hai Jin Single System Image . . . . . . . . . . 124--135
Mark Baker and
Amy Apon Middleware . . . . . . . . . . . . . . . 136--142
Anthony Skjellum and
Rossen Dimitrov and
Srihari Venkata Angaluri and
David Lifka and
George Coulouris and
Putchong Uthayopas and
Stephen L. Scott and
Rasit Eskicioglu Systems Administration . . . . . . . . . 143--161
Erich Schikuta and
Helmut Wanek Parallel I/O . . . . . . . . . . . . . . 162--168
Ira Pramanick High Availability . . . . . . . . . . . 169--174
Jack Dongarra and
Shirley Moore and
Anne Trefethen Numerical Libraries and Tools for
Scalable Parallel Cluster Computing . . 175--180
David A. Bader and
Robert Pennington Applications . . . . . . . . . . . . . . 181--185
Daniel S. Katz and
Jeremy Kepner Embedded/Real-Time Systems . . . . . . 186--190
Anonymous Appendixes: Appendix A: Linux, Windows
NT, AIX, Solaris; Appendix B: Compilers
and Preprocessors, MPI Implementations,
Development Environments, Debuggers,
Performance Analyzers . . . . . . . . . 191--194
Jack Dongarra and
Bernard Tourancheau Preface . . . . . . . . . . . . . . . . 199--199
Ian Foster and
Carl Kesselman and
Steven Tuecke The Anatomy of the Grid: Enabling
Scalable Virtual Organizations . . . . 200--222
William E. Johnston Using Computing and Data Grids for
Large-Scale Science and Engineering . . 223--242
Henri Casanova and
Thomas M. Bartol, Jr. and
Joel Stiles and
Francine Berman Distributing MCell Simulations on the
Grid . . . . . . . . . . . . . . . . . . 243--257
Rich Wolski and
James S. Plank and
John Brevik and
Todd Bryan Analyzing Market-Based Resource
Allocation Strategies for the
Computational Grid . . . . . . . . . . . 258--281
Thomas Sterling and
Daniel S. Katz and
Larry Bergman High Performance Computing Systems for
Autonomous Spaceborne Missions . . . . . 282--296
Jean-Yves Berthou and
Eric Fayolle Comparing OpenMP, HPF, and MPI
Programming: a Study Case . . . . . . . 297--309
Olivier Beaumont and
Arnaud Legrand and
Fabrice Rastello and
Yves Robert Static LU Decomposition on
Heterogeneous Platforms . . . . . . . . 310--323
Francine Berman and
Andrew Chien and
Keith Cooper and
Jack Dongarra and
Ian Foster and
Dennis Gannon and
Lennart Johnsson and
Ken Kennedy and
Carl Kesselman and
John Mellor-Crummey and
Dan Reed and
Linda Torczon and
Rich Wolski The GrADS Project: Software Support for
High-Level Grid Application Development 327--344
Gabrielle Allen and
David Angulo and
Ian Foster and
Gerd Lanfermann and
Chuang Liu and
Thomas Radke and
Ed Seidel and
John Shalf The Cactus Worm: Experiments with
Dynamic Resource Discovery and
Allocation in a Grid Environment . . . . 345--358
Antoine Petitet and
Susan Blackford and
Jack Dongarra and
Brett Ellis and
Graham Fagg and
Kenneth Roche and
Sathish Vadhiyar Numerical Libraries and the Grid . . . . 359--374
Matei Ripeanu and
Adriana Iamnitchi and
Ian Foster Performance Predictions for a Numerical
Relativity Package in Grid Environments 375--387
Boris Chernyavsky and
Doyle Knight Investigation of Large Eddy Simulation
Code Scaling Performance and Network
Type Influence on a Linux PC Cluster . . 388--393
Anonymous Index to International Journal
of High Performance Computing
Applications, Volume 15 . . . . . . . . 394--396
Jack Dongarra Preface: Basic Linear Algebra
Subprograms Technical (BLAST) Forum
Standard I . . . . . . . . . . . . . . . 1--111
Anonymous Acknowledgments . . . . . . . . . . . . 2--3
Anonymous Suggestions for Reading . . . . . . . . 4--4
Anonymous Introduction . . . . . . . . . . . . . . 5--18
Anonymous Dense and Banded BLAS . . . . . . . . . 19--86
Anonymous Annex A Appendix . . . . . . . . . . . . 87--93
Anonymous Annex B Legacy BLAS . . . . . . . . . . 94--107
Anonymous Annex C . . . . . . . . . . . . . . . . 108--108
Anonymous References . . . . . . . . . . . . . . . 109--109
Anonymous Index . . . . . . . . . . . . . . . . . 110--111
Jack Dongarra Preface: Basic Linear Algebra
Subprograms Technical (BLAST) Forum
Standard II . . . . . . . . . . . . . . 115--115
Anonymous Acknowledgments . . . . . . . . . . . . 116--117
Anonymous Suggestions for Reading . . . . . . . . 118--118
Anonymous 3 Sparse BLAS . . . . . . . . . . . . . 119--141
Anonymous 4 Extended and Mixed Precision BLAS . . 142--174
Anonymous Annex A . . . . . . . . . . . . . . . . 175--181
Anonymous Annex B . . . . . . . . . . . . . . . . 182--195
Anonymous Annex C . . . . . . . . . . . . . . . . 196--196
Anonymous References . . . . . . . . . . . . . . . 197--197
Anonymous Index . . . . . . . . . . . . . . . . . 198--199
S. S. Iyengar and
Sri Kumar Preface . . . . . . . . . . . . . . . . 203--205
R. R. Brooks and
C. Griffin and
D. S. Friedlander Self-Organized Distributed Sensor
Network Entity Tracking . . . . . . . . 207--219
R. R. Brooks and
C. Griffin Traffic Model Evaluation of Ad Hoc
Target Tracking Algorithms . . . . . . . 221--234
D. S. Friedlander and
S. Phoha Semantic Information Fusion for
Coordinated Signal Processing in Mobile
Sensor Networks . . . . . . . . . . . . 235--241
Mark T. Jones and
Shashank Mehrotra and
Jae H. Park Tasking Distributed Sensor Networks . . 243--257
J. C. Chen and
K. Yao and
T. L. Tung and
C. W. Reed and
D. Chen Source Localization and Tracking of a
Wideband Source Using a Randomly
Distributed Beamforming Sensor Array . . 259--272
Ivo H. Pineda-Torres and
Ibrahim Gokcen and
Bill P. Buckles Image Feature Set For Correspondence
Mappings . . . . . . . . . . . . . . . . 273--283
Nageswara S. V. Rao Netlets For End-To-End Delay
Minimization in Distributed Computing
Over The Internet Using Two-Paths . . . 285--292
Maurice Chu and
Horst Haussecker and
Feng Zhao Scalable Information-Driven Sensor
Querying and Routing for Ad Hoc
Heterogeneous Sensor Networks . . . . . 293--313
Edoardo S. Biagioni and
K. W. Bridges The Application of Remote Sensor
Technology to Assist The Recovery of
Rare And Endangered Species . . . . . . 315--324
Hairong Qi and
Xiaoling Wang and
S. Sitharama Iyengar and
Krishnendu Chakrabarty High Performance Sensor Integration in
Distributed Sensor Networks Using Mobile
Agents . . . . . . . . . . . . . . . . . 325--335
John W. Fisher III and
Martin J. Wainwright and
Erik B. Sudderth and
Alan S. Willsky Statistical and Information-Theoretic
Methods for Self-Organization and Fusion
of Multimodal, Networked Sensors . . . . 337--353
Xian-He Sun and
Thomas Fahringer and
Mario Pantano Scala: a Performance System for Scalable
Computing . . . . . . . . . . . . . . . 357--370
G. Mahinthakumar and
F. Saied A Hybrid MPI-OpenMP Implementation of an
Implicit Finite-Element Code on Parallel
Architectures . . . . . . . . . . . . . 371--393
Dimitri J. Mavriplis Parallel Performance Investigations of
an Unstructured Mesh Navier--Stokes
Solver . . . . . . . . . . . . . . . . . 395--407
Chao Yang and
Padma Raghavan and
Lloyd Arrowood and
Donald W. Noid and
Bobby G. Sumpter and
Robert E. Tuzun Large-Scale Normal Coordinate Analysis
on Distributed Memory Parallel Systems 409--424
L. Giraud Combining Shared and Distributed Memory
Programming Models on Clusters of
Symmetric Multiprocessors: Some Basic
Promising Experiments . . . . . . . . . 425--430
Anonymous Index . . . . . . . . . . . . . . . . . 431--432
Dieter Kranzlmüller and
Peter Kacsuk and
Jack Dongarra and
Jens Volkert Recent Advances in Parallel Virtual
Machine and Message Passing Interface
(Select papers from the EuroPVM/MPI 2002
Conference) . . . . . . . . . . . . . . 3--5
Ron Brightwell and
Rolf Riesen and
Arthur B. Maccabe Design, Implementation, and Performance
of MPI on Portals 3.0 . . . . . . . . . 7--19
Félix Garcia-Carballeira and
Alejandro Calderon and
Jesus Carretero and
Javier Fernandez and
Jose M. Perez The Design of the Expand Parallel File
System . . . . . . . . . . . . . . . . . 21--37
Francesc Giné and
Francesc Solsona and
Porfidio Hernández and
Emilio Luque Dealing with Memory Constraints in a
Non-Dedicated Linux Cluster . . . . . . 39--48
Rolf Rabenseifner and
Gerhard Wellein Communication and Optimization Aspects
of Parallel Programming Models on Hybrid
Architectures . . . . . . . . . . . . . 49--62
Sébastien Laflamme and
Julien Dompierre and
François Guibault and
Robert Roy Applying ParMETIS to Structured
Remeshing for Industrial CFD
Applications . . . . . . . . . . . . . . 63--76
Paweł Czarnul Programming, Tuning and Automatic
Parallelization of Irregular
Divide-and-Conquer Applications in
DAMPVM/DAC . . . . . . . . . . . . . . . 77--93
Bernd Hamann and
E. Wes Bethel and
Horst Simon and
Juan Meza NERSC `Visualization Greenbook': Future
Visualization Needs of the DOE
Computational Science Community Hosted
at NERSC . . . . . . . . . . . . . . . . 97--123
Jack Dongarra and
Victor Eijkhout Self-Adapting Numerical Software for
Next Generation Applications . . . . . . 125--131
Claudio Luis de Amorim Guest Editor's Preface . . . . . . . . . 133--134
Johann Großschädl Architectural Support for Long Integer
Modulo Arithmetic on RISC-Based Smart
Cards . . . . . . . . . . . . . . . . . 135--146
Leonardo Bidese de Pinho and
Edison Ishikawa and
Claudio Luis de Amorim GloVE: a Distributed Environment for
Scalable Video-on-demand Systems . . . . 147--161
D. P. Ruchkys and
S. W. Song A Parallel Solution to Infer Genetic
Network Architectures in Gene Expression
Analysis . . . . . . . . . . . . . . . . 163--172
Cristina Boeres and
Vinod E. F. Rebello Towards Optimal Static Task Scheduling
for Realistic Machine Models: Theory and
Practice . . . . . . . . . . . . . . . . 173--189
Adenauer Corrêa Yamin and
Jorge Victória Barbosa and
Iara Augustin and
Luciano Cavalheiro da Silva and
Rodrigo Real and
Cláudio Geyer and
Gerson Cavalheiro Towards Merging Context-Aware, Mobile
and Grid Computing . . . . . . . . . . . 191--203
David W. Walker Preface: Grid Computing: Infrastructure
and Applications . . . . . . . . . . . . 207--208
Gabriel Mateescu Quality of Service on the Grid via
Metascheduling with Resource
Co-scheduling and Co-reservation . . . . 209--218
Saleem N. Bhatti and
Søren-Aksel Sørensen and
Peter Clark and
Jon Crowcroft Network QoS for Grid Systems . . . . . . 219--236
Nader Mohamed and
Jameela Al-Jaroodi and
Hong Jiang and
David Swanson Scalable Bulk Data Transfer in Wide Area
Networks . . . . . . . . . . . . . . . . 237--248
Sudharshan Vazhkudai and
Jennifer M. Schopf Using Regression Techniques to Predict
Large Data Transfers . . . . . . . . . . 249--268
Catherine Houstis and
Spyros Lalis and
Marios Pitikakis and
George V. Vasilakis and
Kyriakos Kritikos and
Antonis Smardas A Grid Service-based Infrastructure for
Accessing Scientific Collections: The
Case of the ARION System . . . . . . . . 269--280
Andrew Woolf and
Keith Haines and
Chunlei Liu A Web Service Model for Climate Data
Access on the Grid . . . . . . . . . . . 281--295
Salman AlSairafi and
Filippia-Sofia Emmanouil and
Moustafa Ghanem and
Nikolaos Giannadakis and
Yike Guo and
Dimitrios Kalaitzopoulos and
Michelle Osmond and
Anthony Rowe and
Jameel Syed and
Patrick Wendel The Design of Discovery Net: Towards
Open Grid Services for Knowledge
Discovery . . . . . . . . . . . . . . . 297--315
Yan Huang JISGA: a Jini-Based Service-Oriented
Grid Architecture . . . . . . . . . . . 317--327
Lican Huang and
Zhaohui Wu and
Yunhe Pan Virtual and Dynamic Hierarchical
Architecture for E-Science Grid . . . . 329--347
Craig A. Lee Best Applications Papers from the Third
International Workshop on Grid Computing 351--351
Jim Smith and
Paul Watson and
Anastasios Gounaris and
Norman W. Paton and
Alvaro A. A. Fernandes and
Rizos Sakellariou Distributed Query Processing on the Grid 353--367
Yaohang Li and
Michael Mascagni Analysis of Large-scale Grid-based Monte
Carlo Applications . . . . . . . . . . . 369--382
Marcio Faerman and
Adam Birnbaum and
Francine Berman and
Henri Casanova Resource Allocation Strategies for
Guided Parameter Space Searches . . . . 383--402
William H. Bell and
David G. Cameron and
A. Paul Millar and
Luigi Capozza and
Kurt Stockinger and
Floriano Zini OptorSim: a Grid Simulator for Studying
Dynamic Data Replication Strategies . . 403--416
Christian Pérez and
Thierry Priol and
André Ribes A Parallel CORBA Component Model for
Numerical Code Coupling . . . . . . . . 417--429
Gregor von Laszewski and
Branko Ruscic and
Kaizar Amin and
Patrick Wagstrom and
Sriram Krishnan and
Sandeep Nijsure A Framework for Building Scientific
Knowledge Grids Applied to
Thermochemical Tables . . . . . . . . . 431--447
Gabrielle Allen and
Tom Goodale and
Thomas Radke and
Michael Russell and
Ed Seidel and
Kelly Davis and
Konstantinos N. Dolkas and
Nikolaos D. Doulamis and
Thilo Kielmann and
André Merzky and
Jarek Nabrzyski and
Juliusz Pukacki and
John Shalf and
Ian Taylor Enabling Applications on the Grid: a
GridLab Overview . . . . . . . . . . . . 449--466
Henri Casanova and
Francine Berman and
Thomas Bartol and
Erhan Gokcay and
Terry Sejnowski and
Adam Birnbaum and
Jack Dongarra and
Michelle Miller and
Mark Ellisman and
Marcio Faerman and
Graziano Obertelli and
Rich Wolski and
Stuart Pomerantz and
Joel Stiles The Virtual Instrument: Support for
Grid-Enabled MCell Simulations . . . . . 3--17
Katherine Yelick Special Issue on Automatic Performance
Tuning . . . . . . . . . . . . . . . . . 19--19
Markus Püschel and
José M. F. Moura and
Bryan Singer and
Jianxin Xiong and
Jeremy Johnson and
David Padua and
Manuela Veloso and
Robert W. Johnson Spiral: a Generator for Platform-Adapted
Libraries of Signal Processing
Algorithms . . . . . . . . . . . . . . . 21--45
Dragan Mirković and
Lennart Johnsson Automatic Performance Tuning for Fast
Fourier Transforms . . . . . . . . . . . 47--64
Richard Vuduc and
James W. Demmel and
Jeff A. Bilmes Statistical Models for Empirical
Search-Based Performance Tuning . . . . 65--94
Michelle Mills Strout and
Larry Carter and
Jeanne Ferrante and
Barbara Kreaseck Sparse Tiling for Stationary Iterative
Methods . . . . . . . . . . . . . . . . 95--113
Sriram Sellappa and
Siddhartha Chatterjee Cache-Efficient Multigrid Algorithms . . 115--133
Eun-Jin Im and
Katherine Yelick and
Richard Vuduc Sparsity: Optimization Framework for
Sparse Matrix Kernels . . . . . . . . . 135--158
Sathish S. Vadhiyar and
Graham E. Fagg and
Jack J. Dongarra Towards an Accurate Model for Collective
Communications . . . . . . . . . . . . . 159--167
Weicheng Huang and
Danesh K. Tafti A Parallel Adaptive Mesh Refinement
Algorithm for Solving Nonlinear
Dynamical Systems . . . . . . . . . . . 171--181
Y. Deng and
J. Glimm and
J. W. Davenport and
X. Cai and
E. Santos Performance Models on QCDOC for
Molecular Dynamics with Coulomb
Potentials . . . . . . . . . . . . . . . 183--195
Darren J. Kerbyson and
Adolfy Hoisie and
Scott Pakin and
Fabrizio Petrini and
Harvey J. Wasserman A Performance Evaluation of an Alpha EV7
Processing Node . . . . . . . . . . . . 199--209
W. Jalby and
C. Lemuet and
X. Le Pasteur WBTK: a New Set of Microbenchmarks to
Explore Memory System Performance for
Scientific Computing . . . . . . . . . . 211--224
John Mellor-Crummey and
John Garvin Optimizing Sparse Matrix--Vector Product
Computations Using Unroll and Jam . . . 225--236
Qing Yi and
Ken Kennedy Improving Memory Hierarchy Performance
through Combined Loop Interchange and
Multi-Level Fusion . . . . . . . . . . . 237--253
Martin Swany and
Rich Wolski Building Performance Topologies for
Computational Grids . . . . . . . . . . 255--265
Celso L. Mendes and
Daniel A. Reed Monitoring Large Systems Via Statistical
Sampling . . . . . . . . . . . . . . . . 267--277
Tony Hey and
Anne E. Trefethen UK e-Science Programme: Next Generation
Grid Applications . . . . . . . . . . . 285--291
Pascale Vicat-Blanc Primet and
Robert Harakaly and
Franck Bonnassieux Grid Network Monitoring in the European
Datagrid Project . . . . . . . . . . . . 293--304
Roland Wismüller and
Marian Bubak and
Włodzimierz Funika and
Bartosz Baliś A Performance Analysis Tool for
Interactive Applications on the Grid . . 305--316
Philip M. Papadopoulos and
Caroline A. Papadopoulos and
Mason J. Katz and
William J. Link and
Greg Bruno Configuring Large High-Performance
Clusters at Lightspeed: a Case Study . . 317--326
Yiannis Cotronis Composition of Message Passing Interface
Applications over MPICH-G2 . . . . . . . 327--339
Otto Sievert and
Henri Casanova A Simple MPI Process Swapping
Architecture for Iterative Applications 341--352
Graham E. Fagg and
Jack J. Dongarra Building and Using a Fault-Tolerant MPI
Implementation . . . . . . . . . . . . . 353--361
William Gropp and
Ewing Lusk Fault Tolerance in Message Passing
Interface Programs . . . . . . . . . . . 363--372
E. Caron and
F. Desprez and
M. Quinson and
F. Suter Performance Evaluation of Linear Algebra
Routines . . . . . . . . . . . . . . . . 373--390
Jeremy Kepner HPC Productivity: An Overarching View 393--397
D. E. Post and
R. P. Kendall Software Project Management and Quality
Engineering Practices for Complex,
Coupled Multiphysics, Massively Parallel
Computational Simulations: Lessons
Learned From ASCI . . . . . . . . . . . 399--416
Marc Snir and
David A. Bader A Framework for Measuring Supercomputer
Productivity . . . . . . . . . . . . . . 417--432
Thomas Sterling Productivity Metrics and Models for High
Performance Computing . . . . . . . . . 433--440
Ken Kennedy and
Charles Koelbel and
Robert Schreiber Defining and Measuring the Productivity
of Programming Languages . . . . . . . . 441--448
Robert W. Numrich Performance Metrics Based on
Computational Action . . . . . . . . . . 449--458
Stuart Faulk and
John Gustafson and
Philip Johnson and
Adam Porter and
Walter Tichy and
Lawrence Votta Measuring High Performance Computing
Productivity . . . . . . . . . . . . . . 459--473
J. Gustafson Purpose-Based Benchmarks . . . . . . . . 475--487
David J. Kuck Productivity in High Performance
Computing . . . . . . . . . . . . . . . 489--504
Jeremy Kepner High Performance Computing Productivity
Model Synthesis . . . . . . . . . . . . 505--516
Kevin McManus and
Alison Williams and
Mark Cross and
Nick Croft and
Chris Walshaw Assessing the Scalability of
Multiphysics Tools for Modeling
Solidification and Melting Processes on
Parallel Clusters . . . . . . . . . . . 1--27
Paul M. Eder and
James E. Giuliani and
Somnath Ghosh Multilevel Parallel Programming for
Three-Dimensional Voronoi Cell Finite
Element Modeling of Heterogeneous
Materials . . . . . . . . . . . . . . . 29--45
Salvatore Orlando and
Domenico Laforenza Preface: Selected Papers from the
EUROPVM/MPI 2003 Conference, Venice,
Italy, 29 September--2 October 2003 . . 47--47
Rajeev Thakur and
Rolf Rabenseifner and
William Gropp Optimization of Collective Communication
Operations in MPICH . . . . . . . . . . 49--66
Edgar Gabriel and
Graham E. Fagg and
Jack J. Dongarra Evaluating Dynamic Communicators and
One-Sided Operations for Current MPI
Libraries . . . . . . . . . . . . . . . 67--79
Albert Chan and
Frank Dehne and
Ryan Taylor CGMGRAPH/CGMLIB: Implementing and
Testing CGM Graph Algorithms on PC
Clusters and Shared Memory Machines . . 81--97
Dieter Kranzlmüller and
Peter Kacsuk and
Jack Dongarra Recent Advances in Parallel Virtual
Machine and Message Passing Interface 99--101
Ron Brightwell and
Rolf Riesen and
Keith D. Underwood Analyzing the Impact of Overlap,
Offload, and Independent Progress for
Message Passing Interface Applications 103--117
Rajeev Thakur and
William Gropp and
Brian Toonen Optimizing the Synchronization
Operations in Message Passing Interface
One-Sided Communication . . . . . . . . 119--128
Gopalakrishnan Santhanaraman and
Jiesheng Wu and
Wei Huang and
Dhabaleswar K. Panda Designing Zero-Copy Message Passing
Interface Derived Datatype Communication
Over Infiniband: Alternative Approaches
and Performance Evaluation . . . . . . . 129--142
Dawid Kurzyniec and
Vaidy Sunderam Failure Resilient Heterogeneous Parallel
Computing Across Multidomain Clusters 143--155
Franco Frattolillo Running Large-Scale Applications on
Cluster Grids . . . . . . . . . . . . . 157--172
Aristides Patrinos Preface . . . . . . . . . . . . . . . . 175--175
John B. Drake and
Philip W. Jones and
George R. Carr, Jr. Overview of the Software Design of the
Community Climate System Model . . . . . 177--186
Patrick H. Worley and
John B. Drake Performance Portability in the Physical
Parameterizations of the Community
Atmospheric Model . . . . . . . . . . . 187--201
Arthur A. Mirin and
William B. Sawyer A Scalable Implementation of a
Finite-Volume Dynamical Core in the
Community Atmosphere Model . . . . . . . 203--212
William M. Putman and
Shian-Jiann Lin and
Bo-Wen Shen Cross-Platform Performance of a Portable
Communication Module and the NASA Finite
Volume General Circulation Model . . . . 213--223
John Dennis and
Aimé Fournier and
William F. Spotz and
Amik St-Cyr and
Mark A. Taylor and
Stephen J. Thomas and
Henry Tufo High-Resolution Mesh Convergence
Properties and Parallel Efficiency of a
Spectral Element Atmospheric Dynamical
Core . . . . . . . . . . . . . . . . . . 225--235
Steven Ghan and
Timothy Shippert Load Balancing and Scalability of a
Subgrid Orography Scheme in a Global
Climate Model . . . . . . . . . . . . . 237--245
Forrest M. Hoffman and
Mariana Vertenstein and
Hideyuki Kitabata and
James B. White III Vectorizing the Community Land Model . . 247--260
Darren J. Kerbyson and
Philip W. Jones A Performance Model of the Parallel
Ocean Program . . . . . . . . . . . . . 261--276
Jay Larson and
Robert Jacob and
Everest Ong The Model Coupling Toolkit: a New
Fortran90 Toolkit for Building
Multiphysics Parallel Coupled Models . . 277--292
Robert Jacob and
Jay Larson and
Everest Ong $ M \times N $ Communication and
Parallel Interpolation in Community
Climate System Model Version 3 Using the
Model Coupling Toolkit . . . . . . . . . 293--307
Anthony P. Craig and
Robert Jacob and
Brian Kauffman and
Tom Bettge and
Jay Larson and
Everest Ong and
Chris Ding and
Yun He CPL6: The New Extensible, High
Performance Parallel Coupler for the
Community Climate System Model . . . . . 309--327
Yun He and
Chris H. Q. Ding Coupling Multicomponent Models with MPH
on Distributed Memory Computer
Architectures . . . . . . . . . . . . . 329--340
Nancy Collins and
Gerhard Theurich and
Cecelia DeLuca and
Max Suarez and
Atanas Trayanov and
V. Balaji and
Peggy Li and
Weiyu Yang and
Chris Hill and
Arlindo da Silva Design and Implementation of Components
in the Earth System Modeling Framework 341--350
Marc Baboulin and
Luc Giraud and
Serge Gratton A Parallel Distributed Solver for Large
Dense Symmetric Systems: Applications to
Geodesy and Electromagnetism Problems 353--363
Chun-Ho Liu and
Chat-Ming Woo and
Dennis Y. C. Leung Performance Analysis of a Linux PC
Cluster Using a Direct Numerical
Simulation of Fluid Turbulence Code . . 365--374
Ping Wang and
Y. Tony Song and
Yi Chao and
Hongchun Zhang Parallel Computation of the Regional
Ocean Modeling System . . . . . . . . . 375--385
Robert Fowler Preface . . . . . . . . . . . . . . . . 387--388
Kostadin Damevski and
Steven G. Parker $ M \times N $ Data Redistribution
Through Parallel Remote Method
Invocation . . . . . . . . . . . . . . . 389--398
Felipe Bertrand and
Yongquan Yuan and
Kenneth Chiu and
Randall Bramley An Approach to Parallel $ M \times N $
Communication . . . . . . . . . . . . . 399--407
Johan Steensland and
Jaideep Ray A Partitioner-Centric Model for
Structured Adaptive Mesh Refinement
Partitioning Trade-Off Optimization:
Part I . . . . . . . . . . . . . . . . . 409--422
Keith D. Cooper and
Todd Waterman Investigating Adaptive Compilation Using
the MIPSPro Compiler . . . . . . . . . . 423--431
Guohua Jin and
John Mellor-Crummey Improving Performance by Reducing the
Memory Footprint of Scientific
Applications . . . . . . . . . . . . . . 433--451
Weikuan Yu and
Sayantan Sur and
Dhabaleswar K. Panda and
Rob T. Aulwes and
Rich L. Graham High Performance Broadcast Support in
LA-MPI Over Quadrics . . . . . . . . . . 453--463
Graham E. Fagg and
Edgar Gabriel and
Zizhong Chen and
Thara Angskun and
George Bosilca and
Jelena Pjesivac-Grbovic and
Jack J. Dongarra Process Fault Tolerance: Semantics,
Design and Applications for High
Performance Computing . . . . . . . . . 465--477
Sriram Sankaran and
Jeffrey M. Squyres and
Brian Barrett and
Vishal Sahay and
Andrew Lumsdaine and
Jason Duell and
Paul Hargrove and
Eric Roman The LAM/MPI Checkpoint/Restart
Framework: System-Initiated
Checkpointing . . . . . . . . . . . . . 479--493
Larry Carter and
Henri Casanova and
Jeanne Ferrante and
Frédéric Desprez and
Yves Robert Preface . . . . . . . . . . . . . . . . 3--4
O. Beaumont and
L. Marchal and
Y. Robert Complexity Results for Collective
Communications on Heterogeneous
Platforms . . . . . . . . . . . . . . . 5--17
M. Drozdowski and
M. Lawenda and
F. Guinand Scheduling Multiple Divisible Loads . . 19--30
Hélène Renard and
Yves Robert and
Frédéric Vivien Data Redistribution Algorithms for
Heterogeneous Processor Rings . . . . . 31--43
Barbara Kreaseck and
Larry Carter and
Henri Casanova and
Jeanne Ferrante and
Sagnik Nandy Interference-Aware Scheduling . . . . . 45--59
Yves Caniou and
Emmanuel Jeannot Multicriteria Scheduling Heuristics for
GridRPC Systems . . . . . . . . . . . . 61--76
Aurélien Bouteiller and
Hinde-Lilia Bouziane and
Thomas Herault and
Pierre Lemarinier and
Franck Cappello Hybrid Preemptive Scheduling of Message
Passing Interface Applications on Grids 77--90
Darin England and
Jon Weissman A Resource Leasing Policy for on-Demand
Computing . . . . . . . . . . . . . . . 91--101
Gosia Wrzesińska and
Rob V. van Nieuwpoort and
Jason Maassen and
Thilo Kielmann and
Henri E. Bal Fault-Tolerant Scheduling of
Fine-Grained Tasks in Grid Environments 103--114
Arjav J. Chakravarti and
Gerald Baumgartner and
Mario Lauria Self-Organizing Scheduling on the
Organic Grid . . . . . . . . . . . . . . 115--130
Asim YarKhan and
Keith Seymour and
Kiran Sagi and
Zhiao Shi and
Jack Dongarra Recent Developments in GridSolve . . . . 131--141
Holly Dail and
Frédéric Desprez Experiences with Hierarchical Request
Flow Management for Network-Enabled
Server Environments . . . . . . . . . . 143--157
Osni Marques and
Tony Drummond Preface . . . . . . . . . . . . . . . . 161--162
David E. Bernholdt and
Benjamin A. Allan and
Robert Armstrong and
Felipe Bertrand and
Kenneth Chiu and
Tamara L. Dahlgren and
Kostadin Damevski and
Wael R. Elwasif and
Thomas G. W. Epperly and
Madhusudhan Govindaraju and
Daniel S. Katz and
James A. Kohl and
Manoj Krishnan and
Gary Kumfert and
J. Walter Larson and
Sophia Lefantzi and
Michael J. Lewis and
Allen D. Malony and
Lois C. McInnes and
Jarek Nieplocha and
Boyana Norris and
Steven G. Parker and
Jaideep Ray and
Sameer Shende and
Theresa L. Windus and
Shujia Zhou A Component Architecture for
High-Performance Scientific Computing 163--202
Jarek Nieplocha and
Bruce Palmer and
Vinod Tipparaju and
Manojkumar Krishnan and
Harold Trease and
Edoardo Aprà Advances, Applications and Performance
of the Global Arrays Shared Memory
Programming Toolkit . . . . . . . . . . 203--231
J. Nieplocha and
V. Tipparaju and
M. Krishnan and
D. K. Panda High Performance Remote Memory Access
Communication: The ARMCI Approach . . . 233--253
James A. Kohl and
Torsten Wilde and
David E. Bernholdt CUMULVS: Interacting with
High-Performance Scientific Simulations,
for Visualization, Steering and Fault
Tolerance . . . . . . . . . . . . . . . 255--285
Sameer S. Shende and
Allen D. Malony The TAU Parallel Performance System . . 287--311
Jack Dongarra and
Bernard Tourancheau Special Issue on Tools in the ACTS
Collection 2004 . . . . . . . . . . . . 317--317
A. Bouteiller and
T. Herault and
G. Krawezik and
P. Lemarinier and
F. Cappello MPICH-V Project: a Multiprotocol
Automatic Fault-Tolerant MPI . . . . . . 319--333
E. Caron and
F. Desprez DIET: a Scalable Toolbox to Build
Network Enabled Servers on the Grid . . 335--352
B. R. Buck and
J. K. Hollingsworth A New Hardware Monitor Design to Measure
Data Structure-Specific Cache Eviction
Information . . . . . . . . . . . . . . 353--363
L. Marchal and
Y. Yang and
H. Casanova and
Y. Robert Steady-State Scheduling of Multiple
Divisible Load Applications on Wide-Area
Distributed Computing Platforms . . . . 365--381
X. Liu and
A. A. Chien Realistic Large-Scale Online Network
Simulation . . . . . . . . . . . . . . . 383--399
E. Lusk and
N. Desai and
R. Bradshaw and
A. Lusk and
R. Butler An Interoperability Approach to System
Software, Tools, and Libraries for
Clusters . . . . . . . . . . . . . . . . 401--407
J. P. Morrison and
B. Coghlan and
A. Shearer and
S. Foley and
D. Power and
R. Perrott WEBCOM-G: a Candidate Middleware for
Grid-Ireland . . . . . . . . . . . . . . 409--422
X. Zhang and
B. Rutt and
Ü. Çatalyürek and
T. Kurç and
P. Stoffa and
M. Sen and
J. Saltz Supporting Scalable and Distributed Data
Subsetting and Aggregation in
Large-Scale Seismic Data Analysis . . . 423--438
Larry Carter and
Henri Casanova and
Frédéric Desprez and
Jeanne Ferrante and
Yves Robert Preface . . . . . . . . . . . . . . . . 441--442
Emmanuel Jeannot and
Frédéric Wagner Scheduling Messages For Data
Redistribution: An Experimental Study 443--454
Rahul Trivedi and
Abhishek Chandra and
Jon Weissman Heterogeneity-Aware Workload
Distribution in Donation-Based Grids . . 455--466
Kaoutar El Maghraoui and
Travis J. Desell and
Boleslaw K. Szymanski and
Carlos A. Varela The Internet Operating System:
Middleware for Adaptive Distributed
Computing . . . . . . . . . . . . . . . 467--480
Raphaël Bolze and
Franck Cappello and
Eddy Caron and
Michel Daydé and
Frédéric Desprez and
Emmanuel Jeannot and
Yvon Jégou and
Stephane Lanteri and
Julien Leduc and
Noredine Melab and
Guillaume Mornet and
Raymond Namyst and
Pascale Primet and
Benjamin Quetier and
Olivier Richard and
El-Ghazali Talbi and
Iréa Touche Grid'5000: a Large Scale and Highly
Reconfigurable Experimental Grid Testbed 481--494
Cynthia Bailey Lee and
Allan Snavely On the User--Scheduler Dialogue: Studies
of User-Provided Runtime Estimates and
Utility Functions . . . . . . . . . . . 495--506
Lionel Eyraud A Pragmatic Analysis of Scheduling
Environments on New Computing Platforms 507--516
Pushpinder Kaur Chouhan and
Holly Dail and
Eddy Caron and
Frédéric Vivien Automatic Middleware Deployment Planning
on Clusters . . . . . . . . . . . . . . 517--530
Brinkley Sprunt Managing the Complexity of Performance
Monitoring Hardware: The Brink and Abyss
Approach . . . . . . . . . . . . . . . . 533--540
Mohamed Dahmani and
Robert Roy Scalability Modeling For Deterministic
Particle Transport Solvers . . . . . . . 541--556
C. Shyam Sunder and
G. Baskar and
V. Babu and
David Strenski A Detailed Performance Analysis of the
Interpolation Supplemented Lattice
Boltzmann Method on the Cray T3E and
Cray X1 . . . . . . . . . . . . . . . . 557--570
Dali Wang and
Michael W. Berry and
Louis J. Gross On Parallelization of a
Spatially-Explicit Structured Ecological
Model for Integrated Ecosystem
Simulation . . . . . . . . . . . . . . . 571--581
Osman Yaşar and
Hasan Dağ Preface . . . . . . . . . . . . . . . . 3--4
İ. İlkay Boduroğlu and
Zeynep Erenay A Pattern Recognition Model for
Predicting a Financial Crisis in Turkey:
Turkish Economic Stability Index . . . . 5--20
Omer Ozan Sonmez and
Attila Gursoy A Novel Economic-Based Scheduling
Heuristic for Computational Grids . . . 21--29
O. Yaşar and
M. Koçaş Computational Modeling of Hermetic
Reciprocating Compressors . . . . . . . 30--41
Siraj-ul-Islam and
Ikram A. Tirmizi and
Fazal Haq Quartic Non-Polynomial Splines Approach
to the Solution of a System of
Second-Order Boundary-Value Problems . . 42--49
Ziya Arnavut Lossless and Near-Lossless Compression
of ECG Signals with Block-Sorting
Techniques . . . . . . . . . . . . . . . 50--58
Burak Alakent and
Mehmet C. Camurdan and
Pemra Doruker Mimicking Protein Dynamics by the
Integration of Elastic Network Model
with Time Series Analysis . . . . . . . 59--65
Berk Onat and
Sondan Durukanoğlu and
Hasan Dağ A Parallel Implementation: Real Space
Green's Function Technique . . . . . . . 66--74
Alexey Lastovetsky and
Ravi Reddy Data Partitioning with a Functional
Performance Model of Heterogeneous
Processors . . . . . . . . . . . . . . . 76--90
Gyu Sang Choi and
Saurabh Agarwal and
Jin-Ha Kim and
Chita R. Das and
Andy B. Yoo Performance Comparison of Coscheduling
Algorithms for Non-Dedicated Clusters
Through a Generic Framework . . . . . . 91--105
Dimitri J. Mavriplis and
Michael J. Aftosmis and
Marsha Berger High Resolution Aerospace Applications
Using the NASA Columbia Supercomputer 106--126
Beniamino Di Martino and
Dieter Kranzlmüller and
Jack Dongarra Preface . . . . . . . . . . . . . . . . 129--131
Robert Latham and
Robert Ross and
Rajeev Thakur Implementing MPI-IO Atomic Mode and
Shared File Pointers Using MPI One-Sided
Communication . . . . . . . . . . . . . 132--143
Wei-keng Liao and
Kenin Coloma and
Alok Choudhary and
Lee Ward Cooperative Client-Side File Caching for
MPI Applications . . . . . . . . . . . . 144--154
Christopher Falzone and
Anthony Chan and
Ewing Lusk and
William Gropp A Portable Method for Finding User
Errors in the Usage of MPI Collective
Operations . . . . . . . . . . . . . . . 155--165
Narayan Desai and
Ewing Lusk and
Rick Bradshaw A Composition Environment for MPI
Programs . . . . . . . . . . . . . . . . 166--173
Allen D. Malony and
Sameer Shende and
Alan Morris and
Felix Wolf Compensation of Measurement Overhead in
Parallel Performance Profiling . . . . . 174--194
Belgacem Ben Youssef and
Gang Cheng and
Kyriacos Zygourakis and
Pauline Markenscoff Parallel Implementation of a Cellular
Automaton Modeling the Growth of
Three-Dimensional Tissues . . . . . . . 196--209
Katarzyna Rycerz and
Alfredo Tirado-Ramos and
Alessia Gualandris and
Simon F. Portegies Zwart and
Marian Bubak and
Peter M. A. Sloot Interactive $N$-Body Simulations on the
Grid: HLA Versus MPI . . . . . . . . . . 210--221
Stylianos Bounanos and
Martin Fleury and
Sebastien Nicolas and
Anthony Vickers Load-Balanced Drift-Diffusion Model
Simulation: Cluster Software Performance
Evaluation . . . . . . . . . . . . . . . 222--245
Jeremy Kepner and
Hans Zima Preface . . . . . . . . . . . . . . . . 249--250
Zoran Budimlić and
Mackale Joyner and
Ken Kennedy Improving Compilation of Java Scientific
Applications . . . . . . . . . . . . . . 251--265
K. Yelick and
P. Hilfinger and
S. Graham and
D. Bonachea and
J. Su and
A. Kamil and
K. Datta and
P. Colella and
T. Wen Parallel Languages and Compilers:
Perspective From the Titanium Experience 266--290
B. L. Chamberlain and
D. Callahan and
H. P. Zima Parallel Programmability and the Chapel
Language . . . . . . . . . . . . . . . . 291--312
R. E. Diaconescu and
H. P. Zima An Approach To Data Distributions in
Chapel . . . . . . . . . . . . . . . . . 313--335
N. Travinin Bliss and
J. Kepner pMATLAB Parallel MATLAB Library . . . . 336--359
Piotr Luszczek and
Jack Dongarra High Performance Development for High
End Computing With Python Language
Wrapper (PLW) . . . . . . . . . . . . . 360--369
G. Tan and
L. Xu and
Z. Dai and
S. Feng and
N. Sun A Study of Architectural Optimization
Methods in Bioinformatics Applications 371--384
David K. Kahaner Preface . . . . . . . . . . . . . . . . 387--387
H. S. Bhatt and
H. J. Kotecha and
B. K. Singh and
K. Bandyopadhyay and
V. H. Patel and
A. Dasgupta Connecting Grids Using Communication
Satellites . . . . . . . . . . . . . . . 388--404
Chee Shin Yeo and
Rajkumar Buyya Pricing for Utility-Driven Resource
Management and Allocation in Clusters 405--418
H. S. Bhatt and
R. M. Patel and
H. J. Kotecha and
V. H. Patel and
A. Dasgupta GANESH: Grid Application Management and
Enhanced Scheduling . . . . . . . . . . 419--428
S. S. Thakur and
S. Nandi and
R. Bhattacharjee and
D. Goswami An Asynchronous Wakeup Power-Saving
Protocol for Multi-Hop Ad Hoc Networks 429--442
Adam K. L. Wong and
Andrzej M. Goscinski The Performance of a Parallel TSP
Program and Byte Sequential Benchmarks
Executing on a Shared Cluster . . . . . 443--455
Alfredo Buttari and
Jack Dongarra and
Julie Langou and
Julien Langou and
Piotr Luszczek and
Jakub Kurzak Mixed Precision Iterative Refinement
Techniques for the Solution of Dense
Linear Systems . . . . . . . . . . . . . 457--466
Alfredo Buttari and
Victor Eijkhout and
Julien Langou and
Salvatore Filippone Performance Optimization and Modeling of
Blocked Sparse Kernels . . . . . . . . . 467--484
Charles S. Zender and
Harry Mangalam Scaling Properties of Common Statistical
Operators for Gridded Datasets . . . . . 485--498
Rupak Biswas and
Leonid Oliker Preface . . . . . . . . . . . . . . . . 3--4
Leonid Oliker and
Andrew Canning and
Jonathan Carter and
John Shalf and
Stéphane Ethier Scientific Application Performance on
Leading Scalar and Vector
Supercomputing Platforms . . . . . . . . 5--20
Hongzhang Shan and
Erich Strohmaier and
Ji Qiang Performance Analysis of Leading HPC
Architectures With Beambeam3D . . . . . 21--32
Bronis R. de Supinski and
Martin Schulz and
Vasily V. Bulatov and
William Cabot and
Bor Chan and
Andrew W. Cook and
Erik W. Draeger and
James N. Glosli and
Jeffrey A. Greenough and
Keith Henderson and
Alison Kubota and
Steve Louis and
Brian J. Miller and
Mehul V. Patel and
Thomas E. Spelce and
Frederick H. Streitz and
Peter L. Williams and
Robert K. Yates and
Andy Yoo and
George Almasi and
Gyan Bhanot and
Alan Gara and
John A. Gunnels and
Manish Gupta and
Jose Moreira and
James Sexton and
Bob Walkup and
Charles Archer and
Francois Gygi and
Timothy C. Germann and
Kai Kadau and
Peter S. Lomdahl and
Charles Rendleman and
Michael L. Welcome and
William McLendon and
Bruce Hendrickson and
Franz Franchetti and
Stefan Kral and
Jürgen Lorenz and
Christoph W. Überhuber and
Edmond Chow and
Ümit Çatalyürek BlueGene/L Applications: Parallelism on
a Massive Scale . . . . . . . . . . . . 33--51
Sadaf R. Alam and
Richard F. Barrett and
Mark R. Fahey and
Jeffery A. Kuehn and
O. E. Bronson Messer and
Richard T. Mills and
Philip C. Roth and
Jeffrey S. Vetter and
Patrick H. Worley An Evaluation of the Oak Ridge National
Laboratory Cray XT3 . . . . . . . . . . 52--80
German Rodriguez and
Rosa M. Badia and
Jesus Labarta An Evaluation of MareNostrum Performance 81--96
Robert Hood and
Rupak Biswas and
Johnny Chang and
M. Jahed Djomehri and
Haoqiang Jin Benchmarking the Columbia Supercluster 97--112
Aiichiro Nakano and
Rajiv K. Kalia and
Ken-ichi Nomura and
Ashish Sharma and
Priya Vashishta and
Fuyuki Shimojo and
Adri C. T. van Duin and
William A. Goddard and
Rupak Biswas and
Deepak Srivastava and
Lin H. Yang De Novo Ultrascale Atomistic Simulations
on High-End Parallel Supercomputers . . 113--128
S. R. Tiyyagura and
P. Adamidis and
R. Rabenseifner and
P. Lammers and
S. Borowski and
F. Lippold and
F. Svensson and
O. Marxen and
S. Haberhauer and
A. P. Seitsonen and
J. Furthmüller and
K. Benkert and
M. Galle and
T. Bönisch and
U. Küster and
M. M. Resch Teraflops Sustained Performance With
Real World Applications . . . . . . . . 131--148
Michael Wehner and
Leonid Oliker and
John Shalf Towards Ultra-High Resolution Models of
Climate and Weather . . . . . . . . . . 149--165
Dylan G. Allegretti and
Garrett T. Kenyon and
William C. Priedhorsky Cellular Automata for Distributed Sensor
Networks . . . . . . . . . . . . . . . . 167--176
Geoffrey W. Cowles Parallelization of the FVCOM Coastal
Ocean Model . . . . . . . . . . . . . . 177--193
Nguyen Hai Chau and
Atsushi Kawai and
Toshikazu Ebisuzaki Acceleration of Fast Multipole Method
Using Special-Purpose Computer GRAPE . . 194--205
Yu-Heng Tseng and
Chris Ding Efficient Parallel I/O in Community
Atmosphere Model (CAM) . . . . . . . . . 206--218
Jakub Kurzak and
Dragan Mirković and
B. Montgomery Pettitt and
S. Lennart Johnsson Automatic Generation of FFT for
Translations of Multipole Expansions in
Spherical Harmonics . . . . . . . . . . 219--230
Jinjun Chen and
Hai Jin and
Mengchu Zhou Preface . . . . . . . . . . . . . . . . 235--237
Ru-Yue Ma and
Yong-Wei Wu and
Xiang-Xu Meng and
Shi-Jun Liu and
Li Pan Grid-Enabled Workflow Management System
Based on BPEL . . . . . . . . . . . . . 238--249
Pilar Herrero and
José Luis Bosque and
Manuel Salvadores and
María S. Pérez WE-AMBLE: a Workflow Engine To Manage
Awareness in Collaborative Grid
Environments . . . . . . . . . . . . . . 250--267
Andrew Harrison and
Ian Taylor and
Ian Wang and
Matthew Shields WS-RF Workflow in Triana . . . . . . . . 268--283
Wanchun Dou and
Jinjun Chen and
Jianxun Liu and
S. C. Cheung and
Guihai Chen and
Shaokun Fan A Workflow Engine-Driven SOA-Based
Cooperative Computing Paradigm in Grid
Environments . . . . . . . . . . . . . . 284--300
Cecilia Gomes and
Omer F. Rana and
Jose Cunha Extending Grid-Based Workflow Tools With
Patterns/Operators . . . . . . . . . . . 301--318
Jinjun Chen and
Yun Yang Activity Completion Duration Based
Checkpoint Selection for Dynamic
Verification of Temporal Constraints in
Grid Workflow Systems . . . . . . . . . 319--329
Dang Minh Quan and
D. Frank Hsu Mapping Heavy Communication Grid-Based
Workflows Onto Grid Resources Within an
SLA Context Using Metaheuristics . . . . 330--346
Tristan Glatard and
Johan Montagnat and
Diane Lingrand and
Xavier Pennec Flexible and Efficient Workflow
Deployment of Data-Intensive
Applications on Grids With MOTEUR . . . 347--360
Antonio Plaza and
Chein-I Chang Preface . . . . . . . . . . . . . . . . 363--365
Antonio Plaza and
Chein-I Chang Clusters Versus FPGA for Parallel
Processing of Hyperspectral Imagery . . 366--385
David Valencia and
Alexey Lastovetsky and
Maureen O'Flynn and
Antonio Plaza and
Javier Plaza Parallel Processing of Remotely Sensed
Hyperspectral Images on Heterogeneous
Networks of Workstations Using HeteroMPI 386--407
Mingkai Hsueh and
Chein-I Chang Field Programmable Gate Arrays (FPGA)
for Pixel Purity Index Using Blocks of
Skewers for Endmember Extraction in
Hyperspectral Imagery . . . . . . . . . 408--423
Javier Setoain and
Manuel Prieto and
Christian Tenllado and
Francisco Tirado GPU for Parallel On-Board Hyperspectral
Image Processing . . . . . . . . . . . . 424--437
Qian Du and
James E. Fowler Low-Complexity Principal Component
Analysis for Hyperspectral Image
Compression . . . . . . . . . . . . . . 438--448
Uwe Fladrich and
Jörg Stiller and
Wolfgang E. Nagel Improved Performance for Nodal Spectral
Element Operators . . . . . . . . . . . 450--459
Selim Gurun and
Rich Wolski and
Chandra Krintz and
Dan Nurmi On the Efficacy of Computation
Offloading Decision-Making Strategies 460--479
Jack J. Dongarra and
Julien Langou The Problem With the LINPACK Benchmark
1.0 Matrix Generator . . . . . . . . . . 5--13
Jian He and
Alex Verstak and
L. T. Watson and
M. Sosonkina Performance Modeling and Analysis of a
Massively Parallel DIRECT---Part 1 . . . 14--28
Jian He and
Alex Verstak and
M. Sosonkina and
L. T. Watson Performance Modeling and Analysis of a
Massively Parallel DIRECT---Part 2 . . . 29--41
James Mc Donald and
Aaron Golden and
S. Gerard Jennings OpenDDA: a Novel High-Performance
Computational Framework for the Discrete
Dipole Approximation . . . . . . . . . . 42--61
Jin Woo Park and
Si Hyong Park and
Seung Jo Kim Optimization With High-Cost Objective
Function Evaluations in a Computing Grid
and an Application To Simulation-Based
Design . . . . . . . . . . . . . . . . . 62--83
Sundari M. Sivagama and
Sathish S. Vadhiyar and
Ravi S. Nanjundiah Dynamic Component Extension: a Strategy
for Performance Improvement in
Multicomponent Applications . . . . . . 84--98
Marta Beltrán and
Antonio Guzmán How to Balance the Load on Heterogeneous
Clusters . . . . . . . . . . . . . . . . 99--118
Alexey Lastovetsky and
Vladimir Rychkov Accurate and Efficient Estimation of
Parameters of Heterogeneous
Communication Performance Models . . . . 123--139
Jacques M. Bahi and
Jean-Claude Charr and
Raphaël Couturier and
David Laiymani A Parallel Algorithm To Solve Large
Stiff ODE Systems on Grid Systems . . . 140--151
Werner Mach and
Erich Schikuta Parallel Algorithms for the Execution of
Relational Database Operations Revisited
on Grids . . . . . . . . . . . . . . . . 152--170
Anne Benoit and
Harald Kosch and
Veronika Rehn-Sonigo and
Yves Robert Multi-Criteria Scheduling of Pipeline
Workflows (and Application To the JPEG
Encoder) . . . . . . . . . . . . . . . . 171--187
Jack Dongarra and
Bernard Tourancheau Editorial . . . . . . . . . . . . . . . 195--195
Martin J. Chorley and
David W. Walker and
Martyn F. Guest Hybrid Message-Passing and Shared-Memory
Programming in a Molecular Dynamics
Application on Multicore Clusters . . . 196--211
Franck Cappello Fault Tolerance in Petascale/Exascale
Systems: Current Knowledge, Challenges
and Research Opportunities . . . . . . . 212--226
Mark L. James and
Andrew A. Shapiro and
Paul L. Springer and
Hans P. Zima Adaptive Fault Tolerance for Scalable
Cluster Computing in Space . . . . . . . 227--241
James S. Plank The RAID-6 Liber8Tion Code . . . . . . . 242--251
Tahsin Kurc and
Shannon Hastings and
Vijay Kumar and
Stephen Langella and
Ashish Sharma and
Tony Pan and
Scott Oster and
David Ervin and
Justin Permar and
Sivaramakrishnan Narayanan and
Yolanda Gil and
Ewa Deelman and
Mary Hall and
Joel Saltz HPC and Grid Computing for Integrative
Biomedical Research . . . . . . . . . . 252--264
Shuaiwen Song and
Rong Ge and
Xizhou Feng and
Kirk W. Cameron Energy Profiling and Analysis of the HPC
Challenge Benchmarks . . . . . . . . . . 265--276
Piotr Luszczek Parallel Programming in MATLAB . . . . . 277--283
Judit Planas and
Rosa M. Badia and
Eduard Ayguadé and
Jesus Labarta Hierarchical Task-Based Programming With
StarSs . . . . . . . . . . . . . . . . . 284--299
Jack Dongarra and
Pete Beckman and
Patrick Aerts and
Franck Cappello and
Thomas Lippert and
Satoshi Matsuoka and
Paul Messina and
Terry Moore and
Rick Stevens and
Anne Trefethen and
Mateo Valero The International Exascale Software
Project: a Call To Cooperative Action By
the Global High-Performance Community 309--322
Bernd Mohr Summary of the IESP White Papers . . . . 323--327
Barbara Chapman and
Jesús Labarta and
Vivek Sarkar and
Mitsuhisa Sato Programmability Issues . . . . . . . . . 328--331
Thomas Sterling Models of Computation --- Enabling
Exascale . . . . . . . . . . . . . . . . 332--334
Thomas Sterling The Biggest Need: a New Model of
Computation . . . . . . . . . . . . . . 335--336
Ewing Lusk Slouching Towards Exascale . . . . . . . 337--339
Jesús Labarta and
Eduard Ayguadé and
Mateo Valero BSC Vision Towards Exascale . . . . . . 340--343
Laxmikant Kale Programming Models at Exascale: Adaptive
Runtime Systems, Incomplete Simple
Languages, and Interoperability . . . . 344--346
Arthur Maccabe and
Hugo Falter and
William Kramer Resource Management . . . . . . . . . . 347--349
Mark Seager and
Brent Gorda The Case for a Hierarchical System Model
for Linux Clusters . . . . . . . . . . . 350--354
Bernd Mohr and
Matthias S. Müller and
Wolfgang E. Nagel Performance at Exascale . . . . . . . . 355--356
David Skinner and
Alok Choudhary On the Importance of End-to-End
Application Performance Monitoring and
Workload Analysis at the Exascale . . . 357--360
Jean-Yves Berthou and
Jean-François Hamelin and
Etienne de Rocquigny XXL Simulation for XXIst Century Power
Systems Operation . . . . . . . . . . . 361--365
David Keyes Partial Differential Equation-Based
Applications and Solvers at Extreme
Scale . . . . . . . . . . . . . . . . . 366--368
Peter Michielse Application Analysis and Porting in the
PRACE Project . . . . . . . . . . . . . 369--373
Franck Cappello and
Al Geist and
Bill Gropp and
Laxmikant Kale and
Bill Kramer and
Marc Snir Toward Exascale Resilience . . . . . . . 374--388
William Kramer and
David Skinner An Exascale Approach to Software and
Hardware Design . . . . . . . . . . . . 389--391
William Kramer and
David Skinner Consistent Application Performance at
the Exascale . . . . . . . . . . . . . . 392--394
Mark Seager and
Brent Gorda A Collaboration and Commercialization
Model for Exascale Software Research . . 395--397
Giovanni Aloisio and
Sandro Fiore Towards Exascale Distributed Data
Management . . . . . . . . . . . . . . . 398--400
Al Geist and
Sudip Dosanjh IESP Exascale Challenge: Co-Design of
Architectures and Algorithms . . . . . . 401--402
David Barkai The Application Perspective: Seeking
Productivity and Performance . . . . . . 403--408
Robert F. Lucas Musings on the Path Forward to Exascale 409--410
Laxmikant Kale Early Application Development/Tuning and
Application
Characterization/Segmentation . . . . . 411--412
William Gropp and
Marc Snir On the Need for a Consortium of
Capability Centers . . . . . . . . . . . 413--420
Abani Patra and
Rob Pennington and
Ed Seidel Exascale Software: Some Questions to
Drive the Development . . . . . . . . . 421--422
Anne Trefethen and
Nick Higham and
Iain Duff and
Peter Coveney Developing a High-Performance
Computing/Numerical Analysis Roadmap . . 423--426
Al Geist and
Robert Lucas Major Computer Science Challenges at
Exascale . . . . . . . . . . . . . . . . 427--436
Michael A. Heroux Software Challenges for Extreme Scale
Computing: Going From Petascale to
Exascale Systems . . . . . . . . . . . . 437--439
Alexey Lastovetsky and
Tahar Kechadi Recent Advances in Parallel Virtual
Machine and Message Passing Interface 3--4
Pavan Balaji and
Anthony Chan and
William Gropp and
Rajeev Thakur and
Ewing Lusk The Importance of Non-Data-Communication
Overheads in MPI . . . . . . . . . . . . 5--15
Sameer Kumar and
Ahmad Faraj and
Amith R. Mamidala and
Brian Smith and
Gabor Dozsa and
Bob Cernohous and
John Gunnels and
Douglas Miller and
Joseph Ratterman and
Philip Heidelberger Architecture of the Component Collective
Messaging Interface . . . . . . . . . . 16--33
Alexey Lastovetsky and
Vladimir Rychkov and
Maureen O'Flynn Accurate Heterogeneous Communication
Models and a Software Tool for Their
Efficient Estimation . . . . . . . . . . 34--48
Pavan Balaji and
Darius Buntinas and
David Goodell and
William Gropp and
Rajeev Thakur Fine-Grained Multithreading Support for
Hybrid Threaded MPI Programming . . . . 49--57
Jesper Larsson Träff and
Andreas Ripke and
Christian Siebert and
Pavan Balaji and
Rajeev Thakur and
William Gropp A Pipelined Algorithm for Large,
Irregular All-Gather Problems . . . . . 58--68
Ron Brightwell Exploiting Direct Access Shared Memory
for MPI on Multi-Core Processors . . . . 69--77
Javier Garcia Blas and
Florin Isaila and
Jesus Carretero and
David Singh and
Felix Garcia-Carballeira Implementation and Evaluation of File
Write-Back and Prefetching for MPI-IO
Over GPFS . . . . . . . . . . . . . . . 78--92
Stephen F. Siegel and
Andrew R. Siegel Madre: the Memory-Aware Data
Redistribution Engine . . . . . . . . . 93--104
Hong Li and
Linda Petzold Efficient Parallelization of the
Stochastic Simulation Algorithm for
Chemically Reacting Systems on the
Graphics Processing Unit . . . . . . . . 107--116
Julianne Chung and
Philip Sternberg and
Chao Yang High-Performance Three-Dimensional Image
Reconstruction for Molecular Structure
Determination . . . . . . . . . . . . . 117--135
J. C. Pichel and
D. B. Heras and
J. C. Cabaleiro and
A. J. García-Loureiro and
F. F. Rivera Increasing the Locality of Iterative
Methods and Its Application to the
Simulation of Semiconductor Devices . . 136--153
Do Van Tuan and
Ui-Pil Chong Audio Watermarking Based on Advanced
Wigner Distribution and Important
Frequency Peaks . . . . . . . . . . . . 154--163
Florin Isaila and
Francisco Javier Garcia Blas and
Jesús Carretero and
Wei-keng Liao and
Alok Choudhary A Scalable Message Passing Interface
Implementation of an Ad-Hoc Parallel I/O
system . . . . . . . . . . . . . . . . . 164--184
Derek Groen and
Stefan Harfst and
Simon Portegies Zwart The Living Application: a
Self-Organizing System for Complex Grid
Tasks . . . . . . . . . . . . . . . . . 185--193
Mehmet Belgin and
Godmar Back and
Calvin J. Ribbens Operation Stacking for Ensemble
Computations With Variable Convergence 194--212
Patrick Downes and
Oisín Curran and
John Cunniffe and
Andy Shearer Distributed Radiotherapy Simulation with
the Webcom Workflow System . . . . . . . 213--227
Bruce Palmer and
Vidhya Gurumoorthi and
Alexandre Tartakovsky and
Tim Scheibe A Component-Based Framework for Smoothed
Particle Hydrodynamics Simulations of
Reactive Fluid Flow in Porous Media . . 228--239
Jose Ignacio Garzon and
Eduardo Huedo and
Ruben Santiago Montero and
Ignacio Martin Llorente and
Pablo Chacon End-To-End Cache System for Grid
Computing: Design and Efficiency
Analysis of a High-Throughput
Bioinformatic Docking Application . . . 243--264
M. E. Tryby and
B. Y. Mirghani and
G. K. Mahinthakumar and
S. R. Ranjithan A Solution Framework for Environmental
Characterization Problems . . . . . . . 265--283
Ewa Deelman Grids and Clouds: Making Workflow
Applications Work in Heterogeneous
Distributed Environments . . . . . . . . 284--298
Thomas Hauser and
Raymond LeBeau Optimization of a Computational Fluid
Dynamics Code for the Memory Hierarchy:
a Case Study . . . . . . . . . . . . . . 299--318
Toshiyuki Imamura and
Takuma Kano and
Susumu Yamada and
Masahiko Okumura and
Masahiko Machida High-Performance Quantum Simulation for
Coupled Josephson Junctions on the Earth
Simulator: a Challenge To the Schrödinger
Equation on $ 256^4 $ Grids . . . . . . 319--334
Marc Casas and
Rosa M. Badia and
Jesús Labarta Automatic Phase Detection and Structure
Extraction of MPI Applications . . . . . 335--360
Ninghui Sun and
David Kahaner and
Debbie Chen High-performance Computing in China:
Research and Applications . . . . . . . 363--409
Abhinav Bhatelé and
Lukasz Wesolowski and
Eric Bohm and
Edgar Solomonik and
Laxmikant V. Kalé Understanding Application Performance
via Micro-benchmarks on Three Large
Supercomputers: Intrepid, Ranger and
Jaguar . . . . . . . . . . . . . . . . . 411--427
Nicolas Gourdain and
Marc Montagnac and
Fabien Wlassow and
Michel Gazaix High-performance Computing to Simulate
Large-scale Industrial Flows in
Multistage Compressors . . . . . . . . . 429--443
Ke Liu and
Hai Jin and
Jinjun Chen and
Xiao Liu and
Dong Yuan and
Yun Yang A Compromised-Time-Cost Scheduling
Algorithm in SwinDeW-C for
Instance-Intensive Cost-Constrained
Workflows on a Cloud Computing Platform 445--456
Paula Cecilia Fritzsche and
Jose-Jesus Fernandez and
Dolores Rexachs and
Inmaculada Garcia and
Emilio Luque Analytical Performance Prediction for
Iterative Reconstruction Techniques in
Electron Tomography of Biological
Structures . . . . . . . . . . . . . . . 457--468
Nor Asilah Wati Abdul Hamid and
Paul Coddington Comparison of MPI Benchmark Programs on
Shared Memory and Distributed Memory
Machines (Point-to-Point Communication) 469--483
Hung-Hsun Su and
Max Billingsley and
Alan D. George Parallel Performance Wizard: a
Performance System for the Analysis of
Partitioned Global-Address-Space
Applications . . . . . . . . . . . . . . 485--510
Rajib Nath and
Stanimire Tomov and
Jack Dongarra An Improved MAGMA GEMM for Fermi
Graphics Processing Units . . . . . . . 511--515
Jack Dongarra and
Pete Beckman and
Terry Moore and
Patrick Aerts and
Giovanni Aloisio and
Jean-Claude Andre and
David Barkai and
Jean-Yves Berthou and
Taisuke Boku and
Bertrand Braunschweig and
Franck Cappello and
Barbara Chapman and
Xuebin Chi and
Alok Choudhary and
Sudip Dosanjh and
Thom Dunning and
Sandro Fiore and
Al Geist and
Bill Gropp and
Robert Harrison and
Mark Hereld and
Michael Heroux and
Adolfy Hoisie and
Koh Hotta and
Zhong Jin and
Yutaka Ishikawa and
Fred Johnson and
Sanjay Kale and
Richard Kenway and
David Keyes and
Bill Kramer and
Jesus Labarta and
Alain Lichnewsky and
Thomas Lippert and
Bob Lucas and
Barney Maccabe and
Satoshi Matsuoka and
Paul Messina and
Peter Michielse and
Bernd Mohr and
Matthias S. Mueller and
Wolfgang E. Nagel and
Hiroshi Nakashima and
Michael E. Papka and
Dan Reed and
Mitsuhisa Sato and
Ed Seidel and
John Shalf and
David Skinner and
Marc Snir and
Thomas Sterling and
Rick Stevens and
Fred Streitz and
Bob Sugar and
Shinji Sumimoto and
William Tang and
John Taylor and
Rajeev Thakur and
Anne Trefethen and
Mateo Valero and
Aad van der Steen and
Jeffrey Vetter and
Peg Williams and
Robert Wisniewski and
Kathy Yelick The International Exascale Software
Project roadmap . . . . . . . . . . . . 3--60
Kamran Karimi and
Neil Dickson and
Firas Hamze High-performance Physics Simulations
Using Multi-core CPUs and GPGPUs in a
Volunteer Computing Context . . . . . . 61--69
Silvio Migliori and
Giovanni Bracco and
Lorella Fatone and
Maria Cristina Recchioni and
Francesco Zirilli A Parallel Code for Time-Dependent
Acoustic Scattering Involving Passive or
Smart Obstacles . . . . . . . . . . . . 70--92
Rosa Filgueira and
David E. Singh and
Jesús Carretero and
Alejandro Calderón and
Félix García Adaptive-CoMPI: Enhancing MPI-Based
Applications' Performance and
Scalability by using Adaptive
Compression . . . . . . . . . . . . . . 93--114
D. Guo and
W. Gropp Optimizing Sparse Data Structures for
Matrix-vector Multiply . . . . . . . . . 115--131
Pavan Balaji and
Abhinav Vishnu Special Issue on Programming Models and
Systems Software Support for High-End
Computing Applications . . . . . . . . . 135--136
Pieter Bellens and
Josep M. Perez and
Rosa M. Badia and
Jesus Labarta Making the Best of Temporal Locality:
Just-in-Time Renaming and Lazy
Write-Back on the Cell/B.E. . . . . . . 137--147
Kazutomo Yoshii and
Kamil Iskra and
Harish Naik and
Pete Beckman and
P. Chris Broekema Performance and Scalability Evaluation
of `Big Memory' on Blue Gene Linux . . . 148--160
Patrick Widener and
Matthew Wolf and
Hasan Abbasi and
Scott McManus and
Mary Payne and
Matthew Barrick and
Jack Pulikottil and
Patrick Bridges and
Karsten Schwan Exploiting Latent I/O Asynchrony in
Petascale Science Applications . . . . . 161--179
Rinku Gupta and
Harish Naik and
Pete Beckman Understanding Checkpointing Overheads on
Massive-Scale Systems: Analysis of the
IBM Blue Gene/P System . . . . . . . . . 180--192
S. Murtaza and
A. G. Hoekstra and
P. M. A. Sloot Cellular Automata Simulations on a FPGA
cluster . . . . . . . . . . . . . . . . 193--204
Juan Gómez-Luna and
José María González-Linares and
José Ignacio Benavides and
Emilio L. Zapata and
Nicolás Guil Load Balancing versus Occupancy
Maximization on Graphics Processing
Units: the Generalized Hough Transform
as a Case Study . . . . . . . . . . . . 205--222
Abigail Hunter and
Faisal Saied and
Chinh Le and
Marisol Koslowski Large-Scale 3D Phase Field Dislocation
Dynamics Simulations on High-Performance
Architectures . . . . . . . . . . . . . 223--235
Matthias Korch and
Thomas Rauber Parallel Low-Storage Runge--Kutta
Solvers for ODE Systems with Limited
Access Distance . . . . . . . . . . . . 236--255
Jack Dongarra and
Bernard Tourancheau Selected papers of the Workshop on
Clusters, Clouds and Grids for
Scientific Computing (CCGSC) . . . . . . 259--260
Anne Benoit and
Paul Renaud-Goud and
Yves Robert Models and complexity results for
performance and energy optimization of
concurrent streaming applications . . . 261--273
Scott Callaghan and
Philip Maechling and
Patrick Small and
Kevin Milner and
Gideon Juve and
Thomas H. Jordan and
Ewa Deelman and
Gaurang Mehta and
Karan Vahi and
Dan Gunter and
Keith Beattie and
Christopher Brooks Metrics for heterogeneous scientific
workflows: a case study of an earthquake
science application . . . . . . . . . . 274--285
Ananta Tiwari and
Jeffrey K. Hollingsworth and
Chun Chen and
Mary Hall and
Chunhua Liao and
Daniel J. Quinlan and
Jacqueline Chame Auto-tuning full applications: a case
study . . . . . . . . . . . . . . . . . 286--294
Christian Obrecht and
Frédéric Kuznik and
Bernard Tourancheau and
Jean-Jacques Roux The Thelma Project: Multi-GPU
implementation of the lattice Boltzmann
method . . . . . . . . . . . . . . . . . 295--303
Deb Agarwal and
You-Wei Cheah and
Dan Fay and
Jonathan Fay and
Dean Guo and
Tony Hey and
Marty Humphrey and
Keith Jackson and
Jie Li and
Christophe Poulain and
Youngryel Ryu and
Catharine van Ingen Data-intensive science: the Terapixel
and MODISAzure projects . . . . . . . . 304--316
Philip M. Papadopoulos Extending clusters to Amazon EC2 using
the Rocks toolkit . . . . . . . . . . . 317--327
Manu Shantharam and
Anirban Chatterjee and
Padma Raghavan Exploiting dense substructures for fast
sparse matrix vector multiplication . . 328--341
Charles Lively and
Xingfu Wu and
Valerie Taylor and
Shirley Moore and
Hung-Ching Chang and
Kirk Cameron Energy and performance characteristics
of different parallel implementations of
scientific applications on multicore
systems . . . . . . . . . . . . . . . . 342--350
Pavan Balaji and
Abhinav Vishnu Special Issue on Programming Models,
Software and Tools for High-End
Computing . . . . . . . . . . . . . . . 353--354
Yong Chen and
Huaiyu Zhu and
Philip C. Roth and
Hui Jin and
Xian-He Sun Global-aware and multi-order
context-based prefetching for
high-performance processors . . . . . . 355--370
Gengbin Zheng and
Abhinav Bhatelé and
Esteban Meneses and
Laxmikant V. Kalé Periodic hierarchical load balancing for
large supercomputers . . . . . . . . . . 371--385
Barry Smith and
Hong Zhang Sparse triangular solves for ILU
revisited: data layout crucial to better
performance . . . . . . . . . . . . . . 386--391
Alexander E. MacDonald and
Jacques Middlecoff and
Tom Henderson and
Jin-Luen Lee A general method for modeling on
irregular grids . . . . . . . . . . . . 392--403
Francisco D. Igual and
Rafael Mayo and
Timothy Hartley and
Ümit V. Çatalyürek and
Antonio Ruiz and
Manuel Ujaldon Color and texture analysis using
emerging parallel architectures . . . . 404--427
Heike Jagode and
Andreas Knüpfer and
Jack Dongarra and
Matthias Jurenz and
Matthias S. Müller and
Wolfgang E. Nagel Trace-based performance analysis for the
petascale simulation code Flash . . . . 428--439
T. P. Collignon and
M. B. van Gijzen Fast iterative solution of large sparse
linear systems on geographically
separated clusters . . . . . . . . . . . 440--450
David L. Hart Measuring TeraGrid: workload
characterization for a high-performance
computing federation . . . . . . . . . . 451--465
Shanti Bhushan and
Pablo Carrica and
Jianming Yang and
Frederick Stern Scalability studies and large grid
computations for surface combatant using
CFDShip-Iowa . . . . . . . . . . . . . . 466--487
M. Chau and
R. Couturier and
J. Bahi and
P. Spiteri Parallel solution of the obstacle
problem in Grid environments . . . . . . 488--495
Aydin Buluç and
John R. Gilbert The Combinatorial BLAS: design,
implementation, and applications . . . . 496--509
Anjuli Bamzai Preface . . . . . . . . . . . . . . . . 3--4
John M. Dennis and
Mariana Vertenstein and
Patrick H. Worley and
Arthur A. Mirin and
Anthony P. Craig and
Robert Jacob and
Sheri Mickelson Computational performance of
ultra-high-resolution capability in the
Community Earth System Model . . . . . . 5--16
Arthur A. Mirin and
Patrick H. Worley Improving the performance scalability of
the Community Atmosphere Model . . . . . 17--30
Anthony P. Craig and
Mariana Vertenstein and
Robert Jacob A new flexible coupler for Earth system
modeling developed for CCSM4 and CESM1 31--42
John M. Dennis and
Jim Edwards and
Ray Loy and
Robert Jacob and
Arthur A. Mirin and
Anthony P. Craig and
Mariana Vertenstein An application-level parallel I/O
library for Earth system models . . . . 43--53
Katherine J. Evans and
Andrew G. Salinger and
Patrick H. Worley and
Stephen F. Price and
William H. Lipscomb and
Jeffrey A. Nichols and
James B. White III and
Mauro Perego and
Mariana Vertenstein and
James Edwards and
Jean-François Lemieux A modern solver interface to manage
solution algorithms in the Community
Earth System Model . . . . . . . . . . . 54--62
Peter H. Lauritzen and
Arthur A. Mirin and
John Truesdale and
Kevin Raeder and
Jeffrey L. Anderson and
Julio Bacmeister and
Richard B. Neale Implementation of new
diffusion/filtering operators in the
CAM--FV dynamical core . . . . . . . . . 63--73
John M. Dennis and
Jim Edwards and
Katherine J. Evans and
Oksana Guba and
Peter H. Lauritzen and
Arthur A. Mirin and
Amik St-Cyr and
Mark A. Taylor and
Patrick H. Worley CAM--SE: a scalable spectral element
dynamical core for the Community
Atmosphere Model . . . . . . . . . . . . 74--89
Torsten Hoefler and
Kamil Iskra Operating systems and runtime
environments on supercomputers . . . . . 93--94
Jan Stoess and
Udo Steinberg and
Volkmar Uhlig and
Jens Kehne and
Jonathan Appavoo and
Amos Waterland A lightweight virtual machine monitor
for Blue Gene/P . . . . . . . . . . . . 95--109
Stephen L. Olivier and
Allan K. Porterfield and
Kyle B. Wheeler and
Michael Spiegel and
Jan F. Prins OpenMP task scheduling strategies for
multicore NUMA systems . . . . . . . . . 110--124
Patrick G. Bridges and
Dorian Arnold and
Kevin T. Pedretti and
Madhav Suresh and
Feng Lu and
Peter Dinda and
Russ Joseph and
Jack Lange Virtual-machine-based emulation of
future generation high-performance
computing systems . . . . . . . . . . . 125--135
Terry Jones Linux kernel co-scheduling and bulk
synchronous parallelism . . . . . . . . 136--145
Pavan Balaji and
Jiayuan Meng Applications for the Heterogeneous
Computing Era . . . . . . . . . . . . . 146--147
Yan Li and
Jeffrey R. Diamond and
Xu Wang and
Haibo Lin and
Yudong Yang and
Zhenxing Han Large-scale Fast Fourier Transform on a
heterogeneous multi-core system . . . . 148--158
Sean Whalen and
Sophie Engle and
Sean Peisert and
Matt Bishop Network-theoretic classification of
parallel computation patterns . . . . . 159--169
Haicheng Wu and
Gregory Diamos and
Jin Wang and
Si Li and
Sudhakar Yalamanchili Characterization and transformation of
unstructured control flow in bulk
synchronous GPU applications . . . . . . 170--185
Beniamino Di Martino and
Eduard Mehofer and
Dan Quinlan and
Markus Schordan Graphical processing units and
scientific applications . . . . . . . . 189--191
Lancelot Perrotte and
Guillaume Saupin Fast GPU perspective grid construction
and triangle tracing for exhaustive ray
tracing of highly coherent rays . . . . 192--202
Aria Shahingohar and
Roy Eagleson A framework for GPU accelerated
deformable object modeling . . . . . . . 203--214
Andreas Monitzer Combining lattice Boltzmann and discrete
element methods on a graphics processor 215--226
Marc-André Hermanns and
Markus Geimer and
Bernd Mohr and
Felix Wolf Scalable detection of MPI-2 remote
memory access inefficiency patterns . . 227--236
Francisco D. Igual and
Rafael Mayo and
Timothy D. R. Hartley and
Ümit V. Çatalyürek and
Antonio Ruiz and
Manuel Ujaldon Retracted: Color and texture analysis on
emerging parallel architectures . . . . 237--259
Thomas G. W. Epperly and
Gary Kumfert and
Tamara Dahlgren and
Dietmar Ebner and
Jim Leek and
Adrian Prantl and
Scott Kohn High-performance language
interoperability for scientific
computing through Babel . . . . . . . . 260--274
Maciej Malawski and
Tomasz Gubala and
Marian Bubak Component-based approach for programming
and running scientific applications on
grids and clouds . . . . . . . . . . . . 275--295
Florian Ries and
Tommaso De Marco and
Roberto Guerrieri Tuning solution of large non-Hermitian
linear systems on multiple graphics
processing unit accelerated workstations 296--309
Keiichiro Fukazawa and
Takayuki Umeda Performance measurement of
magnetohydrodynamic code for space
plasma on typical scalar-type
supercomputer systems with a large
number of cores . . . . . . . . . . . . 310--318
Chirag Dekate and
Matthew Anderson and
Maciej Brodowicz and
Hartmut Kaiser and
Bryce Adelstein-Lelbach and
Thomas Sterling Improving the scalability of parallel
$N$-body applications with an
event-driven constraint-based execution
model . . . . . . . . . . . . . . . . . 319--332
Horst Simon and
Jack Dongarra and
Hemant Shukla Introduction to the Special Issue . . . 335--336
Rio Yokota and
Lorena A. Barba A tuned and scalable fast multipole
method as a preeminent algorithm for
exascale systems . . . . . . . . . . . . 337--346
Richard L. Martin and
Prabhat and
David D. Donofrio and
James A. Sethian and
Maciej Haranczyk Accelerating analysis of void space in
porous materials on multicore and GPU
platforms . . . . . . . . . . . . . . . 347--357
Melvyn Wright Adaptive Real-Time Imaging Synthesis
Telescopes . . . . . . . . . . . . . . . 358--366
Hsi-Yu Schive and
Ui-Han Zhang and
Tzihong Chiueh Directionally unsplit hydrodynamic
schemes with hybrid MPI/OpenMP/GPU
parallelization in AMR . . . . . . . . . 367--377
Michael Commer and
Filipe R. N. C. Maia and
Gregory A. Newman Iterative Krylov solution methods for
geophysical electromagnetic simulations
on throughput-oriented processing units 378--385
Bálint Joó and
Mike A. Clark Lattice QCD on GPU clusters, using the
QUDA library and the Chroma software
system . . . . . . . . . . . . . . . . . 386--398
E. Wes Bethel and
Mark Howison Multi-core and many-core shared-memory
parallel raycasting volume rendering
optimization and tuning . . . . . . . . 399--412
Mahantesh Halappanavar and
John Feo and
Oreste Villa and
Antonino Tumeo and
Alex Pothen Approximate weighted matching on
emerging manycore and multithreaded
architectures . . . . . . . . . . . . . 413--430
David E. Keyes and
Lois C. McInnes and
Carol Woodward and
William Gropp and
Eric Myra and
Michael Pernice and
John Bell and
Jed Brown and
Alain Clo and
Jeffrey Connors and
Emil Constantinescu and
Don Estep and
Kate Evans and
Charbel Farhat and
Ammar Hakim and
Glenn Hammond and
Glen Hansen and
Judith Hill and
Tobin Isaac and
Xiangmin Jiao and
Kirk Jordan and
Dinesh Kaushik and
Efthimios Kaxiras and
Alice Koniges and
Kihwan Lee and
Aaron Lott and
Qiming Lu and
John Magerlein and
Reed Maxwell and
Michael McCourt and
Miriam Mehl and
Roger Pawlowski and
Amanda P. Randles and
Daniel Reynolds and
Beatrice Rivière and
Ulrich Rüde and
Tim Scheibe and
John Shadid and
Brendan Sheehan and
Mark Shephard and
Andrew Siegel and
Barry Smith and
Xianzhu Tang and
Cian Wilson and
Barbara Wohlmuth Multiphysics simulations: Challenges and
opportunities . . . . . . . . . . . . . 4--83
Pavan Balaji and
Satoshi Matsuoka Guest Editors' Introduction: Special
Issue on Applications for the
Heterogeneous Computing Era . . . . . . 87--88
Mitesh R. Meswani and
Laura Carrington and
Didem Unat and
Allan Snavely and
Scott Baden and
Stephen Poole Modeling and predicting performance of
high performance computing applications
on hardware accelerators . . . . . . . . 89--108
Huming Zhu and
Yu Cao and
Zhiqiang Zhou and
Maoguo Gong and
Licheng Jiao Parallel unsupervised Synthetic Aperture
Radar image change detection on a
graphics processing unit . . . . . . . . 109--122
Torsten Hoefler and
Kamil Iskra Operating systems and runtime
environments on supercomputers . . . . . 123--123
Brian Kocoloski and
John Lange Improving compute node performance using
virtualization . . . . . . . . . . . . . 124--135
Hakan Akkan and
Michael Lang and
Lorie Liebrock Understanding and isolating the noise in
the Linux kernel . . . . . . . . . . . . 136--146
Abhishek Kulkarni and
Latchesar Ionkov and
Michael Lang and
Andrew Lumsdaine Optimizing process creation and
execution on multi-core architectures 147--161
Jan Treibig and
Georg Hager and
Hannes G. Hofmann and
Joachim Hornegger and
Gerhard Wellein Pushing the limits for medical image
reconstruction on recent standard
multicore processors . . . . . . . . . . 162--177
M. A. Clark and
P. C. La Plante and
L. J. Greenhill Accelerating radio astronomy
cross-correlation with graphics
processing units . . . . . . . . . . . . 178--192
Tareq Malas and
Aron J. Ahmadia and
Jed Brown and
John A. Gunnels and
David E. Keyes Optimizing the performance of streaming
numerical kernels on the IBM Blue Gene/P
PowerPC 450 processor . . . . . . . . . 193--209
K. G. Felker and
A. R. Siegel and
S. F. Siegel Optimizing Memory Constrained
Environments in Monte Carlo Nuclear
Reactor Simulations . . . . . . . . . . 210--216
Kiran Narayanan and
Angel Mora and
Nicholas Allsopp and
Tamer El Sayed A hybrid, massively parallel
implementation of a genetic algorithm
for optimization of the impact
performance of a metal/polymer composite
plate . . . . . . . . . . . . . . . . . 217--227
Jack Dongarra and
Bernard Tourancheau Introduction for August Special Issue
CCDSC . . . . . . . . . . . . . . . . . 231--231
Mohammed E. M. Diouri and
Ghislain L. Tsafack Chetsa and
Olivier Glück and
Laurent Lefèvre and
Jean-Marc Pierson and
Patricia Stolf and
Georges Da Costa Energy efficiency in high-performance
computing with and without knowledge of
applications and services . . . . . . . 232--243
Wesley Bland and
Aurelien Bouteiller and
Thomas Herault and
George Bosilca and
Jack Dongarra Post-failure recovery of MPI
communication capability: Design and
rationale . . . . . . . . . . . . . . . 244--254
Kyle Spafford and
Jeffrey S. Vetter and
Thomas Benson and
Mike Parker Modeling synthetic aperture radar
computation with Aspen . . . . . . . . . 255--262
Joel H. Saltz and
George Teodoro and
Tony Pan and
Lee A. D. Cooper and
Jun Kong and
Scott Klasky and
Tahsin M. Kurc Feature-based analysis of large-scale
spatio-temporal sensor data on hybrid
architectures . . . . . . . . . . . . . 263--272
Ana Gainaru and
Franck Cappello and
Marc Snir and
William Kramer Failure prediction for HPC systems and
applications: Current situation and open
issues . . . . . . . . . . . . . . . . . 273--282
Emmanuel Jeannot Symbolic mapping and allocation for the
Cholesky factorization on NUMA machines:
Results and optimizations . . . . . . . 283--290
Vicente Peruffo Minotto and
Claudio Rosito Jung and
Luiz Gonzaga da Silveira, Jr. and
Bowon Lee GPU-based approaches for real-time sound
source localization using the SRP--PHAT
algorithm . . . . . . . . . . . . . . . 291--306
Chaofeng Hou and
Ji Xu and
Peng Wang and
Wenlai Huang and
Xiaowei Wang and
Wei Ge and
Xianfeng He and
Li Guo and
Jinghai Li Petascale molecular dynamics simulation
of crystalline silicon on Tianhe-1A . . 307--317
José R. Sanjurjo and
Margarita Amor and
Montserrat Bóo and
Ramón Doallo Parallel Monte Carlo radiosity using
scene partitioning . . . . . . . . . . . 318--334
I. Carpenter and
R. K. Archibald and
K. J. Evans and
J. Larkin and
P. Micikevicius and
M. Norman and
J. Rosinski and
J. Schwarzmeier and
M. A. Taylor Progress towards accelerating HOMME on
hybrid multi-core systems . . . . . . . 335--347
Ekaterini Solomou and
Spiros Kostopoulos and
Konstantinos Sidiropoulos and
Emmanouil Athanasiadis and
Eleftherios Lavdas and
Dimitris Glotsos and
George Sakellaropoulos and
Petros Zampakis and
John Stonham and
Dionisis Cavouras Designing a pattern recognition system
on GPU for discriminating between
patients with micro-ischaemic and
multiple sclerosis lesions, using MRI
images . . . . . . . . . . . . . . . . . 348--359
Anshu Dubey and
Alan C. Calder and
Christopher Daley and
Robert T. Fisher and
C. Graziani and
George C. Jordan and
Donald Q. Lamb and
Lynn B. Reid and
Dean M. Townsley and
Klaus Weide Pragmatic optimizations for better
scientific utilization of large
supercomputers . . . . . . . . . . . . . 360--373
Leonid Oliker and
Richard Vuduc Introduction for Special Issue on
Autotuning . . . . . . . . . . . . . . . 377--378
Protonu Basu and
Mary Hall and
Malik Khan and
Suchit Maindola and
Saurav Muralidharan and
Shreyas Ramalingam and
Axel Rivera and
Manu Shantharam and
Anand Venkat Towards making autotuning mainstream . . 379--393
Ray S. Chen and
Jeffrey K. Hollingsworth Towards fully automatic auto-tuning:
Leveraging language features of Chapel 394--402
Nicholas Chaimov and
Scott Biersdorff and
Allen D. Malony Tools for machine-learning-based
empirical autotuning and specialization 403--411
Sanket Tavarageri and
J. Ramanujam and
P. Sadayappan Adaptive parallel tiled code generation
and accelerated auto-tuning . . . . . . 412--425
Diego Fabregat-Traver and
Paolo Bientinesi Application-tailored linear algebra
algorithms: a search-based approach . . 426--439
Bryan Marker and
Don Batory and
Robert van de Geijn A case study in mechanically deriving
dense linear algebra code . . . . . . . 440--453
Khaled Z. Ibrahim and
Kamesh Madduri and
Samuel Williams and
Bei Wang and
Stephane Ethier and
Leonid Oliker Analysis and optimization of gyrokinetic
toroidal simulations on homogeneous and
heterogeneous platforms . . . . . . . . 454--473
Thanadech Thanakornworakij and
Raja Nassar and
Chokchai Box Leangsuksun and
Mihaela Paun Reliability model of a system of $k$
nodes with simultaneous failures for
high-performance computing applications 474--482
Raúl Valín and
Carlos Sampedro and
Natalia Seoane and
Manuel Aldegunde and
Antonio Garcia-Loureiro and
Andres Godoy and
Francisco Gámiz Optimisation and parallelisation of a
$2$D MOSFET multi-subband ensemble Monte
Carlo simulator . . . . . . . . . . . . 483--492
Jacobo Lobeiras and
Moisés Viñas and
Margarita Amor and
Basilio B. Fraguela and
Manuel Arenaz and
J. A. García and
M. J. Castro Parallelization of shallow water
simulations on current multi-threaded
systems . . . . . . . . . . . . . . . . 493--512
Dahai Guo and
William Gropp Applications of the streamed storage
format for sparse matrix operations . . 3--12
Miguel O. Bernabeu and
James Southern and
Nicholas Wilson and
Peter Strazdins and
Jonathan Cooper and
Joe Pitt-Francis Chaste: a case study of parallelisation
of an open source finite-element solver
with applications to computational
cardiac electrophysiology simulation . . 13--32
Guillermo Vigueras and
Juan M. Orduña and
Miguel Lozano and
José M. Cecilia and
José M. García Accelerating collision detection for
large-scale crowd simulation on
multi-core and many-core architectures 33--49
Liang Zheng and
Huai Zhang and
Taras Gerya and
Matthew Knepley and
David A. Yuen and
Yaolin Shi Implementation of a multigrid solver on
a GPU for Stokes equations with strongly
variable viscosity based on Matlab and
CUDA . . . . . . . . . . . . . . . . . . 50--60
Alexander Vondrous and
Michael Selzer and
Johannes Hötzer and
Britta Nestler Parallel computing for phase-field
models . . . . . . . . . . . . . . . . . 61--72
Yasuhiro Idomura and
Motoki Nakata and
Susumu Yamada and
Masahiko Machida and
Toshiyuki Imamura and
Tomohiko Watanabe and
Masanori Nunami and
Hikaru Inoue and
Shigenobu Tsutsumi and
Ikuo Miyoshi and
Naoyuki Shida Communication-overlap techniques for
improved strong scaling of gyrokinetic
Eulerian code beyond 100k cores on the
K-computer . . . . . . . . . . . . . . . 73--86
Andrew R. Siegel and
Kord Smith and
Paul K. Romano and
Benoit Forget and
Kyle G. Felker Multi-core performance studies of a
Monte Carlo neutron transport code . . . 87--96
Iain Bethune and
J. Mark Bull and
Nicholas J. Dingle and
Nicholas J. Higham Performance analysis of asynchronous
Jacobi's method implemented in MPI,
SHMEM and OpenMP . . . . . . . . . . . . 97--111
Diego Darriba and
Guillermo L. Taboada and
Ramón Doallo and
David Posada High-performance computing selection of
models of DNA substitution for multicore
clusters . . . . . . . . . . . . . . . . 112--125
Marc Snir and
Robert W. Wisniewski and
Jacob A. Abraham and
Sarita V. Adve and
Saurabh Bagchi and
Pavan Balaji and
Jim Belak and
Pradip Bose and
Franck Cappello and
Bill Carlson and
Andrew A. Chien and
Paul Coteus and
Nathan A. DeBardeleben and
Pedro C. Diniz and
Christian Engelmann and
Mattan Erez and
Saverio Fazzari and
Al Geist and
Rinku Gupta and
Fred Johnson and
Sriram Krishnamoorthy and
Sven Leyffer and
Dean Liberty and
Subhasish Mitra and
Todd Munson and
Rob Schreiber and
Jon Stearley and
Eric Van Hensbergen Addressing failures in exascale
computing . . . . . . . . . . . . . . . 129--173
Shijin Yuan and
Shicheng Wen and
Hongyu Li and
Xinfeng Zhang and
Qin Liu An optimization framework for
adjoint-based climate simulations: a
case study of the Zebiak--Cane model . . 174--182
Wangdong Yang and
Kenli Li and
Yan Liu and
Lin Shi and
Lanjun Wan Optimization of quasi-diagonal
matrix-vector multiplication on GPU . . 183--195
Azzam Haidar and
Stanimire Tomov and
Jack Dongarra and
Raffaele Solcà and
Thomas Schulthess A novel hybrid CPU-GPU generalized
eigensolver for electronic structure
calculations based on fine-grained
memory aware tasks . . . . . . . . . . . 196--209
Marin Bougeret and
Henri Casanova and
Yves Robert and
Frédéric Vivien and
Dounia Zaidouni Using group replication for resilience
on exascale systems . . . . . . . . . . 210--224
Anshu Dubey and
Katie Antypas and
Alan C. Calder and
Chris Daley and
Bruce Fryxell and
J. Brad Gallagher and
Donald Q. Lamb and
Dongwook Lee and
Kevin Olson and
Lynn B. Reid and
Paul Rich and
Paul M. Ricker and
Katherine M. Riley and
Robert Rosner and
Andrew Siegel and
Noel T. Taylor and
Klaus Weide and
Francis X. Timmes and
Natasha Vladimirova and
John ZuHone Evolution of FLASH, a multi-physics
scientific simulation code for
high-performance computing . . . . . . . 225--237
Shuai Che and
Kevin Skadron BenchFriend: Correlating the performance
of GPU benchmarks . . . . . . . . . . . 238--250
Jiayuan Meng and
Toshio Endo Special Issue on Applications for the
Heterogeneous Computing Era . . . . . . 253--254
Tao Gao and
Yutong Lu and
Baida Zhang and
Guang Suo Using the Intel Many Integrated Core to
accelerate graph traversal . . . . . . . 255--266
Dip Sankar Banerjee and
Parikshit Sakurikar and
Kishore Kothapalli Comparison sorting on hybrid multicore
architectures for fixed and variable
length keys . . . . . . . . . . . . . . 267--284
Andra Hugo and
Abdou Guermouche and
Pierre-André Wacrenier and
Raymond Namyst Composing multiple StarPU applications
over heterogeneous machines: a
supervised approach . . . . . . . . . . 285--300
Yang You and
Haohuan Fu and
Shuaiwen Leon Song and
Maryam Mehri Dehnavi and
Lin Gan and
Xiaomeng Huang and
Guangwen Yang Evaluating multi-core and many-core
architectures through accelerating the
three-dimensional Lax--Wendroff
correction stencil . . . . . . . . . . . 301--318
Yash Ukidave and
Amir Kavyan Ziabari and
Perhaad Mistry and
Gunar Schirner and
David Kaeli Analyzing power efficiency of
optimization techniques and algorithm
design methods for applications on
heterogeneous platforms . . . . . . . . 319--334
Yukihiro Hasegawa and
Jun-Ichi Iwata and
Miwako Tsuji and
Daisuke Takahashi and
Atsushi Oshiyama and
Kazuo Minami and
Taisuke Boku and
Hikaru Inoue and
Yoshito Kitazawa and
Ikuo Miyoshi and
Mitsuo Yokokawa Performance evaluation of
ultra-large-scale first-principles
electronic structure calculation code on
the K computer . . . . . . . . . . . . . 335--355
Sandra Wienke and
Marcel Spekowius and
Alesja Dammer and
Dieter an Mey and
Christian Hopmann and
Matthias S. Müller Towards an accurate simulation of the
crystallisation process in injection
moulded plastic components by hybrid
parallelisation . . . . . . . . . . . . 356--367
M. Luisa Córdoba and
Antonio García Dopico and
M. Isabel García and
Francisco Rosales and
Jesús Arnaiz and
Rodolfo Bermejo and
Pedro Galán del Sastre Efficient parallelization of a regional
ocean model for the western
Mediterranean Sea . . . . . . . . . . . 368--383
Javier Garcia Blas and
Jesus Carretero Recent advances in the Message Passing
Interface . . . . . . . . . . . . . . . 387--389
James Dinan and
Ryan E. Grant and
Pavan Balaji and
David Goodell and
Douglas Miller and
Marc Snir and
Rajeev Thakur Enabling communication concurrency
through flexible MPI endpoints . . . . . 390--405
Christi Symeonidou and
Polyvios Pratikakis and
Dimitrios S. Nikolopoulos and
Angelos Bilas Distributed region-based memory
allocation and synchronization . . . . . 406--414
Brian W. Barrett and
Ron Brightwell and
Ryan Grant and
Simon D. Hammond and
K. Scott Hemmert An evaluation of MPI message rate on
hybrid-core processors . . . . . . . . . 415--424
Emmanuelle Saillard and
Patrick Carribault and
Denis Barthou PARCOACH: Combining static and dynamic
validation of MPI collective
communications . . . . . . . . . . . . . 425--434
Judicael A. Zounmevo and
Dries Kimpe and
Robert Ross and
Ahmad Afsahi Extreme-scale computing services over
MPI: Experiences, observations and
features proposal for next-generation
message passing interface . . . . . . . 435--449
Sameer Kumar and
Amith Mamidala and
Philip Heidelberger and
Dong Chen and
Daniel Faraj Optimization of MPI collective
operations on the IBM Blue Gene/Q
supercomputer . . . . . . . . . . . . . 450--464
Kamil Iskra and
Torsten Hoefler Operating systems and runtime
environments on supercomputers . . . . . 3--4
Scott Levy and
Kurt B. Ferreira and
Patrick G. Bridges and
Aidan P. Thompson and
Christian Trott A study of the viability of exploiting
memory content similarity to improve
resilience to memory errors . . . . . . 5--20
Yin Lu and
Yong Chen and
Yu Zhuang and
Jialin Liu and
Rajeev Thakur Collective input/output under memory
constraints . . . . . . . . . . . . . . 21--36
Erik Vermij and
Leandro Fiorin and
Rik Jongerius and
Christoph Hagleitner and
Koen Bertels Challenges in exascale radio astronomy:
Can the SKA ride the technology wave? 37--50
Jun Chai and
Johan Hake and
Nan Wu and
Mei Wen and
Xing Cai and
Glenn T. Lines and
Jing Yang and
Huayou Su and
Chunyuan Zhang and
Xiangke Liao Towards simulation of subcellular
calcium dynamics at nanometre resolution 51--63
Manish Bajpai and
Phalguni Gupta and
Prabhat Munshi Fast multi-processor multi-GPU based
algorithm of tomographic inversion for
$3$D image reconstruction . . . . . . . 64--72
Henri Casanova and
Fanny Dufossé and
Yves Robert and
Frédéric Vivien Mapping applications on volatile
resources . . . . . . . . . . . . . . . 73--91
Mark Gates and
Michael T. Heath and
John Lambros High-performance hybrid CPU and GPU
parallel algorithm for digital volume
correlation . . . . . . . . . . . . . . 92--106
Lynn Wood and
Jeff Daily and
Michael Henry and
Bruce Palmer and
Karen Schuchardt and
Donald Dazlich and
Ross Heikes and
David Randall A global climate model agent for high
spatial and temporal resolution data . . 107--116
Simon McIntosh-Smith and
James Price and
Richard B. Sessions and
Amaurys A. Ibarra High performance in silico virtual drug
screening on many-core processors . . . 119--134
Hari K. Raghavan and
Sathish S. Vadhiyar Adaptive executions of hyperbolic
block-structured AMR applications on GPU
systems . . . . . . . . . . . . . . . . 135--153
Anthony P. Craig and
Sheri A. Mickelson and
Elizabeth C. Hunke and
David A. Bailey Improved parallel performance of the
CICE model in CESM1 . . . . . . . . . . 154--165
Hee Won Lee and
Mihail L. Sichitiu and
David Thuente High-performance emulation of
heterogeneous systems using adaptive
time dilation . . . . . . . . . . . . . 166--183
F. V. Grigoriev and
A. V. Sulimov and
Igor Kochikov and
O. A. Kondakova and
V. B. Sulimov and
A. V. Tikhonravov High-performance atomistic modeling of
optical thin films deposited by
energetic processes . . . . . . . . . . 184--192
Azzam Haidar and
Tingxing Dong and
Piotr Luszczek and
Stanimire Tomov and
Jack Dongarra Batched matrix computations on hardware
accelerators based on GPUs . . . . . . . 193--208
Didem Unat and
Cy Chan and
Weiqun Zhang and
Samuel Williams and
John Bachan and
John Bell and
John Shalf ExaSAT: an exascale co-design tool for
performance modeling . . . . . . . . . . 209--232
Tomas Ekeberg and
Stefan Engblom and
Jing Liu Machine learning for ultrafast X-ray
diffraction patterns on large-scale GPU
clusters . . . . . . . . . . . . . . . . 233--243
Frédéric Magoulès and
Mark Parsons and
Lorna Smith Innovative Algorithms for Extreme Scale
Computing . . . . . . . . . . . . . . . 247--248
Vincent Reverdy and
Jean-Michel Alimi and
Vincent Bouillot and
Yann Rasera and
Pier-Stefano Corasaniti and
Irène Balmès and
Stéphane Requena and
Xavier Delaruelle and
Jean-Noel Richet DEUS full observable universe
simulations: Numerical challenge and
outlooks . . . . . . . . . . . . . . . . 249--260
George Mozdzynski and
Mats Hamrud and
Nils Wedi A Partitioned Global Address Space
implementation of the European Centre
for Medium Range Weather Forecasts
Integrated Forecasting System . . . . . 261--273
Alan Gray and
Alistair Hart and
Oliver Henrich and
Kevin Stratford Scaling soft matter physics to thousands
of graphics processing units in parallel 274--283
Frédéric Magoulès and
Abal-Kassim Cheik Ahamed Alinea: an Advanced Linear Algebra
Library for Massively Parallel
Computations on Graphics Processing
Units . . . . . . . . . . . . . . . . . 284--310
Stefano Markidis and
Jing Gong and
Michael Schliephake and
Erwin Laure and
Alistair Hart and
David Henty and
Katherine Heisey and
Paul Fischer OpenACC acceleration of the Nek5000
spectral element code . . . . . . . . . 311--319
Michael T. Heath A tale of two laws . . . . . . . . . . . 320--330
Unai Lopez-Novoa and
Jon Sáenz and
Alexander Mendiburu and
Jose Miguel-Alonso An efficient implementation of kernel
density estimation for multi-core and
many-core architectures . . . . . . . . 331--347
Nan Dun and
Hajime Fujita and
John R. Tramm and
Andrew A. Chien and
Andrew R. Siegel Data decomposition in Monte Carlo
neutron transport simulations using
global view arrays . . . . . . . . . . . 348--365
Hartwig Anzt and
Stanimire Tomov and
Piotr Luszczek and
William Sawyer and
Jack Dongarra Acceleration of GPU-based Krylov solvers
via data transfer reduction . . . . . . 366--383
Dewan Ibtesham and
Kurt B. Ferreira and
Dorian Arnold A checkpoint compression study for
high-performance computing systems . . . 387--402
Austin R. Benson and
Sven Schmit and
Robert Schreiber Silent error detection in numerical
time-stepping schemes . . . . . . . . . 403--421
Erlin Yao and
Jiutian Zhang and
Mingyu Chen and
Guangming Tan and
Ninghui Sun Detection of soft errors in $LU$
decomposition with partial pivoting
using algorithm-based fault tolerance 422--436
Steven McDonagh and
Cigdem Beyan and
Phoenix X. Huang and
Robert B. Fisher Applying semi-synchronised task farming
to large-scale computer vision problems 437--460
Marco Aldinucci and
Guilherme Peretti Pezzi and
Maurizio Drocco and
Concetto Spampinato and
Massimo Torquati Parallel visual data restoration on
multi-GPGPUs using stencil-reduce
pattern . . . . . . . . . . . . . . . . 461--472
Massimo Minervini and
Cristian Rusu and
Mario Damiano and
Valter Tucci and
Angelo Bifone and
Alessandro Gozzi and
Sotirios A. Tsaftaris Large-scale analysis of neuroimaging
data on commercial clouds with
content-aware resource allocation
strategies . . . . . . . . . . . . . . . 473--488
Tim Besard and
Bjorn De Sutter and
Andrés Frías-Velázquez and
Wilfried Philips Case study of multiple trace transform
implementations . . . . . . . . . . . . 489--505
Jorge González-Domínguez and
Jan Christian Kässens and
Lars Wienbrandt and
Bertil Schmidt Large-scale genome-wide association
studies on a GPU cluster using a
CUDA-accelerated PGAS programming model 506--510
Jack Dongarra and
Michael A. Heroux and
Piotr Luszczek High-performance conjugate-gradient
benchmark: a new metric for ranking
high-performance computing systems . . . 3--10
Jongsoo Park and
Mikhail Smelyanskiy and
Karthikeyan Vaidyanathan and
Alexander Heinecke and
Dhiraj D. Kalamkar and
Md. Mostofa Ali Patwary and
Vadim Pirogov and
Pradeep Dubey and
Xing Liu and
Carlos Rosales and
Cyril Mazauric and
Christopher Daley Optimizations in a high-performance
conjugate gradient benchmark for
IA-based multi- and many-core processors 11--27
Everett Phillips and
Massimiliano Fatica Performance analysis of the
high-performance conjugate gradient
benchmark on GPUs . . . . . . . . . . . 28--38
Yiqun Liu and
Chao Yang and
Fangfang Liu and
Xianyi Zhang and
Yutong Lu and
Yunfei Du and
Canqun Yang and
Min Xie and
Xiangke Liao 623 Tflop/s HPCG run on Tianhe-2:
Leveraging millions of hybrid cores . . 39--54
Kiyoshi Kumahata and
Kazuo Minami and
Naoya Maruyama High-performance conjugate gradient
performance improvement on the K
computer . . . . . . . . . . . . . . . . 55--70
Toshitaka Baba and
Kazuto Ando and
Daisuke Matsuoka and
Mamoru Hyodo and
Takane Hori and
Narumi Takahashi and
Ryoko Obayashi and
Yoshiyuki Imato and
Dai Kitamura and
Hitoshi Uehara and
Toshihiro Kato and
Ryotaro Saka Large-scale, high-speed tsunami
prediction for the Great Nankai Trough
Earthquake on the K computer . . . . . . 71--84
Edmond Chow and
Xing Liu and
Sanchit Misra and
Marat Dukhan and
Mikhail Smelyanskiy and
Jeff R. Hammond and
Yunfei Du and
Xiang-Ke Liao and
Pradeep Dubey Scaling up Hartree--Fock calculations on
Tianhe-2 . . . . . . . . . . . . . . . . 85--102
Dahai Guo and
William Gropp and
Luke N. Olson A hybrid format for better performance
of sparse matrix-vector multiplication
on a GPU . . . . . . . . . . . . . . . . 103--120
Patrick M. Widener and
Scott Levy and
Kurt B. Ferreira and
Torsten Hoefler On noise and the performance benefit of
nonblocking collectives . . . . . . . . 121--133
Jiri Jaros and
Alistair P. Rendell and
Bradley E. Treeby Full-wave nonlinear ultrasound
simulation on distributed clusters with
applications in high-intensity focused
ultrasound . . . . . . . . . . . . . . . 137--155
Xinqiang Miao and
Xianlong Jin and
Junhong Ding Improving the parallel efficiency of
large-scale structural dynamic analysis
using a hierarchical approach . . . . . 156--168
Bozhong Liu and
Weidong Qiu and
Lin Jiang and
Zheng Gong Software pipelining for graphic
processing unit acceleration: Partition,
scheduling and granularity . . . . . . . 169--185
Rone Kwei Lim and
J. William Pro and
Matthew R. Begley and
Marcel Utz and
Linda R. Petzold High-performance simulation of fracture
in idealized brick and mortar composites
using adaptive Monte Carlo minimization
on the GPU . . . . . . . . . . . . . . . 186--199
Hoang-Vu Dang and
Bertil Schmidt and
Andreas Hildebrandt and
Tuan Tu Tran and
Anna Katharina Hildebrandt CUDA-enabled hierarchical ward
clustering of protein structures based
on the nearest neighbour chain algorithm 200--211
Tanzima Islam and
Kathryn Mohror and
Martin Schulz Exploring the MPI tool information
interface: features and capabilities . . 212--222
Bruce Palmer and
William Perkins and
Yousu Chen and
Shuangshuang Jin and
David Callahan and
Kevin Glass and
Ruisheng Diao and
Mark Rice and
Stephen Elbert and
Mallikarjuna Vallem and
Zhenyu Huang GridPACK™: a framework for developing
power grid simulations on
high-performance computing platforms . . 223--240
Teng Wang and
Kevin Vasko and
Zhuo Liu and
Hui Chen and
Weikuan Yu Enhance parallel input/output with
cross-bundle aggregation . . . . . . . . 241--256
Yi Liu and
Xiongzi Ge and
David Hung-Chang Du and
Xiaoxia Huang Par-BF: a parallel partitioned Bloom
filter for dynamic data sets . . . . . . 259--275
Adnan Ozsoy An efficient parallelization of longest
prefix match and application on data
compression . . . . . . . . . . . . . . 276--289
Daniele Pianu and
Roberto Nerino and
Claudia Ferraris and
Antonio Chimienti A novel approach to train random forests
on GPU for computer vision applications
using local features . . . . . . . . . . 290--304
Ignacio Laguna and
David F. Richards and
Todd Gamblin and
Martin Schulz and
Bronis R. de Supinski and
Kathryn Mohror and
Howard Pritchard Evaluating and extending user-level
fault tolerance in MPI applications . . 305--319
Matthew Otten and
Jing Gong and
Azamat Mametjanov and
Aaron Vose and
John Levesque and
Paul Fischer and
Misun Min An MPI/OpenACC implementation of a
high-order electromagnetics solver with
GPUDirect communication . . . . . . . . 320--334
Md. Mohsin Ali and
Peter E. Strazdins and
Brendan Harding and
Markus Hegland Complex scientific applications made
fault-tolerant with the sparse grid
combination technique . . . . . . . . . 335--359
William Boyd and
Andrew Siegel and
Shuo He and
Benoit Forget and
Kord Smith Parallel performance results for the
OpenMOC neutron transport code on
multicore platforms . . . . . . . . . . 360--375
Zsolt Horváth and
Rui A. P. Perdigão and
Jürgen Waser and
Daniel Cornel and
Artem Konev and
Günter Blöschl Kepler shuffle for real-world flood
simulations on GPUs . . . . . . . . . . 379--395
Shuibing He and
Yan Liu and
Yang Wang and
Xian-He Sun and
Chuanhe Huang Enhancing hybrid parallel file system
through performance and space-aware data
layout . . . . . . . . . . . . . . . . . 396--410
Seiji Tsuboi and
Kazuto Ando and
Takayuki Miyoshi and
Daniel Peter and
Dimitri Komatitsch and
Jeroen Tromp A 1.8 trillion degrees-of-freedom, 1.24
petaflops global seismic wave simulation
on the K computer . . . . . . . . . . . 411--422
Huda Ibeid and
Rio Yokota and
David Keyes A performance model for the
communication in fast multipole methods
on high-performance computing platforms 423--437
Pavol Bauer and
Stefan Engblom and
Stefan Widgren Fast event-based epidemiological
simulations on national scales . . . . . 438--453
Kazuto Ando and
Mamoru Hyodo and
Toshitaka Baba and
Takane Hori and
Toshihiro Kato and
Masaru Watanabe and
Shin-ichi Ichikawa and
Hisakuni Kitahara and
Hitoshi Uehara and
Hikaru Inoue Parallel-algorithm extension for tsunami
and earthquake-cycle simulators for
massively parallel execution on the K
computer . . . . . . . . . . . . . . . . 454--468
Alejandro Calderón and
Alberto García and
Félix García-Carballeira and
Jesús Carretero and
Javier Fernández Improving performance using
computational compression through
memoization: a case study using a
railway power consumption simulator . . 469--485
Jonathan Y. Kemal and
Roger L. Davis and
John D. Owens Multidisciplinary simulation
acceleration using multiple shared
memory graphical processing units . . . 486--508
Jack Dongarra and
Bernard Tourancheau Guest Editor's Note: Special Issue on
Clusters, Clouds and Data for Scientific
Computing . . . . . . . . . . . . . . . 3--3
Ewa Deelman and
Christopher Carothers and
Anirban Mandal and
Brian Tierney and
Jeffrey S. Vetter and
Ilya Baldin and
Claris Castillo and
Gideon Juve and
Dariusz Król and
Vickie Lynch and
Ben Mayer and
Jeremy Meredith and
Thomas Proffen and
Paul Ruth and
Rafael Ferreira da Silva PANORAMA: an approach to performance
modeling and diagnosis of extreme-scale
workflows . . . . . . . . . . . . . . . 4--18
Michela Taufer and
Arnold L. Rosenberg Scheduling DAG-based workflows on single
cloud instances: High-performance and
cost effectiveness with a static
scheduler . . . . . . . . . . . . . . . 19--31
George Teodoro and
Tahsin Kurc and
Guilherme Andrade and
Jun Kong and
Renato Ferreira and
Joel Saltz Application performance analysis and
efficient execution on systems with
multi-core CPUs, GPUs and MICs: a case
study with microscopy image analysis . . 32--51
Anne Benoit and
Saurabh K. Raina and
Yves Robert Efficient checkpoint/verification
patterns . . . . . . . . . . . . . . . . 52--65
Enric Tejedor and
Yolanda Becerra and
Guillem Alomar and
Anna Queralt and
Rosa M. Badia and
Jordi Torres and
Toni Cortes and
Jesús Labarta PyCOMPSs: Parallel computational
workflows in Python . . . . . . . . . . 66--82
Marc Buffat and
Anne Cadiou and
Lionel Le Penven and
Christophe Pera In situ analysis and visualization of
massively parallel computations . . . . 83--90
Shad Kirmani and
Jeonghyung Park and
Padma Raghavan An embedded sectioning scheme for
multiprocessor topology-aware mapping of
irregular applications . . . . . . . . . 91--103
Al Geist and
Daniel A. Reed A survey of high-performance computing
scaling challenges . . . . . . . . . . . 104--113
William Spataro and
Giuseppe A. Trunfio and
Georgios Ch. Sirakoulis High performance computing in modelling
and simulation . . . . . . . . . . . . . 117--118
Ana Flávia P. Camargos and
Viviane C. Silva and
Jean-M. Guichon and
Gérard Meunier GPU-accelerated iterative solution of
complex-entry systems issued from $3$D
edge-FEA of electromagnetics in the
frequency domain . . . . . . . . . . . . 119--133
Themistoklis Giitsidis and
Nikolaos I. Dourvas and
Georgios Ch. Sirakoulis Parallel implementation of aircraft
disembarking and emergency evacuation
based on cellular automata . . . . . . . 134--151
Irfan Uddin One-IPC high-level simulation of
microthreaded many-core architectures 152--162
Davide Spataro and
Donato D'Ambrosio and
Giuseppe Filippone and
Rocco Rongo and
William Spataro and
Davide Marocco The new SCIARA-fv3 numerical model and
acceleration by GPGPU strategies . . . . 163--176
Anonymous Preface . . . . . . . . . . . . . . . . 179--180
Anonymous Notice . . . . . . . . . . . . . . . . . 181--181
Ivan Merelli and
Paolo Cozzi and
Elisabetta Ronchieri and
Daniele Cesini and
Daniele D'Agostino Porting bioinformatics applications from
grid to cloud: a macromolecular surface
analysis application case study . . . . 182--195
Fabio Tordini and
Maurizio Drocco and
Claudia Misale and
Luciano Milanesi and
Pietro Liò and
Ivan Merelli and
Massimo Torquati and
Marco Aldinucci NuChart-II: The road to a fast and
scalable tool for Hi-C data analysis . . 196--211
Matthias Diener and
Eduardo H. M. Cruz and
Philippe O. A. Navaux Modeling memory access behavior for data
mapping . . . . . . . . . . . . . . . . 212--228
Jan G. Cornelis and
Jan Lemeire and
Tim Bruylants and
Peter Schelkens Heterogeneous acceleration of volumetric
JPEG 2000 using OpenCL . . . . . . . . . 229--245
Fredrik Robertsén and
Jan Westerholm and
Keijo Mattila Designing a graphics processing unit
accelerated petaflop capable lattice
Boltzmann solver: Read aligned data
layouts and asynchronous communication 246--255
Rune Havnung Bakken and
Lars Moland Eliassen Real-time three-dimensional
skeletonisation using general-purpose
computing on graphics processing units
applied to computer vision-based human
pose estimation . . . . . . . . . . . . 259--273
Lena Oden and
Holger Fröning InfiniBand Verbs on GPU: a case study of
controlling an InfiniBand network device
from the GPU . . . . . . . . . . . . . . 274--284
Anish Varghese and
Bob Edwards and
Gaurav Mitra and
Alistair P. Rendell Programming the Adapteva Epiphany
64-core network-on-chip coprocessor . . 285--302
Miaoqing Huang and
Chenggang Lai and
Xuan Shi and
Zhijun Hao and
Haihang You Study of parallel programming models on
computer clusters with Intel MIC
coprocessors . . . . . . . . . . . . . . 303--315
Rosa Filgueira and
Amrey Krause and
Malcolm Atkinson and
Iraklis Klampanos and
Alexander Moreno dispel4py: a Python framework for
data-intensive scientific computing . . 316--334
Anthony Kougkas and
Hassan Eslami and
Xian-He Sun and
Rajeev Thakur and
William Gropp Rethinking key--value store for parallel
I/O optimization . . . . . . . . . . . . 335--356
Pavan Balaji and
Zhiyi Huang Special issue on programming models and
applications for multicores and
manycores . . . . . . . . . . . . . . . 359--360
Yao Wu and
Long Zheng and
Brian Heilig and
Guang R. Gao HAMR: a dataflow-based real-time
in-memory cluster computing engine . . . 361--374
Hartwig Anzt and
Stanimire Tomov and
Jack Dongarra On the performance and energy efficiency
of sparse linear algebra on GPUs . . . . 375--390
Jonathan C. Beard and
Peng Li and
Roger D. Chamberlain RaftLib: a C++ template library for high
performance stream parallel processing 391--404
Nicholas Chaimov and
Khaled Z. Ibrahim and
Samuel Williams and
Costin Iancu Reaching bandwidth saturation using
transparent injection parallelization 405--421
Ahmad Qawasmeh and
Maxime R. Hugues and
Henri Calandra and
Barbara M. Chapman Performance portability in reverse time
migration and seismic modelling via
OpenACC . . . . . . . . . . . . . . . . 422--440
Peng Li and
Jonathan C. Beard and
Jeremy D. Buhler Deadlock-free buffer configuration for
stream computing . . . . . . . . . . . . 441--450
Akhil Langer and
Ehsan Totoni and
Udatta Palekar and
Laxmikant V. Kalé Energy-optimal configuration selection
for manycore chips with variation . . . 451--466
Gordon Bell and
David H. Bailey and
Jack Dongarra and
Alan H. Karp and
Kevin Walsh A look back on 30 years of the Gordon
Bell Prize . . . . . . . . . . . . . . . 469--484
Felix Schmitt and
Robert Dietrich and
Guido Juckeland Scalable critical-path analysis and
optimization guidance for hybrid
MPI--CUDA applications . . . . . . . . . 485--498
Sebastião Miranda and
Jonas Feldt and
Frederico Pratas and
Ricardo A. Mata and
Nuno Roma and
Pedro Tomás Efficient parallelization of
perturbative Monte Carlo QM/MM
simulations in heterogeneous platforms 499--516
Chao Jin and
Bronis R. de Supinski and
David Abramson and
Heidi Poxon and
Luiz DeRose and
Minh Ngoc Dinh and
Mark Endrei and
Elizabeth R. Jessup A survey on software methods to improve
the energy efficiency of parallel
computing . . . . . . . . . . . . . . . 517--549
Timothy Dykes and
Claudio Gheller and
Marzia Rivi and
Mel Krokos Splotch . . . . . . . . . . . . . . . . 550--563
A. Chien and
P. Balaji and
N. Dun and
A. Fang and
H. Fujita and
K. Iskra and
Z. Rubenstein and
Z. Zheng and
J. Hammond and
I. Laguna and
D. Richards and
A. Dubey and
B. van Straalen and
M. Hoemmen and
M. Heroux and
K. Teranishi and
A. Siegel Exploring versioned distributed arrays
for resilience in scientific
applications . . . . . . . . . . . . . . 564--590
Jack Dongarra and
Bernard Tourancheau Guest editors' note . . . . . . . . . . 3--3
Ewing Lusk and
Ralph Butler and
Steven C. Pieper Evolution of a minimal parallel
programming model . . . . . . . . . . . 4--13
Yiannis Georgiou and
Emmanuel Jeannot and
Guillaume Mercier and
Adèle Villiermet Topology-aware job mapping . . . . . . . 14--27
Brice Videau and
Kevin Pouget and
Luigi Genovese and
Thierry Deutsch and
Dimitri Komatitsch and
Frédéric Desprez and
Jean-François Méhaut BOAST . . . . . . . . . . . . . . . . . 28--44
Javier Conejero and
Sandra Corella and
Rosa M. Badia and
Jesús Labarta Task-based programming in COMPSs to
converge from HPC to big data . . . . . 45--60
Supun Kamburugamuve and
Pulasthi Wickramasinghe and
Saliya Ekanayake and
Geoffrey C. Fox Anatomy of machine learning algorithm
implementations in MPI, Spark, and Flink 61--73
Charalampos Chalios and
Giorgis Georgakoudis and
Konstantinos Tovletoglou and
George Karakonstantis and
Hans Vandierendonck and
Dimitrios S. Nikolopoulos DARE . . . . . . . . . . . . . . . . . . 74--88
Anne Benoit and
Loïc Pottier and
Yves Robert Resilient co-scheduling of malleable
applications . . . . . . . . . . . . . . 89--103
Moustafa AbdelBaky and
Javier Diaz-Montes and
Manish Parashar Software-defined environments for
science and engineering . . . . . . . . 104--122
Guillaume Aupy and
Anne Benoit and
Sicheng Dai and
Loïc Pottier and
Padma Raghavan and
Yves Robert and
Manu Shantharam Co-scheduling Amdahl applications on
cache-partitioned systems . . . . . . . 123--138
George Bosilca and
Aurelien Bouteiller and
Amina Guermouche and
Thomas Herault and
Yves Robert and
Pierre Sens and
Jack Dongarra A failure detector for HPC platforms . . 139--158
Ewa Deelman and
Tom Peterka and
Ilkay Altintas and
Christopher D. Carothers and
Kerstin Kleese van Dam and
Kenneth Moreland and
Manish Parashar and
Lavanya Ramakrishnan and
Michela Taufer and
Jeffrey Vetter The future of scientific workflows . . . 159--175
Anne Benoit and
Laurent Lefèvre and
Anne-Cécile Orgerie and
Issam Raïs Reducing the energy consumption of
large-scale computing systems through
combined shutdown policies with multiple
constraints . . . . . . . . . . . . . . 176--188
David W. Walker Morton ordering of $2$D arrays for
efficient access to hierarchical memory 189--203
Fande Kong and
Xiao-Chuan Cai Scalability study of an implicit solver
for coupled fluid-structure interaction
problems on unstructured meshes in $3$D 207--219
Hartwig Anzt and
Moritz Kreutzer and
Eduardo Ponce and
Gregory D. Peterson and
Gerhard Wellein and
Jack Dongarra Optimization and performance evaluation
of the IDR iterative Krylov solver on
GPUs . . . . . . . . . . . . . . . . . . 220--230
Michael O. Lam and
Jeffrey K. Hollingsworth Fine-grained floating-point precision
analysis . . . . . . . . . . . . . . . . 231--245
Lídia Kuan and
Frederico Pratas and
Leonel Sousa and
Pedro Tomás MrBayes sMC$^3$: Accelerating Bayesian
inference of phylogenetic trees . . . . 246--265
Martin Kronbichler and
Ababacar Diagne and
Hanna Holmgren A fast massively parallel two-phase flow
solver for microfluidic chip simulation 266--287
Alan Gray and
Kevin Stratford A lightweight approach to performance
portability with targetDP . . . . . . . 288--301
Carmen Cotelo and
María Aránzazu Amo Baladrón and
Roland Aznar and
Pablo Lorente and
Pablo Rey and
Aurelio Rodríguez On the successful coexistence of
oceanographic operational services with
other computational workloads . . . . . 302--313
Miguel A. Vega-Rodríguez and
Álvaro Rubio-Largo Parallelism in computational biology . . 317--320
Nathan T. Weeks and
Glenn R. Luecke and
Brandon M. Groth and
Marina Kraeva and
Li Ma and
Luke M. Kramer and
James E. Koltes and
James M. Reecy High-performance epistasis detection in
quantitative trait GWAS . . . . . . . . 321--336
Enzo Rucci and
Carlos Garcia and
Guillermo Botella and
Armando E. De Giusti and
Marcelo Naiouf and
Manuel Prieto-Matias OSWALD: OpenCL Smith--Waterman on
Altera's FPGA for Large Protein
Databases . . . . . . . . . . . . . . . 337--350
F. Auricchio and
M. Ferretti and
A. Lefieux and
M. Musci and
A. Reali and
S. Trimarchi and
A. Veneziani Parallelizing a finite element solver in
computational hemodynamics . . . . . . . 351--362
Suejb Memeti and
Sabri Pllana A machine learning approach for
accelerating DNA sequence analysis . . . 363--379
Francesco Asnicar and
Luca Masera and
Emanuela Coller and
Caterina Gallo and
Nadir Sella and
Thomas Tolio and
Paolo Morettin and
Luca Erculiani and
Francesca Galante and
Stanislau Semeniuta and
Giulia Malacarne and
Kristof Engelen and
Andrea Argentini and
Valter Cavecchia and
Claudio Moser and
Enrico Blanzieri NES$^2$RA: Network expansion by
stratified variable subsetting and
ranking aggregation . . . . . . . . . . 380--392
Héctor Martínez and
Sergio Barrachina and
Maribel Castillo and
Joaquín Tárraga and
Ignacio Medina and
Joaquín Dopazo and
Enrique S. Quintana-Ortí A framework for genomic sequencing on
clusters of multicore and manycore
processors . . . . . . . . . . . . . . . 393--406
Sebastian Daberdaku and
Carlo Ferrari Computing voxelised representations of
macromolecular surfaces . . . . . . . . 407--432
M. Asch and
T. Moore and
R. Badia and
M. Beck and
P. Beckman and
T. Bidot and
F. Bodin and
F. Cappello and
A. Choudhary and
B. de Supinski and
E. Deelman and
J. Dongarra and
A. Dubey and
G. Fox and
H. Fu and
S. Girona and
W. Gropp and
M. Heroux and
Y. Ishikawa and
K. Keahey and
D. Keyes and
W. Kramer and
J.-F. Lavignon and
Y. Lu and
S. Matsuoka and
B. Mohr and
D. Reed and
S. Requena and
J. Saltz and
T. Schulthess and
R. Stevens and
M. Swany and
A. Szalay and
W. Tang and
G. Varoquaux and
J.-P. Vilotte and
R. Wisniewski and
Z. Xu and
I. Zacharov Big data and extreme-scale computing . . 435--479
Roman Wyrzykowski and
Ewa Deelman Guest Editor's note . . . . . . . . . . 480--481
Adrian Klusek and
Paweł Topa and
Jarosław Was and
Robert Lubaś An implementation of the Social
Distances Model using multi-GPU systems 482--495
Krzysztof Jurczuk and
Marek Kretowski and
Johanne Bezy-Wendling GPU-based computational modeling of
magnetic resonance imaging of vascular
structures . . . . . . . . . . . . . . . 496--511
Mindaugas Radziunas Modeling and simulations of broad-area
edge-emitting semiconductor devices . . 512--522
Lukasz Szustak and
Kamil Halbiniak and
Lukasz Kuczynski and
Joanna Wrobel and
Adam Kulawik Porting and optimization of
solidification application for CPU-MIC
hybrid platforms . . . . . . . . . . . . 523--539
Heike Jagode and
Anthony Danalis and
Jack Dongarra Accelerating NWChem Coupled Cluster
through dataflow-based execution . . . . 540--551
Tom Scogland and
David Beckingsale Introduction . . . . . . . . . . . . . . 553--554
Tom Deakin and
Simon McIntosh-Smith and
Matt Martineau and
Wayne Gaudin An improved parallelism scheme for
deterministic discrete ordinates
transport . . . . . . . . . . . . . . . 555--569
Robert F. Bird and
Patrick Gillies and
Michael R. Bareford and
Andy Herdman and
Stephen Jarvis Performance Optimisation of Inertial
Confinement Fusion Codes using
Mini-applications . . . . . . . . . . . 570--581
O. E. Bronson Messer and
Ed D'Azevedo and
Judy Hill and
Wayne Joubert and
Mark Berrill and
Christopher Zimmer MiniApps derived from production HPC
applications using multiple programming
models . . . . . . . . . . . . . . . . . 582--593
Wesley Bland and
Mattan Erez Special Issue on FTS . . . . . . . . . . 597--597
David E. Bernholdt and
Wael R. Elwasif and
Christos Kartsaklis and
Seyong Lee and
Tiffany M. Mintz Programmer-guided reliability for
extreme-scale applications . . . . . . . 598--612
Faisal Shahzad and
Moritz Kreutzer and
Thomas Zeiser and
Rui Machado and
Andreas Pieper and
Georg Hager and
Gerhard Wellein Building and utilizing fault tolerance
support tools for the GASPI applications 613--626
Simon McIntosh-Smith and
Rob Hunt and
James Price and
Alex Warwick Vesztrocy Application-based fault tolerance
techniques for sparse matrix solvers . . 627--640
Omer Subasi and
Tatiana Martsinkevich and
Ferad Zyulkyarov and
Osman Unsal and
Jesús Labarta and
Franck Cappello Unified fault-tolerance framework for
hybrid task-parallel message-passing
applications . . . . . . . . . . . . . . 641--657
F. Rizzi and
K. Morris and
K. Sargsyan and
P. Mycek and
C. Safta and
O. Le Maître and
O. Knio and
B. Debusschere Partial differential equations
preconditioner resilient to soft and
hard faults . . . . . . . . . . . . . . 658--673
J. Ignacio Hidalgo and
Francisco Fernández de Vega Special issue on ``Evolutionary
Algorithms on Parallel Architectures and
Distributed Infrastructures'' . . . . . 674--675
Rafael Nogueras and
Carlos Cotta Analyzing self-$ \star $ island-based
memetic algorithms in heterogeneous
unstable environments . . . . . . . . . 676--692
Diego Teijeiro and
Xoán C. Pardo and
Patricia González and
Julio R. Banga and
Ramón Doallo Towards cloud-based parallel
metaheuristics . . . . . . . . . . . . . 693--705
Francisco Chávez and
Francisco Fernández de Vega and
Daniel Lanza and
César Benavides and
Juan Villegas and
Leonardo Trujillo and
Gustavo Olague and
Graciela Román Deploying massive runs of evolutionary
algorithms with ECJ and Hadoop: Reducing
interest points required for face
recognition . . . . . . . . . . . . . . 706--720
Luis Acedo and
Clara Burgos and
José-Ignacio Hidalgo and
Victor Sánchez-Alonso and
Rafael-Jacinto Villanueva and
Javier Villanueva-Oller Calibrating a large network model
describing the transmission dynamics of
the human papillomavirus using a
particle swarm optimization algorithm in
a distributed computing environment . . 721--728
Amogh Katti and
Giuseppe Di Fatta and
Thomas Naughton and
Christian Engelmann Epidemic failure detection and consensus
for extreme parallelism . . . . . . . . 729--743
Jeremy Iverson and
George Karypis A virtual memory manager optimized for
node-level cooperative multi-tasking in
memory constrained systems . . . . . . . 744--759
Tarun Prabhu and
William Gropp DAME: Runtime-compilation for data
movement . . . . . . . . . . . . . . . . 760--774
Pavan Balaji and
Kai-Cheung Leung Introduction . . . . . . . . . . . . . . 777--778
David del Rio Astorga and
Manuel F. Dolz and
Luis Miguel Sánchez and
J. Daniel García and
Marco Danelutto and
Massimo Torquati Finding parallel patterns through static
analysis in C++ applications . . . . . . 779--788
Baldomero Imbernón and
José M. Cecilia and
Horacio Pérez-Sánchez and
Domingo Giménez METADOCK: a parallel metaheuristic
schema for virtual screening methods . . 789--803
Javier García-Blas and
Christopher Brown High-level programming for heterogeneous
and hierarchical parallel systems . . . 804--806
Marco Danelutto and
Peter Kilpatrick and
Gabriele Mencagli and
Massimo Torquati State access patterns in stream parallel
computations . . . . . . . . . . . . . . 807--818
Issam Said and
Pierre Fortin and
Jean-Luc Lamotte and
Henri Calandra Leveraging the accelerated processing
units for seismic imaging: a performance
and power efficiency comparison against
CPUs and GPUs . . . . . . . . . . . . . 819--837
Ana Moreton-Fernandez and
Hector Ortega-Arranz and
Arturo Gonzalez-Escribano Controllers: an abstraction to ease the
use of hardware accelerators . . . . . . 838--853
David del Rio Astorga and
Manuel F. Dolz and
Luis Miguel Sánchez and
Javier Fernández and
J. Daniel García An adaptive offline implementation
selector for heterogeneous parallel
platforms . . . . . . . . . . . . . . . 854--863
Italo Epicoco and
Silvia Mocavero and
Andrew R. Porter and
Stephen M. Pickles and
Mike Ashworth and
Giovanni Aloisio Hybridisation strategies and data
structures for the NEMO ocean model . . 864--881
Esthela Gallardo and
Jérôme Vienne and
Leonardo Fialho and
Patricia Teller and
James Browne Employing MPI_T in MPI Advisor to
optimize application performance . . . . 882--896
Mirco Altenbernd and
Dominik Göddeke Soft fault detection and correction for
multigrid . . . . . . . . . . . . . . . 897--912
Martin Schreiber and
Pedro S. Peixoto and
Terry Haut and
Beth Wingate Beyond spatial scalability limitations
with a massively parallel method for
linear oscillatory problems . . . . . . 913--933
Hugues Digonnet and
Thierry Coupez and
Patrice Laure and
Luisa Silva Massively parallel anisotropic mesh
adaptation . . . . . . . . . . . . . . . 3--24
J. Loffeld and
J. A. F. Hittinger On the arithmetic intensity of
high-order finite-volume discretizations
for hyperbolic systems of conservation
laws . . . . . . . . . . . . . . . . . . 25--52
Franz Pichler and
Gundolf Haase Finite element method completely
implemented for graphic processor units
using parallel algorithm libraries . . . 53--66
Muhammad Nufail Farooqi and
Daulet Izbassarov and
Metin Muradoglu and
Didem Unat Communication analysis and optimization
of $3$D front tracking method for
multiphase flow simulations . . . . . . 67--80
Daniel S. Abdi and
Lucas C. Wilcox and
Timothy C. Warburton and
Francis X. Giraldo A GPU-accelerated continuous and
discontinuous Galerkin non-hydrostatic
atmospheric model . . . . . . . . . . . 81--109
Masahiro Nakao and
Hitoshi Murai and
Hidetoshi Iwashita and
Taisuke Boku and
Mitsuhisa Sato Implementation and evaluation of the HPC
challenge benchmark in the XcalableMP
PGAS language . . . . . . . . . . . . . 110--123
E. Calore and
A. Gabbana and
S. F. Schifano and
R. Tripiccione Optimization of lattice Boltzmann
simulations on heterogeneous computers 124--139
Jan Hückelheim and
Paul Hovland and
Michelle Mills Strout and
Jens-Dominik Müller Reverse-mode algorithmic differentiation
of an OpenMP-parallel compressible flow
solver . . . . . . . . . . . . . . . . . 140--154
Tadashi Yamazaki and
Jun Igarashi and
Junichiro Makino and
Toshikazu Ebisuzaki Real-time simulation of a cat-scale
artificial cerebellum on PEZY-SC
processors . . . . . . . . . . . . . . . 155--168
Bei Wang and
Stephane Ethier and
William Tang and
Khaled Z. Ibrahim and
Kamesh Madduri and
Samuel Williams and
Leonid Oliker Modern gyrokinetic particle-in-cell
simulation of fusion plasmas on top
supercomputers . . . . . . . . . . . . . 169--188
Linda Stals Algorithm-based fault recovery of
adaptively refined parallel multilevel
grids . . . . . . . . . . . . . . . . . 189--211
Vladimir Mironov and
Alexander Moskovsky and
Michael D'Mello and
Yuri Alexeev An efficient MPI/OpenMP parallelization
of the Hartree--Fock--Roothaan method
for the first generation of Intel®
Xeon Phi™ processor architecture . . . 212--224
Carlos Teijeiro and
Thomas Hammerschmidt and
Ralf Drautz and
Godehard Sutmann Optimized parallel simulations of
analytic bond-order potentials on hybrid
shared/distributed memory with MPI and
OpenMP . . . . . . . . . . . . . . . . . 227--241
Daniel S. Abdi and
Francis X. Giraldo and
Emil M. Constantinescu and
Lester E. Carr and
Lucas C. Wilcox and
Timothy C. Warburton Acceleration of the IMplicit--EXplicit
nonhydrostatic unified model of the
atmosphere on manycore processors . . . 242--267
Katherine J. Evans and
Richard K. Archibald and
David J. Gardner and
Matthew R. Norman and
Mark A. Taylor and
Carol S. Woodward and
Patrick H. Worley Performance analysis of fully explicit
and fully implicit solvers within a
spectral element shallow-water
atmosphere model . . . . . . . . . . . . 268--284
Dingwen Tao and
Sheng Di and
Hanqi Guo and
Zizhong Chen and
Franck Cappello Z-checker: a framework for assessing
lossy compression of scientific data . . 285--303
Hasan Metin Aktulga and
Chris Knight and
Paul Coffman and
Kurt A. O'Hearn and
Tzu-Ray Shan and
Wei Jiang Optimizing the performance of reactive
molecular dynamics simulations for
many-core architectures . . . . . . . . 304--321
Anshu Dubey and
Petros Tzeferacos and
Don Q. Lamb The dividends of investing in
computational software design: a case
study . . . . . . . . . . . . . . . . . 322--331
Irina Demeshko and
Jerry Watkins and
Irina K. Tezaur and
Oksana Guba and
William F. Spotz and
Andrew G. Salinger and
Roger P. Pawlowski and
Michael A. Heroux Toward performance portability of the
Albany finite element analysis code
using the Kokkos library . . . . . . . . 332--352
Elmar Peise and
Paolo Bientinesi The ELAPS framework: Experimental Linear
Algebra Performance Studies . . . . . . 353--365
Marc Casas and
Wilfried N. Gansterer and
Elias Wimmer Resilient gossip-inspired all-reduce
algorithms for high-performance
computing: Potential, limitations, and
open questions . . . . . . . . . . . . . 366--383
Pietro Cicotti and
Manu Shantharam and
Laura Carrington Reducing communication in parallel graph
search algorithms with software caches 384--396
Jon Calhoun and
Franck Cappello and
Luke N. Olson and
Marc Snir and
William D. Gropp Exploring the feasibility of lossy
compression for PDE simulations . . . . 397--410
Andreas Müller and
Michal A. Kopera and
Simone Marras and
Lucas C. Wilcox and
Tobin Isaac and
Francis X. Giraldo Strong scaling for numerical weather
prediction at petascale with the
atmospheric model NUMA . . . . . . . . . 411--426
Gabriele Mencagli and
Felipe M. G. França and
Cristiana Barbosa Bentes and
Leandro Augusto Justen Marzulo and
Mauricio Lima Pilla Special issue on parallel applications
for in-situ computing on the
next-generation computing platforms . . 429--430
João Vicente Ferreira Lima and
Issam Raïs and
Laurent Lefèvre and
Thierry Gautier Performance and energy analysis of
OpenMP runtime systems with dense linear
algebra algorithms . . . . . . . . . . . 431--443
Jucele França de Alencar Vasconcellos and
Edson Norberto Cáceres and
Henrique Mongelli and
Siang Wun Song and
Frank Dehne and
Jayme Luiz Szwarcfiter New BSP/CGM algorithms for spanning
trees . . . . . . . . . . . . . . . . . 444--461
Anderson Avila and
Renata Hax Sander Reiser and
Maurício Lima Pilla and
Adenauer Correa Yamin Improving in situ GPU simulation of
quantum computing in the D-GM
environment . . . . . . . . . . . . . . 462--472
Matheus S. Serpa and
Eduardo H. M. Cruz and
Matthias Diener and
Arthur M. Krause and
Philippe O. A. Navaux and
Jairo Panetta and
Albert Farrés and
Claudia Rosas and
Mauricio Hanzich Optimization strategies for geophysics
models on manycore systems . . . . . . . 473--486
Roman Wyrzykowski and
Ewa Deelman Guest Editor's note: Special issue on
challenges and solutions for porting
applications to emerging high
performance computing systems . . . . . 487--488
Adrian Kłusek and
Marcin Loś and
Maciej Paszyński and
Witold Dzwinel Efficient model of tumor dynamics
simulated in multi-GPU environment . . . 489--506
Vladimir Stegailov and
Ekaterina Dlinnova and
Timur Ismagilov and
Mikhail Khalilov and
Nikolay Kondratyuk and
Dmitry Makagon and
Alexander Semenov and
Alexei Simonov and
Grigory Smirnov and
Alexey Timofeev Angara interconnect makes GPU-based
Desmos supercomputer an efficient tool
for molecular dynamics calculations . . 507--521
Daniel Langr and
Tomás Dytrych and
Kristina D. Launey and
Jerry P. Draayer Accelerating many-nucleon basis
generation for high performance
computing enabled ab initio nuclear
structure studies . . . . . . . . . . . 522--533
Lukasz Szustak and
Pawel Bratek Performance portable parallel
programming of heterogeneous stencils
across shared-memory platforms with
modern Intel processors . . . . . . . . 534--553
Christian Simmendinger and
Roman Iakymchuk and
Luis Cebamanos and
Dana Akhmetova and
Valeria Bartsch and
Tiberiu Rotaru and
Mirko Rahn and
Erwin Laure and
Stefano Markidis Interoperability strategies for GASPI
and MPI in large-scale scientific
applications . . . . . . . . . . . . . . 554--568
Nils Kohl and
Johannes Hötzer and
Florian Schornbaum and
Martin Bauer and
Christian Godenschwager and
Harald Köstler and
Britta Nestler and
Ulrich Rüde A scalable and extensible checkpointing
scheme for massively parallel
simulations . . . . . . . . . . . . . . 571--589
Vaibhav Sundriyal and
Kristopher Keipert and
Masha Sosonkina and
Mark S. Gordon Effect of frequency scaling granularity
on energy-saving strategies . . . . . . 590--601
Karl-Robert Wichmann and
Martin Kronbichler and
Rainald Löhner and
Wolfgang A. Wall Practical applicability of optimizations
and performance models to complex
stencil-based loop kernels in CFD . . . 602--618
Samuel Elliott and
Raghu Raj Prasanna Kumar and
Natasha Flyer and
Tuan Ta and
Richard Loft Implementation of a scalable,
performance portable shallow water
equation solver using radial basis
function-generated finite difference
methods . . . . . . . . . . . . . . . . 619--631
Hector Emilio Barrios Molano and
Kamy Sepehrnoori Development of a framework for parallel
reservoir simulation . . . . . . . . . . 632--650
Domenico Rea and
Giansimone Perrino and
Diego di Bernardo and
Livia Marcellino and
Diego Romano A GPU algorithm for tracking yeast cells
in phase-contrast microscopy images . . 651--659
Lubomir Riha and
Michal Merta and
Radim Vavrik and
Tomas Brzobohaty and
Alexandros Markopoulos and
Ondrej Meca and
Ondrej Vysocky and
Tomas Kozubek and
Vit Vondrak A massively parallel and
memory-efficient FEM toolbox with a
hybrid total FETI solver with
accelerator support . . . . . . . . . . 660--677
Niclas Jansson and
Rahul Bale and
Keiji Onishi and
Makoto Tsubokura CUBE: a scalable framework for
large-scale industrial simulations . . . 678--698
Thomas Heller and
Bryce Adelstein Lelbach and
Kevin A. Huck and
John Biddiscombe and
Patricia Grubel and
Alice E. Koniges and
Matthias Kretz and
Dominic Marcello and
David Pfander and
Adrian Serio and
Juhan Frank and
Geoffrey C. Clayton and
Dirk Pflüger and
David Eder and
Hartmut Kaiser Harnessing billions of tasks for a
scalable portable hydrodynamic
simulation of the merger of two stars 699--715
Andrea Borghesi and
Andrea Bartolini and
Michela Milano and
Luca Benini Pricing schemes for energy-efficient HPC
systems: Design and exploration . . . . 716--734
Kasia Świrydowicz and
Noel Chalmers and
Ali Karakus and
Tim Warburton Acceleration of tensor-product
operations for high-order finite element
methods . . . . . . . . . . . . . . . . 735--757
Michael Mascagni CRE2017 Special Issue Introduction
IJHPCA . . . . . . . . . . . . . . . . . 761--762
Line Pouchard and
Sterling Baldwin and
Todd Elsethagen and
Shantenu Jha and
Bibi Raju and
Eric Stephan and
Li Tang and
Kerstin Kleese Van Dam Computational reproducibility of
scientific workflows at extreme scales 763--776
Kento Sato and
Ignacio Laguna and
Gregory L. Lee and
Martin Schulz and
Christopher M. Chambreau and
Simone Atzeni and
Michael Bentley and
Ganesh Gopalakrishnan and
Zvonimir Rakamaric and
Geof Sawaya and
Joachim Protze and
Dong H. Ahn Pruners: Providing reproducibility for
uncovering non-deterministic errors in
runs on supercomputers . . . . . . . . . 777--783
Salil Mahajan and
Katherine J. Evans and
Joseph H. Kennedy and
Min Xu and
Mathew R. Norman and
Marcia L. Branstetter Ongoing solution reproducibility of
earth system models as they progress
toward exascale computing . . . . . . . 784--790
Roman Iakymchuk and
Stef Graillat and
David Defour and
Enrique S. Quintana-Ortí Hierarchical approach for deriving a
reproducible unblocked $ L U $
factorization . . . . . . . . . . . . . 791--803
Sergio Iserte and
Héctor Martínez and
Sergio Barrachina and
Maribel Castillo and
Rafael Mayo and
Antonio J. Peña Dynamic reconfiguration of noniterative
scientific applications: a case study
with HPG aligner . . . . . . . . . . . . 804--816
Markus Huber and
Ulrich Rüde and
Barbara Wohlmuth Adaptive control in roll-forward
recovery for extreme scale multigrid . . 817--837
Nikola Tchipev and
Steffen Seckler and
Matthias Heinen and
Jadran Vrabec and
Fabio Gratl and
Martin Horsch and
Martin Bernreuther and
Colin W. Glass and
Christoph Niethammer and
Nicolay Hammer and
Bernd Krischok and
Michael Resch and
Dieter Kranzlmüller and
Hans Hasse and
Hans-Joachim Bungartz and
Philipp Neumann TweTriS: Twenty trillion-atom simulation 838--854
Stefan Lemvig Glimberg and
Allan Peter Engsig-Karup and
Luke N. Olson A massively scalable distributed
multigrid framework for nonlinear marine
hydrodynamics . . . . . . . . . . . . . 855--868
Masahiro Nakao and
Tetsuya Odajima and
Hitoshi Murai and
Akihiro Tabuchi and
Norihisa Fujita and
Toshihiro Hanawa and
Taisuke Boku and
Mitsuhisa Sato Evaluation of XcalableACC with tightly
coupled accelerators/InfiniBand hybrid
communication on accelerated cluster . . 869--884
Milos Ivanović and
Ana Kaplarević-Malisić and
Boban Stojanović and
Marina Svicević and
Srboljub M. Mijailovich Machine learned domain decomposition
scheme applied to parallel multi-scale
muscle simulation . . . . . . . . . . . 885--896
Andrew C. Kirby and
Michael J. Brazell and
Zhi Yang and
Rajib Roy and
Behzad R. Ahrabi and
Michael K. Stoellinger and
Jay Sitaraman and
Dimitri J. Mavriplis Wind farm simulations using an overset
hp-adaptive approach with blade-resolved
turbine models . . . . . . . . . . . . . 897--923
Katharina Kormann and
Klaus Reuter and
Markus Rampp A massively parallel semi-Lagrangian
solver for the six-dimensional
Vlasov--Poisson equation . . . . . . . . 924--947
David Strelák and
Carlos Óscar S. Sorzano and
José María Carazo and
Jirí Filipovic A GPU acceleration of $3$-D Fourier
reconstruction in cryo-EM . . . . . . . 948--959
Pierre Fortin and
Maxime Touche Dual tree traversal on integrated GPUs
for astrophysical $N$-body simulations 960--972
Dominic E. Charrier and
Benjamin Hazelwood and
Ekaterina Tutlyaeva and
Michael Bader and
Michael Dumbser and
Andrey Kudryavtsev and
Alexander Moskovsky and
Tobias Weinzierl Studies on the energy and deep memory
behaviour of a cache-oblivious,
task-based hyperbolic PDE solver . . . . 973--986
Jorge Ejarque and
Marc Domínguez and
Rosa M. Badia A hierarchic task-based programming
model for distributed heterogeneous
computing . . . . . . . . . . . . . . . 987--997
Ibrahim Al-Kharusi and
David W. Walker Locality properties of $3$D data
orderings with application to parallel
molecular dynamics simulations . . . . . 998--1018
Mohammad Y. Al-Shorman and
Majd M. Al-Kofahi Ultrasonic pulse propagation simulation
using OpenCL for environment mapping and
discovery . . . . . . . . . . . . . . . 1019--1029
John M. Dennis and
Brian Dobbins and
Christopher Kerr and
Youngsung Kim Optimizing the HOMME dynamical core for
multicore platforms . . . . . . . . . . 1030--1045
Ichitaro Yamazaki and
Akihiro Ida and
Rio Yokota and
Jack Dongarra Distributed-memory lattice $H$-matrix
factorization . . . . . . . . . . . . . 1046--1063
Jack Dongarra and
Bernard Tourancheau Guest editors' note: Special issue on
clusters, clouds, and data for
scientific computing . . . . . . . . . . 1067--1068
Hartwig Anzt and
Goran Flegar and
Thomas Grützmacher and
Enrique S. Quintana-Ortí Toward a modular precision ecosystem for
high-performance computing . . . . . . . 1069--1078
Mark Endrei and
Chao Jin and
Minh Ngoc Dinh and
David Abramson and
Heidi Poxon and
Luiz DeRose and
Bronis R. de Supinski Statistical and machine learning models
for optimizing energy in parallel
applications . . . . . . . . . . . . . . 1079--1097
Jungwon Kim and
Jeffrey S. Vetter Implementing efficient data compression
and encryption in a persistent
key--value store for HPC . . . . . . . . 1098--1112
Heike Jagode and
Anthony Danalis and
Hartwig Anzt and
Jack Dongarra PAPI software-defined events for
in-depth performance analysis . . . . . 1113--1127
Ewa Deelman and
Anirban Mandal and
Ming Jiang and
Rizos Sakellariou The role of machine learning in
scientific workflows . . . . . . . . . . 1128--1139
Ana Gainaru and
Hongyang Sun and
Guillaume Aupy and
Yuankai Huo and
Bennett A. Landman and
Padma Raghavan On-the-fly scheduling versus
reservation-based scheduling for
unpredictable workflows . . . . . . . . 1140--1158
Daniel Balouek-Thomert and
Eduard Gibert Renart and
Ali Reza Zamani and
Anthony Simonet and
Manish Parashar Towards a computing continuum: Enabling
edge-to-cloud integration for
data-driven workflows . . . . . . . . . 1159--1174
Dylan Chapp and
Danny Rorabaugh and
Kento Sato and
Dong H. Ahn and
Michela Taufer A three-phase workflow for general and
expressive representations of
nondeterminism in HPC applications . . . 1175--1184
Guillaume Aupy and
Brice Goglin and
Valentin Honoré and
Bruno Raffin Modeling high-throughput applications
for in situ analytics . . . . . . . . . 1185--1200
Franck Cappello and
Sheng Di and
Sihuan Li and
Xin Liang and
Ali Murat Gok and
Dingwen Tao and
Chun Hong Yoon and
Xin-Chuan Wu and
Yuri Alexeev and
Frederic T. Chong Use cases of lossy compression for
floating-point data in scientific data
sets . . . . . . . . . . . . . . . . . . 1201--1220
Guillaume Aupy and
Anne Benoit and
Brice Goglin and
Loïc Pottier and
Yves Robert Co-scheduling HPC workloads on
cache-partitioned CMP platforms . . . . 1221--1239
Alexandre Denis and
Julien Jaeger and
Emmanuel Jeannot and
Marc Pérache and
Hugo Taboada Study on progress threads placement and
dedicated cores for overlapping MPI
nonblocking collectives on manycore
processor . . . . . . . . . . . . . . . 1240--1254
Li Han and
Valentin Le Fèvre and
Louis-Claude Canon and
Yves Robert and
Frédéric Vivien A generic approach to scheduling and
checkpointing workflows . . . . . . . . 1255--1274
Anand Venkat and
Tharindu Rusira and
Raj Barik and
Mary Hall and
Leonard Truong SWIRL: High-performance many-core CPU
code generation for deep neural networks 1275--1289
Thiago S. F. X. Teixeira and
William Gropp and
David Padua Managing code transformations for better
performance portability . . . . . . . . 1290--1306
Anonymous Corrigendum to A failure detector for
HPC platforms . . . . . . . . . . . . . NP1--NP1
José M. Cecilia Guest Editors' note: Special issue on
novel high-performance computing
algorithms and platforms in
bioinformatics . . . . . . . . . . . . . 3--4
Javier Prades and
Baldomero Imbernón and
Carlos Reaño and
Jorge Peña-García and
Jose Pedro Cerón-Carrasco and
Federico Silla and
Horacio Pérez-Sánchez Maximizing resource usage in multifold
molecular dynamics with rCUDA . . . . . 5--19
Christian Ponte-Fernández and
Jorge González-Domínguez and
María J. Martín Fast search of third-order epistatic
interactions on CPU and GPU clusters . . 20--29
Baldomero Imbernón and
Antonio Llanes and
José-Matías Cutillas-Lozano and
Domingo Giménez HYPERDOCK: Improving virtual screening
through parallel hyperheuristics . . . . 30--41
Marta Garcia-Gasulla and
Filippo Mantovani and
Marc Josep-Fabrego and
Beatriz Eguzkitza and
Guillaume Houzeaux Runtime mechanisms to survive new HPC
architectures: a use case in human
respiratory simulations . . . . . . . . 42--56
César González and
Mariano Pérez and
Juan M. Orduña and
Javier Chaves and
Ana-Bárbara García HPG-HMapper: a DNA hydroxymethylation
analysis tool . . . . . . . . . . . . . 57--65
Akrem Benatia and
Weixing Ji and
Yizhuo Wang and
Feng Shi Sparse matrix partitioning for
optimizing SpMV on CPU--GPU
heterogeneous platforms . . . . . . . . 66--80
José M. Mantas and
Francesco Vecil Hybrid OpenMP--CUDA parallel
implementation of a deterministic solver
for ultrashort DG-MOSFETs . . . . . . . 81--102
Paweł Russek and
Ernest Jamro and
Agnieszka Dabrowska-Boruch and
Kazimierz Wiatr A study of the loops control for
reconfigurable computing with OpenCL in
the LABS local search problem . . . . . 103--114
Daobi Chen and
Liang Yuan and
Yunquan Zhang and
Jingfu Yan and
David Kahaner HPC software capability landscape in
China . . . . . . . . . . . . . . . . . 115--153
Jue Wang and
XinFu He Special issue on advanced simulation in
engineering . . . . . . . . . . . . . . 157--158
Xinming Qin and
Honghui Shang and
Lei Xu and
Wei Hu and
Jinlong Yang and
Shigang Li and
Yunquan Zhang The static parallel distribution
algorithms for hybrid density-functional
calculations in HONPAS package . . . . . 159--168
Xiaodong Hu and
Zhonghua Lu and
Jian Zhang and
Xiazhen Liu and
Wu Yuan and
Shan Liang and
Haikuo Zhang A parallel algorithm for chimera grid
with implicit hole cutting method . . . 169--177
Xianmeng Wang and
Zhifeng Zhou and
Changjun Hu and
Wen Yang and
Minfu Zhao and
Zhaoshun Wang and
Peng Shi Accelerating and tuning small matrix
multiplications on Sunway TaihuLight: a
case study of spectral element CFD Code
Nek5000 . . . . . . . . . . . . . . . . 178--186
Gary Lawson and
Masha Sosonkina and
Tal Ezer and
Yuzhong Shen Applying EMD/HHT analysis to power
traces of applications executed on
systems with Intel Xeon Phi . . . . . . 187--198
Mario Hernández and
Juan M. Cebrián and
José M. Cecilia and
José M. García Offloading strategies for Stencil
kernels on the KNC Xeon Phi
architecture: Accuracy versus
performance . . . . . . . . . . . . . . 199--207
Atsushi Hori and
Kazumi Yoshinaga and
Thomas Herault and
Aurélien Bouteiller and
George Bosilca and
Yutaka Ishikawa Overhead of using spare nodes . . . . . 208--226
Jean Luca Bez and
André Ramos Carneiro and
Pablo José Pavan and
Valéria Soldera Girelli and
Francieli Zanon Boito and
Bruno Alves Fagundes and
Carla Osthoff and
Pedro Leite da Silva Dias and
Jean-François Méhaut and
Philippe O. A. Navaux I/O performance of the Santos Dumont
supercomputer . . . . . . . . . . . . . 227--245
Louis-Claude Canon and
Aurélie Kong Win Chang and
Yves Robert and
Frédéric Vivien Scheduling independent stochastic tasks
under deadline and budget constraints 246--264
Roberto Porcù and
Edie Miglio and
Nicola Parolini and
Mattia Penati and
Noemi Vergopolan HPC simulations of brownout: a
noninteracting particles dynamic model 267--281
Byron E. Moutafis and
George A. Gravvanis and
Christos K. Filelis-Papadopoulos Hybrid multi-projection method using
sparse approximate inverses on GPU
clusters . . . . . . . . . . . . . . . . 282--305
Tieqiang Mo and
Renfa Li Iteratively solving sparse linear system
based on PaRSEC task scheduling . . . . 306--315
David Zwick and
S. Balachandar A scalable Euler--Lagrange approach for
multiphase flow simulation on spectral
elements . . . . . . . . . . . . . . . . 316--339
Sébastien Cayrols and
Iain S. Duff and
Florent Lopez Parallelization of the solve phase in a
task-based Cholesky solver using a
sequential task flow model . . . . . . . 340--356
J. C. S. Kadupitiya and
Geoffrey C. Fox and
Vikram Jadhao Machine learning for parameter
auto-tuning in molecular dynamics
simulations: Efficient dynamics of ions
near polarizable nanoparticles . . . . . 357--374
Kadir Akbudak and
Hatem Ltaief and
Vincent Etienne and
Rached Abdelkhalak and
Thierry Tonellot and
David Keyes Asynchronous computations for solving
the acoustic wave propagation equation 377--393
Yang Liu and
Wissam Sid-Lakhdar and
Elizaveta Rebrova and
Pieter Ghysels and
Xiaoye Sherry Li A parallel hierarchical blocked adaptive
cross approximation algorithm . . . . . 394--408
Tom Peterka and
Deborah Bard and
Janine C. Bennett and
E. Wes Bethel and
Ron A. Oldfield and
Line Pouchard and
Christine Sweeney and
Matthew Wolf Priority research directions for in situ
data management: Enabling scientific
discovery from diverse data sources . . 409--427
Francesco Cremonesi and
Georg Hager and
Gerhard Wellein and
Felix Schürmann Analytic performance modeling and
analysis of detailed neuron simulations 428--449
Kevin Verma and
Christopher McCabe and
Chong Peng and
Robert Wille A PCISPH implementation using
distributed multi-GPU acceleration for
simulating industrial engineering
applications . . . . . . . . . . . . . . 450--464
A. Grannan and
K. Sood and
B. Norris and
A. Dubey Understanding the landscape of
scientific software used on
high-performance computing platforms . . 465--477
Anonymous Thanks to Reviewers . . . . . . . . . . 478--478
Anonymous Corrigendum . . . . . . . . . . . . . . NP1--NP1
Anonymous Corrigendum . . . . . . . . . . . . . . NP2--NP2
Walid Keyrouz and
Michael Mascagni CRE2019 Special Issue Introduction
IJHPCA . . . . . . . . . . . . . . . . . 481--482
David H. Bailey Reproducibility and variable precision
computing . . . . . . . . . . . . . . . 483--490
Gregory Kiar and
Pablo de Oliveira Castro and
Pierre Rioux and
Eric Petit and
Shawn T. Brown and
Alan C. Evans and
Tristan Glatard Comparing perturbation models for
evaluating stability of neuroimaging
pipelines . . . . . . . . . . . . . . . 491--501
Roman Iakymchuk and
Maria Barreda Vayá and
Stef Graillat and
José I. Aliaga and
Enrique S. Quintana-Ortí Reproducibility of parallel
preconditioned conjugate gradient in
hybrid programming environments . . . . 502--518
Brett Neuman and
Andy Dubois and
Laura Monroe and
Robert W. Robey Fast, good, and repeatable: Summations,
vectorization, and reproducibility . . . 519--531
Srdan Nikolić and
Nenad Stevanović and
Milos Ivanović Optimizing parallel particle tracking in
Brownian motion using machine learning 532--546
Amanda Bienz and
William D. Gropp and
Luke N. Olson Reducing communication in algebraic
multigrid with multi-step node aware
communication . . . . . . . . . . . . . 547--561
Paul Fischer and
Misun Min and
Thilina Rathnayake and
Som Dutta and
Tzanio Kolev and
Veselin Dobrev and
Jean-Sylvain Camier and
Martin Kronbichler and
Tim Warburton and
Kasia Świrydowicz and
Jed Brown Scalability of high-performance PDE
solvers . . . . . . . . . . . . . . . . 562--586
James D. Stevens and
Andreas Klöckner A mechanism for balancing accuracy and
scope in cross-machine black-box GPU
performance modeling . . . . . . . . . . 589--614
Masaki Iwasawa and
Daisuke Namekata and
Ryo Sakamoto and
Takashi Nakamura and
Yasuyuki Kimura and
Keigo Nitadori and
Long Wang and
Miyuki Tsubouchi and
Jun Makino and
Zhao Liu and
Haohuan Fu and
Guangwen Yang Implementation and performance of
Barnes--Hut $n$-body algorithm on
extreme-scale heterogeneous many-core
architectures . . . . . . . . . . . . . 615--628
Tianjiao Sun and
Lawrence Mitchell and
Kaushik Kulkarni and
Andreas Klöckner and
David A. Ham and
Paul H. J. Kelly A study of vectorization for matrix-free
finite element methods . . . . . . . . . 629--644
Mohammed Al Farhan and
Ahmad Abdelfattah and
Stanimire Tomov and
Mark Gates and
Dalal Sukkari and
Azzam Haidar and
Robert Rosenberg and
Jack Dongarra MAGMA templates for scalable linear
algebra on emerging architectures . . . 645--658
Cristian Ramon-Cortes and
Ramon Amela and
Jorge Ejarque and
Philippe Clauss and
Rosa M. Badia AutoParallel: Automatic
parallelisation and distributed
execution of affine loop nests in Python 659--675
Hank Childs and
Sean D. Ahern and
James Ahrens and
Andrew C. Bauer and
Janine Bennett and
E. Wes Bethel and
Peer-Timo Bremer and
Eric Brugger and
Joseph Cottam and
Matthieu Dorier and
Soumya Dutta and
Jean M. Favre and
Thomas Fogal and
Steffen Frey and
Christoph Garth and
Berk Geveci and
William F. Godoy and
Charles D. Hansen and
Cyrus Harrison and
Bernd Hentschel and
Joseph Insley and
Chris R. Johnson and
Scott Klasky and
Aaron Knoll and
James Kress and
Matthew Larsen and
Jay Lofstead and
Kwan-Liu Ma and
Preeti Malakar and
Jeremy Meredith and
Kenneth Moreland and
Paul Navrátil and
Patrick O'Leary and
Manish Parashar and
Valerio Pascucci and
John Patchett and
Tom Peterka and
Steve Petruzza and
Norbert Podhorszki and
David Pugmire and
Michel Rasquin and
Silvio Rizzi and
David H. Rogers and
Sudhanshu Sane and
Franz Sauer and
Robert Sisneros and
Han-Wei Shen and
Will Usher and
Rhonda Vickery and
Venkatram Vishwanath and
Ingo Wald and
Ruonan Wang and
Gunther H. Weber and
Brad Whitlock and
Matthew Wolf and
Hongfeng Yu and
Sean B. Ziegeler A terminology for in situ visualization
and analysis systems . . . . . . . . . . 676--691
Roman Wyrzykowski and
Ewa Deelman Guest editor's note: Special issue on
application performance optimization in
the era of extreme heterogeneity . . . . 3--4
Dominik Ernst and
Georg Hager and
Jonas Thies and
Gerhard Wellein Performance engineering for real and
complex tall & skinny matrix
multiplication kernels on GPUs . . . . . 5--19
Krzysztof Jurczuk and
Marcin Czajkowski and
Marek Kretowski Fitness evaluation reuse for
accelerating GPU-based evolutionary
induction of decision trees . . . . . . 20--32
Krzysztof Rojek and
Kamil Halbiniak and
Lukasz Kuczynski CFD code adaptation to the FPGA
architecture . . . . . . . . . . . . . . 33--46
Hannah Morgan and
Patrick Sanan and
Matthew Knepley and
Richard Tran Mills Understanding performance variability in
standard and pipelined parallel Krylov
solvers . . . . . . . . . . . . . . . . 47--59
Andreas Pieper and
Georg Hager and
Holger Fehske A domain-specific language and
matrix-free stencil code for
investigating electronic properties of
Dirac and topological materials . . . . 60--77
Yutong Ye and
Hongyin Zhu and
Chaoying Zhang and
Binghai Wen Efficient graphic processing unit
implementation of the chemical-potential
multiphase lattice Boltzmann method . . 78--96
Bartosz Kohnke and
Carsten Kutzner and
Andreas Beckmann and
Gert Lube and
Ivo Kabadshow and
Holger Dachsel and
Helmut Grubmüller A CUDA fast multipole method with highly
efficient M2L far field evaluation . . . 97--117
Wenpeng Ma and
Xiao-Chuan Cai Point-block incomplete $LU$
preconditioning with asynchronous
iterations on GPU for multiphysics
problems . . . . . . . . . . . . . . . . 121--135
Ross Adelman Highly parallel boundary element method
for solving extremely large, wide-area
power-line models . . . . . . . . . . . 136--153
Weijian Zheng and
Dali Wang and
Fengguang Song Designing a parallel Feel-the-Way
clustering algorithm on HPC systems . . 154--169
Vedran Novaković and
Sanja Singer Implicit Hari--Zimmermann algorithm for
the generalized SVD on the GPUs . . . . 170--205
Peter Benner and
Enrique Quintana-Ortí and
Jens Saak Introduction to the Special Issue
related to the Power-Aware Computing
Workshop 2019-PACO 2019 . . . . . . . . 209--210
Jonas Dünnebacke and
Stefan Turek and
Christoph Lohmann and
Andriy Sokolov and
Peter Zajac Increased space-parallelism via
time-simultaneous Newton-multigrid
methods for nonstationary nonlinear PDE
problems . . . . . . . . . . . . . . . . 211--225
Pratik Nayak and
Terry Cojean and
Hartwig Anzt Evaluating asynchronous Schwarz solvers
on GPUs . . . . . . . . . . . . . . . . 226--236
Axel Klawonn and
Martin Lanser and
Oliver Rheinbach and
Gerhard Wellein and
Markus Wittmann Energy efficiency of nonlinear domain
decomposition methods . . . . . . . . . 237--253
Ernesto Dufrechou and
Pablo Ezzatti and
Enrique S. Quintana-Ortí Selecting optimal SpMV realizations for
GPUs via machine learning . . . . . . . 254--267
Maria Barreda and
Manuel F. Dolz and
M. Asunción Castaño Convolutional neural nets for estimating
the run time and energy consumption of
the sparse matrix--vector product . . . 268--281
Tommaso Benacchio and
Luca Bonaventura and
Mirco Altenbernd and
Chris D. Cantwell and
Peter D. Düben and
Mike Gillard and
Luc Giraud and
Dominik Göddeke and
Erwan Raffin and
Keita Teranishi and
Nils Wedi Resilience and fault tolerance in
high-performance computing for numerical
weather and climate prediction . . . . . 285--311
Nikolay Kondratyuk and
Vsevolod Nikolskiy and
Daniil Pavlov and
Vladimir Stegailov GPU-accelerated molecular dynamics:
State-of-art software performance and
porting from Nvidia CUDA to AMD HIP . . 312--324
Hatem Elshazly and
Francesc Lordan and
Jorge Ejarque and
Rosa M. Badia Accelerated execution via eager-release
of dependencies in task-based workflows 325--343
Ahmad Abdelfattah and
Hartwig Anzt and
Erik G. Boman and
Erin Carson and
Terry Cojean and
Jack Dongarra and
Alyson Fox and
Mark Gates and
Nicholas J. Higham and
Xiaoye S. Li and
Jennifer Loe and
Piotr Luszczek and
Srikara Pranesh and
Siva Rajamanickam and
Tobias Ribizel and
Barry F. Smith and
Kasia Świrydowicz and
Stephen Thomas and
Stanimire Tomov and
Yaohung M. Tsai and
Ulrike Meier Yang A survey of numerical linear algebra
methods utilizing mixed-precision
arithmetic . . . . . . . . . . . . . . . 344--369
Karl-Robert Wichmann and
Martin Kronbichler and
Rainald Löhner and
Wolfgang A. Wall A runtime based comparison of highly
tuned lattice Boltzmann and finite
difference solvers . . . . . . . . . . . 370--390
Shu-Mei Tseng and
Bogdan Nicolae and
Franck Cappello and
Aparna Chandramowlishwaran Demystifying asynchronous I/O
interference in HPC applications . . . . 391--412
Markus Holzer and
Martin Bauer and
Harald Köstler and
Ulrich Rüde Highly efficient lattice Boltzmann
multiphase simulations of immiscible
fluids at high-density ratios on CPUs
and GPUs through code generation . . . . 413--427
Bronis R. de Supinski Special Issue Introduction: The Gordon
Bell Special Prize for HPC-Based
COVID-19 Research Finalists . . . . . . 431--431
Lorenzo Casalino and
Abigail C. Dommer and
Zied Gaieb and
Emilia P. Barros and
Terra Sztain and
Surl-Hee Ahn and
Anda Trifan and
Alexander Brace and
Anthony T. Bogetti and
Austin Clyde and
Heng Ma and
Hyungro Lee and
Matteo Turilli and
Syma Khalid and
Lillian T. Chong and
Carlos Simmerling and
David J. Hardy and
Julio D. C. Maia and
James C. Phillips and
Thorsten Kurth and
Abraham C. Stern and
Lei Huang and
John D. McCalpin and
Mahidhar Tatineni and
Tom Gibbs and
John E. Stone and
Shantenu Jha and
Arvind Ramanathan and
Rommie E. Amaro AI-driven multiscale simulations
illuminate mechanisms of SARS-CoV-2
spike dynamics . . . . . . . . . . . . . 432--451
Jens Glaser and
Josh V. Vermaas and
David M. Rogers and
Jeff Larkin and
Scott LeGrand and
Swen Boehm and
Matthew B. Baker and
Aaron Scheinberg and
Andreas F. Tillack and
Mathialakan Thavappiragasam and
Ada Sedova and
Oscar Hernandez High-throughput virtual laboratory for
drug discovery using massive datasets 452--468
Sam Ade Jacobs and
Tim Moon and
Kevin McLoughlin and
Derek Jones and
David Hysom and
Dong H. Ahn and
John Gyllenhaal and
Pythagoras Watson and
Felice C. Lightstone and
Jonathan E. Allen and
Ian Karlin and
Brian Van Essen Enabling rapid COVID-19 small molecule
drug design through scalable deep
learning of generative models . . . . . 469--482
Jonathan Ozik and
Justin M. Wozniak and
Nicholson Collier and
Charles M. Macal and
Mickaël Binois A population data-driven workflow for
COVID-19 modeling and learning . . . . . 483--499
Timothy C. Germann Co-design in the Exascale Computing
Project . . . . . . . . . . . . . . . . 503--507
Weiqun Zhang and
Andrew Myers and
Kevin Gott and
Ann Almgren and
John Bell AMReX: Block-structured adaptive mesh
refinement for multiphysics applications 508--526
Tzanio Kolev and
Paul Fischer and
Misun Min and
Jack Dongarra and
Jed Brown and
Veselin Dobrev and
Tim Warburton and
Stanimire Tomov and
Mark S. Shephard and
Ahmad Abdelfattah and
Valeria Barra and
Natalie Beams and
Jean-Sylvain Camier and
Noel Chalmers and
Yohann Dudouit and
Ali Karakus and
Ian Karlin and
Stefan Kerkemeier and
Yu-Hsiang Lan and
David Medina and
Elia Merzari and
Aleksandr Obabko and
Will Pazner and
Thilina Rathnayake and
Cameron W. Smith and
Lukas Spies and
Kasia Świrydowicz and
Jeremy Thompson and
Ananias Tomboulides and
Vladimir Tomov Efficient exascale discretizations:
High-order finite element methods . . . 527--552
Seher Acer and
Ariful Azad and
Erik G. Boman and
Aydin Buluç and
Karen D. Devine and
S. M. Ferdous and
Nitin Gawande and
Sayan Ghosh and
Mahantesh Halappanavar and
Ananth Kalyanaraman and
Arif Khan and
Marco Minutoli and
Alex Pothen and
Sivasankaran Rajamanickam and
Oguz Selvitopi and
Nathan R. Tallent and
Antonino Tumeo EXAGRAPH: Graph and combinatorial
methods for enabling exascale
applications . . . . . . . . . . . . . . 553--571
Susan M. Mniszewski and
James Belak and
Jean-Luc Fattebert and
Christian F. A. Negre and
Stuart R. Slattery and
Adetokunbo A. Adedoyin and
Robert F. Bird and
Choongseok Chang and
Guangye Chen and
Stéphane Ethier and
Shane Fogerty and
Salman Habib and
Christoph Junghans and
Damien Lebrun-Grandié and
Jamaludin Mohd-Yusof and
Stan G. Moore and
Daniel Osei-Kuffuor and
Steven J. Plimpton and
Adrian Pope and
Samuel Temple Reeve and
Lee Ricketson and
Aaron Scheinberg and
Amil Y. Sharma and
Michael E. Wall Enabling particle applications for
exascale computing platforms . . . . . . 572--597
Francis J. Alexander and
James Ang and
Jenna A. Bilbrey and
Jan Balewski and
Tiernan Casey and
Ryan Chard and
Jong Choi and
Sutanay Choudhury and
Bert Debusschere and
Anthony M. DeGennaro and
Nikoli Dryden and
J. Austin Ellis and
Ian Foster and
Cristina Garcia Cardona and
Sayan Ghosh and
Peter Harrington and
Yunzhi Huang and
Shantenu Jha and
Travis Johnston and
Ai Kagawa and
Ramakrishnan Kannan and
Neeraj Kumar and
Zhengchun Liu and
Naoya Maruyama and
Satoshi Matsuoka and
Erin McCarthy and
Jamaludin Mohd-Yusof and
Peter Nugent and
Yosuke Oyama and
Thomas Proffen and
David Pugmire and
Sivasankaran Rajamanickam and
Vinay Ramakrishnaiah and
Malachi Schram and
Sudip K. Seal and
Ganesh Sivaraman and
Christine Sweeney and
Li Tan and
Rajeev Thakur and
Brian Van Essen and
Logan Ward and
Paul Welch and
Michael Wolf and
Sotiris S. Xantheas and
Kevin G. Yager and
Shinjae Yoo and
Byung-Jun Yoon Co-design Center for Exascale Machine
Learning Technologies (ExaLearn) . . . . 598--616
Ian Foster and
Mark Ainsworth and
Julie Bessac and
Franck Cappello and
Jong Choi and
Sheng Di and
Zichao Di and
Ali M. Gok and
Hanqi Guo and
Kevin A. Huck and
Christopher Kelly and
Scott Klasky and
Kerstin Kleese van Dam and
Xin Liang and
Kshitij Mehta and
Manish Parashar and
Tom Peterka and
Line Pouchard and
Tong Shu and
Ozan Tugluk and
Hubertus van Dam and
Lipeng Wan and
Matthew Wolf and
Justin M. Wozniak and
Wei Xu and
Igor Yakushin and
Shinjae Yoo and
Todd Munson Online data analysis and reduction: an
important Co-design motif for
extreme-scale computers . . . . . . . . 617--635
Thomas M. Evans and
Julia C. White Multiphysics coupling in the Exascale
Computing Project . . . . . . . . . . . 3--4
Thomas M. Evans and
Andrew Siegel and
Erik W. Draeger and
Jack Deslippe and
Marianne M. Francois and
Timothy C. Germann and
William E. Hart and
Daniel F. Martin A survey of software implementations
used by application codes in the
Exascale Computing Project . . . . . . . 5--12
John A. Turner and
James Belak and
Nathan Barton and
Matthew Bement and
Neil Carlson and
Robert Carson and
Stephen DeWitt and
Jean-Luc Fattebert and
Neil Hodge and
Zechariah Jibben and
Wayne King and
Lyle Levine and
Christopher Newman and
Alex Plotkowski and
Balasubramaniam Radhakrishnan and
Samuel Temple Reeve and
Matthew Rolchigo and
Adrian Sabau and
Stuart Slattery and
Benjamin Stump ExaAM: Metal additive manufacturing
simulation at the fidelity of the
microstructure . . . . . . . . . . . . . 13--39
Jordan Musser and
Ann S. Almgren and
William D. Fullmer and
Oscar Antepara and
John B. Bell and
Johannes Blaschke and
Kevin Gott and
Andrew Myers and
Roberto Porcu and
Deepak Rangarajan and
Michele Rosso and
Weiqun Zhang and
Madhava Syamlal MFIX-Exa: a path toward exascale CFD-DEM
simulations . . . . . . . . . . . . . . 40--58
J. Austin Harris and
Ran Chu and
Sean M. Couch and
Anshu Dubey and
Eirik Endeve and
Antigoni Georgiadou and
Rajeev Jain and
Daniel Kasen and
M. P. Laiu and
O. E. B. Messer and
Jared O'Neal and
Michael A. Sandoval and
Klaus Weide Exascale models of stellar explosions:
Quintessential multi-physics simulation 59--77
David McCallen and
Houjun Tang and
Suiwen Wu and
Eric Eckert and
Junfei Huang and
N. Anders Petersson Coupling of regional geophysics and
local soil-structure models in the EQSIM
fault-to-structure earthquake simulation
framework . . . . . . . . . . . . . . . 78--92
Matthew R. Norman and
David A. Bader and
Christopher Eldred and
Walter M. Hannah and
Benjamin R. Hillman and
Christopher R. Jones and
Jungmin M. Lee and
L. R. Leung and
Isaac Lyngaas and
Kyle G. Pressel and
Sarat Sreepathi and
Mark A. Taylor and
Xingqiu Yuan Unprecedented cloud resolution in a
GPU-enabled full-physics atmospheric
climate simulation on OLCF's Summit
supercomputer . . . . . . . . . . . . . 93--105
Eric Suchyta and
Scott Klasky and
Norbert Podhorszki and
Matthew Wolf and
Abolaji Adesoji and
C. S. Chang and
Jong Choi and
Philip E. Davis and
Julien Dominski and
Stéphane Ethier and
Ian Foster and
Kai Germaschewski and
Berk Geveci and
Chris Harris and
Kevin A. Huck and
Qing Liu and
Jeremy Logan and
Kshitij Mehta and
Gabriele Merlo and
Shirley V. Moore and
Todd Munson and
Manish Parashar and
David Pugmire and
Mark S. Shephard and
Cameron W. Smith and
Pradeep Subedi and
Lipeng Wan and
Ruonan Wang and
Shuangxi Zhang The Exascale Framework for High Fidelity
coupled Simulations (EFFIS): Enabling
whole device modeling in fusion science 106--128
John A. Taylor and
Pablo Larraondo and
Bronis R. de Supinski Data-driven global weather predictions
at high resolutions . . . . . . . . . . 130--140
Pascal R. Bähr and
Bruno Lang and
Peer Ueberholz and
Marton Ady and
Roberto Kersevan Development of a hardware-accelerated
simulation kernel for ultra-high vacuum
with Nvidia RTX GPUs . . . . . . . . . . 141--152
Giovanni Isotton and
Carlo Janna and
Massimo Bernaschi A GPU-accelerated adaptive FSAI
preconditioner for massively parallel
simulations . . . . . . . . . . . . . . 153--166
Zhi Yao and
Revathi Jambunathan and
Yadong Zeng and
Andrew Nonaka A massively parallel time-domain coupled
electrodynamics-micromagnetics solver 167--181
Yuta Hirokawa and
Atsushi Yamada and
Shunsuke Yamada and
Masashi Noda and
Mitsuharu Uemoto and
Taisuke Boku and
Kazuhiro Yabana Large-scale ab initio simulation of
light-matter interaction at the atomic
scale in Fugaku . . . . . . . . . . . . 182--197
Mojtaba Barzegari and
Liesbet Geris Highly scalable numerical simulation of
coupled reaction--diffusion systems with
moving interfaces . . . . . . . . . . . 198--213
Isaac Lyngaas and
Matthew Norman and
Youngsung Kim SAM++: Porting the E3SM-MMF cloud
resolving model using a C++ portability
library . . . . . . . . . . . . . . . . 214--230
Leigh Lapworth Parallel encryption of input and output
data for HPC applications . . . . . . . 231--250
Emmanuel Agullo and
Mirco Altenbernd and
Hartwig Anzt and
Leonardo Bautista-Gomez and
Tommaso Benacchio and
Luca Bonaventura and
Hans-Joachim Bungartz and
Sanjay Chatterjee and
Florina M. Ciorba and
Nathan DeBardeleben and
Daniel Drzisga and
Sebastian Eibl and
Christian Engelmann and
Wilfried N. Gansterer and
Luc Giraud and
Dominik Göddeke and
Marco Heisig and
Fabienne Jézéquel and
Nils Kohl and
Xiaoye Sherry Li and
Romain Lion and
Miriam Mehl and
Paul Mycek and
Michael Obersteiner and
Enrique S. Quintana-Ortí and
Francesco Rizzi and
Ulrich Rüde and
Martin Schulz and
Fred Fung and
Robert Speck and
Linda Stals and
Keita Teranishi and
Samuel Thibault and
Dominik Thönnes and
Andreas Wagner and
Barbara Wohlmuth Resiliency in numerical algorithm design
for extreme scale simulations . . . . . 251--285
Stephen Herbein and
Tapasya Patki and
Dong H. Ahn and
Sebastian Mobo and
Clark Hathaway and
Silvina Caíno-Lores and
James Corbett and
David Domyancic and
Thomas R. W. Scogland and
Bronis R. de Supinski and
Michela Taufer An analytical performance model of
generalized hierarchical scheduling . . 289--306
Roel Van Beeumen and
Khaled Z. Ibrahim and
Gregory D. Kahanamoku-Meyer and
Norman Y. Yao and
Chao Yang Enhancing scalability of a matrix-free
eigensolver for studying many-body
localization . . . . . . . . . . . . . . 307--319
Michiel Van Gendt and
Tim Besard and
Stefaan Vandenberghe and
Bjorn De Sutter Productively accelerating positron
emission tomography image reconstruction
on graphics processing units with Julia 320--336
Jatin Gharat and
Bipin Kumar and
Leena Ragha and
Amit Barve and
Shaik Mohammad Jeelani and
John Clyne Development of NCL equivalent serial and
parallel Python routines for
meteorological data analysis . . . . . . 337--355
Adrian P. Dieguez and
Margarita Amor and
Ramón Doallo and
Akira Nukada and
Satoshi Matsuoka Efficient high-precision integer
multiplication on the GPU . . . . . . . 356--369
Michael R. Wyatt II and
Stephen Herbein and
Todd Gamblin and
Michela Taufer AI4IO: a suite of AI-based tools for
IO-aware scheduling . . . . . . . . . . 370--387
Heather Pacella and
Alec Dunton and
Alireza Doostan and
Gianluca Iaccarino Task-parallel in situ temporal
compression of large-scale computational
fluid dynamics data . . . . . . . . . . 388--418
Pablo Antonio Martínez and
Biagio Peccerillo and
Sandro Bartolini and
José M. García and
Gregorio Bernabé Performance portability in a real world
application: PHAST applied to Caffe . . 419--439
Andrew Kassen and
Varun Shankar and
Aaron L. Fogelson A fine-grained parallelization of the
immersed boundary method . . . . . . . . 443--458
Dustin Ruda and
Stefan Turek and
Dirk Ribbrock and
Peter Zajac Very fast finite element Poisson solvers
on lower precision accelerator hardware:
a proof of concept study for Nvidia
Tesla V100 . . . . . . . . . . . . . . . 459--474
Hiroyuki Ootomo and
Rio Yokota Recovering single precision accuracy
from Tensor Cores while surpassing the
FP32 theoretical peak performance . . . 475--491
Arturo Vargas and
Thomas M. Stitt and
Kenneth Weiss and
Vladimir Z. Tomov and
Jean-Sylvain Camier and
Tzanio Kolev and
Robert N. Rieben Matrix-free approaches for GPU
acceleration of a high-order finite
element hydrodynamics application using
MFEM, Umpire, and RAJA . . . . . . . . . 492--509
R. Lily Hu and
Damien Pierce and
Yusef Shafi and
Anudhyan Boral and
Vladimir Anisimov and
Sella Nevo and
Yi-fan Chen Accelerating physics simulations with
tensor processing units: an inundation
modeling example . . . . . . . . . . . . 510--523
Marcin Rogowski and
Lisandro Dalcin and
Matteo Parsani and
David E. Keyes Performance analysis of relaxation
Runge--Kutta methods . . . . . . . . . . 524--542
Sebastian Friedemann and
Bruno Raffin An elastic framework for ensemble-based
large-scale data assimilation . . . . . 543--563
Anonymous Corrigendum to `Unprecedented cloud
resolution in a GPU-enabled full-physics
atmospheric climate simulation on OLCF's
Summit supercomputer' . . . . . . . . . 564
Anonymous Special issue introduction . . . . . . . 567
Kazuto Ando and
Rahul Bale and
ChungGang Li and
Satoshi Matsuoka and
Keiji Onishi and
Makoto Tsubokura Digital transformation of
droplet/aerosol infection risk
assessment realized on ``Fugaku'' for
the fight against COVID-19 . . . . . . . 568--586
Andrew E. Blanchard and
John Gounley and
Debsindhu Bhowmik and
Mayanka Chandra Shekar and
Isaac Lyngaas and
Shang Gao and
Junqi Yin and
Aristeidis Tsaris and
Feiyi Wang and
Jens Glaser Language models for the prediction of
SARS-CoV-2 inhibitors . . . . . . . . . 587--602
Anda Trifan and
Defne Gorgun and
Michael Salim and
Zongyi Li and
Alexander Brace and
Maxim Zvyagin and
Heng Ma and
Austin Clyde and
David Clark and
David J. Hardy and
Tom Burnley and
Lei Huang and
John McCalpin and
Murali Emani and
Hyenseung Yoo and
Junqi Yin and
Aristeidis Tsaris and
Vishal Subbiah and
Tanveer Raza and
Jessica Liu and
Noah Trebesch and
Geoffrey Wells and
Venkatesh Mysore and
Thomas Gibbs and
James Phillips and
S. Chakra Chennubhotla and
Ian Foster and
Rick Stevens and
Anima Anandkumar and
Venkatram Vishwanath and
John E. Stone and
Emad Tajkhorshid and
Sarah A. Harris and
Arvind Ramanathan Intelligent resolution: Integrating
Cryo-EM with AI-driven multi-resolution
simulations to observe the severe acute
respiratory syndrome coronavirus-2
replication-transcription machinery in
action . . . . . . . . . . . . . . . . . 603--623
Mark Parsons Special issue: Introduction . . . . . . 3
Parantapa Bhattacharya and
Jiangzhuo Chen and
Stefan Hoops and
Dustin Machi and
Bryan Lewis and
Srinivasan Venkatramanan and
Mandy L. Wilson and
Brian Klahn and
Aniruddha Adiga and
Benjamin Hurt and
Joseph Outten and
Abhijin Adiga and
Andrew Warren and
Young Yun Baek and
Przemyslaw Porebski and
Achla Marathe and
Dawen Xie and
Samarth Swarup and
Anil Vullikanti and
Henning Mortveit and
Stephen Eubank and
Christopher L. Barrett and
Madhav Marathe Data-driven scalable pipeline using
national agent-based models for
real-time pandemic response and decision
support . . . . . . . . . . . . . . . . 4--27
Abigail Dommer and
Lorenzo Casalino and
Fiona Kearns and
Mia Rosenfeld and
Nicholas Wauer and
Surl-Hee Ahn and
John Russo and
Sofia Oliveira and
Clare Morris and
Anthony Bogetti and
Anda Trifan and
Alexander Brace and
Terra Sztain and
Austin Clyde and
Heng Ma and
Chakra Chennubhotla and
Hyungro Lee and
Matteo Turilli and
Syma Khalid and
Teresa Tamayo-Mendoza and
Matthew Welborn and
Anders Christensen and
Daniel GA Smith and
Zhuoran Qiao and
Sai K. Sirumalla and
Michael O'Connor and
Frederick Manby and
Anima Anandkumar and
David Hardy and
James Phillips and
Abraham Stern and
Josh Romero and
David Clark and
Mitchell Dorrell and
Tom Maiden and
Lei Huang and
John McCalpin and
Christopher Woods and
Alan Gray and
Matt Williams and
Bryan Barker and
Harinda Rajapaksha and
Richard Pitts and
Tom Gibbs and
John Stone and
Daniel M. Zuckerman and
Adrian J. Mulholland and
Thomas Miller and
Shantenu Jha and
Arvind Ramanathan and
Lillian Chong and
Rommie E. Amaro #COVIDisAirborne: AI-enabled multiscale
computational microscopy of delta
SARS-CoV-2 in a respiratory aerosol . . 28--44
Zhe Li and
Chengkun Wu and
Yishui Li and
Runduo Liu and
Kai Lu and
Ruibo Wang and
Jie Liu and
Chunye Gong and
Canqun Yang and
Xin Wang and
Chang-Guo Zhan and
Hai-Bin Luo Free energy perturbation-based
large-scale virtual screening for
effective drug discovery against
COVID-19 . . . . . . . . . . . . . . . . 45--57
Martin Kronbichler and
Dmytro Sashko and
Peter Munch Enhancing data locality of the conjugate
gradient method for high-order
matrix-free finite-element
implementations . . . . . . . . . . . . 61--81
José I. Aliaga and
Hartwig Anzt and
Thomas Grützmacher and
Enrique S. Quintana-Ortí and
Andrés E. Tomás Compressed basis GMRES on
high-performance graphics processing
units . . . . . . . . . . . . . . . . . 82--100
John M. Dennis and
Allison H. Baker and
Brian Dobbins and
Michael M. Bell and
Jian Sun and
Youngsung Kim and
Ting-Yu Cha Enabling efficient execution of a
variational data assimilation
application . . . . . . . . . . . . . . 101--114
Marc T. Henry de Frahan and
Jon S. Rood and
Marc S. Day and
Hariswaran Sitaraman and
Shashank Yellapantula and
Bruce A. Perry and
Ray W. Grout and
Ann Almgren and
Weiqun Zhang and
John B. Bell and
Jacqueline H. Chen PeleC: an adaptive mesh refinement
solver for compressible reacting flows 115--131
Long Qu and
Rached Abdelkhalak and
Hatem Ltaief and
Issam Said and
David Keyes Exploiting temporal data reuse and
asynchrony in the reverse time migration 132--150
Jakub Šístek and
Tomáš Oberhuber Acceleration of a parallel BDDC solver
by using graphics processing units on
subdomains . . . . . . . . . . . . . . . 151--164
Florent Lopez and
Theo Mary Mixed precision $ L U $ factorization on
GPU tensor cores: reducing data movement
and memory footprint . . . . . . . . . . 165--179
Lukas Einkemmer and
Alexander Moriggl Semi-Lagrangian 4d, 5d, and 6d kinetic
plasma simulation on large-scale
GPU-equipped supercomputers . . . . . . 180--196
Jacques Middlecoff and
Yonggang G. Yu and
Mark W. Govett Performance comparison of the A-grid and
C-grid shallow-water models on
icosahedral grids . . . . . . . . . . . 197--208
Jack Dongarra and
Bernard Tourancheau Guest editors note: Special issue on
clusters, clouds, and data for
scientific computing . . . . . . . . . . 211--212
Emmanuel Jeannot and
Guillaume Pallez and
Nicolas Vidal IO-aware Job-Scheduling: Exploiting the
Impacts of Workload Characterizations to
select the Mapping Strategy . . . . . . 213--228
Piotr Luszczek and
Wissam M. Sid-Lakhdar and
Jack Dongarra Combining multitask and transfer
learning with deep Gaussian processes
for autotuning-based performance
engineering . . . . . . . . . . . . . . 229--244
Satoshi Matsuoka and
Jens Domke and
Mohamed Wahib and
Aleksandr Drozd and
Torsten Hoefler Myths and legends in high-performance
computing . . . . . . . . . . . . . . . 245--259
Naweiluo Zhou and
Giorgio Scorzelli and
Jakob Luettgau and
Rahul R. Kancharla and
Joshua J. Kane and
Robert Wheeler and
Brendan P. Croom and
Pania Newell and
Valerio Pascucci and
Michela Taufer Orchestration of materials science
workflows for heterogeneous resources at
large scale . . . . . . . . . . . . . . 260--271
Jorge Ejarque and
Rosa M. Badia Automatizing the creation of specialized
high-performance computing containers 272--287
Sadaf R. Alam and
Miguel Gila and
Mark Klein and
Maxime Martinasso and
Thomas C. Schulthess Versatile software-defined HPC and cloud
clusters on Alps supercomputer for
diverse workflows . . . . . . . . . . . 288--305
Sanjukta Bhowmick and
Patrick Bell and
Michela Taufer A Survey of Graph Comparison Methods
with Applications to Nondeterminism in
High-Performance Computing . . . . . . . 306--327
Vladimir Ostapenco and
Laurent Lefèvre and
Anne-Cécile Orgerie and
Benjamin Fichel Modeling, evaluating, and orchestrating
heterogeneous environmental leverages
for large-scale data center management 328--350
Jeffrey S. Vetter and
Prasanna Date and
Farah Fahim and
Shruti R. Kulkarni and
Petro Maksymovych and
A. Alec Talin and
Marc Gonzalez Tallada and
Pruek Vanna-iampikul and
Aaron R. Young and
David Brooks and
Yu Cao and
Gu-Yeon Wei and
Sung Kyu Lim and
Frank Liu and
Matthew Marinella and
Bobby Sumpter and
Narasinga Rao Miniskar Abisko: Deep codesign of an architecture
for spiking neural networks using novel
neuromorphic materials . . . . . . . . . 351--379
Andrés E. Tomás and
Enrique S. Quintana-Ortí and
Hartwig Anzt Fast truncated SVD of sparse and dense
matrices on graphics processors . . . . 380--393
Hongwei Jin and
Krishnan Raghavan and
George Papadimitriou and
Cong Wang and
Anirban Mandal and
Mariam Kiran and
Ewa Deelman and
Prasanna Balaprakash Graph neural networks for detecting
anomalies in scientific workflows . . . 394--411
Robert Underwood and
Julie Bessac and
David Krasowska and
Jon C. Calhoun and
Sheng Di and
Franck Cappello Black-box statistical prediction of
lossy compression ratios for scientific
data . . . . . . . . . . . . . . . . . . 412--433
Olga Pearce and
Stephanie Brink Finding the forest in the trees:
Enabling performance optimization on
heterogeneous architectures through data
science analysis of ensemble performance
data . . . . . . . . . . . . . . . . . . 434--441
Sabra Ossen and
Jeremy Musser and
Luke Dalessandro and
Martin Swany INDIANA --- In-Network Distributed
Infrastructure for Advanced Network
Applications . . . . . . . . . . . . . . 442--461
Philipp Grete and
Joshua C. Dolence and
Jonah M. Miller and
Joshua Brown and
Ben Ryan and
Andrew Gaspar and
Forrest Glines and
Sriram Swaminarayan and
Jonas Lippuner and
Clell J. Solomon and
Galen Shipman and
Christoph Junghans and
Daniel Holladay and
James M. Stone and
Luke F. Roberts Parthenon --- a performance portable
block-structured adaptive mesh
refinement framework . . . . . . . . . . 465--486
Martin Karp and
Daniele Massaro and
Niclas Jansson and
Alistair Hart and
Jacob Wahlgren and
Philipp Schlatter and
Stefano Markidis Large-Scale direct numerical simulations
of turbulence using GPUs and modern
Fortran . . . . . . . . . . . . . . . . 487--502
Sergio Iserte and
Alejandro González-Barberá and
Paloma Barreda and
Krzysztof Rojek A study on the performance of
distributed training of data-driven CFD
simulations . . . . . . . . . . . . . . 503--515
He Bai and
Changjun Hu and
Yuhan Zhu and
Dandan Chen and
Genshen Chu and
Shuai Ren Accelerating cluster dynamics simulation
of fission gas behavior in nuclear fuel
on deep computing unit-based
heterogeneous architecture supercomputer 516--529
Robert Schade and
Tobias Kenter and
Hossam Elgabarty and
Michael Lass and
Thomas D. Kühne and
Christian Plessl Breaking the exascale barrier for the
electronic structure problem in
ab-initio molecular dynamics . . . . . . 530--538
Megan Hickman Fulp and
Dakota Fulp and
Changfeng Zou and
Cooper Sanders and
Ayan Biswas and
Melissa C. Smith and
Jon C. Calhoun Accelerated dynamic data reduction using
spatial and temporal properties . . . . 539--559
Noel Chalmers and
Abhishek Mishra and
Damon McDougall and
Tim Warburton HipBone: a performance-portable graphics
processing unit-accelerated C++ version
of the NekBone benchmark . . . . . . . . 560--577
Will Pazner and
Tzanio Kolev and
Jean-Sylvain Camier End-to-end GPU acceleration of
low-order-refined preconditioning for
high-order finite element
discretizations . . . . . . . . . . . . 578--599
Jerry Watkins and
Max Carlson and
Kyle Shan and
Irina Tezaur and
Mauro Perego and
Luca Bertagna and
Carolyn Kao and
Matthew J. Hoffman and
Stephen F. Price Performance portable ice-sheet modeling
with MALI . . . . . . . . . . . . . . . 600--625
Marc Gonzalez Tallada and
Enric Morancho Heterogeneous programming using OpenMP
and CUDA/HIP for hybrid CPU-GPU
scientific applications . . . . . . . . 626--646
Edmond Chow Editorial . . . . . . . . . . . . . . . 649
Jie Chen and
Zhiwei Nie and
Yu Wang and
Kai Wang and
Fan Xu and
Zhiheng Hu and
Bing Zheng and
Zhennan Wang and
Guoli Song and
Jingyi Zhang and
Jie Fu and
Xiansong Huang and
Zhongqi Wang and
Zhixiang Ren and
Qiankun Wang and
Daixi Li and
Dongqing Wei and
Bin Zhou and
Chao Yang and
Yonghong Tian Running ahead of evolution-AI-based
simulation for predicting future
high-risk SARS-CoV-2 variants . . . . . 650--665
Darren J. Hsu and
Hao Lu and
Aditya Kashi and
Michael Matheson and
John Gounley and
Feiyi Wang and
Wayne Joubert and
Jens Glaser TwoFold: Highly accurate structure and
affinity prediction for protein-ligand
complexes from sequences . . . . . . . . 666--682
Maxim Zvyagin and
Alexander Brace and
Kyle Hippe and
Yuntian Deng and
Bin Zhang and
Cindy Orozco Bohorquez and
Austin Clyde and
Bharat Kale and
Danilo Perez-Rivera and
Heng Ma and
Carla M. Mann and
Michael Irvin and
Defne G. Ozgulbas and
Natalia Vassilieva and
James Gregory Pauloski and
Logan Ward and
Valerie Hayot-Sasson and
Murali Emani and
Sam Foreman and
Zhen Xie and
Diangen Lin and
Maulik Shukla and
Weili Nie and
Josh Romero and
Christian Dallago and
Arash Vahdat and
Chaowei Xiao and
Thomas Gibbs and
Ian Foster and
James J. Davis and
Michael E. Papka and
Thomas Brettin and
Rick Stevens and
Anima Anandkumar and
Venkatram Vishwanath and
Arvind Ramanathan GenSLMs: Genome-scale language models
reveal SARS-CoV-2 evolutionary dynamics 683--705
Roman Wyrzykowski and
Ewa Deelman Guest Editor's note: Special issue on
challenges and solutions for porting
applications to next-generation high
performance computing systems . . . . . 3--4
Daniel Langr and
Tomáš Dytrych Parallel multithreaded deduplication of
data sequences in nuclear structure
calculations . . . . . . . . . . . . . . 5--16
Roman Iakymchuk and
Stef Graillat and
José I. Aliaga General framework for re-assuring
numerical reliability in parallel Krylov
solvers: a case of bi-conjugate gradient
stabilized methods . . . . . . . . . . . 17--33
Daniil Pavlov and
Vladislav Galigerov and
Daniil Kolotinskii and
Vsevolod Nikolskiy and
Vladimir Stegailov GPU-based molecular dynamics of fluid
flows: Reaching for turbulence . . . . . 34--49
Jesus Carretero and
Estela Suarez and
Martin Schulz Malleability techniques applications in
high-performance computing . . . . . . . 53--54
Rafael Rodríguez-Sánchez and
Adrián Castelló and
Sandra Catalán and
Francisco D. Igual and
Enrique S. Quintana-Ortí Experiences with nested parallelism in
task-parallel applications using
malleable BLAS on multicore processors 55--68
Iker Martín-Álvarez and
José I. Aliaga and
Maribel Castillo and
Sergio Iserte and
Rafael Mayo Dynamic spawning of MPI processes
applied to malleability . . . . . . . . 69--93
Joel Criado and
Victor Lopez and
Joan Vinyals-Ylla-Catala and
Guillem Ramirez-Miranda and
Xavier Teruel and
Marta Garcia-Gasulla Role-shifting threads: Increasing OpenMP
malleability to address load imbalance
at MPI and OpenMP . . . . . . . . . . . 94--107
Alberto Cascajo and
David E. Singh and
Jesús Carretero Detecting interference between
applications and improving the
scheduling using malleable application
clones . . . . . . . . . . . . . . . . . 108--133
Natsuki Hosono and
Mikito Furuichi Efficient implementation of
low-order-precision smoothed particle
hydrodynamics . . . . . . . . . . . . . 137--153
Anders Melander and
Emil Strøm and
Finnur Pind and
Allan P. Engsig-Karup and
Cheol-Ho Jeong and
Tim Warburton and
Noel Chalmers and
Jan S. Hesthaven Massively parallel nodal discontinuous
Galerkin finite element method simulator
for room acoustics . . . . . . . . . . . 154--174
Hyun-Gyu Kang and
Raymond S. Tuminaro and
Andrey Prokopenko and
Seth R. Johnson and
Andrew G. Salinger and
Katherine J. Evans An implicit barotropic mode solver for
MPAS-ocean using a modern Fortran solver
interface . . . . . . . . . . . . . . . 175--191
Peter Munch and
Martin Kronbichler Cache-optimized and low-overhead
implementations of additive Schwarz
methods for high-order FEM multigrid
computations . . . . . . . . . . . . . . 192--209
Kadir Akbudak Hypergraph-based locality-enhancing
methods for graph operations in Big Data
applications . . . . . . . . . . . . . . 210--224
Yuxi Hong and
Hatem Ltaief and
Matteo Ravasi and
David Keyes High performance computing seismic
redatuming by inversion with algebraic
compression and multiple precisions . . 225--244
Bingxin Wei and
Yizhuo Wang and
Fangli Chang and
Jianhua Gao and
Weixing Ji Predicting optimal sparse general
matrix--matrix multiplication algorithm
on GPUs . . . . . . . . . . . . . . . . 245--259
John J. Loffeld and
Andy Nonaka and
Daniel R. Reynolds and
David J. Gardner and
Carol S. Woodward Performance of explicit and IMEX MRI
multirate methods on complex reactive
flow problems within modern parallel
adaptive structured grid frameworks . . 263--281
Daniel S. Abdi and
Isidora Jankov Accelerating atmospheric physics
parameterizations using graphics
processing units . . . . . . . . . . . . 282--296
Hiroyuki Ootomo and
Katsuhisa Ozaki and
Rio Yokota DGEMM on integer matrix multiplication
unit . . . . . . . . . . . . . . . . . . 297--313
Qianxiang Ma and
Rio Yokota An inherently parallel $ {\cal H}^2$-ULV
factorization for solving dense linear
systems on GPUs . . . . . . . . . . . . 314--336
Misun Min and
Michael Brazell and
Ananias Tomboulides and
Matthew Churchfield and
Paul Fischer and
Michael Sprague Towards exascale for wind energy
simulations . . . . . . . . . . . . . . 337--355
Mrinalgouda Patil and
Ravi Lumba and
Buvana Jayaraman and
Anubhav Datta An integrated three-dimensional
aeromechanical analysis for the
prediction of stresses on modern coaxial
rotors . . . . . . . . . . . . . . . . . 356--376
Anonymous Retraction Notice: Azzam Haidar and
Tingxing Dong and Piotr Luszczek and
Stanimire Tomov and Jack Dongarra,
Batched matrix computations on
hardware accelerators based on GPUs,
Int. J. High Perform. Comput. Appl.
29(2) 193--208 (2015) . . . . . . . . . 377
Michael A. Heroux and
Lois Curfman McInnes and
James Ahrens and
Todd Gamblin and
Timothy C. Germann and
Xiaoye Sherry Li and
Kathryn Mohror and
Todd Munson and
Sameer Shende and
Rajeev Thakur and
Jeffrey Vetter and
James Willenbring ECP libraries and tools: an overview . . 381--408
Pedro Valero-Lara and
Seyong Lee and
Marc Gonzalez-Tallada and
Joel Denny and
Keita Teranishi and
Jeffrey S. Vetter Enhancing Kokkos with OpenACC . . . . . 409--426
Joel E. Denny and
Seyong Lee and
Pedro Valero-Lara and
Marc Gonzalez-Tallada and
Keita Teranishi and
Jeffrey S. Vetter Clacc: OpenACC for C/C++ in Clang . . . 427--446
Julian Andrej and
Nabil Atallah and
Jan-Phillip Bäcker and
Jean-Sylvain Camier and
Dylan Copeland and
Veselin Dobrev and
Yohann Dudouit and
Tobias Duswald and
Brendan Keith and
Dohyun Kim and
Tzanio Kolev and
Boyan Lazarov and
Ketan Mittal and
Will Pazner and
Socratis Petrides and
Syun'ichi Shiraiwa and
Mark Stowell and
Vladimir Tomov High-performance finite elements with
MFEM . . . . . . . . . . . . . . . . . . 447--467
Ahmad Abdelfattah and
Natalie Beams and
Robert Carson and
Pieter Ghysels and
Tzanio Kolev and
Thomas Stitt and
Arturo Vargas and
Stanimire Tomov and
Jack Dongarra MAGMA: Enabling exascale performance
with accelerated BLAS and LAPACK for
diverse GPU architectures . . . . . . . 468--490
David E. Bernholdt and
George Bosilca and
Aurelien Bouteiller and
Ron Brightwell and
Jan Ciesko and
Matthew GF Dosanjh and
Giorgis Georgakoudis and
Ignacio Laguna and
Scott Levy and
Thomas Naughton and
Stephen L. Olivier and
Howard P. Pritchard and
Whit Schonbein and
Joseph Schuchart and
Amir Shehata Taking the MPI standard and the open MPI
library to exascale . . . . . . . . . . 491--507
Kenneth Moreland and
Tushar M. Athawale and
Vicente Bolea and
Mark Bolstad and
Eric Brugger and
Hank Childs and
Axel Huebl and
Li-Ta Lo and
Berk Geveci and
Nicole Marsaglia and
Sujin Philip and
David Pugmire and
Silvio Rizzi and
Zhe Wang and
Abhishek Yenpure Visualization at exascale: Making it all
work with VTK-m . . . . . . . . . . . . 508--526
Hui Zhou and
Ken Raffenetti and
Yanfei Guo and
Thomas Gillis and
Robert Latham and
Rajeev Thakur Designing and prototyping extensions to
the Message Passing Interface in MPICH 527--545
Mahesh Lakshminarasimhan and
Oscar Antepara and
Tuowen Zhao and
Benjamin Sepanski and
Protonu Basu and
Hans Johansen and
Mary Hall and
Samuel Williams Bricks: a high-performance portability
layer for computations on
block-structured grids . . . . . . . . . 549--567
Terry Cojean and
Pratik Nayak and
Tobias Ribizel and
Natalie Beams and
Yu-Hsiang Mike Tsai and
Marcel Koch and
Fritz Göbel and
Thomas Grützmacher and
Hartwig Anzt Ginkgo --- a math library designed to
accelerate Exascale Computing Project
science applications . . . . . . . . . . 568--584
Wajih Boukaram and
Yuxi Hong and
Yang Liu and
Tianyi Shi and
Xiaoye S. Li Batched sparse direct solver design and
evaluation in SuperLU_DIST . . . . . . . 585--598
Andrew Myers and
Weiqun Zhang and
Ann Almgren and
Thierry Antoun and
John Bell and
Axel Huebl and
Alexander Sinn AMReX and pyAMReX: Looking beyond the
exascale computing project . . . . . . . 599--611
Laksono Adhianto and
Jonathon Anderson and
Robert Matthew Barnett and
Dragana Grbic and
Vladimir Indic and
Mark Krentel and
Yumeng Liu and
Srđan Milaković and
Wileam Phan and
John Mellor-Crummey Refining HPCToolkit for application
performance analysis at exascale . . . . 612--632
Hengrui Luo and
Younghyun Cho and
James W. Demmel and
Igor Kozachenko and
Xiaoye S. Li and
Yang Liu Non-smooth Bayesian optimization in
tuning scientific applications . . . . . 633--657
Weijian Zheng and
Jack Kordas and
Tyler J. Skluzacek and
Raj Kettimuthu and
Ian Foster Globus service enhancements for exascale
applications and facilities . . . . . . 658--670
Piotr Luszczek and
Anthony Castaldo and
Yaohung M. Tsai and
Daniel Mishler and
Jack Dongarra Numerical eigen-spectrum slicing,
accurate orthogonal eigen-basis, and
mixed-precision eigenvalue refinement
using OpenMP data-dependent tasks and
accelerator offload . . . . . . . . . . 671--691
Mark Gates and
Ahmad Abdelfattah and
Kadir Akbudak and
Mohammed Al Farhan and
Rabab Alomairy and
Daniel Bielich and
Treece Burgess and
Sébastien Cayrols and
Neil Lindquist and
Dalal Sukkari and
Asim YarKhan Evolution of the SLATE linear algebra
library . . . . . . . . . . . . . . . . 3--17
Lisa Claus and
Pieter Ghysels and
Wajih Halim Boukaram and
Xiaoye Sherry Li A graphics processing unit accelerated
sparse direct solver and preconditioner
with block low rank compression . . . . 18--31
James Ahrens and
Marco Arienti and
Utkarsh Ayachit and
Janine Bennett and
Roba Binyahib and
Ayan Biswas and
Peer-Timo Bremer and
Eric Brugger and
Roxana Bujack and
Hamish Carr and
Jieyang Chen and
Hank Childs and
Soumya Dutta and
Abdelilah Essiari and
Berk Geveci and
Cyrus Harrison and
Subhashis Hazarika and
Megan Hickman Fulp and
Petar Hristov and
Xuan Huang and
Joseph Insley and
Yuya Kawakami and
Chloe Keilers and
James Kress and
Matthew Larsen and
Dan Lipsa and
Meghanto Majumder and
Nicole Marsaglia and
Victor A. Mateevitsi and
Valerio Pascucci and
John Patchett and
Saumil Patel and
Steve Petruzza and
David Pugmire and
Silvio Rizzi and
David H. Rogers and
Oliver Rübel and
Jorge Salinas and
Sudhanshu Sane and
Sergei Shudler and
Alexandra Stewart and
Karen Tsai and
Terece L. Turton and
Will Usher and
Zhe Wang and
Gunther H. Weber and
Corey Wetterer-Nelson and
Jonathan Woodring and
Abhishek Yenpure The ECP ALPINE project: In situ and post
hoc visualization infrastructure and
analysis capabilities for exascale . . . 32--51
Logan Ward and
J. Gregory Pauloski and
Valerie Hayot-Sasson and
Yadu Babuji and
Alexander Brace and
Ryan Chard and
Kyle Chard and
Rajeev Thakur and
Ian Foster Employing artificial intelligence to
steer exascale workflows with Colmena 52--64
M Scot Breitenfeld and
Houjun Tang and
Huihuo Zheng and
Jordan Henderson and
Suren Byna HDF5 in the exascale era: Delivering
efficient and scalable parallel I/O for
exascale applications . . . . . . . . . 65--78
Xingfu Wu and
John R. Tramm and
Jeffrey Larson and
John-Luke Navarro and
Prasanna Balaprakash and
Brice Videau and
Michael Kruse and
Paul Hovland and
Valerie Taylor and
Mary Hall Integrating ytopt and libEnsemble to
autotune OpenMC . . . . . . . . . . . . 79--103
Peter Lindstrom and
Jeffrey Hittinger and
James Diffenderfer and
Alyson Fox and
Daniel Osei-Kuffuor and
Jeffrey Banks ZFP: a compressed array representation
for numerical computations . . . . . . . 104--122
Cody J. Balos and
Marcus Day and
Lucas Esclapez and
Anne M. Felden and
David J. Gardner and
Malik Hassanaly and
Daniel R. Reynolds and
Jon S. Rood and
Jean M. Sexton and
Nicholas T. Wimer and
Carol S. Woodward SUNDIALS time integrators for exascale
applications with many independent
systems of ordinary differential
equations . . . . . . . . . . . . . . . 123--146
Aurelien Bouteiller and
Thomas Herault and
Qinglei Cao and
Joseph Schuchart and
George Bosilca PaRSEC: Scalability, flexibility, and
hybrid architecture support for
task-based applications in ECP . . . . . 147--166
Andrey Prokopenko and
Daniel Arndt and
Damien Lebrun-Grandié and
Bruno Turcksin and
Nicholas Frontiere and
J. D. Emberson and
Michael Buehlmann Advances in ArborX to support exascale
applications . . . . . . . . . . . . . . 167--176
Stephen Hudson and
Jeffrey Larson and
John-Luke Navarro and
Stefan M. Wild Portable, heterogeneous ensemble
workflows at scale using libEnsemble . . 177--192
Roxana Bujack and
Maya Gokhale and
Latchesar Ionkov and
Keita Iwabuchi and
Michael Jantz and
Terry Jones and
Sumathi Lakshmiranganatha and
Michael K. Lang and
Jason Lee and
M. Ben Olson and
Scott Pakin and
Roger Pearce and
Jonathan Pietarila Graham and
Li Tang and
Terece L. Turton and
Sean Williams The ECP SICM project: Managing complex
memory hierarchies for exascale
applications . . . . . . . . . . . . . . 193--207