Last update:
Wed Oct 8 06:46:45 MDT 2025
Anonymous Important announcement . . . . . . . . . 1--1
Anonymous Editorial: a journal transformed . . . . 3--4
Edward J. Krall and
Patrick F. McGehearty A case study of parallel execution of a
rule-based expert system . . . . . . . . 5--32
Vaughan R. Pratt Modeling concurrency with partial orders 33--71
S. Kasif Control and data driven execution of
logic programs: a comparison . . . . . . 73--99
Parallax How are parallel systems invented? . . . 101--102
Paul R. Hudak The Denotational Semantics of a
Para-Functional Programming Language . . 103--125
Guang R. Gao Maximum pipelining linear recurrence on
static data flow computers . . . . . . . 127--149
Donald M. Chiarulli and
Duncan A. Buell Parallel microprogramming tools for a
horizontally reconfigurable architecture 151--162
D. Nau and
P. Purdom and
Chun-Hung Tzeng Experiments on alternatives to minimax 163--183
Parallax When is pull better than push? (parallel
programming) . . . . . . . . . . . . . . 185--188
Khayri A. M. Ali OR-parallel execution of PROLOG on a
multi-sequential machine . . . . . . . . 189--214
Bharat Jayaraman and
Robert M. Keller Primitives for resource management in a
demand-driven reduction model . . . . . 215--244
S. Taylor and
S. Safra and
E. Shapiro A parallel implementation of Flat
Concurrent Prolog . . . . . . . . . . . 245--275
Parallax The bards on parallel programming . . . 277--277
Michael Wolfe Loop skewing: The wavefront method
revisited . . . . . . . . . . . . . . . 279--293
Eugene D. Brooks, III The butterfly barrier (multiprocessing) 295--307
Alan George and
Michael T. Heath and
Joseph Liu and
Esmond Ng Solution of sparse positive definite
systems on a shared-memory
multiprocessor . . . . . . . . . . . . . 309--325
S. P. Rana and
D. K. Banerji An optimal distributed solution to the
dining philosophers problem . . . . . . 327--335
Anonymous Hotspotting . . . . . . . . . . . . . . 337--337
Khayri A. M. Ali and
Seif Haridi Global garbage collection for
distributed heap storage systems . . . . 339--387
Hossam El-Gindy An optimal speed-up parallel algorithm
for triangulating simplicial point sets
in space . . . . . . . . . . . . . . . . 389--398
Ed Merks An Optimal Parallel Algorithm for
Triangulating a Set of Points in the
Plane . . . . . . . . . . . . . . . . . 399--411
B. Grošelj and
C. Tropper Pseudosimulation: an algorithm for
distributed simulation with limited
memory . . . . . . . . . . . . . . . . . 413--456
Anonymous The church of the least fixed point . . 457--457
Robert H. Halstead, Jr. An Assessment of Multilisp --- Lessons
from Experience . . . . . . . . . . . . 459--501
Eliezer Dekel and
Shietung Peng and
S. Sitharama Iyengar Optimal parallel algorithms for
constructing and maintaining a balanced
m-way search tree . . . . . . . . . . 503--528
Virgilio A. F. Almeida and
Lawrence W. Dowdy Performance analysis of a scheme for
concurrency/synchronization using
queueing network models . . . . . . . . 529--550
Venkatramana G. Ajjanagadde and
L. M. Patnaik Systolic Architecture for B-Spline
Surfaces . . . . . . . . . . . . . . . . 551--565
Gary Lindstrom Sans pareil: Referees . . . . . . . . . 567--568
Shlomit S. Pinter and
Yaron Wolfstahl On mapping processes to processors in
distributed systems . . . . . . . . . . 1--15
Kristine Stougaard Thomsen Inheritance on processes, exemplified on
distributed termination detection . . . 17--52
E. P. DeBenedictis A Multiprocessor Using Protocol-Based
Programming Primitives . . . . . . . . . 53--84
Anonymous Amdahl's law . . . . . . . . . . . . . . 85--85
Ian Foster and
Stephen Taylor Flat Parlog: a basis for comparison . . 87--125
Henk Meijer and
Selim G. Akl Optimal computation of prefix sums on a
binary tree of processors . . . . . . . 127--136
Michael Wolfe and
Utpal Banerjee Data dependence and its application to
parallel processing . . . . . . . . . . 137--178
Anonymous Isomorphic Computers Inc.: With
Isomorphic Computers, more is more™ 179--182
Adolfo Guzman and
Edward J. Krall and
Patrick F. McGehearty and
Nader Bagherzadeh Performance of symbolic applications on
a parallel architecture . . . . . . . . 183--214
Richard M. Fujimoto and
Hwa-chung Feng A shared memory algorithm and proof for
the generalized alternative construct in
CSP . . . . . . . . . . . . . . . . . . 215--241
R. L. Wainwright Deriving parallel computations from
functional specifications: a seismic
example on a hypercube . . . . . . . . . 243--260
Anonymous Systolic processing . . . . . . . . . . 261--261
Nissim Francez and
Shmuel Katz Fairness and the axioms of control
predicates . . . . . . . . . . . . . . . 263--278
Frances E. Hunt Experiments with applicative updating:
practical results . . . . . . . . . . . 279--303
E. Bradley and
R. H. Halstead, Jr. Simulating logic circuits: a
multiprocessor application . . . . . . . 305--338
Anonymous Connectionism . . . . . . . . . . . . . 339--339
Ashok Samal and
Tom Henderson Parallel Consistent Labeling Algorithms 341--364
Charles Koelbel and
Piyush Mehrotra and
John Van Rosendale Semi-automatic process partitioning for
parallel computation . . . . . . . . . . 365--382
Michael G. Main Trace, failure and testing equivalences
for communicating processes . . . . . . 383--400
A. Davison Blackboard systems in Polka . . . . . . 401--424
Anonymous Fixpoints in Daily Life . . . . . . . . 425--425
John R. Gilbert and
Earl Zmijewski A parallel graph partitioning algorithm
for a message-passing multiprocessor . . 427--449
Pierpaolo Degano and
Sergio Marchetti Partial ordering models for concurrency
can be defined operationally . . . . . . 451--478
V. Nageshwara Rao and
Vipin Kumar Parallel depth first search. Part I.
Implementation . . . . . . . . . . . . . 479--499
Vipin Kumar and
V. Nageshwara Rao Parallel depth first search. Part II.
Analysis . . . . . . . . . . . . . . . . 501--519
Gary Lindstrom Sans pareil: Referees . . . . . . . . . 521--522
Debra Hensgen and
Raphael Finkel and
Udi Manber Two algorithms for barrier
synchronization . . . . . . . . . . . . 1--17
Patrick Valduriez and
Setrag Khoshafian Parallel evaluation of the transitive
closure of a database relation . . . . . 19--42
Stephen L. Stepoway and
Michael Christiansen Parallel Rendering of Fractal Surfaces 43--58
P. A. Tinker Performance of an OR-parallel logic
programming system . . . . . . . . . . . 59--92
Gary Lindstrom Sage commentary . . . . . . . . . . . . 93--93
Anoop Gupta and
Milind Tambe and
Dirk Kalp and
Charles Forgy and
Allen Newell Parallel implementation of OPS5 on the
Encore multiprocessor: results and
analysis . . . . . . . . . . . . . . . . 95--124
John S. Conery Binding environments for parallel logic
programs in non-shared memory
multiprocessors . . . . . . . . . . . . 125--152
Rance Cleaveland and
Prakash Panangaden Type theory and concurrency . . . . . . 153--206
Z. Somogyi and
K. Ramamohanarao and
J. Vaghani A backtracking algorithm for the stream
AND-parallel execution of logic programs 207--257
Elizabeth W. Edmiston and
Nolan G. Core and
Joel H. Saltz and
Roger M. Smith Parallel processing of biological
sequence comparison algorithms . . . . . 259--275
V. K. Janakiram and
E. F. Gehringer and
D. P. Agrawal and
R. Mehrotra A randomized parallel branch-and-bound
algorithm . . . . . . . . . . . . . . . 277--301
Carla Schlatter Ellis and
Thomas J. Olson Algorithms for parallel memory
allocation . . . . . . . . . . . . . . . 303--345
Mark T. Vandevoorde and
Eric S. Roberts WorkCrews: an abstraction for
controlling parallelism . . . . . . . . 347--366
James S. Miller Implementing a Scheme-Based Parallel
Processing System . . . . . . . . . . . 367--402
G. Cybenko and
T. G. Allen and
J. E. Polito Practical Parallel Union-Find Algorithms
for Transitive Closure and Clustering 403--423
Benjamin Goldberg Multiprocessor execution of functional
programs . . . . . . . . . . . . . . . . 425--473
Lionel M. Ni and
Chung-Ta King On partitioning and mapping for
hypercube computing . . . . . . . . . . 475--495
Jim Crammond A Garbage Collection Algorithm for
Shared Memory Parallel Processors . . . 497--522
Michael J. Swain Comments on A. Samal and T. Henderson:
"Parallel consistent labeling
algorithms" [Internat. J. Parallel
Programming 16 (1987), no. 5,
341--364] . . . . . . . . . . . . . . . 523--528
Gary Lindstrom Sans pareil: referees . . . . . . . . . 529--530
Anne Neirynck and
Prakash Panangaden and
Alan J. Demers Effect analysis in higher-order
languages . . . . . . . . . . . . . . . 1--36
Ran Ginosar and
David Egozi Topological comparison of perfect
shuffle and hypercube . . . . . . . . . 37--68
David M. Nicol and
Joel H. Saltz and
James C. Townsend Delay Point Schedules for Irregular
Parallel Computations . . . . . . . . . 69--90
Kee-Hyun Park and
Lawrence W. Dowdy Dynamic partitioning of multiprocessor
systems . . . . . . . . . . . . . . . . 91--120
Alessandro Giacalone and
Prateek Mishra and
Sanjiva Prasad FACILE: a Symmetric Integration of
Concurrent and Functional Programming 121--160
Rajiv Gupta and
Charles R. Hill A Scalable Implementation of Barrier
Synchronization Using An Adaptive
Combining Tree . . . . . . . . . . . . . 161--180
Ian Foster A Multicomputer Garbage Collector for a
Single Assignment Language . . . . . . . 181--203
Yi Xin Zhang Parallel algorithms for minimal spanning
trees of directed graphs . . . . . . . . 205--221
Xiaoqiu Huang A space-efficient parallel sequence
comparison algorithm for a
message-passing multiprocessor . . . . 223--239
David Hemmendinger Initializing memory shared by several
processors . . . . . . . . . . . . . . . 241--253
Gadi Taubenfeld and
Shmuel Katz and
Shlomo Moran Initial failures in distributed
computations . . . . . . . . . . . . . . 255--276
Jason Gait Speedup and optimality in pipeline
programs . . . . . . . . . . . . . . . . 277--290
G. A. Geist and
E. Ng Task scheduling for parallel sparse
Cholesky factorization . . . . . . . . . 291--314
Jeannette M. Wing Verifying atomic data types . . . . . . 315--357
Selim G. Akl and
Frank Dehne Pipelined search on coarse grained
networks . . . . . . . . . . . . . . . . 359--364
Juanito Camilleri An Operational Semantics for occam . . . 365--400 (or 149--167??)
Arvind K. Bansal and
Leon S. Sterling Transforming generate-and-test programs
to execute under committed-choice
AND-parallelism . . . . . . . . . . . . 401--446
Ambuj K. Singh and
Ross Overbeek Derivation of Efficient Parallel
Programs: an Example From Genetic
Sequence Analysis . . . . . . . . . . . 447--484
Frederick Springsteel and
Ivan Stojmenović Parallel general prefix computations
with geometric, algebraic, and other
applications . . . . . . . . . . . . . . 485--503
Woei-Kae Chen and
Matthias F. M. Stallmann and
Edward F. Gehringer Hypercube embedding heuristics: an
evaluation . . . . . . . . . . . . . . . 505--549
Gary Lindstrom Sans pareil: Referees . . . . . . . . . 551--552
John H. Reif and
Scott A. Smolka Data flow analysis of distributed
communicating processes . . . . . . . . 1--30
Russell M. Clapp and
Trevor N. Mudge and
Donald C. Winsor Cache Coherence Requirements for
Interprocess Rendezvous . . . . . . . . 31--51
Rajiv Gupta and
Michael Epstein High Speed Synchronization of Processors
Using Fuzzy Barriers . . . . . . . . . . 53--73
Duane A. Bailey and
Janice E. Cuny and
Craig P. Loomis ParaGraph: Graph editor support for
parallel programming environments . . . 75--110
Raymond Greenlaw and
Lawrence Snyder Achieving speedups for APL on an SIMD
distributed memory machine . . . . . . . 111--127
Khayri A. M. Ali and
Roland Karlsson The Muse Approach to OR-Parallel Prolog 129--162 (or 129--160??)
Manuel E. Bermudez and
Richard Newman-Wolfe and
George Logothetis Parallel Construction of SLR(1) and
LALR(1) Parsers . . . . . . . . . . . . 163--184
Soumitra Sengupta and
Arthur J. Bernstein Concurrency Control Optimizations in a
Prolog Database . . . . . . . . . . . . 185--211
Frank Dehne and
Quoc T. Pham and
Ivan Stojmenović Optimal Visibility Algorithms for Binary
Images on the Hypercube . . . . . . . . 213--224
Boris D. Lubachevsky Synchronization Barrier and Related
Tools for Shared Memory Parallel
Programming . . . . . . . . . . . . . . 225--250
L. V. Kalé and
Vikram A. Saletore Parallel State-Space Search for a First
Solution with Consistent Linear Speedups 251--293
Oscar H. Ibarra and
Michael A. Palis An Efficient All-Parses Systolic
Algorithm for General Context-free
Parsing . . . . . . . . . . . . . . . . 295--331
Laurent Langlois Systolic Parsing of Context-free
Languages . . . . . . . . . . . . . . . 333--355
Carole M. McNamee and
Ronald A. Olsson Transformations for optimizing
interprocess communication and
synchronization mechanisms . . . . . . . 357--387
Rok Sosic and
Richard F. Riesenfeld Parallel Algorithms for Line Generation 389--404
Douglas M. Blough and
Nader Bagherzadeh Near-Optimal Message Routing and
Broadcasting in Faulty Hypercubes . . . 405--423
E. Tick Execution Characteristics of Layered
Streams . . . . . . . . . . . . . . . . 425--443
Khayri A. M. Ali and
Roland Karlsson Full Prolog and Scheduling
OR-Parallelism in Muse . . . . . . . . . 445--475
Michael D. Rice Semantics for Data Parallel Computation 477--509
Gary Lindstrom Sans pareil: Referees . . . . . . . . . 511--512
Manfred Broy and
Thomas Streicher Specification and Design of Shared
Resource Arbitration . . . . . . . . . . 1--22
Paul Feautrier Dataflow Analysis of Array and Scalar
References . . . . . . . . . . . . . . . 23--53 (or 23--52??)
Mike Livesey A Network Model of Barrier
Synchronization Algorithms . . . . . . . 55--74
R. Mall and
L. M. Patnaik Formal Timing Analysis of Distributed
Systems . . . . . . . . . . . . . . . . 75--94
V. Singh and
V. Kumar and
G. Agha and
C. Tomlinson Efficient Algorithms for Parallel
Sorting on Mesh Multicomputers . . . . . 95--131
D. B. Skillicorn Models for Practical Parallel
Computation . . . . . . . . . . . . . . 133--158
Kai Li and
Jeffrey F. Naughton and
James S. Plank An Efficient Checkpointing Method for
Multicomputers with Wormhole Routing . . 159--180
Carole M. McNamee and
Ronald A. Olsson An Attribute Grammar Approach to
Compiler Optimization of IntraModule
Interprocess Communication . . . . . . . 181--202
Gurdip Singh and
Arthur J. Bernstein On the Relative Execution Times of
Distributed Protocols . . . . . . . . . 203--235
Virginia M. Lo and
Sanjay Rajopadhye and
Samik Gupta and
David Keldsen and
Moataz A. Mohamed and
Bill Nitzberg and
Jan Arne Tell and
Xiaoxiong Zhong OREGAMI: Tools for mapping parallel
computations to parallel architectures 237--270
P. Adamson and
E. Tick Greedy Partitioned Algorithms for the
Shortest-Path Problem . . . . . . . . . 271--298
Matthew Huntbach Parallel Branch-and-Bound Search in
Parlog . . . . . . . . . . . . . . . . . 299--314
Zheng Lin A Distributed Fair Polling Scheme
Applied to OR-Parallel Logic Programming 315--339
Mohammad Ashraf Iqbal Approximate Algorithms for Partitioning
Problems . . . . . . . . . . . . . . . . 341--361
Calvin Lin and
Lawrence Snyder A Portable Implementation of SIMPLE . . 363--401
Amitabha Das and
Louise E. Moser and
P. M. Melliar-Smith A Parallel Sorting Algorithm for a Novel
Model of Computation . . . . . . . . . . 403--419
Andrzej Ciepielewski Scheduling in OR-parallel Prolog
systems: survey and open problems . . . 421--451
Steven Y. Susswein and
Thomas C. Henderson and
Joseph L. Zachary and
Chuck Hansen and
Paul Hinker and
Gary C. Marsden Parallel Path Consistency . . . . . . . 453--473
Frank Dehne and
Russ Miller and
Andrew Rau-Chaplin Optical Clustering on a Mesh-Connected
Computer . . . . . . . . . . . . . . . . 475--486
Gary Lindstrom Sans pareil: Referees . . . . . . . . . 487--488
Michael A. Palis and
David S. L. Wei Parallel Parsing of Tree Adjoining
Grammars on the Connection Machine . . . 1--38
Stephen A. Schwab Extended parallelism in the Gröbner basis
algorithm . . . . . . . . . . . . . . . 39--66
Balkrishna Ramkumar and
Laxmikant V. Kalé A Join Algorithm for Combining AND
Parallel Solutions in AND/OR Parallel
Systems . . . . . . . . . . . . . . . . 67--107
Dilip Sarkar and
Ivan Stojmenović Parallel Algorithms for Separation of
Two Sets of Points and Recognition of
Digital Convex Polygons . . . . . . . . 109--121
Xining Li and
John Cleary and
Brian Unger Virtual Time and Virtual Space . . . . . 123--150
Michael A. Palis and
Sunil M. Shende An NC Algorithm for Recognizing Tree
Adjoining Languages . . . . . . . . . . 151--167
Rajiv Gupta and
Sunah Lee Exploiting Parallelism on a Fine-Grained
MIMD Architecture Based Upon Channel
Queues . . . . . . . . . . . . . . . . . 169--192
Ling-Yu Chuang and
Vernon Rego and
Aditya Mathur An application of program unification to
priority queue vectorization . . . . . . 193--224
R. Govindarajan and
S. Yu and
V. S. Lakshmanan Attempting Guards in Parallel: a Data
Flow Approach to Execute Generalized
Guarded Commands . . . . . . . . . . . . 225--268
Ouri Wolfson and
Weining Zhang and
Harish Butani and
Akira Kawaguchi and
Mok Kui Parallel Processing of Graph
Reachability in Databases . . . . . . . 269--302
Alan P. Sprague A Parallel Algorithm to Construct a
Dominance Graph on Non-overlapping
Rectangles . . . . . . . . . . . . . . . 303--312
Paul Feautrier Some efficient solutions to the affine
scheduling problem. I. One-dimensional
time . . . . . . . . . . . . . . . . . . 313--347
W. Loots and
T. H. C. Smith A parallel algorithm for the 0-1
knapsack problem . . . . . . . . . . . . 349--362
Bradley K. Seevers and
Michael J. Quinn and
Philip J. Hatcher A Parallel Programming Environment
Supporting Multiple Data-Parallel
Modules . . . . . . . . . . . . . . . . 363--386
Anonymous Important announcement to subscribers 387--387
Paul Feautrier Some Efficient Solutions to the Affine
Scheduling Problem. Part II.
Multidimensional Time . . . . . . . . . 389--420
Qi Ning and
Guang R. Gao Optimal Loop Storage Allocation for
Argument-Fetching Dataflow Machines . . 421--448
Khayri A. M. Ali and
Roland Karlsson Scheduling Speculative Work in MUSE and
Performance Results . . . . . . . . . . 449--476
Gary Lindstrom Referees and Valedictory . . . . . . . . 477--479
Gordon Bell Scalable, Parallel Computers:
Alternatives, Issues, and Challenges . . 3--46
Jack B. Dennis Machines and Models for Parallel
Computing . . . . . . . . . . . . . . . 47--77
Ken Kennedy Compiler technology for
machine-independent parallel programming 79--98
David J. Kuck What Do Users of Parallel Computer
Systems Really Need? . . . . . . . . . . 99--127
Nicholas Carriero and
David Gelernter Case studies in asynchronous data
parallelism . . . . . . . . . . . . . . 129--149
William Y. Chen and
Scott A. Mahlke and
Nancy J. Warter and
Sadun Anik and
Wen-Mei W. Hwu Profile-assisted instruction scheduling 151--181
Wei Li and
Keshav Pingali A singular loop transformation framework
based on non-singular matrices . . . . . 183--205
Wen-Mei Hwu and
Alex Nicolau From the Guest Editors . . . . . . . . . 207
Walid A. Najjar and
Lucas Roh and
A. P. Wim Böhm An Evaluation of Medium-Grain Dataflow
Code . . . . . . . . . . . . . . . . . . 209--242
Gary Tyson and
Matthew Farrens Code Scheduling for Multiple Instruction
Stream Architectures . . . . . . . . . . 243--272
M. Rajagopalan and
V. H. Allan Specification of Software Pipelining
Using Petri Nets . . . . . . . . . . . . 273--301
Mark R. Gilder and
Mukkai S. Krishnamoorthy Automatic Source-Code Parallelization
Using HICOR Objects . . . . . . . . . . 303--350
Jian Wang and
Christine Eisenbeis and
Martin Jourdan and
Bogong Su Decomposed software pipelining: a new
perspective and a new approach . . . . . 351--373
Yosi Ben-Asher and
Eitan Farchi Using True Concurrency to Model
Execution of Parallel Programs . . . . . 375--407
Feipei Lai and
Yung-kuang Chao and
Chia-Jung Hsieh The Complementary Relationship of
Interprocedural Register Allocation and
Inlining . . . . . . . . . . . . . . . . 409--434
M. K. Stojčev and
E. I. Milovanović and
I. Ž. Milovanović An Optimal Scheduling Procedure for
Matrix Inversion on Linear Array at a
Processor Level . . . . . . . . . . . . 435--448
Michael L. Scott and
John M. Mellor-Crummey Fast, Contention-Free Combining Tree
Barriers for Shared-Memory
Multiprocessors . . . . . . . . . . . . 449--481
Utpal Banerjee Editor's Introduction . . . . . . . . . 483
Larry Carter and
Jeanne Ferrante and
Vasanth Bala XDP: a Compiler Intermediate Language
Extension for the Representation and
Optimization of Data Movement . . . . . 485--518
Milind Girkar and
Constantine D. Polychronopoulos The Hierarchical Task Graph as a
Universal Intermediate Representation 519--551
Keith A. Faigin and
Stephen A. Weatherford and
Jay P. Hoeflinger and
David A. Padua and
Paul M. Petersen The Polaris Internal Representation . . 553--586
Jie Liu and
Vikram A. Saletore and
Ted G. Lewis Safe Self-Scheduling: a Parallel Loop
Scheduling Scheme for Shared-Memory
Multiprocessors . . . . . . . . . . . . 589--616
Theodore Johnson Parallel-Access Memory Management Using
Fast-Fits . . . . . . . . . . . . . . . 617--648
Shlomit S. Pinter Introduction . . . . . . . . . . . . . . 3
Nicholas Carriero and
David Gelernter and
Marc Jourdenais and
David Kaminsky Piranha Scheduling: Strategies and Their
Implementation . . . . . . . . . . . . . 5--33
Steven Novack and
Alexandru Nicolau A Hierarchical Approach to
Instruction-level Parallelization . . . 35--62
Dror E. Maydan and
John L. Hennessy and
Monica S. Lam Effectiveness of Data Dependence
Analysis . . . . . . . . . . . . . . . . 63--81
David Bernstein and
Mauricio Breternitz, Jr. and
Ahmed M. Gheith and
Bilha Mendelson Solutions and Debugging for Data
Consistency in Multiprocessors with
Noncoherent Caches . . . . . . . . . . . 83--103
David Abramson and
A. McKay Evaluating the Performance of a SISAL
Implementation of the Abingdon Cross
Image Processing Benchmark . . . . . . . 105--134
Dror G. Feitelson and
Larry Rudolph Coscheduling Based on Runtime
Identification of Activity Working Sets 135--160
Wei-Ming Lin and
Bo Yang Probabilistic Performance Analysis for
Parallel Search Techniques . . . . . . . 161--189
Jean-François Collard Automatic Parallelization of while-Loops
Using Speculative Execution . . . . . . 191--219
Stephen Melvin and
Yale Patt Enhancing Instruction Scheduling with a
Block-Structured ISA . . . . . . . . . . 221--243
Heng-Yi Chao and
Mary P. Harper Minimizing Redundant Dependencies and
Interprocessor Synchronizations . . . . 245--262
Elana D. Granston and
Thierry Montaut and
François Bodin Loop Transformations to Prevent False
Sharing . . . . . . . . . . . . . . . . 263--301
Wayne Kelly and
William Pugh Using Affine Closure to Find Legal
Reordering Transformations . . . . . . . 303--325
Eric Stoltz and
Michael Wolfe Detecting Value-Based Scalar Dependence 327--358
Yi-Qing Yang and
Corinne Ancourt and
François Irigoin Minimal Data Dependence Abstractions for
Loop Transformations: Extended Version 359--388
Yosi Ben-Asher and
Gudula Rünger and
Assaf Schuster and
Reinhard Wilhelm 2DT-FP: a Parallel Functional
Programming Language on Two-Dimensional
Data . . . . . . . . . . . . . . . . . . 389--422
Elana D. Granston and
Alexander V. Veidenbaum Combining Flow and Dependence Analyses
to Expose Redundant Array Accesses . . . 423--470
Martin Griebl and
Christian Lengauer A Communication Scheme for the
Distributed Execution of Loop Nests with
while Loops . . . . . . . . . . . . . . 471--496
Mario Mango Furnari Guest Editor's Introduction . . . . . . 497
Andrea Capitanio and
Alexandru Nicolau and
Nikil Dutt A Hypergraph-Based Model for Port
Allocation on Multiple-Register-File
VLIW Architectures . . . . . . . . . . . 499--513
Eduard Ayguadé and
Jesus Labarta and
Jordi Garcia and
Merce Girones and
Mateo Valero Analyzing Reference Patterns in
Automatic Data Distribution Tools . . . 515--535
Lawrence Rauchwerger and
Nancy M. Amato and
David A. Padua A Scalable Method for Run-Time Loop
Parallelization . . . . . . . . . . . . 537--576
Matthew Farrens and
Wen-mei Hwu Guest Editors' Introduction . . . . . . 1
B. Ramakrishna Rau Iterative Modulo Scheduling . . . . . . 3--64
Michael Schlansker and
Vinod Kathail and
Sadun Anik Parallelization of Control Recurrences
for ILP Processors . . . . . . . . . . . 65--102
Alexandre E. Eichenberger and
Edward S. Davidson and
Santosh G. Abraham Minimizing Register Requirements of a
Modulo Schedule via Optimum Stage
Scheduling . . . . . . . . . . . . . . . 103--132
Po-Yung Chang and
Eric Hao and
Tse-Yu Yeh and
Yale Patt Branch Classification: a New Mechanism
for Improving Branch Predictor
Performance . . . . . . . . . . . . . . 133--158
Gary Tyson and
Matthew Farrens Evaluating the Effects of Predicated
Execution on Branch Prediction . . . . . 159--186
Thomas M. Conte and
Burzin A. Patel and
Kishore N. Menezes and
J. Stan Cox Hardware-Based Profiling: an Effective
Technique for Profile-Driven
Optimization . . . . . . . . . . . . . . 187--206
Jean-Luc Gaudiot Guest Editor's Introduction . . . . . . 207
Po-Yung Chang and
Eric Hao and
Yale N. Patt and
Pohua P. Chang Using Predicated Execution to Improve
the Performance of a Dynamically
Scheduled Machine with Speculative
Execution . . . . . . . . . . . . . . . 209--234
David H. Albonesi and
Israel Koren A Mean Value Analysis Multiprocessor
Model Incorporating Super-scalar
Processors and Latency Tolerating
Techniques . . . . . . . . . . . . . . . 235--263
M. Cosnard and
M. Loi A Simple Algorithm for the Generation of
Efficient Loop Structures . . . . . . . 265--289
Dean Engelhardt and
Andrew Wendelborn A Partitioning-Independent Paradigm for
Nested Data Parallelism . . . . . . . . 291--317
Herbert H. J. Hum and
Olivier Maquelin and
Kevin B. Theobald and
Xinmin Tian and
Guang R. Gao and
Laurie J. Hendren A Study of the EARTH-MANNA Multithreaded
System . . . . . . . . . . . . . . . . . 319--348
Evan Torrie and
Margaret Martonosi and
Mary W. Hall and
Chau-Wen Tseng Memory Referencing Behavior in
Compiler-Parallelized Applications . . . 349--376
Thomas Sterling and
Daniel Savarese and
Phillip Merkey and
Kevin Olson An Empirical Evaluation of the Convex
SPP-1000 Hierarchical Shared Memory
System . . . . . . . . . . . . . . . . . 377--396
Lesley R. Matheson and
Robert E. Tarjan Parallelism in Multigrid Methods: How
Much Is Too Much? . . . . . . . . . . . 397--432
Kish Shen and
Manuel V. Hermenegildo High-Level Characteristics of OR- and
Independent AND-Parallelism in Prolog 433--478
Rastislav Bodik and
Rajiv Gupta Array Data Flow Analysis for Load-Store
Optimizations in Fine-Grain
Architectures . . . . . . . . . . . . . 481--512
Beatrice Creusillet and
François Irigoin Interprocedural Array Region Analyses 513--546
Rakesh Ghiya and
Laurie J. Hendren Connection Analysis: a Practical
Interprocedural Heap Analysis for C . . 547--578
Wayne Kelly and
William Pugh and
Evan Rosser and
Tatiana Shpeisman Transitive Closure of Infinite Graphs
and its Applications . . . . . . . . . . 579--598
Thomas J. Sheffler and
Robert Schreiber and
William Pugh and
John R. Gilbert and
Siddhartha Chatterjee Efficient Distribution Analysis via
Graph Contraction . . . . . . . . . . . 599--620
Frank Dehne and
Siang W. Song Randomized Parallel List Ranking for
Distributed Memory Multi-processors . . 1--16
Christoph W. Kessler and
Helmut Seidl The Fork95 Parallel Programming
Language: Design, Implementation,
Application . . . . . . . . . . . . . . 17--50
Kemal Ebcioğlu and
Wen-mei Hwu Guest Editors' Introduction . . . . . . 51
Vasanth Bala and
Norman Rubin Efficient Instruction Scheduling Using
Finite State Automata . . . . . . . . . 53--82
Thomas M. Conte and
Sumedh W. Sathaye Optimization of VLIW Compatibility
Systems Employing Dynamic Rescheduling 83--112
Richard E. Hank and
Wen-mei W. Hwu and
B. Ramakrishna Rau Region-Based Compilation: Introduction,
Motivation, and Initial Experience . . . 113--146
Michael Schlansker and
Vinod Kathail Techniques for Critical Path Reduction
of Scalar Programs . . . . . . . . . . . 147--181
Marco Fillo and
Stephen W. Keckler and
William J. Dally and
Nicholas P. Carter and
Andrew Chang and
Yevgeny Gurevich and
Whay S. Lee The M-Machine Multicomputer . . . . . . 183--212
Gary Tyson and
Matthew Farrens and
John Matthews and
Andrew R. Pleszkun Managing Data Caches Using Selective
Cache Line Replacement . . . . . . . . . 213--242
Walid A. Najjar and
Gabriel M. Silberman Foreword to the Special Issues . . . . . 243
Chris J. Newburn and
John Paul Shen Post-Pass Partitioning of Signal
Processing Programs . . . . . . . . . . 245--280
Stephen Jenks and
Jean-Luc Gaudiot Exploiting Locality and Tolerating
Remote Memory Access Latency Using
Thread Migration . . . . . . . . . . . . 281--304
Laurie J. Hendren and
Xinan Tang and
Yingchun Zhu and
Shereen Ghobrial and
Guang R. Gao and
Xun Xue and
Haiying Cai and
Pierre Ouellet Compiling C for the EARTH Multithreaded
Architecture . . . . . . . . . . . . . . 305--338
Po-Yung Chang and
Marius Evers and
Yale N. Patt Improving Branch Prediction Accuracy by
Reducing Pattern History Table
Interference . . . . . . . . . . . . . . 339--362
Stephan Jourdan and
Jared Stark and
Tse-Hao Hsing and
Yale N. Patt Recovery Requirements of Branch
Prediction Storage Structures in the
Presence of Mispredicted-Path Execution 363--383
Lorenz Huelsbergen Dynamic Resolution: a Runtime Technique
for the Parallelization of Modifications
to Directed Acyclic Graphs . . . . . . . 385--417
Daeyeon Park and
Rafael H. Saavedra and
Sungdo Moon Adaptive Granularity: Transparent
Integration of Fine- and Coarse-Grain
Communication . . . . . . . . . . . . . 419--446
Alain Darte and
Frédéric Vivien Optimal Fine and Medium Grain
Parallelism Detection in Polyhedral
Reduced Dependence Graphs . . . . . . . 447--496
Catherine Mongenet Affine Dependence Classification for
Communications Minimization . . . . . . 497--524
Vincent Loechner and
Doran K. Wilde Parameterized Polyhedra and Their
Vertices . . . . . . . . . . . . . . . . 525--549
Editorial Introduction Editor's Announcement . . . . . . . . . 1--2
David Sehr Guest Editor's Introduction . . . . . . 3--4
Val Donaldson and
Jeanne Ferrante Analyzing Asynchronous Pipeline
Schedules . . . . . . . . . . . . . . . 5--42
Tito Autrey and
Michael Wolfe Initial Results for Glacial Variable
Analysis . . . . . . . . . . . . . . . . 43--64
Ajita John and
James C. Browne Compilation of constraint programs with
noncyclic and cyclic dependencies to
procedural parallel programs . . . . . . 65--119
Josep Llosa and
Eduard Ayguadé and
Mateo Valero Quantitative evaluation of register
pressure on software pipelined loops . . 121--142
Ricardo Bianchini and
Enrique V. Carrera and
Leonidas Kontothanassis Evaluating the effect of coherence
protocols on the performance of parallel
programming constructs . . . . . . . . . 143--181
John E. So and
Thomas J. Downar and
Raghunandan Janardhan and
Howard Jay Siegel Mapping conjugate gradient algorithms
for neutron diffusion applications onto
SIMD, MIMD, and mixed-mode machines . . 183--207
Thomas Grün and
Thomas Rauber and
Jochen Röhrig Support for Efficient Programming on the
SB-PRAM . . . . . . . . . . . . . . . . 209--240
Cindy Norris and
Lori L. Pollock Experiences with cooperating register
allocation and instruction scheduling 241--283
Pierre-Yves Calland and
Alain Darte and
Yves Robert and
Frédéric Vivien On the Removal of Anti- and
Output-Dependences . . . . . . . . . . . 285--312
Erik R. Altman and
Guang R. Gao Optimal Modulo Scheduling Through
Enumeration . . . . . . . . . . . . . . 313--344
Steve Beaty and
Wen-mei Hwu Foreword to the Special Issue . . . . . 345--347
Santosh G. Abraham and
Vinod Kathail and
Brian L. Deitrich Meld Scheduling: a Technique for
Relaxing Scheduling Constraints . . . . 349--381
Ashwini K. Nanda and
James O. Bondi and
Simonjit Dutta The Misprediction Recovery Cache . . . . 383--415
John C. Gyllenhaal and
Wen-mei W. Hwu and
B. Ramakrishna Rau Optimization of Machine Descriptions for
Efficient Use . . . . . . . . . . . . . 417--447
Eric Hao and
Po-Yung Chang and
Marius Evers and
Yale N. Patt Increasing the Instruction Fetch Rate
via Block-Structured Instruction Set
Architectures . . . . . . . . . . . . . 449--478
Michael E. Wolf and
Dror E. Maydan and
Ding-Kai Chen Combining Loop Transformations
Considering Caches and Scheduling . . . 479--503
Mikko H. Lipasti and
John Paul Shen Exploiting Value Locality to Exceed the
Dataflow Limit . . . . . . . . . . . . . 505--538
Zhiyuan Li and
Pen-Chung Yew Introduction . . . . . . . . . . . . . . 539--540
Insung Park and
Michael Voss and
Brian Armstrong and
Rudolf Eigenmann Parallel Programming and Performance
Evaluation with the URSA Tool Family . . 541--561
Jaejin Lee and
Samuel P. Midkiff and
David A. Padua A Constant Propagation Algorithm for
Explicitly Parallel Programs . . . . . . 563--589
Hwansoo Han and
Chau-Wen Tseng and
Pete Keleher Eliminating Barrier Synchronization for
Compiler-Parallelized Codes on Software
DSMs . . . . . . . . . . . . . . . . . . 591--612
John Mellor-Crummey and
Vikram Adve Simplifying Control Flow in
Compiler-Generated Parallel Code . . . . 613--638
Zhiyuan Li and
Pen-Chung Yew Introduction . . . . . . . . . . . . . . 639--640
Nicholas Mitchell and
Karin Högstedt and
Larry Carter and
Jeanne Ferrante Quantifying the Multi-Level Nature of
Tiling Interactions . . . . . . . . . . 641--670
Jingling Xue and
Chua-Huang Huang Reuse-Driven Tiling for Improving Data
Locality . . . . . . . . . . . . . . . . 671--696
Jenn-Yuan Tsai and
Zhenzhen Jiang and
Pen-Chung Yew Compiler Techniques for the
Superthreaded Architectures . . . . . . 1--19
Thomas Kistler and
Michael Franz A Tree-Based Alternative to Java
Byte-Codes . . . . . . . . . . . . . . . 21--33
Edward H. Gornish and
Alexander Veidenbaum An Integrated Hardware/Software Data
Prefetching Scheme for Shared-Memory
Multiprocessors . . . . . . . . . . . . 35--70
Kazuki Joe Guest Editor's Introduction . . . . . . 71--72
Bret A. Marsolf and
Kyle A. Gallivan and
Harry A. G. Wijshoff The Utilization of Matrix Structure to
Generate Optimized Code from MATLAB
Programs . . . . . . . . . . . . . . . . 73--96
Atsushi Kubota and
Shogo Tatsumi and
Toshihiko Tanaka and
Masahiro Goshima and
Shin-ichiro Mori and
Hiroshi Nakashima and
Shinji Tomita A Technique to Eliminate Redundant
Inter-Processor Communication on
Parallelizing Compiler TINPAR . . . . . 97--109
Mariko Sasakura and
Kazuki Joe and
Yoshitoshi Kunieda and
Keijiro Araki NaraView: an Interactive 3D
Visualization System for Parallelization
of Programs . . . . . . . . . . . . . . 111--129
Michael F. P. O'Boyle and
Peter M. W. Knijnenburg Nonsingular Data Transformations:
Definition, Validity, and Applications 131--159
Avi Mendelson and
Michael Bekerman Design Alternatives of Multithreaded
Architecture . . . . . . . . . . . . . . 161--193
Min Tan and
Janet M. Siegel and
Howard Jay Siegel Parallel Implementations of Block-Based
Motion Vector Estimation for Video
Compression on Four Parallel Processing
Systems . . . . . . . . . . . . . . . . 195--225
Shlomit S. Pinter Introduction . . . . . . . . . . . . . . 227--228
Yiannakis Sazeides and
James E. Smith Limits of Data Value Predictability . . 229--256
Steven Phillips and
Anne Rogers Parallel Speech Recognition . . . . . . 257--288
Ragini Narasimhan and
Daniel J. Rosenkrantz and
S. S. Ravi Using Data Flow Information to Obtain
Efficient Check Sets for Algorithm-Based
Fault Tolerance . . . . . . . . . . . . 289--323
Thomas Conte and
Wen-Mei Hwu and
Mark Smotherman Editors' Introduction . . . . . . . . . 325--326
Keith I. Farkas and
Paul Chow and
Norman P. Jouppi and
Zvonko Vranesic The Multicluster Architecture: Reducing
Processor Cycle Time Through
Partitioning . . . . . . . . . . . . . . 327--356
Gary S. Tyson and
Todd M. Austin Memory Renaming: Fast, Early and
Accurate Processing of Memory
Communication . . . . . . . . . . . . . 357--380
David I. August and
Wen-mei W. Hwu and
Scott A. Mahlke The Partial Reverse If-Conversion
Framework for Balancing Control Flow and
Predication . . . . . . . . . . . . . . 381--423
Thomas Conte and
Wen-mei Hwu and
Mark Smotherman Editors' Introduction . . . . . . . . . 425--426
Andreas Moshovos and
Gurindar S. Sohi Speculative Memory Cloaking and
Bypassing . . . . . . . . . . . . . . . 427--456
Darko Kirovski and
Johnson Kin and
William H. Mangione-Smith Procedure Based Program Compression . . 457--475
Jack L. Lo and
Susan J. Eggers and
Henry M. Levy and
Sujay S. Parekh and
Dean M. Tullsen Tuning Compiler Optimizations for
Simultaneous Multithreading . . . . . . 477--503
R. Govindarajan and
N. S. S. Narasimha Rao and
E. R. Altman and
Guang R. Gao Enhanced Co-Scheduling: a Software
Pipelining Method Using Modulo-Scheduled
Pipeline Theory . . . . . . . . . . . . 1--46
Vincent Loechner and
Catherine Mongenet Communication Optimization for Affine
Recurrence Equations Using Broadcast and
Locality . . . . . . . . . . . . . . . . 47--102
Marc Daumas and
Paraskevas Evripidou Parallel Implementations of the
Selection Problem: a Case Study . . . . 103--131
Anonymous Guest Editor's Introduction . . . . . . 133--134
Kazuaki Ishizaki and
Hideaki Komatsu and
Toshio Nakatani A Loop Transformation Algorithm for
Communication Overlapping . . . . . . . 135--154
Naoshi Uchihira and
Hideji Kawata and
Fumitaka Tamura Scenario-Based Hypersequential
Programming . . . . . . . . . . . . . . 155--157
Hironori Nakajo and
Akihiro Ichikawa and
Yukio Kaneda A Distributed Shared-Memory System on a
Workstation Cluster Using Fast Serial
Links . . . . . . . . . . . . . . . . . 179--194
Hideki Saito and
Nicholas J. Stavrakos and
Constantine D. Polychronopoulos and
others The Design of the PROMIS
Compiler --- Towards Multi-Level
Parallelization . . . . . . . . . . . . 195--212
Denis Barthou and
Albert Cohen and
Jean-François Collard Maximal Static Expansion . . . . . . . . 213--243
David K. Lowenthal Accurately Selecting Block Size at
Runtime in Pipelined Parallel Programs 245--274
Ramiro Varela Arias and
Camino Rodríguez Vela and
Jorge Puente Peinador and
Cesar Alonso Gonzalez Parallel Logic Programming for Problem
Solving . . . . . . . . . . . . . . . . 275--319
Anonymous Introduction . . . . . . . . . . . . . . 321--323
Erven Rohou and
François Bodin and
Christine Eisenbeis and
André Seznec Handling Global Constraints in Compiler
Strategy . . . . . . . . . . . . . . . . 325--345
Andreas Krall and
Sylvain Lelait Compilation Techniques for Multimedia
Processors . . . . . . . . . . . . . . . 347--361
N. Sreraman and
R. Govindarajan A Vectorizing Compiler for Multimedia
Extensions . . . . . . . . . . . . . . . 363--400
Henk Corporaal and
Johan Janssen and
Marnix Arnold Computation in the Context of Transport
Triggered Architectures . . . . . . . . 401--427
Anonymous Introduction . . . . . . . . . . . . . . 429--430
Wolfram Amme and
Peter Braun and
François Thomasset and
Eberhard Zehendner Data Dependence Analysis of Assembly
Code . . . . . . . . . . . . . . . . . . 431--467
Fabien Quillere and
Sanjay Rajopadhye and
Doran Wilde Generation of Efficient Nested Loops
from Polyhedra . . . . . . . . . . . . . 469--498
Alain Darte and
Guillaume Huard Loop Shifting for Loop Compaction . . . 499--534
Paraskevas Evripidou Introduction . . . . . . . . . . . . . . 535--536
Manish Gupta and
Sayak Mukhopadhyay and
Navin Sinha Automatic Parallelization of Recursive
Procedures . . . . . . . . . . . . . . . 537--562
Lori Carter and
Beth Simon and
Brad Calder and
Larry Carter and
Jeanne Ferrante Path Analysis and Renaming for
Predicated Instruction Scheduling . . . 563--588
Peng Wu and
David Padua Containers on the Parallelization of
General-Purpose Java Programs . . . . . 589--605
Martin Griebl and
Paul Feautrier and
Christian Lengauer Index Set Splitting . . . . . . . . . . 607--631
Anonymous Introduction . . . . . . . . . . . . . . 1--2
Venkata Krishnan and
Josep Torrellas The Need for Fast Communication in
Hardware-Based Speculative Chip
Multiprocessors . . . . . . . . . . . . 3--33
Pierre Michaud and
André Seznec and
Stéphan Jourdan An Exploration of Instruction Fetch
Requirement in Out-of-Order Superscalar
Processors . . . . . . . . . . . . . . . 35--58
Ramon Canal and
Joan-Manuel Parcerisa and
Antonio González Dynamic Code Partitioning for Clustered
Architectures . . . . . . . . . . . . . 59--79
Artur Klauser and
Srilatha Manne and
Dirk Grunwald Selective Branch Inversion: Confidence
Estimation for Branch Predictors . . . . 81--110
Matthew Arnold and
Michael Hsiao and
Ulrich Kremer and
Barbara G. Ryder Exploring the Interaction between Java's
Implicitly Thrown Exceptions and
Instruction Scheduling . . . . . . . . . 111--137
Dhruva R. Chakrabarti and
Prithviraj Banerjee Static Single Assignment Form for
Message-Passing Programs . . . . . . . . 139--184
Jay P. Hoeflinger and
Yunheung Paek and
Kwang Yi Unified Interprocedural Parallelism
Detection . . . . . . . . . . . . . . . 185--215
John Mellor-Crummey and
David Whalley and
Ken Kennedy Improving Memory Hierarchy Performance
for Irregular Applications Using Data
and Computation Reorderings . . . . . . 217--247
Dimitrios S. Nikolopoulos and
Theodore S. Papatheodorou The Architectural and Operating System
Implications on the Performance of
Synchronization on ccNUMA
Multiprocessors . . . . . . . . . . . . 249--282
Hongzhang Shan and
Jaswinder Pal Singh A Comparison of MPI, SHMEM and
Cache-Coherent Shared Address Space
Programming Models on a Tightly-Coupled
Multiprocessors . . . . . . . . . . . . 283--318
Induprakas Kodukula and
Keshav Pingali Data-Centric Transformations for
Locality Enhancement . . . . . . . . . . 319--364
Mayez Al-Mouhamed and
Hussam Abu-Haimed Evaluation of Neural and Genetic
Algorithms for Synthesizing Parallel
Storage Schemes . . . . . . . . . . . . 365--399
Raju Pandey and
James C. Browne Support for Implementation of
Evolutionary Concurrent Systems . . . . 401--431
Isabelle Attali and
Denis Caromel and
Yung-Syau Chen and
Jean-Luc Gaudiot and
Andrew L. Wendelborn Enhancing Functional and Irregular
Parallelism: Stateful Functions and
their Semantics . . . . . . . . . . . . 433--460
Alex Veidenbaum Guest Editor's Introduction . . . . . . 461--462
Ken Kennedy Fast Greedy Weighted Fusion . . . . . . 463--491
Nawaaz Ahmed and
Nikolay Mateev and
Keshav Pingali Synthesizing Transformations for
Locality Enhancement of
Imperfectly-Nested Loop Nests . . . . . 493--544
Vivek Sarkar Optimized Unrolling of Nested Loops . . 545--581
Yosi Ben-Asher and
Dimitry Podvolny Y-Invalidate: a New Protocol for
Implementing Weak Consistency in DSM
Systems . . . . . . . . . . . . . . . . 583--606
Inbum Jung and
Jongwoong Hyun and
Joonwon Lee and
Joongsoo Ma Two-Phase Barrier: a Synchronization
Primitive for Improving the Processor
Utilization . . . . . . . . . . . . . . 607--627
Tracy D. Braun and
Renard Ulrey and
Anthony A. Maciejewski and
Howard Jay Siegel Parallel Approaches for Singular Value
Decomposition as Applied to Robotic
Manipulator Jacobians . . . . . . . . . 1--35
Francisco Corbera and
Rafael Asenjo and
Emilio Zapata New Shape Analysis and Interprocedural
Techniques for Automatic Parallelization
of C Codes . . . . . . . . . . . . . . . 37--63
Aart J. C. Bik and
Milind Girkar and
Paul M. Grey and
Xinmin Tian Automatic Intra-Register Vectorization
for the Intel® Architecture . . . . . 65--98
Jose M. Mantas Ruiz and
Julio Ortega Lopera and
Jose A. Carrillo de la Plata Component-Based Derivation of a Parallel
Stiff ODE Solver Implemented in a
Cluster of Computers . . . . . . . . . . 99--148
Dragan Milicev and
Zoran Jovanovic Control Flow Regeneration for Software
Pipelined Loops with Conditions . . . . 149--179
David Wonnacott Achieving Scalable Locality with Time
Skewing . . . . . . . . . . . . . . . . 181--221
Alex Veidenbaum Guest Editor's Introduction . . . . . . 223--224
Dimitrios S. Nikolopoulos and
Eduard Ayguadé and
Constantine D. Polychronopoulos Runtime vs. Manual Data Distribution for
Architecture-Agnostic Shared-Memory
Programming Models . . . . . . . . . . . 225--255
Pramod G. Joisha and
Samuel P. Midkiff and
Mauricio J. Serrano and
Manish Gupta Efficiently Adapting Java Binaries in
Limited Memory Contexts . . . . . . . . 257--289
Arun Chauhan and
Ken Kennedy Reducing and Vectorizing Procedures for
Telescoping Languages . . . . . . . . . 291--315
George S. Almasi and
Călin Caşcaval and
José G. Castaños and
Monty Denneau and
Wilm Donath and
Maria Eleftheriou and
Mark Giampapa and
Howard Ho and
Derek Lieber and
José E. Moreira and
Dennis Newns and
Marc Snir and
Henry S. Warren, Jr. Demonstrating the Scalability of a
Molecular Dynamics Application on a
Petaflops Computer . . . . . . . . . . . 317--351
Krishna M. Kavi and
Alireza Moshtaghi and
Deng-jyi Chen Modeling Multithreaded Applications
Using Petri Nets . . . . . . . . . . . . 353--371
Alex Ramirez and
Josep Ll. Larriba-Pey and
Carlos Navarro and
Mateo Valero and
Josep Torrellas Software Trace Cache for Commercial
Applications . . . . . . . . . . . . . . 373--395
Ivan D. Baev and
Waleed M. Meleis and
Santosh G. Abraham Backtracking-Based Instruction
Scheduling to Fill Branch Delay Slots 397--418
Paola Favati and
Grazia Lotti and
Ornella Menchi and
Francesco Romani Railway Computation for Infinite Linear
Systems . . . . . . . . . . . . . . . . 419--439
Kazuki Joe Guest Editor's Introduction . . . . . . 1--2
Siegfried Benkner and
Viera Sipkova Exploiting Distributed-Memory and
Shared-Memory Parallelism on Clusters of
SMPs with Data Parallel Programs . . . . 3--19
Minsoo Jeon and
Dongseung Kim Parallel Merge Sort with Load Balancing 21--33
J. Davison de St.Germain and
Alan Morris and
Steven G. Parker and
Allen D. Malony and
Sameer Shende Performance Analysis Integration in the
Uintah Software Development Cycle . . . 35--53
Takeshi Iwashita and
Masaaki Shimasaki Block Red-Black Ordering: a New Ordering
Strategy for Parallelization of ICCG
Method . . . . . . . . . . . . . . . . . 55--75
Alfredo Cristobal-Salas and
Andrei Tchernykh and
Jean-Luc Gaudiot and
Wen-Yen Lin Non-Strict Execution in Parallel and
Distributed Computing . . . . . . . . . 77--105
Patricio Bulić and
Veselko Guštin An Extended ANSI C for Processors with a
Multimedia Extension . . . . . . . . . . 107--136
Zhijian Lu and
John Lach and
Mircea R. Stan and
Kevin Skadron Alloyed Branch History: Combining Global
and Local Branch History for Robust
Performance . . . . . . . . . . . . . . 137--177
Anonymous Erratum . . . . . . . . . . . . . . . . 179--179
Eduard Ayguadé Guest Editor's Introduction . . . . . . 181--183
Daisuke Takahashi and
Mitsuhisa Sato and
Taisuke Boku Performance Evaluation of the Hitachi
SR8000 Using SPEC OMP2001 Benchmarks . . 185--196
Hideki Saito and
Greg Gaertner and
Wesley Jones and
Rudolf Eigenmann and
Hidetoshi Iwashita and
Ron Lieberman and
Matthijs van Waveren and
Brian Whitney Large System Performance of SPEC OMP
Benchmark Suites . . . . . . . . . . . . 197--209
Hirofumi Nakano and
Kazuhisa Ishizaka and
Motoki Obata and
Keiji Kimura and
Hironori Kasahara Static Coarse Grain Task Scheduling with
Cache Optimization Using OpenMP . . . . 211--223
Seung-Jai Min and
Ayon Basumallik and
Rudolf Eigenmann Optimizing OpenMP Programs on Software
Distributed Shared Memory Systems . . . 225--249
Silvius Rus and
Lawrence Rauchwerger and
Jay Hoeflinger Hybrid Analysis: Static & Dynamic Memory
Reference Analysis . . . . . . . . . . . 251--283
Richard L. Graham and
Sung-Eun Choi and
David J. Daniel and
Nehal N. Desai and
Ronald G. Minnich and
Craig E. Rasmussen and
L. Dean Risinger and
Mitchel W. Sukalski A Network-Failure-Tolerant
Message-Passing System for Terascale
Clusters . . . . . . . . . . . . . . . . 285--303
Venkata K. Pingali and
Sally A. McKee and
Wilson C. Hsieh and
John B. Carter Restructuring Computations for Temporal
Data Cache Locality . . . . . . . . . . 305--338
Han-Saem Yun and
Jihong Kim and
Soo-Mook Moon Time Optimal Software Pipelining of
Loops with Control Flows . . . . . . . . 339--391
Keqin Li On the Performance of Randomized
Embedding of Reproduction Trees in
Static Networks . . . . . . . . . . . . 393--406
Alex Orailoglu Guest Editor's Introduction . . . . . . 407--409
Kubilay Atasu and
Laura Pozzi and
Paolo Ienne Automatic Application-Specific
Instruction-Set Extensions Under
Microarchitectural Constraints . . . . . 411--428
Nathan Clark and
Hongtao Zhong and
Wilkin Tang and
Scott Mahlke Automatic Design of Application Specific
Instruction Set Extensions Through
Dataflow Graph Exploration . . . . . . . 429--449
José L. Ayala and
Alexander Veidenbaum and
Marisa López-Vallejo Power-Aware Compilation for Register
File Energy Reduction . . . . . . . . . 451--467
G. Surendra and
S. Banerjee and
S. K. Nandy On the Effectiveness of Flow Aggregation
in Improving Instruction Reuse in
Network Processing Applications . . . . 469--487
C. Kachris and
N. Bourbakis and
A. Dollas A Reconfigurable Logic-Based Processor
for the SCAN Image and Video Encryption
Algorithm . . . . . . . . . . . . . . . 489--506
Lei Pan and
MingKin Lai and
Koji Noguchi and
Javid J. Huseynov and
Lubomir F. Bic and
Michael B. Dillencourt Distributed Parallel Computing Using
Navigational Programming . . . . . . . . 1--37
Jongwook Woo and
Jean-Luc Gaudiot and
Andrew L. Wendelborn Alias Analysis in Java with
Reference-Set Representation for
High-Performance Computing . . . . . . . 39--76
N. P. Manoj and
K. V. Manjunath and
R. Govindarajan CAS-DSM: a Compiler Assisted Software
Distributed Shared Memory . . . . . . . 77--122
Mayez Al-Mouhamed Array Organization in Parallel Memories 123--163
Utpal Banerjee Guest Editor's Introduction . . . . . . 165--166
Jiuxing Liu and
Jiesheng Wu and
Dhabaleswar K. Panda High Performance RDMA-Based MPI
Implementation over InfiniBand . . . . . 167--198
Daniel Ortega and
Mateo Valero and
Eduard Ayguadé Dynamic Memory Instruction Bypassing . . 199--224
Ravi Rajwar and
Alain Kägi and
James R. Goodman Inferential Queueing and Speculative
Push . . . . . . . . . . . . . . . . . . 225--258
Utpal Banerjee Guest Editor's Introduction . . . . . . 259--261
Julita Corbalan and
Xavier Martorell and
Jesus Labarta Page Migration with Dynamic
Space-Sharing Scheduling Policies: The
Case of the SGI O2000 . . . . . . . . . 263--288
Steven Carroll and
Constantine Polychronopoulos A Framework for Incremental Extensible
Compiler Construction . . . . . . . . . 289--316
Konstantinos Kyriakopoulos and
Kleanthis Psarris Data Dependence Analysis Techniques for
Increased Accuracy and Extracted
Parallelism . . . . . . . . . . . . . . 317--359
Stavros Souravlas and
Manos Roumeliotis A Pipeline Technique for Dynamic Data
Transfer on a Multiprocessor Grid . . . 361--388
Hideya Iwasaki and
Zhenjiang Hu A New Parallel Skeleton for General
Accumulative Computations . . . . . . . 389--414
H. Sarojadevi and
S. K. Nandy and
S. Balakrishnan On the Correctness of Program Execution
When Cache Coherence Is Maintained
Locally at Data-Sharing Boundaries in
Distributed Shared Memory
Multiprocessors . . . . . . . . . . . . 415--446
Javier Zalamea and
Josep Llosa and
Eduard Ayguadé and
Mateo Valero Software and Hardware Techniques to
Optimize Register File Utilization in
VLIW Architectures . . . . . . . . . . . 447--474
Virgil Palanciuc and
Dragos Badea A Spill Code Minimization
Technique --- Application in the Metrowerks
StarCore C Compiler . . . . . . . . . . 475--499
Vijay Menon and
Keshav Pingali Look Left, Look Right, Look Left Again:
an Application of Fractal Symbolic
Analysis to Linear Algebra Code
Restructuring . . . . . . . . . . . . . 501--523
Yonghong Song and
Cheng Wang and
Zhiyuan Li A Polynomial-Time Algorithm for Memory
Space Reduction . . . . . . . . . . . . 1--33
Eric Hung-Yu Tseng and
Jean-Luc Gaudiot Automatic Array Partitioning Based on
the Smith Normal Form . . . . . . . . . 35--56
Mo Zeyao Concatenation Algorithms for Parallel
Numerical Simulation of Radiation
Hydrodynamics coupled with Neutron
Transport . . . . . . . . . . . . . . . 57--71
Frederica Darema The Next Generation Software Program . . 73--79
David I. August and
Sharad Malik and
Li-Shiuan Peh and
Vijay Pai and
Manish Vachharajani and
Paul Willmann Achieving Structural and Composable
Modeling of Complex Systems . . . . . . 81--101
Naveen Kumar and
Bruce R. Childers and
Daniel Williams and
Jack W. Davidson and
Mary Lou Soffa Compile-Time Planning for Overhead
Reduction in Software Dynamic
Translators . . . . . . . . . . . . . . 103--114
Shobana Padmanabhan and
Phillip Jones and
David V. Schuehler and
Scott J. Friedman and
Praveen Krishnamurthy and
Huakai Zhang and
Roger Chamberlain and
Ron K. Cytron and
Jason Fritts and
John W. Lockwood Extracting and Improving
Microarchitecture Performance on
Reconfigurable Architectures . . . . . . 115--136
Victor Eijkhout and
Erika Fuentes and
Thomas Eidson and
Jack Dongarra The Component Structure of a
Self-Adapting Numerical Software System 137--143
Douglas Gregor and
Jaakko Järvi and
Mayuresh Kulkarni and
Andrew Lumsdaine and
David Musser and
Sibylle Schupp Generic Programming and High-Performance
Libraries . . . . . . . . . . . . . . . 145--164
Yoon-Ju Lee and
Pedro C. Diniz and
Mary W. Hall and
Robert Lucas Empirical Optimization for a Sparse
Linear Solver: a Case Study . . . . . . 165--181
Gengbin Zheng and
Terry Wilmarth and
Praveen Jagadishprasad and
Laxmikant V. Kalé Simulation-Based Performance Prediction
for Large Parallel Machines . . . . . . 183--207
F. Berman and
H. Casanova and
A. Chien and
K. Cooper and
H. Dail and
A. Dasgupta and
W. Deng and
J. Dongarra and
L. Johnsson and
K. Kennedy and
C. Koelbel and
B. Liu and
X. Liu and
A. Mandal and
G. Marin and
M. Mazina and
J. Mellor-Crummey and
C. Mendes and
A. Olugbile and
M. Patel and
D. Reed and
Z. Shi and
O. Sievert and
H. Xia and
A. YarKhan New Grid Scheduling and Rescheduling
Methods in the GrADS Project . . . . . . 209--229
J. Eliot B. Moss and
Trek Palmer and
Timothy Richards and
Edward K. Walters and
Charles C. Weems CISL: a Class-Based Machine Description
Language for Co-Generation of Compilers
and Simulators . . . . . . . . . . . . . 231--246
Ravi Iyer and
Jack Perdue and
Lawrence Rauchwerger and
Nancy M. Amato and
Laxmi Bhuyan An Experimental Evaluation of the HP
V-Class and SGI Origin 2000
Multiprocessors using Microbenchmarks
and Scientific Applications . . . . . . 307--350
Chao Lin and
Jang-Ping Sheu Efficient Broadcast in Heterogeneous
Networks of Workstations Using Two
Sub-Networks . . . . . . . . . . . . . . 351--391
Sid-Ahmed-Ali Touati Register Saturation in Instruction Level
Parallelism . . . . . . . . . . . . . . 393--449
Jean-Luc Gaudiot and
Siang Wun Song Message from the Guest Editors . . . . . 451--452
Rodolfo Azevedo and
Sandro Rigo and
Marcus Bartholomeu and
Guido Araujo and
Cristiano Araujo and
Edna Barros The ArchC Architecture Description
Language and Tools . . . . . . . . . . . 453--484
Debora R. Roberti and
Roberto P. Souto and
Haroldo F. Campos Velho and
Gervasio A. Degrazia and
Domenico Anfossi Parallel Implementation of a Lagrangian
Stochastic Model for Pollutant
Dispersion . . . . . . . . . . . . . . . 485--498
Edson Toshimi Midorikawa and
Helio Marci Oliveira and
Jean Marcos Laine PEMPIs: a New Methodology for Modeling
and Prediction of MPI Programs
Performance . . . . . . . . . . . . . . 499--527
Onur Mutlu and
Hyesoon Kim and
David N. Armstrong and
Yale N. Patt Using the First-Level Caches as Filters
to Reduce the Pollution Caused by
Speculative Memory References . . . . . 529--559
Yue Luo and
Lizy K. John and
Lieven Eeckhout SMA: a Self-Monitored Adaptive Cache
Warm-Up Scheme for Microprocessor
Simulation . . . . . . . . . . . . . . . 561--581
Franco Fummi and
Ian G. Harris Editorial . . . . . . . . . . . . . . . 583--584
Mirko Loghi and
Tiziana Margaria and
Graziano Pravadelli and
Bernhard Steffen Dynamic and Formal Verification of
Embedded Systems: a Comparative Survey 585--611
Jean-Pierre Talpin and
Paul Le Guernic and
Sandeep Kumar Shukla and
Rajesh Gupta A Compositional Behavioral Modeling
Framework for Embedded System Design and
Conformance Checking . . . . . . . . . . 613--643
Alfred Koelbl and
Carl Pixley Constructing Efficient Formal Models
from High-Level Descriptions Using
Symbolic Simulation . . . . . . . . . . 645--666
Francesco Bruschi and
Fabrizio Ferrandi and
Donatella Sciuto A Framework for the Functional
Verification of SystemC Models . . . . . 667--695
Iñigo Ugarte and
Pablo Sanchez Verification of Embedded Systems Based
on Interval Analysis . . . . . . . . . . 697--720
Ian G. Harris and
Franco Fummi Guest Editor's Introduction . . . . . . 1--2
Xi Chen and
Harry Hsieh and
Felice Balarin Verification Approach of Metropolis
Design Framework for Embedded Systems 3--27
Samar Abdi and
Daniel Gajski Verification of System Level Model
Transformations . . . . . . . . . . . . 29--59
David Currie and
Xiushan Feng and
Masahiro Fujita and
Alan J. Hu and
Mark Kwan and
Sreeranga Rajan Embedded Software Verification Using
Symbolic Execution and Uninterpreted
Functions . . . . . . . . . . . . . . . 61--91
Ernesto Sánchez and
Matteo Sonza Reorda and
Giovanni Squillero Efficient Techniques for Automatic
Verification-Oriented Test Set
Optimization . . . . . . . . . . . . . . 93--109
Bilha Mendelson and
Shlomit S. Pinter and
Ayal Zaks Introduction . . . . . . . . . . . . . . 111--112
Michael Factor and
Assaf Schuster and
Konstantin Shagin A Platform-Independent Distributed
Runtime for Standard Multithreaded Java 113--142
Gregory Chockler and
Dahlia Malkhi Light-Weight Leases for Storage-Centric
Coordination . . . . . . . . . . . . . . 143--170
Alexander Gendler and
Avi Mendelson and
Yitzhak Birk A PAB-Based Multi-Prefetcher Mechanism 171--188
Chris Jesshope and
Alex Shafarenko Special issue on Micro-grids --- Guest
Editor Introduction . . . . . . . . . . 189--192
Carmen Martínez and
Enrique Vallejo and
Ramón Beivide and
Cruz Izu and
Miquel Moretó Dense Gaussian Networks: Suitable
Topologies for On-Chip Multiprocessors 193--211
Pedro Trancoso and
Paraskevas Evripidou and
Kyriakos Stavrou and
Costas Kyriacou A Case for Chip Multiprocessors Based on
the Data-Driven Multithreading Model . . 213--235
Asadollah Shahbahrami and
Ben Juurlink and
Demid Borodin and
Stamatis Vassiliadis Avoiding Conversion and Rearrangement
Overhead in SIMD Architectures . . . . . 237--260
Sylvain Girbal and
Nicolas Vasilache and
Cédric Bastoul and
Albert Cohen and
David Parello and
Marc Sigler and
Olivier Temam Semi-Automatic Composition of Loop
Transformations for Deep Parallelism and
Memory Hierarchies . . . . . . . . . . . 261--317
Chris Jesshope and
Alex Shafarenko Guest Editor's Introduction <Part 2> 319--322
Gajinder Panesar and
Daniel Towner and
Andrew Duller and
Alan Gray and
Will Robbins Deterministic Parallel Processing . . . 323--341
Ian Bell and
Nabil Hasasneh and
Chris Jesshope Supporting Microthread Scheduling and
Synchronisation in CMPs . . . . . . . . 343--381
Clemens Grelck and
Sven-Bodo Scholz SAC --- a Functional Array Language for
Efficient Multi-threaded Execution . . . 383--427
Paraskevas Evripidou and
George Samaras Metacomputing with Mobile Agents . . . . 429--458
Paul Feautrier Scalable and Structured Scheduling . . . 459--487
A. Aiello and
M. Mango Furnari and
A. Massarotti and
S. Brandi and
V. Caputo and
V. Barone An Experimental Ontology Server for an
Information Grid Environment . . . . . . 489--508
Ales Holobar and
Milan Ojstersek and
Damjan Zazula Distributed Jacobi Joint Diagonalization
on Clusters of Personal Computers . . . 509--530
Rajani Pai and
R. Govindarajan FEADS: a Framework for Exploring the
Application Design Space on Network
Processors . . . . . . . . . . . . . . . 1--31
Ender Özcan and
Esin Onbasioglu Memetic Algorithms for Parallel Code
Optimization . . . . . . . . . . . . . . 33--61
Chunhui Zhang and
Fadi Kurdahi Reducing Off-Chip Memory Access via
Stream-Conscious Tiling on Multimedia
Applications . . . . . . . . . . . . . . 63--98
Tony Givargis Special Issue On Embedded Processors ---
Guest Editor Introduction . . . . . . . 99--100
JoAnn M. Paul and
Brett H. Meyer Amdahl's Law Revisited for Single Chip
Systems . . . . . . . . . . . . . . . . 101--123
Sorin Manolache and
Petru Eles and
Zebo Peng Fault-aware Communication Mapping for
NoCs with Guaranteed Latency . . . . . . 125--156
Peter Petrov and
Alex Orailoglu Dynamic Tag Reduction for Low-Power
Caches in Embedded Systems with Virtual
Memory . . . . . . . . . . . . . . . . . 157--177
Sally A. McKee Guest Editor's Introduction . . . . . . 179--180
José E. Moreira and
Valentina Salapura and
George Almasi and
Charles Archer and
Ralph Bellofatto and
Peter Bergner and
Randy Bickford and
Mathias Blumrich and
José R. Brunheroto and
Arthur A. Bright and
Michael Brutman and
José G. Castaños and
Dong Chen and
Paul Coteus and
Paul Crumley and
Sam Ellis and
Thomas Engelsiepen and
Alan Gara and
Mark Giampapa and
Tom Gooding and
Shawn Hall and
Ruud A. Haring and
Roger Haskin and
Philip Heidelberger and
Dirk Hoenicke and
Todd Inglett and
Gerrard V. Kopcsay and
Derek Lieber and
David Limpert and
Pat McCarthy and
Mark Megerian and
Mike Mundy and
Martin Ohmacht and
Jeff Parker and
Rick A. Rand and
Don Reed and
Ramendra Sahoo and
Alda Sanomiya and
Richard Shok and
Brian Smith and
Gordon G. Stewart and
Todd Takken and
Pavlos Vranas and
Brian Wallenfelt and
Michael Blocksome and
Joe Ratterman The Blue Gene/L Supercomputer: a
Hardware and Software Story . . . . . . 181--206
Gregory L. Lee and
Martin Schulz and
Dong H. Ahn and
Andrew Bernat and
Bronis R. de Supinski and
Steven Y. Ko and
Barry Rountree Dynamic Binary Instrumentation and Data
Aggregation on Large Scale Systems . . . 207--232
Michael Gschwind The Cell Broadband Engine: Exploiting
Multiple Levels of Parallelism in a Chip
Multiprocessor . . . . . . . . . . . . . 233--262
Samuel Williams and
John Shalf and
Leonid Oliker and
Shoaib Kamil and
Parry Husbands and
Katherine Yelick Scientific Computing Kernels on the Cell
Processor . . . . . . . . . . . . . . . 263--298
James Laudon and
Lawrence Spracklen The Coming Wave of Multithreaded Chip
Multiprocessors . . . . . . . . . . . . 299--330
Eduard Ayguadé and
Matthias S. Mueller Special Issue on OpenMP --- Guest
Editors' Introduction . . . . . . . . . 331--333
Greg Bronevetsky and
Bronis R. de Supinski Complete Formal Specification of the
OpenMP Memory Model . . . . . . . . . . 335--392
Alejandro Duran and
Roger Ferrer and
Juan José Costa and
Marc Gonzàlez and
Xavier Martorell and
Eduard Ayguadé and
Jesús Labarta A Proposal for Error Handling in OpenMP 393--416
Alan Morris and
Allen D. Malony and
Sameer S. Shende Supporting Nested OpenMP Parallelism in
the TAU Performance System . . . . . . . 417--436
Eduard Ayguadé and
Matthias S. Mueller Introduction . . . . . . . . . . . . . . 437--439
Russell Brown and
Ilya Sharapov High-Scalability Parallelization of a
Molecular Modeling Application:
Performance and Productivity Comparison
Between OpenMP and MPI Implementations 441--458
Dieter an Mey and
Samuel Sarholz and
Christian Terboven Nested Parallelization with OpenMP . . . 459--476
Markus Nordén and
Henrik Löf and
Jarmo Rantakokko and
Sverker Holmgren Dynamic Data Migration for Structured
AMR Solvers . . . . . . . . . . . . . . 477--491
Tien-Hsiung Weng and
Ruey-Kuen Perng and
Barbara Chapman OpenMP Implementation of SPICE3 Circuit
Simulator . . . . . . . . . . . . . . . 493--505
Anup Gangwar and
M. Balakrishnan and
Preeti Ranjan Panda and
Anshul Kumar Evaluation of Bus Based Interconnect
Mechanisms in Clustered VLIW
Architectures . . . . . . . . . . . . . 507--527
Issam W. Damaj Parallel Algorithms Development for
Programmable Devices with Application
from Cryptography . . . . . . . . . . . 529--572
Laurent Baduel and
Françoise Baude and
Denis Caromel Asynchronous Typed Object Groups for
Grid Programming . . . . . . . . . . . . 573--614
Kento Emoto and
Zhenjiang Hu and
Kazuhiko Kakehi and
Masato Takeichi A Compositional Framework for Developing
Parallel Programs on Two-Dimensional
Arrays . . . . . . . . . . . . . . . . . 615--658
Preeti Ranjan Panda Guest Editor Introduction: Special Issue
on Multiprocessor-based Embedded Systems 1--2
Martino Ruggiero and
Alessio Guerri and
Davide Bertozzi and
Michela Milano and
Luca Benini A Fast and Accurate Technique for
Mapping Parallel Applications on
Stream-Oriented MPSoC Platforms with
Communication Awareness . . . . . . . . 3--36
Traian Pop and
Paul Pop and
Petru Eles and
Zebo Peng Analysis and Optimisation of
Hierarchically Scheduled Multiprocessor
Embedded Systems . . . . . . . . . . . . 37--67
Lobna Kriaa and
Aimen Bouchhima and
Marius Gligor and
Anne-Marie Fouillart and
Frédéric Pétrot and
Ahmed-Amine Jerraya Parallel Programming of Multi-processor
SoC: a HW--SW Interface Perspective . . 68--92
Ilya Issenin and
Nikil Dutt Using FORAY Models to Enable MPSoC
Memory Optimizations . . . . . . . . . . 93--113
Mohammad Abdullah Al Faruque and
Jörg Henkel QoS-supported On-chip Communication for
Multi-processors . . . . . . . . . . . . 114--139
Seng Lin Shee and
Andrea Erdos and
Sri Parameswaran Architectural Exploration of
Heterogeneous Multiprocessor Systems for
JPEG . . . . . . . . . . . . . . . . . . 140--162
Alberto F. De Souza and
Rajkumar Buyya Introduction to the Special Issue on the
18th International Symposium on Computer
Architecture and High Performance
Computing . . . . . . . . . . . . . . . 163--165
Fredrik Warg and
Per Stenstrom Dual-thread Speculation: a Simple
Approach to Uncover Thread-level
Parallelism on a Simultaneous
Multithreaded Processor . . . . . . . . 166--183
Peter A. Rounce and
Alberto F. De Souza Dynamic Instruction Scheduling in a
Trace-based Multi-threaded Architecture 184--205
Wessam M. Hassanein and
Layali K. Rashid and
Moustafa A. Hammad Analyzing the Effects of Hyperthreading
on the Performance of Data Management
Systems . . . . . . . . . . . . . . . . 206--225
Renata Braga Araújo and
Guilherme Henrique Trielli Ferreira and
Gustavo Henrique Orair and
Wagner Meira and
Renato Antônio Celso Ferreira and
Dorgival Olavo Guedes Neto and
Mohammed Javeed Zaki The ParTriCluster Algorithm for Gene
Expression Analysis . . . . . . . . . . 226--249
George Teodoro and
Tulio Tavares and
Renato Ferreira and
Tahsin Kurc and
Wagner Meira and
Dorgival Guedes and
Tony Pan and
Joel Saltz A Run-time System for Efficient
Execution of Scientific Workflows on
Distributed Environments . . . . . . . . 250--266
Gabriel H. Loh and
Daniel A. Jiménez Modulo Path History for the Reduction of
Pipeline Overheads in Path-based Neural
Branch Predictors . . . . . . . . . . . 267--286
Guang R. Gao and
Mitsuhisa Sato and
Eduard Ayguadé Guest Editors Introduction: Special
Issue on OpenMP . . . . . . . . . . . . 287--288
Kevin O'Brien and
Kathryn O'Brien and
Zehra Sura and
Tong Chen and
Tao Zhang Supporting OpenMP on Cell . . . . . . . 289--311
Haoqiang Jin and
Barbara Chapman and
Lei Huang and
Dieter an Mey and
Thomas Reichstein Performance Evaluation of a Multi-Zone
Application in Different OpenMP
Approaches . . . . . . . . . . . . . . . 312--325
Milos Milovanović and
Roger Ferrer and
Vladimir Gajinov and
Osman S. Unsal and
Adrian Cristal and
Eduard Ayguadé and
Mateo Valero Nebelung: Execution Environment for
Transactional OpenMP . . . . . . . . . . 326--346
Jie Tao and
Marcel Kunze and
Fabian Nowak and
Rainer Buchty and
Wolfgang Karl Performance Advantage of Reconfigurable
Cache Design on Multicore Processor
Systems . . . . . . . . . . . . . . . . 347--360
Dongsoo Kang and
Chen Liu and
Jean-Luc Gaudiot The Impact of Speculative Execution on
SMT Processors . . . . . . . . . . . . . 361--385
K. Subramani and
Kiran Yellajyosula On the Design and Implementation of a
Shared Memory Dispatcher for Partially
Clairvoyant Schedulers . . . . . . . . . 386--411
Mariana Luderitz Kolberg and
Luiz Gustavo Fernandes and
Dalcidio Moraes Claudio Dense Linear System: a Parallel
Self-verified Solver . . . . . . . . . . 412--425
Ahmad Faraj and
Pitch Patarasuk and
Xin Yuan Bandwidth Efficient All-to-All Broadcast
on Switched Clusters . . . . . . . . . . 426--453
Tony Givargis Guest Editor Introduction: Special Issue
on Embedded Processors . . . . . . . . . 455--456
Praveen Kalla and
X. Sharon Hu and
Jörg Henkel A Flexible Framework for Communication
Evaluation in SoC Design . . . . . . . . 457--477
Roman Lysecky Scalability and Parallel Execution of
Warp Processing: Dynamic
Hardware/Software Partitioning . . . . . 478--492
Zhi Guo and
Betul Buyukkurt and
John Cortes and
Abhishek Mitra and
Walid Najjar A Compiler Intermediate Representation
for Reconfigurable Fabrics . . . . . . . 493--520
Hsiao-Hsi Wang and
Kuan-Ching Li and
Ssu-Hsuan Lu and
Chun-Chieh Yang and
Jean-Luc Gaudiot Design and Implementation of an Agent
Home Scheme Strategy for Prefetch-Based
DSM Systems . . . . . . . . . . . . . . 521--542
Ahmad Faraj and
Pitch Patarasuk and
Xin Yuan A Study of Process Arrival Patterns for
MPI Collective Operations . . . . . . . 543--570
Aart J. C. Bik and
David L. Kreitzer and
Xinmin Tian A Case Study on Compiler Optimizations
for the Intel® Core™ 2 Duo
Processor . . . . . . . . . . . . . . . 571--591
H. L. A. van der Spek and
S. Groot and
E. M. Bakker and
H. A. G. Wijshoff A Compile/Run-time Environment for the
Automatic Transformation of Linked List
Data Structures . . . . . . . . . . . . 592--623
Nicholas Carriero Guest Editor Introduction: Special Issue
on High Performance Computing for High
Productivity Environments . . . . . . . 1--2
Gaurav Sharma and
Jos Martin MATLAB®: a Language for Parallel
Computing . . . . . . . . . . . . . . . 3--36
Masatoshi Seki dRuby and Rinda: Implementation and
Application of Distributed Ruby and its
Parallel Coordination Mechanism . . . . 37--57
L. Anthony Drummond and
Vicente Galiano and
Violeta Migallón and
Jose Penadés PyACTS: a Python Based Interface to ACTS
Tools and Parallel Scientific
Applications . . . . . . . . . . . . . . 58--77
Luke Tierney and
A. J. Rossini and
Na Li Snow: a Parallel Computing Framework for
the R System . . . . . . . . . . . . . . 78--90
David E. Hudak and
Neil Ludban and
Ashok Krishnamurthy and
Vijay Gadepally and
Siddharth Samsi and
others A Computational Science IDE for HPC
Systems: Design and Applications . . . . 91--105
Robert D. Bjornson and
Nicholas J. Carriero and
Martin H. Schultz and
Patrick M. Shields and
Stephen B. Weston NetWorkSpace: a Coordination System for
High-Productivity Environments . . . . . 106--125
Jun Cao and
Ayush Goyal and
Krista A. Novstrup and
Samuel P. Midkiff and
James M. Caruthers An Optimizing Compiler for Parallel
Chemistry Simulations . . . . . . . . . 127--152
J. Miguel-Alonso and
J. Navaridas and
F. J. Ridruejo Interconnection Network Simulation Using
Traces of MPI Applications . . . . . . . 153--174
Joahyoung Lee and
Inbum Jung Recovery Strategies for Streaming Media
Service in a Cluster-Based VOD Server
with a Fault Node . . . . . . . . . . . 175--194
Athanasios I. Margaris Log File Formats for Parallel
Applications: a Review . . . . . . . . . 195--222
Mohammad J. Rashti and
Ahmad Afsahi A Speculative and Adaptive MPI
Rendezvous Protocol Over RDMA-enabled
Interconnects . . . . . . . . . . . . . 223--246
Rudolf Eigenmann and
Eduard Ayguadé Guest Editors' Introduction . . . . . . 247--249
Greg Bronevetsky and
John Gyllenhaal and
Bronis R. de Supinski CLOMP: Accurately Characterizing OpenMP
Application Overheads . . . . . . . . . 250--265
Karl Fürlinger and
Shirley Moore Capturing and Analyzing the Execution
Control Flow of OpenMP Applications . . 266--276
Tobias Hilbrich and
Matthias S. Müller and
Bettina Krammer MPI Correctness Checking for OpenMP/MPI
Applications . . . . . . . . . . . . . . 277--291
Alejandro Duran and
Roger Ferrer and
Eduard Ayguadé and
Rosa M. Badia and
Jesus Labarta A Proposal to Extend the OpenMP Tasking
Model with Dependent Tasks . . . . . . . 292--305
Morten S. Rasmussen and
Matthias B. Stuart and
Sven Karlsson Parallelism and Scalability in an Image
Processing Application . . . . . . . . . 306--323
Pascal Vander-Swalmen and
Gilles Dequen and
Michaël Krajecki A Collaborative Approach for
Multi-Threaded SAT Solving . . . . . . . 324--342
Prabhat Mishra Guest Editor Introduction: Special Issue
on Nano/Bio-Inspired Applications and
Architectures . . . . . . . . . . . . . 343--344
Jayram Moorkanikara Nageswaran and
Andrew Felch and
Ashok Chandrasekhar and
Nikil Dutt and
Richard Granger and
others Brain Derived Vision Algorithm on High
Performance Architectures . . . . . . . 345--369
Yang Zhao and
Krishnendu Chakrabarty On-Line Testing of Lab-on-Chip Using
Reconfigurable Digital-Microfluidic
Compactors . . . . . . . . . . . . . . . 370--388
Scott Chilstedt and
Chen Dong and
Deming Chen Design and Evaluation of a Carbon
Nanotube-Based Programmable Architecture 389--416
Michael DeBole and
Ramakrishnan Krishnan and
Varsha Balakrishnan and
Wenping Wang and
Hong Luo and
others New-Age: a Negative Bias Temperature
Instability-Estimation Framework for
Microarchitectural Components . . . . . 417--431
Stéphane Genaud and
Emmanuel Jeannot and
Choopan Rattanapoka Fault-Management in P2P-MPI . . . . . . 433--461
Mohammad Reza Bonyadi and
Mohsen Ebrahimi Moghaddam A Bipartite Genetic Algorithm for
Multi-processor Task Scheduling . . . . 462--487
Guochun Shi and
Volodymyr Kindratenko and
Steven Gottlieb The Bottom-Up Implementation of One MILC
Lattice QCD Application on the Cell
Blade . . . . . . . . . . . . . . . . . 488--507
Chen Tian and
Min Feng and
Vijay Nagarajan and
Rajiv Gupta Speculative Parallelization of
Sequential Loops on Multicores . . . . . 508--535
Nadia Nedjah and
Luiza de Macedo Mourelle High-Performance Hardware of the
Sliding-Window Method for Parallel
Computation of Modular Exponentiations 537--555
Steen Larsen and
Parthasarathy Sarangam and
Ram Huggahalli and
Siddharth Kulkarni Architectural Breakdown of End-to-End
Latency in a TCP/IP Network . . . . . . 556--571
Carolina Ribeiro Xavier and
Rafael Sachetto Oliveira and
Vinicius da Fonseca Vieira and
Rodrigo Weber dos Santos and
Wagner Meira Multi-Level Parallelism for the Cardiac
Bidomain Equations . . . . . . . . . . . 572--592
Claudio Schepke and
Nicolas Maillard and
Philippe O. A. Navaux Parallel Lattice Boltzmann Method with
Blocked Partitioning . . . . . . . . . . 593--611
Sven-Bodo Scholz and
Alex Shafarenko Guest Editors' Editorial: Special Issue
on the Second International Workshop on
Microgrids . . . . . . . . . . . . . . . 1--3
Benedict R. Gaster and
Tim Bainbridge and
David Lacey and
David Gardner Compilation Techniques for High Level
Parallel Code . . . . . . . . . . . . . 4--18
Jan Haase and
Andreas Hofmann and
Klaus Waldschmidt A Self Distributing Virtual Machine for
Adaptive Multicore Environments . . . . 19--37
Clemens Grelck and
Sven-Bodo Scholz and
Alex Shafarenko Asynchronous Stream Processing with
S-Net . . . . . . . . . . . . . . . . . 38--67
Philip K. F. Hölzenspies and
Timon D. ter Braak and
Jan Kuper and
Gerard J. M. Smit and
Johann M. Hurink Run-time Spatial Mapping of Streaming
Applications to Heterogeneous
Multi-Processor Systems . . . . . . . . 68--83
Xiaobin Li and
Jean-Luc Gaudiot Tolerating Radiation-Induced Transient
Faults in Modern Processors . . . . . . 85--116
Chao Dong and
Huijie Zhao and
Wei Wang Parallel Nonnegative Matrix
Factorization Algorithm on the
Distributed Memory Platform . . . . . . 117--137
Nan Zhang Computing Optimised Parallel Speeded-Up
Robust Features (P-SURF) on Multi-Core
Processors . . . . . . . . . . . . . . . 138--158
Alexandros V. Gerbessiotis Parallel Option Price Valuations with
the Explicit Finite Difference Method 159--182
Preeti Ranjan Panda and
Rajendran Panda Guest Editorial: Special Issue on VLSI
Design and Embedded Systems . . . . . . 183--184
Alexander Czutro and
Ilia Polian and
Matthew Lewis and
Piet Engelke and
Sudhakar M. Reddy and
others Thread-Parallel Integrated Test Pattern
Generator Utilizing Satisfiability
Analysis . . . . . . . . . . . . . . . . 185--202
Tameesh Suri and
Aneesh Aggarwal Improving Adaptability and Per-Core
Performance of Many-Core Processors
Through Reconfiguration . . . . . . . . 203--224
Unmesh D. Bordoloi and
Samarjit Chakraborty GPU-based Acceleration of System-level
Design Tasks . . . . . . . . . . . . . . 225--253
Reiley Jeyapaul and
Aviral Shrivastava Code Transformations for TLB Power
Reduction . . . . . . . . . . . . . . . 254--276
Sourav Roy H-NMRU: an Efficient Cache Replacement
Policy with Low Area . . . . . . . . . . 277--287
Spyros Apostolakos and
Apostolos Meliones and
George Lykakis and
Emmanuel Touloupis and
Vassilis Vlagoulis Design, Implementation and Validation of
an Open Source IP-PBX/VoIP Gateway
Multi-Core SoC . . . . . . . . . . . . . 288--302
T. Kempf and
S. Wallentowitz and
G. Ascheid and
R. Leupers and
H. Meyr Analytical and Simulation-based Design
Space Exploration of Software Defined
Radios . . . . . . . . . . . . . . . . . 303--321
Vinay B. Y. Kumar and
Siddharth Joshi and
Sachin B. Patkar and
H. Narayanan FPGA Based High Performance
Double-Precision Matrix Multiplication 322--338
Matthias S. Müller and
Eduard Ayguadé Guest Editors' Introduction . . . . . . 339--340
Stephen L. Olivier and
Jan F. Prins Comparison of OpenMP 3.0 and Other Task
Parallel Frameworks on Unbalanced Task
Graphs . . . . . . . . . . . . . . . . . 341--360
Chunhua Liao and
Daniel J. Quinlan and
Jeremiah J. Willcock and
Thomas Panas Semantic-Aware Automatic Parallelization
of Modern Applications Using High-Level
Abstractions . . . . . . . . . . . . . . 361--378
Paul Kapinos and
Dieter an Mey Productivity and Performance Portability
of the OpenMP 3.0 Tasking Concept When
Applied to an Engineering Code Written
in Fortran 95 . . . . . . . . . . . . . 379--395
J. Mark Bull and
James Enright and
Xu Guo and
Chris Maynard and
Fiona Reid Performance Evaluation of Mixed-Mode
OpenMP/MPI Implementations . . . . . . . 396--417
François Broquedis and
Nathalie Furmento and
Brice Goglin and
Pierre-André Wacrenier and
Raymond Namyst ForestGOMP: an Efficient OpenMP
Environment for NUMA Architectures . . . 418--439
Eduard Ayguadé and
Rosa M. Badia and
Pieter Bellens and
Daniel Cabrera and
Alejandro Duran and
Roger Ferrer and
Marc González and
Francisco Igual and
Daniel Jiménez-González and
Jesús Labarta and
Luis Martinell and
Xavier Martorell and
Rafael Mayo and
Josep M. Pérez and
Judit Planas and
Enrique S. Quintana-Ortí Extending OpenMP to Survive the
Heterogeneous Multi-Core Era . . . . . . 440--459
Valentina Salapura and
José E. Moreira and
Sally A. McKee Guest Editors Introduction . . . . . . . 1--2
Daniele Paolo Scarpazza Top-Performance Tokenization and
Small-Ruleset Regular Expression
Matching: a Quantitative Performance
Analysis and Optimization Study on the
Cell/B.E. Processor . . . . . . . . . . 3--32
Arrvindh Shriraman and
Sandhya Dwarkadas Analyzing Conflicts in
Hardware-Supported Memory Transactions 33--61
Mehmet Belgin and
Godmar Back and
Calvin J. Ribbens A Library for Pattern-based Sparse
Matrix Vector Multiply . . . . . . . . . 62--87
Rob V. van Nieuwpoort and
John W. Romein Correlating Radio Astronomy Signals with
Many-Core Hardware . . . . . . . . . . . 88--114
Jiayuan Meng and
Kevin Skadron A Performance Study for Iterative
Stencil Loops on GPUs with Ghost Zone
Optimizations . . . . . . . . . . . . . 115--142
Ghada F. El Kabbany and
Nayer M. Wanas and
Nadia H. Hegazi and
Samir I. Shaheen A Dynamic Load Balancing Framework for
Real-time Applications in Message
Passing Systems . . . . . . . . . . . . 143--182
K. A. Hawick and
A. Leist and
D. P. Playne Regular Lattice and Small-World Spin
Model Simulations Using CUDA and GPUs 183--201
Simon Uzezi Ewedafe and
Rio Hirowati Shariffudin Parallel Implementation of $2$-D
Telegraphic Equation on MPI/PVM Cluster 202--231
Nasser Giacaman and
Oliver Sinnen Parallel Iterator for Parallelizing
Object-Oriented Applications . . . . . . 232--269
Christian Fensch and
Marcelo Cintra An Evaluation of an OS-Based Coherence
Scheme for Tiled CMPs . . . . . . . . . 271--295
Grigori Fursin and
Yuriy Kashnikov and
Abdul Wahid Memon and
Zbigniew Chamski and
Olivier Temam and
others Milepost GCC: Machine Learning Enabled
Self-tuning Compiler . . . . . . . . . . 296--327
Arnaud Grasset and
Philippe Millet and
Philippe Bonnot and
Sami Yehia and
Wolfram Putzke-Roeming and
others The MORPHEUS Heterogeneous Dynamically
Reconfigurable Platform . . . . . . . . 328--356
R. Tornero and
J. M. Orduña and
A. Mejia and
J. Flich and
J. Duato A Communication-Driven Routing Technique
for Application-Specific NoCs . . . . . 357--374
Enrique Vallejo and
Sutirtha Sanyal and
Tim Harris and
Fernando Vallejo and
Ramón Beivide and
others Hybrid Transactional Memory with
Pessimistic Concurrency Control . . . . 375--396
Harm Munk and
Eduard Ayguadé and
Cédric Bastoul and
Paul Carpenter and
Zbigniew Chamski and
others ACOTES Project: Advanced Compiler
Technologies for Embedded Streaming . . 397--450
Shaoshan Liu and
Ligang Wang and
Xiao-Feng Li and
Jean-Luc Gaudiot Space-and-Time Efficient Parallel
Garbage Collector for Data-Intensive
Applications . . . . . . . . . . . . . . 451--472
Ying Qian and
Ahmad Afsahi Process Arrival Pattern Aware Alltoall
and Allgather on InfiniBand Clusters . . 473--493
L. Benini and
R. Grottesi and
S. Morigi and
M. Ruggiero Parallel Rendering and Animation of
Subdivision Surfaces on the Cell BE
Processor . . . . . . . . . . . . . . . 494--521
Kush K. Kella and
Aasia Khanum APCFS: Autonomous and Parallel
Compressed File System . . . . . . . . . 522--532
Shaoshan Liu and
Christine Eisenbeis and
Jean-Luc Gaudiot Value Prediction and Speculative
Execution on GPU . . . . . . . . . . . . 533--552
Ralf Hoffmann and
Thomas Rauber Adaptive Task Pools: Efficiently
Balancing Large Number of Tasks on
Shared-address Spaces . . . . . . . . . 553--581
Can Ozturan and
Dan Grigoras Guest Editorial: Parallel and
Distributed Computing . . . . . . . . . 582--583
Anne Benoit and
Hinde Lilia Bouziane and
Yves Robert Optimizing the Reliability of Streaming
Applications Under Throughput
Constraints . . . . . . . . . . . . . . 584--614
George C. Caragea and
Alexandros Tzannes and
Fuat Keceli and
Rajeev Barua and
Uzi Vishkin Resource-Aware Compiler Prefetching for
Fine-Grained Many-Cores . . . . . . . . 615--638
Alper Sen and
Baris Aksanli and
Murat Bozkurt Speeding Up Cycle Based Logic Simulation
Using Graphics Processing Units . . . . 639--661
Yu-Min Lu and
Peng-Sheng Chen Probabilistic Alias Analysis of
Executable Code . . . . . . . . . . . . 663--693
Håkan Sundell Wait-Free Multi-Word Compare-and-Swap
Using Greedy Helping and Grabbing . . . 694--716
Masroor Hussain and
Muhammad Abid and
Mushtaq Ahmad and
Ashfaq Khokhar and
Arif Masud A Parallel Implementation of ALE Moving
Mesh Technique for FSI Problems using
OpenMP . . . . . . . . . . . . . . . . . 717--745
Kayhan M. Imre and
Cesur Baransel and
Harun Artuner Efficient and Scalable Routing
Algorithms for Collective Communication
Operations on $2$D All-Port Torus
Networks . . . . . . . . . . . . . . . . 746--782
Brian Demsky Using Discrete Event Simulation to
Analyze Contention Managers . . . . . . 783--808
Seçkin Sanci and
Veysi Isler A Parallel Algorithm for UAV Flight
Route Planning on GPU . . . . . . . . . 809--837
Valentina Salapura and
Michael Gschwind and
Jens Knoop Guest Editorial: Parallel Systems and
Compilers . . . . . . . . . . . . . . . 1--3
I-Jui Sung and
Nasser Anssari and
John A. Stratton and
Wen-Mei W. Hwu Data Layout Transformation Exploiting
Memory-Level Parallelism in Structured
Grid Many-Core Applications . . . . . . 4--24
Ferad Zyulkyarov and
Srdjan Stipic and
Tim Harris and
Osman S. Unsal and
Adrián Cristal and
Ibrahim Hur and
Mateo Valero Profiling and Optimizing Transactional
Memory Applications . . . . . . . . . . 25--56
M. Awasthi and
D. Nellans and
K. Sudan and
R. Balasubramonian and
A. Davis Managing Data Placement in Memory
Systems with Multiple Memory Controllers 57--83
Changhui Lin and
Vijay Nagarajan and
Rajiv Gupta Efficient Sequential Consistency Using
Conditional Fences . . . . . . . . . . . 84--117
Yun Zhang and
Jae W. Lee and
Nick P. Johnson and
David I. August DAFT: Decoupled Acyclic Fault Tolerance 118--140
Yan Huang and
Jie Tang and
Zhi-min Gu and
Min Cai and
Jianxun Zhang and
Ninghan Zheng The Performance Optimization of Threaded
Prefetching for Linked Data Structures 141--163
Jean-Claude Charr and
Raphaël Couturier and
David Laiymani Adaptation and Evaluation of the
Multisplitting-Newton and Waveform
Relaxation Methods Over Distributed
Volatile Environments . . . . . . . . . 164--183
Mwaffaq Otoom and
JoAnn M. Paul Workload Mode Identification for Chip
Heterogeneous Multiprocessors . . . . . 184--224
Mohsen Ebrahimi Moghaddam and
Mohammad Reza Bonyadi An Immune-based Genetic Algorithm with
Reduced Search Space Coding for
Multiprocessor Task Scheduling Problem 225--257
Wagner Meira and
Ricardo Bianchini Special Issue on Computer Architecture
and High-Performance Computing . . . . . 259--261
Ricardo Menotti and
João M. P. Cardoso and
Marcio M. Fernandes and
Eduardo Marques LALP: a Language to Program Custom
FPGA-Based Acceleration Engines . . . . 262--289
Jairo Panetta and
Thiago Teixeira and
Paulo R. P. de Souza Filho and
Carlos A. da Cunha Filho and
David Sotelo and
Fernando M. Roxo da Motta and
Silvio Sinedino Pinheiro and
Andre L. Romanelli Rosa and
Luiz R. Monnerat and
Leandro T. Carneiro and
Carlos H. B. de Albrecht Accelerating Time and Depth Seismic
Migration by CPU and GPU Cooperation . . 290--312
Pedro Leite and
João Marcelo Teixeira and
Thiago Farias and
Bernardo Reis and
Veronica Teichrieb and
Judith Kelner Nearest Neighbor Searches on the GPU: a
Massively Parallel Approach for Dynamic
Point Clouds . . . . . . . . . . . . . . 313--330
Artur Santos and
João Marcelo Teixeira and
Thiago Farias and
Veronica Teichrieb and
Judith Kelner Understanding the Efficiency of kD-tree
Ray-Traversal Techniques over a GPGPU
Architecture . . . . . . . . . . . . . . 331--352
Girish Venkatasubramanian and
Renato J. Figueiredo and
Ramesh Illikkal and
Donald Newell TMT: a TLB Tag Management Framework for
Virtualized Platforms . . . . . . . . . 353--380
Ákos Dudás and
Sándor Juhász and
Tamás Schrádi Software Controlled Adaptive
Pre-Execution for Data Prefetching . . . 381--396
Giuliano Laccetti and
Marco Lapegna and
Valeria Mele and
Diego Romano and
Almerico Murli A Double Adaptive Algorithm for
Multidimensional Integration on
Multicore Based HPC Systems . . . . . . 397--409
Rohit Jalan and
Arun Kejariwal Trin--Trin: Who's Calling? A Pin-Based
Dynamic Call Graph Extraction Framework 410--442
John M. Neuberger and
Nándor Sieben and
James W. Swift An MPI Implementation of a
Self-Submitting Parallel Job Queue . . . 443--464
Yan Huang and
Zhi-Min Gu and
Jie Tang and
Min Cai and
Jianxun Zhang and
others Estimating Effective Prefetch Distance
in Threaded Prefetching for Linked Data
Structures . . . . . . . . . . . . . . . 465--487
Fadi Abboud and
Yosi Ben-Asher and
Yousef Shajrawi and
Esti Stein Combining Height Reduction and
Scheduling for VLIW Machines Enhanced
with Three-Argument Arithmetic
Operations . . . . . . . . . . . . . . . 488--513
Wai-Mee Ching and
Da Zheng Automatic Parallelization of
Array-oriented Programs for a Multi-core
Machine . . . . . . . . . . . . . . . . 514--531
Joppe W. Bos Low-Latency Elliptic Curve Scalar
Multiplication . . . . . . . . . . . . . 532--550
Hubertus Franke and
Paul H. J. Kelly and
Pedro Trancoso Guest Editorial: Computing Frontiers . . 551--552
Alexander D. Rast and
Javier Navaridas and
Xin Jin and
Francesco Galluppi and
Luis A. Plana and
others Managing Burstiness and Scalability in
Event-Driven Models on the SpiNNaker
Neuromimetic System . . . . . . . . . . 553--582
Stamatis Kavadias and
Manolis Katevenis and
Michail Zampetakis and
Dimitrios S. Nikolopoulos Cache-Integrated Network Interfaces:
Flexible On-Chip Communication and
Synchronization for Large-Scale CMPs . . 583--604
Yong Cao and
Debprakash Patnaik and
Sean Ponce and
Jeremy Archuleta and
Patrick Butler and
others Parallel Mining of Neuronal Spike
Streams on Graphics Processing Units . . 605--632
Vinod Tipparaju and
Edoardo Apra and
Weikuan Yu and
Xinyu Que and
Jeffrey S. Vetter Runtime Techniques to Enable a
Highly-Scalable Global Address Space
Model for Petascale Computing . . . . . 633--655
Mounira Bachir and
Sid-Ahmed-Ali Touati and
Frederic Brault and
David Gregg and
Albert Cohen Minimal Unroll Factor for Code
Generation of Software Pipelining . . . 1--58
Shixun Zhang and
Shinichi Yamagiwa and
Masahiko Okumura and
Seiji Yunoki Kernel Polynomial Method on GPU . . . . 59--88
Daniel Nicácio and
Alexandro Baldassin and
Guido Araújo Transaction Scheduling Using Dynamic
Conflict Avoidance . . . . . . . . . . . 89--110
Khaled Hamidouche and
Fernando Machado Mendonca and
Joel Falcou and
Alba Cristina Magalhaes Alves de Melo and
Daniel Etiemble Parallel Smith--Waterman Comparison on
Multicore and Manycore Computing
Platforms with BSP++ . . . . . . . . . . 111--136
Junchang Wang and
Kai Zhang and
Xinan Tang and
Bei Hua B-Queue: Efficient and Practical Queuing
for Fast Core-to-Core Communication . . 137--159
John McAllister and
Luigi Carro and
Skevos Evripidou Guest Editorial: Special Issue on 2011
International Conference on Embedded
Computer Systems: Architectures,
Modeling and Simulation (SAMOS XI) . . . 161--162
David A. Penry and
Kurtis D. Cahill ADL-Based Specification of
Implementation Styles for Functional
Simulators . . . . . . . . . . . . . . . 163--211
Oscar Almer and
Igor Böhm and
Tobias Edler von Koch and
Björn Franke and
Stephen Kyle and
Volker Seeker and
Christopher Thompson and
Nigel Topham A Parallel Dynamic Binary Translator for
Efficient Multi-Core Simulation . . . . 212--235
Tiago Dias and
Sebastián López and
Nuno Roma and
Leonel Sousa Scalable Unified Transform Architecture
for Advanced Video Coding Embedded
Systems . . . . . . . . . . . . . . . . 236--260
Kenneth C. Rovers and
Jan Kuper UniTi: Unified Composition and Time for
Multi-domain Model-based Design . . . . 261--304
Karthik T. Sundararajan and
Timothy M. Jones and
Nigel P. Topham The Smart Cache: an Energy-Efficient
Cache Architecture Through Dynamic
Adaptation . . . . . . . . . . . . . . . 305--330
Stefan Langemeyer and
Peter Pirsch and
Holger Blume Using SDRAM Memories for
High-Performance Accesses to
Two-Dimensional Matrices Without
Transpose . . . . . . . . . . . . . . . 331--354
Calin Cascaval and
Pedro Trancoso and
Viktor Prasanna Guest Editorial: Computing Frontiers . . 355--356
Alexander Heinecke and
Dirk Pflüger Emerging Architectures Enable to Boost
Massively Parallel Data Mining Using
Adaptive Sparse Grids . . . . . . . . . 357--399
Chunyang Gou and
Georgi N. Gaydadjiev Addressing GPU On-Chip Shared Memory
Bank Conflicts Using Elastic Pipeline 400--429
Gianfranco Bilardi and
Kattamuri Ekanadham and
Pratap Pattnaik Efficient Stack Distance Computation for
a Class of Priority Replacement Policies 430--468
Nawab Ali and
Sriram Krishnamoorthy and
Mahantesh Halappanavar and
Jeff Daily Multi-Fault Tolerance for Cartesian Data
Distributions . . . . . . . . . . . . . 469--493
Emanuel Vianna and
Giovanni Comarela and
Tatiana Pontes and
Jussara Almeida and
Virgílio Almeida and
Kevin Wilkinson and
Harumi Kuno and
Umeshwar Dayal Analytical Performance Models for
MapReduce Workloads . . . . . . . . . . 495--525
Yunho Oh and
Doohwan Oh and
Won W. Ro GPU-Friendly Parallel Genome Matching
with Tiled Access and Reduced State
Transition Table . . . . . . . . . . . . 526--551
Claudio Schepke and
Nicolas Maillard and
Joerg Schneider and
Hans-Ulrich Heiss Online Mesh Refinement for Parallel
Atmospheric Models . . . . . . . . . . . 552--569
Christopher Oßner and
Klemens Böhm Graphs for Mining-Based Defect
Localization in Multithreaded Programs 570--593
Bugra Gedik Auto-tuning Similarity Search Algorithms
on Multi-core Architectures . . . . . . 595--620
Nasser Giacaman and
Oliver Sinnen Parallel Task for Parallelising
Object-Oriented Desktop Applications . . 621--681
Zheng Gu and
Matthew Small and
Xin Yuan and
Aniruddha Marathe and
David K. Lowenthal Protocol Customization for Improving MPI
Performance on RDMA-Enabled Clusters . . 682--703
Eunjung Park and
John Cavazos and
Louis-Noël Pouchet and
Cédric Bastoul and
Albert Cohen and
P. Sadayappan Predictive Modeling in a Polyhedral
Optimization Space . . . . . . . . . . . 704--750
Rudi Eigenmann and
Sam Midkiff Compiler Infrastructure . . . . . . . . 751--752
Hansang Bae and
Dheya Mustafa and
Jae-Woo Lee and
Aurangzeb and
Hao Lin and
Chirag Dave and
Rudolf Eigenmann and
Samuel P. Midkiff The Cetus Source-to-Source Compiler
Infrastructure: Overview and Evaluation 753--767
Yi Yang and
Huiyang Zhou The Implementation of a High Performance
GPGPU Compiler . . . . . . . . . . . . . 768--781
Gabriel Rodríguez and
María J. Martín and
Patricia González and
Juan Touriño and
Ramón Doallo Compiler-Assisted Checkpointing of
Parallel Codes: The Cetus and LLVM
Experience . . . . . . . . . . . . . . . 782--805
Amin Shafiee Sarvestani and
Erik Hansson and
Christoph Kessler Extensible Recognition of Algorithmic
Patterns in DSP Programs for Automatic
Parallelization . . . . . . . . . . . . 806--824
Barbara Chapman and
Deepak Eachempati and
Oscar Hernandez Experiences Developing the OpenUH
Compiler and Runtime Infrastructure . . 825--854
Xipeng Shen and
Yixun Liu and
Eddy Z. Zhang and
Poornima Bhamidipati An Infrastructure for Tackling
Input-Sensitivity of GPU Program
Optimizations . . . . . . . . . . . . . 855--869
Alba Melo and
Jean-Luc Gaudiot and
Luiz DeRose and
Kunle Olukotun and
Albert Zomaya Guest Editorial . . . . . . . . . . . . 1--3
Ana Avilés-González and
Juan Piernas and
Pilar González-Férez Scalable Metadata Management Through
OSD+ Devices . . . . . . . . . . . . . . 4--29
Enqiang Sun and
David Kaeli Aggressive Value Prediction on a GPU . . 30--48
Mouad Bahi and
Christine Eisenbeis Impact of Reverse Computing on
Information Locality in Register
Allocation for High Performance
Computing . . . . . . . . . . . . . . . 49--76
Joerg Schneider and
Barry Linnert List-based Data Structures for Efficient
Management of Advance Reservations . . . 77--93
Claudia Rosas and
Anna Sikora and
Josep Jorba and
Andreu Moreno and
Eduardo César Improving Performance on Data--Intensive
Applications Using a Load Balancing
Methodology Based on Divisible Load
Theory . . . . . . . . . . . . . . . . . 94--118
Sasa Tomić and
Adrián Cristal and
Osman Unsal and
Mateo Valero Using Dynamic Runtime Testing for Rapid
Development of Architectural Simulators 119--139
Edson Borin and
Guido Araujo and
Mauricio Breternitz, Jr. and
Youfeng Wu Microcode Compression Using
Structured--Constrained Clustering . . . 140--164
Sarala Arunagiri and
Yipkei Kwok and
Patricia J. Teller and
Ricardo A. Portillo and
Seetharami R. Seelam FAIRIO: a Throughput-oriented Algorithm
for Differentiated I/O Performance . . . 165--197
M. M. Waliullah and
Per Stenstrom Removal of Conflicts in Hardware
Transactional Memory Systems . . . . . . 198--218
Nam Ma and
Yinglong Xia and
Viktor K. Prasanna Data Parallel Implementation of Belief
Propagation in Factor Graphs on
Multi-core Platforms . . . . . . . . . . 219--237
Eduarda Monteiro and
Bruno Vizzotto and
Cláudio Diniz and
Marilena Maule and
Bruno Zatt and
Sergio Bampi Parallelization of Full Search Motion
Estimation Algorithm for Parallel and
Distributed Platforms . . . . . . . . . 239--264
Gabriel P. Silva and
Juliana Correa and
Cristiana Bentes and
Sergio Guedes and
Mariela Gabioux The Experience in Designing and
Evaluating the High Performance Cluster
Netuno . . . . . . . . . . . . . . . . . 265--286
Mitja Bezensek and
Borut Robic A Survey of Parallel and Distributed
Algorithms for the Steiner Tree Problem 287--319
Johann Steinbrecher and
Cesar J. Philippidis and
Weijia Shang A Case Study of Implementing Supernode
Transformations . . . . . . . . . . . . 320--342
John K. Holmen and
David L. Foster Accelerating Single Iteration
Performance of CUDA--Based $3$D
Reaction--Diffusion Simulations . . . . 343--363
John K. Holmen and
David L. Foster Erratum to: Accelerating Single
Iteration Performance of CUDA--Based
$3$D Reaction--Diffusion Simulations . . 364--364
Luís Fabrício Wanderley Góes and
Christiane Pousa Ribeiro and
Márcio Castro and
Jean-François Méhaut and
Murray Cole and
Marcelo Cintra Automatic Skeleton-Driven Memory
Affinity for Transactional Worklist
Applications . . . . . . . . . . . . . . 365--382
Anonymous Editor's Note . . . . . . . . . . . . . 383--383
Changmin Lee and
Won Woo Ro and
Jean-Luc Gaudiot Boosting CUDA Applications with CPU--GPU
Hybrid Computing . . . . . . . . . . . . 384--404
Jesus Carretero and
Laurence T. Yang Parallel and Distributed Processing with
Applications: Preface . . . . . . . . . 405--407
Jesús Cámara and
Javier Cuenca and
Domingo Giménez and
Luis Pedro García and
Antonio M. Vidal Empirical Installation of Linear Algebra
Shared-Memory Subroutines for
Auto-Tuning . . . . . . . . . . . . . . 408--434
Abdullah Kayi and
Olivier Serres and
Tarek El-Ghazawi Bandwidth Adaptive Cache Coherence
Optimizations for Chip Multiprocessors 435--455
Yousun Ko and
Minyoung Jung and
Yo-Sub Han and
Bernd Burgstaller A Speculative Parallel DFA Membership
Test for Multicore, SIMD and Cloud
Computing Environments . . . . . . . . . 456--489
Thomas Baumann and
Michael Resch Parallel Parameter Identification in
Industrial Biotechnology . . . . . . . . 490--504
Cheng Hua Li and
Laurence T. Yang and
Man Lin Parallel Training of an Improved Neural
Network for Text Categorization . . . . 505--523
Gaetan Hains and
Youry Khmelevsky Guest Editorial for High-level Parallel
Programming and Applications . . . . . . 525--528
Alexandra Jimborean and
Philippe Clauss and
Jean-François Dollinger and
Vincent Loechner and
Juan Manuel Martinez Caamaño Dynamic and Speculative Polyhedral
Parallelization Using Compiler-Generated
Skeletons . . . . . . . . . . . . . . . 529--545
Kento Emoto and
Kiminori Matsuzaki An Automatic Fusion Mechanism for
Variable-Length List Skeletons in SkeTo 546--563
Christopher Brown and
Marco Danelutto and
Kevin Hammond and
Peter Kilpatrick and
Archibald Elliott Cost-Directed Refactoring for Parallel
Erlang Programs . . . . . . . . . . . . 564--582
Mathias Bourgoin and
Emmanuel Chailloux and
Jean-Luc Lamotte Efficient Abstractions for GPGPU
Programming . . . . . . . . . . . . . . 583--600
Michel Steuwer and
Malte Friese and
Sebastian Albers and
Sergei Gorlatch Introducing and Implementing the
Allpairs Skeleton for Programming
Multi-GPU Systems . . . . . . . . . . . 601--618
A. N. Yzelman and
R. H. Bisseling and
D. Roose and
K. Meerbergen MulticoreBSP for C: a High-Performance
Library for Shared-Memory Parallel
Programming . . . . . . . . . . . . . . 619--642
Nuno Gaspar and
Ludovic Henrio and
Eric Madelaine Bringing Coq into the World of GCM
Distributed Applications . . . . . . . . 643--662
Stefano Chessa and
Susanna Pelagatti and
Nicoletta Triolo Engineering Energy Efficient Visual
Sensor Network Applications Using
Skeletons . . . . . . . . . . . . . . . 663--680
Pavel Krömer and
Jan Platos and
Václav Snásel Nature-Inspired Meta-Heuristics on
Modern GPUs: State of the Art and Brief
Survey of Selected Algorithms . . . . . 681--709
Ciprian Dobre and
Fatos Xhafa Parallel Programming Paradigms and
Frameworks in Big Data Era . . . . . . . 710--738
Fahimeh Ramezani and
Jie Lu and
Farookh Khadeer Hussain Task-Based System Load Balancing in
Cloud Computing Using Particle Swarm
Optimization . . . . . . . . . . . . . . 739--754
Ugo Fiore and
Francesco Palmieri and
Aniello Castiglione and
Alfredo De Santis A Cluster-Based Data-Centric Model for
Network-Aware Task Scheduling in
Distributed Systems . . . . . . . . . . 755--775
Ibtehal Nafea and
Muhammad Younas and
Robert Holton and
Irfan Awan A Priority-Based Admission Control
Scheme for Commercial Web Servers . . . 776--797
Tomoya Enokido and
Ailixier Aikebaier and
Makoto Takizawa Energy-Efficient Redundant Execution of
Processes in a Fault-Tolerant Cluster of
Servers . . . . . . . . . . . . . . . . 798--819
Zia ur Rehman and
Omar Khadeer Hussain and
Farookh Khadeer Hussain Parallel Cloud Service Selection and
Ranking Based on QoS History . . . . . . 820--852
Fei Song and
Daochao Huang and
Huachun Zhou and
Hongke Zhang and
Ilsun You An Optimization-Based Scheme for
Efficient Virtual Machine Placement . . 853--872
Alex Nicolau Acknowledgment to Reviewers . . . . . . 873--874
Shin-Kai Chen and
Cheng-Yu Hung and
Ching-Chih Chen and
Chih-Wei Liu Parallelizing Complex Streaming
Applications on Distributed Scratchpad
Memory Multicore Architecture . . . . . 875--899
Young-Joo Kim and
Sejun Song and
Yong-Kee Jun VORD: a Versatile On-the-fly Race
Detection Tool in OpenMP Programs . . . 900--930
S. Sankaraiah and
Lam Hai Shuan and
C. Eswaran and
Junaidi Abdullah Performance Optimization of Video Coding
Process on Multi-Core Platform Using GOP
Level Parallelism . . . . . . . . . . . 931--947
Carlos H. González and
Basilio B. Fraguela An Algorithm Template for Domain-Based
Parallel Irregular Algorithms . . . . . 948--967
Steffen Ernsting and
Herbert Kuchen A Scalable Farm Skeleton for Hybrid
Parallel and Distributed Programming . . 968--987
Bert Gijsbers and
Clemens Grelck An Efficient Scalable Runtime System for
Macro Data Flow Processing Using S-Net 988--1011
M. Aldinucci and
S. Campa and
M. Danelutto and
P. Kilpatrick and
M. Torquati Design patterns percolating to parallel
programming framework implementation . . 1012--1031
Michal Czapiński and
Chris Thompson and
Stuart Barnes Reducing Communication Overhead in
Multi-GPU Hybrid Solver for $2$D
Laplace's Equation . . . . . . . . . . . 1032--1047
John McAllister and
David Guevorkian and
Hartwig Jeschke and
Mihai Sima Guest Editorial: Special Issue on
Embedded Computer Systems:
Architectures, Modeling and Simulation 1--2
Teemu Nyländen and
Jani Boutellier and
Karri Nikunen and
Jari Hannuksela and
Olli Silvén Low-Power Reconfigurable Miniature
Sensor Nodes for Condition Monitoring 3--23
Amine Anane and
El Mostapha Aboulhamid A Transaction-Based Environment for
System Modeling and Parallel Simulation 24--58
Georgios Keramidas and
Chrysovalantis Datsios Revisiting Cache Resizing . . . . . . . 59--85
Daniel Baudisch and
Klaus Schneider Evaluation of Speculation in
Out-of-Order Execution of Synchronous
Dataflow Networks . . . . . . . . . . . 86--129
Ricardo A. Velásquez and
Pierre Michaud and
André Seznec BADCO: Behavioral Application-Dependent
Superscalar Core Models . . . . . . . . 130--157
Markus Metzger and
Xinmin Tian and
Walfred Tedeschi User-Guided Dynamic Data Race Detection 159--179
Jianxun Zhang and
Zhimin Gu and
Yan Huang and
Ninghan Zheng and
Xiaohan Hu Helper Thread Prefetching Control
Framework on Chip Multi-processor . . . 180--202
I. Z. Reguly and
M. B. Giles Finite Element Algorithms and Data
Structures on Graphical Processing Units 203--239
Matthew Williamson and
K. Subramani A Parallel Implementation for the
Negative Cost Girth Problem . . . . . . 240--259
Zhendong Wu and
Kai Lu and
Xiaoping Wang and
Xu Zhou Collaborative Technique for Concurrency
Bug Detection . . . . . . . . . . . . . 260--285
Kshitij Mehta and
Edgar Gabriel Multi-Threaded Parallel I/O for OpenMP
Applications . . . . . . . . . . . . . . 286--309
Ching-Hsien Hsu and
Xiaoming Li and
Xuanhua Shi Network and Parallel Computing . . . . . 311--315
Quanqing Xu and
Liang Zhao and
Mingzhong Xiao and
Anna Liu and
Yafei Dai YuruBackup: a Space-Efficient and Highly
Scalable Incremental Backup System in
the Cloud . . . . . . . . . . . . . . . 316--338
Hui Huang and
Ligang He and
Xueguang Chen and
Minghui Yu and
Zhiwu Wang Automatic Composition of Heterogeneous
Models Based on Semantic Web Services 339--358
Xiaowen Feng and
Hai Jin and
Ran Zheng and
Lei Zhu and
Weiqi Dai Accelerating Smith--Waterman Alignment
of Species-Based Protein Sequences on
GPU . . . . . . . . . . . . . . . . . . 359--380
Edwin Sha and
Li Wang and
Qingfeng Zhuge and
Jun Zhang and
Jing Liu Power Efficiency for Hardware/Software
Partitioning with Time and Area
Constraints on MPSoC . . . . . . . . . . 381--402
Hai Jin and
Hanfeng Qin and
Song Wu and
Xuerong Guo CCAP: a Cache Contention-Aware Virtual
Machine Placement Approach for HPC Cloud 403--420
Bernhard Egger and
Erik Gustafsson and
Changyeon Jo and
Jeongseok Son Efficiently Restoring Virtual Machines 421--439
Feng Liang and
Yunzhen Liu and
Hai Liu and
Shilong Ma and
Bettina Schnor A Parallel Job Execution Time Estimation
Approach Based on User Submission
Patterns within Computational Grids . . 440--454
Xianming Zhong and
Chengcheng Xiang and
Miao Yu and
Zhengwei Qi and
Haibing Guan A Virtualization Based Monitoring System
for Mini-intrusive Live Forensics . . . 455--471
Zhao Li and
Yao Shen and
Bin Yao and
Minyi Guo OFScheduler: a Dynamic Network Optimizer
for MapReduce in Heterogeneous Cluster 472--488
Kenn Slagter and
Ching-Hsien Hsu and
Yeh-Ching Chung An Adaptive and Memory Efficient
Sampling Mechanism for Partitioning in
MapReduce . . . . . . . . . . . . . . . 489--507
Songbin Liu and
Xiaomeng Huang and
Haohuan Fu and
Guangwen Yang and
Zhenya Song Data Reduction Analysis for Climate Data
Sets . . . . . . . . . . . . . . . . . . 508--527
Hai Jin and
Honglei Jiang and
Shadi Ibrahim and
Xiaofei Liao Inaccuracy in Private BitTorrent
Measurements . . . . . . . . . . . . . . 528--547
Dheya Mustafa and
Rudolf Eigenmann PETRA: Performance Evaluation Tool for
Modern Parallelizing Compilers . . . . . 549--571
Steven Feldman and
Pierre LaBorde and
Damian Dechev A Wait-Free Multi-Word Compare-and-Swap
Operation . . . . . . . . . . . . . . . 572--596
Tae-Hyuk Ahn and
Adrian Sandu and
Layne T. Watson and
Clifford A. Shaffer and
Yang Cao and
William T. Baumann A Framework to Analyze the Performance
of Load Balancing Schemes for Ensembles
of Stochastic Simulations . . . . . . . 597--630
Ryma Mahfoudhi and
Zaher Mahjoub and
Wahid Nasri Parallel Communication-Avoiding
Algorithm for Triangular Matrix
Inversion on Homogeneous and
Heterogeneous Platforms . . . . . . . . 631--655
Ali Jannesari Detection of High-Level Synchronization
Anomalies in Parallel Programs . . . . . 656--678
Daniel Langr and
Pavel Tvrdík and
Ivan Simecek and
Tomás Dytrych Downsampling Algorithms for Large Sparse
Matrices . . . . . . . . . . . . . . . . 679--702
Alejandro Hidalgo-Paniagua and
Miguel A. Vega-Rodríguez and
Nieves Pavón and
Joaquín Ferruz A Comparative Study of Parallel RANSAC
Implementations in $3$D Space . . . . . 703--720
Deli Zhang and
Brendan Lynch and
Damian Dechev Queue-Based and Adaptive Lock Algorithms
for Scalable Resource Allocation on
Shared--Memory Multiprocessors . . . . . 721--751
Pekka Jääskeläinen and
Carlos Sánchez de La Lama and
Erik Schnetter and
Kalle Raiskila and
Jarmo Takala and
Heikki Berg pocl: a Performance-Portable OpenCL
Implementation . . . . . . . . . . . . . 752--785
María Botón-Fernández and
Manuel Rodríguez-Pascual and
Miguel A. Vega-Rodríguez and
Francisco Prieto-Castrillo and
Rafael Mayo-García A Comparative Analysis of Adaptive
Solutions for Grid Environments . . . . 786--811
Jakub Nalepa and
Miroslaw Blocho Co-operation in the Parallel Memetic
Algorithm . . . . . . . . . . . . . . . 812--839
Slobodan Jelić and
Sören Laue and
Domagoj Matijević and
Patrick Wijerama A Fast Parallel Implementation of a PTAS
for Fractional Packing and Covering
Linear Programs . . . . . . . . . . . . 840--875
Jose L. Jodra and
Ibai Gurrutxaga and
Javier Muguerza Efficient $3$D Transpositions in
Graphics Processing Units . . . . . . . 876--891
Christopher Brown High-Level Heterogeneous and
Hierarchical Parallel Systems (HLPGPU
2014) . . . . . . . . . . . . . . . . . 892--893
Ashkan Tousimojarad and
Wim Vanderbauwhede Steal Locally, Share Globally . . . . . 894--917
Hector Ortega-Arranz and
Yuri Torres and
Arturo Gonzalez-Escribano and
Diego R. Llanos Comprehensive Evaluation of a New
GPU-based Approach to the Shortest Path
Problem . . . . . . . . . . . . . . . . 918--938
Hector Ortega-Arranz and
Yuri Torres and
Arturo Gonzalez-Escribano and
Diego R. Llanos TuCCompi: a Multi-layer Model for
Distributed Heterogeneous Computing with
Tuning Capabilities . . . . . . . . . . 939--960
Guido Araujo and
Jean-Luc Gaudiot Guest Editorial: SBAC--PAD 2013 . . . . 961--964
Yun R. Qu and
Shijie Zhou and
Viktor K. Prasanna A Decomposition-Based Approach for
Scalable Many-Field Packet
Classification on Multi-core Processors 965--987
Karlo G. Lenzi and
Felipe A. P. Figueiredo Fully Optimized Code Block Segmentation
Algorithm for LTE--Advanced . . . . . . 988--1003
Martin Schreiber and
Christoph Riesinger Invasive Compute Balancing for
Applications with Shared and Hybrid
Parallelization . . . . . . . . . . . . 1004--1027
Zifan Liu and
Nahid Emad and
Soufian Ben Amor PageRank Computation Using a Multiple
Implicitly Restarted Arnoldi Method for
Modeling Epidemic Spread . . . . . . . . 1028--1053
Guohong Li and
Olivier Temam and
Zhenyu Liu Cluster Cache Monitor: Leveraging the
Proximity Data in CMP . . . . . . . . . 1054--1077
J. Lobeiras and
M. Amor and
R. Doallo BPLG: a Tuned Butterfly Processing
Library for GPU Architectures . . . . . 1078--1102
Paul-Antoine Arras and
Didier Fuin List Scheduling in Embedded Systems
Under Memory Constraints . . . . . . . . 1103--1128
Bharat Sukhwani and
Mathew Thoennes and
Hong Min A Hardware/Software Approach for
Database Query Acceleration with FPGAs 1129--1159
Gregorio Bernabé and
Javier Cuenca An Autotuning Engine for the $3$D Fast
Wavelet Transform on Clusters with
Hybrid CPU + GPU Platforms . . . . . . . 1160--1191
Gong Su and
Stephen Heisig The Scalability of Disjoint Data
Structures on a New Hardware
Transactional Memory System . . . . . . 1192--1217
George Michelogiannakis and
Xiaoye S. Li Extending Summation Precision for
Network Reduction Operations . . . . . . 1218--1243
Ching-Hsien Hsu and
Valentina Salapura Network and Parallel Computing . . . . . 1--4
Chengcheng Yang and
Peiquan Jin and
Lihua Yue Efficient Buffer Management for Tree
Indexes on Solid State Drives . . . . . 5--25
Ralph Duncan and
Peder Jungck and
Kenneth Ross Using Packet Processing Object Modules
Interchangeably as Stand-Alone Programs
or ``Multi-app'' Components . . . . . . 26--45
Mei-Ling Chiang and
Bo-Wen Yu and
Chi-Shian Shia Operating System Enhancement for
Supporting Massively Multiplayer Online
Games in a Server Cluster . . . . . . . 46--67
Xiaofei Liao and
Rentong Guo and
Danping Yu A Phase Behavior Aware Dynamic Cache
Partitioning Scheme for CMPs . . . . . . 68--86
Byungjoo Kim and
Jung Eun Lee and
Young J. Kim GPU Accelerated Finding of Channels and
Tunnels for a Protein Molecule . . . . . 87--108
Yulong Yu and
Xubin He and
He Guo and
Yuxin Wang A Credit-Based Load-Balance-Aware CTA
Scheduling Optimization Scheme in GPGPU 109--129
Xi Li and
Anthony Ventresque and
John Murphy SOC: Satisfaction-Oriented Virtual
Machine Consolidation in Enterprise Data
Centers . . . . . . . . . . . . . . . . 130--150
Yihua Ding and
James Z. Wang and
Pradip K. Srimani A Linear Time Self-stabilizing Algorithm
for Minimal Weakly Connected Dominating
Sets . . . . . . . . . . . . . . . . . . 151--162
Jian Cao and
Qiang Li and
Yuede Ji and
Yukun He Detection of Forwarding-Based Malicious
URLs in Online Social Networks . . . . . 163--180
Lizhi Peng and
Bo Yang and
Yuehui Chen Effectiveness of Statistical Features
for Early Stage Internet Traffic
Identification . . . . . . . . . . . . . 181--197
Zhaoxin Fan and
Shuoying Chen and
Li Zha A Text Clustering Approach of Chinese
News Based on Neural Network Language
Model . . . . . . . . . . . . . . . . . 198--206
Anonymous Editor's Note: Special Section on
Data-Flow for Multicore . . . . . . . . 207--207
Sebastian Weis and
Arne Garbade and
Bernhard Fechner and
Avi Mendelson and
Roberto Giorgi and
Theo Ungerer Architectural Support for Fault
Tolerance in a Teradevice Dataflow
System . . . . . . . . . . . . . . . . . 208--232
Dragos Sbîrlea and
Jun Shirako and
Ryan Newton and
Vivek Sarkar SCnC: Efficient Unification of Streaming
with Dynamic Task Parallelism . . . . . 233--256
Andreas Diavastos and
Pedro Trancoso and
Mikel Luján and
Ian Watson Integrating Transactions into the
Data-Driven Multi-threading Model Using
the TFlux Platform . . . . . . . . . . . 257--277
Daniel Orozco and
Elkin Garcia and
Robert Pavel and
Jaime Arteaga and
Guang Gao The Design and Implementation of
TIDeFlow: A Dataflow-Inspired Execution
Model for Parallel Loops and Task
Pipelining . . . . . . . . . . . . . . . 278--307
Anonymous Editor's Note: Special Section on
Concurrent Systems: Status and
Perspectives . . . . . . . . . . . . . . 308--308
Nakul Jindal and
Victor Lotrich and
Erik Deumens and
Beverly A. Sanders Exploiting GPUs with the Super
Instruction Architecture . . . . . . . . 309--324
W. Morven Gentleman Concurrency Paradigms: Competitive,
Coordinated, and Collaborative: Which
Control Mechanisms are Appropriate? . . 325--336
Emre Kültürsay and
Kemal Ebcioglu and
Gürhan Küçük and
Mahmut T. Kandemir Memory Partitioning in the Limit . . . . 337--380
Anonymous Editor's Note: High-Level Parallel
Programming and Applications (HLPP) . . 381--382
Clemens Grelck Guest Editorial for High-Level Parallel
Programming and Applications . . . . . . 383--385
Miguel Areias and
Ricardo Rocha A Lock-Free Hash Trie Design for
Concurrent Tabled Logic Programs . . . . 386--406
Alvaro Estebanez and
Diego R. Llanos and
Arturo Gonzalez-Escribano New Data Structures to Handle
Speculative Parallelization at Runtime 407--426
Ye Wang and
Zhiyuan Li GridFOR: a Domain Specific Language for
Parallel Grid-Based Applications . . . . 427--448
Antoine Tran Tan and
Joel Falcou and
Daniel Etiemble and
Hartmut Kaiser Automatic Task-Based Code Generation for
High Performance Domain Specific
Embedded Language . . . . . . . . . . . 449--465
Kiminori Matsuzaki and
Reina Miyazaki Parallel Tree Accumulations on MapReduce 466--485
Tarek Menouer and
Mohamed Rezgui and
Bertrand Le Cun and
Jean-Charles Régin Mixing Static and Dynamic Partitioning
to Parallelize a Constraint Programming
Solver . . . . . . . . . . . . . . . . . 486--505
Usman Dastgeer and
Christoph Kessler Smart Containers and Skeleton
Programming for GPU-Based Systems . . . 506--530
Marco Aldinucci and
Sonia Campa and
Marco Danelutto and
Peter Kilpatrick and
Massimo Torquati Pool Evolution: a Parallel Pattern for
Evolutionary and Symbolic Computing . . 531--551
Tristan Aubrey-Jones and
Bernd Fischer Synthesizing MPI Implementations from
Functional Data-Parallel Programs . . . 552--573
Jean Fortin and
Frédéric Gava BSP-Why: a Tool for Deductive
Verification of BSP Algorithms with
Subgroup Synchronisation . . . . . . . . 574--597
Konrad Siek and
Pawel T. Wojciechowski Atomic RMI: a Distributed Transactional
Memory Framework . . . . . . . . . . . . 598--619
José M. Andión and
Manuel Arenaz and
François Bodin and
Gabriel Rodríguez and
Juan Touriño Locality-Aware Automatic Parallelization
for GPGPU with OpenHMPP Directives . . . 620--643
Ali Jannesari and
Felix Wolf Automatic Generation of Unit Tests for
Correlated Variables in Parallel
Programs . . . . . . . . . . . . . . . . 644--662
Carlos Alberto Martínez-Angeles and
Haicheng Wu and
Inês Dutra and
Vítor Santos Costa and
Jorge Buenabad-Chávez Relational Learning with GPUs:
Accelerating Rule Coverage . . . . . . . 663--685
Shigeyuki Sato and
Kiminori Matsuzaki A Generic Implementation of Tree
Skeletons . . . . . . . . . . . . . . . 686--707
Juan Chabkinian and
Thomas J. E. Schwarz SJ Fast LH$*$ . . . . . . . . . . . . . . . 709--734
Marco Lattuada and
Christian Pilato and
Fabrizio Ferrandi Performance Estimation of Task Graphs
Based on Path Profiling . . . . . . . . 735--771
Srimanth Gadde and
William Acosta and
Jordan Ringenberg and
Robert Green and
Vijay Devabhaktuni Achieving Optimal Inter-Node
Communication in Graph Partitioning
Using Random Selection and Breadth-First
Search . . . . . . . . . . . . . . . . . 772--800
Ayaz ul Hassan Khan and
Mayez Al-Mouhamed and
Allam Fatayer and
Nazeeruddin Mohammad Optimizing the Matrix Multiplication
Using Strassen and Winograd Algorithms
with Limited Recursions on Many-Core . . 801--830
Ayaz ul Hassan Khan and
Mayez Al-Mouhamed and
Allam Fatayer and
Nazeeruddin Mohammad Erratum to: Optimizing the Matrix
Multiplication Using Strassen and
Winograd Algorithms with Limited
Recursions on Many--Core . . . . . . . . 831--831
Ren Li and
Haibo Hu and
Heng Li and
Yunsong Wu and
Jianxi Yang MapReduce Parallel Programming Model: A
State-of-the-Art Survey . . . . . . . . 832--866
Etem Deniz and
Alper Sen Using Machine Learning Techniques to
Detect Parallel Patterns of
Multi-threaded Applications . . . . . . 867--900
Giuliano Laccetti and
Marco Lapegna and
Valeria Mele A Loosely Coordinated Model for
Heap-Based Priority Queues in Multicore
Environments . . . . . . . . . . . . . . 901--921
Anonymous Editor's Note: Special Issue on
Computing Frontiers . . . . . . . . . . 923--923
Andreea Anghel and
Laura Mihaela Vasilescu and
Giovanni Mariani and
Rik Jongerius and
Gero Dittmann An Instrumentation Approach for
Hardware-Agnostic Software
Characterization . . . . . . . . . . . . 924--948
Musfiq Rahman and
Bruce R. Childers Asteroid: Scalable Online Memory
Diagnostics for Multi-core, Multi-socket
Servers . . . . . . . . . . . . . . . . 949--974
Giovanni Mariani and
Andreea Anghel and
Rik Jongerius and
Gero Dittmann Scaling Properties of Parallel
Applications to Exascale . . . . . . . . 975--1002
Leandro Fiorin and
Erik Vermij and
Jan van Lunteren and
Rik Jongerius and
Christoph Hagleitner Exploring the Design Space of an
Energy-Efficient Accelerator for the
SKA1-Low Central Signal Processor . . . 1003--1027
Archimedes Pavlidis and
Dimitris Gizopoulos Hierarchical Synthesis of Quantum and
Reversible Architectures . . . . . . . . 1028--1053
Rui Han and
Jianfeng Zhan and
Jose Luis Vazquez-Poletti SARP: Synopsis--Based Approximate
Request Processing for Low Latency and
Small Correctness Loss in Cloud Online
Services . . . . . . . . . . . . . . . . 1054--1077
Vassilis Vassiliadis and
Charalampos Chalios and
Konstantinos Parasyris and
Christos D. Antonopoulos and
Spyros Lalis and
Nikolaos Bellas and
Hans Vandierendonck and
Dimitrios S. Nikolopoulos Exploiting Significance of Computations
for Energy-Constrained Approximate
Computing . . . . . . . . . . . . . . . 1078--1098
Chao Wang and
Nadia Nedjah and
Luiza M. Mourelle and
Aili Wang Preface to the Special Issue on
Sequential Code Parallelization . . . . 1099--1101
Nadia Nedjah and
Luiza de Macedo Mourelle and
Chao Wang A Parallel Yet Pipelined Architecture
for Efficient Implementation of the
Advanced Encryption Standard Algorithm
on Reconfigurable Hardware . . . . . . . 1102--1117
Huang Wang and
Xianglan Chen and
Huaping Chen A Cross-ISA Kernelized High-Performance
Parallel Emulator . . . . . . . . . . . 1118--1141
Ansar Javed and
Bibrak Qamar and
Mohsan Jameel and
Aamir Shafi and
Bryan Carpenter Towards Scalable Java HPC with Hybrid
and Native Communication Devices in MPJ
Express . . . . . . . . . . . . . . . . 1142--1172
Nadia Nedjah and
Rogério de M. Calazan and
Luiza de Macedo Mourelle and
Chao Wang Parallel Implementations of the
Cooperative Particle Swarm Optimization
on Many-core and Multi-core
Architectures . . . . . . . . . . . . . 1173--1199
Alessandro Pellegrini and
Sebastiano Peluso and
Francesco Quaglia and
Roberto Vitali Transparent Speculative Parallelization
of Discrete Event Simulation
Applications Using Global Variables . . 1200--1247
Xiaomeng Huang and
Yufang Ni and
Dexun Chen and
Songbin Liu and
Haohuan Fu and
Guangwen Yang Czip: a Fast Lossless Compression
Algorithm for Climate Data . . . . . . . 1248--1267
Rachid Habel and
Frédérique Silber-Chaussumier and
François Irigoin and
Elisabeth Brunet and
François Trahay Combining Data and Computation
Distribution Directives for Hybrid
Parallel Programming: a Transformation
System . . . . . . . . . . . . . . . . . 1268--1295
Martin Frieb and
Ralf Jahr and
Haluk Ozaktas and
Andreas Hugl and
Hans Regler and
Theo Ungerer A Parallelization Approach for Hard
Real-Time Systems and Its Application on
Two Industrial Programs . . . . . . . . 1296--1336
Alcides Fonseca and
Bruno Cabral and
João Rafael and
Ivo Correia Automatic Parallelization: Executing
Sequential Programs on a Task-Based
Parallel Runtime . . . . . . . . . . . . 1337--1358
Abubakar Siddique and
Mohammad Ansari and
Mikel Luján Purge--Rehab: Eager Software
Transactional Memory with High
Performance Under Contention . . . . . . 1359--1383
Vijayalakshmi Srinivasan and
Yunquan Zhang Special Issue on Network and Parallel
Computing . . . . . . . . . . . . . . . 1--3
Jinbao Zhang and
Xiaofei Liao and
Hai Jin and
Dong Liu and
Li Lin and
Kao Zhao An Optimal Page-Level Power Management
Strategy in PCM--DRAM Hybrid Memory . . 4--16
Vesna Smiljković and
Osman Ünsal and
Adrián Cristal and
Mateo Valero Determinism at Standard-Library Level in
TM-Based Applications . . . . . . . . . 17--29
Chencheng Ye and
Jacob Brock and
Chen Ding and
Hai Jin Rochester Elastic Cache Utility (RECU):
Unequal Cache Sharing is Good Economics 30--44
Song Wu and
Yongchang Li and
Xinhou Wang and
Hai Jin and
Hanhua Chen Vshadow: Promoting Physical Servers into
Virtualization World . . . . . . . . . . 45--66
Yaojie Lu and
Sotirios G. Ziavras Instruction Fusion for Multiscalar and
Many-Core Processors . . . . . . . . . . 67--78
Jing Li and
Lei Liu and
Yuan Wu and
Xiaobing Feng and
Chengyong Wu Two-Level Task Scheduling for Irregular
Applications on GPU Platform . . . . . . 79--93
Preeti Malakar and
Venkatram Vishwanath Hierarchical Read--Write Optimizations
for Scientific Applications with
Multi-variable Structured Datasets . . . 94--108
Maksudul Alam and
Maleq Khan Parallel Algorithms for Generating
Random Networks with Given Degree
Sequences . . . . . . . . . . . . . . . 109--127
Yu Zhang and
Huifang Cao DMR: a Deterministic MapReduce for
Multicore Systems . . . . . . . . . . . 128--141
Sheng Wang and
Weizhong Qiang and
Hai Jin and
Jinfeng Yuan CovertInspector: Identification of
Shared Memory Covert Timing Channel in
Multi-tenanted Cloud . . . . . . . . . . 142--156
Jiansheng Yao and
Chunguang Ma and
Peng Wu and
Gang Du and
Qi Yuan An Opportunistic Network Coding Routing
for Opportunistic Networks . . . . . . . 157--171
Yong Su and
Zhan Wang and
Zhiguo Fan and
Zheng Cao and
Xiaoli Liu and
En Shao and
Xuejun An and
Ninghui Sun HyperFatTree: a Large-Scale Tree-Based
Network with Low-Radix Switches . . . . 172--184
Xingjing Lu and
Long Chen and
Zhiyuan Li Performance Evaluation and Enhancement
of Process-Based Parallel Loop Execution 185--198
Marco Danelutto and
Susanna Pelagatti and
Massimo Torquati Guest Editorial: High-Level Parallel
Programming and Applications . . . . . . 199--202
Mehdi Goli and
Horacio González-Vélez Autonomic Coordination of Skeleton-Based
Applications Over CPU/GPU Multi-Core
Architectures . . . . . . . . . . . . . 203--224
Alvaro Estebanez and
Diego R. Llanos and
Arturo Gonzalez-Escribano Using the Xeon Phi Platform to Run
Speculatively-Parallelized Codes . . . . 225--241
Mathias Bourgoin and
Emmanuel Chailloux and
Jean-Luc Lamotte High Level Data Structures for GPGPU
Programming in a Statically Typed
Language . . . . . . . . . . . . . . . . 242--261
Rafael Sotomayor and
Luis Miguel Sanchez and
Javier Garcia Blas and
Javier Fernandez and
J. Daniel Garcia Automatic CPU/GPU Generation of
Multi-versioned OpenCL Kernels for C++
Scientific Applications . . . . . . . . 262--282
Steffen Ernsting and
Herbert Kuchen Data Parallel Algorithmic Skeletons with
Accelerator Support . . . . . . . . . . 283--299
Frédéric Loulergue and
Wadoud Bousdira and
Julien Tesson Calculating Parallel Programs in Coq
Using List Homomorphisms . . . . . . . . 300--319
Le-Duc Tung and
Zhenjiang Hu Towards Systematic Parallelization of
Graph Transformations Over Pregel . . . 320--339
V. Allombert and
F. Gava and
J. Tesson Multi-ML: Programming Multi-BSP
Algorithms in ML . . . . . . . . . . . . 340--361
Kiminori Matsuzaki Functional Models of Hadoop MapReduce
with Application to Scan . . . . . . . . 362--381
Tiziano De Matteis and
Gabriele Mencagli Parallel Patterns for Window-Based
Stateful Operators on Data Streams: an
Algorithmic Skeleton Approach . . . . . 382--401
J. Darlington and
A. J. Field and
L. Hakim Tackling Complexity in High Performance
Computing Applications . . . . . . . . . 402--420
Pierre LaBorde and
Steven Feldman and
Damian Dechev A Wait-Free Hash Map . . . . . . . . . . 421--448
Nuno Fachada and
Vitor V. Lopes and
Rui C. Martins and
Agostinho C. Rosa Parallelization Strategies for Spatial
Agent-Based Models . . . . . . . . . . . 449--481
Milos Cvetanović and
Zaharije Radivojević and
Veljko Milutinović Restart Optimization for Transactional
Memory with Lazy Conflict Detection . . 482--507
Jiaquan Gao and
Zejie Li and
Ronghua Liang and
Guixia He Adaptive Optimization
$l_1$-Minimization Solvers on GPU . . . 508--529
Victor Garcia and
Alejandro Rico and
Carlos Villavieja and
Paul Carpenter and
Nacho Navarro and
Alex Ramirez Adaptive Runtime-Assisted Block
Prefetching on Chip-Multiprocessors . . 530--550
Ayaz H. Khan and
Mayez Al-Mouhamed and
Muhammed Al-Mulhem and
Adel F. Ahmed RT-CUDA: a Software Tool for CUDA Code
Restructuring . . . . . . . . . . . . . 551--594
Yiming Han and
Anthony T. Chronopoulos Scalable Loop Self-scheduling Schemes
for Large-Scale Clusters and Cloud
Systems . . . . . . . . . . . . . . . . 595--611
Asim YarKhan and
Jakub Kurzak and
Piotr Luszczek and
Jack Dongarra Porting the PLASMA Numerical Library to
the OpenMP Standard . . . . . . . . . . 612--633
Krupa Sivakumaran and
Arul Siromoney Priority Based Yield of Shared Cache to
Provide Cache QoS in Multicore Systems 634--656
Shuai Che and
Bradford M. Beckmann and
Steven K. Reinhardt Programming GPGPU Graph Applications
with Linear Algebra Building Blocks . . 657--679
Xiao-qing Wang and
Xian-long Jin and
Da-zhi Kou and
Jia-hui Chen A Parallel Approach for the Generation
of Unstructured Meshes with Billions of
Elements on Distributed-Memory
Supercomputers . . . . . . . . . . . . . 680--710
Mohammed Sourouri and
Scott B. Baden and
Xing Cai Panda: a Compiler Framework for
Concurrent CPU $+$ GPU Execution of $3$D
Stencil Computations on GPU-accelerated
Supercomputers . . . . . . . . . . . . . 711--729
Maozhen Li and
Zhuo Tang Guest Editorial: The Parallel Storage,
Processing and Analysis for Big Data . . 731--733
Qicong Wang and
Jinhao Zhao and
Dingxi Gong and
Yehu Shen and
Maozhen Li and
Yunqi Lei Parallelizing Convolutional Neural
Networks for Action Event Recognition in
Surveillance Videos . . . . . . . . . . 734--759
Yang Liu and
Lixiong Xu and
Maozhen Li The Parallelization of Back Propagation
Neural Network in MapReduce and Spark 760--779
Kien Tuong Phan and
Tomas Henrique Maul and
Tuong Thuy Vu An Empirical Study on Improving the
Speed and Generalization of Neural
Networks Using a Parallel Circuit
Approach . . . . . . . . . . . . . . . . 780--796
Hsiang-Huang Wu and
Chien-Min Wang Generalization of Large-Scale Data
Processing in One MapReduce Job for
Coarse-Grained Parallelism . . . . . . . 797--826
Yan Wang and
Kenli Li and
Keqin Li Partition Scheduling on Heterogeneous
Multicore Processors for
Multi-dimensional Loops Applications . . 827--852
Zhuoer Gu and
Ligang He and
Cheng Chang and
Jianhua Sun and
Hao Chen and
Chenlin Huang Developing an Efficient Pattern
Discovery Method for CPU Utilizations of
Computers . . . . . . . . . . . . . . . 853--878
Wei Liu and
Lu Wang and
Yuyue Du and
Maozhen Li Deadlock Property Analysis of Concurrent
Programs Based on Petri Net Structure 879--898
Aijia Ouyang and
Xuyu Peng and
Jing Liu and
Ahmed Sallam Hardware/Software Partitioning for
Heterogeneous MPSoC Considering
Communication Overhead . . . . . . . . . 899--922
Yang Ou and
Nong Xiao and
Fang Liu and
Zhiguang Chen and
Wei Chen and
Lizhou Wu Gemini: a Novel Hardware and Software
Implementation of High-performance PCIe
SSD . . . . . . . . . . . . . . . . . . 923--945
Mingzhu Deng and
Wei Chen and
Nong Xiao and
Songping Yu and
Yupeng Hu GLE-Dedup: a Globally-Locally Even
Deduplication by Request-Aware Placement
for Better Read Performance . . . . . . 946--964
Jiayi Du and
Renfa Li and
Zheng Xiao and
Zhao Tong and
Li Zhang Optimization of Data Allocation on CMP
Embedded System with Data Migration . . 965--981
Yuyue Du and
Lu Wang and
Man Qi Constructing Service Clusters Based on
Service Space . . . . . . . . . . . . . 982--1000
Yanan Sun and
Yuyue Du and
Maozhen Li A Repair of Workflow Models Based on
Mirroring Matrices . . . . . . . . . . . 1001--1020
Giuliano Laccetti and
Ian Foster and
Marco Lapegna and
Paul Messina and
Raffaele Montella and
Almerico Murli Guest Editorial for Hybrid Parallelism
in New HPC Systems . . . . . . . . . . . 1021--1025
Ami Marowka Energy-Aware Modeling of Scaled
Heterogeneous Systems . . . . . . . . . 1026--1045
Moritz Kreutzer and
Jonas Thies and
Melven Röhrig-Zöllner and
Andreas Pieper and
Faisal Shahzad and
Martin Galgon and
Achim Basermann and
Holger Fehske and
Georg Hager and
Gerhard Wellein GHOST: Building Blocks for High
Performance Sparse Linear Algebra on
Heterogeneous Systems . . . . . . . . . 1046--1072
Beata Bylina and
Joanna Potiopa Explicit Fourth-Order Runge--Kutta
Method on Intel Xeon Phi Coprocessor . . 1073--1090
Pawel Czarnul Benchmarking Performance of a Hybrid
Intel Xeon/Xeon Phi System for Parallel
Computation of Similarity Measures
Between Large Vectors . . . . . . . . . 1091--1107
Andrzej Glowacz and
Marcin Pietroń Implementation of Digital Watermarking
Algorithms in Parallel Hardware
Accelerators . . . . . . . . . . . . . . 1108--1127
Jieun Choi and
Theodora Adufu and
Yoonhee Kim Data-Locality Aware Scientific Workflow
Scheduling Methods in HPC Cloud
Environments . . . . . . . . . . . . . . 1128--1141
Raffaele Montella and
Giulio Giunta and
Giuliano Laccetti and
Marco Lapegna and
Carlo Palmieri and
Carmine Ferraro and
Valentina Pelliccia and
Cheol-Ho Hong and
Ivor Spence and
Dimitrios S. Nikolopoulos On the Virtualization of CUDA Based GPU
Remoting on ARM and x86 Machines in the
GVirtuS Framework . . . . . . . . . . . 1142--1163
G. B. Barone and
V. Boccia and
D. Bottalico and
R. Campagna and
L. Carracciuolo and
G. Laccetti and
M. Lapegna An Approach to Forecast Queue Time in
Adaptive Scheduling: How to Mediate
System Efficiency and Users Satisfaction 1164--1193
P. Natesan and
R. R. Rajalaxmi and
G. Gowrison and
P. Balasubramanie Hadoop Based Parallel Binary Bat
Algorithm for Network Intrusion
Detection . . . . . . . . . . . . . . . 1194--1213
Rossella Arcucci and
Luisa D'Amore and
Luisa Carracciuolo and
Giuseppe Scotti and
Giuliano Laccetti A Decomposition of the Tikhonov
Regularization Functional Oriented to
Exploit Hybrid Multilevel Parallelism 1214--1235
Johannes Langguth and
Qiang Lan and
Namit Gaur and
Xing Cai Accelerating Detailed Tissue-Scale $3$D
Cardiac Simulations Using Heterogeneous
CPU--Xeon Phi Computing . . . . . . . . 1236--1258
Zhiyuan Shao and
Jian He and
Huiming Lv and
Hai Jin FOG: a Fast Out-of-Core Graph Processing
Framework . . . . . . . . . . . . . . . 1259--1272
Hai Jin and
Aaqif Afzaal Abbasi and
Song Wu Pathfinder: Application-Aware
Distributed Path Computation in Clouds 1273--1284
Yuanzhen Geng and
Xuanhua Shi and
Cheng Pei and
Hai Jin and
Wenbin Jiang LCS: an Efficient Data Eviction Strategy
for Spark . . . . . . . . . . . . . . . 1285--1297
Chonghua Wang and
Zhiyu Hao and
Lei Cui and
Xiangyu Zhang and
Xiaochun Yun Introspection-Based Memory Pruning for
Live VM Migration . . . . . . . . . . . 1298--1309
Fengfeng Pan and
Yinliang Yue and
Jin Xiong dCompaction: Delayed Compaction for the
LSM-Tree . . . . . . . . . . . . . . . . 1310--1325
Sudakshina Dutta and
Dipankar Sarkar and
Arvind Rawat Synchronization Validation for
Cross-Thread Dependences in Parallel
Programs . . . . . . . . . . . . . . . . 1326--1365
Xing Fan and
Mostafa Mehrabi and
Oliver Sinnen and
Nasser Giacaman Supporting Enhanced Exception Handling
with OpenMP in Object--Oriented
Languages . . . . . . . . . . . . . . . 1366--1389
Youcef Barigou and
Edgar Gabriel Maximizing Communication--Computation
Overlap Through Automatic
Parallelization and Run-time Tuning of
Non-blocking Collective Operations . . . 1390--1416
Guillermo Payá-Vayá and
Andreas Gerstlauer Guest Editorial: Special Issue on the
2015 International Conference on
Embedded Computer Systems ---
Architectures, Modeling and Simulation
(SAMOS XV) . . . . . . . . . . . . . . . 1417--1419
Pei Liu and
Ahmed Hemani and
Kolin Paul and
Christian Weis and
Matthias Jung and
Norbert Wehn $3$D-Stacked Many-Core Architecture for
Biological Sequence Analysis Problems 1420--1460
Yosi Ben Asher and
Irina Lipov and
Vladislav Tartakovsky and
Dror Tiv Generating ASIPs with Reduced Number of
Connections to the Register-File . . . . 1461--1487
Xinnian Zheng and
Lizy K. John and
Andreas Gerstlauer LACross: Learning-Based Analytical
Cross-Platform Performance and Power
Prediction . . . . . . . . . . . . . . . 1488--1514
Biao Wang and
Diego F. de Souza and
Mauricio Alvarez-Mesa and
Chi Ching Chi and
Ben Juurlink and
Aleksandar Ilic and
Nuno Roma and
Leonel Sousa GPU Parallelization of HEVC In-Loop
Filters . . . . . . . . . . . . . . . . 1515--1535
Nabil Hallou and
Erven Rohou and
Philippe Clauss Runtime Vectorization Transformations of
Binary Code . . . . . . . . . . . . . . 1536--1565
Christian Weis and
Abdul Mutaal and
Omar Naji and
Matthias Jung and
Andreas Hansson and
Norbert Wehn DRAMSpec: a High-Level DRAM Timing,
Power and Area Exploration Tool . . . . 1566--1591
Miguel Angel Aguilar and
Juan Fernando Eusse and
Projjol Ray and
Rainer Leupers and
Gerd Ascheid and
Weihua Sheng and
Prashant Sharma Towards Parallelism Extraction for
Heterogeneous Multicore Android Devices 1592--1624
Nuno Fachada and
Vitor V. Lopes and
Rui C. Martins and
Agostinho C. Rosa Erratum to: Parallelization Strategies
for Spatial Agent-Based Models . . . . . 1625--1626
Sergei Gorlatch and
Herbert Kuchen Guest Editorial: High-Level Parallel
Programming with Algorithmic Skeletons 1--3
Jan Stypka and
Wojciech Turek and
Aleksander Byrski and
Marek Kisiel-Dorohinicki and
Adam D. Barwell and
Christopher Brown and
Kevin Hammond and
Vladimir Janjic The Missing Link! A New Skeleton for
Evolutionary Multi-agent Systems in
Erlang . . . . . . . . . . . . . . . . . 4--22
Michael Haidl and
Sergei Gorlatch High-Level Programming for Many-Cores
Using C++14 and the STL . . . . . . . . 23--41
Fabian Wrede and
Steffen Ernsting Simultaneous CPU--GPU Execution of Data
Parallel Algorithmic Skeletons . . . . . 42--61
August Ernstsson and
Lu Li and
Christoph Kessler SkePU 2: Flexible and Type-Safe Skeleton
Programming for Heterogeneous Parallel
Systems . . . . . . . . . . . . . . . . 62--80
Antonio Brogi and
Marco Danelutto and
Daniele De Sensi and
Ahmad Ibrahim and
Jacopo Soldani and
Massimo Torquati Analysing Multiple QoS Attributes in
Parallel Design Patterns-Based
Applications . . . . . . . . . . . . . . 81--100
Ari Rasch and
Sergei Gorlatch Multi-dimensional Homomorphisms and
Their Implementation in OpenCL . . . . . 101--119
Mehdi Goli and
Horacio González-Vélez Formalised Composition and Interaction
for Heterogeneous Structured Parallelism 120--151
Venkatesh Kannan and
G. W. Hamilton Functional Program Transformation for
Parallelisation Using Skeletons . . . . 152--172
Jixiang Yang and
Qingbi He Scheduling Parallel Computations by Work
Stealing: a Survey . . . . . . . . . . . 173--197
Samer Arandi and
George Matheou and
Costas Kyriacou and
Paraskevas Evripidou Data-Driven Thread Execution on
Heterogeneous Processors . . . . . . . . 198--224
Saurabh Hukerikar and
Keita Teranishi and
Pedro C. Diniz and
Robert F. Lucas RedThreads: an Interface for
Application-Level Fault
Detection/Correction Through Adaptive
Redundant Multithreading . . . . . . . . 225--251
Jorge Silva and
Ana Aguiar and
Fernando Silva Parallel Asynchronous Strategies for the
Execution of Feature Selection
Algorithms . . . . . . . . . . . . . . . 252--283
Jawad Haj-Yihia and
Yosi Ben-Asher Software Static Energy Modeling for
Modern Processors . . . . . . . . . . . 284--312
Sai Charan Koduru and
Keval Vora and
Rajiv Gupta Software Speculation on Caching DSMs . . 313--332
Antonino Tumeo and
Hubertus Franke and
Gianluca Palermo and
John Feo Guest Editorial: Special Issue on
Computing Frontiers . . . . . . . . . . 333--335
Naila Farooqui and
Indrajit Roy and
Yuan Chen and
Vanish Talwar and
Rajkishore Barik and
Brian Lewis and
Tatiana Shpeisman and
Karsten Schwan Accelerating Data Analytics on
Integrated GPU Platforms via Runtime
Specialization . . . . . . . . . . . . . 336--375
Ke Wang and
Elaheh Sadredini and
Kevin Skadron Hierarchical Pattern Mining with the
Automata Processor . . . . . . . . . . . 376--411
William Horn and
Manoj Kumar and
Joefon Jann and
José Moreira and
Pratap Pattnaik and
Mauricio Serrano and
Gabriel Tanase and
Hao Yu Graph Programming Interface (GPI): a
Linear Algebra Programming Model for
Large Scale Graph Computations . . . . . 412--440
David Jaeger and
Hendrik Graupner and
Chris Pelchen and
Feng Cheng and
Christoph Meinel Fast Automated Processing and Evaluation
of Identity Leaks . . . . . . . . . . . 441--470
Farhana Aleen and
Vyacheslav P. Zakharin and
Rakesh Krishnaiyer and
Garima Gupta and
David Kreitzer and
Chang-Sun Lin, Jr. Automated Compiler Optimization of
Multiple Vector Loads/Stores . . . . . . 471--503
Salvatore Cuomo and
Marco Aldinucci and
Massimo Torquati Guest Editorial for Programming Models
and Algorithms for Data Analysis in HPC
Systems . . . . . . . . . . . . . . . . 505--507
Awais Ahmad and
Anand Paul and
Sadia Din and
M. Mazhar Rathore and
Gyu Sang Choi and
Gwanggil Jeon Multilevel Data Processing Using
Parallel Algorithms for Analyzing Big
Data in High-Performance Computing . . . 508--527
Pasquale De Michele and
Francesco Maiorano and
Livia Marcellino and
Francesco Piccialli A GPU Implementation of OLPCA Method in
Hybrid Environment . . . . . . . . . . . 528--542
Puneet Jai Kaur and
Sakshi Kaushal and
Arun Kumar Sangaiah and
Francesco Piccialli A Framework for Assessing Reusability
Using Package Cohesion Measure in Aspect
Oriented Systems . . . . . . . . . . . . 543--564
Gang Mei and
Salvatore Cuomo and
Hong Tian and
Nengxiong Xu and
Linjun Peng MeshCleaner: a Generic and
Straightforward Algorithm for Cleaning
Finite Element Meshes . . . . . . . . . 565--583
Bastien Plazolles and
Didier El Baz and
Martin Spel and
Vincent Rivola and
Pascal Gegout SIMD Monte-Carlo Numerical Simulations
Accelerated on GPU and Xeon Phi . . . . 584--606
Emilia Popa and
Mauro Iacono and
Florin Pop Adapting MCP and HLFET Algorithms to
Multiple Simultaneous Scheduling . . . . 607--629
M. Mazhar Rathore and
Hojae Son and
Awais Ahmad and
Anand Paul and
Gwanggil Jeon Real-Time Big Data Stream Processing
Using GPU with Spark Over Hadoop
Ecosystem . . . . . . . . . . . . . . . 630--646
Anonymous Editor's Note: Special Issue on Network
and Parallel Computing for New
Architectures and Applications . . . . . 647--647
Yuntao Lu and
Chao Wang and
Lei Gong and
Xuehai Zhou SparseNN: a Performance-Efficient
Accelerator for Large-Scale Sparse
Neural Networks . . . . . . . . . . . . 648--659
Sijiang Fan and
Jiawei Fei and
Li Shen Accelerating Deep Learning with a
Parallel Mechanism Using CPU + MIC . . . 660--673
Chengfan Jia and
Junnan Liu and
Xu Jin and
Han Lin and
Hong An and
Wenting Han and
Zheng Wu and
Mengxian Chi Improving the Performance of Distributed
TensorFlow with RDMA . . . . . . . . . . 674--685
Xiangyu Ju and
Quan Chen and
Zhenning Wang and
Minyi Guo and
Guang R. Gao DCF: a Dataflow-Based Collaborative
Filtering Training Algorithm . . . . . . 686--698
Zhiwen Chen and
Xin He and
Jianhua Sun and
Hao Chen Have Your Cake and Eat it (Too): a
Concurrent Hash Table with Hardware
Transactions . . . . . . . . . . . . . . 699--709
Donghyun Gouk and
Jie Zhang and
Myoungsoo Jung Enabling Realistic Logical Device
Interface and Driver for NVM Express
Enabled Full System Simulations . . . . 710--721
Wenjie Liu and
Sheng Ma and
Libo Huang and
Zhiying Wang The Design of NoC-Side Memory Access
Scheduling for Energy-Efficient GPGPUs 722--735
Yang Shi and
Yanmin Zhu and
Linpeng Huang Partial-PreSET: Enhancing Lifetime of
PCM-Based Main Memory with Fine-Grained
SET Operations . . . . . . . . . . . . . 736--748
Jian Gao and
Hongmei Wei and
Kang Yu and
Peng Qing A Scalable Runtime Fault Localization
Framework for High-Performance Computing
Systems . . . . . . . . . . . . . . . . 749--761
Han Lin and
Zhichao Su and
Xiandong Meng and
Xu Jin and
Zhong Wang and
Wenting Han and
Hong An and
Mengxian Chi and
Zheng Wu Combining Hadoop with MPI to Solve
Metagenomics Problems that are both
Data- and Compute-intensive . . . . . . 762--775
Fan Sun and
Chao Wang and
Lei Gong and
Yiwei Zhang and
Chongchong Xu and
Yuntao Lu and
Xi Li and
Xuehai Zhou UniCNN: a Pipelined Accelerator Towards
Uniformed Computing for CNNs . . . . . . 776--787
Weiqi Dai and
Yukun Du and
Hai Jin and
Weizhong Qiang and
Deqing Zou and
Shouhuai Xu and
Zhongze Liu RollSec: Automatically Secure Software
States Against General Rollback . . . . 788--805
Francesco Piccialli and
Salvatore Cuomo and
Gwanggil Jeon Parallel Approaches for Data Mining in
the Internet of Things Realm . . . . . . 807--811
Santosh Kumar and
Sanjay Kumar Singh and
Ali Imam Abidi and
Deepanwita Datta and
Arun Kumar Sangaiah Group Sparse Representation Approach for
Recognition of Cattle on Muzzle Point
Images . . . . . . . . . . . . . . . . . 812--837
Xiaomin Yang and
Wei Wu and
Binyu Yan and
Huiqian Wang and
Kai Zhou and
Kai Liu Infrared Image Super-Resolution with
Parallel Random Forest . . . . . . . . . 838--858
Jun-fang Song Vehicle Detection Using Spatial
Relationship GMM for Complex Urban
Surveillance in Daytime and Nighttime 859--872
Jun-fang Song and
Wei-xing Wang and
Feng Chen Target Detection Based on $3$D
Multi-Component Model and Inverse
Projection Transformation . . . . . . . 873--885
Muhammad Farhan and
Sohail Jabbar and
Muhammad Aslam and
Awais Ahmad and
Muhammad Munwar Iqbal and
Murad Khan and
Ana Maria Martinez-Enriquez A Real-Time Data Mining Approach for
Interaction Analytics Assessment: IoT
Based Student Interaction Framework . . 886--903
Vanitha Mohanraj and
R. Sakthivel and
Anand Paul and
Seungmin Rho High Performance GCM Architecture for
the Security of High Speed Network . . . 904--922
Salvatore Cuomo and
Pasquale De Michele and
Emanuel Di Nardo and
Livia Marcellino Parallel Implementation of a Machine
Learning Algorithm on GPU . . . . . . . 923--942
Wei Lu and
Xiaomin Yang and
Xu Gou and
Lihua Jian and
Wei Wu and
Gwanggil Jeon Parallel Heat Kernel Volume Based Local
Binary Pattern on Multi-Orientation
Planes for Face Representation . . . . . 943--962
Zengyu Ding and
Gang Mei and
Salvatore Cuomo and
Nengxiong Xu and
Hong Tian Performance Evaluation of
GPU-Accelerated Spatial Interpolation
Using Radial Basis Functions for
Building Explicit Surfaces . . . . . . . 963--991
Atif Khan and
Naomie Salim and
Haleem Farman and
Murad Khan and
Bilal Jan and
Awais Ahmad and
Imran Ahmed and
Anand Paul Abstractive Text Summarization based on
Improved Semantic Graph Approach . . . . 992--1016
Dhirendra Pratap Singh and
Ishan Joshi and
Jaytrilok Choudhary Survey of GPU Based Sorting Algorithms 1017--1034
Rafael Palomar and
Juan Gómez-Luna and
Faouzi A. Cheikh and
Joaquín Olivares-Bueno and
Ole J. Elle High-Performance Computation of Bézier
Surfaces on Parallel and Heterogeneous
Platforms . . . . . . . . . . . . . . . 1035--1062
Marcin Gorawski and
Michal Lorek Efficient Processing of Large Data
Structures on GPUs: Enumeration Scheme
Based Optimisation . . . . . . . . . . . 1063--1093
Mina Hosseini Rad and
Ahmad Patooghy and
Mahdi Fazeli An Efficient Programming Skeleton for
Clusters of Multi-Core Processors . . . 1094--1109
Lucia G. Menezo and
Valentin Puente and
Pablo Abad and
Jose-Angel Gregorio Mosaic: a Scalable Coherence Protocol 1110--1138
David Wehr and
Rafael Radkowski Parallel $ k d$-Tree Construction on the
GPU with an Adaptive Split and Sort
Strategy . . . . . . . . . . . . . . . . 1139--1156
Mengda He and
Viktor Vafeiadis and
Shengchao Qin and
João F. Ferreira GPS$+$: Reasoning About Fences and
Relaxed Atomics . . . . . . . . . . . . 1157--1183
Anonymous Editor's Note: Special Issue on Embedded
Computer Systems: Architectures,
Modeling and Simulation . . . . . . . . 1184--1184
Catalin Bogdan Ciobanu and
Georgi Gaydadjiev and
Christian Pilato and
Donatella Sciuto The Case for Polymorphic Registers in
Dataflow Computing . . . . . . . . . . . 1185--1219
Christos Kyrkou and
Theocharis Theocharides and
Christos-Savvas Bouganis and
Marios Polycarpou Boosting the Hardware-Efficiency of
Cascade Support Vector Machines for
Embedded Classification Applications . . 1220--1246
Christopher Thompson and
Miles Gould and
Nigel Topham High Speed Cycle-Approximate Simulation
of Embedded Cache-Incoherent and
Coherent Chip-Multiprocessors . . . . . 1247--1282
Timo Viitanen and
Janne Helkala and
Heikki Kultala and
Pekka Jääskeläinen and
Jarmo Takala and
Tommi Zetterman and
Heikki Berg Variable Length Instruction Compression
on Transport Triggered Architectures . . 1283--1303
Dimitra Papagiannopoulou and
Andrea Marongiu and
Tali Moreshet and
Luca Benini and
Maurice Herlihy and
R. Iris Bahar Hardware Transactional Memory
Exploration in Coherence-Free Many-Core
Architectures . . . . . . . . . . . . . 1304--1328
Christopher Brown Guest Editorial Special Issue:
High-Level Programming for Heterogeneous
Parallel Systems . . . . . . . . . . . . 1--2
Javier Fresno and
Daniel Barba and
Arturo Gonzalez-Escribano and
Diego R. Llanos HitFlow: a Dataflow Programming Model
for Hybrid Distributed- and
Shared-Memory Systems . . . . . . . . . 3--23
Georgios C. Chasparis and
Michael Rossbory Efficient Dynamic Pinning of
Parallelized Applications by Distributed
Reinforcement Learning . . . . . . . . . 24--38
Matthew B. Ashcraft and
Alexander Lemon and
David A. Penry and
Quinn Snell Compiler Optimization of Accelerator
Data Transfers . . . . . . . . . . . . . 39--58
Moria Abadi and
Sharon Keidar-Barner and
Dmitry Pidan and
Tatyana Veksler Verifying Parallel Code After
Refactoring Using Equivalence Checking 59--73
Marco Danelutto and
Tiziano De Matteis and
Daniele De Sensi and
Gabriele Mencagli and
Massimo Torquati and
Marco Aldinucci and
Peter Kilpatrick The RePhrase Extended Pattern Set for
Data Intensive Parallel Computing . . . 74--93
Ana Moreton-Fernandez and
Arturo Gonzalez-Escribano and
Diego R. Llanos Multi-device Controllers: a Library to
Simplify Parallel Heterogeneous
Programming . . . . . . . . . . . . . . 94--113
Wim Vanderbauwhede and
Syed Waqar Nabi and
Cristian Urlea Type-Driven Automated Program
Transformations and Cost Modelling for
Optimising Streaming Programs on FPGAs 114--136
Hamidreza Mohebbi Parallel SIMD CPU and GPU
Implementations of Berlekamp--Massey
Algorithm and Its Error Correction
Application . . . . . . . . . . . . . . 137--160
J. Daniel García and
Arturo Gonzalez-Escribano Guest Editorial: High-Level Parallel
Programming and the Road to High
Performance . . . . . . . . . . . . . . 161--163
Clemens Grelck and
Heinrich Wiesinger Persistent Asynchronous Adaptive
Specialization for Generic Array
Programming . . . . . . . . . . . . . . 164--183
Arvid Jakobsson Automatic Cost Analysis for Imperative
BSP Programs . . . . . . . . . . . . . . 184--212
Angeles Navarro and
Francisco Corbera and
Andres Rodriguez and
Antonio Vilches and
Rafael Asenjo Heterogeneous parallel_for Template for
CPU--GPU Chips . . . . . . . . . . . . . 213--233
Fabian Wrede and
Breno Menezes and
Herbert Kuchen Fish School Search with Algorithmic
Skeletons . . . . . . . . . . . . . . . 234--252
Dalvan Griebler and
Renato B. Hoffmann and
Marco Danelutto and
Luiz G. Fernandes High-Level and Productive Stream
Parallelism for Dedup, Ferret, and Bzip2 253--271
Javier López-Fandiño and
Dora B. Heras and
Francisco Argüello and
Mauro Dalla Mura GPU Framework for Change Detection in
Multitemporal Hyperspectral Images . . . 272--292
Miguel A. Vega-Rodríguez and
José M. Granado-Criado Parallel Programming in Bioinformatics:
Some Interesting Approaches . . . . . . 293--295
Enzo Rucci and
Carlos Garcia Sanchez and
Guillermo Botella Juan and
Armando De Giusti and
Marcelo Naiouf and
Manuel Prieto-Matias SWIMM 2.0: Enhanced Smith--Waterman on
Intel's Multicore and Manycore
Architectures Based on AVX-512 Vector
Extensions . . . . . . . . . . . . . . . 296--316
Ferran Badosa and
Antonio Espinosa and
Cesar Acevedo and
Gonzalo Vera and
Ana Ripoll A History-Based Resource Manager for
Genome Analysis Workflows Applications
on Clusters with Heterogeneous Nodes . . 317--342
Feng Zhang and
Jidong Zhai and
Marc Snir and
Hai Jin and
Hironori Kasahara and
Mateo Valero Guest Editorial: Special Issue on
Network and Parallel Computing for
Emerging Architectures and Applications 343--344
Dong Han and
Shengyuan Zhou and
Tian Zhi and
Yibo Wang and
Shaoli Liu Float-Fix: an Efficient and
Hardware-Friendly Data Type for Deep
Neural Network . . . . . . . . . . . . . 345--359
Yong Yu and
Tian Zhi and
Xuda Zhou and
Shaoli Liu and
Yunji Chen and
Shuyao Cheng BSHIFT: a Low Cost Deep Neural Networks
Accelerator . . . . . . . . . . . . . . 360--372
Lianke Qin and
Yifan Gong and
Tianqi Tang and
Yutian Wang and
Jiangming Jin Training Deep Nets with Progressive
Batch Normalization on Multi-GPUs . . . 373--387
Huihui Zou and
Shanjiang Tang and
Ce Yu and
Hao Fu and
Yusen Li and
Wenjie Tang ASW: Accelerating Smith--Waterman
Algorithm on Coupled CPU--GPU
Architecture . . . . . . . . . . . . . . 388--402
Junhong Liu and
Xin He and
Weifeng Liu and
Guangming Tan Register-Aware Optimizations for
Parallel Sparse Matrix--Matrix
Multiplication . . . . . . . . . . . . . 403--417
Donglin Chen and
Jianbin Fang and
Shizhao Chen and
Chuanfu Xu and
Zheng Wang Optimizing Sparse Matrix--Vector
Multiplications on an ARMv8-based
Many-Core Architecture . . . . . . . . . 418--432
Kang Jin and
Cunlu Li and
Dezun Dong and
Binzhang Fu HARE: History-Aware Adaptive Routing
Algorithm for Endpoint Congestion in
Networks-on-Chip . . . . . . . . . . . . 433--450
Cheng Pan and
Lan Zhou and
Yingwei Luo and
Xiaolin Wang and
Zhenlin Wang Lightweight and Accurate Memory
Allocation in Key--Value Cache . . . . . 451--466
Mingfan Li and
Ke Wen and
Han Lin and
Xu Jin and
Zheng Wu and
Hong An and
Mengxian Chi Improving the Performance of Distributed
MXNet with RDMA . . . . . . . . . . . . 467--480
Heyang Xu and
Yang Liu and
Wei Wei and
Ying Xue Migration Cost and Energy-Aware Virtual
Machine Consolidation Under Cloud
Environments Considering Remaining
Runtime . . . . . . . . . . . . . . . . 481--501
Bo Wang and
Jie Tang and
Rui Zhang and
Wei Ding and
Deyu Qi A Dependency-Aware Storage Schema
Selection Mechanism for In-Memory Big
Data Computing Frameworks . . . . . . . 502--519
Peng Zhao and
Lei Liu and
Wei Cao and
Xiao Dong and
Jiansong Li and
Xiaobing Feng ElasticActor: an Actor System with
Automatic Granularity Adjustment . . . . 520--534
Nahid Farhady Ghalaty Editorial: Special Issue on Side-Channel
and Fault Analysis of High-Performance
Computing Platforms . . . . . . . . . . 535--537
Ahmad Moghimi and
Jan Wichelmann and
Thomas Eisenbarth and
Berk Sunar MemJam: a False Dependency Attack
Against Constant-Time Crypto
Implementations . . . . . . . . . . . . 538--570
Hongyu Fang and
Sai Santosh Dayapule and
Fan Yao and
Miloš Doroslovački and
Guru Venkataramani PrODACT: Prefetch-Obfuscator to Defend
Against Cache Timing Channels . . . . . 571--594
Fan Yao and
Miloš Doroslovački and
Guru Venkataramani Covert Timing Channels Exploiting Cache
Coherence Hardware: Characterization and
Defense . . . . . . . . . . . . . . . . 595--620
Alejandro Cabrera Aldaya and
Billy Bob Brumley and
Alejandro J. Cabrera Sarmiento and
Santiago Sánchez-Solano Memory Tampering Attack on Binary GCD
Based Inversion Algorithms . . . . . . . 621--640
Qiang-Sheng Hua and
Xuanhua Shi and
Yinglong Xia and
Howie Huang Guest Editorial: Special Issue on
Algorithms and Systems on Big Graph
Processing . . . . . . . . . . . . . . . 641--643
Huanzhou Zhu and
Ligang He and
Songling Fu and
Rui Li and
Xie Han and
Zhangjie Fu and
Yongjian Hu and
Chang-Tsun Li WolfPath: Accelerating Iterative
Traversing-Based Graph Processing
Algorithms on GPU . . . . . . . . . . . 644--667
Zhiyuan Shao and
Zhenjie Mei and
Xiaofeng Ding and
Hai Jin BlockGraphChi: Enabling Block Update in
Out-of-Core Graph Processing . . . . . . 668--685
Deng Li and
Zhujun Chen and
Jiaqi Liu Analysis for Behavioral Economics in
Social Networks: An Altruism-Based
Dynamic Cooperation Model . . . . . . . 686--708
Wei Liu and
Lu Wang and
Xin Feng and
Man Qi and
Chun Yan and
Maozhen Li Soundness Analytics of Composed Logical
Workflow Nets . . . . . . . . . . . . . 709--724
Jianliang Gao and
Jianxin Wang and
Jianbiao He and
Fengxia Yan Against Signed Graph Deanonymization
Attacks on Social Networks . . . . . . . 725--739
Haipeng Yao and
Qiyi Wang and
Luyao Wang and
Peiying Zhang and
Maozhen Li and
Yunjie Liu An Intrusion Detection Framework Based
on Hybrid Multi-Level Data Mining . . . 740--758
Xingwang Wang and
Xiaohui Wei and
Shang Gao and
Yuanyuan Liu and
Zongpeng Li A Novel Auction-Based Query Pricing
Schema . . . . . . . . . . . . . . . . . 759--780
David Niedzielski and
Kleanthis Psarris An Analytical Evaluation of Data
Dependence Analysis Techniques . . . . . 781--804
Misun Yu and
Joon-Sang Lee and
Doo-Hwan Bae AdaptiveLock: Efficient Hybrid Data Race
Detection Based on Real-World Locking
Patterns . . . . . . . . . . . . . . . . 805--837
Andrea Crivellini and
Matteo Franciolini OpenMP Parallelization Strategies for a
Discontinuous Galerkin Solver . . . . . 838--873
Andreas Simbürger and
Sven Apel PolyJIT: Polyhedral Optimization Just in
Time . . . . . . . . . . . . . . . . . . 874--906
Mohammad Amin Irandoost and
Amir Masoud Rahmani MapReduce Data Skewness Handling: a
Systematic Literature Review . . . . . . 907--950
Fabien Reumont-Locke and
Naser Ezzati-Jivan Efficient Methods for Trace Analysis
Parallelization . . . . . . . . . . . . 951--972
Pierre Zins and
Michel Dagenais Tracing and Profiling Machine Learning
Dataflow Applications on GPU . . . . . . 973--1013
Ismail Akturk and
Ozcan Ozturk Adaptive Thread Scheduling in Chip
Multiprocessors . . . . . . . . . . . . 1014--1044
Anonymous Editor's Note: Special Issue on
High-Level Languages and Frameworks for
High-Performance Computing . . . . . . . 1045--1045
Hélène Coullon and
Julien Bigot Extensibility and Composability of a
Multi-Stencil Domain Specific Framework 1046--1085
Brad Peterson and
Alan Humphrey and
Dan Sunderland Automatic Halo Management for the Uintah
GPU--Heterogeneous Asynchronous
Many-Task Runtime . . . . . . . . . . . 1086--1116
José L. Quiroz-Fabián and
Graciela Román-Alonso VPPE: a Novel Visual Parallel
Programming Environment . . . . . . . . 1117--1151
Re'em Harel and
Idan Mosseri and
Harel Levin and
Lee-or Alon and
Matan Rusanovsky and
Gal Oren Source-to-Source Parallelization
Compilers for Scientific Shared-Memory
Multi-core and Accelerated
Multiprocessing: Analysis, Pitfalls,
Enhancement and Potential . . . . . . . 1--31
Zhen Yu and
Yu Zuo and
Yong Zhao Convoider: a Concurrency Bug Avoider
Based on Transparent Software
Transactional Memory . . . . . . . . . . 32--60
Wensi Yang and
Qingfeng Yao and
Kejiang Ye and
Cheng-Zhong Xu Empirical Mode Decomposition and
Temporal Convolutional Networks for
Remaining Useful Life Estimation . . . . 61--79
Donglin Chen and
Jianbin Fang and
Chuanfu Xu and
Shizhao Chen and
Zheng Wang Characterizing Scalability of Sparse
Matrix--Vector Multiplications on
Phytium FT-2000+ . . . . . . . . . . . . 80--97
Shuo Chen and
Zhan Shi and
Dan Feng and
Shang Liu and
Fang Wang and
Lei Yang and
Ruili Yu CSMqGraph: Coarse-Grained and
Multi-external-storage Multi-queue I/O
Management for Graph Computing . . . . . 98--118
Ziyue Jiang and
Yifan Gong and
Jidong Zhai and
Yu-Ping Wang and
Wei Liu and
Hao Wu and
Jiangming Jin Message Passing Optimization in Robot
Operating System . . . . . . . . . . . . 119--136
Zelin Liu and
Jian Cao and
Yudong Tan and
Quanwu Xiao and
Mukesh Prasad Planning Above the API Clouds Before
Flying Above the Clouds: a Real-Time
Personalized Air Travel Planning
Approach . . . . . . . . . . . . . . . . 137--156
Gwanggil Jeon and
Awais Ahmad and
Salvatore Cuomo and
Burak Kantarci Guest Editorial: Special Issue on
Emerging Technology for Software Define
Network Enabled Internet of Things . . . 157--161
Farhan Ullah and
Junfeng Wang and
Muhammad Farhan and
Sohail Jabbar and
Muhammad Kashif Naseer and
Muhammad Asif LSA Based Smart Assessment Methodology
for SDN Infrastructure in IoT
Environment . . . . . . . . . . . . . . 162--177
Murad Khan and
Javed Iqbal and
Muhammad Talha and
Muhammad Arshad and
Muhammad Diyan and
Kijun Han Big Data Processing using Internet of
Software Defined Things in Smart Cities 178--191
S. Ramesh and
C. Yaashuwanth QoS and QoE Enhanced Resource Allocation
for Wireless Video Sensor Networks Using
Hybrid Optimization Algorithm . . . . . 192--212
Mudassar Ahmad and
Usman Ahmad and
Md Asri Ngadi and
Muhammad Asif Habib and
Shehzad Khalid and
Rehan Ashraf Loss Based Congestion Control Module for
Health Centers Deployed by Using
Advanced IoT Based SDN Communication
Networks . . . . . . . . . . . . . . . . 213--243
Fakhri Alam Khan and
Awais Ahmad and
Muhammad Imran Energy Optimization of PR--LEACH Routing
Scheme Using Distance Awareness in
Internet of Things Networks . . . . . . 244--263
Tao Han and
Miaowang Zeng and
Lijuan Zhang and
Arun Kumar Sangaiah A Channel-Aware Duty Cycle Optimization
for Node-to-Node Communications in the
Internet of Medical Things . . . . . . . 264--279
Salah A. Alabady and
Fadi Al-Turjman and
Sadia Din A Novel Security Model for Cooperative
Virtual Networks in the IoT Era . . . . 280--295
E. Anna Devi and
J. Martin Leo Manickam Identifying Partitions in Wireless
Sensor Network . . . . . . . . . . . . . 296--309
Hsiu-Sen Chiang and
Arun Kumar Sangaiah and
Mu-Yen Chen and
Jia-Yu Liu A Novel Artificial Bee Colony
Optimization Algorithm with SVM for
Bio-inspired Software-Defined Networking 310--328
M. BalaAnand and
N. Karthikeyan and
S. Karthik Designing a Framework for Communal
Software: Based on the Assessment Using
Relation Modelling . . . . . . . . . . . 329--343
Idrees Ahmed and
Abid Khan and
Adeel Anjum and
Mansoor Ahmed and
Muhammad Asif Habib A Secure Provenance Scheme for Detecting
Consecutive Colluding Users in
Distributed Networks . . . . . . . . . . 344--366
Ghulam Shabbir and
Adeel Akram and
Muhammad Munwar Iqbal and
Sohail Jabbar and
Mai Alfawair and
Junaid Chaudhry Network Performance Enhancement of
Multi-sink Enabled Low Power Lossy
Networks in SDN Based Internet of Things 367--398
A. N. Gnana Jeevan and
M. A. Maluk Mohamed DyTO: Dynamic Task Offloading Strategy
for Mobile Cloud Computing Using
Surrogate Object Model . . . . . . . . . 399--415
A. K. Gnanasekar and
V. Nagarajan Efficient MAI Cancellation Scheme in
MC-DS-CDMA Using SIC . . . . . . . . . . 416--430
R. Saravana Ram and
A. Gopi Saminathan and
S. Arun Prakash An Area Efficient and Low Power
Consumption of Run Time Digital System
Based on Dynamic Partial Reconfiguration 431--446
Sathees Lingam Paulswamy and
Hariharan Kaluvan Quadrant Based Neighbor to Sink and
Neighbor to Source Routing Protocol and
Alternate Node Deployment Strategies for
WSN . . . . . . . . . . . . . . . . . . 447--469
M. A. Manazir Ahsan and
Ihsan Ali and
Mohd Yamani Idna Bin Idris and
Muhammad Imran and
Muhammad Shoaib Countering Statistical Attacks in
Cloud-Based Searchable Encryption . . . 470--495
E. Laxmi Lydia and
P. Krishna Kumar and
K. Shankar and
S. K. Lakshmanaprabu and
R. M. Vidhyavathi and
Andino Maseleno Charismatic Document Clustering Through
Novel $K$-Means Non-negative Matrix
Factorization (KNMF) Algorithm Using Key
Phrase Extraction . . . . . . . . . . . 496--514
R. Ramya Devi and
V. Vijaya Chamundeeswari Triple DES: Privacy Preserving in Big
Data Healthcare . . . . . . . . . . . . 515--533
Zengyu Ding and
Gang Mei and
Salvatore Cuomo and
Yixuan Li and
Nengxiong Xu Comparison of Estimating Missing Values
in IoT Time Series Data Using Different
Interpolation Algorithms . . . . . . . . 534--548
P. Durgadevi and
S. Srinivasan Resource Allocation in Cloud Computing
Using SFLA and Cuckoo Search
Hybridization . . . . . . . . . . . . . 549--565
Bowei Shan and
Yong Fang GPU Accelerated Parallel Algorithm of
Sliding-Window Belief Propagation for
LDPC Codes . . . . . . . . . . . . . . . 566--579
M. A. Manazir Ahsan and
Ihsan Ali and
Mohd Yamani Idna Bin Idris and
Muhammad Imran and
Muhammad Shoaib Correction to: Countering Statistical
Attacks in Cloud-Based Searchable
Encryption . . . . . . . . . . . . . . . 580--580
Christoph Kessler Guest Editor's Note: High-Level Parallel
Programming 2019 . . . . . . . . . . . . 581--582
Christopher Brown and
Vladimir Janjic and
J. McCall Programming Heterogeneous Parallel
Machines Using Refactoring and
Monte-Carlo Tree Search . . . . . . . . 583--602
Christopher Brown and
Vladimir Janjic and
Kenneth MacKenzie Refactoring GrPPI: Generic Refactoring
for Generic Parallelism in C++ . . . . . 603--625
F. Gava and
Y. Marquer Axiomatization and Imperative
Characterization of Multi-BSP
Algorithms: A Q&A on a Partial Solution 626--651
Clemens Grelck and
Cédric Blom Resource-Aware Data Parallel Array
Processing . . . . . . . . . . . . . . . 652--674
M. Köster and
J. Groß and
A. Krüger Massively Parallel Rule-Based
Interpreter Execution on GPUs Using
Thread Compaction . . . . . . . . . . . 675--691
Luca Rinaldi and
Massimo Torquati and
Marco Danelutto Improving the Performance of Actors on
Multi-cores with Parallel Patterns . . . 692--712
Fabian Wrede and
Herbert Kuchen Towards High-Performance Code Generation
for Multi-GPU Clusters Based on a
Domain-Specific Language for Algorithmic
Skeletons . . . . . . . . . . . . . . . 713--728
Anonymous Editor's Note . . . . . . . . . . . . . 729--729
Kang Jin and
Dezun Dong and
Binzhang Fu DancerFly: An Order-Aware
Network-on-Chip Router On-the-Fly
Mitigating Multi-path Packet Reordering 730--749
Junmin Xiao and
Guizhao Zhang and
Guangming Tan Fast Data-Obtaining Algorithm for Data
Assimilation with Large Data Set . . . . 750--770
Ayman A. Ataher Mahmud and
Satakshi and
W. Jeberson Aircraft Landing Scheduling Using
Embedded Flower Pollination Algorithm 771--785
P. Gowtham and
V. P. Arunachalam and
S. Karthik An Efficient Monitoring of Real Time
Traffic Clearance for an Emergency
Service Vehicle Using IOT . . . . . . . 786--812
S. Chidambaram and
A. Sumathi Optimal Feature Selection for the
Classification of Hyperspectral Imagery
Using Adaptive Spectral--Spatial
Clustering . . . . . . . . . . . . . . . 813--832
M. S. Arunkumar and
P. Suresh and
C. Gunavathi High Utility Infrequent Itemset Mining
Using a Customized Ant Colony Algorithm 833--849
Puneet Jai Kaur and
Sakshi Kaushal A Fuzzy Approach for Estimating Quality
of Aspect Oriented Systems . . . . . . . 850--869
Iftikhar Ahmad and
Rafidah Md Noor and
Muhammad Shoaib A Cooperative Heterogeneous Vehicular
Clustering Mechanism for Road Traffic
Management . . . . . . . . . . . . . . . 870--889
Han Zhang and
Yurong Qian and
Chenwei Tian A ViBe Based Moving Targets Edge
Detection Algorithm and Its Parallel
Implementation . . . . . . . . . . . . . 890--908
Seokhoon Ryu and
Young-Sup Lee and
Seonghyun Kim Active Control of Engine Sound Quality
in a Passenger Car Using a Virtual Error
Microphone . . . . . . . . . . . . . . . 909--927
Wei Wang and
Huansheng Song and
Hua Cui Landslide Multi-attitude Data
Measurement of Bedding Rock Slope Model 928--939
Zeyu He and
Qiuli Huang and
Chuliang Weng Handling Data Skew for Aggregation in
Spark SQL Using Task Stealing . . . . . 941--956
Kim Grüttner and
Philipp A. Hartmann and
Wolfgang Rosenstiel A Timed-Value Stream Based ESL Timing
and Power Estimation and Simulation
Framework for Heterogeneous MPSoCs . . . 957--1007
Yuanzhe Li and
Loren Schwiebert Memory-Optimized Wavefront Parallelism
on GPUs . . . . . . . . . . . . . . . . 1008--1031
Jihyun Park and
Byoungju Choi and
Seungyeun Jang Dynamic Analysis Method for Concurrency
Bugs in Multi-process/Multi-thread
Environments . . . . . . . . . . . . . . 1032--1060
Tim Süß and
Lars Nagel and
Thomas Soddemann Pure Functions in C: A Small Keyword for
Automatic Parallelization . . . . . . . 1--24
Bo Wang and
Jie Tang and
Deyu Qi A Task-Aware Fine-Grained Storage
Selection Mechanism for In-Memory Big
Data Computing Frameworks . . . . . . . 25--50
Evan Coleman and
Erik J. Jensen and
Masha Sosonkina Fault Recovery Methods for Asynchronous
Linear Solvers . . . . . . . . . . . . . 51--80
Jean-Charles Papin and
Christophe Denoual and
Raymond Namyst SPAWN: An Iterative, Potentials-Based,
Dynamic Scheduling and Partitioning Tool 81--103
Raphael Beamonte and
Naser Ezzati-Jivan and
Michel R. Dagenais Automated Generation of Model-Based
Constraints for Common Multi-core and
Real-Time Applications Using Execution
Tracing . . . . . . . . . . . . . . . . 104--134
Anonymous Editor's Note: Special Issue on
High-level Programming for Heterogeneous
Parallel Systems (2019) . . . . . . . . 135--135
Adam Seewald and
Ulrik Pagh Schultz and
Henrik Skov Midtiby Coarse-Grained Computation-Oriented
Energy Modeling for Heterogeneous
Parallel Embedded Systems . . . . . . . 136--157
V. Pothos and
E. Vassalos and
N. Fragoulis Deep Learning Inference with Dynamic
Graphs on Heterogeneous Platforms . . . 158--176
Marco Danelutto and
Gabriele Mencagli and
Peter Kilpatrick Algorithmic Skeletons and Parallel
Design Patterns in Mainstream Parallel
Programming . . . . . . . . . . . . . . 177--198
Anonymous Editor's Note: Special Issue on
International Embedded Systems Symposium
(2019) . . . . . . . . . . . . . . . . . 199--199
Zhongqi Cheng and
Tim Schmidt and
Rainer Dömer Scaled Static Analysis and IP Reuse for
Out-of-Order Parallel SystemC Simulation 200--215
Tomoaki Kawada and
Shinya Honda and
Hiroaki Takada TZmCFI: RTOS-Aware Control-Flow
Integrity Using TrustZone for Armv8-M 216--236
Paulo C. Santos and
João P. C. de Lima and
Luigi Carro Enabling Near-Data Accelerators Adoption
by Through Investigation of Datapath
Solutions . . . . . . . . . . . . . . . 237--252
Menbere Kina Tekleyohannes and
Vladimir Rybalkin and
Andreas Dengel $i$DocChip: A Configurable Hardware
Architecture for Historical Document
Image Processing . . . . . . . . . . . . 253--284
Amartya Mukherjee and
Prateeti Mukherjee and
Nilanjan Dey iGridEdgeDrone: Hybrid Mobility Aware
Intelligent Load Forecasting by Edge
Enabled Internet of Drone Things for
Smart Grid Networks . . . . . . . . . . 285--325
Furat Al-Obaidy and
Arghavan Asad and
Farah A. Mohammadi A Power-Aware Hybrid Cache for
Chip-Multi Processors Based on Neural
Network Prediction Technique . . . . . . 326--346
Maria Fazio and
Alina Buzachis and
Massimo Villari A Map-Reduce Approach for the Dijkstra
Algorithm in SDN Over Osmotic Computing
Systems . . . . . . . . . . . . . . . . 347--375
Guillaume Iooss and
Christophe Alias and
Sanjay Rajopadhye Monoparametric Tiling of Polyhedral
Programs . . . . . . . . . . . . . . . . 376--409
Isil Öz and
Sanem Arslan Predicting the Soft Error Vulnerability
of Parallel Applications Using Machine
Learning . . . . . . . . . . . . . . . . 410--439
Iraklis M. Spiliotis and
Charalampos Sitaridis and
Michael P. Bekakos Parallel Computation of Discrete
Orthogonal Moment on Block Represented
Images Using OpenMP . . . . . . . . . . 440--462
Biao Xing and
DanDan Wang and
Cuihua He Accelerating DES and AES Algorithms for
a Heterogeneous Many-core Processor . . 463--486
Jörg Mische and
Martin Frieb and
Theo Ungerer PIMP My Many-Core: Pipeline-Integrated
Message Passing . . . . . . . . . . . . 487--505
Sven Rheindt and
Sebastian Maier and
Andreas Herkersdorf DySHARQ: Dynamic Software-Defined
Hardware-Managed Queues for Tile-Based
Architectures . . . . . . . . . . . . . 506--540
Sven Gesper and
Moritz Weißbrich and
Guillermo Payá-Vayá Evaluation of Different Processor
Architecture Organizations for On-Site
Electronics in Harsh Environments . . . 541--569
Akshay Srivatsa and
Mostafa Mansour and
Andreas Herkersdorf DynaCo: Dynamic Coherence Management
for Tiled Manycore Architectures . . . . 570--599
Rafael Stahl and
Alexander Hoffman and
Ulf Schlichtmann DeeperThings: Fully Distributed CNN
Inference on Resource-Constrained Edge
Devices . . . . . . . . . . . . . . . . 600--624
Guangming Tan and
Guang R. Gao Guest Editorial: Special issue on
Network and Parallel Computing for
Emerging Architectures and Applications 625--627
Jiansong Li and
Wei Cao and
Xiaobing Feng Compiler-assisted Operator Template
Library for DNN Accelerators . . . . . . 628--645
Tianba Chen and
Wei Li and
Yunchun Li M-DRL: Deep Reinforcement Learning
Based Coflow Traffic Scheduler with MLFQ
Threshold Adaption . . . . . . . . . . . 646--657
Zhanyuan Di and
En Shao and
Guangming Tan High-performance Migration Tool for Live
Container in a Workflow . . . . . . . . 658--670
Ziyu Zhang and
Zitan Liu and
Hong An RDMA-Based Apache Storm for
High-Performance Stream Data Processing 671--684
Yang Bai and
Dinghuang Hu and
Xiangke Liao CCRP: Converging Credit-Based and
Reactive Protocols in Datacenters . . . 685--699
Hui Dong and
Jianxi Fan and
Jingya Zhou Fault-Tolerant and Unicast Performances
of the Data Center Network HSDC . . . . 700--714
Mengshan Yu and
Guisheng Fan and
Liang Chen Location-based and Time-aware Service
Recommendation in Mobile Edge Computing 715--731
Haonan Ji and
Shibo Lu and
Brian Vinter Segmented Merge: A New Primitive for
Parallel Sparse Matrix Computations . . 732--744
Xiao Hu and
Zhonghai Lu A Configurable Hardware Architecture for
Runtime Application of Network Calculus 745--760
Troels Henriksen Bounds Checking on GPU . . . . . . . . . 761--775
Breno A. de Melo Menezes and
Nina Herrmann and
Fernando Buarque de Lima Neto High-Level Parallel Ant Colony
Optimization with Algorithmic Skeletons 776--801
Frédéric Dabrowski On Single-Valuedness in Textually
Aligned SPMD Programs . . . . . . . . . 802--819
Millán A. Martínez and
Basilio B. Fraguela and
José C. Cabaleiro A Parallel Skeleton for
Divide-and-conquer Unbalanced and Deep
Problems . . . . . . . . . . . . . . . . 820--845
August Ernstsson and
Johan Ahlqvist and
Christoph Kessler SkePU 3: Portable High-Level
Programming of Heterogeneous Systems and
HPC Clusters . . . . . . . . . . . . . . 846--866
Pascal Jungblut and
Karl Fürlinger Portable Node-Level Parallelism for the
PGAS Model . . . . . . . . . . . . . . . 867--885
Vladimir Janjic and
Christopher Brown and
Adam D. Barwell Restoration of Legacy Parallelism:
Transforming Pthreads into Farm and
Pipeline Patterns . . . . . . . . . . . 886--910
Anshu S. Anand and
Karthik Sayani and
R. K. Shyamasundar Fortress Abstractions in X10 Framework 911--933
Neeraj Gupta and
Mahdi Khosravy and
Rubén González Crespo Lightweight Artificial Intelligence
Technology for Health Diagnosis of
Agriculture Vehicles: Parallel Evolving
Artificial Neural Networks by Genetic
Algorithm . . . . . . . . . . . . . . . 1--26
Fei Yin and
Feng Shi A Comparative Survey of Big Data
Computing and HPC: From a Parallel
Programming Model to a Cluster
Architecture . . . . . . . . . . . . . . 27--64
Jichi Guo and
Qing Yi and
Kleanthis Psarris Enhancing the Effectiveness of Inlining
in Automatic Parallelization . . . . . . 65--88
Talha Naqash and
Sajjad Hussain Shah and
Muhammad Najam Ul Islam Statistical Analysis Based Intrusion
Detection System for Ultra-High-Speed
Software Defined Network . . . . . . . . 89--114
Tongsheng Geng and
Marcos Amaris and
Jean-Luc Gaudiot A Profile-Based AI-Assisted Dynamic
Scheduling Approach for Heterogeneous
Architectures . . . . . . . . . . . . . 115--151
Rajesh Pandian Muniasamy and
Rupesh Nasre and
N. S. Narayanaswamy Accelerating Computation of Steiner
Trees on GPUs . . . . . . . . . . . . . 152--185
Marc Reichenbach and
Matthias Jung and
Alex Orailoglu Guest Editorial: Special Issue on 2020
IEEE International Conference on
Embedded Computer Systems:
Architectures, Modeling and Simulation
(SAMOS 2020) . . . . . . . . . . . . . . 187--188
Sohan Lal and
Bogaraju Sharatchandra Varma and
Ben Juurlink A Quantitative Study of Locality in GPU
Caches for Memory-Divergent Workloads 189--216
Lukas Steiner and
Matthias Jung and
Norbert Wehn DRAMSys4.0: an Open-Source Simulation
Framework for In-depth DRAM Analyses . . 217--242
Mark Sagi and
Nguyen Anh Vu Doan and
Andreas Herkersdorf Fine-Grained Power Modeling of Multicore
Processors Using FFNNs . . . . . . . . . 243--266
Minyu Cui and
Angeliki Kritikakou and
Emmanuel Casseau Energy-Efficient Partial-Duplication
Task Mapping Under Multiple DVFS Schemes 267--294
Niko Zurstraßen and
Lukas Jünger and
Rainer Leupers AMAIX In-Depth: a Generic Analytical
Model for Deep Learning Accelerators . . 295--318
August Ernstsson and
Nicolas Vandenbergen and
Christoph Kessler A Deterministic Portable Parallel
Pseudo-Random Number Generator for
Pattern-Based Programming of
Heterogeneous Parallel Systems . . . . . 319--340
Peter Thoman and
Florian Tischler and
Thomas Fahringer The Celerity High-level API: C++20 for
Accelerator Clusters . . . . . . . . . . 341--359
Sébastien Rivault and
Mostafa Bamha and
Sophie Robert A Scalable Similarity Join Algorithm
Based on MapReduce and LSH . . . . . . . 360--380
Hemalatha Eedi and
Sahith Karra and
Rahul Utkoor An Improved/Optimized Practical
Non-Blocking PageRank Algorithm for
Massive Graphs . . . . . . . . . . . . . 381--404
Vasilios Kelefouras and
Karim Djemame and
Nikolaos Voros A Methodology for Efficient Tile Size
Selection for Affine Loop Kernels . . . 405--432
Nina Herrmann and
Breno A. de Melo Menezes and
Herbert Kuchen Stencil Calculations with Algorithmic
Skeletons for Heterogeneous Computing
Environments . . . . . . . . . . . . . . 433--453
Júnior Löff and
Renato B. Hoffmann and
Ricardo Pieper and
Dalvan Griebler and
Luiz G. Fernandes DSParLib: a C++ Template Library for
Distributed Stream Parallelism . . . . . 454--485
Breno Augusto de Melo Menezes and
Herbert Kuchen and
Fernando Buarque de Lima Neto Parallelization of Swarm Intelligence
Algorithms: Literature Review . . . . . 486--514
Jash Khatri and
Arihant Samar and
Bikash Behera and
Rupesh Nasre Scaling the Maximum Flow Computation on
GPUs . . . . . . . . . . . . . . . . . . 515--561
S. Ramesh and
C. Yaashuwanth Retraction Note: QoS and QoE Enhanced
Resource Allocation for Wireless Video
Sensor Networks Using Hybrid
Optimization Algorithm . . . . . . . . . 562--562
Nicolò Tonci and
Massimo Torquati and
Gabriele Mencagli and
Marco Danelutto Distributed-Memory FastFlow Building
Blocks . . . . . . . . . . . . . . . . . 1--21
Rui S. Silva and
João L. Sobral Efficient High-Level Programming in
Plain Java . . . . . . . . . . . . . . . 22--42
Stephen Timcheck and
Jeremy Buhler Interruptible Nodes: Reducing Queueing
Costs in Irregular Streaming Dataflow
Applications on Wide-SIMD Architectures 43--60
August Ernstsson and
Dalvan Griebler and
Christoph Kessler Assessing Application Efficiency and
Performance Portability in Single-Source
Programming for Heterogeneous Parallel
Systems . . . . . . . . . . . . . . . . 61--82
Ruairidh MacGregor and
Blair Archibald and
Phil Trinder Generic Exact Combinatorial Search at
HPC Scale . . . . . . . . . . . . . . . 83--106
M. BalaAnand and
N. Karthikeyan and
S. Karthik Retraction Note: Designing a Framework
for Communal Software: Based on the
Assessment Using Relation Modelling . . 107--107
Haoran Wang and
Thibaut Tachon and
Chong Li and
Sophie Robert and
Sébastien Limet SMSG: Profiling-Free Parallelism
Modeling for Distributed Training of DNN 109--127
Grace Nansamba and
Amani Altarawneh and
Anthony Skjellum A Fault-Model-Relevant Classification of
Consensus Mechanisms for MPI and HPC . . 128--149
Fabian Knorr and
Peter Thoman and
Thomas Fahringer Declarative Data Flow in a Graph-Based
Distributed Memory Runtime System . . . 150--171
Nina Herrmann and
Herbert Kuchen Distributed Calculations with
Algorithmic Skeletons for Heterogeneous
Computing Environments . . . . . . . . . 172--185
Loïc Sylvestre and
Emmanuel Chailloux and
Jocelyn Sérot Accelerating OCaml Programs on FPGA . . 186--207
Matthew Norman and
Isaac Lyngaas and
Abhishek Bagusetty and
Mark Berrill Portable C++ Code that can Look and Feel
Like Fortran Code with Yet Another
Kernel Launcher (YAKL) . . . . . . . . . 209--230
Daniel Presser and
Frank Siqueira Partitioning-Aware Performance Modeling
of Distributed Graph Processing Tasks 231--255
Vsevolod Bohaienko Calculation of Distributed-Order
Fractional Derivative on Tensor
Cores-Enabled GPU . . . . . . . . . . . 256--270
Virginia Niculescu and
Frédéric Loulergue Guest Editor's Note: High-Level
Parallel Programming 2021 . . . . . . . 271--273
Polychronis Velentzas and
Michael Vassilakopoulos and
Antonio Corral and
Christos Antonopoulos GPU-Based Algorithms for Processing the
$k$ Nearest-Neighbor Query on Spatial
Data Using Partitioning and Concurrent
Kernel Execution . . . . . . . . . . . . 275--308
Yacine Hakimi and
Riyadh Baghdadi and
Yacine Challal A Hybrid Machine Learning Model for Code
Optimization . . . . . . . . . . . . . . 309--331
Alex Orailoglu and
Marc Reichenbach and
Matthias Jung Special Issue on SAMOS 2022 . . . . . . 1--2
Viktor Razilov and
Robert Wittig and
Emil Matúš and
Gerhard Fettweis Access Interval Prediction by Partial
Matching for Tightly Coupled Memory
Systems . . . . . . . . . . . . . . . . 3--19
Milad Kokhazadeh and
Georgios Keramidas and
Vasilios Kelefouras and
Iakovos Stamoulis A Practical Approach for Employing
Tensor Train Decomposition in Edge
Devices . . . . . . . . . . . . . . . . 20--39
Christian Heidorn and
Muhammad Sabih and
Nicolai Meyerhöfer and
Christian Schinabeck and
Jürgen Teich and
Frank Hannig Hardware-Aware Evolutionary Explainable
Filter Pruning for Convolutional Neural
Networks . . . . . . . . . . . . . . . . 40--58
Luise Müller and
Philipp Wanko and
Christian Haubelt and
Torsten Schaub Investigating Methods for ASPmT-Based
Design Space Exploration in Evolutionary
Product Design . . . . . . . . . . . . . 59--92
Alessandro Ottaviano and
Robert Balas and
Giovanni Bambini and
Antonio Del Vecchio and
Maicol Ciani and
Davide Rossi and
Luca Benini and
Andrea Bartolini ControlPULP: a RISC-V On-Chip Parallel
Power Controller for Many-Core HPC
Processors with FPGA-Based
Hardware-In-The-Loop Power and Thermal
Emulation . . . . . . . . . . . . . . . 93--123
Yingpeng Wen and
Zhilin Qiu and
Dongyu Zhang and
Dan Huang and
Nong Xiao and
Liang Lin Accelerating Massively Distributed Deep
Learning Through Efficient
Pseudo-Synchronous Update Method . . . . 125--146
Alif Ahmed and
Farzana Ahmed Siddique and
Kevin Skadron GraphTango: a Hybrid Representation
Format for Efficient Streaming Graph
Updates and Analysis . . . . . . . . . . 147--170
Fabian Knorr and
Philip Salzmann and
Peter Thoman and
Thomas Fahringer Automatic Discovery of Collective
Communication Patterns in Parallelized
Task Graphs . . . . . . . . . . . . . . 171--186
Pedro Moreno and
Miguel Areias and
Ricardo Rocha and
Vítor Santos Costa Yet Another Lock-Free Atom Table Design
for Scalable Symbol Management in Prolog 187--206
Nicolò Tonci and
Sébastien Rivault and
Mostafa Bamha and
Sophie Robert and
Sébastien Limet and
Massimo Torquati LSH SimilarityJoin Pattern in
FastFlow . . . . . . . . . . . . . . . . 207--230
Bing Wei and
Qiang Huang and
Hui Chen and
Chenhao Zhang and
Limin Xiao Erasure-Coded Hybrid Writes Based on
Data Delta . . . . . . . . . . . . . . . 231--252
Björn Birath and
August Ernstsson and
John Tinnerholm and
Christoph Kessler High-Level Programming of
FPGA-Accelerated Systems with Parallel
Patterns . . . . . . . . . . . . . . . . 253--273
Nina Herrmann and
Justus Dieckmann and
Herbert Kuchen Optimizing Three-Dimensional
Stencil-Operations on Heterogeneous
Computing Environments . . . . . . . . . 274--297
Achilleas Tzenetopoulos and
Dimosthenis Masouros and
Sotirios Xydis and
Dimitrios Soudris Orchestration Extensions for
Interference- and Heterogeneity-Aware
Placement for Data-Analytics . . . . . . 298--323
Bhanu Dwivedi and
Bachu Dushmanta Kumar Patro RMOWOA: a Revamped Multi-Objective Whale
Optimization Algorithm for Maximizing
the Lifetime of a Network in Wireless
Sensor Networks . . . . . . . . . . . . 325--366
Mustafa Sanli Design and Performance Evaluation of a
Novel High-Speed Hardware Architecture
for Keccak Crypto Coprocessor . . . . . 367--379
Songwen Pei and
Wei Qin and
Jianan Li and
Junhao Tan and
Jie Tang and
Jean-Luc Gaudiot Intelligent Page Migration on
Heterogeneous Memory by Using
Transformer . . . . . . . . . . . . . . 380--399
Kevin Jude Concessao and
Unnikrishnan Cheramangalath and
Ricky Dev and
Rupesh Nasre Meerkat: a Framework for Dynamic Graph
Algorithms on GPUs . . . . . . . . . . . 400--453
Assia Brighen and
Asma Chouikh and
Hamida Ikhlef and
Hachem Slimani and
Abdelmounaam Rezgui and
Hamamache Kheddouci Giraph-Based Distributed Algorithms for
Coloring Large-Scale Graphs . . . . . . ??
Re'em Harel and
Tal Kadosh and
Niranjan Hasabnis and
Timothy Mattson and
Yuval Pinter and
Gal Oren PragFormer: Data-Driven Parallel Source
Code Classification with Transformers ??
Jianwu Long and
Luping Liu K*-Means: an Efficient Clustering
Algorithm with Adaptive Decision
Boundaries . . . . . . . . . . . . . . . ??
Naw Safrin Sattar and
Khaled Z. Ibrahim and
Aydin Buluc and
Shaikh Arifuzzaman DyG-DPCD: a Distributed Parallel
Community Detection Algorithm for
Large-Scale Dynamic Graphs . . . . . . . ??
Stefan Branković and
Lazar Smiljković and
Predrag Obradović and
Miloš Radonjiić and
Marko Mišić Fast Parallel CPU--GPU Approximate
Spectral Clustering for Transcriptomics
Data . . . . . . . . . . . . . . . . . . ??
M. Mohamed Asan Basiri High Throughput Instruction-Data Level
Parallelism Based Arithmetic Hardware
Accelerator . . . . . . . . . . . . . . ??
Valentin Beauvais and
Nicolò Tonci and
Sophie Robert and
Sébastien Limet Parallelizing RNA-Seq Analysis with
BioSkel: a FastFlow Based
Prototype . . . . . . . . . . . . . . . ??
Yaseen Zaidi and
Simon Winberg Automatic Heterogeneous Runtime Using
Signal Processing Domain-Specific and
Parallel Patterns . . . . . . . . . . . ??
Parinaz Barakhshan and
Rudolf Eigenmann Advancing Interactive Parallelization:
iCetus . . . . . . . . . . . . . . . . . ??
Marco Edoardo Santimaria and
Alberto Riccardo Martinelli and
Iacopo Colonnelli and
Barbara Cantalupo and
Massimo Torquati and
Marco Aldinucci CAPIO-CL: The CAPIO Coordination
Language . . . . . . . . . . . . . . . . ??
Christopher Brown and
Adam D. Barwell pi-par: a Dependently-Typed Parallel
Language with Algorithmic Skeletons . . ??
Simone Frassinelli and
Gabriele Mencagli Larger-Than-Memory Stateful Stream
Processing with WindFlow . . . . . . . . ??
Paolo Palazzari and
Marco Faltelli and
Francesco Iannone FIPLib: an Image Processing Library for
FPGAs Using High-Level Synthesis . . . . ??
Ricardo Leonarczyk and
Gabriele Mencagli and
Dalvan Griebler Self-Adaptive Micro-Batching for
Low-Latency GPU-Accelerated Stream
Processing . . . . . . . . . . . . . . . ??
Michail Boulasikis and
Flavius Gruian and
Robert-Zoltán Szász Using Machine Learning Hardware to Solve
Linear Partial Differential Equations
with Finite Difference Methods . . . . . ??
William Ruys and
Hochan Lee and
Bozhi You and
Shreya Talati and
Jaeyoung Park and
James Almgren-Bell and
Yineng Yan and
Milinda Fernando and
Mattan Erez and
Milos Gligoric and
Martin Burtscher and
Christopher J. Rossbach and
Keshav Pingali and
George Biros Performance Characterization of Python
Runtimes for Multi-device Task Parallel
Programming . . . . . . . . . . . . . . ??