Last update:
Wed Sep 10 10:03:40 MDT 2025
Phillip B. Gibbons ACM Transactions on Parallel Computing:
an introduction . . . . . . . . . . . . 1:1--1:??
David J. Lilja Introduction . . . . . . . . . . . . . . 2:1--2:??
Ashay Rane and
James Browne Enhancing Performance Optimization of
Multicore/Multichip Nodes with Data
Structure Metrics . . . . . . . . . . . 3:1--3:??
Víctor Jiménez and
Francisco J. Cazorla and
Roberto Gioiosa and
Alper Buyuktosunoglu and
Pradip Bose and
Francis P. O'Connell and
Bruce G. Mealey Adaptive Prefetching on POWER7:
Improving Performance and Power
Consumption . . . . . . . . . . . . . . 4:1--4:??
Timothy Heil and
Anil Krishna and
Nicholas Lindberg and
Farnaz Toussi and
Steven Vanderwiel Architecture and Performance of the
Hardware Accelerators in IBM's PowerEN
Processor . . . . . . . . . . . . . . . 5:1--5:??
Xing Wu and
Frank Mueller and
Scott Pakin A methodology for automatic generation
of executable communication
specifications from parallel MPI
applications . . . . . . . . . . . . . . 6:1--6:??
Mahesh Ravishankar and
John Eisenlohr and
Louis-Noël Pouchet and
J. Ramanujam and
Atanas Rountev and
P. Sadayappan Automatic parallelization of a class of
irregular loops for distributed memory
systems . . . . . . . . . . . . . . . . 7:1--7:??
Julian Shun and
Guy E. Blelloch A simple parallel Cartesian tree
algorithm and its application to
parallel suffix tree construction . . . 8:1--8:??
Keshav Pingali and
J. Ramanujam and
P. Sadayappan Introduction to the Special Issue on
PPoPP'12 . . . . . . . . . . . . . . . . 9:1--9:??
Aurelien Bouteiller and
Thomas Herault and
George Bosilca and
Peng Du and
Jack Dongarra Algorithm-Based Fault Tolerance for
Dense Matrix Factorizations, Multiple
Failures and Accuracy . . . . . . . . . 10:1--10:??
Grey Ballard and
James Demmel and
Nicholas Knight Avoiding Communication in Successive
Band Reduction . . . . . . . . . . . . . 11:1--11:??
Paul Sack and
William Gropp Collective Algorithms for Multiported
Torus Networks . . . . . . . . . . . . . 12:1--12:??
David Dice and
Virendra J. Marathe and
Nir Shavit Lock Cohorting: a General Technique for
Designing NUMA Locks . . . . . . . . . . 13:1--13:??
Duane Merrill and
Michael Garland and
Andrew Grimshaw High-Performance and Scalable GPU Graph
Traversal . . . . . . . . . . . . . . . 14:1--14:??
Stephan C. Kramer and
Johannes Hagemann SciPAL: Expression Templates and
Composition Closure Objects for High
Performance Computational Physics with
CUDA and OpenMP . . . . . . . . . . . . 15:1--15:??
Ehsan Totoni and
Nikhil Jain and
Laxmikant V. Kale Power Management of Extreme-Scale
Networks with On/Off Links in Runtime
Systems . . . . . . . . . . . . . . . . 16:1--16:??
Maurice Herlihy Guest Editor Introduction . . . . . . . 1:1--1:??
Bastian Degener and
Barbara Kempkes and
Peter Kling and
Friedhelm Meyer auf der Heide Linear and Competitive Strategies for
Continuous Robot Formation Problems . . 2:1--2:??
Navendu Jain and
Ishai Menache and
Joseph (Seffi) Naor and
Jonathan Yaniv Near-Optimal Scheduling Mechanisms for
Deadline-Sensitive Jobs in Large
Computing Clusters . . . . . . . . . . . 3:1--3:??
Moran Feldman and
Liane Lewin-Eytan and
Joseph (Seffi) Naor Hedonic Clustering Games . . . . . . . . 4:1--4:??
Janmartin Jahn and
Santiago Pagani and
Sebastian Kobbe and
Jian-Jia Chen and
Jörg Henkel Runtime Resource Allocation for Software
Pipelines . . . . . . . . . . . . . . . 5:1--5:??
Yi Xu and
Bo Zhao and
Youtao Zhang and
Jun Yang Simple Virtual Channel Allocation for
High-Throughput and High-Frequency
On-Chip Routers . . . . . . . . . . . . 6:1--6:??
Adam Hammouda and
Andrew R. Siegel and
Stephen F. Siegel Noise-Tolerant Explicit Stencil
Computations for Nonuniform Process
Execution Rates . . . . . . . . . . . . 7:1--7:??
Ciaran McCreesh and
Patrick Prosser The Shape of the Search Tree for the
Maximum Clique Problem and the
Implications for Parallel Branch and
Bound . . . . . . . . . . . . . . . . . 8:1--8:??
Torsten Hoefler and
James Dinan and
Rajeev Thakur and
Brian Barrett and
Pavan Balaji and
William Gropp and
Keith Underwood Remote Memory Access Programming in
MPI-3 . . . . . . . . . . . . . . . . . 9:1--9:??
Walther Maldonado and
Patrick Marlier and
Pascal Felber and
Julia Lawall and
Gilles Muller and
Etienne Rivière Supporting Time-Based QoS Requirements
in Software Transactional Memory . . . . 10:1--10:??
Gokcen Kestor and
Osman S. Unsal and
Adrian Cristal and
Serdar Tasiran TRADE: Precise Dynamic Race Detection
for Scalable Transactional Memory
Systems . . . . . . . . . . . . . . . . 11:1--11:??
Nuno Diegues and
Paolo Romano Time-Warp: Efficient Abort Reduction in
Transactional Memory . . . . . . . . . . 12:1--12:??
Lionel Eyraud-Dubois and
Loris Marchal and
Oliver Sinnen and
Frédéric Vivien Parallel Scheduling of Task Trees with
Limited Memory . . . . . . . . . . . . . 13:1--13:??
Michael Dinitz and
Torsten Hoefler Introduction to the Special Issue on
SPAA 2013 . . . . . . . . . . . . . . . 14:1--14:??
Ravi Kumar and
Benjamin Moseley and
Sergei Vassilvitskii and
Andrea Vattani Fast Greedy Algorithms in MapReduce and
Streaming . . . . . . . . . . . . . . . 14:1--14:??
Peter Sanders and
Jochen Speck and
Raoul Steffen Work-Efficient Matrix Inversion in
Polylogarithmic Time . . . . . . . . . . 15:1--15:??
Seth Gilbert and
Chaodong Zheng SybilCast: Broadcast on the Open
Airwaves . . . . . . . . . . . . . . . . 16:1--16:??
I-Ting Angelina Lee and
Charles E. Leiserson and
Tao B. Schardl and
Zhunping Zhang and
Jim Sukha On-the-Fly Pipeline Parallelism . . . . 17:1--17:??
Martina Eikel and
Christian Scheideler IRIS: a Robust Information System
Against Insider DoS Attacks . . . . . . 18:1--18:??
Peter Kling and
Peter Pietrzyk Profitable Scheduling on Multiple
Speed-Scalable Processors . . . . . . . 19:1--19:??
Chinmoy Dutta and
Gopal Pandurangan and
Rajmohan Rajaraman and
Scott Roche Coalescing-Branching Random Walks on
Graphs . . . . . . . . . . . . . . . . . 20:1--20:??
James Larus and
Sandhya Dwarkadas and
José Moreira and
Andrew Lumsdaine Introduction to the Special Issue on
PPoPP'14 . . . . . . . . . . . . . . . . 21:1--21:??
Maurice Herlihy and
Zhiyu Liu Well-Structured Futures and Cache
Locality . . . . . . . . . . . . . . . . 22:1--22:??
Paul Thomson and
Alastair F. Donaldson and
Adam Betts Concurrency Testing Using Controlled
Schedulers: an Empirical Study . . . . . 23:1--23:??
Darko Petrović and
Thomas Ropars and
André Schiper Leveraging Hardware Message Passing for
Efficient Thread Synchronization . . . . 24:1--24:??
Olivier Tardieu and
Benjamin Herta and
David Cunningham and
David Grove and
Prabhanjan Kambadur and
Vijay Saraswat and
Avraham Shinnar and
Mikio Takeuchi and
Mandana Vaziri and
Wei Zhang X10 and APGAS at Petascale . . . . . . . 25:1--25:??
Saeed Maleki and
Madanlal Musuvathi and
Todd Mytkowicz Low-Rank Methods for Parallelizing
Dynamic Programming Algorithms . . . . . 26:1--26:??
Xin Yuan and
Wickus Nienaber and
Santosh Mahapatra On Folded-Clos Networks with
Deterministic Single-Path Routing . . . 27:1--27:??
Edans F. De O. Sandes and
Guillermo Miranda and
Xavier Martorell and
Eduard Ayguade and
George Teodoro and
Alba C. M. A. De Melo MASA: a Multiplatform Architecture for
Sequence Aligners with Block Pruning . . 28:1--28:??
Friedhelm Meyer auf der Heide and
Peter Sanders and
Nodari Sitchinava Introduction to the Special Issue on
SPAA 2014 . . . . . . . . . . . . . . . 1:1--1:??
Tim Kaler and
William Hasenplaugh and
Tao B. Schardl and
Charles E. Leiserson Executing Dynamic Data-Graph
Computations Deterministically Using
Chromatic Scheduling . . . . . . . . . . 2:1--2:??
Sungjin Im and
Benjamin Moseley and
Kirk Pruhs and
Eric Torng Competitively Scheduling Tasks with
Intermediate Parallelizability . . . . . 4:1--4:??
Ioana O. Bercea and
Navin Goyal and
David G. Harris and
Aravind Srinivasan On Computing Maximal Independent Sets of
Hypergraphs in Parallel . . . . . . . . 5:1--5:??
Davide Bilò and
Luciano Gualà and
Stefano Leucci and
Guido Proietti Locality-Based Network Creation Games 6:1--6:??
Jiayang Jiang and
Michael Mitzenmacher and
Justin Thaler Parallel Peeling Algorithms . . . . . . 7:1--7:??
Harsha Vardhan Simhadri and
Guy E. Blelloch and
Jeremy T. Fineman and
Phillip B. Gibbons and
Aapo Kyrola Experimental Analysis of Space-Bounded
Schedulers . . . . . . . . . . . . . . . 8:1--8:??
Hafiz Fahad Sheikh and
Ishfaq Ahmad Sixteen Heuristics for Joint
Optimization of Performance, Energy, and
Temperature in Allocating Tasks to
Multi-Cores . . . . . . . . . . . . . . 9:1--9:??
Jeffrey D. Blanchard and
Erik Opavsky and
Emircan Uysaler Selecting Multiple Order Statistics with
a Graphics Processing Unit . . . . . . . 10:1--10:??
David Böhme and
Markus Geimer and
Lukas Arnold and
Felix Voigtlaender and
Felix Wolf Identifying the Root Causes of Wait
States in Large-Scale Parallel
Applications . . . . . . . . . . . . . . 11:1--11:??
Roshan Dathathri and
Ravi Teja Mullapudi and
Uday Bondhugula Compiling Affine Loop Nests for a
Dynamic Scheduling Runtime on Shared and
Distributed Memory . . . . . . . . . . . 12:1--12:??
Anne Benoit and
Aurélien Cavelan and
Yves Robert and
Hongyang Sun Assessing General-Purpose Algorithms to
Cope with Fail-Stop and Silent Errors 13:1--13:??
Ioannis Koutis and
Shen Chen Xu Simple Parallel and Distributed
Algorithms for Spectral Graph
Sparsification . . . . . . . . . . . . . 14:1--14:??
Matthieu Dorier and
Gabriel Antoniu and
Franck Cappello and
Marc Snir and
Robert Sisneros and
Orcun Yildiz and
Shadi Ibrahim and
Tom Peterka and
Leigh Orf Damaris: Addressing Performance
Variability in Data Management for
Post-Petascale Simulations . . . . . . . 15:1--15:??
Jiaquan Gao and
Yu Wang and
Jun Wang and
Ronghua Liang Adaptive Optimization Modeling of
Preconditioned Conjugate Gradient on
Multi-GPUs . . . . . . . . . . . . . . . 16:1--16:??
Timothy Creech and
Rajeev Barua Transparently Space Sharing a Multicore
Among Multiple Processes . . . . . . . . 17:1--17:??
Grey Ballard and
Alex Druinsky and
Nicholas Knight and
Oded Schwartz Hypergraph Partitioning for Sparse
Matrix--Matrix Multiplication . . . . . 18:1--18:??
David Grove Introduction to the Special Section on
PPoPP'15 . . . . . . . . . . . . . . . . 19:1--19:??
Zoltan Majo and
Thomas R. Gross A Library for Portable and Composable
Data Locality Optimizations for NUMA
Systems . . . . . . . . . . . . . . . . 20:1--20:??
Guy Golan-Gueta and
G. Ramalingam and
Mooly Sagiv and
Eran Yahav Automatic Scalable Atomicity via
Semantic Locking . . . . . . . . . . . . 21:1--21:??
Joseph Izraelevitz and
Michael L. Scott Generality and Speed in Nonblocking Dual
Containers . . . . . . . . . . . . . . . 22:1--22:??
Richard Cole and
Vijaya Ramachandran Resource Oblivious Sorting on Multicores 23:1--23:??
Grey Ballard and
Mary Hall and
Tim Harris and
Brandon Lucia Guest Editor Introduction PPoPP 2016,
Special Issue 2 of 2 . . . . . . . . . . 1:1--1:??
Saman Ashkiani and
Andrew Davidson and
Ulrich Meyer and
John D. Owens GPU Multisplit: an Extended Study of a
Parallel Algorithm . . . . . . . . . . . 2:1--2:??
Yangzihao Wang and
Yuechao Pan and
Andrew Davidson and
Yuduo Wu and
Carl Yang and
Leyuan Wang and
Muhammad Osama and
Chenshan Yuan and
Weitang Liu and
Andy T. Riffel and
John D. Owens Gunrock: GPU Graph Analytics . . . . . . 3:1--3:??
Rezaul Chowdhury and
Pramod Ganapathi and
Stephen Tschudi and
Jesmin Jahan Tithi and
Charles Bachmeier and
Charles E. Leiserson and
Armando Solar-Lezama and
Bradley C. Kuszmaul and
Yuan Tang Autogen: Automatic Discovery of
Efficient Recursive Divide-&-Conquer
Algorithms for Solving Dynamic
Programming Problems . . . . . . . . . . 4:1--4:??
Guy L. Steele Jr. and
Jean-Baptiste Tristan Adding Approximate Counters . . . . . . 5:1--5:??
Grey Ballard and
Mary Hall and
Tim Harris and
Brandon Lucia Guest Editor Introduction PPoPP 2016,
Special Issue 2 of 2 . . . . . . . . . . 6:1--6:??
Saurabh Kalikar and
Rupesh Nasre DomLock: a New Multi-Granularity Locking
Technique for Hierarchies . . . . . . . 7:1--7:??
Syed Kamran Haider and
William Hasenplaugh and
Dan Alistarh Lease/Release: Architectural Support for
Scaling Contended Data Structures . . . 8:1--8:??
Man Cao and
Minjia Zhang and
Aritra Sengupta and
Swarnendu Biswas and
Michael D. Bond Hybridizing and Relaxing Dependence
Tracking for Efficient Parallel Runtime
Support . . . . . . . . . . . . . . . . 9:1--9:??
Georgios Chatzopoulos and
Aleksandar Dragojević and
Rachid Guerraoui ESTIMA: Extrapolating ScalabiliTy of
In-Memory Applications . . . . . . . . . 10:1--10:??
Vincenzo Gulisano and
Yiannis Nikolakopoulos and
Daniel Cederman and
Marina Papatriantafilou and
Philippas Tsigas Efficient Data Streaming Multiway
Aggregation through Concurrent
Algorithmic Designs and New Abstract
Data Types . . . . . . . . . . . . . . . 11:1--11:??
Tareq M. Malas and
Georg Hager and
Hatem Ltaief and
David E. Keyes Multidimensional Intratile
Parallelization for Memory-Starved
Stencil Computations . . . . . . . . . . 12:1--12:??
Kadir Akbudak and
Oguz Selvitopi and
Cevdet Aykanat Partitioning Models for Scaling Parallel
Sparse Matrix--Matrix Multiplication . . 13:1--13:??
Gianfranco Bilardi and
Michele Scquizzato and
Francesco Silvestri A Lower Bound Technique for
Communication in BSP . . . . . . . . . . 14:1--14:??
Semih Sahin and
Bugra Gedik C-Stream: a Co-routine-Based Elastic
Stream Processing Engine . . . . . . . . 15:1--15:??
Kunal Agrawal and
I-Ting Angelina Lee and
Michael Spear Introduction to Special Issue on SPAA'15 16:1--16:??
Kook Jin Ahn and
Sudipto Guha Access to Data and Number of Iterations:
Dual Primal Algorithms for Maximum
Matching under Resource Constraints . . 17:1--17:??
Dan Alistarh and
William Leiserson and
Alexander Matveev and
Nir Shavit ThreadScan: Automatic and Scalable
Memory Reclamation . . . . . . . . . . . 18:1--18:??
Dimitar Dimitrov and
Martin Vechev and
Vivek Sarkar Race Detection in Two Dimensions . . . . 19:1--19:??
I-Ting Angelina Lee and
Tao B. Schardl Efficient Race Detection for Reducer
Hyperobjects . . . . . . . . . . . . . . 20:1--20:??
Seth Gilbert Introduction to the Special Issue for
SPAA 2016 . . . . . . . . . . . . . . . 1:1--1:??
Michael Mitzenmacher and
Rajmohan Rajaraman and
Scott Roche Better Bounds for Coalescing-Branching
Random Walks . . . . . . . . . . . . . . 2:1--2:??
Mingmou Liu and
Xiaoyin Pan and
Yitong Yin Randomized Approximate Nearest Neighbor
Search with Limited Adaptivity . . . . . 3:1--3:??
Gopal Pandurangan and
Peter Robinson and
Michele Scquizzato Fast Distributed Algorithms for
Connectivity and MST in Large Graphs . . 4:1--4:??
Madhukar Korupolu and
Rajmohan Rajaraman Robust and Probabilistic Failure-Aware
Placement . . . . . . . . . . . . . . . 5:1--5:??
Deli Zhang and
Pierre Laborde and
Lance Lebanoff and
Damian Dechev Lock-Free Transactional Transformation
for Linked Data Structures . . . . . . . 6:1--6:??
Michel Müller and
Takayuki Aoki New High Performance GPGPU Code
Transformation Framework Applied to
Large Production Weather Prediction Code 7:1--7:??
Martin Burtscher and
Sindhu Devale and
Sahar Azimi and
Jayadharini Jaiganesh and
Evan Powers A High-Quality and Fast Maximal
Independent Set Implementation for GPUs 8:1--8:??
Junhong Liu and
Guangming Tan and
Yulong Luo and
Jiajia Li and
Zeyao Mo and
Ninghui Sun An Autotuning Protocol to Rapidly Build
Autotuners . . . . . . . . . . . . . . . 9:1--9:??
Antonio Fernández Anta and
Dariusz R. Kowalski and
Miguel A. Mosteiro and
Prudence W. H. Wong Scheduling Dynamic Parallel Workload of
Mobile Devices with Access Guarantees 10:1--10:??
Amirhossein Mirhosseini and
Mohammad Sadrosadati and
Fatemeh Aghamohammadi and
Mehdi Modarressi and
Hamid Sarbazi-Azad BARAN: Bimodal Adaptive
Reconfigurable-Allocator Network-on-Chip 11:1--11:??
Abdelhalim Amer and
Huiwei Lu and
Pavan Balaji and
Milind Chabbi and
Yanjie Wei and
Jeff Hammond and
Satoshi Matsuoka Lock Contention Management in
Multithreaded MPI . . . . . . . . . . . 12:1--12:??
Rong Chen and
Jiaxin Shi and
Yanzhe Chen and
Binyu Zang and
Haibing Guan and
Haibo Chen PowerLyra: Differentiated Graph
Computation and Partitioning on Skewed
Graphs . . . . . . . . . . . . . . . . . 13:1--13:??
Alex Aravind and
Wim H. Hesselink Group Mutual Exclusion by
Fetch-and-increment . . . . . . . . . . 14:1--14:??
Babak Behzad and
Surendra Byna and
Prabhat and
Marc Snir Optimizing I/O Performance of HPC
Applications with Autotuning . . . . . . 15:1--15:??
Tobias Maier and
Peter Sanders and
Roman Dementiev Concurrent Hash Tables: Fast and
General(?)! . . . . . . . . . . . . . . 16:1--16:??
Eduardo H. M. Cruz and
Matthias Diener and
Laércio L. Pilla and
Philippe O. A. Navaux EagerMap: a Task Mapping Algorithm to
Improve Communication and Load Balancing
in Clusters of Multicore Systems . . . . 17:1--17:??
David A. Bader Editorial from the Editor-in-Chief . . . 1:1--1:??
Martin Kronbichler and
Karl Ljungkvist Multigrid for Matrix-Free High-Order
Finite Element Computations on Graphics
Processors . . . . . . . . . . . . . . . 2:1--2:??
Vincenzo Bonifaci and
Andreas Wiese and
Sanjoy K. Baruah and
Alberto Marchetti-Spaccamela and
Sebastian Stiller and
Leen Stougie A Generalized Parallel Task Model for
Recurrent Real-Time Processes . . . . . 3:1--3:??
Hatem Ltaief and
Dalal Sukkari and
Aniello Esposito and
Yuji Nakatsukasa and
David Keyes Massively Parallel Polar Decomposition
on Distributed-memory Systems . . . . . 4:1--4:??
Dibakar Saha and
Koushik Sinha Optimal Schedule for All-to-All
Personalized Communication in
Multiprocessor Systems . . . . . . . . . 5:1--5:??
Sarunya Pumma and
Min Si and
Wu-Chun Feng and
Pavan Balaji Scalable Deep Learning via I/O Analysis
and Optimization . . . . . . . . . . . . 6:1--6:??
Guillaume Aupy and
Ana Gainaru and
Valentin Le Fèvre I/O Scheduling Strategy for Periodic
Applications . . . . . . . . . . . . . . 7:1--7:??
Dimitri Kagaris and
Sourav Dutta Scheduling Mutual Exclusion Accesses in
Equal-Length Jobs . . . . . . . . . . . 8:1--8:??
Md Atiqul Mollah and
Wenqi Wang and
Peyman Faizian and
MD Shafayat Rahman and
Xin Yuan and
Scott Pakin and
Michael Lang Modeling Universal Globally Adaptive
Load-Balanced Routing . . . . . . . . . 9:1--9:??
Mohammed Hossein Bateni and
Mohammad T. Hajiaghayi and
Silvio Lattanzi Introduction to the Special Issue for
SPAA'17 . . . . . . . . . . . . . . . . 10:1--10:??
Sudipto Guha and
Yi Li and
Qin Zhang Distributed Partial Clustering . . . . . 11:1--11:??
Pierre Fraigniaud and
Dennis Olivetti Distributed Detection of Cycles . . . . 12:1--12:??
Susanne Albers On Energy Conservation in Data Centers 13:1--13:??
Björn Feldkord and
Friedhelm Meyer auf der Heide The Mobile Server Problem . . . . . . . 14:1--14:??
Yossi Azar and
Danny Vainstein Tight Bounds for Clairvoyant Dynamic Bin
Packing . . . . . . . . . . . . . . . . 15:1--15:??
Colin Cooper and
Tomasz Radzik and
Nicolas Rivera New Cover Time Bounds for the
Coalescing-Branching Random Walk on
Graphs . . . . . . . . . . . . . . . . . 16:1--16:??
He Sun and
Luca Zanetti Distributed Graph Clustering and
Sparsification . . . . . . . . . . . . . 17:1--17:??
Shahbaz Khan Near Optimal Parallel Algorithms for
Dynamic DFS in Undirected Graphs . . . . 18:1--18:??
Lawrence Rauchwerger and
Jaejin Lee and
Armando Solar-Lezama and
Guy Steele Introduction to the Special Issue on
PPoPP 2017 (Part 1) . . . . . . . . . . 19:1--19:??
Tao B. Schardl and
William S. Moses and
Charles E. Leiserson Tapir: Embedding Recursive Fork-join
Parallelism into LLVM's Intermediate
Representation . . . . . . . . . . . . . 19:1--19:??
Robert Utterback and
Kunal Agrawal and
I-Ting Angelina Lee and
Milind Kulkarni Processor-Oblivious Record and Replay 20:1--20:??
Tsung Tai Yeh and
Amit Sabne and
Putt Sakdhnagool and
Rudolf Eigenmann and
Timothy G. Rogers Pagoda: a GPU Runtime System for Narrow
Tasks . . . . . . . . . . . . . . . . . 21:1--21:??
Guy L. Steele Jr. and
Jean-Baptiste Tristan Using Butterfly-patterned Partial Sums
to Draw from Discrete Distributions . . 22:1--22:??
Hans Vandierendonck and
Dimitrios S. Nikolopoulos Hyperqueues: Design and Implementation
of Deterministic Concurrent Queues . . . 23:1--23:??
Bin Ren and
Shruthi Balakrishna and
Youngjoon Jo and
Sriram Krishnamoorthy and
Kunal Agrawal and
Milind Kulkarni Extracting SIMD Parallelism from
Recursive Task-Parallel Programs . . . . 24:1--24:??
Antonino Tumeo and
Fabrizio Petrini and
John Feo and
Mahantesh Halappanavar Introduction to the TOPC Special Issue
on Innovations in Systems for Irregular
Applications, Part 1 . . . . . . . . . . 1:1--1:2
Hartwig Anzt and
Terry Cojean and
Chen Yen-Chen and
Jack Dongarra and
Goran Flegar and
Pratik Nayak and
Stanimire Tomov and
Yuhsiang M. Tsai and
Weichung Wang Load-balancing Sparse Matrix Vector
Product Kernels on GPUs . . . . . . . . 2:1--2:26
Ioan Hadade and
Timothy M. Jones and
Feng Wang and
Luca di Mare Software Prefetching for Unstructured
Mesh Applications . . . . . . . . . . . 3:1--3:23
Thomas Grützmacher and
Terry Cojean and
Goran Flegar and
Hartwig Anzt and
Enrique S. Quintana-Ortí Acceleration of PageRank with Customized
Precision Based on Mantissa Segmentation 4:1--4:19
Apurba Das and
Seyed-Vahid Sanei-Mehri and
Srikanta Tirthapura Shared-memory Parallel Maximal Clique
Enumeration from Static and Dynamic
Graphs . . . . . . . . . . . . . . . . . 5:1--5:28
Kathleen E. Hamilton and
Catherine D. Schuman and
Steven R. Young and
Ryan S. Bennink and
Neena Imam and
Travis S. Humble Accelerating Scientific Computing in the
Post-Moore's Era . . . . . . . . . . . . 6:1--6:31
Kartik Lakhotia and
Rajgopal Kannan and
Sourav Pati and
Viktor Prasanna GPOP: a Scalable Cache- and
Memory-efficient Framework for Graph
Processing over Parts . . . . . . . . . 7:1--7:24
Jeff Anderson and
Engin Kayraklioglu and
Shuai Sun and
Joseph Crandall and
Yousra Alkabani and
Vikram Narayana and
Volker Sorger and
Tarek El-Ghazawi ROC: a Reconfigurable Optical Computer
for Simulating Physical Processes . . . 8:1--8:29
Alireza Monemi and
Farshad Khunjush and
Maurizio Palesi and
Hamid Sarbazi-Azad An Enhanced Dynamic Weighted Incremental
Technique for QoS Support in NoC . . . . 9:1--9:31
Aravind Natarajan and
Arunmoezhi Ramachandran and
Neeraj Mittal FEAST: a Lightweight Lock-free
Concurrent Binary Search Tree . . . . . 10:1--10:64
Ahmad Salah and
Kenli Li and
Qing Liao and
Mervat Hashem and
Zhiyong Li and
Anthony T. Chronopoulos and
Albert Y. Zomaya A Time-space Efficient Algorithm for
Parallel k-way In-place Merging based
on Sequence Partitioning and Perfect
Shuffle . . . . . . . . . . . . . . . . 11:1--11:23
Shaohua Duan and
Pradeep Subedi and
Philip Davis and
Keita Teranishi and
Hemanth Kolla and
Marc Gamell and
Manish Parashar CoREC: Scalable and Resilient In-memory
Data Staging for In-situ Workflows . . . 12:1--12:29
Maksudul Alam and
Maleq Khan and
Kalyan S. Perumalla and
Madhav Marathe Generating Massive Scale-free Networks:
Novel Parallel Algorithms using the
Preferential Attachment Model . . . . . 13:1--13:35
Jaejin Lee and
Lawrence Rauchwerger and
Armando Solar-Lezama and
Guy Steele Introduction to the Special Issue on
PPoPP 2017 (Part 2) . . . . . . . . . . 14:1--14:2
Peng Jiang and
Yang Xia and
Gagan Agrawal Combining SIMD and Many/Multi-core
Parallelism for Finite-state Machines
with Enumerative Speculation . . . . . . 15:1--15:26
Dmitry Basin and
Edward Bortnikov and
Anastasia Braginsky and
Guy Golan-Gueta and
Eshcar Hillel and
Idit Keidar and
Moshe Sulamy KiWi: a Key--value Map for Scalable
Real-time Analytics . . . . . . . . . . 16:1--16:28
Milind Chabbi and
Abdelhalim Amer and
Xu Liu Efficient Abortable-locking Protocol for
Multi-level NUMA Systems: Design and
Correctness . . . . . . . . . . . . . . 17:1--17:32
Tal Ben-Nun and
Michael Sutton and
Sreepathi Pai and
Keshav Pingali Groute: Asynchronous Multi-GPU
Programming Model with Applications to
Large-scale Graph Processing . . . . . . 18:1--18:27
Christie Alappat and
Achim Basermann and
Alan R. Bishop and
Holger Fehske and
Georg Hager and
Olaf Schenk and
Jonas Thies and
Gerhard Wellein A Recursive Algebraic Coloring Technique
for Hardware-efficient Symmetric Sparse
Matrix--vector Multiplication . . . . . 19:1--19:37
Denis Davydov and
Martin Kronbichler Algorithms and Data Structures for
Matrix-Free Finite Element Operators
with MPI-Parallel Sparse Multi-Vectors 20:1--20:30
Tapan K. Sengupta and
Prasannabalaji Sundaram and
Vajjala K. Suman and
Swagata Bhaumik A High Accuracy Preserving Parallel
Algorithm for Compact Schemes for DNS 21:1--21:32
Karan Aggarwal and
Uday Bondhugula Optimizing the Linear Fascicle
Evaluation Algorithm for Multi-core and
Many-core Systems . . . . . . . . . . . 22:1--22:45
Antonino Tumeo and
Fabrizio Petrini and
John Feo and
Mahantesh Halappanavar Introduction to the TOPC Special Issue
on Innovations in Systems for Irregular
Applications, Part 2 . . . . . . . . . . 23:1--23:2
Naveen Namashivayam and
Bill Long and
Deepak Eachempati and
Bob Cernohous and
Mark Pagel A Modern Fortran Interface in OpenSHMEM
Need for Interoperability with Parallel
Fortran Using Coarrays . . . . . . . . . 24:1--24:25
Eric R. Hein and
Srinivas Eswar and
Abdurrahman Yasar and
Jiajia Li and
Jeffrey S. Young and
Thomas M. Conte and
Ümit V. Çatalyürek and
Richard Vuduc and
Jason Riedy and
Bora Uçar Programming Strategies for Irregular
Algorithms on the Emu Chick . . . . . . 25:1--25:25
John D. Leidel and
Xi Wang and
Brody Williams and
Yong Chen Toward a Microarchitecture for Efficient
Execution of Irregular Applications . . 26:1--26:24
Pietro Fezzardi and
Fabrizio Ferrandi Automated Bug Detection for High-level
Synthesis of Multi-threaded Irregular
Applications . . . . . . . . . . . . . . 27:1--27:26
Lee Savoie and
David K. Lowenthal and
Bronis R. De Supinski and
Kathryn Mohror and
Nikhil Jain Mitigating Inter-Job Interference via
Process-Level Quality-of-Service . . . . 1:1--1:26
Tahsin Reza and
Hassan Halawa and
Matei Ripeanu and
Geoffrey Sanders and
Roger A. Pearce Scalable Pattern Matching in Metadata
Graphs via Constraint Checking . . . . . 2:1--2:45
Jeremy Fineman and
Aydin Buluc and
Seth Gilbert Introduction to the Special Issue for
SPAA 2018: Part 1 . . . . . . . . . . . 3e:1--3e:1
Noga Alon and
Yossi Azar and
Mark Berlin The Price of Bounded Preemption . . . . 3:1--3:21
Laxman Dhulipala and
Guy E. Blelloch and
Julian Shun Theoretically Efficient Parallel Graph
Algorithms Can Be Fast and Scalable . . 4:1--4:70
Haim Kaplan and
Shay Solomon Dynamic Representations of Sparse
Distributed Networks: a
Locality-sensitive Approach . . . . . . 5:1--5:26
Jeremy Fineman and
Aydin Buluc and
Seth Gilbert Introduction to the Special Issue for
SPAA 2018 --- Part 2 . . . . . . . . . . 6:1--6:1
Gopal Pandurangan and
Peter Robinson and
Michele Scquizzato On the Distributed Complexity of
Large-Scale Graph Computations . . . . . 7:1--7:28
Barbara Geissmann and
Lukas Gianinazzi Parallel Minimum Cuts in Near-linear
Work and Low Depth . . . . . . . . . . . 8:1--8:20
Giorgio Lucarelli and
Benjamin Moseley and
Nguyen Kim Thang and
Abhinav Srivastav and
Denis Trystram Online Non-preemptive Scheduling on
Unrelated Machines with Rejections . . . 9:1--9:22
Kjell Winblad and
Konstantinos Sagonas and
Bengt Jonsson Lock-free Contention Adapting Search
Trees . . . . . . . . . . . . . . . . . 10:1--10:38
Oded Green HashGraph --- Scalable Hash Tables Using
a Sparse Graph Data Structure . . . . . 11:1--11:17
Petra Berenbrink Introduction to the Special Issue for
SPAA 2019 . . . . . . . . . . . . . . . 12:1--12:1
Soheil Behnezhad and
Laxman Dhulipala and
Hossein Esfandiari and
Jakub Lacki and
Vahab Mirrokni and
Warren Schudy Massively Parallel Computation via
Remote Memory Access . . . . . . . . . . 13:1--13:25
Faith Ellen and
Barun Gorain and
Avery Miller and
Andrzej Pelc Constant-Length Labeling Schemes for
Deterministic Radio Broadcast . . . . . 14:1--14:17
Michael A. Bender and
Alex Conway and
Martín Farach-Colton and
William Jannen and
Yizheng Jiao and
Rob Johnson and
Eric Knorr and
Sara McAllister and
Nirjhar Mukherjee and
Prashant Pandey and
Donald E. Porter and
Jun Yuan and
Yang Zhan External-memory Dictionaries in the
Affine and PDAM Models . . . . . . . . . 15:1--15:20
Matthias Maier and
Martin Kronbichler Efficient Parallel 3D Computation of
the Compressible Euler Equations with an
Invariant-domain Preserving Second-order
Finite-element Scheme . . . . . . . . . 16:1--16:30
James Edwards and
Uzi Vishkin Study of Fine-grained Nested Parallelism
in CDCL SAT Solvers . . . . . . . . . . 17:1--17:18
Laurent Feuilloley and
Pierre Fraigniaud Randomized Local Network Computing:
Derandomization Beyond Locally Checkable
Labelings . . . . . . . . . . . . . . . 18:1--18:25
Saleh Khalaj Monfared and
Omid Hajihassani and
Vahid Mohsseni and
Dara Rahmati and
Saeid Gorgin A High-throughput Parallel Viterbi
Algorithm via Bitslicing . . . . . . . . 19:1--19:25
Shao-Chung Wang and
Lin-Ya Yu and
Li-An Her and
Yuan-Shin Hwang and
Jenq-Kuen Lee Pointer-Based Divergence Analysis for
OpenCL 2.0 Programs . . . . . . . . . . 20:1--20:23
Xuejiao Kang and
David F. Gleich and
Ahmed Sameh and
Ananth Grama Adaptive Erasure Coded Fault Tolerant
Linear System Solver . . . . . . . . . . 21:1--21:19
Prasad Jayanti and
Siddhartha Jayanti Deterministic Constant-Amortized-RMR
Abortable Mutex for CC and DSM . . . . . 22:1--22:26
Matthew Leinhauser and
René Widera and
Sergei Bastrakov and
Alexander Debus and
Michael Bussmann and
Sunita Chandrasekaran Metrics and Design of an Instruction
Roofline Model for AMD GPUs . . . . . . 1:1--1:14
Michael Axtmann and
Sascha Witt and
Daniel Ferizovic and
Peter Sanders Engineering In-place (Shared-memory)
Sorting Algorithms . . . . . . . . . . . 2:1--2:62
Rory Mitchell and
Daniel Stokes and
Eibe Frank and
Geoffrey Holmes Bandwidth-Optimal Random Shuffling for
GPUs . . . . . . . . . . . . . . . . . . 3:1--3:20
Rabab Alomairy and
Wael Bader and
Hatem Ltaief and
Youssef Mesri and
David Keyes High-performance 3D Unstructured Mesh
Deformation Using Rank Structured Matrix
Computations . . . . . . . . . . . . . . 4:1--4:23
Gal Milman-Sela and
Alex Kogan and
Yossi Lev and
Victor Luchangco and
Erez Petrank BQ: a Lock-Free Queue with Batching . . 5:1--5:49
Arik Rinberg and
Alexander Spiegelman and
Edward Bortnikov and
Eshcar Hillel and
Idit Keidar and
Lee Rhodes and
Hadar Serviansky Fast Concurrent Data Sketches . . . . . 6:1--6:35
Guy Blelloch and
Daniel Ferizovic and
Yihan Sun Joinable Parallel Balanced Binary Trees 7:1--7:41
Yuedan Chen and
Guoqing Xiao and
Kenli Li and
Francesco Piccialli and
Albert Y. Zomaya fgSpMSpV: a Fine-grained Parallel SpMSpV
Framework on HPC Platforms . . . . . . . 8:1--8:29
Sixue Cliff Liu and
Robert Endre Tarjan Simple Concurrent Connected Components
Algorithms . . . . . . . . . . . . . . . 9:1--9:26
Ghadeer Alabandi and
Martin Burtscher Improving the Speed and Quality of
Parallel Graph Coloring . . . . . . . . 10:1--10:35
Hung K. Nguyen and
Xuan-Tu Tran Design and Implementation of a
Coarse-grained Dynamically
Reconfigurable Multimedia Accelerator 11:1--11:23
M. A. Anju and
Rupesh Nasre Multi-Interval DomLock: Toward Improving
Concurrency in Hierarchies . . . . . . . 12:1--12:27
Yuede Ji and
Hang Liu and
Yang Hu and
H. Howie Huang iSpan: Parallel Identification of
Strongly Connected Components with
Spanning Trees . . . . . . . . . . . . . 13:1--13:27
Anne Benoit and
Luca Perotin and
Yves Robert and
Hongyang Sun Checkpointing Workflows à la
Young/Daly Is Not Good Enough . . . . . 14:1--14:??
Susanne Albers and
Jens Quedenfeld Optimal Algorithms for Right-sizing Data
Centers . . . . . . . . . . . . . . . . 15:1--15:??
Giorgos Kappes and
Stergios V. Anastasiadis A Family of Relaxed Concurrent Queues
for Low-Latency Operations and Item
Transfers . . . . . . . . . . . . . . . 16:1--16:??
Prasannabalaji Sundaram and
Aditi Sengupta and
Vajjala K. Suman and
Tapan K. Sengupta Non-overlapping High-accuracy Parallel
Closure for Compact Schemes: Application
in Multiphysics and Complex Geometry . . 1:1--1:??
Shelby Lockhart and
Amanda Bienz and
William Gropp and
Luke Olson Performance Analysis and Optimal
Node-aware Communication for Enlarged
Conjugate Gradient Methods . . . . . . . 2:1--2:??
Peter Munch and
Timo Heister and
Laura Prieto Saavedra and
Martin Kronbichler Efficient Distributed Matrix-free
Multigrid Methods on Locally Refined
Meshes for FEM Computations . . . . . . 3:1--3:??
Christoph Klein and
Robert Strzodka Tridigpu: a GPU Library for Block
Tridiagonal and Banded Linear Equation
Systems . . . . . . . . . . . . . . . . 4:1--4:??
Kartik Lakhotia and
Rajgopal Kannan and
Viktor Prasanna Parallel Peeling of Bipartite Networks
for Hierarchical Dense Subgraph
Discovery . . . . . . . . . . . . . . . 5:1--5:??
Weijian Zheng and
Dali Wang and
Fengguang Song A Distributed-GPU Deep Reinforcement
Learning System for Solving Large Graph
Optimization Problems . . . . . . . . . 6:1--6:??
Andrew D. Brown and
Jonathan R. Beaumont and
David B. Thomas and
Julian C. Shillcock and
Matthew F. Naylor and
Graeme M. Bragg and
Mark L. Vousden and
Simon W. Moore and
Shane T. Fleming POETS: an Event-driven Approach to
Dissipative Particle Dynamics:
Implementing a Massively
Compute-intensive Problem on a Novel
Hard/Software Architecture . . . . . . . 7:1--7:??
Valentina Aleeva and
Rifkhat Aleev Investigation and Implementation of
Parallelism Resources of Numerical
Algorithms . . . . . . . . . . . . . . . 8:1--8:??
Haotian Wang and
Wangdong Yang and
Renqiu Ouyang and
Rong Hu and
Kenli Li and
Keqin Li A Heterogeneous Parallel Computing
Approach Optimizing SpTTM on CPU-GPU via
GCN . . . . . . . . . . . . . . . . . . 9:1--9:??
Zheng Miao and
Jon C. Calhoun and
Rong Ge and
Jiajia Li Performance Implication of Tensor
Irregularity and Optimization for
Distributed Tensor Decomposition . . . . 10:1--10:??
Wim H. Hesselink and
Peter A. Buhr MCSH, a Lock with the Standard Interface 11:1--11:??
Hadi Zamani and
Laxmi Bhuyan and
Jieyang Chen and
Zizhong Chen GreenMD: Energy-efficient Matrix
Decomposition on Heterogeneous Multi-GPU
Systems . . . . . . . . . . . . . . . . 12:1--12:??
Aleksandar Kamenev and
Dariusz R. Kowalski and
Miguel A. Mosteiro Faster Supervised Average Consensus in
Adversarial and Stochastic Anonymous
Dynamic Networks . . . . . . . . . . . . 13:1--13:??
Sanjoy Baruah and
Alberto Marchetti-Spaccamela The Computational Complexity of
Feasibility Analysis for Conditional DAG
Tasks . . . . . . . . . . . . . . . . . 14:1--14:??
Jovan Blanusa and
Kubilay Atasu and
Paolo Ienne Fast Parallel Algorithms for Enumeration
of Simple, Temporal, and Hop-constrained
Cycles . . . . . . . . . . . . . . . . . 15:1--15:??
Andreas Alvermann and
Georg Hager and
Holger Fehske Orthogonal Layers of Parallelism in
Large-Scale Eigenvalue Computations . . 16:1--16:??
Yossi Azar and
Julian Shun Introduction to the Special Issue for
SPAA'21 . . . . . . . . . . . . . . . . 17:1--17:??
Daniel Anderson and
Guy E. Blelloch Parallel Minimum Cuts in $ O(m \log^2 n)
$ Work and Low Depth . . . . . . . . . . 18:1--18:??
Sungjin Im and
Ravi Kumar and
Mahshid Montazer Qaem and
Manish Purohit Non-clairvoyant Scheduling with
Predictions . . . . . . . . . . . . . . 19:1--19:??
Susanne Albers and
Jens Quedenfeld Algorithms for Right-sizing
Heterogeneous Data Centers . . . . . . . 20:1--20:??
Yannic Maus Distributed Graph Coloring Made Easy . . 21:1--21:??
Zafar Ahmad and
Rezaul Chowdhury and
Rathish Das and
Pramod Ganapathi and
Aaron Gregory and
Yimin Zhu A Fast Algorithm for Aperiodic Linear
Stencil Computation using Fast Fourier
Transforms . . . . . . . . . . . . . . . 22:1--22:??
Anne Benoit and
Lucas Perotin and
Yves Robert and
Frédéric Vivien Checkpointing Strategies to Tolerate
Non-Memoryless Failures on HPC Platforms 1:1--1:??
Lucas Perotin and
Hongyang Sun Improved Online Scheduling of Moldable
Task Graphs under Common Speedup Models 2:1--2:??
Shengle Lin and
Wangdong Yang and
Yikun Hu and
Qinyun Cai and
Minlu Dai and
Haotian Wang and
Kenli Li HPS Cholesky: Hierarchical Parallelized
Supernodal Cholesky with Adaptive
Parameters . . . . . . . . . . . . . . . 3:1--3:??
Romolo Marotta and
Mauro Ianni and
Alessandro Pellegrini and
Francesco Quaglia A Conflict-Resilient Lock-Free
Linearizable Calendar Queue . . . . . . 4:1--4:??
Stefan K. Muller and
Jan Hoffmann Modeling and Analyzing Evaluation Cost
of CUDA Kernels . . . . . . . . . . . . 5:1--5:??
Qinyun Cai and
Guoqing Xiao and
Shengle Lin and
Wangdong Yang and
Keqin Li and
Kenli Li ABSS: an Adaptive Batch-Stream
Scheduling Module for Dynamic Task
Parallelism on Chiplet-based Multi-Chip
Systems . . . . . . . . . . . . . . . . 6:1--6:??
Qiang Fu and
Yuede Ji and
Thomas Rolinger and
H. Howie Huang TLPGNN: a Lightweight Two-level
Parallelism Paradigm for Graph Neural
Network Computation on Single and
Multiple GPUs . . . . . . . . . . . . . 7:1--7:??
Zixuan Li and
Yunchuan Qin and
Qi Xiao and
Wangdong Yang and
Kenli Li cuFasterTucker: a Stochastic
Optimization Strategy for Parallel
Sparse FastTucker Decomposition on GPU
Platform . . . . . . . . . . . . . . . . 8:1--8:??
Sébastien Darche and
Michel R. Dagenais Low-Overhead Trace Collection and
Profiling on GPU Compute Kernels . . . . 9:1--9:??
Ziyang Li and
Dongsheng Li and
Yingwen Chen and
Kai Chen and
Yiming Zhang Decentralized Scheduling for
Data-Parallel Tasks in the Cloud . . . . 10:1--10:??
Guoqing Xiao and
Tao Zhou and
Yuedan Chen and
Yikun Hu and
Kenli Li Machine Learning-Based Kernel Selector
for SpMV Optimization in Graph Analysis 11:1--11:??
Zixuan Li and
Yikun Hu and
Mengquan Li and
Wangdong Yang and
Kenli Li cuFastTucker: a Novel Sparse FastTucker
Decomposition For HHLST on Multi-GPUs 12:1--12:??
Yiqian Liu and
Noushin Azami and
Avery Vanausdal and
Martin Burtscher Indigo3: a Parallel Graph Analytics
Benchmark Suite for Exploring
Implementation Styles and Common Bugs 13:1--13:??
Johan Bontes and
James Gain Redzone stream compaction: removing $k$
items from a list in parallel $ O(k) $
time . . . . . . . . . . . . . . . . . . 14:1--14:??
Cu Cui Acceleration of Tensor-Product
Operations with Tensor Cores . . . . . . 15:1--15:??
Wim H. Hesselink and
Peter A. Buhr and
Colby A. Parsons First-Come-First-Served as a Separate
Principle . . . . . . . . . . . . . . . 16:1--16:??
Johannes Pahlke and
Ivo F. Sbalzarini Proven Distributed Memory
Parallelization of Particle Methods . . 17:1--17:??
Hermann Bogning Tepiele and
Vianney Kengne Tchendji and
Mathias Akong Onabid and
Jean Frédéric Myoupo and
Armel Nkonjoh Ngomade Dominant Point-Based Sequential and
Parallel Algorithms for the Multiple
Sequential Substring Constrained-LCS
Problem . . . . . . . . . . . . . . . . 18:1--18:??
Minh Pham and
Yongke Yuan and
Hao Li and
Chengcheng Mou and
Yicheng Tu and
Zichen Xu and
Jinghan Meng Dynamic Buffer Management in Massively
Parallel Systems: The Power of
Randomness . . . . . . . . . . . . . . . 1:1--1:??
Chun-Yu Wu and
Chih-Chieh Tu and
Kai-Jung Cheng and
Che-Rung Lee EITHOT: Efficient In-place Transposition
of High Order Tensors on GPUs . . . . . 2:1--2:??
Youguang Chen and
William Ruys and
George Biros KNN-DBSCAN: a DBSCAN in high dimensions 3:1--3:??
Alexander Lindermayr and
Nicole Megow Permutation Predictions for
Non-Clairvoyant Scheduling . . . . . . . 4:1--4:??
Dmitrii Tolmachev and
Philippe Marti and
Giacomo Castiglioni and
Andrew Jackson High Performance Solution of Tridiagonal
Systems on the GPU . . . . . . . . . . . 5:1--5:25
Hussam Al Daas and
Grey Ballard and
Laura Grigori and
Suraj Kumar and
Kathryn Rouse and
Mathieu Verite Communication Lower Bounds and Optimal
Algorithms for Symmetric Matrix
Computations . . . . . . . . . . . . . . 6:1--6:??