Last update: Fri Sep 13 15:22:55 MDT 2024
Volume 1, Number 1, September, 2014
Phillip B. Gibbons ACM Transactions on Parallel Computing: an introduction . . . . . . . . . . . . 1:1--1:?? David J. Lilja Introduction . . . . . . . . . . . . . . 2:1--2:?? Ashay Rane and James Browne Enhancing Performance Optimization of Multicore/Multichip Nodes with Data Structure Metrics . . . . . . . . . . . 3:1--3:?? Víctor Jiménez and Francisco J. Cazorla and Roberto Gioiosa and Alper Buyuktosunoglu and Pradip Bose and Francis P. O'Connell and Bruce G. Mealey Adaptive Prefetching on POWER7: Improving Performance and Power Consumption . . . . . . . . . . . . . . 4:1--4:?? Timothy Heil and Anil Krishna and Nicholas Lindberg and Farnaz Toussi and Steven Vanderwiel Architecture and Performance of the Hardware Accelerators in IBM's PowerEN Processor . . . . . . . . . . . . . . . 5:1--5:?? Xing Wu and Frank Mueller and Scott Pakin A methodology for automatic generation of executable communication specifications from parallel MPI applications . . . . . . . . . . . . . . 6:1--6:?? Mahesh Ravishankar and John Eisenlohr and Louis-Noël Pouchet and J. Ramanujam and Atanas Rountev and P. Sadayappan Automatic parallelization of a class of irregular loops for distributed memory systems . . . . . . . . . . . . . . . . 7:1--7:?? Julian Shun and Guy E. Blelloch A simple parallel Cartesian tree algorithm and its application to parallel suffix tree construction . . . 8:1--8:??
Keshav Pingali and J. Ramanujam and P. Sadayappan Introduction to the Special Issue on PPoPP'12 . . . . . . . . . . . . . . . . 9:1--9:?? Aurelien Bouteiller and Thomas Herault and George Bosilca and Peng Du and Jack Dongarra Algorithm-Based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy . . . . . . . . . 10:1--10:?? Grey Ballard and James Demmel and Nicholas Knight Avoiding Communication in Successive Band Reduction . . . . . . . . . . . . . 11:1--11:?? Paul Sack and William Gropp Collective Algorithms for Multiported Torus Networks . . . . . . . . . . . . . 12:1--12:?? David Dice and Virendra J. Marathe and Nir Shavit Lock Cohorting: a General Technique for Designing NUMA Locks . . . . . . . . . . 13:1--13:?? Duane Merrill and Michael Garland and Andrew Grimshaw High-Performance and Scalable GPU Graph Traversal . . . . . . . . . . . . . . . 14:1--14:?? Stephan C. Kramer and Johannes Hagemann SciPAL: Expression Templates and Composition Closure Objects for High Performance Computational Physics with CUDA and OpenMP . . . . . . . . . . . . 15:1--15:?? Ehsan Totoni and Nikhil Jain and Laxmikant V. Kale Power Management of Extreme-Scale Networks with On/Off Links in Runtime Systems . . . . . . . . . . . . . . . . 16:1--16:??
Maurice Herlihy Guest Editor Introduction . . . . . . . 1:1--1:?? Bastian Degener and Barbara Kempkes and Peter Kling and Friedhelm Meyer auf der Heide Linear and Competitive Strategies for Continuous Robot Formation Problems . . 2:1--2:?? Navendu Jain and Ishai Menache and Joseph (Seffi) Naor and Jonathan Yaniv Near-Optimal Scheduling Mechanisms for Deadline-Sensitive Jobs in Large Computing Clusters . . . . . . . . . . . 3:1--3:?? Moran Feldman and Liane Lewin-Eytan and Joseph (Seffi) Naor Hedonic Clustering Games . . . . . . . . 4:1--4:?? Janmartin Jahn and Santiago Pagani and Sebastian Kobbe and Jian-Jia Chen and Jörg Henkel Runtime Resource Allocation for Software Pipelines . . . . . . . . . . . . . . . 5:1--5:?? Yi Xu and Bo Zhao and Youtao Zhang and Jun Yang Simple Virtual Channel Allocation for High-Throughput and High-Frequency On-Chip Routers . . . . . . . . . . . . 6:1--6:?? Adam Hammouda and Andrew R. Siegel and Stephen F. Siegel Noise-Tolerant Explicit Stencil Computations for Nonuniform Process Execution Rates . . . . . . . . . . . . 7:1--7:?? Ciaran McCreesh and Patrick Prosser The Shape of the Search Tree for the Maximum Clique Problem and the Implications for Parallel Branch and Bound . . . . . . . . . . . . . . . . . 8:1--8:??
Torsten Hoefler and James Dinan and Rajeev Thakur and Brian Barrett and Pavan Balaji and William Gropp and Keith Underwood Remote Memory Access Programming in MPI-3 . . . . . . . . . . . . . . . . . 9:1--9:?? Walther Maldonado and Patrick Marlier and Pascal Felber and Julia Lawall and Gilles Muller and Etienne Rivière Supporting Time-Based QoS Requirements in Software Transactional Memory . . . . 10:1--10:?? Gokcen Kestor and Osman S. Unsal and Adrian Cristal and Serdar Tasiran TRADE: Precise Dynamic Race Detection for Scalable Transactional Memory Systems . . . . . . . . . . . . . . . . 11:1--11:?? Nuno Diegues and Paolo Romano Time-Warp: Efficient Abort Reduction in Transactional Memory . . . . . . . . . . 12:1--12:?? Lionel Eyraud-Dubois and Loris Marchal and Oliver Sinnen and Frédéric Vivien Parallel Scheduling of Task Trees with Limited Memory . . . . . . . . . . . . . 13:1--13:??
Michael Dinitz and Torsten Hoefler Introduction to the Special Issue on SPAA 2013 . . . . . . . . . . . . . . . 14:1--14:?? Ravi Kumar and Benjamin Moseley and Sergei Vassilvitskii and Andrea Vattani Fast Greedy Algorithms in MapReduce and Streaming . . . . . . . . . . . . . . . 14:1--14:?? Peter Sanders and Jochen Speck and Raoul Steffen Work-Efficient Matrix Inversion in Polylogarithmic Time . . . . . . . . . . 15:1--15:?? Seth Gilbert and Chaodong Zheng SybilCast: Broadcast on the Open Airwaves . . . . . . . . . . . . . . . . 16:1--16:?? I-Ting Angelina Lee and Charles E. Leiserson and Tao B. Schardl and Zhunping Zhang and Jim Sukha On-the-Fly Pipeline Parallelism . . . . 17:1--17:?? Martina Eikel and Christian Scheideler IRIS: a Robust Information System Against Insider DoS Attacks . . . . . . 18:1--18:?? Peter Kling and Peter Pietrzyk Profitable Scheduling on Multiple Speed-Scalable Processors . . . . . . . 19:1--19:?? Chinmoy Dutta and Gopal Pandurangan and Rajmohan Rajaraman and Scott Roche Coalescing-Branching Random Walks on Graphs . . . . . . . . . . . . . . . . . 20:1--20:??
James Larus and Sandhya Dwarkadas and José Moreira and Andrew Lumsdaine Introduction to the Special Issue on PPoPP'14 . . . . . . . . . . . . . . . . 21:1--21:?? Maurice Herlihy and Zhiyu Liu Well-Structured Futures and Cache Locality . . . . . . . . . . . . . . . . 22:1--22:?? Paul Thomson and Alastair F. Donaldson and Adam Betts Concurrency Testing Using Controlled Schedulers: an Empirical Study . . . . . 23:1--23:?? Darko Petrović and Thomas Ropars and André Schiper Leveraging Hardware Message Passing for Efficient Thread Synchronization . . . . 24:1--24:?? Olivier Tardieu and Benjamin Herta and David Cunningham and David Grove and Prabhanjan Kambadur and Vijay Saraswat and Avraham Shinnar and Mikio Takeuchi and Mandana Vaziri and Wei Zhang X10 and APGAS at Petascale . . . . . . . 25:1--25:?? Saeed Maleki and Madanlal Musuvathi and Todd Mytkowicz Low-Rank Methods for Parallelizing Dynamic Programming Algorithms . . . . . 26:1--26:?? Xin Yuan and Wickus Nienaber and Santosh Mahapatra On Folded-Clos Networks with Deterministic Single-Path Routing . . . 27:1--27:?? Edans F. De O. Sandes and Guillermo Miranda and Xavier Martorell and Eduard Ayguade and George Teodoro and Alba C. M. A. De Melo MASA: a Multiplatform Architecture for Sequence Aligners with Block Pruning . . 28:1--28:??
Friedhelm Meyer auf der Heide and Peter Sanders and Nodari Sitchinava Introduction to the Special Issue on SPAA 2014 . . . . . . . . . . . . . . . 1:1--1:?? Tim Kaler and William Hasenplaugh and Tao B. Schardl and Charles E. Leiserson Executing Dynamic Data-Graph Computations Deterministically Using Chromatic Scheduling . . . . . . . . . . 2:1--2:?? Sungjin Im and Benjamin Moseley and Kirk Pruhs and Eric Torng Competitively Scheduling Tasks with Intermediate Parallelizability . . . . . 4:1--4:?? Ioana O. Bercea and Navin Goyal and David G. Harris and Aravind Srinivasan On Computing Maximal Independent Sets of Hypergraphs in Parallel . . . . . . . . 5:1--5:?? Davide Bilò and Luciano Gualà and Stefano Leucci and Guido Proietti Locality-Based Network Creation Games 6:1--6:?? Jiayang Jiang and Michael Mitzenmacher and Justin Thaler Parallel Peeling Algorithms . . . . . . 7:1--7:?? Harsha Vardhan Simhadri and Guy E. Blelloch and Jeremy T. Fineman and Phillip B. Gibbons and Aapo Kyrola Experimental Analysis of Space-Bounded Schedulers . . . . . . . . . . . . . . . 8:1--8:??
Hafiz Fahad Sheikh and Ishfaq Ahmad Sixteen Heuristics for Joint Optimization of Performance, Energy, and Temperature in Allocating Tasks to Multi-Cores . . . . . . . . . . . . . . 9:1--9:?? Jeffrey D. Blanchard and Erik Opavsky and Emircan Uysaler Selecting Multiple Order Statistics with a Graphics Processing Unit . . . . . . . 10:1--10:?? David Böhme and Markus Geimer and Lukas Arnold and Felix Voigtlaender and Felix Wolf Identifying the Root Causes of Wait States in Large-Scale Parallel Applications . . . . . . . . . . . . . . 11:1--11:?? Roshan Dathathri and Ravi Teja Mullapudi and Uday Bondhugula Compiling Affine Loop Nests for a Dynamic Scheduling Runtime on Shared and Distributed Memory . . . . . . . . . . . 12:1--12:?? Anne Benoit and Aurélien Cavelan and Yves Robert and Hongyang Sun Assessing General-Purpose Algorithms to Cope with Fail-Stop and Silent Errors 13:1--13:?? Ioannis Koutis and Shen Chen Xu Simple Parallel and Distributed Algorithms for Spectral Graph Sparsification . . . . . . . . . . . . . 14:1--14:??
Matthieu Dorier and Gabriel Antoniu and Franck Cappello and Marc Snir and Robert Sisneros and Orcun Yildiz and Shadi Ibrahim and Tom Peterka and Leigh Orf Damaris: Addressing Performance Variability in Data Management for Post-Petascale Simulations . . . . . . . 15:1--15:?? Jiaquan Gao and Yu Wang and Jun Wang and Ronghua Liang Adaptive Optimization Modeling of Preconditioned Conjugate Gradient on Multi-GPUs . . . . . . . . . . . . . . . 16:1--16:?? Timothy Creech and Rajeev Barua Transparently Space Sharing a Multicore Among Multiple Processes . . . . . . . . 17:1--17:?? Grey Ballard and Alex Druinsky and Nicholas Knight and Oded Schwartz Hypergraph Partitioning for Sparse Matrix--Matrix Multiplication . . . . . 18:1--18:??
David Grove Introduction to the Special Section on PPoPP'15 . . . . . . . . . . . . . . . . 19:1--19:?? Zoltan Majo and Thomas R. Gross A Library for Portable and Composable Data Locality Optimizations for NUMA Systems . . . . . . . . . . . . . . . . 20:1--20:?? Guy Golan-Gueta and G. Ramalingam and Mooly Sagiv and Eran Yahav Automatic Scalable Atomicity via Semantic Locking . . . . . . . . . . . . 21:1--21:?? Joseph Izraelevitz and Michael L. Scott Generality and Speed in Nonblocking Dual Containers . . . . . . . . . . . . . . . 22:1--22:?? Richard Cole and Vijaya Ramachandran Resource Oblivious Sorting on Multicores 23:1--23:??
Grey Ballard and Mary Hall and Tim Harris and Brandon Lucia Guest Editor Introduction PPoPP 2016, Special Issue 2 of 2 . . . . . . . . . . 1:1--1:?? Saman Ashkiani and Andrew Davidson and Ulrich Meyer and John D. Owens GPU Multisplit: an Extended Study of a Parallel Algorithm . . . . . . . . . . . 2:1--2:?? Yangzihao Wang and Yuechao Pan and Andrew Davidson and Yuduo Wu and Carl Yang and Leyuan Wang and Muhammad Osama and Chenshan Yuan and Weitang Liu and Andy T. Riffel and John D. Owens Gunrock: GPU Graph Analytics . . . . . . 3:1--3:?? Rezaul Chowdhury and Pramod Ganapathi and Stephen Tschudi and Jesmin Jahan Tithi and Charles Bachmeier and Charles E. Leiserson and Armando Solar-Lezama and Bradley C. Kuszmaul and Yuan Tang Autogen: Automatic Discovery of Efficient Recursive Divide-&-Conquer Algorithms for Solving Dynamic Programming Problems . . . . . . . . . . 4:1--4:?? Guy L. Steele Jr. and Jean-Baptiste Tristan Adding Approximate Counters . . . . . . 5:1--5:??
Grey Ballard and Mary Hall and Tim Harris and Brandon Lucia Guest Editor Introduction PPoPP 2016, Special Issue 2 of 2 . . . . . . . . . . 6:1--6:?? Saurabh Kalikar and Rupesh Nasre DomLock: a New Multi-Granularity Locking Technique for Hierarchies . . . . . . . 7:1--7:?? Syed Kamran Haider and William Hasenplaugh and Dan Alistarh Lease/Release: Architectural Support for Scaling Contended Data Structures . . . 8:1--8:?? Man Cao and Minjia Zhang and Aritra Sengupta and Swarnendu Biswas and Michael D. Bond Hybridizing and Relaxing Dependence Tracking for Efficient Parallel Runtime Support . . . . . . . . . . . . . . . . 9:1--9:?? Georgios Chatzopoulos and Aleksandar Dragojević and Rachid Guerraoui ESTIMA: Extrapolating ScalabiliTy of In-Memory Applications . . . . . . . . . 10:1--10:?? Vincenzo Gulisano and Yiannis Nikolakopoulos and Daniel Cederman and Marina Papatriantafilou and Philippas Tsigas Efficient Data Streaming Multiway Aggregation through Concurrent Algorithmic Designs and New Abstract Data Types . . . . . . . . . . . . . . . 11:1--11:??
Tareq M. Malas and Georg Hager and Hatem Ltaief and David E. Keyes Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations . . . . . . . . . . 12:1--12:?? Kadir Akbudak and Oguz Selvitopi and Cevdet Aykanat Partitioning Models for Scaling Parallel Sparse Matrix--Matrix Multiplication . . 13:1--13:?? Gianfranco Bilardi and Michele Scquizzato and Francesco Silvestri A Lower Bound Technique for Communication in BSP . . . . . . . . . . 14:1--14:?? Semih Sahin and Bugra Gedik C-Stream: a Co-routine-Based Elastic Stream Processing Engine . . . . . . . . 15:1--15:??
Kunal Agrawal and I-Ting Angelina Lee and Michael Spear Introduction to Special Issue on SPAA'15 16:1--16:?? Kook Jin Ahn and Sudipto Guha Access to Data and Number of Iterations: Dual Primal Algorithms for Maximum Matching under Resource Constraints . . 17:1--17:?? Dan Alistarh and William Leiserson and Alexander Matveev and Nir Shavit ThreadScan: Automatic and Scalable Memory Reclamation . . . . . . . . . . . 18:1--18:?? Dimitar Dimitrov and Martin Vechev and Vivek Sarkar Race Detection in Two Dimensions . . . . 19:1--19:?? I-Ting Angelina Lee and Tao B. Schardl Efficient Race Detection for Reducer Hyperobjects . . . . . . . . . . . . . . 20:1--20:??
Seth Gilbert Introduction to the Special Issue for SPAA 2016 . . . . . . . . . . . . . . . 1:1--1:?? Michael Mitzenmacher and Rajmohan Rajaraman and Scott Roche Better Bounds for Coalescing-Branching Random Walks . . . . . . . . . . . . . . 2:1--2:?? Mingmou Liu and Xiaoyin Pan and Yitong Yin Randomized Approximate Nearest Neighbor Search with Limited Adaptivity . . . . . 3:1--3:?? Gopal Pandurangan and Peter Robinson and Michele Scquizzato Fast Distributed Algorithms for Connectivity and MST in Large Graphs . . 4:1--4:?? Madhukar Korupolu and Rajmohan Rajaraman Robust and Probabilistic Failure-Aware Placement . . . . . . . . . . . . . . . 5:1--5:?? Deli Zhang and Pierre Laborde and Lance Lebanoff and Damian Dechev Lock-Free Transactional Transformation for Linked Data Structures . . . . . . . 6:1--6:??
Michel Müller and Takayuki Aoki New High Performance GPGPU Code Transformation Framework Applied to Large Production Weather Prediction Code 7:1--7:?? Martin Burtscher and Sindhu Devale and Sahar Azimi and Jayadharini Jaiganesh and Evan Powers A High-Quality and Fast Maximal Independent Set Implementation for GPUs 8:1--8:?? Junhong Liu and Guangming Tan and Yulong Luo and Jiajia Li and Zeyao Mo and Ninghui Sun An Autotuning Protocol to Rapidly Build Autotuners . . . . . . . . . . . . . . . 9:1--9:?? Antonio Fernández Anta and Dariusz R. Kowalski and Miguel A. Mosteiro and Prudence W. H. Wong Scheduling Dynamic Parallel Workload of Mobile Devices with Access Guarantees 10:1--10:??
Amirhossein Mirhosseini and Mohammad Sadrosadati and Fatemeh Aghamohammadi and Mehdi Modarressi and Hamid Sarbazi-Azad BARAN: Bimodal Adaptive Reconfigurable-Allocator Network-on-Chip 11:1--11:?? Abdelhalim Amer and Huiwei Lu and Pavan Balaji and Milind Chabbi and Yanjie Wei and Jeff Hammond and Satoshi Matsuoka Lock Contention Management in Multithreaded MPI . . . . . . . . . . . 12:1--12:?? Rong Chen and Jiaxin Shi and Yanzhe Chen and Binyu Zang and Haibing Guan and Haibo Chen PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs . . . . . . . . . . . . . . . . . 13:1--13:??
Alex Aravind and Wim H. Hesselink Group Mutual Exclusion by Fetch-and-increment . . . . . . . . . . 14:1--14:?? Babak Behzad and Surendra Byna and Prabhat and Marc Snir Optimizing I/O Performance of HPC Applications with Autotuning . . . . . . 15:1--15:?? Tobias Maier and Peter Sanders and Roman Dementiev Concurrent Hash Tables: Fast and General(?)! . . . . . . . . . . . . . . 16:1--16:?? Eduardo H. M. Cruz and Matthias Diener and Laércio L. Pilla and Philippe O. A. Navaux EagerMap: a Task Mapping Algorithm to Improve Communication and Load Balancing in Clusters of Multicore Systems . . . . 17:1--17:??
David A. Bader Editorial from the Editor-in-Chief . . . 1:1--1:?? Martin Kronbichler and Karl Ljungkvist Multigrid for Matrix-Free High-Order Finite Element Computations on Graphics Processors . . . . . . . . . . . . . . . 2:1--2:?? Vincenzo Bonifaci and Andreas Wiese and Sanjoy K. Baruah and Alberto Marchetti-Spaccamela and Sebastian Stiller and Leen Stougie A Generalized Parallel Task Model for Recurrent Real-Time Processes . . . . . 3:1--3:?? Hatem Ltaief and Dalal Sukkari and Aniello Esposito and Yuji Nakatsukasa and David Keyes Massively Parallel Polar Decomposition on Distributed-memory Systems . . . . . 4:1--4:?? Dibakar Saha and Koushik Sinha Optimal Schedule for All-to-All Personalized Communication in Multiprocessor Systems . . . . . . . . . 5:1--5:??
Sarunya Pumma and Min Si and Wu-Chun Feng and Pavan Balaji Scalable Deep Learning via I/O Analysis and Optimization . . . . . . . . . . . . 6:1--6:?? Guillaume Aupy and Ana Gainaru and Valentin Le Fèvre I/O Scheduling Strategy for Periodic Applications . . . . . . . . . . . . . . 7:1--7:?? Dimitri Kagaris and Sourav Dutta Scheduling Mutual Exclusion Accesses in Equal-Length Jobs . . . . . . . . . . . 8:1--8:?? Md Atiqul Mollah and Wenqi Wang and Peyman Faizian and MD Shafayat Rahman and Xin Yuan and Scott Pakin and Michael Lang Modeling Universal Globally Adaptive Load-Balanced Routing . . . . . . . . . 9:1--9:??
Mohammed Hossein Bateni and Mohammad T. Hajiaghayi and Silvio Lattanzi Introduction to the Special Issue for SPAA'17 . . . . . . . . . . . . . . . . 10:1--10:?? Sudipto Guha and Yi Li and Qin Zhang Distributed Partial Clustering . . . . . 11:1--11:?? Pierre Fraigniaud and Dennis Olivetti Distributed Detection of Cycles . . . . 12:1--12:?? Susanne Albers On Energy Conservation in Data Centers 13:1--13:?? Björn Feldkord and Friedhelm Meyer auf der Heide The Mobile Server Problem . . . . . . . 14:1--14:?? Yossi Azar and Danny Vainstein Tight Bounds for Clairvoyant Dynamic Bin Packing . . . . . . . . . . . . . . . . 15:1--15:?? Colin Cooper and Tomasz Radzik and Nicolas Rivera New Cover Time Bounds for the Coalescing-Branching Random Walk on Graphs . . . . . . . . . . . . . . . . . 16:1--16:?? He Sun and Luca Zanetti Distributed Graph Clustering and Sparsification . . . . . . . . . . . . . 17:1--17:?? Shahbaz Khan Near Optimal Parallel Algorithms for Dynamic DFS in Undirected Graphs . . . . 18:1--18:??
Lawrence Rauchwerger and Jaejin Lee and Armando Solar-Lezama and Guy Steele Introduction to the Special Issue on PPoPP 2017 (Part 1) . . . . . . . . . . 19:1--19:?? Tao B. Schardl and William S. Moses and Charles E. Leiserson Tapir: Embedding Recursive Fork-join Parallelism into LLVM's Intermediate Representation . . . . . . . . . . . . . 19:1--19:?? Robert Utterback and Kunal Agrawal and I-Ting Angelina Lee and Milind Kulkarni Processor-Oblivious Record and Replay 20:1--20:?? Tsung Tai Yeh and Amit Sabne and Putt Sakdhnagool and Rudolf Eigenmann and Timothy G. Rogers Pagoda: a GPU Runtime System for Narrow Tasks . . . . . . . . . . . . . . . . . 21:1--21:?? Guy L. Steele Jr. and Jean-Baptiste Tristan Using Butterfly-patterned Partial Sums to Draw from Discrete Distributions . . 22:1--22:?? Hans Vandierendonck and Dimitrios S. Nikolopoulos Hyperqueues: Design and Implementation of Deterministic Concurrent Queues . . . 23:1--23:?? Bin Ren and Shruthi Balakrishna and Youngjoon Jo and Sriram Krishnamoorthy and Kunal Agrawal and Milind Kulkarni Extracting SIMD Parallelism from Recursive Task-Parallel Programs . . . . 24:1--24:??
Antonino Tumeo and Fabrizio Petrini and John Feo and Mahantesh Halappanavar Introduction to the TOPC Special Issue on Innovations in Systems for Irregular Applications, Part 1 . . . . . . . . . . 1:1--1:2 Hartwig Anzt and Terry Cojean and Chen Yen-Chen and Jack Dongarra and Goran Flegar and Pratik Nayak and Stanimire Tomov and Yuhsiang M. Tsai and Weichung Wang Load-balancing Sparse Matrix Vector Product Kernels on GPUs . . . . . . . . 2:1--2:26 Ioan Hadade and Timothy M. Jones and Feng Wang and Luca di Mare Software Prefetching for Unstructured Mesh Applications . . . . . . . . . . . 3:1--3:23 Thomas Grützmacher and Terry Cojean and Goran Flegar and Hartwig Anzt and Enrique S. Quintana-Ortí Acceleration of PageRank with Customized Precision Based on Mantissa Segmentation 4:1--4:19 Apurba Das and Seyed-Vahid Sanei-Mehri and Srikanta Tirthapura Shared-memory Parallel Maximal Clique Enumeration from Static and Dynamic Graphs . . . . . . . . . . . . . . . . . 5:1--5:28 Kathleen E. Hamilton and Catherine D. Schuman and Steven R. Young and Ryan S. Bennink and Neena Imam and Travis S. Humble Accelerating Scientific Computing in the Post-Moore's Era . . . . . . . . . . . . 6:1--6:31 Kartik Lakhotia and Rajgopal Kannan and Sourav Pati and Viktor Prasanna GPOP: a Scalable Cache- and Memory-efficient Framework for Graph Processing over Parts . . . . . . . . . 7:1--7:24 Jeff Anderson and Engin Kayraklioglu and Shuai Sun and Joseph Crandall and Yousra Alkabani and Vikram Narayana and Volker Sorger and Tarek El-Ghazawi ROC: a Reconfigurable Optical Computer for Simulating Physical Processes . . . 8:1--8:29
Alireza Monemi and Farshad Khunjush and Maurizio Palesi and Hamid Sarbazi-Azad An Enhanced Dynamic Weighted Incremental Technique for QoS Support in NoC . . . . 9:1--9:31 Aravind Natarajan and Arunmoezhi Ramachandran and Neeraj Mittal FEAST: a Lightweight Lock-free Concurrent Binary Search Tree . . . . . 10:1--10:64 Ahmad Salah and Kenli Li and Qing Liao and Mervat Hashem and Zhiyong Li and Anthony T. Chronopoulos and Albert Y. Zomaya A Time-space Efficient Algorithm for Parallel k-way In-place Merging based on Sequence Partitioning and Perfect Shuffle . . . . . . . . . . . . . . . . 11:1--11:23 Shaohua Duan and Pradeep Subedi and Philip Davis and Keita Teranishi and Hemanth Kolla and Marc Gamell and Manish Parashar CoREC: Scalable and Resilient In-memory Data Staging for In-situ Workflows . . . 12:1--12:29 Maksudul Alam and Maleq Khan and Kalyan S. Perumalla and Madhav Marathe Generating Massive Scale-free Networks: Novel Parallel Algorithms using the Preferential Attachment Model . . . . . 13:1--13:35
Jaejin Lee and Lawrence Rauchwerger and Armando Solar-Lezama and Guy Steele Introduction to the Special Issue on PPoPP 2017 (Part 2) . . . . . . . . . . 14:1--14:2 Peng Jiang and Yang Xia and Gagan Agrawal Combining SIMD and Many/Multi-core Parallelism for Finite-state Machines with Enumerative Speculation . . . . . . 15:1--15:26 Dmitry Basin and Edward Bortnikov and Anastasia Braginsky and Guy Golan-Gueta and Eshcar Hillel and Idit Keidar and Moshe Sulamy KiWi: a Key--value Map for Scalable Real-time Analytics . . . . . . . . . . 16:1--16:28 Milind Chabbi and Abdelhalim Amer and Xu Liu Efficient Abortable-locking Protocol for Multi-level NUMA Systems: Design and Correctness . . . . . . . . . . . . . . 17:1--17:32 Tal Ben-Nun and Michael Sutton and Sreepathi Pai and Keshav Pingali Groute: Asynchronous Multi-GPU Programming Model with Applications to Large-scale Graph Processing . . . . . . 18:1--18:27 Christie Alappat and Achim Basermann and Alan R. Bishop and Holger Fehske and Georg Hager and Olaf Schenk and Jonas Thies and Gerhard Wellein A Recursive Algebraic Coloring Technique for Hardware-efficient Symmetric Sparse Matrix--vector Multiplication . . . . . 19:1--19:37 Denis Davydov and Martin Kronbichler Algorithms and Data Structures for Matrix-Free Finite Element Operators with MPI-Parallel Sparse Multi-Vectors 20:1--20:30
Tapan K. Sengupta and Prasannabalaji Sundaram and Vajjala K. Suman and Swagata Bhaumik A High Accuracy Preserving Parallel Algorithm for Compact Schemes for DNS 21:1--21:32 Karan Aggarwal and Uday Bondhugula Optimizing the Linear Fascicle Evaluation Algorithm for Multi-core and Many-core Systems . . . . . . . . . . . 22:1--22:45 Antonino Tumeo and Fabrizio Petrini and John Feo and Mahantesh Halappanavar Introduction to the TOPC Special Issue on Innovations in Systems for Irregular Applications, Part 2 . . . . . . . . . . 23:1--23:2 Naveen Namashivayam and Bill Long and Deepak Eachempati and Bob Cernohous and Mark Pagel A Modern Fortran Interface in OpenSHMEM Need for Interoperability with Parallel Fortran Using Coarrays . . . . . . . . . 24:1--24:25 Eric R. Hein and Srinivas Eswar and Abdurrahman Yasar and Jiajia Li and Jeffrey S. Young and Thomas M. Conte and Ümit V. Çatalyürek and Richard Vuduc and Jason Riedy and Bora Uçar Programming Strategies for Irregular Algorithms on the Emu Chick . . . . . . 25:1--25:25 John D. Leidel and Xi Wang and Brody Williams and Yong Chen Toward a Microarchitecture for Efficient Execution of Irregular Applications . . 26:1--26:24 Pietro Fezzardi and Fabrizio Ferrandi Automated Bug Detection for High-level Synthesis of Multi-threaded Irregular Applications . . . . . . . . . . . . . . 27:1--27:26
Lee Savoie and David K. Lowenthal and Bronis R. De Supinski and Kathryn Mohror and Nikhil Jain Mitigating Inter-Job Interference via Process-Level Quality-of-Service . . . . 1:1--1:26 Tahsin Reza and Hassan Halawa and Matei Ripeanu and Geoffrey Sanders and Roger A. Pearce Scalable Pattern Matching in Metadata Graphs via Constraint Checking . . . . . 2:1--2:45 Jeremy Fineman and Aydin Buluc and Seth Gilbert Introduction to the Special Issue for SPAA 2018: Part 1 . . . . . . . . . . . 3e:1--3e:1 Noga Alon and Yossi Azar and Mark Berlin The Price of Bounded Preemption . . . . 3:1--3:21 Laxman Dhulipala and Guy E. Blelloch and Julian Shun Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable . . 4:1--4:70 Haim Kaplan and Shay Solomon Dynamic Representations of Sparse Distributed Networks: a Locality-sensitive Approach . . . . . . 5:1--5:26
Jeremy Fineman and Aydin Buluc and Seth Gilbert Introduction to the Special Issue for SPAA 2018 --- Part 2 . . . . . . . . . . 6:1--6:1 Gopal Pandurangan and Peter Robinson and Michele Scquizzato On the Distributed Complexity of Large-Scale Graph Computations . . . . . 7:1--7:28 Barbara Geissmann and Lukas Gianinazzi Parallel Minimum Cuts in Near-linear Work and Low Depth . . . . . . . . . . . 8:1--8:20 Giorgio Lucarelli and Benjamin Moseley and Nguyen Kim Thang and Abhinav Srivastav and Denis Trystram Online Non-preemptive Scheduling on Unrelated Machines with Rejections . . . 9:1--9:22 Kjell Winblad and Konstantinos Sagonas and Bengt Jonsson Lock-free Contention Adapting Search Trees . . . . . . . . . . . . . . . . . 10:1--10:38 Oded Green HashGraph --- Scalable Hash Tables Using a Sparse Graph Data Structure . . . . . 11:1--11:17
Petra Berenbrink Introduction to the Special Issue for SPAA 2019 . . . . . . . . . . . . . . . 12:1--12:1 Soheil Behnezhad and Laxman Dhulipala and Hossein Esfandiari and Jakub Lacki and Vahab Mirrokni and Warren Schudy Massively Parallel Computation via Remote Memory Access . . . . . . . . . . 13:1--13:25 Faith Ellen and Barun Gorain and Avery Miller and Andrzej Pelc Constant-Length Labeling Schemes for Deterministic Radio Broadcast . . . . . 14:1--14:17 Michael A. Bender and Alex Conway and Martín Farach-Colton and William Jannen and Yizheng Jiao and Rob Johnson and Eric Knorr and Sara Mcallister and Nirjhar Mukherjee and Prashant Pandey and Donald E. Porter and Jun Yuan and Yang Zhan External-memory Dictionaries in the Affine and PDAM Models . . . . . . . . . 15:1--15:20 Matthias Maier and Martin Kronbichler Efficient Parallel 3D Computation of the Compressible Euler Equations with an Invariant-domain Preserving Second-order Finite-element Scheme . . . . . . . . . 16:1--16:30 James Edwards and Uzi Vishkin Study of Fine-grained Nested Parallelism in CDCL SAT Solvers . . . . . . . . . . 17:1--17:18
Laurent Feuilloley and Pierre Fraigniaud Randomized Local Network Computing: Derandomization Beyond Locally Checkable Labelings . . . . . . . . . . . . . . . 18:1--18:25 Saleh Khalaj Monfared and Omid Hajihassani and Vahid Mohsseni and Dara Rahmati and Saeid Gorgin A High-throughput Parallel Viterbi Algorithm via Bitslicing . . . . . . . . 19:1--19:25 Shao-Chung Wang and Lin-Ya Yu and Li-An Her and Yuan-Shin Hwang and Jenq-Kuen Lee Pointer-Based Divergence Analysis for OpenCL 2.0 Programs . . . . . . . . . . 20:1--20:23 Xuejiao Kang and David F. Gleich and Ahmed Sameh and Ananth Grama Adaptive Erasure Coded Fault Tolerant Linear System Solver . . . . . . . . . . 21:1--21:19 Prasad Jayanti and Siddhartha Jayanti Deterministic Constant-Amortized-RMR Abortable Mutex for CC and DSM . . . . . 22:1--22:26
Matthew Leinhauser and René Widera and Sergei Bastrakov and Alexander Debus and Michael Bussmann and Sunita Chandrasekaran Metrics and Design of an Instruction Roofline Model for AMD GPUs . . . . . . 1:1--1:14 Michael Axtmann and Sascha Witt and Daniel Ferizovic and Peter Sanders Engineering In-place (Shared-memory) Sorting Algorithms . . . . . . . . . . . 2:1--2:62 Rory Mitchell and Daniel Stokes and Eibe Frank and Geoffrey Holmes Bandwidth-Optimal Random Shuffling for GPUs . . . . . . . . . . . . . . . . . . 3:1--3:20 Rabab Alomairy and Wael Bader and Hatem Ltaief and Youssef Mesri and David Keyes High-performance 3D Unstructured Mesh Deformation Using Rank Structured Matrix Computations . . . . . . . . . . . . . . 4:1--4:23 Gal Milman-Sela and Alex Kogan and Yossi Lev and Victor Luchangco and Erez Petrank BQ: a Lock-Free Queue with Batching . . 5:1--5:49
Arik Rinberg and Alexander Spiegelman and Edward Bortnikov and Eshcar Hillel and Idit Keidar and Lee Rhodes and Hadar Serviansky Fast Concurrent Data Sketches . . . . . 6:1--6:35 Guy Blelloch and Daniel Ferizovic and Yihan Sun Joinable Parallel Balanced Binary Trees 7:1--7:41 Yuedan Chen and Guoqing Xiao and Kenli Li and Francesco Piccialli and Albert Y. Zomaya fgSpMSpV: a Fine-grained Parallel SpMSpV Framework on HPC Platforms . . . . . . . 8:1--8:29 Sixue Cliff Liu and Robert Endre Tarjan Simple Concurrent Connected Components Algorithms . . . . . . . . . . . . . . . 9:1--9:26
Ghadeer Alabandi and Martin Burtscher Improving the Speed and Quality of Parallel Graph Coloring . . . . . . . . 10:1--10:35 Hung K. Nguyen and Xuan-Tu Tran Design and Implementation of a Coarse-grained Dynamically Reconfigurable Multimedia Accelerator 11:1--11:23 M. A. Anju and Rupesh Nasre Multi-Interval DomLock: Toward Improving Concurrency in Hierarchies . . . . . . . 12:1--12:27 Yuede Ji and Hang Liu and Yang Hu and H. Howie Huang iSpan: Parallel Identification of Strongly Connected Components with Spanning Trees . . . . . . . . . . . . . 13:1--13:27
Anne Benoit and Luca Perotin and Yves Robert and Hongyang Sun Checkpointing Workflows à la Young/Daly Is Not Good Enough . . . . . 14:1--14:?? Susanne Albers and Jens Quedenfeld Optimal Algorithms for Right-sizing Data Centers . . . . . . . . . . . . . . . . 15:1--15:?? Giorgos Kappes and Stergios V. Anastasiadis A Family of Relaxed Concurrent Queues for Low-Latency Operations and Item Transfers . . . . . . . . . . . . . . . 16:1--16:??
Prasannabalaji Sundaram and Aditi Sengupta and Vajjala K. Suman and Tapan K. Sengupta Non-overlapping High-accuracy Parallel Closure for Compact Schemes: Application in Multiphysics and Complex Geometry . . 1:1--1:?? Shelby Lockhart and Amanda Bienz and William Gropp and Luke Olson Performance Analysis and Optimal Node-aware Communication for Enlarged Conjugate Gradient Methods . . . . . . . 2:1--2:?? Peter Munch and Timo Heister and Laura Prieto Saavedra and Martin Kronbichler Efficient Distributed Matrix-free Multigrid Methods on Locally Refined Meshes for FEM Computations . . . . . . 3:1--3:?? Christoph Klein and Robert Strzodka Tridigpu: a GPU Library for Block Tridiagonal and Banded Linear Equation Systems . . . . . . . . . . . . . . . . 4:1--4:??
Kartik Lakhotia and Rajgopal Kannan and Viktor Prasanna Parallel Peeling of Bipartite Networks for Hierarchical Dense Subgraph Discovery . . . . . . . . . . . . . . . 5:1--5:?? Weijian Zheng and Dali Wang and Fengguang Song A Distributed-GPU Deep Reinforcement Learning System for Solving Large Graph Optimization Problems . . . . . . . . . 6:1--6:?? Andrew D. Brown and Jonathan R. Beaumont and David B. Thomas and Julian C. Shillcock and Matthew F. Naylor and Graeme M. Bragg and Mark L. Vousden and Simon W. Moore and Shane T. Fleming POETS: an Event-driven Approach to Dissipative Particle Dynamics: Implementing a Massively Compute-intensive Problem on a Novel Hard/Software Architecture . . . . . . . 7:1--7:?? Valentina Aleeva and Rifkhat Aleev Investigation and Implementation of Parallelism Resources of Numerical Algorithms . . . . . . . . . . . . . . . 8:1--8:?? Haotian Wang and Wangdong Yang and Renqiu Ouyang and Rong Hu and Kenli Li and Keqin Li A Heterogeneous Parallel Computing Approach Optimizing SpTTM on CPU-GPU via GCN . . . . . . . . . . . . . . . . . . 9:1--9:?? Zheng Miao and Jon C. Calhoun and Rong Ge and Jiajia Li Performance Implication of Tensor Irregularity and Optimization for Distributed Tensor Decomposition . . . . 10:1--10:?? Wim H. Hesselink and Peter A. Buhr MCSH, a Lock with the Standard Interface 11:1--11:?? Hadi Zamani and Laxmi Bhuyan and Jieyang Chen and Zizhong Chen GreenMD: Energy-efficient Matrix Decomposition on Heterogeneous Multi-GPU Systems . . . . . . . . . . . . . . . . 12:1--12:?? Aleksandar Kamenev and Dariusz R. Kowalski and Miguel A. Mosteiro Faster Supervised Average Consensus in Adversarial and Stochastic Anonymous Dynamic Networks . . . . . . . . . . . . 13:1--13:??
Sanjoy Baruah and Alberto Marchetti-Spaccamela The Computational Complexity of Feasibility Analysis for Conditional DAG Tasks . . . . . . . . . . . . . . . . . 14:1--14:?? Jovan Blanusa and Kubilay Atasu and Paolo Ienne Fast Parallel Algorithms for Enumeration of Simple, Temporal, and Hop-constrained Cycles . . . . . . . . . . . . . . . . . 15:1--15:?? Andreas Alvermann and Georg Hager and Holger Fehske Orthogonal Layers of Parallelism in Large-Scale Eigenvalue Computations . . 16:1--16:??
Yossi Azar and Julian Shun Introduction to the Special Issue for SPAA'21 . . . . . . . . . . . . . . . . 17:1--17:?? Daniel Anderson and Guy E. Blelloch Parallel Minimum Cuts in $ O(m \log^2 n) $ Work and Low Depth . . . . . . . . . . 18:1--18:?? Sungjin Im and Ravi Kumar and Mahshid Montazer Qaem and Manish Purohit Non-clairvoyant Scheduling with Predictions . . . . . . . . . . . . . . 19:1--19:?? Susanne Albers and Jens Quedenfeld Algorithms for Right-sizing Heterogeneous Data Centers . . . . . . . 20:1--20:?? Yannic Maus Distributed Graph Coloring Made Easy . . 21:1--21:?? Zafar Ahmad and Rezaul Chowdhury and Rathish Das and Pramod Ganapathi and Aaron Gregory and Yimin Zhu A Fast Algorithm for Aperiodic Linear Stencil Computation using Fast Fourier Transforms . . . . . . . . . . . . . . . 22:1--22:??
Anne Benoit and Lucas Perotin and Yves Robert and Frédéric Vivien Checkpointing Strategies to Tolerate Non-Memoryless Failures on HPC Platforms 1:1--1:?? Lucas Perotin and Hongyang Sun Improved Online Scheduling of Moldable Task Graphs under Common Speedup Models 2:1--2:?? Shengle Lin and Wangdong Yang and Yikun Hu and Qinyun Cai and Minlu Dai and Haotian Wang and Kenli Li HPS Cholesky: Hierarchical Parallelized Supernodal Cholesky with Adaptive Parameters . . . . . . . . . . . . . . . 3:1--3:?? Romolo Marotta and Mauro Ianni and Alessandro Pellegrini and Francesco Quaglia A Conflict-Resilient Lock-Free Linearizable Calendar Queue . . . . . . 4:1--4:?? Stefan K. Muller and Jan Hoffmann Modeling and Analyzing Evaluation Cost of CUDA Kernels . . . . . . . . . . . . 5:1--5:?? Qinyun Cai and Guoqing Xiao and Shengle Lin and Wangdong Yang and Keqin Li and Kenli Li ABSS: an Adaptive Batch-Stream Scheduling Module for Dynamic Task Parallelism on Chiplet-based Multi-Chip Systems . . . . . . . . . . . . . . . . 6:1--6:??
Qiang Fu and Yuede Ji and Thomas Rolinger and H. Howie Huang TLPGNN: a Lightweight Two-level Parallelism Paradigm for Graph Neural Network Computation on Single and Multiple GPUs . . . . . . . . . . . . . 7:1--7:?? Zixuan Li and Yunchuan Qin and Qi Xiao and Wangdong Yang and Kenli Li cuFasterTucker: a Stochastic Optimization Strategy for Parallel Sparse FastTucker Decomposition on GPU Platform . . . . . . . . . . . . . . . . 8:1--8:?? Sébastien Darche and Michel R. Dagenais Low-Overhead Trace Collection and Profiling on GPU Compute Kernels . . . . 9:1--9:?? Ziyang Li and Dongsheng Li and Yingwen Chen and Kai Chen and Yiming Zhang Decentralized Scheduling for Data-Parallel Tasks in the Cloud . . . . 10:1--10:?? Guoqing Xiao and Tao Zhou and Yuedan Chen and Yikun Hu and Kenli Li Machine Learning-Based Kernel Selector for SpMV Optimization in Graph Analysis 11:1--11:?? Zixuan Li and Yikun Hu and Mengquan Li and Wangdong Yang and Kenli Li cuFastTucker: a Novel Sparse FastTucker Decomposition For HHLST on Multi-GPUs 12:1--12:??
Yiqian Liu and Noushin Azami and Avery Vanausdal and Martin Burtscher Indigo3: a Parallel Graph Analytics Benchmark Suite for Exploring Implementation Styles and Common Bugs 13:1--13:?? Johan Bontes and James Gain Redzone stream compaction: removing $ k $ items from a list in parallel $ O(k) $ time . . . . . . . . . . . . . . . . . . 14:1--14:??