Table of contents for issues of ACM Transactions on Architecture and Code Optimization

Last update: Mon Nov 10 17:01:59 MST 2025

ACM Transactions on Architecture and Code Optimization
Volume 1, Number 1, March, 2004

                Brad Calder and   
                   Dean Tullsen   Introduction . . . . . . . . . . . . . . 1--2
                   W. Zhang and   
                   J. S. Hu and   
               V. Degalahal and   
                M. Kandemir and   
           N. Vijaykrishnan and   
                    M. J. Irwin   Reducing instruction cache energy
                                  consumption using a compiler-based
                                  strategy . . . . . . . . . . . . . . . . 3--33
          Nemanja Isailovic and   
               Mark Whitney and   
               Yatish Patel and   
           John Kubiatowicz and   
                Dean Copsey and   
          Frederic T. Chong and   
            Isaac L. Chuang and   
                     Mark Oskin   Datapath and control for quantum wires   34--61
  Karthikeyan Sankaralingam and   
         Ramadass Nagarajan and   
                Haiming Liu and   
               Changkyu Kim and   
                Jaehyuk Huh and   
          Nitya Ranganathan and   
                Doug Burger and   
         Stephen W. Keckler and   
         Robert G. McDonald and   
               Charles R. Moore   TRIPS: a polymorphous architecture for
                                  exploiting ILP, TLP, and DLP . . . . . . 62--93
              Kevin Skadron and   
             Mircea R. Stan and   
   Karthik Sankaranarayanan and   
                  Wei Huang and   
         Sivakumar Velusamy and   
                   David Tarjan   Temperature-aware microarchitecture:
                                  Modeling and implementation  . . . . . . 94--125

ACM Transactions on Architecture and Code Optimization
Volume 1, Number 2, June, 2004

                 Alex Aletà and   
            Josep M. Codina and   
           Antonio González and   
                    David Kaeli   Removing communications in clustered
                                  microarchitectures through instruction
                                  replication  . . . . . . . . . . . . . . 127--151
                     Yu Bai and   
                  R. Iris Bahar   A low-power in-order/out-of-order issue
                                  queue  . . . . . . . . . . . . . . . . . 152--179
                Philo Juang and   
              Kevin Skadron and   
         Margaret Martonosi and   
                 Zhigang Hu and   
           Douglas W. Clark and   
          Philip W. Diodato and   
               Stefanos Kaxiras   Implementing branch-predictor decay
                                  using quasi-static memory cells  . . . . 180--219
        Oliverio J. Santana and   
               Alex Ramirez and   
       Josep L. Larriba-Pey and   
                   Mateo Valero   A low-complexity fetch architecture for
                                  high-performance superscalar processors  220--245

ACM Transactions on Architecture and Code Optimization
Volume 1, Number 3, September, 2004

                    Jin Lin and   
                  Tong Chen and   
              Wei-Chung Hsu and   
              Pen-Chung Yew and   
            Roy Dz-Ching Ju and   
              Tin-Fook Ngai and   
                       Sun Chan   A compiler framework for speculative
                                  optimizations  . . . . . . . . . . . . . 247--271
            Brian A. Fields and   
            Rastislav Bodik and   
               Mark D. Hill and   
               Chris J. Newburn   Interaction cost and shotgun profiling   272--304
   Karthik Sankaranarayanan and   
                  Kevin Skadron   Profile-based adaptation for cache decay 305--322
                    Fen Xie and   
         Margaret Martonosi and   
                   Sharad Malik   Intraprogram dynamic voltage scaling:
                                  Bounding opportunities with analytic
                                  modeling . . . . . . . . . . . . . . . . 323--367

ACM Transactions on Architecture and Code Optimization
Volume 1, Number 4, December, 2004

               A. Hartstein and   
                Thomas R. Puzak   The optimum pipeline depth considering
                                  both power and performance . . . . . . . 369--388
             Adrián Cristal and   
        Oliverio J. Santana and   
               Mateo Valero and   
               José F. Martínez   Toward kilo-instruction processors . . . 389--417
             Haitham Akkary and   
                Ravi Rajwar and   
         Srikanth T. Srinivasan   An analysis of a resource efficient
                                  checkpoint architecture  . . . . . . . . 418--444
              Chia-Lin Yang and   
            Alvin R. Lebeck and   
             Hung-Wei Tseng and   
                  Chien-Hao Lee   Tolerating memory latency through push
                                  prefetching for pointer-intensive
                                  applications . . . . . . . . . . . . . . 445--475

ACM Transactions on Architecture and Code Optimization
Volume 2, Number 1, March, 2005

                Brad Calder and   
                   Dean Tullsen   Introduction . . . . . . . . . . . . . . 1--2
              Yuanyuan Zhou and   
                   Pin Zhou and   
                   Feng Qin and   
                    Wei Liu and   
                Josep Torrellas   Efficient and flexible architectural
                                  support for dynamic monitoring . . . . . 3--33
             Chuanjun Zhang and   
                Frank Vahid and   
                   Jun Yang and   
                   Walid Najjar   A way-halting cache for low-energy
                                  high-performance systems . . . . . . . . 34--54
               Jaume Abella and   
           Antonio González and   
                Xavier Vera and   
          Michael F. P. O'Boyle   IATAC: a smart predictor to turn-off L2
                                  cache lines  . . . . . . . . . . . . . . 55--77
       John W. Haskins, Jr. and   
                  Kevin Skadron   Accelerated warmup for sampled
                                  microarchitecture simulation . . . . . . 78--108

ACM Transactions on Architecture and Code Optimization
Volume 2, Number 2, June, 2005

                     Tao Li and   
              Ravi Bhargava and   
               Lizy Kurian John   Adapting branch-target buffer to improve
                                  the target predictability of Java code   109--130
               Lingli Zhang and   
                 Chandra Krintz   The design, implementation, and
                                  evaluation of adaptive code unloading
                                  for resource-constrained devices . . . . 131--164
         Prasad A. Kulkarni and   
           Stephen R. Hines and   
           David B. Whalley and   
             Jason D. Hiser and   
           Jack W. Davidson and   
               Douglas L. Jones   Fast and efficient searches for
                                  effective optimization-phase sequences   165--198
              Esther Salamí and   
                   Mateo Valero   Dynamic memory interval test vs.
                                  interprocedural pointer analysis in
                                  multimedia applications  . . . . . . . . 199--219

ACM Transactions on Architecture and Code Optimization
Volume 2, Number 3, September, 2005

                   Yan Meng and   
           Timothy Sherwood and   
                   Ryan Kastner   Exploring the limits of leakage power
                                  reduction in caches  . . . . . . . . . . 221--246
       María Jesús Garzarán and   
            Milos Prvulovic and   
        José María Llabería and   
              Víctor Viñals and   
       Lawrence Rauchwerger and   
                Josep Torrellas   Tradeoffs in buffering speculative
                                  memory state for thread-level
                                  speculation in multiprocessors . . . . . 247--279
               David Tarjan and   
                  Kevin Skadron   Merging path and gshare indexing in
                                  perceptron branch prediction . . . . . . 280--300
              Xiangyu Zhang and   
                    Rajiv Gupta   Whole execution traces and their
                                  applications . . . . . . . . . . . . . . 301--334

ACM Transactions on Architecture and Code Optimization
Volume 2, Number 4, December, 2005

               Wankang Zhao and   
              David Whalley and   
          Christopher Healy and   
                  Frank Mueller   Improving WCET by applying a WC
                                  code-positioning optimization  . . . . . 335--365
             George A. Reis and   
             Jonathan Chang and   
          Neil Vachharajani and   
                 Ram Rangan and   
            David I. August and   
         Shubhendu S. Mukherjee   Software-controlled fault tolerance  . . 366--396
                    Jian Li and   
               José F. Martínez   Power-performance considerations of
                                  parallel computing on chip
                                  multiprocessors  . . . . . . . . . . . . 397--422
             Saurabh Sharma and   
               Jesse G. Beu and   
                Thomas M. Conte   Spectral prefetcher: An effective
                                  mechanism for L2 cache prefetching . . . 423--450

ACM Transactions on Architecture and Code Optimization
Volume 3, Number 1, March, 2006

                Brad Calder and   
                   Dean Tullsen   Introduction . . . . . . . . . . . . . . 1--2
                    Lin Tan and   
           Brett Brotherton and   
               Timothy Sherwood   Bit-split string-matching engines for
                                  intrusion detection and prevention . . . 3--34
            Priya Nagpurkar and   
               Hussam Mousa and   
             Chandra Krintz and   
               Timothy Sherwood   Efficient remote profiling for
                                  resource-constrained devices . . . . . . 35--66
                    Jin Lin and   
              Wei-Chung Hsu and   
              Pen-Chung Yew and   
            Roy Dz-Ching Ju and   
                  Tin-Fook Ngai   Recovery code generation for general
                                  speculative optimizations  . . . . . . . 67--89
               Yoonseo Choi and   
                    Hwansoo Han   Optimal register reassignment for
                                  register stack overflow minimization . . 90--114

ACM Transactions on Architecture and Code Optimization
Volume 3, Number 2, June, 2006

               Jingling Xue and   
                      Qiong Cai   A lifetime optimal algorithm for
                                  speculative PRE  . . . . . . . . . . . . 115--155
          Joseph J. Sharkey and   
        Dmitry V. Ponomarev and   
                Kanad Ghose and   
                     Oguz Ergin   Instruction packing: Toward fast and
                                  energy-efficient instruction scheduling  156--181
                  Luis Ceze and   
              Karin Strauss and   
                 James Tuck and   
            Josep Torrellas and   
                     Jose Renau   CAVA: Using checkpoint-assisted value
                                  prediction to hide L2 misses . . . . . . 182--208
                Lixin Zhang and   
                Mike Parker and   
                    John Carter   Efficient address remapping in
                                  distributed shared-memory systems  . . . 209--229

ACM Transactions on Architecture and Code Optimization
Volume 3, Number 3, September, 2006

                   Min Zhao and   
          Bruce R. Childers and   
                 Mary Lou Soffa   An approach toward profit-driven
                                  optimization . . . . . . . . . . . . . . 231--262
              Kim Hazelwood and   
               Michael D. Smith   Managing bounded code caches in dynamic
                                  binary optimization systems  . . . . . . 263--294
        Olivier Rochecouste and   
               Gilles Pokam and   
                   André Seznec   A case for a complexity-effective,
                                  width-partitioned microarchitecture  . . 295--326
                Ahmad Zmily and   
             Christos Kozyrakis   Block-aware instruction set architecture 327--357

ACM Transactions on Architecture and Code Optimization
Volume 3, Number 4, December, 2006

       Jedidiah R. Crandall and   
                S. Felix Wu and   
              Frederic T. Chong   Minos: Architectural support for
                                  protecting control data  . . . . . . . . 359--389
            Jaydeep Marathe and   
              Frank Mueller and   
          Bronis R. de Supinski   Analysis of cache-coherence bottlenecks
                                  with hybrid hardware/software techniques 390--423
               Ilya Ganusov and   
               Martin Burtscher   Future execution: a prefetching
                                  mechanism that uses multiple cores to
                                  speed up single threads  . . . . . . . . 424--449
                 Michele Co and   
           Dee A. B. Weikle and   
                  Kevin Skadron   Evaluating trace cache energy efficiency 450--476
                  Shiwen Hu and   
            Madhavi Valluri and   
               Lizy Kurian John   Effective management of multiple
                                  configurable units using dynamic
                                  optimization . . . . . . . . . . . . . . 477--501
              Chris Bentley and   
         Scott A. Watterson and   
         David K. Lowenthal and   
                 Barry Rountree   Implicit array bounds checking on 64-bit
                                  architectures  . . . . . . . . . . . . . 502--527

ACM Transactions on Architecture and Code Optimization
Volume 4, Number 1, March, 2007

                Brad Calder and   
                   Dean Tullsen   Introduction . . . . . . . . . . . . . . 1:1--1:1
      Kypros Constantinides and   
              Stephen Plaza and   
                Jason Blome and   
           Valeria Bertacco and   
               Scott Mahlke and   
                Todd Austin and   
                  Bin Zhang and   
              Michael Orshansky   Architecting a reliable CMP switch
                                  architecture . . . . . . . . . . . . . . 2:1--2:37
            Ruchira Sasanka and   
                 Man-Lap Li and   
             Sarita V. Adve and   
             Yen-Kuang Chen and   
                     Eric Debes   ALP: Efficient support for all levels of
                                  parallelism for complex media
                                  applications . . . . . . . . . . . . . . 3:1--3:30
                    Yan Luo and   
                     Jia Yu and   
                   Jun Yang and   
                Laxmi N. Bhuyan   Conserving network processor power
                                  consumption by exploiting traffic
                                  variability  . . . . . . . . . . . . . . 4:1--4:26
            Vassos Soteriou and   
                Noel Eisley and   
                  Li-Shiuan Peh   Software-directed power-aware
                                  interconnection networks . . . . . . . . 5:1--5:40
            Yuan-Shin Hwang and   
                     Jia-Jhe Li   Snug set-associative caches: Reducing
                                  leakage power of instruction and data
                                  caches with no performance penalties . . 6:1--6:28
                Hongbo Rong and   
              Zhizhong Tang and   
            R. Govindarajan and   
             Alban Douillet and   
                   Guang R. Gao   Single-dimension software pipelining for
                                  multidimensional loops . . . . . . . . . 7:1--7:44

ACM Transactions on Architecture and Code Optimization
Volume 4, Number 2, June, 2007

              Fred A. Bower and   
            Daniel J. Sorin and   
                      Sule Ozev   Online diagnosis of hard faults in
                                  microprocessors  . . . . . . . . . . . . 8:1--8:??
             Pierre Michaud and   
               André Seznec and   
               Damien Fetis and   
         Yiannakis Sazeides and   
         Theofanis Constantinou   A study of thread migration in
                                  temperature-constrained multicores . . . 9:1--9:??
                    Yu Chen and   
                    Fuxin Zhang   Code reordering on limited branch offset 10:1--10:??
             A. S. Terechko and   
                   H. Corporaal   Inter-cluster communication in VLIW
                                  architectures  . . . . . . . . . . . . . 11:1--11:??
                 Jialin Dou and   
                 Marcelo Cintra   A compiler cost model for speculative
                                  parallelization  . . . . . . . . . . . . 12:1--12:??
               Wolfram Amme and   
          Jeffery von Ronne and   
                  Michael Franz   SSA-based mobile code: Implementation
                                  and empirical evaluation . . . . . . . . 13:1--13:??

ACM Transactions on Architecture and Code Optimization
Volume 4, Number 3, September, 2007

                Xiaodong Li and   
                 Ritu Gupta and   
             Sarita V. Adve and   
                  Yuanyuan Zhou   Cross-component energy management: Joint
                                  adaptation of processor and memory . . . 14:1--14:??
                  Ron Gabor and   
               Shlomo Weiss and   
                  Avi Mendelson   Fairness enforcement in switch on event
                                  multithreading . . . . . . . . . . . . . 15:1--15:??
              Diego Andrade and   
        Basilio B. Fraguela and   
                   Ramón Doallo   Precise automatable analytical modeling
                                  of the cache behavior of codes with
                                  indirections . . . . . . . . . . . . . . 16:1--16:??
           Kris Venstermans and   
            Lieven Eeckhout and   
              Koen De Bosschere   Java object header elimination for
                                  reduced memory consumption in 64-bit
                                  virtual machines . . . . . . . . . . . . 17:1--17:??
                   Shu Xiao and   
               Edmund M.-K. Lai   VLIW instruction scheduling for minimal
                                  power variation  . . . . . . . . . . . . 18:1--18:??
            Sriraman Tallam and   
                    Rajiv Gupta   Unified control flow and data dependence
                                  traces . . . . . . . . . . . . . . . . . 19:1--19:??

ACM Transactions on Architecture and Code Optimization
Volume 4, Number 4, January, 2008

                 Engin Ipek and   
             Sally A. McKee and   
                Karan Singh and   
               Rich Caruana and   
      Bronis R. de Supinski and   
                  Martin Schulz   Efficient architectural design space
                                  exploration via predictive modeling  . . 1:1--1:??
                  Yunhe Shi and   
                Kevin Casey and   
              M. Anton Ertl and   
                    David Gregg   Virtual machine showdown: Stack versus
                                  registers  . . . . . . . . . . . . . . . 2:1--2:??
                    Jun Yan and   
                      Wei Zhang   Exploiting virtual registers to reduce
                                  pressure on real registers . . . . . . . 3:1--3:??
               Zoe C. H. Yu and   
          Francis C. M. Lau and   
                    Cho-Li Wang   Object co-location and memory reuse for
                                  Java programs  . . . . . . . . . . . . . 4:1--4:??
                 Chuanjun Zhang   Reducing cache misses through
                                  programmable decoders  . . . . . . . . . 5:1--5:??
              Amit Golander and   
                   Shlomo Weiss   Hiding the misprediction penalty of a
                                  resource-efficient high-performance
                                  processor  . . . . . . . . . . . . . . . 6:1--6:??

ACM Transactions on Architecture and Code Optimization
Volume 5, Number 1, May, 2008

                Brad Calder and   
                   Dean Tullsen   Editorial  . . . . . . . . . . . . . . . 1:1--1:??
          Shashidhar Mysore and   
              Banit Agrawal and   
             Rodolfo Neuber and   
           Timothy Sherwood and   
       Nisheeth Shrivastava and   
                   Subhash Suri   Formulating and implementing profiling
                                  over adaptive ranges . . . . . . . . . . 2:1--2:??
               Antonia Zhai and   
         J. Gregory Steffan and   
     Christopher B. Colohan and   
                  Todd C. Mowry   Compiler and hardware support for
                                  reducing the synchronization of
                                  speculative threads  . . . . . . . . . . 3:1--3:??
         Jonathan A. Winter and   
              David H. Albonesi   Addressing thermal nonuniformity in SMT
                                  workloads  . . . . . . . . . . . . . . . 4:1--4:??
      Asadollah Shahbahrami and   
               Ben Juurlink and   
           Stamatis Vassiliadis   Versatility of extended subwords and the
                                  matrix register file . . . . . . . . . . 5:1--5:??
                    Zhi Guo and   
               Walid Najjar and   
                Betul Buyukkurt   Efficient hardware code generation for
                                  FPGAs  . . . . . . . . . . . . . . . . . 6:1--6:??
            Thomas Kotzmann and   
           Christian Wimmer and   
       Hanspeter Mössenböck and   
           Thomas Rodriguez and   
            Kenneth Russell and   
                      David Cox   Design of the Java HotSpot™ client
                                  compiler for Java 6  . . . . . . . . . . 7:1--7:??

ACM Transactions on Architecture and Code Optimization
Volume 5, Number 2, August, 2008

                 Ram Rangan and   
          Neil Vachharajani and   
           Guilherme Ottoni and   
                David I. August   Performance scalability of decoupled
                                  software pipelining  . . . . . . . . . . 8:1--8:??
                 Jieyi Long and   
         Seda Ogrenci Memik and   
               Gokhan Memik and   
             Rajarshi Mukherjee   Thermal monitoring mechanisms for chip
                                  multiprocessors  . . . . . . . . . . . . 9:1--9:??
                 Ajay Joshi and   
            Lieven Eeckhout and   
        Robert H. Bell, Jr. and   
                   Lizy K. John   Distilling the essence of proprietary
                                  workloads into miniature benchmarks  . . 10:1--10:??
           Vincenzo Catania and   
            Maurizio Palesi and   
                   Davide Patti   Reducing complexity of multiobjective
                                  design space exploration in VLIW-based
                                  embedded systems . . . . . . . . . . . . 11:1--11:??

ACM Transactions on Architecture and Code Optimization
Volume 5, Number 3, November, 2008

             Jacob Leverich and   
             Hideho Arakida and   
          Alex Solomatnikov and   
         Amin Firoozshahian and   
              Mark Horowitz and   
             Christos Kozyrakis   Comparative evaluation of memory models
                                  for chip multiprocessors . . . . . . . . 12:1--12:??
          Joseph J. Sharkey and   
                 Jason Loew and   
            Dmitry V. Ponomarev   Reducing register pressure in SMT
                                  processors through L2-miss-driven early
                                  register release . . . . . . . . . . . . 13:1--13:??
            Mojtaba Mehrara and   
                    Todd Austin   Exploiting selective placement for
                                  low-cost memory protection . . . . . . . 14:1--14:??
        Hans Vandierendonck and   
                   André Seznec   Speculative return address stack
                                  management revisited . . . . . . . . . . 15:1--15:??

ACM Transactions on Architecture and Code Optimization
Volume 5, Number 4, March, 2009

         Siddhartha Chhabra and   
               Brian Rogers and   
                Yan Solihin and   
                Milos Prvulovic   Making secure processors OS- and
                                  performance-friendly . . . . . . . . . . 16:1--16:??
              Daniel A. Jiménez   Generalizing neural branch prediction    17:1--17:??
              Jinseong Jeon and   
             Keoncheol Shin and   
                    Hwansoo Han   Abstracting access patterns of dynamic
                                  memory using regular expressions . . . . 18:1--18:??
            Ghassan Shobaki and   
                Kent Wilken and   
                 Mark Heffernan   Optimal trace scheduling using
                                  enumeration  . . . . . . . . . . . . . . 19:1--19:??

ACM Transactions on Architecture and Code Optimization
Volume 6, Number 1, March, 2009

         Prasad A. Kulkarni and   
           David B. Whalley and   
              Gary S. Tyson and   
               Jack W. Davidson   Practical exhaustive optimization phase
                                  order exploration and evaluation . . . . 1:1--1:??
           Manuel Hohenauer and   
                Felix Engel and   
             Rainer Leupers and   
               Gerd Ascheid and   
                  Heinrich Meyr   A SIMD optimization framework for
                                  retargetable compilers . . . . . . . . . 2:1--2:??
              Stijn Eyerman and   
                Lieven Eeckhout   Memory-level parallelism aware fetch
                                  policies for simultaneous multithreading
                                  processors . . . . . . . . . . . . . . . 3:1--3:??
             Lukasz Strozek and   
                   David Brooks   Energy- and area-efficient architectures
                                  through application clustering and
                                  architectural heterogeneity  . . . . . . 4:1--4:??

ACM Transactions on Architecture and Code Optimization
Volume 6, Number 2, June, 2009

         Guru Venkataramani and   
           Ioannis Doudalis and   
                Yan Solihin and   
                Milos Prvulovic   MemTracker: An accelerator for memory
                                  debugging and monitoring . . . . . . . . 5:1--5:??
                  Ron Gabor and   
              Avi Mendelson and   
                   Shlomo Weiss   Service level agreement for
                                  multithreaded processors . . . . . . . . 6:1--6:??
          Wilson W. L. Fung and   
                  Ivan Sham and   
                George Yuan and   
                  Tor M. Aamodt   Dynamic warp formation: Efficient MIMD
                                  control flow on SIMD graphics hardware   7:1--7:??
              Cheng-Kok Koh and   
              Weng-Fai Wong and   
                 Yiran Chen and   
                         Hai Li   Tolerating process variations in large,
                                  set-associative caches: The buddy cache  8:1--8:??

ACM Transactions on Architecture and Code Optimization
Volume 6, Number 3, September, 2009

                    Lian Li and   
                   Hui Feng and   
                   Jingling Xue   Compiler-directed scratchpad memory
                                  management via graph coloring  . . . . . 9:1--9:??
              Amit Golander and   
                   Shlomo Weiss   Checkpoint allocation and release  . . . 10:1--10:??
                 Weifeng Xu and   
                Russell Tessier   Tetris-XL: a performance-driven spill
                                  reduction technique for embedded VLIW
                                  processors . . . . . . . . . . . . . . . 11:1--11:??
           Timothy M. Jones and   
      Michael F. P. O'Boyle and   
               Jaume Abella and   
           Antonio González and   
                     Oğuz Ergin   Exploring the limits of early register
                                  release: Exploiting compiler analysis    12:1--12:??

ACM Transactions on Architecture and Code Optimization
Volume 6, Number 4, October, 2009

           Timothy M. Jones and   
      Michael F. P. O'Boyle and   
               Jaume Abella and   
           Antonio González and   
                     Oğuz Ergin   Energy-efficient register caching with
                                  compiler assistance  . . . . . . . . . . 13:1--13:??
                  Weijia Li and   
               Youtao Zhang and   
                   Jun Yang and   
                    Jiang Zheng   Towards update-conscious compilation for
                                  energy-efficient code dissemination in
                                  WSNs . . . . . . . . . . . . . . . . . . 14:1--14:??
              Michal Wegiel and   
                 Chandra Krintz   The single-referent collector:
                                  Optimizing compaction for the common
                                  case . . . . . . . . . . . . . . . . . . 15:1--15:??
      Samantika Subramaniam and   
                 Gabriel H. Loh   Design and optimization of the store
                                  vectors memory dependence predictor  . . 16:1--16:??

ACM Transactions on Architecture and Code Optimization
Volume 7, Number 1, April, 2010

              Xiaohang Wang and   
                   Mei Yang and   
              Yingtao Jiang and   
                       Peng Liu   A power-aware mapping approach to map IP
                                  cores onto NoCs under bandwidth and
                                  latency constraints  . . . . . . . . . . 1:1--1:??
              Zhong-Ho Chen and   
                 Alvin W. Y. Su   A hardware/software framework for
                                  instruction and data scratchpad memory
                                  allocation . . . . . . . . . . . . . . . 2:1--2:??
              Dong Hyuk Woo and   
           Joshua B. Fryman and   
             Allan D. Knies and   
              Hsien-Hsin S. Lee   Chameleon: Virtualizing idle
                                  acceleration cores of a heterogeneous
                                  multicore processor for caching and
                                  prefetching  . . . . . . . . . . . . . . 3:1--3:??
             Daniel Sanchez and   
    George Michelogiannakis and   
             Christos Kozyrakis   An analysis of on-chip interconnection
                                  networks for large-scale chip
                                  multiprocessors  . . . . . . . . . . . . 4:1--4:??
                 Xiuyi Zhou and   
                   Jun Yang and   
              Marek Chrobak and   
                   Youtao Zhang   Performance-aware thermal management via
                                  task scheduling  . . . . . . . . . . . . 5:1--5:??

ACM Transactions on Architecture and Code Optimization
Volume 7, Number 2, September, 2010

              Arun Raghavan and   
             Colin Blundell and   
              Milo M. K. Martin   Token tenure and PATCH: a
                                  predictive/adaptive token-counting
                                  hybrid . . . . . . . . . . . . . . . . . 6:1--6:??
           Christian Wimmer and   
           Hanspeter Mössenböck   Automatic feedback-directed object
                                  fusing . . . . . . . . . . . . . . . . . 7:1--7:??
            Benjamin C. Lee and   
                   David Brooks   Applied inference: Case studies in
                                  microarchitectural design  . . . . . . . 8:1--8:??
                  R. Rakvic and   
                     Q. Cai and   
                J. González and   
                 G. Magklis and   
                P. Chaparro and   
                    A. González   Thread-management techniques to maximize
                                  efficiency in multicore and simultaneous
                                  multithreaded microprocessors  . . . . . 9:1--9:??
                  Derek Pao and   
                    Wei Lin and   
                        Bin Liu   A memory-efficient pipelined
                                  implementation of the Aho--Corasick
                                  string-matching algorithm  . . . . . . . 10:1--10:??
                Xuejun Yang and   
                 Ying Zhang and   
                 Xicheng Lu and   
               Jingling Xue and   
                 Ian Rogers and   
                     Gen Li and   
                Guibin Wang and   
                    Xudong Fang   Exploiting the reuse supplied by
                                  loop-dependent stream references for
                                  stream processors  . . . . . . . . . . . 11:1--11:??
         Vijay Janapa Reddi and   
           Simone Campanoni and   
             Meeta S. Gupta and   
           Michael D. Smith and   
                Gu-Yeon Wei and   
               David Brooks and   
                  Kim Hazelwood   Eliminating voltage emergencies via
                                  software-guided code transformations . . 12:1--12:??

ACM Transactions on Architecture and Code Optimization
Volume 7, Number 3, December, 2010

                   Qin Zhao and   
           Ioana Cutcutache and   
                  Weng-Fai Wong   PiPA: Pipelined profiling and analysis
                                  on multicore systems . . . . . . . . . . 13:1--13:??
                    Fei Guo and   
                Yan Solihin and   
                    Li Zhao and   
               Ravishankar Iyer   Quality of service shared cache
                                  management in chip multiprocessor
                                  architecture . . . . . . . . . . . . . . 14:1--14:??
                 Xiaoxia Wu and   
                    Jian Li and   
                Lixin Zhang and   
               Evan Speight and   
               Ram Rajamony and   
                       Yuan Xie   Design exploration of hybrid caches with
                                  disparate memory technologies  . . . . . 15:1--15:??
          Kornilios Kourtis and   
            Georgios Goumas and   
              Nectarios Koziris   Exploiting compression opportunities to
                                  improve SpMxV performance on shared
                                  memory systems . . . . . . . . . . . . . 16:1--16:??

ACM Transactions on Architecture and Code Optimization
Volume 7, Number 4, December, 2010

            Betul Buyukkurt and   
                John Cortes and   
           Jason Villarreal and   
                Walid A. Najjar   Impact of high-level transformations
                                  within the ROCCC framework . . . . . . . 17:1--17:??
            Yuan-Shin Hwang and   
              Tzong-Yen Lin and   
                Rong-Guey Chang   DisIRer: Converting a retargetable
                                  compiler into a multiplatform binary
                                  translator . . . . . . . . . . . . . . . 18:1--18:??
              Michael Boyer and   
               David Tarjan and   
                  Kevin Skadron   Federation: Boosting per-thread
                                  performance of throughput-oriented
                                  manycore architectures . . . . . . . . . 19:1--19:??
             Grigori Fursin and   
                  Olivier Temam   Collective optimization: a practical
                                  collaborative approach . . . . . . . . . 20:1--20:??
                   Fang Liu and   
                    Yan Solihin   Understanding the behavior and
                                  implications of context switch misses    21:1--21:??

ACM Transactions on Architecture and Code Optimization
Volume 8, Number 1, April, 2011

              Stijn Eyerman and   
                Lieven Eeckhout   Fine-grained DVFS using on-chip
                                  regulators . . . . . . . . . . . . . . . 1:1--1:??
             Chen-Yong Cher and   
                    Eren Kursun   Exploring the effects of on-chip thermal
                                  variation on high-performance multicore
                                  architectures  . . . . . . . . . . . . . 2:1--2:??
             Carole-Jean Wu and   
             Margaret Martonosi   Adaptive timekeeping replacement:
                                  Fine-grained capacity management for
                                  shared CMP caches  . . . . . . . . . . . 3:1--3:??
                Lucas Vespa and   
                      Ning Weng   Deterministic finite automata
                                  characterization and optimization for
                                  scalable pattern matching  . . . . . . . 4:1--4:??
     Abhishek Bhattacharjee and   
         Gilberto Contreras and   
             Margaret Martonosi   Parallelization libraries:
                                  Characterizing and reducing overheads    5:1--5:??

ACM Transactions on Architecture and Code Optimization
Volume 8, Number 2, July, 2011

               Xiangyu Dong and   
                   Yuan Xie and   
       Naveen Muralimanohar and   
               Norman P. Jouppi   Hybrid checkpointing using emerging
                                  nonvolatile memories for future exascale
                                  systems  . . . . . . . . . . . . . . . . 6:1--6:??
                 Jianjun Li and   
               Chenggang Wu and   
                  Wei-Chung Hsu   Efficient and effective misaligned data
                                  access handling in a dynamic binary
                                  translation system . . . . . . . . . . . 7:1--7:??
         Guru Venkataramani and   
      Christopher J. Hughes and   
              Sanjeev Kumar and   
                Milos Prvulovic   DeFT: Design space exploration for
                                  on-the-fly detection of coherence misses 8:1--8:??
             Jason D. Hiser and   
         Daniel W. Williams and   
                     Wei Hu and   
           Jack W. Davidson and   
                 Jason Mars and   
              Bruce R. Childers   Evaluating indirect branch handling
                                  mechanisms in software dynamic
                                  translation systems  . . . . . . . . . . 9:1--9:??

ACM Transactions on Architecture and Code Optimization
Volume 8, Number 3, October, 2011

                 Xi E. Chen and   
                  Tor M. Aamodt   Hybrid analytical modeling of pending
                                  cache hits, data prefetching, and MSHRs  10:1--10:??
          Marios Kleanthous and   
             Yiannakis Sazeides   CATCH: a mechanism for dynamically
                                  detecting cache-content-duplication in
                                  instruction caches . . . . . . . . . . . 11:1--11:??
        Hans Vandierendonck and   
                   André Seznec   Managing SMT resource usage through
                                  speculative instruction window weighting 12:1--12:??
                Po-Han Wang and   
              Chia-Lin Yang and   
              Yen-Ming Chen and   
                  Yu-Jung Cheng   Power gating strategies on GPUs  . . . . 13:1--13:??
                   Min Feng and   
                  Chen Tian and   
               Changhui Lin and   
                    Rajiv Gupta   Dynamic access distance driven cache
                                  replacement  . . . . . . . . . . . . . . 14:1--14:??
                Ahmad Samih and   
                Yan Solihin and   
                   Anil Krishna   Evaluating placement policies for
                                  managing capacity sharing in CMP
                                  architectures with private caches  . . . 15:1--15:??
            Chang-Ching Yeh and   
           Kuei-Chung Chang and   
               Tien-Fu Chen and   
                   Chingwei Yeh   Maintaining performance on power gating
                                  of microprocessor functional units by
                                  using a predictive pre-wakeup strategy   16:1--16:??
                Hyunjin Lee and   
               Sangyeun Cho and   
              Bruce R. Childers   DEFCAM: a design and evaluation
                                  framework for defect-tolerant cache
                                  memories . . . . . . . . . . . . . . . . 17:1--17:??

ACM Transactions on Architecture and Code Optimization
Volume 8, Number 4, January, 2012

              Per Stenström and   
              Koen De Bosschere   Introduction to the special issue on
                                  high-performance and embedded
                                  architectures and compilers  . . . . . . 18:1--18:??
            Jorge Albericio and   
                 Rubén Gran and   
               Pablo Ibáñez and   
              Víctor Viñals and   
            Jose María Llabería   ABS: a low-cost adaptive controller for
                                  prefetching in a banked shared
                                  last-level cache . . . . . . . . . . . . 19:1--19:??
           Ali Galip Bayrak and   
          Nikola Velickovic and   
                Paolo Ienne and   
                 Wayne Burleson   An architecture-independent instruction
                                  shuffler to protect against side-channel
                                  attacks  . . . . . . . . . . . . . . . . 20:1--20:??
                 John Demme and   
            Simha Sethumadhavan   Approximate graph clustering for program
                                  characterization . . . . . . . . . . . . 21:1--21:??
              Mihai Pricopi and   
                   Tulika Mitra   Bahurupi: a polymorphic heterogeneous
                                  multi-core architecture  . . . . . . . . 22:1--22:??
         Jeroen V. Cleemput and   
               Bart Coppens and   
                Bjorn De Sutter   Compiler mitigations for time attacks on
                                  modern x86 processors  . . . . . . . . . 23:1--23:??
           Jason Mccandless and   
                    David Gregg   Compiler techniques to improve dynamic
                                  branch prediction for indirect jump and
                                  call instructions  . . . . . . . . . . . 24:1--24:??
     Antonio García-Guirado and   
  Ricardo Fernández-Pascual and   
                Alberto Ros and   
                 José M. García   DAPSCO: Distance-aware partially shared
                                  cache organization . . . . . . . . . . . 25:1--25:??
             Zhenjiang Wang and   
               Chenggang Wu and   
              Pen-Chung Yew and   
                 Jianjun Li and   
                          Di Xu   On-the-fly structure splitting for heap
                                  objects  . . . . . . . . . . . . . . . . 26:1--26:??
               Dibyendu Das and   
      B. Dupont De Dinechin and   
          Ramakrishna Upadrasta   Efficient liveness computation using
                                  merge sets and DJ-graphs . . . . . . . . 27:1--27:??
          George Patsilaras and   
         Niket K. Choudhary and   
                     James Tuck   Efficiently exploiting memory level
                                  parallelism on asymmetric coupled cores
                                  in the dark silicon era  . . . . . . . . 28:1--28:??
               Roman Malits and   
             Evgeny Bolotin and   
            Avinoam Kolodny and   
                  Avi Mendelson   Exploring the limits of GPGPU scheduling
                                  in control flow bound applications . . . 29:1--29:??
                 Lois Orosa and   
            Elisardo Antelo and   
             Javier D. Bruguera   FlexSig: Implementing flexible hardware
                                  signatures . . . . . . . . . . . . . . . 30:1--30:??
            Ruben Titos-Gil and   
           Manuel E. Acacio and   
             Jose M. Garcia and   
                 Tim Harris and   
             Adrian Cristal and   
                Osman Unsal and   
                Ibrahim Hur and   
                   Mateo Valero   Hardware transactional memory with
                                  software-defined conflicts . . . . . . . 31:1--31:??
                Yongjoo Kim and   
                Jongeun Lee and   
                Toan X. Mai and   
                  Yunheung Paek   Improving performance of nested loops on
                                  reconfigurable array processors  . . . . 32:1--32:??
        Madhura Purnaprajna and   
                    Paolo Ienne   Making wide-issue VLIW processors viable
                                  on FPGAs . . . . . . . . . . . . . . . . 33:1--33:??
           Petar Radojković and   
             Sylvain Girbal and   
             Arnaud Grasset and   
           Eduardo Quiñones and   
                 Sami Yehia and   
           Francisco J. Cazorla   On the evaluation of the impact of
                                  shared resources in multithreaded COTS
                                  processors in time-critical environments 34:1--34:??
           Leonid Domnitser and   
               Aamer Jaleel and   
                 Jason Loew and   
          Nael Abu-Ghazaleh and   
               Dmitry Ponomarev   Non-monopolizable caches: Low-complexity
                                  mitigation of cache side channel attacks 35:1--35:??
             Alejandro Rico and   
            Felipe Cabarcas and   
          Carlos Villavieja and   
             Milan Pavlovic and   
               Augusto Vega and   
                Yoav Etsion and   
               Alex Ramirez and   
                   Mateo Valero   On the simulation of large-scale
                                  architectures using multiple application
                                  abstraction levels . . . . . . . . . . . 36:1--36:??
                Selma Saidi and   
           Pranav Tendulkar and   
             Thierry Lepley and   
                     Oded Maler   Optimizing explicit data transfers for
                                  data parallel applications on the Cell
                                  architecture . . . . . . . . . . . . . . 37:1--37:??
                   Min Feng and   
               Changhui Lin and   
                    Rajiv Gupta   PLDS: Partitioning linked data
                                  structures for parallelism . . . . . . . 38:1--38:??
            Benoit Pradelle and   
            Alain Ketterlin and   
                Philippe Clauss   Polyhedral parallelization of binary
                                  code . . . . . . . . . . . . . . . . . . 39:1--39:??
                 Yaozu Dong and   
                    Yu Chen and   
                Zhenhao Pan and   
                Jinquan Dai and   
                  Yunhong Jiang   ReNIC: Architectural extension to SR-IOV
                                  I/O virtualization for efficient
                                  replication  . . . . . . . . . . . . . . 40:1--40:??
           Tom M. Bruintjes and   
        Karel H. G. Walters and   
             Sabih H. Gerez and   
             Bert Molenkamp and   
              Gerard J. M. Smit   Sabrewing: a lightweight architecture
                                  for combined floating-point and integer
                                  arithmetic . . . . . . . . . . . . . . . 41:1--41:??
             Mario Kicherer and   
               Fabian Nowak and   
              Rainer Buchty and   
                  Wolfgang Karl   Seamlessly portable applications:
                                  Managing the diversity of modern
                                  heterogeneous systems  . . . . . . . . . 42:1--42:??
       Nathanael Premillieu and   
                   Andre Seznec   SYRANT: SYmmetric Resource Allocation on
                                  Not-taken and Taken paths  . . . . . . . 43:1--43:??
        William Hasenplaugh and   
           Pritpal S. Ahuja and   
               Aamer Jaleel and   
          Simon Steely, Jr. and   
                      Joel Emer   The gradient-based cache partitioning
                                  algorithm  . . . . . . . . . . . . . . . 44:1--44:??
                Javier Lira and   
           Timothy M. Jones and   
              Carlos Molina and   
               Antonio González   The migration prefetcher: Anticipating
                                  data promotion in dynamic NUCA caches    45:1--45:??
     Kishore Kumar Pusukuri and   
                Rajiv Gupta and   
                Laxmi N. Bhuyan   Thread Tranquilizer: Dynamically
                                  reducing performance variation . . . . . 46:1--46:??
             Dongsong Zhang and   
                   Deke Guo and   
              Fangyuan Chen and   
                     Fei Wu and   
                    Tong Wu and   
                   Ting Cao and   
                     Shiyao Jin   TL-plane-based multi-core
                                  energy-efficient real-time scheduling
                                  algorithm for sporadic tasks . . . . . . 47:1--47:??
           Michael J. Lyons and   
             Mark Hempstead and   
                Gu-Yeon Wei and   
                   David Brooks   The accelerator store: a shared memory
                                  framework for accelerator-based systems  48:1--48:??
              Daniel Orozco and   
               Elkin Garcia and   
                 Rishi Khan and   
           Kelly Livingston and   
                   Guang R. Gao   Toward high-throughput algorithms on
                                  many-core architectures  . . . . . . . . 49:1--49:??
                Kevin Stock and   
         Louis-Noël Pouchet and   
                  P. Sadayappan   Using machine learning to improve
                                  automatic vectorization  . . . . . . . . 50:1--50:??
     Kanit Therdsteerasukdi and   
               Gyungsu Byun and   
                 Jason Cong and   
             M. Frank Chang and   
                  Glenn Reinman   Utilizing RF-I and intelligent
                                  scheduling for better throughput/watt in
                                  a mobile GPU memory system . . . . . . . 51:1--51:??
        Frederick Ryckbosch and   
             Stijn Polfliet and   
                Lieven Eeckhout   VSim: Simulating multi-server setups at
                                  near native hardware speed . . . . . . . 52:1--52:??
                  Miao Zhou and   
                      Yu Du and   
             Bruce Childers and   
                Rami Melhem and   
                   Daniel Mossé   Writeback-aware partitioning and
                                  replacement for last-level caches in
                                  phase change main memory systems . . . . 53:1--53:??
              Qingping Wang and   
            Sameer Kulkarni and   
               John Cavazos and   
                  Michael Spear   A transactional memory with automatic
                                  performance tuning . . . . . . . . . . . 54:1--54:??
          Bartosz Bogdanski and   
          Sven-Arne Reinemo and   
    Frank Olaf Sem-Jacobsen and   
              Ernst Gunnar Gran   sFtree: a fully connected and
                                  deadlock-free switch-to-switch routing
                                  algorithm for fat-trees  . . . . . . . . 55:1--55:??

ACM Transactions on Architecture and Code Optimization
Volume 9, Number 1, March, 2012

          Walid J. Ghandour and   
             Haitham Akkary and   
                      Wes Masri   Leveraging Strength-Based Dynamic
                                  Information Flow Analysis to Enhance
                                  Data Value Prediction  . . . . . . . . . 1:1--1:??
                 Jaekyu Lee and   
                Hyesoon Kim and   
                  Richard Vuduc   When Prefetching Works, When It Doesn't,
                                  and Why  . . . . . . . . . . . . . . . . 2:1--2:??
               Bita Mazloom and   
          Shashidhar Mysore and   
               Mohit Tiwari and   
              Banit Agrawal and   
                   Tim Sherwood   Dataflow Tomography: Information Flow
                                  Tracking For Understanding and
                                  Visualizing Full Systems . . . . . . . . 3:1--3:??
                Jung Ho Ahn and   
           Norman P. Jouppi and   
         Christos Kozyrakis and   
             Jacob Leverich and   
            Robert S. Schreiber   Improving System Energy Efficiency with
                                  Memory Rank Subsetting . . . . . . . . . 4:1--4:??
                Xuejun Yang and   
                    Li Wang and   
               Jingling Xue and   
                      Qingbo Wu   Comparability Graph Coloring for
                                  Optimizing Utilization of
                                  Software-Managed Stream Register Files
                                  for Stream Processors  . . . . . . . . . 5:1--5:??
        Abhinandan Majumdar and   
            Srihari Cadambi and   
             Michela Becchi and   
       Srimat T. Chakradhar and   
                Hans Peter Graf   A Massively Parallel, Energy Efficient
                                  Programmable Accelerator for Learning
                                  and Classification . . . . . . . . . . . 6:1--6:??

ACM Transactions on Architecture and Code Optimization
Volume 9, Number 2, June, 2012

              Stijn Eyerman and   
                Lieven Eeckhout   Probabilistic modeling for job symbiosis
                                  scheduling on SMT processors . . . . . . 7:1--7:??
              Rachid Seghir and   
           Vincent Loechner and   
                 Benoît Meister   Integer affine transformations of
                                  parametric $Z$-polytopes and
                                  applications to loop nest optimization   8:1--8:??
                    Yi Yang and   
                 Ping Xiang and   
               Jingfei Kong and   
                Mike Mantor and   
                   Huiyang Zhou   A unified optimizing compiler framework
                                  for different GPGPU architectures  . . . 9:1--9:??
               Choonki Jang and   
                 Jaejin Lee and   
             Bernhard Egger and   
                    Soojung Ryu   Automatic code overlay generation and
                                  partially redundant code fetch
                                  elimination  . . . . . . . . . . . . . . 10:1--10:??
               Zahra Abbasi and   
     Georgios Varsamopoulos and   
            Sandeep K. S. Gupta   TACOMA: Server and workload management
                                  in Internet data centers considering
                                  cooling-computing power trade-off and
                                  energy proportionality . . . . . . . . . 11:1--11:??
             Andreas Lankes and   
                Thomas Wild and   
        Stefan Wallentowitz and   
            Andreas Herkersdorf   Benefits of selective packet discard in
                                  networks-on-chip . . . . . . . . . . . . 12:1--12:??

ACM Transactions on Architecture and Code Optimization
Volume 9, Number 3, September, 2012

               Yangchun Luo and   
                   Antonia Zhai   Dynamically dispatching speculative
                                  threads to improve sequential execution  13:1--13:??
                 Huimin Cui and   
               Jingling Xue and   
                   Lei Wang and   
                  Yang Yang and   
              Xiaobing Feng and   
                    Dongrui Fan   Extendable pattern-oriented optimization
                                  directives . . . . . . . . . . . . . . . 14:1--14:??
            Adam Wade Lewis and   
            Nian-Feng Tzeng and   
                   Soumik Ghosh   Runtime energy consumption estimation
                                  for server workloads based on chaotic
                                  time-series approximation  . . . . . . . 15:1--15:??
           Alejandro Valero and   
           Julio Sahuquillo and   
             Salvador Petit and   
                Pedro López and   
                     José Duato   Combining recency of information with
                                  selective random and a victim cache in
                                  last-level caches  . . . . . . . . . . . 16:1--16:??
                     Bin Li and   
              Li-Shiuan Peh and   
                    Li Zhao and   
                      Ravi Iyer   Dynamic QoS management for chip
                                  multiprocessors  . . . . . . . . . . . . 17:1--17:??
      Polychronis Xekalakis and   
            Nikolas Ioannou and   
                 Marcelo Cintra   Mixed speculative multithreaded
                                  execution models . . . . . . . . . . . . 18:1--18:??
        Mageda Sharafeddine and   
                Komal Jothi and   
                 Haitham Akkary   Disjoint out-of-order execution
                                  processor  . . . . . . . . . . . . . . . 19:1--19:??
              Diego Andrade and   
        Basilio B. Fraguela and   
                   Ramón Doallo   Static analysis of the worst-case memory
                                  performance for irregular codes with
                                  indirections . . . . . . . . . . . . . . 20:1--20:??
                  Yang Chen and   
              Shuangde Fang and   
              Yuanjie Huang and   
            Lieven Eeckhout and   
             Grigori Fursin and   
              Olivier Temam and   
                   Chengyong Wu   Deconstructing iterative optimization    21:1--21:??
                 Apala Guha and   
              Kim Hazelwood and   
                 Mary Lou Soffa   Memory optimization of dynamic binary
                                  translators for embedded systems . . . . 22:1--22:??
            James R. Geraci and   
                Sharon M. Sacco   A transpose-free in-place SIMD optimized
                                  FFT  . . . . . . . . . . . . . . . . . . 23:1--23:??

ACM Transactions on Architecture and Code Optimization
Volume 9, Number 4, January, 2013

               Bart Coppens and   
            Bjorn De Sutter and   
                    Jonas Maebe   Feedback-driven binary code
                                  diversification to the special issue on
                                  high-performance embedded architectures
                                  and compilers  . . . . . . . . . . . . . 24:1--24:??
              Jeremy Fowers and   
                 Greg Brown and   
              John Wernsing and   
                     Greg Stitt   A performance and energy comparison of
                                  convolution on GPUs, FPGAs, and
                                  multicore processors . . . . . . . . . . 25:1--25:??
                Erven Rohou and   
             Kevin Williams and   
                    David Yuste   Vectorization technology to improve
                                  interpreter performance  . . . . . . . . 26:1--26:??
               Jimmy Cleary and   
              Owen Callanan and   
               Mark Purcell and   
                    David Gregg   Fast asymmetric thread synchronization   27:1--27:??
                    Yong Li and   
                Rami Melhem and   
                  Alex K. Jones   PS-TLB: Leveraging page classification
                                  information for fast, scalable and
                                  efficient translation for future CMPs    28:1--28:??
            Kristof Du Bois and   
              Stijn Eyerman and   
                Lieven Eeckhout   Per-thread cycle accounting in multicore
                                  processors . . . . . . . . . . . . . . . 29:1--29:??
           Christian Wimmer and   
              Michael Haupt and   
   Michael L. Van De Vanter and   
                Mick Jordan and   
             Laurent Daynès and   
                  Douglas Simon   Maxine: an approachable virtual machine
                                  for, and in, Java  . . . . . . . . . . . 30:1--30:??
                 Malik Khan and   
               Protonu Basu and   
                  Gabe Rudy and   
                  Mary Hall and   
                  Chun Chen and   
               Jacqueline Chame   A script-based autotuning compiler
                                  system to generate high-performance CUDA
                                  code . . . . . . . . . . . . . . . . . . 31:1--31:??
        Kenzo Van Craeynest and   
                Lieven Eeckhout   Understanding fundamental design choices
                                  in single-ISA heterogeneous multicore
                                  architectures  . . . . . . . . . . . . . 32:1--32:??
               Samuel Antão and   
                   Leonel Sousa   The CRNS framework and its application
                                  to programmable and reconfigurable
                                  cryptography . . . . . . . . . . . . . . 33:1--33:??
             Boubacar Diouf and   
                 Can Hantas and   
               Albert Cohen and   
               Özcan Özturk and   
                  Jens Palsberg   A decoupled local memory allocator . . . 34:1--34:??
                 Huimin Cui and   
                    Qing Yi and   
               Jingling Xue and   
                  Xiaobing Feng   Layout-oblivious compiler optimization
                                  for matrix computations  . . . . . . . . 35:1--35:??
              Stephen Dolan and   
       Servesh Muralidharan and   
                    David Gregg   Compiler support for lightweight context
                                  switching  . . . . . . . . . . . . . . . 36:1--36:??
                 Pablo Abad and   
            Valentin Puente and   
            Jose-Angel Gregorio   LIGERO: a light but efficient router
                                  conceived for cache-coherent chip
                                  multiprocessors  . . . . . . . . . . . . 37:1--37:??
            Jorge Albericio and   
               Pablo Ibáñez and   
              Víctor Viñals and   
            Jose María Llabería   Exploiting reuse locality on inclusive
                                  shared last-level caches . . . . . . . . 38:1--38:??
        Paraskevas Yiapanis and   
           Demian Rosas-Ham and   
                Gavin Brown and   
                    Mikel Luján   Optimizing software runtime systems for
                                  speculative parallelization  . . . . . . 39:1--39:??
            Cedric Nugteren and   
             Pieter Custers and   
                 Henk Corporaal   Algorithmic species: a classification of
                                  affine loop nests for parallel
                                  programming  . . . . . . . . . . . . . . 40:1--40:??
        Marco E. T. Gerards and   
                      Jan Kuper   Optimal DPM and DVFS for frame-based
                                  real-time systems  . . . . . . . . . . . 41:1--41:??
                Zhichao Yan and   
                 Hong Jiang and   
                 Yujuan Tan and   
                       Dan Feng   An integrated pseudo-associativity and
                                  relaxed-order approach to hardware
                                  transactional memory . . . . . . . . . . 42:1--42:??
                 Doris Chen and   
                Deshanand Singh   Profile-guided floating- to fixed-point
                                  conversion for hybrid FPGA-processor
                                  applications . . . . . . . . . . . . . . 43:1--43:??
                    Yan Cui and   
               Yingxin Wang and   
                    Yu Chen and   
                   Yuanchun Shi   Lock-contention-aware scheduler: a
                                  scalable and energy-efficient method for
                                  addressing scalability collapse on
                                  multicore systems  . . . . . . . . . . . 44:1--44:??
     Kishore Kumar Pusukuri and   
                Rajiv Gupta and   
                Laxmi N. Bhuyan   ADAPT: a framework for coscheduling
                                  multithreaded programs . . . . . . . . . 45:1--45:??
            Michele Tartara and   
        Stefano Crespi Reghizzi   Continuous learning of compiler
                                  heuristics . . . . . . . . . . . . . . . 46:1--46:??
          Grigorios Chrysos and   
     Panagiotis Dagritzikos and   
     Ioannis Papaefstathiou and   
               Apostolos Dollas   HC-CART: a parallel system
                                  implementation of data mining
                                  classification and regression tree
                                  (CART) algorithm on a multi-FPGA system  47:1--47:??
                Jongwon Lee and   
                   Yohan Ko and   
              Kyoungwoo Lee and   
            Jonghee M. Youn and   
                  Yunheung Paek   Dynamic code duplication with
                                  vulnerability awareness for soft error
                                  detection on VLIW architectures  . . . . 48:1--48:??
              Fabien Coelho and   
               François Irigoin   API compilation for image hardware
                                  accelerators . . . . . . . . . . . . . . 49:1--49:??
               Carlos Luque and   
              Miquel Moreto and   
       Francisco J. Cazorla and   
                   Mateo Valero   Fair CPU time accounting in CMP+SMT
                                  processors . . . . . . . . . . . . . . . 50:1--50:??
       Pavlos M. Mattheakis and   
         Ioannis Papaefstathiou   Significantly reducing MPI
                                  intercommunication latency and power
                                  overhead in both embedded and HPC
                                  systems  . . . . . . . . . . . . . . . . 51:1--51:??
            Riyadh Baghdadi and   
               Albert Cohen and   
           Sven Verdoolaege and   
              Konrad Trifunović   Improved loop tiling based on the
                                  removal of spurious false dependences    52:1--52:??
                Antoniu Pop and   
                   Albert Cohen   OpenStream: Expressiveness and data-flow
                                  compilation of OpenMP streaming programs 53:1--53:??
           Sven Verdoolaege and   
          Juan Carlos Juega and   
               Albert Cohen and   
         José Ignacio Gómez and   
         Christian Tenllado and   
               Francky Catthoor   Polyhedral parallel code generation for
                                  CUDA . . . . . . . . . . . . . . . . . . 54:1--54:??
                      Yu Du and   
                  Miao Zhou and   
             Bruce Childers and   
                Rami Melhem and   
                   Daniel Mossé   Delta-compressed caching for overcoming
                                  the write bandwidth limitation of hybrid
                                  main memory  . . . . . . . . . . . . . . 55:1--55:??
              Suresh Purini and   
                   Lakshya Jain   Finding good optimization sequences
                                  covering program space . . . . . . . . . 56:1--56:??
       Mehmet E. Belviranli and   
            Laxmi N. Bhuyan and   
                    Rajiv Gupta   A dynamic self-scheduling scheme for
                                  heterogeneous multiprocessor
                                  architectures  . . . . . . . . . . . . . 57:1--57:??
                Anurag Negi and   
                Ruben Titos-Gil   SCIN-cache: Fast speculative versioning
                                  in multithreaded cores . . . . . . . . . 58:1--58:??
               Thibaut Lutz and   
           Christian Fensch and   
                    Murray Cole   PARTANS: an autotuning framework for
                                  stencil computation on multi-GPU systems 59:1--59:??
               Chunhua Xiao and   
           M-C. Frank Chang and   
                 Jason Cong and   
               Michael Gill and   
             Zhangqin Huang and   
                Chunyue Liu and   
              Glenn Reinman and   
                         Hao Wu   Stream arbitration: Towards efficient
                                  bandwidth utilization for emerging
                                  on-chip interconnects  . . . . . . . . . 60:1--60:??

ACM Transactions on Architecture and Code Optimization
Volume 10, Number 1, April, 2013

                 Yunji Chen and   
               Tianshi Chen and   
                    Ling Li and   
                 Ruiyang Wu and   
                  Daofu Liu and   
                       Weiwu Hu   Deterministic Replay Using Global Clock  1:1--1:??
              Daniel Lustig and   
     Abhishek Bhattacharjee and   
             Margaret Martonosi   TLB Improvements for Chip
                                  Multiprocessors: Inter-Core Cooperative
                                  Prefetchers and Shared Last-Level TLBs   2:1--2:??
                  Rong Chen and   
                     Haibo Chen   Tiled-MapReduce: Efficient and Flexible
                                  MapReduce Processing on Multicore with
                                  Tiling . . . . . . . . . . . . . . . . . 3:1--3:??
             Michela Becchi and   
                Patrick Crowley   A-DFA: a Time- and Space-Efficient DFA
                                  Compression Algorithm for Fast Regular
                                  Expression Evaluation  . . . . . . . . . 4:1--4:26
                   Sheng Li and   
                Jung Ho Ahn and   
          Richard D. Strong and   
            Jay B. Brockman and   
            Dean M. Tullsen and   
               Norman P. Jouppi   The McPAT Framework for Multicore and
                                  Manycore Architectures: Simultaneously
                                  Modeling Power, Area, and Timing . . . . 5:1--5:??

ACM Transactions on Architecture and Code Optimization
Volume 10, Number 2, May, 2013

        Angeliki Kritikakou and   
           Francky Catthoor and   
       George S. Athanasiou and   
        Vasilios Kelefouras and   
                  Costas Goutis   Near-Optimal Microprocessor and
                                  Accelerators Codesign with Latency and
                                  Throughput Constraints . . . . . . . . . 6:1--6:??
                  Lei Jiang and   
                      Yu Du and   
                    Bo Zhao and   
               Youtao Zhang and   
          Bruce R. Childers and   
                       Jun Yang   Hardware-Assisted Cooperative
                                  Integration of Wear-Leveling and
                                  Salvaging for Phase Change Memory  . . . 7:1--7:??
               Kyuseung Han and   
                Junwhan Ahn and   
                   Kiyoung Choi   Power-Efficient Predication Techniques
                                  for Acceleration of Control Flow
                                  Execution on CGRA  . . . . . . . . . . . 8:1--8:??
                  Chao Wang and   
                      Xi Li and   
              Junneng Zhang and   
                Xuehai Zhou and   
                   Xiaoning Nie   MP-Tomasulo: a Dependency-Aware
                                  Automatic Parallel Execution Engine for
                                  Sequential Programs  . . . . . . . . . . 9:1--9:??

ACM Transactions on Architecture and Code Optimization
Volume 10, Number 3, September, 2013

                      Anonymous   TACO Reviewers 2012  . . . . . . . . . . 9:1--9:??
                Eran Shifer and   
                   Shlomo Weiss   Low-latency adaptive mode transitions
                                  and hierarchical power management in
                                  asymmetric clustered cores . . . . . . . 10:1--10:??
             Yosi Ben Asher and   
                    Nadav Rotem   Hybrid type legalization for a sparse
                                  SIMD instruction set . . . . . . . . . . 11:1--11:??
                 Yuanwu Lei and   
                   Yong Dou and   
                    Lei Guo and   
                   Jinbo Xu and   
                   Jie Zhou and   
                Yazhuo Dong and   
                    Hongjian Li   VLIW coprocessor for IEEE-754
                                  quadruple-precision elementary functions 12:1--12:??
          Motohiro Kawahito and   
            Hideaki Komatsu and   
             Takao Moriyama and   
              Hiroshi Inoue and   
                Toshio Nakatani   Idiom recognition framework using
                                  topological embedding  . . . . . . . . . 13:1--13:??
            Ghassan Shobaki and   
            Maxim Shawabkeh and   
        Najm Eldeen Abu Rmaileh   Preallocation instruction scheduling
                                  with register pressure minimization
                                  using a combinatorial optimization
                                  approach . . . . . . . . . . . . . . . . 14:1--14:??
                Dongrui She and   
                   Yifan He and   
                 Henk Corporaal   An energy-efficient method of supporting
                                  flexible special instructions in an
                                  embedded processor with compact ISA  . . 15:1--15:??
       V. Krishna Nandivada and   
               Rajkishore Barik   Improved bitwidth-aware variable packing 16:1--16:??
                Jung Ho Ahn and   
             Young Hoon Son and   
                       John Kim   Scalable high-radix router
                                  microarchitecture using a network switch
                                  organization . . . . . . . . . . . . . . 17:1--17:??
                 Libo Huang and   
               Zhiying Wang and   
                  Nong Xiao and   
               Yongwen Wang and   
                      Qiang Dou   Adaptive communication mechanism for
                                  accelerating MPI functions in NoC-based
                                  multicore processors . . . . . . . . . . 18:1--18:??
              Avinash Malik and   
                    David Gregg   Orchestrating stream graphs using model
                                  checking . . . . . . . . . . . . . . . . 19:1--19:??
                 Zheng Wang and   
          Michael F. P. O'Boyle   Using machine learning to partition
                                  streaming programs . . . . . . . . . . . 20:1--20:??
                Ali Bakhoda and   
                   John Kim and   
                  Tor M. Aamodt   Designing on-chip networks for
                                  throughput accelerators  . . . . . . . . 21:1--21:??

ACM Transactions on Architecture and Code Optimization
Volume 10, Number 4, December, 2013

           Michael R. Jantz and   
             Prasad A. Kulkarni   Exploring single and multilevel JIT
                                  compilation policy for modern machines   22:1--22:??
               Xiangyu Dong and   
           Norman P. Jouppi and   
                       Yuan Xie   A circuit-architecture co-optimization
                                  framework for exploring nonvolatile
                                  memory hierarchies . . . . . . . . . . . 23:1--23:??
                Jishen Zhao and   
                Guangyu Sun and   
             Gabriel H. Loh and   
                       Yuan Xie   Optimizing GPU energy efficiency with
                                  3D die-stacking graphics memory and
                                  reconfigurable memory interface  . . . . 24:1--24:??
             Chien-Chi Chen and   
                  Sheng-De Wang   An efficient multicharacter transition
                                  string-matching engine based on the
                                  Aho--Corasick algorithm  . . . . . . . . 25:1--25:??
               Yangchun Luo and   
              Wei-Chung Hsu and   
                   Antonia Zhai   The design and implementation of
                                  heterogeneous multicore systems for
                                  energy-efficient speculative thread
                                  execution  . . . . . . . . . . . . . . . 26:1--26:??
                 Dyer Rolán and   
        Basilio B. Fraguela and   
                   Ramón Doallo   Virtually split cache: an efficient
                                  mechanism to distribute instructions and
                                  data . . . . . . . . . . . . . . . . . . 27:1--27:??
      Samantika Subramaniam and   
            Simon C. Steely and   
           Will Hasenplaugh and   
               Aamer Jaleel and   
              Carl Beckmann and   
             Tryggve Fossum and   
                      Joel Emer   Using in-flight chains to build a
                                  scalable cache coherence protocol  . . . 28:1--28:??
             Daniel Sánchez and   
         Yiannakis Sazeides and   
            Juan M. Cebrián and   
             José M. García and   
                 Juan L. Aragón   Modeling the impact of permanent faults
                                  in caches  . . . . . . . . . . . . . . . 29:1--29:??
               Sanghoon Lee and   
                     James Tuck   Automatic parallelization of
                                  fine-grained metafunctions on a chip
                                  multiprocessor . . . . . . . . . . . . . 30:1--30:??
          Christophe Dubach and   
           Timothy M. Jones and   
               Edwin V. Bonilla   Dynamic microarchitectural adaptation
                                  using machine learning . . . . . . . . . 31:1--31:??
                  Long Chen and   
                  Yanan Cao and   
                     Zhao Zhang   E³CC: a memory error protection
                                  scheme with novel address mapping for
                                  subranked and low-power memories . . . . 32:1--32:??
              Yingying Tian and   
             Samira M. Khan and   
              Daniel A. Jiménez   Temporal-based multilevel correlating
                                  inclusive cache replacement  . . . . . . 33:1--33:??
                 Qixiao Liu and   
              Miquel Moreto and   
             Victor Jimenez and   
               Jaume Abella and   
       Francisco J. Cazorla and   
                   Mateo Valero   Hardware support for accurate per-task
                                  energy metering in multicore systems . . 34:1--34:??
               Sanyam Mehta and   
            Gautham Beeraka and   
                  Pen-Chung Yew   Tile size selection revisited  . . . . . 35:1--35:??
           Bogdan Prisacari and   
           German Rodriguez and   
          Cyriel Minkenberg and   
                Torsten Hoefler   Fast pattern-specific routing for fat
                                  tree networks  . . . . . . . . . . . . . 36:1--36:??
      Maximilien B. Breughe and   
                Lieven Eeckhout   Selecting representative benchmark
                                  inputs for exploring microprocessor
                                  design spaces  . . . . . . . . . . . . . 37:1--37:??
     Christoph Kerschbaumer and   
              Eric Hennigan and   
                 Per Larsen and   
          Stefan Brunthaler and   
                  Michael Franz   Information flow tracking meets
                                  just-in-time compilation . . . . . . . . 38:1--38:??
                   Rupesh Nasre   Time- and space-efficient flow-sensitive
                                  points-to analysis . . . . . . . . . . . 39:1--39:??
                Wenjia Ruan and   
                  Yujie Liu and   
                  Michael Spear   Boosting timestamp-based transactional
                                  memory by exploiting hardware cycle
                                  counters . . . . . . . . . . . . . . . . 40:1--40:??
                 Tanima Dey and   
                   Wei Wang and   
           Jack W. Davidson and   
                 Mary Lou Soffa   ReSense: Mapping dynamic workloads of
                                  colocated multithreaded applications
                                  using resource sensitivity . . . . . . . 41:1--41:??
             Adrià Armejach and   
            Ruben Titos-Gil and   
                Anurag Negi and   
             Osman S. Unsal and   
                 Adrián Cristal   Techniques to improve performance in
                                  requester-wins hardware transactional
                                  memory . . . . . . . . . . . . . . . . . 42:1--42:??
             Myeongjae Jeon and   
                Conglong Li and   
                Alan L. Cox and   
                   Scott Rixner   Reducing DRAM row activations with eager
                                  read/write clustering  . . . . . . . . . 43:1--43:??
                Zhijia Zhao and   
           Michael Bebenita and   
                Dave Herman and   
                Jianhua Sun and   
                    Xipeng Shen   HPar: a practical parallel parser for
                                  HTML --- taming HTML complexities for
                                  parallel parsing . . . . . . . . . . . . 44:1--44:??
               Ehsan Totoni and   
                Mert Dikmen and   
           María Jesús Garzarán   Easy, fast, and energy-efficient object
                                  detection on heterogeneous on-chip
                                  architectures  . . . . . . . . . . . . . 45:1--45:??
      Viacheslav V. Fedorov and   
                  Sheng Qiu and   
      A. L. Narasimha Reddy and   
                  Paul V. Gratz   ARI: Adaptive LLC-memory traffic
                                  management . . . . . . . . . . . . . . . 46:1--46:??
   Cecilia González-Álvarez and   
         Jennifer B. Sartor and   
             Carlos Álvarez and   
    Daniel Jiménez-González and   
                Lieven Eeckhout   Accelerating an application domain with
                                  specialized functional units . . . . . . 47:1--47:??
               Xiaolin Wang and   
               Lingmei Weng and   
               Zhenlin Wang and   
                    Yingwei Luo   Revisiting memory management on
                                  virtualized environments . . . . . . . . 48:1--48:??
              Chuntao Jiang and   
                  Zhibin Yu and   
                    Hai Jin and   
              Chengzhong Xu and   
            Lieven Eeckhout and   
                Wim Heirman and   
          Trevor E. Carlson and   
                   Xiaofei Liao   PCantorSim: Accelerating parallel
                                  architecture simulation through
                                  fractal-based sampling . . . . . . . . . 49:1--49:??
               Srdan Stipić and   
           Vesna Smiljković and   
                Osman Unsal and   
             Adrián Cristal and   
                   Mateo Valero   Profile-guided transaction
                                  coalescing-lowering transactional
                                  overheads by merging transactions  . . . 50:1--50:??
                   Zhe Wang and   
              Shuchang Shan and   
                   Ting Cao and   
                   Junli Gu and   
                      Yi Xu and   
                   Shuai Mu and   
                   Yuan Xie and   
              Daniel A. Jiménez   WADE: Writeback-aware dynamic cache
                                  management for NVM-based main memory
                                  system . . . . . . . . . . . . . . . . . 51:1--51:??
                    Yong Li and   
               Yaojun Zhang and   
                     Hai Li and   
                 Yiran Chen and   
                  Alex K. Jones   C1C: a configurable, compiler-guided
                                  STT-RAM L1 cache . . . . . . . . . . . . 52:1--52:??
              Naznin Fauzia and   
            Venmugil Elango and   
         Mahesh Ravishankar and   
               J. Ramanujam and   
           Fabrice Rastello and   
             Atanas Rountev and   
         Louis-Noël Pouchet and   
                  P. Sadayappan   Beyond reuse distance analysis: Dynamic
                                  analysis for characterization of data
                                  locality potential . . . . . . . . . . . 53:1--53:??
          Alen Bardizbanyan and   
           Magnus Själander and   
              David Whalley and   
            Per Larsson-Edefors   Designing a practical data filter cache
                                  to improve both energy efficiency and
                                  performance  . . . . . . . . . . . . . . 54:1--54:??
            Andrei Hagiescu and   
                   Bing Liu and   
              R. Ramanathan and   
  Sucheendra K. Palaniappan and   
                  Zheng Cui and   
       Bipasa Chattopadhyay and   
          P. S. Thiagarajan and   
                  Weng-Fai Wong   GPU code generation for ODE-based
                                  applications with phased shared-data
                                  access patterns  . . . . . . . . . . . . 55:1--55:??
                Junghee Lee and   
    Chrysostomos Nicopoulos and   
              Hyung Gyu Lee and   
                    Jongman Kim   TornadoNoC: a lightweight and scalable
                                  on-chip network architecture for the
                                  many-core era  . . . . . . . . . . . . . 56:1--56:??
           Christos Strydis and   
          Robert M. Seepers and   
          Pedro Peris-Lopez and   
           Dimitrios Siskos and   
                Ioannis Sourdis   A system architecture, processor, and
                                  communication protocol for secure
                                  implants . . . . . . . . . . . . . . . . 57:1--57:??
                 Wonsub Kim and   
               Yoonseo Choi and   
                    Haewoo Park   Fast modulo scheduler utilizing
                                  patternized routes for coarse-grained
                                  reconfigurable architectures . . . . . . 58:1--58:??
               Dorit Nuzman and   
               Revital Eres and   
              Sergei Dyshel and   
         Marcel Zalmanovici and   
                  Jose Castanos   JIT technology with C/C++:
                                  Feedback-directed dynamic recompilation
                                  for statically compiled languages  . . . 59:1--59:??
          Thejas Ramashekar and   
                Uday Bondhugula   Automatic data allocation and buffer
                                  management for multi-GPU machines  . . . 60:1--60:??
        Hans Vandierendonck and   
            George Tzenakis and   
      Dimitrios S. Nikolopoulos   Analysis of dependence tracking
                                  algorithms for task dataflow execution   61:1--61:??
             Yeonghun Jeong and   
              Seongseok Seo and   
                    Jongeun Lee   Evaluator-executor transformation for
                                  efficient pipelining of loops with
                                  conditionals . . . . . . . . . . . . . . 62:1--62:??
           Rajkishore Barik and   
               Jisheng Zhao and   
                   Vivek Sarkar   A decoupled non-SSA global register
                                  allocation using bipartite liveness
                                  graphs . . . . . . . . . . . . . . . . . 63:1--63:??
                Peter Gavin and   
              David Whalley and   
               Magnus Själander   Reducing instruction fetch energy in
                                  multi-issue processors . . . . . . . . . 64:1--64:??
                      Anonymous   List of distinguished reviewers ACM TACO 65:1--65:??

ACM Transactions on Architecture and Code Optimization
Volume 11, Number 1, February, 2014

                Neeraj Goel and   
               Anshul Kumar and   
            Preeti Ranjan Panda   Shared-port register file architecture
                                  for low-energy VLIW processors . . . . . 1:1--1:32
                 Zheng Wang and   
       Georgios Tournavitis and   
               Björn Franke and   
          Michael F. P. O'Boyle   Integrating profile-driven parallelism
                                  detection and machine-learning-based
                                  mapping  . . . . . . . . . . . . . . . . 2:1--2:26
             Mehrzad Samadi and   
               Amir Hormati and   
              Janghaeng Lee and   
                   Scott Mahlke   Leveraging GPUs using cooperative loop
                                  speculation  . . . . . . . . . . . . . . 3:1--3:26
                   Jue Wang and   
               Xiangyu Dong and   
                   Yuan Xie and   
               Norman P. Jouppi   Endurance-aware cache line management
                                  for non-volatile caches  . . . . . . . . 4:1--4:24
                    Lei Liu and   
                  Zehan Cui and   
                    Yong Li and   
                Yungang Bao and   
                Mingyu Chen and   
                   Chengyong Wu   BPM/BPM+: Software-based dynamic memory
                                  partitioning mechanisms for mitigating
                                  DRAM bank-/channel-level interferences
                                  in multicore systems . . . . . . . . . . 5:1--5:28
            Christian Häubl and   
           Christian Wimmer and   
           Hanspeter Mössenböck   Trace transitioning and exception
                                  handling in a trace-based JIT compiler
                                  for Java . . . . . . . . . . . . . . . . 6:1--6:26
             Yongbing Huang and   
               Licheng Chen and   
                  Zehan Cui and   
                  Yuan Ruan and   
                Yungang Bao and   
                Mingyu Chen and   
                    Ninghui Sun   HMTT: a hybrid hardware/software tracing
                                  system for bridging the DRAM access
                                  trace's semantic gap . . . . . . . . . . 7:1--7:25
                  Quan Chen and   
                      Minyi Guo   Adaptive workload-aware task scheduling
                                  for single-ISA asymmetric multicore
                                  architectures  . . . . . . . . . . . . . 8:1--8:25
     Gülfem Savrun-Yeniçeri and   
                  Wei Zhang and   
               Huahan Zhang and   
               Eric Seckler and   
                    Chen Li and   
          Stefan Brunthaler and   
                 Per Larsen and   
                  Michael Franz   Efficient hosted interpreters on the JVM 9:1--9:24
           Prashant J. Nair and   
             Chia-Chen Chou and   
           Moinuddin K. Qureshi   Refresh pausing in DRAM memory systems   10:1--10:26
                Komal Jothi and   
                 Haitham Akkary   Tuning the continual flow pipeline
                                  architecture with virtual register
                                  renaming . . . . . . . . . . . . . . . . 11:1--11:27
               Thomas Carle and   
         Dumitru Potop-Butucaru   Predicate-aware, makespan-preserving
                                  software pipelining of scheduling tables 12:1--12:26
        Angeliki Kritikakou and   
           Francky Catthoor and   
        Vasilios Kelefouras and   
                  Costas Goutis   A scalable and near-optimal
                                  representation of access schemes for
                                  memory management  . . . . . . . . . . . 13:1--13:25
               Hugh Leather and   
              Edwin Bonilla and   
                Michael O'Boyle   Automatic feature generation for machine
                                  learning--based optimising compilation   14:1--14:32

ACM Transactions on Architecture and Code Optimization
Volume 11, Number 2, July, 2014

                Theo Kluter and   
               Samuel Burri and   
               Philip Brisk and   
            Edoardo Charbon and   
                    Paolo Ienne   Virtual Ways: Low-Cost Coherence for
                                  Instruction Set Extensions with
                                  Architecturally Visible Storage  . . . . 15:1--15:26
                    Bin Ren and   
             Todd Mytkowicz and   
                  Gagan Agrawal   A Portable Optimization Engine for
                                  Accelerating Irregular Data-Traversal
                                  Applications on SIMD Architectures . . . 16:1--16:??
                Zhengwei Qi and   
                Jianguo Yao and   
                 Chao Zhang and   
                    Miao Yu and   
               Zhizhou Yang and   
                   Haibing Guan   VGRIS: Virtualized GPU Resource
                                  Isolation and Scheduling in Cloud Gaming 17:1--17:25
               Bor-Yeh Shen and   
              Wei-Chung Hsu and   
                       Wuu Yang   A Retargetable Static Binary Translator
                                  for the ARM Architecture . . . . . . . . 18:1--18:??
        Darío Suárez Gracia and   
         Alexandra Ferrerón and   
   Luis Montesano Del Campo and   
       Teresa Monreal Arnal and   
           Víctor Viñals Yúfera   Revisiting LP-NUCA Energy Consumption:
                                  Cache Access Policies and Adaptive Block
                                  Dropping . . . . . . . . . . . . . . . . 19:1--19:??
               Zhibin Liang and   
                  Wei Zhang and   
                  Yung-Cheng Ma   Deadline-Constrained Clustered
                                  Scheduling for VLIW Architectures using
                                  Power-Gated Register Files . . . . . . . 20:1--20:26
              Shuangde Fang and   
                  Zidong Du and   
                Yuntan Fang and   
              Yuanjie Huang and   
                  Yang Chen and   
            Lieven Eeckhout and   
              Olivier Temam and   
                  Huawei Li and   
                 Yunji Chen and   
                   Chengyong Wu   Performance Portability Across
                                  Heterogeneous SoCs Using a Generalized
                                  Library-Based Approach . . . . . . . . . 21:1--21:??
        Abdulrahman Kaitoua and   
                 Hazem Hajj and   
         Mazen A. R. Saghir and   
              Hassan Artail and   
             Haitham Akkary and   
              Mariette Awad and   
        Mageda Sharafeddine and   
                Khaleel Mershad   Hadoop Extensions for Distributed
                                  Computing on Reconfigurable Active SSD
                                  Clusters . . . . . . . . . . . . . . . . 22:1--22:??

ACM Transactions on Architecture and Code Optimization
Volume 11, Number 3, October, 2014

                   Jue Wang and   
               Xiangyu Dong and   
                       Yuan Xie   Preventing STT-RAM Last-Level Caches
                                  from Port Obstruction  . . . . . . . . . 23:1--23:??
        M. A. Gonzalez-Mesa and   
           Eladio Gutierrez and   
           Emilio L. Zapata and   
                    Oscar Plata   Effective Transactional Memory Execution
                                  Management for Improved Concurrency  . . 24:1--24:??
               Rakesh Kumar and   
         Alejandro Martínez and   
               Antonio González   Efficient Power Gating of SIMD
                                  Accelerators Through Dynamic Selective
                                  Devectorization in an HW/SW Codesigned
                                  Environment  . . . . . . . . . . . . . . 25:1--25:??
           Stefano Di Carlo and   
          Salvatore Galfano and   
               Marco Indaco and   
             Paolo Prinetto and   
            Davide Bertozzi and   
                Piero Olivo and   
              Cristian Zambelli   FLARES: an Aging Aware Algorithm to
                                  Autonomously Adapt the Error Correction
                                  Capability in NAND Flash Memories  . . . 26:1--26:??
        Davide B. Bartolini and   
             Filippo Sironi and   
           Donatella Sciuto and   
          Marco D. Santambrogio   Automated Fine-Grained CPU Provisioning
                                  for Virtual Machines . . . . . . . . . . 27:1--27:??
          Trevor E. Carlson and   
                Wim Heirman and   
              Stijn Eyerman and   
                Ibrahim Hur and   
                Lieven Eeckhout   An Evaluation of High-Level Mechanistic
                                  Core Models  . . . . . . . . . . . . . . 28:1--28:??
              Farrukh Hijaz and   
                      Omer Khan   NUCA-L1: a Non-Uniform Access Latency
                                  Level-1 Cache Architecture for
                                  Multicores Operating at Near-Threshold
                                  Voltages . . . . . . . . . . . . . . . . 29:1--29:??
                Andi Drebes and   
           Karine Heydemann and   
             Nathalie Drach and   
                Antoniu Pop and   
                   Albert Cohen   Topology-Aware and Dependence-Aware
                                  Scheduling and Memory Allocation for
                                  Task-Parallel Languages  . . . . . . . . 30:1--30:??
        Venkata Kalyan Tawa and   
                 Ravi Kasha and   
                   Madhu Mutyam   EFGR: an Enhanced Fine Granularity
                                  Refresh Feature for High-Performance
                                  DDR4 DRAM Devices  . . . . . . . . . . . 31:1--31:??
               Gulay Yalcin and   
                 Oguz Ergin and   
                Emrah Islek and   
          Osman Sabri Unsal and   
                 Adrian Cristal   Exploiting Existing Comparators for
                                  Fine-Grained Low-Cost Error Detection    32:1--32:??
       Pradeep Ramachandran and   
     Siva Kumar Sastry Hari and   
                  Manlap Li and   
                 Sarita V. Adve   Hardware Fault Recovery for I/O
                                  Intensive Applications . . . . . . . . . 33:1--33:??
              Stijn Eyerman and   
             Pierre Michaud and   
                 Wouter Rogiest   Multiprogram Throughput Metrics: a
                                  Systematic Approach  . . . . . . . . . . 34:1--34:??

ACM Transactions on Architecture and Code Optimization
Volume 11, Number 4, January, 2015

            Cedric Nugteren and   
                 Henk Corporaal   Bones: an Automatic Skeleton-Based
                                  C-to-CUDA Compiler for GPUs  . . . . . . 35:1--35:??
                   Jue Wang and   
               Xiangyu Dong and   
                       Yuan Xie   Building and Optimizing MRAM-Based
                                  Commodity Memories . . . . . . . . . . . 36:1--36:??
         Rakesh Komuravelli and   
             Sarita V. Adve and   
                Ching-Tsun Chou   Revisiting the Complexity of Hardware
                                  Cache Coherence and Some Implications    37:1--37:??
          Gabriel Rodríguez and   
               Juan Touriño and   
             Mahmut T. Kandemir   Volatile STT--RAM Scratchpad Design and
                                  Data Allocation for Low Energy . . . . . 38:1--38:??
         Cristóbal Camarero and   
            Enrique Vallejo and   
                  Ramón Beivide   Topological Characterization of Hamming
                                  and Dragonfly Networks and Its
                                  Implications on Routing  . . . . . . . . 39:1--39:??
                Hanbin Yoon and   
                Justin Meza and   
       Naveen Muralimanohar and   
           Norman P. Jouppi and   
                     Onur Mutlu   Efficient Data Mapping and Buffering
                                  Techniques for Multilevel Cell
                                  Phase-Change Memories  . . . . . . . . . 40:1--40:??
       Nathanael Prémillieu and   
                   André Seznec   Efficient Out-of-Order Execution of
                                  Guarded ISAs . . . . . . . . . . . . . . 41:1--41:??
                 Zheng Wang and   
              Dominik Grewe and   
          Michael F. P. O'Boyle   Automatic and Portable Mapping of Data
                                  Parallel Programs to OpenCL for
                                  GPU-Based Heterogeneous Systems  . . . . 42:1--42:??
                     Dan He and   
                  Fang Wang and   
                 Hong Jiang and   
                   Dan Feng and   
              Jing Ning Liu and   
                   Wei Tong and   
                    Zheng Zhang   Improving Hybrid FTL by Fully Exploiting
                                  Internal SSD Parallelism with Virtual
                                  Blocks . . . . . . . . . . . . . . . . . 43:1--43:??
                  Eri Rubin and   
                   Ely Levy and   
                Amnon Barak and   
                    Tal Ben-Nun   MAPS: Optimizing Massively Parallel
                                  Applications Using Device-Level Memory
                                  Abstraction  . . . . . . . . . . . . . . 44:1--44:??
         Alessandro Cilardo and   
                     Luca Gallo   Improving Multibank Memory Access
                                  Parallelism with Lattice-Based
                                  Partitioning . . . . . . . . . . . . . . 45:1--45:??
       Jan Kasper Martinsen and   
                Håkan Grahn and   
                  Anders Isberg   The Effects of Parameter Tuning in
                                  Software Thread-Level Speculation in
                                  JavaScript Engines . . . . . . . . . . . 46:1--46:??
           Quentin Colombet and   
           Florian Brandner and   
                    Alain Darte   Studying Optimal Spilling in the Light
                                  of SSA . . . . . . . . . . . . . . . . . 47:1--47:??
            Jawad Haj-Yihia and   
             Yosi Ben Asher and   
               Efraim Rotem and   
                Ahmad Yasin and   
                    Ran Ginosar   Compiler-Directed Power Management for
                                  Superscalars . . . . . . . . . . . . . . 48:1--48:??
            Hong-Phuc Trinh and   
              Marc Duranton and   
             Michel Paindavoine   Efficient Data Encoding for
                                  Convolutional Neural Network application 49:1--49:??
       Maximilien B. Breugh and   
              Stijn Eyerman and   
                Lieven Eeckhout   Mechanistic Analytical Modeling of
                                  Superscalar In-Order Processor
                                  Performance  . . . . . . . . . . . . . . 50:1--50:??
             Vivek Seshadri and   
             Samihan Yedkar and   
                 Hongyi Xin and   
                 Onur Mutlu and   
         Phillip B. Gibbons and   
          Michael A. Kozuch and   
                  Todd C. Mowry   Mitigating Prefetcher-Caused Pollution
                                  Using Informed Caching Policies for
                                  Prefetched Blocks  . . . . . . . . . . . 51:1--51:??
             George Matheou and   
           Paraskevas Evripidou   Architectural Support for Data-Driven
                                  Execution  . . . . . . . . . . . . . . . 52:1--52:??
                 Amir Morad and   
              Leonid Yavits and   
                    Ran Ginosar   GP--SIMD Processing-in-Memory  . . . . . 53:1--53:??
              Thomas Schaub and   
                 Simon Moll and   
            Ralf Karrenberg and   
                 Sebastian Hack   The Impact of the SIMD Width on
                                  Control-Flow and Memory Divergence . . . 54:1--54:??
               Zhenman Fang and   
               Sanyam Mehta and   
              Pen-Chung Yew and   
               Antonia Zhai and   
             James Greensky and   
            Gautham Beeraka and   
                     Binyu Zang   Measuring Microarchitectural Details of
                                  Multi- and Many-Core Memory Systems
                                  through Microbenchmarking  . . . . . . . 55:1--55:??
              Chi Ching Chi and   
      Mauricio Alvarez-Mesa and   
                   Ben Juurlink   Low-Power High-Efficiency Video Decoding
                                  using General-Purpose Processors . . . . 56:1--56:??
             Fabio Luporini and   
       Ana Lucia Varbanescu and   
          Florian Rathgeber and   
     Gheorghe-Teodor Bercea and   
               J. Ramanujam and   
               David A. Ham and   
               Paul H. J. Kelly   Cross-Loop Optimization of Arithmetic
                                  Intensity for Finite Element Local
                                  Assembly . . . . . . . . . . . . . . . . 57:1--57:??
                  Xing Zhou and   
          María J. Garzarán and   
                 David A. Padua   Optimal Parallelogram Selection for
                                  Hierarchical Tiling  . . . . . . . . . . 58:1--58:??
                 Leo Porter and   
      Michael A. Laurenzano and   
              Ananta Tiwari and   
                 Adam Jundt and   
       William A. Ward, Jr. and   
               Roy Campbell and   
               Laura Carrington   Making the Most of SMT in HPC: System-
                                  and Application-Level Perspectives . . . 59:1--59:??
                   Xin Tong and   
             Toshihiko Koju and   
          Motohiro Kawahito and   
               Andreas Moshovos   Optimizing Memory Translation Emulation
                                  in Full System Emulators . . . . . . . . 60:1--60:??
                Martin Kong and   
                Antoniu Pop and   
         Louis-Noël Pouchet and   
            R. Govindarajan and   
               Albert Cohen and   
                  P. Sadayappan   Compiler/Runtime Framework for Dynamic
                                  Dataflow Parallelization of Tiled
                                  Programs . . . . . . . . . . . . . . . . 61:1--61:??
              Nicolas Melot and   
          Christoph Kessler and   
                Jörg Keller and   
           Patrick Eitschberger   Fast Crown Scheduling Heuristics for
                                  Energy-Efficient Mapping and Scaling of
                                  Moldable Streaming Tasks on Manycore
                                  Systems  . . . . . . . . . . . . . . . . 62:1--62:??
                Wenjia Ruan and   
                  Yujie Liu and   
                  Michael Spear   Transactional Read-Modify-Write Without
                                  Aborts . . . . . . . . . . . . . . . . . 63:1--63:??
                Zia Ul Huda and   
              Ali Jannesari and   
                     Felix Wolf   Using Template Matching to Infer
                                  Parallel Design Patterns . . . . . . . . 64:1--64:??
                Heiner Litz and   
            Ricardo J. Dias and   
              David R. Cheriton   Efficient Correction of Anomalies in
                                  Snapshot Isolation Transactions  . . . . 65:1--65:??
              Helge Bahmann and   
             Nico Reissmann and   
               Magnus Jahre and   
            Jan Christian Meyer   Perfect Reconstructability of Control
                                  Flow from Demand Dependence Graphs . . . 66:1--66:??
            Venmugil Elango and   
            Naser Sedaghati and   
           Fabrice Rastello and   
         Louis-Noël Pouchet and   
               J. Ramanujam and   
            Radu Teodorescu and   
                  P. Sadayappan   On Using the Roofline Model with Lower
                                  Bounds on Data Movement  . . . . . . . . 67:1--67:??
                      Anonymous   List of Distinguished Reviewers ACM TACO
                                  2014 . . . . . . . . . . . . . . . . . . 68:1--68:??

ACM Transactions on Architecture and Code Optimization
Volume 12, Number 1, April, 2015

         Christopher Zimmer and   
                  Frank Mueller   NoCMsg: a Scalable Message-Passing
                                  Abstraction for Network-on-Chips . . . . 1:1--1:??
           Beayna Grigorian and   
                  Glenn Reinman   Accelerating Divergent Applications on
                                  SIMD Architectures Using Neural Networks 2:1--2:??
                 Anup Holey and   
             Vineeth Mekkat and   
              Pen-Chung Yew and   
                   Antonia Zhai   Performance-Energy Considerations for
                                  Shared Cache Management in a
                                  Heterogeneous Multicore Processor  . . . 3:1--3:??
                  Jinho Suh and   
           Chieh-Ting Huang and   
                  Michel Dubois   Dynamic MIPS Rate Stabilization for
                                  Complex Processors . . . . . . . . . . . 4:1--4:??
             Naghmeh Karimi and   
    Arun Karthik Kanuparthi and   
               Xueyang Wang and   
            Ozgur Sinanoglu and   
                   Ramesh Karri   MAGIC: Malicious Aging in Circuits/Cores 5:1--5:??
   Pablo De Oliveira Castro and   
                 Chadi Akel and   
                 Eric Petit and   
               Mihail Popov and   
                  William Jalby   CERE: LLVM-Based Codelet Extractor and
                                  REplayer for Piecewise Benchmarking and
                                  Optimization . . . . . . . . . . . . . . 6:1--6:??
         Benedict R. Gaster and   
                Derek Hower and   
                      Lee Howes   HRF-Relaxed: Adapting HRF to the
                                  Complexities of Industrial Heterogeneous
                                  Memory Models  . . . . . . . . . . . . . 7:1--7:??
               Kevin Streit and   
          Johannes Doerfert and   
          Clemens Hammacher and   
             Andreas Zeller and   
                 Sebastian Hack   Generalized Task Parallelism . . . . . . 8:1--8:??

ACM Transactions on Architecture and Code Optimization
Volume 12, Number 2, July, 2015

               Hamed Tabkhi and   
                 Gunar Schirner   A Joint SW/HW Approach for Reducing
                                  Register File Vulnerability  . . . . . . 9:1--9:??
            Arun Kanuparthi and   
                   Ramesh Karri   Reliable Integrity Checking in Multicore
                                  Processors . . . . . . . . . . . . . . . 10:1--10:??
                Do-Heon Lee and   
              Su-Kyung Yoon and   
              Jung-Geun Kim and   
           Charles C. Weems and   
                   Shin-Dug Kim   A New Memory-Disk Integrated System with
                                  HW Optimizer . . . . . . . . . . . . . . 11:1--11:??
 Morteza Mohajjel Kafshdooz and   
                 Alireza Ejlali   Dynamic Shared SPM Reuse for Real-Time
                                  Multicore Embedded Systems . . . . . . . 12:1--12:??
                 Wenhao Jia and   
                 Elba Garza and   
              Kelly A. Shaw and   
             Margaret Martonosi   GPU Performance and Power Tuning Using
                                  Regression Trees . . . . . . . . . . . . 13:1--13:??
          Irshad Pananilath and   
            Aravind Acharya and   
              Vinay Vasista and   
                Uday Bondhugula   An Optimizing Code Generator for a Class
                                  of Lattice-Boltzmann Computations  . . . 14:1--14:??
              Shuangde Fang and   
                  Wenwen Xu and   
                  Yang Chen and   
            Lieven Eeckhout and   
              Olivier Temam and   
                 Yunji Chen and   
               Chengyong Wu and   
                  Xiaobing Feng   Practical Iterative Optimization for the
                                  Data Center  . . . . . . . . . . . . . . 15:1--15:??
                  Tao Zhang and   
               Naifeng Jing and   
              Kaiming Jiang and   
                    Wei Shu and   
                 Min-You Wu and   
                  Xiaoyao Liang   Buddy SM: Sharing Pipeline Front-End for
                                  Improved Energy Efficiency in GPGPUs . . 16:1--16:??
           Hsiang-Yun Cheng and   
               Matt Poremba and   
             Narges Shahidi and   
                Ivan Stalev and   
            Mary Jane Irwin and   
            Mahmut Kandemir and   
               Jack Sampson and   
                       Yuan Xie   EECache: a Comprehensive Study on the
                                  Architectural Design for
                                  Energy-Efficient Last-Level Caches in
                                  Chip Multiprocessors . . . . . . . . . . 17:1--17:??
               Arjun Suresh and   
    Bharath Narasimha Swamy and   
                Erven Rohou and   
                   André Seznec   Intercepting Functions for Memoization:
                                  a Case Study Using Transcendental
                                  Functions  . . . . . . . . . . . . . . . 18:1--18:??
           Chung-Hsiang Lin and   
                 De-Yu Shen and   
               Yi-Jung Chen and   
              Chia-Lin Yang and   
        Cheng-Yuan Michael Wang   SECRET: a Selective Error Correction
                                  Framework for Refresh Energy Reduction
                                  in DRAMs . . . . . . . . . . . . . . . . 19:1--19:??
                 Doug Simon and   
           Christian Wimmer and   
             Bernhard Urban and   
             Gilles Duboscq and   
              Lukas Stadler and   
              Thomas Würthinger   Snippets: Taking the High Road to a Low
                                  Level  . . . . . . . . . . . . . . . . . 20:1--20:??
 Raghuraman Balasubramanian and   
            Vinay Gangadhar and   
                Ziliang Guo and   
                Chen-Han Ho and   
              Cherin Joseph and   
          Jaikrishnan Menon and   
        Mario Paulo Drumond and   
                 Robin Paul and   
             Sharath Prasad and   
            Pradip Valathol and   
      Karthikeyan Sankaralingam   Enabling GPGPU Low-Level Hardware
                                  Explorations with MIAOW: an Open-Source
                                  RTL Implementation of a GPGPU  . . . . . 21:1--21:??
                  Quan Chen and   
                      Minyi Guo   Locality-Aware Work Stealing Based on
                                  Online Profiling and Auto-Tuning for
                                  Multisocket Multicore Architectures  . . 22:1--22:??
                  Madan Das and   
           Gabriel Southern and   
                     Jose Renau   Section-Based Program Analysis to Reduce
                                  Overhead of Detecting Unsynchronized
                                  Thread Communication . . . . . . . . . . 23:1--23:??
                Atieh Lotfi and   
               Abbas Rahimi and   
                Luca Benini and   
                Rajesh K. Gupta   Aging-Aware Compilation for GP-GPUs  . . 24:1--24:??
           Brian P. Railing and   
               Eric R. Hein and   
                Thomas M. Conte   Contech: Efficiently Generating Dynamic
                                  Task Graphs for Arbitrary Parallel
                                  Programs . . . . . . . . . . . . . . . . 25:1--25:??

ACM Transactions on Architecture and Code Optimization
Volume 12, Number 3, October, 2015

              Mahdad Davari and   
                Alberto Ros and   
             Erik Hagersten and   
               Stefanos Kaxiras   The Effects of Granularity and
                                  Adaptivity on Private/Shared
                                  Classification for Coherence . . . . . . 26:1--26:??
              Mark Gottscho and   
       Abbas BanaiyanMofrad and   
                 Nikil Dutt and   
               Alex Nicolau and   
                   Puneet Gupta   DPCS: Dynamic Power/Capacity Scaling for
                                  SRAM Caches in the Nanoscale Era . . . . 27:1--27:??
             Pierre Michaud and   
            Andrea Mondelli and   
                   André Seznec   Revisiting Clustered Microarchitecture
                                  for Future Superscalar Cores: a Case for
                                  Wide Issue Clusters  . . . . . . . . . . 28:1--28:??
       Ragavendra Natarajan and   
                   Antonia Zhai   Leveraging Transactional Execution for
                                  Memory Consistency Model Emulation . . . 29:1--29:??
          Biswabandan Panda and   
           Shankar Balachandran   CAFFEINE: a Utility-Driven Prefetcher
                                  Aggressiveness Engine for Multicores . . 30:1--30:??
                Jishen Zhao and   
                   Sheng Li and   
              Jichuan Chang and   
              John L. Byrne and   
           Laura L. Ramirez and   
                  Kevin Lim and   
                   Yuan Xie and   
               Paolo Faraboschi   Buri: Scaling Big-Memory Computing with
                                  Hardware-Based Memory Expansion  . . . . 31:1--31:??
                  Jan Lucas and   
           Michael Andersch and   
      Mauricio Alvarez-Mesa and   
                   Ben Juurlink   Spatiotemporal SIMT and Scalarization
                                  for Improving GPU Efficiency . . . . . . 32:1--32:??

ACM Transactions on Architecture and Code Optimization
Volume 12, Number 4, January, 2016

               Subhasis Das and   
              Tor M. Aamodt and   
               William J. Dally   Reuse Distance-Based Probabilistic Cache
                                  Replacement  . . . . . . . . . . . . . . 33:1--33:??
                 Etem Deniz and   
                      Alper Sen   MINIME-GPU: Multicore Benchmark
                                  Synthesizer for GPUs . . . . . . . . . . 34:1--34:??
                     Li Tan and   
               Zizhong Chen and   
             Shuaiwen Leon Song   Scalable Energy Efficiency with
                                  Resilience for High Performance
                                  Computing Systems: a Quantitative
                                  Methodology  . . . . . . . . . . . . . . 35:1--35:??
     Kishore Kumar Pusukuri and   
                Rajiv Gupta and   
                Laxmi N. Bhuyan   Tumbler: an Effective Load-Balancing
                                  Technique for Multi-CPU Multicore
                                  Systems  . . . . . . . . . . . . . . . . 36:1--36:??
                Erik Tomusk and   
          Christophe Dubach and   
                Michael O'Boyle   Four Metrics to Evaluate Heterogeneous
                                  Multicores . . . . . . . . . . . . . . . 37:1--37:??
        Morteza Hoseinzadeh and   
          Mohammad Arjomand and   
             Hamid Sarbazi-Azad   SPCM: The Striped Phase Change Memory    38:1--38:??
              Chuntao Jiang and   
                  Zhibin Yu and   
            Lieven Eeckhout and   
                    Hai Jin and   
               Xiaofei Liao and   
                  Chengzhong Xu   Two-Level Hybrid Sampled Simulation of
                                  Multithreaded Applications . . . . . . . 39:1--39:??
            Sandeep D'souza and   
                  Soumya J. and   
          Santanu Chattopadhyay   Integrated Mapping and Synthesis
                                  Techniques for Network-on-Chip
                                  Topologies with Express Channels . . . . 40:1--40:??
         Dimitrios Chasapis and   
                 Marc Casas and   
              Miquel Moretó and   
                 Raul Vidal and   
             Eduard Ayguadé and   
              Jesús Labarta and   
                   Mateo Valero   PARSECSs: Evaluating the Impact of Task
                                  Parallelism in the PARSEC Benchmark
                                  Suite  . . . . . . . . . . . . . . . . . 41:1--41:??
           Francisco Gaspar and   
                Luis Taniça and   
                Pedro Tomás and   
            Aleksandar Ilic and   
                   Leonel Sousa   A Framework for Application-Guided Task
                                  Management on Heterogeneous Embedded
                                  Systems  . . . . . . . . . . . . . . . . 42:1--42:??
         Ehsan K. Ardestani and   
  Rafael Trapani Possignolo and   
             Jose Luis Briz and   
                     Jose Renau   Managing Mismatches in Voltage Stacking
                                  with CoreUnfolding . . . . . . . . . . . 43:1--43:??
           Prashant J. Nair and   
           David A. Roberts and   
           Moinuddin K. Qureshi   FaultSim: a Fast, Configurable
                                  Memory-Reliability Simulator for
                                  Conventional and $3$D-Stacked Systems    44:1--44:??
                Byeongcheol Lee   Adaptive Correction of Sampling Bias in
                                  Dynamic Call Graphs  . . . . . . . . . . 45:1--45:??
        Andrew J. Mcpherson and   
            Vijay Nagarajan and   
              Susmit Sarkar and   
                 Marcelo Cintra   Fence Placement for Legacy
                                  Data-Race-Free Programs via
                                  Synchronization Read Detection . . . . . 46:1--46:??
             Ding-Yong Hong and   
              Chun-Chen Hsu and   
              Cheng-Yi Chou and   
              Wei-Chung Hsu and   
               Pangfeng Liu and   
                     Jan-Jan Wu   Optimizing Control Transfer and Memory
                                  Virtualization in Full System Emulators  47:1--47:??
    Aravind Sukumaran-Rajam and   
                Philippe Clauss   The Polyhedral Model of Nonlinear Loops  48:1--48:??
           Prashant J. Nair and   
           David A. Roberts and   
           Moinuddin K. Qureshi   Citadel: Efficiently Protecting Stacked
                                  Memory from TSV and Large Granularity
                                  Failures . . . . . . . . . . . . . . . . 49:1--49:??
            Andrew Anderson and   
              Avinash Malik and   
                    David Gregg   Automatic Vectorization of Interleaved
                                  Data Revisited . . . . . . . . . . . . . 50:1--50:??
                Lihang Zhao and   
               Lizhong Chen and   
                Woojin Choi and   
                 Jeffrey Draper   A Filtering Mechanism to Reduce Network
                                  Bandwidth Utilization of Transaction
                                  Execution  . . . . . . . . . . . . . . . 51:1--51:??
             Olivier Serres and   
              Abdullah Kayi and   
                Ahmad Anbar and   
               Tarek El-Ghazawi   Enabling PGAS Productivity with Hardware
                                  Support for Shared Address Mapping: a
                                  UPC Case Study . . . . . . . . . . . . . 52:1--52:??
          Riccardo Cattaneo and   
            Giuseppe Natale and   
            Carlo Sicignano and   
           Donatella Sciuto and   
    Marco Domenico Santambrogio   On How to Accelerate Iterative Stencil
                                  Loops: a Scalable Streaming-Based
                                  Approach . . . . . . . . . . . . . . . . 53:1--53:??
             Unnikrishnan C and   
               Rupesh Nasre and   
                  Y. N. Srikant   Falcon: a Graph Manipulation Language
                                  for Heterogeneous Systems  . . . . . . . 54:1--54:??
       Rajshekar Kalayappan and   
              Smruti R. Sarangi   FluidCheck: a Redundant Threading-Based
                                  Approach for Reliable Execution in
                                  Manycore Processors  . . . . . . . . . . 55:1--55:??
               Jesse Elwell and   
                 Ryan Riley and   
          Nael Abu-Ghazaleh and   
           Dmitry Ponomarev and   
               Iliano Cervesato   Rethinking Memory Permissions for
                                  Protection Against Cross-Layer Attacks   56:1--56:??
                 Amir Morad and   
              Leonid Yavits and   
           Shahar Kvatinsky and   
                    Ran Ginosar   Resistive GP-SIMD Processing-In-Memory   57:1--57:??
                Yaohua Wang and   
                  Dong Wang and   
               Shuming Chen and   
                Zonglin Liu and   
             Shenggang Chen and   
               Xiaowen Chen and   
                        Xu Zhou   Iteration Interleaving--Based SIMD Lane
                                  Partition  . . . . . . . . . . . . . . . 58:1--58:??
                  Tomi Äijö and   
         Pekka Jääskeläinen and   
               Tapio Elomaa and   
             Heikki Kultala and   
                   Jarmo Takala   Integer Linear Programming-Based
                                  Scheduling for Transport Triggered
                                  Architectures  . . . . . . . . . . . . . 59:1--59:??
                 Qixiao Liu and   
              Miquel Moreto and   
               Jaume Abella and   
       Francisco J. Cazorla and   
          Daniel A. Jimenez and   
                   Mateo Valero   Sensible Energy Accounting with Abstract
                                  Metering for Multicore Systems . . . . . 60:1--60:??
                  Miao Zhou and   
                      Yu Du and   
             Bruce Childers and   
               Daniel Mosse and   
                    Rami Melhem   Symmetry-Agnostic Coordinated Management
                                  of the Memory Hierarchy in Multicore
                                  Systems  . . . . . . . . . . . . . . . . 61:1--61:??
          Amir Yazdanbakhsh and   
         Gennady Pekhimenko and   
           Bradley Thwaites and   
          Hadi Esmaeilzadeh and   
                 Onur Mutlu and   
                  Todd C. Mowry   RFVP: Rollback-Free Value Prediction
                                  with Safe-to-Approximate Loads . . . . . 62:1--62:??
               Donghyuk Lee and   
              Saugata Ghose and   
         Gennady Pekhimenko and   
                Samira Khan and   
                     Onur Mutlu   Simultaneous Multi-Layer Access:
                                  Improving 3D-Stacked Memory Bandwidth
                                  at Low Cost  . . . . . . . . . . . . . . 63:1--63:??
                   Yeoul Na and   
              Seon Wook Kim and   
                   Youngsun Han   JavaScript Parallelizing Compiler for
                                  Exploiting Parallelism from
                                  Data-Parallel HTML5 Applications . . . . 64:1--64:??
              Hiroyuki Usui and   
        Lavanya Subramanian and   
        Kevin Kai-Wei Chang and   
                     Onur Mutlu   DASH: Deadline-Aware High-Performance
                                  Memory Scheduler for Heterogeneous
                                  Systems with Hardware Accelerators . . . 65:1--65:??
 Morteza Mohajjel Kafshdooz and   
        Mohammadkazem Taram and   
              Sepehr Assadi and   
                 Alireza Ejlali   A Compile-Time Optimization Method for
                                  WCET Reduction in Real-Time Embedded
                                  Systems through Block Formation  . . . . 66:1--66:25

ACM Transactions on Architecture and Code Optimization
Volume 13, Number 1, April, 2016

        Konstantinos Koukos and   
                Alberto Ros and   
             Erik Hagersten and   
               Stefanos Kaxiras   Building Heterogeneous Unified Virtual
                                  Memories (UVMs) without the Overhead . . 1:1--1:22
               Zhigang Wang and   
               Xiaolin Wang and   
                   Fang Hou and   
                Yingwei Luo and   
                   Zhenlin Wang   Dynamic Memory Balancing for
                                  Virtualization . . . . . . . . . . . . . 2:1--2:??
               Xueyang Wang and   
                   Sek Chai and   
            Michael Isnardi and   
                 Sehoon Lim and   
                   Ramesh Karri   Hardware Performance Counter-Based
                                  Malware Identification and Detection
                                  with Adaptive Compressive Sensing  . . . 3:1--3:??
               Shoaib Akram and   
         Jennifer B. Sartor and   
        Kenzo Van Craeynest and   
                Wim Heirman and   
                Lieven Eeckhout   Boosting the Priority of Garbage:
                                  Scheduling Collection on Heterogeneous
                                  Multicore Processors . . . . . . . . . . 4:1--4:??
                Buse Yilmaz and   
              Baris Aktemur and   
          María J. Garzarán and   
                  Sam Kamin and   
                   Furkan Kiraç   Autotuning Runtime Specialization for
                                  Sparse Matrix-Vector Multiplication  . . 5:1--5:??
              Mingzhou Zhou and   
                      Bo Wu and   
                Xipeng Shen and   
                Yaoqing Gao and   
                     Graham Yiu   Examining and Reducing the Influence of
                                  Sampling Errors on Feedback-Driven
                                  Optimizations  . . . . . . . . . . . . . 6:1--6:??
           Amanieu d'Antras and   
            Cosmin Gorgovan and   
                Jim Garside and   
                    Mikel Luján   Optimizing Indirect Branches in Dynamic
                                  Binary Translators . . . . . . . . . . . 7:1--7:??
         Luiz G. A. Martins and   
              Ricardo Nobre and   
         João M. P. Cardoso and   
     Alexandre C. B. Delbem and   
                Eduardo Marques   Clustering-Based Selection for the
                                  Exploration of Compiler Optimization
                                  Sequences  . . . . . . . . . . . . . . . 8:1--8:??
       Sang Wook Stephen Do and   
                  Michel Dubois   Power Efficient Hardware Transactional
                                  Memory: Dynamic Issue of Transactions    9:1--9:??
          Dmitry Evtyushkin and   
           Dmitry Ponomarev and   
              Nael Abu-Ghazaleh   Understanding and Mitigating Covert
                                  Channels Through Branch Predictors . . . 10:1--10:??
                   Hao Zhou and   
                   Jingling Xue   A Compiler Approach for Exploiting
                                  Partial SIMD Parallelism . . . . . . . . 11:1--11:??
     Gert-Jan Van Den Braak and   
                 Henk Corporaal   R-GPU: a Reconfigurable GPU Architecture 12:1--12:??
                   Peng Liu and   
                  Jiyang Yu and   
               Michael C. Huang   Thread-Aware Adaptive Prefetcher on
                                  Multicore Systems: Improving the
                                  Performance for Multithreaded Workloads  13:1--13:??
            Cosmin Gorgovan and   
           Amanieu d'Antras and   
                    Mikel Luján   MAMBO: a Low-Overhead Dynamic Binary
                                  Modification Tool for ARM  . . . . . . . 14:1--14:??

ACM Transactions on Architecture and Code Optimization
Volume 13, Number 2, June, 2016

      Panagiotis Theocharis and   
                Bjorn De Sutter   A Bimodal Scheduler for Coarse-Grained
                                  Reconfigurable Arrays  . . . . . . . . . 15:1--15:??
                Ahmad Anbar and   
             Olivier Serres and   
         Engin Kayraklioglu and   
     Abdel-Hameed A. Badawy and   
               Tarek El-Ghazawi   Exploiting Hierarchical Locality in Deep
                                  Parallel Architectures . . . . . . . . . 16:1--16:??
   Cecilia González-Álvarez and   
         Jennifer B. Sartor and   
             Carlos Álvarez and   
    Daniel Jiménez-González and   
                Lieven Eeckhout   MInGLE: an Efficient Framework for
                                  Domain Acceleration Using Low-Power
                                  Specialized Functional Units . . . . . . 17:1--17:??
        Christian Andreetta and   
               Vivien Bégot and   
              Jost Berthold and   
              Martin Elsman and   
             Fritz Henglein and   
           Troels Henriksen and   
         Maj-Britt Nordfang and   
               Cosmin E. Oancea   FinPar: a Parallel Financial Benchmark   18:1--18:??
         Mickaël Dardaillon and   
              Kevin Marquet and   
              Tanguy Risset and   
              Jérôme Martin and   
           Henri-Pierre Charles   A New Compilation Flow for
                                  Software-Defined Radio Applications on
                                  Heterogeneous MPSoCs . . . . . . . . . . 19:1--19:??
               Jianwei Liao and   
            François Trahay and   
                  Guoqiang Xiao   Dynamic Process Migration Based on Block
                                  Access Patterns Occurring in Storage
                                  Servers  . . . . . . . . . . . . . . . . 20:1--20:??
       Amir Hossein Ashouri and   
           Giovanni Mariani and   
           Gianluca Palermo and   
               Eunjung Park and   
               John Cavazos and   
               Cristina Silvano   COBAYN: Compiler Autotuning Framework
                                  Using Bayesian Networks  . . . . . . . . 21:1--21:??
         Kypros Chrysanthou and   
      Panayiotis Englezakis and   
          Andreas Prodromou and   
            Andreas Panteli and   
    Chrysostomos Nicopoulos and   
         Yiannakis Sazeides and   
        Giorgos Dimitrakopoulos   An Online and Real-Time Fault Detection
                                  and Localization Mechanism for
                                  Network-on-Chip Architectures  . . . . . 22:1--22:??

ACM Transactions on Architecture and Code Optimization
Volume 13, Number 3, September, 2016

               Sanyam Mehta and   
                  Pen-Chung Yew   Variable Liberalization  . . . . . . . . 23:1--23:??
             Hsing-Min Chen and   
             Carole-Jean Wu and   
               Trevor Mudge and   
           Chaitali Chakrabarti   RATT-ECC: Rate Adaptive Two-Tiered Error
                                  Correction Codes for Reliable 3D
                                  Die-Stacked Memory . . . . . . . . . . . 24:1--24:??
                Wenjie Chen and   
                Zhibin Wang and   
                     Qin Wu and   
              Jiuzhen Liang and   
                    Zhilei Chai   Implementing Dense Optical Flow
                                  Computation on a Heterogeneous FPGA SoC
                                  in C . . . . . . . . . . . . . . . . . . 25:1--25:??
                Nilay Vaish and   
          Michael C. Ferris and   
                  David A. Wood   Optimization Models for Three On-Chip
                                  Network Problems . . . . . . . . . . . . 26:1--26:??
          Somayeh Sardashti and   
               Andre Seznec and   
                  David A. Wood   Yet Another Compressed Cache: a Low-Cost
                                  Yet Effective Compressed Cache . . . . . 27:1--27:??
         Eduardo H. M. Cruz and   
            Matthias Diener and   
           Laércio L. Pilla and   
          Philippe O. A. Navaux   Hardware-Assisted Thread and Data
                                  Mapping in Hierarchical Multicore
                                  Architectures  . . . . . . . . . . . . . 28:1--28:??
             Almutaz Adileh and   
              Stijn Eyerman and   
               Aamer Jaleel and   
                Lieven Eeckhout   Maximizing Heterogeneous Processor
                                  Performance Under Power Constraints  . . 29:1--29:??
               Bagus Wibowo and   
            Abhinav Agrawal and   
             Thomas Stanton and   
                     James Tuck   An Accurate Cross-Layer Approach for
                                  Online Architectural Vulnerability
                                  Estimation . . . . . . . . . . . . . . . 30:1--30:??
                  Manuel Acacio   List of Distinguished Reviewers ACM TACO
                                  2014 . . . . . . . . . . . . . . . . . . 31:1--31:??

ACM Transactions on Architecture and Code Optimization
Volume 13, Number 4, December, 2016

                 Keval Vora and   
                Rajiv Gupta and   
                     Guoqing Xu   Synergistic Analysis of Evolving Graphs  32:1--32:??
              Yunquan Zhang and   
                 Shigang Li and   
                Shengen Yan and   
                   Huiyang Zhou   A Cross-Platform SpMV Framework on
                                  Many-Core Architectures  . . . . . . . . 33:1--33:??
                Junwhan Ahn and   
                Sungjoo Yoo and   
                   Kiyoung Choi   AIM: Energy-Efficient Aggregation Inside
                                  the Memory Hierarchy . . . . . . . . . . 34:1--34:??
        Amir Kavyan Ziabari and   
                  Yifan Sun and   
                   Yenai Ma and   
                 Dana Schaa and   
            José L. Abellán and   
                Rafael Ubal and   
                   John Kim and   
                 Ajay Joshi and   
                    David Kaeli   UMH: a Hardware-Based Unified Memory
                                  Hierarchy for Systems with Multiple
                                  Discrete GPUs  . . . . . . . . . . . . . 35:1--35:??
                  Tom Spink and   
             Harry Wagstaff and   
                   Björn Franke   Hardware-Accelerated Cross-Architecture
                                  Full-System Virtualization . . . . . . . 36:1--36:??
              Qingchuan Shi and   
              George Kurian and   
              Farrukh Hijaz and   
           Srinivas Devadas and   
                      Omer Khan   LDAC: Locality-Aware Data Access Control
                                  for Large-Scale Multicore Cache
                                  Hierarchies  . . . . . . . . . . . . . . 37:1--37:??
         Fernando Fernandes and   
               Lucas Weigel and   
               Claudio Jung and   
            Philippe Navaux and   
                Luigi Carro and   
                     Paolo Rech   Evaluation of Histogram of Oriented
                                  Gradients Soft Errors Criticality for
                                  Automotive Applications  . . . . . . . . 38:1--38:??
             Saumay Dublish and   
            Vijay Nagarajan and   
                   Nigel Topham   Cooperative Caching for GPUs . . . . . . 39:1--39:??
      Nikolaos Tampouratzis and   
       Pavlos M. Mattheakis and   
         Ioannis Papaefstathiou   Accelerating Intercommunication in
                                  Highly Parallel Systems  . . . . . . . . 40:1--40:??
               Hyukwoo Park and   
                Myungsu Cha and   
                  Soo-Mook Moon   Concurrent JavaScript Parsing for Faster
                                  Loading of Web Apps  . . . . . . . . . . 41:1--41:??
            Dongliang Xiong and   
                  Kai Huang and   
              Xiaowen Jiang and   
                   Xiaolang Yan   Memory Access Scheduling Based on
                                  Dynamic Multilevel Priority in Shared
                                  DRAM Systems . . . . . . . . . . . . . . 42:1--42:??
           Daniele De Sensi and   
           Massimo Torquati and   
                Marco Danelutto   A Reconfiguration Algorithm for
                                  Power-Aware Parallel Applications  . . . 43:1--43:??
           Michael R. Jantz and   
        Forrest J. Robinson and   
             Prasad A. Kulkarni   Impact of Intrinsic Profiling
                                  Limitations on Effectiveness of Adaptive
                                  Optimizations  . . . . . . . . . . . . . 44:1--44:??
            Marvin Damschen and   
                 Lars Bauer and   
                    Jörg Henkel   Extending the WCET Problem to Optimize
                                  for Runtime-Reconfigurable Processors    45:1--45:??
                   Zheng Li and   
                  Fang Wang and   
                   Dan Feng and   
                     Yu Hua and   
               Jingning Liu and   
                       Wei Tong   MaxPB: Accelerating PCM Write by
                                  Maximizing the Power Budget Utilization  46:1--46:??
        Saurav Muralidharan and   
            Michael Garland and   
            Albert Sidelnik and   
                      Mary Hall   Designing a Tunable Nested Data-Parallel
                                  Programming System . . . . . . . . . . . 47:1--47:??
              Ismail Akturk and   
                 Riad Akram and   
    Mohammad Majharul Islam and   
           Abdullah Muzahid and   
               Ulya R. Karpuzcu   Accuracy Bugs: a New Class of
                                  Concurrency Bugs to Exploit Algorithmic
                                  Noise Tolerance  . . . . . . . . . . . . 48:1--48:??
                Erik Tomusk and   
          Christophe Dubach and   
                Michael O'Boyle   Selecting Heterogeneous Cores for
                                  Diversity  . . . . . . . . . . . . . . . 49:1--49:??
                 Pierre Michaud   Some Mathematical Facts About Optimal
                                  Cache Replacement  . . . . . . . . . . . 50:1--50:??
                 Wenlei Bao and   
              Changwan Hong and   
           Sudheer Chunduri and   
      Sriram Krishnamoorthy and   
         Louis-Noël Pouchet and   
           Fabrice Rastello and   
                  P. Sadayappan   Static and Dynamic Frequency Scaling on
                                  Multicore CPUs . . . . . . . . . . . . . 51:1--51:??
              Tiago M. Vale and   
              João A. Silva and   
            Ricardo J. Dias and   
               João M. Lourenço   Pot: Deterministic Transactional
                                  Execution  . . . . . . . . . . . . . . . 52:1--52:??
                Zhonghai Lu and   
                       Yuan Yao   Aggregate Flow-Based Performance
                                  Fairness in CMPs . . . . . . . . . . . . 53:1--53:??
                Yigit Demir and   
              Nikos Hardavellas   Energy-Proportional Photonic
                                  Interconnects  . . . . . . . . . . . . . 54:1--54:??
            Mehmet Can Kurt and   
      Sriram Krishnamoorthy and   
              Gagan Agrawal and   
                        Bin Ren   User-Assisted Store Recycling for
                                  Dynamic Task Graph Schedulers  . . . . . 55:1--55:??
            Jawad Haj-Yihia and   
                Ahmad Yasin and   
             Yosi Ben Asher and   
                  Avi Mendelson   Fine-Grain Power Breakdown of Modern
                                  Out-of-Order Cores and Its Implications
                                  on Skylake-Based Systems . . . . . . . . 56:1--56:??
            Alberto Scolari and   
   Davide Basilio Bartolini and   
    Marco Domenico Santambrogio   A Software Cache Partitioning System for
                                  Hash-Based Caches  . . . . . . . . . . . 57:1--57:??

ACM Transactions on Architecture and Code Optimization
Volume 14, Number 1, April, 2017

               Lev Mukhanov and   
          Pavlos Petoumenos and   
                 Zheng Wang and   
            Nikos Parasyris and   
  Dimitrios S. Nikolopoulos and   
      Bronis R. De Supinski and   
                   Hugh Leather   ALEA: a Fine-Grained Energy Profiling
                                  Tool . . . . . . . . . . . . . . . . . . 1:1--1:??
              Anuj Pathania and   
 Vanchinathan Venkataramani and   
          Muhammad Shafique and   
               Tulika Mitra and   
                    Jörg Henkel   Defragmentation of Tasks in Many-Core
                                  Architecture . . . . . . . . . . . . . . 2:1--2:??
            Darko Zivanovic and   
             Milan Pavlovic and   
            Milan Radulovic and   
              Hyunsung Shin and   
                Jongpil Son and   
            Sally A. McKee and   
          Paul M. Carpenter and   
           Petar Radojković and   
                 Eduard Ayguadé   Main Memory in HPC: Do We Need More or
                                  Could We Live with Less? . . . . . . . . 3:1--3:??
             Wenguang Zheng and   
                     Hui Wu and   
                      Qing Yang   WCET-Aware Dynamic I-Cache Locking for a
                                  Single Task  . . . . . . . . . . . . . . 4:1--4:??
             Byung-Sun Yang and   
                Jae-Yun Kim and   
                  Soo-Mook Moon   Exceptionization: a Java VM Optimization
                                  for Non-Java Languages . . . . . . . . . 5:1--5:??
               Rathijit Sen and   
                  David A. Wood   Pareto Governors for Energy-Optimal
                                  Computing  . . . . . . . . . . . . . . . 6:1--6:??
           Mainak Chaudhuri and   
             Mukesh Agrawal and   
                Jayesh Gaur and   
           Sreenivas Subramoney   Micro-Sector Cache: Improving Space
                                  Utilization in Sectored DRAM Caches  . . 7:1--7:??
          Kyriakos Georgiou and   
             Steve Kerrison and   
           Zbigniew Chamski and   
                   Kerstin Eder   Energy Transparency for Deeply Embedded
                                  Programs . . . . . . . . . . . . . . . . 8:1--8:??
               Pengcheng Li and   
                  Xiaoyu Hu and   
                  Dong Chen and   
                Jacob Brock and   
                    Hao Luo and   
              Eddy Z. Zhang and   
                      Chen Ding   LD: Low-Overhead GPU Race Detection
                                  Without Access Monitoring  . . . . . . . 9:1--9:??
     Poovaiah M. Palangappa and   
                Kartik Mohanram   CompEx++: Compression-Expansion Coding
                                  for Energy, Latency, and Lifetime
                                  Improvements in MLC/TLC NVMs . . . . . . 10:1--10:??

ACM Transactions on Architecture and Code Optimization
Volume 14, Number 2, July, 2017

                Dongwoo Lee and   
               Sangheon Lee and   
                Soojung Ryu and   
                   Kiyoung Choi   Dirty-Block Tracking in a Direct-Mapped
                                  DRAM Cache with Self-Balancing Dispatch  11:1--11:??
     Konstantinos Parasyris and   
       Vassilis Vassiliadis and   
   Christos D. Antonopoulos and   
               Spyros Lalis and   
                Nikolaos Bellas   Significance-Aware Program Execution on
                                  Unreliable Hardware  . . . . . . . . . . 12:1--12:??
           Gleison Mendonça and   
            Breno Guimarães and   
             Péricles Alves and   
             Márcio Pereira and   
               Guido Araújo and   
 Fernando Magno Quintão Pereira   DawnCC: Automatic Annotation for Data
                                  Parallelism and Offloading . . . . . . . 13:1--13:??
     Rajeev Balasubramonian and   
            Andrew B. Kahng and   
       Naveen Muralimanohar and   
                Ali Shafiee and   
              Vaishnav Srinivas   CACTI 7: New Tools for Interconnect
                                  Exploration in Innovative Off-Chip
                                  Memories . . . . . . . . . . . . . . . . 14:1--14:??
            Vishwesh Jatala and   
           Jayvant Anantpur and   
                   Amey Karkare   Scratchpad Sharing in GPUs . . . . . . . 15:1--15:??
                Tae Jun Ham and   
             Juan L. Aragón and   
             Margaret Martonosi   Decoupling Data Supply from Computation
                                  for Latency-Tolerant Communication in
                                  Heterogeneous Architectures  . . . . . . 16:1--16:??
               Milan Stanic and   
              Oscar Palomar and   
              Timothy Hayes and   
              Ivan Ratkovic and   
             Adrian Cristal and   
                Osman Unsal and   
                   Mateo Valero   An Integrated Vector-Scalar Design on an
                                  In-Order ARM Core  . . . . . . . . . . . 17:1--17:??
           Fernando A. Endo and   
              Arthur Perais and   
                   André Seznec   On the Interactions Between Value
                                  Prediction and Compiler Optimizations in
                                  the Context of EOLE  . . . . . . . . . . 18:1--18:??
       Aswinkumar Sridharan and   
          Biswabandan Panda and   
                   Andre Seznec   Band-Pass Prefetching: an Effective
                                  Prefetch Management Mechanism Using
                                  Prefetch-Fraction Metric in Multi-Core
                                  Systems  . . . . . . . . . . . . . . . . 19:1--19:??
               Andrés Goens and   
              Sergio Siccha and   
            Jeronimo Castrillon   Symmetry in Software Synthesis . . . . . 20:1--20:??

ACM Transactions on Architecture and Code Optimization
Volume 14, Number 3, September, 2017

               Sander Vocke and   
             Henk Corporaal and   
               Roel Jordans and   
            Rosilde Corvino and   
                       Rick Nas   Extending Halide to Improve Software
                                  Development for Imaging DSPs . . . . . . 21:1--21:??
          Nicklas Bo Jensen and   
                  Sven Karlsson   Improving Loop Dependence Analysis . . . 22:1--22:??
              Stefan Ganser and   
          Armin Grösslinger and   
           Norbert Siegmund and   
                  Sven Apel and   
             Christian Lengauer   Iterative Schedule Optimization for
                                  Parallelization in the Polyhedron Model  23:1--23:??
                    Wei Wei and   
                Dejun Jiang and   
                  Jin Xiong and   
                    Mingyu Chen   HAP: Hybrid-Memory-Aware Partition in
                                  Shared Last-Level Cache  . . . . . . . . 24:1--24:??
            Dongliang Xiong and   
                  Kai Huang and   
              Xiaowen Jiang and   
                   Xiaolang Yan   Providing Predictable Performance via a
                                  Slowdown Estimation Model  . . . . . . . 25:1--25:??
                    Jing Pu and   
                Steven Bell and   
                  Xuan Yang and   
                Jeff Setter and   
         Stephen Richardson and   
      Jonathan Ragan-Kelley and   
                  Mark Horowitz   Programming Heterogeneous Systems from
                                  an Image Processing DSL  . . . . . . . . 26:1--26:??
                Ayman Hroub and   
           M. E. S. Elrabaa and   
              M. F. Mudawar and   
                     A. Khayyat   Efficient Generation of Compact
                                  Execution Traces for Multicore
                                  Architectural Simulations  . . . . . . . 27:1--27:??
              Nicolas Weber and   
                Michael Goesele   MATOG: Array Layout Auto-Tuning for CUDA 28:1--28:??
            Amir H. Ashouri and   
             Andrea Bignoli and   
           Gianluca Palermo and   
           Cristina Silvano and   
            Sameer Kulkarni and   
                   John Cavazos   MiCOMP: Mitigating the Compiler
                                  Phase-Ordering Problem Using
                                  Optimization Sub-Sequences and Machine
                                  Learning . . . . . . . . . . . . . . . . 29:1--29:??
                Erik Vermij and   
             Leandro Fiorin and   
              Rik Jongerius and   
       Christoph Hagleitner and   
           Jan Van Lunteren and   
                   Koen Bertels   An Architecture for Integrated Near-Data
                                  Processors . . . . . . . . . . . . . . . 30:1--30:??
          Andreas Diavastos and   
                 Pedro Trancoso   SWITCHES: a Lightweight Runtime for
                                  Dataflow Execution of Tasks on
                                  Many-Cores . . . . . . . . . . . . . . . 31:1--31:??

ACM Transactions on Architecture and Code Optimization
Volume 14, Number 4, December, 2017

                 Rahul Jain and   
        Preeti Ranjan Panda and   
           Sreenivas Subramoney   Cooperative Multi-Agent Reinforcement
                                  Learning-Based Co-optimization of Cores,
                                  Caches, and On-chip Network  . . . . . . 32:1--32:??
           Daniele De Sensi and   
         Tiziano De Matteis and   
           Massimo Torquati and   
          Gabriele Mencagli and   
                Marco Danelutto   Bringing Parallel Patterns Out of the
                                  Corner: The P³ARSEC Benchmark Suite  . . 33:1--33:??
               Chencheng Ye and   
                  Chen Ding and   
                    Hao Luo and   
                Jacob Brock and   
                  Dong Chen and   
                        Hai Jin   Cache Exclusivity and Sharing: Theory
                                  and Optimization . . . . . . . . . . . . 34:1--34:??
          Rahul Shrivastava and   
           V. Krishna Nandivada   Energy-Efficient Compilation of
                                  Irregular Task-Parallel Loops  . . . . . 35:1--35:??
                Julien Proy and   
           Karine Heydemann and   
          Alexandre Berzati and   
                   Albert Cohen   Compiler-Assisted Loop Hardening Against
                                  Fault Attacks  . . . . . . . . . . . . . 36:1--36:??
         Christina Peterson and   
                  Damian Dechev   A Transactional Correctness Tool for
                                  Abstract Data Types  . . . . . . . . . . 37:1--37:??
             Matteo Ferroni and   
               Andrea Corna and   
             Andrea Damiani and   
          Rolando Brondolin and   
         Juan A. Colmenares and   
             Steven Hofmeyr and   
        John D. Kubiatowicz and   
          Marco D. Santambrogio   Power Consumption Models for
                                  Multi-Tenant Server Infrastructures  . . 38:1--38:??
            Milad Mohammadi and   
              Tor M. Aamodt and   
               William J. Dally   CG-OoO: Energy-Efficient Coarse-Grain
                                  Out-of-Order Execution Near In-Order
                                  Energy with Near Out-of-Order
                                  Performance  . . . . . . . . . . . . . . 39:1--39:??
               Shivam Swami and   
     Poovaiah M. Palangappa and   
                Kartik Mohanram   ECS: Error-Correcting Strings for
                                  Lifetime Improvements in Nonvolatile
                                  Memories . . . . . . . . . . . . . . . . 40:1--40:??
             M. Waqar Azhar and   
              Per Stenström and   
        Vassilis Papaefstathiou   SLOOP: QoS-Supervised Loop Execution to
                                  Reduce Energy on Heterogeneous
                                  Architectures  . . . . . . . . . . . . . 41:1--41:??
     Raghavendra Kanakagiri and   
          Biswabandan Panda and   
                   Madhu Mutyam   MBZip: Multiblock Data Compression . . . 42:1--42:??
              Richard Neill and   
                Andi Drebes and   
                    Antoniu Pop   Fuse: Accurate Multiplexing of Hardware
                                  Performance Counters Across Executions   43:1--43:??
          Somayeh Sardashti and   
                  David A. Wood   Could Compression Be of General Use?
                                  Evaluating Memory Compression across
                                  Domains  . . . . . . . . . . . . . . . . 44:1--44:??
                 Libo Huang and   
                 Yashuai Lü and   
                    Li Shen and   
                   Zhiying Wang   Improving the Efficiency of GPGPU
                                  Work-Queue Through Data Awareness  . . . 45:1--45:??
           Alexandra Angerd and   
               Erik Sintorn and   
                  Per Stenström   A Framework for Automated and Controlled
                                  Floating-Point Accuracy Reduction in
                                  Graphics Applications on GPUs  . . . . . 46:1--46:??
              Jaime Arteaga and   
         Stéphane Zuckerman and   
                   Guang R. Gao   Generating Fine-Grain Multithreaded
                                  Applications Using a Multigrain Approach 47:1--47:??
              Ramyad Hadidi and   
                 Lifeng Nai and   
                Hyojong Kim and   
                    Hyesoon Kim   CAIRO: a Compiler-Assisted Technique for
                                  Enabling Instruction-Level Offloading of
                                  Processing-In-Memory . . . . . . . . . . 48:1--48:??
               Hongyeol Lim and   
                      Giho Park   Triple Engine Processor (TEP): a
                                  Heterogeneous Near-Memory Processor for
                                  Diverse Kernel Operations  . . . . . . . 49:1--49:??
          George Patsilaras and   
                     James Tuck   ReDirect: Reconfigurable Directories for
                                  Multicore Architectures  . . . . . . . . 50:1--50:??
               Adarsh Patil and   
         Ramaswamy Govindarajan   HAShCache: Heterogeneity-Aware Shared
                                  DRAMCache for Integrated Heterogeneous
                                  Systems  . . . . . . . . . . . . . . . . 51:1--51:??
           Christophe Alias and   
               Alexandru Plesco   Optimizing Affine Control With Semantic
                                  Factorizations . . . . . . . . . . . . . 52:1--52:??
             George Matheou and   
           Paraskevas Evripidou   Data-Driven Concurrency for High
                                  Performance Computing  . . . . . . . . . 53:1--53:??
       Giorgis Georgakoudis and   
        Hans Vandierendonck and   
               Peter Thoman and   
      Bronis R. De Supinski and   
           Thomas Fahringer and   
      Dimitrios S. Nikolopoulos   SCALO: Scalability-Aware Parallelism
                                  Orchestration for Multi-Threaded
                                  Workloads  . . . . . . . . . . . . . . . 54:1--54:??
             Toufik Baroudi and   
              Rachid Seghir and   
               Vincent Loechner   Optimization of Triangular and Banded
                                  Matrix Operations Using 2d-Packed
                                  Layouts  . . . . . . . . . . . . . . . . 55:1--55:??

ACM Transactions on Architecture and Code Optimization
Volume 15, Number 1, April, 2018

                 Hochan Lee and   
      Mansureh S. Moghaddam and   
               Dongkwan Suh and   
                 Bernhard Egger   Improving Energy Efficiency of
                                  Coarse-Grain Reconfigurable Arrays
                                  Through Modulo Schedule
                                  Compression/Decompression  . . . . . . . 1:1--1:??
           Karthik Sangaiah and   
                Michael Lui and   
             Radhika Jagtap and   
       Stephan Diestelhorst and   
       Siddharth Nilakantan and   
                 Ankit More and   
               Baris Taskin and   
                 Mark Hempstead   SynchroTrace: Synchronization-Aware
                                  Architecture-Agnostic Traces for
                                  Lightweight Multicore Simulation of CMP
                                  and HPC Workloads  . . . . . . . . . . . 2:1--2:??
                 Long Zheng and   
               Xiaofei Liao and   
                        Hai Jin   Efficient and Scalable Graph Parallel
                                  Processing With Symbolic Execution . . . 3:1--3:??
                 Jae-Eon Jo and   
              Gyu-Hyeon Lee and   
                Hanhwi Jang and   
                 Jaewon Lee and   
        Mohammadamin Ajdari and   
                    Jangwoo Kim   DiagSim: Systematically Diagnosing
                                  Simulators for Healthy Simulations . . . 4:1--4:??
           Sushant Kondguli and   
                  Michael Huang   A Case for a More Effective,
                                  Power-Efficient Turbo Boosting . . . . . 5:1--5:??
            Kuan-Chung Chen and   
                  Chung-Ho Chen   Enabling SIMT Execution Model on
                                  Homogeneous Multi-Core System  . . . . . 6:1--6:??
              Mingzhe Zhang and   
               King Tin Lam and   
                    Xin Yao and   
                    Cho-Li Wang   SIMPO: a Scalable In-Memory Persistent
                                  Object Framework Using NVRAM for
                                  Reliable Big Data Computing  . . . . . . 7:1--7:??
                 Bobin Deng and   
         Sriseshan Srikanth and   
               Eric R. Hein and   
            Thomas M. Conte and   
          Erik DeBenedictis and   
               Jeanine Cook and   
               Michael P. Frank   Extending Moore's Law via
                                  Computationally Error-Tolerant Computing 8:1--8:??
                  Dave Dice and   
            Maurice Herlihy and   
                     Alex Kogan   Improving Parallelism in Hardware
                                  Transactional Memory . . . . . . . . . . 9:1--9:??
               Namhyung Kim and   
                Junwhan Ahn and   
               Kiyoung Choi and   
             Daniel Sanchez and   
               Donghoon Yoo and   
                    Soojung Ryu   Benzene: an Energy-Efficient Distributed
                                  Hybrid Cache Architecture for Manycore
                                  Systems  . . . . . . . . . . . . . . . . 10:1--10:??
                  Yulong Ao and   
                  Chao Yang and   
               Fangfang Liu and   
                Wanwang Yin and   
               Lijuan Jiang and   
                       Qiao Sun   Performance Optimization of the HPCG
                                  Benchmark on the Sunway TaihuLight
                                  Supercomputer  . . . . . . . . . . . . . 11:1--11:??
              Saeed Rashidi and   
               Majid Jalili and   
             Hamid Sarbazi-Azad   Improving MLC PCM Performance through
                                  Relaxed Write and Read for Intermediate
                                  Resistance Levels  . . . . . . . . . . . 12:1--12:??
                Wenlai Zhao and   
                 Haohuan Fu and   
                Jiarui Fang and   
               Weijie Zheng and   
                    Lin Gan and   
                  Guangwen Yang   Optimizing Convolutional Neural Networks
                                  on the Sunway TaihuLight Supercomputer   13:1--13:??
     Dimitrios Mbakoyiannis and   
         Othon Tomoutzoglou and   
                George Kornaros   Energy-Performance Considerations for
                                  Data Offloading to FPGA-Based
                                  Accelerators Over PCIe . . . . . . . . . 14:1--14:??
                   Zhen Lin and   
             Michael Mantor and   
                   Huiyang Zhou   GPU Performance vs. Thread-Level
                                  Parallelism: Scalability Analysis and a
                                  Novel Way to Improve TLP . . . . . . . . 15:1--15:??
          Oleksandr Zinenko and   
              Stéphane Huot and   
                 Cédric Bastoul   Visual Program Manipulation in the
                                  Polyhedral Model . . . . . . . . . . . . 16:1--16:??

ACM Transactions on Architecture and Code Optimization
Volume 15, Number 2, June, 2018

          Mustafa M. Shihab and   
                  Jie Zhang and   
             Myoungsoo Jung and   
                Mahmut Kandemir   ReveNAND: a Fast-Drift-Aware Resilient
                                  3D NAND Flash Design . . . . . . . . . . 17:1--17:??
         Seyed Majid Zahedi and   
               Songchun Fan and   
                Benjamin C. Lee   Managing Heterogeneous Datacenters with
                                  Tokens . . . . . . . . . . . . . . . . . 18:1--18:??
                 Miquel Pericàs   Elastic Places: an Adaptive Resource
                                  Manager for Scalable and Portable
                                  Performance  . . . . . . . . . . . . . . 19:1--19:??
     Matthew Benjamin Olson and   
           Joseph T. Teague and   
                Divyani Rao and   
           Michael R. Jantz and   
           Kshitij A. Doshi and   
             Prasad A. Kulkarni   Cross-Layer Memory Management to Improve
                                  DRAM Energy Efficiency . . . . . . . . . 20:1--20:??
                Davide Zoni and   
               Luca Colombo and   
             William Fornaciari   DarkCache: Energy-Performance
                                  Optimization of Tiled Multi-Cores by
                                  Adaptively Power-Gating LLC Banks  . . . 21:1--21:??
                 Yang Zhang and   
                   Dan Feng and   
                   Wei Tong and   
                     Yu Hua and   
               Jingning Liu and   
                Zhipeng Tan and   
             Chengning Wang and   
                    Bing Wu and   
                   Zheng Li and   
                    Gaoxiang Xu   CACF: a Novel Circuit Architecture
                                  Co-optimization Framework for Improving
                                  Performance, Reliability and Energy of
                                  ReRAM-based Main Memory System . . . . . 22:1--22:??
          Nicolai Stawinoga and   
                     Tony Field   Predictable Thread Coarsening  . . . . . 23:1--23:??
                 Probir Roy and   
         Shuaiwen Leon Song and   
      Sriram Krishnamoorthy and   
             Abhinav Vishnu and   
          Dipanjan Sengupta and   
                         Xu Liu   NUMA-Caffe: NUMA-Aware Deep Learning
                                  Neural Networks  . . . . . . . . . . . . 24:1--24:??
                 Ahsen Ejaz and   
   Vassilios Papaefstathiou and   
                Ioannis Sourdis   DDRNoC: Dual Data-Rate Network-on-Chip   25:1--25:??
                   Ying Cai and   
                  Yulong Ao and   
                  Chao Yang and   
                 Wenjing Ma and   
                    Haitao Zhao   Extreme-Scale High-Order WENO
                                  Simulations of 3-D Detonation Wave
                                  with 10 Million Cores  . . . . . . . . . 26:1--26:??

ACM Transactions on Architecture and Code Optimization
Volume 15, Number 3, October, 2018

         Yannis Sfakianakis and   
         Christos Kozanitis and   
         Christos Kozyrakis and   
                  Angelos Bilas   QuMan: Profile-based Improvement of
                                  Cluster Utilization  . . . . . . . . . . 27:1--27:??
         Engin Kayraklioglu and   
        Michael P. Ferguson and   
               Tarek El-Ghazawi   LAPPS: Locality-Aware Productive
                                  Prefetching Support for PGAS . . . . . . 28:1--28:??
              Akrem Benatia and   
                 Weixing Ji and   
                Yizhuo Wang and   
                       Feng Shi   BestSF: a Sparse Meta-Format for
                                  Optimizing SpMV on GPU . . . . . . . . . 29:1--29:??
                 Pierre Michaud   An Alternative TAGE-like Conditional
                                  Branch Predictor . . . . . . . . . . . . 30:1--30:??
              James Garland and   
                    David Gregg   Low Complexity Multiply-Accumulate Units
                                  for Convolutional Neural Networks with
                                  Weight-Sharing . . . . . . . . . . . . . 31:1--31:??
                Hyojong Kim and   
              Ramyad Hadidi and   
                 Lifeng Nai and   
                Hyesoon Kim and   
             Nuwan Jayasena and   
              Yasuko Eckert and   
               Onur Kayiran and   
                    Gabriel Loh   CODA: Enabling Co-location of
                                  Computation and Data for Multiple GPU
                                  Systems  . . . . . . . . . . . . . . . . 32:1--32:??
        Madhavan Manivannan and   
             Miquel Pericàs and   
    Vassilis Papaefstathiou and   
                  Per Stenström   Global Dead-Block Management for
                                  Task-Parallel Programs . . . . . . . . . 33:1--33:??
               Roman Gareev and   
             Tobias Grosser and   
                  Michael Kruse   High-Performance Generalized Tensor
                                  Operations: a Compiler-Oriented Approach 34:1--34:??
              Hervé Yviquel and   
                 Lauro Cruz and   
                   Guido Araujo   Cluster Programming using the OpenMP
                                  Accelerator Model  . . . . . . . . . . . 35:1--35:??
    Mohammad Khavari Tavana and   
        Amir Kavyan Ziabari and   
                    David Kaeli   Block Cooperation: Advancing Lifetime of
                                  Resistive Memories by Increasing
                                  Utilization of Error Correcting Codes    36:1--36:??
                    Hai Jin and   
                     Bo Liu and   
               Wenbin Jiang and   
                    Yang Ma and   
                Xuanhua Shi and   
               Bingsheng He and   
                  Shaofeng Zhao   Layer-Centric Memory Reuse and Data
                                  Migration for Extreme-Scale Deep
                                  Learning on Many-Core Architectures  . . 37:1--37:??
            Dani Voitsechov and   
            Arslan Zulfiqar and   
            Mark Stephenson and   
               Mark Gebhart and   
             Stephen W. Keckler   Software-Directed Techniques for
                                  Improved GPU Register File Utilization   38:1--38:??
                Huanxin Lin and   
                Cho-Li Wang and   
                   Hongyuan Liu   On-GPU Thread-Data Remapping for Branch
                                  Divergence Reduction . . . . . . . . . . 39:1--39:??

ACM Transactions on Architecture and Code Optimization
Volume 15, Number 4, January, 2019

         Stefan Kronawitter and   
             Christian Lengauer   Polyhedral Search Space Exploration in
                                  the ExaStencils Code Generator . . . . . 40:1--40:??
                Jingheng Xu and   
                 Haohuan Fu and   
                    Wen Shi and   
                    Lin Gan and   
                  Yuxuan Li and   
                  Wayne Luk and   
                  Guangwen Yang   Performance Tuning and Analysis for
                                  Stencil-Based Applications on POWER8
                                  Processor  . . . . . . . . . . . . . . . 41:1--41:??
                Jiajun Wang and   
                Reena Panda and   
                   Lizy K. John   SelSMaP: a Selective Stride Masking
                                  Prefetching Scheme . . . . . . . . . . . 42:1--42:??
                    Xing Su and   
               Xiangke Liao and   
                  Hao Jiang and   
                Canqun Yang and   
                   Jingling Xue   SCP: Shared Cache Partitioning for
                                  High-Performance GEMM  . . . . . . . . . 43:1--43:??
Fernando Magno Quintão Pereira and   
    Guilherme Vieira Leobas and   
              Abdoulaye Gamatié   Static Prediction of Silent Stores . . . 44:1--44:??
              Neal C. Crago and   
            Mark Stephenson and   
             Stephen W. Keckler   Exposing Memory Access Patterns to
                                  Improve Instruction and Memory
                                  Efficiency in GPUs . . . . . . . . . . . 45:1--45:??
                 Feng Zhang and   
                   Jingling Xue   Poker: Permutation-Based SIMD Execution
                                  of Intensive Tree Search by Path
                                  Encoding . . . . . . . . . . . . . . . . 46:1--46:??
         Nicolas Belleville and   
           Damien Couroussé and   
           Karine Heydemann and   
           Henri-Pierre Charles   Automated Software Protection for the
                                  Masses Against Side-Channel Attacks  . . 47:1--47:??
                    Chao Yu and   
                 Yuebin Bai and   
               Qingxiao Sun and   
                   Hailong Yang   Improving Thread-level Parallelism in
                                  GPUs Through Expanding Register File to
                                  Scratchpad Memory  . . . . . . . . . . . 48:1--48:??
                 Lois Orosa and   
            Rodolfo Azevedo and   
                     Onur Mutlu   AVPP: Address-first Value-next Predictor
                                  with Value Prefetching for Improving the
                                  Efficiency of Load Value Prediction  . . 49:1--49:??
                  Jun Zhang and   
                    Rui Hou and   
                   Wei Song and   
             Sally A. McKee and   
                   Zhen Jia and   
                 Chen Zheng and   
                Mingyu Chen and   
                Lixin Zhang and   
                       Dan Meng   RAGuard: an Efficient and
                                  User-Transparent Hardware Mechanism
                                  against ROP Attacks  . . . . . . . . . . 50:1--50:??
                  Ping Wang and   
                Luke McHale and   
              Paul V. Gratz and   
                 Alex Sprintson   GenMatcher: a Generic Clustering-Based
                                  Arbitrary Matching Framework . . . . . . 51:1--51:??
             Ding-Yong Hong and   
                 Jan-Jan Wu and   
                Yu-Ping Liu and   
                Sheng-Yu Fu and   
                  Wei-Chung Hsu   Processor-Tracing Guided Region
                                  Formation in Dynamic Binary Translation  52:1--52:??
                    Yu Wang and   
                 Victor Lee and   
                Gu-Yeon Wei and   
                   David Brooks   Predicting New Workload or CPU
                                  Performance by Analyzing Public Datasets 53:1--53:??
               Hyukwoo Park and   
               Sungkook Kim and   
             Jung-Geun Park and   
                  Soo-Mook Moon   Reusing the Optimized Code for
                                  JavaScript Ahead-of-Time Compilation . . 54:1--54:??
                   Han Zhao and   
                  Quan Chen and   
                 Yuxian Qiu and   
                    Ming Wu and   
                   Yao Shen and   
               Jingwen Leng and   
                    Chao Li and   
                      Minyi Guo   Bandwidth and Locality Aware
                                  Task-stealing for Manycore Architectures
                                  with Bandwidth-Asymmetric Memory . . . . 55:1--55:??
              Stefan Ganser and   
           Armin Größlinger and   
           Norbert Siegmund and   
                  Sven Apel and   
             Christian Lengauer   Speeding up Iterative Polyhedral
                                  Schedule Optimization with Surrogate
                                  Performance Models . . . . . . . . . . . 56:1--56:??
                    Song Wu and   
                  Fang Zhou and   
                  Xiang Gao and   
                    Hai Jin and   
                    Jinglei Ren   Dual-Page Checkpointing: an
                                  Architectural Approach to Efficient Data
                                  Persistence for In-Memory Applications   57:1--57:??
               Mohsen Kiani and   
                Amir Rajabzadeh   Efficient Cache Performance Modeling in
                                  GPUs Using Reuse Distance Analysis . . . 58:1--58:??
           Thomas Debrunner and   
               Sajad Saeedi and   
               Paul H. J. Kelly   AUKE: Automatic Kernel Code Generation
                                  for an Analogue SIMD Focal-Plane
                                  Sensor-Processor Array . . . . . . . . . 59:1--59:??
                   You Zhou and   
                     Fei Wu and   
                Zhonghai Lu and   
                   Xubin He and   
                 Ping Huang and   
                 Changsheng Xie   SCORE: a Novel Scheme to Efficiently
                                  Cache Overlong ECCs in NAND Flash Memory 60:1--60:??
       Francisco J. Andújar and   
              Salvador Coll and   
              Marina Alonso and   
                Pedro López and   
           Juan-Miguel Martínez   POWAR: Power-Aware Routing in HPC
                                  Networks with On/Off Links . . . . . . . 61:1--61:??
             Rahim Mammadli and   
                 Felix Wolf and   
                  Ali Jannesari   The Art of Getting Deep Neural Networks
                                  in Shape . . . . . . . . . . . . . . . . 62:1--62:??
             Stavros Tzilis and   
             Pedro Trancoso and   
                Ioannis Sourdis   Energy-Efficient Runtime Management of
                                  Heterogeneous Multicores using Online
                                  Projection . . . . . . . . . . . . . . . 63:1--63:??
        Matthew Kay Fei Lee and   
                Yingnan Cui and   
          Thannirmalai Somu and   
                    Tao Luo and   
                   Jun Zhou and   
              Wai Teng Tang and   
              Weng-Fai Wong and   
             Rick Siow Mong Goh   A System-Level Simulator for RRAM-Based
                                  Neuromorphic Computing Chips . . . . . . 64:1--64:??
        Evangelos Vasilakis and   
    Vassilis Papaefstathiou and   
             Pedro Trancoso and   
                Ioannis Sourdis   Decoupled Fused Cache: Fusing a
                                  Decoupled LLC with a DRAM Cache  . . . . 65:1--65:??
          Peter Pirkelbauer and   
              Amalee Wilson and   
         Christina Peterson and   
                  Damian Dechev   Blaze-Tasks: a Framework for Computing
                                  Parallel Reductions over Tasks . . . . . 66:1--66:??
              Yukinori Sato and   
                Tomoya Yuki and   
                    Toshio Endo   An Autotuning Framework for Scalable
                                  Execution of Tiled Code via Iterative
                                  Polyhedral Compilation . . . . . . . . . 67:1--67:??
         S.-Kazem Shekofteh and   
                Hamid Noori and   
        Mahmoud Naghibzadeh and   
         Hadi Sadoghi Yazdi and   
                 Holger Fröning   Metric Selection for GPU Kernel
                                  Classification . . . . . . . . . . . . . 68:1--68:??
                  Angelos Bilas   List of 2018 Distinguished Reviewers ACM
                                  TACO . . . . . . . . . . . . . . . . . . 69:1--69:??

ACM Transactions on Architecture and Code Optimization
Volume 16, Number 1, March, 2019

            Ghassan Shobaki and   
              Austin Kerbow and   
         Christopher Pulido and   
                 William Dobson   Exploring an Alternative Cost Function
                                  for Combinatorial
                                  Register-Pressure-Aware Instruction
                                  Scheduling . . . . . . . . . . . . . . . 1:1--1:??
                Yu-Ping Liu and   
             Ding-Yong Hong and   
                 Jan-Jan Wu and   
                Sheng-Yu Fu and   
                  Wei-Chung Hsu   Exploiting SIMD Asymmetry in ARM-to-x86
                                  Dynamic Binary Translation . . . . . . . 2:1--2:??
       Mohammad Sadrosadati and   
         Seyed Borna Ehsani and   
             Hajar Falahati and   
    Rachata Ausavarungnirun and   
             Arash Tavakkol and   
              Mojtaba Abaee and   
                 Lois Orosa and   
                Yaohua Wang and   
         Hamid Sarbazi-Azad and   
                     Onur Mutlu   ITAP: Idle-Time-Aware Power Management
                                  for GPU Execution Units  . . . . . . . . 3:1--3:??
                Halit Dogan and   
                Masab Ahmad and   
                Brian Kahne and   
                      Omer Khan   Accelerating Synchronization Using
                                  Moving Compute to Data Model at
                                  1,000-core Multicore Scale . . . . . . . 4:1--4:??
              Leonid Azriel and   
               Lukas Humbel and   
             Reto Achermann and   
            Alex Richardson and   
            Moritz Hoffmann and   
              Avi Mendelson and   
             Timothy Roscoe and   
        Robert N. M. Watson and   
           Paolo Faraboschi and   
                Dejan Milojicic   Memory-Side Protection With a Capability
                                  Enforcement Co-Processor . . . . . . . . 5:1--5:??
               Aamer Jaleel and   
             Eiman Ebrahimi and   
                     Sam Duncan   DUCATI: High-performance Address
                                  Translation by Extending TLB Reach of
                                  GPU-accelerated Systems  . . . . . . . . 6:1--6:??

ACM Transactions on Architecture and Code Optimization
Volume 16, Number 2, May, 2019

                   Yemao Xu and   
                 Dezun Dong and   
                  Weixia Xu and   
                   Xiangke Liao   SketchDLC: a Sketch on Distributed Deep
                                  Learning Communication via Trace
                                  Capturing  . . . . . . . . . . . . . . . 7:1--7:??
        Aristeidis Mastoras and   
                Thomas R. Gross   Efficient and Scalable Execution of
                                  Fine-Grained Dynamic Linear Pipelines    8:1--8:??
                Tae Jun Ham and   
             Juan L. Aragón and   
             Margaret Martonosi   Efficient Data Supply for Parallel
                                  Heterogeneous Architectures  . . . . . . 9:1--9:??
             Savvas Sioutas and   
              Sander Stuijk and   
                Luc Waeijen and   
                Twan Basten and   
             Henk Corporaal and   
                     Lou Somers   Schedule Synthesis for Halide Pipelines
                                  through Reuse Analysis . . . . . . . . . 10:1--10:??
              Xiaoyuan Wang and   
                 Haikun Liu and   
               Xiaofei Liao and   
                    Ji Chen and   
                    Hai Jin and   
                   Yu Zhang and   
                 Long Zheng and   
               Bingsheng He and   
                     Song Jiang   Supporting Superpages and Lightweight
                                  Page Migration in Hybrid Memory Systems  11:1--11:??
             Sahar Sargaran and   
            Naser Mohammadzadeh   SAQIP: a Scalable Architecture for
                                  Quantum Information Processors . . . . . 12:1--12:??
             Prerna Budhkar and   
           Ildar Absalyamov and   
             Vasileios Zois and   
               Skyler Windh and   
            Walid A. Najjar and   
            Vassilis J. Tsotras   Accelerating In-Memory Database
                                  Selections Using Latency Masking
                                  Hardware Threads . . . . . . . . . . . . 13:1--13:??
           Heinrich Riebler and   
                  Gavin Vaz and   
              Tobias Kenter and   
               Christian Plessl   Transparent Acceleration for
                                  Heterogeneous Platforms With Compilation
                                  to OpenCL  . . . . . . . . . . . . . . . 14:1--14:??
                   Xun Gong and   
                 Xiang Gong and   
                 Leiming Yu and   
                    David Kaeli   HAWS: Accelerating GPU Wavefront
                                  Execution through Selective Out-of-order
                                  Execution  . . . . . . . . . . . . . . . 15:1--15:??
                  Yang Song and   
           Olivier Alavoine and   
                       Bill Lin   A Self-aware Resource Management
                                  Framework for Heterogeneous Multicore
                                  SoCs with Diverse QoS Targets  . . . . . 16:1--16:??
              Pedro Yebenes and   
       Jose Rocher-Gonzalez and   
  Jesus Escudero-Sahuquillo and   
        Pedro Javier Garcia and   
        Francisco J. Alfaro and   
        Francisco J. Quiles and   
              Crispín Gómez and   
                     Jose Duato   Combining Source-adaptive and Oblivious
                                  Routing with Congestion Control in
                                  High-performance Interconnects using
                                  Hybrid and Direct Topologies . . . . . . 17:1--17:??
          Mohammad Alshboul and   
           Hussein Elnawawy and   
              Reem Elkhouly and   
               Keiji Kimura and   
                 James Tuck and   
                    Yan Solihin   Efficient Checkpointing with Recompute
                                  Scheme for Non-volatile Main Memory  . . 18:1--18:??
     Zacharias Hadjilambrou and   
          Marios Kleanthous and   
           Georgia Antoniou and   
             Antoni Portero and   
             Yiannakis Sazeides   Comprehensive Characterization of an
                                  Open Source Document Search Engine . . . 19:1--19:??

ACM Transactions on Architecture and Code Optimization
Volume 16, Number 3, July, 2019

                Bingchao Li and   
                 Jizeng Wei and   
                 Jizhou Sun and   
           Murali Annavaram and   
                   Nam Sung Kim   An Efficient GPU Cache Architecture for
                                  Applications with Irregular Memory
                                  Access Patterns  . . . . . . . . . . . . 20:1--20:??
         Stephen I. Roberts and   
           Steven A. Wright and   
            Suhaib A. Fahmy and   
              Stephen A. Jarvis   The Power-optimised Software Envelope    21:1--21:??
        Ram Srivatsa Kannan and   
         Michael Laurenzano and   
              Jeongseob Ahn and   
                 Jason Mars and   
                   Lingjia Tang   Caliper: Interference Estimator for
                                  Multi-tenant Environments Sharing
                                  Architectural Resources  . . . . . . . . 22:1--22:??
                   Zhen Lin and   
                Hongwen Dai and   
             Michael Mantor and   
                   Huiyang Zhou   Coordinated CTA Combination and
                                  Bandwidth Partitioning for GPU
                                  Concurrent Kernel Execution  . . . . . . 23:1--23:??
              Keryan Didier and   
     Dumitru Potop-Butucaru and   
            Guillaume Iooss and   
               Albert Cohen and   
               Jean Souyris and   
         Philippe Baufreton and   
                Amaury Graillat   Correct-by-Construction Parallelization
                                  of Hard Real-Time Avionics Applications
                                  on Off-the-Shelf Predictable Hardware    24:1--24:??
           Pantea Zardoshti and   
               Tingzhe Zhou and   
            Pavithra Balaji and   
           Michael L. Scott and   
                  Michael Spear   Simplifying Transactional Memory Support
                                  in C++ . . . . . . . . . . . . . . . . . 25:1--25:??
               Jungwoo Park and   
              Myoungjun Lee and   
                Soontae Kim and   
                   Minho Ju and   
                  Jeongkyu Hong   MH Cache: a Multi-retention
                                  STT-RAM-based Low-power Last-level Cache
                                  for Mobile Hardware Rendering Systems    26:1--26:??
                Jakob Leben and   
              George Tzanetakis   Polyhedral Compilation for
                                  Multi-dimensional Stream Processing  . . 27:1--27:??
    Mohammad Sadegh Sadeghi and   
      Siavash Bayat Sarmadi and   
                Shaahin Hessabi   Toward On-chip Network Security Using
                                  Runtime Isolation Mapping  . . . . . . . 28:1--28:??
                Stephane Louise   A First Step Toward Using Quantum
                                  Computing for Low-level WCETs
                                  Estimations  . . . . . . . . . . . . . . 29:1--29:??
               Artem Chikin and   
               Taylor Lloyd and   
         José Nelson Amaral and   
              Ettore Tiotto and   
                 Muhammad Usman   Memory-access-aware Safety and
                                  Profitability Analysis for
                                  Transformation of Accelerator-bound
                                  OpenMP Loops . . . . . . . . . . . . . . 30:1--30:??
               Sanghoon Cha and   
               Bokyeong Kim and   
            Chang Hyun Park and   
                    Jaehyuk Huh   Morphable DRAM Cache Design for Hybrid
                                  Memory Systems . . . . . . . . . . . . . 31:1--31:??
                   Chao Luo and   
                  Yunsi Fei and   
                    David Kaeli   Side-channel Timing Attack of RSA on a
                                  GPU  . . . . . . . . . . . . . . . . . . 32:1--32:??
                 Liang Yuan and   
                  Chen Ding and   
               Wesley Smith and   
              Peter Denning and   
                  Yunquan Zhang   A Relational Theory of Locality  . . . . 33:1--33:??

ACM Transactions on Architecture and Code Optimization
Volume 16, Number 4, January, 2020

            Arun Thangamani and   
           V. Krishna Nandivada   Optimizing Remote Communication in X10   34:1--34:26
         Sriseshan Srikanth and   
               Anirudh Jain and   
           Joseph M. Lennon and   
            Thomas M. Conte and   
          Erik Debenedictis and   
                   Jeanine Cook   MetaStrider: Architectures for Scalable
                                  Memory-centric Reduction of Sparse Data
                                  Streams  . . . . . . . . . . . . . . . . 35:1--35:26
             Mostafa Koraei and   
                Omid Fatemi and   
                   Magnus Jahre   DCMI: a Scalable Strategy for
                                  Accelerating Iterative Stencil Loops on
                                  FPGAs  . . . . . . . . . . . . . . . . . 36:1--36:24
                Leeor Peled and   
                 Uri Weiser and   
                    Yoav Etsion   A Neural Network Prefetcher for
                                  Arbitrary Memory Access Patterns . . . . 37:1--37:27
          Nicolas Vasilache and   
          Oleksandr Zinenko and   
      Theodoros Theodoridis and   
                Priya Goyal and   
             Zachary Devito and   
           William S. Moses and   
           Sven Verdoolaege and   
               Andrew Adams and   
                   Albert Cohen   The Next 700 Accelerated Layers: From
                                  Mathematical Expressions of Network
                                  Computation Graphs to Accelerated GPU
                                  Kernels, Automatically . . . . . . . . . 38:1--38:26
               Wenbin Jiang and   
                    Yang Ma and   
                     Bo Liu and   
                 Haikun Liu and   
             Bing Bing Zhou and   
                   Jian Zhu and   
                    Song Wu and   
                        Hai Jin   Layup: Layer-adaptive and Multi-type
                                  Intermediate-oriented Memory
                                  Optimization for GPU-based CNNs  . . . . 39:1--39:23
                 Sergi Siso and   
                 Wes Armour and   
        Jeyarajan Thiyagalingam   Evaluating Auto-Vectorizing Compilers
                                  through Objective Withdrawal of Useful
                                  Information  . . . . . . . . . . . . . . 40:1--40:23
              Salonik Resch and   
       S. Karen Khatamifard and   
    Zamshed Iqbal Chowdhury and   
              Masoud Zabihi and   
             Zhengyang Zhao and   
             Jian-Ping Wang and   
       Sachin S. Sapatnekar and   
               Ulya R. Karpuzcu   PIMBALL: Binary Neural Networks in
                                  Spintronic Memory  . . . . . . . . . . . 41:1--41:26
            Zhen Hang Jiang and   
                  Yunsi Fei and   
                    David Kaeli   Exploiting Bank Conflict-based
                                  Side-channel Timing Leakage of GPUs  . . 42:1--42:24
             Kyle Daruwalla and   
                  Heng Zhuo and   
               Rohit Shukla and   
                  Mikko Lipasti   BitSAD v2: Compiler Optimization and
                                  Analysis for Bitstream Computing . . . . 43:1--43:25
        Aristeidis Mastoras and   
                Thomas R. Gross   Chunking for Dynamic Linear Pipelines    44:1--44:25
               Manuel Selva and   
              Fabian Gruber and   
              Diogo Sampaio and   
         Christophe Guillon and   
         Louis-Noël Pouchet and   
               Fabrice Rastello   Building a Polyhedral Representation
                                  from an Instrumented Execution: Making
                                  Dynamic Analyses of Nonaffine Programs
                                  Scalable . . . . . . . . . . . . . . . . 45:1--45:26
                Ahmad Yasin and   
            Jawad Haj-Yahya and   
             Yosi Ben-Asher and   
                  Avi Mendelson   A Metric-Guided Method for Discovering
                                  Impactful Features and Architectural
                                  Insights for Skylake-Based Processors    46:1--46:25
                   Jie Zhao and   
                   Albert Cohen   Flextended Tiles: a Flexible Extension
                                  of Overlapped Tiles for Polyhedral
                                  Compilation  . . . . . . . . . . . . . . 47:1--47:25
             Daniel Gerzhoy and   
                 Xiaowu Sun and   
              Michael Zuzak and   
                   Donald Yeung   Nested MIMD--SIMD Parallelization for
                                  Heterogeneous Microprocessors  . . . . . 48:1--48:27
                Chunwei Xia and   
              Jiacheng Zhao and   
                 Huimin Cui and   
              Xiaobing Feng and   
                   Jingling Xue   DNNTune: Automatic Benchmarking DNN
                                  Models for Mobile-cloud Computing  . . . 49:1--49:26
                 Ian Briggs and   
                  Arnab Das and   
            Mark Baranowski and   
              Vishal Sharma and   
      Sriram Krishnamoorthy and   
         Zvonimir Rakamarić and   
          Ganesh Gopalakrishnan   FailAmp: Relativization Transformation
                                  for Soft Error Detection in Structured
                                  Address Generation . . . . . . . . . . . 50:1--50:21
               Khalid Ahmad and   
                Hari Sundar and   
                      Mary Hall   Data-driven Mixed Precision Sparse
                                  Matrix Vector Multiplication for GPUs    51:1--51:24
           Larisa Stoltzfus and   
           Bastian Hagedorn and   
             Michel Steuwer and   
            Sergei Gorlatch and   
              Christophe Dubach   Tiling Optimizations for Stencil
                                  Computations Using Rewrite Rules in Lift 52:1--52:25
    Michiel A. van der Vlag and   
         Georgios Smaragdos and   
                Zaid Al-Ars and   
               Christos Strydis   Exploring Complex Brain-Simulation
                                  Workloads on Multi-GPU Deployments . . . 53:1--53:25
              Reem Elkhouly and   
          Mohammad Alshboul and   
            Akihiro Hayashi and   
                Yan Solihin and   
                   Keiji Kimura   Compiler-support for Critical Data
                                  Persistence in NVM . . . . . . . . . . . 54:1--54:25
            Lorenzo Chelini and   
          Oleksandr Zinenko and   
             Tobias Grosser and   
                 Henk Corporaal   Declarative Loop Tactics for
                                  Domain-specific Optimization . . . . . . 55:1--55:25
              Asif Ali Khan and   
               Fazal Hameed and   
              Robin Bläsing and   
        Stuart S. P. Parkin and   
            Jeronimo Castrillon   ShiftsReduce: Minimizing Shifts in
                                  Racetrack Memory 4.0 . . . . . . . . . . 56:1--56:23

ACM Transactions on Architecture and Code Optimization
Volume 17, Number 1, March, 2020

                   Yuhao Li and   
                    Dan Sun and   
                Benjamin C. Lee   Dynamic Colocation Policies with
                                  Reinforcement Learning . . . . . . . . . 1:1--1:25
      Nikolaos Tampouratzis and   
     Ioannis Papaefstathiou and   
         Antonios Nikitakis and   
         Andreas Brokalakis and   
       Stamatis Andrianakis and   
           Apostolos Dollas and   
               Marco Marcon and   
               Emanuele Plebani   A Novel, Highly Integrated Simulator for
                                  Parallel and Distributed Systems . . . . 2:1--2:28
               Lijuan Jiang and   
                  Chao Yang and   
                     Wenjing Ma   Enabling Highly Efficient Batched Matrix
                                  Multiplications on SW26010 Many-core
                                  Processor  . . . . . . . . . . . . . . . 3:1--3:23
              Mustafa Cavus and   
               Resit Sendag and   
                   Joshua J. Yi   Informed Prefetching for Indirect Memory
                                  Accesses . . . . . . . . . . . . . . . . 4:1--4:29
               Yohann Uguen and   
        Florent De Dinechin and   
              Victor Lezaud and   
                 Steven Derrien   Application-Specific Arithmetic in
                                  High-Level Synthesis Tools . . . . . . . 5:1--5:23
                  Yang Song and   
                       Bill Lin   Improving Memory Efficiency in
                                  Heterogeneous MPSoCs through Row-Buffer
                                  Locality-aware Forwarding  . . . . . . . 6:1--6:26
                     Hao Wu and   
                 Weizhi Liu and   
                Huanxin Lin and   
                    Cho-Li Wang   A Model-Based Software Solution for
                                  Simultaneous Multiple Kernels on GPUs    7:1--7:26
                Xuanhua Shi and   
                    Wei Liu and   
                  Ligang He and   
                    Hai Jin and   
                    Ming Li and   
                      Yong Chen   Optimizing the SSD Burst Buffer by
                                  Traffic Detection  . . . . . . . . . . . 8:1--8:26

ACM Transactions on Architecture and Code Optimization
Volume 17, Number 2, June, 2020

                Charu Kalra and   
             Fritz Previlon and   
                 Norm Rubin and   
                    David Kaeli   ArmorAll: Compiler-based Resilience
                                  Targeting GPU Applications . . . . . . . 9:1--9:24
           Stefano Cherubin and   
           Daniele Cattaneo and   
             Michele Chiari and   
                Giovanni Agosta   Dynamic Precision Autotuning with TAFFO  10:1--10:26
                Ahmet Erdem and   
           Cristina Silvano and   
              Thomas Boesch and   
      Andrea Carlo Ornstein and   
         Surinder-Pal Singh and   
                Giuseppe Desoli   Runtime Design Space Exploration and
                                  Mapping of DCNNs for the Ultra-Low-Power
                                  Orlando SoC  . . . . . . . . . . . . . . 11:1--11:25
  Amir Hossein Nodehi Sabet and   
                Junqiao Qiu and   
                Zhijia Zhao and   
          Sriram Krishnamoorthy   Reliability Analysis for Unreliable FSM
                                  Computations . . . . . . . . . . . . . . 12:1--12:23
                Jiachen Xue and   
           T. N. Vijaykumar and   
            Mithuna Thottethodi   Network Interface Architecture for
                                  Remote Indirect Memory Access (RIMA) in
                                  Datacenters  . . . . . . . . . . . . . . 13:1--13:22
              Qinggang Wang and   
                 Long Zheng and   
               Jieshan Zhao and   
               Xiaofei Liao and   
                    Hai Jin and   
                   Jingling Xue   A Conflict-free Scheduler for
                                  High-performance Graph Processing on
                                  Multi-pipeline FPGAs . . . . . . . . . . 14:1--14:26
                 Anita Tino and   
          Caroline Collange and   
                   André Seznec   SIMT-X: Extending Single-Instruction
                                  Multi-Threading to Out-of-Order Cores    15:1--15:23

ACM Transactions on Architecture and Code Optimization
Volume 17, Number 3, August, 2020

                     Dave Kaeli   Editorial: a Message from the
                                  Editor-in-Chief  . . . . . . . . . . . . 16:1--16:2
                 Ram Rangan and   
         Mark W. Stephenson and   
            Aditya Ukarande and   
               Shyam Murthy and   
              Virat Agarwal and   
                Marc Blackstein   Zeroploit: Exploiting Zero Valued
                                  Operands in Interactive Gaming
                                  Applications . . . . . . . . . . . . . . 17:1--17:26
               Karel Adámek and   
              Sofia Dimoudi and   
                 Mike Giles and   
                  Wesley Armour   GPU Fast Convolution via the
                                  Overlap-and-Save Method in Shared Memory 18:1--18:20
                  Arnab Das and   
      Sriram Krishnamoorthy and   
                 Ian Briggs and   
      Ganesh Gopalakrishnan and   
          Ramakrishna Tipireddy   FPDetect: Efficient Reasoning About
                                  Stencil Programs Using Selective Direct
                                  Evaluation . . . . . . . . . . . . . . . 19:1--19:27
           Tarek S. Abdelrahman   Cooperative Software-hardware
                                  Acceleration of K-means on a Tightly
                                  Coupled CPU--FPGA System . . . . . . . . 20:1--20:24
                 Jaekyu Lee and   
                Yasuo Ishii and   
                     Dam Sunwoo   Securing Branch Predictors with
                                  Two-Level Encryption . . . . . . . . . . 21:1--21:25
                  L. Cerina and   
         M. D. Santambrogio and   
                  G. Franco and   
              C. Gallicchio and   
                     A. Micheli   EchoBay: Design and Optimization of Echo
                                  State Networks under Memory and Time
                                  Constraints  . . . . . . . . . . . . . . 22:1--22:24
             Savvas Sioutas and   
              Sander Stuijk and   
                Twan Basten and   
             Henk Corporaal and   
                     Lou Somers   Schedule Synthesis for Halide Pipelines
                                  on GPUs  . . . . . . . . . . . . . . . . 23:1--23:25
           Muhammad Huzaifa and   
            Johnathan Alsop and   
        Abdulrahman Mahmoud and   
          Giordano Salvador and   
        Matthew D. Sinclair and   
                 Sarita V. Adve   Inter-kernel Reuse-aware Thread Block
                                  Scheduling . . . . . . . . . . . . . . . 24:1--24:27

ACM Transactions on Architecture and Code Optimization
Volume 18, Number 1, January, 2021

        Syed M. A. H. Jafri and   
               Hasan Hassan and   
               Ahmed Hemani and   
                     Onur Mutlu   Refresh Triggered Computation: Improving
                                  the Energy Efficiency of Convolutional
                                  Neural Network Accelerators  . . . . . . 2:1--2:29
              Solomon Abera and   
            M. Balakrishnan and   
                   Anshul Kumar   Performance-Energy Trade-off in Modern
                                  CMPs . . . . . . . . . . . . . . . . . . 3:1--3:26
             Atefeh Mehrabi and   
             Aninda Manocha and   
            Benjamin C. Lee and   
                Daniel J. Sorin   Bayesian Optimization for Efficient
                                  Accelerator Synthesis  . . . . . . . . . 4:1--4:25
                  Minsu Kim and   
            Jeong-Keun Park and   
                  Soo-Mook Moon   Irregular Register Allocation for
                                  Translation of Test-pattern Programs . . 5:1--5:23
          Negin Nematollahi and   
       Mohammad Sadrosadati and   
             Hajar Falahati and   
         Marzieh Barkhordar and   
        Mario Paulo Drumond and   
         Hamid Sarbazi-Azad and   
                  Babak Falsafi   Efficient Nearest-Neighbor Data Sharing
                                  in GPUs  . . . . . . . . . . . . . . . . 6:1--6:26
               Lorenz Braun and   
             Sotirios Nikas and   
                  Chen Song and   
          Vincent Heuveline and   
                 Holger Fröning   A Simple Model for Portable and Fast
                                  Prediction of Execution Time and Power
                                  Consumption of GPU Kernels . . . . . . . 7:1--7:25
             Marcel Mettler and   
Daniel Mueller-Gritschneder and   
               Ulf Schlichtmann   A Distributed Hardware Monitoring System
                                  for Runtime Verification on Multi-Tile
                                  MPSoCs . . . . . . . . . . . . . . . . . 8:1--8:25
               Yu Emma Wang and   
             Carole-Jean Wu and   
              Xiaodong Wang and   
              Kim Hazelwood and   
                   David Brooks   Exploiting Parallelism Opportunities
                                  with Deep Learning Frameworks  . . . . . 9:1--9:23
          Sanket Tavarageri and   
         Alexander Heinecke and   
          Sasikanth Avancha and   
                Bharat Kaul and   
            Gagandeep Goyal and   
          Ramakrishna Upadrasta   PolyDL: Polyhedral Optimizations for
                                  Creation of High-performance DL
                                  Primitives . . . . . . . . . . . . . . . 11:1--11:27
              Sujay Yadalam and   
            Vinod Ganapathy and   
                 Arkaprava Basu   SG XL: Security and Performance for
                                  Enclaves Using Large Pages . . . . . . . 12:1--12:25
     Kleovoulos Kalaitzidis and   
                   André Seznec   Leveraging Value Equality Prediction for
                                  Value Speculation  . . . . . . . . . . . 13:1--13:20
             Abhishek Singh and   
                 Shail Dave and   
           Pantea Zardoshti and   
            Robert Brotzman and   
                 Chao Zhang and   
               Xiaochen Guo and   
         Aviral Shrivastava and   
                   Gang Tan and   
                  Michael Spear   SPX64: a Scratchpad Memory for
                                  General-purpose Microprocessors  . . . . 14:1--14:26
         Paolo Sylos Labini and   
          Marco Cianfriglia and   
              Damiano Perri and   
            Osvaldo Gervasi and   
             Grigori Fursin and   
            Anton Lokhmotov and   
            Cedric Nugteren and   
          Bruno Carpentieri and   
              Fabiana Zollo and   
                   Flavio Vella   On the Anatomy of Predictive Models for
                                  Accelerating GPU Convolution Kernels and
                                  Beyond . . . . . . . . . . . . . . . . . 16:1--16:24

ACM Transactions on Architecture and Code Optimization
Volume 18, Number 2, March, 2021

                  Nils Voss and   
         Bastiaan Kwaadgras and   
               Oskar Mencer and   
                  Wayne Luk and   
              Georgi Gaydadjiev   On Predictable Reconfigurable System
                                  Design . . . . . . . . . . . . . . . . . 17:1--17:28
      Anirudh Mohan Kaushik and   
         Gennady Pekhimenko and   
                    Hiren Patel   Gretch: a Hardware Prefetcher for Graph
                                  Analytics  . . . . . . . . . . . . . . . 18:1--18:25
               Nhut-Minh Ho and   
           Himeshi De Silva and   
                  Weng-Fai Wong   GRAM: a Framework for Dynamically Mixing
                                  Precisions in GPU Applications . . . . . 19:1--19:24
             Arnab Kumar Biswas   Cryptographic Software IP Protection
                                  without Compromising Performance or
                                  Timing Side-channel Leakage  . . . . . . 20:1--20:20
      Maxime France-Pillois and   
              Jérôme Martin and   
              Frédéric Rousseau   A Non-Intrusive Tool Chain to Optimize
                                  MPSoC End-to-End Systems . . . . . . . . 21:1--21:22
                Pengyu Wang and   
                  Jing Wang and   
                    Chao Li and   
              Jianzong Wang and   
                 Haojin Zhu and   
                      Minyi Guo   Grus: Toward Unified-memory-efficient
                                  High-performance Graph Processing on GPU 22:1--22:25
            Ramin Izadpanah and   
         Christina Peterson and   
                Yan Solihin and   
                  Damian Dechev   PETRA: Persistent Transactional
                                  Non-blocking Linked Data Structures  . . 23:1--23:26
            Muhammad Hassan and   
            Chang Hyun Park and   
           David Black-Schaffer   A Reusable Characterization of the
                                  Memory System Behavior of SPEC2017 and
                                  SPEC2006 . . . . . . . . . . . . . . . . 24:1--24:20

ACM Transactions on Architecture and Code Optimization
Volume 18, Number 3, June, 2021

            Sugandha Tiwari and   
                  Neel Gala and   
            Chester Rebeiro and   
                    V. Kamakoti   PERI: a Configurable Posit Enabled
                                  RISC-V Core  . . . . . . . . . . . . . . 25:1--25:26
       George Charitopoulos and   
 Dionisios N. Pnevmatikatos and   
              Georgi Gaydadjiev   MC-DeF: Creating Customized CGRAs for
                                  Dataflow Applications  . . . . . . . . . 26:1--26:25
   Jose M. Rodriguez Borbon and   
               Junjie Huang and   
              Bryan M. Wong and   
                    Walid Najjar   Acceleration of Parallel-Blocked QR
                                  Decomposition of Tall-and-Skinny
                                  Matrices on FPGAs  . . . . . . . . . . . 27:1--27:25
             Michael Stokes and   
              David Whalley and   
                    Soner Onder   Decreasing the Miss Rate and Eliminating
                                  the Performance Penalty of a Data Filter
                                  Cache  . . . . . . . . . . . . . . . . . 28:1--28:22
                   Shoaib Akram   Performance Evaluation of Intel Optane
                                  Memory for Managed Workloads . . . . . . 29:1--29:26
                 Yashuai Lü and   
                    Hui Guo and   
                 Libo Huang and   
                      Qi Yu and   
                    Li Shen and   
                  Nong Xiao and   
                   Zhiying Wang   GraphPEG: Accelerating Graph Processing
                                  on GPUs  . . . . . . . . . . . . . . . . 30:1--30:24
                 Hamza Omar and   
                      Omer Khan   PRISM: Strong Hardware Isolation-based
                                  Soft-Error Resilient Multicore
                                  Architecture with High Performance and
                                  Availability at Low Hardware Overheads   31:1--31:25
         Devashree Tripathy and   
       Amirali Abdolrashidi and   
       Laxmi Narayan Bhuyan and   
                 Liang Zhou and   
                    Daniel Wong   PAVER: Locality Graph-Based Thread Block
                                  Scheduling for GPUs  . . . . . . . . . . 32:1--32:26
                Wim Heirman and   
              Stijn Eyerman and   
            Kristof Du Bois and   
                    Ibrahim Hur   Automatic Sublining for Efficient Sparse
                                  Memory Accesses  . . . . . . . . . . . . 33:1--33:23
              Mustafa Cavus and   
          Mohammed Shatnawi and   
               Resit Sendag and   
                Augustus K. Uht   Fast Key-Value Lookups with Node Tracker 34:1--34:26
                Weijia Song and   
       Christina Delimitrou and   
               Zhiming Shen and   
        Robbert Van Renesse and   
         Hakim Weatherspoon and   
           Lotfi Benmohamed and   
          Frederic De Vaulx and   
                Charif Mahmoudi   CacheInspector: Reverse Engineering
                                  Cache Resources in Public Clouds . . . . 35:1--35:25
  Daniel Rodrigues Carvalho and   
                   André Seznec   Understanding Cache Compression  . . . . 36:1--36:27
             Daniel Thuerck and   
              Nicolas Weber and   
                Roberto Bifulco   Flynn's Reconciliation: Automating the
                                  Register Cache Idiom for
                                  Cross-accelerator Programming  . . . . . 37:1--37:26
     João P. L. De Carvalho and   
               Braedy Kuzma and   
            Ivan Korostelev and   
         José Nelson Amaral and   
         Christopher Barton and   
               José Moreira and   
                   Guido Araujo   KernelFaRer: Replacing Native-Code
                                  Idioms with High-Performance Library
                                  Calls  . . . . . . . . . . . . . . . . . 38:1--38:22
              Ricardo Alves and   
           Stefanos Kaxiras and   
           David Black-Schaffer   Early Address Prediction: Efficient
                                  Pipeline Prefetch and Reuse  . . . . . . 39:1--39:22

ACM Transactions on Architecture and Code Optimization
Volume 18, Number 4, December, 2021

            Kaustav Goswami and   
        Dip Sankar Banerjee and   
                 Shirshendu Das   Towards Enhanced System Efficiency while
                                  Mitigating Row Hammer  . . . . . . . . . 40:1--40:26
                  Jerzy Proficz   All-gather Algorithms Resilient to
                                  Imbalanced Process Arrival Patterns  . . 41:1--41:22
                     Rui Xu and   
                   Sheng Ma and   
                Yaohua Wang and   
                Xinhai Chen and   
                       Yang Guo   Configurable Multi-directional Systolic
                                  Array Architecture for Convolutional
                                  Neural Networks  . . . . . . . . . . . . 42:1--42:24
                  Wonik Seo and   
               Sanghoon Cha and   
                Yeonjae Kim and   
                Jaehyuk Huh and   
                    Jongse Park   SLO-Aware Inference Scheduler for
                                  Heterogeneous Processors in Edge
                                  Platforms  . . . . . . . . . . . . . . . 43:1--43:26
      Yasir Mahmood Qureshi and   
       William Andrew Simon and   
             Marina Zapater and   
             Katzalin Olcoz and   
                  David Atienza   Gem5-X: a Many-core Heterogeneous
                                  Simulation Platform for Architectural
                                  Exploration and Optimization . . . . . . 44:1--44:27
                  Tina Jung and   
              Fabian Ritter and   
                 Sebastian Hack   PICO: a Presburger In-bounds Check
                                  Optimization for Compiler-based Memory
                                  Safety Instrumentations  . . . . . . . . 45:1--45:27
                Zhibing Sha and   
                     Jun Li and   
                 Lihao Song and   
                Jiewen Tang and   
                  Min Huang and   
                Zhigang Cai and   
                Lianju Qian and   
               Jianwei Liao and   
                    Zhiming Liu   Low I/O Intensity-aware Partial GC
                                  Scheduling to Reduce Long-tail Latency
                                  in SSDs  . . . . . . . . . . . . . . . . 46:1--46:25
             Syed Asad Alam and   
              James Garland and   
                    David Gregg   Low-precision Logarithmic Number
                                  Systems: Beyond Base-2 . . . . . . . . . 47:1--47:25
             Candace Walden and   
               Devesh Singh and   
     Meenatchi Jagasivamani and   
                   Shang Li and   
                  Luyi Kang and   
           Mehdi Asnaashari and   
             Sylvain Dubois and   
                Bruce Jacob and   
                   Donald Yeung   Monolithically Integrating Non-Volatile
                                  Main Memory over the Last-Level Cache    48:1--48:26
              Matthew Tomei and   
                 Shomit Das and   
        Mohammad Seyedzadeh and   
           Philip Bedoukian and   
          Bradford Beckmann and   
               Rakesh Kumar and   
                     David Wood   Byte-Select Compression  . . . . . . . . 49:1--49:27
                   Cunlu Li and   
                 Dezun Dong and   
               Shazhou Yang and   
               Xiangke Liao and   
                Guangyu Sun and   
                   Yongheng Liu   CIB-HIER: Centralized Input Buffer
                                  Design in Hierarchical High-radix
                                  Routers  . . . . . . . . . . . . . . . . 50:1--50:21
                Tobias Gysi and   
           Christoph Müller and   
          Oleksandr Zinenko and   
             Stephan Herhut and   
                Eddie Davis and   
               Tobias Wicky and   
              Oliver Fuhrer and   
            Torsten Hoefler and   
                 Tobias Grosser   Domain-Specific Multi-Level IR Rewriting
                                  for GPU: The Open Earth Compiler for
                                  GPU-accelerated Climate Simulation . . . 51:1--51:23
                     An Zou and   
                Huifeng Zhu and   
               Jingwen Leng and   
                     Xin He and   
         Vijay Janapa Reddi and   
        Christopher D. Gill and   
                     Xuan Zhang   System-level Early-stage Modeling and
                                  Evaluation of IVR-assisted Processor
                                  Power Delivery System  . . . . . . . . . 52:1--52:27
             Aninda Manocha and   
             Tyler Sorensen and   
                Esin Tureci and   
          Opeoluwa Matthews and   
             Juan L. Aragón and   
             Margaret Martonosi   GraphAttack: Optimizing Data Supply for
                                  Graph Applications on In-Order Multicore
                                  Architectures  . . . . . . . . . . . . . 53:1--53:26
                Joscha Benz and   
               Oliver Bringmann   Scenario-Aware Program Specialization
                                  for Timing Predictability  . . . . . . . 54:1--54:26
        Shounak Chakraborty and   
               Magnus Själander   WaFFLe: Gated Cache-Ways with Per-Core
                                  Fine-Grained DVFS for Reduced On-Chip
                                  Temperature and Leakage Consumption  . . 55:1--55:25
         Sriseshan Srikanth and   
               Anirudh Jain and   
            Thomas M. Conte and   
       Erik P. Debenedictis and   
                   Jeanine Cook   SortCache: Intelligent Cache Management
                                  for Accelerating Sparse Data Workloads   56:1--56:24
               Paul Metzger and   
              Volker Seeker and   
           Christian Fensch and   
                    Murray Cole   Device Hopping: Transparent Mid-Kernel
                                  Runtime Switching for Heterogeneous
                                  Systems  . . . . . . . . . . . . . . . . 57:1--57:25
                   Yu Zhang and   
                    Da Peng and   
               Xiaofei Liao and   
                    Hai Jin and   
                 Haikun Liu and   
                     Lin Gu and   
                   Bingsheng He   LargeGraph: an Efficient
                                  Dependency-Aware GPU-Accelerated
                                  Large-Scale Graph Processing . . . . . . 58:1--58:24
             Hüsrev Cilasun and   
              Salonik Resch and   
       Zamshed I. Chowdhury and   
                 Erin Olson and   
              Masoud Zabihi and   
             Zhengyang Zhao and   
            Thomas Peterson and   
            Keshab K. Parhi and   
             Jian-Ping Wang and   
       Sachin S. Sapatnekar and   
               Ulya R. Karpuzcu   Spiking Neural Networks in Spintronic
                                  Computational RAM  . . . . . . . . . . . 59:1--59:21

ACM Transactions on Architecture and Code Optimization
Volume 19, Number 1, March, 2022

            Aditya Ukarande and   
          Suryakant Patidar and   
                     Ram Rangan   Locality-Aware CTA Scheduling for Gaming
                                  Applications . . . . . . . . . . . . . . 1:1--1:26
                Hongzhi Liu and   
                    Jie Luo and   
                    Ying Li and   
                    Zhonghai Wu   Iterative Compilation Optimization Based
                                  on Metric Learning and Collaborative
                                  Filtering  . . . . . . . . . . . . . . . 2:1--2:25
   Muhammad Aditya Sasongko and   
              Milind Chabbi and   
Mandana Bagheri Marzijarani and   
                     Didem Unat   ReuseTracker: Fast Yet Accurate
                                  Multicore Reuse Distance Analyzer  . . . 3:1--3:25
                Yaosheng Fu and   
             Evgeny Bolotin and   
       Niladrish Chatterjee and   
              David Nellans and   
             Stephen W. Keckler   GPU Domain Specialization via Composable
                                  On-Package Architecture  . . . . . . . . 4:1--4:23
                Daeyeal Lee and   
                   Bill Lin and   
               Chung-Kuan Cheng   SMT-Based Contention-Free Task Mapping
                                  and Scheduling on $2$D/$3$D SMART NoC
                                  with Mixed Dimension-Order Routing . . . 5:1--5:21
         Prasanth Chatarasi and   
              Hyoukjun Kwon and   
         Angshuman Parashar and   
           Michael Pellauer and   
             Tushar Krishna and   
                   Vivek Sarkar   Marvel: a Data-Centric Approach for
                                  Mapping Deep Learning Operators on
                                  Spatial Accelerators . . . . . . . . . . 6:1--6:26
              Dennis Rieber and   
                Axel Acosta and   
                 Holger Fröning   Joint Program and Layout Transformations
                                  to Enable Convolutional Operators on
                                  Specialized Hardware Based on Constraint
                                  Programming  . . . . . . . . . . . . . . 7:1--7:26
                 Mengya Lei and   
                     Fan Li and   
                  Fang Wang and   
                   Dan Feng and   
                Xiaomin Zou and   
                    Renzhi Xiao   SecNVM: an Efficient and Write-Friendly
                                  Metadata Crash Consistency Scheme for
                                  Secure NVM . . . . . . . . . . . . . . . 8:1--8:26
                    Bang Di and   
                  Daokun Hu and   
                   Zhen Xie and   
                Jianhua Sun and   
                   Hao Chen and   
                 Jinkui Ren and   
                        Dong Li   TLB-pilot: Mitigating TLB Contention
                                  Attack on GPUs with
                                  Microarchitecture-Aware Scheduling . . . 9:1--9:23
         Gururaj Saileshwar and   
                Rick Boivie and   
                  Tong Chen and   
             Benjamin Segal and   
           Alper Buyuktosunoglu   HeapCheck: Low-cost Hardware Support for
                                  Memory Safety  . . . . . . . . . . . . . 10:1--10:24
             M. Waqar Azhar and   
             Miquel Pericàs and   
                  Per Stenström   Task-RM: a Resource Manager for Energy
                                  Reduction in Task-Parallel Applications
                                  under Quality of Service Constraints . . 11:1--11:26
                Cesar Gomes and   
            Maziar Amiraski and   
                 Mark Hempstead   CASHT: Contention Analysis in Shared
                                  Hierarchies with Thefts  . . . . . . . . 12:1--12:27
                 Yufei Wang and   
               Xiaoshe Dong and   
             Longxiang Wang and   
                Weiduo Chen and   
                  Xingjun Zhang   Optimizing Small-Sample Disk Fault
                                  Detection Based on LSTM-GAN Model  . . . 13:1--13:24
             Franyell Silfa and   
           Jose Maria Arnau and   
               Antonio González   E-BATCH: Energy-Efficient and
                                  High-Throughput RNN Batching . . . . . . 14:1--14:23
                  Chen Ding and   
                  Dong Chen and   
               Fangzhou Liu and   
             Benjamin Reber and   
                   Wesley Smith   CARL: Compiler Assigned Reference
                                  Leasing  . . . . . . . . . . . . . . . . 15:1--15:28

ACM Transactions on Architecture and Code Optimization
Volume 19, Number 2, June, 2022

           Christof Schlaak and   
            Tzung-Han Juang and   
              Christophe Dubach   Memory-Aware Functional IR for
                                  Higher-Level Synthesis of Accelerators   16:1--16:26
   Kartik Lakshminarasimhan and   
             Ajeya Naithani and   
                Josué Feliu and   
                Lieven Eeckhout   The Forward Slice Core: a
                                  High-Performance, Yet Low-Complexity
                                  Microarchitecture  . . . . . . . . . . . 17:1--17:25
       Sharanyan Srikanthan and   
          Sayak Chakraborti and   
            Princeton Ferro and   
              Sandhya Dwarkadas   MAPPER: Managing Application Performance
                                  via Parallel Efficiency Regulation . . . 18:1--18:26
      Athanasios Tziouvaras and   
         Georgios Dimitriou and   
             Georgios Stamoulis   Low-power Near-data Instruction
                                  Execution Leveraging Opcode-based Timing
                                  Analysis . . . . . . . . . . . . . . . . 19:1--19:26
                Xingguo Jia and   
                  Jin Zhang and   
                   Boshi Yu and   
               Xingyue Qian and   
                Zhengwei Qi and   
                   Haibing Guan   GiantVM: a Novel Distributed Hypervisor
                                  for Resource Aggregation with DSM-aware
                                  Optimizations  . . . . . . . . . . . . . 20:1--20:27
              Mehrzad Nejat and   
        Madhavan Manivannan and   
             Miquel Pericàs and   
                  Per Stenström   Cooperative Slack Management: Saving
                                  Energy of Multicore Processors by
                                  Trading Performance Slack Between
                                  QoS-Constrained Applications . . . . . . 21:1--21:27
            Hugo Pompougnac and   
            Ulysse Beaugnon and   
               Albert Cohen and   
         Dumitru Potop Butucaru   Weaving Synchronous Reactions into the
                                  Fabric of SSA-form Compilers . . . . . . 22:1--22:25
            Ghassan Shobaki and   
          Vahl Scott Gordon and   
                Paul McHugh and   
            Theodore Dubois and   
                  Austin Kerbow   Register-Pressure-Aware Instruction
                                  Scheduling Using Ant Colony Optimization 23:1--23:23
                 Qihan Wang and   
                  Zhen Peng and   
                    Bin Ren and   
                   Jie Chen and   
              Robert G. Edwards   MemHC: an Optimized GPU Memory
                                  Management Framework for Accelerating
                                  Many-body Correlation  . . . . . . . . . 24:1--24:26
               Rakesh Kumar and   
              Mehdi Alipour and   
           David Black-Schaffer   Dependence-aware Slice Execution to
                                  Boost MLP in Slice-out-of-order Cores    25:1--25:28
         Nandita Vijaykumar and   
              Ataberk Olgun and   
 Konstantinos Kanellopoulos and   
           F. Nisa Bostanci and   
               Hasan Hassan and   
             Mehrshad Lotfi and   
         Phillip B. Gibbons and   
                     Onur Mutlu   MetaSys: a Practical Open-source
                                  Metadata Management System to Implement
                                  and Evaluate Cross-layer Optimizations   26:1--26:29
                  Jing Chen and   
        Madhavan Manivannan and   
        Mustafa Abduljabbar and   
                 Miquel Pericàs   ERASE: Energy Efficient Task Mapping
                                  and Resource Management for Work
                                  Stealing Runtimes  . . . . . . . . . . . 27:1--27:29
               Chencheng Ye and   
                Yuanchao Xu and   
                Xipeng Shen and   
                    Hai Jin and   
               Xiaofei Liao and   
                    Yan Solihin   Preserving Addressability Upon
                                  GC-Triggered Data Movements on
                                  Non-Volatile Memory  . . . . . . . . . . 28:1--28:26
    George Michelogiannakis and   
             Benjamin Klenk and   
               Brandon Cook and   
                Min Yee Teh and   
            Madeleine Glick and   
             Larry Dennison and   
              Keren Bergman and   
                     John Shalf   A Case For Intra-rack Resource
                                  Disaggregation in HPC  . . . . . . . . . 29:1--29:26

ACM Transactions on Architecture and Code Optimization
Volume 19, Number 3, September, 2022

                  Ping Wang and   
                    Fei Wen and   
              Paul V. Gratz and   
                 Alex Sprintson   SIMD-Matcher: a SIMD-based Arbitrary
                                  Matching Framework . . . . . . . . . . . 30:1--30:20
             Marcel Mettler and   
                Martin Rapp and   
                  Heba Khdr and   
Daniel Mueller-Gritschneder and   
                Jörg Henkel and   
               Ulf Schlichtmann   An FPGA-based Approach to Evaluate
                                  Thermal and Resource Management
                                  Strategies of Many-core Processors . . . 31:1--31:24
            Paschalis Mpeis and   
          Pavlos Petoumenos and   
              Kim Hazelwood and   
                   Hugh Leather   Object Intersection Captures on
                                  Interactive Apps to Drive a
                                  Crowd-sourced Replay-based Compiler
                                  Optimization . . . . . . . . . . . . . . 32:1--32:25
                   Cunlu Li and   
                 Dezun Dong and   
                   Xiangke Liao   MUA-Router: Maximizing the
                                  Utility-of-Allocation for On-chip
                                  Pipelining Routers . . . . . . . . . . . 33:1--33:23
            Ziaul Choudhury and   
       Shashwat Shrivastava and   
        Lavanya Ramapantulu and   
                  Suresh Purini   An FPGA Overlay for CNN Inference with
                                  Fine-grained Flexible Parallelism  . . . 34:1--34:26
        Diksha Moolchandani and   
               Anshul Kumar and   
              Smruti R. Sarangi   Performance and Power Prediction for
                                  Concurrent Execution on GPUs . . . . . . 35:1--35:27
             Ali Jahanshahi and   
                 Nanpeng Yu and   
                    Daniel Wong   PowerMorph: QoS-Aware Server Power
                                  Reshaping for Data Center Regulation
                                  Service  . . . . . . . . . . . . . . . . 36:1--36:27
                    Peng Xu and   
                Nannan Zhao and   
                Jiguang Wan and   
                    Wei Liu and   
               Shuning Chen and   
               Yuanhui Zhou and   
             Hadeel Albahar and   
                Hanyang Liu and   
                   Liu Tang and   
                      Zhihu Tan   Building a Fast and Efficient LSM-tree
                                  Store by Integrating Local Storage with
                                  Cloud Storage  . . . . . . . . . . . . . 37:1--37:26
           Horng-Ruey Huang and   
             Ding-Yong Hong and   
                 Jan-Jan Wu and   
               Kung-Fu Chen and   
               Pangfeng Liu and   
                  Wei-Chung Hsu   Accelerating Video Captioning on
                                  Heterogeneous System Architectures . . . 38:1--38:25
     David Corbalán-Navarro and   
             Juan L. Aragón and   
              Martí Anglada and   
      Joan-Manuel Parcerisa and   
               Antonio González   Triangle Dropping: an Occluded-geometry
                                  Predictor for Energy-efficient Mobile
                                  GPUs . . . . . . . . . . . . . . . . . . 39:1--39:20
              Shivam Kundan and   
        Theodoros Marinakis and   
    Iraklis Anagnostopoulos and   
                Dimitri Kagaris   A Pressure-Aware Policy for Contention
                                  Minimization on Multicore Systems  . . . 40:1--40:26
            Johnathan Alsop and   
               Weon Taek Na and   
        Matthew D. Sinclair and   
             Samuel Grayson and   
                    Sarita Adve   A Case for Fine-grain Coherence
                                  Specialization in Heterogeneous Systems  41:1--41:26
    Mohammadreza Soltaniyeh and   
          Richard P. Martin and   
            Santosh Nagarakatte   An Accelerator for Sparse Convolutional
                                  Neural Networks Leveraging Systolic
                                  General Matrix--matrix Multiplication    42:1--42:26
           Dharanidhar Dang and   
                   Bill Lin and   
                 Debashis Sahoo   LiteCON: an All-photonic Neuromorphic
                                  Accelerator for Energy-efficient Deep
                                  Learning . . . . . . . . . . . . . . . . 43:1--43:22
              Lokesh Siddhu and   
               Rajesh Kedia and   
             Shailja Pandey and   
                Martin Rapp and   
              Anuj Pathania and   
                Jörg Henkel and   
            Preeti Ranjan Panda   CoMeT: an Integrated Interval Thermal
                                  Simulation Toolchain for 2D, 2.5D, and
                                  3D Processor-Memory Systems  . . . . . . 44:1--44:25
               M. Ben Olson and   
       Brandon Kammerdiener and   
           Michael R. Jantz and   
           Kshitij A. Doshi and   
                    Terry Jones   Online Application Guidance for
                                  Heterogeneous Memory Systems . . . . . . 45:1--45:27
    Bruno Chinelato Honorio and   
     João P. L. De Carvalho and   
     Catalina Munoz Morales and   
        Alexandro Baldassin and   
                   Guido Araujo   Using Barrier Elision to Improve
                                  Transactional Code Generation  . . . . . 46:1--46:23

ACM Transactions on Architecture and Code Optimization
Volume 19, Number 4, December, 2022

                Jiansong Li and   
               Xueying Wang and   
              Xiaobing Chen and   
                 Guangli Li and   
                  Xiao Dong and   
                  Peng Zhao and   
                 Xianzhi Yu and   
               Yongxin Yang and   
                    Wei Cao and   
                    Lei Liu and   
                  Xiaobing Feng   An Application-oblivious Memory
                                  Scheduling System for DNN Accelerators   47:1--47:??
             Aditya Narayan and   
             Yvain Thonnart and   
               Pascal Vivet and   
                Ayse Coskun and   
                     Ajay Joshi   Architecting Optically Controlled Phase
                                  Change Memory  . . . . . . . . . . . . . 48:1--48:??
                 Chao Zhang and   
          Maximilian Bremer and   
                    Cy Chan and   
                 John Shalf and   
                   Xiaochen Guo   ASA: Accelerating Sparse Accumulation in
                                  Column-wise SpGEMM . . . . . . . . . . . 49:1--49:??
                   Aart Bik and   
       Penporn Koanantakool and   
          Tatiana Shpeisman and   
          Nicolas Vasilache and   
                Bixia Zheng and   
               Fredrik Kjolstad   Compiler Support for Sparse Tensor
                                  Computations in MLIR . . . . . . . . . . 50:1--50:??
             Pierre Michaud and   
                  Anis Peysieux   HAIR: Halving the Area of the Integer
                                  Register File with Odd/Even Banking  . . 51:1--51:??
       Amirreza Yousefzadeh and   
                 Jan Stuijt and   
             Martijn Hijdra and   
            Hsiao-Hsuan Liu and   
       Anteneh Gebregiorgis and   
             Abhairaj Singh and   
              Said Hamdioui and   
               Francky Catthoor   Energy-efficient In-Memory Address
                                  Calculation  . . . . . . . . . . . . . . 52:1--52:??
                  Hwisoo So and   
            Moslem Didehban and   
                   Yohan Ko and   
         Aviral Shrivastava and   
                  Kyoungwoo Lee   EXPERTISE: an Effective Software-level
                                  Redundant Multithreading Scheme against
                                  Hardware Faults  . . . . . . . . . . . . 53:1--53:??
                Tim Hartley and   
           Foivos S. Zakkak and   
                Andy Nisbet and   
        Christos Kotselidis and   
                    Mikel Luján   Just-In-Time Compilation on ARM --- a
                                  Closer Look at Call-Site Code
                                  Consistency  . . . . . . . . . . . . . . 54:1--54:??
              Erling Jellum and   
            Milica Orlandić and   
              Edmund Brekke and   
               Tor Johansen and   
                  Torleiv Bryne   Solving Sparse Assignment Problems on
                                  FPGAs  . . . . . . . . . . . . . . . . . 55:1--55:??
                   Yuhao Li and   
                Benjamin C. Lee   Phronesis: Efficient Performance
                                  Modeling for High-dimensional
                                  Configuration Tuning . . . . . . . . . . 56:1--56:??
   Chandrahas Tirumalasetty and   
            Chih Chieh Chou and   
            Narasimha Reddy and   
                 Paul Gratz and   
               Ayman Abouelwafa   Reducing Minor Page Fault Overheads
                                  through Enhanced Page Walker . . . . . . 57:1--57:??
                    Lan Gao and   
                  Jing Wang and   
                  Weigong Zhang   Adaptive Contention Management for
                                  Fine-Grained Synchronization on
                                  Commodity GPUs . . . . . . . . . . . . . 58:1--58:??
                Ruobing Han and   
                 Jaewon Lee and   
               Jaewoong Sim and   
                    Hyesoon Kim   COX: Exposing CUDA Warp-level Functions
                                  to CPUs  . . . . . . . . . . . . . . . . 59:1--59:??
                 Yiding Liu and   
              Xingyao Zhang and   
             Donglin Zhuang and   
                     Xin Fu and   
                  Shuaiwen Song   DynamAP: Architectural Support for
                                  Dynamic Graph Traversal on the Automata
                                  Processor  . . . . . . . . . . . . . . . 60:1--60:??
               Changwei Zou and   
                Yaoqing Gao and   
                   Jingling Xue   Practical Software-Based Shadow Stacks
                                  on x86-64  . . . . . . . . . . . . . . . 61:1--61:??

ACM Transactions on Architecture and Code Optimization
Volume 20, Number 1, March, 2023

             Thomas Luinaud and   
      J. M. Pierre Langlois and   
                   Yvon Savaria   Symbolic Analysis for Data Plane
                                  Programs Specialization  . . . . . . . . 1:1--1:??
       Nilesh Rajendra Shah and   
             Ashitabh Misra and   
               Antoine Miné and   
              Rakesh Venkat and   
          Ramakrishna Upadrasta   BullsEye: Scalable and Accurate
                                  Approximation Framework for Cache Miss
                                  Calculation  . . . . . . . . . . . . . . 2:1--2:??
                Mitali Soni and   
                 Asmita Pal and   
              Joshua San Miguel   As-Is Approximate Computing  . . . . . . 3:1--3:??
                 Parth Shah and   
      Ranjal Gautham Shenoy and   
    Vaidyanathan Srinivasan and   
                Pradip Bose and   
           Alper Buyuktosunoglu   TokenSmart: Distributed, Scalable Power
                                  Management in the Many-core Era  . . . . 4:1--4:??
               Zhangyu Chen and   
                     Yu Hua and   
            Luochangqi Ding and   
                    Bo Ding and   
                Pengfei Zuo and   
                        Xue Liu   Lock-Free High-performance Hashing for
                                  Persistent Memory via PM-aware Holistic
                                  Optimization . . . . . . . . . . . . . . 5:1--5:??
        Aristeidis Mastoras and   
       Sotiris Anagnostidis and   
          Albert-Jan N. Yzelman   Design and Implementation for
                                  Nonblocking Execution in GraphBLAS:
                                  Tradeoffs and Performance  . . . . . . . 6:1--6:??
                   Yemao Xu and   
                 Dezun Dong and   
             Dongsheng Wang and   
                     Shi Xu and   
                    Enda Yu and   
                  Weixia Xu and   
                   Xiangke Liao   SSD-SGD: Communication Sparsification
                                  for Distributed Deep Learning Training   7:1--7:??
              Ataberk Olgun and   
            Juan Gómez Luna and   
 Konstantinos Kanellopoulos and   
              Behzad Salami and   
               Hasan Hassan and   
                 Oguz Ergin and   
                     Onur Mutlu   PiDRAM: a Holistic End-to-end FPGA-based
                                  Framework for Processing-in-DRAM . . . . 8:1--8:??
           Christos Sakalis and   
           Stefanos Kaxiras and   
               Magnus Själander   Delay-on-Squash: Stopping
                                  Microarchitectural Replay Attacks in
                                  Their Tracks . . . . . . . . . . . . . . 9:1--9:??
                   Yi Liang and   
              Shaokang Zeng and   
                       Lei Wang   Quantifying Resource Contention of
                                  Co-located Workloads with the
                                  System-level Entropy . . . . . . . . . . 10:1--10:??
                 Suyeon Hur and   
                Seongmin Na and   
                Dongup Kwon and   
               Joonsung Kim and   
             Andrew Boutros and   
           Eriko Nurvitadhi and   
                    Jangwoo Kim   A Fast and Flexible FPGA-based
                                  Accelerator for Natural Language
                                  Processing Neural Networks . . . . . . . 11:1--11:??
          Ashish Gondimalla and   
               Jianqiao Liu and   
        Mithuna Thottethodi and   
               T. N. Vijaykumar   Occam: Optimal Data Reuse for
                                  Convolutional Neural Networks  . . . . . 12:1--12:??
                    Bo Peng and   
                 Yaozu Dong and   
                Jianguo Yao and   
               Fengguang Wu and   
                   Haibing Guan   FlexHM: a Practical System for
                                  Heterogeneous Memory with Flexible and
                                  Efficient Performance Optimizations  . . 13:1--13:??
                Qiang Zhang and   
                     Lei Xu and   
                      Baowen Xu   RegCPython: a Register-based Python
                                  Interpreter for Better Performance . . . 14:1--14:??
                    Hai Jin and   
                    Zhuo He and   
                 Weizhong Qiang   SpecTerminator: Blocking Speculative
                                  Side Channels Based on Instruction
                                  Classes on RISC-V  . . . . . . . . . . . 15:1--15:??
                Tuowen Zhao and   
               Tobi Popoola and   
                  Mary Hall and   
     Catherine Olschanowsky and   
                Michelle Strout   Polyhedral Specification and Code
                                  Generation of Sparse Tensor Contraction
                                  with Co-iteration  . . . . . . . . . . . 16:1--16:??
            Manuela Schuler and   
           Richard Membarth and   
              Philipp Slusallek   XEngine: Optimal Tensor
                                  Rematerialization for Neural Networks in
                                  Heterogeneous Environments . . . . . . . 17:1--17:??
            Ivan Korostelev and   
     João P. L. De Carvalho and   
               José Moreira and   
             José Nelson Amaral   YaConv: Convolution with Low Cache
                                  Footprint  . . . . . . . . . . . . . . . 18:1--18:??
                Furkan Eris and   
               Marcia Louis and   
                 Kubra Eris and   
               José Abellán and   
                     Ajay Joshi   Puppeteer: a Random Forest Based Manager
                                  for Hardware Prefetchers Across the
                                  Memory Hierarchy . . . . . . . . . . . . 19:1--19:??

ACM Transactions on Architecture and Code Optimization
Volume 20, Number 2, June, 2023

         Nicolas Tollenaere and   
            Guillaume Iooss and   
            Stéphane Pouget and   
                Hugo Brunie and   
         Christophe Guillon and   
               Albert Cohen and   
              P. Sadayappan and   
               Fabrice Rastello   Autotuning Convolutions Is Easier Than
                                  You Think  . . . . . . . . . . . . . . . 20:1--20:??
               Víctor Pérez and   
               Lukas Sommer and   
            Victor Lomüller and   
         Kumudha Narasimhan and   
                     Mehdi Goli   User-driven Online Kernel Fusion for
                                  SYCL . . . . . . . . . . . . . . . . . . 21:1--21:??
         Vinicius Espindola and   
               Luciano Zago and   
              Hervé Yviquel and   
                   Guido Araujo   Source Matching and Rewriting for MLIR
                                  Using String-Based Automata  . . . . . . 22:1--22:??
                 Wenjing Ma and   
               Fangfang Liu and   
                Daokun Chen and   
                 Qinglin Lu and   
                      Yi Hu and   
               Hongsen Wang and   
                    Xinhui Yuan   An Optimized Framework for Matrix
                                  Factorization on the New Sunway
                                  Many-core Platform . . . . . . . . . . . 23:1--23:??
            Sarabjeet Singh and   
              Neelam Surana and   
             Kailash Prasad and   
              Pranjali Jain and   
               Joycee Mekie and   
                   Manu Awasthi   HyGain: High-performance,
                                  Energy-efficient Hybrid Gain Cell-based
                                  Cache Hierarchy  . . . . . . . . . . . . 24:1--24:??
     Chandra Sekhar Mummidi and   
                   Sandip Kundu   ACTION: Adaptive Cache Block Migration
                                  in Distributed Cache Architectures . . . 25:1--25:??
                 Qiaoyi Liu and   
                Jeff Setter and   
                Dillon Huff and   
            Maxwell Strange and   
              Kathleen Feng and   
              Mark Horowitz and   
             Priyanka Raina and   
               Fredrik Kjolstad   Unified Buffer: Compiling Image
                                  Processing and Machine Learning
                                  Applications to Push-Memory Accelerators 26:1--26:??
      Ahmet Caner Yüzügüler and   
             Canberk Sönmez and   
              Mario Drumond and   
                   Yunho Oh and   
              Babak Falsafi and   
                Pascal Frossard   Scale-out Systolic Arrays  . . . . . . . 27:1--27:??
        Francesco Minervini and   
              Oscar Palomar and   
                Osman Unsal and   
            Enrico Reggiani and   
              Josue Quiroga and   
               Joan Marimon and   
               Carlos Rojas and   
             Roger Figueras and   
               Abraham Ruiz and   
           Alberto Gonzalez and   
           Jonnatan Mendoza and   
                Ivan Vargas and   
            César Hernandez and   
                 Joan Cabre and   
           Lina Khoirunisya and   
           Mustapha Bouhali and   
               Julian Pavon and   
              Francesc Moll and   
             Mauro Olivieri and   
                Mario Kovac and   
                 Mate Kovac and   
                Leon Dragic and   
               Mateo Valero and   
                 Adrian Cristal   Vitruvius+: an Area-Efficient RISC-V
                                  Decoupled Vector Coprocessor for High
                                  Performance Computing Applications . . . 28:1--28:??
          Hadjer Benmeziane and   
           Hamza Ouarnoughi and   
       Kaoutar El Maghraoui and   
                     Smail Niar   Multi-objective Hardware-aware Neural
                                  Architecture Search with Pareto
                                  Rank-preserving Surrogate Models . . . . 29:1--29:??
               Dongwei Chen and   
                  Dong Tong and   
                  Chun Yang and   
               Jiangfang Yi and   
                       Xu Cheng   FlexPointer: Fast Address Translation
                                  Based on Range TLB and Tagged Pointers   30:1--30:??
                 Jingwen Du and   
                  Fang Wang and   
                   Dan Feng and   
              Changchen Gan and   
                 Yuchao Cao and   
                Xiaomin Zou and   
                         Fan Li   Fast One-Sided RDMA-Based State Machine
                                  Replication for Disaggregated Memory . . 31:1--31:??

ACM Transactions on Architecture and Code Optimization
Volume 20, Number 3, September, 2023

        Abdul Rasheed Sahni and   
                 Hamza Omar and   
                  Usman Ali and   
                      Omer Khan   ASM: an Adaptive Secure Multicore for
                                  Co-located Mutually Distrusting
                                  Processes  . . . . . . . . . . . . . . . 32:1--32:??
             Sooraj Puthoor and   
               Mikko H. Lipasti   Turn-based Spatiotemporal Coherence for
                                  GPUs . . . . . . . . . . . . . . . . . . 33:1--33:??
               Ruobing Chen and   
                 Haosen Shi and   
                 Jinping Wu and   
                   Yusen Li and   
              Xiaoguang Liu and   
                      Gang Wang   Jointly Optimizing Job Assignment and
                                  Resource Partitioning for Improving
                                  System Throughput in Cloud Datacenters   34:1--34:??
     Gokul Subramanian Ravi and   
             Tushar Krishna and   
                  Mikko Lipasti   TNT: a Modular Approach to Traversing
                                  Physically Heterogeneous NOCs at
                                  Bare-wire Latency  . . . . . . . . . . . 35:1--35:??
                  Weizhi Xu and   
                 Yintai Sun and   
                Shengyu Fan and   
                     Hui Yu and   
                         Xin Fu   Accelerating Convolutional Neural
                                  Network by Exploiting Sparsity on GPUs   36:1--36:??
                   Jin Zhao and   
                   Yu Zhang and   
                  Ligang He and   
                   Qikun Li and   
                Xiang Zhang and   
                Xinyu Jiang and   
                     Hui Yu and   
               Xiaofei Liao and   
                    Hai Jin and   
                     Lin Gu and   
                 Haikun Liu and   
               Bingsheng He and   
                   Ji Zhang and   
             Xianzheng Song and   
                   Lin Wang and   
                       Jun Zhou   GraphTune: an Efficient Dependency-Aware
                                  Substrate to Alleviate Irregularity in
                                  Concurrent Graph Processing  . . . . . . 37:1--37:??
                Yufeng Zhou and   
                Alan L. Cox and   
          Sandhya Dwarkadas and   
                   Xiaowan Dong   The Impact of Page Size and
                                  Microarchitecture on Instruction Address
                                  Translation Overhead . . . . . . . . . . 38:1--38:??
             Benjamin Reber and   
              Matthew Gould and   
        Alexander H. Kneipp and   
               Fangzhou Liu and   
                Ian Prechtl and   
                  Chen Ding and   
                Linlin Chen and   
                    Dorin Patru   Cache Programming for Scientific Loops
                                  Using Leases . . . . . . . . . . . . . . 39:1--39:??
                Xinfeng Xie and   
                    Peng Gu and   
                 Yufei Ding and   
                  Dimin Niu and   
            Hongzhong Zheng and   
                       Yuan Xie   MPU: Memory-centric SIMT Processor via
                                  In-DRAM Near-bank Computing  . . . . . . 40:1--40:??
           Alexander Krolik and   
            Clark Verbrugge and   
                 Laurie Hendren   rNdN: Fast Query Compilation for NVIDIA
                                  GPUs . . . . . . . . . . . . . . . . . . 41:1--41:??
               Jiazhi Jiang and   
               Zijian Huang and   
                  Dan Huang and   
                 Jiangsu Du and   
                   Lin Chen and   
                Ziguan Chen and   
                      Yutong Lu   Hierarchical Model Parallelism for
                                  Optimizing Inference on Many-core
                                  Processor via Decoupled $3$D-CNN
                                  Structure  . . . . . . . . . . . . . . . 42:1--42:??
                 Yuwen Zhao and   
               Fangfang Liu and   
                 Wenjing Ma and   
                 Huiyuan Li and   
               Yuanchi Peng and   
                       Cui Wang   MFFT: a GPU Accelerated Highly Efficient
                                  Mixed-Precision Large-Scale FFT
                                  Framework  . . . . . . . . . . . . . . . 43:1--43:??
       Muhammad Waqar Azhar and   
        Madhavan Manivannan and   
                  Per Stenström   Approx-RM: Reducing Energy on
                                  Heterogeneous Multicore Processors under
                                  Accuracy and Timing Constraints  . . . . 44:1--44:??
                 Dong Huang and   
                   Dan Feng and   
                Qiankun Liu and   
                    Bo Ding and   
                   Wei Zhao and   
               Xueliang Wei and   
                       Wei Tong   SplitZNS: Towards an Efficient LSM-Tree
                                  on Zoned Namespace SSDs  . . . . . . . . 45:1--45:??

ACM Transactions on Architecture and Code Optimization
Volume 20, Number 4, December, 2023

                 Jiangsu Du and   
               Jiazhi Jiang and   
                Jiang Zheng and   
              Hongbin Zhang and   
                  Dan Huang and   
                      Yutong Lu   Improving Computation and Memory
                                  Efficiency for Real-world Transformer
                                  Inference on GPUs  . . . . . . . . . . . 46:1--46:??
                    Hai Jin and   
                     Bo Lei and   
                 Haikun Liu and   
               Xiaofei Liao and   
               Zhuohui Duan and   
               Chencheng Ye and   
                       Yu Zhang   A Compilation Tool for Computation
                                  Offloading in ReRAM-based CIM
                                  Architectures  . . . . . . . . . . . . . 47:1--47:??
           Christian Menard and   
            Marten Lohstroh and   
             Soroush Bateni and   
           Matthew Chorlian and   
                Arthur Deng and   
              Peter Donovan and   
           Clément Fournier and   
                Shaokai Lin and   
              Felix Suchert and   
        Tassilo Tanneberger and   
                 Hokeun Kim and   
        Jeronimo Castrillon and   
                  Edward A. Lee   High-performance Deterministic
                                  Concurrency Using Lingua Franca  . . . . 48:1--48:??
                 Donglei Wu and   
                Weihao Yang and   
                Xiangyu Zou and   
                    Wen Xia and   
                   Shiyi Li and   
                  Zhenbo Hu and   
               Weizhe Zhang and   
                   Binxing Fang   Smart-DNN+: a Memory-efficient Neural
                                  Networks Compression Framework for the
                                  Model Inference  . . . . . . . . . . . . 49:1--49:??
Syed Salauddin Mohammad Tariq and   
               Lance Menard and   
                 Pengfei Su and   
                     Probir Roy   MicroProf: Code-level Attribution of
                                  Unnecessary Data Transfer in
                                  Microservice Applications  . . . . . . . 50:1--50:??
                   Shiyi Li and   
                  Qiang Cao and   
              Shenggang Wan and   
                    Wen Xia and   
                 Changsheng Xie   gPPM: a Generalized Matrix Operation and
                                  Parallel Algorithm to Accelerate the
                                  Encoding/Decoding Process of Erasure
                                  Codes  . . . . . . . . . . . . . . . . . 51:1--51:??
        Petros Anastasiadis and   
        Nikela Papadopoulou and   
            Georgios Goumas and   
          Nectarios Koziris and   
               Dennis Hoppe and   
                       Li Zhong   PARALiA: a Performance Aware Runtime for
                                  Auto-tuning Linear Algebra on
                                  Heterogeneous Systems  . . . . . . . . . 52:1--52:??
                     Hui Yu and   
                   Yu Zhang and   
                   Jin Zhao and   
                Yujian Liao and   
              Zhiying Huang and   
                 Donghao He and   
                     Lin Gu and   
                    Hai Jin and   
               Xiaofei Liao and   
                 Haikun Liu and   
               Bingsheng He and   
                    Jianhui Yue   RACE: an Efficient Redundancy-aware
                                  Accelerator for Dynamic Graph Neural
                                  Network  . . . . . . . . . . . . . . . . 53:1--53:??
             Victor Ferrari and   
               Rafael Sousa and   
             Marcio Pereira and   
     João P. L. De Carvalho and   
         José Nelson Amaral and   
               José Moreira and   
                   Guido Araujo   Advancing Direct Convolution Using
                                  Convolution Slicing Optimization and ISA
                                  Extensions . . . . . . . . . . . . . . . 54:1--54:??
                   Bowen He and   
                 Xiao Zheng and   
                  Yuan Chen and   
                  Weinan Li and   
                 Yajin Zhou and   
                   Xin Long and   
            Pengcheng Zhang and   
                 Xiaowei Lu and   
              Linquan Jiang and   
                  Qiang Liu and   
                 Dennis Cai and   
                  Xiantao Zhang   DxPU: Large-scale Disaggregated GPU
                                  Pools in the Datacenter  . . . . . . . . 55:1--55:??
              Shiqing Zhang and   
      Mahmood Naderan-Tahan and   
               Magnus Jahre and   
                Lieven Eeckhout   Characterizing Multi-Chip GPU Data
                                  Sharing  . . . . . . . . . . . . . . . . 56:1--56:??
                 Jens Domke and   
                 Emil Vatai and   
              Balazs Gerofi and   
              Yuetsu Kodama and   
              Mohamed Wahib and   
              Artur Podobas and   
              Sparsh Mittal and   
           Miquel Peric\`as and   
               Lingqi Zhang and   
                  Peng Chen and   
            Aleksandr Drozd and   
               Satoshi Matsuoka   At the Locus of Performance: Quantifying
                                  the Effects of Copious $3$D-Stacked
                                  Cache on HPC Workloads . . . . . . . . . 57:1--57:??
       Satya Jaswanth Badri and   
               Mukesh Saini and   
                    Neeraj Goel   Mapi-Pro: an Energy Efficient Memory
                                  Mapping Technique for Intermittent
                                  Computing  . . . . . . . . . . . . . . . 58:1--58:??
                    Miao Yu and   
             Tingting Xiang and   
Venkata Pavan Kumar Miriyala and   
              Trevor E. Carlson   Multiply-and-Fire: an Event-Driven
                                  Sparse Neural Network Accelerator  . . . 59:1--59:??
            Ziaul Choudhury and   
               Anish Gulati and   
                  Suresh Purini   FlowPix: Accelerating Image Processing
                                  Pipelines on an FPGA Overlay using a
                                  Domain Specific Compiler . . . . . . . . 60:1--60:??
           Zachary Susskind and   
                 Aman Arora and   
         Igor D. S. Miranda and   
        Alan T. L. Bacellar and   
          Luis A. Q. Villon and   
        Rafael F. Katopodis and   
       Leandro S. de Araújo and   
          Diego L. C. Dutra and   
        Priscila M. V. Lima and   
        Felipe M. G. França and   
    Mauricio Breternitz Jr. and   
                   Lizy K. John   ULEEN: a Novel Architecture for
                                  Ultra-low-energy Edge Neural Networks    61:1--61:??
                    Jia Wei and   
              Xingjun Zhang and   
             Longxiang Wang and   
                      Zheng Wei   Fastensor: Optimise the Tensor I/O Path
                                  from SSD to GPU for Deep Learning
                                  Training . . . . . . . . . . . . . . . . 62:1--62:??

ACM Transactions on Architecture and Code Optimization
Volume 21, Number 1, March, 2024

                Longfei Luo and   
                 Dingcui Yu and   
                    Yina Lv and   
                      Liang Shi   Critical Data Backup with Hybrid
                                  Flash-Based Consumer Devices . . . . . . 1:1--1:??
                  Peng Chen and   
                   Hui Chen and   
                Weichen Liu and   
                 Linbo Long and   
                Wanli Chang and   
                       Nan Guan   DAG-Order: an Order-Based Dynamic DAG
                                  Scheduling for Real-Time
                                  Networks-on-Chip . . . . . . . . . . . . 2:1--2:??
                Zhang Jiang and   
                  Ying Chen and   
                Xiaoli Gong and   
                  Jin Zhang and   
                Wenwen Wang and   
                  Pen-Chung Yew   JiuJITsu: Removing Gadgets with Safe
                                  Register Allocation for JIT Code
                                  Generation . . . . . . . . . . . . . . . 3:1--3:??
                Hayfa Tayeb and   
            Ludovic Paillat and   
                Bérenger Bramas   Autovesk: Automatic Vectorized Code
                                  Generation from Unstructured Static
                                  Kernels Using Graph Transformations  . . 4:1--4:??
               Xueying Wang and   
                 Guangli Li and   
                   Zhen Jia and   
              Xiaobing Feng and   
                      Yida Wang   Fast Convolution Meets Low Precision:
                                  Exploring Efficient Quantized Winograd
                                  Convolution on Modern CPUs . . . . . . . 5:1--5:??
                    Hao Fan and   
                 Yiliang Ye and   
              Shadi Ibrahim and   
                 Zhuo Huang and   
                  Xingru Li and   
                 Weibin Xue and   
                    Song Wu and   
                    Chen Yu and   
                Xuanhua Shi and   
                        Hai Jin   QoS-pro: a QoS-enhanced Transaction
                                  Processing Framework for Shared SSDs . . 6:1--6:??
               Yunping Zhao and   
                   Sheng Ma and   
                   Heng Liu and   
                 Libo Huang and   
                         Yi Dai   SAC: an Ultra-Efficient Spin-based
                                  Architecture for Compressed DNNs . . . . 7:1--7:??
                Tong-Yu Liu and   
                Jianmei Guo and   
                       Bo Huang   Efficient Cross-platform Multiplexing of
                                  Hardware Performance Counters via
                                  Adaptive Grouping  . . . . . . . . . . . 8:1--8:??
                    Lei Liu and   
                    Xinglei Dou   QuCloud+: a Holistic Qubit Mapping
                                  Scheme for Single/Multi-programming on
                                  $2$D/$3$D NISQ Quantum Computers . . . . 9:1--9:??
                  Lingxi Wu and   
               Minxuan Zhou and   
                 Weihong Xu and   
              Ashish Venkat and   
              Tajana Rosing and   
                  Kevin Skadron   Abakus: Accelerating $k$-mer Counting
                                  with Storage Technology  . . . . . . . . 10:1--10:??
               Seokwon Kang and   
                Jongbin Kim and   
             Gyeongyong Lee and   
             Jeongmyung Lee and   
                  Jiwon Seo and   
              Hyungsoo Jung and   
               Yong Ho Song and   
                   Yongjun Park   ISP Agent: a Generalized
                                  In-storage-processing Workload
                                  Offloading Framework by Providing
                                  Multiple Optimization Opportunities  . . 11:1--11:??
             Prasoon Mishra and   
           V. Krishna Nandivada   COWS for High Performance: Cost Aware
                                  Work Stealing for Irregular Parallel
                                  Loop . . . . . . . . . . . . . . . . . . 12:1--12:??
               Joongun Park and   
              Seunghyo Kang and   
              Sanghyeon Lee and   
                Taehoon Kim and   
                Jongse Park and   
              Youngjin Kwon and   
                    Jaehyuk Huh   Hardware-hardened Sandbox Enclaves for
                                  Trusted Serverless Computing . . . . . . 13:1--13:??
                Tyler Allen and   
             Bennett Cooper and   
                        Rong Ge   Fine-grain Quantitative Analysis of
                                  Demand Paging in Unified Virtual Memory  14:1--14:??
              Zhonghua Wang and   
                 Yixing Guo and   
                     Kai Lu and   
                Jiguang Wan and   
                Daohui Wang and   
                   Ting Yao and   
                      Huatao Wu   Rcmp: Reconstructing RDMA-Based Memory
                                  Disaggregation via CXL . . . . . . . . . 15:1--15:??
                 Linbo Long and   
                Shuiyong He and   
             Jingcheng Shen and   
                Renping Liu and   
                Zhenhua Tan and   
               Congming Gao and   
                    Duo Liu and   
                  Kan Zhong and   
                       Yi Jiang   WA-Zone: Wear-Aware Zone Management
                                  Optimization for LSM-Tree on ZNS SSDs    16:1--16:??
                 Zhihua Fan and   
                 Wenming Li and   
                  Zhen Wang and   
                    Yu Yang and   
                Xiaochun Ye and   
                Dongrui Fan and   
                Ninghui Sun and   
                      Xuejun An   Improving Utilization of Dataflow Unit
                                  for Multi-Batch Processing . . . . . . . 17:1--17:??
                Dunbo Zhang and   
               Qingjie Lang and   
                 Ruoxi Wang and   
                        Li Shen   Extension VM: Interleaved Data Layout in
                                  Vector Memory  . . . . . . . . . . . . . 18:1--18:??
                Can Firtina and   
             Kamlesh Pillai and   
          Gurpreet S. Kalsi and   
          Bharathwaj Suresh and   
           Damla Senol Cali and   
             Jeremie S. Kim and   
             Taha Shahroodi and   
         Meryem Banu Cavlak and   
             Joël Lindegger and   
             Mohammed Alser and   
            Juan Gómez Luna and   
       Sreenivas Subramoney and   
                     Onur Mutlu   ApHMM: Accelerating Profile Hidden
                                  Markov Models for Fast and
                                  Energy-efficient Genome Analysis . . . . 19:1--19:??
               Khalid Ahmad and   
                 Cris Cecka and   
            Michael Garland and   
                      Mary Hall   Exploring Data Layout for Sparse Tensor
                                  Times Dense Matrix on GPUs . . . . . . . 20:1--20:??

ACM Transactions on Architecture and Code Optimization
Volume 21, Number 2, June, 2024

     Chandra Sekhar Mummidi and   
         Victor C. Ferreira and   
       Sudarshan Srinivasan and   
                   Sandip Kundu   Highly Efficient Self-checking Matrix
                                  Multiplication on Tiled AMX Accelerators 21:1--21:??
              Zhonghua Wang and   
                  Chen Ding and   
             Fengguang Song and   
                     Kai Lu and   
                Jiguang Wan and   
                  Zhihu Tan and   
             Changsheng Xie and   
                     Guokuan Li   WIPE: a Write-Optimized Learned Index
                                  for Persistent Memory  . . . . . . . . . 22:1--22:??
             Gino A. Chacon and   
           Charles Williams and   
            Johann Knechtel and   
            Ozgur Sinanoglu and   
              Paul V. Gratz and   
                Vassos Soteriou   Coherence Attacks and Countermeasures in
                                  Interposer-based Chiplet Systems . . . . 23:1--23:??
                    Yan Wei and   
                  Zhang Xingjun   A Concise Concurrent B+-Tree for
                                  Persistent Memory  . . . . . . . . . . . 24:1--24:??
            Fareed Qararyah and   
       Muhammad Waqar Azhar and   
                 Pedro Trancoso   An Efficient Hybrid Deep Learning
                                  Accelerator for Compact and
                                  Heterogeneous CNNs . . . . . . . . . . . 25:1--25:??
Fernando Fernandes Dos Santos and   
                Luigi Carro and   
               Flavio Vella and   
                     Paolo Rech   Assessing the Impact of Compiler
                                  Optimizations on GPUs Reliability  . . . 26:1--26:??
   Valentin Isaac-Chassande and   
               Adrian Evans and   
                Yves Durand and   
              Frédéric Rousseau   Dedicated Hardware Accelerators for
                                  Processing of Sparse Matrices and
                                  Vectors: a Survey  . . . . . . . . . . . 27:1--27:??
                  Benyi Xie and   
                    Yue Yan and   
               Chenghao Yan and   
                Sicheng Tao and   
         Zhuangzhuang Zhang and   
                   Xinyu Li and   
                 Yanzhi Lan and   
                   Xiang Wu and   
                 Tianyi Liu and   
             Tingting Zhang and   
                    Fuxin Zhang   An Instruction Inflation Analyzing
                                  Framework for Dynamic Binary Translators 28:1--28:??
                 Samuel Rac and   
                  Mats Brorsson   Cost-aware Service Placement and
                                  Scheduling in the Edge-Cloud Continuum   29:1--29:??
                   Feng Xue and   
                 Chenji Han and   
                   Xinyu Li and   
                Junliang Wu and   
             Tingting Zhang and   
                 Tianyi Liu and   
                  Yifan Hao and   
                  Zidong Du and   
                     Qi Guo and   
                    Fuxin Zhang   Tyche: an Efficient and General
                                  Prefetcher for Indirect Memory Accesses  30:1--30:??
                Kunpeng Xie and   
                      Ye Lu and   
                   Xinyu He and   
                   Dezhi Yi and   
               Huijuan Dong and   
                       Yao Chen   Winols: a Large-Tiling Sparse Winograd
                                  CNN Accelerator on FPGAs . . . . . . . . 31:1--31:??
                     Ke Liu and   
                     Kan Wu and   
                   Hua Wang and   
                    Ke Zhou and   
                  Peng Wang and   
                   Ji Zhang and   
                        Cong Li   SLAP: Segmented Reuse-Time-Label Based
                                  Admission Policy for Content Delivery
                                  Network Caching  . . . . . . . . . . . . 32:1--32:??
        Panagiotis Miliadis and   
    Dimitris Theodoropoulos and   
    Dionisios Pnevmatikatos and   
              Nectarios Koziris   Architectural Support for Sharing,
                                  Isolating and Virtualizing FPGA
                                  Resources  . . . . . . . . . . . . . . . 33:1--33:??
                  Haitao Du and   
                  Yuhan Qin and   
                  Song Chen and   
                        Yi Kang   FASA-DRAM: Reducing DRAM Latency with
                                  Destructive Activation and Delayed
                                  Restoration  . . . . . . . . . . . . . . 34:1--34:??
           Michael Canesche and   
          Vanderson Rosário and   
                Edson Borin and   
       Fernando Quintão Pereira   The Droplet Search Algorithm for Kernel
                                  Scheduling . . . . . . . . . . . . . . . 35:1--35:??
                 Asmita Pal and   
            Keerthana Desai and   
           Rahul Chatterjee and   
              Joshua San Miguel   Camouflage: Utility-Aware Obfuscation
                                  for Accurate Simulation of Sensitive
                                  Program Traces . . . . . . . . . . . . . 36:1--36:??
             Chengying Huan and   
               Yongchao Liu and   
                 Heng Zhang and   
              Shuaiwen Song and   
             Santosh Pandey and   
               Shiyang Chen and   
              Xiangfei Fang and   
                    Yue Jin and   
            Baptiste Lepers and   
                  Yanjun Wu and   
                       Hang Liu   TEA+: a Novel Temporal Graph Random Walk
                                  Engine with Hybrid Storage Architecture  37:1--37:??
               Soojin Hwang and   
              Daehyeon Baek and   
                Jongse Park and   
                    Jaehyuk Huh   Cerberus: Triple Mode Acceleration of
                                  Sparse Matrix and Vector Multiplication  38:1--38:??
Siddhartha Raman Sundara Raman and   
                  Lizy John and   
            Jaydeep P. Kulkarni   NEM-GNN: DAC/ADC-less, Scalable,
                                  Reconfigurable, Graph and Sparsity-Aware
                                  Near-Memory Accelerator for Graph Neural
                                  Networks . . . . . . . . . . . . . . . . 39:1--39:??
                   Yan Chen and   
                   Qiwen Ke and   
                   Huiba Li and   
                 Yongwei Wu and   
                   Yiming Zhang   xMeta: SSD-HDD-hybrid Optimization for
                                  Metadata Maintenance of Cloud-scale
                                  Object Storage . . . . . . . . . . . . . 40:1--40:??
             Vidush Singhal and   
                Laith Sakka and   
   Kirshanthan Sundararajah and   
                Ryan Newton and   
                Milind Kulkarni   Orchard: Heterogeneous Parallelism and
                                  Fine-grained Fusion for Complex Tree
                                  Traversals . . . . . . . . . . . . . . . 41:1--41:??

ACM Transactions on Architecture and Code Optimization
Volume 21, Number 3, September, 2024

             Hajar Falahati and   
       Mohammad Sadrosadati and   
                  Qiumin Xu and   
            Juan Gómez-Luna and   
   Banafsheh Saber Latibari and   
                Hyeran Jeon and   
            Shaahin Hesaabi and   
         Hamid Sarbazi-Azad and   
                 Onur Mutlu and   
           Murali Annavaram and   
                  Masoud Pedram   Cross-core Data Sharing for
                                  Energy-efficient GPUs  . . . . . . . . . 42:1--42:??
              Ching-Jui Lee and   
                  Tsung Tai Yeh   ReSA: Reconfigurable Systolic Array for
                                  Multiple Tiny DNN Tensors  . . . . . . . 43:1--43:??
                Ziheng Wang and   
               Xiaoshe Dong and   
                   Yan Kang and   
                  Heng Chen and   
                     Qiang Wang   An Example of Parallel Merkle Tree
                                  Traversal: Post-Quantum Leighton--Micali
                                  Signature on the GPU . . . . . . . . . . 44:1--44:??
                   Jiang Wu and   
                 Zhuo Zhang and   
                Deheng Yang and   
                 Jianjun Xu and   
                   Jiayu He and   
                  Xiaoguang Mao   Knowledge-Augmented Mutation-Based Bug
                                  Localization for Hardware Design Code    45:1--45:??
                  Chen Ding and   
                  Jian Zhou and   
                     Kai Lu and   
                   Sicen Li and   
                Yiqin Xiong and   
                Jiguang Wan and   
                      Ling Zhan   D$^2$Comp: Efficient Offload of LSM-tree
                                  Compaction with Data Processing Units on
                                  Disaggregated Storage  . . . . . . . . . 46:1--46:??
               Zhuohao Wang and   
                    Lei Liu and   
                     Limin Xiao   iSwap: a New Memory Page Swap Mechanism
                                  for Reducing Ineffective I/O Operations
                                  in Cloud Environments  . . . . . . . . . 47:1--47:??
              Junkaixuan Li and   
                        Yi Kang   GraphSER: Distance-Aware Stream-Based
                                  Edge Repartition for Many-Core Systems   48:1--48:??
                      Ke Wu and   
                 Dezun Dong and   
                      Weixia Xu   COER: a Network Interface Offloading
                                  Architecture for RDMA and Congestion
                                  Control Protocol Codesign  . . . . . . . 49:1--49:??
                 Qunyou Liu and   
               Darong Huang and   
               Luis Costero and   
             Marina Zapater and   
                  David Atienza   Intermediate Address Space: virtual
                                  memory optimization of heterogeneous
                                  architectures for cache-resident
                                  workloads  . . . . . . . . . . . . . . . 50:1--50:??
               Dongmoon Min and   
                Ilkwon Byun and   
              Gyu-Hyeon Lee and   
                    Jangwoo Kim   CoolDC: a Cost-Effective
                                  Immersion-Cooled Datacenter with
                                  Workload-Aware Temperature Scaling . . . 51:1--51:??
                   Hai Zhou and   
                       Dan Feng   Stripe-schedule Aware Repair in
                                  Erasure-coded Clusters with
                                  Heterogeneous Star Networks  . . . . . . 52:1--52:??
                 Bobin Deng and   
          Bhargava Nadendla and   
                    Kun Suo and   
                  Yixin Xie and   
               Dan Chia-Tien Lo   Fixed-point Encoding and Architecture
                                  Exploration for Residue Number Systems   53:1--53:??
                Yizhuo Wang and   
               Fangli Chang and   
                Bingxin Wei and   
                Jianhua Gao and   
                     Weixing Ji   Optimization of Sparse Matrix
                                  Computation for Algebraic Multigrid on
                                  GPUs . . . . . . . . . . . . . . . . . . 54:1--54:??
                Luming Wang and   
                   Xu Zhang and   
               Songyue Wang and   
              Zhuolun Jiang and   
                 Tianyue Lu and   
                Mingyu Chen and   
                  Siwei Luo and   
                     Keji Huang   Asynchronous Memory Access Unit:
                                  Exploiting Massive Parallelism for Far
                                  Memory Access  . . . . . . . . . . . . . 55:1--55:??
               Yunping Zhao and   
                   Sheng Ma and   
                Hengzhu Liu and   
                   Dongsheng Li   SAL: Optimizing the Dataflow of
                                  Spin-based Architectures for Lightweight
                                  Neural Networks  . . . . . . . . . . . . 56:1--56:??
                     Kai Lu and   
                  Siqi Zhao and   
               Haikang Shan and   
                  Qiang Wei and   
                 Guokuan Li and   
                Jiguang Wan and   
                   Ting Yao and   
                  Huatao Wu and   
                    Daohui Wang   Scythe: a Low-latency RDMA-enabled
                                  Distributed Transaction System for
                                  Disaggregated Memory . . . . . . . . . . 57:1--57:??
                Wangqi Peng and   
                   Yusen Li and   
              Xiaoguang Liu and   
                      Gang Wang   Lavender: an Efficient Resource
                                  Partitioning Framework for Large-Scale
                                  Job Colocation . . . . . . . . . . . . . 58:1--58:??
                 Feng Zhang and   
                  Fulin Nan and   
                  Binbin Xu and   
               Zhirong Shen and   
                Jiebin Zhai and   
             Dmitrii Kaplun and   
                       Jiwu Shu   Achieving Tunable Erasure Coding with
                                  Cluster-Aware Redundancy Transitioning   59:1--59:??
              Ataberk Olgun and   
           F. Nisa Bostanci and   
Geraldo Francisco de Oliveira Junior and   
           Yahya Can Tugrul and   
                 Rahul Bera and   
    Abdullah Giray Yaglikci and   
               Hasan Hassan and   
                 Oguz Ergin and   
                     Onur Mutlu   Sectored DRAM: a Practical
                                  Energy-Efficient and High-Performance
                                  Fine-Grained DRAM Architecture . . . . . 60:1--60:??
                Xiaohui Wei and   
              Chenyang Wang and   
               Hengshan Yue and   
             Jingweijia Tan and   
                  Zeyu Guan and   
                  Nan Jiang and   
              Xinyang Zheng and   
              Jianpeng Zhao and   
                    Meikang Qiu   ReIPE: Recycling Idle PEs in CNN
                                  Accelerator for Vulnerable Filters
                                  Soft-Error Detection . . . . . . . . . . 61:1--61:??
                    Qiao Li and   
                    Yu Chen and   
                  Guanyu Wu and   
                  Yajuan Du and   
                     Min Ye and   
                Xinbiao Gan and   
                  Jie Zhang and   
               Zhirong Shen and   
                   Jiwu Shu and   
                       Chun Xue   Characterizing and Optimizing LDPC
                                  Performance on 3D NAND Flash Memories    62:1--62:??
                 Jiahong Xu and   
                 Haikun Liu and   
               Zhuohui Duan and   
               Xiaofei Liao and   
                    Hai Jin and   
              Xiaokang Yang and   
                   Huize Li and   
                   Cong Liu and   
                 Fubing Mao and   
                       Yu Zhang   ReHarvest: an ADC Resource-Harvesting
                                  Crossbar Architecture for ReRAM-Based
                                  DNN Accelerators . . . . . . . . . . . . 63:1--63:??
                   Jiang Wu and   
                 Zhuo Zhang and   
                Deheng Yang and   
                 Jianjun Xu and   
                   Jiayu He and   
                  Xiaoguang Mao   Time-Aware Spectrum-Based Bug
                                  Localization for Hardware Design Code
                                  with Data Purification . . . . . . . . . 64:1--64:??

ACM Transactions on Architecture and Code Optimization
Volume 21, Number 4, December, 2024

               Zhuoran Song and   
                Zhongkai Yu and   
                Xinkai Song and   
                  Yifan Hao and   
                   Li Jiang and   
               Naifeng Jing and   
                  Xiaoyao Liang   Environmental Condition Aware
                                  Super-Resolution Acceleration Framework
                                  in Server--Client Hierarchies  . . . . . 65:1--65:??
           Georgia Antoniou and   
           Davide Bartolini and   
                Haris Volos and   
          Marios Kleanthous and   
                   Zhe Wang and   
     Kleovoulos Kalaitzidis and   
                 Tom Rollet and   
                   Ziwei Li and   
                 Onur Mutlu and   
         Yiannakis Sazeides and   
                Jawad Haj Yahya   Agile C-states: a Core C-state
                                  Architecture for Latency Critical
                                  Applications Optimizing both Transition
                                  and Cold-Start Latency . . . . . . . . . 66:1--66:??
                Xinbiao Gan and   
                  Tiejun Li and   
                 Feng Xiong and   
                    Bo Yang and   
                Xinhai Chen and   
                Chunye Gong and   
                  Shijie Li and   
                     Kai Lu and   
                    Qiao Li and   
                   Yiming Zhang   MST: Topology-Aware Message Aggregation
                                  for Exascale Graph Processing of
                                  Traversal-Centric Algorithms . . . . . . 67:1--67:??
                  Yujie Cui and   
                   Wei Chen and   
                   Xu Cheng and   
                   Jiangfang Yi   Hyperion: a Highly Effective Page and PC
                                  Based Delta Prefetcher . . . . . . . . . 68:1--68:??
                Jianhua Gao and   
                 Weixing Ji and   
                    Yizhuo Wang   Optimization of Large-Scale Sparse
                                  Matrix--Vector Multiplication on
                                  Multi-GPU Systems  . . . . . . . . . . . 69:1--69:??
               Zhengding Hu and   
                Jingwei Sun and   
               Zhongyang Li and   
                 Guangzhong Sun   AG-SpTRSV: an Automatic Framework to
                                  Optimize Sparse Triangular Solve on GPUs 70:1--70:??
                Wenbo Zhang and   
                   Yiqi Liu and   
               Tianhao Zang and   
                   Zhenshan Bao   EA4RCA: Efficient AIE accelerator design
                                  framework for regular
                                  Communication-Avoiding Algorithm . . . . 71:1--71:??
            Arun Thangamani and   
           Vincent Loechner and   
                Stéphane Genaud   A Survey of General-purpose Polyhedral
                                  Compilers  . . . . . . . . . . . . . . . 72:1--72:??
                Junqing Lin and   
                Jingwei Sun and   
               Xiaolong Shi and   
               Honghe Zhang and   
                 Xianzhi Yu and   
                Xinzhi Wang and   
                    Jun Yao and   
                 Guangzhong Sun   LO-SpMM: Low-cost Search for
                                  High-performance SpMM Kernels on GPUs    73:1--73:??
               Chenglong Yi and   
                Jintong Liu and   
              Shenggang Wan and   
                Juntao Fang and   
                    Bin Sun and   
                  Liqiang Zhang   Data Deduplication Based on Content
                                  Locality of Transactions to Enhance
                                  Blockchain Scalability . . . . . . . . . 74:1--74:??
        Joshua Dennis Booth and   
                   Phillip Lane   A NUMA-Aware Version of an Adaptive
                                  Self-Scheduling Loop Scheduler . . . . . 75:1--75:??
                    Yu Tang and   
                    Qiao Li and   
                  Lujia Yin and   
               Dongsheng Li and   
               Yiming Zhang and   
                Chenyu Wang and   
            Xingcheng Zhang and   
                 Linbo Qiao and   
             Zhaoning Zhang and   
                         Kai Lu   DELTA: Memory-Efficient Training via
                                  Dynamic Fine-Grained Recomputation and
                                  Swapping . . . . . . . . . . . . . . . . 76:1--76:??
                Zhenhua Tan and   
                 Linbo Long and   
             Jingcheng Shen and   
                Renping Liu and   
               Congming Gao and   
                  Kan Zhong and   
                       Yi Jiang   Optimizing Garbage Collection for ZNS
                                  SSDs via In-storage Data Migration and
                                  Address Remapping  . . . . . . . . . . . 77:1--77:??
                   Xiang Li and   
                Qiong Chang and   
                 Aolong Zha and   
               Shijie Chang and   
                     Yun Li and   
                   Jun Miyazaki   An Optimized GPU Implementation for GIST
                                  Descriptor . . . . . . . . . . . . . . . 78:1--78:??
                  Xiaobo Lu and   
               Jianbin Fang and   
                   Lin Peng and   
                 Chun Huang and   
                  Zidong Du and   
               Yongwei Zhao and   
                     Zheng Wang   Mentor: a Memory-Efficient Sparse-dense
                                  Matrix Multiplication Accelerator Based
                                  on Column-Wise Product . . . . . . . . . 79:1--79:??
                    Yu Feng and   
                 Weikai Lin and   
                  Zihan Liu and   
               Jingwen Leng and   
                  Minyi Guo and   
                   Han Zhao and   
               Xiaofeng Hou and   
                 Jieru Zhao and   
                      Yuhao Zhu   Potamoi: Accelerating Neural Rendering
                                  via a Unified Streaming Architecture . . 80:1--80:??
                Changxi Liu and   
                  Alen Sabu and   
         Akanksha Chaudhari and   
              Qingxuan Kang and   
              Trevor E. Carlson   Pac-Sim: Simulation of Multi-threaded
                                  Workloads using Intelligent, Live
                                  Sampling . . . . . . . . . . . . . . . . 81:1--81:??
               Saurabh Raje and   
                   Yufan Xu and   
             Atanas Rountev and   
           Edward F. Valeev and   
                  P. Sadayappan   CoNST: Code Generator for Sparse Tensor
                                  Networks . . . . . . . . . . . . . . . . 82:1--82:??
                 Danlin Jia and   
                  Geng Yuan and   
                 Yiming Xie and   
                    Xue Lin and   
                    Ningfang Mi   A Data-Loader Tunable Knob to Shorten
                                  GPU Idleness for Distributed Deep
                                  Learning . . . . . . . . . . . . . . . . 83:1--83:??
                Shaobu Wang and   
             Guangyan Zhang and   
                  Junyu Wei and   
                  Yang Wang and   
                Jiesheng Wu and   
                   Qingchao Luo   Understanding Silent Data Corruption in
                                  Processors for Mitigating its Effects    84:1--84:??
                  Yen-Yu Lu and   
              Chin-Hsien Wu and   
                Shih-Jen Li and   
              Cheng-Tze Lee and   
                   Cheng-Yen Wu   A Stable Idle Time Detection Platform
                                  for Real I/O Workloads . . . . . . . . . 85:1--85:??
                 Lingyu Sun and   
               Xiaofeng Hou and   
                    Chao Li and   
               Jiacheng Liu and   
                Xinkai Wang and   
                  Quan Chen and   
                      Minyi Guo   $ A^2 $: Towards Accelerator Level
                                  Parallelism for Autonomous Micromobility
                                  Systems  . . . . . . . . . . . . . . . . 86:1--86:??
             Manojna Sistla and   
                 Yiding Liu and   
                         Xin Fu   Towards High Performance QNNs via
                                  Distribution-Based CNOT Gate Reduction   87:1--87:??
                 Fubing Mao and   
                     Xu Liu and   
                   Yu Zhang and   
                 Haikun Liu and   
               Xiaofei Liao and   
                    Hai Jin and   
                  Wei Zhang and   
                  Jian Zhou and   
                   Yufei Wu and   
                 Longyu Nie and   
                   Yapu Guo and   
                Zihan Jiang and   
                   Jingkang Liu   PMGraph: Accelerating Concurrent Graph
                                  Queries over Streaming Graphs  . . . . . 88:1--88:??
                 Wentong Li and   
                    Yina Lv and   
                Longfei Luo and   
               Yunpeng Song and   
                      Liang Shi   Access Characteristic-Guided Remote
                                  Swapping Across Mobile Devices . . . . . 89:1--89:??
                Yinan Zhang and   
                  Shun Yang and   
                   Huiqi Hu and   
            Chengcheng Yang and   
                   Peng Cai and   
                      Xuan Zhou   SuccinctKV: a CPU-efficient LSM-tree
                                  Based KV Store with Scan-based
                                  Compaction . . . . . . . . . . . . . . . 90:1--90:??
                  Siyuan Ma and   
            Kaustubh Mhatre and   
                  Jian Weng and   
           Bagus Hanindhito and   
             Zhengrong Wang and   
              Tony Nowatzki and   
                  Lizy John and   
                     Aman Arora   PIMSAB: a Processing-In-Memory System
                                  with Spatially-Aware Communication and
                                  Bit-Serial-Aware Computation . . . . . . 91:1--91:??

ACM Transactions on Architecture and Code Optimization
Volume 22, Number 1, March, 2025

               Perry Gibson and   
                  Jose Cano and   
             Elliot Crowley and   
               Amos Storkey and   
                Michael O'Boyle   DLAS: a Conceptual Model for
                                  Across-Stack Deep Learning Acceleration  1:1--1:??
                    Xinbiao Gan   GraphService: Topology-aware Constructor
                                  for Large-scale Graph Applications . . . 2:1--2:??
               Renjun Zhang and   
             Tianming Zhang and   
                  Zinuo Cai and   
                 Dongmei Li and   
                   Ruhui Ma and   
                 Rajkumar Buyya   MemoriaNova: Optimizing Memory-Aware
                                  Model Inference for Edge Computing . . . 3:1--3:??
              Andrea Lepori and   
         Alexandru Calotoiu and   
                Torsten Hoefler   Iterating Pointers: Enabling Static
                                  Analysis for Loop-based Pointers . . . . 4:1--4:??
             Viktor Razilov and   
                 Ipek Gecin and   
                 Emil Matús and   
               Gerhard Fettweis   Conflict Management in Vector Register
                                  Files  . . . . . . . . . . . . . . . . . 5:1--5:??
                  Jingle Xu and   
                   Jiayu Fu and   
                    Lin Gan and   
               Yaojian Chen and   
                 Zhaoqi Sun and   
             Zhenchun Huang and   
                  Guangwen Yang   Leveraging the Hardware Resources to
                                  Accelerate cryo-EM Reconstruction of
                                  RELION on the New Sunway Supercomputer   6:1--6:??
                 Yuta Saito and   
          Kazunori Sakamoto and   
         Hironori Washizaki and   
              Yoshiaki Fukazawa   Multiple Function Merging for Code Size
                                  Reduction  . . . . . . . . . . . . . . . 7:1--7:??
               Peihua Zhang and   
               Chenggang Wu and   
                  Hanzhi Hu and   
                 Lichen Jia and   
               Mingfan Peng and   
                   Jiali Xu and   
                Mengyao Xie and   
               Yuanming Lai and   
                   Yan Kang and   
                       Zhe Wang   Shining Light on the Inter-procedural
                                  Code Obfuscation: Keep Pace with
                                  Progress in Binary Diffing . . . . . . . 8:1--8:??
                 Dengke Han and   
                 Mingyu Yan and   
                Xiaochun Ye and   
                    Dongrui Fan   Characterizing and Understanding HGNN
                                  Training on GPUs . . . . . . . . . . . . 9:1--9:??
                Jingyu Wang and   
                 Ruilong Ma and   
                 Xiang Yang and   
                      Qi Qi and   
               Zirui Zhuang and   
                  Jing Wang and   
               Jianxin Liao and   
                       Song Guo   DeepZoning: Re-accelerate CNN Inference
                                  with Zoning Graph for Heterogeneous Edge
                                  Cluster  . . . . . . . . . . . . . . . . 10:1--10:??
            Chenghao Ouyang and   
                 Jinhan Xin and   
                  Siqi Zeng and   
                  Guohui Li and   
                 Jianjun Li and   
                      Zhibin Yu   Constructing a Supplementary Benchmark
                                  Suite to Represent Android Applications
                                  with User Interactions by using
                                  Performance Counters . . . . . . . . . . 11:1--11:??
                Xinglei Dou and   
                    Lei Liu and   
                     Limin Xiao   An Intelligent Scheduling Approach on
                                  Mobile OS for Optimizing UI Smoothness
                                  and Power  . . . . . . . . . . . . . . . 12:1--12:??
             Kwanghoon Choi and   
                  Igjae Kim and   
                  Sunho Lee and   
                    Jaehyuk Huh   ShieldCXL: a Practical Obliviousness
                                  Support with Sealed CXL Memory . . . . . 13:1--13:??
                   Yun Chen and   
              Ali Hajiabadi and   
            Romain Poussier and   
             Yaswanth Tavva and   
          Andreas Diavastos and   
              Shivam Bhasin and   
              Trevor E. Carlson   PARADISE: Criticality-Aware Instruction
                                  Reordering for Power Attack Resistance   14:1--14:??
                Chunfeng Li and   
                   Feng Shi and   
                    Fei Yin and   
              Karim Soliman and   
                        Jin Wei   A High Scalability Memory NoC with
                                  Shared-Inside Hierarchical-Groupings for
                                  Triplet-Based Many-Core Architecture . . 15:1--15:??
                   Jin Zhao and   
                   Yu Zhang and   
                 Donghao He and   
                   Qikun Li and   
                Weihang Yin and   
                     Hui Yu and   
                     Hao Qi and   
               Xiaofei Liao and   
                    Hai Jin and   
                 Haikun Liu and   
                 Linchen Yu and   
                     Zhang Zhan   An Efficient ReRAM-based Accelerator for
                                  Asynchronous Iterative Graph Processing  16:1--16:??
                   Xinyu Li and   
               Guangyao Guo and   
                 Yanzhi Lan and   
                   Feng Xue and   
                 Chenji Han and   
                    Gen Niu and   
                    Fuxin Zhang   Tiaozhuan: a General and Efficient
                                  Indirect Branch Optimization for Binary
                                  Translation  . . . . . . . . . . . . . . 17:1--17:??
                Jianhua Gao and   
                 Zeming Liu and   
                Yizhuo Wang and   
                     Weixing Ji   RaNAS: Resource-Aware Neural
                                  Architecture Search for Edge Computing   18:1--18:??
               Adnan Hasnat and   
                   Shoaib Akram   SPIRIT: Scalable and Persistent
                                  In-Memory Indices for Real-Time Search   19:1--19:??
                Dezhong Yao and   
                 Sifan Zhao and   
               Tongtong Liu and   
                    Gang Wu and   
                        Hai Jin   ApSpGEMM: Accelerating Large-scale
                                  SpGEMM with Heterogeneous Collaboration
                                  and Adaptive Panel . . . . . . . . . . . 20:1--20:??
                Weiduo Chen and   
               Xiaoshe Dong and   
                  Fan Zhang and   
                   Bowen Li and   
                 Yufei Wang and   
                     Qiang Wang   ATP: Achieving Throughput Peak for DNN
                                  Training via Smart GPU Memory Management 21:1--21:??
               Zhuoran Song and   
                Jiabei Long and   
                   Li Jiang and   
               Naifeng Jing and   
                  Xiaoyao Liang   GCNTrain+: a Versatile and Efficient
                                  Accelerator for Graph Convolutional
                                  Neural Network Training  . . . . . . . . 22:1--22:??
                  Wenjie Qi and   
                Zhipeng Tan and   
                Ziyue Zhang and   
                  Ying Yuan and   
                       Dan Feng   exZNS: Extending Zoned Namespace to
                                  Support Byte-loggable Zones  . . . . . . 23:1--23:??
                 Long Zheng and   
                   Bing Zhu and   
              Pengcheng Yao and   
                Yuhang Zhou and   
                Chengao Pan and   
                 Wenju Zhao and   
               Xiaofei Liao and   
                    Hai Jin and   
                   Jingling Xue   PRAGA: a Priority-Aware
                                  Hardware/Software Co-design for
                                  High-Throughput Graph Processing
                                  Acceleration . . . . . . . . . . . . . . 24:1--24:??
             Yingshuai Dong and   
               Chencheng Ye and   
                 Haikun Liu and   
                Liting Tang and   
               Xiaofei Liao and   
                    Hai Jin and   
                 Cheng Chen and   
                Yanjiang Li and   
                        Yi Wang   DTAP: Accelerating Strongly-Typed
                                  Programs with Data Type-Aware Hardware
                                  Prefetching  . . . . . . . . . . . . . . 25:1--25:??
               Xueliang Wei and   
                   Dan Feng and   
                   Wei Tong and   
                    Bing Wu and   
                       Xu Jiang   COVER: Alleviating Crash-Consistency
                                  Error Amplification in Secure Persistent
                                  Memory Systems . . . . . . . . . . . . . 26:1--26:??
                 Xinqi Chen and   
                    Erci Xu and   
                 Dengyao Mo and   
                 Ruiming Lu and   
                  Haonan Wu and   
                  Dian Ding and   
                   Guangtao Xue   MasterPlan: a Reinforcement Learning
                                  Based Scheduler for Archive Storage  . . 27:1--27:??
       Brandon Kammerdiener and   
          J. Zach McMichael and   
              Michael Jantz and   
              Kshitij Doshi and   
                    Terry Jones   Flexible and Effective Object Tiering
                                  for Heterogeneous Memory Systems . . . . 28:1--28:??
              Zhiqiang Chen and   
               Yongwen Wang and   
               Hongwei Zhou and   
                     Jian Zhang   Steered Bubble: an Interposer-based
                                  Deadlock Recovery Algorithm for
                                  Multi-chiplet Systems  . . . . . . . . . 29:1--29:??
          Shruthi Karunakar and   
       Rajshekar Kalayappan and   
               Sandeep Chandran   Consequence-based Clustered Architecture 30:1--30:??
                Jiahui Yang and   
                  Fulin Nan and   
               Zhirong Shen and   
              Zhisheng Chen and   
                  Yuhui Cai and   
             Dmitrii Kaplun and   
                Xiaoli Wang and   
                Quanqing Xu and   
              Chuanhui Yang and   
                       Jiwu Shu   TPRepair: Tree-based Pipelined Repair in
                                  Clustered Storage Systems  . . . . . . . 31:1--31:25
               Jianrong Yan and   
               Wenbin Jiang and   
                  Dongao He and   
                 Suyang Wen and   
                    Yang Li and   
                    Hai Jin and   
                   Zhiyuan Shao   RT-GNN: Accelerating Sparse Graph Neural
                                  Networks by Tensor-CUDA Kernel Fusion    32:1--32:27
                     Yi Dai and   
                     Kai Lu and   
                   Sheng Ma and   
                  Jinshu Su and   
                   Dongsheng Li   Bubble-Swap Flow Control . . . . . . . . 33:1--33:26
               Dongjie Tang and   
                   Zijun Wu and   
                   Yun Wang and   
                 Yicheng Gu and   
                Fangxin Liu and   
                    Zhengwei Qi   gCom: Fine-grained Compressors in
                                  Graphics Memory of Mobile GPU  . . . . . 34:1--34:25
               Ruixing Zong and   
              Jiapeng Zhang and   
                  Zhuo Tang and   
                       Kenli Li   IBing: an Efficient Interleaved
                                  Bidirectional Ring All-Reduce Algorithm
                                  for Gradient Synchronization . . . . . . 35:1--35:23
             Quancheng Wang and   
                  Ming Tang and   
                      Ke Xu and   
                       Han Wang   Unveiling and Evaluating Vulnerabilities
                                  in Branch Predictors via a Three-Step
                                  Modeling Methodology . . . . . . . . . . 36:1--36:26
                Pengyu Yang and   
                 Weihao Cui and   
                 Chunyu Xue and   
                   Han Zhao and   
                  Chen Chen and   
                  Quan Chen and   
                  Jing Yang and   
                      Minyi Guo   Taming Flexible Job Packing in Deep
                                  Learning Training Clusters . . . . . . . 37:1--37:24
                 Zhenlin Wu and   
               Haosong Zhao and   
               Hongyuan Liu and   
                  Wujie Wen and   
                      Jiajia Li   gHyPart: GPU-friendly End-to-End
                                  Hypergraph Partitioner . . . . . . . . . 38:1--38:25
             Mariano Benito and   
            Enrique Vallejo and   
                  Ramón Beivide   LIA: Latency-Improved Adaptive routing
                                  for Dragonfly networks . . . . . . . . . 39:1--39:26
                 Yiming Gan and   
               Jingwen Leng and   
                      Bo Yu and   
                      Yuhao Zhu   KINDRED: Heterogeneous Split-Lock
                                  Architecture for Safe Autonomous
                                  Machines . . . . . . . . . . . . . . . . 40:1--40:25
            Tzung-Han Juang and   
              Christophe Dubach   Maximizing Data and Hardware Reuse for
                                  HLS with Early-Stage Symbolic
                                  Partitioning . . . . . . . . . . . . . . 41:1--41:26
                   Cheng Xu and   
                    Chao Li and   
               Xiaofeng Hou and   
                  Junyi Mei and   
                  Jing Wang and   
                Pengyu Wang and   
                Shixuan Sun and   
                  Minyi Guo and   
                    Baoping Hao   Enhancing High-Throughput GPU Random
                                  Walks Through Multi-Task Concurrency
                                  Orchestration  . . . . . . . . . . . . . 42:1--42:26
                Qiong Chang and   
                Weimin Wang and   
                   Jun Miyazaki   Accelerating Nearest Neighbor Search in
                                  3D Point Cloud Registration on GPUs  . . 43:1--43:24
                Yekang Zhan and   
              Xiangrui Yang and   
                Haichuan Hu and   
                  Qiang Cao and   
                Yifan Zhang and   
                        Jie Yao   AIS: an Active Idleness I/O Scheduler to
                                  Reduce Buffer-Exhausted Degradation of
                                  Solid-State Drives . . . . . . . . . . . 44:1--44:26
                  Coby Soss and   
    Aravind Sukumaran Rajam and   
                Janet Layne and   
              Edoardo Serra and   
     Mahantesh Halappanavar and   
         Assefaw H. Gebremedhin   ScaWL: Scaling $k$-WL
                                  (Weisfeiler--Lehman) Algorithms in
                                  Memory and Performance on Shared and
                                  Distributed-Memory Systems . . . . . . . 45:1--45:25

ACM Transactions on Architecture and Code Optimization
Volume 22, Number 2, June, 2025

                Yiming Wang and   
               Weizhe Zhang and   
                   Meng Hao and   
                Weizhi Kong and   
                       Yuan Wen   Dynamic Power Management Through
                                  Multi-agent Deep Reinforcement Learning
                                  for Heterogeneous Systems  . . . . . . . 46:1--46:??
               Xinyuan Wang and   
                Xingchen Li and   
                   Yun Peng and   
                   Hejiao Huang   Comprehensive Evaluation and Opportunity
                                  Discovery for Deterministic Concurrency
                                  Control  . . . . . . . . . . . . . . . . 47:1--47:??
          Théophile Bastian and   
            Hugo Pompougnac and   
            Alban Dutilleul and   
               Fabrice Rastello   CesASMe and Staticdeps: static detection
                                  of memory-carried dependencies for code
                                  analyzers  . . . . . . . . . . . . . . . 48:1--48:??
                  Fuyu Wang and   
               Minghua Shen and   
                  Yutong Lu and   
                      Nong Xiao   Ceiba: an Efficient and Scalable DNN
                                  Scheduler for Spatial Accelerators . . . 49:1--49:??
                  Kelun Lei and   
                Shaokang Du and   
                    Xin You and   
               Hailong Yang and   
              Zhongzhi Luan and   
                     Yi Liu and   
                     Depei Qian   Exploiting Dynamic Regular Patterns in
                                  Irregular Programs for Efficient
                                  Vectorization  . . . . . . . . . . . . . 50:1--50:??
               Xueying Wang and   
                 Shigang Li and   
                   Hao Qian and   
                    Fan Luo and   
               Zhaoyang Hao and   
                    Tong Wu and   
                 Ruiyuan Xu and   
                 Huimin Cui and   
              Xiaobing Feng and   
                 Guangli Li and   
                   Jingling Xue   OptiFX: Automatic Optimization for
                                  Convolutional Neural Networks with
                                  Aggressive Operator Fusion on GPUs . . . 51:1--51:??
                    Yifu He and   
                   Han Zhao and   
                 Weihao Cui and   
               Shulai Zhang and   
                  Quan Chen and   
                      Minyi Guo   ARACHNE: Optimizing Distributed Parallel
                                  Applications with Reduced Inter-Process
                                  Communication  . . . . . . . . . . . . . 52:1--52:??
                Kailin Yang and   
               José F. Martínez   VersaTile: Flexible Tiled Architectures
                                  via Associative Processors . . . . . . . 53:1--53:??
              Changqing Shi and   
                  Yufei Sun and   
                   Rui Chen and   
                Jiahao Wang and   
                  Qiang Guo and   
                Chunye Gong and   
                Yicheng Sui and   
                 Yutong Jin and   
                    Yuzhi Zhang   TransCL: an Automatic CUDA-to-OpenCL
                                  Programs Transformation Framework  . . . 54:1--54:??
                 Zhibo Xuan and   
                    Xin You and   
                Tianyu Feng and   
               Hailong Yang and   
              Zhongzhi Luan and   
                     Yi Liu and   
                     Depei Qian   SimTrace: Exploiting Spatial and
                                  Temporal Sampling for Large-Scale
                                  Performance Analysis . . . . . . . . . . 55:1--55:??
              Congyong Chen and   
              Shengan Zheng and   
               Yuhang Zhang and   
                  Linpeng Huang   FusionFS: a Contention-Resilient File
                                  System for Persistent CPU Caches . . . . 56:1--56:??
             Jingcheng Shen and   
                  Lang Yang and   
                 Linbo Long and   
                Zhenhua Tan and   
               Congming Gao and   
                  Kan Zhong and   
                Masao Okita and   
                   Fumihiko Ino   Overlapping Aware Data Placement
                                  Optimizations for LSM Tree-Based Store
                                  on ZNS SSDs  . . . . . . . . . . . . . . 57:1--57:??
               Minghua Shen and   
                Aoxiang Qin and   
                      Nong Xiao   ODGS: Dependency-Aware Scheduling for
                                  High-Level Synthesis with Graph Neural
                                  Network and Reinforcement Learning . . . 58:1--58:??
               Gaoyang Zhao and   
                  Qiuran Li and   
               Rongzhen Lin and   
                    Yaohua Wang   Shift-CIM: In-SRAM Alignment To Support
                                  General-Purpose Bit-level Sparsity
                                  Exploration in SRAM Multiplication . . . 59:1--59:??
                  Xin Cheng and   
                 Jinpeng Ye and   
                 Haoyu Deng and   
             Tingting Zhang and   
                 Tianyi Liu and   
                      Jian Wang   LitTLS: Lightweight Thread-Level
                                  Speculation on Little Cores  . . . . . . 60:1--60:??
               Chaoyang Jia and   
                 Jingyu Liu and   
                   Shi Chen and   
                     Kai Lu and   
                        Li Shen   TSN Cache: Exploiting Data Localities in
                                  Graph Computing Applications . . . . . . 61:1--61:??
               Shantian Qin and   
                 Zhihua Fan and   
                 Wenming Li and   
                  Zhen Wang and   
                  Xuejun An and   
                Xiaochun Ye and   
                    Dongrui Fan   PANDA: Adaptive Prefetching and
                                  Decentralized Scheduling for Dataflow
                                  Architectures  . . . . . . . . . . . . . 62:1--62:??
                    Yu Tang and   
                  Lujia Yin and   
                    Qiao Li and   
                 Hongyu Zhu and   
                 Hengjie Li and   
            Xingcheng Zhang and   
                 Linbo Qiao and   
               Dongsheng Li and   
                      Jiaxin Li   Koala: Efficient Pipeline Training
                                  through Automated Schedule Searching on
                                  Domain-Specific Language . . . . . . . . 63:1--63:??
                  Yuting Li and   
                     Yun Xu and   
             Pengcheng Wang and   
                 Yonghui Xu and   
                  Weiguang Wang   A Lock-free RDMA-friendly Index in
                                  CPU-parsimonious Environments  . . . . . 64:1--64:??
               Xueliang Wei and   
                   Dan Feng and   
                   Wei Tong and   
                    Bing Wu and   
                       Xu Jiang   SEED: Speculative Security Metadata
                                  Updates for Low-Latency Secure Memory    65:1--65:??
                  Xiaobo Lu and   
               Jianbin Fang and   
                   Lin Peng and   
                 Chun Huang and   
                  Zixiao Yu and   
                      Tiejun Li   Gator: Accelerating Graph Attention
                                  Networks by Jointly Optimizing Attention
                                  and Graph Processing . . . . . . . . . . 66:1--66:??
              Yacine Hakimi and   
            Riyadh Baghdadi and   
                 Yacine Challal   Supporting Dynamic Program Sizes in Deep
                                  Learning-Based Cost Models for Code
                                  Optimization . . . . . . . . . . . . . . 67:1--67:??
               Yicheng Wang and   
                   Lijie Xu and   
                   Tian Guo and   
               Wensheng Dou and   
               Hongbin Zeng and   
                   Wei Wang and   
                    Jun Wei and   
                      Tao Huang   BridgeGC: an Efficient Cross-Level
                                  Garbage Collector for Big Data
                                  Frameworks . . . . . . . . . . . . . . . 68:1--68:??
                    Zhen Du and   
                   Ying Liu and   
                Ninghui Sun and   
                 Huimin Cui and   
              Xiaobing Feng and   
                      Jiajia Li   SRSparse: Generating Codes for
                                  High-Performance Sparse Matrix-Vector
                                  Semiring Computations  . . . . . . . . . 69:1--69:??
                 Chenji Han and   
                Zifei Zhang and   
                   Feng Xue and   
                   Xinyu Li and   
                  Yuxuan Wu and   
             Tingting Zhang and   
                 Tianyi Liu and   
                     Qi Guo and   
                     Fuxin Zhang   SnsBooster: Enhancing Sampling-based
                                  $ \mu $Arch Evaluation Efficiency through
                                  Online Performance Sensitivity Analysis  70:1--70:??
                Amit Tiwari and   
           V. Krishna Nandivada   Unleashing Parallelism with
                                  Elastic-Barriers . . . . . . . . . . . . 71:1--71:??
              Gia Bao Thieu and   
                Sven Gesper and   
            Guillermo Payá-Vayá   DCMA: Accelerating Parallel DMA
                                  Transfers with a Multi-Port Direct
                                  Cached Memory Access in a
                                  Massive-Parallel Vector Processor  . . . 72:1--72:??
           Aurélie Saulquin and   
              Mazdak Fatahi and   
              Pierre Boulet and   
                    Samy Meftali   ModNEF: an Open Source Modular
                                  Neuromorphic Emulator for FPGA for
                                  Low-Power In-Edge Artificial
                                  Intelligence . . . . . . . . . . . . . . 73:1--73:??
               Zhengding Hu and   
                Jingwei Sun and   
                 Guangzhong Sun   GNNPilot: a Holistic Framework for
                                  High-Performance Graph Neural Network
                                  Computations on GPUs . . . . . . . . . . 74:1--74:??
               Jinghao Zhao and   
               Hongwei Yang and   
                   Meng Hao and   
               Weizhe Zhang and   
                     Hui He and   
                   Desheng Wang   HEngine: a High Performance Optimization
                                  Framework on a GPU for Homomorphic
                                  Encryption . . . . . . . . . . . . . . . 75:1--75:??
                  Wen Cheng and   
               Qianya Cheng and   
                     Yi Liu and   
              Lingfang Zeng and   
            Andre Brinkmann and   
                      Yang Wang   9Ring: a $3$D-Stacked Memory-Based
                                  Accelerator for Flexible and Efficient
                                  Deep CNN Applications  . . . . . . . . . 76:1--76:26
                 Cunchen Hu and   
               Heyang Huang and   
              Liangliang Xu and   
               Xusheng Chen and   
                Chenxi Wang and   
                   Jiang Xu and   
                Shuang Chen and   
                   Hao Feng and   
                    Sa Wang and   
                Yungang Bao and   
                Ninghui Sun and   
                    Yizhou Shan   ShuffleInfer: Disaggregate LLM Inference
                                  for Mixed Downstream Workloads . . . . . 77:1--77:24
               Suchita Pati and   
               Shaizeen Aga and   
             Nuwan Jayasena and   
               Matthew Sinclair   GOLDYLOC: Global Optimizations &
                                  Lightweight Dynamic Logic for
                                  Concurrency  . . . . . . . . . . . . . . 78:1--78:28
                   Yi Zhang and   
                Xiaomeng Yi and   
                   Yu Huang and   
               Jingrui Yuan and   
               Chuangyi Gui and   
                   Dan Chen and   
                 Long Zheng and   
                Jianhui Yue and   
               Xiaofei Liao and   
                    Hai Jin and   
                   Jingling Xue   Cheetah: Accelerating Dynamic Graph
                                  Mining with Grouping Updates . . . . . . 79:1--79:26
       Manolis Katsaragakis and   
          Christos Baloukas and   
       Lazaros Papadopoulos and   
           Francky Catthoor and   
              Dimitrios Soudris   Performance, Energy and NVM
                                  Lifetime-Aware Data Structure Refinement
                                  and Placement for Heterogeneous Memory
                                  Systems  . . . . . . . . . . . . . . . . 80:1--80:27
                 Farui Wang and   
                   Meng Hao and   
                  Siyu Yang and   
                   Weizhe Zhang   Deep Learning Workload Mapping
                                  Optimization on Jetson Platforms . . . . 81:1--81:23
                 Wenlong Mu and   
                   Yue Tang and   
                   Bo Huang and   
                    Jianmei Guo   AOBO: a Fast-Switching Online Binary
                                  Optimizer on AArch64 . . . . . . . . . . 82:1--82:27

ACM Transactions on Architecture and Code Optimization
Volume 22, Number 3, September, 2025

               Konrad Moron and   
            Stefan Wallentowitz   Benchmarking WebAssembly for Embedded
                                  Systems  . . . . . . . . . . . . . . . . 83:1--83:21
                 Qian Xiong and   
                Weiliang Ma and   
                Xuanhua Shi and   
              Yongluan Zhou and   
                    Hai Jin and   
                Kaiyi Huang and   
               Haozhou Wang and   
                   Zhengru Wang   gECC: a GPU-based high-throughput
                                  framework for Elliptic Curve
                                  Cryptography . . . . . . . . . . . . . . 84:1--84:27
                  Haomin Li and   
                Fangxin Liu and   
                Zongwu Wang and   
                  Ning Yang and   
              Shiyuan Huang and   
              Xiaoyao Liang and   
               Haibing Guan and   
                       Li Jiang   Attack and Defense: Enhancing Robustness
                                  of Binary Hyper-Dimensional Computing    85:1--85:25
           Chris Kjellqvist and   
                 Lisa Wills and   
                   Alvin Lebeck   BigLittleMCA: a Spatially-Optimal Tiled
                                  Hardware Accelerator for MCMC Image
                                  Processing . . . . . . . . . . . . . . . 86:1--86:26
               Chaoyang Jia and   
                Zhang Dunbo and   
               Qingjie Lang and   
                 Ruoxi Wang and   
                        Li Shen   In-SRAM Parallel Data Shuffle  . . . . . 87:1--87:24
                Xinglei Dou and   
                    Lei Liu and   
               Zhuohao Wang and   
                      Pengyu Li   LarQucut: a New Cutting and Mapping
                                  Approach for Large-sized Quantum
                                  Circuits in Distributed Quantum
                                  Computing (DQC) Environments . . . . . . 88:1--88:24
                   Hao Ding and   
               Peiling Song and   
                   Yelin Li and   
                    Junyan Qian   A Two-Stage Degradation-Based Topology
                                  Reconfiguration Algorithm for
                                  Fault-Tolerant Multiprocessor Arrays . . 89:1--89:26
                   Xiang Li and   
                Qiong Chang and   
                     Yun Li and   
                   Jun Miyazaki   $3$D GNLM: Efficient $3$D Non-Local
                                  Means Kernel with Nested Reuse
                                  Strategies for Embedded GPUs . . . . . . 90:1--90:22
                 Yiming Sun and   
                  Jie Zhang and   
                 Huawei Cao and   
                 Yuan Zhang and   
                  Xuejun An and   
              Junying Huang and   
                    Xiaochun Ye   CGCGraph: Efficient CPU-GPU Co-execution
                                  for Concurrent Dynamic Graph Processing  91:1--91:26
                Zhanyuan Di and   
                Leping Wang and   
                 Zhaojia Ma and   
                    En Shao and   
                   Jie Zhao and   
                   Ziyi Ren and   
                Siyuan Feng and   
                Dingwen Tao and   
              Guangming Tan and   
                    Ninghui Sun   Accelerating Parallel Structures in DNNs
                                  via Parallel Fusion and Operator
                                  Co-Optimization  . . . . . . . . . . . . 92:1--92:26
                  Ruihao Li and   
           Bagus Hanindhito and   
              Sanjana Yadav and   
                  Qinzhe Wu and   
               Krishna Kavi and   
              Gayatri Mehta and   
       Neeraja J. Yadwadkar and   
                   Lizy K. John   Performance Implications of Pipelining
                                  the Data Transfer in CPU-GPU
                                  Heterogeneous Systems  . . . . . . . . . 93:1--93:26
               Haozhong Qiu and   
                 Chuanfu Xu and   
               Jianbin Fang and   
                 Jian Zhang and   
                 Liang Deng and   
                    Zhe Dai and   
                   Yue Ding and   
                   Yue Wang and   
                Zhimeng Han and   
               Yonggang Che and   
                        Jie Liu   DCSolver: Accelerating Sparse Iterative
                                  Solvers via Divide-and-Conquer on GPUs   94:1--94:25
                 Yachun Liu and   
                   Dan Feng and   
                Jianxi Chen and   
                    Jing Hu and   
              Zhouxuan Peng and   
                      Jinlei Hu   ZNSFQ: an Efficient and High-Performance
                                  Fair Queue Scheduling Scheme for ZNS
                                  SSDs . . . . . . . . . . . . . . . . . . 95:1--95:27
   Omar Shaaban Ibrahim Ali and   
  Juliette Fournis d'Albiat and   
          Isabel Piedrahita and   
             Vicenç Beltran and   
           Xavier Martorell and   
             Paul Carpenter and   
             Eduard Ayguadé and   
                  Jesus Labarta   Leveraging iterative applications to
                                  improve the scalability of task-based
                                  programming models on distributed
                                  systems  . . . . . . . . . . . . . . . . 96:1--96:27
                 Suhong Lee and   
                 Boyeal Kim and   
              Yongseok Choi and   
                   Hyuk-Jae Lee   HopScotch: a Holistic Approach to Data
                                  Layout-Aware Mapping on NPUs for
                                  High-Performance DNN Inference . . . . . 97:1--97:26
                 Qiliang Li and   
                    Min Lyu and   
                   Tian Liu and   
              Liangliang Xu and   
                   Wei Wang and   
                     Yinlong Xu   MetaEC: an Efficient and Resilient
                                  Erasure-Coded KV Store on Disaggregated
                                  Memory . . . . . . . . . . . . . . . . . 98:1--98:26
                   Han Zhao and   
                 Weihao Cui and   
                  Quan Chen and   
                   Zijun Li and   
                Zhenhua Han and   
                   Nan Wang and   
                    Yu Feng and   
                 Jieru Zhao and   
                  Chen Chen and   
               Jingwen Leng and   
                      Minyi Guo   EDAS: Enabling Fast Data Loading for GPU
                                  Serverless Computing . . . . . . . . . . 99:1--99:23
                  Mary Hall and   
           Cosmin E. Oancea and   
             Anne C. Elster and   
                  Ari Rasch and   
             Sameeran Joshi and   
    Amir Mohammad Tavakkoli and   
                Richard Schulze   Scheduling Language Chronology: Past,
                                  Present, and Future  . . . . . . . . . . 100:1--100:31
                Zhibing Sha and   
                Shuaiwen Yu and   
             Chengyong Tang and   
                Zhigang Cai and   
                  Peng Tang and   
                 Ming Huang and   
                     Jun Li and   
                   Jianwei Liao   Supports of Data Cache Division for
                                  Computational Solid-state Drives . . . . 101:1--101:20
               Lingxiao Jin and   
                  Zinuo Cai and   
                Haoxin Wang and   
               Zongpu Zhang and   
                   Ruhui Ma and   
               Haibing Guan and   
                   Yuan Liu and   
                  Rajkumar Buyya   Ephemera: Accelerating I/O-Intensive
                                  Serverless Workloads with a Harvested
                                  In-memory File System  . . . . . . . . . 102:1--102:24
                  Yulong Wu and   
                   Yehan Ma and   
               Mingdong Xie and   
                   Weizhe Zhang   Partitioned Scheduling and Analysis for
                                  a Typed DAG Task on Heterogeneous
                                  Multi-Cores  . . . . . . . . . . . . . . 103:1--103:24
                    Wei Niu and   
                Mengshu Sun and   
                Zhengang Li and   
                Jou-An Chen and   
              Jiexiong Guan and   
                Xipeng Shen and   
                    Jun Liu and   
                  Mei Zhang and   
                Yanzhi Wang and   
                    Xue Lin and   
                        Bin Ren   Mobile-$3$DCNN: an Acceleration
                                  Framework for Ultra-Real-Time Execution
                                  of Large $3$D CNNs on Mobile Devices . . 104:1--104:22
                  Yudong Mu and   
                 Zhihua Fan and   
                 Wenming Li and   
              Zhiyuan Zhang and   
                  Xuejun An and   
                Dongrui Fan and   
                    Xiaochun Ye   GenCNN: a Partition-Aware
                                  Multi-Objective Mapping Framework for
                                  CNN Accelerators Based on Genetic
                                  Algorithm  . . . . . . . . . . . . . . . 105:1--105:26
                 Neel Patel and   
                   Ren Wang and   
                 Mohammad Alian   RACER: Avoiding End-to-End Slowdowns in
                                  Accelerated Chip Multi-Processors  . . . 106:1--106:22
                   Ziyue Xu and   
                  Yichen Li and   
                Ranzhe Deng and   
                  Liping Yi and   
                   Yusen Li and   
                  Gang Wang and   
                  Xiaoguang Liu   SampDedup: Sampling Prediction for
                                  Efficient Inline Data Deduplication on
                                  Non-volatile Memory  . . . . . . . . . . 107:1--107:25
                    Hui Sun and   
                 Qianli Yue and   
             Guanzhong Chen and   
                     Yi Zou and   
               Yinliang Yue and   
                       Xiao Qin   HAKV: a Hotness-Aware Zone Management
                                  Approach to Optimizing Performance of
                                  LSM-tree-based Key-Value Stores  . . . . 108:1--108:26
                 Lixiao Cui and   
                  Kedi Yang and   
                   Yusen Li and   
                  Gang Wang and   
                  Xiaoguang Liu   Towards Optimizing Learned Index for
                                  High Performance, Memory Efficiency and
                                  NUMA Awareness . . . . . . . . . . . . . 109:1--109:26
               Marcin Copik and   
               Lukas Möller and   
         Alexandru Calotoiu and   
                Torsten Hoefler   Cppless: Single-Source and
                                  High-Performance Serverless Programming
                                  in C++ . . . . . . . . . . . . . . . . . 110:1--110:27
                Yifan Zhang and   
                 Xiaoyu Niu and   
             Hongzheng Tian and   
               Yanjun Zhang and   
                      Bo Yu and   
               Shaoshan Liu and   
                    Sitao Huang   A Sparsity-Aware Autonomous Path
                                  Planning Accelerator with HW/SW
                                  Co-Design and Multi-Level Dataflow
                                  Optimization . . . . . . . . . . . . . . 111:1--111:25
                    Xinbiao Gan   TianheGraph: Topology-aware Graph
                                  Processing . . . . . . . . . . . . . . . 112:1--112:24