Table of contents for issues of ACM Transactions on Architecture and Code Optimization

Last update: Sat Dec 23 08:30:17 MST 2017

Volume 1, Number 1, March, 2004
Volume 1, Number 2, June, 2004
Volume 1, Number 3, September, 2004
Volume 1, Number 4, December, 2004
Volume 2, Number 1, March, 2005
Volume 2, Number 2, June, 2005
Volume 2, Number 3, September, 2005
Volume 2, Number 4, December, 2005
Volume 3, Number 1, March, 2006
Volume 3, Number 2, June, 2006
Volume 3, Number 3, September, 2006
Volume 3, Number 4, December, 2006
Volume 4, Number 1, March, 2007
Volume 4, Number 2, June, 2007
Volume 4, Number 3, September, 2007
Volume 4, Number 4, January, 2008
Volume 5, Number 1, May, 2008
Volume 5, Number 2, August, 2008
Volume 5, Number 3, November, 2008
Volume 5, Number 4, March, 2009
Volume 6, Number 1, March, 2009
Volume 6, Number 2, June, 2009
Volume 6, Number 3, September, 2009
Volume 6, Number 4, October, 2009
Volume 7, Number 1, April, 2010
Volume 7, Number 2, September, 2010
Volume 7, Number 3, December, 2010
Volume 7, Number 4, December, 2010
Volume 8, Number 1, April, 2011
Volume 8, Number 2, July, 2011
Volume 8, Number 3, October, 2011
Volume 8, Number 4, January, 2012
Volume 9, Number 1, March, 2012
Volume 9, Number 2, June, 2012
Volume 9, Number 3, September, 2012
Volume 9, Number 4, January, 2013
Volume 10, Number 1, April, 2013
Volume 10, Number 2, May, 2013
Volume 10, Number 3, September, 2013
Volume 10, Number 4, December, 2013
Volume 11, Number 1, February, 2014
Volume 11, Number 2, June, 2014
Volume 11, Number 3, October, 2014
Volume 11, Number 4, January, 2015
Volume 12, Number 1, April, 2015
Volume 12, Number 2, July, 2015
Volume 12, Number 3, October, 2015
Volume 12, Number 4, January, 2016
Volume 13, Number 1, April, 2016
Volume 13, Number 2, June, 2016
Volume 13, Number 3, September, 2016
Volume 13, Number 4, December, 2016
Volume 14, Number 1, April, 2017
Volume 14, Number 2, July, 2017
Volume 14, Number 3, September, 2017
Volume 14, Number 4, December, 2017


ACM Transactions on Architecture and Code Optimization
Volume 1, Number 1, March, 2004

                Brad Calder and   
                   Dean Tullsen   Introduction . . . . . . . . . . . . . . 1--2
                   W. Zhang and   
                   J. S. Hu and   
               V. Degalahal and   
                M. Kandemir and   
           N. Vijaykrishnan and   
                    M. J. Irwin   Reducing instruction cache energy
                                  consumption using a compiler-based
                                  strategy . . . . . . . . . . . . . . . . 3--33
          Nemanja Isailovic and   
               Mark Whitney and   
               Yatish Patel and   
           John Kubiatowicz and   
                Dean Copsey and   
          Frederic T. Chong and   
            Isaac L. Chuang and   
                     Mark Oskin   Datapath and control for quantum wires   34--61
  Karthikeyan Sankaralingam and   
         Ramadass Nagarajan and   
                Haiming Liu and   
               Changkyu Kim and   
                Jaehyuk Huh and   
          Nitya Ranganathan and   
                Doug Burger and   
         Stephen W. Keckler and   
         Robert G. McDonald and   
               Charles R. Moore   TRIPS: a polymorphous architecture for
                                  exploiting ILP, TLP, and DLP . . . . . . 62--93
              Kevin Skadron and   
             Mircea R. Stan and   
   Karthik Sankaranarayanan and   
                  Wei Huang and   
         Sivakumar Velusamy and   
                   David Tarjan   Temperature-aware microarchitecture:
                                  Modeling and implementation  . . . . . . 94--125

ACM Transactions on Architecture and Code Optimization
Volume 1, Number 2, June, 2004

                 Alex Aletà and   
            Josep M. Codina and   
           Antonio González and   
                    David Kaeli   Removing communications in clustered
                                  microarchitectures through instruction
                                  replication  . . . . . . . . . . . . . . 127--151
                     Yu Bai and   
                  R. Iris Bahar   A low-power in-order/out-of-order issue
                                  queue  . . . . . . . . . . . . . . . . . 152--179
                Philo Juang and   
              Kevin Skadron and   
         Margaret Martonosi and   
                 Zhigang Hu and   
           Douglas W. Clark and   
          Philip W. Diodato and   
               Stefanos Kaxiras   Implementing branch-predictor decay
                                  using quasi-static memory cells  . . . . 180--219
        Oliverio J. Santana and   
               Alex Ramirez and   
       Josep L. Larriba-Pey and   
                   Mateo Valero   A low-complexity fetch architecture for
                                  high-performance superscalar processors  220--245

ACM Transactions on Architecture and Code Optimization
Volume 1, Number 3, September, 2004

                    Jin Lin and   
                  Tong Chen and   
              Wei-Chung Hsu and   
              Pen-Chung Yew and   
            Roy Dz-Ching Ju and   
              Tin-Fook Ngai and   
                       Sun Chan   A compiler framework for speculative
                                  optimizations  . . . . . . . . . . . . . 247--271
            Brian A. Fields and   
            Rastislav Bodik and   
               Mark D. Hill and   
               Chris J. Newburn   Interaction cost and shotgun profiling   272--304
   Karthik Sankaranarayanan and   
                  Kevin Skadron   Profile-based adaptation for cache decay 305--322
                    Fen Xie and   
         Margaret Martonosi and   
                   Sharad Malik   Intraprogram dynamic voltage scaling:
                                  Bounding opportunities with analytic
                                  modeling . . . . . . . . . . . . . . . . 323--367

ACM Transactions on Architecture and Code Optimization
Volume 1, Number 4, December, 2004

               A. Hartstein and   
                Thomas R. Puzak   The optimum pipeline depth considering
                                  both power and performance . . . . . . . 369--388
             Adrián Cristal and   
        Oliverio J. Santana and   
               Mateo Valero and   
               José F. Martínez   Toward kilo-instruction processors . . . 389--417
             Haitham Akkary and   
                Ravi Rajwar and   
         Srikanth T. Srinivasan   An analysis of a resource efficient
                                  checkpoint architecture  . . . . . . . . 418--444
              Chia-Lin Yang and   
            Alvin R. Lebeck and   
             Hung-Wei Tseng and   
                  Chien-Hao Lee   Tolerating memory latency through push
                                  prefetching for pointer-intensive
                                  applications . . . . . . . . . . . . . . 445--475


ACM Transactions on Architecture and Code Optimization
Volume 2, Number 1, March, 2005

                Brad Calder and   
                   Dean Tullsen   Introduction . . . . . . . . . . . . . . 1--2
              Yuanyuan Zhou and   
                   Pin Zhou and   
                   Feng Qin and   
                    Wei Liu and   
                Josep Torrellas   Efficient and flexible architectural
                                  support for dynamic monitoring . . . . . 3--33
             Chuanjun Zhang and   
                Frank Vahid and   
                   Jun Yang and   
                   Walid Najjar   A way-halting cache for low-energy
                                  high-performance systems . . . . . . . . 34--54
               Jaume Abella and   
           Antonio González and   
                Xavier Vera and   
          Michael F. P. O'Boyle   IATAC: a smart predictor to turn-off L2
                                  cache lines  . . . . . . . . . . . . . . 55--77
       John W. Haskins, Jr. and   
                  Kevin Skadron   Accelerated warmup for sampled
                                  microarchitecture simulation . . . . . . 78--108

ACM Transactions on Architecture and Code Optimization
Volume 2, Number 2, June, 2005

                     Tao Li and   
              Ravi Bhargava and   
               Lizy Kurian John   Adapting branch-target buffer to improve
                                  the target predictability of Java code   109--130
               Lingli Zhang and   
                 Chandra Krintz   The design, implementation, and
                                  evaluation of adaptive code unloading
                                  for resource-constrained devices . . . . 131--164
         Prasad A. Kulkarni and   
           Stephen R. Hines and   
           David B. Whalley and   
             Jason D. Hiser and   
           Jack W. Davidson and   
               Douglas L. Jones   Fast and efficient searches for
                                  effective optimization-phase sequences   165--198
              Esther Salamí and   
                   Mateo Valero   Dynamic memory interval test vs.
                                  interprocedural pointer analysis in
                                  multimedia applications  . . . . . . . . 199--219

ACM Transactions on Architecture and Code Optimization
Volume 2, Number 3, September, 2005

                   Yan Meng and   
           Timothy Sherwood and   
                   Ryan Kastner   Exploring the limits of leakage power
                                  reduction in caches  . . . . . . . . . . 221--246
       María Jesús Garzarán and   
            Milos Prvulovic and   
        José María Llabería and   
              Víctor Viñals and   
       Lawrence Rauchwerger and   
                Josep Torrellas   Tradeoffs in buffering speculative
                                  memory state for thread-level
                                  speculation in multiprocessors . . . . . 247--279
               David Tarjan and   
                  Kevin Skadron   Merging path and gshare indexing in
                                  perceptron branch prediction . . . . . . 280--300
              Xiangyu Zhang and   
                    Rajiv Gupta   Whole execution traces and their
                                  applications . . . . . . . . . . . . . . 301--334

ACM Transactions on Architecture and Code Optimization
Volume 2, Number 4, December, 2005

               Wankang Zhao and   
              David Whalley and   
          Christopher Healy and   
                  Frank Mueller   Improving WCET by applying a WC
                                  code-positioning optimization  . . . . . 335--365
             George A. Reis and   
             Jonathan Chang and   
          Neil Vachharajani and   
                 Ram Rangan and   
            David I. August and   
         Shubhendu S. Mukherjee   Software-controlled fault tolerance  . . 366--396
                    Jian Li and   
               José F. Martínez   Power-performance considerations of
                                  parallel computing on chip
                                  multiprocessors  . . . . . . . . . . . . 397--422
             Saurabh Sharma and   
               Jesse G. Beu and   
                Thomas M. Conte   Spectral prefetcher: An effective
                                  mechanism for L2 cache prefetching . . . 423--450


ACM Transactions on Architecture and Code Optimization
Volume 3, Number 1, March, 2006

                Brad Calder and   
                   Dean Tullsen   Introduction . . . . . . . . . . . . . . 1--2
                    Lin Tan and   
           Brett Brotherton and   
               Timothy Sherwood   Bit-split string-matching engines for
                                  intrusion detection and prevention . . . 3--34
            Priya Nagpurkar and   
               Hussam Mousa and   
             Chandra Krintz and   
               Timothy Sherwood   Efficient remote profiling for
                                  resource-constrained devices . . . . . . 35--66
                    Jin Lin and   
              Wei-Chung Hsu and   
              Pen-Chung Yew and   
            Roy Dz-Ching Ju and   
                  Tin-Fook Ngai   Recovery code generation for general
                                  speculative optimizations  . . . . . . . 67--89
               Yoonseo Choi and   
                    Hwansoo Han   Optimal register reassignment for
                                  register stack overflow minimization . . 90--114

ACM Transactions on Architecture and Code Optimization
Volume 3, Number 2, June, 2006

               Jingling Xue and   
                      Qiong Cai   A lifetime optimal algorithm for
                                  speculative PRE  . . . . . . . . . . . . 115--155
          Joseph J. Sharkey and   
        Dmitry V. Ponomarev and   
                Kanad Ghose and   
                     Oguz Ergin   Instruction packing: Toward fast and
                                  energy-efficient instruction scheduling  156--181
                  Luis Ceze and   
              Karin Strauss and   
                 James Tuck and   
            Josep Torrellas and   
                     Jose Renau   CAVA: Using checkpoint-assisted value
                                  prediction to hide L2 misses . . . . . . 182--208
                Lixin Zhang and   
                Mike Parker and   
                    John Carter   Efficient address remapping in
                                  distributed shared-memory systems  . . . 209--229

ACM Transactions on Architecture and Code Optimization
Volume 3, Number 3, September, 2006

                   Min Zhao and   
          Bruce R. Childers and   
                 Mary Lou Soffa   An approach toward profit-driven
                                  optimization . . . . . . . . . . . . . . 231--262
              Kim Hazelwood and   
               Michael D. Smith   Managing bounded code caches in dynamic
                                  binary optimization systems  . . . . . . 263--294
        Olivier Rochecouste and   
               Gilles Pokam and   
                   André Seznec   A case for a complexity-effective,
                                  width-partitioned microarchitecture  . . 295--326
                Ahmad Zmily and   
             Christos Kozyrakis   Block-aware instruction set architecture 327--357

ACM Transactions on Architecture and Code Optimization
Volume 3, Number 4, December, 2006

       Jedidiah R. Crandall and   
                S. Felix Wu and   
              Frederic T. Chong   Minos: Architectural support for
                                  protecting control data  . . . . . . . . 359--389
            Jaydeep Marathe and   
              Frank Mueller and   
          Bronis R. de Supinski   Analysis of cache-coherence bottlenecks
                                  with hybrid hardware/software techniques 390--423
               Ilya Ganusov and   
               Martin Burtscher   Future execution: a prefetching
                                  mechanism that uses multiple cores to
                                  speed up single threads  . . . . . . . . 424--449
                 Michele Co and   
           Dee A. B. Weikle and   
                  Kevin Skadron   Evaluating trace cache energy efficiency 450--476
                  Shiwen Hu and   
            Madhavi Valluri and   
               Lizy Kurian John   Effective management of multiple
                                  configurable units using dynamic
                                  optimization . . . . . . . . . . . . . . 477--501
              Chris Bentley and   
         Scott A. Watterson and   
         David K. Lowenthal and   
                 Barry Rountree   Implicit array bounds checking on 64-bit
                                  architectures  . . . . . . . . . . . . . 502--527


ACM Transactions on Architecture and Code Optimization
Volume 4, Number 1, March, 2007

                Brad Calder and   
                   Dean Tullsen   Introduction . . . . . . . . . . . . . . 1:1--1:1
      Kypros Constantinides and   
              Stephen Plaza and   
                Jason Blome and   
           Valeria Bertacco and   
               Scott Mahlke and   
                Todd Austin and   
                  Bin Zhang and   
              Michael Orshansky   Architecting a reliable CMP switch
                                  architecture . . . . . . . . . . . . . . 2:1--2:37
            Ruchira Sasanka and   
                 Man-Lap Li and   
             Sarita V. Adve and   
             Yen-Kuang Chen and   
                     Eric Debes   ALP: Efficient support for all levels of
                                  parallelism for complex media
                                  applications . . . . . . . . . . . . . . 3:1--3:30
                    Yan Luo and   
                     Jia Yu and   
                   Jun Yang and   
                Laxmi N. Bhuyan   Conserving network processor power
                                  consumption by exploiting traffic
                                  variability  . . . . . . . . . . . . . . 4:1--4:26
            Vassos Soteriou and   
                Noel Eisley and   
                  Li-Shiuan Peh   Software-directed power-aware
                                  interconnection networks . . . . . . . . 5:1--5:40
            Yuan-Shin Hwang and   
                     Jia-Jhe Li   Snug set-associative caches: Reducing
                                  leakage power of instruction and data
                                  caches with no performance penalties . . 6:1--6:28
                Hongbo Rong and   
              Zhizhong Tang and   
            R. Govindarajan and   
             Alban Douillet and   
                   Guang R. Gao   Single-dimension software pipelining for
                                  multidimensional loops . . . . . . . . . 7:1--7:44

ACM Transactions on Architecture and Code Optimization
Volume 4, Number 2, June, 2007

              Fred A. Bower and   
            Daniel J. Sorin and   
                      Sule Ozev   Online diagnosis of hard faults in
                                  microprocessors  . . . . . . . . . . . . 8:1--8:??
             Pierre Michaud and   
               André Seznec and   
               Damien Fetis and   
         Yiannakis Sazeides and   
         Theofanis Constantinou   A study of thread migration in
                                  temperature-constrained multicores . . . 9:1--9:??
                    Yu Chen and   
                    Fuxin Zhang   Code reordering on limited branch offset 10:1--10:??
             A. S. Terechko and   
                   H. Corporaal   Inter-cluster communication in VLIW
                                  architectures  . . . . . . . . . . . . . 11:1--11:??
                 Jialin Dou and   
                 Marcelo Cintra   A compiler cost model for speculative
                                  parallelization  . . . . . . . . . . . . 12:1--12:??
               Wolfram Amme and   
          Jeffery von Ronne and   
                  Michael Franz   SSA-based mobile code: Implementation
                                  and empirical evaluation . . . . . . . . 13:1--13:??

ACM Transactions on Architecture and Code Optimization
Volume 4, Number 3, September, 2007

                Xiaodong Li and   
                 Ritu Gupta and   
             Sarita V. Adve and   
                  Yuanyuan Zhou   Cross-component energy management: Joint
                                  adaptation of processor and memory . . . 14:1--14:??
                  Ron Gabor and   
               Shlomo Weiss and   
                  Avi Mendelson   Fairness enforcement in switch on event
                                  multithreading . . . . . . . . . . . . . 15:1--15:??
              Diego Andrade and   
        Basilio B. Fraguela and   
                   Ramón Doallo   Precise automatable analytical modeling
                                  of the cache behavior of codes with
                                  indirections . . . . . . . . . . . . . . 16:1--16:??
           Kris Venstermans and   
            Lieven Eeckhout and   
              Koen De Bosschere   Java object header elimination for
                                  reduced memory consumption in 64-bit
                                  virtual machines . . . . . . . . . . . . 17:1--17:??
                   Shu Xiao and   
               Edmund M.-K. Lai   VLIW instruction scheduling for minimal
                                  power variation  . . . . . . . . . . . . 18:1--18:??
            Sriraman Tallam and   
                    Rajiv Gupta   Unified control flow and data dependence
                                  traces . . . . . . . . . . . . . . . . . 19:1--19:??

ACM Transactions on Architecture and Code Optimization
Volume 4, Number 4, January, 2008

                 Engin Ipek and   
             Sally A. McKee and   
                Karan Singh and   
               Rich Caruana and   
      Bronis R. de Supinski and   
                  Martin Schulz   Efficient architectural design space
                                  exploration via predictive modeling  . . 1:1--1:??
                  Yunhe Shi and   
                Kevin Casey and   
              M. Anton Ertl and   
                    David Gregg   Virtual machine showdown: Stack versus
                                  registers  . . . . . . . . . . . . . . . 2:1--2:??
                    Jun Yan and   
                      Wei Zhang   Exploiting virtual registers to reduce
                                  pressure on real registers . . . . . . . 3:1--3:??
               Zoe C. H. Yu and   
          Francis C. M. Lau and   
                    Cho-Li Wang   Object co-location and memory reuse for
                                  Java programs  . . . . . . . . . . . . . 4:1--4:??
                 Chuanjun Zhang   Reducing cache misses through
                                  programmable decoders  . . . . . . . . . 5:1--5:??
              Amit Golander and   
                   Shlomo Weiss   Hiding the misprediction penalty of a
                                  resource-efficient high-performance
                                  processor  . . . . . . . . . . . . . . . 6:1--6:??


ACM Transactions on Architecture and Code Optimization
Volume 5, Number 1, May, 2008

                Brad Calder and   
                   Dean Tullsen   Editorial  . . . . . . . . . . . . . . . 1:1--1:??
          Shashidhar Mysore and   
              Banit Agrawal and   
             Rodolfo Neuber and   
           Timothy Sherwood and   
       Nisheeth Shrivastava and   
                   Subhash Suri   Formulating and implementing profiling
                                  over adaptive ranges . . . . . . . . . . 2:1--2:??
               Antonia Zhai and   
         J. Gregory Steffan and   
     Christopher B. Colohan and   
                  Todd C. Mowry   Compiler and hardware support for
                                  reducing the synchronization of
                                  speculative threads  . . . . . . . . . . 3:1--3:??
         Jonathan A. Winter and   
              David H. Albonesi   Addressing thermal nonuniformity in SMT
                                  workloads  . . . . . . . . . . . . . . . 4:1--4:??
      Asadollah Shahbahrami and   
               Ben Juurlink and   
           Stamatis Vassiliadis   Versatility of extended subwords and the
                                  matrix register file . . . . . . . . . . 5:1--5:??
                    Zhi Guo and   
               Walid Najjar and   
                Betul Buyukkurt   Efficient hardware code generation for
                                  FPGAs  . . . . . . . . . . . . . . . . . 6:1--6:??
            Thomas Kotzmann and   
           Christian Wimmer and   
       Hanspeter Mössenböck and   
           Thomas Rodriguez and   
            Kenneth Russell and   
                      David Cox   Design of the Java HotSpot™ client
                                  compiler for Java 6  . . . . . . . . . . 7:1--7:??

ACM Transactions on Architecture and Code Optimization
Volume 5, Number 2, August, 2008

                 Ram Rangan and   
          Neil Vachharajani and   
           Guilherme Ottoni and   
                David I. August   Performance scalability of decoupled
                                  software pipelining  . . . . . . . . . . 8:1--8:??
                 Jieyi Long and   
         Seda Ogrenci Memik and   
               Gokhan Memik and   
             Rajarshi Mukherjee   Thermal monitoring mechanisms for chip
                                  multiprocessors  . . . . . . . . . . . . 9:1--9:??
                 Ajay Joshi and   
            Lieven Eeckhout and   
        Robert H. Bell, Jr. and   
                   Lizy K. John   Distilling the essence of proprietary
                                  workloads into miniature benchmarks  . . 10:1--10:??
           Vincenzo Catania and   
            Maurizio Palesi and   
                   Davide Patti   Reducing complexity of multiobjective
                                  design space exploration in VLIW-based
                                  embedded systems . . . . . . . . . . . . 11:1--11:??

ACM Transactions on Architecture and Code Optimization
Volume 5, Number 3, November, 2008

             Jacob Leverich and   
             Hideho Arakida and   
          Alex Solomatnikov and   
         Amin Firoozshahian and   
              Mark Horowitz and   
             Christos Kozyrakis   Comparative evaluation of memory models
                                  for chip multiprocessors . . . . . . . . 12:1--12:??
          Joseph J. Sharkey and   
                 Jason Loew and   
            Dmitry V. Ponomarev   Reducing register pressure in SMT
                                  processors through L2-miss-driven early
                                  register release . . . . . . . . . . . . 13:1--13:??
            Mojtaba Mehrara and   
                    Todd Austin   Exploiting selective placement for
                                  low-cost memory protection . . . . . . . 14:1--14:??
        Hans Vandierendonck and   
                   André Seznec   Speculative return address stack
                                  management revisited . . . . . . . . . . 15:1--15:??

ACM Transactions on Architecture and Code Optimization
Volume 5, Number 4, March, 2009

         Siddhartha Chhabra and   
               Brian Rogers and   
                Yan Solihin and   
                Milos Prvulovic   Making secure processors OS- and
                                  performance-friendly . . . . . . . . . . 16:1--16:??
              Daniel A. Jiménez   Generalizing neural branch prediction    17:1--17:??
              Jinseong Jeon and   
             Keoncheol Shin and   
                    Hwansoo Han   Abstracting access patterns of dynamic
                                  memory using regular expressions . . . . 18:1--18:??
            Ghassan Shobaki and   
                Kent Wilken and   
                 Mark Heffernan   Optimal trace scheduling using
                                  enumeration  . . . . . . . . . . . . . . 19:1--19:??


ACM Transactions on Architecture and Code Optimization
Volume 6, Number 1, March, 2009

         Prasad A. Kulkarni and   
           David B. Whalley and   
              Gary S. Tyson and   
               Jack W. Davidson   Practical exhaustive optimization phase
                                  order exploration and evaluation . . . . 1:1--1:??
           Manuel Hohenauer and   
                Felix Engel and   
             Rainer Leupers and   
               Gerd Ascheid and   
                  Heinrich Meyr   A SIMD optimization framework for
                                  retargetable compilers . . . . . . . . . 2:1--2:??
              Stijn Eyerman and   
                Lieven Eeckhout   Memory-level parallelism aware fetch
                                  policies for simultaneous multithreading
                                  processors . . . . . . . . . . . . . . . 3:1--3:??
             Lukasz Strozek and   
                   David Brooks   Energy- and area-efficient architectures
                                  through application clustering and
                                  architectural heterogeneity  . . . . . . 4:1--4:??

ACM Transactions on Architecture and Code Optimization
Volume 6, Number 2, June, 2009

         Guru Venkataramani and   
           Ioannis Doudalis and   
                Yan Solihin and   
                Milos Prvulovic   MemTracker: An accelerator for memory
                                  debugging and monitoring . . . . . . . . 5:1--5:??
                  Ron Gabor and   
              Avi Mendelson and   
                   Shlomo Weiss   Service level agreement for
                                  multithreaded processors . . . . . . . . 6:1--6:??
          Wilson W. L. Fung and   
                  Ivan Sham and   
                George Yuan and   
                  Tor M. Aamodt   Dynamic warp formation: Efficient MIMD
                                  control flow on SIMD graphics hardware   7:1--7:??
              Cheng-Kok Koh and   
              Weng-Fai Wong and   
                 Yiran Chen and   
                         Hai Li   Tolerating process variations in large,
                                  set-associative caches: The buddy cache  8:1--8:??

ACM Transactions on Architecture and Code Optimization
Volume 6, Number 3, September, 2009

                    Lian Li and   
                   Hui Feng and   
                   Jingling Xue   Compiler-directed scratchpad memory
                                  management via graph coloring  . . . . . 9:1--9:??
              Amit Golander and   
                   Shlomo Weiss   Checkpoint allocation and release  . . . 10:1--10:??
                 Weifeng Xu and   
                Russell Tessier   Tetris-XL: a performance-driven spill
                                  reduction technique for embedded VLIW
                                  processors . . . . . . . . . . . . . . . 11:1--11:??
           Timothy M. Jones and   
      Michael F. P. O'Boyle and   
               Jaume Abella and   
           Antonio González and   
                     Oğuz Ergin   Exploring the limits of early register
                                  release: Exploiting compiler analysis    12:1--12:??

ACM Transactions on Architecture and Code Optimization
Volume 6, Number 4, October, 2009

           Timothy M. Jones and   
      Michael F. P. O'Boyle and   
               Jaume Abella and   
           Antonio González and   
                     Oğuz Ergin   Energy-efficient register caching with
                                  compiler assistance  . . . . . . . . . . 13:1--13:??
                  Weijia Li and   
               Youtao Zhang and   
                   Jun Yang and   
                    Jiang Zheng   Towards update-conscious compilation for
                                  energy-efficient code dissemination in
                                  WSNs . . . . . . . . . . . . . . . . . . 14:1--14:??
              Michal Wegiel and   
                 Chandra Krintz   The single-referent collector:
                                  Optimizing compaction for the common
                                  case . . . . . . . . . . . . . . . . . . 15:1--15:??
      Samantika Subramaniam and   
                 Gabriel H. Loh   Design and optimization of the store
                                  vectors memory dependence predictor  . . 16:1--16:??


ACM Transactions on Architecture and Code Optimization
Volume 7, Number 1, April, 2010

              Xiaohang Wang and   
                   Mei Yang and   
              Yingtao Jiang and   
                       Peng Liu   A power-aware mapping approach to map IP
                                  cores onto NoCs under bandwidth and
                                  latency constraints  . . . . . . . . . . 1:1--1:??
              Zhong-Ho Chen and   
                 Alvin W. Y. Su   A hardware/software framework for
                                  instruction and data scratchpad memory
                                  allocation . . . . . . . . . . . . . . . 2:1--2:??
              Dong Hyuk Woo and   
           Joshua B. Fryman and   
             Allan D. Knies and   
              Hsien-Hsin S. Lee   Chameleon: Virtualizing idle
                                  acceleration cores of a heterogeneous
                                  multicore processor for caching and
                                  prefetching  . . . . . . . . . . . . . . 3:1--3:??
             Daniel Sanchez and   
    George Michelogiannakis and   
             Christos Kozyrakis   An analysis of on-chip interconnection
                                  networks for large-scale chip
                                  multiprocessors  . . . . . . . . . . . . 4:1--4:??
                 Xiuyi Zhou and   
                   Jun Yang and   
              Marek Chrobak and   
                   Youtao Zhang   Performance-aware thermal management via
                                  task scheduling  . . . . . . . . . . . . 5:1--5:??

ACM Transactions on Architecture and Code Optimization
Volume 7, Number 2, September, 2010

              Arun Raghavan and   
             Colin Blundell and   
              Milo M. K. Martin   Token tenure and PATCH: a
                                  predictive/adaptive token-counting
                                  hybrid . . . . . . . . . . . . . . . . . 6:1--6:??
           Christian Wimmer and   
           Hanspeter Mössenböck   Automatic feedback-directed object
                                  fusing . . . . . . . . . . . . . . . . . 7:1--7:??
            Benjamin C. Lee and   
                   David Brooks   Applied inference: Case studies in
                                  microarchitectural design  . . . . . . . 8:1--8:??
                  R. Rakvic and   
                     Q. Cai and   
                J. González and   
                 G. Magklis and   
                P. Chaparro and   
                    A. González   Thread-management techniques to maximize
                                  efficiency in multicore and simultaneous
                                  multithreaded microprocessors  . . . . . 9:1--9:??
                  Derek Pao and   
                    Wei Lin and   
                        Bin Liu   A memory-efficient pipelined
                                  implementation of the Aho--Corasick
                                  string-matching algorithm  . . . . . . . 10:1--10:??
                Xuejun Yang and   
                 Ying Zhang and   
                 Xicheng Lu and   
               Jingling Xue and   
                 Ian Rogers and   
                     Gen Li and   
                Guibin Wang and   
                    Xudong Fang   Exploiting the reuse supplied by
                                  loop-dependent stream references for
                                  stream processors  . . . . . . . . . . . 11:1--11:??
         Vijay Janapa Reddi and   
           Simone Campanoni and   
             Meeta S. Gupta and   
           Michael D. Smith and   
                Gu-Yeon Wei and   
               David Brooks and   
                  Kim Hazelwood   Eliminating voltage emergencies via
                                  software-guided code transformations . . 12:1--12:??

ACM Transactions on Architecture and Code Optimization
Volume 7, Number 3, December, 2010

                   Qin Zhao and   
           Ioana Cutcutache and   
                  Weng-Fai Wong   PiPA: Pipelined profiling and analysis
                                  on multicore systems . . . . . . . . . . 13:1--13:??
                    Fei Guo and   
                Yan Solihin and   
                    Li Zhao and   
               Ravishankar Iyer   Quality of service shared cache
                                  management in chip multiprocessor
                                  architecture . . . . . . . . . . . . . . 14:1--14:??
                 Xiaoxia Wu and   
                    Jian Li and   
                Lixin Zhang and   
               Evan Speight and   
               Ram Rajamony and   
                       Yuan Xie   Design exploration of hybrid caches with
                                  disparate memory technologies  . . . . . 15:1--15:??
          Kornilios Kourtis and   
            Georgios Goumas and   
              Nectarios Koziris   Exploiting compression opportunities to
                                  improve SpMxV performance on shared
                                  memory systems . . . . . . . . . . . . . 16:1--16:??

ACM Transactions on Architecture and Code Optimization
Volume 7, Number 4, December, 2010

            Betul Buyukkurt and   
                John Cortes and   
           Jason Villarreal and   
                Walid A. Najjar   Impact of high-level transformations
                                  within the ROCCC framework . . . . . . . 17:1--17:??
            Yuan-Shin Hwang and   
              Tzong-Yen Lin and   
                Rong-Guey Chang   DisIRer: Converting a retargetable
                                  compiler into a multiplatform binary
                                  translator . . . . . . . . . . . . . . . 18:1--18:??
              Michael Boyer and   
               David Tarjan and   
                  Kevin Skadron   Federation: Boosting per-thread
                                  performance of throughput-oriented
                                  manycore architectures . . . . . . . . . 19:1--19:??
             Grigori Fursin and   
                  Olivier Temam   Collective optimization: a practical
                                  collaborative approach . . . . . . . . . 20:1--20:??
                   Fang Liu and   
                    Yan Solihin   Understanding the behavior and
                                  implications of context switch misses    21:1--21:??


ACM Transactions on Architecture and Code Optimization
Volume 8, Number 1, April, 2011

              Stijn Eyerman and   
                Lieven Eeckhout   Fine-grained DVFS using on-chip
                                  regulators . . . . . . . . . . . . . . . 1:1--1:??
             Chen-Yong Cher and   
                    Eren Kursun   Exploring the effects of on-chip thermal
                                  variation on high-performance multicore
                                  architectures  . . . . . . . . . . . . . 2:1--2:??
             Carole-Jean Wu and   
             Margaret Martonosi   Adaptive timekeeping replacement:
                                  Fine-grained capacity management for
                                  shared CMP caches  . . . . . . . . . . . 3:1--3:??
                Lucas Vespa and   
                      Ning Weng   Deterministic finite automata
                                  characterization and optimization for
                                  scalable pattern matching  . . . . . . . 4:1--4:??
     Abhishek Bhattacharjee and   
         Gilberto Contreras and   
             Margaret Martonosi   Parallelization libraries:
                                  Characterizing and reducing overheads    5:1--5:??

ACM Transactions on Architecture and Code Optimization
Volume 8, Number 2, July, 2011

               Xiangyu Dong and   
                   Yuan Xie and   
       Naveen Muralimanohar and   
               Norman P. Jouppi   Hybrid checkpointing using emerging
                                  nonvolatile memories for future exascale
                                  systems  . . . . . . . . . . . . . . . . 6:1--6:??
                 Jianjun Li and   
               Chenggang Wu and   
                  Wei-Chung Hsu   Efficient and effective misaligned data
                                  access handling in a dynamic binary
                                  translation system . . . . . . . . . . . 7:1--7:??
         Guru Venkataramani and   
      Christopher J. Hughes and   
              Sanjeev Kumar and   
                Milos Prvulovic   DeFT: Design space exploration for
                                  on-the-fly detection of coherence misses 8:1--8:??
             Jason D. Hiser and   
         Daniel W. Williams and   
                     Wei Hu and   
           Jack W. Davidson and   
                 Jason Mars and   
              Bruce R. Childers   Evaluating indirect branch handling
                                  mechanisms in software dynamic
                                  translation systems  . . . . . . . . . . 9:1--9:??

ACM Transactions on Architecture and Code Optimization
Volume 8, Number 3, October, 2011

                 Xi E. Chen and   
                  Tor M. Aamodt   Hybrid analytical modeling of pending
                                  cache hits, data prefetching, and MSHRs  10:1--10:??
          Marios Kleanthous and   
             Yiannakis Sazeides   CATCH: a mechanism for dynamically
                                  detecting cache-content-duplication in
                                  instruction caches . . . . . . . . . . . 11:1--11:??
        Hans Vandierendonck and   
                   André Seznec   Managing SMT resource usage through
                                  speculative instruction window weighting 12:1--12:??
                Po-Han Wang and   
              Chia-Lin Yang and   
              Yen-Ming Chen and   
                  Yu-Jung Cheng   Power gating strategies on GPUs  . . . . 13:1--13:??
                   Min Feng and   
                  Chen Tian and   
               Changhui Lin and   
                    Rajiv Gupta   Dynamic access distance driven cache
                                  replacement  . . . . . . . . . . . . . . 14:1--14:??
                Ahmad Samih and   
                Yan Solihin and   
                   Anil Krishna   Evaluating placement policies for
                                  managing capacity sharing in CMP
                                  architectures with private caches  . . . 15:1--15:??
            Chang-Ching Yeh and   
           Kuei-Chung Chang and   
               Tien-Fu Chen and   
                   Chingwei Yeh   Maintaining performance on power gating
                                  of microprocessor functional units by
                                  using a predictive pre-wakeup strategy   16:1--16:??
                Hyunjin Lee and   
               Sangyeun Cho and   
              Bruce R. Childers   DEFCAM: a design and evaluation
                                  framework for defect-tolerant cache
                                  memories . . . . . . . . . . . . . . . . 17:1--17:??

ACM Transactions on Architecture and Code Optimization
Volume 8, Number 4, January, 2012

              Per Stenström and   
              Koen De Bosschere   Introduction to the special issue on
                                  high-performance and embedded
                                  architectures and compilers  . . . . . . 18:1--18:??
            Jorge Albericio and   
                 Rubén Gran and   
               Pablo Ibáñez and   
              Víctor Viñals and   
            Jose María Llabería   ABS: a low-cost adaptive controller for
                                  prefetching in a banked shared
                                  last-level cache . . . . . . . . . . . . 19:1--19:??
           Ali Galip Bayrak and   
          Nikola Velickovic and   
                Paolo Ienne and   
                 Wayne Burleson   An architecture-independent instruction
                                  shuffler to protect against side-channel
                                  attacks  . . . . . . . . . . . . . . . . 20:1--20:??
                 John Demme and   
            Simha Sethumadhavan   Approximate graph clustering for program
                                  characterization . . . . . . . . . . . . 21:1--21:??
              Mihai Pricopi and   
                   Tulika Mitra   Bahurupi: a polymorphic heterogeneous
                                  multi-core architecture  . . . . . . . . 22:1--22:??
         Jeroen V. Cleemput and   
               Bart Coppens and   
                Bjorn De Sutter   Compiler mitigations for time attacks on
                                  modern x86 processors  . . . . . . . . . 23:1--23:??
           Jason McCandless and   
                    David Gregg   Compiler techniques to improve dynamic
                                  branch prediction for indirect jump and
                                  call instructions  . . . . . . . . . . . 24:1--24:??
     Antonio García-Guirado and   
  Ricardo Fernández-Pascual and   
                Alberto Ros and   
                 José M. García   DAPSCO: Distance-aware partially shared
                                  cache organization . . . . . . . . . . . 25:1--25:??
             Zhenjiang Wang and   
               Chenggang Wu and   
              Pen-Chung Yew and   
                 Jianjun Li and   
                          Di Xu   On-the-fly structure splitting for heap
                                  objects  . . . . . . . . . . . . . . . . 26:1--26:??
               Dibyendu Das and   
      B. Dupont De Dinechin and   
          Ramakrishna Upadrasta   Efficient liveness computation using
                                  merge sets and DJ-graphs . . . . . . . . 27:1--27:??
          George Patsilaras and   
         Niket K. Choudhary and   
                     James Tuck   Efficiently exploiting memory level
                                  parallelism on asymmetric coupled cores
                                  in the dark silicon era  . . . . . . . . 28:1--28:??
               Roman Malits and   
             Evgeny Bolotin and   
            Avinoam Kolodny and   
                  Avi Mendelson   Exploring the limits of GPGPU scheduling
                                  in control flow bound applications . . . 29:1--29:??
                 Lois Orosa and   
            Elisardo Antelo and   
             Javier D. Bruguera   FlexSig: Implementing flexible hardware
                                  signatures . . . . . . . . . . . . . . . 30:1--30:??
            Ruben Titos-Gil and   
           Manuel E. Acacio and   
             Jose M. Garcia and   
                 Tim Harris and   
             Adrian Cristal and   
                Osman Unsal and   
                Ibrahim Hur and   
                   Mateo Valero   Hardware transactional memory with
                                  software-defined conflicts . . . . . . . 31:1--31:??
                Yongjoo Kim and   
                Jongeun Lee and   
                Toan X. Mai and   
                  Yunheung Paek   Improving performance of nested loops on
                                  reconfigurable array processors  . . . . 32:1--32:??
        Madhura Purnaprajna and   
                    Paolo Ienne   Making wide-issue VLIW processors viable
                                  on FPGAs . . . . . . . . . . . . . . . . 33:1--33:??
           Petar Radojković and   
             Sylvain Girbal and   
             Arnaud Grasset and   
           Eduardo Quiñones and   
                 Sami Yehia and   
           Francisco J. Cazorla   On the evaluation of the impact of
                                  shared resources in multithreaded COTS
                                  processors in time-critical environments 34:1--34:??
           Leonid Domnitser and   
               Aamer Jaleel and   
                 Jason Loew and   
          Nael Abu-Ghazaleh and   
               Dmitry Ponomarev   Non-monopolizable caches: Low-complexity
                                  mitigation of cache side channel attacks 35:1--35:??
             Alejandro Rico and   
            Felipe Cabarcas and   
          Carlos Villavieja and   
             Milan Pavlovic and   
               Augusto Vega and   
                Yoav Etsion and   
               Alex Ramirez and   
                   Mateo Valero   On the simulation of large-scale
                                  architectures using multiple application
                                  abstraction levels . . . . . . . . . . . 36:1--36:??
                Selma Saidi and   
           Pranav Tendulkar and   
             Thierry Lepley and   
                     Oded Maler   Optimizing explicit data transfers for
                                  data parallel applications on the Cell
                                  architecture . . . . . . . . . . . . . . 37:1--37:??
                   Min Feng and   
               Changhui Lin and   
                    Rajiv Gupta   PLDS: Partitioning linked data
                                  structures for parallelism . . . . . . . 38:1--38:??
            Benoit Pradelle and   
            Alain Ketterlin and   
                Philippe Clauss   Polyhedral parallelization of binary
                                  code . . . . . . . . . . . . . . . . . . 39:1--39:??
                 Yaozu Dong and   
                    Yu Chen and   
                Zhenhao Pan and   
                Jinquan Dai and   
                  Yunhong Jiang   ReNIC: Architectural extension to SR-IOV
                                  I/O virtualization for efficient
                                  replication  . . . . . . . . . . . . . . 40:1--40:??
           Tom M. Bruintjes and   
        Karel H. G. Walters and   
             Sabih H. Gerez and   
             Bert Molenkamp and   
              Gerard J. M. Smit   Sabrewing: a lightweight architecture
                                  for combined floating-point and integer
                                  arithmetic . . . . . . . . . . . . . . . 41:1--41:??
             Mario Kicherer and   
               Fabian Nowak and   
              Rainer Buchty and   
                  Wolfgang Karl   Seamlessly portable applications:
                                  Managing the diversity of modern
                                  heterogeneous systems  . . . . . . . . . 42:1--42:??
       Nathanael Premillieu and   
                   Andre Seznec   SYRANT: SYmmetric Resource Allocation on
                                  Not-taken and Taken paths  . . . . . . . 43:1--43:??
        William Hasenplaugh and   
           Pritpal S. Ahuja and   
               Aamer Jaleel and   
           Simon Steely Jr. and   
                      Joel Emer   The gradient-based cache partitioning
                                  algorithm  . . . . . . . . . . . . . . . 44:1--44:??
                Javier Lira and   
           Timothy M. Jones and   
              Carlos Molina and   
               Antonio González   The migration prefetcher: Anticipating
                                  data promotion in dynamic NUCA caches    45:1--45:??
     Kishore Kumar Pusukuri and   
                Rajiv Gupta and   
                Laxmi N. Bhuyan   Thread Tranquilizer: Dynamically
                                  reducing performance variation . . . . . 46:1--46:??
             Dongsong Zhang and   
                   Deke Guo and   
              Fangyuan Chen and   
                     Fei Wu and   
                    Tong Wu and   
                   Ting Cao and   
                     Shiyao Jin   TL-plane-based multi-core
                                  energy-efficient real-time scheduling
                                  algorithm for sporadic tasks . . . . . . 47:1--47:??
           Michael J. Lyons and   
             Mark Hempstead and   
                Gu-Yeon Wei and   
                   David Brooks   The accelerator store: a shared memory
                                  framework for accelerator-based systems  48:1--48:??
              Daniel Orozco and   
               Elkin Garcia and   
                 Rishi Khan and   
           Kelly Livingston and   
                   Guang R. Gao   Toward high-throughput algorithms on
                                  many-core architectures  . . . . . . . . 49:1--49:??
                Kevin Stock and   
         Louis-Noël Pouchet and   
                  P. Sadayappan   Using machine learning to improve
                                  automatic vectorization  . . . . . . . . 50:1--50:??
     Kanit Therdsteerasukdi and   
               Gyungsu Byun and   
                 Jason Cong and   
             M. Frank Chang and   
                  Glenn Reinman   Utilizing RF-I and intelligent
                                  scheduling for better throughput/watt in
                                  a mobile GPU memory system . . . . . . . 51:1--51:??
        Frederick Ryckbosch and   
             Stijn Polfliet and   
                Lieven Eeckhout   VSim: Simulating multi-server setups at
                                  near native hardware speed . . . . . . . 52:1--52:??
                  Miao Zhou and   
                      Yu Du and   
             Bruce Childers and   
                Rami Melhem and   
                   Daniel Mossé   Writeback-aware partitioning and
                                  replacement for last-level caches in
                                  phase change main memory systems . . . . 53:1--53:??
              Qingping Wang and   
            Sameer Kulkarni and   
               John Cavazos and   
                  Michael Spear   A transactional memory with automatic
                                  performance tuning . . . . . . . . . . . 54:1--54:??
          Bartosz Bogdanski and   
          Sven-Arne Reinemo and   
    Frank Olaf Sem-Jacobsen and   
              Ernst Gunnar Gran   sFtree: a fully connected and
                                  deadlock-free switch-to-switch routing
                                  algorithm for fat-trees  . . . . . . . . 55:1--55:??


ACM Transactions on Architecture and Code Optimization
Volume 9, Number 1, March, 2012

          Walid J. Ghandour and   
             Haitham Akkary and   
                      Wes Masri   Leveraging Strength-Based Dynamic
                                  Information Flow Analysis to Enhance
                                  Data Value Prediction  . . . . . . . . . 1:1--1:??
                 Jaekyu Lee and   
                Hyesoon Kim and   
                  Richard Vuduc   When Prefetching Works, When It Doesn't,
                                  and Why  . . . . . . . . . . . . . . . . 2:1--2:??
               Bita Mazloom and   
          Shashidhar Mysore and   
               Mohit Tiwari and   
              Banit Agrawal and   
                   Tim Sherwood   Dataflow Tomography: Information Flow
                                  Tracking For Understanding and
                                  Visualizing Full Systems . . . . . . . . 3:1--3:??
                Jung Ho Ahn and   
           Norman P. Jouppi and   
         Christos Kozyrakis and   
             Jacob Leverich and   
            Robert S. Schreiber   Improving System Energy Efficiency with
                                  Memory Rank Subsetting . . . . . . . . . 4:1--4:??
                Xuejun Yang and   
                    Li Wang and   
               Jingling Xue and   
                      Qingbo Wu   Comparability Graph Coloring for
                                  Optimizing Utilization of
                                  Software-Managed Stream Register Files
                                  for Stream Processors  . . . . . . . . . 5:1--5:??
        Abhinandan Majumdar and   
            Srihari Cadambi and   
             Michela Becchi and   
       Srimat T. Chakradhar and   
                Hans Peter Graf   A Massively Parallel, Energy Efficient
                                  Programmable Accelerator for Learning
                                  and Classification . . . . . . . . . . . 6:1--6:??

ACM Transactions on Architecture and Code Optimization
Volume 9, Number 2, June, 2012

              Stijn Eyerman and   
                Lieven Eeckhout   Probabilistic modeling for job symbiosis
                                  scheduling on SMT processors . . . . . . 7:1--7:??
              Rachid Seghir and   
           Vincent Loechner and   
                 Benoît Meister   Integer affine transformations of
                                  parametric $Z$-polytopes and
                                  applications to loop nest optimization   8:1--8:??
                    Yi Yang and   
                 Ping Xiang and   
               Jingfei Kong and   
                Mike Mantor and   
                   Huiyang Zhou   A unified optimizing compiler framework
                                  for different GPGPU architectures  . . . 9:1--9:??
               Choonki Jang and   
                 Jaejin Lee and   
             Bernhard Egger and   
                    Soojung Ryu   Automatic code overlay generation and
                                  partially redundant code fetch
                                  elimination  . . . . . . . . . . . . . . 10:1--10:??
               Zahra Abbasi and   
     Georgios Varsamopoulos and   
            Sandeep K. S. Gupta   TACOMA: Server and workload management
                                  in Internet data centers considering
                                  cooling-computing power trade-off and
                                  energy proportionality . . . . . . . . . 11:1--11:??
             Andreas Lankes and   
                Thomas Wild and   
        Stefan Wallentowitz and   
            Andreas Herkersdorf   Benefits of selective packet discard in
                                  networks-on-chip . . . . . . . . . . . . 12:1--12:??

ACM Transactions on Architecture and Code Optimization
Volume 9, Number 3, September, 2012

               Yangchun Luo and   
                   Antonia Zhai   Dynamically dispatching speculative
                                  threads to improve sequential execution  13:1--13:??
                 Huimin Cui and   
               Jingling Xue and   
                   Lei Wang and   
                  Yang Yang and   
              Xiaobing Feng and   
                    Dongrui Fan   Extendable pattern-oriented optimization
                                  directives . . . . . . . . . . . . . . . 14:1--14:??
            Adam Wade Lewis and   
            Nian-Feng Tzeng and   
                   Soumik Ghosh   Runtime energy consumption estimation
                                  for server workloads based on chaotic
                                  time-series approximation  . . . . . . . 15:1--15:??
           Alejandro Valero and   
           Julio Sahuquillo and   
             Salvador Petit and   
                Pedro López and   
                     José Duato   Combining recency of information with
                                  selective random and a victim cache in
                                  last-level caches  . . . . . . . . . . . 16:1--16:??
                     Bin Li and   
              Li-Shiuan Peh and   
                    Li Zhao and   
                      Ravi Iyer   Dynamic QoS management for chip
                                  multiprocessors  . . . . . . . . . . . . 17:1--17:??
      Polychronis Xekalakis and   
            Nikolas Ioannou and   
                 Marcelo Cintra   Mixed speculative multithreaded
                                  execution models . . . . . . . . . . . . 18:1--18:??
        Mageda Sharafeddine and   
                Komal Jothi and   
                 Haitham Akkary   Disjoint out-of-order execution
                                  processor  . . . . . . . . . . . . . . . 19:1--19:??
              Diego Andrade and   
        Basilio B. Fraguela and   
                   Ramón Doallo   Static analysis of the worst-case memory
                                  performance for irregular codes with
                                  indirections . . . . . . . . . . . . . . 20:1--20:??
                  Yang Chen and   
              Shuangde Fang and   
              Yuanjie Huang and   
            Lieven Eeckhout and   
             Grigori Fursin and   
              Olivier Temam and   
                   Chengyong Wu   Deconstructing iterative optimization    21:1--21:??
                 Apala Guha and   
              Kim Hazelwood and   
                 Mary Lou Soffa   Memory optimization of dynamic binary
                                  translators for embedded systems . . . . 22:1--22:??
            James R. Geraci and   
                Sharon M. Sacco   A transpose-free in-place SIMD optimized
                                  FFT  . . . . . . . . . . . . . . . . . . 23:1--23:??

ACM Transactions on Architecture and Code Optimization
Volume 9, Number 4, January, 2013

               Bart Coppens and   
            Bjorn De Sutter and   
                    Jonas Maebe   Feedback-driven binary code
                                  diversification  . . . . . . . . . . . . 24:1--24:??
              Jeremy Fowers and   
                 Greg Brown and   
              John Wernsing and   
                     Greg Stitt   A performance and energy comparison of
                                  convolution on GPUs, FPGAs, and
                                  multicore processors . . . . . . . . . . 25:1--25:??
                Erven Rohou and   
             Kevin Williams and   
                    David Yuste   Vectorization technology to improve
                                  interpreter performance  . . . . . . . . 26:1--26:??
               Jimmy Cleary and   
              Owen Callanan and   
               Mark Purcell and   
                    David Gregg   Fast asymmetric thread synchronization   27:1--27:??
                    Yong Li and   
                Rami Melhem and   
                  Alex K. Jones   PS-TLB: Leveraging page classification
                                  information for fast, scalable and
                                  efficient translation for future CMPs    28:1--28:??
            Kristof Du Bois and   
              Stijn Eyerman and   
                Lieven Eeckhout   Per-thread cycle accounting in multicore
                                  processors . . . . . . . . . . . . . . . 29:1--29:??
           Christian Wimmer and   
              Michael Haupt and   
   Michael L. Van De Vanter and   
                Mick Jordan and   
             Laurent Daynès and   
                  Douglas Simon   Maxine: an approachable virtual machine
                                  for, and in, Java  . . . . . . . . . . . 30:1--30:??
                 Malik Khan and   
               Protonu Basu and   
                  Gabe Rudy and   
                  Mary Hall and   
                  Chun Chen and   
               Jacqueline Chame   A script-based autotuning compiler
                                  system to generate high-performance CUDA
                                  code . . . . . . . . . . . . . . . . . . 31:1--31:??
        Kenzo Van Craeynest and   
                Lieven Eeckhout   Understanding fundamental design choices
                                  in single-ISA heterogeneous multicore
                                  architectures  . . . . . . . . . . . . . 32:1--32:??
               Samuel Antão and   
                   Leonel Sousa   The CRNS framework and its application
                                  to programmable and reconfigurable
                                  cryptography . . . . . . . . . . . . . . 33:1--33:??
             Boubacar Diouf and   
                 Can Hantas and   
               Albert Cohen and   
               Özcan Özturk and   
                  Jens Palsberg   A decoupled local memory allocator . . . 34:1--34:??
                 Huimin Cui and   
                    Qing Yi and   
               Jingling Xue and   
                  Xiaobing Feng   Layout-oblivious compiler optimization
                                  for matrix computations  . . . . . . . . 35:1--35:??
              Stephen Dolan and   
       Servesh Muralidharan and   
                    David Gregg   Compiler support for lightweight context
                                  switching  . . . . . . . . . . . . . . . 36:1--36:??
                 Pablo Abad and   
            Valentin Puente and   
            Jose-Angel Gregorio   LIGERO: a light but efficient router
                                  conceived for cache-coherent chip
                                  multiprocessors  . . . . . . . . . . . . 37:1--37:??
            Jorge Albericio and   
               Pablo Ibáñez and   
              Víctor Viñals and   
            Jose María Llabería   Exploiting reuse locality on inclusive
                                  shared last-level caches . . . . . . . . 38:1--38:??
        Paraskevas Yiapanis and   
           Demian Rosas-Ham and   
                Gavin Brown and   
                    Mikel Luján   Optimizing software runtime systems for
                                  speculative parallelization  . . . . . . 39:1--39:??
            Cedric Nugteren and   
             Pieter Custers and   
                 Henk Corporaal   Algorithmic species: a classification of
                                  affine loop nests for parallel
                                  programming  . . . . . . . . . . . . . . 40:1--40:??
        Marco E. T. Gerards and   
                      Jan Kuper   Optimal DPM and DVFS for frame-based
                                  real-time systems  . . . . . . . . . . . 41:1--41:??
                Zhichao Yan and   
                 Hong Jiang and   
                 Yujuan Tan and   
                       Dan Feng   An integrated pseudo-associativity and
                                  relaxed-order approach to hardware
                                  transactional memory . . . . . . . . . . 42:1--42:??
                 Doris Chen and   
                Deshanand Singh   Profile-guided floating- to fixed-point
                                  conversion for hybrid FPGA-processor
                                  applications . . . . . . . . . . . . . . 43:1--43:??
                    Yan Cui and   
               Yingxin Wang and   
                    Yu Chen and   
                   Yuanchun Shi   Lock-contention-aware scheduler: a
                                  scalable and energy-efficient method for
                                  addressing scalability collapse on
                                  multicore systems  . . . . . . . . . . . 44:1--44:??
     Kishore Kumar Pusukuri and   
                Rajiv Gupta and   
                Laxmi N. Bhuyan   ADAPT: a framework for coscheduling
                                  multithreaded programs . . . . . . . . . 45:1--45:??
            Michele Tartara and   
        Stefano Crespi Reghizzi   Continuous learning of compiler
                                  heuristics . . . . . . . . . . . . . . . 46:1--46:??
          Grigorios Chrysos and   
     Panagiotis Dagritzikos and   
     Ioannis Papaefstathiou and   
               Apostolos Dollas   HC-CART: a parallel system
                                  implementation of data mining
                                  classification and regression tree
                                  (CART) algorithm on a multi-FPGA system  47:1--47:??
                Jongwon Lee and   
                   Yohan Ko and   
              Kyoungwoo Lee and   
            Jonghee M. Youn and   
                  Yunheung Paek   Dynamic code duplication with
                                  vulnerability awareness for soft error
                                  detection on VLIW architectures  . . . . 48:1--48:??
              Fabien Coelho and   
               François Irigoin   API compilation for image hardware
                                  accelerators . . . . . . . . . . . . . . 49:1--49:??
               Carlos Luque and   
              Miquel Moreto and   
       Francisco J. Cazorla and   
                   Mateo Valero   Fair CPU time accounting in CMP+SMT
                                  processors . . . . . . . . . . . . . . . 50:1--50:??
       Pavlos M. Mattheakis and   
         Ioannis Papaefstathiou   Significantly reducing MPI
                                  intercommunication latency and power
                                  overhead in both embedded and HPC
                                  systems  . . . . . . . . . . . . . . . . 51:1--51:??
            Riyadh Baghdadi and   
               Albert Cohen and   
           Sven Verdoolaege and   
              Konrad Trifunović   Improved loop tiling based on the
                                  removal of spurious false dependences    52:1--52:??
                Antoniu Pop and   
                   Albert Cohen   OpenStream: Expressiveness and data-flow
                                  compilation of OpenMP streaming programs 53:1--53:??
           Sven Verdoolaege and   
          Juan Carlos Juega and   
               Albert Cohen and   
         José Ignacio Gómez and   
         Christian Tenllado and   
               Francky Catthoor   Polyhedral parallel code generation for
                                  CUDA . . . . . . . . . . . . . . . . . . 54:1--54:??
                      Yu Du and   
                  Miao Zhou and   
             Bruce Childers and   
                Rami Melhem and   
                   Daniel Mossé   Delta-compressed caching for overcoming
                                  the write bandwidth limitation of hybrid
                                  main memory  . . . . . . . . . . . . . . 55:1--55:??
              Suresh Purini and   
                   Lakshya Jain   Finding good optimization sequences
                                  covering program space . . . . . . . . . 56:1--56:??
       Mehmet E. Belviranli and   
            Laxmi N. Bhuyan and   
                    Rajiv Gupta   A dynamic self-scheduling scheme for
                                  heterogeneous multiprocessor
                                  architectures  . . . . . . . . . . . . . 57:1--57:??
                Anurag Negi and   
                Ruben Titos-Gil   SCIN-cache: Fast speculative versioning
                                  in multithreaded cores . . . . . . . . . 58:1--58:??
               Thibaut Lutz and   
           Christian Fensch and   
                    Murray Cole   PARTANS: an autotuning framework for
                                  stencil computation on multi-GPU systems 59:1--59:??
               Chunhua Xiao and   
           M-C. Frank Chang and   
                 Jason Cong and   
               Michael Gill and   
             Zhangqin Huang and   
                Chunyue Liu and   
              Glenn Reinman and   
                         Hao Wu   Stream arbitration: Towards efficient
                                  bandwidth utilization for emerging
                                  on-chip interconnects  . . . . . . . . . 60:1--60:??


ACM Transactions on Architecture and Code Optimization
Volume 10, Number 1, April, 2013

                 Yunji Chen and   
               Tianshi Chen and   
                    Ling Li and   
                 Ruiyang Wu and   
                  Daofu Liu and   
                       Weiwu Hu   Deterministic Replay Using Global Clock  1:1--1:??
              Daniel Lustig and   
     Abhishek Bhattacharjee and   
             Margaret Martonosi   TLB Improvements for Chip
                                  Multiprocessors: Inter-Core Cooperative
                                  Prefetchers and Shared Last-Level TLBs   2:1--2:??
                  Rong Chen and   
                     Haibo Chen   Tiled-MapReduce: Efficient and Flexible
                                  MapReduce Processing on Multicore with
                                  Tiling . . . . . . . . . . . . . . . . . 3:1--3:??
             Michela Becchi and   
                Patrick Crowley   A-DFA: a Time- and Space-Efficient DFA
                                  Compression Algorithm for Fast Regular
                                  Expression Evaluation  . . . . . . . . . 4:1--4:26
                   Sheng Li and   
                Jung Ho Ahn and   
          Richard D. Strong and   
            Jay B. Brockman and   
            Dean M. Tullsen and   
               Norman P. Jouppi   The McPAT Framework for Multicore and
                                  Manycore Architectures: Simultaneously
                                  Modeling Power, Area, and Timing . . . . 5:1--5:??

ACM Transactions on Architecture and Code Optimization
Volume 10, Number 2, May, 2013

        Angeliki Kritikakou and   
           Francky Catthoor and   
       George S. Athanasiou and   
        Vasilios Kelefouras and   
                  Costas Goutis   Near-Optimal Microprocessor and
                                  Accelerators Codesign with Latency and
                                  Throughput Constraints . . . . . . . . . 6:1--6:??
                  Lei Jiang and   
                      Yu Du and   
                    Bo Zhao and   
               Youtao Zhang and   
          Bruce R. Childers and   
                       Jun Yang   Hardware-Assisted Cooperative
                                  Integration of Wear-Leveling and
                                  Salvaging for Phase Change Memory  . . . 7:1--7:??
               Kyuseung Han and   
                Junwhan Ahn and   
                   Kiyoung Choi   Power-Efficient Predication Techniques
                                  for Acceleration of Control Flow
                                  Execution on CGRA  . . . . . . . . . . . 8:1--8:??
                  Chao Wang and   
                      Xi Li and   
              Junneng Zhang and   
                Xuehai Zhou and   
                   Xiaoning Nie   MP-Tomasulo: a Dependency-Aware
                                  Automatic Parallel Execution Engine for
                                  Sequential Programs  . . . . . . . . . . 9:1--9:??

ACM Transactions on Architecture and Code Optimization
Volume 10, Number 3, September, 2013

                      Anonymous   TACO Reviewers 2012  . . . . . . . . . . 9:1--9:??
                Eran Shifer and   
                   Shlomo Weiss   Low-latency adaptive mode transitions
                                  and hierarchical power management in
                                  asymmetric clustered cores . . . . . . . 10:1--10:??
             Yosi Ben Asher and   
                    Nadav Rotem   Hybrid type legalization for a sparse
                                  SIMD instruction set . . . . . . . . . . 11:1--11:??
                 Yuanwu Lei and   
                   Yong Dou and   
                    Lei Guo and   
                   Jinbo Xu and   
                   Jie Zhou and   
                Yazhuo Dong and   
                    Hongjian Li   VLIW coprocessor for IEEE-754
                                  quadruple-precision elementary functions 12:1--12:??
          Motohiro Kawahito and   
            Hideaki Komatsu and   
             Takao Moriyama and   
              Hiroshi Inoue and   
                Toshio Nakatani   Idiom recognition framework using
                                  topological embedding  . . . . . . . . . 13:1--13:??
            Ghassan Shobaki and   
            Maxim Shawabkeh and   
        Najm Eldeen Abu Rmaileh   Preallocation instruction scheduling
                                  with register pressure minimization
                                  using a combinatorial optimization
                                  approach . . . . . . . . . . . . . . . . 14:1--14:??
                Dongrui She and   
                   Yifan He and   
                 Henk Corporaal   An energy-efficient method of supporting
                                  flexible special instructions in an
                                  embedded processor with compact ISA  . . 15:1--15:??
       V. Krishna Nandivada and   
               Rajkishore Barik   Improved bitwidth-aware variable packing 16:1--16:??
                Jung Ho Ahn and   
             Young Hoon Son and   
                       John Kim   Scalable high-radix router
                                  microarchitecture using a network switch
                                  organization . . . . . . . . . . . . . . 17:1--17:??
                 Libo Huang and   
               Zhiying Wang and   
                  Nong Xiao and   
               Yongwen Wang and   
                      Qiang Dou   Adaptive communication mechanism for
                                  accelerating MPI functions in NoC-based
                                  multicore processors . . . . . . . . . . 18:1--18:??
              Avinash Malik and   
                    David Gregg   Orchestrating stream graphs using model
                                  checking . . . . . . . . . . . . . . . . 19:1--19:??
                 Zheng Wang and   
          Michael F. P. O'Boyle   Using machine learning to partition
                                  streaming programs . . . . . . . . . . . 20:1--20:??
                Ali Bakhoda and   
                   John Kim and   
                  Tor M. Aamodt   Designing on-chip networks for
                                  throughput accelerators  . . . . . . . . 21:1--21:??

ACM Transactions on Architecture and Code Optimization
Volume 10, Number 4, December, 2013

           Michael R. Jantz and   
             Prasad A. Kulkarni   Exploring single and multilevel JIT
                                  compilation policy for modern machines   22:1--22:??
               Xiangyu Dong and   
           Norman P. Jouppi and   
                       Yuan Xie   A circuit-architecture co-optimization
                                  framework for exploring nonvolatile
                                  memory hierarchies . . . . . . . . . . . 23:1--23:??
                Jishen Zhao and   
                Guangyu Sun and   
             Gabriel H. Loh and   
                       Yuan Xie   Optimizing GPU energy efficiency with
                                  3D die-stacking graphics memory and
                                  reconfigurable memory interface  . . . . 24:1--24:??
             Chien-Chi Chen and   
                  Sheng-De Wang   An efficient multicharacter transition
                                  string-matching engine based on the
                                  Aho--Corasick algorithm  . . . . . . . . 25:1--25:??
               Yangchun Luo and   
              Wei-Chung Hsu and   
                   Antonia Zhai   The design and implementation of
                                  heterogeneous multicore systems for
                                  energy-efficient speculative thread
                                  execution  . . . . . . . . . . . . . . . 26:1--26:??
                 Dyer Rolán and   
        Basilio B. Fraguela and   
                   Ramón Doallo   Virtually split cache: an efficient
                                  mechanism to distribute instructions and
                                  data . . . . . . . . . . . . . . . . . . 27:1--27:??
      Samantika Subramaniam and   
            Simon C. Steely and   
           Will Hasenplaugh and   
               Aamer Jaleel and   
              Carl Beckmann and   
             Tryggve Fossum and   
                      Joel Emer   Using in-flight chains to build a
                                  scalable cache coherence protocol  . . . 28:1--28:??
             Daniel Sánchez and   
         Yiannakis Sazeides and   
            Juan M. Cebrián and   
             José M. García and   
                 Juan L. Aragón   Modeling the impact of permanent faults
                                  in caches  . . . . . . . . . . . . . . . 29:1--29:??
               Sanghoon Lee and   
                     James Tuck   Automatic parallelization of
                                  fine-grained metafunctions on a chip
                                  multiprocessor . . . . . . . . . . . . . 30:1--30:??
          Christophe Dubach and   
           Timothy M. Jones and   
               Edwin V. Bonilla   Dynamic microarchitectural adaptation
                                  using machine learning . . . . . . . . . 31:1--31:??
                  Long Chen and   
                  Yanan Cao and   
                     Zhao Zhang   E³CC: a memory error protection
                                  scheme with novel address mapping for
                                  subranked and low-power memories . . . . 32:1--32:??
              Yingying Tian and   
             Samira M. Khan and   
              Daniel A. Jiménez   Temporal-based multilevel correlating
                                  inclusive cache replacement  . . . . . . 33:1--33:??
                 Qixiao Liu and   
              Miquel Moreto and   
             Victor Jimenez and   
               Jaume Abella and   
       Francisco J. Cazorla and   
                   Mateo Valero   Hardware support for accurate per-task
                                  energy metering in multicore systems . . 34:1--34:??
               Sanyam Mehta and   
            Gautham Beeraka and   
                  Pen-Chung Yew   Tile size selection revisited  . . . . . 35:1--35:??
           Bogdan Prisacari and   
           German Rodriguez and   
          Cyriel Minkenberg and   
                Torsten Hoefler   Fast pattern-specific routing for fat
                                  tree networks  . . . . . . . . . . . . . 36:1--36:??
      Maximilien B. Breughe and   
                Lieven Eeckhout   Selecting representative benchmark
                                  inputs for exploring microprocessor
                                  design spaces  . . . . . . . . . . . . . 37:1--37:??
     Christoph Kerschbaumer and   
              Eric Hennigan and   
                 Per Larsen and   
          Stefan Brunthaler and   
                  Michael Franz   Information flow tracking meets
                                  just-in-time compilation . . . . . . . . 38:1--38:??
                   Rupesh Nasre   Time- and space-efficient flow-sensitive
                                  points-to analysis . . . . . . . . . . . 39:1--39:??
                Wenjia Ruan and   
                  Yujie Liu and   
                  Michael Spear   Boosting timestamp-based transactional
                                  memory by exploiting hardware cycle
                                  counters . . . . . . . . . . . . . . . . 40:1--40:??
                 Tanima Dey and   
                   Wei Wang and   
           Jack W. Davidson and   
                 Mary Lou Soffa   ReSense: Mapping dynamic workloads of
                                  colocated multithreaded applications
                                  using resource sensitivity . . . . . . . 41:1--41:??
             Adrià Armejach and   
            Ruben Titos-Gil and   
                Anurag Negi and   
             Osman S. Unsal and   
                 Adrián Cristal   Techniques to improve performance in
                                  requester-wins hardware transactional
                                  memory . . . . . . . . . . . . . . . . . 42:1--42:??
             Myeongjae Jeon and   
                Conglong Li and   
                Alan L. Cox and   
                   Scott Rixner   Reducing DRAM row activations with eager
                                  read/write clustering  . . . . . . . . . 43:1--43:??
                Zhijia Zhao and   
           Michael Bebenita and   
                Dave Herman and   
                Jianhua Sun and   
                    Xipeng Shen   HPar: a practical parallel parser for
                                  HTML --- taming HTML complexities for
                                  parallel parsing . . . . . . . . . . . . 44:1--44:??
               Ehsan Totoni and   
                Mert Dikmen and   
           María Jesús Garzarán   Easy, fast, and energy-efficient object
                                  detection on heterogeneous on-chip
                                  architectures  . . . . . . . . . . . . . 45:1--45:??
      Viacheslav V. Fedorov and   
                  Sheng Qiu and   
      A. L. Narasimha Reddy and   
                  Paul V. Gratz   ARI: Adaptive LLC-memory traffic
                                  management . . . . . . . . . . . . . . . 46:1--46:??
   Cecilia González-Álvarez and   
         Jennifer B. Sartor and   
             Carlos Álvarez and   
    Daniel Jiménez-González and   
                Lieven Eeckhout   Accelerating an application domain with
                                  specialized functional units . . . . . . 47:1--47:??
               Xiaolin Wang and   
               Lingmei Weng and   
               Zhenlin Wang and   
                    Yingwei Luo   Revisiting memory management on
                                  virtualized environments . . . . . . . . 48:1--48:??
              Chuntao Jiang and   
                  Zhibin Yu and   
                    Hai Jin and   
              Chengzhong Xu and   
            Lieven Eeckhout and   
                Wim Heirman and   
          Trevor E. Carlson and   
                   Xiaofei Liao   PCantorSim: Accelerating parallel
                                  architecture simulation through
                                  fractal-based sampling . . . . . . . . . 49:1--49:??
               Srdan Stipić and   
           Vesna Smiljković and   
                Osman Unsal and   
             Adrián Cristal and   
                   Mateo Valero   Profile-guided transaction
                                  coalescing-lowering transactional
                                  overheads by merging transactions  . . . 50:1--50:??
                   Zhe Wang and   
              Shuchang Shan and   
                   Ting Cao and   
                   Junli Gu and   
                      Yi Xu and   
                   Shuai Mu and   
                   Yuan Xie and   
              Daniel A. Jiménez   WADE: Writeback-aware dynamic cache
                                  management for NVM-based main memory
                                  system . . . . . . . . . . . . . . . . . 51:1--51:??
                    Yong Li and   
               Yaojun Zhang and   
                     Hai Li and   
                 Yiran Chen and   
                  Alex K. Jones   C1C: a configurable, compiler-guided
                                  STT-RAM L1 cache . . . . . . . . . . . . 52:1--52:??
              Naznin Fauzia and   
            Venmugil Elango and   
         Mahesh Ravishankar and   
               J. Ramanujam and   
           Fabrice Rastello and   
             Atanas Rountev and   
         Louis-Noël Pouchet and   
                  P. Sadayappan   Beyond reuse distance analysis: Dynamic
                                  analysis for characterization of data
                                  locality potential . . . . . . . . . . . 53:1--53:??
          Alen Bardizbanyan and   
           Magnus Själander and   
              David Whalley and   
            Per Larsson-Edefors   Designing a practical data filter cache
                                  to improve both energy efficiency and
                                  performance  . . . . . . . . . . . . . . 54:1--54:??
            Andrei Hagiescu and   
                   Bing Liu and   
              R. Ramanathan and   
  Sucheendra K. Palaniappan and   
                  Zheng Cui and   
       Bipasa Chattopadhyay and   
          P. S. Thiagarajan and   
                  Weng-Fai Wong   GPU code generation for ODE-based
                                  applications with phased shared-data
                                  access patterns  . . . . . . . . . . . . 55:1--55:??
                Junghee Lee and   
    Chrysostomos Nicopoulos and   
              Hyung Gyu Lee and   
                    Jongman Kim   TornadoNoC: a lightweight and scalable
                                  on-chip network architecture for the
                                  many-core era  . . . . . . . . . . . . . 56:1--56:??
           Christos Strydis and   
          Robert M. Seepers and   
          Pedro Peris-Lopez and   
           Dimitrios Siskos and   
                Ioannis Sourdis   A system architecture, processor, and
                                  communication protocol for secure
                                  implants . . . . . . . . . . . . . . . . 57:1--57:??
                 Wonsub Kim and   
               Yoonseo Choi and   
                    Haewoo Park   Fast modulo scheduler utilizing
                                  patternized routes for coarse-grained
                                  reconfigurable architectures . . . . . . 58:1--58:??
               Dorit Nuzman and   
               Revital Eres and   
              Sergei Dyshel and   
         Marcel Zalmanovici and   
                  Jose Castanos   JIT technology with C/C++:
                                  Feedback-directed dynamic recompilation
                                  for statically compiled languages  . . . 59:1--59:??
          Thejas Ramashekar and   
                Uday Bondhugula   Automatic data allocation and buffer
                                  management for multi-GPU machines  . . . 60:1--60:??
        Hans Vandierendonck and   
            George Tzenakis and   
      Dimitrios S. Nikolopoulos   Analysis of dependence tracking
                                  algorithms for task dataflow execution   61:1--61:??
             Yeonghun Jeong and   
              Seongseok Seo and   
                    Jongeun Lee   Evaluator-executor transformation for
                                  efficient pipelining of loops with
                                  conditionals . . . . . . . . . . . . . . 62:1--62:??
           Rajkishore Barik and   
               Jisheng Zhao and   
                   Vivek Sarkar   A decoupled non-SSA global register
                                  allocation using bipartite liveness
                                  graphs . . . . . . . . . . . . . . . . . 63:1--63:??
                Peter Gavin and   
              David Whalley and   
               Magnus Själander   Reducing instruction fetch energy in
                                  multi-issue processors . . . . . . . . . 64:1--64:??
                      Anonymous   List of distinguished reviewers ACM TACO 65:1--65:??


ACM Transactions on Architecture and Code Optimization
Volume 11, Number 1, February, 2014

                Neeraj Goel and   
               Anshul Kumar and   
            Preeti Ranjan Panda   Shared-port register file architecture
                                  for low-energy VLIW processors . . . . . 1:1--1:??
                 Zheng Wang and   
       Georgios Tournavitis and   
               Björn Franke and   
          Michael F. P. O'Boyle   Integrating profile-driven parallelism
                                  detection and machine-learning-based
                                  mapping  . . . . . . . . . . . . . . . . 2:1--2:??
             Mehrzad Samadi and   
               Amir Hormati and   
              Janghaeng Lee and   
                   Scott Mahlke   Leveraging GPUs using cooperative loop
                                  speculation  . . . . . . . . . . . . . . 3:1--3:??
                   Jue Wang and   
               Xiangyu Dong and   
                   Yuan Xie and   
               Norman P. Jouppi   Endurance-aware cache line management
                                  for non-volatile caches  . . . . . . . . 4:1--4:??
                    Lei Liu and   
                  Zehan Cui and   
                    Yong Li and   
                Yungang Bao and   
                Mingyu Chen and   
                   Chengyong Wu   BPM/BPM+: Software-based dynamic memory
                                  partitioning mechanisms for mitigating
                                  DRAM bank-/channel-level interferences
                                  in multicore systems . . . . . . . . . . 5:1--5:??
            Christian Häubl and   
           Christian Wimmer and   
           Hanspeter Mössenböck   Trace transitioning and exception
                                  handling in a trace-based JIT compiler
                                  for Java . . . . . . . . . . . . . . . . 6:1--6:??
             Yongbing Huang and   
               Licheng Chen and   
                  Zehan Cui and   
                  Yuan Ruan and   
                Yungang Bao and   
                Mingyu Chen and   
                    Ninghui Sun   HMTT: a hybrid hardware/software tracing
                                  system for bridging the DRAM access
                                  trace's semantic gap . . . . . . . . . . 7:1--7:??
                  Quan Chen and   
                      Minyi Guo   Adaptive workload-aware task scheduling
                                  for single-ISA asymmetric multicore
                                  architectures  . . . . . . . . . . . . . 8:1--8:??
                Komal Jothi and   
                 Haitham Akkary   Tuning the continual flow pipeline
                                  architecture with virtual register
                                  renaming . . . . . . . . . . . . . . . . 11:1--11:??
        Angeliki Kritikakou and   
           Francky Catthoor and   
        Vasilios Kelefouras and   
                  Costas Goutis   A scalable and near-optimal
                                  representation of access schemes for
                                  memory management  . . . . . . . . . . . 13:1--13:??
               Hugh Leather and   
              Edwin Bonilla and   
                Michael O'Boyle   Automatic feature generation for machine
                                  learning--based optimising compilation   14:1--14:??

ACM Transactions on Architecture and Code Optimization
Volume 11, Number 2, June, 2014

                    Bin Ren and   
             Todd Mytkowicz and   
                  Gagan Agrawal   A Portable Optimization Engine for
                                  Accelerating Irregular Data-Traversal
                                  Applications on SIMD Architectures . . . 16:1--16:??
               Bor-Yeh Shen and   
              Wei-Chung Hsu and   
                       Wuu Yang   A Retargetable Static Binary Translator
                                  for the ARM Architecture . . . . . . . . 18:1--18:??
        Darío Suárez Gracia and   
         Alexandra Ferrerón and   
   Luis Montesano Del Campo and   
       Teresa Monreal Arnal and   
           Víctor Viñals Yúfera   Revisiting LP--NUCA Energy Consumption:
                                  Cache Access Policies and Adaptive Block
                                  Dropping . . . . . . . . . . . . . . . . 19:1--19:??
              Shuangde Fang and   
                  Zidong Du and   
                Yuntan Fang and   
              Yuanjie Huang and   
                  Yang Chen and   
            Lieven Eeckhout and   
              Olivier Temam and   
                  Huawei Li and   
                 Yunji Chen and   
                   Chengyong Wu   Performance Portability Across
                                  Heterogeneous SoCs Using a Generalized
                                  Library-Based Approach . . . . . . . . . 21:1--21:??
        Abdulrahman Kaitoua and   
                 Hazem Hajj and   
         Mazen A. R. Saghir and   
              Hassan Artail and   
             Haitham Akkary and   
              Mariette Awad and   
        Mageda Sharafeddine and   
                Khaleel Mershad   Hadoop Extensions for Distributed
                                  Computing on Reconfigurable Active SSD
                                  Clusters . . . . . . . . . . . . . . . . 22:1--22:??

ACM Transactions on Architecture and Code Optimization
Volume 11, Number 3, October, 2014

                   Jue Wang and   
               Xiangyu Dong and   
                       Yuan Xie   Preventing STT-RAM Last-Level Caches
                                  from Port Obstruction  . . . . . . . . . 23:1--23:??
        M. A. Gonzalez-Mesa and   
           Eladio Gutierrez and   
           Emilio L. Zapata and   
                    Oscar Plata   Effective Transactional Memory Execution
                                  Management for Improved Concurrency  . . 24:1--24:??
               Rakesh Kumar and   
         Alejandro Martínez and   
               Antonio González   Efficient Power Gating of SIMD
                                  Accelerators Through Dynamic Selective
                                  Devectorization in an HW/SW Codesigned
                                  Environment  . . . . . . . . . . . . . . 25:1--25:??
           Stefano Di Carlo and   
          Salvatore Galfano and   
               Marco Indaco and   
             Paolo Prinetto and   
            Davide Bertozzi and   
                Piero Olivo and   
              Cristian Zambelli   FLARES: an Aging Aware Algorithm to
                                  Autonomously Adapt the Error Correction
                                  Capability in NAND Flash Memories  . . . 26:1--26:??
        Davide B. Bartolini and   
             Filippo Sironi and   
           Donatella Sciuto and   
          Marco D. Santambrogio   Automated Fine-Grained CPU Provisioning
                                  for Virtual Machines . . . . . . . . . . 27:1--27:??
          Trevor E. Carlson and   
                Wim Heirman and   
              Stijn Eyerman and   
                Ibrahim Hur and   
                Lieven Eeckhout   An Evaluation of High-Level Mechanistic
                                  Core Models  . . . . . . . . . . . . . . 28:1--28:??
              Farrukh Hijaz and   
                      Omer Khan   NUCA-L1: a Non-Uniform Access Latency
                                  Level-1 Cache Architecture for
                                  Multicores Operating at Near-Threshold
                                  Voltages . . . . . . . . . . . . . . . . 29:1--29:??
                Andi Drebes and   
           Karine Heydemann and   
             Nathalie Drach and   
                Antoniu Pop and   
                   Albert Cohen   Topology-Aware and Dependence-Aware
                                  Scheduling and Memory Allocation for
                                  Task-Parallel Languages  . . . . . . . . 30:1--30:??
       Venkata Kalyan Tavva and   
                 Ravi Kasha and   
                   Madhu Mutyam   EFGR: an Enhanced Fine Granularity
                                  Refresh Feature for High-Performance
                                  DDR4 DRAM Devices  . . . . . . . . . . . 31:1--31:??
               Gulay Yalcin and   
                 Oguz Ergin and   
                Emrah Islek and   
          Osman Sabri Unsal and   
                 Adrian Cristal   Exploiting Existing Comparators for
                                  Fine-Grained Low-Cost Error Detection    32:1--32:??
       Pradeep Ramachandran and   
     Siva Kumar Sastry Hari and   
                  Manlap Li and   
                 Sarita V. Adve   Hardware Fault Recovery for I/O
                                  Intensive Applications . . . . . . . . . 33:1--33:??
              Stijn Eyerman and   
             Pierre Michaud and   
                 Wouter Rogiest   Multiprogram Throughput Metrics: a
                                  Systematic Approach  . . . . . . . . . . 34:1--34:??

ACM Transactions on Architecture and Code Optimization
Volume 11, Number 4, January, 2015

            Cedric Nugteren and   
                 Henk Corporaal   Bones: an Automatic Skeleton-Based
                                  C-to-CUDA Compiler for GPUs  . . . . . . 35:1--35:??
                   Jue Wang and   
               Xiangyu Dong and   
                       Yuan Xie   Building and Optimizing MRAM-Based
                                  Commodity Memories . . . . . . . . . . . 36:1--36:??
         Rakesh Komuravelli and   
             Sarita V. Adve and   
                Ching-Tsun Chou   Revisiting the Complexity of Hardware
                                  Cache Coherence and Some Implications    37:1--37:??
          Gabriel Rodríguez and   
               Juan Touriño and   
             Mahmut T. Kandemir   Volatile STT--RAM Scratchpad Design and
                                  Data Allocation for Low Energy . . . . . 38:1--38:??
         Cristóbal Camarero and   
            Enrique Vallejo and   
                  Ramón Beivide   Topological Characterization of Hamming
                                  and Dragonfly Networks and Its
                                  Implications on Routing  . . . . . . . . 39:1--39:??
                Hanbin Yoon and   
                Justin Meza and   
       Naveen Muralimanohar and   
           Norman P. Jouppi and   
                     Onur Mutlu   Efficient Data Mapping and Buffering
                                  Techniques for Multilevel Cell
                                  Phase-Change Memories  . . . . . . . . . 40:1--40:??
       Nathanael Prémillieu and   
                   André Seznec   Efficient Out-of-Order Execution of
                                  Guarded ISAs . . . . . . . . . . . . . . 41:1--41:??
                 Zheng Wang and   
              Dominik Grewe and   
          Michael F. P. O'Boyle   Automatic and Portable Mapping of Data
                                  Parallel Programs to OpenCL for
                                  GPU-Based Heterogeneous Systems  . . . . 42:1--42:??
                     Dan He and   
                  Fang Wang and   
                 Hong Jiang and   
                   Dan Feng and   
              Jing Ning Liu and   
                   Wei Tong and   
                    Zheng Zhang   Improving Hybrid FTL by Fully Exploiting
                                  Internal SSD Parallelism with Virtual
                                  Blocks . . . . . . . . . . . . . . . . . 43:1--43:??
                  Eri Rubin and   
                   Ely Levy and   
                Amnon Barak and   
                    Tal Ben-Nun   MAPS: Optimizing Massively Parallel
                                  Applications Using Device-Level Memory
                                  Abstraction  . . . . . . . . . . . . . . 44:1--44:??
         Alessandro Cilardo and   
                     Luca Gallo   Improving Multibank Memory Access
                                  Parallelism with Lattice-Based
                                  Partitioning . . . . . . . . . . . . . . 45:1--45:??
       Jan Kasper Martinsen and   
                Håkan Grahn and   
                  Anders Isberg   The Effects of Parameter Tuning in
                                  Software Thread-Level Speculation in
                                  JavaScript Engines . . . . . . . . . . . 46:1--46:??
           Quentin Colombet and   
           Florian Brandner and   
                    Alain Darte   Studying Optimal Spilling in the Light
                                  of SSA . . . . . . . . . . . . . . . . . 47:1--47:??
            Jawad Haj-Yihia and   
             Yosi Ben Asher and   
               Efraim Rotem and   
                Ahmad Yasin and   
                    Ran Ginosar   Compiler-Directed Power Management for
                                  Superscalars . . . . . . . . . . . . . . 48:1--48:??
            Hong-Phuc Trinh and   
              Marc Duranton and   
             Michel Paindavoine   Efficient Data Encoding for
                                  Convolutional Neural Network application 49:1--49:??
       Maximilien B. Breugh and   
              Stijn Eyerman and   
                Lieven Eeckhout   Mechanistic Analytical Modeling of
                                  Superscalar In-Order Processor
                                  Performance  . . . . . . . . . . . . . . 50:1--50:??
             Vivek Seshadri and   
             Samihan Yedkar and   
                 Hongyi Xin and   
                 Onur Mutlu and   
         Phillip B. Gibbons and   
          Michael A. Kozuch and   
                  Todd C. Mowry   Mitigating Prefetcher-Caused Pollution
                                  Using Informed Caching Policies for
                                  Prefetched Blocks  . . . . . . . . . . . 51:1--51:??
             George Matheou and   
           Paraskevas Evripidou   Architectural Support for Data-Driven
                                  Execution  . . . . . . . . . . . . . . . 52:1--52:??
                 Amir Morad and   
              Leonid Yavits and   
                    Ran Ginosar   GP--SIMD Processing-in-Memory  . . . . . 53:1--53:??
              Thomas Schaub and   
                 Simon Moll and   
            Ralf Karrenberg and   
                 Sebastian Hack   The Impact of the SIMD Width on
                                  Control-Flow and Memory Divergence . . . 54:1--54:??
               Zhenman Fang and   
               Sanyam Mehta and   
              Pen-Chung Yew and   
               Antonia Zhai and   
             James Greensky and   
            Gautham Beeraka and   
                     Binyu Zang   Measuring Microarchitectural Details of
                                  Multi- and Many-Core Memory Systems
                                  through Microbenchmarking  . . . . . . . 55:1--55:??
              Chi Ching Chi and   
      Mauricio Alvarez-Mesa and   
                   Ben Juurlink   Low-Power High-Efficiency Video Decoding
                                  using General-Purpose Processors . . . . 56:1--56:??
             Fabio Luporini and   
       Ana Lucia Varbanescu and   
          Florian Rathgeber and   
     Gheorghe-Teodor Bercea and   
               J. Ramanujam and   
               David A. Ham and   
               Paul H. J. Kelly   Cross-Loop Optimization of Arithmetic
                                  Intensity for Finite Element Local
                                  Assembly . . . . . . . . . . . . . . . . 57:1--57:??
                  Xing Zhou and   
          María J. Garzarán and   
                 David A. Padua   Optimal Parallelogram Selection for
                                  Hierarchical Tiling  . . . . . . . . . . 58:1--58:??
                 Leo Porter and   
      Michael A. Laurenzano and   
              Ananta Tiwari and   
                 Adam Jundt and   
       William A. Ward, Jr. and   
               Roy Campbell and   
               Laura Carrington   Making the Most of SMT in HPC: System-
                                  and Application-Level Perspectives . . . 59:1--59:??
                   Xin Tong and   
             Toshihiko Koju and   
          Motohiro Kawahito and   
               Andreas Moshovos   Optimizing Memory Translation Emulation
                                  in Full System Emulators . . . . . . . . 60:1--60:??
                Martin Kong and   
                Antoniu Pop and   
         Louis-Noël Pouchet and   
            R. Govindarajan and   
               Albert Cohen and   
                  P. Sadayappan   Compiler/Runtime Framework for Dynamic
                                  Dataflow Parallelization of Tiled
                                  Programs . . . . . . . . . . . . . . . . 61:1--61:??
              Nicolas Melot and   
          Christoph Kessler and   
                Jörg Keller and   
           Patrick Eitschberger   Fast Crown Scheduling Heuristics for
                                  Energy-Efficient Mapping and Scaling of
                                  Moldable Streaming Tasks on Manycore
                                  Systems  . . . . . . . . . . . . . . . . 62:1--62:??
                Wenjia Ruan and   
                  Yujie Liu and   
                  Michael Spear   Transactional Read-Modify-Write Without
                                  Aborts . . . . . . . . . . . . . . . . . 63:1--63:??
                Zia Ul Huda and   
              Ali Jannesari and   
                     Felix Wolf   Using Template Matching to Infer
                                  Parallel Design Patterns . . . . . . . . 64:1--64:??
                Heiner Litz and   
            Ricardo J. Dias and   
              David R. Cheriton   Efficient Correction of Anomalies in
                                  Snapshot Isolation Transactions  . . . . 65:1--65:??
              Helge Bahmann and   
             Nico Reissmann and   
               Magnus Jahre and   
            Jan Christian Meyer   Perfect Reconstructability of Control
                                  Flow from Demand Dependence Graphs . . . 66:1--66:??
            Venmugil Elango and   
            Naser Sedaghati and   
           Fabrice Rastello and   
         Louis-Noël Pouchet and   
               J. Ramanujam and   
            Radu Teodorescu and   
                  P. Sadayappan   On Using the Roofline Model with Lower
                                  Bounds on Data Movement  . . . . . . . . 67:1--67:??
                      Anonymous   List of Distinguished Reviewers ACM TACO
                                  2014 . . . . . . . . . . . . . . . . . . 68:1--68:??


ACM Transactions on Architecture and Code Optimization
Volume 12, Number 1, April, 2015

         Christopher Zimmer and   
                  Frank Mueller   NoCMsg: a Scalable Message-Passing
                                  Abstraction for Network-on-Chips . . . . 1:1--1:??
           Beayna Grigorian and   
                  Glenn Reinman   Accelerating Divergent Applications on
                                  SIMD Architectures Using Neural Networks 2:1--2:??
                 Anup Holey and   
             Vineeth Mekkat and   
              Pen-Chung Yew and   
                   Antonia Zhai   Performance-Energy Considerations for
                                  Shared Cache Management in a
                                  Heterogeneous Multicore Processor  . . . 3:1--3:??
                  Jinho Suh and   
           Chieh-Ting Huang and   
                  Michel Dubois   Dynamic MIPS Rate Stabilization for
                                  Complex Processors . . . . . . . . . . . 4:1--4:??
             Naghmeh Karimi and   
    Arun Karthik Kanuparthi and   
               Xueyang Wang and   
            Ozgur Sinanoglu and   
                   Ramesh Karri   MAGIC: Malicious Aging in Circuits/Cores 5:1--5:??
   Pablo De Oliveira Castro and   
                 Chadi Akel and   
                 Eric Petit and   
               Mihail Popov and   
                  William Jalby   CERE: LLVM-Based Codelet Extractor and
                                  REplayer for Piecewise Benchmarking and
                                  Optimization . . . . . . . . . . . . . . 6:1--6:??
         Benedict R. Gaster and   
                Derek Hower and   
                      Lee Howes   HRF-Relaxed: Adapting HRF to the
                                  Complexities of Industrial Heterogeneous
                                  Memory Models  . . . . . . . . . . . . . 7:1--7:??
               Kevin Streit and   
          Johannes Doerfert and   
          Clemens Hammacher and   
             Andreas Zeller and   
                 Sebastian Hack   Generalized Task Parallelism . . . . . . 8:1--8:??

ACM Transactions on Architecture and Code Optimization
Volume 12, Number 2, July, 2015

               Hamed Tabkhi and   
                 Gunar Schirner   A Joint SW/HW Approach for Reducing
                                  Register File Vulnerability  . . . . . . 9:1--9:??
            Arun Kanuparthi and   
                   Ramesh Karri   Reliable Integrity Checking in Multicore
                                  Processors . . . . . . . . . . . . . . . 10:1--10:??
                Do-Heon Lee and   
              Su-Kyung Yoon and   
              Jung-Geun Kim and   
           Charles C. Weems and   
                   Shin-Dug Kim   A New Memory-Disk Integrated System with
                                  HW Optimizer . . . . . . . . . . . . . . 11:1--11:??
 Morteza Mohajjel Kafshdooz and   
                 Alireza Ejlali   Dynamic Shared SPM Reuse for Real-Time
                                  Multicore Embedded Systems . . . . . . . 12:1--12:??
                 Wenhao Jia and   
                 Elba Garza and   
              Kelly A. Shaw and   
             Margaret Martonosi   GPU Performance and Power Tuning Using
                                  Regression Trees . . . . . . . . . . . . 13:1--13:??
          Irshad Pananilath and   
            Aravind Acharya and   
              Vinay Vasista and   
                Uday Bondhugula   An Optimizing Code Generator for a Class
                                  of Lattice-Boltzmann Computations  . . . 14:1--14:??
              Shuangde Fang and   
                  Wenwen Xu and   
                  Yang Chen and   
            Lieven Eeckhout and   
              Olivier Temam and   
                 Yunji Chen and   
               Chengyong Wu and   
                  Xiaobing Feng   Practical Iterative Optimization for the
                                  Data Center  . . . . . . . . . . . . . . 15:1--15:??
                  Tao Zhang and   
               Naifeng Jing and   
              Kaiming Jiang and   
                    Wei Shu and   
                 Min-You Wu and   
                  Xiaoyao Liang   Buddy SM: Sharing Pipeline Front-End for
                                  Improved Energy Efficiency in GPGPUs . . 16:1--16:??
           Hsiang-Yun Cheng and   
               Matt Poremba and   
             Narges Shahidi and   
                Ivan Stalev and   
            Mary Jane Irwin and   
            Mahmut Kandemir and   
               Jack Sampson and   
                       Yuan Xie   EECache: a Comprehensive Study on the
                                  Architectural Design for
                                  Energy-Efficient Last-Level Caches in
                                  Chip Multiprocessors . . . . . . . . . . 17:1--17:??
               Arjun Suresh and   
    Bharath Narasimha Swamy and   
                Erven Rohou and   
                   André Seznec   Intercepting Functions for Memoization:
                                  a Case Study Using Transcendental
                                  Functions  . . . . . . . . . . . . . . . 18:1--18:??
           Chung-Hsiang Lin and   
                 De-Yu Shen and   
               Yi-Jung Chen and   
              Chia-Lin Yang and   
        Cheng-Yuan Michael Wang   SECRET: a Selective Error Correction
                                  Framework for Refresh Energy Reduction
                                  in DRAMs . . . . . . . . . . . . . . . . 19:1--19:??
                 Doug Simon and   
           Christian Wimmer and   
             Bernhard Urban and   
             Gilles Duboscq and   
              Lukas Stadler and   
              Thomas Würthinger   Snippets: Taking the High Road to a Low
                                  Level  . . . . . . . . . . . . . . . . . 20:1--20:??
 Raghuraman Balasubramanian and   
            Vinay Gangadhar and   
                Ziliang Guo and   
                Chen-Han Ho and   
              Cherin Joseph and   
          Jaikrishnan Menon and   
        Mario Paulo Drumond and   
                 Robin Paul and   
             Sharath Prasad and   
            Pradip Valathol and   
      Karthikeyan Sankaralingam   Enabling GPGPU Low-Level Hardware
                                  Explorations with MIAOW: an Open-Source
                                  RTL Implementation of a GPGPU  . . . . . 21:1--21:??
                  Quan Chen and   
                      Minyi Guo   Locality-Aware Work Stealing Based on
                                  Online Profiling and Auto-Tuning for
                                  Multisocket Multicore Architectures  . . 22:1--22:??
                  Madan Das and   
           Gabriel Southern and   
                     Jose Renau   Section-Based Program Analysis to Reduce
                                  Overhead of Detecting Unsynchronized
                                  Thread Communication . . . . . . . . . . 23:1--23:??
                Atieh Lotfi and   
               Abbas Rahimi and   
                Luca Benini and   
                Rajesh K. Gupta   Aging-Aware Compilation for GP-GPUs  . . 24:1--24:??
           Brian P. Railing and   
               Eric R. Hein and   
                Thomas M. Conte   Contech: Efficiently Generating Dynamic
                                  Task Graphs for Arbitrary Parallel
                                  Programs . . . . . . . . . . . . . . . . 25:1--25:??

ACM Transactions on Architecture and Code Optimization
Volume 12, Number 3, October, 2015

              Mahdad Davari and   
                Alberto Ros and   
             Erik Hagersten and   
               Stefanos Kaxiras   The Effects of Granularity and
                                  Adaptivity on Private/Shared
                                  Classification for Coherence . . . . . . 26:1--26:??
              Mark Gottscho and   
       Abbas BanaiyanMofrad and   
                 Nikil Dutt and   
               Alex Nicolau and   
                   Puneet Gupta   DPCS: Dynamic Power/Capacity Scaling for
                                  SRAM Caches in the Nanoscale Era . . . . 27:1--27:??
             Pierre Michaud and   
            Andrea Mondelli and   
                   André Seznec   Revisiting Clustered Microarchitecture
                                  for Future Superscalar Cores: a Case for
                                  Wide Issue Clusters  . . . . . . . . . . 28:1--28:??
       Ragavendra Natarajan and   
                   Antonia Zhai   Leveraging Transactional Execution for
                                  Memory Consistency Model Emulation . . . 29:1--29:??
          Biswabandan Panda and   
           Shankar Balachandran   CAFFEINE: a Utility-Driven Prefetcher
                                  Aggressiveness Engine for Multicores . . 30:1--30:??
                Jishen Zhao and   
                   Sheng Li and   
              Jichuan Chang and   
              John L. Byrne and   
           Laura L. Ramirez and   
                  Kevin Lim and   
                   Yuan Xie and   
               Paolo Faraboschi   Buri: Scaling Big-Memory Computing with
                                  Hardware-Based Memory Expansion  . . . . 31:1--31:??
                  Jan Lucas and   
           Michael Andersch and   
      Mauricio Alvarez-Mesa and   
                   Ben Juurlink   Spatiotemporal SIMT and Scalarization
                                  for Improving GPU Efficiency . . . . . . 32:1--32:??

ACM Transactions on Architecture and Code Optimization
Volume 12, Number 4, January, 2016

               Subhasis Das and   
              Tor M. Aamodt and   
               William J. Dally   Reuse Distance-Based Probabilistic Cache
                                  Replacement  . . . . . . . . . . . . . . 33:1--33:??
                 Etem Deniz and   
                      Alper Sen   MINIME-GPU: Multicore Benchmark
                                  Synthesizer for GPUs . . . . . . . . . . 34:1--34:??
                     Li Tan and   
               Zizhong Chen and   
             Shuaiwen Leon Song   Scalable Energy Efficiency with
                                  Resilience for High Performance
                                  Computing Systems: a Quantitative
                                  Methodology  . . . . . . . . . . . . . . 35:1--35:??
     Kishore Kumar Pusukuri and   
                Rajiv Gupta and   
                Laxmi N. Bhuyan   Tumbler: an Effective Load-Balancing
                                  Technique for Multi-CPU Multicore
                                  Systems  . . . . . . . . . . . . . . . . 36:1--36:??
                Erik Tomusk and   
          Christophe Dubach and   
                Michael O'Boyle   Four Metrics to Evaluate Heterogeneous
                                  Multicores . . . . . . . . . . . . . . . 37:1--37:??
        Morteza Hoseinzadeh and   
          Mohammad Arjomand and   
             Hamid Sarbazi-Azad   SPCM: The Striped Phase Change Memory    38:1--38:??
              Chuntao Jiang and   
                  Zhibin Yu and   
            Lieven Eeckhout and   
                    Hai Jin and   
               Xiaofei Liao and   
                  Chengzhong Xu   Two-Level Hybrid Sampled Simulation of
                                  Multithreaded Applications . . . . . . . 39:1--39:??
            Sandeep D'Souza and   
                  Soumya J. and   
          Santanu Chattopadhyay   Integrated Mapping and Synthesis
                                  Techniques for Network-on-Chip
                                  Topologies with Express Channels . . . . 40:1--40:??
         Dimitrios Chasapis and   
                 Marc Casas and   
              Miquel Moretó and   
                 Raul Vidal and   
             Eduard Ayguadé and   
              Jesús Labarta and   
                   Mateo Valero   PARSECSs: Evaluating the Impact of Task
                                  Parallelism in the PARSEC Benchmark
                                  Suite  . . . . . . . . . . . . . . . . . 41:1--41:??
           Francisco Gaspar and   
                Luis Taniça and   
                Pedro Tomás and   
            Aleksandar Ilic and   
                   Leonel Sousa   A Framework for Application-Guided Task
                                  Management on Heterogeneous Embedded
                                  Systems  . . . . . . . . . . . . . . . . 42:1--42:??
         Ehsan K. Ardestani and   
  Rafael Trapani Possignolo and   
             Jose Luis Briz and   
                     Jose Renau   Managing Mismatches in Voltage Stacking
                                  with CoreUnfolding . . . . . . . . . . . 43:1--43:??
           Prashant J. Nair and   
           David A. Roberts and   
           Moinuddin K. Qureshi   FaultSim: a Fast, Configurable
                                  Memory-Reliability Simulator for
                                  Conventional and 3D-Stacked Systems  . . 44:1--44:??
                Byeongcheol Lee   Adaptive Correction of Sampling Bias in
                                  Dynamic Call Graphs  . . . . . . . . . . 45:1--45:??
        Andrew J. McPherson and   
            Vijay Nagarajan and   
              Susmit Sarkar and   
                 Marcelo Cintra   Fence Placement for Legacy
                                  Data-Race-Free Programs via
                                  Synchronization Read Detection . . . . . 46:1--46:??
             Ding-Yong Hong and   
              Chun-Chen Hsu and   
              Cheng-Yi Chou and   
              Wei-Chung Hsu and   
               Pangfeng Liu and   
                     Jan-Jan Wu   Optimizing Control Transfer and Memory
                                  Virtualization in Full System Emulators  47:1--47:??
    Aravind Sukumaran-Rajam and   
                Philippe Clauss   The Polyhedral Model of Nonlinear Loops  48:1--48:??
           Prashant J. Nair and   
           David A. Roberts and   
           Moinuddin K. Qureshi   Citadel: Efficiently Protecting Stacked
                                  Memory from TSV and Large Granularity
                                  Failures . . . . . . . . . . . . . . . . 49:1--49:??
            Andrew Anderson and   
              Avinash Malik and   
                    David Gregg   Automatic Vectorization of Interleaved
                                  Data Revisited . . . . . . . . . . . . . 50:1--50:??
                Lihang Zhao and   
               Lizhong Chen and   
                Woojin Choi and   
                 Jeffrey Draper   A Filtering Mechanism to Reduce Network
                                  Bandwidth Utilization of Transaction
                                  Execution  . . . . . . . . . . . . . . . 51:1--51:??
             Olivier Serres and   
              Abdullah Kayi and   
                Ahmad Anbar and   
               Tarek El-Ghazawi   Enabling PGAS Productivity with Hardware
                                  Support for Shared Address Mapping: a
                                  UPC Case Study . . . . . . . . . . . . . 52:1--52:??
          Riccardo Cattaneo and   
            Giuseppe Natale and   
            Carlo Sicignano and   
           Donatella Sciuto and   
    Marco Domenico Santambrogio   On How to Accelerate Iterative Stencil
                                  Loops: a Scalable Streaming-Based
                                  Approach . . . . . . . . . . . . . . . . 53:1--53:??
             Unnikrishnan C and   
               Rupesh Nasre and   
                  Y. N. Srikant   Falcon: a Graph Manipulation Language
                                  for Heterogeneous Systems  . . . . . . . 54:1--54:??
       Rajshekar Kalayappan and   
              Smruti R. Sarangi   FluidCheck: a Redundant Threading-Based
                                  Approach for Reliable Execution in
                                  Manycore Processors  . . . . . . . . . . 55:1--55:??
               Jesse Elwell and   
                 Ryan Riley and   
          Nael Abu-Ghazaleh and   
           Dmitry Ponomarev and   
               Iliano Cervesato   Rethinking Memory Permissions for
                                  Protection Against Cross-Layer Attacks   56:1--56:??
                 Amir Morad and   
              Leonid Yavits and   
           Shahar Kvatinsky and   
                    Ran Ginosar   Resistive GP-SIMD Processing-In-Memory   57:1--57:??
                Yaohua Wang and   
                  Dong Wang and   
               Shuming Chen and   
                Zonglin Liu and   
             Shenggang Chen and   
               Xiaowen Chen and   
                        Xu Zhou   Iteration Interleaving--Based SIMD Lane
                                  Partition  . . . . . . . . . . . . . . . 58:1--58:??
                  Tomi Äijö and   
         Pekka Jääskeläinen and   
               Tapio Elomaa and   
             Heikki Kultala and   
                   Jarmo Takala   Integer Linear Programming-Based
                                  Scheduling for Transport Triggered
                                  Architectures  . . . . . . . . . . . . . 59:1--59:??
                 Qixiao Liu and   
              Miquel Moreto and   
               Jaume Abella and   
       Francisco J. Cazorla and   
          Daniel A. Jimenez and   
                   Mateo Valero   Sensible Energy Accounting with Abstract
                                  Metering for Multicore Systems . . . . . 60:1--60:??
                  Miao Zhou and   
                      Yu Du and   
             Bruce Childers and   
               Daniel Mosse and   
                    Rami Melhem   Symmetry-Agnostic Coordinated Management
                                  of the Memory Hierarchy in Multicore
                                  Systems  . . . . . . . . . . . . . . . . 61:1--61:??
          Amir Yazdanbakhsh and   
         Gennady Pekhimenko and   
           Bradley Thwaites and   
          Hadi Esmaeilzadeh and   
                 Onur Mutlu and   
                  Todd C. Mowry   RFVP: Rollback-Free Value Prediction
                                  with Safe-to-Approximate Loads . . . . . 62:1--62:??
               Donghyuk Lee and   
              Saugata Ghose and   
         Gennady Pekhimenko and   
                Samira Khan and   
                     Onur Mutlu   Simultaneous Multi-Layer Access:
                                  Improving $3$D-Stacked Memory Bandwidth
                                  at Low Cost  . . . . . . . . . . . . . . 63:1--63:??
                   Yeoul Na and   
              Seon Wook Kim and   
                   Youngsun Han   JavaScript Parallelizing Compiler for
                                  Exploiting Parallelism from
                                  Data-Parallel HTML5 Applications . . . . 64:1--64:??
              Hiroyuki Usui and   
        Lavanya Subramanian and   
        Kevin Kai-Wei Chang and   
                     Onur Mutlu   DASH: Deadline-Aware High-Performance
                                  Memory Scheduler for Heterogeneous
                                  Systems with Hardware Accelerators . . . 65:1--65:??
 Morteza Mohajjel Kafshdooz and   
        Mohammadkazem Taram and   
              Sepehr Assadi and   
                 Alireza Ejlali   A Compile-Time Optimization Method for
                                  WCET Reduction in Real-Time Embedded
                                  Systems through Block Formation  . . . . 66:1--66:??


ACM Transactions on Architecture and Code Optimization
Volume 13, Number 1, April, 2016

        Konstantinos Koukos and   
                Alberto Ros and   
             Erik Hagersten and   
               Stefanos Kaxiras   Building Heterogeneous Unified Virtual
                                  Memories (UVMs) without the Overhead . . 1:1--1:22
               Zhigang Wang and   
               Xiaolin Wang and   
                   Fang Hou and   
                Yingwei Luo and   
                   Zhenlin Wang   Dynamic Memory Balancing for
                                  Virtualization . . . . . . . . . . . . . 2:1--2:??
               Xueyang Wang and   
                   Sek Chai and   
            Michael Isnardi and   
                 Sehoon Lim and   
                   Ramesh Karri   Hardware Performance Counter-Based
                                  Malware Identification and Detection
                                  with Adaptive Compressive Sensing  . . . 3:1--3:??
               Shoaib Akram and   
         Jennifer B. Sartor and   
        Kenzo Van Craeynest and   
                Wim Heirman and   
                Lieven Eeckhout   Boosting the Priority of Garbage:
                                  Scheduling Collection on Heterogeneous
                                  Multicore Processors . . . . . . . . . . 4:1--4:??
                Buse Yilmaz and   
              Baris Aktemur and   
          María J. Garzarán and   
                  Sam Kamin and   
                   Furkan Kiraç   Autotuning Runtime Specialization for
                                  Sparse Matrix-Vector Multiplication  . . 5:1--5:??
              Mingzhou Zhou and   
                      Bo Wu and   
                Xipeng Shen and   
                Yaoqing Gao and   
                     Graham Yiu   Examining and Reducing the Influence of
                                  Sampling Errors on Feedback-Driven
                                  Optimizations  . . . . . . . . . . . . . 6:1--6:??
           Amanieu d'Antras and   
            Cosmin Gorgovan and   
                Jim Garside and   
                    Mikel Luján   Optimizing Indirect Branches in Dynamic
                                  Binary Translators . . . . . . . . . . . 7:1--7:??
         Luiz G. A. Martins and   
              Ricardo Nobre and   
         João M. P. Cardoso and   
     Alexandre C. B. Delbem and   
                Eduardo Marques   Clustering-Based Selection for the
                                  Exploration of Compiler Optimization
                                  Sequences  . . . . . . . . . . . . . . . 8:1--8:??
       Sang Wook Stephen Do and   
                  Michel Dubois   Power Efficient Hardware Transactional
                                  Memory: Dynamic Issue of Transactions    9:1--9:??
          Dmitry Evtyushkin and   
           Dmitry Ponomarev and   
              Nael Abu-Ghazaleh   Understanding and Mitigating Covert
                                  Channels Through Branch Predictors . . . 10:1--10:??
                   Hao Zhou and   
                   Jingling Xue   A Compiler Approach for Exploiting
                                  Partial SIMD Parallelism . . . . . . . . 11:1--11:??
     Gert-Jan Van Den Braak and   
                 Henk Corporaal   R-GPU: a Reconfigurable GPU Architecture 12:1--12:??
                   Peng Liu and   
                  Jiyang Yu and   
               Michael C. Huang   Thread-Aware Adaptive Prefetcher on
                                  Multicore Systems: Improving the
                                  Performance for Multithreaded Workloads  13:1--13:??
            Cosmin Gorgovan and   
           Amanieu d'Antras and   
                    Mikel Luján   MAMBO: a Low-Overhead Dynamic Binary
                                  Modification Tool for ARM  . . . . . . . 14:1--14:??

ACM Transactions on Architecture and Code Optimization
Volume 13, Number 2, June, 2016

      Panagiotis Theocharis and   
                Bjorn De Sutter   A Bimodal Scheduler for Coarse-Grained
                                  Reconfigurable Arrays  . . . . . . . . . 15:1--15:??
                Ahmad Anbar and   
             Olivier Serres and   
         Engin Kayraklioglu and   
     Abdel-Hameed A. Badawy and   
               Tarek El-Ghazawi   Exploiting Hierarchical Locality in Deep
                                  Parallel Architectures . . . . . . . . . 16:1--16:??
   Cecilia González-Álvarez and   
         Jennifer B. Sartor and   
             Carlos Álvarez and   
    Daniel Jiménez-González and   
                Lieven Eeckhout   MInGLE: an Efficient Framework for
                                  Domain Acceleration Using Low-Power
                                  Specialized Functional Units . . . . . . 17:1--17:??
        Christian Andreetta and   
               Vivien Bégot and   
              Jost Berthold and   
              Martin Elsman and   
             Fritz Henglein and   
           Troels Henriksen and   
         Maj-Britt Nordfang and   
               Cosmin E. Oancea   FinPar: a Parallel Financial Benchmark   18:1--18:??
         Mickaël Dardaillon and   
              Kevin Marquet and   
              Tanguy Risset and   
              Jérôme Martin and   
           Henri-Pierre Charles   A New Compilation Flow for
                                  Software-Defined Radio Applications on
                                  Heterogeneous MPSoCs . . . . . . . . . . 19:1--19:??
               Jianwei Liao and   
            François Trahay and   
                  Guoqiang Xiao   Dynamic Process Migration Based on Block
                                  Access Patterns Occurring in Storage
                                  Servers  . . . . . . . . . . . . . . . . 20:1--20:??
       Amir Hossein Ashouri and   
           Giovanni Mariani and   
           Gianluca Palermo and   
               Eunjung Park and   
               John Cavazos and   
               Cristina Silvano   COBAYN: Compiler Autotuning Framework
                                  Using Bayesian Networks  . . . . . . . . 21:1--21:??
         Kypros Chrysanthou and   
      Panayiotis Englezakis and   
          Andreas Prodromou and   
            Andreas Panteli and   
    Chrysostomos Nicopoulos and   
         Yiannakis Sazeides and   
        Giorgos Dimitrakopoulos   An Online and Real-Time Fault Detection
                                  and Localization Mechanism for
                                  Network-on-Chip Architectures  . . . . . 22:1--22:??

ACM Transactions on Architecture and Code Optimization
Volume 13, Number 3, September, 2016

               Sanyam Mehta and   
                  Pen-Chung Yew   Variable Liberalization  . . . . . . . . 23:1--23:??
             Hsing-Min Chen and   
             Carole-Jean Wu and   
               Trevor Mudge and   
           Chaitali Chakrabarti   RATT-ECC: Rate Adaptive Two-Tiered Error
                                  Correction Codes for Reliable $3$D
                                  Die-Stacked Memory . . . . . . . . . . . 24:1--24:??
                Wenjie Chen and   
                Zhibin Wang and   
                     Qin Wu and   
              Jiuzhen Liang and   
                    Zhilei Chai   Implementing Dense Optical Flow
                                  Computation on a Heterogeneous FPGA SoC
                                  in C . . . . . . . . . . . . . . . . . . 25:1--25:??
                Nilay Vaish and   
          Michael C. Ferris and   
                  David A. Wood   Optimization Models for Three On-Chip
                                  Network Problems . . . . . . . . . . . . 26:1--26:??
          Somayeh Sardashti and   
               Andre Seznec and   
                  David A. Wood   Yet Another Compressed Cache: a Low-Cost
                                  Yet Effective Compressed Cache . . . . . 27:1--27:??
         Eduardo H. M. Cruz and   
            Matthias Diener and   
           Laércio L. Pilla and   
          Philippe O. A. Navaux   Hardware-Assisted Thread and Data
                                  Mapping in Hierarchical Multicore
                                  Architectures  . . . . . . . . . . . . . 28:1--28:??
             Almutaz Adileh and   
              Stijn Eyerman and   
               Aamer Jaleel and   
                Lieven Eeckhout   Maximizing Heterogeneous Processor
                                  Performance Under Power Constraints  . . 29:1--29:??
               Bagus Wibowo and   
            Abhinav Agrawal and   
             Thomas Stanton and   
                     James Tuck   An Accurate Cross-Layer Approach for
                                  Online Architectural Vulnerability
                                  Estimation . . . . . . . . . . . . . . . 30:1--30:??
                  Manuel Acacio   List of Distinguished Reviewers ACM TACO
                                  2014 . . . . . . . . . . . . . . . . . . 31:1--31:??

ACM Transactions on Architecture and Code Optimization
Volume 13, Number 4, December, 2016

                 Keval Vora and   
                Rajiv Gupta and   
                     Guoqing Xu   Synergistic Analysis of Evolving Graphs  32:1--32:??
              Yunquan Zhang and   
                 Shigang Li and   
                Shengen Yan and   
                   Huiyang Zhou   A Cross-Platform SpMV Framework on
                                  Many-Core Architectures  . . . . . . . . 33:1--33:??
                Junwhan Ahn and   
                Sungjoo Yoo and   
                   Kiyoung Choi   AIM: Energy-Efficient Aggregation Inside
                                  the Memory Hierarchy . . . . . . . . . . 34:1--34:??
        Amir Kavyan Ziabari and   
                  Yifan Sun and   
                   Yenai Ma and   
                 Dana Schaa and   
            José L. Abellán and   
                Rafael Ubal and   
                   John Kim and   
                 Ajay Joshi and   
                    David Kaeli   UMH: a Hardware-Based Unified Memory
                                  Hierarchy for Systems with Multiple
                                  Discrete GPUs  . . . . . . . . . . . . . 35:1--35:??
                  Tom Spink and   
             Harry Wagstaff and   
                   Björn Franke   Hardware-Accelerated Cross-Architecture
                                  Full-System Virtualization . . . . . . . 36:1--36:??
              Qingchuan Shi and   
              George Kurian and   
              Farrukh Hijaz and   
           Srinivas Devadas and   
                      Omer Khan   LDAC: Locality-Aware Data Access Control
                                  for Large-Scale Multicore Cache
                                  Hierarchies  . . . . . . . . . . . . . . 37:1--37:??
         Fernando Fernandes and   
               Lucas Weigel and   
               Claudio Jung and   
            Philippe Navaux and   
                Luigi Carro and   
                     Paolo Rech   Evaluation of Histogram of Oriented
                                  Gradients Soft Errors Criticality for
                                  Automotive Applications  . . . . . . . . 38:1--38:??
             Saumay Dublish and   
            Vijay Nagarajan and   
                   Nigel Topham   Cooperative Caching for GPUs . . . . . . 39:1--39:??
      Nikolaos Tampouratzis and   
       Pavlos M. Mattheakis and   
         Ioannis Papaefstathiou   Accelerating Intercommunication in
                                  Highly Parallel Systems  . . . . . . . . 40:1--40:??
               Hyukwoo Park and   
                Myungsu Cha and   
                  Soo-Mook Moon   Concurrent JavaScript Parsing for Faster
                                  Loading of Web Apps  . . . . . . . . . . 41:1--41:??
            Dongliang Xiong and   
                  Kai Huang and   
              Xiaowen Jiang and   
                   Xiaolang Yan   Memory Access Scheduling Based on
                                  Dynamic Multilevel Priority in Shared
                                  DRAM Systems . . . . . . . . . . . . . . 42:1--42:??
           Daniele De Sensi and   
           Massimo Torquati and   
                Marco Danelutto   A Reconfiguration Algorithm for
                                  Power-Aware Parallel Applications  . . . 43:1--43:??
           Michael R. Jantz and   
        Forrest J. Robinson and   
             Prasad A. Kulkarni   Impact of Intrinsic Profiling
                                  Limitations on Effectiveness of Adaptive
                                  Optimizations  . . . . . . . . . . . . . 44:1--44:??
            Marvin Damschen and   
                 Lars Bauer and   
                    Jörg Henkel   Extending the WCET Problem to Optimize
                                  for Runtime-Reconfigurable Processors    45:1--45:??
                   Zheng Li and   
                  Fang Wang and   
                   Dan Feng and   
                     Yu Hua and   
               Jingning Liu and   
                       Wei Tong   MaxPB: Accelerating PCM Write by
                                  Maximizing the Power Budget Utilization  46:1--46:??
        Saurav Muralidharan and   
            Michael Garland and   
            Albert Sidelnik and   
                      Mary Hall   Designing a Tunable Nested Data-Parallel
                                  Programming System . . . . . . . . . . . 47:1--47:??
              Ismail Akturk and   
                 Riad Akram and   
    Mohammad Majharul Islam and   
           Abdullah Muzahid and   
               Ulya R. Karpuzcu   Accuracy Bugs: a New Class of
                                  Concurrency Bugs to Exploit Algorithmic
                                  Noise Tolerance  . . . . . . . . . . . . 48:1--48:??
                Erik Tomusk and   
          Christophe Dubach and   
                Michael O'Boyle   Selecting Heterogeneous Cores for
                                  Diversity  . . . . . . . . . . . . . . . 49:1--49:??
                 Pierre Michaud   Some Mathematical Facts About Optimal
                                  Cache Replacement  . . . . . . . . . . . 50:1--50:??
                 Wenlei Bao and   
              Changwan Hong and   
           Sudheer Chunduri and   
      Sriram Krishnamoorthy and   
         Louis-Noël Pouchet and   
           Fabrice Rastello and   
                  P. Sadayappan   Static and Dynamic Frequency Scaling on
                                  Multicore CPUs . . . . . . . . . . . . . 51:1--51:??
              Tiago M. Vale and   
              João A. Silva and   
            Ricardo J. Dias and   
               João M. Lourenço   Pot: Deterministic Transactional
                                  Execution  . . . . . . . . . . . . . . . 52:1--52:??
                Zhonghai Lu and   
                       Yuan Yao   Aggregate Flow-Based Performance
                                  Fairness in CMPs . . . . . . . . . . . . 53:1--53:??
                Yigit Demir and   
              Nikos Hardavellas   Energy-Proportional Photonic
                                  Interconnects  . . . . . . . . . . . . . 54:1--54:??
            Mehmet Can Kurt and   
      Sriram Krishnamoorthy and   
              Gagan Agrawal and   
                        Bin Ren   User-Assisted Store Recycling for
                                  Dynamic Task Graph Schedulers  . . . . . 55:1--55:??
            Jawad Haj-Yihia and   
                Ahmad Yasin and   
             Yosi Ben Asher and   
                  Avi Mendelson   Fine-Grain Power Breakdown of Modern
                                  Out-of-Order Cores and Its Implications
                                  on Skylake-Based Systems . . . . . . . . 56:1--56:??
            Alberto Scolari and   
   Davide Basilio Bartolini and   
    Marco Domenico Santambrogio   A Software Cache Partitioning System for
                                  Hash-Based Caches  . . . . . . . . . . . 57:1--57:??


ACM Transactions on Architecture and Code Optimization
Volume 14, Number 1, April, 2017

               Lev Mukhanov and   
          Pavlos Petoumenos and   
                 Zheng Wang and   
            Nikos Parasyris and   
  Dimitrios S. Nikolopoulos and   
      Bronis R. De Supinski and   
                   Hugh Leather   ALEA: a Fine-Grained Energy Profiling
                                  Tool . . . . . . . . . . . . . . . . . . 1:1--1:??
              Anuj Pathania and   
 Vanchinathan Venkataramani and   
          Muhammad Shafique and   
               Tulika Mitra and   
                    Jörg Henkel   Defragmentation of Tasks in Many-Core
                                  Architecture . . . . . . . . . . . . . . 2:1--2:??
            Darko Zivanovic and   
             Milan Pavlovic and   
            Milan Radulovic and   
              Hyunsung Shin and   
                Jongpil Son and   
             Sally A. McKee and   
          Paul M. Carpenter and   
           Petar Radojković and   
                 Eduard Ayguadé   Main Memory in HPC: Do We Need More or
                                  Could We Live with Less? . . . . . . . . 3:1--3:??
             Wenguang Zheng and   
                     Hui Wu and   
                      Qing Yang   WCET-Aware Dynamic I-Cache Locking for a
                                  Single Task  . . . . . . . . . . . . . . 4:1--4:??
             Byung-Sun Yang and   
                Jae-Yun Kim and   
                  Soo-Mook Moon   Exceptionization: a Java VM Optimization
                                  for Non-Java Languages . . . . . . . . . 5:1--5:??
               Rathijit Sen and   
                  David A. Wood   Pareto Governors for Energy-Optimal
                                  Computing  . . . . . . . . . . . . . . . 6:1--6:??
           Mainak Chaudhuri and   
             Mukesh Agrawal and   
                Jayesh Gaur and   
           Sreenivas Subramoney   Micro-Sector Cache: Improving Space
                                  Utilization in Sectored DRAM Caches  . . 7:1--7:??
          Kyriakos Georgiou and   
             Steve Kerrison and   
           Zbigniew Chamski and   
                   Kerstin Eder   Energy Transparency for Deeply Embedded
                                  Programs . . . . . . . . . . . . . . . . 8:1--8:??
               Pengcheng Li and   
                  Xiaoyu Hu and   
                  Dong Chen and   
                Jacob Brock and   
                    Hao Luo and   
              Eddy Z. Zhang and   
                      Chen Ding   LD: Low-Overhead GPU Race Detection
                                  Without Access Monitoring  . . . . . . . 9:1--9:??
     Poovaiah M. Palangappa and   
                Kartik Mohanram   CompEx++: Compression-Expansion Coding
                                  for Energy, Latency, and Lifetime
                                  Improvements in MLC/TLC NVMs . . . . . . 10:1--10:??

ACM Transactions on Architecture and Code Optimization
Volume 14, Number 2, July, 2017

                Dongwoo Lee and   
               Sangheon Lee and   
                Soojung Ryu and   
                   Kiyoung Choi   Dirty-Block Tracking in a Direct-Mapped
                                  DRAM Cache with Self-Balancing Dispatch  11:1--11:??
     Konstantinos Parasyris and   
       Vassilis Vassiliadis and   
   Christos D. Antonopoulos and   
               Spyros Lalis and   
                Nikolaos Bellas   Significance-Aware Program Execution on
                                  Unreliable Hardware  . . . . . . . . . . 12:1--12:??
           Gleison Mendonça and   
            Breno Guimarães and   
             Péricles Alves and   
             Márcio Pereira and   
               Guido Araújo and   
 Fernando Magno Quintão Pereira   DawnCC: Automatic Annotation for Data
                                  Parallelism and Offloading . . . . . . . 13:1--13:??
     Rajeev Balasubramonian and   
            Andrew B. Kahng and   
       Naveen Muralimanohar and   
                Ali Shafiee and   
              Vaishnav Srinivas   CACTI 7: New Tools for Interconnect
                                  Exploration in Innovative Off-Chip
                                  Memories . . . . . . . . . . . . . . . . 14:1--14:??
            Vishwesh Jatala and   
           Jayvant Anantpur and   
                   Amey Karkare   Scratchpad Sharing in GPUs . . . . . . . 15:1--15:??
                Tae Jun Ham and   
             Juan L. Aragón and   
             Margaret Martonosi   Decoupling Data Supply from Computation
                                  for Latency-Tolerant Communication in
                                  Heterogeneous Architectures  . . . . . . 16:1--16:??
               Milan Stanic and   
              Oscar Palomar and   
              Timothy Hayes and   
              Ivan Ratkovic and   
             Adrian Cristal and   
                Osman Unsal and   
                   Mateo Valero   An Integrated Vector-Scalar Design on an
                                  In-Order ARM Core  . . . . . . . . . . . 17:1--17:??
           Fernando A. Endo and   
              Arthur Perais and   
                   André Seznec   On the Interactions Between Value
                                  Prediction and Compiler Optimizations in
                                  the Context of EOLE  . . . . . . . . . . 18:1--18:??
       Aswinkumar Sridharan and   
          Biswabandan Panda and   
                   Andre Seznec   Band-Pass Prefetching: an Effective
                                  Prefetch Management Mechanism Using
                                  Prefetch-Fraction Metric in Multi-Core
                                  Systems  . . . . . . . . . . . . . . . . 19:1--19:??
               Andrés Goens and   
              Sergio Siccha and   
            Jeronimo Castrillon   Symmetry in Software Synthesis . . . . . 20:1--20:??

ACM Transactions on Architecture and Code Optimization
Volume 14, Number 3, September, 2017

               Sander Vocke and   
             Henk Corporaal and   
               Roel Jordans and   
            Rosilde Corvino and   
                       Rick Nas   Extending Halide to Improve Software
                                  Development for Imaging DSPs . . . . . . 21:1--21:??
          Nicklas Bo Jensen and   
                  Sven Karlsson   Improving Loop Dependence Analysis . . . 22:1--22:??
              Stefan Ganser and   
          Armin Grösslinger and   
           Norbert Siegmund and   
                  Sven Apel and   
             Christian Lengauer   Iterative Schedule Optimization for
                                  Parallelization in the Polyhedron Model  23:1--23:??
                    Wei Wei and   
                Dejun Jiang and   
                  Jin Xiong and   
                    Mingyu Chen   HAP: Hybrid-Memory-Aware Partition in
                                  Shared Last-Level Cache  . . . . . . . . 24:1--24:??
            Dongliang Xiong and   
                  Kai Huang and   
              Xiaowen Jiang and   
                   Xiaolang Yan   Providing Predictable Performance via a
                                  Slowdown Estimation Model  . . . . . . . 25:1--25:??
                    Jing Pu and   
                Steven Bell and   
                  Xuan Yang and   
                Jeff Setter and   
         Stephen Richardson and   
      Jonathan Ragan-Kelley and   
                  Mark Horowitz   Programming Heterogeneous Systems from
                                  an Image Processing DSL  . . . . . . . . 26:1--26:??
                Ayman Hroub and   
           M. E. S. Elrabaa and   
              M. F. Mudawar and   
                     A. Khayyat   Efficient Generation of Compact
                                  Execution Traces for Multicore
                                  Architectural Simulations  . . . . . . . 27:1--27:??
              Nicolas Weber and   
                Michael Goesele   MATOG: Array Layout Auto-Tuning for CUDA 28:1--28:??
            Amir H. Ashouri and   
             Andrea Bignoli and   
           Gianluca Palermo and   
           Cristina Silvano and   
            Sameer Kulkarni and   
                   John Cavazos   MiCOMP: Mitigating the Compiler
                                  Phase-Ordering Problem Using
                                  Optimization Sub-Sequences and Machine
                                  Learning . . . . . . . . . . . . . . . . 29:1--29:??
                Erik Vermij and   
             Leandro Fiorin and   
              Rik Jongerius and   
       Christoph Hagleitner and   
           Jan Van Lunteren and   
                   Koen Bertels   An Architecture for Integrated Near-Data
                                  Processors . . . . . . . . . . . . . . . 30:1--30:??
          Andreas Diavastos and   
                 Pedro Trancoso   SWITCHES: a Lightweight Runtime for
                                  Dataflow Execution of Tasks on
                                  Many-Cores . . . . . . . . . . . . . . . 31:1--31:??

ACM Transactions on Architecture and Code Optimization
Volume 14, Number 4, December, 2017

                 Rahul Jain and   
        Preeti Ranjan Panda and   
           Sreenivas Subramoney   Cooperative Multi-Agent Reinforcement
                                  Learning-Based Co-optimization of Cores,
                                  Caches, and On-chip Network  . . . . . . 32:1--32:??
           Daniele De Sensi and   
         Tiziano De Matteis and   
           Massimo Torquati and   
          Gabriele Mencagli and   
                Marco Danelutto   Bringing Parallel Patterns Out of the
                                  Corner: The P$^3$ARSEC Benchmark Suite   33:1--33:??
               Chencheng Ye and   
                  Chen Ding and   
                    Hao Luo and   
                Jacob Brock and   
                  Dong Chen and   
                        Hai Jin   Cache Exclusivity and Sharing: Theory
                                  and Optimization . . . . . . . . . . . . 34:1--34:??
          Rahul Shrivastava and   
           V. Krishna Nandivada   Energy-Efficient Compilation of
                                  Irregular Task-Parallel Loops  . . . . . 35:1--35:??
                Julien Proy and   
           Karine Heydemann and   
          Alexandre Berzati and   
                   Albert Cohen   Compiler-Assisted Loop Hardening Against
                                  Fault Attacks  . . . . . . . . . . . . . 36:1--36:??
         Christina Peterson and   
                  Damian Dechev   A Transactional Correctness Tool for
                                  Abstract Data Types  . . . . . . . . . . 37:1--37:??
             Matteo Ferroni and   
               Andrea Corna and   
             Andrea Damiani and   
          Rolando Brondolin and   
         Juan A. Colmenares and   
             Steven Hofmeyr and   
        John D. Kubiatowicz and   
          Marco D. Santambrogio   Power Consumption Models for
                                  Multi-Tenant Server Infrastructures  . . 38:1--38:??
            Milad Mohammadi and   
              Tor M. Aamodt and   
               William J. Dally   CG-OoO: Energy-Efficient Coarse-Grain
                                  Out-of-Order Execution Near In-Order
                                  Energy with Near Out-of-Order
                                  Performance  . . . . . . . . . . . . . . 39:1--39:??
               Shivam Swami and   
     Poovaiah M. Palangappa and   
                Kartik Mohanram   ECS: Error-Correcting Strings for
                                  Lifetime Improvements in Nonvolatile
                                  Memories . . . . . . . . . . . . . . . . 40:1--40:??
             M. Waqar Azhar and   
              Per Stenström and   
        Vassilis Papaefstathiou   SLOOP: QoS-Supervised Loop Execution to
                                  Reduce Energy on Heterogeneous
                                  Architectures  . . . . . . . . . . . . . 41:1--41:??
     Raghavendra Kanakagiri and   
          Biswabandan Panda and   
                   Madhu Mutyam   MBZip: Multiblock Data Compression . . . 42:1--42:??
              Richard Neill and   
                Andi Drebes and   
                    Antoniu Pop   Fuse: Accurate Multiplexing of Hardware
                                  Performance Counters Across Executions   43:1--43:??
          Somayeh Sardashti and   
                  David A. Wood   Could Compression Be of General Use?
                                  Evaluating Memory Compression across
                                  Domains  . . . . . . . . . . . . . . . . 44:1--44:??
                 Libo Huang and   
                 Yashuai Lü and   
                    Li Shen and   
                   Zhiying Wang   Improving the Efficiency of GPGPU
                                  Work-Queue Through Data Awareness  . . . 45:1--45:??
           Alexandra Angerd and   
               Erik Sintorn and   
                  Per Stenström   A Framework for Automated and Controlled
                                  Floating-Point Accuracy Reduction in
                                  Graphics Applications on GPUs  . . . . . 46:1--46:??
              Jaime Arteaga and   
         Stéphane Zuckerman and   
                   Guang R. Gao   Generating Fine-Grain Multithreaded
                                  Applications Using a Multigrain Approach 47:1--47:??
              Ramyad Hadidi and   
                 Lifeng Nai and   
                Hyojong Kim and   
                    Hyesoon Kim   CAIRO: a Compiler-Assisted Technique for
                                  Enabling Instruction-Level Offloading of
                                  Processing-In-Memory . . . . . . . . . . 48:1--48:??
               Hongyeol Lim and   
                      Giho Park   Triple Engine Processor (TEP): a
                                  Heterogeneous Near-Memory Processor for
                                  Diverse Kernel Operations  . . . . . . . 49:1--49:??
          George Patsilaras and   
                     James Tuck   ReDirect: Reconfigurable Directories for
                                  Multicore Architectures  . . . . . . . . 50:1--50:??
               Adarsh Patil and   
         Ramaswamy Govindarajan   HAShCache: Heterogeneity-Aware Shared
                                  DRAMCache for Integrated Heterogeneous
                                  Systems  . . . . . . . . . . . . . . . . 51:1--51:??
           Christophe Alias and   
               Alexandru Plesco   Optimizing Affine Control With Semantic
                                  Factorizations . . . . . . . . . . . . . 52:1--52:??
             George Matheou and   
           Paraskevas Evripidou   Data-Driven Concurrency for High
                                  Performance Computing  . . . . . . . . . 53:1--53:??
       Giorgis Georgakoudis and   
        Hans Vandierendonck and   
               Peter Thoman and   
      Bronis R. De Supinski and   
           Thomas Fahringer and   
      Dimitrios S. Nikolopoulos   SCALO: Scalability-Aware Parallelism
                                  Orchestration for Multi-Threaded
                                  Workloads  . . . . . . . . . . . . . . . 54:1--54:??
             Toufik Baroudi and   
              Rachid Seghir and   
               Vincent Loechner   Optimization of Triangular and Banded
                                  Matrix Operations Using $2$d-Packed
                                  Layouts  . . . . . . . . . . . . . . . . 55:1--55:??