Last update:
Fri Jan 10 10:12:27 MST 2025
C. Alvarez and
J. Corbal and
E. Salami and
M. Valero Initial Results on Fuzzy Floating Point
Computation for Multimedia Processors 1--1
A. Gordon-Ross and
S. Cotterell and
F. Vahid Exploiting Fixed Programs in Embedded
Systems: a Loop Cache Example . . . . . 2--2
Jin-Hyuck Choi and
Jung-Hoon Lee and
Seh-Woong Jeong and
Shin-Dug Kim and
C. Weems A Low Power TLB Structure for Embedded
Systems . . . . . . . . . . . . . . . . 3--3
B. Towles and
W. J. Dally Worst-case Traffic for Oblivious Routing
Functions . . . . . . . . . . . . . . . 4--4
O. S. Unsal and
C. M. Krishna and
C. A. Moritz Cool-Fetch: Compiler-Enabled Power-Aware
Fetch Throttling . . . . . . . . . . . . 5--5
Li Shang and
L. Peh and
N. K. Jha Power-efficient Interconnection
Networks: Dynamic Voltage Scaling with
Links . . . . . . . . . . . . . . . . . 6--6
A. J. KleinOsowski and
D. J. Lilja MinneSPEC: a New SPEC Benchmark Workload
for Simulation-Based Computer
Architecture Research . . . . . . . . . 7--7
H. Vandierendonck and
K. De Bosschere An Address Transformation Combining
Block- and Word-Interleaving . . . . . . 8--8
S. Tambat and
S. Vajapeyam Page-Level Behavior of Cache Contention 9--9
Philo Juang and
P. Diodato and
S. Kaxiras and
K. Skadron and
Zhigang Hu and
M. Martonosi and
D. W. Clark Implementing Decay Techniques using 4T
Quasi-Static Memory Cells . . . . . . . 10--10
YoungChul Sohn and
NaiHoon Jung and
Seungryoul Maeng Request Reordering to Enhance the
Performance of Strict Consistency Models 11--11
K. A. Shaw and
W. J. Dally Migration in Single Chip Multiprocessors 12--12
K.-H. Sihn and
Joonwon Lee and
Jung-Wan Cho A Speculative Coherence Scheme using
Decoupling Synchronization for
Multiprocessor Systems . . . . . . . . . 1--1
R. Kumar and
K. Farkas and
N. P. Jouppi and
P. Ranganathan and
D. M. Tullsen Processor Power Reduction Via Single-ISA
Heterogeneous Multi-Core Architectures 2--2
R. Sendag and
Peng-fei Chuang and
D. J. Lilja Address Correlation: Exceeding the
Limits of Locality . . . . . . . . . . . 3--3
A. Milenkovic and
M. Milenkovic Stream-Based Trace Compression . . . . . 4--4
Chuanjun Zhang and
F. Vahid and
Jun Yang and
W. Najjar A Way-Halting Cache for Low-Energy
High-Performance Systems . . . . . . . . 5--5
A. Cohen and
F. Finkelstein and
A. Mendelson and
R. Ronen and
D. Rudoy On Estimating Optimal Performance of CPU
Dynamic Thermal Management . . . . . . . 6--6
A. Cristal and
J. F. Martinez and
J. Llosa and
M. Valero A case for resource-conscious
out-of-order processors . . . . . . . . 7--7
D. Citron Exploiting Low Entropy to Reduce Wire
Delay . . . . . . . . . . . . . . . . . 1--1
A. Singh and
W. J. Dally and
B. Towles and
A. K. Gupta Globally Adaptive Load-Balanced Routing
on Tori . . . . . . . . . . . . . . . . 2--2
M. E. Gomez and
J. Duato and
J. Flich and
P. Lopez and
A. Robles and
N. A. Nordbotten and
O. Lysne and
T. Skeie An Efficient Fault-Tolerant Routing
Methodology for Meshes and Tori . . . . 3--3
J. M. Stine and
N. P. Carter and
J. Flich Comparing Adaptive Routing and Dynamic
Voltage Scaling for Link Power Reduction 4--4
B. Robatmili and
N. Yazdani and
S. Sardashti and
M. Nourani Thread-Sensitive Instruction Issue for
SMT Processors . . . . . . . . . . . . . 5--5
Yue Luo and
L. K. John Efficiently Evaluating Speedup Using
Sampled Processor Simulation . . . . . . 6--6
L. Ceze and
K. Strauss and
J. Tuck and
J. Renau and
J. Torrellas CAVA: Hiding L2 Misses with
Checkpoint-Assisted Value Prediction . . 7--7
A. Singh and
W. J. Dally Buffer and Delay Bounds in High Radix
Interconnection Networks . . . . . . . . 8--8
A. L. Holloway and
G. S. Sohi Characterization of Problem Stores . . . 9--9
Y. Sazeides and
R. Kumar and
D. M. Tullsen and
T. Constantinou The Danger of Interval-Based Power
Efficiency Metrics: When Worst Is Best 1--1
O. Mutlu and
Hyesoon Kim and
J. Stark and
Y. N. Patt On Reusing the Results of Pre-Executed
Instructions in a Runahead Execution
Processor . . . . . . . . . . . . . . . 2--2
Chuanjun Zhang Balanced instruction cache: reducing
conflict misses of direct-mapped caches
through balanced subarray accesses . . . 2--5
G. Ottoni and
R. Rangan and
A. Stoler and
M. J. Bridges and
D. I. August From sequential programs to concurrent
threads . . . . . . . . . . . . . . . . 6--9
A. K. Gupta and
W. J. Dally Topology optimization of interconnection
networks . . . . . . . . . . . . . . . . 10--13
J.-L. Gaudiot and
Y. Patt and
K. Skadron Foreword . . . . . . . . . . . . . . . . 11--11
T. Y. Morad and
U. C. Weiser and
A. Kolodny and
M. Valero and
E. Ayguade Performance, power efficiency and
scalability of asymmetric cluster chip
multiprocessors . . . . . . . . . . . . 14--17
N. Riley and
C. Zilles Probabilistic counter updates for
predictor hysteresis and bias . . . . . 18--21
Huiyang Zhou A case for fault tolerance and
performance enhancement using chip
multi-processors . . . . . . . . . . . . 22--25
Moon-Sang Lee and
Sang-Kwon Lee and
Joonwon Lee and
Seung-Ryoul Maeng Adopting system call based address
translation into user-level
communication . . . . . . . . . . . . . 26--29
Jung Ho Ahn and
W. J. Dally Data parallel address architecture . . . 30--33
N. Eisley and
Li-Shiuan Peh and
Li Shang In-network cache coherence . . . . . . . 34--37
R. Srinivasan and
J. Cook and
O. Lubeck Performance modeling using Monte Carlo
simulation . . . . . . . . . . . . . . . 38--41
O. Ergin and
O. Unsal and
X. Vera and
A. Gonzalez Exploiting Narrow Values for Soft Error
Tolerance . . . . . . . . . . . . . . . 12--12
W. Li and
S. Mohanty and
K. Kavi A Page-based Hybrid (Software--Hardware)
Dynamic Memory Allocator . . . . . . . . 13--13
J. Donald and
M. Martonosi An Efficient, Practical Parallelization
Methodology for Multicore Architecture
Simulation . . . . . . . . . . . . . . . 14--14
A. Bracy and
K. Doshi and
Q. Jacobson Disintermediated Active Communication 15--15
A. Mallik and
B. Lin and
G. Memik and
P. Dinda and
R. P. Dick User-Driven Frequency Scaling . . . . . 16--16
C. Blundell and
E. C. Lewis and
M. M. K. Martin Subtleties of transactional memory
atomicity semantics . . . . . . . . . . 17--17
G. Price and
M. Vachharajani A Case for Compressing Traces with BDDs 18--18
M. Moreto Planas and
F. Cazorla and
A. Ramirez and
M. Valero Explaining Dynamic Cache Partitioning
Speed Ups . . . . . . . . . . . . . . . 1--4
N. Enright Jerger and
M. Lipasti and
L. Peh Circuit-Switched Coherence . . . . . . . 5--8
S. Kodakara and
J. Kim and
D. Lilja and
D. Hawkins and
W. Hsu and
P. Yew CIM: a Reliable Metric for Evaluating
Program Phase Classifications . . . . . 9--12
W. R. Dieter and
A. Kaveti and
H. G. Dietz Low-Cost Microarchitectural Support for
Improved Floating-Point Accuracy . . . . 13--16
Y. Etsion and
D. G. Feitelson Probabilistic Prediction of Temporal
Locality . . . . . . . . . . . . . . . . 17--20
Z. Guz and
I. Keidar and
A. Kolodny and
U. Weiser Nahalal: Cache Organization for Chip
Multiprocessors . . . . . . . . . . . . 21--24
J. A. Joao and
O. Mutlu and
H. Kim and
Y. N. Patt Dynamic Predication of Indirect Jumps 25--28
A. Das and
S. Ozdemir and
G. Memik and
J. Zambreno and
A. Choudhary Microarchitectures for Managing Chip
Revenues under Process Variations . . . 29--32
J. Zebchuk and
A. Moshovos A Building Block for Coarse-Grain
Optimizations in the On-Chip Memory
Hierarchy . . . . . . . . . . . . . . . 33--36
J. Kim and
J. Balfour and
W. J. Dally Flattened Butterfly Topology for On-Chip
Networks . . . . . . . . . . . . . . . . 37--40
X. Xiao and
J. Lee A Novel Parallel Deadlock Detection
Algorithm and Hardware for
Multiprocessor System-on-a-Chip . . . . 41--44
D. August and
J. Chang and
S. Girbal and
D. Gracia-Perez and
G. Mouchard and
D. A. Penry and
O. Temam and
N. Vachharajani UNISIM: an Open Simulation Environment
and Library for Complex Architecture
Design and Collaborative Development . . 45--48
R. Sendag and
J. Yi and
P. Chuang Branch Misprediction Prediction:
Complementary Branch Predictors . . . . 49--52
G. Yalcin and
O. Ergin Using tag-match comparators for
detecting soft errors . . . . . . . . . 53--56
J. A. Joao and
O. Mutlu and
H. Kim and
Y. N. Patt Dynamic Predication of Indirect Jumps 1--4
A. Das and
S. Ozdemir and
G. Memik and
J. Zambreno and
A. Choudhary Microarchitectures for Managing Chip
Revenues under Process Variations . . . 5--8
A. Roth Physical register reference counting . . 9--12
J. Flich and
J. Duato Logic-Based Distributed Routing for NoCs 13--16
J. H. Yoon and
E. H. Nam and
Y. J. Seong and
H. Kim and
B. Kim and
S. L. Min and
Y. Cho Chameleon: a High Performance Flash/FRAM
Hybrid Solid State Disk Architecture . . 17--20
A. Biswas and
P. Racunas and
J. Emer and
S. Mukherjee Computing Accurate AVFs using ACE
Analysis on Performance Models: a
Rebuttal . . . . . . . . . . . . . . . . 21--24
S. Cho and
R. Melhem Corollaries to Amdahl's Law for Energy 25--28
J. Balfour and
W. Dally and
D. Black-Schaffer and
V. Parikh and
J. Park An Energy-Efficient Processor
Architecture for Embedded Systems . . . 29--32
Anonymous [Front cover] . . . . . . . . . . . . . c1--c1
Anonymous Editorial Board [Cover2] . . . . . . . . c2--c2
D. Pao and
W. Lin and
B. Liu Pipelined Architecture for Multi-String
Matching . . . . . . . . . . . . . . . . 33--36
R. Sunkam Ramanujam and
B. Lin Randomized Partially-Minimal Routing on
Three-Dimensional Mesh Networks . . . . 37--40
D. Black-Schaffer and
J. Balfour and
W. Dally and
V. Parikh and
J. Park Hierarchical Instruction Register
Organization . . . . . . . . . . . . . . 41--44
J. Lee and
X. Xiao A Parallel Deadlock Detection Algorithm
with $ O(1) $ Overall Run-time
Complexity . . . . . . . . . . . . . . . 45--48
C. Gomez Requena and
F. Gilabert Villamon and
M. Gomez and
P. Lopez and
J. Duato Beyond Fat-tree: Unidirectional
Load--Balanced Multistage
Interconnection Network . . . . . . . . 49--52
Z. Li and
C. Zhu and
L. Shang and
R. Dick and
Y. Sun Transaction-Aware Network-on-Chip
Resource Reservation . . . . . . . . . . 53--56
S. Fide and
S. Jenks Proactive Use of Shared L3 Caches to
Enhance Cache Communications in
Multi-Core Processors . . . . . . . . . 57--60
I. Walter and
I. Cidon and
A. Kolodny BENoC: a Bus-Enhanced Network on-Chip
for a Power Efficient CMP . . . . . . . 61--64
A. Golander and
S. Weiss and
R. Ronen DDMR: Dynamic and Scalable Dual Modular
Redundancy with Short Validation
Intervals . . . . . . . . . . . . . . . 65--68
Anonymous Information for authors . . . . . . . . c3--c3
Anonymous IEEE Computer Society [Cover 4] . . . . c4--c4
Rohit Sunkam Ramanujam and
Bill Lin Weighted Random Routing on Torus
Networks . . . . . . . . . . . . . . . . 1--4
Jung Ho Ahn and
Jacob Leverich and
Robert S. Schreiber and
Norman P. Jouppi Multicore DIMM: an Energy Efficient
Memory Module with Independently
Controlled DRAMs . . . . . . . . . . . . 5--8
Po-Han Wang and
Yen-Ming Chen and
Chia-Lin Yang and
Yu-Jung Cheng A Predictive Shutdown Technique for GPU
Shader Processors . . . . . . . . . . . 9--12
Christopher Barnes and
Pranav Vaidya and
Jaehwan John Lee An XML-Based ADL Framework for Automatic
Generation of Multithreaded Computer
Architecture Simulators . . . . . . . . 13--16
Carlos Luque and
Miquel Moreto and
Francisco J. Cazorla and
Roberto Gioiosa and
Alper Buyuktosunoglu and
Mateo Valero CPU Accounting in CMP Processors . . . . 17--20
Vassos Soteriou and
Rohit Sunkam Ramanujam and
Bill Lin and
Li-Shiuan Peh A High-Throughput Distributed
Shared-Buffer NoC Router . . . . . . . . 21--24
Zvika Guz and
Evgeny Bolotin and
Idit Keidar and
Avinoam Kolodny and
Avi Mendelson and
Uri C. Weiser Many-Core vs. Many-Thread Machines: Stay
Away From the Valley . . . . . . . . . . 25--28
Aniruddha Desai and
Jugdutt Singh Architecture Independent
Characterization of Embedded Java
Workloads . . . . . . . . . . . . . . . 29--32
Elisardo Antelo A Comment on ``Beyond Fat-tree:
Unidirectional Load-Balanced Multistage
Interconnection Network'' . . . . . . . 33--34
Anonymous [Advertisement] . . . . . . . . . . . . 35--35
Anonymous Ad --- IEEE Computer Society Digital
Library . . . . . . . . . . . . . . . . 36--36
Anonymous Editorial Board [Cover2] . . . . . . . . c2--c2
Anonymous [Front cover] . . . . . . . . . . . . . c1--c1
Anonymous Information for authors . . . . . . . . c3--c3
Anonymous IEEE Computer Society [Cover4] . . . . . c4--c4
Jean-Luc Gaudiot Introducing the New Editor-in-Chief of
IEEE Computer Architecture
Letters . . . . . . . . . . . . . . . . 37--38
K. Skadron Letter from the Editor . . . . . . . . . 39--39
Kevin Skadron Untitled . . . . . . . . . . . . . . . . 39--39
Jing Xin and
Russ Joseph Exploiting Locality to Improve
Circuit-level Timing Speculation . . . . 40--43
Arvind Sudarsanam and
Ramachandra Kallam and
Aravind Dasu PRR--PRR Dynamic Relocation . . . . . . 44--47
Jacob Leverich and
Matteo Monchiero and
Vanish Talwar and
Partha Ranganathan and
Christos Kozyrakis Power Management of Datacenter Workloads
Using Per-Core Power Gating . . . . . . 48--51
Enric Musoll A Process-Variation Aware Technique for
Tile-Based, Massive Multicore Processors 52--55
Alexandro Baldassin and
Felipe Klein and
Guido Araujo and
Rodolfo Azevedo and
Paulo Centoducatte Characterizing the Energy Consumption of
Software Transactional Memory . . . . . 56--59
James Balfour and
R. Curtis Harting and
William J. Dally Operand Registers and Explicit Operand
Forwarding . . . . . . . . . . . . . . . 60--63
Derek Chiou and
Hari Angepat and
Nikhil A. Patil and
Dam Sunwoo Accurate Functional-First Multicore
Simulators . . . . . . . . . . . . . . . 64--67
Anonymous [Advertisement] . . . . . . . . . . . . 68--68
Anonymous [Advertisement] . . . . . . . . . . . . 69--69
Anonymous [Advertisement] . . . . . . . . . . . . 70--70
Anonymous [Advertisement] . . . . . . . . . . . . 71--71
Anonymous [Advertisement] . . . . . . . . . . . . 72--72
Anonymous Editorial Board [Cover2] . . . . . . . . c2--c2
Anonymous [Front cover] . . . . . . . . . . . . . c1--c1
Anonymous Information for authors . . . . . . . . c3--c3
Anonymous IEEE Computer Society [Cover4] . . . . . c4--c4
Shruti Patil and
David J. Lilja Using Resampling Techniques to Compute
Confidence Intervals for the Harmonic
Mean of Rate-Based Performance Metrics 1--4
Andre Seznec A Phase Change Memory as a Secure Main
Memory . . . . . . . . . . . . . . . . . 5--8
Seon-yeong Park and
Euiseong Seo and
Ji-Yong Shin and
Seungryoul Maeng and
Joonwon Lee Exploiting Internal Parallelism of
Flash-based SSDs . . . . . . . . . . . . 9--12
Hari Subramoni and
Fabrizio Petrini and
Virat Agarwal and
Davide Pasetto Intra-Socket and Inter-Socket
Communication in Multi-core Systems . . 13--16
Giang Hoang and
Chang Bae and
John Lange and
Lide Zhang and
Peter Dinda and
Russ Joseph A Case for Alternative Nested Paging
Models for Virtualized Systems . . . . . 17--20
Evgeni Krimer and
Robert Pawlowski and
Mattan Erez and
Patrick Chiang Synctium: a Near-Threshold Stream
Processor for Energy-Constrained
Parallel Applications . . . . . . . . . 21--24
Andrew Hilton and
Amir Roth SMT-Directory: Efficient Load-Load
Ordering for SMT . . . . . . . . . . . . 25--28
Mohammad Hammoud and
Sangyeun Cho and
Rami G. Melhem A Dynamic Pressure-Aware Associative
Placement Strategy for Large Scale Chip
Multiprocessors . . . . . . . . . . . . 29--32
Hyungjun Kim and
Paul V. Gratz Leveraging Unused Cache Block Words to
Reduce Power in CMP Interconnect . . . . 33--36
Anonymous Editorial Board [Cover2] . . . . . . . . c2--c2
Anonymous [Front cover] . . . . . . . . . . . . . c1--c1
Anonymous Information for authors . . . . . . . . c3--c3
Anonymous IEEE Computer Society [Cover4] . . . . . c4--c4
K. Skadron Editorial: Letter from the
Editor-in-Chief . . . . . . . . . . . . 37--44
Kevin Skadron Untitled . . . . . . . . . . . . . . . . 37--44
Syed Muhammad Zeeshan Iqbal and
Yuchen Liang and
Hakan Grahn ParMiBench --- an Open-Source Benchmark
for Embedded Multiprocessor Systems . . 45--48
Zhen Fang and
Erik G. Hallnor and
Bin Li and
Michael Leddige and
Donglai Dai and
Seung Eun Lee and
Srihari Makineni and
Ravi Iyer Boomerang: Reducing Power Consumption of
Response Packets in NoCs with Minimal
Performance Impact . . . . . . . . . . . 49--52
Michael J. Lyons and
Mark Hempstead and
Gu-Yeon Wei and
David Brooks The Accelerator Store framework for
high-performance, low-power
accelerator-based systems . . . . . . . 53--56
Ran Manevich and
Israel Cidon and
Avinoam Kolodny and
Isask'har Walter Centralized Adaptive Routing for NoCs 57--60
Meng Zhang and
Alvin R. Lebeck and
Daniel J. Sorin Fractal Consistency: Architecting the
Memory System to Facilitate Verification 61--64
Anonymous Advertisement --- IEEE
Transactions on Computers Celebrates 60
Years . . . . . . . . . . . . . . . . . 65--65
Anonymous 2011 IEEE Computer Society Simulator
Design Competition . . . . . . . . . . . 66--66
Anonymous Advertisement --- Special Student Offer 67--67
Anonymous Advertisement --- Distinguish Yourself
With the CSDP . . . . . . . . . . . . . 68--68
Anonymous Conference Proceedings Services (CPS)
[advertisement] . . . . . . . . . . . . 69--69
Anonymous IEEE Computer Society Jobs . . . . . . . 70--70
Anonymous Advertisement --- Stay Connected to the
IEEE Computer Society . . . . . . . . . 71--71
Anonymous Advertisement --- Computer Society
Digital Library . . . . . . . . . . . . 72--72
Anonymous Editorial Board [Cover2] . . . . . . . . c2--c2
Anonymous [Front cover] . . . . . . . . . . . . . c1--c1
Anonymous Information for authors . . . . . . . . c3--c3
Anonymous IEEE Computer Society [Cover4] . . . . . c4--c4
K. Skadron Editorial: Letter from the
Editor-in-Chief . . . . . . . . . . . . 1--3
Kevin Skadron Untitled . . . . . . . . . . . . . . . . 1--3
Hans Vandierendonck and
Andre Seznec Fairness Metrics for Multi-Threaded
Processors . . . . . . . . . . . . . . . 4--7
Jie Tang and
Shaoshan Liu and
Zhimin Gu and
Chen Liu and
Jean-Luc Gaudiot Prefetching in Embedded Mobile Systems
Can Be Energy-Efficient . . . . . . . . 8--11
Omer Khan and
Mieszko Lis and
Yildiz Sinangil and
Srinivas Devadas DCC: a Dependable Cache Coherence
Multicore Architecture . . . . . . . . . 12--15
Paul Rosenfeld and
Elliott Cooper-Balis and
Bruce Jacob DRAMSim2: a Cycle Accurate Memory System
Simulator . . . . . . . . . . . . . . . 16--19
Chunyang Gou and
Georgi N. Gaydadjiev Exploiting SPMD Horizontal Locality . . 20--23
Xiaoqun Wang and
Zhenzhou Ji and
Chen Fu and
Mingzeng Hu GCMS: a Global Contention Management
Scheme in Hardware Transactional Memory 24--27
Anonymous 2010 Reviewers List . . . . . . . . . . 28--28
Anonymous 2010 Annual Index . . . . . . . . . . . ??
Anonymous Cover 2 . . . . . . . . . . . . . . . . c2--c2
Anonymous Cover 3 . . . . . . . . . . . . . . . . c3--c3
Anonymous Cover 4 . . . . . . . . . . . . . . . . c4--c4
Anonymous [Front cover] . . . . . . . . . . . . . c1--c1
Jason Mars and
Lingjia Tang and
Robert Hundt Heterogeneity in ``Homogeneous''
Warehouse-Scale Computers: a Performance
Opportunity . . . . . . . . . . . . . . 29--32
George Michelogiannakis and
Nan Jiang and
Daniel U. Becker and
William J. Dally Packet Chaining: Efficient Single-Cycle
Allocation for On-Chip Networks . . . . 33--36
Chen-Han Ho and
Garret Staus and
Aaron Ulmer and
Karthikeyan Sankaralingam Exploring the Interaction Between Device
Lifetime Reliability and Security
Vulnerabilities . . . . . . . . . . . . 37--40
Carles Hernandez and
Antoni Roca and
Jose Flich and
Federico Silla and
Jose Duato Fault-Tolerant Vertical Link Design for
Effective 3D Stacking . . . . . . . . . 41--44
Inseok Choi and
Minshu Zhao and
Xu Yang and
Donald Yeung Experience with Improving Distributed
Shared Cache Performance on Tilera's
Tile Processor . . . . . . . . . . . . . 45--48
Pablo Prieto and
Valentin Puente and
Jose-Angel Gregorio Multilevel Cache Modeling for
Chip-Multiprocessor Systems . . . . . . 49--52
Kostas Siozios and
Dimitrios Rodopoulos and
Dimitrios Soudris On Supporting Rapid Thermal Analysis . . 53--56
Anonymous Cover 3 . . . . . . . . . . . . . . . . c3--c3
Anonymous [Front cover] . . . . . . . . . . . . . c1--c1
Anonymous IEEE Computer Society [society
information] . . . . . . . . . . . . . . c4--c4
Anonymous Publication information . . . . . . . . c2--c2
Simha Sethumadhavan and
Ryan Roberts and
Yannis Tsividis A Case for Hybrid Discrete-Continuous
Architectures . . . . . . . . . . . . . 1--4
Ji Kong and
Peilin Liu and
Yu Zhang Atomic Streaming: a Framework of On-Chip
Data Supply System for Task-Parallel
MPSoCs . . . . . . . . . . . . . . . . . 5--8
Abhishek Deb and
Josep Maria Codina and
Antonio Gonzalez A HW/SW Co-designed Programmable
Functional Unit . . . . . . . . . . . . 9--12
Roberta Piscitelli and
Andy D. Pimentel A High-Level Power Model for MPSoC on
FPGA . . . . . . . . . . . . . . . . . . 13--16
Ian Finlayson and
Gang-Ryung Uh and
David Whalley and
Gary Tyson An Overview of Static Pipelining . . . . 17--20
Lisa Wu and
Martha A. Kim and
Stephen A. Edwards Cache Impacts of Datatype Acceleration 21--24
Anonymous 2011 Reviewers List . . . . . . . . . . 25--26
Anonymous There now is a quick and easy way to
find out about our collection of
Transactions [Advertisement] 26--26
Anonymous Advertisement --- Conference Publishing
Services (CPS) . . . . . . . . . . . . . 28--28
Anonymous 2011 Annual Index . . . . . . . . . . . ??
Anonymous [Cover2] . . . . . . . . . . . . . . . . c2--c2
Anonymous [Cover3] . . . . . . . . . . . . . . . . c3--c3
Anonymous [Front cover and table of contents] . . c1--c1
Anonymous IEEE Computer Society [Back cover] . . . c4--c4
John D. Davis and
Suzanne Rivoire and
Moises Goldszmidt and
Ehsan K. Ardestani Including Variability in Large-Scale
Cluster Power Models . . . . . . . . . . 29--32
Nagesh B. Lakshminarayana and
Jaekyu Lee and
Hyesoon Kim and
Jinwoo Shin DRAM Scheduling Policy for GPGPU
Architectures Based on a Potential
Function . . . . . . . . . . . . . . . . 33--36
Yaohua Wang and
Shuming Chen and
Kai Zhang and
Jianghua Wan and
Xiaowen Chen and
Hu Chen and
Haibo Wang Instruction Shuffle: Achieving MIMD-like
Performance on SIMD Architectures . . . 37--40
Reena Panda and
Paul V. Gratz and
Daniel A. Jiménez B-Fetch: Branch Prediction Directed
Prefetching for In-Order Processors . . 41--44
Timothy N. Miller and
Renji Thomas and
Radu Teodorescu Mitigating the Effects of Process
Variation in Ultra-low Voltage Chip
Multiprocessors using Dual Supply
Voltages and Half-Speed Units . . . . . 45--48
Yong Li and
Rami Melhem and
Alex K. Jones Leveraging Sharing in Second Level
Translation-Lookaside Buffers for Chip
Multiprocessors . . . . . . . . . . . . 49--52
Christina Delimitrou and
Sriram Sankar and
Kushagra Vaid and
Christos Kozyrakis Decoupling Datacenter Storage Studies
from Access to Large-Scale Applications 53--56
Jie Chen and
Guru Venkataramani and
Gabriel Parmer The Need for Power Debugging in the
Multi-Core Environment . . . . . . . . . 57--60
Justin Meza and
Jichuan Chang and
HanBin Yoon and
Onur Mutlu and
Parthasarathy Ranganathan Enabling Efficient and Scalable Hybrid
Memories Using Fine-Granularity DRAM
Cache Management . . . . . . . . . . . . 61--64
Tsahee Zidenberg and
Isaac Keslassy and
Uri Weiser MultiAmdahl: How Should I Divide My
Heterogeneous Chip? . . . . . . . . . . 65--68
Anonymous [Back cover] . . . . . . . . . . . . . . c4--c4
Anonymous [Back inside cover] . . . . . . . . . . c3--c3
Anonymous [Front inside cover] . . . . . . . . . . c2--c2
Kevin Skadron Introducing the New Editor-in-Chief of
the IEEE Computer Architecture
Letters . . . . . . . . . . . . . . . . 1--1
Anonymous 2012 Annual Index . . . . . . . . . . . 1--4
Lieven Eeckhout A Message from the New Editor-in-Chief
and Introduction of New Associate
Editors . . . . . . . . . . . . . . . . 2--2
J. Martinez A Message from the New Editor-in-Chief
and Introduction of New Associate
Editors . . . . . . . . . . . . . . . . 2--4
Arash Tavakkol and
Mohammad Arjomand and
Hamid Sarbazi-Azad Network-on-SSD: a Scalable and
High-Performance Communication Design
Paradigm for SSDs . . . . . . . . . . . 5--8
Guang Sun and
Chia-Wei Chang and
Bill Lin A New Worst-Case Throughput Bound for
Oblivious Routing in Odd Radix Mesh
Network . . . . . . . . . . . . . . . . 9--12
I. Burak Karsli and
Pedro Reviriego and
M. Fatih Balli and
Oğuz Ergin and
J. A. Maestro Enhanced Duplication: a Technique to
Correct Soft Errors in Narrow Values . . 13--16
Michael Lyons and
Gu-Yeon Wei and
David Brooks Shrink-Fit: a Framework for Flexible
Accelerator Sizing . . . . . . . . . . . 17--20
Nam Duong and
Alexander V. Veidenbaum Compiler-Assisted, Selective
Out-Of-Order Commit . . . . . . . . . . 21--24
Siddharth Nilakantan and
Steven Battle and
Mark Hempstead Metrics for Early-Stage Modeling of
Many-Accelerator Architectures . . . . . 25--28
Christina Delimitrou and
Christos Kozyrakis The Netflix Challenge: Datacenter
Edition . . . . . . . . . . . . . . . . 29--32
Anonymous 2012 reviewers list . . . . . . . . . . 33--34
Anonymous IEEE Open Access Publishing . . . . . . 35--35
Anonymous IEEE Transactions Newsletter 36--36
J. F. Martinez Editorial . . . . . . . . . . . . . . . 37--38
Xun Jian and
John Sartori and
Henry Duwe and
Rakesh Kumar High Performance, Energy Efficient
Chipkill Correct Memory with
Multidimensional Parity . . . . . . . . 39--42
Rakan Maddah and
Sangyeun Cho and
Rami Melhem Data Dependent Sparing to Manage
Better-Than-Bad Blocks . . . . . . . . . 43--46
Hanjoon Kim and
Yonggon Kim and
John Kim Clumsy Flow Control for High-Throughput
Bufferless On-Chip Networks . . . . . . 47--50
Yi Kai and
Yi Wang and
Bin Liu GreenRouter: Reducing Power by
Innovating Router's Architecture . . . . 51--54
Yongsoo Joo and
Sangsoo Park A Hybrid PRAM and STT--RAM Cache
Architecture for Extending the Lifetime
of PRAM Caches . . . . . . . . . . . . . 55--58
Emily Blem and
Hadi Esmaeilzadeh and
Renee St Amant and
Karthikeyan Sankaralingam and
Doug Burger Multicore Model from Abstract Single
Core Inputs . . . . . . . . . . . . . . 59--62
Pierre Michaud Demystifying Multicore Throughput
Metrics . . . . . . . . . . . . . . . . 63--66
Priyanka Tembey and
Augusto Vega and
Alper Buyuktosunoglu and
Dilma Da Silva and
Pradip Bose SMT Switch: Software Mechanisms for
Power Shifting . . . . . . . . . . . . . 67--70
Anonymous IEEE Open Access Publishing . . . . . . 71--71
Anonymous Stay Connected to the IEEE Computer
Society . . . . . . . . . . . . . . . . 72--72
Anonymous [Back cover] . . . . . . . . . . . . . . c4--c4
Anonymous [Back inside cover] . . . . . . . . . . c3--c3
Anonymous [Front cover] . . . . . . . . . . . . . c1--c1
Anonymous [Front inside cover] . . . . . . . . . . c2--c2
Angelos Arelakis and
Per Stenström A Case for a Value-Aware Cache . . . . . 1--4
Zheng Chen and
Huaxi Gu and
Yintang Yang and
Luying Bai and
Hui Li A Power Efficient and Compact Optical
Interconnect for Network-on-Chip . . . . 5--8
Emilio G. Cota and
Paolo Mantovani and
Michele Petracca and
Mario R. Casu and
Luca P. Carloni Accelerator Memory Reuse in the Dark
Silicon Era . . . . . . . . . . . . . . 9--12
Yu-Liang Chou and
Shaoshan Liu and
Eui-Young Chung and
Jean-Luc Gaudiot An Energy and Performance Efficient DVFS
Scheme for Irregular Parallel
Divide-and-Conquer Algorithms on the
Intel SCC . . . . . . . . . . . . . . . 13--16
Nadav Rotem and
Yosi Ben Asher Block Unification IF-conversion for High
Performance Architectures . . . . . . . 17--20
Aleksandar Ilic and
Frederico Pratas and
Leonel Sousa Cache-aware Roofline model: Upgrading
the loft . . . . . . . . . . . . . . . . 21--24
Efraim Rotem and
Ran Ginosar and
Uri C. Weiser and
Avi Mendelson Energy Aware Race to Halt: a Down to
EARtH Approach for Platform Energy
Management . . . . . . . . . . . . . . . 25--28
Yaman Çakmakçi and
Oğuz Ergin Exploiting Virtual Addressing for
Increasing Reliability . . . . . . . . . 29--32
Yuhao Zhu and
Aditya Srikanth and
Jingwen Leng and
Vijay Janapa Reddi Exploiting Webpage Characteristics for
Energy-Efficient Mobile Web Browsing . . 33--36
Amir Morad and
Tomer Y. Morad and
Leonid Yavits and
Ran Ginosar and
Uri Weiser Generalized MultiAmdahl: Optimization of
Heterogeneous Multi-Accelerator SoC . . 37--40
Shahar Kvatinsky and
Yuval H. Nacson and
Yoav Etsion and
Eby G. Friedman and
Avinoam Kolodny and
Uri C. Weiser Memristor-Based Multithreading . . . . . 41--44
Joseph G. Wingbermuehle and
Ron K. Cytron and
Roger D. Chamberlain Optimization of Application-Specific
Memories . . . . . . . . . . . . . . . . 45--48
Yunlong Xu and
Rui Wang and
Nilanjan Goswami and
Tao Li and
Depei Qian Software Transactional Memory for GPU
Architectures . . . . . . . . . . . . . 49--52
Keun Sup Shim and
Mieszko Lis and
Omer Khan and
Srinivas Devadas Thread Migration Prediction for
Distributed Shared Caches . . . . . . . 53--56
Anonymous Table of Contents . . . . . . . . . . . C1--C4
Anonymous IEEE Transactions on Pattern
Analysis and Machine Intelligence
Editorial Board . . . . . . . . . . . . C2--C2
Anonymous IEEE Transactions on Pattern
Analysis and Machine Intelligence
Information for Authors . . . . . . . . C3--C3
Anonymous IEEE Computer Society . . . . . . . . . C4--C4
Maysam Lavasani and
Hari Angepat and
Derek Chiou An FPGA-based In-Line Accelerator for
Memcached . . . . . . . . . . . . . . . 57--60
Xiang Song and
Jian Yang and
Haibo Chen Architecting Flash-based Solid-State
Drive for High-performance I/O
Virtualization . . . . . . . . . . . . . 61--64
Carole-Jean Wu Architectural Thermal Energy Harvesting
Opportunities for Sustainable Computing 65--68
Leonid Yavits and
Amir Morad and
Ran Ginosar Cache Hierarchy Optimization . . . . . . 69--72
Sadegh Yazdanshenas and
Marzieh Ranjbar Pirbasti and
Mahdi Fazeli and
Ahmad Patooghy Coding Last Level STT-RAM Cache For High
Endurance And Low Power . . . . . . . . 73--76
Jan Kasper Martinsen and
Hakan Grahn and
Anders Isberg Heuristics for Thread-Level Speculation
in Web Applications . . . . . . . . . . 77--80
Vivek S. Nandakumar and
Małgorzata Marek-Sadowska On Optimal Kernel Size for Integrated
CPU--GPUs --- a Case Study . . . . . . . 81--84
Qixiao Liu and
Victor Jimenez and
Miquel Moreto and
Jaume Abella and
Francisco J. Cazorla and
Mateo Valero Per-task Energy Accounting in Computing
Systems . . . . . . . . . . . . . . . . 85--88
Hamid Mahmoodi and
Sridevi Srinivasan Lakshmipuram and
Manish Arora and
Yashar Asgarieh and
Houman Homayoun and
Bill Lin and
Dean M. Tullsen Resistive Computation: a Critique . . . 89--92
Stijn Eyerman and
Lieven Eeckhout Restating the Case for Weighted-IPC
Metrics to Evaluate Multiprogram
Workload Performance . . . . . . . . . . 93--96
Sonya R. Wolff and
Ronald D. Barnes Revisiting Using the Results of
Pre-Executed Instructions in Runahead
Processors . . . . . . . . . . . . . . . 97--100
Youngsok Kim and
Jaewon Lee and
Donggyu Kim and
Jangwoo Kim ScaleGPU: GPU Architecture for
Memory-Unaware GPU Programming . . . . . 101--104
Sriram Sankar and
Sudhanva Gurumurthi Soft Failures in Large Datacenters . . . 105--108
Daehoon Kim and
Hwanju Kim and
Jaehyuk Huh vCache: Providing a Transparent View of
the LLC in Virtualized Environments . . 109--112
Anonymous Table of Contents . . . . . . . . . . . C1--C1
Anonymous IEEE Computer Architecture
Letters Editorial Board . . . . . . . . C2--C2
Anonymous IEEE Computer Architecture
Letters Information for Authors . . . . C3--C3
Anonymous IEEE Computer Society [advertisement] C4--C4
Jianwei Liao and
Fengxiang Zhang and
Li Li and
Guoqiang Xiao Adaptive Wear-Leveling in Flash-Based
Memory . . . . . . . . . . . . . . . . . 1--4
Anonymous 2014 Index IEEE Computer
Architecture Letters Vol. 13 . . . . . . 1--5
Jie Chen and
Guru Venkataramani A Hardware-Software Cooperative Approach
for Application Energy Profiling . . . . 5--8
Dae-Hyun Kim and
Prashant J. Nair and
Moinuddin K. Qureshi Architectural Support for Mitigating Row
Hammering in DRAM Memories . . . . . . . 9--12
Ralph Nathan and
Daniel J. Sorin Argus-G: Comprehensive, Low-Cost Error
Detection for GPGPU Cores . . . . . . . 13--16
Seongil O and
Sanghyuk Kwon and
Young Hoon Son and
Yujin Park and
Jung Ho Ahn CIDR: a Cache Inspired Area-Efficient
DRAM Resilience Architecture against
Permanent Faults . . . . . . . . . . . . 17--20
Ujjwal Gupta and
Umit Y. Ogras Constrained Energy Optimization in
Heterogeneous Platforms Using
Generalized Scaling Models . . . . . . . 21--25
Amin Farmahini-Farahani and
Jung Ho Ahn and
Katherine Morrow and
Nam Sung Kim DRAMA: an Architecture for Accelerated
Processing Near Memory . . . . . . . . . 26--29
Trevor E. Carlson and
Siddharth Nilakantan and
Mark Hempstead and
Wim Heirman Epoch Profiles: Microarchitecture-Based
Application Analysis and Optimization 30--33
Jason Power and
Joel Hestness and
Marc S. Orr and
Mark D. Hill and
David A. Wood gem5-gpu: a Heterogeneous CPU--GPU
Simulator . . . . . . . . . . . . . . . 34--36
Dilan Manatunga and
Joo Hwan Lee and
Hyesoon Kim Hardware Support for Safe Execution of
Native Client Applications . . . . . . . 37--40
Longjun Liu and
Chao Li and
Hongbin Sun and
Yang Hu and
Jingmin Xin and
Nanning Zheng and
Tao Li Leveraging Heterogeneous Power for
Improving Datacenter Efficiency and
Resiliency . . . . . . . . . . . . . . . 41--45
Rui Wang and
Wangyuan Zhang and
Tao Li and
Depei Qian Leveraging Non-Volatile Storage to
Achieve Versatile Cache Optimizations 46--49
Milad Mohammadi and
Song Han and
Tor M. Aamodt and
William J. Dally On-Demand Dynamic Branch Prediction . . 50--53
Leonid Azriel and
Avi Mendelson and
Uri Weiser Peripheral Memory: a Technique for
Fighting Memory Bandwidth Bottleneck . . 54--57
Zhaoguo Wang and
Han Yi and
Ran Liu and
Mingkai Dong and
Haibo Chen Persistent Transactional Memory . . . . 58--61
Enric Gibert and
Raul Martínez and
Carlos Madriles and
Josep M. Codina Profiling Support for Runtime Managed
Code: Next Generation Performance
Monitoring Units . . . . . . . . . . . . 62--65
Daecheol You and
Ki-Seok Chung Quality of Service-Aware Dynamic Voltage
and Frequency Scaling for Embedded GPUs 66--69
Sungjin Lee and
Jihong Kim and
Arvind Refactored Design of I/O Architecture
for Flash Storage . . . . . . . . . . . 70--74
Fengkai Yuan and
Zhenzhou Ji and
Suxia Zhu Set-Granular Regional Distributed
Cooperative Caching . . . . . . . . . . 75--78
Junghee Lee and
Youngjae Kim and
Jongman Kim and
Galen M. Shipman Synchronous I/O Scheduling of
Independent Write Caches for an Array of
SSDs . . . . . . . . . . . . . . . . . . 79--82
Anonymous Rock Stars of Wearables . . . . . . . . 83--83
Anonymous Rock Stars of Cybersecurity 2015
Conference . . . . . . . . . . . . . . . 84--84
Anonymous Table of Contents . . . . . . . . . . . C1--C1
Anonymous IEEE Computer Architecture
Letters Editorial Board . . . . . . . . C2--C2
Anonymous IEEE Computer Architecture
Letters Information for Authors . . . . C3--C3
Anonymous IEEE Computer Society . . . . . . . . . C4--C4
Qingchuan Shi and
Henry Hoffmann and
Omer Khan A Cross-Layer Multicore Architecture to
Tradeoff Program Accuracy and Resilience
Overheads . . . . . . . . . . . . . . . 85--89
Zhong Zheng and
Zhiying Wang and
Mikko Lipasti Adaptive Cache and Concurrency
Allocation on GPGPUs . . . . . . . . . . 90--93
Tony Nowatzki and
Venkatraman Govindaraju and
Karthikeyan Sankaralingam A Graph-Based Program Representation for
Analyzing Hardware Specialization
Approaches . . . . . . . . . . . . . . . 94--98
Seung Hun Kim and
Dohoon Kim and
Changmin Lee and
Won Seob Jeong and
Won Woo Ro and
Jean-Luc Gaudiot A Performance-Energy Model to Evaluate
Single Thread Execution Acceleration . . 99--102
William Song and
Saibal Mukhopadhyay and
Sudhakar Yalamanchili Architectural Reliability: Lifetime
Reliability Characterization and
Management of Many-Core Processors . . . 103--106
Pavan Poluri and
Ahmed Louri A Soft Error Tolerant Network-on-Chip
Router Pipeline for Multi-Core Systems 107--110
Canwen Xiao and
Yue Yang and
Jianwen Zhu A Sufficient Condition for Deadlock-Free
Adaptive Routing in Mesh Networks . . . 111--114
Sparsh Mittal and
Jeffrey S. Vetter AYUSH: a Technique for Extending
Lifetime of SRAM--NVM Hybrid Caches . . 115--118
Rajit Manohar Comparing Stochastic and Deterministic
Computing . . . . . . . . . . . . . . . 119--122
Bon-Keun Seo and
Seungryoul Maeng and
Joonwon Lee and
Euiseong Seo DRACO: a Deduplicating FTL for Tangible
Extra Capacity . . . . . . . . . . . . . 123--126
Vivek Seshadri and
Kevin Hsieh and
Amirali Boroumand and
Donghyuk Lee and
Michael A. Kozuch and
Onur Mutlu and
Phillip B. Gibbons and
Todd C. Mowry Fast Bulk Bitwise AND and OR in DRAM . . 127--131
Muhammad Shoaib Bin Altaf and
David A. Wood LogCA: a Performance Model for Hardware
Accelerators . . . . . . . . . . . . . . 132--135
Dionysios Diamantopoulos and
Sotirios Xydis and
Kostas Siozios and
Dimitrios Soudris Mitigating Memory-Induced Dark Silicon
in Many-Accelerator Architectures . . . 136--139
Matthew Poremba and
Tao Zhang and
Yuan Xie NVMain 2.0: a User-Friendly Memory
Simulator to Model (Non-) Volatile
Memory Systems . . . . . . . . . . . . . 140--143
Hans Vandierendonck and
Ahmad Hassan and
Dimitrios S. Nikolopoulos On the Energy-Efficiency of
Byte-Addressable Non-Volatile Memory . . 144--147
Leonid Yavits and
Shahar Kvatinsky and
Amir Morad and
Ran Ginosar Resistive Associative Processor . . . . 148--151
Suk Chan Kang and
Chrysostomos Nicopoulos and
Ada Gavrilovska and
Jongman Kim Subtleties of Run-Time Virtual Address
Stacks . . . . . . . . . . . . . . . . . 152--155
Dimitrios Rodopoulos and
Francky Catthoor and
Dimitrios Soudris Tackling Performance Variability Due to
RAS Mechanisms with PID-Controlled DVFS 156--159
Nikola Markovic and
Daniel Nemirovsky and
Osman Unsal and
Mateo Valero and
Adrian Cristal Thread Lock Section-Aware Scheduling on
Asymmetric Single-ISA Multi-Core . . . . 160--163
Gennady Pekhimenko and
Evgeny Bolotin and
Mike O'Connor and
Onur Mutlu and
Todd C. Mowry and
Stephen W. Keckler Toggle-Aware Compression for GPUs . . . 164--168
Anonymous Table of Contents . . . . . . . . . . . C1--C1
Anonymous IEEE Computer Architecture
Letters Editorial Board . . . . . . . . C2--C2
Anonymous IEEE Computer Architecture
Letters Information for Authors . . . . C3--C3
Anonymous IEEE Computer Society . . . . . . . . . C4--C4
Wo-Tak Wu and
Ahmed Louri A Methodology for Cognitive NoC Design 1--4
Anonymous 2015 Index IEEE Computer
Architecture Letters Vol. 14 . . . . . . 1--6
Seyyed Hossein Seyyedaghaei Rezaei and
Abbas Mazloumi and
Mehdi Modarressi and
Pejman Lotfi-Kamran Dynamic Resource Sharing for
High-Performance 3-D Networks-on-Chip 5--8
Miguel Gorgues and
Jose Flich End-Point Congestion Filter for Adaptive
Routing with Congestion-Insensitive
Performance . . . . . . . . . . . . . . 9--12
Biswabandan Panda and
Shankar Balachandran Expert Prefetch Prediction: an Expert
Predicting the Usefulness of Hardware
Prefetchers . . . . . . . . . . . . . . 13--16
Abdulaziz Eker and
Oğuz Ergin Exploiting Existing Copies in Register
File for Soft Error Correction . . . . . 17--20
Matthew Maycock and
Simha Sethumadhavan Hardware Enforced Statistical Privacy 21--24
Dongdong Li and
Tor M. Aamodt Inter-Core Locality Aware Memory
Scheduling . . . . . . . . . . . . . . . 25--28
Libei Pu and
Kshitij Doshi and
Ellis Giles and
Peter Varman Non-Intrusive Persistence with a Backend
NVM Controller . . . . . . . . . . . . . 29--32
P. Garcia and
T. Gomes and
J. Monteiro and
A. Tavares and
M. Ekpanyapong On-Chip Message Passing Sub-System for
Embedded Inter-Domain Communication . . 33--36
Minghua Li and
Guancheng Chen and
Qijun Wang and
Yonghua Lin and
Peter Hofstee and
Per Stenstrom and
Dian Zhou PATer: a Hardware Prefetching Automatic
Tuner on IBM POWER8 Processor . . . . . 37--40
Mohammad Alian and
Daehoon Kim and
Nam Sung Kim pd-gem5: Simulation Infrastructure for
Parallel/Distributed Computer Systems 41--44
Yoongu Kim and
Weikun Yang and
Onur Mutlu Ramulator: a Fast and Extensible DRAM
Simulator . . . . . . . . . . . . . . . 45--49
Lena E. Olson and
Simha Sethumadhavan and
Mark D. Hill Security Implications of Third-Party
Accelerators . . . . . . . . . . . . . . 50--53
Bruce Jacob The Case for VLIW--CMP as a Building
Block for Exascale . . . . . . . . . . . 54--57
Marios Kleanthous and
Yiannakis Sazeides and
Emre Ozer and
Chrysostomos Nicopoulos and
Panagiota Nikolaou and
Zacharias Hadjilambrou Toward Multi-Layer Holistic Evaluation
of System Designs . . . . . . . . . . . 58--61
Bhavya K. Daya and
Li-Shiuan Peh and
Anantha P. Chandrakasan Towards High-Performance Bufferless NoCs
with SCEPTER . . . . . . . . . . . . . . 62--65
Anonymous Introducing IEEE Collabratec . . . . . . 66--66
Anonymous Experience the Newest and Most Advanced
Thinking in Big Data Analytics . . . . . 67--67
Anonymous IEEE Cyber Security . . . . . 68--68
Anonymous Table of Contents . . . . . . . . . . . C1--C1
Anonymous Cover . . . . . . . . . . . . . . . . . C2--C2
Anonymous Cover . . . . . . . . . . . . . . . . . C3--C3
Anonymous [Back cover] . . . . . . . . . . . . . . C4--C4
Shuang Liang and
Shouyi Yin and
Leibo Liu and
Yike Guo and
Shaojun Wei A Coarse-Grained Reconfigurable
Architecture for Compute-Intensive
MapReduce Acceleration . . . . . . . . . 69--72
Bo-Cheng Charles Lai and
Luis Garrido Platero and
Hsien-Kai Kuo A Quantitative Method to Data Reuse
Patterns of SIMT Applications . . . . . 73--76
Yaman Çakmakçi and
Will Toms and
Javier Navaridas and
Mikel Lujan Cyclic Power-Gating as an Alternative to
Voltage and Frequency Scaling . . . . . 77--80
Erik Tomusk and
Christophe Dubach and
Michael O'Boyle Diversity: a Design Goal for
Heterogeneous Processors . . . . . . . . 81--84
Milad Hashemi and
Debbie Marr and
Doug Carmean and
Yale N. Patt Efficient Execution of Bursty
Applications . . . . . . . . . . . . . . 85--88
Sudarsun Kannan and
Moinuddin Qureshi and
Ada Gavrilovska and
Karsten Schwan Energy Aware Persistence: Reducing the
Energy Overheads of Persistent Memory 89--92
Alejandro Valero and
Negar Miralaei and
Salvador Petit and
Julio Sahuquillo and
Timothy M. Jones Enhancing the L1 Data Cache Design to
Mitigate HCI . . . . . . . . . . . . . . 93--96
Rathijit Sen and
David A. Wood GPGPU Footprint Models to Estimate
per-Core Power . . . . . . . . . . . . . 97--100
Daejin Jung and
Sheng Li and
Jung Ho Ahn Large Pages on Steroids: Small Ideas to
Accelerate Big Memory Applications . . . 101--104
Javier Verdu and
Alex Pajuelo Performance Scalability Analysis of
JavaScript Applications with Web Workers 105--108
Christina Delimitrou and
Christos Kozyrakis Security Implications of Data Mining in
Cloud Scheduling . . . . . . . . . . . . 109--112
Zhenning Wang and
Jun Yang and
Rami Melhem and
Bruce Childers and
Youtao Zhang and
Minyi Guo Simultaneous Multikernel: Fine-Grained
Sharing of GPUs . . . . . . . . . . . . 113--116
Chulian Zhang and
Hamed Tabkhi and
Gunar Schirner Studying Inter-Warp Divergence Aware
Execution on GPUs . . . . . . . . . . . 117--120
Arash Tavakkol and
Pooyan Mehrvarzy and
Hamid Sarbazi-Azad TBM: Twin Block Management Policy to
Enhance the Utilization of Plane-Level
Parallelism in SSDs . . . . . . . . . . 121--124
Bruce Jacob The 2 PetaFLOP, 3 Petabyte, 9 TB/s, 90
kW Cabinet: a System Architecture for
Exascale and Big Data . . . . . . . . . 125--128
He Xiao and
Wen Yueh and
Saibal Mukhopadhyay and
Sudhakar Yalamanchili Thermally Adaptive Cache Access
Mechanisms for 3D Many-Core
Architectures . . . . . . . . . . . . . 129--132
Qi Hu and
Peng Liu and
Michael C. Huang Threads and Data Mapping: Affinity
Analysis for Traffic Reduction . . . . . 133--136
Anonymous Table of Contents . . . . . . . . . . . C1--C1
Anonymous Cover . . . . . . . . . . . . . . . . . C2--C2
Anonymous Cover . . . . . . . . . . . . . . . . . C3--C3
Anonymous Table of contents [back cover] . . . . . C4--C4
Nathan Beckmann and
Daniel Sanchez Cache Calculus: Modeling Caches through
Differential Equations . . . . . . . . . 1--5
Anonymous 2016 Index IEEE Computer
Architecture Letters Vol. 15 . . . . . . 1--6
Xin Zhan and
Reza Azimi and
Svilen Kanev and
David Brooks and
Sherief Reda CARB: a C-State Power Management Arbiter
for Latency-Critical Workloads . . . . . 6--9
Dong-Ik Jeon and
Ki-Seok Chung CasHMC: a Cycle-Accurate Simulator for
Hybrid Memory Cube . . . . . . . . . . . 10--13
Hao Wu and
Fangfei Liu and
Ruby B. Lee Cloud Server Benchmark Suite for
Evaluating New Hardware Architectures 14--17
Seyed Mohammad Seyedzadeh and
Alex K. Jones and
Rami Melhem Counter-Based Tree Structure for Row
Hammering Mitigation in DRAM . . . . . . 18--21
Hoda Naghibijouybari and
Nael Abu-Ghazaleh Covert Channels on GPGPUs . . . . . . . 22--25
Wonjun Song and
Hyung-Joon Jung and
Jung Ho Ahn and
Jae W. Lee and
John Kim Evaluation of Performance Unfairness in
NUMA System Architecture . . . . . . . . 26--29
Uri Verner and
Avi Mendelson and
Assaf Schuster Extending Amdahl's Law for Multicores
with Turbo Boost . . . . . . . . . . . . 30--33
Hiroshi Sasaki and
Fang-Hsiang Su and
Teruo Tanimoto and
Simha Sethumadhavan Heavy Tails in Program Structure . . . . 34--37
Liang Feng and
Hao Liang and
Sharad Sinha and
Wei Zhang HeteroSim: a Heterogeneous CPU--FPGA
Simulator . . . . . . . . . . . . . . . 38--41
Xia Zhao and
Yuxi Liu and
Almutaz Adileh and
Lieven Eeckhout LA-LLC: Inter-Core Locality-Aware
Last-Level Cache to Exploit Many-to-Many
Traffic in GPGPUs . . . . . . . . . . . 42--45
Amirali Boroumand and
Saugata Ghose and
Minesh Patel and
Hasan Hassan and
Brandon Lucia and
Kevin Hsieh and
Krishna T. Malladi and
Hongzhong Zheng and
Onur Mutlu LazyPIM: an Efficient Cache Coherence
Mechanism for Processing-in-Memory . . . 46--50
Mark Gottscho and
Mohammed Shoaib and
Sriram Govindan and
Bikash Sharma and
Di Wang and
Puneet Gupta Measuring the Impact of Memory Errors on
Application Performance . . . . . . . . 51--55
Almutaz Adileh and
Stijn Eyerman and
Aamer Jaleel and
Lieven Eeckhout Mind The Power Holes: Sifting Operating
Points in Power-Limited Heterogeneous
Multicores . . . . . . . . . . . . . . . 56--59
Hiroshi Sasaki and
Alper Buyuktosunoglu and
Augusto Vega and
Pradip Bose Mitigating Power Contention: a
Scheduling Based Approach . . . . . . . 60--63
David Gonzalez Marquez and
Adrian Cristal Kestelman and
Esteban Mocskos Mth: Codesigned Hardware/Software
Support for Fine Grain Threads . . . . . 64--67
Tomer Y. Morad and
Gil Shomron and
Mattan Erez and
Avinoam Kolodny and
Uri C. Weiser Optimizing Read-Once Data Flow in
Big-Data Applications . . . . . . . . . 68--71
Ali Yasoubi and
Reza Hojabr and
Mehdi Modarressi Power-Efficient Accelerator Design for
Neural Networks Using Computation Reuse 72--75
Young Hoon Son and
Hyunyoon Cho and
Yuhwan Ro and
Jae W. Lee and
Jung Ho Ahn SALAD: Achieving Symmetric Access
Latency with Asymmetric DRAM
Architecture . . . . . . . . . . . . . . 76--79
Patrick Judd and
Jorge Albericio and
Andreas Moshovos Stripes: Bit-Serial Deep Neural Network
Computing . . . . . . . . . . . . . . . 80--83
Gokul Subramanian Ravi and
Mikko Lipasti Timing Speculation in Multi-Cycle Data
Paths . . . . . . . . . . . . . . . . . 84--87
Samira Khan and
Chris Wilkerson and
Donghyuk Lee and
Alaa R. Alameldeen and
Onur Mutlu A Case for Memory Content-Based
Detection and Mitigation of
Data-Dependent Failures in DRAM . . . . 88--93
Sparsh Mittal and
Jeffrey S. Vetter and
Lei Jiang Addressing Read-Disturbance Issue in
STT--RAM by Data Compression and
Selective Duplication . . . . . . . . . 94--98
Mohammad Bakhshalipour and
Pejman Lotfi-Kamran and
Hamid Sarbazi-Azad An Efficient Temporal Data Prefetcher
for L1 Caches . . . . . . . . . . . . . 99--102
Jorge A. Martínez and
Juan Antonio Maestro and
Pedro Reviriego A Scheme to Improve the Intrinsic Error
Detection of the Instruction Set
Architecture . . . . . . . . . . . . . . 103--106
Rujia Wang and
Sparsh Mittal and
Youtao Zhang and
Jun Yang Decongest: Accelerating Super-Dense PCM
Under Write Disturbance by Hot Page
Remapping . . . . . . . . . . . . . . . 107--110
Teruo Tanimoto and
Takatsugu Ono and
Koji Inoue and
Hiroshi Sasaki Enhanced Dependence Graph Model for
Critical Path Analysis on Modern
Out-of-Order Processors . . . . . . . . 111--114
Junghee Lee and
Kalidas Ganesh and
Hyuk-Jun Lee and
Youngjae Kim FESSD: a Fast Encrypted SSD Employing
On-Chip Access-Control Memory . . . . . 115--118
Abdel-Hameed A. Badawy and
Donald Yeung Guiding Locality Optimizations for Graph
Computations via Reuse Distance Analysis 119--122
Yue Zha and
Jing Li IMEC: a Fully Morphable In-Memory
Computing Fabric Enabled by Resistive
Crossbar . . . . . . . . . . . . . . . . 123--126
Li-Jhan Chen and
Hsiang-Yun Cheng and
Po-Han Wang and
Chia-Lin Yang Improving GPGPU Performance via Cache
Locality Aware Thread Block Scheduling 127--131
James Garland and
David Gregg Low Complexity Multiply Accumulate Unit
for Weight-Sharing Convolutional Neural
Networks . . . . . . . . . . . . . . . . 132--135
Myoungsoo Jung NearZero: an Integration of Phase Change
Memory with Multi-Core Coprocessor . . . 136--140
Leonid Yavits and
Uri Weiser and
Ran Ginosar Resistive Address Decoder . . . . . . . 141--144
Madhavan Manivannan and
Miquel Pericàs and
Vassilis Papaefstathiou and
Per Stenström Runtime-Assisted Global Cache Management
for Task-Based Parallel Programs . . . . 145--148
Arthur Perais and
Andre Seznec Storage-Free Memory Dependency
Prediction . . . . . . . . . . . . . . . 149--152
Amirhossein Mirhosseini and
Aditya Agrawal and
Josep Torrellas Survive: Pointer-Based In-DRAM
Incremental Checkpointing for Low-Cost
Data Persistence and Rollback-Recovery 153--157
Sandro Pinto and
Jorge Pereira and
Tiago Gomes and
Mongkol Ekpanyapong and
Adriano Tavares Towards a TrustZone-Assisted Hypervisor
for Real-Time Embedded Systems . . . . . 158--161
Trevor E. Carlson and
Kim-Anh Tran and
Alexandra Jimborean and
Konstantinos Koukos and
Magnus Själander and
Stefanos Kaxiras Transcending Hardware Limits with
Software Out-of-Order Processing . . . . 162--165
Hossein Ahmadvand and
Maziar Goudarzi Using Data Variety for Efficient
Progressive Big Data Processing in
Warehouse-Scale Computers . . . . . . . 166--169
Dan Zhang and
Xiaoyu Ma and
Derek Chiou Worklist-Directed Prefetching . . . . . 170--173
Alberto Scionti and
Somnath Mazumdar and
Stephane Zuckerman Enabling Massive Multi-Threading with
Fast Hashing . . . . . . . . . . . . . . 1--4
Anonymous 2017 Index IEEE Computer
Architecture Letters Vol. 16 . . . . . . 1--6
Dong-Ik Jeon and
Kyeong-Bin Park and
Ki-Seok Chung HMC-MAC: Processing-in Memory
Architecture for Multiply--Accumulate
Operations with Hybrid Memory Cube . . . 5--8
Sam Van den Steen and
Lieven Eeckhout Modeling Superscalar Processor
Memory-Level Parallelism . . . . . . . . 9--12
Srdjan Durkovic and
Zoran Cica Birkhoff--von Neumann Switch Based on
Greedy Scheduling . . . . . . . . . . . 13--16
Binh Pham and
Derek Hower and
Abhishek Bhattacharjee and
Trey Cain TLB Shootdown Mitigation for Low-Power
Many-Core Servers with L1 Virtual Caches 17--20
Leonid Yavits and
Ran Ginosar Accelerator for Sparse Machine Learning 21--24
Eleftherios-Iordanis Christoforidis and
Sotirios Xydis and
Dimitrios Soudris CF-TUNE: Collaborative Filtering
Auto-Tuning for Energy Efficient
Many-Core Processors . . . . . . . . . . 25--28
Amjad F. Almatrood and
Harpreet Singh Design of Generalized Pipeline Cellular
Array in Quantum-Dot Cellular Automata 29--32
Yue Zha and
Jing Li CMA: a Reconfigurable Complex Matching
Accelerator for Wire-Speed Network
Intrusion Detection . . . . . . . . . . 33--36
Myoungsoo Jung and
Jie Zhang and
Ahmed Abulila and
Miryeong Kwon and
Narges Shahidi and
John Shalf and
Nam Sung Kim and
Mahmut Kandemir SimpleSSD: Modeling Solid State Drives
for Holistic System Simulation . . . . . 37--41
Zamshed Chowdhury and
Jonathan D. Harms and
S. Karen Khatamifard and
Masoud Zabihi and
Yang Lv and
Andrew P. Lyle and
Sachin S. Sapatnekar and
Ulya R. Karpuzcu and
Jian-Ping Wang Efficient In-Memory Processing Using
Spintronics . . . . . . . . . . . . . . 42--46
Mohammadamin Ajdari and
Pyeongsu Park and
Dongup Kwon and
Joonsung Kim and
Jangwoo Kim A Scalable HW-Based Inline Deduplication
for SSD Arrays . . . . . . . . . . . . . 47--50
Morteza Hoseinzadeh Flow-Based Simulation Methodology . . . 51--54
Stijn Eyerman and
Wim Heirman and
Kristof Du Bois and
Ibrahim Hur Multi-Stage CPI Stacks . . . . . . . . . 55--58
Guowei Zhang and
Daniel Sanchez Leveraging Hardware Caches for
Memoization . . . . . . . . . . . . . . 59--63
Armin Vakil-Ghahani and
Sara Mahdizadeh-Shahri and
Mohammad-Reza Lotfi-Namin and
Mohammad Bakhshalipour and
Pejman Lotfi-Kamran and
Hamid Sarbazi-Azad Cache Replacement Policy Based on
Expected Hit Count . . . . . . . . . . . 64--67
Zacharias Hadjilambrou and
Shidhartha Das and
Marco A. Antoniades and
Yiannakis Sazeides Sensing CPU Voltage Noise Through
Electromagnetic Emanations . . . . . . . 68--71
Daejin Jung and
Sunjung Lee and
Wonjong Rhee and
Jung Ho Ahn Partitioning Compute Units in CNN
Acceleration for Statistical Memory
Traffic Shaping . . . . . . . . . . . . 72--75
Joshua San Miguel and
Karthik Ganesan and
Mario Badr and
Natalie Enright Jerger The EH Model: Analytical Exploration of
Energy-Harvesting Architectures . . . . 76--79
Jihun Kim and
Joonsung Kim and
Pyeongsu Park and
Jong Kim and
Jangwoo Kim SSD Performance Modeling Using
Bottleneck Analysis . . . . . . . . . . 80--83
Kevin Angstadt and
Jack Wadden and
Vinh Dang and
Ted Xie and
Dan Kramp and
Westley Weimer and
Mircea Stan and
Kevin Skadron MNCaRT: an Open-Source,
Multi-Architecture Automata-Processing
Research and Execution Ecosystem . . . . 84--87
Hao Zheng and
Ahmed Louri EZ-Pass: an Energy &
Performance-Efficient Power-Gating
Router Architecture for Scalable NoCs 88--91
Leila Delshadtehrani and
Schuyler Eldridge and
Sadullah Canakci and
Manuel Egele and
Ajay Joshi Nile: a Programmable Monitoring
Coprocessor . . . . . . . . . . . . . . 92--95
Eojin Lee and
Sukhan Lee and
G. Edward Suh and
Jung Ho Ahn TWiCe: Time Window Counter Based Row
Refresh to Prevent Row-Hammering . . . . 96--99
Joydeep Rakshit and
Kartik Mohanram LEO: Low Overhead Encryption ORAM for
Non-Volatile Memories . . . . . . . . . 100--104
Sang Wook Stephen Do and
Michel Dubois Core Reliability: Leveraging Hardware
Transactional Memory . . . . . . . . . . 105--108
Manolis Kaliorakis and
Athanasios Chatzidimitriou and
George Papadimitriou and
Dimitris Gizopoulos Statistical Analysis of Multicore CPUs
Operation in Scaled Voltage Conditions 109--112
Soroosh Khoram and
Yue Zha and
Jing Li An Alternative Analytical Approach to
Associative Processing . . . . . . . . . 113--116
S. Karen Khatamifard and
M. Hassan Najafi and
Ali Ghoreyshi and
Ulya R. Karpuzcu and
David J. Lilja On Memory System Design for Stochastic
Computing . . . . . . . . . . . . . . . 117--121
Dimitris Mouris and
Nektarios Georgios Tsoutsos and
Michail Maniatakos TERMinator Suite: Benchmarking
Privacy-Preserving Architectures . . . . 122--125
Esha Choukse and
Mattan Erez and
Alaa Alameldeen CompressPoints: an Evaluation
Methodology for Compressed Memory
Systems . . . . . . . . . . . . . . . . 126--129
Seikwon Kim and
Wonsang Kwak and
Changdae Kim and
Jaehyuk Huh Zebra Refresh: Value Transformation for
Zero-Aware DRAM Refresh Reduction . . . 130--133
Youngeun Kwon and
Minsoo Rhu A Case for Memory-Centric HPC System
Architecture for Training Deep Neural
Networks . . . . . . . . . . . . . . . . 134--138
Engin Ipek and
Florian Longnos and
Shihai Xiao and
Wei Yang Bit-Level Load Balancing: a New
Technique for Improving the Write
Throughput of Deeply Scaled STT-MRAM . . 139--142
Konstantinos Iliakis and
Sotirios Xydis and
Dimitrios Soudris Decoupled MapReduce for Shared-Memory
Multi-Core Architectures . . . . . . . . 143--146
Zhaoshi Li and
Leibo Liu and
Yangdong Deng and
Shouyi Yin and
Shaojun Wei Breaking the Synchronization Bottleneck
with Reconfigurable Transactional
Execution . . . . . . . . . . . . . . . 147--150
Engin Ipek and
Florian Longnos and
Shihai Xiao and
Wei Yang Vertical Writes: Closing the Throughput
Gap between Deeply Scaled STT-MRAM and
DRAM . . . . . . . . . . . . . . . . . . 151--154
Yu Gan and
Christina Delimitrou The Architectural Implications of Cloud
Microservices . . . . . . . . . . . . . 155--158
Ofir Shwartz and
Yitzhak Birk Distributed Memory Integrity Trees . . . 159--162
Ji-Tae Yun and
Su-Kyung Yoon and
Jeong-Geun Kim and
Bernd Burgstaller and
Shin-Dug Kim Regression Prefetcher with Preprocessing
for DRAM--PCM Hybrid Main Memory . . . . 163--166
Jiangwei Zhang and
Donald Kline, Jr. and
Long Fang and
Rami Melhem and
Alex K. Jones RETROFIT: Fault-Aware Wear Leveling . . 167--170
Neeraj Kulkarni and
Feng Qi and
Christina Delimitrou Leveraging Approximation to Improve
Datacenter Resource Efficiency . . . . . 171--174
Laith M. AlBarakat and
Paul V. Gratz and
Daniel A. Jiménez MTB-Fetch: Multithreading Aware Hardware
Prefetching for Chip Multiprocessors . . 175--178
Thiruvengadam Vijayaraghavan and
Amit Rajesh and
Karthikeyan Sankaralingam MPU--BWM: Accelerating Sequence
Alignment . . . . . . . . . . . . . . . 179--182
Sander De Pestel and
Sam Van den Steen and
Shoaib Akram and
Lieven Eeckhout RPPM: Rapid Performance Prediction of
Multithreaded Applications on Multicore
Hardware . . . . . . . . . . . . . . . . 183--186
Wenyi Zhao and
Quan Chen and
Minyi Guo KSM: Online Application-Level
Performance Slowdown Prediction for
Spatial Multitasking GPGPU . . . . . . . 187--191
Shivam Swami and
Kartik Mohanram ARSENAL: Architecture for Secure
Non-Volatile Memories . . . . . . . . . 192--196
Abanti Basak and
Xing Hu and
Shuangchen Li and
Sang Min Oh and
Yuan Xie Exploring Core and Cache Hierarchy
Bottlenecks in Graph Processing
Workloads . . . . . . . . . . . . . . . 197--200
S. Karen Khatamifard and
Longfei Wang and
Selcuk Köse and
Ulya R. Karpuzcu A New Class of Covert Channels
Exploiting Power Management
Vulnerabilities . . . . . . . . . . . . 201--204
Sushant Kondguli and
Michael Huang Bootstrapping: Using SMT Hardware to
Improve Single-Thread Performance . . . 205--208
Donald Kline, Jr. and
Rami Melhem and
Alex K. Jones Counter Advance for Reliable Encryption
in Phase Change Memory . . . . . . . . . 209--212
Debiprasanna Sahoo and
Swaraj Sha and
Manoranjan Satpathy and
Madhu Mutyam ReDRAM: a Reconfigurable DRAM Cache for
GPGPUs . . . . . . . . . . . . . . . . . 213--216
Susumu Mashimo and
Ryota Shioya and
Koji Inoue VMOR: Microarchitectural Support for
Operand Access in an Interpreter . . . . 217--220
Seungwon Min and
Mohammad Alian and
Wen-Mei Hwu and
Nam Sung Kim Semi-Coherent DMA: an Alternative I/O
Coherency Management for Embedded
Systems . . . . . . . . . . . . . . . . 221--224
Negin Nematollahi and
Mohammad Sadrosadati and
Hajar Falahati and
Marzieh Barkhordar and
Hamid Sarbazi-Azad Neda: Supporting Direct Inter-Core
Neighbor Data Exchange in GPUs . . . . . 225--229
Hamza Omar and
Halit Dogan and
Brian Kahne and
Omer Khan Multicore Resource Isolation for
Deterministic, Resilient and Secure
Concurrent Execution of Safety-Critical
Applications . . . . . . . . . . . . . . 230--234
Farzaneh Zokaee and
Hamid R. Zarandi and
Lei Jiang AligneR: a Process-in-Memory
Architecture for Short Read Alignment in
ReRAMs . . . . . . . . . . . . . . . . . 235--238
Qian Lou and
Lei Jiang BRAWL: a Spintronics-Based Portable
Basecalling-in-Memory Architecture for
Nanopore Genome Sequencing . . . . . . . 239--242
Donghyun Min and
Donggyu Park and
Jinwoo Ahn and
Ryan Walker and
Junghee Lee and
Sungyong Park and
Youngjae Kim Amoeba: an Autonomous Backup and
Recovery SSD for Ransomware Attack
Defense . . . . . . . . . . . . . . . . 243--246
Chinam Kim and
Hyukjun Lee A High-Bandwidth PCM-Based Memory System
for Highly Available IP Routing Table
Lookup . . . . . . . . . . . . . . . . . 246--249
Jiho Kim and
Jehee Cha and
Jason Jong Kyu Park and
Dongsuk Jeon and
Yongjun Park Improving GPU Multitasking Efficiency
Using Dynamic Resource Sharing . . . . . 1--5
Anonymous 2018 Index IEEE Computer
Architecture Letters Vol. 17 . . . . . . 1--8
Sheng Xu and
Xiaoming Chen and
Ying Wang and
Yinhe Han and
Xuehai Qian and
Xiaowei Li PIMSim: a Flexible and Detailed
Processing-in-Memory Simulator . . . . . 6--9
Gil Shomron and
Uri Weiser Spatial Correlation and Value Prediction
in Convolutional Neural Networks . . . . 10--13
Ujjwal Gupta and
Sumit K. Mandal and
Manqing Mao and
Chaitali Chakrabarti and
Umit Y. Ogras A Deep Q-Learning Approach for Dynamic
Management of Heterogeneous Processors 14--17
Samuel Rogers and
Joshua Slycord and
Ronak Raheja and
Hamed Tabkhi Scalable LLVM-Based Accelerator Modeling
in gem5 . . . . . . . . . . . . . . . . 18--21
Berkin Akin and
Alaa R. Alameldeen A Case For Asymmetric Processing in
Memory . . . . . . . . . . . . . . . . . 22--25
Konstantinos Tovletoglou and
Lev Mukhanov and
Dimitrios S. Nikolopoulos and
Georgios Karakonstantis Shimmer: Implementing a
Heterogeneous-Reliability DRAM Framework
on a Commodity Server . . . . . . . . . 26--29
Chanchal Kumar and
Sidharth Singh and
Gregory T. Byrd Hybrid Remote Access Protocol . . . . . 30--33
Yicheng Wang and
Yang Liu and
Peiyun Wu and
Zhao Zhang Detect DRAM Disturbance Error by Using
Disturbance Bin Counters . . . . . . . . 34--37
Xinfeng Xie and
Xing Hu and
Peng Gu and
Shuangchen Li and
Yu Ji and
Yuan Xie NNBench-X: Benchmarking and
Understanding Neural Network Workloads
for Accelerator Designs . . . . . . . . 38--42
Asif Ali Khan and
Fazal Hameed and
Robin Bläsing and
Stuart Parkin and
Jeronimo Castrillon RTSim: a Cycle-Accurate Simulator for
Racetrack Memories . . . . . . . . . . . 43--46
Yiming Gan and
Yuxian Qiu and
Jingwen Leng and
Yuhao Zhu SVSoC: Speculative Vision
Systems-on-a-Chip . . . . . . . . . . . 47--50
Ting-Ru Lin and
Yunfan Li and
Massoud Pedram and
Lizhong Chen Design Space Exploration of Memory
Controller Placement in Throughput
Processors with Deep Learning . . . . . 51--54
Yehia Arafa and
Abdel-Hameed A. Badawy and
Gopinath Chennupati and
Nandakishore Santhi and
Stephan Eidenbenz PPT--GPU: Scalable GPU Performance
Modeling . . . . . . . . . . . . . . . . 55--58
Bradley Denby and
Brandon Lucia Orbital Edge Computing: Machine
Inference in Space . . . . . . . . . . . 59--62
He Liu and
Jianhui Han and
Youhui Zhang A Unified Framework for Training,
Mapping and Simulation of ReRAM-Based
Convolutional Neural Network
Acceleration . . . . . . . . . . . . . . 63--66
Tian Tan and
Eriko Nurvitadhi and
Derek Chiou Dark Wires and the Opportunities for
Reconfigurable Logic . . . . . . . . . . 67--70
Ajeya Naithani and
Josue Feliu and
Almutaz Adileh and
Lieven Eeckhout Precise Runahead Execution . . . . . . . 71--74
V. Agrawal and
M. A. Dinani and
Y. Shui and
M. Ferdman and
N. Honarmand Massively Parallel Server Processors . . 75--78
H. Golestani and
G. Gupta and
R. Sen Performance Modeling and Bottleneck
Analysis of EDGE Processors Using
Dependence Graphs . . . . . . . . . . . 79--82
J. Leng and
A. Buyuktosunoglu and
R. Bertran and
P. Bose and
V. J. Reddi Asymmetric Resilience for
Accelerator-Rich Systems . . . . . . . . 83--86
E. Sadredini and
R. Rahimi and
V. Verma and
M. Stan and
K. Skadron A Scalable and Efficient In-Memory
Interconnect Architecture for Automata
Processing . . . . . . . . . . . . . . . 87--90
A. Yasin and
A. Mendelson and
Y. Ben-Asher Tuning Performance via Metrics with
Expectations . . . . . . . . . . . . . . 91--94
L. Wang and
M. Jahre and
A. Adileh and
Z. Wang and
L. Eeckhout Modeling Emerging Memory-Divergent GPU
Applications . . . . . . . . . . . . . . 95--98
G. Shomron and
T. Horowitz and
U. Weiser SMT-SA: Simultaneous Multithreading in
Systolic Arrays . . . . . . . . . . . . 99--102
D. Masouros and
S. Xydis and
D. Soudris Rusty: Runtime System Predictability
Leveraging LSTM Neural Networks . . . . 103--106
S. Kim and
H. Jung and
W. Shin and
H. Lee and
H. Lee HAD-TWL: Hot Address Detection-Based
Wear Leveling for Phase-Change Memory
Systems with Low Latency . . . . . . . . 107--110
H. Zhou and
G. T. Byrd Quantum Circuits for Dynamic Runtime
Assertions in Quantum Computation . . . 111--114
J. Rao and
T. Ao and
K. Dai and
X. Zou ARCE: Towards Code Pointer Integrity on
Embedded Processors Using
Architecture-Assisted Run-Time Metadata
Management . . . . . . . . . . . . . . . 115--118
K. Bhardwaj and
M. Havasi and
Y. Yao and
D. M. Brooks and
J. M. H. Lobato and
G. Wei Determining Optimal Coherency Interface
for Many-Accelerator SoCs Using Bayesian
Optimization . . . . . . . . . . . . . . 119--123
Ali Ansari and
Pejman Lotfi-Kamran and
Hamid Sarbazi-Azad Code Layout Optimization for Near-Ideal
Instruction Cache . . . . . . . . . . . 124--127
Kiran Ranganath and
AmirAli Abdolrashidi and
Shuaiwen Leon Song and
Daniel Wong Speeding up Collective Communications
Through Inter-GPU Re-Routing . . . . . . 128--131
Dylan Stow and
Amin Farmahini-Farahani and
Sudhanva Gurumurthi and
Michael Ignatowski and
Yuan Xie Power Profiling of Modern Die-Stacked
Memory . . . . . . . . . . . . . . . . . 132--135
Seyed Morteza Nabavinejad and
Hassan Hafez-Kolahi and
Sherief Reda Coordinated DVFS and Precision Control
for Deep Neural Networks . . . . . . . . 136--140
Seunghak Lee and
Nam Sung Kim and
Daehoon Kim Exploiting OS-Level Memory Offlining for
DRAM Power Management . . . . . . . . . 141--144
Theodoros Marinakis and
Iraklis Anagnostopoulos Performance and Fairness Improvement on
CMPs Considering Bandwidth and Cache
Utilization . . . . . . . . . . . . . . 1--4
Adarsha Balaji and
Shihao Song and
Anup Das and
Nikil Dutt and
Jeff Krichmar and
Nagarajan Kandasamy and
Francky Catthoor A Framework to Explore Workload-Specific
Performance and Lifetime Trade-offs in
Neuromorphic Computing . . . . . . . . . 149--152
Hyeran Jeon and
Hodjat Asghari Esfeden and
Nael B. Abu-Ghazaleh and
Daniel Wong and
Sindhuja Elango Locality-Aware GPU Register File . . . . 153--156
Chen Li and
Yifan Sun and
Lingling Jin and
Lingjie Xu and
Zheng Cao and
Pengfei Fan and
David Kaeli and
Sheng Ma and
Yang Guo and
Jun Yang Priority-Based PCIe Scheduling for
Multi-Tenant Multi-GPU Systems . . . . . 157--160
Jian Weng and
Sihao Liu and
Vidushi Dadu and
Tony Nowatzki DAEGEN: a Modular Compiler for Exploring
Decoupled Spatial Accelerators . . . . . 161--165
Konstantinos Iliakis and
Sotirios Xydis and
Dimitrios Soudris LOOG: Improving GPU Efficiency With
Light-Weight Out-Of-Order Execution . . 166--169
Reoma Matsuo and
Ryota Shioya and
Hideki Ando Improving the Instruction Fetch
Throughput with Dynamically Configuring
the Fetch Pipeline . . . . . . . . . . . 170--173
Vamsee Reddy Kommareddy and
Baogang Zhang and
Fan Yao and
Rickard Ewetz and
Amro Awad Are Crossbar Memories Secure? New
Security Vulnerabilities in Crossbar
Memories . . . . . . . . . . . . . . . . 174--177
Kristin Barber and
Anys Bacha and
Li Zhou and
Yinqian Zhang and
Radu Teodorescu Isolating Speculative Data to Prevent
Transient Execution Attacks . . . . . . 178--181
Ki-Dong Kang and
Gyeongseo Park and
Nam Sung Kim and
Daehoon Kim Network Packet Processing Mode-Aware
Power Management for Data Center Servers 1--4
Mustafa Cavus and
Mohammed Shatnawi and
Resit Sendag and
Augustus K. Uht Exploring Prefetching, Pre-Execution and
Branch Outcome Streaming for In-Memory
Database Lookups . . . . . . . . . . . . 5--8
Rahul Bodduna and
Vinod Ganesan and
Patanjali SLPSK and
Kamakoti Veezhinathan and
Chester Rebeiro Brutus: Refuting the Security Claims of
the Cache Timing Randomization
Countermeasure Proposed in CEASER . . . 9--12
Minsub Kim and
Jaeha Kung and
Sungjin Lee Towards Scalable Analytics with
Inference-Enabled Solid-State Drives . . 13--17
Congmiao Li and
Jean-Luc Gaudiot Challenges in Detecting an Evasive
Spectre . . . . . . . . . . . . . . . . 18--21
Mingyu Yan and
Zhaodong Chen and
Lei Deng and
Xiaochun Ye and
Zhimin Zhang and
Dongrui Fan and
Yuan Xie Characterizing and Understanding GCNs on
GPU . . . . . . . . . . . . . . . . . . 22--25
Chanchal Kumar and
Aayush Chaudhary and
Shubham Bhawalkar and
Utkarsh Mathur and
Saransh Jain and
Adith Vastrad and
Eric Rotenberg Post-Silicon Microarchitecture . . . . . 26--29
Stijn Eyerman and
Wim Heirman and
Sam Van den Steen and
Ibrahim Hur Breaking In-Order Branch Miss Recovery 30--33
Zhi-Gang Liu and
Paul N. Whatmough and
Matthew Mattina Systolic Tensor Array: an Efficient
Structured-Sparse GEMM Accelerator for
Mobile CNN Inference . . . . . . . . . . 34--37
Srivatsan Krishnan and
Zishen Wan and
Kshitij Bhardwaj and
Paul Whatmough and
Aleksandra Faust and
Gu-Yeon Wei and
David Brooks and
Vijay Janapa Reddi The Sky Is Not the Limit: a Visual
Performance Model for Cyber-Physical
Co-Design in Autonomous Machines . . . . 38--42
Pierre Michaud Exploiting Thermal Transients With
Deterministic Turbo Clock Frequency . . 43--46
Zhufei Chu and
Huiming Tian and
Zeqiang Li and
Yinshui Xia and
Lunyao Wang A High-Performance Design of Generalized
Pipeline Cellular Array . . . . . . . . 47--50
Lingjun Zhu and
Lennart Bamberg and
Anthony Agnesina and
Francky Catthoor and
Dragomir Milojevic and
Manu Komalan and
Julien Ryckaert and
Alberto Garcia-Ortiz and
Sung Kyu Lim Heterogeneous 3D Integration for a
RISC-V System With STT-MRAM . . . . . . 51--54
Tony Mason and
Thaleia Dimitra Doudali and
Margo Seltzer and
Ada Gavrilovska Unexpected Performance of Intel Optane
DC Persistent Memory . . . . . . . . . . 55--58
Zhihui Zhang and
Jingwen Leng and
Lingxiao Ma and
Youshan Miao and
Chao Li and
Minyi Guo Architectural Implications of Graph
Neural Networks . . . . . . . . . . . . 59--62
Anderson L. Sartor and
Anish Krishnakumar and
Samet E. Arda and
Umit Y. Ogras and
Radu Marculescu HiLITE: Hierarchical and Lightweight
Imitation Learning for Power Management
of Embedded SoCs . . . . . . . . . . . . 63--67
Harsh Desai and
Brandon Lucia A Power-Aware Heterogeneous Architecture
Scaling Model for Energy-Harvesting
Computers . . . . . . . . . . . . . . . 68--71
Bo-Cheng Lai and
Chun-Yen Chen and
Yi-Da Hsin and
Bo-Yen Lin A Two-Directional BigData Sorting
Architecture on FPGAs . . . . . . . . . 72--75
Peng Gu and
Benjamin S. Lim and
Wenqin Huangfu and
Krishan T. Malladi and
Andrew Chang and
Yuan Xie NMTSim: Transaction-Command Based
Simulator for New Memory Technology
Devices . . . . . . . . . . . . . . . . 76--79
Seyyed Hossein SeyyedAghaei Rezaei and
Mehdi Modarressi and
Rachata Ausavarungnirun and
Mohammad Sadrosadati and
Onur Mutlu and
Masoud Daneshtalab NoM: Network-on-Memory for Inter-Bank
Data Transfer in Highly-Banked Memories 80--83
Anonymous 2019 Index IEEE Computer
Architecture Letters Vol. 18 . . . . . . 1--8
Alberto Ros and
Alexandra Jimborean The Entangling Instruction Prefetcher 84--87
Rahul Singh and
Gokul Subramanian Ravi and
Mikko Lipasti and
Joshua San Miguel Value Locality Based Approximation With
ODIN . . . . . . . . . . . . . . . . . . 88--91
Jie Zhang and
Miryeong Kwon and
Sanghyun Han and
Nam Sung Kim and
Mahmut Kandemir and
Myoungsoo Jung FastDrain: Removing Page Victimization
Overheads in NVMe Storage Stack . . . . 92--96
Junsu Im and
Hanbyeol Kim and
Yumin Won and
Jiho Oh and
Minjae Kim and
Sungjin Lee Probability-Based Address Translation
for Flash SSDs . . . . . . . . . . . . . 97--100
Ahmed Samara and
James Tuck The Case for Domain-Specialized Branch
Predictors for Graph-Processing . . . . 101--104
Reza Mirosanlou and
Danlu Guo and
Mohamed Hassan and
Rodolfo Pellizzoni MCsim: an Extensible DRAM Memory
Controller Simulator . . . . . . . . . . 105--109
Shang Li and
Zhiyuan Yang and
Dhiraj Reddy and
Ankur Srivastava and
Bruce Jacob DRAMsim3: a Cycle-Accurate,
Thermal-Capable DRAM Simulator . . . . . 106--109
Joo Hwan Lee and
Hui Zhang and
Veronica Lagrange and
Praveen Krishnamoorthy and
Xiaodong Zhao and
Yang Seok Ki SmartSSD: FPGA Accelerated Near-Storage
Data Analytics on SSD . . . . . . . . . 110--113
Purab Ranjan Sutradhar and
Mark Connolly and
Sathwika Bavikadi and
Sai Manoj Pudukotai Dinakarrao and
Mark A. Indovina and
Amlan Ganguly pPIM: a Programmable Processor-in-Memory
Architecture With Precision-Scaling for
Deep Learning . . . . . . . . . . . . . 118--121
Wonkyo Choe and
Jonghyeon Kim and
Jeongseob Ahn A Study of Memory Placement on
Hardware-Assisted Tiered Memory Systems 122--125
Nada Lachtar and
Abdulrahman Abu Elkhail and
Anys Bacha and
Hafiz Malik A Cross-Stack Approach Towards Defending
Against Cryptojacking . . . . . . . . . 126--129
Fatemeh Golshan and
Mohammad Bakhshalipour and
Mehran Shakerinava and
Ali Ansari and
Pejman Lotfi-Kamran and
Hamid Sarbazi-Azad Harnessing Pairwise-Correlating Data
Prefetching With Runahead Metadata . . . 130--133
Nikita Lazarev and
Neil Adit and
Shaojie Xiang and
Zhiru Zhang and
Christina Delimitrou Dagger: Towards Efficient RPCs in Cloud
Microservices With Near-Memory
Reconfigurable NICs . . . . . . . . . . 134--138
Ali Jahanshahi and
Hadi Zamani Sabzi and
Chester Lau and
Daniel Wong GPU-NEST: Characterizing Energy
Efficiency of Multi-GPU Inference
Servers . . . . . . . . . . . . . . . . 139--142
Darya Mikhailenko and
Yujin Nakamoto and
Ben Feinberg and
Engin Ipek Adapting In Situ Accelerators for
Sparsity with Granular Matrix Reordering 143--146
Yasuo Ishii and
Jaekyu Lee and
Krishnendra Nathella and
Dam Sunwoo Rebasing Instruction Prefetching: an
Industry Perspective . . . . . . . . . . 147--150
Newton and
Virendra Singh and
Trevor E. Carlson PIM-GraphSCC: PIM-Based Graph Processing
Using Graph's Community Structures . . . 151--154
Zamshed I. Chowdhury and
S. Karen Khatamifard and
Zhaoyong Zheng and
Tali Moreshet and
R. Iris Bahar and
Ulya R. Karpuzcu Voltage Noise Mitigation With Barrier
Approximation . . . . . . . . . . . . . 155--158
Yuezhi Che and
Yuanzhou Yang and
Amro Awad and
Rujia Wang A Lightweight Memory Access Pattern
Obfuscation Framework for NVM . . . . . 163--166
Elaheh Sadredini and
Reza Rahimi and
Kevin Skadron Enabling In-SRAM Pattern Processing With
Low-Overhead Reporting Architecture . . 167--170
Ferdous Sharifi and
Nezam Rohbani and
Shaahin Hessabi Aging-Aware Context Switching in
Multicore Processors Based on Workload
Classification . . . . . . . . . . . . . 159--162
Anonymous 2020 Index IEEE Computer
Architecture Letters Vol. 19 . . . . . . 1--7
Hyoukjun Kwon and
Michael Pellauer and
Angshuman Parashar and
Tushar Krishna Flexion: a Quantitative Metric for
Flexibility in DNN Accelerators . . . . 1--4
Byeongho Kim and
Jaehyun Park and
Eojin Lee and
Minsoo Rhu and
Jung Ho Ahn TRiM: Tensor Reduction in Memory . . . . 5--8
Nirmal Kumar Boran and
Shubhankit Rathore and
Meet Udeshi and
Virendra Singh Fine-Grained Scheduling in
Heterogeneous-ISA Architectures . . . . 9--12
Salonik Resch and
Swamit Tannu and
Ulya R. Karpuzcu and
Moinuddin Qureshi A Day In the Life of a Quantum Error . . 13--16
Mohsin Shan and
Omer Khan Accelerating Concurrent Priority
Scheduling Using Adaptive in-Hardware
Task Distribution in Multicores . . . . 17--21
Arthur Perais A Case for Speculative Strength
Reduction . . . . . . . . . . . . . . . 22--25
Marta Navarro and
Lucia Pons and
Julio Sahuquillo Hy-Sched: a Simple Hyperthreading-Aware
Thread to Core Allocation Strategy . . . 26--29
Mohammad Alian and
Jongmin Shin and
Ki-Dong Kang and
Ren Wang and
Alexandros Daglis and
Daehoon Kim and
Nam Sung Kim IDIO: Orchestrating Inbound Network Data
on Server Processors . . . . . . . . . . 30--33
Hweesoo Kim and
Sunjung Lee and
Jaewan Choi and
Jung Ho Ahn Row-Streaming Dataflow Using a Chaining
Buffer and Systolic Array+ Structure . . 34--37
Hans Kasan and
John Kim The Case for Dynamic Bias in Global
Adaptive Routing . . . . . . . . . . . . 38--41
Parth Shah and
Ranjal Gautham Shenoy and
Vaidyanathan Srinivasan and
Pradip Bose and
Alper Buyuktosunoglu TokenSmart: Distributed, Scalable Power
Management in the Many-Core Era . . . . 42--45
Qian Li and
Bin Li and
Pietro Mercati and
Ramesh Illikkal and
Charlie Tai and
Michael Kishinevsky and
Christos Kozyrakis RAMBO: Resource Allocation for
Microservices Using Bayesian
Optimization . . . . . . . . . . . . . . 46--49
Sunghwan Kim and
Gyusun Lee and
Jiwon Woo and
Jinkyu Jeong Zero-Copying I/O Stack for Low-Latency
SSDs . . . . . . . . . . . . . . . . . . 50--53
Chao Yu and
Sihang Liu and
Samira Khan MultiPIM: a Detailed and Configurable
Multi-Stack Processing-In-Memory
Simulator . . . . . . . . . . . . . . . 54--57
Tian Tan and
Eriko Nurvitadhi and
Aravind Dasu and
Martin Langhammer and
Derek Chiou FlexScore: Quantifying Flexibility . . . 58--61
Arindam Sarkar and
Newton Singh and
Varun Venkitaraman and
Virendra Singh DAM: Deadblock Aware Migration
Techniques for STT-RAM-Based Hybrid
Caches . . . . . . . . . . . . . . . . . 62--65
Han Li and
Mingyu Yan and
Xiaocheng Yang and
Lei Deng and
Wenming Li and
Xiaochun Ye and
Dongrui Fan and
Yuan Xie Hardware Acceleration for GCNs via
Bidirectional Fusion . . . . . . . . . . 66--69
Yongjoo Jang and
Sejin Kim and
Daehoon Kim and
Sungjin Lee and
Jaeha Kung Deep Partitioned Training From
Near-Storage Computing to DNN
Accelerators . . . . . . . . . . . . . . 70--73
Salonik Resch and
Husrev Cilasun and
Ulya R. Karpuzcu Cryogenic PIM: Challenges & Opportunities 74--77
Wim Heirman and
Stijn Eyerman and
Kristof Du Bois and
Ibrahim Hur RIO: ROB-Centric In-Order Modeling of
Out-of-Order Processors . . . . . . . . 78--81
Aporva Amarnath and
Subhankar Pal and
Hiwot Tadese Kassa and
Augusto Vega and
Alper Buyuktosunoglu and
Hubertus Franke and
John-David Wellman and
Ronald Dreslinski and
Pradip Bose Heterogeneity-Aware Scheduling on SoCs
for Autonomous Vehicles . . . . . . . . 82--85
Lei Wang and
Xingwang Xiong and
Jianfeng Zhan and
Wanling Gao and
Xu Wen and
Guoxin Kang and
Fei Tang WPC: Whole-Picture Workload
Characterization Across Intermediate
Representation, ISA, and
Microarchitecture . . . . . . . . . . . 86--89
Stijn Eyerman and
Wim Heirman and
Ibrahim Hur Modeling DRAM Timing in Parallel
Simulators With Immediate-Response
Memory Model . . . . . . . . . . . . . . 90--93
Hajar Falahati and
Masoud Peyro and
Hossein Amini and
Mehran Taghian and
Mohammad Sadrosadati and
Pejman Lotfi-Kamran and
Hamid Sarbazi-Azad Data-Aware Compression of Neural
Networks . . . . . . . . . . . . . . . . 94--97
Benjamin Wu and
Trishita Tiwari and
G. Edward Suh and
Aaron B. Wagner Guessing Outputs of Dynamically Pruned
CNNs Using Memory Access Patterns . . . 98--101
Mingi Yoo and
Jaeyong Song and
Jounghoo Lee and
Namhyung Kim and
Youngsok Kim and
Jinho Lee Making a Better Use of Caches for GCN
Accelerators with Feature Slicing and
Automatic Tile Morphing . . . . . . . . 102--105
Bongjoon Hyun and
Jiwon Lee and
Minsoo Rhu Characterization and Analysis of Deep
Learning for 3D Point Cloud Analytics 106--109
Alexander Rucker and
Muhammad Shahbaz and
Kunle Olukotun Chopping off the Tail: Bounded
Non-Determinism for Real-Time
Accelerators . . . . . . . . . . . . . . 110--113
Jiya Su and
Linfeng He and
Peng Jiang and
Rujia Wang Exploring PIM Architecture for
High-Performance Graph Pattern Mining 114--117
Yunjae Lee and
Youngeun Kwon and
Minsoo Rhu Understanding the Implication of
Non-Volatile Memory for Large-Scale
Graph Neural Network Training . . . . . 118--121
Francisco Muñoz-Martínez and
José L. Abellán and
Manuel E. Acacio and
Tushar Krishna STONNE: Enabling Cycle-Level
Microarchitectural Simulation for DNN
Inference Accelerators . . . . . . . . . 122--125
Nima Shoghi and
Andrei Bersatti and
Moinuddin Qureshi and
Hyesoon Kim SmaQ: Smart Quantization for DNN
Training by Exploiting Value Clustering 126--129
Haris Volos The Case for Replication-Aware
Memory-Error Protection in Disaggregated
Memory . . . . . . . . . . . . . . . . . 130--133
Truls Asheim and
Boris Grot and
Rakesh Kumar BTB-X: a Storage-Effective BTB
Organization . . . . . . . . . . . . . . 134--137
Pratik Kumar and
Chavhan Sujeet Yashavant and
Biswabandan Panda DAMARU: a Denial-of-Service Attack on
Randomized Last-Level Caches . . . . . . 138--141
Fatemeh Ghasemi and
Magnus Jahre Modeling Periodic Energy-Harvesting
Computing Systems . . . . . . . . . . . 142--145
Neelu Shivprakash Kalani and
Biswabandan Panda Instruction Criticality Based
Energy-Efficient Hardware Data
Prefetching . . . . . . . . . . . . . . 146--149
Jiho Kim and
Myoungsoo Jung and
John Kim Decoupled SSD: Reducing Data Movement on
NAND-Based Flash SSD . . . . . . . . . . 150--153
Hyeon Gyu Lee and
Minwook Kim and
Juwon Lee and
Eunji Lee and
Bryan S. Kim and
Sungjin Lee and
Yeseong Kim and
Sang Lyul Min and
Jin-Soo Kim Learned Performance Model for SSD . . . 154--157
Sudhanva Gurumurthi and
Kijun Lee and
Munseon Jang and
Vilas Sridharan and
Aaron Nygren and
Yesin Ryu and
Kyomin Sohn and
Taekyun Kim and
Hoeju Chung HBM3 RAS: Enhancing Resilience at Scale 158--161
Pavlos Aimoniotis and
Christos Sakalis and
Magnus Själander and
Stefanos Kaxiras Reorder Buffer Contention: a Forward
Speculative Interference Attack for
Speculation Invariant Instructions . . . 162--165
Seyed Morteza Nabavinejad and
Sherief Reda BayesTuner: Leveraging Bayesian
Optimization For DNN Inference
Configuration Selection . . . . . . . . 166--170
Hyungkyu Ham and
Hyunuk Cho and
Minjae Kim and
Jueon Park and
Jeongmin Hong and
Hyojin Sung and
Eunhyeok Park and
Euicheol Lim and
Gwangsun Kim Near-Data Processing in Memory Expander
for DNN Acceleration on GPUs . . . . . . 171--174
Wenjie Liu and
Wim Heirman and
Stijn Eyerman and
Shoaib Akram and
Lieven Eeckhout Scale-Model Simulation . . . . . . . . . 175--178
Anonymous 2021 Index IEEE Computer
Architecture Letters Vol. 20 . . . . . . 1--8
Xinfeng Xie and
Peng Gu and
Jiayi Huang and
Yufei Ding and
Yuan Xie MPU-Sim: a Simulator for In-DRAM
Near-Bank Processing Architectures . . . 1--4
Mo Zou and
Mingzhe Zhang and
Rujia Wang and
Xian-He Sun and
Xiaochun Ye and
Dongrui Fan and
Zhimin Tang Accelerating Graph Processing With
Lightweight Learning-Based Data
Reordering . . . . . . . . . . . . . . . 5--8
Kristin Barber and
Moein Ghaniyoun and
Yinqian Zhang and
Radu Teodorescu A Pre-Silicon Approach to Discovering
Microarchitectural Vulnerabilities in
Security Critical Applications . . . . . 9--12
Dusol Lee and
Duwon Hong and
Wonil Choi and
Jihong Kim MQSim-E: an Enterprise SSD Simulator . . 13--16
Benjamin J. Lucas and
Ali Alwan and
Marion Murzello and
Yazheng Tu and
Pengzhou He and
Andrew J. Schwartz and
David Guevara and
Ujjwal Guin and
Kyle Juretus and
Jiafeng Xie Lightweight Hardware Implementation of
Binary Ring-LWE PQC Accelerator . . . . 17--20
Yongwon Shin and
Juseong Park and
Jeongmin Hong and
Hyojin Sung Runtime Support for Accelerating CNN
Models on Digital DRAM
Processing-in-Memory Hardware . . . . . 33--36
Hoyong Jin and
Donghun Jeong and
Taewon Park and
Jong Hwan Ko and
Jungrae Kim Multi-Prediction Compression: an
Efficient and Scalable Memory
Compression Framework for GP-GPU . . . . 37--40
Argyris Kokkinis and
Dionysios Diamantopoulos and
Kostas Siozios Dynamic Optimization of On-Chip Memories
for HLS Targeting Many-Accelerator
Platforms . . . . . . . . . . . . . . . 41--44
Sungmin Yun and
Byeongho Kim and
Jaehyun Park and
Hwayong Nam and
Jung Ho Ahn and
Eojin Lee GraNDe: Near-Data Processing
Architecture With Adaptive Matrix
Mapping for Graph Convolutional Networks 45--48
Rui Ma and
Evangelos Georganas and
Alexander Heinecke and
Sergey Gribok and
Andrew Boutros and
Eriko Nurvitadhi FPGA-Based AI Smart NICs for Scalable
Distributed AI Training Systems . . . . 49--52
Fazal Hameed and
Asif Ali Khan and
Sebastien Ollivier and
Alex K. Jones and
Jeronimo Castrillon DNA Pre-Alignment Filter Using
Processing Near Racetrack Memory . . . . 53--56
Ling Yang and
Libo Huang and
Run Yan and
Nong Xiao and
Sheng Ma and
Li Shen and
Weixia Xu Stride Equality Prediction for Value
Speculation . . . . . . . . . . . . . . 57--60
Jeongmin Hong and
Sungjun Cho and
Gwangsun Kim Overcoming Memory Capacity Wall of GPUs
With Heterogeneous Memory Stack . . . . 61--64
Luca Piccolboni and
Davide Giri and
Luca P. Carloni Accelerators & Security: The Socket
Approach . . . . . . . . . . . . . . . . 65--68
Mingyu Yan and
Mo Zou and
Xiaocheng Yang and
Wenming Li and
Xiaochun Ye and
Dongrui Fan and
Yuan Xie Characterizing and Understanding HGNNs
on GPUs . . . . . . . . . . . . . . . . 69--72
Cecil Accetti and
Rendong Ying and
Peilin Liu Structured Combinators for Efficient
Graph Reduction . . . . . . . . . . . . 73--76
Yu Omori and
Keiji Kimura Open-Source Hardware Memory Protection
Engine Integrated With NVMM Simulator 77--80
Minjae Kim and
Bryan S. Kim and
Eunji Lee and
Sungjin Lee A Case Study of a DRAM-NVM Hybrid Memory
Allocator for Key--Value Stores . . . . 81--84
Zhengrong Wang and
Christopher Liu and
Tony Nowatzki Infinity Stream: Enabling Transparent
and Automated In-Memory Computing . . . 85--88
Lingxi Wu and
Rasool Sharifi and
Ashish Venkat and
Kevin Skadron DRAM-CAM: General-Purpose Bit-Serial
Exact Pattern Matching . . . . . . . . . 89--92
Salonik Resch and
Ulya Karpuzcu On Variable Strength Quantum ECC . . . . 93--96
Peter Salvesen and
Magnus Jahre LMT: Accurate and Resource-Scalable
Slowdown Prediction . . . . . . . . . . 97--100
Gyeongcheol Shin and
Junsoo Kim and
Joo-Young Kim OpenMDS: an Open-Source Shell Generation
Framework for High-Performance Design on
Xilinx Multi-Die FPGAs . . . . . . . . . 101--104
Majid Jalili and
Mattan Erez Managing Prefetchers With Deep
Reinforcement Learning . . . . . . . . . 105--108
Marzieh Lenjani and
Alif Ahmed and
Kevin Skadron Pulley: an Algorithm/Hardware
Co-Optimization for In-Memory Sorting 109--112
Yongye Zhu and
Shijia Wei and
Mohit Tiwari Revisiting Browser Performance
Benchmarking From an Architectural
Perspective . . . . . . . . . . . . . . 113--116
Donghyun Gouk and
Seungkwan Kang and
Miryeong Kwon and
Junhyeok Jang and
Hyunkyu Choi and
Sangwon Lee and
Myoungsoo Jung PreGNN: Hardware Acceleration to Take
Preprocessing Off the Critical Path in
Graph Neural Networks . . . . . . . . . 117--120
Yinshen Wang and
Wenming Li and
Tianyu Liu and
Liangjiang Zhou and
Bingnan Wang and
Zhihua Fan and
Xiaochun Ye and
Dongrui Fan and
Chibiao Ding Characterization and Implementation of
Radar System Applications on a
Reconfigurable Dataflow Architecture . . 121--124
Xiaofeng Hou and
Cheng Xu and
Jiacheng Liu and
Xuehan Tang and
Lingyu Sun and
Chao Li and
Kwang-Ting Cheng Characterizing and Understanding
End-to-End Multi-Modal Neural Networks
on GPUs . . . . . . . . . . . . . . . . 125--128
Jared Nye and
Omer Khan SSE: Security Service Engines to
Accelerate Enclave Performance in Secure
Multicore Processors . . . . . . . . . . 129--132
Gino A. Chacon and
Charles Williams and
Johann Knechtel and
Ozgur Sinanoglu and
Paul V. Gratz Hardware Trojan Threats to Cache
Coherence in Modern 2.5D Chiplet Systems 133--136
Lieven Eeckhout A First-Order Model to Assess Computer
Architecture Sustainability . . . . . . 137--140
Ranyang Zhou and
Sepehr Tabrizchi and
Arman Roohi and
Shaahin Angizi LT-PIM: an LUT-Based Processing-in-DRAM
Architecture With RowHammer
Self-Tracking . . . . . . . . . . . . . 141--144
Jongwon Park and
Jinkyu Jeong Speculative Multi-Level Access in LSM
Tree-Based KV Store . . . . . . . . . . 145--148
Marjan Fariborz and
Mahyar Samani and
Terry O'Neill and
Jason Lowe-Power and
S. J. Ben Yoo and
Venkatesh Akella A Model for Scalable and Balanced
Accelerators for Graph Processing . . . 149--152
Jianming Huang and
Yu Hua Ensuring Data Confidentiality in
eADR-Based NVM Systems . . . . . . . . . 153--156
Sejin Kim and
Jungwoo Kim and
Yongjoo Jang and
Jaeha Kung and
Sungjin Lee SEMS: Scalable Embedding Memory System
for Accelerating Embedding-Based DNNs 157--160
Daniel A. Jiménez and
Elvira Teran and
Paul V. Gratz Last-Level Cache Insertion and Promotion
Policy in the Presence of Aggressive
Prefetching . . . . . . . . . . . . . . 17--20
Yaebin Moon and
Wanju Doh and
Kwanhee Kyung and
Eojin Lee and
Jung Ho Ahn ADT: Aggressive Demotion and Promotion
for Tiered Memory . . . . . . . . . . . 21--24
Gyeongseo Park and
Ki-Dong Kang and
Minho Kim and
Daehoon Kim CoreNap: Energy Efficient Core
Allocation for Latency-Critical
Workloads . . . . . . . . . . . . . . . 1--4
Joonseop Sim and
Soohong Ahn and
Taeyoung Ahn and
Seungyong Lee and
Myunghyun Rhee and
Jooyoung Kim and
Kwangsik Shin and
Donguk Moon and
Euiseok Kim and
Kyoung Park Computational CXL-Memory Solution for
Accelerating Memory-Intensive
Applications . . . . . . . . . . . . . . 5--8
Burkhard Ringlein and
Francois Abel and
Dionysios Diamantopoulos and
Beat Weiss and
Christoph Hagleitner and
Dietmar Fey Advancing Compilation of DNNs for FPGAs
Using Operation Set Architectures . . . 9--12
Seonho Lee and
Ranggi Hwang and
Jongse Park and
Minsoo Rhu HAMMER: Hardware-Friendly Approximate
Computing for Self-Attention With
Mean-Redistribution and Linearization 13--16
Hanyeoreum Bae and
Donghyun Gouk and
Seungjun Lee and
Jiseon Kim and
Sungjoon Koh and
Jie Zhang and
Myoungsoo Jung Intelligent SSD Firmware for
Zero-Overhead Journaling . . . . . . . . 25--28
Xia Zhao and
Guangda Zhang and
Lu Wang and
Yangmei Li and
Yongjun Zhang RouteReplies: Alleviating Long Latency
in Many-Chip-Module GPUs . . . . . . . . 29--32
Kevin Weston and
Farabi Mahmud and
Vahid Janfaza and
Abdullah Muzahid SmartIndex: Learning to Index Caches to
Improve Performance . . . . . . . . . . 33--36
Soroosh Khoram and
Kyle Daruwalla and
Mikko Lipasti Energy-Efficient Bayesian Inference
Using Bitstream Computing . . . . . . . 37--40
Jennifer Brana and
Brian C. Schwedock and
Yatin A. Manerkar and
Nathan Beckmann Kobold: Simplified Cache Coherence for
Cache-Attached Accelerators . . . . . . 41--44
Kiseok Jeon and
Junghee Lee and
Bumsoo Kim and
James J. Kim Hardware Accelerated Reusable Merkle
Tree Generation for Bitcoin Blockchain
Headers . . . . . . . . . . . . . . . . 69--72
Hwanjun Lee and
Seunghak Lee and
Yeji Jung and
Daehoon Kim T-CAT: Dynamic Cache Allocation for
Tiered Memory Systems With Memory
Interleaving . . . . . . . . . . . . . . 73--76
Ipoom Jeong and
Jiaqi Lou and
Yongseok Son and
Yongjoo Park and
Yifan Yuan and
Nam Sung Kim LADIO: Leakage-Aware Direct I/O for
I/O-Intensive Workloads . . . . . . . . 77--80
Chandana S. Deshpande and
Arthur Perais and
Frédéric Pétrot Toward Practical 128-Bit General Purpose
Microarchitectures . . . . . . . . . . . 81--84
Achilleas Tzenetopoulos and
Dimosthenis Masouros and
Dimitrios Soudris and
Sotirios Xydis DVFaaS: Leveraging DVFS for FaaS
Workflows . . . . . . . . . . . . . . . 85--88
Hwayong Nam and
Seungmin Baek and
Minbok Wi and
Michael Jaemin Kim and
Jaehyun Park and
Chihun Song and
Nam Sung Kim and
Jung Ho Ahn X-ray: Discovering DRAM Internal
Structure and Error Characteristics by
Issuing Memory Commands . . . . . . . . 89--92
Ahmed Nematallah and
Chang Hyun Park and
David Black-Schaffer Exploring the Latency Sensitivity of
Cache Replacement Policies . . . . . . . 93--96
Fernando Mosquera and
Krishna Kavi and
Gayatri Mehta and
Lizy John Guard Cache: Creating Noisy
Side-Channels . . . . . . . . . . . . . 97--100
Jason Mars and
Yiping Kang and
Roland Daynauth and
Baichuan Li and
Ashish Mahendra and
Krisztian Flautner and
Lingjia Tang The Jaseci Programming Paradigm and
Runtime Stack: Building Scale-Out
Production Applications Easy and Fast 101--104
Naorin Hossain and
Alper Buyuktosunoglu and
John-David Wellman and
Pradip Bose and
Margaret Martonosi SoCurity: a Design Approach for
Enhancing SoC Security . . . . . . . . . 105--108
Justin Feng and
Fatemeh Arkannezhad and
Christopher Ryu and
Enoch Huang and
Siddhant Gupta and
Nader Sehatbakhsh Simulating Our Way to Safer Software: a
Tale of Integrating Microarchitecture
Simulation and Leakage Estimation
Modeling . . . . . . . . . . . . . . . . 109--112
Jaewan Choi and
Jaehyun Park and
Kwanhee Kyung and
Nam Sung Kim and
Jung Ho Ahn Unleashing the Potential of PIM:
Accelerating Large Batched Inference of
Transformer-Based Generative Models . . 113--116
Yonghae Kim and
Anurag Kar and
Jaewon Lee and
Jaekyu Lee and
Hyesoon Kim Hardware-Assisted Code-Pointer Tagging
for Forward-Edge Control-Flow Integrity 117--120
Gururaj Saileshwar and
Moinuddin Qureshi The Mirage of Breaking MIRAGE: Analyzing
the Modeling Pitfalls in Emerging
Attacks on MIRAGE . . . . . . . . . . . 121--124
Yun-Chen Lo and
Yu-Chih Tsai and
Ren-Shuo Liu LV: Latency-Versatile Floating-Point
Engine for High-Performance Deep Neural
Networks . . . . . . . . . . . . . . . . 125--128
Maziar Goudarzi and
Reza Azimi and
Julian Humecki and
Faizaan Rehman and
Richard Zhang and
Chirag Sethi and
Tanishq Bomman and
Yuqi Yang By-Software Branch Prediction in Loops 129--132
Yugyoung Yun and
Eunhyeok Park Fast Performance Prediction for
Efficient Distributed DNN Training . . . 133--136
Meng Wu and
Mingyu Yan and
Xiaocheng Yang and
Wenming Li and
Zhimin Zhang and
Xiaochun Ye and
Dongrui Fan Characterizing and Understanding Defense
Methods for GNNs on GPUs . . . . . . . . 137--140
Pratyush Patel and
Zibo Gong and
Syeda Rizvi and
Esha Choukse and
Pulkit Misra and
Thomas Anderson and
Akshitha Sriraman Towards Improved Power Management in
Cloud GPUs . . . . . . . . . . . . . . . 141--144
Shiqing Zhang and
Mahmood Naderan-Tahan and
Magnus Jahre and
Lieven Eeckhout Balancing Performance Against Cost and
Sustainability in Multi-Chip-Module GPUs 145--148
Chanyoung Park and
Chun-Yi Liu and
Kyungtae Kang and
Mahmut Kandemir and
Wonil Choi Design of a High-Performance,
High-Endurance Key-Value SSD for
Large-Key Workloads . . . . . . . . . . 149--152
Jie Liu and
Zhongyuan Zhao and
Zijian Ding and
Benjamin Brock and
Hongbo Rong and
Zhiru Zhang An Intermediate Language for General
Sparse Format Customization . . . . . . 153--156
Seunghak Lee and
Ki-Dong Kang and
Gyeongseo Park and
Nam Sung Kim and
Daehoon Kim NoHammer: Preventing Row Hammer With
Last-Level Cache Management . . . . . . 157--160
Pau Escofet and
Anabel Ovide and
Carmen G. Almudever and
Eduard Alarcón and
Sergi Abadal Hungarian Qubit Assignment for Optimized
Mapping of Quantum Circuits on
Multi-Core Architectures . . . . . . . . 161--164
Lingfei Lu and
Yudi Qiu and
Shiyan Yi and
Yibo Fan A Flexible Embedding-Aware Near Memory
Processing Architecture for
Recommendation System . . . . . . . . . 165--168
Hailong Li and
Jaewan Choi and
Yongsuk Kwon and
Jung Ho Ahn A Hardware-Friendly Tiled Singular-Value
Decomposition-Based Matrix
Multiplication for Transformer-Based
Models . . . . . . . . . . . . . . . . . 169--172
Adam Hastings and
Ryan Piersma and
Simha Sethumadhavan Architectural Security Regulation . . . 173--176
Theodoros Trochatos and
Chuanqi Xu and
Sanjay Deshpande and
Yao Lu and
Yongshan Ding and
Jakub Szefer A Quantum Computer Trusted Execution
Environment . . . . . . . . . . . . . . 177--180
Peiyun Wu and
Trung Le and
Zhichun Zhu and
Zhao Zhang Redundant Array of Independent Memory
Devices . . . . . . . . . . . . . . . . 181--184
Jonathan Garcia-Mallen and
Shuohao Ping and
Alex Miralles-Cordal and
Ian Martin and
Mukund Ramakrishnan and
Yipeng Huang Towards an Accelerator for Differential
and Algebraic Equations Useful to
Scientists . . . . . . . . . . . . . . . 185--188
João Vieira and
Nuno Roma and
Gabriel Falcao and
Pedro Tomás gem5-accel: a Pre-RTL Simulation
Toolchain for Accelerator Architecture
Validation . . . . . . . . . . . . . . . 1--4
Atiyeh Gheibi-Fetrat and
Negar Akbarzadeh and
Shaahin Hessabi and
Hamid Sarbazi-Azad Tulip: Turn-Free Low-Power
Network-on-Chip . . . . . . . . . . . . 5--8
Yosuke Ueno and
Yuna Tomida and
Teruo Tanimoto and
Masamitsu Tanaka and
Yutaka Tabuchi and
Koji Inoue and
Hiroshi Nakamura Inter-Temperature Bandwidth Reduction in
Cryogenic QAOA Machines . . . . . . . . 9--12
Hyeseong Kim and
Yunjae Lee and
Minsoo Rhu FPGA-Accelerated Data Preprocessing for
Personalized Recommendation Systems . . 7--10
Christodoulos Peltekis and
Vasileios Titopoulos and
Chrysostomos Nicopoulos and
Giorgos Dimitrakopoulos DeMM: a Decoupled Matrix Multiplication
Engine Supporting Relaxed Structured
Sparsity . . . . . . . . . . . . . . . . 17--20
Caden Corontzos and
Eitan Frachtenberg Direct-Coding DNA With Multilevel
Parallelism . . . . . . . . . . . . . . 21--24
Ramin Ayanzadeh and
Moinuddin Qureshi Enhancing the Reach and Reliability of
Quantum Annealers by Pruning Longer
Chains . . . . . . . . . . . . . . . . . 25--28
Courtney Golden and
Dan Ilan and
Caroline Huang and
Niansong Zhang and
Zhiru Zhang and
Christopher Batten Supporting a Virtual Vector Instruction
Set on a Commercial Compute-in-SRAM
Accelerator . . . . . . . . . . . . . . 29--32
Samuel Thomas and
Kidus Workneh and
Ange-Thierry Ishimwe and
Zack McKevitt and
Phaedra Curlin and
R. Iris Bahar and
Joseph Izraelevitz and
Tamara Lehman Baobab Merkle Tree for Efficient Secure
Memory . . . . . . . . . . . . . . . . . 33--36
Minsik Cho and
Keivan A. Vahid and
Qichen Fu and
Saurabh Adya and
Carlo C. Del Mundo and
Mohammad Rastegari and
Devang Naik and
Peter Zatloukal eDKM: an Efficient and Accurate
Train-Time Weight Clustering for Large
Language Models . . . . . . . . . . . . 37--40
Yang-Gon Kim and
Yun-Ki Han and
Jae-Kang Shin and
Jun-Kyum Kim and
Lee-Sup Kim Accelerating Deep Reinforcement Learning
via Phase-Level Parallelism for Robotics
Applications . . . . . . . . . . . . . . 41--44
Yuxin Yang and
Xiaoming Chen and
Yinhe Han JANM-IK: Jacobian Argumented
Nelder--Mead Algorithm for Inverse
Kinematics and its Hardware Acceleration 45--48
Mohammad Hafezan and
Ehsan Atoofian Improving Energy-Efficiency of Capsule
Networks on Modern GPUs . . . . . . . . 49--52
Mahita Nagabhiru and
Gregory T. Byrd Achieving Forward Progress Guarantee in
Small Hardware Transactions . . . . . . 53--56
Rui Ma and
Jia-Ching Hsu and
Ali Mansoorshahi and
Joseph Garvey and
Michael Kinsner and
Deshanand Singh and
Derek Chiou Primate: a Framework to Automatically
Generate Soft Processors for Network
Applications . . . . . . . . . . . . . . 57--60
Loïc France and
Florent Bruguier and
David Novo and
Maria Mushtaq and
Pascal Benoit Reducing the Silicon Area Overhead of
Counter-Based Rowhammer Mitigations . . 61--64
L. Yavits DRAMA: Commodity DRAM Based Content
Addressable Memory . . . . . . . . . . . 65--68
Deepanjali Mishra and
Konstantinos Kanellopoulos and
Ashish Panwar and
Akshitha Sriraman and
Vivek Seshadri and
Onur Mutlu and
Todd C. Mowry Address Scaling: Architectural Support
for Fine-Grained Thread-Safe Metadata
Management . . . . . . . . . . . . . . . 69--72
Changmin Shin and
Taehee Kwon and
Jaeyong Song and
Jae Hyung Ju and
Frank Liu and
Yeonkyu Choi and
Jinho Lee A Case for In-Memory Random
Scatter--Gather for Fast Graph
Processing . . . . . . . . . . . . . . . 73--77
Lieven Eeckhout R.I.P. Geomean Speedup Use Equal-Work
(Or Equal-Time) Harmonic Mean Speedup
Instead . . . . . . . . . . . . . . . . 78--82
Z. Jahshan and
L. Yavits MajorK: Majority Based kmer Matching in
Commodity DRAM . . . . . . . . . . . . . 83--86
Shiyan Yi and
Yudi Qiu and
Lingfei Lu and
Guohao Xu and
Yong Gong and
Xiaoyang Zeng and
Yibo Fan GATe: Streamlining Memory Access and
Communication to Accelerate Graph
Attention Network With Near-Memory
Processing . . . . . . . . . . . . . . . 87--90
Mrinmay Sasmal and
Tresa Joseph and
Bindiya T. S. Approximate Multiplier Design With
LFSR-Based Stochastic Sequence
Generators for Edge AI . . . . . . . . . 91--94
Varun Gohil and
Sundar Dev and
Gaurang Upasani and
David Lo and
Parthasarathy Ranganathan and
Christina Delimitrou The Importance of Generalizability in
Machine Learning for Systems . . . . . . 95--98
Nikhil Agarwal and
Mitchell Fream and
Souradip Ghosh and
Brian C. Schwedock and
Nathan Beckmann UDIR: Towards a Unified Compiler
Framework for Reconfigurable Dataflow
Architectures . . . . . . . . . . . . . 99--103
Kyriaki Tsantikidou and
Nicolas Sklavos An Area Efficient Architecture of a
Novel Chaotic System for High Randomness
Security in e-Health . . . . . . . . . . 104--107
Yongmo Park and
Subhankar Pal and
Aporva Amarnath and
Karthik Swaminathan and
Wei D. Lu and
Alper Buyuktosunoglu and
Pradip Bose Dramaton: a Near-DRAM Accelerator for
Large Number Theoretic Transforms . . . 108--111
Haocong Luo and
Yahya Can Tuğrul and
F. Nisa Bostancı and
Ataberk Olgun and
A. Giray Yağlıkçı and
Onur Mutlu Ramulator 2.0: a Modern, Modular, and
Extensible DRAM Simulator . . . . . . . 112--116
Hyungyo Kim and
Gaohan Ye and
Nachuan Wang and
Amir Yazdanbakhsh and
Nam Sung Kim Exploiting Intel Advanced Matrix
Extensions (AMX) for Large Language
Model Inference . . . . . . . . . . . . 117--120
Tianzheng Li and
Enfang Cui and
Yuting Wu and
Qian Wei and
Yue Gao TeleVM: a Lightweight Virtual Machine
for RISC-V Architecture . . . . . . . . 121--124
Yingjie Qi and
Jianlei Yang and
Ao Zhou and
Tong Qiao and
Chunming Hu Architectural Implications of GNN
Aggregation Programming Abstractions . . 125--128
Asif Ali Khan and
Fazal Hameed and
Taha Shahroodi and
Alex K. Jones and
Jeronimo Castrillon Efficient Memory Layout for
Pre-Alignment Filtering of Long DNA
Reads Using Racetrack Memory . . . . . . 129--132
Saurav Maji and
Kyungmi Lee and
Anantha P. Chandrakasan SparseLeakyNets: Classification
Prediction Attack Over Sparsity-Aware
Embedded Neural Networks Using Timing
Side-Channel Information . . . . . . . . 133--136
Seyyed Hossein SeyyedAghaei Rezaei and
Parham Zilouchian Moghaddam and
Mehdi Modarressi Smart Memory: Deep Learning Acceleration
in 3D-Stacked Memories . . . . . . . . . 137--141
Hossein Katebi and
Navidreza Asadi and
Maziar Goudarzi FullPack: Full Vector Utilization for
Sub-Byte Quantized Matrix--Vector
Multiplication on General Purpose CPUs 142--145
Erika S. Alcorta and
Mahesh Madhav and
Richard Afoakwa and
Scott Tetrick and
Neeraja J. Yadwadkar and
Andreas Gerstlauer Characterizing Machine Learning-Based
Runtime Prefetcher Selection . . . . . . 146--149
Andreas Kosmas Kakolyris and
Dimosthenis Masouros and
Sotirios Xydis and
Dimitrios Soudris SLO-Aware GPU DVFS for Energy-Efficient
LLM Inference Serving . . . . . . . . . 150--153
Dongho Yoon and
Taehun Kim and
Jae W. Lee and
Minsoo Rhu A Quantitative Analysis of State Space
Model-Based Large Language Model: Study
of Hungry Hungry Hippos . . . . . . . . 154--157
Mohammadamin Ajdari and
Behrang Montazerzohour and
Kimia Abdi and
Hossein Asadi Empirical Architectural Analysis on
Performance Scalability of Petascale
All-Flash Storage Systems . . . . . . . 158--161
Ali Mohammadpur-Fard and
Sina Darabi and
Hajar Falahati and
Negin Mahani and
Hamid Sarbazi-Azad Exploiting Direct Memory Operands in GPU
Instructions . . . . . . . . . . . . . . 162--165
Pablo Andreu and
Pedro Lopez and
Carles Hernandez Hashing ATD Tags for Low-Overhead Safe
Contention Monitoring . . . . . . . . . 166--169
Deniz Gurevin and
Caiwen Ding and
Omer Khan Exploiting Intrinsic Redundancies in
Dynamic Graph Neural Networks for
Processing Efficiency . . . . . . . . . 170--174
Reoma Matsuo and
Toru Koizumi and
Hidetsugu Irie and
Shuichi Sakai and
Ryota Shioya TURBULENCE: Complexity-Effective
Out-of-Order Execution on GPU With
Distance-Based ISA . . . . . . . . . . . 175--178
Dongjae Lee and
Bongjoon Hyun and
Taehun Kim and
Minsoo Rhu Analysis of Data Transfer Bottlenecks in
Commercial PIM Systems: a Study With
UPMEM--PIM . . . . . . . . . . . . . . . 179--182
Seunghyuk Yu and
Hyeonu Kim and
Kyoungho Jeun and
Sunyoung Hwang and
Eojin Lee Architecting Compatible PIM Protocol for
CPU--PIM Collaboration . . . . . . . . . 183--186
Yazheng Tu and
Pengzhou He and
Chip-Hong Chang and
Jiafeng Xie LTE: Lightweight and Time-Efficient
Hardware Encoder for Post-Quantum Scheme
HQC . . . . . . . . . . . . . . . . . . 187--190
Mohamed Hossam and
Salah Hessien and
Mohamed Hassan Octopus: a Cycle-Accurate Cache System
Simulator . . . . . . . . . . . . . . . 191--194
Paresh Baidya and
Rourab Paul and
Swagata Mandal and
Sumit Kumar Debnath Efficient Implementation of Knuth Yao
Sampler on Reconfigurable Hardware . . . 195--198
Rui Xie and
Asad Ul Haq and
Linsen Ma and
Krystal Sun and
Sanchari Sen and
Swagath Venkataramani and
Liu Liu and
Tong Zhang SmartQuant: CXL-Based AI Model Store in
Support of Runtime Configurable Weight
Quantization . . . . . . . . . . . . . . 199--202
Haeyoon Cho and
Hyojun Son and
Jungmin Choi and
Byungil Koh and
Minho Ha and
John Kim Proactive Embedding on Cold Data for
Deep Learning Recommendation Model
Training . . . . . . . . . . . . . . . . 203--206
Hyesung Ji and
Sangpyo Kim and
Jaewan Choi and
Jung Ho Ahn Accelerating Programmable Bootstrapping
Targeting Contemporary GPU
Microarchitecture . . . . . . . . . . . 207--210
Yuya Degawa and
Shota Suzuki and
Junichiro Kadomoto and
Hidetsugu Irie and
Shuichi Sakai Cycle-Oriented Dynamic Approximation:
Architectural Framework to Meet
Performance Requirements . . . . . . . . 211--214
Md Tareq Mahmud and
Ke Wang A Flexible Hybrid Interconnection Design
for High-Performance and
Energy-Efficient Chiplet-Based Systems 215--218
Hyungkyu Ham and
Wonhyuk Yang and
Yunseon Shin and
Okkyun Woo and
Guseul Heo and
Sangyeop Lee and
Jongse Park and
Gwangsun Kim ONNXim: a Fast, Cycle-Level Multi-Core
NPU Simulator . . . . . . . . . . . . . 219--222
Shizhuo Zhu and
Illia Shkirko and
Jacob Levinson and
Zhengrong Wang and
Tony Nowatzki SPGPU: Spatially Programmed GPU . . . . 223--226
Eunyeong Cho and
Jehyeon Bang and
Minsoo Rhu Characterization and Analysis of
Text-to-Image Diffusion Models . . . . . 227--230
Farid Samandi and
Natheesan Ratnasegar and
Michael Ferdman A Case for Hardware Memoization in
Server CPUs . . . . . . . . . . . . . . 231--234
Hanna Cha and
Sungchul Lee and
Yeonan Ha and
Hanhwi Jang and
Joonsung Kim and
Youngsok Kim GCStack: a GPU Cycle Accounting
Mechanism for Providing Accurate Insight
Into GPU Performance . . . . . . . . . . 235--238
Hongtao Wang and
Peiquan Jin ZoneBuffer: an Efficient Buffer
Management Scheme for ZNS SSDs . . . . . 239--242
Samuel Coulon and
Tianyou Bao and
Jiafeng Xie SCALES: SCALable and Area-Efficient
Systolic Accelerator for Ternary
Polynomial Multiplication . . . . . . . 243--246
Navnil Choudhury and
Chao Lu and
Kanad Basu Quantum Assertion Scheme for Assuring
Qudit Robustness . . . . . . . . . . . . 247--250
Haseung Bong and
Nahyeon Kang and
Youngsok Kim and
Joonsung Kim and
Hanhwi Jang IntervalSim++: Enhanced Interval
Simulation for Unbalanced Processor
Designs . . . . . . . . . . . . . . . . 1--4
Myoungjun Chun and
Jaeyong Lee and
Inhyuk Choi and
Jisung Park and
Myungsuk Kim and
Jihong Kim Straw: a Stress-Aware WL-Based Read
Reclaim Technique for High-Density NAND
Flash-Based SSDs . . . . . . . . . . . . 5--8
Chaithanya Krishna Vadlamudi and
Bahar Asgari Electra: Eliminating the Ineffectual
Computations on Bitmap Compressed
Matrices . . . . . . . . . . . . . . . . 9--12