A parallel implementation of the 2D wavelet transform using CUDA J Franco, G Bernabé, J Fernández, ME Acacio 2009 17th Euromicro International Conference on Parallel, Distributed and …, 2009 | 114 | 2009 |
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture ME Acacio, J González, JM García, J Duato SC'02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, 49-49, 2002 | 107 | 2002 |
A two-level directory architecture for highly scalable cc-NUMA multiprocessors ME Acacio, J González, JM García, J Duato IEEE Transactions on Parallel and Distributed Systems 16 (1), 67-79, 2005 | 78 | 2005 |
A new scalable directory architecture for large-scale multiprocessors ME Acacio, J González, JM García, J Duato Proceedings HPCA Seventh International Symposium on High-Performance …, 2001 | 66 | 2001 |
A direct coherence protocol for many-core chip multiprocessors A Ros, ME Acacio, JM García IEEE Transactions on Parallel and Distributed Systems 21 (12), 1779-1792, 2010 | 64 | 2010 |
Stonne: Enabling cycle-level microarchitectural simulation for dnn inference accelerators F Muñoz-Martínez, JL Abellán, ME Acacio, T Krishna 2021 IEEE International Symposium on Workload Characterization (IISWC), 201-213, 2021 | 63 | 2021 |
DiCo-CMP: Efficient cache coherency in tiled CMP architectures A Ros, ME Acacio, JM García 2008 IEEE International Symposium on Parallel and Distributed Processing, 1-11, 2008 | 63 | 2008 |
The use of prediction for accelerating upgrade misses in cc-NUMA multiprocessors ME Acacio, J González, JM García, J Duato Proceedings. International Conference on Parallel Architectures and …, 2002 | 61 | 2002 |
Heterogeneous interconnects for energy-efficient message management in cmps A Flores, JL Aragón, ME Acacio IEEE Transactions on Computers 59 (1), 16-28, 2009 | 47 | 2009 |
A low overhead fault tolerant coherence protocol for CMP architectures R Fernández-Pascual, JM García, ME Acacio, J Duato 2007 IEEE 13th International Symposium on High Performance Computer …, 2007 | 47 | 2007 |
Glocks: Efficient support for highly-contended locks in many-core cmps JL Abellán, J Fernández, ME Acacio 2011 IEEE International Parallel & Distributed Processing Symposium, 893-905, 2011 | 46 | 2011 |
Understanding the design-space of sparse/dense multiphase GNN dataflows on spatial accelerators R Garg, E Qin, F Muñoz-Martínez, R Guirado, A Jain, S Abadal, ... 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2022 | 43* | 2022 |
Heterogeneous NoC design for efficient broadcast-based coherence protocol support M Lodde, J Flich, ME Acacio 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip, 59-66, 2012 | 40 | 2012 |
An energy consumption characterization of on-chip interconnection networks for tiled CMP architectures A Flores, JL Aragón, ME Acacio The Journal of Supercomputing 45, 341-364, 2008 | 40 | 2008 |
Zebra: A data-centric, hybrid-policy hardware transactional memory design R Titos-Gil, A Negi, ME Acacio, JM García, P Stenstrom Proceedings of the international conference on Supercomputing, 53-62, 2011 | 39 | 2011 |
π-TM: Pessimistic invalidation for scalable lazy hardware transactional memory A Negi, R Titos-Gil, ME Acacio, JM García, P Stenstrom IEEE International Symposium on High-Performance Comp Architecture, 1-12, 2012 | 37 | 2012 |
Scalable Directory Organization for Tiled CMP Architectures. A Ros, ME Acacio, JM García CDES 8, 112-118, 2008 | 37 | 2008 |
Efficient hardware barrier synchronization in many-core cmps JL Abellán, J Fernández, ME Acacio IEEE Transactions on Parallel and Distributed Systems 23 (8), 1453-1466, 2011 | 35 | 2011 |
Flexagon: A multi-dataflow sparse-sparse matrix multiplication accelerator for efficient dnn processing F Muñoz-Martínez, R Garg, M Pellauer, JL Abellán, ME Acacio, T Krishna Proceedings of the 28th ACM International Conference on Architectural …, 2023 | 34 | 2023 |
An architecture for high-performance scalable shared-memory multiprocessors exploiting on-chip integration ME Acacio, J González, JM García, J Duato IEEE Transactions on Parallel and Distributed Systems 15 (8), 755-768, 2004 | 32 | 2004 |