Publications

Our research has been published at top HPC and systems conferences (e.g., SC, PPoPP, ASPLOS, DAC, ICS, IPDPS, ICPP, CLUSTER, DATE and Euro-Par) and in journals (e.g., TC, TPDS, TCAD, TACO, TODAES, JPDC and PARCO), and most of our papers provide open-source code. You are welcome to evaluate and reproduce our work!

2024

Authors: Mingjia Fan, Xiaoming Chen, Dechuang Yang, Zhou Jin, Weifeng Liu

Title: ReCG: ReRAM-Accelerated Sparse Conjugate Gradient

Venue: 61st ACM/IEEE Design Automation Conference (DAC '24)

Year: 2024

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Yinuo Bai, Enxin Yi, Wei Xing, Bei Yu, Zhou Jin

Title: Unleashing the Potential of AQFP Logic Placement via Entanglement Entropy and Projection

Venue: 61st ACM/IEEE Design Automation Conference (DAC '24)

Year: 2024

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Chenxi Li, Boyuan Zhang, Yongqiang Duan, Yang Li, Zuochang Ye, Weifeng Liu, Dingwen Tao, Zhou Jin

Title: MASC: A Memory-Efficient Adjoint Sensitivity Analysis through Compression Using Novel Spatiotemporal Prediction

Venue: 61st ACM/IEEE Design Automation Conference (DAC '24)

Year: 2024

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Mingyue Wang, Yuanqing Cheng, Yage Lin, Kelin Peng, Shunchuan Yang, Zhou Jin, Wei Xing

Title: MAUnet: Multiscale Attention U-Net for Effective IR Drop Prediction

Venue: 61st ACM/IEEE Design Automation Conference (DAC '24)

Year: 2024

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Qingxiao Sun, Yi Liu, Hailong Yang, Zhonghui Jiang, Zhongzhi Luan, Depei Qian

Title: Adaptive Auto-tuning Framework for Global Exploration of Stencil Optimization on GPUs

Venue: IEEE Transactions on Parallel and Distributed Systems (TPDS)

Year: 2024

[PDF] [DOI] [Bibtex] [Code]

Authors: Zizheng Guo, Tsung-Wei Huang, Zhou Jin, Cheng Zhuo, Yibo Lin, Runsheng Wang, Ru Huang

Title: Heterogeneous Static Timing Analysis with Advanced Delay Calculator

Venue: 21st Design, Automation and Test in Europe Conference (DATE '24)

Year: 2024

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Yichao Dong, Dan Niu, Zhou Jin, Chuan Zhang, Changyin Sun, Zhenya Zhou

Title: ISPT-Net: A Noval Transient Backward-stepping Reduction Policy by Irregular Sequential Prediction Transformer

Venue: 21st Design, Automation and Test in Europe Conference (DATE '24)

Year: 2024

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Yinuo Bai, Xiaoyu Yang, Yicheng Lu, Dan Niu, Cheng Zhuo, Zhou Jin, Weifeng Liu

Title: Efficient Spectral-Aware Power Supply Noise Analysis for Low-Power Design Verification

Venue: 21st Design, Automation and Test in Europe Conference (DATE '24)

Year: 2024

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Zhou Jin, Tian Feng, Xiao Wu, Dan Niu, Zhenya Zhou, Cheng Zhuo

Title: MSH: A Multi-Stage HiZ-Aware Homotopy Framework for Nonlinear DC Analysis

Venue: 21st Design, Automation and Test in Europe Conference (DATE '24)

Year: 2024

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Pengju Chen, Dan Niu, Zhou Jin, Changyin Sun, Qi Li, Hao Yan

Title: TSA-TICER: A Two-Stage TICER Acceleration Framework for Model Order Reduction

Venue: 21st Design, Automation and Test in Europe Conference (DATE '24)

Year: 2024

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Enxin Yi, Yiru Duan, Yinuo Bai, Kang Zhao, Zhou Jin, Weifeng Liu

Title: Cuper: Customized Dataflow and Perceptual Decoding for Sparse Matrix-Vector Multiplication on HBM-Equipped FPGAs

Venue: 21st Design, Automation and Test in Europe Conference (DATE '24)

Year: 2024

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Guofeng Feng, Hongyu Wang, Zhuoqiang Guo, Mingzhen Li, Tong Zhao, Zhou Jin, Weile Jia, Guangming Tan, Ninghui Sun

Title: Accelerating Large-scale Sparse LU Factorization for RF Circuit Simulation

Venue: 30th International European Conference on Parallel and Distributed Computing (Euro-Par '24)

Year: 2024

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Zhou Jin, Wenhao Li, Yinuo Bai, Tengcheng Wang, Yicheng Lu, Weifeng Liu

Title: Machine Learning and GPU Accelerated Sparse Linear Solvers for Transistor-Level Circuit Simulation: A Perspective Survey (Invited Paper)

Venue: 29th ACM/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC '24)

Year: 2024

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Mouzhi Yang, Peng Zhang, Jianbin Fang, Weifeng Liu, Chun Huang

Title: thSORT: An Efficient Parallel Sorting Algorithm on Multi-core DSPs

Venue: CCF Transactions on High Performance Computing (CCF THPC)

Year: 2024

[PDF] [DOI] [Bibtex] [Code]

2023

Authors: Xu Fu, Bingbin Zhang, Tengcheng Wang, Wenhao Li, Yuechen Lu, Enxin Yi, Jianqi Zhao, Xiaohan Geng, Fangying Li, Jingwen Zhang, Zhou Jin, Weifeng Liu

Title: PanguLU: A Scalable Regular Two-Dimensional Block-Cyclic Sparse Direct Solver on Distributed Heterogeneous Systems

Award: Best Paper Award

Venue: 36th International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '23)

Year: 2023

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Yuechen Lu, Weifeng Liu

Title: DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication

Venue: 36th International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '23)

Year: 2023

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Tengcheng Wang, Wenhao Li, Haojie Pei, Yuying Sun, Zhou Jin, Weifeng Liu

Title: Accelerating Sparse LU Factorization with Density-Aware Adaptive Matrix Multiplication for Circuit Simulation

Venue: 60th ACM/IEEE Design Automation Conference (DAC '23)

Year: 2023

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Mingjia Fan, Xiaotian Tian, Yintao He, Junxian Li, Yiru Duan, Xiaozhe Hu, Ying Wang, Zhou Jin, Weifeng Liu

Title: AmgR: Algebraic Multigrid Accelerated on ReRAM

Venue: 60th ACM/IEEE Design Automation Conference (DAC '23)

Year: 2023

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Dan Niu, Yichao Dong, Zhou Jin, Chuan Zhang, Qi Li, Changyin Sun

Title: OSSP-PTA: An Online Stochastic Stepping Policy for PTA on Reinforcement Learning

Venue: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)

Year: 2023

[PDF] [DOI] [Bibtex] [Code]

Authors: Jianjin Liao, Mingzhen Li, Hailong Yang, Qingxiao Sun, Biao Sun, Jiwei Hao, Tianyu Feng, Fengwei Yu, Shengdong Chen, Ye Tao, Zicheng Zhang, Zhongzhi Luan, Depei Qian

Title: Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU

Venue: 37th IEEE International Parallel and Distributed Processing Symposium (IPDPS '23)

Year: 2023

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Helin Cheng, Wenxuan Li, Yuechen Lu, Weifeng Liu

Title: HASpGEMM: Heterogeneity-Aware Sparse General Matrix-Matrix Multiplication on Modern Asymmetric Multicore Processors

Venue: 52nd International Conference on Parallel Processing (ICPP '23)

Year: 2023

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Wenxuan Li, Helin Cheng, Zhengyang Lu, Yuechen Lu, Weifeng Liu

Title: HASpMV: Heterogeneity-Aware Sparse Matrix-Vector Multiplication on Modern Asymmetric Multicore Processors

Venue: 25th IEEE International Conference on Cluster Computing (CLUSTER '23)

Year: 2023

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Hongli Mi, Xiangrui Yu, Xiaosong Yu, Shuangyuan Wu, Weifeng Liu

Title: Balancing Computation and Communication in Distributed Sparse Matrix-Vector Multiplication

Venue: 23rd IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID '23)

Year: 2023

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Zhengyang Lu, Weifeng Liu

Title: TileSpTRSV: A Tiled Algorithm for Parallel Sparse Triangular Solve on GPUs

Venue: CCF Transactions on High Performance Computing (CCF THPC)

Year: 2023

[PDF] [DOI] [Bibtex] [Code]

Authors: Yichao Dong, Dan Niu, Zhou Jin, Chuan Zhang, Qi Li, Changyin Sun

Title: Adaptive Stepping PTA for DC Analysis Based on Reinforcement Learning

Venue: IEEE Transactions on Circuits and Systems II: Express Briefs (TCAS-II)

Year: 2023

[PDF] [DOI] [Bibtex] [Code]

Authors: Xiaru Zha, Haojie Pei, Dan Niu, Xiao Wu, Zhou Jin

Title: Deep Learning Enhanced Time-step Control in Pseudo Transient Analysis for Efficient Nonlinear DC Simulation

Award: Honorable Paper Award

Venue: 1st IEEE/ACM International Symposium of Electronics Design Automation (ISEDA '23)

Year: 2023

[PDF] [Slides] [DOI] [Bibtex] [Code]

2022

Authors: Qingxiao Sun, Yi Liu, Hailong Yang, Ruizhe Zhang, Ming Dun, Mingzhen Li, Xiaoyan Liu, Wencong Xiao, Yong Li, Zhongzhi Luan, Depei Qian

Title: CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs

Venue: 35th International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '22)

Year: 2022

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Yuyao Niu, Zhengyang Lu, Haonan Ji, Shuhui Song, Zhou Jin, Weifeng Liu

Title: TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs

Venue: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '22)

Year: 2022

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Zhou Jin, Haojie Pei, Yichao Dong, Xiang Jin, Xiao Wu, Wei Xing, Dan Niu

Title: Accelerating Nonlinear DC Circuit Simulation with Reinforcement Learning

Venue: 59th ACM/IEEE Design Automation Conference (DAC '22)

Year: 2022

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Zhen Xie, Guangming Tan, Weifeng Liu, Ninghui Sun

Title: A Pattern-Based SpGEMM Library for Multi-Core and Many-Core Architectures

Venue: IEEE Transactions on Parallel and Distributed Systems (TPDS)

Year: 2022

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Qingxiao Sun, Yi Liu, Hailong Yang, Zhonghui Jiang, Zhongzhi Luan, Depei Qian

Title: StencilMART: Predicting Optimization Selection for Stencil Computations Across GPUs

Venue: 36th IEEE International Parallel and Distributed Processing Symposium (IPDPS '22)

Year: 2022

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Haonan Ji, Huimin Song, Shibo Lu, Zhou Jin, Guangming Tan, Weifeng Liu

Title: TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs

Venue: 51st International Conference on Parallel Processing (ICPP '22)

Year: 2022

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Jiwei Hao, Hailong Yang, Qingxiao Sun, Huaitao Zhang, Zhongzhi Luan, Depei Qian

Title: Towards Optimized Streaming Tensor Completion on Multiple GPUs

Venue: 24th IEEE International Conference on High Performance Computing and Communications (HPCC '22)

Year: 2022

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Qingxiao Sun, Yi Liu, Hailong Yang, Mingzhen Li, Zhongzhi Luan, Depei Qian

Title: QoS-aware Dynamic Resource Allocation with Improved Utilization and Energy Efficiency on GPU

Venue: Parallel Computing (PARCO)

Year: 2022

[PDF] [DOI] [Bibtex] [Code]

Authors: Wei Xing, Xiang Jin, Tian Feng, Dan Niu, Weishen Zhao, Zhou Jin

Title: BoA-PTA: A Bayesian Optimization Accelerated PTA Solver for SPICE Simulation

Venue: ACM Transactions on Design Automation of Electronic Systems (TODAES)

Year: 2022

[PDF] [DOI] [Bibtex] [Code]

Authors: Yufei Chen, Haojie Pei, Xiao Dong, Zhou Jin, Cheng Zhuo

Title: Application of Deep Learning in Back-End Simulation: Challenges and Opportunities

Venue: 27th ACM/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC '22)

Year: 2022

[PDF] [Slides] [DOI] [Bibtex] [Code]

2021

Authors: Jianqi Zhao, Yao Wen, Yuchen Luo, Zhou Jin, Weifeng Liu, Zhenya Zhou

Title: SFLU: Synchronization-Free Sparse LU Factorization for Fast Circuit Simulation on GPUs

Venue: 58th ACM/IEEE Design Automation Conference (DAC '21)

Year: 2021

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Feng Zhang, Jiya Su, Weifeng Liu, Bingsheng He, Ruofan Wu, Xiaoyong Du, Rujia Wang

Title: YuenyeungSpTRSV: A Thread-Level and Warp-Level Fusion Synchronization-Free Sparse Triangular Solve

Venue: IEEE Transactions on Parallel and Distributed Systems (TPDS)

Year: 2021

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Jing Chen, Jianbin Fang, Weifeng Liu, Canqun Yang

Title: BALS: Blocked Alternating Least Squares for Parallel Sparse Matrix Factorization

Venue: IEEE Transactions on Parallel and Distributed Systems (TPDS)

Year: 2021

[PDF] [DOI] [Bibtex] [Code]

Authors: Qingxiao Sun, Yi Liu, Hailong Yang, Ming Dun, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian

Title: Input-Aware Sparse Tensor Storage Format Selection for Optimizing MTTKRP

Award: IEEE Computer's "Spotlight on Transactions" column

Venue: IEEE Transactions on Computers (TC)

Year: 2021

[PDF] [DOI] [Bibtex] [Code]

Authors: Ming Dun, Yunchun Li, Hailong Yang, Qingxiao Sun, Zhongzhi Luan, Depei Qian

Title: An Optimized Tensor Completion Library for Multiple GPUs

Venue: 35th ACM International Conference on Supercomputing (ICS '21)

Year: 2021

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Yuyao Niu, Zhengyang Lu, Meichen Dong, Zhou Jin, Weifeng Liu, Guangming Tan

Title: TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs

Venue: 35th IEEE International Parallel and Distributed Processing Symposium (IPDPS '21)

Year: 2021

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Mingzhen Li, Yi Liu, Hailong Yang, Yongmin Hu, Qingxiao Sun, Bangduo Chen, Xin You, Xiaoyan Liu, Zhongzhi Luan, Depei Qian

Title: Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors

Venue: 50th International Conference on Parallel Processing (ICPP '21)

Year: 2021

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Qingxiao Sun, Yi Liu, Hailong Yang, Zhonghui Jiang, Xiaoyan Liu, Ming Dun, Zhongzhi Luan, Depei Qian

Title: csTuner: Scalable Auto-tuning Framework for Complex Stencil Computation on GPUs

Award: Best Paper Finalist

Venue: 23rd IEEE International Conference on Cluster Computing (CLUSTER '21)

Year: 2021

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Zhou Jin, Tian Feng, Yiru Duan, Xiao Wu, Minghou Cheng, Zhenya Zhou, Weifeng Liu

Title: PALBBD: A Parallel ArcLength Method Using Bordered Block Diagonal Form for DC Analysis

Venue: 31st ACM Great Lakes Symposium on VLSI (GLSVLSI '21)

Year: 2021

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Yuechen Lu, Yuchen Luo, Haocheng Lian, Zhou Jin, Weifeng Liu

Title: Implementing LU and Cholesky Factorizations on Artificial Intelligence Accelerators

Venue: CCF Transactions on High Performance Computing (CCF THPC)

Year: 2021

[PDF] [DOI] [Bibtex] [Code]

Authors: Haonan Ji, Shibo Lu, Kaixi Hou, Hao Wang, Zhou Jin, Weifeng Liu, Brian Vinter

Title: Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations

Venue: International Journal of Parallel Programming (IJPP)

Year: 2021

[PDF] [Slides] [DOI] [Bibtex] [Code]

2020

Authors: Qingxiao Sun, Yi Liu, Ming Dun, Hailong Yang, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian

Title: SpTFS: Sparse Tensor Format Selection for MTTKRP via Deep Learning

Venue: 33rd International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '20)

Year: 2020

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Mingzhen Li, Yi Liu, Xiaoyan Liu, Qingxiao Sun, Xin You, Hailong Yang, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian

Title: The Deep Learning Compiler: A Comprehensive Survey

Venue: IEEE Transactions on Parallel and Distributed Systems (TPDS)

Year: 2020

[PDF] [DOI] [Bibtex] [Code]

Authors: Zhengyang Lu, Yuyao Niu, Weifeng Liu

Title: Efficient Block Algorithms for Parallel Sparse Triangular Solve

Venue: 49th International Conference on Parallel Processing (ICPP '20)

Year: 2020

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Jiya Su, Feng Zhang, Weifeng Liu, Bingsheng He, Ruofan Wu, Xiaoyong Du, Rujia Wang

Title: CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs

Venue: 49th International Conference on Parallel Processing (ICPP '20)

Year: 2020

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Ming Dun, Yunchun Li, Qingxiao Sun, Hailong Yang, Wei Li, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian

Title: Towards Efficient Canonical Polyadic Decomposition on Sunway Many-core Processor

Venue: Information Sciences

Year: 2020

[PDF] [DOI] [Bibtex] [Code]

Authors: Xiaosong Yu, Huihui Ma, Zhengyu Qu, Jianbin Fang, Weifeng Liu

Title: NUMA-Aware Optimization of Sparse Matrix-Vector Multiplication on ARMv8-based Many-Core Architectures

Venue: 17th IFIP International Conference on Network and Parallel Computing (NPC '20)

Year: 2020

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Zhiyong Xiao, Xu Liu, Jingheng Xu, Qingxiao Sun, Lin Gan

Title: Highly Scalable Parallel Genetic Algorithm on Sunway Many-core Processors

Venue: Future Generation Computer Systems (FGCS)

Year: 2020

[PDF] [DOI] [Bibtex] [Code]

2019

Authors: Zhen Xie, Guangming Tan, Weifeng Liu, Ninghui Sun

Title: IA-SpGEMM: An Input-aware Auto-tuning Framework for Parallel Sparse Matrix-Matrix Multiplication

Venue: 33rd ACM International Conference on Supercomputing (ICS '19)

Year: 2019

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Qingxiao Sun, Yi Liu, Hailong Yang, Zhongzhi Luan, Depei Qian

Title: SMQoS: Improving Utilization and Energy Efficiency with QoS Awareness on GPUs

Venue: 21st IEEE International Conference on Cluster Computing (CLUSTER '19)

Year: 2019

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Ming Dun, Yunchun Li, Xin You, Qingxiao Sun, Zerong Luan, Hailong Yang

Title: Accelerating De Novo Assembler WTDBG2 on Commodity Servers

Venue: 19th IEEE International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP '19)

Year: 2019

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Feng Zhang, Weifeng Liu, Ningxuan Feng, Jidong Zhai, Xiaoyong Du

Title: Performance Evaluation and Analysis of Sparse Matrix and Graph Kernels on Heterogeneous Processors

Venue: CCF Transactions on High Performance Computing (CCF THPC)

Year: 2019

[PDF] [DOI] [Bibtex] [Code]

2018

Authors: Xinliang Wang, Weifeng Liu, Wei Xue, Li Wu

Title: swSpTRSV: A Fast Sparse Triangular Solve with Sparse Level Tile Layout on Sunway Architectures

Venue: 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '18)

Year: 2018

[PDF] [Slides] [DOI] [Bibtex] [Source code (athread)]

Authors: Junhong Liu, Xin He, Weifeng Liu, Guangming Tan

Title: Register-based Implementation of the Sparse General Matrix-matrix Multiplication on GPUs

Venue: 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '18)

Year: 2018

[PDF] [Poster] [DOI] [Bibtex] [Code]

Authors: Chao Yu, Yuebin Bai, Qingxiao Sun, Hailong Yang

Title: Improving Thread-level Parallelism in GPUs Through Expanding Register File to Scratchpad Memory

Venue: ACM Transactions on Architecture and Code Optimization (TACO)

Year: 2018

[PDF] [DOI] [Bibtex] [Code]

Authors: Ang Li, Weifeng Liu, Linnan Wang, Kevin Barker, Shuaiwen Leon Song

Title: Warp-Consolidation: A Novel Execution Model for GPUs

Venue: 32nd ACM International Conference on Supercomputing (ICS '18)

Year: 2018

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Jing Chen, Jianbin Fang, Weifeng Liu, Tao Tang, Canqun Yang

Title: clMF: A Fine-Grained and Portable Alternating Least Squares Algorithm for Parallel Matrix Factorization

Venue: Future Generation Computer Systems (FGCS)

Year: 2018

[PDF] [Slides] [DOI] [Bibtex] [Source code (opencl)]

Authors: Junhong Liu, Xin He, Weifeng Liu, Guangming Tan

Title: Register-Aware Optimizations for Parallel Sparse Matrix-Matrix Multiplication

Venue: International Journal of Parallel Programming (IJPP)

Year: 2018

[PDF] [DOI] [Bibtex] [Code]

2017

Authors: Ang Li, Weifeng Liu, Mads R. B. Kristensen, Brian Vinter, Hao Wang, Kaixi Hou, Andres Marquez, Shuaiwen Leon Song

Title: Exploring and Analyzing the Real Impact of Modern On-Package Memory on HPC Scientific Kernels

Award: Best Paper Finalist

Venue: 30th International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '17)

Year: 2017

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Ang Li, Shuaiwen Leon Song, Weifeng Liu, Xu Liu, Akash Kumar, Henk Corporaal

Title: Locality-Aware CTA Clustering for Modern GPUs

Award: HiPEAC Paper Award

Venue: 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17)

Year: 2017

[PDF] [Slides] [DOI] [Bibtex] [Code]

Authors: Kaixi Hou, Weifeng Liu, Hao Wang, Wu-chun Feng

Title: Fast Segmented Sort on GPUs

Venue: 31st ACM International Conference on Supercomputing (ICS '17)

Year: 2017

[PDF] [Slides] [DOI] [Bibtex] [Source code (cuda)]

Authors: Weifeng Liu, Ang Li, Jonathan D. Hogg, Iain S. Duff, Brian Vinter

Title: Fast Synchronization-Free Algorithms for Parallel Sparse Triangular Solves with Multiple Right-Hand Sides

Venue: Concurrency and Computation: Practice and Experience (CCPE)

Year: 2017

[PDF] [Slides] [DOI] [Bibtex] [Source code (cuda, opencl-amd)]

Authors: Jing Chen, Jianbin Fang, Weifeng Liu, Tao Tang, Xuhao Chen, Canqun Yang

Title: Efficient and Portable ALS Matrix Factorization for Recommender Systems

Venue: 6th International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics (Parlearning '17, held with IPDPS '17)

Year: 2017

[PDF] [Slides] [DOI] [Bibtex] [Code]

2016

Authors: Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng

Title: Parallel Transposition of Sparse Data Structures

Venue: 30th ACM International Conference on Supercomputing (ICS '16)

Year: 2016

[PDF] [Slides] [DOI] [Bibtex] [Source code (avx2, knc)]

Authors: Weifeng Liu, Ang Li, Jonathan D. Hogg, Iain S. Duff, Brian Vinter

Title: A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves

Venue: 22nd International European Conference on Parallel and Distributed Computing (Euro-Par '16)

Year: 2016

[PDF] [Slides] [DOI] [Bibtex] [Source code (cuda, opencl-amd)]

2015

Authors: Weifeng Liu, Brian Vinter

Title: CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication

Venue: 29th ACM International Conference on Supercomputing (ICS '15)

Year: 2015

[PDF] [Slides] [DOI] [Bibtex] [Source code (avx2, avx512, knc, cuda, opencl-amd, opencl-nvidia)]

Authors: Weifeng Liu, Brian Vinter

Title: A Framework for General Sparse Matrix-Matrix Multiplication on GPUs and Heterogeneous Processors

Venue: Journal of Parallel and Distributed Computing (JPDC)

Year: 2015

[PDF] [Slides] [DOI] [Bibtex] [Source code (cuda, opencl-amd)]

Authors: Weifeng Liu, Brian Vinter

Title: Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

Venue: Parallel Computing (PARCO)

Year: 2015

[PDF] [DOI] [Bibtex] [Source code (cuda, opencl-amd, opencl-intel)]

2014

Authors: Weifeng Liu, Brian Vinter

Title: An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data

Venue: 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS '14)

Year: 2014

[PDF] [Slides] [DOI] [Bibtex] [Source code (cuda, opencl-amd)]

Authors: Weifeng Liu, Brian Vinter

Title: Ad-heap: An Efficient Heap Data Structure for Asymmetric Multicore Processors

Venue: 7th Workshop on General Purpose Processing Using GPUs (GPGPU-7, held with ASPLOS '14)

Year: 2014

[PDF] [Slides] [DOI] [Bibtex] [Code]