Short Bio

Weifeng Liu is currently a Full Professor at the China University of Petroleum-Beijing. Formerly, he was a Marie Curie Fellow at the Norwegian University of Science and Technology. He received his PhD in 2016 from the Niels Bohr Institute of the University of Copenhagen under advisor Brian Vinter. He briefly worked as a Research Associate with Iain S. Duff at the STFC Rutherford Appleton Laboratory in 2016. He also worked as a Senior Researcher in high performance computing technology at the SINOPEC Exploration and Production Research Institute for about six years (2006-2012). He received his BE and ME degrees in computer science, both from the China University of Petroleum-Beijing, in 2002 and 2006, respectively. He is a Senior Member of the IEEE and a Member of the ACM and SIAM. His research interests are in high performance numerical linear algebra, in particular domain-specific architectures, data structures, parallel and distributed algorithms, and linear solver mathematical software for sparse matrix computations.

Publications

  • [DAC '24] Mingjia Fan, Xiaoming Chen, Dechuang Yang, Zhou Jin, Weifeng Liu. ReCG: ReRAM-Accelerated Sparse Conjugate Gradient. 61st ACM/IEEE Design Automation Conference. 2024. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [DAC '24] Chenxi Li, Boyuan Zhang, Yongqiang Duan, Yang Li, Zuochang Ye, Weifeng Liu, Dingwen Tao, Zhou Jin. MASC: A Memory-Efficient Adjoint Sensitivity Analysis through Compression Using Novel Spatiotemporal Prediction. 61st ACM/IEEE Design Automation Conference. 2024. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [DATE '24] Yinuo Bai, Xiaoyu Yang, Yicheng Lu, Dan Niu, Cheng Zhuo, Zhou Jin, Weifeng Liu. Efficient Spectral-Aware Power Supply Noise Analysis for Low-Power Design Verification. 27th Design, Automation and Test in Europe Conference. 2024. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [DATE '24] Enxin Yi, Yiru Duan, Yinuo Bai, Kang Zhao, Zhou Jin, Weifeng Liu. Cuper: Customized Dataflow and Perceptual Decoding for Sparse Matrix-Vector Multiplication on HBM-Equipped FPGAs. 27th Design, Automation and Test in Europe Conference. 2024. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [ASP-DAC '24] Zhou Jin, Wenhao Li, Yinuo Bai, Tengcheng Wang, Yicheng Lu, Weifeng Liu. Machine Learning and GPU Accelerated Sparse Linear Solvers for Transistor-Level Circuit Simulation: A Perspective Survey (Invited Paper). 29th ACM/IEEE Asia and South Pacific Design Automation Conference. 2024. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [CCF THPC] Mouzhi Yang, Peng Zhang, Jianbin Fang, Weifeng Liu, Chun Huang. thSORT: An Efficient Parallel Sorting Algorithm on Multi-core DSPs. CCF Transactions on High Performance Computing. 2024. [PDF] [DOI] [Bibtex] [Code]
  • [SC '23] Xu Fu, Bingbin Zhang, Tengcheng Wang, Wenhao Li, Yuechen Lu, Enxin Yi, Jianqi Zhao, Xiaohan Geng, Fangying Li, Jingwen Zhang, Zhou Jin, Weifeng Liu. PanguLU: A Scalable Regular Two-Dimensional Block-Cyclic Sparse Direct Solver on Distributed Heterogeneous Systems. 36th International Conference for High Performance Computing, Networking, Storage, and Analysis. 2023. Best Paper Award. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [SC '23] Yuechen Lu, Weifeng Liu. DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication. 36th International Conference for High Performance Computing, Networking, Storage, and Analysis. 2023. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [DAC '23] Tengcheng Wang, Wenhao Li, Haojie Pei, Yuying Sun, Zhou Jin, Weifeng Liu. Accelerating Sparse LU Factorization with Density-Aware Adaptive Matrix Multiplication for Circuit Simulation. 60th ACM/IEEE Design Automation Conference. 2023. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [DAC '23] Mingjia Fan, Xiaotian Tian, Yintao He, Junxian Li, Yiru Duan, Xiaozhe Hu, Ying Wang, Zhou Jin, Weifeng Liu. AmgR: Algebraic Multigrid Accelerated on ReRAM. 60th ACM/IEEE Design Automation Conference. 2023. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [ICPP '23] Helin Cheng, Wenxuan Li, Yuechen Lu, Weifeng Liu. HASpGEMM: Heterogeneity-Aware Sparse General Matrix-Matrix Multiplication on Modern Asymmetric Multicore Processors. 52nd International Conference on Parallel Processing. 2023. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [CLUSTER '23] Wenxuan Li, Helin Cheng, Zhengyang Lu, Yuechen Lu, Weifeng Liu. HASpMV: Heterogeneity-Aware Sparse Matrix-Vector Multiplication on Modern Asymmetric Multicore Processors. 25th IEEE International Conference on Cluster Computing. 2023. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [CCGRID '23] Hongli Mi, Xiangrui Yu, Xiaosong Yu, Shuangyuan Wu, Weifeng Liu. Balancing Computation and Communication in Distributed Sparse Matrix-Vector Multiplication. 23rd IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. 2023. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [CCF THPC] Zhengyang Lu, Weifeng Liu. TileSpTRSV: A Tiled Algorithm for Parallel Sparse Triangular Solve on GPUs. CCF Transactions on High Performance Computing. 2023. [PDF] [DOI] [Bibtex] [Code]
  • [PPoPP '22] Yuyao Niu, Zhengyang Lu, Haonan Ji, Shuhui Song, Zhou Jin, Weifeng Liu. TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs. 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 2022. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [TPDS] Zhen Xie, Guangming Tan, Weifeng Liu, Ninghui Sun. A Pattern-Based SpGEMM Library for Multi-Core and Many-Core Architectures. IEEE Transactions on Parallel and Distributed Systems. 2022. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [ICPP '22] Haonan Ji, Huimin Song, Shibo Lu, Zhou Jin, Guangming Tan, Weifeng Liu. TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs. 51st International Conference on Parallel Processing. 2022. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [DAC '21] Jianqi Zhao, Yao Wen, Yuchen Luo, Zhou Jin, Weifeng Liu, Zhenya Zhou. SFLU: Synchronization-Free Sparse LU Factorization for Fast Circuit Simulation on GPUs. 58th ACM/IEEE Design Automation Conference. 2021. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [TPDS] Feng Zhang, Jiya Su, Weifeng Liu, Bingsheng He, Ruofan Wu, Xiaoyong Du, Rujia Wang. YuenyeungSpTRSV: A Thread-Level and Warp-Level Fusion Synchronization-Free Sparse Triangular Solve. IEEE Transactions on Parallel and Distributed Systems. 2021. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [TPDS] Jing Chen, Jianbin Fang, Weifeng Liu, Canqun Yang. BALS: Blocked Alternating Least Squares for Parallel Sparse Matrix Factorization. IEEE Transactions on Parallel and Distributed Systems. 2021. [PDF] [DOI] [Bibtex] [Code]
  • [IPDPS '21] Yuyao Niu, Zhengyang Lu, Meichen Dong, Zhou Jin, Weifeng Liu, Guangming Tan. TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs. 35th IEEE International Parallel and Distributed Processing Symposium. 2021. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [GLSVLSI '21] Zhou Jin, Tian Feng, Yiru Duan, Xiao Wu, Minghou Cheng, Zhenya Zhou, Weifeng Liu. PALBBD: A Parallel ArcLength Method Using Bordered Block Diagonal Form for DC Analysis. 31st ACM Great Lakes Symposium on VLSI. 2021. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [CCF THPC] Yuechen Lu, Yuchen Luo, Haocheng Lian, Zhou Jin, Weifeng Liu. Implementing LU and Cholesky Factorizations on Artificial Intelligence Accelerators. CCF Transactions on High Performance Computing. 2021. [PDF] [DOI] [Bibtex] [Code]
  • [IJPP] Haonan Ji, Shibo Lu, Kaixi Hou, Hao Wang, Zhou Jin, Weifeng Liu, Brian Vinter. Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations. International Journal of Parallel Programming. 2021. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [ICPP '20] Zhengyang Lu, Yuyao Niu, Weifeng Liu. Efficient Block Algorithms for Parallel Sparse Triangular Solve. 49th International Conference on Parallel Processing. 2020. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [ICPP '20] Jiya Su, Feng Zhang, Weifeng Liu, Bingsheng He, Ruofan Wu, Xiaoyong Du, Rujia Wang. CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs. 49th International Conference on Parallel Processing. 2020. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [NPC '20] Xiaosong Yu, Huihui Ma, Zhengyu Qu, Jianbin Fang, Weifeng Liu. NUMA-Aware Optimization of Sparse Matrix-Vector Multiplication on ARMv8-based Many-Core Architectures. 17th IFIP International Conference on Network and Parallel Computing. 2020. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [ICS '19] Zhen Xie, Guangming Tan, Weifeng Liu, Ninghui Sun. IA-SpGEMM: An Input-aware Auto-tuning Framework for Parallel Sparse Matrix-Matrix Multiplication. 33rd ACM International Conference on Supercomputing. 2019. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [CCF THPC] Feng Zhang, Weifeng Liu, Ningxuan Feng, Jidong Zhai, Xiaoyong Du. Performance Evaluation and Analysis of Sparse Matrix and Graph Kernels on Heterogeneous Processors. CCF Transactions on High Performance Computing. 2019. [PDF] [DOI] [Bibtex] [Code]
  • [PPoPP '18] Xinliang Wang, Weifeng Liu, Wei Xue, Li Wu. swSpTRSV: A Fast Sparse Triangular Solve with Sparse Level Tile Layout on Sunway Architectures. 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 2018. [PDF] [Slides] [DOI] [Bibtex] [Source code (athread)]
  • [PPoPP '18] Junhong Liu, Xin He, Weifeng Liu, Guangming Tan. Register-based Implementation of the Sparse General Matrix-matrix Multiplication on GPUs. 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 2018. [PDF] [Poster] [DOI] [Bibtex] [Code]
  • [ICS '18] Ang Li, Weifeng Liu, Linnan Wang, Kevin Barker, Shuaiwen Leon Song. Warp-Consolidation: A Novel Execution Model for GPUs. 32nd ACM International Conference on Supercomputing. 2018. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [FGCS] Jing Chen, Jianbin Fang, Weifeng Liu, Tao Tang, Canqun Yang. clMF: A Fine-Grained and Portable Alternating Least Squares Algorithm for Parallel Matrix Factorization. Future Generation Computer Systems. 2018. [PDF] [Slides] [DOI] [Bibtex] [Source code (opencl)]
  • [IJPP] Junhong Liu, Xin He, Weifeng Liu, Guangming Tan. Register-Aware Optimizations for Parallel Sparse Matrix-Matrix Multiplication. International Journal of Parallel Programming. 2018. [PDF] [DOI] [Bibtex] [Code]
  • [SC '17] Ang Li, Weifeng Liu, Mads R. B. Kristensen, Brian Vinter, Hao Wang, Kaixi Hou, Andres Marquez, Shuaiwen Leon Song. Exploring and Analyzing the Real Impact of Modern On-Package Memory on HPC Scientific Kernels. 30th International Conference for High Performance Computing, Networking, Storage, and Analysis. 2017. Best Paper Finalist. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [ASPLOS '17] Ang Li, Shuaiwen Leon Song, Weifeng Liu, Xu Liu, Akash Kumar, Henk Corporaal. Locality-Aware CTA Clustering for Modern GPUs. 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 2017. HiPEAC Paper Award. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [ICS '17] Kaixi Hou, Weifeng Liu, Hao Wang, Wu-chun Feng. Fast Segmented Sort on GPUs. 31st ACM International Conference on Supercomputing. 2017. [PDF] [Slides] [DOI] [Bibtex] [Source code (cuda)]
  • [CCPE] Weifeng Liu, Ang Li, Jonathan D. Hogg, Iain S. Duff, Brian Vinter. Fast Synchronization-Free Algorithms for Parallel Sparse Triangular Solves with Multiple Right-Hand Sides. Concurrency and Computation: Practice and Experience. 2017. [PDF] [Slides] [DOI] [Bibtex] [Source code (cuda, opencl-amd)]
  • [Parlearning '17, held with IPDPS '17] Jing Chen, Jianbin Fang, Weifeng Liu, Tao Tang, Xuhao Chen, Canqun Yang. Efficient and Portable ALS Matrix Factorization for Recommender Systems. 6th International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics. 2017. [PDF] [Slides] [DOI] [Bibtex] [Code]
  • [ICS '16] Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng. Parallel Transposition of Sparse Data Structures. 30th ACM International Conference on Supercomputing. 2016. [PDF] [Slides] [DOI] [Bibtex] [Source code (avx2, knc)]
  • [Euro-Par '16] Weifeng Liu, Ang Li, Jonathan D. Hogg, Iain S. Duff, Brian Vinter. A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves. 22nd International European Conference on Parallel and Distributed Computing. 2016. [PDF] [Slides] [DOI] [Bibtex] [Source code (cuda, opencl-amd)]
  • [ICS '15] Weifeng Liu, Brian Vinter. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication. 29th ACM International Conference on Supercomputing. 2015. [PDF] [Slides] [DOI] [Bibtex] [Source code (avx2, avx512, knc, cuda, opencl-amd, opencl-nvidia)]
  • [JPDC] Weifeng Liu, Brian Vinter. A Framework for General Sparse Matrix-Matrix Multiplication on GPUs and Heterogeneous Processors. Journal of Parallel and Distributed Computing. 2015. [PDF] [Slides] [DOI] [Bibtex] [Source code (cuda, opencl-amd)]
  • [PARCO] Weifeng Liu, Brian Vinter. Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors. Parallel Computing. 2015. [PDF] [DOI] [Bibtex] [Source code (cuda, opencl-amd, opencl-intel)]
  • [IPDPS '14] Weifeng Liu, Brian Vinter. An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data. 28th IEEE International Parallel & Distributed Processing Symposium. 2014. [PDF] [Slides] [DOI] [Bibtex] [Source code (cuda, opencl-amd)]
  • [GPGPU-7, held with ASPLOS '14] Weifeng Liu, Brian Vinter. Ad-heap: An Efficient Heap Data Structure for Asymmetric Multicore Processors. 7th Workshop on General Purpose Processing Using GPUs. 2014. [PDF] [Slides] [DOI] [Bibtex] [Code]