Abstract

Modern engineering and scientific computing often require solving sparse linear systems with point-block matrices to model multiphysics problems. Space-time parallel methods are popular in fluid dynamics because they map well onto parallel computers. In this paper, we design and implement a parallel, multi-GPU enabled GMRES solver for linear systems in Kronecker product form that arise from domain decomposition based space-time parallel methods. To improve the efficiency of the solver, we also design a set of optimization strategies for Sparse Matrix-Vector Multiplication (SpMV) in Kronecker product form. These include: (1) increasing the Compute-to-Memory Access Ratio (CMAR) to fully exploit the GPU's high bandwidth during the computation phase and (2) introducing a parallel buffering scheme and a pre-mapping algorithm that enable the use of GPU-Direct to accelerate the communication phase. We conducted experiments on 1, 2, 4, and 8 GPUs and compared the performance of the proposed solver (OKP-Solver) with a cuSPARSE based implementation. On the V100 platform, the Kronecker product based SpMV computation ([Formula: see text]) achieves speedups of [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text] on 1, 2, 4, and 8 GPUs, respectively, while the communication time ([Formula: see text]) achieves [Formula: see text], [Formula: see text], and [Formula: see text] on 2, 4, and 8 GPUs, respectively. On the A100 platform, [Formula: see text] achieves speedups of [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text], while [Formula: see text] achieves [Formula: see text], [Formula: see text], and [Formula: see text]. The overall solver runtime ([Formula: see text]) achieves speedups of [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text] on V100, and [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text] on A100, for 1, 2, 4, and 8 GPUs, respectively.
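
To illustrate why a matrix-vector product in Kronecker product form can avoid assembling the full matrix, the following is a minimal sketch in plain Python. It relies only on the standard identity (A ⊗ B) vec(X) = vec(A X Bᵀ) for row-major vec, and it is not the solver or the SpMV kernel described in the abstract. The function names (kron, kron_matvec, dense_matvec) and the small dense test matrices are assumptions made for illustration; the abstract does not specify how the actual operators are stored or applied.

```python
# Sketch only: dense, CPU, list-of-lists matrices. All names are hypothetical
# and chosen for illustration; this is not the paper's GPU implementation.

def matmul(X, Y):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def kron(A, B):
    """Explicit Kronecker product A (x) B, built only as a reference."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def dense_matvec(M, x):
    """Reference matrix-vector product with an assembled matrix."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def kron_matvec(A, B, x):
    """Compute (A (x) B) x without forming A (x) B.

    A is m x n, B is p x q, x has length n*q. Reshape x row-major into an
    n x q matrix X, form Y = A X B^T (m x p), and flatten Y row-major.
    """
    n, q = len(A[0]), len(B[0])
    X = [x[j * q:(j + 1) * q] for j in range(n)]   # n x q
    Bt = [list(col) for col in zip(*B)]            # q x p
    Y = matmul(matmul(A, X), Bt)                   # m x p
    return [v for row in Y for v in row]

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[0, 1], [1, 0]]
    x = [1, 2, 3, 4]
    print(kron_matvec(A, B, x))  # prints [10, 7, 22, 15]
```

In this form the two small factors are applied one after the other, so the (m·p) × (n·q) matrix is never stored. That is the general motivation for working in Kronecker product form; the specific CMAR and communication optimizations named in the abstract are not reproduced here.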

Keywords

GPU, Kronecker product, Linear system, Parallel, Point-block matrix

Publication Info

Year: 2025
Type: article
Volume: 15
Issue: 1
Pages: 43529-43529
Citations: 0
Access: Closed

Cite This

Wenpeng Ma, Siyuan Zhao, Xiaofan Le et al. (2025). A multi-GPU enabled solver in Kronecker product form for multiphysics problems. Scientific Reports, 15(1), 43529-43529. https://doi.org/10.1038/s41598-025-27400-3

Identifiers

DOI: 10.1038/s41598-025-27400-3
PMID: 41372255
PMCID: PMC12695889
