TY - GEN
T1 - Improving Statistical Characterization of Data Tensors with the Generalized Canonical Polyadic Tensor Decomposition
AU - Merris, Matthew D.
AU - Andersen, Tim
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - This work explores whether the generalized canonical polyadic (GCP) tensor decomposition can serve as a diagnostic tool for determining the underlying statistical nature of a dataset. The GCP reformulates the standard canonical polyadic (CP) decomposition problem as a maximum likelihood estimation problem. Using a variety of statistically motivated loss functions, it models the natural parameter of the statistical distribution assumed to underlie a data tensor, rather than modeling the data itself. This property is of particular interest when a dataset is strongly non-Gaussian, as is the case with binary or count data. We compare competing CP models of datasets with differing statistical natures to determine whether the GCP can be used as an exploratory tool for the statistical characterization of a data tensor of interest. The quality of competing models is assessed with multiple metrics, including fit score, cosine similarity of the tensors, and the Core Consistency Diagnostic (CORCONDIA) score. Results are presented for a variety of artificially generated data tensors.
KW - dataset characterization
KW - multi-linear algebra
KW - tensor decomposition
KW - tensors
KW - unsupervised learning
UR - https://www.scopus.com/pages/publications/105021470800
U2 - 10.1109/HPEC67600.2025.11196360
DO - 10.1109/HPEC67600.2025.11196360
M3 - Conference contribution
AN - SCOPUS:105021470800
T3 - 2025 IEEE High Performance Extreme Computing Conference, HPEC 2025
BT - 2025 IEEE High Performance Extreme Computing Conference, HPEC 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE High Performance Extreme Computing Conference, HPEC 2025
Y2 - 15 September 2025 through 19 September 2025
ER -