UP | HOME

Date: <2024-09-11 Wed>

Neural Acceleration of Graph Based Utility Functions for Sparse Matrices

Table of Contents

[file:][pdf:] by Joshua Booth, Greg Bolet

1. Objective

  • Use Neural Acceleration to approximate a utility function of Sparse Matrices
  • The utility function here is the Prediction function for fill-in of an ordering algorithm

We demonstrate these techniques on the problem related to the utility functions of sparse matrix ordering and fill-in (i.e., zero elements becoming nonzero during factorization) calculations [Page 1]

  • Find good ordering to reduce fill-in for Cholesky factorization

This paper focuses on the sparse matrix problem of ordering to reduce fill-in for the Cholesky factorization of a sparse matrix. [Page 3]

2. Traditional Method is Symbolic Factorization

One way to find a good ordering is through Symbolic Factorization of Sparse Matrix

  1. generate ordering
  2. estimate non zero count of \(L\) matrix using symbolic analysis
  3. apply cholesky with the best ordering to find the actual \(LU\) factorization

This problem is normally done in two steps: (1) generating and applying the ordering to the sparse matrix and (2) applying a symbolic analysis via the graph representation to calculate the nonzero counts (i.e., nnz(L)). These steps together are normally referred to as symbolic factorization by sparse linear solver packages.

2.1. Question: Is analysis required only once?

Page 3

For a fixed sparse matrix, this step may only be done once, but the analysis could consider multiple different orderings.

3. Can be formulated mostly as a Graph Based problem

The problem of finding best ordering has components that can be formulated as graph based problem

3.1. Graph Based Utility Functions

All ordering algorithms, which we consider, are graph based.

Ordering is a core graph problem but includes other graph problems like:

  • vertex information
  • graph partition

Calculating fill-in is Subgraph identification problem.

Both RCM and AMD have some part of the vertex information subproblem, and ND uses the graph partition subclass within its algorithm. [Page 3]

The problem of ordering to reduce fill-in contains a number of graph subclass problems. Overall, the problem fits into the first subclass [Page 2] of graph ordering. However, many ordering algorithms themselves fall into other subclasses, such as vertex information and graph partitioning. [Page 4]

The algorithm for calculating fill-in is a graph-based algorithm that deals with the subclass of subgraph identification. [Page 4]

3.2. Graph Based problems are usually approximated

Sparse matrix problems formulated as Graph problems are usually only solved by approximation. So it makes sense to use Neural Acceleration.

Many of the irregular sparse problems that are solved are already approximated in some manner, and this is especially true related to graph algorithms. This approximation for many of these problems is due to the exact solution being NP-hard, the exact solution being too computationally expensive, or the exact solution not being able to be parallelized. Therefore, these problems make sense to try to accelerate with neural acceleration. [Page 2]

4. Use Graph based Neural Network

4.1. Dense NN are not fit for Neural Acceleration of Graph Algorithms

  • Graph based algorithms build information iteratively
  • Dense NN don't mimic this behaviour
  • However, RRN and GNN do mimic this behaviour

The difficulty in many graph-based algorithms is the iterative nature that information is accumulated via the graph’s connectivity of nodes and edges as it builds up a holistic view of the input graph. [Page 5]

The next limiting factor is that the current approach utilizes only small dense neural networks that do not accurately approximate the computation pattern of graph algorithms. [Page 5]

Two general classifications of neural networks that do mimic this iterative nature are recurrent neural networks and graph neural networks. [Page 5]

  • This paper show that neural acceleration can be used for graph based algorithms too in scalable way
  • And the graph pattern of NN should be representative of graph algorithm

In our approach, we demonstrate that the concept of neural acceleration can be extended to graph-based algorithms using different base neural networks that better fit the computational pattern of the application. Moreover, we will demonstrate that this approach is still scalable in terms of performance. [Page 5]

we attempt to show in this work that the graph pattern is not something that should be searched for or explored but should be representative of the graph algorithm we are looking to approximate. [Page 7]

4.2. RNN: GRU vs LSTM

  • GRU is cheaper than LSTM
  • GRU is better than plain RNN

We select the GRU as it provides a better pass-through of history than RNN and is computationally cheaper than an LSTM. [Page 6]

4.3. GNN: GCN vs GAT

  • GAT is more sensitive to input parameter space and requires more data and training
  • GCN is cheaper
  • We use two variety of GCN: [Page 7]
    • GCN
    • Gated Graph Convolution

4.3.1. Question: What is spectral subarea convolution operator?

GraphConv is commonly referred to as a GCN and is in the spectral subarea convolution operator, as opposed to Graph Attention Networks (GAT) which is a commonly used operator in the spatial subarea. [Page 7]

4.3.2. Question: Network for M5?

\(\hat{x}_i = GRU(x_i, concat_{j \in N(i)} W_1 x_j)\)

  • We use One layer of graph convolution methods [Page 7]
  • concat => send the features for each neighbour edge one by one to RNN
  • So, one layer means one don't pass the output of one GRU to another GRU
  • M5 graph is that of Figure 3?

In number M5, the GatedGraph model (equation 4) is used where the graph is that in figure 3 with n vertices and the vertex data is each column of the sparse matrix. [Page 9]

typo?

5. Experiments

5.1. Dataset is SuiteSparse Collection

scaled down using Neighbourhood voting filter [TODO]

This generated dataset is constructed using square sparse matrices of size 512 to 1,000,000 from the SuiteSparse matrix collection. [Page 9]

Larger matrices are scaled down to the needed dimension using a 3 × 3 local neighborhood voting filter (i.e., a local filter similar to edge detection and blurring in image process- ing) to determine the nonzero pattern. [Page 9]

5.2. Loss Function is Smooth L1 Loss

5.3. Individual Model prediction Performance

[See Table 5]

5.4. Usefullness for selecting best ordering

5.4.1. Definition of Accuracy

  • Ordering that have a prediction within \(\tau_b\) tolerance are recommended
  • Accuracy = % of times best ordering is among the orderings recommended
  • Accuracy can be trivially (but not arbitarily) increased by increasing \(\tau_b\)

Recall that our method returns a list of all four orderings that are within τb = .05 of the smallest predicted nonzero count. If the observed smallest nonzero count is within this list, we consider this as being an accurate prediction of this ordering, even if it is not the smallest nonzero count for the returned list.

5.4.2. Accuracy vs Precision/Recall

However, we would like to identify if our individual models are truly sorting out the different orderings or if the measurement of accuracy is due to solely the approximation tolerance (τb ).

For completeness, we additionally provide a confusion matrix for fill-in selection for each model type. [Confusion Matrix]

5.4.3. QN: Typo?

Divide by N is missing

5.5. M5 is good

  • high accuracy with low \(\tau_b\)

M5 is about to achieve an accuracy of around .90 with τb being .01. This demonstrates that all the submodels in M5 are fairly good approximation functions based on our argument related to our second consideration for sensitivity

5.6. Question: Does 30x speedup include finding the fill in too?

Does the time for M5 include computing the ordering too?

As I understand, time for CHOLMOD is to find ordering as per all methods, and then compute the fill in for all methods.

But for M5 is to find fill in directly for all methods.

The time for CHOLMOD to find each of the orderings and calculate the fill-in is approximately 385.2 seconds for a suite of all 996 sparse matrices.

The time for M5 is 302.1 seconds (∼ 1.3× speedup) on the CPU and 12.7 seconds (30.3× speedup) on GPU. [Page 16]

6. Conclusion

Objective achieved by understanding and using:

6.1. We use NA to identify best ordering

graph-based problem of computing the nonzero count of the factorized sparse matrix within the framework of the graph problem of identifying the best sparse matrix ordering.

6.2. Considering computation pattern is important

picking the correct method and considering the connectivity computational pattern graph is important in reducing the search space for a suitable approximation method.

6.3. Upto 30x speedup within ~5% accuracy

speed up the execution of this problem by more 30× compared to the standard serial graph-based method while still providing an average estimated fill-in within ∼ 5% of the true


Backlinks


You can send your feedback, queries here