Exploring optimizations for dynamic pagerank algorithm based on GPU
Subhajit Sahu
Advisor: Kishore Kothapalli
Center for Security, Theory, and Algorithmic Research (CSTAR)
International Institute of Information Technology, Hyderabad (IIITH)
Gachibowli, Hyderabad, India - 500 032
subhajit.sahu@research.iiit.ac.in
1. Introduction
A graph is a generic data structure, and is a superset of lists and trees. Binary search on a
sorted list can be interpreted as search on a balanced binary tree. Database tables can be
thought of as indexed lists, and table joins represent relations between columns; these can be
modeled as graphs instead. Assigning registers to variables (by a compiler) and assigning
available channels to radio transmitters are also graph problems. Finding the shortest path
between two points, and sorting web pages in order of importance, are graph problems as well.
Neural networks are graphs too. Interactions between messenger molecules in the body, and
interactions between people on social media, are also modeled as graphs.
The web has a bowtie structure on many levels. There is usually one giant strongly
connected component, with several pages pointing into this component, several pages
pointed to by the component, and a number of disconnected pages. This structure is seen as
a fractal on many different levels. [1]
Static graphs are those which do not change with time. Static graph algorithms are
techniques used to solve problems on such graphs, and have been developed since the 1940s.
To solve larger and larger problems, a number of optimizations (both algorithmic and
hardware/software techniques) have been developed to take advantage of vector processors
(like the Cray machines), multicores, and GPUs. A lot of research went into finding ways to
enhance concurrency, including a number of concurrency models, locking techniques,
transactions, etc. This matters especially because single-core performance has stopped
improving.
Graphs whose relations vary with time are called temporal graphs. As you might guess,
many problems involve temporal graphs. A temporal graph can be thought of as a series
of static graphs at different points in time. To solve a problem on a temporal graph, one
would normally take the graph at a certain point in time and run the necessary static graph
algorithm on it. This works, but as the size of the temporal graph grows, the repeated
computation becomes increasingly slow. It is possible to take advantage of previous
results in order to compute the result for the next time point. Such algorithms are called
dynamic graph algorithms. This is an ongoing area of research, which includes new
algorithms and hardware/software optimization techniques for distributed systems,
multicores (shared memory), GPUs, and even FPGAs. Optimization of algorithms can focus
on space complexity (memory usage), time complexity (query time), preprocessing time,
and even accuracy of the result.
While dynamic algorithms focus on optimizing the algorithm’s computation time,
dynamic graph data structures focus on improving graph update time and memory usage.
Dense graphs are usually represented by an adjacency matrix (a bit matrix). Sparse graphs
can be represented with variations of adjacency lists (like CSR) or edge lists. Sparse
graphs can also be thought of as sparse matrices, and the edges of a vertex can be considered
a bitset. In fact, a number of graph algorithms can be modeled as linear algebra operations
(see the nvGraph and cuGraph frameworks). A number of dynamic graph data structures have
also been developed to improve update speed (like PMA), or to enable concurrent updates
and computation (like Aspen’s compressed functional trees). [2]
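To make the CSR idea concrete, here is a minimal sketch (names and layout are illustrative,
not taken from any particular framework): a directed graph is stored as two flat arrays, a
per-vertex offsets array and a concatenated list of neighbour ids.

  // Minimal CSR (Compressed Sparse Row) sketch for a directed graph.
  // offsets[v] .. offsets[v+1] delimit the out-neighbours of vertex v in edges[].
  #include <utility>
  #include <vector>

  struct CSR {
    std::vector<int> offsets;  // size = numVertices + 1
    std::vector<int> edges;    // size = numEdges (neighbour ids)
  };

  // Build CSR from an edge list (u -> v), assuming vertex ids lie in [0, n).
  CSR buildCSR(int n, const std::vector<std::pair<int,int>>& edgeList) {
    CSR g;
    g.offsets.assign(n + 1, 0);
    for (auto [u, v] : edgeList) g.offsets[u + 1]++;               // count out-degrees
    for (int i = 0; i < n; ++i) g.offsets[i + 1] += g.offsets[i];  // exclusive prefix sum
    g.edges.resize(edgeList.size());
    std::vector<int> pos(g.offsets.begin(), g.offsets.end() - 1);  // next free slot per vertex
    for (auto [u, v] : edgeList) g.edges[pos[u]++] = v;            // scatter neighbours
    return g;
  }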
Streaming / dynamic / time-evolving graph data structures maintain only the latest graph
information. Historical graphs, on the other hand, keep track of all previous states of the
graph. Changes to the graph can be thought of as edge insertions and deletions, which
are usually applied in batches. Except for functional techniques, updating a graph usually
involves modifying a shared structure using some kind of fine-grained synchronization. It
may also be possible to store additional information along with vertices/edges, though this
is usually not the focus of research (graph databases do focus on it). In the last decade or so,
a number of graph streaming frameworks have been developed, each with a certain focus
area and targeting a certain platform (distributed system / multiprocessor / GPU / FPGA /
ASIC). Such frameworks focus on designing an improved dynamic graph data structure, and
define a fundamental model of computation. For GPUs, the following frameworks exist:
cuSTINGER, aimGraph, faimGraph, Hornet, EvoGraph, and GPMA. [2]
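As a rough sketch of what a batched update looks like on a plain adjacency-list structure
(illustrative only; frameworks such as cuSTINGER or Hornet use far more elaborate memory
layouts and fine-grained synchronization), each update is simply an insertion or deletion
applied to the source vertex's neighbour list:

  #include <algorithm>
  #include <vector>

  // A single edge update: insert or delete the edge (u -> v).
  struct Update { int u, v; bool insert; };

  // Apply a batch of updates to a simple adjacency-list graph.
  void applyBatch(std::vector<std::vector<int>>& adj, const std::vector<Update>& batch) {
    for (const Update& e : batch) {
      auto& nbrs = adj[e.u];
      if (e.insert) {
        nbrs.push_back(e.v);                                                  // edge insertion
      } else {
        nbrs.erase(std::remove(nbrs.begin(), nbrs.end(), e.v), nbrs.end());   // edge deletion
      }
    }
  }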
2. Pagerank algorithm
The PageRank algorithm is a technique used to sort web pages (or the vertices of a graph) by
importance. It is popularly known as the algorithm published by the founders of Google. Other
link analysis algorithms include HITS, TrustRank, and HummingBird. Such algorithms are
also used for word sense disambiguation in lexical semantics, ranking streets by traffic,
measuring the impact of communities on the web, providing recommendations, analyzing
neural/protein networks, determining species essential to the health of the environment, and
even quantifying the scientific impact of researchers. [3]
To understand the PageRank algorithm, consider the random (web) surfer model.
Each web page is modeled as a vertex, and each hyperlink as an edge. The surfer (such as
you) initially visits a web page at random, then follows one of the links on the page,
leading to another web page. After following some links, the surfer eventually decides
to visit another web page at random. The probability of the random surfer being on a
certain page is what the PageRank algorithm returns. This probability (or importance) of a
web page depends upon the importance of the web pages pointing to it (a Markov chain). This
definition of PageRank is recursive, and takes the form of an eigenvalue problem. Solving
for PageRank thus requires multiple iterations of computation, which is known as the
power-iteration method. Each iteration is essentially a (sparse) matrix-vector multiplication.
A damping factor (of 0.85) is used to counter the effect of spider traps (like self-loops),
which can otherwise suck up all the importance. Dead ends (web pages with no out-links),
which would otherwise leak importance, are countered by effectively linking them to all
vertices of the graph (making the Markov matrix column-stochastic). [4]
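A minimal sequential sketch of this power-iteration method follows, with a damping factor
and the usual handling of dead ends (all names are illustrative; this is not the exact code
used in the experiments):

  #include <algorithm>
  #include <cmath>
  #include <vector>

  // Push-style power iteration: each vertex distributes its rank equally among its
  // out-neighbours; rank held by dead ends is spread over all vertices each iteration.
  // adj[u] lists the out-neighbours of u; alpha is the damping factor (e.g. 0.85).
  std::vector<double> pagerank(const std::vector<std::vector<int>>& adj,
                               double alpha = 0.85, double tol = 1e-6, int maxIter = 100) {
    int n = (int)adj.size();
    std::vector<double> r(n, 1.0 / n), rnew(n);
    for (int it = 0; it < maxIter; ++it) {
      double deadSum = 0;                         // total rank held by dead ends
      std::fill(rnew.begin(), rnew.end(), 0.0);
      for (int u = 0; u < n; ++u) {
        if (adj[u].empty()) { deadSum += r[u]; continue; }
        double share = r[u] / adj[u].size();
        for (int v : adj[u]) rnew[v] += share;    // push contribution along each edge
      }
      double err = 0;
      for (int v = 0; v < n; ++v) {
        rnew[v] = (1 - alpha) / n + alpha * (rnew[v] + deadSum / n);  // teleport + dead ends
        err += std::abs(rnew[v] - r[v]);          // L1-norm of the change
      }
      r.swap(rnew);
      if (err < tol) break;                       // converged
    }
    return r;
  }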
Note that, as originally conceived, the PageRank model does not factor a web browser’s
back button into the surfer’s hyperlinking possibilities. Surfers in one class, if teleporting,
may be much more likely to jump to pages about sports, while surfers in another class may be
much more likely to jump to pages about news and current events. Such differing
teleportation tendencies can be captured in two different personalization vectors. However,
this makes the once query-independent, user-independent PageRank user-dependent and
more calculation-laden. Nevertheless, this little personalization vector has had more
significant side effects: Google has used it to control spamming by the so-called link
farms. [1]
PageRank implementations almost always take the following parameters: damping factor,
tolerance, and maximum iterations. Here, tolerance defines the acceptable difference between
the rank vectors of successive iterations. Though this is usually the L1 norm, the L2 and
L∞ norms are also sometimes used. Both damping and tolerance control the rate of convergence
of the algorithm, and the choice of tolerance function also affects it. However, adjusting the
damping factor can give completely different PageRank values. Since the ordering of vertices
is what matters, and not the exact values, it can usually be a good idea to choose a larger
tolerance value.
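The convergence check is simply a norm of the difference between successive rank vectors;
a small, illustrative sketch of the three variants mentioned above:

  #include <algorithm>
  #include <cmath>
  #include <vector>

  // Error between successive rank vectors a and b under different norms (p = 1, 2, or 0 for L∞).
  double errorNorm(const std::vector<double>& a, const std::vector<double>& b, int p) {
    double e = 0;
    for (size_t i = 0; i < a.size(); ++i) {
      double d = std::abs(a[i] - b[i]);
      if (p == 1)      e += d;               // L1: sum of absolute differences
      else if (p == 2) e += d * d;           // L2: sum of squares (sqrt taken below)
      else             e = std::max(e, d);   // L∞: maximum absolute difference
    }
    return p == 2 ? std::sqrt(e) : e;
  }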
3. Optimizing Pagerank
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing
the work per iteration, and reducing the number of iterations. These goals are often at odds
with one another. A number of techniques can be used to compress adjacency lists. The gap
technique stores only the differences between consecutive neighbour ids in an edge list. The
reference encoding technique uses the edge set of another vertex as a reference to define an
edge list (but it is not easy to find the reference vertices). Research has also been done on
compressing the (dense) rank vector using smaller custom data types, but it was found to be
not so useful. [1]
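A minimal sketch of the gap idea on a sorted neighbour list (illustrative only; real codecs
additionally pack the gaps into variable-length codes):

  #include <vector>

  // Gap (delta) encoding: store each neighbour id as the difference from the previous one,
  // so small gaps can later be packed into fewer bits/bytes.
  std::vector<int> gapEncode(const std::vector<int>& sortedNbrs) {
    std::vector<int> gaps;
    int prev = 0;
    for (int id : sortedNbrs) { gaps.push_back(id - prev); prev = id; }
    return gaps;
  }

  // Inverse transform: prefix-sum the gaps to recover the original neighbour ids.
  std::vector<int> gapDecode(const std::vector<int>& gaps) {
    std::vector<int> nbrs;
    int cur = 0;
    for (int g : gaps) { cur += g; nbrs.push_back(cur); }
    return nbrs;
  }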
The adaptive PageRank technique “locks” vertices which have converged, and saves
iteration time by skipping their computation. [1] Identical nodes, which have the same
in-links, can be removed to avoid duplicate computation and thus reduce iteration time.
Road networks often have chains which can be short-circuited before PageRank computation
to improve performance; the final ranks of chain nodes can be easily calculated. This reduces
both the iteration time and the number of iterations. If a graph has no dangling nodes, the
PageRank of each strongly connected component can be computed in topological order. This
helps reduce the iteration time and the number of iterations, and also enables concurrency in
the PageRank computation. The combination of all of the above methods is the STICD
algorithm. [5] A somewhat similar aggregation algorithm is BlockRank, which computes the
PageRank of hosts, computes the local PageRank of pages within each host independently,
and aggregates them with weights into the final rank vector; it produces a speed-up of about
a factor of 2 on some datasets. The global PageRank solution can also be found efficiently by
computing the sub-PageRank of each connected component and then pasting the sub-PageRanks
together to form the global PageRank, following the method of Avrachenkov et al. These
methods exploit the inherent reducibility in the graph. Bianchini et al. suggest using the
Jacobi method to compute the PageRank vector. [1]
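A sketch of the “skip converged vertices” idea above, in pull style (this assumes a reverse
adjacency list rin of in-neighbours and an outDeg array of out-degrees; dead-end handling is
omitted for brevity, and all names are illustrative):

  #include <cmath>
  #include <vector>

  // One pull-style iteration that "locks" vertices whose rank has converged and skips
  // their computation in later iterations (adaptive PageRank / STICD skip-converged).
  void iterateWithSkip(const std::vector<std::vector<int>>& rin,   // in-neighbours of each vertex
                       const std::vector<int>& outDeg,             // out-degree of each vertex
                       std::vector<double>& r, std::vector<char>& converged,
                       double alpha, double vertexTol) {
    int n = (int)rin.size();
    std::vector<double> rnew(r);
    for (int v = 0; v < n; ++v) {
      if (converged[v]) continue;                   // skip locked (converged) vertices
      double sum = 0;
      for (int u : rin[v]) sum += r[u] / outDeg[u]; // pull contributions from in-neighbours
      rnew[v] = (1 - alpha) / n + alpha * sum;
      if (std::abs(rnew[v] - r[v]) < vertexTol) converged[v] = 1;  // lock this vertex
    }
    r.swap(rnew);
  }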
PageRank is a live algorithm, which means that an ongoing computation can be
paused during a graph update and simply resumed afterwards (instead of being restarted).
The first updating paper, by Chien et al. (2002), identifies a small portion of the web graph
“near” the link changes and models the rest of the web as a single node in a new, much
smaller graph; a PageRank is computed for this small graph and the results are transferred
back to the much bigger, original graph. [1]
4. Graph streaming frameworks / databases
STINGER uses an extended form of CSR in which the edge list of each vertex is represented
as a linked list of contiguous blocks. Each edge has two timestamps, and fine-grained locking
is used per edge. cuSTINGER extends STINGER to CUDA GPUs and uses contiguous (CSR-like)
edge lists instead. faimGraph is a GPU framework with fully dynamic vertex and edge updates;
it has an in-GPU memory manager, and uses a paged linked list for edges, similar to STINGER.
Hornet also implements its own memory manager, and uses B+ trees to maintain blocks
efficiently and keep track of empty space. LLAMA uses a variant of CSR with large
multi-versioned arrays; it stores all snapshots of a graph, and persists old snapshots to disk.
GraphIn uses CSR along with edge lists, and updates the CSR once the edge lists grow large
enough. GraphOne is similar, and uses page-aligned memory for high-degree vertices.
GraphTau is based on Apache Spark and uses read-only partitioned collections of data sets,
with a sliding-window model for graph snapshots. Aspen uses the C-tree (a tree of trees),
based on purely functional compressed search trees, to store the graph structure; elements
are stored in chunks and compressed using difference encoding. It allows any number of
readers and a single writer, and the framework guarantees strict serializability. Tegra stores
the full history of the graph and relies on recomputing graph algorithms on affected
subgraphs; it also uses a cost model to guess when full recomputation might be better, and
uses an adaptive radix tree as its core data structure for efficient updates and range
scans. [2]
Unlike graph streaming frameworks, graph databases focus on rich attached data, complex
queries, transactional support with ACID properties, data replication, and sharding. A few
graph databases have started to support global analytics as well. However, most graph
databases do not offer dedicated support for incremental changes. Little research exists on
accelerating streaming graph processing using low-cost atomics, hardware transactions,
FPGAs, or high-performance networking hardware. On average, the highest rate of ingestion
is achieved by shared-memory single-node designs. [2]
5. NVIDIA Tesla V100 GPU Architecture
NVIDIA Tesla was a line of products targeted at stream processing / general-purpose
graphics processing units (GPGPUs). In May 2020, NVIDIA retired the Tesla brand because
of potential confusion with the brand of cars. Its new GPUs are branded NVIDIA Data
Center GPUs as in the Ampere A100 GPU. [6]
The NVIDIA Tesla GV100 (Volta) is a 21.1-billion-transistor chip built on TSMC’s 12 nm
FinFET process, with a die size of 815 mm². Here is a short summary of its features:
● 84 SMs, each with 64 FP32 and 64 INT32 cores.
● Shared memory size configurable up to 96 KB per SM.
● 8 512-bit memory controllers (4096-bit total).
● Up to 6 bidirectional NVLink links, 25 GB/s per direction (used with IBM POWER9 CPUs).
● 4 dies per HBM stack, with 4 stacks: 16 GB at 900 GB/s HBM2 (Samsung).
● Native/sideband SECDED (1-bit correct, 2-bit detect) ECC (for HBM, register files, L1, L2).
Each SM has 4 processing blocks (each handling 1 warp of 32 threads). The L1 data cache is
combined with shared memory, 128 KB per SM (explicit caching is not as necessary anymore).
Volta also supports write-caching (not just load-caching, as in previous architectures). NVLink
supports coherency, allowing data read from GPU memory to be stored in the CPU cache.
Address Translation Service (ATS) allows the GPU to access CPU page tables directly
(so a pointer from malloc can be used as-is). The new copy engine doesn't need pinned memory.
Volta's per-thread program counter and call stack allow interleaved execution of the threads
of a warp, enabling fine-grained synchronization between threads within a warp (use
__syncwarp()). Cooperative groups enable synchronization at sub-warp, cross-warp, grid-wide,
and multi-GPU granularity. [7]
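As a small CUDA illustration of these warp-level primitives, the kernel below assigns one
warp per vertex to sum in-neighbour rank contributions, combining per-lane partial sums with
__shfl_down_sync (a sketch only; the CSR arrays are assumed to describe in-edges, outDeg
holds out-degrees, and the launch configuration is left to the caller):

  // One warp per vertex: lanes stride over the in-edges of a vertex, and the partial
  // sums are combined with warp shuffles (Volta-style explicit participation masks).
  __global__ void rankContribWarpPerVertex(const int* offsets, const int* edges,
                                           const double* r, const int* outDeg,
                                           double* contrib, int n) {
    int warpId = (blockIdx.x * blockDim.x + threadIdx.x) / 32;  // one warp handles one vertex
    int lane   = threadIdx.x % 32;
    if (warpId >= n) return;
    double sum = 0;
    for (int i = offsets[warpId] + lane; i < offsets[warpId + 1]; i += 32) {
      int u = edges[i];                        // an in-neighbour of vertex warpId
      sum += r[u] / outDeg[u];                 // its rank contribution
    }
    for (int off = 16; off > 0; off /= 2)      // warp-level tree reduction
      sum += __shfl_down_sync(0xffffffff, sum, off);
    if (lane == 0) contrib[warpId] = sum;      // lane 0 holds the warp's total
  }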
6. Experiments
Adjusting data types for rank vector
Rank vector data types: custom fp16, bfloat16, float, double.
1. Performance of vector element sum using float vs bfloat16 as the storage type.
2. Comparison of PageRank using float vs bfloat16 as the storage type (pull, CSR).
3. Performance of PageRank using 32-bit floats vs 64-bit floats (pull, CSR).
Adjusting CSR format for graph
               Regular 32-bit   Hybrid 32-bit            Hybrid 64-bit
single bit     32-bit index     -                        -
4-bit block    -                28-bit index (30 eff.)   60-bit index (62 eff.)
8-bit block    -                24-bit index (27 eff.)   56-bit index (59 eff.)
16-bit block   -                16-bit index (20 eff.)   48-bit index (52 eff.)
32-bit block   -                -                        32-bit index (32 eff.)
1. Comparing space usage of regular vs hybrid CSR (various sizes).
Adjusting Pagerank parameters
Damping factor: adjust, dynamic-adjust.
Tolerance: L1 norm, L2 norm, L∞ norm.
1. Comparing the effect of using different values of damping factor, with PageRank (pull, CSR).
2. Experimenting PageRank improvement by adjusting damping factor (α) between iterations.
3. Comparing the effect of using different functions for convergence check, with PageRank (...).
4. Comparing the effect of using different values of tolerance, with PageRank (pull, CSR).
Adjusting Sequential approach
Approach: push, pull. Representation: DiGraph class, CSR.
1. Performance of contribution-push based vs contribution-pull based PageRank.
2. Performance of C++ DiGraph class based vs CSR based PageRank (pull).
Adjusting OpenMP approach
Primitives: map, reduce. OpenMP scheduling: uniform, hybrid.
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Performance of sequential execution based vs OpenMP based vector element sum.
3. Performance of uniform-OpenMP based vs hybrid-OpenMP based PageRank (pull, CSR).
Comparing sequential approach
Comparisons: sequential vs OpenMP, sequential vs nvGraph, OpenMP vs nvGraph.
1. Performance of sequential execution based vs OpenMP based PageRank (pull, CSR).
2. Performance of sequential execution based vs nvGraph based PageRank (pull, CSR).
3. Performance of OpenMP based vs nvGraph based PageRank (pull, CSR).
Adjusting Monolithic (Sequential) optimizations (from STICD)
Optimizations: split components, skip in-identicals, skip chains, skip converged.
1. Performance benefit of PageRank with vertices split by components (pull, CSR).
2. Performance benefit of skipping in-identical vertices for PageRank (pull, CSR).
3. Performance benefit of skipping chain vertices for PageRank (pull, CSR).
4. Performance benefit of skipping converged vertices for PageRank (pull, CSR).
Adjusting Levelwise (STICD) approach
Parameters: min. component size, min. compute size, skip teleport calculation.
1. Comparing various min. component sizes for topologically-ordered components (levelwise...).
2. Comparing various min. compute sizes for topologically-ordered components (levelwise...).
3. Checking performance benefit of levelwise PageRank when teleport calculation is skipped.
Note: min. component size merges small components even before generating the block-graph /
topological ordering, whereas min. compute size does so just before the PageRank computation.
Comparing Levelwise (STICD) approach
Comparison: levelwise (STICD) vs monolithic.
1. Performance of monolithic vs topologically-ordered components (levelwise) PageRank.
Adjusting ranks for dynamic graphs
Strategies: update new vertices only (zero fill, or 1/N fill); update both old and new
vertices (scale old ranks, 1/N-fill new ones) — see the sketch after this list.
1. Comparing strategies to update ranks for dynamic PageRank (pull, CSR).
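One reasonable reading of the “scale old, 1/N-fill new” strategy is sketched below
(illustrative; oldRanks covers the vertices that existed before the update, and newN is the
vertex count after it):

  #include <vector>

  // Build the initial rank vector for the updated graph from the previous result:
  // existing vertices keep a re-scaled old rank, new vertices start at 1/N,
  // so the total rank still sums to (approximately) 1.
  std::vector<double> updateRanks(const std::vector<double>& oldRanks, int newN) {
    int oldN = (int)oldRanks.size();
    double scale = double(oldN) / newN;                           // shrink old mass to oldN/newN
    std::vector<double> r(newN, 1.0 / newN);                      // new vertices: 1/N fill
    for (int v = 0; v < oldN; ++v) r[v] = oldRanks[v] * scale;    // old vertices: scaled
    return r;
  }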
Adjusting Levelwise (STICD) dynamic approach
Aspects: skip unaffected components; for fixed graphs; for temporal graphs.
1. Checking for correctness of levelwise PageRank when unchanged components are skipped.
2. Perf. benefit of levelwise PageRank when unchanged components are skipped (fixed).
3. Perf. benefit of levelwise PageRank when unchanged components are skipped (temporal).
Note: fixed ⇒ static graphs with batches of random edge updates. temporal ⇒ batches of edge
updates taken from temporal graphs.
Comparing dynamic approach with static
                    nvGraph dynamic   Monolithic dynamic    Levelwise dynamic
nvGraph static      vs: temporal      -                     -
Monolithic static   -                 vs: fixed, temporal   vs: fixed, temporal
Levelwise static    -                 vs: fixed             vs: fixed, temporal
1. Performance of nvGraph based static vs dynamic PageRank (temporal).
2. Performance of static vs dynamic PageRank (temporal).
3. Performance of static vs dynamic levelwise PageRank (fixed).
4. Performance of levelwise based static vs dynamic PageRank (temporal).
Note: fixed ⇒ static graphs with batches of random edge updates. temporal ⇒ batches of edge
updates taken from temporal graphs.
Adjusting Monolithic CUDA approach
Map: launch config.
Reduce: memcpy launch config, in-place launch config, memcpy vs in-place.
Thread / vertex: launch config, sort / partition vertices, sort edges.
Block / vertex: launch config, sort / partition vertices, sort edges.
Switched / vertex: thread launch config, block launch config, switch point.
1. Comparing various launch configs for CUDA based vector multiply.
2. Comparing various launch configs for CUDA based vector element sum (memcpy).
3. Comparing various launch configs for CUDA based vector element sum (in-place).
4. Performance of memcpy vs in-place based CUDA based vector element sum.
5. Comparing various launch configs for CUDA thread-per-vertex based PageRank (pull, CSR).
6. Sorting vertices and/or edges by in-degree for CUDA thread-per-vertex based PageRank.
7. Comparing various launch configs for CUDA block-per-vertex based PageRank (pull, CSR).
8. Sorting vertices and/or edges by in-degree for CUDA block-per-vertex based PageRank.
9. Launch configs for CUDA switched-per-vertex based PageRank focusing on thread approach.
10. Launch configs for CUDA switched-per-vertex based PageRank focusing on block approach.
11. Sorting vertices and/or edges by in-degree for CUDA switched-per-vertex based PageRank.
12. Comparing various switch points for CUDA switched-per-vertex based PageRank (pull, ...).
Note: sort/p. vertices ⇒ sorting vertices by ascending or descending order of in-degree, or simply
partitioning (by in-degree). sort edges ⇒ sorting edges by ascending or descending order of id.
Adjusting Monolithic CUDA optimizations (from STICD)
Optimizations: split components, skip in-identicals, skip chains, skip converged.
1. Performance benefit of CUDA based PageRank with vertices split by components.
2. Performance benefit of skipping in-identical vertices for CUDA based PageRank (pull, CSR).
3. Performance benefit of skipping chain vertices for CUDA based PageRank (pull, CSR).
4. Performance benefit of skipping converged vertices for CUDA based PageRank (pull, CSR).
Adjusting Levelwise (STICD) CUDA approach
Parameters: min. component size, min. compute size, skip teleport calculation.
1. Min. component sizes for topologically-ordered components (levelwise, CUDA) PageRank.
2. Min. compute sizes for topologically-ordered components (levelwise CUDA) PageRank.
Note: min. component size merges small components even before generating the block-graph /
topological ordering, whereas min. compute size does so just before the PageRank computation.
Comparing Levelwise (STICD) CUDA approach
Monolithic vs: nvGraph, Monolithic CUDA.
Monolithic CUDA vs: nvGraph.
Levelwise CUDA vs: nvGraph, Monolithic CUDA.
1. Performance of sequential execution based vs CUDA based PageRank (pull, CSR).
2. Performance of nvGraph vs CUDA based PageRank (pull, CSR).
3. Performance of Monolithic CUDA vs Levelwise CUDA PageRank (pull, CSR, ...).
Comparing dynamic CUDA approach with static
                    nvGraph dynamic       Monolithic dynamic    Levelwise dynamic
nvGraph static      vs: fixed, temporal   vs: fixed, temporal   vs: fixed, temporal
Monolithic static   vs: fixed, temporal   vs: fixed, temporal   vs: fixed, temporal
Levelwise static    vs: fixed, temporal   vs: fixed, temporal   vs: fixed, temporal
1. Performance of static vs dynamic CUDA based PageRank (fixed).
2. Performance of static vs dynamic CUDA based PageRank (temporal).
3. Performance of CUDA based static vs dynamic levelwise PageRank (fixed).
4. Performance of static vs dynamic CUDA based levelwise PageRank (temporal).
Note: fixed ⇒ static graphs with batches of random edge updates. temporal ⇒ batches of edge
updates taken from temporal graphs.
Comparing dynamic optimized CUDA approach with static
                    nvGraph dynamic   Monolithic dynamic   Levelwise dynamic
nvGraph static      vs: fixed         vs: fixed            vs: fixed
Monolithic static   vs: fixed         vs: fixed            vs: fixed
Levelwise static    vs: fixed         vs: fixed            vs: fixed
1. Performance of CUDA based optimized dynamic monolithic vs levelwise PageRank (fixed).
Note: fixed ⇒ static graphs with batches of random edge updates. temporal ⇒ batches of edge
updates taken from temporal graphs.
7. Packages
1. CLI for the SNAP dataset collection, which contains more than 50 large networks.
This is for quickly fetching the SNAP datasets you need, right from the CLI. Currently there is
only one command, clone, where you can provide filters specifying exactly which datasets
you need, and where to download them. If a dataset already exists, it is skipped, and a
summary is shown at the end. You can install this with npm install -g snap-data.sh.
2. CLI for nvGraph, which is a GPU-based graph analytics library written by NVIDIA,
using CUDA.
This is for running nvGraph functions right from the CLI, with graphs given directly in
MatrixMarket format (.mtx). It just needs an x86_64 Linux machine with NVIDIA GPU drivers
installed. Execution times, along with the results, can be saved in a JSON/YAML file. The
executable code is written in C++. You can install this with npm install -g nvgraph.sh.
8. Further action
List dynamic graph algorithms
List dynamic graph data structures
List graph processing frameworks
List graph applications
Package graph processing frameworks
9. Bibliography
[1] A. Langville and C. Meyer, “Deeper Inside PageRank,” Internet Math., vol. 1, no. 3, pp.
335–380, Jan. 2004, doi: 10.1080/15427951.2004.10129091.
[2] M. Besta, M. Fischer, V. Kalavri, M. Kapralov, and T. Hoefler, “Practice of Streaming and
Dynamic Graphs: Concepts, Models, Systems, and Parallelism,” CoRR, vol.
abs/1912.12740, 2019.
[3] Contributors to Wikimedia projects, “PageRank,” Wikipedia, Jul. 2021.
https://en.wikipedia.org/wiki/PageRank (accessed Mar. 01, 2021).
[4] J. Leskovec, “PageRank Algorithm, Mining massive Datasets (CS246), Stanford
University,” YouTube, 2019.
[5] P. Garg and K. Kothapalli, “STIC-D: Algorithmic techniques for efficient parallel
pagerank computation on real-world graphs,” in Proceedings of the 17th International
Conference on Distributed Computing and Networking - ICDCN ’16, New York, New
York, USA, Jan. 2016, pp. 1–10, doi: 10.1145/2833312.2833322.
[6] Contributors to Wikimedia projects, “Nvidia Tesla,” Wikipedia, Apr. 2021.
https://en.wikipedia.org/wiki/Nvidia_Tesla (accessed Jun. 01, 2021).
[7] NVIDIA Corporation, “NVIDIA Tesla V100 GPU Architecture Whitepaper,” NVIDIA
Corporation, 2017. Accessed: Jul. 13, 2021. [Online]. Available:
https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf

Contenu connexe

Tendances

Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezBig Data Spain
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 KeynotePeter Wang
 
Cosmos DB Real-time Advanced Analytics Workshop
Cosmos DB Real-time Advanced Analytics WorkshopCosmos DB Real-time Advanced Analytics Workshop
Cosmos DB Real-time Advanced Analytics WorkshopDatabricks
 
Protecting privacy in practice
Protecting privacy in practiceProtecting privacy in practice
Protecting privacy in practiceLars Albertsson
 
Stream Processing: Choosing the Right Tool for the Job
Stream Processing: Choosing the Right Tool for the JobStream Processing: Choosing the Right Tool for the Job
Stream Processing: Choosing the Right Tool for the JobDatabricks
 
Storage Engine Considerations for Your Apache Spark Applications with Mladen ...
Storage Engine Considerations for Your Apache Spark Applications with Mladen ...Storage Engine Considerations for Your Apache Spark Applications with Mladen ...
Storage Engine Considerations for Your Apache Spark Applications with Mladen ...Spark Summit
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with SparkKrishna Sankar
 
10 ways to stumble with big data
10 ways to stumble with big data10 ways to stumble with big data
10 ways to stumble with big dataLars Albertsson
 
Architecture in action 01
Architecture in action 01Architecture in action 01
Architecture in action 01Krishna Sankar
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun JeongSpark Summit
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
Organising for Data Success
Organising for Data SuccessOrganising for Data Success
Organising for Data SuccessLars Albertsson
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MorePaco Nathan
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in SparkPaco Nathan
 
Building a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedInBuilding a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedInAmy W. Tang
 
Testing data streaming applications
Testing data streaming applicationsTesting data streaming applications
Testing data streaming applicationsLars Albertsson
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable PythonTravis Oliphant
 
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Databricks
 

Tendances (20)

Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
 
Cosmos DB Real-time Advanced Analytics Workshop
Cosmos DB Real-time Advanced Analytics WorkshopCosmos DB Real-time Advanced Analytics Workshop
Cosmos DB Real-time Advanced Analytics Workshop
 
Prashant_Agrawal_CV
Prashant_Agrawal_CVPrashant_Agrawal_CV
Prashant_Agrawal_CV
 
Protecting privacy in practice
Protecting privacy in practiceProtecting privacy in practice
Protecting privacy in practice
 
Stream Processing: Choosing the Right Tool for the Job
Stream Processing: Choosing the Right Tool for the JobStream Processing: Choosing the Right Tool for the Job
Stream Processing: Choosing the Right Tool for the Job
 
Storage Engine Considerations for Your Apache Spark Applications with Mladen ...
Storage Engine Considerations for Your Apache Spark Applications with Mladen ...Storage Engine Considerations for Your Apache Spark Applications with Mladen ...
Storage Engine Considerations for Your Apache Spark Applications with Mladen ...
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with Spark
 
10 ways to stumble with big data
10 ways to stumble with big data10 ways to stumble with big data
10 ways to stumble with big data
 
Architecture in action 01
Architecture in action 01Architecture in action 01
Architecture in action 01
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
Organising for Data Success
Organising for Data SuccessOrganising for Data Success
Organising for Data Success
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in Spark
 
Building a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedInBuilding a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedIn
 
Testing data streaming applications
Testing data streaming applicationsTesting data streaming applications
Testing data streaming applications
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable Python
 
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
 

Similaire à Exploring optimizations for dynamic pagerank algorithm based on CUDA : V3

Exploring optimizations for dynamic PageRank algorithm based on GPU : V4
Exploring optimizations for dynamic PageRank algorithm based on GPU : V4Exploring optimizations for dynamic PageRank algorithm based on GPU : V4
Exploring optimizations for dynamic PageRank algorithm based on GPU : V4Subhajit Sahu
 
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...Mumbai Academisc
 
Shortest path estimation for graph
Shortest path estimation for graphShortest path estimation for graph
Shortest path estimation for graphijdms
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlestema_solution
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlestema_solution
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlestema_solution
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlestema_solution
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlestema_solution
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlesSoundar Msr
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlestema_solution
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlestema_solution
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlestema_solution
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlestema_solution
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlestema_solution
 
Implementing sorting in database systems
Implementing sorting in database systemsImplementing sorting in database systems
Implementing sorting in database systemsunyil96
 
Experimenting With Big Data
Experimenting With Big DataExperimenting With Big Data
Experimenting With Big DataNick Boucart
 
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data SetsHortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data SetsIJMER
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code executionAlexander Decker
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code executionAlexander Decker
 
Parallel algorithms for multi-source graph traversal and its applications
Parallel algorithms for multi-source graph traversal and its applicationsParallel algorithms for multi-source graph traversal and its applications
Parallel algorithms for multi-source graph traversal and its applicationsSubhajit Sahu
 

Similaire à Exploring optimizations for dynamic pagerank algorithm based on CUDA : V3 (20)

Exploring optimizations for dynamic PageRank algorithm based on GPU : V4
Exploring optimizations for dynamic PageRank algorithm based on GPU : V4Exploring optimizations for dynamic PageRank algorithm based on GPU : V4
Exploring optimizations for dynamic PageRank algorithm based on GPU : V4
 
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
 
Shortest path estimation for graph
Shortest path estimation for graphShortest path estimation for graph
Shortest path estimation for graph
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Implementing sorting in database systems
Implementing sorting in database systemsImplementing sorting in database systems
Implementing sorting in database systems
 
Experimenting With Big Data
Experimenting With Big DataExperimenting With Big Data
Experimenting With Big Data
 
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data SetsHortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code execution
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution
 
Parallel algorithms for multi-source graph traversal and its applications
Parallel algorithms for multi-source graph traversal and its applicationsParallel algorithms for multi-source graph traversal and its applications
Parallel algorithms for multi-source graph traversal and its applications
 

Plus de Subhajit Sahu

DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTESDyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTESSubhajit Sahu
 
Shared memory Parallelism (NOTES)
Shared memory Parallelism (NOTES)Shared memory Parallelism (NOTES)
Shared memory Parallelism (NOTES)Subhajit Sahu
 
A Dynamic Algorithm for Local Community Detection in Graphs : NOTES
A Dynamic Algorithm for Local Community Detection in Graphs : NOTESA Dynamic Algorithm for Local Community Detection in Graphs : NOTES
A Dynamic Algorithm for Local Community Detection in Graphs : NOTESSubhajit Sahu
 
Scalable Static and Dynamic Community Detection Using Grappolo : NOTES
Scalable Static and Dynamic Community Detection Using Grappolo : NOTESScalable Static and Dynamic Community Detection Using Grappolo : NOTES
Scalable Static and Dynamic Community Detection Using Grappolo : NOTESSubhajit Sahu
 
Application Areas of Community Detection: A Review : NOTES
Application Areas of Community Detection: A Review : NOTESApplication Areas of Community Detection: A Review : NOTES
Application Areas of Community Detection: A Review : NOTESSubhajit Sahu
 
Community Detection on the GPU : NOTES
Community Detection on the GPU : NOTESCommunity Detection on the GPU : NOTES
Community Detection on the GPU : NOTESSubhajit Sahu
 
Survey for extra-child-process package : NOTES
Survey for extra-child-process package : NOTESSurvey for extra-child-process package : NOTES
Survey for extra-child-process package : NOTESSubhajit Sahu
 
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTER
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTERDynamic Batch Parallel Algorithms for Updating PageRank : POSTER
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTERSubhajit Sahu
 
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...Subhajit Sahu
 
Fast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTESFast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTESSubhajit Sahu
 
Can you fix farming by going back 8000 years : NOTES
Can you fix farming by going back 8000 years : NOTESCan you fix farming by going back 8000 years : NOTES
Can you fix farming by going back 8000 years : NOTESSubhajit Sahu
 
HITS algorithm : NOTES
HITS algorithm : NOTESHITS algorithm : NOTES
HITS algorithm : NOTESSubhajit Sahu
 
Basic Computer Architecture and the Case for GPUs : NOTES
Basic Computer Architecture and the Case for GPUs : NOTESBasic Computer Architecture and the Case for GPUs : NOTES
Basic Computer Architecture and the Case for GPUs : NOTESSubhajit Sahu
 
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESDynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESSubhajit Sahu
 
Are Satellites Covered in Gold Foil : NOTES
Are Satellites Covered in Gold Foil : NOTESAre Satellites Covered in Gold Foil : NOTES
Are Satellites Covered in Gold Foil : NOTESSubhajit Sahu
 
Taxation for Traders < Markets and Taxation : NOTES
Taxation for Traders < Markets and Taxation : NOTESTaxation for Traders < Markets and Taxation : NOTES
Taxation for Traders < Markets and Taxation : NOTESSubhajit Sahu
 
A Generalization of the PageRank Algorithm : NOTES
A Generalization of the PageRank Algorithm : NOTESA Generalization of the PageRank Algorithm : NOTES
A Generalization of the PageRank Algorithm : NOTESSubhajit Sahu
 
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...Subhajit Sahu
 
Income Tax Calender 2021 (ITD) : NOTES
Income Tax Calender 2021 (ITD) : NOTESIncome Tax Calender 2021 (ITD) : NOTES
Income Tax Calender 2021 (ITD) : NOTESSubhajit Sahu
 
Youngistaan Foundation: Annual Report 2020-21 : NOTES
Youngistaan Foundation: Annual Report 2020-21 : NOTESYoungistaan Foundation: Annual Report 2020-21 : NOTES
Youngistaan Foundation: Annual Report 2020-21 : NOTESSubhajit Sahu
 

Plus de Subhajit Sahu (20)

DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTESDyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
 
Shared memory Parallelism (NOTES)
Shared memory Parallelism (NOTES)Shared memory Parallelism (NOTES)
Shared memory Parallelism (NOTES)
 
A Dynamic Algorithm for Local Community Detection in Graphs : NOTES
A Dynamic Algorithm for Local Community Detection in Graphs : NOTESA Dynamic Algorithm for Local Community Detection in Graphs : NOTES
A Dynamic Algorithm for Local Community Detection in Graphs : NOTES
 
Scalable Static and Dynamic Community Detection Using Grappolo : NOTES
Scalable Static and Dynamic Community Detection Using Grappolo : NOTESScalable Static and Dynamic Community Detection Using Grappolo : NOTES
Scalable Static and Dynamic Community Detection Using Grappolo : NOTES
 
Application Areas of Community Detection: A Review : NOTES
Application Areas of Community Detection: A Review : NOTESApplication Areas of Community Detection: A Review : NOTES
Application Areas of Community Detection: A Review : NOTES
 
Community Detection on the GPU : NOTES
Community Detection on the GPU : NOTESCommunity Detection on the GPU : NOTES
Community Detection on the GPU : NOTES
 
Survey for extra-child-process package : NOTES
Survey for extra-child-process package : NOTESSurvey for extra-child-process package : NOTES
Survey for extra-child-process package : NOTES
 
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTER
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTERDynamic Batch Parallel Algorithms for Updating PageRank : POSTER
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTER
 
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...
 
Fast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTESFast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTES
 
Can you fix farming by going back 8000 years : NOTES
Can you fix farming by going back 8000 years : NOTESCan you fix farming by going back 8000 years : NOTES
Can you fix farming by going back 8000 years : NOTES
 
HITS algorithm : NOTES
HITS algorithm : NOTESHITS algorithm : NOTES
HITS algorithm : NOTES
 
Basic Computer Architecture and the Case for GPUs : NOTES
Basic Computer Architecture and the Case for GPUs : NOTESBasic Computer Architecture and the Case for GPUs : NOTES
Basic Computer Architecture and the Case for GPUs : NOTES
 
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESDynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
 
Are Satellites Covered in Gold Foil : NOTES
Are Satellites Covered in Gold Foil : NOTESAre Satellites Covered in Gold Foil : NOTES
Are Satellites Covered in Gold Foil : NOTES
 
Taxation for Traders < Markets and Taxation : NOTES
Taxation for Traders < Markets and Taxation : NOTESTaxation for Traders < Markets and Taxation : NOTES
Taxation for Traders < Markets and Taxation : NOTES
 
A Generalization of the PageRank Algorithm : NOTES
A Generalization of the PageRank Algorithm : NOTESA Generalization of the PageRank Algorithm : NOTES
A Generalization of the PageRank Algorithm : NOTES
 
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...
 
Income Tax Calender 2021 (ITD) : NOTES
Income Tax Calender 2021 (ITD) : NOTESIncome Tax Calender 2021 (ITD) : NOTES
Income Tax Calender 2021 (ITD) : NOTES
 
Youngistaan Foundation: Annual Report 2020-21 : NOTES
Youngistaan Foundation: Annual Report 2020-21 : NOTESYoungistaan Foundation: Annual Report 2020-21 : NOTES
Youngistaan Foundation: Annual Report 2020-21 : NOTES
 

Dernier

MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptrcbcrtm
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
Exploring optimizations for dynamic pagerank algorithm based on CUDA : V3

Several of these linear-algebra formulations of graph algorithms are available in frameworks such as nvGraph and cuGraph. A number of dynamic graph data structures have also been developed to improve update speed (like PMA), or to enable concurrent updates and computation (like Aspen's compressed functional trees). [2]

Streaming / dynamic / time-evolving graph data structures maintain only the latest graph information. Historical graphs, on the other hand, keep track of all previous states of the graph. Changes to the graph can be thought of as edge insertions and deletions, which are usually done in batches. Except for functional techniques, updating a graph usually involves modifying a shared structure using some kind of fine-grained synchronization. It might also be possible to store additional information along with vertices/edges, though this is usually not the focus of research (graph databases do focus on it).

In the recent decade or so, a number of graph streaming frameworks have been developed, each with a certain focus area and targeting a certain platform (distributed system / multiprocessor / GPU / FPGA / ASIC). Such frameworks focus on designing an improved dynamic graph data structure, and define a fundamental model of computation. For GPUs, the following frameworks exist: cuSTINGER, aimGraph, faimGraph, Hornet, EvoGraph, and GPMA. [2]
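For reference, here is a minimal sketch of the CSR layout referred to throughout this report; the struct and field names are illustrative, not taken from the report's code:

  // Minimal CSR (Compressed Sparse Row) layout for a directed graph.
  // The neighbours of vertex v occupy edges[offsets[v]] .. edges[offsets[v+1]-1].
  #include <cstdint>
  #include <vector>

  struct GraphCSR {
    std::vector<uint32_t> offsets;  // size = number of vertices + 1
    std::vector<uint32_t> edges;    // size = number of edges
  };

  // Degree of vertex v is the difference of adjacent offsets.
  inline uint32_t degree(const GraphCSR& g, uint32_t v) {
    return g.offsets[v + 1] - g.offsets[v];
  }

Inserting a single edge into such a packed layout means shifting everything after it, which is why streaming frameworks layer edge lists, blocks, or trees on top of (or in place of) plain CSR.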
2. Pagerank algorithm

The pagerank algorithm is a technique used to sort web pages (or vertices of a graph) by importance. It is, quite famously, the algorithm published by the founders of Google. Other link analysis algorithms include HITS, TrustRank, and Hummingbird. Such algorithms are also used for word sense disambiguation in lexical semantics, to rank streets by traffic, to measure the impact of communities on the web, to provide recommendations, to analyse neural/protein networks, to determine species essential for the health of an ecosystem, or even to quantify the scientific impact of researchers. [3]

In order to understand the pagerank algorithm, consider the random (web) surfer model. Each web page is modeled as a vertex, and each hyperlink as an edge. The surfer (such as you) initially visits a web page at random. They then follow one of the links on the page, leading to another web page. After following some links, the surfer eventually decides to visit another web page (at random). The probability of the random surfer being on a certain page is what the pagerank algorithm returns. This probability (or importance) of a web page depends upon the importance of the web pages pointing to it (a Markov chain). This definition of pagerank is recursive, and takes the form of an eigenvalue problem. Solving for pagerank thus requires multiple iterations of computation, which is known as the power-iteration method. Each iteration is essentially a (sparse) matrix-vector multiplication. A damping factor (of 0.85) is used to counter the effect of spider-traps (like self-loops), which could otherwise suck up all the importance. Dead ends (web pages with no out-links) are countered by effectively linking them to all vertices of the graph (making the Markov matrix column-stochastic), since they would otherwise leak out importance. [4]

Note that, as originally conceived, the PageRank model does not factor a web browser's back button into a surfer's hyperlinking possibilities. Surfers in one class, if teleporting, may be much more likely to jump to pages about sports, while surfers in another class may be much more likely to jump to pages pertaining to news and current events. Such differing teleportation tendencies can be captured in two different personalization vectors. However, this makes the once query-independent, user-independent PageRankings user-dependent and more calculation-laden. Nevertheless, this little personalization vector has had more significant side effects: Google has recently used it to control spamming by so-called link farms. [1]

Pagerank algorithms almost always take the following parameters: damping, tolerance, and max. iterations. Here, tolerance defines the error between the previous and the current iteration's rank vectors; this is usually the L1-norm, though L2 and L∞-norms are also used sometimes. Both damping and tolerance control the rate of convergence of the algorithm, and the choice of tolerance function also affects it. However, adjusting the damping factor can give completely different pagerank values. Since the ordering of vertices is what matters, and not the exact values, it can usually be a good idea to choose a larger tolerance value.
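A minimal sequential sketch of the power-iteration method described above, written against the GraphCSR layout sketched earlier: it assumes the CSR of the transpose graph (in-edges), pulls contributions from in-neighbours, applies the damping factor, redistributes the rank of dead ends uniformly, and stops once the L1-norm of the change falls below the tolerance. Function and parameter names are illustrative, not the report's implementation.

  #include <cmath>
  #include <cstdint>
  #include <vector>
  using std::vector;

  // Pull-based PageRank by power iteration (illustrative sketch).
  // gt: CSR of the transpose graph (edges give in-neighbours).
  // outdeg[u]: out-degree of vertex u in the original graph.
  vector<double> pagerank(const GraphCSR& gt, const vector<uint32_t>& outdeg,
                          double damping = 0.85, double tol = 1e-6, int maxIter = 500) {
    uint32_t N = (uint32_t) outdeg.size();
    vector<double> x(N, 1.0 / N), xt(N);
    for (int it = 0; it < maxIter; ++it) {
      double dead = 0;                           // rank mass held by dead ends
      for (uint32_t u = 0; u < N; ++u)
        if (outdeg[u] == 0) dead += x[u];
      double base = (1 - damping) / N + damping * dead / N;
      for (uint32_t v = 0; v < N; ++v) {
        double s = 0;                            // pull from in-neighbours
        for (uint32_t i = gt.offsets[v]; i < gt.offsets[v + 1]; ++i) {
          uint32_t u = gt.edges[i];
          s += x[u] / outdeg[u];
        }
        xt[v] = base + damping * s;
      }
      double err = 0;                            // L1-norm of the change
      for (uint32_t v = 0; v < N; ++v) err += std::fabs(xt[v] - x[v]);
      x.swap(xt);
      if (err < tol) break;
    }
    return x;
  }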
3. Optimizing Pagerank

Techniques to optimize the pagerank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another.

A number of techniques can be used to compress adjacency lists. The gap technique stores only the difference between neighbour ids in edge lists. The reference encoding technique uses the edge sets of other vertices as references when defining an edge list (but it is not easy to find good reference vertices). Research has also been done on compressing the rank vector (which is dense) using smaller custom data types, but this was found to be of limited use. [1] The adaptive pagerank technique "locks" vertices which have converged, and saves iteration time by skipping their computation. [1] Identical nodes, which have the same in-links, can be removed to reduce duplicate computations and thus reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance; the final ranks of chain nodes are easily calculated afterwards. This reduces both the iteration time and the number of iterations. If a graph has no dangling nodes, the pagerank of each strongly connected component can be computed in topological order. This helps reduce the iteration time and the number of iterations, and also enables concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [5]

A somewhat similar aggregation algorithm is BlockRank, which computes the pagerank of hosts and the local pagerank of pages within hosts independently, then aggregates them with weights into the final rank vector. It produces a speed-up of a factor of 2 on some datasets. The global PageRank solution can also be found in a computationally efficient manner by computing the subPageRank of each connected component and then pasting the subPageRanks together to form the global PageRank, using the method of Avrachenkov et al. These methods exploit the inherent reducibility of the graph. Bianchini et al. suggest using the Jacobi method to compute the PageRank vector. [1]
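As a concrete illustration of the skip-converged idea above, the iteration loop of the earlier sketch can carry a per-vertex flag: once a vertex's rank stops changing, it is never recomputed. This is only a sketch of the idea (the per-vertex threshold and the omitted dead-end handling are assumptions), not the STICD implementation itself.

  // Variant of the earlier pagerank() sketch that skips converged vertices.
  vector<double> pagerankSkipConverged(const GraphCSR& gt, const vector<uint32_t>& outdeg,
                                       double damping = 0.85, double tol = 1e-6,
                                       int maxIter = 500) {
    uint32_t N = (uint32_t) outdeg.size();
    vector<double> x(N, 1.0 / N), xt(N);
    vector<char>   done(N, 0);                   // 1 => vertex is "locked"
    for (int it = 0; it < maxIter; ++it) {
      double base = (1 - damping) / N;           // dead-end handling omitted here
      double err  = 0;
      for (uint32_t v = 0; v < N; ++v) {
        if (done[v]) { xt[v] = x[v]; continue; } // skip converged vertex
        double s = 0;
        for (uint32_t i = gt.offsets[v]; i < gt.offsets[v + 1]; ++i) {
          uint32_t u = gt.edges[i];
          s += x[u] / outdeg[u];
        }
        xt[v] = base + damping * s;
        double dv = std::fabs(xt[v] - x[v]);
        err += dv;
        if (dv < tol / N) done[v] = 1;           // assumed per-vertex cutoff
      }
      x.swap(xt);
      if (err < tol) break;
    }
    return x;
  }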
Pagerank is a live algorithm, which means that an ongoing computation can be paused during a graph update and simply resumed afterwards (instead of being restarted). The first updating paper, by Chien et al. (2002), identifies a small portion of the web graph "near" the link changes, models the rest of the web as a single node in a new, much smaller graph, computes a pagerank for this small graph, and transfers the results back to the much bigger original graph. [1]

4. Graph streaming frameworks / databases

STINGER uses an extended form of CSR, with edge lists represented as linked lists of contiguous blocks. Each edge carries two timestamps, and fine-grained locking is used per edge. cuSTINGER extends STINGER to CUDA GPUs and uses contiguous (CSR-style) edge lists instead. faimGraph is a GPU framework with fully dynamic vertex and edge updates; it has an in-GPU memory manager, and uses a paged linked list for edges, similar to STINGER. Hornet also implements its own memory manager, and uses B+ trees to maintain blocks efficiently and keep track of empty space. LLAMA uses a variant of CSR with large multi-versioned arrays; it stores all snapshots of a graph, and persists old snapshots to disk. GraphIn uses CSR along with edge lists, and folds the edge lists into the CSR once they grow large enough. GraphOne is similar, and uses page-aligned memory for high-degree vertices. GraphTau is based on Apache Spark and uses read-only partitioned collections of data sets, with a sliding-window model for graph snapshots. Aspen uses the C-tree (a tree of trees), based on purely functional compressed search trees, to store the graph structure; elements are stored in chunks and compressed using difference encoding. It allows any number of readers and a single writer, and the framework guarantees strict serializability. Tegra stores the full history of the graph and relies on recomputing graph algorithms on affected subgraphs; it also uses a cost model to guess when full recomputation might be better, and an adaptive radix tree as its core data structure for efficient updates and range scans. [2]

Unlike graph streaming frameworks, graph databases focus on rich attached data, complex queries, transactional support with ACID properties, data replication, and sharding. A few graph databases have started to support global analytics as well, but most do not offer dedicated support for incremental changes. Little research exists on accelerating streaming graph processing using low-cost atomics, hardware transactions, FPGAs, or high-performance networking hardware. On average, the highest rate of ingestion is achieved by shared-memory single-node designs. [2]
5. NVIDIA Tesla V100 GPU Architecture

NVIDIA Tesla was a line of products targeted at stream processing / general-purpose graphics processing (GPGPU). In May 2020, NVIDIA retired the Tesla brand because of potential confusion with the car maker; its newer parts are branded NVIDIA Data Center GPUs, as with the Ampere A100. [6]

The NVIDIA Tesla GV100 (Volta) is a 21.1-billion-transistor chip fabricated on TSMC's 12 nm FinFET process, with a die size of 815 mm². Here is a short summary of its features:
● 84 SMs, each with 64 independent FP32 and 64 INT32 cores.
● Shared memory configurable up to 96 KB per SM.
● 8 memory controllers of 512 bits each (4096 bits in total).
● Up to 6 bidirectional NVLink links at 25 GB/s per direction (used, for example, with IBM POWER9 CPUs).
● 4 HBM2 stacks (Samsung) of 4 dies each: 16 GB at 900 GB/s.
● Native/sideband SECDED ECC (single-error correct, double-error detect) for HBM, register files, L1, and L2.

Each SM has 4 processing blocks, each handling one warp of 32 threads. The L1 data cache is combined with shared memory into 128 KB per SM, so explicit shared-memory caching is not as necessary anymore. Volta also supports write-caching (not just load-caching, as in previous architectures). NVLink supports coherency, allowing data read from GPU memory to be stored in the CPU cache. Address Translation Service (ATS) allows the GPU to access CPU page tables directly (so a malloc pointer works on the device). The new copy engine does not need pinned memory. Volta's per-thread program counter and call stack allow interleaved execution of warp threads, enabling fine-grained synchronization between threads within a warp (use __syncwarp()). Cooperative groups enable synchronization across warps, grid-wide, across multiple GPUs, cross-warp, and sub-warp. [7]
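As a small illustration of the fine-grained synchronization mentioned above, the sketch below performs a warp-level sum with cooperative groups; under Volta's independent thread scheduling, the synchronized shuffles (or an explicit __syncwarp()) replace the old assumption that a warp executes in lock-step. The kernel and its names are illustrative, not from the report.

  #include <cooperative_groups.h>
  namespace cg = cooperative_groups;

  // Warp-level sum of x[0..n): each warp reduces its 32 values with
  // synchronized shuffles, then lane 0 adds the partial sum to *out.
  __global__ void warpSum(const float* x, float* out, int n) {
    cg::thread_block          block = cg::this_thread_block();
    cg::thread_block_tile<32> warp  = cg::tiled_partition<32>(block);
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? x[i] : 0.0f;
    for (int off = warp.size() / 2; off > 0; off /= 2)
      v += warp.shfl_down(v, off);               // synchronized data exchange
    if (warp.thread_rank() == 0) atomicAdd(out, v);
  }

When warp threads communicate through shared memory instead of shuffles, an explicit __syncwarp() between the write and the read serves the same purpose.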
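The hybrid CSR entries above appear to pack a small dense block of destination bits together with a block index in a single word (for example, a 4-bit block plus a 28-bit index in a 32-bit entry, giving a 30-bit effective vertex-id space). The packing below is an assumed layout consistent with those bit counts, not code from the report.

  #include <cstdint>

  // Assumed hybrid-CSR entry: high 28 bits = block index, low 4 bits = a
  // dense bitset over the 4 vertices {index*4 + 0 .. index*4 + 3}.
  constexpr uint32_t BLOCK = 4;                  // bits per dense block

  inline uint32_t makeEntry(uint32_t vertex) {
    uint32_t index = vertex / BLOCK;
    uint32_t bit   = vertex % BLOCK;
    return (index << BLOCK) | (1u << bit);
  }

  inline bool entryHasVertex(uint32_t entry, uint32_t vertex) {
    return (entry >> BLOCK) == vertex / BLOCK    // same block of 4 vertices
        && (entry & (1u << (vertex % BLOCK)));   // and its bit is set
  }

Edges to nearby vertices can then share one entry, which is where the space savings in the comparison above would come from.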
6. Experiments

Adjusting data types for the rank vector (custom fp16, bfloat16, float, double):
1. Performance of vector element sum using float vs bfloat16 as the storage type.
2. Comparison of PageRank using float vs bfloat16 as the storage type (pull, CSR).
3. Performance of PageRank using 32-bit floats vs 64-bit floats (pull, CSR).

Adjusting the CSR format for the graph (regular 32-bit vs hybrid 32-bit vs hybrid 64-bit entries):

Block size   | Regular 32-bit | Hybrid 32-bit          | Hybrid 64-bit
single bit   | 32-bit index   | -                      | -
4-bit block  | -              | 28-bit index (30 eff.) | 60-bit index (62 eff.)
8-bit block  | -              | 24-bit index (27 eff.) | 56-bit index (59 eff.)
16-bit block | -              | 16-bit index (20 eff.) | 48-bit index (52 eff.)
32-bit block | -              | -                      | 32-bit index (32 eff.)

1. Comparing space usage of regular vs hybrid CSR (various sizes).
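A minimal sketch of how one pull iteration can be parallelized with OpenMP in the uniform case (every vertex handled the same way); the hybrid variant in the table would treat some vertices differently, which is not shown here. It reuses the GraphCSR type from the earlier sketch; the schedule clause is an illustrative choice.

  #include <cstdint>
  #include <omp.h>
  #include <vector>

  // One OpenMP-parallel pull iteration: xt[v] = base + d * sum(x[u]/outdeg[u]).
  void iterationOmp(const GraphCSR& gt, const std::vector<uint32_t>& outdeg,
                    const std::vector<double>& x, std::vector<double>& xt,
                    double base, double damping) {
    int N = (int) outdeg.size();
    #pragma omp parallel for schedule(dynamic, 2048)
    for (int v = 0; v < N; ++v) {
      double s = 0;
      for (uint32_t i = gt.offsets[v]; i < gt.offsets[v + 1]; ++i) {
        uint32_t u = gt.edges[i];
        s += x[u] / outdeg[u];
      }
      xt[v] = base + damping * s;
    }
  }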
Adjusting Pagerank parameters: damping factor (adjust, dynamic-adjust) and tolerance (L1 norm, L2 norm, L∞ norm).
1. Comparing the effect of using different values of the damping factor, with PageRank (pull, CSR).
2. Experimenting with PageRank improvement by adjusting the damping factor (α) between iterations.
3. Comparing the effect of using different functions for the convergence check, with PageRank (...).
4. Comparing the effect of using different values of tolerance, with PageRank (pull, CSR).

Adjusting the sequential approach: push vs pull, and DiGraph class vs CSR.
1. Performance of contribution-push based vs contribution-pull based PageRank.
2. Performance of C++ DiGraph class based vs CSR based PageRank (pull).

Adjusting the OpenMP approach: map, reduce; uniform vs hybrid work distribution (an OpenMP-parallel iteration is sketched after this list).
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Performance of sequential execution based vs OpenMP based vector element sum.
3. Performance of uniform-OpenMP based vs hybrid-OpenMP based PageRank (pull, CSR).

Comparing sequential, OpenMP, and nvGraph based PageRank:
1. Performance of sequential execution based vs OpenMP based PageRank (pull, CSR).
2. Performance of sequential execution based vs nvGraph based PageRank (pull, CSR).
3. Performance of OpenMP based vs nvGraph based PageRank (pull, CSR).

Adjusting monolithic (sequential) optimizations (from STICD): split by components, skip in-identicals, skip chains, skip converged.
1. Performance benefit of PageRank with vertices split by components (pull, CSR).
2. Performance benefit of skipping in-identical vertices for PageRank (pull, CSR).
3. Performance benefit of skipping chain vertices for PageRank (pull, CSR).
4. Performance benefit of skipping converged vertices for PageRank (pull, CSR).

Adjusting the levelwise (STICD) approach: min. component size, min. compute size, skipping the teleport calculation (a per-component sketch follows below).
1. Comparing various min. component sizes for topologically-ordered components (levelwise...).
2. Comparing various min. compute sizes for topologically-ordered components (levelwise...).
3. Checking the performance benefit of levelwise PageRank when the teleport calculation is skipped.
Note: min. component size merges small components even before generating the block-graph / topological ordering, while min. compute size does it before pagerank computation.

Comparing the levelwise (STICD) approach against monolithic and nvGraph PageRank:
1. Performance of monolithic vs topologically-ordered components (levelwise) PageRank.
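A sketch of the levelwise idea being measured here: with the strongly connected components listed in topological order (and assuming no dead ends, as noted in Section 3), each component can be iterated to convergence on its own, since the ranks it pulls from earlier components are already final. It reuses the GraphCSR type from the earlier sketch; names and the convergence handling are illustrative.

  #include <cmath>
  #include <cstdint>
  #include <vector>

  // Levelwise PageRank sketch: comps holds the vertices of each strongly
  // connected component, listed in topological order of the component graph.
  void pagerankLevelwise(const GraphCSR& gt, const std::vector<uint32_t>& outdeg,
                         const std::vector<std::vector<uint32_t>>& comps,
                         std::vector<double>& x,
                         double damping = 0.85, double tol = 1e-6, int maxIter = 500) {
    uint32_t N = (uint32_t) outdeg.size();
    double base = (1 - damping) / N;             // no dead ends assumed
    std::vector<double> xt(x);
    for (const auto& comp : comps) {             // process components in order
      for (int it = 0; it < maxIter; ++it) {
        double err = 0;
        for (uint32_t v : comp) {
          double s = 0;
          for (uint32_t i = gt.offsets[v]; i < gt.offsets[v + 1]; ++i) {
            uint32_t u = gt.edges[i];
            s += x[u] / outdeg[u];
          }
          double nv = base + damping * s;
          err += std::fabs(nv - x[v]);
          xt[v] = nv;
        }
        for (uint32_t v : comp) x[v] = xt[v];    // publish this component only
        if (err < tol) break;
      }
    }
  }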
Adjusting ranks for dynamic graphs. Strategies for new vertices: zero fill or 1/N fill; strategies for old and new vertices together: scale the old ranks and 1/N-fill the new ones (a sketch of the scale-and-fill strategy follows below).
1. Comparing strategies to update ranks for dynamic PageRank (pull, CSR).

Adjusting the levelwise (STICD) dynamic approach: skipping unaffected components, for fixed and for temporal graphs.
1. Checking correctness of levelwise PageRank when unchanged components are skipped.
2. Performance benefit of levelwise PageRank when unchanged components are skipped (fixed).
3. Performance benefit of levelwise PageRank when unchanged components are skipped (temporal).
Note: fixed ⇒ static graphs with batches of random edge updates; temporal ⇒ batches of edge updates taken from temporal graphs.

Comparing the dynamic approach with the static one (columns: nvGraph dynamic, Monolithic dynamic, Levelwise dynamic):
nvGraph static: vs temporal
Monolithic static: vs fixed, temporal; vs fixed, temporal
Levelwise static: vs fixed; vs fixed, temporal
1. Performance of nvGraph based static vs dynamic PageRank (temporal).
2. Performance of static vs dynamic PageRank (temporal).
3. Performance of static vs dynamic levelwise PageRank (fixed).
4. Performance of levelwise based static vs dynamic PageRank (temporal).
Note: fixed ⇒ static graphs with batches of random edge updates; temporal ⇒ batches of edge updates taken from temporal graphs.
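One of the rank-update strategies compared above ("scale old, 1/N fill new") can be sketched as follows: when the vertex count grows, old ranks are scaled so that the vector remains a probability distribution, and the new vertices start at 1/N. The function name is illustrative.

  #include <cstdint>
  #include <vector>

  // Adjust the previous rank vector when the graph grows from xOld.size()
  // to newN vertices: scale old ranks, give new vertices 1/newN each.
  std::vector<double> adjustRanks(const std::vector<double>& xOld, uint32_t newN) {
    uint32_t oldN = (uint32_t) xOld.size();
    std::vector<double> x(newN, 1.0 / newN);     // new vertices: 1/N fill
    double scale = (double) oldN / newN;         // keep the total mass at 1
    for (uint32_t v = 0; v < oldN; ++v) x[v] = xOld[v] * scale;
    return x;                                    // dynamic PageRank then resumes from x
  }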
Adjusting the monolithic CUDA approach. Dimensions explored (a kernel sketch of the per-vertex strategies follows below):
Map: launch config.
Reduce: memcpy-based launch config, in-place launch config, memcpy vs in-place.
Thread per vertex: launch config, sort/partition vertices, sort edges.
Block per vertex: launch config, sort/partition vertices, sort edges.
Switched per vertex: thread launch config, block launch config, switch point.
1. Comparing various launch configs for CUDA based vector multiply.
2. Comparing various launch configs for CUDA based vector element sum (memcpy).
3. Comparing various launch configs for CUDA based vector element sum (in-place).
4. Performance of memcpy vs in-place based CUDA based vector element sum.
5. Comparing various launch configs for CUDA thread-per-vertex based PageRank (pull, CSR).
6. Sorting vertices and/or edges by in-degree for CUDA thread-per-vertex based PageRank.
7. Comparing various launch configs for CUDA block-per-vertex based PageRank (pull, CSR).
8. Sorting vertices and/or edges by in-degree for CUDA block-per-vertex based PageRank.
9. Launch configs for CUDA switched-per-vertex based PageRank focusing on the thread approach.
10. Launch configs for CUDA switched-per-vertex based PageRank focusing on the block approach.
11. Sorting vertices and/or edges by in-degree for CUDA switched-per-vertex based PageRank.
12. Comparing various switch points for CUDA switched-per-vertex based PageRank (pull, ...).
Note: sort/partition vertices ⇒ sorting vertices by ascending or descending order of in-degree, or simply partitioning them by in-degree; sort edges ⇒ sorting edges by ascending or descending order of id.

Adjusting monolithic CUDA optimizations (from STICD): split by components, skip in-identicals, skip chains, skip converged.
1. Performance benefit of CUDA based PageRank with vertices split by components.
2. Performance benefit of skipping in-identical vertices for CUDA based PageRank (pull, CSR).
3. Performance benefit of skipping chain vertices for CUDA based PageRank (pull, CSR).
4. Performance benefit of skipping converged vertices for CUDA based PageRank (pull, CSR).

Adjusting the levelwise (STICD) CUDA approach: min. component size, min. compute size, skipping the teleport calculation.
1. Min. component sizes for topologically-ordered components (levelwise, CUDA) PageRank.
2. Min. compute sizes for topologically-ordered components (levelwise, CUDA) PageRank.
Note: min. component size merges small components even before generating the block-graph / topological ordering, while min. compute size does it before pagerank computation.
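The thread-per-vertex, block-per-vertex, and switched strategies compared above can be sketched as two kernels over the transpose-graph CSR held in device memory: one thread per low-degree vertex, one block per high-degree vertex, with the switch point (an in-degree threshold) deciding which range of the degree-partitioned vertices each kernel covers. The contributions c[u] = x[u]/outdeg[u] are assumed to be precomputed; names, block size, and launch shapes are illustrative, not the report's code.

  #include <cstdint>

  // One thread per vertex: fine for low in-degree vertices.
  __global__ void rankThreadPerVertex(const uint32_t* off, const uint32_t* edg,
                                      const double* c, double* xt,
                                      double base, double damping,
                                      uint32_t vBegin, uint32_t vEnd) {
    uint32_t v = vBegin + blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= vEnd) return;
    double s = 0;
    for (uint32_t i = off[v]; i < off[v + 1]; ++i) s += c[edg[i]];
    xt[v] = base + damping * s;
  }

  // One block (of 256 threads here) per vertex: better for high in-degree vertices.
  __global__ void rankBlockPerVertex(const uint32_t* off, const uint32_t* edg,
                                     const double* c, double* xt,
                                     double base, double damping, uint32_t vBegin) {
    __shared__ double cache[256];
    uint32_t v = vBegin + blockIdx.x;            // this block's vertex
    double s = 0;
    for (uint32_t i = off[v] + threadIdx.x; i < off[v + 1]; i += blockDim.x)
      s += c[edg[i]];
    cache[threadIdx.x] = s;
    __syncthreads();
    for (uint32_t k = blockDim.x / 2; k > 0; k /= 2) {  // block-wide reduction
      if (threadIdx.x < k) cache[threadIdx.x] += cache[threadIdx.x + k];
      __syncthreads();
    }
    if (threadIdx.x == 0) xt[v] = base + damping * cache[0];
  }

With vertices partitioned by in-degree at the switch point, the switched approach would launch rankThreadPerVertex over the low-degree range and rankBlockPerVertex (one block of 256 threads per vertex) over the rest.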
Comparing the levelwise (STICD) CUDA approach. The monolithic (sequential), monolithic CUDA, and levelwise CUDA variants are compared against nvGraph and against one another:
1. Performance of sequential execution based vs CUDA based PageRank (pull, CSR).
2. Performance of nvGraph vs CUDA based PageRank (pull, CSR).
3. Performance of Monolithic CUDA vs Levelwise CUDA PageRank (pull, CSR, ...).

Comparing the dynamic CUDA approach with the static one. Each of nvGraph static, Monolithic static, and Levelwise static is compared against each of nvGraph dynamic, Monolithic dynamic, and Levelwise dynamic, on both fixed and temporal updates.
1. Performance of static vs dynamic CUDA based PageRank (fixed).
2. Performance of static vs dynamic CUDA based PageRank (temporal).
3. Performance of CUDA based static vs dynamic levelwise PageRank (fixed).
4. Performance of static vs dynamic CUDA based levelwise PageRank (temporal).
Note: fixed ⇒ static graphs with batches of random edge updates; temporal ⇒ batches of edge updates taken from temporal graphs.

Comparing the dynamic optimized CUDA approach with the static one. Each of nvGraph static, Monolithic static, and Levelwise static is compared against each of nvGraph dynamic, Monolithic dynamic, and Levelwise dynamic, on fixed updates.
1. Performance of CUDA based optimized dynamic monolithic vs levelwise PageRank (fixed).
Note: fixed ⇒ static graphs with batches of random edge updates; temporal ⇒ batches of edge updates taken from temporal graphs.
7. Packages

1. CLI for the SNAP dataset collection, which contains more than 50 large networks. This is for quickly fetching the SNAP datasets you need, right from the CLI. Currently there is only one command, clone, which accepts filters specifying exactly which datasets you need and where to download them; if a dataset already exists, it is skipped, and a summary is shown at the end. You can install it with npm install -g snap-data.sh.
2. CLI for nvGraph, a GPU-based graph analytics library written by NVIDIA using CUDA. This is for running nvGraph functions right from the CLI, with graphs given directly in MatrixMarket (.mtx) format. It only needs an x86_64 Linux machine with NVIDIA GPU drivers installed. Execution times, along with the results, can be saved to a JSON/YAML file. The executable code is written in C++. You can install it with npm install -g nvgraph.sh.

8. Further action

List dynamic graph algorithms
List dynamic graph data structures
List graph processing frameworks
List graph applications
Package graph processing frameworks

9. Bibliography

[1] A. Langville and C. Meyer, "Deeper Inside PageRank," Internet Math., vol. 1, no. 3, pp. 335–380, Jan. 2004, doi: 10.1080/15427951.2004.10129091.
[2] M. Besta, M. Fischer, V. Kalavri, M. Kapralov, and T. Hoefler, "Practice of Streaming and Dynamic Graphs: Concepts, Models, Systems, and Parallelism," CoRR, vol. abs/1912.12740, 2019.
[3] Contributors to Wikimedia projects, "PageRank," Wikipedia, Jul. 2021. https://en.wikipedia.org/wiki/PageRank (accessed Mar. 01, 2021).
[4] J. Leskovec, "PageRank Algorithm, Mining Massive Datasets (CS246), Stanford University," YouTube, 2019.
[5] P. Garg and K. Kothapalli, "STIC-D: Algorithmic techniques for efficient parallel pagerank computation on real-world graphs," in Proceedings of the 17th International Conference on Distributed Computing and Networking - ICDCN '16, New York, New York, USA, Jan. 2016, pp. 1–10, doi: 10.1145/2833312.2833322.
[6] Contributors to Wikimedia projects, "Nvidia Tesla," Wikipedia, Apr. 2021. https://en.wikipedia.org/wiki/Nvidia_Tesla (accessed Jun. 01, 2021).
[7] NVIDIA Corporation, "NVIDIA Tesla V100 GPU Architecture Whitepaper," NVIDIA Corporation, 2017. Accessed: Jul. 13, 2021. [Online]. Available: https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf