2019 KDD Knowledge Sharing

Jenny Ching
Sep 12, 2019


The Knowledge Discovery and Data Mining conference (KDD) was hosted in Alaska this August 4–8. This was my first time attending, and I found it offers a good mix of research and practical data science work in the deep neural network field. In this post, I want to share some learnings, especially around graph neural networks (GNNs).

Why GNN?

So why GNNs, and how do they differ from the CNNs and RNNs we already know? In "The Book of Why", Judea Pearl emphasizes the importance of graphs in cognitive ability: it is not just statistics but cause-and-effect analysis and logical inference, and logical inference is tied to graphs. If we compare humans to these deep learning networks, CNNs, RNNs, and attention models like BERT are more like System 1, while graph neural networks are more like System 2, where we perform logical inference on things we already know. Another reason for using graphs is that not everything can be represented as a sequence (RNN) or a grid (CNN); real life contains far more complex networks.

GNNs follow a neighborhood aggregation scheme, where the representation vector of a node is computed by recursively aggregating and transforming representation vectors of its neighboring nodes. Many GNN variants have been proposed and have achieved state-of-the-art results on both node and graph classification tasks.
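To make the aggregation idea concrete, here is a minimal toy sketch in numpy (my own illustration, not taken from any particular paper) of one round of aggregating and transforming neighbor representations:

```python
import numpy as np

# Toy sketch of one round of neighborhood aggregation:
# each node's new representation is a transformed mean of its neighbors' vectors.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # adjacency matrix of a 3-node graph
H = np.random.randn(3, 4)                # current node representations (3 nodes, 4 dims)
W = np.random.randn(4, 4)                # stand-in for a learnable transformation

deg = A.sum(axis=1, keepdims=True)       # node degrees
H_next = np.tanh((A @ H / deg) @ W)      # aggregate neighbors, transform, nonlinearity
print(H_next.shape)                      # (3, 4)
```

Stacking several such rounds lets information propagate over multi-hop neighborhoods, which is what GNN variants build on.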

Graph Representation

To learn from a network, we first need a way to represent the graph so it can be used as input to ML models.

The most intuitive way is to represent the graph with an adjacency matrix, where the rows and columns represent the nodes and each value represents the relationship between a pair of nodes.
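For example, a small undirected graph given as an edge list can be turned into its adjacency matrix like this (a toy illustration in numpy):

```python
import numpy as np

# An undirected 4-node graph given as an edge list, converted to its adjacency matrix.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1          # undirected: the relationship is symmetric
print(A)
# entry (i, j) is 1 when nodes i and j are connected, 0 otherwise
```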

Network Embedding

But simply representing the graph is not enough. As the number of nodes grows, the matrix grows quadratically, so using an adjacency matrix as a feature space for large graphs is almost impossible: imagine a graph with 1M nodes and an adjacency matrix of 1M x 1M. Embeddings are more practical than the adjacency matrix since they pack node properties into a vector with a much smaller dimension. A network embedding is a compressed version of the graph representation.

The basic idea behind node embedding approaches is to use dimensionality reduction techniques to distill the high-dimensional information about a node’s neighborhood into a dense vector embedding. These node embeddings can then be fed to downstream machine learning systems and aid in tasks such as node classification, clustering, and link prediction.

The key distinction between representation learning approaches and previous work is how they treat the problem of capturing structural information about the graph. Previous work treated this problem as a pre-processing step, using hand-engineered statistics to extract structural information. In contrast, representation learning approaches treat it as a machine learning task in itself, using a data-driven approach to learn embeddings that encode graph structure.

Evolution of Network Embedding

Many ways of embedding a network have evolved over time, and people categorize them quite differently.

One of the reasons why GNNs are not widely used is that graphs are far more complicated than sequences and grids, and there is no standard way of representing a graph. More difficulties are listed below:

  • Large and variable output space: for n nodes we need to generate n×n values, and graph size (nodes, edges) varies
  • Non-unique representations: an n-node graph can be represented in n! ways, which makes objective functions (e.g., reconstruction error) hard to compute and optimize
  • Complex dependencies: edge formation has long-range dependencies

Skip-gram Based Graph Embedding — DeepWalk

The first kind is Skip-gram Based Graph Embedding, which is inspired by Word2vec.

Word2vec is an embedding method that transforms words into embedding vectors; similar words should have similar embeddings. Word2vec uses the skip-gram model, a neural network with one hidden layer that is trained to predict a word's neighbors in a sentence. Skip-gram based graph embedding borrows this principle: just as Word2vec uses skip-gram to predict neighboring words in a sentence, skip-gram in DeepWalk takes a node from a random walk as a one-hot input vector and maximizes the probability of predicting its neighboring nodes in the walk.
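As a rough sketch of the idea (assuming networkx and gensim >= 4 are available; this is not the official DeepWalk implementation), we can generate truncated random walks and feed them to a skip-gram Word2vec model as if they were sentences:

```python
import random
import networkx as nx
from gensim.models import Word2Vec

# DeepWalk-style sketch:
# 1) generate truncated random walks over the graph,
# 2) train skip-gram on the walks as if they were sentences.
G = nx.karate_club_graph()

def random_walk(graph, start, length=10):
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(node) for node in walk]   # Word2Vec expects string tokens

walks = [random_walk(G, node) for node in G.nodes() for _ in range(5)]

# sg=1 selects the skip-gram objective; each walk plays the role of a sentence
model = Word2Vec(sentences=walks, vector_size=64, window=5, sg=1, min_count=0, workers=1)
embedding_of_node_0 = model.wv["0"]       # 64-dimensional embedding for node 0
```

Nodes that tend to co-occur on the same walks end up with similar embeddings, mirroring how co-occurring words get similar vectors in Word2vec.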

More Skip-gram Based Graph Embedding

Below are more skip-gram based graph embedding methods, which vary in how the nodes are traversed.

GCN — Convolution-inspired Embedding

Another type is GCN, which is inspired by convolutional networks. A GCN is a neural network architecture for machine learning directly on graphs. Given a graph G = (V, E), a GCN takes as input an N × F⁰ feature matrix X, where N is the number of nodes and F⁰ is the number of input features for each node, and an N × N matrix representation of the graph structure, such as the adjacency matrix A of G.
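A single GCN layer can be sketched in numpy using the propagation rule H' = sigma(D^-1/2 (A + I) D^-1/2 H W) from Kipf & Welling; the weights below are random stand-ins rather than trained parameters:

```python
import numpy as np

# One GCN layer: add self-loops, symmetrically normalize the adjacency matrix,
# then transform the aggregated features and apply a nonlinearity.
def gcn_layer(A, X, W):
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt       # symmetric normalization
    return np.maximum(0, A_norm @ X @ W)           # ReLU(Â X W)

N, F0, F1 = 5, 3, 8                                # 5 nodes, 3 input features, 8 output features
A = (np.random.rand(N, N) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                     # random symmetric adjacency matrix
X = np.random.randn(N, F0)                         # N x F0 input feature matrix
W = np.random.randn(F0, F1)
H1 = gcn_layer(A, X, W)                            # N x F1 hidden representations
```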

GraphSAGE

Unlike previous methods, GraphSAGE can generate node embeddings for previously unseen data. It is a general inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate such embeddings. Instead of training an individual embedding for each node, GraphSAGE learns a function that generates embeddings by sampling and aggregating features from a node's local neighborhood, which enables embeddings to be quickly generated for unseen nodes.
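A toy version of the sample-and-aggregate step with a mean aggregator (my own simplification, not the paper's code) looks like this:

```python
import random
import numpy as np

# GraphSAGE-style step: sample a fixed number of neighbors, average their
# features, concatenate with the node's own features, then apply a learned map.
def sage_layer(features, neighbors, W, num_samples=3):
    out = []
    for node, feat in enumerate(features):
        nbrs = neighbors[node]
        sampled = random.sample(nbrs, min(num_samples, len(nbrs))) if nbrs else []
        nbr_mean = np.mean(features[sampled], axis=0) if sampled else np.zeros_like(feat)
        h = np.concatenate([feat, nbr_mean])       # self features ++ aggregated neighbors
        out.append(np.maximum(0, h @ W))           # ReLU after a linear map
    return np.array(out)

features = np.random.randn(4, 8)                   # 4 nodes, 8 features each
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]} # adjacency list
W = np.random.randn(16, 8)                         # maps concatenated (2*8) dims back to 8
H = sage_layer(features, neighbors, W)             # works for any node with features, seen or not
```

Because the layer is a function of features and local structure rather than a per-node lookup table, it can be applied to nodes that were never seen during training.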

Recommended Papers

Other Interesting Stuff

Originally published at http://pretteyandnerdy.wordpress.com on September 12, 2019.
