in this post i review the paper  and i would like to reference more information related to “Sampling of Graph” .
As we know we need the sampling to estimate the important properties of large networks. As i mentioned in post  there are two objective for sampling. therefor based on the goal, the design of sampling will vary. another factor that has impact graph sampling is the query mechanisms.
there are two types of querying mechanisms :
- backend querying : it is available when on can obtain the value of an entity characteristics from population graph. for example : degree of a sample node refers to its degree in population graph
- frontend querying : it is available when on can obtain the value of an entity characteristics from the sample graph. (sampling graph is a representative subgraph).
paper claims that many studies in Internet Topology do not utilize the design considerations. so they conduct a large scale experimental study to demonstrate the relation between various sampling designs and the accuracy of resulting samples in estimating various characteristics of population graphs. the followings are the topological characteristics to study in this paper:
- degree of a vertex i and degree distribution of a graph
- clustering coefficient : which is a measure of closed connections appearing among vertex groups in a graph and it depends on the number of connected pairs between all neighbors of vertex i.
- path between nodes i and j :which is a sequence of consecutive edges starting from vertex i and ending at node j. another important is path length distribution of a graph.
there are several types of graphs, the paper only mentions three types of them which are frequently used.
- Random Graphs : graphs that are obtained by using Gilbert/Erdos-Renyi model.
- Graphs with Power-Law Degree Distribution : which are called scale-free graphs and they can be obtained by using Barbasi-Albert model.
- Small-World Graphs : which are the graphs with a large average clustering coefficient and a small average shortest path. having large clustering coefficient implies the existence of highly clustered cliques in the graph. Also small shortest path refers to the existence of hub vertices or hub cliques within the graph. these graphs are obtained by using Watts-Strogatz model .
in this study these three types of graphs are underlying population graph. in graph sampling applications sampling methods are mainly determined based on the data collection mechanisms. the following sampling methods are used to collect network samples from each type.
- Random Vertex : selects a number of vertices with equal probability.
- Induced Random Vertex : which includes the existing edges from the population graph among the randomly selected vertices .
- Random Edge : selects a number of edges with equal probability .
- Induced Random Edge: which includes the existing vertices from the population graph among the randomly selected edges.
- Random Walk : selects a random vertex as current vertex, then at each step , it selects a new vertex among the immediate neighbors of the current vertex and designates it as the new current vertex.
- Metropolized Random Walk: removes the bias introduced by Random Walk. the bias in Random Walk is that is it likely to choose vertices with higher degree, and MRW removes this bias.
- Random Shortest Path : selects the paths among all possible shortest paths with equal probability. one of the problem of this method is that finding all shortest paths in a graph is costly. the variations of this method are (k,k) sampling which is sampling among k vertices and (k,m) sampling which sampling between k sources and m destinations.
for measuring the difference between the sample and the population distributions the Jensen-Shannon Divergence has been used. the sample size is 10% of each population graph.