Using distributed local information to improve global performance in Grids

Grid computing refers to the federation of geographically distributed and heterogeneous computer resources. These resources may belong to different administrative domains, but are shared among users. Every grid includes a key component responsible for obtaining, distributing, indexing and archiving information about the configuration and state of services and resources. Optimizing the assignment of tasks and user requests to resources requires maintaining up-to-date information about the grid. In large-scale Grids, the dynamics of the resource information cannot be captured using a static hierarchy that relies on manual configuration and administration. It is necessary to design new policies for discovery and propagation of resource information. There is a growing interest in the interaction of Grid Computing and the Peer-to-Peer (P2P) paradigm, pushing towards scalable solutions. In this work, starting from the Best-Neighbor policy based on previously published ideas, the reasons behind its lack of performance are explored. A new, improved Best-Neighbor policy is proposed and analyzed, comparing it with the Random, Hierarchical and Super-Peer policies.


Introduction
Grid computing refers to the federation of geographically distributed and heterogeneous computer resources [1]. These resources may belong to different administrative domains, but are shared among users. A grid infrastructure may be confined to a small network of workstations within a corporation or span large public collaborations across many countries and networks.
Every grid infrastructure needs a component responsible for obtaining, distributing, indexing and archiving information about the configuration and state of services and resources. Optimizing the assignment of tasks and user requests to resources requires maintaining up-to-date information about the grid [2]. It is widely known that the standard centralized organization approach has several drawbacks [3]. A static hierarchy has become the de facto implementation of grid information systems [4].
In medium-to-large scale environments, the dynamics of the resource information cannot be captured using a static hierarchy [5]. This approach has drawbacks similar to those of the centralized one, such as a single point of failure and poor scaling for a large number of users/providers [6,7]. Therefore, it is necessary to design new policies for discovery and propagation of resource information.
There is a growing interest in the interaction of Grid Computing and the P2P paradigm, pushing towards scalable solutions [8,5]. These initiatives are based on two common features: i) a very dynamic and heterogeneous environment and ii) the creation of a virtual working environment by collecting the resources available from a series of distributed, individual entities [7].
Iamnitchi et al. [9,10] proposed a P2P approach for organizing the information components in a flat dynamic P2P network. This decentralized approach envisages that every administrative domain maintains its information services and makes them available as part of the P2P network. Schedulers may initiate look-up queries that are forwarded in the P2P network using flooding (an approach similar to that of the unstructured P2P network Gnutella [11]).
A key aspect of P2P systems is how peers interact with each other. Different algorithms for this interaction are available, and the choice may severely impact system performance. The most common policies are:
• Random: Every node randomly chooses any other node to query information from. There is no structure at all.
• Best-Neighbor: Some information about each answer is stored, and the next neighbor to query is selected using the quality of the previous answers. At the beginning, the node has no information about its neighbors, so it chooses randomly. As information is collected, the probability of choosing a neighbor randomly is inversely proportional to the amount of information stored.
• Super-Peer: Some nodes are defined as super-peers, working as servers for a subset of nodes and as peers in the network of super-peers. In this way, a two-level structure is defined in which normal nodes are allowed to talk only with a single super-peer and the cluster defined by it.
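The selection rule of the Best-Neighbor policy described above can be illustrated with a minimal sketch. It is not taken from any of the cited implementations: the function name `choose_neighbor`, the `history` structure, the use of a mean answer quality, and the concrete exploration probability 1 / (1 + stored answers) are all assumptions chosen only to make "inversely proportional to the amount of information stored" concrete.

```python
import random


def choose_neighbor(neighbors, history, rng=random):
    """Pick the next neighbor to query (illustrative sketch only).

    neighbors: list of neighbor identifiers.
    history:   hypothetical dict mapping a neighbor to the list of
               quality values recorded for its previous answers.
    rng:       object exposing random() and choice(), e.g. the random module.
    """
    stored = sum(len(answers) for answers in history.values())
    # ASSUMPTION: the probability of a random choice decays as 1 / (1 + stored).
    p_random = 1.0 / (1.0 + stored)

    known = [n for n in neighbors if history.get(n)]
    if not known or rng.random() < p_random:
        # No experience yet, or exploration step: choose randomly.
        return rng.choice(neighbors)

    # Exploitation step: neighbor with the best mean answer quality.
    return max(known, key=lambda n: sum(history[n]) / len(history[n]))
```

With an empty history the function always falls back to a random choice; as more answers are stored, the random branch is taken less and less often.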
Mastroianni et al. [12] evaluated the performance of these policies and analyzed the pros and cons of each solution. Although most of the evaluated aspects strongly depend on time, time is usually left out of the analysis, which misses the inherently dynamic nature of the system. Other structured P2P approaches have also been proposed; see, for example, the work of Basu et al. [13].
In Mocskos et al. [14] the authors introduced a new set of metrics (LIR, GIR and GIV) that incorporate the notion of time decay of information for evaluating system performance. The best results in terms of the proposed metrics were attained by the hierarchical policy, followed by super-peer, which outperformed random and best-neighbor.
Iamnitchi et al. [9,10] introduced the Best-Neighbor policy, which records the requests answered by each node and directs subsequent requests to the peer that previously answered, or chooses randomly if no relevant experience exists. Following the taxonomy proposed in Ranjan et al. [3], this approach can be included in the class of unstructured and non-deterministic P2P systems.
Based on these results, Mocskos et al. [14] showed some good initial results for this policy: Best-Neighbor achieves good performance and, most importantly, the overall information known by the system increases with time. However, in later studies with larger systems, Best-Neighbor performs similarly to the Random policy and GIR no longer increases with time (data not shown).
In this work, we start from the Best-Neighbor policy based on the ideas of Iamnitchi et al. [9,10] and explore the reasons behind its lack of performance. We propose and analyze some improvements to the policy. Finally, we compare the resulting policy with Random, Hierarchical and Super-Peer.

Materials and Methods
To evaluate the different scenarios and policies, we used GridMatrix, an open source tool focused on the analysis of discovery and monitoring information policies, based on SimGrid [16]. This simulator includes three different metrics [14] for the study of information propagation, described below:
• Local Information Rate (LIR): captures the amount of information that a particular host has about the entire grid at a given moment. For host k, LIR_k is defined [14] in terms of the following quantities: N is the number of hosts in the system, expiration_h is the expiration time of the resources of host h in host k, age_h is the time elapsed since the information was obtained from that host, resourceCount_h is the number of resources in host h, and totalResourceCount is the total number of resources in the whole grid.
• Global Information Rate (GIR): captures the amount of information that the whole grid knows about itself, calculated as the mean value of every node's LIR.
• Global Information Variability (GIV): measures the variability of GIR in the system (less is better), calculated as the standard deviation of GIR.
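To make the role of these quantities concrete, the following sketch computes the three metrics for a toy system. It is not the GridMatrix implementation. The linear freshness term 1 - age/expiration is an assumption introduced here (the text only states that the metrics incorporate time decay), GIV is interpreted as the standard deviation of the per-node LIR values, which is also an assumption, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def freshness(age, expiration):
    """ASSUMPTION: information decays linearly and is worthless once expired."""
    if expiration <= 0:
        return 0.0
    return max(0.0, 1.0 - age / expiration)


def lir(known, total_resource_count):
    """Local Information Rate of one host (illustrative sketch).

    known: dict mapping a host h to (age_h, expiration_h, resourceCount_h),
           as seen from the host whose LIR is being computed.
    """
    if total_resource_count <= 0:
        return 0.0
    weighted = sum(
        freshness(age, expiration) * resource_count
        for age, expiration, resource_count in known.values()
    )
    return weighted / total_resource_count


def gir(lir_values):
    """Global Information Rate: mean of every node's LIR."""
    return mean(lir_values)


def giv(lir_values):
    """Global Information Variability (assumed: std. dev. of per-node LIR)."""
    return pstdev(lir_values)
```

Under these assumptions, a host that holds fresh information about every resource in the grid has LIR equal to 1, and a host that knows nothing, or only expired information, has LIR equal to 0.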
Three topologies were used to study the information dynamics: Ring, Clique and Exponential (see figure 1). In a Ring topology, every node is connected to exactly two other nodes, forming a cycle (figure 1a). The Clique topology represents a scenario where every node is connected to every other node (figure 1b). To represent a more realistic network, the exponential distribution model is used for the connections, where the number of connections of each node follows an exponential distribution law (figure 1c), commonly seen in the Internet or collaborative networks [17,18]. All these topologies and scenarios were generated with the features included in the GridMatrix simulator.
The standard best-neighbor implementation (BN) ranks the nodes with a scoring function [14] in which RES_COUNT is the number of available resources in the node, RTT corresponds to the Round Trip Time and RESPONSE_FAILED counts the number of lost messages. a, b and c are parameters that set the weight of each variable.
We present fBN, an implementation of the Best-Neighbor policy that adds a new term to the scoring function to capture information about the amount of local resources of the node: the new term OWN_RES_COUNT is the number of local resources in the node, and d is the weighting coefficient of this variable.
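The difference between the BN and fBN rankings can be illustrated with a small sketch. The exact functional form of the scoring functions is not reproduced here, so the linear combination below, including the signs given to each term, is an assumption: resources are rewarded while round trip time and lost messages are penalized. The class, function names and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    res_count: int        # RES_COUNT: available resources reported by the node
    own_res_count: int    # OWN_RES_COUNT: resources local to the node itself
    rtt: float            # RTT: round trip time
    response_failed: int  # RESPONSE_FAILED: number of lost messages


def score(stats, a=1.0, b=1.0, c=1.0, d=0.0):
    """ASSUMED linear form; d = 0 mimics BN, d > 0 mimics fBN."""
    return (
        a * stats.res_count
        - b * stats.rtt
        - c * stats.response_failed
        + d * stats.own_res_count
    )


def best_neighbor(known, **weights):
    """Return the name of the highest-scoring known neighbor."""
    return max(known, key=lambda name: score(known[name], **weights))


# Hypothetical example: "relay" mostly forwards second-hand data,
# while "owner" reports fewer resources but all of them are its own.
known = {
    "relay": NeighborStats(res_count=50, own_res_count=2, rtt=0.1, response_failed=0),
    "owner": NeighborStats(res_count=30, own_res_count=30, rtt=0.1, response_failed=0),
}
bn_choice = best_neighbor(known)          # d = 0
fbn_choice = best_neighbor(known, d=1.0)  # extra term enabled
```

In this toy setting the d = 0 ranking prefers the relay node, whereas enabling the OWN_RES_COUNT term shifts the choice to the node holding first-hand information.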

Results and Discussion
The standard best-neighbor implementation (BN) strongly depends on knowing as many nodes as possible in the network. When the policy starts, nodes are selected randomly until sufficient nodes are known (some threshold value is selected), creating a local database with the information about the known neighbors. In medium-to-large scale networks, knowing the whole network can be very demanding, so the start of the best-neighbor strategy may be delayed, leading to an extremely long "random" stage (also known as the learning stage).
To shorten this stage, many methods have been proposed, from which we chose merging lists for our implementation. This technique consists of sharing the list of neighbors that a particular node holds with any other node that communicates with it. With such a simple mechanism, significant improvements are observed: all nodes learn about almost the whole network, greatly shortening the learning stage. In figure 2, we show the learning curve of the nodes in two exponential networks (30 and 400 nodes) using the merging list algorithm versus just randomly exploring the network. Using this improvement, all the nodes of the network are quickly known and the Best-Neighbor method can start choosing the most appropriate nodes (figure 2, blue lines). On the other hand, as network size grows, knowing every node in the infrastructure becomes increasingly demanding (figure 2, green lines), eventually leading to a situation where the learning stage becomes the strongly dominant phase.

Figure 2: Learning curve of the nodes in two exponential networks (30 and 400 nodes) using the merging list algorithm versus just using the randomly selected nodes. With the merge-list method, all nodes of the network are known in very few steps, using fewer messages.
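The list-merging step can be sketched as follows. This is a simplified illustration rather than the GridMatrix code: the function name `contact`, the dictionary of neighbor sets, and the choice that the contacted node also learns about the requester are assumptions.

```python
def contact(requester, responder, known):
    """Simulate one message exchange with neighbor-list merging.

    known: dict mapping each node to the set of nodes it already knows.
    The responder shares its list, and the requester merges it into its own.
    ASSUMPTION: the responder also records the requester as a known node.
    """
    shared = set(known.setdefault(responder, set()))  # list sent by the responder
    own = known.setdefault(requester, set())
    own |= shared | {responder}
    own.discard(requester)          # a node does not list itself
    known[responder].add(requester)
    return known


# Hypothetical four-node example: "a" initially knows only "b".
known = {"a": {"b"}, "b": {"c", "d"}, "c": set(), "d": set()}
contact("a", "b", known)
```

After a single exchange, node "a" knows every other node in this toy network, whereas purely random exploration would need at least one message per newly discovered node.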
Once the network is sufficiently well known, the Best-Neighbor method can rank the nodes to connect with and select the most informative one according to the scoring function. This function involves an implicit relation between its weighting coefficients (see Materials and Methods for details). The choice of each weight in this relation determines which aspects of the system are emphasized; in this work, a standard set of parameters was used, following previous works [19,20,14]. With the standard scoring function, the policy may select a node that holds little information of its own and instead has lots of data about its neighbors. This penalizes the amount of information collected, because of the time delay in the propagation of information.
In figure 3, the green line shows the evolution of GIR for this situation, performing just above the random policy (blue line). To overcome this problem, we introduce fBN, a modification to the original Best-Neighbor policy that takes into account the amount of information of its own that a node has available. In all the topologies and network sizes studied (only the 400-node networks are shown in figure 3), fBN outperforms the Random and BN policies.
Next, we compare this new implementation with the different policies: Random, Hierarchical and Super-peer. These policies have different administration needs. Hierarchical consists of a human-supervised construction of a logical hierarchy using the nodes. Evidently, this policy has a very high cost of configuration and maintenance, but results in very high GIR values. We compare this policy with another policy that needs very little supervision: Super-peer. In the setup used, 100 nodes are selected to act as super-peers. Finally, we present the comparison with the proposed implementation of Best-Neighbor, a completely unsupervised policy.
In figure 4, we show the evolution of GIR for the policies described above. Data is smoothed by taking the moving average over 5 points to each side of each point. Hierarchical (red line) shows the highest GIR values, far from the other implementations. This policy shows a lower GIR in the case of the Ring topology due to the underlying network infrastructure and the longer paths needed to send messages between the nodes. On the other hand, Random (blue line) shows the worst values. In the middle, closer to the Random policy, Super-peer (cyan line) and best-neighbor (green line) show a similar average GIR in the case of the Exponential topology. For the other two topologies, Super-peer overlaps with the Random policy. Super-peer results in a more variable GIR over time, while best-neighbor shows a very stable behavior.
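The smoothing applied to the GIR curves can be reproduced with a centered moving average. The sketch below is illustrative: the function name is hypothetical, and the handling of the first and last points (averaging over whatever part of the window exists) is an assumption, since the treatment of the series boundaries is not specified.

```python
def moving_average(values, half_window=5):
    """Centered moving average using `half_window` points on each side.

    ASSUMPTION: near the ends of the series the window is truncated
    to the points that actually exist.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

With the default `half_window=5`, each interior point is replaced by the mean of 11 consecutive samples.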

Conclusions
Grid computing refers to the federation of geographically distributed and heterogeneous computer resources. Every grid infrastructure needs a component responsible for obtaining, distributing, indexing and archiving information about the configuration and state of services and resources. The dynamics of the resource information cannot be captured using a static hierarchy, which has drawbacks similar to those of the centralized approach. Therefore, it is necessary to design new policies for discovery and propagation of resource information.
Four policies are usually considered: Random, Best-Neighbor, Super-Peer and Hierarchical, all of which have different administration and supervision needs.
Two modifications are introduced to improve the performance of the Best-Neighbor policy, yielding fBN: i) the lists of neighbors are merged during the learning stage to shorten this phase, and ii) a new term that considers the amount of local resources provided by the node is added to the scoring function.
fBN presents a short learning phase, which remains almost constant across the considered system sizes. In addition, fBN outperforms the Random policy and shows behavior similar to Super-peer. Hierarchical shows the best performance, but it is the policy that needs the most setup and administration.
fBN offers a good trade-off between full automation and the performance obtained.

Figure 1: Schemes of the network topologies analysed in this work. In the Ring topology (a) each node connects to exactly two other nodes, while the Clique (b) is an all-to-all connected network. The Exponential topology (c) is formed following an exponential distribution law.

Figure 3: Evolution of GIR for Random, BN and fBN policies in three topologies with 400 nodes. BN shows behavior similar to Random, while fBN outperforms both of the other policies.

Figure 4: Evolution of GIR for Random, fBN, Super-Peer (100 nodes) and Hierarchical policies. fBN shows behavior similar to Super-Peer, indicating that unsupervised methods may obtain results comparable to supervised ones. Both perform better than Random, but remain very far from the Hierarchical policy.