Abstract:
Today's Internet maps, which are all collected from a small number of vantage points, fall short of being accurate. We suggest here a paradigm shift for this task. DIMES is a distributed measurement infrastructure for the Internet that is based on the deployment of thousands of lightweight measurement agents around the globe. We describe the rationale behind the DIMES deployment, discuss its design trade-offs and algorithmic challenges, and analyze the structure of the Internet as it is seen with DIMES.

Abstract:
The Internet is constantly changing, and its hierarchy was recently shown to become flatter. Recent studies of inter-domain traffic showed that large content providers drive this change by bypassing tier-1 networks and reaching closer to their users, enabling them to save transit costs and reduce reliance on transit networks as new services are deployed and traffic shaping becomes increasingly popular. In this paper we take a first look at the evolving connectivity of large content provider networks, from a topological point of view of the autonomous systems (AS) graph. We perform a 5-year longitudinal study of the topological trends of large content providers, by analyzing several large content providers and comparing these trends to those observed for large tier-1 networks. We study trends in the connectivity of the networks, neighbor diversity and geographical spread, their hierarchy, the adoption of IXPs as a convenient method for peering, and their centrality. Our observations indicate that content providers gradually increase and diversify their connectivity, enabling them to improve their centrality in the graph, and as a result, tier-1 networks lose dominance over time.

Abstract:
In highly distributed Internet measurement systems, distributed agents periodically measure the Internet using a tool called {\tt traceroute}, which discovers a path in the network graph. Each agent performs many traceroute measurements to a set of destinations in the network, and thus reveals a portion of the Internet graph as it is seen from the agent locations. In every period we need to check whether previously discovered edges still exist in this period, a process termed {\em validation}. To this end we maintain a database of all the different measurements performed by each agent. Our aim is to be able to {\em validate} the existence of all previously discovered edges in the minimum possible time. In this work we formulate the validation problem as a generalization of the well-known set cover problem. We reduce the set cover problem to the validation problem, thus proving that the validation problem is ${\cal NP}$-hard. We present an $O(\log n)$-approximation algorithm for the validation problem, where $n$ is the number of edges that need to be validated. We also show that unless ${\cal P = NP}$ the approximation ratio of the validation problem is $\Omega(\log n)$.
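
The link to set cover can be illustrated with the classic greedy heuristic, which is known to achieve a logarithmic approximation ratio for plain set cover. The sketch below is not the algorithm of the paper, which treats a generalization of set cover. It assumes a hypothetical input in which each stored measurement is represented only by the set of edges it is expected to traverse, and the names `greedy_cover` and `measurements` are illustrative.

```python
def greedy_cover(edges_to_validate, measurements):
    """Greedily pick measurements until every edge is covered.

    measurements: dict mapping a (hypothetical) measurement id to the set
    of edges that measurement is expected to traverse.
    Returns the list of chosen measurement ids, in the order chosen.
    """
    uncovered = set(edges_to_validate)
    chosen = []
    while uncovered:
        # Pick the measurement that covers the most still-uncovered edges.
        best = max(
            measurements,
            key=lambda m: len(measurements[m] & uncovered),
            default=None,
        )
        gained = measurements[best] & uncovered if best is not None else set()
        if not gained:
            raise ValueError("some edges are not covered by any measurement")
        chosen.append(best)
        uncovered -= gained
    return chosen


if __name__ == "__main__":
    ab, bc, cd, de = ("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")
    stored = {"m1": {ab, bc}, "m2": {bc, cd}, "m3": {cd, de}}
    print(greedy_cover({ab, bc, cd, de}, stored))
```

In this toy instance two of the three stored measurements are enough to revalidate all four edges.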

Abstract:
The geographical location of Internet IP addresses is important both for academic research and for commercial applications. Thus, both commercial and academic databases and tools are available for mapping IP addresses to geographic locations. Evaluating the accuracy of these mapping services is complex since obtaining diverse, large-scale ground truth is very hard. In this work we evaluate mapping services using an algorithm that groups IP addresses into PoPs, based on structure and delay. This way we are able to group close to 100,000 IP addresses worldwide into groups that are known to share a geo-location with high confidence. We provide insight into the strengths and weaknesses of IP geolocation databases, and discuss their accuracy and encountered anomalies.

Abstract:
Using Dijkstra's algorithm to compute the shortest paths in a graph from a single source node to all other nodes is common practice in industry and academia. Although the original description of the algorithm advises using a Fibonacci Heap as its internal queue, it has been noted that in practice, a binary (or $d$-ary) heap implementation is significantly faster. This paper introduces an even faster queue design for the algorithm. Our experimental results currently put our prototype implementation at about twice as fast as the Boost implementation of the algorithm on both real-world and generated large graphs. Furthermore, this preliminary implementation was written in only a few weeks, by a single programmer. The fact that such an early prototype compares favorably against Boost, a well-known open source library developed by expert programmers, gives us reason to believe our design for the queue is indeed better suited to the problem at hand, and the favorable time measurements are not a product of any specific implementation technique we employed.
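
For reference, the binary-heap baseline mentioned above can be sketched as follows. This is only the conventional variant built on Python's `heapq` with lazy deletion of stale entries. It is not the new queue design of the paper, which the abstract does not describe. The adjacency-list format (`graph[u]` is a list of `(neighbor, weight)` pairs) is an assumption made for the example.

```python
import heapq


def dijkstra(graph, source):
    """Single-source shortest paths with a binary heap.

    graph: dict mapping each node to a list of (neighbor, weight) pairs,
    with non-negative weights. Nodes must be mutually comparable, because
    ties on distance are broken by comparing the nodes themselves.
    Returns a dict of shortest distances to every reachable node.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter path to u was already found
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


if __name__ == "__main__":
    g = {"s": [("a", 1), ("b", 4)], "a": [("b", 2)], "b": []}
    print(dijkstra(g, "s"))
```

Instead of a decrease-key operation, the sketch pushes a fresh entry and skips outdated ones when they are popped, which is a common way to use a binary heap here.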

Abstract:
Consider the setting of \emph{randomly weighted graphs}, namely, graphs whose edge weights are chosen independently according to probability distributions with finite support over the non-negative reals. Under this setting, properties of weighted graphs typically become random variables and we are interested in computing their statistical features. Unfortunately, this turns out to be computationally hard for some properties, even though the problem of computing them in the traditional setting of algorithmic graph theory is tractable. For example, there are well-known efficient algorithms that compute the \emph{diameter} of a given weighted graph, yet computing the \emph{expected} diameter of a given randomly weighted graph is \SharpP{}-hard even if the edge weights are identically distributed. In this paper, we define a family of properties of weighted graphs and show that for each property in this family, the problem of computing the \emph{$k^{\text{th}}$ moment} (and in particular, the expected value) of the corresponding random variable in a given randomly weighted graph $G$ admits a \emph{fully polynomial time randomized approximation scheme (FPRAS)} for every fixed $k$. This family includes fundamental properties of weighted graphs such as the diameter of $G$, the \emph{radius} of $G$ (with respect to any designated vertex) and the weight of a \emph{minimum spanning tree} of $G$.
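
To make the quantity concrete, the expected diameter of a very small randomly weighted graph can be computed exactly by enumerating every joint realization of the edge weights. The sketch below does exactly that and is not the FPRAS of the paper. Its running time grows exponentially with the number of edges, which is consistent with the hardness result quoted above. The input format and the function names are assumptions made for illustration.

```python
from itertools import product


def diameter(n, weighted_edges):
    """Diameter of an undirected weighted graph on nodes 0..n-1 (Floyd-Warshall)."""
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for (u, v), w in weighted_edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return max(max(row) for row in d)


def expected_diameter(n, edge_dists):
    """Exact expectation by brute force over all weight realizations.

    edge_dists: dict mapping an edge (u, v) to a list of
    (weight, probability) pairs with finite support.
    """
    edges = list(edge_dists)
    total = 0.0
    for combo in product(*(edge_dists[e] for e in edges)):
        prob = 1.0
        for _, p in combo:
            prob *= p  # edges are independent, so probabilities multiply
        weights = [w for w, _ in combo]
        total += prob * diameter(n, list(zip(edges, weights)))
    return total


if __name__ == "__main__":
    coin = [(1, 0.5), (3, 0.5)]
    print(expected_diameter(3, {(0, 1): coin, (1, 2): coin}))
```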

Abstract:
Modeling complex networks has been the focus of much research for over a decade. Preferential attachment (PA) is considered a common explanation for the self-organization of evolving networks, suggesting that new nodes prefer to attach to more popular nodes. The PA model results in broad degree distributions, found in many networks, but cannot explain other common properties such as the growth of nodes arriving late and clustering (community structure). Here we show that when the tendency of networks to adhere to trends is incorporated into the PA model, it can produce networks with such properties. Namely, in trending networks, newly arriving nodes may become central at random, forming new clusters. In particular, we show that when the network is young it is more susceptible to trends, but even older networks may have trendy new nodes that become central in their structure. Alternatively, networks can be seen as composed of two parts: a static part, governed by a power law degree distribution, and a dynamic part governed by trends, as we show on Wiki pages. Our results also show that the arrival of trending new nodes not only creates new clusters, but also has an effect on the relative importance and centrality of all other nodes in the network. This can explain a variety of real-world networks in economics, social and online networks, and cultural networks. Product popularity, formed by the network of people's opinions, exhibits these properties. Some lines of products are increasingly susceptible to trends and hence to shifts in popularity, while others are less trendy and hence more stable. We believe that our findings have a significant impact on our understanding of real networks.
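
As a point of reference, the baseline PA mechanism (without trends) can be sketched in a few lines. The trend mechanism of the paper is not specified in the abstract and is therefore not modeled here. The sketch assumes the simplest variant, in which each new node attaches with a single edge to an existing node chosen with probability proportional to its degree. The function name and parameters are illustrative.

```python
import random


def preferential_attachment(n_nodes, seed=None):
    """Grow a tree by plain preferential attachment (n_nodes >= 2).

    Returns a list of edges (new_node, target), starting from the edge (0, 1).
    """
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list selects a node with probability proportional to
    # its current degree.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges


if __name__ == "__main__":
    tree = preferential_attachment(1000, seed=42)
    degree = {}
    for u, v in tree:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    print("max degree:", max(degree.values()))
```

In this baseline, early nodes tend to accumulate the highest degrees, which is the behavior that late-arriving central nodes in real networks contradict.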

Abstract:
Resource allocation for cloud services is a complex task due to the diversity of the services and the dynamic workloads. One way to address this is by overprovisioning, which results in high cost due to the unutilized resources. A much more economical approach, relying on the stochastic nature of the demand, is to allocate just the right amount of resources and use additional, more expensive mechanisms in case of overflow situations where demand exceeds the capacity. In this paper we study this approach and show, both by comprehensive analysis for independent normally distributed demands and by simulation on synthetic data, that it is significantly better than currently deployed methods.
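
The benefit of exploiting the stochastic nature of demand can be illustrated with a small calculation for independent normal demands. The sketch below is not the allocation scheme of the paper. It only assumes that capacity is sized so that the probability of overflow stays below a chosen target, and it compares sizing for the pooled demand against sizing each service separately. The function names and the numbers in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pooled_capacity(demands, overflow_prob):
    """Capacity for the sum of independent normal demands.

    demands: list of (mean, std) pairs with std > 0.
    The sum is normal with the summed mean and the summed variance.
    """
    mean = sum(m for m, _ in demands)
    std = sqrt(sum(s * s for _, s in demands))
    return NormalDist(mean, std).inv_cdf(1.0 - overflow_prob)


def separate_capacity(demands, overflow_prob):
    """Total capacity when every service is sized on its own."""
    return sum(NormalDist(m, s).inv_cdf(1.0 - overflow_prob) for m, s in demands)


if __name__ == "__main__":
    services = [(10.0, 3.0), (10.0, 4.0)]
    print("pooled:  ", pooled_capacity(services, 0.05))
    print("separate:", separate_capacity(services, 0.05))
```

For any overflow target below one half, the pooled capacity is smaller, because standard deviations of independent demands add in quadrature rather than linearly.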

Abstract:
The discovery of Autonomous Systems (ASes) interconnections and the inference of their commercial Type-of-Relationships (ToR) have been extensively studied during the last few years. The main motivation is to accurately calculate AS-level paths and to provide a better topological view of the Internet. An inherent problem in current algorithms is their extensive use of heuristics. Such heuristics incur unbounded errors which are spread over all inferred relationships. We propose a near-deterministic algorithm for solving the ToR inference problem. Our algorithm uses as input the Internet core, which is a dense sub-graph of top-level ASes. We test several methods for creating such a core and demonstrate the robustness of the algorithm to the core's size and density, the inference period, and errors in the core. We evaluate our algorithm using AS-level paths collected from RouteViews BGP paths and DIMES traceroute measurements. Our proposed algorithm deterministically infers over 95% of the approximately 58,000 AS topology links. The inference becomes stable when using a week's worth of data and as few as 20 ASes in the core. The algorithm infers 2-3 times more peer-to-peer relationships in edges discovered only by DIMES than in RouteViews edges, validating the DIMES promise to discover periphery AS edges.

Abstract:
The k-shell decomposition of a random graph provides a different and more insightful separation of the roles of the different nodes in such a graph than does the usual analysis in terms of node degrees. We develop this approach in order to analyze the Internet's structure at a coarse level, that of the "Autonomous Systems" or ASes, the subnetworks out of which the Internet is assembled. We employ new data from DIMES (see http://www.netdimes.org), a distributed agent-based mapping effort which at present has attracted over 3800 volunteers running more than 7300 DIMES clients in over 85 countries. We combine this data with the AS graph information available from the RouteViews project at the University of Oregon, and have obtained an Internet map with far more detail than any previous effort. The data suggests a new picture of the AS-graph structure, which distinguishes a relatively large, redundantly connected core of nearly 100 ASes and two components that flow data in and out from this core. One component is fractally interconnected through peer links; the second makes direct connections to the core only. The resulting model has superficial similarities with and important differences from the "Jellyfish" structure proposed by Tauro et al., so we call it a "Medusa." We plan to use this picture as a framework for measuring and extrapolating changes in the Internet's physical structure. Our k-shell analysis may also be relevant for estimating the function of nodes in the "scale-free" graphs extracted from other naturally-occurring processes.
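
The k-shell decomposition itself follows a standard pruning procedure: repeatedly remove all nodes of remaining degree at most k, assign them to shell k, and increase k once no such node is left. The sketch below shows that procedure on a toy graph. It is a minimal illustration, not the analysis pipeline of the paper, and it assumes an undirected graph given as a symmetric adjacency dict without self-loops.

```python
def k_shells(adjacency):
    """Return {node: shell index} for an undirected graph.

    adjacency: dict mapping each node to the set of its neighbors
    (symmetric, no self-loops).
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose remaining degree is at most k belong to shell k.
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed via a duplicate entry
            remaining.discard(v)
            shell[v] = k
            for u in adjacency[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)
    return shell


if __name__ == "__main__":
    # A triangle a-b-c with a pendant node d attached to a.
    g = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
    print(k_shells(g))
```

In the toy graph, node `a` has the highest degree but falls in the same shell as `b` and `c`, which shows how the shell index differs from a plain degree ranking.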