|
计算机科学 2007
Continuous Distributed Top-k Monitoring over Data Streams
|
Abstract:
Monitoring data streams in a distributed system is the focus of much research in recent years.This paper addresses the generic and efficient processing of distributed top-k monitoring,which is continuously reporting the k largest values according to a user-specified ranking function over distributed multiple data streams.In practice,the user-specified ranking function would be arbitrary ranking function.Unfortunately,state-of-art distributed top-k monitoring approaches only support the sum function as the ranking function.In this paper,we present a general algorithm GMR for distributed top-k monitoring,which supports arbitrary continuous and strict monotone aggregation functions.The communication cost of GMR is independent of k.We verify the effectiveness of GMR empirically using both real-world and synthetic data sets.We show that GMR reduces overall communication cost by an order of magnitude compared with alternatives.