Error-Adaptive and Time-Aware Maintenance of Frequency Counts over Data Streams
Abstract
Maintaining frequency counts for items over data streams has a wide range of applications, such as web advertisement fraud detection. The problem has attracted great attention from both researchers and practitioners, and many algorithms have been proposed. In this paper, we propose a new method, error-adaptive pruning, to maintain frequency counts more accurately. We also propose a method called fractionization to record time information together with the frequency information. Using these two methods, we design three algorithms for finding frequent items and top-k frequent items. Experimental results show that these methods are effective in improving maintenance accuracy.
This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 70471006 and 70321001, and by the U.S. National Science Foundation under grants NSF IIS-02-09199 and IIS-03-08215.
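For readers unfamiliar with the problem setting, the following is a minimal sketch of a classic counter-based baseline, the Misra-Gries summary, which keeps a bounded number of counters and prunes them when space runs out. It is not the error-adaptive pruning or fractionization method proposed in the paper, whose details the abstract does not give; the function name `misra_gries` and the parameter `k` are assumptions made for illustration only.

```python
def misra_gries(stream, k):
    """Approximate frequency counts using at most k - 1 counters.

    Illustrative baseline only (not the paper's algorithm). Every item
    whose true frequency exceeds n / k, where n is the stream length,
    is guaranteed to remain in the returned dictionary. Each stored
    count underestimates the true count by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already monitored: count it.
            counters[item] += 1
        elif len(counters) < k - 1:
            # Free slot available: start monitoring the item.
            counters[item] = 1
        else:
            # No free slot: prune by decrementing every counter
            # and dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    clicks = ["a", "b", "a", "c", "a", "d", "a", "b"]
    print(misra_gries(clicks, k=3))
```

In this toy stream the item "a" occurs in 4 of 8 positions, above the n / k threshold, so it survives pruning. The fixed decrement used here is the part that an accuracy-oriented pruning scheme would aim to improve on.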
Author information
Authors and Affiliations
- Tsinghua University, 100084, China: Hongyan Liu
- University of Illinois, Urbana, Champaign, 61801, USA: Ying Lu & Jiawei Han
- Renmin University of China, 100872, China: Jun He
Editor information
Editors and Affiliations
- Chinese University of Hong Kong, Hong Kong, China: Jeffrey Xu Yu
- Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan: Masaru Kitsuregawa
- Department of Computing, Hong Kong Polytechnic University, Hong Kong: Hong Va Leong
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, H., Lu, Y., Han, J., He, J. (2006). Error-Adaptive and Time-Aware Maintenance of Frequency Counts over Data Streams. In: Yu, J.X., Kitsuregawa, M., Leong, H.V. (eds) Advances in Web-Age Information Management. WAIM 2006. Lecture Notes in Computer Science, vol 4016. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11775300_41
- DOI: https://doi.org/10.1007/11775300_41
- Publisher Name: Springer, Berlin, Heidelberg
- Print ISBN: 978-3-540-35225-9
- Online ISBN: 978-3-540-35226-6
- eBook Packages: Computer Science; Computer Science (R0); Springer Nature Proceedings Computer Science