Fisher Information in Flow Size Distribution Estimation

Abstract

The flow size distribution is a useful metric for traffic modeling and management. Its estimation based on sampled data, however, is problematic. Previous work has shown that flow sampling (FS) offers enormous statistical benefits over packet sampling, but its high resource requirements preclude its use in routers. We present dual sampling (DS), a two-parameter family that, to a large extent, provides FS-like statistical performance by approaching FS continuously, with just packet-sampling-like computational cost. Our work utilizes a Fisher information-based approach recently used to evaluate a number of sampling schemes, excluding FS, for TCP flows. We revise and extend the approach to make rigorous and fair comparisons between FS, DS, and others. We show how DS significantly outperforms other packet-based methods, including Sample and Hold, the closest packet sampling-based competitor to FS. We describe a packet sampling-based implementation of DS and analyze its key computational costs to show that router implementation is feasible. Our approach offers insights into numerous issues, including the notion of "flow quality" for understanding the relative performance of methods, and how and when employing sequence numbers is beneficial. Our work is theoretical with some simulation support and case studies on Internet data.
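To make the contrast between packet sampling and FS concrete, the following is a minimal sketch and not the paper's method. It assumes the standard textbook definitions: packet sampling keeps each packet independently with probability p, and FS keeps each whole flow independently with probability q. The function names, the toy flow population, and the sampling rates are hypothetical choices made for illustration only. DS is not implemented here, since its two parameters are not defined in this abstract.

```python
import random


def packet_sample(flow_sizes, p, rng):
    """Keep each packet independently with probability p.

    Returns the sampled sizes of flows that retain at least one packet.
    Flows that lose every packet are invisible to the observer.
    """
    sampled = []
    for size in flow_sizes:
        kept = sum(1 for _ in range(size) if rng.random() < p)
        if kept > 0:
            sampled.append(kept)
    return sampled


def flow_sample(flow_sizes, q, rng):
    """Keep each whole flow independently with probability q.

    A retained flow is observed with its original size.
    """
    return [size for size in flow_sizes if rng.random() < q]


def histogram(sizes):
    """Count how many flows were observed at each size."""
    counts = {}
    for s in sizes:
        counts[s] = counts.get(s, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical toy population: many small flows, a few large ones.
flows = [1] * 600 + [2] * 250 + [10] * 100 + [100] * 50

rng = random.Random(0)  # fixed seed so runs are repeatable
ps_sizes = packet_sample(flows, 0.1, rng)
fs_sizes = flow_sample(flows, 0.1, rng)

print("packet sampling:", histogram(ps_sizes))
print("flow sampling:  ", histogram(fs_sizes))
```

In this sketch the flow-sampled sizes are always drawn from the original sizes, so the original distribution can be read off directly up to sampling noise. The packet-sampled sizes are thinned counts that must be inverted to recover the original distribution, which is the estimation problem the abstract calls problematic.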

