Big Data Classification: Aspects on Many Features and Many Observations

Abstract

In this paper we discuss the performance of classical classification methods on Big Data. We distinguish two cases: many features and many observations. For the many-features case we look at projection methods, distance-based methods, and feature selection. For the many-observations case we mainly consider subsampling. The examples in this paper show that standard classification methods should not be applied blindly to Big Data.

Notes

    1. Thanks to T. Glasmachers for suggesting this definition.
    2. This part of the paper was supported by the Mercator Research Center Ruhr, grant Pr-2013-0015, see http://www.largescalesvm.de/.
    3. This simulation was carried out using the R packages BatchJobs (Bischl et al. 2015) and mlr on the SLURM cluster of the Statistics Department of TU Dortmund University.
    4. This example is inspired by Fan et al. (2011).

Author information

Authors and Affiliations

  1. Chair of Computational Statistics, Faculty of Statistics, TU Dortmund, Dortmund, Germany
    Claus Weihs
  2. Department of Statistics, TU Dortmund University, Dortmund, Germany
    Daniel Horn & Bernd Bischl

Corresponding author

Correspondence to Claus Weihs.

Editor information

Editors and Affiliations

  1. Jacobs University Bremen, Bremen, Germany
    Adalbert F.X. Wilhelm
  2. Institute of Medical Systems Biology, Universität Ulm, Ulm, Baden-Württemberg, Germany
    Hans A. Kestler

Rights and permissions

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Weihs, C., Horn, D., Bischl, B. (2016). Big Data Classification: Aspects on Many Features and Many Observations. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_10
