Text Categorization with Support Vector Machines. How to Represent Texts in Input Space?

References

Altmann, G. (1988). Wiederholungen in Texten [Repetitions in texts]. Bochum, Germany: Brockmeyer.
Balasubrahmanyan, V. K. & Naranan, S. (1996). Quantitative linguistics and complex system studies. Journal of Quantitative Linguistics, 3:3, 177-228.
Bookstein, A. & Swanson, D. R. (1974). Probabilistic models for automatic indexing. Journal of the American Society for Information Science, 25, 312-318.
Chitashvili, R. J. & Baayen, R. H. (1993). Word frequency distributions. In G. Altmann & L. Hřebíček (Eds.). Quantitative Text Analysis (pp. 46-135). Trier, Germany: wvt.
Dumais, S., Platt, J., Heckerman, D., & Sahami, M. (1998). Inductive learning algorithms and representations for text categorization. In Proceedings of the Seventh International Conference on Information Retrieval and Knowledge Management (ACM-CIKM-98) (pp. 148-155).
Grotjahn, R. (1982). Ein statistisches Modell für die Verteilung der Wortlänge [A statistical model for the distribution of word length]. Zeitschrift für Sprachwissenschaft, 1, 44-75.
Harter, S. P. (1975). A probabilistic approach to automatic keyword indexing, Part I. Journal of the American Society for Information Science, 26, 197-206.
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the Tenth European Conference on Machine Learning (ECML '98), Lecture Notes in Computer Science, Number 1398 (pp. 137-142).
Králík, J. (1977). An application of exponential distribution law in quantitative linguistics. Prague Studies in Mathematical Linguistics, 5, 223-235.
Krylov, Ju. K. (1995). A stationary model of coherent text generation. Journal of Quantitative Linguistics, 2:2, 157-167.
Lezius, W., Rapp, R., & Wettler, M. (1998). A freely available morphological analyzer, disambiguator and context sensitive lemmatizer for German. In Proceedings of the COLING-ACL 1998 (pp. 743-747).
Mandelbrot, B. (1953). On the theory of word frequencies and on related Markovian models of discourse. In R. Jakobson (Ed.), Structure of Language and its Mathematical Aspects, Proceedings of Symposia in Applied Mathematics (Vol. XII, pp. 190-210). Providence, RI: American Mathematical Society.
Manning, C. D. & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.
Margulis, E. L. (1993). Modelling documents with multiple Poisson distributions. Information Processing and Management, 29, 215-228.
Nigam, K., McCallum, A. K., Thrun, S., & Mitchell, T. (2000). Text classification from labeled and unlabeled documents using EM. Machine Learning, 39:1/2, 103-134.
Orlov, Ju. K. (1982). Linguostatistik: Aufstellung von Sprachnormen oder Analyse des Redeprozesses? (Die Antinomie 'Sprache-Rede' in der statistischen Linguistik) [Linguostatistics: Establishing language norms or analysis of the speech process? (The antinomy 'language-speech' in statistical linguistics)]. In Ju. K. Orlov, M. G. Boroda, & I. S. Nadarejšvili (Eds.). Sprache, Text, Kunst. Quantitative Analysen (pp. 1-55). Bochum, Germany: Brockmeyer.
Porter, M. F. (1980). An algorithm for suffix stripping. Program (Automated Library and Information Systems), 14:3, 130-137.
Rieger, B. B. (1999). Semiotics and computational linguistics. On semiotic cognitive information processing. In L. A. Zadeh & J. Kacprzyk (Eds.). Computing with words in information/intelligent systems I. Foundations (pp. 93-118). Heidelberg, Germany: Physica.
Salton, G. & McGill, M. J. (1983). Introduction to modern information retrieval. New York: McGraw Hill.
Stricker, M., Vichot, F., Dreyfus, G., & Wolinski, F. (2000). Vers la conception automatique de filtres d'informations efficaces [Towards the automatic design of efficient custom filters]. In Reconnaissance des Formes et Intelligence Artificielle (RFIA '2000) (pp. 129-137).
Vapnik, V. N. (1998). Statistical learning theory. New York: Wiley.
Wimmer, G., Köhler, R., Grotjahn, R., & Altmann, G. (1994). Towards a theory of word length distribution. Journal of Quantitative Linguistics, 1, 98-106.
Zipf, G. K. (1949). Human behavior and the principle of least effort. An introduction to human ecology. Cambridge, MA: Addison-Wesley.
