Computational Approaches to Discourse and Document Processing

Abstract

This introduction traces how the definition and role of discourse issues in NLP have evolved, from the knowledge-intensive "discourse understanding" methods of the 1980s to the recent concern with "accessing contents" in vast document bases via data-intensive methods. As text and discourse linguistics moves toward corpus approaches, a shift also tied to the development of large text bases and computational instruments, we explore potential new forms of convergence.

Special review committee:
N. Asher (CNRS, U. Toulouse 3, France)
J. Bateman (U. Bremen, Germany)
Y. Bestgen (U. C. Louvain, Belgium)
N. Bouayad-Agha (U. Pompeu Fabra, Barcelona, Spain)
M. Charolles (U. Paris 3, France)
N. Colineau (CSIRO, Australia)
D. Cristea (U. Iasi, Romania)
L. Danlos (U. Paris 7, France)
L. Degand (U. C. Louvain, Belgium)
P. Enjalbert (U. Caen, France)
S. Ferrari (U. Caen, France)
B. Grau (U. Paris-Sud, France)
C. Hallett (Open University, U.K.)
A. Hartley (U. Leeds, U.K.)
N. Hernandez (U. Nantes, France)
J. Karlgren (Swedish Institute of Computer Science, Sweden)
L. Kosseim (U. Concordia, Quebec, Canada)
G. Lapalme (U. Montréal, Quebec, Canada)
H. Le Thanh (Hanoi University of Technology, Vietnam)
N. Lucas (CNRS, U. Caen, France)
C. Mancini (Open University, U.K.)
A. Max (U. Paris-Sud, France)
J.-L. Minel (CNRS, U. Paris 10, France)
C. Paris (CSIRO, Australia)
R. Power (Open University, U.K.)
H. Saggion (U. Sheffield, U.K.)
S. Teufel (U. Cambridge, U.K.)
K. van Deemter (U. Aberdeen, U.K.)