KDD-2000 Workshop on Text Mining

August 20, 2000

Held at KDD-2000, Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 20-23, 2000, Boston, MA, USA

The call for papers (now expired) on the following topics was open until May 15, 2000. We received 40 submissions that entered the review process. The list of pre-registered workshop attendees contains about 80 names.

The Workshop Schedule, Workshop Proceedings, and the list of accepted papers are available on the Web, as is the Workshop Summary (to appear in SIGKDD Explorations, January 2001).

Topics of interest

The objective of this workshop is to enable the presentation and exchange of ideas on various aspects of Text Mining. Our aim is to facilitate communication among researchers and practitioners from related and complementary research areas who are working on similar problems, but possibly with a different focus and different problem-solving approaches. More precisely, we invite papers from the following four areas:

Text Mining (or Text Learning) (TM)
Information Retrieval (IR)
Natural Language Processing (NLP)
Information Extraction (IE).

Particular topics of interest for the workshop include, but are not limited to:
text mining & information retrieval
text mining & natural language processing
text mining & web mining
text representation
text categorization
text segmentation
information extraction
scalability of developed approaches
performance evaluation measures
feature selection
multilingual approaches to text mining
influence of domain and domain specific text mining
innovative applications of text mining.

Workshop schedule

The workshop consists of two invited talks, presentation of refereed papers and posters, and discussions. We hope that the program will stimulate future collaboration among researchers on text mining problems.

8:45am - 8:55am Opening
8:55am - 9:25am Invited talk by Ronen Feldman: "Text Mining: Opportunities and Challenges"
9:25am - 10:15am Papers Ia: Information Extraction and Text Mining (session intro. + 3 papers - 12 mins each)
- 9:30am - 9:45am Data Mining on Symbolic Knowledge Extracted from the Web, R. Ghani, R. Jones, D. Mladenic, K. Nigam, S. Slattery
- 9:45am - 10:00am Using Information Extraction to Aid the Discovery of Prediction Rules from Text, U. Y. Nahm, R. J. Mooney
- 10:00am - 10:15am High Precision Information Extraction, R. Caruana, P. G. Hodor
10:15am - 10:35am Coffee Break
10:35am - 11:05am Papers Ib: Text categorization methods using machine learning (2 papers - 12 mins each)
- 10:35am - 10:50am Large Margin Winnow Methods for Text Categorization, T. Zhang
- 10:50am - 11:05am A Feature Weight Adjustment Algorithm for Document Categorization, S. Shankar, G. Karypis
11:05am - 11:40am Papers IIa: Text Mining applications: Mining time-tagged text (session intro. + 2 papers - 12 mins each)
- 11:10am - 11:25am TimeMines: Constructing Timelines with Statistical Models of Word Usage, R. Swan, D. Jensen
- 11:25am - 11:40am Mining of Concurrent Text and Time-Series, V. Lavrenko, M. Schmill, D. Lawrie, P. Ogilvie, D. Jensen, J. Allan
11:40am - 12:00pm Poster Session I (session intro. + 8 posters - 1 min each)
12:00pm - 1:00pm Lunch Break
1:00pm - 1:30pm Invited talk by David Lewis: "ATTICS: A Toolkit for Text Classification and Text Mining"
1:30pm - 2:30pm Papers IIb: Text Mining applications: Finding themes/topics in text, Mining e-mail data (4 papers - 12 mins each)
- 1:30pm - 1:45pm Mining Themes from Bookmarks, S. Chakrabarti, Y. Batterywala
- 1:45pm - 2:00pm Discovering Encyclopedic Structure and Topics in Text, L. A. Mather, J. Note
- 2:00pm - 2:15pm Mining E-mail Authorship, O. De Vel
- 2:15pm - 2:30pm ifile: An Application of Machine Learning to E-Mail Filtering, J. Rennie
2:30pm - 2:45pm Poster Session II (7 posters - 1 min each)
2:45pm - 3:30pm Posters on the boards
3:30pm - 4:00pm Discussion and closing

Organization

Program Chairs

Marko Grobelnik
J.Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
Marko.Grobelnik@ijs.si

Dunja Mladenic
J.Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia, and Carnegie Mellon University, School of Computer Science, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
Dunja.Mladenic@cs.cmu.edu

Natasa Milic-Frayling
Microsoft Research Ltd, St. George House, 1 Guildhall Street, Cambridge, CB2 3NH, United Kingdom
natasamf@microsoft.com

Program Committee

Helena Ahonen, University of Helsinki, Helsinki, Finland
Simon Corston-Oliver, Microsoft Research, Redmond, WA
Mark Craven, University of Wisconsin, Madison, Wisconsin
Walter Daelemans, University of Antwerp, Antwerpen, Belgium
Susan Dumais, Microsoft Research, Redmond, WA
David Elworthy, Microsoft Research Ltd, Cambridge, UK
Ronen Feldman, Instinct Software, Israel
Marko Grobelnik, J.Stefan Institute, Ljubljana, Slovenia
Thorsten Joachims, Universitaet Dortmund, Dortmund, Germany
Rosie Jones, Carnegie Mellon University, Pittsburgh, PA
Natasa Milic-Frayling, Microsoft Research Ltd, Cambridge, UK
Dunja Mladenic, J.Stefan Institute, Ljubljana, Slovenia
Jason Rennie, Massachusetts Institute of Technology, MA
Stephen Robertson, Microsoft Research Ltd, Cambridge, UK
Sean Slattery, Carnegie Mellon University, Pittsburgh, PA
Ian Witten, University of Waikato, Hamilton, New Zealand

This workshop is partially supported by the European FP5 project "Data Mining and Decision Support for Business Competitiveness: A European Virtual Enterprise (Sol-Eu-Net)".