Knowledge engineering and the Web

1. Knowledge engineering and the Web Guus Schreiber VU University Amsterdam Computer Science, Network Institute
2. Overview of this talk • Web data representation – a meta view • Knowledge for the Web: categories – key sources – Alignment • Using knowledge: visualization and search
3. My journey knowledge engineering • design patterns for problem solving • methodology for knowledge systems • models of domain knowledge • ontology engineering
4. My journey access to digital heritage
5. My journey Web standards Chair of •Web metadata: RDF 1.1 •OWL Web Ontology Language 1.0 •SKOS model for publishing vocabularies on the Web •Deployment & best practices
6. A few words about Web standardization • Key success factor! • Consensus process actually works – Some of the time at least • Public review – Taking every comment seriously • The danger of over-designing – Principle of minimal commitment
7. Example: W3C RDF 1.1 group • 8K group messages (publicly visible) • 2K messages about external comments • 125+ teleconferences • 200 issues resolved
8. Web data representation
9. Caution • Representation languages are there for you • And not the other way around ….
10. HTML5: a leap forward Rationale •Consistent separation of content and presentation •Semantics of the structure of information Typical new elements
11. RDF: triples and graphs RDF is simply labeling resources and links
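To make "labeling resources and links" concrete, here is a minimal sketch that stores a graph as a set of (subject, predicate, object) tuples. The example.org resources and the choice of FOAF properties are illustrative assumptions, not taken from the slide, and the helper names are hypothetical.

```python
# A tiny graph: each triple is (subject, predicate, object).
# The resources below are made-up examples.
EX = "http://www.example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"

triples = {
    (EX + "bob", FOAF + "knows", EX + "alice"),
    (EX + "bob", FOAF + "name", "Bob"),
    (EX + "alice", FOAF + "name", "Alice"),
}


def objects(graph, subject, predicate):
    """Return all objects linked from `subject` by `predicate`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def neighbours(graph, subject):
    """Return the labeled links leaving `subject` as (predicate, object) pairs."""
    return {(p, o) for (s, p, o) in graph if s == subject}


print(objects(triples, EX + "bob", FOAF + "name"))
print(neighbours(triples, EX + "bob"))
```

A set of tuples is enough to show the idea; a real application would normally use an RDF library and a proper serialization.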
12. RDF: multiple graphs www.example.org/bob
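RDF 1.1 allows a dataset to hold several graphs, each identified by a name. The sketch below models that as a dictionary from graph name to a set of triples. The graph names, the prefixed resource names and both helper functions are assumptions made for illustration.

```python
# Hypothetical dataset: graph name -> set of (subject, predicate, object) triples.
dataset = {
    "http://www.example.org/bob": {("ex:bob", "foaf:knows", "ex:alice")},
    "http://www.example.org/alice": {("ex:alice", "foaf:name", "Alice")},
}


def graphs_mentioning(dataset, resource):
    """Names of the graphs in which `resource` appears as subject or object."""
    return {
        name
        for name, graph in dataset.items()
        if any(resource in (s, o) for s, _, o in graph)
    }


def merge(dataset):
    """Union of all named graphs; the origin of each triple is discarded."""
    return set().union(*dataset.values())


print(graphs_mentioning(dataset, "ex:alice"))
print(len(merge(dataset)))
```

Keeping graphs separate preserves where each statement came from, which is lost as soon as the graphs are merged.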
13. Data modeling on the Web RDF •Class hierarchy •Property hierarchy •Domain and range restrictions •Data types • OWL • Property characteristics – E.g., inverse, functional, transitive, … • Identity management – E.g., same as, equivalent class • … I prefer a pick-and-choose approach
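As a rough illustration of how a class hierarchy combines with domain and range statements, the sketch below derives the types of resources from a single triple. In RDF Schema such statements license inferences rather than act as validation constraints. All class, property and resource names here are hypothetical, and the hierarchy is simplified to one parent per class.

```python
# Hypothetical schema; every name is illustrative.
SUBCLASS = {"Painter": "Artist", "Artist": "Person"}  # class hierarchy
DOMAIN = {"painted": "Painter"}                        # domain of a property
RANGE = {"painted": "Painting"}                        # range of a property


def superclasses(cls):
    """All classes reachable from `cls` by walking up the hierarchy."""
    found = []
    while cls in SUBCLASS:
        cls = SUBCLASS[cls]
        found.append(cls)
    return found


def infer_types(triples):
    """Derive types from domain/range statements, then add superclasses."""
    types = {}
    for s, p, o in triples:
        if p in DOMAIN:
            types.setdefault(s, set()).add(DOMAIN[p])
        if p in RANGE:
            types.setdefault(o, set()).add(RANGE[p])
    for classes in types.values():
        for cls in list(classes):
            classes.update(superclasses(cls))
    return types


types = infer_types([("picasso", "painted", "guernica")])
print(types)
```

This uses only three of the listed constructs, in the spirit of the pick-and-choose approach.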
14. Writing in an ontology language does not make it an ontology! • Ontology is vehicle for sharing • Papers about your own idiosyncratic “university ontology” should be rejected at conferences • The quality of an ontology does not depend on the number of, for example, OWL constructs used
15. SKOS: making existing vocabularies Web accessible Rationale •A vocabulary represents distilled knowledge of a community •Typically product of a consensus process over longer period of time Use •200+ vocabularies published •E.g.: Library of Congress Subject Headings •Mainly in library field
16. The strength of SKOS lies in its simplicity Baker et al: Key choices in the design of SKOS
17. Beware of ontological over-commitment • We have the understandable tendency to use semantic modeling constructs whenever we can • Better is to limit any Web model to the absolute minimum
18. Knowledge on the web: categories
19. The concept triad Source: http://www.jamesodell.com/Ontology_White_Paper_2011-07-15.pdf.
20. Categorization • OWL (Description logic) takes an extensional view of classes – A set is completely defined by its members • This puts the emphasis on specifying class boundaries • Work of Rosch et al. takes a different view
21. Categories (Rosch) • Help us to organize the world • Tools for perception • Basic-level categories – Are the prime categories used by people – Have the highest number of common and distinctive attributes – What those basic-level categories are may depend on context
22. Basic-level categories
23. FOAF: Friend of a Friend
24. Dublin Core: metadata of Web resources
25. Iconclass: categorizing image scenes
26. schema.org categories for TV programs
27. schema.org the notion of “Role”
28. schema.org issues • Top-down versus bottom-up • Ownership and control • Who can update/extend? • Does use for general search bias the vocabulary?
29. The myth of a unified vocabulary • In large virtual collections there are always multiple vocabularies – In multiple languages • Every vocabulary has its own perspective – You can’t just merge them • But you can use vocabularies jointly by defining a limited set of links – “Vocabulary alignment”
30. Category alignment vs. identity disambiguation • Alignment concerns finding links between (similar) categories, which typically have no identity in the real world • Identity disambiguation is finding out whether two or more IDs point to the same object in the real world (e.g., person, building, ship) • The distinction is more subtle than “class versus instance”
31. Alignment techniques • Syntax: comparison of characters of the terms – Measures of syntactic distance – Language processing • E.g., tokenization, singular/plural • Relate to lexical resource – Relate terms to place in WordNet hierarchy • Taxonomy comparison – Look for common parents/children in taxonomy • Instance-based mapping – Two classes are similar if their instances are similar.
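Two of the techniques above can be sketched with the standard library alone: a character-based similarity after light normalization, and an instance-based score. The normalization rule (stripping a trailing "s") and the use of Jaccard overlap of shared instances are simplifying assumptions chosen for brevity, not the measures used in the talk.

```python
from difflib import SequenceMatcher


def normalize(term):
    """Lowercase, split into tokens, and naively strip a plural 's'."""
    tokens = term.lower().replace("-", " ").split()
    return " ".join(
        t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens
    )


def syntactic_similarity(a, b):
    """Character-based similarity in [0, 1] between two normalized terms."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def instance_similarity(instances_a, instances_b):
    """Jaccard overlap of the instance sets of two classes."""
    a, b = set(instances_a), set(instances_b)
    return len(a & b) / len(a | b) if a | b else 0.0


print(syntactic_similarity("Oil paintings", "oil painting"))
print(instance_similarity({"w1", "w2", "w3"}, {"w2", "w3", "w4"}))
```

In practice the scores from several techniques are usually combined, and candidate links above some threshold are checked by hand.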
32. Alignment evaluation
33. Limitations of categorical thinking
34. Be modest! Don’t recreate, but enrich and align • Knowledge engineers should refrain from developing their own idiosyncratic ontologies • Instead, they should make the existing rich vocabularies, thesauri and databases available in an interoperable (web) format • Techniques: learning, alignment
35. Using knowledge: visualization and search
36. Visualising piracy events
37. Extracting piracy events from piracy reports & Web sources
38. Enriching description of search results
39. Using alignment in search • Search term: “Tokugawa” • SVCN period: Edo – SVCN is local in-house ethnology thesaurus • AAT style/period: Edo (Japanese period), Tokugawa – AAT is Getty’s Art & Architecture Thesaurus
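One reading of this slide is that a query for "Tokugawa" matches a label of the AAT concept, and an alignment link then leads to the SVCN concept "Edo" that the local collection is annotated with. The sketch below follows that reading. The concept identifiers, the alignment link and the annotated object are all made up for illustration.

```python
# Labels per concept, taken from the slide; the identifiers are hypothetical.
LABELS = {
    "aat:edo": {"Edo (Japanese period)", "Tokugawa"},
    "svcn:edo": {"Edo"},
}
# Assumed alignment link between the two thesauri.
ALIGNMENT = {("svcn:edo", "aat:edo")}
# Made-up collection item annotated with the local SVCN concept.
ANNOTATIONS = {"object-1": {"svcn:edo"}}


def concepts_for(term):
    """Concepts that carry `term` as one of their labels (case-insensitive)."""
    return {
        c for c, labels in LABELS.items()
        if term.lower() in {label.lower() for label in labels}
    }


def expand(concepts):
    """Add every concept linked to the given ones by an alignment, either way."""
    expanded = set(concepts)
    for a, b in ALIGNMENT:
        if a in concepts:
            expanded.add(b)
        if b in concepts:
            expanded.add(a)
    return expanded


def search(term):
    """Objects annotated with a matching concept or one aligned to it."""
    wanted = expand(concepts_for(term))
    return {obj for obj, concepts in ANNOTATIONS.items() if concepts & wanted}


print(search("Tokugawa"))
```

Without the alignment link, the query "Tokugawa" would find nothing in a collection annotated only with SVCN terms.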
40. Sample graph search algorithm From search term (literal) to art work •Find resources with matching label •Find path from resource to art work – Cost of each step (stop when above cost threshold) – Special treatment of semantics: sameAs, inverseOf, … •Cluster results based on path similarities
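The steps above can be sketched as a cheapest-first search over a small triple set. This is one possible reading under stated assumptions, not the algorithm from the talk: the data, the step costs, the threshold, treating sameAs as a free symmetric step, labeling backward traversal as an inverse property, and clustering on identical paths are all choices made for illustration.

```python
import heapq
from collections import defaultdict

# Hypothetical graph of (subject, predicate, object) triples.
TRIPLES = [
    ("work:1", "creator", "person:picasso"),
    ("work:2", "creator", "person:picasso"),
    ("work:3", "depicts", "place:montmartre"),
    ("person:picasso", "sameAs", "ulan:picasso"),
    ("person:picasso", "bornIn", "place:malaga"),
]
LABELS = {"ulan:picasso": "Picasso", "place:montmartre": "Montmartre"}
ARTWORKS = {"work:1", "work:2", "work:3"}
# Assumed weights: sameAs is free, every other step costs 1.
STEP_COST = defaultdict(lambda: 1.0, {"sameAs": 0.0})


def edges(node):
    """Yield (predicate, neighbour) pairs in both directions."""
    for s, p, o in TRIPLES:
        if s == node:
            yield p, o
        if o == node:
            # sameAs is symmetric; other links are followed as inverses.
            yield (p if p == "sameAs" else "inverseOf(" + p + ")"), s


def search(term, threshold=2.0):
    """Map each reachable art work to its cheapest path from a label match."""
    queue = [
        (0.0, r, ()) for r, label in LABELS.items()
        if term.lower() in label.lower()
    ]
    heapq.heapify(queue)
    best, results = {}, {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node in best and best[node] <= cost:
            continue  # already reached this node at no greater cost
        best[node] = cost
        if node in ARTWORKS:
            results[node] = path
            continue  # do not search beyond an art work
        for pred, nxt in edges(node):
            new_cost = cost + STEP_COST[pred]
            if new_cost <= threshold:  # stop when above the cost threshold
                heapq.heappush(queue, (new_cost, nxt, path + (pred,)))
    return results


def cluster(results):
    """Group art works whose paths are identical once sameAs steps are dropped."""
    clusters = defaultdict(set)
    for work, path in results.items():
        clusters[tuple(p for p in path if p != "sameAs")].add(work)
    return dict(clusters)


print(cluster(search("Picasso")))
```

Each cluster key doubles as an explanation of why its results were returned, for example "works whose creator matches the search term".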
41. Graph search
42. Example of path clustering Issues: •number of clusters •path length
43. Location-based search: Moulin de la Galette relatively easy
44. Relation search: Picasso, Matisse & Braque
46. Acknowledgements • Long list of people • Projects: COMMIT, Agora, PrestoPrime, EuropeanaConnect, Poseidon, BiographyNet, Multimedian E-Culture