syncing changes from pyMicrodata by joernhees · Pull Request #587 · RDFLib/rdflib

@iherman please also have a look at https://travis-ci.org/RDFLib/rdflib/jobs/105300072#L1296 .
Is the following the output you'd expect for https://github.com/RDFLib/rdflib/blob/master/test/mdata/codelab.html?

[] a schema:TechArticle ;
schema:articleBody """
Exercise 1: From basic HTML to RDFa: first steps
Exercise 2: Embedded types
Exercise 3: From strings to things
""" ;
schema:author "Author Name" ;
schema:datePublished "January 29, 2014" ;
schema:description """
About this codelab
""" ;
schema:educationalUse "codelab" ;
schema:image file:///home/travis/build/RDFLib/rdflib/test/mdata/squares.png ;
schema:name "Structured data with schema.org codelab" .

The test's expected output

<test/mdata/codelab.html> md:item ( [ a schema:TechArticle ;

shows that the old version included some information about the document itself, which now seems to be gone.

This looks correct to me.

I presume the cause of the problem is the newlines in the literals for 'schema:description' and 'schema:articleBody'. Back in the RDFa days there was indeed a long discussion about whether generated literals should be 'normalized', i.e., whether spaces and newlines at the beginning and end of the literal should be removed; in the end, it was decided not to do that. The microdata conversion follows the same approach, also for compatibility. Formally, there is no such normalization step in the algorithmic definition:

https://www.w3.org/TR/microdata-rdf/#h3_algorithm-terms
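
To check whether whitespace really is the only difference between the expected and the actual literals, a test could compare them after trimming. The following is only a minimal sketch of that idea: `normalize_literal` and `same_modulo_whitespace` are hypothetical helpers, not part of rdflib, pyMicrodata, or the microdata-to-RDF algorithm (which, as noted, performs no such normalization), and the sample values are modelled on the output quoted earlier in this thread.

```python
# Hypothetical test helpers; NOT part of rdflib or pyMicrodata, and NOT
# something the microdata-to-RDF algorithm does. Only a diagnostic aid.

def normalize_literal(value: str) -> str:
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return " ".join(value.split())


def same_modulo_whitespace(expected: dict, actual: dict) -> bool:
    """Compare two property -> literal mappings, ignoring whitespace differences."""
    if expected.keys() != actual.keys():
        return False
    return all(
        normalize_literal(expected[key]) == normalize_literal(actual[key])
        for key in expected
    )


# Values modelled on the codelab.html output quoted above (assumed for
# illustration): the generated literal keeps its surrounding newlines.
actual = {
    "schema:description": "\nAbout this codelab\n",
    "schema:author": "Author Name",
}
expected = {
    "schema:description": "About this codelab",
    "schema:author": "Author Name",
}

print(actual == expected)                        # exact comparison
print(same_modulo_whitespace(expected, actual))  # whitespace-insensitive comparison
```

If the exact comparison fails while the whitespace-insensitive one succeeds, the mismatch comes from the unnormalized literals rather than from missing or changed triples.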