Replace long list of namespaces with list of prefixes used when using serialize.

I am wondering why the latest version of rdflib gives me a long list of namespaces when serializing a Graph to JSON-LD. It didn't use to be like that.

            return self.graph.serialize(
                format=return_format,
                context=dict(self.graph.namespaces()),
                auto_compact=True
            )

The return_format is 'json-ld'. The context in the result is:

  "@context": {
    "brick": "https://brickschema.org/schema/Brick#",
    "csvw": "http://www.w3.org/ns/csvw#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcam": "http://purl.org/dc/dcam/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "dcterms": "http://purl.org/dc/terms/",
    "doap": "http://usefulinc.com/ns/doap#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "gtaa": "http://data.beeldengeluid.nl/gtaa/",
    "non-gtaa": "http://data.beeldengeluid.nl/nongtaa/",
    "odrl": "http://www.w3.org/ns/odrl/2/",
    "org": "http://www.w3.org/ns/org#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "prof": "http://www.w3.org/ns/dx/prof/",
    "prov": "http://www.w3.org/ns/prov#",
    "qb": "http://purl.org/linked-data/cube#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "schema": "https://schema.org/",
    "sdo": "https://schema.org/",
    "sh": "http://www.w3.org/ns/shacl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "sosa": "http://www.w3.org/ns/sosa/",
    "ssn": "http://www.w3.org/ns/ssn/",
    "time": "http://www.w3.org/2006/time#",
    "vann": "http://purl.org/vocab/vann/",
    "void": "http://rdfs.org/ns/void#",
    "xml": "http://www.w3.org/XML/1998/namespace",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },

This is where the graph and namespace bindings (including some custom ones) were created:

        self.graph = Graph()
        self.graph.namespace_manager.bind("skos", SKOS)
        self.graph.namespace_manager.bind("gtaa", Namespace(self._model.GTAA_NAMESPACE))
        self.graph.namespace_manager.bind("non-gtaa", Namespace(self._model.NON_GTAA_NAMESPACE))

Further, in the custom class I add another namespace and triples. Here's a fragment:

        self.graph.namespace_manager.bind('sdo', Namespace(self._model.SCHEMA_DOT_ORG_NAMESPACE))
        # create a node for the record
        self.itemNode = URIRef(self.get_uri(concept_type, metadata["id"]))

        # get the RDF class URI for this type
        self.classUri = self._model.CLASS_URIS_FOR_DAAN_LEVELS[concept_type]

        # add the type
        self.graph.add((self.itemNode, RDF.type, URIRef(self.classUri)))

The custom class is used to read some JSON from a backend system, interpret it, and generate RDF for the item. You could see this as a wrapper pattern.

As I wrote in the comments on this issue, I had to write a custom function to remove unused prefixes from the context, but that code is not very dynamic because the list of used prefixes is hard-coded:

    def remove_unused_prefixes(self):
        """ Clean up the long list of namespaces.
        """
        context = dict(self.graph.namespaces())
        used_prefixes = ['gtaa', 'non-gtaa', 'rdf', 'rdfs', 'sdo', 'skos', 'xml', 'xsd']
        return {p: context[p] for p in used_prefixes}

and this is used here:

            context_used = self.remove_unused_prefixes()
            return self.graph.serialize(
                format=return_format,
                context=context_used,
                auto_compact=True
            )
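The hard-coded list in remove_unused_prefixes could probably be derived from the graph itself. Below is a minimal sketch of that idea, not a built-in rdflib feature: the helper names `used_prefixes` and `iris_in_graph` are hypothetical, and the matching is a plain `startswith` test, so a namespace that is a prefix of another namespace would also be kept. It relies only on rdflib behaviour already used above (iterating a Graph yields triples, `graph.namespaces()` yields prefix/namespace pairs) plus the `datatype` attribute of `Literal`.

```python
from typing import Dict, Iterable, Iterator


def used_prefixes(bindings: Dict[str, str], iris: Iterable[str]) -> Dict[str, str]:
    """Keep only the prefix bindings whose namespace starts at least one IRI."""
    iri_list = list(iris)  # materialise once, since iris may be a generator
    return {
        prefix: namespace
        for prefix, namespace in bindings.items()
        if any(iri.startswith(namespace) for iri in iri_list)
    }


def iris_in_graph(graph) -> Iterator[str]:
    """Yield every IRI in the graph, including datatype IRIs of literals."""
    # Imported lazily so used_prefixes() stays usable without rdflib installed.
    from rdflib import Literal, URIRef

    for triple in graph:
        for term in triple:
            if isinstance(term, URIRef):
                yield str(term)
            elif isinstance(term, Literal) and term.datatype is not None:
                yield str(term.datatype)


# Hypothetical use inside the wrapper class from this issue:
#   bindings = {prefix: str(ns) for prefix, ns in self.graph.namespaces()}
#   context_used = used_prefixes(bindings, iris_in_graph(self.graph))
```

Note that two prefixes bound to the same namespace (here `schema` and `sdo`) would both survive this filter, so some extra de-duplication may still be needed.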

Now I just discovered that the context argument can be left out. This is probably thanks to recent improvements and the integration of JSON-LD support. Well done. Also, when the context argument is omitted, auto_compact=True finally gives me the short representation that I wanted, for example: "sdo:datePublished": "2006-02-19",. This is not the case when passing context=dict(self.graph.namespaces()). But even then, I still get the long list of namespaces that aren't used.

Another discovery: when further reducing the number of arguments, I still get JSON-LD, but no context at all in the results.

            return self.graph.serialize(
                format=return_format
            )

To summarize this issue: I would like to get a JSON-LD serialization that includes a context, but with a minimal list of prefixes/namespaces (only those actually used), in response to this call:

            return self.graph.serialize(
                format='json-ld',
                auto_compact=True
            )

I hope the provided examples help.