Support for Simple GraphML Files

CGE's support/limitations related to GraphML

CGE can import simple GraphML files and generate the corresponding quads for the given graph(s). To import a GraphML file, the user can either list the file in a graph.info file as part of a database build, or load the file. When CGE processes an input file, any file whose name ends with the .graphml extension is treated as a GraphML file.

The syntax supported for GraphML files is based on the DTD specification provided at: http://graphml.graphdrawing.org/

The following is a sample GraphML file that represents a simple graph:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"  
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
     http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <graph id="G" edgedefault="undirected">
    <node id="n0"/>
    <node id="n1"/>
    <node id="n2"/>
    <node id="n3"/>
    <node id="n4"/>
    <edge id="e1" source="n0" target="n2"/>
    <edge id="e2" source="n1" target="n2"/>
    <edge id="e3" source="n2" target="n3"/>
    <edge id="e4" source="n3" target="n4"/>
  </graph>
</graphml>

Limitations

There are multiple limitations in the current support for GraphML files, including the following:
  • The XML declaration and the graphml element are parsed, but otherwise ignored.
  • Edge data is currently ignored.
  • Default edge direction for a graph is ignored.
  • Edge direction attribute is ignored.
  • Default values for data are ignored.
  • Elements in a graph are limited to descriptions, data, nodes and edges.
  • Nodes and edges can only contain descriptions or data as subelements.
  • Nested graphs are not supported.
CGE will report warning or error messages to the log file for any incorrect syntax or unsupported features.

When translating an edge to a quad, CGE converts the edge identifier, the source and target node identifiers, and the identifier of the enclosing graph to URIs.

For example, given the following edge from the example above:
<edge id="e1" source="n0" target="n2"/>
CGE would generate the following quad:
<urn:n0> <urn:e1> <urn:n2> <urn:G> .

Note that when converting an identifier to a URI, CGE inserts the urn: prefix by default. Also, if any error is found when parsing an edge, no quad is generated for that edge. For example, if a node referred to by an edge does not exist in the given graph, or if there was an error when parsing that node's declaration, no quad is generated for the edge.
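The translation described above can be sketched in a few lines of Python. This is only an illustration, not CGE code: the function name graphml_to_quads and the sample input are hypothetical, and the sketch assumes the default behavior of inserting the urn: prefix for every identifier. It shows how an edge whose source or target is not declared in the graph produces no quad.

```python
import xml.etree.ElementTree as ET

# Namespace used by GraphML elements, in ElementTree's {uri}tag notation.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def graphml_to_quads(text):
    """Return one quad line per valid edge (hypothetical helper, not CGE code)."""
    root = ET.fromstring(text)
    quads = []
    # Only top-level graphs are visited; nested graphs are not supported.
    for graph in root.findall(GRAPHML_NS + "graph"):
        graph_id = graph.get("id")
        node_ids = {node.get("id") for node in graph.findall(GRAPHML_NS + "node")}
        for edge in graph.findall(GRAPHML_NS + "edge"):
            source, target = edge.get("source"), edge.get("target")
            # An edge that refers to an undeclared node produces no quad.
            if source not in node_ids or target not in node_ids:
                continue
            quads.append(
                f"<urn:{source}> <urn:{edge.get('id')}> <urn:{target}> <urn:{graph_id}> ."
            )
    return quads


# Illustrative input: edge e9 points at a node that is never declared.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="n0"/>
    <node id="n2"/>
    <edge id="e1" source="n0" target="n2"/>
    <edge id="e9" source="n0" target="missing"/>
  </graph>
</graphml>"""

for quad in graphml_to_quads(SAMPLE):
    print(quad)  # prints: <urn:n0> <urn:e1> <urn:n2> <urn:G> .
```

In this sketch the invalid edge is skipped silently; CGE itself reports such problems as warning or error messages in the log file.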

NVPs for GraphML Support

  • cge.server.ExportGMLRDFEnable - Setting this NVP to 1 will cause CGE to export the quads generated for a given GraphML file to an nt file of the same name as the input GraphML file but with the nt extension. For example, if a graph.info file includes the line:
    /my/path/to/file_name.graphml
    and this NVP is set to 1, then CGE will write the quads produced from the GraphML file to an nt file named:
    /my/path/to/file_name.nt
    Exporting the quads to an nt file can be useful if the quads will be loaded multiple times, since loading quads from an nt file is faster and uses less memory than loading from a GraphML file. This NVP is off by default.
  • cge.server.GMLInsertPrefix - Setting this to 1 will cause CGE to insert the urn: prefix when converting identifiers for graphs, nodes, and edges to URIs. For example, the following edge:
    <edge id="e1" source="n0" target="n2"/>
    would result in URIs of <urn:e1>, <urn:n0> and <urn:n2> for the edge, source and target identifiers, respectively. This NVP is on by default.
  • cge.server.GMLCheckPrefix - Setting this to 1 will cause CGE to check an identifier for a known prefix before inserting the urn: default prefix. The prefixes that CGE will check for are:
    • urn:
    • http:
    • https:
    If a graph, node or edge identifier starts with one of these prefixes and this NVP is set, CGE will not insert the urn: prefix. For example, given the following edge:
    <edge id="http://www.mysite.com/e1" source="n0" target="n2"/>
    setting this NVP results in the following URIs:
    • <http://www.mysite.com/e1>
    • <urn:n0>
    • <urn:n2>
    Notice that since the source and target identifiers do not start with a known prefix, CGE still inserts the urn: prefix for them.
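The interaction of the three NVPs above can be sketched as follows. This is a rough Python illustration under stated assumptions, not CGE's implementation: the helper names to_uri and export_path are hypothetical, the keyword arguments merely stand in for cge.server.GMLInsertPrefix and cge.server.GMLCheckPrefix, and the default of False for check_prefix is an assumption, since the default for that NVP is not stated here. Returning the identifier unchanged when prefix insertion is off is likewise an assumption.

```python
from pathlib import PurePosixPath

# Prefixes that suppress urn: insertion when prefix checking is enabled.
KNOWN_PREFIXES = ("urn:", "http:", "https:")


def to_uri(identifier, insert_prefix=True, check_prefix=False):
    """Wrap an identifier as a URI (hypothetical helper, not CGE code)."""
    has_known_prefix = check_prefix and identifier.startswith(KNOWN_PREFIXES)
    if insert_prefix and not has_known_prefix:
        identifier = "urn:" + identifier
    return f"<{identifier}>"


def export_path(graphml_path):
    """Name of the nt file written when quad export is enabled (assumed helper)."""
    return str(PurePosixPath(graphml_path).with_suffix(".nt"))


print(to_uri("http://www.mysite.com/e1", check_prefix=True))  # <http://www.mysite.com/e1>
print(to_uri("n0", check_prefix=True))                        # <urn:n0>
print(export_path("/my/path/to/file_name.graphml"))           # /my/path/to/file_name.nt
```

Without the prefix check, the sketch would prepend urn: even to an identifier that already starts with http:, which is the case the check is meant to avoid.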