Using owlapi to parse owl file with reserved characters, saved from Protege

blaisec
I tried to post this in developer forum but it bounced back. Sorry if cross-posting.
I am running P4.3 and exporting an owl file with class names having forward
slashes and other reserved characters. I noticed that these characters are
maintained. However when I use the owlapi tutorial example file to query for
subclasses of these classes, I get an error. The odd characters are getting
in the way. However, if I use 'DL Query' in Protege to query these classes
for subclasses, as long as I enclose the classes with single quotes, the
query works just fine. I can't seem to do the same using the owlapi package. I
inquired at stackoverflow and one of the owlapi developers responded that it was a bug
in how Protege exports an owl file whose class names contain reserved character(s). Any
thoughts? Here is the discussion in stackoverflow:
http://stackoverflow.com/questions/23506879/using-owlapi-to-parse-owl-file-containing-classes-with-odd-characters


_______________________________________________
protege-user mailing list
[hidden email]
https://mailman.stanford.edu/mailman/listinfo/protege-user

Re: Using owlapi to parse owl file with reserved characters, saved from Protege

Timothy Redmond
On 06/03/2014 09:41 AM, Blaise Che wrote:
I tried to post this in developer forum but it bounced back. Sorry if cross-posting.
I am running P4.3 and exporting an owl file with class names having forward
slashes and other reserved characters. I noticed that these characters are
maintained. However when I use the owlapi tutorial example file to query for
subclasses of these classes, I get an error.

I haven't looked at this in detail yet, but a few general things can be said.  First of all, Protege is built on the OWL API, so (modulo OWL API versioning) anything that happens in Protege can be reproduced with the OWL API.  That said, the Protege user interface may be altering your input when you enter it (especially if the syntax is illegal for IRIs).  It is also possible that the OWL API is lenient about IRIs when an ontology is saved.

If you want to make it easy for someone to say more, add some steps to reproduce the issue, and perhaps supply an ontology.  In answering your question, a developer may try to reproduce your situation.  I did, for example, create an entity with a '/' in its name, and I found an IRI of the form:

	http://www.semanticweb.org/redmond/ontologies/2014/5/untitled-ontology-63#A/B

and Protege gave it the short name B. I haven't yet checked whether this is illegal syntax.

Also - what is your motivation for trying this?

-Timothy



Re: Using owlapi to parse owl file with reserved characters, saved from Protege

blaisec
Thanks Tim. The motivation here is that we are receiving an owl file from a vendor with these reserved characters, and would like to parse it and extract useful data with owlapi. The issue is very easy to reproduce by saving an owl file from Protege with a class having a reserved character. If you open the example file you just generated and go to the DL Query tab, you will see that you can query for related class information for the class 'A/B'. However, if you try to query the same information with the official owlapi examples at https://github.com/owlcs/owlapi/tree/master/contract/src/test/java/org/coode/owlapi/examples, you will receive parsing errors because of the '/' character. One of the experts at owlapi argues that it is an error in the way Protege saves the owl file with these reserved characters (per the stackoverflow link in my first post). However, I would like a Protege developer to confirm this and, if so, say whether a fix is planned. Thanks!



Re: Using owlapi to parse owl file with reserved characters, saved from Protege

Timothy Redmond

One of the experts at owlapi argues that it is an error with the way Protege saves the owl file with these reserved characters (per stackoverflow link below). However, I would like a Protege developer to confirm and if so, whether a fix is planned for it. Thanks!

This is again just a quick answer, but Protege (4.2 and later) and the latest WebProtege simply use the OWL API to save the file, so if there is an issue, then it is an issue with whatever version of the OWL API Protege is using.  I will try to add more detail later.

-Timothy.



Re: Using owlapi to parse owl file with reserved characters, saved from Protege

Timothy Redmond
In reply to this post by blaisec

Having now tried to replicate your problem, and based on what you have told us so far, I think your troubles stem from the different names an entity can have and how various programs translate between those names.

I am running P4.3 and exporting an owl file with class names having forward
slashes and other reserved characters.

I made an ontology (attached) that has an entity with the name

    http://mousey.synology.me/ontologies/TestSlash#A/B

I don't know whether this is a valid IRI, but the OWL API had no problems saving or reading it.  I seem to remember that the code that checks IRIs for validity was fixed quite some time ago and now properly checks their syntax, but I am not certain.

The IRI is the "real" name for the entity, and it is what you use if you want to be unambiguous.  Unfortunately, it is not a convenient name for humans, which is why Protege and other tools let you use simpler names such as the rdfs:label.

Different mappings between the readable names and the IRI are possible, and you are using two OWL API programs (Protege and the DLQueryExample) that have been set up with different mappings.

The odd characters are getting
in the way. However, if I use 'DL Query' in Protege to query these classes
for subclasses, as long as I enclose the classes with single quotes, the
query works just fine. I can't seem to do the same using owlapi package.


Before I gave this entity an rdfs:label of "A/B", it rendered in Protege as B, and the DLQuery tab properly showed the inferred individuals in B.  After I gave it an rdfs:label of "A/B", it rendered in Protege as A/B, and the DLQuery tab properly showed the inferred individuals in A/B.  However, in both cases the DLQueryExample.java program that you mentioned would only respond to the name B.

Thus Protege lets you use whatever name is used to render the entity.  The DLQueryExample program simply uses the fragment at the end of the IRI (e.g. B for our entity), as indicated by the comment before the short form provider is set in the program:

            // Entities are named using IRIs. These are usually too long for use
            // in user interfaces. To solve this problem, and so a query can be
            // written using short class, property, individual names we use a
            // short form provider. In this case, we'll just use a simple short
            // form provider that generates short forms from IRI fragments.
            ShortFormProvider shortFormProvider = new SimpleShortFormProvider();
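
To make the difference between the two mappings concrete, here is a minimal plain-Java sketch that does not use the OWL API at all. The class `NameMappingSketch`, its nested `Entity` type, and the methods `suffixName`, `bySuffix` and `byLabel` are hypothetical names invented for illustration. In particular, `suffixName` is only an assumed approximation of a fragment-based short form (it keeps the trailing run of letters, digits, '_', '-' and '.'), based on the behaviour observed above rather than on the library's source.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NameMappingSketch {

    /** Hypothetical stand-in for an OWL entity: an IRI plus an optional rdfs:label. */
    static final class Entity {
        final String iri;
        final String label; // null when the entity has no rdfs:label

        Entity(String iri, String label) {
            this.iri = iri;
            this.label = label;
        }
    }

    /** Assumed approximation of a fragment-based short name: the trailing run of name characters. */
    static String suffixName(String iri) {
        int start = iri.length();
        while (start > 0 && isNameChar(iri.charAt(start - 1))) {
            start--;
        }
        return iri.substring(start);
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.';
    }

    /** Name -> IRI lookup keyed on the IRI suffix only. */
    static Map<String, String> bySuffix(List<Entity> entities) {
        Map<String, String> names = new LinkedHashMap<>();
        for (Entity e : entities) {
            names.put(suffixName(e.iri), e.iri);
        }
        return names;
    }

    /** Name -> IRI lookup keyed on the label, falling back to the suffix when there is none. */
    static Map<String, String> byLabel(List<Entity> entities) {
        Map<String, String> names = new LinkedHashMap<>();
        for (Entity e : entities) {
            names.put(e.label != null ? e.label : suffixName(e.iri), e.iri);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Entity> entities = Arrays.asList(
                new Entity("http://mousey.synology.me/ontologies/TestSlash#A/B", "A/B"));
        System.out.println(bySuffix(entities).keySet()); // [B]
        System.out.println(byLabel(entities).keySet());  // [A/B]
    }
}
```

Under these assumptions the same entity answers to B in one program and to A/B in the other, which matches what the two tools did with the attached ontology.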

I inquired at stackoverflow and one of the owlapi developers responded that it was a bug
in Protege while exporting owl file with class names having reserved character(s). Any
thoughts? Here is the discussion in stackoverflow:
http://stackoverflow.com/questions/23506879/using-owlapi-to-parse-owl-file-containing-classes-with-odd-characters

In my experiments, I saw no evidence of any bug in Protege or in the OWL API.  Admittedly, I didn't check the specifications for the validity of the IRI, but the ontology saved and loaded correctly and seemed to behave correctly once it was loaded.

-Timothy



Attachments: TestSlash.owl (3K), DLQueryExample.java (16K)

Re: Using owlapi to parse owl file with reserved characters, saved from Protege

blaisec
Thanks for the follow-up Tim. It looks like you only tested with the forward slash reserved character. Can you try other reserved characters? For example, can you change your entity name to: United States (US)? In 'DL Query', I query the class as: 'United_States_(US)'. Note the underscores and single quotes I have to add in 'DL Query' for it to work. Any thoughts on why? Also, when I query this file with the owlapi example code, I get an error: 'Encountered United_States_ at line 1 column 1'. Do you know what class name to use to query related class information using the owlapi example? Can you confirm these issues and get any relevant input from developers as well? Thanks!



Re: Using owlapi to parse owl file with reserved characters, saved from Protege

Timothy Redmond
On 06/09/2014 04:29 PM, Blaise Che wrote:
Thanks for the follow-up Tim. Looks like you only tested with the forward slash reserved character. Can you attempt with other reserved characters? For example, can you change your entity name to: United States (US)?

I don't think that the IRI can have a space in it.  An rdfs:label annotation can, though.

As far as I can tell, the space is not allowed as a character in the fragment of an IRI.  I believe that the definitive specification of IRIs is here:

             http://www.ietf.org/rfc/rfc3987.txt

I reached this conclusion from the following productions in its grammar:

ifragment      = *( ipchar / "/" / "?" )
ipchar         = iunreserved / pct-encoded / sub-delims / ":"
                  / "@"
iunreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar

ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD
pct-encoded    = "%" HEXDIG HEXDIG
sub-delims     = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

The ucschar ranges all lie beyond the ASCII character set (space is 0x20), and none of the other productions admits a space. So Protege is right to replace it with underscores.  If you really want to represent a space, the best thing to try would be percent-encoding it, which I think would look like this: "%20".
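
The productions quoted above can be turned into a small checker. The following is a minimal sketch in plain Java, not the validity check used by Protege or the OWL API. The class name `IFragmentCheck` and its methods are hypothetical, and lowercase hex digits are accepted in percent-escapes as a convenience.

```java
class IFragmentCheck {

    /** True if the code point falls in one of the ucschar ranges quoted above. */
    static boolean isUcsChar(int cp) {
        if (cp >= 0xA0 && cp <= 0xD7FF) return true;
        if (cp >= 0xF900 && cp <= 0xFDCF) return true;
        if (cp >= 0xFDF0 && cp <= 0xFFEF) return true;
        // Planes 1 to 13: %x10000-1FFFD through %xD0000-DFFFD.
        if (cp >= 0x10000 && cp <= 0xDFFFD) return (cp & 0xFFFF) <= 0xFFFD;
        return cp >= 0xE1000 && cp <= 0xEFFFD;
    }

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    /** True if the code point may appear literally (outside a percent-escape) in an ifragment. */
    static boolean isLiteralFragmentChar(int cp) {
        boolean asciiAlnum = (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')
                || (cp >= '0' && cp <= '9');
        // iunreserved punctuation, sub-delims, ":" and "@" from ipchar, plus "/" and "?".
        String punctuation = "-._~!$&'()*+,;=:@/?";
        return asciiAlnum || punctuation.indexOf(cp) >= 0 || isUcsChar(cp);
    }

    /** Checks a candidate fragment (the part after '#') against the ifragment production. */
    static boolean isValidIFragment(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp == '%') {
                // pct-encoded = "%" HEXDIG HEXDIG
                if (i + 2 >= s.length()
                        || !isHexDigit(s.charAt(i + 1)) || !isHexDigit(s.charAt(i + 2))) {
                    return false;
                }
                i += 3;
            } else if (isLiteralFragmentChar(cp)) {
                i += Character.charCount(cp);
            } else {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidIFragment("A/B"));                    // true
        System.out.println(isValidIFragment("United States (US)"));     // false
        System.out.println(isValidIFragment("United_States_(US)"));     // true
        System.out.println(isValidIFragment("United%20States%20(US)")); // true
    }
}
```

By this reading of the grammar, the slash and the parentheses are legal in a fragment and only the space is the problem.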

The OWL API didn't do so well with this example.  It saved the ontology with an IRI containing a space but was then unable to read it back in.
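
The same kind of save-but-not-reload asymmetry can be illustrated with the JDK alone. This sketch uses `java.net.URI` purely as a stand-in for a strict parser; it is not the OWL API's own IRI handling. The class `SpaceInFragmentSketch`, its methods, and the namespace `http://example.org/onto` are all made up for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

class SpaceInFragmentSketch {

    /** True if java.net.URI accepts the string exactly as written. */
    static boolean parses(String uri) {
        try {
            new URI(uri);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Builds a URI from its parts; this constructor percent-encodes illegal characters. */
    static String encode(String scheme, String schemeSpecificPart, String fragment)
            throws URISyntaxException {
        return new URI(scheme, schemeSpecificPart, fragment).toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // A raw space in the fragment is rejected by the strict parser.
        System.out.println(parses("http://example.org/onto#United States (US)")); // false

        // Percent-encoding the space gives a string that parses and round-trips.
        String encoded = encode("http", "//example.org/onto", "United States (US)");
        System.out.println(encoded); // http://example.org/onto#United%20States%20(US)
        System.out.println(parses(encoded)); // true
        System.out.println(URI.create(encoded).getFragment()); // United States (US)
    }
}
```

A writer that emits the raw string and a reader that validates it will disagree in just this way, which may be what happened with the saved ontology.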

On 'DL Query', I query the class as: 'United_States_(US)'. In this case, note the underscore and single quotes I have to add in 'DL Query' for it to work. Any thoughts on why? Also when I query this file with the owlapi example code, I get an error: 'Encountered United_States_ at line 1 column 1'. Do you know what class name to use to query related class information using the owlapi example? Can you confirm these issues and get any relevant input from developers as well? Thanks!

I am no longer sure what problem we are trying to solve here.  If you want a flexible range of reserved characters, then I would recommend that you use rdfs:label to represent the names of your OWL entities.  If you want to write code that a user can use to make DL queries, then in your code you set up the short form provider with the mapping from strings to OWL entities that you want to use.
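
As for why the single quotes matter in the DL Query tab, one plausible reading of the error 'Encountered United_States_ at line 1 column 1' is that the query text is split at the parenthesis before any name lookup happens. The sketch below is a guess at that behaviour in plain Java, not the actual Manchester syntax parser. The class `QuotedNameTokenizerSketch` and its `tokenize` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedNameTokenizerSketch {

    /** Splits a query on whitespace and parentheses, except inside single quotes. */
    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (char c : query.toCharArray()) {
            if (c == '\'') {
                // Quotes delimit a name and are not part of it.
                quoted = !quoted;
                if (!quoted) {
                    flush(tokens, current);
                }
            } else if (quoted) {
                current.append(c);
            } else if (Character.isWhitespace(c)) {
                flush(tokens, current);
            } else if (c == '(' || c == ')') {
                flush(tokens, current);
                tokens.add(String.valueOf(c));
            } else {
                current.append(c);
            }
        }
        flush(tokens, current);
        return tokens;
    }

    /** Moves any pending characters into the token list. */
    private static void flush(List<String> tokens, StringBuilder current) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("United_States_(US)"));   // [United_States_, (, US, )]
        System.out.println(tokenize("'United_States_(US)'")); // [United_States_(US)]
    }
}
```

If that reading is right, the unquoted query fails because no entity is called United_States_, while the quoted form reaches the name lookup intact and succeeds whenever the configured mapping knows that name.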

-Timothy




On Thu, Jun 5, 2014 at 8:40 PM, Timothy Redmond <[hidden email]> wrote:

Having now given a go at replicating your problem and based on what you have told us so far, I think that your troubles involve the different names for entities and how various programs are translating these names.


I am running P4.3 and exporting an owl file with class names having forward
slashes and other reserved characters.

I made an ontology (attached) that has an entity with the name

    http://mousey.synology.me/ontologies/TestSlash#A/B

I don't know if this is a valid IRI or not but the OWL api had no problems saving or reading it.  I think that I remember that the code that checks IRI's for validity was fixed quite some time ago and now properly checks their syntax but I am not certain.

The IRI is the "real" name for the entity and it is what you use if you want to be unambiguous.  Unfortunately this is not a convenient name for humans and this is why Protege and other tools let you use other simpler names such as the rdfs:label. 

Different mappings between the readable names and the IRI are possible and you are using two OWL api programs (Protege and the DLQueryExample) that have been set up with different mappings.


The odd characters are getting
in the way. However, if I use 'DL Query' in Protege to query these classes
for subclasses, as long as I enclose the classes with single quotes, the
query works just fine. I can't seem to do the same using owlapi package.


Before I gave this entity an rdfs:label of "A/B", it rendered in Protege as B and the DLQuery tab would properly show the inferred individuals in B.  After I gave this entity an rdfs:label of "A/B", it rendered in Protege as A/B and the DLQuery tab would properly show the inferred individuals in A/B.  However, in both cases the DLQueryExample.java program that you mentioned would only respond to the name B.

Thus Protege is allowing you to use whatever name is used to render the entity.  The DLQueryExample program is simply using the fragment at the end of the IRI (e.g. B for our entity) and this is indicated in the comment before the short form provider is set in the program:

            // Entities are named using IRIs. These are usually too long for use
            // in user interfaces. To solve this
            // problem, and so a query can be written using short class,
            // property, individual names we use a short form
            // provider. In this case, we'll just use a simple short form
            // provider that generates short forms from IRI
            // fragments.
            ShortFormProvider shortFormProvider = new SimpleShortFormProvider();
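
To make the "fragment at the end of the IRI" behaviour concrete, here is a minimal sketch in plain Java. It is not the OWL api implementation: the class name FragmentShortForm and the rule for what counts as a name character are assumptions, chosen only to reproduce the behaviour reported above (the entity ...#A/B answering to the name B).

```java
final class FragmentShortForm {

    // Assumption: a "name character" is a letter, a digit, '_', '-' or '.'.
    // The real provider may use a different rule.
    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.';
    }

    // Returns the longest trailing run of name characters in the IRI string.
    static String shortForm(String iri) {
        int start = iri.length();
        while (start > 0 && isNameChar(iri.charAt(start - 1))) {
            start--;
        }
        return iri.substring(start);
    }

    public static void main(String[] args) {
        // The '/' is not a name character, so only the text after it survives.
        System.out.println(shortForm("http://mousey.synology.me/ontologies/TestSlash#A/B")); // prints B
    }
}
```

Under this rule a query for A/B cannot match anything, because the only name the program knows for the entity is B.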

I inquired at stackoverflow and one of the owlapi developers responded that it was a bug
in Protege while exporting owl file with class names having reserved character(s). Any
thoughts? Here is the discussion in stackoverflow:
http://stackoverflow.com/questions/23506879/using-owlapi-to-parse-owl-file-containing-classes-with-odd-characters

In my experiments, I saw no evidence of any bug in Protege or in the OWL api.  It is true that I didn't check the specifications for the validity of the IRI.  But the ontology saved and loaded correctly and seemed to behave correctly when it was loaded.

-Timothy




On 06/05/2014 03:01 PM, Blaise Che wrote:
Thanks Tim. The motivation here is that we are receiving owl files from a vendor with these reserved characters, and we would like to parse them and extract useful data with owlapi. The issue is very easy to reproduce by saving an owl file from Protege with a class having a reserved character. If you open the example file you just generated and go to the DLQuery tab, you will see that you can query for related class information for the class 'A/B'. However, if you try to query the same information with the official owlapi examples at https://github.com/owlcs/owlapi/tree/master/contract/src/test/java/org/coode/owlapi/examples, you will receive parsing errors because of the '/' character. One of the experts at owlapi argues that it is an error in the way Protege saves the owl file with these reserved characters (per the stackoverflow link below). However, I would like a Protege developer to confirm this and, if so, say whether a fix is planned. Thanks!


On Thu, Jun 5, 2014 at 7:15 AM, Timothy Redmond <[hidden email]> wrote:
On 06/03/2014 09:41 AM, Blaise Che wrote:
I tried to post this in developer forum but it bounced back. Sorry if cross-posting.
I am running P4.3 and exporting an owl file with class names having forward
slashes and other reserved characters. I noticed that these characters are
maintained. However when I use the owlapi tutorial example file to query for
subclasses of these classes, I get an error.

I haven't looked at this in detail yet but there are a few general things that can be said.  First of all, Protege is based on the OWL api so - modulo OWL api versioning - anything that happens in Protege can be mirrored in the OWL api.  That being said, the Protege user interface may be doing something to your input when you put it in (especially if the syntax is illegal for IRIs).  It is also possible that the OWL api is lenient with IRIs when an ontology is saved.

But if you want to make it easy for someone to say more, then add some steps to reproduce the issue, and maybe even supply an ontology.  In answering your question, a developer may try to reproduce your situation.  For example, I created an entity with a '/' in its name and found that it had an IRI of the form:

	http://www.semanticweb.org/redmond/ontologies/2014/5/untitled-ontology-63#A/B

and Protege gave the short name B. I haven't yet checked if this is illegal syntax.
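
One quick way to probe the syntax question is to hand the string to java.net.URI. This is only a rough sanity check under stated assumptions: java.net.URI follows the older URI grammar (RFC 2396, with some deviations) rather than the IRI specification, so its verdict is a hint and not a ruling. The class name IriSyntaxProbe and the example.org address are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class IriSyntaxProbe {

    // Returns the fragment if java.net.URI accepts the string.
    // Returns null if the string is rejected (or if it has no fragment at all).
    static String fragmentOrNull(String candidate) {
        try {
            return new URI(candidate).getFragment();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Accepted: the whole fragment is "A/B", slash included.
        System.out.println(fragmentOrNull(
                "http://www.semanticweb.org/redmond/ontologies/2014/5/untitled-ontology-63#A/B"));
        // Rejected: a literal space is an illegal character, so this prints null.
        System.out.println(fragmentOrNull("http://example.org/test#United States (US)"));
    }
}
```

Note the difference between the fragment reported here (A/B) and the short name B that Protege showed: the short name is a rendering choice, not the fragment itself.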

Also - what is your motivation for trying this?

-Timothy



Re: Using owlapi to parse owl file with reserved characters, saved from Protege

blaisec
Hi Tim, thanks again. As I mentioned initially, our company is receiving owl files from one of our vendors, who uses Protege to create them. Our goal is to extract the entities (superclasses and subclasses) the same way 'DL Query' does. The example on the owlapi website seems to process the owl file slightly differently from the 'DL Query' internal parser, although we know Protege uses owlapi internally as well. Our company does not have control over how our vendor saves the file (whether they use a label or not). So if you have sample code that extracts entities the same way 'DL Query' does, including when those reserved characters are present, that will resolve this issue. Also, you wrote: "If you want to write code that a user can use to make DL queries, then in your code you set up the short form provider with the mapping from strings to OWL entities that you want to use." Could you please explain this statement with sample code?

Thanks,

Blaise



On Wed, Jun 11, 2014 at 10:00 PM, Timothy Redmond <[hidden email]> wrote:
On 06/09/2014 04:29 PM, Blaise Che wrote:
Thanks for the follow-up Tim. Looks like you only tested with the forward slash reserved character. Can you try other reserved characters? For example, can you change your entity name to: United States (US)?

I don't think that the IRI can have a space in it.  An rdfs:label annotation can though.

As far as I can tell the space is not allowed as a character in the fragment of an IRI.  I believe that the definitive specification of IRIs is here:

             http://www.ietf.org/rfc/rfc3987.txt

My logic for concluding this used the following productions from the grammar therein:

ifragment      = *( ipchar / "/" / "?" )
ipchar         = iunreserved / pct-encoded / sub-delims / ":"
                  / "@"
iunreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar

ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD
pct-encoded    = "%" HEXDIG HEXDIG
sub-delims     = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

None of these productions matches a space: the ucschar ranges all lie beyond the ascii character set (space is 0x20), and a space is not listed anywhere else.  So Protege is right to replace spaces with underscores.  If you really want to represent a space, the best thing you could try would be percent-encoding, which I think would look like this: "%20".
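
The productions quoted above can be turned into a small check. The following is a sketch in plain Java, not code from Protege or the OWL api; the class and method names are made up for illustration. It tests single characters against ifragment (leaving out pct-encoded, which spans three characters) and percent-encodes anything that fails, using UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

final class IriFragmentCheck {

    // ucschar ranges, transcribed from the grammar quoted above.
    static boolean isUcschar(int cp) {
        if (cp >= 0xA0 && cp <= 0xD7FF) return true;
        if (cp >= 0xF900 && cp <= 0xFDCF) return true;
        if (cp >= 0xFDF0 && cp <= 0xFFEF) return true;
        // %x10000-1FFFD through %xD0000-DFFFD: each plane stops at xFFFD.
        if (cp >= 0x10000 && cp <= 0xDFFFD) return (cp & 0xFFFF) <= 0xFFFD;
        return cp >= 0xE1000 && cp <= 0xEFFFD;
    }

    // True if the code point may appear literally in an ifragment.
    static boolean isAllowedInFragment(int cp) {
        boolean alpha = (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
        boolean digit = cp >= '0' && cp <= '9';
        return alpha || digit
                || "-._~".indexOf(cp) >= 0         // rest of iunreserved
                || "!$&'()*+,;=".indexOf(cp) >= 0  // sub-delims
                || ":@/?".indexOf(cp) >= 0         // extras in ipchar and ifragment
                || isUcschar(cp);
    }

    // Percent-encodes every code point that may not appear literally.
    // The input is treated as raw text, so a literal '%' becomes %25.
    static String encodeFragment(String raw) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); ) {
            int cp = raw.codePointAt(i);
            i += Character.charCount(cp);
            if (isAllowedInFragment(cp)) {
                out.appendCodePoint(cp);
            } else {
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeFragment("United States (US)")); // prints United%20States%20(US)
    }
}
```

By the same productions, "/" (listed directly in ifragment) and "(" and ")" (sub-delims) are allowed in a fragment as they stand; only the space needs encoding in this example.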

The OWL api didn't do so well with this example.  It saved the ontology with an IRI with a space in it but it was unable to read it back in.


In 'DL Query', I query the class as: 'United_States_(US)'. Note the underscores and the single quotes that I have to add in 'DL Query' for it to work. Any thoughts on why? Also, when I query this file with the owlapi example code, I get an error: 'Encountered United_States_ at line 1 column 1'. Do you know what class name to use to query related class information using the owlapi example? Can you confirm these issues and get any relevant input from developers as well? Thanks!

I am no longer sure what problem we are trying to solve here.  If you want a flexible range of reserved characters, then I would recommend that you use rdfs:label to represent the names of your OWL entities.  If you want to write code that a user can use to make DL queries, then in your code you set up the short form provider with the mapping from strings to OWL entities that you want to use.
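
The idea of "a mapping from strings to OWL entities" can be sketched without any OWL api classes. The resolver below is a hypothetical stand-in, written in plain Java with IRIs held as strings: it registers each entity under its rdfs:label when one exists and under a fallback short name otherwise, and it strips surrounding single quotes the way names are entered in the DL Query tab. In a real program this role belongs to a ShortFormProvider; the OWL api also ships a label-based one, AnnotationValueShortFormProvider, whose constructor arguments should be checked against the version in use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class LabelNameResolver {

    private final Map<String, String> nameToIri = new HashMap<>();

    // Registers an entity under its label if it has one, else under the fallback short name.
    void register(String iri, String label, String fallbackShortName) {
        nameToIri.put(label != null ? label : fallbackShortName, iri);
    }

    // Looks up a name typed by the user; accepts both A/B and 'A/B'.
    Optional<String> resolve(String typedName) {
        String key = typedName.trim();
        if (key.length() >= 2 && key.startsWith("'") && key.endsWith("'")) {
            key = key.substring(1, key.length() - 1);
        }
        return Optional.ofNullable(nameToIri.get(key));
    }

    public static void main(String[] args) {
        String iri = "http://mousey.synology.me/ontologies/TestSlash#A/B";

        LabelNameResolver withLabel = new LabelNameResolver();
        withLabel.register(iri, "A/B", "B");
        System.out.println(withLabel.resolve("'A/B'").isPresent()); // prints true

        LabelNameResolver withoutLabel = new LabelNameResolver();
        withoutLabel.register(iri, null, "B");
        System.out.println(withoutLabel.resolve("'A/B'").isPresent()); // prints false
    }
}
```

The two resolvers mirror the two programs in this thread: one answers to the rendered label, the other only to the fragment-based name B, even though both hold the same IRI.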

-Timothy





On Thu, Jun 5, 2014 at 8:40 PM, Timothy Redmond <[hidden email]> wrote:

Having now given a go at replicating your problem and based on what you have told us so far, I think that your troubles involve the different names for entities and how various programs are translating these names.



Re: Using owlapi to parse owl file with reserved characters, saved from Protege

blaisec
As an aside, you are correct that a space is not allowed in an IRI. The point of the example 'United States(US)' was actually the parentheses, which the owlapi examples handle differently from Protege's 'DL Query'.


On Thu, Jun 12, 2014 at 9:24 AM, Blaise Che <[hidden email]> wrote:
Hi Tim, thanks again. As I mentioned initially, our company is receiving owl files from one of our vendors, who uses Protege to create them. Our goal is to extract the entities (superclasses and subclasses) the same way 'DL Query' does. The example on the owlapi website seems to process the owl file slightly differently from the 'DL Query' internal parser, although we know Protege uses owlapi internally as well. Our company does not have control over how our vendor saves the file (whether they use a label or not). So if you have sample code that extracts entities the same way 'DL Query' does, including when those reserved characters are present, that will resolve this issue. Also, you wrote: "If you want to write code that a user can use to make DL queries, then in your code you set up the short form provider with the mapping from strings to OWL entities that you want to use." Could you please explain this statement with sample code?

Thanks,

Blaise




Re: Using owlapi to parse owl file with reserved characters, saved from Protege

Tania Tudorache
Hi Blaise,

Matthew will reply.

T.

On 06/12/2014 09:35 AM, Blaise Che wrote:
As an aside, you are correct that a space is not allowed in an IRI. The point of the example 'United States(US)' was actually the parentheses, which the owlapi examples handle differently from Protege's 'DL Query'.



            // Entities are named using IRIs. These are usually too long for use
            // in user interfaces. To solve this
            // problem, and so a query can be written using short class,
            // property, individual names we use a short form
            // provider. In this case, we'll just use a simple short form
            // provider that generates short forms from IRI
            // fragments.
            ShortFormProvider shortFormProvider = new SimpleShortFormProvider();
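
To make the observed behaviour concrete, here is a minimal stand-alone sketch of a fragment-style short form. It is not the OWL api's implementation; the class FragmentShortForm and its rule (keep whatever follows the last '#' or '/') are assumptions chosen only to reproduce what was seen in the experiment, namely that the test entity answers to the name B.

```java
// Hypothetical stand-in for a fragment-based short form provider.
final class FragmentShortForm {

    // Returns the text after the last '#' or '/' in the IRI,
    // or the whole string if neither character occurs.
    static String shortForm(String iri) {
        int cut = Math.max(iri.lastIndexOf('#'), iri.lastIndexOf('/'));
        return cut < 0 ? iri : iri.substring(cut + 1);
    }

    public static void main(String[] args) {
        // prints B
        System.out.println(shortForm("http://mousey.synology.me/ontologies/TestSlash#A/B"));
        // prints United_States_(US)
        System.out.println(shortForm("http://mousey.synology.me/ontologies/TestSlash#United_States_(US)"));
    }
}
```

Under this rule the '/' inside the fragment silently shortens the name, so a query written as A/B never matches anything, even though the IRI itself loaded without complaint.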

I
inquired at stackoverflow and one of the owlapi developers responded that it was a bug
in Protege while exporting owl file with class names having reserved character(s). Any
thoughts? Here is the discussion in stackoverflow:
http://stackoverflow.com/questions/23506879/using-owlapi-to-parse-owl-file-containing-classes-with-odd-characters

In my experiments, I saw no evidence of any bug in Protege or in the OWL api.  It is true that I didn't check the specifications for the validity of the IRI.  But the ontology saved and loaded correctly and seemed to behave correctly when it was loaded.

-Timothy




On 06/05/2014 03:01 PM, Blaise Che wrote:
Thanks Tim. The motivation here is that we are receiving an owl file from a vendor with these reserved characters, and would like to parse it and extract useful data with owlapi. The issue is very easy to reproduce by saving an owl file from Protege with a class having a reserved character. If you open the example file you just generated and go to the DLQuery tab, you will see that you can query for related class information for the class 'A/B'. However, if you try to query the same information with the official owlapi examples at https://github.com/owlcs/owlapi/tree/master/contract/src/test/java/org/coode/owlapi/examples, you will receive parsing errors because of the '/' character. One of the experts at owlapi argues that it is an error in the way Protege saves the owl file with these reserved characters (per stackoverflow link below). However, I would like a Protege developer to confirm this and, if so, say whether a fix is planned for it. Thanks!


On Thu, Jun 5, 2014 at 7:15 AM, Timothy Redmond <[hidden email]> wrote:
On 06/03/2014 09:41 AM, Blaise Che wrote:
I tried to post this in developer forum but it bounced back. Sorry if cross-posting.
I am running P4.3 and exporting an owl file with class names having forward
slashes and other reserved characters. I noticed that these characters are
maintained. However when I use the owlapi tutorial example file to query for
subclasses of these classes, I get an error.

I haven't looked at this in detail yet but there are a few general things that can be said.  First of all, Protege is based on the OWL api so - modulo OWL api versioning - anything that happens in Protege can be mirrored in the OWL api.  That being said, the Protege user interface may be doing something to your input when you put it in (especially if the syntax is illegal for IRI's).  It is also possible that the OWL api is lenient with IRI's when an ontology is saved.

But if you want to make it easy for someone to say more, then add some steps to reproduce the issue, maybe even supply an ontology.  In answering your question, a developer may try to reproduce your situation.  I did, for example create a name for an entity with  '/' in it and I found an IRI of the form:

	http://www.semanticweb.org/redmond/ontologies/2014/5/untitled-ontology-63#A/B

and Protege gave the short name B. I haven't yet checked if this is illegal syntax.

Also - what is your motivation for trying this?

-Timothy


The odd characters are getting
in the way. However, if I use 'DL Query' in Protege to query these classes
for subclasses, as long as I enclose the classes with single quotes, the
query works just fine. I can't seem to do the same using owlapi package. I
inquired at stackoverflow and one of the owlapi developers responded that it was a bug
in Protege while exporting owl file with class names having reserved character(s). Any
thoughts? Here is the discussion in stackoverflow:
http://stackoverflow.com/questions/23506879/using-owlapi-to-parse-owl-file-containing-classes-with-odd-characters



Re: Using owlapi to parse owl file with reserved characters, saved from Protege

Matthew Horridge-2
Administrator
Hi Blaise,

Tim’s reply from Jun 5, 2014 at 8:40 PM seems spot on.  To be sure, please can you post your (exact) code that sets up the DL query to the list?  My guess is that you’re not configuring the short form provider, which maps entities to short names, correctly.  You probably want to use an instance of AnnotationValueShortFormProvider.

Cheers,

Matthew





On 12 Jun 2014, at 11:40, Tania Tudorache <[hidden email]> wrote:

Hi Blaise,

Matthew will reply.

T.

Re: Using owlapi to parse owl file with reserved characters, saved from Protege

blaisec
Hi Matthew,

I am using the example at: https://github.com/owlcs/owlapi/blob/master/contract/src/test/java/org/coode/owlapi/examples/DLQueryExample.java. It looks like it uses the SimpleShortFormProvider. If you can assist with adapting it to use the AnnotationValueShortFormProvider and to query entities like 'United_States(US)' from a basic stripped-down owl file from Protege, that should resolve the issue, I guess. A code snippet would help as well.

Thanks,

Blaise


Re: Using owlapi to parse owl file with reserved characters, saved from Protege

Matthew Horridge-2
Administrator
Hi Blaise,

You need to use AnnotationValueShortFormProvider instead of SimpleShortFormProvider.  Create an instance of it, setting it up to specify the correct annotation value to use - I don’t know what this is for your particular case (rdfs:label is a typical case and you can get an instance of OWLAnnotationProperty corresponding to rdfs:label from an OWLDataFactory).  All the other values can be set to empty for now.  Give this a try and let us know if you have further problems.
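To make the difference between the two kinds of provider concrete, here is a minimal plain-Java sketch of the idea. It is not OWL API code: it only models the mapping from typed names to entity IRIs, using the rdfs:label when an entity has one. The names LabelShortForms, buildNameIndex and fragmentOf and the second IRI are made up for illustration, and the fallback to the IRI fragment is an assumption of this sketch, not documented behaviour of AnnotationValueShortFormProvider.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LabelShortForms {

    // Text after the last '#' or '/' of the IRI (the whole string if neither occurs).
    static String fragmentOf(String iri) {
        int cut = Math.max(iri.lastIndexOf('#'), iri.lastIndexOf('/'));
        return iri.substring(cut + 1);
    }

    // Builds the lookup "name typed in a query" -> "entity IRI".
    // labels maps an IRI to its rdfs:label value; entities without a label
    // fall back to the IRI fragment (an assumption of this sketch).
    static Map<String, String> buildNameIndex(List<String> iris, Map<String, String> labels) {
        Map<String, String> index = new LinkedHashMap<>();
        for (String iri : iris) {
            String label = labels.get(iri);
            index.put(label != null ? label : fragmentOf(iri), iri);
        }
        return index;
    }

    public static void main(String[] args) {
        String ab = "http://mousey.synology.me/ontologies/TestSlash#A/B";
        String c = "http://mousey.synology.me/ontologies/TestSlash#C"; // hypothetical second class
        Map<String, String> labels = new HashMap<>();
        labels.put(ab, "A/B");

        Map<String, String> index = buildNameIndex(Arrays.asList(ab, c), labels);
        System.out.println(index.get("A/B")); // the labelled class
        System.out.println(index.get("C"));   // found through the fragment fallback
        System.out.println(index.get("B"));   // null: the label replaced the fragment name
    }
}
```

With a label-based index the query name is whatever the label says, so a name such as A/B becomes usable even though the fragment-based name would only be B.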

Cheers,

Matthew



On 12 Jun 2014, at 12:09, Blaise Che <[hidden email]> wrote:

Hi Matthew,

I am using the sample example at: https://github.com/owlcs/owlapi/blob/master/contract/src/test/java/org/coode/owlapi/examples/DLQueryExample.java. Looks like it uses the SimpleShortFormProvider. If you can assist with adapting it to use the AnnotationValueShortFormProvider, and query entities like 'United_States(US)' from a basic stripped down owl file from Protege, that should resolve the issue I guess. A code snippet should assist as well.

Thanks,

Blaise


On Thu, Jun 12, 2014 at 11:45 AM, Matthew Horridge <[hidden email]> wrote:
Hi Blaise,

Tim’s reply from Jun 5, 2014 at 8:40 PM seems spot on.  To be sure, please can you post your (exact) code that sets up the DL query to the list?  My guess is that you’re not configuring the short form provider, which maps entities to short names, correctly.  You probably want to use an instance of AnnotationValueShortFormProvider.

Cheers,

Matthew





On 12 Jun 2014, at 11:40, Tania Tudorache <[hidden email]> wrote:

Hi Blaise,

Matthew will reply.

T.

On 06/12/2014 09:35 AM, Blaise Che wrote:
As an aside, you are correct that a space is not allowed in an IRI. The point of the example 'United States(US)' was actually the parentheses, which the owlapi examples handle differently from Protege's 'DL Query'.


On Thu, Jun 12, 2014 at 9:24 AM, Blaise Che <[hidden email]> wrote:
Hi Tim, thanks again. As I mentioned initially, our company is receiving owl files from one of our vendors that uses Protege to create them. Our goal is to extract the entities (superclasses and subclasses) the same way 'DL Query' does. The example on the owlapi website seems to process the owl file slightly differently from the 'DL Query' internal parser, although we know Protege uses owlapi internally as well. Our company does not have control over how our vendor saves the file (whether they use a label or not). So if you have sample code to extract entities the same way 'DL Query' does, including when those reserved characters are present, that will resolve this issue. Also, you wrote: "If you want to write code that a user can use to make DL queries, then in your code you set up the short form provider with the mapping from strings to OWL entities that you want to use." Could you please explain this statement with sample code?

Thanks,

Blaise



On Wed, Jun 11, 2014 at 10:00 PM, Timothy Redmond <[hidden email]> wrote:
On 06/09/2014 04:29 PM, Blaise Che wrote:
Thanks for the follow-up Tim. Looks like you only tested with the forward slash reserved character. Can you attempt with other reserved characters? For example, can you change your entity name to: United States (US)?

I don't think that the IRI can have a space in it.  An rdfs:label annotation can though.

As far as I can tell the space is not allowed as a character in the fragment of an IRI.  I believe that the definitive specification of IRI's is here:

             http://www.ietf.org/rfc/rfc3987.txt

My logic for concluding this used the following productions from the grammar therein:

ifragment      = *( ipchar / "/" / "?" )
ipchar         = iunreserved / pct-encoded / sub-delims / ":"
                  / "@"
iunreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar

ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD
pct-encoded    = "%" HEXDIG HEXDIG
sub-delims     = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

The ucschar ranges all lie beyond the ASCII character set (space is 0x20), and none of the other productions matches a space either. So Protege is right to replace it with underscores.  If you really want to represent a space, the best thing to try would be percent-encoding, which I think would look like this: "%20".
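The productions quoted above can be turned into a small check. The following is a sketch in plain Java (no OWL API involved) that tests a candidate fragment against exactly those productions and percent-encodes whatever is not allowed literally. The class and method names are invented for this example, and it covers only the fragment part of an IRI, not the whole RFC 3987 grammar.

```java
import java.nio.charset.StandardCharsets;

final class IFragmentCheck {

    private static final String UNRESERVED_PUNCT = "-._~";
    private static final String SUB_DELIMS = "!$&'()*+,;=";

    // The ucschar ranges, copied from the grammar quoted above.
    private static final int[][] UCSCHAR = {
        {0xA0, 0xD7FF}, {0xF900, 0xFDCF}, {0xFDF0, 0xFFEF},
        {0x10000, 0x1FFFD}, {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
        {0x40000, 0x4FFFD}, {0x50000, 0x5FFFD}, {0x60000, 0x6FFFD},
        {0x70000, 0x7FFFD}, {0x80000, 0x8FFFD}, {0x90000, 0x9FFFD},
        {0xA0000, 0xAFFFD}, {0xB0000, 0xBFFFD}, {0xC0000, 0xCFFFD},
        {0xD0000, 0xDFFFD}, {0xE1000, 0xEFFFD}
    };

    private static boolean isHexDigit(int c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    // True if the code point may appear literally in an ifragment.
    static boolean isAllowedLiteral(int cp) {
        if ((cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z') || (cp >= '0' && cp <= '9')) {
            return true;
        }
        if (cp < 0x80) {
            // remaining ASCII: unreserved punctuation, sub-delims, and ":" "@" "/" "?"
            return UNRESERVED_PUNCT.indexOf(cp) >= 0
                || SUB_DELIMS.indexOf(cp) >= 0
                || ":@/?".indexOf(cp) >= 0;
        }
        for (int[] range : UCSCHAR) {
            if (cp >= range[0] && cp <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // True if the string matches ifragment = *( ipchar / "/" / "?" ).
    static boolean isValidFragment(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp == '%') {
                // pct-encoded = "%" HEXDIG HEXDIG
                if (i + 2 >= s.length()
                        || !isHexDigit(s.charAt(i + 1))
                        || !isHexDigit(s.charAt(i + 2))) {
                    return false;
                }
                i += 3;
            } else if (isAllowedLiteral(cp)) {
                i += Character.charCount(cp);
            } else {
                return false;
            }
        }
        return true;
    }

    // Percent-encodes (as UTF-8) every code point that is not allowed literally.
    static String percentEncode(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (isAllowedLiteral(cp)) {
                out.appendCodePoint(cp);
            } else {
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(isValidFragment("A/B"));                 // true
        System.out.println(isValidFragment("United States (US)"));  // false
        System.out.println(isValidFragment("United_States_(US)"));  // true
        System.out.println(percentEncode("United States (US)"));    // United%20States%20(US)
    }
}
```

Note that by these productions the parentheses and the forward slash are allowed in a fragment; only the space is rejected.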

The OWL api didn't do so well with this example.  It saved the ontology with an IRI with a space in it but it was unable to read it back in.


On 'DL Query', I query the class as: 'United_States_(US)'. In this case, note the underscore and single quotes I have to add in 'DL Query' for it to work. Any thoughts on why? Also when I query this file with the owlapi example code, I get an error: 'Encountered United_States_ at line 1 column 1'. Do you know what class name to use to query related class information using the owlapi example? Can you confirm these issues and get any relevant input from developers as well? Thanks!
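One plausible reading of the error 'Encountered United_States_ at line 1 column 1' is that the query parser splits the name at the opening parenthesis. The toy tokenizer below illustrates that effect and why single quotes help. It is a sketch under the assumption that parentheses and whitespace delimit tokens and that single quotes group a name; it is not the parser used by Protege or the OWL API, and the class name is invented.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteAwareTokenizer {

    // Splits a query into tokens. Parentheses and whitespace end a token
    // unless they occur inside single quotes.
    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (char c : query.toCharArray()) {
            if (c == '\'') {
                if (quoted) {
                    // closing quote: emit the quoted name as one token
                    tokens.add(current.toString());
                    current.setLength(0);
                } else {
                    flush(tokens, current);
                }
                quoted = !quoted;
            } else if (quoted) {
                current.append(c);
            } else if (c == '(' || c == ')') {
                flush(tokens, current);
                tokens.add(String.valueOf(c));
            } else if (Character.isWhitespace(c)) {
                flush(tokens, current);
            } else {
                current.append(c);
            }
        }
        flush(tokens, current);
        return tokens;
    }

    // Emits the pending token, if any.
    private static void flush(List<String> tokens, StringBuilder current) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("United_States_(US)"));   // [United_States_, (, US, )]
        System.out.println(tokenize("'United_States_(US)'")); // [United_States_(US)]
    }
}
```

Unquoted, the name falls apart into four tokens, the first of which is United_States_, which no entity is called. Quoted, it stays in one piece and can be looked up as a single name.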

I am no longer sure what problem we are trying to solve here.  If you want a flexible range of reserved characters then I would recommend that you use rdfs:label to represent the names of your OWL entities.  If you want to write code that a user can use to make DL queries, then in your code you set up the short form provider with the mapping from strings to OWL entities that you want to use.

-Timothy





On Thu, Jun 5, 2014 at 8:40 PM, Timothy Redmond <[hidden email]> wrote:

Having now given a go at replicating your problem and based on what you have told us so far, I think that your troubles involve the different names for entities and how various programs are translating these names.


I am running P4.3 and exporting an owl file with class names having forward
slashes and other reserved characters.

I made an ontology (attached) that has an entity with the name

    http://mousey.synology.me/ontologies/TestSlash#A/B

I don't know if this is a valid IRI or not but the OWL api had no problems saving or reading it.  I think that I remember that the code that checks IRI's for validity was fixed quite some time ago and now properly checks their syntax but I am not certain.

The IRI is the "real" name for the entity and it is what you use if you want to be unambiguous.  Unfortunately this is not a convenient name for humans and this is why Protege and other tools let you use other simpler names such as the rdfs:label. 

Different mappings between the readable names and the IRI are possible and you are using two OWL api programs (Protege and the DLQueryExample) that have been set up with different mappings.


The odd characters are getting
in the way. However, if I use 'DL Query' in Protege to query these classes
for subclasses, as long as I enclose the classes with single quotes, the
query works just fine. I can't seem to do the same using owlapi package.


Before I gave this entity an rdfs:label of "A/B", it rendered in Protege as B and the DLQuery tab would properly show the inferred individuals in B.  After I gave this entity an rdfs:label of "A/B", it rendered in Protege as A/B and the DLQuery tab would properly show the inferred individuals in A/B.  However, in both cases the DLQueryExample.java program that you mentioned would only respond to the name B.

Thus Protege is allowing you to use whatever name is used to render the entity.  The DLQueryExample program is simply using the fragment at the end of the IRI (e.g. B for our entity) and this is indicated in the comment before the short form provider is set in the program:

            // Entities are named using IRIs. These are usually too long for use
            // in user interfaces. To solve this
            // problem, and so a query can be written using short class,
            // property, individual names we use a short form
            // provider. In this case, we'll just use a simple short form
            // provider that generates short forms from IRI
            // fragments.
            ShortFormProvider shortFormProvider = new SimpleShortFormProvider();
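The behaviour observed above (only B is recognised) is what one gets when the short name is simply the tail of the IRI. Here is a plain-Java approximation, assuming the short form is whatever follows the last '#' or '/'. It mimics the observed result and is not the actual implementation of SimpleShortFormProvider; the class name is invented.

```java
final class FragmentShortForm {

    // Returns the text after the last '#' or '/' in the IRI,
    // or the whole string if neither character occurs.
    static String shortForm(String iri) {
        int cut = Math.max(iri.lastIndexOf('#'), iri.lastIndexOf('/'));
        return iri.substring(cut + 1);
    }

    public static void main(String[] args) {
        // Both test IRIs from this thread end in "#A/B".
        System.out.println(shortForm("http://mousey.synology.me/ontologies/TestSlash#A/B")); // B
        System.out.println(shortForm(
            "http://www.semanticweb.org/redmond/ontologies/2014/5/untitled-ontology-63#A/B")); // B
    }
}
```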

I
inquired at stackoverflow and one of the owlapi developers responded that it was a bug
in Protege while exporting owl file with class names having reserved character(s). Any
thoughts? Here is the discussion in stackoverflow:
http://stackoverflow.com/questions/23506879/using-owlapi-to-parse-owl-file-containing-classes-with-odd-characters

In my experiments, I saw no evidence of any bug in Protege or in the OWL api.  It is true that I didn't check the specifications for the validity of the IRI.  But the ontology saved and loaded correctly and seemed to behave correctly when it was loaded.

-Timothy




On 06/05/2014 03:01 PM, Blaise Che wrote:
Thanks Tim. The motivation here is that we are receiving an owl file from a vendor with these reserved characters, and would like to parse it and extract useful data with owlapi. The issue is very easy to reproduce by saving an owl file from Protege with a class having a reserved character. If you open the example file you just generated and go to the DLQuery tab, you will see that you can query for related class information for the class 'A/B'. However, if you try to query the same information with the official owlapi examples at https://github.com/owlcs/owlapi/tree/master/contract/src/test/java/org/coode/owlapi/examples, you will receive parsing errors because of the '/' character. One of the experts at owlapi argues that it is an error in the way Protege saves the owl file with these reserved characters (per stackoverflow link below). However, I would like a Protege developer to confirm this and, if so, say whether a fix is planned for it. Thanks!


On Thu, Jun 5, 2014 at 7:15 AM, Timothy Redmond <[hidden email]> wrote:
On 06/03/2014 09:41 AM, Blaise Che wrote:
I tried to post this in developer forum but it bounced back. Sorry if cross-posting.
I am running P4.3 and exporting an owl file with class names having forward
slashes and other reserved characters. I noticed that these characters are
maintained. However when I use the owlapi tutorial example file to query for
subclasses of these classes, I get an error.

I haven't looked at this in detail yet but there are a few general things that can be said.  First of all, Protege is based on the OWL api so - modulo OWL api versioning - anything that happens in Protege can be mirrored in the OWL api.  That being said, the Protege user interface may be doing something to your input when you put it in (especially if the syntax is illegal for IRI's).  It is also possible that the OWL api is lenient with IRI's when an ontology is saved.

But if you want to make it easy for someone to say more, then add some steps to reproduce the issue, maybe even supply an ontology.  In answering your question, a developer may try to reproduce your situation.  I did, for example create a name for an entity with  '/' in it and I found an IRI of the form:

	http://www.semanticweb.org/redmond/ontologies/2014/5/untitled-ontology-63#A/B

and Protege gave the short name B. I haven't yet checked if this is illegal syntax.

Also - what is your motivation for trying this?

-Timothy


The odd characters are getting
in the way. However, if I use 'DL Query' in Protege to query these classes
for subclasses, as long as I enclose the classes with single quotes, the
query works just fine. I can't seem to do the same using owlapi package. I
inquired at stackoverflow and one of the owlapi developers responded that it was a bug
in Protege while exporting owl file with class names having reserved character(s). Any
thoughts? Here is the discussion in stackoverflow:
http://stackoverflow.com/questions/23506879/using-owlapi-to-parse-owl-file-containing-classes-with-odd-characters



Re: Using owlapi to parse owl file with reserved characters, saved from Protege

blaisec
Hi Matthew, appreciate your support. I have not been able to get the AnnotationValueShortFormProvider working so far in my case. I have attached a sample Protege generated owl file. If you have a code snippet on using the AnnotationValueShortFormProvider instead for entity extraction, please send. Else, I will keep on experimenting.

Again, appreciate your time. Great team out there!

Blaise


OwlSample.owl (1K) Download Attachment
Re: Using owlapi to parse owl file with reserved characters, saved from Protege

Lorenz Buehmann
Hi Blaise,

I'm not sure I followed the whole workflow, but your attached ontology contains no annotations such as rdfs:label. So it wouldn't make sense to use the AnnotationValueShortFormProvider in this case, unless it has an internal fallback such as the SimpleShortFormProvider.
@Matthew: Is there any implementation in this direction? Otherwise it should be rather simple to implement it I guess.

Lorenz
On 06/13/2014 12:30 AM, Blaise Che wrote:
Hi Matthew, appreciate your support. I have not been able to get the AnnotationValueShortFormProvider working so far in my case. I have attached a sample Protege generated owl file. If you have a code snippet on using the AnnotationValueShortFormProvider instead for entity extraction, please send. Else, I will keep on experimenting.

Again, appreciate your time. Great team out there!

Blaise


On Thu, Jun 12, 2014 at 12:14 PM, Matthew Horridge <[hidden email]> wrote:
Hi Blaise,

You need to use AnnotationValueShortFormProvider instead of SimpleShortFormProvider.  Create an instance of it, setting it up to specify the correct annotation value to use - I don’t know what this is for your particular case (rdfs:label is a typical case and you can get an instance of OWLAnnotationProperty corresponding to rdfs:label from an OWLDataFactory).  All the other values can be set to empty for now.  Give this a try and let us know if you have further problems.

Cheers,

Matthew



On 12 Jun 2014, at 12:09, Blaise Che <[hidden email]> wrote:

Hi Matthew,

I am using the sample example at: https://github.com/owlcs/owlapi/blob/master/contract/src/test/java/org/coode/owlapi/examples/DLQueryExample.java. Looks like it uses the SimpleShortFormProvider. If you can assist with adapting it to use the AnnotationValueShortFormProvider, and query entities like 'United_States(US)' from a basic stripped down owl file from Protege, that should resolve the issue I guess. A code snippet should assist as well.

Thanks,

Blaise


On Thu, Jun 12, 2014 at 11:45 AM, Matthew Horridge <[hidden email]> wrote:
Hi Blaise,

Tim’s reply from Jun 5, 2014 at 8:40 PM seems spot on.  To be sure, please can you post your (exact) code that sets up the DL query to the list?  My guess is that you’re not configuring the short form provider, which maps entities to short names, correctly.  You probably want to use an instance of AnnotationValueShortFormProvider.

Cheers,

Matthew





On 12 Jun 2014, at 11:40, Tania Tudorache <[hidden email]> wrote:

Hi Blaise,

Matthew will reply.

T.

On 06/12/2014 09:35 AM, Blaise Che wrote:
As an aside, you are correct that a space is not allowed in an IRI. The point of the example 'United States(US)' was actually the parentheses, which the owlapi examples handle differently from Protege's 'DL Query'.


On Thu, Jun 12, 2014 at 9:24 AM, Blaise Che <[hidden email]> wrote:
Hi Tim, thanks again. As I mentioned initially, our company is receiving owl files from one of our vendors that uses Protege to create them. Our goal is to extract the entities (superclasses and subclasses) the same way 'DL Query' does. The example on the owlapi website seems to process the owl file slightly differently from the 'DL Query' internal parser, although we know Protege uses owlapi internally as well. Our company does not have control over how our vendor saves the file (whether they use a label or not). So if you have sample code to extract entities the same way 'DL Query' does, including when those reserved characters are present, that will resolve this issue. Also, you wrote: "If you want to write code that a user can use to make DL queries, then in your code you set up the short form provider with the mapping from strings to OWL entities that you want to use." Could you please explain this statement with sample code?

Thanks,

Blaise



On Wed, Jun 11, 2014 at 10:00 PM, Timothy Redmond <[hidden email]> wrote:
On 06/09/2014 04:29 PM, Blaise Che wrote:
Thanks for the follow-up Tim. Looks like you only tested with the forward slash reserved character. Can you attempt with other reserved characters? For example, can you change your entity name to: United States (US)?

I don't think that the IRI can have a space in it.  An rdfs:label annotation can though.

As far as I can tell the space is not allowed as a character in the fragment of an IRI.  I believe that the definitive specification of IRI's is here:

             http://www.ietf.org/rfc/rfc3987.txt

My logic for concluding this used the following productions from the grammar therein:

ifragment      = *( ipchar / "/" / "?" )
ipchar         = iunreserved / pct-encoded / sub-delims / ":"
                  / "@"
iunreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar

ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD
pct-encoded    = "%" HEXDIG HEXDIG
sub-delims     = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

The ucschar characters are all beyond the ASCII character set (space is 0x20), and no other production matches a space. So Protege is right to replace spaces with underscores.  If you really want to represent a space, the best thing you could try would be percent-encoding, which I think would look like this: "%20".
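
The grammar argument above can be checked mechanically. The following is a minimal sketch in plain Java, transcribing only the productions quoted in this message; the class and method names are made up for illustration, and it is not how Protege or the OWL api validate IRIs. It tests whether a string is a legal ifragment and percent-encodes any character that is not.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the OWL API: checks a string against the
// RFC 3987 "ifragment" productions quoted above.
class IriFragmentCheck {

    private static final String SUB_DELIMS = "!$&'()*+,;=";

    // ucschar ranges, copied from the quoted grammar.
    private static final int[][] UCSCHAR = {
        {0xA0, 0xD7FF}, {0xF900, 0xFDCF}, {0xFDF0, 0xFFEF},
        {0x10000, 0x1FFFD}, {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
        {0x40000, 0x4FFFD}, {0x50000, 0x5FFFD}, {0x60000, 0x6FFFD},
        {0x70000, 0x7FFFD}, {0x80000, 0x8FFFD}, {0x90000, 0x9FFFD},
        {0xA0000, 0xAFFFD}, {0xB0000, 0xBFFFD}, {0xC0000, 0xCFFFD},
        {0xD0000, 0xDFFFD}, {0xE1000, 0xEFFFD}
    };

    // iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar
    static boolean isUnreserved(int cp) {
        boolean alpha = (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
        boolean digit = cp >= '0' && cp <= '9';
        if (alpha || digit || cp == '-' || cp == '.' || cp == '_' || cp == '~') {
            return true;
        }
        for (int[] range : UCSCHAR) {
            if (cp >= range[0] && cp <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // Every single-character alternative of ifragment (all but pct-encoded).
    static boolean isFragmentChar(int cp) {
        return isUnreserved(cp)
            || (cp < 128 && SUB_DELIMS.indexOf(cp) >= 0)
            || cp == ':' || cp == '@' || cp == '/' || cp == '?';
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    // ifragment = *( ipchar / "/" / "?" ), where "%" must start a pct-encoded triple.
    static boolean isValidFragment(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp == '%') {
                if (i + 2 >= s.length() || !isHex(s.charAt(i + 1)) || !isHex(s.charAt(i + 2))) {
                    return false;
                }
                i += 3;
            } else if (isFragmentChar(cp)) {
                i += Character.charCount(cp);
            } else {
                return false;
            }
        }
        return true;
    }

    // Replaces every character that is not allowed with its UTF-8 percent-encoding.
    static String percentEncode(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (isFragmentChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(isValidFragment("A/B"));                // true
        System.out.println(isValidFragment("United States (US)")); // false (space)
        System.out.println(isValidFragment("United_States_(US)")); // true
        System.out.println(percentEncode("United States (US)"));   // United%20States%20(US)
    }
}
```

Under these productions the slash and the parentheses are legal in a fragment, and only the space needs replacing or encoding.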

The OWL api didn't do so well with this example.  It saved the ontology with an IRI with a space in it but it was unable to read it back in.


On 'DL Query', I query the class as: 'United_States_(US)'. In this case, note the underscore and single quotes I have to add in 'DL Query' for it to work. Any thoughts on why? Also when I query this file with the owlapi example code, I get an error: 'Encountered United_States_ at line 1 column 1'. Do you know what class name to use to query related class information using the owlapi example? Can you confirm these issues and get any relevant input from developers as well? Thanks!

I am no longer sure what problem we are trying to solve here.  If you want a flexible range of reserved characters then I would recommend that you use rdfs:label to represent the names of your OWL entities.  If you want to write code that a user can use to make DL queries, then in your code you set up the short form provider with the mapping from strings to OWL entities that you want to use.
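
To make the "mapping from strings to OWL entities" idea concrete, here is a minimal sketch in plain Java. It is a toy model and not the OWL api parser: the tokenizer rules, the class name and the example.org IRI are all assumptions for illustration. It shows why a name containing parentheses may need single quotes: unquoted, the name is split at the parenthesis and the first piece is not a known name, which would be consistent with an error like "Encountered United_States_".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a query front end: tokenize, then look names up in a
// name -> IRI table. Hypothetical code, not the OWL API's implementation.
class QuotedNameLookup {

    // Text inside single quotes becomes one token (quotes removed). Outside
    // quotes, whitespace separates tokens and parentheses are their own tokens.
    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (char c : query.toCharArray()) {
            if (c == '\'') {
                if (quoted) {
                    tokens.add(current.toString());
                    current.setLength(0);
                } else {
                    flush(tokens, current);
                }
                quoted = !quoted;
            } else if (quoted) {
                current.append(c);
            } else if (Character.isWhitespace(c)) {
                flush(tokens, current);
            } else if (c == '(' || c == ')') {
                flush(tokens, current);
                tokens.add(String.valueOf(c));
            } else {
                current.append(c);
            }
        }
        flush(tokens, current);
        return tokens;
    }

    private static void flush(List<String> tokens, StringBuilder current) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    // Resolves the first token of the query against the name table.
    static String resolveFirst(Map<String, String> nameToIri, String query) {
        List<String> tokens = tokenize(query);
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("Empty query");
        }
        String iri = nameToIri.get(tokens.get(0));
        if (iri == null) {
            throw new IllegalArgumentException("Encountered " + tokens.get(0));
        }
        return iri;
    }

    public static void main(String[] args) {
        Map<String, String> names = new LinkedHashMap<>();
        // Hypothetical IRI, for illustration only.
        names.put("United_States_(US)", "http://example.org/geo#United_States_(US)");

        System.out.println(tokenize("United_States_(US)"));   // [United_States_, (, US, )]
        System.out.println(tokenize("'United_States_(US)'")); // [United_States_(US)]
        System.out.println(resolveFirst(names, "'United_States_(US)'"));
    }
}
```

Whatever strings are placed in the name table are the strings a user can type; which strings those are (IRI fragments, labels, something else) is the choice the short form provider makes.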

-Timothy





On Thu, Jun 5, 2014 at 8:40 PM, Timothy Redmond <[hidden email]> wrote:

Having now given a go at replicating your problem and based on what you have told us so far, I think that your troubles involve the different names for entities and how various programs are translating these names.


I am running P4.3 and exporting an owl file with class names having forward
slashes and other reserved characters.

I made an ontology (attached) that has an entity with the name

    http://mousey.synology.me/ontologies/TestSlash#A/B

I don't know if this is a valid IRI or not but the OWL api had no problems saving or reading it.  I think that I remember that the code that checks IRI's for validity was fixed quite some time ago and now properly checks their syntax but I am not certain.

The IRI is the "real" name for the entity and it is what you use if you want to be unambiguous.  Unfortunately this is not a convenient name for humans and this is why Protege and other tools let you use other simpler names such as the rdfs:label. 

Different mappings between the readable names and the IRI are possible and you are using two OWL api programs (Protege and the DLQueryExample) that have been set up with different mappings.


The odd characters are getting
in the way. However, if I use 'DL Query' in Protege to query these classes
for subclasses, as long as I enclose the classes with single quotes, the
query works just fine. I can't seem to do the same using owlapi package.


Before I gave this entity an rdfs:label of "A/B", it rendered in Protege as B and the DLQuery tab would properly show the inferred individuals in B.  After I gave this entity an rdfs:label of "A/B", it rendered in Protege as A/B and the DLQuery tab would properly show the inferred individuals in A/B.  However, in both cases the DLQueryExample.java program that you mentioned would only respond to the name B.

Thus Protege is allowing you to use whatever name is used to render the entity.  The DLQueryExample program is simply using the fragment at the end of the IRI (e.g. B for our entity) and this is indicated in the comment before the short form provider is set in the program:

            // Entities are named using IRIs. These are usually too long for use
            // in user interfaces. To solve this
            // problem, and so a query can be written using short class,
            // property, individual names we use a short form
            // provider. In this case, we'll just use a simple short form
            // provider that generates short forms from IRI
            // fragments.
            ShortFormProvider shortFormProvider = new SimpleShortFormProvider();
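
As a rough illustration of why a fragment-based provider answers to B rather than A/B, here is a simplified model in plain Java. It assumes the short form is whatever follows the last '#' or '/' in the IRI; the real SimpleShortFormProvider may use different rules, and the class name here is made up.

```java
// Simplified model (an assumption, not the OWL API's actual code):
// the short form is the text after the last '#' or '/' in the IRI.
class FragmentShortForm {

    static String shortForm(String iri) {
        int cut = Math.max(iri.lastIndexOf('#'), iri.lastIndexOf('/'));
        // If neither separator is present, cut is -1 and the whole string is returned.
        return iri.substring(cut + 1);
    }

    public static void main(String[] args) {
        String iri = "http://mousey.synology.me/ontologies/TestSlash#A/B";
        System.out.println(shortForm(iri)); // prints "B"
    }
}
```

Under this model the slash inside the fragment is treated as a separator, so "A/B" is never offered as a name unless something else, such as a label, supplies it.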

I inquired at stackoverflow and one of the owlapi developers responded that it was a bug
in Protege while exporting owl file with class names having reserved character(s). Any
thoughts? Here is the discussion in stackoverflow:
http://stackoverflow.com/questions/23506879/using-owlapi-to-parse-owl-file-containing-classes-with-odd-characters

In my experiments, I saw no evidence of any bug in Protege or in the OWL api.  It is true that I didn't check the specifications for the validity of the IRI.  But the ontology saved and loaded correctly and seemed to behave correctly when it was loaded.

-Timothy




On 06/05/2014 03:01 PM, Blaise Che wrote:
Thanks Tim. The motivation here is that we are receiving an owl file from a vendor with these reserved characters, and would like to parse and extract useful data with owlapi. The issue is very easy to reproduce by saving an owl file from Protege with a class having a reserved character. If you open the example file you just generated and go to the DLQuery tab, you will see that you can query for related class information for the class 'A/B'. However, if you try to query the same information with the official owlapi examples at https://github.com/owlcs/owlapi/tree/master/contract/src/test/java/org/coode/owlapi/examples, you will receive parsing errors because of the '/' character. One of the experts at owlapi argues that it is an error in the way Protege saves the owl file with these reserved characters (per stackoverflow link below). However, I would like a Protege developer to confirm and, if so, say whether a fix is planned for it. Thanks!


On Thu, Jun 5, 2014 at 7:15 AM, Timothy Redmond <[hidden email]> wrote:
On 06/03/2014 09:41 AM, Blaise Che wrote:
I tried to post this in developer forum but it bounced back. Sorry if cross-posting.
I am running P4.3 and exporting an owl file with class names having forward
slashes and other reserved characters. I noticed that these characters are
maintained. However when I use the owlapi tutorial example file to query for
subclasses of these classes, I get an error.

I haven't looked at this in detail yet but there are a few general things that can be said.  First of all, Protege is based on the OWL api so - modulo OWL api versioning - anything that happens in Protege can be mirrored in the OWL api.  That being said, the Protege user interface may be doing something to your input when you put it in (especially if the syntax is illegal for IRI's).  It is also possible that the OWL api is lenient with IRI's when an ontology is saved.

But if you want to make it easy for someone to say more, then add some steps to reproduce the issue, maybe even supply an ontology.  In answering your question, a developer may try to reproduce your situation.  I did, for example, create a name for an entity with '/' in it and I found an IRI of the form:

	http://www.semanticweb.org/redmond/ontologies/2014/5/untitled-ontology-63#A/B

and Protege gave the short name B. I haven't yet checked if this is illegal syntax.

Also - what is your motivation for trying this?

-Timothy


The odd characters are getting
in the way. However, if I use 'DL Query' in Protege to query these classes
for subclasses, as long as I enclose the classes with single quotes, the
query works just fine. I can't seem to do the same using owlapi package. I
inquired at stackoverflow and one of the owlapi developers responded that it was a bug
in Protege while exporting owl file with class names having reserved character(s). Any
thoughts? Here is the discussion in stackoverflow:
http://stackoverflow.com/questions/23506879/using-owlapi-to-parse-owl-file-containing-classes-with-odd-characters




Re: Using owlapi to parse owl file with reserved characters, saved from Protege

blaisec
Hello Lorenz,

The owl file we receive from our vendor does not have any labels. It is classified material, so I attached a basic example to illustrate. I still cannot get AnnotationValueShortFormProvider to work for it. If Matthew or someone else could assist with extracting the entity from that example the same way 'DL Query' does, then that should do it.

Thanks,

Blaise


On Fri, Jun 13, 2014 at 3:52 AM, Lorenz Bühmann <[hidden email]> wrote:
Hi Blaise,

I'm not sure if I got the whole workflow, but in your attached ontology there are no annotations like rdfs:label etc, thus, it wouldn't make sense to use the AnnotationValueShortFormProvider in this case unless this one has an internal fallback like the SimpleShortFormProvider.
@Matthew: Is there any implementation in this direction? Otherwise it should be rather simple to implement it I guess.
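
The fallback idea can be pictured with a small plain-Java sketch: prefer a label when one exists, otherwise fall back to a fragment-style name. This is only a model under stated assumptions: labels are held in an ordinary map keyed by IRI string, the fragment rule is the simplified "text after the last '#' or '/'", and none of the names below come from the OWL api.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "label first, fragment as fallback"; not OWL API code.
class LabelWithFallback {

    // Simplified fragment rule (assumption): text after the last '#' or '/'.
    static String fragment(String iri) {
        int cut = Math.max(iri.lastIndexOf('#'), iri.lastIndexOf('/'));
        return iri.substring(cut + 1);
    }

    // Uses the label if the entity has a non-empty one, else the fragment.
    static String shortForm(String iri, Map<String, String> labels) {
        String label = labels.get(iri);
        return (label != null && !label.isEmpty()) ? label : fragment(iri);
    }

    public static void main(String[] args) {
        String iri = "http://mousey.synology.me/ontologies/TestSlash#A/B";

        Map<String, String> noLabels = new HashMap<>();
        System.out.println(shortForm(iri, noLabels));   // prints "B"

        Map<String, String> withLabel = new HashMap<>();
        withLabel.put(iri, "A/B");
        System.out.println(shortForm(iri, withLabel));  // prints "A/B"
    }
}
```

With an ontology that has no labels at all, such a provider would behave the same as the fragment-only one, which matches the rendering Tim reported for the entity before and after it was given an rdfs:label.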

Lorenz


Re: Using owlapi to parse owl file with reserved characters, saved from Protege

Lorenz Buehmann
Hi Blaise,

I'm still not sure if I get it, but if your vendor's ontology contains no labels and no other textual description of entities via other annotation properties, then the AnnotationValueShortFormProvider cannot work, because there are no annotations for it to use. In this case you would have to use the SimpleShortFormProvider. But as I said, I did not follow the entire discussion and probably did not get the whole intended process.

Lorenz
On 06/14/2014 12:37 AM, Blaise Che wrote:
Hello Lorenz,

The owl file we receive from our vendor does not have any labels. It is classified material, so I attached a basic example to illustrate. I still cannot get AnnotationValueShortFormProvider to work for it. If Matthew or an other could assist with extracting the entity from that example the same way 'DL Query' does, then that should do it.

Thanks,

Blaise


On Fri, Jun 13, 2014 at 3:52 AM, Lorenz Bühmann <[hidden email]> wrote:
Hi Blaise,

I'm not sure if I got the whole workflow, but in your attached ontology there are no annotations like rdfs:label etc, thus, it wouldn't make sense to use the AnnotationValueShortFormProvider in this case unless this one has an internal fallback like the SimpleShortFormProvider.
@Matthew: Is there any implementation in this direction? Otherwise it should be rather simple to implement it I guess.

Lorenz

On 06/13/2014 12:30 AM, Blaise Che wrote:
Hi Matthew, appreciate your support. I have not been able to get the AnnotationValueShortFormProvider working so far in my case. I have attached a sample Protege generated owl file. If you have a code snippet on using the AnnotationValueShortFormProvider instead for entity extraction, please send. Else, I will keep on experimenting.

Again, appreciate your time. Great team out there!

Blaise


On Thu, Jun 12, 2014 at 12:14 PM, Matthew Horridge <[hidden email]> wrote:
Hi Blaise,

You need to use AnnotationValueShortFormProvider instead of SimpleShortFormProvider.  Create an instance of, setting it up to specify the correct annotation value to use - I don’t know what this is for your particular case (rdfs:label is a typical case and you can get an instance of OWLAnnotationProperty corresponding to rdfs:label from an OWLDataFactory).  All the other values can be set to empty for now.  Give this a try and let us know if you have further problems.

Cheers,

Matthew



On 12 Jun 2014, at 12:09, Blaise Che <[hidden email]> wrote:

Hi Matthew,

I am using the sample example at: https://github.com/owlcs/owlapi/blob/master/contract/src/test/java/org/coode/owlapi/examples/DLQueryExample.java. Looks like it uses the SimpleShortFormProvider. If you can assist with adapting it to use the AnnotationValueShortFormProvider, and query entities like 'United_States(US)' from a basic stripped down owl file from Protege, that should resolve the issue I guess. A code snippet should assist as well.

Thanks,

Blaise


On Thu, Jun 12, 2014 at 11:45 AM, Matthew Horridge <[hidden email]> wrote:
Hi Blaise,

Tim’s reply from Jun 5, 2014 at 8:40 PM seems spot on.  To be sure, please can you post your (exact) code that sets up the DL query to the list?  My guess is that you’re not configuring the short form provider, which maps entities to short names, correctly.  You probably want to use an instance of AnnotationValueShortFormProvider.

Cheers,

Matthew





On 12 Jun 2014, at 11:40, Tania Tudorache <[hidden email]> wrote:

Hi Blaise,

Matthew will reply.

T.

On 06/12/2014 09:35 AM, Blaise Che wrote:
As an aside, you are correct that a space is not allowed on an IRI. The highlight on the example: 'United States(US)' was actually the parenthesis for which owlapi examples handle differently from Protege's 'DL Query'.


On Thu, Jun 12, 2014 at 9:24 AM, Blaise Che <[hidden email]> wrote:
Hi Tim, thanks again. As I mentioned initially, our company is receiving owl files from one of our vendors that uses Protege to create them. Our goal is to extract the entities (superclasses and subclasses) the same way 'DL Query' does. The example on the owlapi website seems to process the owl file slightly differently from the 'DL Query' internal parser, although we know Protege uses owlapi internally as well. Our company does not have control over how our vendor saves the file (if they use a label or not). So if you have sample code to extract entities the same way 'DL Query' does, including when those reserved characters are available, that will resolve this issue. Also, you wrote: "If you want to write code that a user can use to make DL queries, then in your code you setup the short form provider with the mapping from strings to OWL entities that you want to use.a". Could you please explain this statement with sample code?

Thanks,

Blaise



On Wed, Jun 11, 2014 at 10:00 PM, Timothy Redmond <[hidden email]> wrote:
On 06/09/2014 04:29 PM, Blaise Che wrote:
Thanks for the follow-up Tim. Looks like you only tested with the forward slash reserved character. Can you attempt with other reserved characters? For example, can you change your entity name to: United States (US)?

I don't think that the IRI can have a space in it.  An rdfs:label annotation can though.

As far as I can tell the space is not allowed as a character in the fragment of an IRI.  I believe that the definitive specification of IRI's is here:

             http://www.ietf.org/rfc/rfc3987.txt

My logic for concluding this used the following productions from the grammar therein:

ifragment      = *( ipchar / "/" / "?" )
ipchar         = iunreserved / pct-encoded / sub-delims / ":"
                  / "@"
iunreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar

ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD
pct-encoded    = "%" HEXDIG HEXDIG
sub-delims     = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

The ucschar ranges all lie beyond the ASCII character set (space is 0x20), and none of the other productions admits a space. So Protege is right to replace spaces with underscores.  If you really want to represent a space, the best thing you could try would be percent-encoding, which I think would look like this: "%20".
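To make the grammar reading above concrete, here is a minimal Java sketch that checks each code point against the quoted ifragment productions and percent-encodes anything outside them. It is not part of the OWL api; the class and method names are made up for illustration. It also assumes the input is a raw name, so a literal '%' is itself encoded rather than treated as the start of an existing escape.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an OWL API class: applies the RFC 3987
// ifragment productions quoted above to a raw entity name.
final class IriFragmentEncoder {
    // iunreserved punctuation + sub-delims + ":" "@" (ipchar) + "/" "?" (ifragment)
    private static final String ALLOWED_PUNCTUATION = "-._~" + "!$&'()*+,;=" + ":@" + "/?";

    static boolean isAllowedInFragment(int cp) {
        if (cp < 0x80) {
            boolean alpha = (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
            boolean digit = cp >= '0' && cp <= '9';
            return alpha || digit || ALLOWED_PUNCTUATION.indexOf(cp) >= 0;
        }
        return isUcsChar(cp);
    }

    // The ucschar ranges from the grammar.
    static boolean isUcsChar(int cp) {
        if ((cp >= 0xA0 && cp <= 0xD7FF) || (cp >= 0xF900 && cp <= 0xFDCF)
                || (cp >= 0xFDF0 && cp <= 0xFFEF)) {
            return true;
        }
        // %x10000-1FFFD through %xD0000-DFFFD: each plane minus its last two code points
        if (cp >= 0x10000 && cp <= 0xDFFFD) {
            return (cp & 0xFFFF) <= 0xFFFD;
        }
        return cp >= 0xE1000 && cp <= 0xEFFFD;
    }

    // Percent-encodes (as UTF-8 bytes) every code point the grammar does not admit.
    static String encodeFragment(String raw) {
        StringBuilder out = new StringBuilder();
        raw.codePoints().forEach(cp -> {
            if (isAllowedInFragment(cp)) {
                out.appendCodePoint(cp);
            } else {
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeFragment("United States (US)")); // prints United%20States%20(US)
        System.out.println(encodeFragment("A/B"));                // prints A/B
    }
}
```

Under these productions the space is the only character in "United States (US)" that needs escaping. The parentheses are sub-delims and the slash is admitted directly by ifragment, so neither is illegal in the fragment itself.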

The OWL api didn't do so well with this example.  It saved the ontology with an IRI with a space in it but it was unable to read it back in.


On 'DL Query', I query the class as: 'United_States_(US)'. In this case, note the underscore and single quotes I have to add in 'DL Query' for it to work. Any thoughts on why? Also when I query this file with the owlapi example code, I get an error: 'Encountered United_States_ at line 1 column 1'. Do you know what class name to use to query related class information using the owlapi example? Can you confirm these issues and get any relevant input from developers as well? Thanks!

I am no longer sure what problem we are trying to solve here.  If you want a flexible range of reserved characters then I would recommend that you use rdfs:label to represent the names of your OWL entities.  If you want to write code that a user can use to make DL queries, then in your code you setup the short form provider with the mapping from strings to OWL entities that you want to use.
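As a rough illustration of what "the mapping from strings to OWL entities" amounts to, here is a plain-Java sketch, deliberately written without the OWL api so that it stays self-contained. The class name, the example.org IRIs, and the choice to reject a name that is already bound to a different entity are all assumptions made for illustration; in a real program the short form provider would supply the names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a name-to-entity lookup; entities are
// represented by their IRI strings to keep the sketch dependency-free.
final class EntityNameIndex {
    private final Map<String, String> nameToIri = new LinkedHashMap<>();

    // Binds a human-readable name (e.g. an rdfs:label value) to an entity IRI.
    void register(String name, String iri) {
        String previous = nameToIri.putIfAbsent(name, iri);
        if (previous != null && !previous.equals(iri)) {
            // Two entities sharing one name would make queries ambiguous.
            throw new IllegalArgumentException(
                    "Name '" + name + "' is already bound to " + previous);
        }
    }

    // Looks up the entity a query string refers to, if any.
    Optional<String> resolve(String name) {
        return Optional.ofNullable(nameToIri.get(name));
    }

    public static void main(String[] args) {
        EntityNameIndex index = new EntityNameIndex();
        // A label may contain spaces and parentheses even though the IRI does not.
        index.register("United States (US)", "http://example.org/geo#United_States_US");
        // Without a label, only the fragment-derived name is known.
        index.register("B", "http://example.org/test#A/B");

        System.out.println(index.resolve("United States (US)").orElse("<unknown>"));
        System.out.println(index.resolve("A/B").orElse("<unknown>"));
    }
}
```

The point of the sketch is that a query can only use the names that were registered: if the index is built from IRI fragments, "A/B" is unknown, and if it is built from labels, the label text is what resolves.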

-Timothy





On Thu, Jun 5, 2014 at 8:40 PM, Timothy Redmond <[hidden email]> wrote:

Having now given a go at replicating your problem and based on what you have told us so far, I think that your troubles involve the different names for entities and how various programs are translating these names.


I am running P4.3 and exporting an owl file with class names having forward
slashes and other reserved characters.

I made an ontology (attached) that has an entity with the name

    http://mousey.synology.me/ontologies/TestSlash#A/B

I don't know if this is a valid IRI or not but the OWL api had no problems saving or reading it.  I think that I remember that the code that checks IRI's for validity was fixed quite some time ago and now properly checks their syntax but I am not certain.

The IRI is the "real" name for the entity and it is what you use if you want to be unambiguous.  Unfortunately this is not a convenient name for humans and this is why Protege and other tools let you use other simpler names such as the rdfs:label. 

Different mappings between the readable names and the IRI are possible and you are using two OWL api programs (Protege and the DLQueryExample) that have been set up with different mappings.


The odd characters are getting
in the way. However, if I use 'DL Query' in Protege to query these classes
for subclasses, as long as I enclose the classes with single quotes, the
query works just fine. I can't seem to do the same using owlapi package.


Before I gave this entity an rdfs:label of "A/B", it rendered in Protege as B and the DLQuery tab would properly show the inferred individuals in B.  After I gave this entity an rdfs:label of "A/B", it rendered in Protege as A/B and the DLQuery tab would properly show the inferred individuals in A/B.  However, in both cases the DLQueryExample.java program that you mentioned would only respond to the name B.

Thus Protege is allowing you to use whatever name is used to render the entity.  The DLQueryExample program is simply using the fragment at the end of the IRI (e.g. B for our entity) and this is indicated in the comment before the short form provider is set in the program:

            // Entities are named using IRIs. These are usually too long for use
            // in user interfaces. To solve this
            // problem, and so a query can be written using short class,
            // property, individual names we use a short form
            // provider. In this case, we'll just use a simple short form
            // provider that generates short forms from IRI
            // fragments.
            ShortFormProvider shortFormProvider = new SimpleShortFormProvider();
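
For illustration only, the fragment-based naming described above behaves roughly like the plain-Java sketch below. It assumes the short form is whatever follows the last '#' or '/' in the IRI, which reproduces the B observed for the TestSlash#A/B entity. The actual SimpleShortFormProvider may split IRIs by different rules, and the class name here is hypothetical.

```java
// Hypothetical sketch, not the OWL API implementation: derives a short
// name from an IRI by cutting at the last '#' or '/'.
final class FragmentShortForms {
    static String shortForm(String iri) {
        int cut = Math.max(iri.lastIndexOf('#'), iri.lastIndexOf('/'));
        // When neither separator is present, cut is -1 and the whole string is returned.
        return iri.substring(cut + 1);
    }

    public static void main(String[] args) {
        String iri = "http://mousey.synology.me/ontologies/TestSlash#A/B";
        System.out.println(shortForm(iri)); // prints B
    }
}
```

Because the slash comes after the '#', the name derived this way is B rather than A/B, so a query for A/B finds nothing unless a label-based mapping is used instead.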

I
inquired at stackoverflow and one of the owlapi developers responded that it was a bug
in Protege while exporting owl file with class names having reserved character(s). Any
thoughts? Here is the discussion in stackoverflow:
http://stackoverflow.com/questions/23506879/using-owlapi-to-parse-owl-file-containing-classes-with-odd-characters

In my experiments, I saw no evidence of any bug in Protege or in the OWL api.  It is true that I didn't check the specifications for the validity of the IRI.  But the ontology saved and loaded correctly and seemed to behave correctly when it was loaded.

-Timothy




On 06/05/2014 03:01 PM, Blaise Che wrote:
Thanks Tim. The motivation here is that we are receiving owl files from a vendor with these reserved characters, and would like to parse them and extract useful data with owlapi. The issue is very easy to reproduce by saving an owl file from Protege with a class having a reserved character. If you open the example file you just generated and go to the DLQuery tab, you will see that you can query for related class information for the class 'A/B'. However, if you try to query the same information with the official owlapi examples at https://github.com/owlcs/owlapi/tree/master/contract/src/test/java/org/coode/owlapi/examples, you will receive parsing errors because of the '/' character. One of the experts at owlapi argues that it is an error in the way Protege saves the owl file with these reserved characters (per the stackoverflow link below). However, I would like a Protege developer to confirm this and, if so, say whether a fix is planned. Thanks!


On Thu, Jun 5, 2014 at 7:15 AM, Timothy Redmond <[hidden email]> wrote:
On 06/03/2014 09:41 AM, Blaise Che wrote:
I tried to post this in developer forum but it bounced back. Sorry if cross-posting.
I am running P4.3 and exporting an owl file with class names having forward
slashes and other reserved characters. I noticed that these characters are
maintained. However when I use the owlapi tutorial example file to query for
subclasses of these classes, I get an error.

I haven't looked at this in detail yet but there are a few general things that can be said.  First of all, Protege is based on the OWL api so - modulo OWL api versioning - anything that happens in Protege can be mirrored in the OWL api.  That being said, the Protege user interface may be doing something to your input when you put it in (especially if the syntax is illegal for IRI's).  It is also possible that the OWL api is lenient with IRI's when an ontology is saved.

But if you want to make it easy for someone to say more, then add some steps to reproduce the issue, maybe even supply an ontology.  In answering your question, a developer may try to reproduce your situation.  I did, for example, create a name for an entity with '/' in it and I found an IRI of the form:

	http://www.semanticweb.org/redmond/ontologies/2014/5/untitled-ontology-63#A/B

and Protege gave the short name B. I haven't yet checked if this is illegal syntax.

Also - what is your motivation for trying this?

-Timothy


The odd characters are getting
in the way. However, if I use 'DL Query' in Protege to query these classes
for subclasses, as long as I enclose the classes with single quotes, the
query works just fine. I can't seem to do the same using owlapi package. I
inquired at stackoverflow and one of the owlapi developers responded that it was a bug
in Protege while exporting owl file with class names having reserved character(s). Any
thoughts? Here is the discussion in stackoverflow:
http://stackoverflow.com/questions/23506879/using-owlapi-to-parse-owl-file-containing-classes-with-odd-characters



_______________________________________________
protege-user mailing list
[hidden email]
https://mailman.stanford.edu/mailman/listinfo/protege-user

