Auto-generate IDs for existing and new entities - (alpha)numerical, at intervals


Robert Rovetto
The Auto-generate ID option (in Preferences-->New Entities) adds numerical IRIs to new entities.

But how can we change the IRIs of existing entities to be numerical?

And for both, how can we make it increment at intervals of 5, 10, or X, e.g., 00005, 00010, ...?

If this is not a current feature, I'd like to request it.
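
As a minimal sketch of the numbering asked about here, written in plain Python and independent of Protégé: the function names, the `http://example.org/onto#` base IRI, and the entity names are all hypothetical. It only computes an old-to-new IRI mapping; applying that mapping to an ontology would still need a separate renaming step.

```python
from itertools import count


def interval_ids(step=5, width=5, prefix="", start=None):
    """Yield zero-padded IDs at a fixed interval, e.g. 00005, 00010, ..."""
    first = step if start is None else start
    for n in count(first, step):
        yield f"{prefix}{n:0{width}d}"


def plan_renames(existing_iris, base, step=5, width=5, prefix=""):
    """Pair each existing IRI with a new IRI built from the next interval ID."""
    ids = interval_ids(step=step, width=width, prefix=prefix)
    # zip stops when the list of existing IRIs is exhausted.
    return {old: base + new_id for old, new_id in zip(existing_iris, ids)}


# Hypothetical ontology base and entity names, for illustration only.
base = "http://example.org/onto#"
existing = [base + "Satellite", base + "Orbit", base + "hasOrbit"]

renames = plan_renames(existing, base, step=5, prefix="ABC_")
for old, new in renames.items():
    print(old, "->", new)
```

Changing `step` gives intervals of 10 or any other X, and leaving `prefix` empty gives purely numeric local names.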

In general, what are the pros and cons of using:
(a) numeric
(b) alphanumeric
(c) random (alpha)numeric?

What are the pros and cons of including the ontology alphabetic or alphanumeric namespace in the IRI?
E.g., ABC_00010 instead of 00010

Thanks

_______________________________________________
protege-user mailing list
[hidden email]
https://mailman.stanford.edu/mailman/listinfo/protege-user

Re: Auto-generate IDs for existing and new entities - (alpha)numerical, at intervals

samsontu


On Feb 7, 2020, at 9:31 AM, Robert Rovetto <[hidden email]> wrote:

> The Auto-generate ID option (in Preferences-->New Entities) adds numerical IRIs to new entities.
>
> But how can we change the IRIs of existing entities to be numerical?
>
> And for both, how can we make it increment at intervals of 5, 10, or X, e.g., 00005, 00010, ...?
>
> If this is not a current feature, I'd like to request it.

You can add the feature request yourself at https://github.com/protegeproject/protege/issues.

> In general, what are the pros and cons of using:
> (a) numeric
> (b) alphanumeric
> (c) random (alpha)numeric?
>
> What are the pros and cons of including the ontology alphabetic or alphanumeric namespace in the IRI?
> E.g., ABC_00010 instead of 00010

In my experience, the pros and cons of coding schemes depend a lot on factors like your expected users, whether your ontology/terminology will be used in conjunction with other terminologies, how your ontology is organized, and the size of your terminology. If you expect your terminology to be used by human coders (e.g., ICD codes for diseases), completely random alphanumeric codes do not help users learn and recognize broad categories of codes (e.g., in ICD-10, G00-G99 are diseases of the nervous system). If you expect your ontology to be used in conjunction with other terminologies, then it's helpful to prefix your codes with an identifying namespace (as OBO Foundry ontologies do). Similarly, if your ontology is organized as a collection of modules, you may want to identify the home modules of your entities with prefixes. Using letters as well as digits gives you more possible values per character and so possibly more compact codes.
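
The point about compactness can be checked with a little arithmetic. The sketch below assumes a 36-symbol alphabet (digits plus uppercase letters) for the alphanumeric case; the function names and the terminology sizes are illustrative choices, not taken from any particular ontology.

```python
import string


def capacity(alphabet_size, length):
    """Number of distinct codes of exactly `length` characters."""
    return alphabet_size ** length


def min_length(alphabet_size, terms):
    """Smallest code length whose capacity covers `terms` identifiers."""
    length = 1
    while capacity(alphabet_size, length) < terms:
        length += 1
    return length


DIGITS = string.digits                          # 10 symbols
ALNUM = string.digits + string.ascii_uppercase  # 36 symbols (an assumption)

# Compare the code length needed for a few hypothetical terminology sizes.
for terms in (1_000, 100_000, 10_000_000):
    numeric_len = min_length(len(DIGITS), terms)
    alnum_len = min_length(len(ALNUM), terms)
    print(f"{terms:>10} terms: numeric needs {numeric_len} chars, "
          f"alphanumeric needs {alnum_len}")
```

Under these assumptions, ten million terms need seven digits but only five alphanumeric characters. A namespace prefix such as `ABC_` adds a fixed number of characters on top of either scheme.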

With best regards,
Samson



Re: Auto-generate IDs for existing and new entities - (alpha)numerical, at intervals

Robert Rovetto
To confirm, does that mean Protégé does not currently have a feature to automatically change the IRIs of existing entities (classes, relations, etc.) to numerical ones?


Re: Auto-generate IDs for existing and new entities - (alpha)numerical, at intervals

Phillip Lord
In reply to this post by Robert Rovetto
We wrote a paper on this a while back.

http://ceur-ws.org/Vol-2137/paper_33.pdf

As well as randomness, it's also worth considering pronounceability and checksumming. The advantages and disadvantages are:

numeric: easy to pronounce, but needs coordination and an authority for coinage
alphanumeric: a valid IRI fragment (numbers are not, IIRC)
random: harder to pronounce (because they will be longer), but can be coined without coordination
checksummed: easily checkable for transcription errors, but with a smaller namespace
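
To make the checksummed trade-off concrete, here is a small sketch that uses the well-known Luhn check digit on numeric IDs. This is a stand-in chosen for illustration, not the scheme used by the identitas library, and the function names are hypothetical.

```python
def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit, starting with
    # the rightmost payload digit.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def with_check_digit(payload: str) -> str:
    """Append the check digit to a numeric ID."""
    return payload + luhn_check_digit(payload)


def is_valid(identifier: str) -> bool:
    """True if the last digit matches the check digit of the rest."""
    if not identifier.isdigit() or len(identifier) < 2:
        return False
    return luhn_check_digit(identifier[:-1]) == identifier[-1]


checked = with_check_digit("00010")
print(checked, is_valid(checked))   # ID with its check digit appended
print(is_valid("000209"))           # one digit mistyped
```

The cost is the smaller namespace mentioned above: a six-character checked ID carries only five free digits, so only one in ten six-digit strings is a valid identifier.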

The identitas library we created is available in Protégé now, I think, but I'm not sure whether it made it into the release version.
