programmatic insertion of unicode characters in OWL Database project

Gandalf-6
Hello,

I am developing an OWL Database project backed by a MySQL server. When I
programmatically create instances with UTF-8 characters in their names or
property values, these characters are replaced by question marks ('?')
when they are stored in the database.

For example, the following code:

OWLIndividual ind = owlClass.createOWLIndividual("Târgu_Mureş");

would store "Târgu_Mure?" in the database and display it as such in
Protégé's instance browser tab.

I don't encounter this problem when I create such instances manually
through Protégé's GUI, so it must be possible to tell Protégé OWL to
encode the strings in UTF-8.

How can I do that?


Thanks for your kind help.

GLG

-------------------------------------------------------------------------
To unsubscribe go to http://protege.stanford.edu/community/subscribe.html

Re: programmatic insertion of unicode characters in OWL Database project

Tania Tudorache
This problem also occurs in the GUI, not just with programmatic access.

Protégé uses UTF-8 as its default encoding, but the MySQL server may use
another one. You can find out which character sets your MySQL server uses
with this query:
"SHOW VARIABLES LIKE 'character_set%'".
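If you would rather run the same check from Java, a minimal JDBC sketch could look like the following. It assumes MySQL Connector/J is on the classpath and that the JDBC URL, user and password are passed on the command line; the class and method names (CharsetCheck, readCharsetVariables, nonUtf8) are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public final class CharsetCheck {

    // Runs the query and returns Variable_name -> Value pairs in result order.
    static Map<String, String> readCharsetVariables(Connection conn) throws SQLException {
        Map<String, String> vars = new LinkedHashMap<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE 'character_set%'")) {
            while (rs.next()) {
                vars.put(rs.getString(1), rs.getString(2));
            }
        }
        return vars;
    }

    // Lists the variables whose value is not a utf8 variant. character_sets_dir
    // (a directory path) and character_set_filesystem (used for file names) are skipped.
    static List<String> nonUtf8(Map<String, String> vars) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : vars.entrySet()) {
            String name = e.getKey();
            if (name.equals("character_sets_dir") || name.equals("character_set_filesystem")) {
                continue;
            }
            String value = e.getValue() == null ? "" : e.getValue().toLowerCase(Locale.ROOT);
            if (!value.startsWith("utf8")) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.err.println("usage: CharsetCheck <jdbc-url> <user> <password>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            Map<String, String> vars = readCharsetVariables(conn);
            vars.forEach((k, v) -> System.out.println(k + " = " + v));
            System.out.println("not utf8: " + nonUtf8(vars));
        }
    }
}
```

Run it with the same JDBC URL that Protégé uses; any variable listed under "not utf8" is a candidate for the mismatch.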

To force a particular character set for a connection, append this to the
JDBC URL you use to connect to the database server:
?characterEncoding=UTF-8
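Appending the parameter by hand is easy to get wrong when the URL already has a query string, because further parameters are joined with '&' rather than a second '?'. A small helper could look like this; the class and method names are hypothetical, and it deliberately leaves an existing characterEncoding setting untouched.

```java
public final class JdbcUrls {

    // Returns url with characterEncoding=<encoding> added as a query parameter.
    // If the URL already sets characterEncoding, it is returned unchanged.
    static String withCharacterEncoding(String url, String encoding) {
        if (url.contains("characterEncoding=")) {
            return url;
        }
        // The first parameter starts with '?'; subsequent ones are joined with '&'.
        String separator = url.indexOf('?') >= 0 ? "&" : "?";
        return url + separator + "characterEncoding=" + encoding;
    }

    public static void main(String[] args) {
        // prints jdbc:mysql://my_db_server:3306/my_db?characterEncoding=UTF-8
        System.out.println(withCharacterEncoding("jdbc:mysql://my_db_server:3306/my_db", "UTF-8"));
    }
}
```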

More about MySQL and character sets:
http://mirrors.dotsrc.org/mysql/doc/refman/5.0/en/cj-character-sets.html
http://dev.mysql.com/doc/refman/5.0/en/localization.html

Let me know if this works for you. If not, you will need to change the
character set for the database table. I can help you with that.

Tania

Re: programmatic insertion of unicode characters in OWL Database project

Gandalf-6
Hi Tania,

Thank you for your quick answer.

I am running EasyPHP 1.8 (MySQL 4.1.9-max), and my tables use the
utf8_general_ci collation.

Tania Tudorache wrote:
> This problem occurs also in the GUI, not just by programmatic access.

You are right. After insertion through the GUI things look fine, but
when you restart Protégé the Unicode characters are no longer displayed
correctly...

> Protege uses as its default encoding utf8, but MySQL server may use
> another one. You can find out what encodings are used by your MySQL
> server, with this query:
> "SHOW VARIABLES LIKE 'character_set%'".

This query returns the following results:

character_set_client      utf8
character_set_connection  utf8
character_set_database    utf8
character_set_results     utf8
character_set_server      latin1
character_set_system      utf8

Adding the following lines to the [mysqld] section of the "my.ini" file
resolved the problem:

character-set-server=utf8
collation-server=utf8_general_ci

So the 'latin1' value of the 'character_set_server' variable was
apparently the problem. Thank you for the hint.
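This explanation is consistent with the symptom in the first message: 'â' exists in latin1 but 'ş' (s with cedilla) does not, so only the last character of "Târgu_Mureş" was lost. The sketch below reproduces the effect in plain Java, using ISO-8859-1 as a stand-in for MySQL's latin1. This is an approximation: MySQL's latin1 is actually cp1252 (Windows-1252), which also lacks 'ş'. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public final class Latin1Loss {

    // Encodes s as ISO-8859-1 and decodes it again. String.getBytes(Charset) replaces
    // every character that ISO-8859-1 cannot represent with the replacement byte '?'.
    static String roundTripLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E2 (a with circumflex) is in ISO-8859-1; U+015F (s with cedilla) is not.
        String name = "T\u00e2rgu_Mure\u015f";
        String stored = roundTripLatin1(name);
        // Compare against an escaped literal so the check does not depend on the console encoding.
        System.out.println(stored.equals("T\u00e2rgu_Mure?")); // prints true
    }
}
```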

> In order to force a certain character set for a connection, you should
> add this to your url for connecting to the database server:
> ?characterEncoding=UTF-8

In case I or someone else needs this later, could you please clarify
where '?characterEncoding=UTF-8' should be added? Should it be specified
through the project creation wizard? Can it also be done
programmatically, and if so, how? So far I have been using the following
code to access my data:

m = (OWLModel) (Project.loadProjectFromFile("myfile.pprj",
                        new ArrayList())).getKnowledgeBase();

What would be the alternative ?

> Let me know if this works for you. If not, you will need to change the
> character set for the database table. I can help you with that.

Thank you for your kind help.

Regards.

GLG

Re: programmatic insertion of unicode characters in OWL Database project

Tania Tudorache
Hi,

I'm glad that you found a solution.

Sorry for not being clear.

At project creation time, when you specify the database connection
parameters, you can append the "?characterEncoding=UTF-8" string to the
"JDBC Driver URL". For example:

jdbc:mysql://my_db_server:3306/my_db?characterEncoding=UTF-8

This connection string is stored in the pprj file. If you open the pprj
file in a text editor, you can search for "jdbc:mysql" and append the
encoding string there. This is a quick hack if you want to reuse an
existing pprj file.
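If several pprj files need the same fix, the text edit can be scripted. This is a hypothetical sketch, not a Protégé tool: it assumes the JDBC URL appears in the pprj file as a quoted string (so the URL ends at a double quote or whitespace) and changes nothing else in the file. Keep a backup of the pprj file before running it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class PprjUrlPatcher {

    // Assumption: the URL is stored as a quoted string, so it ends at '"' or whitespace.
    private static final Pattern MYSQL_URL = Pattern.compile("jdbc:mysql://[^\"\\s]+");

    // Appends characterEncoding=UTF-8 to every MySQL JDBC URL that does not set it yet.
    static String patch(String pprjText) {
        Matcher m = MYSQL_URL.matcher(pprjText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group();
            String patched = url.contains("characterEncoding=")
                    ? url
                    : url + (url.indexOf('?') >= 0 ? "&" : "?") + "characterEncoding=UTF-8";
            m.appendReplacement(out, Matcher.quoteReplacement(patched));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: PprjUrlPatcher <file.pprj>");
            return;
        }
        Path file = Paths.get(args[0]);
        // ISO-8859-1 maps each byte to one char and back, so the rest of the file
        // is written back byte for byte whatever its original encoding was.
        String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        Files.write(file, patch(text).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

Running it twice is harmless, since URLs that already contain characterEncoding are left as they are.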

As long as the pprj file stores the correct database connection
information, you don't have to change anything in your code.

There is a method to change the URL connection string programmatically.
I did not test it, but it should work:
http://protege.stanford.edu/doc/pdk/api/edu/stanford/smi/protege/storage/database/DatabaseKnowledgeBaseFactory.html#setURL(edu.stanford.smi.protege.util.PropertyList,%20java.lang.String)

Tania