Ticket #235 (closed defect: fixed)

Opened 3 years ago

Last modified 3 years ago

Retrieving publications from PubMed can't handle non-ASCII characters

Reported by: robert@… Owned by: robert@…
Priority: major Milestone: Should have
Component: General Version:
Keywords: Cc:
Product: Operating system:
URL: Hardware:

Description

In the study wizard, it is possible to look up a publication on PubMed. When a title or author list contains 'special' characters (e.g. ë or ï; see http://ci.gscf.nmcdsp.org/gscf-0.6.1-ci/publication/show/1254), it is not saved correctly in GSCF.

When the same characters are entered through the add-publication form (menu -> publications -> view publications -> "publication aanmaken" (create publication)), they are saved correctly.

Change History

Changed 3 years ago by work@…

  • milestone set to 0.7

Sounds like a Unicode issue...

Changed 3 years ago by robert@…

It does indeed. However, I cannot reproduce how those characters end up in the database: when I import the publications using Firefox, IE8, or IE8 in IE7 compatibility mode, it works properly.

Maybe it has something to do with encoding in IE7; I'll take a look when I'm at TNO again.

Changed 3 years ago by robert@…

The problem occurs with the IE7 version used at TNO (7.0.5730.13). When searching for a publication, the names are already wrong in the autocomplete dropdown. Still, the XML is sent correctly as UTF-8, with the right headers, so it is probably a parsing error. Maybe IE7 has to be told explicitly that the response is UTF-8 encoded.
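The ticket does not say how exactly IE7 misread the response. As a minimal Python sketch (Python is used only for illustration; GSCF itself is a Grails application), the snippet below declares UTF-8 in two places, the HTTP Content-Type header and the XML declaration, and shows what the same bytes turn into if a client were to fall back to Windows-1252 instead. The function `build_xml_response` and the `<result>`/`<author>` element names are hypothetical, not taken from the GSCF code.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_xml_response(author: str) -> tuple[dict[str, str], bytes]:
    """Hypothetical helper: return (headers, body) for a tiny XML document.

    UTF-8 is declared twice, in the HTTP header and in the XML declaration,
    so a client that ignores one of them can still find the other.
    """
    body = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f"<result><author>{escape(author)}</author></result>"
    ).encode("utf-8")
    headers = {"Content-Type": "text/xml;charset=UTF-8"}
    return headers, body


headers, body = build_xml_response("hőrlin")  # author name from this ticket

# A parser that honours the XML declaration recovers the name intact.
parsed = ET.fromstring(body).find("author").text

# A client that assumed Windows-1252 instead would show mojibake:
# the two UTF-8 bytes of "ő" (0xC5 0x91) become "Å" and "‘".
misread = body.decode("cp1252")
```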

Changed 3 years ago by business@…

  • status changed from new to assigned
  • owner set to robert@…

Changed 3 years ago by robert@…

Should be solved now in r1673. I'll test again on the TNO computers.

Changed 3 years ago by robert@…

  • owner changed from robert@… to tsteemers@…

The issue seems to be solved in r1673 in development mode, but on CI the problem still appears. It has something to do with the proxy we use to access the data (studywizard/entrezproxy), which doesn't always return correct UTF-8. Maybe Grails has a configuration option that affects this.
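How studywizard/entrezproxy is implemented is not shown in the ticket. Purely as an illustration of what a proxy has to get right, here is a minimal Python sketch, assuming the proxy receives the upstream body as raw bytes together with the upstream Content-Type header: it decodes with the charset the upstream declares, then re-encodes as UTF-8 and says so in its own header. `relay`, `charset_of`, and the UTF-8 fallback when no charset is declared are assumptions of this sketch.

```python
from email.message import Message


def charset_of(content_type: str, default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type value (lower-cased).

    Falling back to UTF-8 when no charset is given is an assumption of this
    sketch, not something the ticket specifies.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def relay(upstream_body: bytes, upstream_content_type: str) -> tuple[dict[str, str], bytes]:
    """Hypothetical proxy step: decode with the upstream's declared charset,
    then re-encode as UTF-8 and declare UTF-8 in the outgoing header."""
    text = upstream_body.decode(charset_of(upstream_content_type))
    return {"Content-Type": "text/xml;charset=UTF-8"}, text.encode("utf-8")
```

The design choice here is that the sketch never guesses: bytes are decoded with the charset named in the upstream header and written back with an explicit charset, so the server's default encoding does not enter into it.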

Jeroen, could you check the configuration? You can test the functionality by searching for publications with the term 'horlin'. The first result is a publication by author 'hőrlin'; if the problem still occurs, the author is shown as 'h??rlin'.
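The number of question marks in 'h??rlin' is itself a hint: in UTF-8, 'ő' (U+0151) is two bytes, 0xC5 0x91. A minimal Python sketch of two lossy paths, with hypothetical function names: one where the UTF-8 bytes are decoded by an ASCII-only decoder and written back out, and one where the Unicode string is converted straight into Latin-1, which has no 'ő'.

```python
def bytes_read_as_ascii(name: str) -> str:
    """UTF-8 bytes decoded by an ASCII-only decoder, then re-encoded as ASCII.

    Each non-ASCII byte becomes U+FFFD on decoding and then '?' on encoding,
    so a two-byte character turns into two question marks.
    """
    raw = name.encode("utf-8")
    text = raw.decode("ascii", errors="replace")
    return text.encode("ascii", errors="replace").decode("ascii")


def string_to_latin1(name: str) -> str:
    """The Unicode string encoded directly into Latin-1, which lacks 'ő':
    the whole character becomes a single '?'."""
    return name.encode("latin-1", errors="replace").decode("latin-1")


print(bytes_read_as_ascii("hőrlin"))  # h??rlin
print(string_to_latin1("hőrlin"))     # h?rlin
```

Two question marks are therefore consistent with the two bytes of 'ő' being treated as separate characters somewhere along the way, rather than with a single character-level conversion; the sketch does not tell which component is responsible.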

Changed 3 years ago by robert@…

  • owner changed from tsteemers@… to work@…

Assigning to Jeroen, sorry Taco :)

Changed 3 years ago by work@…

It also fails when accessing Tomcat directly (http://ci.nmcdsp.org:8081/gscf-0.7.0-ci), so it has nothing to do with the Apache configuration.

Changed 3 years ago by work@…

Also, the response Content-Type header is text/xml;charset=UTF-8.
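A correct Content-Type header only states what the body is supposed to be; it does not prove the bytes were produced correctly. A minimal Python sketch (function names hypothetical) that reads the declared charset from the header and checks whether the body is actually valid in it. Note the last case: a body that already contains 'h??rlin' passes the check, because the characters were lost before the bytes were written.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Charset parameter of a Content-Type value, lower-cased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def body_matches_header(content_type: str, body: bytes) -> bool:
    """True if the body decodes cleanly with the charset the header declares."""
    charset = declared_charset(content_type)
    if charset is None:
        return False
    try:
        body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that are invalid in that charset.
        return False
    return True


HEADER = "text/xml;charset=UTF-8"  # the header reported in this ticket
print(body_matches_header(HEADER, "hőrlin".encode("utf-8")))  # True
print(body_matches_header(HEADER, "ë".encode("latin-1")))     # False
print(body_matches_header(HEADER, b"h??rlin"))                # True, yet the name is lost
```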

Changed 3 years ago by work@…

  • owner changed from work@… to robert@…

I tried debugging the issue, but everything seems fine, both in the code and in the server configuration.

However, I wonder why you are using a proxy at all. Can't you access a remote web service on another domain using JavaScript? That is what the Ontology chooser does as well. Isn't it easier to just call the remote web service directly instead of proxying?
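Whether the browser calls the remote service directly or goes through studywizard/entrezproxy, a non-ASCII search term has to travel in the request URL as percent-encoded UTF-8. A minimal Python sketch, assuming the upstream service is NCBI's E-utilities `esearch` endpoint (the ticket names only the proxy, not the URL it forwards to); `esearch_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: the proxy forwards PubMed searches to NCBI E-utilities esearch.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str) -> str:
    """Build an esearch query URL for PubMed.

    urlencode() percent-encodes each value as UTF-8 by default, so 'ő'
    becomes %C5%91 rather than being dropped or replaced.
    """
    return ESEARCH_URL + "?" + urlencode({"db": "pubmed", "term": term})


print(esearch_url("hőrlin"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=h%C5%91rlin
```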

Changed 3 years ago by work@…

Resolved in r1722.

Changed 3 years ago by work@…

  • status changed from assigned to closed
  • resolution set to fixed