Bug report #5255
Wrong codepage of shapefile
| Status: | Closed | | |
|---|---|---|---|
| Priority: | Normal | | |
| Assignee: | - | | |
| Category: | - | | |
| Affected QGIS version: | 1.7.4 | Regression?: | No |
| Operating System: | | Easy fix?: | No |
| Pull Request or Patch supplied: | No | Resolution: | upstream |
| Crashes QGIS or corrupts data: | No | Copied to github as #: | 14989 |
Description
When opening a shapefile, it doesn't matter which codepage you choose: QGIS 1.7.4 always treats it as UTF-8, so Polish letters are displayed incorrectly (when the shapefile was saved in a codepage other than UTF-8, of course). Other encodings are on the list, but selecting them has no effect. In QGIS 1.7.3 it works perfectly. The same problem is in the master version.
Related issues
History
#1 Updated by Alexander Bruy over 12 years ago
This is because 1.7.4 and master are now compiled against GDAL 1.9.0.
#2 Updated by zirneklitis - over 12 years ago
When the *.dbf file is re-saved with OpenOffice Calc, QGIS shows the correct characters with any given code page, but only until edits are saved within QGIS: question marks are then saved in place of any non-Latin characters. It's impossible to switch the code page for any shape files created by QGIS.
#3 Updated by Giovanni Manghi over 12 years ago
zirneklitis - wrote:
It's impossible to switch the code page for any shape files created by QGIS.
It is not QGIS's fault, it is GDAL's. See:
http://ssrebelious.wordpress.com/2012/03/11/qgis-and-gdal1-9-encoding-issue-a-workaround/
This is why 1.7.3 works: it is compiled with an old release of GDAL.
#4 Updated by Alexander Bruy over 12 years ago
The bug in GDAL is already fixed, see http://trac.osgeo.org/gdal/ticket/4650
#5 Updated by Giovanni Manghi over 12 years ago
- Resolution set to upstream
- Status changed from Open to Closed
#6 Updated by zirneklitis - over 12 years ago
Recompiled GDAL and QGIS:
QGIS version: 1.8.0-Lisboa, QGIS code revision: a1255fc, Compiled against GDAL/OGR: 2.0dev, Running against GDAL/OGR: 2.0dev.
Nothing has changed. The problem still remains.
OS: Fedora 14 x64.
#7 Updated by Alexander Bruy over 12 years ago
- Status changed from Closed to Reopened
You can try the custom QGIS build from NextGIS (http://nextgis.ru/en/nextgis-qgis/), where this issue is solved.
#8 Updated by Giovanni Manghi over 12 years ago
- Status changed from Reopened to Closed
zirneklitis - wrote:
Recompiled GDAL and QGIS:
QGIS version: 1.8.0-Lisboa, QGIS code revision: a1255fc, Compiled against GDAL/OGR: 2.0dev, Running against GDAL/OGR: 2.0dev.
Nothing has changed. The problem still remains.
OS: Fedora 14 x64.
Still a GDAL issue, not a QGIS one.
#9 Updated by zirneklitis - over 12 years ago
I insist that this is a QGIS issue.
GDAL 1.9.0 (and newer) tries to determine the encoding from the shapefile itself. When creating a new shapefile, “ENCODING” should be passed as an option, which, obviously, is not done.
Calling qgis from a terminal allows one to track down any warning messages. Saving non-Latin characters in a shapefile generates the following warning message: “Warning 1: One or several characters couldn't be converted correctly from UTF-8 to ISO-8859-1. This warning will not be emitted anymore”.
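The warning quoted above is consistent with what happens whenever text is forced into ISO-8859-1: characters outside that charset have no byte representation there. A minimal sketch in plain Python (no GDAL involved; the sample string is an assumption chosen for illustration) shows how such characters end up as question marks:

```python
# Illustration only: plain Python codecs, not the GDAL recoding code path.
sample = "Łódź"  # hypothetical attribute value containing Polish letters

# ISO-8859-1 has no code for "Ł" or "ź", so a lossy conversion substitutes "?".
latin1_bytes = sample.encode("iso-8859-1", errors="replace")
print(latin1_bytes)

# UTF-8 can represent every character, so the round trip is lossless.
utf8_bytes = sample.encode("utf-8")
print(utf8_bytes.decode("utf-8") == sample)
```

Only "ó" survives the lossy conversion, because it is the one letter in the sample that ISO-8859-1 can represent.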
On the other hand, most shapefiles in use have no character encoding byte, so QGIS has to rely on the environment variable “SHAPE_ENCODING”. At present the only solution is to use the same character encoding for the whole QGIS session, e.g.:
SHAPE_ENCODING=UTF-8
export SHAPE_ENCODING
qgis
The example above allows creating and editing shapefiles with UTF-8 as the character encoding (example for Linux users; Windows users must use “SET SHAPE_ENCODING=UTF-8”).
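The same per-session setting can be sketched in Python by starting QGIS as a child process with a modified environment, which leaves the parent shell untouched. The helper names `build_env` and `launch_qgis` are hypothetical, and the sketch assumes an executable called `qgis` is on the PATH:

```python
import os
import subprocess


def build_env(encoding="UTF-8"):
    """Return a copy of the current environment with SHAPE_ENCODING set."""
    env = dict(os.environ)
    env["SHAPE_ENCODING"] = encoding
    return env


def launch_qgis(encoding="UTF-8", executable="qgis"):
    """Start QGIS as a child process; SHAPE_ENCODING applies to that process only."""
    return subprocess.Popen([executable], env=build_env(encoding))


# Building the environment has no side effects; launch_qgis() is not called here.
env = build_env("UTF-8")
print(env["SHAPE_ENCODING"])
```

Calling `launch_qgis("CP1250")` would be the equivalent for a session that works with CP1250 data, under the same limitation that one encoding applies to the whole session.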
------------------------------------------------
Excerpt from
http://trac.osgeo.org/gdal/wiki/ConfigOptions
In C/C++ configuration switches can be set programmatically like this:
#include "cpl_conv.h"
...
CPLSetConfigOption( "GDAL_CACHEMAX", "64" );
Normally a configuration option applies to all threads active in a program, but they can be limited to only the current thread this way:
CPLSetThreadLocalConfigOption( "GDAL_CACHEMAX", "64" );
#10 Updated by zirneklitis - over 12 years ago
The Linux example above should be as follows:
$ SHAPE_ENCODING=UTF-8
$ export SHAPE_ENCODING
$ qgis
#11 Updated by Alexander Bruy over 12 years ago
zirneklitis - wrote:
I insist that this is a QGIS issue.
This is a GDAL issue. GDAL always reports that the attributes it returns are UTF-8, even when they have a different encoding. The SHAPE_ENCODING environment variable doesn't work in most cases. This bug was partially fixed (see http://trac.osgeo.org/gdal/ticket/4650), but some more fixes are needed.
#12 Updated by Jürgen Fischer over 12 years ago
Alexander Bruy wrote:
You can try the custom QGIS build from NextGIS (http://nextgis.ru/en/nextgis-qgis/), where this issue is solved.
how?
#13 Updated by Alexander Bruy over 12 years ago
Jürgen Fischer wrote:
how?
This is only a workaround, not a real fix. We simply reverted some parts of 2d0edcd7a2 (related to OLCStringsAsUTF8). With GDAL 2.0 everything works fine in most cases without this workaround, and we are working on a final fix for GDAL.
#14 Updated by Even Rouault over 12 years ago
Note that I've just pushed additional fixes in GDAL (see http://trac.osgeo.org/gdal/ticket/4650) that should make OLCStringsAsUTF8 more reliable.
#15 Updated by Tim Sutton over 12 years ago
Hi
Could you please provide a free, minimal test dataset so that we can add a test to our test suite, along with an idea of how we can evaluate the test as passing?
#16 Updated by Even Rouault over 12 years ago
- File chinese.zip added
I'm attaching a small shapefile generated by the following OGR Python script (needs latest GDAL trunk, to support recoding of field name from UTF-8 to CP936 - reading should be OK with GDAL 1.9)
import sys
from osgeo import ogr, osr, gdal
import struct
ds = ogr.GetDriverByName('ESRI Shapefile').CreateDataSource('chinese.dbf')
lyr = ds.CreateLayer('chinese', options = ['ENCODING=LDID/77'])
chinese_str = struct.pack('B' * 6, 229, 144, 141, 231, 167, 176)
lyr.CreateField(ogr.FieldDefn(chinese_str, ogr.OFTString))
feat = ogr.Feature(lyr.GetLayerDefn())
feat.SetField(0, chinese_str)
lyr.CreateFeature(feat)
ds = None
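To see what the six packed byte values in the script stand for, the following sketch decodes them with plain Python codecs and recodes them to CP936. It does not use GDAL, so it only illustrates the recoding step and is not how the driver itself is implemented:

```python
# The six byte values passed to struct.pack in the script above.
utf8_bytes = bytes([229, 144, 141, 231, 167, 176])

# Interpreted as UTF-8 they form a two-character string.
text = utf8_bytes.decode("utf-8")
print(text)

# Recoding to CP936 uses two bytes per character instead of three.
cp936_bytes = text.encode("cp936")
print(len(utf8_bytes), len(cp936_bytes))

# Decoding with the matching codec restores the text.
print(cp936_bytes.decode("cp936") == text)

# Reading the CP936 bytes as if they were UTF-8 is expected to fail,
# which is the kind of mismatch this ticket is about.
try:
    cp936_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

A test built on chinese.zip could follow the same idea: read the field name and value back and compare them with the string decoded from the UTF-8 bytes.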
#17 Updated by zirneklitis - over 12 years ago
Who should create the .cpg files: GDAL or QGIS? A shapefile with a *.cpg file present works as expected (partly: QGIS has no idea of the existence of this file). The attribute values are not garbled any more. More about *.cpg files:
http://support.esri.com/en/knowledgebase/techarticles/detail/21106
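As a rough illustration of the sidecar idea, the sketch below writes and reads a .cpg file next to a shapefile. It assumes the file holds nothing but the encoding name as plain text; the helpers `write_cpg` and `read_cpg` and the file name `places.shp` are hypothetical, and neither GDAL nor QGIS is involved:

```python
from pathlib import Path
import tempfile


def write_cpg(shp_path, encoding_name):
    """Write a .cpg sidecar next to a shapefile, holding only the encoding name."""
    cpg_path = Path(shp_path).with_suffix(".cpg")
    cpg_path.write_text(encoding_name, encoding="ascii")
    return cpg_path


def read_cpg(shp_path):
    """Return the encoding name from the .cpg sidecar, or None if it is missing."""
    cpg_path = Path(shp_path).with_suffix(".cpg")
    if not cpg_path.exists():
        return None
    return cpg_path.read_text(encoding="ascii").strip()


# Usage with a throwaway directory; "places.shp" is a hypothetical file name.
with tempfile.TemporaryDirectory() as tmp:
    shp = Path(tmp) / "places.shp"
    print(read_cpg(shp))   # no sidecar yet
    write_cpg(shp, "UTF-8")
    print(read_cpg(shp))
```

Whichever component ends up creating the file, a reader that finds no sidecar still has to fall back to some default, which is where the disagreement in this ticket starts.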
#18 Updated by Minoru Akagi over 12 years ago
I installed GDAL 1.9.1 by using OSGeo4W.
When I convert a Shapefile dataset whose dbf file has the value "19" (meaning "CP932") in the LDID field to KML format with ogr2ogr, the following message is shown:
Warning1: Recode from CP932 to UTF-8 not supported, treated as ISO8859-1 to UTF-8
The Japanese characters in the generated KML file are incorrect. This will also result in character corruption in QGIS.
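The effect of the fallback named in the warning can be reproduced with plain Python codecs. This is only a sketch of the symptom, with a hypothetical sample string, and not the GDAL code path:

```python
text = "日本語"  # hypothetical attribute value

# Bytes as they would be stored in a CP932-encoded .dbf file.
stored = text.encode("cp932")

# Fallback described in the warning: treat the bytes as ISO8859-1.
misread = stored.decode("iso-8859-1")
# Correct handling: decode with the codepage the data was written in.
correct = stored.decode("cp932")

print(repr(misread))
print(correct == text)

# Both strings can be written out as valid UTF-8, so the corruption is silent.
output_wrong = misread.encode("utf-8")
output_right = correct.encode("utf-8")
```

Because ISO8859-1 maps every byte to some character, the fallback never raises an error; it just produces one wrong character per stored byte.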
I think that recoding with the iconv library is not enabled in this GDAL build.
For testing, I built GDAL 1.9.1 with the HAVE_ICONV constant defined and linked against the iconv library.
With my build of ogr2ogr, the warning does not appear and a KML file with readable Japanese characters is generated.
As a Japanese user of this great software, I would like QGIS to use a GDAL build linked with the iconv library.
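For anyone who wants to check which LDID a given dbf file carries, here is a minimal sketch. It assumes the usual dBASE header layout, in which the language driver ID is the single byte at offset 29, and it maps only the two values mentioned in this ticket (19 and 77); the function names are hypothetical:

```python
import struct

# Only the two LDID values mentioned in this ticket; real tables are longer.
LDID_TO_CODEPAGE = {19: "CP932", 77: "CP936"}


def read_ldid(dbf_path):
    """Return the language driver ID stored at byte offset 29 of a .dbf header."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file is too short to contain a dBASE header")
    (ldid,) = struct.unpack_from("B", header, 29)
    return ldid


def codepage_for(dbf_path):
    """Map the LDID to a codepage name, or None when it is unset or unknown."""
    return LDID_TO_CODEPAGE.get(read_ldid(dbf_path))
```

A value of 0 means no language driver is recorded, in which case a reader has to rely on a .cpg file or on SHAPE_ENCODING instead.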
#19 Updated by Minoru Akagi over 12 years ago
I've also reported this recoding issue to OSGeo4W Trac.
http://trac.osgeo.org/osgeo4w/ticket/294
#20 Updated by Minoru Akagi over 12 years ago
Sorry, I noticed that the problem I had has already been solved in the latest GDAL trunk. There is no problem converting CP932 to UTF-8.
#21 Updated by Jürgen Fischer over 6 years ago
- Related to Bug report #13203: When opening Shapefile the .cpg file is ignored in Windows 8.1 added