jeudi 18 juin 2015

GDAL/OGR 2.0.0 released

On behalf of the GDAL/OGR development team and community, I am pleased to announce the release of GDAL/OGR 2.0.0.  GDAL/OGR is a C++ geospatial data access library for raster and vector file formats, databases and web services.  It includes bindings for several languages, and a variety of command line tools.

The 2.0.0 release is a major new feature release with the following highlights:

 * GDAL and OGR driver and dataset unification  
 * 64-bit integer fields and feature IDs
 * Support for curve geometries      
 * Support for OGR field subtypes (boolean, int16, float32)      
 * RasterIO() improvements : resampling and progress callback      
 * Stricter SQL quoting      
 * OGR not-null constraints and default values      
 * OGR dataset transactions      
 * Refined SetFeature() and DeleteFeature() semantics      
 * OFTTime/OFTDateTime millisecond accuracy
 * 64bit histogram bucket count      

 * New GDAL drivers:
    - BPG: read-only driver for Better Portable Graphics format (experimental)
    - GPKG: read/write/update capabilities in the unified raster/vector driver              
    - KEA: read/write driver for KEA format               
    - PLMosaic: read-only driver for Planet Labs Mosaics API               
    - ROI_PAC: read/write driver for image formats of JPL's ROI_PAC project               
    - VICAR: read-only driver for VICAR format               

 * New OGR drivers:
    - Cloudant : read/write driver for Cloudant service               
    - CSW: read-only driver for OGC CSW (Catalog Service for the Web) protocol               
    - JML: read/write driver for OpenJUMP .jml format               
    - PLScenes: read-only driver for Planet Labs Scenes API                
    - Selaphin: read/write driver for the Selaphin/Seraphin format
 * Significantly improved drivers: CSV, GPKG, GTiff, JP2OpenJPEG, MapInfo, PG,

 * Upgrade to EPSG 8.5 database

 * Fix locale related issues when formatting or reading floating point numbers

You can also find more complete information on the new features and fixes in the 2.0.0.

The release can be downloaded from:
  * - source as a zip
  * - source as .tar.gz
  * - source as .tar.xz
  * - test suite
  * - documentation/website

As there have been a few changes to the C++ API, and to a lesser extent to the C API, developers are strongly advised to read the migration guide.

dimanche 14 juin 2015

Multi-threaded GeoTIFF compression

GDAL 2.0.0 should be released soon (RC2 available), but the development version (2.1.0dev) has already received interesting improvements, especially for massive pixel crunchers.

The lossless DEFLATE compression scheme is notoriously known to be rather slow on the compression side. For example, converting with gdal_translate a 21600x21600 pixels RGB BMNG tile into a DEFLATE compressed GeoTIFF takes 44s on a Core i5 750 with the default setting. Using low values of the ZLEVEL creation option, that controls the speed vs compresssion ratio, makes things a bit faster :
- ZLEVEL=1, 1 thread: 26s, 486 MB
- ZLEVEL=6(default), 1 thread: 44s, 440 MB

With the development version, you can now use the NUM_THREADS creation option, whose value must be an integer number or ALL_CPUS to make all your cores busy.  New results:
  ZLEVEL=1, 4 threads: 12s, 486 MB
- ZLEVEL=6, 4 threads: 16s, 440 MB

Using horizontal differencing prediction (PREDICTOR=2) leads to further compression gains (generally, but that depends on the nature of data) with a compression time slightly greater (almost unchanged at ZLEVEL=1, but noticeably slower at ZLEVEL=6) :

- ZLEVEL=1, PREDICTOR=2, 1 thread : 25 s, 354 MB
- ZLEVEL=1, PREDICTOR=2, 4 threads : 12 s, 354 MB
- ZLEVEL=6, PREDICTOR=2, 1 thread : 1m 5s, 301 MB
- ZLEVEL=6, PREDICTOR=2, 4 threads : 22 s, 301 MB

Multi-threaded compression has been implemeneted for DEFLATE, LZW, PACKBITS and LZMA compression schemes. JPEG could potentially be added as well, but probably with little benefits since it is already a very fast compression method, especially with libjpeg-turbo (10 s on above example, single-threaded).

Surprisingly, multi-threaded compression is easier to implement than multi-threaded decompression. This is due to the fact that the writing side of gdal_translate does not really care when data actually hits the disk : it pushes pixel blocks to the output driver and let it do what it needs to do with them (of course details are a bit more complicated since GDAL has a block cache that buffers actual writes, the GeoTIFF driver has a one-block cache to deal with pixel-interleaved files, and in some circumstances a driver may need to re-read data it has written). On the reading side, when a block is asked, the pixel values must be returned synchronously. So multi-threaded reading is less obvious and has been left aside for now in the GeoTIFF driver. It could be enabled if a large enough window of pixels is announced to be read (this is actually what is currently implemented in the JPEG2000 OpenJPEG driver when a request crosses several JPEG2000 tiles). When only small windows are requested, a heuristics could be used to recognize the read pattern and start asynchronous decoding of the next likely blocks that will be asked. For example, if the driver realizes that the blocks are asked sequentially from the top to the bottom of the image, it might anticipate that the full image will be eventually read and start spawning tasks to decode the next blocks. We could also imagine to have that as a generic mechanism: in that case a dedicated GDAL dataset would have to be opened for each worker thread so as to avoid messing up with the internal state.

Regarding lossless compression schemes, the LZ4 method might be worth considering as a new method for TIFF. It offers great compression performance and blazingly fast decompression. At the expense of the compression ratio.