官术网_书友最值得收藏!

File structure

The preceding elements can be stored in a variety of ways in a single file, multiple files, or database depending on the format. Additionally, this geospatial information can be stored in a variety of formats, including embedded binary headers, XML, database tables, spreadsheets/CSV, separate text, or binary files.

Human readable formats such as XML files, spreadsheets, and structured text files require only a text editor to investigate. These files are also easily parsed and processed using Python's built-in modules, data types, and string manipulation functions. Binary-based formats are more complicated. It is thus typically easier to use a third-party library to deal with binary formats.

However, you don't have to use a third-party library, especially if you just want to investigate the data at a high level. Python's built-in struct module has everything that you need. The struct module lets you read and write binary data as strings. When using the struct module, you need to be aware of the concept of byte order. Byte order refers to how the bytes of information that make up a file are stored in memory. This order is usually platform-specific, but in some rare cases, including shapefiles, the byte order is mixed in the file. The Python struct module uses the greater than (>) and less than (<) symbols to specify byte order (big endian and little endian, respectively).

The following brief example demonstrates the usage of the Python struct module to parse the bounding box coordinates from an Esri shapefile vector dataset. You can download this shapefile as a zipped file at the following URL:

https://geospatialpython.googlecode.com/files/hancock.zip

When you unzip this, you will see three files. For this example, we'll be using hancock.shp. The Esri shapefile format has a fixed location and data type in the file header from byte 36 to byte 37 for the minimum x, minimum y, maximum x, and maximum y bounding box values. In this example, we will execute the following steps:

  1. Import the struct module.
  2. Open the hancock.zip shapefile in the binary read mode.
  3. Navigate to byte 36.
  4. Read each 8-byte double specified as d, and unpack it using the struct module in little-endian order as designated by the < sign.

The best way to execute this script is in the interactive Python interpreter. We will read the minimum longitude, minimum latitude, maximum longitude, and maximum latitude:

>>> import struct
>>> f = open("hancock.shp","rb")
>>> f.seek(36)
>>> struct.unpack("<d", f.read(8))
(-89.6904544701547,)
>>> struct.unpack("<d", f.read(8))
(30.173943486533133,)
>>> struct.unpack("<d", f.read(8))
(-89.32227546981174,)
>>> struct.unpack("<d", f.read(8))
(30.6483914869749,)

You'll notice that when the struct module unpacks a value, it returns a Python tuple with one value. You can shorten the preceding unpacking code to one line by specifying all four doubles at once and increasing the byte length to 32 bytes as shown in the following code:

>>> f.seek(36)
>>> struct.unpack("<dddd", f.read(32))
(-89.6904544701547, 30.173943486533133, -89.32227546981174, 30.6483914869749)
主站蜘蛛池模板: 安阳市| 吉林省| 安泽县| 郑州市| 阿勒泰市| 鲁甸县| 金湖县| 彰武县| 沙雅县| 思南县| 英山县| 观塘区| 绩溪县| 栾城县| 嵩明县| 蓝田县| 达尔| 昭通市| 西青区| 威信县| 望江县| 九江县| 南昌县| 汾西县| 象山县| 库尔勒市| 吉安市| 新邵县| 辽阳县| 呼伦贝尔市| 苏尼特右旗| 淮滨县| 淮南市| 玉门市| 左贡县| 且末县| 夹江县| 社旗县| 新野县| 柯坪县| 论坛|