- Learning Geospatial Analysis with Python(Second Edition)
- Joel Lawhead
- 238字
- 2021-07-23 14:45:50
File structure
The preceding elements can be stored in a variety of ways in a single file, multiple files, or database depending on the format. Additionally, this geospatial information can be stored in a variety of formats, including embedded binary headers, XML, database tables, spreadsheets/CSV, separate text, or binary files.
Human readable formats such as XML files, spreadsheets, and structured text files require only a text editor to investigate. These files are also easily parsed and processed using Python's built-in modules, data types, and string manipulation functions. Binary-based formats are more complicated. It is thus typically easier to use a third-party library to deal with binary formats.
However, you don't have to use a third-party library, especially if you just want to investigate the data at a high level. Python's built-in struct module has everything that you need. The struct module lets you read and write binary data as strings. When using the struct module, you need to be aware of the concept of byte order. Byte order refers to how the bytes of information that make up a file are stored in memory. This order is usually platform-specific, but in some rare cases, including shapefiles, the byte order is mixed in the file. The Python struct module uses the greater than (>) and less than (<) symbols to specify byte order (big endian and little endian, respectively).
The following brief example demonstrates the usage of the Python struct module to parse the bounding box coordinates from an Esri shapefile vector dataset. You can download this shapefile as a zipped file at the following URL:
https://geospatialpython.googlecode.com/files/hancock.zip
When you unzip this, you will see three files. For this example, we'll be using hancock.shp
. The Esri shapefile format has a fixed location and data type in the file header from byte 36 to byte 37 for the minimum x, minimum y, maximum x, and maximum y bounding box values. In this example, we will execute the following steps:
- Import the
struct
module. - Open the
hancock.zip
shapefile in the binary read mode. - Navigate to byte
36
. - Read each 8-byte double specified as
d
, and unpack it using thestruct
module in little-endian order as designated by the<
sign.
The best way to execute this script is in the interactive Python interpreter. We will read the minimum longitude, minimum latitude, maximum longitude, and maximum latitude:
>>> import struct >>> f = open("hancock.shp","rb") >>> f.seek(36) >>> struct.unpack("<d", f.read(8)) (-89.6904544701547,) >>> struct.unpack("<d", f.read(8)) (30.173943486533133,) >>> struct.unpack("<d", f.read(8)) (-89.32227546981174,) >>> struct.unpack("<d", f.read(8)) (30.6483914869749,)
You'll notice that when the struct
module unpacks a value, it returns a Python tuple with one value. You can shorten the preceding unpacking code to one line by specifying all four doubles at once and increasing the byte length to 32 bytes as shown in the following code:
>>> f.seek(36) >>> struct.unpack("<dddd", f.read(32)) (-89.6904544701547, 30.173943486533133, -89.32227546981174, 30.6483914869749)
- C++ Primer習(xí)題集(第5版)
- Advanced Quantitative Finance with C++
- Dynamics 365 for Finance and Operations Development Cookbook(Fourth Edition)
- Testing with JUnit
- Implementing Cisco Networking Solutions
- Mastering Unity Shaders and Effects
- 程序設(shè)計(jì)基礎(chǔ)教程:C語言
- Creating Mobile Apps with jQuery Mobile(Second Edition)
- Learning Docker Networking
- 精通MySQL 8(視頻教學(xué)版)
- 大學(xué)計(jì)算機(jī)基礎(chǔ)實(shí)驗(yàn)指導(dǎo)
- Android應(yīng)用開發(fā)實(shí)戰(zhàn)(第2版)
- Raspberry Pi Blueprints
- Getting Started with JUCE
- 和孩子一起學(xué)編程:用Scratch玩Minecraft我的世界