
Using hdfs3 with HDFS

hdfs3 is a lightweight Python wrapper around the C/C++ libhdfs3 library. It lets us work with HDFS natively from Python. To start, we first need to connect to the HDFS NameNode, which is done with the HDFileSystem class:

from hdfs3 import HDFileSystem
hdfs = HDFileSystem(host='localhost', port=8020)

This automatically establishes a connection with the NameNode. We can now list the contents of a directory:

print(hdfs.ls('/tmp')) 

This lists all the files and directories in the /tmp directory. You can use other methods, such as mkdir to create a directory and mv to move a file from one location to another. To write to a file, we first open it with the open method in 'wb' (write binary) mode and then call write:

with hdfs.open('/tmp/file1.txt', 'wb') as f:
    f.write(b'You are Awesome!')
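The write call expects bytes, which is why the example uses a b'' literal. To store ordinary Python strings, encode them first. The helper below is a minimal sketch: write_lines is a hypothetical name, not part of hdfs3, and it assumes the object returned by hdfs.open(path, 'wb') accepts bytes through write like any binary file. It is exercised here against an in-memory io.BytesIO buffer instead of a live cluster.

```python
import io


def write_lines(f, lines, encoding="utf-8"):
    """Encode each string and write it, newline-terminated, to binary file f.

    Hypothetical helper: works with any writable binary file object, such as
    one returned by hdfs.open(path, 'wb'). Returns the number of bytes written,
    counted locally rather than relying on f.write's return value.
    """
    written = 0
    for line in lines:
        data = (line + "\n").encode(encoding)
        f.write(data)
        written += len(data)
    return written


# Usage with an in-memory buffer standing in for an HDFS file.
buf = io.BytesIO()
n = write_lines(buf, ["You are Awesome!", "Grüße aus HDFS"])
print(n, buf.getvalue())

# With a live cluster (the path is illustrative only):
# with hdfs.open('/tmp/lines.txt', 'wb') as f:
#     write_lines(f, ["first line", "second line"])
```

Encoding explicitly keeps the byte layout under your control; non-ASCII characters such as ü take more than one byte in UTF-8.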

We can then read the data back; open defaults to read mode ('rb'), so no mode needs to be given:

with hdfs.open('/tmp/file1.txt') as f:
    print(f.read())
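Calling f.read() with no argument loads the whole file into memory, which is fine for a small file but not for a large one. A common alternative is to read fixed-size chunks until read returns an empty bytes object. This sketch assumes the HDFS file object behaves like a standard Python binary file in that respect (read(n) returns at most n bytes, and b'' at end of file); iter_chunks and the 64 KiB default are illustrative choices, not hdfs3 API.

```python
import io


def iter_chunks(f, chunk_size=64 * 1024):
    """Yield successive chunks of at most chunk_size bytes from binary file f.

    Assumes f.read(n) returns b'' once the end of the file is reached,
    as standard Python binary files do.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # empty bytes: end of file
            break
        yield chunk


# Usage with an in-memory buffer standing in for an HDFS file.
data = io.BytesIO(b"You are Awesome!")
chunks = list(iter_chunks(data, chunk_size=5))
print(chunks)

# With a live cluster:
# with hdfs.open('/tmp/file1.txt') as f:
#     total = sum(len(c) for c in iter_chunks(f))
```

Only one chunk is held in memory at a time, so the same loop works for files far larger than available RAM.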

You can learn more about hdfs3 from its documentation: https://media.readthedocs.org/pdf/hdfs3/latest/hdfs3.pdf
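The connection that HDFileSystem opens stays open until it is closed explicitly. One way to make sure it is released, even if an error occurs partway through, is to wrap it in a context manager. This is a minimal sketch: hdfs_session and its factory parameter are hypothetical names introduced here, and it assumes the HDFileSystem object provides a disconnect() method (check the hdfs3 documentation for your version). The factory parameter exists only so the wrapper can be exercised without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def hdfs_session(host="localhost", port=8020, factory=None):
    """Open an HDFS connection and always disconnect on exit.

    factory defaults to hdfs3.HDFileSystem, imported lazily so this code
    loads even where libhdfs3 is not installed. Passing another callable
    is a hypothetical hook for testing with a stand-in object.
    """
    if factory is None:
        from hdfs3 import HDFileSystem  # requires libhdfs3 at runtime
        factory = HDFileSystem
    fs = factory(host=host, port=port)
    try:
        yield fs
    finally:
        fs.disconnect()  # runs on normal exit and when an exception escapes


# Usage against a real NameNode (not executed here):
# with hdfs_session("localhost", 8020) as hdfs:
#     print(hdfs.ls("/tmp"))
```

Because the disconnect call sits in a finally clause, the connection is closed whether the with block finishes normally or raises.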
