
Dictionaries for text analysis

A common use of dictionaries is to count the occurrences of like items in a sequence; a typical example is counting the occurrences of words in a body of text. The following code creates a dictionary where each word in the text is used as a key and the number of occurrences as its value. This uses a very common idiom of nested loops. Here we use it to traverse the lines of a file in the outer loop and the words of each line in the inner loop, checking each word against the keys of the dictionary:

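def wordcount(fname):
    # Open the file, exiting with a message if it cannot be read
    try:
        fhand = open(fname)
    except OSError:
        print('File cannot be opened')
        exit()

    count = dict()
    for line in fhand:
        # split() breaks the line into words on whitespace
        words = line.split()
        for word in words:
            if word not in count:
                count[word] = 1    # first occurrence of this word
            else:
                count[word] += 1   # word seen before
    fhand.close()
    return count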
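
As an aside, the same counting pattern can be sketched with collections.Counter from the standard library, which keeps the tally without an explicit membership test. This is a minimal sketch rather than the book's method: the helper name wordcount_lines and the sample lines are assumptions made for illustration, and the helper takes any iterable of lines so that it can be tried without a file on disk.

```python
from collections import Counter


def wordcount_lines(lines):
    """Count whitespace-separated words across an iterable of lines.

    Hypothetical helper for illustration; not part of the original listing.
    """
    count = Counter()
    for line in lines:
        # Counter.update adds one to the tally of every word in the list
        count.update(line.split())
    # Convert back to a plain dict, matching what wordcount returns
    return dict(count)


# Invented sample input, standing in for the lines of a text file
sample = ["the cat sat", "the cat ran"]
result = wordcount_lines(sample)
print(result)  # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

An open file object is itself an iterable of lines, so it could be passed to this helper directly. The explicit nested-loop version has the advantage of making each step of the idiom visible.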

The wordcount function returns a dictionary with an entry for each unique word in the text file. Because split() separates words on whitespace only, capitalization matters and any attached punctuation is counted as part of the word. To run the code, you will need a text file saved in the directory from which you run it. Here we have used alice.txt, a short excerpt from Alice in Wonderland. To obtain the same results, you can download alice.txt from davejulian.net/bo5630, or use a text file of your own. A common task is to filter items such as these into the subsets we are interested in. In the following code, we create another dictionary, filtered, containing a subset of the items from count:

count = wordcount('alice.txt')
filtered = {key: value for key, value in count.items() if 15 < value < 20}

When we print the filtered dictionary, we get the following:

Note the dictionary comprehension used to construct the filtered dictionary. Dictionary comprehensions work in the same way as the list comprehensions we looked at in Chapter 1, Python Objects, Types, and Expressions, except that they produce key: value pairs rather than single items.
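
To make the parallel concrete, here is a small sketch that applies the same loop and condition first as a list comprehension and then as a dictionary comprehension. The word counts below are invented for illustration and are not the counts from alice.txt; the name frequent_words is likewise an assumption.

```python
# Hypothetical word counts, invented for illustration only
count = {'alice': 18, 'the': 60, 'rabbit': 16, 'said': 19, 'queen': 3}

# List comprehension: collects only the qualifying keys
frequent_words = [key for key, value in count.items() if 15 < value < 20]

# Dictionary comprehension: same loop and condition, but keeps key: value pairs
filtered = {key: value for key, value in count.items() if 15 < value < 20}

print(frequent_words)  # ['alice', 'rabbit', 'said']
print(filtered)        # {'alice': 18, 'rabbit': 16, 'said': 19}
```

The only differences are the enclosing braces and the key: value expression at the front; the for clause and the if filter are written exactly as they would be in a list comprehension.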
