- Julia 1.0 Programming Complete Reference Guide
- Ivo Balbaert Adrian Salceanu
- 2021-06-24 14:21:48
An example project – word frequency
Many of the concepts and techniques that we have seen so far in this book come together in this small project. Its aim is to read a text file, remove all characters that are not used in words, and count how often each word occurs in the remaining text. This can be useful, for example, for measuring the word density on a web page, the frequency of DNA sequences, or the number of hits on a website that came from various IP addresses. This can be done in some ten lines of code. For example, when words1.txt contains the sentence "to be, or not to be, that is the question!", this is the output of the program:
    Word : frequency

    be : 2
    is : 1
    not : 1
    or : 1
    question : 1
    that : 1
    the : 1
    to : 2
Here is the code with comments:
    # code in chapter 5\word_frequency.jl:
    # 1- read in text file:
    str = read("words1.txt", String)
    # 2- replace non alphabet characters from text with a space:
    nonalpha = r"(\W\s?)" # define a regular expression
    str = replace(str, nonalpha => ' ')
    digits = r"(\d+)"
    str = replace(str, digits => ' ')
    # 3- split text in words:
    word_list = split(str, ' ')
    # 4- make a dictionary with the words and count their frequencies:
    word_freq = Dict{String, Int64}()
    for word in word_list
        word = strip(word)
        if isempty(word) continue end
        haskey(word_freq, word) ? word_freq[word] += 1 : word_freq[word] = 1
    end
    # 5- sort the words (the keys) and print out the frequencies:
    println("Word : frequency \n")
    words = sort!(collect(keys(word_freq)))
    for word in words
        println("$word : $(word_freq[word])")
    end
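Steps 2 to 4 of the script can also be wrapped in a function, which makes the counting logic easy to try on a string without needing a file. The following is only a sketch under a few assumptions: the name count_words is hypothetical and not part of the book's code, split is called without a delimiter so that empty pieces are dropped automatically, and get with a default of 0 replaces the haskey test.

```julia
# Hypothetical helper (not from the book): count the words in a string.
function count_words(text::AbstractString)
    # Same cleaning as the script: non-word characters and digits become spaces
    cleaned = replace(text, r"(\W\s?)" => ' ')
    cleaned = replace(cleaned, r"(\d+)" => ' ')
    word_freq = Dict{String, Int}()
    # split without a delimiter splits on whitespace and drops empty pieces,
    # so no strip/isempty check is needed here
    for word in split(cleaned)
        # get returns 0 when the word has not been seen yet
        word_freq[word] = get(word_freq, word, 0) + 1
    end
    return word_freq
end

freqs = count_words("to be, or not to be, that is the question!")
for word in sort!(collect(keys(freqs)))
    println("$word : $(freqs[word])")
end
```

Because the function takes a string, it could be combined with the file reading of step 1 as count_words(read("words1.txt", String)).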
The strip() function removes leading and trailing whitespace from a string.
The isempty() function is quite general and can be used on any collection.
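As a small illustration of these two functions, here is a sketch with made-up sample values (they do not come from the example text files):

```julia
# strip removes whitespace (spaces, tabs, newlines) at both ends of a string
padded = "  be \n"
trimmed = strip(padded)
println(repr(trimmed))            # shows the trimmed text in quotes

# a string that contains only whitespace becomes empty after strip
blank = strip("   ")
println(isempty(blank))

# isempty also works on other collections, such as arrays and dictionaries
no_numbers = Int[]
some_counts = Dict("to" => 2)
println(isempty(no_numbers))
println(isempty(some_counts))
```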
Try the code out with the example text files words1.txt or words2.txt. See the output in results_words1.txt and results_words2.txt.