Trimming excess whitespace

The text obtained from sources may unintentionally include leading or trailing whitespace characters. When parsing such input, it is often wise to trim the text first. For example, when Haskell source code contains trailing whitespace, the GHC compiler ignores it through a process called lexing. The lexer produces a sequence of tokens, effectively discarding meaningless characters such as excess whitespace.

In this recipe, we will use built-in libraries to make our own trim function.

How to do it...

Create a new file, which we will call Main.hs, and perform the following steps:

  1. Import the isSpace :: Char -> Bool function from the Data.Char module, which ships with the base package:
    import Data.Char (isSpace)
  2. Write a trim function that removes the leading and trailing whitespace:
    trim :: String -> String
    trim = f . f
      where f = reverse . dropWhile isSpace
  3. Test it out within main:
    main :: IO ()
    main = putStrLn $ trim " wahoowa! "
  4. Running the code will result in the following trimmed string:
    $ runhaskell Main.hs
    
    wahoowa!
    

How it works...

Our trim function strips the whitespace from both ends of the string. The helper f drops whitespace characters from the beginning and then reverses the string. Applying f a second time drops what was originally the trailing whitespace and reverses the string back into its original order. Because reverse has to traverse the entire list, trim reads the whole string before it produces any output. Fortunately, the isSpace function from Data.Char handles any Unicode space character as well as the control characters \t, \n, \r, \f, and \v.
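The two passes can be traced by printing the intermediate result of the helper. The following is a minimal sketch: trim is repeated from the recipe so that the file is self-contained, while trim' is an alternative that is not part of the recipe and assumes dropWhileEnd from Data.List is available (base 4.5 or later).

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- The recipe's definition, repeated so that this sketch compiles on its own.
trim :: String -> String
trim = f . f
  where f = reverse . dropWhile isSpace

-- Alternative (an assumption, not the recipe's method):
-- trim each end directly instead of reversing twice.
trim' :: String -> String
trim' = dropWhileEnd isSpace . dropWhile isSpace

main :: IO ()
main = do
  let f = reverse . dropWhile isSpace
  -- First pass: leading space dropped, string reversed.
  print (f " wahoowa! ")
  -- Second pass: former trailing space dropped, order restored.
  print (f (f " wahoowa! "))
  -- Both definitions on a few edge cases: empty input,
  -- whitespace-only input, and whitespace inside the text.
  print (map trim  ["", "   ", " a b \n"])
  print (map trim' ["", "   ", " a b \n"])
```

Both versions leave whitespace inside the text untouched; only the ends are trimmed.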

There's more...

Ready-made parser combinator libraries such as parsec or uu-parsinglib could be used instead of reinventing the wheel. By introducing a Token type and parsing to this type, we can elegantly ignore the whitespace. Alternatively, we can use the alex lexer generator (package name, alex) for this task. These tools are overkill for such a simple task, but they allow us to perform more general tokenizing of text.
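As a rough illustration of the Token idea, the following sketch uses parsec to split the input into whitespace-separated tokens. The Token type, its WordTok constructor, and the names wordTok, wordToks, and tokenize are hypothetical and chosen only for this example; the sketch assumes the parsec package is installed.

```haskell
import Data.Char (isSpace)
import Text.Parsec (ParseError, eof, many, many1, parse, satisfy, spaces)
import Text.Parsec.String (Parser)

-- Hypothetical token type: each run of non-space characters becomes a WordTok.
newtype Token = WordTok String
  deriving (Show, Eq)

-- Parse one token, then skip any whitespace that follows it.
wordTok :: Parser Token
wordTok = WordTok <$> many1 (satisfy (not . isSpace)) <* spaces

-- Skip leading whitespace, then collect tokens until the end of the input.
wordToks :: Parser [Token]
wordToks = spaces *> many wordTok <* eof

tokenize :: String -> Either ParseError [Token]
tokenize = parse wordToks "<input>"

main :: IO ()
main = print (tokenize "  wahoowa!  again ")
```

Because the parser discards whitespace between tokens, no separate trimming step is needed; the cost is an extra dependency and more code than the two-line trim.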
