官术网_书友最值得收藏!

Searching for a substring using Data.ByteString

There are many algorithms to search for a string within another string. This recipe will use an existing breakSubstring function in the Data.ByteString library to do most of the heavy lifting.

The ByteString documentation establishes its merits by declaring the following claim:

"[A ByteString is] a time- and space-efficient implementation of byte vectors using packed Word8 arrays, suitable for high performance use, both in terms of large data quantities, or high speed requirements. Byte vectors are encoded as strict Word8 arrays of bytes, held in a ForeignPtr, and can be passed between C and Haskell with little effort."

More information and documentation can be obtained on the package web page at http://hackage.haskell.org/package/bytestring/docs/Data-ByteString.html.

How to do it...

  1. Import the breakSubstring function as well as the Data.ByteString.Char8 package as follows:
    import Data.ByteString (breakSubstring)
    import qualified Data.ByteString.Char8 as C
  2. Pack the strings as a ByteString and feed them into breakSubstring which has the following type: ByteString -> ByteString -> (ByteString, ByteString). Then determine whether the string is found:
    substringFound :: String -> String -> Bool
    
    substringFound query str = 
      (not . C.null . snd) $ 
      breakSubstring (C.pack query) (C.pack str)
  3. Try out some tests in main as follows:
    main = do
      print $ substringFound "scraf" "swedish scraf mafia"
      print $ substringFound "flute" "swedish scraf mafia"
  4. Executing main will print out the following results:
    True
    False
    

How it works...

The breakSubstring function recursively checks if the pattern is a prefix of the string. To lazily find the first occurrence of a string, we can call snd (breakSubstring pat str).

There's more...

Another elegant way to quickly find a substring is by using the isInfixOf function provided by both Data.List and Data.ByteString. Moreover, we can also use the OverloadedStrings language extension to remove verbiage, as shown in the following code snippet:

{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (isInfixOf)

main = do
  print $ isInfixOf "scraf" "swedish scraf mafia"
  print $ isInfixOf "flute" "swedish scraf mafia"

See also

Depending on the length of the pattern we're trying to find and the length of the whole string itself, other algorithms may provide better performance. Refer to the Searching a string using the Boyer-Moore-Horspool algorithm and Searching a string using the Rabin-Karp algorithm recipes for more details.

主站蜘蛛池模板: 青龙| 鸡泽县| 杨浦区| 伊宁市| 施甸县| 海门市| 天津市| 屯门区| 卫辉市| 靖安县| 遂昌县| 诏安县| 济南市| 博罗县| 嵩明县| 专栏| 无锡市| 郧西县| 清徐县| 含山县| 安仁县| 华亭县| 郧西县| 湘乡市| 桓台县| 金湖县| 松潘县| 三门峡市| 松桃| 通山县| 开阳县| 延吉市| 乐昌市| 台湾省| 五常市| 钟山县| 留坝县| 苏尼特左旗| 柳江县| 任丘市| 凌源市|