
Using regular expressions (regex)

There are times when an engineer needs to extract specific data from a sentence or a large block of text. Regular expressions (regex) are well suited to this task. Most programming languages support regex; the core concepts are the same, and the differences lie mainly in syntax.

The following example shows how to use regex in Python:

import re

sample = "From Jan 2018 till Nov 2018 I was learning python daily at 10:00 PM"

# r'\W+' matches one or more non-alphanumeric characters,
# so splitting on it breaks the sentence into words
print(re.split(r'\W+', sample))

# Extract only the month and year from the string and print them
regex = re.compile(r'(?P<month>\w{3})\s+(?P<year>[0-9]{4})')

for m in regex.finditer(sample):
    value = m.groupdict()
    print("Month: " + value['month'] + " , " + "Year: " + value['year'])

# Extract the time together with its AM or PM suffix
regex = re.compile(r'\d+:\d+\s[AP]M')
m = regex.findall(sample)
print(m)

The sample output is as follows:

['From', 'Jan', '2018', 'till', 'Nov', '2018', 'I', 'was', 'learning', 'python', 'daily', 'at', '10', '00', 'PM']
Month: Jan , Year: 2018
Month: Nov , Year: 2018
['10:00 PM']

As the preceding output shows, the first line is the sample sentence split into separate words. The next two lines come from the regex applied in a loop, which extracts every month and year, represented by three word characters (mmm) followed by four digits (yyyy). Finally, the last line is the time extracted together with its AM/PM suffix, in hh:mm format.
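
Note that \w{3} accepts any three word characters, so the month pattern would also match text such as "abc 1234". The following is a minimal sketch of a stricter variant, assuming English three-letter month abbreviations; the names MONTHS, MONTH_YEAR, and extract_month_year are illustrative and not part of the original example:

```python
import re

# Hypothetical names for illustration; assumes English three-letter
# month abbreviations.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# \b anchors the match at word boundaries so partial words are skipped
MONTH_YEAR = re.compile(r'\b(?P<month>' + MONTHS + r')\s+(?P<year>[0-9]{4})\b')

def extract_month_year(text):
    """Return a list of (month, year) tuples found in text."""
    return [(m.group('month'), m.group('year'))
            for m in MONTH_YEAR.finditer(text)]

sample = "From Jan 2018 till Nov 2018 I was learning python daily at 10:00 PM"
print(extract_month_year(sample))
print(extract_month_year("build abc 1234 failed"))
```

With the sample sentence this returns the same two month and year pairs as before, while the second call returns an empty list because "abc" is not a month abbreviation.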

Regex offers many more variations than those shown here. It is worth consulting online tutorials for a detailed look at the different kinds of patterns and how to choose the right one to extract information.
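
As a rough illustration of a few such variations, the following sketch applies other functions from Python's re module to the same sample string. The patterns and variable names here are chosen for illustration only and are not from the original example:

```python
import re

sample = "From Jan 2018 till Nov 2018 I was learning python daily at 10:00 PM"

# re.search returns only the first match, or None if nothing matches
first = re.search(r'(?P<hour>\d{1,2}):(?P<minute>\d{2})\s(?P<period>[AP]M)', sample)
if first:
    print(first.group('hour'), first.group('minute'), first.group('period'))

# With a single capturing group, re.findall returns only the captured text
years = re.findall(r'\b([0-9]{4})\b', sample)
print(years)

# re.sub replaces every match; re.IGNORECASE ignores letter case
masked = re.sub(r'\bPYTHON\b', 'Python', sample, flags=re.IGNORECASE)
print(masked)
```

Here re.search is useful when only the first occurrence matters, re.findall when all matching values are needed as a list, and re.sub when the matched text should be rewritten rather than extracted.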