- Hands-On Machine Learning with ML.NET
- Jarred Capellman
- 2021-06-24 16:43:34
The BaseML class
For the BaseML class, we have made several enhancements, starting with the constructor. In the constructor, we initialize the _stringRex variable to the regular expression we will use to extract strings. The call to Encoding.RegisterProvider is required before the Windows-1252 encoding can be used, because code page encodings are not registered by default on .NET Core. Windows-1252 is the encoding we use when reading Windows executables:
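private static Regex _stringRex;

protected BaseML()
{
    // Fixed seed so that runs are repeatable
    MlContext = new MLContext(2020);

    // Makes code page encodings such as Windows-1252 available
    Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Runs of 8 or more printable ASCII characters or tabs
    _stringRex = new Regex(@"[ -~\t]{8,}", RegexOptions.Compiled);
}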
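To make the effect of the pattern concrete, the following is a minimal sketch rather than code from the BaseML class. The StringRexDemo class, the ExtractRuns helper, and the sample bytes are hypothetical names and data introduced only for illustration; the pattern and the RegisterProvider call are the ones shown above. It assumes .NET Core 3.0 or later, where CodePagesEncodingProvider ships with the shared framework.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the book's sample project.
public static class StringRexDemo
{
    // Same pattern as in the constructor: 8 or more characters in the
    // range space (0x20) to tilde (0x7E), or tab.
    private static readonly Regex StringRex =
        new Regex(@"[ -~\t]{8,}", RegexOptions.Compiled);

    // Returns every printable run the pattern finds in the text.
    public static string[] ExtractRuns(string text) =>
        StringRex.Matches(text).Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Without this call, Encoding.GetEncoding(1252) throws on .NET Core.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding windows1252 = Encoding.GetEncoding(1252);

        // Made-up bytes: control characters around three ASCII runs.
        byte[] sample = Encoding.ASCII.GetBytes(
            "\u0001\u0002kernel32.dll\0abc\0GetProcAddress");

        string decoded = windows1252.GetString(sample);

        // "abc" is shorter than 8 characters, so only two runs are printed:
        // kernel32.dll
        // GetProcAddress
        foreach (string run in ExtractRuns(decoded))
        {
            Console.WriteLine(run);
        }
    }
}
```

The minimum length of 8 is what filters out short, accidental sequences of printable bytes, at the cost of also dropping short but meaningful strings.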
The next major addition is the GetStrings method. This method takes a byte array, runs the compiled regular expression created in the constructor against its contents, and extracts the matching strings:
- To begin, we define the method signature and initialize the stringLines variable to hold the strings:
protected string GetStrings(byte[] data)
{
    var stringLines = new StringBuilder();
- Next, we check that the input data is not null or empty, returning an empty string if it is:
    if (data == null || data.Length == 0)
    {
        return stringLines.ToString();
    }
- In the next block of code, we open a MemoryStream object over the data and then a StreamReader object that decodes it as Windows-1252:
    using (var ms = new MemoryStream(data, false))
    {
        using (var streamReader = new StreamReader(ms, Encoding.GetEncoding(1252), false, 2048, false))
        {
- We will then loop through the streamReader object until an EndOfStream condition is reached, reading line by line:
            while (!streamReader.EndOfStream)
            {
                var line = streamReader.ReadLine();
- We then skip any empty lines and strip the ^, ), and - characters from the remaining ones:
                if (string.IsNullOrEmpty(line))
                {
                    continue;
                }

                line = line.Replace("^", "").Replace(")", "").Replace("-", "");
- Then, we append the regular expression matches to the previously defined stringLines variable and close the loop and both using blocks:
                stringLines.Append(string.Join(string.Empty,
                    _stringRex.Matches(line).Where(a => !string.IsNullOrEmpty(a.Value) && !string.IsNullOrWhiteSpace(a.Value)).ToList()));
            }
        }
    }
- Lastly, we return the contents of stringLines as a single string. Because stringLines is a StringBuilder, string.Join here simply calls its ToString method:
    return string.Join(string.Empty, stringLines);
}
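Assembled into one unit, the steps above might look like the following sketch. It is an illustration under stated assumptions, not the book's exact listing: the StringExtractor class, the static constructor, and the sample bytes are hypothetical, the method is made public and static so it can run on its own, and the final return uses ToString directly. The pattern, the encoding, and the clean-up logic follow the steps described above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical standalone wrapper; in the book the method lives on BaseML.
public static class StringExtractor
{
    private static readonly Regex StringRex =
        new Regex(@"[ -~\t]{8,}", RegexOptions.Compiled);

    static StringExtractor()
    {
        // Must run before code page 1252 is requested on .NET Core.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string GetStrings(byte[] data)
    {
        var stringLines = new StringBuilder();

        // Nothing to scan: return an empty string.
        if (data == null || data.Length == 0)
        {
            return stringLines.ToString();
        }

        using (var ms = new MemoryStream(data, false))
        using (var streamReader = new StreamReader(
            ms, Encoding.GetEncoding(1252), false, 2048, false))
        {
            while (!streamReader.EndOfStream)
            {
                var line = streamReader.ReadLine();

                if (string.IsNullOrEmpty(line))
                {
                    continue;
                }

                // Same character clean-up as in the steps above.
                line = line.Replace("^", "").Replace(")", "").Replace("-", "");

                // Append each printable run; no separator is added.
                foreach (Match match in StringRex.Matches(line))
                {
                    if (!string.IsNullOrWhiteSpace(match.Value))
                    {
                        stringLines.Append(match.Value);
                    }
                }
            }
        }

        return stringLines.ToString();
    }

    public static void Main()
    {
        // Made-up input: NUL bytes around three ASCII runs.
        byte[] sample = Encoding.ASCII.GetBytes(
            "\0\0kernel32.dll\0abc\0Get-Proc-Address\0");

        // Prints: kernel32.dllGetProcAddress
        Console.WriteLine(GetStrings(sample));
    }
}
```

Note that the extracted runs are concatenated without a delimiter, so the result is one continuous string rather than a list of separate tokens.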