The BaseML class

For the BaseML class, we have made several enhancements, starting with the constructor. In the constructor, we initialize the _stringRex variable to the regular expression we will use to extract strings. The call to Encoding.RegisterProvider is critical: on .NET Core, code-page encodings such as Windows-1252 are not available until CodePagesEncodingProvider has been registered. Windows-1252 is the encoding that Windows executables utilize:

private static Regex _stringRex;

protected BaseML()
{
    MlContext = new MLContext(2020);

    Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    _stringRex = new Regex(@"[ -~\t]{8,}", RegexOptions.Compiled);
}
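
To make the pattern concrete, here is a minimal sketch of what [ -~\t]{8,} matches: runs of at least eight printable ASCII characters (space through tilde) or tabs. The StringRegexDemo class and its ExtractRuns helper are hypothetical names introduced only for this illustration, and the sketch assumes .NET Core 2.0 or later, where MatchCollection can be queried with LINQ directly:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StringRegexDemo
{
    // Same pattern as in the BaseML constructor: runs of eight or more
    // printable ASCII characters (space through tilde) or tabs.
    private static readonly Regex StringRex =
        new Regex(@"[ -~\t]{8,}", RegexOptions.Compiled);

    // Hypothetical helper: returns each matched run as a plain string.
    public static string[] ExtractRuns(string input) =>
        StringRex.Matches(input).Select(m => m.Value).ToArray();
}
```

Given the input "abc\u0001kernel32.dll\u0002short\u0003GetProcAddress", ExtractRuns returns kernel32.dll and GetProcAddress. The runs abc and short are dropped because they are shorter than eight characters, and the control characters between them end each run.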

The next major addition is the GetStrings method. This method takes the file's bytes, runs the compiled regular expression created in the constructor against each decoded line, and returns the matching strings:

  1. To begin, we declare the method and initialize the stringLines variable to hold the strings:
protected string GetStrings(byte[] data)
{
    var stringLines = new StringBuilder();
  2. Next, we check that the input data is not null or empty, returning an empty string if it is:
    if (data == null || data.Length == 0)
    {
        return stringLines.ToString();
    }
  3. Next, we open a read-only MemoryStream over the data and wrap it in a StreamReader that decodes the bytes as Windows-1252:
    using (var ms = new MemoryStream(data, false))
    {
        using (var streamReader = new StreamReader(ms, Encoding.GetEncoding(1252), false, 2048, false))
        {
  4. We then loop through the streamReader object, reading line by line, until the EndOfStream condition is reached:
            while (!streamReader.EndOfStream)
            {
                var line = streamReader.ReadLine();
  5. We then skip empty lines and strip the ^, ), and - characters from the remaining ones:
                if (string.IsNullOrEmpty(line))
                {
                    continue;
                }

                line = line.Replace("^", "").Replace(")", "").Replace("-", "");
  6. Then, we join the non-blank regular expression matches and append them to the previously defined stringLines variable:
                stringLines.Append(string.Join(string.Empty,
                    _stringRex.Matches(line).Where(a => !string.IsNullOrEmpty(a.Value) && !string.IsNullOrWhiteSpace(a.Value)).ToList()));
  7. Lastly, we close the loop and both using blocks, and return the stringLines variable converted into a single string using the string.Join method. Because stringLines is a single StringBuilder, this call is equivalent to stringLines.ToString():
            }
        }
    }

    return string.Join(string.Empty, stringLines);
}
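
The steps above can be assembled into one standalone unit and tried on an in-memory buffer. The following is only a sketch under stated assumptions, not the book's class: GetStringsSketch is a hypothetical name, the provider is registered in a static constructor rather than in BaseML, the loop tests ReadLine for null instead of checking EndOfStream, and the result is returned with ToString. It assumes .NET Core 3.0 or later, where CodePagesEncodingProvider is available without an extra package:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class GetStringsSketch
{
    private static readonly Regex StringRex =
        new Regex(@"[ -~\t]{8,}", RegexOptions.Compiled);

    static GetStringsSketch()
    {
        // Makes Encoding.GetEncoding(1252) available on .NET Core.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string GetStrings(byte[] data)
    {
        var stringLines = new StringBuilder();

        if (data == null || data.Length == 0)
        {
            return stringLines.ToString();
        }

        using (var ms = new MemoryStream(data, false))
        using (var reader = new StreamReader(ms, Encoding.GetEncoding(1252), false, 2048, false))
        {
            string line;

            // ReadLine returns null once the stream is exhausted.
            while ((line = reader.ReadLine()) != null)
            {
                if (string.IsNullOrEmpty(line))
                {
                    continue;
                }

                // Same clean-up as in the walkthrough: drop ^, ) and -.
                line = line.Replace("^", "").Replace(")", "").Replace("-", "");

                // Append every non-blank match; no separator is inserted.
                foreach (var match in StringRex.Matches(line)
                             .Where(m => !string.IsNullOrWhiteSpace(m.Value)))
                {
                    stringLines.Append(match.Value);
                }
            }
        }

        return stringLines.ToString();
    }
}
```

For the ASCII bytes of "kernel32.dll\n\u0001\u0002\nLoad-Library(A)\nshort\n", this sketch returns kernel32.dllLoadLibrary(A. The matches are concatenated without any separator, and the - and ) characters are removed before the regular expression runs, so the extracted text does not preserve word boundaries between matches.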