- Machine Learning for Cybersecurity Cookbook
- Emmanuel Tsukerman
Extracting N-grams
In standard quantitative analysis of text, N-grams are sequences of N tokens (for example, words or characters). For instance, given the text The quick brown fox jumped over the lazy dog, if our tokens are words, then the 1-grams are the, quick, brown, fox, jumped, over, the, lazy, and dog. The 2-grams are the quick, quick brown, brown fox, and so on. The 3-grams are the quick brown, quick brown fox, brown fox jumped, and so on. Just like the local statistics of the text allowed us to build a Markov chain to perform statistical predictions and text generation from a corpus, N-grams allow us to model the local statistical properties of our corpus. Our ultimate goal is to utilize the counts of N-grams to help us predict whether a sample is malicious or benign. In this recipe, we demonstrate how to extract N-gram counts from a sample.
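The sketch below is a minimal illustration of the idea, not the book's exact recipe: it counts byte N-grams in a binary sample, since byte sequences are a common token choice when profiling executables for malware detection. The file name sample.exe, the helper extract_ngram_counts, and the choice of N = 2 are all illustrative assumptions.

```python
import collections


def extract_ngram_counts(data: bytes, n: int) -> collections.Counter:
    """Count the N-grams (sequences of N consecutive bytes) in data."""
    counts = collections.Counter()
    for i in range(len(data) - n + 1):
        # Slicing bytes yields a hashable bytes object, so it can be a key.
        counts[data[i : i + n]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample path; substitute any file you want to profile.
    sample_path = "sample.exe"
    with open(sample_path, "rb") as f:
        raw_bytes = f.read()

    two_gram_counts = extract_ngram_counts(raw_bytes, n=2)
    # The most frequent 2-grams can later serve as features for
    # classifying samples as malicious or benign.
    print(two_gram_counts.most_common(10))
```

The same loop works for word tokens: splitting "The quick brown fox jumped over the lazy dog" into words and sliding a window of size 2 over the list yields the 2-grams the quick, quick brown, brown fox, and so on, exactly as described above.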