How it works...

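The basic idea behind ssdeep, known as context-triggered piecewise hashing, is to split the input into pieces whose boundaries are determined by the content itself, hash each piece with a traditional hash, and combine the results into a single signature. Because an edit disturbs only the pieces around it, this collection of hashes can be used to identify versions of known files even when they have been altered by insertions, modifications, or deletions.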
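To make the idea concrete, here is a minimal sketch of context-triggered piecewise hashing in Python. It is a simplification under our own assumptions: the rolling hash is a plain sum over a 7-byte window, a boundary is declared when that sum modulo a fixed block size hits a chosen value, each piece is hashed with SHA-256 and reduced to one base64 character, and signatures are compared with a normalized edit distance. ssdeep itself uses a different rolling hash and piece hash, derives its block size from the input length, and scores with its own weighted comparison, so these signatures will not match ssdeep's output. The names piecewise_signature, edit_distance, and similarity are hypothetical.

```python
import hashlib

# Base64 alphabet: each piece of the input contributes one character.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def piecewise_signature(data: bytes, block_size: int = 8, window: int = 7) -> str:
    """Simplified context-triggered piecewise hash (illustration, not ssdeep's algorithm)."""
    signature = []
    start = 0    # index where the current piece begins
    rolling = 0  # sum of the last `window` bytes (a deliberately simple rolling hash)
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # slide the window forward by one byte
        # The boundary decision depends only on the bytes inside the window,
        # so boundaries far away from an edit are unaffected by it.
        if rolling % block_size == block_size - 1:
            piece_hash = hashlib.sha256(data[start:i + 1]).digest()
            signature.append(B64[piece_hash[0] % 64])  # low 6 bits -> one character
            start = i + 1
    if start < len(data):  # hash whatever trails the last boundary
        piece_hash = hashlib.sha256(data[start:]).digest()
        signature.append(B64[piece_hash[0] % 64])
    return "".join(signature)


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(sig1: str, sig2: str) -> int:
    """Scale edit distance to 0..100 (a simple stand-in for ssdeep's scoring)."""
    longest = max(len(sig1), len(sig2))
    if longest == 0:
        return 100
    return round(100 * (1 - edit_distance(sig1, sig2) / longest))


if __name__ == "__main__":
    original = b"Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 4
    edited = original.replace(b"amet", b"AMET", 1)  # one local modification
    sig_a, sig_b = piecewise_signature(original), piecewise_signature(edited)
    print(sig_a)
    print(sig_b)
    print("similarity:", similarity(sig_a, sig_b))
```

Because boundaries are chosen by content rather than by fixed offsets, inserting or deleting bytes shifts later pieces without changing their contents, so most signature characters survive a local edit. Fixed-size blocks, by contrast, would all be misaligned after the first insertion.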

For our recipe, we began by creating a set of four test strings as a toy example to illustrate how changes to a string affect its similarity measures (step 1). The first, str1, is simply the first sentence of Lorem Ipsum. The second, str2, differs only in the capitalization of the m in magna. The third, str3, is missing the word magna altogether. Finally, the fourth string is entirely different.

In step 2, we hash the strings using the ssdeep similarity hashing library. Observe that similar strings have visibly similar similarity hashes. Contrast this with traditional hashes, in which even a small alteration produces a completely different hash.

Next, we compute the similarity scores between the various strings using ssdeep (step 3). The ssdeep similarity score between two hashes is an integer ranging from 0 to 100, with 100 indicating a match and 0 indicating no detectable similarity. Two identical strings, for example, have a similarity score of 100. Changing the case of a single letter lowered the score significantly, to 39, because the strings are relatively short. Removing a word lowered it to 37. The two completely different strings had a similarity of 0.
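The contrast with traditional hashes can be checked with the standard library alone. The sketch below rebuilds the toy strings, assuming str1 is the standard Lorem Ipsum opening sentence; str4 is a hypothetical stand-in, since the text does not give the fourth string. It hashes each string with SHA-256 (our choice of traditional hash) and counts how many hex digits agree, position by position, with the digest of str1. The helper names sha256_hex and matching_digits are hypothetical.

```python
import hashlib

# Step 1 toy strings. str1 assumes the standard Lorem Ipsum opening sentence;
# str4 is a hypothetical unrelated sentence used only for illustration.
str1 = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod "
        "tempor incididunt ut labore et dolore magna aliqua.")
str2 = str1.replace("magna", "Magna")  # only the case of one letter changes
str3 = str1.replace(" magna", "")      # the word "magna" is removed
str4 = "The quick brown fox jumps over the lazy dog."


def sha256_hex(text: str) -> str:
    """Traditional cryptographic hash of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def matching_digits(h1: str, h2: str) -> int:
    """Count hex digits that agree at the same position in two digests."""
    return sum(c1 == c2 for c1, c2 in zip(h1, h2))


if __name__ == "__main__":
    reference = sha256_hex(str1)
    for name, text in [("str2", str2), ("str3", str3), ("str4", str4)]:
        digest = sha256_hex(text)
        print(f"{name}: {digest} ({matching_digits(reference, digest)}/64 digits match str1)")
```

Even the one-letter change in str2 is expected to yield a digest that agrees with str1's no more often than chance (about 1 position in 16), just like the unrelated str4. With the python-ssdeep package installed, steps 2 and 3 correspond to ssdeep.hash(text) and ssdeep.compare(hash1, hash2), the latter returning the 0 to 100 score.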

Although other fuzzy hashes are available, some of them better in certain cases, ssdeep remains a primary choice because of its speed and its status as a de facto standard.