如何区分线条的各个部分？

2023-12-08

我有两个文件想要比较。这些行有时间戳，可能还有一些我想在匹配算法中忽略的其他内容，但如果匹配算法发现文本的其余部分存在差异，我仍然希望输出这些项目。例如：

1c1
<    [junit4] 2013-01-11 04:43:57,392 INFO  com.example.MyClass:123 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)
---
>    [junit4] 2013-01-11 22:16:07,398 INFO  com.example.MyClass:123 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)

不应该被发射，但是：

1c1
<    [junit4] 2013-01-11 04:43:57,392 INFO  com.example.MyClass:123 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)
---
>    [junit4] 2013-01-11 22:16:07,398 INFO  com.example.MyClass:456 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)

应该发出（因为行号不同）。请注意，时间戳仍然会发出。

如何才能做到这一点？

我之前曾多次希望有这个功能，由于它再次出现在这里，我决定用谷歌搜索一下并找到了 perl 的Algorithm::Diff您可以将其提供给哈希函数（他们称之为“密钥生成函数”）“应该返回一个唯一标识给定元素的字符串”算法用来进行比较的内容（而不是您提供给它的实际内容）。

基本上，您需要做的就是添加一个 sub，它以您希望从字符串中过滤掉不需要的内容的方式执行一些正则表达式魔法，并将 subref 作为参数添加到调用中diff()（参见我的CHANGE 1 and CHANGE 2下面的代码片段中的评论）。

如果您需要正常（或统一）diff输出，检查详细的diffnew.pl模块附带的示例并在此文件中进行必要的更改。为了演示目的，我将使用简单的diff.pl它也附带，因为它很短，我可以将它完整地发布在这里。

mydiff.pl

#!/usr/bin/perl

# based on diff.pl that ships with Algorithm::Diff
# demonstrates the use of a key generation function

# the original diff.pl is:
# Copyright 1998 M-J. Dominus. ([email protected])
# This program is free software; you can redistribute it and/or modify it
# under the same terms as Perl itself.

use Algorithm::Diff qw(diff);

die("Usage: $0 file1 file2") unless @ARGV == 2;

my ($file1, $file2) = @ARGV;

-f $file1 or die("$file1: not a regular file");
-f $file2 or die("$file2: not a regular file");
-T $file1 or die("$file1: binary file");
-T $file2 or die("$file2: binary file");

open (F1, $file1) or die("Couldn't open $file1: $!");
open (F2, $file2) or die("Couldn't open $file2: $!");
chomp(@f1 = <F1>);
close F1;
chomp(@f2 = <F2>);
close F2;

# CHANGE 1
# $diffs = diff(\@f1, \@f2);
$diffs = diff(\@f1, \@f2, \&keyfunc);

exit 0 unless @$diffs;

foreach $chunk (@$diffs)
{
        foreach $line (@$chunk)
        {
                my ($sign, $lineno, $text) = @$line;
                printf "%4d$sign %s\n", $lineno+1, $text;
        }
}
exit 1;

# CHANGE 2 {
sub keyfunc
{
        my $_ = shift;
        s/^(\d{2}:\d{2})\s+//;
        return $_;
}
# }

one.txt

12:15 one two three
13:21 three four five

two.txt

10:01 one two three
14:38 seven six eight

示例运行

$ ./mydiff.pl one.txt two.txt
   2- 13:21 three four five
   2+ 13:21 seven six eight

示例运行 2

这是正常情况下的一个diff输出基于diffnew.pl

$ ./my_diffnew.pl one.txt two.txt
2c2
< 13:21 three four five
---
> 13:21 seven six eight

正如您所看到的，两个文件中的第一行都会被忽略，因为它们仅在时间戳上有所不同，并且哈希函数会删除这些行以进行比较。

瞧，您刚刚推出了自己的内容感知功能diff!

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)

shell