A parser for regular expressions in PHP?

2024-03-04

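I need to parse regular expressions into their components in PHP. I have no problem creating or executing regular expressions, but I want to display information about a regular expression (e.g. list the capture groups, attach repetition characters to their targets, ...). The overall project is a WordPress plugin that gives information about the rewrite rules, which are regexes with substitution patterns and can be cryptic to understand.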

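I have written a simple implementation myself (http://codereview.appspot.com/3745046/patch/1/4), which seems to handle the simple regexes I throw at it and converts them to syntax trees. Before I expand this example to support more of the regex syntax, I would like to know whether there are other good implementations I can look at. The implementation language does not really matter. I assume most parsers are written to optimize matching speed, but that is not important to me and may even hinder clarity.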

I'm the creator of Debuggex (http://www.debuggex.com), whose requirements are very similar to yours: optimize for the amount of information that can be shown.

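Below is a heavily modified (for readability) snippet from the parser that Debuggex uses. It does not work as-is; it is meant to demonstrate how the code is organized. Most of the error handling has been removed, and so have many pieces of logic that were straightforward but verbose.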

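Note that recursive descent (http://en.wikipedia.org/wiki/Recursive_descent_parser) is used. This is what you have done in your parser, except that yours is flattened into a single function. I used approximately this grammar: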

Regex -> Alt
Alt -> Cat ('|' Cat)*
Cat -> Empty | (Repeat)+
Repeat -> Base (('*' | '+' | '?' | CustomRepeatAmount) '?'?)?
Base -> '(' Alt ')' | Charset | Literal
Charset -> '[' (Char | Range | EscapeSeq)* ']'
Literal -> Char | EscapeSeq
CustomRepeatAmount -> '{' Number (',' Number)? '}'
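
As a rough illustration of the CustomRepeatAmount rule, here is a minimal standalone sketch in JavaScript. The function name `parseRepeatAmount` and its return shape are hypothetical and are not taken from Debuggex. It also accepts the open-ended `{n,}` form, which both the JavaScript and PCRE flavors allow but which the approximate grammar above leaves out.

```javascript
// Hypothetical helper: parses a quantifier such as {3}, {2,} or {2,5}
// starting at index k of string s. Returns {min, max, end}, where end is
// the index just past the closing brace, or null if the text at k is not
// a well-formed quantifier.
function parseRepeatAmount(s, k) {
  if (s.charAt(k) !== '{') return null;
  var i = k + 1;

  // Reads a run of decimal digits starting at i; returns null if none.
  function readNumber() {
    var start = i;
    while (i < s.length && s.charAt(i) >= '0' && s.charAt(i) <= '9') i++;
    return i > start ? parseInt(s.slice(start, i), 10) : null;
  }

  var min = readNumber();
  if (min === null) return null;
  var max = min; // {n} means exactly n times.
  if (s.charAt(i) === ',') {
    i++;
    var upper = readNumber();
    // {n,} has no upper bound; {n,m} has an explicit one.
    max = (upper === null) ? Infinity : upper;
  }
  // Reject a missing closing brace and out-of-order bounds such as {5,2}.
  if (s.charAt(i) !== '}' || max < min) return null;
  return { min: min, max: max, end: i + 1 };
}
```

Returning null instead of throwing leaves it to the caller to decide whether a malformed brace is an error or should be treated as a literal character, which depends on the regex flavor being parsed.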

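You'll notice that a lot of my code just deals with the peculiarities of the JavaScript flavor of regexes. You can find more information about them in this reference: https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Regular_Expressions. For PHP, http://www.php.net/manual/en/reference.pcre.pattern.syntax.php has all the information you need. I think you are well on your way with your parser; all that remains is implementing the rest of the operators and getting the edge cases right.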

:) Enjoy:

var Parser = function(s) {
  this.s = s; // This is the regex string.
  this.k = 0; // This is the index of the character being parsed.
  this.group = 1; // This is a counter for assigning to capturing groups.
};

// These are convenience methods to make reading and maintaining the code
// easier.
// Returns true if there is more string left, false otherwise.
Parser.prototype.more = function() {
  return this.k < this.s.length;
};
// Returns the char at the current index.
Parser.prototype.peek = function() { // exercise
};
// Returns the char at the current index, then advances the index.
Parser.prototype.next = function() { // exercise
};
// Ensures c is the char at the current index, then advances the index.
Parser.prototype.eat = function(c) { // exercise
};

// We use a recursive descent parser.
// This returns the root node of our tree.
Parser.prototype.parseRe = function() {
  // It has exactly one child.
  var tree = new ReTree(this.parseAlt());
  // We expect to be at the end of the string when we finish parsing.
  // If not, something went wrong.
  if (this.more()) {
    throw new Error();
  }
  return tree;
};

// This parses several subexpressions divided by |s, and returns a tree
// with the corresponding trees as children.
Parser.prototype.parseAlt = function() {
  var alts = [this.parseCat()];
  // Keep parsing as long as we have more pipes.
  while (this.more() && this.peek() === '|') {
    this.next();
    // Recursive descent happens here.
    alts.push(this.parseCat());
  }
  // Here, we allow an AltTree with single children.
  // Alternatively, we can return the child if there is only one.
  return new AltTree(alts);
};

// This parses several concatenated repeat-subexpressions, and returns
// a tree with the corresponding trees as children.
Parser.prototype.parseCat = function() {
  var cats = [];
  // If we reach a pipe or close paren, we stop. This is because that
  // means we are in a subexpression, and the subexpression is over.
  while (this.more() && ')|'.indexOf(this.peek()) === -1) {
    // Recursive descent happens here.
    cats.push(this.parseRepeat());
  }
  // This is where we choose to handle the empty string case.
  // It's easiest to handle it here because of the implicit concatenation
  // operator in our grammar.
  return (cats.length >= 1) ? new CatTree(cats) : new EmptyTree();
};

// This parses a single repeat-subexpression, and returns a tree
// with the child that is being repeated.
Parser.prototype.parseRepeat = function() {
  // Recursive descent happens here.
  var repeat = this.parseBase();
  // If we reached the end after parsing the base expression, we just return
  // it. Likewise if we don't have a repeat operator that follows.
  if (!this.more() || '*?+{'.indexOf(this.peek()) === -1) {
    return repeat;
  }

  // These are properties that vary with the different repeat operators.
  // They aren't necessary for parsing, but are used to give meaning to
  // what was parsed.
  var min = 0; var max = Infinity; var greedy = true;
  if (this.peek() === '*') { // exercise
  } else if (this.peek() === '?') { // exercise
  } else if (this.peek() === '+') {
    // For +, we advance the index, and set the minimum to 1, because
    // a + means we repeat the previous subexpression between 1 and infinity
    // times.
    this.next(); min = 1;
  } else if (this.peek() === '{') { /* challenging exercise */ }

  if (this.more() && this.peek() === '?') {
    // By default (in Javascript at least), repetition is greedy. Appending
    // a ? to a repeat operator makes it reluctant.
    this.next(); greedy = false;
  }
  return new RepeatTree(repeat, {min:min, max:max, greedy:greedy});
};

// This parses a "base" subexpression. We defined this as being a
// literal, a character set, or a parenthesized subexpression.
Parser.prototype.parseBase = function() {
  var c = this.peek();
  // If any of these characters are spotted, something went wrong.
  // The ) should have been eaten by a previous call to parseBase().
  // The *, ?, or + should have been eaten by a previous call to parseRepeat().
  if (c === ')' || '*?+'.indexOf(c) !== -1) {
    throw new Error();
  }
  if (c === '(') {
    // Parse a parenthesized subexpression. This is either a lookahead,
    // a capturing group, or a non-capturing group.
    this.next(); // Eat the (.
    var ret = null;
    if (this.peek() === '?') { // exercise
      // Parse lookaheads and non-capturing groups.
    } else {
      // This is why the group counter exists. We use it to enumerate the
      // group appropriately.
      var group = this.group++;
      // Recursive descent happens here. Note that this calls parseAlt(),
      // which is what was initially called by parseRe(), creating
      // a mutual recursion. This is where the name recursive descent
      // comes from.
      ret = new MatchTree(this.parseAlt(), group);
    }
    // This MUST be a ) or something went wrong.
    this.eat(')');
    return ret;
  } else if (c === '[') {
    this.next(); // Eat the [.
    // Parse a charset. A CharsetTree has no children, but it does contain
    // (pseudo)chars and ranges, and possibly a negation flag. These are
    // collectively returned by parseCharset().
    // This piece can be structured differently depending on your
    // implementation of parseCharset()
    var opts = this.parseCharset();
    // This MUST be a ] or something went wrong.
    this.eat(']');
    return new CharsetTree(opts);
  } else {
    // Parse a literal. Like a CharsetTree, a LiteralTree doesn't have
    // children. Instead, it contains a single (pseudo)char.
    var literal = this.parseLiteral();
    return new LiteralTree(literal);
  }
};

// This parses the inside of a charset and returns all the information
// necessary to describe that charset. This includes the literals and
// ranges that are accepted, as well as whether the charset is negated.
Parser.prototype.parseCharset = function() {
  // challenging exercise
};

// This parses a single (pseudo)char and returns it for use in a LiteralTree.
Parser.prototype.parseLiteral = function() {
  var c = this.next();
  if (c === '.' || c === '^' || c === '$') {
    // These are special chars. Their meaning is different than their
    // literal symbol, so we set the 'special' flag.
    return new CharInfo(c, true);
  } else if (c === '\\') {
    // If we come across a \, we need to parse the escaped character.
    // Since parsing escaped characters is similar between literals and
    // charsets, we extracted it to a separate function. The reason we
    // pass a flag is because \b has different meanings inside charsets
    // vs outside them.
    return this.parseEscaped({inCharset: false});
  }
  // If neither case above was hit, we just return the exact char.
  return new CharInfo(c);
};

// This parses a single escaped (pseudo)char and returns it for use in
// either a LiteralTree or a CharsetTree.
Parser.prototype.parseEscaped = function(opts) {
  // Here we instantiate some default options.
  opts = opts || {};
  var inCharset = opts.inCharset || false;

  // isDigit() and isHexa() are small helper predicates that are not shown.
  var c = this.peek();
  // Here are a bunch of escape sequences that require reading further
  // into the string. They are all fairly similar.
  if (c === 'c') { // exercises
  } else if (c === '0') {
  } else if (isDigit(c)) {
  } else if (c === 'x') {
  } else if (c === 'u') {
    // Use this as an example for implementing the ones above.
    // A regex may be used for this portion, but I think this is clearer.
    // We make sure that there are exactly four hexadecimal digits after
    // the u. Modify this for the escape sequences that your regex flavor
    // uses.
    var r = '';
    this.next();
    for (var i = 0; i < 4; ++i) {
      c = this.peek();
      if (!isHexa(c)) {
        throw new Error();
      }
      r += c;
      this.next();
    }
    // Return a single CharInfo despite having read multiple characters.
    // This is why I used "pseudo" previously.
    return new CharInfo(String.fromCharCode(parseInt(r, 16)));
  } else { // No special parsing required after the first escaped char.
    this.next();
    if (inCharset && c === 'b') {
      // Within a charset, \b means backspace
      return new CharInfo('\b');
    } else if (!inCharset && (c === 'b' || c === 'B')) {
      // Outside a charset, \b is a word boundary (and \B is the complement
      // of that). We mark it as special since the character is not
      // to be taken literally.
      return new CharInfo('\\' + c, true);
    } else if (c === 'f') { // these are left as exercises
    } else if (c === 'n') {
    } else if (c === 'r') {
    } else if (c === 't') {
    } else if (c === 'v') {
    } else if ('dDsSwW'.indexOf(c) !== -1) {
    } else {
      // If we got to here, the character after \ should be taken literally,
      // so we don't mark it as special.
      return new CharInfo(c);
    }
  }
};

// This represents the smallest meaningful character unit, or pseudochar.
// For example, an escaped sequence with multiple physical characters is
// exactly one character when used in CharInfo.
var CharInfo = function(c, special) {
  this.c = c;
  this.special = special || false;
};

// Calling this will return the parse tree for the regex string s.
var parse = function(s) { return (new Parser(s)).parseRe(); };
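
For a runnable starting point, here is a minimal self-contained sketch of the same recursive descent structure, restricted to literals, capturing groups, `|`, and the `*`, `+` and `?` operators. The plain-object node shapes and the names `parseRegex` and `listGroups` are assumptions made for illustration and are not Debuggex's API. Escapes, charsets, `{n,m}` and `(?...)` groups are deliberately left out.

```javascript
// Minimal recursive descent sketch. Node shapes are hypothetical:
// {type: 'alt'|'cat'|'empty'|'repeat'|'group'|'literal', ...}.
function parseRegex(s) {
  var k = 0;      // Index of the character being parsed.
  var group = 1;  // Counter for numbering capturing groups.

  function more() { return k < s.length; }
  function peek() { return s.charAt(k); }
  function next() { return s.charAt(k++); }
  function eat(c) {
    if (peek() !== c) throw new Error("Expected '" + c + "' at index " + k);
    k++;
  }

  function parseAlt() {
    var alts = [parseCat()];
    while (more() && peek() === '|') {
      next();
      alts.push(parseCat());
    }
    // Collapse single-child alternations to keep the tree small.
    return alts.length === 1 ? alts[0] : { type: 'alt', children: alts };
  }

  function parseCat() {
    var cats = [];
    while (more() && ')|'.indexOf(peek()) === -1) {
      cats.push(parseRepeat());
    }
    if (cats.length === 0) return { type: 'empty' };
    return cats.length === 1 ? cats[0] : { type: 'cat', children: cats };
  }

  function parseRepeat() {
    var base = parseBase();
    if (!more() || '*+?'.indexOf(peek()) === -1) return base;
    var op = next();
    var node = {
      type: 'repeat',
      child: base,
      min: op === '+' ? 1 : 0,
      max: op === '?' ? 1 : Infinity,
      greedy: true
    };
    // A trailing ? makes the repetition reluctant.
    if (more() && peek() === '?') { next(); node.greedy = false; }
    return node;
  }

  function parseBase() {
    if (!more() || '*+?)'.indexOf(peek()) !== -1) {
      throw new Error('Unexpected token at index ' + k);
    }
    if (peek() === '(') {
      var start = k;
      next();
      var number = group++; // Numbered by opening paren, left to right.
      var child = parseAlt();
      eat(')');
      return { type: 'group', number: number, child: child,
               source: s.slice(start, k) };
    }
    return { type: 'literal', c: next() };
  }

  var tree = parseAlt();
  if (more()) throw new Error("Unexpected '" + peek() + "' at index " + k);
  return tree;
}

// Walks the tree and returns the capturing groups in numbering order.
function listGroups(node, out) {
  out = out || [];
  if (node.type === 'group') {
    out.push({ number: node.number, source: node.source });
  }
  (node.children || (node.child ? [node.child] : [])).forEach(function (n) {
    listGroups(n, out);
  });
  return out;
}
```

For example, `listGroups(parseRegex('(a|b)+c(d)'))` yields two entries, one for `(a|b)` and one for `(d)`. Keeping the source slice on each group node is one simple way to display what a capture group covers, which is the kind of information the question asks for.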


