menu

Questions & Answers

Determine the shortest possible match of a regular expression

I have an randomly ordered array of regular expressions like this:

let patterns = [
    /foo+ba+r/,
    /foo/,
    /foo+bar/,
    /foobar/,
    /m[eo]{4,}w/,
    /boo/,
    /fooo*/,
    /meow/
]

I'm not sure if this is possible but I would like to write an algorithm which sorts the regular expressions from least greedy to most greedy, like this:

[
    /foo/,
    /boo/,
    /fooo*/,
    /meow/,
    /foobar/,
    /foo+bar/,
    /m[eo]{4,}w/,
    /foo+ba+r/
]

I would imagine such sorting could be achieved like so:

patterns.sort((p1, p2) { return p1.greediness() - p2.greediness() });

But there exists no method called greediness in the the RegExpr class.

Ideally, the greediness method would return the number of characters which could be possibly matched at minimum. i.e:

/foo/.greediness() == 3
/boo/.greediness() == 3
/fooo*/.greediness() == 3
/meow/.greediness() == 4
/foobar/.greediness() == 6
/foo+bar/.greediness() == 6
/m[eo]{4,}w/.greediness() == 6
/foo+ba+r/.greediness() == 6

What would your solution be to this problem?

Comments:
2023-01-22 23:10:15
Seems like a pretty hard problem in general.
2023-01-22 23:10:15
I would start by writing a function that takes a regex and returns a number based on a 'greediness' quotient.
2023-01-22 23:10:15
greediness() could be easily extended within RegEx.prototype. Did you tried that ?
Answers(2) :

Here is a concept of what you might use as an idea.

You can use prototype to define a custom function for the RegEx:

// Extend Regular Expression with a custom method
RegExp.prototype.greediness = function () {
  return (this.source.match(/[*+{]/g) || []).length
}

And then you perform a sorting on that patterns:

// Sort patterns by greediness
const patternByGreediness = patterns.sort((a, b) => {
  return b.greediness() - a.greediness()
})

Check full code snippet

let patterns = [
  /foo+ba+r/,
  /foo/,
  /foo+bar/,
  /foobar/,
  /m[eo]{4,}w/,
  /boo/,
  /fooo*/,
  /meow/
]

// Sort
patterns.sort((a, b) => {
  return b.source.length - a.source.length
}).reverse()

// Reverse order as you want
console.log(patterns)

// Extend Regular Expression with a custom method
RegExp.prototype.greediness = function () {
  return (this.source.match(/[*+{]/g) || []).length
}

// Sort patterns by greediness
const patternByGreediness = patterns.sort((a, b) => {
  return b.greediness() - a.greediness()
})
console.log(patternByGreediness)

// Get a number of how greedy a pattern is
console.log(/m[eo]{4,}w/.greediness())

Hope that helps.

Comments:
2023-01-22 23:10:15
Definitely needs some improvement but I like how you have used prototype to add the greediness method.
2023-01-22 23:10:15
Altering the built-in prototypes is widely considered a bad practice these last several... decades.

As Pointy said in the comments, this is a hard problem.

Here is the beginning of a solution:

const greediness = (s) =>
  s .toString () .slice (1, -1)
    .replace (/\[[^\]]+]/g, 'X')
    .replace (/.\{((\d+)(,\d*))\}/g, (s, a, ds, _) => 'X' .repeat (Number (ds)))
    .replace (/.\+/g, 'X')
    .replace (/.\?/g, '')
    .replace (/.\*/g, '')
    .length

const sortByGreediness = (patterns) =>
  [...patterns] .sort ((a, b) => greediness (a) - greediness (b))
    // .map (s => [s, greediness (s)])  // to show sizes


const patterns = [/foo+ba+r/, /foo/, /foo+bar/, /foobar/, /m[eo]{4,}w/, /boo/, /fooo*/, /meow/]


console .log (sortByGreediness (patterns))

We simply take the text of the regex and replace quantifiers and their preceding characters with the smallest number of characters that might match. We do something similar for blocks like [eo] and X{4,}.

This we might go through steps like this:

m[eo]{4,}wp+u+r?r*
mX{4,}wp+u+r?r*
mXXXXwp+u+r?r*
mXXXXXwXr?r*
mXXXXXwXr*
mXXXXXwX
  - length 7

But this doesn't touch on the complexities that can be inside a regex, and doesn't even try to handle capturing groups. I think it would next to impossible to do for the full regex spec, but perhaps this can be expanded toward what you need.

(If you're getting more complex, you might want to do this repeatedly with code something like the following, or with a while loop in place of its recursion.

const greediness = (s, next = s .toString () .slice (1, -1), prev = null) =>
  next === prev
    ? next .length
    : greediness (s, next
        .replace (/\[[^\]]+]/g, 'X')
        .replace (/.\{((\d+)(,\d*))\}/g, (s, a, ds, _) => 'X' .repeat (Number (ds)))
        .replace (/.\+/g, 'X')
        .replace (/.\?/g, '')
        .replace (/.\*/g, '')
      , next)