Wildcard kick and ban

Started by Lance, July 08, 2008, 02:04:17 PM


iago

It may be slightly faster, but it's also more confusing, I think, and definitely harder to debug if something is wrong.

Camel

You're saying that regexps are less confusing than an extremely simple pattern match algorithm? I could never agree with you on that; not by any stretch.

<Camel> i said what what
<Blaze> in the butt
<Camel> you want to do it in my butt?
<Blaze> in my butt
<Camel> let's do it in the butt
<Blaze> Okay!

iago

Using regexps for simple, targeted uses is easy, yes. You use .* for * and . for ?, that's all you need to know. Besides that, it's basically a super short wrapper ("Put a \ before non-alphanumeric") and calling a known, tested built-in function.
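
For readers following along, the translation described above might look roughly like the sketch below. The class and method names (`WildcardRegex`, `toRegex`, `matches`) are made up for illustration and are not code from this thread; the sketch also assumes masks are ordinary chat text with no supplementary (surrogate-pair) characters.

```java
import java.util.regex.Pattern;

// Hypothetical helper: turns a wildcard mask into a regex string.
final class WildcardRegex {
    // '*' becomes ".*", '?' becomes ".", letters and digits are copied,
    // and every other character gets a backslash in front of it.
    static String toRegex(String mask) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < mask.length(); i++) {
            char c = mask.charAt(i);
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else if (Character.isLetterOrDigit(c)) {
                sb.append(c);
            } else {
                sb.append('\\').append(c);
            }
        }
        return sb.toString();
    }

    // True when the whole input matches the mask.
    static boolean matches(String mask, String input) {
        return Pattern.matches(toRegex(mask), input);
    }

    public static void main(String[] args) {
        System.out.println(toRegex("cam*@use?.net"));
        System.out.println(matches("cam*", "camel"));
    }
}
```

Whether the backslash rule is enough on every JVM is exactly what the rest of the thread argues about, so treat this as a sketch of the idea rather than a vetted implementation.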

It's extremely unclear from looking at your function what it does, and if my results weren't right I wouldn't even know where to start.

Camel

I agree that that specific implementation is fairly ugly (particularly due to lack of comments and poorly chosen variable names), but the actual algorithm is extremely simple; if you were to manually look at the pattern and input string, you'd probably use the same algorithm in your head to test for a match.
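
As a point of comparison, a hand-written matcher of the kind being described could be sketched as follows. This is not the function posted earlier in the thread; it is a generic backtrack-to-the-last-star approach, and the names (`WildcardMatcher`, `matches`) are hypothetical.

```java
// Hypothetical hand-rolled wildcard matcher: '*' matches any run of
// characters (including none), '?' matches exactly one character.
final class WildcardMatcher {
    static boolean matches(String mask, String input) {
        int m = 0;            // current position in the mask
        int i = 0;            // current position in the input
        int starMask = -1;    // mask position just after the last '*' seen
        int starInput = -1;   // input position where that '*' currently stops
        while (i < input.length()) {
            if (m < mask.length() && mask.charAt(m) == '*') {
                // Remember the star and first try letting it match nothing.
                starMask = ++m;
                starInput = i;
            } else if (m < mask.length()
                    && (mask.charAt(m) == '?' || mask.charAt(m) == input.charAt(i))) {
                m++;
                i++;
            } else if (starMask >= 0) {
                // Mismatch: let the last star swallow one more character.
                m = starMask;
                i = ++starInput;
            } else {
                return false;
            }
        }
        // Only trailing stars may remain in the mask.
        while (m < mask.length() && mask.charAt(m) == '*') {
            m++;
        }
        return m == mask.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("c*l", "camel"));
        System.out.println(matches("c?mel", "caamel"));
    }
}
```

The sketch compares characters exactly; a case-insensitive ban mask would need both strings normalized first.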

That is probably #2 on my list of things you should never ever do, immediately below running untrusted code. The fact that you tout yourself as a security guy, and then turn around and recommend that someone actually generate regexp patterns on the fly makes me seriously question your integrity. You're naively assuming that regexps are known, tested, and built-in; in fact, none of those statements are true. Aside from that, there are plenty of other good reasons you shouldn't do it.
1) Throwing a slash before every character is not sufficient for escaping in a regex; it's only by popular ignorance that characters are accepted in that way by some implementations.
2) There is no guarantee that you can use regexps on all platforms; some JVMs simply don't support it, so you'll have to handle that situation anyways.
3) If the JVM does give you a regexp implementation, there's no guarantee that it's going to be identical to the one you developed with. Regexps are one of the very few places where Java is famous for incompatibility; if you develop for Windows, you can be pretty sure that your regexp won't work on Linux - there are simply too many variants, and no authoritative standard on which one you'll get.
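
On point 1, one way to avoid per-character backslashes altogether is `Pattern.quote`, which `java.util.regex` has offered since Java 5. The sketch below assumes a JVM that ships that package, which point 2 disputes; the class name `QuotedWildcardRegex` and its methods are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: quotes each literal run of the mask as a whole
// instead of escaping character by character.
final class QuotedWildcardRegex {
    static Pattern compile(String mask) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < mask.length(); i++) {
            char c = mask.charAt(i);
            if (c == '*' || c == '?') {
                flush(literal, regex);
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        flush(literal, regex);
        return Pattern.compile(regex.toString());
    }

    // Appends the pending literal text, quoted, and clears the buffer.
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(compile("a.b*").matcher("a.bcd").matches());
    }
}
```

This only addresses the escaping concern; it does nothing about the availability and compatibility concerns in points 2 and 3.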

I'm not trying to say that there are no conditions where regexps are useful or merited -- I do actually use them myself in my bot -- but I am saying that they're always undesirable. I use them to pick out URLs in the text area so that I can launch a browser when they're clicked on, which has a pretty high rate of failure anyways, so I only have to worry about the platforms that I support.

This is a case where the amount of work required to reinvent the wheel is significantly less than the amount of work required to handle every potential problem that could come up, even if you don't consider that that means writing your own pattern match algorithm to fall back on when regexps are unavailable.

I beg of you, do not generate regexp patterns on the fly! It's a wide open door just waiting to be exploited, and all you're doing is asking for trouble.


iago

There can definitely be security problems with regexp implementations (and I seem to recall one rather recently? PCRE, maybe?), but there can be security problems with any function. Are you going to stop using the string functions in case the developers missed a buffer check? What about the XML functions, do you also recommend you roll your own?

To answer your various points:
1) It's always worked for me. In fact, the Java specifications clearly state that it's valid:
Quote
It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.

2) Regular expressions have always worked for me on various platforms (Linux, Sun, Windows) on different Java versions (1.3 - 1.6) with identical code. Maybe if you're doing something complicated or tricky it's different, but the general stuff has worked flawlessly for me everywhere. Would you mind listing some differences that could affect this usage?

3) I've never had that problem. Mind listing the affected platforms?
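
The rule quoted under point 1 can be checked directly on whatever JVM is at hand. The sketch below is only a probe, with a made-up class name (`EscapeRuleCheck`); the behaviour it reports is that of the JVM it runs on and says nothing about other implementations.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical probe: reports whether the local regex implementation
// accepts a given pattern string.
final class EscapeRuleCheck {
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Backslash before non-alphabetic characters.
        System.out.println(compiles("\\.") + " " + compiles("\\!") + " " + compiles("\\@"));
        // Backslash before a letter that is not a defined escape.
        System.out.println(compiles("\\q"));
    }
}
```

On a JDK that follows the quoted text, the first line should report the patterns as accepted and the second should report a rejection.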

It's possible that you're talking about non-Sun implementations, but because Java isn't an open standard, you're comparing apples to oranges.

Quote from: Camel on July 23, 2008, 05:21:01 PM
I beg of you, do not generate regexp patterns on the fly! It's a wide open door just waiting to be exploited, and all you're doing is asking for trouble.
You end your post by stating that, but you didn't have anything in your post that even comes close to making that assertion.

Camel

If you look into the java.util.regex package, you'll notice that it's set up as a system of factories specifically so that it can be implemented more easily through JNI, which most non-Sun JVMs do.

1) That part of the Java spec has not always been a standard; in fact, the package didn't even exist in 1.4. There was a sun.misc class regex that was very volatile across platforms, and for a long time after 1.5 came out, bugs from that era were still affecting even the official 1.5 JVM. Furthermore, Sun is not the only entity that has ever made a JVM, and there are still JDKs that to this day strongly discourage the use of classes from the java.util.regex package. At my company, discussion of the package is a non-starter, as we rely on 100% stability 100% of the time, which the package cannot guarantee for our target environment.

2) That's impossible; the packages that existed in 1.3, 1.4, and 5.0 have very little overlap, and there is zero overlap from 1.3 to 5.0. They did this intentionally to address the security and performance problems of the original design.

3) Any early 1.5 version of any non-Sun JDK will most likely be affected by the issue where the Pattern and Matcher classes are stubs which return null. I generally use the Blackdown JVM, which was affected by an unrelated issue in my own use case at that time. The cause of this was not a bug in Blackdown's JVM; it was a bug in Sun's JVM that was worked around in a part of the JRE which is reused by Blackdown.

In my experience, the vast majority of people who have Java installed have an early release of a 1.5 JRE (1.5.2-1.5.4). The regexp classes are so broken in 1.5.6 that they can actually crash Hotspot and bring down the JVM. The behavior is not as expected in versions as recent as 1.5.9.

Java is most definitely an open standard! In fact, it is well on its way to being a fully open-source project. Where on earth did you get the notion that it was not open? Java started the trend!


Camel

To clarify: Sun's JRE has been fully open-sourced and distributed with the bigger JDK package since 1.5, and they're working on open-sourcing the JVM.


iago


Camel

I won't be a part of your semantic argument. Regexps are unsafe in Java; I made my point.


iago

It's not semantics, it's a very important distinction.

You can't expect non-Sun implementations of Java to behave the same as Sun's Java, be it for regular expressions, string functions, module loading, etc.

Camel

The discussion about whether Java is a standard or not is a semantic one, because it depends on the meaning of the word standard. You were referring to the approval of a large standards organization, while I didn't even mean to imply that it was a standard in the first place - I was just trying to say it's fully open.

You can't even expect Sun's implementations of Java to behave the same as Sun's Java! There's a reason they come out with a new patch every other week, and it isn't because they're adding new features. In general, non-Sun JVMs do a better job of implementing Sun's specification than Sun does, which generally has a lot to do with the fact that Sun doesn't QA the JRE as it's implemented.
