Clan x86

Technical (Development, Security, etc.) => General Programming => Tutorials, References, and Examples => Topic started by: Sidoh on September 11, 2005, 10:52:44 PM

Title: [PHP] URL Regular Expression
Post by: Sidoh on September 11, 2005, 10:52:44 PM
Anyone have one that works?  I've found a few on the internet, but it's been so long since I've created my own regular expressions.

I'm looking for one that follows the syntax:

<protocol>://<subdomain>.<domain>.<com/net,etc>

Thanks in advance!

Edit --

Since no one who posted in this thread was able to come up with one, I was forced to do it on my own!  *sob*

So none of the rest of you have to suffer through such horrid events, I'll post my solution at the top of the thread:

$search[] = "^(((ht|f)tp(s?))\:\/\/)(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.([a-z])(\:[0-9]+)*((\/?))(([a-zA-Z0-9\.\,\;\?\'\\\=\/\_\-\#]+)?)^";
Title: Re: [PHP] URL Regular Expression
Post by: deadly7 on September 11, 2005, 10:59:13 PM
One word: Huh?
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 11, 2005, 11:03:41 PM
This is the one I found:

/^(((ht|f)tp(s?))\:\/\/)?(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&%\$#\=~_\-]+))*$/

I'm not too sure what I need to begin or terminate that string with, but it doesn't work with preg_replace.
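
A likely reason it fails: the pattern uses / as its delimiter but also contains an unescaped / near the end, so PHP ends the pattern there and tries to read the rest as modifiers. The ^ and $ anchors also mean it can only match a string that is nothing but a URL. A minimal sketch of both effects, using a shortened, hypothetical pattern (trimmed TLD list and path class, not the full one above):

```php
<?php
// Shortened, hypothetical stand-in for the pattern above.
$body = '((ht|f)tps?:\/\/)?[a-zA-Z0-9\-\.]+\.(com|net|org)(/[a-zA-Z0-9\.\-_]*)*';

// 1) "/" as delimiter with an unescaped "/" inside the pattern: it does not
//    compile. The @ only silences the warning; preg_match() returns false.
$broken = @preg_match('/^' . $body . '$/', 'http://www.google.com/');

// 2) Same body with "!" as delimiter: it compiles, but ^...$ only matches
//    when the subject is a URL and nothing else.
$wholeString = preg_match('!^' . $body . '$!', 'http://www.google.com/');
$insideText  = preg_match('!^' . $body . '$!', 'Visit http://www.google.com/ today');

var_dump($broken, $wholeString, $insideText);
```

So for replacing links inside a longer message, the anchors would have to go and the delimiter would have to be a character that is escaped or absent in the pattern.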
Title: Re: [PHP] URL Regular Expression
Post by: Quik on September 11, 2005, 11:04:45 PM
Maybe you could include some information about what you need this for.
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 11, 2005, 11:31:50 PM
Like this:

http://www.google.com

See how SMF automatically generates an anchor tag because it recognizes it as a link?  That's what I want.
Title: Re: [PHP] URL Regular Expression
Post by: Quik on September 11, 2005, 11:39:15 PM
If 'http://' is present, it creates <a href=" ...
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 11, 2005, 11:50:22 PM
It's more elaborate than that.

ftp://www.something.com

And I would probably settle for that, but I'm not sure how to translate it into a regular expression, which is why I'm creating this post.
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 12, 2005, 05:29:59 PM
Someone's got to have one... :(
Title: Re: [PHP] URL Regular Expression
Post by: Blaze on September 12, 2005, 06:57:00 PM
I'd search for www, http and ://.
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 12, 2005, 07:02:48 PM
Quote from: Blaze on September 12, 2005, 06:57:00 PM
I'd search for www, http and ://.

I know that... :P

I want it translated into a Regular Expression.  That's why I put regex in the title!  :D
Title: Re: [PHP] URL Regular Expression
Post by: Ryan Marcus on September 13, 2005, 04:39:19 PM
It's not all that hard.

Method 1: Split the message into an array separated by spaces, one element per word. Then use parse_url. Slow and clunky.
Method 2: Use strpos to search for "http", "://", ".com", ".net" and ".org". Use a buffer-type method.
Method 3: Use a good WYSIWYG web-based editor, like the one in Exponent (http://www.exponentcms.org/).

Hope I helped...
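
Method 1 could look roughly like the sketch below. The function name linkify_words and the list of accepted schemes are assumptions made for illustration; only explode and parse_url come from the description above.

```php
<?php
// Hypothetical helper: wraps space-separated words that parse as
// http/https/ftp URLs in an anchor tag and leaves other words untouched.
function linkify_words($text) {
    $out = array();
    foreach (explode(' ', $text) as $word) {
        $parts = parse_url($word); // false on seriously malformed input
        $isUrl = is_array($parts)
            && isset($parts['scheme'], $parts['host'])
            && in_array(strtolower($parts['scheme']), array('http', 'https', 'ftp'), true);
        if ($isUrl) {
            // Escape the word before placing it inside HTML.
            $safe = htmlspecialchars($word, ENT_QUOTES);
            $out[] = '<a href="' . $safe . '">' . $safe . '</a>';
        } else {
            $out[] = $word;
        }
    }
    return implode(' ', $out);
}

echo linkify_words('See http://www.google.com or ftp://www.something.com now');
```

Because it splits on single spaces only, a URL followed by a line break or punctuation is treated as part of the same word.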

Title: Re: [PHP] URL Regular Expression
Post by: Blaze on September 13, 2005, 04:40:33 PM
How about you combine method2 and 1...

Search for it, then use that function.
Title: Re: [PHP] URL Regular Expression
Post by: Ryan Marcus on September 13, 2005, 04:41:43 PM
Why bother? Once you know it's a URL, you don't need to parse it. Just add the <a> tag.
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 13, 2005, 04:49:51 PM
Quote from: Ryan Marcus on September 13, 2005, 04:39:19 PM
It's not all that hard.

Method 1: Split the message into an array separated by spaces, one element per word. Then use parse_url. Slow and clunky.
Method 2: Use strpos to search for "http", "://", ".com", ".net" and ".org". Use a buffer-type method.
Method 3: Use a good WYSIWYG web-based editor, like the one in Exponent (http://www.exponentcms.org/).

Hope I helped...


You're thinking of too abstract a method.

I want a regular expression that defines a URL.  I posted one, but there's obviously something incorrect about it--it doesn't work.

Regular Expressions are vastly more efficient than anything you posted.  There's no reason for me to use a WYSIWYG editor.  I'm developing a function that will search for and mark up links in dynamic content.  I just need a regular expression that accurately defines a URL.  After this, I'd just use preg_replace to replace every URL with the same URL wrapped in an anchor tag.

It's been a long time since I've worked with regular expressions much, and I was hoping there was someone here who's more fresh with them than I.
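
The plan described here can be sketched as follows. The pattern is a deliberately loose, hypothetical placeholder (a scheme, "://", then anything that is not whitespace, an angle bracket or a double quote); it is not the strict URL definition being asked for.

```php
<?php
// Deliberately loose, hypothetical pattern; "#" is the delimiter.
$pattern = '#(?:https?|ftp)://[^\s<>"]+#i';

$message = 'Docs at http://www.google.com and files at ftp://www.something.com here';

// $0 in the replacement stands for the whole match, i.e. the URL itself.
$message = preg_replace($pattern, '<a href="$0">$0</a>', $message);

echo $message;
```

Unlike preg_match, preg_replace rewrites every non-overlapping match in the subject, so one call handles all links in the message.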
Title: Re: [PHP] URL Regular Expression
Post by: Joe on September 13, 2005, 10:53:25 PM
for ($i = 0; $i < strlen($data); $i++) {
  if (substr($data, $i, 7) == "http://") {
    // url starts here
  } elseif (substr($data, $i, 6) == "ftp://") {
    // omfg another url!
  } elseif (false /* some other protocol check goes here */) {
    // you get the drift
  } else {
    // LOL NOTHING
  }
}


EDIT -
I had my less than sign backwards, as usual. -sigh-
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 13, 2005, 11:04:19 PM
Inefficient.



$message = preg_replace("<REGULAR EXPRESSION THAT WORKS>", '<a href="\1">\1</a>', $message);



SO PWNS that.  I just need a working regular expression.  I'm shocked none of you have worked with them.  O_o

Well, not so much you, Joe.  You're a VB person =p
Title: Re: [PHP] URL Regular Expression
Post by: Joe on September 13, 2005, 11:56:54 PM
<?php
  echo findurl("ftp://www.test.org str1 http://www.test.com str2 ftp://www.x86labs.org str3");

  function findurl($data) {
    $ret = "";
    $words = explode(" ", $data);
    foreach($words as $word) {
      if (substr($word, 0, 7) == "http://") {
        $ret = $ret . '<a href="' . $word . '">' . $word . "</a> ";
      }elseif (substr($word, 0, 6) == "ftp://") {
        $ret = $ret . '<a href="' . $word . '">' . $word . "</a> ";
      }else{
        $ret = $ret . $word . " ";
      }
    }
    return $ret;
  }
?>

*backs away while it still works*

http://www.javaop.com/~joe/url.php
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 14, 2005, 12:04:40 AM
Thanks, but I'm going to stick to finding a working regular expression.  :P
Title: Re: [PHP] URL Regular Expression
Post by: Joe on September 14, 2005, 12:23:30 AM
A WORKING? WTF U SMOKIN NGR?
Title: Re: [PHP] URL Regular Expression
Post by: Mythix on September 14, 2005, 02:45:52 PM
pft we all know joe hardcoded those links.
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 14, 2005, 04:32:26 PM
<?php
$seach = array();
$replace = array();

$search[] = "^(((ht|f)tp(s?))\:\/\/)(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.([a-z])(\:[0-9]+)*((\/?))(([a-zA-Z0-9\.\,\;\?\'\\\=\/\_\-\#]+)?)^";
$replace[] = '<a href="\0" target="_blank">\0</a>';

$string = "Hello.  This is a test string.  http://www.google.com  <br /><br />I hope this serves as proof that regular expressions > all.  http://sidoh.no-ip.org/reg-ex.php FTW. <br /><br /> Visit http://www.x86labs.org/forum/index.php?topic=2790.msg27101#msg27101 for more information!";
$string = preg_replace($search, $replace, $string);

echo $string;
?>


http://sidoh.no-ip.org/reg-ex.php

Owned.
Title: Re: [PHP] URL Regular Expression
Post by: Joe on September 14, 2005, 05:07:38 PM
You should be shot? =p
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 15, 2005, 12:55:27 PM
Quote from: Joe[e2] on September 14, 2005, 05:07:38 PM
You should be shot? =p

Regex > You
Title: Re: [PHP] URL Regular Expression
Post by: deadly7 on September 15, 2005, 06:11:58 PM
Sidoh, do you always mess up your code?
Quote from: Sidoh on September 14, 2005, 04:32:26 PM
<?php
[b]$seach = array();[/b]
$replace = array();

[b]$search[] = [/b]"^(((ht|f)tp(s?))\:\/\/)(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.([a-z])(\:[0-9]+)*((\/?))(([a-zA-Z0-9\.\,\;\?\'\\\=\/\_\-\#]+)?)^";
$replace[] = '<a href="\0" target="_blank">\0</a>';

$string = "Hello. This is a test string. http://www.google.com <br /><br />I hope this serves as proof that regular expressions > all. http://sidoh.no-ip.org/reg-ex.php FTW. <br /><br /> Visit http://www.x86labs.org/forum/index.php?topic=2790.msg27101#msg27101 for more information!";
$string = preg_replace($search, $replace, $string);

echo $string;
?>


http://sidoh.no-ip.org/reg-ex.php

Owned.
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 15, 2005, 06:16:32 PM
Know that [ php] is a syntax highlighted tag, so don't expect the bold tag will work in the future.

Additionally, since PHP contains the ability to define variables implicitly, I didn't mess up my code.  It works fine, there's just memory allocated for an array that isn't used.  :P
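
A small sketch of the behaviour being described, with made-up string values: appending with [] to a variable that does not exist yet creates it as an array, so the misspelled declaration is simply left unused.

```php
<?php
$seach = array();   // the misspelled variable: created here, never used again

// $search does not exist yet; the [] append creates it as an array.
$search[] = 'first pattern';
$search[] = 'second pattern';

var_dump(is_array($search), count($search), count($seach));
```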
Title: Re: [PHP] URL Regular Expression
Post by: Joe on September 15, 2005, 08:03:05 PM
echo(findurl(""Hello. This is a test string. http://www.google.com <br /><br />I hope this serves as proof that regular expressions > all. http://sidoh.no-ip.org/reg-ex.php FTW. <br /><br /> Visit http://www.x86labs.org/forum/index.php?topic=2790.msg27101#msg27101 for more information!");
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 15, 2005, 08:17:50 PM
Quote from: Joe[e2] on September 15, 2005, 08:03:05 PM
echo(findurl(""Hello. This is a test string. http://www.google.com <br /><br />I hope this serves as proof that regular expressions > all. http://sidoh.no-ip.org/reg-ex.php FTW. <br /><br /> Visit http://www.x86labs.org/forum/index.php?topic=2790.msg27101#msg27101 for more information!");

o_O
Title: Re: [PHP] URL Regular Expression
Post by: Joe on September 15, 2005, 08:58:53 PM
You can fix it.
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 15, 2005, 09:02:22 PM
Quote from: Joe[e2] on September 15, 2005, 08:58:53 PM
You can fix it.

hahaha.
Title: Re: [PHP] URL Regular Expression
Post by: Blaze on September 15, 2005, 09:13:06 PM

<?php

echo(findurl("Hello. This is a test string. http://www.google.com <br /><br />I hope this serves as proof that regular expressions > all. http://sidoh.no-ip.org/reg-ex.php FTW. <br /><br /> Visit http://www.x86labs.org/forum/index.php?topic=2790.msg27101#msg27101 for more information!"));

?>


Like that?
Title: Re: [PHP] URL Regular Expression
Post by: deadly7 on September 15, 2005, 09:13:45 PM
Quote from: Sidoh on September 15, 2005, 06:16:32 PM
Know that [ php] is a syntax highlighted tag, so don't expect the bold tag will work in the future.

Additionally, since PHP contains the ability to define variables implicitly, I didn't mess up my code.  It works fine, there's just memory allocated for an array that isn't used.  :P
Which is stupid. :)
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 15, 2005, 09:16:01 PM
Quote from: Blaze on September 15, 2005, 09:13:06 PM

<?php

echo(findurl("Hello. This is a test string. http://www.google.com <br /><br />I hope this serves as proof that regular expressions > all. http://sidoh.no-ip.org/reg-ex.php FTW. <br /><br /> Visit http://www.x86labs.org/forum/index.php?topic=2790.msg27101#msg27101 for more information!"));

?>


Like that?

/golfclap  :P
Title: Re: [PHP] URL Regular Expression
Post by: Blaze on September 15, 2005, 11:20:58 PM
I don't get it...  :-[
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on September 15, 2005, 11:38:23 PM
Quote from: deadly7 on September 15, 2005, 09:13:45 PM
Quote from: Sidoh on September 15, 2005, 06:16:32 PM
Know that [ php] is a syntax highlighted tag, so don't expect the bold tag will work in the future.

Additionally, since PHP contains the ability to define variables implicitly, I didn't mess up my code.  It works fine, there's just memory allocated for an array that isn't used.  :P
Which is stupid. :)

Blame PHP, not me.
Title: Re: [PHP] URL Regular Expression
Post by: Sidoh on October 03, 2005, 09:33:01 PM
Okay, this is really pretty frustrating.  I should have realized it earlier.  My bbcode function replaces [link ] tags, which also follow the format of this regular expression.  When it replaces one with the HTML <a> tag, the actual URL contained in it (which follows the pattern of the regular expression) gets replaced again by the URL regular expression.  It's really annoying, and I found a way around it, but it's not ideal.  I think I'm going to have to resort to using some search/buffer/replace type method like was originally suggested.  Even though this is a lot slower and less efficient, it's really the only methodology I can foresee working in every case.

Here's the regular expression I used (it works great!):

$s[] = "^[\s]+(((ht|f)tp(s?))\:\/\/)(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.([a-z])(\:[0-9]+)*((\/?))(([a-zA-Z0-9\.\,\;\?\'\\\=\/\_\-\#\%\&]+)?)^";
$r[] = '<a href="\0" target="_blank">\0</a>';


Notice the "[\s]+", which requires some amount of whitespace (space, line break, tab, etc.) immediately before the URL.  This works in most cases, but what if the link is at the beginning of the text or IS the entire text?  Then there's no leading whitespace, so it doesn't match.
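
One possible way to cover the start-of-text case is a negative lookbehind in place of the leading whitespace. This is only a sketch: the URL part is a simplified, hypothetical stand-in for the pattern above, and "~" is used as the delimiter so that ^ keeps its usual meaning.

```php
<?php
// (?<!\S) is a zero-width check: the previous character, if there is one,
// must not be a non-whitespace character. It holds at the very start of the
// text and after any whitespace, and it consumes nothing, so \0 is the URL
// alone. The URL part is a simplified, hypothetical stand-in.
$s = '~(?<!\S)(?:https?|ftp)://[^\s<>"\]]+~i';
$r = '<a href="\0" target="_blank">\0</a>';

echo preg_replace($s, $r, 'http://www.google.com'), "\n";          // link is the entire text
echo preg_replace($s, $r, 'http://www.google.com is first'), "\n"; // link starts the text
echo preg_replace($s, $r, 'see http://www.google.com now'), "\n";  // link after whitespace
```

With "[\s]+" the matched whitespace also becomes part of \0 and therefore part of the href; the lookbehind avoids that because it matches no characters.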

Without it, though, the following regular expression causes issues because of the nature it represents:

$s[] = "#\[(link|url)=(((ht|f)tp(s?))\:\/\/)(.*?)](.*?)\[/link\]#si";
$r[] = '<a href="\2\6" target="_blank">\7</a>';

This is a [ link ] [ /link ] tag.  It works fine in isolation, but after the URL regular expression runs through preg_replace, the tag ends up looking like this:

[link=<a href="<URL>"><URL></a>]<text>[/link]

There's one possible solution I can think of that still retains the regular expression methodology, but I haven't gotten it to work.  That is: match this string only if it "DISCLUDES" another string, i.e. it can't begin with url= or link=, which would eliminate my problem with this.
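
PCRE has a construct for exactly this kind of exclusion: a negative lookbehind, written (?<!...). A minimal sketch, assuming a simplified, hypothetical URL body in place of the full pattern; the [link=...] pattern is the one from this post, unchanged.

```php
<?php
$s = array();
$r = array();

// URL pattern: (?<!url=|link=) rejects a match that directly follows "url="
// or "link=", and (?<!\]) rejects one that directly follows "]".
// The URL body is a simplified, hypothetical stand-in.
$s[] = '~(?<!url=|link=)(?<!\])(?:https?|ftp)://[^\s<>"\[\]]+~i';
$r[] = '<a href="\0" target="_blank">\0</a>';

// The [link=...] pattern from this post, unchanged.
$s[] = "#\[(link|url)=(((ht|f)tp(s?))\:\/\/)(.*?)](.*?)\[/link\]#si";
$r[] = '<a href="\2\6" target="_blank">\7</a>';

$text = 'http://www.x86labs.org and [link=http://www.google.com]Google[/link]';

// With arrays, preg_replace applies each pattern in order to the result of
// the previous one.
echo preg_replace($s, $r, $text);
```

The bare URL gets its anchor from the first pattern, while the URL inside the tag is skipped and left for the second pattern. Lookbehinds have to be fixed-length in PCRE, which is why the two alternatives are spelled out instead of using something like ".*=".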

I'm going to try to mess around with a bit of search/buffer methodologies of URL replacing.  I'll post with updates.