Alternative way to string.Split using Regular Expressions

The other day I was working on a YouTube auto-uploader for one of my clients, and I needed to keep single-character tags from being passed along to the uploader engine. If you are familiar with the rules of the tag field when uploading videos to YouTube, you know that tags must be 2 or more characters long.

In my YouTube auto video uploader, tags were generated automatically from file names and other criteria such as MP3 tag data. To make a long story short, there were cases where single-character tags slipped through to the uploader engine and caused my uploads to fail.

So to give an example, consider this YouTube tag string:

3,doors,down,its,not,my,time,Its,Not,My,T,ime,CDM,hh,h

In this string, I needed to eliminate "3", "T" and "h", since they are single-character tags, by merging each one into a neighboring tag: the next tag where there is one, or the previous tag when the single character comes last. So I needed to transform the above string to:

3doors,down,its,not,my,time,Its,Not,My,Time,CDM,hhh

My first step was to write code to find single-character strings. I could of course use string.Split with the comma as the separator, then loop through all the strings and pick out the single-character ones, and so on, but I didn't really feel like writing too much code, so I decided to go the RegEx way. Using the following regular expression, I was able to find tags that were 1 character long:

(?<=(\A|,))(?<val>(\w|\W){1})(?=(,|\Z))

Isn't it pretty? :-)
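To make the pattern a little less mysterious, here is a minimal sketch that just lists what it matches. The class and method names (SingleCharTagDemo, FindSingleCharTags) are my own placeholders for illustration, not part of the uploader. The lookbehind requires the start of the string or a comma before the character, and the lookahead requires a comma or the end of the string after it, so only tags that are exactly one character long are captured in the "val" group.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, used only to illustrate the pattern above.
public static class SingleCharTagDemo
{
    // One character of any kind, bounded on both sides by a comma
    // or by the start/end of the string.
    private const string Pattern = @"(?<=(\A|,))(?<val>(\w|\W){1})(?=(,|\Z))";

    // Returns the value of every single-character tag in tagLine, in order.
    public static string[] FindSingleCharTags(string tagLine)
    {
        MatchCollection matches = Regex.Matches(tagLine, Pattern);
        string[] found = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            found[i] = matches[i].Groups["val"].Value;
        return found;
    }
}
```

Calling SingleCharTagDemo.FindSingleCharTags with the example tag string should return the three tags "3", "T" and "h".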

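My next step was to combine these single-character tags with the neighboring next (or previous) tag in the comma-separated tag string, so I wrote this code (note that it uses the narrower \w in place of (\w|\W), so only letters, digits and underscores count as single-character tags):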

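// Requires: using System.Text.RegularExpressions;
string tagLine = "3,doors,down,its,not,my,time,Its,Not,My,T,ime,CDM,hh,h";

// Find every tag that is exactly one word character long.
MatchCollection matches = Regex.Matches(tagLine,
    @"(?<=(\A|,))(?<val>\w{1})(?=(,|\Z))",
    RegexOptions.IgnoreCase);

// Merge each interior single-character tag into the tag that follows it
// by dropping the comma after it.
foreach (Match match in matches)
    tagLine = tagLine.Replace("," + match.Groups["val"].Value + ",",
                              "," + match.Groups["val"].Value);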

The above code takes care of single-character tags that sit in the middle of the string, but not of the two edge cases where one occurs at the very start and/or the very end of the tagLine string, so I needed to add the following code to cover them:

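// A single-character first tag: drop the comma after it.
// The length checks avoid an out-of-range index on very short strings.
if (tagLine.Length >= 2 && tagLine[1] == ',')
    tagLine = tagLine.Remove(1, 1);
// A single-character last tag: drop the comma before it.
if (tagLine.Length >= 2 && tagLine[tagLine.Length - 2] == ',')
    tagLine = tagLine.Remove(tagLine.Length - 2, 1);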

Now my tagLine string is ready for uploading to YouTube :-)
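For reuse, the two steps above could be wrapped in a single helper. This is only a sketch under my own assumptions: the names TagCleaner and MergeSingleCharTags are hypothetical, and the null/empty check and the length guards are additions so that very short inputs do not throw.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not taken from the uploader itself.
public static class TagCleaner
{
    // Merges single-character tags into a neighboring tag:
    // the next tag when there is one, otherwise the previous tag.
    public static string MergeSingleCharTags(string tagLine)
    {
        // Assumption: null or empty input is returned unchanged.
        if (string.IsNullOrEmpty(tagLine))
            return tagLine;

        MatchCollection matches = Regex.Matches(tagLine,
            @"(?<=(\A|,))(?<val>\w{1})(?=(,|\Z))");

        // Interior single-character tags: drop the comma that follows them.
        foreach (Match match in matches)
        {
            string tag = match.Groups["val"].Value;
            tagLine = tagLine.Replace("," + tag + ",", "," + tag);
        }

        // Single-character first tag: drop the comma after it.
        if (tagLine.Length >= 2 && tagLine[1] == ',')
            tagLine = tagLine.Remove(1, 1);

        // Single-character last tag: drop the comma before it.
        if (tagLine.Length >= 2 && tagLine[tagLine.Length - 2] == ',')
            tagLine = tagLine.Remove(tagLine.Length - 2, 1);

        return tagLine;
    }
}
```

With the example tag string from earlier, TagCleaner.MergeSingleCharTags should produce 3doors,down,its,not,my,time,Its,Not,My,Time,CDM,hhh. A tag line that consists of one single-character tag and nothing else is returned as it is, since there is no neighbor to merge it into, so a caller would still have to handle that case.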

The uploader engine itself however is another story... contact me if you need a copy of it :-)

Copyright © 2007 Yousef Mannaa. All material on this site is copyrighted.
Do not publish or reproduce any of this material without written permission from the Author