A common requirement, especially when screen scraping, is to find strings contained between HTML tags or other strings. Regular expressions provide a powerful way to do so, but they can be intimidating for a beginner programmer. Here is an alternative that finds a string between two other strings using simple string manipulation in C#, without regular expressions:
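Usage:
string myString = "<span>Joe Smith</span>";
// the last two arguments say whether to keep the begin and end strings in the result
string[] result = GetStringInBetween("<span>", "</span>", myString, false, false);
string output = result[0]; // "Joe Smith"
string next = result[1];   // whatever follows "</span>" (empty here)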
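GetStringInBetween finds the first occurrence of the “begin” and “end” strings and returns the text between them in result[0]. result[1] holds the rest of the source after the “end” string, so you can pass it back in to move ahead through the HTML document and find the next value.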
Here is the GetStringInBetween implementation:
public static string[] GetStringInBetween(string strBegin, string strEnd,
    string strSource, bool includeBegin, bool includeEnd)
{
    // result[0] = the match, result[1] = the rest of the source after the match
    string[] result = { "", "" };
    int iIndexOfBegin = strSource.IndexOf(strBegin);
    if (iIndexOfBegin != -1)
    {
        // include the Begin string if desired
        if (includeBegin)
            iIndexOfBegin -= strBegin.Length;
        strSource = strSource.Substring(iIndexOfBegin + strBegin.Length);
        // look for the End string only after the Begin string
        int iEnd = strSource.IndexOf(strEnd, includeBegin ? strBegin.Length : 0);
        if (iEnd != -1)
        {
            int iAfterEnd = iEnd + strEnd.Length;
            // include the End string if desired
            result[0] = strSource.Substring(0, includeEnd ? iAfterEnd : iEnd);
            // advance beyond this segment
            if (iAfterEnd < strSource.Length)
                result[1] = strSource.Substring(iAfterEnd);
        }
    }
    else
    {
        // stay where we are
        result[1] = strSource;
    }
    return result;
}
Notice you can choose to include or exclude the beginning and ending search strings in the result.
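To make the "move ahead" idea and the two flags concrete, here is a minimal, self-contained sketch that keeps feeding the remainder back into the helper until nothing more is found. The names BetweenDemo, Between and AllBetween are hypothetical and only used for illustration. Between is a condensed stand-in for GetStringInBetween with the same parameters and the same two-element result. The sketch also assumes ordinal (culture-insensitive) comparison and skips empty values, because an empty result[0] cannot be told apart from "not found".

```csharp
using System;
using System.Collections.Generic;

public static class BetweenDemo
{
    // Condensed stand-in for GetStringInBetween (hypothetical name).
    // result[0] = the match, result[1] = the source after the end marker.
    public static string[] Between(string begin, string end, string source,
                                   bool includeBegin, bool includeEnd)
    {
        string[] result = { "", "" };
        int start = source.IndexOf(begin, StringComparison.Ordinal);
        if (start == -1)
        {
            result[1] = source; // begin not found: stay where we are
            return result;
        }
        int contentStart = start + begin.Length;
        int endIndex = source.IndexOf(end, contentStart, StringComparison.Ordinal);
        if (endIndex == -1)
            return result;      // begin without end: no match, nothing left
        int afterEnd = endIndex + end.Length;
        int from = includeBegin ? start : contentStart;
        int to = includeEnd ? afterEnd : endIndex;
        result[0] = source.Substring(from, to - from);
        result[1] = source.Substring(afterEnd);
        return result;
    }

    // Collects every non-empty value by passing result[1] back in as the source.
    public static List<string> AllBetween(string begin, string end, string source,
                                          bool includeBegin, bool includeEnd)
    {
        var values = new List<string>();
        string rest = source;
        while (rest.Length > 0)
        {
            string[] r = Between(begin, end, rest, includeBegin, includeEnd);
            if (r[0].Length > 0)
                values.Add(r[0]);
            if (r[1] == rest)
                break;          // no progress: the begin marker was not found
            rest = r[1];
        }
        return values;
    }

    public static void Main()
    {
        string html = "<span>Joe Smith</span><span>Ann Lee</span>";

        // Markers excluded: prints "Joe Smith" and then "Ann Lee"
        foreach (string v in AllBetween("<span>", "</span>", html, false, false))
            Console.WriteLine(v);

        // Markers included: prints "<span>Joe Smith</span>" and then "<span>Ann Lee</span>"
        foreach (string v in AllBetween("<span>", "</span>", html, true, true))
            Console.WriteLine(v);
    }
}
```

Copying the remainder into a new string on every pass is fine for small pages. For very large documents, tracking a start index with IndexOf(value, startIndex) would avoid the repeated copies.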
It's a handy utility that I've been using a lot in some screen scraping projects I've done lately.