Strip HTML tags and extract a subset of a string from text using regular expressions in C#

Saturday, February 9, 2013

Today I am presenting a quick tip on how to strip HTML tags from text using a regular expression (with the Regex class) in C#. When we show a blurb or summary limited to a certain number of characters, we may first need to remove the tags from an HTML string (news details, article details, etc.). I have the following function in my Helper library for this very problem.

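    // Requires: using System.Text.RegularExpressions;

    /// <summary>
    /// Strip out html tags from text
    /// </summary>
    /// <param name="source">Source string</param>
    /// <returns>The source text with html tags removed</returns>
    public static string StripTagsFromHtml(string source)
    {
        // Lazily match everything from '<' to the nearest '>' and remove it
        return Regex.Replace(source, "<.*?>", string.Empty);
    }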
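
As a quick illustration of how the function behaves, here is a minimal, self-contained sketch. The `Helper` and `Program` class names and the sample markup are assumptions made for the example. Note that in .NET the `.` in the pattern does not match a newline by default, so a tag that is split across lines would be left in place by this pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container class; the post only says the method lives in a "Helper library".
public static class Helper
{
    public static string StripTagsFromHtml(string source)
    {
        // Same pattern as in the post: lazily match from '<' to the nearest '>'.
        return Regex.Replace(source, "<.*?>", string.Empty);
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample input invented for this example.
        string html = "<p>Hello, <b>world</b>!</p>";
        Console.WriteLine(Helper.StripTagsFromHtml(html)); // prints "Hello, world!"
    }
}
```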

To extract only the first few characters of the stripped text, we can extend the function with an overload as follows.

    /// <summary>
    /// Strip out html tags from text and return extract from it
    /// </summary>
    /// <param name="source">Source string</param>
    /// <param name="characterCount">Number of characters to extract</param>
    /// <returns>The stripped text, cut to at most characterCount characters</returns>
    public static string StripTagsFromHtml(string source, int characterCount)
    {
        string stripped = Regex.Replace(source, "<.*?>", string.Empty);
        if (stripped.Length <= characterCount)
            return stripped;
        else
            return stripped.Substring(0, characterCount);
    }
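
One thing to keep in mind when counting characters: stripping tags leaves HTML entities such as `&amp;` encoded, so each one counts as several characters. The sketch below shows one possible way to handle that by decoding with `System.Net.WebUtility.HtmlDecode` before cutting the text. The `BlurbHelper` class and `ExtractBlurb` method are hypothetical names for this example and are not part of the Helper library described above.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, shown only as a variation on the overload above.
public static class BlurbHelper
{
    public static string ExtractBlurb(string source, int characterCount)
    {
        // Remove tags first, using the same pattern as the post.
        string stripped = Regex.Replace(source, "<.*?>", string.Empty);

        // Decode entities so "&amp;" counts as one visible character.
        string decoded = WebUtility.HtmlDecode(stripped);

        return decoded.Length <= characterCount
            ? decoded
            : decoded.Substring(0, characterCount);
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample input invented for this example.
        string html = "<p>Tom &amp; Jerry <i>return</i></p>";
        Console.WriteLine(BlurbHelper.ExtractBlurb(html, 11)); // prints "Tom & Jerry"
    }
}
```

Because the result is decoded plain text, it should be HTML-encoded again if it is written back into a page.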

Happy programming!

dotNETspidor: A dot net programming blog
