Saturday, February 9, 2013

Strip html tags and extract subset of string from text using regular expression in c-sharp

Today I am presenting a quick tips on how to strip html from text using regular expression (with Regex class) in C#. In a scenario like presenting a blurb or summary of certain characters we may need to remove html tags from a html string (of news details, article details etc.). I have following function in my Helper library for the very problem.


    /// 
    /// Strip out html tags from text
    /// 
    /// Source string
    /// 
    public static string StripTagsFromHtml(string source)
    {
        return Regex.Replace(source, "<.*?>", string.Empty);
    }


To extract a number of characters from the source string, we can extend the function as following.

    /// 
    /// Strip out html tags from text and return extract from it
    /// 
    /// Source string
    /// Number of characters to extract
    /// 
    public static string StripTagsFromHtml(string source, int characterCount)
    {
        string stripped = Regex.Replace(source, "<.*?>", string.Empty);
        if (stripped.Length <= characterCount)
            return stripped;
        else
            return stripped.Substring(0, characterCount);
    }

Happy programming!

Shout it

0 comments:

Post a Comment

Hope you liked this post. You can leave your message or you can put your valuable suggestions on this post here. Thanks for the sharing and cooperation!

Popular Posts