Friday, June 15, 2012

HTMLAgilityPack


This post describes how to set up and use HtmlAgilityPack.
HtmlAgilityPack is one of the best open-source projects I have worked with. It is an HTML parser for .NET applications that performs well and tolerates malformed HTML.
HtmlAgilityPack is used to retrieve values from the HTML of a web page. Its main uses are website crawling and screen scraping.

Download and build the HTMLAgilityPack solution.

The sample code is below.


Get the value of a hidden input

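// Requires: using HtmlAgilityPack; url is the address of the page to load.
var webGet = new HtmlWeb();
var document = webGet.Load(url);
// SelectSingleNode returns null when nothing matches, so check before reading the attribute.
var node = document.DocumentNode.SelectSingleNode("//input[@type='hidden' and @name='mob']");
var value = node != null ? node.GetAttributeValue("value", "") : null;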
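
To try the XPath without hitting a live site, the same lookup can be run against an HTML string. This is only a sketch: the class name HiddenFieldReader, the method ReadHidden and the sample markup are my own placeholders, not part of HtmlAgilityPack.

```csharp
using HtmlAgilityPack;

public static class HiddenFieldReader
{
    // Returns the value attribute of the hidden input with the given name,
    // or null when the page has no such input.
    // Note: name is pasted into the XPath as-is, so it must not contain a quote.
    public static string ReadHidden(string html, string name)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html); // parse from a string instead of downloading

        var node = document.DocumentNode.SelectSingleNode(
            "//input[@type='hidden' and @name='" + name + "']");

        // GetAttributeValue falls back to "" if the input has no value attribute.
        return node == null ? null : node.GetAttributeValue("value", "");
    }
}
```

Calling HiddenFieldReader.ReadHidden(html, "mob") on markup such as <div><input type='hidden' name='mob' value='12345'></div> should give back the field's value, and a name that is not on the page should give null instead of an exception.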


Get the text of matching td cells

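var webGet = new HtmlWeb();
var document = webGet.Load(url);

// SelectNodes returns null (not an empty collection) when nothing matches.
var cells = document.DocumentNode.SelectNodes("//td[@class='textnopad']");
if (cells != null)
{
    foreach (HtmlNode td in cells)
    {
        // Keep only the cells whose text contains "91".
        if (td.InnerText.Contains("91"))
        {
            string mob = td.InnerText;
        }
    }
}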

Get the href value of anchor tags

var webGet = new HtmlWeb();
var document = webGet.Load(url);
// Requires: using System.Linq; id is the text the href must contain.
// The result is a sequence of href strings.
var linksOnPage = from lnk in document.DocumentNode.Descendants("a")
                  where lnk.Attributes["href"] != null &&
                        lnk.Attributes["href"].Value.Contains(id)
                  select lnk.Attributes["href"].Value;
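
The same filtering can also be done by letting XPath pick out the anchors that have an href and then filtering the values with LINQ. The sketch below is one way to do it, not the only one. The names LinkFinder and FindLinks are hypothetical, and it parses an HTML string so it can be tested offline.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinkFinder
{
    // Returns the href of every anchor whose href contains the given fragment.
    public static List<string> FindLinks(string html, string fragment)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so fall back to an empty list.
        var anchors = document.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return new List<string>();
        }

        return anchors
            .Select(a => a.GetAttributeValue("href", ""))
            .Where(href => href.Contains(fragment))
            .ToList();
    }
}
```

Returning an empty list instead of null means the caller can loop over the result without an extra check.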

Get the text of a div by id

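var webGet = new HtmlWeb();
var document = webGet.Load(url);
// SelectNodes returns null when no div matches, so guard the loop.
var boxes = document.DocumentNode.SelectNodes("//div[@id='cssBox']");
if (boxes != null)
{
    foreach (HtmlNode div in boxes)
    {
        string mob = div.InnerText;
    }
}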
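
InnerText can come back with surrounding whitespace and HTML entities such as &amp;amp; still in it, so it is usually worth cleaning before use. Here is a small sketch of that, assuming a single div is wanted. DivTextReader and ReadDivText are names I made up for the example, while HtmlEntity.DeEntitize is the library's own helper.

```csharp
using HtmlAgilityPack;

public static class DivTextReader
{
    // Returns the trimmed, entity-decoded text of the div with the given id,
    // or null when the page has no such div.
    public static string ReadDivText(string html, string id)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var div = document.DocumentNode.SelectSingleNode("//div[@id='" + id + "']");
        if (div == null)
        {
            return null;
        }

        // Decode entities like &amp; and strip leading/trailing whitespace.
        return HtmlEntity.DeEntitize(div.InnerText).Trim();
    }
}
```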


