Classic ASP
Must-Match Patterns
You may restrict the spider to follow only links that match at least one of a set of "must-match" wildcard patterns. The AddMustMatchPattern method can be called repeatedly to add more must-match patterns.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<%
success = 0
set spider = Server.CreateObject("Chilkat.Spider")
' --------------------------------------------------------------------
' Note: The URLs in this example are no longer valid.
' You should replace the URLs with URLs from a site of your
' own choosing -- preferably your own site if testing.
' (Google's Directory no longer exists.)
' --------------------------------------------------------------------
' First, we'll get the outbound links for a page in the
' Google directory. Then we'll add some must-match and avoid
' patterns and re-fetch the page to see the effect.
spider.Initialize "directory.google.com"
spider.AddUnspidered "http://directory.google.com/Top/Recreation/Outdoors/Hiking/Backpacking/"
success = spider.CrawlNext()
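' A minimal sketch of an error check, assuming the standard Chilkat
' NumUnspidered and LastErrorText properties. CrawlNext returns 1 on
' success and 0 otherwise, and the URLs in this example are no longer
' valid, so it may help to report why the crawl did not succeed before
' reading the outbound links.
If (success <> 1) Then
    If (spider.NumUnspidered = 0) Then
        ' Nothing is left in the queue of URLs to crawl.
        Response.Write "<pre>No more URLs to spider</pre>"
    Else
        ' The fetch itself failed; show the Chilkat error details.
        Response.Write "<pre>" & Server.HTMLEncode( spider.LastErrorText) & "</pre>"
    End If
End If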
' Display the outbound links
For i = 0 To spider.NumOutboundLinks - 1
    Response.Write "<pre>" & Server.HTMLEncode( spider.GetOutboundLink(i)) & "</pre>"
Next
' The output:
' http://www.backpacker.com
' http://www.cmc.org
' http://www.backpacking.net
' http://www.thebackpacker.com/
' http://www.rei.com/online/store/LearnShareArticlesList?categoryId=Camping
' http://www.trailspace.com/
' http://www.catskillhikes.com/
' http://gorp.away.com/gorp/location/asia/nepal/favpicks.htm
' http://www.backpackinglight.com/cgi-bin/backpackinglight/index.html
' http://www.yetizone.com/
' http://www.backpackingfun.com
' http://www.freezerbagcooking.com/
' http://www.spadout.com/backpacking/
' http://sierrabackpacker.com
' http://www.abovecalifornia.com/
' http://www.personal.psu.edu/faculty/r/p/rpc1/bbb/
' http://www.thebackpackersguide.com
' http://www.journeywest.com/WB/index.html
' http://www.johann-sandra.com/backpackdir.htm
' http://www.geocities.com/amytys/
' http://www.cloudwalkersbasecamp.com
' http://www.netbackpacking.com
' http://members.tripod.com/~stooges/
' http://www.thebackpackingsite.com
' http://www.thruhikers.com/
' http://www.redcompservices.com/AT/
' http://members.aol.com/CMorHiker/backpack
' http://mywebpages.comcast.net/midwestpacker/
' http://www.midwesthiker.com/
' http://www.WeBackpack.com
' http://www.michiganhiker.com
' http://www.host33.com/backpack/
' http://www.wilderness-backpacking.com
' http://www.thetravelmonkey.net
' http://dmoz.org/cgi-bin/add.cgi?where=Recreation/Outdoors/Hiking/Backpacking
' http://dmoz.org/about.html
' http://dmoz.org/cgi-bin/apply.cgi?where=Recreation/Outdoors/Hiking/Backpacking
' http://dmoz.org
' http://dmoz.org/profiles/cdog.html
' http://dmoz.org/profiles/justinwp.html
' Do it again, but this time with must-match and avoid patterns.
spider.Initialize "directory.google.com"
spider.AddUnspidered "http://directory.google.com/Top/Recreation/Outdoors/Hiking/Backpacking/"
' Add some must-match patterns:
spider.AddMustMatchPattern "*.com/*"
spider.AddMustMatchPattern "*.net/*"
' Add some avoid-patterns:
spider.AddAvoidOutboundLinkPattern "*.mypages.*"
spider.AddAvoidOutboundLinkPattern "*.personal.*"
spider.AddAvoidOutboundLinkPattern "*.comcast.*"
spider.AddAvoidOutboundLinkPattern "*.aol.*"
spider.AddAvoidOutboundLinkPattern "*~*"
success = spider.CrawlNext()
Response.Write "<pre>" & Server.HTMLEncode( "-----------------------") & "</pre>"
' Display the outbound links
For i = 0 To spider.NumOutboundLinks - 1
    Response.Write "<pre>" & Server.HTMLEncode( spider.GetOutboundLink(i)) & "</pre>"
Next
' Output:
' http://www.thebackpacker.com/
' http://www.rei.com/online/store/LearnShareArticlesList?categoryId=Camping
' http://www.trailspace.com/
' http://www.catskillhikes.com/
' http://gorp.away.com/gorp/location/asia/nepal/favpicks.htm
' http://www.backpackinglight.com/cgi-bin/backpackinglight/index.html
' http://www.yetizone.com/
' http://www.freezerbagcooking.com/
' http://www.spadout.com/backpacking/
' http://www.abovecalifornia.com/
' http://www.journeywest.com/WB/index.html
' http://www.johann-sandra.com/backpackdir.htm
' http://www.geocities.com/amytys/
' http://www.thruhikers.com/
' http://www.redcompservices.com/AT/
' http://www.midwesthiker.com/
' http://www.host33.com/backpack
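' A minimal sketch for inspecting which URLs remain queued for crawling
' once the patterns are in place. It assumes the Chilkat NumUnspidered
' property and GetUnspideredUrl method; what the queue holds depends on
' the page that was fetched, so no expected output is shown here.
For i = 0 To spider.NumUnspidered - 1
    Response.Write "<pre>" & Server.HTMLEncode( spider.GetUnspideredUrl(i)) & "</pre>"
Next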
%>
</body>
</html>