Classic ASP
Avoid URLs Matching Any of a Set of Patterns
Demonstrates how to use "avoid patterns" to keep the spider from crawling any URL that matches a wildcard pattern. This example avoids URLs containing the substrings "java", "python", or "perl".
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<%
success = 0
set spider = Server.CreateObject("Chilkat.Spider")

' The spider object crawls a single web site at a time. As you'll see
' in later examples, you can collect outbound links and use them to
' crawl the web. For now, we'll simply spider 10 pages of chilkatsoft.com
spider.Initialize "www.chilkatsoft.com"

' Add the 1st URL:
spider.AddUnspidered "http://www.chilkatsoft.com/"

' Avoid URLs matching these patterns:
spider.AddAvoidPattern "*java*"
spider.AddAvoidPattern "*python*"
spider.AddAvoidPattern "*perl*"

' Begin crawling the site by calling CrawlNext repeatedly.
For i = 0 To 9
    success = spider.CrawlNext()
    If (success = 1) Then
        ' Show the URL of the page just spidered.
        Response.Write "<pre>" & Server.HTMLEncode(spider.LastUrl) & "</pre>"
        ' The HTML is available in the LastHtml property
    Else
        ' Did we get an error or are there no more URLs to crawl?
        If (spider.NumUnspidered = 0) Then
            Response.Write "<pre>" & Server.HTMLEncode("No more URLs to spider") & "</pre>"
        Else
            Response.Write "<pre>" & Server.HTMLEncode(spider.LastErrorText) & "</pre>"
        End If
    End If

    ' Sleep 1 second before spidering the next URL.
    spider.SleepMs 1000
Next
%>
</body>
</html>
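
When the list of avoid patterns grows, or comes from configuration, the repeated AddAvoidPattern calls can be driven from an array instead. The fragment below is a minimal sketch of that variation, not part of the original example: the `avoidPatterns` array and the `pattern` loop variable are illustrative names, and it relies only on the Spider methods already used above.

```asp
<%
' Hypothetical variation: keep the avoid patterns in one array
' so they can be edited in a single place.
Dim avoidPatterns, pattern
avoidPatterns = Array("*java*", "*python*", "*perl*")

set spider = Server.CreateObject("Chilkat.Spider")
spider.Initialize "www.chilkatsoft.com"
spider.AddUnspidered "http://www.chilkatsoft.com/"

' Register each wildcard pattern before crawling begins.
For Each pattern In avoidPatterns
    spider.AddAvoidPattern CStr(pattern)
Next
%>
```

The crawl loop itself is unchanged: call CrawlNext repeatedly after the patterns have been registered.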