Classic ASP

Avoid URLs Matching Any of a Set of Patterns

Demonstrates how to use "avoid patterns" to keep the spider from crawling any URL that matches a wildcard pattern. This example skips URLs containing the substrings "java", "python", or "perl".
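
To illustrate how a wildcard pattern such as "*java*" selects URLs, here is a minimal Python sketch. It is not Chilkat's implementation: it assumes that "*" matches any run of characters and that matching is case-sensitive, which may differ from the Spider object's actual behavior. The helper name should_avoid and the example.com URLs are hypothetical.

```python
from fnmatch import fnmatchcase

# The same patterns the ASP example passes to AddAvoidPattern.
AVOID_PATTERNS = ["*java*", "*python*", "*perl*"]


def should_avoid(url, patterns=AVOID_PATTERNS):
    """Return True if the URL matches at least one wildcard pattern.

    Assumption: "*" matches any run of characters and the comparison
    is case-sensitive. Chilkat's own matcher may behave differently.
    """
    return any(fnmatchcase(url, pattern) for pattern in patterns)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    for url in [
        "http://www.example.com/java.asp",
        "http://www.example.com/downloads.asp",
        "http://www.example.com/perl/ftp.asp",
    ]:
        action = "skip " if should_avoid(url) else "crawl"
        print(action, url)
```

Because each pattern starts and ends with "*", it behaves like a substring test: any URL containing "java", "python", or "perl" anywhere in its text is skipped.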

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<%
success = 0

set spider = Server.CreateObject("Chilkat.Spider")

' The spider object crawls a single web site at a time.  As you'll see
' in later examples, you can collect outbound links and use them to 
' crawl the web.  For now, we'll simply spider 10 pages of chilkatsoft.com
spider.Initialize "www.chilkatsoft.com"

' Add the first URL to the queue of pages waiting to be spidered:
spider.AddUnspidered "http://www.chilkatsoft.com/"

' Avoid URLs matching these patterns:
spider.AddAvoidPattern "*java*"
spider.AddAvoidPattern "*python*"
spider.AddAvoidPattern "*perl*"

' Begin crawling the site by calling CrawlNext repeatedly.

For i = 0 To 9

    success = spider.CrawlNext()
    If (success = 1) Then
        ' Show the URL of the page just spidered.
        Response.Write "<pre>" & Server.HTMLEncode(spider.LastUrl) & "</pre>"
        ' The HTML is available in the LastHtml property
    Else
        ' CrawlNext failed: either the queue is empty or an error occurred.
        If (spider.NumUnspidered = 0) Then
            Response.Write "<pre>" & Server.HTMLEncode("No more URLs to spider") & "</pre>"
            ' Nothing is left to crawl, so stop looping.
            Exit For
        Else
            Response.Write "<pre>" & Server.HTMLEncode(spider.LastErrorText) & "</pre>"
        End If

    End If

    ' Sleep 1 second before spidering the next URL.
    spider.SleepMs 1000
Next

%>
</body>
</html>