Sphider, the open source PHP spider (aka web crawler) and search engine, uses the fsockopen() function to fetch the files it spiders. It sends no credentials with these requests. This means that if the site you are spidering is protected with HTTP Basic authentication, whether via .htaccess or the equivalent Apache directives for protected realms, the server answers with status 401 and Sphider reports a “401 unreachable” error when attempting to fetch files during spidering and indexing.
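To make the failure concrete, here is a minimal sketch of how a raw-socket client could recognise that response from the first line the server sends back. The function names parse_status_code and needs_credentials are hypothetical and are not part of Sphider; the sketch only assumes the standard HTTP status line format.

```php
<?php
// Hypothetical helper, not part of Sphider: extracts the numeric status
// code from an HTTP status line such as "HTTP/1.1 401 Unauthorized".
function parse_status_code($status_line)
{
    if (preg_match('#^HTTP/\d+\.\d+\s+(\d{3})#', $status_line, $m)) {
        return (int) $m[1];
    }
    return null; // not a recognisable status line
}

// Hypothetical helper: true when the server is asking for credentials.
function needs_credentials($status_line)
{
    return parse_status_code($status_line) === 401;
}
```

A status line of "HTTP/1.1 401 Unauthorized" makes needs_credentials return true, which is the situation the modification below addresses.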

To enable Sphider to access files in a protected realm, we need to modify the functions url_status and getFileContents.

First, create a user such as “sphider” and assign it a password via the shell…

htpasswd yourhtaccessfile sphider

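… and provide a password when prompted. Here yourhtaccessfile stands for the password file that your .htaccess names in its AuthUserFile directive. This user will be used exclusively for Sphider.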

Then, modify the functions in /admin/spiderfuncs.php:

getFileContents function

Replace:

$request = "GET $path HTTP/1.0\r\nHost: $host$portq\r\nAccept: $all\r\nUser-Agent: $user_agent\r\n\r\n";

with:

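$user = "sphider";
$pass = "abc12345"; // the password you set with htpasswd
// Each header ends with a single \r\n; the blank line (\r\n\r\n) comes only
// after the last header. HTTP/1.0 is kept from the original request so the
// server closes the connection once the response has been sent.
$request = "GET $path HTTP/1.0\r\nHost: $host$portq\r\n" .
           "Authorization: Basic " . base64_encode("$user:$pass") . "\r\n" .
           "Accept: $all\r\nUser-Agent: $user_agent\r\n\r\n";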

url_status function

Replace:

$request = "HEAD $path HTTP/1.0\r\nHost: $host$portq\r\nAccept: $all\r\nUser-Agent: $user_agent\r\n\r\n";

with:

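$user = "sphider";
$pass = "abc12345"; // the password you set with htpasswd
// Each header ends with a single \r\n; the blank line (\r\n\r\n) comes only
// after the last header. HTTP/1.0 is kept from the original request so the
// server closes the connection once the response has been sent.
$request = "HEAD $path HTTP/1.0\r\nHost: $host$portq\r\n" .
           "Authorization: Basic " . base64_encode("$user:$pass") . "\r\n" .
           "Accept: $all\r\nUser-Agent: $user_agent\r\n\r\n";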

The user name and password will not be visible to visitors, as they are used solely during indexing. Note that they are stored in plain text in spiderfuncs.php.
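Since getFileContents and url_status need the same headers, one option is to build the request in a small shared helper. This is only a sketch: build_auth_request is a hypothetical name and is not part of Sphider, and the parameter names simply mirror the variables used in the snippets. It sends HTTP/1.0 and closes the header block with a single blank line after the last header.

```php
<?php
// Hypothetical helper, not part of Sphider: builds the raw request string
// for either method ("GET" or "HEAD") with a Basic Authorization header.
function build_auth_request($method, $path, $host, $portq, $accept, $user_agent, $user, $pass)
{
    $headers = array(
        "$method $path HTTP/1.0",
        "Host: $host$portq",
        // Basic authentication: base64 of "user:password"
        "Authorization: Basic " . base64_encode("$user:$pass"),
        "Accept: $accept",
        "User-Agent: $user_agent",
    );
    // Lines are joined with CRLF; the trailing CRLF CRLF ends the last
    // header and adds the blank line that closes the header block.
    return implode("\r\n", $headers) . "\r\n\r\n";
}
```

Inside the two functions, the assignment would then look something like $request = build_auth_request("GET", $path, $host, $portq, $all, $user_agent, $user, $pass); with "HEAD" in url_status.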

Download this mod for Sphider (1.3.4)

Download this mod for Sphider Plus (1.6)

From Seth Godin’s blog:

Some of Godin’s buzzwords that made it into books (or the other way around…):

And his new book “Tribes”

Mass marketing created an angry, selfish beast, a hungry one, one that demanded to be fed. So marketers fed it, they fed it with any ads they could find. And when they couldn’t find ads, they spammed us. All in the name of commerce, all because they’re doing their job.

Things have changed, far more dramatically than most people realize. Not just what marketers buy, but what the media does all day, and what marketers build, and what we get paid to do and what and where we pay attention…