Development Conventions
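[!TIP|style:callout|label:PHPCreeper application workers run in one of two modes:|iconVisibility:default|labelVisibility:default|className:block-tip] 1. Single-worker mode: you may only write downloader instances, and those instances handle all of the crawling work.
The advantage is that it works out of the box: it does not depend on a redis service and uses PHP's built-in queue. The drawback is that it can only cope with simple crawling tasks.
2. Multi-worker mode: you can freely write any number of business worker instances. This is PHPCreeper's default mode.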
--
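[!WARNING|style:callout|label: The author recommends developing with the PHPCreeper application framework|iconVisibility:default|labelVisibility:default|className:block-warning] Every example in this manual assumes the PHPCreeper application framework as its runtime context. If you want to write everything yourself, see the "FAQ" chapter.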
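1. Writing the global start script:
The global start script is a standalone script that loads multiple business worker instances in one go. It can be stored anywhere. By default the PHPCreeper application assistant generates it automatically; if you write it by hand, you only need to make sure it can include the following file: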
require_once "/path/to/PHPCreeper-Application/Application/Core/Launcher.php";
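2. Writing single start scripts:
A single start script is the start script of one individual business worker. These scripts are also generated by the PHPCreeper application assistant by default. Unless you write them by hand, they cannot be placed arbitrarily and must live in the following directory: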
/path/to/PHPCreeper-Application/Application/Spider/ProjectName/Start/StartScript1.php
/path/to/PHPCreeper-Application/Application/Spider/ProjectName/Start/StartScript2.php
/path/to/PHPCreeper-Application/Application/Spider/ProjectName/Start/StartScriptN.php
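One likely reason for the fixed location is that a generated start script finds `Launcher.php` by walking up from its own path with `dirname(__FILE__, 4)`. The sketch below reproduces that walk on a plain string; `launcherPathFor` is a hypothetical helper written for illustration and is not part of PHPCreeper.

```php
<?php
/**
 * Compute where a start script would look for Launcher.php.
 * Hypothetical helper for illustration only.
 */
function launcherPathFor(string $scriptPath): string
{
    // dirname($path, 4) strips four trailing components:
    // AppProducer.php -> Start -> News -> Spider, leaving .../Application
    $appRoot = dirname($scriptPath, 4);

    return $appRoot . '/Core/Launcher.php';
}

echo launcherPathFor('/path/to/Application/Spider/News/Start/AppProducer.php'), PHP_EOL;
// prints: /path/to/Application/Core/Launcher.php
```

If the script is moved one directory deeper or shallower, the same expression resolves to a different directory and the `require_once` fails, so a hand-written script stored elsewhere has to adjust that path itself.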
Snippet of a single start script, AppProducer.php:
<?php
namespace PHPCreeperApp\Spider\News\Start;

require_once dirname(__FILE__, 4) . '/Core/Launcher.php';

use PHPCreeperApp\Core\Launcher;
use PHPCreeper\Kernel\PHPCreeper;
use PHPCreeper\Producer;
use PHPCreeper\Kernel\Task;
use Configurator\Configurator;
use Logger\Logger;

class AppProducer
{
    /**
     * single instance
     *
     * @var object
     */
    static protected $_instance;

    /**
     * producer instance
     *
     * @var object
     */
    protected $_producer;

    /**
     * @brief    get single instance
     *
     * @return   object
     */
    static public function getInstance()
    {
        if(!self::$_instance instanceof self)
        {
            self::$_instance = new self();
        }

        return self::$_instance;
    }

    /**
     * @brief    start entry
     *
     * @return   mixed
     */
    public function start($config)
    {
        //producer instance
        $this->_producer = new Producer($config);

        //set name
        $this->_producer->setName('producer1');

        //set process number
        $this->_producer->setCount(1);

        //set callback
        $this->_producer->onProducerStart  = array($this, 'onProducerStart');
        $this->_producer->onProducerStop   = array($this, 'onProducerStop');
        $this->_producer->onProducerReload = array($this, 'onProducerReload');
    }

    /**
     * @brief    onProducerStart
     *
     * @param    object $producer
     *
     * @return   mixed
     */
    public function onProducerStart($producer)
    {
    }

    /**
     * @brief    onProducerStop
     *
     * @param    object $producer
     *
     * @return   mixed
     */
    public function onProducerStop($producer)
    {
    }

    /**
     * @brief    onProducerReload
     *
     * @param    object $producer
     *
     * @return   mixed
     */
    public function onProducerReload($producer)
    {
    }
}

//!!! WARN: DON'T CHANGE THE CODES BELOW ALL !!!
//!!! WARN: DON'T CHANGE THE CODES BELOW ALL !!!
//!!! WARN: DON'T CHANGE THE CODES BELOW ALL !!!
if(!defined('GLOBAL_START'))
{
    $classname = pathinfo(__FILE__, PATHINFO_FILENAME);
    $config = Launcher::getSpiderConfig($spider ?? getSpiderName(), $classname);
    $_classname = __NAMESPACE__ . "\\" . $classname;
    $_classname::getInstance()->start($config);
    PHPCreeper::start();
}
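The footer of the snippet does two things: it runs only when `GLOBAL_START` is not defined (that is, when the script is executed on its own rather than loaded by the global start script), and it derives the worker class name from the script's file name. The following self-contained sketch imitates that mechanism without PHPCreeper installed. The `Demo\Spider\News\Start` namespace, the stub `AppProducer` class, and `resolveWorkerClass` are stand-ins invented for illustration; the real class wraps `PHPCreeper\Producer` as shown above.

```php
<?php
namespace Demo\Spider\News\Start;

// Stub worker class: stores the config instead of creating a real Producer.
class AppProducer
{
    protected static $instance;
    public $config;

    public static function getInstance()
    {
        if (!self::$instance instanceof self) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    public function start($config)
    {
        $this->config = $config;
        return $this;
    }
}

// Same derivation as the footer: file name without extension, prefixed
// with the current namespace, gives the fully qualified class name.
function resolveWorkerClass(string $file): string
{
    $classname = pathinfo($file, PATHINFO_FILENAME);
    return __NAMESPACE__ . '\\' . $classname;
}

// Skipped when a global start script has defined GLOBAL_START beforehand.
if (!defined('GLOBAL_START')) {
    $fqcn = resolveWorkerClass('/path/to/Application/Spider/News/Start/AppProducer.php');
    $worker = $fqcn::getInstance()->start(['name' => 'producer1']);
    echo $fqcn, ' started with name ', $worker->config['name'], PHP_EOL;
    // prints: Demo\Spider\News\Start\AppProducer started with name producer1
}
```

Because the class name comes from the file name, renaming the script without renaming the class (or the other way round) makes the final call fail with a "class not found" error.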
3. Each business start script must have exactly the same file name as its configuration file:
/path/to/Application/Spider/News/Start/AppProducer.php
/path/to/Application/Spider/News/Config/AppProducer.php
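A quick way to catch a mismatch before launching is to compare the two directories. The sketch below assumes the `Start/` and `Config/` layout shown above; `findStartScriptsWithoutConfig` is a hypothetical helper, not something PHPCreeper ships, and the demo runs against a throwaway directory under the system temp path.

```php
<?php
/**
 * Return the file names of start scripts that have no same-named config file.
 * Hypothetical helper for illustration only.
 */
function findStartScriptsWithoutConfig(string $spiderDir): array
{
    $missing = [];

    // glob() may return false on error, so fall back to an empty list.
    foreach (glob($spiderDir . '/Start/*.php') ?: [] as $startScript) {
        $name = basename($startScript);
        if (!is_file($spiderDir . '/Config/' . $name)) {
            $missing[] = $name;
        }
    }

    sort($missing);
    return $missing;
}

// Demo: build a temporary spider directory with one matched and one unmatched script.
$spiderDir = sys_get_temp_dir() . '/creeper_demo_' . uniqid();
mkdir($spiderDir . '/Start', 0777, true);
mkdir($spiderDir . '/Config', 0777, true);
touch($spiderDir . '/Start/AppProducer.php');
touch($spiderDir . '/Config/AppProducer.php');
touch($spiderDir . '/Start/AppDownloader.php'); // deliberately has no config

print_r(findStartScriptsWithoutConfig($spiderDir));
// lists AppDownloader.php as the only script without a matching config
```

Pointing the function at a real project directory such as `/path/to/Application/Spider/News` gives the same check for an actual spider.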