hadoop v 0.20 + Hbase + thrift + php 教學手冊
此教材包含兩個部份
- 第一部份:Hadoop v 0.20 + Hbase + Thrift + PHP 的安裝、設定與測試
- 第二部份:透過 Thrift 用 PHP 的程式碼存取 Hbase 資料
第一部份
安裝編譯及測試
一、前言
- 本篇建議使用 Linux 作業系統 Ubuntu 8.04 含以上
- 假設已經安裝好 Hadoop (0.20 ) Hbase (0.20) Sun Java 6,完成設定,並且已在運作中
安裝設定 hadoop 0.20
- 本篇的hadoop 安裝於 /opt/hadoop
安裝設定 hbase 0.20
- 本篇的Hbase 安裝於 /opt/hbase
安裝設定 php & apache
- 用apt-get install 方式安裝 php 與 apache
二、安裝 thrift
編譯 thrift
- 目前 thrift 最新版本為 0.4
- 將原始碼解壓縮後的完整路徑為 /opt/thrift,並進入該目錄
$ cd /opt/thrift
- 安裝之前請先確定有裝了 libboost (c++的函式庫),以及make時會用到的yacc flex
$ sudo apt-get install build-essential libboost-dev automake libtool flex bison g++ python python-all-dev
- 接著編譯與安裝thrift
$ ./bootstrap.sh $ ./configure $ make $ sudo make install
產生 thrift 函式庫
- 產生可以存取hbase的php函式庫
$ cp -r /opt/hbase/src/java/org/apache/hadoop/hbase/thrift ./hbase_thrift_src $ cd hbase_thrift_src $ thrift --gen php Hbase.thrift
以上若動作都確實完成,可以看到已經產生出一個資料夾: gen-php/Hbase/ 並且包含兩個php檔案,此兩個php檔可以幫助你存取hbase,不過此 php 檔還是需要其他檔案當函式庫
- 在/var/www內產放入php的thrift函式庫
$ sudo mkdir /var/www/hbase $ sudo cp -r /opt/thrift/lib/php/src /var/www/hbase/thrift $ sudo chown -R $USER:$USER /var/www/hbase $ mkdir /var/www/hbase/thrift/packages $ cp -r /opt/thrift/hbase_thrift_src/gen-php/* /var/www/hbase/thrift/packages/
三、測試 php -> thrift -> hbase
- 複製範例程式 DemoClient.php
$ cd /var/www/hbase $ cp /opt/hbase/src/examples/thrift/DemoClient.php /var/www/hbase/DemoClient.php
- 修改 /var/www/hbase/DemoClient.php
$GLOBALS['THRIFT_ROOT'] = '/var/www/hbase/thrift'; ..... $socket = new TSocket( 'localhost’', 9090 ); .....
- 啟動hbase 的 thrift 服務
$ /opt/hbase/bin/hbase thrift start &
scanning tables... found: t1 found: t2 creating table: demo_table column families in demo_table: column: entry, maxVer: 10 column: unused, maxVer: 3
參考
第二部份
程式碼解析
零、前言
- thrift 是透過非java的其他程式語言,直接對hbase 進行存取的中介函式庫
- 此篇介紹的是如何用php透過 thrift 對 hbase 操作
- 程式安裝測試方法已紀錄於: 安裝編譯及測試
- 因此目錄結構為
hadoop /opt/hadoop hbase /opt/hbase 網頁根目錄 /var/www/ hbase 的php碼目錄 /var/www/hbase thrift php /var/www/hbase/thrift
- 測試程式之前,請先確定
- hbase , hadoop 都有正常運作中
- $ bin/hbase thrift start 尚在執行
一、php引用thrift lib
<? $GLOBALS['THRIFT_ROOT'] = '/var/www/hbase/thrift'; require_once( $GLOBALS['THRIFT_ROOT'].'/Thrift.php' ); require_once( $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php' ); require_once( $GLOBALS['THRIFT_ROOT'].'/transport/TBufferedTransport.php' ); require_once( $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php' ); require_once( $GLOBALS['THRIFT_ROOT'].'/packages/Hbase/Hbase.php' ); $socket = new TSocket( 'localhost', 9090 ); $socket->setSendTimeout( 10000 ); // Ten seconds (too long for production, but this is just a demo ;) $socket->setRecvTimeout( 20000 ); // Twenty seconds $transport = new TBufferedTransport( $socket ); $protocol = new TBinaryProtocol( $transport ); $client = new HbaseClient( $protocol ); $transport->open(); ?> ........ 其他html 碼 <? 或是 下面提到的各式讀寫操作 ?> ....... <? $transport->close(); ?>
- 所有的程式碼都必須包含這些引入函式庫、開啟關閉socket 的敘述
二、各種對hbase的操作
2.1 列出hbase 裡的所有 table
<? echo( "listing tables...\n" ); $tables = $client->getTableNames(); sort( $tables ); foreach ( $tables as $name ) { echo( " found: {$name}\n" ); } } ?>
2.2 刪除table
<? $name = "hbase table name"; if ($client->isTableEnabled( $name )) { echo( " disabling table: {$name}\n"); $client->disableTable( $name ); } echo( " deleting table: {$name}\n" ); $client->deleteTable( $name ); } ?>
2.3 新增table
- 我們先定義columns 的物件結構如下
<? $columns = array( new ColumnDescriptor( array( 'name' => 'entry:', 'maxVersions' => 10 ) ), new ColumnDescriptor( array( 'name' => 'unused:' ) ) ); ?>
- 將剛剛的column 放到table 內
<? $t = "table name"; echo( "creating table: {$t}\n" ); try { $client->createTable( $t, $columns ); } catch ( AlreadyExists $ae ) { echo( "WARN: {$ae->message}\n" ); } ?>
2.4 列出 table內的家族成員 family
<? $t = "table name"; echo( "column families in {$t}:\n" ); $descriptors = $client->getColumnDescriptors( $t ); asort( $descriptors ); foreach ( $descriptors as $col ) { echo( " column: {$col->name}, maxVer: {$col->maxVersions}\n" ); } ?>
2.5 寫入資料
<? $t = "table name"; $row = "row name" $valid = "foobar-\xE7\x94\x9F\xE3\x83\x93"; $mutations = array( new Mutation( array( 'column' => 'entry:foo', 'value' => $valid ) ), ); $client->mutateRow( $t, $row, $mutations ); ?>
2.6 讀取資料
get 取得一個 column value
- get 取得一個 column value 的用法
<? $table_name = 't1'; $row_name = '1'; $fam_col_name = 'f1:c1'; $arr = $client->get($table_name, $row_name , $fam_col_name); // $arr = array foreach ( $arr as $k=>$v ) { // $k = TCell echo ("value = {$v->value} , <br> "); echo ("timestamp = {$v->timestamp} <br>"); } } ?>
getRow 取得一整個row
- getRow($tableName, $row) 用法
<? $table_name = "table name"; $row_name = "row name"; $arr = $client->getRow($table_name, $row_name); // $client->getRow return a array foreach ( $arr as $k=>$TRowResult ) { // $k = 0 ; non-use // $TRowResult = TRowResult printTRowResult($TRowResult); } ?>
scan 一整個table
<? $table_name = 't1'; $start_row = ""; // 從row 的起點開始 $family = array( "f1","f2","f3" ); $scanner = $client->scannerOpen( $table_name, $start_row , $family ); // $scanner 是一個遞增數字 for open socket // scannerGet() 一次只抓一row,因此要用while迴圈不斷地抓 while (true ){ $get_arr = $client->scannerGet( $scanner ); // get_arr is an array if($get_arr == null) break; // 沒有回傳值代表已經沒有資料可抓,跳脫此無限迴圈 foreach ( $get_arr as $TRowResult ){ // $TRowResult = TRowResult echo (" row = {$TRowResult->row} ; <br> "); $column = $TRowResult->columns; foreach ($column as $family_column=>$Tcell){ echo ("family:column = $family_column "); // $family_column = family_column // $Tcell = Tcell echo (" value = {$Tcell->value} "); echo (" timestamp = {$Tcell->timestamp} <br>"); } } } echo( "<br> ----------------- " ); echo( "<br> Scanner finished <br>" ); $client->scannerClose( $scanner ); ?>
補充
TCell
<? class TCell { static $_TSPEC; public $value = null; public $timestamp = null; public function __construct($vals=null) { if (!isset(self::$_TSPEC)) { self::$_TSPEC = array( 1 => array( 'var' => 'value', 'type' => TType::STRING, ), 2 => array( 'var' => 'timestamp', 'type' => TType::I64, ), ); } if (is_array($vals)) { if (isset($vals['value'])) { $this->value = $vals['value']; } if (isset($vals['timestamp'])) { $this->timestamp = $vals['timestamp']; } } } } ?>
TRowResult
<? class TRowResult { static $_TSPEC; public $row = null; public $columns = null; public function __construct($vals=null) { if (!isset(self::$_TSPEC)) { self::$_TSPEC = array( 1 => array( 'var' => 'row', 'type' => TType::STRING, ), 2 => array( 'var' => 'columns', 'type' => TType::MAP, 'ktype' => TType::STRING, 'vtype' => TType::STRUCT, 'key' => array( 'type' => TType::STRING, ), 'val' => array( 'type' => TType::STRUCT, 'class' => 'TCell', ), ), ); } if (is_array($vals)) { if (isset($vals['row'])) { $this->row = $vals['row']; } if (isset($vals['columns'])) { $this->columns = $vals['columns']; } } } } ?>

