Changes between Initial Version and Version 1 of III140614/Lab11


Timestamp: Jun 15, 2014, 1:08:54 PM
Author: jazz

[[PageOutline]]

◢ <[wiki:III140614/Lab10 Lab 10]> | <[wiki:III140614 Back to course outline]> ▲ | <[wiki:III140614/Lab12 Lab 12]> ◣

= Lab 11 =

{{{
#!html
<div style="text-align: center;"><big style="font-weight: bold;"><big>Pig Latin in Practice</big></big></div>
}}}

{{{
#!text
For the following exercises, connect to hadoop.3du.me. In the commands below, userXX stands for your username.
}}}

== Aggregation (Local Mode) ==

{{{
~$ wget http://www.hadoop.tw/excite-small.log
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);  -- tab-separated fields
grunt> grpd = GROUP log BY user;                                   -- one group per user
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);             -- records per user
grunt> STORE cntd INTO 'lab8_out1';
grunt> quit
~$ head lab8_out1/part-*
}}}
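
To make the grouping step concrete, here is a rough Python sketch of what the LOAD, GROUP and COUNT statements above compute. It assumes each log line holds three tab-separated fields, which is how Pig's default loader splits input. The `count_by_user` helper and the sample rows are hypothetical and only serve the illustration; they are not part of Pig or of the lab data.

```python
from collections import Counter

def count_by_user(lines):
    """Mimic LOAD + GROUP BY user + COUNT: return {user: number of records}."""
    counts = Counter()
    for line in lines:
        # The first tab-separated field plays the role of `user`.
        user = line.rstrip("\n").split("\t")[0]
        counts[user] += 1
    return dict(counts)

# Hypothetical sample rows in the (user, timestamp, query) layout.
sample = [
    "u1\t970916072134\tyahoo chat\n",
    "u2\t970916003523\tmusic\n",
    "u1\t970916072311\tyahoo chat rooms\n",
]
print(count_by_user(sample))  # {'u1': 2, 'u2': 1}
```

Each key corresponds to one `group` value in `cntd`, and each value to the result of `COUNT(log)` for that group.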

== Filter (Local Mode) ==

{{{
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log) AS cnt;
grunt> fltrd = FILTER cntd BY cnt > 50;  -- keep users with more than 50 records
grunt> STORE fltrd INTO 'lab8_out2';
grunt> quit
~$ head lab8_out2/part-*
}}}
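
The FILTER step above can be pictured as a simple predicate over `(group, cnt)` pairs. The following is a minimal Python sketch under that reading; `filter_by_count` and the sample pairs are hypothetical names and data used only for illustration.

```python
def filter_by_count(rows, threshold):
    """Mimic FILTER cntd BY cnt > threshold on (group, cnt) pairs."""
    # Strictly greater than, matching `cnt > 50` in the Pig statement.
    return [(user, cnt) for user, cnt in rows if cnt > threshold]

# Hypothetical (group, cnt) pairs standing in for the `cntd` relation.
cntd = [("u1", 51), ("u2", 50), ("u3", 120)]
print(filter_by_count(cntd, 50))  # [('u1', 51), ('u3', 120)]
```

Note that a user with exactly 50 records is dropped, because the comparison is strict.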

== Sorting (Local Mode) ==

{{{
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log) AS cnt;
grunt> fltrd = FILTER cntd BY cnt > 50;
grunt> srtd = ORDER fltrd BY cnt;  -- ascending by default
grunt> STORE srtd INTO 'lab8_out3';
grunt> quit
~$ head lab8_out3/part-*
}}}
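
As a rough illustration of the ORDER step, the Python sketch below sorts `(group, cnt)` pairs by count, ascending by default as Pig does. The `order_by_count` helper, its `descending` flag and the sample pairs are assumptions made for this example.

```python
def order_by_count(rows, descending=False):
    """Mimic ORDER fltrd BY cnt: sort (group, cnt) pairs by the count."""
    return sorted(rows, key=lambda row: row[1], reverse=descending)

# Hypothetical (group, cnt) pairs standing in for the `fltrd` relation.
fltrd = [("u3", 120), ("u1", 51), ("u4", 77)]
print(order_by_count(fltrd))                   # [('u1', 51), ('u4', 77), ('u3', 120)]
print(order_by_count(fltrd, descending=True))  # [('u3', 120), ('u4', 77), ('u1', 51)]
```

In Pig itself, the descending variant is written `ORDER fltrd BY cnt DESC;`.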
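
== Connect Pig to Hadoop (Full Distributed Mode) ==

{{{
# Copy the log into your HDFS home directory; without -x local, Pig reads from HDFS.
~$ hadoop fs -put excite-small.log .
~$ pig
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'lab8_out1';
grunt> quit
# The part file name depends on the Pig version (e.g. part-00000 or part-r-00000),
# so match it with a quoted glob that HDFS expands.
~$ hadoop fs -cat 'lab8_out1/part-*'
}}}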
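
When run against Hadoop in this mode, Pig executes the same script as MapReduce jobs rather than in a single local process. The Python sketch below is only a loose picture of how a group-and-count could be split into map, shuffle and reduce phases; it is not the plan Pig actually generates, and all function names and sample rows are hypothetical.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit one (user, 1) pair per log line, as a mapper might."""
    for line in lines:
        yield line.split("\t")[0], 1

def shuffle(pairs):
    """Collect values by key, standing in for Hadoop's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values for each key, as a reducer might."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical rows in the (user, timestamp, query) layout.
sample = ["u1\t1\ta", "u2\t2\tb", "u1\t3\tc"]
result = reduce_phase(shuffle(map_phase(sample)))
print(result)  # {'u1': 2, 'u2': 1}
```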