
Version 1 (modified by jazz, 15 years ago)

--

2010-08-04

Hadoop: Streaming

  • [Example] Using gzip as the input format
    • Currently only three compression formats are supported; see org.apache.hadoop.io.compress.CompressionCodec for details.
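      # Arguments: $1 mapper, $2 reducer, $3 input directory, $4 output directory, $5 job name
      # Remove any previous output directory, since the job fails if it already exists.
      hadoop dfs -rmr "$4"
      # "$3/*" is quoted so the glob is expanded by Hadoop on HDFS, not by the local shell.
      hadoop jar /usr/local/share/hadoop/contrib/streaming/hadoop-*-streaming.jar \
        -mapper "$1" -reducer "$2" -input "$3/*" -output "$4" \
        -file "$1" -file "$2" -jobconf mapred.job.name="$5" \
        -jobconf stream.recordreader.compression=gzip \
        -jobconf mapred.output.compress=true \
        -jobconf mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec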
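
Before submitting a job like the one above, it can help to check the mapper and reducer locally on a small gzip file. The following is a minimal sketch of that idea, not part of the original notes: the sample data and the `mapper` and `reducer` shell functions are hypothetical stand-ins for the scripts passed as $1 and $2, and `sort` only approximates Hadoop's shuffle phase.

```shell
#!/bin/sh
# Local dry run of a streaming-style pipeline on gzip input (no Hadoop needed).
# The sample data and the mapper/reducer below are hypothetical illustrations.
workdir=$(mktemp -d)
printf 'a b a\nb a\n' | gzip > "$workdir/part-0.gz"

# mapper: emit one word per line.
mapper()  { tr -s ' ' '\n'; }
# reducer: count runs of identical keys and print "key<TAB>count".
reducer() { uniq -c | awk '{print $2 "\t" $1}'; }

# Decompress -> map -> sort (stands in for the shuffle) -> reduce -> compress.
gzip -dc "$workdir"/*.gz | mapper | sort | reducer | gzip > "$workdir/out.gz"

# Read the compressed result back, as one would with the job's gzip output.
result=$(gzip -dc "$workdir/out.gz")
echo "$result"
rm -rf "$workdir"
```

If the local pipeline produces the expected key and count pairs, the same mapper and reducer scripts can then be passed to the streaming jar.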