| Version 25 (modified by jazz, 14 years ago) (diff) | 
|---|
Cloud Computing Technologies and Their Bioinformatics Applications
Course Information
- Date and Time: Mondays, 15:30 to 17:20, from 9 May 2011 to 6 June 2011
 - Location: R401 PC Room, Library and Information Building, National Yang-Ming University
 - Department announcement: http://bmi.ym.edu.tw/wp/?p=4532
 
Online Discussion: Web IRC chatroom
Course Outline
| Date | Section | Topics | Slides | Hands-On | Notes |
| 05/09 | Introduction | - HPC for Bioinformatics - PC Cluster 101 - Parallel Programming Model | part-1 part-2 | Lab 1 | Intel on the importance of multi-core; About Amdahl's Law |
| 05/16 | Introduction | - Cloud Computing Architecture - Introduction to Hadoop - Hadoop Distributed File System - Hands-on: HDFS commands | part-3 part-4 | Lab 2 Lab 3 | Hadoop single-node installation (for Windows XP) |
| 05/23 | ---- | Class postponed one week because the campus's external network connection was unstable | |||
| 05/30 | Hands-On | - MapReduce Algorithm - Hands-on: Running MapReduce Examples - Introduction to Hadoop Ecosystem | part-4 part-5 | Lab 4 Lab 5 Lab 6 | MapReduce implementations in different languages |
| 06/06 | ---- | Dragon Boat Festival; class postponed one week | |||
| 06/13 | Hands-On | - Large Scale Website and HBase distributed datastore - Introduction to Pig | part-5 part-6 | Lab 7 Lab 8 | Running velvet with hadoop streaming? |
| 06/20 | Hands-On |  - Bioinformatics Apps using Hadoop  - Hands-on:  | 
Public Cluster
- http://hadoop.nchc.org.tw - Portal of the lab cluster
 - http://hadoop.nchc.org.tw/ganglia - Load status of the lab cluster
 - http://hadoop.nchc.org.tw:50030 - Running and completed jobs on the lab cluster
 - http://hadoop.nchc.org.tw:50070 - Disk space status of the lab cluster
 - http://hadoop.nchc.org.tw/hadoop-doc - Hadoop documentation
 - http://hadoop.nchc.org.tw/hadoop-doc/api/index.html - Hadoop 0.20.2 javadoc
 - http://forum.hadoop.tw - Taiwan Hadoop user forum
 
Homework 1
- Task: Modify WordCount2.java from Lab 5 into a reverse index (inverted index) program named ReverseIndex.java, whose output has the form "keyword<TAB>file names (comma-separated)". Run it the same way as the final run in Lab 5: ignore periods (\.) and commas (\,), and ignore case (case.sensitive=false).
 - Reference steps:
   $ wget http://hadoop.nchc.org.tw/WordCount2.java -O ReverseIndex.java
   $ vi ReverseIndex.java    #### DO YOUR MODIFICATION (edit the corresponding code)
   $ mkdir -p MyJava3
   $ javac -classpath hadoop-core.jar -d MyJava3 ReverseIndex.java
   $ jar -cvf reverseindex.jar -C MyJava3 .
   $ hadoop jar reverseindex.jar ReverseIndex -Dwordcount.case.sensitive=false lab6_input lab6_out4 -skip pattern.txt
   $ hadoop fs -cat lab6_out4/part-00000
 - The reference result should look like the following (the file names may appear with any form of path):
   and     input2
   cloud   input1,input2
   course  input1,input2,input2
   enjoy   input2
   i       input1,input2
   like    input1,input2
   nctu    input1,input2
   this    input2
   we      input2
 - Due date: 11:59 AM, Monday, 13 June 2011
 - Submission: e-mail the Java source code and the report (doc or PDF) as attachments to jazz _AT_ nchc _DOT_ org _DOT_ tw. (1) Source code: one copy, compressed and named ${student ID}.zip. (2) Report: one copy, named ${student ID}.
 - Hint: change the (Key, Value) types of the Mapper output and of the Reducer input and output from (Text, IntWritable) to (Text, Text).
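The type change in the hint can be illustrated without Hadoop. The following is a minimal plain-Java sketch, not the course's reference solution: the map step emits (word, file name) string pairs, standing in for (Text, Text), and the reduce step joins the file names collected for each word with commas. The class name `ReverseIndexSketch`, its methods, and the two sample sentences are hypothetical. In a real job the file name would typically be obtained from the input split rather than passed as an argument.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class ReverseIndexSketch {

    // Map step: emit one (word, fileName) pair per token.
    // Periods and commas are stripped and case is ignored, mirroring the
    // skip patterns and case.sensitive=false described in the assignment.
    public static List<String[]> map(String fileName, String line) {
        List<String[]> pairs = new ArrayList<>();
        String cleaned = line.toLowerCase(Locale.ROOT).replaceAll("[.,]", "");
        for (String word : cleaned.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new String[] {word, fileName});
            }
        }
        return pairs;
    }

    // Shuffle and reduce step: group the values by key (keys kept sorted)
    // and join each key's file names with commas. Duplicates are kept.
    public static Map<String, String> reduce(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            result.put(e.getKey(), String.join(",", e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not the actual lab input files.
        List<String[]> pairs = new ArrayList<>();
        pairs.addAll(map("input1", "I like Cloud."));
        pairs.addAll(map("input2", "Cloud course, cloud."));
        reduce(pairs).forEach((word, files) -> System.out.println(word + "\t" + files));
    }
}
```

Because both the key and the value are plain strings here, a word that occurs twice in one file yields that file name twice, which is the same shape as the `course input1,input2,input2` line in the reference result.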
 
 - Bonus (Extra): add the number of occurrences in each file to the result, so that the reference result becomes:
   and     input2(1)
   cloud   input1(1),input2(1)
   course  input1(1),input2(2)
   enjoy   input2(1)
   i       input1(1),input2(1)
   like    input1(1),input2(1)
   nctu    input1(1),input2(1)
   this    input2(1)
   we      input2(1)
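One way to think about the bonus is as an extra counting pass inside the reduce step. The sketch below is plain Java under that assumption, not the expected Hadoop solution: given the file names collected for one keyword, it counts how often each name occurs and formats the result as `file(count)`. The class name `ReverseIndexCountSketch` and its method are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReverseIndexCountSketch {

    // Reduce step for a single keyword: tally each file name among the
    // values, then format the tallies as "file(count)" joined by commas.
    public static String reduce(List<String> fileNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String fileName : fileNames) {
            counts.merge(fileName, 1, Integer::sum);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('(').append(e.getValue()).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Values taken from the "course" line of the standard reference result.
        List<String> values = Arrays.asList("input1", "input2", "input2");
        System.out.println("course\t" + reduce(values));
    }
}
```

Fed the values `input1,input2,input2` from the standard reference result, the method returns `input1(1),input2(2)`, which matches the `course` line of the bonus reference result.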
 - Grading:
   - Source code for the standard task: 60%
   - Report: 20%. The report should include the following items:
     - Cover: your name and student ID
     - Screenshot of your program running on hadoop.nchc.org.tw
     - The result of your program
   - Bonus: 20%
 
 
Student Background
| Student | Biology | C | Perl | R | Java |
| qulqul | O | X | O | O | O | 
| @ne_ | X | O | O | O | O | 
| vincentt | X | O | X | X | O | 
| Rodney_ | X | O | O | O | X | 
| sunny | X | X | X | X | O | 
| wally | O | X | O | X | X | 
| clair | X | X | O | X | X | 
| Jason2 | O | O | O | O | X | 
| Angela | O | X | O | X | X | 
| chenf | X | O | X | X | X | 
| Yen-Kuang | O | X | O | X | O | 
| Eric | X | O | O | O | X | 
| Tony-Chang | O | X | O | O | O | 
| lcyang | ? | O | X | X | X | 
| Microarray | ? | O | O | O | X | 
| O (total) | 6 | 8 | 11 | 7 | 6 | 
| X (total) | 9 | 7 | 4 | 8 | 9 | 
Supplement: Data Integration and Data Warehouse
Attachments (8)
- part-1.pdf (1.4 MB) - added by jazz 15 years ago.
 - part-2.pdf (186.5 KB) - added by jazz 15 years ago.
 - part-3.pdf (2.5 MB) - added by jazz 15 years ago.
 - part-4.pdf (2.4 MB) - added by jazz 15 years ago.
 - part-5.pdf (1.3 MB) - added by jazz 15 years ago.
 - part-6.pdf (1.2 MB) - added by jazz 15 years ago.
 - 11-05-09_bio_cloud.log (3.8 KB) - added by jazz 15 years ago.
 - part-7.pdf (2.4 MB) - added by jazz 14 years ago.
 


