Abstract—MapReduce is an important programming model for building data centers containing tens of thousands of nodes. In a practical data center of that scale, it is common for I/O-bound jobs and CPU-bound jobs, which demand different resources, to run simultaneously in the same cluster. The MapReduce framework has not addressed the parallelization of these two kinds of jobs. In this paper, we give a new view of the MapReduce model and classify MapReduce workloads into three categories based on their CPU and I/O utilization. With this workload classification, we design a new dynamic MapReduce workload prediction mechanism, MR-Predict, which detects the workload type on the fly. We then propose a Triple-Queue Scheduler based on the MR-Predict mechanism. The Triple-Queue Scheduler improves the usage of both CPU and disk I/O resources under heterogeneous workloads, and improves Hadoop throughput by about 30% under such workloads.
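The three-way classification described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's MR-Predict implementation: the `JobStats` counters and the `threshold` value are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class JobStats:
    # Hypothetical per-job counters a monitor might collect.
    cpu_utilization: float   # average CPU busy fraction, 0.0-1.0
    disk_utilization: float  # average disk-I/O busy fraction, 0.0-1.0

def classify(stats: JobStats, threshold: float = 0.6) -> str:
    """Bucket a job into one of three workload categories,
    mirroring the CPU/I/O split described in the abstract."""
    if stats.cpu_utilization >= threshold > stats.disk_utilization:
        return "cpu-bound"
    if stats.disk_utilization >= threshold > stats.cpu_utilization:
        return "io-bound"
    return "balanced"

# A triple-queue dispatcher would then place each job in the queue
# matching its category, so CPU-heavy and I/O-heavy jobs can overlap.
queues = {"cpu-bound": [], "io-bound": [], "balanced": []}
for job in [JobStats(0.9, 0.2), JobStats(0.1, 0.8), JobStats(0.5, 0.5)]:
    queues[classify(job)].append(job)
```

The point of keeping three separate queues is that a CPU-bound job and an I/O-bound job scheduled together leave neither resource idle.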
= 2. Designing a High-Performance Cloud Platform =
An New Data Parallelism Approach with High Performace Clouds
 * Claims a simpler design and therefore better performance
 * Claims to be about twice as fast as Hadoop in some cases
= 3. A Parallel Closed Cube Algorithm =
A Parallel Algorithm for Closed Cube Computation
 * The parallel closed cube algorithm is not easy to understand; the authors design a parallel closed cube algorithm that runs on the MapReduce platform
 * They claim the experimental results show a benefit

= 4. Processing Satellite Data with Cloud Computing =
Cloud Computing for Satellite Data Processing on High End Compute Clusters
 * Processes satellite data with Hadoop on high-end hardware
 * The paper compares results with and without MapReduce (the authors say the programs do not differ much)

= 5. An Integrated Computation and Data Management System =
Clustera: An Integrated Computation And Data Management System
 * Introduces a data management system with two key features
 * First, it is scalable and capable of managing a wide range of job data, and it reduces I/O with minimal SQL queries
 * Second, it is built from modern software building blocks, making it possible to observe performance and utilization data inside the application server or the relational database
 * Finally, Clustera is compared against Hadoop and Condor

= 6. High-Performance Data Mining with Sector =
Data Mining Using High Performance Data Clouds

= 7. Mining Console Logs to Detect Large-Scale System Problems =
Detecting Large-Scale System Problems by Mining Console Logs
 * Mines log files to detect potential runtime problems in the system
 * Experiments run on Hadoop logs and the DarkStar online game

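As a rough sketch of the log-mining idea: one simple heuristic masks variable fields (numbers, identifiers) to reduce each console line to a message template, then flags rare templates as suspicious. This is an illustrative assumption, not necessarily the paper's actual method; the `template` regexes and the `max_count` threshold are made up for the example.

```python
import re
from collections import Counter

def template(line: str) -> str:
    """Reduce a console-log line to a rough message template by
    masking hex-like identifiers and numbers (an assumed heuristic)."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"\d+", "<NUM>", line)
    return line.strip()

def rare_templates(lines, max_count=1):
    """Flag templates seen at most `max_count` times; rare messages
    are often correlated with runtime problems."""
    counts = Counter(template(l) for l in lines)
    return {t for t, c in counts.items() if c <= max_count}

logs = [
    "Received block blk_123 of size 67108864",
    "Received block blk_456 of size 67108864",
    "Exception in receiveBlock for block blk_789",
]
suspicious = rare_templates(logs)
```

Here the two "Received block" lines collapse to one common template, while the lone exception line stands out as rare.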
= 8. An Experimental Paper on Disco =

= 9. The Sphere Paper =

= 10. Using Hadoop to Profile User Habits =
Extraction of User Profile Based on the Hadoop
 * The architecture is very similar to icas
 * Uses Hadoop to extract user habits, but in effect it only does a MapReduce word count
 * Also compares single-node and multi-node Hadoop performance; since they only have 80 MB of data, one node is fastest and three nodes are slowest
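The word count that this paper effectively performs can be sketched in-process like this. It is a minimal simulation of the map, shuffle, and reduce phases, not actual Hadoop code.

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the line.
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Reduce phase: sum the partial counts for one word.
    return word, sum(counts)

def word_count(lines):
    # Shuffle phase: group the mapper output by key.
    groups = defaultdict(list)
    for word, one in chain.from_iterable(mapper(l) for l in lines):
        groups[word].append(one)
    return dict(reducer(w, c) for w, c in groups.items())

result = word_count(["the cat", "the dog"])  # → {'the': 2, 'cat': 1, 'dog': 1}
```

With only 80 MB of input, the shuffle and job-startup overhead of a real multi-node cluster easily outweighs the parallelism, which is consistent with the single node being fastest in the paper's comparison.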