NCHC Invited Talk Information

Hadoop and Cloud Computing

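Cloud computing is one of the major IT topics of 2008. Hadoop is free software developed under the Apache Software Foundation and is now widely used on the grid infrastructure of cloud computing service providers such as Amazon and Yahoo!.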

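Devaraj Das is the Engineering Manager of the Yahoo! Bangalore Grid Computing Group and an Apache committer with many years of Hadoop development experience. The Yahoo! Bangalore Grid Computing Group focuses on building grid infrastructure that consists of thousands of hosts and can process petabytes of data, so this talk will share industry development experience with colleagues at the center who work on grid-related research.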

Speaker Biography

Devaraj Das (ddas@…) is the Engineering Manager of the Grid Computing group at Yahoo! Bangalore. He graduated with a Master's degree in Computer Science from the Indian Institute of Science, Bangalore. Prior to Yahoo!, Devaraj was with HP. Devaraj is an Apache committer.

The Grid Computing group at Yahoo! Bangalore focuses on Grid frameworks that scale to thousands of machines and handle peta-bytes of data. The group is especially involved in the development of the Open Source Hadoop platform and its deployment within Yahoo!.

2008-11-04 Morning

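  • Time: Tuesday, 11/04, 11:00 - 12:30
  • Venue: National Center for High-performance Computing (NCHC), North Site Multimedia Classroom (No. 7, R&D 6th Rd., Hsinchu City)
  • Speaker: Devaraj Das, Engineering Manager of the Yahoo! Bangalore Grid Computing Group and an Apache committer
  • Topic: Introduction to Hadoop and Cloud Computing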

Talk Abstract

Hadoop (http://hadoop.apache.org/), an open source volunteer project under the Apache Software Foundation, is a framework for running applications on large clusters built of commodity hardware. It lets one easily write and run applications that process vast amounts of data (terabytes to petabytes).

Hadoop implements a computational paradigm named Map-Reduce, where the application is divided into many small fragments of work, each of which may be executed or reexecuted on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Yahoo! is one of the main contributors to Hadoop and uses it extensively to manage large clusters of machines.

I hope to engage the open-source community on Hadoop and encourage participation in its development. I will present an overview of Hadoop and its architecture with a focus on the Map-Reduce component. I will describe the engineering challenges and briefly talk about how Hadoop clusters are used at Yahoo!.
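
As an illustration of the Map-Reduce paradigm described in the abstract, here is a minimal word-count sketch in plain Python. It is not Hadoop's API (Hadoop's own MapReduce API is Java), and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical names chosen for this example. It only shows how a job splits into small, independent fragments of work whose outputs are grouped by key and then combined.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # Each line is an independent fragment of work. Because map_phase has
    # no side effects, a fragment could be run (or re-run) anywhere.
    emitted = []
    for line in lines:
        emitted.extend(map_phase(line))
    grouped = shuffle(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = [
        "Hadoop runs on commodity hardware",
        "Hadoop stores data in HDFS",
    ]
    print(word_count(sample))
```

Because the map step has no side effects, a failed fragment can simply be run again on another machine, which is what makes re-execution on any node safe in this model.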

2008-11-04 Afternoon

  • Time: Tuesday, 11/04, 14:00 - 17:00
  • Venue: National Center for High-performance Computing (NCHC), North Site Multimedia Classroom (No. 7, R&D 6th Rd., Hsinchu City)
  • Speaker: Devaraj Das, Engineering Manager of the Yahoo! Bangalore Grid Computing Group and an Apache committer
  • Topic: Hadoop Hands-on Labs (1)
    • Basics of DFS commands
    • How to develop a MapReduce program using Hadoop

Hands-on Lab Summary

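Hadoop consists of three main parts, all of them indispensable: the Hadoop Distributed File System (HDFS), the MapReduce API, and job management. This lab walks participants through writing a simple MapReduce program with the MapReduce API that Hadoop provides.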

Notes for Attendees

Colleagues who wish to attend the Hands-on Lab should bring their own laptop and download JDK 6, ant, and Hadoop 0.18 in advance.

2008-11-04 Afternoon

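  • Time: Tuesday, 11/04, 14:00 - 17:00
  • Venue: National Center for High-performance Computing (NCHC), North Site Multimedia Classroom (No. 7, R&D 6th Rd., Hsinchu City)
  • Speaker: Devaraj Das, Engineering Manager of the Yahoo! Bangalore Grid Computing Group and an Apache committer
  • Topic: Hadoop Hands-on Labs (2)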
    • Distributed Setup of Hadoop
    • Any specific topics that attendees from the first presentation want covered

Hands-on Lab Summary

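Hadoop consists of three main parts, all of them indispensable: the Hadoop Distributed File System (HDFS), the MapReduce API, and job management. The most basic assumption behind HDFS is that "hosts will fail and disks will fail", so knowing how to set up a large HDFS in a way that improves overall system reliability is very important. This lab walks participants through a large-scale Hadoop deployment and covers topics that could not be fully answered in the first Hands-on Lab.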
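
To make the "hosts will fail, disks will fail" assumption concrete, here is a small Python sketch of why keeping several replicas of each block helps. It is a simplified simulation, not HDFS code: the round-robin placement, the node names, and the functions `place_blocks` and `readable_blocks` are assumptions made for this example, and real HDFS placement policies are more involved. The default of three replicas mirrors the commonly used HDFS replication factor.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    This is an illustrative placement policy only, not the one HDFS uses.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    ]


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical hosts
    placement = place_blocks(10, nodes)
    survivors = readable_blocks(placement, failed_nodes=["node2", "node4"])
    print(len(survivors), "of", len(placement), "blocks still readable")
```

With three distinct replicas per block, losing any two nodes still leaves every block with at least one live copy. With a single replica, every block on a failed node becomes unreadable.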

Notes for Attendees

Colleagues who wish to attend the Hands-on Lab should bring their own laptop and download JDK 6, ant, and Hadoop 0.18 in advance.

Last modified on Nov 3, 2008, 11:39:01 AM