| | 1 | = 2012-01-04 = |
| | 2 | |
| | 3 | == AJAX Crawler / Crawling AJAX == |
| | 4 | |
| | 5 | * [wiki:jazz/10-10-17 2010-10-17] |
| | 6 | * Reference: [http://www.ajaxprojects.com/ajax/newsdetails.php?itemid=178 Crawling AJAX] |
| | 7 | {{{ |
| | 8 | Shreeraj Shah's paper, Crawling Ajax-driven Web 2.0 Applications, does a nice job of |
| | 9 | describing the "event-driven" approach to web crawling. |
| | 10 | |
| | 11 | It has the following three key components: |
| | 12 | |
| | 13 | 1. Javascript analysis and interpretation with linking to Ajax |
| | 14 | 2. DOM event handling and dispatching |
| | 15 | 3. Dynamic DOM content extraction |
| | 16 | |
| | 17 | The easiest way to implement an AJAX-enabled, event-driven crawler is to use Watir and |
| | 18 | Crowbar, that will allow you to control Firefox or IE from code, allowing you to extract |
| | 19 | page data after it has processed any Javascript. |
| | 20 | }}} |
| | 21 | * Usable tools include [http://watir.com/ Watir], a Ruby-based library that can control IE, and [http://simile.mit.edu/wiki/Crowbar Crowbar], which controls Firefox through GET/PUT requests. Both are BSD-licensed. |
| | 22 | * [http://code.google.com/intl/zh-TW/web/ajaxcrawling/ Making AJAX Applications Crawlable] - Google proposes a workaround specification that lets AJAX applications and pages be found by search engines. |
| | 23 | * [http://crawljax.com/ crawljax] - an AJAX crawler written in Java, [http://crawljax.com/documentation/publications/ with many published papers] |
| | 24 | * http://watij.com/ - Watij – Web Application Testing in Java |
| | 25 | * http://htmlunit.sourceforge.net/ - HtmlUnit is a "GUI-Less browser for Java programs" |
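
The "Making AJAX Applications Crawlable" entry above describes a URL convention: a crawler rewrites a `#!` ("hash-bang") URL into one carrying an `_escaped_fragment_` query parameter, and the server answers that URL with an HTML snapshot. The following is a minimal sketch of that rewriting step only, not Google's implementation. The function names `escape_fragment` and `to_crawler_url` are hypothetical, and the set of escaped characters (control characters and space, `#`, `%`, `&`, `+`, and non-ASCII bytes) is an assumption about the specification that should be checked against it.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: besides control characters, space, and non-ASCII bytes,
# the scheme percent-escapes '#', '%', '&' and '+' in the fragment value.
_MUST_ESCAPE = {0x23, 0x25, 0x26, 0x2B}


def escape_fragment(value: str) -> str:
    """Percent-escape only the bytes listed above; keep everything else."""
    out = []
    for byte in value.encode("utf-8"):
        if byte <= 0x20 or byte >= 0x7F or byte in _MUST_ESCAPE:
            out.append(f"%{byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)


def to_crawler_url(pretty_url: str) -> str:
    """Map a '#!' URL to its '_escaped_fragment_' form (hypothetical helper)."""
    parts = urlsplit(pretty_url)
    if not parts.fragment.startswith("!"):
        # No hash-bang fragment: the URL is not part of the scheme.
        return pretty_url
    token = "_escaped_fragment_=" + escape_fragment(parts.fragment[1:])
    # Append to an existing query string, or start a new one.
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


if __name__ == "__main__":
    print(to_crawler_url("http://example.com/page?q=1#!state=a&b"))
```

A crawler following this convention would fetch the rewritten URL in place of the original. The event-driven tools listed above (Watir, Crowbar, crawljax, HtmlUnit) take the other route and run the page's JavaScript themselves.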