Changeset 74 for nutchez-0.1/conf/crawl-urlfilter.txt
Timestamp: Jun 3, 2009, 2:46:36 PM
File: 1 edited
nutchez-0.1/conf/crawl-urlfilter.txt
r66  r74
29   29
30   30   # skip image and other suffixes we can't yet parse
31        -\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
     31   -\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|swf)$
32   32
33   33   # skip URLs containing certain characters as probable queries, etc.
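
The effect of the change to line 31 can be checked with a minimal Python sketch. It assumes the leading `-` marks an exclude rule and that the pattern is applied as an unanchored search over the whole URL; both are assumptions about how the filter file is consumed, not something this changeset states. The names `R66_SUFFIXES`, `R74_SUFFIXES`, and `is_skipped`, and the `example.com` URLs, are hypothetical and exist only for illustration.

```python
import re

# Suffix patterns copied from line 31 of r66 and r74, without the leading "-"
# (assumed here to be the filter file's "exclude" marker, not part of the regex).
R66_SUFFIXES = re.compile(
    r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|mpg|xls|gz|rpm|tgz"
    r"|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$"
)
R74_SUFFIXES = re.compile(
    r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|mpg|xls|gz|rpm|tgz"
    r"|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|swf)$"
)


def is_skipped(pattern: re.Pattern, url: str) -> bool:
    """Return True if the URL ends in one of the pattern's suffixes.

    Uses an unanchored search, which is an assumption about how the
    rule is evaluated against each URL.
    """
    return pattern.search(url) is not None


if __name__ == "__main__":
    # Hypothetical URLs, chosen only to exercise the two revisions.
    for url in (
        "http://example.com/logo.png",
        "http://example.com/app.js",
        "http://example.com/movie.swf",
        "http://example.com/APP.JS",
        "http://example.com/index.html",
    ):
        print(url, is_skipped(R66_SUFFIXES, url), is_skipped(R74_SUFFIXES, url))
```

Under these assumptions, only the lowercase `.js` and `.swf` suffixes are newly skipped in r74. Uppercase variants such as `.JS` still pass this rule, because the pattern lists each case explicitly rather than matching case-insensitively.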