<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'><id>tag:blogger.com,1999:blog-5853964016719400932.post1500388700692039740..comments</id><updated>2009-12-10T14:23:57.820-08:00</updated><title type='text'>Comments on The Official Cruxlux Blog: Build a Search Engine with 10 Open Source Software...</title><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://blog.cruxlux.com/feeds/1500388700692039740/comments/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5853964016719400932/1500388700692039740/comments/default'/><link rel='alternate' type='text/html' href='http://blog.cruxlux.com/2008/10/build-search-engine-with-10-open-source.html'/><author><name>The Official Cruxlux Blog</name><uri>http://www.blogger.com/profile/09369467704623503662</uri><email>noreply@blogger.com</email></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>3</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5853964016719400932.post-3665261741992274080</id><published>2009-02-02T17:27:31.526-08:00</published><updated>2009-02-02T17:27:31.526-08:00</updated><title type='text'>Anonymous... You are correct that cURL provides ju...</title><content type='html'>Anonymous... &lt;BR/&gt;&lt;BR/&gt;You are correct that cURL provides just the download mechanism.  We essentially have a global queue: producers enqueue URLs to crawl based on how often they are updated, and workers pull from that queue and do the actual download/parse/enqueue of new URLs.  
As you mentioned, the implementation of this has a lot of interesting hurdles.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5853964016719400932/1500388700692039740/comments/default/3665261741992274080'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5853964016719400932/1500388700692039740/comments/default/3665261741992274080'/><link rel='alternate' type='text/html' href='http://blog.cruxlux.com/2008/10/build-search-engine-with-10-open-source.html?showComment=1233624451526#c3665261741992274080' title=''/><author><name>The Official Cruxlux Blog</name><uri>http://www.blogger.com/profile/09369467704623503662</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='04724908543092897833'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://blog.cruxlux.com/2008/10/build-search-engine-with-10-open-source.html' ref='tag:blogger.com,1999:blog-5853964016719400932.post-1500388700692039740' source='http://www.blogger.com/feeds/5853964016719400932/posts/default/1500388700692039740' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-5853964016719400932.post-211539715537470617</id><published>2009-02-02T17:20:00.000-08:00</published><updated>2009-02-02T17:20:00.000-08:00</updated><title type='text'>Nice information! The crawler itself, did you writ...</title><content type='html'>Nice information! The crawler itself, did you write it yourself? 
I know the crawler concept is quite simple, but some implementation details regarding robustness/efficiency/scalability can make it quite interesting...&lt;BR/&gt;&lt;BR/&gt;p.s.: AFAIK cURL provides you just the "download" mechanism, right?</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5853964016719400932/1500388700692039740/comments/default/211539715537470617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5853964016719400932/1500388700692039740/comments/default/211539715537470617'/><link rel='alternate' type='text/html' href='http://blog.cruxlux.com/2008/10/build-search-engine-with-10-open-source.html?showComment=1233624000000#c211539715537470617' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://blog.cruxlux.com/2008/10/build-search-engine-with-10-open-source.html' ref='tag:blogger.com,1999:blog-5853964016719400932.post-1500388700692039740' source='http://www.blogger.com/feeds/5853964016719400932/posts/default/1500388700692039740' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-5853964016719400932.post-9047162249164586000</id><published>2008-10-15T09:26:37.362-07:00</published><updated>2008-10-15T09:26:37.362-07:00</updated><title type='text'>intriguing stuff.  libev is also worth looking at ...</title><content type='html'>intriguing stuff.  
libev is also worth looking at as an event loop.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5853964016719400932/1500388700692039740/comments/default/9047162249164586000'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5853964016719400932/1500388700692039740/comments/default/9047162249164586000'/><link rel='alternate' type='text/html' href='http://blog.cruxlux.com/2008/10/build-search-engine-with-10-open-source.html?showComment=1224087997362#c9047162249164586000' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://blog.cruxlux.com/2008/10/build-search-engine-with-10-open-source.html' ref='tag:blogger.com,1999:blog-5853964016719400932.post-1500388700692039740' source='http://www.blogger.com/feeds/5853964016719400932/posts/default/1500388700692039740' type='text/html'/></entry></feed>