Wednesday, July 22, 2009
Cruxlux Small World Private Beta
Thank you for all the positive responses so far to our latest release, Cruxlux Small World. The service extends our bookmarklet to tell you how you personally relate to what you are reading. If you want a glimpse of the magical small world, follow us on Twitter for new invitation codes as they are released.
Tuesday, January 20, 2009
Announcing Cruxlux Annotation
We are pleased to announce the release of Cruxlux Annotation. This product builds on the theme of shining the light on the crux of things.

The Idea
Annotation stems from the human tendency to underline, highlight, or write in between the lines of all the sentences we read on paper. We wanted to take this online by allowing people to comment on sentences in a body of text, be it a poem, an article, a blog post, a novella, a song, or an essay. Not only that, comments online can be richer than what you can do on paper because you can link to video, images, or any rich supplementary content that is all over the web. People can then vote on the best annotations. The goal is to build a body of knowledge that is grander than the document itself.
Labels:
W e
Wednesday, October 15, 2008
Build a Search Engine with 10 Open Source Software Projects
Developing a large software system is always about standing on the shoulders of giants and dodging the proverbial reinvention of the wheel. By using the following open source projects in our search engine effort, we were able to do both. This is the first in a series of technical articles about the Cruxlux architecture in which we will explore how we use exceptional free technologies. Cruxlux is not a whole-web search engine like Google or MSN, but many of the development hurdles are common across any type of search. Without further ado, and in no particular order of importance, here is the list.
Operating System: Linux
In choosing a foundation OS, there really wasn’t a choice for us. Linux provides a solid architecture, and a lot of the open source mindshare is constantly improving it. It is comfortable to develop on, with powerful tools such as wmii, vim, gcc, and valgrind as well as simple-to-use package managers such as synaptic. As an added bonus, there is a wide variety of heavily tested images for Amazon EC2. If you are wondering which distros, we use a mix of Ubuntu and Gentoo.
Database: MySQL
Every web service needs some kind of datastore, so we went with our favorite open source database: MySQL. It has widely tested client libraries in many languages, and a lot of exciting features and development going into it. Guha had a stellar experience with it in Folding@Home, which generated terabytes of data. We are especially interested in tracking Drizzle, given that it will be specifically tailored toward high levels of concurrency and cloud computing. MySQL is used to store metadata that our backend leverages, our user data, as well as posts in our debate infrastructure.
HTTP Client: Curl
Every information extraction system needs a powerful way to grab data from the net. Curl stands the test of time as the best HTTP networking client library out there. It provides us with very high performance, highly concurrent crawls that can easily fill our bandwidth pipe with fresh content. Its threading support is very clean, and the event based support is something that may yield ludicrous speed. We are looking forward to exploring that more.
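To make the idea of a concurrent crawl concrete, here is a minimal sketch in Python. It is not our crawler and it does not use Curl: the standard library's thread pool and `urlopen` stand in for it, and the names `fetch` and `crawl` are made up for illustration. Each URL is fetched on a worker thread, and a failure on one URL is recorded instead of stopping the rest.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return the raw response body as bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def crawl(urls, fetcher=fetch, workers=8):
    """Fetch many URLs concurrently (hypothetical helper, not a Curl API).

    Returns a dict mapping each URL to its body, or to the exception
    raised while fetching it.
    """
    def safe_fetch(url):
        try:
            return url, fetcher(url)
        except Exception as exc:  # keep crawling even if one site fails
            return url, exc

    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs safe_fetch on the pool and yields results in input order.
        for url, outcome in pool.map(safe_fetch, urls):
            results[url] = outcome
    return results
```

The `fetcher` argument exists so the download step can be swapped out, for example with a stub when experimenting offline.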
General Purpose Library: Boost
Boost is frankly amazing. The library is well thought out and the API usage is consistent throughout, so you don’t have to make a mental context switch every time you use a different Boost tool. Inside of Boost alone, we use Build, Date Time, Filesystem, Math, Pool, Regex, Serialization, Smart Ptr, String, Test, and, last but not least, bjam to build the whole shebang.
Networking Services: Libevent
Event based programming has intrigued everyone with its scalability as well as how it allows developers to achieve concurrency while thinking in a single threaded mindset. See the C10K problem. We utilize libevent in our heavily service oriented architecture. Once you go non blocking you never go back.
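For readers new to the event based style, the following is a small sketch of the general pattern, written in Python with the standard `selectors` module rather than libevent itself. The function name `run_echo_round` and the uppercase "echo" behavior are invented for the example. One thread waits for any socket to become readable and handles whichever ones are ready, so several connections make progress without a thread per connection.

```python
import selectors
import socket


def run_echo_round(messages):
    """Serve several connections from one thread with a readiness loop.

    Each message (non-empty bytes) gets its own socket pair. The "server"
    ends are non-blocking and registered with a selector; the loop replies
    with the uppercased payload as each one becomes readable.
    """
    sel = selectors.DefaultSelector()
    clients = []
    for msg in messages:
        client, server = socket.socketpair()
        server.setblocking(False)
        # Ask to be notified when this server-side socket has data to read.
        sel.register(server, selectors.EVENT_READ)
        client.sendall(msg)
        clients.append(client)

    pending = len(messages)
    while pending:
        # select() blocks until at least one registered socket is readable.
        for key, _events in sel.select():
            conn = key.fileobj
            data = conn.recv(4096)
            conn.sendall(data.upper())
            sel.unregister(conn)
            conn.close()
            pending -= 1

    replies = [c.recv(4096) for c in clients]
    for c in clients:
        c.close()
    sel.close()
    return replies
```

A real service would keep the loop running and register new connections as they arrive; this sketch handles one small message per connection and then stops.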
Hash Table: Google Sparse/Dense Hash
Don't leave home without your trusty hash table. Thankfully, Google released some of its well guarded secrets out to the world because this library is really imperative to anyone wanting to deal with a lot of data in memory. We use this not only to cache certain data used by our web server, but also during calculation phases by our backend. Couple it with Paul Hsieh’s speedy string hash, and you can have an elegant way to quickly address a great deal of websites on a single machine. If any of you has any other hash functions that you know of, please let us know in the comments. Murmur Hash looks fun to play with, and we will explore it in a later article when we look into hash functions in the domain of URLs.
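As a rough illustration of addressing websites by hash, here is a sketch in Python. It makes two substitutions that should be kept in mind: FNV-1a (64-bit) stands in for Paul Hsieh's hash, and a plain `dict` stands in for Google's sparse/dense hash tables. The `SiteTable` class and its methods are hypothetical names. The point is only that a long URL string can be reduced to a fixed-size integer key before it goes into the table.

```python
FNV_OFFSET_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash (a stand-in here, not Paul Hsieh's function)."""
    h = FNV_OFFSET_64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_64) & MASK_64  # keep the value within 64 bits
    return h


class SiteTable:
    """Hypothetical in-memory table keyed by the hash of a site's URL.

    Only the 64-bit key is stored, not the URL, which saves memory but
    means two URLs with the same hash would overwrite each other.
    """

    def __init__(self):
        self._rows = {}

    def add(self, url, metadata):
        self._rows[fnv1a_64(url.encode("utf-8"))] = metadata

    def get(self, url):
        return self._rows.get(fnv1a_64(url.encode("utf-8")))

    def __len__(self):
        return len(self._rows)
```

Usage looks like `table.add("http://example.com/", {"rank": 1})` followed by `table.get("http://example.com/")`. Whether dropping the original URL is acceptable depends on how much a rare collision would hurt.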
Indexing Engine: Sphinx
Once you get data into a system, you gotta have a way to get it out again! We tried a lot of different indexing systems, including CLucene, Solr, and MySQL full-text search, but Sphinx won out because of its indexing speed, powerful delta indexing, and lightweight, scalable server. Each of them has its own strengths, but Sphinx fit our bill the best. Whatever you are looking for always seems to be at your fingertips in the documentation, and the community is top notch.
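The idea behind delta indexing can be shown with a toy model. The Python below is not Sphinx's API or implementation, and the names `build_index` and `MainPlusDelta` are made up. It only sketches the general scheme: a large main index that is rebuilt rarely, a small delta index that holds recently added documents and is cheap to rebuild, searches that consult both, and an occasional merge that folds the delta into the main index.

```python
from collections import defaultdict


def build_index(docs):
    """Build a tiny inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


class MainPlusDelta:
    """Toy main-plus-delta index (illustrative only)."""

    def __init__(self, docs):
        self.main_docs = dict(docs)
        self.delta_docs = {}
        self.main = build_index(self.main_docs)  # expensive, rebuilt rarely
        self.delta = build_index({})

    def add(self, doc_id, text):
        # New documents only touch the small delta index.
        self.delta_docs[doc_id] = text
        self.delta = build_index(self.delta_docs)

    def search(self, term):
        term = term.lower()
        hits = self.main.get(term, set()) | self.delta.get(term, set())
        return sorted(hits)

    def merge(self):
        # Fold the delta into the main index and start a fresh delta.
        self.main_docs.update(self.delta_docs)
        self.delta_docs = {}
        self.main = build_index(self.main_docs)
        self.delta = build_index({})
```

The payoff is that adding a document costs a rebuild of the small delta only, while the full rebuild happens on whatever schedule suits the data.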
Web Server: Nginx
To continue the theme of using fast, lightweight, Russian open source projects, we went with nginx for our web server. Its solid proxying abilities let us use different app servers on the mid tier for various tasks, whether it be mongrel, merb, or a custom C search server.
Web Framework: Ruby/Rails
Ruby has a vast number of libraries and has been a very powerful tool for prototyping a lot of the algorithms and search features we research before porting them to C/C++. Ruby golf has become one of our hobbies. We use Rails throughout our webapp to provide a lot of the structure and additional features around search.
JavaScript Libraries: jQuery, Prototype, Scriptaculous
Core to our design is providing quick access to large amounts of data, then using the power of modern clients to process and filter that data inside the browser. It at least gives us a chance of scaling. JavaScript, extended via jQuery, Prototype, and Scriptaculous, gives us the tools necessary to create a unique interface, which will get a nice face lift over the next few weeks. The jQuery plugins in our posse include sparkline, Cycle Lite, and marquee, for example.
So that’s the short list, with plenty of other open source projects sprinkled in there that we will address in future articles. Is there a project we should be using in this mix? Feel free to let us know in the comments. We’ll be doing a series of posts in the coming weeks that focus on how we use these different projects in our own service, and we hope you find them useful in your own pursuits.

Labels:
open source,
software
Tuesday, October 14, 2008
Updated look
Hope you like the updated look of the front page, and the improved navigation. There are many more changes in the works!

Also, note that we've moved a bit in the direction of having more clusters. So whereas before you may have seen all political (or sports) sites in one cluster, with positioning within that cluster telling you how interrelated they were, now you're more likely to see multiple clusters for a category. It appears to be simpler to understand this way for first-time users, but let us know if you liked the old way.
Labels:
interface
Sunday, August 17, 2008
Cruxlux Search Launch
We’re happy to announce the launch of a powerful new search feature on our home page! Search for any topic and get back not only what Cruxlux users are talking about but also what blogs across the Internet are saying about it, graphically and intuitively presented.
Try it out!
Search for something that interests you, or just click on one of the spotlighted topics.
What do the boxes mean?
Each of the boxes in the map corresponds to a given site, whose name you can see in the top left corner of the box. The box also shows the title of an article that site has recently posted related to the terms you searched for. Click on a box to view more details on the article (in some cases the first few sentences) and a link to it, based on the site’s RSS feed. The closer together two boxes are in the map, the more related those two sites are. For example, let’s say your query is related to something political that’s been discussed a lot in the blogosphere recently. You’ll see in the results that sites with a liberal perspective will tend to cluster together. Conservative sites will not be as close to them, but will be closer to them than sites that don’t have a political view at all, and so on. All the blog relationships are computed automatically. Our algorithm seems to perform well, but it’s not perfect and sometimes it may have to “take a guess” when it doesn’t know much about a particular site, so there may be some queries where some boxes seem misplaced.
Focus your search by using the “Sites Like” field
You can focus your search by specifying a web site or blog that’s representative of the news sources you are interested in hearing from. For example, for a Hollywood search, focus your search by specifying your favorite gossip site.
What do the stacks mean?
Sometimes a box has more than one article from that site. You can cycle through the articles in a stack by first clicking on the stack and then using the links.
What are you waiting for?
Please enjoy using this new feature to gather a variety of current opinions on any topic of interest. And also, we hope you’ll jump into some discussions related to the topic, listed on the left side of the home page (or start a new one!) and help everyone get to the crux of any issue.

Labels:
announcements