Wrble: a community-powered and community-edited search engine
We're able to function.... GitHub, Wrble miners, edits, moderators, etc.
But one edit is just one edit, isn't it? No! We use machine learning to extend your single edit to similar queries and results, so one edit can modify thousands of queries and results.
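As a rough illustration of how a single edit might fan out to similar queries, here is a minimal sketch. It uses plain token overlap (Jaccard similarity) as a stand-in for the machine-learning model, which is not described here; the function names and the 0.5 threshold are assumptions made for this example only.

```python
def tokens(query: str) -> set[str]:
    """Lowercase a query and split it into a set of words."""
    return set(query.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two queries have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def propagate_edit(edited_query: str, known_queries: list[str],
                   threshold: float = 0.5) -> list[str]:
    """Return the known queries similar enough to inherit the edit.

    The threshold is a hypothetical tuning knob, not a Wrble setting.
    """
    edited = tokens(edited_query)
    return [q for q in known_queries
            if jaccard(edited, tokens(q)) >= threshold]


if __name__ == "__main__":
    known = ["sort python list",
             "python list sort descending",
             "banana bread recipe"]
    print(propagate_edit("python list sort", known))
```

A real system would presumably replace the word-overlap measure with a learned similarity model, but the shape stays the same: one edit goes in, a set of affected queries comes out.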
Why create a custom index?
Many large search engines don't use their own index. Yahoo, DuckDuckGo, Qwant, FindX, and many others all use Bing. Wrble creates its own index and ranking algorithms, as there would be no other way to truly power an editable search engine.
Why another search engine?
Microsoft and Google control about 99.91% of all search traffic: http://gs.statcounter.com/search-engine-market-share/all/united-states-of-america (Bing, DuckDuckGo, and Yahoo all use the same underlying index, and MSN is owned by Microsoft).
It's time for change.
Wrble is the first major attempt to create an independent index updated at the frequency required to serve relevant and timely traffic.
Key-Value Indexing vs MapReduce
We use a key-value structured index, which provides some generational improvements over the traditional MapReduce implementations (Google, Bing, etc.).
MapReduce takes a (very) large input set and re-indexes the entire thing into relatively static index files. These files are then distributed to all front-end search instances. Because the files can get large, there might be several versions of them, which are queried simultaneously and re-assembled into a final result set for a given query. The files are incredibly compact, and changing them (re-scoring, removing parts, etc.) is relatively expensive and hard to keep in sync. To compensate, a much smaller "live updated" index can be added on top to take in the latest news and updates and augment the result set.
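The batch approach above can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular engine implements it: `build_static_index` and `search` are hypothetical names, documents are plain strings, and the "live" index is just a small dictionary merged in at query time.

```python
from collections import defaultdict


def build_static_index(docs: dict[str, str]) -> dict[str, frozenset[str]]:
    """Full re-index: emit (term, doc_id) pairs, then group them by term."""
    postings: defaultdict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):   # "map" step
            postings[term].add(doc_id)           # "reduce" step
    # Frozen sets stand in for compact, read-only index files.
    return {term: frozenset(ids) for term, ids in postings.items()}


def search(term: str,
           static_segments: list[dict[str, frozenset[str]]],
           live_index: dict[str, set[str]]) -> set[str]:
    """Query every static segment plus the live index and merge the hits."""
    results: set[str] = set()
    for segment in static_segments:
        results |= segment.get(term, frozenset())
    results |= live_index.get(term, set())
    return results


if __name__ == "__main__":
    segment_a = build_static_index({"d1": "old news story", "d2": "old recipe"})
    segment_b = build_static_index({"d3": "news archive"})
    live = {"news": {"d4"}}  # fresh document not yet in any static segment
    print(sorted(search("news", [segment_a, segment_b], live)))
```

Note that changing anything inside a segment means calling `build_static_index` again over all of that segment's documents, which is the cost the paragraph above describes.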
At the end of the day, re-scoring documents in a MapReduce index is quite expensive because a full re-index is expensive. Wrble uses a key-value index shared between all indexers and searchers. This live single source of truth powers all queries without any reconstruction needed. Re-scoring a document based on an edit from one of our users or admins can be done incrementally at very low cost. This enables us to be the most reactive and "live" search engine out there, with constantly updated content and quick reactions to changes in overall relevance.
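To make the contrast concrete, here is a minimal in-memory sketch of a key-value index with incremental re-scoring. The class and method names (`KeyValueIndex`, `put`, `apply_edit`, `search`) are hypothetical and chosen for illustration; the actual storage engine, key layout, and scoring are not specified here.

```python
class KeyValueIndex:
    """Toy shared index: term -> {doc_id: score}."""

    def __init__(self) -> None:
        self._postings: dict[str, dict[str, float]] = {}

    def put(self, term: str, doc_id: str, score: float) -> None:
        """Indexer path: write or overwrite one (term, doc) score."""
        self._postings.setdefault(term, {})[doc_id] = score

    def apply_edit(self, term: str, doc_id: str, delta: float) -> None:
        """Edit path: adjust a single score in place, with no re-index."""
        docs = self._postings.setdefault(term, {})
        docs[doc_id] = docs.get(doc_id, 0.0) + delta

    def search(self, term: str) -> list[str]:
        """Searcher path: read current scores, highest first."""
        docs = self._postings.get(term, {})
        return sorted(docs, key=lambda d: docs[d], reverse=True)


if __name__ == "__main__":
    index = KeyValueIndex()
    index.put("rust", "doc_a", 1.0)
    index.put("rust", "doc_b", 2.0)
    print(index.search("rust"))           # doc_b ranks first
    index.apply_edit("rust", "doc_a", 1.5)
    print(index.search("rust"))           # doc_a ranks first after the edit
```

The edit touches exactly one key, and the next query sees the new score immediately. In this sketch that is the whole cost of a re-score, compared with rebuilding a static segment.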