Overview
SIREn is a Lucene/Solr extension for efficient schemaless semi-structured full-text search. SIREn is not a complete application by itself, but rather a code library and API that can easily be used to create a full-featured semi-structured search engine.
Efficient, large-scale handling of semi-structured data is an increasingly important issue in information search scenarios on the web as well as in the enterprise.
While Lucene has long offered such capabilities, they are not designed for collections of schemaless semi-structured documents, e.g., collections where the schema varies across documents, or collections with a complex schema and a deeply nested structure. For this reason we have developed SIREn, a Lucene/Solr plugin that overcomes these shortcomings and efficiently indexes and queries complex JSON documents with an arbitrary schema.
In terms of features, SIREn sits halfway between Solr (all of whose search features it offers) and MongoDB (since it can index arbitrary JSON documents).
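To make the idea of documents "with arbitrary schema" more concrete, here is a minimal Java sketch. It is an illustration only and does not use or reproduce the SIREn API: the class name SchemalessFlattenSketch and the flatten method are hypothetical, and the path notation is an assumption chosen for readability. It shows how two documents with unrelated, nested structures can both be reduced to path/value entries without declaring a schema up front, which is the kind of data a schemaless index has to cope with.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; this is NOT part of the SIREn API. */
public class SchemalessFlattenSketch {

    /** Flattens an already-parsed JSON value (Map, List, or scalar) into "path=value" entries. */
    public static List<String> flatten(Object node) {
        List<String> out = new ArrayList<>();
        walk("", node, out);
        return out;
    }

    private static void walk(String path, Object node, List<String> out) {
        if (node instanceof Map) {
            // Object: extend the path with each field name.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String child = path.isEmpty() ? String.valueOf(e.getKey()) : path + "." + e.getKey();
                walk(child, e.getValue(), out);
            }
        } else if (node instanceof List) {
            // Array: mark the path with "[]" and visit every element.
            for (Object item : (List<?>) node) {
                walk(path + "[]", item, out);
            }
        } else {
            // Scalar: emit one entry for the accumulated path.
            out.add(path + "=" + node);
        }
    }

    public static void main(String[] args) {
        // Two documents with different schemas; no schema is declared anywhere.
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Galway");
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "Alice");
        person.put("address", address);

        Map<String, Object> product = new LinkedHashMap<>();
        product.put("title", "Laptop");
        product.put("tags", Arrays.asList("portable", "work"));

        System.out.println(flatten(person));  // [name=Alice, address.city=Galway]
        System.out.println(flatten(product)); // [title=Laptop, tags[]=portable, tags[]=work]
    }
}
```

The sample documents are invented for this sketch. For how SIREn itself models and indexes JSON, refer to the project's Java Documentation and publications mentioned below.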
Learn more about SIREn
You can find a detailed description of SIREn's architecture and API in the Java Documentation of the project. You can also watch a recent talk about SIREn at the Lucene Revolution 2013 conference.
Finally, you can read one of our scientific publications for details about the data model and algorithms behind SIREn.
Reference
If you are using SIREn for your scientific work, please cite the following article:
Renaud Delbru, Stephane Campinas, Giovanni Tummarello. Searching web data: An entity retrieval and high-performance indexing model. In Web Semantics: Science, Services and Agents on the World Wide Web, ISSN 1570-8268, DOI 10.1016/j.websem.2011.04.004.
Community
Mailing List
SIREn-User is a mailing list that you can join to seek help, discuss possible improvements, etc.
License
SIREn is open source under the Apache License, Version 2.0.
Issues
You can report issues at the GitHub issue tracker.
Acknowledgements
The SIREn project is based upon work supported by:
- the European FP7 Okkam (GA 215032) and LOD2 (257943) projects,
- the SFI-funded project Lion2 under Grant No. SFI/08/CE/I1380,
- the Irish Research Council for Science, Engineering and Technology.