Advanced search queries with Spring Data Solr

The Spring Data family of projects provides a great way to interact with underlying data stores in a consistent, well-structured fashion. We've been using Spring Data MongoDB and Spring Data Solr extensively.

Two of Spring Data's core strengths are its ability to map POJOs to and from their backing-store representations with little effort from the developer, and its consistent approach to CRUD operations, paging, and sorting.

However, some of these projects don't yet expose all of the underlying platform's advanced features. In particular, Spring Data Solr lacks support for Apache Solr's more advanced relevancy features.

In this article, we'll examine a technique for executing a Solr Extended DisMax (eDisMax) query while still benefiting from Spring Data's paging and type conversion facilities.

The Extended DisMax Query Parser (eDisMax) is a robust parser designed to process advanced user input directly. It builds on the original DisMax query parser (DisMaxQParserPlugin) but adds many features. It searches for the query words across multiple fields, giving each field a different boost based on its significance. Additional options let you influence the score with rules specific to each use case, independent of user input. The DisMax page of the Solr Reference Guide has more background on the parser's conceptual origins and behavior.
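To make these parameters concrete, here is a minimal, JDK-only sketch (not the SolrJ API) that assembles the query string of an eDisMax request to Solr's /select handler. The `EdismaxParams` helper is hypothetical and the boost values are illustrative; `defType`, `q`, `qf` (per-field boosts written as `field^boost`) and `mm` (minimum should match) are standard eDisMax parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: formats eDisMax request parameters; it does not contact Solr.
final class EdismaxParams {

    // Renders field boosts in Solr's "qf" syntax, e.g. "title^20.0 summary^2.0".
    static String queryFields(Map<String, Double> boosts) {
        StringJoiner qf = new StringJoiner(" ");
        boosts.forEach((field, boost) -> qf.add(field + "^" + boost));
        return qf.toString();
    }

    // URL-encodes each parameter as key=value and joins them with '&'.
    static String toQueryString(Map<String, String> params) {
        StringJoiner out = new StringJoiner("&");
        params.forEach((k, v) -> out.add(
                URLEncoder.encode(k, StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("title", 20.0);   // matches in the title count most
        boosts.put("summary", 2.0);  // summary matches count somewhat

        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "edismax");       // select the eDisMax parser
        params.put("q", "spring data solr");    // raw user input
        params.put("qf", queryFields(boosts));  // fields to search, with boosts
        params.put("mm", "2");                  // at least two query terms must match

        System.out.println("/select?" + toQueryString(params));
        // /select?defType=edismax&q=spring+data+solr&qf=title%5E20.0+summary%5E2.0&mm=2
    }
}
```

Keeping the boosts in an ordered map makes it easy to tune field weights per use case without touching the query text the user typed.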

To start, we'll define our interface:

import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;

public interface SearchService {
  /**
   * Returns a page of search documents matching the given terms.
   */
  Page<SearchDocument> search(String terms, Pageable page);
}

From our API users' perspective, they simply want to pass in some search terms and get back a page of documents relevant to their query.

Page and Pageable are Spring Data conventions for handling page requests and results consistently across backing stores. We use them here instead of exposing Solr's own pagination mechanism, so consumers of our API don't need to know that our implementation talks to Solr directly. Furthermore, once Spring Data Solr supports advanced relevancy features, we can simply rewrite our implementation without impacting our public API.
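Behind the abstraction, a page request still has to become Solr's `start` (zero-based offset of the first hit) and `rows` (maximum hits returned) parameters, which is what our implementation derives from `page.getOffset()` and `page.getPageSize()`. The arithmetic is a simple product. The sketch below uses a hypothetical `PageWindow` record in place of Spring Data's `Pageable` (whose `PageRequest` also numbers pages from zero) so it runs on the JDK alone.

```java
// Hypothetical stand-in for Spring Data's Pageable: zero-based page number and page size.
record PageWindow(int pageNumber, int pageSize) {

    PageWindow {
        if (pageNumber < 0) throw new IllegalArgumentException("pageNumber must be >= 0");
        if (pageSize < 1) throw new IllegalArgumentException("pageSize must be >= 1");
    }

    // Solr "start": index of the first document on this page.
    long start() {
        return (long) pageNumber * pageSize;
    }

    // Solr "rows": how many documents to return for this page.
    int rows() {
        return pageSize;
    }

    public static void main(String[] args) {
        PageWindow third = new PageWindow(2, 10); // third page of ten results
        System.out.println("start=" + third.start() + "&rows=" + third.rows()); // start=20&rows=10
    }
}
```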

SearchDocument is simply our POJO representing a document stored in the Solr index. It could be defined as:

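import java.io.Serializable;

import org.apache.solr.client.solrj.beans.Field;
import org.springframework.data.solr.core.mapping.SolrDocument;

@SolrDocument
public class SearchDocument implements Serializable {
    private static final long serialVersionUID = 1L;
    private @Field String id;
    private @Field String title;
    private @Field String summary;
    // getters and setters omitted for brevity
}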

See the documentation for more information on defining entity beans and for an explanation of the annotations.
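Conceptually, annotation-driven mapping copies each Solr field into the annotated Java field of the same name. The following sketch imitates that with plain reflection. The `SolrField` annotation, `Doc` class and `MiniBinder.bind` method are hypothetical stand-ins for SolrJ's `@Field` and its document binder; real binders handle far more (renamed fields, multi-valued fields, type conversion).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical marker annotation standing in for SolrJ's @Field.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SolrField {}

// Hypothetical POJO shaped like SearchDocument.
class Doc {
    @SolrField String id;
    @SolrField String title;
    @SolrField String summary;
    String notIndexed; // not annotated, so the binder ignores it
}

final class MiniBinder {

    // Copies values from a Solr-like document (field name -> value) into the
    // annotated fields with the same name; everything else is left untouched.
    static <T> T bind(Map<String, ?> solrDoc, Class<T> type) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (Field f : type.getDeclaredFields()) {
                if (f.isAnnotationPresent(SolrField.class) && solrDoc.containsKey(f.getName())) {
                    f.setAccessible(true);
                    f.set(bean, solrDoc.get(f.getName()));
                }
            }
            return bean;
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("Cannot bind " + type.getName(), ex);
        }
    }

    public static void main(String[] args) {
        Doc d = bind(Map.of("id", "42", "title", "eDisMax", "notIndexed", "x"), Doc.class);
        System.out.println(d.id + " " + d.title + " " + d.notIndexed); // 42 eDisMax null
    }
}
```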

Here's our SearchService implementation:

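import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.springframework.dao.DataAccessException;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.solr.UncategorizedSolrException;
import org.springframework.data.solr.core.SolrTemplate;
import org.springframework.data.solr.core.query.result.SolrResultPage;
import org.springframework.util.Assert;

public final class SolrSearchService implements SearchService {

  private final SolrTemplate solrTemplate;

  public SolrSearchService(final SolrTemplate solrTemplate) {
    this.solrTemplate = solrTemplate;
  }

  @Override
  public Page<SearchDocument> search(final String terms, final Pageable page) {

    Assert.notNull(terms, "Terms cannot be null!");
    final SolrQuery query = new SolrQuery(); // 1

    // 2 edismax query setup: search title, summary and author, weighting title highest
    query.set("q", terms);
    query.set("qf", "title^20.0 summary^2.0 author");
    query.set("defType", "edismax");

    // 3 paging: derive Solr's start/rows from the Pageable
    query.setStart(page.getOffset());
    query.setRows(page.getPageSize());

    try {
      return execute(query, page); // 4
    } catch (SolrServerException ex) {
      // 5 translateExceptionIfPossible() returns null when it cannot translate,
      // so fall back to an uncategorized exception instead of throwing null
      final DataAccessException translated = solrTemplate.getExceptionTranslator()
          .translateExceptionIfPossible(new RuntimeException(ex.getMessage(), ex));
      throw translated != null ? translated : new UncategorizedSolrException(ex.getMessage(), ex);
    }
  }

  private Page<SearchDocument> execute(final SolrQuery query, final Pageable page) throws SolrServerException {
    final QueryResponse resp = solrTemplate.getSolrServer().query(query);
    final List<SearchDocument> beans = solrTemplate.convertQueryResponseToBeans(resp, SearchDocument.class); // 6
    return new SolrResultPage<SearchDocument>(beans, page, resp.getResults().getNumFound()); // 7
  }
}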

Here's what we did in our implementation:
1. Created a native SolrQuery
2. Set up an edismax query
3. Applied the paging request
4. Called our private execute() helper and returned its result
5. Translated any SolrServerException into Spring's DataAccessException hierarchy
6. Used SolrTemplate to convert the native response to POJOs
7. Created a SolrResultPage to return
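Step 7 relies on Solr's `numFound`, the total hit count across all pages, so the returned page can answer questions such as how many pages exist or whether another one follows. A minimal sketch of that bookkeeping, using a hypothetical `ResultWindow` class rather than Spring Data's `SolrResultPage`:

```java
import java.util.List;

// Hypothetical, simplified analogue of a Spring Data Page built from one Solr response.
final class ResultWindow<T> {
    private final List<T> content;  // documents returned for this page
    private final int pageNumber;   // zero-based page index
    private final int pageSize;     // requested rows per page
    private final long numFound;    // total matches reported by Solr

    ResultWindow(List<T> content, int pageNumber, int pageSize, long numFound) {
        this.content = List.copyOf(content);
        this.pageNumber = pageNumber;
        this.pageSize = pageSize;
        this.numFound = numFound;
    }

    // Pages needed to show every match (ceiling division).
    int totalPages() {
        return (int) ((numFound + pageSize - 1) / pageSize);
    }

    // True when at least one more page follows this one.
    boolean hasNext() {
        return pageNumber + 1 < totalPages();
    }

    List<T> content() {
        return content;
    }

    public static void main(String[] args) {
        ResultWindow<String> page = new ResultWindow<>(List.of("a", "b"), 2, 10, 22);
        System.out.println(page.totalPages() + " " + page.hasNext()); // 3 false
    }
}
```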

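However, this can be simplified further with the SolrTemplate.execute() method, which hands our callback the underlying SolrServer and takes care of exception translation for us: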

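import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.DisMaxParams;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.solr.core.SolrCallback;
import org.springframework.data.solr.core.SolrTemplate;
import org.springframework.data.solr.core.query.result.SolrResultPage;
import org.springframework.util.Assert;

public class SolrExtendedDisMaxSearchService implements SearchService {

  private final SolrTemplate solrTemplate;

  public SolrExtendedDisMaxSearchService(final SolrTemplate solrTemplate) {
    this.solrTemplate = solrTemplate;
  }

  @Override
  public Page<SearchDocument> search(final String terms, final Pageable page) {

    Assert.notNull(terms, "Terms cannot be null!");
    final SolrQuery query = new SolrQuery();

    // edismax query setup
    query.set("q", terms);
    query.set(DisMaxParams.QF, "title^20.0 summary^2.5");
    query.set("defType", "edismax");

    // paging
    query.setStart(page.getOffset());
    query.setRows(page.getPageSize());

    // SolrTemplate supplies the SolrServer and translates exceptions thrown by the callback
    return solrTemplate.execute(new SolrCallback<Page<SearchDocument>>() {
      @Override
      public Page<SearchDocument> doInSolr(SolrServer solr) throws SolrServerException, IOException {
        final QueryResponse resp = solr.query(query);
        final List<SearchDocument> beans = solrTemplate.convertQueryResponseToBeans(resp, SearchDocument.class);
        return new SolrResultPage<SearchDocument>(beans, page, resp.getResults().getNumFound());
      }
    });
  }
}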
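The second version is shorter because SolrTemplate.execute() follows Spring's template/callback pattern: the template owns the connection and the error handling, and the callback contains only the query logic. The sketch below reproduces that pattern with hypothetical `MiniTemplate`, `Callback` and `DataAccessFailure` types and a fake string "connection"; it illustrates the idea and is not Spring's implementation.

```java
import java.io.IOException;

// Hypothetical callback interface, similar in shape to SolrCallback.
@FunctionalInterface
interface Callback<C, T> {
    T doWith(C connection) throws Exception;
}

// Hypothetical unchecked exception that checked failures are translated into.
class DataAccessFailure extends RuntimeException {
    DataAccessFailure(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical template: supplies the connection and centralizes error handling.
final class MiniTemplate<C> {
    private final C connection;

    MiniTemplate(C connection) {
        this.connection = connection;
    }

    <T> T execute(Callback<C, T> action) {
        try {
            return action.doWith(connection);
        } catch (RuntimeException ex) {
            throw ex; // already unchecked: propagate unchanged
        } catch (Exception ex) {
            // checked exceptions are wrapped so callers need no try/catch
            throw new DataAccessFailure("Callback failed: " + ex.getMessage(), ex);
        }
    }

    public static void main(String[] args) {
        MiniTemplate<String> template = new MiniTemplate<>("fake-solr-connection");

        // The callback holds only the "query" logic.
        int length = template.execute(conn -> conn.length());
        System.out.println(length); // 20

        try {
            template.execute(conn -> { throw new IOException("server unavailable"); });
        } catch (DataAccessFailure ex) {
            System.out.println(ex.getMessage()); // Callback failed: server unavailable
        }
    }
}
```

Because translation lives in one place, every callback gets the same exception semantics, which is exactly the try/catch the first implementation had to write by hand.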

Full source code for this post is available as a GitHub Gist.

