Search Extractors

Use search extractors to add custom fields to Confluence's search index so that you can find pages that meet specific criteria. For example:

  • Search for all home pages
  • Search for pages with large attachments
  • Search for all pages with a specific label
  • Search for all pages last modified by a specific user
  • Search for pages created in a specific year

Read more about extractors in the Confluence Extractor Module documentation.

To create a search extractor, follow these steps: 

  1. Navigate to Administration > Search Extractors > Create Search Extractor.
  2. Enter a Name for your search extractor. 
  3. Enter an optional Note to describe your search extractor. 
  4. Enter a script by doing one of the following: 
    • For existing code, enter a file name in the Script File field.
    • For new code, enter it in the Inline Script field.

      Select Show Snippets to view examples of code.

  5. Select Add to save the new search extractor.

Example Extractors

All of the following examples are available under Administration > Search Extractors > Create Search Extractor > Show Snippets.

Search Page By Year Extractor

This extractor indexes the year in which each page was created, so that you can search for all pages created in a given year.

import com.atlassian.confluence.pages.Page
import org.apache.lucene.document.Field
import org.apache.lucene.document.StringField

if (searchable instanceof Page) {
    Page page = searchable as Page
    Calendar calendar = Calendar.getInstance() // <1>
    calendar.setTime(page.getCreationDate()) // <2>
    String pageYear = calendar.get(Calendar.YEAR) as String // <3>
    document.add(new StringField("year", pageYear, Field.Store.YES)) // <4>
}

Line 7: Create an instance of Calendar.

Line 8: Set the time of our calendar to match the page’s creation date.

Line 9: Get the year the page was created.

Line 10: Store the year as a field in the Lucene document.
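
The year logic can be tried outside Confluence before saving the extractor. The following is a minimal standalone sketch, not part of the shipped snippet: yearOf is a hypothetical helper name, and the sample date is a made-up stand-in for page.getCreationDate().

```groovy
// Hypothetical helper mirroring the extractor's year logic; not a ScriptRunner API.
String yearOf(Date date) {
    Calendar calendar = Calendar.getInstance()   // uses the JVM's default time zone
    calendar.setTime(date)
    return calendar.get(Calendar.YEAR) as String // the indexed value is a string, e.g. "2017"
}

// Made-up stand-in for page.getCreationDate(): 15 March 2017 in the default time zone.
Calendar sample = Calendar.getInstance()
sample.clear()
sample.set(2017, Calendar.MARCH, 15)

println yearOf(sample.getTime())
```

Because Calendar.getInstance() uses the server's default time zone, a page created close to midnight on 31 December may be indexed under the neighboring year from the point of view of a user in another time zone.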

The following screenshot shows an example search result for year : 2017:

Pages With Label Extractor

This extractor lets you search for all pages that have the finance label.

import com.atlassian.confluence.labels.Label
import com.atlassian.confluence.pages.Page
import org.apache.lucene.document.Field
import org.apache.lucene.document.StringField

if (searchable instanceof Page) {
    Page page = (Page) searchable
    String labelText = "finance"
    Label myLabel = new Label(labelText) // <1>
    List<Label> labels = page.getLabels()
    if (labels.contains(myLabel)) {
        document.add(new StringField("label", labelText, Field.Store.YES)) // <2>
    }
}


Line 9: Create a new Label, "finance".

Line 12: If the page has the "finance" label, store that as a field in the Lucene document.

The search string for this extractor is label : finance.

Pages With Attachments Size Extractor

This extractor lets you search for all pages whose attachments total more than 20 MB in size.

import com.atlassian.confluence.pages.Attachment
import com.atlassian.confluence.pages.Page
import org.apache.lucene.document.Field
import org.apache.lucene.document.StringField

if (searchable instanceof Page) {
    Page page = searchable as Page
    if (page.getAttachments()) {
        long twenty_meg = 20 * 1024 * 1024 // <1>
        long fileSize = page.getAttachments().sum { Attachment attachment -> attachment.getFileSize() } as long // <2>

        if (fileSize && fileSize > twenty_meg) {
            document.add(new StringField("attachment", "20", Field.Store.YES)) // <3>
        }
    }
}

Line 9: Calculate 20 megabytes as bytes.

Line 10: Sum the sizes of all the page's attachments to get the total size in bytes.

Line 13: If the total attachment size is more than 20 MB, store an attachment field with the value 20 for the page.

The search string for this extractor is attachment : 20.
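
Note that the check is applied to the combined size of a page's attachments, not to each attachment on its own. The following standalone sketch illustrates that behavior with plain numbers; exceedsThreshold is a hypothetical helper, not part of ScriptRunner or Confluence, and the sizes are made up.

```groovy
// Hypothetical helper: true when the combined size is strictly above the threshold.
boolean exceedsThreshold(List<Long> fileSizes, long thresholdBytes) {
    // sum() returns null for an empty list, so fall back to 0
    long total = (fileSizes.sum() ?: 0L) as long
    return total > thresholdBytes
}

long twentyMeg = 20L * 1024 * 1024 // 20 MB expressed in bytes

// Two attachments of 15 MB and 6 MB: neither is over 20 MB, but the total is.
assert exceedsThreshold([15L * 1024 * 1024, 6L * 1024 * 1024], twentyMeg)

// Exactly 20 MB is not "more than" 20 MB.
assert !exceedsThreshold([20L * 1024 * 1024], twentyMeg)

// A page without attachments never matches.
assert !exceedsThreshold([], twentyMeg)
```

If you need to find pages with a single large attachment instead, one option is to compare the largest attachment size rather than the sum.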

Page Last Modified By Extractor

This extractor finds all the pages that were last modified by a specific user.

import com.atlassian.confluence.pages.Page
import org.apache.lucene.document.Field
import org.apache.lucene.document.StringField

if (searchable instanceof Page) {
    Page page = searchable as Page
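    String name = page.getLastModifier()?.getName() // <1>
    if (name) {
        document.add(new StringField("modifier", name, Field.Store.YES)) // <2>
    }
}

Line 7: Get the name of the last modifier. The safe-navigation operator (?.) yields null instead of throwing an exception if the page has no last modifier.

Line 9: Store the modifier field with the username as its value, skipping pages where no name is available.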

Use the Confluence username to search. For example, if the username is "rfranco", the search string is modifier : rfranco.