The highlight module contains classes and functions for displaying short excerpts from hit documents in the search results you present to the user, with query terms highlighted.
The highlighting system has four main elements.
See How to create highlighted search result excerpts for more information.
Yields Fragment objects based on the text and the matched terms.
Yields Fragment objects based on the tokenized text.
Returns True if this fragmenter requires retokenized text.
If this method returns True, the fragmenter’s fragment_tokens method will be called with an iterator of ALL tokens from the text, with the tokens for matched terms having the matched attribute set to True.
If this method returns False, the fragmenter’s fragment_matches method will be called with a LIST of matching tokens.
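The two code paths can be illustrated with a toy fragmenter in plain Python (this is a simplified stand-in, not the real Whoosh base class; the `Token` class here is a minimal imitation of `whoosh.analysis.Token`):

```python
from dataclasses import dataclass

@dataclass
class Token:
    # Minimal stand-in for whoosh.analysis.Token: the term text, its
    # character offsets, and whether it matched a query term
    text: str
    startchar: int
    endchar: int
    matched: bool = False

class EveryMatchFragmenter:
    """Toy fragmenter that emits one (startchar, endchar) pair per match."""

    def must_retokenize(self):
        # False: this fragmenter only needs the matched tokens, so the
        # highlighter would call fragment_matches() rather than
        # fragment_tokens() with the full token stream
        return False

    def fragment_matches(self, text, matched_tokens):
        for t in matched_tokens:
            yield (t.startchar, t.endchar)

text = "alpha beta gamma"
matches = [Token("beta", 6, 10, matched=True)]
frags = list(EveryMatchFragmenter().fragment_matches(text, matches))
print(frags)  # [(6, 10)]
```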
Doesn’t fragment the token stream. This object just returns the entire stream as one “fragment”. This is useful if you want to highlight the entire text.
Note that even if you use the WholeFragmenter, the highlight code will return no fragment if no terms matched in the given field. To return the whole fragment even in that case, call highlights() with minscore=0:
# Query where no terms match in the "text" field
q = query.Term("tag", "new")
r = mysearcher.search(q)
r.fragmenter = highlight.WholeFragmenter()
r.formatter = highlight.UppercaseFormatter()
# Since no terms in the "text" field matched, we get no fragments back
assert r[0].highlights("text") == ""
# If we lower the minimum score to 0, we get a fragment even though it
# has no matching terms
assert r[0].highlights("text", minscore=0) == "This is the text field."
Breaks the text up on sentence-end punctuation characters (".", "!", or "?"). This object works by looking in the original text for a sentence end as the next character after each token’s endchar.
When highlighting with this fragmenter, you should use an analyzer that does NOT remove stop words, for example:
sa = StandardAnalyzer(stoplist=None)
Parameters: maxchars – The maximum number of characters allowed in a fragment.
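The splitting rule described above (a sentence ends where ".", "!", or "?" is followed by whitespace) can be sketched in plain Python. This is only an illustration of the idea, not the actual SentenceFragmenter implementation:

```python
import re

def split_sentences(text):
    # Split after '.', '!', or '?' when followed by whitespace; this
    # mirrors the idea of scanning for sentence-end punctuation
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

text = "This is one. Is this two? Yes! And a third."
print(split_sentences(text))
# ['This is one.', 'Is this two?', 'Yes!', 'And a third.']
```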
Looks for matched terms and aggregates them with their surrounding context.
This is a NON-RETOKENIZING fragmenter. It builds fragments from the positions of the matched terms.
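Both of the fragmenters above build fragments around the positions of matched terms. The core idea, expanding a matched character range by some amount of surrounding context, can be sketched as follows (an illustration only, not Whoosh's implementation; the `surround` name is borrowed as an assumption):

```python
def context_span(text, startchar, endchar, surround=20):
    # Expand a matched (startchar, endchar) range by up to `surround`
    # characters on each side, clamped to the bounds of the text
    start = max(0, startchar - surround)
    end = min(len(text), endchar + surround)
    return text[start:end]

text = "x" * 100 + " needle " + "y" * 100
pos = text.index("needle")
print(context_span(text, pos, pos + len("needle"), surround=5))
# 'xxxx needle yyyy'
```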
Returns a string in which the matched terms are in UPPERCASE.
Parameters: between – the text to add between fragments.
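What a formatter like this does can be sketched as uppercasing each matched character range in the fragment text (a simplified illustration, not UppercaseFormatter's actual code):

```python
def uppercase_matches(text, spans):
    # spans: ordered (startchar, endchar) ranges of matched terms;
    # each matched range is replaced with its uppercase form
    out = []
    last = 0
    for start, end in spans:
        out.append(text[last:start])
        out.append(text[start:end].upper())
        last = end
    out.append(text[last:])
    return "".join(out)

print(uppercase_matches("the quick brown fox", [(4, 9), (16, 19)]))
# 'the QUICK brown FOX'
```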
Returns a string containing HTML formatting around the matched terms.
This formatter wraps matched terms in an HTML element with two class names. The first class name (set with the constructor argument classname) is the same for each match. The second class name (set with the constructor argument termclass) is different depending on which term matched. This allows you to give different formatting (for example, different background colors) to the different terms in the excerpt.
>>> hf = HtmlFormatter(tagname="span", classname="match", termclass="term")
>>> hf(mytext, myfragments)
'The <span class="match term0">template</span> <span class="match term1">geometry</span> is...'
This object maintains a dictionary mapping terms to HTML class names (e.g. term0 and term1 above), so that multiple excerpts will use the same class for the same term. If you want to re-use the same HtmlFormatter object with different searches, you should call HtmlFormatter.clear() between searches to clear the mapping.
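The term-to-class bookkeeping described above can be sketched like this (a hypothetical helper showing the mapping behavior, not HtmlFormatter itself):

```python
class TermClassMapper:
    # Assigns a stable class name (term0, term1, ...) to each distinct
    # term, so every excerpt styles the same term the same way
    def __init__(self, termclass="term"):
        self.termclass = termclass
        self.seen = {}

    def class_for(self, term):
        if term not in self.seen:
            self.seen[term] = "%s%d" % (self.termclass, len(self.seen))
        return self.seen[term]

    def clear(self):
        # Analogous to calling HtmlFormatter.clear() between searches:
        # forget the mapping so class names are reassigned from term0
        self.seen = {}

m = TermClassMapper()
print(m.class_for("template"), m.class_for("geometry"), m.class_for("template"))
# term0 term1 term0
```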
Returns a Genshi event stream containing HTML formatting around the matched terms.
Represents a fragment (extract) from a hit document. This object is mainly used to keep track of the start and end points of the fragment and the “matched” character ranges inside; it does not contain the text of the fragment or do much else.
The useful attributes are startchar and endchar (the character offsets of the fragment within the original text) and matches (an ordered list of analysis.Token objects for the matched terms inside the fragment).