Consider what `to_tsvector` produces for a sample phrase:

```sql
SELECT * FROM unnest(to_tsvector('english',
  'I''m going to make him an offer he can''t refuse.'));
```

The result is a table with `lexeme | positions | weights` columns (6 rows in total). As you can see, stop words like "I", "to" or "an" are removed, because they are too common to be useful for search. The words are normalized and reduced to their root (e.g. "refuse" and "Refusing" are both transformed into "refus"). For each word, the positions in the original phrase are recorded (e.g. "refus" is the 12th and the 13th word in the text), as are the weights, which are useful for ranking and which we'll discuss later.

In the example above, the transformation rules from words to lexemes are based on the `english` search configuration. Running the same query with the `simple` search configuration results in a tsvector that includes all the words as they were found in the text.

So far, we've seen how tsvector and tsquery can match search queries. However, for a good search experience, it is important to show the best results first, meaning that the results need to be sorted by relevancy. PostgreSQL provides two predefined ranking functions, which take into account lexical, proximity, and structural information; that is, they consider how often the query terms appear in the document, how close together the terms are in the document, and how important the part of the document where they occur is. However, the concept of relevancy is vague and very application-specific. Different applications might require additional information for ranking, e.g. document modification time. The built-in ranking functions are only examples; you can write your own ranking functions and/or combine their results with additional factors to fit your specific needs.

The two ranking functions mentioned are ts_rank and ts_rank_cd. The difference between them is that while they both take into account the frequency of the term, ts_rank_cd also takes into account the proximity of matching lexemes to each other. To use them in a query, you can do something like this:

```sql
SELECT title,
       ts_rank(search, websearch_to_tsquery('english', 'darth vader')) AS rank
FROM movies
WHERE search @@ websearch_to_tsquery('english', 'darth vader')
ORDER BY rank DESC
LIMIT 10;
```

```
                      title                       |    rank
--------------------------------------------------+------------
 The Empire Strikes Back                          | 0.26263964
 Star Wars Episode IV: A New Hope (aka Star Wars) | 0.18902963
 Star Wars: Episode III – Revenge of the Sith     | 0.10292397
 Rogue One: A Star Wars Story (film)              | 0.…
```

One thing to note about ts_rank is that it needs to access the search column for each result. This means that if the WHERE condition matches a lot of rows, PostgreSQL needs to visit them all in order to do the ranking, and that can be slow. To exemplify, the above query returns in 5-7 ms on my computer. If I modify the query to search for darth OR vader instead, it returns in about 80 ms, because there are now over 1000 matching results that need ranking and sorting.

While relevancy based on word frequency is a good default for search sorting, quite often the data contains important indicators that are more relevant than simply the frequency.
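As a sketch of how such an indicator could be folded into the sort order, the query below blends the ts_rank score with a hypothetical `popularity` column (not part of the schema shown in this article) using an ad-hoc weighted sum; the column name and the 70/30 weights are assumptions for illustration, not a prescribed formula:

```sql
-- Hypothetical: assumes movies has a numeric popularity column scaled to 0..1.
-- Blend text relevancy with popularity; tune the weights for your application.
SELECT title,
       0.7 * ts_rank(search, websearch_to_tsquery('english', 'darth vader'))
       + 0.3 * popularity AS rank
FROM movies
WHERE search @@ websearch_to_tsquery('english', 'darth vader')
ORDER BY rank DESC
LIMIT 10;
```

Because the combined expression is computed per matching row, the same performance caveat applies as with plain ts_rank: every row matched by the WHERE clause must be visited before sorting.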