array stemming( string sText, string sLang="en" );
The function performs morphological analysis (stemming) of a text and returns an array of invariable word stems.
First, the analyzer for the specified language (the sLang parameter) is located and loaded:
- An attempt is made to include the file $_SERVER["DOCUMENT_ROOT"]."/bitrix/php_interface/".$sLang."/search/stemming.php"; if it defines the function "stemming_".$sLang, the search stops there.
- Otherwise, an attempt is made to load the analyzer from the set supplied with the search module.
- If this also fails, the default analyzer Stemming_default is used.
One of these files is also expected to define the stop-word function "stemming_stop_".$sLang. If none does, the default function Stemming_stop_default is used to determine which words are stop words.
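The lookup order above can be sketched in plain PHP as follows. This is a simplified illustration, not the search module's own code: the helper `find_language_functions()` and its `$moduleFiles` parameter (a stand-in for the analyzer files bundled with the search module, whose paths are not given here) are hypothetical. The sketch only reports which function names would be chosen; it does not call them.

```php
<?php
// Hypothetical sketch of the lookup order: returns the names of the stemming
// and stop-word functions that would be used for $lang. $moduleFiles stands in
// for the analyzer files supplied with the search module (assumed, not real paths).
function find_language_functions(string $lang, array $moduleFiles = []): array
{
    $stemFunc = "stemming_" . $lang;
    $stopFunc = "stemming_stop_" . $lang;

    // Candidate files, in order: the site-specific file first, then the module's set.
    $docRoot = $_SERVER["DOCUMENT_ROOT"] ?? "";
    $candidates = array_merge(
        [$docRoot . "/bitrix/php_interface/" . $lang . "/search/stemming.php"],
        $moduleFiles
    );

    foreach ($candidates as $file) {
        if (is_file($file)) {
            include_once $file;
        }
        if (function_exists($stemFunc)) {
            break; // stop as soon as stemming_<lang>() is available
        }
    }

    // Fall back to the default functions when no language-specific one exists.
    return [
        "stem" => function_exists($stemFunc) ? $stemFunc : "Stemming_default",
        "stop" => function_exists($stopFunc) ? $stopFunc : "Stemming_stop_default",
    ];
}

// For a language with no analyzer installed, both defaults are chosen.
print_r(find_language_functions("xx"));
```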
Next, the text is converted to uppercase and split into separate words by removing all characters that do not belong to the language alphabet. The alphabet is made up of:
- all characters returned by the Stemming_letter_default function;
- the characters returned by the "stemming_letter_".$sLang function, if it is defined;
- the characters specified in the corresponding search module setting.
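The alphabet assembly and word splitting can be sketched as below, assuming UTF-8 text and the mbstring extension. The helper `split_words()` is hypothetical; its three alphabet arguments merely stand in for the output of Stemming_letter_default, "stemming_letter_".$sLang and the module setting, and the Latin-only alphabet in the usage line is an assumption, not what Stemming_letter_default actually returns.

```php
<?php
// Hypothetical sketch: build the alphabet, uppercase the text and split it
// into words. The three alphabet arguments stand in for Stemming_letter_default(),
// "stemming_letter_".$sLang and the search module setting.
function split_words(string $text, string $defaultLetters, string $langLetters = "", string $settingLetters = ""): array
{
    // Uppercase the alphabet too, so it matches the uppercased text.
    $alphabet = mb_strtoupper($defaultLetters . $langLetters . $settingLetters, "UTF-8");
    $text = mb_strtoupper($text, "UTF-8");

    // Every run of characters outside the alphabet acts as a word separator.
    $pattern = "/[^" . preg_quote($alphabet, "/") . "]+/u";
    return preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Assumed Latin-only alphabet, for illustration.
$words = split_words("Zebras love other zebras, not elephants!", "ABCDEFGHIJKLMNOPQRSTUVWXYZ");
print_r($words);
```

Adding a character to any of the three sources (for example "-" through the module setting) keeps it inside words instead of treating it as a separator.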
The invariable stem of each word is then determined with the stemming function found earlier. If the word is not a stop word, its stem is added to the array returned as the result.
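The per-word loop can be sketched as follows. `count_stems()` is a hypothetical helper, and the toy stemmer (which only strips a trailing "S") and toy stop-word list are placeholders for the language functions found earlier, so the stems differ from those of the module's English analyzer (which, per the example of use, yields "LOV" rather than "LOVE"). Checking the stop-word list against the stem rather than the original word is also an assumption of this sketch.

```php
<?php
// Hypothetical sketch of the final step: stem each word, skip stop words,
// and count how often each stem occurs.
function count_stems(array $words, callable $stem, callable $isStopWord): array
{
    $result = [];
    foreach ($words as $word) {
        $s = $stem($word);
        if ($isStopWord($s)) { // assumption: the stop check is applied to the stem
            continue;
        }
        $result[$s] = ($result[$s] ?? 0) + 1;
    }
    return $result;
}

// Toy stand-ins, NOT the module's English analyzer: strip a trailing "S"
// and treat a fixed list as stop words.
$toyStem = fn(string $w): string => preg_replace('/S$/', '', $w);
$toyStop = fn(string $s): bool => in_array($s, ["OTHER", "NOT"], true);

$words = ["ZEBRAS", "LOVE", "OTHER", "ZEBRAS", "NOT", "ELEPHANTS"];
print_r(count_stems($words, $toyStem, $toyStop));
```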
|Parameter|Description|
|---|---|
|sText|Text to be analyzed (stemmed).|
|sLang|Language ID of the text. Optional; defaults to "en".|
The function returns an associative array whose keys are the invariable word stems and whose values are their frequencies, i.e. how many times each stem occurs in the text.
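Because the result is an ordinary PHP associative array, it can be processed with standard array functions. The sketch below hard-codes the result from the example of use instead of calling stemming(), so it runs without Bitrix; it sums the frequencies and picks the most frequent stem.

```php
<?php
// Result of the example of use, written out by hand:
// keys are word stems, values are their frequencies.
$stems = ["ZEBRA" => 2, "LOV" => 1, "ELEPHANT" => 1];

// Total number of counted (non-stop) words.
$total = array_sum($stems);

// Order stems from most to least frequent, keeping the keys.
arsort($stems);
$top = array_key_first($stems);

printf("%d words, most frequent stem: %s\n", $total, $top);
```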
Examples of use
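<?php
// stemming() is provided by the search module, so include the module first
if (CModule::IncludeModule("search"))
{
    $ar = stemming("Zebras love other zebras, not elephants!", "en");
    print_r($ar);
    /* Resulting array:
    array(
        "ZEBRA" => 2,
        "LOV" => 1,
        "ELEPHANT" => 1,
    )
    */
}
?>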