

{"id":121234,"date":"2022-05-05T01:34:25","date_gmt":"2022-05-04T20:04:25","guid":{"rendered":"http:\/\/analyticstraining.com\/?p=5704"},"modified":"2022-07-14T15:00:40","modified_gmt":"2022-07-14T09:30:40","slug":"stringi-package-r","status":"publish","type":"post","link":"https:\/\/www.jigsawacademy.com\/stringi-package-r\/","title":{"rendered":"Stringi Package in R"},"content":{"rendered":"<p align=\"justify\">While base R and\u00a0stringr\u00a0functions are good only for simple text processing, we need better packages for more complex problems such as natural language processing.<\/p>\n<p style=\"text-align: left;\" align=\"justify\">Hence, stringi\u00a0<strong>is a package which provides<\/strong> replacements for nearly all the\u00a0<strong>character string processing functions<\/strong>\u00a0known from base R. It also provides\u00a0<strong>high performance<\/strong>\u00a0and\u00a0<strong>portability<\/strong>\u00a0of its facilities. Some of its many features include text sorting, text comparison, extraction of words, sentences and characters, text transliteration, and string replacement.<\/p>\n<p><!--more--><\/p>\n<div class=\"_form_3\"><\/div>\n<p><script src=\"https:\/\/jigsawacademy67103.activehosted.com\/f\/embed.php?id=3\" type=\"text\/javascript\" charset=\"utf-8\"><\/script><br \/>\nThe following commands from the stringi package can be used for text processing:<br \/>\n<strong>#Install and load stringi<\/strong><br \/>\n&gt;install.packages(&#8220;stringi&#8221;)<br \/>\n&gt;library(stringi)<br \/>\n<strong>#Consider this object \u201ctest\u201d, which is a review of the iPhone 6<\/strong><\/p>\n<p align=\"justify\">&gt;test&lt;-&#8220;I loved my i5 but hate the i6.\u00a0 To be fair, the display quality is much better and the camera\/photo resolution is amazing. 
Some will find the privacy feature on the browser to be a welcome change.\u00a0 However, I have average size hands and find that the buttons are all in the wrong places.\u00a0 I frequently put the phone into sleep mode when trying to text (due to the placement of the sleep button on the side) and unless you have super-long monkey fingers and thumbs it is very difficult to cover the span of the keyboard (when turned sideways) and impossible to work one\/handed (just try reaching those icons in the upper and left portions of the phone with your thumb).\u00a0 I took advantage of Apple&#8217;s free trade-in offer, but I&#8217;m going in tomorrow and asking for my old i5 back. The enhanced display and camera resolution simply can&#8217;t make up for the increased difficulty and hassle to operate.&#8221;<\/p>\n<p><strong># stri_split_boundaries<\/strong><br \/>\n<strong>#Extract words: the input is a character variable<\/strong><strong>\u00a0<\/strong><br \/>\n<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-1.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5705\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-1.jpg\" alt=\"Dec 29 1\" width=\"581\" height=\"274\" title=\"\"><\/a><\/p>\n<p><strong># Extract sentences<\/strong><br \/>\n&gt;test1&lt;-stri_split_boundaries(test, opts_brkiter=stri_opts_brkiter(type=&#8221;sentence&#8221;))<br \/>\n&gt;test1<br \/>\n<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5709\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29.jpg\" alt=\"Dec 29\" width=\"1360\" height=\"612\" title=\"\"><\/a><br \/>\n<strong>#Extract characters <\/strong><br \/>\n&gt;stri_split_boundaries(test, opts_brkiter=stri_opts_brkiter(type=&#8221;character&#8221;)) # extract characters<br \/>\n<a 
href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-2.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5706\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-2.jpg\" alt=\"Dec 29 2\" width=\"1282\" height=\"523\" title=\"\"><\/a><br \/>\n<strong>##The following code converts test1 (which is a list) to a character vector<\/strong><br \/>\n<strong>#Since there are 7 sentences, create a character vector words with 7 placeholder elements<\/strong><br \/>\n&gt;words&lt;-c(&#8220;H&#8221;,&#8221;H&#8221;,&#8221;H&#8221;,&#8221;H&#8221;,&#8221;H&#8221;,&#8221;H&#8221;,&#8221;H&#8221;)<br \/>\n<strong>#The following loop populates &#8220;words&#8221; with the 7 sentences as character strings<\/strong><br \/>\n<strong>#This is because <em>stri_extract_words<\/em>, <em>stri_replace_all_fixed<\/em> and similar functions take a character vector as input, not a list<\/strong><br \/>\n<strong>\u00a0<\/strong><br \/>\n&gt;for(i in 1:7)<br \/>\n{ words[i]&lt;-test1[[1]][i];print(i) }<br \/>\n&gt;words<br \/>\n&gt;class(words)<br \/>\n<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-3.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5708\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-3.jpg\" alt=\"Dec 29 3\" width=\"987\" height=\"361\" title=\"\"><\/a><br \/>\n<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-4.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5711\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-4.jpg\" alt=\"Dec 29 4\" width=\"1013\" height=\"105\" title=\"\"><\/a><br \/>\n<strong>#Extracts words in a string<\/strong><br \/>\n&gt;stri_extract_words(test)<br \/>\n<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-5.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5712\" 
src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-5.jpg\" alt=\"Dec 29 5\" width=\"783\" height=\"367\" title=\"\"><\/a><br \/>\n&gt;stri_extract_words(words) ##Gives the wordlist for each sentence<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-6.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5713\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-6.jpg\" alt=\"Dec 29 6\" width=\"747\" height=\"369\" title=\"\"><\/a><br \/>\n<strong>#Counts words in a string<\/strong><br \/>\n&gt;stri_count_words(test)<br \/>\n## [1] 164<br \/>\n&gt;stri_count_words(words) ##Gives the word count for each sentence<br \/>\n## [1] 8 \u00a016 \u00a014 \u00a017 \u00a070 \u00a021 \u00a018<br \/>\n<strong>#Determine whether a string starts or ends with a given pattern.<\/strong><br \/>\n&gt;stri_startswith_fixed(words, &#8220;I&#8221;)<br \/>\n<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/DEC-29-Missed-1.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5715\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/DEC-29-Missed-1.jpg\" alt=\"DEC 29 Missed 1\" width=\"472\" height=\"28\" title=\"\"><\/a><br \/>\n<strong>#stri_replace_all_* : Replaces a word with another word based on conditions<\/strong><br \/>\n<strong>#stri_replace_all_* gained a vectorize_all parameter, which defaults to TRUE for backward compatibility.<\/strong><br \/>\n<strong>#In this example, amazing and welcome are replaced with &#8220;excellent&#8221; and &#8220;good&#8221;<\/strong><br \/>\n&gt;stri_replace_all_fixed(words,c(&#8220;amazing&#8221;,&#8221;welcome&#8221;), c(&#8220;excellent&#8221;,&#8221;good&#8221;), vectorize_all=FALSE)<br \/>\n<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-7.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5714\" 
src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-7.jpg\" alt=\"Dec 29 7\" width=\"926\" height=\"257\" title=\"\"><\/a><br \/>\n<strong>## stri_replace_all_fixed<\/strong><br \/>\n<strong>#Here we are comparing vectorize_all=TRUE with vectorize_all=FALSE<\/strong><br \/>\n<strong>#Each given pattern is replaced with its corresponding replacement string<\/strong><br \/>\n&gt;stri_replace_all_fixed(&#8220;The white color iphone 6S is more appealing to the customers than iphone 5s&#8221;,c(&#8220;appeal&#8221;, &#8220;white&#8221;), c(&#8220;interest&#8221;,\u00a0 &#8220;red&#8221;), vectorize_all=TRUE)<br \/>\n<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-8.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5717\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-8.jpg\" alt=\"Dec 29 8\" width=\"649\" height=\"51\" title=\"\"><\/a><br \/>\n&gt;stri_replace_all_fixed(&#8220;The white color iphone 6S appeals more to the customers than iphone 5s&#8221;,c(&#8220;appeal&#8221;, &#8220;white&#8221;), c(&#8220;interest&#8221;,\u00a0 &#8220;red&#8221;),vectorize_all=FALSE)<br \/>\n<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-9.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5718\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-9.jpg\" alt=\"Dec 29 9\" width=\"747\" height=\"31\" title=\"\"><\/a><br \/>\n<strong>## stri_replace_all_regex<\/strong><br \/>\n<strong># Compare the results:<\/strong><br \/>\n<strong>#Here we are comparing stri_replace_all_fixed with stri_replace_all_regex, which uses word-boundary anchors so that only whole words match (both with vectorize_all=FALSE)<\/strong><br \/>\n&gt;stri_replace_all_fixed(&#8220;The white color iphone 6S is more appealing to the customers than iphone 5s&#8221;,c(&#8220;appeal&#8221;, &#8220;white&#8221;), c(&#8220;interest&#8221;,\u00a0 &#8220;red&#8221;), vectorize_all=FALSE)<br \/>\n<a 
href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-10.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5719\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-10.jpg\" alt=\"Dec 29 10\" width=\"725\" height=\"29\" title=\"\"><\/a><br \/>\n&gt;stri_replace_all_regex(&#8220;The white color iphone 6S appeals more to the customers than iphone 5s&#8221;,&#8221;\\\\b&#8221;%s+%c(&#8220;appeal&#8221;, &#8220;white&#8221;)%s+%&#8221;\\\\b&#8221;, c(&#8220;interest&#8221;,\u00a0 &#8220;red&#8221;),vectorize_all=FALSE)<br \/>\n<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-11.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5720\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-11.jpg\" alt=\"Dec 29 11\" width=\"734\" height=\"33\" title=\"\"><\/a><br \/>\n<strong>##The following command keeps only the strings that look like valid email addresses<\/strong><br \/>\n&gt;stri_subset_regex(c(&#8220;john@office.company.com&#8221;, &#8220;steve1932@g00gl3.eu&#8221;, &#8220;No email here&#8221;,&#8221;abi20hotmail.com&#8221;),&#8221;^[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\\\\.)+[A-Za-z]{2,4}$&#8221;)<br \/>\n<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-Missed-2.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5722\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-Missed-2.jpg\" alt=\"Dec 29 Missed 2\" width=\"464\" height=\"30\" title=\"\"><\/a><br \/>\n<strong>#For a complete regex reference, see: <\/strong><a href=\"http:\/\/docs.rexamine.com\/R-man\/stringi\/stringi-search-regex.html\" target=\"_blank\" rel=\"noopener\"><strong>http:\/\/docs.rexamine.com\/R-man\/stringi\/stringi-search-regex.html<\/strong><\/a><br \/>\n<strong>#stri_split_fixed<\/strong><br \/>\n<strong>#If you want to split strings based on &#8220;;&#8221;,&#8221;_&#8221; or any 
other delimiter<\/strong><br \/>\n&gt;stri_split_fixed(c(&#8220;ipone5s-&gt;bad&#8221;, &#8220;ipone6s-&gt;good&#8221;, &#8220;phone&#8221;, &#8220;&#8221;), &#8220;-&gt;&#8221;, n=1, tokens_only=TRUE, omit_empty=TRUE)<br \/>\n<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-13.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5721\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-13.jpg\" alt=\"Dec 29 13\" width=\"198\" height=\"225\" title=\"\"><\/a><br \/>\n&gt;stri_split_fixed(c(&#8220;ipone5s-&gt;bad&#8221;, &#8220;ipone6s-&gt;good&#8221;, &#8220;phone&#8221;, &#8220;&#8221;), &#8220;-&gt;&#8221;, n=2, tokens_only=TRUE, omit_empty=TRUE)<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-14.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5723\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-14.jpg\" alt=\"Dec 29 14\" width=\"251\" height=\"232\" title=\"\"><\/a><br \/>\n<strong>#stri_list2matrix<\/strong><br \/>\n<strong>#Converts a list of atomic vectors to a character matrix<\/strong><br \/>\n&gt;stri_list2matrix(stri_split_fixed(c(&#8220;ipone5s-&gt;bad&#8221;, &#8220;ipone6s-&gt;good&#8221;, &#8220;phone&#8221;, &#8220;&#8221;), &#8220;-&gt;&#8221;, n=2, tokens_only=TRUE, omit_empty=TRUE))<br \/>\n<a href=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-15.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-5724\" src=\"https:\/\/www.jigsawacademy.com\/wp-content\/uploads\/2014\/12\/Dec-29-15.jpg\" alt=\"Dec 29 15\" width=\"429\" height=\"73\" title=\"\"><\/a><br \/>\nRelated Articles:<br \/>\n<a href=\"http:\/\/analyticstraining.com\/2014\/memory-management-in-r-and-how-it-handles-big-data\/\" target=\"_blank\" rel=\"noopener noreferrer\">Memory Management in R and how it Handles Big Data<\/a><br \/>\n<a 
href=\"http:\/\/analyticstraining.com\/2014\/how-to-create-a-word-cloud-in-r\/\" target=\"_blank\" rel=\"noopener noreferrer\">How to Create a Word Cloud in R<\/a><br \/>\n<a href=\"http:\/\/analyticstraining.com\/2014\/examples-of-how-r-is-used\/\" target=\"_blank\" rel=\"noopener noreferrer\">Examples of How R is Used<\/a><\/p>\n<div><em>Interested in learning about other Analytics and Big Data tools and techniques? Click on our course links and explore more.<\/em><\/div>\n<div><\/div>\n<div><em><strong>Jigsaw\u2019s Data Science with SAS Course &#8211;\u00a0<a href=\"http:\/\/jigsawacademy.us3.list-manage.com\/track\/click?u=04f18588afa72136cc00176e4&amp;id=6462c3ee60&amp;e=8b6942fd51\" target=\"_blank\" rel=\"noopener noreferrer\">click here<\/a>.<\/strong><\/em><\/div>\n<div>\n<div><em><strong>Jigsaw\u2019s\u00a0<\/strong><strong>Data Science with R Course\u00a0&#8211;\u00a0<\/strong><strong><strong><a href=\"http:\/\/jigsawacademy.us3.list-manage.com\/track\/click?u=04f18588afa72136cc00176e4&amp;id=2a39e5c27d&amp;e=8b6942fd51\" target=\"_blank\" rel=\"noopener noreferrer\">click here<\/a>.<\/strong><\/strong><\/em><\/div>\n<div><em><strong>Jigsaw&#8217;s Big Data Course &#8211; <a href=\"http:\/\/jigsawacademy.us3.list-manage.com\/track\/click?u=04f18588afa72136cc00176e4&amp;id=b344e5b3cf&amp;e=8b6942fd51\" target=\"_blank\" rel=\"noopener noreferrer\">click here<\/a>.<\/strong><\/em><\/div>\n<\/div>\n<div class=\"_form_3\"><\/div>\n<p><script src=\"https:\/\/jigsawacademy67103.activehosted.com\/f\/embed.php?id=3\" type=\"text\/javascript\" charset=\"utf-8\"><\/script><\/p>\n","protected":false},"excerpt":{"rendered":"<p>While base R and\u00a0stringr\u00a0functions are good only for simple text processing, we need better packages for more complex problems such as natural language processing. Hence, stringi\u00a0is a package which provides replacements for nearly all the\u00a0character string processing functions\u00a0known from base R. 
It also provides\u00a0high performance\u00a0and\u00a0portability\u00a0of its facilities. Some of its many features [&hellip;]<\/p>\n","protected":false},"author":105,"featured_media":120975,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[541],"tags":[83,545,48,753,609],"form":[1499],"acf":[],"_links":{"self":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts\/121234"}],"collection":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/users\/105"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/comments?post=121234"}],"version-history":[{"count":1,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts\/121234\/revisions"}],"predecessor-version":[{"id":241454,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts\/121234\/revisions\/241454"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/media\/120975"}],"wp:attachment":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/media?parent=121234"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/categories?post=121234"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/tags?post=121234"},{"taxonomy":"form","embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/form?post=121234"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}