Regular Expression Tutorial [Regex]
What is Regular Expression a.k.a RegEx , an attempt to define this is at times makes the tool use under explained or overly complicated. Though let me try RegEx in simple terms is a way of handling and manipulating characters in ways different ways while simplifying the effort of the process.
We use RegEx on a daily basis it is just that we do not know areas where we use are below:
When using search and find
- Most of us have searched words in word documents excel, the process in which this applications are able to search apply RegEx expressions at the background
When creating passwords
- We find ourselves when creating passwords being told we need special characters, capitalized letter, numerics and lower case letters. How does the system know when you have not been able to put all this into the password. Very thought provocative I would say.
2. Tenets of RegEx
RegEx really on key concepts that make it work:
|Character Matching||- Being able to locating alphabetic characters e.g.
|Numerical Matching||- Locating numerals e.g.
|Special Character Matching||- Locating special characters e.g.
i) Character Matching
What happens when we want to find a certain character in a text. Lets get our hands dirty.
ex_text <- c("The","fat","cat","sat on the mat") grep("at",ex_text, value = TRUE)
##  "fat" "cat" "sat on the mat"
What you will notice the output picks words that have
at any where on the string vector
Am a proponent of Tidyverse packages, this are the likes of tidyr, dplyr, stringr etc
One of my favourite stingr manipulation library is stringr, it has easy understanding of syntax. Let use the example we had previously
library(stringr) ex_text <- c("The","fat","cat","sat on the mat") str_detect(ex_text,"at")
##  FALSE TRUE TRUE TRUE
##  "fat" "cat" "sat on the mat"
You will notice from
str_detect(ex_text,"at") that we get logical values
TRUE shows elements in the vector that match will
FALSE is the opposite.
But at times we want to pick alphabetic characters from a text with mixed characters. How can we do this? Before we talk about that I will introduce
meta characters that are very important.
||A period is used to connote any instance of a single character|
||Square brackets are used to give ranges both Alphabets and Numerics|
||Match multiple characters 0 or more times|
||Match multiple characters 1 or more times|
||The next character or instance is optional|
||Matches a character at least
||Match the exact characters|
||Defacto logic for
||Escaping characters, when you want to mean special characters are part of the text and this special characters are part of RegEx commands|
||Matches the start of a statement or Word|
||Matches the end of a statement or word|
ii) Numerical Matching
At times we want to pick numbers from a list of characters. Let us try. We will start using
stringr moving forward
num_ex <- "My name is 123 what is456" str_extract_all(num_ex,"[0-9]")
## [] ##  "1" "2" "3" "4" "5" "6"
Notice how we have used
[0-9] what this is doing it is matching a range of numbers from
9. If we want specific numbers we place the numbers directly.
num_ex <- "My name is 123 what is456" str_extract(num_ex,"23")
##  "23"
iii) Special Character Matching
And what if we want to get special characters
!@#$. Lets do this
spe_exa <- c("M@ !| Bri@n") str_extract_all(spe_exa,"\\|")
## [] ##  "|"
Notice that we have
\\| this is an escaping character
\ we are instructing the machine that this to exempt this particular symbol
| is not the syntaxical command
Now lets get our hands dirty with in depth practical examples.
We can find very interesting addins for RStudio that can help with regex RegEx Addin, Though I would shy away from this. :-)
4. Example workings
Import exercise 1.txt
Using G-Suite base syntax