Ruby

Regular Expression with Capture Groups

Note: Chilkat uses PCRE2. See PCRE2 Regular Expressions
Also see: PCRE2 Performance

Demonstrates the following PCRE2 regular expression:

Name:\s+(\w+)\s+(\w+),\s+Email:\s+(\S+)

The regular expression is applied to this string (see the sample code below):

Name: John Smith, Email: john.smith@example.com

Regex Components Explained

Part	Meaning	Matched Text
"Name:"	Matches the literal text "Name:"	"Name:"
"\s+"	Matches one or more whitespace characters (spaces, tabs, etc.)	(space)
"(\w+)"	Capture Group 1: One or more word characters ([a-zA-Z0-9_]), here the first name	"John"
"\s+"	More whitespace	(space)
"(\w+)"	Capture Group 2: Another word (the last name)	"Smith"
","	A literal comma	","
"\s+"	Whitespace again	(space)
"Email:"	Matches the literal "Email:"	"Email:"
"\s+"	Whitespace	(space)
"(\S+)"	Capture Group 3: One or more non-whitespace characters	"john.smith@example.com"

Matches for the Example String

String:

"Name: John Smith, Email: john.smith@example.com"

Regex Match Groups:

Group	Captured Value
Group 1	"John"
Group 2	"Smith"
Group 3	"john.smith@example.com"
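
As a quick cross-check of the table above, the same pattern can be tried with Ruby's built-in Regexp class. This is only a sketch: Ruby's engine is Onigmo, not PCRE2, and it is assumed here that this simple pattern behaves the same in both. The names NAME_EMAIL_PATTERN and capture_groups are made up for illustration and are not part of Chilkat.

```ruby
# Cross-check using Ruby's built-in Regexp (Onigmo engine, NOT Chilkat/PCRE2).
# Assumption: this simple pattern behaves the same in both engines.
NAME_EMAIL_PATTERN = /Name:\s+(\w+)\s+(\w+),\s+Email:\s+(\S+)/

# Hypothetical helper: returns the three capture groups, or nil if no match.
def capture_groups(subject)
  m = NAME_EMAIL_PATTERN.match(subject)
  m ? m.captures : nil
end

first, last, email = capture_groups("Name: John Smith, Email: john.smith@example.com")
puts "Group 1: #{first}"
puts "Group 2: #{last}"
puts "Group 3: #{email}"
```

MatchData#captures omits group 0 (the entire match), so the array lines up with Groups 1 to 3 in the table.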

Notes on Character Classes

\w matches [a-zA-Z0-9_], so it does not match punctuation such as a period.
\S matches any non-whitespace character, which makes it suitable for capturing an email address, where \w+ would stop at the first period.
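
The difference between the two classes can be seen in a small sketch. It uses Ruby's built-in Regexp rather than Chilkat, on the assumption that \w and \S behave the same for ASCII input in both engines. The helper names are hypothetical.

```ruby
# Uses Ruby's built-in Regexp (Onigmo), not Chilkat/PCRE2.
# Assumption: \w and \S behave the same for ASCII input in both engines.

# Hypothetical helper: true if the whole string consists of word characters.
def word_only?(text)
  text.match?(/\A\w+\z/)
end

# Hypothetical helper: the first run of word characters in the string.
def first_word_run(text)
  text[/\w+/]
end

# Hypothetical helper: the first run of non-whitespace characters.
def first_non_whitespace_run(text)
  text[/\S+/]
end

puts word_only?("john_smith42")                           # true
puts word_only?("john.smith")                             # false (the period is not a word character)
puts first_word_run("john.smith@example.com")             # john
puts first_non_whitespace_run("john.smith@example.com x") # john.smith@example.com
```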

Chilkat Ruby Downloads

Download Chilkat for Ruby

Ruby

require 'chilkat'

success = false

subject = "Name: John Smith, Email: john.smith@example.com"
pattern = "Name:\\s+(\\w+)\\s+(\\w+),\\s+Email:\\s+(\\S+)"

sb = Chilkat::CkStringBuilder.new()
sb.Append(subject)

json = Chilkat::CkJsonObject.new()
json.put_EmitCompact(false)

timeoutMs = 2000
numMatches = sb.RegexMatch(pattern,json,timeoutMs)
if (numMatches < 0)
    # Probably an error in the regular expression.
    # Suggestion: Use AI to help create and/or diagnose regular expressions.
    print sb.lastErrorText() + "\n";
    exit
end

# Examine the matches:
print json.emit() + "\n";

# This is the JSON with the match information.
# See the JSON parsing code below to get the matched capture group values.

# Important: Capture group 0 always contains the entire match, that is, the portion of the input string matched by the full regular expression.

# {
#   "match": [
#     {
#       "group": [
#         {
#           "cap": "Name: John Smith, Email: john.smith@example.com",
#           "idx": 0,
#           "len": 47
#         },
#         {
#           "cap": "John",
#           "idx": 6,
#           "len": 4
#         },
#         {
#           "cap": "Smith",
#           "idx": 11,
#           "len": 5
#         },
#         {
#           "cap": "john.smith@example.com",
#           "idx": 25,
#           "len": 22
#         }
#       ]
#     }
#   ]
# }

i = 0
matchCount = json.SizeOfArray("match")
while i < matchCount
    print "Match " + (i + 1).to_s() + ":" + "\n";
    json.put_I(i)
    j = 0
    numCaptureGroups = json.SizeOfArray("match[i].group")
    while j < numCaptureGroups
        json.put_J(j)
        cap = json.stringOf("match[i].group[j].cap")
        print j.to_s() + ": " + cap + "\n";
        j = j + 1
    end
    i = i + 1
end

# Output

# Match 1:
# 0: Name: John Smith, Email: john.smith@example.com
# 1: John
# 2: Smith
# 3: john.smith@example.com
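
The JSON shown above also carries an "idx" and "len" for each capture group. The sketch below reads that same JSON with Ruby's standard json library instead of CkJsonObject and checks that each idx/len pair slices back to the captured text. It assumes ASCII input, where byte and character offsets coincide. The helper name groups_from_match_json and the constant SAMPLE_MATCH_JSON are made up for illustration and are not part of the Chilkat API.

```ruby
require 'json'

# The JSON emitted by the example above, copied verbatim.
SAMPLE_MATCH_JSON = <<~EOS
  {
    "match": [
      {
        "group": [
          { "cap": "Name: John Smith, Email: john.smith@example.com", "idx": 0, "len": 47 },
          { "cap": "John", "idx": 6, "len": 4 },
          { "cap": "Smith", "idx": 11, "len": 5 },
          { "cap": "john.smith@example.com", "idx": 25, "len": 22 }
        ]
      }
    ]
  }
EOS

# Hypothetical helper: for each match, returns an array of [cap, idx, len],
# one entry per capture group (group 0 is the entire match).
def groups_from_match_json(json_text)
  doc = JSON.parse(json_text)
  doc.fetch("match", []).map do |m|
    m.fetch("group", []).map { |g| [g["cap"], g["idx"], g["len"]] }
  end
end

subject = "Name: John Smith, Email: john.smith@example.com"
groups_from_match_json(SAMPLE_MATCH_JSON).each_with_index do |groups, i|
  puts "Match #{i + 1}:"
  groups.each_with_index do |(cap, idx, len), j|
    # Assumes ASCII input, so idx/len can be used as character offsets.
    slice_ok = subject[idx, len] == cap
    puts "#{j}: #{cap} (idx=#{idx}, len=#{len}, slice ok: #{slice_ok})"
  end
end
```

For subjects containing multi-byte characters, the unit of "idx" and "len" (bytes or characters) would need to be confirmed against the Chilkat documentation before slicing this way.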