The Get Web Page Text command enables users to extract text content from a webpage dynamically and flexibly. It supports different Text Retrieval Modes to meet various automation needs.
Command Overview
The command now offers three modes for extracting text content from the page:
- All Text: Retrieves the full text content of the page.
- Matching Pattern Text: Retrieves the full text that matches a specified regex pattern. You can specify which match to return when multiple matches are found.
- Matching Group Text: Retrieves a specific group (portion of the text) captured by a regex pattern. You can specify both the match index and group number.
Command Parameters
Parameter | Description | Values | Applicable When |
Text Retrieval Mode | Determines how text content is retrieved from the page. | All Text, Matching Pattern Text, Matching Group Text | Always |
Matching Pattern | The regular expression used to match and extract specific text. | Any valid regex | Matching Pattern Text or Matching Group Text mode |
Match Index | Specifies which match to return when multiple matches are found. Use 1 for the first match. | Positive integers (1, 2, ...) | Matching Pattern Text or Matching Group Text mode |
Group Number | Specifies which regex group to return. Group numbers start from 1. | Positive integers (1, 2, ...) | Matching Group Text mode |
Return Values
The Get Web Page Text command returns the following based on the selected mode:
- All Text: Returns the entire page text as a single string.
- Matching Pattern Text: Returns the entire regex-matched text at the specified index.
- Matching Group Text: Returns the content of the specified group within the regex match.
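The three return behaviors can be sketched outside the tool. The following is a minimal illustration that uses Python's `re` module as a stand-in; the regex engine the command actually uses is not stated here, and the function name `get_web_page_text`, its parameters, the sample text, and the choice to return `None` for an out-of-range Match Index are all assumptions made for this example.

```python
import re

def get_web_page_text(page_text, mode="All Text", pattern=None,
                      match_index=1, group_number=1):
    """Hypothetical stand-in for the three modes; not the product's API."""
    if mode == "All Text":
        return page_text  # the entire page text as a single string

    matches = list(re.finditer(pattern, page_text))
    if not 1 <= match_index <= len(matches):
        return None  # assumption: an out-of-range index yields no text

    match = matches[match_index - 1]  # Match Index is 1-based
    if mode == "Matching Pattern Text":
        return match.group(0)  # the full matched text
    if mode == "Matching Group Text":
        return match.group(group_number)  # one captured group
    raise ValueError(f"unknown mode: {mode}")

# Sample text assumed for illustration only.
page = "Order confirmed. Transaction ID: 12345 issued on December 15, 2024"
print(get_web_page_text(page, "Matching Pattern Text", r"Transaction ID: [0-9]+"))
print(get_web_page_text(page, "Matching Group Text",
                        r"Transaction ID: ([0-9]+) issued on (.+)", group_number=2))
```

Under these assumptions, the first call prints the full match and the second prints only the date captured by group 2.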
Practical Examples
Example 1: Retrieve All Text from a Web Page
Scenario: You want to retrieve the complete text content of the page for validation or logging.
Command Input:
Text Retrieval Mode: "All Text"
Result: The full text content of the page is returned as a single string.
Example 2: Extract Text Matching a Regex
Scenario: You need to extract the text "Transaction ID: 12345" from a confirmation page.
Command Input:
Text Retrieval Mode: "Matching Pattern Text"
Matching Pattern: "Transaction ID: [0-9]+"
Match Index: "1"
Result:
"Transaction ID: 12345"
Example 3: Extract a Specific Group Within a Regex Match
Scenario: From the text "Transaction ID: 12345 issued on December 15, 2024", you want to retrieve only the transaction ID, 12345.
Command Input:
Text Retrieval Mode: "Matching Group Text"
Matching Pattern: "Transaction ID: ([0-9]+) issued on (.+)"
Match Index: "1"
Group Number: "1"
Explanation:
- Group 1: 12345 (Transaction ID).
- Group 2: December 15, 2024 (Date).
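To check which group number holds which value before configuring the command, the pattern can be tried against the sample text in any regex tool. A minimal sketch in Python, assuming its `re` module numbers groups the same way (left to right by opening parenthesis, starting from 1):

```python
import re

text = "Transaction ID: 12345 issued on December 15, 2024"
pattern = r"Transaction ID: ([0-9]+) issued on (.+)"

match = re.search(pattern, text)
groups = match.groups()  # tuple of captured groups, in group-number order

# Print each group with its 1-based group number.
for number, value in enumerate(groups, start=1):
    print(f"Group {number}: {value}")
# Group 1: 12345
# Group 2: December 15, 2024
```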
Result:
"12345"
Example 4: Extract Another Group from the Same Regex Match
Scenario: From the same text, extract the date December 15, 2024 instead.
Command Input:
Text Retrieval Mode: "Matching Group Text"
Matching Pattern: "Transaction ID: ([0-9]+) issued on (.+)"
Match Index: "1"
Group Number: "2"
Result:
"December 15, 2024"
Example 5: Handle Multiple Regex Matches on a Page
Scenario: A log file contains multiple error messages: "Error Code: ERR001", "Error Code: ERR002", and "Error Code: ERR003". You want to extract the second error code, ERR002, without the prefix ("Error Code: ").
Command Input:
Text Retrieval Mode: "Matching Group Text"
Matching Pattern: "Error Code: (ERR[0-9]+)"
Match Index: "2"
Group Number: "1"
Result:
"ERR002"
Key Points to Remember
- Text Retrieval Modes:
  - Use All Text when you need the complete page content.
  - Use Matching Pattern Text to match a regex pattern and retrieve the full matched text.
  - Use Matching Group Text to retrieve a specific dynamic portion captured by a regex group.
- Match Index: When multiple matches are found for a regex pattern, use the Match Index to specify which match to return (1-based index).
- Group Numbers: Use parentheses ( ) in your regex pattern to define groups. Group numbers start from 1.
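How Match Index and Group Number combine can be seen by listing every match with its 1-based index. This sketch uses Python's `re` module as an assumed stand-in for the command's regex engine, with the error-code text from Example 5:

```python
import re

log_text = "Error Code: ERR001, Error Code: ERR002, Error Code: ERR003"
pattern = r"Error Code: (ERR[0-9]+)"

matches = list(re.finditer(pattern, log_text))

# List every match with its 1-based Match Index, full text, and group 1.
for match_index, match in enumerate(matches, start=1):
    print(match_index, match.group(0), match.group(1))

# Match Index 2, Group Number 1 (subtract 1 for Python's 0-based list).
second_code = matches[2 - 1].group(1)
print(second_code)  # ERR002
```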
Troubleshooting Tips
- No Matches Found: If no text is returned:
  - Ensure your regex pattern is correct and matches the actual content on the page.
  - Verify that the Match Index value is within range.
- Incorrect Group Results:
  - Verify that the regex includes parentheses ( ) to define groups.
  - Check that the specified Group Number matches the intended group.
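The checks above can be run against a copy of the page text before changing the command's settings. The helper below is a hypothetical diagnostic written with Python's `re` module; the name `diagnose` and its messages are assumptions for illustration, and the command's own error reporting may differ.

```python
import re

def diagnose(page_text, pattern, match_index=1, group_number=None):
    """Hypothetical helper: report why a pattern, index, or group may fail."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        return f"invalid pattern: {exc}"

    matches = list(compiled.finditer(page_text))
    if not matches:
        return "pattern matches nothing in the text"
    if not 1 <= match_index <= len(matches):
        return (f"Match Index {match_index} out of range; "
                f"only {len(matches)} match(es) found")
    # compiled.groups is the number of capturing groups in the pattern.
    if group_number is not None and not 1 <= group_number <= compiled.groups:
        return (f"Group Number {group_number} out of range; "
                f"pattern defines {compiled.groups} group(s)")
    return "ok"

sample = "Error Code: ERR001, Error Code: ERR002"
print(diagnose(sample, r"Error Code: (ERR[0-9]+)", match_index=3))
print(diagnose(sample, r"Error Code: ERR[0-9]+", group_number=1))
```

The first call reports an out-of-range Match Index; the second reports that the pattern has no parentheses, so Group Number 1 does not exist.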