Modern web applications often present dynamic content that changes based on user actions or system states. The Get Page Text command with regex support makes it easier to retrieve this information by combining static surrounding text for context and regex for precise matching.
The Get Web Page Text command offers three approaches to extracting text from web pages, making it versatile for different test automation scenarios:
- Extract all text from a web page
- Extract text matching a specific regex pattern
- Extract specific pattern groups from regex matches
What is Regex?
Regular expressions (regex) are patterns used to match and extract specific text. They allow you to define rules for identifying strings based on structure, such as letters, numbers, or symbols. For example, you can use regex to capture a timestamp, an ID, or a dynamic phrase on a webpage.
For more details on regex, you can explore regex documentation, online tutorials, or use AI chatbots to generate regex for specific requirements.
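As a rough illustration of the idea, the sketch below uses Python's standard re module rather than the command itself. The page text is invented for the example, and the regex flavor used by your automation tool may differ in small ways.

```python
import re

# Hypothetical page text, invented for illustration.
page_text = "Order ID: AB-1234 placed at 10:45:09"

# \d{2}:\d{2}:\d{2} matches a timestamp such as 10:45:09.
timestamp = re.search(r"\d{2}:\d{2}:\d{2}", page_text)

# [A-Z]{2}-\d{4} matches two capital letters, a hyphen, and four digits.
order_id = re.search(r"[A-Z]{2}-\d{4}", page_text)

print(timestamp.group(0))  # 10:45:09
print(order_id.group(0))   # AB-1234
```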
1. Getting All Text From Web Page
Use this approach when you want to capture all visible text from the currently loaded web page. This is useful, for example, when you need to store the text in a variable and process it in later test steps.
2. Getting Text Matching a Regex Pattern
This is useful when you want to retrieve text from a web page that matches a specific pattern. You can also specify which matching instance you want to fetch; the index starts at 1.
For example, the following statement extracts the first timestamp displayed on the currently active web page.
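The behavior of a 1-based match index can be sketched in Python. This is not the command's own syntax: nth_match is a hypothetical helper, and the page text and timestamp format are assumptions made for the example.

```python
import re

def nth_match(text, pattern, index=1):
    """Return the index-th match of pattern in text (1-based), or None."""
    matches = [m.group(0) for m in re.finditer(pattern, text)]
    if 1 <= index <= len(matches):
        return matches[index - 1]
    return None

# Hypothetical page text containing two timestamps.
page_text = "Created: 2024-03-01 09:15:00 | Updated: 2024-03-02 17:40:12"
timestamp_pattern = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"

first = nth_match(page_text, timestamp_pattern, 1)
second = nth_match(page_text, timestamp_pattern, 2)
missing = nth_match(page_text, timestamp_pattern, 3)

print(first)    # 2024-03-01 09:15:00
print(second)   # 2024-03-02 17:40:12
print(missing)  # None
```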
3. Get Text from Pattern Groups
This is the most powerful feature of the text extraction capability: it lets you extract specific portions of dynamic text using regex pattern groups. The key advantage of this approach is that you can combine static text with dynamic patterns to precisely locate and extract the desired information.
For example, consider a web page displaying multiple balance amounts:
Total balance: $28,000
Savings Account balance: $18,000
Current Account balance: $10,000
To extract just the Total balance amount, you would use:
The pattern in this example consists of two parts:
- Static text: "Total balance: $" - This acts as a context delimiter. Because $ is a special character in regex, write it as \$ inside the pattern.
- Dynamic pattern: "([0-9,.]+)" - This captures the actual amount
Including the static text "Total balance: $" in the pattern ensures that we extract the correct amount, even when multiple dollar amounts appear on the page. The static portion serves as context, making the extraction more reliable and precise.
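The same extraction can be tried out in Python's re module as a minimal sketch. It only mirrors the pattern logic and is not the command's syntax. Note that the dollar sign is escaped as \$ here, because an unescaped $ is an end-of-line anchor in Python regex.

```python
import re

# Page text taken from the balance example.
page_text = (
    "Total balance: $28,000\n"
    "Savings Account balance: $18,000\n"
    "Current Account balance: $10,000\n"
)

# Static context "Total balance: $" followed by one capturing group.
pattern = r"Total balance: \$([0-9,.]+)"

match = re.search(pattern, page_text)
total = match.group(1)  # group 1 holds only the amount

print(total)  # 28,000
```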
Another example:
Consider the following flight information displayed on a page:
// Web page content: "Flight AI302: Departs 10:45 AM - Arrives 01:30 PM"
pattern = "Flight ([A-Z0-9]+): Departs (\d{2}:\d{2} [AP]M) - Arrives (\d{2}:\d{2} [AP]M)"
Group number 1 returns the flight number, group number 2 returns the departure time, and group number 3 returns the arrival time of the flight.
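To check which group holds which value, the flight pattern can be run in Python's re module. This is a sketch for verifying group numbering only; the variable names are chosen for the example.

```python
import re

page_text = "Flight AI302: Departs 10:45 AM - Arrives 01:30 PM"
pattern = (
    r"Flight ([A-Z0-9]+): "
    r"Departs (\d{2}:\d{2} [AP]M) - "
    r"Arrives (\d{2}:\d{2} [AP]M)"
)

match = re.search(pattern, page_text)

# Groups are numbered left to right by their opening parenthesis.
flight_number = match.group(1)
departure = match.group(2)
arrival = match.group(3)

print(flight_number)  # AI302
print(departure)      # 10:45 AM
print(arrival)        # 01:30 PM
```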
Why Static Surrounding Text Matters
Including static surrounding text in your regex pattern is vital to:
- Provide Context: Ensure that the correct portion of the page is being targeted.
- Improve Reliability: Anchor the dynamic data within a consistent, predictable pattern.
- Avoid False Matches: Minimize the risk of inadvertently matching unrelated text on the page.
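The difference between a bare pattern and an anchored one can be seen in a short Python sketch. The page text is a hypothetical variation of the balance example in which another amount appears before the total.

```python
import re

# Hypothetical page text: the savings amount comes first.
page_text = (
    "Savings Account balance: $18,000\n"
    "Total balance: $28,000\n"
)

# Without context, the first dollar amount on the page wins.
bare = re.search(r"\$([0-9,.]+)", page_text).group(1)

# With static surrounding text, only the intended amount can match.
anchored = re.search(r"Total balance: \$([0-9,.]+)", page_text).group(1)

print(bare)      # 18,000
print(anchored)  # 28,000
```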
Handling Hidden Characters in Get Page Text
If you are not getting the expected match when using regex patterns on web pages, the cause may be invisible or extraneous whitespace characters (e.g., spaces, tabs, or line breaks) between portions of the text, such as between "Total balance:" and the actual balance value. To handle this, include \s* between "Total balance:" and the amount in your regex so that these characters are matched without breaking the pattern.
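A small Python sketch shows the effect of \s*. The page text below is an assumption: it places a line break, a tab, and a space between the label and the amount to imitate what rendered HTML can produce.

```python
import re

# Hypothetical extracted text with hidden whitespace after the label.
page_text = "Total balance:\n\t $28,000"

# Expecting exactly one space fails on this text.
strict = re.search(r"Total balance: \$([0-9,.]+)", page_text)

# \s* accepts any run of whitespace, including none at all.
tolerant = re.search(r"Total balance:\s*\$([0-9,.]+)", page_text)

print(strict)             # None
print(tolerant.group(1))  # 28,000
```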
Identifying Such Characters:
To manually detect these hidden characters on a web page:
- Copy-Paste into a Plain Text Editor: Copy the text from the webpage and paste it into a plain text editor to observe any unexpected characters or spacing. This step is usually sufficient for detecting common issues.
- Inspect the HTML Source (if needed): If copying into a plain text editor doesn't reveal the issue, right-click on the element and choose "Inspect" or "View Page Source" in your browser to examine the exact structure of the text.
- Use Developer Tools (if needed): Check the computed DOM and element styles in the browser's developer tools to identify if additional elements or styles (e.g., padding or spacing) might be introducing these characters.
If your regex is not working as expected, anticipate these issues and adjust your pattern accordingly.
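If a plain text editor does not make the hidden characters obvious, one option is to paste the copied text into a short script that prints their code points. The sketch below is in Python; the copied string is a made-up example containing a no-break space and a tab.

```python
# Hypothetical text copied from a page, with a no-break space and a tab.
copied = "Total balance:\u00a0\t$28,000"

# repr() shows escape sequences instead of rendering them invisibly.
print(repr(copied))

# List the position and code point of every whitespace character
# other than an ordinary space.
hidden = [
    (i, f"U+{ord(ch):04X}")
    for i, ch in enumerate(copied)
    if ch.isspace() and ch != " "
]

print(hidden)  # [(14, 'U+00A0'), (15, 'U+0009')]
```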
Best Practices & Troubleshooting
No Matches Found
- Issue: Pattern doesn't match any text on the page
- Solution: Verify the pattern against actual page content
Wrong Group Index
- Issue: Extracted text is not what was expected
- Solution: Verify group numbering in pattern
- Example: In pattern "(a)(b)", group 1 is "a" and group 2 is "b"
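Group numbering can be confirmed with a quick Python check. Group 0 is the whole match, and when groups are nested they are numbered by the position of their opening parenthesis. The nested pattern is an extra illustration and is not taken from the command's documentation.

```python
import re

simple = re.search(r"(a)(b)", "ab")
print(simple.group(0))  # ab (the whole match)
print(simple.group(1))  # a
print(simple.group(2))  # b

# Nested groups: the outer parenthesis opens first, so it is group 1.
nested = re.search(r"((\d+)-(\d+))", "range 10-20")
print(nested.group(1))  # 10-20
print(nested.group(2))  # 10
print(nested.group(3))  # 20
```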
Special Characters
- Issue: Pattern fails due to special characters
- Solution: Escape special characters properly
- Example: Use \. for a decimal point and \$ for a dollar sign
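When the static part of a pattern contains several special characters, escaping them by hand is error-prone. In Python, re.escape does this for a literal string. The sketch below assumes a hypothetical price label, and your automation tool may not offer an equivalent helper, in which case the characters must be escaped manually.

```python
import re

page_text = "Price (USD): $1,299.50"
label = "Price (USD): $"  # literal text containing (, ) and $

# Unescaped, "(USD)" is read as a group and "$" as an anchor, so no match.
unescaped = re.search(label + r"([0-9,]+\.[0-9]{2})", page_text)

# re.escape turns every special character in the label into a literal.
escaped = re.search(re.escape(label) + r"([0-9,]+\.[0-9]{2})", page_text)

print(unescaped)         # None
print(escaped.group(1))  # 1,299.50
```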