Reading Dynamic Text from a Web Page – ACCELQ

Modern web applications often present dynamic content that changes based on user actions or system states. The Get Page Text command with regex support makes it easier to retrieve this information by combining static surrounding text for context and regex for precise matching.

The Get Web Page Text command offers three ways to extract text from a web page, which makes it suitable for a range of test automation scenarios:

Extract all text from a web page
Extract text matching a specific regex pattern
Extract specific pattern groups from regex matches

What is Regex?

Regular expressions (regex) are patterns used to match and extract specific text. They allow you to define rules for identifying strings based on structure, such as letters, numbers, or symbols. For example, you can use regex to capture a timestamp, an ID, or a dynamic phrase on a webpage.

For more details on regex, you can explore regex documentation, online tutorials, or use AI chatbots to generate regex for specific requirements.

1. Getting All Text From Web Page

Use this approach when you want to capture all visible text from the currently loaded web page. This is useful, for example, when you need to store the text in a variable and process it in later test logic.

2. Getting Text Matching a Regex Pattern

This is useful when you want to retrieve text from a web page that matches a specific pattern. You can also specify which matching instance to fetch; the index starts at 1.

For example, the following statement extracts the first timestamp displayed on the currently active web page.
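
The sketch below is not the ACCELQ statement itself. It uses Python's re module to illustrate the same idea of fetching the Nth match of a pattern with a 1-based index. The sample page text, the HH:MM:SS timestamp format, and the helper name nth_match are all assumptions made for illustration.

```python
import re

# Hypothetical page text; the HH:MM:SS timestamp format is an assumption.
page_text = "Last login: 09:15:02. Report generated: 14:30:45."

def nth_match(pattern, text, index=1):
    """Return the index-th match (1-based), or None if there are fewer matches."""
    matches = [m.group(0) for m in re.finditer(pattern, text)]
    return matches[index - 1] if 1 <= index <= len(matches) else None

# Index 1 selects the first timestamp found in the text.
first_timestamp = nth_match(r"\d{2}:\d{2}:\d{2}", page_text, 1)
print(first_timestamp)
```

Passing index 2 would return the second timestamp, and an index beyond the number of matches returns None instead of raising an error.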

3. Get Text from Pattern Groups

This is the most powerful feature of the text extraction capability. It lets you extract specific portions of dynamic text using regex pattern groups. The key advantage is that you can combine static text with dynamic patterns to precisely locate and extract the desired information.

For example, consider a web page displaying multiple balance amounts:

Total balance: $28,000
Savings Account balance: $18,000
Current Account balance: $10,000

To extract just the Total balance amount, you would use:

The pattern in this example consists of two parts:

Static text: "Total balance: $" - This acts as a context delimiter
Dynamic pattern: "([0-9,.]+)" - This captures the actual amount

Including the static text "Total balance: $" in the pattern ensures that we extract the correct amount, even when multiple dollar amounts appear on the page. The static portion serves as context, making the extraction more reliable and precise.
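
To see how the static prefix selects the right amount, here is a minimal sketch in Python's re module (an illustration, not the ACCELQ command). The page text is the three-line example from this section; the variable names are assumptions.

```python
import re

# The three balances from the example page.
page_text = """Total balance: $28,000
Savings Account balance: $18,000
Current Account balance: $10,000"""

# Static text "Total balance: $" gives context; "\$" escapes the dollar sign
# so it is matched literally. Group 1 captures the amount.
pattern = r"Total balance: \$([0-9,.]+)"

match = re.search(pattern, page_text)
total_balance = match.group(1) if match else None
print(total_balance)

# Without the static prefix, the pattern matches every amount on the page.
all_amounts = re.findall(r"\$([0-9,.]+)", page_text)
print(all_amounts)
```

The first result is the single total amount, while the pattern without context returns all three amounts and leaves you to guess which one is which.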

Another example:
Consider the following flight information displayed on a page:

// Web page content: "Flight AI302: Departs 10:45 AM - Arrives 01:30 PM"

pattern = "Flight ([A-Z0-9]+): Departs (\d{2}:\d{2} [AP]M) - Arrives (\d{2}:\d{2} [AP]M)"

The pattern contains three groups: group 1 returns the flight number, group 2 returns the departure time, and group 3 returns the arrival time of the flight.
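
Group numbers follow the order of the opening parentheses in the pattern. A quick way to check the numbering is to run the same pattern in Python's re module, shown here as an illustrative sketch rather than the ACCELQ command; the variable names are assumptions.

```python
import re

page_text = "Flight AI302: Departs 10:45 AM - Arrives 01:30 PM"
pattern = r"Flight ([A-Z0-9]+): Departs (\d{2}:\d{2} [AP]M) - Arrives (\d{2}:\d{2} [AP]M)"

match = re.search(pattern, page_text)

# Groups are numbered left to right by their opening parenthesis.
flight_number = match.group(1)
departure_time = match.group(2)
arrival_time = match.group(3)

print(flight_number, departure_time, arrival_time, sep=" | ")
```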

Why Static Surrounding Text Matters

Including static surrounding text in your regex pattern is vital to:

Provide Context: Ensure that the correct portion of the page is being targeted.
Improve Reliability: Anchor the dynamic data within a consistent, predictable pattern.
Avoid False Matches: Minimize the risk of inadvertently matching unrelated text on the page.

Handling Hidden Characters in Get Page Text

If a regex pattern does not produce the expected match, the cause may be invisible or extraneous whitespace characters (for example spaces, tabs, or line breaks) between portions of the text, such as between "Total balance:" and the actual balance value. To handle this, include \s* between the label and the number in your regex so that any such characters are matched without breaking the pattern.
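
The effect of \s* can be sketched in Python's re module (an illustration only, not the ACCELQ command). The page text below is hypothetical: it assumes a tab and a line break sit between the label and the amount.

```python
import re

# Hypothetical page text: a tab, a line break, and a space separate label and value.
page_text = "Total balance:\t\n $28,000"

# Expects exactly one space after the colon, so it fails on this text.
strict_pattern = r"Total balance: \$([0-9,.]+)"

# "\s*" accepts any run of whitespace (including none) after the colon.
tolerant_pattern = r"Total balance:\s*\$([0-9,.]+)"

strict_match = re.search(strict_pattern, page_text)
tolerant_match = re.search(tolerant_pattern, page_text)

print(strict_match)
print(tolerant_match.group(1))
```

The strict pattern finds nothing, while the tolerant pattern still captures the amount.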

Identifying Such Characters:

To manually detect these hidden characters on a web page:

Copy-Paste into a Plain Text Editor: Copy the text from the webpage and paste it into a plain text editor to observe any unexpected characters or spacing. This step is usually sufficient for detecting common issues.
Inspect the HTML Source (if needed): If copying into a plain text editor doesn't reveal the issue, right-click on the element and choose "Inspect" or "View Page Source" in your browser to examine the exact structure of the text.
Use Developer Tools (if needed): Check the computed DOM and element styles in the browser's developer tools to identify if additional elements or styles (e.g., padding or spacing) might be introducing these characters.

If your regex is not working as expected, anticipate these issues and adjust your pattern accordingly.
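
If you have copied the page text into a script, hidden characters can also be listed programmatically. The following is a minimal Python sketch; the helper name describe_hidden and the sample string (which assumes a non-breaking space and a zero-width space in the page text) are hypothetical.

```python
import unicodedata

def describe_hidden(text):
    """List (position, code point, Unicode name) for every character that is
    whitespace other than a plain space, or that is not printable."""
    found = []
    for position, char in enumerate(text):
        if char != " " and (char.isspace() or not char.isprintable()):
            name = unicodedata.name(char, "UNNAMED")  # control chars have no name
            found.append((position, f"U+{ord(char):04X}", name))
    return found

# Hypothetical copied text with a non-breaking space and a zero-width space.
sample = "Total balance:\u00a0$28,000\u200b"

hidden = describe_hidden(sample)
for entry in hidden:
    print(entry)
```

Each reported entry shows where the character sits in the text, which helps decide where the pattern needs \s* or an explicit character class.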

Best Practices & Troubleshooting

No Matches Found

Issue: Pattern doesn't match any text on the page
Solution: Verify the pattern against actual page content

Wrong Group Index

Issue: Extracted text is not what was expected
Solution: Verify group numbering in pattern
Example: In pattern "(a)(b)", group 1 is "a" and group 2 is "b"

Special Characters

Issue: Pattern fails due to special characters
Solution: Escape special characters properly
Example: Use \\. for decimal point, \\$ for dollar sign
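
To illustrate why escaping matters, here is a short sketch in Python's re module (not the ACCELQ command). In a Python raw string a single backslash is enough; how many backslashes a given tool requires depends on how it reads the pattern string. The sample text is an assumption.

```python
import re

# Hypothetical page text.
page_text = "Price: $1,299.50"

# Unescaped, "$" is an end-of-text anchor and "." matches any character,
# so this pattern cannot match the literal amount.
unescaped_match = re.search(r"$1,299.50", page_text)

# Escaped, both characters are matched literally.
escaped_match = re.search(r"\$1,299\.50", page_text)

# re.escape produces the escaped form of a literal string automatically.
prefix = re.escape("Price: $")
amount_match = re.search(prefix + r"([0-9,.]+)", page_text)

print(unescaped_match)
print(escaped_match.group(0))
print(amount_match.group(1))
```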