Data Source Tutorial | Appendix B: Data Parser |
First, we extract the value of the title attribute with another GetTag rule (again inserted as a child of the first rule). It returns the text between title=" (the attribute name, the equals sign, and the opening quote character) and the closing quote character:
<ParsingRule type="GetTag" source="DaySource" result="Condition">
<StartTag>title="</StartTag>
<EndTag>"</EndTag>
</ParsingRule>
Then we remove the "Chance for" string, and any text that follows it, using the TrimFromStart rule:
<ParsingRule type="TrimFromStart" source="Condition" result="Condition">
<SearchText>Chance for</SearchText>
</ParsingRule>
This leaves us with the desired text, but with a trailing space character (e.g. "Heavy Rain "). We remove this extra space with a Trim rule:
<ParsingRule type="Trim" source="Condition" result="Condition" />
Why didn't we just use a <space> token in the SearchText parameter of the TrimFromStart rule (e.g. <SearchText><space>Chance for</SearchText>)? Because TrimFromStart's SearchText parameter does not support the special "tag" syntax, so it doesn't interpret the <space> token as a space character.
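The three Condition rules can be mimicked outside the parser to see what each step leaves behind. The following Python sketch is only an illustration, not the parser's implementation: the helper functions get_tag and trim_from_start are hypothetical stand-ins written from the behaviour described above, and the sample DaySource fragment (apart from the "Heavy Rain " result) is an assumed example.

```python
def get_tag(text: str, start_tag: str, end_tag: str) -> str:
    """Return the text between the first start_tag and the next end_tag.

    Assumed behaviour of a GetTag rule: both delimiters are excluded.
    """
    start = text.find(start_tag)
    if start == -1:
        return ""
    start += len(start_tag)
    end = text.find(end_tag, start)
    if end == -1:
        return ""
    return text[start:end]


def trim_from_start(text: str, search_text: str) -> str:
    """Drop search_text and everything after it (assumed TrimFromStart behaviour)."""
    index = text.find(search_text)
    return text if index == -1 else text[:index]


# Hypothetical DaySource fragment; the real page markup may differ.
day_source = '<img src="rain.gif" title="Heavy Rain Chance for Measurable Precipitation 80%">'

# Step 1 (GetTag): text between title=" and the closing quote.
condition = get_tag(day_source, 'title="', '"')

# Step 2 (TrimFromStart): cut "Chance for" and whatever follows it.
trimmed = trim_from_start(condition, "Chance for")

# Step 3 (Trim): remove the trailing space that step 2 leaves behind.
final_condition = trimmed.strip()

print(repr(condition))
print(repr(trimmed))
print(repr(final_condition))
```

Printing the values with repr makes the trailing space after step 2 visible, which is why the final Trim rule is needed.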
Now let's turn to the temperature. In each table cell, the temperature string is always found between a <br> tag and a closing </font> tag (e.g. <br>Hi <font color="#FF0000">81°F</font>). So we use those tags in a GetTag rule to extract the temperature:
<ParsingRule type="GetTag" source="DaySource" result="Temp">
<StartTag><br></StartTag>
<EndTag></font></EndTag>
</ParsingRule>
This gets us the temperature, but with the string "<font color="#FF0000">" embedded inside (e.g. "Hi <font color="#FF0000">81°F"). We want to keep the "Hi" or "Lo" part, so we remove only that opening <font …> tag. If you look at the HTML code, you'll see that "Hi" temperatures get a font color of #FF0000 while "Lo" temperatures get #0033CC. Since we want to remove the font tag regardless of which color is specified, we use a ReplaceTag rule with an empty NewText parameter to replace everything from "<font" through ">" with nothing:
<ParsingRule type="ReplaceTag" source="Temp" result="Temp">
<StartTag><font</StartTag>
<EndTag>></EndTag>
<NewText/>
</ParsingRule>
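To check the effect of the two temperature rules, the same transformation can be approximated with regular expressions. This Python sketch is an assumption-laden illustration rather than the parser's own logic: the function names are hypothetical, the "Lo" sample value is made up, and the sketch removes every <font …> tag it finds, whereas the ReplaceTag rule may handle repeated matches differently.

```python
import re


def get_temp_raw(day_source: str) -> str:
    """Approximate the GetTag rule: text between <br> and the next </font>."""
    match = re.search(r"<br>(.*?)</font>", day_source, re.DOTALL)
    return match.group(1) if match else ""


def strip_font_tag(temp: str) -> str:
    """Approximate the ReplaceTag rule: replace '<font' through '>' with nothing."""
    return re.sub(r"<font.*?>", "", temp, flags=re.DOTALL)


# The "Hi" sample is the one quoted in the text; the "Lo" value is hypothetical.
hi_cell = '<br>Hi <font color="#FF0000">81°F</font>'
lo_cell = '<br>Lo <font color="#0033CC">62°F</font>'

raw_hi = get_temp_raw(hi_cell)    # still contains the opening font tag
temp_hi = strip_font_tag(raw_hi)  # font tag removed, "Hi" label kept
temp_lo = strip_font_tag(get_temp_raw(lo_cell))

print(repr(raw_hi))
print(repr(temp_hi))
print(repr(temp_lo))
```

Because the pattern matches from "<font" to the first ">" without naming a color, the same step works for both the #FF0000 and #0033CC cells.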
Polycom, Inc. | 187 |