VBscript: Need some advice on how to parse HTML code and split them into parts

Hi All! I'm currently stuck on my school project and need some help.

I need to split the HTML into 5 parts:

Authors, Year, Title, Journal, Page

An example of the HTML is as follows:

<br><br>Chua Eng Huang, Cecil, Sandeep Purao, Veda C. Storey, 2006, Developing Maintainable Software: The READABLE Approach, <i>Decision Support Systems (Netherlands)</i>, Vol. 42, No. 1, pp 469 - 491.<br><br>Chua Eng Huang, Cecil, Huoy Min Khoo, Detmar W. Straub, Savitha Kadiyala, David Kuechler, 2005, The Evolution of e-Commerce Research: A Stakeholder Perspective, <i>Journal of Electronic Commerce Research (United States)</i>, Vol. 6, No. 4, pp 262 - 280.

I managed to use Split with "<br><br>" to separate the journal papers, but I am now having problems splitting them further into the 5 parts stated above. Truncation commands can't be used because the parts have no standard length.

The approach I was considering is to find a run of 4 digits (XXXX) to identify the year. The chunk to the left of the year is the authors. The remainder is split into 3 using the part in italics: the italic text is the journal, the part to its left is the title, and the part to its right is the page.

Is there a better way to do it? I'm struggling to code the logic described in the paragraph above. If that is the only way, can some helpful soul provide some advice or help on breaking the HTML into 5 parts?

Thanks...
By [tzixiang] at [2007-11-20 10:20:37]
# 1 Re: VBscript: Need some advice on how to parse HTML code and split them into parts
Since this is a school project, we are willing to help, but not solve. School projects should be completed by the person who is in school. With that said...

As you can see, you need to split the content on its HTML tags (<br>, <i>, and so on). Regular expressions can help you do that. Here (http://www.4guysfromrolla.com/ASPScripts/PrintPage.asp?REF=%2Fwebtech%2F120400-1.3.shtml) is a link that should help you finish this project.
PeejAvery at 2007-11-8 0:43:06 >
# 2 Re: VBscript: Need some advice on how to parse HTML code and split them into parts
Hi PeejAvery~

Thanks for the reference. Will read up and figure it out.

What about the values that are not split by HTML tags?
for example the authors are to the left of XXXX (4 numbers = year)?

Zx
tzixiang at 2007-11-8 0:43:57 >
# 3 Re: VBscript: Need some advice on how to parse HTML code and split them into parts
After you split by <*> (that is, by any tag), you can then split by commas.
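
A rough VBScript sketch of that idea, assuming the built-in RegExp object and a shortened copy of one sample entry. Note that "<*>" is shorthand and not a valid pattern, so the sketch uses "<[^>]+>" for "any tag". The "|" delimiter and the variable names are assumptions made for illustration, and WScript.Echo would be Response.Write in an ASP page.

```vbscript
Option Explicit

Dim entry, re, pieces, tail
entry = "Veda C. Storey, 2006, Developing Maintainable Software: The READABLE Approach, " & _
        "<i>Decision Support Systems (Netherlands)</i>, Vol. 42, No. 1, pp 469 - 491."

' VBScript has no regex-based Split, so first replace every tag with a
' delimiter that does not occur in the text ("|" is an assumption), then
' use the ordinary Split function.
Set re = New RegExp
re.Pattern = "<[^>]+>"   ' any tag: "<", one or more characters that are not ">", then ">"
re.Global = True         ' replace all tags, not just the first one
pieces = Split(re.Replace(entry, "|"), "|")

' pieces(0) is the text before <i>, pieces(1) the text inside <i>...</i>,
' and pieces(2) the text after </i>.
WScript.Echo "Journal: " & pieces(1)

' The text after the journal is comma-separated; the page range is the last item.
tail = Split(pieces(2), ", ")
WScript.Echo "Page: " & tail(UBound(tail))
```

Splitting pieces(0) by commas alone will not separate authors from title, because author names contain commas too, so the year still needs to be located some other way.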
PeejAvery at 2007-11-8 0:44:59 >
# 4 Re: VBscript: Need some advice on how to parse HTML code and split them into parts
If the year is the only four-digit number, you can use a regular expression to match it. Regular expressions can match numbers by using character class "[0-9]" or special match sequence "\d".
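
As a minimal sketch in VBScript, assuming the built-in RegExp object and part of a sample entry from the first post (the variable names are made up for illustration, and WScript.Echo would be Response.Write in an ASP page):

```vbscript
Option Explicit

Dim entry, re, matches
entry = "Chua Eng Huang, Cecil, Sandeep Purao, Veda C. Storey, 2006, " & _
        "Developing Maintainable Software: The READABLE Approach"

Set re = New RegExp
re.Pattern = "\d\d\d\d"   ' four digits in a row; "[0-9][0-9][0-9][0-9]" is equivalent
re.Global = False         ' stop at the first match

Set matches = re.Execute(entry)
If matches.Count > 0 Then
    WScript.Echo "Year: " & matches(0).Value
    ' FirstIndex is zero-based: the number of characters before the match
    WScript.Echo "Characters before the year: " & matches(0).FirstIndex
End If
```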

You can make sure that the number is by itself by requiring word boundaries (i.e. neighboring characters that are not word characters - letters, numbers and underscore are word characters). These are represented by "\b". They are placed around the string you want to separate: "\b(hi)\b" will match "hi" in strings such as "hi there" and "hi-fi", but not "high" or "this".
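
The word-boundary examples can be checked with a short VBScript sketch. HasMatch is a hypothetical helper written for this illustration, not a built-in function.

```vbscript
Option Explicit

' Hypothetical helper: returns True when pattern matches somewhere in text.
Function HasMatch(pattern, text)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    HasMatch = re.Test(text)
End Function

' "hi" stands alone in these two strings, so the pattern matches.
WScript.Echo "hi there: " & HasMatch("\b(hi)\b", "hi there")
WScript.Echo "hi-fi: " & HasMatch("\b(hi)\b", "hi-fi")

' Here "hi" is glued to other word characters, so there is no match.
WScript.Echo "high: " & HasMatch("\b(hi)\b", "high")
WScript.Echo "this: " & HasMatch("\b(hi)\b", "this")

' The same idea for a year: four digits that are not part of a longer number.
WScript.Echo "2006,: " & HasMatch("\b\d\d\d\d\b", "Storey, 2006, Title")
WScript.Echo "12345: " & HasMatch("\b\d\d\d\d\b", "12345")
```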

To match something an exact number of times, you can use the "{x}" quantifier, where x is the number of times you want to match: "\b(hi){3}\b" will match "hihihi". You can specify a range with a comma, "{min,max}", where the first number is the minimum and the second the maximum. The second number can be left out to represent infinity, as in "{2,}".
The "?" quantifier is the same as "{0,1}", "+" is the same as "{1,}" and "*" is the same as "{0,}".
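
Here is a sketch of how these quantifiers might be combined to cut an entry at the year. It assumes the year is the first four-digit number set off by commas, which holds for the sample entries but is not guaranteed in general. The variable names are illustrative only.

```vbscript
Option Explicit

Dim entry, re, matches, m, authors, rest
entry = "Chua Eng Huang, Cecil, Sandeep Purao, Veda C. Storey, 2006, " & _
        "Developing Maintainable Software: The READABLE Approach, "

Set re = New RegExp
' a comma, optional whitespace, exactly four digits (captured), a comma, optional whitespace
re.Pattern = ",\s*(\d{4}),\s*"
re.Global = False

Set matches = re.Execute(entry)
If matches.Count > 0 Then
    Set m = matches(0)
    ' FirstIndex is zero-based, so it equals the number of characters before the match.
    authors = Left(entry, m.FirstIndex)
    ' Mid is one-based, hence the + 1 to start right after the match.
    rest = Mid(entry, m.FirstIndex + m.Length + 1)

    WScript.Echo "Authors: " & authors
    WScript.Echo "Year: " & m.SubMatches(0)
    WScript.Echo "Remainder: " & rest
End If
```

The remainder still begins with the title, so the italic journal name is what separates the title from the page information.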
andreasblixt at 2007-11-8 0:45:59 >
# 5 Re: VBscript: Need some advice on how to parse HTML code and split them into parts
Thanks Peej and andreasblixt !

I managed to get the 5 values I needed to insert into the database =) Thanks!!!
This regular expression thing is real useful~
tzixiang at 2007-11-8 0:47:01 >
# 6 Re: VBscript: Need some advice on how to parse HTML code and split them into parts
tzixiang wrote: "Thanks Peej and andreasblixt !"
You're welcome.

tzixiang wrote: "This regular expression thing is real useful~"
Yes. And it can be very confusing at times too!
PeejAvery at 2007-11-8 0:48:11 >