
Extract Content From Html

"Hello,

Are there any utilities to help me extract content from HTML?

I'd like to store this data in a database.

The HTML consists of about 10,000 files with a total size of
about 160 MB. Each file is a thread from a message forum. Each
thread has several contributions. The threads are in linear
order of date posted, with filenames such as 000125633.html.
This HTML is very badly formed, with crucial tags missing.
There is no system to this: sometimes tags are missing and
sometimes they are present. Despite this, the threads seem to
render correctly; such is the forgiving nature of modern
browsers.

Fields for each post are usually identified by an attribute of
the enclosing tag.

Sometimes I need to store the HTML along with the content, for
instance when a post includes a link, colored writing or text
formatted with markup tags.

My purpose in storing this in a database is to make the content
(a) easier to search and (b) more efficient to store.

The original database from which these web-forum posts were
taken is no longer available on the web nor does it look like it
ever will be again. Nor can I contact the person who 'owns' it.
If I did contact them, they would be unlikely to release the
data.

Despite this, there are no copyright issues here. Every single
post made to the forum was made using an alias and no forum
poster wants to be identified, nor do any posters wish to claim
"ownership" of their contributions.

Mark4"
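
The task described above (tolerant parsing, fields marked by an attribute, inline HTML kept in the body, storage in a database) could be approached roughly as follows. This is only a minimal sketch using Python's standard html.parser and sqlite3 modules. The question does not show the real markup, so the class values "author", "date" and "body", the list of inline tags to keep, the table layout and the UTF-8 encoding are all assumptions made for illustration.

```python
import sqlite3
from html import escape
from html.parser import HTMLParser
from pathlib import Path

# Assumed field markers: the real attribute values are not given in the question.
FIELD_CLASSES = {"author", "date", "body"}
# Assumed inline tags worth keeping inside a post body (links, colour, etc.).
KEEP_TAGS = {"a", "b", "i", "font", "pre"}


class PostExtractor(HTMLParser):
    """Collect fields marked by a class attribute, tolerating missing close tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.posts = []        # finished posts, each a dict of field -> value
        self.current = {}      # post currently being assembled
        self.field = None      # field currently being captured
        self.field_tag = None  # tag name that opened that field
        self.buffer = []

    def finish_field(self):
        if self.field is not None:
            self.current[self.field] = "".join(self.buffer).strip()
        self.field, self.field_tag, self.buffer = None, None, []

    def finish_post(self):
        self.finish_field()
        if self.current:
            self.posts.append(self.current)
            self.current = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in FIELD_CLASSES:
            # A new marker closes the previous field even if its end tag is missing.
            self.finish_field()
            if cls in self.current:  # a repeated field means a new post has begun
                self.finish_post()
            self.field, self.field_tag = cls, tag
        elif self.field == "body" and tag in KEEP_TAGS:
            self.buffer.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == self.field_tag:
            self.finish_field()
        elif self.field == "body" and tag in KEEP_TAGS:
            self.buffer.append(f"</{tag}>")

    def handle_data(self, data):
        if self.field == "body":
            # Text arrives unescaped; re-escape it because the body is stored as HTML.
            self.buffer.append(escape(data, quote=False))
        elif self.field is not None:
            self.buffer.append(data)


def extract_posts(html_text):
    parser = PostExtractor()
    parser.feed(html_text)
    parser.close()
    parser.finish_post()
    return parser.posts


def store_posts(conn, thread_id, posts):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts "
        "(thread_id TEXT, position INTEGER, author TEXT, posted TEXT, body_html TEXT)"
    )
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?, ?, ?)",
        [(thread_id, i, p.get("author"), p.get("date"), p.get("body"))
         for i, p in enumerate(posts)],
    )
    conn.commit()


def import_directory(conn, directory):
    # The file encoding is unknown; undecodable bytes are replaced rather than failing.
    for path in sorted(Path(directory).glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        store_posts(conn, path.stem, extract_posts(text))
```

Because a field ends at its own closing tag or at the next field marker, whichever comes first, a missing closing tag does not derail the extraction. Nested tags with the same name as the field's opening tag would confuse this simple scheme, so a real import would need checking against a sample of the actual files. Once loaded, the posts can be searched with ordinary SQL, for example a LIKE query on body_html.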

Copyright ©2009 KnowleSys Software Inc.