Knowlesys Web Newshub System

Powered by the world's leading data acquisition technology, the Knowlesys Web Newshub System enables web editors to find the latest newsworthy information systematically, quickly and at scale every day.

I. Overview

Knowlesys Web Newshub System is an editing platform that automatically collects, summarizes and identifies critical information in real time from numerous target websites (e.g. news sites, BBS, blogs, microblogs), finds newsworthy items and provides functions for subsequent editing and review.

Its system architecture is illustrated below:

Fig. 1. Architecture of Knowlesys Web Newshub System

Compared with manual reprinting, it has the following advantages:

Indicator for comparison: Knowlesys Web Newshub System vs. manual reprint

Target site
Knowlesys Web Newshub System: hundreds, thousands, even tens of thousands

Labor cost
Knowlesys Web Newshub System: network information is accessed automatically; only a few editors are needed to view and analyze content manually within the private LAN
Manual reprint: a large number of editors must log in to every site and manually access, copy and paste the content, which is tiresome

News source identification
Knowlesys Web Newshub System: manual confirmation based on automatic identification
Manual reprint: manual review and confirmation required item by item

Information storage
Knowlesys Web Newshub System: accurate, full coverage, easy to track
Manual reprint: fragmented, errors unavoidable

Data storage
Knowlesys Web Newshub System: all data stored in a large relational database under centralized management
Manual reprint: pasted ad hoc, hard to manage

Work report
Knowlesys Web Newshub System: based on automated statistical analysis, with both text and illustrations, detailed supporting statistics, and daily, weekly and monthly report generation
Manual reprint: ambiguous, unclear, no quantitative analysis

Reprint effect
Knowlesys Web Newshub System: systematic, large-scale reprinting from partner media and user tip-offs; traffic and ranking are boosted quickly
Manual reprint: unsystematic, low volume

II. Benefits

1. The latest information from all news sites, print media, BBS, blogs and video sites is presented automatically;
2. The system finds valuable information immediately, and it can be selected with a single click;
3. Editors have more time for in-depth editing or original writing;
4. Daily reprint volume increases by dozens or hundreds of times, and so do website traffic and ranking.

III. Composition

Knowlesys Web Newshub System consists of three sub-systems: an extraction sub-system, an analysis sub-system and a presentation sub-system. Their connections are shown below:

Fig. 2. System composition

The network topology of Knowlesys Web Newshub System is shown below. It can be deployed separately on the Internet-connected LAN and the private LAN as needed.


Fig. 3. Network topology

IV. Functional description of automatic acquisition sub-system

The automatic acquisition sub-system can collect content from any target website automatically, e.g. Xinhua Net and other sites specified by users.

It can extract all news articles or threads, or only the content of the latest thread. It can also extract all replies to a thread, or only the content of the latest reply. It can monitor specified target websites, monitor websites around the world without specified targets, or combine the two modes. It can monitor not only domestic websites but also foreign ones, e.g. BBC and CNN.

The back-end database can be any mainstream relational database, e.g. Oracle, IBM DB2, MS SQL Server, MySQL or Sybase, or a file-based database, e.g. Access.

The all-round monitoring function of the automatic acquisition sub-system is illustrated below:

Fig. 4. All-round monitoring of extraction sub-system

The automatic acquisition sub-system has the following features:

1. World's leading automatic data mining function
Knowlesys' web data mining technology is world-leading and can accurately collect any data from any web page. Every day, Knowlesys provides data mining services covering all kinds of websites to clients within and outside China. This requires an efficient and stable acquisition platform.

2. All targets can be monitored.
News, BBS, blogs, public chat rooms, search engines, message boards, applications, electronic editions of newspapers and websites can be monitored in real time.

3. Thousands of news websites can be monitored without additional configuration.
With the built-in configuration for worldwide website monitoring, titles and texts are acquired automatically as soon as keywords are entered.

4. Powerful multi-language centralized processing function
Information in multiple languages, such as Chinese, English, French, German, Japanese and Korean, can be automatically processed and stored.

5. Smart article extraction
Article texts, titles and release dates can be extracted directly from article-type web pages without additional configuration, while irrelevant content such as adverts, columns and copyright information is removed automatically.
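
The extraction idea described above can be illustrated with a minimal sketch. It is not the Knowlesys implementation: it assumes that the article body sits in <p> elements and that anything inside script, style, nav, footer or aside tags is boilerplate. The names ArticleParser, extract_article and SKIP_TAGS are hypothetical, and release-date extraction is left out.

```python
from html.parser import HTMLParser

# Tags whose content is treated as boilerplate in this sketch (an assumption).
SKIP_TAGS = {"script", "style", "nav", "footer", "aside"}


class ArticleParser(HTMLParser):
    """Collects the <title> text and the text of <p> elements outside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._skip_depth = 0   # greater than 0 while inside a boilerplate tag
        self._in_title = False
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "p" and self._skip_depth == 0:
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace inside the paragraph.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p and self._skip_depth == 0:
            self._buf.append(data)


def extract_article(page_html):
    """Return the page title and the joined paragraph text as a dict."""
    parser = ArticleParser()
    parser.feed(page_html)
    parser.close()
    return {"title": parser.title.strip(), "text": "\n".join(parser.paragraphs)}
```

A production extractor would need heuristics well beyond tag names (text density, link ratio, site-specific rules); the sketch only shows the general shape of separating article text from surrounding page furniture.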

6. All web page conditions are supported:
Popular Web 2.0 AJAX dynamic websites
Auto-login with user ID and password
Form queries
Automatic next-page navigation
Automatic extraction and combination of article contents spanning several pages
Automatic downloading of images contained in texts and of various attachments
Original snapshot saving option for review
Multiple Internet protocols supported: HTTP, HTTPS and FTP
Multiple web file formats supported: HTML/XML/CSV/TEXT/RSS/ATOM
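
As an illustration of the next-page navigation and multi-page combination listed above, the following is a minimal sketch under stated assumptions. The function collect_paginated and its fetch callback are hypothetical names; fetch stands in for whatever download-and-parse step returns a page's text together with its next-page link.

```python
def collect_paginated(first_url, fetch, max_pages=50):
    """Follow next-page links starting at first_url and join the page texts.

    fetch(url) must return (text, next_url_or_None). It is a hypothetical
    callback standing in for the real download-and-parse step.
    """
    parts = []
    seen = set()   # guards against pagination loops
    url = first_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        text, url = fetch(url)
        parts.append(text)
    return "\n".join(parts)


# Usage with an in-memory stand-in for a three-page article.
pages = {
    "page1": ("Part one.", "page2"),
    "page2": ("Part two.", "page3"),
    "page3": ("Part three.", None),
}
full_text = collect_paginated("page1", lambda u: pages[u])
```

The seen set and the max_pages cap are defensive choices for sites whose last page links back to an earlier one.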

7. Automatic deduplication function
For the same URL, only the latest, not-yet-collected article contents or replies are collected each time; contents already acquired are ignored. Automatic deduplication can also be applied to reprinted articles.
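
One common way to realize this kind of deduplication is to remember item URLs and a fingerprint of each item's text. The sketch below assumes exact matching after lowercasing and whitespace normalization, which catches verbatim reprints only; the class name Deduplicator and its methods are hypothetical and do not describe the product's internals.

```python
import hashlib


class Deduplicator:
    """Remembers item URLs and content fingerprints that were already collected."""

    def __init__(self):
        self._seen_urls = set()
        self._seen_hashes = set()

    @staticmethod
    def _fingerprint(text):
        # Lowercase and collapse whitespace so trivial formatting
        # differences do not defeat the comparison.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def is_new(self, item_url, text):
        """Return True and record the item if neither URL nor text was seen before."""
        fp = self._fingerprint(text)
        if item_url in self._seen_urls or fp in self._seen_hashes:
            return False
        self._seen_urls.add(item_url)
        self._seen_hashes.add(fp)
        return True
```

Reprints that have been lightly edited would need a near-duplicate technique (for example shingling) rather than an exact hash.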

8. Various built-in post-data processing functions
After data are acquired from web pages, they can be further split into finer data fields, or integrated, replaced or summarized; for example, keywords, street addresses, province/city names, postal codes, telephone numbers, fax numbers, e-mail addresses, QQ/MSN/Skype accounts and URLs can be extracted.
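
Field extraction of this kind is often done with regular expressions. The sketch below covers only e-mail addresses and URLs, and the patterns are deliberately simple assumptions rather than complete validators; extract_fields, EMAIL_RE and URL_RE are hypothetical names.

```python
import re

# Simplified patterns (assumptions): good enough for illustration,
# not full RFC-compliant matchers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_fields(text):
    """Return the unique e-mail addresses and URLs found in text, sorted."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    # Strip punctuation that commonly trails a URL at the end of a sentence.
    urls = sorted(set(u.rstrip(".,;)") for u in URL_RE.findall(text)))
    return {"emails": emails, "urls": urls}
```

Fields such as telephone numbers, postal codes and place names depend heavily on locale and would need their own rules or dictionaries.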

9. Automatic, unattended acquisition around the clock
The system can operate either on a schedule or on a 24/7 basis, at intervals as short as 1 minute.
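
Interval-based operation can be sketched as a simple loop. The function run_every below is a hypothetical helper, not part of the product; it assumes a single task run in the same process, and the max_runs and sleep parameters exist only so the loop can be bounded and tested.

```python
import time


def run_every(interval_seconds, task, max_runs=None, sleep=time.sleep):
    """Call task() repeatedly, aiming for one start every interval_seconds.

    With max_runs=None the loop runs indefinitely (round-the-clock operation).
    """
    runs = 0
    next_start = time.monotonic()
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        # Schedule relative to the planned start, so slow tasks do not
        # push every later run back.
        next_start += interval_seconds
        delay = next_start - time.monotonic()
        if delay > 0 and (max_runs is None or runs < max_runs):
            sleep(delay)
    return runs
```

For the 1-minute interval mentioned above, a call would look like run_every(60, collect_once), where collect_once is whatever function performs one acquisition pass.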

10. Users can add target websites themselves.
With the acquisition platform provided by the system, users can easily analyze target websites visually, configure acquisition task files and add them to the deployment, so that any monitored target can be modified, added or removed freely.

V. Functional description of presentation sub-system

The presentation sub-system allows the latest information from all possible source sites to be presented on users' desktop browsers. Its functional architecture is illustrated below:

Fig. 5. Functional architecture of presentation sub-system

The presentation sub-system has the following distinct features:

1. Working in collaboration
Different users view different contents, execute different operations and perform different duties.

2. Displaying article elements
For news and blogs, titles, texts, authors, release times and sources can be collected.
Keywords are highlighted, and title-only lists can be displayed for quick viewing.

3. Displaying post elements
For posts on BBS, titles, texts, posting times, view counts, numbers of replies and poster IP addresses can be collected.
Keywords are highlighted, and title-only lists can be displayed for quick viewing.
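
Keyword highlighting, as mentioned for both articles and posts, can be sketched as follows. The sketch assumes plain-text input and HTML output that wraps matches in <mark> elements; the function name highlight and the choice of <mark> are assumptions, not details from the product.

```python
import html
import re


def highlight(text, keywords):
    """Return HTML in which every case-insensitive keyword match is wrapped in <mark>."""
    words = [k for k in keywords if k]
    if not words:
        return html.escape(text)
    # Longest keywords first, so "data mining" wins over "data".
    ordered = sorted(words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)
    out = []
    pos = 0
    for m in pattern.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        out.append("<mark>" + html.escape(m.group(0)) + "</mark>")
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

Escaping the text piece by piece keeps characters such as & and < from the source article from being interpreted as markup in the browser.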

4. Classifying and compiling
The contents acquired can be filtered, classified, annotated and compiled for subsequent management and analysis.

5. Powerful search function
The system can perform precise or fuzzy searches, and can search by category or by source.
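
The difference between precise and fuzzy search, combined with category and source filters, can be sketched as below. The brochure does not say how fuzzy matching is defined, so this sketch assumes it means tolerating small spelling differences, measured with the standard library's difflib; the field names title, category and source and the 0.8 threshold are likewise assumptions.

```python
from difflib import SequenceMatcher


def matches(title, query, fuzzy=False, threshold=0.8):
    """Precise: case-insensitive substring. Fuzzy: also accept similar words."""
    title_l, query_l = title.lower(), query.lower()
    if query_l in title_l:
        return True
    if not fuzzy:
        return False
    return any(
        SequenceMatcher(None, word, query_l).ratio() >= threshold
        for word in title_l.split()
    )


def search(articles, query, fuzzy=False, category=None, source=None):
    """Filter a list of article dicts by optional category and source, then by query."""
    results = []
    for article in articles:
        if category is not None and article["category"] != category:
            continue
        if source is not None and article["source"] != source:
            continue
        if matches(article["title"], query, fuzzy):
            results.append(article)
    return results
```

With this sketch, a misspelled query such as "electoin" finds nothing in precise mode but can still match a title containing "election" in fuzzy mode.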

6. Supporting manual adding
Articles can be added manually, in addition to the monitoring of news, BBS and blogs.

7. Anti-website restrictions
Foreign websites blocked in China and websites that restrict access by source IP or access frequency can be collected, and proxy IP addresses can be gathered automatically, all without further configuration.

VI. Implementation

The system is mainly intended for portal operators.
Due to the complexity of the Internet, implementing the Knowlesys Web Newshub System requires communication and cooperation with users.
We provide the following implementation services to meet user requirements:





Turn-key project

Provide a full package of software and documentation for the Knowlesys Web Newshub System;
provide the acquisition configuration files for N websites specified by users;
ensure that the contents of target websites can be integrated promptly after the system is launched.



E-training or training at clients' premises


Subsequent services

Provide updated configuration parameter files after target websites change;
revisit clients regularly and respond to technical queries.


Technical support

Answer questions from users and give technical support via e-mail and QQ/MSN/Skype.