Web Monitoring: From Big Data to Small Data Analysis Through OSINT
Every organization should know what information is circulating on the Internet about its activities so that it can take concrete action to address any potential threat to the organization or to capture a competitive advantage for its own business. Organizations should monitor not only the so-called “sentiment,” but also, for example, information on the unauthorized sale of products, the existence of information classified as confidential, unauthorized use of name and trademark, and counterfeiting.
The identification and assessment of specific risks stemming from information freely circulating on the Internet are often unsatisfactory, inefficient and costly. Monitoring the web means working directly on big data, which carries multiple challenges: too much information to manage, limited availability of information on which to focus the research, repetitive and frustrating manual work, difficulty obtaining information in time, and the assignment of priorities to the results obtained.
The web monitoring process can, however, draw on focused services built around niches of data, which improve the quality of research compared to generalist search engines. There are specialized search engines, portals and social networks equipped with their own internal search functions and aggregators. Using them makes it possible to go from a generalized manual search process (costly, inefficient, opaque, delayed) to a more focused service (automatic, economical, intelligent, timely), which is useful for discovering external risk to business objectives. The resulting information, appropriately filtered, aggregated and ordered is, therefore, focused, easy to analyze and of better quality.
Three Steps for Defining the Web Monitoring Model
The first step in creating a systematic process for searching the web for all information potentially interesting to the organization (web monitoring) is to establish the context (e.g., industry sector, type of activity, values, problems, needs, goals, expectations) of the organization.
After conducting an analysis of its context and its problems, the organization can then prepare a list of Internet risks. This is useful for identifying and isolating the key information that will form the basis for defining the search rules.
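As a minimal sketch of how a risk list can drive search-rule definition, the following expands a set of risk categories into concrete query strings. The brand name, risk categories and keyword phrases are illustrative assumptions, not taken from the article:

```python
# Illustrative risk list: each risk category maps to keyword phrases
# containing a <brand> placeholder. All entries are hypothetical examples.
RISK_LIST = {
    "counterfeiting": ["fake <brand>", "replica <brand>"],
    "data_leak": ["<brand> confidential", "<brand> internal document"],
    "unauthorized_sale": ["<brand> unofficial reseller", "cheap <brand> wholesale"],
}


def build_search_rules(brand: str, risk_list: dict) -> list:
    """Expand the brand placeholder into concrete search rules,
    one rule per (risk category, keyword phrase) pair."""
    rules = []
    for risk, phrases in risk_list.items():
        for phrase in phrases:
            rules.append({
                "risk": risk,
                "query": phrase.replace("<brand>", brand),
            })
    return rules


rules = build_search_rules("ExampleCorp", RISK_LIST)
```

Keeping each rule tagged with its originating risk category is what later allows results to be prioritized by business impact rather than treated as an undifferentiated stream.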
The second step consists of understanding which types of public information archives on the Internet (news, forums, e-commerce, web pages, databases, file sharing services, messages, social media, etc.) are useful in terms of business objectives. In other words, the organization needs to identify the Open-Source Intelligence (OSINT) services that improve the retrieval of the relevant information.
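The pairing of archive types with risks can be captured in a simple lookup structure. This sketch is purely illustrative: the mapping below (which risks each archive type covers) is an assumption for demonstration, not a recommendation from the article:

```python
# Hypothetical mapping from public-archive types to the risk
# categories they are best suited to cover.
SOURCE_TYPES = {
    "e-commerce": ["counterfeiting", "unauthorized_sale"],
    "file_sharing": ["data_leak"],
    "social_media": ["trademark_abuse", "sentiment"],
    "news": ["sentiment"],
}


def sources_for(risk: str) -> list:
    """Return the archive types worth querying for a given risk."""
    return [src for src, risks in SOURCE_TYPES.items() if risk in risks]
```

Querying only the archive types relevant to each risk is what turns the generalized big-data search into a set of small, focused data extractions.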
The third and last step is the systematic analysis of the data to extract the smallest, and therefore humanly manageable, set of information. The technology is powerful for making decisions, but a little human knowledge of the business goals drastically reduces false positives, i.e., collected data that pose no problem for the organization and whose knowledge brings it no advantage. Because of this, it is necessary to use an organizational process that involves both technology and human resources.
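The combination of an automatic ranking with analyst feedback can be sketched as follows. The keyword weights, the sample items and the false-positive list are all illustrative assumptions; a real service would use richer scoring and a persistent feedback loop:

```python
# Hypothetical keyword weights reflecting business priorities.
KEYWORD_WEIGHTS = {"confidential": 3, "replica": 2, "reseller": 1}


def score(text: str) -> int:
    """Sum the weights of the keywords found in an item."""
    t = text.lower()
    return sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in t)


def rank(items: list, known_false_positives: set, top_n: int = 2) -> list:
    """Drop items analysts have already marked as false positives,
    then keep only the top-ranked results for human review."""
    candidates = [i for i in items if i not in known_false_positives]
    return sorted(candidates, key=score, reverse=True)[:top_n]


# Illustrative collected items (not real data).
items = [
    "Replica watches by ExampleCorp reseller",
    "ExampleCorp confidential report leaked",
    "ExampleCorp quarterly newsletter",
]
top = rank(items, known_false_positives={"ExampleCorp quarterly newsletter"})
```

The human contribution enters twice: once in choosing the weights, and once in feeding confirmed false positives back into the filter, so each review cycle shrinks the data set further.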
In summary, this web monitoring service:
Identifies the keywords and the search rules based on knowledge of the internal purposes of the business, its questions, requirements and expectations
Extracts sets of specialized data from OSINT services that are small enough to be processed by traditional methods, compared to the total chaos of big data
Further reduces the size of the data sets and improves the quality of the results by combining human work and ranking algorithms
The top tier of the results obtained, arranged by the desired ranking, represents the information top management needs to understand the organization's exposure to risk or potential business opportunities.