What is noise?
Noise, or irrelevant information, consists of messages that do not match the purpose of the query. Simply put, noise is the sum of all irrelevant and unwanted messages. Ideally, you keep noise as low as possible so that you are not overwhelmed by large quantities of messages and only the relevant messages remain.
What types of noise are there?
Noise appears in various forms. Recognizing them makes it easier to apply effective noise-reduction methods. Below are four of the most common types of noise.
1. Double meaning
This type of noise is caused by a word or phrase that has more than one meaning. For example, the word 'jab' can refer to a vaccination shot (e.g. a COVID-19 jab) or to a punch (e.g. a jab to the chin). Messages that use the other meaning become noise as soon as you are interested in only one of the two.
2. Wrong context
Noise is also caused by searching more broadly than the scope of the topic of interest. Generic terms like 'fireworks' or 'corona' are examples of this. From 2020 onwards, corona is mentioned in virtually every message, even when it is not relevant to the topic you are interested in (such as a fire at a COVID-19 testing site, or vaccinations).
The risk of noise is even greater when you search for multiple words in no fixed order. For example, if you want to collect messages from people saying it is "too full" somewhere indoors, the meaning easily changes when the order of the words changes, as in the example below.
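To make the word-order problem concrete, here is a minimal sketch in Python that contrasts matching words in any order with matching an exact phrase. It is a hypothetical illustration of the idea, not how Maltego Monitor matches messages; the functions `matches_any_order` and `matches_phrase` and the sample messages are made up for this example.

```python
import re

def tokens(text: str) -> list[str]:
    # Lowercase the text and split it into words (apostrophes kept inside words).
    return re.findall(r"[a-z']+", text.lower())

def matches_any_order(text: str, words: list[str]) -> bool:
    """Hypothetical matcher: every search word occurs somewhere, in any order."""
    present = set(tokens(text))
    return all(w in present for w in words)

def matches_phrase(text: str, phrase: str) -> bool:
    """Hypothetical matcher: the words occur consecutively, in the given order."""
    t, p = tokens(text), tokens(phrase)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))

messages = [
    "It is way too full inside the station right now",  # relevant
    "I'm full, so I'm going inside too",                # noise: same words, other meaning
]

unordered = [m for m in messages if matches_any_order(m, ["too", "full", "inside"])]
exact = [m for m in messages if matches_phrase(m, "too full")]

print(len(unordered))  # 2
print(len(exact))      # 1
```

Both messages contain "too", "full" and "inside", so the unordered search collects the irrelevant one as well; requiring the exact phrase keeps only the first.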
3. Unwanted sources
A significant part of social media consists of accounts that always post the same type of unwanted messages, for example bot accounts, sex advertisements, real estate companies and (sometimes) Twitter news accounts. Messages from these accounts are noise when they post predictable content that is irrelevant to the purpose of the search strategy.
4. Unknown language
In Maltego Monitor you can select a search language for your searches. The most relevant language is preset by default on your Maltego Monitor account.
Logically, you would expect to collect only messages in this language. Unfortunately, the sources YouTube, Fora, Telegram and news websites are an exception.
Messages from these sources are not labeled with a language by the source, which means you may encounter messages from these sources in a language other than the one you are searching in.
How do I locate noise?
Before you start reducing noise, it is important to identify its main causes. Always start with the Searches that generate the most messages, as these usually also contain the most noise. We advise using several analytical tools to isolate the causes of noise.
Sources
Above the messages in a case, you can find a dropdown list of analytical tools. Open the tool called 'Sources' to see which sources contain the most messages. Because of their nature, some sources cause different types of noise than others.
Wordcount
Use the Wordcount option to quickly identify the causes of noise. The eight most frequently used words indicate which searches are collecting which kind of noise. Selecting a word temporarily filters the messages on that word. To add the word to a search or filter, select the three dots beside it. This lets you try different combinations.
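The idea behind Wordcount can be sketched in a few lines of Python. This is a hypothetical illustration, not Maltego Monitor's implementation: `top_words` and `filter_on_word` are made-up names, and the stop-word list is an assumption (it is not known whether Wordcount skips such words).

```python
import re
from collections import Counter

# Assumed stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "on", "is", "it", "at", "my"}

def words_of(message: str) -> list[str]:
    # Lowercase and split into words; "COVID-19" becomes "covid" and "19".
    return re.findall(r"[a-z0-9']+", message.lower())

def top_words(messages: list[str], n: int = 8) -> list[tuple[str, int]]:
    """Count non-stop-words over all messages and return the n most frequent."""
    counts = Counter()
    for message in messages:
        counts.update(w for w in words_of(message) if w not in STOP_WORDS)
    return counts.most_common(n)

def filter_on_word(messages: list[str], word: str) -> list[str]:
    """Keep only the messages that contain the selected word."""
    return [m for m in messages if word in words_of(m)]

messages = [
    "Got my COVID-19 jab today",
    "Booked my second jab at the vaccination centre",
    "What a jab to the chin in round three",
    "Boxing night: that jab to the chin ended it",
]

print(top_words(messages, 2))            # [('jab', 4), ('chin', 2)]
print(filter_on_word(messages, "chin"))  # the two boxing messages
```

A frequent word such as 'chin' next to the search term 'jab' is a quick hint that a search is collecting the other meaning of the word.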
Accounts
The Accounts tool shows which accounts post the most within the search. Zoom in on an account by selecting it, or add it as a Search or filter.
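The Sources and Accounts tools both rank messages by where they come from. As a rough sketch, assuming each message is a record with hypothetical `source` and `account` fields (not Maltego Monitor's data model), the ranking boils down to a tally:

```python
from collections import Counter

# Hypothetical message records; the field names are assumptions.
messages = [
    {"source": "Twitter", "account": "@promo_bot"},
    {"source": "Twitter", "account": "@promo_bot"},
    {"source": "Twitter", "account": "@promo_bot"},
    {"source": "Twitter", "account": "@jane"},
    {"source": "Telegram", "account": "city_news"},
]

def busiest(messages: list[dict], field: str, n: int = 5) -> list[tuple[str, int]]:
    """Rank the values of one field (e.g. source or account) by message count."""
    return Counter(m[field] for m in messages).most_common(n)

print(busiest(messages, "source"))      # [('Twitter', 4), ('Telegram', 1)]
print(busiest(messages, "account", 1))  # [('@promo_bot', 3)]
```

A single account producing a large share of the messages, like the bot in this made-up sample, is a typical candidate for exclusion.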
How can I reduce noise?
Reducing noise already starts when you create the case. However, it is not always possible to create a noise-free case in advance, for example when a hashtag on Twitter suddenly starts trending, or simply because it is hard to predict where noise will come from.
The following noise-reduction methods can be applied at any stage in the process. We advise aiming for an 80-20 balance (80% relevant, 20% noise).
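One way to check the 80-20 balance is to review a sample of messages by hand and compute the noise share. The sketch below assumes you have labeled each reviewed message as relevant (`True`) or noise (`False`); the sampling and the `noise_share` helper are hypothetical, not a Maltego Monitor feature.

```python
def noise_share(labels: list[bool]) -> float:
    """Fraction of noise in a manually reviewed sample (True = relevant, False = noise)."""
    if not labels:
        raise ValueError("the sample is empty")
    return labels.count(False) / len(labels)

# Example: 50 reviewed messages, 34 relevant and 16 noise.
sample = [True] * 34 + [False] * 16
share = noise_share(sample)

print(f"{share:.0%} noise")  # 32% noise
print("within the 80-20 target" if share <= 0.20 else "reduce noise further")
```

With 32% noise, this sample is above the 20% target, so further noise reduction is worthwhile.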
Method | Explanation | Type of noise |
Purposeful formulation | Formulate terms specifically enough to match the search goal. | Double meaning / Wrong context |
AND combinations | Combine terms and building blocks so that messages must match all of them. | Double meaning / Wrong context |
Quotation marks “” | Always use quotation marks when the order of the words in a phrase must not change. | Wrong context |
Exclude terms on case level | Add exclusion terms to a case to exclude messages containing these terms from the entire case. | Double meaning |
Exclude/report accounts | Exclude or report Twitter accounts and news sources to stop collecting data from them. | Unwanted sources |
Exclude terms on building block level | Exclude terms for the entire building block (larger impact). | Double meaning |
Exclude terms on term level | Exclude terms per term in the building block (smaller impact). | Double meaning |
Unknown language filter | Hide messages from YouTube, Fora, Telegram and news websites that are in a language other than the search language. | Unknown language |
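To show how AND combinations, quotation marks and case-level exclusions work together, here is a hypothetical model in Python. It is not Maltego Monitor's query syntax or matching logic: it assumes that terms within one building block are alternatives (OR), that building blocks are combined with AND, and that a term wrapped in double quotes must match as an exact phrase. Building-block and term-level exclusions and the language filter are not modeled.

```python
import re

def tokens(text: str) -> list[str]:
    # Lowercase and split into words; quote characters are dropped here.
    return re.findall(r"[a-z0-9']+", text.lower())

def contains(text_tokens: list[str], term: str) -> bool:
    words = tokens(term)
    if term.startswith('"') and term.endswith('"'):
        # Quoted term: the words must appear consecutively and in this order.
        n = len(words)
        return any(text_tokens[i:i + n] == words for i in range(len(text_tokens) - n + 1))
    # Unquoted term: every word must appear somewhere, in any order.
    return all(w in text_tokens for w in words)

def matches(message: str, blocks: list[list[str]], case_exclusions: list[str]) -> bool:
    t = tokens(message)
    # Case-level exclusions drop the message everywhere in the case.
    if any(contains(t, term) for term in case_exclusions):
        return False
    # Assumed semantics: AND across building blocks, OR within one block.
    return all(any(contains(t, term) for term in block) for block in blocks)

blocks = [["jab", "vaccination"], ["appointment", '"side effects"']]
case_exclusions = ["chin", "boxing"]

messages = [
    "Booked an appointment for my COVID-19 jab",  # kept
    "Side effects after the jab were mild",       # kept: exact phrase
    "Effects on my side after the jab",           # dropped: word order differs
    "Boxing appointment: a jab to the chin",      # dropped: excluded terms
]

for m in messages:
    print(matches(m, blocks, case_exclusions), m)
```

In this sketch, the exclusions remove the boxing meaning of 'jab' (double meaning), while the quoted phrase keeps out messages that merely contain the same words in a different order (wrong context).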