
How do I reduce noise?

Modified on: Wed, 16 Oct, 2024 at 4:26 AM

What is noise?

Noise, or irrelevant information, consists of messages that do not align with the purpose of the query. Simply put: noise is every irrelevant and unwanted message taken together. Ideally, you minimize noise to as low a level as possible, so that you are not overwhelmed by large quantities of messages and keep only the relevant ones.

 

What types of noise are there? 

Noise appears in varying forms, and recognizing them makes it easier to apply effective noise-reducing methods. Below are four of the most prominent types of noise.

 

1. Double meaning

This type of noise is caused by a word or phrase that can have more than one meaning. To illustrate: the word 'jab' can be used in two different ways, as a vaccination shot (e.g. COVID-19 jab) or as a punch (e.g. a jab to the chin). This becomes noise when you are interested in only one of the two meanings.




2. Wrong context

Noise is also caused by searching more broadly than the scope of the topic of interest. Generic terms like 'fireworks' or 'corona' are examples of this. From 2020 onwards, corona is mentioned in a very large share of messages, often without being relevant to your topic of interest (such as fires at testing sites, vaccinations, etc.).


The risk of noise can be even greater when you search for multiple words in no defined order. For example, when you want to collect messages from people saying it is "too full" somewhere, the meaning easily changes when the word order changes: a message such as "I ate too much, now I am full" contains both words but says something entirely different.
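The difference between an unordered word match and an exact phrase match can be sketched in a few lines of Python. This is only an illustration of the idea, not how Maltego Monitor evaluates searches; the function names and sample messages are made up for the example.

```python
import re

def tokens(text):
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def matches_all_words(text, words):
    """Unordered match: every word occurs somewhere in the message."""
    present = set(tokens(text))
    return all(w.lower() in present for w in words)

def matches_phrase(text, phrase):
    """Ordered match: the words occur next to each other, in this order."""
    toks, wanted = tokens(text), tokens(phrase)
    n = len(wanted)
    return any(toks[i:i + n] == wanted for i in range(len(toks) - n + 1))

# Hypothetical sample messages
messages = [
    "It is way too full in the station hall",
    "I ate too much, now I am full",
]

for m in messages:
    print(matches_all_words(m, ["too", "full"]), matches_phrase(m, "too full"), m)
```

Both messages pass the unordered match, but only the first one contains the phrase in the intended order. The second message is exactly the kind of noise that a fixed word order filters out.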

 



3. Unwanted sources

A significant part of social media consists of accounts that always post the same type of unwanted messages. Consider, for example, bot accounts, sex advertisements, real estate companies and (sometimes) Twitter news accounts. Messages from these accounts are noise when they post predictable content that is irrelevant to the purpose of the search strategy.


 

4. Unknown language

Within Maltego Monitor, you can select a search language for your searches. The most relevant language is preset by default on your Maltego Monitor account.

Logically, you would expect to collect only messages in this language. Unfortunately, the sources YouTube, Fora, Telegram and News websites are an exception.

Messages from these sources are not labeled with a language by the source, which means you might encounter messages from them in a language other than the one you are searching in.
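To see why unlabeled messages slip through a language setting, consider the following minimal sketch. It assumes a hypothetical message format with a `lang` field that is `None` when the source provides no label, and it uses a toy stopword count as a stand-in for real language detection. None of this reflects how Maltego Monitor works internally.

```python
import re

# Tiny stopword samples per language (illustrative only, far too small for real use).
STOPWORDS = {
    "en": {"the", "and", "is", "in", "of", "to", "it"},
    "nl": {"de", "het", "een", "en", "is", "in", "van"},
}

def guess_language(text):
    """Return the language with the most stopword hits, or None if unclear."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in stop for w in words) for lang, stop in STOPWORDS.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, runner_up) = ranked[0], ranked[1]
    if best_score == 0 or best_score == runner_up:
        return None
    return best

def hide_other_languages(messages, search_lang):
    """Keep labeled matches, guess a language for unlabeled messages, keep unclear ones."""
    kept = []
    for msg in messages:
        lang = msg["lang"] or guess_language(msg["text"])
        if lang is None or lang == search_lang:
            kept.append(msg)
    return kept

# Hypothetical messages: only the Twitter message carries a language label.
messages = [
    {"source": "Twitter", "lang": "en", "text": "The square is too full"},
    {"source": "Telegram", "lang": None, "text": "Het is veel te druk in de stad"},
    {"source": "YouTube", "lang": None, "text": "It is crowded in the centre of town"},
]

for msg in hide_other_languages(messages, "en"):
    print(msg["source"], "-", msg["text"])
```

In this sketch, messages whose language cannot be determined are kept rather than hidden, so that only clear mismatches disappear. That is a design choice for the example, not a documented behavior.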




How do I locate noise?

Before you start to reduce noise, it's important to identify the main causes. Always start with the Searches that generate the most messages, as these usually also contain the most noise. We advise making use of several analytical tools to isolate the causes of noise. 

 

Sources

Above the messages in a case, you can find a dropdown list of the analytical tools. Open the tool called 'Sources' to see which sources contain the most messages. Due to the nature of each source, some may cause different types of noise than others.




Wordcount

Use the Wordcount option to quickly identify the causes of noise. The 8 most frequently used words can indicate which of the searches is collecting what kind of noise. By selecting a word, you temporarily filter on all messages containing that word. Add the word to a search or filter by selecting the 3 dots beside it. This helps you to try out different combinations.
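If you export messages and want to reproduce this kind of analysis yourself, a word count takes only a few lines. The sketch below is a rough approximation: the stopword list, the tokenization and the sample messages are assumptions made for the example, and the Wordcount tool itself may count words differently.

```python
import re
from collections import Counter

# Assumed list of common words to ignore; adjust to your own language and topic.
STOPWORDS = {"the", "a", "an", "is", "in", "of", "to", "and", "for", "on"}

def words(text):
    """Lowercase a message and split it into words, keeping hashtags and mentions."""
    return re.findall(r"[a-z0-9#@']+", text.lower())

def top_words(messages, n=8):
    """Return the n most frequent non-stopwords with their counts."""
    counts = Counter()
    for text in messages:
        counts.update(w for w in words(text) if w not in STOPWORDS)
    return counts.most_common(n)

def with_word(messages, word):
    """Temporary filter: only the messages that contain the selected word."""
    return [m for m in messages if word.lower() in words(m)]

# Hypothetical messages collected by a search on 'fireworks'
messages = [
    "Fireworks ban announced for the city centre",
    "Win free fireworks, click here",
    "Free fireworks giveaway, click now",
    "Council debates fireworks ban",
]

print(top_words(messages))
print(with_word(messages, "free"))
```

Here the frequent words 'free' and 'click' point to advertising messages, which suggests either a narrower term such as 'fireworks ban' or an exclusion term.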




Accounts

Accounts shows which accounts post the most within the search. Zoom in on an account by selecting it, or add it as a Search or filter.
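The same kind of overview, per source or per account, can be approximated on exported data. The sketch below assumes a hypothetical message format with `source` and `account` fields and reports each value's share of all messages, which makes dominant accounts easy to spot.

```python
from collections import Counter

def share_by(messages, field):
    """Return (value, count, share of all messages) tuples, largest first."""
    counts = Counter(m[field] for m in messages)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]

# Hypothetical exported messages
messages = [
    {"source": "Twitter", "account": "@deals_bot"},
    {"source": "Twitter", "account": "@deals_bot"},
    {"source": "Twitter", "account": "@deals_bot"},
    {"source": "News", "account": "@citynews"},
    {"source": "Twitter", "account": "@anna"},
]

for value, n, share in share_by(messages, "account"):
    print(f"{value}: {n} messages ({share:.0%})")
```

An account that is responsible for a large share of the messages, like the made-up `@deals_bot` here, is a natural candidate to inspect first and, if it only posts noise, to exclude.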


 

How can I reduce noise?

Reducing noise already starts when you create the case. Yet creating a noise-free case in advance is not always possible, for example when a hashtag on Twitter suddenly starts trending, or simply because it can be hard to predict what will cause noise.


The following methods to reduce noise can be applied at any stage in the process. We advise you to strive for an 80-20 balance (80% relevant, 20% noise).


| Method | Explanation | Type of noise | Example |
| --- | --- | --- | --- |
| Purposeful formulation | Formulate terms specifically enough that they are proportional to the search goal. | Double meaning / Wrong context | Fireworks ban instead of fireworks; Corona measures instead of corona |
| AND combinations | Combine terms and building blocks to search in a specific manner. | Double meaning / Wrong context | Storm Gerda instead of Gerda; Market square Amsterdam instead of market square |
| Quotation marks “” | Always use quotation marks when the order of the words in a phrase should not be changed. | Wrong context | “Too full” |
| Exclude terms on case level | Add exclusion terms to a case to exclude messages containing that term from the entire case. | Double meaning | Add 'COVID' as an exclusion term to exclude messages about the COVID jab |
| Exclude/report accounts | Exclude or report Twitter accounts and News sources to stop collecting data from them. | Unwanted sources | |
| Exclude terms on building block level | Exclude terms on the entire building block (larger impact). | Double meaning | Add 'COVID' as an exclusion term to exclude messages about the COVID jab |
| Exclude terms on term level | Exclude terms per term in the building block (smaller impact). | Double meaning | Add 'COVID' as an exclusion term to exclude messages about the COVID jab |
| Unknown language filter | Hide messages from YouTube, Fora, Telegram and news websites that are in a different language than the search language. | Unknown language | |

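The difference in impact between excluding a term on term level, building block level and case level can be made concrete with a small model. The sketch below is a simplification under stated assumptions: the `Term`, `BuildingBlock` and `Case` classes are hypothetical, matching is a plain case-insensitive substring check, and a case is assumed to collect a message when any of its building blocks matches. The actual matching rules in Maltego Monitor may differ.

```python
from dataclasses import dataclass, field

def has(text, term):
    """Case-insensitive substring check (a simplification of real matching)."""
    return term.lower() in text.lower()

@dataclass
class Term:
    text: str
    exclude: list = field(default_factory=list)  # term-level exclusions

    def matches(self, msg):
        return has(msg, self.text) and not any(has(msg, x) for x in self.exclude)

@dataclass
class BuildingBlock:
    terms: list
    exclude: list = field(default_factory=list)  # block-level exclusions

    def matches(self, msg):
        if any(has(msg, x) for x in self.exclude):
            return False
        return any(t.matches(msg) for t in self.terms)

@dataclass
class Case:
    blocks: list
    exclude: list = field(default_factory=list)  # case-level exclusions

    def collects(self, msg):
        # Assumption: a message is collected when any building block matches it.
        if any(has(msg, x) for x in self.exclude):
            return False
        return any(b.matches(msg) for b in self.blocks)

messages = [
    "Got my COVID jab today",
    "COVID rules cancel the boxing match",
    "A jab to the chin ended the fight",
]

# 'COVID' excluded only for the term 'jab' (smallest impact)
term_level = Case(blocks=[
    BuildingBlock(terms=[Term("jab", exclude=["covid"]), Term("uppercut")]),
    BuildingBlock(terms=[Term("boxing match")]),
])

# 'COVID' excluded for the entire case (largest impact)
case_level = Case(
    blocks=[
        BuildingBlock(terms=[Term("jab"), Term("uppercut")]),
        BuildingBlock(terms=[Term("boxing match")]),
    ],
    exclude=["covid"],
)

print([term_level.collects(m) for m in messages])
print([case_level.collects(m) for m in messages])
```

With the term-level exclusion, the message about the cancelled boxing match is still collected, because the exclusion only applies to 'jab'. With the case-level exclusion, every message mentioning COVID is dropped, which is more thorough but also carries a higher risk of losing relevant messages.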

 

 
