1. The Taxonomy:
A media outlet has sources, each of which publishes new articles. From every new article we keep only the useful words (dropping stop words such as "the", "is", "but"...); we call the result an "abstract".
Every word of this abstract is then matched against several hundred datasets: each classification in our taxonomy has its own dataset.
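A minimal sketch of these two steps, with hypothetical names and toy data (the real stop-word list and classification datasets are far larger):

```python
# Illustrative subset of a stop-word list.
STOP_WORDS = {"the", "is", "but", "a", "and", "of", "in"}

# One "bag of words" dataset per classification in the taxonomy (toy examples).
DATASETS = {
    "astronomy": {"telescope", "galaxy", "orbit"},
    "finance": {"equity", "bond", "yield"},
}

def make_abstract(article_text):
    """Keep only useful words: lowercase tokens that are not stop words."""
    return [w for w in article_text.lower().split() if w not in STOP_WORDS]

def match_classifications(abstract):
    """Count how many abstract words appear in each classification dataset."""
    return {name: sum(1 for w in abstract if w in words)
            for name, words in DATASETS.items()}

abstract = make_abstract("The telescope captured a galaxy in a distant orbit")
print(match_classifications(abstract))  # {'astronomy': 3, 'finance': 0}
```

The article matches the "astronomy" dataset three times and "finance" not at all, so it is classified accordingly.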
Gauging the media from the Inside.
Including expertise and sensitivity.
This method is close to a tribe model: every tribe uses a dialect, made of words that signal that dialect. When you recognize a dialect, you recognize a tribe; here, you recognize a classification. From the number and the weight of the matched words, we gauge the level of expertise in that classification, which gives the article a score. By varying the look-back window over past days, we can also gauge each source's sensitivity to news. Compiling the sources tells us where the media outlet stands.
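The expertise scoring can be sketched as follows. The per-word weights and the dialect dataset are invented for illustration; the source only states that both the number and the weight of words contribute to the score:

```python
# Hypothetical "astrophysics" dialect: each word carries a weight, with
# rarer, more technical words weighted higher (weights are illustrative).
WEIGHTED_DATASET = {
    "galaxy": 1.0,
    "redshift": 2.5,
    "magnetohydrodynamics": 4.0,
}

def expertise_score(abstract_words):
    """Sum the weights of the dialect words the article uses."""
    return sum(WEIGHTED_DATASET.get(w, 0.0) for w in abstract_words)

lay_article = ["galaxy", "stars", "picture"]
expert_article = ["galaxy", "redshift", "magnetohydrodynamics"]
print(expertise_score(lay_article))     # 1.0  -> low expertise
print(expertise_score(expert_article))  # 7.5  -> high expertise
```

Averaging such scores per source over windows of different lengths (the "past days" mentioned above) is what yields the sensitivity measure.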
For every 1 million abstracts, this means 75,000,000,000 operations.
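The source gives only the total, not its factors; one hypothetical breakdown consistent with the figure would be:

```python
# Hypothetical breakdown of the 75-billion-operation figure; the factors
# below are assumptions, only the total comes from the text.
abstracts = 1_000_000
words_per_abstract = 150   # assumed average count of useful words kept
datasets = 500             # "several hundred datasets"

operations = abstracts * words_per_abstract * datasets
print(operations)  # 75000000000
```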
Bottom line: We have an unbiased, universal taxonomy, continuously updated, which can be filtered by expertise level and over 3 periods of time to sense time sensitivity.
2. The Perceptions:
"Fake news", "junk science", and other toxic labels are judgments, not facts: what is "fake news" to some people is not to others. This is why we treat these labels as "spotted as", or "perceived on the internet as".
As with our taxonomy, perception is an assessment. But where the taxonomy is about the publisher itself (what we call the Publisher Inside), perception is the Publisher Outside: how the publisher is perceived in those terms.
To do this, we collect how the publisher is perceived on the Internet, strictly excluding the publisher's own properties.
This gives us pages and words, which we match against Perception datasets (one for "fake news", one for "junk science", and so on) in the same way as described above.
We then have a score which, when above a threshold, makes the publication "spotted as fake news", for example.
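The thresholding step can be sketched as below; the threshold value and function names are assumptions, not values from the source:

```python
# Hypothetical threshold above which a perception score triggers the label.
FAKE_NEWS_THRESHOLD = 0.6

def spotted_as(perception_score, threshold=FAKE_NEWS_THRESHOLD):
    """A publication is 'spotted as' a label when its score crosses the threshold."""
    return perception_score >= threshold

print(spotted_as(0.72))  # True  -> "spotted as fake news"
print(spotted_as(0.30))  # False -> no label applied
```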
Gauging the media from the Outside.
How a Media is "spotted as".
Bottom line: We sense how a publication is perceived, since no single person or group can make a universal statement.
3. Keeping up with language evolution:
Permanently update the datasets and control the overlap between them.
For the taxonomy, words from articles are measured against the classification datasets (a.k.a. bags of words).
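One way to control overlap between bags of words is a pairwise similarity check; Jaccard similarity and the review threshold here are assumptions, since the source does not name a metric:

```python
# Sketch of overlap control between classification datasets (bags of words).
def jaccard(a, b):
    """Share of words two bags have in common (intersection over union)."""
    return len(a & b) / len(a | b)

bags = {
    "astronomy": {"galaxy", "orbit", "telescope", "star"},
    "astrophysics": {"galaxy", "redshift", "star", "plasma"},
}

overlap = jaccard(bags["astronomy"], bags["astrophysics"])
print(round(overlap, 2))  # 0.33 -> flag the pair for review if above a chosen threshold
```

Running such a check whenever a dataset is updated keeps classifications distinct as the language, and the bags, evolve.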