Economy and Enterprise – US vs France Taxonomies.

France’s Economy and Enterprise Taxonomy.

Let’s ask TrustedOut, for France, what is the Taxonomy of all media covering the “Economy and Enterprise” Classification group over the past 7 days.

Here’s the corpus query:

Showing 373 Medias and 750 sources

and the taxonomy DNA:

USA’s Economy and Enterprise Taxonomy.

Let’s ask TrustedOut, for the US, what is the Taxonomy of all media covering the “Economy and Enterprise” Classification group over the past 7 days.

Here’s the corpus query:

Showing 1,961 Medias and 3,645 sources

and the taxonomy DNA:


Comparisons:

Classification | France | USA
General | 43.6% | 41.9%
General > Economy and Enterprise | 20.2% | 21.5%
General > Economy and Enterprise > Economy | 6.2% | -
General > Finance | 5.2% | 5.4%
General > Law | 6.5% | -
General > Tech | 8% | 8%
Industries | 22.2% | 19.1%
Sciences | 6.7% | 7.7%
People | 27.3% | 31.2%
People > Culture and Arts | 7.3% | -
People > Sports | 6.6% | -
People > Lifestyle | - | 5.9%
(- = not listed for that country)

How to read the table above: each percentage is the share of the publications’ content that matches the datasets of a given classification. For example, in the US, 5.9% of the content from media covering Economy and Enterprise uses words belonging to the Lifestyle classification, which is part of People.

In France, media covering “Economy and Enterprise” also talk about:

Deeper in Economy
Law
Culture and Arts
Sports

In the US, media covering “Economy and Enterprise” also talk about:

Lifestyle
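The same reading can be sketched in a few lines of Python. This is only an illustration: the shares are copied from the comparison table, the dictionary layout and the `only_in` helper are assumptions of ours, and a classification missing from one country simply means it was not listed for that country, not that its share is zero.

```python
# Shares (in percent) copied from the comparison table above.
france = {
    "General": 43.6,
    "General > Economy and Enterprise": 20.2,
    "General > Economy and Enterprise > Economy": 6.2,
    "General > Finance": 5.2,
    "General > Law": 6.5,
    "General > Tech": 8.0,
    "Industries": 22.2,
    "Sciences": 6.7,
    "People": 27.3,
    "People > Culture and Arts": 7.3,
    "People > Sports": 6.6,
}
usa = {
    "General": 41.9,
    "General > Economy and Enterprise": 21.5,
    "General > Finance": 5.4,
    "General > Tech": 8.0,
    "Industries": 19.1,
    "Sciences": 7.7,
    "People": 31.2,
    "People > Lifestyle": 5.9,
}


def only_in(a, b):
    """Classifications listed for corpus a but not listed for corpus b."""
    return sorted(set(a) - set(b))


only_france = only_in(france, usa)
only_usa = only_in(usa, france)

print("Only listed for France:", only_france)
print("Only listed for the US:", only_usa)
```

The two lists are exactly the "also talk about" bullets above: Economy, Law, Culture and Arts and Sports for France, Lifestyle for the US.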

Fine-tuning your corpus to compare apples to apples.

Want to compare countries for a product launch but do not want the Lifestyle classification in the US?

Simply add the IS NOT Lifestyle taxonomy condition:

Voila. Now run your analytics on those 2 corpora and/or get the corresponding whitelists…

Questions? Contact us!

 

 

Fine-tuning your corpus to perfect analytics and build brand-consistent whitelists.

Let’s have a look at some cool product updates our alpha-testers can enjoy since last night.

Demo scenario: Let’s say we want to create a Corpus for the US Food Market, to run some analytics on our brand and prepare a new ad campaign coming right up.

1/ The broad definition. Country and Taxonomy.

Add country, select United States.

Add Taxonomy IS, made of these two classifications:

  • Industry > Manufacturing and Retail > Food and Beverages
  • People > Lifestyle > Food and Beverages Services.

As you know, TrustedOut also profiles, for each media, the level of expertise and the sensitivity to news over the rolling period on which the taxonomy is computed. Here we want ALL expertise levels and a stable taxonomy, computed over the past rolling quarter (-90 days from today). We recompute and update everything continuously.

We have 4,003 media for our Whitelist and 10,027 sources to feed our analytic tools with.

2/ Refining the target. Excluding a classification.

For this effort, we do not want media specialized in Food Processing, profiled over the same period of time, so we exclude it from our Corpus like this:

  • IS Industry > Manufacturing and Retail > Food and Beverages
  • IS People > Lifestyle > Food and Beverages Services.
  • IS NOT Industry > Agriculture > Food Processing

We now have 487 media and 732 sources.

3/ Hand picking media we do or do not want. By name, by URL.

(This is an example. Nothing personal against those sites 🙂)

From past experience, we do not want to work with anything related to foodnavigator.com and its subsidiaries, nor with a site named “Food processing”. Clicking on “Get” and scrolling through the media list TrustedOut gives me, I see they are, indeed, in the list:

Let’s remove them.

Let’s tell our corpus to add the following conditions:

  • Name DOES NOT CONTAIN “Food processing”
  • Website DOES NOT CONTAIN “foodnavigator” in its domain

Voila. 484 media and 726 sources.
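The three refinement steps above can be sketched as a simple filter. This is a minimal illustration, not TrustedOut's implementation: the `Media` record, its field names and the sample media (all on made-up `.example` domains) are hypothetical, and we assume that the two IS classifications are combined with OR and that text matching is case-insensitive.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

FOOD_BEV = "Industry > Manufacturing and Retail > Food and Beverages"
FOOD_SERVICES = "People > Lifestyle > Food and Beverages Services"
FOOD_PROCESSING = "Industry > Agriculture > Food Processing"


@dataclass
class Media:
    # Hypothetical record layout, for illustration only.
    name: str
    website: str
    country: str
    classifications: set = field(default_factory=set)


def in_corpus(m: Media) -> bool:
    """Apply the country, IS, IS NOT, name and domain conditions."""
    domain = urlparse(m.website).netloc.lower()
    return (
        m.country == "United States"
        # IS: assumed to mean "matches at least one of the two classifications".
        and bool(m.classifications & {FOOD_BEV, FOOD_SERVICES})
        # IS NOT: exclude media profiled in Food Processing.
        and FOOD_PROCESSING not in m.classifications
        # Name DOES NOT CONTAIN "Food processing".
        and "food processing" not in m.name.lower()
        # Website DOES NOT CONTAIN "foodnavigator" in its domain.
        and "foodnavigator" not in domain
    )


media = [
    Media("Example Food Daily", "https://www.examplefooddaily.example", "United States", {FOOD_BEV}),
    Media("Food Processing Weekly", "https://fpw.example", "United States", {FOOD_BEV}),
    Media("Navigator News", "https://www.foodnavigator.example/news", "United States", {FOOD_SERVICES}),
    Media("Agri Plant Review", "https://agriplant.example", "United States", {FOOD_BEV, FOOD_PROCESSING}),
    Media("Cuisine Hebdo", "https://cuisinehebdo.example", "France", {FOOD_SERVICES}),
    Media("Table Talk", "https://tabletalk.example", "United States", {FOOD_SERVICES}),
]

corpus = [m.name for m in media if in_corpus(m)]
print(corpus)
```

Each condition only narrows the list, which is why the counts in the walkthrough go down at every step.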

The Corpus is ready to feed our BI analytics tool and to be imported into our DSP as our whitelist.

Questions? Shoot!

 

 

How our AI-powered classification works.

We receive a lot of questions about how our AI-powered Classification works, so we decided to make 2 drawings to explain it.

1. The Taxonomy.
Gauging the media from the Inside.
Including expertise and sensitivity.

A media has sources, each of which publishes new articles. From every new article, we keep only the useful words (no stop words such as “the, is, but…”); we call the result an “abstract”.

Every single word of this new abstract is matched against several hundred datasets. Every single classification in our taxonomy has its own dataset.

For every 1 million abstracts, this means 75,000,000,000 operations (75,000 per abstract).

This method is close to the tribe model. Every tribe uses a dialect made of words that are the signature of this dialect. When you recognize a dialect, you recognize a tribe. Here, a tribe is a classification. Depending on the number and the weight of the words, we are able to gauge the level of expertise in the classification. This gives us a score for the article.
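To make the idea concrete, here is a minimal sketch of the abstract-and-match step. Everything in it is an assumption made for illustration: the stop-word list, the two tiny datasets, their word weights and the "sum of weights" scoring are invented, and the real datasets and scoring are not described in this post.

```python
# Hypothetical stop-word list (the real one is certainly much longer).
STOP_WORDS = {"the", "is", "but", "a", "of", "and", "to", "in"}

# Hypothetical datasets: one per classification, mapping word -> weight.
DATASETS = {
    "Finance": {"bond": 2.0, "yield": 1.5, "market": 1.0},
    "Sports": {"match": 1.0, "goal": 2.0, "league": 1.5},
}


def make_abstract(text):
    """Keep only the useful words of an article (lowercased, no stop words)."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return [w for w in words if w and w not in STOP_WORDS]


def score(abstract, dataset):
    """Sum the weights of the abstract's words found in one dataset."""
    return sum(dataset.get(w, 0.0) for w in abstract)


article = "The bond market is calm, but the yield of the bond is rising."
abstract = make_abstract(article)
scores = {name: score(abstract, ds) for name, ds in DATASETS.items()}

print(abstract)
print(scores)
```

In this toy example the article "speaks the dialect" of Finance and not of Sports, so only the Finance score is non-zero.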

By varying the number of past days considered, we can also gauge how sensitive to news the sources are. Compiling sources tells us where the media stands.

Bottom line: We have a universal taxonomy, always updated, that can be filtered by expertise level and over 3 periods of time to sense time sensitivity (and trend forming, but we’ll tell you more real soon).

2. The Perceptions.
Gauging the media from the Outside.
How is a Media “spotted as”.

“Fake news”, “Junk science”, and other toxic labels are matters of appreciation. They can rarely be established as indisputable facts, because they depend on who is judging. What is “fake news” to some people is not to others. This is why we treat those appreciations as “spotted as”, or “perceived on the internet as”.

As with our taxonomy, perception is an appreciation. But where taxonomy is about the publisher itself (we call it the Publisher Inside), perception is the Publisher Outside: how the publisher is perceived for those terms.

To do this, we collect how the publisher is perceived on the Internet, strictly excluding any publisher properties.

This gives us pages and words which we match with Perception datasets (one for “fake news”, one for “junk science” and so on) in a way similar to the one explained above.

We then have a score which, when above a threshold, makes the publication “spotted as fake news”, for example.
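The two key steps, excluding the publisher's own properties and applying a threshold, can be sketched as follows. This is an illustration under stated assumptions: the threshold value, the scores, the domain names and both helper functions are hypothetical, and the way scores are actually computed is not shown.

```python
from urllib.parse import urlparse

# Hypothetical threshold; the post does not give the real value.
SPOTTED_THRESHOLD = 0.6


def external_pages(page_urls, publisher_domains):
    """Keep only pages that are NOT hosted on the publisher's own properties."""
    kept = []
    for url in page_urls:
        host = urlparse(url).netloc.lower()
        owned = any(host == d or host.endswith("." + d) for d in publisher_domains)
        if not owned:
            kept.append(url)
    return kept


def spotted_as(perception_scores, threshold=SPOTTED_THRESHOLD):
    """Return the perception labels whose score is above the threshold."""
    return sorted(label for label, s in perception_scores.items() if s > threshold)


pages = [
    "https://www.publisher.example/about",
    "https://blog.publisher.example/post",
    "https://forum.example/thread/1",
]
outside = external_pages(pages, {"publisher.example"})

# Made-up scores for one publication, one per Perception dataset.
labels = spotted_as({"fake news": 0.72, "junk science": 0.31})

print(outside)
print(labels)
```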

Bottom line: We sense how a publication is perceived, because no single person or group can make a universal statement.

Questions? Shoot!

 

Adding News-Sensitivity in our Taxonomy.

Taxonomy classification over different periods for CNN Politics.

Brand Safety 2.0 is about brand values.

As we wrote in our previous post: “Brand Safety 1.0 was about toxic keywords, 2.0 adds brand values.”

Brand values are tangible. So must be media profiling.

To gauge brand values, which are made of tangible perceptions, the matching publisher brands must be profiled with content classification, using AI to be unbiased, universal and always up-to-date.

Our AI-based Taxonomy and massive data processing already allow universal taxonomy AND expertise depth…

We’ve presented, in previous posts, our universal taxonomy and its DNA view: “Media profiles are key to Business Intelligence and Advertising.”

… announcing today, news-sensitivity in taxonomy!

By varying the period of time considered (past week, past month, past quarter), we are now able to compute our classification, and thus our taxonomy, for each period.

In other words, depending on the marketer’s project and brand values, TrustedOut will be able to deliver news-sensitive or stable media.
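As a rough sketch of the idea, the same list of classified articles can be ranked over three rolling windows. The data below is entirely made up (it is not CNN's), the reference date is arbitrary, and ranking by article count is our simplifying assumption; the actual computation may weigh articles differently.

```python
from collections import Counter
from datetime import date, timedelta

WINDOWS = {"week": 7, "month": 30, "quarter": 90}

today = date(2019, 7, 1)  # arbitrary reference date for the example


def days_ago(n):
    return today - timedelta(days=n)


# Hypothetical (publication date, classification) pairs for one source.
articles = (
    [(days_ago(d), "International") for d in (1, 2, 3)]
    + [(days_ago(d), "Education") for d in (4, 5)]
    + [(days_ago(d), "Political party") for d in (6, 10, 15, 20, 60)]
    + [(days_ago(d), "Defense") for d in (22, 25, 40, 50, 70, 80, 85)]
)


def top_classifications(articles, today, days, n=2):
    """Top n classifications among articles published in the last `days` days."""
    start = today - timedelta(days=days)
    counts = Counter(c for d, c in articles if start < d <= today)
    return [c for c, _ in counts.most_common(n)]


tops = {name: top_classifications(articles, today, days) for name, days in WINDOWS.items()}
for name, top in tops.items():
    print(name, top)
```

In this toy data, a topic that dominates the past week drops out of the top of the list over the quarter: it is news-sensitive rather than stable.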

No UI yet, but we couldn’t keep this to ourselves: here’s what CNN – Politics looks like over the past week, month and quarter.

A quick read is:

International:

… disappears from the top 5 over the past quarter (-90 days). This might be due to Iran and the trade war/Mexico news. Depth also gets lower with time.

Political party:

… goes lower with time.

Defense gets civil over time, while Education and LGBTQ are very news-sensitive: they disappear over time.

New UI and new killer feature coming up…

We will include news-sensitivity in our Corpus definition and, teasing again, we’ll reveal a killer feature using this brand new and unique capability.

Stay tuned.

Feel free to reach out if you have a question!

 

 

 

 

 

 

New demo page showcasing TrustedOut and BI, Ads and PR

A new demo page has been added to TrustedOut.com

The scenario

 

ACME is a sports car maker launching a new model that makes extensive use of Artificial Intelligence (AI). ACME has 2 main markets, the US and France, and wonders which one to test first.

1. Corpus Intelligence for Business Intelligence: Market selection.

For this new corpus, the CMO (or Marketing Manager) defines 3 necessary conditions.
a. Where are the publications? We said France and the United States.
b. What should these publications be about? ACME wants to grasp how AI is perceived in publications covering Politics (for regulations), Law (for any legal aspects), Tech (to gauge the technology used and its perception) and, of course, Transportation (for anything car related).
c. Want to be safe from any toxic content? Of course: no fake news and no junk science.

TrustedOut classification knows how to gauge the expertise level of a source and how sensitive to the news the taxonomy should be. At this stage, we want generalist publications, so we set the expertise level to “Covered”. Here is the corresponding query for our Corpus, which we are going to name “ACME AI in new model”.

Go to the demo page >

2. Corpus Intelligence for Brand Safety & Campaigns. Whitelisting.

ACME’s CMO wants to check whether Pure Player Media (media only available online) are a good target. After all, Pure Players should be more reactive, as they do not have to sync print editions (which can be daily, weekly or monthly) with immediate online publishing. Let’s go back to TrustedOut and change the Corpus as follows:
a. Where are the publications? We now want to limit to France.
b. Select Pure Players? We want media where “out of digital” is set to None, to only get those not publishing on any other medium.

Go to the demo page >

3. Corpus Intelligence for Coverage & Content Analytics. PR campaigning.

Digimind gives us the key concepts to write our Press Release: European Union/Commission and Neuronal Networks. With the Corpus, we know which publications to target; with those key concepts, we know how to write a Press Release that will interest those targets.

Go to the demo page >

Questions? Shoot!

Deck and demo from our 1st public event: TrustedOut+Digimind.

It was this Thursday morning and it was great. It was our first public presentation, and we were happy to partner with Digimind to show why TrustedOut can make Intelligence smarter and more trustworthy. Merci Aurelien and Valentin.

The deck. TrustedOut.com/Digimind

The deck is in English. If you have questions, let us know with the form below.

The demo. Step by step.

The scenario

ACME is a sports car maker launching a new model that makes extensive use of Artificial Intelligence (AI). ACME has 2 main markets, the US and France, and wonders which one to test first.

Step 1. Corpus Creation for country comparison.

For this new corpus, the CMO (or Marketing Manager) defines 3 necessary conditions.

a. Where are the publications? We said France and the United States.
b. What should these publications be about? ACME wants to grasp how AI is perceived in publications covering Politics (for regulations), Law (for any legal aspects), Tech (to gauge the technology used and its perception) and, of course, Transportation (for anything car related).
c. Want to be safe from any toxic content? Of course: no fake news and no junk science.

TrustedOut classification knows how to gauge the expertise level of a source and how sensitive to the news the taxonomy should be.

At this stage, we want generalist publications, so we set the expertise level to “Covered”.

Here is the corresponding query for our Corpus, which we are going to name “ACME AI in new model”.

Once ready, “Save” will show us how many media and sources our Corpus will include…

… and the Taxonomy of your Corpus.

Let’s now connect your Corpus to Digimind to get Social Intelligence from it. The process is simple: click on “Get” and, instead of downloading a CSV or JSON file with all media and sources, which would not stay up to date, click on “Connect” and pick Digimind.

Your “ACME AI in new model” Corpus is now live and accessible for any project related to this corpus definition. TrustedOut will continue to update it, all the time, with relevant media and sources.

Digimind collects content from those media sources, so no need to also connect “article abstracts” with Digimind.

Step 2. Comparing countries on AI.

As the Corpus is immediately available and up to date in Digimind, we can read the following top concepts in both countries about AI.

ACME is very sensitive to ethics in AI and consequently picks France as the first country to test its new model, so it can handle this ethics topic very carefully.

Step 3. Best media profiles for ad campaigns.

ACME’s CMO wants to check whether Pure Player Media (media only available online) are a good target. After all, Pure Players should be more reactive, as they do not have to sync print editions (which can be daily, weekly or monthly) with immediate online publishing.

Let’s go back to TrustedOut and change the Corpus as follows:

a. Where are the publications? We now want to limit to France.
b. Select Pure Players? We want media where “out of digital” is set to None, to only get those not publishing on any other medium.
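For readers who prefer to see the corpus definition as data, here is one way the Step 1 query and the change made in this step could be written down. This is purely a sketch: `CorpusQuery` and all of its field names are hypothetical and do not reflect TrustedOut's actual query format or API.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class CorpusQuery:
    # Hypothetical structure, for illustration only.
    name: str
    countries: Tuple[str, ...]
    topics: Tuple[str, ...]
    not_spotted_as: Tuple[str, ...]
    expertise: str
    # Python None means "no condition set"; the string "None" is the
    # "out of digital" value used in the walkthrough for Pure Players.
    out_of_digital: Optional[str] = None


# Step 1: the broad corpus for country comparison.
step1 = CorpusQuery(
    name="ACME AI in new model",
    countries=("France", "United States"),
    topics=("Politics", "Law", "Tech", "Transportation"),
    not_spotted_as=("fake news", "junk science"),
    expertise="Covered",
)

# Step 3: same corpus, limited to France and to Pure Players.
step3 = replace(step1, countries=("France",), out_of_digital="None")

print(step3)
```

Only the two changed conditions are restated; the topics, the toxic-content exclusions and the expertise level carry over from Step 1.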

“Save”. And now we get these amounts:

Step 4. The perfect mix of ethics and business for a 1st ad campaign.

While France is more “Ethic” on AI, Pure Players are more business oriented than media overall. ACME’s CMO sees the growth from 37% (all media) to 45% (Pure Players) in business for this selection of media as the perfect vehicle to test an ethics message on business-oriented people.

Step 5. Talk the talk in AI.

Now, ACME wants to launch its first Press Release and wants to first address the geek, very technical community.

Let’s go back to TrustedOut and make the following changes:

a. What should these publications be about? We now want only Tech and Transportation publications
b. How expert? Dedicated.

… and, of course, more specialized publications mean fewer in total:

Step 6. Key concepts for an optimal PR campaign.

Digimind gives us the key concepts to write our Press Release: European Union/Commission and Neuronal Networks.

With the Corpus, we know which publications to target; with those key concepts, we know how to write a Press Release that will interest those targets.

Bottom line:
TrustedOut+Digimind = Market selection, Optimal ad budgets and Perfect PR.

Questions? Shoot!

 

TrustedOut AI-Operated Classification

Data Collection and Content Classification.

Our database of Media profiles has 2 distinct jobs: collecting factual data (like revenue, ownership, years online…) and classifying content for our taxonomy and for how sites are “spotted as” (like “fake news”, “junk science”…).

Data Collection is a crawling exercise involving multiple references, cross-checking and watching for changes, while…

Content Classification is all about Machine Learning.

And all about “bags of words”. For every classification job, we build datasets made of words whose frequency of occurrence is used to train a classifier.
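To illustrate what training a classifier on word frequencies can look like, here is a minimal sketch. The post does not say which classifier is used, so we swap in a textbook multinomial naive Bayes with add-one smoothing as a stand-in; the training documents and labels are invented.

```python
import math
from collections import Counter


def train(labeled_docs):
    """labeled_docs: list of (label, list_of_words). Returns the model parts."""
    word_counts = {}          # label -> Counter of word frequencies
    doc_counts = Counter()    # label -> number of training documents
    vocab = set()
    for label, words in labeled_docs:
        word_counts.setdefault(label, Counter()).update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def classify(words, model):
    """Return the label with the highest log-probability for a bag of words."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_logp = None, -math.inf
    for label, counts in word_counts.items():
        logp = math.log(doc_counts[label] / total_docs)   # class prior
        denom = sum(counts.values()) + len(vocab)          # add-one smoothing
        for w in words:
            logp += math.log((counts[w] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Made-up training data: bags of words with their classification.
model = train([
    ("Finance", ["bond", "yield", "market", "bank"]),
    ("Finance", ["market", "stock", "bank"]),
    ("Sports", ["goal", "match", "league"]),
    ("Sports", ["match", "team", "goal"]),
])

print(classify(["bank", "market", "goal"], model))
print(classify(["team", "match"], model))
```

Word order is ignored on purpose: only how often each word occurs in each classification matters, which is the "bag of words" idea.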

As mentioned above, we have 2 types of Classification: Taxonomy and “spotted as”.

Taxonomy Classification.

As in the graphic above, every article is matched against our taxonomy datasets so we can classify each and every article. This gives us a clear picture of a feed and, thus, of the whole media.

This, of course, makes for a (big) lot of operations: 75,000 per article. Yes, 75 billion operations per million articles daily.
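The arithmetic behind that figure, using only the two numbers quoted above (the per-second average is our own derived illustration, assuming the load is spread evenly over 24 hours):

```python
OPS_PER_ARTICLE = 75_000
ARTICLES_PER_DAY = 1_000_000

ops_per_day = OPS_PER_ARTICLE * ARTICLES_PER_DAY

# Assumption: operations are spread evenly over a 24-hour day.
SECONDS_PER_DAY = 24 * 60 * 60
ops_per_second = ops_per_day / SECONDS_PER_DAY

print(f"{ops_per_day:,} operations per day")
print(f"about {ops_per_second:,.0f} operations per second on average")
```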

Taxonomy fun facts (as of today!)

Taxonomy DNA

Below is the visualization of the DNA of the New York Times Tech section.

Sensitiveness and depth customization. Tailor-made for the analyst.

Datasets used to classify articles can use a customized buffer of time, which controls how sensitive to daily news the taxonomy will be. In addition, cliffs can also be customized to select a depth of expertise, from “dedicated” to “covered” or even “all sounds”. Both combined, plus the “always up-to-date” factor, make our taxonomy perfectly tailor-made for the job the analyst wants to run. This is why we use “Corpus Intelligence” as our tagline.
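One way to picture the cliffs is as minimum shares a source must reach in a classification. The sketch below assumes exactly that; the cliff values, the source names and their shares are all hypothetical, and the real cliffs may be defined differently.

```python
# Hypothetical cliffs: minimum share of a source's content in a classification.
CLIFFS = {"dedicated": 0.50, "covered": 0.20, "all sounds": 0.01}

# Made-up sources with the share of their content per classification.
sources = {
    "Site A": {"Tech": 0.62},
    "Site B": {"Tech": 0.25},
    "Site C": {"Tech": 0.03},
    "Site D": {"Sports": 0.70},
}


def select(sources, classification, depth):
    """Names of sources whose share in `classification` reaches the cliff."""
    cliff = CLIFFS[depth]
    return sorted(
        name
        for name, shares in sources.items()
        if shares.get(classification, 0.0) >= cliff
    )


for depth in CLIFFS:
    print(depth, select(sources, "Tech", depth))
```

Lowering the cliff widens the selection: "dedicated" keeps only the specialists, while "all sounds" keeps any source that touches the topic at all.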

Enterprise mapping.

We can also link our taxonomy to our Enterprise Client’s taxonomy, so Corpus Intelligence can use the client’s business environment. (We’ll cover this in a dedicated post later. If you can’t wait, ask using the form below.)

“Spotted as” Classification.

The point of being AI-Operated is that we do not have any emotion or opinion. Everything is made for our clients to define what they truly need and trust for content.

TrustedOut does not score nor judge anything or anyone. In addition, notions like “fake news” are not as crystal clear as people may think. The “Media, Trust and Democracy report” says it perfectly in its introduction: “Concern about “fake news” is high, but we can’t agree on what that means.”

A vivid picture on how a Media is “spotted as”.

As TrustedOut profiles Media and their brand values, we have developed a sophisticated way to classify how a Media is “spotted”. In other words, we do not score or judge: we tell you if a Media is “spotted as” a fake news publication, for example.

In addition, the way a Media is “spotted as” varies over time. Some are getting worse, some are just revivals of previously shut-down ones, some are, of course, fixed and improved. This is why it’s mandatory to keep the classification always updated and, consequently, to have your Corpus of documents always up-to-date.

Works with any terms. Bad or good.

“Fake news” is always the first to come to mind, followed by all toxic or suspicious terms like “Extreme bias”, “Junk Science”… but it also works perfectly for neutral or positive terms, like “Visionary”, “Optimistic”… This opens the door to Enterprise-wide personalization.

Questions? Shoot!

 

 

 

 

 

Keywords (Data) Voids: Misinformation via Google and Bing.

Credit: pexels.com

In decreasing order of Trust in News: Media I use, Media Overall, Search engines and Social Media.

From the must-read Reuters Institute and Oxford University Digital News Report, you can read the following for the US:

Misinformation using keyword voids via Google and Bing. The “evil unicorn problem”.

Keywords Voids, also known as Data Voids, might not be the only reason for this low level of Trust but it’s important to know how this works.

Desperately seeking quality content.

Every one of us searches Google 3-4 times every day.

But not all searches are equal. Lots of searches are too vague and thus return lots of noise, and (yes!) 15% of all searches on a yearly basis have never been searched before.

Bottom line: when a query is too vague, you add more words, and the combination may not have much quality content behind it. The same goes for searches never made before.

This means there are many search terms for which the available relevant data is limited, non-existent, or deeply problematic. We call these “data voids” or “keywords voids”.

The malicious exploit: Wide open door to misinformation and manipulation.

Typology of Keywords/Data Voids (source, a highly recommended read: “Data Voids: Where Missing Data Can Easily Be Exploited”)

Active Keywords/Data Voids on breaking news.

“Data voids that are actively weaponized by adversarial actors immediately following a breaking news event, usually involving names of locations or suspects in violent attacks (e.g., “Sutherland Springs” or “Parkland.”)”

Active Keywords/Data Voids on problematic terms.

“Data voids that are actively weaponized by adversarial actors around problematic search terms, usually with racial, gendered, or other discriminatory intent (e.g., “black on white crime” or “The Greatest Story Never Told” or “white genocide statistics.”)”

Passive Keywords/Data Voids on a particular group

“Data voids that passively reflect bias or prejudice in society but are not actively being weaponized or exploited by a particular group (e.g., “CEO.”)”

A byproduct of cultural prejudice.

Not an easy task. “Data voids are a byproduct of cultural prejudice and a site of significant manipulation by individuals and organizations with nefarious intentions. Addressing data voids cannot be achieved by removing problematic content, not only because removal might go against the goals of search engines but also because doing so would not be effective. Without high-quality content to replace removed content, new malicious content can easily surface”

Responding to data voids requires making certain that high-quality content…

“Unlike other forms of content moderation, responding to data voids requires making certain that high-quality content is available in spaces where people may seek to exploit or manipulate users into engaging with malignant information.”

… but only you can decide what is a “quality content”.

This is why we are building TrustedOut. An AI-Operated database of media profiles. Unbiased. Up-to-date. Universal.

Questions? Shoot!

 

Presentation to the GESTE (major online editors in France)

We were very proud to be invited to present at the latest GESTE (major online editors in France) event about “Trust and labelization“.

The presentation.

Click to run presentation

 

The Table of Contents.

The problem:
Distrust in media.

The consequence:
In decision-making, only what is trusted in can be trusted out.

The point:
Distrust is general, trust is personal. No universal list.

The logic:
Trust is about reputation. Media Brands are about reputation.

The solution:
Industrial profiling of Media Brands.

The application:
Easy querying, live feeding.

The technology:
Machine learning, Web crawling, big data and microservices for self-feeding, self-growing and daily validation.

Our live taxonomy:
Permanent machine learning, customizable sensitiveness & specialty depth, enterprise mapping.

The opportunity:
BI, Ads & PR

Questions? Shoot!