Taxonomy DNA (cont.) – comparing a specialist vs a generalist

Following our Introduction to Taxonomy DNA, we would like to showcase here the sensitivity of our AI-operated taxonomy.

Comparing a specialist, Techcrunch, and a generalist, the New York Times – Technology.

Taxonomy DNA views: both taken on 12/18/18, with a 3% threshold and 7-day rolling learning (more on this in a later post).


Techcrunch – Taxonomy DNA – 12/18/18 – 3%,7d

The New York Times – Technology

The New York Times – Technology – Taxonomy DNA – 12/18/18 – 3%,7d

Top 10 categories

It is interesting to see that the first 4 categories are the same, with more weight on People for the NYT and more on Industries for Techcrunch. After that, the NYT has Law and Politics, while Techcrunch has Finance and Hardware.

Finally, the AI was quite precise in classifying Lifestyle and Digital Life for the NYT and Digital Tech for Techcrunch.

Why it matters.

TrustedOut Corpus Intelligence lets our users create and maintain corpuses that precisely reflect their own definition of trust for their analytics. In the example above, should a study be on Tech AND Law, the NY Times – Technology section would be selected and Techcrunch would not.

As with any survey, the sample on which the survey is based makes or breaks the trustworthiness and seriousness of its outcomes.

Trusted in, Trusted out.

Below is an example of the Corpus creation UI in TrustedOut.

The screenshot above comes from the “Country comparisons” Business Case.



Introducing Taxonomy DNA

The New York Times homepage Taxonomy DNA.
12/14/18 – 3% threshold – 7-day rolling learning.

The New York Times Homepage – 12/14/18 – 3%,7d

AI-Operated Taxonomy.

As you can read in the FAQ, “TrustedOut makes Intelligence trustworthy with its AI-operated media profiling, TrustedOut brings Corpus Intelligence to feed analytics tools, so business managers can make strategic decisions on sources they do trust”.

Collection and Classification.

TrustedOut media profiling uses 2 different methods, collection and classification, to continuously feed and update its media database and thus our users’ corpuses.

Today, let’s focus on the latter, AI-Operated Classification:

Our taxonomy (the hierarchy and list of all classifications) is now stable. AI continues to learn and classify media and sources at all times.

At this very moment, our taxonomy has 3 levels of categories:

  • 4 level1 (trunks)
  • 17 level2 (branches)
  • 291 level3 (leaves)
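
To make the three levels concrete, here is a minimal sketch of how a category path could be mapped to its level. It assumes paths are written with the “›” separator, as in “Industries › Transportation › Bus”, a category that appears in this post. The function names are hypothetical and are not part of TrustedOut.

```python
# Hypothetical sketch: the helper names are illustrative, not a TrustedOut API.
from typing import List

# Level names as given in the post: trunks, branches, leaves.
LEVEL_NAMES = {1: "trunk", 2: "branch", 3: "leaf"}


def split_path(path: str) -> List[str]:
    """Split 'A › B › C' into ['A', 'B', 'C']."""
    return [part.strip() for part in path.split("›")]


def level_name(path: str) -> str:
    """Return 'trunk', 'branch' or 'leaf' depending on the path depth."""
    return LEVEL_NAMES[len(split_path(path))]


def ancestors(path: str) -> List[str]:
    """Return the parent categories of a path, from trunk downwards."""
    parts = split_path(path)
    return [" › ".join(parts[:i]) for i in range(1, len(parts))]


print(level_name("Industries › Transportation › Bus"))  # leaf
print(ancestors("Industries › Transportation › Bus"))
```

A leaf therefore always carries its branch and trunk with it, which is why a source classified under a leaf can also show up under the parent categories.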

Comparing New York’s NYT homepage and San Francisco’s SFGate Bay Area News.

Below are the top 10 categories for both sources.
It is interesting to compare them and see, amongst other things, Transportation (Bus in particular) appearing in SF and Politics and Law in NY.
Reminder: here, we are not talking about article topics but about the categories in which we classify sources (feeds) and thus media.

SF Gate – Bay Area News | New York Times – Homepage
People 48.1% | General 44.9%
Industries 27.5% | People 39.0%
Industries › Transportation 20.1% | General › Politics 26.1%
Sciences 18.1% | General › Law 11.5%
People › Society 13.3% | People › Society 11.4%
People › Public Services 10.0% | General › Politics › Government 8.7%
Industries › Transportation › Bus 8.1% | Sciences 8.1%
People › Public Services › Emergency Services 8.0% | Industries 8.0%
People › Sports 7.0% | General › Politics › Political Party 6.6%
People › Society › Misc News 6.7% | People › Culture And Arts 6.0%

The 2 Taxonomy DNA views

New York Times – Homepage

The New York Times Homepage – Taxonomy DNA – 3%,7d

SFGate – San Francisco Bay Area News

SFGate Bay Area – Taxonomy DNA – 12/14/18 – 3%,7d


Corpus Intelligence in action

In our example above, if a TrustedOut user wants to get some insights on bus transportation in America, she/he will use the condition “Taxonomy is Transportation>Bus” AND “Country is USA” in the Corpus definition (UI below). In this case, and with the taxonomy setup (threshold and sensitivity will be covered in another post), only the SFGate source will be part of the Corpus.

The screenshot above comes from the “Country comparisons” Business Case.
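
The Corpus condition described in this section can be sketched in a few lines. This is only an illustration under stated assumptions: the `Source` class and `build_corpus` function are hypothetical, and we assume that a source matches a category when its share for that category is at or above the 3% threshold. The shares come from the top 10 table in this post.

```python
# Hypothetical sketch: class and function names are illustrative, and the
# "share >= threshold" rule is an assumption, not TrustedOut's documented logic.
from dataclasses import dataclass
from typing import Dict, List

THRESHOLD = 3.0  # percent, as in the "3%" Taxonomy DNA views


@dataclass
class Source:
    name: str
    country: str
    shares: Dict[str, float]  # category path -> share in percent


def build_corpus(sources: List[Source], category: str, country: str,
                 threshold: float = THRESHOLD) -> List[str]:
    """Keep sources matching 'Taxonomy is <category>' AND 'Country is <country>'."""
    return [
        s.name
        for s in sources
        if s.country == country and s.shares.get(category, 0.0) >= threshold
    ]


BUS = "Industries › Transportation › Bus"

# Shares taken from the top 10 table; categories not listed are treated as absent.
sfgate = Source("SFGate – Bay Area News", "USA",
                {"Industries › Transportation": 20.1, BUS: 8.1})
nyt = Source("New York Times – Homepage", "USA",
             {"General › Politics": 26.1, "General › Law": 11.5})

corpus = build_corpus([sfgate, nyt], BUS, "USA")
print(corpus)  # ['SFGate – Bay Area News']
```

With these numbers, only SFGate ends up in the Corpus, which matches the outcome described above.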

Media trust over education stages

Two very interesting and connected surveys from the Knight Foundation look at how free speech impacts trust in media over 2 important education stages in America: college and high school. Sources: Gallup/Knight, Nov-Dec 2017, for colleges and Knight 2018 for high schools.

Direction hints

As TrustedOut profiles media, it’s important to get a sense of what’s going to happen and to understand how younger generations consume news today and expect to consume it tomorrow.

Our findings

The First Amendment challenge: freedom of speech vs diversity and inclusion, both extremely important with 56% vs 52%…

“Students value both free expression and inclusion, though their commitment to free expression may be stronger in the abstract than in reality. Majorities of students say protecting free speech rights (56%) and promoting a diverse and inclusive society (52%) are extremely important to democracy. Students continue to prefer campuses be open learning environments that allow for a wide range of views to be heard than to prefer environments that prohibit certain types of potentially harmful speech, though not as widely as they did in 2016.” 

… but 61% agree that the campus climate prevents some students from expressing their views because they are afraid others might take offense…

“… more students now (61%) than in 2016 (54%) agree that the climate on their campus prevents some students from expressing their views because they are afraid others might take offense.”

… So, “College students say campus expression has shifted online.”…

“More students say discussion of social and political issues mostly takes place on social media (57%), rather than in public areas of campus (43%). At the same time, an increasing percentage of college students agree that social media can stifle free expression because people fear being attacked or blocked by those who disagree with their views.”

… and “80% agree that the internet has been responsible for an explosion in hate speech.”
Meanwhile, 89% of high school students agree that “people should be allowed to express unpopular opinions”, driving an increased distrust in traditional and social media…

“Almost half (49 percent) of high school students and more than half of teachers (51 percent) say they have not much or not any trust in the media to report news accurately and fairly.” “Only 46 percent of students say they often use social media to get news, compared with 51 percent in 2016.”

… generating an increasing trust in citizen journalism.

“In 2018, 40 percent of students said they trusted content—pictures, videos and accounts—posted by people more than traditional news sources; this number grew from 26 percent in 2016. Teachers also show large increases in trust for citizen journalism efforts.”

Fake news is not a threat to democracy. For them.

“Unlike those who work in and cover the media 24/7, teens don’t really deem “fake news” as a threat to democracy. Just 21 percent of high school students view fake news as a significant threat to democracy. In contrast, 40 percent of teachers view it as a threat to democracy.”


Among college students, we (TrustedOut) read an interesting shift in the freedom of speech vs inclusion balance on US campuses: heated debates are avoided and some of them move online, to social media in particular. This “eFreedom of speech” releases some hate, which may stay contained within groups and develop its own echo chambers, but may also spread to traditional media over time.

Younger people in high school are, somewhat unsurprisingly, more opinionated, with a strong attachment to the First Amendment and democracy and a growing distrust in just about every kind of established news vehicle. The citizen journalism they tend to favor is, in reality, not new and, so far, unproven. At any rate, citizen journalism publications will have to incorporate as businesses to exist financially and get a legal status. At that point, they become a logo with values and a defined readership.

Both points above lead to even more information vehicles, faster evolving and more granular, and thus to an even stronger need for continuous profiling and responsive classification. TrustedOut Corpus Intelligence is made for this.

B2BX: TrustedOut implements Keycloak for its user management

TrustedOut has selected Keycloak, an open source Identity and Access Management solution from Red Hat (whose acquisition by IBM for $34B was recently announced), for its user management.

The perfect B2B eXperience.

Keycloak allows TrustedOut Corpus Intelligence to offer:

One login and multiple accounts? Ok.

Clients with multiple accounts, such as regional marketing managers, will be able to move from account to account without having to remember or re-enter any password.

Social logins? Yep.

Clients can continue to use their social login, such as Google, Twitter or Facebook, to get into their TrustedOut account. They can also authenticate through existing OpenID Connect or SAML identity providers.

Large corporation ready? Absolutely.

Your company uses LDAP or Active Directory servers? TrustedOut can use those and connect in no time.

Frictionless access to Corpus outcomes? Of course.

Getting into TrustedOut with your existing credentials is good, but getting TrustedOut’s outcomes, media, feeds and article abstracts without any additional sign-in effort is even better. The whole experience is totally frictionless.

Security first? Sure!

Thanks to Keycloak, which is extensively used here, TrustedOut complies with standard protocols and provides support for OpenID Connect, OAuth 2.0, and SAML.

A huge thanks to 

Of trust, Facebook and French Yellow Vests.

In our previous post, “While distrust is general, trust definition is personal.”, we saw an increase in news reading alongside an increase in distrust in media, and a clear split in trust between overall media and the media you read.

Here are the numbers from Reuters Institute and Oxford for France in 2018 (June):

and here are the comparable numbers for the USA:

In short, French people pay less for online news, use more ad blockers and trust the media they use less. As a matter of fact, the gap between trust in “Media I use” and overall trust is almost 3 times smaller in France (only 17% more trust for “Media I use”) than in the US (47% more trust for “Media I use”).

2 points are interesting in the context of the Yellow Vests in France:

French trust in overall media is increasing (+17%) while in the US it is decreasing (-11%).

Well, not for the Yellow Vests.

As written in Le Figaro (in French), “the anti-media rhetoric is a constant in the discourse of “yellow vests””, and in Le Monde (in French), “anti-media rhetoric, fuelled by press attacks against the movement’s opacity and anti-democratic nature.”

Social networks are the least trusted.

Well, not for the Yellow Vests.

But first, what is the place of Facebook in getting the news?

In America, overall Facebook IS NOT prominent.

Whereas in France, Facebook IS prominent.

And the role of Facebook with the Yellow Vests is significant, as The Verge writes in “How Facebook Groups sparked a crisis in France”, including an excellent point on the new algorithm, which could be linked to what Bloomberg calls “France Faces a Typical Facebook Revolution”.

All this confirms the role of trust within media, which is the foundation of TrustedOut Corpus Intelligence. For this article, I decided to trust major media sites identified with high traffic and years in business.

More on this? Country comparisons

While distrust is general, trust definition is personal.

Here are 3 interesting facts (US data): 1/ people are spending more time following the news, according to Pew Research Center; 2/ distrust in news is severe and growing, with 72% believing traditional major news sources report news they know to be fake or purposely misleading, according to a poll from Axios and SurveyMonkey; and finally 3/ trust in news depends on which news media you mean, according to the Media Insight Project.

As the content you use shapes your education, your opinions and, most importantly, your decision-making, defining your trust is mandatory. This is the foundation of TrustedOut. We call it Corpus Intelligence. First targets: the $4B spent in text analytics ($10+B by 2023), to make this intelligence trustworthy, and everyone concerned with Brand Safety, to help them precisely define their trusted brand perimeters.

PS: Must-read article (the chart comes from it): ‘My’ media versus ‘the’ media: Trust in news depends on which news media you mean

Corpus Intelligence on current corpuses

Today, your company is using analytics tools and thus, corpuses of content. Are you sure you know what you are feeding your intelligence tools with? Importing your existing corpuses to TrustedOut will help.

Let’s take an example of two corpuses, one for the USA and one for France, and ask TrustedOut if you are comparing apples to apples.

Now, let’s click on that “Corpus analytics” button to discover…

… the US corpus top category is “Cultures & Arts” while the French one is “Business”.

Click here to get the full business case