TrustedOut AI-Operated Classification

Data Collection and Content Classification.

Our database of Media profiles has 2 distinct jobs: collecting intangible data (like revenue, ownership, years online…) and classifying content for our taxonomy and for how sites are “spotted as” (like “fake news”, “junk science”…).

Data Collection is a multi-reference, cross-checking and evolution-watching crawling exercise when…

Content Classification is all about Machine Learning.

And all about “bags of words”. For every classification job, we build datasets of words whose frequency of occurrence is used to train a classifier.
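To make the “bag of words” idea concrete, here is a minimal sketch in Python. It is an illustration only: the post does not say which classifier TrustedOut trains, so a small Naive Bayes model is assumed here, and the labels, training sentences and function names are all hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a real pipeline would do much more)."""
    return text.lower().split()

def train(labeled_docs):
    """Build one bag of words (word -> count) per label."""
    bags = {}
    for label, text in labeled_docs:
        bags.setdefault(label, Counter()).update(tokenize(text))
    return bags

def classify(bags, text):
    """Return the label whose word frequencies best explain the text."""
    vocab = set()
    for bag in bags.values():
        vocab.update(bag)
    best_label, best_score = None, -math.inf
    for label, bag in bags.items():
        total = sum(bag.values())
        # Add-one smoothing so an unseen word does not rule out a label.
        score = sum(
            math.log((bag[word] + 1) / (total + len(vocab)))
            for word in tokenize(text)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set, not TrustedOut data.
TRAINING = [
    ("tech", "new smartphone chip software update"),
    ("tech", "startup releases software for cloud servers"),
    ("sports", "team wins the final match"),
    ("sports", "coach praises players after match"),
]

bags = train(TRAINING)
print(classify(bags, "software update for servers"))  # prints: tech
```

The only signal used is how often each word occurs under each label, which is exactly what a bag of words keeps and what it throws away (word order, grammar).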

As mentioned above, we have 2 types of Classification: Taxonomy and “spotted as”.

Taxonomy Classification.

As in the graphic above, every article is matched against our taxonomy datasets so we can classify each and every one. This gives us a clear picture of a feed and, thus, of the whole media.

This, of course, makes for a (big) lot of operations: 75,000 per article. Yes, that is 75 billion operations per million articles daily.
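The arithmetic behind that figure is a single multiplication. A quick check in Python, where the constant and function names are only illustrative:

```python
OPS_PER_ARTICLE = 75_000  # figure quoted in the post

def daily_operations(articles_per_day):
    """Classification operations needed for one day of articles."""
    return articles_per_day * OPS_PER_ARTICLE

print(f"{daily_operations(1_000_000):,}")  # prints: 75,000,000,000
```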

Taxonomy fun facts (as of today!)

Taxonomy DNA

Below is the visualization of the DNA of The New York Times Tech section.

Sensitiveness and depth customization. Tailor-made for the analyst.

The datasets used to classify articles can use a customized buffer of time, which controls how sensitive the taxonomy will be to daily news. In addition, cliffs can be customized to select a depth of expertise, from “dedicated” to “covered” or even “all sounds”. Both combined, plus the “always up-to-date” factor, make our taxonomy perfectly tailor-made for the job the analyst wants to run. This is why we use “Corpus Intelligence” as our tagline.
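As a rough sketch of how the two settings could interact, the Python below treats the time buffer as a window of days and each cliff as a minimum share of recent articles in a category. All of this is assumed for illustration: the threshold values, field names and function names are hypothetical and not TrustedOut's actual implementation.

```python
from datetime import date, timedelta

# Hypothetical cliffs: minimum share of recent articles in a category.
CLIFFS = {"dedicated": 0.50, "covered": 0.10, "all sounds": 0.0}

def recent(articles, today, buffer_days):
    """Keep only articles published within the time buffer."""
    cutoff = today - timedelta(days=buffer_days)
    return [a for a in articles if a["published"] >= cutoff]

def category_share(articles, category):
    """Fraction of the given articles classified in the category."""
    if not articles:
        return 0.0
    hits = sum(1 for a in articles if a["category"] == category)
    return hits / len(articles)

def passes(articles, category, depth, today, buffer_days):
    """True if the feed reaches the requested depth within the buffer."""
    share = category_share(recent(articles, today, buffer_days), category)
    return share >= CLIFFS[depth]

# Hypothetical feed: mostly tech over three months, mostly politics this week.
TODAY = date(2019, 1, 31)
FEED = [
    {"published": date(2019, 1, 30), "category": "tech"},
    {"published": date(2019, 1, 29), "category": "politics"},
    {"published": date(2019, 1, 28), "category": "politics"},
    {"published": date(2018, 12, 1), "category": "tech"},
    {"published": date(2018, 11, 15), "category": "tech"},
]

print(passes(FEED, "tech", "dedicated", TODAY, buffer_days=7))   # prints: False
print(passes(FEED, "tech", "dedicated", TODAY, buffer_days=90))  # prints: True
```

With a short buffer the feed follows this week's news and is no longer “dedicated” to tech. With a long buffer the same feed qualifies, which is the sensitivity trade-off described above.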

Enterprise mapping.

We can also link our taxonomy to our Enterprise Client’s taxonomy, so Corpus Intelligence can use the client’s business environment. (We’ll cover this in a dedicated post later. If you can’t wait, ask using the form below.)

“Spotted as” Classification.

The point of being AI-Operated is that we do not have any emotion or opinion. Everything is made for our clients to define what they truly need and trust in content.

TrustedOut does not score or judge anything or anyone. In addition, notions like “fake news” are not as crystal clear as people may think. The “Trust, Media and Democracy” report says it perfectly in its introduction: “Concern about “fake news” is high, but we can’t agree on what that means.”

A vivid picture of how a Media is “spotted as”.

As TrustedOut profiles Media and their brand values, we have developed a sophisticated way to classify how a Media is “spotted”. In other words, we do not score or judge; we tell you whether a Media is “spotted as” a fake news publication, for example.

In addition, the way a Media is “spotted as” varies over time. Some are getting worse, some are just revivals of previously shut-down ones, and some are, of course, fixed and improved. This is why it’s mandatory to keep the classification always updated and, consequently, to keep your Corpus of documents always up-to-date.

Works with any terms. Bad or good.

“Fake news” is always the first to come to mind, followed by all the toxic or suspicious terms like “Extreme bias”, “Junk Science”… but it also works perfectly for neutral or positive terms, like “Visionary”, “Optimistic”… This opens doors to Enterprise-wide personalization.

Questions? Shoot!

Keyword (Data) Voids: Misinformation via Google and Bing.

Credit: pexels.com

In decreasing order of Trust in News: Media I use, Media Overall, Search engines and Social Media.

From the must-read Reuters Institute and Oxford University Digital News Report, you can read the following for the US:

Misinformation using keyword voids via Google and Bing. The “evil unicorn problem”.

Keyword Voids, also known as Data Voids, might not be the only reason for this low level of Trust, but it’s important to know how they work.

Desperately seeking quality content.

Every one of us searches Google 3-4 times every day.

But not all searches are equal. Lots of searches are too vague and thus return lots of noise, and (yes!) on a yearly basis, 15% of all searches have never been searched before.

Bottom line: when a query is too vague, you add more words, and the combination may not match much quality content. The same goes for searches never made before.

This means there are many search terms for which the available relevant data is limited, non-existent, or deeply problematic. We call these “data voids” or “keyword voids”.

The malicious exploit: Wide open door to misinformation and manipulation.

Typology of Keyword/Data Voids (source, a highly recommended read: Data Voids: Where Missing Data Can Easily Be Exploited)

Active Keywords/Data Voids on breaking news.

“Data voids that are actively weaponized by adversarial actors immediately following a breaking news event, usually involving names of locations or suspects in violent attacks (e.g., “Sutherland Springs” or “Parkland.”)”

Active Keywords/Data Voids on problematic terms.

“Data voids that are actively weaponized by adversarial actors around problematic search terms, usually with racial, gendered, or other discriminatory intent (e.g., “black on white crime” or “The Greatest Story Never Told” or “white genocide statistics.”)”

Passive Keywords/Data Voids on a particular group

“Data voids that passively reflect bias or prejudice in society but are not actively being weaponized or exploited by a particular group (e.g., “CEO.”)”

A byproduct of cultural prejudice.

Not an easy task. “Data voids are a byproduct of cultural prejudice and a site of significant manipulation by individuals and organizations with nefarious intentions. Addressing data voids cannot be achieved by removing problematic content, not only because removal might go against the goals of search engines but also because doing so would not be effective. Without high-quality content to replace removed content, new malicious content can easily surface”

Responding to data voids requires making certain that high-quality content…

“Unlike other forms of content moderation, responding to data voids requires making certain that high-quality content is available in spaces where people may seek to exploit or manipulate users into engaging with malignant information.”

… but only you can decide what “quality content” is.

This is why we are building TrustedOut: an AI-Operated database of media profiles. Unbiased. Up-to-date. Universal.

Questions? Shoot!

 

Presentation to the GESTE (major online editors in France)

We were very proud to be invited to present at the latest GESTE (major online editors in France) event about “Trust and labeling”.

The Table of Contents.

The problem:
Distrust in media.

The consequence:
In decision-making, only what is trusted in can be trusted out.

The point:
Distrust is general, trust is personal. No universal list.

The logic:
Trust is about reputation. Media Brands are about reputation.

The solution:
Industrial Profiling Media Brands.

The application:
Easy querying, live feeding.

The technology:
Machine learning, Web crawling, big data and microservices to self-feed, self-grow and validate daily.

Our live taxonomy:
Permanent machine learning, customizable sensitiveness & specialty depth, enterprise mapping.

The opportunity:
BI, Ads & PR

Questions? Shoot!

 

Older people share more fake news.

Age predicts behavior better than any other characteristic (even party affiliation).

Researchers at New York University and Princeton University report, based on their recent surveys, that older users shared more fake news than younger ones, regardless of education, sex, race, income, or how many links they shared. [source: The Verge]

7 times more fake news sharing

“But older users skewed the findings: 11 percent of users older than 65 shared a hoax, while just 3 percent of users 18 to 29 did. Facebook users ages 65 and older shared more than twice as many fake news articles than the next-oldest age group of 45 to 65, and nearly seven times as many fake news articles as the youngest age group (18 to 29).”

Profiling media sources…

“It won’t be easy: how to determine whether a person is digitally literate remains an open question. But at least some of the issue is likely to come down to design: fake news spreads quickly on Facebook in part because news articles generally look identical in the News Feed, whether they are posted by The New York Times or a clickbait farm.”

… to build trust.

Profiling sources to limit the spread of fake news is similar, in logic, to profiling sources to limit misleading intelligence. We call it “Corpus Intelligence” and will focus on B2B solutions. In production at the end of Q1 2019.

Trust, Media and Democracy

The Aspen Institute and the Knight Foundation recently released a report from a commission they organized on Trust, Media and Democracy. While it comes from America, we believe most of it applies more widely.

If you don’t have time for the lengthy report, this Medium page is very interesting. Here are our takeaways in the light of our previous posts, grouped into 3 main categories:

10 ways to rebuild trust in media and democracy

Before starting, we cannot resist simply quoting the introductory paragraph: “Our nation is experiencing a crisis of trust. We believe that reliable news is vital to our democracy, but many of us can’t name an objective news source. Concern about “fake news” is high, but we can’t agree on what that means. We can’t even assume every American is operating under the same set of facts. We retreat to polarized political tribes and don’t want to listen to anyone outside them.” It is superbly written and very much in line with what we believe and with our motivation for creating TrustedOut.

Of course, the purpose here is not to strike a “we know better” posture. Rather than copying what the article says, we simply note that we have written about most of those points and thus agree with them.

a/ Privacy and Transparency (#1, 5 & 6)

Top 2019 predictions: Privacy and Transparency

b/ Financial support (#2, 3, 4 & 7)

Saving journalism.

c/ Education (#8, 9 & 10)

Media trust over education stages

Feedback welcome. Go to the bottom of any TrustedOut.com page…

Taxonomy fun facts (as of today!)

Taxonomy DNA for The New York Times – Tech section

In these 2 recent posts, we announced our AI-operated Taxonomy…

Introducing Taxonomy DNA

Taxonomy DNA (cont.) – comparing a specialist vs a generalist

… time now to share some fun facts about it:

10,000,000 words

is the size of the dictionary of words used to qualify our taxonomy classifications. Those words were precisely selected to be meaningful for each of our taxonomy classifications (leaves).

100,000 new article abstracts collected daily.

Every day, 100k article abstracts are collected. This number should grow to 1 million a day within 3 months.

75,000 operations per article

… to classify within our taxonomy every single article for every single day for every single feed for every single media.

8 billion classification operations daily

This is growing daily and should reach 50 to 70B shortly.
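These numbers can be cross-checked against each other with a few lines of Python. The sketch only uses the figures quoted in this post; the names are illustrative, and it assumes the cost per article stays at 75,000 operations as volumes grow.

```python
OPS_PER_ARTICLE = 75_000          # operations per article, as quoted above
ABSTRACTS_PER_DAY = 100_000       # abstracts collected daily, as quoted above

def articles_needed(ops_per_day):
    """Daily article volume implied by a daily operation count."""
    return ops_per_day / OPS_PER_ARTICLE

# Today's volume: 7.5 billion, which the post rounds to 8 billion.
ops_today = ABSTRACTS_PER_DAY * OPS_PER_ARTICLE
print(f"{ops_today:,}")

# Volume implied by the projected 50 to 70 billion operations a day.
print(round(articles_needed(50e9)), round(articles_needed(70e9)))
```

Under that assumption, 50 to 70 billion operations a day corresponds to roughly 670,000 to 930,000 articles a day, which is on the way to the 1 million a day mentioned above.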

Allowing for sophisticated Taxonomy classification filters.

Below is an example of how to filter classifications and depth of specialization per classification (we’ll dig into this more in a coming post) for your corpus:

Corpus creation and maintenance (may change)

Of course, should you have questions, let us know!

The incredible story of a 10-year-long fake success story.

For 10 years. Fake pharmaceutical, fake CEO, real top-notch business school.

It’s the real story of the fake story of Berden and its CEO. Both are the result of a top-notch curriculum at HEC in France. [HBR story here]. The course is about controlling enterprise reputation, and the challenge was to create a company, Berden, and its CEO, Eric Dumontpierre. The success was incredible. For 10 years, the CEO was beloved and the company was highly visible, to the point that a real competitor sent a cease and desist for a… fake product of the fake Berden.

The trick: Do not talk to the media

“The students had only one constraint to respect: not to communicate directly with the media. They had to build their reputation organically, by building an online ecosystem of websites and social network accounts where they would publish press releases and other information about the company, its history and activities.”

The method: Spread false…

Recent studies show that false information is easier to peddle than true information

… bold…

Research on the dissemination of “fake news” shows that students have used communication techniques identified decades ago by researchers as drivers of this phenomenon. Readers are more likely to circulate strong stories that evoke emotions such as fear (river pollution), disgust (child labour) and surprise or joy (32-hour work week) than smooth stories.

… repeat, until it sounds true.

Researchers have shown that repetition increases perceived veracity. In other words, familiarity induces credibility.

The fix: Trust profiled media.

As previously written here, the solution to avoid this chaos is for media to have clear values delivered and defended by professional journalists. THE weak point, the trick used here, is the absence of contact with the media.

Absence of media opens the door to total chaos in education, opinions and decision-making. TrustedOut Corpus Intelligence is here to profile a totally unbiased, AI-Operated, Media database so Intelligence tools are fed with the content business analysts trust.

 

Saving journalism. [updated 2/19/19]

Major Tech Cos to the Journalism Rescue.

In this week’s Axios Media Trends, we can read “Major tech companies and moguls are pouring lots of money into initiatives to support quality journalism, after months of bad headlines about fake news and the longer-term struggles of business models for journalism, especially at the local level.”

Doing it again.

Microsoft’s President makes a very interesting point:

“I think we should all care about high quality journalism. … I keep hoping that we’re gonna see the journalism profession come out the other end. Remember, a decade ago, people were saying, ‘Gee, there’s no future in high quality audio visual entertainment.’ It [was] being decimated by cable and then a new business model emerged.”
— Brad Smith

Facebook, Google: $300M each.

Large tech companies are showing signs of love for journalism. After the 200,000 free Google Suite accounts, Facebook and Google are each granting $300M to news programs. WordPress is also investing “six figures” in The News Project, a full-service publishing platform specifically built for digital news publishers.

WordPress just announced Newspack in partnership with… Google.

“While local media might not get as much coverage as the national press, it serves an equally important role in society. That’s why the decline of local newsrooms in the U.S. has been a troublesome trend in recent years. The Google News Initiative is now partnering with WordPress to invest $1.2 million in creating a “fast, secure, low-cost publishing system tailor-made to the needs of small newsrooms” called Newspack (backed by Google, the Lenfest Institute, the Knight Foundation, and others.)”

Journalism, the cure to media distrust.

As we wrote previously, quality journalism that respects privacy and transparency and delivers the brand values of the media it works for is the solution to the current distrust, which leads to misinformation and, ultimately, to violence.

Top 2019 predictions: Privacy and Transparency

Optimism and method for greater trust in media.

A win-win relationship

Large tech companies, such as Facebook and Google, need media. No trust in media means fewer conversations online and less traffic for them.

By providing a framework for quality journalism AND a more sustainable business, large tech and media are in a sound win-win relationship.

We will have to carefully watch the dependence this creates for media businesses, but for now, we at TrustedOut approve of those initiatives, which help our Corpus Intelligence with solid, well-profiled media.

Update: Pledges to save local news reach nearly $1 billion

Questions? Comments? Contact us!

Behind the Business Case #3: Country comparisons

To continue our Behind the Business Case series with this 3rd example: we didn’t have the same questions about the “Country” field that we had about “Out of Digital”. Country filters are obvious, in particular for marketers, but what we didn’t expect was how much of an eye-opener it would be.

Import an existing Corpus into TrustedOut

Compare countries within the same Corpus

To discover that countries have very different profiles, in particular in their taxonomies.

Comparing Apples to Apples

To avoid comparing Culture, Politics and Entertainment with Business, Society and Tech, TrustedOut can align those two countries and have you compare… apples to apples.

Full business case here

Behind the business case #2: Media Metrics

Our content team worked very hard to define what we hope is “the most comprehensive media profiles database”, and to select all the necessary fields and an always up-to-date taxonomy. Of course, this is an ongoing process, but so far…

We are today at 66 fields and 450 categories.

Along the way, we challenged ourselves on how insightful a field would be before deciding to add it.

“Out of digital” is a field collecting all the formats a media uses beyond online, such as paper, radio, TV…

Now, the question is: does it matter? Or, for TrustedOut’s goal…

Do Pure Players have an impact on your analytics?

We ran our Corpus Analytics on the Corpus used for the BPI Events:

And we ran an A/B test, all media versus Pure Players only, with 2 versions of a Corpus and the very same analytics tool (here, Netvibes), and arrived at the conclusion:

Yes, Pure Players influence your analytics.
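The shape of such a comparison can be sketched in a few lines of Python. This is only an illustration of the A/B idea: the corpus below is made up, the field names are hypothetical, and the real test was run with Netvibes on the actual Corpus, not with this code.

```python
from collections import Counter

# Hypothetical mini corpus; "out_of_digital" lists formats beyond online.
CORPUS = [
    {"name": "A", "out_of_digital": ["paper"], "category": "business"},
    {"name": "B", "out_of_digital": [], "category": "tech"},
    {"name": "C", "out_of_digital": ["tv", "radio"], "category": "politics"},
    {"name": "D", "out_of_digital": [], "category": "tech"},
]

def is_pure_player(media):
    """A Pure Player has no format beyond online."""
    return not media["out_of_digital"]

def category_shares(corpus):
    """Share of each category in a corpus, as a simple stand-in metric."""
    counts = Counter(m["category"] for m in corpus)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Version A: all media. Version B: Pure Players only. Same metric on both.
all_media = category_shares(CORPUS)
pure_only = category_shares([m for m in CORPUS if is_pure_player(m)])

print(all_media)
print(pure_only)
```

If the two versions give different pictures for the same metric, the field carries information and is worth keeping, which is the reasoning applied to “Out of digital” here.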

So, we kept “Out of digital” for our Corpus Intelligence, and our AI now keeps updating it at all times.

Continue reading the complete business case…