Building a Corpus and Getting relevant articles from a list of articles.

Last week we demoed how, from a list of URLs, you can optimize your communication! This week, we will show you:

How to create a Corpus to feed your analytics tools and get more relevant articles from your selection of articles.

In other words: How can I get more of those articles I find relevant for my analytical/survey/study project?

Like last week, we will preserve the confidentiality of those involved in this real business case by not revealing names or original articles.

1/ The case: A study on a specific angle of Sports

The client is working on a study about specific aspects of Sports and gave us a short list of articles they found worth exploring.

They need more articles like those and, ultimately, want to feed their analytics tools with a Corpus built from relevant sources and kept up to date at all times.

2/ Learning from the classifications of those articles

As mentioned above, we will not share those articles to preserve the confidentiality of the client.

Here are the top, weighted classifications from the articles list:

3/ Creating a Corpus from those classifications

Client told us Sports was the target, so we’ll ask TrustedOut for Sources specialized in all Sports.

We will add the condition that those sources cover one or more of the top classifications found above: Fashion, Communication and/or Digital Life.

Also, the client wants to use the articles they gave us, which were found in France, to explore a new market: the US.
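The criteria above can be pictured as a simple filter. The sketch below is purely illustrative: the Source record, its field names and the sample data are assumptions made for this example, not TrustedOut's actual data model or API.

```python
from dataclasses import dataclass

# Hypothetical source profile; field names are assumptions for illustration only.
@dataclass
class Source:
    name: str
    country: str            # "US" or "FR"
    specialty: str          # branch the source is specialized in
    classifications: set    # other classifications the source covers

TARGET_COUNTRIES = {"US", "FR"}
TARGET_SPECIALTY = "Sports"
TOP_CLASSIFICATIONS = {"Fashion", "Communication", "Digital Life"}

def in_corpus(source: Source) -> bool:
    """Sports specialist, in the US or France, covering at least one top classification."""
    return (
        source.country in TARGET_COUNTRIES
        and source.specialty == TARGET_SPECIALTY
        and bool(source.classifications & TOP_CLASSIFICATIONS)
    )

# Made-up sample sources.
sources = [
    Source("sports-and-style", "US", "Sports", {"Fashion", "Luxury"}),
    Source("pure-scores", "FR", "Sports", {"Basketball"}),
    Source("tech-daily", "US", "Tech", {"Digital Life"}),
]

corpus = [s.name for s in sources if in_corpus(s)]
print(corpus)
```

Only the first sample source passes: the second covers none of the top classifications, and the third is not a Sports specialist.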

From a list of French articles to a US-France Corpus

TrustedOut returns 59 Media and 96 Sources, representing an average of close to 250 articles per day.

Here are 3 examples of sources found for this Corpus and their respective main profiles over the past week:

Footpack

  • People › Sports › Football And Soccer | 31.8%
  • People › Lifestyle › Fashion | 21.8%
  • People › Lifestyle › Luxury | 17.2%
  • People › Sports › American Football | 7.0%
  • People › Sports › Cycling | 5.9%

Complex

  • People › Lifestyle › Fashion | 14.6%
  • People › Entertainment And Leisure › Celebrities | 11.3%
  • People › Culture And Arts › Music | 10.9%
  • People › Culture And Arts › Movies | 4.9%
  • People › Entertainment And Leisure › TV And Video And WebTV | 4.0%

Highlights Football

  • People › Sports › Football And Soccer | 31.5%
  • People › Sports › Table Tennis | 19.6%
  • General › Tech › Software And OS | 12.8%
  • People › Sports › Basketball | 11.3%
  • General › Tech › Digital Life | 10.5%

4/ Reading targeted articles

Let’s get the latest articles from our Corpus.

Below is what the beginning of the list of those articles looks like with URLs, time stamps and classifications for each relevant article.

Fashion classified articles?

Fashion, as seen above, was the top classification found from the list of articles that were given to us.

How about getting articles from our Corpus classified in Fashion?

Simply select this classification in the list of articles coming from your TrustedOut Corpus! Here are the first 2:

Want to read them?

Les maillots de gardiens 2020-2021 d’Umbro s’inspirent des annees 90

Best Outdoor Gear Deals of the Week | GearJunkie

Why is it so critical?

The Corpus makes or breaks any analytics.

No matter how smart your analytics algorithm is, if you feed it a Corpus that is too small, too biased, too outdated or too broad, not only will you get twisted results from your genius algorithms but, worse, the decisions made from them will be wrong and untrustworthy.

Trust your Corpus to Trust your Decisions.

We’ve shared 2 ways to build a trustworthy Corpus:

Criteria-based Corpus creation:

TrustedOut was made to get content corresponding to profiles you trust for a specific purpose.

Example-based Corpus creation:

These last two posts demoed how you can get more from a list of articles/URLs.

Questions? Reach out!

 

Editorial Intelligence to find your communication targets.

The case: How, from a list of URLs, can I optimize my communication?

To preserve the confidentiality of those involved in this real business case, we will not share any names or URLs.

In our example, the client, ACME Co., has compiled a list of article URLs related to its brand. We could use this list as is, but ACME also scores every article with a popularity number made of a mix of likes, retweets, comments…

All in all, we start with a list of URLs compiled by a client, with or without scores.

1. Editorial profiling for each URL.

For each and every URL, TrustedOut returns the top editorial classifications.

2. Classifications/score weighting and taxonomy consolidation.

Per article, classification split and scores are weighted. Then, to align with the taxonomy, the 3 hierarchical layers are consolidated:
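One way to picture this step is the sketch below. It assumes each article carries a popularity score and a classification split expressed as shares that sum to 1, that a weight is simply score × share, and that weights are then rolled up to each of the 3 taxonomy layers. The data, the " > " path separator and the weighting formula are assumptions for illustration, not a description of how TrustedOut computes it.

```python
from collections import defaultdict

# Made-up input: (popularity score, {classification path: share of the article}).
articles = [
    (100, {"People > Lifestyle > Fashion": 0.6, "People > Sports > Cycling": 0.4}),
    (50, {"General > Tech > Digital Life": 1.0}),
    (200, {"People > Lifestyle > Fashion": 0.25, "General > Tech > Digital Life": 0.75}),
]

def consolidate(articles):
    """Weight each classification share by the article score, then roll the
    weights up to every layer of the taxonomy path."""
    totals = defaultdict(float)
    for score, split in articles:
        for path, share in split.items():
            layers = path.split(" > ")
            for depth in range(1, len(layers) + 1):
                totals[" > ".join(layers[:depth])] += score * share
    return dict(totals)

totals = consolidate(articles)
for path, weight in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{weight:8.1f}  {path}")
```

Each layer keeps its own total, so the same table can be read at the branch level (People, General) or at the most detailed classification level.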

3. Tree Mapping learning.

We use Tree Mapping to get a visual of the table above.

TrustedOut Editorial Tree Mapping

Here are some key learnings:

3.1 People #1. Priority to Political Engagement

People is the biggest classification branch, and Political Engagement should definitely be a priority for campaigns, PR and monitoring.

Interesting, as well, are 2 related classifications: Series in Culture and Arts and TV/Videos/WebTV in Entertainment and Leisure.

3.2 Then comes Politics. 3 major classifications echoing People’s Political Engagement.

It is very interesting to see Political Engagement, in People, come first by a wide margin, followed by 3 classifications from the same stem: Politics, in the General branch.

Public Services, Civil Defense and Government total 38,500, which is almost 95% of People’s Political Engagement (40,800).
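As a quick check of that ratio, using only the two totals quoted above (the variable names are ours):

```python
politics_total = 38_500        # Public Services + Civil Defense + Government
political_engagement = 40_800  # People > Political Engagement

ratio = politics_total / political_engagement
print(f"{ratio:.1%}")
```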

It becomes easy to orient your communication so that it resonates with this insight.

3.3 Sciences is all about Medicine and Health. Don’t miss Pharmaceutical and Drugs.

Same stem, Medicine and Health: 3 classifications with a clear split in importance into 2 tiers, Pharmacy and Drugs first and then Care and Fluid.

Clearly, Pharmacy and Drugs, which is bigger than each of the Politics classifications and half the size of the top-ranked Political Engagement, should be a focus.

3.4 Industries: All about Healthcare. 2 top classifications.

Interestingly, the top 2 are, by far, Hospital and Clinic and Pharmaceuticals, both in Healthcare. 3rd and far behind is Manufacturing and Retail > Tobacco.

Another interesting insight is that Industries > Medicine and Health > Pharmaceutical is half the size of Sciences.

Putting Sciences first, then Industries, helps set the agenda of your communication and branding efforts.

Bottom line: Focus on People and Sciences, knowing what to talk about for each.

Use the following insights for your communication, ad campaign, PR effort, Internal/External engagements…

> Tone: People and Sciences first.

> Address People’s Political Engagement knowing the 3 matters in Politics

> Approach Sciences’ Pharmaceutical and Drugs and develop on Healthcare.

> Have an eye on popular series and videos

Intrigued? Reach out!

TrustedOut Benefits. For Advertisers and Publishers. For Brand Managers and Analysts.

For the Advertiser:

The need for a new source of intelligence in targeting

Advertisers want perfect targeting for their campaigns. With the end of 3rd party cookies, Advertisers will have to rely on other sources of intelligence to target the perfect page for ad insertions.

No self-declaration. No bias.

TrustedOut is totally independent from any preset lists and is entirely powered by AI using countrywide published content. This guarantees that our content profiling is unbiased, applicable to any subject and always up-to-date. We do NOT use any self-declared data coming from anyone. We listen and we profile. All the time.

Dialect detection means no dependence on keywords.

How can you, for example, target the Healthcare Industry? With a list of keywords? Who compiled it? Is it up-to-date?… Hard to manage. Instead, TrustedOut listens to and profiles everything published and keeps on learning the words and expressions, the “dialects” (or “bags of words”), for each and every classification. This way, the definition of “Healthcare”, in our example, always captures the words and expressions currently in use for this targeting. No worries. No lists, potentially biased and outdated. You are covered. Always.

Profiling Intelligence must be at insertion time.

Content perception varies over time. A page or an article can be perceived primarily in one classification and, over time, in another one. In this example, we first see “Employment” at publishing time and, 3 weeks later, “Seniors”. The same goes for the dialect words and expressions: it all evolves over time, which is why profiling at insertion time is key to perfect targeting.
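To make the point concrete, here is a minimal sketch of looking up a page's top classification at a given date. The snapshot structure, the dates and the shares are made up for illustration; only the shift from “Employment” to “Seniors” after 3 weeks comes from the example above.

```python
from datetime import date

# Hypothetical profile snapshots for one page (shares are invented).
snapshots = [
    (date(2020, 1, 6), {"Employment": 0.55, "Seniors": 0.30}),   # publishing time
    (date(2020, 1, 27), {"Employment": 0.35, "Seniors": 0.50}),  # 3 weeks later
]

def top_classification_at(snapshots, when):
    """Return the top classification of the latest snapshot taken on or before `when`."""
    eligible = [(taken, profile) for taken, profile in snapshots if taken <= when]
    if not eligible:
        return None
    _, profile = max(eligible, key=lambda item: item[0])
    return max(profile, key=profile.get)

print(top_classification_at(snapshots, date(2020, 1, 6)))   # targeting at publishing time
print(top_classification_at(snapshots, date(2020, 2, 1)))   # targeting at insertion time
```

An ad placed using the publishing-time profile would target “Employment”, while the same page profiled at insertion time would be targeted as “Seniors”.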

 


For the Publisher:

The need for a precise, external and unbiased product pulse.

Like any business, publishers need a trustworthy, unsolicited, unbiased look at what they publish. This goes for articles, feeds/sources and the whole media (distinct domain). This is critical not only to steer a business but also to compare it with others.

The end of self-declarations for special ops.

TrustedOut provides always-up-to-date profiling of a section or the whole of a media. It also provides trends which can be compared. The end of self-evaluations, self-declarations for a more trusted relationship with your sponsors and commercial partners.

Editorial trends and KPI measurement.

TrustedOut profiles published and exposed content. All of it and at all times. This means measuring editorial trends and positioning, within a media and across a market, a region or a country. Mapping editorial trends shows how a title reacts to a subject, in depth and over time. Finally, it also measures performance and the balance of demand/delivery.

 


For the Brand Manager:

Brand Consistency Builds Brand Awareness

Marketers know Brand Consistency is key to building and keeping awareness and credibility. To stay consistent, a brand must be totally safe and must not appear in an environment inconsistent with its values. TrustedOut allows Marketers to define the content appropriate for their brands and get the corresponding Whitelists of places to deliver the right communication.

Consistency, inside and outside.

Branding is not reserved for marketers. Everyone connected to an organization, including employees, board members, partners and, of course, customers and prospects, should have access to a curated list of sources related to the brand.

Consistency in PR goals and measurements.

Direct your PR efforts to where you want your brand to be and use our instant profiling to measure evolution and progress. Define KPIs with your PR departments and measure your ROI on a classification, a perception, a competition.

 


For the Analyst:

Must trust the Content in to trust the decisions out.

The mandatory first step in any analytics is to trust the content/data you are going to analyze. If you do not, results will be unreliable and, worse, the decisions you make might be dangerous and disastrous to your business and brand. Bottom line: if no trust in, then no Trust Out.

Who reacts to what, and how.

Analyze how an event, a product launch or a piece of local news is classified, and where. A great source of insights on how to prepare for a new launch or positioning.

Watching the future.

Industries keep on evolving. Stay alert to which technologies are popping up, how they are received across content orientations, political or otherwise, and where it is happening.

Let’s talk?

Covid-19 Editorial Trends. Why it matters to Advertising and Marketing.

TrustedOut. All rights reserved. (data as of April 22nd 2020)

How do Editorial focuses evolve with Covid-19?

Covid-19/Coronavirus is a major disruption in almost everything. It has been the main focus of all major media for weeks.

Now that weeks are passing by, we wanted to look at how the Editorial focus of media has evolved.

Mapping the difference in focus on Industries > Healthcare.

We picked a classification, Industries > Healthcare, and a region of France, the East, then selected regional newspapers and compared how the main classifications of those titles evolved over the past week vs the past month. We then mapped those deltas in focus.
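A delta in focus can be sketched as a per-classification difference between two profiles of the same title. The shares below are invented, and treating a classification missing from one period as 0 is our assumption, not necessarily how TrustedOut computes its maps.

```python
# Made-up shares (in % of published content) for one regional title.
past_month = {"Industries > Healthcare": 22.0, "People > Sports": 8.0, "General > Politics": 15.0}
past_week = {"Industries > Healthcare": 16.5, "People > Sports": 12.0, "People > Lifestyle": 5.0}

def focus_delta(recent, baseline):
    """Percentage-point change per classification; a missing classification counts as 0."""
    keys = set(recent) | set(baseline)
    return {k: round(recent.get(k, 0.0) - baseline.get(k, 0.0), 1) for k in keys}

delta = focus_delta(past_week, past_month)
for name, change in sorted(delta.items(), key=lambda kv: kv[1]):
    print(f"{change:+6.1f}  {name}")
```

Negative values are classifications the title moved away from over the past week, positive values are the ones it moved toward.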

Focus shifting?

Finally, as shown above, we displayed, per title, the past week’s focuses (top classifications).

Why it matters.

For a publisher: Precise Editorial Pulse.

Knowing your product critically matters.
And it cannot be self-declared or self-appreciated.

The goal is a live and precise picture of how, within a region, a title compares with others. Whether the shift is intentional or not, it matters for the sake of the title’s brand to know where it is going and where it stands.

For an advertiser: Perfect Content Targeting.

Right content for the right message, measured at the moment of insertion.

Content targeting is even more important with the changes in cookies. Not only must content profiling be done right and provide the best insights on classification, but it must also be done at insertion time, because the perception of content varies over time and the profile at publishing time may not be the same at insertion time. (More info and example here)

For a brand manager: True Brand Consistency.

As Warren Buffett says, “It takes 20 years to build a reputation and 5 minutes to ruin it”
Branding consistency is vital.

A brand must be safe everywhere, but it must also stay consistent. The only way to ensure both is to define your trust and get the corresponding media. Key for Whitelisting in campaigns, internal watch and external communication.

For an analyst: Mandatory Corpus Trust.

Content used for any analytics (the Corpus), must be trusted to trust the decisions made from it.
“If it’s not Trusted In, it cannot be Trusted Out”

Any analytics, any Business Intelligence, is a 3A process: Aggregation of the content/data to be used, Analytics of that content/data, and Action taken from the outcome of the Analytics. You can have the best Analytics and Action algorithms and staff, but it is not only useless but dangerous to act on the results if you don’t trust the content and/or data being used.

Questions? Let us know!

Editorial Trends during Coronavirus – USA Today, WSJ, Miami Herald, Le Monde, Le Figaro, Ouest-France.

Editorial trends past week vs past month, past month vs past quarter.

How do media evolve during this unprecedented period we are experiencing with the Covid-19/Coronavirus?

We decided to focus on 3 daily newspapers each in France and the US and look at how their editorial orientation evolved over the past 7 days vs the past 30 days and the past 30 days vs the past 90 days.

Numbers are as of Wednesday, April 22nd 2020 and are based on each title’s official Twitter feed.

USA: Top 2 and 1 local.

USA Today

Back to People.

Lifestyle never left but People, Home and Sports are back. Less on Medicine & Health and Politics.


The Wall Street Journal

Very stable on Lifestyle>Home.

Tech is growing. Economy & Enterprise still high but more focused on Real Estate. Less Politics.

Miami Herald

More on Money, Less on Medicine & Health

Stable on Economy & Enterprise, More on General>Finance, less on Healthcare, Politics and Government.

France: Top 2 and 1 regional.

Le Monde.

Shaping up the future?

Politics is coming back. Industries is popping up. A little less on Medicine and Health.

Le Figaro

More fun. Less serious stuff.

Time for a break. Entertainment & Leisure and Culture & Arts are back. Easing on Public Services and Medicine & Health.

Ouest-France

Back to the field

Back to Society and industries, less on Medicine & Health and Sciences. A little less in Sports and Soccer in particular.

Want to know how we make this happen?

Read this post: AI-Powered Classification vs Keywords

https://www.trustedout.com/blog/how-classification-detects-editorial-angle/

Questions? Let us know!

Media and Coronavirus: Trust Up, Consumption Up, Revenue Down.

The Trust Chronology:

June ’18: Poll: 72 percent say traditional outlets ‘report news they know to be fake, false, or purposely misleading’

Feb ’19: Information from Traditional Media, Conversations on Social Media.

Back then, we wrote this post:

Get information from Traditional Media, have conversation on Social Media. Not the other way around.

March ’20: Coronavirus Approval Rates: Trump > News Media

As we wrote 2 weeks ago

Coronavirus amplifies the gap in trust.

Coronavirus update: GOP 25% (-8%), Dems 61% (-5%)

Credits: Gallup

April ’20: Trump’s Approval rating falls

Credits: Gallup

April ’20: 83% Of Americans Trust Journalists For Coronavirus Info. 50% Trust Social Media

So, Trust is up and Media consumption is up

Credits: Gallup

But Ad spending is down.

Credits: eMarketer

Time for Brands to get stronger (and support traditional media)

Coronavirus impact on Brands Trust (and our contribution)

Our contribution:
TrustedOut in April. Free.

For brands (US or France)

If you are the manager of a recognized brand in the US or France, contact us and we’ll help you build the perfect whitelist for your next ad campaign.

For analysts (US or France)

If you are an analyst for a solid business in the US or France, contact us and we’ll help you create the perfect corpus so you can trust the sources and, thus, trust your decisions.

Interested? Reach out here:

Coronavirus impact on Brands Trust (and our contribution)

“It takes 20 years to build a reputation and 5 minutes to ruin it…

… if you think about that, you’ll do things differently” – Warren Buffett

That was a post we wrote in June 2019.

“It takes 20 years to build a reputation and 5 minutes to ruin it…

Brands behaviors on Coronavirus impact their trustworthiness.

We highly recommend reading the much-awaited 2020 Edelman Trust Barometer. This post is about our takeaways, focused on the vital trust in Brands.

 

 

Solve, don’t sell.

Out of Edelman’s 5 recommendations, the 3rd one, Solve, don’t sell, resonates the most with us.

Brands vs Gov.: Critical. Faster and more effective.

More than ever, Trust in a brand is very meaningful, if not emotional. People believe Brands are playing a critical role in addressing the crisis and believe Brands respond faster and more effectively than the government.

Brands to help gov. (not the other way around)

Brands to be a reliable news source.

Remember this recent post? Coronavirus amplifies the gap in trust showing, and it’s no news, distrust in News Media.

Coronavirus amplifies the gap in trust.

Well, Trust in Brands also means serving as a reliable source:

Think People, not Business.

Under attack, and especially one of this magnitude, people matter. All of them, inside and outside your brand reach.

And, of course, what you produce must help with its purpose and be affordable.

Now is not the time for wild creativity in product launches and commercial campaigns.

Don’t hide, Be visible. Reassure. Strengthen your brand trust.

Do not stop communicating. On the contrary, reassure people about your brand values and apply the advice above.

To apply this advice to ourselves:

Our contribution:
TrustedOut in April. Free.

For brands (US or France)

If you are the manager of a recognized brand in the US or France, contact us and we’ll help you build the perfect whitelist for your next ad campaign.

For analysts (US or France)

If you are an analyst for a solid business in the US or France, contact us and we’ll help you create the perfect corpus so you can trust the sources and, thus, trust your decisions.

Interested? Reach out here:

Coronavirus impact on the News Industry (and Journalism)

Early 2019. Saving Journalism. Already.

In early 2019, we wrote this post about the need to save Journalism:

Saving journalism. [updated 2/19/19]

Journalism, the cure to media distrust.

As we wrote previously, quality journalism that respects privacy and transparency and delivers the brand values of the media it is produced for is the solution to the current distrust, which leads to misinformation and, ultimately, to violence.

Coronavirus Update:
Journalism has ‘never been more important than it is right now’

We could not agree more with Facebook COO Sheryl Sandberg:

“Journalism is hugely important all the time, but it’s probably never been more important than it is right now, when people really need critical information about what’s going on,” said Sandberg. “That means that those businesses have to be supported. So we’re doing it by spending money and we’re doing it by granting money.”

And Facebook is doing it, as promised. Facebook spending $100 million to help news outlets in coronavirus crisis

News industry braces for major layoffs, pay cuts

Why it matters: Local news was already facing dire strains in the United States. The coronavirus and a pending recession could push the industry into near collapse at a time when people need access to local news and information more than ever before.

More impacted.

While tech giants like Google, Facebook and others are expected to lose billions of advertising dollars this year thanks to economic disruptions caused by the coronavirus pandemic, the losses aren’t expected to cripple these companies.

The bottom line: “With the rapid drop in general ad revenue, overall revenue is dropping fast while costs and need are rising. This is on top of decades of financial stress. There is no fat in the system,” says Chavern.

Questions? Let us know!

Coronavirus amplifies the gap in trust.

Remember the split in trust in Media between Republicans and Democrats?

Trust in Media: GOP 33%, Dem. 66%

Source: Edelman

We wrote this:

Impact on brands? Trust in Business vs Media likely tells political orientation.

Coronavirus update: GOP 25% (-8%), Dems 61% (-5%)

We highly recommend reading this article from Axios

“Americans are largely approving of how U.S. institutions and leaders are responding to the coronavirus situation.

Hospitals are held in the highest regard during this health crisis, consistent with the high trust and ethical ratings medical and health workers receive in normal times.”

Appraisals evolve over time. So do our classifications.

As we wrote here, our classifications, operated by Artificial Intelligence, are the only ones to capture these changes and update over time:

AI-powered classifications vs Keywords. Part 2/2: Evolution over time.

Why it matters to Brands?

As we wrote, Advertisers have a major role in our society.

Internet has a major role on our society.
Advertising has a major role on the Internet

In other words, Advertisers’ money fuels websites that have a major role in our society. We already know the impact on elections, but is it limited to that?

Consumers believe brands intentionally place ads

Brands must be careful with where their ads appear as consumers associate the brand with the content around.

Not only for Brand Safety violations but for Political orientations.

“Whitelisting, which only allows ads to be placed in approved environments, may in fact be the best brand safety insurance.” – AdWeek

We have the solution:

Contact us:

Introducing TrustedOut Kiosks

In these challenging times, we wanted to offer a way to help and came up with the TrustedOut Kiosks.

Free, Simple, Focused.

Super simple: select a subject, download the file (OPML), import it into an RSS Reader and enjoy the latest news. Get more info at the end of this post.

Car Racing

US:

Country: US, Language: English, Classification: People > Sports > Car Racing, Level: Specialized and above, Spotted as: No Political orientation, Not toxic and Sources is not Media’s Twitter account.

13 Media and 26 Sources representing an average of 113 new articles daily (as of 3/22/20).

Download Kiosk for the US

France:

Country: France, Language: French, Classification: People > Sports > Car Racing, Level: Specialized and above, Spotted as: No Political orientation, Not toxic and Sources is not Media’s Twitter account.

10 Media and 15 Sources representing an average of 100 new articles daily (as of 3/22/20).

Download Kiosk for France

Video Games

US:

Country: US, Language: English, Classification: General > Tech > Video Games, Level: Specialized and above, Spotted as: No Political orientation, Not toxic and Sources is not Media’s Twitter account.

37 Media and 55 Sources representing an average of 507 new articles daily (as of 3/22/20).

Download Kiosk for the US

France:

Country: France, Language: French, Classification: General > Tech > Video Games, Level: Specialized and above, Spotted as: No Political orientation, Not toxic and Sources is not Media’s Twitter account.

33 Media and 61 Sources representing an average of 341 new articles daily (as of 3/22/20).

Download Kiosk for France

Fashion

US:

Country: US, Language: English, Classification: People > Lifestyle > Fashion, Level: Specialized and above, Spotted as: No Political orientation, Not toxic and Sources is not Media’s Twitter account.

39 Media and 49 Sources representing an average of 220 new articles daily (as of 3/22/20).

Download Kiosk for the US

France:

Country: France, Language: French, Classification: People > Lifestyle > Fashion, Level: Specialized and above, Spotted as: No Political orientation, Not toxic and Sources is not Media’s Twitter account.

13 Media and 27 Sources representing an average of 215 new articles daily (as of 3/22/20).

Download Kiosk for France

Investment & Stock Market

US:

Country: US, Language: English, Classification: General > Finance > Investment + Stock Market, Level: Specialized and above, Spotted as: No Political orientation, Not toxic and Sources is not Media’s Twitter account.

52 Media and 124 Sources representing an average of about 4,000 new articles daily (as of 3/22/20).

Download Kiosk for the US

France:

Country: France, Language: French, Classification: General > Finance > Investment + Stock Market, Level: Specialized and above, Spotted as: No Political orientation, Not toxic and Sources is not Media’s Twitter account.

15 Media and 28 Sources representing an average of 495 new articles daily (as of 3/22/20).

Download Kiosk for France

TV & Video

US:

Country: US, Language: English, Classification: People > Entertainment and Leisure > TV and Video and WebTV, Level: Specialized and above, Spotted as: No Political orientation, Not toxic and Sources is not Media’s Twitter account.

16 Media and 21 Sources representing an average of 111 new articles daily (as of 3/22/20).

Download Kiosk for the US

France:

Country: France, Language: French, Classification: People > Entertainment and Leisure > TV and Video and WebTV, Level: Specialized and above, Spotted as: No Political orientation, Not toxic and Sources is not Media’s Twitter account.

9 Media and 12 Sources representing an average of 58 new articles daily (as of 3/22/20).

Download Kiosk for France

Enjoy your Kiosk in your favorite RSS Reader

Find the OPML Import in your Reader. For example:

In Inoreader, click on your icon on the upper right corner and select Preferences, then Import/Export/Backup and Import the OPML file from the Kiosk above.

In Netvibes, click on Add > New > Reader app > Import OPML

In Feedly, click on your icon (often in lower left corner) > Organize Sources > Import OPML

etc…
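If you are curious about what a Kiosk file contains before importing it, an OPML file is plain XML and can be inspected with a few lines of Python. This is a minimal sketch using only the standard library; the sample document below is made up, and the nesting of feeds under a media entry is an assumption about how a Kiosk file might be organized.

```python
import xml.etree.ElementTree as ET

# Tiny made-up OPML document; a real Kiosk file would list actual feeds.
OPML = """<opml version="2.0">
  <head><title>Example Kiosk</title></head>
  <body>
    <outline text="Example Media">
      <outline type="rss" text="Example Feed A" xmlUrl="https://example.com/a.xml"/>
      <outline type="rss" text="Example Feed B" xmlUrl="https://example.com/b.xml"/>
    </outline>
  </body>
</opml>"""

def feed_urls(opml_text):
    """Collect the xmlUrl attribute of every outline element, at any nesting depth."""
    root = ET.fromstring(opml_text)
    return [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]

print(feed_urls(OPML))
```

To inspect a downloaded file instead, read it from disk and pass its text to feed_urls.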

Hope you’ll find this useful.

Need another Kiosk Subject? Let us know.