Presentation delivered on 12.04.19, at April's edition of Brighton SEO. Contains an introduction, basic and more advanced uses of Chrome Puppeteer and Headless Chrome, and how you can use them to monitor your site!
Like many of us, I'm constantly trying to find new ways to make my (and my team's) job easier.
So this awesome guy - Eric Bidelman - is a software engineer at Google, and works on Headless Chrome, Lighthouse and DevTools.
I can use Chrome Puppeteer to help me with my job.
So I went away and did a shit ton of research that is worth sharing.
So, the first thing I was looking for was a basic definition.
Contrary to what I wanted to believe, it did not involve any decapitation.
So when you open up Google Chrome normally, you get a wonderful User Interface with bookmarks
And a search bar, plugins, buttons, tabs
And usable functionality.
With headless chrome, you get none of that shit.
So here I am running headless chrome
And we can see that it is in the background, but I have no Chrome windows open.
So Google Chrome is Running, but with NO User Interface.
So it is running without the UX/UI 'head'.
Why should you even care about this sort of stuff though?
Through this research journey, I found out that you can do a bunch of stuff with it!
Scrape the hell out of JavaScript websites (as well as basic HTML scraping)
You can copy the DOM and paste it into a text file, with which you can
Compare the site's source code against the DOM, and then export the differences. This can allow you to identify any potential rendering issues.
You can use it to generate screenshots of pages,
And effectively crawl single page applications.
JS can be a bit of a pain to work with, but unfortunately, it is not going away!
So Screaming Frog (and the majority of crawling software) utilise something like Headless Chrome to emulate a browser and provide JS rendering features.
And we all know about issues that Google can have with crawling JS, ranging from having slight issues with rendering, to completely drawing a blank.
So there have been a bunch of JS indexing and rendering case studies over the past couple of years.
So it can help you crawl these JS-heavy sites.
We can also use Headless Chrome to automate web page checks, and I provide an in-depth investigation into this later in this deck.
AND it can be used for general webpage testing. Including clicking on stuff, filling in forms, general fuckery with the mouse and keyboard.
It is really good for emulating user behaviour. So great for pretending to be a user, and browsing around a site.
So it is basically really great for seeing exactly how much shit a website can take before it breaks!
However, the problem with running all of these tasks is:
You have to run basic headless chrome through the command line interface
So first you gotta install some dependencies, and have a shit ton of errors hit you in the face, and you gotta know where chrome is stored on your local machine...
Then you gotta run directly from that location
Then specify that Chrome should launch headless
Then open a port to use
Then you gotta disable GPU
Then you can add a single URL, or a URL list into the command line
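Pieced together, that raw CLI workflow looks roughly like this. A sketch only: the Chrome path below is the default macOS install location, which is an assumption - yours may differ.

```shell
# Default macOS install location - an assumption; find where Chrome lives on YOUR machine
CHROME="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"

# Launch headless, disable the GPU, open a debugging port, and dump the rendered DOM for one URL
"$CHROME" --headless --disable-gpu --remote-debugging-port=9222 --dump-dom https://example.com
```

Swap `--dump-dom` for `--screenshot` to capture an image instead of printing the DOM.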
Now then
I really, really love using the command line.
In fact, so much so that I spoke about it at Brighton last year.
But doing all of this really made me wanna cry.
So how do I make utilising Headless Chrome - which is freaking awesome - easy?
Like I said a few minutes ago, I’m always trying to find ways to make my job easier
And doing all of these boring-ass steps was really not easy. At all.
So I went away and did an even bigger shit ton of research.
So, in this talk at Google I/O, Eric mentions something called Chrome Puppeteer (shoutout Eric!).
So what is Chrome Puppeteer?
Doing a simple Google Search for Chrome Puppeteer reveals all.
But the stuff I'm interested in is this: a Node library, and...
Ooooh, an API!
So Node - for those who do not have dev experience - can be used for making some pretty kick-ass applications.
It can also be used to help control headless chrome in an easy to digest and utilise package
So how can you actually get Chrome Puppeteer?
If you want to run tests on your local machine, you have to install a few things first.
Node.js, which is a runtime environment, and npm, which is a package manager for Node.
Chill out though, it’s fairly straightforward
Someone made this easy a while ago.
So if you are on a PC, it's fairly simple to get and install;
You've just got to install these things from the Node.js website.
I’ve linked to a guide here - that takes you through step by step.
If, like me, you are on a Mac
It's not that easy.
There’s a wicked awesome guide here that takes you through step by step what you need to do.
So you wanna start off by opening up terminal
And then typing in a few lines of shit
This installs Homebrew, which makes everything even easier.
So when Homebrew is downloaded - it shouldn't take too long, five minutes max -
You have to install two more things and we'll be ready to rock. These are node and npm.
So just type in this. It installs Node through Homebrew, directly onto your machine with no fuckery.
So this installs Node and npm; you'll get a nice progress bar telling you how far along it is.
Then you want to use npm to install the latest version of Puppeteer.
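On a Mac, once Homebrew is in place, the whole install boils down to a couple of commands (Homebrew's own install one-liner is on brew.sh, so I've left it out here):

```shell
# Install Node (which bundles npm) via Homebrew
brew install node

# Then pull in the latest Puppeteer from npm - run this inside your project folder
npm install puppeteer
```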
Now that’s it, you are all good and groovy!
So, for example, if I wanted to take a screenshot of a single page...
So just type in this, and you should be good to go.
You'll need to code some stuff up - but I've put everything together into a single Google Doc that makes it simple and easy to understand what each bit does. I'm going to walk through it now.
So we are starting up a headless browser, in true headless mode, so you won't see what goes on (it's running in the background).
And then we are opening up a new tab/page
And then we specify exactly what URL we want to go to. So in this instance, we are testing the BlueArray homepage.
Then we are taking a screenshot. We have to specify 2 things to allow the code to work correctly
So the path, so where and what we want the file to be saved as
And then saving as a specific filetype. You can play around with this and pick the filetype that works best for you.
And then we close the page, and then close the browser.
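Put together, the script I've just described looks roughly like this - a sketch, assuming Puppeteer is installed; the function name and the BlueArray URL in the comment are mine, not the deck's exact code:

```javascript
// screenshot.js - a minimal sketch of the steps above.
// Puppeteer is required inside the function so the file loads even before `npm install puppeteer`.
async function takeScreenshot(url, path) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true }); // true headless: runs in the background
  const page = await browser.newPage();                       // open a new tab/page
  await page.goto(url);                                       // go to the URL we want to test
  await page.screenshot({ path, type: 'png' });               // where to save it, and as what filetype
  await page.close();                                         // close the page...
  await browser.close();                                      // ...then the browser
}

// Run with: node screenshot.js
// takeScreenshot('https://www.bluearray.co.uk/', 'bluearray.png');
```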
Go to terminal, make sure you are in the same folder as your code, and type in
node screenshot.js
And then a couple of seconds later, you’ll see
A nice screenshot get added to your folder with your code in
If you wanted to see the browser test this exactly for you,
Just change the headless mode to false. This is great for seeing exactly what the browser sees, and looks pretty cool, having a chrome window doing all sorts of shit in front of you!
You can also modify the script slightly to run through a list of provided URLs
And then get a bunch of screenshots!
Now I’m sure that you guys can see where this is headed
Faking Googlebot and seeing what it would see.
So with a few little tweaks to the code that we have for the first example
Adding in a user agent string, and setting it to what Googlebot uses.
FYI, the Googlebot user agent string is not just 'Googlebot' - it is fucking massive
And wouldn’t fit on the slide
node screenshot.js - screenshot.js is the name of the file.
Using the await page.setViewport option,
So we have to specify the width and the height of the viewport that we want to use
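With those two tweaks, the relevant lines look something like this. The UA string below is the smartphone Googlebot string as Google documented it around 2019 (note the Chrome/41 in the middle), and the viewport numbers are just a plausible mobile size - both are assumptions to adjust as needed:

```javascript
// Googlebot smartphone user agent, circa 2019 - too long to fit on a slide!
const GOOGLEBOT_UA =
  'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) ' +
  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 ' +
  '(compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

async function screenshotAsGooglebot(url, path) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.setUserAgent(GOOGLEBOT_UA);               // pretend to be Googlebot
  await page.setViewport({ width: 412, height: 732 }); // width and height of the viewport
  await page.goto(url);
  await page.screenshot({ path });
  await browser.close();
}
```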
This isn't really Googlebot, just a decent attempt at emulation,
As, unfortunately,
Puppeteer was launched way after Chrome 41, so we cannot tell it to use that version of Chrome :'(
However
This can be persuasive in getting a client to ensure that their content is rendered server-side, as opposed to client-side, if needed.
We can then provide a list of URLs that we want to get screenshotted
And show how they would appear to Google through puppeteer rendering, instead of
In the case of some rather shit JS sites:
Absolutely nothing - a blank page.
Which is pretty cool, and allows for bulk page testing
But the really cool stuff is yet to come!
So who here has heard of, or even used, ContentKing?
It’s a fairly awesome piece of software
That allows you to monitor a site in real time (ish),
With it alerting you of any issues such as
Meta data changes, New pages that 404, Updated links, redirects, indexable and non-indexable pages….
However!
Like most really good tools, it costs money.
Maybe you don't wanna eat into your budget for ContentKing for a personal project site, or you don't need the level of detail those guys provide for a smaller, shitter site?
This next example shows how we can use Puppeteer to
Monitor a chosen site whenever you want, and report on any changes to key areas,
Including:
Meta title changes
Meta description updates
Any increase or decrease in the word count of the page.
Any changes to robots directives, with differences between them highlighted
Any differences in canonical elements
So basically the really important shit from an HTML webpage.
So I wrote some code
So I’ll be tweeting this out after for those who are interested..
As with all coding, this required a bit of research
Ahem stackoverflow ahem
And with a little bit of luck
We now have a way to monitor these basic areas for web pages
This is how it works
There are about 200 lines of code in total.
Here's a small snapshot.
And I don't have time to go through the full thing today,
but
There are a few really interesting snippets that I’d really like to share, that can come in handy
So we launch headless chrome as highlighted a few minutes ago
Like so. So we launch the browser, and then create a new page within the browser, awaiting further instruction...
And then we provide a list of URLs for Puppeteer to go and fuck around with
So here we are referencing the file that we will use for this program; we parse (or read) it using a couple more lines that don't really look that exciting!
And then we pull in the relevant metadata that I mentioned.
So, for example, I'm going to show you how we pull in meta titles.
So we are just pulling the title from the page. If there isn't one we'd get an error, so we add in a fallback - 'n/a'.
And then create an array of all the meta data - so a nice, formatted list of data that we can use later on within the script
So this just tells the script to treat all this data as one line, that we can then refer back to later
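A sketch of those two steps - the fallback title pull and the one-line-per-URL formatting. The selector, field names and tab separator are my assumptions, not the deck's 200-line original:

```javascript
// Pull the title, falling back to 'n/a' so a missing tag doesn't throw an error
async function getTitle(page) {
  try {
    return await page.$eval('title', (el) => el.textContent);
  } catch (err) {
    return 'n/a'; // no <title> on the page
  }
}

// Flatten all the metadata for one URL into a single tab-separated line,
// so each run can be saved and compared field-by-field later
function formatRow(meta) {
  return [meta.url, meta.title, meta.description, meta.robots, meta.canonical, meta.wordCount].join('\t');
}
```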
And we then push all this data to a text file.
The script then loops through every URL that is provided, pulling out all the data for each.
It then checks for differences in the data - so compares this run with the previous one.
If there are any differences between the two sets of data, these get saved within a changes.txt file
That I can then check whenever,
So I can see what has changed from yesterday, or whenever I last ran the code
This required me to run the code manually each day,
Which I completely forgot to do.
So, I went one step further, to make my life even easier
Chucked the code on a Raspberry Pi
And set up a cron job within my local machine to automatically run the script at the same time
Every day
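The crontab entry for that daily run looks something like this - the script path, log path and 7am time are all assumptions (edit yours with `crontab -e`):

```shell
# m h dom mon dow  command
0 7 * * * /usr/bin/node /home/pi/monitor.js >> /home/pi/monitor.log 2>&1
```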
And then
This was the bit that took the longest, by far.