So you think you can crawl?
Stretching the Boundaries of SharePoint 2013!
Petter Skodvin-Hvammen
AD-Gruppen, Norway
Who am I?
Petter Skodvin-Hvammen
Oseberg ship - Discovered 1904 in Tønsberg, Norway. Buried by Vikings in 834 AD
• Solutions Architect
• SharePoint Consultant
• Search Enthusiast
• Community Lead
@pettersh - psh@adgruppen.no
www.adgruppen.no
Enterprise Search
• Index thousands of sources
• Automate index management
• Infrastructure sizing
Challenges and Solutions
Not included: code/scripts, user experience, relevancy, governance
www.sharepointeurope.com
Enterprise Search using SharePoint Server 2013
• 30,000 users
• 85 locations in 30 countries
• 15,000 daily searches
• 100,000,000 documents(?)
• 60 core systems, 2,000 applications
The Mission…
What do we index?
• 100,000,000 documents
• 3,000 file shares
• 500 servers
Where is the data?
• Datacenters
• Time zones
• Bandwidth
How can we get it?
• Limit bandwidth usage for specific server locations
• Limit crawler impact within local business hours
• Grant read access to crawler per file share
• Avoid token bloat issues with more than 1,015* groups per account
* http://blogs.technet.com/b/shanecothran/archive/2010/07/16/maxtokensize-and-kerberos-token-bloat.aspx
How do we operate it?
• File shares are created, changed, and deleted every day using a custom self-service solution
• File shares are moved between servers every day by automation rules
• Manage indexing and crawling of each file share with minimum manual effort
What can SharePoint do?
• Max 50 content sources per service application
– Max 500 with October 2013 CU installed
• Max 100 start addresses per content source
– Max 500 with October 2013 CU installed
• Max 20 concurrent crawls per service application
– Limitation has been removed
http://technet.microsoft.com/en-us/library/cc262787(v=office.15).aspx#Search
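A quick arithmetic check of these limits against the 3,000 file shares mentioned earlier, as a minimal sketch. The function name and the assumption of one start address per file share are illustrative and not taken from the deck.

```python
import math

def content_sources_needed(start_addresses: int, max_addresses_per_source: int) -> int:
    """Smallest number of content sources that can hold all start addresses."""
    return math.ceil(start_addresses / max_addresses_per_source)

# Limits quoted on the slide: (max content sources, max start addresses per source).
LIMITS = {
    "before October 2013 CU": (50, 100),
    "with October 2013 CU": (500, 500),
}

FILE_SHARES = 3_000  # assumption: one start address per file share

for name, (max_sources, max_addresses) in LIMITS.items():
    needed = content_sources_needed(FILE_SHARES, max_addresses)
    total_capacity = max_sources * max_addresses
    print(f"{name}: {needed} content sources needed (limit {max_sources}), "
          f"total start address capacity {total_capacity}")
```

Under that assumption the shares fit numerically, but every share that is created, moved, or deleted would still mean editing a content source by hand.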
It’s complicated
• More data than we have space for
• It’s located all over the place
• Everything changes all of the time
• There are limitations in SharePoint
• Someone’s gotta maintain this
• It has to be secure and relevant
What did we do?
• Created logical groups of file shares
• Used symbolic linking
=> fewer content sources
File shares: \\file01\share01, \\file02\share03, \\file03\share03
Symbolic links: \\file00\share\sym01, \\file00\share\sym02, \\file00\share\sym03
Start address: \\file00\share
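The linking idea can be sketched as follows: many shares are exposed as directory links under one folder, and that single folder becomes the start address. This is a hedged illustration only. The function name, the `symNN` naming scheme, and the pairing of shares to links are assumptions, and the code only plans the links rather than creating them.

```python
def plan_symlinks(root: str, shares: list[str]) -> dict[str, str]:
    """Map each planned link path under `root` to the UNC share it should point to."""
    base = root.rstrip("\\")
    return {
        f"{base}\\sym{i:02d}": share  # hypothetical naming scheme
        for i, share in enumerate(shares, start=1)
    }

plan = plan_symlinks(
    r"\\file00\share",
    [r"\\file01\share01", r"\\file02\share03", r"\\file03\share03"],
)

for link, target in plan.items():
    # On Windows, a directory symbolic link could then be created with:
    #   mklink /D <link> <target>
    print(f"{link} -> {target}")
```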
What did we do?
• Grouped file shares based on region
• One content source per region
• Incremental crawls every night
=> crawling based on time zones
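A minimal sketch of why one content source per region helps with scheduling: each region's nightly crawl can start at the same local time, which maps to a different UTC time per region. The region offsets and the 22:00 start are assumptions for illustration, and fixed offsets ignore daylight saving time.

```python
from datetime import date, datetime, time, timedelta, timezone

# Hypothetical fixed UTC offsets per region (a real setup would use proper time zones).
REGION_UTC_OFFSET_HOURS = {"europe": 1, "asia": 8}

def nightly_start_utc(region: str, day: date, local_start: time = time(22, 0)) -> datetime:
    """UTC time at which the nightly crawl for `region` starts on `day`."""
    tz = timezone(timedelta(hours=REGION_UTC_OFFSET_HOURS[region]))
    local = datetime.combine(day, local_start, tzinfo=tz)
    return local.astimezone(timezone.utc)

for region in REGION_UTC_OFFSET_HOURS:
    print(region, nightly_start_utc(region, date(2014, 5, 5)).isoformat())
```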
What did we do?
• Created a DNS alias per impact rule in etc/hosts on the crawl servers
=> reduced crawler impact
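Crawler impact rules are matched on host name, so giving one server several aliases lets different rules apply to it. The sketch below renders hosts-file lines for that. The IP address is a placeholder, the function name is hypothetical, and the alias names `default` and `wait-60` are borrowed from the start addresses shown later in the deck.

```python
def hosts_entries(server_ip: str, impact_rule_aliases: list[str]) -> list[str]:
    """One hosts-file line per alias, all resolving to the same server."""
    return [f"{server_ip}\t{alias}" for alias in impact_rule_aliases]

# 10.0.0.10 is a placeholder address for the server holding the symbolic links.
for line in hosts_entries("10.0.0.10", ["default", "wait-60"]):
    print(line)
```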
What did we do?
• Granted file share access to the crawl account that is a member of the fewest groups
• Monitored group memberships
• Grouped file shares by crawl account
• Crawl rules matched folder structure
=> managed pool of crawl accounts
Crawl rules (path, type, account):
file://.*/spcrwl01/.* | Include | SP\spcrwl01
file://.*/spcrwl02/.* | Include | SP\spcrwl02
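The account selection described above can be sketched as picking the pool member with the fewest group memberships that still stays within the 1,015-group threshold quoted earlier. The function name, the sample counts, and the error handling are assumptions, not the deck's implementation.

```python
MAX_GROUPS = 1015  # threshold quoted earlier in the deck

def pick_crawl_account(group_counts: dict[str, int], groups_added: int = 1) -> str:
    """Return the crawl account with the fewest groups that can take `groups_added` more."""
    candidates = {
        account: count
        for account, count in group_counts.items()
        if count + groups_added <= MAX_GROUPS
    }
    if not candidates:
        raise RuntimeError("all crawl accounts are at the group limit; add a new account")
    # Break ties on account name so the choice is deterministic.
    return min(candidates, key=lambda account: (candidates[account], account))

print(pick_crawl_account({"spcrwl01": 900, "spcrwl02": 450}))
```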
The bigger picture
• Folder structure: <content source>/<crawler impact>/<crawl account>/<symbolic link>
• Start addresses: file://<crawler impact>/<content source>/<crawler impact>
Source | Start address | Folder | Crawl rule | Impact rule
Europe | file://default/europe/default | europe/default/spcrwl01 | file://.*/spcrwl01/.* | Default
Europe | file://default/europe/default | europe/default/spcrwl02 | file://.*/spcrwl02/.* | Default
Europe | file://wait-60/europe/wait-60 | europe/wait-60/spcrwl01 | file://.*/spcrwl01/.* | Wait-60
Europe | file://wait-60/europe/wait-60 | europe/wait-60/spcrwl02 | file://.*/spcrwl02/.* | Wait-60
Asia | file://default/asia/default | asia/default/spcrwl01 | file://.*/spcrwl01/.* | Default
Asia | file://default/asia/default | asia/default/spcrwl02 | file://.*/spcrwl02/.* | Default
Asia | file://wait-60/asia/wait-60 | asia/wait-60/spcrwl01 | file://.*/spcrwl01/.* | Wait-60
Asia | file://wait-60/asia/wait-60 | asia/wait-60/spcrwl02 | file://.*/spcrwl02/.* | Wait-60
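The naming convention above is mechanical enough to derive from four values. A minimal sketch, where the class and property names are hypothetical and only the three patterns come from the slide:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ShareMapping:
    content_source: str   # e.g. "europe"
    crawler_impact: str   # e.g. "wait-60"
    crawl_account: str    # e.g. "spcrwl01"
    symlink: str          # name of the symbolic link for one file share

    @property
    def folder(self) -> str:
        """<content source>/<crawler impact>/<crawl account>/<symbolic link>"""
        return f"{self.content_source}/{self.crawler_impact}/{self.crawl_account}/{self.symlink}"

    @property
    def start_address(self) -> str:
        """file://<crawler impact>/<content source>/<crawler impact>"""
        return f"file://{self.crawler_impact}/{self.content_source}/{self.crawler_impact}"

    @property
    def crawl_rule(self) -> str:
        """Include rule that selects everything under one crawl account's folder."""
        return f"file://.*/{self.crawl_account}/.*"

m = ShareMapping("europe", "wait-60", "spcrwl01", "e5dc12a41d")
print(m.folder)
print(m.start_address)
print(m.crawl_rule)
```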
How did we manage this?
• Self-service portal for enabling indexing of file shares
• Custom web service integration in the self-service portal
• Custom solution for granting access to crawl accounts
• Custom timer job to get the list of file shares to crawl from the self-service portal
• Custom timer job for creating and removing symbolic links
• Custom lists for mapping server to content source, schedule, and impact; shares to crawl accounts and metadata; UNC to symlink
• Content enrichment service for replacing symlinks in paths with actual file paths
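The last item, rewriting a crawled symlink path back to the real file path, can be sketched as a prefix lookup. This shows only the path logic, not the content enrichment web service contract. The function name, the URL form of the paths, and the sample mapping are assumptions.

```python
def resolve_symlink_path(path: str, symlink_to_unc: dict[str, str]) -> str:
    """Replace a known symlink prefix in `path` with the real share location."""
    lowered = path.lower()
    # Try the longest prefix first so nested links resolve to the most specific share.
    for link in sorted(symlink_to_unc, key=len, reverse=True):
        prefix = link.lower()
        if lowered == prefix or lowered.startswith(prefix + "/"):
            return symlink_to_unc[link] + path[len(link):]
    return path  # not under any known symlink: leave unchanged

# Hypothetical mapping, as it might be read from the "UNC to symlink" list.
MAPPING = {
    "file://default/europe/default/spcrwl01/e5dc12a41d": "file://file01/share01",
}

print(resolve_symlink_path(
    "file://default/europe/default/spcrwl01/e5dc12a41d/docs/report.docx", MAPPING))
```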
Example: Self Service Portal
Title: European SharePoint Conference
Owner: Petter Skodvin-Hvammen
Business Area: Consulting
Classification: Internal
Type: Project
UNC Path: Assigned automatically
Crawl Account: Assigned automatically
Cancel | Save
Example: Custom Lists
Title: European SharePoint Conference
Owner: Petter Skodvin-Hvammen
Business Area: Consulting
Classification: Internal
Type: Project
UNC Path: \\file01\share01
Crawl Account: SP\spcrawl01
Symlink: \\default\europe\default\spcrwl01\e5dc12a41d
Location: europe (server file01 is located in Oslo DC)
Bandwidth: 5 Mbps
[Diagram: search topology. Components shown: WFE and Query servers; index partitions Index-0 to Index-3, each shown twice; servers running Doc Proc and Enrichment; Crawling; Analytics; Admin; Central Admin; Caching. Labels: 40 million documents, 10 queries/second.]
SQL Server
• Admin DB
• Analytics DB
• Crawl DB
• Link DB
• Other SP DBs
Capacity testing
Purpose
• Crawling of symbolic links
• Scaling of virtual machines
• Sizing of disk space
• Verify Microsoft’s advice
Approach
• 4-server farm with 2 partitions
• 8 vCPU, 16 GB RAM, 850 GB data volume
• Crawl 10 file shares (3.7M files)
• Replay top 300 queries
• Apache JMeter
Capacity testing – findings
• Crawl rate declined 1% per million items indexed
• Query latency increased exponentially from 12 million items indexed per partition
• Database latency was insignificant during crawling
• Successfully crawled file shares via symbolic directory links
• Disk space usage was significantly lower than expected
– Reduced data volume from 850 GB to 450 GB
– 40+ servers => huge cost savings
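One way to read the latency finding is as a per-partition item ceiling. A minimal sketch, assuming the 12 million figure from this test farm is used as a hard limit (the deck does not say it was applied this way):

```python
import math

def partitions_needed(total_items: int, max_items_per_partition: int = 12_000_000) -> int:
    """Index partitions required to keep every partition at or below the ceiling."""
    return math.ceil(total_items / max_items_per_partition)

print(partitions_needed(40_000_000))    # the 40 million documents from the topology diagram
print(partitions_needed(100_000_000))   # the 100,000,000 documents from the mission slide
```

For 40 million documents this gives four partitions, which matches the Index-0 to Index-3 partitions in the topology diagram.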
Infrastructure – VM sizing
Dedicated ESX Cluster
• 14 x VM for SharePoint 2013
– 4 physical machines
– 4 x 32 = 128 CPUs
– 4 x 256 = 1024 GB memory
• HA max utilization = ¾
– 3 x 32 = 96 CPUs
– 3 x 256 = 768 GB memory
• CPU and memory can be overcommitted
• CPU overcommitted by a factor of 1.34 (1.78 if one physical host fails)
• VMs must wait for physical CPU
– Wait time for 8 CPUs = 2 x wait time for 4 CPUs
• Mitigation:
a) Reduce allocated virtual CPU, or
b) Increase physical CPU
• Memory factor 0.44 (0.59)
• Reserved and locked memory prevents HA failover
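The overcommit factors are allocated capacity divided by physical capacity, with four hosts normally and three after a host failure. In the sketch below the allocated totals (171 vCPUs, 452 GB) are not from the deck. They are back-calculated guesses chosen so that the ratios reproduce the quoted factors.

```python
def overcommit_factor(allocated: float, hosts: int, capacity_per_host: float) -> float:
    """Ratio of capacity allocated to VMs over physical capacity of the cluster."""
    return allocated / (hosts * capacity_per_host)

ALLOCATED_VCPUS = 171      # assumption, back-calculated from the quoted 1.34 / 1.78
ALLOCATED_MEMORY_GB = 452  # assumption, back-calculated from the quoted 0.44 / 0.59

for hosts in (4, 3):  # all hosts up, then one host failed
    cpu = overcommit_factor(ALLOCATED_VCPUS, hosts, 32)
    memory = overcommit_factor(ALLOCATED_MEMORY_GB, hosts, 256)
    print(f"{hosts} hosts: CPU {cpu:.2f}, memory {memory:.2f}")
```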
Infrastructure – VM tuning
DC | Role | vCPU | Peak | Average | Calculated | Recommended | Change
A | Web, Query, Admin | 8 | 187.55 | 37.03 | 2 | 4 | -4
B | Web, Query, Admin | 8 | 621.88 | 92.69 | 8 | 8 | 0
A | Crawl, Analytics, Content, CEWS, Central Admin | 8 | 724.35 | 210.59 | 8 | 8 | 0
B | Crawl, Analytics, Content, CEWS, Symbolic Links | 8 | 724.56 | 198.44 | 8 | 8 | 0
A | Index 0, Content, CEWS | 8 | 486.18 | 62.55 | 6 | 6 | -2
B | Index 0, Content, CEWS | 8 | 520.63 | 63.98 | 6 | 6 | -2
A | Index 1, Content, CEWS | 8 | 547.08 | 69.3 | 6 | 6 | -2
B | Index 1, Content, CEWS | 8 | 546.44 | 91.74 | 6 | 6 | -2
A | Index 2, Content, CEWS | 8 | 491.38 | 65.6 | 6 | 6 | -2
B | Index 2, Content, CEWS | 8 | 532.01 | 77.83 | 6 | 6 | -2
A | Index 3, Content, CEWS | 8 | 540.45 | 78.72 | 6 | 6 | -2
B | Index 3, Content, CEWS | 8 | 621.88 | 92.69 | 8 | 8 | 0
A | Distributed Cache | 4 | 91.71 | 5.99 | 2 | 2 | -2
B | Distributed Cache* (added later) | - | - | - | - | - | -
Total | | 100 | | | 78 | 80 | -20
Peak and average CPU usage is calculated over 30 days
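The deck does not state how the Calculated column was derived. One rule that happens to reproduce every row is: treat the peak as a percentage of one core, round up to whole cores, then round up to an even number. The sketch below checks that inferred rule against the table. Both the rule and the percent-of-one-core reading are assumptions.

```python
import math

def calculated_vcpus(peak_cpu_percent: float) -> int:
    """Whole cores needed for the peak, rounded up to an even count (inferred rule)."""
    cores = math.ceil(peak_cpu_percent / 100)
    return cores + (cores % 2)

# (peak, calculated) pairs copied from the table, in row order.
ROWS = [
    (187.55, 2), (621.88, 8), (724.35, 8), (724.56, 8),
    (486.18, 6), (520.63, 6), (547.08, 6), (546.44, 6),
    (491.38, 6), (532.01, 6), (540.45, 6), (621.88, 8),
    (91.71, 2),
]

mismatches = [(peak, calc) for peak, calc in ROWS if calculated_vcpus(peak) != calc]
total = sum(calculated_vcpus(peak) for peak, _ in ROWS)
print(f"mismatches: {mismatches}, total calculated vCPUs: {total}")
```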
Summary
1. Indexing thousands of content sources
2. Automation for rapidly changing index requirements
3. Sizing the infrastructure for performance and HA
Questions?
petter.skodvin-hvammen@adgruppen.no
http://linkedin.com/in/petterskodvin
@pettersh

Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 

ESPC14 380 So you think you can crawl? Stretching the Boundaries of SharePoint 2013!

  • 1. So you think you can crawl? Stretching the Boundaries of SharePoint 2013! Petter Skodvin-Hvammen AD-Gruppen, Norway
  • 2. Who am I? Petter Skodvin-Hvammen Oseberg ship - Discovered 1904 in Tønsberg, Norway. Buried by Vikings in 834 AD • Solutions Architect • SharePoint Consultant • Search Enthusiast • Community Lead @pettersh - psh@adgruppen.no www.adgruppen.no
  • 3. Enterprise Search: Challenges and Solutions • Index thousands of sources • Automate index management • Infrastructure sizing • Not included: code/scripts, user experience, relevancy, governance www.sharepointeurope.com
  • 4. Enterprise Search using SharePoint Server 2013 • 30,000 users • 85 locations in 30 countries • 15,000 daily searches • 100,000,000 documents(?) • 60 core systems, 2,000 applications The Mission…
  • 5. What do we index? 100,000,000 documents 3,000 fileshares 500 servers
  • 6. Where is the data? • Datacenters • Time zones • Bandwidth
  • 7. How can we get it? • Limit bandwidth usage for specific server locations • Limit crawler impact within local business hours • Grant read access to crawler per file share • Avoid token bloat issues with more than 1,015* groups per account * http://blogs.technet.com/b/shanecothran/archive/2010/07/16/maxtokensize-and-kerberos-token-bloat.aspx
  • 8. How do we operate it? • File shares are created, changed, and deleted every day using a custom self-service solution • File shares are moved between servers every day by automation rules • Manage indexing and crawling of each file share with minimum manual effort
  • 9. What can SharePoint do? • Max 50 content sources per service application – Max 500 with October 2013 CU installed • Max 100 start addresses per content source – Max 500 with October 2013 CU installed • Max 20 concurrent crawls per service application – Limitation has been removed http://technet.microsoft.com/en-us/library/cc262787(v=office.15).aspx#Search
  • 10. It’s complicated • More data than we have space for • It’s located all over the place • Everything changes all of the time • There are limitations in SharePoint • Someone’s gotta maintain this • It has to be secure and relevant
  • 11. What did we do? Fewer content sources • Created logical groups of file shares • Used symbolic linking • Diagram: file shares \\file01\share01, \\file02\share03 and \\file03\share03 are linked as \\file00\share\sym01, \\file00\share\sym02 and \\file00\share\sym03 under the single start address \\file00\share
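The symbolic-link grouping on this slide can be sketched in a few lines of Python. This is only an illustration, not the presenter's implementation (the deck explicitly excludes code): the function names `plan_links` and `create_links` and the `symNN` naming scheme are assumptions modelled on the diagram labels.

```python
import os
from pathlib import Path


def plan_links(shares, link_root):
    """Map numbered link names under link_root to the shares they point at.

    The sym01, sym02, ... naming is an assumption taken from the slide diagram.
    """
    return {
        Path(link_root) / f"sym{i:02d}": Path(share)
        for i, share in enumerate(shares, start=1)
    }


def create_links(plan):
    """Create one directory symlink per planned entry, skipping existing links."""
    for link, target in plan.items():
        if not link.is_symlink():
            os.symlink(target, link, target_is_directory=True)
```

The crawler then needs only the link root as a start address, so many file shares collapse into one entry. On Windows, creating symbolic links normally requires elevated rights or Developer Mode.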
  • 12. What did we do? Crawling based on time zones • Grouped file shares based on region • One content source per region • Incremental crawls every night
  • 13. What did we do? Reduced crawler impact • Created a DNS alias per impact rule in etc/hosts on the crawl servers
  • 14. What did we do? Managed pool of crawl accounts • Granted file share access to the crawl account that is a member of the fewest groups • Monitored group memberships • Grouped file shares by crawl account • Crawl rules matched the folder structure: Include file://.*/spcrwl01/.* (SP\spcrwl01), Include file://.*/spcrwl02/.* (SP\spcrwl02)
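A minimal sketch of the account-pool idea, assuming group membership counts per crawl account are already known (for example from a directory query, which is not shown). The function name, the `groups_needed` parameter and the sample counts are hypothetical. The 1,015 limit is the figure cited earlier in the deck.

```python
MAX_GROUPS = 1015  # group limit per account cited earlier in the deck


def pick_crawl_account(group_counts, groups_needed=1, limit=MAX_GROUPS):
    """Return the pool account with the fewest groups that still has headroom.

    group_counts maps account name to its current number of group memberships.
    """
    candidates = {
        account: count
        for account, count in group_counts.items()
        if count + groups_needed <= limit
    }
    if not candidates:
        raise RuntimeError("no crawl account has room; add one to the pool")
    # Break ties on the account name so the choice is deterministic.
    return min(candidates, key=lambda account: (candidates[account], account))
```

For example, `pick_crawl_account({"spcrwl01": 900, "spcrwl02": 350})` selects `spcrwl02`, and an account already at the limit is never chosen.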
  • 15. The bigger picture • Folder structure: <content source>/<crawler impact>/<crawl account>/<symbolic link> • Start addresses: file://<crawler impact>/<content source>/<crawler impact>
      Source  Start address                  Folder                   Crawl rule             Impact rule
      Europe  file://default/europe/default  europe/default/spcrwl01  file://.*/spcrwl01/.*  Default
      Europe  file://default/europe/default  europe/default/spcrwl02  file://.*/spcrwl02/.*  Default
      Europe  file://wait-60/europe/wait-60  europe/wait-60/spcrwl01  file://.*/spcrwl01/.*  Wait-60
      Europe  file://wait-60/europe/wait-60  europe/wait-60/spcrwl02  file://.*/spcrwl02/.*  Wait-60
      Asia    file://default/asia/default    asia/default/spcrwl01    file://.*/spcrwl01/.*  Default
      Asia    file://default/asia/default    asia/default/spcrwl02    file://.*/spcrwl02/.*  Default
      Asia    file://wait-60/asia/wait-60    asia/wait-60/spcrwl01    file://.*/spcrwl01/.*  Wait-60
      Asia    file://wait-60/asia/wait-60    asia/wait-60/spcrwl02    file://.*/spcrwl02/.*  Wait-60
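The naming scheme on this slide is regular enough to generate. The sketch below rebuilds the slide's rows from the three inputs (content sources, crawler impact aliases, crawl accounts). The function names are hypothetical, and capitalising the alias to obtain the impact rule name is an assumption based on the values shown.

```python
from itertools import product


def start_address(source, impact):
    """Start address pattern: file://<crawler impact>/<content source>/<crawler impact>."""
    return f"file://{impact}/{source}/{impact}"


def folder(source, impact, account):
    """Folder pattern without the trailing symbolic link segment."""
    return f"{source}/{impact}/{account}"


def crawl_rule(account):
    """Include rule that matches every path below the account's folder."""
    return f"file://.*/{account}/.*"


def layout(sources, impacts, accounts):
    """One row per (source, impact, account) combination, in the slide's order."""
    return [
        {
            "source": source,
            "start_address": start_address(source, impact),
            "folder": folder(source, impact, account),
            "crawl_rule": crawl_rule(account),
            "impact_rule": impact.capitalize(),  # assumed naming, e.g. "Wait-60"
        }
        for source, impact, account in product(sources, impacts, accounts)
    ]
```

Calling `layout(["europe", "asia"], ["default", "wait-60"], ["spcrwl01", "spcrwl02"])` yields eight rows, one per combination shown on the slide.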
  • 16. How did we manage this? • Self-service portal for enabling indexing of file shares • Custom web service integration in the self-service portal • Custom solution for granting access to crawl accounts • Custom timer job to get the list of file shares to crawl from the self-service portal • Custom timer job for creating and removing symbolic links • Custom lists for mapping server to content source, schedule and impact, shares to crawl accounts and metadata, and UNC to symlink • Content enrichment service for replacing symlinks in paths with actual file paths
  • 17. Example: Self Service Portal • Title: European SharePoint Conference • Owner: Petter Skodvin-Hvammen • Business Area: Consulting • Classification: Internal • Type: Project • UNC Path: Assigned automatically • Crawl Account: Assigned automatically • Buttons: Save, Cancel
    Example: Custom Lists • Title: European SharePoint Conference • Owner: Petter Skodvin-Hvammen • Business Area: Consulting • Classification: Internal • Type: Project • UNC Path: \\file01\share01 • Crawl Account: SP\spcrawl01 • Symlink: \\default\europe\default\spcrwl01\e5dc12a41d • Location: europe (server file01 is located in Oslo DC) • Bandwidth: 5 Mbps
  • 18. [Diagram: search farm topology. Query/WFE servers; index partitions Index-0 to Index-3, each shown twice; multiple Doc Proc and Enrichment components; Crawling, Analytics, Admin and Central Admin roles; Caching servers; SQL Server hosting Admin DB, Analytics DB, Crawl DB, Link DB and other SP DBs. Labels: 40 million documents, 10 queries/second.]
  • 19. Capacity testing. Purpose: • Crawling of symbolic links • Scaling of virtual machines • Sizing of disk space • Verify Microsoft’s advice. Approach: • 4-server farm with 2 partitions • 8 vCPU, 16 GB RAM, 850 GB • Crawl 10 file shares (3.7M files) • Replay top 300 queries • Apache JMeter
  • 20. Capacity testing – findings • Crawl rate declined 1% per million items indexed • Query latency increased exponentially from 12 million items indexed per partition • Database latency was insignificant during crawling • Successfully crawled file shares via symbolic directory links • Disk space usage was significantly lower than expected – Reduced data volume from 850 GB to 450 GB – 40+ servers => huge cost savings
  • 21. Infrastructure – VM sizing. Dedicated ESX Cluster • 14 x VM for SharePoint 2013 – 4 physical machines – 4 x 32 = 128 CPUs – 4 x 256 = 1024 GB memory • HA max utilization = ¾ – 3 x 32 = 96 CPUs – 3 x 256 = 768 GB memory • CPU and memory can be over-committed • CPU over-committed 1.34 (1.78 if one physical host fails) • VMs must wait for physical CPU: wait time for 8 CPU = 2 x 4 CPU • Mitigation: a) Reduce allocated virtual CPU, or b) Increase physical CPU • Memory factor 0.44 (0.59) • Reserved and locked memory prevents HA failover
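The over-commit factors on this slide are ratios of allocated virtual resources to available physical capacity. The sketch below shows the arithmetic. The allocated totals (171 vCPUs and 450 GB) are not stated on the slide; they are assumptions back-calculated from the quoted factors and the cluster totals of 128 CPUs and 1024 GB across 4 hosts.

```python
def overcommit(allocated, physical_total, hosts, failed_hosts=0):
    """Ratio of allocated virtual resources to the physical capacity still available."""
    if not 0 <= failed_hosts < hosts:
        raise ValueError("failed_hosts must be between 0 and hosts - 1")
    available = physical_total * (hosts - failed_hosts) / hosts
    return allocated / available


# Assumed allocations, back-calculated from the slide's factors (not stated there).
ALLOCATED_VCPU = 171
ALLOCATED_MEMORY_GB = 450

cpu_normal = round(overcommit(ALLOCATED_VCPU, 128, 4), 2)
cpu_one_host_down = round(overcommit(ALLOCATED_VCPU, 128, 4, failed_hosts=1), 2)
memory_normal = round(overcommit(ALLOCATED_MEMORY_GB, 1024, 4), 2)
memory_one_host_down = round(overcommit(ALLOCATED_MEMORY_GB, 1024, 4, failed_hosts=1), 2)
```

A CPU ratio above 1 means virtual CPUs have to queue for physical ones, which is the wait-time concern the slide raises. A memory factor below 1 leaves headroom for an HA failover.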
  • 22. Infrastructure – VM tuning. Peak and average CPU usage is calculated over 30 days.
      DC  Role                                             vCPU  Peak    Average  Calculated  Recommended  Change
      A   Web, Query, Admin                                8     187.55  37.03    2           4            -4
      B   Web, Query, Admin                                8     621.88  92.69    8           8            0
      A   Crawl, Analytics, Content, CEWS, Central Admin   8     724.35  210.59   8           8            0
      B   Crawl, Analytics, Content, CEWS, Symbolic Links  8     724.56  198.44   8           8            0
      A   Index 0, Content, CEWS                           8     486.18  62.55    6           6            -2
      B   Index 0, Content, CEWS                           8     520.63  63.98    6           6            -2
      A   Index 1, Content, CEWS                           8     547.08  69.3     6           6            -2
      B   Index 1, Content, CEWS                           8     546.44  91.74    6           6            -2
      A   Index 2, Content, CEWS                           8     491.38  65.6     6           6            -2
      B   Index 2, Content, CEWS                           8     532.01  77.83    6           6            -2
      A   Index 3, Content, CEWS                           8     540.45  78.72    6           6            -2
      B   Index 3, Content, CEWS                           8     621.88  92.69    8           8            0
      A   Distributed Cache                                4     91.71   5.99     2           2            -2
      B   Distributed Cache* (added later)                 -     -       -        -           -            -
          Total                                            100                    78          80           -20
  • 23. Summary 1. Indexing thousands of content sources 2. Automation for rapidly changing index requirements 3. Sizing the infrastructure for performance and HA
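The slide does not say how the Calculated column was derived. One rule that happens to reproduce every value in it is sketched below: treat Peak as a percentage of a single core, round up to whole cores, then round up to an even count. Both the rule and the reading of Peak as per-core percent are assumptions, not the presenter's stated method.

```python
import math


def calculated_vcpu(peak_percent):
    """Whole cores needed to cover the peak, rounded up to an even number.

    Assumes peak_percent is CPU usage expressed as percent of one core.
    """
    cores = math.ceil(peak_percent / 100)
    return cores + (cores % 2)


# Peak values from the slide, in row order (decimal commas converted to points).
PEAKS = [187.55, 621.88, 724.35, 724.56, 486.18, 520.63, 547.08,
         546.44, 491.38, 532.01, 540.45, 621.88, 91.71]

calculated = [calculated_vcpu(peak) for peak in PEAKS]
```

The Recommended column differs from this rule only in the first row, where 4 vCPUs are kept instead of 2. The slide gives no reason for that exception.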