1. TV and video on the Internet
How we created a CDN to host video clips and live broadcasts.
2. Presentation plan
•Our motivation
•A little bit about what we wanted to achieve
•How did we do it?
– the project and objectives
– edge and origin nodes
– redirector
– traffic modelling (sounds smart, doesn’t it?)
– replication and file management
– statistics
– live-streaming
•What already works,
•… what doesn’t yet, and what the plans are
3. Our motivation
•We had created a free CDN before - videosteam.pl
– which made us aware that building such systems is not so simple
•We were given a chance to create a large video portal - TiVi.pl
•We simply wanted to create something interesting
4. What we wanted to achieve
•Distribution network (edge) and storage (origin) based on ordinary PCs - where possible - to keep prices low for customers
•Effective division of traffic, matching routes to the customer's network as closely as possible
+ use of multiple datacenters
•Data replication instead of backups
+ automatic shutdown of broken servers and their replacement with new ones
•Supporting not only static files but also live broadcasts
•We wanted the service to be consistent with the standards (e.g.
HTTP, WebDAV, RTMP …) – making it easier to use
Why ordinary PCs? http://www.manageability.org/blog/stuff/economics-google-hardware-infrastructure
5. Assumptions
•To ensure redundancy wherever possible
•To use proven software and adapt/adjust it where needed
– Varnish, nginx, Apache, MogileFS
•Write only the missing components
– the more code, the more errors
– redirector, file management (WebDAV), statistics, billings
– Java, Python + WSGI – rewrite key elements in C only when necessary
6. The project
The secure archive of the origin servers is composed of clusters (which may be in different datacenters) equipped with a replicated file system. The clusters are independent. The system stores a minimum of 2-3 copies of each uploaded file.
Edge nodes located in different networks (TPSA, PLIX, WIX…) serve the traffic going out to customers.
The redirector selects the best node according to load and the customer's network location.
7. Edge and origin nodes
•We decided to separate the node roles – there can be fewer storage (origin) nodes; traffic from/to the end customer reaches only the edge nodes, which are simpler and can be more numerous
•Edge nodes act as proxies using modified Varnish and nginx, retrieving data from the origin nodes
– we needed software that redirects the user to a working, non-overloaded node
•On the origin nodes we use nginx and MogileFS, which replicates data and automatically restores lost copies - doing a lot of good work
– we needed software to manage the files for customers
•Initially, edge and origin nodes may be the same machines
8. Redirector
•Accepts all read requests (both files and live)
– HTTP redirection – updates faster than DNS, simpler than BGP – live broadcasts are handled separately in the player
•Has information about edge node status - state, load, bandwidth limit, bandwidth usage, system load and placement (datacenter / ASN / network)
•Has information about customers' networks and their "distance" (ping, hops, number of ASNs) from each DC - it knows which network a given customer comes from by IP address
– on this basis it can choose the most favourable server for the customer and redirect their request there
•It runs on a minimum of 2 nodes (+ a hardware load balancer dividing traffic between redirectors), makes heavy use of caching and is written in Python + WSGI - we have achieved approx. 2000 req/s per server (a minimal sketch follows below)
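To make the mechanism concrete, here is a minimal sketch of such a WSGI redirector. It only shows the 302-redirect flow; pick_edge_node() is a hypothetical placeholder for the selection logic described on the next slide, and the host name and URL scheme are illustrative, not the production code.

# Minimal WSGI redirector sketch (illustrative, not the production code).
# pick_edge_node() is a hypothetical helper standing in for the node
# selection described on the traffic-modelling slide.

def pick_edge_node(client_ip):
    # Placeholder: the real logic consults node status, network weights
    # and a short-lived cache of earlier decisions.
    return "edge1.example.net"

def application(environ, start_response):
    client_ip = environ.get("HTTP_X_FORWARDED_FOR",
                            environ.get("REMOTE_ADDR", ""))
    path = environ.get("PATH_INFO", "/")
    edge = pick_edge_node(client_ip)
    # HTTP redirect: updates faster than DNS, simpler than BGP.
    location = "http://%s%s" % (edge, path)
    start_response("302 Found", [("Location", location),
                                 ("Content-Length", "0")])
    return [b""]

Run under any WSGI container; in production at least two such instances sit behind the hardware load balancer, as the slide notes.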
9. Traffic modelling
• The redirector’s main task
• For each request:
– it takes the customer's address and checks which network the customer comes from
– it checks the weight/distance of that network from each DC and selects the best one - the weights are updated every 5 minutes by separate applications running on the nodes; an additional manual weight allows traffic to be shaped according to network policies
– we take into account hops and the route / number of ASNs along the way (only the more significant ones); we initially measured distances between individual servers, but the distance from the DC turned out to be enough
– it selects the group of servers that support the particular request type (livestreaming, pseudo-streaming, static files/buffered video)
– it selects the least loaded server from the group (weighted random) and returns it to the customer + caches the result for under a minute (see the sketch after this list)
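The selection step can be sketched as follows. The data structures (network weights, node list) and the exact weighting formula are assumptions made for the example; the service filtering, the weighted random choice and the sub-minute cache mirror what the slide describes.

import random
import time

# Illustrative data, not the production state: per-network "distance" of
# each datacenter and the current load of the edge nodes (refreshed
# periodically by separate applications).
NETWORK_WEIGHTS = {("83.0.0.0/8", "DC1"): 1.0, ("83.0.0.0/8", "DC2"): 3.0}
NODES = [
    {"host": "edge1", "dc": "DC1", "load": 0.3, "services": {"static", "pseudo"}},
    {"host": "edge2", "dc": "DC1", "load": 0.7, "services": {"static", "live"}},
    {"host": "edge3", "dc": "DC2", "load": 0.2, "services": {"static", "pseudo"}},
]

_cache = {}  # (network, service) -> (chosen host, timestamp)

def choose_node(client_network, service, ttl=45):
    # Return a recent decision if one is cached (kept for under a minute).
    hit = _cache.get((client_network, service))
    if hit and time.time() - hit[1] < ttl:
        return hit[0]
    # Keep only the nodes that support this request type.
    candidates = [n for n in NODES if service in n["services"]]
    # Favour closer datacenters and less loaded nodes:
    # weight ~ 1 / (distance * load).
    weights = [
        1.0 / (NETWORK_WEIGHTS.get((client_network, n["dc"]), 10.0)
               * (n["load"] + 0.05))
        for n in candidates
    ]
    host = random.choices(candidates, weights=weights)[0]["host"]
    _cache[(client_network, service)] = (host, time.time())
    return host

# Example: choose_node("83.0.0.0/8", "static") most often returns "edge1".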
10. Replication and management
• Replication is provided by MogileFS - each file must exist in 2-3 replicas (different replication classes) - if a node fails, the file is re-replicated to other servers
• File management - software written in Python, the so-called Bridge, providing a WebDAV interface. In the future we would like to add support for the S3 API. The Bridge mediates between the customer, the MogileFS tracker and the MogileFS HTTP interface (nginx in our case) - a rough sketch follows below
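A rough sketch of what the Bridge does on upload and download. The tracker object and its methods are hypothetical stand-ins for a MogileFS tracker client; the real Bridge also implements the WebDAV methods, tokens and accounting, which are omitted here.

# Bridge sketch: translate a WebDAV-style PUT/GET into MogileFS-style
# operations. "tracker" is a hypothetical client object, not a real API.

import urllib.request

def bridge_put(tracker, domain, key, data):
    # Ask the tracker where to store the file, upload it over HTTP to a
    # storage node, then confirm the upload so replication to 2-3 copies
    # can start.
    dest_url = tracker.create_open(domain, key)
    req = urllib.request.Request(dest_url, data=data, method="PUT")
    urllib.request.urlopen(req)
    tracker.create_close(domain, key, dest_url)

def bridge_get(tracker, domain, key):
    # The tracker returns the HTTP paths of the live replicas; fetch from
    # the first one that responds.
    for url in tracker.get_paths(domain, key):
        try:
            return urllib.request.urlopen(url).read()
        except OSError:
            continue
    raise KeyError(key)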
11. Replication and management
• The Bridge runs on two servers (+ a hardware load balancer) and a MySQL database (master + several slaves)
• Besides the redirector, the Bridge is a key element, so we test it automatically with scripts performing basic operations through curl immediately after each deployment (via SVN) - see the sketch below
• The Bridge also protects files – it tells the edge nodes whether a particular file may be served (tokens, expired customer accounts, in the future also per-file access rights)
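A sketch of the kind of post-deployment smoke test mentioned above, written in Python instead of curl; the base URL and the accepted status codes are placeholders, not the production values.

# Post-deployment smoke test sketch: upload, read back and delete a small
# file through the Bridge's WebDAV interface. URL is a placeholder.

import urllib.request

BASE = "http://bridge.example.net/dav/test-account"
PAYLOAD = b"smoke-test"

def request(method, url, data=None):
    req = urllib.request.Request(url, data=data, method=method)
    return urllib.request.urlopen(req)

def smoke_test():
    url = BASE + "/smoke.txt"
    assert request("PUT", url, PAYLOAD).getcode() in (200, 201, 204)
    assert request("GET", url).read() == PAYLOAD
    assert request("DELETE", url).getcode() in (200, 204)

if __name__ == "__main__":
    smoke_test()
    print("bridge OK")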
13. Statistics
•There are really a lot of logs (for the time being approx. 50 req/s per server - everything goes to access logs) - processed with a simple map/reduce
•We collect statistics from the edge servers and the Bridge - usage per customer (transfer, storage, hits) and per stream (number of broadcasts, number of viewers)
•We had to write several small programs:
– statistics are initially processed by logalyzer (with log rotation) and uploaded to SimpleStorage in CSV format,
– logcollector downloads the log packs and uploads them to the database in bulk,
– statscalc aggregates the data into subtotals; non-aggregated data is removed over time (a toy version of this aggregation is sketched below).
•As a result, a user sees changes in their statistics on a page updated hourly
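A toy version of the per-customer aggregation that logalyzer and statscalc perform. The log line format used here is an assumption made for the example; the real pipeline works on rotated access logs, CSV uploads to SimpleStorage and bulk inserts into the database.

# Toy map/reduce over access logs: sum hits and transferred bytes per
# customer. Assumes a simplified, illustrative line format:
#   <timestamp> <customer_id> <status> <bytes_sent> <path>

from collections import defaultdict

def aggregate(lines):
    totals = defaultdict(lambda: {"hits": 0, "bytes": 0})
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        _, customer, _, bytes_sent, _ = fields[:5]
        totals[customer]["hits"] += 1
        totals[customer]["bytes"] += int(bytes_sent)
    return totals

if __name__ == "__main__":
    import sys
    for customer, t in sorted(aggregate(sys.stdin).items()):
        print(customer, t["hits"], t["bytes"])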
14. Livestreaming
•The system has been running in production
•Compatible with RTMP/RTMPT/RTMPE – an existing Java server with our changes (statistics) is installed on all the nodes
•Traffic is divided by proxying at the level of the video application (Java code forwards the video from the server the customer is broadcasting to on to the target servers – we can dynamically choose servers for a given customer)
•We've added broadcaster authorization (tokens - supported by the Bridge)
15. Livestreaming
•… as well as watcher authorization (tokens can be handled individually by the webservice of a particular service, e.g. VoD) – an assumed token scheme is sketched after this slide
•The redirector manages redirection to streams (support in the player was necessary) - RTMP doesn’t support redirects
•It would be great to also package the streams in HTTP – buffering, among other things, would then work and there would be no problems with firewalls
– soon we would like to add support for smooth streaming and streaming to iPhones (MPEG-TS) - we can do it by transcoding on the fly (we’ve already conducted trials) or with dedicated servers such as Wowza or IIS - a heterogeneous environment is a disadvantage
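The slides do not specify how the tokens are built, so the following is only an assumed scheme: an HMAC over the stream name and an expiry time, which the Bridge or a service's own webservice could verify for both broadcasters and watchers.

# Hypothetical token scheme for broadcaster/watcher authorization.
# The real token format used by the Bridge is not described in the slides.

import hashlib
import hmac
import time

SECRET = b"shared-secret"  # placeholder

def make_token(stream, ttl=300):
    expires = int(time.time()) + ttl
    msg = ("%s:%d" % (stream, expires)).encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return "%d:%s" % (expires, sig)

def check_token(stream, token):
    try:
        expires_str, sig = token.split(":", 1)
        expires = int(expires_str)
    except ValueError:
        return False
    if expires < time.time():
        return False
    msg = ("%s:%d" % (stream, expires)).encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)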
17. Plans for development?
• file authorization
• optimizing and improving the traffic modelling algorithm
• streaming through HTTP and smooth-streaming
• streaming for iPhones
• introducing more nodes – e.g. to PL-IX
• expanding and speeding up statistics
• …