
Project Task Page: Multicast RTP MPEG Remultiplexer

Status: Blocked - Performance bottlenecks - code can't run fast enough
Current Developers: Matt
Current "inflight" dev location: /Sketches/MH/RTP/
Start Date: ??
Major Milestone date: n/a
Expected End Date: 22nd December 2006
End Date: tbd
Date this page last updated: 27th November 2006
Estimated effort so far: 9 days

Description

A tool to mix MPEG transport streams received over multicast in RTP format and rebroadcast the result as a new multicast RTP stream.

Internal work on developing live multicast streaming services needs a way to take data from one stream and mix it into another. The streams are multicast RTP packets containing MPEG Transport Stream data. The tool, when deployed, should be able to run 24/7, combining a subset of data from two or more streams to generate a new one. This would be used, for example, to mix existing EPG data into an existing stream containing audio and video.

Benefits:


Inputs

Task Sponsor: BB (BBC internal)
Task Owner: Matt (MH)
Developers:

User

Interested Third Parties

Requirements (non exhaustive):

Receive multicast RTP containing MPEG Transport Stream containing H264 @ ~1Mbit/s (MUST)

Simultaneously receive a 2nd multicast RTP containing MPEG Transport Stream containing EIT data and MPEG2 video @ ~5Mbit/s (MUST)

Combine (demultiplex and remultiplex) EIT data from 2nd stream with video from the 1st to form a new stream (MUST)

Transmit the new stream as multicast RTP (MUST)

Adjust stream timestamps (MPEG Transport Stream level, and possibly MPEG Program Elementary Stream level) if needed (WOULD LIKE)
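
As a rough illustration of the receive and demultiplex requirements above, the following is a minimal sketch (not the code in /Sketches/MH/RTP/) of stripping the RTP header from a datagram and picking out transport stream packets by PID. The function names are hypothetical. It assumes the RTP payload is a whole number of 188-byte transport stream packets, and that EIT data is carried on PID 0x12, as is conventional for DVB service information.

```python
import struct

TS_PACKET_SIZE = 188   # fixed MPEG transport stream packet length
TS_SYNC_BYTE = 0x47    # first byte of every transport stream packet
EIT_PID = 0x12         # assumption: EIT carried on the usual DVB SI PID


def parse_rtp(datagram):
    """Split an RTP datagram into (header fields, payload bytes)."""
    if len(datagram) < 12:
        raise ValueError("too short to contain an RTP header")
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", datagram[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)          # skip any CSRC identifiers
    if b0 & 0x10:                          # header extension present
        _profile, ext_words = struct.unpack("!HH", datagram[offset:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(datagram)
    if b0 & 0x20:                          # padding: last byte is the pad count
        end -= datagram[-1]
    header = {
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
    return header, datagram[offset:end]


def ts_packets(payload):
    """Yield each complete 188-byte packet that starts with the sync byte."""
    for start in range(0, len(payload) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = payload[start:start + TS_PACKET_SIZE]
        if packet[0] == TS_SYNC_BYTE:
            yield packet


def ts_pid(packet):
    """Return the 13-bit PID from a transport stream packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def select_pids(payload, wanted):
    """Keep only the transport stream packets whose PID is in `wanted`."""
    return [p for p in ts_packets(payload) if ts_pid(p) in wanted]
```

A remultiplexer along these lines would call select_pids(payload, {EIT_PID}) on the second stream and interleave the result with the packets of the first stream before wrapping them in new RTP packets. Timestamp adjustment is not addressed here.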

Relevant Influencing factors:

Outputs

Expected

Components to parse and create RTP packets

Command line tool, as described

Webpages describing:

Actual

Code

RTP handling

Internet components (upgrades/modifications)

SDP handling

DVB/MPEG Transport stream processing

case-insensitivity problem fixed for /trunk/

...removing filename clash problems for case insensitive filesystems like that on win32/osx

Realistic possibilities arising as a result of activity on this task

New/modified components for mainline codebase (RTP, DVB)

Tasks that directly enable this task (dependencies)

Subtasks

Improved throughput of multicast component and Selector in general

Develop code

Task Log

Discussion

Stream Synchronisation Timestamps may need regenerating

Need to determine, experimentally, whether timestamp resynchronisation algorithms will be needed. Technically, remultiplexing severely jitters the timestamps on the transport stream packets.

CPU load higher than anticipated

CPU load is higher than anticipated - handling a single 1-2Mbit/s stream takes 50%+ CPU usage on the Mac Mini currently being used for testing. A faster "Core Duo" Mac Mini has been tried, but the system struggles to keep up with the 4Mbps MPEG2 stream (i.e. usage teeters close to 90%/100%).

Multicast I/O improvements

The Selector component has been improved (local copy in the working dir) to increase responsiveness. Specifically, instead of requests to select on file handles queueing up at its inbox until the current select() call times out, a separate filehandle is used to wake it immediately if there are pending requests.
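
The wake-up idea can be sketched as follows. This is not the modified Selector itself; the class and method names are hypothetical, and it assumes a POSIX system where select() can watch a pipe created with os.pipe(). A request made from another thread writes one byte to the pipe, so a select() call that is already blocked returns at once instead of waiting for its timeout.

```python
import os
import select
import threading


class WakeableSelector:
    """Hypothetical sketch: select() that can be interrupted by new requests."""

    def __init__(self):
        # The read end is always included in select(); the write end is
        # used only to signal that a request is waiting.
        self._wake_r, self._wake_w = os.pipe()
        self._lock = threading.Lock()
        self._pending = []

    def request(self, item):
        """Queue a request and wake any select() call in progress."""
        with self._lock:
            self._pending.append(item)
        os.write(self._wake_w, b"x")

    def wait(self, readers, timeout):
        """Block until a reader is ready, a request arrives, or timeout.

        Returns (ready readers, requests collected since the last call).
        """
        ready, _, _ = select.select(list(readers) + [self._wake_r], [], [], timeout)
        if self._wake_r in ready:
            os.read(self._wake_r, 4096)    # drain the wake-up bytes
        with self._lock:
            pending, self._pending = self._pending, []
        return [r for r in ready if r != self._wake_r], pending

    def close(self):
        os.close(self._wake_r)
        os.close(self._wake_w)
```

With this arrangement the select() timeout can be long without delaying new requests, since the timeout only matters when nothing at all is happening.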

The Multicast components have been optimised (local copy in the working dir) to sleep when inactive, using the Selector component to wake them.

Threaded component bottlenecks

I've also tried writing

Why? Interactions between a thread and the main thread are bottlenecked:

Each component is taking between 10% and 20% CPU. Moving the Selector or Multicast components themselves into separate threads doesn't reduce the amount of CPU being spent in the main thread. The Mac Mini could probably cope if it were possible to spread some of the workload across the 2nd CPU without incurring a penalty in the main thread.

Proposal: Axon modifications

I believe there may be mileage in experimenting with modifying Axon such that threads can perform all tasks themselves, using (hopefully fine grained) locking to ensure thread safety. This would eliminate the need for a threaded component to have a microprocess running in the main thread handling all its requests, and would substantially reduce the overhead incurred when making a component threaded. It would potentially also have the benefit that two components running in threads independent of the main thread would not be bottlenecked by needing the main thread to handle message passing on their behalf.
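
To make the proposal concrete, here is a minimal sketch of two threads passing messages directly through a lock-protected inbox, with no third party relaying on their behalf. It does not use Axon's API; LockedInbox and run_pipeline are hypothetical names, and a real change to Axon would need to cover far more than delivery (linkages, scheduling, shutdown).

```python
import threading
from collections import deque


class LockedInbox:
    """Hypothetical inbox that any thread may deliver to directly."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()   # lock plus wake-up signalling

    def deliver(self, item):
        """Called from the sending thread; wakes a waiting receiver."""
        with self._cond:
            self._items.append(item)
            self._cond.notify()

    def collect(self, timeout=None):
        """Called from the receiving thread; blocks until an item arrives."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._items, timeout):
                raise TimeoutError("no message arrived in time")
            return self._items.popleft()


def run_pipeline(count):
    """Send `count` integers from one thread to another and return them."""
    inbox = LockedInbox()
    received = []

    def producer():
        for i in range(count):
            inbox.deliver(i)
        inbox.deliver(None)                  # sentinel: no more messages

    def consumer():
        while True:
            item = inbox.collect(timeout=5)
            if item is None:
                break
            received.append(item)

    threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

The lock here is held only for the append or pop, which is the kind of fine grained locking the proposal hopes for. Whether this actually beats the current approach under Python's threading model would need measuring.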

This would probably qualify as a separate project task, lasting a few weeks.

Other routes to try first

Michael suggests that such a radical approach may well not be necessary. Instead the following perhaps should be tried first: