Gstreamer Part I – Getting Started

Understand the purpose, structure and basic usage of Gstreamer.

This article is still a draft.

Gstreamer in a nutshell

Gstreamer is a C framework for manipulating media (video, audio, images). It provides the scaffolding for connecting media-processing elements, which are distributed as plugins. Using Gstreamer and its plugins, one can capture, transport and manipulate video and audio data. Use cases range from converting movies to a different encoding format to building a live stream between a camera and a browser.

Elements are connected through pads, their input and output connection points. Linking the output (source) pad of one element to the input (sink) pad of the next works much like a Unix pipe. Data flows downstream, from the first (upstream) elements towards the last (downstream) ones, and the linked elements form a pipeline or chain. Formally, this pipeline is a directed acyclic graph. Each element does something with its input and then passes the result further downstream. Elements can behave as encoders, muxers, visualizers, filters, servers and clients, to name a few.
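
To make the pipeline idea concrete, here is a minimal Python sketch. It is a hypothetical helper (describe), not part of Gstreamer, and it only covers the simplest case, a linear chain: an ordered list of elements with their properties is rendered in the description syntax that gst-launch-1.0 accepts, where each ! links the output (source) pad of one element to the input (sink) pad of the next. Branching graphs are not modelled.

```python
from typing import Mapping, Sequence, Tuple, Union

# Hypothetical helper, not a Gstreamer API: it only builds a description string.
PropValue = Union[str, int, float, bool]
Element = Tuple[str, Mapping[str, PropValue]]  # (element name, properties)


def _format_value(value: PropValue) -> str:
    """Render a property value as gst-launch-1.0 expects it."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    return str(value)


def describe(chain: Sequence[Element]) -> str:
    """Render a linear chain of elements as a gst-launch-1.0 description.

    Each " ! " links the source (output) pad of one element to the sink
    (input) pad of the next, so data flows from the first element to the last.
    """
    parts = []
    for name, props in chain:
        settings = " ".join(f"{key}={_format_value(val)}" for key, val in props.items())
        parts.append(f"{name} {settings}" if settings else name)
    return " ! ".join(parts)


if __name__ == "__main__":
    print(describe([
        ("videotestsrc", {"pattern": 1}),
        ("videoconvert", {}),
        ("autovideosink", {}),
    ]))
```

The printed string, passed to gst-launch-1.0, is the same as the "Video test source to screen" example further down.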

Gstreamer can be used on the command line or programmatically through its C and Python interfaces. In this and the coming tutorials, I will use only the command line.
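
For orientation only, this is roughly what the Python interface looks like: a minimal sketch using the PyGObject bindings (the python-gst-1.0 / python3-gst-1.0 packages). Gst.parse_launch accepts the same description syntax as gst-launch-1.0. The function name run_pipeline is hypothetical and error handling is kept to a minimum; the bindings are imported inside the function so the module loads even where Gstreamer is not installed.

```python
# Minimal sketch using the PyGObject bindings; run_pipeline is a hypothetical name.
DESCRIPTION = "audiotestsrc num-buffers=100 ! audioconvert ! fakesink"


def run_pipeline(description: str) -> bool:
    """Run a pipeline until end-of-stream or error; True means a clean EOS."""
    # Imported lazily so this module loads even without Gstreamer installed.
    import gi

    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    # Same syntax as gst-launch-1.0; raises GLib.Error for an invalid description.
    pipeline = Gst.parse_launch(description)
    pipeline.set_state(Gst.State.PLAYING)

    # Block until the pipeline posts an ERROR or an EOS message on its bus.
    bus = pipeline.get_bus()
    msg = bus.timed_pop_filtered(
        Gst.CLOCK_TIME_NONE, Gst.MessageType.ERROR | Gst.MessageType.EOS
    )
    if msg is not None and msg.type == Gst.MessageType.ERROR:
        err, debug = msg.parse_error()
        print(f"error: {err.message} ({debug})")

    # Back to NULL releases devices, sockets and threads held by the elements.
    pipeline.set_state(Gst.State.NULL)
    return msg is not None and msg.type == Gst.MessageType.EOS
```

Calling run_pipeline(DESCRIPTION) pushes 100 buffers of the test tone into fakesink and should return True once the end-of-stream message arrives.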

Installation

Option #1

Use the package manager of your system: apt (Debian/Ubuntu), yum or dnf (Fedora/CentOS), or Homebrew (macOS).

On the desktop editions of Ubuntu 14.04 to 16.04 it comes pre-installed. If you don’t have it, you can try:

sudo apt update && sudo apt install -y gstreamer1.0-tools gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav python3-gst-1.0

For additional modules you may do a search like:

apt-cache search gstreamer
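
Package names do not tell you exactly which elements you got. On the command line, gst-inspect-1.0 <element> prints an element's pads, caps and properties, and exits with a non-zero status if no such element or plugin exists. The Python sketch below (a hypothetical helper, has_element) relies only on that exit status to check a list of elements after installing.

```python
import shutil
import subprocess


def has_element(name: str) -> bool:
    """Hypothetical helper: True if gst-inspect-1.0 knows an element/plugin `name`.

    gst-inspect-1.0 exits with a non-zero status when nothing matches;
    False is also returned when the tool itself is not installed.
    """
    if shutil.which("gst-inspect-1.0") is None:
        return False
    result = subprocess.run(
        ["gst-inspect-1.0", name],
        stdout=subprocess.DEVNULL,  # only the exit status matters here
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    for element in ("videotestsrc", "vp8enc", "x264enc", "avdec_h264"):
        print(f"{element:12} {'found' if has_element(element) else 'missing'}")
```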

Option #2

If you want to have the latest version and extra packages such as gst-omx, I’d recommend installing from source. Here is an install script for setting up from source on Debian-based systems.

When installing from source, make sure that all the packages (core, base, good, bad, ugly, libav) are the same version!

Gstreamer elements

The graph (chain) formed with Gstreamer is made of elements. It begins with one or more src (source) elements and ends with one or more sink (output) elements. Here are some commonly used src and sink elements:

1) I/O elements

Input or SRC elements:

  • v4l2src – stream from a camera device on a Linux system, e.g. device=/dev/video0;
  • fdsrc – read from a file descriptor (standard input by default); use it with Unix pipes to take the output of other programs as input, such as the Raspberry Pi camera tools (e.g. raspivid);
  • videotestsrc – generate a test video stream; you may choose a pattern with pattern=<num>;
  • audiotestsrc – generate a test audio stream;
  • fakesrc – another option for testing, which pushes empty buffers;
  • filesrc – read from a file, specify location=<filepath>;
  • udpsrc – receive a UDP stream, specify port=<number>;
  • tcpclientsrc – receive a TCP stream, specify host=<IP of the server> and port=<number>;
  • rtmpsrc – receive an RTMP stream, specify location=<rtmp:// URL>;
  • ximagesrc – capture the (X11) screen.
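
Test sources run until interrupted. For scripted tests it can help to make them stop on their own: num-buffers is a property that sources inherit from the GstBaseSrc base class, and after that many buffers the source sends end-of-stream. The Python sketch below (a hypothetical helper, finite_test_source) computes the buffer count for a duration and frame rate, and pins the frame rate with caps so the count really corresponds to that many seconds.

```python
# Hypothetical helper: builds a test-source fragment that stops by itself.
def finite_test_source(seconds: float, fps: int = 30, pattern: int = 0) -> str:
    """Return a videotestsrc fragment that sends EOS after `seconds` of video.

    videotestsrc produces one buffer per frame, so num-buffers = seconds * fps;
    the caps filter pins the frame rate so that this arithmetic holds.
    """
    frames = round(seconds * fps)
    return (f"videotestsrc pattern={pattern} num-buffers={frames} "
            f"! video/x-raw, framerate={fps}/1")


if __name__ == "__main__":
    # Five seconds of the "snow" pattern (pattern=1), shown on screen.
    print(finite_test_source(5, pattern=1) + " ! videoconvert ! autovideosink")
```

Because the source ends with EOS, gst-launch-1.0 exits by itself, which is handy in scripts.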

Output or SINK elements:

  • filesink – save the stream to a file, specify location=<filepath>;
  • autoaudiosink – play audio on an automatically detected device;
  • autovideosink – display video using an automatically detected video sink and device;
  • fakesink – discard the stream without playing it; useful for testing;
  • udpsink – send the stream over UDP, specify host=<IP of the target machine> and port=<number>;
  • multiudpsink – send the stream over UDP to several destinations, specify clients=<host:port,host:port,...>;
  • tcpserversink – serve the stream over TCP, specify host=<IP to listen on> and port=<number>;
  • rtmpsink – send the stream to an RTMP server, specify location=<rtmp:// URL>.
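
udpsink sends to a single host and port, whereas multiudpsink takes a clients property holding a comma-separated list of host:port pairs. The sketch below (a hypothetical helper, multiudp_sink; the loopback addresses and ports are examples only) builds that property from a list of destinations.

```python
from typing import Iterable, Tuple


def multiudp_sink(clients: Iterable[Tuple[str, int]]) -> str:
    """Hypothetical helper: a multiudpsink fragment sending to every client."""
    entries = []
    for host, port in clients:
        if not 0 < port < 65536:
            raise ValueError(f"invalid UDP port: {port}")
        entries.append(f"{host}:{port}")
    if not entries:
        raise ValueError("multiudpsink needs at least one client")
    # clients is a single comma-separated string of host:port pairs.
    return f"multiudpsink clients={','.join(entries)}"


if __name__ == "__main__":
    sink = multiudp_sink([("127.0.0.1", 5200), ("127.0.0.1", 5201)])
    print(f"videotestsrc ! vp8enc ! rtpvp8pay ! {sink}")
```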

2) Encoding/decoding elements

Encoding a raw data stream, and decoding encoded data back into raw data, is a big part of using Gstreamer and of dealing with media in general. Audio can be encoded in formats such as MP3, AAC, Vorbis and Opus; video in formats such as JPEG 2000, H.264, H.265, MPEG-2, VP8, VP9 and Theora. Gstreamer provides encoders and decoders for these formats through its plugin bundles: base, good, bad and ugly. A list of these plugins can be found here. Additional plugins offered by libav are listed here.

Audio (encoders | decoders and parsers):

  • mp3 – lamemp3enc, avenc_mp3 | mad, mpg123audiodec, avdec_mp3;
  • aac – voaccenc, faac, avenc_aac | faad, aacparse, avdec_aac;
  • vorbis – vorbisenc | vorbisdec, vorbisparse;
  • opus – opusenc, avenc_opus | opusdec, avdec_opus.

Video (encoders | decoders and parsers):

  • h.264 – x264enc | h264parse, avdec_h264;
  • mpeg2 – mpeg2enc, avenc_mpeg2video | mpeg2dec, avdec_mpeg2video;
  • jpeg2000 – no inter-frame coding, low latency; avenc_jpeg2000 | avdec_jpeg2000;
  • vp8 – vp8enc, avenc_vp8 | vp8dec, avdec_vp8;
  • vp9 – vp9enc, avenc_vp9 | vp9dec, avdec_vp9;
  • theora – theoraenc | theoradec, theoraparse.
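
The encoder/decoder pairs above can be tried without any files: encode a test signal and decode it again right away. The sketch below takes one encoder and one decoder per codec from the lists (a subset chosen for illustration; which elements exist depends on the plugin bundles you installed) and builds such a round-trip pipeline. CODECS and round_trip are hypothetical names. The converters on both sides are there because encoders accept only some raw formats and sample rates.

```python
# Hypothetical table: one encoder and one decoder per codec, from the lists above.
CODECS = {
    "vorbis": ("audio", "vorbisenc", "vorbisdec"),
    "opus": ("audio", "opusenc", "opusdec"),
    "vp8": ("video", "vp8enc", "vp8dec"),
    "vp9": ("video", "vp9enc", "vp9dec"),
    "theora": ("video", "theoraenc", "theoradec"),
}


def round_trip(codec: str, buffers: int = 100) -> str:
    """Hypothetical helper: encode a test signal with `codec`, then decode it."""
    kind, encoder, decoder = CODECS[codec]
    if kind == "audio":
        # audioconvert/audioresample adapt sample format and rate on both sides.
        return (f"audiotestsrc num-buffers={buffers} ! audioconvert ! audioresample "
                f"! {encoder} ! {decoder} ! audioconvert ! audioresample ! autoaudiosink")
    # videoconvert adapts the raw colour format on both sides.
    return (f"videotestsrc num-buffers={buffers} ! videoconvert ! {encoder} "
            f"! {decoder} ! videoconvert ! autovideosink")


if __name__ == "__main__":
    for name in CODECS:
        print(round_trip(name))
```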

3) Capabilities or ‘caps’ elements

Capabilities (caps for short) describe the type of data streamed between two pads (elements), or the types of data a pad (template) supports. A caps filter, written in a pipeline simply as a caps string between two ! signs, does not modify the data; it restricts which formats may flow through that link. This is how caps ensure compatibility between elements: if, for example, a source can deliver a stream in several formats, a caps filter can select just the one that the next element understands. Caps can describe many characteristics in detail, but that is rarely required; most often we only need to specify the encoding of the stream we receive or send.


(1) Specifies that the video is raw (not encoded) and has a specific width and height:

! video/x-raw, width=640, height=480 !

(2) Specifies that the stream consists of RTP packets carrying VP8-encoded video:

! application/x-rtp, encoding-name=VP8 !
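
The compatibility check behind caps can be illustrated with a deliberately simplified model: two caps are compatible when they have the same media type and agree on every field both of them set. Real caps also support ranges, lists and several structures, and the real check is gst_caps_can_intersect (Gst.Caps.can_intersect in Python); the helpers parse_caps and compatible below are hypothetical and only handle plain key=value fields.

```python
from typing import Dict, Tuple


def parse_caps(text: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: split 'media/type, key=value, ...' into type and fields."""
    media_type, *fields = [part.strip() for part in text.split(",")]
    parsed = {}
    for field in fields:
        key, _, value = field.partition("=")
        parsed[key.strip()] = value.strip()
    return media_type, parsed


def compatible(a: str, b: str) -> bool:
    """Simplified caps check: same media type, no conflicting shared field."""
    type_a, fields_a = parse_caps(a)
    type_b, fields_b = parse_caps(b)
    if type_a != type_b:
        return False
    shared = fields_a.keys() & fields_b.keys()
    return all(fields_a[key] == fields_b[key] for key in shared)


if __name__ == "__main__":
    wanted = "video/x-raw, width=640, height=480"
    print(compatible(wanted, "video/x-raw, width=640"))  # True: shared width agrees
    print(compatible(wanted, "video/x-raw, width=320"))  # False: width conflicts
    print(compatible(wanted, "application/x-rtp, encoding-name=VP8"))  # False: other type
```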

4) Other elements

  • Muxers/Demuxers – Muxers combine (pair) video and audio streams in a common container; demuxers split a container back into its streams. Common formats are mp4, webm, ogg, mov.
    E.g.: mp4mux/qtdemux, webmmux/matroskademux.
  • Payloaders/Depayloaders – These elements pack (payload) data into a transport format before it is sent over a network, and unpack it after it is received.
    E.g.: rtpvp8pay/rtpvp8depay, rtph264pay/rtph264depay, gdppay/gdpdepay.
  • Converters – These elements convert raw data between formats: sample format, sample rate, colorspace and picture size (rotation and cropping are handled by elements such as videoflip and videocrop).
    E.g.: audioconvert, audioresample, videoconvert, videoscale.
  • Pipeline – These elements are used in constructing more complex pipelines, e.g. to split a stream into branches or to buffer data between threads.
    E.g.: tee, queue, queue2.
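
tee is what turns a chain into a branching graph: it copies every buffer to all branches linked to it. Each branch normally starts with a queue, so it runs in its own thread and a slow branch cannot stall the others. In gst-launch syntax the tee gets a name, and each branch starts from that name followed by a dot. A hypothetical Python helper, tee_pipeline, that assembles such a description:

```python
from typing import Sequence


def tee_pipeline(source: str, branches: Sequence[str], name: str = "t") -> str:
    """Hypothetical helper: feed `source` into a tee, one queue per branch."""
    if not branches:
        raise ValueError("a tee needs at least one branch")
    # "name." refers back to the tee; each reference links a new output pad.
    linked = " ".join(f"{name}. ! queue ! {branch}" for branch in branches)
    return f"{source} ! tee name={name} {linked}"


if __name__ == "__main__":
    # Show a test pattern on screen and, at the same time, discard a copy.
    print(tee_pipeline("videotestsrc", ["videoconvert ! autovideosink", "fakesink"]))
```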

Examples

1) Video test source to screen

gst-launch-1.0 videotestsrc pattern=1 ! videoconvert ! autovideosink

2) Audio test source to speakers

gst-launch-1.0 audiotestsrc ! audioconvert ! autoaudiosink

3) Audio test source to nothing (fake), but still a valid pipeline

gst-launch-1.0 audiotestsrc ! audioconvert ! fakesink

4) Video broadcast over TCP

Here we send the stream with tcpserversink, i.e. over TCP, the transport protocol that HTTP normally runs on. TCP is used, for example, in video file streaming applications. The stream is served on localhost (127.0.0.1), port 5200. The trick is to add gdppay, which serializes the data together with its caps so that the receiver knows what it is getting. Receiving the stream requires a client, tcpclientsrc, and gdpdepay to unpack the payload. The video is encoded as VP8.

Send via:

gst-launch-1.0 videotestsrc horizontal-speed=5 ! vp8enc ! gdppay ! tcpserversink host=127.0.0.1 port=5200

Receive via:

gst-launch-1.0 -v tcpclientsrc port=5200 ! gdpdepay ! vp8dec ! videoconvert ! autovideosink
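
Sender and receiver have to mirror each other: same port, gdppay matched by gdpdepay, and a decoder matching the encoder. A hypothetical Python helper, tcp_pair, derives both command lines from one set of parameters so they cannot drift apart. The defaults follow the example above (VP8, port 5200); 127.0.0.1 assumes both run on the same machine.

```python
from typing import Tuple

# Hypothetical table of encoder -> decoder; VP8 is the one used above.
DECODER_FOR = {"vp8enc": "vp8dec", "vp9enc": "vp9dec", "theoraenc": "theoradec"}


def tcp_pair(host: str = "127.0.0.1", port: int = 5200,
             encoder: str = "vp8enc") -> Tuple[str, str]:
    """Hypothetical helper: matching (sender, receiver) gst-launch-1.0 commands."""
    decoder = DECODER_FOR[encoder]
    sender = (f"gst-launch-1.0 videotestsrc horizontal-speed=5 ! {encoder} "
              f"! gdppay ! tcpserversink host={host} port={port}")
    receiver = (f"gst-launch-1.0 tcpclientsrc host={host} port={port} "
                f"! gdpdepay ! {decoder} ! videoconvert ! autovideosink")
    return sender, receiver


if __name__ == "__main__":
    send, receive = tcp_pair()
    print("Send via:   ", send)
    print("Receive via:", receive)
```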

5) Video broadcast over RTP (via UDP) from a camera

This example sends video with RTP, running on top of UDP. Unlike TCP, UDP does not guarantee delivery; it trades reliability for fast, low-latency transmission, which is why it is used in real-time applications. This time we also capture live video from a camera device on Linux. RTP payloading and depayloading for VP8 is done by rtpvp8pay and rtpvp8depay. Here caps are necessary for the pipeline to work: the UDP packets carry no description of their contents, so the receiver has to be told that it receives RTP packets carrying VP8 video.

Send via:

gst-launch-1.0 v4l2src device=/dev/video0 ! video/x-raw, width=640, height=480 ! videoconvert ! vp8enc ! rtpvp8pay pt=96 ! udpsink host=<IP of the receiving machine> port=5200

Receive via:

gst-launch-1.0 udpsrc port=5200 ! application/x-rtp, media=video, clock-rate=90000, encoding-name=VP8, payload=96 ! rtpvp8depay ! vp8dec ! videoconvert ! autovideosink
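
Because plain RTP over UDP carries no caps, the receiver states them itself, and the payload type (pt on the payloader) has to match the payload field there. A hypothetical Python helper, rtp_vp8_pair, keeps both sides consistent. Assumptions: media=video and clock-rate=90000 follow the usual RTP convention for video, videoconvert is added before vp8enc in case the camera delivers a raw format (such as YUY2) that vp8enc does not accept, and the host passed in is the receiving machine.

```python
from typing import Tuple

RTP_VIDEO_CLOCK_RATE = 90000  # RTP video payloads conventionally use a 90 kHz clock


def rtp_vp8_pair(host: str, port: int = 5200, pt: int = 96,
                 device: str = "/dev/video0") -> Tuple[str, str]:
    """Hypothetical helper: matching (sender, receiver) VP8-over-RTP pipelines."""
    if not 96 <= pt <= 127:
        # 96-127 is the dynamic payload-type range used for codecs such as VP8.
        raise ValueError("dynamic RTP payload types are 96-127")
    sender = (f"v4l2src device={device} ! video/x-raw, width=640, height=480 "
              f"! videoconvert ! vp8enc ! rtpvp8pay pt={pt} "
              f"! udpsink host={host} port={port}")
    # The receiver must declare what the UDP packets contain.
    caps = (f"application/x-rtp, media=video, clock-rate={RTP_VIDEO_CLOCK_RATE}, "
            f"encoding-name=VP8, payload={pt}")
    receiver = (f"udpsrc port={port} ! {caps} "
                f"! rtpvp8depay ! vp8dec ! videoconvert ! autovideosink")
    return sender, receiver


if __name__ == "__main__":
    # 127.0.0.1: sender and receiver on the same machine, for a quick test.
    for line in rtp_vp8_pair("127.0.0.1"):
        print("gst-launch-1.0", line)
```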

Coming next

In Part II of this tutorial series we’ll explore video and audio encoding and decoding. In Part III the topic is combining (muxing/demuxing) video and audio in containers such as mp4 and webm, which can then be played online (HTML5) or in a player such as VLC. We will also explore examples with multiple sources and sinks (outputs).