A high-level architecture overview for getting started with FFmpeg.
ffmpeg package contents
FFmpeg is composed of a suite of tools and libraries.
ffmpeg tools
The tools can be used to encode, decode, or transcode a wide variety of audio and video formats, and to stream the encoded media over a network.
- ffmpeg: a command line tool to convert multimedia files between formats
- ffplay: a simple media player based on SDL and the FFmpeg libraries
- ffprobe: a simple multimedia stream analyzer
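For example (the file names and codec choices below are illustrative, not from the original post), a typical transcode, playback, and stream analysis look like this:

```shell
# Transcode to H.264 video and AAC audio in an MP4 container
ffmpeg -i input.mov -c:v libx264 -c:a aac output.mp4

# Play the result
ffplay output.mp4

# Inspect the streams contained in the file
ffprobe -show_streams output.mp4
```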
ffmpeg libraries
Libraries can be used to integrate the same features into your own product.
- libavformat: I/O and muxing/demuxing
- libavcodec: encoding and decoding
- libavfilter: graph-based filters for raw media
- libavdevice: input/output devices
- libavutil: general multimedia utilities
- libswresample: audio resampling, sample format conversion, and audio mixing
- libswscale: color conversion and image scaling
- libpostproc: video post-processing (deblocking/noise filters)
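To build a program against these libraries, a typical compile line (assuming the development packages and their pkg-config files are installed; the source file name is hypothetical) looks like:

```shell
# Compile and link against libavformat, libavcodec, and libavutil
gcc demo.c -o demo $(pkg-config --cflags --libs libavformat libavcodec libavutil)
```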
ffmpeg simple player
The basic use of FFmpeg is to demux a multimedia stream (from a file or received over a network) into its audio and video streams and then decode those streams into raw audio and raw video data.
To manage media streams, FFmpeg uses the following constructs:
- AVFormatContext: a high-level structure that provides sync, metadata, and muxing for streams
- AVStream: a continuous stream (audio or video)
- AVCodec: defines how data is encoded and decoded
- AVPacket: encoded data read from a stream
- AVFrame: decoded data (a raw video frame or raw audio samples)
The process used to demux and decode follows this logic:
Here is the basic code needed to read an encoded multimedia stream from a file, analyze its contents, and demux the audio and video streams. These facilities are provided by the libavformat library, which uses the AVFormatContext and AVStream structures to store the stream information.
AVFormatContext* format_context = avformat_alloc_context();
avformat_open_input(&format_context, filename, NULL, NULL);
printf("File: %s, format: %s\n", filename, format_context->iformat->name);
avformat_find_stream_info(format_context, NULL);
for (unsigned int i = 0; i < format_context->nb_streams; ++i)
{
AVStream* stream = format_context->streams[i];
printf("---- Stream %02d\n", i);
printf(" Time base: %d/%d\n", stream->time_base.num, stream->time_base.den);
printf(" Framerate: %d/%d\n", stream->r_frame_rate.num, stream->r_frame_rate.den);
printf(" Start time: %" PRId64 "\n", stream->start_time);
printf(" Duration: %" PRId64 "\n", stream->duration);
printf(" Type: %s\n", av_get_media_type_string(stream->codecpar->codec_type));
uint32_t fourcc = stream->codecpar->codec_tag;
printf(" FourCC: %c%c%c%c\n", fourcc & 0xff, (fourcc >> 8) & 0xff, (fourcc >> 16) & 0xff, (fourcc >> 24) & 0xff);
}
avformat_close_input(&format_context);
Once we have the individual streams inside a multimedia file, we need to find the specific codecs required to decode those streams into raw audio and raw video data. All codecs are statically registered in libavcodec. You can even create your own codec by defining an instance of the FFCodec structure and registering it as an extern const FFCodec in libavcodec/allcodecs.c, but that is a topic for another post.
To find a codec suitable for a given AVStream, we can use the following code:
AVStream* stream = format_context->streams[i];
const AVCodec* codec = avcodec_find_decoder(stream->codecpar->codec_id);
if (!codec)
{
fprintf(stderr, "Unsupported codec\n");
continue;
}
printf(" Codec: %s, bitrate: %" PRId64 "\n", codec->name, stream->codecpar->bit_rate);
if (codec->type == AVMEDIA_TYPE_VIDEO)
{
printf(" Video resolution: %dx%d\n", stream->codecpar->width, stream->codecpar->height);
}
else if (codec->type == AVMEDIA_TYPE_AUDIO)
{
printf(" Audio: %d channels, sample rate: %d Hz\n",
stream->codecpar->ch_layout.nb_channels,
stream->codecpar->sample_rate);
}
With the correct codec and the codec parameters extracted from the AVStream information, we can now allocate the AVCodecContext structure that will be used to decode the corresponding stream. It is important to remember the index of the stream we want to decode within the AVFormatContext stream list (format_context->streams), because this index will be used later to identify the demuxed packets.
In the following code we are going to select the first video stream included in the multimedia file.
int first_video_stream_index = ...;
AVStream* first_video_stream = format_context->streams[first_video_stream_index];
AVCodecParameters* first_video_stream_codec_params = first_video_stream->codecpar;
const AVCodec* first_video_stream_codec = avcodec_find_decoder(first_video_stream_codec_params->codec_id);
AVCodecContext* codec_context = avcodec_alloc_context3(first_video_stream_codec);
avcodec_parameters_to_context(codec_context, first_video_stream_codec_params);
avcodec_open2(codec_context, first_video_stream_codec, NULL);
Now that we have a working decoder, we can extract the demuxed packets with the AVFormatContext structure and decode them into raw video frames. For that we need two more structures: AVPacket, which holds the encoded packets read from the input multimedia file, and AVFrame, which will receive the raw video frames once the AVCodecContext has decoded those packets.
AVPacket* packet = av_packet_alloc();
AVFrame* frame = av_frame_alloc();
while (av_read_frame(format_context, packet) >= 0)
{
printf("Packet received for stream %02d, pts: %" PRId64 "\n", packet->stream_index, packet->pts);
if (packet->stream_index == first_video_stream_index)
{
int res = avcodec_send_packet(codec_context, packet);
if (res < 0)
{
fprintf(stderr, "Cannot send packet to the decoder: %s\n", av_err2str(res));
break;
}
while (res >= 0)
{
res = avcodec_receive_frame(codec_context, frame);
if (res == AVERROR(EAGAIN) || res == AVERROR_EOF)
{
break;
}
else if (res < 0)
{
fprintf(stderr, "Error while receiving a frame from the decoder: %s\n", av_err2str(res));
goto end;
}
printf("Frame %02" PRId64 ", type: %c, format: %d, pts: %03" PRId64 ", keyframe: %s\n",
codec_context->frame_num, av_get_picture_type_char(frame->pict_type), frame->format, frame->pts,
(frame->flags & AV_FRAME_FLAG_KEY) ? "true" : "false");
}
}
av_packet_unref(packet);
}
end:
av_packet_free(&packet);
av_frame_free(&frame);
avcodec_free_context(&codec_context);
avformat_close_input(&format_context);
The flow of the code above is illustrated in the next picture:
You can download the full code here or access the code repository directly.
To build the example you will need Meson and Ninja. If you have Python and pip installed, you can install them easily with pip3 install meson ninja. Then, once the example archive has been extracted into the ffmpeg-101 folder, go to that folder and run meson setup build. This will automatically download the correct version of FFmpeg if it is not already installed on your system. Then run ninja -C build to build the code and ./build/ffmpeg-101 sample.mp4 to run it.
You should get the following results:
File: sample.mp4, format: mov,mp4,m4a,3gp,3g2,mj2
---- Stream 00
Time base: 1/3000
Framerate: 30/1
Start time: 0
Duration: 30000
Type: video
FourCC: avc1
Codec: h264, bitrate: 47094
Video resolution: 206x80
---- Stream 01
Time base: 1/44100
Framerate: 0/0
Start time: 0
Duration: 440320
Type: audio
FourCC: mp4a
Codec: aac, bitrate: 112000
Audio: 2 channels, sample rate: 44100 Hz
Packet received for stream 00, pts: 0
Send video packet to decoder...
Frame 01, type: I, format: 0, pts: 000, keyframe: true
Packet received for stream 00, pts: 100
Send video packet to decoder...
Frame 02, type: P, format: 0, pts: 100, keyframe: false
Packet received for stream 00, pts: 200
Send video packet to decoder...
Frame 03, type: P, format: 0, pts: 200, keyframe: false
Packet received for stream 00, pts: 300
Send video packet to decoder...
Frame 04, type: P, format: 0, pts: 300, keyframe: false
Packet received for stream 00, pts: 400
Send video packet to decoder...
Frame 05, type: P, format: 0, pts: 400, keyframe: false
Packet received for stream 00, pts: 500
Send video packet to decoder...
Frame 06, type: P, format: 0, pts: 500, keyframe: false
Packet received for stream 00, pts: 600
Send video packet to decoder...
Frame 07, type: P, format: 0, pts: 600, keyframe: false
Packet received for stream 00, pts: 700
Send video packet to decoder...
Frame 08, type: P, format: 0, pts: 700, keyframe: false
Packet received for stream 01, pts: 0
Packet received for stream 01, pts: 1024
Packet received for stream 01, pts: 2048
Packet received for stream 01, pts: 3072
Packet received for stream 01, pts: 4096
Packet received for stream 01, pts: 5120
Packet received for stream 01, pts: 6144
Packet received for stream 01, pts: 7168
Packet received for stream 01, pts: 8192
Packet received for stream 01, pts: 9216
Packet received for stream 01, pts: 10240
Packet received for stream 01, pts: 11264
Packet received for stream 01, pts: 12288
Packet received for stream 01, pts: 13312
Packet received for stream 01, pts: 14336
Packet received for stream 01, pts: 15360
Packet received for stream 01, pts: 16384
Packet received for stream 01, pts: 17408
Packet received for stream 01, pts: 18432
Packet received for stream 01, pts: 19456
Packet received for stream 01, pts: 20480
Packet received for stream 01, pts: 21504
Packet received for stream 00, pts: 800
Send video packet to decoder...
Frame 09, type: P, format: 0, pts: 800, keyframe: false
Packet received for stream 00, pts: 900
Send video packet to decoder...
Frame 10, type: P, format: 0, pts: 900, keyframe: false