Optimization Tips (for libavcodec):

What to optimize:
If you plan to do non-x86 architecture-specific optimizations (SIMD normally),
then take a look in the i386/ directory, as most important functions are
already optimized for MMX.

If you want to do x86 optimizations, then you can either try to fine-tune the
code in the i386/ directory or find some other functions in the C source to
optimize, but there aren't many left.

Understanding these overoptimized functions:
Many functions are difficult to understand because of their optimizations,
which makes it hard to optimize them further or to write
architecture-specific versions. It is recommended to look at older
revisions of the interesting files (for a web frontend try ViewVC at
http://svn.mplayerhq.hu/ffmpeg/trunk/).
Alternatively, look into the other architecture-specific versions in
the i386/, ppc/, alpha/ subdirectories. Even if you don't exactly
comprehend the instructions, they can help you understand the functions
and how they can be optimized.

NOTE: If you still don't understand some function, ask on our mailing list!
(http://lists.mplayerhq.hu/mailman/listinfo/ffmpeg-devel)


WTF is that function good for ....:
The primary purpose of this list is to avoid wasting time optimizing functions
that are rarely used.

put(_no_rnd)_pixels{,_x2,_y2,_xy2}
Used in motion compensation (en/decoding).

avg_pixels{,_x2,_y2,_xy2}
Used in motion compensation of B-frames.
These are less important than the put*pixels functions.

avg_no_rnd_pixels*
Unused.

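As a rough illustration of what the put, put_no_rnd and avg variants compute,
here is a plain C sketch of horizontal half-pel interpolation (the _x2 case)
for an 8-pixel-wide block. The function names, the signatures and the fixed
width of 8 are assumptions made for this sketch; they are not copied from
dsputil.c, so check the real prototypes before writing an optimized version.

```c
#include <stdint.h>

/* Hypothetical reference versions; names and signatures are illustrative. */

/* put, _x2: each output pixel is the average of a source pixel and its
 * right-hand neighbour, rounded up. Reads 9 source pixels per row. */
static void sketch_put_pixels8_x2(uint8_t *dst, const uint8_t *src,
                                  int line_size, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = (uint8_t)((src[x] + src[x + 1] + 1) >> 1);
        dst += line_size;
        src += line_size;
    }
}

/* put_no_rnd, _x2: same average, but without the +1, so it rounds down. */
static void sketch_put_no_rnd_pixels8_x2(uint8_t *dst, const uint8_t *src,
                                         int line_size, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = (uint8_t)((src[x] + src[x + 1]) >> 1);
        dst += line_size;
        src += line_size;
    }
}

/* avg, _x2: the interpolated prediction is averaged with whatever is
 * already stored in dst (the other prediction of a B-frame block). */
static void sketch_avg_pixels8_x2(uint8_t *dst, const uint8_t *src,
                                  int line_size, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            int pred = (src[x] + src[x + 1] + 1) >> 1;
            dst[x] = (uint8_t)((dst[x] + pred + 1) >> 1);
        }
        dst += line_size;
        src += line_size;
    }
}
```

The only difference between the rounding and non-rounding versions is the +1
before the shift, which is why SIMD versions can usually share most of their
code.
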
pix_abs16x16{,_x2,_y2,_xy2}
Used in motion estimation (encoding) with SAD.

pix_abs8x8{,_x2,_y2,_xy2}
Used in motion estimation (encoding) with SAD of MPEG-4 4MV only.
These are less important than the pix_abs16x16* functions.

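For readers unfamiliar with SAD (sum of absolute differences), the following
is a minimal C sketch of the full-pel 16x16 case. The name and signature are
hypothetical and only meant to show the arithmetic an optimized version has
to reproduce.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical full-pel SAD over a 16x16 block; both blocks are assumed
 * to share the same line size. */
static int sketch_pix_abs16x16(const uint8_t *pix1, const uint8_t *pix2,
                               int line_size)
{
    int sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += line_size;
        pix2 += line_size;
    }
    return sum;
}
```
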
put_mspel8_mc* / wmv2_mspel8*
Used only in WMV2.
It is not recommended that you waste your time with these, as WMV2
is an ugly and relatively useless codec.

mpeg4_qpel* / *qpel_mc*
Used in MPEG-4 qpel motion compensation (encoding & decoding).
The qpel8 functions are used only for 4MV,
the avg_* functions are used only for B-frames.
Optimizing them should have a significant impact on qpel
encoding & decoding.

qpel{8,16}_mc??_old_c / *pixels{8,16}_l4
Just used to work around a bug in an old libavcodec encoder version.
Don't optimize them.

tpel_mc_func {put,avg}_tpel_pixels_tab
Used only for SVQ3, so only optimize them if you need fast SVQ3 decoding.

add_bytes/diff_bytes
For huffyuv only, optimize if you want a faster ffhuffyuv codec.

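A minimal C sketch of what these two helpers plausibly do, assuming they are
plain byte-wise add and subtract with wraparound modulo 256. The names and
signatures are hypothetical; check the real prototypes in the headers.

```c
#include <stdint.h>

/* Hypothetical: dst[i] += src[i], wrapping modulo 256. */
static void sketch_add_bytes(uint8_t *dst, const uint8_t *src, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Hypothetical: dst[i] = src1[i] - src2[i], wrapping modulo 256. */
static void sketch_diff_bytes(uint8_t *dst, const uint8_t *src1,
                              const uint8_t *src2, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = (uint8_t)(src1[i] - src2[i]);
}
```

Because both operations wrap modulo 256, adding the reference back to a
difference restores the original bytes exactly, which is what a lossless
codec needs.
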
get_pixels / diff_pixels
Used for encoding; easy to optimize.

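The sketch below shows one plausible shape for these two functions: loading
an 8x8 pixel block into a coefficient block, and storing the difference of
two pixel blocks. It assumes that a coefficient (DCTELEM) is a 16-bit signed
integer and that blocks are 8x8; the names and signatures are hypothetical.

```c
#include <stdint.h>

typedef int16_t SketchDCTELEM; /* assumption: 16-bit signed coefficients */

/* Hypothetical: widen an 8x8 block of pixels into a coefficient block. */
static void sketch_get_pixels(SketchDCTELEM *block, const uint8_t *pixels,
                              int line_size)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[y * 8 + x] = pixels[x];
        pixels += line_size;
    }
}

/* Hypothetical: store the per-pixel difference s1 - s2 of two 8x8 blocks. */
static void sketch_diff_pixels(SketchDCTELEM *block, const uint8_t *s1,
                               const uint8_t *s2, int stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[y * 8 + x] = (SketchDCTELEM)(s1[x] - s2[x]);
        s1 += stride;
        s2 += stride;
    }
}
```
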
clear_blocks
Easiest to optimize.

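To show why this one is easy, here is a C sketch that assumes clear_blocks
zeroes the six 8x8 coefficient blocks of one macroblock (4 luma & 2 chroma)
stored contiguously as 16-bit values. The layout and the signature are
assumptions for illustration only.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: zero 6 contiguous blocks of 64 16-bit coefficients each. */
static void sketch_clear_blocks(int16_t *blocks)
{
    memset(blocks, 0, sizeof(int16_t) * 6 * 64);
}
```

A SIMD version would simply be an unrolled sequence of wide stores of zero.
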
gmc
Used for MPEG-4 gmc.
Optimizing this should have a significant effect on the gmc decoding
speed, but it's very likely impossible to write in SIMD.

gmc1
Used for chroma blocks in MPEG-4 gmc with 1 warp point
(there are 4 luma & 2 chroma blocks per macroblock, so
only 1/3 of the gmc blocks use this, the other 2/3
use the normal put_pixel* code, but only if there is
just 1 warp point).
Note: DivX5 gmc always uses just 1 warp point.

pix_sum
Used for encoding.

hadamard8_diff / sse / sad == pix_norm1 / dct_sad / quant_psnr / rd / bit
Specific compare functions used in encoding; which of these are used
depends on the command line switches.
Don't waste your time with dct_sad & quant_psnr, they aren't
really useful.

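Two of the simpler functions named above can be sketched in plain C as
follows: a pixel sum and a sum of squared errors (sse). The 16x16 block
size, the names and the signatures are assumptions for this sketch, not the
actual dsputil interfaces.

```c
#include <stdint.h>

/* Hypothetical: sum of all pixels in a 16x16 block. */
static int sketch_pix_sum16(const uint8_t *pix, int line_size)
{
    int sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += pix[x];
        pix += line_size;
    }
    return sum;
}

/* Hypothetical: sum of squared differences between two 16x16 blocks. */
static int sketch_sse16(const uint8_t *pix1, const uint8_t *pix2,
                        int line_size)
{
    int sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            int d = pix1[x] - pix2[x];
            sum += d * d;
        }
        pix1 += line_size;
        pix2 += line_size;
    }
    return sum;
}
```
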
put_pixels_clamped / add_pixels_clamped
Used for en/decoding in the IDCT; easy to optimize.
Note: some optimized IDCTs include the add/put clamped code, in which
case put_pixels_clamped / add_pixels_clamped will be unused.

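A minimal C sketch of the clamping step, assuming 8x8 blocks of 16-bit
coefficients that are saturated to the 0..255 pixel range. The sketch_ names
are hypothetical; the argument order follows the put_pixels_clamped prototype
quoted in the Alignment section.

```c
#include <stdint.h>

/* Saturate an int to the 0..255 range of an 8-bit pixel. */
static uint8_t sketch_clamp_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical: store a clamped 8x8 coefficient block as pixels. */
static void sketch_put_pixels_clamped(const int16_t *block, uint8_t *pixels,
                                      int line_size)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            pixels[x] = sketch_clamp_u8(block[y * 8 + x]);
        pixels += line_size;
    }
}

/* Hypothetical: add an 8x8 coefficient block to existing pixels, clamped. */
static void sketch_add_pixels_clamped(const int16_t *block, uint8_t *pixels,
                                      int line_size)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            pixels[x] = sketch_clamp_u8(pixels[x] + block[y * 8 + x]);
        pixels += line_size;
    }
}
```

On x86 the clamp maps naturally onto saturating pack instructions, which is
a large part of why these are easy to optimize.
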
idct/fdct
idct (encoding & decoding)
fdct (encoding)
Difficult to optimize.

dct_quantize_trellis
Used for encoding with trellis quantization.
Difficult to optimize.

dct_quantize
Used for encoding.

dct_unquantize_mpeg1
Used in MPEG-1 en/decoding.

dct_unquantize_mpeg2
Used in MPEG-2 en/decoding.

dct_unquantize_h263
Used in MPEG-4/H.263 en/decoding.

FIXME remaining functions?
BTW, most of these functions are in dsputil.c/.h, some are in mpegvideo.c/.h.


Alignment:
Some instructions on some architectures have strict alignment restrictions,
for example most SSE/SSE2 instructions on x86.
The minimum guaranteed alignment is written in the .h files, for example:
void (*put_pixels_clamped)(const DCTELEM *block/*align 16*/, UINT8 *pixels/*align 8*/, int line_size);


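While developing an optimized version, it can help to verify that callers
really honor the documented alignment. The C sketch below is one possible
debug check, not something present in libavcodec; the helper names are
hypothetical, and the 16 and 8 byte values are taken from the comment in the
prototype above.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if ptr is a multiple of align (align must be a power
 * of two). */
static int sketch_is_aligned(const void *ptr, uintptr_t align)
{
    return ((uintptr_t)ptr & (align - 1)) == 0;
}

/* Hypothetical debug check to call at the top of a SIMD implementation
 * of put_pixels_clamped before using aligned loads and stores. */
static void sketch_check_put_pixels_clamped_args(const int16_t *block,
                                                 const uint8_t *pixels)
{
    assert(sketch_is_aligned(block, 16));
    assert(sketch_is_aligned(pixels, 8));
}
```
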
Links:
http://www.aggregate.org/MAGIC/

x86-specific:
http://developer.intel.com/design/pentium4/manuals/248966.htm

The IA-32 Intel Architecture Software Developer's Manual, Volume 2:
Instruction Set Reference
http://developer.intel.com/design/pentium4/manuals/245471.htm

http://www.agner.org/assem/

AMD Athlon Processor x86 Code Optimization Guide:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf

GCC asm links:
Official documentation, but quite ugly:
http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

A bit old (note that "+" is valid for input-output operands, even though this document says otherwise):
http://www.cs.virginia.edu/~clc5q/gcc-inline-asm.pdf