SoX(1)									SoX(1)



NAME
       soxexam - SoX Examples (CHEAT SHEET)

CONVERSIONS
       Introduction

       In general, SoX will attempt to take an input sound file format and
       convert it into a new file format using a similar data type and sample
       rate.  For instance, "sox monkey.au monkey.wav" would try to convert
       the mono 8000 Hz u-law sample .au file that comes with SoX to an
       8000 Hz u-law .wav file.

       If  an  output  format  doesn’t support the same data type as the input
       file then SoX will generally select a default data type to save it  in.
       You  can override the default data type selection by using command line
       options.	 This is also useful for producing an output file with	higher
       or lower precision data and/or sample rate.

       Most file formats that contain headers can be read in automatically.
       When working with header-less file formats, the user must manually
       tell SoX the data type and sample rate using command line options.

       When working with header-less files (raw files), you may take advantage
       of the pseudo-file types of .ub, .uw, .sb, .sw, .ul, and .sl.  By using
       these  extensions  on  your  filenames you will not have to specify the
       corresponding options on the command line.

       Precision

       The following data types and formats can be represented by their total
       uncompressed bit precision.  When converting from one data type to
       another, care must be taken to ensure the new type has an equal or
       greater precision.  If not, the audio quality will be degraded.  This
       is not always a bad thing when you’re working with things such as
       voice audio and are concerned about disk space or bandwidth of the
       audio data.

               Data Format      Precision
               _____________    _________
               unsigned byte     8-bit
               signed byte       8-bit
               u-law            14-bit
               A-law            13-bit
               unsigned word    16-bit
               signed word      16-bit
               ADPCM            16-bit
               GSM              16-bit
               unsigned long    32-bit
               signed long      32-bit
               _____________    _________
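
       As a rough rule of thumb (not stated in the table itself), each bit
       of linear precision is worth about 6 dB of dynamic range.  The Python
       sketch below applies that rule to some of the precisions listed
       above.  The function name is made up for illustration, and for the
       companded formats (u-law, A-law) the figure is only an approximation
       of their equivalent linear precision.

```python
import math

def dynamic_range_db(bits):
    """Approximate dynamic range of a linear format with `bits` of precision."""
    return 20.0 * math.log10(2 ** bits)

# Precisions copied from the table above.
PRECISION_BITS = {
    "unsigned byte": 8,
    "A-law": 13,
    "u-law": 14,
    "signed word": 16,
    "signed long": 32,
}

for name, bits in PRECISION_BITS.items():
    print(f"{name:14s} {bits:2d}-bit  ~{dynamic_range_db(bits):.1f} dB")
```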

       Examples

       Use  the ’-V’ option on all your command lines.	It makes SoX print out
       its idea of what is going on.  ’-V’ is your friend.

       To convert from unsigned bytes at 8000 Hz to signed words at 8000 Hz:

	 sox -r 8000 -c 1 filename.ub newfile.sw

       To convert from Apple’s AIFF format to Microsoft’s WAV format:

	 sox filename.aiff filename.wav

       To convert from mono raw 8000 Hz 8-bit unsigned PCM data to a WAV file:

	 sox -r 8000 -u -b -c 1 filename.raw filename.wav

       SoX may also be used to convert sample rates.  Downconverting will
       reduce the bandwidth of a sample, but will also reduce storage space
       on your disk.  All such conversions are lossy and will introduce some
       noise.  You should really pass your sample through a low-pass filter
       prior to downconverting, as this will prevent alias signals (which
       would sound like additional noise).  For example, to convert from a
       sample recorded at 11025 Hz to a u-law file at 8000 Hz sample rate:

	 sox infile.wav -t au -r 8000 -U -b -c 1 outputfile.au

       To  add	a  low-pass filter (note use of stdout for output of the first
       stage and stdin for input on the second stage):

	 sox infile.wav -t raw -s -w -c 1 - lowpass 3700  |
	   sox -t raw -r 11025 -s -w -c 1 - -t au -r 8000 -U -b -c 1 ofile.au
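
       To see why the low-pass filter matters, recall that a file sampled
       at 8000 Hz can only represent frequencies up to 4000 Hz; anything
       higher that is left in the signal folds back into that range as an
       alias.  The Python sketch below computes where a pure tone would land
       if it were resampled with no filtering at all.  It is a textbook
       idealisation with a made-up function name, not a description of what
       SoX does internally.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone shows up after unfiltered sampling."""
    # Sampling cannot tell a tone apart from one shifted by any whole
    # multiple of the sample rate, so fold around the nearest multiple.
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)

# A 5000 Hz tone fits in an 11025 Hz file but not in an 8000 Hz one.
print(alias_frequency(5000, 8000))   # folds back to 3000
print(alias_frequency(3000, 8000))   # already below 4000, unchanged
```

       This is why the example above filters at 3700 Hz, just under half of
       the 8000 Hz output rate, before converting.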

       If you hear some clicks and pops when converting	 to  u-law  or	A-law,
       reduce  the output level slightly, for example this will decrease it by
       20%:

	 sox infile.wav -t au -r 8000 -U -b -c 1 -v .8 outputfile.au


       SoX is great to use along with other command line programs by passing
       data between the programs using pipelines.  The most common example is
       to use mpg123 to convert mp3 files into wav files.  The following
       command line will do this:

         mpg123 -b 10000 -s filename.mp3 |
           sox -t raw -r 44100 -s -w -c 2 - filename.wav

       When working with totally unknown audio data, the "auto" file format
       may be of use.  It attempts to guess what the file type is, and you
       may then save the data in a known audio format.

	 sox -V -t auto filename.snd filename.wav

       It is important to understand how the internals of SoX work  with  com-
       pressed	audio  including  u-law,  A-law, ADPCM, or GSM.	 SoX takes ALL
       input data types and converts them to uncompressed 32-bit signed	 data.
       It  will	 then  convert this internal version into the requested output
       format.	This means additional noise can be introduced from decompress-
       ing data and then recompressing.	 If applying multiple effects to audio
       data, it is best to save the intermediate data as PCM data.  After  the
       final effect is performed, then you can specify it as a compressed out-
       put format.  This will keep noise introduction to a minimum.

       The following example applies various effects to an 8000 Hz ADPCM input
       file and ends up with the final file as 44100 Hz ADPCM.

	 sox firstfile.wav -r 44100 -s -w secondfile.wav
	 sox secondfile.wav thirdfile.wav swap
	 sox thirdfile.wav -a -b finalfile.wav mask

       Under a DOS shell, you can convert several audio files to a new output
       format using something similar to the following command line:

         FOR %X IN (*.RAW) DO sox -r 11025 -w -s -t raw %X %X.wav

EFFECTS
       Special thanks go to Juergen Mueller (jmueller@uia.ua.ac.be) for this
       write-up on effects.

       Introduction:

       The core problem is that you need some experience in using effects
       before you can make any old sound file sound absolutely hip with them.
       There isn’t any rule-based system which tells you the correct setting
       of all the parameters for every effect.  But after some time you will
       become an expert in using effects.

       Here are some examples which can be used with any music sample.  (For
       a sample where only a single instrument is playing, extreme parameter
       settings may produce well-known "typical" or "classical" sounds.
       Likewise for drums, vocals or guitars.)

       Single effects will be explained, along with some parameter settings
       that can be used to understand the theory by listening to the sound
       file with the added effect.

       Using multiple effects in parallel or in series can result either in a
       very nice sound or (mostly) in such an overload of sound variations
       that your ear can follow the sound but you will feel unsatisfied.
       Hence, when first using effects, try to combine as few of them as
       possible.  We don’t cover the combination of effects in the examples
       because too many combinations are possible and you really need a very
       fast machine and a lot of memory to play them in real-time.

       However,	 real-time  playing  of	 sounds will greatly speed up learning
       and/or tuning the parameter settings for your sounds in	order  to  get
       that "perfect" effect.

       Basically, we will use the "play" front-end of SoX since it is easier
       to listen to sounds coming out of the speaker or earphone than to look
       at cryptic data in sound files.

       For easy listening of file.xxx ("xxx" is any sound format):

	     play file.xxx effect-name effect-parameters

       Or more SoX-like (for "dsp" output on a UNIX/Linux computer):

             sox file.xxx -t ossdsp -w -s /dev/dsp effect-name effect-parameters

       or (for "au" output):

             sox file.xxx -t sunau -w -s /dev/audio effect-name effect-parameters

       And for data freaks:

	     sox file.xxx file.yyy effect-name effect-parameters

       Additional  options  can	 be used. However, in this case, for real-time
       playing you’ll need a very fast machine.

       Notes:

       I played all examples in real-time on a Pentium 100 with 32 MB and
       Linux 2.0.30 using a self-recorded sample (3:15 min long in "wav"
       format with 44.1 kHz sample rate and stereo 16 bit).  The sample should
       not contain any of the effects.  However, if you take any recording of
       a sound track from radio or tape or CD, and it sounds like a live
       concert or ten people are playing the same rhythm with their drums or
       funky-grooves, then take any other sample.  (Typically, fewer than four
       different instruments and no synthesizer in the sample is suitable.
       Likewise, the combination vocal, drums, bass and guitar.)

       Effects:

       Echo

       An echo effect can be found naturally in the mountains: standing
       somewhere on a mountain and shouting a single word will result in one
       or more repetitions of the word (if not, turn around a bit and try
       again, or climb to the next mountain).

       The time difference between the shout and its repetition is the delay
       (time); the loudness of the repetition is the decay.  Multiple echos
       can have different delays and decays.

       It is very popular to use echos to play an instrument together with
       itself, like some guitar players (Brian May from Queen) or vocalists
       are doing.  For music samples of more than one instrument, echo can be
       used to add a second sample shortly after the original one.

       This will sound as if you are doubling the number of instruments	 play-
       ing in the same sample:

	     play file.xxx echo 0.8 0.88 60.0 0.4

       If the delay is very short, then it sounds like a (metallic) robot
       playing music:

	     play file.xxx echo 0.8 0.88 6.0 0.4

       A longer delay will sound like an open air concert in the mountains:

	     play file.xxx echo 0.8 0.9 1000.0 0.3

       One mountain more, and:

	     play file.xxx echo 0.8 0.9 1000.0 0.3 1800.0 0.25
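
       The four numbers after "echo" in the commands above are gain-in,
       gain-out, then one delay (in milliseconds) and decay pair per echo.
       The Python sketch below is a simplified model of that idea: each echo
       is a delayed, attenuated copy of the input added back to it.  It is
       meant for building intuition only; the function name and the tiny
       1000 Hz sample rate are assumptions, and SoX’s own implementation may
       differ in detail.

```python
def echo(samples, rate, gain_in, gain_out, taps):
    """Mix delayed copies of the input back into it.

    taps is a list of (delay_ms, decay) pairs, as on the command line.
    """
    out = []
    for n, x in enumerate(samples):
        y = gain_in * x
        for delay_ms, decay in taps:
            d = int(round(delay_ms * rate / 1000.0))  # delay in samples
            if n >= d:
                y += decay * samples[n - d]           # echo of the input
        out.append(gain_out * y)
    return out

# A single click, echoed once after 2 ms at half the loudness.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(echo(impulse, rate=1000, gain_in=1.0, gain_out=1.0, taps=[(2.0, 0.5)]))
```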

       Echos

       Like the echo effect, echos stands for "ECHO in Sequel"; that is, the
       first echos takes the input, the second the input and the first echos,
       the third the input and the first and the second echos, and so on.
       Care should be taken when using many echos (see introduction); a
       single echos has the same effect as a single echo.

       The sample will be bounced twice in symmetric echos:

	     play file.xxx echos 0.8 0.7 700.0 0.25 700.0 0.3

       The sample will be bounced twice in asymmetric echos:

	     play file.xxx echos 0.8 0.7 700.0 0.25 900.0 0.3

       The sample will sound as if played in a garage:

	     play file.xxx echos 0.8 0.7 40.0 0.25 63.0 0.3
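
       The difference between echo and echos can be made concrete with a
       small Python sketch.  It follows one reading of the description
       above: in the sequential version each stage echoes the input plus all
       earlier echoes, while in the parallel version every echo reads only
       the input.  Gains are left out, delays are given directly in samples,
       and the function names are made up; this is not SoX’s code.

```python
def shifted(signal, d):
    """Delay signal by d samples, keeping its length (zero-padded)."""
    return ([0.0] * d + list(signal))[:len(signal)]

def echos_sequential(samples, taps):
    """Each stage echoes the input plus all earlier echoes."""
    current = list(samples)
    for d, decay in taps:
        current = [c + decay * s for c, s in zip(current, shifted(current, d))]
    return current

def echo_parallel(samples, taps):
    """For comparison: every echo is taken from the input only."""
    out = list(samples)
    for d, decay in taps:
        out = [o + decay * s for o, s in zip(out, shifted(samples, d))]
    return out

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
taps = [(1, 0.5), (2, 0.5)]           # (delay in samples, decay)
print(echos_sequential(impulse, taps))
print(echo_parallel(impulse, taps))
```

       With the click used here, the sequential version produces one extra
       repetition (the echo of the first echo) that the parallel version
       lacks.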

       Chorus

       The chorus effect has its name because it will often be used to make  a
       single  vocal  sound  like  a  chorus.  But  it can be applied to other
       instrument samples too.

       It works like the echo effect with a short delay, but the delay isn’t
       constant.  The delay is varied using a sinusoidal or triangular
       modulation.  The modulation depth defines how far the modulated delay
       moves before or after the nominal delay.  Hence the delayed sound will
       sound slower or faster; that is, the delayed sound is tuned around the
       original one, like in a chorus where some vocals are a bit out of
       tune.

       The  typical  delay is around 40ms to 60ms, the speed of the modulation
       is best near 0.25Hz and the modulation depth around 2ms.
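
       The varying delay can be pictured with the Python sketch below, which
       computes the instantaneous delay for a sinusoidal or triangular
       modulation.  It assumes the delay swings symmetrically by the depth
       around the nominal delay, which is one reading of the description
       above; the function name and the exact modulation range are
       assumptions, not taken from SoX.

```python
import math

def modulated_delay_ms(t, delay_ms, speed_hz, depth_ms, shape="sine"):
    """Instantaneous delay at time t (seconds) for a chorus-style effect."""
    phase = (speed_hz * t) % 1.0          # position within one cycle, 0..1
    if shape == "sine":
        lfo = math.sin(2.0 * math.pi * phase)
    else:
        # Triangle wave: 0 at phase 0, +1 at 0.25, 0 at 0.5, -1 at 0.75.
        lfo = 1.0 - 4.0 * abs(((phase + 0.25) % 1.0) - 0.5)
    return delay_ms + depth_ms * lfo

# Typical values from the text: 55 ms delay, 0.25 Hz speed, 2 ms depth.
for t in (0.0, 1.0, 2.0, 3.0):
    print(t, modulated_delay_ms(t, 55.0, 0.25, 2.0, "sine"),
          modulated_delay_ms(t, 55.0, 0.25, 2.0, "triangle"))
```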

       A single delay will make the sample more overloaded:

	     play file.xxx chorus 0.7 0.9 55.0 0.4 0.25 2.0 -t

       Two delays of the original samples sound like this:

             play file.xxx chorus 0.6 0.9 50.0 0.4 0.25 2.0 -t \
               60.0 0.32 0.4 1.3 -s

       A big chorus of the sample is (three additional samples):

             play file.xxx chorus 0.5 0.9 50.0 0.4 0.25 2.0 -t \
               60.0 0.32 0.4 2.3 -t 40.0 0.3 0.3 1.3 -s

       Flanger

       The flanger effect is like the chorus effect, but the delay varies
       between 0ms and a maximum of 5ms.  It sounds like wind blowing,
       sometimes faster or slower, including changes of speed.

       The flanger effect is widely used in funk and soul music, where the
       guitar sound often varies slowly or a bit faster.

       The  typical delay is around 3ms to 5ms, the speed of the modulation is
       best near 0.5Hz.

       Now, let’s groove the sample:

	     play file.xxx flanger 0.6 0.87 3.0 0.9 0.5 -s

       Listen carefully to the difference between sinusoidal and triangular
       modulation:

	     play file.xxx flanger 0.6 0.87 3.0 0.9 0.5 -t

       If the decay is a bit lower, then the effect sounds more popular:

	     play file.xxx flanger 0.8 0.88 3.0 0.4 0.5 -t

       The drunken loudspeaker system:

	     play file.xxx flanger 0.9 0.9 4.0 0.23 1.3 -s

       Reverb

       The reverb effect is often used in audience halls which are too small
       or contain too many visitors, who disturb (dampen) the reflection of
       sound at the walls.  Reverb will make the sound be perceived as if it
       were in a large hall.  You can try the reverb effect in your bathroom
       or garage or sports hall by loudly shouting some words.  You’ll hear
       the words reflected from the walls.

       The biggest problem in using the reverb effect is the correct setting
       of the (wall) delays such that the sound is realistic and doesn’t sound
       like music playing in a tin can or have overloaded feedback which
       destroys any illusion of playing in a big hall.  To help you obtain
       realistic reverb effects, you should decide first how long the reverb
       should last until it is no longer loud enough to be registered by your
       ears.  This is done by varying the reverb time "t".  To simulate small
       halls, use 200ms.  To simulate large halls, use 1000ms.  Clearly, the
       walls of such a hall aren’t far away, so you should define the hall by
       giving every wall its own delay time.  However, if a wall is too far
       away for the reverb time, you won’t hear its reverb, so the nearest
       wall is best placed at a delay of "t/4" and the farthest at "t/2".
       You can try other distances as well, but it won’t sound very
       realistic.  The walls shouldn’t stand too close to each other, nor at
       integer multiples of each other’s distance (so avoid walls like 200.0
       and 202.0, or something like 100.0 and 200.0).
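
       These rules of thumb are easy to check mechanically.  The Python
       sketch below flags wall delays that fall outside t/4 to t/2, sit too
       close together, or are integer multiples of one another.  The
       function name is made up, and the 5 ms "too close" threshold is an
       assumption chosen for illustration; the text only gives 200.0 and
       202.0 as an example of walls that are too close.

```python
def check_walls(reverb_time_ms, wall_delays_ms, min_gap_ms=5.0):
    """Return a list of complaints about a set of (positive) wall delays."""
    problems = []
    walls = sorted(wall_delays_ms)
    for w in walls:
        if not reverb_time_ms / 4.0 <= w <= reverb_time_ms / 2.0:
            problems.append(f"{w} ms is outside t/4..t/2")
    for i, lo in enumerate(walls):
        for hi in walls[i + 1:]:
            if hi - lo < min_gap_ms:            # assumed threshold
                problems.append(f"{lo} and {hi} ms are too close")
            elif abs(hi / lo - round(hi / lo)) < 1e-9:
                problems.append(f"{hi} ms is an integer multiple of {lo} ms")
    return problems

print(check_walls(600.0, [180.0, 200.0, 220.0, 240.0]))  # no complaints
print(check_walls(800.0, [200.0, 202.0]))                # too close
print(check_walls(400.0, [100.0, 200.0]))                # integer multiple
```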

       Since  audience	halls  do have a lot of walls, we will start designing
       one beginning with one wall:

	     play file.xxx reverb 1.0 600.0 180.0

       One wall more:

	     play file.xxx reverb 1.0 600.0 180.0 200.0

       Next two walls:

	     play file.xxx reverb 1.0 600.0 180.0 200.0 220.0 240.0

       Now, why not a futuristic hall with six walls:

             play file.xxx reverb 1.0 600.0 180.0 200.0 220.0 240.0 \
               280.0 300.0

       If  you	run out of machine power or memory, then stop as many applica-
       tions as possible (every interrupt will consume a lot of CPU time which
       for bigger halls is absolutely necessary).

       Phaser

       The phaser effect is like the flanger effect, but it uses a reverb
       instead of an echo and does phase shifting.  You’ll hear the difference
       in the examples comparing both effects (simply change the effect
       name).  The delay modulation can be sinusoidal or triangular; the
       latter is preferable for multiple instruments.  For single instrument
       sounds, the sinusoidal phaser effect will give a sharper phasing
       effect.  The decay shouldn’t be too close to 1.0, which will cause
       dramatic feedback.  A good range is about 0.5 to 0.1 for the decay.

       We will take a parameter setting as for the flanger before (gain-out is
       lower since feedback can raise the output dramatically):

	     play file.xxx phaser 0.8 0.74 3.0 0.4 0.5 -t

       The drunken loudspeaker system (now less alcohol):

	     play file.xxx phaser 0.9 0.85 4.0 0.23 1.3 -s

       A popular sound of the sample is as follows:

	     play file.xxx phaser 0.89 0.85 1.0 0.24 2.0 -t

       The sample sounds as if ten springs are in your ears:

	     play file.xxx phaser 0.6 0.66 3.0 0.6 2.0 -t

       Compander

       The  compander  effect  allows the dynamic range of a signal to be com-
       pressed or expanded.  For most situations, the attack time (response to
       the music getting louder) should be shorter than the decay time because
       our ears are more sensitive to suddenly loud  music  than  to  suddenly
       soft music.

       For  example,  suppose  you  are	 listening  to	Strauss’  "Also Sprach
       Zarathustra" in a noisy environment such as a car.  If you turn up  the
       volume  enough  to hear the soft passages over the road noise, the loud
       sections will be too loud.  You could try this:

	     play file.xxx compand 0.3,1 -90,-90,-70,-70,-60,-20,0,0 -5 0 0.2

       The transfer function ("-90,...") says that very	 soft  sounds  between
       -90  and	 -70 decibels (-90 is about the limit of 16-bit encoding) will
       remain unchanged.  That keeps the compander from boosting the volume on
       "silent"	 passages  such	 as between movements.	However, sounds in the
       range -60 decibels to 0 decibels (maximum volume) will  be  boosted  so
       that  the  60-dB dynamic range of the original music will be compressed
       3-to-1 into a 20-dB range, which is wide enough to enjoy the music  but
       narrow  enough  to get around the road noise.  The -5 dB output gain is
       needed to avoid clipping (the number is inexact,	 and  was  derived  by
       experimentation).   The	0  for the initial volume will work fine for a
       clip that starts with a bit of silence, and the delay of	 0.2  has  the
       effect  of  causing the compander to react a bit more quickly to sudden
       volume changes.
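
       The transfer function is a list of input/output level pairs in
       decibels, and levels in between are mapped along straight lines
       joining neighbouring pairs.  The Python sketch below evaluates the
       curve from the example that way.  The function names are made up, the
       straight-line interpolation is the usual reading of such a curve
       rather than a statement about SoX internals, and levels outside the
       listed range are deliberately not handled.

```python
def parse_points(spec):
    """Turn 'in1,out1,in2,out2,...' into a list of (in_dB, out_dB) pairs."""
    values = [float(v) for v in spec.split(",")]
    return list(zip(values[0::2], values[1::2]))

def transfer_db(level_db, points):
    """Map an input level to an output level along the point-to-point lines."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= level_db <= x1:
            return y0 + (level_db - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("level outside the range covered by the points")

POINTS = parse_points("-90,-90,-70,-70,-60,-20,0,0")
for level in (-80, -65, -60, -30, 0):
    print(level, "dB in ->", transfer_db(level, POINTS), "dB out")
```

       An input at -80 dB stays at -80 dB, while -60 dB comes out at -20 dB
       and -30 dB at -10 dB, which is the 3-to-1 squeeze of the top 60 dB
       into 20 dB described above.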

       Changing the Rate of Playback

       You can use "stretch" to change the rate of playback of an audio
       sample while preserving the pitch.  For example, to play at 1/2 the
       speed:

	     play file.wav stretch 2

       To play a file at twice the speed:

	     play file.wav stretch .5

       Other related effects are "speed", to change the speed of play (and
       change the pitch accordingly), and "pitch", to alter the pitch of a
       sample.  For example, to speed up a sample so it plays in 1/2 the time
       (for those Mickey Mouse voices):

	     play file.wav speed 2

       To raise the pitch of a sample by one semitone (100 cents):

	     play file.wav pitch 100
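
       The numbers in this section are related by simple arithmetic, which
       the Python sketch below spells out: a stretch factor scales the
       playing time, and a pitch shift in cents corresponds to a frequency
       ratio of 2 raised to cents/1200.  The helper names are made up for
       illustration, and the 195-second figure is just the 3:15 sample
       mentioned in the notes.

```python
import math

def stretched_duration(seconds, factor):
    """Playing time after 'stretch factor' (pitch is preserved)."""
    return seconds * factor

def cents_to_ratio(cents):
    """Frequency ratio for a pitch shift given in cents."""
    return 2.0 ** (cents / 1200.0)

def ratio_to_cents(ratio):
    """Pitch shift in cents produced by a frequency (or speed) ratio."""
    return 1200.0 * math.log2(ratio)

print(stretched_duration(195, 2))     # the 3:15 sample at half speed
print(cents_to_ratio(100))            # one semitone up
print(ratio_to_cents(2.0))            # doubling the speed raises the pitch
                                      # by one octave
```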



       Other effects (copy, rate, avg, stat, vibro, lowp, highp, band, reverb)

       The  other  effects are simple to use. However, an "easy to use manual"
       should be given here.

       More effects (to do !)

       There are a lot of effects around like noise gates, compressors,
       wah-wah, stereo effects and so on.  They should be implemented, making
       SoX more useful for sound mixing together with a great variety of
       different sound effects.

       Combining  effects  by  using them in parallel or serially on different
       channels needs some easy mechanism which is stable  for	use  in	 real-
       time.

       Really missing are the changing of parameters and the starting/stopping
       of effects while playing samples in real-time!

       Good luck and have fun with all the effects!

	    Juergen Mueller	     (jmueller@uia.ua.ac.be)


SEE ALSO
       sox(1), play(1), rec(1)

AUTHOR
       Juergen Mueller	   (jmueller@uia.ua.ac.be)

       Updates by Anonymous.



			       December 11, 2001			SoX(1)