shithub: sox

ref: b5db60e89ae731cf22f7405722353b06d9b8977f
dir: /soxexam.txt/

View raw version
SoX(1)							   SoX(1)



NAME
       soxexam - SoX Examples (CHEAT SHEET)

CONVERSIONS
       Introduction

       In  general,  sox will attempt to take an input sound file
       format and convert it to a new file format using a similar
       data  type  and sample rate.  For instance, "sox monkey.au
       monkey.wav" would try and convert the  mono  8000Hz  u-law
       sample .au file that comes with sox to a 8000Hz u-law .wav
       file.

       If an output format doesn't support the same data type  as
       the  input  file	 then sox will generally select a default
       data type to save it in.	 You  can  override  the  default
       data  type  selection by using command line options.  This
       is also useful for producing a output file with higher  or
       lower precision data and/or sample rate.

       Most  file  formats that contain headers can automatically
       be read in.  When working  with	headerless  file  formats
       then  a user must manually tell sox the data type and sam�
       ple rate using command line options.

       When working with headerless files (raw	files),	 you  may
       take advantage of they pseudo-file types of .ub, .uw, .sb,
       .sw, .ul, and .sl.  By  using  these  extensions	 on  your
       filenames  you  will not have to specify the corresponding
       options on the command line.

       Precision

       The following data types and formats can be represented by
       their  total  uncompressed bit precision.  When converting
       from one data type to another care must be taken to insure
       it  has	an  equal  or greater precision.  If not then the
       audio quality will be degraded.	This is not always a  bad
       thing  when  your  working with things such as voice audio
       and are concerned about disk space  or  bandwidth  of  the
       audio data.

	       Data Format    Precision
	       ___________    _________
	       unsigned byte	8-bit
	       signed byte	8-bit
	       u-law	       12-bit
	       a-law	       12-bit
	       unsigned word   16-bit
	       signed word     16-bit
	       ADPCM	       16-bit
	       GSM	       16-bit
	       unsigned long   32-bit
	       signed long     32-bit
	       ___________    _________

       Examples

       Use  the	 '-V' option on all your command lines.	 It makes
       SoX print out its idea of what is going on.  '-V' is  your
       friend.

       To  convert from unsigned bytes at 8000 Hz to signed words
       at 8000 Hz:

	 sox -r 8000 -c 1 filename.ub newfile.sw

       To convert from Apple's AIFF  format  to	 Microsoft's  WAV
       format:

	 sox filename.aiff filename.wav

       To  convert  from mono raw 8000 Hz 8-bit unsigned PCM data
       to a WAV file:

	 sox -r 8000 -u -b -c 1 filename.raw filename.wav

       SoX is great to use along with other command line programs
       by passing data between the programs using pipelines.  The
       most common example is to use mpg123 to convert mp3  files
       in to wav files.	 The following command line will do this:

	 mpg123 -b 10000 -s filename.mp3 | sox -t raw -r 44100 -s
       -w -c 2 - filename.wav

       When  working  with  totally  unknown  audio data then the
       "auto" file format may be of use.  It  attempts	to  guess
       what  the  file	type  is and then you may save it in to a
       known audio format.

	 sox -V -t auto filename.snd filename.wav

       It is important to understand how  the  internals  of  SoX
       work  with compressed audio including u-law, a-law, ADPCM,
       or GSM.	SoX takes ALL input data types and converts  them
       to  uncompressed 32-bit signed data.  It will then convert
       this internal version into the  requested  output  format.
       This  means  unneeded  noise can be introduced from decom�
       pressing data and then recompressing.  If applying  multi�
       ple  effects to audio data it is best to save the interme�
       diate data as PCM data.	After the final	 effect	 is  per�
       formed then you can specify it as a compressed output for�
       mat.  This will keep noise introduction to a minimum.

       The following example is to apply various  effects  to  an
       8000  Hz	 ADPCM	input file and then end up with the final
       file as 44100 Hz ADPCM.

	 sox firstfile.wav -r 44100 -s -w secondfile.wav
	 sox secondfile.wav thirdfile.wav swap
	 sox thirdfile.wav -a -b finalfile.wav mask

       Under a DOS shell, you can convert several audio files  to
       an  new	output format using something similar to the fol�
       lowing command line:

	 FOR %X IN (*.RAW) DO sox -r 11025 -w -s -t raw $X $X.wav

EFFECTS
       Special	   thanks     goes     to     Juergen	  Mueller
       (jmeuller@uia.au.ac.be) for this write up on effects.

       Introduction:

       The core problem is that you need some experience in using
       effects	in  order  to say "that any old sound file sounds
       with effects absolutely hip". There isn't  any  rule-based
       system  which  tell  you	 the  correct  setting of all the
       parameters for every effect.  But after some time you will
       become an expert in using effects.

       Here  are  some	examples which can be used with any music
       sample.	(For a sample where only a single  instrument  is
       playing,	 extreme  parameter  setting  may make well-known
       "typically" or "classical" sounds.  Likewise,  for  drums,
       vocals or guitars.)

       Single  effects will be explained and some given parameter
       settings that can be used to understand the theory by lis�
       tening to the sound file with the added effect.

       Using multiple effects in parallel or in sequel can result
       either in very perfect sound or ( mostly ) in  a	 dramatic
       overloading in variations of sounds such that your ear may
       follow the sound but you will feel unsatisfied. Hence, for
       the  first  time using effects try to compose them as less
       as possible. We don't regard the composition of effects in
       the examples because to many combinations are possible and
       you really need a very fast machine and a lot of memory to
       play them in real-time.

       And real-time playing of sounds will speed up learning the
       parameter setting.

       Basically, we will use the "play" front-end of  SOX  since
       it is easier to listen sounds coming out of the speaker or
       earphone instead of  looking  at	 cryptic  data	in  sound
       files.

       For easy listening of file.xxx ( "xxx" is any sound format
       ):

	     play file.xxx effect-name effect-parameters

       Or more SOX-like ( for "dsp" output ):

	     sox file.xxx -t ossdsp -w	-s  /dev/dsp  effect-name
       effect-parameters

       or ( for "au" output ):

	      sox  file.xxx -t sunau -w -s /dev/audio effect-name
       effect-parameters

       And for date freaks:

	     sox file.xxx file.yyy effect-name effect-parameters

       Additional options can be used. However, in this case, for
       real-time playing you'll need a very fast machine.

       Notes:

       I  played  all examples in real-time on a Pentium 100 with
       32 MB and Linux 2.0.30 using a self-recorded sample ( 3:15
       min  long  in  "wav"  format with 44.1 kHz sample rate and
       stereo 16 bit ).	 The sample should not contain any of the
       effects.	 However,  if  you  take any recording of a sound
       track from radio or tape or cd, and it sounds like a  live
       concert	or  ten	 people	 are playing the same rhythm with
       their drums or funky-grooves, then take any other  sample.
       (Typically,  less  then	four different instruments and no
       synthesizer in the sample is suitable. Likewise, the  com�
       bination vocal, drums, bass and guitar.)

       Effects:

       Echo

       An  echo	 effect	 can be naturally found in the mountains,
       standing somewhere on a mountain	 and  shouting	a  single
       word  will result in one or more repetitions of the word (
       if not, turn a bit around ant try next, or  climb  to  the
       next mountain ).

       However,	 the time difference between shouting and repeat�
       ing is the delay (time), its loudness is the decay. Multi�
       ple echos can have different delays and decays.

       Very  popular  is  using	 echos to play an instrument with
       itself together, like some guitar players ( Brain May from
       Queen ) or vocalists are doing.	For music samples of more
       than one instrument, echo can be used to add a second sam�
       ple shortly after the original one.

       This  will  sound  as  doubling	the number of instruments
       playing the same sample:

	     play file.xxx echo 0.8 0.88 60.0 0.4

       If the delay is very short then it sound like a (metallic)
       robot playing music:

	     play file.xxx echo 0.8 0.88 6.0 0.4

       Longer  delay  will  sound  like a open air concert in the
       mountains:

	     play file.xxx echo 0.8 0.9 1000.0 0.3

       One mountain more, and:

	     play file.xxx echo 0.8 0.9 1000.0 0.3 1800.0 0.25

       Echos

       Like the echo effect, echos stand for  "ECHO  in	 Sequel",
       that  is	 the  first echos takes the input, the second the
       input and the first echos, the third  the  input	 and  the
       first and the second echos, ... and so on.  Care should be
       taken using many echos (	 see  introduction  );	a  single
       echos has the same effect as a single echo.

       The sample will be bounced twice in symmetric echos:

	     play file.xxx echos 0.8 0.7 700.0 0.25 700.0 0.3

       The sample will be bounced twice in asymmetric echos:

	     play file.xxx echos 0.8 0.7 700.0 0.25 900.0 0.3

       The sample will sound as played in a garage:

	     play file.xxx echos 0.8 0.7 40.0 0.25 63.0 0.3

       Chorus

       The  chorus  effect  has its name because it will often be
       used to make a single vocal sound like a	 chorus.  But  it
       can be applied to other instrument samples too.

       It  works like the echo effect with a short delay, but the
       delay isn't constant.  The delay is varied using	 a  sinu�
       soidal  or  triangular  modulation.  The	 modulation depth
       defines the range the modulated delay is played before  or
       after the delay. Hence the delayed sound will sound slower
       or faster, that is the  delayed	sound  tuned  around  the
       original	 one, like in a chorus where some vocal are a bit
       out of tune.

       The typical delay is around 40ms to 60ms, the speed of the
       modulation  is  best  near 0.25Hz and the modulation depth
       around 2ms.

       A single delay will make the sample more overloaded:

	     play file.xxx chorus 0.7 0.9 55.0 0.4 0.25 2.0 -t

       Two delays of the original samples sound like this:

	     play file.xxx chorus 0.6 0.9 50.0 0.4  0.25  2.0  -t
       60.0 0.32 0.4 1.3 -s

       A  big  chorus of the sample is ( three additional samples
       ):

	     play file.xxx chorus 0.5 0.9 50.0 0.4  0.25  2.0  -t
       60.0 0.32 0.4 2.3 -t	   40.0 0.3 0.3 1.3 -s

       Flanger

       The  flanger  effect  is	 like  the chorus effect, but the
       delay varies between 0ms and maximal 5ms.  It  sound  like
       wind blowing, sometimes faster or slower including changes
       of the speed.

       The flanger effect is widely used in funk and soul  music,
       where  the  guitar  sound  varies frequently slow or a bit
       faster.

       The typical delay is around 3ms to 5ms, the speed  of  the
       modulation is best near 0.5Hz.

       Now, let's groove the sample:

	     play file.xxx flanger 0.6 0.87 3.0 0.9 0.5 -s

       listen  carefully between the difference of sinusoidal and
       triangular modulation:

	     play file.xxx flanger 0.6 0.87 3.0 0.9 0.5 -t

       If the decay is a bit lower, than the effect  sounds  more
       popular:

	     play file.xxx flanger 0.8 0.88 3.0 0.4 0.5 -t

       The drunken loudspeaker system:

	     play file.xxx flanger 0.9 0.9 4.0 0.23 1.3 -s

       Reverb

       The reverb effect is often used in audience hall which are
       to small or to many visitors  disturb  the  reflection  of
       sound   at  the	walls  to  make	 the  sound  played  more
       monumental. You can try the reverb effect in your bathroom
       or  garage  or  sport  halls  by shouting loud some words.
       You'll hear the words reflected from the walls.

       The biggest problem in using the reverb effect is the cor�
       rect  setting  of the (wall) delays such that the sound is
       realistic an doesn't sound like music playing in a tin  or
       overloaded feedback destroys any illusion of any big hall.
       To help you for much realistic reverb effects, you  should
       decide  first, how long the reverb should take place until
       it is not loud enough to be registered by your ears.  This
       is be done by the reverb time "t", in small halls 200ms in
       bigger one 1000ms, if you like. Clearly, the walls of such
       a  hall	aren't far away, so you should define its setting
       be given every wall its delay time.  However, if the  wall
       is  to  far  away  for the reverb time, you won't hear the
       reverb, so the nearest wall will be best "t/4"  delay  and
       the  farthest "t/2".  You can try other distances as well,
       but it won't sound very realistic.   The	 walls	shouldn't
       stand to close to each other and not in a multiple integer
       distance to each other ( so avoid  wall	like:  200.0  and
       202.0, or something like 100.0 and 200.0 ).

       Since audience halls do have a lot of walls, we will start
       designing one beginning with one wall:

	     play file.xxx reverb 1.0 600.0 180.0

       One wall more:

	     play file.xxx reverb 1.0 600.0 180.0 200.0

       Next two walls:

	     play file.xxx reverb 1.0  600.0  180.0  200.0  220.0
       240.0

       Now, why not a futuristic hall with six walls:

	      play  file.xxx  reverb  1.0 600.0 180.0 200.0 220.0
       240.0 280.0 300.0

       If you run out of machine power or memory,  then	 stop  as
       much  applications as possible ( every interrupt will con�
       sume a lot of CPU time which for	 bigger	 halls	is  abso�
       lutely necessary ).

       Phaser

       The  phaser effect is like the flanger effect, but it uses
       a reverb instead of  an	echo  and  does	 phase	shifting.
       You'll  hear the difference in the examples comparing both
       effects ( simply change the effect name ).  The delay mod�
       ulation	can  be done sinusoidal or triangular, preferable
       is the later one for  multiple  instruments  playing.  For
       single instrument sounds the sinusoidal phaser effect will
       give a sharper phasing effect.  The decay shouldn't be  to
       close  to  1.0 which will cause dramatic feedback.  A good
       range is about 0.5 to 0.1 for the decay.

       We will take a parameter setting as for the flanger before
       (  gain-out  is	lower since feedback can raise the output
       dramatically ):

	     play file.xxx phaser 0.8 0.74 3.0 0.4 0.5 -t

       The drunken loudspeaker system ( now less alcohol ):

	     play file.xxx phaser 0.9 0.85 4.0 0.23 1.3 -s

       A popular sound of the sample is as follows:

	     play file.xxx phaser 0.89 0.85 1.0 0.24 2.0 -t

       The sample sounds if ten springs are in your ears:

	     play file.xxx phaser 0.6 0.66 3.0 0.6 2.0 -t

       Compander

       The compander effect allows the dynamic range of a  signal
       to  be  compressed  or expanded.	 For most situations, the
       attack time (response to the music getting louder)  should
       be  shorter  than the decay time because our ears are more
       sensitive to suddenly loud music	 than  to  suddenly  soft
       music.

       For  example,  suppose you are listening to Strauss' "Also
       Sprach Zarathustra" in a noisy environment such as a  car.
       If you turn up the volume enough to hear the soft passages
       over the road noise, the loud sections will be  too  loud.
       You could try this:

		    play       file.xxx	      compand	    0.3,1
       -90,-90,-70,-70,-60,-20,0,0 -5 0 0.2

       The transfer function  ("-90,...")  says	 that  very  soft
       sounds  between	-90  and  -70  decibels (-90 is about the
       limit of 16-bit encoding)  will	remain	unchanged.   That
       keeps  the  compander from boosting the volume on "silent"
       passages such as between movements.   However,  sounds  in
       the range -60 decibels to 0 decibels (maximum volume) will
       be boosted so that the 60-dB dynamic range of the original
       music  will be compressed 3-to-1 into a 20-dB range, which
       is wide enough to enjoy the music but narrow enough to get
       around the road noise.  The -5 dB output gain is needed to
       avoid clipping (the number is inexact, and was derived  by
       experimentation).   The 0 for the initial volume will work
       fine for a clip that starts with a bit of silence, and the
       delay  of  0.2  has the effect of causing the compander to
       react a bit more quickly to sudden volume changes.

       Other effects ( copy, rate, avg, stat, vibro, lowp, highp,
       band, reverb )

       The  other effects are simple to use. However, an "easy to
       use manual" should be given here.

       More effects ( to do ! )

       There are a lot of effects around like noise  gates,  com�
       pressors,  waw-waw,  stereo effects and so on. They should
       be implemented making SOX to be more useful in sound  mix�
       ing  techniques	coming	together  with a great variety of
       different sound effects.

       Combining effects by using them in parallel or sequence on
       different  channels  needs  some	 easy  mechanism which is
       real-time stable.

       Really missing, is the changing of the parameters,  start�
       ing and stopping of effects while playing samples in real-
       time!

       Good luck and have fun with all the effects!

	    Juergen Mueller	     (jmueller@uia.ua.ac.be)


SEE ALSO
       sox(1), play(1), rec(1)



			December 10, 1999		   SoX(1)