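Supplementary notes on the contents of 「コンピュータサウンドの世界」 ("The World of Computer Sound") (^_^;)
Screen of the Indy running the comparison experiments (^_^)
Screen of the Mac during MP3 encoding (^_^)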
5292058 hard_44100_16bit_stereo_norm.aiff
3840062 hard_32000_16bit_stereo_norm.aiff
2646062 hard_22050_16bit_stereo_norm.aiff
1920062 hard_16000_16bit_stereo_norm.aiff
1323058 hard_11025_16bit_stereo_norm.aiff
 960058 hard__8000_16bit_stereo_norm.aiff
 480026 hard__8000_16bit_stereo_comp.au
 481241 hard_44100_16bit_stereo_comp.mp3
List of sound files created for the comparison experiments
01 -rw-r--r-- 1 root user 5760124 hard_48000_16bit_stereo_norm.aiff
02 -rw-r--r-- 1 root user 5760000 hard_48000_16bit_stereo_norm.data
03 -rw-r--r-- 1 root user 5760044 hard_48000_16bit_stereo_norm.wav
04 -rw-r--r-- 1 root user 5760124 hard_48000_16bit_stereo_norm.aiff
05 -rw-r--r-- 1 root user 2880062 hard_48000_16bit___mono_norm.aiff
06 -rw-r--r-- 1 root user 5760124 hard_48000_16bit_stereo_norm.aiff
07 -rw-r--r-- 1 root user 8640062 hard_48000_24bit_stereo_norm.aiff
08 -rw-r--r-- 1 root user 5760062 hard_48000_12bit_stereo_norm.aiff
09 -rw-r--r-- 1 root user 2880062 hard_48000__8bit_stereo_norm.aiff
10 -rw-r--r-- 1 root user 5760124 hard_48000_16bit_stereo_norm.aiff
11 -rw-r--r-- 1 root user 5292058 hard_44100_16bit_stereo_norm.aiff
12 -rw-r--r-- 1 root user 3840062 hard_32000_16bit_stereo_norm.aiff
13 -rw-r--r-- 1 root user 2646062 hard_22050_16bit_stereo_norm.aiff
14 -rw-r--r-- 1 root user 1920062 hard_16000_16bit_stereo_norm.aiff
15 -rw-r--r-- 1 root user 1323058 hard_11025_16bit_stereo_norm.aiff
16 -rw-r--r-- 1 root user  960058 hard__8000_16bit_stereo_norm.aiff
17 -rw-r--r-- 1 root user 5292058 hard_44100_16bit_stereo_norm.aiff
18 -rw-r--r-- 1 root user 5292024 hard_44100_16bit_stereo_norm.au
19 -rw-r--r-- 1 root user 2646026 hard_44100_16bit_stereo_comp.au
20 -rw-r--r-- 1 root user 5292058 hard_44100_16bit_stereo_norm.aiff
21 -rw-r--r-- 1 root user 5292058 soft_44100_16bit_stereo_norm.aiff
22 -rw-r--r-- 1 root user  960058 hard__8000_16bit_stereo_norm.aiff
23 -rw-r--r-- 1 root user  480060 hard__8000_16bit___mono_norm.aiff
24 -rw-r--r-- 1 root user  480026 hard__8000_16bit_stereo_comp.au
25 -rw-r--r-- 1 root user  240027 hard__8000_16bit___mono_comp.au
26 -rw-r--r-- 1 root user  240027 soft__8000_16bit___mono_comp.au
27 -rw-r--r-- 1 root user 5292058 hard_44100_16bit_stereo_norm.aiff
28 -rw-r--r-- 1 root user 5033097 hard_44100_16bit_stereo_norm.lzh
29 -rw-r--r-- 1 root user 5022285 hard_44100_16bit_stereo_norm.zip
30 -rw-r--r-- 1 root user 5027251 hard_44100_16bit_stereo_norm.sit
31 -rw-r--r-- 1 root user 5291891 hard_44100_16bit_stereo_norm.cpt
32 -rw-r--r-- 1 root user  481241 hard_44100_16bit_stereo_norm.mp3
33 -rw-r--r-- 1 root user 5292058 soft_44100_16bit_stereo_norm.aiff
34 -rw-r--r-- 1 root user 4352562 soft_44100_16bit_stereo_norm.lzh
35 -rw-r--r-- 1 root user 4368710 soft_44100_16bit_stereo_norm.zip
36 -rw-r--r-- 1 root user 4437119 soft_44100_16bit_stereo_norm.sit
37 -rw-r--r-- 1 root user 4714626 soft_44100_16bit_stereo_norm.cpt
38 -rw-r--r-- 1 root user  481233 soft_44100_16bit_stereo_norm.mp3
Audio Interchange File Format: "AIFF"
A Standard for Sampled Sound Files
| char: | 8 bits, signed. A char can contain more than just ASCII characters. It can contain any number from -128 to 127 (inclusive). |
| unsigned char: | 8 bits, unsigned. Contains any number from zero to 255 (inclusive). |
| short: | 16 bits, signed. Contains any number from -32,768 to 32,767 (inclusive). |
| unsigned short: | 16 bits, unsigned. Contains any number from zero to 65,535 (inclusive). |
| long: | 32 bits, signed. Contains any number from -2,147,483,648 to 2,147,483,647 (inclusive). |
| unsigned long: | 32 bits, unsigned. Contains any number from zero to 4,294,967,295 (inclusive). |
| extended: | 80 bit IEEE Standard 754 floating point number (Standard Apple Numeric Environment [SANE] data type Extended). |
| pstring: | Pascal-style string, a one byte count followed by text bytes. The total number of bytes in this data type should be even. A pad byte can be added at the end of the text to accomplish this. This pad byte is not reflected in the count. |
| ID: | 32 bits, the concatenation of four printable ASCII characters in the range ' ' (SP, 0x20) through '~' (0x7E). Spaces (0x20) cannot precede printing characters; trailing spaces are allowed. Control characters are forbidden. |
| OSType: | 32 bits. A concatenation of four characters, as defined in Inside Macintosh, vol II. |
Decimal values are referred to as a string of digits, for example 123, 0, 100 are all decimal numbers. Hexadecimal values are preceded by a 0x - e.g. 0x0A12, 0x1, 0x64.
Data Organization
All data is stored in Motorola 68000 format. Data is organized as follows:
The official name for this standard is Audio Interchange File Format. If an application program needs to present the name of this format to a user, such as in a "Save as..." dialog box, the name can be abbreviated to Audio IFF.
The "EA IFF 85" Standard for Interchange Format Files defines an overall structure for storing data in files. Audio IFF conforms to the "EA IFF 85" standard. This document will describe those portions of "EA IFF 85" that are germane to Audio IFF. For a more complete discussion of "EA IFF 85", please refer to the document "EA IFF 85" Standard for Interchange Format Files.
An "EA IFF 85" file is made up of a number of chunks of data. Chunks are the building blocks of "EA IFF 85" files. A chunk consists of some header information followed by data:
A chunk can be represented using our C-like language in the following manner:
typedef struct {
ID ckID; /* chunk ID */
long ckSize; /* chunk Size */
char ckData[]; /* data */
} Chunk;
ckID describes the format of the data portion of a chunk. A program can determine how to interpret the chunk data by examining ckID.
ckSize is the size of the data portion of the chunk, in bytes. It does not include the 8 bytes used by ckID and ckSize.
ckData contains the data stored in the chunk. The format of this data is determined by ckID. If the data is an odd number of bytes in length, a zero pad byte must be added at the end. The pad byte is not included in ckSize.
Note that an array with no size specification (e.g. char ckData[];) indicates a variable-sized array in our C-like language. This differs from standard C.
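The header layout and the pad-byte rule can be sketched in standard C as follows. This is only an illustration: read_be32 and chunk_total_size are hypothetical helper names, not part of the specification, and the chunk is assumed to be already loaded into memory as raw bytes in Motorola (big-endian) order.

```c
#include <stdint.h>

/* Read a 32-bit big-endian (Motorola byte order) value from a byte buffer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: number of bytes a chunk occupies in the file.
   That is 8 header bytes (ckID + ckSize), ckSize data bytes, and one
   zero pad byte when ckSize is odd. */
static uint32_t chunk_total_size(const uint8_t *chunk)
{
    uint32_t ckSize = read_be32(chunk + 4);
    return 8u + ckSize + (ckSize & 1u);
}
```

A reader that wants to step to the next chunk would advance by chunk_total_size bytes rather than by 8 + ckSize.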
An Audio IFF file is a collection of a number of different types of chunks. There is a Common Chunk which contains important parameters describing the sampled sound, such as its length and sample rate. There is a Sound Data Chunk that contains the actual audio samples. There are several other optional chunks that define markers, list instrument parameters, store application-specific information, etc. All of these chunks are described in detail in later sections of this document.
The chunks in an Audio IFF file are grouped together in a container chunk. "EA IFF 85" defines a number of container chunks, but the one used by Audio IFF is called a FORM. A FORM has the following format:
typedef struct {
ID ckID;
long ckSize;
ID formType;
char chunks [];
} Chunk;
ckID is always 'FORM'. This indicates that this is a FORM chunk.
ckSize contains the size of the data portion of the 'FORM' chunk. Note that the data portion has been broken into two parts, formType and chunks[].
formType describes what's in the 'FORM' chunk. For Audio IFF files, formType is always 'AIFF'. This indicates that the chunks within the FORM pertain to sampled sound. A FORM chunk of formType 'AIFF' is called a FORM AIFF.
chunks are the chunks contained within the FORM. These chunks are called local chunks. A FORM AIFF along with its local chunks make up an Audio IFF file.
Here is an example of a simple Audio IFF file. It consists of a file containing a single FORM AIFF which contains two local chunks, a Common Chunk and a Sound Data Chunk.
There are no restrictions on the ordering of local chunks within a FORM AIFF.
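Because local chunks may appear in any order, a reader generally has to scan the FORM for the chunk it wants. The sketch below assumes the whole file is already in memory; find_local_chunk is a hypothetical name, and error handling is reduced to returning NULL.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian value. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: return a pointer to the first local chunk whose
   ckID equals the 4-character id, or NULL if the buffer is not a FORM AIFF
   or the chunk is absent. */
static const uint8_t *find_local_chunk(const uint8_t *file, size_t len,
                                       const char *id)
{
    if (len < 12 || memcmp(file, "FORM", 4) != 0 ||
        memcmp(file + 8, "AIFF", 4) != 0)
        return NULL;

    size_t end = 8 + (size_t)read_be32(file + 4); /* end of the FORM data */
    if (end > len)
        end = len;                                /* tolerate truncation */

    size_t pos = 12;                              /* first local chunk */
    while (pos + 8 <= end) {
        uint32_t ckSize = read_be32(file + pos + 4);
        if (memcmp(file + pos, id, 4) == 0)
            return file + pos;
        pos += 8 + (size_t)ckSize + (ckSize & 1u); /* skip data and pad byte */
    }
    return NULL;
}
```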
On an Apple II, the FORM AIFF is stored in a ProDOS file. The file type is 0xD8 and the aux type is 0x0000. AIFF versions 1.2 and earlier used file type 0xCB, which is incorrect. Please see the Apple II File Type Note for file type 0xD8 and aux type 0x0000 for strategies on dealing with this inconsistency.
On a Macintosh, the FORM AIFF is stored in the data fork of an Audio IFF file. The Macintosh file type of an Audio IFF file is 'AIFF'. This is the same as the formType of the FORM AIFF.
Macintosh or Apple II applications should not store any information in an Audio IFF file's resource fork, as this information may not be preserved by all applications. Applications can use the Application Specific Chunk, defined later in this document, to store extra information specific to their application.
On an operating system that uses file extensions, such as MS-DOS or UNIX, it is recommended that Audio IFF file names have a ".AIF" extension.
A more detailed example of an Audio IFF file can be found in the Appendix. Please refer to this example as often as necessary while reading the remainder of this document.
The formats of the different local chunk types found within a FORM AIFF are described in the following sections. The ckIDs for each chunk are also defined.
There are two types of chunks, those that are required and those that are optional. The Common Chunk is required. The Sound Data chunk is required if the sampled sound has greater than zero length. All other chunks are optional. All applications that use FORM AIFF must be able to read the required chunks, and can choose to selectively ignore the optional chunks. A program that copies a FORM AIFF should copy all of the chunks in the FORM AIFF.
Common Chunk
The Common Chunk describes fundamental parameters of the sampled sound.
#define CommonID 'COMM' /* ckID for Common Chunk */
typedef struct {
ID ckID;
long ckSize;
short numChannels;
unsigned long numSampleFrames;
short sampleSize;
extended sampleRate;
} CommonChunk;
ckID is always 'COMM'. ckSize is the size of the data portion of the chunk, in bytes. It does not include the 8 bytes used by ckID and ckSize. For the Common Chunk, ckSize is always 18.
numChannels contains the number of audio channels for the sound. A value of 1 means monophonic sound, 2 means stereo, and 4 means four channel sound, etc. Any number of audio channels may be represented.
The actual sound samples are stored in another chunk, the Sound Data Chunk, which will be described shortly. For multichannel sounds, single sample points from each channel are interleaved. A set of interleaved sample points is called a sample frame. This is illustrated below for the stereo case.
For monophonic sound, a sample frame is a single sample point.
For multichannel sounds, the following conventions should be observed:
numSampleFrames contains the number of sample frames in the Sound Data Chunk. Note that numSampleFrames is the number of sample frames, not the number of bytes nor the number of sample points in the Sound Data Chunk. The total number of sample points in the file is numSampleFrames times numChannels.
sampleSize is the number of bits in each sample point. It can be any number from 1 to 32. The format of a sample point will be described in the next section, the Sound Data Chunk.
sampleRate is the sample rate at which the sound is to be played back, in sample frames per second.
One and only one Common Chunk is required in every FORM AIFF.
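As an illustration, the 18 data bytes of a Common Chunk could be decoded as below. The names CommonInfo, parse_common and extended_to_double are hypothetical. The conversion of the 80-bit extended value assumes the usual layout (1 sign bit, 15-bit exponent biased by 16383, 64-bit mantissa with an explicit integer bit) and does not treat infinities or NaNs specially.

```c
#include <stdint.h>

static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Convert a 10-byte big-endian extended value to double.
   value = mantissa * 2^(exponent - 16383 - 63). Scaling is done with a
   simple loop so that no math library is needed. */
static double extended_to_double(const uint8_t *p)
{
    int negative = p[0] & 0x80;
    int exponent = ((p[0] & 0x7F) << 8) | p[1];
    uint64_t mantissa = 0;
    for (int i = 2; i < 10; i++)
        mantissa = (mantissa << 8) | p[i];
    if (mantissa == 0)
        return 0.0;

    double value = (double)mantissa;
    int shift = exponent - 16383 - 63;
    while (shift > 0) { value *= 2.0; shift--; }
    while (shift < 0) { value /= 2.0; shift++; }
    return negative ? -value : value;
}

typedef struct {
    int16_t  numChannels;
    uint32_t numSampleFrames;
    int16_t  sampleSize;
    double   sampleRate;
} CommonInfo;

/* data points at the 18 bytes that follow ckID and ckSize. */
static CommonInfo parse_common(const uint8_t *data)
{
    CommonInfo c;
    c.numChannels     = (int16_t)read_be16(data);
    c.numSampleFrames = read_be32(data + 2);
    c.sampleSize      = (int16_t)read_be16(data + 6);
    c.sampleRate      = extended_to_double(data + 8);
    return c;
}
```

The total number of sample points is then numSampleFrames multiplied by numChannels, as described above.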
The Sound Data Chunk contains the actual sample frames.
#define SoundDataID 'SSND' /* ckID for Sound Data Chunk */
typedef struct {
ID ckID;
long ckSize;
unsigned long offset;
unsigned long blockSize;
unsigned char soundData[];
} SoundDataChunk;
ckID is always 'SSND'. ckSize is the size of the data portion of the chunk, in bytes. It does not include the 8 bytes used by ckID and ckSize.
offset determines where the first sample frame in the soundData starts. offset is in bytes. Most applications won't use offset and should set it to zero. Use for a non-zero offset is explained in the Block-Aligning Sound Data section below.
blockSize is used in conjunction with offset for block-aligning sound data. It contains the size in bytes of the blocks that sound data is aligned to. As with offset, most applications won't use blockSize and should set it to zero. More information on blockSize is in the Block-Aligning Sound Data section below.
soundData contains the sample frames that make up the sound. The number of sample frames in the soundData is determined by the numSampleFrames parameter in the Common Chunk.
Sample Points
Each sample point in a sample frame is a linear, 2's complement value. The sample points are from 1 to 32 bits wide, as determined by the sampleSize parameter in the Common Chunk. Sample points are stored in an integral number of contiguous bytes. One to 8 bit wide sample points are stored in one byte, 9 to 16 bit wide sample points are stored in two bytes, 17 to 24 bit wide sample points are stored in 3 bytes, and 25 to 32 bit wide samples are stored in 4 bytes. When the width of a sample point is less than a multiple of 8 bits, the sample point data is left justified, with the remaining bits zeroed. An example case is illustrated below. A 12 bit sample point, binary 101000010111, is stored left justified in two bytes. The remaining bits are set to zero.
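The storage rule can be sketched in C as follows. The function names are hypothetical, and the sample is assumed to arrive as a two's complement value held in the low sampleSize bits of a 32-bit word.

```c
#include <stdint.h>

/* Bytes used to store one sample point of the given width (1..32 bits). */
static int bytes_per_point(int sampleSize)
{
    return (sampleSize + 7) / 8;
}

/* Store a sample point left justified in big-endian order; the unused
   low-order bits of the last byte are set to zero. */
static void store_point(uint8_t *out, uint32_t value, int sampleSize)
{
    int nbytes = bytes_per_point(sampleSize);
    int pad = nbytes * 8 - sampleSize;
    uint32_t mask = sampleSize == 32 ? 0xFFFFFFFFu
                                     : ((1u << sampleSize) - 1u);
    uint32_t v = (value & mask) << pad;
    for (int i = 0; i < nbytes; i++)
        out[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
}

/* Read a stored sample point back as a signed integer. */
static int32_t load_point(const uint8_t *in, int sampleSize)
{
    int nbytes = bytes_per_point(sampleSize);
    int pad = nbytes * 8 - sampleSize;
    uint32_t mask = sampleSize == 32 ? 0xFFFFFFFFu
                                     : ((1u << sampleSize) - 1u);
    uint32_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | in[i];
    v >>= pad;                              /* drop the zero padding bits */
    if (v & (1u << (sampleSize - 1)))       /* sign bit set: sign-extend */
        v |= ~mask;
    return (int32_t)v;
}
```

For the 12-bit example above, binary 101000010111 (0xA17) is stored as the two bytes 0xA1 0x70.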
Sample Frames
Sample frames are stored contiguously in order of increasing time. The sample points within a sample frame are packed together; there are no unused bytes between them. Likewise, the sample frames are packed together with no pad bytes.
Block-Aligning Sound Data
There may be some applications that, to ensure real time recording and playback of audio, wish to align sampled sound data with fixed-size blocks. This can be accomplished with the offset and blockSize parameters, as shown below.
In the above figure, the first sample frame starts at the beginning of block N. This is accomplished by skipping the first offset bytes of the soundData. Note too that the soundData array can extend beyond valid sample frames, allowing the soundData array to end on a block boundary.
blockSize specifies the size in bytes of the block that is to be aligned to. A blockSize of zero indicates that the sound data does not need to be block-aligned. Applications that don't care about block alignment should set blockSize and offset to zero when writing Audio IFF files. Applications that write block-aligned sound data should set blockSize to the appropriate block size. Applications that modify an existing Audio IFF file should try to preserve alignment of the sound data, although this is not required. If an application doesn't preserve alignment, it should set blockSize and offset to zero. If an application needs to realign sound data to a different sized block, it should update blockSize and offset accordingly.
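A writer that block-aligns its data has to choose offset so that the first sample frame lands on a block boundary. The sketch below makes one assumption that the text does not state explicitly: blocks are counted from the beginning of the file, so alignment depends on the absolute file position of the soundData array. Both function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: blocks are measured from the start of the file.
   soundDataPos is the absolute file position of soundData[0], that is,
   the position just after the blockSize field.
   Returns the number of bytes to skip so the first frame is aligned. */
static uint32_t align_offset(uint32_t soundDataPos, uint32_t blockSize)
{
    if (blockSize == 0)
        return 0;                       /* no block alignment requested */
    return (blockSize - soundDataPos % blockSize) % blockSize;
}

/* Byte position of sample frame n within soundData, given the offset
   field and the frame geometry from the Common Chunk. */
static uint64_t frame_byte_offset(uint32_t offset, uint32_t frame,
                                  int numChannels, int sampleSize)
{
    uint32_t frameBytes =
        (uint32_t)numChannels * (uint32_t)((sampleSize + 7) / 8);
    return (uint64_t)offset + (uint64_t)frame * frameBytes;
}
```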
The Sound Data Chunk is required unless the numSampleFrames field in the Common Chunk is zero. A maximum of one Sound Data Chunk can appear in a FORM AIFF.
The Marker Chunk contains markers that point to positions in the sound data. Markers can be used for whatever purposes an application desires. The Instrument Chunk, defined later in this document, uses markers to mark loop beginning and end points, for example.
Markers
A marker has the following format.
typedef short MarkerId;
typedef struct {
MarkerId id;
unsigned long position;
pstring markerName;
} Marker;
id is a number that uniquely identifies the marker within a FORM AIFF. The id can be any positive non-zero integer, as long as no other marker within the same FORM AIFF has the same id.
The marker's position in the sound data is determined by position. Markers conceptually fall between two sample frames. A marker that falls before the first sample frame in the sound data is at position zero, while a marker that falls between the first and second sample frame in the sound data is at position 1. Note that the units for position are sample frames, not bytes nor sample points.
markerName is a Pascal-style text string containing the name of the mark.
Note: Some "EA IFF 85" files store strings as C-strings (text bytes followed by a null terminating character) instead of Pascal-style strings. Audio IFF uses pstrings because they are more efficiently skipped over when scanning through chunks. Using pstrings, a program can skip over a string by adding the string count to the address of the first character. C strings require that each character in the string be examined for the null terminator.
Marker Chunk Format
The format for the data within a Marker Chunk is shown below.
#define MarkerID 'MARK' /* ckID for Marker Chunk */
typedef struct {
ID ckID;
long ckSize;
unsigned short numMarkers;
Marker Markers[];
} MarkerChunk;
ckID is always 'MARK'. ckSize is the size of the data portion of the chunk, in bytes. It does not include the 8 bytes used by ckID and ckSize.
numMarkers is the number of markers in the Marker Chunk.
If numMarkers is non-zero, it is followed by the markers themselves. Because all fields in a marker are an even number of bytes in length, the length of any marker will always be even. Thus, markers are packed together with no unused bytes between them. The markers need not be ordered in any particular manner.
The Marker Chunk is optional. No more than one Marker Chunk can appear in a FORM AIFF.
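Since markers have variable length and no fixed order, looking one up means walking the list and skipping each pstring by its count. A minimal sketch, with hypothetical function names, operating on the data portion of a Marker Chunk (starting at numMarkers):

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Bytes occupied by a pstring: the count byte plus the text, plus one
   pad byte if that total would otherwise be odd. */
static size_t pstring_stored_size(const uint8_t *p)
{
    size_t n = 1u + p[0];
    return n + (n & 1u);
}

/* Search the markers for the given id. Returns 1 and stores the marker
   position (in sample frames) on success, 0 if not found or truncated. */
static int find_marker(const uint8_t *data, size_t len, int16_t id,
                       uint32_t *position)
{
    if (len < 2)
        return 0;
    unsigned numMarkers = read_be16(data);
    size_t pos = 2;
    for (unsigned i = 0; i < numMarkers; i++) {
        if (pos + 7 > len)              /* id + position + count byte */
            return 0;
        size_t size = 6 + pstring_stored_size(data + pos + 6);
        if (pos + size > len)
            return 0;
        if ((int16_t)read_be16(data + pos) == id) {
            *position = read_be32(data + pos + 2);
            return 1;
        }
        pos += size;
    }
    return 0;
}
```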
The Instrument Chunk defines basic parameters that an instrument, such as a sampler, could use to play back the sound data.
Looping
Sound data can be looped, allowing a portion of the sound to be repeated in order to lengthen the sound. The structure below describes a loop:
typedef struct {
short playMode;
MarkerId beginLoop;
MarkerId endLoop;
} Loop;
A loop is marked with two points, a begin position and an end position. There are two ways to play a loop, forward looping and forward/backward looping. In the case of forward looping, playback begins at the beginning of the sound, continues past the begin position and continues to the end position, at which point playback restarts again at the begin position. The segment between the begin and end positions, called the loop segment, is played over and over again, until interrupted by something, such as the release of a key on a sampling instrument, for example.
With forward/backward looping, the loop segment is first played from the begin position to the end position, and then played backwards from the end position back to the begin position. This flip-flop pattern is repeated over and over again until interrupted.
playMode specifies which type of looping is to be performed.
#define NoLooping 0
#define ForwardLooping 1
#define ForwardBackwardLooping 2
If NoLooping is specified, then the loop points are ignored during playback.
beginLoop is the marker id that marks the begin position of the loop segment.
endLoop marks the end position of a loop. The begin position must be less than the end position. If this is not the case, then the loop segment has zero or negative length and no looping takes place.
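The validity rule and forward looping can be sketched as below, assuming the two marker ids have already been resolved to positions in sample frames. Because a marker at position p falls just before frame p (counting from zero), the loop segment is taken here to be frames beginPos through endPos - 1; that reading, and both function names, are assumptions of this sketch. Forward/backward looping is not covered.

```c
#include <stdint.h>

enum { NoLooping = 0, ForwardLooping = 1, ForwardBackwardLooping = 2 };

/* A loop takes effect only if a looping mode is selected and the begin
   position is strictly less than the end position. */
static int loop_is_active(int playMode, uint32_t beginPos, uint32_t endPos)
{
    if (playMode != ForwardLooping && playMode != ForwardBackwardLooping)
        return 0;
    return beginPos < endPos;
}

/* Forward looping: which sample frame is heard at playback step k
   (k = 0 is the first frame of the sound). Without a valid loop the
   sound simply plays straight through. */
static uint32_t forward_loop_frame(uint32_t k, uint32_t beginPos,
                                   uint32_t endPos)
{
    if (beginPos >= endPos || k < endPos)
        return k;
    return beginPos + (k - beginPos) % (endPos - beginPos);
}
```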
Instrument Chunk Format
The format of the data within an Instrument Chunk is described below.
#define InstrumentID 'INST' /* ckID for Instrument Chunk */
typedef struct {
ID ckID;
long ckSize;
char baseNote;
char detune;
char lowNote;
char highNote;
char lowVelocity;
char highVelocity;
short gain;
Loop sustainLoop;
Loop releaseLoop;
} InstrumentChunk;
ckID is always 'INST'. ckSize is the size of the data portion of the chunk, in bytes. For the Instrument Chunk, ckSize is always 20.
baseNote is the note at which the instrument plays back the sound data without pitch modification. Units are MIDI (MIDI is an acronym for Musical Instrument Digital Interface) note numbers, and are in the range 0 through 127. Middle C is 60.
detune determines how much the instrument should alter the pitch of the sound when it is played back. Units are in cents (1/100 of a semitone) and range from -50 to +50. Negative numbers mean that the pitch of the sound should be lowered, while positive numbers mean that it should be raised.
lowNote and highNote specify the suggested range on a keyboard for playback of the sound data. The sound data should be played if the instrument is requested to play a note between the low and high notes, inclusive. The base note does not have to be within this range. Units for lowNote and highNote are MIDI note values.
lowVelocity and highVelocity specify the suggested range of velocities for playback of the sound data. The sound data should be played if the note-on velocity is between low and high velocity, inclusive. Units are MIDI velocity values, 1 (lowest velocity) through 127 (highest velocity).
gain is the amount by which to change the gain of the sound when it is played. Units are decibels. For example, 0 dB means no change, 6 dB means approximately double the value of each sample point, while -6 dB means approximately halve the value of each sample point.
sustainLoop specifies a loop that is to be played when an instrument is sustaining a sound.
releaseLoop specifies a loop that is to be played when an instrument is in the release phase of playing back a sound. The release phase usually occurs after a key on an instrument is released.
The Instrument Chunk is optional. No more than one Instrument Chunk can appear in a FORM AIFF.
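The fixed ckSize of 20 follows from the layout: six one-byte fields, a two-byte gain, and two six-byte Loop structures. As a small illustration of the note and velocity ranges, the hypothetical function below decides whether a sampler would use this sound for a given note-on, reading directly from the 20 data bytes of the chunk.

```c
#include <stdint.h>

/* Byte offsets inside the 20-byte data portion of an Instrument Chunk:
   0 baseNote, 1 detune, 2 lowNote, 3 highNote, 4 lowVelocity,
   5 highVelocity, 6..7 gain, 8..13 sustainLoop, 14..19 releaseLoop. */
static int inst_should_play(const uint8_t *data, int note, int velocity)
{
    int lowNote      = (int8_t)data[2];
    int highNote     = (int8_t)data[3];
    int lowVelocity  = (int8_t)data[4];
    int highVelocity = (int8_t)data[5];

    /* Both ranges are inclusive. */
    return note >= lowNote && note <= highNote &&
           velocity >= lowVelocity && velocity <= highVelocity;
}
```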
The MIDI Data Chunk can be used to store MIDI data (please refer to Musical Instrument Digital Interface Specification 1.0, available from the International MIDI Association, for more details on MIDI).
The primary purpose of this chunk is to store MIDI System Exclusive messages, although other types of MIDI data can be stored in this block as well. As more instruments come on the market, they will likely have parameters that have not been included in the Audio IFF specification. The MIDI System Exclusive messages for these instruments may contain many parameters that are not included in the Instrument Chunk. For example, a new sampling instrument may have more than the two loops defined in the Instrument Chunk. These loops will likely be represented in the MIDI System Exclusive message for the new machine. This MIDI System Exclusive message can be stored in the MIDI Data Chunk.
#define MIDIDataID 'MIDI' /* ckID for MIDI Data Chunk */
typedef struct {
ID ckID;
long ckSize;
unsigned char MIDIdata[];
} MIDIDataChunk;
ckID is always 'MIDI'. ckSize is the size of the data portion of the chunk, in bytes. It does not include the 8 bytes used by ckID and ckSize.
MIDIdata contains a stream of MIDI data.
The MIDI Data Chunk is optional. Any number of MIDI Data Chunks may exist in a FORM AIFF. If MIDI System Exclusive messages for several instruments are to be stored in a FORM AIFF, it is better to use one MIDI Data Chunk per instrument than one big MIDI Data Chunk for all of the instruments.
The Audio Recording Chunk contains information pertinent to audio recording devices.
#define AudioRecordingID 'AESD' /* ckID for Audio Recording */
/* Chunk. */
typedef struct {
ID ckID;
long ckSize;
unsigned char AESChannelStatusData[24];
} AudioRecordingChunk;
ckID is always 'AESD'. ckSize is the size of the data portion of the chunk, in bytes. For the Audio Recording Chunk, ckSize is always 24.
The 24 bytes of AESChannelStatusData are specified in the AES Recommended Practice for Digital Audio Engineering - Serial Transmission Format for Linearly Represented Digital Audio Data, section 7.1, Channel Status Data. That document describes a format for real-time digital transmission of digital audio between audio devices. This information is duplicated in the Audio Recording Chunk for convenience. Of general interest would be bits 2, 3, and 4 of byte 0, which describe recording emphasis.
The Audio Recording Chunk is optional. No more than one Audio Recording Chunk may appear in a FORM AIFF.
The Application Specific Chunk can be used for any purposes whatsoever by manufacturers of applications. For example, an application that edits sounds might want to use this chunk to store editor state parameters such as magnification levels, last cursor position, and the like.
#define ApplicationSpecificID 'APPL' /* ckID for Application */
/* Specific Chunk. */
typedef struct {
ID ckID;
long ckSize;
OSType applicationSignature;
char data[];
} ApplicationSpecificChunk;
ckID is always 'APPL'. ckSize is the size of the data portion of the chunk, in bytes. It does not include the 8 bytes used by ckID and ckSize.
applicationSignature identifies a particular application. For Macintosh applications, this will be the application's four character signature. For Apple II applications, applicationSignature should always be 'pdos', or the hexadecimal bytes 0x70646F73. If applicationSignature is 'pdos', the beginning of the data area is defined to be a Pascal-style string (a length byte followed by ASCII string bytes) containing the name of the application. This is necessary because Apple II applications do not have a four-byte signature as do Macintosh applications.
data is the data specific to the application.
The Application Specific Chunk is optional. Any number of Application Specific Chunks may exist in a single FORM AIFF.
The Comments Chunk is used to store comments in the FORM AIFF. "EA IFF 85" has an Annotation Chunk that can be used for comments, but the Comments Chunk has two features not found in the "EA IFF 85" chunk. They are: 1) a timestamp for the comment; and 2) a link to a marker.
Comment
A comment consists of a time stamp, marker id, and a text count followed by text.
typedef struct {
unsigned long timeStamp;
MarkerId marker;
unsigned short count;
char text[];
} Comment;
timeStamp indicates when the comment was created. Units are the number of seconds since January 1, 1904. (This time convention is the one used by the Macintosh. For procedures that manipulate the time stamp, see The Operating System Utilities chapter in Inside Macintosh, vol II.) For a routine that will convert this to an Apple II GS/OS format time, please see Apple II File Type Note for filetype 0xD8, aux type 0x0000.
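On systems that count seconds from January 1, 1970, the time stamp can be converted by subtracting the 24,107 days (2,082,844,800 seconds) between the two epochs. The helper name below is hypothetical, and the sketch ignores time zones: a Macintosh clock normally holds local time, so the result is only as accurate as that assumption allows.

```c
#include <stdint.h>

/* Seconds from 1904-01-01 00:00:00 to 1970-01-01 00:00:00:
   66 years including 17 leap days = 24107 days * 86400 s. */
#define MAC_TO_UNIX_OFFSET 2082844800LL

/* Convert a Comments Chunk timeStamp to seconds since 1970-01-01.
   The result is negative for dates before 1970. */
static int64_t mac_to_unix(uint32_t timeStamp)
{
    return (int64_t)timeStamp - MAC_TO_UNIX_OFFSET;
}
```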
A comment can be linked to a marker. This allows applications to store long descriptions of markers as a comment. If the comment is referring to a marker, then marker is the ID of that marker. Otherwise, marker is zero, indicating that this comment is not linked to a marker.
count is the length of the text that makes up the comment. This is a 16 bit quantity, allowing much longer comments than would be available with a pstring.
text contains the comment itself. If the text is an odd number of bytes long, it must be padded with a byte at the end so that it occupies an even number of bytes. This pad byte, if present, is not included in count.
Comments Chunk Format
#define CommentID 'COMT' /* ckID for Comments Chunk. */
typedef struct {
ID ckID;
long ckSize;
unsigned short numComments;
Comment comments[];
} CommentsChunk;
ckID is always 'COMT'. ckSize is the size of the data portion of the chunk, in bytes. It does not include the 8 bytes used by ckID and ckSize.
numComments contains the number of comments in the Comments Chunk. This is followed by the comments themselves. Comments are always an even number of bytes in length, so there is no padding between comments in the Comments Chunk.
The Comments Chunk is optional. No more than one Comments Chunk may appear in a single FORM AIFF.
These four chunks are included in the definition of every "EA IFF 85" file. All are text chunks; their data portion consists solely of text. Each of these chunks is optional.
#define NameID 'NAME' /* ckID for Name Chunk. */
#define AuthorID 'AUTH' /* ckID for Author Chunk. */
#define CopyrightID '(c) ' /* ckID for Copyright Chunk. */
#define AnnotationID 'ANNO' /* ckID for Annotation Chunk. */
typedef struct {
ID ckID;
long ckSize;
char text[];
} TextChunk;
ckID is either 'NAME', 'AUTH', '(c) ', or 'ANNO', depending on whether the chunk is a Name Chunk, Author Chunk, Copyright Chunk, or Annotation Chunk, respectively. For the Copyright Chunk, the 'c' is lowercase and there is a space (0x20) after the close parenthesis.
ckSize is the size of the data portion of the chunk, in this case the text.
text contains pure ASCII characters. It is not a pstring nor a C string. The number of characters in text is determined by ckSize. The contents of text depend on the chunk, as described below:
Name Chunk
text contains the name of the sampled sound. The Name Chunk is optional. No more than one Name Chunk may exist within a FORM AIFF.
Author Chunk
text contains one or more author names. An author in this case is the creator of a sampled sound. The Author Chunk is optional. No more than one Author Chunk may exist within a FORM AIFF.
Copyright Chunk
The Copyright Chunk contains a copyright notice for the sound. text contains a date followed by the copyright owner. The chunk ID '(c) ' serves as the copyright characters '©'. For example, a Copyright Chunk containing the text "1988 Apple Computer, Inc." means "© 1988 Apple Computer, Inc."
The Copyright Chunk is optional. No more than one Copyright Chunk may exist within a FORM AIFF.
Annotation Chunk
text contains a comment. Use of this chunk is discouraged within FORM AIFF. The more powerful Comments Chunk should be used instead. The Annotation Chunk is optional. Many Annotation Chunks may exist within a FORM AIFF.
Several of the local chunks for FORM AIFF may contain duplicate information. For example, the Instrument Chunk defines loop points and MIDI system exclusive data in the MIDI Data Chunk may also define loop points. What happens if these loop points are different? How is an application supposed to loop the sound?
Such conflicts are resolved by defining a precedence for chunks:
The Common Chunk has the highest precedence, while the Application Specific Chunk has the lowest. Information in the Common Chunk always takes precedence over conflicting information in any other chunk. The Application Specific Chunk always loses in conflicts with other chunks. By looking at the chunk hierarchy, for example, one sees that the loop points in the Instrument Chunk take precedence over conflicting loop points found in the MIDI Data Chunk.
It is the responsibility of applications that write data into the lower precedence chunks to make sure that the higher precedence chunks are updated accordingly.
Illustrated below is an example of a FORM AIFF. An Audio IFF file is simply a file containing a single FORM AIFF. On a Macintosh, the FORM AIFF is stored in the data fork of a file and the file type is 'AIFF'.
The WAVE file format is a subset of Microsoft's RIFF specification for the storage of multimedia files. A RIFF file starts out with a file header followed by a sequence of data chunks. A WAVE file is often just a RIFF file with a single "WAVE" chunk which consists of two sub-chunks -- a "fmt " chunk specifying the data format and a "data" chunk containing the actual sample data. Call this form the "Canonical form". Who knows how it really all works. An almost complete description which seems totally useless unless you want to spend a week looking over it can be found at MSDN (mostly describes the non-PCM, or registered proprietary data formats).
Offset Size Name Description
The canonical WAVE format starts with the RIFF header:
0 4 ChunkID Contains the letters "RIFF" in ASCII form
(0x52494646 big-endian form).
4 4 ChunkSize 36 + SubChunk2Size, or more precisely:
4 + (8 + SubChunk1Size) + (8 + SubChunk2Size)
This is the size of the rest of the chunk
following this number. This is the size of the
entire file in bytes minus 8 bytes for the
two fields not included in this count:
ChunkID and ChunkSize.
8 4 Format Contains the letters "WAVE"
(0x57415645 big-endian form).
The "WAVE" format consists of two subchunks: "fmt " and "data":
The "fmt " subchunk describes the sound data's format:
12 4 Subchunk1ID Contains the letters "fmt "
(0x666d7420 big-endian form).
16 4 Subchunk1Size 16 for PCM. This is the size of the
rest of the Subchunk which follows this number.
20 2 AudioFormat PCM = 1 (i.e. Linear quantization)
Values other than 1 indicate some
form of compression.
22 2 NumChannels Mono = 1, Stereo = 2, etc.
24 4 SampleRate 8000, 44100, etc.
28 4 ByteRate == SampleRate * NumChannels * BitsPerSample/8
32 2 BlockAlign == NumChannels * BitsPerSample/8
The number of bytes for one sample including
all channels. I wonder what happens when
this number isn't an integer?
34 2 BitsPerSample 8 bits = 8, 16 bits = 16, etc.
2 ExtraParamSize if PCM, then doesn't exist
X ExtraParams space for extra parameters
The "data" subchunk contains the size of the data and the actual sound:
36 4 Subchunk2ID Contains the letters "data"
(0x64617461 big-endian form).
40 4 Subchunk2Size == NumSamples * NumChannels * BitsPerSample/8
This is the number of bytes in the data.
You can also think of this as the size
of the rest of the subchunk following this
number.
44 * Data The actual sound data.
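The field layout above can be sketched in a few lines of Python. This is only an illustration, not a reference writer: the function name `build_wave_header` and its parameters are made up for this example, and it assumes plain PCM with a sample size that is a whole number of bytes and no ExtraParams.

```python
import struct

def build_wave_header(num_channels, sample_rate, bits_per_sample, data_size):
    """Return the 44-byte canonical PCM WAVE header (hypothetical helper).

    Assumes bits_per_sample is a multiple of 8 and AudioFormat is 1 (PCM).
    """
    block_align = num_channels * bits_per_sample // 8   # bytes per sample frame
    byte_rate = sample_rate * block_align               # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",               # "<" = little-endian, no padding
        b"RIFF", 36 + data_size, b"WAVE",   # RIFF header
        b"fmt ", 16, 1, num_channels, sample_rate,
        byte_rate, block_align, bits_per_sample,
        b"data", data_size,                 # "data" subchunk header
    )

header = build_wave_header(num_channels=2, sample_rate=22050,
                           bits_per_sample=16, data_size=2048)
print(len(header), header.hex(" "))
```

The four-character IDs are written as literal ASCII bytes, while every numeric field is packed little-endian.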
As an example, here are the opening 72 bytes of a WAVE file with bytes shown as hexadecimal numbers:
52 49 46 46 24 08 00 00 57 41 56 45 66 6d 74 20 10 00 00 00 01 00 02 00 22 56 00 00 88 58 01 00 04 00 10 00 64 61 74 61 00 08 00 00 00 00 00 00 24 17 1e f3 3c 13 3c 14 16 f9 18 f9 34 e7 23 a6 3c f2 24 f2 11 ce 1a 0d
Here is the interpretation of these bytes as a WAVE soundfile:
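One way to work out the interpretation is to decode the 72 bytes listed above with Python's `struct` module. The sketch below assumes the canonical 44-byte layout from the table (PCM, no extra parameters); the variable names are chosen for this example only.

```python
import struct

HEX = ("52 49 46 46 24 08 00 00 57 41 56 45 66 6d 74 20 "
       "10 00 00 00 01 00 02 00 22 56 00 00 88 58 01 00 "
       "04 00 10 00 64 61 74 61 00 08 00 00 00 00 00 00 "
       "24 17 1e f3 3c 13 3c 14 16 f9 18 f9 34 e7 23 a6 "
       "3c f2 24 f2 11 ce 1a 0d")
raw = bytes.fromhex(HEX)

NAMES = ("ChunkID", "ChunkSize", "Format",
         "Subchunk1ID", "Subchunk1Size", "AudioFormat", "NumChannels",
         "SampleRate", "ByteRate", "BlockAlign", "BitsPerSample",
         "Subchunk2ID", "Subchunk2Size")

# Decode the fixed 44-byte header; numeric fields are little-endian.
fields = dict(zip(NAMES, struct.unpack("<4sI4s4sIHHIIHH4sI", raw[:44])))

# The remaining 28 bytes are 16-bit stereo frames: (left, right) pairs.
frames = list(struct.iter_unpack("<hh", raw[44:]))

for name in NAMES:
    print(f"{name:14} {fields[name]}")
print("first frames:", frames[:3])
```

Read this way, the header describes 16-bit stereo PCM at 22050 Hz with a ByteRate of 88200, a BlockAlign of 4 and 2048 bytes of sample data; the listing shows only the first seven sample frames.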
For more info see
http://www.ora.com/centers/gff/formats/micriff/index.htm
Waveform Audio File Format (WAVE)
This section describes the Waveform format, which is used to
represent digitized sound.
The WAVE form is defined as follows. Programs must expect
(and ignore) any unknown chunks encountered, as with all
RIFF forms. However, 〈fmt-ck〉 must always occur before
〈wave-data〉, and both of these chunks are mandatory in a
WAVE file.
〈WAVE-form〉 -〉
RIFF( 'WAVE'
〈fmt-ck〉 // Format
[〈fact-ck〉] // Fact chunk
[〈cue-ck〉] // Cue points
[〈playlist-ck〉] // Playlist
[〈assoc-data-list〉] // Associated data list
〈wave-data〉 ) // Wave data
The WAVE chunks are described in the following sections.
WAVE Format Chunk
The WAVE format chunk 〈fmt-ck〉 specifies the format of the
〈wave-data〉. The 〈fmt-ck〉 is defined as follows:
〈fmt-ck〉 -〉 fmt( 〈common-fields〉
〈format-specific-fields〉 )
〈common-fields〉 -〉
struct
{
WORD wFormatTag; // Format category
WORD wChannels; // Number of channels
DWORD dwSamplesPerSec; // Sampling rate
DWORD dwAvgBytesPerSec; // For buffer estimation
WORD wBlockAlign; // Data block size
}
The fields in the 〈common-fields〉 chunk are as follows:
Field Description
wFormatTag A number indicating the WAVE format
category of the file. The content of
the 〈format-specific-fields〉 portion
of the `fmt' chunk, and the
interpretation of the waveform data,
depend on this value.
You must register any new WAVE format
categories. See ``Registering
Multimedia Formats'' in Chapter 1,
``Overview of Multimedia
Specifications,'' for information on
registering WAVE format categories.
``Wave Format Categories,'' following
this section, lists the currently
defined WAVE format categories.
wChannels The number of channels represented in
the waveform data, such as 1 for mono
or 2 for stereo.
dwSamplesPerSec The sampling rate (in samples per
second) at which each channel should
be played.
dwAvgBytesPerSec The average number of bytes per second
at which the waveform data should be
transferred. Playback software can
estimate the buffer size using this
value.
wBlockAlign The block alignment (in bytes) of the
waveform data. Playback software needs
to process a multiple of wBlockAlign
bytes of data at a time, so the value
of wBlockAlign can be used for buffer
alignment.
The 〈format-specific-fields〉 consists of zero or more bytes
of parameters. Which parameters occur depends on the WAVE
format category; see the following section for details.
Playback software should be written to allow for (and
ignore) any unknown 〈format-specific-fields〉 parameters that
occur at the end of this field.
WAVE Format Categories
The format category of a WAVE file is specified by the value
of the wFormatTag field of the `fmt' chunk. The
representation of data in 〈wave-data〉, and the content of
the 〈format-specific-fields〉 of the `fmt' chunk, depend on
the format category.
The currently defined open non-proprietary WAVE format
categories are as follows:
wFormatTag Value Format Category
WAVE_FORMAT_PCM (0x0001) Microsoft Pulse Code
Modulation (PCM) format
The following are the registered proprietary WAVE format
categories:
wFormatTag Value Format Category
IBM_FORMAT_MULAW IBM mu-law format
(0x0101)
IBM_FORMAT_ALAW (0x0102) IBM a-law format
IBM_FORMAT_ADPCM IBM AVC Adaptive
(0x0103) Differential Pulse Code
Modulation format
The following sections describe the Microsoft
WAVE_FORMAT_PCM format.
Pulse Code Modulation (PCM) Format
If the wFormatTag field of the 〈fmt-ck〉 is set to
WAVE_FORMAT_PCM, then the waveform data consists of samples
represented in pulse code modulation (PCM) format. For PCM
waveform data, the 〈format-specific-fields〉 is defined as
follows:
〈PCM-format-specific〉 -〉
struct
{
WORD wBitsPerSample; // Sample size
}
The wBitsPerSample field specifies the number of bits of
data used to represent each sample of each channel. If there
are multiple channels, the sample size is the same for each
channel.
For PCM data, the dwAvgBytesPerSec field of the `fmt' chunk
should be equal to the following formula rounded up to the
next whole number:
wChannels x dwSamplesPerSec x wBitsPerSample / 8
The wBlockAlign field should be equal to the following
formula, rounded up to the next whole number:
wChannels x wBitsPerSample / 8
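As a rough check of these two fields, the sketch below computes them in Python. The helper name `pcm_rates` is hypothetical. It also makes one assumption that goes beyond the formulas as printed: each sample is padded to a whole number of bytes before multiplying, which is what the example `fmt` chunks later in this section imply (the 20-bit mono example lists 132300 bytes per second and a block size of 3).

```python
import math

def pcm_rates(channels, samples_per_sec, bits_per_sample):
    """Return (dwAvgBytesPerSec, wBlockAlign) for PCM data (illustrative).

    Assumption: every sample occupies the smallest whole number of bytes
    that can hold bits_per_sample bits.
    """
    bytes_per_sample = math.ceil(bits_per_sample / 8)
    block_align = channels * bytes_per_sample        # bytes per sample frame
    avg_bytes_per_sec = samples_per_sec * block_align
    return avg_bytes_per_sec, block_align

for args in [(1, 11025, 8), (2, 22050, 8), (1, 44100, 20)]:
    print(args, "->", pcm_rates(*args))
```

For sample sizes that are a multiple of 8 bits this gives the same result as the formulas above.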
Data Packing for PCM WAVE Files
In a single-channel WAVE file, samples are stored
consecutively. For stereo WAVE files, channel 0 represents
the left channel, and channel 1 represents the right
channel. The speaker position mapping for more than two
channels is currently undefined. In multiple-channel WAVE
files, samples are interleaved.
The following diagrams show the data packing for 8-bit
mono and stereo WAVE files:
Sample 1 Sample 2 Sample 3 Sample 4
Channel 0 Channel 0 Channel 0 Channel 0
Data Packing for 8-Bit Mono PCM
Sample 1 Sample 2
Channel 0 Channel 1 Channel 0 Channel 1
(left) (right) (left) (right)
Data Packing for 8-Bit Stereo PCM
The following diagrams show the data packing for 16-bit mono
and stereo WAVE files:
Sample 1 Sample 2
Channel 0 Channel 0 Channel 0 Channel 0
low-order high-order low-order high-order
byte byte byte byte
Data Packing for 16-Bit Mono PCM
Sample 1
Channel 0 Channel 0 Channel 1 Channel 1
(left) (left) (right) (right)
low-order high-order low-order high-order
byte byte byte byte
Data Packing for 16-Bit Stereo PCM
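The 16-bit stereo packing shown in the last diagram can be sketched as follows. The function names are made up for this example, and the code handles only two channels of 16-bit samples.

```python
import struct

def pack_stereo16(left, right):
    """Interleave two lists of 16-bit samples: L0 R0 L1 R1 ... (low byte first)."""
    out = bytearray()
    for l, r in zip(left, right):
        out += struct.pack("<hh", l, r)   # channel 0 (left), then channel 1 (right)
    return bytes(out)

def unpack_stereo16(data):
    """Split interleaved 16-bit stereo bytes back into (left, right) lists."""
    frames = list(struct.iter_unpack("<hh", data))
    return [f[0] for f in frames], [f[1] for f in frames]

packed = pack_stereo16([1, -2], [256, 3])
print(packed.hex(" "))
print(unpack_stereo16(packed))
```

Each sample frame is four bytes here, which is the wBlockAlign value for 16-bit stereo.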
Data Format of the Samples
Each sample is contained in an integer i. The size of i is
the smallest number of bytes required to contain the
specified sample size. The least significant byte is stored
first. The bits that represent the sample amplitude are
stored in the most significant bits of i, and the remaining
bits are set to zero.
For example, if the sample size (recorded in wBitsPerSample)
is 12 bits, then each sample is stored in a two-byte
integer. The least significant four bits of the first (least
significant) byte are set to zero.
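The 12-bit case can be illustrated with a short Python sketch. The helper names are hypothetical; the code assumes signed 12-bit samples (sizes of nine bits or more are signed) and simply shifts each one into the top bits of a little-endian 16-bit integer.

```python
import struct

def pack_12bit_sample(value):
    """Store a signed 12-bit sample left-justified in two bytes, low byte first."""
    if not -2048 <= value <= 2047:
        raise ValueError("value does not fit in 12 bits")
    return struct.pack("<h", value << 4)   # low four bits of the container stay zero

def unpack_12bit_sample(data):
    """Recover the 12-bit sample from its two-byte container."""
    (container,) = struct.unpack("<h", data)
    return container >> 4                  # arithmetic shift keeps the sign

print(pack_12bit_sample(0x7FF).hex(" "))
print(unpack_12bit_sample(pack_12bit_sample(-2048)))
```

In the first byte of every packed sample the low nibble is zero, as the text describes.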
The data format and maximum and minimum values for PCM
waveform samples of various sizes are as follows:
Sample Size          Data Format        Maximum Value                 Minimum Value
One to eight bits    Unsigned integer   255 (0xFF)                    0
Nine or more bits    Signed integer i   Largest positive value of i   Most negative value of i
For example, the maximum, minimum, and midpoint values for
8-bit and 16-bit PCM waveform data are as follows:
Format        Maximum Value     Minimum Value       Midpoint Value
8-bit PCM     255 (0xFF)        0                   128 (0x80)
16-bit PCM    32767 (0x7FFF)    -32768 (-0x8000)    0
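Because 8-bit PCM is unsigned with a midpoint of 128 while 16-bit PCM is signed with a midpoint of 0, converting between the two means moving the midpoint as well as rescaling. The sketch below shows one simple way to do it; the function names are invented for this example, and the plain shift (no dithering or rounding) is an assumption rather than anything the specification prescribes.

```python
def pcm8_to_pcm16(u8):
    """Map unsigned 8-bit PCM (midpoint 128) to signed 16-bit PCM (midpoint 0)."""
    if not 0 <= u8 <= 255:
        raise ValueError("not an 8-bit sample")
    return (u8 - 128) << 8

def pcm16_to_pcm8(s16):
    """Map signed 16-bit PCM back to unsigned 8-bit PCM by dropping the low byte."""
    if not -32768 <= s16 <= 32767:
        raise ValueError("not a 16-bit sample")
    return (s16 >> 8) + 128

for u8 in (0, 128, 255):
    print(u8, "->", pcm8_to_pcm16(u8))
```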
Examples of PCM WAVE Files
Example of a PCM WAVE file with 11.025 kHz sampling rate,
mono, 8 bits per sample:
RIFF( 'WAVE' fmt(1, 1, 11025, 11025, 1, 8)
data( 〈wave-data〉 ) )
Example of a PCM WAVE file with 22.05 kHz sampling rate,
stereo, 8 bits per sample:
RIFF( 'WAVE' fmt(1, 2, 22050, 44100, 2, 8)
data( 〈wave-data〉 ) )
Example of a PCM WAVE file with 44.1 kHz sampling rate,
mono, 20 bits per sample:
RIFF( 'WAVE' INFO(INAM("O Canada"Z))
fmt(1, 1, 44100, 132300, 3, 20)
data( 〈wave-data〉 ) )
Storage of WAVE Data
The 〈wave-data〉 contains the waveform data. It is defined as
follows:
〈wave-data〉 -〉 { 〈data-ck〉 : 〈wave-list〉 }
〈data-ck〉 -〉 data( 〈wave-data〉 )
〈wave-list〉 -〉 LIST( 'wavl' { 〈data-ck〉 : // Wave samples
〈silence-ck〉 }... ) // Silence
〈silence-ck〉 -〉 slnt( 〈dwSamples:DWORD〉 ) // Count of silent samples
Note: The `slnt' chunk represents silence, not necessarily
a repeated zero volume or baseline sample. In 16-bit PCM
data, if the last sample value played before the silence
section is a 10000, then if data is still output to the D to
A converter, it must maintain the 10000 value. If a zero
value is used, a click may be heard at the start and end of
the silence section. If play begins at a silence section,
then a zero value might be used since no other information
is available. A click might be created if the data following
the silent section starts with a nonzero value.
FACT Chunk
The 〈fact-ck〉 fact chunk stores important information about
the contents of the WAVE file. This chunk is defined as
follows:
〈fact-ck〉 -〉 fact( 〈dwFileSize:DWORD〉 ) // Number of samples
The `fact' chunk is required if the waveform data is
contained in a `wavl' LIST chunk and for all compressed
audio formats. The chunk is not required for PCM files using
the `data' chunk format.
The `fact' chunk will be expanded to include any other
information required by future WAVE formats. Added fields
will appear following the 〈dwFileSize〉 field. Applications
can use the chunk size field to determine which fields are
present.
Cue-Points Chunk
The 〈cue-ck〉 cue-points chunk identifies a series of
positions in the waveform data stream. The 〈cue-ck〉 is
defined as follows:
〈cue-ck〉 -〉 cue( 〈dwCuePoints:DWORD〉 // Count of cue points
〈cue-point〉... ) // Cue-point table
〈cue-point〉 -〉 struct {
DWORD dwName;
DWORD dwPosition;
FOURCC fccChunk;
DWORD dwChunkStart;
DWORD dwBlockStart;
DWORD dwSampleOffset;
}
The 〈cue-point〉 fields are as follows:
Field Description
dwName Specifies the cue point name. Each
〈cue-point〉 record must have a unique
dwName field.
dwPosition Specifies the sample position of the
cue point. This is the sequential
sample number within the play order.
See ``Playlist Chunk,'' later in this
document, for a discussion of the play
order.
fccChunk Specifies the name or chunk ID of the
chunk containing the cue point.
dwChunkStart Specifies the file position of the
start of the chunk containing the cue
point. This is a byte offset relative
to the start of the data section of
the `wavl' LIST chunk.
dwBlockStart Specifies the file position of the
start of the block containing the
position. This is a byte offset
relative to the start of the data
section of the `wavl' LIST chunk.
dwSampleOffset Specifies the sample offset of the cue
point relative to the start of the
block.
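To make the record layout concrete, here is a hedged Python sketch that decodes the body of a cue-points chunk: a DWORD count followed by 24-byte cue-point records. `CuePoint` and `parse_cue_chunk` are names invented for this example, and the code assumes it is handed the chunk body only (without the chunk ID and size) in little-endian byte order.

```python
import struct
from collections import namedtuple

CuePoint = namedtuple(
    "CuePoint",
    "dwName dwPosition fccChunk dwChunkStart dwBlockStart dwSampleOffset",
)

RECORD = "<II4sIII"                      # five DWORDs and one FOURCC = 24 bytes
RECORD_SIZE = struct.calcsize(RECORD)

def parse_cue_chunk(body):
    """Decode the body of a cue-points chunk into a list of CuePoint records."""
    (count,) = struct.unpack_from("<I", body, 0)       # dwCuePoints
    return [
        CuePoint(*struct.unpack_from(RECORD, body, 4 + RECORD_SIZE * i))
        for i in range(count)
    ]

# Two made-up cue points inside a single `data' chunk.
body = (struct.pack("<I", 2)
        + struct.pack(RECORD, 1, 1000, b"data", 0, 0, 1000)
        + struct.pack(RECORD, 2, 5000, b"data", 0, 0, 5000))
for cue in parse_cue_chunk(body):
    print(cue)
```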
Examples of File Position Values
The following table describes the 〈cue-point〉 field values
for a WAVE file containing multiple `data' and `slnt' chunks
enclosed in a `wavl' LIST chunk:
Cue Point Location              Field            Value
In a `slnt' chunk               fccChunk         FOURCC value `slnt'.
                                dwChunkStart     File position of the `slnt' chunk relative to the start of the data section in the `wavl' LIST chunk.
                                dwBlockStart     File position of the data section of the `slnt' chunk relative to the start of the data section of the `wavl' LIST chunk.
                                dwSampleOffset   Sample position of the cue point relative to the start of the `slnt' chunk.
In a PCM `data' chunk           fccChunk         FOURCC value `data'.
                                dwChunkStart     File position of the `data' chunk relative to the start of the data section in the `wavl' LIST chunk.
                                dwBlockStart     File position of the cue point relative to the start of the data section of the `wavl' LIST chunk.
                                dwSampleOffset   Zero value.
In a compressed `data' chunk    fccChunk         FOURCC value `data'.
                                dwChunkStart     File position of the start of the `data' chunk relative to the start of the data section of the `wavl' LIST chunk.
                                dwBlockStart     File position of the enclosing block relative to the start of the data section of the `wavl' LIST chunk. The software can begin the decompression at this point.
                                dwSampleOffset   Sample position of the cue point relative to the start of the block.
The following table describes the 〈cue-point〉 field values
for a WAVE file containing a single `data' chunk:
Cue Point Location              Field            Value
Within PCM data                 fccChunk         FOURCC value `data'.
                                dwChunkStart     Zero value.
                                dwBlockStart     Zero value.
                                dwSampleOffset   Sample position of the cue point relative to the start of the `data' chunk.
In a compressed `data' chunk    fccChunk         FOURCC value `data'.
                                dwChunkStart     Zero value.
                                dwBlockStart     File position of the enclosing block relative to the start of the `data' chunk. The software can begin the decompression at this point.
                                dwSampleOffset   Sample position of the cue point relative to the start of the block.
Playlist Chunk
The 〈playlist-ck〉 playlist chunk specifies a play order for
a series of cue points. The 〈playlist-ck〉 is defined as
follows:
〈playlist-ck〉 -〉 plst(
〈dwSegments:DWORD〉 // Count of play segments
〈play-segment〉... ) // Play-segment table
〈play-segment〉 -〉 struct {
DWORD dwName;
DWORD dwLength;
DWORD dwLoops;
}
The 〈play-segment〉 fields are as follows:
Field Description
dwName Specifies the cue point name. This
value must match one of the names
listed in the 〈cue-ck〉 cue-point
table.
dwLength Specifies the length of the section in
samples.
dwLoops Specifies the number of times to play
the section.
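A playlist chunk body can be decoded in the same spirit as the cue-point table. The sketch below is illustrative only: the function names are made up, the input is assumed to be the chunk body in little-endian order, and the total is computed on the assumption that each segment plays dwLength samples dwLoops times.

```python
import struct

def parse_plst_chunk(body):
    """Decode a playlist chunk body into (dwName, dwLength, dwLoops) tuples."""
    (count,) = struct.unpack_from("<I", body, 0)       # dwSegments
    return [struct.unpack_from("<III", body, 4 + 12 * i) for i in range(count)]

def total_samples(segments):
    """Total samples played, assuming each segment plays dwLength x dwLoops."""
    return sum(length * loops for _name, length, loops in segments)

# Made-up playlist: cue 1 for 1000 samples twice, then cue 2 for 500 samples once.
body = (struct.pack("<I", 2)
        + struct.pack("<III", 1, 1000, 2)
        + struct.pack("<III", 2, 500, 1))
segments = parse_plst_chunk(body)
print(segments, total_samples(segments))
```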
Associated Data Chunk
The 〈assoc-data-list〉 associated data list provides the
ability to attach information like labels to sections of the
waveform data stream. The 〈assoc-data-list〉 is defined as
follows:
〈assoc-data-list〉 -〉 LIST('adtl'
〈labl-ck〉 // Label
〈note-ck〉 // Note
〈ltxt-ck〉 // Text with data length
〈file-ck〉 ) // Media file
〈labl-ck〉 -〉 labl(〈dwName:DWORD〉
〈data:ZSTR〉 )
〈note-ck〉 -〉 note(〈dwName:DWORD〉
〈data:ZSTR〉 )
〈ltxt-ck〉 -〉 ltxt(〈dwName:DWORD〉
〈dwSampleLength:DWORD〉
〈dwPurpose:DWORD〉
〈wCountry:WORD〉
〈wLanguage:WORD〉
〈wDialect:WORD〉
〈wCodePage:WORD〉
〈data:BYTE〉... )
〈file-ck〉 -〉 file(〈dwName:DWORD〉
〈dwMedType:DWORD〉
〈fileData:BYTE〉...)
Label and Note Information
The `labl' and `note' chunks have similar fields. The `labl'
chunk contains a label, or title, to associate with a cue
point. The `note' chunk contains comment text for a cue
point. The fields are as follows:
Field Description
dwName Specifies the cue point name. This
value must match one of the names
listed in the 〈cue-ck〉 cue-point
table.
data Specifies a NULL-terminated string
containing a text label (for the
`labl' chunk) or comment text (for the
`note' chunk).
Text with Data Length Information
The `ltxt' chunk contains text that is associated with a
data segment of specific length. The chunk fields are as
follows:
Field Description
dwName Specifies the cue point name. This
value must match one of the names
listed in the 〈cue-ck〉 cue-point
table.
dwSampleLength Specifies the number of samples in the
segment of waveform data.
dwPurpose Specifies the type or purpose of the
text. For example, dwPurpose can
specify a FOURCC code like `scrp' for
script text or `capt' for close-
caption text.
wCountry Specifies the country code for the
text. See ``Country Codes'' in Chapter
2, ``Resource Interchange File
Format,'' for a current list of
country codes.
wLanguage, Specify the language and dialect codes
wDialect for the text. See ``Language and
Dialect Codes'' in Chapter 2,
``Resource Interchange File Format,''
for a current list of language and
dialect codes.
wCodePage Specifies the code page for the text.
Embedded File Information
The `file' chunk contains information described in other
file formats (for example, an `RDIB' file or an ASCII text
file). The chunk fields are as follows:
Field Description
dwName Specifies the cue point name. This
value must match one of the names
listed in the 〈cue-ck〉 cue-point
table.
dwMedType Specifies the file type contained in
the fileData field. If the fileData
section contains a RIFF form, the
dwMedType field is the same as the
RIFF form type for the file.
This field can contain a zero value.
fileData Contains the media file.
To compress audio, MPEG tries to remove the irrelevant and the redundant parts of the signal. Parts of the sound that we do not hear can be thrown away. To do this, MPEG Audio uses psychoacoustic principles.
Now, the real reason for using 16 bits is to get a good signal-to-noise (s/n) ratio. The noise we're talking about here is quantization noise from the digitizing process. For each bit you add, you get about 6 dB better s/n. (A 6 dB increase corresponds to a doubling of the signal amplitude.) CD-audio achieves about 90 dB s/n. This matches the dynamic range of the ear fairly well. That is, you will not hear any noise coming from the system itself (well, there are still some people arguing about that, but let's not worry about them for the moment). So what happens when you sample to 8-bit resolution? You get a very noticeable noise floor in your recording. You can easily hear this in silent moments in the music, or between words or sentences if your recording is a human voice. Waitaminnit. You don't notice any noise in loud passages, right? This is the masking effect, and it is the key to MPEG Audio coding. Stuff like the masking effect belongs to a science called psychoacoustics that deals with the way the human brain perceives sound. And MPEG uses psychoacoustic principles when it does its thing.
Let's now try to explain how the MPEG Audio coder goes about its thing. It divides the frequency spectrum (20 Hz to 20 kHz) into 32 sub-bands. Each sub-band holds a little slice of the audio spectrum. Say, in the upper region of sub-band 8, a 1000 Hz tone with a level of 60 dB is present. OK, the coder calculates the masking effect of this sound and finds that there is a masking threshold for the entire 8th sub-band (all sounds w. a frequency...) 35 dB below this tone. The acceptable s/n ratio is thus 60 - 35 = 25 dB. This equals about 4-bit resolution. In addition there are masking effects on bands 9-13 and on bands 5-7, the effect decreasing with the distance from band 8. In a real-life situation you have sounds in most bands and the masking effects are additive. In addition the coder considers the sensitivity of the ear for various frequencies. The ear is a lot less sensitive in the high and low frequencies. Peak sensitivity is around 2-4 kHz, the same region that the human voice occupies.
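The step from an acceptable s/n ratio to a bit count can be checked with a few lines of Python. This is a back-of-the-envelope sketch, not part of any MPEG coder: it assumes the usual rule of thumb that each bit of linear quantization is worth 20*log10(2), about 6.02 dB, and the function name is made up for the example.

```python
import math

DB_PER_BIT = 20 * math.log10(2)      # about 6.02 dB of s/n per bit

def bits_for_snr(snr_db):
    """Approximate number of bits needed to reach a given s/n ratio."""
    return snr_db / DB_PER_BIT

tone_level_db = 60                   # level of the masking tone
threshold_below_tone_db = 35         # masking threshold relative to the tone
required_snr_db = tone_level_db - threshold_below_tone_db

print(f"required s/n: {required_snr_db} dB")
print(f"bits needed:  {bits_for_snr(required_snr_db):.2f}")
print(f"16 bits give: {16 * DB_PER_BIT:.1f} dB")
```

With these numbers the sub-band needs a little over 4 bits, in line with the figure quoted above, while 16-bit samples give roughly 96 dB in theory.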
The sub-bands should match the ear, that is, each sub-band should consist of frequencies that have the same psychoacoustic properties. In MPEG layer II, each sub-band is 625 Hz wide. It would have been better if the sub-bands were narrower in the low frequency range and wider in the high frequency range. To do this you need complex filters. To keep the filters simple, they chose to add an FFT in parallel with the filtering and use the spectral components from the FFT as additional information to the coder. This way you get higher resolution in the low frequencies, where the ear is more sensitive.
But there is more to it. We have explained concurrent masking, but the masking effect also occurs before and after a strong sound (pre- and postmasking) if there is a significant (30-40 dB) shift in level. The reason is believed to be that the brain needs some processing time. Premasking lasts only about 2 to 5 ms. Postmasking can last up to 100 ms. Other bit-reduction techniques involve considering tonal and non-tonal components of the sound. For a stereo signal you have a lot of redundancy between channels. The last step before formatting is Huffman coding.
The coder calculates masking effects by an iterative process until it runs out of time. It is up to the implementor to spend bits in the least obtrusive fashion. For layer II the coder works on 23 ms of sound (1152 samples) at a time. For some material the 23 ms time window can be a problem. This is normally in a situation with transients where there are large differences in sound level over the 23 ms. The masking is calculated on the strongest sound and the weak parts will drown in quantization noise. This is perceived as a noise echo by the ear. Layer III addresses this problem specifically.
The IUMA (Internet Underground Music Archive) holds many audio clips in MPEG compressed format, but you might need to configure your WWW browser. IUMA was founded to provide a worldwide audience to otherwise obscure and unavailable bands and artists.
A good summary of MPEG-1 audio is: "ISO-MPEG-1 Audio: A Generic Standard for Coding of High-Quality Digital Audio", J. Audio Eng. Soc. 42(10):780-792, October 1994.
| [8] Links Section | |
| [8.0] | Other Helpful FAQs |
| [8.1] | General Info |
| [8.2] | Technical Info |
| [8.3] | Musical Reference |
| [8.4] | Newsreader Software Info |
| [8.5] | MP3 Software For Non-Windows Machines |
| [9] The FAQ Quick Review Guide | |
| [9.0] | A Quick Reference For Working Within The a.b.s.m.* Newsgroups |
| [1] General Information |
| [1.0] | What is an "MP3"? |
| MP3 is the common name for MPEG Audio Layer III (MPEG layer 3). It is a sound compression system that can create near-CD-quality sound files while maintaining a small file size. | |
| [1.1] | What newsgroups does this FAQ apply to? |
| This FAQ covers the alt.binaries.sounds.mp3 hierarchy and
includes, but is not restricted to:
alt.binaries.sounds.mp3 - The binary posting group. This group is for the posting of binary sound files that are in the MP3 format. This group is NOT for the posting of text, requests, or ftp site announcements. It is for binaries and binaries only. The exceptions are: postings of this FAQ, zero-files (a.k.a. (0/x)), and Periodic Informational Postings (a.k.a. PIPs). The non-musical binary exceptions are cover art/insert scans, and other select related binaries.
alt.binaries.sounds.mp3.d - This is the discussion group for the a.b.s.mp3 hierarchy. This is one of two non-binary groups of the hierarchy. Binaries are strictly forbidden in this group. DO NOT post any binaries in the "d" (discussion) group. This group is for the discussion of MP3s, MP3 technology, and other MP3 related topics.
alt.binaries.sounds.mp3.requests - This is the request group of the hierarchy. It is *not* a binaries group and MP3 files should not be posted there. This group is intended to contain only requests and request follow-ups alerting the requestor that their request has been filled.
alt.binaries.sounds.mp3.19xxs - Also known as the decade groups.
These are groups that are similar to the main group (a.b.s.mp3) but are
ONLY for the posting of sounds from a specific decade, as indicated by
the group name. The groups are:
NOTE: Although the alt.binaries.sounds.country.mp3 group is *not* part of the alt.binaries.sounds.mp3 hierarchy (and therefore not bound by its FAQ or charter), it is available on a number of news servers and deserves a mention here for those people interested in country MP3s. |
| [1.2] | Dividing the groups into genres would be a good idea. How come there aren't groups like a.b.s.m.jazz, or a.b.s.m.metal? |
| It seems like every week there is a request that a new
MP3 binary group be created for a specific genre of music that
would be posted there.
There are a couple of reasons why this isn't the great idea that it may appear to be. The first reason is that there isn't enough consistently posted content to validate the addition of the new group. If there was one specific type of music that consistently accounted for more than 50% of the content of the main group *and* the rest of the group had no interest in that type of music, then *maybe* you'd have a case on this one point. But the types of music that get posted in the main group vary day to day, and you may go weeks without seeing any specific type of music being posted.
Look at the alt.binaries hierarchy as a good example of why a hierarchy *should* get subdivided into specific groups. There is a reason that there isn't just one group "alt.binaries". It has been divided and subdivided because there is/was a demand for that. There were enough people who wanted "sounds" versus "pictures" and felt a need to divide the "alt.binaries" hierarchy into those divisions. They were then subdivided even more into specific types of pictures, and specific types of sound files as necessary, but is it necessary to divide a.b.s.mp3 into *every* genre of music?
Another major problem would be specifying the content of the new group, and how it would differ from the other MP3 groups. Specifying by genre is an incredibly difficult thing to do. Where would the soundtrack to 'Bill & Ted's Excellent Adventure' be posted? Should it be posted to a.b.s.m.soundtrack? a.b.s.m.film-soundtrack? a.b.s.m.metal? a.b.s.m.pop.hits? a.b.s.m.compilation? a.b.s.m.male-artists? or a.b.s.m.80s? How do you determine the difference between "metal" and "hard rock"? Take a look at WinAMP's ID-Tag genre list; it's a great example of a lot of different ways to describe the same music. One person's "Booty Bass" is another person's "House" is another person's "Hip Hop".
Also, would your new group even get used?
There are thousands of binary groups, and a large number of those are nothing more than spam traps. A lot of them aren't even carried by most ISPs. The decade groups (the ones that are even used at all) are *still* unavailable on many news servers, and AOL won't even add the discussion group. Right now a.b.s.mp3 is the largest newsgroup by volume. Do you think that many news-admins want to add *another* MP3 binary group? For examples of some other mp3 groups, take a look at:
These groups all have very low mp3 traffic and may not even be carried by your news server. All in all, while creating the new group of your choice (so you don't have to search through the main group to find something that *you* like) may seem like a good idea, the odds of it truly being successful on its own are probably pretty small. |
| [1.3] | What are these groups all about? |
| They are about the posting of high quality MP3 compressed sound files. If you post here, please keep that in mind. | |
| [1.4] | What about the other MP3 groups that I see? Does this FAQ apply to them too? |
| There are a number of MP3 groups, some of which are unused (except for spam-posting). The above mentioned groups are the primary groups that this FAQ deals with. This does not mean that the information within this FAQ is not relevant and applicable to other groups, only that it is not this FAQ's intent | |
| [1.5] | Anything else I should know about this FAQ before I continue on? |
| There are many software applications and utilities involved
in the playing, encoding, decoding, posting, and retrieving of MP3s.
This FAQ is not meant to be a primer for the use of your particular software.
If it were to take into account every piece of popular software and its
inner workings or tricks, then this FAQ would rapidly become bloated and
unreadable. So, for the most part, this FAQ does not deal with specific
software issues. The exceptions are those that either relate to "frequently
asked questions" in the discussion group, or other helpful tips that might
not be readily found elsewhere. Specific Software Sub-Faqs
(S.S.Ss) may be available in the future to accommodate software issues
that relate to the a.b.s.mp3 hierarchy.
With all newsgroups, it is a common and recommended practice to "lurk".
This means that you follow the newsgroup, watching and learning, before
you begin posting. Posting is NOT required. There is no "ratio" or
required "trading" in the a.b.s.mp3 newsgroups. Leeching is completely
acceptable. If you are new to Usenet, or to binary newsgroups in
particular, there are a number of basic FAQs that may help you:
http://www.netannounce.org/news.announce.newusers/archive/usenet/primer/part1
http://www.netannounce.org/news.announce.newusers/archive/usenet/what-is/part1
|
|
| [2] Requesting MP3s |
| [2.0] | I really want a song to get posted. How do I request it? |
| Please post your request (REQ) in alt.binaries.sounds.mp3.requests
Posting requests in the binary group is particularly frowned upon, and these requests are likely to be ignored. The binary groups (alt.binaries.sounds.mp3 and the decade groups) are specifically intended to carry the binary posts (i.e. the MP3s themselves), and not requests. The exception to this is a "zero-file" included with the binary itself, which sometimes will include a request along with it.
A typical request might look like this:
REQ: Song Title - Artist - Other Info - Thanks
"Other Info" would include a specific album version or other pertinent information. And the "Thanks" is, of course, up to the discretion of the poster, as is the format. This is just a suggestion, but a standard REQ format would make the reading easier and allow sorting by Subject, which would provide an alphabetical listing of all requested songs. |
|
| [2.1] | I've come up with about 100 songs that I want. I guess I should post a separate request for each one, right? |
| Whoa now, wait one second. Nobody likes to see a REQ-Flood filling up the group. It makes you appear greedy, and is just generally annoying. And when you're asking for something from somebody, it's best to avoid being greedy and annoying. | |
| [2.2] | So how do I get ALL the songs that I want? |
| Why don't you pick the 5 songs that you particularly want
and request those. If/when they get posted, then you can request
the next 5, and so on. Don't forget that ripping, encoding, and posting
songs is a time consuming process, so try not to be too greedy.
Another option is to put your request list in the body of the message. The downside to this is that it's easier to quickly read the subject header. But if you're someone who posts a lot of files for other people, then it's likely that people will go through the process of reading your post, and will probably try to help you. |
|
| [2.3] | I want to make sure that people see my requests, so I'm going to post them five times each. People will notice me then, right? |
| People will notice you, but not in a good light. Posting the same message multiple times is called spamming, and it annoys people. See my previous note about asking people for something while simultaneously annoying them. The combination is not advantageous to you. | |
| [2.4] | I posted my requests and nobody filled them. Why? And what can I do about it? |
| It's possible that nobody has the songs you're requesting.
It's also possible that the song you requested was JUST posted, and people
don't want to repost it right away.
What can you do about it? Wait a week and post your requests again. It takes time for people to rip/encode and upload songs; give them a chance to get to you. There are a lot of people requesting songs all the time. Don't forget, beggars can't be choosers. You can also use an MP3 search engine. If your request is a popular song, it's pretty likely that somebody has already made an MP3 out of it, and it may be readily available via the World Wide Web. Links to search engines can be found on some of the MP3 web sites referenced in other portions of this FAQ. |
|
| [2.5] | I know how to make my requests now, but I can't find alt.binaries.sounds.mp3.requests. How am I supposed to post to the "requests" group if it doesn't exist? |
| It does exist, but maybe your news server doesn't carry it. First thing to do is to confirm that you can't access it through your ISP. | |
| [2.6] | How can I confirm that my news server carries the requests group? |
| The first thing to do is make sure you have an updated
list of all the newsgroups that your server provides. If you're using
Agent, this is accomplished by going to Online|Refresh Groups List
-or- Online|Get New Groups
After you have successfully retrieved all of the groups that your server carries, do a search for "alt.binaries.sounds.mp3.requests" (not including the quotes). If you find it, then subscribe, pull headers, and you're good to go. |
|
| [2.7] | The requests group isn't on my news server! I TOLD you that it doesn't exist! Now what do I do? |
| Okay, maybe it doesn't exist on your news server; after all, it *is* a relatively new group. The quickest option is to use www.dejanews.com, which provides free web access to Usenet, including alt.binaries.sounds.mp3.requests. | |
| [2.8] | I'm trying to remain anonymous, but when I signed up for dejanews they needed to know my e-mail address. So when I post a request won't people be able to find me? |
| I don't know all of the inner workings of dejanews, but you can always go to www.hotmail.com and get a new e-mail address. | |
| [2.9] | If I get a new e-mail address, then people won't recognize my name/nym and I won't get the files I request. Isn't there ANY other way to get the requests group? |
| Maybe you should try to get your ISP/news server to carry the group. Send a polite e-mail to them explaining that in your effort to respect Usenet etiquette, you feel that the discussion group alt.binaries.sounds.mp3.d should be carried by them. It was properly proposed in alt.config without a single dissenting comment. They already carry the binary group, and the addition of a discussion/non-binary group will not substantially affect their news server's performance. | |
| [2.10] | I made my request and I think it got posted, but with all the spam in the binary group I can't find a thing. I thought I heard about some filter that people are using. What is it? |
| Some newsreader software will allow you to use filters which can make the newsgroup more readable. A filter commonly used in these groups removes any post with fewer than 100 lines IF its subject does not contain any of the following: (0/#), nfo, txt, image, scan, or "0 of". Just remember that filters are not infallible, and if you use them there is the possibility that you'll miss something that you wanted to see. | |
| [2.11] | Yadda-yadda-yadda... Just give me the spam filter for Agent! |
| Until an a.b.s.mp3 software FAQ is created, and since this
is of interest to a number of people in the a.b.s.mp3 groups, the filter
for Agent 1.51 is included here. Note that although it is formatted
for Agent 1.51, similar filters can easily be created for other software
packages or other versions of Agent.
kill subject: * and [1,100] and not ({0/} |"0 of"|nfo|txt|image|scan) |
|
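For readers whose newsreader has no comparable filter syntax, the rule described in [2.10] can be sketched in Python. This is only an illustration of the logic, not Agent's own implementation: the names `KEEP_TERMS`, `MAX_LINES`, and `should_kill` are hypothetical, the "(0/#)" pattern is approximated by a plain search for "0/", and "fewer than 100 lines" is assumed to mean a line count strictly below 100.

```python
# Subject fragments from the FAQ's filter; a post whose subject contains
# any of them is never killed. "0/" is a rough stand-in for "(0/#)".
KEEP_TERMS = ("0/", "0 of", "nfo", "txt", "image", "scan")

# Assumption: "fewer than 100 lines" is read as line_count < 100.
MAX_LINES = 100


def should_kill(subject: str, line_count: int) -> bool:
    """Return True if a post looks like short spam under the FAQ's rule."""
    if line_count >= MAX_LINES:
        return False  # long posts are presumed to be binaries and are kept
    lowered = subject.lower()
    return not any(term in lowered for term in KEEP_TERMS)


# Made-up example headers: (subject, number of lines in the post).
posts = [
    ("MAKE MONEY FAST!!!", 20),
    ("Artist - Song.mp3 (0/12)", 15),
    ("Artist - Song.mp3 (03/12)", 5000),
    ("Artist - Album.nfo", 30),
]
for subject, lines in posts:
    print("kill" if should_kill(subject, lines) else "keep", subject)
```

Because the match is a simple substring test, a short post whose subject happens to contain "0/" (for example "(10/12)") is also kept, which is one reason such filters are not infallible.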
| [2.12] | Where is this "d" group or "discussion group" that everybody talks about? I can't find it on my news server. |
| If you can't find alt.binaries.sounds.mp3.d, then refer back to sections [2.5], [2.6], and [2.7] and read them with the "d" or "discussion" group in mind instead of the requests group. | |
| [2.13] | I thought that all requests were supposed to go into the discussion group. If that's not true, then why are there so many requests there? |
| Until recently the requests group didn't exist on any news servers, so the only appropriate (i.e. non-binary) group in the hierarchy for requests was alt.binaries.sounds.mp3.d. Until the requests group fully propagates, there will continue to be requests in the discussion group, and that is still more appropriate than posting them in the binary groups. | |
| [3] Making MP3s |
| [3.0] | Other detailed sources of instruction |
| There are other introductions to the creation of MP3s available
on the WWW that provide a much more detailed description of the process,
and even have specific software examples. This document is not intended
to replace those, or to teach you all the ins and outs of mp3 creation.
Look at: http://www.mp3.com/dummies.html |
|
| [3.1] | I want to give something back to this group. How do I make an MP3? |
| Making MP3s from scratch involves a couple of steps. The first is acquiring the sound file and the second is encoding the file into MP3 format. | |
| [3.2] | How do I get the music from my CD-ROM onto my computer? |
| The preferred method of making MP3s is to do it from a
digital source (CD) and capture it digitally (digital audio extraction).
NOTE: The first thing to do is determine whether your CD-ROM supports digital audio extraction. |
|
| [3.3] | How do I determine if my CD-ROM supports digital audio extraction (DAE)? |
| Some software packages will test your system for you.
If you have Easy CD Creator, go to Tools|System Tests|Audio Extraction and run the test. You can also check the page at: http://www.tardis.ed.ac.uk/~psyche/cdda/CDDAresults_f.shtml, or a less detailed, but easier to read page at: http://www.mp3.com/cdrom.html. If you think that you're ripping tracks (DAE) but you're not sure, and you may actually be sampling them through your sound card, then disconnect the audio cable that goes from your CD-ROM to your sound card and try again. That should leave no doubt. |
|
| [3.4] | I know my CD-ROM does DAE, but I'm having strange problems and I can't get it to work right. What do I do? |
| You may be having compatibility problems with a specific
piece of software.
Check: http://www.tardis.ed.ac.uk/~psyche/cdda/CDDAresults_f.shtml to see if there are any software issues with your particular cd-rom drive. You can also find some tips at: http://www.mp3.com/cdromtips.html |
|
| [3.5] | My CD-ROM supports DAE, what do I use to rip audio tracks? |
| There are many different software choices, and each has
its pros and cons. Some will encode as you rip the audio, some work
better with SCSI drives, etc. Rippers of choice are WinDAC, audiograbber,
CD-Copy, CDDA and many others.
For more information go to: http://www.layer3.org/software/rippers.html or http://www.mp3.com/windows/cdrippers.html |
|
| [3.6] | Can I encode an MP3 straight off of the CD? |
| Yes, if you have mp3 compressor or mp3 producer installed, you can copy a track straight into an MP3 with windac32. Go to the menu 'DAC', then to 'select wave format' and choose 'Fraunhofer IIS MPEG Layer-3 Codec (professional)'. The 'MPEG Encoder' (a.k.a. SoloH encoder) also allows MP3 encoding straight from the CD. | |
| [3.7] | I've ripped the audio track but the .wav file is messed up. It seems jittery and has pops or skips. Why? |
| Just because your CD-ROM is a 24x doesn't mean that it
can necessarily rip audio at that speed. Frequently jitter problems
are directly related to the speed at which you're ripping audio.
Set your software to a slower speed and try again.
Some software, such as WinDAC, has a jitter-correction option that may help. Or you may just be having a software compatibility problem. Some ripping software doesn't work well with certain CD-ROM drives. Try using a different piece of software. For more info on specific drives and software that works with them, go to: http://www.tardis.ed.ac.uk/~psyche/cdda/CDDAresults_f.shtml or http://www.mp3.com/cdrom.html. For some general CD-ROM compatibility tips check out: http://www.mp3.com/cdromtips.html |
|
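To get a feel for why ripping at a drive's full rated speed can be demanding, here is a rough back-of-the-envelope sketch. It assumes standard CD audio (44,100 samples per second, 16-bit, stereo) and treats the "x" rating as a simple multiple of the real-time audio rate; the function name `cd_audio_bytes_per_second` is made up for this illustration.

```python
# Standard CD audio parameters.
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
CHANNELS = 2              # stereo
BYTES_PER_SAMPLE = 2      # 16-bit samples


def cd_audio_bytes_per_second(speed_multiplier: int = 1) -> int:
    """Bytes of raw audio per second when extracting at the given multiple."""
    return SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * speed_multiplier


for speed in (1, 4, 24):
    rate = cd_audio_bytes_per_second(speed)
    print(f"{speed:>2}x: {rate:,} bytes/s")
```

At 1x this works out to 176,400 bytes per second, so a 24x rip would have to sustain more than 4 million bytes per second without a hiccup. Dropping to a lower speed gives the drive and the software far more headroom.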
| [3.8] | I don't like the way the song sounds on the CD because I like more bass. Should I adjust the E.Q. on the .wav file before making it into an MP3 and uploading it? |
| Please don't. People generally want to hear an MP3 that is as close to the original CD as possible. Even though you may feel that something helpful (like normalizing the songs) will make them better, that decision should be left to the final recipient. If they want to tweak their MP3s, then they can do it themselves. If you *have* tweaked or adjusted the song before you encode it, please make that information known when you post it. See sections [4.7] and [4.8] for more information. | |
| [3.9] | I've ripped the track to my hard drive. Anything I should do before I turn it into an MP3? |