Neural Networks Beginnings - page 3




import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Defining the architecture of a neural network:
# input = a variable-length sequence of 13-dimensional MFCC vectors (one per frame),
# output = probabilities for the 10 digit words
model = keras.Sequential(
    [
        keras.Input(shape=(None, 13)),
        layers.LSTM(128),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ]
)

# Compilation of the model
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss=keras.losses.CategoricalCrossentropy(),
    metrics=["accuracy"],
)

# Loading the audio file as mono; decode_wav already returns float32 samples in [-1, 1]
audio_file = tf.io.read_file("audio.wav")
audio, sample_rate = tf.audio.decode_wav(audio_file, desired_channels=1)
audio = tf.squeeze(audio, axis=-1)           # [samples, 1] -> [samples]
sample_rate = float(sample_rate.numpy())     # Python float for the mel filter bank

# Splitting into overlapping frames (640 samples, step 320) and taking the STFT;
# pad_end=True zero-pads the end so the last partial frame is kept
frame_length = 640
frame_step = 320
stft = tf.signal.stft(audio, frame_length=frame_length, frame_step=frame_step, pad_end=True)
spectrogram = tf.abs(stft)                   # [num_frames, 513]; fft_length defaults to 1024

# Extracting MFCC features: magnitude spectrogram -> mel spectrogram -> log -> DCT
num_mel_bins = 13
mel_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=num_mel_bins,
    num_spectrogram_bins=spectrogram.shape[-1],
    sample_rate=sample_rate,
    lower_edge_hertz=20.0,
    upper_edge_hertz=sample_rate / 2,
)
mel_spectrogram = tf.tensordot(spectrogram, mel_matrix, 1)  # [num_frames, 13]
log_mel = tf.math.log(mel_spectrogram + 1e-6)               # offset avoids log(0)
mfccs = tf.signal.mfccs_from_log_mel_spectrograms(log_mel)  # [num_frames, 13]

# Data preparation for training
labels = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "zero"]
label_to_index = dict(zip(labels, range(len(labels))))
index_to_label = dict(zip(range(len(labels)), labels))

# The model predicts ONE class per clip, so the clip needs a single label.
# Assumption: audio.wav contains the word "one"; replace with the actual label.
spoken_word = "one"
target = tf.one_hot(label_to_index[spoken_word], depth=len(labels))  # shape [10]

X_train = mfccs[None, ...]    # add a batch dimension: [1, num_frames, 13]
y_train = target[None, ...]   # [1, 10]

# Training the model
history = model.fit(X_train, y_train, epochs=10)

# Making predictions
predicted_probs = model.predict(X_train)                       # [1, 10]
predicted_index = int(tf.argmax(predicted_probs, axis=-1)[0])
predicted_label = index_to_label[predicted_index]

# Outputting results
print("Predicted label:", predicted_label)


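This code implements a simple form of automatic speech recognition (classifying a single spoken digit) using a neural network built on TensorFlow and Keras. The first step is to define the network architecture with the Keras Sequential API. An LSTM (recurrent) layer with 128 units reads a sequence of 13-dimensional MFCC feature vectors, one per audio frame. It is followed by two fully connected layers with the ReLU activation function (64 and 32 units) and an output layer of 10 units with the softmax activation function, which outputs a probability for each of the ten digit words. Because the LSTM returns only its final output, the network produces one prediction per audio clip, so it classifies isolated words rather than transcribing continuous speech.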
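To make the layer sizes concrete, the short sketch below counts the trainable parameters of this architecture by hand. It assumes the standard Keras layouts: an LSTM layer has four gates, each with an input kernel, a recurrent kernel and a bias, and a Dense layer has a weight matrix plus a bias vector. The helper names `lstm_params` and `dense_params` are hypothetical, not part of Keras.

```python
# Hand count of trainable parameters for the Sequential model described above.
# Assumes the standard Keras parameterisation; helper names are hypothetical.

def lstm_params(input_dim, units):
    # 4 gates (input, forget, cell, output), each with an input kernel,
    # a recurrent kernel and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # weight matrix plus one bias per unit
    return input_dim * units + units


layer_params = {
    "LSTM(128)": lstm_params(13, 128),
    "Dense(64)": dense_params(128, 64),
    "Dense(32)": dense_params(64, 32),
    "Dense(10)": dense_params(32, 10),
}

for name, count in layer_params.items():
    print(f"{name:10s} {count:>7,d}")
print(f"{'Total':10s} {sum(layer_params.values()):>7,d}")
```

With 13 input features this gives 72,704 + 8,256 + 2,080 + 330 = 83,370 parameters, so the large majority sit in the LSTM layer.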

Next, the model is compiled with the compile method, using the Adam optimizer with a learning rate of 0.001, categorical cross-entropy as the loss function (it expects one-hot encoded target vectors), and classification accuracy as the metric.
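For a single example with a one-hot target, categorical cross-entropy reduces to the negative log of the probability the model assigns to the correct class, and accuracy simply checks whether the most probable class is the correct one. The plain-Python sketch below illustrates both on made-up softmax outputs; the probability values and function names are illustrative only, and Keras additionally clips probabilities and averages the loss over the batch.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # -sum(y_true * log(y_pred)); probabilities are clipped to avoid log(0)
    return -sum(t * math.log(min(max(p, eps), 1.0 - eps)) for t, p in zip(y_true, y_pred))


def argmax(values):
    # index of the largest value (first one on ties)
    return max(range(len(values)), key=values.__getitem__)


# One-hot target for class index 2 out of 10 (illustrative values)
y_true = [0.0] * 10
y_true[2] = 1.0

confident = [0.01] * 10
confident[2] = 0.91          # 9 * 0.01 + 0.91 = 1.0
uncertain = [0.1] * 10       # uniform guess over 10 classes

for name, probs in [("confident", confident), ("uncertain", uncertain)]:
    loss = categorical_crossentropy(y_true, probs)
    correct = argmax(probs) == argmax(y_true)
    print(f"{name}: loss={loss:.4f}, correct={correct}")
```

A uniform guess over ten classes costs ln 10 (about 2.30), roughly where an untrained 10-class model starts, while the confident correct prediction costs about 0.094.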

Then a WAV file is loaded and decoded with tf.audio.decode_wav, which already returns the samples as float32 values in the range [-1, 1]. The signal is then split into overlapping frames of 640 samples with a step of 320 samples, so consecutive frames overlap by half (at a 16 kHz sample rate, 40 ms frames every 20 ms). If the signal length does not divide evenly into frames, the end is zero-padded.
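The framing arithmetic can be checked without TensorFlow. The plain-Python sketch below splits a list of samples into overlapping frames and zero-pads the end, following the convention `tf.signal.frame` uses with `pad_end=True`: the number of frames is ceil(length / frame_step), and the signal is padded so the last frame is complete. The function name `frame_signal` is hypothetical.

```python
import math


def frame_signal(samples, frame_length=640, frame_step=320):
    """Split samples into overlapping frames, zero-padding the end (hypothetical helper)."""
    num_frames = math.ceil(len(samples) / frame_step)
    # Pad so the last frame, starting at (num_frames - 1) * frame_step, is full length
    padded_length = (num_frames - 1) * frame_step + frame_length
    padded = list(samples) + [0.0] * max(0, padded_length - len(samples))
    return [padded[i * frame_step : i * frame_step + frame_length] for i in range(num_frames)]


signal = [1.0] * 1000  # 1000 dummy samples
frames = frame_signal(signal)
print(len(frames), [len(f) for f in frames])   # number of frames and their lengths
print(sum(1 for x in frames[-1] if x == 0.0))  # zero-padded samples in the last frame
```

For 1000 samples this yields 4 frames of 640 samples; the last frame holds 40 real samples and 600 zeros. Four overlapping frames contain 2,560 values while the padded signal has only 1,600 samples, which is why overlapping frames cannot be obtained by simply reshaping the signal.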

