ref: 8f8ca9c9e3cf1a08fa9b760bee637ba4e1ce8d34
parent: 611944682cb171708f36fed63af0343b95b2c9df
author: David <david@rowetel.com>
date: Sun Dec 16 04:30:53 EST 2018
updated README
--- a/dnn/README.md
+++ b/dnn/README.md
@@ -4,30 +4,24 @@
# Introduction
-Work in progress software for researching low CPU complexity algorithms for speech compression by applying Linear Prediction techniques to WaveRNN. The goal is to reduce the CPU complexity such that high quality speech can be synthesised on regular CPUs (around 1 GFLOP).
+Work-in-progress software for researching low-complexity algorithms for speech synthesis and compression by applying Linear Prediction techniques to WaveRNN. High-quality speech can be synthesised on regular CPUs (around 1 GFLOP) with SIMD support (AVX, AVX2 and NEON are currently supported).
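+
+On Linux you can check which of these instruction sets your CPU supports with a quick look at /proc/cpuinfo (a sanity check only, not part of the build):
+```
+grep -o 'avx2\|avx\|neon' /proc/cpuinfo | sort -u
+```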
-The BSD licensed software is written in C and Keras and currently requires a GPU (e.g. GT1060) to run.
-For training models, a GTX 1080 Ti or better is recommended.
+The BSD-licensed software is written in C and Keras. For training, a GTX 1080 Ti or better is recommended.
-This software is also a useful resource as an open source starting point for WaveRNN-based speech coding.
+This software is an open-source starting point for WaveRNN-based speech coding.
# Quickstart
1. Set up a Keras system with GPU.
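+   One way to do this (a sketch only; package names are assumptions for a 2018-era TensorFlow/Keras GPU stack, versions not pinned by this repo):
+   ```
+   pip install numpy h5py tensorflow-gpu keras
+   ```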
-1. In the src/ directory, run ./compile.sh to compile the data processing program.
-
-1. Then, run the resulting executable:
+1. Generate training data:
```
- ./dump_data input.s16 features.f32 pcm.s16
+ make dump_data
+ ./dump_data -train input.s16 features.f32 pcm.s16
```
+ where the first file contains 16 kHz 16-bit raw PCM audio (no header) and the other files are output files. This program makes several passes over the data with different filters to generate a large amount of training data.
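+   If your source speech is in WAV format, it can be converted to the required headerless format with sox (an illustrative command; assumes sox is installed and input.wav is your source recording):
+   ```
+   sox input.wav -t raw -r 16000 -e signed-integer -b 16 -c 1 input.s16
+   ```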
- where the first file contains 16 kHz 16-bit raw PCM audio (no header)
-and the other files are output files. The input file currently used
-is 6 hours long, but you may be able to get away with less (and you can
-always use ±5% or 10% resampling to augment your data).
-
-1. Now that you have your files, you can do the training with:
+1. Now that you have your files, train with:
```
./train_lpcnet.py features.f32 pcm.s16
```
@@ -34,12 +28,25 @@
and it will generate an lpcnet*.h5 file for each iteration. If it stops with a
"Failed to allocate RNN reserve space" message, try reducing the *batch\_size* variable in train_lpcnet.py.
-1. You can synthesise speech with:
- ```
- ./test_lpcnet.py features.f32 > pcm.txt
- ```
- The output file pcm.txt contains ASCII PCM samples that need to be converted to WAV for playback
-
+1. You can synthesise speech with Python on your GPU:
+ ```
+ ./dump_data -test test_input.s16 test_features.f32
+ ./test_lpcnet.py test_features.f32 test.s16
+ ```
+   Note that the .h5 model file name is hard-coded in test_lpcnet.py; modify it to match your .h5 file.
+
+1. Or with C on a CPU:
+   First, extract the model files nnet_data.h and nnet_data.c:
+ ```
+ ./dump_lpcnet.py lpcnet15_384_10_G16_64.h5
+ ```
+ Then you can make the C synthesiser and try synthesising from a test feature file:
+ ```
+ make test_lpcnet
+ ./dump_data -test test_input.s16 test_features.f32
+ ./test_lpcnet test_features.f32 test.s16
+ ```
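+   The synthesised output is also headerless 16 kHz 16-bit PCM. One way to listen to it (assumes ALSA's aplay; any raw PCM player will do):
+   ```
+   aplay -t raw -f S16_LE -r 16000 -c 1 test.s16
+   ```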
+
# Speech Material for Training
Suitable training material can be obtained from the [McGill University Telecommunications & Signal Processing Laboratory](http://www-mmsp.ece.mcgill.ca/Documents/Data/). Download the ISO and extract the 16k-LP7 directory; the src/concat.sh script can be used to generate a headerless file of training samples.
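+
+If you want to roll your own concatenation, a minimal sketch follows (the directory layout and file format of the extracted ISO are assumptions here; src/concat.sh is the authoritative version):
+```
+for f in 16k-LP7/*/*.wav; do
+    sox "$f" -t raw -r 16000 -e signed-integer -b 16 -c 1 -
+done > input.s16
+```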
@@ -50,7 +57,7 @@
# Reading Further
-1. If you're lucky, you may be able to get the current model at:
+1. [LPCNet: DSP-Boosted Neural Speech Synthesis](https://people.xiph.org/~jm/demo/lpcnet/)
+1. Sample model files:
https://jmvalin.ca/misc_stuff/lpcnet_models/
-1. [WaveNet and Codec 2](https://www.rowetel.com/?p=5966)
--