Trying out the TensorFlowTTS Korean example (KSS dataset)

2021. 6. 15. 17:47 · Things I tried

References

TensorFlowTTS GitHub

https://github.com/TensorSpeech/TensorFlowTTS

Colab example (TensorFlow)

https://colab.research.google.com/drive/1cL2NwGSUC5hFkF4k8pGrS7c7i5xNBono?usp=sharing

Useful when attempting training

https://github.com/TensorSpeech/TensorFlowTTS/pull/130

https://github.com/TensorSpeech/TensorFlowTTS/issues/415

How to read the attention graph

https://github.com/keithito/tacotron/issues/144

Environment setup (local, Windows)

anaconda

conda create -n tftts python=3.7
conda activate tftts

Environment listed in the GitHub repository

Python 3.7+
CUDA 10.1
cuDNN 7.6.5
TensorFlow 2.2/2.3
TensorFlow Addons >= 0.10.0

Keep your setup within these versions.

pip

git clone https://github.com/TensorSpeech/TensorFlowTTS.git
cd TensorFlowTTS
pip install .
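To check that an environment stays within the versions above, a small script can read the installed package versions. This is a sketch, not part of TensorFlowTTS: it assumes the packages are installed under the pip names `tensorflow` and `tensorflow-addons` (a `tensorflow-gpu` install would need that name instead), and it cannot check CUDA/cuDNN, which `nvcc --version` covers. `check_environment` and `parse_version` are hypothetical helpers.

```python
import sys

try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: pip install importlib-metadata
    import importlib_metadata as metadata

# Versions from the repository's requirement list above (assumption: they may change upstream).
ALLOWED_TF_PREFIXES = ("2.2.", "2.3.")
MIN_TFA = (0, 10, 0)


def parse_version(v: str) -> tuple:
    """Turn '2.3.1' or '0.10.0rc1' into a tuple of leading integers: (2, 3, 1), (0, 10, 0)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_environment(tf_version: str, tfa_version: str, py_version=sys.version_info) -> list:
    """Return a list of human-readable problems; an empty list means everything is in range."""
    problems = []
    if tuple(py_version[:2]) < (3, 7):
        problems.append(f"Python {py_version[0]}.{py_version[1]} is older than 3.7")
    if not tf_version.startswith(ALLOWED_TF_PREFIXES):
        problems.append(f"tensorflow {tf_version} is not 2.2.x/2.3.x")
    if parse_version(tfa_version) < MIN_TFA:
        problems.append(f"tensorflow-addons {tfa_version} is older than 0.10.0")
    return problems


if __name__ == "__main__":
    try:
        tf_v = metadata.version("tensorflow")
        tfa_v = metadata.version("tensorflow-addons")
    except metadata.PackageNotFoundError as err:
        print("not installed:", err)
    else:
        for problem in check_environment(tf_v, tfa_v):
            print("WARNING:", problem)
        print("environment check done")
```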

Additional installs

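pip install h5py==2.10
pip install git+https://github.com/repodiac/german_transliterate
# h5py: module for using HDF5 (Hierarchical Data Format, a file format for handling large data) from Python
# german_transliterate: German transliteration module; a Python module that cleans up and transliterates
# (i.e. normalizes) German text containing abbreviations, numbers, timestamps, etc.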

Install these to get rid of the annoying ModuleNotFoundError messages.

Any remaining errors are most likely GPU or path problems.

Directory structure

./
|- synth_module.py
|- mb.melgan-1000k.h5
|- fastspeech2-200k.h5
|- tacotron2-100k.h5
|- kss_mapper.json
|- output/
|   |- [generated WAV files]
|- TensorFlowTTS/
|   |- ...

mb.melgan-1000k.h5
fastspeech2-200k.h5
tacotron2-100k.h5
kss_mapper.json

These files are downloaded when you run the Colab example. You can get them by downloading them from the Colab instance, or by mounting Google Drive and saving them there.
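Before running synth_module.py, a quick sketch can confirm that the four downloaded files sit where the directory structure above expects them (the project root `./`). The helper name `missing_files` is hypothetical, and the script assumes it is started from the project root.

```python
from pathlib import Path

# File names from the directory structure above (the Colab checkpoints and the KSS mapper).
REQUIRED_FILES = (
    "mb.melgan-1000k.h5",
    "fastspeech2-200k.h5",
    "tacotron2-100k.h5",
    "kss_mapper.json",
)


def missing_files(root) -> list:
    """Return the required file names that are not present directly under root."""
    root = Path(root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumes the current working directory is the project root (./ in the tree above).
    missing = missing_files(Path.cwd())
    print("missing files:", ", ".join(missing) if missing else "none")
```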

synth_module.py

code

import os
import sys
from datetime import datetime
import tensorflow as tf
import time
import yaml
import numpy as np
import matplotlib.pyplot as plt

from tensorflow_tts.inference import AutoConfig
from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

import scipy.io.wavfile as wavf

class VoiceSynthesis:
    # initialize the models
    def __init__(self):
        # cap this process at 20% of GPU memory (TF1-style option, reached through tf.compat.v1)
        gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.200)
        conf = tf.compat.v1.ConfigProto(gpu_options=gpu_options)
        # alternatively, allocate memory on demand
        # conf.gpu_options.allow_growth = True
        session = tf.compat.v1.Session(config=conf)

        # Tacotron2: load the config and the pretrained weights
        module_path = os.path.dirname(os.path.abspath(__file__))        
        tacotron2_config = AutoConfig.from_pretrained(os.path.join(module_path,'TensorFlowTTS/examples/tacotron2/conf/tacotron2.kss.v1.yaml'))
        self.tacotron2 = TFAutoModel.from_pretrained(
            config=tacotron2_config,
            pretrained_path=os.path.join(module_path,"tacotron2-100k.h5"),
            name="tacotron2"
        )

        # FastSpeech2: load the config and the pretrained weights
        fastspeech2_config = AutoConfig.from_pretrained(os.path.join(module_path,'TensorFlowTTS/examples/fastspeech2/conf/fastspeech2.kss.v1.yaml'))
        self.fastspeech2 = TFAutoModel.from_pretrained(
            config=fastspeech2_config,
            pretrained_path=os.path.join(module_path,"fastspeech2-200k.h5"),
            name="fastspeech2"
        )        

        # Multi-band MelGAN: load the config and the pretrained weights
        mb_melgan_config = AutoConfig.from_pretrained(os.path.join(module_path,'TensorFlowTTS/examples/multiband_melgan/conf/multiband_melgan.v1.yaml'))
        self.mb_melgan = TFAutoModel.from_pretrained(
            config=mb_melgan_config,
            pretrained_path=os.path.join(module_path,"mb.melgan-1000k.h5"),
            name="mb_melgan"
        )

        # processor: load the mapper from each character to its integer ID
        self.processor = AutoProcessor.from_pretrained(pretrained_path=os.path.join(module_path,"kss_mapper.json"))

    # convert input text -> speech
    def do_synthesis(self, input_text, text2mel_model, vocoder_model, text2mel_name, vocoder_name):
        # characters (initial/medial/final jamo) -> integer ID sequence
        input_ids = self.processor.text_to_sequence(input_text)

        # text2mel part
        if text2mel_name == "TACOTRON":
            _, mel_outputs, stop_token_prediction, alignment_history = text2mel_model.inference(
                tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
                tf.convert_to_tensor([len(input_ids)], tf.int32),
                tf.convert_to_tensor([0], dtype=tf.int32)
            )
        elif text2mel_name == "FASTSPEECH2":
            mel_before, mel_outputs, duration_outputs, _, _ = text2mel_model.inference(
                tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
                speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
                speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
                f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
                energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
            )
        else:
            raise ValueError("Only TACOTRON, FASTSPEECH2 are supported on text2mel_name")

        # vocoder part
        if vocoder_name == "MB-MELGAN":
            audio = vocoder_model.inference(mel_outputs)[0, :, 0]
        else:
            raise ValueError("Only MB-MELGAN is supported on vocoder_name")
        # mel spectrogram, attention graph, inferenced audio
        if text2mel_name == "TACOTRON":
            return mel_outputs.numpy(), alignment_history.numpy(), audio.numpy()
        else:
            return mel_outputs.numpy(), audio.numpy()
    # plot the attention graph (not used)
    def visualize_attention(self, alignment_history):
        fig = plt.figure(figsize=(8, 6))
        ax = fig.add_subplot(111)
        ax.set_title(f'Alignment steps')
        im = ax.imshow(
            alignment_history,
            aspect='auto',
            origin='lower',
            interpolation='none')
        fig.colorbar(im, ax=ax)
        xlabel = 'Decoder timestep'
        plt.xlabel(xlabel)
        plt.ylabel('Encoder timestep')
        plt.tight_layout()
        plt.show()
        plt.close()
    # visualize the mel spectrogram
    def visualize_mel_spectrogram(self, mels):
        mels = tf.reshape(mels, [-1, 80]).numpy()
        fig = plt.figure(figsize=(10, 8))
        ax1 = fig.add_subplot(311)
        ax1.set_title(f'Predicted Mel-after-Spectrogram')
        im = ax1.imshow(np.rot90(mels), aspect='auto', interpolation='none')
        fig.colorbar(mappable=im, shrink=0.65, orientation='horizontal', ax=ax1)
        plt.show()
        plt.close()
    # text-to-speech entry point: call it with the sentence to speak; returns the path of the generated WAV file
    def text_to_voice(self, input_text):
        # use the current time as the file name
        cur_time = datetime.now()
        timestamp_str = cur_time.strftime("%Y%m%d_%H%M%S_%f")
        # synthesize with Tacotron2 + MB-MelGAN (swap in the commented line to use FastSpeech2)
        mels, alignment_history, audios = self.do_synthesis(input_text, self.tacotron2, self.mb_melgan, "TACOTRON", "MB-MELGAN")
        # mels, audios = self.do_synthesis(input_text, self.fastspeech2, self.mb_melgan, "FASTSPEECH2", "MB-MELGAN")
        sample_rate = 22050
        # the audio is saved under ./output/ next to this file; create the folder if it does not exist yet
        output_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'output')
        os.makedirs(output_dir, exist_ok=True)
        output_audio = os.path.join(output_dir, timestamp_str + '.wav')
        wavf.write(output_audio, sample_rate, audios)
        return output_audio



if __name__ == "__main__":

    tts = VoiceSynthesis()
    start = time.time()  # record the start time
    # other sample sentences (only the last assignment is actually used)
    # input_text = "신은 우리의 수학 문제에는 관심이 없다. 신은 다만 경험적으로 통합할 뿐이다."
    # input_text = "간장 공장 공장장은 강 공장장이고, 된장공장 공장장은 공 공장장이다."
    input_text = "고객님, 총 금액은 7538원 입니다. 더 주문하시겠습니까?"
    print(tts.text_to_voice(input_text))
    print("time :", time.time() - start)  # elapsed time = now - start

I simply wrapped the Colab example in a class and added a few lines.
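With Tacotron2, do_synthesis also returns alignment_history, which visualize_attention plots; a healthy alignment looks like a clean diagonal. As a rough numeric companion to the plot, here is a heuristic sketch (not something TensorFlowTTS provides). It assumes a 2-D array shaped (encoder_steps, decoder_steps), the orientation visualize_attention draws, so you would pass `alignment_history[0]` assuming the first axis is the batch, or the transpose if your array is decoder-major. `alignment_stats` is a hypothetical name.

```python
import numpy as np


def alignment_stats(alignment) -> dict:
    """Rough health check for a Tacotron2 attention alignment.

    alignment: 2-D array shaped (encoder_steps, decoder_steps), i.e. the orientation
    visualize_attention plots (encoder on the y axis, decoder on the x axis).
    """
    a = np.asarray(alignment, dtype=np.float64)
    if a.ndim != 2:
        raise ValueError("expected a 2-D (encoder_steps, decoder_steps) array")
    enc_steps, dec_steps = a.shape
    peak = a.argmax(axis=0)  # most-attended encoder step for each decoder step
    return {
        # mean of the per-step maximum weight: near 1.0 = sharp, near 1/enc_steps = blurry
        "focus": float(a.max(axis=0).mean()),
        # share of decoder steps where attention does not jump backwards
        "monotonic": float((np.diff(peak) >= 0).mean()) if dec_steps > 1 else 1.0,
        # how far into the input the last decoder step attends (1.0 = reached the end)
        "coverage": float(peak[-1] / max(enc_steps - 1, 1)),
    }
```

A blurry, non-monotonic or low-coverage result corresponds to the broken alignments described in the "How to read the attention graph" link; the thresholds you act on are a judgment call.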

inference test

Periods are accepted.
Numeric input is accepted.
- However, numbers are read with Sino-Korean numerals here (오, not the native Korean 다섯).

input_text = "햄 5장 추가했습니다."
Output audio -> 햄 오장 추가했습니다
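The number behaviour above matches Sino-Korean reading: 5 comes out as 오, whereas with the counter 장 a native speaker would say 다섯 장. The sketch below reproduces that reading for plain integers so you can see (or pre-apply) what the output will sound like. It is a hypothetical helper written for illustration, not TensorFlowTTS's own Korean cleaner, and it ignores decimals, digit-group commas and native numerals.

```python
import re

DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
SMALL_UNITS = ["", "십", "백", "천"]  # 10^0 .. 10^3 inside a 4-digit group
BIG_UNITS = ["", "만", "억", "조"]    # 10^0, 10^4, 10^8, 10^12


def read_sino_korean(n: int) -> str:
    """Spell a non-negative integer with Sino-Korean numerals, e.g. 7538 -> 칠천오백삼십팔."""
    if n < 0 or n >= 10 ** 16:
        raise ValueError("only 0 <= n < 10^16 is handled in this sketch")
    if n == 0:
        return "영"
    groups = []
    group_index = 0
    while n > 0:
        n, group = divmod(n, 10000)
        if group:
            text = ""
            for pos in range(3, -1, -1):
                d = group // 10 ** pos % 10
                if d == 0:
                    continue
                # '일' is dropped before 십/백/천: 10 -> 십, 100 -> 백
                text += ("" if d == 1 and pos > 0 else DIGITS[d]) + SMALL_UNITS[pos]
            if group == 1 and group_index == 1:
                text = ""  # 10000 is read 만, not 일만
            groups.append(text + BIG_UNITS[group_index])
        group_index += 1
    return "".join(reversed(groups))


def normalize_numbers(text: str) -> str:
    """Replace every run of ASCII digits with its Sino-Korean reading."""
    return re.sub(r"[0-9]+", lambda m: read_sino_korean(int(m.group())), text)
```

For example, `normalize_numbers("햄 5장 추가했습니다.")` gives "햄 오장 추가했습니다.", the same reading as the output audio. Getting 다섯 장 instead would need a separate native-numeral table applied before counters, which this sketch does not attempt.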

Addendum: setting up the training environment

My GPU is only a GTX 1660 Ti, so I somehow got through preprocessing, but training keeps crashing.

Environment

conda install pytorch torchvision cudatoolkit=[your CUDA version] -c pytorch

Getting the KSS dataset

https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset

Dataset directory layout

TensorFlowTTS
|- kss/
|   |- kss/
|   |   |- 1
|   |   |  |- ...
|   |   |- 2
|   |   |  |- ...
|   |   |- 3
|   |   |  |- ...
|   |   |- 4
|   |   |  |- ...
|   |-  transcript.v.1.4.txt
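Each line of transcript.v.1.4.txt holds six `|`-separated fields, as the sample line quoted in the errors section below shows. Going by the KSS dataset description, they are the wav path, original script, expanded script, decomposed (jamo) script, duration in seconds and an English translation; the field names below are my own labels. A minimal parser sketch (`KssEntry` and `parse_transcript_line` are hypothetical names):

```python
from typing import List, NamedTuple


class KssEntry(NamedTuple):
    wav_path: str    # relative to the numbered-folder root, e.g. "1/1_0000.wav"
    original: str    # original script
    expanded: str    # script with numbers/symbols expanded
    decomposed: str  # script decomposed into jamo (renders the same as composed text)
    duration: float  # audio length in seconds
    english: str     # English translation


def parse_transcript_line(line: str) -> KssEntry:
    fields = line.rstrip("\n").split("|")
    if len(fields) != 6:
        raise ValueError(f"expected 6 '|'-separated fields, got {len(fields)}: {line!r}")
    path, original, expanded, decomposed, duration, english = fields
    return KssEntry(path, original, expanded, decomposed, float(duration), english)


def read_transcript(path: str) -> List[KssEntry]:
    """Parse every non-empty line of a KSS transcript file (UTF-8)."""
    with open(path, encoding="utf-8") as f:
        return [parse_transcript_line(line) for line in f if line.strip()]
```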

Preprocessing and normalization

Preprocessing the KSS dataset

tensorflow-tts-preprocess --rootdir ./kss --outdir ./dump_kss --config preprocess/kss_preprocess.yaml --dataset kss
tensorflow-tts-normalize --rootdir ./dump_kss --outdir ./dump_kss --config preprocess/kss_preprocess.yaml --dataset kss

Extracting durations

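If it crashes, reducing the batch size often fixes it. The errors I ran into during training were usually GPU problems, running out of memory, path problems, or CUDA version mismatches. Matching the versions and lowering the batch size seemed to solve most of them.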

python examples/tacotron2/extract_duration.py --rootdir ./dump_kss/train/ --outdir ./dump_kss/train/durations/ --checkpoint ../tacotron2-100k.h5 --use-norm 1 --config ./examples/tacotron2/conf/tacotron2.kss.v1.yaml --batch-size 32

python examples/tacotron2/extract_duration.py --rootdir ./dump_kss/valid/ --outdir ./dump_kss/valid/durations/ --checkpoint ../tacotron2-100k.h5 --use-norm 1 --config ./examples/tacotron2/conf/tacotron2.kss.v1.yaml --batch-size 32
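Since lowering the batch size is the usual fix, the retry can be automated. A sketch, assuming the script exits with a non-zero status when it fails (an uncaught OOM error does this). Note that it halves the batch size on any failure, not only OOM, so check the printed error if every size fails. `run_with_backoff` and `EXTRACT_TRAIN_CMD` are hypothetical names; the command mirrors the train-split command above.

```python
import subprocess
import sys

# The train-split duration extraction command from above, minus --batch-size.
EXTRACT_TRAIN_CMD = [
    sys.executable, "examples/tacotron2/extract_duration.py",
    "--rootdir", "./dump_kss/train/",
    "--outdir", "./dump_kss/train/durations/",
    "--checkpoint", "../tacotron2-100k.h5",
    "--use-norm", "1",
    "--config", "./examples/tacotron2/conf/tacotron2.kss.v1.yaml",
]


def run_with_backoff(base_cmd, batch_size=32, min_batch_size=1, run=subprocess.run):
    """Run base_cmd + ['--batch-size', N], halving N after each failure.

    Returns the batch size that succeeded, or None if even min_batch_size failed.
    """
    while batch_size >= min_batch_size:
        result = run(list(base_cmd) + ["--batch-size", str(batch_size)])
        if result.returncode == 0:
            return batch_size
        print(f"failed with --batch-size {batch_size}")
        batch_size //= 2
    return None

# Usage (from the TensorFlowTTS folder): run_with_backoff(EXTRACT_TRAIN_CMD)
```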

Training (unfinished)

Command (see the links in the references)

python examples/tacotron2/train_tacotron2.py --train-dir ./dump_kss/train/ --dev-dir ./dump_kss/valid/ --outdir ./examples/tacotron2/exp/train.tacotron2.kss.v1/ --config ./examples/tacotron2/conf/tacotron2.kss.v2.yaml --use-norm 1 --pretrained ../tacotron2-100k.h5

This ran into problems, so I adjusted the step count and the batch size in the config file ./examples/tacotron2/conf/tacotron2.kss.v2.yaml.

Settings readjusted after an OOM

batch size: 32 -> 16
num_save_intermediate_results: 1 -> 5

When training starts, the network summary is printed.

Model: "tacotron2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
encoder (TFTacotronEncoder)  multiple                  8218624
_________________________________________________________________
decoder_cell (TFTacotronDeco multiple                  18246402
_________________________________________________________________
post_net (TFTacotronPostnet) multiple                  5460480
_________________________________________________________________
residual_projection (Dense)  multiple                  41040
=================================================================
Total params: 31,966,546
Trainable params: 31,956,306
Non-trainable params: 10,240
_________________________________________________________________
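The per-layer counts in the summary add up to the reported total, and trainable plus non-trainable parameters give the same figure. A quick check with the numbers copied from the summary:

```python
# Per-layer parameter counts copied from the Tacotron2 summary above.
layer_params = {
    "encoder": 8_218_624,
    "decoder_cell": 18_246_402,
    "post_net": 5_460_480,
    "residual_projection": 41_040,
}
total = sum(layer_params.values())
trainable, non_trainable = 31_956_306, 10_240

print(f"total = {total:,}")  # total = 31,966,546
assert total == trainable + non_trainable
```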

My training attempts never got past 2000 steps.

Errors

Most of these occurred during training.

I am recording them here in case they help anyone going through the same trial and error.

RuntimeError: Error opening './kss\\kss\\3/3_2200.wav': System error.

Fix: move the transcript file into the root folder and point --rootdir at that root folder.
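To catch this kind of path problem before running the preprocessing, a sketch can resolve each transcript path with pathlib (which accepts the transcript's `/` separators on Windows too) and list the files that do not exist. It assumes the layout from the dataset directory tree above: transcript.v.1.4.txt in the root folder (./kss) and the numbered folders under `kss/` inside it. `missing_wavs` and its `wav_subdir` parameter are hypothetical.

```python
from pathlib import Path


def missing_wavs(rootdir, transcript_name="transcript.v.1.4.txt", wav_subdir="kss") -> list:
    """List transcript entries whose wav file does not exist under rootdir/wav_subdir.

    The first '|'-separated field of each line is a path like '1/1_0000.wav';
    pathlib joins it with the OS separator, so mixed '\\' and '/' paths are avoided.
    """
    root = Path(rootdir)
    wav_root = root / wav_subdir
    missing = []
    with open(root / transcript_name, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rel = line.split("|", 1)[0]
            if not (wav_root / rel).is_file():
                missing.append(rel)
    return missing

# Usage (from the TensorFlowTTS folder): print(missing_wavs("./kss"))
```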

dlerror: cudart64_101.dll not found

The CUDA version does not match (cudart64_101.dll is the CUDA 10.1 runtime). Check the installed version:
nvcc --version
then reinstall CUDA / cuDNN.

AssertionError: 1_0000 seems to be multi-channel signal.

Training works once you leave out sample 1 (1_0000.wav).

Transcript line for sample 1:

1/1_0000.wav|그는 괜찮은 척하려고 애쓰는 것 같았다.|그는 괜찮은 척하려고 애쓰는 것 같았다.|그는 괜찮은 척하려고 애쓰는 것 같았다.|3.5|He seemed to be pretending to be okay.
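To find which files trigger the multi-channel assertion instead of guessing, Python's standard `wave` module can report the channel count. A sketch: it only reads PCM WAV files (other encodings raise `wave.Error`), `wav_root` is assumed to be the folder containing 1/, 2/, 3/, 4/, and the function names are hypothetical. The returned paths use the transcript's `1/1_0000.wav` form, so the matching transcript lines can be filtered out.

```python
import wave
from pathlib import Path


def multichannel_wavs(wav_root) -> list:
    """Return paths (relative to wav_root, with '/' separators) of WAV files with more than one channel."""
    root = Path(wav_root)
    found = []
    for path in sorted(root.rglob("*.wav")):
        with wave.open(str(path), "rb") as w:
            if w.getnchannels() > 1:
                found.append(path.relative_to(root).as_posix())
    return found


def drop_entries(lines, bad_paths) -> list:
    """Keep only transcript lines whose first '|' field is not in bad_paths."""
    bad = set(bad_paths)
    return [line for line in lines if line.split("|", 1)[0] not in bad]
```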

ne 6843, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]

https://github.com/tensorflow/tensorflow/issues/28326
conda install -c anaconda cudatoolkit
conda install -c anaconda cudnn

OOM during duration extraction

TensorFlowTTS\examples\tacotron2\extract_duration.py
Insert this into the initialization section:

    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = True
    config.log_device_placement = True
    session = tf.compat.v1.Session(config=config)

NotImplementedError: Cannot convert a symbolic Tensor (meshgrid/Size:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

pip install numpy==1.19.5

# Assume that you have 12GB of GPU memory and want to allocate ~4GB:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)

sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
Setting per_process_gpu_memory_fraction=0.333 as above puts a strict upper bound on the amount of GPU memory the process can use.

https://goodtogreate.tistory.com/entry/TensorFlow%EB%A5%BC-%EA%B3%B5%EC%9A%A9-GPU%EC%97%90%EC%84%9C-%EC%82%AC%EC%9A%A9-%ED%95%A0-%EB%95%8C-%EB%A9%94%EB%AA%A8%EB%A6%AC-%EC%A0%88%EC%95%BD-%EB%B0%A9%EB%B2%95
However, forcing TF1-style code into a TF2 codebase through compat does not work well.

2021-06-14 16:45:05.453148: E tensorflow/stream_executor/dnn.cc:616] CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1983): 'cudnnRNNBackwardDataEx( cudnn.handle(), rnn_desc.handle(), output_desc.data_handle(), output_data.opaque(), output_desc.data_handle(), output_backprop_data.opaque(), nullptr, nullptr, output_h_desc.handle(), output_h_backprop_data.opaque(), output_c_desc.handle(), output_c_backprop_data.opaque(), rnn_desc.params_handle(), params.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), input_desc.data_handle(), input_backprop_data->opaque(), input_h_desc.handle(), input_h_backprop_data->opaque(), input_c_desc.handle(), input_c_backprop_data->opaque(), nullptr, nullptr, workspace.opaque(), workspace.size(), reserve_space_data->opaque(), reserve_space_data->size())'
2021-06-14 16:45:05.480514: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at cudnn_rnn_ops.cc:1922 : Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 512, 256, 1, 105, 16, 256]
2021-06-14 16:45:05.495067: I tensorflow/stream_executor/stream.cc:2004] [stream=0000018EA06220A0,impl=0000018EA5B407D0] did not wait for [stream=0000018EA0623D20,impl=0000018EA5B40AA0]
2021-06-14 16:45:05.500534: I tensorflow/stream_executor/stream.cc:4952] [stream=0000018EA06220A0,impl=0000018EA5B407D0] did not memcpy host-to-device; source: 0000018FF005C4C0
2021-06-14 16:45:05.506208: E tensorflow/stream_executor/stream.cc:338] Error recording event in stream: Error recording CUDA event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2021-06-14 16:45:05.514350: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2021-06-14 16:45:05.519920: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:220] Unexpected Event status: 1

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
Install the CUDA toolkit that matches your version.

Apply TF2-style code instead

https://inpages.tistory.com/155
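In TF2 the same limits can be set through `tf.config` without a compat Session. A sketch, assuming TF 2.2/2.3 (hence the `experimental` names): call it once at the very top of the script, before any tensor is created, otherwise TensorFlow raises a RuntimeError because the GPUs are already initialized. Memory growth and a hard limit are alternatives for a given GPU, so the function applies one or the other. `configure_gpus` and `memory_limit_mb` are hypothetical helpers.

```python
def memory_limit_mb(total_mb: int, fraction: float) -> int:
    """TF1's per_process_gpu_memory_fraction expressed as an absolute limit in MB."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_mb * fraction)


def configure_gpus(memory_limit=None):
    """Configure every visible GPU; must run before TensorFlow touches the GPU.

    memory_limit=None -> grow memory on demand (like gpu_options.allow_growth = True);
    otherwise -> hard cap in MB (like per_process_gpu_memory_fraction).
    """
    import tensorflow as tf  # imported here so memory_limit_mb works without TensorFlow installed

    for gpu in tf.config.experimental.list_physical_devices("GPU"):
        if memory_limit is None:
            tf.config.experimental.set_memory_growth(gpu, True)
        else:
            tf.config.experimental.set_virtual_device_configuration(
                gpu,
                [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=memory_limit)],
            )
```

For example, on a 6 GB card such as the GTX 1660 Ti, `configure_gpus(memory_limit_mb(6144, 0.2))` roughly mirrors a 0.2 memory fraction, and `configure_gpus()` mirrors `allow_growth = True`.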
