Spaces: Runtime error
Commit · 112c36b
Parent(s): 128ae2b
revert to code before inference bug

This view is limited to 50 files because it contains too many changes. See raw diff.
- chatterbox/src/chatterbox/__init__.py +0 -2
- chatterbox/src/chatterbox/__pycache__/__init__.cpython-311.pyc +0 -0
- chatterbox/src/chatterbox/__pycache__/tts.cpython-311.pyc +0 -0
- chatterbox/src/chatterbox/__pycache__/vc.cpython-311.pyc +0 -0
- chatterbox/src/chatterbox/models/s3gen/__pycache__/__init__.cpython-311.pyc +0 -0
- chatterbox/src/chatterbox/models/s3gen/__pycache__/const.cpython-311.pyc +0 -0
- chatterbox/src/chatterbox/models/s3gen/transformer/__pycache__/__init__.cpython-311.pyc +0 -0
- chatterbox/src/chatterbox/models/t3/__pycache__/__init__.cpython-311.pyc +0 -0
- chatterbox/src/chatterbox/models/t3/inference/__pycache__/alignment_stream_analyzer.cpython-311.pyc +0 -0
- chatterbox/src/chatterbox/models/tokenizers/__pycache__/__init__.cpython-311.pyc +0 -0
- chatterbox/src/chatterbox/models/voice_encoder/__pycache__/__init__.cpython-311.pyc +0 -0
- chatterbox/src/chatterbox/models/voice_encoder/__pycache__/config.cpython-311.pyc +0 -0
- chatterbox/src/orator/__init__.py +1 -0
- chatterbox/src/orator/__pycache__/__init__.cpython-311.pyc +0 -0
- chatterbox/src/orator/__pycache__/tts.cpython-311.pyc +0 -0
- chatterbox/src/{chatterbox/models/s3gen/transformer/__init__.py → orator/model_checkpoints.py} +0 -0
- chatterbox/src/orator/models/bigvgan/__pycache__/activations.cpython-311.pyc +0 -0
- chatterbox/src/orator/models/bigvgan/__pycache__/bigvgan.cpython-311.pyc +0 -0
- chatterbox/src/orator/models/bigvgan/activations.py +120 -0
- chatterbox/src/orator/models/bigvgan/alias_free_torch/__init__.py +6 -0
- chatterbox/src/orator/models/bigvgan/alias_free_torch/__pycache__/__init__.cpython-311.pyc +0 -0
- chatterbox/src/orator/models/bigvgan/alias_free_torch/__pycache__/act.cpython-311.pyc +0 -0
- chatterbox/src/orator/models/bigvgan/alias_free_torch/__pycache__/filter.cpython-311.pyc +0 -0
- chatterbox/src/orator/models/bigvgan/alias_free_torch/__pycache__/resample.cpython-311.pyc +0 -0
- chatterbox/src/orator/models/bigvgan/alias_free_torch/act.py +28 -0
- chatterbox/src/orator/models/bigvgan/alias_free_torch/filter.py +95 -0
- chatterbox/src/orator/models/bigvgan/alias_free_torch/resample.py +55 -0
- chatterbox/src/orator/models/bigvgan/bigvgan.py +212 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/__init__.py +0 -0
- chatterbox/src/orator/models/s3gen/__pycache__/__init__.cpython-311.pyc +0 -0
- chatterbox/src/orator/models/s3gen/__pycache__/const.cpython-311.pyc +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/__pycache__/decoder.cpython-311.pyc +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/__pycache__/f0_predictor.cpython-311.pyc +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/__pycache__/flow.cpython-311.pyc +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/__pycache__/flow_matching.cpython-311.pyc +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/__pycache__/hifigan.cpython-311.pyc +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/__pycache__/s3gen.cpython-311.pyc +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/__pycache__/xvector.cpython-311.pyc +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/const.py +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/decoder.py +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/f0_predictor.py +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/flow.py +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/flow_matching.py +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/hifigan.py +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/matcha/__pycache__/decoder.cpython-311.pyc +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/matcha/__pycache__/flow_matching.cpython-311.pyc +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/matcha/__pycache__/transformer.cpython-311.pyc +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/matcha/decoder.py +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/matcha/flow_matching.py +0 -0
- chatterbox/src/{chatterbox → orator}/models/s3gen/matcha/text_encoder.py +0 -0
chatterbox/src/chatterbox/__init__.py
DELETED
@@ -1,2 +0,0 @@
-from .tts import ChatterboxTTS
-from .vc import ChatterboxVC
chatterbox/src/chatterbox/__pycache__/__init__.cpython-311.pyc
DELETED
Binary file (275 Bytes)

chatterbox/src/chatterbox/__pycache__/tts.cpython-311.pyc
DELETED
Binary file (12.5 kB)

chatterbox/src/chatterbox/__pycache__/vc.cpython-311.pyc
DELETED
Binary file (4.9 kB)

chatterbox/src/chatterbox/models/s3gen/__pycache__/__init__.cpython-311.pyc
DELETED
Binary file (294 Bytes)

chatterbox/src/chatterbox/models/s3gen/__pycache__/const.cpython-311.pyc
DELETED
Binary file (190 Bytes)

chatterbox/src/chatterbox/models/s3gen/transformer/__pycache__/__init__.cpython-311.pyc
DELETED
Binary file (190 Bytes)

chatterbox/src/chatterbox/models/t3/__pycache__/__init__.cpython-311.pyc
DELETED
Binary file (218 Bytes)

chatterbox/src/chatterbox/models/t3/inference/__pycache__/alignment_stream_analyzer.cpython-311.pyc
DELETED
Binary file (7.08 kB)

chatterbox/src/chatterbox/models/tokenizers/__pycache__/__init__.cpython-311.pyc
DELETED
Binary file (242 Bytes)

chatterbox/src/chatterbox/models/voice_encoder/__pycache__/__init__.cpython-311.pyc
DELETED
Binary file (281 Bytes)

chatterbox/src/chatterbox/models/voice_encoder/__pycache__/config.cpython-311.pyc
DELETED
Binary file (859 Bytes)
chatterbox/src/orator/__init__.py
ADDED
@@ -0,0 +1 @@
+from .tts import OratorTTS
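
Taken together with the deleted chatterbox/src/chatterbox/__init__.py above, this changes the package's public entry point from ChatterboxTTS / ChatterboxVC to OratorTTS. A minimal sketch of the effect on caller code, assuming chatterbox/src is on the import path (illustrative only, not part of the commit):

# Before this commit:
#   from chatterbox import ChatterboxTTS, ChatterboxVC
# After this commit, only the TTS entry point is re-exported:
from orator import OratorTTS  # same as: from orator.tts import OratorTTS
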
chatterbox/src/orator/__pycache__/__init__.cpython-311.pyc
ADDED
Binary file (212 Bytes)

chatterbox/src/orator/__pycache__/tts.cpython-311.pyc
ADDED
Binary file (9.98 kB)
chatterbox/src/{chatterbox/models/s3gen/transformer/__init__.py → orator/model_checkpoints.py}
RENAMED
File without changes

chatterbox/src/orator/models/bigvgan/__pycache__/activations.cpython-311.pyc
ADDED
Binary file (6.09 kB)

chatterbox/src/orator/models/bigvgan/__pycache__/bigvgan.cpython-311.pyc
ADDED
Binary file (13.3 kB)
chatterbox/src/orator/models/bigvgan/activations.py
ADDED
@@ -0,0 +1,120 @@
+# Implementation adapted from https://github.com/EdwardDixon/snake under the MIT license.
+# LICENSE is in incl_licenses directory.
+
+import torch
+from torch import nn, sin, pow
+from torch.nn import Parameter
+
+
+class Snake(nn.Module):
+    '''
+    Implementation of a sine-based periodic activation function
+    Shape:
+        - Input: (B, C, T)
+        - Output: (B, C, T), same shape as the input
+    Parameters:
+        - alpha - trainable parameter
+    References:
+        - This activation function is from this paper by Liu Ziyin, Tilman Hartwig, Masahito Ueda:
+        https://arxiv.org/abs/2006.08195
+    Examples:
+        >>> a1 = snake(256)
+        >>> x = torch.randn(256)
+        >>> x = a1(x)
+    '''
+    def __init__(self, in_features, alpha=1.0, alpha_trainable=True, alpha_logscale=False):
+        '''
+        Initialization.
+        INPUT:
+            - in_features: shape of the input
+            - alpha: trainable parameter
+            alpha is initialized to 1 by default, higher values = higher-frequency.
+            alpha will be trained along with the rest of your model.
+        '''
+        super(Snake, self).__init__()
+        self.in_features = in_features
+
+        # initialize alpha
+        self.alpha_logscale = alpha_logscale
+        if self.alpha_logscale:  # log scale alphas initialized to zeros
+            self.alpha = Parameter(torch.zeros(in_features) * alpha)
+        else:  # linear scale alphas initialized to ones
+            self.alpha = Parameter(torch.ones(in_features) * alpha)
+
+        self.alpha.requires_grad = alpha_trainable
+
+        self.no_div_by_zero = 0.000000001
+
+    def forward(self, x):
+        '''
+        Forward pass of the function.
+        Applies the function to the input elementwise.
+        Snake := x + 1/a * sin^2 (xa)
+        '''
+        alpha = self.alpha.unsqueeze(0).unsqueeze(-1)  # line up with x to [B, C, T]
+        if self.alpha_logscale:
+            alpha = torch.exp(alpha)
+        x = x + (1.0 / (alpha + self.no_div_by_zero)) * pow(sin(x * alpha), 2)
+
+        return x
+
+
+class SnakeBeta(nn.Module):
+    '''
+    A modified Snake function which uses separate parameters for the magnitude of the periodic components
+    Shape:
+        - Input: (B, C, T)
+        - Output: (B, C, T), same shape as the input
+    Parameters:
+        - alpha - trainable parameter that controls frequency
+        - beta - trainable parameter that controls magnitude
+    References:
+        - This activation function is a modified version based on this paper by Liu Ziyin, Tilman Hartwig, Masahito Ueda:
+        https://arxiv.org/abs/2006.08195
+    Examples:
+        >>> a1 = snakebeta(256)
+        >>> x = torch.randn(256)
+        >>> x = a1(x)
+    '''
+    def __init__(self, in_features, alpha=1.0, alpha_trainable=True, alpha_logscale=False):
+        '''
+        Initialization.
+        INPUT:
+            - in_features: shape of the input
+            - alpha - trainable parameter that controls frequency
+            - beta - trainable parameter that controls magnitude
+            alpha is initialized to 1 by default, higher values = higher-frequency.
+            beta is initialized to 1 by default, higher values = higher-magnitude.
+            alpha will be trained along with the rest of your model.
+        '''
+        super(SnakeBeta, self).__init__()
+        self.in_features = in_features
+
+        # initialize alpha
+        self.alpha_logscale = alpha_logscale
+        if self.alpha_logscale:  # log scale alphas initialized to zeros
+            self.alpha = Parameter(torch.zeros(in_features) * alpha)
+            self.beta = Parameter(torch.zeros(in_features) * alpha)
+        else:  # linear scale alphas initialized to ones
+            self.alpha = Parameter(torch.ones(in_features) * alpha)
+            self.beta = Parameter(torch.ones(in_features) * alpha)
+
+        self.alpha.requires_grad = alpha_trainable
+        self.beta.requires_grad = alpha_trainable
+
+        self.no_div_by_zero = 0.000000001
+
+    def forward(self, x):
+        '''
+        Forward pass of the function.
+        Applies the function to the input elementwise.
+        SnakeBeta := x + 1/b * sin^2 (xa)
+        '''
+        alpha = self.alpha.unsqueeze(0).unsqueeze(-1)  # line up with x to [B, C, T]
+        beta = self.beta.unsqueeze(0).unsqueeze(-1)
+        if self.alpha_logscale:
+            alpha = torch.exp(alpha)
+            beta = torch.exp(beta)
+        x = x + (1.0 / (beta + self.no_div_by_zero)) * pow(sin(x * alpha), 2)
+
+        return x
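
For orientation, a minimal usage sketch of the two activations added above (illustrative, not part of the commit; import paths assume chatterbox/src is on sys.path). Both act elementwise on a [B, C, T] tensor with one learned alpha (and beta for SnakeBeta) per channel:

import torch

from orator.models.bigvgan.activations import Snake, SnakeBeta

x = torch.randn(2, 256, 100)                       # [B, C, T]
snake = Snake(256)                                 # x + (1/alpha) * sin^2(alpha * x)
snake_beta = SnakeBeta(256, alpha_logscale=True)   # x + (1/beta) * sin^2(alpha * x), params stored in log scale

y1, y2 = snake(x), snake_beta(x)
assert y1.shape == y2.shape == x.shape             # shape-preserving, per-channel parameters
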
chatterbox/src/orator/models/bigvgan/alias_free_torch/__init__.py
ADDED
@@ -0,0 +1,6 @@
+# Adapted from https://github.com/junjun3518/alias-free-torch under the Apache License 2.0
+# LICENSE is in incl_licenses directory.
+
+from .filter import *
+from .resample import *
+from .act import *
chatterbox/src/orator/models/bigvgan/alias_free_torch/__pycache__/__init__.cpython-311.pyc
ADDED
Binary file (281 Bytes)

chatterbox/src/orator/models/bigvgan/alias_free_torch/__pycache__/act.cpython-311.pyc
ADDED
Binary file (1.67 kB)

chatterbox/src/orator/models/bigvgan/alias_free_torch/__pycache__/filter.cpython-311.pyc
ADDED
Binary file (4.51 kB)

chatterbox/src/orator/models/bigvgan/alias_free_torch/__pycache__/resample.cpython-311.pyc
ADDED
Binary file (3.43 kB)
chatterbox/src/orator/models/bigvgan/alias_free_torch/act.py
ADDED
@@ -0,0 +1,28 @@
+# Adapted from https://github.com/junjun3518/alias-free-torch under the Apache License 2.0
+# LICENSE is in incl_licenses directory.
+
+import torch.nn as nn
+
+from .resample import UpSample1d, DownSample1d
+
+
+class Activation1d(nn.Module):
+    def __init__(self,
+                 activation,
+                 up_ratio: int = 2,
+                 down_ratio: int = 2,
+                 up_kernel_size: int = 12,
+                 down_kernel_size: int = 12):
+        super().__init__()
+        self.up_ratio = up_ratio
+        self.down_ratio = down_ratio
+        self.act = activation
+        self.upsample = UpSample1d(up_ratio, up_kernel_size)
+        self.downsample = DownSample1d(down_ratio, down_kernel_size)
+
+    # x: [B, C, T]
+    def forward(self, x):
+        x = self.upsample(x)
+        x = self.act(x)
+        x = self.downsample(x)
+        return x
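
A short sketch (not in the commit) of what Activation1d does: it upsamples by 2, applies the wrapped nonlinearity at the higher rate, then low-pass filters and downsamples by 2, so the output keeps the input's shape while the harmonics the nonlinearity introduces are attenuated before decimation. Import paths again assume chatterbox/src is on sys.path:

import torch

from orator.models.bigvgan.activations import SnakeBeta
from orator.models.bigvgan.alias_free_torch import Activation1d

x = torch.randn(1, 64, 200)   # [B, C, T]
act = Activation1d(activation=SnakeBeta(64, alpha_logscale=True))

y = act(x)                    # UpSample1d(2) -> SnakeBeta -> DownSample1d(2)
assert y.shape == x.shape     # time length is preserved end to end
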
chatterbox/src/orator/models/bigvgan/alias_free_torch/filter.py
ADDED
@@ -0,0 +1,95 @@
+# Adapted from https://github.com/junjun3518/alias-free-torch under the Apache License 2.0
+# LICENSE is in incl_licenses directory.
+
+import math
+
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+
+if 'sinc' in dir(torch):
+    sinc = torch.sinc
+else:
+    # This code is adopted from adefossez's julius.core.sinc under the MIT License
+    # https://adefossez.github.io/julius/julius/core.html
+    # LICENSE is in incl_licenses directory.
+    def sinc(x: torch.Tensor):
+        """
+        Implementation of sinc, i.e. sin(pi * x) / (pi * x)
+        __Warning__: Different to julius.sinc, the input is multiplied by `pi`!
+        """
+        return torch.where(x == 0,
+                           torch.tensor(1., device=x.device, dtype=x.dtype),
+                           torch.sin(math.pi * x) / math.pi / x)
+
+
+# This code is adopted from adefossez's julius.lowpass.LowPassFilters under the MIT License
+# https://adefossez.github.io/julius/julius/lowpass.html
+# LICENSE is in incl_licenses directory.
+def kaiser_sinc_filter1d(cutoff, half_width, kernel_size):  # return filter [1,1,kernel_size]
+    even = (kernel_size % 2 == 0)
+    half_size = kernel_size // 2
+
+    # For kaiser window
+    delta_f = 4 * half_width
+    A = 2.285 * (half_size - 1) * math.pi * delta_f + 7.95
+    if A > 50.:
+        beta = 0.1102 * (A - 8.7)
+    elif A >= 21.:
+        beta = 0.5842 * (A - 21)**0.4 + 0.07886 * (A - 21.)
+    else:
+        beta = 0.
+    window = torch.kaiser_window(kernel_size, beta=beta, periodic=False)
+
+    # ratio = 0.5/cutoff -> 2 * cutoff = 1 / ratio
+    if even:
+        time = (torch.arange(-half_size, half_size) + 0.5)
+    else:
+        time = torch.arange(kernel_size) - half_size
+    if cutoff == 0:
+        filter_ = torch.zeros_like(time)
+    else:
+        filter_ = 2 * cutoff * window * sinc(2 * cutoff * time)
+        # Normalize filter to have sum = 1, otherwise we will have a small leakage
+        # of the constant component in the input signal.
+        filter_ /= filter_.sum()
+        filter = filter_.view(1, 1, kernel_size)
+
+    return filter
+
+
+class LowPassFilter1d(nn.Module):
+    def __init__(self,
+                 cutoff=0.5,
+                 half_width=0.6,
+                 stride: int = 1,
+                 padding: bool = True,
+                 padding_mode: str = 'replicate',
+                 kernel_size: int = 12):
+        # kernel_size should be even number for stylegan3 setup,
+        # in this implementation, odd number is also possible.
+        super().__init__()
+        if cutoff < -0.:
+            raise ValueError("Minimum cutoff must be larger than zero.")
+        if cutoff > 0.5:
+            raise ValueError("A cutoff above 0.5 does not make sense.")
+        self.kernel_size = kernel_size
+        self.even = (kernel_size % 2 == 0)
+        self.pad_left = kernel_size // 2 - int(self.even)
+        self.pad_right = kernel_size // 2
+        self.stride = stride
+        self.padding = padding
+        self.padding_mode = padding_mode
+        filter = kaiser_sinc_filter1d(cutoff, half_width, kernel_size)
+        self.register_buffer("filter", filter)
+
+    # input [B, C, T]
+    def forward(self, x):
+        _, C, _ = x.shape
+
+        if self.padding:
+            x = F.pad(x, (self.pad_left, self.pad_right), mode=self.padding_mode)
+        out = F.conv1d(x, self.filter.expand(C, -1, -1), stride=self.stride, groups=C)
+
+        return out
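
A brief sketch of the two building blocks above (illustrative, not part of the commit): kaiser_sinc_filter1d returns a normalized windowed-sinc low-pass kernel of shape [1, 1, kernel_size], and LowPassFilter1d applies that kernel depthwise (groups=C), optionally with a stride to decimate:

import torch

from orator.models.bigvgan.alias_free_torch.filter import LowPassFilter1d, kaiser_sinc_filter1d

kernel = kaiser_sinc_filter1d(cutoff=0.25, half_width=0.3, kernel_size=12)
print(kernel.shape)           # torch.Size([1, 1, 12]); coefficients sum to ~1

x = torch.randn(1, 64, 200)   # [B, C, T]
lpf = LowPassFilter1d(cutoff=0.25, half_width=0.3, stride=2, kernel_size=12)
y = lpf(x)                    # low-pass, then decimate by 2
print(y.shape)                # torch.Size([1, 64, 100])
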
chatterbox/src/orator/models/bigvgan/alias_free_torch/resample.py
ADDED
@@ -0,0 +1,55 @@
+# Adapted from https://github.com/junjun3518/alias-free-torch under the Apache License 2.0
+# LICENSE is in incl_licenses directory.
+
+import torch.nn as nn
+from torch.nn import functional as F
+
+from .filter import LowPassFilter1d
+from .filter import kaiser_sinc_filter1d
+
+
+class UpSample1d(nn.Module):
+    def __init__(self, ratio=2, kernel_size=None):
+        super().__init__()
+        self.ratio = ratio
+        self.kernel_size = int(6 * ratio // 2) * 2 if kernel_size is None else kernel_size
+        self.stride = ratio
+        self.pad = self.kernel_size // ratio - 1
+        self.pad_left = self.pad * self.stride + (self.kernel_size - self.stride) // 2
+        self.pad_right = self.pad * self.stride + (self.kernel_size - self.stride + 1) // 2
+        filter = kaiser_sinc_filter1d(
+            cutoff=0.5 / ratio,
+            half_width=0.6 / ratio,
+            kernel_size=self.kernel_size
+        )
+        self.register_buffer("filter", filter)
+
+    # x: [B, C, T]
+    def forward(self, x):
+        _, C, _ = x.shape
+
+        x = F.pad(x, (self.pad, self.pad), mode='replicate')
+        x = self.ratio * F.conv_transpose1d(
+            x, self.filter.expand(C, -1, -1), stride=self.stride, groups=C
+        )
+        x = x[..., self.pad_left:-self.pad_right]
+
+        return x
+
+
+class DownSample1d(nn.Module):
+    def __init__(self, ratio=2, kernel_size=None):
+        super().__init__()
+        self.ratio = ratio
+        self.kernel_size = int(6 * ratio // 2) * 2 if kernel_size is None else kernel_size
+        self.lowpass = LowPassFilter1d(
+            cutoff=0.5 / ratio,
+            half_width=0.6 / ratio,
+            stride=ratio,
+            kernel_size=self.kernel_size
+        )
+
+    def forward(self, x):
+        xx = self.lowpass(x)
+
+        return xx
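
A minimal round-trip sketch (not part of the commit): UpSample1d interpolates by `ratio` with a transposed convolution against the same kaiser-sinc kernel, and DownSample1d is a strided LowPassFilter1d, so chaining them at the same ratio returns the original length:

import torch

from orator.models.bigvgan.alias_free_torch.resample import UpSample1d, DownSample1d

x = torch.randn(1, 32, 128)   # [B, C, T]
up = UpSample1d(ratio=2)      # default kernel_size = int(6 * ratio // 2) * 2 = 12
down = DownSample1d(ratio=2)

hi = up(x)                    # [1, 32, 256]
lo = down(hi)                 # [1, 32, 128]
assert hi.shape[-1] == 2 * x.shape[-1] and lo.shape == x.shape
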
chatterbox/src/orator/models/bigvgan/bigvgan.py
ADDED
@@ -0,0 +1,212 @@
+# Copyright (c) 2022 NVIDIA CORPORATION.
+# Licensed under the MIT license.
+# Adapted from https://github.com/jik876/hifi-gan under the MIT license.
+# LICENSE is in incl_licenses directory.
+
+import logging
+from torch.nn import Conv1d, ConvTranspose1d
+from torch.nn.utils import weight_norm, remove_weight_norm
+from torch.nn.utils.weight_norm import WeightNorm
+
+from .activations import SnakeBeta
+from .alias_free_torch import *
+
+
+
+LRELU_SLOPE = 0.1
+
+logger = logging.getLogger(__name__)
+
+
+def get_padding(kernel_size, dilation=1):
+    return int((kernel_size*dilation - dilation)/2)
+
+
+def init_weights(m, mean=0.0, std=0.01):
+    classname = m.__class__.__name__
+    if classname.find("Conv") != -1:
+        m.weight.data.normal_(mean, std)
+
+
+class AMPBlock1(torch.nn.Module):
+    def __init__(self, channels, kernel_size=3, dilation=(1, 3, 5)):
+        super(AMPBlock1, self).__init__()
+
+        self.convs1 = nn.ModuleList([
+            weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=dilation[0],
+                               padding=get_padding(kernel_size, dilation[0]))),
+            weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=dilation[1],
+                               padding=get_padding(kernel_size, dilation[1]))),
+            weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=dilation[2],
+                               padding=get_padding(kernel_size, dilation[2])))
+        ])
+        self.convs1.apply(init_weights)
+
+        self.convs2 = nn.ModuleList([
+            weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=1, padding=get_padding(kernel_size, 1))),
+            weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=1, padding=get_padding(kernel_size, 1))),
+            weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=1, padding=get_padding(kernel_size, 1)))
+        ])
+        self.convs2.apply(init_weights)
+
+        self.num_layers = len(self.convs1) + len(self.convs2)  # total number of conv layers
+
+        self.activations = nn.ModuleList([
+            Activation1d(activation=SnakeBeta(channels, alpha_logscale=True))
+            for _ in range(self.num_layers)
+        ])
+
+    def forward(self, x):
+        acts1, acts2 = self.activations[::2], self.activations[1::2]
+        for c1, c2, a1, a2 in zip(self.convs1, self.convs2, acts1, acts2):
+            xt = a1(x)
+            xt = c1(xt)
+            xt = a2(xt)
+            xt = c2(xt)
+            x = xt + x
+
+        return x
+
+    def set_weight_norm(self, enabled: bool):
+        weight_norm_fn = weight_norm if enabled else remove_weight_norm
+        for l in self.convs1:
+            weight_norm_fn(l)
+        for l in self.convs2:
+            weight_norm_fn(l)
+
+
+class BigVGAN(nn.Module):
+    # this is our main BigVGAN model. Applies anti-aliased periodic activation for resblocks.
+
+    # We've got a model in prod that has the wrong hparams for this. It's simpler to add this check than to
+    # redistribute the model.
+    ignore_state_dict_unexpected = ("cond_layer.*",)
+
+    def __init__(self):
+        super().__init__()
+
+        input_dims = 80
+
+        upsample_rates = [10, 8, 4, 2]
+        upsample_kernel_sizes = [x * 2 for x in upsample_rates]
+        upsample_initial_channel = 1024
+
+        resblock_kernel_sizes = [3, 7, 11]
+        resblock_dilation_sizes = [[1, 3, 5], [1, 3, 5], [1, 3, 5]]
+        self.num_kernels = len(resblock_kernel_sizes)
+        self.num_upsamples = len(upsample_rates)
+
+        # pre conv
+        self.conv_pre = weight_norm(Conv1d(input_dims, upsample_initial_channel, 7, 1, padding=3))
+        self.cond_layer = None
+
+        # transposed conv-based upsamplers. does not apply anti-aliasing
+        self.ups = nn.ModuleList()
+        for i, (u, k) in enumerate(zip(upsample_rates, upsample_kernel_sizes)):
+            self.ups.append(nn.ModuleList([
+                weight_norm(ConvTranspose1d(upsample_initial_channel // (2 ** i),
+                                            upsample_initial_channel // (2 ** (i + 1)),
+                                            k, u, padding=(k - u) // 2))
+            ]))
+
+        # residual blocks using anti-aliased multi-periodicity composition modules (AMP)
+        self.resblocks = nn.ModuleList()
+        for i in range(len(self.ups)):
+            ch = upsample_initial_channel // (2 ** (i + 1))
+            for j, (k, d) in enumerate(zip(resblock_kernel_sizes, resblock_dilation_sizes)):
+                self.resblocks.append(AMPBlock1(ch, k, d))
+
+        # post conv
+        activation_post = SnakeBeta(ch, alpha_logscale=True)
+        self.activation_post = Activation1d(activation=activation_post)
+        self.conv_post = weight_norm(Conv1d(ch, 1, 7, 1, padding=3))
+
+        # weight initialization
+        for i in range(len(self.ups)):
+            self.ups[i].apply(init_weights)
+        self.conv_post.apply(init_weights)
+
+    def forward(self, x) -> torch.Tensor:
+        """
+        Args
+        ----
+        x: torch.Tensor of shape [B, T, C]
+        """
+        with torch.inference_mode():
+
+            x = self.conv_pre(x)
+
+            for i in range(self.num_upsamples):
+                # upsampling
+                for i_up in range(len(self.ups[i])):
+                    x = self.ups[i][i_up](x)
+                # AMP blocks
+                xs = None
+                for j in range(self.num_kernels):
+                    if xs is None:
+                        xs = self.resblocks[i * self.num_kernels + j](x)
+                    else:
+                        xs += self.resblocks[i * self.num_kernels + j](x)
+                x = xs / self.num_kernels
+
+            # post conv
+            x = self.activation_post(x)
+            x = self.conv_post(x)
+
+            # Bound the output to [-1, 1]
+            x = torch.tanh(x)
+
+        return x
+
+    @property
+    def weight_norm_enabled(self) -> bool:
+        return any(
+            isinstance(hook, WeightNorm) and hook.name == "weight"
+            for k, hook in self.conv_pre._forward_pre_hooks.items()
+        )
+
+    def set_weight_norm(self, enabled: bool):
+        """
+        N.B.: weight norm modifies the state dict, causing incompatibilities. Conventions:
+        - BigVGAN runs with weight norm for training, without for inference (done automatically by instantiate())
+        - All checkpoints are saved with weight norm (allows resuming training)
+        """
+        if enabled != self.weight_norm_enabled:
+            weight_norm_fn = weight_norm if enabled else remove_weight_norm
+            logger.debug(f"{'Applying' if enabled else 'Removing'} weight norm...")
+
+            for l in self.ups:
+                for l_i in l:
+                    weight_norm_fn(l_i)
+            for l in self.resblocks:
+                l.set_weight_norm(enabled)
+            weight_norm_fn(self.conv_pre)
+            weight_norm_fn(self.conv_post)
+
+    def train_mode(self):
+        self.train()
+        self.set_weight_norm(enabled=True)
+
+    def inference_mode(self):
+        self.eval()
+        self.set_weight_norm(enabled=False)
+
+
+if __name__ == '__main__':
+    import sys
+    import soundfile as sf
+    model = BigVGAN()
+
+    state_dict = torch.load("bigvgan32k.pt")
+    msg = model.load_state_dict(state_dict)
+    model.eval()
+    model.set_weight_norm(enabled=False)
+
+    print(msg)
+    mels = torch.load("mels.pt")
+    with torch.inference_mode():
+        y = model(mels.cpu())
+
+    for i, wav in enumerate(y):
+        wav = wav.view(-1).detach().numpy()
+        sf.write(f"bigvgan_test{i}.flac", wav, samplerate=32_000, format="FLAC")
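
Two notes on the file above, plus a usage sketch (illustrative, not part of the commit). First, bigvgan.py never imports torch or nn directly; it appears to rely on them leaking through `from .alias_free_torch import *` (filter.py imports torch and nn and defines no __all__), which works but is fragile. Second, although the forward() docstring says [B, T, C], conv_pre is Conv1d(80, ...), so the 80 mel channels must sit in dimension 1. The checkpoint name below is taken from the file's own __main__ block:

import torch

from orator.models.bigvgan.bigvgan import BigVGAN

model = BigVGAN()
# Checkpoints are saved with weight norm applied (see set_weight_norm's docstring):
# state_dict = torch.load("bigvgan32k.pt", map_location="cpu")
# model.load_state_dict(state_dict)
model.inference_mode()            # eval() + strip weight norm

mels = torch.randn(1, 80, 50)     # [B, 80 mel channels, T]
with torch.inference_mode():
    wav = model(mels)             # [B, 1, T * 640]; 640 = 10 * 8 * 4 * 2 upsampling
print(wav.shape)                  # torch.Size([1, 1, 32000])
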
chatterbox/src/{chatterbox → orator}/models/s3gen/__init__.py
RENAMED
File without changes

chatterbox/src/orator/models/s3gen/__pycache__/__init__.cpython-311.pyc
ADDED
Binary file (290 Bytes)

chatterbox/src/orator/models/s3gen/__pycache__/const.cpython-311.pyc
ADDED
Binary file (186 Bytes)

chatterbox/src/{chatterbox → orator}/models/s3gen/__pycache__/decoder.cpython-311.pyc
RENAMED
Binary files a/chatterbox/src/chatterbox/models/s3gen/__pycache__/decoder.cpython-311.pyc and b/chatterbox/src/orator/models/s3gen/__pycache__/decoder.cpython-311.pyc differ

chatterbox/src/{chatterbox → orator}/models/s3gen/__pycache__/f0_predictor.cpython-311.pyc
RENAMED
Binary files a/chatterbox/src/chatterbox/models/s3gen/__pycache__/f0_predictor.cpython-311.pyc and b/chatterbox/src/orator/models/s3gen/__pycache__/f0_predictor.cpython-311.pyc differ

chatterbox/src/{chatterbox → orator}/models/s3gen/__pycache__/flow.cpython-311.pyc
RENAMED
Binary files a/chatterbox/src/chatterbox/models/s3gen/__pycache__/flow.cpython-311.pyc and b/chatterbox/src/orator/models/s3gen/__pycache__/flow.cpython-311.pyc differ

chatterbox/src/{chatterbox → orator}/models/s3gen/__pycache__/flow_matching.cpython-311.pyc
RENAMED
Binary files a/chatterbox/src/chatterbox/models/s3gen/__pycache__/flow_matching.cpython-311.pyc and b/chatterbox/src/orator/models/s3gen/__pycache__/flow_matching.cpython-311.pyc differ

chatterbox/src/{chatterbox → orator}/models/s3gen/__pycache__/hifigan.cpython-311.pyc
RENAMED
Binary files a/chatterbox/src/chatterbox/models/s3gen/__pycache__/hifigan.cpython-311.pyc and b/chatterbox/src/orator/models/s3gen/__pycache__/hifigan.cpython-311.pyc differ

chatterbox/src/{chatterbox → orator}/models/s3gen/__pycache__/s3gen.cpython-311.pyc
RENAMED
Binary files a/chatterbox/src/chatterbox/models/s3gen/__pycache__/s3gen.cpython-311.pyc and b/chatterbox/src/orator/models/s3gen/__pycache__/s3gen.cpython-311.pyc differ

chatterbox/src/{chatterbox → orator}/models/s3gen/__pycache__/xvector.cpython-311.pyc
RENAMED
Binary files a/chatterbox/src/chatterbox/models/s3gen/__pycache__/xvector.cpython-311.pyc and b/chatterbox/src/orator/models/s3gen/__pycache__/xvector.cpython-311.pyc differ

chatterbox/src/{chatterbox → orator}/models/s3gen/const.py
RENAMED
File without changes

chatterbox/src/{chatterbox → orator}/models/s3gen/decoder.py
RENAMED
File without changes

chatterbox/src/{chatterbox → orator}/models/s3gen/f0_predictor.py
RENAMED
File without changes

chatterbox/src/{chatterbox → orator}/models/s3gen/flow.py
RENAMED
File without changes

chatterbox/src/{chatterbox → orator}/models/s3gen/flow_matching.py
RENAMED
File without changes

chatterbox/src/{chatterbox → orator}/models/s3gen/hifigan.py
RENAMED
File without changes

chatterbox/src/{chatterbox → orator}/models/s3gen/matcha/__pycache__/decoder.cpython-311.pyc
RENAMED
Binary files a/chatterbox/src/chatterbox/models/s3gen/matcha/__pycache__/decoder.cpython-311.pyc and b/chatterbox/src/orator/models/s3gen/matcha/__pycache__/decoder.cpython-311.pyc differ

chatterbox/src/{chatterbox → orator}/models/s3gen/matcha/__pycache__/flow_matching.cpython-311.pyc
RENAMED
Binary files a/chatterbox/src/chatterbox/models/s3gen/matcha/__pycache__/flow_matching.cpython-311.pyc and b/chatterbox/src/orator/models/s3gen/matcha/__pycache__/flow_matching.cpython-311.pyc differ

chatterbox/src/{chatterbox → orator}/models/s3gen/matcha/__pycache__/transformer.cpython-311.pyc
RENAMED
Binary files a/chatterbox/src/chatterbox/models/s3gen/matcha/__pycache__/transformer.cpython-311.pyc and b/chatterbox/src/orator/models/s3gen/matcha/__pycache__/transformer.cpython-311.pyc differ

chatterbox/src/{chatterbox → orator}/models/s3gen/matcha/decoder.py
RENAMED
File without changes

chatterbox/src/{chatterbox → orator}/models/s3gen/matcha/flow_matching.py
RENAMED
File without changes

chatterbox/src/{chatterbox → orator}/models/s3gen/matcha/text_encoder.py
RENAMED
File without changes