- JingFang: A Traditional Chinese Medicine Large Language Model of Expert-Level Medical Diagnosis and Syndrome Differentiation-Based Treatment Traditional Chinese medicine (TCM) plays a vital role in health protection and disease treatment, but its practical application requires extensive medical knowledge and clinical experience. Existing TCM Large Language Models (LLMs) exhibit critical limitations of uncomprehensive medical consultation and diagnoses, and inaccurate syndrome differentiation-based treatment. To address these issues, this study establishes JingFang (JF): a novel TCM Large Language Model that demonstrates the expert-level capability of medical diagnosis and syndrome differentiation-based treatment. We innovate a Multi-agent Dynamic Collaborative Chain-of-Thought Mechanism (MDCCTM) for medical consultation, enabling JF with effective and accurate diagnostic ability. In addition, a Syndrome Agent and a Dual-Stage Retrieval Scheme (DSRS) are developed to significantly enhance the capacity of JF for disease treatment based on syndrome differentiation. JingFang not only facilitates the application of LLMs but also promotes the effective practice of TCM in human health protection and disease treatment. 6 authors · Feb 3
- FlashRNN: Optimizing Traditional RNNs on Modern Hardware While Transformers and other sequence-parallelizable neural network architectures seem like the current state of the art in sequence modeling, they specifically lack state-tracking capabilities. These are important for time-series tasks and logical reasoning. Traditional RNNs like LSTMs and GRUs, as well as modern variants like sLSTM do have these capabilities at the cost of strictly sequential processing. While this is often seen as a strong limitation, we show how fast these networks can get with our hardware-optimization FlashRNN in Triton and CUDA, optimizing kernels to the register level on modern GPUs. We extend traditional RNNs with a parallelization variant that processes multiple RNNs of smaller hidden state in parallel, similar to the head-wise processing in Transformers. To enable flexibility on different GPU variants, we introduce a new optimization framework for hardware-internal cache sizes, memory and compute handling. It models the hardware in a setting using polyhedral-like constraints, including the notion of divisibility. This speeds up the solution process in our ConstrINT library for general integer constraint satisfaction problems (integer CSPs). We show that our kernels can achieve 50x speed-ups over a vanilla PyTorch implementation and allow 40x larger hidden sizes compared to our Triton implementation. Our open-source kernels and the optimization library are released here to boost research in the direction of state-tracking enabled RNNs and sequence modeling: https://github.com/NX-AI/flashrnn 3 authors · Dec 10, 2024
- Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation We analyze the behaviors of open large language models (LLMs) on the task of data-to-text (D2T) generation, i.e., generating coherent and relevant text from structured data. To avoid the issue of LLM training data contamination with standard benchmarks, we design Quintd - a tool for collecting novel structured data records from public APIs. We find that open LLMs (Llama 2, Mistral, and Zephyr) can generate fluent and coherent texts in zero-shot settings from data in common formats collected with Quintd. However, we show that the semantic accuracy of the outputs is a major issue: both according to human annotators and our reference-free metric based on GPT-4, more than 80% of the outputs of open LLMs contain at least one semantic error. We publicly release the code, data, and model outputs. 2 authors · Jan 18, 2024
17 ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? Large Language Models (LLMs) hold great promise to revolutionize current clinical systems for their superior capacities on medical text processing tasks and medical licensing exams. Meanwhile, traditional ML models such as SVM and XGBoost have still been mainly adopted in clinical prediction tasks. An emerging question is Can LLMs beat traditional ML models in clinical prediction? Thus, we build a new benchmark ClinicalBench to comprehensively study the clinical predictive modeling capacities of both general-purpose and medical LLMs, and compare them with traditional ML models. ClinicalBench embraces three common clinical prediction tasks, two databases, 14 general-purpose LLMs, 8 medical LLMs, and 11 traditional ML models. Through extensive empirical investigation, we discover that both general-purpose and medical LLMs, even with different model scales, diverse prompting or fine-tuning strategies, still cannot beat traditional ML models in clinical prediction yet, shedding light on their potential deficiency in clinical reasoning and decision-making. We call for caution when practitioners adopt LLMs in clinical applications. ClinicalBench can be utilized to bridge the gap between LLMs' development for healthcare and real-world clinical practice. 8 authors · Nov 10, 2024 2
1 Comparative Evaluation of Traditional and Deep Learning-Based Segmentation Methods for Spoil Pile Delineation Using UAV Images The stability of mine dumps is contingent upon the precise arrangement of spoil piles, taking into account their geological and geotechnical attributes. Yet, on-site characterisation of individual piles poses a formidable challenge. The utilisation of image-based techniques for spoil pile characterisation, employing remotely acquired data through unmanned aerial systems, is a promising complementary solution. Image processing, such as object-based classification and feature extraction, are dependent upon effective segmentation. This study refines and juxtaposes various segmentation approaches, specifically colour-based and morphology-based techniques. The objective is to enhance and evaluate avenues for object-based analysis for spoil characterisation within the context of mining environments. Furthermore, a comparative analysis is conducted between conventional segmentation approaches and those rooted in deep learning methodologies. Among the diverse segmentation approaches evaluated, the morphology-based deep learning segmentation approach, Segment Anything Model (SAM), exhibited superior performance in comparison to other approaches. This outcome underscores the efficacy of incorporating advanced morphological and deep learning techniques for accurate and efficient spoil pile characterisation. The findings of this study contribute valuable insights to the optimisation of segmentation strategies, thereby advancing the application of image-based techniques for the characterisation of spoil piles in mining environments. 4 authors · Jan 31, 2024
1 Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family ChatGPT is a powerful large language model (LLM) that covers knowledge resources such as Wikipedia and supports natural language question answering using its own knowledge. Therefore, there is growing interest in exploring whether ChatGPT can replace traditional knowledge-based question answering (KBQA) models. Although there have been some works analyzing the question answering performance of ChatGPT, there is still a lack of large-scale, comprehensive testing of various types of complex questions to analyze the limitations of the model. In this paper, we present a framework that follows the black-box testing specifications of CheckList proposed by Ribeiro et. al. We evaluate ChatGPT and its family of LLMs on eight real-world KB-based complex question answering datasets, which include six English datasets and two multilingual datasets. The total number of test cases is approximately 190,000. In addition to the GPT family of LLMs, we also evaluate the well-known FLAN-T5 to identify commonalities between the GPT family and other LLMs. The dataset and code are available at https://github.com/tan92hl/Complex-Question-Answering-Evaluation-of-GPT-family.git 7 authors · Mar 14, 2023
- BianCang: A Traditional Chinese Medicine Large Language Model The rise of large language models (LLMs) has driven significant progress in medical applications, including traditional Chinese medicine (TCM). However, current medical LLMs struggle with TCM diagnosis and syndrome differentiation due to substantial differences between TCM and modern medical theory, and the scarcity of specialized, high-quality corpora. This paper addresses these challenges by proposing BianCang, a TCM-specific LLM, using a two-stage training process that first injects domain-specific knowledge and then aligns it through targeted stimulation. To enhance diagnostic and differentiation capabilities, we constructed pre-training corpora, instruction-aligned datasets based on real hospital records, and the ChP-TCM dataset derived from the Pharmacopoeia of the People's Republic of China. We compiled extensive TCM and medical corpora for continuous pre-training and supervised fine-tuning, building a comprehensive dataset to refine the model's understanding of TCM. Evaluations across 11 test sets involving 29 models and 4 tasks demonstrate the effectiveness of BianCang, offering valuable insights for future research. Code, datasets, and models are available at https://github.com/QLU-NLP/BianCang. 8 authors · Nov 17, 2024
- An Improved Traditional Chinese Evaluation Suite for Foundation Model We present TMMLU+, a new benchmark designed for Traditional Chinese language understanding. TMMLU+ is a multi-choice question-answering dataset with 66 subjects from elementary to professional level. It is six times larger and boasts a more balanced subject distribution than its predecessor, Taiwan Massive Multitask Language Understanding (TMMLU). We also benchmark closed-source models and 26 open-weight Chinese large language models (LLMs) of parameters ranging from 1.8B to 72B on the proposed TMMLU+. Our findings reveal that (1.) Traditional Chinese models still trail behind their Simplified Chinese counterparts, highlighting a need for more focused advancements in LLMs catering to Traditional Chinese. (2.) Current LLMs still fall short of human performance in average scores, indicating a potential need for future research to delve deeper into social science and humanities subjects. (3.) Among all the tokenization compression metrics examined, we identify that only the fertility score uniquely demonstrates strong correlations with our benchmark results. We foresee that TMMLU+ will pinpoint areas for future model improvement, thereby narrowing the gap between machine and human linguistic capabilities and supporting researchers in developing Traditional Chinese LLMs. Our dataset, along with the benchmark source code, is accessible at huggingface.co/datasets/ikala/tmmluplus. 7 authors · Mar 4, 2024
- Advancing the Evaluation of Traditional Chinese Language Models: Towards a Comprehensive Benchmark Suite The evaluation of large language models is an essential task in the field of language understanding and generation. As language models continue to advance, the need for effective benchmarks to assess their performance has become imperative. In the context of Traditional Chinese, there is a scarcity of comprehensive and diverse benchmarks to evaluate the capabilities of language models, despite the existence of certain benchmarks such as DRCD, TTQA, CMDQA, and FGC dataset. To address this gap, we propose a novel set of benchmarks that leverage existing English datasets and are tailored to evaluate language models in Traditional Chinese. These benchmarks encompass a wide range of tasks, including contextual question-answering, summarization, classification, and table understanding. The proposed benchmarks offer a comprehensive evaluation framework, enabling the assessment of language models' capabilities across different tasks. In this paper, we evaluate the performance of GPT-3.5, Taiwan-LLaMa-v1.0, and Model 7-C, our proprietary model, on these benchmarks. The evaluation results highlight that our model, Model 7-C, achieves performance comparable to GPT-3.5 with respect to a part of the evaluated capabilities. In an effort to advance the evaluation of language models in Traditional Chinese and stimulate further research in this field, we have open-sourced our benchmark and opened the model for trial. 6 authors · Sep 15, 2023
3 The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on Llama with Vision-Aware and Function-Calling Capabilities Breeze 2 is a suite of advanced multi-modal language models, available in 3B and 8B parameter configurations, specifically designed to enhance Traditional Chinese language representation. Building upon the Llama 3, Breeze 2 continues pretraining on an extensive corpus to enhance the linguistic and cultural heritage of Traditional Chinese. It incorporates vision-aware capabilities through a visual encoder and a bridge module, and supports function-calling via prompt templates and post-training on function-calling data. The effectiveness of Breeze 2 is benchmarked across various tasks, including Taiwan general knowledge, instruction-following, long context, function calling, and vision understanding. Furthermore, we showcase the capabilities of the its 3B model in a mobile application. We are publicly releasing all Breeze 2 models under the Llama 3 Community License. 7 authors · Jan 23
- FineWeb-zhtw: Scalable Curation of Traditional Chinese Text Data from the Web The quality and size of a pretraining dataset significantly influence the performance of large language models (LLMs). While there have been numerous efforts in the curation of such a dataset for English users, there is a relative lack of similar initiatives for Traditional Chinese. Building upon this foundation of FineWeb, we introduce FineWeb-zhtw, a dataset tailored specifically for Traditional Chinese users. We came up with multiple stages of meticulously designed filters to cater to the linguistic difference between English and Traditional Chinese, to ensure comprehensiveness and quality. We determined effectiveness from querying dataset samples with three main objectives. Our code and datasets are publicly available. 9 authors · Nov 25, 2024
- Hyacinth6B: A large language model for Traditional Chinese This research's primary motivation of this study is to address the high hardware and computational demands typically associated with LLMs.Therefore,our goal is to find a balance between model lightness and performance,striving to maximize performance while using a comparatively lightweight model. Hyacinth6B was developed with this objective in mind,aiming to fully leverage the core capabilities of LLMs without incurring substantial resource costs, effectively pushing the boundaries of smaller model's performance. The training approach involves parameter efficient finetuning using the LoRA method. 2 authors · Mar 20, 2024
- Extending the Pre-Training of BLOOM for Improved Support of Traditional Chinese: Models, Methods and Results In this paper we present the multilingual language model BLOOM-zh that features enhanced support for Traditional Chinese. BLOOM-zh has its origins in the open-source BLOOM models presented by BigScience in 2022. Starting from released models, we extended the pre-training of BLOOM by additional 7.4 billion tokens in Traditional Chinese and English covering a variety of domains such as news articles, books, encyclopedias, educational materials as well as spoken language. In order to show the properties of BLOOM-zh, both existing and newly created benchmark scenarios are used for evaluating the performance. BLOOM-zh outperforms its predecessor on most Traditional Chinese benchmarks while maintaining its English capability. We release all our models to the research community. 9 authors · Mar 8, 2023
- A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes Recently the LARS and LAMB optimizers have been proposed for training neural networks faster using large batch sizes. LARS and LAMB add layer-wise normalization to the update rules of Heavy-ball momentum and Adam, respectively, and have become popular in prominent benchmarks and deep learning libraries. However, without fair comparisons to standard optimizers, it remains an open question whether LARS and LAMB have any benefit over traditional, generic algorithms. In this work we demonstrate that standard optimization algorithms such as Nesterov momentum and Adam can match or exceed the results of LARS and LAMB at large batch sizes. Our results establish new, stronger baselines for future comparisons at these batch sizes and shed light on the difficulties of comparing optimizers for neural network training more generally. 5 authors · Feb 12, 2021
- Development of a New Image-to-text Conversion System for Pashto, Farsi and Traditional Chinese We report upon the results of a research and prototype building project Worldly~OCR dedicated to developing new, more accurate image-to-text conversion software for several languages and writing systems. These include the cursive scripts Farsi and Pashto, and Latin cursive scripts. We also describe approaches geared towards Traditional Chinese, which is non-cursive, but features an extremely large character set of 65,000 characters. Our methodology is based on Machine Learning, especially Deep Learning, and Data Science, and is directed towards vast quantities of original documents, exceeding a billion pages. The target audience of this paper is a general audience with interest in Digital Humanities or in retrieval of accurate full-text and metadata from digital images. 4 authors · May 8, 2020
- MusicMamba: A Dual-Feature Modeling Approach for Generating Chinese Traditional Music with Modal Precision In recent years, deep learning has significantly advanced the MIDI domain, solidifying music generation as a key application of artificial intelligence. However, existing research primarily focuses on Western music and encounters challenges in generating melodies for Chinese traditional music, especially in capturing modal characteristics and emotional expression. To address these issues, we propose a new architecture, the Dual-Feature Modeling Module, which integrates the long-range dependency modeling of the Mamba Block with the global structure capturing capabilities of the Transformer Block. Additionally, we introduce the Bidirectional Mamba Fusion Layer, which integrates local details and global structures through bidirectional scanning, enhancing the modeling of complex sequences. Building on this architecture, we propose the REMI-M representation, which more accurately captures and generates modal information in melodies. To support this research, we developed FolkDB, a high-quality Chinese traditional music dataset encompassing various styles and totaling over 11 hours of music. Experimental results demonstrate that the proposed architecture excels in generating melodies with Chinese traditional music characteristics, offering a new and effective solution for music generation. 6 authors · Sep 4, 2024
- A Novel Method for improving accuracy in neural network by reinstating traditional back propagation technique Deep learning has revolutionized industries like computer vision, natural language processing, and speech recognition. However, back propagation, the main method for training deep neural networks, faces challenges like computational overhead and vanishing gradients. In this paper, we propose a novel instant parameter update methodology that eliminates the need for computing gradients at each layer. Our approach accelerates learning, avoids the vanishing gradient problem, and outperforms state-of-the-art methods on benchmark data sets. This research presents a promising direction for efficient and effective deep neural network training. 1 authors · Aug 9, 2023
- Tradition or Innovation: A Comparison of Modern ASR Methods for Forced Alignment Forced alignment (FA) plays a key role in speech research through the automatic time alignment of speech signals with corresponding text transcriptions. Despite the move towards end-to-end architectures for speech technology, FA is still dominantly achieved through a classic GMM-HMM acoustic model. This work directly compares alignment performance from leading automatic speech recognition (ASR) methods, WhisperX and Massively Multilingual Speech Recognition (MMS), against a Kaldi-based GMM-HMM system, the Montreal Forced Aligner (MFA). Performance was assessed on the manually aligned TIMIT and Buckeye datasets, with comparisons conducted only on words correctly recognized by WhisperX and MMS. The MFA outperformed both WhisperX and MMS, revealing a shortcoming of modern ASR systems. These findings highlight the need for advancements in forced alignment and emphasize the importance of integrating traditional expertise with modern innovation to foster progress. Index Terms: forced alignment, phoneme alignment, word alignment 4 authors · Jun 27, 2024
- Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations Structured pruning fundamentally reduces computational and memory overheads of large language models (LLMs) and offers a feasible solution for end-side LLM deployment. Structurally pruned models remain dense and high-precision, highly compatible with further tuning and compression. However, as the coarse-grained structured pruning poses large damage to the highly interconnected model, achieving a high compression ratio for scaled-up LLMs remains a challenge. In this paper, we introduce a task-agnostic structured pruning approach coupled with a compact Transformer architecture design. The proposed approach, named TransAct, reduces transitional activations inside multi-head attention (MHA) and multi-layer perceptron (MLP) modules, while preserving the inter-module activations that are sensitive to perturbations. Hence, the LLM is pruned into an intra-module low-rank architecture, significantly reducing weights, KV Cache and attention computation. TransAct is implemented on the LLaMA model and evaluated on downstream benchmarks. Results verify the optimality of our approach at high compression with respect to both efficiency and performance. Further, ablation studies reveal the strength of activation-guided iterative pruning and provide experimental analysis on the redundancy of MHA and MLP modules. 7 authors · Jul 8, 2024 2
- Short-term Volatility Estimation for High Frequency Trades using Gaussian processes (GPs) The fundamental theorem behind financial markets is that stock prices are intrinsically complex and stochastic. One of the complexities is the volatility associated with stock prices. Volatility is a tendency for prices to change unexpectedly [1]. Price volatility is often detrimental to the return economics, and thus, investors should factor it in whenever making investment decisions, choices, and temporal or permanent moves. It is, therefore, crucial to make necessary and regular short and long-term stock price volatility forecasts for the safety and economics of investors returns. These forecasts should be accurate and not misleading. Different models and methods, such as ARCH GARCH models, have been intuitively implemented to make such forecasts. However, such traditional means fail to capture the short-term volatility forecasts effectively. This paper, therefore, investigates and implements a combination of numeric and probabilistic models for short-term volatility and return forecasting for high-frequency trades. The essence is that one-day-ahead volatility forecasts were made with Gaussian Processes (GPs) applied to the outputs of a Numerical market prediction (NMP) model. Firstly, the stock price data from NMP was corrected by a GP. Since it is not easy to set price limits in a market due to its free nature and randomness, a Censored GP was used to model the relationship between the corrected stock prices and returns. Forecasting errors were evaluated using the implied and estimated data. 3 authors · Nov 17, 2023