codewithdark committed
Commit 1e81694 · verified · 1 Parent(s): 4b8f175

Update README.md

Files changed (1)
  1. README.md +74 -33
README.md CHANGED
@@ -1,61 +1,102 @@
- ---
- license: mit
- datasets:
- - carlosejimenez/wikitext__wikitext-2-raw-v1
- language:
- - en
- pipeline_tag: text-generation
- library_name: transformers
- tags:
- - torch
- - Thinking
- ---
 
  # Latent Recurrent Depth Language Model

- ## Model Description

- This model is a Latent Recurrent Depth Language Model (LRD-LM), an experimental architecture designed for text generation. It combines a "prelude" block for initial processing, a recurrent block with a latent state, and a "coda" block for final output. The recurrent block allows for multiple iterations over the input sequence, potentially capturing deeper contextual information.

- ## Intended Uses & Limitations

- **Intended Uses:**

- * Text generation: The primary purpose of this model is to generate text given a prompt. It can potentially be fine-tuned for specific tasks like creative writing, code generation, or dialogue generation.
- * Research: This model serves as an exploration of novel architectures for language modeling, potentially leading to more effective methods for capturing long-range dependencies.

  **Limitations:**

- * Data limitations: The model has been trained on a small subset of the Wikitext-2-raw dataset. Performance may be limited compared to models trained on larger, more diverse corpora.
- * Performance: While the model demonstrates basic text generation capabilities, its overall performance is likely inferior to established state-of-the-art language models. The provided training loop and hyperparameters are a starting point and may require significant adjustments for optimal results.
- * Computational cost: The iterative nature of the recurrent block can introduce computational overhead.
- * Bias: Like all language models, this model may exhibit biases present in its training data.

- ## Training Data

- The model was trained on a subset (first 1000 samples) of the Wikitext-2-raw-v1 dataset. Further details regarding pre-processing and data cleaning can be found in the source code. This limitation may be reflected in biases or inaccuracies in the generated output.

- ## Evaluation Results

- No formal evaluation metrics are provided at this time. The model's performance is primarily demonstrated through qualitative assessment of generated samples during and after training. Further evaluation using established metrics is recommended.

- ## Ethical Considerations

- This model is provided for research and experimental purposes. The user is responsible for ensuring ethical usage and mitigating potential risks associated with the generated output.

- ## Model Usage Instructions

- The model can be used for text generation via the `generate()` method. The usage of this function is demonstrated in the example script.

- ## Training

- The model has been trained using the AdamW optimizer and a cosine annealing learning rate scheduler for a number of epochs. The training parameters can be configured in the provided script.

- ## Usage Example (Python)
 
+ ---
+ license: mit
+ datasets:
+ - carlosejimenez/wikitext__wikitext-2-raw-v1
+ language:
+ - en
+ pipeline_tag: text-generation
+ library_name: transformers
+ tags:
+ - torch
+ - Thinking
+ ---

  # Latent Recurrent Depth Language Model

+ ## Overview

+ The Latent Recurrent Depth Language Model (LRD-LM) is an experimental text-generation architecture designed to capture deeper contextual information through iterative, latent processing. Instead of generating verbose chain-of-thought sequences, LRD-LM refines its internal state over multiple recurrent iterations to improve text generation quality while keeping the parameter count modest.

+ ## Architecture

+ The model is built around three key components; a sketch of how they compose in code follows the list:

+ - **Prelude Block:**
+ This block handles the initial processing by embedding input tokens and applying self-attention with positional encodings.

+ - **Recurrent Block:**
+ A core, weight-shared block that iteratively refines a latent state. By repeatedly processing the prelude output along with its own evolving state, the model effectively “thinks” over the input without emitting intermediate tokens.

+ - **Coda Block:**
+ The final block decodes the refined latent state into output token probabilities.

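+ To make the data flow concrete, the following is a minimal, illustrative sketch of how the three blocks compose. The module names, dimensions, and the use of standard Transformer encoder layers are assumptions made for illustration; they do not correspond to the exact classes in this repository's code.

+ ```python
+ import torch
+ import torch.nn as nn

+ class LatentRecurrentDepthSketch(nn.Module):
+     """Illustrative prelude -> recurrent -> coda pipeline (not the repository's exact implementation)."""
+     def __init__(self, vocab_size=50257, d_model=512, n_heads=8):
+         super().__init__()
+         self.embed = nn.Embedding(vocab_size, d_model)
+         # Prelude: initial self-attention over the embedded tokens (positional encodings omitted for brevity).
+         self.prelude = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
+         # Recurrent block: a single weight-shared layer applied for several iterations.
+         self.recurrent = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
+         # Coda: decodes the refined latent state into vocabulary logits.
+         self.coda = nn.Linear(d_model, vocab_size)

+     def forward(self, input_ids, num_iterations=3):
+         hidden = self.prelude(self.embed(input_ids))   # prelude output: (batch, seq, d_model)
+         latent = torch.zeros_like(hidden)              # evolving latent state
+         for _ in range(num_iterations):                # latent "thinking" steps, no tokens emitted
+             latent = self.recurrent(hidden + latent)
+         return self.coda(latent)                       # logits: (batch, seq, vocab_size)
+ ```

+ Because the recurrent block is weight-shared, increasing `num_iterations` deepens the refinement without adding parameters, which is the central trade-off the architecture explores.
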
+ ## Applications & Limitations

+ **Intended Uses:**
+ - **Text Generation:**
+ Generate creative text, dialogue, code, or other natural language content.
+ - **Research:**
+ Serve as a testbed for exploring novel architectures and techniques in language modeling.

  **Limitations:**
+ - **Data Constraints:**
+ Trained on a small subset (first 1000 samples) of the Wikitext-2-raw-v1 dataset, which may limit its performance compared to models trained on larger corpora.
+ - **Performance:**
+ While it demonstrates the potential of latent recurrent depth, its overall performance is experimental and may not match state-of-the-art models.
+ - **Computational Overhead:**
+ The iterative processing introduces extra computation.
+ - **Bias:**
+ As with all language models, generated outputs may reflect biases present in the training data.

+ ## Training Details

+ The model was fine-tuned on a subset of the Wikitext-2-raw-v1 dataset (first 1000 samples) using the AdamW optimizer and a cosine annealing learning rate scheduler. The training configuration and hyperparameters are provided in the accompanying code, and adjustments may be needed for improved performance.
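
+ As a rough illustration of that setup, the snippet below wires AdamW together with a cosine annealing schedule. The hyperparameter values, the per-epoch scheduler step, and the `model`/`train_loader` objects are placeholders (e.g., an instance of the sketch class above and a DataLoader over tokenized Wikitext-2 batches), not the exact configuration used for this checkpoint.

+ ```python
+ import torch
+ from torch.optim import AdamW
+ from torch.optim.lr_scheduler import CosineAnnealingLR

+ # Placeholder hyperparameters; the real values live in the accompanying training script.
+ num_epochs, learning_rate = 3, 5e-5
+ optimizer = AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01)
+ scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs)

+ for epoch in range(num_epochs):
+     for input_ids in train_loader:                     # batches of token ids from Wikitext-2-raw-v1
+         logits = model(input_ids, num_iterations=3)
+         # Standard next-token prediction: shift logits and targets by one position.
+         loss = torch.nn.functional.cross_entropy(
+             logits[:, :-1, :].reshape(-1, logits.size(-1)),
+             input_ids[:, 1:].reshape(-1),
+         )
+         optimizer.zero_grad()
+         loss.backward()
+         optimizer.step()
+     scheduler.step()                                   # cosine annealing stepped once per epoch
+ ```
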
+ ## Usage

+ The model can be used for text generation via its integrated `generate()` method, which allows you to control parameters such as the maximum sequence length, number of recurrent iterations, temperature, and top-k filtering.

+ ### Example: Direct Inference

+ ```python
+ import torch
+ from transformers import AutoTokenizer

+ # Load the tokenizer from the Hub and the model via the custom
+ # LatentRecurrentDepthModel class defined in this repository's code.
+ model = LatentRecurrentDepthModel.from_pretrained("codewithdark/latent-recurrent-depth-lm")
+ tokenizer = AutoTokenizer.from_pretrained("codewithdark/latent-recurrent-depth-lm")

+ prompt = "In the realm of language modeling"
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids

+ # Run a forward pass with a chosen number of recurrent iterations to obtain logits
+ logits = model(input_ids, num_iterations=3)

+ # Sample the next token from the last-position logits and append it to the prompt
+ probs = torch.softmax(logits[:, -1, :], dim=-1)
+ next_token = torch.multinomial(probs, num_samples=1)
+ generated_ids = torch.cat([input_ids, next_token], dim=1)
+ generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
+ print(generated_text)
+ ```
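
+ The example above samples from the full distribution. To reproduce the temperature and top-k controls mentioned earlier when sampling manually, a generic PyTorch helper such as the following can be used; it is a plain illustration, not part of this model's API.

+ ```python
+ import torch

+ def sample_next_token(logits, temperature=0.8, top_k=50):
+     """Sample one token id from the last-position logits using temperature and top-k filtering."""
+     scaled = logits[:, -1, :] / temperature                             # temperature reshapes the distribution
+     if top_k is not None:
+         kth_value = torch.topk(scaled, top_k, dim=-1).values[:, -1, None]
+         scaled = scaled.masked_fill(scaled < kth_value, float("-inf"))  # keep only the top-k logits
+     probs = torch.softmax(scaled, dim=-1)
+     return torch.multinomial(probs, num_samples=1)                      # shape: (batch, 1)

+ # Drop-in replacement for the two sampling lines in the example above:
+ # next_token = sample_next_token(logits, temperature=0.8, top_k=50)
+ ```
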
+ ### Alternative: Using the `generate()` Method

+ ```python
+ from transformers import AutoTokenizer

+ tokenizer = AutoTokenizer.from_pretrained("codewithdark/latent-recurrent-depth-lm")
+ model = LatentRecurrentDepthModel.from_pretrained("codewithdark/latent-recurrent-depth-lm")

+ prompt = "In the realm of language modeling"
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids

+ # Generate up to 50 tokens with 3 recurrent iterations, temperature 0.8, and top-k 50
+ generated_ids = model.generate(input_ids, max_length=50, num_iterations=3, temperature=0.8, top_k=50)
+ generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
+ print(generated_text)
+ ```
+ ## Ethical Considerations

+ This model is intended for research and experimental use. Users must ensure ethical application and carefully consider potential biases and misuse when deploying or further developing this technology.

+ ## License

+ This project is licensed under the MIT License.