LLM-EDA
/

VeriDebug

+---
+license: apache-2.0
+language:
+- en
+base_model:
+- WANGNingroci/VeriSeek
+---
+Usage:
+```python
+import re
+import json
+import numpy as np
+from src.gritlm import GritLM
+from scipy.spatial.distance import cosine
+KEY_WORDS = ['endmodule', 'end', 'endcase', 'else', 'begin']
+REP_QUERY = 'Represent this text: '
+LINE_QUERY = """Now you are a verilog designer. You are given the design description and buggy verilog code segment. Infer the bug type in the code segment."""
+TYPE_QUERY = "Now you are a verilog designer. You are given the design description and buggy verilog code segment. Infer the bug type in the code segment."
+CLS_QUERY = 'Now you are a verilog designer. You are given the design description and buggy verilog code segment. Infer the bug type in the code segment.\n'
+CLS_DESC = 'The bug type is '
+BUG_CLS = {
+    'width': 0, 'logic': 0, 'assignment': 0, 'initial': 0, 'data': 0,
+    'state': 0, 'others': 0, 'comparison': 0, 'bitwise': 0, 'condition': 0,
+    'signal': 0, 'arithmetic': 0, 'value': 0
+}
+BUG_DESC = {
+    'width': 'Mismatched bit widths in assignments, operations, or port connections, leading to unintended truncation or zero-extension.',
+    'logic': 'Errors in combinational or sequential logic design, resulting in incorrect circuit behavior or timing issues.',
+    'assignment': 'Improper use of blocking (=) or non-blocking (<=) assignments, causing race conditions or unexpected signal updates.',
+    'initial': 'Incorrect initialization of variables or registers, leading to undefined behavior or simulation mismatches.',
+    'data': 'Errors in data handling, such as incorrect data types, improper conversions, or misuse of signed/unsigned values.',
+    'state': 'Flaws in finite state machine (FSM) design, including missing states, incorrect transitions, or improper state encoding.',
+    'others': 'Miscellaneous errors that don\'t fit into other categories, such as syntax errors or tool-specific issues.',
+    'comparison': 'Incorrect use of equality (==) or inequality (!=) operators, or misuse of case equality (===) and case inequality (!==).',
+    'bitwise': 'Errors in bitwise operations, including incorrect use of AND, OR, XOR, or shift operators.',
+    'condition': 'Flaws in conditional statements (if-else, case) leading to incorrect branching or priority encoding issues.',
+    'signal': 'Errors related to signal declarations, including incorrect use of wire/reg, input/output ports, or signal naming conflicts.',
+    'arithmetic': 'Mistakes in arithmetic operations, such as overflow/underflow issues or incorrect use of signed/unsigned arithmetic.',
+    'value': 'Incorrect constant values, parameter definitions, or literal representations leading to unexpected circuit behavior.'
+}
+GEN_INST = 'Now you are a verilog designer. You need to fix the bug in the buggy code segment:\n'
+SPEC = """
+---
+### Module Specification: Cfu
+# #### 1. Overview
+# The `Cfu` (Custom Function Unit) module is designed to perform a simple selection operation based on the input command. It processes two 32-bit inputs and outputs one of them based on the least significant bit of the function ID. The module operates synchronously with a clock signal and uses a simple handshake protocol for command acceptance and response delivery.
+# #### 2. Interface Description
+# ##### Inputs:
+# - **cmd_valid** (`input`): A signal indicating if the command inputs are valid.
+# - **cmd_payload_function_id** (`input [9:0]`): A 10-bit function identifier which determines the operation of the module. Currently, only the LSB is used for selecting the output.
+# - **cmd_payload_inputs_0** (`input [31:0]`): A 32-bit input data.
+# - **cmd_payload_inputs_1** (`input [31:0]`): Another 32-bit input data.
+# - **rsp_ready** (`input`): A signal from the downstream component indicating it is ready to accept the response.
+# - **reset** (`input`): Asynchronous reset signal.
+# - **clk** (`input`): Clock signal.
+# ##### Outputs:
+# - **cmd_ready** (`output`): A signal indicating the module is ready to accept a command.
+- **rsp_valid** (`output`): A signal indicating that the response is valid and ready to be read.
+# - **rsp_payload_outputs_0** (`output [31:0]`): The 32-bit output data, which is one of the two input data values based on the function ID.
+#### 3. Functional Description
+##### Command and Response Protocol:
+- **Handshake Mechanism**: The module uses a simple handshake mechanism for command acceptance and response delivery. The `cmd_ready` signal is asserted when the module is ready to accept a new command, which depends on the `rsp_ready` signal. The `rsp_valid` signal is asserted when the module has a valid response ready, which is directly tied to the `cmd_valid` signal.
+##### Data Processing:
+- **Output Selection**: The output, `rsp_payload_outputs_0`, is selected based on the least significant bit (LSB) of `cmd_payload_function_id`. If the LSB is 0, `rsp_payload_outputs_0` is equal to `cmd_payload_inputs_0`. If the LSB is 1, `rsp_payload_outputs_0` is equal to `cmd_payload_inputs_1`.
+#### 4. Timing and Synchronization
+- The module operates synchronously with respect to the provided clock signal (`clk`). All inputs are sampled, and outputs are updated on the rising edge of the clock.
+- The reset (`reset`) is asynchronous and active-high, which means all internal states and outputs are reset when `reset` is asserted, regardless of the clock.
+#### 5. Use Cases
+- **Simple Data Selector**: This module can be used in systems where conditional data forwarding is needed based on a simple configuration or status bit.
+# - **Function ID Expansion**: While currently only the LSB of the function ID is used, the module can be expanded to use more bits for more complex selection logic or operations.
+#### 6. Limitations and Future Enhancements
+# - **Function ID Utilization**: Currently, only the LSB of the function ID is used. Future enhancements could include decoding more bits to perform different operations.
+- **Pipeline Stages**: The module is purely combinational regarding the data path. Adding pipeline stages could help in meeting timing requirements for higher clock frequencies.
+---
+This specification provides a detailed overview of the `Cfu` module's functionality, interface, and behavior based on the provided Verilog code. It outlines the basic operation, use cases, and potential areas for future enhancements.
+Buggy code:
+"""
+BUGGY = """
+module Cfu (
+    input               cmd_valid,
+    output              cmd_ready,
+    input      [9:0]    cmd_payload_function_id,
+    input      [31:0]   cmd_payload_inputs_0,
+    input      [31:0]   cmd_payload_inputs_1,
+    output              rsp_valid,
+    input               rsp_ready,
+    output     [31:0]   rsp_payload_outputs_0,
+    input               reset,
+    input               clk
+    );
+    // Trivial handshaking for a combinational CFU
+    assign rsp_valid = cmd_valid;
+    assign cmd_ready = rsp_ready | cmd_valid;
+    //
+    // select output -- note that we're not fully decoding the 3 function_id bits
+    //
+    assign rsp_payload_outputs_0 = cmd_payload_function_id[0] ?
+    cmd_payload_inputs_1 :
+    cmd_payload_inputs_0 ;
+    endmodule
+"""
+JSON_FORMAT = '{"buggy_code": "The buggy code in the systemverilog (just one line of code)", "correct_code": "The correct code (just one line of code that can directly replace the buggy code, without any other description)"}'
+BUGGY_LINE_GT = "assign cmd_ready = rsp_ready | cmd_valid;"
+BUGGY_CLS_GT = "logic"
+FIX_GT = "assign cmd_ready = rsp_ready;"
+def gen_neg(buggy_code):
+    buggy_code_lines = buggy_code.split('\n')
+    buggy_code_lines = [line.strip() for line in buggy_code_lines]
+    buggy_code_lines = [line.strip('\t') for line in buggy_code_lines]
+    buggy_code_lines = [line.strip('\r') for line in buggy_code_lines]
+    buggy_code_lines = [line for line in buggy_code_lines if len(line) > 0]
+    # remove comments
+    buggy_code_lines_neg = [
+        line for line in buggy_code_lines if not line.startswith('//') and not line.startswith('*') and not line.startswith('/*') and line not in KEY_WORDS]
+    # remove not useful lines
+    buggy_code_lines_neg = [
+        line for line in buggy_code_lines_neg if ' ' in line and len(line.replace(' ', '')) > 4]
+    return buggy_code_lines, buggy_code_lines_neg
+def gritlm_instruction(instruction):
+    return "<|user|>\n" + instruction + "\n<|embed|>\n" if instruction else "<|embed|>\n"
+def extract_bug_types(text):
+    # Define the regex pattern
+    pattern = r'The bug type is (\w+)'
+    # Find all matches
+    matches = re.findall(pattern, text)
+    return matches
+# Loads the model for both capabilities; If you only need embedding pass `mode="embedding"` to save memory (no lm head)
+model_path = "./VeriDebug"
+model = GritLM(model_path, torch_dtype="auto", mode="unified")
+print(f"Model loaded from {model_path}")
+### Embedding/Representation ###
+# buggy line location
+_, buggy_code_lines = gen_neg(BUGGY)
+query = [LINE_QUERY + "\n" + SPEC + "\n" + BUGGY]
+q_rep = model.encode(query,
+                     instruction=gritlm_instruction("Represent this text:"), max_length=4096)
+d_rep = model.encode(buggy_code_lines,
+                     instruction=gritlm_instruction(""),
+                     max_length=128)
+cosine_sim = [1 - cosine(q_rep[0], d) for d in d_rep]
+sim_rank = np.argsort(cosine_sim)[::-1]
+buggy_code_lines_ranked = [buggy_code_lines[i] for i in sim_rank]
+print("========== Buggy code lines ranked by similarity ==========")
+print(f"Buggy code lines candidates (ranked by similarity): \n{buggy_code_lines_ranked}")
+print("----------------------------------------")
+print(f"Ground truth: {BUGGY_LINE_GT}")
+print("===========================================================")
+# buggy type classification
+query = [TYPE_QUERY + "\n" + SPEC + "\n" + BUGGY]
+d_types = [CLS_DESC + b + '.' + BUG_DESC[b]
+                   for b in BUG_CLS.keys()]
+q_rep = model.encode(query,
+                     instruction=gritlm_instruction(REP_QUERY),
+                     max_length=4096)
+d_rep = model.encode(d_types,
+                     instruction=gritlm_instruction(""),
+                     max_length=128)
+cosine_sim = [1 - cosine(q_rep[0], d) for d in d_rep]
+sim_rank = np.argsort(cosine_sim)[::-1]
+buggy_type_ranked = [d_types[i] for i in sim_rank]
+print("============ Buggy type ranked by similarity ==============")
+print(f"Buggy type candidates (ranked by similarity): \n{[extract_bug_types(i)[0] for i in buggy_type_ranked]}")
+print("----------------------------------------")
+print(f"Ground truth: {BUGGY_CLS_GT}")
+print("===========================================================")
+### Generation ###
+instruct = f'{GEN_INST}{BUGGY}\n\nThe specification file of this code is:\n{SPEC}\n\nThe possible buggy lines ranking list are:\n{buggy_code_lines_ranked}\n\nThe possible bug type ranking list are:\n{buggy_type_ranked}\n\nYour task is to return me a json to analyze how the code should be modified, in the following format:\n{JSON_FORMAT}.'
+messages = [
+            {"role": "user", "content": instruct},
+        ]
+encoded = model.tokenizer.apply_chat_template(
+    messages, add_generation_prompt=True, return_tensors="pt")
+encoded = encoded.to(model.device)
+gen = model.generate(encoded, max_new_tokens=256, do_sample=True)
+valid_gen = gen[:, encoded.shape[1]:]
+decoded = model.tokenizer.batch_decode(valid_gen)
+# truncate the decoded text before </s>
+decoded = [d[:d.find('}')+1] for d in decoded]
+decoded_dict = json.loads(decoded[0])
+print("==================== Buggy fix ============================")
+print(f"Fix result: {decoded_dict}")
+print("----------------------------------------")
+print(f"Ground truth: {FIX_GT}")
+print("===========================================================")
+```