Copyright 2025 Fujitsu Limited
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
Additional Terms of Use
In addition to the License, You shall use Fujitsu-LLM-KG-8x7B_cpt upon agreeing to these Terms of Use (hereinafter referred to as “TOU”). All capitalized terms used but not defined in this TOU have the meanings set forth in the License.
Article 1 (License to Use)
You must incorporate the TOU and the License into the license terms for redistribution of Fujitsu-LLM-KG-8x7B_cpt or Derivative Works of Fujitsu-LLM-KG-8x7B_cpt, or into the terms of use for services using Fujitsu-LLM-KG-8x7B_cpt or Derivative Works. Those who violate the TOU and/or License are not allowed to use Fujitsu-LLM-KG-8x7B_cpt.
Article 2 (Responsibility)
You shall use Fujitsu-LLM-KG-8x7B_cpt at Your own responsibility and discretion, and shall handle any disputes arising with third parties in relation to the use of Fujitsu-LLM-KG-8x7B_cpt at Your own responsibility and expense, and shall indemnify, defend and hold harmless the Licensor against all damages and losses without causing any inconvenience to the Licensor. You shall deal with any damages caused by the use of Fujitsu-LLM-KG-8x7B_cpt at Your own responsibility.
Article 3 (Prohibited Actions)
You shall not engage in the following actions when using Fujitsu-LLM-KG-8x7B_cpt.
(1) Actions that will or may infringe on the intellectual property rights of the Licensor or third parties;
(2) Actions that will or may infringe on the property, privacy, or portrait rights of the Licensor or third parties;
(3) Actions that discriminate against, defame, insult, or slander the Licensor or third parties, promote discrimination against others, or damage the reputation or credibility of others;
(4) Actions that engage in unauthorized legal services and/or provide legal advice from anyone other than a qualified professional;
(5) Actions that provide financial advice from anyone other than a qualified professional;
(6) Medical actions, including providing health advice or suggesting treatment methods; and
(7) Other actions that require permissions or other forms of authorization under laws and regulations.
Article 4 (Restrictions)
- You acknowledge that the results of processing using Fujitsu-LLM-KG-8x7B_cpt (hereinafter referred to as "Processing Results") may contain falsehoods, biases, content that infringes on the rights of others, or content that does not meet the effectiveness or usefulness expected by You, and agree to use Fujitsu-LLM-KG-8x7B_cpt on the premise that inaccurate or inappropriate Processing Results may cause damage or infringement of rights to You or third parties and/or ethical concerns. You shall use the Processing Results after confirming their accuracy, legality, and ethical validity Yourself. If the use of Fujitsu-LLM-KG-8x7B_cpt, including the Processing Results, by You causes infringement of the rights of You or third parties, the Licensor shall not be responsible for any damages and losses, and You shall indemnify, defend and hold harmless the Licensor against all damages and losses without causing any inconvenience to the Licensor.
- You shall use the Processing Results in compliance with the regulations such as laws and regulations in each country and region.
- You shall not use the Processing Results for the actions listed in Article 3 (Prohibited Actions).
Article 5 (Ownership of Rights)
You will acquire rights newly arising from the creation of Derivative Works of Fujitsu-LLM-KG-8x7B_cpt, but You shall use Derivative Works in accordance with the above License and TOU.
Article 6 (Export Transaction)
You shall obtain the necessary permissions yourself when exporting Fujitsu-LLM-KG-8x7B_cpt and the Processing Results in relation to Your use, where such export requires permissions under the Foreign Exchange and Foreign Trade Act (including related cabinet order and ministerial order) or U.S. export control laws and regulations.
Article 7 (Jurisdictional Court)
The Tokyo District Court shall have exclusive jurisdiction in the first instance over any disputes arising in relation to TOU.
Article 8 (Governing Law)
TOU shall be governed by the laws of Japan.
Article 9 (Other Provisions)
Except for the terms of the License, TOU sets forth the entire agreement as to all matters concerning the use of Fujitsu-LLM-KG-8x7B_cpt between You and the Licensor, and matters not provided for in the TOU shall be governed by the relevant laws and regulations.
Fujitsu-LLM-KG-8x7B_cpt
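This model is one of the large language models (LLMs) specialized in knowledge graph generation and inference that Fujitsu Limited developed during its project "Research and Development of Large Language Models Capable of Logical Reasoning". The project was selected under the New Energy and Industrial Technology Development Organization (NEDO) public call "Research and Development Project of the Enhanced Infrastructures for Post-5G Information and Communication Systems / (1) Development of Post-5G Information and Communication Systems" and under the Generative AI Accelerator Challenge (GENIAC) project hosted by the Ministry of Economy, Trade and Industry (METI).
The models developed during the project are listed in the table below. Evaluation results and development details for each model are described in the Fujitsu Research technical blog.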
Model Index
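Model | Name | Description |
---|---|---|
Fujitsu-LLM-KG-8x7B_cpt | Common pre-trained LLM | An LLM continually pre-trained on a knowledge graph parallel corpus. |
Fujitsu-LLM-KG-8x7B_inst-infer_v1 | Knowledge graph inference LLM ver. 1 | An LLM instruction-tuned on Japanese multi-hop QA task data. |
Fujitsu-LLM-KG-8x7B_inst-infer_v2 | Knowledge graph inference LLM ver. 2 | An LLM instruction-tuned on English multi-hop QA task data. |
Fujitsu-LLM-KG-8x7B_inst-gen_ja | Knowledge graph generation LLM (Japanese version) | An LLM instruction-tuned on Japanese document-level relation extraction task data. |
Fujitsu-LLM-KG-8x7B_inst-gen_en | Knowledge graph generation LLM (English version) | An LLM instruction-tuned on English document-level relation extraction task data. |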
Model Details
- Developed by: Fujitsu-LLM
- Base Model: mistralai/Mixtral-8x7B-Instruct-v0.1
- Language(s): Japanese, English
- Library: NVIDIA/NeMo
- License: Apache-2.0
Model Performance
- See the Fujitsu Research technical blog.
How to use
Preparation
Install the required Python modules.
# Tested with the following versions: transformers==4.48.1, torch==2.5.1, and accelerate==1.3.0.
$ pip install transformers torch accelerate
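Because the install command does not pin versions, it can help to compare the local environment with the versions listed as tested. The following is a minimal sketch using only the standard library; the `TESTED_VERSIONS` name and the `installed_versions` helper are illustrative and not part of the model card.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions the model card reports as tested.
TESTED_VERSIONS: Dict[str, str] = {
    "transformers": "4.48.1",
    "torch": "2.5.1",
    "accelerate": "1.3.0",
}


def installed_versions(packages: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print installed and tested versions side by side for a manual check.
    for name, installed in installed_versions(TESTED_VERSIONS).items():
        print(f"{name}: installed={installed}, tested={TESTED_VERSIONS[name]}")
```

A differing version is not necessarily a problem; the card only states which versions were tested.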
Define the utilities.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class Fujitsu_LLM_KG:
    """The Fujitsu-LLM-KG-8x7B model."""

    def __init__(self, model_id: str, *, device_map: str = "auto") -> None:
        """Initializes the model and tokenizer."""
        self.model = AutoModelForCausalLM.from_pretrained(
            model_id,
            device_map=device_map,
            torch_dtype=torch.bfloat16,
            low_cpu_mem_usage=True,
        )
        self.tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
        # Reuse the EOS token as the padding token.
        self.tokenizer.pad_token = self.tokenizer.eos_token

    def generate(
        self,
        prompt: str,
        *,
        max_new_tokens: int = 2048,
        num_beams: int = 1,
    ) -> str:
        """Generates an answer with greedy decoding (or beam search if num_beams > 1)."""
        tokenized = self.tokenizer(prompt, return_tensors="pt", padding=True)
        # Send the inputs to the model's device instead of assuming "cuda".
        device = self.model.device
        with torch.no_grad():
            outputs = self.model.generate(
                tokenized["input_ids"].to(device),
                attention_mask=tokenized["attention_mask"].to(device),
                pad_token_id=self.tokenizer.eos_token_id,
                max_new_tokens=max_new_tokens,
                do_sample=False,
                num_beams=num_beams,
            )
        # The decoded text starts with the prompt; keep only what follows it.
        answer = self.tokenizer.decode(outputs[0], skip_special_tokens=True)[len(prompt):]
        return answer


def extract_turtle(text: str, *, with_rationale: bool = False) -> str:
    """Extracts the RDF Turtle part from the output text of the Fujitsu-LLM-KG-8x7B models."""
    # Line prefixes that mark Turtle content in the model output.
    tokens = ["<", "rel:", "rdf:", "]"]
    if with_rationale:
        # Also keep the "#@rationale: ..." comment lines.
        tokens.append("#@")
    turtle = ""
    for line in text.splitlines():
        line_ = line.strip()
        # Keep blank lines and lines that start with a Turtle prefix.
        if line_ == "" or any(line_.startswith(c) for c in tokens):
            if turtle:
                turtle += "\n"
            turtle += line
    return turtle
Load the model.
kgllm = Fujitsu_LLM_KG("Fujitsu-LLM-KG/Fujitsu-LLM-KG-8x7B_cpt")
Generating a Knowledge Graph from Text
Specify the task.
prompt = """
[INST]
Generate "Knowledge Graph" in RDF Turtle format based on the given "Source".
## Source
```txt
宗像聡は富士通に2010年から勤めています。
彼はFujitsu-LLM-KG-8x7B_cptを開発しました。
```
## Strategy
Extract all verifiable facts in "Source" as knowledge triples.
[/INST]
""".strip()
generated = kgllm.generate(prompt)
print(generated)
Check the result.
## Knowledge Graph
```turtle
#@rationale: 宗像聡は富士通に2010年から勤めています。
<#宗像聡>
rel:employer [
rdf:object <#富士通>;
rel:start_time <#2010>
].
#@rationale: 彼はFujitsu-LLM-KG-8x7B_cptを開発しました。
<#宗像聡>
rel:notable_work <#Fujitsu-LLM-KG-8x7B_cpt>.
```
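The `extract_turtle` utility is defined above but not shown in use. The sketch below applies a lightly adapted copy of it to a shortened version of the output shown here. The indentation inside the sample and the `FENCE` helper are assumptions made for illustration; the fence string is built indirectly only so that the example can sit inside a fenced block.

```python
def extract_turtle(text: str, *, with_rationale: bool = False) -> str:
    """Adapted copy of the card's helper: keep only lines that look like RDF Turtle."""
    tokens = ["<", "rel:", "rdf:", "]"]
    if with_rationale:
        tokens.append("#@")
    turtle = ""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "" or any(stripped.startswith(t) for t in tokens):
            if turtle:
                turtle += "\n"
            turtle += line
    return turtle


FENCE = "`" * 3  # three backticks, built indirectly (illustrative helper)

# Shortened sample of the model output; indentation is an assumption.
sample_output = "\n".join([
    "## Knowledge Graph",
    FENCE + "turtle",
    "#@rationale: 宗像聡は富士通に2010年から勤めています。",
    "<#宗像聡>",
    "  rel:employer [",
    "    rdf:object <#富士通>;",
    "    rel:start_time <#2010>",
    "  ].",
    FENCE,
])

# Without rationale: heading, fences, and "#@rationale" comments are dropped.
print(extract_turtle(sample_output))

# With rationale: the "#@rationale" comment lines are kept as well.
print(extract_turtle(sample_output, with_rationale=True))
```

The extracted text has no `@prefix` declarations for `rel:` and `rdf:`, so a standard Turtle parser would likely need them prepended before it can load the result.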
Generating Text from a Knowledge Graph
Specify the task.
prompt = """
[INST]
Generate "Text" to explain the given knowledge triples in "Source".
## Source
```turtle
<#Satoshi Munakata>
rel:notable_work <#Fujitsu-LLM-KG-8x7B_cpt>;
rel:employer [
rdf:object <#Fujitsu>;
rel:start_time <#2010>
].
```
## Strategy
Explain the knowledge triples in "Source" without omission, but concisely and fluently.
[/INST]
""".strip()
generated = kgllm.generate(prompt)
print(generated)
Check the result.
## Text
```txt
Satoshi Munakata, who started working for Fujitsu in 2010, is the creator of the notable work Fujitsu-LLM-KG-8x7B cpt.
```
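Both tasks use the same prompt layout: an instruction, a fenced "Source" section, and a "Strategy" section, wrapped in instruction tags. The sketch below assembles such prompts programmatically. The function names are hypothetical, and the exact whitespace, as well as wrapping both prompts in `[INST]` ... `[/INST]`, is an assumption based on the examples above rather than a documented requirement.

```python
FENCE = "`" * 3  # three backticks, built indirectly (illustrative helper)


def build_prompt(instruction: str, source: str, source_lang: str, strategy: str) -> str:
    """Assemble a prompt in the layout used by the examples in this card."""
    return "\n".join([
        "[INST]",
        instruction,
        "## Source",
        FENCE + source_lang,
        source.strip(),
        FENCE,
        "## Strategy",
        strategy,
        "[/INST]",
    ])


def text_to_kg_prompt(text: str) -> str:
    """Prompt asking for a knowledge graph in RDF Turtle from plain text."""
    return build_prompt(
        'Generate "Knowledge Graph" in RDF Turtle format based on the given "Source".',
        text,
        "txt",
        'Extract all verifiable facts in "Source" as knowledge triples.',
    )


def kg_to_text_prompt(turtle: str) -> str:
    """Prompt asking for a textual explanation of knowledge triples."""
    return build_prompt(
        'Generate "Text" to explain the given knowledge triples in "Source".',
        turtle,
        "turtle",
        'Explain the knowledge triples in "Source" without omission, '
        "but concisely and fluently.",
    )


if __name__ == "__main__":
    print(text_to_kg_prompt("宗像聡は富士通に2010年から勤めています。"))
```

The resulting string can be passed to `kgllm.generate(...)` in place of the handwritten prompts.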
Training Datasets
Continual Pre-Training
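- Knowledge graph parallel corpus (Japanese and English)
  - cf. our technical blog
- Knowledge graph-related source code (Japanese and English)
  - e.g., RDF Turtle, SPARQL, PlantUML, etc.
- Knowledge graph-related corpus (Japanese and English)
  - Filtered by keywords such as "ナレッジグラフ" (knowledge graph) and "知識トリプル" (knowledge triple)
- Web crawl data (Japanese)
- Wikipedia data (Japanese and English)
- Academic paper corpus (Japanese)
- Legal data (Japanese)
- Court precedent corpus (Japanese)
- English mathematics corpus (English)
- Legal parallel corpus (Japanese-English)
- Subtitle parallel corpus (Japanese-English)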
License
The Terms of Use for Fujitsu-LLM-KG-8x7B_cpt are provided in the LICENSE file.
Risks and Limitations
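Processing results produced with Fujitsu-LLM-KG-8x7B_cpt may contain falsehoods, biases, content that infringes on the rights of others, or content that does not meet the effectiveness or usefulness expected by the user.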
Acknowledgements
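The development of this model was subsidized under the "Research and Development Project of the Enhanced Infrastructures for Post-5G Information and Communication Systems / Development of Post-5G Information and Communication Systems" promoted by NEDO.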
Authors
- Fujitsu Limited