Adapting committed on
Commit 6314790 · 1 Parent(s): 9738f1c
documents/docs/1-搜索功能.md ADDED
File without changes
documents/docs/2-总结功能.md ADDED
@@ -0,0 +1,19 @@
+ # 2 Research Trends Summarization
+
+ ## Model Architecture
+ ![](https://i.imgur.com/Lv8um1V.png)
+
+ ### 1 Baseline Configuration
+ 1. pre-trained language model: `sentence-transformers/all-MiniLM-L6-v2`
+ 2. dimensionality reduction: `None`
+ 3. clustering algorithm: `kmeans`
+ 4. keyword extraction model: `keyphrase-transformer`
+
+ [[example run](https://github.com/Mondkuchen/idp_LiteratureResearch_Tool/blob/main/example_run.py)] [[results](https://github.com/Mondkuchen/idp_LiteratureResearch_Tool/blob/main/examples/IDP.ipynb)]
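The baseline chains four stages: embed each abstract, optionally reduce dimensionality, cluster the embeddings, and extract keywords per cluster. Below is a minimal sketch of the clustering stage only, in plain Python so it runs without downloading a model. The toy 2-D vectors stand in for the 384-dimensional embeddings that `all-MiniLM-L6-v2` produces, and `kmeans` is a hypothetical helper written for illustration, not the function used in `example_run.py`.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. The first k points seed the centroids; this is
    an assumption made for determinism, not a recommended initialization."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy stand-ins for sentence embeddings of four paper abstracts.
embeddings = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
labels, centroids = kmeans(embeddings, k=2)
```

With real data, the vectors would come from the sentence-transformers model, and the texts grouped under each label would then be passed to the keyword extraction model.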
+
+
+ ### TODO:
+ 1. clustering: try other clustering algorithms such as the Gaussian Mixture Model (GMM)
+ 2. keyword extraction model: train another model
+ 3. add dimensionality reduction
+ 4. better PLM: `sentence-transformers/sentence-t5-xxl`
documents/docs/3-visualization.md ADDED
@@ -0,0 +1,2 @@
+ # 3 Visualization
+ [web app](https://huggingface.co/spaces/Adapting/literature-research-tool)
documents/docs/4-文献分析平台比较.md ADDED
@@ -0,0 +1,56 @@
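+ # 4 Other Literature Research Tools
+ ## 1 CiteSpace
+
+ > Author: 爱学习的毛里
+ > Link: https://www.zhihu.com/question/27463829/answer/284247493
+ > Source: Zhihu
+ > Copyright belongs to the author. For commercial reproduction, contact the author for authorization; for non-commercial reproduction, cite the source.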
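+
+ I. How it works
+ Put simply, CiteSpace is mainly based on the idea of "co-occurrence clustering":
+
+ 1. First, information units are extracted from the scientific literature
+     - These include references at the document level; **keywords**, subject terms, disciplines, and field classifications at the topic level; and authors, institutions, countries, and journals at the actor level
+ 2. Next, the units are restructured according to the type and strength of the links between them, forming network structures with different meanings
+     - Examples: keyword co-occurrence, author collaboration, and document co-citation
+     - Nodes in the network represent information units from the literature, and edges represent the links (co-occurrences) between nodes
+ 3. Finally, the nodes, edges, and network structure are measured, statistically analyzed (clustering, burst-term detection, etc.), and visualized to reveal hidden patterns and regularities in the knowledge structure of a particular discipline or field.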
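Steps 1 and 2 can be illustrated with a small keyword co-occurrence count. This is a sketch under assumptions: the three toy papers and the `cooccurrence` helper are made up for illustration and do not reflect CiteSpace's internal implementation.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(papers):
    """Count how often each unordered keyword pair appears in the same paper."""
    edges = Counter()
    for keywords in papers:
        # sorted(set(...)) drops duplicates and gives each pair a canonical order
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


# Step 1: information units (here, keywords) extracted from each paper.
papers = [
    ["clustering", "topic modeling", "transformers"],
    ["clustering", "topic modeling"],
    ["transformers", "keyphrase extraction"],
]
# Step 2: nodes are keywords, edge weights are co-occurrence counts.
edges = cooccurrence(papers)
```

The resulting weighted edge list is the kind of network that step 3 would then cluster and visualize.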
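+
+ ![](https://pica.zhimg.com/50/v2-b94a8061c72d6e299a059da0c1cb3813_720w.jpg?source=1940ef5c)*The idea of co-occurrence clustering*
+
+ II. Main uses
+
+ 1. **<u>Research hotspot analysis</u>**: usually based on keyword/subject-term co-occurrence
+ 2. Research front detection: co-citation, bibliographic coupling, co-word analysis, and burst-term detection are all in use, but because there is no agreed definition of "research front", opinions differ on which method to choose and how to interpret the resulting maps
+ 3. Research evolution path analysis: combines the time dimension with topic clustering, for example the timeline view and timezone view in CiteSpace
+ 4. Research community discovery: usually builds author/institution collaboration networks, author coupling networks, and the like, which can reveal small research groups, core authors/institutions, etc.
+ 5. Analysis of the intersection and flow of disciplines/fields/knowledge: usually builds co-occurrence networks of journals, disciplines, etc., which can be used to study how disciplines intersect and how knowledge flows and merges
+
+ Besides scientific literature, CiteSpace can also be used to analyze patent literature, with similar uses: identifying technology research hotspots, trends, structure, and core patent holders or groups.
+
+ III. Workflow
+ ![](https://pic1.zhimg.com/50/v2-165aa367fa07d8e46f286dfe06f0fce4_720w.jpg?source=1940ef5c)
+ *From 《引文空间分析原理与应用》 (Principles and Applications of Citation Space Analysis)*
+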
+ ### Clustering algorithms
+
+ CiteSpace provides three algorithms:
+
+ - LSI/LSA: Latent Semantic Indexing/Latent Semantic Analysis
+ [intro](https://www.cnblogs.com/pinard/p/6805861.html)
+
+ - LLR: Log-Likelihood Ratio
+
+ - MI: Mutual Information
+
+
+ How well each of the three algorithms works depends on the data, so it is worth trying them all in practice.
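To make the LLR option concrete, here is a sketch of the standard log-likelihood ratio (Dunning's G² statistic) for how strongly a term is associated with one cluster. How CiteSpace actually counts terms and applies the statistic is not described in this document, so the document-level counts and the function name below are assumptions for illustration.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table.

    k11: documents in the cluster that contain the term
    k12: documents outside the cluster that contain the term
    k21: documents in the cluster without the term
    k22: documents outside the cluster without the term
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # the 0 * log(0) term is taken as 0
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


evenly_spread = log_likelihood_ratio(5, 5, 5, 5)   # term is uninformative
concentrated = log_likelihood_ratio(8, 2, 2, 8)    # term favors the cluster
```

A term that is spread evenly across clusters scores zero, while a term concentrated in one cluster scores higher, which is why the statistic can be used to rank candidate terms.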
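+
+ [paper](https://readpaper.com/paper/2613897633)
+
+ ## 2 VOSviewer
+
+ VOSviewer's processing pipeline is similar to that of most science-mapping software: file import → extraction of information units (authors, keywords, etc.) → construction of a co-occurrence matrix → normalization of link strengths with a similarity measure → statistical analysis (usually descriptive statistics plus clustering) → visualization (layout plus mapping of other visual attributes).
+
+
+ Normalization, mapping, and clustering
+
+ [paper](https://www.vosviewer.com/download/f-x2.pdf) (See Appendix)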
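For the normalization step, the VOSviewer literature describes the association strength, which divides a co-occurrence count by the product of the two items' totals. Below is a minimal sketch, assuming the form s_ij = c_ij / (w_i * w_j) with w_i taken as item i's total number of co-occurrences. The toy counts and the helper name are illustrative, and constant scaling factors that appear in some formulations are omitted.

```python
def association_strength(cooc):
    """Normalize symmetric co-occurrence counts: s_ij = c_ij / (w_i * w_j)."""
    totals = {}
    for (a, b), count in cooc.items():
        totals[a] = totals.get(a, 0) + count
        totals[b] = totals.get(b, 0) + count
    return {
        (a, b): count / (totals[a] * totals[b])
        for (a, b), count in cooc.items()
    }


# Toy co-occurrence counts; each unordered pair is stored once.
cooc = {
    ("nlp", "clustering"): 4,
    ("nlp", "survey"): 4,
    ("clustering", "survey"): 1,
}
strength = association_strength(cooc)
```

The normalization down-weights links to items that co-occur with everything: the raw counts differ by a factor of 4, while the normalized strengths differ by a factor of 2.5.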
documents/docs/index.md ADDED
@@ -0,0 +1,43 @@
+ # Intro
+
+
+ <!-- - [x] objective/Aim of the practical part
+ - [x] tasks/ work packages,
+ - [x] Timeline and Milestones
+ - [x] Brief introduction of the practice partner
+ - [x] Description of theoretical part and explanation of how the content of the lecture(s)/seminar(s) supports student in completing the practical part. -->
+
+
+
+
+ ## IDP Theme
+ IDP Theme: Developing a Literature Research Tool that Automatically Searches Literature and Summarizes Research Trends.
+
+ ## Objective
+ In this IDP, we are going to develop a literature research tool that provides three functionalities:
+ 1. Automatically search for the most recent literature, filtered by keywords, on three literature platforms: Elsevier, IEEE, and Google Scholar
+ 2. Automatically summarize the most popular research directions and trends in the literature found in step 1
+ 3. Visualize the results from step 1 and step 2
+
+
+ ## Timeline & Milestones & Tasks
+ ![](https://i.imgur.com/mDK0sc0.png)
+
+ #### Tasks
+ | Label   | Start      | End        | Duration | Description |
+ | ------- | ---------- | ---------- | -------- | ----------- |
+ | Task #1 | 15/11/2022 | 15/12/2022 | 30 days  | Implement literature search by keywords on three literature platforms: Elsevier, IEEE, and Google Scholar |
+ | Task #2 | 15/12/2022 | 15/02/2023 | 60 days  | Implement automatic summarization of research trends in the searched literature |
+ | Task #3 | 15/02/2023 | 15/03/2023 | 30 days  | Visualization of the tool (web app) |
+ | Task #4 | 01/03/2023 | 01/05/2023 | 60 days  | Write report and presentation |
+
+
+ ## Correlation between the theoretical course and practical project
+ The accompanying theory courses, *Machine Learning and Optimization* or *Machine Learning for Communication*, teach basic and advanced machine learning (ML) and deep learning (DL).
+
+
+ The core part of the project, in my opinion, is the automatic summarization of research trends/directions of the papers, which can be modeled as a **Topic Modeling** task in Natural Language Processing (NLP). This task requires machine learning and deep learning knowledge, such as word embeddings and the transformer architecture.
+
+ Therefore, I would like to take either the *Machine Learning and Optimization* course or the *Machine Learning for Communication* course from the EI department. I think these theory courses are necessary for a solid ML/DL foundation.
+
+
documents/mkdocs.yml ADDED
@@ -0,0 +1,3 @@
+ site_name: LRT Document
+ theme: material
+