Spaces: Adapting / TrendFlow


Adapting committed on Oct 27, 2022

Commit 6314790

1 Parent(s): 9738f1c

test


Files changed (6)

documents/docs/1-搜索功能.md +0 -0
documents/docs/2-总结功能.md +19 -0
documents/docs/3-visualization.md +2 -0
documents/docs/4-文献分析平台比较.md +56 -0
documents/docs/index.md +43 -0
documents/mkdocs.yml +3 -0

documents/docs/1-搜索功能.md ADDED

File without changes

documents/docs/2-总结功能.md ADDED

	@@ -0,0 +1,19 @@

+# 2 Research Trends Summarization
+## Model Architecture
+![](https://i.imgur.com/Lv8um1V.png)
+### 1 Baseline Configuration
+1. pre-trained language model: `sentence-transformers/all-MiniLM-L6-v2`
+2. dimension reduction: `None`
+3. clustering algorithm: `kmeans`
+4. keyword extraction model: `keyphrase-transformer`
+[[example run](https://github.com/Mondkuchen/idp_LiteratureResearch_Tool/blob/main/example_run.py)] [[results](https://github.com/Mondkuchen/idp_LiteratureResearch_Tool/blob/main/examples/IDP.ipynb)]
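The baseline configuration above can be read as a three-stage pipeline: embed each paper, cluster the embeddings with k-means, then extract keywords per cluster. The sketch below is a stdlib-only illustration under stated assumptions, not the project's code (see `example_run.py` for that): the 2-D vectors are made-up stand-ins for the 384-dimensional `all-MiniLM-L6-v2` sentence embeddings, `kmeans` is a plain Lloyd's-algorithm implementation, and `top_keywords` is a hypothetical word-frequency stand-in for `keyphrase-transformer`.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus: paper titles plus made-up 2-D "embeddings".
# In the baseline these would be 384-dim all-MiniLM-L6-v2 vectors.
titles = [
    "graph neural networks for molecules",
    "graph neural networks for traffic",
    "graph attention networks",
    "diffusion models for image synthesis",
    "diffusion models for audio",
    "latent diffusion image generation",
]
embeddings = [
    (0.90, 0.10), (0.85, 0.15), (0.95, 0.05),
    (0.10, 0.90), (0.15, 0.85), (0.05, 0.95),
]

STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "with", "to"}


def kmeans(vectors, k, iters=50, seed=0):
    """Lloyd's k-means; returns one cluster label per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def top_keywords(texts, labels, n=2):
    """Stand-in keyword extractor: most frequent non-stopwords per cluster."""
    counts = {}
    for text, lab in zip(texts, labels):
        words = [w for w in text.lower().split() if w not in STOPWORDS]
        counts.setdefault(lab, Counter()).update(words)
    return {lab: [w for w, _ in cnt.most_common(n)] for lab, cnt in counts.items()}


labels = kmeans(embeddings, k=2)
keywords = top_keywords(titles, labels)
for lab, words in sorted(keywords.items()):
    print(lab, words)
```

In the real pipeline the vectors would come from a sentence-transformers model, e.g. `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(titles)`. Swapping `kmeans` for a Gaussian mixture, as the first TODO item suggests, would change only the clustering stage.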
+### TODO:
+1. clustering: use other clustering algorithms such as the Gaussian Mixture Model (GMM)
+2. keyword extraction model: train a different model
+3. add dimension reduction
+4. better PLM: `sentence-transformers/sentence-t5-xxl`

documents/docs/3-visualization.md ADDED

	@@ -0,0 +1,2 @@


+# 3 Visualization
+[web app](https://huggingface.co/spaces/Adapting/literature-research-tool)

documents/docs/4-文献分析平台比较.md ADDED

	@@ -0,0 +1,56 @@

+# 4 Other Literature Research Tools
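+## 1 CiteSpace
+> Author: 爱学习的毛里
+> Link: https://www.zhihu.com/question/27463829/answer/284247493
+> Source: Zhihu
+> Copyright belongs to the author. For commercial reproduction, please contact the author for authorization; for non-commercial reproduction, please cite the source.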
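+I. How it works
+In short, CiteSpace is mainly based on the idea of "co-occurrence clustering":
+1. First, extract information units from the scientific literature
+  - including references at the document level; **keywords**, subject terms, disciplines, and field classifications at the topic level; and authors, institutions, countries, journals, etc. at the actor level
+2. Then reconstruct these units into networks with different meanings, based on the type and strength of the relations between them
+  - e.g. keyword co-occurrence, author collaboration, document co-citation, etc.
+  - nodes in the network represent information units, and edges represent relations (co-occurrence) between nodes
+3. Finally, measure, statistically analyze (clustering, burst detection, etc.), and visualize the nodes, edges, and network structure to discover hidden patterns and regularities in the knowledge structure of a specific discipline or field.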
+![](https://pica.zhimg.com/50/v2-b94a8061c72d6e299a059da0c1cb3813_720w.jpg?source=1940ef5c)*The idea of co-occurrence clustering*
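The three steps above (extract units, build a co-occurrence network, measure it) can be illustrated with keywords as the information unit. This is a minimal sketch under assumptions: the per-paper keyword lists are invented, and weighted degree stands in as one simple node measure. CiteSpace itself adds time slicing, network pruning, clustering, and burst detection, none of which is reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(records):
    """Steps 1-2: nodes are information units (keywords here) with occurrence
    counts; edges are unordered unit pairs weighted by how many records
    contain both units."""
    nodes, edges = Counter(), Counter()
    for units in records:
        unique = sorted(set(units))  # count each unit once per record
        nodes.update(unique)
        edges.update(combinations(unique, 2))
    return nodes, edges


def weighted_degree(edges):
    """Step 3 (one simple measure): sum of edge weights touching each node."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical keyword lists, one per paper.
papers = [
    ["topic modeling", "bert", "clustering"],
    ["topic modeling", "clustering"],
    ["bert", "summarization"],
]
nodes, edges = cooccurrence_network(papers)
degree = weighted_degree(edges)
print(edges.most_common(1))
print(degree.most_common())
```

Sorting each record's units makes every pair appear in one canonical order, so `("clustering", "topic modeling")` and its reverse are counted as the same edge.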
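+II. Main uses
+1. **<u>Research hotspot analysis</u>**: usually based on keyword/subject-term co-occurrence
+2. Research frontier detection: co-citation, bibliographic coupling, co-word analysis, and burst detection have all been used, but because there is no agreed definition of a "research frontier", opinions differ on which method to choose and how to interpret the resulting maps
+3. Research evolution path analysis: combine the time dimension with topic clustering, e.g. the timeline view and time-zone view in CiteSpace
+4. Research community discovery: usually build author/institution collaboration or author coupling networks to find small research groups, core authors/institutions, etc.
+5. Interdisciplinarity and knowledge-flow analysis across disciplines/fields: usually build co-occurrence networks of journals, disciplines, etc. to study interdisciplinary overlap, knowledge flow, and integration between disciplines
+
+Besides scientific literature, CiteSpace can also analyze patent documents, with similar uses: identifying technology research hotspots, trends, and structure, as well as core patentees or groups.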
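For use 3, the time-zone view, as we understand it, places each node in the time slice where it first appears. A minimal sketch of that bucketing, assuming records are hypothetical `(year, keywords)` pairs; CiteSpace's time slicing is configurable and also shapes how the per-slice networks are built, which is not shown here.

```python
def time_zones(records):
    """Group keywords by the year in which they first appear."""
    first_seen = {}
    # Walk records in chronological order so the earliest year wins.
    for year, keywords in sorted(records, key=lambda r: r[0]):
        for kw in keywords:
            first_seen.setdefault(kw, year)
    zones = {}
    for kw, year in first_seen.items():
        zones.setdefault(year, []).append(kw)
    return {year: sorted(kws) for year, kws in sorted(zones.items())}


# Hypothetical (year, keywords) records, deliberately out of order.
records = [
    (2019, ["lda", "topic modeling"]),
    (2021, ["bertopic", "topic modeling"]),
    (2020, ["bert", "topic modeling", "lda"]),
]
zones = time_zones(records)
for year, kws in zones.items():
    print(year, kws)
```

Keywords that keep recurring (here `topic modeling`) stay in their earliest zone, which is what lets a time-zone layout show when each topic entered the field.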
+III. Workflow
+![](https://pic1.zhimg.com/50/v2-165aa367fa07d8e46f286dfe06f0fce4_720w.jpg?source=1940ef5c)
+*Excerpted from 《引文空间分析原理与应用》 (Principles and Applications of Citation Space Analysis)*
+### Clustering algorithms
+CiteSpace provides three algorithms for this step (they are used to extract labels for the clusters):
+- LSI/LSA: Latent Semantic Indexing / Latent Semantic Analysis
+  [intro](https://www.cnblogs.com/pinard/p/6805861.html)
+- LLR: Log-Likelihood Ratio
+- MI: Mutual Information
+The three algorithms may perform differently on different data, so it is worth trying each of them in practice.
+[paper](https://readpaper.com/paper/2613897633)
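LLR and MI both score how strongly a candidate label term is tied to a cluster, and both can be computed from a term-versus-cluster 2×2 contingency table. The sketch below assumes Dunning's G² statistic for LLR and pointwise mutual information for MI; it illustrates those textbook formulas, and CiteSpace's exact implementation (tokenization, which text fields are used, ranking details) may differ. LSI is omitted because it needs an SVD.

```python
import math


def _cell(observed, expected):
    # O * ln(O / E); an empty cell contributes nothing to G^2.
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def llr_and_pmi(k11, k12, k21, k22):
    """Association between a term and a cluster from a 2x2 table.

    k11: cluster docs containing the term   k12: other docs containing it
    k21: cluster docs without the term      k22: other docs without it
    Returns (Dunning's G^2 log-likelihood ratio, pointwise mutual information).
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)  # term present / absent
    cols = (k11 + k21, k12 + k22)  # in cluster / outside cluster
    observed = ((k11, k12), (k21, k22))
    g2 = 2.0 * sum(
        _cell(observed[i][j], rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )
    pmi = math.log(k11 * n / (rows[0] * cols[0])) if k11 > 0 else float("-inf")
    return g2, pmi


# Hypothetical counts: 10 docs in the cluster, 90 outside.
specific = llr_and_pmi(8, 2, 2, 88)  # term concentrated in the cluster
spread = llr_and_pmi(1, 9, 9, 81)    # term spread evenly: no association
print(specific, spread)
```

Summing (O/n)·ln(O/E) over all four cells gives the full mutual information of the table, which equals G²/(2n); pointwise MI instead looks only at the term-in-cluster cell, which is one reason the two can rank rare terms differently.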
+## 2 VOSviewer
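+The VOSviewer workflow is similar to that of most science-mapping software: file import → extraction of information units (e.g. authors, keywords) → construction of a co-occurrence matrix → normalization of the relations using a similarity measure → statistical analysis (usually descriptive statistics plus clustering) → visualization (layout plus mapping of other graphical attributes).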
+Normalization, mapping, and clustering are described in this [paper](https://www.vosviewer.com/download/f-x2.pdf) (see the Appendix).
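The normalization step can be illustrated with the association strength measure used by VOSviewer, s_ij = c_ij / (w_i · w_j), where c_ij is the number of co-occurrences of items i and j and w_i is taken here as the total number of co-occurrences of item i (our assumption; total occurrences can be used instead). A minimal sketch with made-up counts; the VOS mapping and clustering steps are not shown.

```python
def association_strength(cooc):
    """Normalize co-occurrence counts: s_ij = c_ij / (w_i * w_j),
    where w_i is the total number of co-occurrences of item i."""
    totals = {}
    for (i, j), c in cooc.items():
        totals[i] = totals.get(i, 0) + c
        totals[j] = totals.get(j, 0) + c
    return {(i, j): c / (totals[i] * totals[j]) for (i, j), c in cooc.items()}


# Hypothetical co-occurrence counts between keyword pairs.
cooc = {("ml", "nlp"): 6, ("ml", "vision"): 6, ("nlp", "lda"): 2}
sim = association_strength(cooc)
for pair, s in sorted(sim.items(), key=lambda kv: -kv[1]):
    print(pair, round(s, 4))
```

The pair `("nlp", "lda")` has the lowest raw count but the highest similarity, because the measure divides out how frequent each item is overall; this is the point of normalizing before mapping and clustering.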

documents/docs/index.md ADDED

	@@ -0,0 +1,43 @@

+# Intro
+<!-- - [x] objective/Aim of the practical part
+- [x] tasks/ work packages,
+- [x] Timeline and Milestones
+- [x] Brief introduction of the practice partner
+- [x] Description of theoretical part and explanation of how the content of the lecture(s)/seminar(s) supports student in completing the practical part. -->
+## IDP Theme
+IDP Theme: Developing a Literature Research Tool that Automatically Searches Literature and Summarizes Research Trends.
+## Objective
+In this IDP, we are going to develop a literature research tool that provides three functionalities:
+1. Automatically search for the most recent literature filtered by keywords on three literature platforms: Elsevier, IEEE, and Google Scholar
+2. Automatically summarize the most popular research directions and trends in the literature retrieved in step 1
+3. Visualize the results of steps 1 and 2
+## Timeline & Milestones & Tasks
+![](https://i.imgur.com/mDK0sc0.png)
+#### Tasks
+| Label   | Start      | End        | Duration | Description                                                                                                |
+| ------- | ---------- | ---------- | -------- | ---------------------------------------------------------------------------------------------------------- |
+| Task #1 | 15/11/2022 | 15/12/2022 | 30 days  | Implement literature search by keywords on three literature platforms: Elsevier, IEEE, and Google Scholar |
+| Task #2 | 15/12/2022 | 15/02/2023 | 60 days  | Implement automatic summarization of research trends in the searched literature                            |
+| Task #3 | 15/02/2023 | 15/03/2023 | 30 days  | Visualization of the tool (web app)                                                                        |
+| Task #4 | 01/03/2023 | 01/05/2023 | 60 days  | Write report and presentation                                                                              |
+## Correlation between the theoretical course and practical project
+The accompanying theory courses *Machine Learning and Optimization* or *Machine Learning for Communication* teach basic and advanced machine learning (ML) and deep learning (DL) knowledge.
+The core part of the project, in my opinion, is the automatic summarization of the research trends/directions of the papers, which can be modeled as a **Topic Modeling** task in Natural Language Processing (NLP). This task requires machine learning and deep learning knowledge, such as word embeddings, the Transformer architecture, etc.
+Therefore, I would like to take the *Machine Learning and Optimization* or *Machine Learning for Communication* course from the EI department, as I believe these theory courses provide a necessary ML/DL foundation.

documents/mkdocs.yml ADDED

	@@ -0,0 +1,3 @@


+site_name: LRT Document
+theme: material
+