<div align="center">
<img align="center" width="30%" alt="image" src="https://github.com/AI4Finance-Foundation/FinGPT/assets/31713746/e0371951-1ce1-488e-aa25-0992dafcc139">
</div>

# FinNLP: Internet-scale Financial Data
[Downloads](https://pepy.tech/project/finnlp)
[Python 3.6](https://www.python.org/downloads/release/python-360/)
[PyPI](https://pypi.org/project/finnlp/)

FinNLP is a playground for anyone interested in LLMs and NLP in finance. It provides full pipelines for LLM training and fine-tuning on financial data.

## Ⅰ. How to Use
### 1. News
* US
``` python
# Finnhub (Yahoo Finance, Reuters, SeekingAlpha, CNBC...)
from finnlp.data_sources.news.finnhub_date_range import Finnhub_Date_Range
start_date = "2023-01-01"
end_date = "2023-01-03"
config = {
    "use_proxy": "us_free",  # use proxies to prevent IP blocking
    "max_retry": 5,
    "proxy_pages": 5,
    "token": "YOUR_FINNHUB_TOKEN",  # available at https://finnhub.io/dashboard
}
news_downloader = Finnhub_Date_Range(config)  # initialize the downloader
news_downloader.download_date_range_stock(start_date, end_date)  # download the headlines
news_downloader.gather_content()  # download the article contents
df = news_downloader.dataframe
selected_columns = ["headline", "content"]
df[selected_columns].head(10)
# --------------------
# headline content
# 0 My 26-Stock $349k Portfolio Gets A Nice Petrob... Home\nInvesting Strategy\nPortfolio Strategy\n...
# 1 Apple’s Market Cap Slides Below $2 Trillion fo... Error
# 2 US STOCKS-Wall St starts the year with a dip; ... (For a Reuters live blog on U.S., UK and Europ...
# 3 Buy 4 January Dogs Of The Dow, Watch 4 More Home\nDividends\nDividend Quick Picks\nBuy 4 J...
# 4 Apple's stock market value falls below $2 tril... Jan 3 (Reuters) - Apple Inc's \n(AAPL.O)\n sto...
# 5 CORRECTED-UPDATE 1-Apple's stock market value ... Jan 3 (Reuters) - Apple Inc's \n(AAPL.O)\n sto...
# 6 Apple Stock Falls Amid Report Of Product Order... Apple stock got off to a slow start in 2023 as...
# 7 US STOCKS-Wall St starts the year with a dip; ... Summary\nCompanies\nTesla shares plunge on Q4 ...
# 8 More than $1 trillion wiped off value of Apple... apple store\nMore than $1 trillion has been wi...
# 9 McLean's Iridium inks agreement to put its sat... The company hasn't named its partner, but it's...
```
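In the sample output above, row 1 has `Error` in the `content` column, which suggests the article body could not be fetched. The following is a minimal sketch for separating such rows before further processing. It assumes `Error` is the failure placeholder (inferred from the sample, not from documented behavior), and `split_failed` is a hypothetical helper that is not part of FinNLP. It works on plain records, such as those returned by `df.to_dict("records")`.

```python
# Hypothetical helper, not part of FinNLP.
# Assumption: a failed download leaves the literal string "Error" in "content".
FAILED_MARKER = "Error"


def split_failed(records, column="content", marker=FAILED_MARKER):
    """Return (ok, failed) lists of records, split on the failure marker."""
    ok, failed = [], []
    for record in records:
        value = record.get(column)
        # Missing, non-string, or empty content is treated as a failure too.
        if not isinstance(value, str) or value.strip() in ("", marker):
            failed.append(record)
        else:
            ok.append(record)
    return ok, failed


# Records shaped like the rows shown above (text shortened as in the sample).
sample = [
    {"headline": "Apple Stock Falls Amid Report Of Product Order...",
     "content": "Apple stock got off to a slow start in 2023 as..."},
    {"headline": "Apple's Market Cap Slides Below $2 Trillion fo...",
     "content": "Error"},
]
ok, failed = split_failed(sample)
print(len(ok), "usable,", len(failed), "failed")
```

On the DataFrame itself, `df[df["content"] != "Error"]` applies the same comparison in one step.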
* China
``` python
# Sina Finance
from finnlp.data_sources.news.sina_finance_date_range import Sina_Finance_Date_Range
start_date = "2016-01-01"
end_date = "2016-01-02"
config = {
    "use_proxy": "china_free",  # use proxies to prevent IP blocking
    "max_retry": 5,
    "proxy_pages": 5,
}
news_downloader = Sina_Finance_Date_Range(config)  # initialize the downloader
news_downloader.download_date_range_all(start_date, end_date)  # download the headlines
news_downloader.gather_content()  # download the article contents
df = news_downloader.dataframe
selected_columns = ["title", "content"]
df[selected_columns].head(10)
# --------------------
# title content
# 0 分析师:伊朗重回国际原油市场无法阻止 新浪美股讯 北京时间1月1日晚CNBC称,加拿大皇家银行(RBC)分析师Helima Cro...
# 1 FAA:波音767的逃生扶梯存在缺陷 新浪美股讯 北京时间1日晚,美国联邦航空局(FAA)要求航空公司对波音767机型的救生扶梯进...
# 2 非制造业新订单指数创新高 需求回升力度明显 中新社北京1月1日电 (记者 刘长忠)记者1日从中国物流与采购联合会获悉,在最新发布的201...
# 3 雷曼兄弟针对大和证券提起索赔诉讼 新浪美股讯 北京时间1日下午共同社称,2008年破产的美国金融巨头雷曼兄弟公司的清算法人日前...
# 4 国内钢铁PMI有所回升 钢市低迷形势有所改善 新华社上海1月1日专电(记者李荣)据中物联钢铁物流专业委员会1日发布的指数报告,2015年1...
# 5 马息岭凸显朝鲜旅游体育战略 新浪美股北京时间1日讯 三位单板滑雪手将成为最早拜访马息岭滑雪场的西方专业运动员,他们本月就...
# 6 五洲船舶破产清算 近十年来首现国有船厂倒闭 (原标题:中国首家国有船厂破产倒闭)\n低迷的中国造船市场,多年来首次出现国有船厂破产清算的...
# 7 过半城市房价环比上涨 百城住宅均价加速升温 资料图。中新社记者 武俊杰 摄\n中新社北京1月1日电 (记者 庞无忌)中国房地产市场在20...
# 8 经济学人:巴西病根到底在哪里 新浪美股北京时间1日讯 原本,巴西人是该高高兴兴迎接2016年的。8月间,里约热内卢将举办南...
# 9 中国首家国有船厂破产倒闭:五洲船舶目前已停工 低迷的中国造船市场,多年来首次出现国有船厂破产清算的一幕。浙江海运集团旗下的五洲船舶修造公司...
# Eastmoney 东方财富
from finnlp.data_sources.news.eastmoney_streaming import Eastmoney_Streaming
pages = 3
stock = "600519"
config = {
    "use_proxy": "china_free",
    "max_retry": 5,
    "proxy_pages": 5,
}
news_downloader = Eastmoney_Streaming(config)
news_downloader.download_streaming_stock(stock,pages)
df = news_downloader.dataframe
selected_columns = ["title", "create time"]
df[selected_columns].head(10)
# --------------------
# title create time
# 0 茅台2022年报的12个小秘密 04-09 19:40
# 1 东北证券维持贵州茅台买入评级 预计2023年净利润同比 04-09 11:24
# 2 贵州茅台:融资余额169.34亿元,创近一年新低(04-07 04-08 07:30
# 3 贵州茅台:融资净买入1248.48万元,融资余额169.79亿 04-07 07:28
# 4 贵州茅台公益基金会正式成立 04-06 12:29
# 5 贵州茅台04月04日获沪股通增持19.55万股 04-05 07:48
# 6 贵州茅台:融资余额169.66亿元,创近一年新低(04-04 04-05 07:30
# 7 4月4日北向资金最新动向(附十大成交股) 04-04 18:48
# 8 大宗交易:贵州茅台成交235.9万元,成交价1814.59元( 04-04 17:21
# 9 第一上海证券维持贵州茅台买入评级 目标价2428.8元 04-04 09:30
```
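The Eastmoney `create time` values in the sample output (for example `04-09 19:40`) carry no year. The following is a minimal sketch for turning them into full timestamps, assuming the caller knows which year the posts belong to. `parse_create_time` is a hypothetical helper, and the year `2023` used below is an assumption, not something stated in the output.

```python
from datetime import datetime


def parse_create_time(text, year):
    """Parse an 'MM-DD HH:MM' string, supplying the year explicitly."""
    # The year is prefixed before parsing so that a leap day (02-29)
    # is validated against the intended year.
    return datetime.strptime(f"{year}-{text.strip()}", "%Y-%m-%d %H:%M")


# Value copied from the sample output above; the year is an assumption.
print(parse_create_time("04-09 19:40", 2023))  # 2023-04-09 19:40:00
```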
### 2. Social Media
* US
``` python
# Stocktwits
from finnlp.data_sources.social_media.stocktwits_streaming import Stocktwits_Streaming
pages = 3
stock = "AAPL"
config = {
    "use_proxy": "us_free",
    "max_retry": 5,
    "proxy_pages": 2,
}
downloader = Stocktwits_Streaming(config)
downloader.download_date_range_stock(stock, pages)
selected_columns = ["created_at", "body"]
downloader.dataframe[selected_columns].head(10)
# --------------------
# created_at body
# 0 2023-04-07T15:24:22Z NANCY PELOSI JUST BOUGHT 10,000 SHARES OF APPL...
# 1 2023-04-07T15:17:43Z $AAPL $SPY \n \nhttps://amp.scmp.com/news/chi...
# 2 2023-04-07T15:17:25Z $AAPL $GOOG $AMZN I took a Trump today. \n\nH...
# 3 2023-04-07T15:16:54Z $SPY $AAPL will take this baby down, time for ...
# 4 2023-04-07T15:11:37Z $SPY $3T it ALREADY DID - look at the pre-COV...
# 5 2023-04-07T15:10:29Z $AAPL $QQQ $STUDY We are on to the next one! A...
# 6 2023-04-07T15:06:00Z $AAPL was analyzed by 48 analysts. The buy con...
# 7 2023-04-07T14:54:29Z $AAPL both retiring. \n \nCraig....
# 8 2023-04-07T14:40:06Z $SPY $QQQ $TSLA $AAPL SPY 500 HAS STARTED🚀😍 BI...
# 9 2023-04-07T14:38:57Z Nancy 🩵 (Tim) $AAPL
```
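Many of the Stocktwits messages above mention several tickers at once (`$AAPL $GOOG $AMZN`). The following is a minimal sketch for counting the tickers mentioned across messages. It assumes a cashtag is `$` followed by one to five ASCII letters, which is a simplification and not a Stocktwits rule. `extract_cashtags` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumption: a cashtag is "$" followed by 1-5 ASCII letters.
CASHTAG = re.compile(r"\$([A-Za-z]{1,5})\b")


def extract_cashtags(body):
    """Return the upper-cased tickers mentioned in one message body."""
    return [tag.upper() for tag in CASHTAG.findall(body)]


# Message bodies copied from the sample output above (truncated as shown).
bodies = [
    "$AAPL $GOOG $AMZN I took a Trump today.",
    "$SPY $AAPL will take this baby down, time for ...",
    "$SPY $3T it ALREADY DID - look at the pre-COV...",
]
counts = Counter(tag for body in bodies for tag in extract_cashtags(body))
print(counts)
```

Tokens such as `$3T` are skipped because they do not start with a letter.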
``` python
# Reddit Wallstreetbets
from finnlp.data_sources.social_media.reddit_streaming import Reddit_Streaming
pages = 3
config = {
    "use_proxy": "us_free",
    "max_retry": 5,
    "proxy_pages": 2,
}
downloader = Reddit_Streaming(config)
downloader.download_streaming_all(pages)
selected_columns = ["created", "title"]
downloader.dataframe[selected_columns].head(10)
# --------------------
# created title
# 0 2023-04-07 15:39:34 Y’all making me feel like spooderman
# 1 2022-12-21 04:09:42 Do you track your investments in a spreadsheet...
# 2 2022-12-21 04:09:42 Do you track your investments in a spreadsheet...
# 3 2023-04-07 15:29:23 Can a Blackberry holder get some help 🥺
# 4 2023-04-07 14:49:55 The week of CPI and FOMC Minutes… 4-6-23 SPY/ ...
# 5 2023-04-07 14:19:22 Well let’s hope your job likes you, thanks Jerome
# 6 2023-04-07 14:06:32 Does anyone else feel an overwhelming sense of...
# 7 2023-04-07 13:47:59 Watermarked Jesus explains the market being cl...
# 8 2023-04-07 13:26:23 Jobs report shows 236,000 gain in March. Hot l...
# 9 2023-04-07 13:07:15 The recession is over! Let's buy more stocks!
```
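In the Reddit sample above, rows 1 and 2 are identical and are older than the rows around them. The following is a minimal sketch for dropping exact duplicates and ordering threads newest first. It assumes the `created` values follow the `YYYY-MM-DD HH:MM:SS` layout shown in the output, and `dedupe_and_sort` is a hypothetical helper.

```python
from datetime import datetime


def dedupe_and_sort(rows):
    """Drop exact duplicate (created, title) rows, then sort newest first."""
    # dict.fromkeys keeps the first occurrence and preserves input order.
    unique = list(dict.fromkeys(rows))
    return sorted(
        unique,
        key=lambda row: datetime.strptime(row[0], "%Y-%m-%d %H:%M:%S"),
        reverse=True,
    )


# Rows copied from the sample output above.
rows = [
    ("2023-04-07 15:39:34", "Y’all making me feel like spooderman"),
    ("2022-12-21 04:09:42", "Do you track your investments in a spreadsheet..."),
    ("2022-12-21 04:09:42", "Do you track your investments in a spreadsheet..."),
    ("2023-04-07 15:29:23", "Can a Blackberry holder get some help 🥺"),
]
for created, title in dedupe_and_sort(rows):
    print(created, title)
```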
* China (Weibo)
``` python
# Weibo
from finnlp.data_sources.social_media.weibo_date_range import Weibo_Date_Range
start_date = "2016-01-01"
end_date = "2016-01-02"
stock = "茅台"
config = {
    "use_proxy": "china_free",
    "max_retry": 5,
    "proxy_pages": 5,
    "cookies": "Your_Login_Cookies",
}
downloader = Weibo_Date_Range(config)
downloader.download_date_range_stock(start_date, end_date, stock = stock)
df = downloader.dataframe
df = df.drop_duplicates()
selected_columns = ["date", "content"]
df[selected_columns].head(10)
# --------------------
# date content
# 0 2016-01-01 #舆论之锤#唯品会发声明证实销售假茅台-手机腾讯网O网页链接分享来自浏览器!
# 2 2016-01-01 2016元旦节快乐酒粮网官方新品首发,茅台镇老酒,酱香原浆酒:酒粮网茅台镇白酒酱香老酒纯粮原...
# 6 2016-01-01 2016元旦节快乐酒粮网官方新品首发,茅台镇老酒,酱香原浆酒:酒粮网茅台镇白酒酱香老酒纯粮原...
# 17 2016-01-01 开心,今天喝了两斤酒(茅台+扎二)三个人,开心!
# 18 2016-01-01 一家专卖假货的网站某宝,你该学学了!//【唯品会售假茅台:供货商被刑拘顾客获十倍补偿】O唯品...
# 19 2016-01-01 一家专卖假货的网站//【唯品会售假茅台:供货商被刑拘顾客获十倍补偿】O唯品会售假茅台:供货商...
# 20 2016-01-01 前几天说了几点不看好茅台的理由,今年过节喝点茅台支持下,个人口感,茅台比小五好喝,茅台依然是...
# 21 2016-01-01 老杜酱酒已到货,从明天起正式在甘肃武威开卖。可以不相信我说的话,但一定不要怀疑@杜子建的为人...
# 22 2016-01-01 【唯品会售假茅台后续:供货商被刑拘顾客获十倍补偿】此前,有网友投诉其在唯品会购买的茅台酒质量...
# 23 2016-01-01 唯品会卖假茅台,供货商被刑拘,买家获十倍补偿8888元|此前,有网友在网络论坛发贴(唯品会宣...
```
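The Weibo example needs login cookies in its config, and the Finnhub example needs a token. The following is a minimal sketch for reading such secrets from an environment variable instead of writing them into the script. `config_with_secret` is a hypothetical helper and `WEIBO_COOKIES` is a hypothetical variable name; neither comes from FinNLP.

```python
import os


def config_with_secret(base, key, env_var):
    """Return a copy of `base` with `key` filled from an environment variable."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    config = dict(base)  # copy, so the shared base config is not mutated
    config[key] = value
    return config


base = {"use_proxy": "china_free", "max_retry": 5, "proxy_pages": 5}

# WEIBO_COOKIES is a hypothetical name. It is set here only so that the
# sketch runs on its own; in practice, export it in the shell instead.
os.environ.setdefault("WEIBO_COOKIES", "Your_Login_Cookies")
config = config_with_secret(base, "cookies", "WEIBO_COOKIES")
print(sorted(config))
```

The resulting `config` has the same keys as the dictionary in the Weibo example and can be passed to the downloader in the same way.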
### 3. Company Announcement
* US
``` python
# SEC
from finnlp.data_sources.company_announcement.sec import SEC_Announcement
start_date = "2020-01-01"
end_date = "2020-06-01"
stock = "AAPL"
config = {
    "use_proxy": "us_free",
    "max_retry": 5,
    "proxy_pages": 3,
}
downloader = SEC_Announcement(config)
downloader.download_date_range_stock(start_date, end_date, stock = stock)
selected_columns = ["file_date", "display_names", "content"]
downloader.dataframe[selected_columns].head(10)
# --------------------
# file_date display_names content
# 0 2020-05-12 [KONDO CHRIS (CIK 0001631982), Apple Inc. (A... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 1 2020-04-30 [JUNG ANDREA (CIK 0001051401), Apple Inc. (A... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 2 2020-04-17 [O'BRIEN DEIRDRE (CIK 0001767094), Apple Inc.... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 3 2020-04-17 [KONDO CHRIS (CIK 0001631982), Apple Inc. (A... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 4 2020-04-09 [Maestri Luca (CIK 0001513362), Apple Inc. (... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 5 2020-04-03 [WILLIAMS JEFFREY E (CIK 0001496686), Apple I... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 6 2020-04-03 [Maestri Luca (CIK 0001513362), Apple Inc. (... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 7 2020-02-28 [WAGNER SUSAN (CIK 0001059235), Apple Inc. (... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 8 2020-02-28 [LEVINSON ARTHUR D (CIK 0001214128), Apple In... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 9 2020-02-28 [JUNG ANDREA (CIK 0001051401), Apple Inc. (A... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
```
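Each `display_names` entry in the SEC sample pairs a filer name with its CIK, for example `KONDO CHRIS (CIK 0001631982)`. The following is a minimal sketch for splitting an entry into its two parts. It assumes every entry ends with `(CIK <digits>)`, as in the sample, and `parse_display_name` is a hypothetical helper.

```python
import re

# Assumption: an entry looks like "NAME (CIK 0001234567)", as in the sample.
ENTRY = re.compile(r"^\s*(?P<name>.+?)\s*\(CIK\s+(?P<cik>\d+)\)\s*$")


def parse_display_name(entry):
    """Return (name, cik) for one display_names entry, or None if no CIK."""
    match = ENTRY.match(entry)
    if match is None:
        return None
    return match.group("name"), match.group("cik")


# Entry copied from the sample output above.
print(parse_display_name("KONDO CHRIS (CIK 0001631982)"))
# ('KONDO CHRIS', '0001631982')
```

The CIK is kept as a string so that its leading zeros survive.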
* China
``` python
# Juchao
from finnlp.data_sources.company_announcement.juchao import Juchao_Announcement
start_date = "2020-01-01"
end_date = "2020-06-01"
stock = "000001"
config = {
    "use_proxy": "china_free",
    "max_retry": 5,
    "proxy_pages": 3,
}
downloader = Juchao_Announcement(config)
downloader.download_date_range_stock(start_date, end_date, stock = stock, get_content = True, delate_pdf = True)
selected_columns = ["announcementTime", "shortTitle","Content"]
downloader.dataframe[selected_columns].head(10)
# --------------------
# announcementTime shortTitle Content
# 0 2020-05-27 关于2020年第一期小型微型企业贷款专项金融债券发行完毕的公告 证券代码: 000001 证券简称:平安银行 ...
# 1 2020-05-22 2019年年度权益分派实施公告 1 证券代码: 000001 证券简称:平安银行 ...
# 2 2020-05-20 关于获准发行小微企业贷款专项金融债券的公告 证券代码: 000001 证券简称:平安银行 ...
# 3 2020-05-16 监事会决议公告 1 证券代码: 000001 证券简称: 平安银行 ...
# 4 2020-05-15 2019年年度股东大会决议公告 1 证券代码: 000001 证券简称:平安银行 ...
# 5 2020-05-15 2019年年度股东大会的法律意见书 北京总部 电话 : (86 -10) 8519 -1300 传真 : (86 -10...
# 6 2020-04-30 中信证券股份有限公司、平安证券股份有限公司关于公司关联交易有关事项的核查意见 1 中信证券股份有限公司 、平安证券股份有限 公司 关于平安银行股份有限公司 关联交易 有...
# 7 2020-04-30 独立董事独立意见 1 平安银行股份有限公司独立董事独立意见 根据《关于在上市公司建立独立董事制度的指导...
# 8 2020-04-30 关联交易公告 1 证券代码: 000001 证券简称:平安银行 ...
# 9 2020-04-21 2020年第一季度报告全文 证券代码: 000001 证券简称:平安银行 ...
```
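Most of the Juchao `Content` values above start with a header giving the stock code (证券代码) and the short name (证券简称), using either an ASCII or a full-width colon. The following is a minimal sketch for pulling both fields out of the extracted text. It assumes six-digit stock codes, as in the sample, and `parse_header` is a hypothetical helper.

```python
import re

# Both ASCII ":" and full-width ":" occur in the sample output.
CODE = re.compile(r"证券代码\s*[::]\s*(\d{6})")   # stock code
NAME = re.compile(r"证券简称\s*[::]\s*(\S+)")     # stock short name


def parse_header(content):
    """Return (stock_code, short_name); either may be None if absent."""
    code = CODE.search(content)
    name = NAME.search(content)
    return (code.group(1) if code else None,
            name.group(1) if name else None)


# Header text copied from the sample output above.
print(parse_header("证券代码: 000001 证券简称:平安银行"))
# ('000001', '平安银行')
```

Documents without such a header, like the legal opinion in row 5, yield `(None, None)`.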
## Ⅱ. Data Sources
### 1. News
| Platform | Data Type | Related Market | Specified Company | Range Type | Limits | Support |
| :----------------------------------------------------------: | :--------: | :------------: | :----------------------------------------------------------: | :---------------: | :-------------------: | ------------------------------------------------------------ |
| Yahoo | Financial News | US Stocks | √ | Date Range | N/A | √ |
| Reuters | General News | US Stocks | × | Date Range | N/A | Soon |
| Seeking Alpha | Financial News | US Stocks | √ | Streaming | N/A | √ |
| Sina | Financial News | CN Stocks | × | Date Range | N/A | √ |
| Eastmoney | Financial News | CN Stocks | √ | Date Range | N/A | √ |
| Yicai | Financial News | CN Stocks | √ | Date Range | N/A | Soon |
| CCTV | General News | CN Stocks | × | Date Range | N/A | √ |
| US Mainstream Media | Financial News | US Stocks | √ | Date Range | Account (Free) | √ |
| CN Mainstream Media | Financial News | CN Stocks | × | Date Range | Account (¥500/year) | √ |
### 2. Social Media
| Platform | Data Type | Related Market | Specified Company | Range Type | Source Type | Limits | Support |
| :---------------------: | :-------: | :------------: | :---------------: | :--------: | :---------: | :-----: | :-----: |
| Twitter | Tweets | US Stocks | √ | Date Range | Official | N/A | √ |
| Twitter | Sentiment | US Stocks | √ | Date Range | Third Party | N/A | √ |
| StockTwits | Tweets | US Stocks | √ | Latest | Official | N/A | √ |
| Reddit (wallstreetbets) | Threads | US Stocks | × | Latest | Official | N/A | √ |
| Reddit | Sentiment | US Stocks | √ | Date Range | Third Party | N/A | √ |
| Weibo | Tweets | CN Stocks | √ | Date Range | Official | Cookies | √ |
| Weibo | Tweets | CN Stocks | √ | Latest | Official | N/A | √ |
### 3. Company Announcement
| Platform | Data Type | Related Market | Specified Company | Range Type | Source Type | Limits | Support |
| :-----------------------: | :-------: | :------------: | :---------------: | :--------: | :---------: | :----: | :-----: |
| Juchao (Official Website) | Text | CN Stocks | √ | Date Range | Official | N/A | √ |
| SEC (Official Website) | Text | US Stocks | √ | Date Range | Official | N/A | √ |
| Sina | Text | CN Stocks | √ | Latest | Third Party | N/A | √ |
### 4. Data Sets
| Data Source | Type | Stocks | Dates | Available |
| :--------------: | :----: | :----: | :-------: | :--------------: |
| [AShare](https://github.com/JinanZou/Astock) | News | 3680 | 2018-07-01 to 2021-11-30 | √ |
| [stocknet-dataset](https://github.com/yumoxu/stocknet-dataset) | Tweets | 87 | 2014-01-02 to 2015-12-30 | √ |
| [CHRNN](https://github.com/wuhuizhe/CHRNN) | Tweets | 38 | 2017-01-03 to 2017-12-28 | √ |
## Ⅲ. Large Language Models (LLMs)
* [ChatGPT (GPT 3.5)](https://openai.com/blog/chatgpt)
* [GPT 4.0](https://openai.com/research/gpt-4)
* [ChatGLM](https://github.com/THUDM/ChatGLM-6B)
* [PaLM](https://developers.googleblog.com/2023/03/announcing-palm-api-and-makersuite.html)
* [LLaMA](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
* [FinBERT](https://github.com/yya518/FinBERT)
* [Hugging Face](https://huggingface.co/)
## LICENSE
MIT License
**Disclaimer: We are sharing this code for academic purposes under the MIT education license. Nothing herein is financial advice or a recommendation to trade real money. Please use common sense and always consult a professional before trading or investing.**