<div align="center">
<img align="center" width="30%" alt="image" src="https://github.com/AI4Finance-Foundation/FinGPT/assets/31713746/e0371951-1ce1-488e-aa25-0992dafcc139">
</div>

# FinNLP: Internet-scale Financial Data

FinNLP is a playground for anyone interested in LLMs and NLP in finance. It provides full pipelines for LLM training and fine-tuning on financial data.

## Ⅰ. How to Use
### 1. News
* US
``` python
# Finnhub (Yahoo Finance, Reuters, SeekingAlpha, CNBC...)
from finnlp.data_sources.news.finnhub_date_range import Finnhub_Date_Range

start_date = "2023-01-01"
end_date = "2023-01-03"
config = {
    "use_proxy": "us_free",        # use proxies to prevent IP blocking
    "max_retry": 5,
    "proxy_pages": 5,
    "token": "YOUR_FINNHUB_TOKEN"  # Available at https://finnhub.io/dashboard
}

news_downloader = Finnhub_Date_Range(config)                     # init
news_downloader.download_date_range_stock(start_date, end_date)  # Download headers
news_downloader.gather_content()                                 # Download contents
df = news_downloader.dataframe
selected_columns = ["headline", "content"]
df[selected_columns].head(10)

# --------------------
# headline content
# 0 My 26-Stock $349k Portfolio Gets A Nice Petrob... Home\nInvesting Strategy\nPortfolio Strategy\n...
# 1 Apple’s Market Cap Slides Below $2 Trillion fo... Error
# 2 US STOCKS-Wall St starts the year with a dip; ... (For a Reuters live blog on U.S., UK and Europ...
# 3 Buy 4 January Dogs Of The Dow, Watch 4 More Home\nDividends\nDividend Quick Picks\nBuy 4 J...
# 4 Apple's stock market value falls below $2 tril... Jan 3 (Reuters) - Apple Inc's \n(AAPL.O)\n sto...
# 5 CORRECTED-UPDATE 1-Apple's stock market value ... Jan 3 (Reuters) - Apple Inc's \n(AAPL.O)\n sto...
# 6 Apple Stock Falls Amid Report Of Product Order... Apple stock got off to a slow start in 2023 as...
# 7 US STOCKS-Wall St starts the year with a dip; ... Summary\nCompanies\nTesla shares plunge on Q4 ...
# 8 More than $1 trillion wiped off value of Apple... apple store\nMore than $1 trillion has been wi...
# 9 McLean's Iridium inks agreement to put its sat... The company hasn't named its partner, but it's...
```
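
In the sample output above, row 1 has `Error` as its content, which suggests that some article bodies could not be fetched. Below is a minimal sketch for dropping such rows before further processing. It assumes that the literal string `Error` is the only failure marker (an assumption drawn from the sample output, not from the library's documentation), and the small DataFrame is a made-up stand-in for `news_downloader.dataframe`:

``` python
import pandas as pd

# Hypothetical stand-in for news_downloader.dataframe (placeholder values).
df = pd.DataFrame(
    {
        "headline": ["Headline A", "Headline B", "Headline C"],
        "content": ["Full article text", "Error", "Another article"],
    }
)

# Assumption: a failed content download is stored as the literal string "Error".
failed = df["content"].str.strip().eq("Error")
clean_df = df.loc[~failed].reset_index(drop=True)

print(f"dropped {failed.sum()} of {len(df)} rows")
```

Keeping the boolean mask in its own variable makes it easy to inspect or retry the failed rows instead of silently discarding them.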
* China
``` python
# Sina Finance
from finnlp.data_sources.news.sina_finance_date_range import Sina_Finance_Date_Range

start_date = "2016-01-01"
end_date = "2016-01-02"
config = {
    "use_proxy": "china_free",  # use proxies to prevent IP blocking
    "max_retry": 5,
    "proxy_pages": 5,
}

news_downloader = Sina_Finance_Date_Range(config)              # init
news_downloader.download_date_range_all(start_date, end_date)  # Download headers
news_downloader.gather_content()                               # Download contents
df = news_downloader.dataframe
selected_columns = ["title", "content"]
df[selected_columns].head(10)

# --------------------
# title content
# 0 分析师:伊朗重回国际原油市场无法阻止 新浪美股讯 北京时间1月1日晚CNBC称,加拿大皇家银行(RBC)分析师Helima Cro...
# 1 FAA:波音767的逃生扶梯存在缺陷 新浪美股讯 北京时间1日晚,美国联邦航空局(FAA)要求航空公司对波音767机型的救生扶梯进...
# 2 非制造业新订单指数创新高 需求回升力度明显 中新社北京1月1日电 (记者 刘长忠)记者1日从中国物流与采购联合会获悉,在最新发布的201...
# 3 雷曼兄弟针对大和证券提起索赔诉讼 新浪美股讯 北京时间1日下午共同社称,2008年破产的美国金融巨头雷曼兄弟公司的清算法人日前...
# 4 国内钢铁PMI有所回升 钢市低迷形势有所改善 新华社上海1月1日专电(记者李荣)据中物联钢铁物流专业委员会1日发布的指数报告,2015年1...
# 5 马息岭凸显朝鲜旅游体育战略 新浪美股北京时间1日讯 三位单板滑雪手将成为最早拜访马息岭滑雪场的西方专业运动员,他们本月就...
# 6 五洲船舶破产清算 近十年来首现国有船厂倒闭 (原标题:中国首家国有船厂破产倒闭)\n低迷的中国造船市场,多年来首次出现国有船厂破产清算的...
# 7 过半城市房价环比上涨 百城住宅均价加速升温 资料图。中新社记者 武俊杰 摄\n中新社北京1月1日电 (记者 庞无忌)中国房地产市场在20...
# 8 经济学人:巴西病根到底在哪里 新浪美股北京时间1日讯 原本,巴西人是该高高兴兴迎接2016年的。8月间,里约热内卢将举办南...
# 9 中国首家国有船厂破产倒闭:五洲船舶目前已停工 低迷的中国造船市场,多年来首次出现国有船厂破产清算的一幕。浙江海运集团旗下的五洲船舶修造公司...

# Eastmoney 东方财富
from finnlp.data_sources.news.eastmoney_streaming import Eastmoney_Streaming

pages = 3
stock = "600519"
config = {
    "use_proxy": "china_free",
    "max_retry": 5,
    "proxy_pages": 5,
}

news_downloader = Eastmoney_Streaming(config)
news_downloader.download_streaming_stock(stock, pages)
df = news_downloader.dataframe
selected_columns = ["title", "create time"]
df[selected_columns].head(10)

# --------------------
# title create time
# 0 茅台2022年报的12个小秘密 04-09 19:40
# 1 东北证券维持贵州茅台买入评级 预计2023年净利润同比 04-09 11:24
# 2 贵州茅台:融资余额169.34亿元,创近一年新低(04-07 04-08 07:30
# 3 贵州茅台:融资净买入1248.48万元,融资余额169.79亿 04-07 07:28
# 4 贵州茅台公益基金会正式成立 04-06 12:29
# 5 贵州茅台04月04日获沪股通增持19.55万股 04-05 07:48
# 6 贵州茅台:融资余额169.66亿元,创近一年新低(04-04 04-05 07:30
# 7 4月4日北向资金最新动向(附十大成交股) 04-04 18:48
# 8 大宗交易:贵州茅台成交235.9万元,成交价1814.59元( 04-04 17:21
# 9 第一上海证券维持贵州茅台买入评级 目标价2428.8元 04-04 09:30
```
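
The Eastmoney `create time` values shown above carry only month, day, and time, so they cannot be sorted or merged with other sources until a year is attached. A minimal sketch of one way to do this, assuming the caller knows the year (2023 below is an assumption, not something the sample output states); `parse_create_time` is a hypothetical helper, not part of FinNLP:

``` python
from datetime import datetime


def parse_create_time(text: str, year: int) -> datetime:
    """Parse an 'MM-DD HH:MM' string, attaching the caller-supplied year."""
    # Prepending the year before parsing avoids ambiguity for dates like 02-29.
    return datetime.strptime(f"{year}-{text}", "%Y-%m-%d %H:%M")


# Strings copied from the sample output above; the year is an assumption.
samples = ["04-09 19:40", "04-08 07:30"]
parsed = [parse_create_time(s, year=2023) for s in samples]
print(parsed[0].isoformat())
```

With a DataFrame, the same helper could be applied column-wise, for example via `df["create time"].apply(parse_create_time, year=2023)`.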
### 2. Social Media
* US
``` python
# Stocktwits
from finnlp.data_sources.social_media.stocktwits_streaming import Stocktwits_Streaming

pages = 3
stock = "AAPL"
config = {
    "use_proxy": "us_free",
    "max_retry": 5,
    "proxy_pages": 2,
}

downloader = Stocktwits_Streaming(config)
downloader.download_date_range_stock(stock, pages)
selected_columns = ["created_at", "body"]
downloader.dataframe[selected_columns].head(10)

# --------------------
# created_at body
# 0 2023-04-07T15:24:22Z NANCY PELOSI JUST BOUGHT 10,000 SHARES OF APPL...
# 1 2023-04-07T15:17:43Z $AAPL $SPY \n \nhttps://amp.scmp.com/news/chi...
# 2 2023-04-07T15:17:25Z $AAPL $GOOG $AMZN I took a Trump today. \n\nH...
# 3 2023-04-07T15:16:54Z $SPY $AAPL will take this baby down, time for ...
# 4 2023-04-07T15:11:37Z $SPY $3T it ALREADY DID - look at the pre-COV...
# 5 2023-04-07T15:10:29Z $AAPL $QQQ $STUDY We are on to the next one! A...
# 6 2023-04-07T15:06:00Z $AAPL was analyzed by 48 analysts. The buy con...
# 7 2023-04-07T14:54:29Z $AAPL both retiring. \n \nCraig....
# 8 2023-04-07T14:40:06Z $SPY $QQQ $TSLA $AAPL SPY 500 HAS STARTED🚀😍 BI...
# 9 2023-04-07T14:38:57Z Nancy 🩵 (Tim) $AAPL
```
``` python
# Reddit Wallstreetbets
from finnlp.data_sources.social_media.reddit_streaming import Reddit_Streaming

pages = 3
config = {
    "use_proxy": "us_free",
    "max_retry": 5,
    "proxy_pages": 2,
}

downloader = Reddit_Streaming(config)
downloader.download_streaming_all(pages)
selected_columns = ["created", "title"]
downloader.dataframe[selected_columns].head(10)

# --------------------
# created title
# 0 2023-04-07 15:39:34 Y’all making me feel like spooderman
# 1 2022-12-21 04:09:42 Do you track your investments in a spreadsheet...
# 2 2022-12-21 04:09:42 Do you track your investments in a spreadsheet...
# 3 2023-04-07 15:29:23 Can a Blackberry holder get some help 🥺
# 4 2023-04-07 14:49:55 The week of CPI and FOMC Minutes… 4-6-23 SPY/ ...
# 5 2023-04-07 14:19:22 Well let’s hope your job likes you, thanks Jerome
# 6 2023-04-07 14:06:32 Does anyone else feel an overwhelming sense of...
# 7 2023-04-07 13:47:59 Watermarked Jesus explains the market being cl...
# 8 2023-04-07 13:26:23 Jobs report shows 236,000 gain in March. Hot l...
# 9 2023-04-07 13:07:15 The recession is over! Let's buy more stocks!
```
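
Rows 1 and 2 of the Reddit sample above are identical, so streamed pages can evidently repeat a thread. A minimal sketch of deduplicating and ordering the result, assuming that an identical `created` timestamp plus `title` identifies the same thread (an assumption; a thread ID column would be a better key if the DataFrame exposes one). The DataFrame below is a made-up stand-in for `downloader.dataframe`:

``` python
import pandas as pd

# Hypothetical stand-in for downloader.dataframe, mimicking the repeated row above.
df = pd.DataFrame(
    {
        "created": [
            "2023-04-07 15:39:34",
            "2022-12-21 04:09:42",
            "2022-12-21 04:09:42",
        ],
        "title": ["Post A", "Post B", "Post B"],
    }
)

# Assumption: same timestamp and same title means the same thread.
deduped = df.drop_duplicates(subset=["created", "title"]).reset_index(drop=True)

# Parse the timestamps so that sorting is chronological rather than lexical.
deduped["created"] = pd.to_datetime(deduped["created"])
deduped = deduped.sort_values("created", ascending=False).reset_index(drop=True)

print(deduped)
```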
* China (Weibo)
``` python
from finnlp.data_sources.social_media.weibo_date_range import Weibo_Date_Range

start_date = "2016-01-01"
end_date = "2016-01-02"
stock = "茅台"
config = {
    "use_proxy": "china_free",
    "max_retry": 5,
    "proxy_pages": 5,
    "cookies": "Your_Login_Cookies",
}

downloader = Weibo_Date_Range(config)
downloader.download_date_range_stock(start_date, end_date, stock=stock)
df = downloader.dataframe
df = df.drop_duplicates()  # index gaps in the output below come from dropped rows
selected_columns = ["date", "content"]
df[selected_columns].head(10)

# --------------------
# date content
# 0 2016-01-01 #舆论之锤#唯品会发声明证实销售假茅台-手机腾讯网O网页链接分享来自浏览器!
# 2 2016-01-01 2016元旦节快乐酒粮网官方新品首发,茅台镇老酒,酱香原浆酒:酒粮网茅台镇白酒酱香老酒纯粮原...
# 6 2016-01-01 2016元旦节快乐酒粮网官方新品首发,茅台镇老酒,酱香原浆酒:酒粮网茅台镇白酒酱香老酒纯粮原...
# 17 2016-01-01 开心,今天喝了两斤酒(茅台+扎二)三个人,开心!
# 18 2016-01-01 一家专卖假货的网站某宝,你该学学了!//【唯品会售假茅台:供货商被刑拘顾客获十倍补偿】O唯品...
# 19 2016-01-01 一家专卖假货的网站//【唯品会售假茅台:供货商被刑拘顾客获十倍补偿】O唯品会售假茅台:供货商...
# 20 2016-01-01 前几天说了几点不看好茅台的理由,今年过节喝点茅台支持下,个人口感,茅台比小五好喝,茅台依然是...
# 21 2016-01-01 老杜酱酒已到货,从明天起正式在甘肃武威开卖。可以不相信我说的话,但一定不要怀疑@杜子建的为人...
# 22 2016-01-01 【唯品会售假茅台后续:供货商被刑拘顾客获十倍补偿】此前,有网友投诉其在唯品会购买的茅台酒质量...
# 23 2016-01-01 唯品会卖假茅台,供货商被刑拘,买家获十倍补偿8888元|此前,有网友在网络论坛发贴(唯品会宣...
```
### 3. Company Announcement
* US
``` python
# SEC
from finnlp.data_sources.company_announcement.sec import SEC_Announcement

start_date = "2020-01-01"
end_date = "2020-06-01"
stock = "AAPL"
config = {
    "use_proxy": "us_free",
    "max_retry": 5,
    "proxy_pages": 3,
}

downloader = SEC_Announcement(config)
downloader.download_date_range_stock(start_date, end_date, stock=stock)
selected_columns = ["file_date", "display_names", "content"]
downloader.dataframe[selected_columns].head(10)

# --------------------
# file_date display_names content
# 0 2020-05-12 [KONDO CHRIS (CIK 0001631982), Apple Inc. (A... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 1 2020-04-30 [JUNG ANDREA (CIK 0001051401), Apple Inc. (A... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 2 2020-04-17 [O'BRIEN DEIRDRE (CIK 0001767094), Apple Inc.... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 3 2020-04-17 [KONDO CHRIS (CIK 0001631982), Apple Inc. (A... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 4 2020-04-09 [Maestri Luca (CIK 0001513362), Apple Inc. (... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 5 2020-04-03 [WILLIAMS JEFFREY E (CIK 0001496686), Apple I... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 6 2020-04-03 [Maestri Luca (CIK 0001513362), Apple Inc. (... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 7 2020-02-28 [WAGNER SUSAN (CIK 0001059235), Apple Inc. (... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 8 2020-02-28 [LEVINSON ARTHUR D (CIK 0001214128), Apple In... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
# 9 2020-02-28 [JUNG ANDREA (CIK 0001051401), Apple Inc. (A... SEC Form 4 \n FORM 4UNITED STATES SECURITIES...
```
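
Each `display_names` entry in the SEC output above appears to follow a `NAME (CIK digits)` pattern. A minimal sketch for splitting one such entry into a name and a CIK, assuming that pattern holds for every entry (the sample output is truncated, so this is an assumption); `split_display_name` is a hypothetical helper, not part of FinNLP:

``` python
import re
from typing import Optional, Tuple

# Matches "<name> (CIK <digits>)", the pattern visible in the sample output above.
CIK_PATTERN = re.compile(r"^(?P<name>.+?)\s*\(CIK (?P<cik>\d+)\)\s*$")


def split_display_name(display_name: str) -> Optional[Tuple[str, str]]:
    """Return (name, cik), or None if the string does not match the pattern."""
    match = CIK_PATTERN.match(display_name)
    if match is None:
        return None
    # The CIK stays a string so that its leading zeros are preserved.
    return match.group("name").strip(), match.group("cik")


result = split_display_name("KONDO CHRIS  (CIK 0001631982)")
print(result)
```

Returning `None` for non-matching strings, rather than raising, lets a caller filter out entries that use a different layout.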
* China
``` python
# Juchao
from finnlp.data_sources.company_announcement.juchao import Juchao_Announcement

start_date = "2020-01-01"
end_date = "2020-06-01"
stock = "000001"
config = {
    "use_proxy": "china_free",
    "max_retry": 5,
    "proxy_pages": 3,
}

downloader = Juchao_Announcement(config)
downloader.download_date_range_stock(start_date, end_date, stock=stock, get_content=True, delate_pdf=True)
selected_columns = ["announcementTime", "shortTitle", "Content"]
downloader.dataframe[selected_columns].head(10)

# --------------------
# announcementTime shortTitle Content
# 0 2020-05-27 关于2020年第一期小型微型企业贷款专项金融债券发行完毕的公告 证券代码: 000001 证券简称:平安银行 ...
# 1 2020-05-22 2019年年度权益分派实施公告 1 证券代码: 000001 证券简称:平安银行 ...
# 2 2020-05-20 关于获准发行小微企业贷款专项金融债券的公告 证券代码: 000001 证券简称:平安银行 ...
# 3 2020-05-16 监事会决议公告 1 证券代码: 000001 证券简称: 平安银行 ...
# 4 2020-05-15 2019年年度股东大会决议公告 1 证券代码: 000001 证券简称:平安银行 ...
# 5 2020-05-15 2019年年度股东大会的法律意见书 北京总部 电话 : (86 -10) 8519 -1300 传真 : (86 -10...
# 6 2020-04-30 中信证券股份有限公司、平安证券股份有限公司关于公司关联交易有关事项的核查意见 1 中信证券股份有限公司 、平安证券股份有限 公司 关于平安银行股份有限公司 关联交易 有...
# 7 2020-04-30 独立董事独立意见 1 平安银行股份有限公司独立董事独立意见 根据《关于在上市公司建立独立董事制度的指导...
# 8 2020-04-30 关联交易公告 1 证券代码: 000001 证券简称:平安银行 ...
# 9 2020-04-21 2020年第一季度报告全文 证券代码: 000001 证券简称:平安银行 ...
```
## Ⅱ. Data Sources
### 1. News
| Platform | Data Type | Related Market | Specified Company | Range Type | Limits | Support |
| :-----------------: | :------------: | :------------: | :---------------: | :--------: | :-----------------: | ------- |
| Yahoo | Financial News | US Stocks | √ | Date Range | N/A | √ |
| Reuters | General News | US Stocks | × | Date Range | N/A | Soon |
| Seeking Alpha | Financial News | US Stocks | √ | Streaming | N/A | √ |
| Sina | Financial News | CN Stocks | × | Date Range | N/A | √ |
| Eastmoney | Financial News | CN Stocks | √ | Date Range | N/A | √ |
| Yicai | Financial News | CN Stocks | √ | Date Range | N/A | Soon |
| CCTV | General News | CN Stocks | × | Date Range | N/A | √ |
| US Mainstream Media | Financial News | US Stocks | √ | Date Range | Account (Free) | √ |
| CN Mainstream Media | Financial News | CN Stocks | × | Date Range | Account (¥500/year) | √ |
### 2. Social Media
| Platform | Data Type | Related Market | Specified Company | Range Type | Source Type | Limits | Support |
| :---------------------: | :-------: | :------------: | :---------------: | :--------: | :---------: | :-----: | :-----: |
| Twitter | Tweets | US Stocks | √ | Date Range | Official | N/A | √ |
| Twitter | Sentiment | US Stocks | √ | Date Range | Third Party | N/A | √ |
| StockTwits | Tweets | US Stocks | √ | Latest | Official | N/A | √ |
| Reddit (wallstreetbets) | Threads | US Stocks | × | Latest | Official | N/A | √ |
| Reddit | Sentiment | US Stocks | √ | Date Range | Third Party | N/A | √ |
| Weibo | Tweets | CN Stocks | √ | Date Range | Official | Cookies | √ |
| Weibo | Tweets | CN Stocks | √ | Latest | Official | N/A | √ |
### 3. Company Announcement
| Platform | Data Type | Related Market | Specified Company | Range Type | Source Type | Limits | Support |
| :-----------------------: | :-------: | :------------: | :---------------: | :--------: | :---------: | :----: | :-----: |
| Juchao (Official Website) | Text | CN Stocks | √ | Date Range | Official | N/A | √ |
| SEC (Official Website) | Text | US Stocks | √ | Date Range | Official | N/A | √ |
| Sina | Text | CN Stocks | √ | Latest | Third Party | N/A | √ |
### 4. Data Sets
| Data Source | Type | Stocks | Dates | Available |
| :--------------: | :----: | :----: | :-------: | :--------------: |
| [AShare](https://github.com/JinanZou/Astock) | News | 3680 | 2018-07-01 to 2021-11-30 | √ |
| [stocknet-dataset](https://github.com/yumoxu/stocknet-dataset) | Tweets | 87 | 2014-01-02 to 2015-12-30 | √ |
| [CHRNN](https://github.com/wuhuizhe/CHRNN) | Tweets | 38 | 2017-01-03 to 2017-12-28 | √ |
## Ⅲ. Large Language Models (LLMs)
* [ChatGPT (GPT-3.5)](https://openai.com/blog/chatgpt)
* [GPT-4](https://openai.com/research/gpt-4)
* [ChatGLM](https://github.com/THUDM/ChatGLM-6B)
* [PaLM](https://developers.googleblog.com/2023/03/announcing-palm-api-and-makersuite.html)
* [LLaMA](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
* [FinBERT](https://github.com/yya518/FinBERT)
* [Hugging Face](https://huggingface.co/)
## LICENSE
MIT License

**Disclaimer: We are sharing this code for academic purposes under the MIT education license. Nothing herein is financial advice or a recommendation to trade real money. Please use common sense and always consult a professional before trading or investing.**