Commits · Luigi/ZeroGPU-LLM-Inference

keep debug message

37f7787

Luigi committed 1 day ago

add debug to show web research result

a2f07a4

Luigi committed 1 day ago

give 1 second for web search to grab data

9ad3ffd

Luigi committed 1 day ago

inject web search result if web search enabled

bc257ff

Luigi committed 1 day ago

refactor(app): improve streaming, background search, dtype fallback, and cleanup

293686e

Luigi committed 1 day ago

bugfix: not using pipeline for response generation

939895d

Luigi committed 1 day ago

Add original SmolLM2 135M Instruct for comparison

423dc1a

Luigi committed 1 day ago

Add SmolLM2-135M-Instruct-TaiwanChat

38fcc03

Luigi committed 1 day ago

Add SmolLM2-135M TaiwanChat

0d642b7

Luigi committed 1 day ago

default to gemma-3-4b

88a6a62

Luigi committed 12 days ago

model repo_id typo fix

89372fa

Luigi committed 12 days ago

enable web search by default

6235e63

Luigi committed 12 days ago

remove tinyllama which has bad response quality

a22cf42

Luigi committed 12 days ago

make streaming response

5ea073d

Luigi committed 12 days ago

flatten history before it goes into the prompt

ef361b0

Luigi committed 12 days ago

better management of system prompt

5f6306a

Luigi committed 12 days ago

use chat pipeline instead of model and tokenizer individually

ac8e9cc

Luigi committed 12 days ago

bugfix for padding-related issues

f248fec

Luigi committed 12 days ago

add attention mask

b6b3940

Luigi committed 12 days ago

Clean model description

4731160

Luigi committed 12 days ago

pin torch to 2.4.0

4c6b4c5

Luigi committed 12 days ago

enable zerogpu with decorator

c5b2897

Luigi committed 12 days ago

Apply ZeroGPU

d181b45

Luigi committed 12 days ago

switch to gradio version for stability reasons

a703203

Luigi committed 12 days ago

add taiwan tinyllama

794ee70

Luigi committed 12 days ago

Improve responsiveness by asynchronously retrieving web search context

acda3f1

Luigi committed 12 days ago

add llama 3.2 taiwan 3b

b1544e2

Luigi committed 13 days ago

add minicpm3 4b

f5c0811

Luigi committed 13 days ago

increase ctx length to max

629495e

Luigi committed 13 days ago

remove all moe

fafc8cb

Luigi committed 13 days ago

remove qwen 1.5 moe

6735035

Luigi committed 13 days ago

adjust title style

e9559bd

Luigi committed 13 days ago

use another version of qwen 1.5 moe

96e60d6

Luigi committed 13 days ago

add Qwen1.5-MoE

e17afaf

Luigi committed 13 days ago

Qwen2.5-MOE-6x1.5B

5eca666

Luigi committed 13 days ago

remove models under 3b

617be26

Luigi committed 13 days ago

Add model caching

d33dfcd

Luigi committed 13 days ago

UI/UX Improvement

eb215ff

Luigi committed 13 days ago

reset timeout timer once a new token is generated

35943b1

Luigi committed 13 days ago

open web search settings to user

c9fd924

Luigi committed 13 days ago

add 2 more models

f7a541f

Luigi committed 13 days ago

apply new settings on DuckDuckGo search

d9421eb

Luigi committed 13 days ago

tune llama parameters

20484f3

Luigi committed 13 days ago

increase max_chars_per_result to 600

1155897

Luigi committed 13 days ago

increase max results to 6 for better web search

0c2fe1d

Luigi committed 13 days ago

increase ctx length to 2k

9ba47d1

Luigi committed 13 days ago

increase timeout to 5min

71d28c5

Luigi committed 13 days ago

Code simplification

248f5a7

Luigi committed 13 days ago

Enable speculative decoding

a7fdfe6

Luigi committed 14 days ago

fix role ordering error in history

06a162a

Luigi committed 14 days ago

