All models were tested with AnythingLLM (ALLM), using LM Studio as the server.
They all work more or less (sometimes the results are more truthful if the "chat with document only" option is used).
→ some models cannot handle large TXT files (maybe only ~200 pages - hints below)
My short impression:
- nomic-embed-text
- mxbai-embed-large
- mug-b-1.6
- Ger-RAG-BGE-M3 (german)
Short hints for use:
Set your main model's context length (Max Tokens) to ~16000 tokens, set your embedding model's Max Embedding Chunk Length to 1024 tokens, and set Max Context Snippets to 14 as a usual value.
But note: ALLM actually cuts the document into 1024-character (not token) parts, so you may need roughly twice that number of snippets or a bit more, ~30!
-> OK, what does that mean?
With 14 snippets of 1024 tokens each, you can feed up to 14336 tokens (~10000 words) from your document into the context, leaving ~1600 tokens for the answer (~1000 words, about 2 pages).
You can adjust these to your needs, e.g. 8 snippets of 2048 tokens, or 28 snippets of 512 tokens ...
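The arithmetic above can be sketched as a small helper (my own throwaway function, not part of AnythingLLM; token counts are nominal):

```python
# Rough context-budget check for a RAG setup, using the numbers from the text
# above. Returns (tokens consumed by snippets, tokens left for the answer).
def context_budget(context_length, snippet_count, snippet_tokens):
    used = snippet_count * snippet_tokens
    if used >= context_length:
        raise ValueError("snippets alone would exceed the context window")
    return used, context_length - used

# 14 snippets of 1024 tokens in a 16000-token window:
used, left = context_budget(16000, 14, 1024)
print(used, left)  # 14336 tokens for snippets, 1664 left for the answer
```

Note that 8 snippets of 2048 tokens (16384 tokens) would already slightly overflow a 16000-token window, so leave some headroom when you rebalance snippet count against snippet size.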
- 8000 tokens (~6000 words): ~0.8 GB VRAM usage
- 16000 tokens (~12000 words): ~1.5 GB VRAM usage
- 32000 tokens (~24000 words): ~3 GB VRAM usage
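These measurements suggest VRAM grows roughly linearly with context length. A quick sketch to extrapolate (assumption of mine: linearity holds; real usage depends on the model, KV-cache precision, and backend):

```python
# Linear estimate of context-related VRAM, anchored on the ~3 GB at 32k tokens
# reported above. Illustrative only; measure on your own setup.
def estimate_vram_gb(tokens, gb_per_token=3.0 / 32000):
    return tokens * gb_per_token

for t in (8000, 16000, 32000, 64000):
    print(t, round(estimate_vram_gb(t), 2))
```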
...
How embedding and search works:
You have a TXT/PDF file of maybe 90000 words (~300 pages). You ask the model, say, "What is described in the chapter called XYZ in relation to person ZYX?". The embedder now searches the document for these keywords or semantically similar terms. If it finds them, say words and meanings around "XYZ" and "ZYX", a piece of text of 1024 tokens around each hit is cut out. These text snippets are then used for your answer. If the word "XYZ" occurs 100 times in the file, not all 100 occurrences are retrieved (usually only 4 to 32 snippets are used).
A question like "summarize the document" is usually not useful; if the document has an introduction or summary sections, the search will land there if you are lucky.
If the document is small, say 10-20 pages, it is better to copy the whole text into the prompt.
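The chunk-and-retrieve step described above can be sketched like this. The real pipeline uses a neural embedding model; here a toy bag-of-words vector stands in (my assumption, purely for illustration) so the example runs without dependencies:

```python
# Minimal sketch of character chunking + similarity retrieval.
import math
from collections import Counter

def chunk(text, size=1024):
    """Split the document into fixed-size character chunks (as ALLM does)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    """Stand-in embedding: lowercase word counts (NOT a real embedder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(document, query, top_k=14, size=1024):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    scored = [(cosine(embed(c), q), c) for c in chunk(document, size)]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]

# Toy document: a relevant region followed by filler.
doc = "Chapter XYZ deals with person ZYX. " * 100 + "Unrelated filler text. " * 100
snips = retrieve(doc, "what about XYZ and ZYX", top_k=4, size=200)
print(len(snips))  # 4 snippets, all from the region mentioning XYZ
```

This also shows why only a handful of the 100 occurrences of "XYZ" make it into the answer: retrieval keeps just the top_k highest-scoring chunks.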
...
Nevertheless, the main model is also important, especially in how it deals with context length, and I don't mean just the theoretical number you can set. Some models can handle 128k tokens on paper, but even at 16k input their response, given the same snippets as input, is worse than that of other models.
Important -> the system prompt (an example):
You are a helpful assistant who provides an overview of ... under the aspects of ... . You use attached excerpts from the collection to generate your answers! Weight each individual excerpt in order, with the most important excerpts at the top and the less important ones further down. The context of the entire article should not be given too much weight. Answer the user's question! After your answer, briefly explain why you included excerpts (1 to X) in your response and justify briefly if you considered some of them unimportant!
(Adapt it to your needs; this example works well when I consult a book about a person and a term related to them.)
Usual main models that work:
llama3.1, llama3.2, qwen2.5, deepseek-r1-distill, SauerkrautLM-Nemo (German) ...
(llama3 and phi3.5 do not work well)
By the way, Jinja templates are very new; the usual templates work fine with new/usual models, but merged models have a lot of optimization potential (don't ask me though, I am not a coder).
...
on discord (sevenof9)
...
(ALL licenses and terms of use belong to the original authors)
...
- avemio/German-RAG-BGE-M3-MERGED-x-SNOWFLAKE-ARCTIC-HESSIAN-AI (German, English) - 600pages and more
- maidalun1020/bce-embedding-base_v1 (English and Chinese) - only ~200pages
- maidalun1020/bce-reranker-base_v1 (English, Chinese, Japanese and Korean) - only ~200pages
- BAAI/bge-reranker-v2-m3 (English and Chinese) - 600pages and more
- BAAI/bge-reranker-v2-gemma (English and Chinese) - does not work
- BAAI/bge-m3 (English and Chinese) - 600pages and more
- avsolatorio/GIST-large-Embedding-v0 (English)- ~300pages
- ibm-granite/granite-embedding-278m-multilingual (English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese) - only ~200pages
- Labib11/MUG-B-1.6 (?) - ~300pages
- mixedbread-ai/mxbai-embed-large-v1 (multi) - ~300pages
- nomic-ai/nomic-embed-text-v1.5 (English, multi) - 600pages and more
- Snowflake/snowflake-arctic-embed-l-v2.0 (English, multi) - 600pages and more
- intfloat/multilingual-e5-large-instruct (100 languages) - only ~200pages
- T-Systems-onsite/german-roberta-sentence-transformer-v2 - ~300pages