# Vespa retriever

This notebook shows how to use Vespa.ai as a LangChain retriever.
Vespa.ai is a platform for highly efficient structured text and vector search.
Please refer to [Vespa.ai](https://vespa.ai) for more information.

In this example we'll work with the public [cord-19-search](https://github.com/vespa-cloud/cord-19-search) app which serves an index for the [CORD-19](https://allenai.org/data/cord-19) dataset containing Covid-19 research papers.

To create a retriever, we use [pyvespa](https://pyvespa.readthedocs.io/en/latest/index.html) to
create a connection to a Vespa service.

In [1]:
# Uncomment the line below if you haven't installed pyvespa

# !pip install pyvespa

In [2]:
def _pretty_print(docs):
    for doc in docs:
        print("-" * 80)
        print("CONTENT: " + doc.page_content + "\n")
        print("METADATA: " + str(doc.metadata))
        print("-" * 80)

## Retrieving documents

In [3]:
from langchain.retrievers import VespaRetriever

# Retrieve the abstracts of the top 2 papers that best match the user query.
retriever = VespaRetriever.from_params(
    "https://api.cord19.vespa.ai",
    "abstract",
    k=2,
)

In [4]:
docs = retriever.get_relevant_documents("How effective are covid travel bans?")
_pretty_print(docs)

--------------------------------------------------------------------------------
CONTENT: <sep />and peak hospitalizations by 4-96x, without contact tracing. Although contact tracing was highly <hi>effective</hi> at reducing spread, it was insufficient to stop outbreaks caused by <hi>travellers</hi> in even the best-case scenario, and the likelihood of exceeding contact tracing capacity was a concern in most scenarios. Quarantine compliance had only a small impact on <hi>COVID</hi> spread; <hi>travel</hi> volume and infection rate drove spread. Interpretation: NL's <hi>travel</hi> <hi>ban</hi> was likely a critically important intervention to prevent <hi>COVID</hi> spread. Even a small number<sep />

METADATA: {'id': 'index:content/1/544bbfee3466d2c126719d5f'}
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
CONTENT: How <hi>effective</hi> are restrictions on mobility in lim

## Configuring the retriever

We can further configure the results by specifying which metadata fields to retrieve and which sources to query, and by adding filters and index-specific parameters.
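
As a rough illustration of what these options control, the sketch below assembles the kind of request body a Vespa query might carry. It is a minimal sketch under stated assumptions, not the retriever's actual implementation: the helper name `build_query_body` and the way options are merged are hypothetical. The `yql`, `hits`, `ranking` and `bolding` keys are Vespa query parameters.

```python
from typing import Any, Dict, Sequence, Union


def build_query_body(
    content_field: str,
    k: int = 10,
    metadata_fields: Union[Sequence[str], str] = (),
    sources: Union[Sequence[str], str] = "*",
    **extra_params: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a Vespa query request body."""
    if metadata_fields == "*":
        # "*" selects every field stored for the document.
        fields = "*"
    else:
        if isinstance(metadata_fields, str):
            metadata_fields = [metadata_fields]
        # Always select the content field, plus any requested metadata fields.
        fields = ", ".join([content_field, *metadata_fields])

    source_list = sources if isinstance(sources, str) else ", ".join(sources)

    body: Dict[str, Any] = {
        "yql": f"select {fields} from sources {source_list} where userQuery()",
        "hits": k,
    }
    # Index-specific parameters (e.g. ranking, bolding) are passed through as-is.
    body.update(extra_params)
    return body


body = build_query_body(
    "abstract", k=2, metadata_fields="*", ranking="hybrid-colbert", bolding=False
)
print(body)
```

In this sketch, anything the helper does not recognise is forwarded unchanged, which is one simple way to support index-specific parameters without listing them all.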

In [8]:
retriever = VespaRetriever.from_params(
    "https://api.cord19.vespa.ai",
    "abstract",
    k=2,
    metadata_fields="*",  # return all data fields and store as metadata
    ranking="hybrid-colbert",  # other valid values: colbert, bm25
    bolding=False,
)
docs = retriever.get_relevant_documents("How effective are covid travel bans?")
_pretty_print(docs)

--------------------------------------------------------------------------------
CONTENT: ...and peak hospitalizations by 4-96x, without contact tracing. Although contact tracing was highly effective at reducing spread, it was insufficient to stop outbreaks caused by travellers in even the best-case scenario, and the likelihood of exceeding contact tracing capacity was a concern in most scenarios. Quarantine compliance had only a small impact on COVID spread; travel volume and infection rate drove spread. Interpretation: NL's travel ban was likely a critically important intervention to prevent COVID spread. Even a small number...

METADATA: {'matchfeatures': {'bm25': 35.5404665009022, 'colbert_maxsim': 78.48671418428421}, 'sddocname': 'doc', 'title': "How effective was Newfoundland & Labrador's travel ban to prevent the spread of COVID-19? An agent-based analysis", 'id': 'index:content/1/544bbfee3466d2c126719d5f', 'timestamp': 1612738800, 'license': 'medrxiv', 'doi': 'https://doi.org/1

## Querying with filtering conditions

Vespa's query language, YQL, lets you express many different conditions. You can add such conditions as filters with the retriever's `get_relevant_documents_with_filter` method.

Read more on the Vespa query language here: https://docs.vespa.ai/en/query-language.html
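
To make the idea of a filter concrete, here is a minimal sketch of how an extra condition could be combined with a base YQL statement. The helper `add_filter` and the base statement are assumptions made for illustration; the retriever may compose its query differently.

```python
from typing import Optional


def add_filter(yql: str, _filter: Optional[str] = None) -> str:
    """Hypothetical helper: AND an extra condition onto a YQL statement."""
    if not _filter:
        return yql
    # Parentheses keep the filter grouped even if it contains "or".
    return f"{yql} and ({_filter})"


base_yql = "select * from sources * where userQuery()"
filtered_yql = add_filter(
    base_yql, 'abstract contains "Japan" and license matches "medrxiv"'
)
print(filtered_yql)
```

Wrapping the filter in parentheses is a design choice of this sketch: it keeps the user query and the filter as two separate operands of `and`, whatever operators the filter itself uses.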

In [11]:
docs = retriever.get_relevant_documents_with_filter(
    "How effective are covid travel bans?",
    _filter='abstract contains "Japan" and license matches "medrxiv"'
)
_pretty_print(docs)

--------------------------------------------------------------------------------
CONTENT: Importance: As countermeasures against the economic downturn caused by the coronavirus 2019 (COVID-19) pandemic, many countries have introduced or considering financial incentives for people to engage in economic activities such as travel and use restaurants. Japan has implemented a large-scale, nationwide government-funded program that subsidizes up to 50% of all travel expenses since July 2020 with the aim of reviving the travel industry. However, it remains unknown as to how such provision of government subsidies for travel impacted the COVID-19 pandemic...

METADATA: {'matchfeatures': {'bm25': 22.54935242101209, 'colbert_maxsim': 55.04242363572121}, 'sddocname': 'doc', 'title': 'Association between Participation in Government Subsidy Program for Domestic Travel and Symptoms Indicative of COVID-19 Infection', 'journal': 'medRxiv : the preprint server for health sciences', 'id': 'index:content/0