BGE large model
This is a sentence-transformers model fine-tuned from BAAI/bge-large-en-v1.5. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-large-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
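As a toy illustration of the pipeline above (dummy numpy arrays, not the real model): the Pooling module with `pooling_mode_cls_token=True` keeps only the [CLS] token embedding, and `Normalize()` L2-normalizes it so downstream cosine similarity reduces to a dot product.

```python
import numpy as np

# Dummy "token embeddings" for a 6-token input, hidden size 1024
# (standing in for the BertModel output of module (0)).
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(6, 1024))

# (1) Pooling with pooling_mode_cls_token=True: keep only the [CLS] token.
cls_embedding = token_embeddings[0]

# (2) Normalize(): L2-normalize the pooled vector to unit length.
sentence_embedding = cls_embedding / np.linalg.norm(cls_embedding)

print(sentence_embedding.shape)                              # (1024,)
print(round(float(np.linalg.norm(sentence_embedding)), 6))   # 1.0
```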
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sandeep-aggarwal/encoder_only_base_bge-large-en-v1.5")
# Run inference
sentences = [
'Adobe Experience Platform helps you create a Real-time Customer Profile for each customer record where you can see a holistic view of each individual customer by combining data from multiple sources and channels, including online, offline, CRM, and third party. Profile allows you to consolidate your customer data into a unified view offering an actionable, timestamped account of every customer interaction. Further, each data source or channel might work on a different customer identity and will share multiple identities with the Platform. Identity Service helps you to gain a better view of your customer and their behavior by bridging identities across devices and systems, allowing you to deliver impactful, personal digital experiences in real time. The Platform creates an identity graph, a map of relationships between different identity namespaces, providing you with a visual representation of how your customer interacts with your brand across different channels. The data captured in the datasets is secure and cannot be accessed outside of the Real-time Customer Profile and segmentation. Only << Customer name >> users who are eligible per access control can access the data. Reference material: Identity Service - https://experienceleague.adobe.com/docs/experience-platform/identity/namespaces.html?lang=en Access Control - https://experienceleague.adobe.com/docs/experience-platform/access-control/home.html?lang=en',
'How is security handled in relation to a single customer view when we grant access to various business units? Is user data explicitly linked to the division that supplied the source data, or to the profile that has been identified as comprising data from that division?',
'Clients may capture new project or other work requests through any number of request queues that the client can configure. Adobe Workfront provides a help desk area of the application where request queues can be configured for the purpose of capturing, routing, and managing various requests. Client can configure request forms through the UI and forms can include both native and custom fields. Routing rules and approval processes can be designated for each specific request queue. Project requests may also require a business case to be built for the requested project. Adobe Workfront allows clients to build business cases for projects and these business cases can be used to evaluate the merits of a project. Information captured in business cases can include (but is not limited to) project goals/objectives, planned costs (expenses and resource related), high-level resources estimates, alignment scorecard, potential risks, and any custom data fields the client chooses to add.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
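Because the pipeline ends with a Normalize() module, the embeddings are unit-length, so the cosine similarity computed by model.similarity reduces to a plain matrix product. A minimal numpy sketch with toy unit vectors (not real embeddings) illustrates the shape and behaviour:

```python
import numpy as np

# Toy stand-ins for three already-normalized sentence embeddings
# (dimension 4 for brevity instead of 1024).
emb = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
])

# For unit vectors, cosine similarity is just a dot product, so the full
# similarity matrix is emb @ emb.T — shape [3, 3], mirroring
# model.similarity(embeddings, embeddings) above.
similarities = emb @ emb.T
print(similarities.shape)   # (3, 3)
print(similarities[0, 2])   # 1.0 — identical vectors
print(similarities[0, 1])   # 0.0 — orthogonal vectors
```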
Training Details
Training Dataset
Unnamed Dataset
- Size: 3,825 training samples
- Columns: positive, anchor, and negative
- Approximate statistics based on the first 1000 samples:

|  | positive | anchor | negative |
|:--|:--|:--|:--|
| type | string | string | string |
| details | min: 7 tokens, mean: 146.93 tokens, max: 512 tokens | min: 4 tokens, mean: 23.8 tokens, max: 128 tokens | min: 3 tokens, mean: 141.82 tokens, max: 512 tokens |
- Samples:
  - Sample 1:
    - positive: Adobe Commerce being an open-source platform nurtures a community of users who contribute, learn, and connect to our platform. There are 400K+ developers and community members worldwide with Adobe Commerce development experience, and over 8,000 Certified Adobe Commerce Developers, who can support projects and implementations. This global community is truly dedicated to the growth of our platform and success of our customers. In addition, you can easily grow and scale your team because Adobe Commerce talent is easy to find. For more details, please see: https://business.adobe.com/in/products/magento/community.html#
    - anchor: Can you provide an overview of the Adobe Commerce Developer Community?
    - negative: Streaming ingestion for Adobe Experience Platform provides users a method to send data from client and server-side devices to Experience Platform in real time. Streaming ingestion plays a key role in building real-time customer profiles by enabling <> to deliver Profile data into the Data Lake with as little latency as possible. The stream connector for Adobe Experience Platform is based on Apache Kafka Connect. This library can be used to stream JSON events from Kafka topics in <> data centre directly to Experience Platform in real time. The stream connector is a sink (one-way) connector, delivering data from Kafka topics to a registered endpoint on Experience Platform. The connector supports the following features: 1. Authenticated collection of data; 2. Batching messages to reduce network calls and increase throughput. Full documentation here: https://experienceleague.adobe.com/docs/experience-platform/ingestion/streaming/kafka.html?lang=en
  - Sample 2:
    - positive: Adobe Commerce has extensive experience in the B2C environment. Our platform supports B2C business models out of the box and provides a range of features and capabilities to enhance the B2C customer experience. With Adobe Commerce, businesses can create personalized commerce journeys, boost conversion and sales with AI-powered merchandising tools, and provide a seamless and intuitive shopping experience for their customers.
    - anchor: Can you explain your experience working in the B2C sector?
    - negative: Adobe’s vision is to empower companies to unify end-to-end customer experiences from creation to commerce, driving loyalty and business growth. Our company values — Create the future, Own the outcome, Raise the bar, and Be genuine — represent who we are, how we show up in the world, and how we’ll define our future success.
  - Sample 3:
    - positive: Adobe Professional Services takes a phased approach to implementation. In the first phase, which we call the “Design and plan” phase, we define business requirements, features, and KPIs. We run a series of workshops in the first 4-5 days to gather requirements and then design a best-in-class architecture considering your goals and capabilities, integrations, and customizations. Key outcomes of the Design and plan phase: defined success criteria and KPIs; business requirements; data migration strategy; feature matrix; technical architecture that scales to future needs; catalog setup, customizations, and integrations; detailed roadmap. We believe architecting the overall solution and key system integrations aligned to your long-term business strategy is crucial to ensuring a successful commerce platform implementation.
    - anchor: Please provide an overview of the workshop focusing on functionality, design, and architecture.
    - negative: Streaming segmentation on Adobe Experience Platform allows customers to do segmentation in near real-time while focusing on data richness. With streaming segmentation, segment qualification now happens as streaming data lands into Platform, alleviating the need to schedule and run segmentation jobs. This essentially ensures that the right customers are targeted in near real-time and are added/removed from a digital marketing activity across various channels, including advertising ecosystems such as DSP, Social, and Search.
- Loss: TripletLoss with these parameters: `{"distance_metric": "TripletDistanceMetric.EUCLIDEAN", "triplet_margin": 5}`
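A minimal numpy sketch (toy 2-D vectors, not real embeddings) of the per-triplet loss these parameters describe — Euclidean distance with a margin of 5, i.e. max(d(anchor, positive) - d(anchor, negative) + margin, 0):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=5.0):
    """Per-triplet loss matching the parameters above:
    TripletDistanceMetric.EUCLIDEAN with triplet_margin=5."""
    d_pos = np.linalg.norm(anchor - positive)  # distance anchor -> positive
    d_neg = np.linalg.norm(anchor - negative)  # distance anchor -> negative
    return max(d_pos - d_neg + margin, 0.0)

# Toy embeddings: the positive sits closer to the anchor than the negative.
a = np.array([0.0, 0.0])
p = np.array([1.0, 0.0])   # d(a, p) = 1
n = np.array([7.0, 0.0])   # d(a, n) = 7
print(triplet_loss(a, p, n))  # 1 - 7 + 5 = -1 -> clamped to 0.0

# A harder negative that violates the margin incurs a positive loss.
print(triplet_loss(a, p, np.array([2.0, 0.0])))  # 1 - 2 + 5 = 4.0
```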
Evaluation Dataset
Unnamed Dataset
- Size: 956 evaluation samples
- Columns: positive, anchor, and negative
- Approximate statistics based on the first 956 samples:

|  | positive | anchor | negative |
|:--|:--|:--|:--|
| type | string | string | string |
| details | min: 8 tokens, mean: 139.34 tokens, max: 512 tokens | min: 4 tokens, mean: 25.97 tokens, max: 234 tokens | min: 8 tokens, mean: 137.66 tokens, max: 512 tokens |
- Samples:
  - Sample 1:
    - positive: <> can import, edit, manage, as well as manually create profiles in Adobe Campaign. Using your data, your marketers can also use the powerful, user-friendly segmentation and targeting features to create highly targeted, differentiated segments through the easy-to-use, point-and-click interface. Segmentation can be based on an unlimited number of conditions utilizing the underlying marketing data, including historical customer transactions, demographics, and marketing history. Once you have created your segments, the criteria logic used to create the lists can be saved as a Pre-Defined Filter. These filters are then available to reuse and select from a library of filters, eliminating the need to recreate the logic each time. You can then modify these pre-set filters and they will be applied dynamically during execution.
    - anchor: The tool should have the capability to generate data profiles internally.
    - negative: There is no limit to the number of concurrent users (with different user types) that Adobe solutions can support. We also provide a scalable environment leveraging our flexible architecture.
  - Sample 2:
    - positive: Adobe does have anti-malware and anti-virus solutions installed on all workstations, as well as all Windows-based production servers. Adobe does not install anti-malware/anti-virus on Linux-based servers. Adobe has advanced security tools for Linux. Included in this toolset is file hash checking, centralized process monitoring, critical file monitoring, forced host hardening, and OS Query for real-time security investigations.
    - anchor: The solution should include support for malware scanning.
    - negative: Yes, Adobe Customer Journey Analytics has a Retention rates view and cohort tables that show the percentage of users that return after their initial engagement within the desired date range. Presently, calculated metrics and participation attribution settings can be used to calculate the time between events for particular users. Please see: https://experienceleague.adobe.com/docs/analytics-platform/using/guided-analysis/retention/retention-rates.html?lang=en Please see here for information on Cohort Analysis: https://experienceleague.adobe.com/docs/analytics-platform/using/cja-workspace/visualizations/cohort-table/cohort-analysis.html?lang=en
  - Sample 3:
    - positive: The user interface is customizable at the user level and allows authorized admin users to customize it to meet business requirements. The platform provides a central web console configuration manager that allows administrators to configure the solution seamlessly. OSGi is a fundamental element in the technology stack of Adobe Experience Manager. It is used to control the composite bundles of AEM and their configuration. More details: https://experienceleague.adobe.com/docs/experience-manager-cloud-service/content/implementing/deploying/configuring-osgi.html?lang=en
    - anchor: Can the user interface be customized for individual users or groups? If so, what aspects can be customized?
    - negative: Yes, in data centers, DDoS mitigation contracts are in place with telecommunications providers to leverage DDoS "scrubbers" should they be necessary. In public cloud provider locations, we leverage provided methodologies including auto expansion of capacity and DDoS mitigation where possible. Synthetic monitoring solutions, including New Relic, run synthetic transactions against our infrastructure to monitor application performance. When latency is detected, our 24x7x365 operations center is alerted and escalates with operational teams as necessary. For further information, please see the Infrastructure & Virtualization Security section in 3. CSA CAIQ v3.1 Adobe Experience Platform 2020 within the accompanying security pack.
- Loss: TripletLoss with these parameters: `{"distance_metric": "TripletDistanceMetric.EUCLIDEAN", "triplet_margin": 5}`
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
TripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Model tree for sandeep-aggarwal/encoder_only_base_bge-large-en-v1.5
- Base model: BAAI/bge-large-en-v1.5