Sentence Similarity
sentence-transformers
Safetensors
bert
feature-extraction
dense
Generated from Trainer
dataset_size:95253
loss:MultipleNegativesRankingLoss
text-embeddings-inference
How to use bisectgroup/BiCA-base with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

# Load the model from the Hugging Face Hub
model = SentenceTransformer("bisectgroup/BiCA-base")

# One short title followed by three abstracts
sentences = [
    "Molecular phylogenetic resolution of the mega-diverse clade Apoditrysia",
    "In a previous study of higher-level arthropod phylogeny, analyses of nucleotide sequences from 62 protein-coding nuclear genes for 80 panarthopod species yielded significantly higher bootstrap support for selected nodes than did amino acids. This study investigates the cause of that discrepancy. The hypothesis is tested that failure to distinguish the serine residues encoded by two disjunct clusters of codons (TCN, AGY) in amino acid analyses leads to this discrepancy. In one test, the two clusters of serine codons (Ser1, Ser2) are conceptually translated as separate amino acids. Analysis of the resulting 21-amino-acid data matrix shows striking increases in bootstrap support, in some cases matching that in nucleotide analyses. In a second approach, nucleotide and 20-amino-acid data sets are artificially altered through targeted deletions, modifications, and replacements, revealing the pivotal contributions of distinct Ser1 and Ser2 codons. We confirm that previous methods of coding nonsynonymous nucleotide change are robust and computationally efficient by introducing two new degeneracy coding methods. We demonstrate for degeneracy coding that neither compositional heterogeneity at the level of nucleotides nor codon usage bias between Ser1 and Ser2 clusters of codons (or their separately coded amino acids) is a major source of non-phylogenetic signal. The incongruity in support between amino-acid and nucleotide analyses of the forementioned arthropod data set is resolved by showing that \"standard\" 20-amino-acid analyses yield lower node support specifically when serine provides crucial signal. Separate coding of Ser1 and Ser2 residues yields support commensurate with that found by degenerated nucleotides, without introducing phylogenetic artifacts. While exclusion of all serine data leads to reduced support for serine-sensitive nodes, these nodes are still recovered in the ML topology, indicating that the enhanced signal from Ser1 and Ser2 is not qualitatively different from that of the other amino acids.",
    "Recent molecular phylogenetic studies of the insect order Lepidoptera have robustly resolved family-level divergences within most superfamilies, and most divergences among the relatively species-poor early-arising superfamilies. In sharp contrast, relationships among the superfamilies of more advanced moths and butterflies that comprise the mega-diverse clade Apoditrysia (ca. 145,000 spp.) remain mostly poorly supported. This uncertainty, in turn, limits our ability to discern the origins, ages and evolutionary consequences of traits hypothesized to promote the spectacular diversification of Apoditrysia. Low support along the apoditrysian \"backbone\" probably reflects rapid diversification. If so, it may be feasible to strengthen resolution by radically increasing the gene sample, but case studies have been few. We explored the potential of next-generation sequencing to conclusively resolve apoditrysian relationships. We used transcriptome RNA-Seq to generate 1579 putatively orthologous gene sequences across a broad sample of 40 apoditrysians plus four outgroups, to which we added two taxa from previously published data. Phylogenetic analysis of a 46-taxon, 741-gene matrix, resulting from a strict filter that eliminated ortholog groups containing any apparent paralogs, yielded dramatic overall increase in bootstrap support for deeper nodes within Apoditrysia as compared to results from previous and concurrent 19-gene analyses. High support was restricted mainly to the huge subclade Obtectomera broadly defined, in which 11 of 12 nodes subtending multiple superfamilies had bootstrap support of 100%. The strongly supported nodes showed little conflict with groupings from previous studies, and were little affected by changes in taxon sampling, suggesting that they reflect true signal rather than artifacts of massive gene sampling. In contrast, strong support was seen at only 2 of 11 deeper nodes among the \"lower\", non-obtectomeran apoditrysians. These represent a much harder phylogenetic problem, for which one path to resolution might include further increase in gene sampling, together with improved orthology assignments. ",
    "One of the major challenges in cell implantation therapies is to promote integration of the microcirculation between the implanted cells and the host. We used adipose-derived stromal vascular fraction (SVF) cells to vascularize a human liver cell (HepG2) implant. We hypothesized that the SVF cells would form a functional microcirculation via vascular assembly and inosculation with the host vasculature. Initially, we assessed the extent and character of neovasculatures formed by freshly isolated and cultured SVF cells and found that freshly isolated cells have a higher vascularization potential. Generation of a 3D implant containing fresh SVF and HepG2 cells formed a tissue in which HepG2 cells were entwined with a network of microvessels. Implanted HepG2 cells sequestered labeled LDL delivered by systemic intravascular injection only in SVF-vascularized implants demonstrating that SVF cell-derived vasculatures can effectively integrate with host vessels and interface with parenchymal cells to form a functional tissue mimic. ",
]

# Encode every text into a dense vector
embeddings = model.encode(sentences)

# Pairwise similarity between all encoded texts
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```
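The `similarities` result in the snippet above has one row and one column per input text. As a rough illustration of what such a matrix holds, the sketch below computes pairwise cosine similarity in plain Python on toy 3-dimensional vectors. It assumes cosine similarity is the similarity function in use for this model; the vectors and the helper names `cosine` and `similarity_matrix` are invented for the example and are not part of sentence-transformers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarity: entry [i][j] compares vectors i and j."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for real embeddings (hypothetical values)
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]

matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(x, 2) for x in row])
```

The matrix is symmetric with ones on the diagonal, so for retrieval it is usually enough to compare one query row against the document columns and sort by score.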
Update README.md
README.md
CHANGED
|
@@ -334,9 +334,9 @@ library_name: sentence-transformers
|
|
| 334 |
license: mit
|
| 335 |
---
|
| 336 |
|
| 337 |
-
#
|
| 338 |
|
| 339 |
-
This is a
|
| 340 |
|
| 341 |
## Model Details
|
| 342 |
|
|
@@ -356,16 +356,6 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [t
|
|
| 356 |
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
|
| 357 |
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
|
| 358 |
|
| 359 |
-
### Full Model Architecture
|
| 360 |
-
|
| 361 |
-
```
|
| 362 |
-
SentenceTransformer(
|
| 363 |
-
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
|
| 364 |
-
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
|
| 365 |
-
(2): Normalize()
|
| 366 |
-
)
|
| 367 |
-
```
|
| 368 |
-
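The architecture listing removed in this hunk describes mean pooling over token embeddings (`pooling_mode_mean_tokens: True`) followed by a `Normalize()` step. A minimal sketch of those two operations in plain Python is given here, using 2-dimensional toy token vectors instead of the 768-dimensional ones the model produces. The function names `mean_pool` and `l2_normalize` and the attention mask values are assumptions made for illustration, not the library's internals.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1, skipping padding."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# Three token vectors; the last one is padding (mask value 0)
tokens = [[1.0, 2.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled, sentence_embedding)
```

Because the output is unit length, the dot product of two such embeddings equals their cosine similarity.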
|
| 369 |
## Usage
|
| 370 |
|
| 371 |
### Direct Usage (Sentence Transformers)
|
|
@@ -438,31 +428,6 @@ You can finetune this model on your own dataset.
|
|
| 438 |
|
| 439 |
## Training Details
|
| 440 |
|
| 441 |
-
### Training Dataset
|
| 442 |
-
|
| 443 |
-
#### Unnamed Dataset
|
| 444 |
-
|
| 445 |
-
* Size: 95,253 training samples
|
| 446 |
-
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
|
| 447 |
-
* Approximate statistics based on the first 1000 samples:
|
| 448 |
-
| | sentence_0 | sentence_1 | sentence_2 |
|
| 449 |
-
|:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
|
| 450 |
-
| type | string | string | string |
|
| 451 |
-
| details | <ul><li>min: 6 tokens</li><li>mean: 19.51 tokens</li><li>max: 56 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 223.97 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 51 tokens</li><li>mean: 309.24 tokens</li><li>max: 512 tokens</li></ul> |
|
| 452 |
-
* Samples:
|
| 453 |
-
| sentence_0 | sentence_1 | sentence_2 |
|
| 454 |
-
|:-----------|:-----------|:-----------|
|
| 455 |
-
| <code>Sox5 modulates the activity of Sox10 in the melanocyte lineage</code> | <code>The transcription factor Sox5 has previously been shown in chicken to be expressed in early neural crest cells and neural crest-derived peripheral glia. Here, we show in mouse that Sox5 expression also continues after neural crest specification in the melanocyte lineage. Despite its continued expression, Sox5 has little impact on melanocyte development on its own as generation of melanoblasts and melanocytes is unaltered in Sox5-deficient mice. Loss of Sox5, however, partially rescued the strongly reduced melanoblast generation and marker gene expression in Sox10 heterozygous mice arguing that Sox5 functions in the melanocyte lineage by modulating Sox10 activity. This modulatory activity involved Sox5 binding and recruitment of CtBP2 and HDAC1 to the regulatory regions of melanocytic Sox10 target genes and direct inhibition of Sox10-dependent promoter activation. Both binding site competition and recruitment of corepressors thus help Sox5 to modulate the activity of Sox10 in the melano...</code> | <code>Transcripts for a new form of Sox5, called L-Sox5, and Sox6 are coexpressed with Sox9 in all chondrogenic sites of mouse embryos. A coiled-coil domain located in the N-terminal part of L-Sox5, and absent in Sox5, showed >90% identity with a similar domain in Sox6 and mediated homodimerization and heterodimerization with Sox6. Dimerization of L-Sox5/Sox6 greatly increased efficiency of binding of the two Sox proteins to DNA containing adjacent HMG sites. L-Sox5, Sox6 and Sox9 cooperatively activated expression of the chondrocyte differentiation marker Col2a1 in 10T1/2 and MC615 cells. A 48 bp chondrocyte-specific enhancer in this gene, which contains several HMG-like sites that are necessary for enhancer activity, bound the three Sox proteins and was cooperatively activated by the three Sox proteins in non-chondrogenic cells. Our data suggest that L-Sox5/Sox6 and Sox9, which belong to two different classes of Sox transcription factors, cooperate with each other in expression of Col2a1 a...</code> |
|
| 456 |
-
| <code>are asgard archaea related to eukaryotes</code> | <code>Asgard archaea are considered to be the closest known relatives of eukaryotes. Their genomes contain hundreds of eukaryotic signature proteins (ESPs), which inspired hypotheses on the evolution of the eukaryotic cell</code> | <code>Eukaryotes evolved from a symbiosis involving alphaproteobacteria and archaea phylogenetically nested within the Asgard clade. Two recent studies explore the metabolic capabilities of Asgard lineages, supporting refined symbiotic metabolic interactions that might have operated at the dawn of eukaryogenesis.</code> |
|
| 457 |
-
| <code>Fanconi Anemia in Pediatric Medulloblastoma and Fanconi Anemia</code> | <code>The outcome of children with medulloblastoma (MB) and Fanconi Anemia (FA), an inherited DNA repair deficiency, has not been described systematically. Treatment is complicated by high vulnerability to treatment-associated side effects, yet structured data are lacking. This study aims to give a comprehensive overview of clinical and molecular characteristics of pediatric FA MB patients.</code> | <code>The Sonic Hedgehog (SHH) signaling pathway is indispensable for development, and functions to activate a transcriptional program modulated by the GLI transcription factors. Here, we report that loss of a regulator of the SHH pathway, Suppressor of Fused (Sufu), resulted in early embryonic lethality in the mouse similar to inactivation of another SHH regulator, Patched1 (Ptch1). In contrast to Ptch1+/- mice, Sufu+/- mice were not tumor prone. However, in conjunction with p53 loss, Sufu+/- animals developed tumors including medulloblastoma and rhabdomyosarcoma. Tumors present in Sufu+/-p53-/- animals resulted from Sufu loss of heterozygosity. Sufu+/-p53-/- medulloblastomas also expressed a signature gene expression profile typical of aberrant SHH signaling, including upregulation of N-myc, Sfrp1, Ptch2 and cyclin D1. Finally, the Smoothened inhibitor, hedgehog antagonist, did not block growth of tumors arising from Sufu inactivation. These data demonstrate that Sufu is essential for deve...</code> |
|
| 458 |
-
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
|
| 459 |
-
```json
|
| 460 |
-
{
|
| 461 |
-
"scale": 20.0,
|
| 462 |
-
"similarity_fct": "cos_sim"
|
| 463 |
-
}
|
| 464 |
-
```
|
| 465 |
-
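The removed training section lists `MultipleNegativesRankingLoss` with `scale: 20.0` and `cos_sim` over three text columns. The sketch below shows the general idea of that loss in plain Python: each anchor is scored against every positive and negative in the batch, the cosine scores are multiplied by the scale, and cross-entropy is taken with the anchor's own positive as the correct class. Reading `sentence_0` as the anchor, `sentence_1` as the positive, and `sentence_2` as a hard negative is an assumption based on column order; the toy vectors and the name `mnr_loss` are hypothetical, and this is not the library's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i among all candidates."""
    candidates = positives + negatives  # in-batch positives plus hard negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy 2-dimensional embeddings (hypothetical values)
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[0.5, 0.5], [0.6, 0.4]]

aligned_loss = mnr_loss(anchors, positives, negatives)
swapped_loss = mnr_loss(anchors, positives[::-1], negatives)
print(aligned_loss, swapped_loss)
```

With the positives in the right order the loss is small; swapping them pairs each anchor with the wrong text and the loss rises sharply, which is the signal the optimizer follows.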
|
| 466 |
### Training Hyperparameters
|
| 467 |
#### Non-Default Hyperparameters
|
| 468 |
|
|
@@ -631,7 +596,7 @@ You can finetune this model on your own dataset.
|
|
| 631 |
}
|
| 632 |
```
|
| 633 |
|
| 634 |
-
#### If our work was helpful
|
| 635 |
```bibtext
|
| 636 |
@misc{sinha2025bicaeffectivebiomedicaldense,
|
| 637 |
title={BiCA: Effective Biomedical Dense Retrieval with Citation-Aware Hard Negatives},
|
|
|
|
| 334 |
license: mit
|
| 335 |
---
|
| 336 |
|
| 337 |
+
# BiCA-Base
|
| 338 |
|
| 339 |
+
This is BiCA-Base, a SOTA dense retriever finetuned from [thenlper/gte-base](https://huggingface.co/thenlper/gte-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
|
| 340 |
|
| 341 |
## Model Details
|
| 342 |
|
|
|
|
| 356 |
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
|
| 357 |
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
|
| 358 |
|
| 359 |
## Usage
|
| 360 |
|
| 361 |
### Direct Usage (Sentence Transformers)
|
|
|
|
| 428 |
|
| 429 |
## Training Details
|
| 430 |
|
| 431 |
### Training Hyperparameters
|
| 432 |
#### Non-Default Hyperparameters
|
| 433 |
|
|
|
|
| 596 |
}
|
| 597 |
```
|
| 598 |
|
| 599 |
+
#### If our work was helpful, consider citing us ☺️
|
| 600 |
```bibtext
|
| 601 |
@misc{sinha2025bicaeffectivebiomedicaldense,
|
| 602 |
title={BiCA: Effective Biomedical Dense Retrieval with Citation-Aware Hard Negatives},
|